Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Werkzeug useragent parsing is off for Motorola One Macro Android phone #1923

Closed
mhelmetag opened this issue Sep 3, 2020 · 8 comments · Fixed by #1989
Closed

Werkzeug useragent parsing is off for Motorola One Macro Android phone #1923

mhelmetag opened this issue Sep 3, 2020 · 8 comments · Fixed by #1989

Comments

@mhelmetag
Copy link

mhelmetag commented Sep 3, 2020

I noticed this in a flask app but that uses werkzeug to parse useragent info and saw the same behavior in both the flask request object and standalone werkzeug useragent parser.

It looks like the presence of the substring "mac" in "macro" (the device model) is throwing it off. However, I feel like that should be overridden by the clear "Linux" and/or "Android 9".

>>> from werkzeug.useragents import UserAgent
>>> UserAgent("Mozilla/5.0 (Linux; Android 9; motorola one macro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.111 Mobile Safari/537.36").platform
'macos'

I would expect the platform for this useragent string to be "android".

Environment: Linux

  • Python version: 3.7.3
  • Werkzeug version: 1.01
@jcorvino
Copy link

jcorvino commented Oct 4, 2020

The issue is that UserAgentParser finds the platform by naively searching for keywords.

Here's the relevant code snippet:

    platforms: Any = (
        (" cros ", "chromeos"),
        ("iphone|ios", "iphone"),
        ("ipad", "ipad"),
        (r"darwin|mac|os\s*x", "macos"),
        ("win", "windows"),
        (r"android", "android"),
        ...

In your case, "darwin|mac|os\s*x" finds a match before "android" is searched. From what I've seen of user agent headers, they can vary wildly. Perhaps someone who has more experience can suggest better regex patterns. I'm concerned that our simple keyword search is inadequate and that there could be more unnoticed bugs like yours.

@JavaScriptDude
Copy link

JavaScriptDude commented Dec 25, 2020

Rather than doing this in-house, maybe you can consider adding dependency on ua-parser library. I did a review of the project and its based on a pretty popular cross language core that has a very large test set. The python version supports a very wide array of python versions and the code itself should be pretty stable.

@mhelmetag
Copy link
Author

I'll offer up a PR unless you want to take a crack at it. Seems like a good solution.

@JavaScriptDude
Copy link

I would like to take a crack at it. Thanks!

@mhelmetag
Copy link
Author

Rad! Thanks!

@davidism
Copy link
Member

I'm happy to consider a PR that adjusts the matching rules to fix how this agent is matched, but we won't add a dependency for this.

@jcorvino
Copy link

The ua-parser package finds the appropriate user agent by using regex patterns (similar to our approach, but obviously more thorough). Their regex patterns take up ~5500 lines of code.

I think it's clear that we can't offer a complete UserAgentParser without adding ua-parser as a dependency or including parts of their code into Werkzeug.

Alternatively, we can add a message in the readme and/or docs to explain that our UserAgentParser is basic and won't work in many scenarios.

@tonydelanuez
Copy link
Contributor

I've opened #1989 to address this platform string but it is far from a complete fix for the rest of the conversation here.

@davidism davidism added this to the 2.0.0 milestone Jan 10, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants