Documentation: explain how the BeautifulSoup parser can be specified #433

Open
scorchio opened this issue Feb 20, 2024 · 0 comments

In the following part of the documentation,

:param soup_config: Configuration passed to BeautifulSoup to affect
the way HTML is parsed. Defaults to ``{'features': 'lxml'}``.
If overridden, it is highly recommended to `specify a parser
<https://www.crummy.com/software/BeautifulSoup/bs4/doc/#specifying-the-parser-to-use>`__.
Otherwise, BeautifulSoup will issue a warning and pick one for
you, but the parser it chooses may be different on different
machines.

...it's recommended to specify a parser. However, looking at BeautifulSoup's documentation, the connection between features and the choice of parser is not evident: the BS doc never uses the features keyword and instead just passes the parser name as the second positional argument:
[image: screenshot from the Beautiful Soup documentation]
(https://beautiful-soup-4.readthedocs.io/en/latest/#differences-between-parsers)

BS4's documentation also doesn't include an API reference that would list the different parameters; it only provides a high-level intro.

I would recommend modifying the MechanicalSoup docs to explain the parser setting more explicitly, the way BS4 does in its source code:

:param features: Desirable features of the parser to be
         used. This may be the name of a specific parser ("lxml",
         "lxml-xml", "html.parser", or "html5lib") or it may be the
         type of markup to be used ("html", "html5", "xml"). It's
         recommended that you name a specific parser, so that
         Beautiful Soup gives you the same results across platforms
         and virtual environments.
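
To make the connection concrete, here is a minimal sketch of what the docs could show. It assumes only that bs4 is installed and uses the standard-library-backed "html.parser" so that no extra parser package is needed. The last call mirrors my assumption that MechanicalSoup expands soup_config into keyword arguments for BeautifulSoup; I have not checked that against the MechanicalSoup source.

```python
from bs4 import BeautifulSoup

html = "<p>Hello, <b>world</b></p>"

# Positional form, as the Beautiful Soup docs write it.
soup_positional = BeautifulSoup(html, "html.parser")

# Keyword form: that second positional argument is named `features`.
soup_keyword = BeautifulSoup(html, features="html.parser")

# A soup_config-style dict expanded into keyword arguments.
# Assumption: this is how MechanicalSoup hands soup_config to BeautifulSoup.
soup_config = {"features": "html.parser"}
soup_from_config = BeautifulSoup(html, **soup_config)

# All three calls select the same parser and build the same tree.
print(soup_positional.b.string)
print(str(soup_positional) == str(soup_keyword) == str(soup_from_config))
```

On the MechanicalSoup side, the equivalent setting would then be something like mechanicalsoup.StatefulBrowser(soup_config={'features': 'html.parser'}), with 'lxml' remaining the default when soup_config is not overridden.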