BeautifulSoup prints a GuessedAtParserWarning #24

Open
NelsonMinar opened this issue Jul 14, 2022 · 1 comment

Comments

@NelsonMinar

Running webpreview in its default configuration yields this warning:

webpreview/previews.py:51: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 51 of the file /home/nelson/src/linkblog/pinboard-to-static/venv/lib/python3.9/site-packages/webpreview/previews.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.

Presumably a change in BeautifulSoup since the last webpreview release. It works anyway, just annoying. Adding the suggested features argument does make the warning go away but raises the question of whether other parsers should be configurable.
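For anyone hitting this before a fix lands, here is a minimal sketch of the two ways the warning can be avoided. It assumes `bs4` is installed; the helper names `parse_explicitly` and `parse_quietly` are made up for illustration and are not part of webpreview or BeautifulSoup. The first passes the `features` argument that the warning text suggests, the second filters only `GuessedAtParserWarning` for code you cannot edit.

```python
import warnings

from bs4 import BeautifulSoup, GuessedAtParserWarning

HTML = "<html><head><title>Example</title></head><body></body></html>"


def parse_explicitly(markup: str) -> BeautifulSoup:
    """Name the parser, as the warning text suggests (hypothetical helper)."""
    return BeautifulSoup(markup, features="html.parser")


def parse_quietly(markup: str) -> BeautifulSoup:
    """Let BeautifulSoup guess the parser, but hide only the guess warning.

    Hypothetical helper: useful when the BeautifulSoup() call lives in a
    third-party package and cannot be changed directly.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=GuessedAtParserWarning)
        return BeautifulSoup(markup)


if __name__ == "__main__":
    print(parse_explicitly(HTML).title.string)
    print(parse_quietly(HTML).title.string)
```

Naming the parser is the more robust option, since the guessed parser can differ between environments; the filter only hides the message.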

@vduseev
Collaborator

vduseev commented Aug 12, 2022

Hi @NelsonMinar, thank you for highlighting this issue.

This warning should be gone starting with version 1.7.2, because BeautifulSoup is now initialized with a default parser ("html.parser") unless a different one is specified.

def initialize(
    url: str,
    timeout: Optional[str] = None,
    headers: Optional[Dict[str, str]] = None,
    content: Optional[str] = None,
    target_attribute: Optional[str] = None,
    properties: Optional[List[str]] = None,
    parser: str = "html.parser",
) -> Tuple[str, str, BeautifulSoup]:

    soup = BeautifulSoup(content, parser)

This still allows users to specify a parser of their own choice, such as the slower but more accurate "html5lib".
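As a rough illustration of the same pattern outside webpreview, the sketch below wraps BeautifulSoup in a function with a `parser` argument that defaults to "html.parser". The name `make_soup` and the fallback behaviour are assumptions for the example, not what webpreview does; `FeatureNotFound` is the exception BeautifulSoup raises when the requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(content: str, parser: str = "html.parser") -> BeautifulSoup:
    """Parse `content` with the requested parser (hypothetical helper).

    Assumption for this sketch: if the requested parser (for example
    "html5lib" or "lxml") is not installed, fall back to the standard
    library's "html.parser" instead of failing.
    """
    try:
        return BeautifulSoup(content, parser)
    except FeatureNotFound:
        return BeautifulSoup(content, "html.parser")


if __name__ == "__main__":
    # The default parser needs no extra packages.
    print(make_soup("<p>Hello</p>").p.get_text())
    # An optional parser; this uses html5lib only if it is installed.
    print(make_soup("<p>Hello</p>", parser="html5lib").p.get_text())
```

Whether to fall back silently or let the error propagate is a design choice; a library may prefer to surface the missing dependency to the caller.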

Please let me know if the issue is gone for you in version 1.7.2 🙂
