[ENH] - Add support for parsing HTML #254

Open
nenb opened this issue Dec 17, 2023 · 0 comments
Labels
type: enhancement 💅 New feature or request

Comments

@nenb
Contributor

nenb commented Dec 17, 2023

Feature description

HTML tags encode structural information about a document (headings, paragraphs, lists, links), which can help context generation. In some cases it may be preferable to use HTML sources rather than other available sources for context generation (e.g. ar5iv HTML vs. arXiv PDF, since PDF is challenging to parse accurately).

We should add support for parsing HTML documents out of the box.
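
To illustrate how tag structure could feed context generation, here is a minimal sketch that groups the visible text of a page under its nearest heading. It uses only `html.parser` from the Python standard library. The names `SectionExtractor` and `extract_sections` are hypothetical and not part of any existing API in this project, and the list of block-level tags is an assumption chosen for the example, not a complete one.

```python
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Hypothetical helper: collects (heading, text) pairs from an HTML page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIPPED = {"script", "style"}  # content that is never visible text
    BLOCKS = {"p", "div", "li", "br", "tr", "section", "article"}  # assumed subset

    def __init__(self):
        super().__init__()
        self.sections = []  # list of (heading, text) tuples
        self._heading = ""
        self._body = []
        self._in_heading = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            # A new heading closes the previous section.
            self._flush()
            self._heading = ""
            self._in_heading = True
        elif tag in self.BLOCKS:
            # Keep text from adjacent block elements from running together.
            self._body.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_heading:
            self._heading += data
        else:
            self._body.append(data)

    def _flush(self):
        # Collapse runs of whitespace and drop empty sections.
        text = " ".join("".join(self._body).split())
        if text:
            self.sections.append((" ".join(self._heading.split()), text))
        self._body = []

    def close(self):
        super().close()
        self._flush()


def extract_sections(html):
    """Return a list of (heading, text) pairs for the given HTML string."""
    parser = SectionExtractor()
    parser.feed(html)
    parser.close()
    return parser.sections
```

Each `(heading, text)` pair could then serve as a chunk with its heading kept as metadata, which is the kind of structure that is hard to recover from a PDF. A real implementation would likely need to handle nested sections, tables, and malformed markup more carefully.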

Value and/or benefit

  • It opens up a wide range of documents available on the web.
  • It offers the possibility of more accurate context generation in certain situations.
  • It means that context can easily be generated from real-time sources (e.g. news websites) or social media (e.g. Hacker News).

Anything else?

No response

@nenb nenb added the type: enhancement 💅 New feature or request label Dec 17, 2023
@nenb nenb changed the title Add support for parsing HTML [ENH] - Add support for parsing HTML Dec 17, 2023