Add another parsing method #53

Open
MaurizioRicci opened this issue Sep 23, 2018 · 10 comments

Comments

@MaurizioRicci

MaurizioRicci commented Sep 23, 2018

Often the desired information on a website is grouped under a class name. For example, on some sites a list of torrents is a list of div elements with a particular class. So in some cases it would be better to find elements by ID, class name, or type instead of parsing the page with the standard HTML parser and using various flags or variables to remember the state during parsing.
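To make the "flags to remember the state" point concrete, here is a minimal sketch of that style using the standard `html.parser` module. The markup, the `torrent` class name, and `TorrentRowParser` are made up for illustration and are not taken from any real site or existing plugin.

```python
from html.parser import HTMLParser

# Hypothetical markup: each result is a <div class="torrent"> wrapping a link.
SAMPLE = """
<div class="header">Results</div>
<div class="torrent"><a href="/t/1">Ubuntu ISO</a></div>
<div class="torrent"><a href="/t/2">Debian ISO</a></div>
"""


class TorrentRowParser(HTMLParser):
    """Collects (href, name) pairs by tracking parser state with flags."""

    def __init__(self):
        super().__init__()
        self.in_row = False   # True while inside <div class="torrent">
        self.in_link = False  # True while inside the <a> of such a row
        self.href = None
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "torrent" in classes:
            self.in_row = True
        elif tag == "a" and self.in_row:
            self.in_link = True
            self.href = attrs.get("href")

    def handle_data(self, data):
        if self.in_link:
            self.results.append((self.href, data.strip()))

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "div":
            self.in_row = False


parser = TorrentRowParser()
parser.feed(SAMPLE)
results = parser.results
```

Even for this tiny page the parser needs two flags and a temporary variable, and every extra level of nesting adds more of them.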

The question is: what about adding another parsing method? Something like jQuery, for example pyquery or BeautifulSoup:
https://pythonhosted.org/pyquery/
https://www.crummy.com/software/BeautifulSoup/bs4/doc/

What do you think? @Chocobo1 @sledgehammer999 @Piccirello @zeule @ngosang @hannsen

@hannsen

hannsen commented Sep 24, 2018

What do you propose? That they distribute BeautifulSoup with qBittorrent?

@MaurizioRicci
Author

@hannsen Yes, I was thinking of something like that. I don't think it would require too much effort, and it may help a lot of people. What do you think?

@ngosang
Member

ngosang commented Sep 24, 2018

I think beautifulsoup4 is perfect for this, but we have to include the package in the qbittorrent repository so the user doesn't have to install external packages. I can do it but I think @sledgehammer999 will oppose...

@MaurizioRicci
Author

I understand, never mind. It was just a suggestion.

@nindogo
Contributor

nindogo commented Dec 1, 2018

> Often the desired information on a website is grouped under a class name. For example, on some sites a list of torrents is a list of div elements with a particular class. So in some cases it would be better to find elements by ID, class name, or type instead of parsing the page with the standard HTML parser and using various flags or variables to remember the state during parsing.

Could you please share a couple of sites with this issue?

nindogo

@MaurizioRicci
Author

MaurizioRicci commented Dec 1, 2018 via email

@nindogo
Contributor

nindogo commented Dec 2, 2018

Hi,

I have found that for many of those, the `re` module can usually help.
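As a rough illustration of the `re` approach, here is a sketch that pulls links out of result rows with a single pattern. The markup and the `torrent` class name are hypothetical, and a pattern like this only holds up while the site keeps exactly this attribute order and quoting.

```python
import re

# Hypothetical markup: each result is a <div class="torrent"> wrapping a link.
SAMPLE = """
<div class="header">Results</div>
<div class="torrent"><a href="/t/1">Ubuntu ISO</a></div>
<div class="torrent"><a href="/t/2">Debian ISO</a></div>
"""

# One pattern captures the link target and the visible name of each row.
ROW_RE = re.compile(
    r'<div class="torrent">\s*<a href="(?P<href>[^"]+)">(?P<name>[^<]+)</a>'
)

matches = [
    (m.group("href"), m.group("name").strip())
    for m in ROW_RE.finditer(SAMPLE)
]
```

No state has to be tracked between callbacks, but the pattern is tied to the exact HTML layout.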

@hannsen

hannsen commented Dec 2, 2018

Yeah, I used regex a lot, too. It's faster than parsing but not very readable; then again, neither is the standard HTML parser.

@imDMG

imDMG commented Jan 26, 2019

Right now I am trying to write a wrapper module for HTMLParser: https://github.com/imDMG/HTMLSelector
For now, some basic operations work.
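For readers wondering what such a wrapper could look like, below is a minimal sketch that builds a small tree with the standard `HTMLParser` and then looks elements up by tag, class, or ID. It is not the API of HTMLSelector; `Node`, `TreeBuilder`, `parse`, and `find_all` are hypothetical names chosen for this example, and the sample markup is made up.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs
        self.children = []  # mix of Node and str, in document order

    def text(self):
        """Concatenated text of this node and its descendants."""
        return "".join(
            c if isinstance(c, str) else c.text() for c in self.children
        )

    def iter(self):
        """Yield this node and all descendant nodes, depth first."""
        yield self
        for c in self.children:
            if isinstance(c, Node):
                yield from c.iter()

    def find_all(self, tag=None, class_name=None, id_=None):
        """Return descendants matching every filter that is given."""
        found = []
        for node in self.iter():
            if node is self:
                continue
            classes = (node.attrs.get("class") or "").split()
            if tag is not None and node.tag != tag:
                continue
            if class_name is not None and class_name not in classes:
                continue
            if id_ is not None and node.attrs.get("id") != id_:
                continue
            found.append(node)
        return found


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("[document]", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


# Hypothetical markup: each result is a <div class="torrent"> wrapping a link.
SAMPLE = """
<div class="header">Results</div>
<div class="torrent"><a href="/t/1">Ubuntu ISO</a></div>
<div class="torrent"><a href="/t/2">Debian ISO</a></div>
"""

root = parse(SAMPLE)
rows = root.find_all("div", class_name="torrent")
links = [
    (a.attrs.get("href"), a.text().strip())
    for row in rows
    for a in row.find_all("a")
]
```

The state tracking lives in one place (the tree builder), so each plugin would only need the `find_all` calls at the bottom.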

@MaurizioRicci
Author

I like your module. It looks like pyquery or similar, plus it's based on HTMLParser, which is a standard module.
