How can I count the words on the page as the user of wikipedia sees it in the browser? #297

Answered by earwig
dpriskorn asked this question in Q&A

Since you want to expand templates, this might be best done without mwparserfromhell. You can use the action=render endpoint to get the expanded HTML for a live page, then use Beautiful Soup to extract the visible text and count the words in it:

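import bs4
soup = bs4.BeautifulSoup(text, "html.parser")
# Join the visible text fragments, then split on whitespace to count words
count = len(' '.join(soup.stripped_strings).split())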
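
For the fetching step, here is a rough sketch of how the `text` variable might be obtained from the action=render endpoint using only the standard library. It assumes English Wikipedia; the names `BASE_URL`, `render_url` and `fetch_rendered_html` and the User-Agent string are made up for illustration and are not part of any library.

```python
import urllib.parse
import urllib.request

# Assumption: English Wikipedia. Swap the host for another wiki.
BASE_URL = "https://en.wikipedia.org/w/index.php"


def render_url(title: str) -> str:
    """Build the action=render URL for a page title (hypothetical helper)."""
    query = urllib.parse.urlencode({"title": title, "action": "render"})
    return f"{BASE_URL}?{query}"


def fetch_rendered_html(title: str) -> str:
    """Download the expanded HTML of a live page (hypothetical helper)."""
    request = urllib.request.Request(
        render_url(title),
        # Placeholder User-Agent; replace it with something that identifies you.
        headers={"User-Agent": "word-count-example/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `text = fetch_rendered_html("Albert Einstein")` and passing `text` to the Beautiful Soup snippet above should then give an approximate count. The result may differ somewhat from what a reader sees, since the rendered HTML can include text that the browser hides.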

Answer selected by dpriskorn