How can I get all images? #296

Answered by earwig
dpriskorn asked this question in Q&A
Here's a start:

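import mwparserfromhell

def is_file(link):
    # Match wikilinks whose target is in the File: or legacy Image: namespace.
    return link.title.lstrip().lower().startswith(('file:', 'image:'))

# `text` is the wikitext of the page being inspected.
files = mwparserfromhell.parse(text).filter_wikilinks(matches=is_file)
len(files)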

There are some considerations:

  • This misses images that a template inserts without the [[bracket]] syntax ever appearing in the page's wikitext. A typical example is an infobox: {{Infobox person |image=Example.jpg }} ends up rendering [[File:Example.jpg]].
    • If you want to include these, you might need to expand templates or look at the generated HTML.
    • You could also apply heuristics to template parameter values (e.g. if a template with a name containing "infobox" has a parameter containing "image" whose value is non-…

Answer selected by dpriskorn