Tutorial:Replacing a string for every page in a category
Suppose you have a category and want to replace a given piece of text with a different piece of text on every page in that category. For example, suppose you are renaming the category "Greek presidants" to the correctly spelled "Greek presidents" and need to replace [[Category:Greek presidants]] on every page in that category with [[Category:Greek presidents]]. The following code does this:
import mwclient

site = mwclient.Site(('https', 'en.wikipedia.org'))
site.login('username', 'password')
# Iterate over every member of the misspelled category
for page in site.Categories['Greek presidants']:
    print(page.page_title)
    text = page.text()
    text = text.replace('[[Category:Greek presidants]]', '[[Category:Greek presidents]]')
    page.save(text, summary='Renaming category Greek presidants to Greek presidents')
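A page can belong to the category without containing the exact string (for example, when the category is added by a template), in which case the replacement leaves the text unchanged. A minimal sketch of one way to detect this before saving, assuming a hypothetical helper `replace_if_present` that is not part of mwclient and a sample string standing in for the result of `page.text()`:

```python
def replace_if_present(text, old, new):
    """Return (new_text, changed) for a plain substring replacement.

    Hypothetical helper for illustration; it is not part of mwclient.
    """
    new_text = text.replace(old, new)
    return new_text, new_text != text


# Sample wikitext standing in for the result of page.text()
sample = 'Some article text.\n[[Category:Greek presidants]]'
new_text, changed = replace_if_present(
    sample,
    '[[Category:Greek presidants]]',
    '[[Category:Greek presidents]]',
)
if changed:
    # In the loop above, this is where page.save(new_text, summary=...) would go
    print(new_text)
```

Checking the flag keeps the save call limited to pages whose text actually changed.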
This basic text replacement suffices for many simple tasks. For more complex replacement tasks, Python regular expressions from the module "re" are useful. For example, suppose we also wanted to replace lowercase versions like [[Category:greek presidants]], or versions with the underscore like [[Category:Greek_presidants]]. We add the line import re at the top and change the text.replace line to read:
    text = re.sub(r'(?i)\[\[Category:Greek[ _]presidants\]\]', '[[Category:Greek presidents]]', text)
The (?i) makes it case-insensitive. The [ and ] brackets must be escaped with \ inside a regular expression. The [ _] is a character set matching either space or underscore. Note that brackets are not escaped in the new text.
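The pattern can be tried offline before running it against a wiki. The following sketch applies it to a few sample strings (the `samples` list and the names `PATTERN` and `REPLACEMENT` are made up for illustration):

```python
import re

# Same pattern as above: case-insensitive, space or underscore between words
PATTERN = re.compile(r'(?i)\[\[Category:Greek[ _]presidants\]\]')
REPLACEMENT = '[[Category:Greek presidents]]'

samples = [
    '[[Category:Greek presidants]]',
    '[[Category:greek presidants]]',
    '[[Category:Greek_presidants]]',
    '[[Category:Greek presidents]]',  # already correct, so the pattern does not match
]

results = [PATTERN.sub(REPLACEMENT, s) for s in samples]
for before, after in zip(samples, results):
    print(before, '->', after)
```

Every sample ends up as the correctly spelled category link, and the last one is left untouched because it contains no misspelling.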
What if we want to replace text on every page in an entire category tree? For this we use a simple recursive function which calls itself on sub-categories, as in this example, which replaces all "cc-by-2.0" license tags with "cc0" in a given category tree on Commons:
import mwclient

def replace_in_category(category):
    """Replace the license tag on every page in category and its sub-categories."""
    print('Replacing in category ' + category.page_title)
    for page in category:
        if page.namespace == 14:  # 14 is the category namespace
            # Sub-category: recurse into it
            replace_in_category(page)
        else:
            print(page.page_title)
            text = page.text()
            text = text.replace('{{cc-by-2.0}}', '{{cc0}}')
            page.save(text, summary='Replacing license tag ({{cc-by-2.0}} -> {{cc0}})')
    print('Done with category ' + category.page_title)

site = mwclient.Site(('https', 'commons.wikimedia.org'))
site.login('username', 'password')
replace_in_category(site.Categories['Root category'])
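Category trees on a wiki are not guaranteed to be loop-free: a sub-category can list one of its ancestors as a member, and plain recursion would then never finish. A minimal sketch of guarding against this with a set of already visited categories follows. It uses a hypothetical in-memory dict (`tree`) as a stand-in for the wiki so that it runs without a network connection; with mwclient the same idea would apply, with the set keyed on something like the category's title.

```python
def walk_category_tree(tree, category, visited=None):
    """Yield the pages under `category`, visiting each category only once.

    `tree` is a hypothetical stand-in for the wiki: a dict mapping a
    category name to its members. A member that is itself a key of
    `tree` is treated as a sub-category.
    """
    if visited is None:
        visited = set()
    if category in visited:
        return  # already handled; this also breaks category loops
    visited.add(category)
    for member in tree[category]:
        if member in tree:
            # Sub-category: recurse, sharing the same visited set
            yield from walk_category_tree(tree, member, visited)
        else:
            yield member


tree = {
    'Root category': ['Page A', 'Sub category'],
    'Sub category': ['Page B', 'Root category'],  # loops back to the root
}
pages = list(walk_category_tree(tree, 'Root category'))
print(pages)
```

Here the loop from "Sub category" back to "Root category" is skipped on the second visit, so the traversal terminates.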
This page was originally imported from the old mwclient wiki at SourceForge. The imported version was dated from 00:59, 18 March 2012, and its only editor was Derrickcoetzee (@dcoetzee).