Assorted and unsorted Wikia failures #362

Open
nemobis opened this issue Feb 10, 2020 · 5 comments

@nemobis
Member

nemobis commented Feb 10, 2020

Stuff to check, from a run with --xml --xmlrevisions (and no --images) at commit 84444bee36212b8a38d92abd8ed0ebf3be828e1f:
2020-02-10_Wikia_errors.log

@nemobis nemobis added this to the 0.4 milestone Feb 10, 2020
@nemobis
Member Author

nemobis commented Feb 10, 2020

Ah, and of course the .desc pages for image downloads must be broken too if exportnowrap doesn't work.
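For context, exportnowrap is the action=query&export option that returns the bare export XML (the Special:Export format) instead of embedding it in the API's own response. A minimal sketch of such a request plus a sanity check on the body. The function names are hypothetical, and it assumes the .desc file is simply that export of the File: page, which this comment implies but does not spell out.

```python
from urllib.parse import urlencode


def desc_export_url(api_url, file_title):
    """Build an api.php URL for the raw export XML of one File: page.

    Hypothetical helper: assumes the .desc content is the Special:Export-style
    XML returned by action=query&export&exportnowrap.
    """
    params = {
        "action": "query",
        "export": "1",
        "exportnowrap": "1",  # bare <mediawiki> XML, no API envelope
        "titles": file_title,
    }
    return api_url + "?" + urlencode(params)


def looks_like_export(body):
    """Heuristic: a working exportnowrap response is a <mediawiki> document."""
    text = body.lstrip()
    if text.startswith("<?xml"):
        # Tolerate an optional XML declaration before the root element.
        end = text.find("?>")
        text = text[end + 2:].lstrip() if end != -1 else ""
    return text.startswith("<mediawiki")
```

If looks_like_export() is false (a JSON error or an HTML page came back), the .desc file would be garbage, which is the breakage suspected here.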

@nemobis
Member Author

nemobis commented Feb 19, 2020

Some 75 wikis have a "Wikia:About" page which is hijacked by a redirect to www.fandom.com:

$ grep "^Wikia:About" */*titles.txt
2016_dem_primary_fraudfandomcom-20200214-wikidump/2016_dem_primary_fraudfandomcom-20200214-titles.txt:Wikia:About
adventuresinbabysittingmoviefandomcom-20200214-wikidump/adventuresinbabysittingmoviefandomcom-20200214-titles.txt:Wikia:About
alteredstatesfandomcom-20200213-wikidump/alteredstatesfandomcom-20200213-titles.txt:Wikia:About
aotshipsfandomcom-20200214-wikidump/aotshipsfandomcom-20200214-titles.txt:Wikia:About
awzomtheawesomefandomcom-20200214-wikidump/awzomtheawesomefandomcom-20200214-titles.txt:Wikia:About
bestmlppmvsfandomcom-20200213-wikidump/bestmlppmvsfandomcom-20200213-titles.txt:Wikia:About
bigbrotherdisneyfandomcom-20200214-wikidump/bigbrotherdisneyfandomcom-20200214-titles.txt:Wikia:About
bournemouthdow2fandomcom-20200214-wikidump/bournemouthdow2fandomcom-20200214-titles.txt:Wikia:About
buniefandomcom-20200214-wikidump/buniefandomcom-20200214-titles.txt:Wikia:About
closedfandomcom-20200214-wikidump/closedfandomcom-20200214-titles.txt:Wikia:About
clubpenguin_pookiefandomcom-20200213-wikidump/clubpenguin_pookiefandomcom-20200213-titles.txt:Wikia:About
connectfandomcom-20200213-wikidump/connectfandomcom-20200213-titles.txt:Wikia:About
crafterwarfandomcom-20200215-wikidump/crafterwarfandomcom-20200215-titles.txt:Wikia:About
crimson_skies_ttsfandomcom-20200216-wikidump/crimson_skies_ttsfandomcom-20200216-titles.txt:Wikia:About
dannyphantomroleplayfandomcom-20200213-wikidump/dannyphantomroleplayfandomcom-20200213-titles.txt:Wikia:About
deadly_battlefandomcom-20200213-wikidump/deadly_battlefandomcom-20200213-titles.txt:Wikia:About
dodgerfilms_softballfandomcom-20200214-wikidump/dodgerfilms_softballfandomcom-20200214-titles.txt:Wikia:About
esportdanmarkfandomcom-20200214-wikidump/esportdanmarkfandomcom-20200214-titles.txt:Wikia:About
exoticbirdsfandomcom-20200213-wikidump/exoticbirdsfandomcom-20200213-titles.txt:Wikia:About
fanmade_works_v2fandomcom-20200214-wikidump/fanmade_works_v2fandomcom-20200214-titles.txt:Wikia:About
fictional_kids_mealsfandomcom-20200214-wikidump/fictional_kids_mealsfandomcom-20200214-titles.txt:Wikia:About
firstassaultonlinefandomcom-20200214-wikidump/firstassaultonlinefandomcom-20200214-titles.txt:Wikia:About
fishdom_deep_divefandomcom-20200215-wikidump/fishdom_deep_divefandomcom-20200215-titles.txt:Wikia:About
freerunningmovesfandomcom-20200215-wikidump/freerunningmovesfandomcom-20200215-titles.txt:Wikia:About
glh_campfandomcom-20200215-wikidump/glh_campfandomcom-20200215-titles.txt:Wikia:About
happy_endingsfandomcom-20200214-wikidump/happy_endingsfandomcom-20200214-titles.txt:Wikia:About
holy_cross_btt10fandomcom-20200215-wikidump/holy_cross_btt10fandomcom-20200215-titles.txt:Wikia:About
how_to_be_like_ray_donovanfandomcom-20200216-wikidump/how_to_be_like_ray_donovanfandomcom-20200216-titles.txt:Wikia:About
humblecannonsfandomcom-20200215-wikidump/humblecannonsfandomcom-20200215-titles.txt:Wikia:About
infinitejestfandomcom-20200214-wikidump/infinitejestfandomcom-20200214-titles.txt:Wikia:About
jg3399fandomcom-20200215-wikidump/jg3399fandomcom-20200215-titles.txt:Wikia:About
jordantlove_the_free_uk_sheffieldepiafandomcom-20200214-wikidump/jordantlove_the_free_uk_sheffieldepiafandomcom-20200214-titles.txt:Wikia:About
jurasjurastestfandomcom-20200213-wikidump/jurasjurastestfandomcom-20200213-titles.txt:Wikia:About
liberatorsespafandomcom-20200214-wikidump/liberatorsespafandomcom-20200214-titles.txt:Wikia:About
mariomario87796fandomcom-20200213-wikidump/mariomario87796fandomcom-20200213-titles.txt:Wikia:About
masinatfandomcom-20200214-wikidump/masinatfandomcom-20200214-titles.txt:Wikia:About
minecraftstorymodeshipsfandomcom-20200214-wikidump/minecraftstorymodeshipsfandomcom-20200214-titles.txt:Wikia:About
mtvblorefandomcom-20200214-wikidump/mtvblorefandomcom-20200214-titles.txt:Wikia:About
my_little_pony_creepypastafandomcom-20200213-wikidump/my_little_pony_creepypastafandomcom-20200213-titles.txt:Wikia:About
newbisfandomcom-20200214-wikidump/newbisfandomcom-20200214-titles.txt:Wikia:About
ningyofandomcom-20200214-wikidump/ningyofandomcom-20200214-titles.txt:Wikia:About
nuclear_throne_ultrafandomcom-20200213-wikidump/nuclear_throne_ultrafandomcom-20200213-titles.txt:Wikia:About
nuke73fandomcom-20200213-wikidump/nuke73fandomcom-20200213-titles.txt:Wikia:About
object_hotnessfandomcom-20200214-wikidump/object_hotnessfandomcom-20200214-titles.txt:Wikia:About
orakefandomcom-20200215-wikidump/orakefandomcom-20200215-titles.txt:Wikia:About
osawarifandomcom-20200213-wikidump/osawarifandomcom-20200213-titles.txt:Wikia:About
popee_the_performer_fanonfandomcom-20200215-wikidump/popee_the_performer_fanonfandomcom-20200215-titles.txt:Wikia:About
portal_fan_ideasfandomcom-20200214-wikidump/portal_fan_ideasfandomcom-20200214-titles.txt:Wikia:About
power_mom_databasefandomcom-20200216-wikidump/power_mom_databasefandomcom-20200216-titles.txt:Wikia:About
ppgdesignfandomcom-20200214-wikidump/ppgdesignfandomcom-20200214-titles.txt:Wikia:About
quernfandomcom-20200215-wikidump/quernfandomcom-20200215-titles.txt:Wikia:About
repaintedfandomcom-20200213-wikidump/repaintedfandomcom-20200213-titles.txt:Wikia:About
saiyajinfandomcom-20200214-wikidump/saiyajinfandomcom-20200214-titles.txt:Wikia:About
sangatsu_no_lionfandomcom-20200214-wikidump/sangatsu_no_lionfandomcom-20200214-titles.txt:Wikia:About
sayward_pinesfandomcom-20200213-wikidump/sayward_pinesfandomcom-20200213-titles.txt:Wikia:About
school_of_the_gifted_and_mysticalfandomcom-20200214-wikidump/school_of_the_gifted_and_mysticalfandomcom-20200214-titles.txt:Wikia:About
scuzzybetafandomcom-20200214-wikidump/scuzzybetafandomcom-20200214-titles.txt:Wikia:About
sergiolaredobarquerofandomcom-20200214-wikidump/sergiolaredobarquerofandomcom-20200214-titles.txt:Wikia:About
seventhdragon3fandomcom-20200215-wikidump/seventhdragon3fandomcom-20200215-titles.txt:Wikia:About
swg_legendsfandomcom-20200213-wikidump/swg_legendsfandomcom-20200213-titles.txt:Wikia:About
taenfandomcom-20200213-wikidump/taenfandomcom-20200213-titles.txt:Wikia:About
tangiblefandomcom-20200215-wikidump/tangiblefandomcom-20200215-titles.txt:Wikia:About
templarsoftherosefandomcom-20200214-wikidump/templarsoftherosefandomcom-20200214-titles.txt:Wikia:About
thebeemoviefandomcom-20200215-wikidump/thebeemoviefandomcom-20200215-titles.txt:Wikia:About
the_grovefandomcom-20200213-wikidump/the_grovefandomcom-20200213-titles.txt:Wikia:About
the_world_of_fluffy_bunnyfandomcom-20200215-wikidump/the_world_of_fluffy_bunnyfandomcom-20200215-titles.txt:Wikia:About
toybashfandomcom-20200216-wikidump/toybashfandomcom-20200216-titles.txt:Wikia:About
trigger_happy_the_gremlin_cityfandomcom-20200215-wikidump/trigger_happy_the_gremlin_cityfandomcom-20200215-titles.txt:Wikia:About
txkhotsaucefandomcom-20200214-wikidump/txkhotsaucefandomcom-20200214-titles.txt:Wikia:About
ultraman_seriesfandomcom-20200213-wikidump/ultraman_seriesfandomcom-20200213-titles.txt:Wikia:About
vegafandomcom-20200214-wikidump/vegafandomcom-20200214-titles.txt:Wikia:About
verse_and_dimensionsfandomcom-20200214-wikidump/verse_and_dimensionsfandomcom-20200214-titles.txt:Wikia:About
vg_fanonfandomcom-20200214-wikidump/vg_fanonfandomcom-20200214-titles.txt:Wikia:About
vibrationsfandomcom-20200215-wikidump/vibrationsfandomcom-20200215-titles.txt:Wikia:About
vichinofandomcom-20200213-wikidump/vichinofandomcom-20200213-titles.txt:Wikia:About
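The grep above can also be scripted. A minimal Python 3 sketch, assuming the <prefix>-wikidump/<prefix>-titles.txt layout shown in the output and one title per line. Unlike grep "^Wikia:About" it matches whole lines only, so Wikia:About/Foo would not count. The function name is hypothetical.

```python
import glob
import os


def find_titles(root=".", wanted=("Wikia:About",)):
    """Map each */*titles.txt under root to the wanted titles it lists.

    Roughly `grep "^Wikia:About" */*titles.txt`, except that a line must equal
    a wanted title exactly (no prefix matches such as "Wikia:About/Foo").
    """
    wanted = set(wanted)
    hits = {}
    for path in sorted(glob.glob(os.path.join(root, "*", "*titles.txt"))):
        with open(path, encoding="utf-8") as f:
            found = [line.rstrip("\r\n") for line in f if line.rstrip("\r\n") in wanted]
        if found:
            hits[path] = found
    return hits


if __name__ == "__main__":
    # Same "path:title" shape as the grep output above.
    for path, titles in find_titles().items():
        for title in titles:
            print(f"{path}:{title}")
```

Passing wanted=("Wikia:About", "Wikia:Copyrights", "Wikia:Templates") also covers the other redirected pages mentioned later in this thread.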

@nemobis
Member Author

nemobis commented Feb 22, 2020

It's not just [[Wikia:About]] but also [[Wikia:Copyrights]] and [[Wikia:Templates]]. Handled in 7289225; now it just logs the error:
2020-02-22_wikia_errors.log
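The general pattern (log the failure and move on instead of aborting the whole dump) looks roughly like this. It is a sketch only, not the code from 7289225: fetch_page and log_error are hypothetical stand-ins for whatever performs the export and writes the error log.

```python
import datetime


def log_error(log_path, text):
    """Append one timestamped line to a plain-text error log (hypothetical helper)."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as out:
        out.write(f"{stamp}: {text}\n")


def export_pages(titles, fetch_page, log_path):
    """Yield (title, xml) for every title that exports cleanly.

    fetch_page is a hypothetical callable(title) -> str that raises when a
    page cannot be exported (e.g. Wikia:About redirecting to www.fandom.com).
    Failures are logged and skipped so one bad page does not abort the dump.
    """
    for title in titles:
        try:
            xml = fetch_page(title)
        except Exception as exc:  # deliberately broad: any per-page failure is logged
            log_error(log_path, f'Error exporting "{title}": {exc!r}')
            continue
        yield title, xml
```

The trade-off is that a systematic failure shows up as many log lines rather than one crash, so the log still has to be reviewed by hand, as with the attached file.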

We're left with a few legit exceptions:

  File "/home/users/federico/.local/lib/python2.7/site-packages/mwclient/client.py", line 332, in handle_api_result
    info['error']['info'], info['error']['*'])
mwclient.errors.APIError: (u'internal_api_error_MWException', u'Exception Caught: LBFactory_Multi::newExternalLB: Unknown cluster "dev-archive"', u'')

and:

  File "./dumpgenerator.py", line 429, in getXMLHeader
    xml = r.json()['query']['export']['*']
  File "/home/users/federico/.local/lib/python2.7/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python2.7/dist-packages/simplejson/__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/dist-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib/python2.7/dist-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
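"Expecting value: line 1 column 1 (char 0)" means the body getXMLHeader received was empty or not JSON at all (for example an HTML error page), so r.json() fails before the ['query']['export']['*'] lookup is even reached. A defensive version of that one line, sketched with the standard json module on the raw response text; the function name is hypothetical, and what the caller does on None (fall back to another method, or log and skip) is left open.

```python
import json


def parse_export_header(body):
    """Return the export XML from an action=query&export JSON body, or None.

    Mirrors r.json()['query']['export']['*'], but treats an empty or non-JSON
    body, or JSON without the expected keys, as "no header" instead of raising.
    """
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None
    try:
        return data["query"]["export"]["*"]
    except (KeyError, TypeError):  # TypeError: e.g. the JSON was a list or string
        return None
```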

and:

    for xml in getXMLRevisions(config=config, session=session):
  File "./dumpgenerator.py", line 958, in getXMLRevisions
    prequest = site.api(http_method=config['http_method'], **pparams)
  File "/home/users/federico/.local/lib/python2.7/site-packages/mwclient/client.py", line 289, in api
    if self.handle_api_result(info, sleeper=sleeper):
  File "/home/users/federico/.local/lib/python2.7/site-packages/mwclient/client.py", line 334, in handle_api_result
    info['error']['info'], kwargs)
mwclient.errors.APIError: (u'rvmultpages', u'titles, pageids or a generator was used to supply multiple pages, but the limit, startid, endid, dirNewer, user, excludeuser, start and end parameters may only be used on a single page.', None)
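The rvmultpages error is the API refusing a request that combines several pages (via titles, pageids or a generator) with revision-enumeration parameters. One way around it, sketched under the assumption that the batch is a plain list of titles, is to drop those parameters whenever more than one page is requested, or else to request one page at a time. The rv-prefixed names are the prop=revisions parameters matching the error message; the helper itself is hypothetical.

```python
# prop=revisions parameters that, per the rvmultpages message above, may only
# be used with a single page. rvdir is handled separately: only rvdir=newer
# ("dirNewer" in the message) is refused.
SINGLE_PAGE_ONLY = ("rvlimit", "rvstartid", "rvendid", "rvuser",
                    "rvexcludeuser", "rvstart", "rvend")


def revision_params(titles, base):
    """Return query parameters for prop=revisions on one or more titles.

    base holds any extra rv* options; when several titles are batched,
    the single-page-only options are dropped so the API accepts the request.
    """
    params = dict(base, action="query", prop="revisions", titles="|".join(titles))
    if len(titles) > 1:
        for key in SINGLE_PAGE_ONLY:
            params.pop(key, None)
        if params.get("rvdir") == "newer":
            del params["rvdir"]
    return params
```

Without those parameters the API returns only the latest revision of each page, so full histories still need the single-page path with continuation.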

@nemobis
Member Author

nemobis commented Feb 29, 2020

Traceback (most recent call last):
  File "dumpgenerator.py", line 2527, in <module>
    main()
  File "dumpgenerator.py", line 2517, in main
    resumePreviousDump(config=config, other=other)
  File "dumpgenerator.py", line 2126, in resumePreviousDump
    getPageTitles(config=config, session=other['session'])
  File "dumpgenerator.py", line 381, in getPageTitles
    for title in titles:
  File "dumpgenerator.py", line 262, in getPageTitlesAPI
    for page in site.allpages(namespace=namespace):
  File "/home/users/federico/.local/lib/python2.7/site-packages/mwclient/listing.py", line 77, in next
    return self.__next__(*args, **kwargs)
  File "/home/users/federico/.local/lib/python2.7/site-packages/mwclient/listing.py", line 181, in __next__
    if info['ns'] == 14:
KeyError: 'ns'
tail: cannot open 'eigafandomcom-20200223-wikidump/eigafandomcom-20200223-history.xml' for reading: No such file or directory
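Whatever the server sent back here, mwclient's listing code assumes every entry carries an 'ns' key. When reading list=allpages results directly, a defensive version can set malformed entries aside for logging instead of crashing with KeyError. A minimal sketch on an already-decoded JSON response; it does not explain why the entry lacked 'ns', and the function name is hypothetical.

```python
def allpages_titles(response):
    """Return (titles, skipped) from one decoded list=allpages API response.

    Assumes the usual {"query": {"allpages": [{"pageid", "ns", "title"}, ...]}}
    shape. Entries missing "ns" or "title" go to `skipped` instead of raising
    KeyError, so the caller can log them and keep listing.
    """
    titles, skipped = [], []
    for entry in response.get("query", {}).get("allpages", []):
        if "ns" in entry and "title" in entry:
            titles.append(entry["title"])
        else:
            skipped.append(entry)
    return titles, skipped
```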

@nemobis
Member Author

nemobis commented Mar 7, 2020

That last one I did not understand...

Elaeagnifolia added a commit to Elaeagnifolia/wikiteam that referenced this issue Nov 9, 2020
Gamepedia's image files were moved to Wikia's image servers, so dumping images with the script from a Gamepedia API triggers the same latest?cb= filename bug that Wikia/Fandom wikis experienced (Issue WikiTeam#362).

Example: 
https://dragalialost.gamepedia.com/File:Notte.png
https://static.wikia.nocookie.net/dragalialost_gamepedia_en/images/e/e8/Notte.png/revision/latest?cb=20180919220831
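Taking the last path segment of a URL like the one above yields latest (or latest?cb=20180919220831 if the query string is kept) instead of Notte.png. A minimal sketch of recovering the real name, assuming the .../<name>/revision/... layout shown in the example; the function name is hypothetical.

```python
from urllib.parse import unquote, urlsplit


def filename_from_image_url(url):
    """Return the file name from an image URL, ignoring a /revision/... suffix.

    For .../images/e/e8/Notte.png/revision/latest?cb=... the name is the path
    segment just before the last "revision"; for plain URLs it is the final segment.
    """
    segments = urlsplit(url).path.split("/")  # path only, so ?cb=... is already gone
    if "revision" in segments:
        # Index of the last "revision" segment, in case a file is itself named "revision".
        idx = len(segments) - 1 - segments[::-1].index("revision")
        if idx > 0:
            return unquote(segments[idx - 1])
    return unquote(segments[-1])
```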