Python 3 support for dumpgenerator.py #331

Open: wants to merge 8 commits into base: master


TimSC
Contributor

@TimSC TimSC commented Dec 3, 2018

This should add Python 3 support to dumpgenerator.py without breaking Python 2 behavior.

@doronbehar

Thanks for this patch :) It worked for me, too bad upstream is inactive...

@nemobis
Member

nemobis commented Oct 15, 2019 via email

@doronbehar

I think your CI checks fail because of the Python version in Travis...

And I'm not sure what Unicode bugs you are referring to..

@doronbehar

Now I see:

I tried to resume a previous download session and loadConfig kept failing. I couldn't figure out why until I did this:

@@ -1395,12 +1397,12 @@ def domain2prefix(config={}, session=None):
 def loadConfig(config={}, configfilename=''):
     """ Load config file """

-    try:
-        with open('%s/%s' % (config['path'], configfilename), 'r') as infile:
-            config = pickle.load(infile)
-    except:
-        print ('There is no config file. we can\'t resume. Start a new dump.')
-        sys.exit()
+    #  try:
+    with open('%s/%s' % (config['path'], configfilename), 'r') as infile:
+        config = pickle.load(infile)
+    #  except:
+        #  print ('There is no config file. we can\'t resume. Start a new dump.')
+        #  sys.exit()

     return config

And I got this error:

Traceback (most recent call last):
  File "./dumpgenerator.py", line 2359, in <module>
    main()
  File "./dumpgenerator.py", line 2343, in main
    config = loadConfig(config=config, configfilename=configfilename)
  File "./dumpgenerator.py", line 1402, in loadConfig
    config = pickle.load(infile)
  File "/nix/store/swy0p01xr0wyh907d67hkxr1g0kngcpn-python3-3.7.4/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

It took me a while to track it down, because a catch-all except statement was used and so the error message was misleading: the file was there, it just couldn't be decoded. See this QA.

@doronbehar

This QA says to use rb instead of r...
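For reference, a minimal sketch of what the two points above might look like together: open the pickle file with 'rb', and catch only the "file is missing or unreadable" case so that other errors (such as the UnicodeDecodeError in the traceback) stay visible. The name load_config and its signature are hypothetical, not the actual code in dumpgenerator.py or in this PR.

```python
import pickle
import sys


def load_config(path, configfilename):
    """Load a pickled config, opening the file in binary mode.

    Hypothetical helper for illustration; not the function from the PR.
    """
    filename = '%s/%s' % (path, configfilename)
    try:
        # On Python 3, pickle.load() needs a binary file object, hence 'rb'.
        with open(filename, 'rb') as infile:
            return pickle.load(infile)
    except IOError:
        # Only a missing or unreadable file means "no config to resume from".
        # Any other exception propagates with its real traceback.
        print('There is no config file. We can\'t resume. Start a new dump.')
        sys.exit()
```

With the narrower except clause, a corrupt or undecodable file raises its own error instead of being reported as "no config file".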

@nemobis
Member

nemobis commented Oct 15, 2019 via email

@nemobis
Member

nemobis commented Feb 8, 2020

I've started testing this, but it's a can of worms. We need to test various kinds of inputs, but a lot of failures are surfaced even with a single wiki, with a single launch or XML/image resumption attempt. Also, wikitools and reverse_readlines don't like python3, while pickle doesn't like strings. Hmpf.

I'm using Python 3.7.6, by the way.

And yes, some files need to be opened in binary mode given the way this was written, and there are also some errors from concatenating bytes with non-bytes. I'm not entirely sure what your intention was.
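To illustrate the bytes/non-bytes problem: on Python 3, adding a bytes object to a str raises TypeError, so data read from a file opened in binary mode has to be decoded before it is joined with text. A rough sketch, assuming UTF-8 content; to_text and the sample values are made up for the example and are not taken from dumpgenerator.py.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is bytes.

    Hypothetical helper; UTF-8 is an assumption about the data.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Bytes as they might come from a file opened with 'rb' (sample data).
raw = b'<title>Main Page</title>\n'

# Decoding first makes both operands text, so the concatenation is valid
# on Python 3 (and still works on Python 2).
line = to_text(raw) + u'<!-- resumed -->'
```

The alternative is to keep everything as bytes and encode the text side instead; which direction fits better depends on how each file is opened.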

@nemobis
Member

nemobis commented Feb 8, 2020

On the other hand, this rather simplistic change mostly works for me: nemobis@bcecfa2
