TypeError: cannot pickle '_io.TextIOWrapper' object #233

Open
hadifar opened this issue Dec 4, 2020 · 7 comments · May be fixed by #238

Comments

hadifar commented Dec 4, 2020

Hi dear maintainers,

After running the command provided in the README:

python -m wikiextractor.WikiExtractor enwiki-latest-pages-articles.xml.bz2

it throws the following exception:

 ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object

macOS Catalina
Python 3.8.6

hadifar changed the title from "When running the script" to "TypeError: cannot pickle '_io.TextIOWrapper' object" on Dec 6, 2020
@masahirokjp

Incorrect argument.

prokotg commented Dec 15, 2020

@hadifar would it be possible to get the full traceback?

I have run into something similar (Python 3.7-3.8, Windows 10):

INFO: Preprocessed 2000000 pages
INFO: Preprocessed 2100000 pages
INFO: Preprocessed 2200000 pages
INFO: Preprocessed 2300000 pages
INFO: Loaded 55745 templates in 105.9s
INFO: Starting page extraction from .\plwiki-20201020-pages-articles.xml.
Traceback (most recent call last):
  File "C:\Users\Tom\miniconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Tom\miniconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Tom\miniconda3\lib\site-packages\wikiextractor-3.0.4-py3.8.egg\wikiextractor\WikiExtractor.py", line 620, in <module>
  File "C:\Users\Tom\miniconda3\lib\site-packages\wikiextractor-3.0.4-py3.8.egg\wikiextractor\WikiExtractor.py", line 615, in main
  File "C:\Users\Tom\miniconda3\lib\site-packages\wikiextractor-3.0.4-py3.8.egg\wikiextractor\WikiExtractor.py", line 357, in process_dump
  File "C:\Users\Tom\miniconda3\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\Tom\miniconda3\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Tom\miniconda3\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "C:\Users\Tom\miniconda3\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\Tom\miniconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object
(base) PS C:\Users\Tom\Downloads\plwiki-20201020-pages-articles.xml> Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Tom\miniconda3\lib\multiprocessing\spawn.py", line 102, in spawn_main
    source_process = _winapi.OpenProcess(
OSError: [WinError 87] The parameter is incorrect

Then I realized that OutputSplitter opens its file before the worker processes are started, so I made a fix and it works. I'm going to attach a PR. The code could probably be better, and I'm happy to help if you are willing to refactor this. I also have not checked how this affects other systems and Python versions.
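
The failure in the traceback above can be reproduced without wikiextractor. A minimal sketch, assuming only the standard library: under the spawn start method, multiprocessing pickles the Process object and its arguments (the reduction.dump call in the traceback), and an open text file cannot be pickled, while a path string can. The names is_picklable and write_pages are hypothetical helpers for illustration; they are not part of wikiextractor or of the linked PR.

```python
import pickle
import tempfile
from pathlib import Path


def is_picklable(obj):
    """Return True if pickle can serialize obj, as spawn-based multiprocessing must."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


def write_pages(path, pages):
    """Hypothetical worker: receives a path and opens the file itself.

    Because only the path crosses the process boundary, nothing
    unpicklable has to be sent to the child process.
    """
    with open(path, "w", encoding="utf-8") as out:
        for page in pages:
            out.write(page + "\n")
    return len(pages)


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp()) / "out.txt"
    with open(target, "w", encoding="utf-8") as handle:
        # An open text file is an _io.TextIOWrapper and cannot be pickled.
        print("open file picklable:", is_picklable(handle))
    # A plain path string pickles without trouble.
    print("path picklable:", is_picklable(str(target)))
    print("pages written:", write_pages(str(target), ["page one", "page two"]))
```

The design choice sketched here is to pass the output path to the worker and open the file there, instead of opening it in the parent and handing the open handle to the new process.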

prokotg linked a pull request on Dec 15, 2020 that will close this issue
MrWook commented Feb 8, 2021

I tried this on macOS too, but it doesn't seem to work. I always get the same error, TypeError: cannot pickle '_io.TextIOWrapper' object, and it is not caused by an incorrect argument.
After trying it on my Mac, I ran this library on a Linux server with the same file and the same arguments, and it worked just fine.

So it seems like this library won't work on macOS.

@masahirokjp

Hey guys, if that's the case, Docker for Mac is the solution.

@gormlabenz

Same on macOS Big Sur and Windows 10.

prokotg commented Feb 16, 2021

@attardi any take on this and the linked PR?

@gormlabenz

I was able to run wikiextractor by switching from Python 3.8 to Python 3.7.
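
One possible explanation for the version difference, not confirmed in this thread: starting with Python 3.8, multiprocessing on macOS uses the spawn start method by default instead of fork, and spawn pickles what it hands to the child process. Windows has always used spawn. The sketch below, using only the standard library, shows how to inspect the supported start methods and fall back to the platform default. pick_start_method is a hypothetical helper, not part of wikiextractor.

```python
import multiprocessing as mp


def pick_start_method(preferred="fork"):
    """Return `preferred` if this platform supports it, else the platform default."""
    # The first entry of get_all_start_methods() is the platform default.
    available = mp.get_all_start_methods()
    return preferred if preferred in available else available[0]


if __name__ == "__main__":
    print("supported start methods:", mp.get_all_start_methods())
    # A context object scopes the choice without changing the global default.
    ctx = mp.get_context(pick_start_method())
    print("context would use:", ctx.get_start_method())
```

Note that the Python documentation describes fork as unsafe on macOS, so opening the output file inside the child process is likely the more robust fix than forcing fork.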
