Error offlining en.wikipedia.org with exit code 137 #626

Closed
automactic opened this issue Mar 17, 2019 · 18 comments

@automactic
Member

When trying to offline en.wikipedia.org, the container exited with status code 137 and stdout was an empty string. The container had been running for around 11 hours.

Traceback (most recent call last):
  File "/usr/src/app/operations/run_mwoffliner.py", line 37, in execute
    links={self.redis_container_name: 'redis'}, name='mwoffliner_{}'.format(self.task_id))
  File "/usr/local/lib/python3.6/site-packages/docker/models/containers.py", line 814, in run
    container, exit_status, command, image, out
docker.errors.ContainerError: Command 'mwoffliner --mwUrl=https://en.wikipedia.org --adminEmail=contact@kiwix.org --format=nopic --format=novid --useCache --customMainPage=User:Stephane_(Kiwix)/Landing --redis=redis://redis --outputDirectory=/output' in image 'openzim/mwoffliner:1.8.0' returned non-zero exit status 137: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/src/app/tasks/mwoffliner.py", line 64, in run
    raise e
  File "/usr/src/app/tasks/mwoffliner.py", line 52, in run
    offliner_stdout = run_mwoffliner.execute()
  File "/usr/src/app/operations/run_mwoffliner.py", line 43, in execute
    raise OfflinerError(code='docker.ContainerError', stderr=e.stderr.decode("utf-8"))
operations.base.OfflinerError
@ISNIT0
Contributor

ISNIT0 commented Mar 18, 2019

Do you have the previous logs? How far did it get in the scraping process? This doesn't look like a standard error (every error from MWOffliner is wrapped, so would never be empty).
Could be:

  • An exception from C++
  • A lack of memory

@ISNIT0
Contributor

ISNIT0 commented Mar 19, 2019

@automactic Do you have the other logs? It would be helpful to know where in the process this happened.

@automactic
Member Author

No, I don't. And this happens not just with English Wikipedia, but with other wikis as well.

@kelson42
Collaborator

@automactic Can you please list the other wikis that are impacted? Does it happen systematically?

@kelson42 kelson42 added the bug label Mar 20, 2019
@ISNIT0
Contributor

ISNIT0 commented Mar 20, 2019

Okay, I'll look at memory usage.
We build up a large data store in memory throughout the scrape. I'll look into finding a better storage solution (probably redis).

@kelson42
Collaborator

@ISNIT0 What exactly do you want to move to redis? mwoffliner should not be keeping much in memory already... or have you changed something in this regard?

@ISNIT0
Contributor

ISNIT0 commented Mar 20, 2019

Yes, a lot of the downloading process was changed and simplified. I've found a drop-in solution that will allow us to keep the simple code and use redis as the store.

@kelson42 kelson42 added this to the 1.9 milestone Mar 26, 2019
@ISNIT0
Contributor

ISNIT0 commented Mar 26, 2019

This is affected by #629

@automactic Would you mind testing with the master branch please?

@ISNIT0 ISNIT0 modified the milestones: 1.9, 1.8.2 Apr 3, 2019
@ISNIT0 ISNIT0 closed this as completed Apr 3, 2019
@automactic
Member Author

Still seeing the same thing with 1.8.2.

@automactic automactic reopened this Apr 9, 2019
@automactic
Member Author

automactic commented Apr 9, 2019

Memory usage of mwoffliner5, the node running April's English Wikipedia scrape with 1.8.2:

[Screenshot: memory usage chart for mwoffliner5, taken 2019-04-08 at 11:48:30 PM]

So I don't think there is a memory issue, unless there is a short spike that is not visible in the chart.

I am running English Wikipedia manually, so that I can inspect the container and see why it exited.

@automactic
Member Author

Nope, not OOM; see the docker inspect result:

        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 137,
            "Error": "",
            "StartedAt": "2019-04-09T03:51:40.009590423Z",
            "FinishedAt": "2019-04-09T04:38:26.811910408Z"
        },

And this container was started manually through a command, not by zimfarm, so it is unlikely that anyone killed it.
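
For repeated checks, the same fields can be pulled out of the `docker inspect` output programmatically. The following is a minimal sketch, not part of zimfarm or mwoffliner: the function name `summarize_state` is hypothetical, and it assumes the input is the JSON array that `docker inspect <container>` prints, with one object per container.

```python
import json


def summarize_state(inspect_json: str) -> dict:
    """Extract the fields useful for diagnosing an unexpected container exit.

    `inspect_json` is assumed to be the raw output of `docker inspect <name>`,
    which is a JSON array containing one object per inspected container.
    """
    data = json.loads(inspect_json)
    state = data[0]["State"]
    exit_code = state["ExitCode"]
    return {
        "exit_code": exit_code,
        # True only when Docker itself recorded an out-of-memory kill.
        "oom_killed": state["OOMKilled"],
        # 137 = 128 + 9, i.e. the process was terminated by SIGKILL.
        "killed_by_sigkill": exit_code == 137,
    }


def inspect_container(name: str) -> dict:
    """Run `docker inspect` for one container and summarize its state."""
    import subprocess

    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True, text=True, check=True,
    )
    return summarize_state(result.stdout)
```

Calling `inspect_container("pensive_bassi")` on the node would be expected to report `oom_killed` as false and `killed_by_sigkill` as true for the state shown above. That combination says the process received SIGKILL, but not who sent it.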

@kelson42, the container name is pensive_bassi on mwoffliner5. I am leaving it there in case you want to dig around in it.

@automactic
Member Author

However, if the host system ran out of memory and killed the container process, would the OOMKilled field be true? I am not sure.
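
One way to check this independently of the OOMKilled field is to search the host's kernel log for OOM-killer messages. The sketch below is an illustration only. It assumes the kernel log contains lines with the text `Killed process <pid> (<name>)`, which is how Linux kernels typically report an OOM kill, though the exact wording varies between kernel versions. The helper name `find_oom_kills` is hypothetical.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumption: OOM-killer reports include "Killed process <pid> (<name>)".
_OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every OOM kill found in the log lines."""
    kills = []
    for line in log_lines:
        match = _OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


if __name__ == "__main__":
    # Example: dmesg | python find_oom_kills.py
    for pid, name in find_oom_kills(sys.stdin):
        print(f"OOM killer terminated pid {pid} ({name})")
```

If the scraper's process shows up in the output around the time the container exited, the host ran out of memory regardless of what `docker inspect` reports.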

@ISNIT0
Contributor

ISNIT0 commented Apr 9, 2019

@automactic I think this may be related to #660

Sometimes Parsoid force-exits the processes without any errors or warnings.

I'm working on reducing the likelihood of this (changing timeouts and limits).

If we're not seeing errors, this is the most likely cause I can think of.

@kelson42
Collaborator

kelson42 commented Apr 9, 2019

@ISNIT0 Local Parsoid is not (I hope, at least) used to scrape Wikipedia. We use the remote one.

@ISNIT0
Contributor

ISNIT0 commented Apr 9, 2019

@kelson42 Sorry, my mistake

@ISNIT0
Contributor

ISNIT0 commented Apr 9, 2019

It seems exit status 137 is related to memory and Docker:

moby/moby#21083
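
For context, exit statuses above 128 conventionally mean "terminated by signal (status - 128)", so 137 corresponds to signal 9, SIGKILL. The short sketch below decodes a status that way. The function name `describe_exit_status` is hypothetical, and the signal lookup assumes a POSIX host. Note that SIGKILL can come from the kernel OOM killer, from `docker kill`, or from any other sender, so the status alone does not prove a memory problem.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe an exit status using the 128 + signal-number convention."""
    if status == 0:
        return "exited normally"
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with error code {status}"


if __name__ == "__main__":
    print(describe_exit_status(137))
```

On a Linux host this reports that status 137 means the process was terminated by SIGKILL.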

@ISNIT0 ISNIT0 modified the milestones: 1.8.2, 1.8.4 Apr 19, 2019
@kelson42 kelson42 modified the milestones: 1.8.4, 1.9 Apr 26, 2019
@kelson42
Collaborator

This is a memory problem; see #706.
