Illegal instruction error #3738

Closed
thjashin opened this issue Mar 6, 2018 · 20 comments
Labels
Needed: replication Bug replication is required Support Support question

Comments

@thjashin

thjashin commented Mar 6, 2018

Details

Expected Result

I regularly update the repo and the docs should rebuild.

Actual Result

Since recent commits, the build has failed with the following error log when Sphinx starts:

python /home/docs/checkouts/readthedocs.org/user_builds/zhusuan/envs/latest/bin/sphinx-build -T -E -b readthedocs -d _build/doctrees-readthedocs -D language=en . _build/html
Running Sphinx v1.7.1
Illegal instruction

My local build works fine. I've tried Sphinx 1.6.5 and 1.7.1, and both failed with the same error. Do you have any idea what the problem is?

@humitos humitos added the Needed: replication Bug replication is required label Mar 6, 2018
@humitos
Member

humitos commented Mar 6, 2018

I imported your project in my local instance and I wasn't able to reproduce this issue. We will need more research on this.

@thjashin
Author

thjashin commented Mar 6, 2018

Thanks for taking a look. Is there anything I can help with? This has been bothering me for a while.

@stsewd
Member

stsewd commented Mar 7, 2018

I tried to import your project on my local instance, but I got a memory limit error, so I think that is probably the issue. Related to #3613.

@humitos Do you have the default limits for memory and time on your local installation?

@humitos
Member

humitos commented Mar 7, 2018

@stsewd I modified them in local_settings.py with these values:

DOCKER_LIMITS = {
    'memory': '2048m',
    'time': 3600,
}

@thjashin
Author

thjashin commented Mar 7, 2018

@stsewd Can you paste the log here? It's strange that my project takes that much memory.
I set

ulimit -Sv 1000000 #1000m

to run make html. It doesn't seem to fail.

@berkerpeksag
Member

Looking at http://readthedocs.org/projects/zhusuan/builds/6828690/, the return code is 132, which means sphinx-build was killed by SIGILL. I don't rule out the OOM case, since Python 2 does a poor job of handling OOM errors, but this might be caused by an extension module compiled with a flag that is not supported by the CPU (or virtual CPU?) on our servers.
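The step from return code 132 to SIGILL follows the common shell convention of reporting 128 plus the signal number when a process is killed by a signal. A minimal sketch of that decoding, assuming a POSIX system (the helper name signal_from_return_code is made up for illustration):

```python
import signal


def signal_from_return_code(code):
    """Return the signal name for a shell-style return code, or None.

    Shells conventionally report 128 + N when a child is killed by signal N.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # The offset does not correspond to a known signal on this platform.
        return None


# 132 - 128 = 4, which is SIGILL on Linux.
print(signal_from_return_code(132))
```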

We probably need to compare the versions of all dependencies with extension modules (numpy, tensorflow, matplotlib, etc.) between http://readthedocs.org/projects/zhusuan/builds/6828690/ (first failed build) and http://readthedocs.org/projects/zhusuan/builds/6822461/ (last completed build).
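One way to do that comparison is to diff the installed package lists of the two builds. The sketch below assumes each list is available as "name==version" lines in the style of pip freeze; the helper names and the sample listings are hypothetical and are not taken from the builds linked above.

```python
def parse_freeze(text):
    """Parse "name==version" lines into a {name: version} dict."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        versions[name.lower()] = version
    return versions


def changed_packages(old_text, new_text):
    """Return {name: (old_version, new_version)} for packages that differ."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Made-up listings for illustration only.
last_good = "example-core==1.0\nexample-ext==2.0\n"
first_bad = "example-core==1.0\nexample-ext==3.0\n"
print(changed_packages(last_good, first_bad))
```

A package that appears in only one listing shows up with None on the other side of the tuple.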

@thjashin
Author

thjashin commented Mar 7, 2018

@stsewd @berkerpeksag I just updated the dev branch and hit the memory error while downloading tensorflow 1.6.0.

@thjashin
Author

thjashin commented Mar 8, 2018

I did some experiments on the dev branch. When I set TF to 1.4.0 in the doc requirements file, everything works well (build page). When I changed it to 1.6.0, the build failed with a MemoryError (build page) or Illegal instruction (build page).

I'm now using <=1.4.0 as a temporary fix. Should I request more memory for my project or wait for a fix?

@stsewd
Member

stsewd commented Mar 8, 2018

@thjashin I'm glad that you have your docs working! I'm not really sure that your problem would go away with more memory. What @berkerpeksag mentions is also very valid (but the builds are executed within a Docker container, so maybe it's a problem with the host?).

@berkerpeksag
Member

berkerpeksag commented Mar 8, 2018

Looking at the build logs shared by @thjashin in #3738 (comment), I think there are two different problems:

  1. Getting MemoryError when installing tensorflow 1.6.0 (it looks like it randomly fails with MemoryError)
  2. SIGILL after tensorflow 1.6.0 is successfully installed (without getting MemoryError)

There are some reports about the second problem in tensorflow's issue tracker: tensorflow/tensorflow#17373 (uses precompiled wheels like us), tensorflow/tensorflow#17411 (same issue) and tensorflow/tensorflow#17441. So I'm beginning to think that the cause of the problem is a buggy wheel distribution.
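If the SIGILL comes from a wheel compiled for instructions the build machine lacks, one quick check is whether the CPU advertises the relevant feature flag. The sketch below assumes a Linux x86 host that exposes /proc/cpuinfo, and it assumes AVX is the flag of interest (the TensorFlow 1.6 release notes mention that prebuilt binaries use AVX instructions, but this thread does not confirm that as the cause). The helper names are illustrative.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports(cpuinfo_text, flag):
    """Return True if the given feature flag appears in the cpuinfo text."""
    return flag in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("avx supported:", supports(f.read(), "avx"))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

Running this inside the build container rather than on a developer machine matters here, since the two may expose different CPU features.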

I don't know what to do with the first problem though. Perhaps we could just increase the memory limit (this needs to be discussed with the operations team) or implement a retry mechanism (but it's hard to tell whether a MemoryError was raised randomly).
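For illustration, a retry mechanism of the kind mentioned here could look roughly like the sketch below. It is a generic, hypothetical helper (retry_on_memory_error and flaky_install are made-up names), not Read the Docs code, and it retries every MemoryError without trying to tell random failures from genuine limits.

```python
import time


def retry_on_memory_error(func, attempts=3, delay=0.0):
    """Call func(), retrying if it raises MemoryError.

    The last MemoryError is re-raised once `attempts` calls have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except MemoryError:
            if attempt == attempts:
                raise
            time.sleep(delay)


# Simulated install step that fails twice before succeeding.
calls = []


def flaky_install():
    calls.append(1)
    if len(calls) < 3:
        raise MemoryError("simulated failure")
    return "installed"


print(retry_on_memory_error(flaky_install))
```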

@humitos
Member

humitos commented Mar 9, 2018

@thjashin simple question: do you really need TensorFlow to build your documentation? If it's not a strict requirement for building the docs, a better solution here is to not install it in the RTD environment. You can avoid installing it by using a docs/requirements.txt specific to RTD with only the packages needed to build your docs :)

@humitos humitos added the Support Support question label Mar 9, 2018
@thjashin
Author

thjashin commented Mar 9, 2018

@stsewd @berkerpeksag Thanks for the comments and pointers. I guess the SIGILL problem will be solved in future versions of TF. As for the MemoryError, how about the solution proposed by @stsewd?

@thjashin
Author

thjashin commented Mar 9, 2018

@humitos Maybe I'm doing it the wrong way, but since there are plenty of import tensorflow statements in my source code, is there any way to let sphinx.autodoc generate API docs from docstrings without installing TF?

@stsewd
Member

stsewd commented Mar 9, 2018

@thjashin
Author

thjashin commented Mar 9, 2018

@stsewd Cool, thanks. I shall try this.

@humitos
Member

humitos commented Mar 9, 2018

> is there any way to let sphinx.autodoc generate api docs from doc strings without installing TF?

If you are using autodoc, you need to install it.

Otherwise, supposing that you don't need autodoc, you can mock it as described in http://docs.readthedocs.io/en/latest/faq.html#i-get-import-errors-on-libraries-that-depend-on-c-modules
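The mocking approach can be sketched in a Sphinx conf.py roughly as follows. This is a minimal sketch, assuming Python 3 (on Python 2, MagicMock comes from the separately installed mock package) and assuming tensorflow is the only heavy import that needs stubbing; the MOCK_MODULES name is illustrative.

```python
import sys
from unittest.mock import MagicMock

# Assumption: the modules listed here are the ones the documented package
# imports but that should not be installed in the docs build environment.
MOCK_MODULES = ["tensorflow"]

# Register a stand-in object for each module so that "import tensorflow"
# succeeds without the real package being present.
for module_name in MOCK_MODULES:
    sys.modules[module_name] = MagicMock()
```

Any attribute looked up on the stand-in returns another mock, so module-level code such as default arguments built from the mocked library still imports. Sphinx's autodoc extension also provides an autodoc_mock_imports setting intended for the same purpose, which may be a simpler alternative.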

@stsewd
Member

stsewd commented Mar 16, 2018

@thjashin hey, I see your latest builds are passing. Were you able to solve the issue?

@thjashin
Author

@stsewd Thanks for asking. It passed because I'm using TF 1.4. I haven't had time to try the mock solution since I'm busy with other things these days. I will report back in this thread once I try it.

@dsblank

dsblank commented Apr 5, 2018

Downgrading worked for me too. Thanks for reporting and for sharing the workaround!

@agjohnson
Contributor

Thanks for the information everyone! I see the project is building now, so perhaps the immediate error was resolved with these solutions. I would echo using mocking to anyone hitting a similar issue. Closing this for now, but speak up if this error is still a problem for you.
