bug: Scans pending caused by broken celery #1241

Closed
1 task done
metehan-arslan opened this issue May 5, 2024 · 21 comments · Fixed by #1251
Labels
bug Something isn't working

Comments

@metehan-arslan

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

reNgine scans are stuck at pending, and whois doesn't work. make logs shows celery stuck in a loop.

Thanks to Talanor from Discord, we were able to identify the issue: running pip as root breaks existing system dependencies.

Removing the following line fixed the loop:
https://github.com/yogeshojha/rengine/blob/master/web/celery-entrypoint.sh#L81

Expected Behavior

Scans should work, and celery shouldn't loop.

celery-1       | Error: Invalid value for '-A' / '--app':
celery-1       | Unable to load celery application.
celery-1       | Module 'select' has no attribute 'epoll'
celery-1       | Usage: celery [OPTIONS] COMMAND [ARGS]...
celery-1       | Try 'celery --help' for help.
celery-1       | 
celery-1       | Error: Invalid value for '-A' / '--app':
celery-1       | Unable to load celery application.
celery-1       | Module 'select' has no attribute 'epoll'
celery-1       | Usage: celery [OPTIONS] COMMAND [ARGS]...
celery-1       | Try 'celery --help' for help.
celery-1       | 
celery-1       | Error: Invalid value for '-A' / '--app':
celery-1       | Unable to load celery application.
celery-1       | Module 'select' has no attribute 'epoll'
celery-1       | Usage: celery [OPTIONS] COMMAND [ARGS]...
celery-1       | Try 'celery --help' for help.
celery-1       | 
celery-1       | Error: Invalid value for '-A' / '--app':
celery-1       | Unable to load celery application.
celery-1       | Module 'select' has no attribute 'epoll'
celery-1       | Usage: celery [OPTIONS] COMMAND [ARGS]...
celery-1       | Try 'celery --help' for help.

Steps To Reproduce

git clone https://github.com/yogeshojha/rengine.git
sudo ./install

Environment

- reNgine: 2.0.5
- OS: Raspberry Pi OS (bookworm), Fedora 40
- Python: 3.11.2
- Docker Engine: 26.1.1
- Docker Compose: 2.27.0
- Browser: Firefox, Chrome, Ungoogled Chromium

Anything else?

see also: #1234

@metehan-arslan metehan-arslan added the bug Something isn't working label May 5, 2024

github-actions bot commented May 5, 2024

👋 Hi @metehan-arslan,
Issues are only for reporting bugs and feature requests. Please read the documentation before raising an issue: https://rengine.wiki
For very limited support, questions, and discussions, please join reNgine Discord channel: https://discord.gg/azv6fzhNCE
Please include all the requested and relevant information when opening a bug report. Improper reports will be closed without any response.

@Talanor
Contributor

Talanor commented May 5, 2024

To add a bit more context here:
One modification was made to the default Dockerfile: the architecture is arm64 (for both the OS and Go).

Probably one of the libraries installed from this file breaks celery: https://github.com/laramies/theHarvester/blob/master/requirements/base.txt

Generally:

  • Running pip (or any package manager not supported by the OS maintainers) as root should be avoided.
  • Virtual environments (venv, poetry, pipx, pick your poison) should be used for every Python tool, including celery.
  • Pulling/cloning every tool on each entrypoint call gives us bleeding-edge versions that will inevitably break every so often. A prebuilt Docker image with known-to-work tools, with versions bumped periodically, would help on that front, and/or letting the user perform tool updates (via the UI?).
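A minimal sketch of the per-tool venv idea from the bullets above, written as a POSIX shell function. It assumes the tool is already cloned and ships a requirements file; the `.venv` directory inside the tool's checkout is an arbitrary, hypothetical location, not necessarily the layout reNgine would use.

```shell
#!/bin/sh
# Sketch: give one Python tool its own virtual environment instead of running
# `pip install` as root into the interpreter that celery also imports from.
set -eu

install_tool_in_venv() {
  tool_dir="$1"                # e.g. /usr/src/github/theHarvester
  requirements="$2"            # requirements file, relative to tool_dir
  venv_dir="$tool_dir/.venv"   # hypothetical location; any path works

  python3 -m venv "$venv_dir"
  # This pip belongs to the venv, so packages land in $venv_dir only and the
  # system site-packages used by celery are left untouched.
  # --disable-pip-version-check just avoids a network round-trip and noise.
  "$venv_dir/bin/python" -m pip install --quiet --disable-pip-version-check \
    -r "$tool_dir/$requirements"
}
```

For example, `install_tool_in_venv /usr/src/github/theHarvester requirements/base.txt`, after which the tool would be launched with `/usr/src/github/theHarvester/.venv/bin/python` rather than the system `python3`, so its dependencies never reach celery's site-packages.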

@shubhamvashist11

> _Removing the following line fixed the loop:
> https://github.com/yogeshojha/rengine/blob/master/web/celery-entrypoint.sh#L81_

It didn't fix this for me. Any other workaround? @Talanor @metehan-arslan

@Talanor
Contributor

Talanor commented May 7, 2024

> > _Removing the following line fixed the loop:
> > https://github.com/yogeshojha/rengine/blob/master/web/celery-entrypoint.sh#L81_
>
> It didn't fix this for me. Any other workaround? @Talanor @metehan-arslan

If you have already initialized your container, edit ./web/celery-entrypoint.sh to keep only the celery worker launch lines at the end, which look something like:

echo "Starting Workers..."
echo "Starting Main Scan Worker with Concurrency: $MAX_CONCURRENCY,$MIN_CONCURRENCY"
watchmedo auto-restart --recursive --pattern="*.py" --directory="/usr/src/app/reNgine/" -- celery -A reNgine.tasks worker --loglevel=info --autoscale=$MAX_CONCURRENCY,$MIN_CONCURRENCY -Q main_scan_queue &
[...]
watchmedo auto-restart --recursive --pattern="*.py" --directory="/usr/src/app/reNgine/" -- celery -A reNgine.tasks worker --pool=gevent --concurrency=10 --loglevel=info -Q theHarvester_queue -n theHarvester_worker
exec "$@"

Then run docker compose down and docker compose up for the celery container.
If the issue persists, it is due to something else.

If it works, one of the pip installs is breaking celery.
Add the lines you deleted back one at a time, bringing the celery container down and up after each, until it breaks; the last line you added is the culprit.

I'm working on a container with venvs & pipx, but in the meantime that'll get you running.
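The add-back-one-line-at-a-time bisection above can be made less error-prone with a couple of helpers. A sketch, assuming GNU sed (for `sed -i`), the default entrypoint path, and that each install is a single line; the function names are made up for this example:

```shell
#!/bin/sh
# Hypothetical helpers for bisecting celery-entrypoint.sh one pip install at a time.

# List the active (not commented-out) pip install lines with their line numbers.
list_pip_installs() {
  entrypoint="${1:-./web/celery-entrypoint.sh}"
  grep -nE 'pip3? install' "$entrypoint" | grep -vE '^[0-9]+:[[:space:]]*#'
}

# Disable / re-enable a single line by number (GNU sed in-place edit).
comment_line()   { sed -i "${2}s/^/# /" "$1"; }
uncomment_line() { sed -i "${2}s/^# //" "$1"; }
```

Comment out every line reported by `list_pip_installs`, then re-enable them one at a time with `uncomment_line ./web/celery-entrypoint.sh <line>`, cycling the celery container with `docker compose down` / `docker compose up` after each, until the loop returns.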

@OffS3c

OffS3c commented May 8, 2024

#1248 seems to be related; I was getting the same output.

@Talanor
Contributor

Talanor commented May 8, 2024

> #1248 seems to be related; I was getting the same output.

Unlikely: Infoga isn't cloned since it doesn't exist, so it can't make celery fail.
More likely, Infoga wasn't cloned, hence that error, AND you also had a broken install due to this issue.

@yogeshojha
Owner

Hi, this looks very familiar to me.
Which branch did you clone: master or release/2.1.0?

I had this exact issue when I tried to use ollama with celery; that combination has known issues.

But on master this is very strange.

@Talanor
Contributor

Talanor commented May 9, 2024

This is on master; the Discord is full of people with clean installs hitting this bug.

@Nandolorian

> If you have already initialized your container, edit ./web/celery-entrypoint.sh to keep only the celery worker launch lines at the end, which look something like:

I have the same behaviour on a fresh install of Ubuntu Server 24.04. I followed your advice and commented out the lines in ./web/celery-entrypoint.sh, and found that the error was triggered by theHarvester at this line:

python3 -m pip install -r /usr/src/github/theHarvester/requirements/base.txt

In base.txt, the fastapi==0.111.0 pin is the culprit.
I hope this helps.

@yogeshojha
Copy link
Owner

@Nandolorian did you downgrade or upgrade the fastapi version?
When I was testing 2.1.0, I found out that asyncio was the culprit. I'm not sure why fastapi has issues with celery.

@Nandolorian

@yogeshojha I downgraded to 0.110.3 and the error didn't happen.
I ran some scans using the OSINT scan engine, and theHarvester ran without any trouble.
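Nandolorian's workaround amounts to rewriting one pin in theHarvester's requirements before they are installed. A sketch, assuming GNU sed and a plain `name==version` line in base.txt; yogeshojha reports below that the downgrade alone did not work for him, so this is something to try, not a confirmed fix. `pin_requirement` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: rewrite an exact pin such as "fastapi==0.111.0" in a
# requirements file before `pip install -r` runs. Other lines are untouched.
pin_requirement() {
  req_file="$1"; pkg="$2"; version="$3"
  # Anchored at line start and followed by "==", so e.g. "fastapi-cli==..."
  # is not rewritten. GNU sed's -i edits the file in place.
  sed -i -E "s/^${pkg}==.*/${pkg}==${version}/" "$req_file"
}
```

In celery-entrypoint.sh this would run just before the theHarvester install, e.g. `pin_requirement /usr/src/github/theHarvester/requirements/base.txt fastapi 0.110.3`.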

@yogeshojha
Owner

@Nandolorian Thank you! I'm downgrading fastapi; let me try the installation again!

@yogeshojha
Owner

@Nandolorian I tried with the downgraded fastapi; sadly it doesn't work. Do you mind sharing all your requirement versions with me?

You can do this by

docker exec -it rengine-celery-1 bash

and then

pip freeze
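The same information can be collected without an interactive shell, which makes it easy to diff a working install against a broken one. A sketch; `rengine-celery-1` is the container name from the commands above, and both helper names are hypothetical:

```shell
#!/bin/sh
# Dump the celery container's installed packages, non-interactively.
freeze_container() {
  container="${1:-rengine-celery-1}"
  docker exec "$container" pip freeze
}

# Print the "name==version" lines present in the first freeze file but not in
# the second (comm needs both inputs sorted with the same collation).
only_in_first() {
  sorted_a=$(mktemp); sorted_b=$(mktemp)
  LC_ALL=C sort "$1" > "$sorted_a"
  LC_ALL=C sort "$2" > "$sorted_b"
  LC_ALL=C comm -23 "$sorted_a" "$sorted_b"
  rm -f "$sorted_a" "$sorted_b"
}
```

For example, `freeze_container > broken.txt` on the failing machine and `freeze_container > working.txt` on a working one, then `only_in_first broken.txt working.txt` lists what only the broken install has.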

@yogeshojha
Owner

yogeshojha commented May 10, 2024

Okay, I think httpcore is the culprit here.

python-trio/trio#2848

I had this exact same issue when using ollama-python, because it uses the httpcore library and our celery workers are gevent-based. As per my understanding, httpcore is a coroutine-based networking library that uses blocking I/O, which conflicts with gevent's cooperative multitasking model.
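The `Module 'select' has no attribute 'epoll'` message in the logs fits one known mechanism: gevent's `monkey.patch_all()`, which celery applies early when a worker is started with `--pool=gevent` (as theHarvester's worker is), deletes `select.epoll` by default, so a library that expects epoll on Linux fails at import time. This is an interpretation based on python-trio/trio#2848, not something the thread confirms. A small check to run with the container's `python3` (the function name is hypothetical):

```shell
#!/bin/sh
# Report whether select.epoll survives gevent's monkey-patching.
# Assumption: run on Linux (e.g. inside the celery container), where epoll exists.
check_epoll_after_gevent() {
  python3 - <<'PY'
import select

before = hasattr(select, "epoll")
try:
    from gevent import monkey
except ImportError:
    print("gevent not installed")
    raise SystemExit(0)

# patch_all() patches select with aggressive=True, which removes epoll & co.
monkey.patch_all()
print(f"epoll before patch: {before}, after patch: {hasattr(select, 'epoll')}")
PY
}
```

With gevent installed on Linux, this should print `epoll before patch: True, after patch: False`.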

I guess finding which tool uses httpcore and removing it would solve this.
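To find which installed package pulls in httpcore, pip's own metadata is enough: `pip show` prints a `Required-by:` field. A sketch that reads that field one hop at a time (run inside the celery container; the helper names are hypothetical):

```shell
#!/bin/sh
# Extract the comma-separated "Required-by:" list from `pip show` output on stdin.
parse_required_by() {
  sed -n 's/^Required-by: *//p'
}

# Which installed distributions depend directly on package $1?
# Prints nothing if the package is not installed.
required_by() {
  python3 -m pip show "$1" 2>/dev/null | parse_required_by
}
```

Running `required_by httpcore` and repeating on each name it prints leads back to a top-level requirement; according to Talanor further down, that is fastapi from theHarvester's base.txt.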

@yogeshojha yogeshojha mentioned this issue May 10, 2024
@Talanor
Contributor

Talanor commented May 10, 2024

Or, installing tools in venvs ;)

@yogeshojha
Owner

@Talanor yeah, venv would be better, but either way, if any of the tools reNgine uses depends on httpcore, it won't be able to work with gevent and celery. We might have to change these tools to run outside celery, or use another event pool.
But I am open to hearing: how do you think venv will help us solve this?

@Nandolorian

> @Nandolorian I tried with the downgraded fastapi; sadly it doesn't work. Do you mind sharing all your requirement versions with me?
>
> You can do this by
>
> docker exec -it rengine-celery-1 bash
>
> and then
>
> pip freeze

I see you have discovered that httpcore is the problem. Anyway, I am posting the list for your reference in case it is still useful.

Requirements

aiofiles==23.2.1
aiodns==3.2.0
aiohttp==3.9.5
aiomultiprocess==0.9.1
aiosignal==1.3.1
aiosqlite==0.20.0
amqp==5.2.0
annotated-types==0.6.0
anyio==4.3.0
appdirs==1.4.4
argcomplete==3.3.0
argh==0.26.2
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
backoff==2.2.1
beautifulsoup4==4.12.3
billiard==4.2.0
blinker==1.4
Brotli==1.1.0
bs4==0.0.1
celery==5.4.0
censys==2.2.12
certifi==2024.2.2
cffi==1.16.0
chardet==5.0.0
charset-normalizer==2.1.1
click==8.1.7
click-didyoumean==0.3.1
click-plugins==1.1.1
click-repl==0.3.0
colorama==0.4.4
coreapi==2.3.3
coreschema==0.0.4
cron-descriptor==1.4.3
cryptography==3.4.8
cssselect2==0.7.0
dbus-python==1.2.18
decorator==5.1.1
Deprecated==1.2.14
discord-webhook==1.3.0
distro==1.7.0
Django==3.2.4
django-ace==1.0.11
django-celery-beat==2.6.0
django-login-required-middleware==0.6.1
django-mathfilters==1.0.0
django-role-permissions==3.2.0
django-timezone-field==6.1.0
djangorestframework==3.12.4
djangorestframework-datatables==0.6.0
dnspython==2.6.1
dotted-dict==1.1.3
drf-yasg==1.21.3
et-xmlfile==1.1.0
exceptiongroup==1.2.1
exrex==0.10.5
fastapi==0.110.3
filelock==3.14.0
fire==0.4.0
fonttools==4.51.0
frozenlist==1.4.1
future==0.18.2
fuzzywuzzy==0.18.0
gevent==24.2.1
greenlet==3.0.3
gunicorn==22.0.0
h11==0.14.0
h8mail==2.5.6
html5lib==1.1
httplib2==0.20.2
humanize==4.3.0
idna==3.3
importlib-metadata==4.6.4
importlib_resources==6.4.0
inflection==0.5.1
itypes==1.2.0
jeepney==0.7.1
Jinja2==3.1.4
keyring==23.5.0
kombu==5.3.7
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
Levenshtein==0.25.1
limits==3.11.0
loguru==0.6.0
lxml==5.2.1
Markdown==3.3.4
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
metafinder==1.2
more-itertools==8.10.0
multidict==6.0.5
netaddr==1.2.1
netlas==0.4.1
oauthlib==3.2.0
openai==0.28.0
openpyxl==3.1.2
orjson==3.9.0
outcome==1.3.0.post0
packaging==24.0
pikepdf==8.15.1
pillow==10.3.0
playwright==1.43.0
pluginbase==1.0.1
prettytable==3.10.0
prompt-toolkit==3.0.43
psycopg2==2.9.7
pycares==4.4.0
pycparser==2.22
pycvesearch==1.0
pydantic==2.7.1
pydantic_core==2.18.2
pydyf==0.10.0
pyee==11.1.0
Pygments==2.18.0
PyGObject==3.42.1
PyJWT==2.3.0
pyparsing==2.4.7
pyphen==0.15.0
PySocks==1.7.1
python-apt==2.4.0+ubuntu3
python-crontab==3.0.0
python-dateutil==2.9.0.post0
python-docx==1.1.2
python-Levenshtein==0.25.1
python-pptx==0.6.23
pytz==2024.1
PyVirtualDisplay==3.0
PyYAML==6.0.1
rapidfuzz==3.9.0
redis==5.0.3
requests==2.31.0
requests-file==2.0.0
retrying==1.3.4
rich==13.7.1
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
scapy==2.4.3
SecretStorage==3.3.1
selenium==4.9.1
shodan==1.31.0
simplejson==3.19.2
six==1.16.0
slowapi==0.1.9
sniffio==1.3.1
sortedcontainers==2.4.0
soupsieve==2.3.2
SQLAlchemy==1.3.22
sqlparse==0.5.0
starlette==0.37.2
tenacity==8.0.1
termcolor==1.1.0
tinycss2==1.3.0
tinydb==4.8.0
tldextract==3.5.0
tqdm==4.64.0
treelib==1.6.1
trio==0.25.0
trio-websocket==0.11.1
typing_extensions==4.11.0
tzdata==2024.1
ujson==5.9.0
uritemplate==4.1.1
urllib3==1.26.9
uro==1.0.0
uvicorn==0.29.0
uvloop==0.19.0
validators==0.18.2
vine==5.1.0
wadllib==1.3.6
wafw00f==2.2.0
watchdog==4.0.0
wcwidth==0.2.13
weasyprint==53.3
webencodings==0.5.1
whatportis==0.8
win32-setctime==1.1.0
wrapt==1.16.0
wsproto==1.2.0
XlsxWriter==3.2.0
xmltodict==0.13.0
yarl==1.9.4
zipp==1.0.0
zope.event==5.0
zope.interface==6.3
zopfli==0.2.3

@Talanor
Contributor

Talanor commented May 10, 2024

> @Talanor yeah, venv would be better, but either way, if any of the tools reNgine uses depends on httpcore, it won't be able to work with gevent and celery. We might have to change these tools to run outside celery, or use another event pool. But I am open to hearing: how do you think venv will help us solve this?

Please see my PR #1250, which addresses the issue while staying on the current versions.
The concept is that each tool lives in its own virtual environment, so you can have multiple httpcore (or any other package) versions installed without conflicts.

@yogeshojha
Owner

@Talanor your PR looks great, I liked the usage of poetry.

The problem is not conflicting versions of httpcore or having multiple versions in the same environment:

> ...our celery workers are gevent-based. As per my understanding, httpcore is a coroutine-based networking library that uses blocking I/O, which conflicts with gevent's cooperative multitasking model.

@Talanor
Contributor

Talanor commented May 11, 2024

> @Talanor your PR looks great, I liked the usage of poetry.
>
> The problem is not conflicting versions of httpcore or having multiple versions in the same environment:
>
> > ...our celery workers are gevent-based. As per my understanding, httpcore is a coroutine-based networking library that uses blocking I/O, which conflicts with gevent's cooperative multitasking model.

I must be missing something. I don't see httpcore in their pip freeze list?

@Nandolorian can you confirm newest master works on a fresh install for you?

@Talanor
Contributor

Talanor commented May 11, 2024

Upon further investigation:
The celery workers from my PR do not install httpcore (as seen in my venv):

talanor@pentest:~/containers/reNgine-CaRE$ docker run --entrypoint /bin/bash -it talanor/rengine-celery:v0.3 
rengine@909fc90322cc:~/rengine$ ls
rengine@909fc90322cc:~/rengine$ cd
rengine@909fc90322cc:~$ ls
nuclei-templates  poetry.lock  pyproject.toml  rengine  results  scan_results  tools  wordlists
rengine@909fc90322cc:~$ poetry -C . shell
Spawning shell within /home/rengine/.cache/pypoetry/virtualenvs/celery-rengine-HmEJnPQT-py3.10
rengine@909fc90322cc:~$ . /home/rengine/.cache/pypoetry/virtualenvs/celery-rengine-HmEJnPQT-py3.10/bin/activate
(celery-rengine-py3.10) rengine@909fc90322cc:~$ pip list
Package                          Version
-------------------------------- -----------
aiodns                           3.0.0
aiohttp                          3.9.5
aiosignal                        1.3.1
amqp                             5.2.0
appdirs                          1.4.4
argh                             0.26.2
asgiref                          3.8.1
async-timeout                    4.0.3
attrs                            23.2.0
beautifulsoup4                   4.9.3
billiard                         4.2.0
Brotli                           1.1.0
celery                           5.4.0
certifi                          2024.2.2
cffi                             1.16.0
charset-normalizer               3.3.2
click                            8.1.7
click-didyoumean                 0.3.1
click-plugins                    1.1.1
click-repl                       0.3.0
coreapi                          2.3.3
coreschema                       0.0.4
cron-descriptor                  1.4.3
cssselect2                       0.7.0
decorator                        5.1.1
Deprecated                       1.2.14
discord-webhook                  1.3.0
Django                           3.2.4
django-ace                       1.0.11
django-celery-beat               2.6.0
django-login-required-middleware 0.6.1
django-mathfilters               1.0.0
django-role-permissions          3.2.0
django-timezone-field            6.1.0
djangorestframework              3.12.4
djangorestframework-datatables   0.6.0
dotted-dict                      1.1.3
drf-yasg                         1.21.3
et-xmlfile                       1.1.0
filelock                         3.14.0
fonttools                        4.51.0
frozenlist                       1.4.1
gevent                           24.2.1
greenlet                         3.0.3
gunicorn                         22.0.0
html5lib                         1.1
humanize                         4.3.0
idna                             3.7
inflection                       0.5.1
itypes                           1.2.0
Jinja2                           3.1.4
kombu                            5.3.7
lxml                             5.2.1
Markdown                         3.3.4
MarkupSafe                       2.1.5
metafinder                       1.2
multidict                        6.0.5
netaddr                          0.8.0
netlas                           0.4.1
openai                           0.28.0
openpyxl                         3.1.2
orjson                           3.9.0
packaging                        24.0
pikepdf                          8.15.1
pillow                           10.3.0
pip                              24.0
prettytable                      2.1.0
prompt-toolkit                   3.0.43
psycopg2                         2.9.7
pycares                          4.4.0
pycparser                        2.22
pycvesearch                      1.0
pydyf                            0.10.0
Pygments                         2.18.0
pyphen                           0.15.0
PySocks                          1.7.1
python-crontab                   3.0.0
python-dateutil                  2.9.0.post0
python-docx                      1.1.2
python-pptx                      0.6.23
pytz                             2024.1
PyYAML                           6.0.1
redis                            5.0.3
requests                         2.31.0
requests-file                    2.0.0
ruamel.yaml                      0.18.6
ruamel.yaml.clib                 0.2.8
scapy                            2.4.3
setuptools                       69.5.1
simplejson                       3.17.2
six                              1.16.0
soupsieve                        2.5
sqlparse                         0.5.0
tinycss2                         1.3.0
tinydb                           4.4.0
tldextract                       3.5.0
tqdm                             4.66.4
typing_extensions                4.11.0
tzdata                           2024.1
uritemplate                      4.1.1
urllib3                          2.2.1
uro                              1.0.0
validators                       0.18.2
vine                             5.1.0
watchdog                         4.0.0
wcwidth                          0.2.13
weasyprint                       53.3
webencodings                     0.5.1
whatportis                       0.8.2
wrapt                            1.16.0
XlsxWriter                       3.2.0
xmltodict                        0.13.0
yarl                             1.9.4
zope.event                       5.0
zope.interface                   6.3
zopfli                           0.2.3

However, httpcore is installed as a fastapi dependency coming from theHarvester.
Installing theHarvester in a venv does not install httpcore into the celery environment, and does not introduce the conflict.

Basically, if you can fix it by hot-removing a package via pip, it's something that can (and should) be solved with venvs.
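Talanor's rule of thumb can be checked directly: after moving a tool into its own venv, the offending module should no longer be importable from celery's interpreter, while the tool's venv may still have it. A sketch; the helper name is hypothetical, and the interpreter paths depend on your install:

```shell
#!/bin/sh
# Exit 0 if module $2 is importable by interpreter $1, non-zero otherwise.
# For a top-level name, find_spec only locates the module; it does not run it.
has_module() {
  "$1" -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$2"
}
```

Inside the celery container, `has_module python3 httpcore` should fail once theHarvester is isolated, while `has_module /path/to/theHarvester/venv/bin/python httpcore` (placeholder path) may still succeed.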

6 participants