
Added 3 different celery queues for analyzers. #274

Merged: 1 commit merged into intelowlproject:develop from the celery_queue branch on Dec 3, 2020

Conversation

@0ssigeno (Contributor)

The queue can now be selected per analyzer inside analyzer_config.json.
Added eventlet for the I/O-bound queues.
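
For context, a minimal sketch of what a per-analyzer queue selection could look like in analyzer_config.json; the analyzer names and the exact key name here are assumptions for illustration, not taken from this PR:

  {
    "SomeApiAnalyzer": {
      "type": "observable",
      "queue": "long"
    },
    "SomeLocalAnalyzer": {
      "type": "file",
      "queue": "default"
    }
  }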

@eshaan7 (Member) commented Nov 30, 2020

Is the performance improvement worth the 2 extra containers/services and the added complexity? We should really think about this because we already have many services; this change may force existing users to upgrade the instances/machines where they run IntelOwl.

EDIT: Maybe we could just modify the default queue to use eventlet?

@0ssigeno (Contributor, Author) commented Dec 1, 2020

Eventlet was designed to work better with I/O-bound tasks. The vast majority of our analyzers are I/O bound, since in many cases we are calling external APIs. Unfortunately, that is not always the case: the docker-based analyzers and the static file analyzers, for example, are not.
Moreover, we have huge differences in timeouts between analyzers, which means we could (and actually did) block the entire IntelOwl when many long tasks are scheduled. This way we can at least provide the results of the fast analyzers, even if the long queue is full.

The main con of this implementation is that, yes, IntelOwl will require more resources to run, since two more containers will be created. A solution to this could be to provide another docker-compose production file, letting users choose the architecture that best fits their needs.

I thought about making eventlet the default queue. Again, the vast majority of our analyzers are I/O bound, so it is probably a good idea. The only problem is testing: the default queue has been running in a production environment to this day and does its job quite well. We have no data on whether eventlet causes side effects that short-term testing cannot detect (e.g. memory leaks).
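
To make the split concrete, a sketch of what the separate workers could look like on the command line; the queue names, concurrency values, and Celery app path are illustrative assumptions, not the exact commands from this PR:

  # I/O-bound queues: eventlet pool with high concurrency (hypothetical names/values)
  celery -A intel_owl worker -Q default -P eventlet -c 100 -n worker_default@%h
  celery -A intel_owl worker -Q long -P eventlet -c 100 -n worker_long@%h
  # CPU-bound analyzers (e.g. docker-based or static file analysis): regular prefork pool
  celery -A intel_owl worker -Q local -P prefork -c 4 -n worker_local@%h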

@mlodic marked this pull request as a draft on December 1, 2020 08:48
@0ssigeno (Contributor, Author) commented Dec 1, 2020

I will wait for your thoughts on the matter before completing the PR for Travis.

@mlodic (Member) commented Dec 1, 2020

I think eventlet is absolutely worth experimenting with. I also agree with @eshaan7's concerns about current IntelOwl users: a lot of them just use IntelOwl "the easy way" and do not need this change at all. On the other hand, groups that want IntelOwl to scale appropriately would probably benefit from a change like this. So ATM, IMHO, it is better to ship this new feature as an optional configuration. We could explain the benefits and the additional configuration required in the "Advanced Usage" section of the docs. Thoughts?

@0ssigeno (Contributor, Author) commented Dec 2, 2020

I'm fine with that. I will set the eventlet queue as the default everywhere and provide an optional docker-compose file for a multi-queue IntelOwl.

@0ssigeno force-pushed the celery_queue branch 2 times, most recently from 4dfc394 to d930412 on December 2, 2020 10:02
Review comment on intel_owl/settings.py (outdated, resolved)
@eshaan7 (Member) commented Dec 2, 2020

Another thought: with this implementation, the user has 2 options, docker-compose.yml and docker-compose-multi-queue.yml. Now what if they choose the traefik compose file? Then they can only use the single queue, not the multi queue. We could create another compose file for traefik + multi-queue, but that is not ideal because the compose files would keep multiplying in parallel directions.

Maybe we can remove the celery worker services entirely from all base docker-compose files (production, test, travis, traefik) and create 2 standalone files like celery.single-queue.yml and celery.multi-queue.yml, and then use the .env to join them like we do for the optional analyzer services? I think this would also be more maintainable in the long run. Thoughts, @mlodic, @0ssigeno?
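
As a rough sketch of that .env approach; the file names follow the naming proposed above and are assumptions, not final:

  # .env (hypothetical): docker-compose reads COMPOSE_FILE and merges the listed files in order
  COMPOSE_FILE=docker-compose.yml:celery.multi-queue.yml

Switching between the single-queue and multi-queue deployments would then only require changing this one variable.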

@mlodic (Member) commented Dec 2, 2020

Yes, at this point we should completely rework all the docker-compose files to leverage the override feature (https://docs.docker.com/compose/extends/).

This would also reduce all the problems related to code duplication in those files.

@eshaan7 (Member) commented Dec 2, 2020

The extends keyword was removed after compose file format v2.1, but we are using v3 in all the compose files (source).

What I am suggesting is something similar to this comment, which is also more or less similar to the current implementation.

In any case, you are right. We do need to rework all compose files and also the directory structure.

@eshaan7 (Member) commented Dec 2, 2020

I also found this comment which says we can use the extends keyword in v3 as well, but that is not reflected in Docker's docs. We will need to experiment with this and see for ourselves.

@mlodic (Member) commented Dec 2, 2020

Sorry, maybe that link was misleading. I did not mean to refer to extends, but just to the override feature between docker-compose files.

For example, we can avoid setting this in each compose file:

  rabbitmq:
    image: library/rabbitmq:3.8-alpine
    container_name: intel_owl_rabbitmq

We can create a "base" docker-compose file that contains the basic configuration.
What I meant is basically your idea but applied to all the services, not only the ones related to celery.
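
A minimal sketch of that idea, with a hypothetical file name:

  # docker-compose.base.yml (hypothetical): shared services defined once
  version: "3"
  services:
    rabbitmq:
      image: library/rabbitmq:3.8-alpine
      container_name: intel_owl_rabbitmq

Each variant file (production, traefik, multi-queue, ...) would then only declare the services it adds or overrides, and docker-compose merges them when several files are passed via -f or COMPOSE_FILE.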

@eshaan7 (Member) commented Dec 2, 2020

What I meant is basically your idea but applied to all the services, not only the ones related to celery.

Ah, I see. Yes, definitely what I meant as well.

Issue #277: So, let's create a compose.common.yml file with the services that are common to all the compose files, like rabbitmq and postgres, and do the same for celery.
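
For example, assuming the file names proposed above, a multi-queue deployment could then be started with something like:

  # hypothetical invocation: common services plus the chosen celery variant
  docker-compose -f compose.common.yml -f celery.multi-queue.yml up -d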

@lgtm-com (bot) commented Dec 2, 2020

This pull request introduces 1 alert when merging 87b73f7 into c339a63 - view on LGTM.com

new alerts:

  • 1 for Unused import

@eshaan7 (Member) commented Dec 2, 2020

we can remove the celery worker services entirely from all base docker-compose files (production, test, travis, traefik) and create 2 standalone files like celery.single-queue.yml and celery.multi-queue.yml, and then use the .env to join them like we do for the optional analyzer services. I think this would also be more maintainable in the long run

Should we do this change in this PR or in a separate one? @mlodic

@intelowlproject deleted a comment from the lgtm-com bot on Dec 2, 2020
@mlodic (Member) commented Dec 2, 2020

yep, I agree, one step at a time

@0ssigeno marked this pull request as ready for review on December 2, 2020 13:04
@0ssigeno (Contributor, Author) commented Dec 2, 2020

Do you prefer to split the configuration files before merging this PR?
If so, I can work on that in a separate PR.

@eshaan7 (Member) commented Dec 2, 2020

I think we can do it in a separate PR after merging this one. I say this because this commit/PR is already very big as it is. You can fix the LGTM alert and mark the PR ready for review.

Commit: "It is possible to select the queue inside analyzer_config.json."
@0ssigeno requested a review from eshaan7 on December 3, 2020 08:19
@eshaan7 merged commit 6ab6b1c into intelowlproject:develop on Dec 3, 2020
@eshaan7 mentioned this pull request on Dec 3, 2020
@0ssigeno deleted the celery_queue branch on January 7, 2021 08:02