
Added 3 different celery queues for analyzers. #274

Merged: 1 commit merged into intelowlproject:develop from the celery_queue branch on Dec 3, 2020

Conversation

@0ssigeno (Contributor)

The queue can now be selected per analyzer inside analyzer_config.json.
Added eventlet for the I/O-bound queues.
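
For context, a minimal sketch of what a per-analyzer queue selection could look like in analyzer_config.json; the analyzer names and the exact key name here are assumptions for illustration, not taken from this PR:

  {
    "SomeApiAnalyzer": {
      "type": "observable",
      "queue": "long"
    },
    "SomeLocalAnalyzer": {
      "type": "file",
      "queue": "default"
    }
  }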

@eshaan7 (Member) commented Nov 30, 2020

Is the performance improvement worth the 2 extra containers/services and the added complexity? We should really think about this because we already have many services; this change may force existing users to upgrade the instances/machines where they run IntelOwl.

EDIT: Maybe we could just modify the default queue to use eventlet?

@0ssigeno (Contributor, Author) commented Dec 1, 2020

Eventlet was designed to work better with I/O-bound tasks. The vast majority of our analyzers are I/O bound, since in many cases we are calling external APIs. Unfortunately, that is not always the case: the docker-based analyzers and the static file analyzers, for example, are not.
Moreover, we have huge differences in timeouts between analyzers, which means we could (and actually did) block the entire IntelOwl when many long tasks are scheduled. This way we can at least provide the results of the fast analyzers, even if the long queue is full.

The main con of this implementation is that, yes, IntelOwl will require more resources to run, since two more containers will be created. A solution to this could be to provide another docker-compose production file, letting users choose the architecture that best fits their needs.

I thought about making eventlet the default queue. Again, the vast majority of our analyzers are I/O bound, so it is probably a good idea. The only problem is testing: the default queue has been running in a production environment to this day and does its job quite well. We have no data on whether eventlet causes side effects that short-term testing cannot detect (e.g. memory leaks).
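
To make the split concrete, a sketch of what the separate workers could look like on the command line; the queue names, concurrency values, and Celery app path are illustrative assumptions, not the exact commands from this PR:

  # I/O-bound queues: eventlet pool with high concurrency (hypothetical names/values)
  celery -A intel_owl worker -Q default -P eventlet -c 100 -n worker_default@%h
  celery -A intel_owl worker -Q long -P eventlet -c 100 -n worker_long@%h
  # CPU-bound analyzers (e.g. docker-based or static file analysis): regular prefork pool
  celery -A intel_owl worker -Q local -P prefork -c 4 -n worker_local@%h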

@mlodic marked this pull request as a draft on December 1, 2020 08:48
@0ssigeno (Contributor, Author) commented Dec 1, 2020

I will wait for your thoughts on the matter before completing the PR for Travis.

@mlodic (Member) commented Dec 1, 2020

I think eventlet is absolutely worth experimenting with. I also agree with @eshaan7's concerns about current IntelOwl users: a lot of them just use IntelOwl "the easy way" and do not need this change at all. On the other hand, groups that want IntelOwl to scale appropriately would probably benefit from a change like this. So ATM, IMHO, it is better to ship this new feature as an optional configuration. We could explain the benefits and the additional configuration required in the "Advanced Usage" section of the docs. Thoughts?

@0ssigeno (Contributor, Author) commented Dec 2, 2020

I'm fine with that. I will set the eventlet queue as the default everywhere and provide an optional docker-compose file for a multi-queue IntelOwl.

@0ssigeno force-pushed the celery_queue branch 2 times, most recently from 4dfc394 to d930412 on December 2, 2020 10:02
Review comment on intel_owl/settings.py (outdated, resolved)
@eshaan7 (Member) commented Dec 2, 2020

Another thought: with this implementation, the user has 2 options, docker-compose.yml and docker-compose-multi-queue.yml. Now what if they choose the traefik compose file? Then they can only use the single queue, not the multi queue. We could create another compose file for traefik + multi-queue, but that is not ideal because the compose files would keep multiplying in parallel directions.

Maybe we can remove the celery worker services entirely from all base docker-compose files (production, test, travis, traefik) and create 2 standalone files like celery.single-queue.yml and celery.multi-queue.yml, and then use the .env to join them like we do for the optional analyzer services? I think this would also be more maintainable in the long run. Thoughts, @mlodic, @0ssigeno?
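
As a rough sketch of that .env approach; the file names follow the naming proposed above and are assumptions, not final:

  # .env (hypothetical): docker-compose reads COMPOSE_FILE and merges the listed files in order
  COMPOSE_FILE=docker-compose.yml:celery.multi-queue.yml

Switching between the single-queue and multi-queue deployments would then only require changing this one variable.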

@mlodic (Member) commented Dec 2, 2020

Yes, at this point we should completely rework all the docker-compose files to leverage the override feature (https://docs.docker.com/compose/extends/).

This would also reduce all the problems related to code duplication in those files.

@eshaan7 (Member) commented Dec 2, 2020

The extends keyword was removed after compose file format v2.1, but we are using v3 in all the compose files (source).

What I am suggesting is something similar to this comment, which is also more or less similar to the current implementation.

In any case, you are right. We do need to rework all compose files and also the directory structure.

@eshaan7 (Member) commented Dec 2, 2020

I also found this comment which says we can use the extends keyword in v3 as well, but that is not reflected in Docker's docs. We will need to experiment with this and see for ourselves.

@mlodic (Member) commented Dec 2, 2020

Sorry, maybe that link was misleading. I did not mean to refer to extends, but just to the override feature between docker-compose files.

For example, we can avoid setting this in each compose file:

  rabbitmq:
    image: library/rabbitmq:3.8-alpine
    container_name: intel_owl_rabbitmq

We can create a "base" docker-compose file that contains the basic configuration.
What I meant is basically your idea but applied to all the services, not only the ones related to celery.
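
A minimal sketch of that idea, with a hypothetical file name:

  # docker-compose.base.yml (hypothetical): shared services defined once
  version: "3"
  services:
    rabbitmq:
      image: library/rabbitmq:3.8-alpine
      container_name: intel_owl_rabbitmq

Each variant file (production, traefik, multi-queue, ...) would then only declare the services it adds or overrides, and docker-compose merges them when several files are passed via -f or COMPOSE_FILE.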

@eshaan7 (Member) commented Dec 2, 2020

What I meant is basically your idea but applied to all the services, not only the ones related to celery.

Ah, I see. Yes, definitely what I meant as well.

Issue #277: So, let's create a compose.common.yml file with the services that are common to all the compose files, like rabbitmq and postgres, and do the same for celery.
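
For example, assuming the file names proposed above, a multi-queue deployment could then be started with something like:

  # hypothetical invocation: common services plus the chosen celery variant
  docker-compose -f compose.common.yml -f celery.multi-queue.yml up -d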

@lgtm-com (bot) commented Dec 2, 2020

This pull request introduces 1 alert when merging 87b73f7 into c339a63 - view on LGTM.com

new alerts:

  • 1 for Unused import

@eshaan7 (Member) commented Dec 2, 2020

we can remove the celery worker services entirely from all base docker-compose files (production, test, travis, traefik) and create 2 standalone files like celery.single-queue.yml and celery.multi-queue.yml, and then use the .env to join them like we do for the optional analyzer services. I think this would also be more maintainable in the long run

Should we do this change in this PR or in a separate one? @mlodic

@intelowlproject deleted a comment from the lgtm-com bot on Dec 2, 2020
@mlodic (Member) commented Dec 2, 2020

yep, I agree, one step at a time

@0ssigeno marked this pull request as ready for review on December 2, 2020 13:04
@0ssigeno (Contributor, Author) commented Dec 2, 2020

Do you prefer to split the configuration files before merging this PR?
If so, I can work on that in a separate PR.

@eshaan7 (Member) commented Dec 2, 2020

I think we can do it in a separate PR after merging this one. I say this because this commit/PR is already very big as it is. You can fix the LGTM alert and mark the PR ready for review.

Commit: "It is possible to select the queue inside analyzer_config.json."
@0ssigeno requested a review from eshaan7 on December 3, 2020 08:19
@eshaan7 merged commit 6ab6b1c into intelowlproject:develop on Dec 3, 2020
@eshaan7 mentioned this pull request on Dec 3, 2020
@0ssigeno deleted the celery_queue branch on January 7, 2021 08:02