Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible race condition with API log rotation and API calls #485

Open
davidjiglesias opened this issue May 8, 2020 · 0 comments
Open

Possible race condition with API log rotation and API calls #485

davidjiglesias opened this issue May 8, 2020 · 0 comments

Comments

@davidjiglesias
Copy link
Member

davidjiglesias commented May 8, 2020

Hello team,

I have been studying a possible race condition occurring in the API when some API calls are made at the exact same time the API is rotating its log at midnight. When that happens, the API receives a SIGTERM and terminates.

Based on these logs provided by one of our users:

root@wazuh-manager:/var/ossec/logs# cat api.log
WazuhAPI 2020-05-06 00:00:00 foo: Rotated: /var/ossec/logs/api/2020/May/api-06-1.gz
WazuhAPI 2020-05-06 00:00:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 00:00:00 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 00:00:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 00:00:01 foo: Exiting... (SIGTERM)
WazuhAPI 2020-05-06 06:08:27 : Listening on: https://:::55000
WazuhAPI 2020-05-06 06:08:35 foo: [::ffff:192.168.0.4] GET /manager/info? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:36 foo: [::ffff:192.168.0.4] GET /version? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:45 foo: [::ffff:192.168.0.4] GET /manager/info? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/?limit=1&sort=-dateAdd&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/summary? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/groups? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/stats/distinct?fields=version&select=version - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/stats/distinct?fields=node_name&select=node_name - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:47 foo: [::ffff:192.168.0.4] GET /agents/stats/distinct?fields=os.name%2Cos.platform%2Cos.version&select=os.name%2Cos.platform%2Cos.version - 200 - error: '0'.
WazuhAPI 2020-05-06 06:08:48 foo: [::ffff:192.168.0.4] GET /agents/?limit=500&offset=0 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:15:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:15:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:15:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:30:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:30:00 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:30:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:45:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 06:45:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 06:45:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:00:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:00:00 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 07:00:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:15:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:15:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 07:15:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:30:00 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=1&q=id!%3D000 - 200 - error: '0'.
WazuhAPI 2020-05-06 07:30:01 foo: [::ffff:192.168.0.4] GET /cluster/status? - 200 - error: '0'.
WazuhAPI 2020-05-06 07:30:01 foo: [::ffff:192.168.0.4] GET /agents/?offset=0&limit=500&q=id!%3D000 - 200 - error: '0'.

We can see there are three different calls (GET /agents?limit=1, GET /cluster/status and GET /agents) scheduled by the Kibana cronjob every 15 minutes, and they are occurring at exact hours in this case (07:00, 07:15, 07:30, etc). They do not always occur at exact hours, it depends on first execution.

The API log rotation occurs exactly at 00:00 and will concur in this case with some of those calls.

We should further investigate the issue based on these premises.

Best regards,

David J. Iglesias

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant