Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Harvester Limit to the Apache Tomcat Logs #9558

Merged
merged 29 commits into from May 17, 2024

Conversation

ishleenk17
Copy link
Contributor

Proposed commit message

  • WHAT: Addition of harvester limit to the Apache Tomcat logs to control the number of files to be opened in parallel
  • WHY: There are users who have more files to harvest than the system limit.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Related issues

Screenshots

Screenshot 2024-04-10 at 4 18 31 PM

@ishleenk17 ishleenk17 added the enhancement New feature or request label Apr 10, 2024
@ishleenk17 ishleenk17 self-assigned this Apr 10, 2024
@ishleenk17 ishleenk17 requested a review from a team as a code owner April 10, 2024 10:49
@ishleenk17
Copy link
Contributor Author

ishleenk17 commented Apr 10, 2024

@belimawr : Since this is the very first Integration that is using Harvester Limit config option.
Could you please review as well.

Also, I see mentioned in the harvester_limit documentation that it is advisable to use the close_* config along with the harvester limit. Thoughts on that?

@elasticmachine
Copy link

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@harnish-elastic harnish-elastic self-requested a review April 12, 2024 05:08
- name: harvester_limit
type: integer
title: Harvester Limit
description: "Limits the number of harvesters that are started in parallel. More details [here](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html_)."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link that you have attached is broken. Can you please fix it?

Copy link
Contributor

@harnish-elastic harnish-elastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

harnish-elastic

This comment was marked as duplicate.

- name: harvester_limit
type: integer
title: Harvester Limit
description: "Limits the number of harvesters that are started in parallel. More details [here](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-harvester-limit)."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion]

I'm not completely sure how big the description can be, however it would be nice to mention a couple of things:

  • zero means no limit
  • This is per input, not a global setting.

The second point can definitely be left for the documentation linked, however the first one I feel it is important to have stated.

@belimawr
Copy link
Contributor

@belimawr : Since this is the very first Integration that is using Harvester Limit config option. Could you please review as well.

Also, I see mentioned in the harvester_limit documentation that it is advisable to use the close_* config along with the harvester limit. Thoughts on that?

Because the harvesters are limited, this means not all files will be opened/harvested at the same time. A new file will only be opened after:

  • A previously open file is closed
  • and there is another scan for new files/changed files.

Also because the next file to be opened is picked randomly, there is no guarantee of how long a file will wait to be harvested.

Because of this considerations our documentation suggests to set some of the close_* fields to close files more often, hence increasing the rotation between the files being harvested.

The default to close a file is after 5 minutes of inactivity. In some environments that might be too long.

Thinking about it, I believe this PR should also include some of the close_* settings. At least close_inactive and close_eof also exposed.

Copy link
Contributor

@belimawr belimawr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for changing my review, but something just crossed my mind: Is this integration using Filestream or Log input?

The file changed is called filestream.yml.hbs, however there is no type: filestream set and the documentation link is for the Log input. I'm a bit confused here.

@belimawr belimawr self-requested a review April 12, 2024 09:22
@ishleenk17
Copy link
Contributor Author

Sorry for changing my review, but something just crossed my mind: Is this integration using Filestream or Log input?

The file changed is called filestream.yml.hbs, however there is no type: filestream set and the documentation link is for the Log input. I'm a bit confused here.

This Integration uses Filestream input type here. Documentation is for log input since there exists no documentation for harvester_limit for the filestream input.

@belimawr
Copy link
Contributor

Sorry for changing my review, but something just crossed my mind: Is this integration using Filestream or Log input?
The file changed is called filestream.yml.hbs, however there is no type: filestream set and the documentation link is for the Log input. I'm a bit confused here.

This Integration uses Filestream input type here. Documentation is for log input since there exists no documentation for harvester_limit for the filestream input.

That can be confusing for users. Even though the behaviour of the log input and filestream are the same, some of the configuration keys are different. I'm a bit concerned that this might cause some SDHs. However I don't see it as a blocker, as we can update the documentation link once we document it in Filestream.

One of the key difference are the close_* settings that on Filestream are under a different key, e.g: close.on_state_change.inactive and close.on_state_change.removed.

I understand the lack of documentation is a shortcoming from Filestream :/;

If it is not super urgent to merge this PR, maybe we can try to get the filestream documentation released first.

@ishleenk17
Copy link
Contributor Author

If it is not super urgent to merge this PR, maybe we can try to get the filestream documentation released first.

The PR can wait. What would be the timeline for documenting this ib Filestream ?

@ishleenk17
Copy link
Contributor Author

If it is not super urgent to merge this PR, maybe we can try to get the filestream documentation released first.

The PR can wait. What would be the timeline for documenting this ib Filestream ?

@belimawr : Can you share when this documentation would be ready ?

@belimawr
Copy link
Contributor

@pierrehilbert do you think we can squeeze elastic/beats#38784 in our current sprint?

It is a small doc update and it will help to keep the documentation for this integration consistent.

@pierrehilbert
Copy link

If you are able to finalize your P0 first, I have no objections with having it in the current sprint.

Copy link
Contributor

@belimawr belimawr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some small suggestions, they apply to all entries.

packages/apache_tomcat/data_stream/access/manifest.yml Outdated Show resolved Hide resolved
packages/apache_tomcat/data_stream/access/manifest.yml Outdated Show resolved Hide resolved
- name: harvester_limit
type: integer
title: Harvester Limit
description: "Limits the number of harvesters that are started in parallel. More details [here](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-harvester-limit)."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am planning to remove this link. Once the beats changes are out for this documentation. Will link it then.
@belimawr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, no need for ""

Copy link
Contributor

@ritalwar ritalwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In descriptions, we're sometimes using quotation marks and sometimes not. I think we should be consistent, and ideally, they're not needed.

packages/apache_tomcat/data_stream/access/manifest.yml Outdated Show resolved Hide resolved
packages/apache_tomcat/data_stream/access/manifest.yml Outdated Show resolved Hide resolved
- name: harvester_limit
type: integer
title: Harvester Limit
description: "Limits the number of harvesters that are started in parallel. More details [here](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html#filebeat-input-log-harvester-limit)."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, no need for ""

@elasticmachine
Copy link

💚 Build Succeeded

History

cc @ishleenk17

Copy link

@ishleenk17 ishleenk17 merged commit 85f83e6 into elastic:main May 17, 2024
5 checks passed
@elasticmachine
Copy link

Package apache_tomcat - 1.5.0 containing this change is available at https://epr.elastic.co/search?package=apache_tomcat

@htioekb
Copy link

htioekb commented May 17, 2024

Hello @ishleenk17,

{{#if close.on_state_change.inactive}}
close.on_state_change.inactive: 
{{/if}}

are we missing a value here?

After editing and saving the integration in Kibana I get the following error message:
can not read a block mapping entry; a multiline key may not be an implicit key at line 10, column 31: close.on_state_change.inactive: ^

@ishleenk17
Copy link
Contributor Author

Hello @ishleenk17,

{{#if close.on_state_change.inactive}}
close.on_state_change.inactive: 
{{/if}}

are we missing a value here?

After editing and saving the integration in Kibana I get the following error message: can not read a block mapping entry; a multiline key may not be an implicit key at line 10, column 31: close.on_state_change.inactive: ^

The issue is being fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants