[Feature][jira] Jira changelog extractor and converter support Incremental Mode #7385

Open
3 tasks done
klesh opened this issue Apr 26, 2024 · 0 comments
Labels
type/feature-request This issue is a proposal for something new

klesh commented Apr 26, 2024

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Use case

Reduce the time spent crunching massive amounts of data.

Description

Currently, Extractors and Converters operate exclusively in Full Sync mode, which involves deleting all target data and regenerating it via a Delete + Insert process. While effective, this approach poses several issues:

  1. Scalability Concerns: As the volume of records in the source tables increases, the time required for conversion scales linearly. In particular, Jira issue changelog extraction and conversion have been reported to take up to half an hour. This is significantly slower than the data collection phase and hurts overall efficiency.
  2. Database Efficiency: Running in Full Sync mode tends to cause database bloat, particularly in databases like PostgreSQL. This bloat shows up as a table size that is disproportionately large compared to the actual data stored; in some cases, as much as 18 GB of space is used for 1 GB of actual data.
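To make the two concerns above concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under simplifying assumptions: the table size stays fixed between runs, and no vacuum reclaims space in between. The function names are made up for illustration and are not part of the project.

```python
def full_sync_rows_written(source_rows: int, runs: int) -> int:
    """Rows deleted plus rows inserted over `runs` Full Sync runs.

    Assumes a fixed number of source rows; the first run has nothing to delete.
    """
    if runs <= 0:
        return 0
    deletes = source_rows * (runs - 1)
    inserts = source_rows * runs
    return deletes + inserts


def dead_rows_without_vacuum(source_rows: int, runs: int) -> int:
    """Upper bound on dead row versions left behind by the deletes.

    In PostgreSQL a deleted row keeps occupying space until VACUUM reclaims
    it, so this bound only holds if no vacuum runs between syncs.
    """
    return source_rows * max(runs - 1, 0)
```

Under this model the work per run is proportional to the whole table, not to what changed, and every run after the first can leave a full table's worth of dead rows behind.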


Proposed Solution:

I propose that extractors and converters should be enhanced to support Incremental Mode. This mode would enable the components to only process and insert new or changed data since the last collection, rather than performing a full refresh each time. This would likely yield the following benefits:

  1. Reduced Processing Time: Incremental updates would significantly reduce the time required for data conversion, as only new or changed records would be processed.
  2. Improved Database Performance: By avoiding the deletion and re-insertion of large volumes of data, we can prevent database bloat, leading to better utilization of resources and potentially lower storage costs.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
