Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce data aggregation (#194) #234

Draft
wants to merge 3 commits into
base: develop
Choose a base branch
from
Draft

Conversation

stklcode
Copy link
Contributor

@stklcode stklcode commented Sep 8, 2022

We add a new column named hits to the database table with default value of 1.
The tracking routine is preserved as is, i.e. it will insert a single row per hit.

The cleanup routine (cron job) is extended with an aggregation function that walks through the database and groups distinct date/referrer/target combinations with the sum of all hits.

before:

created referrer target hits
2022-09-07 example.com / 1
2022-09-07 pluginkollektiv.org /test/ 1
2022-09-07 example.com / 1
2022-09-07 example.com / 1
2022-09-08 pluginkollektiv.org /test/ 1
2022-09-08 pluginkollektiv.org / 1

after:

created referrer target hits
2022-09-07 example.com / 3
2022-09-07 pluginkollektiv.org /test/ 1
2022-09-08 pluginkollektiv.org /test/ 1
2022-09-08 pluginkollektiv.org / 1

The output is generated from SUM(hits) instead of COUNT(*) then.

We also introduce a boolean book statify__skip_aggregation (defaults to false) to keep the previous behavior and disable the new aggregation.

This might be necessary if the inserted entry is extended with custom columns, e.g. in a statify__visit_saved hook. If this is the case, aggregation will fail, because we don't know about additional columns. We could of course add another hook for custom aggregation columns and dynamically build up the statements, but for most users this won't be necessary and it can be extended in future releases as well.

@stklcode stklcode linked an issue Sep 8, 2022 that may be closed by this pull request
This column will be used for aggregation and is filled with 1 initially.
The number of hits for a specific date/target/referrer is thus no longer
the number of entries, but the sum of all hits.
We now aggregate statistics data in the cleanup routine preserving
only distinct entries for each date/referrer/target combination with
the sum of all hits.
If Statify is extended by custom columns the aggregation routine will
fail. To support such scenarios, we introduce a boolean hook, so the
previous behavior can be preserved without breaking compatibility.
@patrickrobrecht
Copy link
Member

These changes would require changes to keep the functionality of Statify - Extended Evaluation, but hopefully helps to improve the performance with large Statify tables (I've installs with 1,000,000+ entries): patrickrobrecht/extended-evaluation-for-statify#29.

@stklcode stklcode force-pushed the feature/194-aggregation branch 2 times, most recently from 48543bb to e789310 Compare October 18, 2023 09:03
@sonarcloud
Copy link

sonarcloud bot commented Oct 18, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 3 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@stklcode
Copy link
Contributor Author

stklcode commented Nov 5, 2023

As previously discussed in various places this feature might need some additional planning.

  • With planned additional data collection (probably even dynamically extensible) this might require constant refactoring.
  • Is database size really still an issue so that it's worth the effort?
  • Probably drop this idea in favor of maybe some more elaborate caching of evaluation.

@stklcode stklcode marked this pull request as draft March 29, 2024 09:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data aggregation
3 participants