
Bulk Collection and Sending of Tracking Requests #168

Open
dheid opened this issue Nov 24, 2023 · 3 comments

@dheid
Collaborator

dheid commented Nov 24, 2023

Feature Description

Currently, the Matomo Java Tracker sends each tracking request to the Matomo server as soon as it is created. This can lead to a large number of individual requests being sent to the server, especially in high-traffic applications.

I propose adding a feature that allows the tracker to collect multiple tracking requests over a certain delay period and then send them all at once in a bulk request. This could potentially reduce the load on the Matomo server and improve the performance of the tracker.

Proposed Implementation

The tracker could have a configurable delay period (for example, 5 seconds) during which it collects all created tracking requests. At the end of this delay period, it sends all collected requests to the Matomo server in a single bulk request.

This feature could be optional and controlled by a new configuration property (for example, matomo.tracker.bulk-collection-delay). If this property is not set or set to 0, the tracker operates as it currently does, sending each request immediately.
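A minimal sketch of how such a collector might look, assuming a generic request type and a caller-supplied bulk sender. The class `DelayedBulkCollector` and its methods are hypothetical names for illustration and are not part of the current tracker API. A delay of zero falls back to sending each request immediately, as described above.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch; not part of the Matomo Java Tracker API. */
final class DelayedBulkCollector<T> implements AutoCloseable {

    private final List<T> pending = new ArrayList<>();
    private final Consumer<List<T>> bulkSender;       // stands in for the real bulk call
    private final ScheduledExecutorService scheduler; // null means "send immediately"

    DelayedBulkCollector(Duration delay, Consumer<List<T>> bulkSender) {
        this.bulkSender = bulkSender;
        long millis = delay.toMillis();
        if (millis <= 0) {
            // Property unset or 0: keep the current behaviour.
            this.scheduler = null;
        } else {
            this.scheduler = Executors.newSingleThreadScheduledExecutor();
            this.scheduler.scheduleWithFixedDelay(this::flushQuietly, millis, millis, TimeUnit.MILLISECONDS);
        }
    }

    /** Queues a request, or sends it right away when no delay is configured. */
    void add(T request) {
        if (scheduler == null) {
            bulkSender.accept(Collections.singletonList(request));
            return;
        }
        synchronized (pending) {
            pending.add(request);
        }
    }

    /** Copies the queue out under the lock, then sends outside the lock. */
    void flush() {
        List<T> batch;
        synchronized (pending) {
            if (pending.isEmpty()) {
                return;
            }
            batch = new ArrayList<>(pending);
            pending.clear();
        }
        bulkSender.accept(batch);
    }

    private void flushQuietly() {
        try {
            flush();
        } catch (RuntimeException e) {
            // An uncaught exception would cancel the periodic task;
            // a real implementation would log or retry here.
        }
    }

    /** Stops the timer and sends whatever is still queued. */
    @Override
    public void close() {
        if (scheduler != null) {
            scheduler.shutdown();
        }
        flush();
    }
}
```

Because the batch is copied out while holding the lock and sent after releasing it, a request added while a send is in progress is not lost; it waits for the next flush.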

Potential Challenges

One challenge could be ensuring that the tracker correctly handles the case where a new tracking request is created while it is in the middle of sending a bulk request. We would need to make sure that this new request is either included in the current bulk request (if possible) or held for the next bulk request.

Another challenge could be error handling for the bulk request. If the Matomo server returns an error for the bulk request, we would need a way to determine which individual request(s) caused the error.
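One possible way to narrow down the failing request(s) is to split a rejected batch in half and retry each half. The sketch below assumes that a rejected bulk request is rejected as a whole, so that resending parts of it does not record anything twice; whether the Matomo server behaves this way would need to be verified. `BulkBisect` and the `trySend` callback are hypothetical names, not existing tracker API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch; assumes a failed bulk is rejected atomically. */
final class BulkBisect {

    private BulkBisect() {
    }

    /**
     * Returns the requests that are rejected even when sent on their own.
     * trySend must return true when the server accepted the given batch.
     */
    static <T> List<T> findRejected(List<T> batch, Predicate<List<T>> trySend) {
        List<T> rejected = new ArrayList<>();
        bisect(batch, trySend, rejected);
        return rejected;
    }

    private static <T> void bisect(List<T> batch, Predicate<List<T>> trySend, List<T> rejected) {
        if (batch.isEmpty() || trySend.test(batch)) {
            return; // nothing to send, or the whole batch was accepted
        }
        if (batch.size() == 1) {
            rejected.add(batch.get(0)); // a single request that still fails
            return;
        }
        int mid = batch.size() / 2;
        bisect(batch.subList(0, mid), trySend, rejected);
        bisect(batch.subList(mid, batch.size()), trySend, rejected);
    }
}
```

This costs extra round trips only when a batch fails, so the common case stays at one request per batch.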

Impact

This feature could significantly reduce the number of requests that the tracker sends to the Matomo server, potentially improving performance for both the tracker and the server. It could be particularly beneficial for high-traffic applications that generate a large number of tracking requests.

@dheid dheid self-assigned this Nov 24, 2023
@renatocjn

Hi, I have been having some issues related to this. I'm guessing that implementing the periodic bulk tracking that you suggest here would solve it.

My app currently calls the bulk submission MatomoTracker::sendBulkRequestAsync to send a set of actions at the same time to the server. The problem I'm having is that when doing these in parallel, the calls block as if the request is being transmitted synchronously.

After debugging a bit, I think the issue is that the Java8Sender used under the hood has synchronized blocks on the same variable in both the function that queues the requests and the function that transmits them (see L332-L341 and L351-L357). I'm using version 3.2.0.

To solve my issue, I implemented the async myself with my own Executor and supplyAsync calls to the sendBulk function. Perhaps a better solution would be to do the sendBulk call from outside the synchronized block on the sender code or use the periodic transmission that you suggest.
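Roughly, the workaround looks like the sketch below. The `AsyncBulkDispatcher` class is a hypothetical wrapper for illustration, and `blockingBulkSend` stands in for the tracker's synchronous bulk call; the real method name and signature are not shown here.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/** Hypothetical wrapper; blockingBulkSend stands in for the synchronous bulk call. */
final class AsyncBulkDispatcher<T> implements AutoCloseable {

    private final ExecutorService executor;
    private final Consumer<List<T>> blockingBulkSend;

    AsyncBulkDispatcher(int threads, Consumer<List<T>> blockingBulkSend) {
        this.executor = Executors.newFixedThreadPool(threads);
        this.blockingBulkSend = blockingBulkSend;
    }

    /** Runs the blocking send on the dispatcher's own pool; completes with the batch size. */
    CompletableFuture<Integer> send(List<T> batch) {
        return CompletableFuture.supplyAsync(() -> {
            blockingBulkSend.accept(batch);
            return batch.size();
        }, executor);
    }

    /** Stops accepting new work; already submitted sends still run. */
    @Override
    public void close() {
        executor.shutdown();
    }
}
```

The calling thread returns as soon as the task is submitted, so it no longer waits on the sender's lock, even if the sends themselves are still serialized inside the sender.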

@dheid
Collaborator Author

dheid commented Apr 19, 2024

Oh, thanks so much! That sounds awesome! I will consider that.

dheid added a commit that referenced this issue Apr 19, 2024
@dheid
Collaborator Author

dheid commented Apr 19, 2024

@renatocjn Thanks for your analysis. The synchronization on queries came from a Matomo tracker I once created that could collect multiple send executions and combine them into a bulk request within a configurable delay. I removed that functionality because of the scope of the integration. However, I forgot to remove the synchronization on the queries.

I have removed it until the feature is complete. Version 3.4.0 contains the fix, so no synchronization is needed any longer. Bulk collection itself is not yet implemented in version 3.4.0.
