Investigate increasing bulk publish limit of 10 activities #1408
Comments
We're currently unable to state an exact number for the bulk publish limit. The limit is relative: we believe it depends mostly on the number and depth of nodes in an activity rather than on the number of activities. To increase the bulk publish limit, we'll need to test the maximum number of activities we can bulk publish without hitting a system-level error (such as a queue timeout or memory error). To test this, we've come up with a test plan that looks something like this:
Upon finding the limit, we can then look into other improvements or reconsider the workflow.
@PG-Momik @praweshsth - the above approach sounds sensible to me as a way to start testing limits. Feel free to estimate work if applicable, or let me know if you want to discuss further.
@emmajclegg here's a quick expansion on the things we want to test (Testables) and how we will test them (Testing process/Task) for the bulk publish limit. Estimates for this issue have been made accordingly.
Testables
Testing process/ Task
Important
Thanks @PG-Momik, @praweshsth - sounds like a good plan so I'm happy for you to start working on this. To note, the design work on #1423 is higher priority if that can be worked on this week. It's affecting more end users than this bulk publishing limit.
@emmajclegg Hello Emma, just wanted to give you an update on this issue. We have started testing the bulk publish process. Initially we created an activity with 100% completeness that included 50 transactions and 50 results, where each result had 50 indicators and each indicator had 25 periods. With this configuration, the XML file exceeded 62MB for a single activity (not the merged file with multiple activities). It seems the IATI validator does not accept files over 60MB, so we received an error when trying to validate a single activity.
We realized that we had put too much data into a single activity and that no user would realistically have this much. So we reduced the number of indicators per result from 50 to 10 and the number of periods per indicator from 25 to 10. We did not change the number of transactions or the number of results. With these changes, the XML file for an activity shrank to a little over 16MB, which the IATI validator can validate, so we will move forward with the aforementioned tests using this data.
However, unlike initially stated, we will start with a small number of activities and work up to larger numbers rather than going in the opposite direction, i.e. we will test with 5 activities first, then 10, and then slowly increase the number of activities.
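For anyone comparing the two test configurations, here is a rough sketch of the element counts they imply. The numbers are the ones given in the comment above; the function name `count_elements` is only illustrative and is not part of the project code.

```python
def count_elements(transactions, results, indicators_per_result, periods_per_indicator):
    """Return the total number of each element type in one test activity."""
    indicators = results * indicators_per_result
    periods = indicators * periods_per_indicator
    return {
        "transactions": transactions,
        "results": results,
        "indicators": indicators,
        "periods": periods,
    }


# Initial configuration: 50 results x 50 indicators x 25 periods.
initial = count_elements(50, 50, 50, 25)

# Reduced configuration: 50 results x 10 indicators x 10 periods.
reduced = count_elements(50, 50, 10, 10)

print("initial:", initial)
print("reduced:", reduced)
```

The initial setup works out to 2,500 indicators and 62,500 periods in a single activity, against 500 indicators and 5,000 periods after the reduction.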
@emmajclegg We moved ahead as stated above but found that publishing even a single activity (with the large amount of data) took a considerable amount of time.
Ok thanks for the update @Sanilblank - it sounds like you're still troubleshooting the issue found. Let me know once you reach an activity limit - i.e. the point where a decision's needed on whether to invest development time on being able to bulk publish more activities. We can discuss together what's worth doing from that point.
@emmajclegg We made a small adjustment to the timeout variable in the code for the bulk publishing job. With this change the system was able to bulk publish 25 activities at one time. When we tried 35 or more, we encountered an error, so we stopped the process there.
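The ramp-up approach described in this thread (start small, increase the batch size, stop at the first failure) could be sketched roughly as below. This is not the project's test code: `try_bulk_publish` stands in for whatever triggers the real bulk publishing job, the list of batch sizes is an assumption, and `simulated_publish` only imitates the observed outcome (25 worked, 35 did not) with a made-up error type.

```python
from typing import Callable, Iterable, Optional


def find_largest_passing_batch(
    try_bulk_publish: Callable[[int], None],
    batch_sizes: Iterable[int],
) -> Optional[int]:
    """Try each batch size in order and return the largest one that succeeded."""
    largest_ok = None
    for size in batch_sizes:
        try:
            try_bulk_publish(size)
        except Exception as error:
            # Stop at the first failure, as was done in the manual test.
            print(f"{size} activities: failed ({error})")
            break
        print(f"{size} activities: ok")
        largest_ok = size
    return largest_ok


def simulated_publish(size: int) -> None:
    """Hypothetical stand-in for the real job: fails above 25 activities."""
    if size > 25:
        raise TimeoutError("simulated queue timeout")


# Assumed ramp-up sequence; only 5, 10, 25 and 35 are mentioned in the thread.
largest = find_largest_passing_batch(simulated_publish, [5, 10, 15, 20, 25, 35])
print("largest passing batch:", largest)
```

Stopping at the first failure keeps the test cheap; the gap between the last passing size and the first failing one can be narrowed afterwards if a more precise limit is needed.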
Thanks @Sanilblank
Do you need any feedback here, or will you push changes to production when ready?
If it includes no identifiable information on individual users or publishers then feel free to share it here, attached to the issue. I'll have a look at the findings from the bulk publishing testing and will likely suggest we pause work on it here, to gather more evidence from users on whether raising the activity publishing limit is worth additional work. To inform that, please share any ideas you have on what development work would be needed to increase the bulk publishing limit at this point.
@emmajclegg I have attached the link to the spreadsheet [here](https://docs.google.com/spreadsheets/d/1s31ptnV_L3Qwog56QEzhpYiBQ7nspCLnpyIwnby1OiI/edit?usp=sharing). The time has been recorded in seconds instead of minutes.
Basically, the code has not changed much: we have added eager loading, which reduces the number of queries made to the database and improves performance. Our QA team will test these changes to ensure that no problems arise. I don't think we need any other feedback here, as not much has changed. We are performing the testing from another branch. So, if you have no further questions, you can close this issue.
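As a rough illustration of why eager loading reduces the query count, here is a small self-contained sketch using Python's `sqlite3`. It is not the project's code or ORM: the table names, the data, and the helper functions are all made up for the example. The lazy version runs one query per activity (the "N+1" pattern), while the eager version fetches all related rows in a single extra query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE activities (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY, activity_id INTEGER, amount REAL
    );
    """
)
# Hypothetical sample data: 5 activities with 3 transactions each.
conn.executemany(
    "INSERT INTO activities (id, title) VALUES (?, ?)",
    [(i, f"Activity {i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO transactions (activity_id, amount) VALUES (?, ?)",
    [(a, 100.0 * t) for a in range(1, 6) for t in range(1, 4)],
)


def load_lazy():
    """One query for the activities, then one more query per activity."""
    queries = 1
    ids = [row[0] for row in conn.execute("SELECT id FROM activities ORDER BY id")]
    loaded = {}
    for activity_id in ids:
        rows = conn.execute(
            "SELECT amount FROM transactions WHERE activity_id = ? ORDER BY id",
            (activity_id,),
        ).fetchall()
        queries += 1
        loaded[activity_id] = [amount for (amount,) in rows]
    return loaded, queries


def load_eager():
    """One query for the activities, one query for all their transactions."""
    queries = 1
    ids = [row[0] for row in conn.execute("SELECT id FROM activities ORDER BY id")]
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        "SELECT activity_id, amount FROM transactions "
        f"WHERE activity_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    queries += 1
    loaded = {activity_id: [] for activity_id in ids}
    for activity_id, amount in rows:
        loaded[activity_id].append(amount)
    return loaded, queries


lazy_data, lazy_queries = load_lazy()
eager_data, eager_queries = load_eager()
print("lazy queries:", lazy_queries)
print("eager queries:", eager_queries)
```

Both functions return the same data; the difference is that the lazy query count grows with the number of activities, while the eager count stays fixed at two.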
Ok, sounds good @Sanilblank . To check I understand the spreadsheet testing results:
So it took 116 seconds (approx 2 minutes) to publish 1 activity:
Yes, this is correct.
Not exactly: the ones in red represent the overall total time for the registry validation job and the bulk publishing job respectively.
Thanks @Sanilblank - that makes sense. I'll close this issue as advised.
To check initially - what would the impact be of increasing the bulk publish limit from 10 to e.g. 50 or 100 activities? (100 is the maximum number of activities we recommend for Publisher, so it seems a reasonable upper limit)
While users who are manually editing individual activities may never need to bulk publish, there are some users importing higher activity volumes where having to publish in batches of 10 must be limiting.
It's also not completely clear currently how many activities are being selected with the button below, when the user has activities across multiple pages.
Happy to discuss what's proportionate here.