
Benchmarks for Parsing and caching auto generated XForms #454

Open · wants to merge 8 commits into master

Conversation

@JohnTheBeloved (Contributor) commented Jul 5, 2019

Closes #

This is an improvement on the recent benchmarking PR, intended to run benchmarks on synthetic, dynamically generated forms instead of static forms.

What has been done to verify that this works as intended?

Benchmarks have been run on Travis CI and the results are shared below.

Why is this the best possible solution? Were any other approaches considered?

It's still a work in progress, so suggestions are most welcome.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

The changes aren't part of the main source set, so no user impact is expected.

Do we need any specific form for testing your changes? If so, please attach one.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

@JohnTheBeloved changed the title from "Benchmark refactor" to "WIP: Benchmark refactor" on Jul 5, 2019
@lognaturel (Member) commented:
Wow, there's a lot here, @JohnTheBeloved! Can you please describe your high-level approach and what decisions you made along the way? For example, why did you choose to generate main instances with the full range of possible controls and types? Do you expect there to be a performance difference between them? If so, then how can results between runs be compared?

Broadly, it seems to me that in this context, the forms that benchmarks are run over need to be known and stable for results to be comparable. That is, dynamically generating a certain number of elements of a predictable structure seems very valuable. I'm less sure about what the randomness brings and am interested in hearing more about why you chose that design.

@JohnTheBeloved (Contributor, Author) commented Jul 8, 2019

Representing the whole XForm structure in code can be a complex process, as it comprises various parts of the model and body as stated in the XForm specification.

This PR does not cover all parts of the specification, but rather the major parts of the XForm structure that can be used to substantively verify some aspects of JavaRosa performance: in this case, 1. creating the FormDef and 2. caching the FormDef.

However, the synthetic XForm builder I implemented wasn't written with only our current benchmarks in mind (which is why you see other control types defined; they can be removed). Only two control types were used for these benchmarks:

  1. input text (string)
  2. select control (used due to the time taken to populate the options of a question)

I don't claim to be very experienced with all parts of the XForm structure, nor that this PR covers every possible part of the XForm definition (XForm spec).

The implementation was rather meant to quickly analyse benchmarks for different complexities of an XForm, where complexity is in terms of:

  1. number of questions
  2. number of question groups
  3. number of internal secondary instances
  4. number of external secondary instances
  5. number of elements in each secondary instance

The Builder design pattern was employed in creating the different parts of the XML strings of the XForm structure:

  1. Main Instance
  2. Secondary Instance
  3. Bindings
  4. Controls

Some of the areas not implemented in the XFormBuilder class include:

  1. itext translations
  2. control types other than select and input text

The chain below summarises the XFormBuilder build flow, structured for easier maintainability:

    buildHtml()
        .buildHead()
        .buildBody()
        .buildTitle()
        .buildModel()
        .buildPrimaryInstance()
        .buildInternalSecondaryInstances()
        .buildExternalSecondaryInstances()
        .buildBind()
        .buildControls()
        .toString();

Each buildX method abstracts the logic for creating the corresponding part of the XForm structure.
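
For illustration, here is a minimal sketch of what such a fluent builder can look like. The class name matches the PR, but the constructor, fields, and method bodies are hypothetical simplifications rather than the PR's actual implementation:

    // Hypothetical simplification: each buildX method appends its section of
    // the XForm XML to an internal buffer and returns this for chaining.
    public class XFormBuilder {
        private final StringBuilder xml = new StringBuilder();
        private final int noOfQuestions;
        private final int noOfInternalSecondaryInstances;

        public XFormBuilder(int noOfQuestions, int noOfInternalSecondaryInstances) {
            this.noOfQuestions = noOfQuestions;
            this.noOfInternalSecondaryInstances = noOfInternalSecondaryInstances;
        }

        public XFormBuilder buildHtml() {
            xml.append("<h:html xmlns=\"http://www.w3.org/2002/xforms\" "
                    + "xmlns:h=\"http://www.w3.org/1999/xhtml\">");
            return this;
        }

        public XFormBuilder buildPrimaryInstance() {
            xml.append("<instance><data id=\"generated-form\">");
            for (int i = 0; i < noOfQuestions; i++) {
                xml.append("<question").append(i).append("/>");
            }
            xml.append("</data></instance>");
            return this;
        }

        public XFormBuilder buildInternalSecondaryInstances() {
            for (int i = 0; i < noOfInternalSecondaryInstances; i++) {
                xml.append("<instance id=\"instance_").append(i).append("\"/>");
            }
            return this;
        }

        // buildHead(), buildBody(), buildModel(), buildBind(), buildControls()
        // and the rest follow the same append-and-return-this pattern.

        @Override
        public String toString() {
            return xml.toString() + "</h:html>";
        }
    }

The chaining works because every buildX method returns the builder itself, so the whole form is assembled in one expression and finished with toString().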

To answer your questions directly:

Why did you choose to generate main instances with the full range of possible controls and types?

Other control types were defined in case there is a future need for them; they are not currently used, apart from select and input text.

The select was specifically used because the time taken to populate the options of a question was taken into consideration, both when running the benchmarks and during manual testing of the forms.
All odd indexes are text input controls, while even indexes are select controls.

Do you expect there to be a performance difference between them?

Other than considering the time to populate the options, no. The relative performance between control types is also not what was intended to be measured.

I'm less sure about what the randomness brings and am interested in hearing more about why you chose that design

By random, I take it you mean the options of the select control; the randomness is used to pick any of the internal instances at random.

@JohnTheBeloved (Contributor, Author) commented Jul 8, 2019

The snippet below defines the benchmark parameters for running possible combinations that define an XForm.

In this case, the following parameters were used:

    @Param({"10", "500"})
    public int noOfQuestions = 500;
    @Param({"10", "50"})
    public int noOfInternalSecondaryInstances = 10;
    @Param({"50", "1000"})
    public int noOf2ndryInstanceElements = 1000;
    @Param({"0"})
    public int noOfQuestionGroups = 0;
    @Param({"0", "50"})
    public int noOfExternalSecondaryInstances = 1;

The benchmark is run for every combination in the cross product of the parameter values; sample results from one run appear after the sketch below.
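
To make the mechanics concrete, here is a minimal sketch of how JMH injects such @Param values into a benchmark. The XFormBuilder chain mirrors the flow shown earlier and its constructor signature is assumed; XFormParser is JavaRosa's XForm parser, though this PR's actual setup may differ:

    import java.io.StringReader;
    import java.util.concurrent.TimeUnit;
    import org.javarosa.core.model.FormDef;
    import org.javarosa.xform.parse.XFormParser;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public class XForm2FormDefBenchmarkSketch {

        @Param({"10", "500"})
        public int noOfQuestions;

        @Param({"10", "50"})
        public int noOfInternalSecondaryInstances;

        private String xFormXml;

        @Setup(Level.Trial)
        public void setUp() {
            // Generate the synthetic form once per parameter combination
            // (hypothetical XFormBuilder, sketched earlier in this thread).
            xFormXml = new XFormBuilder(noOfQuestions, noOfInternalSecondaryInstances)
                    .buildHtml().buildHead().buildBody().buildTitle().buildModel()
                    .buildPrimaryInstance().buildInternalSecondaryInstances()
                    .buildExternalSecondaryInstances().buildBind().buildControls()
                    .toString();
        }

        @Benchmark
        public FormDef runBenchmark() throws Exception {
            // Parsing the generated XForm into a FormDef is the operation under test.
            return new XFormParser(new StringReader(xFormXml)).parse();
        }
    }

JMH instantiates the @State class once per combination of injected values, so the measured method sees each parameter set in turn.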

FormDef2CacheBenchmark.runBenchmark  50     0  1   0  10   avgt  2   0.004  s/op
FormDef2CacheBenchmark.runBenchmark  50     0  1   0  200  avgt  2   0.022  s/op
FormDef2CacheBenchmark.runBenchmark  50     0  1   0  500  avgt  2   0.049  s/op
FormDef2CacheBenchmark.runBenchmark  50     0  10  0  10   avgt  2   0.031  s/op
FormDef2CacheBenchmark.runBenchmark  50     0  10  0  200  avgt  2   0.048  s/op
FormDef2CacheBenchmark.runBenchmark  50     0  10  0  500  avgt  2   0.075  s/op
FormDef2CacheBenchmark.runBenchmark  500    0  1   0  10   avgt  2   0.031  s/op
FormDef2CacheBenchmark.runBenchmark  500    0  1   0  200  avgt  2   0.048  s/op
FormDef2CacheBenchmark.runBenchmark  500    0  1   0  500  avgt  2   0.075  s/op
FormDef2CacheBenchmark.runBenchmark  500    0  10  0  10   avgt  2   0.283  s/op
FormDef2CacheBenchmark.runBenchmark  500    0  10  0  200  avgt  2   0.310  s/op
FormDef2CacheBenchmark.runBenchmark  500    0  10  0  500  avgt  2   0.337  s/op
FormDef2CacheBenchmark.runBenchmark  5000   0  1   0  10   avgt  2   0.294  s/op
FormDef2CacheBenchmark.runBenchmark  5000   0  1   0  200  avgt  2   0.311  s/op
FormDef2CacheBenchmark.runBenchmark  5000   0  1   0  500  avgt  2   0.344  s/op
FormDef2CacheBenchmark.runBenchmark  5000   0  10  0  10   avgt  2   2.810  s/op
FormDef2CacheBenchmark.runBenchmark  5000   0  10  0  200  avgt  2   2.846  s/op
FormDef2CacheBenchmark.runBenchmark  5000   0  10  0  500  avgt  2   2.894  s/op
FormDef2CacheBenchmark.runBenchmark  50000  0  1   0  10   avgt  2   2.810  s/op
FormDef2CacheBenchmark.runBenchmark  50000  0  1   0  200  avgt  2   2.867  s/op
FormDef2CacheBenchmark.runBenchmark  50000  0  1   0  500  avgt  2   2.903  s/op
FormDef2CacheBenchmark.runBenchmark  50000  0  10  0  10   avgt  2  29.020  s/op
FormDef2CacheBenchmark.runBenchmark  50000  0  10  0  200  avgt  2  30.210  s/op
FormDef2CacheBenchmark.runBenchmark  50000  0  10  0  500  avgt  2  29.340  s/op

@lognaturel (Member) commented:
Thank you for the extra context. In general, the simpler the approach and code, the more likely it is someone else can give meaningful feedback and eventually build on it, especially when there are no comments or context in commits. I completely understand wanting to plan ahead and it's really cool that you dove deep into generating forms. That said, it would really have helped to have as narrowly scoped a PR as possible.

At a high level, the Cache2FormDefBenchmark, FormDef2CacheBenchmark and XForm2FormDefBenchmark look like exactly the implementations to evaluate form opening experience. Building fake secondary instances that have values and labels that are sequential numbers looks great.

We talked about this briefly last week but I want to confirm one more time. The results you are getting out of these benchmarks are correlated with the results you have gotten in profiling sessions on real Android devices running Collect, right? That is, if these benchmarks show a linear relationship between number of elements in a secondary instance and time taken to load from cache, it's also roughly linear in profiling? I say roughly because the benchmarking environment is much more controlled but if somehow the shape of the curve is different, there's a problem. I can't imagine what would lead to that but stranger things have happened.

The select was specifically used because the time taken to populate the options of a question was taken into consideration, both when running the benchmarks and during manual testing of the forms.

Are you saying you actually measured a difference between having selects and other question types in the primary instance? That is surprising to me because I don't believe choices are populated from secondary instances until the question is displayed. Either way, this assumption and its implications should be documented explicitly in the code or in a commit message.

Focusing on the three benchmark classes listed above, could you please write a little about how you picked the parameter values you used? Do you feel like all combinations in the cross product add value? What relationships are you trying to identify and would it be possible to identify them with fewer cases? The 10 instances with 50000 elements each case in particular seems unrealistic and time consuming to run. Is it possible to give a CSV of input values like with junit to avoid the full cross product?
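
As a reference point, JMH's programmatic runner can execute hand-picked parameter rows instead of the annotation-driven cross product. A sketch, assuming the param names from this thread, with arbitrary case values:

    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class SelectedCasesRunner {
        // Each row is one {noOfQuestions, noOfInternalSecondaryInstances,
        // noOf2ndryInstanceElements} combination to measure; no cross product.
        private static final String[][] CASES = {
            {"10",  "10", "50"},
            {"500", "50", "1000"},
        };

        public static void main(String[] args) throws RunnerException {
            for (String[] c : CASES) {
                Options opts = new OptionsBuilder()
                        .include("FormDef2CacheBenchmark")
                        .param("noOfQuestions", c[0])
                        .param("noOfInternalSecondaryInstances", c[1])
                        .param("noOf2ndryInstanceElements", c[2])
                        .build();
                new Runner(opts).run(); // one JMH run per hand-picked row
            }
        }
    }

Each iteration overrides the annotated defaults for a single combination, which keeps total run time proportional to the number of rows rather than the product of all parameter values.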

build.gradle Outdated
@@ -77,15 +79,15 @@ jmh {
exclude = "(BenchmarkTemplate)"
threads = 1
fork = 1
@lognaturel (Member) commented:

What led to these changes? In particular, I thought you ended up with a high number of warmup iterations because the results weren't stable otherwise. Is that not the case?

@Param({"0"})
public int noOfQuestionGroups = 1;
@Param({"0"})
public int noOfExternalSecondaryInstances = 50;
@lognaturel (Member) commented Jul 9, 2019:

It looks like the explicit values that the parameters are set to are ignored in favor of the injected values. Is there a reason to have those explicit values? It would be clearer to just declare the fields without initializing them.

@JohnTheBeloved (Contributor, Author) replied:

This was set in order to debug and test it as a plain Java class without running the benchmarks.
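
For illustration, the same debugging goal can be met while leaving the fields uninitialized: a plain main method can set them directly, bypassing JMH. The class and runBenchmark names come from this thread; the setUp method is an assumption:

    public class DebugRun {
        public static void main(String[] args) throws Exception {
            // Instantiate the benchmark state directly so it can be stepped
            // through in a debugger without JMH's @Param injection.
            FormDef2CacheBenchmark benchmark = new FormDef2CacheBenchmark();
            benchmark.noOfQuestionGroups = 1;             // values JMH would otherwise inject
            benchmark.noOfExternalSecondaryInstances = 50;
            benchmark.setUp();                            // assumed @Setup method
            benchmark.runBenchmark();                     // single untimed invocation
        }
    }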

@codecov-io commented:

Codecov Report

Merging #454 into master will decrease coverage by 0.01%.
The diff coverage is n/a.


@@             Coverage Diff              @@
##             master     #454      +/-   ##
============================================
- Coverage     49.03%   49.02%   -0.02%     
+ Complexity     2940     2939       -1     
============================================
  Files           246      246              
  Lines         13698    13698              
  Branches       2650     2650              
============================================
- Hits           6717     6715       -2     
  Misses         6132     6132              
- Partials        849      851       +2
Impacted Files                                          Coverage Δ            Complexity Δ
...a/org/javarosa/core/services/PrototypeManager.java   79.16% <0%> (-8.34%)  8% <0%> (-1%)
...org/javarosa/core/model/condition/Triggerable.java   67.76% <0%> (ø)       25% <0%> (ø) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@JohnTheBeloved (Contributor, Author) commented:
Hi @lognaturel

I initially created the draft to give an idea of how I implemented the mock XForm creator.

I have made a couple of clean-ups which should make things a bit clearer.

As for the comparison between the benchmarks and the profiling sessions: yes, they are commensurate. Do you need some data to back that up?

I had assumed the options were used during question creation in some way. I have removed the select control, so input text is now the only control type being used.

The parameters that were used categorise the questions and options as:

  1. Small
  2. Medium
  3. Large
  4. Extra large

I don't think we have to benchmark only real-world parameters at all times. The outsized parameters also serve as a stress test for the methods and show where efficiency begins to drop.

@JohnTheBeloved marked this pull request as ready for review July 10, 2019 09:09
@JohnTheBeloved changed the title from "WIP: Benchmark refactor" to "Benchmark for Parsing and caching auto generated XForms" on Jul 10, 2019
@JohnTheBeloved changed the title from "Benchmark for Parsing and caching auto generated XForms" to "Benchmarks for Parsing and caching auto generated XForms" on Jul 10, 2019
@lognaturel (Member) commented:

As for the comparison between the benchmarks and profilings, Yes they are commensurate, Do you need some data to back that up?

That's great. If you've taken a look and confirmed it, nothing more needed, I think.

I'll take another pass through soon. Overall my goal is not to be too picky about this now. We can look at individual benchmarks in more detail as they're being used to evaluate possible changes.
