Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patch-diff-report-tool: add multi-thread support to CheckstyleReportsParser #344

Open
rnveach opened this issue Jan 13, 2019 · 5 comments
Open

Comments

@rnveach
Copy link
Member

rnveach commented Jan 13, 2019

Execution:

java -jar /home/ricky/opensource/contribution/patch-diff-report-tool/target/patch-diff-report-tool-0.1-SNAPSHOT-jar-with-dependencies.jar --baseReport /home/ricky/regression_reports/savemaster/openjdk10/results.xml --patchReport /home/ricky/regression_reports/savepull/openjdk10/results.xml --output /home/ricky/regression_reports/diff/openjdk10 --baseConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --patchConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --refFiles /home/ricky/regression_repositories/openjdk10
patch-diff-report-tool execution started.
XML parsing is started.

Config:

<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
          "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
          "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">

<module name="Checker">
    <property name="charset" value="UTF-8"/>
    <property name="severity" value="warning"/>
    <property name="haltOnException" value="false"/>

    <module name="TreeWalker">

<module name="HiddenField">
  <property name="id" value="HiddenField1" />
</module>
<module name="HiddenField">
  <property name="id" value="HiddenField2" />
  <property name="ignoreConstructorParameter" value="true"/>
</module>
<module name="HiddenField">
  <property name="id" value="HiddenField3" />
  <property name="ignoreSetter" value="true"/>
</module>
<module name="HiddenField">
  <property name="id" value="HiddenField4" />
  <property name="setterCanReturnItsClass" value="true"/>
</module>

<module name="MagicNumber">
  <property name="id" value="MagicNumber1" />
</module>
<module name="MagicNumber">
  <property name="id" value="MagicNumber2" />
  <property name="ignoreHashCodeMethod" value="true" />
</module>
<module name="MagicNumber">
  <property name="id" value="MagicNumber3" />
  <property name="ignoreAnnotation" value="true" />
</module>
<module name="MagicNumber">
  <property name="id" value="MagicNumber4" />
  <property name="ignoreFieldDeclaration" value="true" />
</module>

<module name="ModifiedControlVariable">
  <property name="id" value="ModifiedControlVariable1" />
</module>
<module name="ModifiedControlVariable">
  <property name="id" value="ModifiedControlVariable2" />
  <property name="skipEnhancedForLoopVariable" value="true" />
</module>

<module name="MultipleVariableDeclarations"/>

    </module>
</module>

Both report XML files are 169 and 171 megs.

So far I have been waiting 45 minutes for the utility to finish. Based on the displays it has only been working on CheckstyleReportsParser.parse this entire time.
I have 4 cores (8 threads) which are sitting idle while I am waiting for this 1 thread to finish basically all the work.

I believe we could multi-thread the following section:

while (baseReader.hasNext() || patchReader.hasNext()) {
parseXmlPortion(content, baseReader, portionSize, BASE_REPORT_INDEX);
parseXmlPortion(content, patchReader, portionSize, PATCH_REPORT_INDEX);
}

Maybe process each file separately with its own thread and memory and then combine the 2 after it is all loaded into memory.

Edit: It finally finished after a bit more than 1 hour.

@rnveach rnveach changed the title patch-diff-report-tool: multi thread patch-diff-report-tool: add multi-thread support Jan 13, 2019
@rnveach
Copy link
Member Author

rnveach commented Jan 13, 2019

We could split these into 2 threads, but this at best this will only cut the amount of time needed by half. At best is half because we are doubling the number of threads, but we still need to merge the 2 back into 1 so we probably never reach perfection of half the time.
To split into more threads would require splitting up the XML into halves/quarters/etc. but I'm not sure how we can do that quickly without damaging the XML structure unless we force guarantees on how the XML file is structured.

@rnveach rnveach changed the title patch-diff-report-tool: add multi-thread support patch-diff-report-tool: add multi-thread support to CheckstyleReportsParser Jan 13, 2019
@rnveach
Copy link
Member Author

rnveach commented Jan 13, 2019

Starting to test out new change. Will update post as going on.
See rnveach@10fc72a

Old run:

www-data@ubuntu17:/tmp/test1$ time java -jar /home/ricky/opensource/contribution/patch-diff-report-tool/target/patch-diff-report-tool-0.1-SNAPSHOT-jar-with-dependencies.jar --baseReport /home/ricky/regression_reports/savemaster/openjdk10/results.xml --patchReport /home/ricky/regression_reports/savepull/openjdk10/results.xml --output /tmp/test1 --baseConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --patchConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --refFiles /home/ricky/regression_repositories/openjdk10
patch-diff-report-tool execution started.
XML parsing is started.
Creation of configuration report is started.
Creation of diff html site is started.
Creation of the result site succeed.
patch-diff-report-tool execution finished.

real	60m40.815s
user	60m53.476s
sys	0m2.844s

New run with XML reading multi-threaded:

www-data@ubuntu17:/tmp/test1$ time java -jar /home/ricky/opensource/contribution/patch-diff-report-tool/target/patch-diff-report-tool-0.1-SNAPSHOT-jar-with-dependencies.jar --baseReport /home/ricky/regression_reports/savemaster/openjdk10/results.xml --patchReport /home/ricky/regression_reports/savepull/openjdk10/results.xml --output /tmp/test1 --baseConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --patchConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --refFiles /home/ricky/regression_repositories/openjdk10
patch-diff-report-tool execution started.
XML parsing is started.
Start Parse. 2019.01.13 14.32.19.786
Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 14.32.19.791
Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 14.32.19.791
Done Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 14.32.29.465
Done Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 14.32.29.477
Merging XML results. 2019.01.13 14.32.29.477
Generate Diff Statistics.2019.01.13 15.33.43.012
End Parse. 2019.01.13 15.33.43.020
Creation of configuration report is started.
Creation of diff html site is started.
Creation of the result site succeed.
patch-diff-report-tool execution finished.

real	61m34.888s
user	61m59.280s
sys	0m2.752s

@rnveach
Copy link
Member Author

rnveach commented Jan 13, 2019

I can already tell the change won't have much impact.
Parsing the XML files finishes pretty fast (~10 seconds):

Start Parse. 2019.01.13 14.32.19.786
Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 14.32.19.791
Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 14.32.19.791
Done Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 14.32.29.465
Done Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 14.32.29.477

It is the merging that it is now stuck on. I will have to look at the method more closely, but the next obvious area of slowdown would be DiffReport.addRecords when it has to produce the differences.

Edit:
Yes it is the diff that is the slowdown. I can see it stalling on single file diffs like /home/ricky/regression_repositories/openjdk10/test/java/lang/instrument/BigClass.java.

Edit 2:
Latest change is definately seeing a performance increase.
See rnveach@51636a1

www-data@ubuntu17:/tmp/test1$ time java -jar /home/ricky/opensource/contribution/patch-diff-report-tool/target/patch-diff-report-tool-0.1-SNAPSHOT-jar-with-dependencies.jar --baseReport /home/ricky/regression_reports/savemaster/openjdk10/results.xml --patchReport /home/ricky/regression_reports/savepull/openjdk10/results.xml --output /tmp/test1 --baseConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --patchConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --refFiles /home/ricky/regression_repositories/openjdk10
patch-diff-report-tool execution started.
XML parsing is started.
Start Parse. 2019.01.13 16.12.46.652
Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 16.12.46.657
Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 16.12.46.657
Done Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 16.12.56.367
Done Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 16.12.56.432
Merging XML results. 2019.01.13 16.12.56.433
Generate Diff Statistics.2019.01.13 16.51.02.379
End Parse. 2019.01.13 16.51.02.397
Creation of configuration report is started.
Creation of diff html site is started.
Creation of the result site succeed.
patch-diff-report-tool execution finished.

real	39m15.898s
user	76m37.736s
sys	0m2.956s

@rnveach
Copy link
Member Author

rnveach commented Jan 14, 2019

I got it down to 20 minutes with the latest changes (originally 1 hour) using all 4 threads.
See rnveach@a5455f2

www-data@ubuntu17:/tmp/test1$ time java -jar /home/ricky/opensource/contribution/patch-diff-report-tool/target/patch-diff-report-tool-0.1-SNAPSHOT-jar-with-dependencies.jar --baseReport /home/ricky/regression_reports/savemaster/openjdk10/results.xml --patchReport /home/ricky/regression_reports/savepull/openjdk10/results.xml --output /tmp/test1 --baseConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --patchConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --refFiles /home/ricky/regression_repositories/openjdk10
patch-diff-report-tool execution started.
XML parsing is started.
Start Parse. 2019.01.13 20.36.02.781
Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 20.36.02.786
Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 20.36.02.786
Done Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 20.36.12.516
Done Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 20.36.12.603
Merging XML results. 2019.01.13 20.36.12.603
Generate Diff Statistics.2019.01.13 20.56.36.905
End Parse. 2019.01.13 20.56.36.913
Creation of configuration report is started.
Creation of diff html site is started.
Creation of the result site succeed.
patch-diff-report-tool execution finished.
^C
real	22m11.856s
user	81m50.440s
sys	0m9.072s

For whatever reason time doesn't end when the utility ends so I had to cancel it with Ctrl + C.

diff utility shows the output is the same between the 2 runs.

@rnveach
Copy link
Member Author

rnveach commented Jan 14, 2019

For whatever reason time doesn't end when the utility ends so I had to cancel it with Ctrl + C.

The ExecutorService has to be shutdown manually otherwise the threads continue to live in the background and hold up the process from exiting. A regression test run wouldn't finish without doing this.

@romani I will continue testing with regression to make sure I don't see anything else.

rnveach added a commit to rnveach/contribution that referenced this issue Oct 23, 2020
pbludov pushed a commit to pbludov/contribution that referenced this issue Oct 31, 2020
pbludov pushed a commit to pbludov/contribution that referenced this issue Oct 31, 2020
pbludov pushed a commit to pbludov/contribution that referenced this issue Oct 31, 2020
pbludov pushed a commit to pbludov/contribution that referenced this issue Oct 31, 2020
pbludov pushed a commit to pbludov/contribution that referenced this issue Oct 31, 2020
pbludov pushed a commit to pbludov/contribution that referenced this issue Oct 31, 2020
pbludov pushed a commit to pbludov/contribution that referenced this issue Nov 1, 2020
pbludov pushed a commit to pbludov/contribution that referenced this issue Nov 1, 2020
pbludov pushed a commit to pbludov/contribution that referenced this issue Nov 4, 2020
pbludov pushed a commit to pbludov/contribution that referenced this issue Nov 4, 2020
pbludov pushed a commit to pbludov/contribution that referenced this issue Nov 4, 2020
rnveach added a commit to rnveach/contribution that referenced this issue Jan 8, 2021
rnveach added a commit to rnveach/contribution that referenced this issue Jun 11, 2021
rnveach added a commit to rnveach/contribution that referenced this issue Oct 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant