Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make tests run faster #517

Open
edi9999 opened this issue May 13, 2020 · 25 comments
Open

Make tests run faster #517

edi9999 opened this issue May 13, 2020 · 25 comments

Comments

@edi9999
Copy link
Member

edi9999 commented May 13, 2020

I am very willing to have fast tests. This is because if you have tests that take less than a second to run, it is possible to have them run in watch mode, and save your changes and see immediately that your tests are still passing.

With the time however, the tests are running in a bit more than 1 second in watch mode (on my computer, it takes 1940ms on average).

I think this could be improved upon.

First things I would like to do is to measure things. There are separate steps for running the tests :

  1. Load all docx fixtures into nodejs buffers
  2. Unzip the docx with pizzip
  3. Do some templating with docxtemplater
  4. Have some expectations on the output.

Since 1. is probably cached by the "page cache" : https://en.wikipedia.org/wiki/Page_cache

And I expect that 3. should be fast, I'm guessing that 2 && 4 could be improved upon.

Measuring is really the first step, the ideas should not be implemented before doing proper measuring and reporting the results in this issue.

Idea 1 (cached zip)

One of my ideas would be that for tests, we don't use Pizzip to unzip the docx for each test run (which takes time), but instead, we could create a mock that would have a "cached" version of a zip, which would be stored as a folder. For example, if you have in : examples/foo.docx , a docx document is a zipfile which you can unzip. If you unzip that, you get a folder like this :

foo
├── [Content_Types].xml
├── docProps
│   ├── app.xml
│   └── core.xml
├── _rels
└── word
    ├── document.xml
    ├── fontTable.xml
    ├── numbering.xml
    ├── _rels
    │   └── document.xml.rels
    ├── settings.xml
    ├── stylesWithEffects.xml
    ├── styles.xml
    ├── theme
    │   └── theme1.xml
    └── webSettings.xml

We could store this folder instead of foo.docx, and by creating a good mock of Pizzip, we could avoid to unzip the docx for each test run but instead read the files that we want from disk.

Idea 2 (parallel testing)

As far as I can tell, mocha currently runs each test suite sequentially (on one CPU only). Since most computers nowadays have at least 4 or 8 CPUs, it would make sense to run the tests in multiple processes.

@edi9999
Copy link
Member Author

edi9999 commented May 21, 2020

For the change of zip loading, the code that is called a lot is the load function :

function load(name, content, obj) {
         // name is the full file name for example :   "name": "tag-example.docx"
         // content is the content of that filename, which is a binary string
        // obj is a cache of each docxtemplater instance that was created using load.
	const zip = new PizZip(content);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	return obj[name];
}

Since this load function calls new PizZip(content), if new PizZip is slow, then by changing PizZip to something faster would make the tests run faster.

It would be interesting to see, for a test run, how much time is spent running new Pizzip(content) here.

@edi9999
Copy link
Member Author

edi9999 commented May 21, 2020

If the measurements make it clear that PizZip is slow, it would make sense to write basically :

const zip = new MockedZip(name);

and the MockedZip would "behave" exactly as the PizZip library, except that underneath, it uses a folder which is already unzipped.

@edi9999
Copy link
Member Author

edi9999 commented May 21, 2020

The two ideas I had were just a starting point, probably by using a profiler we could find which spots cause the slowlessness.

@edi9999
Copy link
Member Author

edi9999 commented May 21, 2020

You can now to use npm run test:watch (after pulling since I just pushed this).

@shadab14meb346
Copy link
Contributor

shadab14meb346 commented May 21, 2020

Benchmark Results

method for this process.hrtime();

  1. load function mentioned in this discussion is called total 402 times for complete test using command npm test. And it has taken an average of 0.612358 seconds for almost 10 times running it.
    It varies from 0.742381 seconds to 0.49462005 seconds for my computer which i5 third generation.
    Kazam_screenshot_00004 Benchmark Results Pic1

git diff was not showing the console.log part of code so included screenshot showing the comparing. Left side window has old code and right side window have new code to benchmark allload function calls.
benchmark-code-changes Benchmark Results Pic2

let loadFunctionCallFrequency = 0;
let totalCallTimeInNanoSeconds = 0;
function load(name, content, obj) {
	let start = process.hrtime();
	loadFunctionCallFrequency++;
	const zip = new PizZip(content);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	start = process.hrtime(start);
	totalCallTimeInNanoSeconds += start[1];
	console.log(
		`In calling all ${loadFunctionCallFrequency} times load function total time taken in seconds:`,
		totalCallTimeInNanoSeconds / 1e9
	);
	console.log(
		"Total number of load function calls:",
		loadFunctionCallFrequency
	);
	return obj[name];
}

Benchmark Results Code Block1

My Environment:
NodeV12.16.1 NPMv6.13.4

For the change of zip loading, the code that is called a lot is the load function :

function load(name, content, obj) {
         // name is the full file name for example :   "name": "tag-example.docx"
         // content is the content of that filename, which is a binary string
        // obj is a cache of each docxtemplater instance that was created using load.
	const zip = new PizZip(content);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	return obj[name];
}

Since this load function calls new PizZip(content), if new PizZip is slow, then by changing PizZip to something faster would make the tests run faster.

It would be interesting to see, for a test run, how much time is spent running new Pizzip(content) here.

@edi9999
Copy link
Member Author

edi9999 commented May 21, 2020

Can you do the same measurements with npm run test:es6:fast ?

Here I suspect that you are running all tests (even the ones we know take more than 2 or 3 seconds).

When I run npm run test:es6:fast it runs 288 tests.

Can you do the same measurement just for new PizZip, ie not include the new Docxtemplater ?

@shadab14meb346
Copy link
Contributor

shadab14meb346 commented May 21, 2020

Can you do the same measurements with npm run test:es6:fast ?

Here I suspect that you are running all tests (even the ones we know take more than 2 or 3 seconds).

When I run npm run test:es6:fast it runs 288 tests.

Can you do the same measurement just for new PizZip, eg not include the new Docxtemplater ?

Benchmark Results

  1. by using test command npm run test:es6:fast. For both new PizZip and new Docxtemplater

Average time taken 0.5015457105 seconds for 10 test runs.

IndividualTimesInSeconds:0.493444716,0.532960591,0.518716382,0.506935504,0.497533393,0.491868178,0.47420252,0.523161315,0.497780322,0.478854184

Code block for this test is exactly same as this reply Benchmark Results Code Block1
Kazam_screenshot_00005
2.
As asked by you

Can you do the same measurement just for new PizZip, eg not include the new Docxtemplater ?

By this what I am able to understand is two things.

2.1 Do benchmarking for load function just with new PizZip. If this is the case below code block I have used to benchmark it. But as expected it is failing the test cases, it is giving error TypeError: obj[name].loadZip is not a function and I am also not getting any time console.

let loadFunctionCallFrequency = 0;
let totalCallTimeInNanoSeconds = 0;
function load(name, content, obj) {
	let start = process.hrtime();
	loadFunctionCallFrequency++;
	const zip = new PizZip(content);
	start = process.hrtime(start);
	totalCallTimeInNanoSeconds += start[1];
	console.log(
		`In calling all ${loadFunctionCallFrequency} times load function with only PizZip total time taken in seconds:`,
		totalCallTimeInNanoSeconds / 1e9
	);
	console.log(
		"Total number of load function calls with only PizZip:",
		loadFunctionCallFrequency
	);
	// obj[name] = new Docxtemplater();
	// obj[name].loadZip(zip);
	// obj[name].loadedName = name;
	// obj[name].loadedContent = content;
	return obj[name];
}

Benchmark Results Code Block2

2.2 And case two that I should benchmark new PizZip standalone by taking a file as an example benchmarkPizZip.js file and require PizZip in that and run node benchmarkPizZip.js as a normal node process. The expected code for benchmarkPizZip.js is below. I tried this way and for the file cyrillic.docx in the examples folder the average timings in seconds is 0.0144781315 for running 5 times.

const PizZip = require("pizzip");
const fs = require("fs");
const path = require("path");
//Load the docx file as a binary
const content = fs.readFileSync(
	path.resolve(__dirname, "../../examples/cyrillic.docx"),
	"binary"
);
let totalCallTimeInNanoSeconds = 0;
let start = process.hrtime();
const zip = new PizZip(content);
start = process.hrtime(start);
totalCallTimeInNanoSeconds = start[1];
console.log(
	`In calling PizZip time taken in seconds:`,
	totalCallTimeInNanoSeconds / 1e9
);

Benchmark Results Code Block3

@edi9999
Copy link
Member Author

edi9999 commented May 22, 2020

I meant measuring the time of new Pizzip, but still keep everything as is. So almost 2.1, but without the comments :

let loadFunctionCallFrequency = 0;
let totalCallTimeInNanoSeconds = 0;
function load(name, content, obj) {
	let start = process.hrtime();
	loadFunctionCallFrequency++;
	const zip = new PizZip(content);
	start = process.hrtime(start);
	totalCallTimeInNanoSeconds += start[1];
	console.log(
		`In calling all ${loadFunctionCallFrequency} times load function with only PizZip total time taken in seconds:`,
		totalCallTimeInNanoSeconds / 1e9
	);
	console.log(
		"Total number of load function calls with only PizZip:",
		loadFunctionCallFrequency
	);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	return obj[name];
}

@shadab14meb346
Copy link
Contributor

I meant measuring the time of new Pizzip, but still keep everything as is. So almost 2.1, but without the comments :

let loadFunctionCallFrequency = 0;
let totalCallTimeInNanoSeconds = 0;
function load(name, content, obj) {
	let start = process.hrtime();
	loadFunctionCallFrequency++;
	const zip = new PizZip(content);
	start = process.hrtime(start);
	totalCallTimeInNanoSeconds += start[1];
	console.log(
		`In calling all ${loadFunctionCallFrequency} times load function with only PizZip total time taken in seconds:`,
		totalCallTimeInNanoSeconds / 1e9
	);
	console.log(
		"Total number of load function calls with only PizZip:",
		loadFunctionCallFrequency
	);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	return obj[name];
}

Ya sure now I understood.
Here are the results just for PizZip.
I ran it 5 times. test command npm run test:es6:fast
Average time:0.2619314742 seconds
Kazam_screenshot_00006

@edi9999
Copy link
Member Author

edi9999 commented May 22, 2020

Ok, so it means 0.26 seconds of improvement for a total run time of about 2 seconds.

I think in this case, it doesn't make much sense to try to optimize it, since it won't be possible to do an improvement of more than 10%.

Can you try to profile the test code to see if there are some bottlenecks that could be made faster ?

Else, I think it would maybe be faster by parallelizing, I'm not sure how this is doable with mocha.

@shadab14meb346
Copy link
Contributor

shadab14meb346 commented May 22, 2020

For my computer the test with this command npm run test:es6:fast is taking total 3 seconds.
Currently I am looking into how to profile test cases code.
Later I will look into parallel testing in mocha.

@shadab14meb346
Copy link
Contributor

I was just giving an initial try for parallel testing idea and I installed mocha-parallel-tests npm module and tried running our test cases but it is taking even more time as compared to npm run test:es6:fast.
So some issue is there with the mocha-parallel-tests module.
Continuing my search for test cases code profiling but could not something useful yet.

@edi9999
Copy link
Member Author

edi9999 commented May 22, 2020

When you run mocha-parallel-tests, was it with the FAST=true environment variable ?

The command run by npm run test:es6:fast is :

"test:es6:fast": "FAST=true mocha --full-trace --check-leaks es6/tests/index.js",

If you use mocha-parallel-tests, you should use the same env to be able to compare.

@shadab14meb346
Copy link
Contributor

Yes I am using the same env to compare both.

"parallel-test": "FAST=true mocha-parallel-tests --full-trace --check-leaks es6/tests/index.js"
 "test:es6:fast": "FAST=true mocha --full-trace --check-leaks es6/tests/index.js"

@shadab14meb346
Copy link
Contributor

After using nodejs inspector with this command FAST=true ./node_modules/.bin/mocha --inspect-brk --full-trace --check-leaks es6/tests/index.js. And doing a CPU profiling. Below results I deduced.
Functions taking more than 500ms time

  1. LoadFile file-path://docxtemplater/es6/tests/utils.js

  2. asText in PizZip file-path://docxtemplater/node_modules/pizzip/js/object.js

  3. nodebuffer file-path://docxtemplater/node_modules/pizzip/js/utils.js

  4. load file-path://docxtemplater/es6/tests/utils.js

  5. render file-path://docxtemplater/es6/docxtemplater.js

  6. ZipEntries in PizZip file-path://docxtemplater/node_modules/pizzip/js/zipEntries.js

  7. dataToString in PizZip file-path://docxtemplater/node_modules/pizzip/js/object.js

  8. compile file-path://docxtemplater/es6/docxtemplater.js

  9. makeDocx file-path://docxtemplater/es6/tests/utils.js

  10. module.exports in PizZip file-path://docxtemplater/node_modules/pizzip/js/load.js

  11. PizZip file-path://docxtemplater/node_modules/pizzip/js/index.js

  12. load file-path://docxtemplater/node_modules/pizzip/js/zipEntries.js

  13. readLocalFiles in PizZip file-path://docxtemplater/node_modules/pizzip/js/zipEntries.js

  14. module.exports file-path://docxtemplater/node_modules/pizzip/js/nodeBuffer.js

  15. exports.TransformTo file-path://docxtemplater/node_modules/pizzip/js/utils.js

As you can see a lot of functions which are taking more time are associated with PizZip and I guess if we can moc PizZip as discussed by you here maybe we can increase the test speed.

@edi9999
Copy link
Member Author

edi9999 commented May 30, 2020

Yes, I think you're right, lets give it a go then.

@shadab14meb346
Copy link
Contributor

I will start looking into it how to write a mock for PizZip. Please do share any suggestions if you feel so.

@edi9999
Copy link
Member Author

edi9999 commented May 31, 2020

Hello @shadab14meb346 ,

the idea is to mock PizZip, meaning that it should behave in the same way as the PizZip library but be as fast as possible.

For example, one should be able to do :

var zip = new MockedPizzip(filepath);
zip.file(/./).map(({ name }) => {
    // this should print all filenames of the faked zip
    console.log(name);
});

const content = zip.files["word/document.xml"].asText();
// This should log the content of word/document.xml
console.log(content);

const newContent = "foobar";
// This should change the content of word/document.xml
zip.file("word/document.xml", newContent, { createFolders: true });

@edi9999
Copy link
Member Author

edi9999 commented Jun 2, 2020

Does this make things more clear ?

@shadab14meb346
Copy link
Contributor

Hi. Ya, sure thanks. it is helping me.
So currently I am trying to understand PizZip working. For this,
1. I made a testZip.zip zip file using PizZip. The folder unzips as below

testZip
├── Hello.txt
├── documents
        ├── users.docx

2. Now I am trying to use the step onetestZip.zipzip file and read the file names by using this code block

const PizZip = require("pizzip");
const path = require("path");
const zippedFilePath = path.join(__dirname, "/testZip.zip");
const zip = new PizZip(zippedFilePath);
zip.file(/./).map(({ name }) => {
    // this should print all filenames of the faked zip
    console.log(name);
});

But this is giving me the below error

docxtemplater/node_modules/pizzip/js/zipEntries.js:191
throw new Error("Can't find end of central directory : is this a zip file ?");
        ^
Error: Can't find end of central directory : is this a zip file ?```
Can you suggest something, please?

@edi9999
Copy link
Member Author

edi9999 commented Jun 2, 2020

I think it is because when using a real pizzip instance, you need to use the file contents, not the file path.

IE :

const zippedFilePath = path.join(__dirname, "/testZip.zip");
const content = fs.readFileSync(zippedFilePath).toString()
const zip = new PizZip(content);

@shadab14meb346
Copy link
Contributor

I think it is because when using a real pizzip instance, you need to use the file contents, not the file path.

IE :

const zippedFilePath = path.join(__dirname, "/testZip.zip");
const content = fs.readFileSync(zippedFilePath).toString()
const zip = new PizZip(content);

It got worked but in the place of const content = fs.readFileSync(zippedFilePath).toString() line it should be like const content = fs.readFileSync(zippedFilePath). So the final working code block will be like below

const zippedFilePath = path.join(__dirname, "/testZip.zip");
const content = fs.readFileSync(zippedFilePath);
const zip = new PizZip(content);

@shadab14meb346
Copy link
Contributor

I am very willing to have fast tests. This is because if you have tests that take less than a second to run, it is possible to have them run in watch mode, and save your changes and see immediately that your tests are still passing.

With the time however, the tests are running in a bit more than 1 second in watch mode (on my computer, it takes 1940ms in average).

I think this could be improved upon.

First things I would like to do is to measure things. They are separate steps for running the tests :

  1. Load all docx fixtures into nodejs buffers
  2. Unzip the docx with pizzip
  3. Do some templating with docxtemplater
  4. Have some expectations on the output.

Since 1. is probably cached by the "page cache" : https://en.wikipedia.org/wiki/Page_cache

And I expect that 3. should be fast, I'm guessing that 2 && 4 could be improved upon.

Measuring is really the first step, the ideas should not be implemented before doing proper measuring and reporting the results in this issue.

Idea 1 (cached zip)

One of my ideas would be that for tests, we don't use Pizzip to unzip the docx for each test run (which takes time), but instead, we could create a mock that would have a "cached" version of a zip, which would be stored as a folder. For example, if you have in : examples/foo.docx , a docx document is a zipfile which you can unzip. If you unzip that, you get a folder like this :

foo
├── [Content_Types].xml
├── docProps
│   ├── app.xml
│   └── core.xml
├── _rels
└── word
    ├── document.xml
    ├── fontTable.xml
    ├── numbering.xml
    ├── _rels
    │   └── document.xml.rels
    ├── settings.xml
    ├── stylesWithEffects.xml
    ├── styles.xml
    ├── theme
    │   └── theme1.xml
    └── webSettings.xml

We could store this folder instead of foo.docx, and by creating a good mock of Pizzip, we could avoid to unzip the docx for each test run but instead read the files that we want from disk.

Idea 2 (parallel testing)

As far as I can tell, mocha currently runs each test suite sequentially (on one CPU only). Since most computers nowadays have at least 4 or 8 CPUs, it would make sense to run the tests in multiple processes.

I tried to understand but I am facing a little difficulty so can you please help me In the Idea 1 (cached zip) as you mentioned

For example, if you have in : examples/foo.docx , a docx document is a zipfile which you can unzip. If you unzip that, you get a folder like this :

Can you please point me out the folder and line number where the unzipping of docx file is happening for the test cases.
And the folder and line number where PizZip is using the unzipped files to do all sort of things like below.

var zip = new MockedPizzip(filepath);
zip.file(/./).map(({ name }) => {
    // this should print all filenames of the faked zip
    console.log(name);
});

const content = zip.files["word/document.xml"].asText();
// This should log the content of word/document.xml
console.log(content);

const newContent = "foobar";
// This should change the content of word/document.xml
zip.file("word/document.xml", newContent, { createFolders: true });

@edi9999
Copy link
Member Author

edi9999 commented Jun 11, 2020

The zip creation is done here :

 docxtemplater/es6/tests/utils.js line 403
	const zip = new PizZip(content);

The usage of the zip is done at several places, for example :

docxtemplater/es6/collect-content-types.js line 15 
docxtemplater/es6/docxtemplater.js line 45
docxtemplater/es6/docxtemplater.js line 242
docxtemplater/es6/docxtemplater.js line 342
docxtemplater/es6/docxtemplater.js line 352
docxtemplater/es6/docxtemplater.js line 363

In all these cases, the zip instance created in docxtemplater/es6/tests/utils.js line 403 is the one that is going to be used in all other places, what I mean by that is that it is the exact same object.

@edi9999
Copy link
Member Author

edi9999 commented Jul 30, 2020

@shadab14meb346 are you alright ? I think I will tackle this issue soon if I don't get a response in the next few days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants