Make tests run faster #517

edi9999 · 2020-05-13T20:41:24Z

I am very willing to have fast tests. This is because if you have tests that take less than a second to run, it is possible to have them run in watch mode, and save your changes and see immediately that your tests are still passing.

With the time however, the tests are running in a bit more than 1 second in watch mode (on my computer, it takes 1940ms on average).

I think this could be improved upon.

First things I would like to do is to measure things. There are separate steps for running the tests :

Load all docx fixtures into nodejs buffers
Unzip the docx with pizzip
Do some templating with docxtemplater
Have some expectations on the output.

Since 1. is probably cached by the "page cache" : https://en.wikipedia.org/wiki/Page_cache

And I expect that 3. should be fast, I'm guessing that 2 && 4 could be improved upon.

Measuring is really the first step, the ideas should not be implemented before doing proper measuring and reporting the results in this issue.

Idea 1 (cached zip)

One of my ideas would be that for tests, we don't use Pizzip to unzip the docx for each test run (which takes time), but instead, we could create a mock that would have a "cached" version of a zip, which would be stored as a folder. For example, if you have in : examples/foo.docx , a docx document is a zipfile which you can unzip. If you unzip that, you get a folder like this :

foo
├── [Content_Types].xml
├── docProps
│   ├── app.xml
│   └── core.xml
├── _rels
└── word
    ├── document.xml
    ├── fontTable.xml
    ├── numbering.xml
    ├── _rels
    │   └── document.xml.rels
    ├── settings.xml
    ├── stylesWithEffects.xml
    ├── styles.xml
    ├── theme
    │   └── theme1.xml
    └── webSettings.xml

We could store this folder instead of foo.docx, and by creating a good mock of Pizzip, we could avoid to unzip the docx for each test run but instead read the files that we want from disk.

Idea 2 (parallel testing)

As far as I can tell, mocha currently runs each test suite sequentially (on one CPU only). Since most computers nowadays have at least 4 or 8 CPUs, it would make sense to run the tests in multiple processes.

The text was updated successfully, but these errors were encountered:

edi9999 · 2020-05-21T15:18:57Z

For the change of zip loading, the code that is called a lot is the load function :

function load(name, content, obj) {
         // name is the full file name for example :   "name": "tag-example.docx"
         // content is the content of that filename, which is a binary string
        // obj is a cache of each docxtemplater instance that was created using load.
	const zip = new PizZip(content);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	return obj[name];
}

Since this load function calls new PizZip(content), if new PizZip is slow, then by changing PizZip to something faster would make the tests run faster.

It would be interesting to see, for a test run, how much time is spent running new Pizzip(content) here.

edi9999 · 2020-05-21T15:22:23Z

If the measurements make it clear that PizZip is slow, it would make sense to write basically :

const zip = new MockedZip(name);

and the MockedZip would "behave" exactly as the PizZip library, except that underneath, it uses a folder which is already unzipped.

edi9999 · 2020-05-21T15:25:54Z

The two ideas I had were just a starting point, probably by using a profiler we could find which spots cause the slowlessness.

edi9999 · 2020-05-21T15:34:35Z

You can now to use npm run test:watch (after pulling since I just pushed this).

shadab14meb346 · 2020-05-21T20:11:56Z

Benchmark Results

method for this process.hrtime();

load function mentioned in this discussion is called total 402 times for complete test using command npm test. And it has taken an average of 0.612358 seconds for almost 10 times running it.
It varies from 0.742381 seconds to 0.49462005 seconds for my computer which i5 third generation.
Benchmark Results Pic1

git diff was not showing the console.log part of code so included screenshot showing the comparing. Left side window has old code and right side window have new code to benchmark allload function calls.
Benchmark Results Pic2

let loadFunctionCallFrequency = 0;
let totalCallTimeInNanoSeconds = 0;
function load(name, content, obj) {
	let start = process.hrtime();
	loadFunctionCallFrequency++;
	const zip = new PizZip(content);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	start = process.hrtime(start);
	totalCallTimeInNanoSeconds += start[1];
	console.log(
		`In calling all ${loadFunctionCallFrequency} times load function total time taken in seconds:`,
		totalCallTimeInNanoSeconds / 1e9
	);
	console.log(
		"Total number of load function calls:",
		loadFunctionCallFrequency
	);
	return obj[name];
}

Benchmark Results Code Block1

My Environment:
NodeV12.16.1 NPMv6.13.4

For the change of zip loading, the code that is called a lot is the load function :
function load(name, content, obj) {
         // name is the full file name for example :   "name": "tag-example.docx"
         // content is the content of that filename, which is a binary string
        // obj is a cache of each docxtemplater instance that was created using load.
	const zip = new PizZip(content);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	return obj[name];
}
Since this load function calls new PizZip(content), if new PizZip is slow, then by changing PizZip to something faster would make the tests run faster.

It would be interesting to see, for a test run, how much time is spent running new Pizzip(content) here.

edi9999 · 2020-05-21T20:18:55Z

Can you do the same measurements with npm run test:es6:fast ?

Here I suspect that you are running all tests (even the ones we know take more than 2 or 3 seconds).

When I run npm run test:es6:fast it runs 288 tests.

Can you do the same measurement just for new PizZip, ie not include the new Docxtemplater ?

shadab14meb346 · 2020-05-21T22:35:23Z

Can you do the same measurements with npm run test:es6:fast ?

Here I suspect that you are running all tests (even the ones we know take more than 2 or 3 seconds).

When I run npm run test:es6:fast it runs 288 tests.

Can you do the same measurement just for new PizZip, eg not include the new Docxtemplater ?

Benchmark Results

by using test command npm run test:es6:fast. For both new PizZip and new Docxtemplater

Average time taken 0.5015457105 seconds for 10 test runs.

IndividualTimesInSeconds:0.493444716,0.532960591,0.518716382,0.506935504,0.497533393,0.491868178,0.47420252,0.523161315,0.497780322,0.478854184

Code block for this test is exactly same as this reply Benchmark Results Code Block1

2.
As asked by you

Can you do the same measurement just for new PizZip, eg not include the new Docxtemplater ?

By this what I am able to understand is two things.

2.1 Do benchmarking for load function just with new PizZip. If this is the case below code block I have used to benchmark it. But as expected it is failing the test cases, it is giving error TypeError: obj[name].loadZip is not a function and I am also not getting any time console.

let loadFunctionCallFrequency = 0;
let totalCallTimeInNanoSeconds = 0;
function load(name, content, obj) {
	let start = process.hrtime();
	loadFunctionCallFrequency++;
	const zip = new PizZip(content);
	start = process.hrtime(start);
	totalCallTimeInNanoSeconds += start[1];
	console.log(
		`In calling all ${loadFunctionCallFrequency} times load function with only PizZip total time taken in seconds:`,
		totalCallTimeInNanoSeconds / 1e9
	);
	console.log(
		"Total number of load function calls with only PizZip:",
		loadFunctionCallFrequency
	);
	// obj[name] = new Docxtemplater();
	// obj[name].loadZip(zip);
	// obj[name].loadedName = name;
	// obj[name].loadedContent = content;
	return obj[name];
}

Benchmark Results Code Block2

2.2 And case two that I should benchmark new PizZip standalone by taking a file as an example benchmarkPizZip.js file and require PizZip in that and run node benchmarkPizZip.js as a normal node process. The expected code for benchmarkPizZip.js is below. I tried this way and for the file cyrillic.docx in the examples folder the average timings in seconds is 0.0144781315 for running 5 times.

const PizZip = require("pizzip");
const fs = require("fs");
const path = require("path");
//Load the docx file as a binary
const content = fs.readFileSync(
	path.resolve(__dirname, "../../examples/cyrillic.docx"),
	"binary"
);
let totalCallTimeInNanoSeconds = 0;
let start = process.hrtime();
const zip = new PizZip(content);
start = process.hrtime(start);
totalCallTimeInNanoSeconds = start[1];
console.log(
	`In calling PizZip time taken in seconds:`,
	totalCallTimeInNanoSeconds / 1e9
);

Benchmark Results Code Block3

edi9999 · 2020-05-22T06:25:38Z

I meant measuring the time of new Pizzip, but still keep everything as is. So almost 2.1, but without the comments :

let loadFunctionCallFrequency = 0;
let totalCallTimeInNanoSeconds = 0;
function load(name, content, obj) {
	let start = process.hrtime();
	loadFunctionCallFrequency++;
	const zip = new PizZip(content);
	start = process.hrtime(start);
	totalCallTimeInNanoSeconds += start[1];
	console.log(
		`In calling all ${loadFunctionCallFrequency} times load function with only PizZip total time taken in seconds:`,
		totalCallTimeInNanoSeconds / 1e9
	);
	console.log(
		"Total number of load function calls with only PizZip:",
		loadFunctionCallFrequency
	);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	return obj[name];
}

shadab14meb346 · 2020-05-22T06:52:04Z

I meant measuring the time of new Pizzip, but still keep everything as is. So almost 2.1, but without the comments :

let loadFunctionCallFrequency = 0;
let totalCallTimeInNanoSeconds = 0;
function load(name, content, obj) {
	let start = process.hrtime();
	loadFunctionCallFrequency++;
	const zip = new PizZip(content);
	start = process.hrtime(start);
	totalCallTimeInNanoSeconds += start[1];
	console.log(
		`In calling all ${loadFunctionCallFrequency} times load function with only PizZip total time taken in seconds:`,
		totalCallTimeInNanoSeconds / 1e9
	);
	console.log(
		"Total number of load function calls with only PizZip:",
		loadFunctionCallFrequency
	);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	return obj[name];
}

Ya sure now I understood.
Here are the results just for PizZip.
I ran it 5 times. test command npm run test:es6:fast
Average time:0.2619314742 seconds

edi9999 · 2020-05-22T14:18:34Z

Ok, so it means 0.26 seconds of improvement for a total run time of about 2 seconds.

I think in this case, it doesn't make much sense to try to optimize it, since it won't be possible to do an improvement of more than 10%.

Can you try to profile the test code to see if there are some bottlenecks that could be made faster ?

Else, I think it would maybe be faster by parallelizing, I'm not sure how this is doable with mocha.

shadab14meb346 · 2020-05-22T14:48:25Z

For my computer the test with this command npm run test:es6:fast is taking total 3 seconds.
Currently I am looking into how to profile test cases code.
Later I will look into parallel testing in mocha.

shadab14meb346 · 2020-05-22T18:07:28Z

I was just giving an initial try for parallel testing idea and I installed mocha-parallel-tests npm module and tried running our test cases but it is taking even more time as compared to npm run test:es6:fast.
So some issue is there with the mocha-parallel-tests module.
Continuing my search for test cases code profiling but could not something useful yet.

edi9999 · 2020-05-22T19:32:27Z

When you run mocha-parallel-tests, was it with the FAST=true environment variable ?

The command run by npm run test:es6:fast is :

"test:es6:fast": "FAST=true mocha --full-trace --check-leaks es6/tests/index.js",

If you use mocha-parallel-tests, you should use the same env to be able to compare.

shadab14meb346 · 2020-05-22T19:47:24Z

Yes I am using the same env to compare both.

"parallel-test": "FAST=true mocha-parallel-tests --full-trace --check-leaks es6/tests/index.js"
 "test:es6:fast": "FAST=true mocha --full-trace --check-leaks es6/tests/index.js"

shadab14meb346 · 2020-05-30T00:57:52Z

After using nodejs inspector with this command FAST=true ./node_modules/.bin/mocha --inspect-brk --full-trace --check-leaks es6/tests/index.js. And doing a CPU profiling. Below results I deduced.
Functions taking more than 500ms time

LoadFile file-path://docxtemplater/es6/tests/utils.js
asText in PizZip file-path://docxtemplater/node_modules/pizzip/js/object.js
nodebuffer file-path://docxtemplater/node_modules/pizzip/js/utils.js
load file-path://docxtemplater/es6/tests/utils.js
render file-path://docxtemplater/es6/docxtemplater.js
ZipEntries in PizZip file-path://docxtemplater/node_modules/pizzip/js/zipEntries.js
dataToString in PizZip file-path://docxtemplater/node_modules/pizzip/js/object.js
compile file-path://docxtemplater/es6/docxtemplater.js
makeDocx file-path://docxtemplater/es6/tests/utils.js
module.exports in PizZip file-path://docxtemplater/node_modules/pizzip/js/load.js
PizZip file-path://docxtemplater/node_modules/pizzip/js/index.js
load file-path://docxtemplater/node_modules/pizzip/js/zipEntries.js
readLocalFiles in PizZip file-path://docxtemplater/node_modules/pizzip/js/zipEntries.js
module.exports file-path://docxtemplater/node_modules/pizzip/js/nodeBuffer.js
exports.TransformTo file-path://docxtemplater/node_modules/pizzip/js/utils.js

As you can see a lot of functions which are taking more time are associated with PizZip and I guess if we can moc PizZip as discussed by you here maybe we can increase the test speed.

edi9999 · 2020-05-30T07:17:02Z

Yes, I think you're right, lets give it a go then.

shadab14meb346 · 2020-05-30T07:42:44Z

I will start looking into it how to write a mock for PizZip. Please do share any suggestions if you feel so.

edi9999 · 2020-05-31T15:02:50Z

Hello @shadab14meb346 ,

the idea is to mock PizZip, meaning that it should behave in the same way as the PizZip library but be as fast as possible.

For example, one should be able to do :

var zip = new MockedPizzip(filepath);
zip.file(/./).map(({ name }) => {
    // this should print all filenames of the faked zip
    console.log(name);
});

const content = zip.files["word/document.xml"].asText();
// This should log the content of word/document.xml
console.log(content);

const newContent = "foobar";
// This should change the content of word/document.xml
zip.file("word/document.xml", newContent, { createFolders: true });

edi9999 · 2020-06-02T07:55:48Z

Does this make things more clear ?

shadab14meb346 · 2020-06-02T08:17:57Z

Hi. Ya, sure thanks. it is helping me.
So currently I am trying to understand PizZip working. For this,
1. I made a testZip.zip zip file using PizZip. The folder unzips as below

testZip
├── Hello.txt
├── documents
        ├── users.docx

2. Now I am trying to use the step onetestZip.zipzip file and read the file names by using this code block

const PizZip = require("pizzip");
const path = require("path");
const zippedFilePath = path.join(__dirname, "/testZip.zip");
const zip = new PizZip(zippedFilePath);
zip.file(/./).map(({ name }) => {
    // this should print all filenames of the faked zip
    console.log(name);
});

But this is giving me the below error

docxtemplater/node_modules/pizzip/js/zipEntries.js:191
throw new Error("Can't find end of central directory : is this a zip file ?");
        ^
Error: Can't find end of central directory : is this a zip file ?```
Can you suggest something, please?

edi9999 · 2020-06-02T13:19:13Z

I think it is because when using a real pizzip instance, you need to use the file contents, not the file path.

IE :

const zippedFilePath = path.join(__dirname, "/testZip.zip");
const content = fs.readFileSync(zippedFilePath).toString()
const zip = new PizZip(content);

shadab14meb346 · 2020-06-02T15:32:43Z

I think it is because when using a real pizzip instance, you need to use the file contents, not the file path.

IE :
const zippedFilePath = path.join(__dirname, "/testZip.zip");
const content = fs.readFileSync(zippedFilePath).toString()
const zip = new PizZip(content);

It got worked but in the place of const content = fs.readFileSync(zippedFilePath).toString() line it should be like const content = fs.readFileSync(zippedFilePath). So the final working code block will be like below

const zippedFilePath = path.join(__dirname, "/testZip.zip");
const content = fs.readFileSync(zippedFilePath);
const zip = new PizZip(content);

shadab14meb346 · 2020-06-11T16:14:16Z

I am very willing to have fast tests. This is because if you have tests that take less than a second to run, it is possible to have them run in watch mode, and save your changes and see immediately that your tests are still passing.

With the time however, the tests are running in a bit more than 1 second in watch mode (on my computer, it takes 1940ms in average).

I think this could be improved upon.

First things I would like to do is to measure things. They are separate steps for running the tests :

Load all docx fixtures into nodejs buffers

Unzip the docx with pizzip

Do some templating with docxtemplater

Have some expectations on the output.

Since 1. is probably cached by the "page cache" : https://en.wikipedia.org/wiki/Page_cache

And I expect that 3. should be fast, I'm guessing that 2 && 4 could be improved upon.

Measuring is really the first step, the ideas should not be implemented before doing proper measuring and reporting the results in this issue.

Idea 1 (cached zip)

One of my ideas would be that for tests, we don't use Pizzip to unzip the docx for each test run (which takes time), but instead, we could create a mock that would have a "cached" version of a zip, which would be stored as a folder. For example, if you have in : examples/foo.docx , a docx document is a zipfile which you can unzip. If you unzip that, you get a folder like this :
foo
├── [Content_Types].xml
├── docProps
│   ├── app.xml
│   └── core.xml
├── _rels
└── word
    ├── document.xml
    ├── fontTable.xml
    ├── numbering.xml
    ├── _rels
    │   └── document.xml.rels
    ├── settings.xml
    ├── stylesWithEffects.xml
    ├── styles.xml
    ├── theme
    │   └── theme1.xml
    └── webSettings.xml
We could store this folder instead of foo.docx, and by creating a good mock of Pizzip, we could avoid to unzip the docx for each test run but instead read the files that we want from disk.

Idea 2 (parallel testing)

As far as I can tell, mocha currently runs each test suite sequentially (on one CPU only). Since most computers nowadays have at least 4 or 8 CPUs, it would make sense to run the tests in multiple processes.

I tried to understand but I am facing a little difficulty so can you please help me In the Idea 1 (cached zip) as you mentioned

For example, if you have in : examples/foo.docx , a docx document is a zipfile which you can unzip. If you unzip that, you get a folder like this :

Can you please point me out the folder and line number where the unzipping of docx file is happening for the test cases.
And the folder and line number where PizZip is using the unzipped files to do all sort of things like below.

var zip = new MockedPizzip(filepath);
zip.file(/./).map(({ name }) => {
    // this should print all filenames of the faked zip
    console.log(name);
});

const content = zip.files["word/document.xml"].asText();
// This should log the content of word/document.xml
console.log(content);

const newContent = "foobar";
// This should change the content of word/document.xml
zip.file("word/document.xml", newContent, { createFolders: true });

edi9999 · 2020-06-11T19:45:59Z

The zip creation is done here :

 docxtemplater/es6/tests/utils.js line 403
	const zip = new PizZip(content);

The usage of the zip is done at several places, for example :

docxtemplater/es6/collect-content-types.js line 15 
docxtemplater/es6/docxtemplater.js line 45
docxtemplater/es6/docxtemplater.js line 242
docxtemplater/es6/docxtemplater.js line 342
docxtemplater/es6/docxtemplater.js line 352
docxtemplater/es6/docxtemplater.js line 363

In all these cases, the zip instance created in docxtemplater/es6/tests/utils.js line 403 is the one that is going to be used in all other places, what I mean by that is that it is the exact same object.

edi9999 · 2020-07-30T20:45:17Z

@shadab14meb346 are you alright ? I think I will tackle this issue soon if I don't get a response in the next few days.

edi9999 added the complexity:hard label May 13, 2020

edi9999 added the enhancement label Oct 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make tests run faster #517

Make tests run faster #517

edi9999 commented May 13, 2020 •

edited

edi9999 commented May 21, 2020 •

edited

edi9999 commented May 21, 2020

edi9999 commented May 21, 2020

edi9999 commented May 21, 2020

shadab14meb346 commented May 21, 2020 •

edited

edi9999 commented May 21, 2020 •

edited

shadab14meb346 commented May 21, 2020 •

edited

edi9999 commented May 22, 2020

shadab14meb346 commented May 22, 2020

edi9999 commented May 22, 2020

shadab14meb346 commented May 22, 2020 •

edited

shadab14meb346 commented May 22, 2020

edi9999 commented May 22, 2020

shadab14meb346 commented May 22, 2020

shadab14meb346 commented May 30, 2020

edi9999 commented May 30, 2020

shadab14meb346 commented May 30, 2020

edi9999 commented May 31, 2020

edi9999 commented Jun 2, 2020

shadab14meb346 commented Jun 2, 2020

edi9999 commented Jun 2, 2020

shadab14meb346 commented Jun 2, 2020

shadab14meb346 commented Jun 11, 2020

Idea 1 (cached zip)

Idea 2 (parallel testing)

edi9999 commented Jun 11, 2020 •

edited

edi9999 commented Jul 30, 2020

Make tests run faster #517

Make tests run faster #517

Comments

edi9999 commented May 13, 2020 • edited

Idea 1 (cached zip)

Idea 2 (parallel testing)

edi9999 commented May 21, 2020 • edited

edi9999 commented May 21, 2020

edi9999 commented May 21, 2020

edi9999 commented May 21, 2020

shadab14meb346 commented May 21, 2020 • edited

Benchmark Results

edi9999 commented May 21, 2020 • edited

shadab14meb346 commented May 21, 2020 • edited

Benchmark Results

edi9999 commented May 22, 2020

shadab14meb346 commented May 22, 2020

edi9999 commented May 22, 2020

shadab14meb346 commented May 22, 2020 • edited

shadab14meb346 commented May 22, 2020

edi9999 commented May 22, 2020

shadab14meb346 commented May 22, 2020

shadab14meb346 commented May 30, 2020

edi9999 commented May 30, 2020

shadab14meb346 commented May 30, 2020

edi9999 commented May 31, 2020

edi9999 commented Jun 2, 2020

shadab14meb346 commented Jun 2, 2020

edi9999 commented Jun 2, 2020

shadab14meb346 commented Jun 2, 2020

shadab14meb346 commented Jun 11, 2020

Idea 1 (cached zip)

Idea 2 (parallel testing)

edi9999 commented Jun 11, 2020 • edited

edi9999 commented Jul 30, 2020

edi9999 commented May 13, 2020 •

edited

edi9999 commented May 21, 2020 •

edited

shadab14meb346 commented May 21, 2020 •

edited

edi9999 commented May 21, 2020 •

edited

shadab14meb346 commented May 21, 2020 •

edited

shadab14meb346 commented May 22, 2020 •

edited

edi9999 commented Jun 11, 2020 •

edited