src: rewrite task runner in c++ #52609

Merged
merged 15 commits into from
May 2, 2024

Conversation

anonrig
Member

@anonrig anonrig commented Apr 20, 2024

This is a rewrite of the task runner in C++. The benchmarks speak for themselves, with the caveat that support for --env-file and related CLI flags is removed from the task runner. I think the performance can be improved further, since we somehow still interact with V8.

While moving the implementation to C++, I've moved the tests for escapeShell to C++ as well, since we can no longer write a unit test that runs on the JS side. (Ref: please take a look at cctest/test_node_task_runner.cc.)

I'll investigate further after this pull request to reduce the task runner's run time to around 5ms.

❯ hyperfine '../node/main-branch --run test' '../node/cpp-rewrite --run test' 'npm run test' -i
Benchmark 1: ../node/main-branch --run test
  Time (mean ± σ):      28.9 ms ±   0.9 ms    [User: 24.2 ms, System: 3.4 ms]
  Range (min … max):    27.5 ms …  31.7 ms    96 runs

  Warning: Ignoring non-zero exit code.

Benchmark 2: ../node/cpp-rewrite --run test
  Time (mean ± σ):      18.3 ms ±   0.6 ms    [User: 16.0 ms, System: 1.5 ms]
  Range (min … max):    17.5 ms …  20.8 ms    139 runs

  Warning: Ignoring non-zero exit code.

Benchmark 3: npm run test
  Time (mean ± σ):     148.8 ms ±   2.0 ms    [User: 131.9 ms, System: 22.1 ms]
  Range (min … max):   145.3 ms … 154.5 ms    19 runs

  Warning: Ignoring non-zero exit code.

Summary
  ../node/cpp-rewrite --run test ran
    1.58 ± 0.07 times faster than ../node/main-branch --run test
    8.14 ± 0.29 times faster than npm run test

cc @nodejs/performance

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/startup

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Apr 20, 2024
Member

@mcollina mcollina left a comment

lgtm

@anonrig anonrig requested a review from lemire April 20, 2024 15:41

// Check if input contains any forbidden characters
// If it doesn't, return the input as is.
if (!std::regex_search(input, forbidden_characters)) {
Member

I ran into some problems with std::regex, in terms of platform support and performance, when I ported js2c to C++. I'd suggest just writing a wrapper that operates on the bytes as you find the characters, or you can use std::string::includes and std::string::search if the performance of this doesn't matter.

Member Author

I can replace this for sure but the rest of it is going to be a problem
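
A minimal sketch of the regex-free check suggested in this thread. The forbidden-character set and the names NeedsEscaping and EscapeShellSketch are assumptions made for illustration; this is not the code from this PR. Note that std::string has no includes or search members; the closest standard calls are find and find_first_of, which the sketch uses.

```cpp
#include <string>
#include <string_view>

// Hypothetical set of characters that force quoting. This list is an
// assumption for illustration, not the one used by the PR.
constexpr std::string_view kForbiddenChars = "\t\n\r \"#$&'()*;<>?\\`|~";

// Returns true if the input contains at least one forbidden character.
// find_first_of scans the bytes directly, so no std::regex is involved.
bool NeedsEscaping(std::string_view input) {
  return input.find_first_of(kForbiddenChars) != std::string_view::npos;
}

// Wraps the input in single quotes for a POSIX shell when needed.
// Embedded single quotes are rewritten as '\'' (close, escaped quote, reopen).
std::string EscapeShellSketch(std::string_view input) {
  if (input.empty()) return "''";
  if (!NeedsEscaping(input)) return std::string(input);

  std::string out;
  out.reserve(input.size() + 2);
  out.push_back('\'');
  for (char c : input) {
    if (c == '\'') {
      out += "'\\''";
    } else {
      out.push_back(c);
    }
  }
  out.push_back('\'');
  return out;
}
```

Under these assumptions, an input without forbidden characters is returned unchanged, while an input such as hello world comes back wrapped in single quotes.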

Member

@benjamingr benjamingr left a comment

Mostly LGTM, two issues (RE and mem leak pending)

@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 24, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 24, 2024
@lemire
Member

lemire commented Apr 24, 2024

@anonrig

I will review 'soon'.

Please be patient.

@GeoffreyBooth
Member

This should’ve had the @nodejs/test_runner team tagged.

I assume this change is fine, and desirable, but I’d like to hear from them what it might mean for future development of the test runner. Would moving the core (or more than the core) of the test runner into C++ mean that developing new test runner features would become difficult or impossible for the current test runner contributors? Or even if it would, maybe that’s okay since the test runner is close to feature complete by this point?

@juliangruber

This comment was marked as off-topic.

@GeoffreyBooth
Member

this is about the task runner, not the test runner

Sorry, my mistake!

@GeoffreyBooth
Member

For node --run, I kind of feel like it's important to support --env-file? It's a common use case to define environment variables for running scripts or starting servers.

@anonrig anonrig force-pushed the rewrite-task-runner-in-cpp branch from eb3ea04 to a7afb78 Compare May 1, 2024 22:37
@anonrig anonrig force-pushed the rewrite-task-runner-in-cpp branch from c092d62 to 584122e Compare May 2, 2024 00:35
@anonrig anonrig force-pushed the rewrite-task-runner-in-cpp branch from 584122e to 435aed9 Compare May 2, 2024 01:16
@voxpelli

voxpelli commented May 2, 2024

Is performance really the bottleneck of the task runner?

Putting code into Node.js core makes it harder to contribute to by people only familiar with contributing to the wider npm module ecosystem.

Converting the code within Node.js core to C++ makes it even harder by requiring knowledge in C++ as well.

Is there a policy decision or similar that performance is preferable over maintainability?

@anonrig
Member Author

anonrig commented May 2, 2024

Is performance really the bottleneck of the task runner?

The task runner currently takes 30ms. Before that, it took 200ms (using npm). The reason for implementing it in Node.js core was performance.

Currently, out of those 30ms, only 5ms is spent on the actual child process; the remaining 25ms, about 83%, goes to work unrelated to running the task, such as initializing V8. To remove this unnecessary cost, we have to implement it in C++.

Putting code into Node.js core makes it harder to contribute to by people only familiar with contributing to the wider npm module ecosystem.

The goal of the Node.js task runner is not to replace npm or the npm module ecosystem. It's merely a helper CLI function that does a subset of what npm run does, but in a really performant way.
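
To make "a subset of npm run" concrete, here is a rough sketch of the core lookup-and-run step. It assumes the scripts object of package.json has already been parsed into a std::map; Scripts, FindScript and RunScript are hypothetical names for illustration, not the PR's code, and the sketch leaves out PATH handling and argument forwarding.

```cpp
#include <cstdlib>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the "scripts" object of a parsed package.json:
// script name -> shell command.
using Scripts = std::map<std::string, std::string>;

// Returns the command registered under `name`, or std::nullopt if the
// script is not defined.
std::optional<std::string> FindScript(const Scripts& scripts,
                                      const std::string& name) {
  auto it = scripts.find(name);
  if (it == scripts.end()) return std::nullopt;
  return it->second;
}

// Runs the named script through the platform's command processor.
// Returns -1 when the script is missing; otherwise returns whatever
// std::system reports, which is implementation-defined.
int RunScript(const Scripts& scripts, const std::string& name) {
  std::optional<std::string> command = FindScript(scripts, name);
  if (!command) return -1;
  return std::system(command->c_str());
}
```

A caller would build the Scripts map from package.json and call RunScript(scripts, "test"); spawning the command is the only part that strictly needs a child process.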

Converting the code within Node.js core to C++ makes it even harder by requiring knowledge in C++ as well.

You are indeed correct, but it's almost impossible to contribute to Node.js without writing (or at least understanding) C++, because the internals are written in C++. Hence, C++ knowledge is required at some level.

Is there a policy decision or similar that performance is preferable over maintainability?

There is no policy decision regarding this. Let's not forget that the goal of porting Node.js task runner into Node.js core is performance, not maintenance.

@voxpelli

voxpelli commented May 2, 2024

Let's not forget that the goal of porting Node.js task runner into Node.js core is performance, not maintenance.

This is kind of the policy for this specific feature then. Is it documented as the goal anywhere?

@anonrig
Member Author

anonrig commented May 2, 2024

This is kind of the policy for this specific feature then. Is it documented as the goal anywhere?

@voxpelli We don't document feature specific goals anywhere. The original PR which added node --run in the first place shows my intentions and my goals: #52190.

@lemire
Member

lemire commented May 2, 2024

@anonrig If someone wants to provide the equivalent node --run using pure JavaScript, with no concern for performance, is it not the case that it can be done, without even contributing directly to Node.js ?

@voxpelli

voxpelli commented May 2, 2024

This is kind of the policy for this specific feature then. Is it documented as the goal anywhere?

We don't document feature specific goals anywhere. The original PR which added node --run in the first place shows my intentions and my goals: #52190.

It would have been good because then you could have pointed me to such a goal, or I could have found it myself, and there would be no need for a discussion 🙂

For future reference: a lot of discussion on this topic is happening on Twitter in the threads that followed https://x.com/yagiznizipli/status/1785999524143464681?s=46&t=1mQKe1AKaQ-2YwRjxDfrOg

@voxpelli

voxpelli commented May 2, 2024

If someone wants to provide the equivalent node --run using pure JavaScript, with no concern for performance, is it not the case that it can be done, without even contributing directly to Node.js ?

@lemire Isn't that what the implementation in lib/internal/main/run.js essentially already provides?

And npm etc of course already provides it.

Question is what is best for the node.js project. Essentially: Taking more of an undici approach or this more tightly integrated approach.

Speaking as a co-maintainer of npm-run-all2, it could for sure be intriguing if eg. it could use the same core run logic as here and possibly also if there would be a route to possibly upstream parts of that to the node.js task runner if there would be a wider community interest in that. This PR essentially shuts the door on both of those.

@lemire
Member

lemire commented May 2, 2024

@voxpelli

Isn't that what the implementation in lib/internal/main/run.js essentially already provides?

I think you may not have examined @anonrig's work fully. It was always motivated by performance and contained significant C++ code (prior to this PR). You are just pointing to the part of the implementation that was in JavaScript.

Speaking as a co-maintainer of npm-run-all2, it could for sure be intriguing if eg. it could use the same core run logic as here and possibly also if there would be a route to possibly upstream parts of that to the node.js task runner if there would be a wider community interest in that. This PR essentially shuts the door on both of those

Here is how the functionality has been submitted...

The purpose of this pull-request to offer a fast alternative to npm run xxx. With this pull-request, node supports node run test which executes test command inside package.json scripts.

Just so we are clear, are we in agreement that npm run xxx remains? That it can be further expanded and that you can seek to contribute to npm run xxx if you so desire, with or without this PR? That node --run xxx is proposed as a fast alternative?

@lemire
Member

lemire commented May 2, 2024

@anonrig Let me try something.

@voxpelli

voxpelli commented May 2, 2024

Here is how the functionality has been submitted...

The purpose of this pull-request to offer a fast alternative to npm run xxx. With this pull-request, node supports node run test which executes test command inside package.json scripts.

Just so we are clear, are we in agreement that npm run xxx remains? That it can be further expanded and that you can seek to contribute to npm run xxx if you so desire, with or without this PR? That node --run xxx is proposed as a fast alternative?

@lemire You're making this into something about my personal desires when it's not.

I wrote:

Question is what is best for the node.js project. Essentially: Taking more of an undici approach or this more tightly integrated approach.

I'm confident that @anonrig has understood my point, though, so any further discussion on this topic from me will happen outside of this issue, when/if/where there's a broader discussion on this topic and how it aligns with current or future goals and needs for Node.js as a project. I do hope such a discussion happens, and I would gladly be invited to or involved in it.

@benjamingr
Member

I understand both sentiments (contribution should be easy, performance should be good). For this particular feature, I think that since it dramatically affects the startup of every process launched through the task runner, performance triumphs.

Member

@lemire lemire left a comment

I expect to have a PR on your PR in a few minutes. But this can be merged as-is in my opinion.

@anonrig anonrig added the commit-queue Add this label to land a pull request using GitHub Actions. label May 2, 2024
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label May 2, 2024
@nodejs-github-bot nodejs-github-bot merged commit c5cfdd4 into nodejs:main May 2, 2024
54 checks passed
@nodejs-github-bot
Collaborator

Landed in c5cfdd4

Whitecx pushed a commit to Whitecx/node that referenced this pull request May 2, 2024
PR-URL: nodejs#52609
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Ch3nYuY pushed a commit to Ch3nYuY/node that referenced this pull request May 8, 2024
PR-URL: nodejs#52609
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
targos pushed a commit that referenced this pull request May 8, 2024
PR-URL: #52609
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
lukins-cz pushed a commit to lukins-cz/OS-Aplet-node that referenced this pull request Jun 1, 2024
PR-URL: nodejs#52609
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Labels
commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.