
Benchmark performance #17

Open · rattrayalex opened this issue Jul 7, 2017 · 7 comments

@rattrayalex

As mentioned in #16 (comment), it would be great to have single-file and multi-file typechecking performance comparisons.

@niieani (Owner) commented Jul 8, 2017

I think @vkurchatkin mentioned something on Twitter (if I recall correctly) about making some basic benchmarks in the past. Could you share the code you used to generate your codebase?

Probably the best benchmark would be the same large codebase in both languages. Unfortunately, I don't see that happening anytime soon, unless anybody knows of a company that migrated from TS to Flow or the other way around. If they were willing to share some benchmarks, it would be really helpful.

On the other hand, with the TypeScript parser now coming to Babel, it might be feasible to try to write a converter between the two syntaxes. It wouldn't be easy because of the subtle differences, but I think you could get 90% of most codebases converted correctly and automatically.

It would also be really useful for Flow's poor situation with typings for external libs (being able to translate DefinitelyTyped to Flow). Auto-translation might be problematic for more advanced use cases (like heavy FP), but those are a minority of libs.

@ngryman commented Aug 20, 2019

Hey folks,

At my company we're evaluating a migration from Flow to TypeScript. We wanted to validate that TypeScript is faster (or at least not significantly slower) than Flow, so we developed these benchmarks: https://github.com/zapier/type-system-benchmarks

The benchmarks generate a configurable number of React components and compare Flow's and TypeScript's type-checking performance. They also try to reflect real-world webpack configurations and compare overall compilation performance.
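
To give an idea of the shape of such a generator, here's a hypothetical sketch (the actual generator in the repo differs; the file names and `Component${i}` naming are made up, and only the TypeScript side is shown; a Flow variant would mirror it):

```ts
// gen-components.ts -- hypothetical sketch of a component generator for a
// type-check benchmark (the real zapier/type-system-benchmarks code differs).
import * as fs from "fs";
import * as path from "path";

const count = Number(process.argv[2] ?? 100);
const outDir = path.join(__dirname, "generated");
fs.mkdirSync(outDir, { recursive: true });

for (let i = 0; i < count; i++) {
  const src = [
    'import * as React from "react";',
    "",
    `export interface Props${i} { label: string; count: number; }`,
    "",
    `export const Component${i}: React.FC<Props${i}> = ({ label, count }) =>`,
    '  React.createElement("div", null, label + ": " + count);',
    "",
  ].join("\n");
  fs.writeFileSync(path.join(outDir, `Component${i}.tsx`), src);
}

// A single entry point pulls every generated file into the check.
const index = Array.from({ length: count }, (_, i) => `export * from "./Component${i}";`);
fs.writeFileSync(path.join(outDir, "index.ts"), index.join("\n") + "\n");
```

Running `npx tsc --noEmit` against the generated directory then gives one data point per configuration of `count`.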

It's not perfect, but it already gives a good idea of the relative performance of each tool. Interestingly, Flow actually outperforms TypeScript in these tests... However, my assumption is that the more dependencies you add, the more TypeScript will pull ahead of Flow. I still need to add a benchmark that adds more dependencies to the generated code.

Anyway, I'd be interested in your feedback or suggestions!

@jeffvandyke

Nice work! It's good to have at least some kind of objective, comparable benchmarks when evaluating these two systems.

@niieani (Owner) commented Aug 20, 2019

@ngryman I think my main objection to the benchmark is that rerunning the type check from scratch on a small project (many individual files) won't give the same results as running a single type check on a large codebase.

Actually, this benchmark will mostly measure runtime initialization time rather than type-checking time. It's fine for comparing Babel vs tsc compilation time, though, since types don't matter for that.

A more useful benchmark would be to generate a large dependency tree with various typings (including things like mapped types and generics) and then re-run that 10-20 times to get the averages.

E.g. in our codebase, Flow takes about 2 minutes to finish a single full run from scratch (on a fast computer). So it seems it's the complex dependency tree that's causing the major slowdowns, rather than type-checking many individual files.
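
For illustration, a minimal averaging harness could look like this (a sketch only: it assumes a POSIX shell, that `npx tsc --noEmit` and `npx flow check` both run cleanly in the project root, and that stopping the Flow server between runs gives a cold start each time):

```ts
// bench.ts -- average the wall-clock time of N from-scratch type checks.
// execSync throws on a non-zero exit, so errors in the checked code abort the run.
import { execSync } from "child_process";

function avgMs(cmd: string, runs: number, before?: string): number {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    if (before) execSync(before, { stdio: "ignore" }); // e.g. kill a warm server
    const start = Date.now();
    execSync(cmd, { stdio: "ignore" });
    total += Date.now() - start;
  }
  return total / runs;
}

const runs = 10;
console.log("tsc :", avgMs("npx tsc --noEmit", runs).toFixed(0), "ms");
console.log("flow:", avgMs("npx flow check", runs, "npx flow stop || true").toFixed(0), "ms");
```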

It would also be interesting to benchmark the re-check time after a file change has occurred (once the language server is already running), ideally not on a leaf file but somewhere deep in the dependency tree (a type change that affects multiple files).

@ngryman commented Aug 20, 2019

Thanks for your feedback, guys.


@niieani

A more useful benchmark would be to generate a large dependency tree with various typings (including things like mapped types and generics) and then re-run that 10-20 times to get the averages.

E.g. in our codebase, Flow takes about 2 minutes to finish a single full run from scratch (on a fast computer). So it seems it's the complex dependency tree that's causing the major slowdowns, rather than type-checking many individual files.

I completely agree with you. The current benchmark is not finished, which is why I sought feedback before continuing it. For now I think it gives an idea of the baseline performance of each tool and how they compare when processing roughly the same number of files. We found that information useful and sufficient for us to validate the performance aspect of the migration.

However, I agree that this is not representative of a real-world project with a large dependency tree. I'd need to add an option to the benchmark that pulls in various big dependencies to measure how both tools scale. That's something I could work on.

It would also be interesting to benchmark the re-check time after a file change has occurred (once the language server is already running), ideally not on a leaf file but somewhere deep in the dependency tree (a type change that affects multiple files).

Agreed: right now the benchmark measures initialization plus the first checking pass. I could definitely add an option to the benchmark to measure re-runs.

If I had to extract action items from this feedback, they would be:

  • Add an option to the code generator that adds a given number of dependencies to the generated code.
  • Add an option to the typecheck benchmark that measures subsequent type checks after a file has changed.

Does that look good? I can open PRs for these. Would you be open to giving me your feedback in those PRs?

@niieani (Owner) commented Aug 20, 2019

@ngryman

Regarding dependencies, I think it's important to generate deeply nested trees rather than just a lot of imports (e.g. a file that imports another, which imports another, and so on), and maybe add some safe circular dependencies while you're at it 🤔. Some language features are also more expensive than others; e.g. in TypeScript it turned out that large unions make it slow. No idea how that affects Flow. A sketch of such a generator is below.
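
To make that concrete, here's a hypothetical generator combining a deep chain of mapped types with one wide string-literal union at the root (`depth` and `unionSize` are arbitrary knobs, and the `chain/` layout is made up for illustration):

```ts
// gen-chain.ts -- hypothetical sketch: a deep import chain where each module
// wraps the previous module's type in a mapped type, seeded with a wide union.
import * as fs from "fs";

const depth = 50;
const unionSize = 500;
fs.mkdirSync("chain", { recursive: true });

// node0 carries the wide union -- the kind of type reported to slow tsc down.
const union = Array.from({ length: unionSize }, (_, i) => `"k${i}"`).join(" | ");
fs.writeFileSync("chain/node0.ts", `export type T0 = { tag: ${union} };\n`);

for (let i = 1; i < depth; i++) {
  fs.writeFileSync(
    `chain/node${i}.ts`,
    `import { T${i - 1} } from "./node${i - 1}";\n` +
      // Re-mapping every property at each level means a change at the root
      // invalidates the entire chain.
      `export type T${i} = { [K in keyof T${i - 1}]: T${i - 1}[K] } & { depth${i}: number };\n`
  );
}

// Force the checker to actually resolve the whole chain.
fs.writeFileSync(
  "chain/index.ts",
  `import { T${depth - 1} } from "./node${depth - 1}";\n` +
    `export const probe: T${depth - 1} = null as any;\n`
);
```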

Maybe the TypeScript perf suite would be an inspiration (some info here)? I don't know where it lives though. @weswigham could you help us out here?

As for subsequent type checks, it would be good to measure the time through the language server rather than the CLI, since that's how IDEs do it; the CLI would probably add some overhead. A rough sketch of that is below.
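
For TypeScript, timing a re-check through tsserver could look roughly like this. It's a sketch with simplifications: it relies on tsserver's newline-delimited JSON request protocol and the `requestCompleted` event that follows a `geterr` request, but real code would parse the Content-Length framing on stdout instead of substring matching and would wait for the project to load before editing:

```ts
// tsserver-bench.ts -- rough sketch of timing a semantic re-check through
// the language server after an edit.
import { spawn } from "child_process";
import * as path from "path";

const server = spawn("node", [path.join("node_modules", "typescript", "lib", "tsserver.js")]);

let seq = 0;
const send = (command: string, args: object) =>
  server.stdin.write(JSON.stringify({ seq: ++seq, type: "request", command, arguments: args }) + "\n");

const file = path.resolve("chain/node0.ts"); // root of the generated chain above
send("open", { file });

let start = 0;
server.stdout.on("data", (chunk: Buffer) => {
  // The "requestCompleted" event fires once the geterr pass has finished.
  if (start && chunk.toString().includes('"requestCompleted"')) {
    console.log("re-check round-trip:", Date.now() - start, "ms");
    server.kill();
  }
});

// Simulate an edit at the root of the tree, then time a full error pass.
send("change", { file, line: 1, offset: 1, endLine: 1, endOffset: 1, insertString: " " });
start = Date.now();
send("geterr", { files: [file], delay: 0 });
```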

Happy to give you further feedback if time allows.

Anyway, I'm super happy this is happening! Thanks @ngryman! 🏅

@weswigham

Maybe the TypeScript perf suite would be an inspiration (some info here)? I don't know where it lives though. @weswigham could you help us out here?

It's just a set of virtualized snapshots of some now quite old codebases (partially MSFT-internal, so not public): an old Monaco build, an old copy of the compiler itself (circa 1.6, I think?), and an old version of the Azure DevOps frontend... Real code, unmodified except for being serialized into a virtual fs to remove disk time from the benchmark (which is unstable even on a single machine thanks to OS- and disk-level caching), then tracked over time (we have graphs and stuff).

More telling is probably the sampling you now see over on DefinitelyTyped, where we attempt to measure the (comparative) language-server slowness of each package there (people can write a Turing machine in the type system, after all, and we'd rather not invoke a type that computes for far too long in a popular package) by sampling response time for some common actions across all the identifiers in the package tests. That's still being tested, but it can be informative when authoring a package with complex types: if someone needs to wait an extra 1000ms for completions on your package because calculating the 24th digit of pi in the type system isn't particularly efficient, maybe you shouldn't do that~
