refactor(traverse): simplified way to get TraverseCtx #3216

Dunqing · 2024-05-09T13:36:07Z

When we want to use TraverseCtx in nested functions, we run into trouble. We need to keep passing the TraverseCtx around. I've put TraverseCtx inside TransformerCtx because TransformerCtx is everywhere.

Dunqing · 2024-05-09T13:36:25Z

refactor(traverse): simplified way to get TraverseCtx #3216 👈
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @Dunqing and the rest of your teammates on Graphite

codspeed-hq · 2024-05-09T13:42:28Z

CodSpeed Performance Report

Merging #3216 will degrade performances by 3.13%

_{Comparing 05-09-refactor_traverse_simplified_way_to_get_traversectx (83f50bc) with main (9525653)}

Summary

⚡ 1 improvements
❌ 1 regressions
✅ 25 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`main`	`05-09-refactor_traverse_simplified_way_to_get_traversectx`	Change
⚡	`semantic[RadixUIAdoptionSection.jsx]`	411 µs	349.5 µs	+17.59%
❌	`transformer[pdf.mjs]`	108.6 ms	112.1 ms	-3.13%

Dunqing · 2024-05-09T14:06:46Z

I don't know how to fix these ancestor_lifetime tests

Dunqing · 2024-05-09T14:56:55Z

It appears that the performance regression is due to the overhead of Rc and RefCell. I'm not sure if it's worth sacrificing 3% to obtain a better DX

overlookmotel · 2024-05-09T15:05:36Z

I'm afraid I don't think this is possible.

Why you can't do this

The safety of the scheme depends on making it impossible for an Ancestor to exist outside of/outlive an enter_* or exit_* method. If it was possible, then you could:

Save an Ancestor into a property of self.
Later on in an exit_* method e.g. exit_program, pull that Ancestor back out of self.
Call one of the methods on the Ancestor to get a reference to an AST node.
You now hold a &mut Program and simultaneously an immut & ref to one of its children.
Boom! You've broken the aliasing rules. Undefined behavior!

The thing that prevents you "holding on" to an Ancestor for longer is:

You have to borrow it from &ctx.
&ctx only lives as long as the visitor function is executing.

So tying the lifetime of the ctx object is essential to the safety of the scheme. And only way to do that is have it as a function parameter so that walk_* controls the lifetime.

Traverse allows you to do something you usually can't do (go back up the tree), but without breaking Rust's aliasing rules. The API is quite carefully designed to accomplish this. I've made the API as pleasant as I can, but unfortunately there are limits to how far it can be pushed within Rust's rules, without breaking the whole thing.

Some of this is quite subtle. I have added docs a couple of days ago. But probably I should add more docs to explain it more fully.

Performance impact

In this PR, you're also introducing an Rc<RefCell> where there wasn't one before.

CodSpeed is showing this as a -3% perf regression. 3% doesn't sound like much, but bear in mind that that benchmark time also includes the parser, and the parser is 80% of the time - only ~20% is the transformer itself. So if a change to transformer shows as 3% regression on this benchmark, that's 3% out of 20%. i.e. it means the transformer itself has got 15% slower. That's quite bad.

(I'm amazed Rc<RefCell> is so costly, but it seems it is)

What we could do instead

I had actually been thinking we could go in the opposite direction - move more stuff that all transforms need into TraverseCtx. This would:

Remove repeated code from all the individual transformer impls.
Likely remove the need for some of the other Rcs which we currently use, which will give a perf boost.

I already put the allocator and AST builder in TraverseCtx. That was meant to be just the start.

Does this make any sense?

overlookmotel · 2024-05-09T15:15:58Z

When we want to use TraverseCtx in nested functions, we run into trouble

Can you give me an example of this?

Dunqing · 2024-05-09T15:32:12Z

When we want to use TraverseCtx in nested functions, we run into trouble

Can you give me an example of this?

oxc/crates/oxc_transformer/src/react/jsx/mod.rs

Line 262 in 83f50bc

    
           fn transform_jsx<'b>(&mut self, e: &JSXElementOrFragment<'a, 'b>) -> Expression<'a> {

I want to call find_ancestor here. This means that I need to pass TraverseCtx throughout the call stack.

Similar examples are abundant in the typescript plugin. In the typescript plugin. We split the entire plugin transformer into multiple modules. Once we need to use TraverseCtx in some deep function. We then have to one by one passing TraverseCtx

overlookmotel · 2024-05-09T17:11:22Z

Ah I see. And this isn't a very helpful reply, but... sorry, can't do anything about it. If you need to use ctx in a function, it has to be passed into that function, there's no way to make it global within self.

But... unlike Visit and VisitMut, Traverse's enter_* and exit_* don't call walk_* themselves, so that does open the door to better composability.

If all individual transforms are Traverse impls:

struct MyTransform1;

impl<'a> Traverse<'a> for MyTransform1 {
  fn enter_program(&mut self, program: &mut Program<'a>, ctx: &TraverseCtx<'a>) {
    // Do stuff
  }
}

struct MyTransform2;

impl<'a> Traverse<'a> for MyTransform2 {
  fn enter_program(&mut self, program: &mut Program<'a>, ctx: &TraverseCtx<'a>) {
    // Do more stuff
  }
}

Then we can codegen a generic type CombinedTraverse:

struct CombinedTraverse<T1: Traverse, T2: Traverse> {
  t1: T1,
  t2: T2,
}

impl<T1: Traverse, T2: Traverse> CombinedTraverse<T1, T2> {
  pub fn new(t1: T1, t2, T2) -> Self {
    Self { t1, t2 }
  }
}

impl<T1: Traverse, T2: Traverse> CombinedTraverse<T1, T2> {
  fn enter_program(&mut self, program: &mut Program<'a>, ctx: &TraverseCtx<'a>) {
    self.t1.enter_program(program, ctx);
    self.t2.enter_program(program, ctx);
  }

  // etc - all the enter/exit methods
}

Then rather than writing by hand a Traverse which just forwards all methods to the individual transforms, you combine 2 transforms and run them both with e.g.:

let transform = CombinedTraverse::new(
  TypescriptTransform::new(),
  ReactTransform::new(),
);
traverse_mut(&mut transform, program, allocator);

or even:

let transform = CombinedTraverse::new(
  TypescriptTransform::new(),
  CombinedTraverse::new(
    ReactTransform::new(),
    CombinedTraverse::new(
      ES6Transform::new(),
      ES5Transform::new(),
    ),
  ),
);
traverse_mut(&mut transform, program, allocator);

Or with a macro to sugar that:

let transform = combine_transforms!(
  TypescriptTransform::new(),
  ReactTransform::new(),
  ES6Transform::new(),
  ES5Transform::new(),
);
traverse_mut(&mut transform, program, allocator);

Can't do anything about needing ctx param in all your visitor functions, but it would remove the need to have intermediate types which pass down ctx everywhere.

Does this API appeal to you?

NB: What I don't know is whether this would blow up compile times. Maybe it's impractical for that reason.

Dunqing · 2024-05-10T04:30:30Z

Why you can't do this

The safety of the scheme depends on making it impossible for an Ancestor to exist outside of/outlive an enter_* or exit_* method. If it was possible, then you could:

Save an Ancestor into a property of self.

Later on in an exit_* method e.g. exit_program, pull that Ancestor back out of self.

Call one of the methods on the Ancestor to get a reference to an AST node.

You now hold a &mut Program and simultaneously an immut & ref to one of its children.

Boom! You've broken the aliasing rules. Undefined behavior!

The thing that prevents you "holding on" to an Ancestor for longer is:

You have to borrow it from &ctx.

&ctx only lives as long as the visitor function is executing.

So tying the lifetime of the ctx object is essential to the safety of the scheme. And only way to do that is have it as a function parameter so that walk_* controls the lifetime.

Traverse allows you to do something you usually can't do (go back up the tree), but without breaking Rust's aliasing rules. The API is quite carefully designed to accomplish this. I've made the API as pleasant as I can, but unfortunately there are limits to how far it can be pushed within Rust's rules, without breaking the whole thing.

Some of this is quite subtle. I have added docs a couple of days ago. But probably I should add more docs to explain it more fully.

Thank you for explaining this. Do we have a runtime check way to avoid a break?

#[allow(unsafe_code)]
pub fn traverse_mut<'a, Tr: Traverse<'a>>(traverser: &mut Tr, program: &mut Program<'a>) {
    // SAFETY: Walk functions are constructed to avoid unsoundness
    // Save an `TraverseCtx` into property a of`self`
    unsafe { walk::walk_program(traverser, program as *mut Program) };
    debug_assert!(traverser.ctx().ancestors_depth() == 1);
    // Remove `TraverseCtx` from self 
}

Is this possible?

I think ease of use is something we have to consider. We have tons of plugins to write.

Dunqing · 2024-05-10T04:45:03Z

But... unlike Visit and VisitMut, Traverse's enter_* and exit_* don't call walk_* themselves, so that does open the door to better composability.

If all individual transforms are Traverse impls:

struct MyTransform1;

impl<'a> Traverse<'a> for MyTransform1 {
  fn enter_program(&mut self, program: &mut Program<'a>, ctx: &TraverseCtx<'a>) {
    // Do stuff
  }
}

struct MyTransform2;

impl<'a> Traverse<'a> for MyTransform2 {
  fn enter_program(&mut self, program: &mut Program<'a>, ctx: &TraverseCtx<'a>) {
    // Do more stuff
  }
}

Then we can codegen a generic type CombinedTraverse:

struct CombinedTraverse<T1: Traverse, T2: Traverse> {
  t1: T1,
  t2: T2,
}

impl<T1: Traverse, T2: Traverse> CombinedTraverse<T1, T2> {
  pub fn new(t1: T1, t2, T2) -> Self {
    Self { t1, t2 }
  }
}

impl<T1: Traverse, T2: Traverse> CombinedTraverse<T1, T2> {
  fn enter_program(&mut self, program: &mut Program<'a>, ctx: &TraverseCtx<'a>) {
    self.t1.enter_program(program, ctx);
    self.t2.enter_program(program, ctx);
  }

  // etc - all the enter/exit methods
}

Then rather than writing by hand a Traverse which just forwards all methods to the individual transforms, you combine 2 transforms and run them both with e.g.:

let transform = CombinedTraverse::new(
  TypescriptTransform::new(),
  ReactTransform::new(),
);
traverse_mut(&mut transform, program, allocator);

or even:

let transform = CombinedTraverse::new(
  TypescriptTransform::new(),
  CombinedTraverse::new(
    ReactTransform::new(),
    CombinedTraverse::new(
      ES6Transform::new(),
      ES5Transform::new(),
    ),
  ),
);
traverse_mut(&mut transform, program, allocator);

Or with a macro to sugar that:

let transform = combine_transforms!(
  TypescriptTransform::new(),
  ReactTransform::new(),
  ES6Transform::new(),
  ES5Transform::new(),
);
traverse_mut(&mut transform, program, allocator);

Can't do anything about needing ctx param in all your visitor functions, but it would remove the need to have intermediate types which pass down ctx everywhere.

Does this API appeal to you?

Yes, I think it's quite good. It will help us reduce a lot of boilerplate code. But for complex transformations, we still need to abstract into multiple functions (such as transform_jsx), and we still need to pass the ctx.

overlookmotel · 2024-05-10T05:47:26Z

Thank you for explaining this. Do we have a runtime check way to avoid a break?

If you try to make an Ancestor "escape" from enter_*, that's a compile time error. So we can't do a runtime check, as it would refuse to compile!

But this is what the compile failure tests are for - those are the tests which are failing on this PR.

Is this possible?
I think ease of use is something we have to consider. We have tons of plugins to write.

I do see your point, but I don't think it's possible. Let me think about it. Maybe I can come up with something, but it's pretty tricky. The whole scheme is quite finely balanced to give us what we want (ability to access back up the tree) while avoiding the pitfalls of potential UB.

I also would like to leave the door open for mutating up, which I think can be possible with this scheme, and could be useful in future. So I would like to avoid blocking that possibility while trying to remove ctx. And in my opinion, we also need to avoid the 15% perf cost of Rc<RefCell>.

So, yes please leave it with me and I will try to come up with something. But I'm afraid I'm unlikely to get anywhere for probably a week (it took me 3 weeks and 4 attempts to get this far).

Yes, I think it's quite good. It will help us reduce a lot of boilerplate code. But for complex transformations, we still need to abstract into multiple functions (such as transform_jsx), and we still need to pass the ctx.

Composition can be many levels deep. i.e. any transform can compose multiple other transforms.

What I had in mind is that React, ReactJsx, ReactJsxSelf, ReactJsxSource, ReactDisplayName are all implemented as Traverse impls. Then React transform is created as a combination of its parts:

let preset_react_transform = combine_transforms!(
  ReactJsx::new(&options),
  ReactJsxSelf::new(&options),
  ReactJsxSource::new(&options),
  ReactDisplayName::new(&options),
);

and then the top level of transformer:

let all_transform = combine_transforms!(
  Typescript::new(&options),
  React::new(&options),
  // etc...
);

i.e. Each transform "plugin" is either a direct implementation of a transform or a composition of multiple other transforms. And each of those other transforms can be a composition of more transforms. You can go as deep as you want to.

@Boshen Can you advise if this CombinedTraverse is going to be a compile-time killer? Or do we just need to try it to find out?

overlookmotel · 2024-05-10T07:02:24Z

I've implemented CombinedTraverse in #3219 so we can try it out.

github-actions bot added the A-transformer Area - Transformer / Transpiler label May 9, 2024

Dunqing requested review from overlookmotel and Boshen May 9, 2024 13:59

Dunqing force-pushed the 05-09-refactor_traverse_simplified_way_to_get_traversectx branch from 2450bad to 52a307b Compare May 9, 2024 14:13

refactor(traverse): simplified way to get TraverseCtx

83f50bc

Dunqing force-pushed the 05-09-refactor_traverse_simplified_way_to_get_traversectx branch from 52a307b to 83f50bc Compare May 9, 2024 14:34

overlookmotel mentioned this pull request May 10, 2024

feat(traverse): CombinedTraverse type to compose Traverses #3219

Closed

Dunqing closed this May 13, 2024

overlookmotel mentioned this pull request May 13, 2024

Alternative API for getting TraverseCtx in #3260

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(traverse): simplified way to get TraverseCtx #3216

refactor(traverse): simplified way to get TraverseCtx #3216

Dunqing commented May 9, 2024 •

edited

Dunqing commented May 9, 2024

codspeed-hq bot commented May 9, 2024 •

edited

Dunqing commented May 9, 2024

Dunqing commented May 9, 2024

overlookmotel commented May 9, 2024 •

edited

overlookmotel commented May 9, 2024

Dunqing commented May 9, 2024 •

edited

overlookmotel commented May 9, 2024 •

edited

Dunqing commented May 10, 2024

Why you can't do this

Dunqing commented May 10, 2024

overlookmotel commented May 10, 2024 •

edited

overlookmotel commented May 10, 2024

refactor(traverse): simplified way to get TraverseCtx #3216

refactor(traverse): simplified way to get TraverseCtx #3216

Conversation

Dunqing commented May 9, 2024 • edited

Dunqing commented May 9, 2024

codspeed-hq bot commented May 9, 2024 • edited

CodSpeed Performance Report

Merging #3216 will degrade performances by 3.13%

Summary

Benchmarks breakdown

Dunqing commented May 9, 2024

Dunqing commented May 9, 2024

overlookmotel commented May 9, 2024 • edited

Why you can't do this

Performance impact

What we could do instead

overlookmotel commented May 9, 2024

Dunqing commented May 9, 2024 • edited

overlookmotel commented May 9, 2024 • edited

Dunqing commented May 10, 2024

Why you can't do this

Dunqing commented May 10, 2024

overlookmotel commented May 10, 2024 • edited

overlookmotel commented May 10, 2024

Dunqing commented May 9, 2024 •

edited

codspeed-hq bot commented May 9, 2024 •

edited

overlookmotel commented May 9, 2024 •

edited

Dunqing commented May 9, 2024 •

edited

overlookmotel commented May 9, 2024 •

edited

overlookmotel commented May 10, 2024 •

edited