Enable Keeping Attribute Name Casing #1128

Open
EisenbergEffect opened this issue Feb 7, 2024 · 9 comments

Comments

@EisenbergEffect

Some libraries and frameworks would like to use parse5 in a way that preserves the casing of attribute names. A common use case is a templating engine that supports binding to custom events or JS properties, whose names can natively mix lowercase and uppercase letters.

As far as I can tell, this is controlled by the Tokenizer's _stateAttributeName method, which can be found here:

https://github.com/inikulin/parse5/blob/master/packages/parse5/lib/tokenizer/index.ts#L1673

Would it be possible to add an option that tells this method not to lowercase every letter?
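
To make the request concrete, here is a toy sketch of that scanning step, loosely modeled on the WHATWG tokenizer's "attribute name state" (which parse5 implements in `_stateAttributeName`). It is not parse5 code: `scanAttributeName` and its `preserveCase` flag are hypothetical, and the sketch skips details such as NULL handling. With the flag off it lowercases ASCII A-Z as the spec requires; with it on it keeps the source casing.

```js
// Toy attribute-name scanner, loosely modeled on the WHATWG "attribute name
// state". Not parse5 code; `preserveCase` is a hypothetical option.
function scanAttributeName(input, start, preserveCase = false) {
  // Characters that end an attribute name: whitespace, "/", ">" and "=".
  const terminators = new Set(['\t', '\n', '\f', ' ', '/', '>', '=']);
  let name = '';
  let i = start;
  while (i < input.length && !terminators.has(input[i])) {
    const ch = input[i];
    // The spec lowercases only ASCII upper alpha; everything else is kept.
    name += !preserveCase && ch >= 'A' && ch <= 'Z' ? ch.toLowerCase() : ch;
    i++;
  }
  // `end` is the index of the first character after the name.
  return { name, end: i };
}

const src = '<my-element :someProperty="{{this.someOtherProperty}}">';
const at = src.indexOf(':');
console.log(scanAttributeName(src, at).name);       // ":someproperty"
console.log(scanAttributeName(src, at, true).name); // ":someProperty"
```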

Alternatively, is there a way to create a custom tokenizer that simply overrides this protected method with the altered behavior? If so, a code sample would be all I need. Currently, I'm monkey-patching the Tokenizer prototype directly, which I'd prefer not to do.

Thanks for the help!

@tenadolanter

tenadolanter commented Feb 22, 2024

I have encountered the same problem.

I use a mapping to store tag and attribute names that contain uppercase letters, and then restore them after conversion, but I don't think this is a good approach.
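
A rough sketch of that mapping workaround, for illustration only: `buildCaseMap` and `restoreCase` are hypothetical helpers, and the regexes are a naive approximation of HTML tag syntax (they ignore comments, `<script>` contents and other edge cases). Because the map is keyed by the lowercased name, two different casings of the same name collide, which is one reason this approach is fragile.

```js
// Naive start-tag matcher; skips quoted values so a ">" inside a value
// does not end the tag early.
const START_TAG = /<[a-zA-Z](?:[^>"']|"[^"]*"|'[^']*')*>/g;
// An attribute name after whitespace, plus its optional value (consumed so
// that words inside quoted values are not mistaken for names).
const ATTRIBUTE = /\s([^\s"'\/>=]+)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?/g;

// Lowercase ASCII A-Z only, matching what the HTML tokenizer does.
const asciiLower = (s) => s.replace(/[A-Z]/g, (c) => c.toLowerCase());

// Hypothetical helper: map each lowercased attribute name to the casing it
// had in the source. If a name appears with several casings, the last wins.
function buildCaseMap(source) {
  const map = new Map();
  for (const [tag] of source.matchAll(START_TAG)) {
    for (const [, name] of tag.matchAll(ATTRIBUTE)) {
      if (/[A-Z]/.test(name)) map.set(asciiLower(name), name);
    }
  }
  return map;
}

// Hypothetical helper: restore casing on parser output shaped like parse5's
// attribute list ({ name, value } objects).
function restoreCase(attrs, map) {
  return attrs.map((attr) => ({ ...attr, name: map.get(attr.name) ?? attr.name }));
}

const source = '<my-element :someProperty="{{a > b}}" title="Hello World"></my-element>';
const map = buildCaseMap(source);
console.log(restoreCase([{ name: ':someproperty', value: '{{a > b}}' }], map)[0].name); // ":someProperty"
```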

@EisenbergEffect
Author

Can anyone from the parse5 team comment? I'm happy to put together a PR adding this setting if it would be accepted and someone could guide me on the preferred way to add it.

@wooorm
Collaborator

wooorm commented May 14, 2024

This is an HTML parser (following the WHATWG spec), not a parser for custom languages. Most of the closed issues are people asking similar questions: https://github.com/inikulin/parse5/issues?q=is%3Aissue+sort%3Aupdated-desc+is%3Aclosed. So I don’t think this will be accepted.

In particular, this seems to be a duplicate of #221.

To me it seems likely that this is an XY problem. If I knew more about your root problem, perhaps I could help you better.

@EisenbergEffect
Author

I am implementing a compiler. It takes HTML as input and produces a reusable template that can be bound to data. Think of just about any front-end rendering library as an example.

The HTML needs to include bindings to attributes, properties, and events. While attribute names are always case-normalized in HTML, the properties of the underlying Node instances, and the events they fire, can have any combination of casing. So the templating language needs to be able to support that. For example:

```html
<my-element :someProperty="{{this.someOtherProperty}}">
```

When parsing the HTML, we don't want the parser to return :someproperty because that is not the JS property name. We need it to preserve the casing, so we can get :someProperty.

We do not know the properties and events ahead of time, so we cannot correct the casing automatically. We need the casing preserved. Other than this, everything is normal HTML.

This is why I requested an option to have the parser not normalize the casing of attributes. I have monkey patched this for now, but would prefer not to have to take that approach. I would also prefer not to have to fork or write a parser from scratch just to essentially remove one invocation of toAsciiLower().

The community apparently brings this up often, so it seems like a legitimate, broad need. It doesn't seem like it would be a lot of work to implement, and it could remain completely backwards compatible: only those who want this would opt in.

@wooorm
Collaborator

wooorm commented May 14, 2024

> The community apparently brings this up often, so it seems like a legitimate broad need

Maybe. But it’s also free software. It’s legitimate for people not to want to maintain the things other folks want, and not to do everything every user ever wants. We also get folks wanting to pass Vue files through, or folks who want <div/> to be self-closing. That’s all also out of scope. You can use patch-package or fork if you want.

Personally, having maintained a lot of parsers for years, particularly in the markdown space, I’m very strongly in favor of sticking with the specs and not allowing deviations, especially for mature languages and projects.

I’d recommend one of:
a) fork / patch-package / build a new parser
b) use XML
c) use JSX
d) use actual HTML

I’d bet on the popular options, JSX or HTML.

@43081j
Collaborator

43081j commented May 14, 2024

@EisenbergEffect you can at least use location info to get hold of the original casing:

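```js
import { parseFragment } from 'parse5';

const source = '<div casedAttribute="abc"></div>';
const frag = parseFragment(source, {
  sourceCodeLocationInfo: true
});
const div = frag.childNodes[0];
// Attribute names are lowercased by the parser, so the location is
// looked up under the lowercased key.
const {startOffset, endOffset} = div.sourceCodeLocation.attrs.casedattribute;
source.slice(startOffset, endOffset); // casedAttribute="abc"
```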
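
Building on that snippet, here is a sketch of a helper that recovers the source casing for every attribute of an element, assuming it was parsed with `sourceCodeLocationInfo: true`. `originalAttrNames` is a hypothetical name; it relies only on the `attrs` array and the `sourceCodeLocation.attrs` offsets used above. Since lowercasing ASCII A-Z does not change a name's length, the original name can be sliced from the start of the attribute's source span. Attributes without location data, or whose slice does not normalize back to the parser's name, are left as the parser reported them.

```js
// Lowercase ASCII A-Z only, as the HTML tokenizer does for attribute names.
const asciiLower = (s) => s.replace(/[A-Z]/g, (c) => c.toLowerCase());

// Hypothetical helper: given the original source and an element parsed with
// `sourceCodeLocationInfo: true`, return its attributes with source casing.
function originalAttrNames(source, element) {
  // Keyed by the lowercased attribute name, as in `attrs.casedattribute`.
  const locations = element.sourceCodeLocation?.attrs ?? {};
  return element.attrs.map((attr) => {
    const loc = locations[attr.name];
    if (!loc) return attr; // no location info: keep the parser's name
    // The span starts with the name, e.g. 'casedAttribute="abc"'.
    const candidate = source.slice(loc.startOffset, loc.startOffset + attr.name.length);
    // Only accept the slice if it normalizes back to the parser's name.
    return asciiLower(candidate) === attr.name ? { ...attr, name: candidate } : attr;
  });
}
```

With the snippet above, `originalAttrNames(source, div)` would be expected to yield a single attribute named `casedAttribute` with value `abc`.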

@EisenbergEffect
Author

@wooorm I've been doing open source for 20 years, so I totally get it. You need to do what's best for your project.

For my part, I don't want JSX or XML. There's not a great way to use actual HTML for this purpose without introducing a fairly verbose syntax. I've already patched things, and have everything working. I just wanted to explore whether there was a better way.

@43081j That may do the trick. I'll give it a try. Thanks!

@wooorm
Collaborator

wooorm commented May 14, 2024

> actual HTML for this purpose without introducing a fairly verbose syntax

One idea: the dataset API in HTML has a similar "problem". It is solved there with a dash-to-camelCase mapping: the data attribute data-foo-bar corresponds to the property dataset.fooBar.
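
For reference, that dataset mapping works in both directions. This sketch follows the rules in the HTML spec's `DOMStringMap` section as I understand them; the function names are hypothetical, and where a browser throws a `DOMException` named "SyntaxError", this sketch throws a plain `SyntaxError`.

```js
// data-* attribute name -> dataset property name (reading dataset.fooBar).
// Each "-" followed by an ASCII lowercase letter is removed and the letter uppercased.
function dataAttrToDatasetKey(attrName) {
  if (!attrName.startsWith('data-')) return null;
  return attrName.slice('data-'.length).replace(/-([a-z])/g, (_, ch) => ch.toUpperCase());
}

// dataset property name -> data-* attribute name (writing dataset.fooBar).
function datasetKeyToDataAttr(key) {
  // Names where "-" is followed by an ASCII lowercase letter are rejected.
  if (/-[a-z]/.test(key)) throw new SyntaxError(`Invalid dataset key: ${key}`);
  // Each ASCII uppercase letter becomes "-" plus its lowercase form.
  return 'data-' + key.replace(/[A-Z]/g, (ch) => '-' + ch.toLowerCase());
}

console.log(dataAttrToDatasetKey('data-foo-bar')); // "fooBar"
console.log(datasetKeyToDataAttr('fooBar'));       // "data-foo-bar"
```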

There is also the question of whether it would be good to support properties when writing attributes at all. Preact/Vue never did, always going with attributes. Until v19, which is arriving just now, React had a huge problem adding support for custom elements and more because they went the property route. So perhaps sidestepping the problem would be better.

@EisenbergEffect
Author

The fact of the matter is that DOM nodes, both built-in and custom, have properties that need to be manipulated. Supporting attributes only would be a major problem. Using data- isn't great either, because it creates a real mismatch and confusion with respect to the actual properties being targeted; they aren't data properties at all. We could say that any attribute with the : prefix should follow the data- casing conversion pattern. That's not terrible, but it adds cognitive load for the template author, which isn't great.
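
A sketch of that ":"-prefix convention, assuming bindings are written in kebab case (`:some-property`) and converted with the same dash-to-camelCase rule as `dataset`. `propertyBindingTarget` is a hypothetical name. Because the kebab form contains no uppercase letters, the parser's lowercasing leaves it unchanged; the trade-off is the extra translation step the template author has to keep in mind.

```js
// Hypothetical convention: an attribute named ":some-property" binds the JS
// property "someProperty". Returns null for attributes without the ":" prefix.
function propertyBindingTarget(attrName) {
  if (!attrName.startsWith(':')) return null;
  // Same rule as dataset: drop each "-" before an ASCII lowercase letter
  // and uppercase that letter.
  return attrName.slice(1).replace(/-([a-z])/g, (_, ch) => ch.toUpperCase());
}

console.log(propertyBindingTarget(':some-property')); // "someProperty"
console.log(propertyBindingTarget('class'));          // null
```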

I appreciate the thoughts.
