4.0 | Proposal: improvements to DocBlock tokenization #484

Open
jrfnl opened this issue May 8, 2024 · 0 comments

Current Situation

As things are, tokens within DocBlocks are tokenized as T_DOC_COMMENT_* tokens.

Additionally, there are two tokens in a docblock which receive additional information:

  • The T_DOC_COMMENT_OPEN_TAG token will have the following additional keys in the token array:
    • comment_tags: array<int> - array with stack pointers to DocBlock tags
    • comment_closer: int - stack pointer to the DocBlock closer
  • The T_DOC_COMMENT_CLOSE_TAG token will have the following additional key in the token array:
    • comment_opener: int - stack pointer to the DocBlock opener
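To make the current layout concrete, here is a minimal sketch using a hand-written, simplified token array. It is not real tokenizer output: whitespace tokens are left out, only the `type` and `content` keys are modelled, and the array keys stand in for stack pointers.

```php
<?php
// Simplified model of a tokenized DocBlock. Array keys play the role of stack pointers.
// Assumption: real token arrays contain more keys and also whitespace tokens.
$tokens = [
    0 => ['type' => 'T_DOC_COMMENT_OPEN_TAG', 'content' => '/**', 'comment_tags' => [3, 6], 'comment_closer' => 8],
    1 => ['type' => 'T_DOC_COMMENT_STAR', 'content' => '*'],
    2 => ['type' => 'T_DOC_COMMENT_STRING', 'content' => 'Summary.'],
    3 => ['type' => 'T_DOC_COMMENT_TAG', 'content' => '@param'],
    4 => ['type' => 'T_DOC_COMMENT_STRING', 'content' => 'int $a'],
    5 => ['type' => 'T_DOC_COMMENT_STAR', 'content' => '*'],
    6 => ['type' => 'T_DOC_COMMENT_TAG', 'content' => '@return'],
    7 => ['type' => 'T_DOC_COMMENT_STRING', 'content' => 'void'],
    8 => ['type' => 'T_DOC_COMMENT_CLOSE_TAG', 'content' => '*/', 'comment_opener' => 0],
];

// From the opener, both the tags and the closer are a single lookup away.
$closerPtr = $tokens[0]['comment_closer'];
foreach ($tokens[0]['comment_tags'] as $tagPtr) {
    echo $tokens[$tagPtr]['content'], PHP_EOL;
}

// A token in the middle of the DocBlock carries neither key.
$tagKnowsOpener = isset($tokens[3]['comment_opener']);
$tagKnowsCloser = isset($tokens[3]['comment_closer']);
```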

The problem

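Sometimes a sniff may want to listen for a specific DocBlock token, like T_DOC_COMMENT_TAG, and, depending on its findings, may then want to jump to the DocBlock opener/closer to check some additional info.

This is currently not possible directly, as the opener/closer stack pointers are only available on the T_DOC_COMMENT_OPEN_TAG and T_DOC_COMMENT_CLOSE_TAG tokens themselves.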

The work-around is to always listen for a T_DOC_COMMENT_OPEN_TAG token, loop through the comment_tags to find the target(s), and jump to the tag from there. The comment_closer info is then available to the sniff on the T_DOC_COMMENT_OPEN_TAG token.
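A rough sketch of that work-around, again on a simplified hand-written token array rather than real tokenizer output. The helper name `findTagsFromOpener()` is hypothetical and only serves the illustration; a real sniff would do the same inside its `process()` method.

```php
<?php
// Simplified model: array keys are stack pointers, whitespace tokens are omitted.
$tokens = [
    0 => ['type' => 'T_DOC_COMMENT_OPEN_TAG', 'content' => '/**', 'comment_tags' => [3, 6], 'comment_closer' => 8],
    1 => ['type' => 'T_DOC_COMMENT_STAR', 'content' => '*'],
    2 => ['type' => 'T_DOC_COMMENT_STRING', 'content' => 'Summary.'],
    3 => ['type' => 'T_DOC_COMMENT_TAG', 'content' => '@param'],
    4 => ['type' => 'T_DOC_COMMENT_STRING', 'content' => 'int $a'],
    5 => ['type' => 'T_DOC_COMMENT_STAR', 'content' => '*'],
    6 => ['type' => 'T_DOC_COMMENT_TAG', 'content' => '@return'],
    7 => ['type' => 'T_DOC_COMMENT_STRING', 'content' => 'void'],
    8 => ['type' => 'T_DOC_COMMENT_CLOSE_TAG', 'content' => '*/', 'comment_opener' => 0],
];

/**
 * Hypothetical helper: starting at the DocBlock opener, collect the stack
 * pointers of all tags with the given name.
 */
function findTagsFromOpener(array $tokens, int $openerPtr, string $tagName): array
{
    $matches = [];
    foreach ($tokens[$openerPtr]['comment_tags'] as $tagPtr) {
        if ($tokens[$tagPtr]['content'] === $tagName) {
            $matches[] = $tagPtr;
        }
    }

    return $matches;
}

// The sniff starts at the opener, so the closer is known before jumping to a tag.
$openerPtr = 0;
$closerPtr = $tokens[$openerPtr]['comment_closer'];
foreach (findTagsFromOpener($tokens, $openerPtr, '@return') as $tagPtr) {
    printf("Found %s at %d; DocBlock closes at %d\n", $tokens[$tagPtr]['content'], $tagPtr, $closerPtr);
}
```

The downside is that the sniff is triggered for every DocBlock, including those which do not contain the tag it is interested in.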

Alternatively, a sniff can walk to the start/end of the DocBlock from any token within, but depending on how extensive the available documentation is (including possible annotations for Doctrine, Psalm, PHPUnit), walking the tokens to the start/end of the DocBlock can easily mean having to walk hundreds of tokens.
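To illustrate the cost of walking, the sketch below builds an artificial DocBlock and counts how many tokens are inspected before the opener is reached. Both helper functions are hypothetical and the token array is a simplified model; in a real sniff the walk would typically be done via `File::findPrevious()`.

```php
<?php
/**
 * Hypothetical helper: build a simplified DocBlock token array with the
 * given number of text lines, followed by a single tag and the closer.
 */
function buildDocBlockTokens(int $lineCount): array
{
    $tokens = [['type' => 'T_DOC_COMMENT_OPEN_TAG', 'content' => '/**']];
    for ($i = 0; $i < $lineCount; $i++) {
        $tokens[] = ['type' => 'T_DOC_COMMENT_STAR', 'content' => '*'];
        $tokens[] = ['type' => 'T_DOC_COMMENT_STRING', 'content' => "Line $i"];
    }
    $tokens[] = ['type' => 'T_DOC_COMMENT_TAG', 'content' => '@return'];
    $tokens[] = ['type' => 'T_DOC_COMMENT_CLOSE_TAG', 'content' => '*/'];

    return $tokens;
}

/**
 * Walk backwards from $stackPtr until the DocBlock opener is found.
 * $steps receives the number of non-opener tokens which were inspected.
 */
function walkBackToOpener(array $tokens, int $stackPtr, ?int &$steps): ?int
{
    $steps = 0;
    for ($i = $stackPtr; $i >= 0; $i--) {
        if ($tokens[$i]['type'] === 'T_DOC_COMMENT_OPEN_TAG') {
            return $i;
        }
        $steps++;
    }

    return null;
}

$tokens = buildDocBlockTokens(100);
$tagPtr = count($tokens) - 2; // The "@return" tag, just before the closer.
$opener = walkBackToOpener($tokens, $tagPtr, $steps);
printf("Opener found at %d after inspecting %d other tokens\n", $opener, $steps);
```

The number of inspected tokens grows linearly with the size of the DocBlock, and the walk is repeated for every tag the sniff examines.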

Proposal

I'd like to propose adding the comment_opener and comment_closer indexes to all tokens within a DocBlock, so the information about the opener/closer is always available.

For sniffs affected by the problem outlined above, this should improve performance, as the walk through the DocBlock tokens is replaced by a direct lookup.
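As an illustration of the intended end result, the sketch below copies the opener/closer pointers onto every token of a DocBlock in a simplified token array. This is not the tokenizer implementation; the helper name `addDocBlockPointers()` and the token model are assumptions made for the example.

```php
<?php
// Simplified model of the current situation: array keys are stack pointers.
$tokens = [
    0 => ['type' => 'T_DOC_COMMENT_OPEN_TAG', 'content' => '/**', 'comment_tags' => [3, 6], 'comment_closer' => 8],
    1 => ['type' => 'T_DOC_COMMENT_STAR', 'content' => '*'],
    2 => ['type' => 'T_DOC_COMMENT_STRING', 'content' => 'Summary.'],
    3 => ['type' => 'T_DOC_COMMENT_TAG', 'content' => '@param'],
    4 => ['type' => 'T_DOC_COMMENT_STRING', 'content' => 'int $a'],
    5 => ['type' => 'T_DOC_COMMENT_STAR', 'content' => '*'],
    6 => ['type' => 'T_DOC_COMMENT_TAG', 'content' => '@return'],
    7 => ['type' => 'T_DOC_COMMENT_STRING', 'content' => 'void'],
    8 => ['type' => 'T_DOC_COMMENT_CLOSE_TAG', 'content' => '*/', 'comment_opener' => 0],
];

/**
 * Hypothetical helper: set comment_opener and comment_closer on every token
 * from the DocBlock opener up to and including the closer.
 */
function addDocBlockPointers(array $tokens): array
{
    foreach ($tokens as $ptr => $token) {
        if ($token['type'] !== 'T_DOC_COMMENT_OPEN_TAG' || isset($token['comment_closer']) === false) {
            continue;
        }

        $closer = $token['comment_closer'];
        for ($i = $ptr; $i <= $closer; $i++) {
            $tokens[$i]['comment_opener'] = $ptr;
            $tokens[$i]['comment_closer'] = $closer;
        }
    }

    return $tokens;
}

$tokens = addDocBlockPointers($tokens);

// A sniff listening for a tag can now jump straight to the opener and closer.
$tag       = $tokens[6];
$openerPtr = $tag['comment_opener'];
$closerPtr = $tag['comment_closer'];
printf("%s sits in the DocBlock spanning %d to %d\n", $tag['content'], $openerPtr, $closerPtr);
```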

Alternatives considered

Other token types with an *_opener/*_closer often also set an array-format index on the tokens between the "opener" and "closer", with information about all applicable (potentially nested) openers/closers (or "owners"); see the details in the fold-out below. However, DocBlocks cannot be nested, so putting the information about the opener/closer in an array does not make sense to me for DocBlocks.

Other considerations

For all other opener/closer types, the *_opener and *_closer indexes are both consistently added to the opener as well as to the closer.

At this time, the way the indexes are set for DocBlocks is not consistent with that as the indexes for DocBlocks are set asymmetrically. This proposal fixes that too.

Details on all opener/closer types and what info is available in the token array
| Bracket type | Opener + closer has these indexes | And may have | Tokens between have | Notes |
|---|---|---|---|---|
| Parentheses | parenthesis_opener + parenthesis_closer | parenthesis_owner | nested_parenthesis (array<int opener => int closer>) | |
| Curly braces (all) | bracket_opener + bracket_closer | -- | -- | See #12 for a proposal to add a nested_brackets key |
| Square brackets (all, including short arrays) | bracket_opener + bracket_closer | -- | -- | See #12 for a proposal to add a nested_brackets key |
| Curly braces for control structures | scope_opener + scope_closer | scope_condition | conditions (array<int owner => int\|string owner code>) | |
| Alternative syntax for control structures | scope_opener + scope_closer | scope_condition | conditions (array) | |
| Heredoc/Nowdoc | scope_opener + scope_closer | scope_condition | conditions (array) | |
| Attributes | attribute_opener + attribute_closer | -- | attribute_opener, attribute_closer, nested_attributes (array<int opener => int closer>) | Yeah, not all that consistent, but never mind that for now. |
| Comments | Opener: comment_closer, Closer: comment_opener | -- | -- | |

Unless otherwise annotated, these indexes will all be integer stack pointers.


Planning

While this isn't necessarily a breaking change, I still propose to include the change in the 4.0 release for safety reasons. There may be sniffs which explicitly expect a comment_opener index only on the T_DOC_COMMENT_CLOSE_TAG token and a comment_closer index only on the T_DOC_COMMENT_OPEN_TAG token, and those sniffs may need to be adjusted.
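As a hypothetical example of the kind of sniff code which could be affected: a check which uses the presence of the comment_closer key to decide whether a token is a DocBlock opener would start matching every DocBlock token. Both function names below are made up for the illustration, and the token is a simplified model of a tag token after the proposed change.

```php
<?php
// Fragile: relies on comment_closer only ever being set on the opener.
function isDocBlockOpenerFragile(array $token): bool
{
    return isset($token['comment_closer']);
}

// More robust: checks the token type itself. In a real sniff this would
// typically compare the token 'code' against the T_DOC_COMMENT_OPEN_TAG constant.
function isDocBlockOpenerRobust(array $token): bool
{
    return $token['type'] === 'T_DOC_COMMENT_OPEN_TAG';
}

// Simplified model of a tag token once all DocBlock tokens carry both indexes.
$tagToken = [
    'type'           => 'T_DOC_COMMENT_TAG',
    'content'        => '@return',
    'comment_opener' => 0,
    'comment_closer' => 8,
];

$fragileResult = isDocBlockOpenerFragile($tagToken); // Wrongly identifies the tag as an opener.
$robustResult  = isDocBlockOpenerRobust($tagToken);
var_dump($fragileResult, $robustResult);
```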

Opinions?

Please let me know if you have any concerns about this proposal or any suggestions for further enhancements to the DocBlock tokenization.

/cc @asispts @dingo-d @fredden @GaryJones @greg0ire @kukulich @michalbundyra @Ocramius @sirbrillig @stronk7 @weierophinney @wimg
