Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Newlines stops consuming tag descriptions #241

Open
jaapio opened this issue May 13, 2024 · 3 comments
Open

Newlines stops consuming tag descriptions #241

jaapio opened this issue May 13, 2024 · 3 comments

Comments

@jaapio
Copy link
Contributor

jaapio commented May 13, 2024

I got an issue report on my project phpDocumentor/ReflectionDocBlock#365 where we found an issue in this parser. Summarized the users are facing an issue with tag descriptions containing empty lines.

/**
 * @param string $myParam this is a description of the param
 *
 *                     And this line also belongs to $myParam.
 * @param array $other this is a new param.
 */

Newlines in tag descriptions are possible, but when the newline is detected the parser stops consuming new tokens. While if should only stop when a new tag starts. Other reported issues are more or less out of scope of this issue and I can workaround them. But the situation with the newlines cannot be patched from the outside. To prove the solution I made an ugly hack in our code to see if we can overcome this issue by just adjusting a single method.

phpDocumentor/ReflectionDocBlock#369

The POC shows that the method getSkippedHorizontalWhiteSpaceIfAny is not working when the detected token is processed as a newline token. As a result the leading whitespace between param and And in my example is removed by the parser.

I would be able to create a PR to solve this issue. But before I invest more time I would like to see if other solutions are already available.

@ondrejmirtes
Copy link
Member

Have you tried setting textBetweenTagsBelongsToDescription in PhpDocParser to true?

These options are similar to PHPStan's bleeding edge. Right now they're set to false for BC reasons but they will be removed in 2.0 and code will behave the new modern way (as if these options were set to true).

Also check out options in TypeParser and ConstExprParser.

@jaapio
Copy link
Contributor Author

jaapio commented May 13, 2024

Yes I did check those settings, those resulted in the same behavior.
I didn't check the other parsers but I assume they have nothing Todo with the issue we are facing as it is reproduced with just the description. In fact the text processing stops after an empty line in all cases I checked.

@jaapio
Copy link
Contributor Author

jaapio commented May 17, 2024

I checked again... since I was using the parser just for a single tag this caused issues. So I do have a workaround that works for us right now. As we can simply consume all tokens from the lexer.

Which brings me to another issue, which might be one you also are facing when reconstructing docblocks. When I have a docblock line like this:

/**
 * @param string $myParam this is a description of the param
 *
 *                     And this line also belongs to $myParam.
 * @param array $other this is a new param.
 */
*/

The token on line 2 contains: \n with type Lexer::TOKEN_PHPDOC_EOL but as the parser sees this as a EOL it will remove the trailing whitespace when building the description.

We fixed this by adding some post processing on the tokens:

    private function tokenizeLine(string $tagLine): TokenIterator
    {
        $tokens = $this->lexer->tokenize($tagLine);
        $fixed = [];
        foreach ($tokens as $token) {
            if ($token[1] === Lexer::TOKEN_PHPDOC_EOL) {
                if (rtrim($token[0], " \t") !== $token[0]) {
                    $fixed[] = [rtrim($token[Lexer::VALUE_OFFSET], " \t"), Lexer::TOKEN_PHPDOC_EOL, $token[2] ?? null];
                    $fixed[] = [ltrim($token[Lexer::VALUE_OFFSET], "\n\r"), Lexer::TOKEN_HORIZONTAL_WS, ($token[2] ?? null) + 1];
                    continue;
                }
            }

            $fixed[] = $token;
        }
        return new TokenIterator($fixed);;
    }

But I think this is a bug in the way phpstan parses the docblocks. This expression: self::TOKEN_PHPDOC_EOL => '\\r?+\\n[\\x09\\x20]*+(?:\\*(?!/)\\x20?+)?', seems to be to greedy.

Do you think it's worth fixing this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants