Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache DocBlocks #608

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Declspeck
Copy link
Contributor

This PR adds caching for DocBlocks in DefinitionResolver. The cache is cleared after each file is parsed and after each server operation to avoid memory leaks. This makes indexing ~17% faster on my machine.

Performance.php before (total 76 sec):

0
1000
2000
3000
4000
5000
------------------------------
Time [drupal]: 20.532634019852
------------------------------
Time [wordpress]: 1.5450940132141
0
------------------------------
Time [php-language-server]: 0.35776019096375
0
1000
2000
3000
4000
5000
16000
7000
8000
9000
10000
11000
12000
13000
14000
------------------------------
Time [tolerant-php-parser]: 38.144486904144
0
------------------------------
Time [math-php]: 0.732008934021
0
1000
2000
------------------------------
Time [symfony]: 11.475233078003
------------------------------
Time [codeigniter]: 0.75957489013672
------------------------------
Time [cakephp]: 2.8739709854126

Performance.php after (total 63 sec):

0
1000
2000
3000
4000
5000
------------------------------
Time [drupal]: 16.952955961227
------------------------------
Time [wordpress]: 1.3229329586029
0
------------------------------
Time [php-language-server]: 0.25634002685547
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
------------------------------
Time [tolerant-php-parser]: 31.547545909882
0
------------------------------
Time [math-php]: 0.574462890625
0
1000
2000
------------------------------
Time [symfony]: 9.5325789451599
------------------------------
Time [codeigniter]: 0.60508608818054
------------------------------
Time [cakephp]: 2.0659110546112

@codecov
Copy link

codecov bot commented Feb 24, 2018

Codecov Report

Merging #608 into master will increase coverage by 0.04%.
The diff coverage is 90.32%.

@@             Coverage Diff              @@
##             master     #608      +/-   ##
============================================
+ Coverage     81.63%   81.67%   +0.04%     
- Complexity      910      914       +4     
============================================
  Files            61       62       +1     
  Lines          2075     2085      +10     
============================================
+ Hits           1694     1703       +9     
- Misses          381      382       +1
Impacted Files Coverage Δ Complexity Δ
src/LanguageServer.php 77.47% <0%> (-0.71%) 27 <0> (ø)
src/PhpDocument.php 79.66% <100%> (+0.35%) 24 <0> (ø) ⬇️
src/DefinitionResolver.php 87.38% <100%> (+0.06%) 325 <1> (-5) ⬇️
src/CachingDocBlockFactory.php 90.47% <90.47%> (ø) 9 <9> (?)

public function getDocBlock(Node $node)
{
$cacheKey = $node->getStart() . ':' . $node->getUri();
if (array_key_exists($cacheKey, $this->cache)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this is a rather hot path, so using
If(isset(...) || array_key_exists(...))
Could speed it up further

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - I could try how that affects the performance, but I'm not too optimistic. I think that forming the cache key dominates the performance impact of the function.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why only isset is not sufficient? $cache doesn't have a type declaration unfortunately but I assume it would not contain null values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has nulls at least for fields where there is no comment text, but the cost for those fields is probably not big, since the comment is never parsed. But there was the following comment in the code, which is the primary reason I added this - I have not verified if that is still the case though:

try {
    // create() throws when it thinks the doc comment has invalid fields.
    // For example, a @see tag that is followed by something that doesn't look like a valid fqsen will throw.
    return $this->docBlockFactory->create($text, $context);
} catch (\InvalidArgumentException $e) {
    return null;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could use a NullObject instead of null in this cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with the NullObject pattern - so just define class NullObject {} and store those in the array, without NullObjects ever leaving the cache class? I'm not sure if that would make the code clearer here, since there would have to be a check for instanceof NullObject when reading the cache.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A NullObject would only use unneeded memory and put pressure on the GC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not with

class NullObject {
    public static function getInstance(): self {
        static $instance;
        return $instance ?? ($instance = new self);
    }
}

However, I'm still not convinced this would make the code clearer, especially since nullness only needs to be checked in one place.

return $this->cache[$cacheKey];
}
$text = $node->getDocCommentText();
return $this->cache[$cacheKey] = $text === null ? null : $this->createDocBlockFromNodeAndText($node, $text);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hard to read, could you set parenthesis?

/**
* Maps file + node start positions to DocBlocks.
*/
private $cache = [];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add an @var annotation

@@ -346,6 +310,11 @@ public function resolveReferenceNodeToFqn(Node $node)
return null;
}

public function clearCache()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a docblock.

@@ -141,6 +141,10 @@ public function __construct(ProtocolReader $reader, ProtocolWriter $writer)
$e
);
}

// When a request is processed, clear the caches of definition resolver as not to leak memory.
$this->definitionResolver->clearCache();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the cache needs to be cleared after every single protocol message. This is a very hot code path that is even hit in the middle of indexing. Could you explain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent was to

  • clear the cache when some user action causes the cache to be filled to avoid memory leaks, and
  • not have to worry about invalidating the cache on user edits.

You're right, the code does not fulfill that intent. I also did not realize that it could be hit in the middle of indexing - I imagine that this might cause some weird behavior if user edit is performed on a file that is being parsed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't happen while a file is parsing, but whenever control is yielded back to the event loop (definitely between indexing files) or getting file contents.

The problem I have is that a cache is not useful if it is constantly cleared when it doesn't have to. For example, moving the mouse over a document can trigger thousands of hover requests within milliseconds. All of these would cause the whole cache to be cleared every time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - it currently only speeds things up when the same docblock is parsed multiple times on the same request. If it's moved into PhpDocument, this issue could also be resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixfbecker - I started to move this to PhpDocument and found that it was rather difficult to pass PhpDocument to all functions requiring access to docblocks.

Should I

  • Pass PhpDocument as argument individually to all DefinitionResolver functions requiring it, even if it requires adding the same argument to many places.
  • Have a per-document DefinitionResolver, containing a reference to PhpDocument
  • Have PhpDocumentFactory in DefinitionResolver, access the document through the URI of a node. This would require some changes though, since PhpDocument parses in constructor, so it would not be accessible via PhpDocumentFactory during the initial parse.
  • Something else?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Maybe create a dedicated class for getting docblocks that caches them? The only constraint is that the memory needs to be disposed when a PhpDocument is disposed.

class CachingDocBlockFactory
{
/**
* Maps file + node start positions to DocBlocks.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the pre-tolerant parser version I used to save the docblock instance on the Node itself so they would get naturally get cached and garbage collected when the document gets garbage collected. The CachingDocBlockFactory instance is saved in the DefinitionResolver, which is a long-living singleton-like object. Why not save this map on PhpDocument instances? That way it is not a cache that needs to be cleared on seemingly unrelated events like new messages, it is simply a memoization store to remember lazily computed docblocks. It might actually be just as fast to eagerly compute them all because we need to do that anyway to get the hover contents for the index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes a lot of sense. Perhaps the name resolution cache from the Scope thing could also live in PhpDocument then?

@jens1o
Copy link
Contributor

jens1o commented Mar 23, 2018

Is there something I can do to help getting this into master?

@Mte90
Copy link

Mte90 commented Mar 9, 2020

any updates?

@jens1o
Copy link
Contributor

jens1o commented Mar 29, 2020

@felixfbecker I wonder whether you still have enough power (given the circumstances) to maintain this project? If so, I'd maybe invest some time in this in the following weeks. I'm happy to hearing back from you. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants