
TextToPath cache #66

Open
andersmelander opened this issue Apr 11, 2019 · 5 comments

@andersmelander
Member

As mentioned in issue #65, it would be good for performance if some sort of cache were involved in the text rendering pipeline.

Well, here you go:

GR32_Text_VCL.zip

The attached modifications implement a cache in the VCL backend TextToPath functions.
With the cache enabled, the performance of the text rendering test in the benchmark example has more than doubled. Where do I collect my paycheck? :-)

The cache has been inserted into the existing implementation using conditional defines so it can be enabled or disabled at compile time.

The cache is implemented with the generic TDictionary<>, using LOGFONT as the primary key into the dictionary and the glyph character value as a secondary key.
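To illustrate the two-level key scheme described above, here is a minimal Python sketch. The actual implementation is Delphi using generics TDictionary<>; the names FontKey, GlyphData, and GlyphCache below are hypothetical stand-ins, and the fields are only representative of what a LOGFONT record and cached glyph entry might carry.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FontKey:
    """Stand-in for the LOGFONT record used as the primary key."""
    face_name: str
    height: int
    weight: int
    italic: bool

@dataclass
class GlyphData:
    """Cached per-glyph data: metrics and TT outline polygons."""
    advance_width: float
    polygons: list

class GlyphCache:
    def __init__(self):
        # Primary key: font; secondary key: glyph character value.
        self._fonts = {}

    def get(self, font, ch):
        return self._fonts.get(font, {}).get(ch)

    def put(self, font, ch, data):
        self._fonts.setdefault(font, {})[ch] = data
```

A frozen dataclass is used here so the font key is hashable, mirroring how a LOGFONT record can serve as a dictionary key once it has a suitable hash and equality comparison.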

I'm caching the font metrics, the individual glyph metrics and the individual TT glyph polygon data. It would be trivial to add kerning data as well. As could be expected, only the caching of TT polygon data seems to have a significant impact on performance.

I have not tried to profile or optimize it in any way.

@AngusJohnson
Contributor

While I haven't yet looked at the code, I've given it a quick test drive and it seems to work just fine.

Where do I collect my paycheck? :-)

I recommend we double your salary :).

@andersmelander
Member Author

Changes are now in the TextToPath-cache branch.

@andersmelander
Member Author

I tried enhancing the cache replacement policy to make it more resistant to thrashing. Since I store the hit count on each individual glyph, there's plenty of data available to make this possible.

My results show that the gain is nominal. First, one has to make the cache really small before items start being replaced. The size of a single glyph is only 400-500 bytes, so with a default maximum cache size of 256 KB there's room for approximately 500 different glyphs. When the polygon benchmark example is executed it uses about 32 KB, caching ~60 glyphs. In other words: with typical use the cache will probably never thrash, so the added complexity (and overhead) of a more intelligent cache replacement policy is simply not worth it.

FTR, the improved algorithm I used considers the "worth" of an item before purging it from the cache.
When the cache must be trimmed to make room for a new entry, I look at the LRU item. If the hit count of that item is greater than the sum of the hit counts of all the remaining items, then the item is considered too valuable to be purged and I move on to the next item in the LRU list. A downside of this is that once an entry has become very valuable it is almost impossible to get it out of the cache, no matter how old and irrelevant it becomes - a bit like Russian politicians. To solve this problem I would need to add some sort of aging mechanism that devalues items with age.
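The "worth"-aware eviction walk could be sketched roughly as follows (a Python sketch, not the Delphi code from the branch; the class and method names are hypothetical, and the aging mechanism mentioned above is deliberately left out):

```python
from collections import OrderedDict

class WorthAwareLRUCache:
    """LRU cache that refuses to evict items whose hit count exceeds
    the combined hit count of all other cached items."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._items = OrderedDict()  # key -> [value, hit_count], LRU first

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        entry[1] += 1                  # record the hit
        self._items.move_to_end(key)   # mark as most recently used
        return entry[0]

    def put(self, key, value):
        if key in self._items:
            self._items[key][0] = value
            self._items.move_to_end(key)
            return
        while len(self._items) >= self.max_items:
            self._evict_one()
        self._items[key] = [value, 0]

    def _evict_one(self):
        # Walk from least to most recently used; skip any item whose hit
        # count is greater than the sum of all the other items' hits.
        total_hits = sum(hits for _, hits in self._items.values())
        for key, (_, hits) in self._items.items():
            if hits <= total_hits - hits:
                del self._items[key]
                return
        # Every item is "too valuable": fall back to plain LRU eviction.
        self._items.popitem(last=False)
```

This also makes the described failure mode visible: once an item's hit count dominates the total, only the final fallback can ever remove it, which is why some aging/devaluation step would be needed.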

@andersmelander
Member Author

The current cache implementation is, by design, not thread safe.
It would be relatively simple to make it thread safe, but since it is only used by the existing TextToPath functions, which are themselves not thread safe, I didn't see any point in doing so.
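For illustration, the "relatively simple" way to retrofit thread safety would be to guard the cache with a lock, roughly like this Python sketch (the Delphi equivalent would use e.g. a TCriticalSection; ThreadSafeGlyphCache and the inner cache interface are hypothetical names):

```python
import threading

class ThreadSafeGlyphCache:
    """Serializes all access to a non-thread-safe inner cache."""

    def __init__(self, inner):
        self._inner = inner            # the existing, unsynchronized cache
        self._lock = threading.Lock()

    def get(self, font, ch):
        with self._lock:
            return self._inner.get(font, ch)

    def put(self, font, ch, data):
        with self._lock:
            self._inner.put(font, ch, data)
```

The cost is a lock acquisition per lookup, which is one reason not to add it while the callers are single-threaded anyway.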

The question, however, is whether it is by design that TextToPath isn't thread safe?

@andersmelander
Member Author

I've done a bit of quick benchmarking.

  • Caching the raw font outline data, as returned by GDI, improves text rendering performance by 95-150%, depending on the renderer used (VPR or VPR2).
  • Caching the path (TArrayOfArrayOfFloatPoint) instead improves performance by 135-220%.

A performance increase of 100% means that rendering takes half the time it did before, or that twice as much text can be rendered in the same time. So cached text is roughly two to three times faster than uncached text.
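The percentage-to-factor arithmetic used above can be written out explicitly; this tiny helper is only an illustration of the conversion, not part of the patch:

```python
def speedup_factor(improvement_pct):
    """Convert a throughput improvement in percent into a speedup factor.

    A +100% improvement means twice the throughput, i.e. rendering
    takes half the time it did before.
    """
    return 1.0 + improvement_pct / 100.0
```

Applying it to the measured range for path caching, 135-220% corresponds to a factor between roughly 2.4x and 3.2x.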

The variance in gain between VPR and VPR2 is due to the performance of the two rasterizers: the faster the rasterizer, the smaller the rasterization overhead and the greater the relative benefit of the cache.

None of the cache implementations has been optimized yet, but I expect any optimization gains to be dwarfed by the performance improvements already achieved by this proof-of-concept implementation.

@andersmelander andersmelander added this to To do in 3.x (New features) via automation Apr 18, 2019