racket-xp-mode: Handling very large files #522

greghendershott · 2021-02-25T14:26:04Z

Via Racket Slack, @samth supplied this example file which is over 8 million bytes and 86,000 lines long.

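It seems to take nearly 60 seconds for the Racket Mode back end to run check-syntax and prepare the response. The timings logged for one run (in milliseconds) total about 51 seconds, nearly all of it in the first step: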

[  debug] racket-mode: 47913 cpu | 48131 real | 3047 gc <= drracket/check-syntax/expanded-expression
[  debug] racket-mode:    0 cpu |    0 real |    0 gc <= drracket/check-syntax/expansion-completed
[  debug] racket-mode:   50 cpu |   50 real |    2 gc <= defs/uses
[  debug] racket-mode:  732 cpu |  733 real |   82 gc <= get-annotations
[  debug] racket-mode:  114 cpu |  114 real |   67 gc <= imports
[  debug] racket-mode: 51073 cpu | 51293 real | 4029 gc <= total /var/tmp/samth-huge.rkt

The response is a single huge s-expression. After our elisp-writeln, which itself takes about 2 seconds, it is 20,452,540 bytes long (roughly 20 MB).

Emacs then seems to freeze when reading the response.

I'm not yet sure how much time is spent attempting an Emacs Lisp read of the process buffer text. Since the process filter will be getting text in smaller chunks, the read is being attempted multiple times before succeeding.
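To make the repeated-read cost concrete, here is a minimal sketch of the pattern, assuming each response is a single parenthesized list. It is not Racket Mode's actual process filter; `my-feed-chunk` and `my-read-attempts` are hypothetical names used only for illustration. Each incoming chunk is appended to a buffer and `read` is retried from the start, so a 20 MB response arriving in small chunks gets re-parsed many times before one attempt finally succeeds.

```elisp
(defvar my-read-attempts 0
  "Number of `read' attempts so far, to make the repeated work visible.")

(defun my-feed-chunk (buffer text handler)
  "Append TEXT to BUFFER, then call HANDLER on each complete sexp found.
An incomplete trailing sexp stays in BUFFER until more text arrives."
  (with-current-buffer buffer
    (goto-char (point-max))
    (insert text)
    (let ((more t))
      (while more
        (goto-char (point-min))
        (setq my-read-attempts (1+ my-read-attempts))
        (condition-case nil
            ;; `read' starts at point, i.e. it re-parses everything
            ;; accumulated so far on every attempt.
            (let ((sexp (read buffer)))
              (delete-region (point-min) (point))
              (funcall handler sexp))
          ;; `read' signals `end-of-file' when the sexp is incomplete
          ;; (or only whitespace remains): wait for the next chunk.
          (end-of-file (setq more nil)))))))

;; Usage: one response split across three chunks.
(with-temp-buffer
  (let ((buf (current-buffer))
        (got nil))
    (dolist (chunk '("(ok (a " "b) " "(c))\n"))
      (my-feed-chunk buf chunk (lambda (sexp) (push sexp got))))
    got))
;; => ((ok (a b) (c)))
```

With N chunks the total parsing work grows roughly quadratically, which would be one plausible contributor to the freeze; measuring it is still needed to confirm.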

If it were to get past that, I'm not sure how much time would be spent in the command response handler, which does a dolist over the response sexpr and adds text properties to the buffer.

Clearly the current design isn't intended or able to handle files this large. A 5,000 line file like drracket/private/unit.rkt is closer to the envisioned definition of "large".

What I don't yet know is what to do about it.

Simple mitigation

Of course there can be a mitigation: we can supply a function for people to add to racket-mode-hook in place of racket-xp-mode itself. Instead of unconditionally enabling that mode, it would check buffer-size and enable the mode only for buffers below some size limit.
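A minimal sketch of such a hook function might look like the following. The names `my-racket-xp-max-buffer-size` and `my-racket-xp-mode-maybe` are hypothetical, and the threshold is an arbitrary placeholder rather than a measured recommendation.

```elisp
(defvar my-racket-xp-max-buffer-size 1000000
  "Hypothetical limit, in characters, above which `racket-xp-mode' stays off.
The value is an arbitrary placeholder; tune it for your machine.")

(defun my-racket-xp-mode-maybe ()
  "Enable `racket-xp-mode' unless the current buffer is very large."
  (if (> (buffer-size) my-racket-xp-max-buffer-size)
      (message "Buffer has %d characters; not enabling racket-xp-mode"
               (buffer-size))
    (racket-xp-mode 1)))

;; Use this instead of adding `racket-xp-mode' itself to the hook.
(add-hook 'racket-mode-hook #'my-racket-xp-mode-maybe)
```

The drawback is the one raised later in this thread: it relies on each user changing their configuration.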

Another mitigation would leave the hook function alone and instead have racket-xp-mode itself check buffer-size and act differently for large buffers: for example, set racket-xp-after-change-refresh-delay to nil for such a buffer, and have the manual racket-xp-annotate command warn about such buffers.

"Streaming"

Instead of returning a single response, the back end command could send a stream of notifications.

Although this would probably solve the issue of Emacs "freezing", it's not clear that ~60 seconds of such notifications and updates to the buffer is going to be a good or coherent experience. For example, what if the user edits, prompting a new check-syntax? How/when do we delete text properties for the previous generation?
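One conceivable answer to the "previous generation" question is to tag every request and notification with a generation number. The sketch below is only an illustration of that idea, not a design decision: the `my-xp-` names and the `my-xp-note` property are hypothetical, and it ignores the harder problem that positions in late notifications may no longer match the edited buffer.

```elisp
(defvar-local my-xp-generation 0
  "Generation number of the newest check-syntax request for this buffer.")

(defun my-xp-begin-generation ()
  "Start a new generation and drop annotations left by earlier ones.
Return the new generation number, to be sent along with the request."
  (with-silent-modifications
    (remove-text-properties (point-min) (point-max) '(my-xp-note nil)))
  (setq my-xp-generation (1+ my-xp-generation)))

(defun my-xp-handle-notification (generation beg end note)
  "Put NOTE on BEG..END unless GENERATION is stale.
Return non-nil when the notification was applied."
  (when (= generation my-xp-generation)
    ;; Clamp positions, since the buffer may have changed size.
    (let ((b (max (point-min) (min beg (point-max))))
          (e (max (point-min) (min end (point-max)))))
      (with-silent-modifications
        (put-text-property b e 'my-xp-note note)))
    t))
```

Under this scheme an edit would call `my-xp-begin-generation` before issuing the new check-syntax, and any notifications still arriving from the older run would be silently discarded.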

Don't give Emacs the data and use text properties, at all

Another idea is not to return the data in a command response and insert it as text properties; instead hold it in the back end. Add a command to query it, e.g. "What are the annotations for this interval of the buffer?" Drawbacks here include managing a new source of state in the back end. Also, we consult properties in an Emacs 'pre-redisplay-functions hook, so we can update things on screen as the user navigates. Now each such movement of point will need to issue a command to the back end; will that be fast enough to be satisfactory?

Other?

Probably there are other ideas, which I'm not yet thinking of, with their own tradeoffs, which I don't yet know.

samth · 2021-02-25T14:42:07Z

My feelings are:

racket-mode and racket-xp-mode should themselves prevent emacs from getting stuck, rather than require additional configuration for that.
47 seconds for check-syntax is too long, since expansion takes only 1/10th of that time.
Check syntax probably needs to be more streaming as well.

greghendershott · 2021-02-25T18:59:00Z

> racket-mode and racket-xp-mode should themselves prevent emacs from getting stuck, rather than require additional configuration for that.

I plan to merge commit 6a911b1 for this.

> Check syntax probably needs to be more streaming as well.

I think it already is -- it calls the various syncheck methods one by one, as it discovers things to report.

The Racket Mode back end gathers these into a single response.

Partly because that is convenient for "normal" loads. But also because it does some massaging, e.g. turning the one-way arrows into two-way definitions <-> uses, for the kind of "navigate among things" UX you need in lieu of graphical arrows.
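As a rough illustration of that massaging step, the sketch below turns a list of one-way arrows into two lookup tables. The real work happens in the Racket back end; this is written in Emacs Lisp only for illustration, and the function name and the (DEF . USE) representation are assumptions.

```elisp
(defun my-arrows-to-two-way (arrows)
  "Turn ARROWS, a list of (DEF . USE) positions, into two hash tables.
Return (DEF->USES . USE->DEF): the first maps each definition position
to a list of its use positions, the second maps each use to its definition."
  (let ((def->uses (make-hash-table :test #'eql))
        (use->def (make-hash-table :test #'eql)))
    (dolist (arrow arrows)
      (let ((def (car arrow))
            (use (cdr arrow)))
        (push use (gethash def def->uses))
        (puthash use def use->def)))
    (cons def->uses use->def)))

;; Usage: position 10 is used at 40 and 75, position 20 is used at 60.
(let ((tables (my-arrows-to-two-way '((10 . 40) (10 . 75) (20 . 60)))))
  (list (gethash 10 (car tables))    ; uses of the definition at 10
        (gethash 60 (cdr tables))))  ; definition for the use at 60
;; => ((75 40) 20)
```

The point is that the complete two-way view only exists once every arrow has been seen, which is one reason a single gathered response is convenient.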

I could "streamify" that. But ...

> 47 seconds for check-syntax is too long, since expansion takes only 1/10th of that time.

Even modulo the gathering/massaging work Racket Mode's back end does -- for instance, if I just make all the syncheck methods no-ops -- running check-syntax itself takes a good 30-40 seconds.

So. Streaming the response would indeed prevent Emacs from freezing, on these outlier extremely large files.

But I fear it won't help beyond that -- it won't result in a satisfying experience, if the buffer updates in dribbles over 30+ seconds.

My gut instinct ATM is that, unless check-syntax itself could work something like 10X faster (if that's even possible??), a streaming redesign of the Racket Mode back end wouldn't be worth it (given the mitigation above). Initially it would just break things for normal size files, until I found and fixed those breakages.

But that's just my gut instinct ATM. I'll mull more.

greghendershott · 2022-11-10T17:14:09Z

Just a comment about a likely future direction for this:

On the experimental hash-lang branch, I've been trying a model where the Emacs front end handles things via the default Emacs JIT font-lock mechanism. With this, Emacs does not eagerly font-lock the entire buffer. Instead it does so as/when portions of the buffer become visible (e.g. due to user navigation).

Furthermore, even when Emacs asks us to font-lock some portion of the buffer, we simply mark it "fontified", submit a request to the back end (for the lang lexer stuff to re-run), then return immediately. When the back end eventually returns a response, only then is the information used to apply properties (like face) to the buffer. In other words it is handled "asynchronously". Emacs isn't tied up waiting for the answer.
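The shape of that asynchronous flow can be sketched roughly as below. This is not the hash-lang branch's code: `my-fake-backend-request` is a hypothetical stand-in that uses a timer to simulate the back end, the token format (BEG END FACE) is assumed, and the sketch ignores any interplay with regular font-lock. In this version jit-lock itself records the region as fontified, and the function only submits the request.

```elisp
(require 'jit-lock)

(defun my-fake-backend-request (beg end callback)
  "Hypothetical stand-in for an asynchronous back end command.
After a short delay, call CALLBACK with a list of (BEG END FACE) tokens.
Return the timer."
  (run-with-timer 0.2 nil callback
                  (list (list beg end 'font-lock-comment-face))))

(defun my-apply-tokens (buffer tokens)
  "Apply TOKENS, a list of (BEG END FACE), as `face' properties in BUFFER."
  (when (buffer-live-p buffer)
    (with-current-buffer buffer
      (with-silent-modifications
        (dolist (token tokens)
          (let ((beg (nth 0 token))
                (end (nth 1 token))
                (face (nth 2 token)))
            ;; The buffer may have shrunk since the request was made.
            (put-text-property (min beg (point-max)) (min end (point-max))
                               'face face)))))))

(defun my-async-fontify (beg end)
  "`jit-lock' function: request tokens for BEG..END and return at once.
The properties are applied later, when the simulated reply arrives."
  (my-fake-backend-request
   beg end
   (apply-partially #'my-apply-tokens (current-buffer))))

;; To try it, evaluate this in the buffer to be fontified:
;; (jit-lock-register #'my-async-fontify)
```

Because `my-async-fontify` returns before any reply arrives, a freshly exposed region can briefly appear un-fontified, which matches the worst case described next.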

With this approach, performance seems good, even for very large buffers. Worst case, some buffer portion might briefly appear un-fontified, then resolve correctly. (Alas I have a recent report where it seems not to be working at all, about which I'm still completely mystified: #642. But presumably the basic idea is still valid (?).)

Assuming that approach pans out, and I do merge the hash-lang branch, I will probably unify on that approach for both check-syntax and hash-langs: buffer update -> back end recalculation -> notification to front end. I think that would go a long way toward closing this specific issue.

greghendershott added the bug label Feb 25, 2021

greghendershott added a commit that referenced this issue Feb 25, 2021

Add check-syntax buffer-size limit; see issue #522

df66707

greghendershott added a commit that referenced this issue Feb 25, 2021

Add check-syntax buffer-size limit; see issue #522

6a911b1
