Using racket-hash-lang-mode with org-mode source blocks #692

Open
bremner opened this issue Dec 23, 2023 · 18 comments
Labels: enhancement, racket-hash-lang-mode (Issues using racket-hash-lang-mode instead of "classic" racket-mode for edit buffers), waiting-for-response

Comments

@bremner
Contributor

bremner commented Dec 23, 2023

I'd like to be able to set a variable to tell racket-hash-lang-mode what the buffer syntax is. My use case is editing source blocks in org-mode where the #lang is implicit. Source might look like the following

#+begin_src smol :shebang "#lang smol/fun" :tangle lecture1/smol1.rkt
  (defvar x 10)
  (deffun (f y) (+ x y))
  (f 3)
#+end_src

I can configure this to translate smol to racket-hash-lang-mode when running org-edit-special (and it works without a file, thanks for that). I'm just experimenting with racket-hash-lang-mode, so I have a hard time seeing what is lost by not actually knowing the #lang, but I guess it must be something, right?
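A minimal sketch of the kind of mapping described above, assuming it is done through `org-src-lang-modes` (the reporter's actual configuration is not shown in this thread):

```elisp
;; Assumption: illustrative only, not the reporter's actual setup.
(require 'org-src)

;; `org-src-lang-modes' maps a source block language name to a major
;; mode name without the "-mode" suffix. With this entry, "smol" blocks
;; open in `racket-hash-lang-mode' when edited with C-c '
;; (`org-edit-special').
(add-to-list 'org-src-lang-modes '("smol" . racket-hash-lang))
```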

@greghendershott
Owner

> I'm just experimenting with racket-hash-lang-mode, so I have a hard time seeing what is lost by not actually knowing the #lang, but I guess it must be something, right?

In racket-hash-lang-mode the #lang line says where to find live Racket code supplied by the language -- which does quite a few things:

  • classify strings, comments, other (largely replacing the Emacs "char syntax" and parse-partial-sexp machinery)
  • color the above, i.e. font-lock
  • navigate, i.e. supplying a forward-sexp-function that uses the lang concept of "grouping"
  • indent, i.e. supplying an indent-line-function and indent-region-function

I don't use org babel stuff enough to know off the top of my head exactly what you're expecting, and how that would work. I'd be happy to take a look.

Already racket-repl-mode needs the idea of the #lang not being found in the buffer text. So there's some precedent and support for something like that.

@greghendershott
Owner

greghendershott commented Feb 2, 2024

> I don't use org babel stuff enough to know off the top of my head exactly what you're expecting, and how that would work. I'd be happy to take a look.

I looked at this again, trying to learn more about org source block handling (with which I have almost no hands-on experience). I read the org-source docs, and skimmed some of the source code.

Maybe I'm wrong, but:

  • Something like :shebang "#lang smol/fun" in your example seems like the right idea -- but unfortunately :shebang only applies when tangling.

  • There's also :prologue -- but that only applies when executing.

  • AFAICT there's no property that causes something to be prepended when editing (and removed when done editing).

So status quo I think your only choice is to include the #lang line as the first line of each org src block contents? 😞


Maybe we could stipulate some new header argument property like :racket-hash-lang and somehow connect that to racket-hash-lang-mode. But I'm not sure if the properties are intended to be extensible by third parties.

@greghendershott
Owner

> Maybe we could stipulate some new header argument property like :racket-hash-lang and somehow connect that to racket-hash-lang-mode. But I'm not sure if the properties are intended to be extensible by third parties.

Actually from looking at some examples like ob-c and ob-clojure, it seems the header argument properties are open. OK to add more, ad hoc, per lang, it seems?

Also org-edit-src-code funcalls a lang-specific "edit prep function" with the props in a babel-info data structure:

  (let ((edit-prep-func (intern (concat "org-babel-edit-prep:" lang))))
    (when (fboundp edit-prep-func)
      (funcall edit-prep-func babel-info)))

So conceivably I could define an org-babel-edit-prep:racket-hash-lang to look for a new property, called say ":hash-lang". Or maybe just reuse :shebang here.

And either way, do something with the value:

  • Maybe just prepend a #lang line in the new edit buffer (but how to delete it later when saving back, idk).

  • Or maybe add a text property (invisible), and use roughly the same approach that racket-repl-mode uses to use a lang lacking any #lang xxx text in the buffer.

@greghendershott
Owner

I pushed to a topic branch a simple commit ddd0f46. It just adds to racket-hash-lang.el:

(defun org-babel-edit-prep:racket-hash-lang (babel-info)
  "Recreate back-end hash-lang object using :shebang property.

`org-edit-src-code' calls us AFTER the `racket-hash-lang-mode'
buffer is created. So if there is a :shebang property with
\"#lang foo\", we need to recreate the back end object using the
option where we can supply this."
  (pcase babel-info
    (`(,_racket-hash-lang ,_contents ,props . ,_)
     (when-let (shebang (cdr (assq :shebang props)))
       ;; re-create back end hash-lang object
       (racket--hash-lang-delete)
       (setq-local racket--hash-lang-id
                   (racket--cmd/await
                    nil
                    `(hash-lang
                      create
                      ,(cl-incf racket--hash-lang-next-id)
                      ,shebang
                      ,(buffer-substring-no-properties (point-min) (point-max)))))))))

Given an example file issue-692.org:

#+begin_src racket-hash-lang :shebang "#lang rhombus"
  "string"
  ~string
  1 + 1
#+end_src

I did C-c ' and "it works".

I'm not sure it's quite that simple...?

@greghendershott
Owner

p.s. This also seems to support "DRY" patterns to avoid repeating a property like :shebang on every block.

Like a property for the whole org file:

#+PROPERTY: header-args:racket-hash-lang :shebang "#lang rhombus"

#+begin_src racket-hash-lang
  ~string
  1 + 1
#+end_src

As well as a property drawer for just a section within the org file.

As well as setting the Emacs Lisp variable org-babel-default-header-args:racket-hash-lang for system-wide, multiple org files.

(Again I have almost no experience with this stuff. I just read the docs and tried a quick example to confirm.)
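A minimal sketch of the system-wide variant mentioned above, assuming the source block language is named racket-hash-lang as in the example. Org looks up a variable named `org-babel-default-header-args:<lang>` for per-language defaults; the value used here is just the one from the example, not a recommendation:

```elisp
;; Assumption: the block language is "racket-hash-lang", as in the
;; example org file above.
(defvar org-babel-default-header-args:racket-hash-lang
  '((:shebang . "#lang rhombus"))
  "Default header arguments for racket-hash-lang source blocks.")
```

Because this is a `defvar`, it has no effect if the variable is already bound; in that case `setq` would be needed instead.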

@greghendershott greghendershott changed the title fake hash-lang Using racket-hash-lang-mode with org-mode source blocks Feb 2, 2024
@greghendershott greghendershott added the racket-hash-lang-mode Issues using racket-hash-lang-mode instead of "classic" racket-mode for edit buffers label Feb 2, 2024
@bremner
Contributor Author

bremner commented Feb 3, 2024 via email

@greghendershott
Owner

I think this is because racket-hash-lang-mode sets the buffer read-only until the back end hash-lang object becomes ready. Some comments from its code:

  ;; Create back end hash-lang object.
  ;;
  ;; On the one hand, `racket--cmd/await' would be simpler to use
  ;; here. On the other hand, when the back end isn't running, there's
  ;; a delay for that to start, during which the buffer isn't
  ;; displayed and Emacs seems frozen. On the third hand, if we use
  ;; `racket--cmd/async' naively the buffer could try to interact with
  ;; a back end object that doesn't yet exist, and error.
  ;;
  ;; Warm bowl of porridge: Make buffer read-only and use async
  ;; command to create hash-lang object. Only when the response
  ;; arrives, i.e. the back end object is ready, enable read/write and
  ;; set various hook functions that depend on `racket--hash-lang-id'.
  ;;
  ;; Also, handle the back end returning nil for the create -- meaning
  ;; there's no sufficiently new syntax-color-lib -- by downgrading to
  ;; plain `prog-mode'.

This works fine when the user interactively starts racket-hash-lang-mode, including via the org edit source command. If they are too quick, they just get a "sorry not ready" message.

However org-babel-tangle creates a buffer, calls racket-hash-lang-mode, then immediately tries to use the buffer. Quite reasonably; normally an Emacs mode is ready to use when the mode init returns. (I think org-babel-tangle would be fine if racket-hash-lang-mode blocked and didn't return until ready. But I'm not sure how to detect being called by that and blocking only in that case -- or if that's even the best strategy.)

I'm not sure how to balance all the competing needs here but I'll give it a think...


p.s. When tracing through the code, I noticed that :shebang has the side-effect of giving created files executable mode -- not just automatically adding that text as the first line. So will probably want to revisit that, too.

@greghendershott
Owner

> Do you want a separate issue for this, or just leave all the org-src+hash-lang mode discussion here?

I think for now one issue makes sense. Seems like some overlap among the three things -- edit, tangle, execute. At least wrt decisions like using :shebang and/or some new property, etc.

@greghendershott
Owner

greghendershott commented Feb 5, 2024

Update: I understand more about how things work. I think I see how to make both org-edit-src-code and org-babel-tangle work.

However I don't see how to make formatting work for the source block itself in the original .org buffer. What org does in that case is, create a hidden buffer using the lang mode -- e.g. named " *org-src-fontification:racket-hash-lang-mode*", copy the contents to that buffer, ensure font lock, and copy the faces back to the org buffer. That hidden buffer has no access to the org source block information, including the header argument properties like the lang. As a result, racket-hash-lang-mode can't know the actual lang. If there's no explicit #lang x in the source, it can't format for the hash-lang. 😞

I've looked for kludgy ways for that special buffer to discover the corresponding source block and buffer, and the metadata like the lang. So far I'm stumped. Even if I had some kludge, it might be fragile.


To go with the grain of Emacs and org-mode expectations, each lang gets its own major mode, which knows about formatting the lang. The concept of a single racket-hash-lang-mode acting for various langs doesn't really fit, and this problem is one example.

A better fit would be for each hash-lang to have its own Emacs major mode. This could be a small major mode using define-derived-mode around racket-hash-lang-mode. It could set some Emacs var to hold the lang, for the base racket-hash-lang-mode to use.

Maybe I could make a little Emacs macro to do the define-derived-mode, as well as the couple of org-babel settings.

@greghendershott
Owner

p.s. In my previous comment I'm referring to the formatting that happens when org-src-fontify-natively is non-nil. I'm not sure if you have that enabled, or if you do, whether you really care about the formatting working -- but it bugs me that it doesn't format appropriately, like it does for other modes.

greghendershott added a commit that referenced this issue Feb 12, 2024
More work for issue #692

As far as I can tell, org source blocks and org-babel are designed
around the assumption that each language will have its own major mode.
Otherwise, the source block language isn't available in all scenarios.

Therefore go with the flow: Even though racket-hash-lang-mode can
handle all hash-langs, people will need to derive from it a new major
mode for each lang they want to use with org source blocks.

A new racket-define-hash-lang macro makes this easier, as well as
handling related configuration like auto-mode-alist,
org-src-lang-modes, and org-babel-tangle-lang-exts.

With this we (intend to) fully support org source block editing and
tangling.

When it comes to executing, we supply a basic org-babel-execute:<lang>
function that knows how to run all hash-langs. However it only
supports the :result-type output -- not values. And it does not
support input :vars. In both cases, the syntax and semantics will of
course vary among languages. However a user could define an
org-babel-expand-body:<lang> to support :vars for a given lang. (But I
don't yet have any idea how :result-type value would work.)

One issue that comes up for all three (edit, tangle, execute) is what
to do about #lang lines. A Racket program must start with exactly one.

- edit: The user need not include one. We add one automatically when
they C-c ' (to edit in the dedicated edit buffer), to keep things like
racket-xp-mode happy. And we subtract it when writing back to the org
buffer.

- execute: We add one if the block lacks one.

- tangle: It's up to the user to start the /first/ block (for each
lang) with one, but no others.
greghendershott added a commit that referenced this issue Feb 14, 2024
Closes issue #692

As far as I can tell, org source blocks and org-babel are designed
around the assumption that each language will have its own major mode.
Otherwise, the source block language isn't available in all scenarios.

Therefore go with the flow: Even though racket-hash-lang-mode can
handle all hash-langs, people will need to derive from it a new major
mode for each lang they want to use with org source blocks.

A new racket-define-hash-lang macro makes this easier, as well as
handling related configuration like auto-mode-alist,
org-src-lang-modes, and org-babel-tangle-lang-exts.

With this we (intend to) fully support org source block
formatting, editing, and tangling.

When it comes to executing, we supply a basic org-babel-execute:<lang>
function that knows how to run all hash-langs. However it only
supports the :result-type output -- not values. And it does not
support input :vars. In both cases, the syntax and semantics will of
course vary among languages. However a user could define an
org-babel-expand-body:<lang> to support :vars for a given lang. (But I
don't yet have any idea how :result-type value would work.)

One issue that comes up for all four scenarios is what to do about
lang lines -- a Racket program must start with exactly one.

1. format: We use the back end hash-lang option to set the lang
separately (as we also use for the REPL).

2. edit: The user need not include one. We add one automatically when
they C-c ' (to edit in the dedicated edit buffer), to keep things like
racket-xp-mode happy. And we subtract it when writing back to the org
buffer.

3. execute: We add one if the block lacks one.

4. tangle: It's up to the user to start the /first/ block (for each
lang) with one, but not the remainder.
@greghendershott
Owner

OK this took a while but I think I understand the problem space now.

I have a solution that I believe is generally correct. (Caveat: Although I've tried to think of all the scenarios and edge cases, I've probably overlooked some.)

With commit 4491cc0 you get a racket-define-hash-lang macro.

In your original example, you can (racket-define-hash-lang smol ".smol") and all of {format, edit, tangle, execute} should "just work" when you use smol as the language for org source blocks.

Copy of the commit message:

racket-hash-lang: org source block {format edit tangle execute}

Closes issue #692.

As far as I can tell, org source blocks and org-babel are designed
around the assumption that each language will have its own major mode.
Otherwise, the source block language isn't available in all scenarios.

Therefore go with the flow: Even though racket-hash-lang-mode can
handle all hash-langs, people will need to derive from it a new major
mode for each lang they want to use with org source blocks.

A new racket-define-hash-lang macro makes this easier, as well as
handling related configuration like auto-mode-alist,
org-src-lang-modes, and org-babel-tangle-lang-exts.

With this we (intend to) fully support org source block
formatting, editing, and tangling.

When it comes to executing, we supply a basic org-babel-execute:<lang>
function that knows how to run all hash-langs. However it only
supports the :result-type output -- not values. And it does not
support input :vars. In both cases, the syntax and semantics will of
course vary among languages. However a user could define an
org-babel-expand-body:<lang> to support :vars for a given lang. (But I
don't yet have any idea how :result-type value would work.)

One issue that comes up for all four scenarios is what to do about
lang lines -- a Racket program must start with exactly one.

1. format: We use the back end hash-lang option to set the lang
separately (as we also use for the REPL).

2. edit: The user need not include one. We add one automatically when
they C-c ' (to edit in the dedicated edit buffer), to keep things like
racket-xp-mode happy. And we subtract it when writing back to the org
buffer.

3. execute: We add one if the block lacks one.

4. tangle: It's up to the user to start the /first/ block (for each
lang) with one, but not the remainder.

What do you think about the idea?

If you're able to try the commit from that branch, how does it work for you?

@bremner
Contributor Author

bremner commented Mar 4, 2024 via email

@greghendershott
Owner

> I think the macro is actually called racket-declare-hash-lang-for-org-babel ?

Yes, sorry about the name change. Originally I thought this might be relevant beyond org-babel, but then realized not, and decided to make the name more specific to reflect that.

> Would it make sense to update auto-mode-alist to open files with the
> given suffix in the newly created mode? Maybe that is mission creep,
> I'm not sure.

Similarly, originally I had the macro do exactly that, but took it out.

For a few real-world langs -- like racket, scribble, and rhombus -- there are langs people might prefer to use with a dedicated "classic" mode (like racket-mode, scribble-mode, or rhombus-mode) as opposed to racket-hash-lang-mode. So I think it is mission creep.

> somehow the tangled file does not have a "#lang"
> line. Is that expected?

Yes. I thought I had it in the doc string, but, you need to do the #lang line explicitly in the first block. org-tangle is just concatenating all those.

@bremner
Contributor Author

bremner commented Mar 5, 2024 via email

@greghendershott
Owner

> That makes sense, but then I wonder (lazily without looking at the
> code) what the file extension is needed for?

It's used to add the lang to org-babel-tangle-lang-exts.

IIUC so in the scenario where you tangle foo.org but don't supply a filename it will just do foo.<ext>?
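For illustration, a minimal sketch of what such a registration looks like when done by hand. The language name "smol" and the extension "rkt" are taken from the original example's :tangle target and are assumptions here, not something the macro is confirmed to produce:

```elisp
;; Assumption: "smol" and "rkt" are illustrative values only.
(require 'ob-tangle)

;; `org-babel-tangle-lang-exts' is an alist of (LANG . EXT) pairs.
;; When a block has no explicit tangle file name, org-babel-tangle
;; derives one from the org file's base name plus EXT.
(add-to-list 'org-babel-tangle-lang-exts '("smol" . "rkt"))
```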

> It also works to add :shebang to any block. As you
> observed in some previous email that has side effects with respect to
> permissions, but I just override that globally with something like
>
> #+PROPERTY: header-args :tangle-mode (identity #o644)
>
> I have not noticed any problems with abusing :shebang in this way. I use
> it quite extensively for tangling racket files (with classic racket-mode).

Although it's been a few weeks now, I recall from code spelunking that :shebang was something that org-babel-tangle knew how to handle specially. IOW there wasn't any obvious way for me to do a similar "first block only" behavior automatically from the src block language property, or any other property.

Given that, it's going to be up to the user to have the tangled output start with a #lang line... somehow -- either via the :shebang property, or, by including the #lang line literally in the first block. (I feel like the latter is simpler for me to document, which is what I did -- but I didn't mean to imply the former can't work or that you shouldn't prefer it.)

@bremner
Contributor Author

bremner commented Mar 6, 2024 via email

@greghendershott
Owner

I pushed another commit with some doc prose edits, to the issue-692 branch.

The doc string now:

(defmacro racket-declare-hash-lang-for-org-babel (lang ext)
  "Arrange for a Racket hash-lang to work with org-babel.

LANG should be an unquoted symbol, same as you would use in a
Racket =#lang= line.

EXT should be a string with the file extension for LANG, /not/
including any dot.

Examples:

  (racket-declare-hash-lang-for-org-babel rhombus \"rhm\")
  (racket-declare-hash-lang-for-org-babel scribble/manual \"scrbl\")

This macro will:

0. Define a major mode derived from `racket-hash-lang-mode' named
   `racket-hash-lang:LANG-mode'.

1. Add the language to `org-src-lang-modes' and
   `org-babel-tangle-lang-exts'.

2. Define an org-babel-edit-prep:LANG function.

3. Define an org-babel-execute:LANG function, which delegates to
   `racket--hash-lang-org-babel-execute'. See its doc string for
   more information -- including why this macro /cannot/ also
   define an org-babel-expand-body:LANG function.

4. Allow a buffer to omit the explicit #lang line, when it is
   created by `org-mode' for user editing or formatting of a
   source code block whose language property is LANG.

Discussion:

A valid Racket program consists of one outermost module per
source file, using one lang. Typically this is expressed using a
=#lang= line -- which must occur exactly once at the start of the
file. In such a buffer, `racket-hash-lang-mode' \"just works\".

When using multiple `org-mode' source blocks of the same lang,
the situation is trickier:

- Although you could start /every/ block with a lang line, that's
  tedious, and org-tangle will concatenate them into an invalid
  program.

- On the other hand, if you start only the /first/ block with a
  lang line, then various org-babel features won't work properly
  with the subsequent blocks. Basically this is because org
  creates a hidden buffer using `racket-hash-lang-mode', but the
  source block's lang property value is not available to that
  buffer, so it can't know what lang line to add automatically.

- Similarly, if you use the :shebang property to tangle
  correctly, that property value is not available in the hidden
  buffers created by org mode.

TL;DR: Org assumes that each lang will have a major mode that
knows enough to do what is required. To accommodate this it is
simplest to define a distinct major mode for each org source
block language."

Unfortunately I think that prose is still not great about explaining that shebang is another good/sufficient way to make tangling work.

Most of the discussion (attempts to) explain that org-mode creates hidden buffers, and those buffers get no access to any of these src block properties (source lang, shebang, whatever). That's what pushes us to derive a major mode for each source lang.

(The whole situation is kind of confusing. I want to make sure I understand it, and also try to make users not need to understand it more than necessary. So the macro tries to help do that, but the doc string still needs to explain the situation just in case... ugh.)

@bremner
Contributor Author

bremner commented Mar 9, 2024 via email

greghendershott added a commit that referenced this issue Mar 13, 2024
Closes issue #692
