Linkifying function doesn't strip certain Org markup from headers #3

Open
wasamasa opened this issue Aug 1, 2015 · 21 comments

@wasamasa

wasamasa commented Aug 1, 2015

I've tried this on my init.org and found two examples of stripping not being zealous enough:

  • TODO states are kept
  • Headings that are links break the TOC link
@wasamasa wasamasa changed the title Linkifying function doesn't strip any Org markup from headers Linkifying function doesn't strip certain Org markup from headers Aug 1, 2015
@snosov1
Owner

snosov1 commented Aug 1, 2015

Thx for testing and submitting!

  • When I just implemented the mode, GitHub didn't treat the TODO and DONE states correctly (i.e. it treated those as plain text). That's why I also ignored them. But it seems this has changed, so now I strip those states as well.
  • I implemented link "flattening" in the TOC. It makes the links work on GitHub, but the internal Org linking doesn't work. I will have to dig a bit more into why this happens and whether there's a good way to fix it.

@snosov1
Owner

snosov1 commented Aug 17, 2015

It seems like Org itself doesn't treat headings that are links properly.

If I call org-store-link on such heading and then call org-insert-link, the resulting link looks broken. C-c C-o doesn't work properly either.

So, I guess, such scenarios should be fixed in org first. (For example, in the same way GitHub handles this). For now, I'm closing this.

@snosov1 snosov1 closed this as completed Aug 17, 2015
@wasamasa
Author

Would you mind if I handed in a workaround for this?

@snosov1
Owner

snosov1 commented Aug 28, 2015

It depends on what you have in mind. If it's going to be a "more or less legitimate hack", we could consider adding it.

From what I've seen so far, I think the issue should be fixed on the Org side first. But maybe I just don't see an existing good solution.

@wasamasa
Author

My idea is simply adding an extra filter into the toc-org-hrefify-gh function which replaces Org's bracket links with their caption.
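
A minimal sketch of what such a filter could look like, written as a standalone string function rather than as a patch to toc-org-hrefify-gh. The name my/flatten-bracket-links is hypothetical, and the regexp is a simplified bracket-link pattern, not necessarily the one toc-org or Org uses:

```elisp
(defun my/flatten-bracket-links (text)
  "Replace each Org bracket link in TEXT with its caption.
A link without a caption is replaced with its target instead."
  (replace-regexp-in-string
   ;; Group 1 is the link target, group 2 the optional caption.
   "\\[\\[\\([^][]+\\)\\]\\(?:\\[\\([^][]+\\)\\]\\)?\\]"
   (lambda (match)
     ;; Prefer the caption; fall back to the target when there is none.
     (or (match-string 2 match) (match-string 1 match)))
   text t t))

;; (my/flatten-bracket-links "[[https://example.com][Example]] site")
;; => "Example site"
```

Passing t for both FIXEDCASE and LITERAL keeps the caption's case and any backslashes in it untouched.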

@wasamasa
Author

wasamasa commented Sep 9, 2015

I've pushed a TOC to GitHub now and noticed that I didn't actually want to strip links; the real problem is that Org markup isn't rendered inside links...

Assuming I have a =title= heading, it gets turned into [[#title][=title=]] which is displayed as =title=. I'd prefer =title= being turned into [[#title][title]] instead.

@snosov1
Owner

snosov1 commented Sep 9, 2015

Assuming I have a =title= heading, it gets turned into [[#title][=title=]] which is displayed as =title=. I'd prefer =title= being turned into [[#title][title]] instead.

Ok, I see. I guess it makes sense to strip the formatting from the link text.

Unfortunately, I'm a bit busy at the moment, so don't know when I'll have a chance to look into it.

@snosov1 snosov1 reopened this Sep 9, 2015
@wasamasa
Author

wasamasa commented Sep 9, 2015

That's no problem, I'll prepare another pull request once I've figured out how to do this properly.

@wasamasa
Author

OK, this is sort of ugly:

;; strip emphasis
(goto-char (point-min))
;; the beginning of a headline deeper than the first level is
;; recognized as emphasis, so let's jump to the end of the
;; leading asterisks, then operate from there to the end of the
;; line to strip the markup
(while (re-search-forward "^\\*+ " nil t)
  ;; as the RE includes an extra space at the end, hop a char
  ;; backwards to recognize an emphasized piece of text
  ;; immediately following it
  (backward-char)
  (while (re-search-forward org-emph-re (line-end-position) t)
    (replace-match "\\1\\4\\5")
    ;; see above
    (backward-char)))

BRB, taking a shower.
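
For trying the same loop outside of an Org buffer, here is a self-contained sketch that works on a heading title passed as a string. It assumes a simplified, hand-written emphasis regexp (my/emph-re, a hypothetical name) in place of org-emph-re, so the group numbers differ from the snippet above and the edge cases will not match Org's exactly:

```elisp
(defconst my/emph-re
  (concat "\\(^\\|[ \t('\"{-]\\)"         ; 1: character before the marker
          "\\([*/_=~+]\\)"                ; 2: opening marker
          "\\([^ \t\n]\\|[^ \t\n][^\n]*?[^ \t\n]\\)" ; 3: emphasized body
          "\\2"                           ; closing marker, same as group 2
          "\\($\\|[] \t.,:!?;'\")}-]\\)") ; 4: character after the marker
  "Simplified stand-in for `org-emph-re' (an assumption, not Org's regexp).")

(defun my/strip-emphasis (title)
  "Return TITLE with paired emphasis markers removed."
  (with-temp-buffer
    (insert title)
    (goto-char (point-min))
    (while (re-search-forward my/emph-re nil t)
      ;; Keep the surrounding characters and the body, drop the markers.
      (replace-match "\\1\\3\\4" t)
      ;; Step back so the character after this match can also serve as
      ;; the character before the next one.
      (unless (bobp) (backward-char)))
    (buffer-string)))

;; (my/strip-emphasis "Use =foo= and *bar*")  ; => "Use foo and bar"
```

Because the title is passed without its leading asterisks, the jump past "^\\*+ " from the original loop isn't needed here.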

@alphapapa
Contributor

FYI, I added alphapapa/org-make-toc@be4c1f8 to do this sort of thing by omitting any text in link descriptions that Org would normally make invisible.
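
To illustrate the general idea (this is not the code from that commit), here is a small sketch that drops every character carrying a non-nil invisible text property. It assumes the string has already been fontified so that the emphasis markers are marked invisible; that fontification step is not shown, and my/visible-text is a hypothetical name:

```elisp
(defun my/visible-text (string)
  "Return STRING without the characters that have an `invisible' property."
  (let ((pos 0)
        (len (length string))
        (parts '()))
    (while (< pos len)
      ;; Find the end of the run that shares the same `invisible' value.
      (let ((next (next-single-property-change pos 'invisible string len)))
        (unless (get-text-property pos 'invisible string)
          (push (substring-no-properties string pos next) parts))
        (setq pos next)))
    (apply #'concat (nreverse parts))))

;; Simulate a fontified "=title=" whose markers are invisible:
;; (my/visible-text (concat (propertize "=" 'invisible t)
;;                          "title"
;;                          (propertize "=" 'invisible t)))
;; => "title"
```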

@snosov1
Owner

snosov1 commented Nov 26, 2018

Hey, @alphapapa! I was going to steal your implementation, but after looking at it, I see that we, once again, have a different approach to the implementation =)

In my opinion, such functionality can only operate with the text of the file and shouldn't rely on any "external" libs to do the work. I see that you use font-lock and invisible properties that are invisible (pun intended) to any 3rd-party non-elisp implementation (i.e. org-ruby used by github)

@alphapapa
Copy link
Contributor

alphapapa commented Nov 26, 2018

In my opinion, such functionality can only operate with the text of the file and shouldn't rely on any "external" libs to do the work. I see that you use font-lock and invisible properties that are invisible (pun intended) to any 3rd-party non-elisp implementation (i.e. org-ruby used by github)

@snosov1 I don't understand what you mean. The target platform in this case is org-ruby, and AFAICT it simply does not properly handle emphasis characters in link descriptions. AFAIK the easiest and safest way to remove them is to remove characters that Org makes invisible.

What's the alternative? Messy regexp matching? Write your own Org parser in Elisp? Parse headings with org-element and special-case-parse the results to ignore emphasis?

@snosov1
Owner

snosov1 commented Nov 26, 2018

What's the alternative? Messy regexp matching?

Yes =) That's pretty much what I do with toc-org.

Parse headings with org-element and special-case-parse the results to ignore emphasis?

Also, a valid option, if one prefers.

AFAIK the easiest and safest way to remove them is to remove characters that Org makes invisible.

It depends on the definition of "safe". In my book, relying on something that the "target platform" never has access to, does not help in the safety department. Besides, in my experience, font-lock is the single most performance-hungry part of Emacs.

@alphapapa
Contributor

alphapapa commented Nov 26, 2018

Yes =) That's pretty much what I do with toc-org.

That seems to me too much like writing another Org parser inside of Emacs. :)

It depends on the definition of "safe". In my book, relying on something that the "target platform" never has access to, does not help in the safety department.

By "safe" I mean, most likely to work properly, least likely to have edge cases and unexpected behavior, etc.

I don't understand why it would be necessary to use something the target platform (i.e. org-ruby) would have access to. We're outputting data from Org, which is the canonical source, and which our tools run inside of. How is that relevant?

Besides, in my experience, font-lock is the single most performance-hungry part of Emacs.

That puzzles me. ISTM that font-locking in Emacs is extremely well optimized. But, anyway, we're talking here about fontifying a few words at a time in a dedicated buffer (that's how my code works, anyway; I do the same thing in helm-org-rifle for fontifying Org outline paths, and it's very fast).

@snosov1
Owner

snosov1 commented Nov 26, 2018

That seems to me too much like writing another Org parser inside of Emacs. :)

You don't need a full-blown Org parser to generate the TOC. Both toc-org and org-ruby are living evidence of that.

I don't understand why it would be necessary to use something the target platform (i.e. org-ruby) would have access to.

For this particular case, it might not be a real issue, but just me being too cautious/opinionated.

However, this is where I'm coming from. I had several asks/contributions previously that would introduce new behavior that controls the look of the TOC by setting some elisp variable. You can think of toc-org-max-depth, for example. The problem is that when one person has toc-org-max-depth equal to 2 and the other has it equal to 3, they will have constant merge conflicts if they live in a single repo. Since there's no way for org-ruby to get the correct value of the variable.

That's why I have the feature of specifying :TOC_3:. In that case, all the info can be obtained from the document itself, without relying on Emacs state on a particular machine. Since the very early days of toc-org I maintain it as an implicit rule - "The TOC look should only depend on the document contents and toc-org implementation".
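
As a rough illustration of that rule (not toc-org's actual implementation), the depth can be read from the heading line itself instead of from an Emacs variable. The function name and the DEFAULT argument are hypothetical:

```elisp
(defun my/toc-depth-from-heading (heading &optional default)
  "Return N from a :TOC_N: tag in HEADING, or DEFAULT if there is none."
  (if (string-match ":TOC_\\([0-9]+\\):" heading)
      (string-to-number (match-string 1 heading))
    default))

;; (my/toc-depth-from-heading "* Table of Contents :TOC_3:")  ; => 3
;; (my/toc-depth-from-heading "* Table of Contents" 2)        ; => 2
```

Everyone who opens the same file gets the same depth, whatever their local Emacs configuration says.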

For font-lock implementation, consider that 2 users have different versions of Org that treat some emphasis differently. The TOC for the same file can, potentially, be generated differently for them (and org-ruby has no idea about it).

ISTM that font-locking in Emacs is extremely well optimized.

I can't really make a competent comment on that, but try to open a 500 MB JSON file (in js-mode) and modify it. At least for me, the cursor/action delay grows to seconds. If I use find-file-literally, it works seamlessly. And that's not just an issue with js-mode. Pretty much any mode that I use will have issues with fontifying hundreds of megabytes of text. In all such cases I resort to find-file-literally and work from there.

we're talking here about fontifying a few words at a time

I see your point, But Still (tm)

@alphapapa
Contributor

The problem is that when one person has toc-org-max-depth equal to 2 and the other has it equal to 3, they will have constant merge conflicts if they live in a single repo. Since there's no way for org-ruby to get the correct value of the variable.

For font-lock implementation, consider that 2 users have different versions of Org that treat some emphasis differently. The TOC for the same file can, potentially, be generated differently for them (and org-ruby has no idea about it).

Sorry, I must be missing something: I don't understand how org-ruby's ability to understand settings is relevant. We're talking about generating the ToC from Org, before org-ruby sees it--specifically generating data that org-ruby can understand.

"The TOC look should only depend on the document contents and toc-org implementation".

And toc-org depends on Org and Emacs. It's turtles all the way down. :)

I can't really make a competent comment on that, but try to open a 500 MB JSON file (in js-mode) and modify it. At least for me, the cursor/action delay grows to seconds. If I use find-file-literally, it works seamlessly.

500 MB is a very large file to open in a text editor and expect it to be fontified in a way that can depend on the entire contents of the file before point. Have you tried vlf-mode? I don't know how well it handles font-locking, but it probably tries.

I see your point, But Still (tm)

So you're saying that toc-org is intended to be usable on 500 MB files? ;)

@snosov1
Owner

snosov1 commented Nov 26, 2018

We're talking about generating the ToC from Org, before org-ruby sees it-

Example (the numbers and the actual behavior are fictional):
Person 1 has Org version 8.3.2, which doesn't recognize =werwe\ne= as verbatim because of the newline. So the "font-lock" version inserts a TOC that preserves the equal signs.
Person 2 has a newer Org version, 10.3.1, that fixes this bug. So his (same) "font-lock" implementation now strips the equal signs, since the text is verbatim.

If Person 1 and Person 2 work with the same repository, they will be constantly generating different TOCs and experience merge conflicts (one will strip equal signs and the other will not).

And toc-org depends on Org and Emacs. It's turtles all the way down. :)

Well, yes. But not really. My reliance on Org is very limited - only the parts that are required for "links jumping" (C-c C-o) interact with Org (and you can find ugly branches, like that because of it). With Emacs, I mostly rely on its regexp engine and routines for text manipulation. And these proved their stability with time. I don't expect them to break backward compatibility or change in unexpected ways silently any time soon. So, you can say that my bottom turtle exists and is rock solid ;)

500 MB is a very large file to open in a text editor

Where do I open it if not in Emacs, then? =) And as we agree - it's not unreasonable to ask for it if you are willing to sacrifice the "fancy features" - Emacs does it easily.

Have you tried vlf-mode?

Yeah, I remember trying it out, but it didn't stick. I believe it didn't offer much on top of find-file-literally for me. Besides, if the first character in a JSON file is { and the last character is }, there's no way around parsing the whole thing.

So you're saying that toc-org is intended to be usable on 500 MB files? ;)

Actually, I'd never tried it before you asked. On my machine, the delay becomes unfriendly (~1 sec) for a file of 1.5 MB. Frankly, I don't know how inefficient my implementation is, but my guess is that using font-lock wouldn't help me in that regard =)

@alphapapa
Contributor

If Person 1 and Person 2 work with the same repository, they will be constantly generating different TOCs and experience merge conflicts (one will strip equal signs and the other will not).

In a hypothetical scenario as you describe, that could happen. In that case, I would suggest that the user on the older version upgrade Org. There would likely be other, more serious conflicts of behavior between such a large version gap. But if you want to handle it that way, up to you. :)

Where do I open it if not in Emacs, then?

I don't know how other editors would perform fontifying a 500 MB file in its entirety at once. Maybe there are some that perform better. If you are inclined to improve Emacs' font-lock code, I'm sure patches would be welcome, haha. ;)

@snosov1
Owner

snosov1 commented Nov 26, 2018

I would suggest that the user on the older version upgrade Org

I know you're the cutting-edge guy and update your software frequently. I, on the contrary, am a don't-touch-it-if-it-works guy. My org package is at version 20161118 and it's almost 2019 =) I think I mentioned it to you, but my laptop still has Ubuntu 12.04 installed and I see no reason to switch. I believe there's merit to both your strategy and mine.

I don't know how other editors

My path to Emacs wasn't like "Let me try this new shiny editor because I'm bored with my current editor". It was more like "I need a professional tool for life". Meaning: it's as reliable as it can get, it does only what I need/ask it to do (i.e. no bloat), it doesn't change to become unrecognizable with every release, and it can work at scale (read 500 MB files, work on the Android source base of ~10 GB, etc.). And Emacs proved itself in situations where other tools failed miserably.

If you are inclined to improve Emacs' font-lock code

Actually, I'm more on Douglas Crockford's side with respect to code coloring:

I want to say something about the coloring.
You've all seen syntax coloring, right?
That's something we put in our text editors to make it easier
for kindergartners to do programming.

So, when I can have it - great. When I don't have it - I don't mind too much =)

@alphapapa
Contributor

I know you're the cutting-edge guy and update your software frequently. I, on the contrary, am a don't-touch-it-if-it-works guy. My org package is at version 20161118 and it's almost 2019 =) I think I mentioned it to you, but my laptop still has Ubuntu 12.04 installed and I see no reason to switch. I believe there's merit to both your strategy and mine.

Not really. I only upgrade packages from MELPA when I need to (and I store them in git so I can rollback). I didn't upgrade from Org 8.2.10 to Org 9 for a long time after Org 9 came out. I run Ubuntu 14.04 on my main system (not as old as yours though, haha), and Debian Stable on others.

Actually, I'm more on Douglas Crockford's side with respect to code coloring:

But, I guess, in this respect, you would consider me a heretic:
[image: emacs]

@snosov1
Owner

snosov1 commented Nov 26, 2018

But, I guess, in this respect, you would consider me a heretic:

[runs away screaming] =)


3 participants