Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

org-roam-extract-subtree results in data loss when extracting folded headings with properties under them #2425

Open
n-hebert opened this issue Mar 14, 2024 · 16 comments

Comments

@n-hebert
Copy link

Description

Steps to Reproduce

  1. Navigate to a heading
  2. Run org-roam-extract-subtree

Backtrace

No error occurs

Expected Results

A file is created with the contents of the heading extracted.

Actual Results

The file is subtly pseudo-deleted. The contents are deleted from the Org buffer you're in, but they are not taken to the proper Roam file. This can result in a scare later where you believe the text has been utterly deleted when, in the extracted roam file, all you see is:

#+title: The failed-extract's headline text here

However, there is a saving grace: in the Emacs autosave file (e.g. 20240314-foo.org~), the full file contents exist.

Reproducibility

  • About 20% of the time?
    • It must have happened a half-dozen to a dozen times when I've extracted subtrees, and I only have 167 nodes total, many of which were created without extraction
    • I'll try and gain more knowledge here, as I go.

I'm not sure what's interacting here, but even the same snippet of text fails to repeat the same behaviour if I undo in my Org buffer and then run the extract again. Maybe it's extremely sensitive to the headline text (which I need to add foo to in order to get a new file to auto populate a distinct file name), but I doubt it.

Environment

  • Emacs: GNU Emacs 30.0.50 (build 4, x86_64-pc-linux-gnu, GTK+ Version 3.24.33, cairo version 1.16.0)
    of 2024-03-03
  • Framework: N/A
  • Org: Org mode version 9.6.15 (release_9.6.15 @ /opt/emacs-30.0.50/share/emacs/30.0.50/lisp/org/)
  • Org-roam: 2.2.2- sqlite-connector: sqlite-builtin

Further Environment Details

This happens on every Emacs from 27 forward. I've been updating various things to see if that helps

Open to beta-test

I'm happy to begin digging in to any targets to test. I see that my 2.2.2 has the exact same org-roam-extract-subtree as main, should I try a different branch?

@jmay
Copy link
Contributor

jmay commented Mar 14, 2024

@n-hebert I've been using org-roam-extract-subtree fairly regularly without problems. Any extra info you can provide would be appreciated. Some things in particular to check:

  • is the subtree you are extracting already an org-roam node? i.e. does that entry have an ID property.
  • is the original file immediately at the top level under org-roam-directory, or a subdirectory, or elsewhere?

@n-hebert
Copy link
Author

Great questions! At this point I'm totally confused as to what causes it so I'll take these suggestions and test on them.

Please feel free to add more as they appear

@akashpal-21
Copy link

akashpal-21 commented Mar 16, 2024

cannot replicate on my end - emacs 29.2 - works alright for me.

Please provide the exact file if possible - will try to replicate the issue.

@n-hebert
Copy link
Author

n-hebert commented Apr 6, 2024

About the Rarity

cannot replicate on my end - emacs 29.2 - works alright for me.

That's pretty reasonable to hear on my side; I don't expect anyone to replicate without significant & heavy daily usage of this function. It's what I would say is a real rare edge case, but tragically is also critically important to avoid, since it's a potential data loss.

It is still happening

I did get it to happen again in GNU Emacs 30.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.33, cairo version 1.16.0) of 2024-03-24.

One sign that it has happened is that there is no reference to the extracted subtree's title when you do a follow up C-c C-n i, but this is necessary and not sufficient to show the bug; I've seen that happen a lot and am still trying to find out why to better file a bug.

It's not input or file-name dependent

If I undo, delete files, db sync, and re-run the extraction, everything is fine. It is not input related, so pasting my exact files won't help.

That being understood...

The output files do look interesting

It's always like this, with the title in the one, and the whole data in the backup file, with the ID still on the top heading.

20240405164729-emacs_tasks.org

#+title: Emacs Tasks [1/4]

20240405164729-emacs_tasks.org~

Note that this is the alternative, backup file for Emacs.

* Emacs Tasks [1/4]
:PROPERTIES:
:CREATED:  [2024-03-09 Sat 11:29]
:ID:       0ea61981-1ad0-478c-823f-cf79d3e795c7
:END:
** TODO Learn more about Bookmarks
:PROPERTIES:
:CREATED:  [2024-03-09 Sat 11:29]
:END:
** TODO Learn more about Registers
:PROPERTIES:
:CREATED:  [2024-03-09 Sat 11:29]
:END:
** DONE Learn how to IRC in Emacs
CLOSED: [2024-03-09 Sat 13:36]
   :PROPERTIES:
   :CREATED:  [2024-02-10 Sat 09:43]
   :END:
** TODO Improve ERC experience
:PROPERTIES:
:CREATED:  [2024-03-09 Sat 13:36]
:END:

Hypothetical root cause

Could this be something to do with autosave mode? It's the only thing I'm thinking might explain the rarity. If the changes aren't atomic and emacs runs a save on the incomplete extraction in the background, maybe that's what's happening?

I'm absolutely stabbing in the dark here. I have no data. Please suggest tests! 😅

@n-hebert
Copy link
Author

n-hebert commented Apr 6, 2024

  • is the subtree you are extracting already an org-roam node? i.e. does that entry have an ID property.

Not an org-roam node, no. In general I only extract non-Roam nodes, as I'm converting all things to be more Roam-y

  • is the original file immediately at the top level under org-roam-directory, or a subdirectory, or elsewhere?

Directly under the org-roam directory, yes. I doubt this will change the response of the bug, but the times I've observed it have been directly under the org-roam directory.

@akashpal-21
Copy link

akashpal-21 commented Apr 6, 2024

I studied the code a little - I dont know personally still why it would fail - but it seems to save multiple times -- I have put some debug towards the end of the function to catch any errors if may occur - maybe we can study the function itself - but we need a process to catch the error more concretely - I cannot reproduce the error; and I am out of ideas what can trigger it

(defun org-roam-extract-subtree ()
  "Convert current subtree at point to a node, and extract it into a new file."
  (interactive)
  ;; Save the current cursor position and restore it after the function completes
  (save-excursion
    ;; Move the cursor to the beginning of the heading or the buffer's start
    (org-back-to-heading-or-point-min t)
    ;; Check if the cursor is already at the top-level node, if so, abort
    (when (bobp) (user-error "Already a top-level node"))
    ;; Generate an Org ID for the current heading if it doesn't already exist
    (org-id-get-create)
    ;; Save the current buffer
    (save-buffer)
    ;; Update the Org-roam database for the current file
    (org-roam-db-update-file)
    ;; Declare variables for template information, current node, and template string
    (let* ((template-info nil)
           (node (org-roam-node-at-point))
           (template (org-roam-format-template
                      ;; Generate a template for the new node
                      (string-trim (org-capture-fill-template org-roam-extract-new-file-path))
                      ;; Handle filling the template with information from the current node
                      (lambda (key default-val)
                        (let ((fn (intern key))
                              (node-fn (intern (concat "org-roam-node-" key)))
                              (ksym (intern (concat ":" key))))
                          ;; Check if there's a function bound to the key, if yes, call it with the current node
                          (cond
                           ((fboundp fn) (funcall fn node))
                           ;; Check if there's a node function bound to the key, if yes, call it with the current node
                           ((fboundp node-fn) (funcall node-fn node))
                           ;; Otherwise, prompt the user to enter the information
                           (t (let ((r (read-from-minibuffer (format "%s: " key) default-val)))
                                ;; Put the user-entered information into the template info
                                (plist-put template-info ksym r)
                                r)))))))
           ;; Prompt the user to enter the file path to extract the node to
           (file-path
            (expand-file-name
             ;; Use a template to suggest a file name to the user
             (read-file-name "Extract node to: "
                             ;; Specify the default directory as org-roam-directory
                             (file-name-as-directory org-roam-directory) template nil template)
             org-roam-directory)))
      ;; Check if the file path already exists, if yes, throw a user error and abort
      (when (file-exists-p file-path)
        (user-error "%s exists. Aborting" file-path))
      ;; Cut the current subtree from the buffer
      (org-cut-subtree)
      ;; Save the buffer again
      (save-buffer)
      ;; Open the file specified by file-path in a new buffer
      (with-current-buffer (find-file-noselect file-path)
        ;; Paste the subtree into the new buffer
        (org-paste-subtree)
        ;; Promote the subtree until it becomes a top-level heading in the new buffer
        (while (> (org-current-level) 1) (org-promote-subtree))
        ;; Save the new buffer again
        (message "Saving extracted node...")
        (condition-case err
            (progn
              (save-buffer)
              (message "Extracted node saved successfully!"))
          (error
           (message "Error while saving extracted node: %s" err)))
        ;; Promote the entire buffer as an Org-roam note
        (org-roam-promote-entire-buffer)
        ;; Save the new buffer again
        (message "Saving promoted buffer...")
        (condition-case err
            (progn
              (save-buffer)
              (message "Promoted buffer saved successfully!"))
          (error
           (message "Error while saving promoted buffer: %s" err)))))))

basically afai understand -- it first creates an id if it doesn't exist - then asks user for file name - then pastes the content - then promotes until it is a top level node then runs org-roam-promote-entire-buffer to take the :ID: from it and paste it at the top of the buffer (more as in deletes the headline then pastes it as title after the metadata)- I feel if its bugging - its towards the end wherein it is promoting the node.

(defun org-roam-promote-entire-buffer ()
  "Promote the current buffer.
Converts a file containing a single level-1 headline node to a file
node."
  (interactive)
  (unless (org-roam--buffer-promoteable-p)
    (user-error "Cannot promote: multiple root headings or there is extra file-level text"))
  (org-with-point-at 1
    (let ((title (nth 4 (org-heading-components)))
          (tags (org-get-tags)))
      (kill-whole-line)
      (org-roam-end-of-meta-data t)
      (insert "#+title: " title "\n")
      (when tags (org-roam-tag-add tags))
      (org-map-region #'org-promote (point-min) (point-max))
      (org-roam-db-update-file))))

If I had to guess its related to this function and probably my intuition tells me its when its executing org-map-region.
I think you wouldn't face this problem anymore in newer emacs - but keep eyes open and let know if it occurs again.

If #+title is present - then we are sure where we have reached in the process - keep this function on if you have fear about data loss - the debug messages will confirm that the final buffer is prepared. The function saves one last time after this.

(defun org-roam-promote-entire-buffer ()
  "Promote the current buffer.
Converts a file containing a single level-1 headline node to a file
node."
  (interactive)
  ;; Buffer is promotable if 1- There is only one level 1 headline and
  ;; 2 - There is no object including blank lines before first headline (h1)
  (unless (org-roam--buffer-promoteable-p)
    (user-error "Cannot promote: multiple root headings or there is extra file-level text"))
  ;; Debug message: Print confirmation that the buffer is promoteable
  (message "Buffer is promoteable. Proceeding with promotion...")
  (org-with-point-at 1
    (let ((title (nth 4 (org-heading-components)))
          (tags (org-get-tags)))
      (kill-whole-line)
      ;; Debug message: Print the extracted title and tags
      (message "Extracted title: %s, Tags: %s" title tags)
      ;; Go to end of :Properties: Drawer
      (org-roam-end-of-meta-data t)
      ;; Debug message: Print confirmation that the metadata section has been reached
      (message "Reached end of metadata section.")
      (insert "#+title: " title "\n")
      (when tags
        (org-roam-tag-add tags)
        ;; Debug message: Print confirmation that tags have been added
        (message "Tags added: %s" tags))
      ;; We have successfully reached here already
      ;; Promote each headline 1 level up, if error occurs - it MUST be here
      (org-map-region
       #'org-promote
       (point-min)
       (point-max))
      ;; Debug message: Print confirmation that promotion is completed
      (message "Promotion completed.")
      (org-roam-db-update-file))))

But I still struggle to understand what can go wrong here - it is a very simple procedure, and may be an extreme edge case. But until we have a way to consistently reproduce the error - solving it is next to impossible- the underlying code is extremely simple and well laid out.

@akashpal-21
Copy link

Another wild guess - but we are going into the realm of conspiracy theory with this one - is sometimes character encoding can throw wrench in the cogs -- I have utf-8 encoding for all my files for this purpose

(set-default-coding-systems 'utf-8)                                             ; set utf-8 encoding as default; iso-latin-1 is /very/ restrictive
(add-to-list 'file-coding-system-alist '("\\.org\\'" . utf-8))                  ; Do not revert to iso-latin-1 on revert buffer

But then again - wild conspiratorial guess.

@n-hebert
Copy link
Author

n-hebert commented Apr 8, 2024

All great input, thanks.

Am I correct to say that second snippet is written such that I can just add it to an init.el area and it will over-ride the current function with one with more debug output? I can try that; that will get us more info.
I can also try the strict file encoding; I'm not averse to that.

Guiding from your new information to me, my expectation that it is an autosave-mode interaction grows sharply. If it saves multiple times, it may get interrupted and fight with autosave-mode -- what happens in those cases?

I would not call this extremely rare or impossible to fix at all. We simply need to collect more data. It's not been too long that this has been open 😉 -- we'll get a track on it, I'm sure. It is a recurring failure, even if a bit slippery.

@akashpal-21
Copy link

Am I correct to say that second snippet is written such that I can just add it to an init.el area and it will over-ride the current function with one with more debug output?

There's no one way of doing something -- but important to remember that execution and time are very intimately related, so depending on when and where a function is last read, results might be different - if org-roam functions are lazy loaded for efficiency not all definitions might be read before hand and subject to the original function is evaluated later for whatever reason your init.el definition might be discarded if you do not use a different name - so we should ideally use a different name and use one of the different approaches to override the original function - one safe way imo is to use an advice override

(defun debug/org-roam-promote-entire-buffer ()
  "THIS IS A DEBUG FUNCTION
THIS FUNCTION IS TEMPORARILY DEFINED TO OVER-RIDE THE ORIGINAL FUNCTION

Promote the current buffer.
Converts a file containing a single level-1 headline node to a file
node."
  (interactive)
  ;; Buffer is promotable if 1- There is only one level 1 headline and
  ;; 2 - There is no object including blank lines before first headline (h1)
  (unless (org-roam--buffer-promoteable-p)
    (user-error "Cannot promote: multiple root headings or there is extra file-level text"))
  ;; Debug message: Print confirmation that the buffer is promoteable
  (message "Buffer is promoteable. Proceeding with promotion...")
  (org-with-point-at 1
    (let ((title (nth 4 (org-heading-components)))
          (tags (org-get-tags)))
      (kill-whole-line)
      ;; Debug message: Print the extracted title and tags
      (message "Extracted title: %s, Tags: %s" title tags)
      ;; Go to end of :Properties: Drawer
      (org-roam-end-of-meta-data t)
      ;; Debug message: Print confirmation that the metadata section has been reached
      (message "Reached end of metadata section.")
      (insert "#+title: " title "\n")
      (when tags
        (org-roam-tag-add tags)
        ;; Debug message: Print confirmation that tags have been added
        (message "Tags added: %s" tags))
      ;; We have successfully reached here already
      ;; Promote each headline 1 level up, if error occurs - it MUST be here
      (org-map-region
       #'org-promote
       (point-min)
       (point-max))
      ;; Debug message: Print confirmation that promotion is completed
      (message "Promotion completed.")
      (org-roam-db-update-file))))

;; Override org-roam-promote-entire-buffer with debug version using advice
(advice-add 'org-roam-promote-entire-buffer :override #'debug/org-roam-promote-entire-buffer)

something like this would be more appropriate for your init.el in my understanding.

also consider turning on #'debug-on-error -- this will provide a trace on encountering errors which we can read also --

Finally,


> Backup Files (filename~):

Backup files are created by default when you save a file. They have the same name as the original file with a ~ character appended to the end.
For example, if you edit a file named example.txt, Emacs will create a backup file named example.txt~.
These backup files contain the contents of the original file as it was before the most recent save.
Backup files provide a simple and effective way to recover the contents of a file if something goes wrong during editing.

> Auto-save Files (#filename#):

Auto-save files are created periodically while you are editing a file. They have # characters appended to the beginning and end of the filename.
For example, if you are editing example.txt, Emacs will periodically create auto-save files named #example.txt#.
Auto-save files contain the contents of the buffer as it was last auto-saved. They are typically created every few minutes by default.
Auto-save files are useful for recovering your work in case Emacs crashes or your computer loses power unexpectedly.

My intuition tells me autosave has nothing to do -- but I can be wrong.

Please let know if you face the same error again, also look into your configuration for things -- for example you seem to introduce custom property metadata somehow,, check if they are valid and so on.

@n-hebert
Copy link
Author

n-hebert commented Apr 9, 2024

Sounds good, thanks for those snippets. I'll plug them in.
As to auto-save files, I didn't realize that the #hashmarked.files were called auto-saves.

I meant this:
http://xahlee.info/emacs/emacs/emacs_auto_save.html

My init.el has

;; Autosave. See more examples at http://xahlee.info/emacs/emacs/emacs_auto_save.html
(when (>= emacs-major-version 26)
  (setq auto-save-visited-mode 1)
  )

Hopefully that seems incredibly related. It writes to the active file that's opened.

Maybe the test here is to crank that up to every second and see how badly this function behaves while cutting a bunch of headings?

@akashpal-21
Copy link

Hopefully that seems incredibly related. It writes to the active file that's opened.

Maybe the test here is to crank that up to every second and see how badly this function behaves while cutting a bunch of headings?

I personally think this mode is incredibly dangerous, but not because it should conflict -- functions occur one after the other, there is no possibility that I know of in which when you execute two functions -- at the same time if it was possible -- somehow the result is a random linear combination of the two functions -- but ofcourse problems may arise if somehow these two activities were related by way a common yet to be determined function - but from studying the mode I couldn't spot any obvious common connexion -- but on an unrelated note: this mode undoes any benefit that Emacs tries to offer with its autosave files;

The general condition is of chaos and disequilibrium - we wish to create a controlled environment in which inputs and outputs are predictable -- and one way to avoid chaos is not to execute any and all random modes and functions, more times than not you'd find things don't mutually work together very well, things are not as you predicted - etc,

But subject to you having faith, and my ability to understand the problem - the autosaves are most probably not what's causing the problem -- your problems should be resolved in newer versions of org, consider subscribing to getting the latest org versions and live life without fear. If next time such a problem ever to occur, youd surely catch it in its root and it would be trivial to create condition cases to resolve it. Keep us updated,

Best.

@n-hebert
Copy link
Author

I deleted an earlier message that might have popped a notification, as, during writing about how I couldn't quite get it entirely pinned down, I actually pinned it down.

This took hours of dedicated testing work 😅

In brief:

  • Extract a folded heading
  • That has a property

In full, try this repo: https://github.com/n-hebert/roam-2425

Full steps and all details on how to get the bug with 100% reproducibility and totally input-dependent results.

Startlingly, the emacs auto save files don't seem to be default, so in this repo's case, the data loss truly occurs and there is no backup.org~ working for me to act as a saving grace in there.

Now we can probably start figuring out what's wrong in the code!

@n-hebert n-hebert changed the title org-roam-extract-subtree results in potential data loss org-roam-extract-subtree results in data loss when extracting folded headings with properties under them Apr 23, 2024
@akashpal-21
Copy link

Good work - #'org-roam-promote-entire-buffer fails in this case - we need to expand before killing entire line

(defun custom/org-roam-promote-entire-buffer ()
  "Promote the current buffer.
Converts a file containing a single level-1 headline node to a file
node."
  (interactive)
  (unless (org-roam--buffer-promoteable-p)
    (user-error "Cannot promote: multiple root headings or there is extra file-level text"))
  (org-with-point-at 1
    (let ((title (nth 4 (org-heading-components)))
          (tags (org-get-tags)))
      ;; expand before killing line
      (org-fold-show-all)
      (kill-whole-line)
      (org-roam-end-of-meta-data t)
      (insert "#+title: " title "\n")
      (when tags (org-roam-tag-add tags))
      (org-map-region #'org-promote (point-min) (point-max))
      (org-roam-db-update-file))))

(advice-add 'org-roam-promote-entire-buffer :override #'custom/org-roam-promote-entire-buffer)

this will fix it.

@akashpal-21
Copy link

I did some more testing - the bug seems to emanate from how #'(kill-whole-line) behaves. The behaviour is uncertain -- when on the beginning of the line it kills all folded content - if not on the first character it doesn't kill the folded content.

The following operates the same way -- very untuitive behaviour. Just for example -- use the above which explicitly unfolds the region.

(defun custom/org-roam-promote-entire-buffer ()
  "Promote the current buffer.
Converts a file containing a single level-1 headline node to a file
node."
  (interactive)
  (unless (org-roam--buffer-promoteable-p)
    (user-error "Cannot promote: multiple root headings or there is extra file-level text"))
  (org-with-point-at 1
    (let ((title (nth 4 (org-heading-components)))
          (tags (org-get-tags)))
      ;; kill-whole-line behaviour crooked
      ;; if on beginning of line kills folded text
      ;; if on point >1 doesn't kill folded text? why
      (forward-char)
      (kill-whole-line)
      (org-roam-end-of-meta-data t)
      (insert "#+title: " title "\n")
      (when tags (org-roam-tag-add tags))
      (org-map-region #'org-promote (point-min) (point-max))
      (org-roam-db-update-file))))

(advice-add 'org-roam-promote-entire-buffer :override #'custom/org-roam-promote-entire-buffer)

@akashpal-21
Copy link

Also consider formatting the text better -- instead of one new line character after inserting #+title -- two new lines looks better it doesn't put the content smashed just next to the title. Change the (insert "#+title: " title "\n") to (insert "#+title: " title "\n\n") -- one extra "\n" it would look better to put a new line between #+title and the content.

@n-hebert
Copy link
Author

n-hebert commented Apr 23, 2024

I did some more testing - the bug seems to emanate from how #'(kill-whole-line) behaves. The behaviour is uncertain -- when on the beginning of the line it kills all folded content - if not on the first character it doesn't kill the folded content.

Ah! I suspected this, but didn't quite pin it! This is why I wrote on my steps to just hit the down cursor twice. I noticed that doing so upped my hit rate to 100% and wanted to just capture it before asking why. 😅

It's quite a wacky behaviour that is not in any way obvious to a naive user just extracting the tree or killing a line, whatever the depth of your knowledge on the malfunction. Great find. Also thanks for pointing this out in specific, as you've restored a piece of my sanity. 😂

It sounds like we'll be able to commit a fix to org-roam to work around this, but it also sounds like there's an upstream, maybe emacs itself, where a more in depth discussion about kill-line semantics could take place. Could go either way upstream, so just expanding the content in this function makes so much sense. Costs basically nothing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants