Preserving indents and consecutive newlines #44

Open
daeh opened this issue Mar 14, 2021 · 3 comments

Comments

@daeh
daeh commented Mar 14, 2021

Context
I just found this repo and it looks really useful. I'm curious about the feasibility of preserving the whitespace features of Evernote files.

I made heavy use of indents and consecutive line breaks in Evernote. Would it be possible to preserve the whitespace features of the CDATA? It shouldn't cause any problems for vanilla Markdown editors, but it would allow editors like Typora to use the information.

E.g. when evernote2md processes this CDATA code:

<div>line 1</div>
<div><br /></div>
<div><br /></div>
<div><br /></div>
<div>line 5</div>
<div><br /></div>
<div><br /></div>
<div>level 1</div>
<div style="padding-left:40px;">level 2</div>
<div style="padding-left:80px;">level 3</div>
<div style="padding-left:40px;">level 2</div>

it currently generates this markdown:

line 1

line 5

level 1

level 2

level 3

level 2

whereas it would be fantastic if it generated this instead:

line 1



line 5


level 1
	level 2
		level 3
	level 2

Does this seem like a possible enhancement?

@wormi4ok
Owner

Hey @daeh ,

Thanks for the kind words!

I'm afraid this change would be too complicated to implement the way you expect it to work. Evernote generates many meaningless tags that I intentionally skip or remove from the original note to get clean Markdown. This is important for users who don't use plain text editors to work with Markdown.

Furthermore, this could work fine with very basic markup created with Evernote's own tools. Still, many people use the Evernote web clipper to capture content from websites, and rendering whitespace correctly for such content would require a headless browser under the hood, which is not an option.

As for the example, some tags could represent the semantics of the content better than whitespace. Exactly three lines between line 1 and line 5 could be represented as a table:

| line 1 |
|        |
|        |
|        |
| line 5 |

Different indentation levels are best reflected in lists.

* level 1
    * level 2
        * level 3
    * level 2

I will keep this issue open for some time to see if there is interest in this problem. So far, the use case looks pretty narrow: only people who use the Typora editor, write their notes in Evernote manually, and rely on whitespace in those notes would benefit from this feature.

@dge8
Contributor

dge8 commented Aug 30, 2021

+1

The combination of <div> and <br/> is causing me trouble too.

I will add that sometimes Evernote leaves out the <div>s, e.g.:

<div>This is a line.</div>
This is another line in the same paragraph.
<div><br/></div>
<div>This is a new paragraph.</div>
<div>This paragraph has a second line too.</div>

becomes this:

This is a line.
This is another line in the same paragraph.

This is a new paragraph.

This paragraph has a second line too.

My solution for now has been to comment out the extra newline for divs in godown.go (line 362, [here](https://github.com/wormi4ok/godown/blob/1bea1c1b0bac9b8a8980f3b23c980f785d4a1410/godown.go)) and then manually re-split the lines on those few notes where the lines were combined because of missing <div>s.

It seems to me that the better long-term solution would be for godown to still include newlines before and after a <div>, but to recognise when two <div>s are adjacent and only include a single newline in that case. A <br/> in a <div> would be an empty div (it seems to do that already).
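
To make the adjacent-div idea concrete, here is a minimal standalone sketch. It is not godown's code: `divsToLines` and both regular expressions are hypothetical names, and the sketch assumes a flat sequence of non-nested `<div>` elements containing plain text. Text outside any `<div>` (the missing-div case above) is simply ignored here, and a real fix would presumably live in the converter's HTML node handling rather than in regexes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// flatDiv matches one non-nested <div>...</div> and captures its inner text.
// Assumption: divs are not nested and contain no further markup of interest.
var flatDiv = regexp.MustCompile(`(?s)<div[^>]*>(.*?)</div>`)

// brOnly matches an inner text that is nothing but a single <br>, <br/> or <br />.
var brOnly = regexp.MustCompile(`^\s*<br\s*/?>\s*$`)

// divsToLines renders each div as one line and joins adjacent divs with a
// single newline. A div that only contains a <br/> becomes an empty line.
func divsToLines(html string) string {
	var lines []string
	for _, m := range flatDiv.FindAllStringSubmatch(html, -1) {
		inner := m[1]
		if brOnly.MatchString(inner) {
			inner = "" // empty div: contributes a blank line
		}
		lines = append(lines, inner)
	}
	return strings.Join(lines, "\n")
}

func main() {
	in := "<div>line 1</div>\n<div><br /></div>\n<div><br /></div>\n<div><br /></div>\n<div>line 5</div>"
	fmt.Printf("%q\n", divsToLines(in)) // prints "line 1\n\n\n\nline 5"
}
```

With this rule, the three `<br />` divs from the first comment survive as three blank lines, and two adjacent text divs end up on consecutive lines instead of in separate paragraphs.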

Another possibility is for evernote2md to simply strip the <div> tags before calling godown, but it seems plausible that this would break clipped websites.

For padding/indentation, what if evernote2md simply divided the padding-left attribute by 10 and prefixed that number of non-breaking space characters? Only for divs.
e.g.
<div style="padding-left:40px;">level 2</div>
becomes
<div>&nbsp;&nbsp;&nbsp;&nbsp;level 2</div>
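
A rough sketch of that padding rewrite, run as a pre-processing step on the note markup before it reaches the converter. `indentDivs` and `paddedDiv` are made-up names, and the pattern assumes the style attribute contains nothing but a `padding-left` value in pixels, as in the examples in this thread. Divs with other styles are left untouched.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// paddedDiv matches an opening <div> whose style attribute consists solely of
// a padding-left value in pixels, e.g. <div style="padding-left:40px;">.
var paddedDiv = regexp.MustCompile(`<div style="padding-left:\s*(\d+)px;?">`)

// indentDivs replaces each matched opening tag with a plain <div> followed by
// one &nbsp; entity per 10px of padding (integer division).
func indentDivs(html string) string {
	return paddedDiv.ReplaceAllStringFunc(html, func(tag string) string {
		m := paddedDiv.FindStringSubmatch(tag)
		px, err := strconv.Atoi(m[1])
		if err != nil {
			return tag // leave the tag untouched if the number does not parse
		}
		return "<div>" + strings.Repeat("&nbsp;", px/10)
	})
}

func main() {
	fmt.Println(indentDivs(`<div style="padding-left:40px;">level 2</div>`))
	// prints <div>&nbsp;&nbsp;&nbsp;&nbsp;level 2</div>
}
```

Whether the converter keeps those `&nbsp;` entities as leading whitespace in the Markdown output is a separate question that this sketch does not answer.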

@andrijast

I agree that this problem should be handled. I don't see a reason for the <br/> tag to be ignored, since dropping it makes rendering inconsistent and loses information.

For anyone interested, a workaround is to first replace all <br/> tags in the .enex file with a placeholder such as %br%, and then convert the file as usual. You then get plain %br% placeholders in your .md files, and you can use some regex trickery to format them to your liking.

Here are the substitutions I used:

/\n\n(?!%)/gm  ->  "\n"
/\n%br%/gm     ->  ""

Of course, this method doesn't cover all the cases, but I think it reduces manual intervention by a lot.
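
For those who would rather script the workaround in Go, here is one possible sketch of both steps. The function names are invented for illustration. Go's `regexp` package uses RE2 syntax and does not support lookaheads such as `(?!%)`, so the first substitution is emulated with a manual scan instead of a regex. The sample input in `main` is an assumed shape of the converter's output, not captured from a real run.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// brTag matches <br>, <br/> and <br /> in the exported note markup.
var brTag = regexp.MustCompile(`<br\s*/?>`)

// markBreaks is the pre-processing step: apply it to the .enex content
// before conversion so every line break survives as a %br% placeholder.
func markBreaks(enex string) string {
	return brTag.ReplaceAllString(enex, "%br%")
}

// collapseBlankLines emulates /\n\n(?!%)/g -> "\n": a double newline is
// reduced to a single one unless the next character is '%'.
func collapseBlankLines(md string) string {
	var b strings.Builder
	for i := 0; i < len(md); {
		if strings.HasPrefix(md[i:], "\n\n") && !strings.HasPrefix(md[i+2:], "%") {
			b.WriteByte('\n')
			i += 2
			continue
		}
		b.WriteByte(md[i])
		i++
	}
	return b.String()
}

// restoreBreaks is the post-processing step: apply both substitutions, in
// order, to the generated Markdown.
func restoreBreaks(md string) string {
	md = collapseBlankLines(md)
	return strings.ReplaceAll(md, "\n%br%", "") // /\n%br%/g -> ""
}

func main() {
	fmt.Println(markBreaks("<div><br /></div>"))
	// Assumed converter output for three <br/>-only divs between two lines.
	md := "line 1\n\n%br%\n\n%br%\n\n%br%\n\nline 5"
	fmt.Printf("%q\n", restoreBreaks(md))
}
```

On the assumed input above, the result keeps three blank lines between `line 1` and `line 5`. Note that ordinary paragraph breaks are also reduced to single newlines, which is what the first substitution does by design.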
