
improve markup regex to be consistent with latex & html export #117

Open: wants to merge 3 commits into base: develop

sipi
Member

@sipi sipi commented Aug 10, 2018

#114 one bold, verbatim, italic, underline or code character does not correctly end
#115 Feature request: manage +strike-through+

Improve the markup regexp so that it passes the tests in the file below (checked against LaTeX and HTML export). Also add support for +strike-through+ (although it does not actually strike the text through :(, the text is shown in red instead).

specification draft found here: https://orgmode.org/worg/dev/org-syntax.html#Emphasis_Markers

See below, some tests:

#+OPTIONS: ^:nil #disable super/sub script.

/Tested with latex & html export./

* match bold, italic, code, verbatim, strike-through but NOT UNDERLINE !!!
/Is a emacs org-export bug ? I think yes./

 (=test=) 

 (=test=)  (=m=)

 {=test=}  {=m=}

 '=test=' '=m='

 "=test=" "=m="

 (=test=  (=m=

 {=test=  {=m=

 '=test=  '=m=

 "=test=  "=m=

 (=test=' (=m=' 

 {=test=' {=m='

 '=test=) '=m=)

 "=test=' "=m='

  not match"=match="not match

 (=match=

 {=test=}

 (=match="

(=match=}

* should match

/Note : letter a and sequence "not match" should not match. Others should match./

=test=

=t=

a =test=

a =t=

a =test= a

a =t= a

=test test=

=t test=

a =test test=

a =t test t test t test= a

  =(test)=

  ={test}=

  not match =match=

  =m= not match =match=

=match= not match =m=

=m= not match =m=

=match= not match =match=

=t= =t= =t= =t= =t=

=match= =m= =match= =match= =m= =m= =m= =match= =m=

=it match= =it should match= =it match= =it should match= =it match= =it should match=

=match=m=match=   # match globally

=match=match=match=    # match globally

=match m=m match m=m match=    # match globally

=match=
  
  =match='

  =match=)

  =match=}

  =match="


* should not match
/Note : nothing should match./

=test=]

}=test={

}=test=(

<=test=>

«=test=»

a=test=a

[=test=]

=test={

=test=(

}=test=

)=test=

 =test =

 =test=a=test =

 =not match t=t not match t=t not match =

 not heading = test =

 not heading = test=a=test =

 not heading = not match t=t not match t=t not match =

 not heading = test=

 not heading = test=a=test=

 not heading = not match t=t not match t=t not match=


* mixed
/Note : "match" should match, "not match" should not match!/


=match= not match =match= 

=m= n =m=

=match= = not match = =match=

=m= not match =match= 

=match= not match =m=

This PR also adds support for strike-through.

The following three tests do not pass yet: the space just after the closing markup character is consumed by the match, so it can no longer serve as the leading delimiter of the next match.

=t= =t= =t= =t= =t=

=match= =m= =match= =match= =m= =m= =m= =match= =m=

=it match= =it should match= =it match= =it should match= =it match= =it should match=
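
The consumed-space failure can be reproduced outside the grammar with a minimal Python sketch. Both patterns below are hypothetical simplifications written for this illustration (only the `=` marker, no `=` inside the contents); they are not the regex from this PR. The first pattern consumes the delimiter after the closing marker, the second only asserts it with lookaround, which assumes the target regex engine supports lookahead and lookbehind.

```python
import re

# Hypothetical, simplified patterns for the "=" marker only; the real
# grammar pattern in this PR is more involved and is not reproduced here.
CONTENTS = r"[^\s=](?:[^=]*[^\s=])?"

# Variant 1: the delimiters around the markup are part of the match,
# so the trailing space is consumed and unavailable to the next match.
CONSUMING = re.compile(rf"(?:^|\s)=({CONTENTS})=(?:\s|$)")

# Variant 2: the delimiters are only asserted (lookbehind / lookahead),
# so the same space can close one span and open the next.
LOOKAROUND = re.compile(rf"(?:^|(?<=\s))=({CONTENTS})=(?=\s|$)")


def count_matches(pattern, line):
    """Return how many non-overlapping matches `pattern` finds in `line`."""
    return len(pattern.findall(line))


line = "=t= =t= =t= =t= =t="
print(count_matches(CONSUMING, line))   # every other span is skipped
print(count_matches(LOOKAROUND, line))  # all five spans are found
```

With the consuming variant only three of the five `=t=` spans match; with the lookaround variant all five do.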

Member Author

@sipi sipi left a comment


Some identified cases are not yet managed…
see #114

@sipi
Member Author

sipi commented Sep 13, 2018

🎉 I found a specification draft here: https://orgmode.org/worg/dev/org-syntax.html#Emphasis_Markers

PRE MARKER CONTENTS MARKER POST

PRE is a whitespace character, (, {, ' or a double quote. It can also be a beginning of line.

MARKER is a character among * (bold), = (verbatim), / (italic), + (strike-through), _ (underline), ~ (code).

CONTENTS is a string following the pattern:

BORDER BODY BORDER

BORDER can be any non-whitespace character except ,, ' or a double quote.

BODY can contain any character but may not span more than 3 lines.

BORDER and BODY are not separated by whitespaces.

CONTENTS can contain any object encountered in a paragraph when markup is “bold”, “italic”, “strike-through” or “underline”.

POST is a whitespace character, -, ., ,, :, !, ?, ', ), } or a double quote. It can also be an end of line.

PRE, MARKER, CONTENTS, MARKER and POST are not separated by whitespace characters.


Note that in fact POST can also be ;
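
As a rough illustration of how the draft above maps onto a regular expression, here is a minimal Python sketch. It is an assumption-laden reading of the spec, not the pattern used in org.tmLanguage.json: the name `EMPHASIS` and the helper `find_emphasis` are made up for this example, BODY is restricted to a single line (the draft allows up to 3), and `;` is included in POST as noted above.

```python
import re

# Sketch of PRE MARKER CONTENTS MARKER POST from the syntax draft.
# PRE and POST are asserted with lookaround so they are not consumed.
EMPHASIS = re.compile(
    r"""
    (?:^|(?<=[\s({'"]))        # PRE: line start, whitespace, ( { ' or "
    (?P<marker>[*=/+_~])       # MARKER: one of * = / + _ ~
    (?P<contents>
        [^\s,'"]               # BORDER: non-whitespace except , ' "
        (?:.*?[^\s,'"])?       # optional BODY followed by closing BORDER
    )
    (?P=marker)                # the same MARKER closes the span
    (?=[\s\-.,:;!?')}"]|$)     # POST: whitespace, punctuation or line end
    """,
    re.VERBOSE | re.MULTILINE,
)


def find_emphasis(text):
    """Return (marker, contents) pairs for each emphasis span in `text`."""
    return [(m.group("marker"), m.group("contents"))
            for m in EMPHASIS.finditer(text)]


print(find_emphasis("a =test= a"))
print(find_emphasis("a=test=a"))
print(find_emphasis("*bold* and /italic/ and +strike+;"))
```

Run against a few lines from the test file, this sketch matches `a =test= a` and rejects `a=test=a`, `[=test=]` and `=test =`, which agrees with the expectations listed there.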

@dmytrokyrychuk

@sipi I think you're meant to put changes in org.tmLanguage.template.json and then run npm run generate-syntaxes in order to update org.tmLanguage.json.

Member

@AdrieanKhisbe AdrieanKhisbe left a comment


Yes indeed @sipi, as @dmytrokyrychuk points out, changes first need to be made to the grammar template; then regenerate syntaxes/org.tmLanguage.json.

@sipi
Member Author

sipi commented Oct 29, 2018

Thanks for your reviews, I will look into that as soon as possible…
