Subscripting error? #9

Open
rbjones opened this issue Dec 26, 2019 · 10 comments

@rbjones
Collaborator

rbjones commented Dec 26, 2019

According to usr001, the down character is intended to behave like the LaTeX _ character.
That would mean subscripting only the next character.
It doesn't do that: it typically subscripts the rest of the name (I haven't figured out the exact meaning of the macro).
This is determined by the keyword file.

We should adjust the description and the reality to fit each other.
Personally, dropping a single character would suit me better, so I would be inclined to amend the keyword file, but I guess that would upset any users who actually like it the way it is.
But if we are to leave it alone we should change the description.

I would settle for adding "_" to the characters which terminate a subscript (and making the description accurate), since whenever I have a subscript inside an identifier it is always followed by an underbar.

@rbjones
Collaborator Author

rbjones commented Dec 27, 2019

Studying the macro definition more carefully, I can't see why it is not terminating the subscript at an underbar (which is what I would like it to do), so I will look at the code in sieve and see if I can spot a bug there.

@rbjones rbjones self-assigned this Dec 27, 2019
@RobArthan
Owner

RobArthan commented Dec 27, 2019

I can't find that statement in usr001.pp, but if it is there, it is out of date. The feature allowing multi-character subscripts was added some years ago at the express request of QinetiQ. Underscores do seem to terminate subscripts when I try it: pptex converts ABC⋎123_ into +ABC\PrIJ{123}_.

@rbjones
Collaborator Author

rbjones commented Dec 27, 2019

The passage I refer to is in section 9.9. I would be quite happy to update it to refer to the keyword file and to describe the action prescribed by the default keyword file. However, the actual behaviour does not seem to correspond to my reading of the regular expression, which leaves me stumped. I attach a recent document which uses the down character, but only ever once per identifier. I am finding that the subscript always continues to the end of the identifier, even though there is often an underbar on the way.
t055.pdf

@RobArthan
Owner

The regular expression in the sieve keyword file should read:

[^ ⌝%]|%[a-zA-Z]%|[a-zA-Z][a-zA-Z0-9][a-zA-Z0-9]|[0-9]+

which matches: (1) any single character other than space, tab, end-of-quotation or percent sign, (2) an alphanumeric identifier enclosed in percent signs, (3) an alphanumeric identifier, or, (4) a decimal number.

Unfortunately, the ⌝ character appears as ® in the sieve keyword file.
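
The four alternatives described above can be sketched in Python to see where a subscript should stop. This is only an illustration, not the sieve implementation: the pattern strings are a guess reconstructed from the prose description (the exact text in pptex.skw may differ), the names `subscript_argument` and `convert` are hypothetical, and taking the longest match across the alternatives is an assumption made so that a run of digits such as 123 is captured whole.

```python
import re

# Assumed reconstruction of the four alternatives from the description above.
# The real entry in the keyword file may be written differently.
ALTERNATIVES = [
    r"[^ \t⌝%]",                # (1) one character other than space, tab, ⌝ or %
    r"%[a-zA-Z][a-zA-Z0-9]*%",  # (2) alphanumeric identifier enclosed in % signs
    r"[a-zA-Z][a-zA-Z0-9]*",    # (3) alphanumeric identifier
    r"[0-9]+",                  # (4) decimal number
]
COMPILED = [re.compile(p) for p in ALTERNATIVES]


def subscript_argument(text):
    """Return the longest prefix of text matched by any alternative ('' if none)."""
    best = ""
    for pattern in COMPILED:
        m = pattern.match(text)
        if m and len(m.group()) > len(best):
            best = m.group()
    return best


def convert(source, down="⋎", macro=r"\PrIJ"):
    """Replace each down character and its argument by macro{argument}."""
    out = []
    i = 0
    while i < len(source):
        if source[i] == down:
            arg = subscript_argument(source[i + 1:])
            if arg:
                out.append(macro + "{" + arg + "}")
                i += 1 + len(arg)
                continue
        out.append(source[i])
        i += 1
    return "".join(out)


print(convert("ABC⋎123_"))
print(convert("ho⋎o_thm"))
```

Under these assumptions the underscore is never part of the argument, because it is excluded from the identifier and number alternatives and the single-character alternative consumes only one character.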

@RobArthan
Owner

Where is the problem in t055.pdf? (Did you mean to post the PDF?)

@rbjones
Collaborator Author

rbjones commented Dec 27, 2019

OK, so I changed pptex.skw to this (the * characters and a tab character were lost in your post, and maybe will be in mine when I push the button):

\PrIJ#[^ ⌝%]|%[a-zA-Z]%|[a-zA-Z][a-zA-Z0-9][a-zA-Z0-9]|[0-9]+

But I'm presuming that a one-character change will not affect my problem, so I didn't rebuild to check.
I should have posted t056.pdf. Judging by the subscripting, t056.tex says pretty much what you would expect, e.g. things like \PrIJ{o_thm}, so the question is: why did it include the "_"?
t056.pdf

@rbjones rbjones added the bug label Dec 28, 2019
@rbjones
Collaborator Author

rbjones commented Dec 28, 2019

This is now known (Rob's detective work) to be a bug caused by reading the keyword file and compiling its regular expressions in locale C, and then executing them in a utf8 locale.
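
To illustrate the kind of mismatch involved, here is a small Python sketch. It does not reproduce the sieve bug or its exact failure mode. It uses Python's bytes patterns as a stand-in for a byte-oriented (C locale) reading of a pattern and str patterns as a stand-in for a character-oriented (utf8) reading. The pattern is the assumed first alternative from the keyword file, and the function names are hypothetical.

```python
import re

# Assumed single-character alternative: anything but space, tab, ⌝ or %.
PATTERN_TEXT = "[^ \t⌝%]"

# Character-oriented reading: ⌝ is one member of the bracket expression.
AS_CHARS = re.compile(PATTERN_TEXT)

# Byte-oriented reading: ⌝ becomes the three separate bytes E2 8C 9D,
# each of which is excluded by the bracket expression on its own.
AS_BYTES = re.compile(PATTERN_TEXT.encode("utf-8"))


def match_as_chars(text):
    """Match at the start of text, treating it as characters."""
    m = AS_CHARS.match(text)
    return m.group() if m else None


def match_as_bytes(text):
    """Match at the start of text, treating it as UTF-8 encoded bytes."""
    m = AS_BYTES.match(text.encode("utf-8"))
    return m.group() if m else None


# ⋎ is encoded as E2 8B 8E, so it shares its first byte with ⌝.
print(match_as_chars("⋎"), match_as_bytes("⋎"))
# é is encoded as C3 A9, so the byte pattern matches only half of it.
print(match_as_chars("é"), match_as_bytes("é"))
```

The two readings disagree as soon as non-ASCII characters are involved: the byte-oriented pattern rejects any character whose encoding starts with a byte of ⌝, and otherwise consumes a single byte rather than a whole character.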

@rbjones
Collaborator Author

rbjones commented Dec 28, 2019

Among an as yet unknown number of changes needed if we are to process utf8 keyword files is selecting a default appropriate to the mode of working.
Hitherto it was assumed that the keyword files were ascii, and the same file was chosen whatever the mode.

@rbjones
Collaborator Author

rbjones commented Dec 29, 2019

This is all resolved on the utf8 branch.

@rbjones rbjones closed this as completed Dec 29, 2019
@rbjones rbjones reopened this Dec 29, 2019
@rbjones
Collaborator Author

rbjones commented Dec 29, 2019

I didn't do the update to usr001.
