Convert tab-defined tables into real tables #16

Open
Conal-Tuohy opened this issue Dec 13, 2016 · 8 comments

@Conal-Tuohy
Owner

Tables that are defined using tabs ("tabular sections") should be converted into TEI tables.

Need to clearly distinguish tabular sections from other uses of tabs, such as to indent paragraphs.

Some tabular sections will be unevenly tabulated, because a variable number of tabs will be used to achieve the same alignment on different paragraphs, depending on the width of the text in those paragraphs. These tabular sections should be excluded from automatic conversion to tables, and left as a residual to be manually edited.
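One way to screen out unevenly tabulated sections, as a minimal sketch only: treat a section as evenly tabulated when every paragraph contains the same (non-zero) number of tab characters. The function name `is_evenly_tabulated` and the equal-tab-count rule are assumptions made for illustration, not the converter's actual logic.

```python
def is_evenly_tabulated(paragraphs: list[str]) -> bool:
    """Return True if every paragraph has the same non-zero number of tabs.

    Assumption: a section where paragraphs use differing numbers of tabs to
    reach the same visual alignment counts as "uneven" and is left for
    manual editing.
    """
    tab_counts = {paragraph.count("\t") for paragraph in paragraphs}
    # Exactly one distinct count, and that count is not zero.
    return len(tab_counts) == 1 and 0 not in tab_counts


# Two tabs in every paragraph: a candidate for automatic conversion.
even = ["Oats\t12\t6", "Wheat\t7\t3"]
# Three tabs versus one tab to reach the same column: left as a residual.
uneven = ["Oats\t\t\t12", "A much longer name\t12"]

print(is_evenly_tabulated(even), is_evenly_tabulated(uneven))
```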

@Conal-Tuohy
Owner Author

Conal-Tuohy commented Dec 13, 2016

Probably best to deal with the paragraphs that use tabs for indentation as a separate issue, and come back to this more complex step once those tabs have been converted into paragraph indent formatting and tidied away.

@LucasHorseshoeBend
Collaborator

Is the features facet "tab alignment" supposed to capture these?
In some cases I think that is what is being shown, and in others I can't see it.

@Conal-Tuohy
Owner Author

Yes, the "tab alignment" facet value is supposed to identify letters containing a sequence of two or more paragraphs that contain tab characters, where the tabs are not at the start of the paragraph. Where a tab occurs only at the start of the paragraph, I've assumed that's not for aligning into columns, but rather just a paragraph indentation.

Maybe that's not a foolproof test, but it's the best I could come up with. Any suggestion for improvement?

Also, if you can point to an example of a table which is constructed with tab characters (rather than a Word table), but which doesn't belong to that facet, please post a link. Cheers!
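The "tab alignment" test described in this comment could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual implementation: the names `has_internal_tab`, `tabular_runs` and `has_tab_alignment` are hypothetical, and a letter is modelled simply as a list of paragraph strings.

```python
from itertools import groupby


def has_internal_tab(paragraph: str) -> bool:
    """True if a tab occurs somewhere other than the start of the paragraph.

    Assumption: leading tabs are paragraph indentation and are ignored.
    """
    return "\t" in paragraph.lstrip("\t")


def tabular_runs(paragraphs: list[str], min_length: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of consecutive qualifying paragraphs."""
    runs = []
    index = 0
    for qualifies, group in groupby(paragraphs, key=has_internal_tab):
        block = list(group)
        if qualifies and len(block) >= min_length:
            runs.append((index, index + len(block)))
        index += len(block)
    return runs


def has_tab_alignment(paragraphs: list[str]) -> bool:
    """True if the letter would belong to the "tab alignment" facet."""
    return bool(tabular_runs(paragraphs))


letter = ["\tDear Sir,", "Oats\t12", "Wheat\t7", "\tYours truly", "Lone\ttab"]
print(tabular_runs(letter))
```

A single paragraph with an internal tab (the last one above) is not counted, since the test requires a sequence of at least two.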

@LucasHorseshoeBend
Collaborator

This still causes problems in XProc, e.g. see 85-08-15a.
I had thought the solution was editorial: creating tables, which would work for the one above but is problematic in cases like 54-09-00. I know an editorial solution to this too; I think I just need to add one or two spaces before the tab. There are only 41 in a final state, so it's not insurmountable, and it's probably better for me to spend time on that than for you to try to find a tweak that will discriminate between cases. Views?

@Conal-Tuohy
Owner Author

Here are the documents referred to, both of which have a sequence of paragraphs containing tabs, which aren't converted to tables:
https://vmcp.rbg.vic.gov.au/id/85-08-15a
https://vmcp.rbg.vic.gov.au/id/54-09-00
NB actually these two cases might more appropriately be converted to lists rather than tables, though it's not a big deal if they are treated as tables.

@Conal-Tuohy Conal-Tuohy self-assigned this Sep 30, 2022
@LucasHorseshoeBend
Collaborator

Unfortunately, Word's list capability appears limited to strictly defined presets (lots of them), with the only options being font styling and the like.
I've tried and can't adapt any to reflect the way most of these documents are written.
I will find the "best" way for each case, if necessary by iteration.

@Conal-Tuohy
Owner Author

Conal-Tuohy commented Oct 3, 2022

@LucasHorseshoeBend yes, I agree that Word's "lists" aren't adequate for capturing these lists, and that comment about converting them to lists was more of a note to myself; I meant that the Word-to-TEI converter could convert them to a TEI list instead of a TEI table. But the difference between a TEI list and a TEI table with just two columns is not huge. I'd rather just fix this bug and get them converted to a table, and put off converting them to lists until later on, or never.
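For illustration, turning a run of tab-separated paragraphs into a TEI table might look something like the sketch below. It assumes the standard TEI `table`, `row` and `cell` elements in the TEI namespace; the function name `paragraphs_to_tei_table` is hypothetical, and the real converter works on Word documents rather than plain strings.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialise TEI elements without a namespace prefix.
ET.register_namespace("", TEI_NS)


def paragraphs_to_tei_table(paragraphs: list[str]) -> ET.Element:
    """Build a TEI table: one row per paragraph, one cell per tab-separated part.

    Assumption: the paragraphs are already known to be evenly tabulated.
    """
    table = ET.Element(f"{{{TEI_NS}}}table")
    for paragraph in paragraphs:
        row = ET.SubElement(table, f"{{{TEI_NS}}}row")
        for text in paragraph.split("\t"):
            cell = ET.SubElement(row, f"{{{TEI_NS}}}cell")
            cell.text = text.strip()
    return table


table = paragraphs_to_tei_table(["1\tsome text", "2\tdo do"])
print(ET.tostring(table, encoding="unicode"))
```

A two-column result like this is structurally close to a TEI list of labelled items, which is why the choice between the two matters little here.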

@LucasHorseshoeBend
Collaborator

I did amend "most of" the cases (I missed one block in one of the letters), so that where the XProc display said, e.g., "dodo", meaning two dittos under a previous entry, these have now been separated as "do do", and where a number is followed by a tab and then text, it now reads, e.g., "1 some text" instead of "1sometext".

Many of the cases would not work as tables because, say, the line above was set out with spaces rather than tabs, so a table would give a more misleading representation than text that is separated but not vertically aligned. Making a meaningful table in those cases would require editorial intervention anyway, so it's not worth getting rid of the bug, which would risk not picking up the resulting problem cases.
