Multi-line text blocks #278

Open
notator opened this issue Dec 13, 2021 · 17 comments

Comments

@notator
Contributor

notator commented Dec 13, 2021

Multi-line text blocks have been discussed before, but we don't seem to have reached any conclusion. I think we need them, and that they would be of general use.

In #277, I used a temporary strategy that involved defining the following two elements:

  • <string>:
    • Attribute text (a String): defines a string of text that will be rendered in a single line.
    • Content: none
  • <text-block>:
    • Attributes: none
    • Content: one or more <string> elements that define the lines of a multi-line text block.
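As a rough illustration of how a consumer might read this temporary structure, here is a minimal Python sketch. The sample fragment and the function name `string_lines` are hypothetical; only the `<text-block>` and `<string text="...">` shapes come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment using the two temporary elements described above.
SAMPLE = (
    "<text-block>"
    '<string text="line1"/>'
    '<string text="line2"/>'
    "</text-block>"
)


def string_lines(xml_source):
    """Return the text attribute of each <string> child, in document order."""
    block = ET.fromstring(xml_source)
    return [child.get("text", "") for child in block.findall("string")]


lines = string_lines(SAMPLE)
print(lines)
```

Each `<string>` maps to exactly one rendered line, so no whitespace handling is needed; the cost is that the text lives in attributes and cannot contain inline markup.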

Can this be improved? Is there a better way to do it?

@joeberkovitz
Contributor

I agree that we need them. I foresee style properties governing their font size, dimensions, borders and padding/margins in a natural way. Ultimately there should be a way of including these elements directly in a <page> so that text blocks can be freely interpolated with systems, e.g.

   <score>
       <page>
        <system .../>
        <text-block .../>
        <system .../>
      </page>
   </score>

@clnoel

clnoel commented Dec 13, 2021

I like this in general. We should say that any element that has String content (not an attribute) should be changed to take a <text-block> instead. Right now, this is directions->expression, directions->instruction, and part->part-abbreviation. (Hmm... do we even still need that last one?)

Caveats:

  1. The text-block needs a justification attribute that governs how the strings in the block justify relative to each other. (left|right|center)
  2. This is not a new issue here, but I want to raise it again, since I can't find it anywhere: This String type assumes the same font for the entirety of the string. We might need to allow for <em> and <strong>, where the fonts to be used by those tags get defined by the style properties. We might also allow for SMuFL code points using something like &segno;, which I remember being proposed somewhere but can't find right now. This would involve changes/clarification to the String type.
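To make the first caveat concrete, here is a small Python sketch of what "justify relative to each other" could mean. It assumes a monospaced font (so padding with spaces stands in for real glyph metrics), and the function name `justify_block` is made up for illustration.

```python
def justify_block(lines, justification="left"):
    """Pad every line to the width of the longest line in the block.

    Assumption: character count stands in for rendered width, which only
    holds for a monospaced font. A real renderer would measure glyphs.
    """
    width = max((len(line) for line in lines), default=0)
    if justification == "left":
        return [line.ljust(width) for line in lines]
    if justification == "right":
        return [line.rjust(width) for line in lines]
    if justification == "center":
        return [line.center(width) for line in lines]
    raise ValueError("justification must be left, right or center")


for row in justify_block(["ab", "abcd"], "right"):
    print(repr(row))
```

The point of the sketch is only that justification is a property of the block as a whole: no single line can be placed without knowing the width of its siblings.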

@samuelbradshaw

samuelbradshaw commented Dec 13, 2021

Instead of specifying a list of HTML tags like <em>, <strong>, etc. that can be borrowed and used in text blocks, what are your thoughts about supporting an <html-block> that can take HTML directly?

For example, here are two examples of credits from a hymnal. The formatting includes:

  • Specific font family and font size that isn't necessarily the same as the lyrics
  • Paragraphs with a hanging indent (<p> with a class)
  • Italicized text (<i>Text:</i>)
  • Emphasized text (<em>works</em>)
  • Small-caps text (<span> with a class)
  • Hyperlinks (<a href="…">Psalm 8:3–9</a>)

[Two images of hymnal credits: IMG_EA9363633A85-1, IMG_F7AFAC2231A9-1]

@notator
Contributor Author

notator commented Dec 13, 2021

Apparently, we can have an element that is a block of HTML if we import the XHTML namespace into our schema.
See the top answer at https://stackoverflow.com/questions/17012597/how-can-i-include-html-in-my-xml-schema/17014653

That looks quite promising, especially for larger blocks like the ones envisaged by @joeberkovitz above.
I imagine that our html-type <text-block> could have the usual external style properties governing fonts, dimensions, borders and padding/margins, rather in the way that html's <body> also has such default properties.

@mdgood Does that make sense to you?

@notator
Contributor Author

notator commented Dec 14, 2021

I'm still not sure if we should go down this road, but defining <text-block> as containing HTML (like HTML's <body>) would seem to be a good way to be able to code line-breaks (using <br/>), and so to enable multiple lines. This is, so far, the only convincing way to do that that I've seen. (Thanks, @samuelbradshaw for the brainstorm.)

If <text-block> contains HTML tags, then the MNX parser is going to have to be able to interpret them.
Recognising and interpreting all the possible HTML tags is (initially) out of the question, but we could decide to support a subset in MNX 1.0. The subset could be just the <br/> tag, but it could also include one or more of the tags @samuelbradshaw mentions:

  • Italicized text (<i>...</i>)
  • Emphasized text (<em>...</em>)
  • Spans (<span> with a class)
  • Paragraphs with a hanging indent (<p> with a class)
  • Hyperlinks (<a href="…">...</a>)

Note that spans and paragraphs can have a class. Presumably, those could be defined locally or inherited from the <text-block>'s context.
Including spans, paragraphs and hyperlinks in MNX 1.0 might make it unnecessarily complicated, and discourage uptake. But, if we decide to adopt this approach in general, I'd be prepared to try it out in a demo application to find out just how difficult it would be (and to test if the approach works at all).

Something that's bothering me:
Using <text-block>s should be made as simple as possible. My original use-case (Example 2 in #277) just consists of two lines separated by a newline. These would be coded like this:

<text-block>
    line1<br/>
    line2
</text-block>

Can I simply assume that the <text-block> has minimal size and minimal margins?

@samuelbradshaw

samuelbradshaw commented Dec 16, 2021

Can I simply assume that the <text-block> has minimal size and minimal margins?

I think it would make sense to have something like block, inline, and inline-block (from display in CSS) as an attribute to specify if the element should take the full page width, wrap with its surroundings as inline text, or wrap as a block.

I'm not sold on the name <text-block> because "block" in CSS has the connotation of something that always stretches the full width of the page. This isn't CSS, but people will bring their intuition with them as they try to understand MNX. Additionally, if we allow HTML, "text" may make people think that only plain text is allowed. I would propose <text> or <content-text> or <content-txt> as an element that contains plain text, and something like <html-content> or <content-html> as an element that contains HTML. Or maybe something that could allow for more formats in the future, like <content type="text"> and <content type="html">.

@samuelbradshaw

samuelbradshaw commented Dec 16, 2021

Including spans, paragraphs and hyperlinks in MNX 1.0 might make it unnecessarily complicated, and discourage uptake. But, if we decide to adopt this approach in general, I'd be prepared to try it out in a demo application to find out just how difficult it would be (and to test if the approach works at all).

Instead of saying "only these HTML elements are supported," which breaks from the recommendation in the Stack Overflow link above and prevents us from using the XHTML schema for validation, would it be better to say something like: "Any valid HTML [or XHTML?] content is allowed, but support for rendering complex HTML may vary between applications. Applications that render MNX are expected to support, as a minimum, these common HTML elements inside <content type="html">: [list of common HTML elements]. Unsupported tags or elements may be stripped out or ignored by the application."

This leaves the door open for browser-based applications to use the automatic browser rendering of the HTML, instead of having to parse through the HTML and strip it down to the artificially limited list defined by MNX – and at the same time, it allows flexibility for non-browser-based applications that may not want, or may not be able to include a full webview in their app.
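A minimal Python sketch of the "ignore unsupported tags" behaviour for a non-browser application might look like the following. The `SUPPORTED` set, the `styled_runs` name, and the sample content (including the `<blink>` element) are assumptions chosen for illustration, not a proposed list.

```python
import xml.etree.ElementTree as ET

# Assumed minimum set of supported elements; the real list is undecided.
SUPPORTED = {"br", "i", "em", "span"}


def styled_runs(elem, active=()):
    """Yield (text, styles) pairs for the mixed content of elem.

    Text inside an unsupported element is kept, but the element itself
    contributes no style, which is one reading of "ignored".
    """
    if elem.text:
        yield elem.text, active
    for child in elem:
        if child.tag == "br":
            yield "\n", active
        else:
            styles = active + (child.tag,) if child.tag in SUPPORTED else active
            yield from styled_runs(child, styles)
        if child.tail:
            yield child.tail, active


SAMPLE = (
    '<content type="html">Plain <em>works</em> and '
    "<blink>unsupported</blink> text<br/>next</content>"
)
runs = list(styled_runs(ET.fromstring(SAMPLE)))
plain = "".join(text for text, _ in runs)
print(runs)
```

An application with a full webview could skip this step entirely and hand the content to the browser, which is the flexibility the wording above is meant to allow.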

@mdgood

mdgood commented Dec 17, 2021

@notator I don't understand this issue. In XML you have multi-line text blocks by including line breaks in the element text. If you want to indicate that applications don't strip them, you can add an xml:space="preserve" attribute to the element containing the text. This works fine in MusicXML. I don't see a need to invent something new to replace what we already get from XML itself.
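For reference, a short Python sketch of the behaviour described here, using the standard library's `xml.etree.ElementTree`. The `<p>` element is a stand-in, not an MNX element. Note that ElementTree returns the text as written whether or not `xml:space` is present; the attribute is a signal of intent to downstream applications.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical element. The text starts right after the opening tag so that
# no source indentation ends up inside the content.
SOURCE = '<p xml:space="preserve">line1\nline2</p>'

elem = ET.fromstring(SOURCE)
space_mode = elem.get(f"{{{XML_NS}}}space")  # the xml: prefix maps to XML_NS
preserved_lines = elem.text.split("\n")

print(space_mode)
print(preserved_lines)
```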

The XHTML ideas are interesting for other aspects of styling, but they seem unnecessary for multi-line text. So perhaps that discussion could be moved to a different issue?

@notator
Contributor Author

notator commented Dec 17, 2021

@mdgood

In XML you have multi-line text blocks by including line breaks in the element text. If you want to indicate that applications don't strip them, you can add an xml:space="preserve" attribute to the element containing the text.

Ah, thanks, I didn't know that.
I've been learning a lot about schemas etc. recently. Among other things, my MNXtoSVG application now does proper validation of MNX files against my draft schema (using code similar to that recommended in this YouTube video).
But your answer considerably simplifies the posting I was about to send here. I'll rework it before sending it.
Thanks again for the information!

@samuelbradshaw:
In the light of @mdgood's information, I now think it's highly unlikely that we will be using HTML-like <text-block>s.

I'm not sold on the name <text-block>

Currently, the docs define specialised elements for <expression>, <instruction> etc. That may well be correct, in which case the naming isn't a problem. Alternatively, such types might be created using a general <text-block> type with a class attribute. In that case, you could be right and we need a better name for the general type.

@ahankinson

ahankinson commented Dec 17, 2021

In XML you have multi-line text blocks by including line breaks in the element text. If you want to indicate that applications don't strip them, you can add an xml:space="preserve" attribute to the element containing the text.

Unfortunately, this isn't always true, and when it is true, it may not be what is intended. Newlines can be stripped with standard XML tools, regardless of the value of xml:space.

Given:

<?xml version="1.0" encoding="UTF-8"?>
<p xml:space="preserve" xmlns:xml="http://www.w3.org/XML/1998/namespace">
    This is some text
    split over several lines
    to test whether whitespace is correctly
    handled.
</p>

An XPath query of //p/text() (aka: "Get the text content of the p element") will produce:

This is some text     split over several lines     to test whether whitespace is correctly     handled.

Namely, that a) newlines are stripped but b) indent spaces are preserved. This is likely the opposite of what was intended.

Trying another XPath query, //p/string(), does keep the newlines, but also preserves the leading whitespace, including indent levels. Given:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <section>
        <subsection>
            <p xml:space="preserve" xmlns:xml="http://www.w3.org/XML/1998/namespace">
                This is some text
                split over several lines
                to test whether whitespace is correctly
                handled.
            </p>
        </subsection>
    </section>    
</root>

Results in

                    This is some text
                    split over several lines
                    to test whether whitespace is correctly
                    handled.

Trying to normalize spaces, an XPath query of normalize-space(//p/text()) or normalize-space(//p/string()) on either file produces:

This is some text split over several lines to test whether whitespace is correctly handled.

In this case, the indent spaces are collapsed, but the newlines are still not preserved.
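The trade-off can be reproduced outside XPath. The Python sketch below emulates `normalize-space` with a regular expression over the four XML whitespace characters, and contrasts it with `textwrap.dedent`, which removes the common indent but keeps the newlines. The helper names and the shortened sample text are assumptions for illustration.

```python
import re
import textwrap

# Shortened stand-in for the indented text content shown above.
RAW = (
    "\n"
    "    This is some text\n"
    "    split over several lines\n"
)


def normalize_space(text):
    """Emulate XPath normalize-space(): collapse runs of XML whitespace
    (space, tab, CR, LF) to one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def dedented_lines(text):
    """Drop the common indent and the outer blank lines, keep line breaks."""
    return textwrap.dedent(text).strip("\n").split("\n")


normalized = normalize_space(RAW)
kept = dedented_lines(RAW)
print(normalized)
print(kept)
```

Dedenting recovers the intended lines here only because every line shares the same indent; it is a heuristic applied by the consumer, not something the XML itself guarantees.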

A good explanation of the issues is given on this page: http://www.xmlplease.com/xml/xmlspace/

The reason many XML and XML-like schemas (HTML being the main one) even bother with elements like <br /> and don't rely on \n for formatting is exactly because newlines are not guaranteed to be preserved when rendering, and if they are then it's hard to tell "significant whitespace" from "XML-formatting whitespace". The only way to guarantee that a block of text is rendered with newlines where intended, regardless of the processing application and the surrounding markup, is to define elements that can be interpreted as a newline by the consumer.

I would be interested in examples to the contrary!

@mdgood

mdgood commented Dec 17, 2021

@ahankinson The string() function is doing exactly what it should, returning all the text, and normalize-space is also working exactly as one would expect. I'm not sure what I would expect text() to do since it's a node test rather than a function.

Pretty-printing text content won't work if you are using xml:space="preserve". You need the exact text content without adding any "XML-formatting whitespace" to make the XML file itself look pretty. This is no problem since MNX use cases have no need for such XML-formatting whitespace. Sometimes MusicXML developers make this mistake when learning the format, but usually only once.

MusicXML applications have been exchanging exact text content with line breaks for nearly 20 years without problems. The way MusicXML does styling is something that MNX could improve, but line breaks work just fine.

@ahankinson

ahankinson commented Dec 17, 2021

You need the exact text content without adding any "XML-formatting whitespace" to make the XML file itself look pretty.

If I were to rephrase this to try and understand, are you saying that you need to ensure that you have no "non-significant" (e.g., indent spaces) whitespace in your text blocks? So your text blocks should be left-aligned, like an HTML <pre> block, to ensure that only "significant" whitespace is present in the block?

If I take the following literal XML:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <section>
    <subsection>
      <p xml:space="preserve">
This is some text
with some whitespace
to test how it works.
      </p>
    </subsection>
  </section>
</root>

And put it in the Atom text editor, set it to XML, and then choose "Beautify language: XML", it becomes:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <section>
    <subsection>
      <p xml:space="preserve">
        This is some text with some whitespace to test how it works.
      </p>
    </subsection>
  </section>
</root>

So although Atom is a pretty standard and widely used tool, its "formatting" behaviour does not assume that newlines have significance.

Even if I take a more XML-aware application, like oXygen, and do the same, the first example becomes the following if I choose the 'indent selection' option (having selected the whole file, all other indents are the same):

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <section>
        <subsection>
            <p xml:space="preserve">
                This is some text
                with some whitespace
                to test how it works.
            </p>
        </subsection>
    </section>
</root>

If I choose "Format and Indent" and have xml:space="preserve" set, then it behaves as expected.

While we can argue whether this is correct behaviour, these examples at least show that you cannot necessarily rely on the various transformations that an XML file may go through to preserve the linebreaks and whitespace as intended.

MusicXML has had the advantage that most applications that consumed MusicXML used XML only because they wanted to use MusicXML, and few people (other than developers) actually worked with the XML source directly. This is fine as an interchange format, but if you want to use MNX as XML proper, and have it work consistently with a wide range of generic XML tools, then you will likely need additional text formatting elements.

Edit: It also occurs to me that there will be problems with long text blocks. Editors, human or software, sometimes impose a limit on line length, after which they wrap the text with a newline for ease of reading in the editor software. If all spaces were preserved then this would mean that text would be flowed according to the line length of the XML source in the editor, rather than allowing the text to fill the rendered width.

@mdgood

mdgood commented Dec 18, 2021

For exact space you would more likely want:

<root>
  <section>
    <subsection>
      <p xml:space="preserve">This is some text
with some whitespace
to test how it works.</p>
    </subsection>
  </section>
</root>

All your examples seem to fall into the category of "I tell my tool to reformat the text and then it does it" so I don't understand how they are relevant to MNX. I think MNX has similar XML tooling requirements to MusicXML. You want it to work with tools like parsers available in different languages, editors, differencing tools, and databases. All of those work just fine with standard XML line breaks and white space. If you don't want your white space to change, don't tell your tool to change them.

@ahankinson

ahankinson commented Dec 18, 2021

If you don't want your white space to change, don't tell your tool to change them.

I provided small examples for clarity; often, when editing large files you won’t have the opportunity to choose what parts it reformats and what parts it doesn’t. The Atom example is a fairly naive, but fair, one. I open an existing MNX file, insert some XML and then ask the tool to reflow the XML.

At this point it will reflow the whole file, and remove any line breaks from text content. I may not even be aware that it changed it as it might be in a different part than I was editing.

Again, my point isn’t that this tool is or isn’t “behaving” correctly. My point is that I was, with very little effort, able to come up with several examples of where widely used software tools, in fairly naive usage instances, can remove whitespace characters, and the only way to absolutely guarantee that text formatting is preserved across editors is to encode it with dedicated elements.

I think MNX has similar XML tooling requirements to MusicXML

What are these tooling requirements?

@mdgood

mdgood commented Dec 18, 2021

Sure, so don't ask your editor to reflow the XML. Most editors will format things just fine as you insert them without having to ask to reformat the whole file. Most MNX text editing will likely be like MusicXML text editing. That usually involves making small edits for testing and debugging new features in your app, or less commonly, doing some hand-editing for features you don't export.

The tooling requirements I had in mind are listed right after the sentence you quoted.

@notator
Contributor Author

notator commented Dec 18, 2021

@mdgood said

In XML you have multi-line text blocks by including line breaks in the element text. If you want to indicate that applications don't strip them, you can add an xml:space="preserve" attribute to the element containing the text.

It's much easier to deal with <text-block> elements if applications do strip the (invisible) newlines and irrelevant whitespace, so I agree with @ahankinson about this.

The problem I had, when opening this issue, was that I didn't know how to define an element containing mixed text and elements in a schema. However, immediately after my last posting, I discovered the "mixed" attribute for the schema language's "complexType", and all was sweetness and light.

Our <text-block> elements should look and feel superficially like html, but only support a small subset of html's content elements. Initially, I'd include text, <i>, <em>, <span> and <br> elements, possibly adding <a> and other elements later.
(To be perfectly clear: This is a very simple solution, that does not use XHTML.)

A simple <text-block> would look like this:

<text-block>
    line1<br/>
    line2
</text-block>

A more complex one currently looks like this:

<text-block>
    A <em>complex</em> line one <i>might</i> look like this.<br/>
    But <span>this line is formatted with a span element.</span><br/>
    And this line inherits its formatting from its enclosing text-block element.
</text-block>

We need to discuss precisely which attributes <text-block> and <span> elements need to have.
@clnoel wants

  • a justification attribute on <text-block> (I'd call it align)
  • the ability to use custom entities such as &segno;
    @ahankinson mentioned these here. How could we define these globally for MNX?

@joeberkovitz envisions style, and possibly other properties for <text-block>...

I've now tested this <text-block> by adding it (temporarily?) to my draft schema, and a couple of times to the file below (which is based on Hello World).
This file is validated, and can be parsed using my MNXtoSVG code. I have programmatic access to the separate lines of text (from which <br/> and <!-- --> comments have been removed), but have not yet used that information to create SVG.

helloTextBlock.mnx
<?xml version ="1.0"?>
<mnx xmlns="https://github.com/notator/mnx"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://github.com/notator/mnx ../schema/common1900.xsd">
    <global>
        <measure-global barline="regular">
            <directions-global>
                <time signature="4/4"/>
            </directions-global>
        </measure-global>
    </global>
    <part id="partID">
        <measure>
            <directions-part>
                <clef sign="G" line="2"/>

                <text-block>
                    line1<br />
                    line2<br />
                    <span>line3</span><br />
                    <span>line4</span><br />
                    line5 that contains whitespace.
                </text-block>

                <text-block id="textBoxID" class="textBoxClass" location="0.75" vPos="2" align="right">
                    <span>line1: Span Text</span><br />
                    <i>line2: Italic Text</i><br />
                    <em>line3: Emphasised Text</em><br />
                    <span>line4: More Span Text</span><br />
                    <a>line5: Link Text</a><br /> <!-- "a" could be extended to have the usual link syntax -->
                    line6<br />
                    <span>line7: Span Text</span><br />
                    <i>line8: Italic Text</i><br />
                    <em>line9: Emphasised Text</em><br />
                    <span>line10: More Span Text</span><br />
                    <a>line11: LinkText</a><br /> <!-- "a" could be extended to have the usual link syntax -->
                </text-block>
                
            </directions-part>
            <sequence>
                <event value="/1">
                    <note pitch="C4"/>
                </event>
            </sequence>
        </measure>
    </part>
</mnx>
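For readers who want to try the line-splitting step, here is a minimal Python sketch using `xml.etree.ElementTree`. It is not the MNXtoSVG code. The function names, the shortened sample, and the choice to collapse source indentation to single spaces are all assumptions. ElementTree's default parser drops comments, which matches the behaviour described above.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, so namespaced MNX files also work."""
    return tag.rpartition("}")[2]


def split_on_br(block):
    """Return the lines of a mixed-content <text-block>, split at <br/>.

    Inline elements such as <span> or <i> contribute their text only;
    whitespace runs (including source indentation) collapse to one space.
    """
    lines, current = [], []

    def flush():
        lines.append(" ".join("".join(current).split()))
        current.clear()

    def walk(elem):
        if elem.text:
            current.append(elem.text)
        for child in elem:
            if local_name(child.tag) == "br":
                flush()
            else:
                walk(child)
            if child.tail:
                current.append(child.tail)

    walk(block)
    if "".join(current).strip():
        flush()
    return lines


SAMPLE = """<text-block>
    line1<br />
    <span>line3</span><br />
    <a>line5: Link Text</a><br /> <!-- a comment -->
    line5 that contains whitespace.
</text-block>"""

lines = split_on_br(ET.fromstring(SAMPLE))
print(lines)
```

Because line breaks come only from `<br/>`, the result does not change if an editor reflows or re-indents the XML source.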

@samuelbradshaw

samuelbradshaw commented Dec 18, 2021

It sounds like there's consensus that multi-line text blocks are needed. The two potential solutions discussed above are 1) just using XML's built-in xml:space="preserve" and 2) using HTML-like syntax. Because using HTML syntax is a larger discussion than just supporting multiple lines, I opened a separate issue for that here: #280

I also agree that we need "style properties" for these text elements as mentioned by @joeberkovitz. Style properties could be broken out into two categories: Formatting (what the element itself looks like, including its size and padding) and positioning (how the element positions itself on the page relative to sibling elements). Should formatting and positioning be separate issues, or discussed here?


6 participants