Are notes encoded in sounding or written pitch? #4

Closed
joeberkovitz opened this issue Apr 25, 2017 · 48 comments

@joeberkovitz
Contributor

In MusicXML, all pitches are framed in terms of the instrument's transposed pitch. In MNX, we can re-examine this question.

Encoding in transposed pitches makes performance interpretation more work, since the instrument's transposition must be applied, and this transposition could even vary during the course of a piece.

It also makes dynamic display of concert pitch trickier, since assumptions must be made about enharmonic spelling and key signatures.

On the other hand, encoding transposed pitches will more accurately reflect a manuscript that has been composed in transposed form, and may ultimately be more robust in terms of delivering a definitive final instrumental part.

Other pros and cons need to be brought out also.

@mogenslundholm

I am not sure about the meaning of the words. Say we have a pitch that we measure with an oscilloscope or tuner when the note is played on the instrument, the real pitch. Is this "in concert"? In any case, I think the real pitch value should appear in MNX.
As I remember, MusicXML is different, but there is also the symbol "ottava" or "8va", and in this case MusicXML uses the real pitch values. Also, different clef symbols will have the same pitch values, but the notes will look different on the paper.

@siennamw

Perhaps this topic should be renamed, "Are pitches encoded in sounding or written pitch?" for clarity.

I can see advantages and disadvantages with both approaches. Perhaps both should be supported. Each part could be flagged as being encoded with either sounding or written pitches, with <part pitch-encoding='sounding'> or something similar.

@joeberkovitz changed the title from "Are pitches framed in concert or transposed pitch?" to "Are notes encoded in sounding or concert pitch?" on Apr 28, 2017
@joeberkovitz
Contributor Author

I've renamed the topic for greater clarity. Noted that octave displacements play a role here as well.

@mdgood, can you chime in at some point with an explanation of the design rationales for MusicXML's pitch encoding so we have your background on this?

@joeberkovitz changed the title from "Are notes encoded in sounding or concert pitch?" to "Are notes encoded in sounding or written pitch?" on Apr 28, 2017
@mogenslundholm

Two ways to write transposed pitch? No, I would not like that. This would double the testing and increase the risk of errors, because probably one of them would be used seldom and therefore not tested in practice. With only one way to do it, that way gets tested every time.
(For me the sound is the real pitch, but I can add an offset as with MusicXML.)

@siennamw

I'm still not sure that we're talking about the same thing here yet @mogenslundholm. The question is, how do we encode music for transposing instruments? When we write for a clarinet in B-flat, for example, if we want to hear a B-flat are we encoding B-flat (sounding pitch) or C (written pitch)?

@mogenslundholm

I would prefer encoding sounding pitch (here B-flat) to be in MNX: <note pitch="Bb4"/>.
A command to transpose could be something similar to the MusicXML command:
<transpose><diatonic>1</diatonic><chromatic>2</chromatic></transpose>
This would inform the notation program to show a C (written pitch). I am looking at the Lilypond MusicXML test example "72a-TransposingInstruments.xml".
But I would definitely not like to have a choice - better a decision of what to use, sounding or written.
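For concreteness, here is a minimal sketch of what that combination could look like; the element and attribute names are purely illustrative, not any agreed MNX syntax:

<part id="clarinet-in-Bb">
  <transpose><diatonic>1</diatonic><chromatic>2</chromatic></transpose>  <!-- tell the notation program to display a major second higher -->
  <note pitch="Bb4"/>  <!-- stored at sounding pitch; would be shown as C5 -->
</part>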

@mdgood

mdgood commented May 1, 2017

One of the design principles of MusicXML is to primarily represent a written piece of music in a semantic way. This has two important advantages:

  • It provides a reference to musical artifacts that are directly relevant to musicians, independent of any particular application.
  • It makes it easy to check an encoding against a given written piece of music.

MusicXML thus represents written pitch, with a transpose element to convert transposing instruments into sounding pitch. Written pitch in this case is not necessarily the same as the position on a staff. A piece of music that looks like a middle C with an 8va line over it will be represented in octave 5, not octave 4. The octave-shift element represents the 8va line.
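To illustrate those two mechanisms side by side (abridged MusicXML written from memory, so treat the details as approximate rather than normative):

<!-- A note that looks like middle C under an 8va line: encoded in octave 5, with the line itself carried by octave-shift -->
<direction><direction-type><octave-shift type="down" size="8"/></direction-type></direction>
<note><pitch><step>C</step><octave>5</octave></pitch><duration>1</duration><type>quarter</type></note>

<!-- Transposition for a B-flat instrument: written pitch sounds a major second lower -->
<transpose><diatonic>-1</diatonic><chromatic>-2</chromatic></transpose>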

I think that design decision has worked very well and makes life much easier for anyone comparing a MusicXML encoding to a printed piece of music.

MusicXML's representation does have some issues in concert scores. Given MNX's desire to be able to make better use of a single file for score and parts, it would be good to have transposition information available in a concert score. Octave transpositions that are still used in concert scores are another issue, as discussed in MusicXML issue 39. I think we can resolve these issues in MNX while still retaining the great usability benefit of having MNX directly and semantically represent what the musician sees and understands.

I do agree with @mogenslundholm that either choice is far, far preferable to supporting both.

@webern

webern commented May 18, 2017

I agree with Michael Good. My preference would be for MNX to represent the pitch as it appears on the page of music, which is the written pitch in the case of a transposed score and transposing instrument.

MNX should make these facts clear (that it is a transposing score, that the current staff/instrument at the current location is written in transposed pitch, and that the transposition amount is X).

@joeberkovitz
Contributor Author

From chair call: perhaps we should explore how we can represent alternative spellings of keys and notes for the same part, so that this choice of transposed vs concert pitch becomes a less weighty decision (or at least so that it informs that choice). This is now captured as #34

@joeberkovitz
Contributor Author

From the chairs: along with #34 we want to move this issue into focus so that we resolve this important aspect of MNX-Common.

@clnoel

clnoel commented Jun 21, 2018

Just to clarify the terminology and issue here, so I don't make a fool of myself.

Is the following case what we are talking about?

I have the following measure:
[image: the measure at concert pitch]

If I was notating this for a trumpet (a Bb instrument), I would instead notate like this:
[image: the same measure notated for B-flat trumpet]

We are trying to figure out whether we want to encode the trumpet part as it looks or how it sounds.
Looks: starts with a C5 event, and with a notation somewhere in the definition of the part that everything needs to be transposed down a whole-step when interpreting it for audio generation.
Sounds: starts with a Bb4 event, and with a change to the clef element to indicate that you put a Bb4 on the third space on the staff (with no flat).
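Roughly, with made-up element names (not proposed syntax), the two alternatives would be:

<!-- "Looks": written pitch stored; a part-level transposition is applied when generating audio -->
<part-transposition diatonic="-1" chromatic="-2"/>
<note pitch="C5"/>  <!-- sounds Bb4 once the whole-step-down transposition is applied -->

<!-- "Sounds": sounding pitch stored; the part/clef mapping puts Bb4 on the third space with no flat -->
<note pitch="Bb4"/>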

Does that summarize the issue?

Edited for joeberkovitz's nitpick. Of course a Bb is a whole-step down from a C, not a half-step. Sigh.

@joeberkovitz
Contributor Author

@clnoel That's an excellent summary. (Nit: I think you meant whole-step, not half-step).

@cecilios

I can see advantages and disadvantages with both approaches, as others have previously commented. But I think it is better to notate music in written pitch.

First, from a formal point of view, I consider written pitch more in line with the objective of capturing the semantics, since for me this implies capturing how the music is written with paper and pen. In my opinion the meaning of 'semantic' should not change in particular cases, such as for the sound of transposing instruments. An analogy: imagine a text in English with some paragraphs in French. Using sounding pitch would be like encoding the English paragraphs using characters and the French paragraphs using phonetic symbols instead of characters.

And second, from a pragmatic point of view, dealing with transposing instruments will always require transposing the provided pitch, regardless of whether the score is written in sounding pitch or in written pitch. This is because if the user changes the instrument for playback (e.g. from trumpet in Bb to flute in concert pitch or to alto sax in Eb), the application will always have to re-compute pitches. So there is no gain in encoding sounding pitch, except in marginal cases in which changing instruments is not allowed. Using sounding pitch would also force more computation to properly display the score. On the contrary, using written pitch simplifies score display, and it imposes no penalty on playback, since playback always has to deal with transposing instruments anyway, as noted above.

Databases for music analysis or music search would perhaps benefit from sounding pitch, but it is up to those applications to store music in the format most appropriate for the processing they will do.

@mdgood

mdgood commented Jun 25, 2018

Just to clarify @clnoel's summary: no matter which approach we use, the transposition data needs to include both the number of diatonic steps and the number of chromatic steps. MusicXML represents these directly but there are other possible representations that are equivalent. The important thing is that the number of chromatic steps by itself is not sufficient for notation applications. The Bb transposition for instance is one diatonic step and two chromatic steps.
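As a concrete illustration (MusicXML-style values in the written-to-sounding direction, shown only to make the point), two transpositions can share the same chromatic distance while differing diatonically, and only the diatonic count tells a notation application how to spell the result:

<transpose><diatonic>-1</diatonic><chromatic>-2</chromatic></transpose>  <!-- major second down: written C5 sounds Bb4 -->
<transpose><diatonic>-2</diatonic><chromatic>-2</chromatic></transpose>  <!-- diminished third down: written C5 sounds A#4 -->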

@joeberkovitz
Contributor Author

I've been moving towards a unified approach to this and a variety of other issues; please let's carry the discussion to #138 (at least for now) and see if that proposal will work! Thanks.

@adrianholovaty
Contributor

After a lengthy discussion at Musikmesse 2018 in Frankfurt, we've decided on storing written pitch. (Meeting minutes and video here, with discussion starting around 1:05:20 in the video.)

I'll put a pull request together momentarily with small changes to the spec to clarify this. Note, though, that there's still ambiguity — specifically about how 8va/8vb markings are taken into account.

@shoogle

shoogle commented Apr 20, 2019

It's good that a decision has been reached. Written pitch certainly has its advantages, but so too does sounding pitch, particularly for new compositions. The group may have decided on written pitch, but as I noted in #138, there is a workaround for people who prefer sounding pitch: simply write your scores in concert pitch where written pitch and sounding pitch are the same.

@shoogle

shoogle commented Apr 20, 2019

As for the matter of ottava lines, I think that notes under these markings should definitely be notated at sounding pitch. This is because:

  • Notes are shifted by an entire octave, so arguments about transposition and information loss do not apply.
  • Ottava lines can differ between editions, and even between the parts and the score within the same edition.

As such, I would consider ottava lines to be "presentational" rather than "semantic". Furthermore, the fact that ottava lines can differ between editions opens up the possibility that somebody may wish to store multiple editions in a single file (i.e. multiple layouts):

<ottava-start id="1" octaves="+1" editions="smith1837,brown1972"/>
<note pitch="C6"/>
<ottava-start id="2" octaves="+1" editions="jones1954"/>
<note pitch="D6"/>
<ottava-stop id="1"/>
<ottava-stop id="2"/>

Notice how the ottava starts at different places in different editions, yet the notes only had to be specified once. This is only possible if the notes under the ottavas are stored at sounding pitch, which is the same in all editions, rather than at written pitch, which can vary between editions.

Now applications can give the user the option to switch between layouts of different historic editions, or to define a new layout of their own. Ottavas from other editions would be ignored for the purposes of rendering the score, but would be maintained on saving and loading to ensure all information is preserved.

Encoding multiple layouts may not be something we want to include in version 1 of MNX, but it would be wrong to preclude it as a possibility if there are no other reasons to prefer sounding or written pitch under ottava lines.

@shoogle

shoogle commented Apr 20, 2019

As always in music notation, there are some special cases:

  1. Ottavas that apply to notes in one voice but not the other.
  2. Ottavas on a shared line that apply to some instruments and not others.
  3. Optional ottava markings.

Regarding (1), I've never seen an example of this, but I have seen clefs that apply to one voice only, so it probably occurs somewhere. The problem of deciding which notes are affected by the ottava is the same regardless of whether the notes are stored at written or sounding pitch, so (1) is not relevant to this discussion.

Re. (2), I seem to remember seeing this on some vocal scores when men and women are written on the same staff. Sometimes the phrase "men sing 8vb" might be given as an indication that, while the lines are written in unison, men and women are supposed to sing in their respective octaves. However, most choirs would interpret the line this way anyway, even if the instruction was missing or it said "unison"!

For singers, the marking "men sing 8vb" is arguably non-semantic, but if it was for instruments (e.g. "flute plays as written, oboe 8vb") then it would be more important. The lines share written rather than sounding pitch, so it is written pitch that must be recorded if you want to avoid writing the lines twice. However, I can't recall ever seeing an example like that, and if any exist then they probably belong to a wider discussion about how to handle parts and shared staves. (Perhaps this kind of split should be a property of the clef rather than an ottava line.)

Re. (3), this also occurs most often in vocal scores. You sometimes see something along the lines of:

  • "8va if possible"
    • i.e. sing this section an octave above written pitch (but only if you can reach that high)
  • "8va if necessary"
    • i.e. sing this at written pitch (unless you can't sing that low in which case sing in the octave above)

In these situations the performer (and therefore the playback algorithm) has a decision to make, but there is usually a preferred option that is either specified by the composer or has become established by convention. Whether the notes should be stored at sounding or at written pitch depends on whether the convention is to play the ottava or to ignore it.

  • In the "8va if possible" case, the default option is to apply the ottava and modify the pitches. This is essentially just an ordinary ottava, so notes can be safely stored at sounding pitch (i.e. up the octave).
  • In the "8va if necessary" case, the default option is to ignore the ottava and sing at written pitch. In this case the ottava is more like an annotation rather than a semantic marking, so it should be marked as such, and the notes stored at written pitch (which in this case is equal to the default sounding pitch).

@notator
Contributor

notator commented Apr 25, 2019

Since this posting would otherwise be far too long, I am splitting it into 3 parts:

  1. This summary of what @clnoel and I are currently thinking
  2. A discussion about <directions> (including the discussion about 8vas that has begun above)
  3. A discussion about the pitch attribute and accidentals (which also belongs to the discussion around MNX-Common storing written information).

I'm providing the following summary because this thread is a continuation of #138, that thread is very long, and the current draft spec no longer reflects the current state of the debate.
If it proves impossible to keep the current draft spec up to date (it's very difficult to edit), perhaps I should keep updates of this summary in an up-to-date pull request? Has anyone got a better suggestion for how to keep track of these discussions?

Summary of what @clnoel and I are currently thinking (25.04.2019)

(@clnoel - please correct me if you disagree with any of this.)

A <note>'s pitch attribute is compulsory, and very strictly contains only graphical information. In other words, it describes what is written, i.e. how the symbol looks in the printed score.
(This formulation complies with the co-chair's decision as described above, but that decision still needs to be clarified.)

<note> is going to have a separate (optional) sounding attribute that, if present, determines its frequency. The frequency information can't be located in the (graphic) pitch information since the <note>'s frequency usually depends on external state information (the parser's current key, 8va, transposition, measure states etc.). The <note>'s sounding attribute, if it exists, overrides the frequency calculated using the graphic information in the pitch attribute and the current total "transposition state" maintained by the parser.
This simplifies things considerably, since the pitch attribute no longer has to contain both graphical and temporal information (as in §5.2.2.4 of the current spec and in #19's opening comment).

<transposition> is a new, optional <direction> element whose values are limited to whole semitones. The current <transposition> state will be taken into account by the parser when calculating a frequency from a <note>'s pitch graphics.
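A minimal sketch of the three pieces working together, using the attribute names from this summary (nothing here is settled spec):

<directions>
  <transposition semitones="-2"/>     <!-- whole semitones only; contributes to the parser's transposition state -->
</directions>
<note pitch="C5"/>                    <!-- graphical information only; frequency derived from the pitch plus the current state -->
<note pitch="C5" sounding="72.5"/>    <!-- optional sounding attribute overrides the computed frequency with an arbitrary midi.cent value -->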

@clnoel

clnoel commented May 2, 2019

There is more than one remaining question, at least in my mind. I want to clarify them here.

Here is the sample note that I am working off of:
[image: sample note on the E5 staff position, with an E-flat in the key signature and under an 8vb line]

There are two ambiguities here.
A) How do we represent the octave for that note (with or without the ottava)?
B) How do we represent the accidental for that note (with or without the key signature's flat)?

Given that there are generally two parser types (audio and graphical), here are the options for the spelling of the written pitch:
1: "E5" (audio parser must know about the key signature and the ottava, but the graphical parser is exact.)
2: "E4" (audio parser must know about the key signature, and the graphical parser must know about the ottava to properly place the notehead.)
3: "Eb5" (audio parser must know about the ottava, and the graphical parser must know about the key signature to omit the flat -- or we have an empty "accidental" property, and the graphical parser uses that.)
4: "Eb4" (audio parser is exact, and the graphical parser must know about both the key signature and the ottava -- or, as in 3, uses an empty "accidental" property.)

Since we are choosing to represent written-pitch, at least partially because of the difficulties of correctly spelling pitches, I feel we should be leaning toward option 1, where we represent the graphical look of the pitch. Because if we start changing the pitch-spelling from what you see in the original graphics, where do we draw the line?

However, as I look through the philosophical ideals of MNX, I see that we want to represent the note not necessarily as just a graphical object, but also as a concept. From this standpoint, I can understand if we don't want to go that route. I wanted to present the options and their effect, to highlight what I see as the remaining issues.

--Christina

@notator
Contributor

notator commented May 2, 2019

@clnoel Welcome back! I like the audio parser / graphics parser terminology very much!
Maybe we should just forget about the philosophy, and get on with the practical details! :-)

Yes, I also want

1: "E5" (audio parser must know about key signature and octava, but graphical parser is exact.)

The audio parser also has to know if this is a transposing instrument. Adding "in F" to your example should mean that the audio ends up another 7 semitones lower. Others note: In this case there is no ambiguity about the pitch -- as would be the case if we were trying to transpose the graphics.

Here are some thoughts about defining the meaning of (non-standard) accidentals. These illustrate, I think, a further advantage of adopting the proposal that @clnoel and I are supporting, rather than continuing to use MusicXML's approach (as described in the current draft spec, and proposed by the co-chair in PR #152).
In the above posting, I simply replaced the pitch attribute with a head attribute whose value (in combination with the current clef state) defines the vertical position of the notehead on the staff. The optional accidental attribute is separate, and can define a non-standard accidental. There is also an optional sounding attribute, which overrides the <direction> attributes being read by the audio parser.

As in @clnoel's example, these attributes can be used in combination with the current total transposition state (defined in the <direction> elements) to calculate the <note>'s sounding pitch.
Apart from the <direction> elements, the above example would simply be

<note head="E5" />

The audio parser would lower the pitch by 1 semitone for the Eb in the key signature and by one octave for the 8vb.

If a forced natural was needed on the note, it would be encoded like this:

<note head="E5" accidental="n" />

Such an accidental would override the key-signature.

<note> also has an optional sounding attribute, whose value is a (midi.cent) frequency that overrides the frequency otherwise calculated by the audio parser.

<note head="E5" sounding="76.2" />      <!-- 76 is the midi code for E5 -->

This is a note that looks like @clnoel's example (with key signature and 8vb) but sounds like a rather sharp E5 (overriding the key signature and 8vb).

Since the accidental attribute is separate, and parsed by the audio parser, it becomes possible to define the contribution of each accidental used in the file to the total transposition state.
The default "cent-offset" values to be added to the current transposition state for each of the standard accidentals would be:

accidental      cent-offset
sharp           100
flat            -100
natural         0
double-sharp    200
double-flat     -200

It would be straightforward to redefine these values and/or define values for other, non-standard accidentals used in the file, in a table somewhere earlier in the file. For example, some of the Stockhausen accidentals could be defined like this

accidental name        unicode    cent-offset
raised                 U+ED50     25
flatRaised             U+ED52     -75
threeQuartersSharp     U+ED5A     150
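Such a table could be declared once near the start of the file, for example (entirely hypothetical syntax):

<accidental-defs>
  <accidental-def name="raised" glyph="U+ED50" cent-offset="25"/>
  <accidental-def name="flatRaised" glyph="U+ED52" cent-offset="-75"/>
  <accidental-def name="threeQuartersSharp" glyph="U+ED5A" cent-offset="150"/>
</accidental-defs>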

The accidentals in the key signature in @mogenslundholm's example could also be coded in this way.
Notes that have unique tunings can still be defined using a sounding attribute.

@mdgood

mdgood commented May 2, 2019

@clnoel, the spec states that the pitch attribute is "the musical pitch of the note". This corresponds to choice 4, Eb4. This is similar to how MusicXML works, which has been successful. Some of your other choices are reminiscent of how NIFF worked. I found that most NIFF parsers got pitch wrong due to thinking of pitch graphically rather than semantically.

I think there's room for discussion of a separate issue about the exact details of the accidental attribute. The current draft spec carries forward one of MusicXML's more error-prone design decisions, so it would be great to fix this in MNX-Common. I have started a new issue #153 for this.

@notator, this issue is only about written vs sounding pitch, not for alternate syntaxes for pitch representation. The basics of pitch representation syntax have already been decided. There are separate issues for the accidental attribute (#153) and using cents instead of decimal fractions of a semitone (#19).

@notator
Contributor

notator commented May 3, 2019

@mdgood, As I think you said somewhere, we are currently using issues to record ideas. This seemed the best place to record an idea that follows from the proposal I'm currently backing, and which I think shows that it is superior to MusicXML's approach.

The basics of pitch representation syntax have already been decided.

I'm not so sure about that. We haven't decided PR #152 yet, so we don't know if §5.2.6.1 The <note> element is going to stay the way it is.

Thanks for opening #153, and mentioning #19. Maybe we should agree about PR #152 before continuing with those, otherwise we will be talking at cross-purposes.

@dspreadbury
Contributor

I think we can say with some degree of certainty that the <note> element will stay largely as it is, notwithstanding the issue of how the sounding pitch should be encoded, because the co-chairs are not convinced of the benefits of any of the alternative approaches that have been proposed.

I can't be persuaded to agree with either of @clnoel's first two proposals. I could be persuaded either way about whether the note in her example should be Eb5 or Eb4. In Dorico, for example, the note's pitch is not altered by the octave line/instruction: only its staff position. I believe that octave lines, like clefs, are presentational: they do not change the pitch of the note, only where on the staff it is displayed. For that reason I would be inclined not to include the effect of the octave line/instruction on the pitch of the note, but, as I say, I could be persuaded either way.

@notator
Contributor

notator commented May 3, 2019

@dspreadbury
(Please don't get upset by my use of "head" rather than "pitch" in the following examples. We are discussing two proposals, and are still working on PR #152.)

As I said above,

(Some simple accidentals could be included as shortcuts in the head string later, but let's leave them out for the moment.)

So I wouldn't mind using

<note head="Eb5" />

to mean the same thing as

<note head="E5" accidental="b" />

It's just that I think MNX-Common also needs a consistent mechanism for dealing with non-standard accidentals. (#153 needs to be addressed later.)

In Dorico, for example...

Just a remark: I think the code in Dorico's implementation of CWMN is important in deciding what CWMN is, and that there must be an unambiguous way for Dorico to import/export MNX-Common files. But that does not mean that MNX-Common has to mirror Dorico's internal code structures exactly. Different applications will use different approaches to implementing CWMN, so they can't all use exactly the same approaches as MNX-Common.

I would be inclined not to include the effect of the octave line/instruction on the pitch of the note

Am I right in thinking that you are inclined to agree with me that Eb5 is correct? Nice to know that even if not, you might be persuaded otherwise. :-)

All the best,
James

@dspreadbury
Contributor

dspreadbury commented May 3, 2019

Just a remark: stating "In Dorico, for example" does not imply that I believe MNX-Common should mirror Dorico's approach at all. I believe your command of the English language is more than good enough to understand the idiom "for example", so please don't put words into my mouth.

@adrianholovaty
Contributor

@clnoel Thanks for spelling out those four options — it's quite useful for focusing the discussion!

I vote for the fourth option (Eb4), as it strikes the right balance between semantics and presentation. This is what my pull request in #152 is intended to communicate.

I agree with @dspreadbury that ottavas and clefs are more presentational than semantic. This is nicely captured by @shoogle in his comment above pointing out different editions have different ottava decisions for the same underlying music.

A note's accidental is both semantic and presentational: it fundamentally affects the meaning of the pitch itself (e.g., Eb4 is a different pitch than E4), but its graphical display is a presentational decision (e.g., does this note have an accidental visually rendered next to it? See issue #153).

Finally, a subtle note about the meaning of "written pitch." @clnoel said in the comment above that she leans toward option 1 because it reflects the written pitch and we'd decided written pitch is the way forward. I'd like to attempt to clarify our discussion by separating the vague concept of "written pitch" into two concepts:

  • Literal Written Pitch: the literal note position as graphically rendered, which only has an accidental if the graphical note has a visible accidental rendered next to it. In @clnoel's example, this would be "E5" (option 1).

  • Performer-Centric Written Pitch: the note as conceived by the musician playing it. In @clnoel's example, this would be "Eb4" (option 4).

(These terms were completely made up by me, on the spot, and they're serving only for the purposes of clarifying this discussion thread. :-) )
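In note-element terms, the distinction for @clnoel's example is roughly this (syntax for illustration only):

<note pitch="E5"/>   <!-- Literal Written Pitch: the drawn staff position; no accidental because none is printed -->
<note pitch="Eb4"/>  <!-- Performer-Centric Written Pitch: the note the player conceives, i.e. what a concert-pitch instrument would sound -->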

I believe MNX-Common should use Performer-Centric Written Pitch. The pull request in #152 attempts to codify that, via its definition of the term "written pitch," and I'm very interested to get feedback on whether the definition is clear and unambiguous enough. It defines it in terms of the pitch generated by a concert-pitch instrument playing it, which may or may not be a good approach.

@notator
Contributor

notator commented May 3, 2019

@dspreadbury

please don't put words into my mouth

I was just amplifying/agreeing with what you said. It would help if you didn't always, by default, bite my head off.

@notator
Contributor

notator commented May 3, 2019

@adrianholovaty

...a subtle note about the meaning of "written pitch." @clnoel said in the comment above that she leans toward option 1 because it reflects the written pitch and we'd decided written pitch is the way forward.

I also misunderstood the co-chair's decision in the same way in the above posting where I said

(This formulation complies with the co-chair's decision as described above, but that decision still needs to be clarified.).

Which only goes to show how important it is to use a precise terminology whose meaning we all agree about.

I'm trying very hard to be constructive here, but I have to say that I think it's a mistake to try to encapsulate what the performer may be thinking in a file format that is going to be read by programmers and machines.

As a programmer, I would prefer the local XML (i.e. the <note> definition) to clearly describe what I'm expecting to see in the printed score (graphics). I don't want to have to look through the file to analyse the current graphic state (clefs, 8vas, transposition instructions etc.) in order to know what to write in the <note> definition in order to get a particular result in the printed score. If I write an E5 in the <note> definition, I would like to see an E5 in the printed score, regardless of the clef, 8va signs etc.
A performer reads a printed score, not the XML, and the printed score contains all the clefs, 8va signs and transposition information that allow him/her to infer which audio pitch to play.

Hope that helps.

@kepper

kepper commented May 3, 2019

FWIW, and mostly because @bhamblok requested it, here's how MEI deals with notes. It can be used in multiple ways; it's focused on the visual representation (written pitch), but sounding pitch is available as well. The situation from above could be encoded in multiple ways, and no matter how it is encoded, there is no way to misunderstand the encoding:

<note pname="c" oct="5" dur="4"/>
This would be a written C5 quarter (no matter how it sounds).

<note pname.ges="c" oct.ges="5" dur.ges="4"/>
This would sound like a C5 quarter (no matter how it's written). .ges stands for gestural domain, i.e. sound.

<note pname="c" pname.ges="b" oct="5" oct.ges="4" accid.ges="f" dur="4"/>
This would be a written C5, which sounds like a Bb4.

An encoding may decide to not provide one or the other (actually, it could go without any of those, but that's a different story). In that case, it's very often still possible to infer the missing information from key signatures, information about transposing instruments or other places, but that may require more processing than every application would be willing to invest, simply because it would be out of scope.

I'm not trying to advertise anything, I just want to make some other's conclusions available to this thread.

@notator
Contributor

notator commented May 4, 2019

Further to my previous posting, I'd like to walk through the XML-programming scenario in a little more detail, to convince others (and myself) that it is sensible, and really works, and that there are no hidden problems. (If anyone can find a problem, I'm all ears.)
The MEI approach seems overly complicated by comparison.
This may mean some repetition of earlier info, but a recap isn't necessarily a bad thing...
(I'll keep using head rather than pitch as the name of the note attribute, but that is irrelevant here, so don't let it disturb anyone.)


We are writing XML code for an 8-measure score with one staff (one player).
There is an ordinary treble clef at the start of measure 1, and a <note> in measure 4, defined as
<note head="C4" />
In measure 4, the graphics parser will interpret that to mean
[image: the note rendered as a C4 in the treble clef]
The audio parser ignores the clef, and uses the <note>'s default frequency. The default frequency for C4 is 60 in midi.cent units.

Note 1) that if there is lots of code in measures 1, 2 and 3, the clef definition could be a very long way from the note definition in the XML, and 2) that the clef can be changed at will without changing the frequency calculated by the audio parser.
Changing the clef tells the graphics parser to render the C4 notehead in the following ways:
[image: the C4 notehead rendered under different clefs]
The <note>'s head attribute can be changed to something else (e.g. "A3") independently of the active clef. The graphics parser takes care of the details when creating a printout. The usual situation is that the current clef would be changed (e.g. if there were too many ledger lines) by someone editing the printed score using a score editor (i.e. not editing the XML directly). But it should be very easy for a programmer debugging an MNX-Common reader or writer to find the relevant information in the XML.

Note that the graphics parser and audio parser have completely separate responsibilities. The graphics parser generates graphics (in space), the audio parser generates audio (in time). They both use <direction> and <note> information, but in different ways.

Let's now add an <ottava-start> <direction> in measure 2 and an <ottava-end> <direction> in measure 6.
The graphics parser will draw the appropriate "8va" text, dotted line and end mark.
The audio parser will add 12 to the (midi.cent) frequency it is currently using for all the notes in the 8va scope. The frequency for any C4 in scope (default value midi.cent 60) becomes midi.cent 72.
Note that the 8va <direction> can be added (or removed) without looking at, or changing, any of the current <note> definitions, and that any of the <note> definitions can be changed without looking for <directions> that may be a long way away in the XML.
Similarly for <transposition-start> and <transposition-end> <direction>s.
Such a <direction> has two parts: the graphics (e.g. the string "in F") and the audio increment (which would be -7, in midi.cent units, for an instrument in F).
If such a <direction> were to be added in measure 1, the frequency of the note in measure 4 would become 72 - 7 (=65).
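Putting the walkthrough together in one sketch (measures abridged; attribute names invented for illustration):

<measure number="1">
  <directions><clef sign="G"/><transposition-start text="in F" increment="-7"/></directions>
</measure>
<measure number="2">
  <directions><ottava-start increment="12"/></directions>
</measure>
<measure number="4">
  <note head="C4"/>  <!-- graphics: a C4 notehead in the current clef; audio: 60 + 12 - 7 = 65 in midi.cent units -->
</measure>
<measure number="6">
  <directions><ottava-end/></directions>
</measure>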

I think it would be extremely complicated, by comparison, to have to change all the <note> definitions when adding an 8va <direction> (as currently proposed by the co-chair). That would not only be more work for the XML-writing software, it would also mean that the graphics parser would have to keep track of the audio parser. Things stay much simpler if their domains are kept completely separate.

@lpugin

lpugin commented May 4, 2019

I am not sure I understand what you mean when saying that the MEI approach is overly complicated. In any case, if you are looking at an XML-programming scenario, you will probably not do head="C4", since that requires the attribute value to be parsed outside the XML parser, but rather (taking MEI as an example) pname="c" oct="5". It happens that this also has the advantage that, in the case of an ottava, you can specify the sounding octave with oct.ges="4" while the pitch name remains the same.

@notator
Contributor

notator commented May 5, 2019

@lpugin
Welcome back! :-)

I am not sure I understand what you mean when saying that the MEI approach is overly complicated.

I meant that distinguishing clearly between the graphics parser and the audio parser makes having separate attributes for the graphics and audio unnecessary. In this proposal, the XML is designed so that it can be parsed in either way. Applications that are not interested in audio can just parse the graphics, and vice-versa. Apps that want to parse both can, of course, do so -- and the two domains will be automatically synchronised.

head="C4" could, of course, be split into pname="c" and oct="4", but I think that would be both unnecessary and confusing here.
In this proposal, "C4" is an attribute that can be interpreted either as a graphic (using the current clef), or as providing a default frequency for the audio parser.
The values taken by the head attribute are defined in Scientific Pitch Notation to have particular (default) frequencies, so I don't think we need to have a separate oct attribute. (I think oct is an MEI implementation detail).
Using pname (short for "pitch name" ?) would be confusing because the pitch (=audio frequency) is context-dependent, and actually completely independent of the value of this attribute. In the current proposal, <note> even has an optional sounding parameter that completely overrides the current context, and can have arbitrary midi.cent frequency values.

@clnoel

clnoel commented May 8, 2019

Technically, the "written pitch" is an image on a page, which I think we all agree is way too far towards the graphical side of things. We have already decided that performed frequency (that takes everything into account all the way through transposing instruments and unwritten microtones) is way too far toward the audible side of things, and put that in a separate optional property (sounding pitch).

The question we are addressing here is: Where is the line that establishes enough semantic value to make both a graphical and an audible representation viable (assuming no sounding pitch is specified)?

I would also like to point out that the difficulty of establishing "pitch spelling" (the set of accidentals displayed in the graphics) is one of the reasons we decided to move away from sounding pitch in the first place.

I've been thinking about this a lot since I last commented with the set of options above. I've talked about it with my colleagues here, and I now think I'm actually leaning toward using "E4" (The second option).

With this option, the key-signature has semantic meaning: it is necessary to the audio-parser, which defaults to it if there is nothing in the accidental property. It makes the discussion about how to do pitch-spellings simpler, because the "spelling" part goes in the accidental property, never in the base-pitch.

It also makes the ottava have semantic meaning, which is programmatically equivalent to an intervening clef-change, and changes where the graphical display of the note goes, while being ignorable for an audio parser.

I completely understand that this is a kind of half-and-half representation. I feel that that kind of half-and-half representation is necessary now that we have decided not to go all the way graphical or all the way audible. I acknowledge that this might make some analyses harder, because the fact that it is an Eb, not an E, would need to be figured in by using the accidental. However, given the difficulties of correctly specifying (e.g.) Ebb4 by putting the "bb" in both the pitch and the accidental properties, I think this also provides less duplication!
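A sketch of option 2 applied to the earlier example, with the spelling carried entirely by the accidental property (attribute names illustrative):

<note pitch="E4"/>                  <!-- flat supplied by the key signature; the graphical parser re-applies the ottava to place the notehead -->
<note pitch="E4" accidental="n"/>   <!-- an explicit natural overrides the key signature -->
<note pitch="E4" accidental="bb"/>  <!-- Ebb: the double flat lives only in the accidental property, never in the base pitch -->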

--Christina

@bhamblok

bhamblok commented May 9, 2019

Sorry, I don't agree. An E is an E and an Eb is an Eb. Can you elaborate with some examples how you would encode two sequential E flats (in a key of C Major) where the second one doesn't need the accidental to be shown? It would be really confusing if they are encoded in a different way.

I think "semantics" are utmost superior to written and/or sounding properties.

@notator
Contributor

notator commented May 9, 2019

@clnoel and @bhamblok
I'm still sitting on the fence about what the head attribute should contain, and don't really want to discuss #153 until the PR in #152 has been resolved. That's so that we know which proposal we are talking about, and don't get confused again. Will we be talking about the proposal in the current spec, or the double-parser proposal?

However: :-)
The value of the head attribute could be understood as including the accidental.

<note head="C#4" />

would mean that the graphics parser would write a C4 notehead preceded by a # accidental.
The audio parser would interpret that as the default frequency for a C#4.
Currently (in Scientific pitch notation) we only have default frequencies defined for noteheads that have no accidentals, but it would be very easy to extend that to define the default frequencies of the noteheads that do have (standard) accidentals. The value of the head attribute is only the name of a bit of graphics. We just have to decide (in #153) whether or not it includes an accidental.

@adrianholovaty
Contributor

@clnoel wrote:

I've been thinking about this a lot since I last commented with the set of options above. I've talked about it with my colleagues here, and I now think I'm actually leaning toward using "E4" (The second option).

I don't think I can ever be convinced of options 1 or 2. :-/ An E-flat is not an E. In my view, this doesn't pass a baseline test of "is this note represented semantically?"

Options 1 and 2 require too much knowledge of state (the key signature), for something too important to mess up (the pitch).

It makes the discussion about how to do pitch-spellings simpler, because the "spelling" part goes in the accidental property, never in the base-pitch.

But the base-pitch is part of the spelling, no? Consider G-flat vs. F-sharp. The spelling difference between those two notes exists in the accidental and the base-pitch.

@notator
Contributor

notator commented May 9, 2019

"Semantic" is a tricky word...
A note element in the XML actually has two meanings, a graphic meaning and an audio meaning.
(I still don't want to talk about the way accidentals are handled until PR #152 has been resolved.)

@shoogle

shoogle commented May 10, 2019

I like Option 4.

Options 1 and 2 require too much knowledge of state (the key signature), for something too important to mess up (the pitch).

Agreed. They also require an assumption that accidentals remain in effect until the end of the measure, or until superseded by a different accidental. While this is true for most sheet music, Gould mentions (I forget the page number) that other conventions have existed, such as requiring accidentals to be explicitly stated (i.e. any note without an accidental is a natural). This kind of music can be encoded by Options 3 or 4 but not by 1 or 2 (at least not without risking incorrect playback).

Option 1 or 2 would make sense for OMR, but for pretty much any other use case Option 3 or 4 is a better choice.

@cecilios

Sorry for this long post. It is difficult for me to express my ideas in English and this results in a longer text. Sorry!

In the beginning, more or less we all assumed that MNX would follow MusicXML for representing pitch, as no issues had been raised with MusicXML's pitch representation.

Later, an important question was raised: what to do about transposing instruments. This introduced the concepts of written pitch vs. sounding pitch. But in any case, when the issue was raised, the meaning of 'written pitch' and 'sounding pitch' was basically:

  • written pitch: what MusicXML uses
  • sounding pitch:
    a) for non-transposing instruments: the same as written pitch, what MusicXML uses
    b) for transposing instruments: the written pitch with the instrument's transposition applied

After some argumentation it was clear that written pitch (what MusicXML uses) is the most practical. This should have closed the issue, so that we could proceed with other work.

Unfortunately, the words 'written pitch' and 'sounding pitch' are open to interpretation, and Pandora's box is opened if we interpret those words differently. And IMO this is the current situation: a lot of different proposals trying to solve problems with the MusicXML approach that are not known to exist.

Music is sound, and for more than ten centuries people have been trying to represent music with symbols. The music score is the best system found for this. So now we are trying to represent the music score (not its graphical appearance but its content) using 'computer symbols'. To me the best approach is to mimic the music score (the best known system to represent music, apart from audio recordings). The notes are represented by a notehead placed on a staff, and the sound (pitch) is implied by many other symbols: the notehead position on the staff, the clef, the accidentals, the 8va marks, etc.

To me, when we talk about 'written pitch' I understand 'notehead position on the staff', and the simplest way of expressing this location is by the 'displayed pitch' (this is basically what MusicXML uses). So in @clnoel's example, the notehead position is E5 (or Eb5 -- more on this later). To me the current problems arise when we compare this written pitch with the sounding pitch, as in this example they are different. But the problem disappears if we return to the idea of understanding 'written pitch' not as a pitch but as a 'position on the staff'. So E5 is not a pitch but a reference to notehead position: 'notehead on the fourth space'. That is MusicXML's understanding, and that is what I propose to follow. It gives preference neither to 'sound' parsers nor to 'graphical' parsers. It is just a way of expressing where the notehead is placed on the staff.

Now to the issue of E5 vs Eb5. MusicXML takes applicable accidentals into account and would use Eb5; I assume this decision was taken to avoid having to track applicable accidentals. For a long time, in my applications I chose the opposite approach, E5 (as if I were writing the score with pen and paper), and forced the application to compute applicable accidentals. In my many years of experience, I have found that both systems work well and no special problems arise with either of them. But I have found the MusicXML system (Eb5) better than my application's system (E5), as it simplifies the algorithms for preserving displayed accidentals when a transposition is applied.

So my vote is for the current MusicXML approach, Eb5, option 3. Although option 1, E5, would also be acceptable to me.

Hope this helps!

@mdgood

mdgood commented May 14, 2019

Closed with #152.
