Are notes encoded in sounding or written pitch? #4
I feel I am not sure about the meaning of the words. Let's say we have a pitch that we measure with an oscilloscope or tuner when the note is played on the instrument: the real pitch. Is this "in concert"? In any case, I think the real pitch value should appear in MNX. |
Perhaps this topic should be renamed to "Are pitches encoded in sounding or written pitch?" for clarity. I can see advantages and disadvantages with both approaches. Perhaps both should be supported. Each part could be flagged as being encoded with either sounding or written pitches, with |
I've renamed the topic for greater clarity. Noted that octave displacements play a role here as well. @mdgood, can you chime in at some point with an explanation of the design rationales for MusicXML's pitch encoding so we have your background on this? |
Two ways to write transposed pitch? No, I would not like that. This would double testing and increase the risk of errors, because probably one of them would be seldom used and therefore not tested in practice. With only one way to do it, that way will be tested every time. |
I'm still not sure that we're talking about the same thing here yet @mogenslundholm. The question is, how do we encode music for transposing instruments? When we write for a clarinet in B-flat, for example, if we want to hear a B-flat are we encoding B-flat (sounding pitch) or C (written pitch)? |
I would prefer encoding sounding pitch (here B-flat) to be in MNX: <note pitch="Bb4"/>. |
One of the design principles of MusicXML is to primarily represent a written piece of music in a semantic way. This has two important advantages:
MusicXML thus represents written pitch, with a transpose element to convert transposing instruments into sounding pitch. Written pitch in this case is not necessarily the same as the position on a staff. A piece of music that looks like a middle C with an 8va line over it will be represented in octave 5, not octave 4. The octave-shift element represents the 8va line. I think that design decision has worked very well and makes life much easier for anyone comparing a MusicXML encoding to a printed piece of music. MusicXML's representation does have some issues in concert scores. Given MNX's desire to be able to make better use of a single file for score and parts, it would be good to have transposition information available in a concert score. Octave transpositions that are still used in concert scores are another issue, as discussed in MusicXML issue 39. I think we can resolve these issues in MNX while still retaining the great usability benefit of having MNX directly and semantically represent what the musician sees and understands. I do agree with @mogenslundholm that either choice is far, far preferable to supporting both. |
I agree with Michael Good. My preference would be for MNX to represent the pitch as it appears on the page of music, which is the written pitch in the case of a transposed score and transposing instrument. MNX should make these facts clear (that it is a transposing score, that the current staff/instrument is written in transposed pitch, and that the transposition amount is X). |
From chair call: perhaps we should explore how we can represent alternative spellings of keys and notes for the same part, so that this choice of transposed vs concert pitch is less weighty of a decision? (Or at least informs that choice.) This is now captured as #34 |
From the chairs: along with #34 we want to move this issue into focus so that we resolve this important aspect of MNX-Common. |
@clnoel That's an excellent summary. (Nit: I think you meant whole-step, not half-step). |
I can see advantages and disadvantages with both approaches, as others have previously commented. But I think it is better to notate music in written pitch.

First, from a formal point of view, I consider written pitch more in line with the objective of capturing the semantics, as for me this implies capturing how the music is written using paper and pen. In my opinion the meaning of 'semantic' should not be changed in particular cases, such as for the sound of transposing instruments. An analogy: imagine a text in English with some paragraphs in French. Using sounding pitch would be like encoding the English paragraphs using characters and the French paragraphs using phonetic symbols instead of characters.

And second, from a pragmatic point of view, dealing with transposing instruments will always require transposing the provided pitch, and it does not matter whether the score is written in sounding pitch or in written pitch. This is because if the user changes the instrument for playback (e.g. from trumpet in Bb to flute in concert pitch or to alto sax in Eb), the application will always have to re-compute pitches. So there is no gain in encoding sounding pitch, except in marginal cases in which changing instruments is not allowed. Moreover, using sounding pitch forces more computation to display the score properly. On the contrary, using written pitch simplifies score display, and imposes no penalty on playback, as playback will always have to deal with transposing instruments, as commented above.

Databases for music analysis or music search would perhaps benefit from sounding pitch, but it is up to those applications to store music in the format most appropriate for the processing they will do. |
Just to clarify @clnoel's summary: no matter which approach we use, the transposition data needs to include both the number of diatonic steps and the number of chromatic steps. MusicXML represents these directly but there are other possible representations that are equivalent. The important thing is that the number of chromatic steps by itself is not sufficient for notation applications. The Bb transposition for instance is one diatonic step and two chromatic steps. |
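The point about needing both step counts can be shown with a small sketch. This is not proposed MNX or MusicXML code, just an illustration: applying only a chromatic shift loses the spelling, while the diatonic step count pins down the letter name unambiguously.

```python
# Why transposition needs both diatonic and chromatic steps: written D on
# a Bb instrument must sound as C (not B#), and only the diatonic count
# tells us which letter name to land on. Helper names are illustrative.
STEPS = ["C", "D", "E", "F", "G", "A", "B"]
SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def transpose(step, alter, octave, diatonic, chromatic):
    """Return (step, alter, octave) shifted by the given transposition."""
    idx = STEPS.index(step) + diatonic
    new_step = STEPS[idx % 7]
    new_octave = octave + (idx // 7)     # idx may be negative; // rounds down
    # Choose the alteration so the chromatic distance comes out exactly right.
    old_midi = 12 * (octave + 1) + SEMITONE[step] + alter
    new_natural = 12 * (new_octave + 1) + SEMITONE[new_step]
    new_alter = (old_midi + chromatic) - new_natural
    return new_step, new_alter, new_octave

# Bb transposition: one diatonic step and two chromatic steps down.
print(transpose("C", 0, 5, -1, -2))  # ('B', -1, 4), i.e. Bb4
print(transpose("D", 0, 5, -1, -2))  # ('C', 0, 5), i.e. C5, not B#4
```

A chromatic-only shift of -2 applied to D would give a bare pitch number; whether that number is spelled C or B# is exactly the information the diatonic count carries.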
I've been moving towards a unified approach to this and a variety of other issues; please let's carry the discussion to #138 (at least for now) and see if that proposal will work! Thanks. |
After a lengthy discussion at Musikmesse 2018 in Frankfurt, we've decided on storing written pitch. (Meeting minutes and video here, with discussion starting around 1:05:20 in the video.) I'll put a pull request together momentarily with small changes to the spec to clarify this. Note, though, that there's still ambiguity — specifically about how 8va/8vb markings are taken into account. |
It's good that a decision has been reached. Written pitch certainly has its advantages, but so too does sounding pitch, particularly for new compositions. The group may have decided on written pitch, but as I noted in #138, there is a workaround for people who prefer sounding pitch: simply write your scores in concert pitch where written pitch and sounding pitch are the same. |
As for the matter of ottava lines, I think that notes under these markings should definitely be notated at sounding pitch. This is because:
As such, I would consider ottava lines to be "presentational" rather than "semantic". Furthermore, the fact that ottava lines can differ between editions opens up the possibility that somebody may wish to store multiple editions in a single file (i.e. multiple layouts): <ottava-start id="1" octaves="+1" editions="smith1837,brown1972"/>
<note pitch="C6"/>
<ottava-start id="2" octaves="+1" editions="jones1954"/>
<note pitch="D6"/>
<ottava-stop id="1"/>
<ottava-stop id="2"/> Notice how the ottava starts at different places in different editions, yet the notes only had to be specified once. This is only possible if the notes under the ottavas are stored at sounding pitch, which is the same in all editions, rather than at written pitch, which can vary between editions. Now applications can give the user the option to switch between layouts of different historic editions, or to define a new layout of their own. Ottavas from other editions would be ignored for the purposes of rendering the score, but would be maintained on saving and loading to ensure all information is preserved. Encoding multiple layouts may not be something we want to include in version 1 of MNX, but it would be wrong to preclude it as a possibility if there are no other reasons to prefer sounding or written pitch under ottava lines. |
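The per-edition resolution described above can be sketched in a few lines of Python. This is an illustration of the idea only: the `(start, stop, octaves, editions)` tuples mirror the hypothetical `ottava-start`/`ottava-stop` markup, and the written pitch for a given edition is the stored sounding pitch displaced by the summed shifts.

```python
# Sketch: notes are stored at sounding pitch; ottava lines are tagged with
# the editions they belong to. The written octave displacement for a note
# in a given edition is the sum of the shifts of the ottavas covering it.
def written_octave_shift(note_index, ottavas, edition):
    """Sum the octave shifts of all ottavas covering this note in this edition."""
    shift = 0
    for start, stop, octaves, editions in ottavas:
        if edition in editions and start <= note_index < stop:
            shift += octaves
    return shift

# (start_index, stop_index, octave_shift, editions) for the example above:
ottavas = [
    (0, 2, +1, {"smith1837", "brown1972"}),  # covers notes 0 and 1
    (1, 2, +1, {"jones1954"}),               # covers note 1 only
]

# Note 0 (sounding C6) is written an octave lower only in Smith/Brown:
print(written_octave_shift(0, ottavas, "smith1837"))  # 1
print(written_octave_shift(0, ottavas, "jones1954"))  # 0
print(written_octave_shift(1, ottavas, "jones1954"))  # 1
```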
As always in music notation, there are some special cases:
Regarding (1), I've never seen an example of this, but I have seen clefs that apply to one voice only, so it probably occurs somewhere. The problem of deciding which notes are affected by the ottava is the same regardless of whether the notes are stored at written or sounding pitch, so (1) is not relevant to this discussion. Re. (2), I seem to remember seeing this on some vocal scores where men and women are written on the same staff. Sometimes the phrase "men sing 8vb" might be given as an indication that, while the lines are written in unison, men and women are supposed to sing in their respective octaves. However, most choirs would interpret the line this way anyway, even if the instruction were missing or it said "unison"! For singers, the marking "men sing 8vb" is arguably non-semantic, but if it were for instruments (e.g. "flute plays as written, oboe 8vb") then it would be more important. The lines share written rather than sounding pitch, so it is written pitch that must be recorded if you want to avoid writing the lines twice. However, I can't recall ever seeing an example like that, and if any exist then they probably belong to a wider discussion about how to handle parts and shared staves. (Perhaps this kind of split should be a property of the clef rather than an ottava line.) Re. (3), this also occurs most often in vocal scores. You sometimes see something along the lines of:
In these situations the performer (and therefore the playback algorithm) has a decision to make, but there is usually a preferred option that is either specified by the composer or has become established by convention. Whether the notes should be stored at sounding or at written pitch depends on whether the convention is to play the ottava or to ignore it.
|
Since this posting would otherwise be far too long, I am splitting it into 3 parts:
I'm providing the following summary because this thread is a continuation of #138, that thread is very long, and the current draft spec no longer reflects the current state of the debate. Summary of what @clnoel and I are currently thinking (25.04.2019) (@clnoel: please correct me if you disagree with any of this.) A
|
@clnoel Welcome back! I like the audio parser / graphics parser terminology very much! Yes, I also want
The audio parser also has to know if this is a transposing instrument. Adding "in F" to your example should mean that the audio ends up another 7 semitones lower. Others note: In this case there is no ambiguity about the pitch -- as would be the case if we were trying to transpose the graphics. Here are some thoughts about defining the meaning of (non-standard) accidentals. These illustrate, I think, a further advantage of adopting the proposal that is being supported by @clnoel and myself, rather than continuing to use MusicXML's approach (as described in the current draft spec, and proposed by the co-chair in PR #152) As in @clnoel's example, these attributes can be used in combination with the current total transposition state (defined in the
The audio parser would lower the pitch by 1 semitone for the Eb in the key signature and by one octave for the 8vab. If a forced natural was needed on the note, it would be encoded like this:
Such an accidental would override the key-signature.
Is a note that looks like @clnoel's example (with key signature and 8vab) but sounds like a rather sharp E5 (overriding the key-signature and 8vab). Since the
It would be straightforward to redefine these values and/or define values for other, non-standard accidentals used in the file, in a table somewhere earlier in the file. For example, some of the Stockhausen accidentals could be defined like this
The accidentals in the key signature in @mogenslundholm's example could also be coded in this way. |
@clnoel, the spec states that the pitch attribute is "the musical pitch of the note". This corresponds to choice 4, Eb4. This is similar to how MusicXML works, which has been successful. Some of your other choices are reminiscent of how NIFF worked. I found that most NIFF parsers got pitch wrong due to thinking of pitch graphically rather than semantically. I think there's room for discussion of a separate issue about the exact details of the accidental attribute. The current draft spec carries forward one of MusicXML's more error-prone design decisions, so it would be great to fix this in MNX-Common. I have started a new issue #153 for this. @notator, this issue is only about written vs sounding pitch, not for alternate syntaxes for pitch representation. The basics of pitch representation syntax have already been decided. There are separate issues for the accidental attribute (#153) and using cents instead of decimal fractions of a semitone (#19). |
@mdgood, As I think you said somewhere, we are currently using issues to record ideas. This seemed the best place to record an idea that follows from the proposal I'm currently backing, and which I think shows that it is superior to MusicXML's approach.
I'm not so sure about that. We haven't decided PR #152 yet, so we don't know if §5.2.6.1 The Thanks for opening #153, and mentioning #19. Maybe we should agree about PR #152 before continuing with those, otherwise we will be talking at cross-purposes. |
I think we can say with some degree of certainty that the element will stay largely as it is, notwithstanding the issue of how the sounding pitch should be encoded, because the co-chairs are not convinced of the benefits of any of the alternative approaches that have been proposed. I can't be persuaded to agree with either of @clnoel's first two proposals. I could be persuaded either way about whether the note in her example should be Eb5 or Eb4. In Dorico, for example, the note's pitch is not altered by the octave line/instruction: only its staff position. I believe that octave lines, like clefs, are presentational: they do not change the pitch of the note, only where on the staff it is displayed. For that reason I would be inclined not to include the effect of the octave line/instruction on the pitch of the note, but, as I say, I could be persuaded either way. |
@dspreadbury As I said above,
So I wouldn't mind using
to mean the same thing as
Its just that I think MNX-Common also needs a consistent mechanism for dealing with non-standard accidentals. (#153 needs to be addressed later.)
Just a remark: I think the code in Dorico's implementation of CWMN is important in deciding what CWMN is, and that there must be an unambiguous way for Dorico to import/export MNX-Common files. But that does not mean that MNX-Common has to mirror Dorico's internal code structures exactly. Different applications will use different approaches to implementing CWMN, so they can't all use exactly the same approaches as MNX-Common.
Am I right in thinking that you are inclining to agree with me that Eb5 is correct? Nice that even if not, you might be persuaded otherwise. :-) All the best, |
Just a remark: stating "In Dorico, for example" does not imply that I believe MNX-Common should mirror Dorico's approach at all. I believe your command of the English language is more than good enough to understand the idiom "for example", so please don't put words into my mouth. |
@clnoel Thanks for spelling out those four options — it's quite useful for focusing the discussion! I vote for the fourth option (Eb4), as it strikes the right balance between semantics and presentation. This is what my pull request in #152 is intended to communicate. I agree with @dspreadbury that ottavas and clefs are more presentational than semantic. This is nicely captured by @shoogle in his comment above pointing out that different editions make different ottava decisions for the same underlying music. A note's accidental is both semantic and presentational: it fundamentally affects the meaning of the pitch itself (e.g., Eb4 is a different pitch than E4), but its graphical display is a presentational decision (e.g., does this note have an accidental visually rendered next to it? See issue #153). Finally, a subtle note about the meaning of "written pitch." @clnoel said in the comment above that she leans toward option 1 because it reflects the written pitch and we'd decided written pitch is the way forward. I'd like to attempt to clarify our discussion by separating the vague concept of "written pitch" into two concepts:
(These terms were completely made up by me, on the spot, and they're serving only for the purposes of clarifying this discussion thread. :-) ) I believe MNX-Common should use Performer-Centric Written Pitch. The pull request in #152 attempts to codify that, via its definition of the term "written pitch," and I'm very interested to get feedback on whether the definition is clear and unambiguous enough. It defines it in terms of the pitch generated by a concert-pitch instrument playing it, which may or may not be a good approach. |
I was just amplifying/agreeing with what you said. It would help if you didn't always, by default, bite my head off. |
I also misunderstood the co-chair's decision in the same way in the above posting where I said
Which only goes to show how important it is to use precise terminology whose meaning we all agree on. I'm trying very hard to be constructive here, but I have to say that I think it's a mistake to try to encapsulate what the performer may be thinking in a file format that is going to be read by programmers and machines. As a programmer, I would prefer the local XML (i.e. the
Hope that helps. |
FWIW, and mostly because @bhamblok requested so, here's how MEI deals with notes. It can be used in multiple ways, it's focussed on the visual representation (written pitch). However, sounding pitch is available as well. The situation from above could be encoded in multiple ways, but no matter how it would be encoded, there is no way to misunderstand the encoding:
An encoding may decide to not provide one or the other (actually, it could go without any of those, but that's a different story). In that case, it's very often still possible to infer the missing information from key signatures, information about transposing instruments or other places, but that may require more processing than every application would be willing to invest, simply because it would be out of scope. I'm not trying to advertise anything, I just want to make some other's conclusions available to this thread. |
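For reference, a sketch of MEI's dual encoding in Python. The attribute names (`pname`, `oct`, `accid` for written pitch; the `*.ges` "gestural" attributes for sounding pitch) are MEI's actual vocabulary, but this single-element fragment is illustrative rather than a validated MEI document.

```python
import xml.etree.ElementTree as ET

# MEI encodes the written pitch in @pname/@oct/@accid and can optionally
# add the sounding ("gestural") pitch in @pname.ges/@oct.ges/@accid.ges.
# Here: written Eb5 under an 8vb, sounding an octave lower.
note = ET.fromstring('<note pname="e" oct="5" accid="f" oct.ges="4"/>')

written = (note.get("pname"), int(note.get("oct")))
# When no gestural octave is provided, fall back to the written octave,
# as the comment above notes is often possible with extra processing.
sounding_oct = int(note.get("oct.ges", note.get("oct")))
print(written, sounding_oct)  # ('e', 5) 4
```

Because both readings are present in the element, neither a graphical nor an audio consumer has to infer anything, which is the "no way to misunderstand the encoding" property described above.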
I am not sure I understand what you mean when saying that the MEI approach is overly complicated. In any case, if you are looking at an XML-programming scenario, you will probably not do
@lpugin
I meant that distinguishing clearly between the graphics parser and the audio parser makes having separate attributes for the graphics and audio unnecessary. In this proposal, the XML is designed so that it can be parsed in either way. Applications that are not interested in audio can just parse the graphics, and vice-versa. Apps that want to parse both can, of course, do so -- and the two domains will be automatically synchronised.
|
Technically, the "written pitch" is an image on a page, which I think we all agree is way too far towards the graphical side of things. We have already decided that performed frequency (that takes everything into account all the way through transposing instruments and unwritten microtones) is way too far toward the audible side of things, and put that in a separate optional property (sounding pitch). The question we are addressing here is: Where is the line that establishes enough semantic value to make both a graphical and an audible representation viable (assuming no sounding pitch is specified)?

I would also like to point out that the difficulty of establishing "pitch spelling" (the set of accidentals displayed in the graphics) is one of the reasons we decided to move away from sounding pitch in the first place.

I've been thinking about this a lot since I last commented with the set of options above. I've talked about it with my colleagues here, and I now think I'm actually leaning toward using "E4" (the second option). With this option, the key-signature has semantic meaning: it is necessary to the audio-parser, which defaults to it if there is nothing in the accidental property. It makes the discussion about how to do pitch-spellings simpler, because the "spelling" part goes in the accidental property, never in the base-pitch. It also makes the ottava have semantic meaning, which is programmatically equivalent to an intervening clef-change, and changes where the graphical display of the note goes, while being ignorable for an audio parser.

I completely understand that this is a kind of half-and-half representation. I feel that that kind of half-and-half representation is necessary now that we have decided not to go all the way graphical or all the way audible. I acknowledge that this might make some analyses harder, because the fact that it is an Eb, not an E, would need to be figured in by using the accidental. 
However, given the difficulties in correctly specifying, e.g., Ebb4, by putting the "bb" in both the pitch and the accidental properties, I think this also provides less duplication! --Christina |
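The resolution rule behind this "option 2" reading can be sketched briefly. This is an illustration of the idea, not proposed MNX syntax: the stored base pitch omits the key-signature alteration, so an audio parser falls back to the key signature unless the note's accidental property overrides it.

```python
# "Option 2" resolution as described above: explicit accidental wins;
# otherwise the key signature supplies the alteration. Names illustrative.
KEY_SIG_EB_MAJOR = {"E": -1, "A": -1, "B": -1}  # three flats

def effective_alter(step, accidental, key_sig):
    """Return the alteration an audio parser would apply to this note."""
    if accidental is not None:
        return accidental
    return key_sig.get(step, 0)

# Base pitch E with no accidental property: the key signature flattens it.
print(effective_alter("E", None, KEY_SIG_EB_MAJOR))  # -1
# Base pitch E with a forced natural: the accidental overrides the key.
print(effective_alter("E", 0, KEY_SIG_EB_MAJOR))     # 0
```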
Sorry, I don't agree. An E is an E and an Eb is an Eb. Can you elaborate with some examples how you would encode two sequential E-flats (in a key of C major) where the second one doesn't need the accidental to be shown? It would be really confusing if they were encoded in different ways. I think "semantics" are vastly superior to written and/or sounding properties. |
@clnoel and @bhamblok However: :-)
would mean that the graphics parser would write a C4 notehead preceded by a # accidental. |
@clnoel wrote:
I don't think I can ever be convinced of options 1 or 2. :-/ An E-flat is not an E. In my view, this doesn't pass a baseline test of "is this note represented semantically?" Options 1 and 2 require too much knowledge of state (the key signature), for something too important to mess up (the pitch).
But the base-pitch is part of the spelling, no? Consider G-flat vs. F-sharp. The spelling difference between those two notes exists in the accidental and the base-pitch. |
"Semantic" is a tricky word... |
I like Option 4.
Agreed. They also require an assumption that accidentals remain in effect until the end of the measure, or until superseded by a different accidental. While this is true for most sheet music, Gould mentions (I forget the page number) that other conventions have existed, such as requiring accidentals to be explicitly stated (i.e. any note without an accidental is a natural). This kind of music can be encoded by Options 3 or 4 but not by 1 or 2 (at least not without risking incorrect playback). Option 1 or 2 would make sense for OMR, but for pretty much any other use case Option 3 or 4 is a better choice. |
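The measure-scoped convention under discussion is easy to state as code. This sketch (illustrative names, not MNX syntax) also answers the two-sequential-E-flats question above: under the usual convention, the flat on the first note carries to the second within the bar.

```python
# Conventional accidental rule: an explicit accidental stays in force for
# the rest of the measure (per step/octave) until superseded; notes with
# no accidental fall back to the key signature.
def resolve_measure(notes, key_sig):
    """notes: list of (step, octave, accidental_or_None). Returns alterations."""
    state = {}  # (step, octave) -> alteration carried within the bar
    result = []
    for step, octave, accidental in notes:
        if accidental is not None:
            state[(step, octave)] = accidental
        result.append(state.get((step, octave), key_sig.get(step, 0)))
    return result

# Two Eb's in C major: the flat on the first carries to the second.
print(resolve_measure([("E", 4, -1), ("E", 4, None)], {}))  # [-1, -1]
```

Under the alternative convention Gould describes, the fallback to `state` would be dropped and every unmarked note would simply be natural, which is why encodings that rely on this carried state (Options 1 and 2) cannot represent such music safely.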
Sorry for this long post. It is difficult for me to express my ideas in English and this results in a longer text. Sorry! In the beginning, more or less we all assumed that MNX would follow MusicXML for representing pitch, as no issues were raised with MusicXML pitch representation. Later, an important question was raised: the issue of what to do for transposing instruments. This introduced the concept of written pitch vs. sounding pitch. But in any case, when the issue was raised, the meaning of 'written pitch' and 'sounding pitch' was basically:
After some argumentation it was clear that written pitch (what MusicXML uses) is the most practical. This should have closed the issue, so that we could proceed with other work. Unfortunately, the words 'written pitch' and 'sounding pitch' are open to interpretation, and Pandora's box was opened once we interpreted those words differently. And IMO this is the current situation: a lot of different proposals trying to solve problems of the MusicXML approach that have not actually been demonstrated.

Music is sound. And for more than ten centuries people have been trying to represent music with symbols. The music score is the best system found for this. So now we are trying to represent the music score (not its graphical appearance but its content) using 'computer symbols'. To me the best approach is to mimic the music score (the best known system to represent music, apart from audio recordings). The notes are represented by a notehead placed on a staff, and the sound (pitch) is implied by many other symbols: the notehead position on the staff, the clef, the accidentals, the 8va marks, etc.

To me, when we talk about 'written pitch' I understand 'notehead position on the staff', and the simplest way of expressing this location is by the 'displayed pitch' (this is basically what MusicXML uses). So in @clnoel's example, the notehead position is E5 (or Eb5 -- more on this later). To me the current problems arise when we compare this written pitch with the sounding pitch, as in this example they are different. But the problem disappears if we return to the idea of understanding 'written pitch' not as pitch but as 'position on the staff'. So E5 is not a pitch but a reference to notehead position: 'notehead on the fourth space'. That is MusicXML's understanding and that is what I propose to follow. It gives preference neither to 'sound' parsers nor to 'graphical' parsers. It is just a way of expressing where the notehead is placed on the staff. Now to the issue of E5 vs Eb5. 
MusicXML takes into account applicable accidentals and would use Eb5. I assume that this decision was taken to simplify tracking applicable accidentals. For a long time, in my applications I chose the opposite approach: use E5 (as if I were writing the score with pen and paper) and force the application to compute applicable accidentals. In my many years of experience, I have found that both systems work well and no special problems arise with either of them. But I have found MusicXML's system (Eb5) better than my application's system (E5), as it simplifies the algorithms for preserving displayed accidentals when a transposition is applied. So my vote is for the current MusicXML approach, Eb5, option 3, although option 1, E5, would also be acceptable to me. Hope this helps! |
Closed with #152. |
In MusicXML, all pitches are framed in terms of the instrument's transposed pitch. In MNX, we can re-examine this question.
Encoding in transposed pitches makes performance interpretation more work, since the instrument's transposition must be applied, and this transposition could even vary during the course of a piece.
It also makes dynamic display of concert pitch trickier, since assumptions must be made about enharmonic spelling and key signatures.
On the other hand, encoding transposed pitches will more accurately reflect a manuscript that has been composed in transposed form, and may ultimately be more robust in terms of delivering a definitive final instrumental part.
Other pros and cons need to be brought out also.