Skip to content
This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

Multiple text formats for symbols via "format" expression (native port) #12624

Merged
merged 12 commits into from Oct 15, 2018

Conversation

ChrisLoer
Copy link
Contributor

@ChrisLoer ChrisLoer commented Aug 13, 2018

Port of mapbox/mapbox-gl-js#6994.

  • Port automatic argument coercion for compound expressions (this is no longer necessary for the format expression itself, but it was part of the JS PR and we want to keep native in sync)
  • Port format expression parsing code
  • Port format-aware shaping code
  • Implement Android conversion/serialization
  • Implement non-ICU styled Bidi for Qt
  • Figure out iOS and Android binding strategy: cc @fabian-guerra and @LukasPaczos. It shouldn't be too complicated, but like collator, format uses the new object/dictionary argument format e.g. { "font-scale": 1.5, "text-font": "awesome" }. I think it should be just fine to defer binding work to later PRs.

@ChrisLoer ChrisLoer added ⚠️ DO NOT MERGE Work in progress, proof of concept, or on hold Core The cross-platform C++ core, aka mbgl labels Aug 13, 2018
@ChrisLoer
Copy link
Contributor Author

@asheemmamoowala if you want to start looking I just got an initial port of the expression code itself in...

ChrisLoer added a commit that referenced this pull request Aug 21, 2018
Requires changing `generate-style-code` to treat 'formatted' as being the same as 'string' until gl-native gets 'formatted' support with #12624.
To make nitpick happy, PropertyFunction.java uses the latest "text-field" description from v8.json. It's technically correct, just kind of pointless since the "If a plain `string` is provided" clause will always be true.
ChrisLoer added a commit that referenced this pull request Aug 22, 2018
Requires changing `generate-style-code` to treat 'formatted' as being the same as 'string' until gl-native gets 'formatted' support with #12624.
To make nitpick happy, PropertyFunction.java uses the latest "text-field" description from v8.json. It's technically correct, just kind of pointless since the "If a plain `string` is provided" clause will always be true.
@1ec5 1ec5 added the GL JS parity For feature parity with Mapbox GL JS label Aug 25, 2018
@1ec5
Copy link
Contributor

1ec5 commented Aug 25, 2018

Figure out iOS and Android binding strategy: … It shouldn't be too complicated, but like collator, format uses the new object/dictionary argument format e.g. { "font-scale": 1.5, "text-font": "awesome" }.

The iOS/macOS runtime styling API will support this expression operator automatically via the ugly MGL_FUNCTION() fallback expression. #12739 tracks implementing a more convenient affordance of some sort in NSExpression.

Since this expression operator will be used to implement bilingual or multilingual labels, #12738 tracks enhancing the iOS/macOS expression localization functionality to avoid making such labels repeat the same name multiple times. (The same considerations apply to the Mapbox GL Language plugin and the Android map SDK’s localization plugin.)

/cc @nickidlugash

@ChrisLoer
Copy link
Contributor Author

Status update: the initial version of this is working 🎉 !

There's a lot of cleanup that remains to be done and it appears I'm occasionally doing some sort of undefined/unexpected behavior when I run the BiDi algorithm on styled text (note the random incorrect e -- I just sometimes see garbage like this show up):

screenshot 2018-09-13 17 11 20

@ChrisLoer ChrisLoer removed the ⚠️ DO NOT MERGE Work in progress, proof of concept, or on hold label Sep 22, 2018
@ChrisLoer ChrisLoer force-pushed the formatted-text branch 2 times, most recently from b54e51f to dafaa89 Compare September 25, 2018 21:47
@ChrisLoer ChrisLoer changed the title WIP: Multiple text formats for symbols via "format" expression (native port) Multiple text formats for symbols via "format" expression (native port) Sep 25, 2018
XCTAssertEqual(rawLayer->getTextField(), propertyValue,
@"Setting text to a constant value expression should update text-field.");
XCTAssertEqualObjects(layer.text, constantExpression,
@"text should round-trip constant value expressions.");

constantExpression = [NSExpression expressionWithFormat:@"'Text Field'"];
constantExpression = [NSExpression expressionWithFormat:@"MGL_FUNCTION('format', 'Text Field', %@)", @{}];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still possible to set the textField property to a string as opposed to a formatted string? If so, we should continue to test that capability alongside the tests for formatted strings. It might require some refactoring of the EJS code that generates this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that's what the first layer.text = constantExpression on line 1100 does? (see NSExpression *constantExpression = [NSExpression expressionWithFormat:@"'Text Field'"];)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I placed my comment on the wrong diff. Specifically, I was concerned that setting the property to a plain string would fail to round trip – it would be upgraded to a formatted string. If that’s the case, it’s going to be fairly difficult for developers to work with even after we implement special format expression support. Perhaps there could be logic that downgrades a formatted string to a plain string if there’s nothing special about it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm pretty unsure about all the pathways that can be followed here, but I think what you're talking about is something like this?

    MGLStyleLayer *countryLayer = [self.mapView.style layerWithIdentifier:@"country-label"];
    if ([countryLayer respondsToSelector:@selector(text)]) {
        [countryLayer setValue:[NSExpression expressionWithFormat:@"'Fooistan'"] forKey:@"text"];
        NSExpression* roundTrip = [countryLayer performSelector:@selector(text)];
        NSLog(@"%@", roundTrip);
    }

In which case the results of the logging is just plain "Fooistan". (Also the map looks much cleaner after the change)

This round tripping works because of the logic in toPropertyValue:

if (expression.expressionType == NSConstantValueExpressionType) {
MBGLType mbglValue;
getMBGLValue(expression.constantValue, mbglValue);
return mbglValue;
}

Are you specifically concerned about constant strings vs. expressions that return a value of type string? (these are the ones that will get wrapped in a coercion by the parsing, although I think this should be analogous to other implicit coercions based on property values)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow I’m concerned about is existing code like this:

layer.text = [NSExpression expressionForConstantValue:@"Foo"];
// later
if ([layer.text.constantValue isEqualToString:@"Foo"]) { … } // crash
// or
if ([layer.text isEqual:[NSExpression expressionForConstantValue:@"Foo"]]) { … } // false

One always takes a risk by directly accessing NSExpression.constantValue without first checking the expression type, but if the developer set the text property in the first place, they may not expect to have to introduce that check when upgrading to a minor release of the SDK.

In other words, this implicit coercion is unusual in that it’s new. It’s also unusual in that formatted strings are a fairly foreign concept to iOS/macOS development, since we didn’t go the route of using NSAttributedString to represent them. Whereas coercions between strings and numbers are understood as invocations of the CAST() operator, there’s no CAST() operator support for converting between NSString and whatever class we’d represent a formatted string with. Introducing that CAST() support would be an elegant way of at least explaining away the surprise that some developers might experience.

Even so, we know from the v4.0.0 release that any changes to how coercions work can catch developers by surprise. In that case, we’re still fielding questions from developers who don’t expect to have to introduce CAST() into their inequality predicates. In this case, on the other hand, developers would be surprised at suddenly having to downcast from formatted strings to strings.

So my suggestion would be to automatically downcast from formatted strings to strings when there’s nothing but a ["formatted", "foo", {}]. Extra CAST() support would be cool too.

Copy link
Contributor Author

@ChrisLoer ChrisLoer Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the getter for any constant expression does the downcast, so the cases you provided all return true. This is a little different from doing an automatic downcast based on whether there are options. Even if you add an explicit format expression, you'll still always get a string constant value back. Basically, format is almost invisible from the point of view of the iOS SDK with this PR, except that you can set it via MGL_FUNCTION.

Test code I used:

    MGLSymbolStyleLayer *layer = (MGLSymbolStyleLayer*)[self.mapView.style layerWithIdentifier:@"country-label"];
    layer.text = [NSExpression expressionForConstantValue:@"Foo"];
    // later
    if ([layer.text.constantValue isEqualToString:@"Foo"]) {
        NSLog(@"constantValue comparison worked");
    }
    // or
    if ([layer.text isEqual:[NSExpression expressionForConstantValue:@"Foo"]]) {
        NSLog(@"constant expression comparison worked");
    }
    
    layer.text = [NSExpression expressionWithFormat:@"MGL_FUNCTION('format', 'Bar', %@)", @{}];
    // later
    if ([layer.text.constantValue isEqualToString:@"Bar"]) {
        NSLog(@"constantValue bar comparison worked");
    } 
    // or
    if ([layer.text isEqual:[NSExpression expressionForConstantValue:@"Bar"]]) {
        NSLog(@"constant bar expression comparison worked");
    }

@ChrisLoer ChrisLoer requested review from kkaefer and removed request for ansis September 25, 2018 22:40
@ChrisLoer
Copy link
Contributor Author

OK, I think this is ready for review!

@1ec5, could you look at the darwin bindings? This is not meant to be a full set of bindings -- just enough to let existing usage keep working and enable use of format via MGL_FUNCTION('format',...)

@LukasPaczos, could you look at the Android bindings? I just hacked in an undocumented Expression.format method to allow the tests to pass, and I don't understand the tests well enough to be sure that all the round-tripping is still working. The idea is that when you set text-field, it will convert that to a FormatExpression under the hood, but when you get results out they'll convert back to plain strings for now.

@kkaefer Can I interest you in reviewing the core part? I'm not really sure who's best qualified to dig into all the expression serialization/conversion code (which I'm most iffy about). The actual shaping code @ansis might be well placed to review, but that part's not too complicated and it's a pretty direct port from JS.

@@ -165,7 +165,12 @@ public class <%- camelize(type) %>LayerTest extends BaseActivityTest {
<% if (property.tokens) { -%>

layer.setProperties(<%- camelizeWithLeadingLowercase(property.name) %>("{token}"));
<% if (property.type === 'formatted') { -%>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to re-iterate concerns from #12624 (review) for Android as well. Will plain text, or other than format expressions accepted for text-field property? If yes, we need to rework this .ejs file to generate tests for both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah plain text is accepted but may be transformed... 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transformed to an expression? This would change the behavior of PropertyValue#isExpression when the text-field property is retrieved by SymbolLayer#getTextField which would return true even though it should return false for plain text. That's maybe a minor inconsistency but worth taking into consideration.

@@ -3224,6 +3224,11 @@ public static Expression collator(Expression caseSensitive, Expression diacritic
return new Expression("collator", new ExpressionMap(map));
}

public static Expression format(Expression input) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the plan is to introduce the full format expression implementation in a follow-up PR, I'd rather skip the public API + tests part here entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, like instead of modifying the test to use this, just omit the test entirely for the text-field case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, and remove the public API entry from Expression class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense.

Always clear errors before trying automatic coercion for a new compound expression overload.
@ChrisLoer
Copy link
Contributor Author

Sounds like two votes for BiDi unit tests -- I'll implement that tomorrow.

// the styling options to use for rendering that code point
// The data structure is intended to accomodate the reordering/interleaving
// of formatting that can happen when BiDi rearranges inputs
using StyledText = std::pair<std::u16string, std::vector<uint8_t>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChrisLoer Since all symbol labels will be coerced to formatted text, are you concerned about memory consumption per-label increasing with each label being represented by a StyledText? The worst-case scenario is for a string to have a different style per-character, but I imagine that for the bulk of strings (those coerced) there will only be a single style.
Would it be possible to maintain the styling options in a block format instead of per-code-point? the size of the vector would equal the number of sections and each entry would be the end index of the section with the corresponding index in the sections array.

Text hello hola こんにちは
Section end index 4 8 13?
Section 0 1 2

TaggedString would need to perform a search linear in sections length for getSectionAt() but for the most common case that shouldn't be too bad.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the JS side I did a little experimentation with a data structure kind of along the lines of what you're suggesting. It is a more memory-compact representation for the common case. What I was doing was iterating over the blocks as a whole, which made a lot of the iteration logic in the shaping code significantly more complicated. I think you're suggesting maintaining the same interface to TaggedString but just calculating the section index on-the-fly, so a CPU vs. memory trade off. That seems reasonable, but I don't know which would win without actually doing the experiment.

FWIW, on the JS side I looked at overall memory consumption in a full-screen map before and after this change, and couldn't detect any difference. It should be 1 byte-per-character, and I'd guess a full screen map has maybe on the order of 10k characters worth of text?

@ChrisLoer ChrisLoer force-pushed the formatted-text branch 3 times, most recently from 0fc0825 to 0b1466e Compare October 10, 2018 23:07
@ChrisLoer
Copy link
Contributor Author

OK, I think I've addressed all of the current round of review comments -- there were a lot of them, please let me know if I've left yours unresolved!

@asheemmamoowala I tried doing some memory profiling, but it's pretty hard to get reproducible numbers. There's at least no huge change. I think we just have to rely on the back-of-the-envelope estimate that says it shouldn't be too bad. The thing I worry about more in this PR is that it will lengthen tile layout/preparation time because of the extra format-related work. The good news is foregound/rendering costs should be pretty much unaffected since we're not adding any new GPU data or any extra foreground CPU processing.

@1ec5 Besides the existing tests, I played around a little with the text setter/getter to see it behaved the way I expected, and I tested that the macosapp's automatic localizer functionality still works without requiring any changes (testing that actually exposed a bug in my initial implementation of the aftermarket expression logic for format).

@LukasPaczos I went back to the "no Android wrapper for now" approach, with the assumption that we'll want to re-visit that when it's time to do a semver-major release. The constant-value getter/setters are still String-typed. The getter will now return an expression with the implicit format built-in. Instead of removing the tests for text-field, I added manual expression building code to the tests to handle format (e.g. Converter.convert(new JsonArray().add("format").add(<rest of expression>)))

Copy link
Contributor

@asheemmamoowala asheemmamoowala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried doing some memory profiling, but it's pretty hard to get reproducible numbers. There's at least no huge change. I think we just have to rely on the back-of-the-envelope estimate that says it shouldn't be too bad.

@ChrisLoer this sounds good. Good call on avoiding pre-mature optimizations

The thing I worry about more in this PR is that it will lengthen tile layout/preparation time because of the extra format-related work.

I would imagine that additional time is distributed across parsing/expression-creation, shaping, and memory allocation. You could add short-cut code paths in the shaping and bidi code to handle single section strings, but I doubt that it helps much by itself - while increasing code size.

@LukasPaczos
Copy link
Member

I'm working on android bindings in parallel in #12985 and I think I'm able to exhaust all the paths and get expected results.

All in all, LGTM 👍

@1ec5
Copy link
Contributor

1ec5 commented Oct 12, 2018

This is not meant to be a full set of bindings -- just enough to let existing usage keep working and enable use of format via MGL_FUNCTION('format',...)

Do these changes account for the exception when using MGL_FUNCTION within an NSExpression or NSPredicate format string (#13026)?

@ChrisLoer
Copy link
Contributor Author

Do these changes account for the exception when using MGL_FUNCTION within an NSExpression or NSPredicate format string (#13026)?

I've looked at that issue, but I don't understand how to trigger the problem (I'm sure you said how to do it, I just don't understand the vocabulary around NSExpression). FWIW, in terms of that issue, I'd expect the behavior here to be the same as the behavior for MGL_FUNCTION('collator' ... since I followed that pattern.

@fabian-guerra
Copy link
Contributor

@ChrisLoer this is an example of what makes MGL_FUNCTION crash:

NSExpression *gradientExpression = [NSExpression expressionWithFormat:@"mgl_interpolate:withCurveType:parameters:stops:(MGL_Function('line-progress'), 'linear', nil, %@)", stops];

@@ -28,6 +28,13 @@ struct Converter<PropertyValue<T>> {
? PropertyValue<T>(PropertyExpression<T>(convertTokenStringToExpression(t)))
: PropertyValue<T>(t);
}

PropertyValue<T> maybeConvertTokens(const expression::Formatted& t) const {
std::string todoText = t.sections[0].text;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a const std::string& to avoid the copy.

sed -i.bak 's/((index)|((int32_t)(level)<<31))/((index)|((uint32_t)(level)<<31))/' "src/ubidiimp.h"
sed -i.bak 's/((x)|=((int32_t)(level)<<31)/((x)|=((uint32_t)(level)<<31)/' "src/ubidiimp.h"
rm -rf src/ubidiimp.h.bak

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this works, I think that using an actual patch conveys the intention of the command better. Additionally, we'll be alerted if the upstream file changes in an incompatible way when we're updating the ICU version.

patch -p0 << PATCH
--- src/ubidiimp.h
+++ src/ubidiimp.h
@@ -198,8 +198,8 @@
 /* in a Run, logicalStart will get this bit set if the run level is odd */
 #define INDEX_ODD_BIT (1UL<<31)
 
-#define MAKE_INDEX_ODD_PAIR(index, level) ((index)|((int32_t)(level)<<31))
+#define MAKE_INDEX_ODD_PAIR(index, level) ((index)|((uint32_t)(level)<<31))
-#define ADD_ODD_BIT_FROM_LEVEL(x, level)  ((x)|=((int32_t)(level)<<31))
+#define ADD_ODD_BIT_FROM_LEVEL(x, level)  ((x)|=((uint32_t)(level)<<31))
 #define REMOVE_ODD_BIT(x)                 ((x)&=~INDEX_ODD_BIT)
 
 #define GET_INDEX(x)   ((x)&~INDEX_ODD_BIT)
PATCH

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, definitely better!

@@ -100,6 +100,108 @@ std::vector<std::u16string> BiDi::processText(const std::u16string& input,

return applyLineBreaking(lineBreakPoints);
}

std::vector<StyledText> BiDi::processStyledText(const StyledText& input, std::set<std::size_t> lineBreakPoints) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this fix looks good; let's apply it until there's an upstream fix.

}

void TaggedString::verticalizePunctuation() {
// Relies on verticalization changing characters in place so that style indices don't need updating
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to rely on, but we could add a comment to

const std::map<char16_t, char16_t> verticalPunctuation = {
{ u'!', u'' }, { u'#', u'' }, { u'$', u'' }, { u'%', u'' }, { u'&', u'' },
{ u'(', u'' }, { u')', u'' }, { u'*', u'' }, { u'+', u'' }, { u',', u'' },
{ u'-', u'' }, { u'.', u'' }, { u'/', u'' }, { u':', u'' }, { u';', u'' },
{ u'<', u'︿' }, { u'=', u'' }, { u'>', u'' }, { u'?', u'' }, { u'@', u'' },
{ u'[', u'' }, { u'\\', u'' }, { u']', u'' }, { u'^', u'' }, { u'_', u'' },
{ u'`', u'' }, { u'{', u'' }, { u'|', u'' }, { u'}', u'' }, { u'~', u'' },
{ u'¢', u'' }, { u'£', u'' }, { u'¥', u'' }, { u'¦', u'' }, { u'¬', u'' },
{ u'¯', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' },
{ u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' },
{ u'', u'' }, { u'', u'' }, { u'', u'︿' }, { u'', u'' }, { u'', u'' },
{ u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' },
{ u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' },
{ u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' },
{ u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'︿' },
{ u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' }, { u'_', u'' },
{ u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' }, { u'', u'' },
{ u'', u'' }, { u'', u'' }, { u'', u'' },
};
indicating that we also need to change it here.

@@ -55,17 +55,21 @@ class Glyph {
};

using Glyphs = std::map<GlyphID, optional<Immutable<Glyph>>>;
using GlyphMap = std::map<FontStack, Glyphs>;
using GlyphMap = std::map<FontStackHash, Glyphs>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, why change this to a hash instead of using std::unordered_map with a custom Hash template argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we look up the glyphs in the shaping code, all we have is the FontStackHash (this is to keep PositionedGlyphs and TaggedString relatively compact). Is there a way to do a find-by-hash on an unordered_map with a custom hasher?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah hm, maybe we could store an iterator instead? Iterators in unordered_map aren't invalidated, except when the item is removed. I was just noticing that there are a lot of changes where we explicitly invoke the Hasher which feels a bit awkward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true, the explicit hashing does feel awkward. I think the part that feels worst is when we have to do the reverse from hash->[string] for dependency lookup (i.e. GeometryTileWorker::onGlyphsAvailable).

I'm interested in the "store an iterator" idea, but I'm concerned that it will make the code harder to audit because you have to make sure that if you store an iterator you're using the same map when you try to use the iterator later -- and GeometryTileWorker already merges GlyphMaps from multiple requests to GlyphManager over time...

Add support for `MGL_FUNCTION('format', <text>, <options dictionary>)`
- No dedicated support for creating format expressions
- Java accessors for 'text-field' flatten back to String
- 'text-field' setter implicitly creates a 'format' expression. For tests, use JsonArray to build an equivalent format expression by hand.
- Port of arabic.test.js from mapbox-gl-rtl-text
- Modify BiDi::getLine to remove trailing nulls in the event UBIDI_REMOVE_BIDI_CONTROLS causes the string to shorten.
- Patch vendored ICU to avoid undefined undefined bit shifting behavior (triggered sanitizer failure)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Core The cross-platform C++ core, aka mbgl GL JS parity For feature parity with Mapbox GL JS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants