Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing AttributedSpans.collapseSpans to treat end positions as exclusive #642

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jmatth
Copy link
Contributor

@jmatth jmatth commented Jun 23, 2022

Currently, AttributedSpans.collapseSpans returns objects with an end index that is inclusive. This is in contrast to every other list or string related method in Dart and Flutter, which use exclusive end indexes. This PR brings collapseSpans in line with the rest of the ecosystem by making the end indexes in its return exclusive.

There are currently multiple tests broken that I'm working my way through, but I wanted to open this PR to allow for discussion to take place in the meantime.

The current behavior of collapseSpans produces output that is probably not what most users are expecting, and there are currently at least two open issues related to it: #593 and #632. The use of inclusive end indexes means that users have to perform annoying index manipulation to use the output with other methods such as substring. For example, naively printing the contents of each collapsed span drops a character at each boundary:

test('Attributed spans substrings', () {
  final textRegular = 'regular';
  final textBold = 'bold';
  final textItalic = 'italic';
  final text = '$textRegular$textBold$textItalic';
  final attrText = AttributedText(text: text)
    ..addAttribution(
      ExpectedSpans.bold,
      SpanRange(
        start: textRegular.length,
        end: textRegular.length + textBold.length,
      ),
    )
    ..addAttribution(
      ExpectedSpans.italics,
      SpanRange(
        start: textRegular.length + textBold.length,
        end: text.length,
      ),
    );

  final collapsedSpans = attrText.spans.collapseSpans(contentLength: text.length);

  for (final span in collapsedSpans) {
    print('"${text.substring(span.start, span.end)}" - $span"');
  }
});

Output:

"regula" - [MultiAttributionSpan] - attributions: {}, start: 0, end: 6"
"bol" - [MultiAttributionSpan] - attributions: {[NamedAttribution]: bold}, start: 7, end: 10"
"" - [MultiAttributionSpan] - attributions: {[NamedAttribution]: bold, [NamedAttribution]: italics}, start: 11, end: 11"
"talic" - [MultiAttributionSpan] - attributions: {[NamedAttribution]: italics}, start: 12, end: 17"

This is also the case in super_editor, especially in the IME code, where incoming TextSelection values need -1 applied to query attributions within a given range 1 2 3 4.

Given the above, I think collapseSpans should be modified to return exclusive end indexes for better compatibility with the rest of the Dart/Flutter ecosystem. Doing so should probably be considered a breaking change, as any code written with the old behavior in mind would then contain off-by-one errors.

These changes also resolve a couple of bugs I found while investigating this: one where collapseSpans would return a span where the end index was off by one, and one where it could throw a StateError.

Resloves #593, resolves #632.

@jmatth jmatth force-pushed the attributed-span-end-exclusive branch from 7e9c46a to ddb1af5 Compare June 24, 2022 04:40
@jmatth jmatth marked this pull request as ready for review June 24, 2022 16:42
@matthew-carroll
Copy link
Contributor

It's true that inclusive end caps work differently than traditional string operations. Then again, we aren't doing traditional string operations. With regard to span operations, I made the decision for caps to be inclusive so that we avoid other points of confusion. For example, if a span covers a single character at index 2, it's strange to see 2->3. We don't actually "span" the indices of 2->3, we only "span" the character at index 2.

Additionally, SpanMarkers refer to the actual index to which they apply. It can be confusing to define a span from 2 to 2 and then end up with a span range that's 2 -> 3.

I'm not convinced that making span ranges behave like typical string ranges is a good idea. This change might make some things more clear, but I think it also makes other things more confusing. I'm not sure this is a good tradeoff.

Let me now if there's more to the argument than "span ranges don't behave like typical string ranges". If there's a fundamental problem, or if there's evidence that this behavior is adding non-trivial work to commonly re-occurring developer tasks, then maybe it's worth doing. If that's the case, I'd like to see such examples. If it's just off-by-one issues, those can either go one way or another. Adjusting things by 1 happens with traditional ranges, too. It's just a question of which situations need an adjustment by 1, and which don't.

@jmatth
Copy link
Contributor Author

jmatth commented Jun 28, 2022

To address SpanRange first: it currently contains several confusing inconsistencies:

  1. Passing a SpanRange to [AttributedText.addAttribution][AttributedText.addAttribution] treats end as inclusive
  2. All the methods on SpanRange that were copied from TextSpan treat end as exclusive (for example, textInside)
  3. The docs say end is exclusive

As a developer, I would expect SpanRange.end to at least be consistently treated as either exclusive or inclusive. Currently this is not the case:

  test('SpanRange consistency', () {
    const text = 'Hello there';
    final attrText = AttributedText(text: text);
    const range = SpanRange(start: 0, end: 5);
    attrText.addAttribution(boldAttribution, range);
    final substring = range.textInside(text);
    for (var i = 0; i < attrText.text.length; i++) {
      if (i < substring.length) {
        expect(
          attrText.getAllAttributionsAt(i),
          contains(boldAttribution),
          reason: 'Characters returned from textInside should be bold',
        );
      } else {
        expect(
          attrText.getAllAttributionsAt(i),
          isNot(contains(boldAttribution)),
          reason: 'Characters excluded from textInside should not be bold',
        );
      }
    }
  });

Output:

Expected: not contains NamedAttribution:<[NamedAttribution]: bold>
  Actual: Set:[NamedAttribution:[NamedAttribution]: bold]
Characters excluded from textInside should not be bold

I agree that SpanMarker and SpanRange should agree on the behavior of the end index (or SpanMarker should just be internal). This PR also updates SpanMarker to use exclusive end indexes, which actually simplified several parts of AttributedSpans.collapseSpans where it no longer needs to distinguish between start and end markers.

I disagree that having exclusive end indexes is inherently more confusing. It might be true in a vacuum, but this is a Dart library and for better or worse Dart was deliberately designed to be similar to other C-like languages, which includes exclusive end indexes. I think it is more confusing to learn Dart String, List, and maybe Flutter TextRange, and then suddenly have to contend with a different paradigm when using this library. Even once that paradigm is learned, mentally switching between vanilla Dart/Flutter and AttributedText index conventions adds extra cognitive overhead during development.

One last thing I'll push back on:

Adjusting things by 1 happens with traditional ranges, too.

This is true, but now it has to happen at least twice as often: once while you're calculating whatever your index is, and once moving in/out of this library to vanilla dart. More if you're crossing that boundary multiple times. Each one of these fiddly +/- 1 operations increases the risk of programmer error.

The diff for this PR demonstrates how consistency with the rest of the ecosystem simplifies using AttributedText. Everywhere that used a calculated rather than hard-coded index, the only changes were removing +/- 1 adjustments. Nowhere did I have to add such adjustments. Once AttributedText agreed with String.substring, TextRange, etc. it Just Works ™️.

@matthew-carroll
Copy link
Contributor

As a developer, I would expect SpanRange.end to at least be consistently treated as either exclusive or inclusive. Currently this is not the case:

Yeah, we definitely want consistency across super_editor.

The diff for this PR demonstrates how consistency with the rest of the ecosystem simplifies using AttributedText. Everywhere that used a calculated rather than hard-coded index, the only changes were removing +/- 1 adjustments. Nowhere did I have to add such adjustments. Once AttributedText agreed with String.substring, TextRange, etc. it Just Works ™️.

That's definitely an attractive point. If this were just about SpanRange, I'd tend to agree. However, I still find the following marker definitions misleading:

final spans = AttributedSpans(
          attributions: [
            const SpanMarker(attribution: ExpectedSpans.bold, offset: 0, markerType: SpanMarkerType.start),
            const SpanMarker(attribution: ExpectedSpans.bold, offset: 16, markerType: SpanMarkerType.end),
          ],
        );

To me, markers are like offsets. If you want the character at index 5, you don't specify an index of 6. Yet, the above span specifies an offset of 16 when defining a marker position that technically applies to 15.

Do you have any thoughts on improving marker clarity? If we can find a reasonable marker API that doesn't easily confuse the end index, then I'm open to making this change.

@jmatth
Copy link
Contributor Author

jmatth commented Jul 5, 2022

Do you have any thoughts on improving marker clarity? If we can find a reasonable marker API that doesn't easily confuse the end index, then I'm open to making this change.

If the sticking point here is specifically how SpanMarkers are treated, I would say the easiest answer is to just make them internal. As far as I can tell the only place they're available to users right now is in this AttributedSpans constructor. Once an AttributedSpans object is constructed none of the methods take or return SpanMarkers. Even without this whole end index situation, I'd argue SpanMarkers should be made internal because:

  1. The way AttributedSpans stores its spans is an implementation detail and should be hidden from users anyway
  2. This sort of inconsistency where one type is used in the constructor and others in all subsequent methods makes the API harder to use
  3. Allowing users to specify a list of span markers lets them write broken code like AttributedSpans(attributions: [SpanMarker(attribution: ExpectedSpans.bold, offset: 0, markerType: SpanMarkerType.start)])

SpanMarker and SpanMarkerType could both be made internal and the constructor signature for AttributedSpans could be changed to something like this:

AttributedSpans({
    List<AttributionSpan>? attributions,
  }) : markers = (attributions ?? [])
            .expand(
              (span) => [
                SpanMarker(attribution: span.attribution, offset: span.start, markerType: SpanMarkerType.start),
                SpanMarker(attribution: span.attribution, offset: span.end, markerType: SpanMarkerType.end),
              ],
            )
            .toList() {
    _sortAttributions();
  }

The old signature could also be retained as an internal constructor to make implementing certain things like copy() easier:

AttributedSpans._fromMarkers({
    List<SpanMarker>? attributions,
  }) : markers = [...?attributions] {
    _sortAttributions();
  }

Of course, this just pushes the problem elsewhere. I think I missed AttributionSpan in my previous work converting end indexes to be exclusive but I would argue it should be subject to the same change for the same reasons: this is Dart, everything else uses exclusive end indexes, maintaining that paradigm means this library will play nice with the rest of the ecosystem etc. Same for all the other methods on AttributedSpans that just operate on indexes and attributions such as getAttributionsWithin. Would you be ok with those changes, or do you think that inclusive end indexes need to remain in the AttributedSpans public API?

One last thought: if you want to make a really dramatic API change, Apple sidesteps this issue entirely in NSRange by giving it a location and a length. That should be easy enough to copy and seems to remove any ambiguity about the indexes. Of course, that's an older API and newer things built into Swift like Range specify start and (exclusive) end indexes so maybe they discovered a reason to use that over a location+length based system after all.

@matthew-carroll
Copy link
Contributor

It sounds like the change is reasonable, and making the markers private should be OK. Changing AttributionSpan makes sense, too.

First, I'd like to make sure that the API changes in this PR work throughout the mono repo. Can you run all tests locally for all projects to ensure that they pass? I think CI might use the public dependency versions, which won't pick up your changes, but I'm not sure about that.

Also, given that we have plenty of holes in our test coverage, can you build and run each sub-project's example app and poke around for issues?

This will be a breaking change, but it's important that we update all of our internal uses appropriately.

For the spans, do you think it would be helpful for users to also be able to access the end index? We know the range will provide start inclusive and end exclusive. Would it make sense to also add something like endIndex (or a better name)? That way developers have both pieces of information without needing to do silly +1/-1 adjustments...

@jmatth jmatth force-pushed the attributed-span-end-exclusive branch from 4545f89 to 7985a20 Compare July 6, 2022 21:12
@jmatth
Copy link
Contributor Author

jmatth commented Jul 6, 2022

For the spans, do you think it would be helpful for users to also be able to access the end index? We know the range will provide start inclusive and end exclusive. Would it make sense to also add something like endIndex (or a better name)? That way developers have both pieces of information without needing to do silly +1/-1 adjustments...

I don't think it would be that useful since most Dart developers should be used to dealing with exclusive end indexes anyway. But at the same time it's pretty trivial and Dart already encourages redundant getters like isEmpty/isNotEmpty, so I'm not strongly against it. Maybe endInclusive? Doesn't really flow but it's obvious what it is from the name and will pop right up in any IDE autocomplete when the user types end. inclusiveEnd matches English grammar better but might not appear in autocomplete as readily. If we don't care about optimizing for autocomplete lastIndex also comes to mind.

Also, given that we have plenty of holes in our test coverage, can you build and run each sub-project's example app and poke around for issues?

Just did this, super_editor/example needed a handful of updates and super_text_layout seems unaffected. I think the link function in the toolbar of the super_editor example app is broken on main so I'll spend some more time figuring out a workaround to test that particular functionality later.

First, I'd like to make sure that the API changes in this PR work throughout the mono repo. Can you run all tests locally for all projects to ensure that they pass? I think CI might use the public dependency versions, which won't pick up your changes, but I'm not sure about that.

Done, the tests pass locally. This seems like an issue worth addressing though. I'll open another PR to improve the monorepo setup for CI as well as local development shortly.

@matthew-carroll
Copy link
Contributor

I don't think it would be that useful since most Dart developers should be used to dealing with exclusive end indexes anyway

This is true for spans, but not offsets. I think developers routinely specify, for example, a TextPosition based on a TextSelection, or possibly a TextRange. The TextPosition takes the index in question. I think I'd prefer endIndex over endInclusive and lastIndex. In the case of lastIndex, there's no need to switch from end to last. For endInclusive, that still sounds like it's related to a range rule, when really we're accessing an index.

I'll open another PR to improve the monorepo setup for CI as well as local development shortly.

Can you describe your intentions, first? I'd like to avoid any significant refactoring, if possible.

@jmatth
Copy link
Contributor Author

jmatth commented Jul 6, 2022

I think I'd prefer endIndex over endInclusive and lastIndex.

SGTM

I'll open another PR to improve the monorepo setup for CI as well as local development shortly.

Can you describe your intentions, first? I'd like to avoid any significant refactoring, if possible.

Sure, I'm just planning to use Melos. It's a command-line tool written specifically for managing Dart monorepos. You remove the dependency_overrides sections from the various projects, drop in a config file, and run melos bootstrap once instead of flutter pub get in each package. Melos handles overriding the dependencies between any projects in the monorepo. This makes developing in the monorepo easier, and can be added to the CI run to make tests run with code from the repo instead of pub.dev. It should also make publishing a bit easier since I'm guessing you had to manually remove the dependency_overrides sections to make pub publish happy each time you wanted to publish an update.

If you want to avoid 3rd party tools we can still improve the CI situation by adding a step to each workflow that writes a pubspec_overrides.yaml to disk before running pub get. This is a new way to override dependencies during development that was added in Dart 2.17, and what Melos uses behind the scenes. I'd advocate for using Melos though since it makes creating 5 of these override files for local development a lot easier and has some other features that make monorepo development more convenient.

Edit: here is the diff of what I have so far. I've actually been using Melos for development in this repo for a month or two without committing the file. The changes are mostly to various pubspec.yaml files, I just need to test that the CI run works as expected.

@matthew-carroll
Copy link
Contributor

Sure, I'm just planning to use Melos

Sounds good. I was originally going to try that, but when I discovered the dependency overrides in the pubspec it was good enough at the time. If there are any new commands that we're supposed to run so that Melos does its thing, please add those to the README. Also, can you file an issue for this change and briefly describe what you're about to do, and why?

@jmatth
Copy link
Contributor Author

jmatth commented Jul 7, 2022

Opened #680 and its associated PR, #681.

@jmatth jmatth force-pushed the attributed-span-end-exclusive branch from c75dd9f to 69b50a1 Compare August 1, 2022 03:02
@matthew-carroll
Copy link
Contributor

@jmatth now that we're introducing breaking changes, would you like to refresh this PR? I think we're ready to finalize this and merge it.

@jmatth
Copy link
Contributor Author

jmatth commented Nov 16, 2022

Yeah, I'll rebase from main and ping you when it's ready.

@jmatth jmatth force-pushed the attributed-span-end-exclusive branch from 69b50a1 to 3a7aef0 Compare November 22, 2022 21:08
@jmatth
Copy link
Contributor Author

jmatth commented Nov 22, 2022

@matthew-carroll this should be ready to go. I poked around in the example apps and didn't find any issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AttributedText: collapsedSpans returns overly large MultiAttributionSpan for unattributed string
2 participants