`UWordBoundIndices` doesn't expose the indices #35

wez · 2018-02-09T16:40:52Z

As far as I can tell, UWordBoundIndices is just a wrapper around UWordBounds with an identical interface.

In my use case I have a line of text and, from a mouse double click, an index into the .chars() of that string. I need to obtain the indices of the start and end of the word that encloses that index.

It seemed to me that UWordBoundIndices is what I'd want here, but I don't see how to use it for this purpose. Is this an oversight, or is there a better way to get the result I'd like?

wez · 2018-02-09T16:51:07Z

Oh, is it just that the docs at https://unicode-rs.github.io/unicode-segmentation/unicode_segmentation/struct.UWordBoundIndices.html are stale?

tapeinosyne · 2018-02-12T12:59:09Z

The UWordBoundIndices iterator definitely yields word indices, and the docs don't appear stale. However, you are right in that the current interface isn't suitable for identifying word boundaries from random access.

Graphemes suffered the same issue prior to the introduction of a cursor API in #21, and I suppose that word segmentation could be similarly updated.

wez · 2018-02-12T16:12:44Z

The problem I had was that the critical portion of the docs on that page:

type Item = (usize, &'a str)

is buried a bit further down in the page (that's just how they render), so I was left to fixate on the as_str() method. Would you mind expanding the doc comment to something like this to make it a little clearer?

External iterator for word boundaries and byte offsets.
Yields (usize, &str), the byte offset and string slice for each word.

I would love to have an API directed at random access! I have this somewhat clunky solution for the moment:

  // `x` is the byte offset of `word`, so `event.x` must also be a byte offset
  for (x, word) in line.split_word_bound_indices() {
      if event.x < x {
          break;
      }
      if event.x <= x + word.len() {
          // this is the matching word
          return;
      }
  }
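
For readers with the same use case, here is a minimal sketch of the lookup, assuming the click position is an index into `.chars()` as described at the top of the thread. Because the iterator yields byte offsets, the char index is converted first. The helper names `char_to_byte_offset` and `enclosing_segment` are hypothetical, not part of unicode-segmentation, and the segment list in `main` is a hand-written stand-in for what `line.split_word_bound_indices()` would yield, so that the example builds with the standard library alone.

```rust
/// Convert an index into `line.chars()` to a byte offset into `line`.
/// Returns `None` when `char_idx` is past the last character.
fn char_to_byte_offset(line: &str, char_idx: usize) -> Option<usize> {
    line.char_indices().nth(char_idx).map(|(byte_idx, _)| byte_idx)
}

/// Find the segment containing `byte_idx` and return its half-open byte range.
/// `segments` is any iterator of (byte offset, slice) pairs, the same item
/// shape that `UWordBoundIndices` yields.
fn enclosing_segment<'a, I>(segments: I, byte_idx: usize) -> Option<(usize, usize)>
where
    I: IntoIterator<Item = (usize, &'a str)>,
{
    segments
        .into_iter()
        .map(|(start, s)| (start, start + s.len()))
        .find(|&(start, end)| start <= byte_idx && byte_idx < end)
}

fn main() {
    let line = "héllo wörld";
    // Hand-written stand-in (assumption) for `line.split_word_bound_indices()`.
    let segments = vec![(0, "héllo"), (6, " "), (7, "wörld")];
    // Char index 7 is the 'ö'; its byte offset differs because 'é' is two bytes.
    let byte_idx = char_to_byte_offset(line, 7).unwrap();
    let (start, end) = enclosing_segment(segments, byte_idx).unwrap();
    println!("{}", &line[start..end]);
}
```

This is still a linear scan from the start of the line, so it does not replace a real random-access API; it only makes the char-to-byte conversion and the half-open range check explicit.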

tapeinosyne · 2018-02-17T14:35:27Z

> that critical portion of the docs […] is buried a bit further down in the page (that's just how they render), so I was left to fixate on the as_str() method. Would you mind expanding the doc comment to something like this to make it a little clearer?

Yep, it can be pretty easy to miss things. Trait impls often look a bit lost in the rendered page, and the convention established by the standard library is that the behavior of iterators is documented on their builder method rather than the struct itself. I'll add a comment.

> I would love to have an API directed at random access! I have this somewhat clunky solution for the moment:

I'd be happy to work on it, but before that I wouldn't mind seeing some consolidation between the unicode-rs organization and the recent, seemingly more active unic. That's a conversation that should be started, although not here. @Manishearth, could I maybe ping you on IRC to get a sense of where we stand, or would you rather I opened an issue/forum thread directly?

tapeinosyne mentioned this issue Feb 17, 2018

Docs: clarify behavior of UWordBoundIndices #36

Open
