UWordBoundIndices doesn't expose the indices #35

Open
wez opened this issue Feb 9, 2018 · 4 comments

@wez

wez commented Feb 9, 2018

As far as I can tell, UWordBoundIndices is just a wrapper around UWordBounds with an identical interface.

In my use case I have a line of text and an index into the .chars() of that string from a mouse double click, and I need to obtain the start and end indices of the word that encloses that index.

It seemed to me that UWordBoundIndices is what I'd want here, but I don't see how to use it for this purpose. Is this an oversight, or is there a better way to get the result I'd like?
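
An index into .chars() counts characters, while Rust string slices are addressed by byte offset, so a click position would likely need converting before it can be compared against slice positions. A minimal std-only sketch of that conversion follows; `char_index_to_byte_offset` is a hypothetical helper name chosen for illustration and is not part of unicode-segmentation.

```rust
/// Hypothetical helper (not part of any crate): convert an index into
/// `s.chars()` to the byte offset at which that character starts.
/// Returns `None` if `char_index` is past the last character.
fn char_index_to_byte_offset(s: &str, char_index: usize) -> Option<usize> {
    s.char_indices()
        .nth(char_index)
        .map(|(byte_offset, _ch)| byte_offset)
}
```

For ASCII-only text the two indices coincide, which can hide the mismatch until a multi-byte character appears earlier in the line.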

@wez
Author

wez commented Feb 9, 2018

@tapeinosyne

The UWordBoundIndices iterator definitely yields word indices, and the docs don't appear stale. However, you are right that the current interface isn't suitable for identifying word boundaries from random access.

Graphemes suffered the same issue prior to the introduction of a cursor API in #21, and I suppose that word segmentation could be similarly updated.

@wez
Author

wez commented Feb 12, 2018

The problem I had was that the critical portion of the docs on that page:

type Item = (usize, &'a str)

is buried a bit further down in the page (that's just how they render), so I was left to fixate on the as_str() method. Would you mind expanding the doc comment to something like this to make it a little clearer?

External iterator for word boundaries and byte offsets.
Yields (usize, &str), the byte offset and string slice for each word.

I would love to have an API directed at random access! I have this somewhat clunky solution for the moment:

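  for (x, word) in line.split_word_bound_indices() {
      // x is the byte offset at which this word starts
      if event.x < x {
          break;
      }
      // half-open range: x + word.len() is where the next word starts
      if event.x < x + word.len() {
          // this is the matching word
          return;
      }
  }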
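
The same linear scan can be wrapped in a reusable function that returns the byte range of the enclosing segment. The sketch below is an illustration under stated assumptions, not the crate's API: `segment_range_at` is a hypothetical name, and it is written generically over any iterator of `(usize, &str)` pairs so that `line.split_word_bound_indices()` could be passed in directly. To keep the sketch std-only, `naive_word_bound_indices` is a deliberately crude stand-in that splits where alphanumeric and non-alphanumeric characters meet; it does not implement the Unicode word-boundary rules.

```rust
use std::ops::Range;

/// Hypothetical helper: given `(byte_offset, segment)` pairs in ascending
/// order, return the byte range of the segment containing `byte_offset`.
fn segment_range_at<'a, I>(segments: I, byte_offset: usize) -> Option<Range<usize>>
where
    I: IntoIterator<Item = (usize, &'a str)>,
{
    for (start, segment) in segments {
        if byte_offset < start {
            break; // already past the target; nothing later can match
        }
        let end = start + segment.len();
        if byte_offset < end {
            return Some(start..end); // half-open: `end` starts the next segment
        }
    }
    None
}

/// Std-only stand-in for a real segmenter (an assumption for this sketch):
/// splits wherever alphanumeric and non-alphanumeric characters meet.
fn naive_word_bound_indices(s: &str) -> Vec<(usize, &str)> {
    let mut out = Vec::new();
    let mut start = 0;
    let mut prev_kind: Option<bool> = None;
    for (i, c) in s.char_indices() {
        let kind = c.is_alphanumeric();
        if let Some(prev) = prev_kind {
            if prev != kind {
                out.push((start, &s[start..i]));
                start = i;
            }
        }
        prev_kind = Some(kind);
    }
    if start < s.len() {
        out.push((start, &s[start..]));
    }
    out
}
```

This is still O(n) per lookup, so it does not replace a cursor-style API; it only packages the workaround. The offset passed in must be a byte offset, so a position counted in characters would have to be converted first.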

@tapeinosyne

that critical portion of the docs […] is buried a bit further down in the page (that's just how they render), so I was left to fixate on the as_str() method. Would you mind expanding the doc comment to something like this to make it a little clearer?

Yep, it can be pretty easy to miss things. Trait impls often look a bit lost in the rendered page, and the convention established by the standard library is that the behavior of iterators is documented on their builder method rather than the struct itself. I'll add a comment.

I would love to have an API directed at random access! I have this somewhat clunky solution for the moment:

I'd be happy to work on it, but before that I wouldn't mind seeing some consolidation between the unicode-rs organization and the recent, seemingly more active unic. That's a conversation that should be started, although not here. @Manishearth, could I maybe ping you on IRC to get a sense of where we stand, or would you rather I opened an issue/forum thread directly?
