Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Information gap in suffix array article #1048

Open
bruisedsamurai opened this issue Feb 25, 2023 · 2 comments
Open

Information gap in suffix array article #1048

bruisedsamurai opened this issue Feb 25, 2023 · 2 comments

Comments

@bruisedsamurai
Copy link
Contributor

After we use count sort. It's not clear as why are we comparing strings and what part of strings are we comparing with.

image

Moreover, how $2^k$ value came into the picture too is still a mystery. How are we making sure strings are of length $2^k$?

@jakobkogler
Copy link
Member

jakobkogler commented Feb 26, 2023

We look at the strings in a cyclic form.
So there are substrings of arbitrary lenght here. E.g. if we have the string "abcde" and we take the substring of length 8 starting at index 1 we get "bcdeabcd".

And the goal is, to sort all shifts, which is equivalent to sorting the cyclic substrings of length $2^k$ with $2^k >= n$, e.g. the sorted order of all substrings of length 8 for an array of length 5.

I hope that clears everything up. Tell us, if we should improve the article more. But I think it already tell all those ideas that I wrote.

@bruisedsamurai
Copy link
Contributor Author

@jakobkogler That explains some of it. I assume we take $2^k$ so that later we can compare $2^{k-1}$ substrings.

I think a concrete example with comparisons of two strings (that includes sentinel '$')will definitely help here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants