[stdlib] Fix out of bounds access in `List.index()` #2745

gabrieldemarmiesse · 2024-05-18T20:49:40Z

Related to #2687

There were multiple bugs related to clipping in `List.index()`.

Long story short, `list.index()` in Python behaves as follows: given a `start` and `end`, Python looks for the element in `my_list[start:end]` and reports the result, adding `start` so that the index is relative to the original list.

You can take a look at the description of the index method here: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists
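As a small illustration of the behavior described above, here is a sketch in plain Python (not the Mojo implementation). The list contents and variable names are made up for the example.

```python
# Illustrative data only; not taken from the PR's test suite.
my_list = ["a", "b", "c", "b", "d"]
start, end = 2, 5

# Bounded search: only my_list[start:end] is inspected.
bounded = my_list.index("b", start, end)

# Same result computed by hand: search the slice, then add `start`
# so the index refers to the original list.
via_slice = start + my_list[start:end].index("b")

print(bounded, via_slice)  # 3 3

# An `end` far past the list is tolerated, exactly like slicing.
past_the_end = my_list.index("d", 0, 100)
print(past_the_end)  # 4
```

Note that the first `"b"` (index 1) is skipped because it lies before `start`.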

Since slicing semantics apply to `start` and `end`, we should do three things:

1) default to the start and the end of the list
2) normalize negative values by doing `+ len(my_list)`
3) clip both start and end between `0` and `len(my_list)`
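The three steps above can be sketched in Python as follows. `normalize_bounds` is a hypothetical helper written for this illustration; it is not the function used in the Mojo standard library. The loop at the end compares it against Python's built-in `slice.indices()`, which applies the same slicing rules.

```python
def normalize_bounds(size, start=None, stop=None):
    """Hypothetical helper: turn (start, stop) into in-range bounds."""
    # Step 1: default to the start and the end of the list.
    if start is None:
        start = 0
    if stop is None:
        stop = size
    # Step 2: normalize negative values by adding the length.
    if start < 0:
        start += size
    if stop < 0:
        stop += size
    # Step 3: clip both bounds between 0 and the length.
    start = max(0, min(start, size))
    stop = max(0, min(stop, size))
    return start, stop


# Cross-check against Python's own slice arithmetic for small inputs.
for size in range(0, 5):
    for start in range(-8, 9):
        for stop in range(-8, 9):
            expected = slice(start, stop).indices(size)[:2]
            assert normalize_bounds(size, start, stop) == expected

print(normalize_bounds(6, 5, 50))   # (5, 6)
print(normalize_bounds(6, -2))      # (4, 6)
```

Clipping is applied unconditionally in step 3, regardless of the sign of the original argument.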

The last step wasn't done correctly, especially for the `stop` argument, where clipping was applied only when the value was zero or negative (this caused the out-of-bounds bug in the tests).
Effectively,

return end if end > 0 else min(end + size, size)

is equivalent to

return end if end > 0 else end + size

since the `min` is applied only when `end <= 0`, in which case `end + size <= size` and the `min` changes nothing.
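The equivalence claimed above can be checked by brute force. The sketch below wraps the two expressions in hypothetical Python functions (the names are invented here) and compares them over a range of inputs.

```python
def stop_with_min(end, size):
    # The expression as it appeared before the fix.
    return end if end > 0 else min(end + size, size)


def stop_without_min(end, size):
    # The same expression with the `min` removed.
    return end if end > 0 else end + size


# The `min` only runs when end <= 0, where end + size <= size always holds,
# so the two functions agree everywhere.
mismatches = [
    (end, size)
    for size in range(0, 20)
    for end in range(-50, 51)
    if stop_with_min(end, size) != stop_without_min(end, size)
]
print(mismatches)  # []
```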

This test can cause some flakiness in our CI: `test_list_a.index(10, start=5, stop=50)` for a list of size 6.
The stop was positive, so it was never clipped; too many values (out of bounds) were therefore tried, and some of them occasionally matched.
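To make the failing case concrete, the sketch below applies the old `stop` rule and a clipped rule to `stop=50` on a list of size 6. Both functions are hypothetical Python stand-ins written for this example, not the Mojo source.

```python
def old_stop(end, size):
    # Old rule: positive values pass through untouched.
    return end if end > 0 else min(end + size, size)


def clipped_stop(end, size):
    # Rule with unconditional clipping between 0 and size.
    if end < 0:
        end += size
    return max(0, min(end, size))


size, start = 6, 5

print(old_stop(50, size))      # 50
print(clipped_stop(50, size))  # 6

# Indices the old rule would visit that lie outside the list.
out_of_bounds = [i for i in range(start, old_stop(50, size)) if i >= size]
print(len(out_of_bounds))      # 44
```

With the clipped rule, only index 5 is visited.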

Another bug was that, because of this condition, `end = 0` meant "check until the end of the list", while in Python, with slicing semantics, `end = 0` means "do nothing" (empty slice).
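The `end = 0` difference can be seen directly in Python. In the sketch below, `old_stop` is a hypothetical Python rendering of the old rule and the list contents are arbitrary.

```python
def old_stop(end, size):
    # Old rule: end = 0 falls into the `else` branch and becomes `size`.
    return end if end > 0 else min(end + size, size)


my_list = [1, 2, 3]

# Old rule: end = 0 is turned into the full length of the list.
print(old_stop(0, len(my_list)))  # 3

# Python: end = 0 gives an empty slice, so nothing can be found.
print(my_list[0:0])               # []

try:
    my_list.index(1, 0, 0)
    raised = False
except ValueError:
    raised = True
print(raised)                     # True
```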

TL;DR

Multiple clipping bugs. Slicing semantics are now applied to `start` and `stop`, as in Python. No more flakiness in our CI. No more out-of-bounds access. Corresponding tests added.

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>

JoeLoser

Nice! I really appreciate the detailed commit message and explanation of the problem(s) at hand. It made this review easy. Thank you for that and fixing the bugs!

JoeLoser · 2024-05-20T21:57:41Z

!sync

modularbot · 2024-05-20T22:58:20Z

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions.

[External] [stdlib] Fix out of bounds access in `List.index()`

Related to #2687

There were multiple bugs related to clipping there.

Long story short, the behavior of `list.index()` in python is this one: given a `start` and `end`, python will look for the element in `my_list[start:end]` and report the result, (`start` is added to the result to give the index with respect to the original list).

You can take a look at the description of the `index` method here: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists

Since there is slicing semantics applied to `start` and `end`, we should do multiple things:

1) default to the start and the end of the list
2) normalize negative values by doing `+ len(my_list)`
3) clip both start and end between `0` and `len(my_list)`

The last step wasn't done correctly. Especially for the `stop` argument where the clipping was applied only when negative values were found (this caused the out of bounds bug in the tests).
Effectively

```mojo
return end if end > 0 else min(end + size, size)
```

is equivalent to

```mojo
return end if end > 0 else end + size
```

since the min is applied only when `end <= 0`. So `end + size <= size`

This test can cause some flakyness in our CI: `test_list_a.index(10, start=5, stop=50)` for a list of size 6.
The stop was positive, so it was never clipped, thus too many values (out of bounds) were tried, causing some to match, sometimes.

Another bug was that because of this condition, `end = 0` means "check until the end of the list" while in python, with the slicing semantics, `end = 0` means "do nothing" (empty slice).

### TL;DR

Multiple clipping bugs. Slicing semantics applied to `start` and `stop` now like in Python. No more flakyness in our CI. No more out of bounds access. Corresponding tests added.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2745
MODULAR_ORIG_COMMIT_REV_ID: 7bad2830dc41d96ae383fc4a8eac9ca3a69581de

modularbot · 2024-05-21T20:07:49Z

Landed in 387bc03! Thank you for your contribution 🎉


gabrieldemarmiesse added 3 commits May 18, 2024 20:49

[stdlib] Fix out of bounds access in List.index()

6eeae95

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>

Add more tests

f284c5d

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>

Add more tests

d029edf

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>

gabrieldemarmiesse marked this pull request as ready for review May 18, 2024 21:16

gabrieldemarmiesse requested a review from a team as a code owner May 18, 2024 21:16

gabrieldemarmiesse mentioned this pull request May 19, 2024

[BUG] [stdlib] [collections] [list] List test sometimes fails sometimes doesn't in nightly 2024.5.1805 #2738

Closed

JoeLoser self-assigned this May 20, 2024

JoeLoser approved these changes May 20, 2024

modularbot added the imported-internally Signals that a given pull request has been imported internally. label May 20, 2024

modularbot added merged-internally Indicates that this pull request has been merged internally merged-externally Merged externally in public mojo repo labels May 20, 2024

modularbot closed this May 21, 2024
