Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Fix out of bounds access in List.index() #2745

Conversation

gabrieldemarmiesse
Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse commented May 18, 2024

Related to #2687

There were multiple bugs related to clipping there.

Long story short, the behavior of list.index() in python is this one: given a start and end, python will look for the element in my_list[start:end] and report the result, (start is added to the result to give the index with respect to the original list).

You can take a look at the description of the index method here: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists

Since there is slicing semantics applied to start and end, we should do multiple things:

  1. default to the start and the end of the list
  2. normalize negative values by doing + len(my_list)
  3. clip both start and end between 0 and len(my_list)

The last step wasn't done correctly. Especially for the stop argument where the clipping was applied only when negative values were found (this caused the out of bounds bug in the tests).
Effectively

return end if end > 0 else min(end + size, size)

is equivalent to

return end if end > 0 else end + size

since the min is applied only when end <= 0. So end + size <= size

This test can cause some flakyness in our CI: test_list_a.index(10, start=5, stop=50) for a list of size 6.
The stop was positive, so it was never clipped, thus too many values (out of bounds) were tried, causing some to match, sometimes.

Another bug was that because of this condition, end = 0 means "check until the end of the list" while in python, with the slicing semantics, end = 0 means "do nothing" (empty slice).

TL;DR

Multiple clipping bugs. Slicing semantics applied to start and stop now like in Python. No more flakyness in our CI. No more out of bounds access. Corresponding tests added.

Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Signed-off-by: gabrieldemarmiesse <gabrieldemarmiesse@gmail.com>
Copy link
Collaborator

@JoeLoser JoeLoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I really appreciate the detailed commit message and explanation of the problem(s) at hand. It made this review easy. Thank you for that and fixing the bugs!

@JoeLoser
Copy link
Collaborator

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label May 20, 2024
@modularbot
Copy link
Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

@modularbot modularbot added merged-internally Indicates that this pull request has been merged internally merged-externally Merged externally in public mojo repo labels May 20, 2024
modularbot pushed a commit that referenced this pull request May 21, 2024
[External] [stdlib] Fix out of bounds access in `List.index()`

Related to #2687

There were multiple bugs related to clipping there.

Long story short, the behavior of `list.index()` in python is this one:
given a `start` and `end`, python will look for the element in
`my_list[start:end]` and report the result, (`start` is added to the
result to give the index with respect to the original list).

You can take a look at the description of the `index` method here:
https://docs.python.org/3/tutorial/datastructures.html#more-on-lists

Since there is slicing semantics applied to `start` and `end`, we should
do multiple things:
1) default to the start and the end of the list
2) normalize negative values by doing `+ len(my_list)`
3) clip both start and end between `0` and `len(my_list)`

The last step wasn't done correctly. Especially for the `stop` argument
where the clipping was applied only when negative values were found
(this caused the out of bounds bug in the tests).
Effectively

```mojo
return end if end > 0 else min(end + size, size)
```
 is equivalent to
```mojo
return end if end > 0 else end + size
```
since the min is applied only when `end <= 0`. So `end + size <= size`

This test can cause some flakyness in our CI: `test_list_a.index(10,
start=5, stop=50)` for a list of size 6.
The stop was positive, so it was never clipped, thus too many values
(out of bounds) were tried, causing some to match, sometimes.

Another bug was that because of this condition, `end = 0` means "check
until the end of the list" while in python, with the slicing semantics,
`end = 0` means "do nothing" (empty slice).

### TL;DR
Multiple clipping bugs. Slicing semantics applied to `start` and `stop`
now like in Python. No more flakyness in our CI. No more out of bounds
access. Corresponding tests added.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2745
MODULAR_ORIG_COMMIT_REV_ID: 7bad2830dc41d96ae383fc4a8eac9ca3a69581de
@modularbot
Copy link
Collaborator

Landed in 387bc03! Thank you for your contribution 🎉

@modularbot modularbot closed this May 21, 2024
martinvuyk pushed a commit to martinvuyk/mojo that referenced this pull request May 24, 2024
[External] [stdlib] Fix out of bounds access in `List.index()`

Related to modularml#2687

There were multiple bugs related to clipping there.

Long story short, the behavior of `list.index()` in python is this one:
given a `start` and `end`, python will look for the element in
`my_list[start:end]` and report the result, (`start` is added to the
result to give the index with respect to the original list).

You can take a look at the description of the `index` method here:
https://docs.python.org/3/tutorial/datastructures.html#more-on-lists

Since there is slicing semantics applied to `start` and `end`, we should
do multiple things:
1) default to the start and the end of the list
2) normalize negative values by doing `+ len(my_list)`
3) clip both start and end between `0` and `len(my_list)`

The last step wasn't done correctly. Especially for the `stop` argument
where the clipping was applied only when negative values were found
(this caused the out of bounds bug in the tests).
Effectively

```mojo
return end if end > 0 else min(end + size, size)
```
 is equivalent to
```mojo
return end if end > 0 else end + size
```
since the min is applied only when `end <= 0`. So `end + size <= size`

This test can cause some flakyness in our CI: `test_list_a.index(10,
start=5, stop=50)` for a list of size 6.
The stop was positive, so it was never clipped, thus too many values
(out of bounds) were tried, causing some to match, sometimes.

Another bug was that because of this condition, `end = 0` means "check
until the end of the list" while in python, with the slicing semantics,
`end = 0` means "do nothing" (empty slice).

### TL;DR
Multiple clipping bugs. Slicing semantics applied to `start` and `stop`
now like in Python. No more flakyness in our CI. No more out of bounds
access. Corresponding tests added.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes modularml#2745
MODULAR_ORIG_COMMIT_REV_ID: 7bad2830dc41d96ae383fc4a8eac9ca3a69581de
modularbot pushed a commit that referenced this pull request Jun 7, 2024
[External] [stdlib] Fix out of bounds access in `List.index()`

Related to #2687

There were multiple bugs related to clipping there.

Long story short, the behavior of `list.index()` in python is this one:
given a `start` and `end`, python will look for the element in
`my_list[start:end]` and report the result, (`start` is added to the
result to give the index with respect to the original list).

You can take a look at the description of the `index` method here:
https://docs.python.org/3/tutorial/datastructures.html#more-on-lists

Since there is slicing semantics applied to `start` and `end`, we should
do multiple things:
1) default to the start and the end of the list
2) normalize negative values by doing `+ len(my_list)`
3) clip both start and end between `0` and `len(my_list)`

The last step wasn't done correctly. Especially for the `stop` argument
where the clipping was applied only when negative values were found
(this caused the out of bounds bug in the tests).
Effectively

```mojo
return end if end > 0 else min(end + size, size)
```
 is equivalent to
```mojo
return end if end > 0 else end + size
```
since the min is applied only when `end <= 0`. So `end + size <= size`

This test can cause some flakyness in our CI: `test_list_a.index(10,
start=5, stop=50)` for a list of size 6.
The stop was positive, so it was never clipped, thus too many values
(out of bounds) were tried, causing some to match, sometimes.

Another bug was that because of this condition, `end = 0` means "check
until the end of the list" while in python, with the slicing semantics,
`end = 0` means "do nothing" (empty slice).

### TL;DR
Multiple clipping bugs. Slicing semantics applied to `start` and `stop`
now like in Python. No more flakyness in our CI. No more out of bounds
access. Corresponding tests added.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2745
MODULAR_ORIG_COMMIT_REV_ID: 7bad2830dc41d96ae383fc4a8eac9ca3a69581de
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally. merged-externally Merged externally in public mojo repo merged-internally Indicates that this pull request has been merged internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants