Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in has_duplicates.py #102

Open
bigabdoul opened this issue Oct 16, 2022 · 1 comment
Open

Bug in has_duplicates.py #102

bigabdoul opened this issue Oct 16, 2022 · 1 comment

Comments

@bigabdoul
Copy link

The two implementations of has_duplicates (https://github.com/AllenDowney/ThinkPython2/blob/master/code/has_duplicates.py) contain a bug when called with specific arguments. Since dictionary keys must be immutable (hashable types), if we pass a nested list as an argument, both functions will crash:

t = [1, 2, 3, [4, 5]]
print(has_duplicates(t))

The same is true for a set. To avoid this, you should convert the key to an immutable type (i.e., a string).

def has_duplicates(t):
    """Checks whether any element appears more than once in a sequence.
    Simple version using a for loop.
    t: sequence
    """
    d = {}
    for x in t:
        if str(x) in d:
            return True
        d[str(x)] = True
    return False

I don't even know if there is a fix for the second version has_duplicates2.

@AllenDowney
Copy link
Owner

Good point. As you said, this function will only work if the items in the sequence are hashable. Converting them to strings is not a bad idea, but it introduces the unexpected behavior that the integer 1 and the string '1' would be considered duplicates. So it might be best to change the specification of the function to indicate that it is only expected to work if the items are hashable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants