Weird failure on unicode/windows or bytes/linux build #165

Open
pombredanne opened this issue Mar 7, 2022 · 3 comments
@pombredanne
Collaborator

The Windows tests with a unicode build and the Linux tests with a non-unicode (bytes) build are both failing this test:

On bytes/linux:

_____________________________________________________ TestTrieIterators.test_items _____________________________________________________

self = <test_unit.TestTrieIterators testMethod=test_items>

    def test_items(self):
        A = self.A
        I = []
        for i, w in enumerate(self.words):
            A.add_word(conv(w), i + 1)
            I.append((conv(w), i + 1))
    
        L = [x for x in A.items()]
        self.assertEqual(len(L), len(I))
>       self.assertEqual(set(L), set(I))
E       AssertionError: Items in the first set but not the second:
E       (b'a\x00h', 3)
E       (b'p\x00y\x00t\x00', 2)
E       (b'c\x00o\x00r\x00a\x00', 4)
E       (b'w\x00o\x00', 1)
E       Items in the second set but not the first:
E       (b'word', 1)
E       (b'python', 2)
E       (b'aho', 3)
E       (b'corasick', 4)

tests/test_unit.py:431: AssertionError

On unicode/windows:

 ________________________ TestTrieIterators.test_items _________________________
  
  self = <test_unit.TestTrieIterators testMethod=test_items>
  
      def test_items(self):
          A = self.A
          I = []
          for i, w in enumerate(self.words):
              A.add_word(conv(w), i + 1)
              I.append((conv(w), i + 1))
      
          L = [x for x in A.items()]
          self.assertEqual(len(L), len(I))
  >       self.assertEqual(set(L), set(I))
  E       AssertionError: Items in the first set but not the second:
  E       ('w\x00o\x00', 1)
  E       ('p\x00y\x00t\x00', 2)
  E       ('a\x00h', 3)
  E       ('c\x00o\x00r\x00a\x00', 4)
  E       Items in the second set but not the first:
  E       ('corasick', 4)
  E       ('python', 2)
  E       ('aho', 3)
  E       ('word', 1)
  
  D:\a\pyahocorasick\pyahocorasick\tests\test_unit.py:422: AssertionError

I wonder if this is because of narrow vs. wide Python unicode builds on Windows?

@pombredanne
Collaborator Author

It feels as if a null was being injected after each letter and as if Windows was built with bytes and not the unicode define.
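
To illustrate the pattern, here is a minimal sketch (an observation about the failing output, not a confirmed diagnosis). It assumes the key is held in a 2-bytes-per-character buffer (UTF-16-LE here) but only `len(word)` bytes of it are read back. The helper name `truncated_utf16` is hypothetical and is not part of pyahocorasick.

```python
def truncated_utf16(word: str) -> bytes:
    """Encode as UTF-16-LE, then keep only as many BYTES as there are characters.

    Assumption: this mimics reading a 2-bytes-per-character buffer while
    using the character count as the byte length.
    """
    return word.encode("utf-16-le")[: len(word)]


words = ["word", "python", "aho", "corasick"]
observed = [truncated_utf16(w) for w in words]

for w, got in zip(words, observed):
    print(w, "->", got)
```

The values produced this way match the keys reported in the bytes/linux failure above, and the unicode/windows failure shows the same sequences as `str` values.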

@WojciechMula
Owner

> It feels as if a null was being injected after each letter and as if Windows was built with bytes and not the unicode define.

True, it looks as you described. I have no Windows machine to check this.

@pombredanne
Collaborator Author

pombredanne commented Mar 8, 2022

> True, it looks as you described. I have no windows machine to check this.

No worries! I am looking into this with tests ... and will push some investigation in my WIP branch for 2.0.
The issue is also on Linux, FWIW.

pombredanne added a commit that referenced this issue Mar 8, 2022
The behaviour with the bytes build has some issues. This helps with testing it.

Reference: #65
Reference: #165
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Mar 8, 2022
The environment variables AHOCORASICK_UNICODE and AHOCORASICK_BYTES now
drive the flavor of the build if defined (using any value).

Reference: #65
Reference: #165
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
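
For readers following along, a rough sketch of what "drive the flavor of the build if defined (using any value)" could look like in a build script. This is not the project's actual `setup.py`: the function names, the `default` argument, the error raised when both variables are set, and the use of `AHOCORASICK_UNICODE` as the C macro name are all assumptions made for illustration.

```python
import os


def select_build_flavor(environ, default="unicode"):
    """Pick "unicode" or "bytes" from the environment.

    A variable counts as set if it is present at all, whatever its value.
    Assumption: setting both at once is treated as an error here.
    """
    want_unicode = "AHOCORASICK_UNICODE" in environ
    want_bytes = "AHOCORASICK_BYTES" in environ
    if want_unicode and want_bytes:
        raise ValueError("set only one of AHOCORASICK_UNICODE / AHOCORASICK_BYTES")
    if want_unicode:
        return "unicode"
    if want_bytes:
        return "bytes"
    return default


def define_macros(flavor):
    """Return a define_macros-style list for a C extension.

    Assumption: the unicode build is enabled by a macro named
    AHOCORASICK_UNICODE, and the bytes build by its absence.
    """
    return [("AHOCORASICK_UNICODE", "")] if flavor == "unicode" else []


if __name__ == "__main__":
    flavor = select_build_flavor(os.environ)
    print(flavor, define_macros(flavor))
```

Passing the environment mapping in explicitly, rather than reading `os.environ` inside the function, keeps the selection logic easy to test for both flavors on any OS.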
pombredanne added a commit that referenced this issue Mar 8, 2022
Use environment variables AHOCORASICK_UNICODE and AHOCORASICK_BYTES
to test both builds on all supported OSes.

Reference: #65
Reference: #165
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne pombredanne added this to the v3.0 milestone Jan 14, 2023