PaddedString decoding error handling #1073

Open
pompushko opened this issue Mar 8, 2024 · 8 comments

Comments

@pompushko

Hello

If my code contains something like PaddedString(11, "ascii"), is it possible to get error-handling behavior like bytes.decode('ascii', errors='replace')?

Thank you.

@pompushko
Author

Is there no solution? :(

@jpsnyder

That's not supported, since it looks like it just calls decode/encode with only the encoding:
https://github.com/construct/construct/blob/master/construct/core.py#L1719
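
For reference, here is a small standard-library sketch of the difference between the default handler and the one asked about. It does not use construct at all; the sample bytes are the same ones used elsewhere in this thread.

```python
data = b"he\xfflo"  # 0xFF is not a valid ASCII byte

# Default behaviour (errors="strict"): an undecodable byte raises.
try:
    data.decode("ascii")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# errors="replace": each undecodable byte becomes U+FFFD (the replacement character).
decoded_replace = data.decode("ascii", errors="replace")

# errors="ignore": undecodable bytes are silently dropped.
decoded_ignore = data.decode("ascii", errors="ignore")

print(strict_decode_failed, repr(decoded_replace), repr(decoded_ignore))
```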

A solution would be to update construct to support it, or to make your own custom adapter.
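
As a rough idea of what the decode/encode logic of such a custom adapter would need to do, here is a sketch written as plain functions so that it runs without construct. The names decode_padded and encode_padded are made up for illustration, and it assumes single-byte null padding (multi-byte encodings such as UTF-16 would need a wider padding unit). Wiring this into an Adapter subclass is left out.

```python
def decode_padded(raw: bytes, length: int, encoding: str, errors: str = "strict") -> str:
    """Decode a fixed-size, null-padded field, forwarding `errors` to bytes.decode."""
    if len(raw) != length:
        raise ValueError(f"expected {length} bytes, got {len(raw)}")
    # Strip trailing null padding before decoding.
    return raw.rstrip(b"\x00").decode(encoding, errors=errors)


def encode_padded(text: str, length: int, encoding: str, errors: str = "strict") -> bytes:
    """Encode text and pad it with null bytes up to `length`."""
    data = text.encode(encoding, errors=errors)
    if len(data) > length:
        raise ValueError(f"encoded text is {len(data)} bytes, field holds {length}")
    return data.ljust(length, b"\x00")


print(repr(decode_padded(b"he\xfflo", 5, "ascii", errors="replace")))
print(encode_padded("hi", 5, "ascii"))
```

Keeping errors as a keyword with a "strict" default means existing callers see no change in behaviour.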

Alternatively, if you are okay with a sneaky hack, you could temporarily replace the default "strict" error handler using codecs.register_error: https://docs.python.org/3/library/codecs.html#codecs.register_error

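import codecs

from construct import PaddedString

spec = PaddedString(5, "ascii")

# The error-handler registry is shared by the whole interpreter, so every
# "strict" encode/decode is affected until the original handler is restored.
codecs.register_error("strict", codecs.replace_errors)   # make "strict" use the "replace" handler
try:
    my_str = spec.parse(b"he\xfflo")
finally:
    codecs.register_error("strict", codecs.strict_errors)   # restore the original handler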
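
If you go this route, the swap and restore could be wrapped in a context manager so the restore step cannot be forgotten. This is only a sketch: temporary_strict_handler is a made-up name, and it relies on the same assumption as the snippet above, namely that the decoder looks up the registered "strict" handler. The example only checks that the registry entry is swapped and then restored.

```python
import codecs
from contextlib import contextmanager


@contextmanager
def temporary_strict_handler(handler):
    """Temporarily register `handler` under the name "strict" (hypothetical helper)."""
    previous = codecs.lookup_error("strict")
    codecs.register_error("strict", handler)
    try:
        yield
    finally:
        # Always put the previous handler back, even if the body raised.
        codecs.register_error("strict", previous)


original = codecs.lookup_error("strict")
with temporary_strict_handler(codecs.replace_errors):
    # Code that should decode leniently (for example spec.parse(...)) would go here.
    swapped = codecs.lookup_error("strict")
restored = codecs.lookup_error("strict")

print(swapped is codecs.replace_errors, restored is original)
```

The registry is not scoped to a thread, so other code running at the same time would see the swapped handler too.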

@pompushko
Author

pompushko commented Mar 14, 2024

Sorry, I'm bad with git... and bad at Python :D
I hope it will all work :D

diff --git a/construct/core.py b/construct/core.py
index f56ca16..687d301 100644
--- a/construct/core.py
+++ b/construct/core.py
@@ -1708,15 +1708,16 @@ def encodingunit(encoding):
 class StringEncoded(Adapter):
     """Used internally."""

-    def __init__(self, subcon, encoding):
+    def __init__(self, subcon, encoding, errors):
         super().__init__(subcon)
         if not encoding:
             raise StringError("String* classes require explicit encoding")
         self.encoding = encoding
+        self.errors = errors

     def _decode(self, obj, context, path):
         try:
-            return obj.decode(self.encoding)
+            return obj.decode(self.encoding, errors=self.errors)
         except:
             raise StringError(f"cannot use encoding {self.encoding!r} to decode {obj!r}")

@@ -1741,7 +1742,7 @@ class StringEncoded(Adapter):
         # return f"({self.subcon._compilebuild(code)}).encode({repr(self.encoding)})"


-def PaddedString(length, encoding):
+def PaddedString(length, encoding, errors='strict'):
     r"""
     Configurable, fixed-length or variable-length string field.

@@ -1765,14 +1766,14 @@ def PaddedString(length, encoding):
         >>> d.parse(_)
         u'Афон'
     """
-    macro = StringEncoded(FixedSized(length, NullStripped(GreedyBytes, pad=encodingunit(encoding))), encoding)
+    macro = StringEncoded(FixedSized(length, NullStripped(GreedyBytes, pad=encodingunit(encoding))), encoding, errors)
     def _emitfulltype(ksy, bitwise):
-        return dict(size=length, type="strz", encoding=encoding)
+        return dict(size=length, type="strz", encoding=encoding, errors=errors)
     macro._emitfulltype = _emitfulltype
     return macro


-def PascalString(lengthfield, encoding):
+def PascalString(lengthfield, encoding, errors='strict'):
     r"""
     Length-prefixed string. The length field can be variable length (such as VarInt) or fixed length (such as Int64ub). :class:`~construct.core.VarInt` is recommended when designing new protocols. Stored length is in bytes, not characters. Size is not defined.

@@ -1789,7 +1790,7 @@ def PascalString(lengthfield, encoding):
         >>> d.parse(_)
         u'Афон'
     """
-    macro = StringEncoded(Prefixed(lengthfield, GreedyBytes), encoding)
+    macro = StringEncoded(Prefixed(lengthfield, GreedyBytes), encoding, errors)

     def _emitparse(code):
         return f"io.read({lengthfield._compileparse(code)}).decode({repr(encoding)})"
@@ -1798,14 +1799,14 @@ def PascalString(lengthfield, encoding):
     def _emitseq(ksy, bitwise):
         return [
             dict(id="lengthfield", type=lengthfield._compileprimitivetype(ksy, bitwise)),
-            dict(id="data", size="lengthfield", type="str", encoding=encoding),
+            dict(id="data", size="lengthfield", type="str", encoding=encoding, errors=errors),
         ]
     macro._emitseq = _emitseq

     return macro


-def CString(encoding):
+def CString(encoding, errors='strict'):
     r"""
     String ending in a terminating null byte (or null bytes in case of UTF16 UTF32).

@@ -1824,14 +1825,14 @@ def CString(encoding):
         >>> d.parse(_)
         u'Афон'
     """
-    macro = StringEncoded(NullTerminated(GreedyBytes, term=encodingunit(encoding)), encoding)
+    macro = StringEncoded(NullTerminated(GreedyBytes, term=encodingunit(encoding)), encoding, errors)
     def _emitfulltype(ksy, bitwise):
-        return dict(type="strz", encoding=encoding)
+        return dict(type="strz", encoding=encoding, errors=errors)
     macro._emitfulltype = _emitfulltype
     return macro


-def GreedyString(encoding):
+def GreedyString(encoding, errors='strict'):
     r"""
     String that reads entire stream until EOF, and writes a given string as-is. Analog to :class:`~construct.core.GreedyBytes` but also applies unicode-to-bytes encoding.

@@ -1848,9 +1849,9 @@ def GreedyString(encoding):
         >>> d.parse(_)
         u'Афон'
     """
-    macro = StringEncoded(GreedyBytes, encoding)
+    macro = StringEncoded(GreedyBytes, encoding, errors)
     def _emitfulltype(ksy, bitwise):
-        return dict(size_eos=True, type="str", encoding=encoding)
+        return dict(size_eos=True, type="str", encoding=encoding, errors=errors)
     macro._emitfulltype = _emitfulltype
     return macro

@jpsnyder

Seems good. I would make a PR. A few notes from a first look:

  • The errors parameter in StringEncoded should be a keyword argument with a default, as you did for the other ones.
  • I would also apply the errors argument in the encode direction.
  • I would add some tests.
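
On the encode direction, here is a quick standard-library illustration of what forwarding errors to str.encode would change. It is not construct code, just the underlying Python behaviour, using the string from the docstrings in the patch.

```python
text = "Афон"  # Cyrillic, cannot be represented in ASCII

# Default behaviour (errors="strict"): an unencodable character raises.
try:
    text.encode("ascii")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# errors="replace": each unencodable character becomes "?".
encoded_replace = text.encode("ascii", errors="replace")

print(strict_encode_failed, encoded_replace)
```

Assertions like these, run against the string classes themselves, would be a reasonable starting point for the tests.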

@pompushko
Author

Sorry, my knowledge of Python is bad ;(

@pompushko
Author

@jpsnyder do you still need my updated patch? Thank you.

@jpsnyder

Sorry, I'm not the maintainer of this project. I'm just a lurker.

I would make the PR yourself; it's the best way to learn how to do it :)

@pompushko
Author

> Sorry, I'm not the maintainer of this project. I'm just a lurker.
>
> I would make the PR yourself; it's the best way to learn how to do it :)

Ohhh :D
That's hard for me :(
