Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rmp_serde::Raw considered harmful #305

Open
Lucretiel opened this issue Apr 14, 2022 · 2 comments
Open

rmp_serde::Raw considered harmful #305

Lucretiel opened this issue Apr 14, 2022 · 2 comments

Comments

@Lucretiel
Copy link
Contributor

Lucretiel commented Apr 14, 2022

The unsafe behavior of Raw is trivially shown to be unsound. str is required to be valid UTF-8, and its methods assume that it contains valid UTF-8 for the purpose of, for example, scanning for code points. I'm able to produce UB by creating a simple Serializer that serializes strings as an array of chars, and then feeding it malformed UTF-8 data:

/// Serializer that serializes strings as a list of char
struct CharListSerializer {
    output: Vec<char>,
}

impl ser::Serializer for &mut CharListSerializer {    
    fn serialize_str(self, s: &str) -> Result<Self::Ok, Self::Error> {
        self.output.extend(s.chars());
        Ok(())
    }

    // Other methods simply panic
}

fn main() {
    let mut ser = CharListSerializer { output: Vec::new() };
    let data = "ABC";
    data.serialize(&mut ser).unwrap();
    assert_eq!(ser.output, ['A', 'B', 'C']);
    ser.output.clear();

    // An emoji: 😣
    let data = [0xf0, 0x9f, 0x98, 0xa3];
    let data = Raw::from_utf8(data.as_slice().to_owned());
    data.serialize(&mut ser).unwrap();
    assert_eq!(ser.output, ['😣']);
    ser.output.clear();

    // Uh oh. 0xf0 signals a 4-byte UTF-8 code point,
    // but there are only two bytes in this buffer.
    let data = [0xf0, 0x9f];
    let data = Raw::from_utf8(data.as_slice().to_owned());
    data.serialize(&mut ser).unwrap();
}

When I run this normally, I get a crash:

error: process didn't exit successfully: `target\debug\rust-playground.exe` (exit code: 0xc000001d, STATUS_ILLEGAL_INSTRUCTION)

And when I run with cargo miri run, it immediately detects the undefined behavior:

error: Undefined Behavior: entering unreachable code
   --> C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\hint.rs:51:14
    |
51  |     unsafe { intrinsics::unreachable() }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^ entering unreachable code
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information

    = note: inside `std::hint::unreachable_unchecked` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\hint.rs:51:14
    = note: inside `std::option::Option::<&u8>::unwrap_unchecked` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\option.rs:877:30
    = note: inside `core::str::validations::next_code_point::<std::slice::Iter<u8>>` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\str\validations.rs:56:27
    = note: inside `<std::str::Chars as std::iter::Iterator>::next` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\str\iter.rs:44:18
    = note: inside `std::vec::Vec::<char>::extend_desugared::<std::str::Chars>` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\vec\mod.rs:2696:35
    = note: inside `<std::vec::Vec<char> as std::vec::spec_extend::SpecExtend<char, std::str::Chars>>::spec_extend` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\vec\spec_extend.rs:18:9
    = note: inside `<std::vec::Vec<char> as std::iter::Extend<char>>::extend::<std::str::Chars>` at C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\alloc\src\vec\mod.rs:2670:9
note: inside `<&mut CharListSerializer as serde::Serializer>::serialize_str` at src\main.rs:86:9
   --> src\main.rs:86:9
    |
86  |         self.output.extend(s.chars());
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: inside `<rmp_serde::Raw as serde::Serialize>::serialize::<&mut CharListSerializer>` at C:\Users\Lucre\.cargo\registry\src\github.com-1ecc6299db9ec823\rmp-serde-1.0.0\src\lib.rs:190:9
note: inside `main` at src\main.rs:211:5
   --> src\main.rs:211:5
    |
211 |     data.serialize(&mut ser).unwrap();
    |     ^^^^^^^^^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error

error: process didn't exit successfully: `C:\Users\Lucre\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\bin\cargo-miri.exe target\miri\x86_64-pc-windows-msvc\debug\rust-playground.exe` (exit code: 1)

This type should be either:

  • Removed,
  • Redesigned to use serialize_bytes
  • Marked as unsafe and changed to require valid utf-8.

To be frank, it's not at all clear to me why this type exists in the first place? It isn't used by RMP anywhere, and I don't really understand why a user would ever be allowed to call serialize_str on non-UTF8 data, when the entire point of that method is that valid UTF-8 can be assumed. It appears to exist for no other purpose than to create this kind of soundness hole.

My full reproduction code is available as a gist.

@kornelski
Copy link
Collaborator

Thanks for the report. It is indeed unsound.

I think this functionality could be implemented without the invalid transmute. Serde doesn't support specialization directly, but it is possible to hack it by abusing newtype serialization with a custom name. This is hack is already used for ExtSerializer injecting _ExtStruct((_,_)). So similarly, the library could tell serde to serialize invalid UTF-8 string as _InvalidStringStruct(&[u8]) and have msgpack serializer react to this appropriately.

I don't have time to implement this workaround, and I also don't want to break existing users even if they rely on the bad implementation, so for now I'll deprecate the Raw/RawRef newtypes.

@photino
Copy link

photino commented Oct 11, 2023

Any plan for a fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants