Can't deserialize entire file #317

Open
StuartHadfield opened this issue Oct 14, 2022 · 8 comments

StuartHadfield commented Oct 14, 2022

I can't deserialize an entire file because the Deserializer does not implement into_iter as other serde libraries do.

How can I get around this?

Code thus far is:

use std::fs::File;
use std::io::{BufReader, BufWriter, Write};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file_path = "./src/foo.msgpack";
    let reader = BufReader::new(File::open(file_path).unwrap());
    let writer = BufWriter::new(File::create("./src/results.json").unwrap());

    let mut deserializer = rmp_serde::Deserializer::from_read(reader);

    // let mut serializer = serde_json::Serializer::new(io::stdout());
    let mut serializer = serde_json::Serializer::pretty(writer);

    serde_transcode::transcode(&mut deserializer, &mut serializer).unwrap();
    serializer.into_inner().flush().unwrap();

    Ok(())
}
@kornelski
Collaborator

> How can I get around this?

Make a PR that adds into_inner

@StuartHadfield
Author

@kornelski 🤔 do you mean into_iter, not into_inner?

@StuartHadfield
Author

(I'm happy to have a bash, but I'm a real newbie to Rust, so not sure I'll manage haha)

@kornelski
Collaborator

I assume you mean into_inner, because Iterator doesn't make sense here.

@StuartHadfield
Author

StuartHadfield commented Oct 17, 2022

Ah... Hmmm 🤔 What does into_inner look like?

I thought of making an iterator, because that seems to be how Python's msgpack implementation works (https://github.com/msgpack/msgpack-python/blob/500a238028bdebe123b502b07769578b5f0e8a3a/msgpack/_unpacker.pyx#L539-L540).

into_inner conventionally just returns the wrapped object, right? So we'd return the Reader? Which means we can...?

Also - into_inner is already implemented for Deserializer

@kornelski
Collaborator

In that case I'm completely confused about what you want.

Serde fundamentally creates a single object of a given type. There is nothing to iterate in the decoder. Even if you deserialize a vector, you iterate the vector, not the decoder.

I thought you meant an into_inner that returns the underlying io::Read object so that you can reuse it for other I/O operations. That's not related to iteration.
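
As a minimal sketch of that model (the bytes and types below are made up purely for illustration): one call to the decoder yields one value of the requested type, and any iteration happens over that value.

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Encode a Vec<i32> to msgpack bytes just so there is something to decode.
    let bytes = rmp_serde::to_vec(&vec![1, 2, 3])?;

    // One call, one value: the whole Vec comes back from a single deserialization.
    let numbers: Vec<i32> = rmp_serde::from_slice(&bytes)?;

    // Iteration happens over the decoded Vec, not over the Deserializer.
    for n in &numbers {
        println!("{n}");
    }
    Ok(())
}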

@StuartHadfield
Author

Ah - okay - let me clarify.

If you have serialized the following array of objects into msgpack:

{
  "foo": "bar"
},
{
  "lorem": "ipsum"
}

We should be able to read all of them out of a file stream. However, once rmp_serde reaches the end of the first object (probably some delimiting marker?), it concludes decoding, despite the fact that there's loads of information still to be read out of the buffer. You can actually see this if you print the bytes read by fs::read versus what's decoded by rmp_serde.

I thought about into_iter after seeing it in the json implementation of serde - https://docs.rs/serde_json/latest/serde_json/de/struct.Deserializer.html#method.into_iter.

Does that make any more sense, @kornelski?
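
To make the goal concrete, here is a rough sketch, assuming the file really does contain several msgpack values written back to back, and treating any decode error as end-of-input (the exact error variant to match on is left open, and the path is a placeholder):

use serde::Deserialize;
use std::fs::File;
use std::io::BufReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = BufReader::new(File::open("./src/foo.msgpack")?);
    let mut deserializer = rmp_serde::Deserializer::from_read(reader);

    loop {
        // Each call decodes exactly one value and leaves the reader positioned
        // at the start of the next one.
        match serde_json::Value::deserialize(&mut deserializer) {
            Ok(value) => println!("{value}"),
            // Assumption: end-of-file surfaces here as a decode error; real code
            // should distinguish EOF from genuine corruption.
            Err(_) => break,
        }
    }
    Ok(())
}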

@kornelski
Collaborator

I don't think that's a correct usage of serde. Serde is a type-based, one-shot deserializer, not a streaming deserializer. It gives you exactly one object of the type you've requested. If you've requested a single struct, that's all you will ever get. Two objects next to each other is not a type. If you have multiple objects to deserialize with serde, then deserialize them all into a single Vec<Object>.
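
A minimal sketch of that suggestion, assuming the producer writes the objects as a single msgpack array rather than back to back (the file path and element type are placeholders):

use std::fs::File;
use std::io::BufReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = BufReader::new(File::open("./src/foo.msgpack")?);

    // One deserialization call yields the whole collection...
    let objects: Vec<serde_json::Value> = rmp_serde::from_read(reader)?;

    // ...and iteration happens over the resulting Vec.
    for obj in &objects {
        println!("{}", serde_json::to_string_pretty(obj)?);
    }
    Ok(())
}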
