Serialization/Deserialization vs strings with funny characters #846

Closed
mishun opened this issue Sep 9, 2023 · 8 comments
@mishun

mishun commented Sep 9, 2023

Hello again!

Unlike #845, the following works with JsonCompatible() but fails without it:

using System.Diagnostics;
using YamlDotNet.Serialization;

public class Program
{
    public static void Main(string[] argv)
    {
        var ser = new SerializerBuilder().Build();
        var des = new DeserializerBuilder().Build();

        var src = "~";
        var yaml = ser.Serialize(src);
        var dst = des.Deserialize<string>(yaml);

        Debug.Assert(src == dst);
    }
}

It also fails for "\r" and "\t" strings.

Tested with:

<PackageReference Include="YamlDotNet" Version="13.3.1" />
@EdwardCooke
Collaborator

The tilde is a representation of null. It's quite possible it will come back as an empty string or as one with an empty line.

The other two you mentioned are whitespace, which will also be treated as nothing, so the result would be an empty string.

That being said, I haven't actually run it on a computer yet, so I'm not 100% certain. What are you seeing, and what are you expecting?
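For reference, here is a small sketch of why an unquoted `~` does not survive. Under the YAML 1.2 core schema, a plain (unquoted) scalar spelled `~`, `null`, `Null`, or `NULL`, or left empty, resolves to null. A quoted scalar always stays a string. The class and method names below are hypothetical, and this is not YamlDotNet's resolver. It only illustrates the rule, which matches the `Deserialize<string>("~") == null` result reported in this thread.

```csharp
using System.Linq;

// Illustrative sketch of the YAML 1.2 core-schema null rule (hypothetical names,
// not YamlDotNet code).
public static class YamlNullSketch
{
    // Core schema: null | Null | NULL | ~, plus the empty plain scalar.
    private static readonly string[] NullSpellings = { "", "~", "null", "Null", "NULL" };

    // True if an unquoted scalar with this text resolves to null.
    public static bool IsNullPlainScalar(string plain) => NullSpellings.Contains(plain);

    // Quoted scalars are never schema-resolved, so "~" in quotes stays the string "~".
    public static string? Resolve(string text, bool quoted) =>
        !quoted && IsNullPlainScalar(text) ? null : text;
}
```

This is also why quoting the tilde, or tagging it explicitly, is enough to keep it a string.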

@mishun
Author

mishun commented Sep 12, 2023

Yes, des.Deserialize<string>("~") == null.

Currently:

var ser = new SerializerBuilder().Build();
Debug.Assert(ser.Serialize("~") == "~\n");
Debug.Assert(ser.Serialize("\t") == "\t\n");
Debug.Assert(ser.Serialize("\r") == ">2\n\n");

I'd expect something more like:

Debug.Assert(ser.Serialize("~") == "\"\\x7E\"\n");
Debug.Assert(ser.Serialize("\t") == "\"\\t\"\n");
Debug.Assert(ser.Serialize("\r") == "\"\\r\"\n");

perhaps? Or should the tilde be encoded as just a tilde, with a null string written using an explicit tag?

@EdwardCooke
Collaborator

When you instantiate the SerializerBuilder, call WithQuotingNecessaryStrings().

@mishun
Author

mishun commented Sep 14, 2023

Thank you! That's much better:

var ser = new SerializerBuilder().WithQuotingNecessaryStrings().Build();
Console.WriteLine(ser.Serialize("\x0d"));

-->

"\r"

Unfortunately:

var ser = new SerializerBuilder().WithQuotingNecessaryStrings().Build();
Console.WriteLine(ser.Serialize("\x0d\x61"));

--->

>2-

  a

(0x61 is the code for 'a')

If you're wondering where I get these annoying examples, I'm using a QuickCheck variant for .NET.
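These reports come from a round-trip property: deserializing a serialized string should give back the original. Below is a minimal sketch of such a check. It assumes the serializer and deserializer are passed in as delegates. `RoundTripSketch` and `FindCounterexample` are hypothetical names. Unlike a real QuickCheck-style library, the sketch does no shrinking of failing inputs.

```csharp
using System;

// Hypothetical round-trip property checker: deserialize(serialize(s)) == s.
public static class RoundTripSketch
{
    // Small alphabet of characters that tend to trip up YAML scalars.
    private static readonly char[] Alphabet = { '~', ' ', '\t', '\r', '\n', 'a', '"', '#', ':' };

    // Returns the first failing input found, or null if every trial round-trips.
    public static string? FindCounterexample(
        Func<string, string> serialize,
        Func<string, string?> deserialize,
        int trials = 1000,
        int seed = 42)
    {
        var rng = new Random(seed); // fixed seed keeps runs reproducible
        for (var i = 0; i < trials; i++)
        {
            // Random string of length 0..3 drawn from the alphabet.
            var chars = new char[rng.Next(0, 4)];
            for (var j = 0; j < chars.Length; j++)
                chars[j] = Alphabet[rng.Next(Alphabet.Length)];
            var s = new string(chars);

            if (deserialize(serialize(s)) != s)
                return s;
        }
        return null;
    }
}
```

With YamlDotNet, it might be wired as `FindCounterexample(s => ser.Serialize(s), s => des.Deserialize<string>(s))`, using `ser` and `des` built as in the first snippet.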

@EdwardCooke
Collaborator

Now that my laptop is up and running again, I got to look at this.
I was able to narrow it down to differences in line endings: Windows uses CR LF (0D 0A) and Linux uses LF (0A).
A lone CR (0D) is also valid, but things get weird, as you're seeing.

With the correct line endings, things work as expected, since the value deserializes correctly.
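Some background on the line-ending behaviour, as a sketch. The YAML 1.2 specification says each line break inside scalar content (CR LF, a lone CR, or LF) must be parsed into a single line feed. So once a lone 0D is written as a raw break inside a block scalar, a spec-conforming reader hands it back as 0A. An escaped `\r` inside a double-quoted scalar is content rather than a break, so it is unaffected. The helper below is hypothetical and illustrative, not YamlDotNet's scanner.

```csharp
// Illustrative sketch of YAML line-break normalization (hypothetical helper).
public static class LineBreakSketch
{
    // CR LF first, so it becomes one LF rather than two; then any lone CR becomes LF.
    public static string NormalizeBreaks(string s) =>
        s.Replace("\r\n", "\n").Replace("\r", "\n");
}
```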

We output \r when that's the only character because of the underlying emitter. When the scalar style is double-quoted (which WithQuotingNecessaryStrings() applies to a string containing only line breaks, whitespace, and other special characters), the emitter replaces the special characters with escape codes. You can see where it does the escaping, and which characters are escaped, here:

if (!IsPrintable(character) || IsBreak(character, out _) || character == '"' || character == '\\')
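Here is a minimal sketch of the escaping that condition implies, assuming a simplified printability test. `IsPrintableSketch` and `IsBreakSketch` are hypothetical stand-ins, not YamlDotNet's `IsPrintable`/`IsBreak`, so the exact set of escaped characters may differ from the real emitter. The escape letters are the ones the YAML spec defines for double-quoted scalars.

```csharp
using System.Text;

// Illustrative double-quoted scalar escaping (hypothetical, simplified).
public static class DoubleQuoteSketch
{
    // Simplified printability: printable ASCII, plus U+00A0 and above except LS/PS.
    private static bool IsPrintableSketch(char c) =>
        (c >= ' ' && c <= '~') || (c >= '\u00A0' && c != '\u2028' && c != '\u2029');

    // CR and LF, plus the YAML 1.1 breaks NEL, LS and PS.
    private static bool IsBreakSketch(char c) =>
        c == '\n' || c == '\r' || c == '\u0085' || c == '\u2028' || c == '\u2029';

    public static string Quote(string value)
    {
        var sb = new StringBuilder("\"");
        foreach (var c in value)
        {
            // Same shape as the emitter condition quoted above.
            if (!IsPrintableSketch(c) || IsBreakSketch(c) || c == '"' || c == '\\')
            {
                sb.Append('\\');
                switch (c)
                {
                    case '\0': sb.Append('0'); break;
                    case '\a': sb.Append('a'); break;
                    case '\b': sb.Append('b'); break;
                    case '\t': sb.Append('t'); break;
                    case '\n': sb.Append('n'); break;
                    case '\v': sb.Append('v'); break;
                    case '\f': sb.Append('f'); break;
                    case '\r': sb.Append('r'); break;
                    case '\x1B': sb.Append('e'); break;
                    case '"': sb.Append('"'); break;
                    case '\\': sb.Append('\\'); break;
                    case '\u0085': sb.Append('N'); break;
                    case '\u2028': sb.Append('L'); break;
                    case '\u2029': sb.Append('P'); break;
                    default:
                        // Remaining control characters (all <= U+009F): 8-bit hex escape.
                        sb.Append('x').Append(((int)c).ToString("X2"));
                        break;
                }
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.Append('"').ToString();
    }
}
```

Because every break becomes an escape, the value survives a round trip whatever line endings it contains.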

Since the second test string, with the letter a in it, doesn't need to be quoted, it can be output as a block scalar with an empty line at the beginning, as you saw.

To force double quoting so that your use case always passes, call .WithDefaultScalarStyle(YamlDotNet.Core.ScalarStyle.DoubleQuoted) on the SerializerBuilder instead of the WithQuotingNecessaryStrings() I suggested earlier. The catch is that everything will default to double quotes, and entries with new lines will become difficult to read.

If you want to only apply this to a specific property/field on an object, you can use the YamlMember attribute on that property/field and set the ScalarStyle to DoubleQuoted.

Here's the code I used to validate that this will work with 0A and 0D line endings.

using System;
using YamlDotNet.Serialization;

var str = new[] { "\x0a", "\x0a\x61", "\x0d", "\x0d\x61" };

Console.WriteLine("============================");
Console.WriteLine("Testing direct string");

foreach (var s in str)
{
    Test(s);
}

Console.WriteLine("============================");
Console.WriteLine("Testing class yamlmember");

foreach (var s in str)
{
    Test1(s);
}

void Test(string value)
{
    var serializer = new SerializerBuilder().WithDefaultScalarStyle(YamlDotNet.Core.ScalarStyle.DoubleQuoted).Build();
    var deserializer = new DeserializerBuilder().Build();
    var serialized = serializer.Serialize(value);
    var deserialized = deserializer.Deserialize<string>(serialized);
    Console.WriteLine("------");
    Console.WriteLine("Testing:");
    Console.Write(value);
    Console.WriteLine("---Serialized:");
    Console.WriteLine(serialized);
    Console.WriteLine("Deserialized:");
    Console.Write(deserialized);
    Console.WriteLine("---Matches:");
    Console.WriteLine(deserialized == value);
}

void Test1(string value)
{
    var tc = new TestClass {  X = value };
    var serializer = new SerializerBuilder().Build();
    var deserializer = new DeserializerBuilder().Build();
    var serialized = serializer.Serialize(tc);
    var deserialized = deserializer.Deserialize<TestClass>(serialized);
    Console.WriteLine("------");
    Console.WriteLine("Testing:");
    Console.Write(value);
    Console.WriteLine("---Serialized:");
    Console.WriteLine(serialized);
    Console.WriteLine("Deserialized:");
    Console.Write(deserialized.X);
    Console.WriteLine("---Matches:");
    Console.WriteLine(deserialized.X == value);
}

class TestClass
{
    [YamlMember(ScalarStyle = YamlDotNet.Core.ScalarStyle.DoubleQuoted)]
    public string X { get; set; } = string.Empty;
}

@EdwardCooke
Collaborator

Did that answer your question?

@mishun
Author

mishun commented Oct 4, 2023

Sorry, got distracted.
Indeed, it seems to work with DoubleQuoted, thank you!

@EdwardCooke
Collaborator

Fantastic. I’m going to close this issue, then.
