Asymmetric handling of null String values in TextColumn and StringColumn #1236

Open
maxhillaert opened this issue Oct 3, 2023 · 1 comment


maxhillaert commented Oct 3, 2023

Consider the Java test below.

    @Test
    public void asymmetricNullHandling() {
        Table t1 = Table.create(TextColumn.create("str1"), TextColumn.create("str2"));
        Row row = t1.appendRow();
        row.setString("str1", null);
        row.setString("str2", "");
        StringWriter w = new StringWriter();
        t1.write().csv(w);
        String csv = w.toString();
        Table t2 = Table.read().csv(CsvReadOptions.builder(new StringReader(csv)).columnTypesToDetect(List.of(ColumnType.TEXT)).build());
        assertThat(t2.getString(0, "str1"), equalTo(null)); // fails: the value read back is the empty string ""
        assertThat(t2.getString(0, "str2"), equalTo("")); // succeeds
    }

Null values are written as "" and read back as the empty string "", which is the missingValueIndicator for the StringParser.
Because both null and "" are written as "", this asymmetry destroys the semantic difference between an empty value and a nonexistent value.
Semantically, an empty string is not a missing value per se; it has to be distinguished from null.

The missingValueIndicator in StringParser is always "", so I don't see a way around this other than handling it myself, by using "" or something in my own converters.

Am I missing something?

ccleva (Contributor) commented Dec 15, 2023

Hi @maxhillaert. You are not missing anything.

If your use case requires distinguishing between empty strings and null strings, you have to work around this on your side by using a marker string for null values.

Watch out: some common null marker strings are treated as empty values, so you would run into the same issue. See #1244 for more information.
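
For illustration, the marker-string workaround could look roughly like the sketch below. It is plain Java with no Tablesaw calls, and the class name `NullMarker`, its methods, and the marker value `__NULL__` are hypothetical choices made for this example, not part of the library. The marker is assumed to be a string that never occurs in the real data and that the CSV reader does not itself treat as a missing-value indicator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: maps null to a marker string before writing and back after reading. */
public final class NullMarker {
    // Assumed marker. Pick one that cannot occur in real data and that the
    // CSV reader does not already interpret as a missing value.
    public static final String MARKER = "__NULL__";

    private NullMarker() {}

    /** Apply before storing a value in the column: null becomes the marker. */
    public static String encode(String value) {
        return value == null ? MARKER : value;
    }

    /** Apply to each value read back: the marker becomes null again. */
    public static String decode(String value) {
        return MARKER.equals(value) ? null : value;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList(null, "", "abc");
        // Encode before writing, decode after reading.
        List<String> written = original.stream().map(NullMarker::encode).collect(Collectors.toList());
        List<String> restored = written.stream().map(NullMarker::decode).collect(Collectors.toList());
        System.out.println(written);
        System.out.println(restored.equals(original));
    }
}
```

In the test from the original report, this would mean calling `row.setString("str1", NullMarker.encode(null))` before writing and wrapping the read in `NullMarker.decode(t2.getString(0, "str1"))`. The trade-off is that a genuine value equal to the marker would also be decoded to null, so the marker has to be chosen with the data in mind.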
