
uncheckedGetField returning null for null values, not empty byte array #6

Open
carlaustin opened this issue May 16, 2014 · 3 comments

Comments

@carlaustin
In the method uncheckedGetField there is a check for a null value (val == null), and if it is null the method returns null.
While this evaluates to the correct result, the returned null causes the previous row's valid value for this field to appear in place of the null in the display (only when using a predicate, it seems).
Changing this to return an empty byte array (new byte[0]) fixes the issue.

e.g. for 2 rows of data:

rowID | a | b
id1   | 1 | 1
id2   | 2 | NULL

When doing a select with a predicate, e.g. select * from table where b IS NULL, you will actually see in Hive the result:

id2 | 2 | 1

It's hard to explain in text, but I can confirm this change fixes the issue.
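A minimal sketch of the change being described (heavily simplified; the class and field-lookup names here are illustrative, not the real LazyAccumuloRow API, which also handles field initialization and caching):

```java
import java.util.HashMap;
import java.util.Map;

public class UncheckedGetFieldSketch {
    // Simulated row: column name -> value bytes; null models a missing
    // key/value pair in Accumulo (effectively a NULL column).
    private final Map<String, byte[]> row = new HashMap<>();

    public UncheckedGetFieldSketch(Map<String, byte[]> row) {
        this.row.putAll(row);
    }

    // Proposed behavior: return an empty byte array instead of null so
    // a stale value left over from the previous row is not displayed.
    public byte[] uncheckedGetField(String column) {
        byte[] val = row.get(column);
        if (val == null) {
            return new byte[0]; // was: return null;
        }
        return val;
    }
}
```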

@carlaustin (Author)

Scratch that: it only works that way when I apply my custom iterators (which interpret the empty byte array as a null, hence it appeared to solve it). The issue still exists without them, so I will try to track it down and will post when/if I've confirmed a solution that also works without my custom iterators.
Sorry about the confusion, but there does seem to be a genuine issue here.

@carlaustin (Author)

An update in the hope someone reading may have an idea:

The issue only occurs when a MapReduce job is required, because the AccumuloSerde isn't used for the final Hive-side step; the LazySimpleSerDe is used instead. The problem is that LazyAccumuloRow.uncheckedGetField, when val == null, returns null without also initializing the field. The field should be initialized with the null sequence, like so:

                // Initialize the field with Hive's null sequence before returning
                // null, so the LazySimpleSerDe side does not reuse the previous
                // row's bytes for this field.
                byte[] nullBytes = this.getInspector().getNullSequence().getBytes();
                ByteArrayRef ref = new ByteArrayRef();
                ref.setData(nullBytes);
                getFields()[id].init(ref, 0, nullBytes.length);
                return null;

Unfortunately, the null sequence expected by Hive is "\N", and doing this with String-typed fields causes it to be escaped to "\\N" prior to reaching the LazyStruct on the Hive client side, so the value is no longer treated as a null but as the literal string "\N".
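The byte-level effect can be sketched in plain Java (illustrative only; this mimics the backslash-doubling kind of escaping rather than calling Hive's actual escaping code):

```java
public class NullSequenceEscapeSketch {
    // Hive's default null sequence: the two characters '\' and 'N'.
    static final String NULL_SEQUENCE = "\\N";

    // Illustrative escaping of the kind applied to String fields before
    // they reach LazyStruct: each backslash is doubled.
    static String escapeBackslashes(String s) {
        return s.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String escaped = escapeBackslashes(NULL_SEQUENCE);
        // The 2-char null sequence becomes the 3-char literal "\\N",
        // which Hive no longer recognizes as a null marker.
        System.out.println(NULL_SEQUENCE.length() + " -> " + escaped.length()); // prints "2 -> 3"
    }
}
```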

I've spent quite a bit of time trying to figure out a way around this and have come up blank so far.

This is easy to recreate by creating a table in Accumulo with rows missing key/value pairs (effectively null) and then mapping them to an external table as string columns. Any statement that triggers a MapReduce job will then display the nulls incorrectly. I will continue to track this down, unless you have any ideas?
