
uncheckedGetField returning null for null values, not empty byte array #6

Open
carlaustin opened this issue May 16, 2014 · 3 comments

Comments

@carlaustin
In the method uncheckedGetField there is a check for a null value (val == null), and if it is null the method returns null.
While this evaluates to the correct result, the returned null causes the previous row's valid value for this field to appear in place of the null in the display (only when using a predicate, it seems).
Changing this to return an empty byte array (new byte[0]) fixes the issue.

e.g. for 2 rows of data:

rowID | a | b
id1   | 1 | 1
id2   | 2 | NULL

When doing a select with a predicate, e.g. select * from table where b IS NULL, you will actually see in Hive the result:

id2 | 2 | 1

It's hard to explain in text, but I can confirm this change fixes the issue.
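A minimal sketch of the change being described (heavily simplified; the class and field-lookup names here are illustrative, not the real LazyAccumuloRow API, which also handles field initialization and caching):

```java
import java.util.HashMap;
import java.util.Map;

public class UncheckedGetFieldSketch {
    // Simulated row: column name -> value bytes; null models a missing
    // key/value pair in Accumulo (effectively a NULL column).
    private final Map<String, byte[]> row = new HashMap<>();

    public UncheckedGetFieldSketch(Map<String, byte[]> row) {
        this.row.putAll(row);
    }

    // Proposed behavior: return an empty byte array instead of null so
    // a stale value left over from the previous row is not displayed.
    public byte[] uncheckedGetField(String column) {
        byte[] val = row.get(column);
        if (val == null) {
            return new byte[0]; // was: return null;
        }
        return val;
    }
}
```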

@carlaustin (Author)

Scratch that: it only works that way when I apply my custom iterators (which interpret the empty byte array as a null, hence it appeared to solve it). The issue still exists without them, so I will try to track it down and will post when/if I've confirmed a solution that also works without my custom iterators.
Sorry about the confusion, but there does seem to be a genuine issue here.

@carlaustin (Author)

An update in the hope someone reading may have an idea:

The issue only occurs when a MapReduce job is required, because the AccumuloSerde isn't used for the final Hive-side step; the LazySimpleSerDe is used instead. The problem is that LazyAccumuloRow.uncheckedGetField, when val == null, returns null without also initializing the field. The field should be initialized with the null sequence, like so:

                // Initialize the field with Hive's null sequence before returning
                // null, so the LazySimpleSerDe side does not reuse the previous
                // row's bytes for this field.
                byte[] nullBytes = this.getInspector().getNullSequence().getBytes();
                ByteArrayRef ref = new ByteArrayRef();
                ref.setData(nullBytes);
                getFields()[id].init(ref, 0, nullBytes.length);
                return null;

Unfortunately, the null sequence expected by Hive is "\N", and doing this with String-typed fields causes it to be escaped to "\\N" prior to reaching the LazyStruct on the Hive client side, so the value is no longer treated as a null but as the literal string "\N".
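The byte-level effect can be sketched in plain Java (illustrative only; this mimics the backslash-doubling kind of escaping rather than calling Hive's actual escaping code):

```java
public class NullSequenceEscapeSketch {
    // Hive's default null sequence: the two characters '\' and 'N'.
    static final String NULL_SEQUENCE = "\\N";

    // Illustrative escaping of the kind applied to String fields before
    // they reach LazyStruct: each backslash is doubled.
    static String escapeBackslashes(String s) {
        return s.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String escaped = escapeBackslashes(NULL_SEQUENCE);
        // The 2-char null sequence becomes the 3-char literal "\\N",
        // which Hive no longer recognizes as a null marker.
        System.out.println(NULL_SEQUENCE.length() + " -> " + escaped.length()); // prints "2 -> 3"
    }
}
```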

I've spent quite a bit of time trying to figure out a way around this and have come up blank so far.

This is easy to recreate by creating a table in Accumulo with rows missing key/value pairs (effectively null) and then mapping them to an external table as string columns. Any statement that triggers a MapReduce job will then display the nulls incorrectly. I will continue to track this down, unless you have any ideas?
