Fix ArrayIndexOutOfBoundsException in primary key index #400

Open
wants to merge 2 commits into
base: master

ghost

@ghost ghost commented Feb 22, 2019

@PaulSandoz
Contributor

PaulSandoz commented Feb 25, 2019

The issue is not with the primary key index but with what is considered an absent value (referred to as a null value).

Even for values (as opposed to references), such as the value of a long field, the absence of a value has to be encoded, so a certain bit pattern must be reserved to represent it.

When state is written out, the number of bits used to encode a field is computed and recorded. It is the maximum, over all values of that field, of the number of bits needed to encode each value. More specifically, for an integral value (such as a long value) it is the number of bits required to encode the zig-zag transformation of the value (https://en.wikipedia.org/wiki/Variable-length_quantity#Zigzag_encoding).

In summary, the maximum number of bits of an integral value is the result of the expression 64 - Long.numberOfLeadingZeros(zigZag(value) + 1)
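As a rough illustration of that expression, here is a minimal sketch. It assumes `zigZag` is the standard zig-zag transform `(value << 1) ^ (value >> 63)`; the class and method names are hypothetical and are not taken from the Hollow code base.

```java
public class ZigZagBits {
    // Standard zig-zag transform (assumed, not copied from Hollow):
    // maps values of small magnitude, positive or negative, to small non-negative values.
    static long zigZag(long value) {
        return (value << 1) ^ (value >> 63);
    }

    // The expression quoted in the comment above.
    static int bitsRequired(long value) {
        return 64 - Long.numberOfLeadingZeros(zigZag(value) + 1);
    }

    // Field width: the maximum of the per-value widths.
    static int maxBits(long... values) {
        int max = 0;
        for (long v : values) {
            max = Math.max(max, bitsRequired(v));
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(bitsRequired(0L));                           // 1
        System.out.println(bitsRequired(3L));                           // 3
        System.out.println(maxBits(9223372034562340851L, 0L, 3L));      // 64
    }
}
```

The large value has bit 62 set, so its zig-zag form has bit 63 set and needs the full 64 bits.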

The absent integral value is chosen to be (1L << maxBits) - 1. When such a value is read, the minimum value of the type is returned, e.g. for long it is Long.MIN_VALUE.

The problem encountered is that the maximum number of bits for the TypeA.id1 field, for the values { 9223372034562340851L, 0L, 3L }, is 64, and (1L << 64) - 1 is 0. Java uses only the low six bits of the shift distance for a long, so 1L << 64 is the same as 1L << 0, which is 1. Therefore the bit pattern for encoding an absent value for this field is 0, which is also the pattern that encodes the value 0L.
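The wrap-around can be checked in isolation. This is only a sketch of the arithmetic described above; `absentPattern` is a hypothetical helper name, not a Hollow method.

```java
public class AbsentPattern {
    // The absent-value bit pattern for a field that is maxBits wide,
    // using the formula from the comment above.
    static long absentPattern(int maxBits) {
        return (1L << maxBits) - 1;
    }

    public static void main(String[] args) {
        // For a long, Java masks the shift distance to its low six bits,
        // so a shift by 64 behaves like a shift by 0.
        System.out.println((1L << 64) == (1L << 0));   // true

        System.out.println(absentPattern(3));           // 7: all ones in 3 bits
        System.out.println(absentPattern(64));          // 0: collides with an encoded 0L
    }
}
```

For any width below 64 the pattern is all ones, which is what the formula intends. At exactly 64 bits it degenerates to 0.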

The result of new GenericHollowObject(readStateEngine, "TypeA", 1).toString() is:

  a1: null
  a2: Beasts of No Nation
  a3: 2015

I assume the primary key index ignores absent values, hence the object with an id of 0 cannot be found.

I don't yet know what can be done about this. My advice for now would be to restrict long values to a maximum of 62 bits.

@jkade

jkade commented Feb 26, 2019

Thanks for the detailed explanation, @PaulSandoz!

When you say:

My advice for now would be to restrict long values to a maximum of 62 bits.

Do you mean restrict long values to a maximum of 62 bits only if lookup by an indexed key of 0 is necessary, or restrict long values to a maximum of 62 bits, full stop?

If I understand the explanation, it sounds like 0 should be the only problematic value. Am I missing something?

@PaulSandoz
Contributor

@jkade yes, you can avoid a 0 value if you wish. I was being conservative in suggesting that you restrict the maximum bit size. That way you don't need to think about particular values and their meaning with regard to absence (e.g. use an int field instead if you can). Note that Long.MIN_VALUE is the token absent value for long fields at the API level.
