Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ft:binary-field does not cast booleans #5193

Open
Bpolitycki opened this issue Jan 19, 2024 · 2 comments
Open

[BUG] ft:binary-field does not cast booleans #5193

Bpolitycki opened this issue Jan 19, 2024 · 2 comments
Assignees
Labels
bug issue confirmed as bug
Milestone

Comments

@Bpolitycki
Copy link

Describe the bug
Assuming I have a binary-field definition in my collection.xconf like this:

<!-- ... -->
  <field name="my-field" expression="true()" type="xs:boolean" binary="yes"/>
<!-- ... -->

and I am using ft:binary-field in the following way:

ft:binary-field($something, 'my-field', 'xs:boolean')

It does not cast the retrieved value into a xs:boolean.

Expected behavior
I would expect – as mentioned in the functions docs and the documentation on the index – that the value is casted or and error is raised. But instead it returns true as xs:string.

To Reproduce
See description above.

Context (please always complete the following information)

  • Build: eXist-6.2.0
  • Java: build 1.8.0_392-b08 (OpenJDK)
  • OS: Alpine-Linux (Docker)

Notes

I am not a Java person, but after a quick search in the eXist-source I would assume the problem is the missing case in bytesToAtomic. I think the addition of the following case might enable correct casting:

case Type.BOOLEAN:
                return BooleanValue.deserialize(ByteBuffer.wrap(field.bytes));

Otherwise it should be documented what values can be casted by calling ft:binary-field.

@adamretter
Copy link
Member

adamretter commented Jan 19, 2024

It looks like correct conversion for quite a lot of the XDM types is sadly still missing from both the bytesToAtomic and stringToAtomic functions. In fact this is something I flagged as a problem in the PR when the code was previously proposed! Unfortunately it was not corrected before it was merged.

I think the addition of the following case might enable correct casting

In principle your suggestion looks good, but it is only 1/2 of the equation. At the moment when the xs:boolean value is indexed as a binary field, it is stored as a String. Ideally, it should be stored as a single bit (i.e. 0 or 1, representing false or true respectively).

The code for that is in LuceneFieldConfig#convertToDocValue(String), and you would need to add something like this to that method:

case Type.BOOLEAN:
    final BooleanValue bv = new BooleanValue(Boolean.valueOf(content));
    return new BinaryDocValuesField(fieldName, new BytesRef(bv.toJavaObject(byte[].class)));

Add the following at the top of the BooleanValue class:

private static final byte TRUE_BYTE = 1;
private static final byte FALSE_BYTE = 0;
private static final byte[] TRUE_BYTES = { TRUE_BYTE };
private static final byte[] FALSE_BYTES = { FALSE_BYTE };

and add the following to BooleanValue#toJavaObject(Class) method:

} else if (target == byte[].class) {
    return value ? (T) TRUE_BYTES : (T) FALSE_BYTES;
}

but you would also need to implement the function BooleanValue.deserialize as it is not yet present in the code base. Something like the following should suffice:

public static AtomicValue deserialize(final ByteBuffer buf) {
    return buf.get() == 1 ? TRUE : FALSE;
}

Finally you would need to add an additional test-case for binary fields of xs:boolean type to the XQSuite file: https://github.com/eXist-db/exist/blob/eXist-6.2.0/extensions/indexes/lucene/src/test/xquery/lucene/facets.xql

Please free free to send a PR for this enhancement.


One concern though - Binary Fields in eXist-db appear to have been implemented by using Lucene's BinaryDocValuesField class. The documentation for that class in Lucene clearly states:

The values are stored directly with no sharing, which is a good fit when the fields don't share (many) values, such as a title field. If values may be shared and sorted it's better to use SortedDocValuesField.

It would seem to me that this would be a bad fit for xs:boolean (and some other XDM types) as they have a very limited value space (e.g. just true or false), and therefore there will potentially be many shared values if you have more than one xs:boolean field. Unfortunately as far as I can see, eXist-db does not currently make SortedDocValuesField available in any way for indexing.

@adamretter adamretter added the bug issue confirmed as bug label Jan 19, 2024
@adamretter adamretter added this to the eXist-6.2.1 milestone Jan 19, 2024
@line-o line-o self-assigned this Jan 19, 2024
@Bpolitycki
Copy link
Author

@adamretter thanks for you reply! @line-o self-assigned this issue – so I think he will took a closer look at this issue (let me know, if I can support you in any way).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug issue confirmed as bug
Projects
None yet
Development

No branches or pull requests

3 participants