Datawave 825 part2 - Modification to the MetadataHelper class required to datawave issue #825 #5

Open
wants to merge 14 commits into
base: main

Conversation

jzgithub1

Updates to the MetadataHelper.isIndexed and getIndexFields functions, required for use of the new IndexColumnIterator in Datawave.

@@ -209,6 +209,68 @@ public Boolean isIndexed(Text colf, Entry<String,Entry<String,Set<String>>> key)
return result;
}

/**
* Method that fetches fetches a DataFrequencyValue for an indexed rowid.
Collaborator

This comment is incorrect.


Range range = new Range(upCaseFieldName);
scanner.setRange(range);
scanner.fetchColumnFamily(colf);
Collaborator

Why pass in the colf when we know it should be "i"?

private static final Logger log = LoggerFactory.getLogger(IndexedDatesValue.class);

private YearMonthDay startDay;
private TreeSet<YearMonthDay> indexedDatesSet = new TreeSet<>();
Collaborator

I thought we were going to internalize this as a set of ranges instead of distinct YearMonthDays. Doing it this way will create a ton of objects. Also, we need to figure out how to deal with the fact that some data is already indexed but has no bits set. Either we need a way to update legacy metadata so that it is presumed to have been indexed since the beginning of time, or we assume that data was indexed from the beginning of time until the first set bit.

Collaborator

I think storing a bitset is excellent; we should simply transform it into a set of FieldIndexHoles instead of directly into a large set of YearMonthDay objects.
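
To make the suggestion concrete, here is a minimal sketch of collapsing a per-day bitset into inclusive date ranges of unindexed days, so that one object is allocated per gap rather than one per day. Everything named here (`IndexHoleSketch`, `DateRange`, `findHoles`, the `dayCount` window) is hypothetical and not taken from the PR; `DateRange` only stands in for whatever hole type is actually used, and `java.time.LocalDate` stands in for `YearMonthDay`. It also assumes the second legacy option discussed above: days before the first set bit are treated as indexed.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch; class and method names are illustrative only. */
final class IndexHoleSketch {

    /** Inclusive range of days with no index data (stand-in for a hole type). */
    static final class DateRange {
        final LocalDate start;
        final LocalDate end;

        DateRange(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + "]";
        }
    }

    /**
     * Collapses per-day bits into inclusive ranges of unindexed days.
     * Bit i represents startDay.plusDays(i); a set bit means "indexed".
     * Assumption: days before the first set bit count as indexed (legacy data),
     * and clear bits after the last set bit are a hole up to the window end.
     */
    static List<DateRange> findHoles(LocalDate startDay, BitSet indexed, int dayCount) {
        List<DateRange> holes = new ArrayList<>();
        int firstSet = indexed.nextSetBit(0);
        if (firstSet < 0 || firstSet >= dayCount) {
            return holes; // no bits set in the window: treated as legacy, fully indexed
        }
        int clear = indexed.nextClearBit(firstSet);
        while (clear < dayCount) {
            int nextSet = indexed.nextSetBit(clear);
            boolean runsToEnd = nextSet < 0 || nextSet >= dayCount;
            int holeEnd = runsToEnd ? dayCount : nextSet; // exclusive end of the gap
            holes.add(new DateRange(startDay.plusDays(clear), startDay.plusDays(holeEnd - 1)));
            if (runsToEnd) {
                break;
            }
            clear = indexed.nextClearBit(nextSet);
        }
        return holes;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(2);
        bits.set(3);
        bits.set(6);
        // Ten-day window starting 2020-01-01; prints the gaps after the first set bit.
        System.out.println(findHoles(LocalDate.of(2020, 1, 1), bits, 10));
    }
}
```

The scan uses `BitSet.nextSetBit` and `BitSet.nextClearBit`, so the work is proportional to the number of runs rather than the number of days. If the first legacy option is chosen instead (rewriting old metadata), the early return for an empty bitset would need to change.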

2 participants