TrailDB could handle high-cardinality fields more efficiently. We have faced two examples of high-cardinality fields recently:
IDs of format granular_timestamp + random ID (e.g. 144500000009837478)
Continuous-valued fields with a limited range (e.g. 100002, 100003, 100011)
Although the cardinality of these fields can be huge, they are not pure entropy: in both cases the values share a long, highly repetitive prefix. We could potentially reduce the size of the lexicon files substantially by storing them as tries, compressing away the common prefixes.
The main downside is that functions that access the lexicon, such as tdb_get_item_value, would need to reconstruct the value on the fly, which makes it harder to return a stable pointer.