For discussion - it would be nice if TrailDB could deduplicate events. Below is a simple script that inserts some records twice. Clearly it's a little bit silly to append the exact same database twice, but it's possible that I might have some duplicate events when merging a bunch of different log types for a given time period.
from traildb import TrailDB, TrailDBConstructor
from uuid import uuid4

fields = ['text']

# Build a small TrailDB: 2 trails with 5 events each.
cons = TrailDBConstructor('/tmp/test1', fields)
for x in range(2):
    uid = uuid4().hex
    for ts in range(5):
        cons.add(uid, ts, ['trail {}, time {}'.format(uid, ts)])
tdb = cons.finalize()
print('{} fields, {} trails, {} events'.format(tdb.num_fields, tdb.num_trails, tdb.num_events))

# Append the same TrailDB twice, so every event is inserted twice.
cons = TrailDBConstructor('/tmp/test2', fields)
cons.append(tdb)
cons.append(tdb)
tdb = cons.finalize()
print('{} fields, {} trails, {} events'.format(tdb.num_fields, tdb.num_trails, tdb.num_events))
What did you have in mind for the semantics of deduplication? Are you picturing something like a flag that you pass to the constructor, which causes it to drop exact duplicates of previously handled events on the floor?
So a duplicate in this context means that all fields are equal, including the timestamp and the UUID? Implementing dedup logic like this should be quite doable.
Yes, all the fields including timestamp and uuid would be equal if the event was to be considered a duplicate.
Different UUID? Lightning struck Alice instead of Bob. Log it.
Different timestamp? Bob got hit by lightning again. Log it.
Alice and Carol both telling me that Bob got hit by lightning at noon? If deduplication is active, I don't care who told me, only that I have a record of the event. (The logged event may or may not have a source host field, as appropriate).
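To make those semantics concrete, here is a minimal sketch in plain Python of what "exact duplicate" would mean. Note that `dedupe_events` is a hypothetical helper written for illustration, not part of the TrailDB API, and the short string IDs stand in for real UUIDs. It treats an event as the tuple (uuid, timestamp, field values) and keeps only the first occurrence of each.

```python
def dedupe_events(events):
    """Yield each (uuid, timestamp, values) event once, in first-seen order.

    Hypothetical helper for illustration only; not part of TrailDB.
    """
    seen = set()
    for uuid, ts, values in events:
        # Two events are duplicates only if uuid, timestamp and
        # every field value are all equal.
        key = (uuid, ts, tuple(values))
        if key not in seen:
            seen.add(key)
            yield uuid, ts, list(values)


events = [
    ('bob', 12, ['hit by lightning']),    # reported by Alice
    ('bob', 12, ['hit by lightning']),    # same event reported by Carol: dropped
    ('alice', 12, ['hit by lightning']),  # different UUID: kept
    ('bob', 13, ['hit by lightning']),    # different timestamp: kept
]

unique = list(dedupe_events(events))
```

Each surviving event could then be passed to `cons.add(uuid, ts, values)` as in the script above. Keeping every key in an in-memory set is only workable for small inputs, so a built-in option would presumably need something smarter, such as comparing only neighbouring events within a trail once they are sorted by timestamp.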