Incorrect ordering of geofence events - Tile38 for historical analyses. #661

Open
maxschub opened this issue Nov 2, 2022 · 11 comments

maxschub commented Nov 2, 2022

Hi,

I have a use case where I want to use my Tile38 pipeline to also geofence historical data. In particular, there are about 100 geofences and 1 object moving in space, with >100k locations a few seconds apart. When I send the locations to Tile38 sorted by time, the events are sent to a Kafka topic and polled by a consumer.
When analysing the events in Kafka, I find they are no longer ordered in time, so the sequence of entering and exiting a geofence is not logical.

Reproduction
Set up 100 non-overlapping geofences with DETECT inside.
Send 10k+ locations for the same "agent", e.g.
await tile38.set('agents', agent_id).fields({"ts":ts }).point(lat, lon).exec()
SUBSCRIBE to the geofences.
Compare the "ts" of each event against the previous event -> they are not monotonically increasing.
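
The last step can be sketched as a small check over the consumed events. This is only an illustration: it assumes each event carries the "ts" value under a `fields` object, and the sample events are made up.

```javascript
// Return every position where an event's ts is smaller than the ts of the
// event received immediately before it.
// Assumption: events expose the custom field as event.fields.ts.
function findOutOfOrder(events) {
  const violations = [];
  for (let i = 1; i < events.length; i++) {
    const prev = events[i - 1].fields.ts;
    const curr = events[i].fields.ts;
    if (curr < prev) {
      violations.push({ index: i, prev, curr });
    }
  }
  return violations;
}

// Hypothetical events in the order a consumer received them.
const received = [
  { detect: 'inside', fields: { ts: 100 } },
  { detect: 'inside', fields: { ts: 101 } },
  { detect: 'inside', fields: { ts: 99 } },
  { detect: 'inside', fields: { ts: 102 } },
];

console.log(findOutOfOrder(received));
```

An empty result means the "ts" values never decrease from one received event to the next.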

When I add a time.sleep(0.001) between each SET, Tile38 seems to complete the geofencing operations in time before the next SET, which I attribute to its concurrent nature. However, for the same agent I would expect the events to be emitted in the same order as the SET operations.
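
An alternative to a fixed sleep is to wait for the reply to each SET before sending the next one. The following is a minimal sketch, not the code used in this issue: it assumes a client with a node-redis-style `sendCommand(args)` method, and the fake client, key, and agent id are invented for illustration.

```javascript
// Send commands strictly one at a time: the next command is issued only
// after the reply to the previous one has arrived.
async function sendInOrder(client, commands) {
  const replies = [];
  for (const cmd of commands) {
    replies.push(await client.sendCommand(cmd));
  }
  return replies;
}

// Stand-in client for illustration only (not a real Tile38 connection).
// Earlier commands get a longer artificial delay to mimic uneven latency.
function makeFakeClient() {
  const received = [];
  let calls = 0;
  return {
    received,
    sendCommand(cmd) {
      const delay = Math.max(0, 10 - 5 * calls++);
      return new Promise((resolve) => {
        setTimeout(() => {
          received.push(cmd);
          resolve('OK');
        }, delay);
      });
    },
  };
}

// Hypothetical locations, already sorted by ts.
const points = [
  { ts: 1, lat: 52.5, lon: 13.4 },
  { ts: 2, lat: 52.6, lon: 13.5 },
  { ts: 3, lat: 52.7, lon: 13.6 },
];

// Arguments are passed as strings: SET key id FIELD name value POINT lat lon
const commands = points.map((p) => [
  'SET', 'agents', 'agent-1', 'FIELD', 'ts', String(p.ts),
  'POINT', String(p.lat), String(p.lon),
]);

const fake = makeFakeClient();
const done = sendInOrder(fake, commands).then(() => {
  // Print the ts of each command in the order the fake client received it.
  console.log(fake.received.map((c) => c[5]));
});
```

This trades throughput for a simple ordering guarantee on the sending side, since every SET costs one full round trip.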

Operating System

  • OS: Mac OS
  • CPU: Apple M1
  • Version: Ventura
  • Container: Docker
@iwpnd
Contributor

iwpnd commented Nov 2, 2022

Hey,

Tile38 sends the events when the underlying spatial query resolves, meaning that if you have a hook whose query takes 50ms and the next event in line resolves in 10ms, then the faster one sends its event before the other, if I read the source correctly. @tidwall might correct me here.

How do you send the SETs? Maybe we can replicate the issue. And how realistic is a scenario where a single agent sends events at almost the same time?

@tidwall
Owner

tidwall commented Nov 2, 2022

Hi,

Here's a high level view of how Tile38 sends events.

Each SET command runs atomically.

For example, for the SET command:

> SET fleet truck POINT 33 -112

The server will effectively do:

server-lock          # lock server for one write command at a time
update-collection    # update the point in the fleet collection
evaluate-geofences   # see if any geofences are affected
queue-events         # write geofence events, if any, to the queue.db
signal-webhooks      # send webhook signals that new events are waiting
server-unlock        # unlock the server for the next write command

Each geofence webhook, on the other hand, runs asynchronously.

> SETHOOK geofence1 kafka://topic1 INTERSECTS fleet BOUNDS ...
server-lock
update-geofence-webhook          # insert/update geofence1
start-geofence-webhook-manager   # runs in a background thread
server-unlock

Each webhook has its own background manager that is responsible for processing its own events.

All the manager does is wait for a signal that there are new events, and when there are, it sends them, in order, over its own network connection to its assigned endpoint.

for-loop 
  wait-for-signal   # wait for signal that there are pending events
  send-events       # send queued events over network 

In short:

Server-wide, all events are queued in order with monotonically increasing timestamps.
Per each geofence webhook, all events are sent over its own network connection, in order.
There's no guarantee of order between different webhooks, even if they share the same endpoint, because each webhook sends events independently in the background.
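
Given these three points, a consumer that reads events from several webhooks on one endpoint can either treat each webhook's events as its own ordered stream or rebuild a single timeline by sorting on a timestamp. The sketch below assumes each event exposes the webhook name as `hook` and the sender's timestamp as `fields.ts`; the sample data is invented.

```javascript
// Group events by webhook name. Within a group, the arrival order is kept.
function groupByHook(events) {
  const groups = new Map();
  for (const e of events) {
    if (!groups.has(e.hook)) {
      groups.set(e.hook, []);
    }
    groups.get(e.hook).push(e);
  }
  return groups;
}

// Rebuild one timeline across webhooks by sorting a copy on fields.ts.
// Array.prototype.sort is stable, so events with equal ts keep arrival order.
function mergeByTimestamp(events) {
  return [...events].sort((a, b) => a.fields.ts - b.fields.ts);
}

// Hypothetical events from two webhooks, interleaved on a shared topic.
const mixed = [
  { hook: 'geofence2', detect: 'inside', fields: { ts: 3 } },
  { hook: 'geofence1', detect: 'inside', fields: { ts: 1 } },
  { hook: 'geofence1', detect: 'inside', fields: { ts: 2 } },
  { hook: 'geofence2', detect: 'inside', fields: { ts: 4 } },
];

console.log(mergeByTimestamp(mixed).map((e) => e.fields.ts));
console.log(groupByHook(mixed).get('geofence1').map((e) => e.fields.ts));
```

Sorting like this suits a finished batch, such as a historical analysis. A live stream would need some form of buffering before sorting, which is not shown here.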

@tidwall
Owner

tidwall commented Nov 2, 2022

When analysing the events in Kafka, they are not ordered in time anymore, thus entering and exiting a geofence are not logical.

The enter/exit ordering for a specific geofence should always be in order.

If this is not happening then I would like to look into it further, and reproduce on my side.

@eduardotang

eduardotang commented Nov 15, 2022

@tidwall, have you been able to find a clue?

I am noticing a similar problem, even with only 1 or 2 fences.

12:16:18 - send 3 SETs
13:12:55 - send 16 SETs
14:15:40 - receive event for one of the 3 SETs above
14:17:03 - receive event for one of the 16 SETs above

The hooks are set up to detect inside and outside.
Each time, I send the SET commands plus a GC at the end with Promise.all, as described under auto-pipelining in https://github.com/redis/node-redis

The same set of SET commands is sent to 2 instances of Tile38, one with 1 fence and the other with 2 fences.
The delays are larger in the instance with 2 fences.

Both Tile38 instances run on Azure Container Apps with 2 CPUs and 4 GB of memory.

@iwpnd
Contributor

iwpnd commented Nov 16, 2022

Please correct me if I am wrong, but I am pretty sure Promise.all does not guarantee that commands resolve in order. It only attempts to send both at the same time, but due to transit, both can resolve at different points in time.

await Promise.all([
   tile38.set('agents', agent_id).fields({"ts":ts }).point(lat, lon).exec(),
   tile38.set('agents', agent_id).fields({"ts":ts + 1 }).point(lat + 1, lon + 1).exec(),
   tile38.gc()
])

The GC might resolve earlier than the SET and the second SET might resolve earlier than the first.

the same set of SET commands are sent to 2 instances of tile38, 1 with 1 fence, another with 2 fences
the delay are larger in the instance with 2 fences

What do you mean here?
Can you please provide a minimum viable reproducible code example?

@eduardotang

Your code snippet matches what I did, but I haven't actually noticed an ordering problem within one batch (one call to Promise.all).

The problem is across different batches, i.e. I received the inside/outside event for some SETs in an earlier batch later than the event for a batch with a later timestamp.
(See below: the event for a SET in batch A arrived after the one for batch C. Apart from the ordering, the delays also differ.)
Delays: batch C - 38 min, batch A - 2 h 42 min, batch B - 1 h 37 min

12:13:19 - await Promise.all (multiple SETs) A
13:19:17 - await Promise.all (multiple SETs) B
14:01:25 - await Promise.all (multiple SETs) C

14:39:38 - receive event from above ( 1 SET from the batch at 14:01:25) C
14:55:06 - receive event from above ( 1 SET from the batch at 12:13:19) A
14:56:43 - receive event from above ( 1 SET from the batch at 13:19:17) B

As for the 2 instances, I was only trying to reproduce the problem with different numbers of fences, which is where I noticed that the delay differs.

@eduardotang

below is the code snippet

async function cmd_to_tile38(client, cmds) {
    await Promise.all ( cmds.map((c) => client.sendCommand(c)) )
}

async function process_points(.....) {
    const commands = []
    for ( const p of pts) {
      commands.push ( [
           'SET', 'key', 'id' , 'POINT', p.lat, p.lon,
      ])
    }
    await Promise.all( tile38clients.map( (t) => cmd_to_tile38( t, commands) ))
}

@eduardotang


Also, when there are only a few IDs (i.e. fewer lat/lon points), it did work properly, so I guess the auto-pipelining itself is not the problem.

@iwpnd
Contributor

iwpnd commented Nov 16, 2022

I'm sorry I lost you. I cannot seem to properly understand what you do and what you expect to happen.

If you send 10 SET in a Promise.all you cannot expect Tile38 to send 10 inside events in the order of the timestamps in your SET. It will return 10 inside events in the order it receives the SET commands.

You are stacking promises here to the max, and I'm not surprised that this causes issues with bigger batches.
If you're doing async maps, consider using bluebird.

I sense that this is more a problem with your application than with Tile38.

@eduardotang

I just realized that the auto-pipelining in https://github.com/redis/node-redis is not the same thing as Redis pipelining.
Anyway, I switched to https://github.com/luin/ioredis, which supports Redis pipelining, and the problem seems to be resolved.
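
The ioredis code itself was not posted. A rough sketch of what a pipelined batch could look like follows, assuming an ioredis-style client whose `pipeline()` returns an object with chainable `call(...)` and an `exec()` that resolves to one `[error, reply]` pair per command. The function name, key, and id are placeholders.

```javascript
// Queue one SET per point on a single pipeline, then flush them together.
// `client` is assumed to be an ioredis-style client connected to Tile38.
function setPointsPipelined(client, key, id, points) {
  const pipeline = client.pipeline();
  for (const p of points) {
    // Commands are queued in loop order, i.e. in the order of `points`.
    pipeline.call('SET', key, id, 'POINT', String(p.lat), String(p.lon));
  }
  // Resolves to an array of [error, reply] pairs, one per queued command.
  return pipeline.exec();
}

// Possible usage with ioredis (not executed here; host and port are assumptions):
//   const Redis = require('ioredis');
//   const client = new Redis(9851, 'localhost');
//   const results = await setPointsPipelined(client, 'key', 'id', pts);
```

Checking the error slot of each returned pair would show whether an individual SET was rejected.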
thx

@iwpnd
Contributor

iwpnd commented Nov 21, 2022

Glad you found a solution @eduardotang 🙏
