Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question to transition probability model #135

Open
dvent7 opened this issue Feb 27, 2019 · 3 comments
Open

Question to transition probability model #135

dvent7 opened this issue Feb 27, 2019 · 3 comments
Labels

Comments

@dvent7
Copy link

dvent7 commented Feb 27, 2019

Hi,

Iam really wondering why the base parameter was removed from calculating the transition. I could not find any hinds in the issues.

c74a36f#diff-dede88fbc9079c51b81be7fbf26f4298R285

double transition = (1 / beta)
* Math.exp((-1.0) * route.cost(new TimePriority()) / beta);

@smattheis
Copy link

I will come back to your question soon. As I think it's necessary to give you a detailed the answer here, I need to find a bit of time. Today won't work but next days should work. Sorry for the delay.

@dvent7
Copy link
Author

dvent7 commented Mar 4, 2019

Ok, Thank you Sebastian.
I am curious why you changed the alogrithm this way. Take your time.

@smattheis
Copy link

smattheis commented Mar 10, 2019

Approach 1: The algorithm of Newson and Krumm

In Newson and Krumm's paper "Hidden Markov Map Matching Through Noise and Sparseness" [1], a sequence of emissions (position measurements) z0, z1, z2, ... is the input. Each emission zt consists of position (lat lon) and a time of measurement t. For each emission, we determine a set of state candidates (possible map positions) through a simple radius search given the measured position as the center point:

  1. z0 -> s0,0, ..., s0,N0
  2. z1 -> s1,0, ..., s1,N1
  3. z2 -> s2,0, ..., s2,N2
    ...

The emission probability is modeled with a Gaussian distribution such that the probability of a candidate si,t given the respective measurement zt is:

where σz is typically the standard error of GPS, i.e., 5-8 meters, and the great circle distance is the distance between two positions on the earth surface.

To model movements of objects in a road network, map positions must be connected with paths along the roads of the map denoted as transitions between state candidates and determined with a simple router.

The transition probability is modeled with an exponential distribution, such that the probability of a transition from a state st,j to a possible subsequent state st+1,i is:

where β is an experimentally determined constant but (!) strongly depends on the sampling interval. The route distance denotes the path length of the shortest path (!) between candidates while base distance is here also the great circle distance of positions on the earth surface.

Now, let's look at some pitfalls:

Pitfall A: The contradiction of shortest path preference with our intuition of road types

Consider an intersection as shown in the figure below where two main roads intersect (red) and where a side road (blue) is a shortcut that goes over rough and smooth such that although it's shorter it takes much longer than the path along the main roads.

Now, assume an object has taken the path along the main roads (green path) and we obtained a sequence of measurements z0, z1, z2, z3 as shown in the figure. Given the measurements, we can construct a scenario with the following presumptions:

  1. p(s1,0 | s0,0) = p(s1,1 | s0,0)
  2. p(s3,0 | s2,0) = p(s3,0 | s2,1)
  3. p(s1,0 | z1) = p(s1,1 | z1)
  4. p(s2,0 | z2) = p(s2,1 | z2)

The pitfall arises from the preference of shortest paths in the transition probability model such that

||s2,0-s1,0||route > ||s2,1-s1,1||route (> ||z2-z1||great circle) ⇒ p(s2,0|s1,0) < p(s2,1|s1,1)

which means that s0,0, s1,1, s2,1, s3,0 is determined as the more likely path although it contradicts the intuition of our example where the object has taken the faster green path.

Pitfall B: The inversion of shortest path preference through absolute values

Consider a road network with several intersections as can be found in residential areas and as illustrated in the figure below.

Now, assume an object has taken the green path and we obtained position measurements z0, z1, z2, z3 as shown in the figure. Note that the measurement errors of z1 and z2 in this example and the shown distances d0, d1, and d2 (perpendicular to the respective roads). Given the measurement errors, we can construct a scenario with the following presumptions:

  1. p(s1,0|z1) = p(s1,1|z1) (d0 = d1)
  2. ||s1,0-s0,0||route = ||s1,1-s0,0||route ⇒ p(s1,0|s0,0) = p(s1,1|s0,0)
  3. ||s2,0-s1,0||route < ||s2,0-s1,1||route

The pitfall arises from the large error of z2 with d0, d1 ≪ d2 such that for presumption 3, we conclude:

||s2,0-s1,0||route < ||s2,0-s1,1||route ⇏ p(s2,0|s1,0) > p(s2,0|s1,1). (Not follows!)

The reason is the large measurement error of z2 with ||z2-z1||great circle > ||s2,0-s1,1||route, such that it follows the inverse

||s2,0-s1,0||route < ||s2,0-s1,1||route ∧ ||z2-z1||great circle > ||s2,0-s1,1||route ⇒ p(s2,0|s1,0) < p(s2,0|s1,1)

which means that s0,0, s1,1, s2,0, s3,0 is determined as the more likely path although it contradicts the preference of shorter paths.

Approach 2: A quick-fix

To avoid pitfall A, we use a router that takes road types into account. Since Barefoot uses OpenStreetMap data, we followed the practices of well-known Open Source routers for OpenStreetMap. More specifically, we follow some principles of pgrouting and use a cost function that calculates the driving time t and multiplies it with a factor f that represents the road types. In detail, for each segment of the path we use the respective length of the segment s and its maximum speed v, calculate the time t (with t = s / v) and multiply it with a factor f ≥ 1 that is predefined for all types of roads, where major roads have a lower value and minor roads have a larger value, e.g., highways are assigned factor f = 1 and roads in residential areas are assigned f = 2.5.

This alternative cost function, however, requires also an alternative base distance that corresponds to the cost function. For that reason, we calculate the base distance similarly to the cost function but use the great circle distance between measurements, assume a maximum speed of 216 km/h (= 60 m/s) and the minimum road type factor of 1.

To avoid pitfall B, we also adapted the transition probability model to not use the absolute value for the difference between base distance and route distance but bound the difference to a minimum of zero, in the case that ||st+1,i-st,j||route < ||zt+1-zt||great circle, as follows:

This algorithm solves the problems of pitfall A but only some cases of problems of pitfall B. To be specific, reconsider the example of pitfall B. The algorithm will not obtain p(s2,0|s1,0) < p(s2,0|s1,1) but instead p(s2,0|s1,0) = p(s2,0|s1,1). That fixes the problems for some cases but it still can annul the preference of lower route costs because although ||s2,0-s1,0||route < ||s2,0-s1,1||route we can construct a scenario where we obtain p(s2,0|s1,0) = p(s2,0|s1,1) and observe:

||s2,0-s1,0||route < ||s2,0-s1,1||route ⇏ p(s2,0|s1,0) > p(s2,0|s1,1). (Not follows!)

The reason is simple: The difference of route costs in such cases is "absorbed" by the base distance!

The consequence is quite inadequate because the example of pitfall B can easily degenerate to cases where p(s1,0|z1) < p(s1,1|z1) (d0 > d1) such that also here we obtain again s0,0, s1,1, s2,0, s3,0 as the more likely path although it contradicts the preference of lower route distances. (Note, although such cases may seem artificial we've seen them occur. Please remember Murphy's law.)

Approach 3: A more robust solution

The solution we found for our use cases is as simple as elegant: The base distance is simply not necessary. This means, the transition probability describes an exponential distribution of the route distance defined as follows

where the route cost function is the same as for algorithm 2 and the expected value β is given with the time between the measurements (multiplied with factor 1).

The intuition is that the route distance that can be traveled within a given time interval follows an exponential distribution. Seen from a different angle, we describe the probability of route distance (time-based) between measurements which describes the original problem domain of the exponential distribution. Of course, one could argue for and against its applicability or the validity of the Poisson assumptions, required to assume a Poisson process and to justify the use of an exponential distribution. However, note that we observed in our data that cars spend lots of time waiting at traffic lights, in traffic jams, in parking scenarios, etc. Given that, there are good arguments to consider all Poisson assumptions to be satisfied such that we consider the exponential distribution to be a robust description of the route distance distribution.

A side effect for this approach, since β is given with the measurement interval we gain an inherent self-adaptability of the algorithm to varying sampling rates.

@smattheis smattheis pinned this issue Mar 11, 2019
@smattheis smattheis changed the title Question to transition calculation Question to transition probability model Mar 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants