Speed calculation error for composite when individual trackers do not have same error #69

Closed
jata1 opened this issue Mar 20, 2024 · 38 comments

@jata1

jata1 commented Mar 20, 2024

Hello Phil. Me again!

Thought I'd start a new thread/issue for this. My testing setup is as follows and I noticed a speed spike today that I can't understand.

  1. composite for HA device tracker (single entity)
  2. composite for iCloud device tracker (single entity)
  3. composite with HA and iCloud together (2 entities)

Today I see a speed spike (error) on 3 but I'm not seeing it on either 1 or 2 - so that is weird, right?

I would expect to see the glitch in one of the underlying trackers. Not sure what I can do to try to track this down. I will see if it happens again without making any changes to my setup.

Chart below. Also see the good progress I am making picking up speeding events in HA.

[chart image]

@pnbruckner
Owner

I'm not really surprised. When you're combining inputs with effectively different time references, this can definitely happen.

First, the times these events are shown on the graph are when the entity updates, not when the events really happened. But remember, the speed calculations are done using the data timestamps. One of the inputs uses (in theory) the timestamps from when the location data actually changes, whereas the other uses timestamps of when the new location data reaches HA. So, one is using the phone's time reference, and the other is using HA's time reference.

As a concrete example, let's say the HA app sees a new location at t1. It then sends that to HA and the device_tracker sees that new location at t2. That is what the composite will record (last_updated) because the HA app doesn't send the actual timestamp of t1. Now iCloud3 sees a new location at t3 and it gets to HA at t4. The composite will record t3 (last_timestamp) because it is available. Now the composite calculates the speed between those two points (and shows it at t4.) The real speed is the difference in location divided by t3 - t1. But because t1 is not available, it uses t2, so the resultant speed is instead the difference in location divided by t3 - t2. Since t2 is closer to t3 than t1, the resultant speed is too high. Given the right conditions, this could definitely create spikes.
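To put rough numbers on that (the values below are invented purely to illustrate the effect, they're not from any real log):

# Hypothetical timestamps, in seconds, showing how using t2 instead of t1
# inflates the calculated speed.
t1 = 0.0    # phone actually records the location (HA app)
t2 = 50.0   # that location reaches HA -> last_updated (all the composite can see)
t3 = 60.0   # iCloud3 fix, which carries its own timestamp (last_timestamp)

meters = 100.0  # distance between the two fixes

true_speed = meters / (t3 - t1)  # 100 / 60 ≈ 1.7 m/s
calc_speed = meters / (t3 - t2)  # 100 / 10 = 10 m/s -> looks like a spike

print(f"true: {true_speed:.1f} m/s  calculated: {calc_speed:.1f} m/s")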

@jata1
Author

jata1 commented Mar 20, 2024

I see and it is complicated for sure.

Given (assuming) that the composite speed for iC3 (yellow on the chart) and the composite speed for HA (purple) are individually OK - they look good above - maybe I need to take a different approach for speed, since my project will not work well with false positives...

What do you think about this:

Use a template sensor that takes the speed values from both HA and iC3 as they change/update. So the most recent speed value is held in the sensor and this is passed to traccar with the GPS data. So no additional attempts to calculate speed based on location.

@jata1
Author

jata1 commented Mar 20, 2024

I have used a 'helper' to combine the state of the HA and iC3 speed - using 'most recently updated' data. Will see how this looks on the drive home from work this evening.

@jata1
Author

jata1 commented Mar 20, 2024

Some new data from the drive home. I have learnt a few things.

  1. The data from icloud seems to be causing some issues. As I get close to 'home' the icloud tracker starts pulling GPS data very quickly, generating lots of weird speed readings. I'm not sure why it's doing this (maybe to help accurately determine distance/time from home).
  2. Overall the HA app seems to do quite a good job. I'm not sure how this is even possible given the issues you have discussed with me about it missing reliable fix time data.
  3. The concept of using the most recently updated speed data in a helper seems to work well so long as the source data is good.

I will see if the icloud3 dev can throw some light onto what is happening with his integration. Chart below highlights what I am seeing

[chart image]

@jata1
Author

jata1 commented Mar 20, 2024

Just had an idea - is there a way to configure the composite device tracker integration to do something about these high frequency samples? Either drop some of them or take the average / median over a few minute interval...

@pnbruckner
Owner

Use a template sensor that takes the speed values from both HA and iC3 as they change/update. So the most recent speed value is held in the sensor and this is passed to traccar with the GPS data. So no additional attempts to calculate speed based on location.

That definitely sounds reasonable.

You might also want to feed that into the Statistics sensor. That sensor, obviously, introduces some delay, since it's combining multiple readings to generate its output, but I think in your use case, that is not an issue.

The only problem, however, might be that your template sensor probably won't cause a state update if its output value doesn't change. E.g., let's say that one input changes to 10, so the template sensor changes to 10. Now the other input changes to 10. The template sensor will probably not cause a state changed event because its value isn't changing. But those are two valid sample points for the statistics sensor to treat as individual data points. Without a state changed event, it won't "see" the second data point.

You might want to do something with the template's output so that every change of an input causes a change in its state. That could be something in an attribute.

@pnbruckner
Owner

2. Overall the HA app seems to do quite a good job. I'm not sure how this is even possible given the issues you have discussed with me about it missing reliable fix time data.

That's not very surprising either. Consider my explanation above, but a composite that uses just one input. In this case, the two points will be timestamped with t2 & t4. If the delay from t1 to t2 and t3 to t4 (i.e., the delay between reading the location and it getting to HA) is consistent, then it doesn't matter if t4 - t2 or t3 - t1 is used in the calculation.

In my case I have a composite that combines Google Maps and GPSLogger from my Android phone. Here is the speed for my drive to work this morning:

[chart image]

There are some ups and downs, but that's reasonable considering it's a short drive with turns, stop signs and traffic lights. The peaks are roughly the speed I was driving when actually moving.

If you're interested, I have a Python script that can extract information from HA's database and display details about changing entities. E.g., here is the output from the first part of the drive shown graphically above.

[screenshot of script output]

I had it show the last_seen attribute of the composite entity, as well as which input device_tracker caused each update. You can see when I'm moving slowly, both inputs contribute. But once I get moving, then mostly the updates are from GPSLogger (because that's how I have it set up.)

@pnbruckner
Owner

Just had an idea - is there a way to configure the composite device tracker integration to do something about these high frequency samples? Either drop some of them or take the average / median over a few minute interval...

Well, it does do some filtering, as I think I might have mentioned. If two updates come from the same input, then they have to be at least 3 seconds apart for them to cause a speed update. And if they are from different inputs, then they have to be at least 9 seconds apart. Other than that, I'm not sure it's appropriate for the composite entity to do much processing of the data. It's better to do that with something else, like the Statistics sensor I mentioned above.

@jata1
Author

jata1 commented Mar 20, 2024

EDIT - sorry I missed your post above! I will see what I can do with the statistics sensor

Thanks as always Phil.

I have checked in with the icloud3 dev and it is correct that his integration switches to high frequency GPS requests (every 15 seconds) when you are within 1km of the set home location - so this is by design and it is screwing up speed data.

@jata1
Author

jata1 commented Mar 21, 2024

Phil - I have setup a couple of statistics sensors and will monitor. I need to have the right conditions to trigger the high frequency GPS data requests that cause the issue so I will report back. But it is all getting quite complicated.

The other idea I am thinking about is to limit the GPS data frequency going into composite (e.g. to a max of 1 per minute). I think I will need to create a template device tracker entity to achieve this - or will composite accept a sensor entity? This might be a cleaner solution, as I won't need to do stats that are likely to have some other consequences and may still overestimate speed.

So can you give me a bit of guidance as to what composite needs. Do the configured entities need to be device trackers? If I can use a template sensor - what attributes do you need?

@jata1
Author

jata1 commented Mar 22, 2024

Hey not much luck with using the statistics platform - not really stopping the false positive speed spikes.

I am having better results using a time pattern triggered (every 2mins) sensor with the attributes your composite integration needs - see yaml below. I then use this sensor in a composite and it is looking much better.

Now I just want to force icloud to get new GPS data (mainly for testing and when home to reset speed). Is there a generic way to do this? I see a service called device_tracker: See

- trigger:
    - platform: time_pattern
      minutes: "/2"
  sensor:
    - name: "Jago iCloud Custom"
      unique_id: jago_icloud_custom
      icon: "mdi:map-marker"
      state: "{{ states('device_tracker.jago_iphone') }}"            
      attributes:
        latitude: "{{ state_attr('device_tracker.jago_iphone', 'latitude') }}"
        longitude: "{{ state_attr('device_tracker.jago_iphone', 'longitude') }}"
        gps_accuracy: "{{ state_attr('device_tracker.jago_iphone', 'gps_accuracy') }}"
        last_timestamp: "{{ state_attr('device_tracker.jago_iphone', 'last_timestamp') }}"

@pnbruckner
Owner

I have checked in with the icloud3 dev and it is correct that his integration switches to high frequency GPS requests (every 15 seconds) when you are within 1km of the set home location - so this is by design and it is screwing up speed data.

Interesting. I have GPSLogger configured to send updates every 10 seconds if I'm moving over a certain speed. (At lower speeds it doesn't send any updates, and I just rely on Google Maps.) It does have small inaccuracies from time to time, but nothing like you're seeing. Seems like maybe iCloud3 is asking for rather low accuracy readings??? Does the tracker have a gps_accuracy attribute, and does that number get fairly high during this time (higher numbers being less accurate)?

So can you give me a bit of guidance as to what composite needs. Do the configured entities need to be device trackers? If I can use a template sensor - what attributes do you need?

No, they do not need to be device_tracker entities. They can be sensor entities. They need to have latitude & longitude attributes, or lat & lon attributes. They also need to have a gps_accuracy or acc attribute. It would be good to have a last_seen or last_timestamp attribute, but as we've discussed, that is optional.

Hey not much luck with using the statistics platform - not really stopping the false positive speed spikes.

Did you use the mean value? Haven't thought too much about it, but maybe try median instead???

I am having better results using a time pattern triggered (every 2mins) sensor with the attributes your composite integration needs - see yaml below. I then use this sensor in a composite and it is looking much better.

That doesn't guarantee it filters out spikes, right? Isn't it just luck whether or not it happens to sample when there was a location data point that would have caused a spike in speed?

It seems the real problem is inaccurate location sample points. Maybe you need to filter those out somehow. I know Google Maps has a max GPS accuracy config option. Does iCloud3 have anything like that? Maybe your template sensor could trigger on all updates from the iCloud3 entity, but only use the new values if gps_accuracy is below some threshold. Or maybe just do the speed calculation yourself right there, where it rejects two location samples where one is obviously wrong or too inaccurate???
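Just to sketch that rejection idea outside of HA templates (the thresholds and the haversine helper below are made up for illustration, they're not part of composite):

from math import asin, cos, radians, sin, sqrt

MAX_GPS_ACCURACY = 30.0      # metres; reject fixes reported as less accurate than this
MAX_PLAUSIBLE_SPEED = 70.0   # m/s (~250 km/h); reject pairs implying anything faster

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def accept_fix(prev, new):
    """prev/new are (lat, lon, gps_accuracy, epoch_seconds); return True if 'new' is usable."""
    if new[2] > MAX_GPS_ACCURACY:
        return False                     # fix too inaccurate to trust
    seconds = new[3] - prev[3]
    if seconds <= 0:
        return False                     # duplicate or out-of-order timestamp
    speed = haversine_m(prev[0], prev[1], new[0], new[1]) / seconds
    return speed <= MAX_PLAUSIBLE_SPEED  # drop pairs that imply an absurd speed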

Now I just want to force icloud to get new GPS data (mainly for testing and when home to reset speed).

There's the generic homeassistant.update_entity service, but not all integrations really honor that. You'd have to try it on the iCloud3 entity, or ask the developer.

I see a service called device_tracker: See

I personally would not use that. It will create a "legacy" tracker, meaning it will create a known_devices.yaml file. Probably not worth messing with. I think you're better off using some sort of template sensor, triggered or not.

@pnbruckner
Owner

So, I've never noticed this before. (I.e., I don't know if it never happened before, or I just didn't notice it.)

I have a graph that shows the speed of each of the eight devices my system monitors over the last four hours. I just noticed that one of those speeds had a spike of over 400 mph. And, no, they were not in an airplane (although I have seen that, too!)

I used my script to extract the states & attributes for device_tracker & speed sensor for that device around that time. When I hand calculate what the speed should have been, it was not the value shown in the speed sensor, or in the corresponding DEBUG log message:

2024-03-22 07:12:20.390 DEBUG (MainThread) [custom_components.composite.device_tracker] Michel: Sending speed: 9.0 m/s, angle: 179°
2024-03-22 07:12:30.415 DEBUG (MainThread) [custom_components.composite.device_tracker] Michel: Sending speed: 180.7 m/s, angle: 179°
2024-03-22 07:13:01.499 DEBUG (MainThread) [custom_components.composite.device_tracker] Michel: Sending speed: 26.4 m/s, angle: 179°

For the latitude, longitude & last_seen attributes for the two samples involved in that calculation, the speed should have been 3.1 m/s, not 180.7 m/s.

In case you're curious, this is where the calculations for speed & angle are done:

if prev_ent and self._prev_seen and prev_lat and prev_lon and gps:
    assert lat
    assert lon
    assert attributes
    last_ent = cast(str, attributes[ATTR_LAST_ENTITY_ID])
    last_seen = cast(datetime, attributes[ATTR_LAST_SEEN])
    seconds = (last_seen - self._prev_seen).total_seconds()
    min_seconds = MIN_SPEED_SECONDS
    if last_ent != prev_ent:
        min_seconds *= 3
    if seconds < min_seconds:
        _LOGGER.debug(
            "%s: Not sending speed & angle (time delta %0.1f < %0.1f)",
            self.name,
            seconds,
            min_seconds,
        )
        return
    meters = cast(float, distance(prev_lat, prev_lon, lat, lon))
    try:
        speed = round(meters / seconds, 1)
    except TypeError:
        _LOGGER.error("%s: distance() returned None", self.name)
    else:
        if speed > MIN_ANGLE_SPEED:
            angle = round(degrees(atan2(lon - prev_lon, lat - prev_lat)))
            if angle < 0:
                angle += 360
if (
    speed is not None
    and self._driving_speed is not None
    and speed >= self._driving_speed
    and self.state == STATE_NOT_HOME
):
    self._location_name = STATE_DRIVING
_LOGGER.debug("%s: Sending speed: %s m/s, angle: %s°", self.name, speed, angle)
async_dispatcher_send(
    self.hass, f"{SIG_COMPOSITE_SPEED}-{self.unique_id}", speed, angle
)

I'm not sure how this is happening. I think I need to add some more DEBUG output to log the intermediate values of meters & seconds (and probably all the inputs to those as well) to see where it's going wrong.

I don't know if the spikes you are seeing are due to this same problem, or if maybe the source location values are wrong (and the resulting speed is correct for those values.)

Hmm...

@pnbruckner
Owner

pnbruckner commented Mar 22, 2024

Oops, never mind. I read the times wrong. I was thinking the samples were 6 minutes apart, but they were really 6 seconds apart. That's why the value was off by a factor of 60. D'oh! I guess I need another cup of coffee!

I just went through the calculations again, and the speed was calculated correctly for those two samples.

Now, the question is, why are those two samples causing a speed of this value??? They are both from the same source, which is Google Maps. Hmm...

@pnbruckner
Owner

The weird thing is, the reported GPS accuracy for that group of samples (from when he was driving) are all around 4 or 5 meters, so fairly accurate (in theory.) Also, when I looked at the points on a map, I did not see an "outlier". The points that corresponded to the spiked speed value were reasonably spaced and in line. Hmm...

@pnbruckner
Owner

pnbruckner commented Mar 22, 2024

So, I crunched the numbers a bit more. I took a group of location points & timestamps, and calculated the distance between each pair of samples (in meters), the time between the pair of samples (in seconds), and the resulting speed in m/s, km/h & mph:

meters sec   m/s  km/h   mph
 457.5  13  35.2 126.7  78.7
1203.2 134   9.0  32.3  20.1
1108.3   6 184.7 665.0 413.2
 901.5  34  26.5  95.5  59.3
1050.7  36  29.2 105.1  65.3
 471.0  20  23.6  84.8  52.7
 501.6  50  10.0  36.1  22.4

The distances seem fine, but the time differences for the 2nd and 3rd rows seem way off. It seems like Google Maps (or the phone itself???) incorrectly time-stamped at least one of those samples. Hmm...
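(For anyone who wants to reproduce that kind of table, here's a quick sketch using HA's distance() helper, the same one used in the calculation further down; the sample points below are placeholders, not the real data from that drive.)

from homeassistant.util.location import distance

# (lat, lon, epoch_seconds) samples -- placeholder values only
samples = [
    (45.0000, -93.0000, 1711000000),
    (45.0040, -93.0010, 1711000013),
    (45.0120, -93.0100, 1711000147),
]

# distance (m), time (s), speed in m/s, km/h and mph for each consecutive pair
for (lat1, lon1, t1), (lat2, lon2, t2) in zip(samples, samples[1:]):
    meters = distance(lat1, lon1, lat2, lon2) or 0.0
    seconds = t2 - t1
    mps = meters / seconds
    print(f"{meters:7.1f} {seconds:4d} {mps:5.1f} {mps * 3.6:6.1f} {mps * 2.23694:6.1f}")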

@jata1
Author

jata1 commented Mar 22, 2024

Thanks so much for looking into this with me.

That you are generally getting good results from GPSLogger at high frequency is interesting, and indicates something is going wrong with the icloud GPS data. Maybe it's something simple, like the icloud api response time taking longer than 15secs.

I have redesigned my approach as follows:

  1. created a template sensor with the GPS info that composite needs - working!
  2. I have a script and automation that forces a GPS locate for icloud3 every 2 mins then waits a bit then updates the template sensor
  3. I have a composite setup using the template sensor that should only update every 2 mins with new GPS data

I will now monitor my custom icloud3 speed sensor and compare it to the stock icloud3 sensor...

If this works, then I have my workaround / solution.

@jata1
Author

jata1 commented Mar 23, 2024

Nooooo. My custom template sensor is being updated whenever the underlying attributes change, so it is basically a clone/copy of the original and has the same issue as my main icloud sensor.

I think the reason is that the custom sensor 'state' is using the state of the main icloud sensor, so I will try changing this next, but I am not confident, as the attributes will be changing in any case.

Man this is so frustrating but I will not give up yet. haha

@jata1
Author

jata1 commented Mar 23, 2024

Looks like this might work better - but need to test.

The trigger condition is met each time the script runs, so hopefully it will only update at the set frequency.

- trigger:
  - platform: state
    entity_id: script.jago_icloud_gps_update
    from:
      - 'on'
    to: 
      - 'off'
  sensor:
    - name: "Jago iCloud GPS"
      unique_id: jago_icloud_gps
      icon: "mdi:map-marker-radius"
      state: "{{ states('device_tracker.jago_comp') }}"            
      attributes:
        latitude: "{{ state_attr('device_tracker.jago_iphone', 'latitude') }}"
        longitude: "{{ state_attr('device_tracker.jago_iphone', 'longitude') }}"
        gps_accuracy: "{{ state_attr('device_tracker.jago_iphone', 'gps_accuracy') }}"
        last_timestamp: "{{ state_attr('device_tracker.jago_iphone', 'last_timestamp') }}"
        last_ran: "{{ states.sensor.jago_icloud_gps.last_updated | as_local }}" 

@jata1
Author

jata1 commented Mar 25, 2024

It's finally working. Composite sensor that uses corrected icloud3 and HA location. Thanks for all the help Phil.

Corrected (composite with HA) and raw icloud in the chart below.

[chart image]

@jata1
Author

jata1 commented Mar 25, 2024

Unfortunately, one lone glitch for my wife. Her data is based only on icloud3, adjusted to 2 min intervals with no other GPS data, so it's a bit strange/unexpected. Not sure what could have gone wrong with the speed calc.

Any ideas?

[chart image]

@pnbruckner
Owner

I'm not saying it's impossible, but I seriously doubt composite is calculating the speed incorrectly. (If it is, though, I'd certainly like to know so I can fix it.)

I think you need to start looking at the data you are feeding into composite. As I said, I have a Python script that can extract data (last_updated, state and optionally attributes) for any entities. I could also whip up another script that could take that output and do the calculations I did above "by hand". Let me know if you're interested.

@jata1
Author

jata1 commented Mar 25, 2024

Phil - thanks. I would like to investigate further. I'm away camping over Easter so will probably investigate when we get back (and will have plenty of driving data then too)!

I have contacted Jeff and he has explained how I can get raw GPS data logs from the icloud3 integration. I think I already have a clue as this issue only seems to affect my wife and I can see that some of her GPS data is less accurate than mine (50m).

Can you share the script and let me know what I need to do to setup testing please?

@jata1
Author

jata1 commented Mar 26, 2024

Just enabled raw logging for icloud3 and having a look at what is produced. I think it looks useful, but there is a lot of noise in the logs so it's not straightforward to analyse. But I think we can use this to look at the end-to-end flow and see if there is anything obviously wrong.

To get these huge speed spikes indicates moving either a large distance in a short time or a smaller distance in a really short period. Neither of these scenarios is happening in reality, but that is what the data is saying - I think/hope.

I have found something in the logs. Happens from time to time when my script runs and the icloud3 normal process runs at a similar time - maybe there is something in this.

Can you do me a quick favour and do a speed calculation between these two GPS locations please? It looks weird, as the times are 4 seconds apart and there is a difference in GPS coordinates. Also, the get-location call timestamps are 2 minutes apart but the GPS data timestamps are only a few seconds apart.

03-26 17:02:02 [pyicloud_ic3:1506] ICLOUD > ──────── FAMSHR DATA - <MAGDA'S IPHONE/MAGDA_IPHONE> ────────
{'▶ITEMS◀ (items)': {'id': 'AdpQUk+gea...', 'modelDisplayName': 'iPhone', 'lostModeCapable': True, 'name': "Magda's iPhone", 'deviceClass': 'iPhone', 'deviceStatus': '200', 'rawDeviceModel': 'iPhone14,2', 'batteryLevel': 0.4399999976158142, 'deviceDisplayName': 'iPhone 13 Pro', 'prsId': 'MTQ1MDE3NTcyNg~~', 'batteryStatus': 'not charging', 'deviceModel': 'iphone13Pro-1-1-0', 'data_source': 'FamShr'}, '▶LOCATION◀ (location)': {'isOld': False, 'isInaccurate': False, 'altitude': 0.0, 'latitude': -33.86737200527077, 'horizontalAccuracy': 13.59509039048701, 'timeStamp': 1711432921789, 'verticalAccuracy': 0.0, 'longitude': 151.20902364724404, 'timestamp': 1711432921, 'location_time': '5:02:01p'}}


03-26 17:04:00 [pyicloud_ic3:1506] GETLOC ⡇ ──────── FAMSHR DATA - <MAGDA'S IPHONE/MAGDA_IPHONE> ────────
{'▶ITEMS◀ (items)': {'id': 'AdpQUk+gea...', 'modelDisplayName': 'iPhone', 'lostModeCapable': True, 'name': "Magda's iPhone", 'deviceClass': 'iPhone', 'deviceStatus': '200', 'rawDeviceModel': 'iPhone14,2', 'batteryLevel': 0.4399999976158142, 'deviceDisplayName': 'iPhone 13 Pro', 'prsId': 'MTQ1MDE3NTcyNg~~', 'batteryStatus': 'not charging', 'deviceModel': 'iphone13Pro-1-1-0', 'data_source': 'FamShr'}, '▶LOCATION◀ (location)': {'isOld': False, 'isInaccurate': False, 'altitude': 0.0, 'latitude': -33.86806048337412, 'horizontalAccuracy': 35.0, 'timeStamp': 1711432925349, 'verticalAccuracy': 0.0, 'longitude': 151.20977275677794, 'timestamp': 1711432925, 'location_time': '5:02:05p'}}

@jata1
Author

jata1 commented Mar 26, 2024

And it correlates with a speed spike in my project/tracking.

[chart image]

@jata1
Author

jata1 commented Mar 27, 2024

Hey Phil - I am trying a different approach with the speed data from icloud3. I have improved my logic and sensors so that I ignore icloud3 GPS data that is too close together. Although this will result in less data from time to time, I'm hoping the data will be clean and fix the issue I have.

I'll update you after Easter! Have a good one!

@pnbruckner
Owner

Sorry, I meant to get back to you, but it looks like maybe Life360's server is accessible again, and I got distracted seeing if I could maybe get that to work again.

Can you share the script and let me know what I need to do to setup testing please?

You can find it here: https://github.com/pnbruckner/homeassistant-config/blob/master/tools/hadb.py

It is a Python3 script & it requires two pypi.org packages, ordered-set & termcolor. I'm using versions 4.1.0 & 2.2.0, respectively.

The script can open and extract data from an SQLite database file, the kind that HA uses by default (i.e., home-assistant_v2.db.) The script must be able to open that file directly, and it can do so, even while HA is running, without conflict.

The ability to use that script depends heavily on the type of HA install you use, how you have it configured, and your familiarity with doing this sort of thing. I'm happy to help. I have the most experience with HA "raw" installs into a Python venv and in a docker container. I have used HA OS, but not much, and I'm not familiar with its advanced usage, so I can't help with that.

Assuming HA is running on Linux, you know how to run stuff from the OS command line, and your HA system is installed and configured such that you can get direct access to the SQLite database file, here's a brief tutorial for how to get started:

$ wget https://github.com/pnbruckner/homeassistant-config/raw/master/tools/hadb.py
$ chmod +x hadb.py
$ python3 -m pip install --user "ordered-set==4.1.0" "termcolor==2.2.0"
$ ./hadb.py -h

That will print out the basic help for the script. Let me know if you have any questions.

Can you do me a quick favour and do a speed calculation between these two GPS locations please?

>>> from homeassistant.util.location import distance
>>> from datetime import datetime
>>> from homeassistant.util.unit_conversion import SpeedConverter
>>> from homeassistant.const import UnitOfSpeed
>>> meters = distance(-33.86737200527077, 151.20902364724404, -33.86806048337412, 151.20977275677794)
>>> seconds = (datetime.fromtimestamp(1711432925349/1000) - datetime.fromtimestamp(1711432921789/1000)).total_seconds()
>>> mps = meters/seconds
>>> kph = SpeedConverter.convert(mps, UnitOfSpeed.METERS_PER_SECOND, UnitOfSpeed.KILOMETERS_PER_HOUR)
>>> mph = SpeedConverter.convert(mps, UnitOfSpeed.METERS_PER_SECOND, UnitOfSpeed.MILES_PER_HOUR)
>>> print(round(meters), round(seconds, 2), round(mps, 1), round(kph), round(mph))
103 3.56 29.0 104 65

Happy Easter! Hope you have a great time on your camping trip!!

@pnbruckner
Owner

FWIW, I noticed some speed spikes from my daughter's iPhone earlier today. I'm tracking it with Google Maps, fed into Composite. I then set up a few statistics sensors, with average_linear, mean & median. I just saw another spike. Here is the raw composite speed with the three filtering sensors:

[chart image]

Light blue is mean, purple is average_linear, and orange is median. To me, median seems to do the trick. For all three I used a sampling_size of 5 and a max_age of 10 minutes.

Also, for the spike I saw earlier, I checked the states, and the history path on the map, and they both showed that the spike was correct for the data obtained from Google Maps. Clearly, though, Google Maps (or maybe iOS) improperly timestamped one of the fixes.

@pnbruckner
Owner

pnbruckner commented Mar 29, 2024

Currently, the speed sensor does forced updates, even if the value isn't changing, to make sure statistics sensors can work properly. However, it doesn't do this when the value is zero. The thought was, why fill up the database with a lot of unchanging values when, for most people I would think, there are usually long periods where they're not moving, especially when they're sleeping?! 😄

But now I think it should always do forced updates, even when the value is zero. I think the recorder has gotten a lot smarter about managing mostly unchanging data. And this will allow the statistics sensors to still have a value (other than "unknown") when the speed is zero for longer than the max_age parameter of the statistics sensors.

@pnbruckner
Owner

I think I'm starting to come around to your way of thinking, i.e., regarding composite doing some filtering itself. 😉

The main reason is, composite has more data than any post-processing (such as with a statistics sensor) will, because it has the raw data, which includes knowing which input each data point came from, but more importantly, the timestamp for each of those data points. Even if I add a timestamp attribute to the speed sensor (which I'm thinking of doing, too), it really wouldn't help, because the standard statistics integration won't use it.

So here is my current thought. Add a speed filter option (so not everyone would be forced to use it.) At first, it will be a simple median filter using the last 5 data points.

5 because each "hiccup" seems to be represented by two consecutive outliers, usually one way too high, and the next too low. A median filter using the last 4 samples should remove the outliers, but then it has to choose from two points. With 5, there will always be a middle point to use. Or maybe I should go with 4 points and just average the two middle points? Using only 4 points as opposed to 5 would reduce the lag. Hmm... (Statistics never were my strong point in college!)
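A minimal sketch of what the 5-point median version could look like (names and structure here are purely illustrative, not the integration's actual code):

from collections import deque
from statistics import median

class SpeedFilter:
    """Keep the last few raw speed values and report their median."""

    def __init__(self, size: int = 5) -> None:
        self._window: deque[float] = deque(maxlen=size)

    def update(self, raw_speed: float) -> float:
        self._window.append(raw_speed)
        # Once the window is full, the median of 5 is always a real data point,
        # and a single high/low "hiccup" pair can never become the output.
        return median(self._window)

# e.g. a sequence like the one in the log above, with its 180.7 m/s spike:
f = SpeedFilter()
for raw in (8.5, 9.0, 180.7, 26.4, 24.9):
    print(raw, "->", f.update(raw))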

Whichever way the filtering works, the main state of the speed sensor would be the filtered value and there would be an attribute with the "raw" speed value. I'd probably also have a timestamp attribute for the filtered value, and one for the raw value, since they won't be the same.

If the filtering option is not used, then the main state would be like before, the raw value, and it would have a timestamp attribute as well.

Anyway, I'll start playing with that. Let me know what you think.

@jata1
Author

jata1 commented Apr 1, 2024

Hey Phil - This sounds amazing. I do agree some processing in composite would be the best way to identify and then drop data that is clearly incorrect. You could look at the calculated speed delta between 2 or 3 points and drop data when you get a spike.

I have been away camping with no internet/mobile at all, so that was nice, but only now can I look back at my travelling speed data. I had my wife in the car with me, so I can do a good comparison of the data from 2 sources at the same time/location and speed.

I'll have a quick look at the data to see if my approach to manage/throttle GPS data before going into composite has worked - although I think the issue is not resolved.

Interesting you managed to get the stats approach working. I did try median with similar age limit / sample size parameters and it didn't work for me but I could have another look.

Thanks also for doing the speed calc on the points above. Definitely a spike that I can see in the graph and then track down to actual data in the icloud3 log so I think (with your help) we can work this all through.

@jata1
Author

jata1 commented Apr 2, 2024

I forgot to ask in my last post whether the median / stats sensor will be a 'rolling' statistic and not require 4-5 GPS data points to create the speed calculation?

The reason I ask is that if it is not a rolling statistic, then each speed calculation would be over a long period of time and unlikely to catch the speeding events I am looking to catch (apart from on long drives on a motorway).

@pnbruckner
Owner

I'm not sure I'm understanding your question.

The existing, standard statistics sensor uses multiple data points. At startup, it will get them from the database (i.e., data points recorded for the input entity before HA was restarted.) If those are not available, then once it has enough data points it will have a valid value (as opposed to unknown.) So, bottom line, it uses the latest data point, and as many before that as defined by the options (sampling_size and/or max_age.)

What I would do in the composite would be similar. It would keep the last 5 "raw" speed values (derived from the input state changes it decided to use), and then apply a median filter to those to determine the speed sensor's state value. At least, that would be my first implementation. It could get more complex at some point if necessary.

@jata1
Author

jata1 commented Apr 3, 2024

Phil - sorry to confuse you. What I am trying to say is that the impact of doing statistics (median) will smooth the speed data and this will hide some of the trends/events that I am actually most interested in showing - e.g. going too fast on local roads for a short period of time (this is not a false positive spike and I need to identify these)

What I would like to work with you on is something like...

  1. Identify and remove any obviously erroneous data - e.g. from 20kmh to 200kmh to 20kmh (in a minute) - then remove the 200kmh data point as it is highly likely to be an error.
  2. Have an actual calculated speed (adjusted for erroneous data).
  3. Have a median calculated speed (adjusted for erroneous data).
  4. A sensor to output actual speed data points, with the calculated median used to fill the erroneous data gaps.
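To make item 1 (plus the median fill from items 3/4) concrete, here's a rough sketch - the ×4 threshold is invented and this is just the idea, not composite code:

from statistics import median

def drop_sandwich_spikes(speeds: list[float], factor: float = 4.0) -> list[float]:
    """Replace any point that is wildly higher than both neighbours with the
    median of its neighbourhood (e.g. 20 -> 200 -> 20 becomes 20 -> 20 -> 20)."""
    cleaned = list(speeds)
    for i in range(1, len(speeds) - 1):
        prev_v, cur, next_v = speeds[i - 1], speeds[i], speeds[i + 1]
        neighbours = max(prev_v, next_v)
        if neighbours > 0 and cur > factor * neighbours:
            cleaned[i] = median((prev_v, cur, next_v))   # fill the gap with the median
    return cleaned

print(drop_sandwich_spikes([20, 22, 200, 21, 23]))  # -> [20, 22, 22, 21, 23]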

@jata1
Author

jata1 commented Apr 3, 2024

Driving to work today, I got a speed spike where the individual speed sensors were fine. I am using a composite made up of my adjusted icloud3 data (updated every 2 minutes) combined with HA app.

It looks like combining multiple 'clean' data sources together can create the spikes. I think it's simply due to how close in time the data points are (with different GPS coordinates). Maybe another 'setting' for the composite is to drop data that is closer than x seconds to the previous data point?

I think this setting would fix the original issue I have with icloud3 (15sec interval when close to home) and potentially catch nearly all the 'erroneous' data I mentioned above...

@pnbruckner
Owner

I see what you're saying now.

The main problem with your proposed solution is it would significantly delay the speed data. If it had to take the last three data points into account for each output, then the output would be delayed by two sample periods. If the samples are two minutes apart, then the speed output would be delayed by four minutes. That would definitely not work for many use cases. E.g., when my garage door integration worked (MyQ, argh!!!) I had an automation that would automatically open the door when I got close to home. One of the conditions was that I was traveling more than 10 mph so that it would not trigger if I was just out for a walk. Delaying the speed by four minutes would definitely not work for this kind of scenario.

Also, if 20 to 200 is "erroneous", why isn't 20 to 100, or 20 to 30, or ...? I.e., where should the cutoff be, that would work in all cases?

I suppose some of the parameters could be controlled by the user, but too many dials and knobs can confuse people, too.

At this point, I'm not sure what makes most sense. And I have a few other pressing things I need to take care of, so I think I'm going to have to put this on the "back burner" for now.

@jata1
Author

jata1 commented Apr 3, 2024

No problem Phil. I totally appreciate all the help you have provided over the past few weeks. I hope you get time to look at this again when you are not so busy.

I'm going to try to hack together a way to drop data that is close together in time (will try to set this to 60secs) and see if this helps if/when using GPS data from multiple sources (icloud and HA app).

I am also trying to just use one GPS source (icloud3) set at a 2min update interval when away from home.

@pnbruckner
Owner

Closing since there hasn't been an update in a while. Not really sure what composite could/should do to consistently & robustly filter/reject large spikes in speed calculation. Feel free to reopen with new data/info.
