detection_id is not unique within get_acoustic_detections() #283

Open
PietrH opened this issue Jun 20, 2023 · 6 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@PietrH
Member

PietrH commented Jun 20, 2023

There is a test that checks whether get_acoustic_detections() returns unique detection_id values. This test sometimes fails, because setting limit = TRUE doesn't always return the same records.

To replicate:

limited_detections <- etn::get_acoustic_detections(limit = TRUE)

limited_detections
#> # A tibble: 100 × 20
#>    detection_id date_time           tag_serial_number acoustic_tag_id
#>           <int> <dttm>              <chr>             <chr>          
#>  1     89123998 2014-07-01 09:41:46 141002734         A69-1206-2734  
#>  2     89123998 2014-07-01 09:41:46 141002734         A69-1206-2734  
#>  3     89123999 2014-07-01 09:43:10 141002734         A69-1206-2734  
#>  4     89123999 2014-07-01 09:43:10 141002734         A69-1206-2734  
#>  5     89124000 2014-07-01 09:53:34 141002734         A69-1206-2734  
#>  6     89124000 2014-07-01 09:53:34 141002734         A69-1206-2734  
#>  7     89124001 2014-07-01 10:00:12 141002734         A69-1206-2734  
#>  8     89124001 2014-07-01 10:00:12 141002734         A69-1206-2734  
#>  9     89124002 2014-07-01 10:03:05 141002734         A69-1206-2734  
#> 10     89124002 2014-07-01 10:03:05 141002734         A69-1206-2734  
#> # ℹ 90 more rows
#> # ℹ 16 more variables: animal_project_code <chr>, animal_id <int>,
#> #   scientific_name <chr>, acoustic_project_code <chr>, receiver_id <chr>,
#> #   station_name <chr>, deploy_latitude <dbl>, deploy_longitude <dbl>,
#> #   sensor_value <dbl>, sensor_unit <chr>, sensor2_value <dbl>,
#> #   sensor2_unit <chr>, signal_to_noise_ratio <int>, source_file <chr>,
#> #   qc_flag <chr>, deployment_id <int>

These detections also have the same timestamp, tag_serial_number, acoustic_project_code, ...

image

The package treats this as a bug. Is this behaviour intentional?
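To make the check concrete, here is a minimal sketch in base R that lists the rows sharing a detection_id. It runs on a small made-up data frame rather than on the real database, so the values and project codes are assumptions for illustration only.

```r
# Toy stand-in for the result of get_acoustic_detections(); all values are made up.
detections <- data.frame(
  detection_id = c(1L, 1L, 2L, 3L),
  animal_project_code = c("project_a", "project_b", "project_a", "project_a"),
  stringsAsFactors = FALSE
)

# IDs that occur more than once
duplicated_ids <- unique(detections$detection_id[duplicated(detections$detection_id)])

# All rows (not only the second occurrence) that carry one of those IDs
duplicated_rows <- detections[detections$detection_id %in% duplicated_ids, ]
n_duplicated_ids <- length(duplicated_ids)

print(duplicated_rows)
```

Looking at all rows for a duplicated ID, rather than only the repeats, makes it easier to see which columns actually differ between them.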

@PietrH PietrH added question Further information is requested help wanted Extra attention is needed labels Jun 20, 2023
@PietrH
Member Author

PietrH commented Jun 20, 2023

@PieterjanVerhelst @jreubens @cfmuniz Would you expect get_acoustic_detections() to always return unique detection_id's?

PietrH added a commit that referenced this issue Jun 20, 2023
@PieterjanVerhelst
Collaborator

PieterjanVerhelst commented Jun 21, 2023

Do you mean unique detection ids, or unique records with duplicates removed in the background? In the latter case, it would be good if the function kept only one record from each set of duplicates, but then we have to define the columns on which the function should identify duplicates. A first thought would be tag_serial_number, date_time and station_name.
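As a rough illustration of that suggestion, the sketch below removes duplicates based on tag_serial_number, date_time and station_name using base R. The data frame is made up, and this is not how the package currently behaves; it only shows what deduplicating on those three columns would do.

```r
# Toy data; values are invented for illustration.
detections <- data.frame(
  detection_id = c(1L, 1L, 2L),
  tag_serial_number = c("t1", "t1", "t1"),
  date_time = as.POSIXct(
    c("2014-07-01 09:41:46", "2014-07-01 09:41:46", "2014-07-01 09:43:10"),
    tz = "UTC"
  ),
  station_name = c("s1", "s1", "s1"),
  animal_project_code = c("project_a", "project_b", "project_a"),
  stringsAsFactors = FALSE
)

# Columns that define a duplicate (the proposal from the comment above)
key_cols <- c("tag_serial_number", "date_time", "station_name")

# Keep the first row of each key combination, drop the rest
deduplicated <- detections[!duplicated(detections[, key_cols]), ]

print(deduplicated)
```

Note that this keeps whichever row comes first, so the animal_project_code that survives depends on row order.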

@PietrH
Member Author

PietrH commented Jun 22, 2023

I mean unique detection ids. I wasn't expecting duplicates like those in the screenshot above, where the same detection id appears with different animal_ids and animal project codes, but with the same deploy latitude and longitude and date_time.

@PieterjanVerhelst
Collaborator

Aah now I see! What is the definition of a detection ID? I find it strange that there are duplicate detection IDs, but each with a different animal_project_code. Normally, each detection ID should only have one animal_project_code. Unless my understanding of detection ID is wrong.

@peterdesmet
Member

I think detection_id is assigned uniquely by the database to each detection. The detections we retrieve are the result of a view though, where each detection is joined with a tag and animal using acoustic_tag_id (or alternative_acoustic_tag_id). Since some tags (across projects) have the same acoustic_tag_id, the join duplicates the row. The detections view is designed smartly enough to only join to a project within the time range that the tag was used, but when a tag closing date (e.g. battery_estimated_end_date) is left open, ranges can overlap.

Is this the cause here?
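The mechanism described above can be sketched with two toy tables in base R. This is not the actual view definition: the table layout, the tag_start and tag_end columns, and the rule that an open end date counts as "still in use" are all assumptions made for illustration.

```r
# One detection, assigned a unique detection_id by the database (toy values).
detections <- data.frame(
  detection_id = 1L,
  acoustic_tag_id = "TAG-1",
  date_time = as.POSIXct("2014-07-01 09:41:46", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Two tags from different projects sharing the same acoustic_tag_id.
# tag_start / tag_end are hypothetical column names; the second tag has
# no closing date, so its range overlaps with the first.
tags <- data.frame(
  acoustic_tag_id = c("TAG-1", "TAG-1"),
  animal_project_code = c("project_a", "project_b"),
  tag_start = as.POSIXct(c("2014-01-01", "2010-01-01"), tz = "UTC"),
  tag_end = as.POSIXct(c("2015-01-01", NA), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Join on acoustic_tag_id, then keep rows where the detection falls within
# the tag's time range (assumption: an open end date means no upper bound).
joined <- merge(detections, tags, by = "acoustic_tag_id")
in_range <- joined$date_time >= joined$tag_start &
  (is.na(joined$tag_end) | joined$date_time <= joined$tag_end)
view_rows <- joined[in_range, ]

print(view_rows)
```

Under these assumptions, the single detection comes back twice, once per project, with the same detection_id and date_time but a different animal_project_code.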

@PieterjanVerhelst
Collaborator

OK, that makes sense. I don't think this is the cause here, because the detection date_time is the same. I interpret this as one detection being listed under two animal_project_codes.
My guess at what is going on: could it be that some projects are registered in both the OTN database and the ETN database, each with a different animal_project_code, and since ETN links to OTN, both are shown as duplicates?
