NEFBMP2012-19 - Duplicates in source data #1

Open
MelinaHoule opened this issue Jun 8, 2022 · 1 comment
Labels
duplicate This issue or pull request already exists Temporary upload Uploaded data that have unanswered issues

Comments

@MelinaHoule
Collaborator

Duplicates are identified by matching Location, Date/Time, Species, Abundance and Protocol (distance/duration).
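As a rough illustration of that rule, here is a minimal Python sketch that counts how often each key combination occurs. It assumes each record is a dict; the field names in `KEY_FIELDS` and the distance/duration values in the sample rows are placeholders, not the actual column names or values in the source sheet.

```python
from collections import Counter

# Hypothetical field names standing in for Location, Date/Time, Species,
# Abundance and the protocol columns (distance/duration).
KEY_FIELDS = ("location", "datetime", "species", "abundance", "distance", "duration")


def find_duplicates(rows):
    """Return {key: count} for every key combination seen more than once."""
    counts = Counter(tuple(row[f] for f in KEY_FIELDS) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Illustrative rows only; the protocol values "d1" and "t1" are placeholders.
sample_rows = [
    {"location": "601", "datetime": "2015-06-20", "species": "BADO",
     "abundance": 1, "distance": "d1", "duration": "t1"},
    {"location": "601", "datetime": "2015-06-20", "species": "BADO",
     "abundance": 1, "distance": "d1", "duration": "t1"},
    {"location": "602", "datetime": "2015-06-20", "species": "BADO",
     "abundance": 1, "distance": "d1", "duration": "t1"},
]

duplicates = find_duplicates(sample_rows)
```

Rows that differ in any one of the key fields (here, the third row's location) are not flagged.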

NEFBMP2012-19 has duplicates in the source data (sheet: Bird Data).

Example: Point_number = 601;
Observer: Wildgust, Allon;
Date: 2015-06-20;
Species: BADO;

We treat them as duplicates for now.

We are waiting to hear back from our partner to confirm whether they are real duplicates or whether we should sum them to give an abundance of 2.

@MelinaHoule MelinaHoule added the Temporary upload Uploaded data that have unanswered issues label Jun 8, 2022
@MelinaHoule MelinaHoule self-assigned this Jun 8, 2022
@MelinaHoule MelinaHoule added the duplicate This issue or pull request already exists label Jun 22, 2022
@MelinaHoule
Collaborator Author

Answer from the data partner:
"I don't believe these are duplicate entries--all data were checked with original field sheets after data entry occurred, so these raw data should be summed. Of course I can't rule out the possibility that there was only one BADO which was double-entered, and then that error was missed during the error-checking process, but that would be an isolated occurrence and very unlikely to happen."

Duplicates occur 10% of the time, so they can't be considered isolated occurrences. I propose that we sum those rows.
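A minimal sketch of what summing those rows could look like, assuming each record is a dict. The field names in `GROUP_FIELDS` and the protocol values in the sample rows are placeholders, not the real column names. Abundance is left out of the grouping key so that it can be added up.

```python
from collections import defaultdict

# Hypothetical field names; abundance is deliberately excluded from the key.
GROUP_FIELDS = ("location", "datetime", "species", "distance", "duration")


def sum_abundance(rows):
    """Collapse rows sharing the same key into one row with summed abundance."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[f] for f in GROUP_FIELDS)
        totals[key] += row["abundance"]
    return [dict(zip(GROUP_FIELDS, key), abundance=total)
            for key, total in totals.items()]


# Illustrative rows only; "d1" and "t1" are placeholder protocol values.
double_entry = [
    {"location": "601", "datetime": "2015-06-20", "species": "BADO",
     "abundance": 1, "distance": "d1", "duration": "t1"},
    {"location": "601", "datetime": "2015-06-20", "species": "BADO",
     "abundance": 1, "distance": "d1", "duration": "t1"},
]

summed = sum_abundance(double_entry)
```

With these two sample rows, the result is a single BADO record with an abundance of 2, matching the partner's reading of the data.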

Another case of duplicates exists: 24 rows have identical attributes except for detection_cues. Since detection cues are recorded in the extended table, I propose that we sum the abundance in the survey table but keep the rows separate in the extended table to record the proper behavior. The abundance attribute is found in both tables. To avoid confusion, we may need to rename abundance in the extended table to reflect that difference.
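One way this split could work is sketched below in Python, assuming each record is a dict. Everything named here is hypothetical: the `SURVEY_KEY` fields, the `cue_abundance` column (a stand-in for the renamed abundance in the extended table) and the cue values in the sample rows.

```python
from collections import defaultdict

# Hypothetical key identifying one observation in the survey table.
SURVEY_KEY = ("location", "datetime", "species")


def build_tables(rows):
    """Return (survey, extended): summed abundance vs. one row per cue."""
    survey_totals = defaultdict(int)
    extended = []
    for row in rows:
        key = tuple(row[f] for f in SURVEY_KEY)
        survey_totals[key] += row["abundance"]
        # Keep each source row in the extended table, with abundance stored
        # under a distinct (hypothetical) name to avoid confusion.
        extended.append({
            **dict(zip(SURVEY_KEY, key)),
            "detection_cues": row["detection_cues"],
            "cue_abundance": row["abundance"],
        })
    survey = [dict(zip(SURVEY_KEY, key), abundance=total)
              for key, total in survey_totals.items()]
    return survey, extended


# Illustrative rows only; the cue values "cue_a" and "cue_b" are placeholders.
cue_rows = [
    {"location": "601", "datetime": "2015-06-20", "species": "BADO",
     "abundance": 1, "detection_cues": "cue_a"},
    {"location": "601", "datetime": "2015-06-20", "species": "BADO",
     "abundance": 1, "detection_cues": "cue_b"},
]

survey_table, extended_table = build_tables(cue_rows)
```

The survey table ends up with one row carrying the total, while the extended table keeps one row per detection cue.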
