Deduplication logic for City Marshal Evictions Dataset #309

sarahJune1 · 2024-01-20T17:30:02Z

The deduplication logic for the evictions dataset must take into account both the court index number and the borough code. The current logic assumes that the court index number is unique across the dataset, so it drops rows that share an index number with a row from a different borough.

The scope of this issue is to change the deduplication key from the court index number alone to the combination of borough code and court index number.
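A minimal sketch of the proposed change, not the actual nycdb implementation. The field names `courtindexnumber` and `borough`, the helper `dedupe_evictions`, and the sample rows are all assumptions made up for illustration. It shows how a composite key keeps rows that a key on court index number alone would drop.

```python
def dedupe_evictions(rows, key_fields=("courtindexnumber", "borough")):
    """Keep the first row seen for each composite key.

    `rows` is an iterable of dicts. The field names are hypothetical
    and may not match the real nycdb schema.
    """
    seen = set()
    unique = []
    for row in rows:
        # Normalize whitespace and case so trivially different spellings
        # of the same value do not produce separate keys.
        key = tuple(str(row[field]).strip().upper() for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Invented sample data: the same index number appears in two boroughs,
# and the Brooklyn row appears twice.
sample_rows = [
    {"courtindexnumber": "12345/19", "borough": "BROOKLYN", "docketnumber": "1"},
    {"courtindexnumber": "12345/19", "borough": "QUEENS", "docketnumber": "2"},
    {"courtindexnumber": "12345/19", "borough": "BROOKLYN", "docketnumber": "1"},
]

# Current behavior (index number only): the Queens row is lost.
old_result = dedupe_evictions(sample_rows, key_fields=("courtindexnumber",))

# Proposed behavior (index number + borough): only the true duplicate is dropped.
new_result = dedupe_evictions(sample_rows)
```

With this sample, the single-field key leaves one row, while the composite key leaves two: one for Brooklyn and one for Queens.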

Additional context:

About 100 rows per month are present in the Open Data source file but missing from nycdb, because the current deduplication logic uses only the court index number.
This dataset is used in the displacement project.

Link to EDA notebook: https://colab.research.google.com/drive/1sLET77zixEa_bDzbaqsWuUwbsUKwhM7z?usp=sharing
