
Datefinder mistakenly identifies "pre-qualification may" as a date, adding "on may" to the list of dates #187

Open
MichelRobitaille opened this issue Apr 14, 2023 · 0 comments


In any sentence where one word ends with "on" and the next word (possibly after a comma) is, or begins with, the name or abbreviation of a month such as "may", the list of dates will contain a spurious entry like "on may". Clearly "on may" is not a date.
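A minimal sketch of how this kind of false positive can arise. The month alternation and the `NAIVE`/`BOUNDED` patterns below are hypothetical and are NOT datefinder's actual regex; they only show that matching an "on" prefix and a month name without word boundaries reproduces exactly the hits reported here, while adding `\b` anchors removes them.

```python
import re

# Hypothetical month alternation, for illustration only (not datefinder's
# real pattern). Full names come before abbreviations so "march" is tried
# before "mar".
MONTHS = (
    r"(?:january|february|march|april|may|june|july|august|september|"
    r"october|november|december|jan|feb|mar|apr|jun|jul|aug|sept|sep|oct|nov|dec)"
)

# Naive: "on" may be the tail of a longer word ("qualificati-on") and the
# month may be the head of a longer word ("mar-ital").
NAIVE = re.compile(r"on[\s,]+" + MONTHS, re.IGNORECASE)

# Bounded: \b forces both "on" and the month to be whole words.
BOUNDED = re.compile(r"\bon[\s,]+" + MONTHS + r"\b", re.IGNORECASE)

TEXT = (
    "On February 10, 2012, DHI Mortgage became aware of the breach. "
    "People who provided their information online for pre-qualification may "
    "have had their names, contact information, marital status exposed."
)

print(NAIVE.findall(TEXT))    # ['On February', 'on may', 'on, mar']
print(BOUNDED.findall(TEXT))  # ['On February']
```

The sketch only covers the prefix and month part of a match, not full date extraction; the point is that word boundaries on both sides are what separate "On February" from "on may" and "on, mar".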
You may use the following sentence as a test case:
"On February 10, 2012, DHI Mortgage became aware that a software security breach by external sources had occurred in its Internet Loan Prequalification System. DHI Mortgage immediately isolated the affected server, purged certain affected files, and modified the electronic security measures. People who provided their information online for pre-qualification may have had their names, Social Security numbers, dates of birth, contact information, marital status, employment information, income, asset information, and liability information exposed."
The list of dates will have:
['On February 10, 2012', 'on may', 'on, mar']

Clearly only the first date is correct and the last two are erroneously added.
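Until the matching itself is fixed, one possible user-side workaround is to discard matches whose source text contains no digit at all. This is a minimal sketch under that assumption; `plausible_date_strings` is a hypothetical helper, not part of datefinder.

```python
import re

# Hypothetical post-filter (not part of datefinder): keep only matches whose
# source text contains at least one digit, i.e. a day or a year.
_HAS_DIGIT = re.compile(r"\d")


def plausible_date_strings(sources):
    """Drop digit-free matches such as 'on may' or 'on, mar'."""
    return [s for s in sources if _HAS_DIGIT.search(s)]


reported = ["On February 10, 2012", "on may", "on, mar"]
print(plausible_date_strings(reported))  # ['On February 10, 2012']
```

To apply it to datefinder output, the source strings would need to be requested alongside the dates, e.g. via `datefinder.find_dates(text, source=True)`, assuming that option yields `(datetime, matched_text)` pairs in the installed version. The trade-off is that legitimate digit-free matches, such as a bare month name, are discarded as well.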

Also, is there a way to resolve the following warning? It appears at import time when datefinder is imported in Python:
import datefinder

C:\Users\User\anaconda3\lib\site-packages\dateutil\parser\_parser.py:1207: UnknownTimezoneWarning: tzname PDT identified but not understood. Pass tzinfos argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
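The warning is emitted by dateutil, not by datefinder itself, and its own text says the proper fix is to pass a `tzinfos` mapping to the parser. If that is not reachable through datefinder, a stopgap is to silence only this specific message with the standard `warnings` module. A minimal sketch, assuming the message text stays as shown in the traceback; `suppress_unknown_tz_warnings` is a hypothetical helper name.

```python
import contextlib
import warnings

# Matches the text of the warning shown above, e.g.
# "tzname PDT identified but not understood. Pass tzinfos argument ..."
TZNAME_MESSAGE = r"tzname \S+ identified but not understood"


@contextlib.contextmanager
def suppress_unknown_tz_warnings():
    """Ignore only the 'unknown tzname' warning inside the with-block."""
    with warnings.catch_warnings():
        # 'message' is a regex matched against the start of the warning text;
        # every other warning is still reported normally.
        warnings.filterwarnings("ignore", message=TZNAME_MESSAGE)
        yield
```

Usage would be to wrap the import, `with suppress_unknown_tz_warnings(): import datefinder`. This only hides the warning: since the message says a future version will raise an exception instead, supplying `tzinfos` (for example mapping "PDT" to a UTC-7 offset) remains the durable fix.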

Thanks in advance.
Regards,
