Date Cleaning (clean_date) fails to clean dates containing 'August' #968

Open
DaedalusInMaze opened this issue Jul 12, 2023 · 1 comment
Labels: type: bug

Comments

DaedalusInMaze commented Jul 12, 2023

Describe the bug
When cleaning dates with the clean_date function, any source date containing 'August' is not recognized as a date. All other month names, including the abbreviation 'Aug', are identified and cleaned correctly.

To Reproduce

from dataprep.clean import clean_date
import pandas as pd
samp = pd.DataFrame({'date': ['2021 August 21', '2021 Aug 21', '2021 July 21', '2021 Jul 21', '2021 08 21', 'Aug 21 2021']})
clean_date(samp, 'date')

Expected behavior
For example, '2021 August 21' should be cleaned to '2021-08-21 00:00:00'.
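As a reference point for the expected output, here is a minimal sketch that uses only the standard library's datetime.strptime. The list of candidate formats is my own assumption for this sample, not the set of formats dataprep's clean_date tries internally. Every value from the reproduction sample, including '2021 August 21', parses to the same kind of 'YYYY-MM-DD HH:MM:SS' value shown above.

```python
from datetime import datetime

# Candidate formats covering the sample values. This list is an assumption
# for illustration, not dataprep's internal format list.
CANDIDATE_FORMATS = ["%Y %B %d", "%Y %b %d", "%Y %m %d", "%b %d %Y"]


def parse_date(text):
    """Return the first successful parse of text, or None if no format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # This format does not fit; try the next one.
            continue
    return None


samples = ["2021 August 21", "2021 Aug 21", "2021 July 21",
           "2021 Jul 21", "2021 08 21", "Aug 21 2021"]
for s in samples:
    # str(datetime) renders as 'YYYY-MM-DD HH:MM:SS'
    print(f"{s!r:18} -> {parse_date(s)}")
```

%B matches full month names and %b matches abbreviations, so '2021 August 21' and '2021 Aug 21' are handled by different entries in the list. Both names are read in the default C locale.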

Desktop:

  • OS: Windows 11
  • Browser: N/A
  • Platform: VSCode
  • Platform Version: 1.80.0
  • Python Version: 3.10.11
  • Dataprep Version: 0.4.5

Additional context
I noticed that a separate issue is already open for the 'FutureWarning: Meta is not valid' warning.

DaedalusInMaze added the type: bug label Jul 12, 2023
DaedalusInMaze commented Jul 12, 2023

The issue might be in tokens = split(date, JUMP), since 'st' is in the JUMP list and 'August' contains 'st'.
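Here is a minimal sketch of the suspected mechanism. It assumes the split treats each JUMP entry as a substring to cut on anywhere in the text. The JUMP subset and the split_tokens helper are hypothetical stand-ins, not dataprep's actual code. 'August' is the only English month name that contains 'st', which would explain why only the full name fails while 'Aug' and every other month work.

```python
import re

# Hypothetical subset of separator tokens; 'st' is included as the comment reports.
JUMP = [" ", ",", "-", "/", "st"]

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def split_tokens(text, seps):
    """Hypothetical splitter: cut the text at every occurrence of any separator."""
    pattern = "|".join(re.escape(s) for s in seps)
    # Drop empty strings produced by adjacent separators.
    return [tok for tok in re.split(pattern, text) if tok]


print(split_tokens("2021 August 21", JUMP))  # ['2021', 'Augu', '21']
print(split_tokens("2021 Aug 21", JUMP))     # ['2021', 'Aug', '21']

# Only one month name contains the substring 'st'.
print([m for m in MONTHS if "st" in m])      # ['August']
```

If this is the cause, 'st' is presumably in JUMP to strip ordinal suffixes such as '21st'. One untested direction would be to treat it as a separator only when it directly follows a digit, for example with the lookbehind pattern (?<=\d)st.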
