Logic to infer data type and locale specific number formats #478

Open
sshikov opened this issue Jan 19, 2018 · 0 comments

Comments

sshikov commented Jan 19, 2018

private static final Pattern LONG = Pattern.compile("\\d+");
private static final Pattern DOUBLE = Pattern.compile("\\d*\\.\\d*[dD]?");
private static final Pattern FLOAT = Pattern.compile("\\d*\\.\\d*[fF]?");
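To illustrate the limitation, here is a minimal sketch that runs a few sample values through the LONG and DOUBLE patterns. It assumes the patterns are applied to the whole value with `Matcher.matches()`; the class `PatternCheck` and its `fullMatch` helper are hypothetical and not taken from the project.

```java
import java.util.regex.Pattern;

// Hypothetical helper: assumes the inference code matches each pattern against the whole value.
final class PatternCheck {
    // Same patterns as above, with backslashes escaped as Java string literals require.
    static final Pattern LONG = Pattern.compile("\\d+");
    static final Pattern DOUBLE = Pattern.compile("\\d*\\.\\d*[dD]?");

    // Returns true only if the entire value matches the pattern.
    static boolean fullMatch(Pattern p, String value) {
        return p.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"42", "-42", "3.14", "3,14", "1.234,56", "1,234.56", "."};
        for (String s : samples) {
            System.out.printf("%-9s LONG=%-5b DOUBLE=%b%n",
                    s, fullMatch(LONG, s), fullMatch(DOUBLE, s));
        }
    }
}
```

Under that assumption, "-42", "3,14" and "1.234,56" match neither pattern, so they would not be recognized as numbers, while a lone "." does match DOUBLE.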

I suggest that locale-specific number formats should also be supported (for example, "1.234,56" in German notation, where "," is the decimal separator and "." groups thousands). What do you think about custom format recognizers for types like dates, UUIDs, etc.?
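A minimal sketch of what locale-aware number detection and pluggable recognizers could look like. All names here (`Recognizers`, `Recognizer`, `isNumber`, `defaults`, `recognize`) are hypothetical, not part of the project. Numbers are checked with the JDK's `NumberFormat.getInstance(Locale)` and a `ParsePosition`, accepting a value only if the whole string is consumed; dates are assumed to be ISO `yyyy-MM-dd`, and UUIDs are assumed to be in the canonical 8-4-4-4-12 hex form.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch of pluggable format recognizers (not the project's API).
final class Recognizers {
    // Pairs a logical type name with a test applied to a raw CSV value.
    static final class Recognizer {
        final String typeName;
        final Predicate<String> accepts;

        Recognizer(String typeName, Predicate<String> accepts) {
            this.typeName = typeName;
            this.accepts = accepts;
        }
    }

    // Canonical UUID text form: 8-4-4-4-12 hex digits.
    private static final Pattern UUID_PATTERN = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    // True only if the locale's NumberFormat consumes the entire value.
    static boolean isNumber(String value, Locale locale) {
        ParsePosition pos = new ParsePosition(0);
        Number n = NumberFormat.getInstance(locale).parse(value, pos);
        return n != null && pos.getIndex() == value.length();
    }

    // Assumes ISO-8601 calendar dates such as 2018-01-19.
    static boolean isIsoDate(String value) {
        try {
            LocalDate.parse(value, DateTimeFormatter.ISO_LOCAL_DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Recognizers are tried in list order; the first one that accepts the value wins.
    static List<Recognizer> defaults(Locale locale) {
        return List.of(
                new Recognizer("date", Recognizers::isIsoDate),
                new Recognizer("uuid", v -> UUID_PATTERN.matcher(v).matches()),
                new Recognizer("number", v -> isNumber(v, locale)));
    }

    // Falls back to "string" when no recognizer accepts the value.
    static String recognize(String value, List<Recognizer> recognizers) {
        for (Recognizer r : recognizers) {
            if (r.accepts.test(value)) {
                return r.typeName;
            }
        }
        return "string";
    }
}
```

One caveat of this approach: `NumberFormat` parsing is lenient about grouping separators (with `Locale.US`, "3,14" parses as 314), so a real recognizer would probably need stricter checks on top of it.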

Also, it looks like only the first line of the CSV is used for type inference, not the first 25 as expected:
private static final int DEFAULT_INFER_LINES = 25;

I have a file where, in the first data row, a column such as "device model" contains only digits, while in the second row its value also contains letters. The inferred schema contains the union type "null", "string", and the import failed on that same second row.
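For comparison, a minimal sketch of inferring one column's type from up to `DEFAULT_INFER_LINES` sampled values, widening the type whenever a later row does not fit. It reuses the LONG and DOUBLE patterns quoted above; the `ColumnInference` class, the `Kind` enum and the `infer`/`widen` methods are hypothetical, and nullability is simplified (an empty value is treated as the narrowest kind rather than tracked as a separate "null" union branch).

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: per-column type inference over several sampled rows with widening.
final class ColumnInference {
    // Ordered from narrowest to widest; a column's inferred kind can only move right.
    enum Kind { NULL, LONG, DOUBLE, STRING }

    static final int DEFAULT_INFER_LINES = 25;
    private static final Pattern LONG = Pattern.compile("\\d+");
    private static final Pattern DOUBLE = Pattern.compile("\\d*\\.\\d*[dD]?");

    // Classifies a single raw value.
    static Kind kindOf(String value) {
        if (value == null || value.isEmpty()) return Kind.NULL;
        if (LONG.matcher(value).matches()) return Kind.LONG;
        if (DOUBLE.matcher(value).matches()) return Kind.DOUBLE;
        return Kind.STRING;
    }

    // The wider of the two kinds wins, so an empty value never narrows a column.
    static Kind widen(Kind a, Kind b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    // Looks at up to maxLines values of the column, not just the first one.
    static Kind infer(List<String> columnValues, int maxLines) {
        Kind result = Kind.NULL;
        int limit = Math.min(maxLines, columnValues.size());
        for (int i = 0; i < limit; i++) {
            result = widen(result, kindOf(columnValues.get(i)));
        }
        return result;
    }
}
```

With a "device model" column such as `List.of("12345", "SM-G930F")`, reading only the first line (`maxLines = 1`) yields `LONG`, while reading up to `DEFAULT_INFER_LINES` lines widens the column to `STRING`, so the letters in the second row no longer conflict with the inferred type.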

sshikov changed the title from "Logic to infer data type and locale specific bumber formats" to "Logic to infer data type and locale specific number formats" on Jan 19, 2018.