
csvsql: agate-sql: add option to use data's maximum varchar length #928

Open

StudentForever opened this issue Jan 31, 2018 · 7 comments

@StudentForever

Hi

Not sure if I am passing the right flags, but I noticed that the column sizes in the CREATE TABLE query default to the maximum, e.g. VARCHAR(max). Is there an option so that column sizes are inferred from the actual data?


cat test.csv
tag,tag2,Date,col1,col2,col3,col4,col5
DFAN8086,sss,12/24/2012,0.9454054,0.391277,0.696738598,0.963471,0.422531
FDG491BT1088,wew,12/24/2012,0.922601887,0.3125817,0.991514,0.740261,0.3096907
GEW00328745,tere,12/24/2012,0.341763918,0.587643,0.834997,0.611671,0.3686074
GWW02993182,qewqeqw,12/24/2012,0.078535992,0.497771,0.394849,0.4730659,0.4074645


csvsql -i mssql test.csv
CREATE TABLE test (
        tag VARCHAR(max) NOT NULL,
        tag2 VARCHAR(max) NOT NULL,
        [Date] DATE NOT NULL,
        col1 DECIMAL(38, 9) NOT NULL,
        col2 DECIMAL(38, 7) NOT NULL,
        col3 DECIMAL(38, 9) NOT NULL,
        col4 DECIMAL(38, 7) NOT NULL,
        col5 DECIMAL(38, 7) NOT NULL
);
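
For context, a minimal sketch of the inference being asked for, assuming we already know which columns are text (csvsql determines this itself during type inference): scan the CSV once, track the longest value per text column, and emit VARCHAR(n) with those lengths. This is not part of csvsql; the column selection is hard-coded for the sample file above.

import csv

# Assumption: these are the text columns in test.csv above
# (csvsql would discover them via its own type inference).
TEXT_COLUMNS = ["tag", "tag2"]

def max_varchar_lengths(path, columns):
    """Return {column: longest value length} after one pass over the CSV."""
    lengths = {col: 0 for col in columns}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for col in columns:
                lengths[col] = max(lengths[col], len(row[col]))
    return lengths

if __name__ == "__main__":
    for col, n in max_varchar_lengths("test.csv", TEXT_COLUMNS).items():
        print(f"{col} VARCHAR({n}) NOT NULL")
    # For test.csv above, this prints VARCHAR(12) for tag and VARCHAR(7) for tag2.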

@StudentForever StudentForever changed the title csvsql field size csvsql field size defaults to varchar(max) instead of inferring from data ? Jan 31, 2018
@jpmckinney jpmckinney changed the title csvsql field size defaults to varchar(max) instead of inferring from data ? agate-sql: mssql: csvsql field size defaults to varchar(max) instead of inferring from data Feb 2, 2018
@jpmckinney
Member

Different databases have different default behavior in agate-sql (which csvsql uses); it seems mssql defaults to max. We can add a flag to csvsql to calculate the max length from the data. Some users don't want this, because they may later add new data whose fields are longer, or they may not want csvsql to do any inference on the data (which can be slow for large files).
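
A rough sketch of what such a flag could do under the hood, assuming the lengths have already been computed by a one-pass scan like the one above. This uses plain SQLAlchemy rather than agate-sql's actual internals, so treat the table construction as illustrative:

from sqlalchemy import Column, Date, MetaData, Numeric, Table, VARCHAR
from sqlalchemy.dialects import mssql
from sqlalchemy.schema import CreateTable

# Hypothetical: lengths computed from the data, e.g. by the scan sketched earlier.
computed = {"tag": 12, "tag2": 7}

table = Table(
    "test",
    MetaData(),
    # VARCHAR with an explicit length instead of mssql's VARCHAR(max) default.
    Column("tag", VARCHAR(computed["tag"]), nullable=False),
    Column("tag2", VARCHAR(computed["tag2"]), nullable=False),
    Column("Date", Date, nullable=False),
    Column("col1", Numeric(38, 9), nullable=False),
)

# Emits CREATE TABLE test (tag VARCHAR(12) NOT NULL, tag2 VARCHAR(7) NOT NULL, ...)
print(CreateTable(table).compile(dialect=mssql.dialect()))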

@jpmckinney jpmckinney changed the title agate-sql: mssql: csvsql field size defaults to varchar(max) instead of inferring from data agate-sql: mssql: csvsql add option to calculate maximum varchar length Feb 2, 2018
@StudentForever
Author

@jpmckinney Thanks for reviewing the request.

regards
kiran

@Freewilly3d

I'm interested in something like this for Oracle also. Another option would be to set a default for varchar on the command line like varchar2(1000).

Thanks
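
A command-line default would be simpler to implement than inference, since no data scan is needed. A sketch of the idea, with the length standing in for a hypothetical flag such as --varchar-length 1000 (no such csvsql option exists today; this is only the suggestion above). SQLAlchemy's Oracle dialect renders a String with an explicit length as VARCHAR2:

from sqlalchemy import Column, MetaData, String, Table
from sqlalchemy.dialects import oracle
from sqlalchemy.schema import CreateTable

# Hypothetical: length supplied by the user on the command line.
default_length = 1000

table = Table(
    "test",
    MetaData(),
    Column("tag", String(default_length), nullable=False),
)

# The Oracle dialect renders String(1000) as VARCHAR2(1000 CHAR)
# (char vs. byte semantics depend on SQLAlchemy version and settings).
print(CreateTable(table).compile(dialect=oracle.dialect()))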

@jpmckinney jpmckinney changed the title agate-sql: mssql: csvsql add option to calculate maximum varchar length agate-sql: csvsql add option to provide / calculate maximum varchar length Apr 22, 2018
@jpmckinney jpmckinney changed the title agate-sql: csvsql add option to provide / calculate maximum varchar length agate-sql: csvsql add option to use data's maximum varchar length May 21, 2018
@jpmckinney jpmckinney changed the title agate-sql: csvsql add option to use data's maximum varchar length csvsql: agate-sql: add option to use data's maximum varchar length Oct 17, 2023
@chriszrc

I really thought csvsql used to do this already, at least for postgres. Finding the max lengths for values in multiple columns and setting reasonable varchar limits is a pain; I would really like this to be automated.

@jpmckinney
Member

Why do you need limits when using Postgres? The current behavior with Postgres is to set no limit.

@jpmckinney
Member

jpmckinney commented Mar 29, 2024

Postgres documents no performance concern with unlimited varchar, so the only reason I can imagine is database-level data validation (though validation is often better handled in code): https://www.postgresql.org/docs/current/datatype-character.html

See also https://stackoverflow.com/a/4849030
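
To make the trade-off concrete: with VARCHAR(n), Postgres itself rejects over-long values (an INSERT of an 11-character string into a varchar(10) column fails with "value too long for type character varying(10)"), whereas with unlimited varchar the equivalent check has to live in application code. A minimal sketch of that code-side alternative; the column-to-length mapping is hypothetical:

# Hypothetical application-side validation, the alternative to
# enforcing lengths in the database schema.
MAX_LENGTHS = {"tag": 12, "tag2": 7}  # assumed limits for the sample table

def validate_row(row: dict) -> None:
    """Raise ValueError if any value exceeds its column's length limit."""
    for col, limit in MAX_LENGTHS.items():
        if len(row[col]) > limit:
            raise ValueError(
                f"{col}: value {row[col]!r} exceeds VARCHAR({limit}) limit"
            )

validate_row({"tag": "DFAN8086", "tag2": "sss"})  # passes silently
validate_row({"tag": "X" * 20, "tag2": "sss"})    # raises ValueError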

@chriszrc

@jpmckinney Yes, it's for data validation; it has saved me more than once from erroneous data being inserted into wrong/mismatched columns.
