Option to limit the number of rows output (head, tail, slice) #669

Open
jasperboyd opened this issue Oct 14, 2016 · 8 comments

Comments

@jasperboyd

Hey, I have a large CSV that I'm looking to break up into smaller chunks to perform an import. Is this functionality already available with csvcut or csvgrep? I was reviewing the docs, but I didn't see any way to do it.

If this isn't currently possible I'd be happy to contribute it to the project.

Cheers

@jpmckinney
Member

So, a sort of csvhead or csvtail?

@jasperboyd
Author

Yeah, exactly. The way I was envisioning it, you would define the number of rows you want in each section: 1,000 on a 10,000-row CSV would create 10 files, with the option to include the header row in each file. However, creating csvhead and csvtail functionality would probably be a good middle step.
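To make the chunking idea concrete, here is a rough sketch of splitting rows into fixed-size groups with the header repeated in each one. It uses only Python's standard `csv` module, not csvkit; `split_rows` and its parameters are hypothetical names made up for illustration, and the sample data is generated in memory rather than read from a real file.

```python
import csv
import io
from itertools import islice


def split_rows(reader, chunk_size, include_header=True):
    """Yield lists of rows, each holding at most chunk_size data rows.

    The first row from reader is treated as the header and, if requested,
    repeated at the top of every chunk.
    """
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    while True:
        # Pull the next chunk_size rows without loading the whole file.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield ([header] + chunk) if include_header else chunk


# Demo on an in-memory CSV with 10 data rows, split into chunks of 4.
sample = "id,name\n" + "".join(f"{i},row{i}\n" for i in range(10))
chunks = list(split_rows(csv.reader(io.StringIO(sample)), chunk_size=4))
for number, chunk in enumerate(chunks, start=1):
    print(f"chunk {number}: {len(chunk) - 1} data rows")
```

Each yielded chunk could then be written to its own file with `csv.writer`, which would cover the "10 files from 10,000 rows" case.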

@jpmckinney
Member

We try to avoid creating files so that people can pipe the output of one tool into another, so I think we should start with csvhead (and/or csvtail). However, you may just want to write your own bit of code using agate, on which csvkit relies.
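As a rough illustration of the pipe-friendly approach, a head-style filter can be written as a function that copies rows from one stream to another. This sketch deliberately uses Python's standard `csv` module instead of agate to stay dependency-free; `csv_head` is a hypothetical name, not an existing csvkit tool.

```python
import csv
import io
from itertools import islice


def csv_head(infile, outfile, n):
    """Copy the header row and the first n data rows from infile to outfile."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    # islice stops after n + 1 rows (header plus n data rows), so the rest
    # of the input is never read.
    writer.writerows(islice(reader, n + 1))


# In-memory streams stand in for stdin and stdout in this demo.
source = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
result = io.StringIO()
csv_head(source, result, n=2)
print(result.getvalue(), end="")
```

In an actual pipeline the same function could be called with `sys.stdin` and `sys.stdout`, so no intermediate files are created.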

@jasperboyd
Author

Sounds good to me!

@superurbi

+1

@onyxfish
Collaborator

I'd seriously consider a csvsplit or csvhead or csvslice or some-such as an additional tool. Proposed interface:

csvslice -s [START_ROW] -n [NUM_ROWS] input.csv

It can hook into agate's Table.limit.

Not sure if/how it should handle the "gimme 50 files" case. Probably with a flag.
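A minimal sketch of how the proposed `-s` / `-n` interface might behave, for discussion only. The flag names come from the proposal above; everything else is an assumption: `-s` is treated as a 0-based count of data rows to skip, the header is always kept, and `itertools.islice` from the standard library stands in for agate's `Table.limit`. `build_parser` and `slice_rows` are hypothetical helper names.

```python
import argparse
import csv
import io
from itertools import islice


def build_parser():
    parser = argparse.ArgumentParser(prog="csvslice")
    # Assumption: START_ROW counts data rows to skip, starting from 0.
    parser.add_argument("-s", dest="start", type=int, default=0,
                        help="data rows to skip before output (assumed 0-based)")
    parser.add_argument("-n", dest="num", type=int, default=None,
                        help="maximum data rows to output (default: all)")
    parser.add_argument("path", help="input CSV file")
    return parser


def slice_rows(reader, start=0, num=None):
    """Yield the header, then up to num data rows after skipping start rows."""
    header = next(reader, None)
    if header is None:
        return  # empty input
    yield header
    stop = None if num is None else start + num
    yield from islice(reader, start, stop)


args = build_parser().parse_args(["-s", "2", "-n", "3", "input.csv"])
# In-memory data stands in for args.path so the sketch runs without a file.
data = io.StringIO("id\n" + "".join(f"{i}\n" for i in range(10)))
rows = list(slice_rows(csv.reader(data), args.start, args.num))
print(rows)
```

With these assumed semantics, `-n` alone behaves like a head, and `-s` alone drops leading rows and keeps the rest.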

@onyxfish changed the title from "Ability to Split CSV by Row?" to "csvhead / csvslice / csvsplit" on Dec 29, 2016

@jpmckinney
Member

Re: -s [START_ROW], that will be a common option across almost all tools: #775

@jpmckinney changed the title from "csvhead / csvslice / csvsplit" to "csvtail to drop and take rows" on May 21, 2018
@jpmckinney changed the title from "csvtail to drop and take rows" to "Option to limit the number of rows output" on Oct 17, 2023
@jpmckinney mentioned this issue on May 2, 2024
@jpmckinney changed the title from "Option to limit the number of rows output" to "Option to limit the number of rows output (head, tail, slice)" on May 2, 2024

@jpmckinney
Member

Noting that if there is demand, we can take inspiration from https://github.com/dannguyen/csvmedkit/blob/main/csvmedkit/utils/csvslice.py


4 participants