
Is the entire file loaded into memory to be queried? Would prefer reading line by line, not loading the whole file (10GB file) #313

Open
therobyouknow opened this issue Apr 11, 2023 · 2 comments


@therobyouknow

I have a 10GB TSV file that I'd like to query using SQL commands.

As a tab-separated values (TSV) file, it is spreadsheet-like in that it has headings/columns and rows; it's effectively a single-table database.

At 10GB, I'd prefer not to read the whole file into memory all at once, both because of the time it takes and because of my machines' memory limits (though I do have 16GB, 24GB and 32GB machines).

Can you advise whether the TSV is loaded entirely into memory when running queries on it?

@harelba
Owner

harelba commented Apr 12, 2023

Hi, sorry for the late reply.

q does load the data into memory for processing, but it has an automatic caching feature which might help with large files.

If you run q with the -C readwrite parameter, a cache for each file will be generated automatically (if the cache file doesn't already exist). The first execution will be slow, but all subsequent executions of q against that file will be extremely fast.

To create the cache file, you could run q -t -C readwrite "select count(*) from original-file.tsv". This will create another file called original-file.tsv.qsql.

After this preparation step (which will take a considerable time for a 10GB file), you will be able to do either of the following:

  • Run additional q commands which use original-file.tsv (with either -C readwrite or -C read as additional params). The cache file will be detected automatically and used, speeding up queries considerably.
  • Run additional q commands directly against the .qsql file, for example q -t "select ... from original-file.tsv.qsql where ...". These will use the cache file directly and will not even require the original TSV file. Since you'll be using the .qsql file directly, you can also copy it to another machine and use it there, without having to copy the original file. (See the sketch after this list.)
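To make this concrete, here is a minimal sketch of the full sequence. It assumes the file is named original-file.tsv and has no header row (in which case q should name the columns c1, c2, ...); the query bodies are just placeholders, so adjust the column names and filters to your data.

    # One-time preparation: build the cache (this will take a while for a 10GB file).
    # It creates original-file.tsv.qsql alongside the original file.
    q -t -C readwrite "select count(*) from original-file.tsv"

    # Later queries against the TSV detect and use the cache automatically.
    q -t -C read "select count(*) from original-file.tsv where c1 = 'some-value'"

    # Or query the .qsql cache file directly; the original TSV is not needed for this.
    q -t "select c1, count(*) from original-file.tsv.qsql group by c1"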

Hope that will help. I'd appreciate it if you can write down your impressions of the speedup here after testing this.

@therobyouknow
Author

Thank you @harelba

Hope that will help. I'd appreciate it if you can write down your impressions of the speedup here after testing this.

I will let you know! Thank you!
