Speed issue #269

Open
budddemarketing opened this issue Apr 30, 2024 · 1 comment

Comments

@budddemarketing

I got this working for an Access query that I need to export to JSON each month.
I did a TOP 10 for testing and it worked well, finishing in seconds.

Now I've tried it with 60k records and ~20 columns, and while I wasn't expecting Speed Racer, it's slow.
I have let it run for hours so far and haven't gotten anywhere.

Maybe making a temp table first, instead of running against the query, would help a little, but that's grasping at straws.

When I stepped through it, nothing jumped out at me.
Is anyone else running this with larger record sets?

@Nick-vanGemeren

First, a bit of maths.

  • To process 60k rows in 1 hour, each row has to be handled in 3600/60000 = 0.06 seconds, i.e. about 16 rows per second. That is for the whole process from database to output, not just the JSON conversion.
  • With 20 columns, each of the 1.2M values has to be processed within 3 milliseconds.
  • If the rows are converted to JSON Objects and each value produces an average of 16 characters of key–value output, the final Unicode string will be around 39 MB (1.2M values × 16 characters × 2 bytes each). A significant part of that is just repetition of the keys.

So if your 10-row test took 6 seconds instead of 0.6, the 60k rows should take about 10 hours rather than 1. If the data is coming from the database into Excel as a block, then into VBA structures, then into JSON, I don’t think much can be done.

The killer will be that large output string. Issue #203 showed that problems can occur with strings of 10M characters. So maybe your process failed after 6 hours with an ‘out of string space’ error or similar.

It may be possible to rework the internal ‘buffer’ mechanism to minimise using large strings internally, but the large final string would still remain. So I don’t know if that would help much.

You haven’t mentioned why you need JSON output. CSV output will be roughly half the size and can be loaded directly into Excel. Maybe your database can export directly in CSV or JSON format.
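For example, if the source is a saved Access query, something along these lines (untested sketch; the query name, path and Sub name are placeholders) exports it to a delimited file directly, without building any strings in VBA:

```vb
' Sketch only: export a saved Access query to CSV via DoCmd.TransferText.
' "MyExportQuery" and the output path are placeholder names.
Sub ExportQueryAsCsv()
    ' acExportDelim writes a delimited text file; the final True adds a header row of field names
    DoCmd.TransferText acExportDelim, , "MyExportQuery", "C:\Temp\export.csv", True
End Sub
```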

If you really have to produce JSON output programmatically, it may be best to write to a file with one ‘record’ per row, each starting with a character that builds up the top-level JSON Array:

	[<JSON for row 1>
	,<JSON for row 2>
	…
	,<JSON for row n>
	]

It may still take a long time, but there will be no issues with ballooning strings or task memory.
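As a rough, untested sketch of that approach (assuming DAO and this library’s JsonConverter.ConvertToJson; the query name, file path and Sub name are placeholders), something like this converts one row at a time and appends it to the file, so no large string ever builds up in memory:

```vb
' Untested sketch: stream an Access query to a JSON Array file one row at a time.
' "MyExportQuery", the output path and the Sub name are placeholders.
Sub ExportQueryAsJsonArray()
    Dim rs As DAO.Recordset
    Dim fld As DAO.Field
    Dim dict As Object
    Dim fileNum As Integer
    Dim rowCount As Long

    Set rs = CurrentDb.OpenRecordset("MyExportQuery", dbOpenSnapshot)

    fileNum = FreeFile
    Open "C:\Temp\export.json" For Output As #fileNum

    Do Until rs.EOF
        ' Build a small Dictionary for this row only, so just one row
        ' is ever held as a JSON string at a time
        Set dict = CreateObject("Scripting.Dictionary")
        For Each fld In rs.Fields
            dict(fld.Name) = fld.Value
        Next fld

        ' First record opens the top-level Array; later records get a leading comma
        If rowCount = 0 Then
            Print #fileNum, "[" & JsonConverter.ConvertToJson(dict)
        Else
            Print #fileNum, "," & JsonConverter.ConvertToJson(dict)
        End If
        rowCount = rowCount + 1

        rs.MoveNext
    Loop

    Print #fileNum, "]"
    Close #fileNum
    rs.Close
End Sub
```

Note that Print # writes ANSI text, so if you need UTF-8 or UTF-16 output you would have to swap in a different writer (ADODB.Stream, for example), and the sketch assumes the query returns at least one row.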
