Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split large data inserts #445

Closed
chicco785 opened this issue Feb 4, 2021 · 4 comments
Closed

split large data inserts #445

chicco785 opened this issue Feb 4, 2021 · 4 comments
Assignees

Comments

@chicco785
Copy link
Contributor

Is your feature request related to a problem? Please describe.

When a large set of data is received, if doing a single insert, crate bombs:

ERROR:app:Exception on /v2/notify [POST]
Traceback (most recent call last):
  File "/src/ngsi-timeseries-api/src/translators/sql_translator.py", line 295, in _insert_entity_rows
    self.cursor.executemany(stmt, rows)
  File "/usr/local/lib/python3.8/site-packages/crate/client/cursor.py", line 67, in executemany
    self.execute(sql, bulk_parameters=seq_of_parameters)
  File "/usr/local/lib/python3.8/site-packages/crate/client/cursor.py", line 53, in execute
    self._result = self.connection.client.sql(sql, parameters,
  File "/usr/local/lib/python3.8/site-packages/crate/client/http.py", line 331, in sql
    content = self._json_request('POST', self.path, data=data)
  File "/usr/local/lib/python3.8/site-packages/crate/client/http.py", line 458, in _json_request
    _raise_for_status(response)
  File "/usr/local/lib/python3.8/site-packages/crate/client/http.py", line 187, in _raise_for_status
    raise ProgrammingError(error.get('message', ''),
crate.client.exceptions.ProgrammingError: SQLActionException[CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [1570329861/1.4gb], which is larger than the limit of [1530082099/1.4gb], usages [request=222986240/212.6mb, fielddata=0/0b, in_flight_requests=196118812/187mb, query=10660857/10.1mb, jobs_log=71309008/68mb, operations_log=67110264/64mb, accounting=10343710/9.8mb]]

Describe the solution you'd like

Split insert in smaller batches for multi notifications.

Describe alternatives you've considered

Fixing on the client side currently.

Additional context
N/A

@c0c0n3
Copy link
Member

c0c0n3 commented Feb 5, 2021

This goes hand in hand with #193

@chicco785
Copy link
Contributor Author

@c0c0n3 fixing the inject to db with a sub batches should not be much pain, can you have a look? the issue occurred in prod on ekz

@c0c0n3 c0c0n3 mentioned this issue Feb 10, 2021
8 tasks
@amotl
Copy link

amotl commented Feb 10, 2021

Hi @c0c0n3 and @chicco785,

first of all, I want to salute you for #450. Then, I wanted to take the chance to inform you that there is a new asyncpg release, which just happened a few hours ago, see [1]. This includes the long awaited patch for improved executemany() performance wrt. batch operations, see MagicStack/asyncpg#295.

We are currently looking into how that can improve performance when talking to CrateDB using the PostgreSQL wire protocol. So, while I recognize QuantumLeap is still using the HTTP interface, that might well be an option to look into within a future iteration.

Whether Flask/Werkzeug will support asyncio natively is obviously the other side of the story in order to leverage most out of asyncpg. Things like [2] and [3] are obviously just workarounds and should not yet be considered for production use. Also, experiments like [4] or [5] are also, well, experimental. However, people at pallets/flask#3339 and pallets/werkzeug#1322 are discussing the real thing on this matter.

With kind regards,
Andreas.

[1] https://github.com/MagicStack/asyncpg/releases/tag/v0.22.0
[2] https://jdhao.github.io/2020/06/07/asyncio_inside_flask/
[3] https://stackoverflow.com/questions/47841985/make-a-python-asyncio-call-from-a-flask-route
[4] https://github.com/miguelgrinberg/aioflask
[5] https://flask-aiohttp.readthedocs.io/

cc @mfussenegger, @seut

@c0c0n3
Copy link
Member

c0c0n3 commented Feb 11, 2021

Hi @amotl, thanks for your support and appreciation.

QuantumLeap is still using the HTTP interface, that might well be an option to look into within a future iteration

Yes, definitely. Just give us a shout when you guys are done w/ those postgres wire protocol improvements, we'll definitely consider switching to asyncpg then so we can kill two birds w/ one stone. Btw, thanks for letting us know about asyncpg, judging by the pr you referenced the guys there are doing an awesome job!

Whether Flask/Werkzeug will support asyncio natively is obviously the other side of the story

Yep, spot on. We looked into that a while back and got stuck since not all libs in our stack were designed to work w/ an event loop.

people at pallets/flask#3339 and pallets/werkzeug#1322 are discussing the real thing on this matter.

Exciting news, thanks for sharing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants