Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query performance tests / unit tests for queries #109

Open
fionera opened this issue Dec 12, 2022 · 1 comment
Open

Query performance tests / unit tests for queries #109

fionera opened this issue Dec 12, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@fionera
Copy link
Collaborator

fionera commented Dec 12, 2022

Because of the lookups that where implemented in the last commits, the queries got way slower. A proper testing toolkit should be built and test the existing queries to prevent introduction of performance issues.

Example

Currently:

SELECT DISTINCT
    SamplerAddress,
    SamplerToString(SamplerAddress) AS Sampler
FROM flows_raw

Query id: b394549a-1803-4271-95e2-c318f3dd8b77

┌─SamplerAddress───┬─Sampler────────────┐
│ ::ffff:127.0.0.1 │ netmeta.fionera.de │
└──────────────────┴────────────────────┘

1 row in set. Elapsed: 50.286 sec. Processed 741.91 million rows, 11.87 GB (14.75 million rows/s., 236.06 MB/s.)

Original:

SELECT DISTINCT SamplerAddress
FROM flows_raw

Query id: 1f1779a9-1d90-4164-b4de-170db889a5bf

┌─SamplerAddress───┐
│ ::ffff:127.0.0.1 │
└──────────────────┘

1 row in set. Elapsed: 1.501 sec. Processed 741.92 million rows, 11.87 GB (494.32 million rows/s., 7.91 GB/s.)

Solution:

SELECT SamplerToString(SamplerAddress) AS Sampler
FROM
(
    SELECT DISTINCT SamplerAddress
    FROM flows_raw
)

Query id: 5c80bb3c-dea1-417d-b1fb-bc50b229ef75

┌─Sampler────────────┐
│ netmeta.fionera.de │
└────────────────────┘

1 row in set. Elapsed: 1.522 sec. Processed 741.92 million rows, 11.87 GB (487.55 million rows/s., 7.80 GB/s.)
@leoluk leoluk added the enhancement New feature or request label Dec 21, 2022
@leoluk leoluk changed the title Query performance tests / Unit-Tests for Queries Query performance tests / unit tests for queries Dec 21, 2022
@UnamedRus
Copy link

BTW, if you don't use LIMIT functions, it can make sense to look into GROUP BY instead of DISTINCT

https://kb.altinity.com/altinity-kb-queries-and-syntax/distinct-vs-group-by-vs-limit-by/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants