how to create example.log file used in the performance measurements #132

Closed
tstack opened this issue Feb 14, 2024 · 2 comments

Comments

@tstack

tstack commented Feb 14, 2024

The performance section mentions a log file used for testing. Can you provide a link to it, or describe what you used to generate it?

I'd like to try it out with lnav to see how it performs and if there is anything I can improve.

Thanks

@pamburus
Owner

Hi, unfortunately I cannot share this log file because it contains private and confidential information.
I used it to measure performance because it is big enough and quite typical for my everyday use cases.

But I can gather some statistics about it and share them. In the performance section I shared only the total file size and the number of log lines. Line lengths vary from 109 to 763670 bytes. I think this is the most important information describing the source, and most real log files of similar size would show similar performance. If you need additional statistics, I can easily collect them. For example, I've just collected the distribution of line lengths and of the number of top-level keys per line.

Here is the distribution of the number of keys:

Occurrences  Keys per line
        160  16
         42  15
      70743  14
      28024  13
       3643  12
       9027  11
     386409  10
    1464520   9
      60525   8
     117789   7
    3813411   6
      17038   5

The data was collected using the following command:

jq 'length' example.log | sort -rn | uniq -c >example.nkeys

Every line has at least 5 keys: "level", "ts", "logger", "msg", "caller".

Here is the distribution of line lengths: example.len.zip.
The data was collected using the following command:

awk '{print length}' example.log | sort -rn | uniq -c >example.len
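For anyone who wants to gather the same two statistics without jq or awk, here is a minimal Python sketch that collects both histograms in a single pass. It assumes one JSON object per line; the function names are hypothetical, and line lengths are counted in bytes, excluding the trailing newline.

```python
import json
from collections import Counter


def collect_stats(lines):
    """Return (key_counts, line_lengths) histograms for JSON log lines.

    `lines` is any iterable of bytes objects, e.g. a file opened in "rb" mode.
    """
    key_counts = Counter()    # number of top-level keys -> occurrences
    line_lengths = Counter()  # line length in bytes -> occurrences
    for raw in lines:
        line = raw.rstrip(b"\n")
        line_lengths[len(line)] += 1
        if line.strip():  # only parse non-empty lines
            key_counts[len(json.loads(line))] += 1
    return key_counts, line_lengths


def format_histogram(hist):
    """Format as 'occurrences value' rows, largest value first."""
    return [f"{hist[value]:>8} {value}" for value in sorted(hist, reverse=True)]


# Tiny in-memory sample (invented records, not taken from example.log).
sample = [
    b'{"level":"info","ts":1,"logger":"a","msg":"x","caller":"c"}\n',
    b'{"level":"warn","ts":2,"logger":"a","msg":"y","caller":"c","extra":1}\n',
]
key_counts, line_lengths = collect_stats(sample)
print("\n".join(format_histogram(key_counts)))
```

To run it on a real file, pass an open binary file handle to `collect_stats`, for example `collect_stats(open("example.log", "rb"))`.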

Feel free to ask me to collect any additional statistics.
You could write a script to generate synthetic data based on these statistics, or I can write one when I have time.
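As a starting point for such a script, here is a rough Python sketch. It samples the number of keys per line from the distribution in the table above and always emits the five common keys. Everything else is an assumption made for illustration: the field values, the `extraN` key names, and the timestamp format are invented, and the line-length distribution (109 to 763670 bytes) is not modeled at all, so the output will not reproduce the original file's performance characteristics exactly.

```python
import json
import random

REQUIRED_KEYS = ("level", "ts", "logger", "msg", "caller")

# Keys per line -> occurrences, copied from the distribution above.
KEY_COUNT_WEIGHTS = {
    16: 160, 15: 42, 14: 70743, 13: 28024, 12: 3643, 11: 9027,
    10: 386409, 9: 1464520, 8: 60525, 7: 117789, 6: 3813411, 5: 17038,
}


def make_record(rng, n_keys):
    """Build one record with exactly n_keys top-level keys (values are invented)."""
    record = {
        "level": rng.choice(["debug", "info", "warn", "error"]),
        "ts": round(rng.uniform(1.70e9, 1.71e9), 3),  # assumed: epoch seconds
        "logger": rng.choice(["api", "db", "worker"]),
        "msg": "synthetic message " + str(rng.randint(0, 10**6)),
        "caller": f"module/file.go:{rng.randint(1, 999)}",
    }
    # Pad with hypothetical extra keys until the target key count is reached.
    for i in range(n_keys - len(record)):
        record[f"extra{i}"] = rng.randint(0, 10**6)
    return record


def generate(n_lines, seed=0):
    """Yield n_lines compact JSON lines following the key-count distribution."""
    rng = random.Random(seed)
    counts = list(KEY_COUNT_WEIGHTS)
    weights = list(KEY_COUNT_WEIGHTS.values())
    for n_keys in rng.choices(counts, weights=weights, k=n_lines):
        yield json.dumps(make_record(rng, n_keys), separators=(",", ":"))


for line in generate(3):
    print(line)
```

Writing the generator's output to a file, one line per record, gives a log with a similar key-count profile. Matching the line-length distribution as well would require padding the values according to the data in example.len.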

@pamburus
Owner

pamburus commented May 31, 2024

I found another open dataset, transformed it slightly and made the measurements.

Source file

Web robot detection - Server logs

Transformation command

pv -c -N input <public_v2.json | jq -c 'to_entries[] | {"request-id": .key} + .value + {"response": (.value.response | tonumber), "bytes": (.value.bytes | tonumber), level: "info"}' | (pv -c -N output >web-robot.log)
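To make the jq filter easier to follow, here is a rough Python equivalent of the per-record transformation, leaving out the pv progress reporting. It assumes the input is a single JSON object that maps request IDs to record objects, which is what `to_entries[]` implies. The sample record and its `uri` field are invented for illustration; only `response` and `bytes` are named in the command.

```python
import json


def to_number(value):
    """Rough stand-in for jq's tonumber: numbers pass through, strings are parsed."""
    if isinstance(value, (int, float)):
        return value
    try:
        return int(value)
    except ValueError:
        return float(value)


def transform(dataset):
    """Yield one flat record per entry of a {request_id: record} mapping."""
    for request_id, record in dataset.items():
        yield {
            "request-id": request_id,               # the key becomes a field
            **record,                               # original fields, in order
            "response": to_number(record["response"]),
            "bytes": to_number(record["bytes"]),
            "level": "info",                        # constant level for every line
        }


# Invented sample input, not taken from the actual dataset.
sample = {"req-1": {"response": "200", "bytes": "512", "uri": "/index.html"}}
for out in transform(sample):
    print(json.dumps(out, separators=(",", ":")))
```

As in the jq version, later keys override earlier ones while keeping their original position, so `response` and `bytes` stay where they were in the source record but become numbers.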

Notes

  • hlogf 1.4.1 had issues with parsing this file, so it was excluded

Measurements

graph_3249171545.pdf

Raw details

❯ hl --version
hl 0.29.5

❯ time hl web-robot.log -c -o /dev/null
hl web-robot.log -c -o /dev/null  12.06s user 0.72s system 930% cpu 1.374 total

# ---

❯ humanlog --version
humanlog version 0.7.6+deb0543

❯ time humanlog <web-robot.log --color always >/dev/null
humanlog> reading stdin...
humanlog --color always < web-robot.log > /dev/null  92.02s user 4.38s system 108% cpu 1:28.76 total

# ---

❯ fblog --version
fblog 4.10.0

❯ time fblog web-robot.log >/dev/null
fblog web-robot.log > /dev/null  25.85s user 1.65s system 98% cpu 28.001 total

❯ time fblog -d web-robot.log >/dev/null
fblog -d web-robot.log > /dev/null  131.54s user 14.14s system 99% cpu 2:26.33 total

# ---

❯ wc -cl web-robot.log
 4091155 3312769299 web-robot.log

❯ sysctl -n machdep.cpu.brand_string
Apple M1 Max
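For a quick comparison, the wall-clock times above can be turned into approximate throughput figures. The sketch below only does arithmetic on the numbers reported in this comment (file size and line count from `wc -cl`, wall time from the "total" column). These are single runs with different flags per tool, so treat the results as rough indicators rather than a benchmark.

```python
FILE_BYTES = 3_312_769_299  # from `wc -cl web-robot.log`
FILE_LINES = 4_091_155

# Wall-clock ("total") times in seconds, copied from the output above.
WALL_SECONDS = {
    "hl": 1.374,
    "humanlog": 88.76,   # 1:28.76
    "fblog": 28.001,
    "fblog -d": 146.33,  # 2:26.33
}


def throughput_mb_s(seconds, size=FILE_BYTES):
    """Throughput in MB/s (1 MB = 10**6 bytes)."""
    return size / seconds / 1e6


def relative_to(tool, baseline="hl"):
    """How many times longer `tool` took than `baseline`."""
    return WALL_SECONDS[tool] / WALL_SECONDS[baseline]


print(f"average line length: {FILE_BYTES / FILE_LINES:.0f} bytes")
for tool, seconds in WALL_SECONDS.items():
    print(f"{tool:<9} {throughput_mb_s(seconds):8.1f} MB/s "
          f"{relative_to(tool):6.1f}x hl wall time")
```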
