TX and Requests wrong with incremental processing #2667
Comments
Thanks for providing details about the possible issue. Can you try replicating it with a smaller dataset? Lastly, would you be able to share the smaller, reproducible dataset with me, either by uploading it here or sending it to me via email, so I can replicate it on my end?
Thanks for your quick response!
Command used here for the test on the smaller log, slightly sanitized:

```shell
# initial run
goaccess access_log.1 --geoip-database=./GeoLite2-ASN.mmdb --config-file=./goaccess.conf --html-report-title="Server Statistics" -o ./report.html --anonymize-ip --jobs=2 --db-path=./db --persist

# subsequent runs
goaccess access_log --geoip-database=./GeoLite2-ASN.mmdb --config-file=./goaccess.conf --html-report-title="Server Statistics" -o ./report.html --anonymize-ip --jobs=2 --db-path=./db --persist --restore --keep-last=14
```

In case the order of arguments is important there: when setting up the script for the server, I moved the "global options" (i.e. those identical for all servers) into variables, like (again slightly sanitized):

```shell
GeoIp="--geoip-database=$GeoIpPath/GeoLite2-ASN.mmdb"
GlobalOpts="--config-file=$BINDIR/goaccess.conf $GeoIp --anonymize-ip --jobs=2 --persist --restore --keep-last=14"
goaccess $LogBase/$vhost/$LogName $GlobalOpts --html-report-title="$vhost" -o $OutDir/${vhost}.html --db-path=$DBBaseDir/$vhost
```
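For context, a complete per-vhost wrapper built from those variables might look something like the sketch below. All paths and vhost names are placeholders of mine, not from the actual setup, and the goaccess calls are echoed rather than executed so the snippet is a dry run:

```shell
# Hypothetical per-vhost wrapper assembled from the variables above; all
# paths and vhost names are placeholders. `echo` keeps this a dry run --
# remove it to invoke goaccess for real.
BINDIR=.; GeoIpPath=.
LogBase=/var/log/httpd; LogName=access_log
OutDir=./out; DBBaseDir=./db

GeoIp="--geoip-database=$GeoIpPath/GeoLite2-ASN.mmdb"
GlobalOpts="--config-file=$BINDIR/goaccess.conf $GeoIp --anonymize-ip --jobs=2 --persist --restore --keep-last=14"

for vhost in example.org example.net; do
  echo goaccess "$LogBase/$vhost/$LogName" $GlobalOpts \
    --html-report-title="$vhost" -o "$OutDir/${vhost}.html" \
    --db-path="$DBBaseDir/$vhost"
done
```

The important detail is the separate `--db-path` per vhost, so each site keeps its own on-disk database.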
Absolutely, using a smaller dataset or log with fewer data entries or requests would definitely simplify the debugging process. If the issue occurs consistently, it's likely to appear even with a smaller log. Feel free to send it over to hello [at] goaccess.io. Also, please share your config file. Thanks
Will do, thanks. It will take a little while, though, to run the tests – but I'll surely come over with the data. If you can let me know where to send the mail, that is – I'd rather not post the full logs (with IPs and all) publicly 😉 If you prefer, you can also mail me directly: construct my address from my nick shown (the shorter one) 'at' my login, plus the DE TLD.
OK, a short update in between from the server of the test runs above. I refrained from any "additional runs" for a full day. The "wrong values" indeed only affected the one day when I did the additional runs, and went back to normal after that. So the "wrong day" will "age out" thanks to the "rolling window": the 16th (the day with the alleged 1.2 GB xfer) was when I had the multiple runs. Before and after are looking fine. Still using the very same DB.
I haven't seen any emails come through. Go ahead and send them if you'd like. By the way, did you run a smaller test to see if you could recreate the problem?
😉 OK, meanwhile I found the mail address. The main error in this regard is on my end, though: I didn't yet get to run the tests, apologies. Not forgotten – I was just drowning in other tasks… Did that now, and wow, looks like I got a quite extreme example – with a log file of just 140 kB. I was able to more than reproduce it with just 3 runs:
At this point, I prepared the files for the mail. Then I ran another initial run. Results: xfer 41.32 MB, requests 692 – as expected. I will now prepare and send the mail with the files. Expect the mail in a few minutes (need to zip the collection up, get it from the server I ran the test on, and write up the mail). Edit: mail sent.
Mail bounced, sorry: GMail refused your forward. You seem to have your listed address forwarding all incoming mail to a gmail address, delivering it to gmail in the name of the original sender (i.e. "mail from me" arrives there as "delivered by your server"), which breaks SPF, as your domain is not listed in my domain's SPF records (why should it be?). I've now sent it again to the gmail address the bounce report mentioned. Hope it gets through this time. Please let me know when you get it. (That said, gmail doesn't behave well with many small providers. It would be good if there was a mail account with a non-monopolized, "trust-able" provider 😉)
Got the mail. I'll check on this. Thanks!
Just curious, did you find the culprit already?
So, I spent some time tinkering with it over the weekend, and I couldn't reproduce any issues with the access log and config file you sent. Here's what I did:
and then I ran it repeatedly using
Each time, I got identical results, except for some slight changes in percentages due to the overall hits/visitors totals changing. If you're still encountering the problem, could you walk me through, step by step, how you're triggering it using the exact tar.gz file you shared with me?
Now that is strange. I was able to immediately reproduce it (on Linux). My friend in the US could as well (on BSD). Steps as outlined above (and in my mail, IIRC). The only difference I see at a quick glance is that we used HTML reports – but that should not be relevant to the data themselves, IMHO. Let me quote the full command line (sanitized) again to make sure:

```shell
# initial run
goaccess access_log.1 --geoip-database=./GeoLite2-ASN.mmdb --config-file=./izzy.conf --html-report-title="Server Statistics" -o output/report.html --anonymize-ip --jobs=2 --db-path=./webdb --persist

# subsequent runs
goaccess access_log --geoip-database=./GeoLite2-ASN.mmdb --config-file=./izzy.conf --html-report-title="Server Statistics" -o output/report.html --anonymize-ip --jobs=2 --db-path=./webdb --persist --restore --keep-last=14
```

That's what I used. My friend used it without the …
I noticed you've got two log files, access_log.1 and access_log. But in the tar.gz you sent over, I only see access_log. Please feel free to share access_log.1. It'll help me dig into this. Thanks. |
Oof, that one I copied and sanitized from my notes (for "initializing a new vhost"). The exact steps for my test case are described in …
while I specified several additional parameters (again sanitized here, the real values you have in the file):
Blind guess would be …
Thanks Gerardo! I don't understand it. The only difference I can see is that you ran each subsequent run on a "static log file" (as I only sent you the log with the latest state), while during my runs the file was (minimally) appended to¹. I'll see to finding some time for another test run here, copying the logs in between to make 100% reproducible runs possible (with the same changes in the files). If I can reproduce it again (which I assume), I'll send you another mail with all the log file copies. Again, thanks a lot for digging in!

¹ As the text file with the test protocol mentions how many hits/lines were added to the log, you should be able to reproduce the original states by removing the corresponding lines and then using each corresponding copy. That should then fully reproduce my test run. Though I'm really perplexed that it doesn't show even the slightest hint of the changes I saw here…
PS: I was able to reproduce again. Mail with the 3 logs and the protocol will go out to you in a few minutes. Stats in short:
Interesting: 670 + 2 = 672. I assumed that if I only ran test1.log and test3.log, I'd end up with the same results. But no: that gives 262.8 MiB and 1314 requests 🤷‍♂️ Btw: running test3.log another time doesn't change a thing anymore.
Got the files, thanks for sharing those. I understand now. So basically, the current setup relies on the inode. Since each of those three logs has a different inode, it treats them as separate logs and parses them individually. But if you pipe data directly into goaccess, you should see the results you're after.
The thing is, when data is piped, it operates on timestamps without knowing the inode or the last line processed. Now, if you're wondering why this doesn't happen when you directly pass a log to goaccess, well, it's because that would be super time-consuming. It'd have to go through each and every line again, comparing timestamps. Right now, it just jumps straight to the last line it read and carries on from there, which is much faster. Let me know if that makes sense. |
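A tiny demonstration of the inode point (this is plain filesystem behavior, not goaccess internals; GNU `stat -c` is assumed):

```shell
# Log rotation via rename keeps the inode, so an on-disk restore can jump
# straight to the saved offset; a copy (or a pipe) yields no matching inode,
# leaving only timestamp comparison.
printf 'GET / 200 123\n' > orig.log
ino_orig=$(stat -c %i orig.log)
cp orig.log copied.log        # same bytes, brand-new inode
mv orig.log rotated.log       # new name, same inode
echo "original: $ino_orig"
echo "copied:   $(stat -c %i copied.log)"
echo "rotated:  $(stat -c %i rotated.log)"
```

The piped variant mentioned above would then be something like `cat access_log | goaccess - --persist --restore …`, where no inode is available at all.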
That's only for this test run today. In the last test in April it was the very same file the webserver appended to, with the same results. I just copied the different "stages" to separate files to make it easier for you to reproduce. So basically, it is as I described before. The funny thing is that this only happens when I have multiple runs in short sequence (until now, I only had that with multiple runs within the same hour). I do not notice it when running once every 24 h. I have not tried once per hour yet, though (which could be a common setup again, with a cron job).
To elaborate, are you suggesting that by parsing test1.log and executing goaccess with `--persist`, then adding the 28 lines from test2.log and rerunning goaccess with `--persist` and `--restore`, and finally appending the additional 2 lines from test3.log to test1.log and again running goaccess with `--persist` and `--restore`, I can replicate the problem?
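That sequence might be sketched as follows. The log contents are stand-ins, the line totals (642/670/672) are my inference from the "670 + 2 = 672" arithmetic earlier in the thread, and the goaccess calls are echoed so the sketch stays self-contained (drop the `echo` to actually run them):

```shell
# Stand-in for the test1.log state (642 lines, inferred)
seq 1 642 > access_log
echo goaccess access_log --db-path=./webdb --persist -o report.html

# Append the 28 lines that are new in test2.log, re-run incrementally
seq 643 670 >> access_log
echo goaccess access_log --db-path=./webdb --persist --restore -o report.html

# Append the final 2 lines from test3.log, re-run once more
seq 671 672 >> access_log
echo goaccess access_log --db-path=./webdb --persist --restore -o report.html
```

The key point is that every run sees the same file (same inode), only with lines appended between runs.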
With the exception that the …
Got it, thanks for letting me know about this. I've managed to reproduce the issue. Could you please build from development and thoroughly test it to see if it sorts out the problem for you? Seems like this hiccup cropped up when we introduced the multi-threading feature back in v1.9. |
Phew! I was just about to give up and thought it must be something with the machines of my friend and me, though I'd have no idea what… Thanks for your patience and sticking to it!
Oh, so my guess with …
Will see to it, sure! (oof, needed to install …)
After a few test runs: looks good! I could no longer reproduce the issue. I wouldn't exactly say I "thoroughly tested" it, but the combinations that reproduced the issue in the past (a few subsequent runs within a few minutes, some irregular runs during the same day) no longer do, and the numbers seem fine. My friend prefers to wait until BSD has it packaged, so no second tester, unfortunately 🤷‍♂️
That's great news! That should take care of it. Thanks a lot for bringing this issue to my attention. I'll be rolling out a new release in the coming days, so stay tuned. And if you encounter any more issues, please don't hesitate to let me know. User feedback is greatly appreciated!
I'm generating HTML reports with a "floating window" (`--keep-last`) using incremental processing, as described in the man pages – i.e. I ran the initial report with `--persist` only, and consecutive runs with `--persist --restore`. All runs were processing the same log file. I've noticed that, especially with multiple runs per day, TX and requests were suddenly jumping disproportionately, while all other values seemed reasonable. After resetting the database and starting with a clean slate, TX and requests were back to normal – which leads me to the conclusion that it must have to do with incremental processing.

Example from the "wrong state":
Note the sudden jump of TX for the last day in the table, while there is no corresponding increase in values for visitors and hits; it almost looks like TX and Requests are added on each run even for lines having been processed already with a previous run, while all other values are unaffected (the progress shown in the terminal indicates lines are skipped, as it stays on 0 for a while before starting to count).
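To make that suspicion concrete, here's a toy model (pure speculation on my part, not goaccess code) of what "bytes re-added for already-processed lines" would look like numerically:

```shell
# One transfer size per request; the first three lines count as "already
# processed" in a previous run (positions saved in the on-disk DB).
printf '100\n200\n300\n' > sizes.log
processed=3
printf '400\n' >> sizes.log                  # one new request appended

# Correct incremental TX: sum only the lines after the saved position.
correct=$(tail -n +$((processed + 1)) sizes.log | awk '{s += $1} END {print s}')
# Suspected buggy TX: every line's bytes accumulated again on each run.
buggy=$(awk '{s += $1} END {print s}' sizes.log)
echo "correct delta: $correct"               # 400
echo "buggy delta:   $buggy"                 # 1000
```

If TX grows by the "buggy" amount on every incremental run while hits only grow by the new lines, that would match exactly the pattern in the table.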
This example is just from a smaller server where I tried to reproduce the phenomenon – which I first encountered on a much bigger log (which took too long to process to try it multiple times), where emptying the database and processing the entire log again also brought back the "reasonable values" (not sure for how long, having returned to incremental processing).

GoAccess is the latest release (1.9.2) in the above example; the same happened with 1.9.1 and the bigger log. In the above case it's an Apache logfile; the bigger one comes from Caddy. If you need further details, please let me know and I'll see that I can provide them.

PS: in case it matters, the issue happened on both Linux and BSD.
Thanks in advance for any hints!