Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TX and Requests wrong with incremental processing #2667

Closed
IzzySoft opened this issue Apr 16, 2024 · 25 comments
Closed

TX and Requests wrong with incremental processing #2667

IzzySoft opened this issue Apr 16, 2024 · 25 comments

Comments

@IzzySoft
Copy link
Contributor

IzzySoft commented Apr 16, 2024

I'm generating HTML reports with a "floating window" (--keep-days) using incremental processing, as described in the man pages – i.e. I ran the initial report with --persist only, and consecutive runs with --persist --restore. All runs were processing the same log file. I've noticed that, especially with multiple runs per day, TX and requests were suddenly jumping unproportionally, while all other values seemed reasonable. Resetting the database and starting with a clean slate, TX and requests were back on normal – which leads me to the conclusion it must have to do with incremental processing.

Example from the "wrong state":

image

Note the sudden jump of TX for the last day in the table, while there is no corresponding increase in values for visitors and hits; it almost looks like TX and Requests are added on each run even for lines having been processed already with a previous run, while all other values are unaffected (the progress shown in the terminal indicates lines are skipped, as it stays on 0 for a while before starting to count).

This example is just from a smaller server where I tried to reproduce this phenomenon – which I first encountered on a much bigger log (which took to long to process to try it multiple times), where emptying the database and processing the entire log again also brought back the "reasonable values" (not sure for how long, with having returned to incremental processing).

GoAccess is the latest release (1.9.2) with above example, same happened with 1.9.1 and the bigger log. In the above case, it's an Apache logfile, the bigger one comes from Caddy. If you need further details, please let me know and I see that I can provide them.

PS: in case it matters, the issue happened on both, Linux and BSD.

Thanks in advance for any hints!

@allinurl
Copy link
Owner

Thanks for providing details about the possible issue. Can you try replicating it with a smaller dataset? Also, are you running it with the --real-time-html flag? Additionally, could you please share the exact command you're using?

Lastly, would you be able to share the smaller, reproducible dataset with me, either by uploading it here or sending it to me via email so I can replicate it on my end?

@IzzySoft
Copy link
Contributor Author

Thanks for your quick response!

  • what do you mean with "smaller dataset" – a smaller log? I could test with the smallest Apache VHost I'm running here, where the log is < 200 kB at the moment, using the same log format. Not sure how many tries it takes to reproduce it, though – and if I'm able to reproduce it with such a small log in a "visible way".
    And where would I find the mail address to send it to (I could include the configs and the resulting HTML then as well)?
  • no, I'm not running with --real-time-html
  • exact command differs. On the big log processed on BSD, all settings are in the config (I'll have to ask the webmaster there for the exact command they use, IIRC they just run goaccess without any parameters; they don't use a "rolling window" either, though that should not have had any effects in my test case either as the 14 days were not reached). Also, they use a different GeoIP database (dbip-country-lite-2024-04.mmdb IIRC).

Command used here for the test on the smaller log, slightly sanitized:

# initial run
goaccess access_log.1 --geoip-database=./GeoLite2-ASN.mmdb --config-file=./goaccess.conf --html-report-title="Server Statistics" -o ./report.html --anonymize-ip --jobs=2 --db-path=./db --persist

# subsequent runs
goaccess access_log --geoip-database=./GeoLite2-ASN.mmdb --config-file=./goaccess.conf --html-report-title="Server Statistics" -o ./report.html --anonymize-ip --jobs=2 --db-path=./db --persist --restore --keep-last=14

In case the order of arguments is important there: when setting up the script for the server, I moved the "global options" (i.e. those identical for all servers) to variables, like (again slightly sanitized):

GeoIp="--geoip-database=$GeoIpPath/GeoLite2-ASN.mmdb"
GlobalOpts="--config-file=$BINDIR/goaccess.conf $GeoIp --anonymize-ip --jobs=2 --persist --restore --keep-last=14"
goaccess $LogBase/$vhost/$LogName $GlobalOpts --html-report-title="$vhost" -o $OutDir/${vhost}.html --db-path=$DBBaseDir/$vhost

($LogName defaults to access_log and can be overridden on script call, which is used e.g. with LogRotate's postprocess as access_log.1). This happened at the beginning of the test (i.e. after the first run).

@allinurl
Copy link
Owner

Absolutely, using a smaller dataset or log with fewer data entries or requests would definitely simplify the debugging process. If the issue occurs consistently, it's likely to appear even with a smaller log. Feel free to send it over to hello [at] goaccess.io.

Also, please share your config file. Thanks

@IzzySoft
Copy link
Contributor Author

Will do, thanks. Will take a little, though, to run the tests – but I'll sure come over with the data. If you can let me know where to send the mail to that is – I'd rather not post the full logs (with IPs and all) publicly 😉 If you prefer sending me the mail by mail, you can construct my address from my nick shown (the shorter one) 'at' my login, and the DE TLD.

@IzzySoft
Copy link
Contributor Author

OK, short update in between from the server of the test runs above. I refrained from any "additional runs" for a full day. The "wrong values" indeed only affected the one day when I did the additional runs, and went back to normal after that. So the "wrong day" will "age out" thanks to the "rolling window":

image

16th (the day with alleged 1.2 GB xfer) was when I had the multiple runs. Before and after are looking fine. Still using the very same DB.

@allinurl
Copy link
Owner

I haven't seen any emails come through. Go ahead and send them if you'd like. By the way, did you run a smaller test to see if you could recreate the problem?

@IzzySoft
Copy link
Contributor Author

IzzySoft commented Apr 20, 2024

If you can let me know where to send the mail to that is

😉 OK, meanwhile found the mail address in the NEWS file.

But the main error in this regard is on my end, though: I didn't yet get to run the tests, apologies. Not forgotten, I was just drowning in other tasks…

Did that now, and wow, looks like I got a quite extreme example – with a log file of just 140 kB. Was able to more than reproduce with just 3 runs:

  • initial run: xfer 40.69 MB, requests 687
  • 1st incremental run (just 1 access in between: me checking the report): xfer 77.29 MB, requests 1,377
  • 2nd incremental run, again just a single request in between: xfer 116.25 MB, requests 2,068

At this point, I prepared the files for the mail. Then ran another initial run. Results: xfer 41.32 MB, requests 692 – as expected. Will now prepare and send the mail with the files: both resulting *.html reports, the log and the config used, plus a text file with the full command line used in both cases. If you need any additional details, please let me know.

Expect the mail in a few minutes (need to zip the collection up, get it from the server I've run the test on, and write up the mail).

Edit: mail sent.

@IzzySoft
Copy link
Contributor Author

Mail bounced, sorry: GMail refused your forward. You seem to have your listed address shipping all incoming mails to a gmail address, delivering them to gmail in the name of the original sender (i.e. "mail from me" gets there with "delivered by your server"), which breaks SPF as your domain is not listed in my domains SPF records (why should it be?). I've now sent it again to the gmail address the bounce report mentioned. Hope it gets through this time. Please let me know when you got it.

(that said, gmail doesn't behave fine with many small providers. Would be good if there was a mail account with a non-monopolized "trust-able provider" 😉)

@allinurl
Copy link
Owner

Got the mail. I'll check on this. Thanks!

@IzzySoft
Copy link
Contributor Author

Just curious, did you find the culprit already?

@allinurl
Copy link
Owner

So, I spent some time tinkering with it over the weekend, and I couldn't reproduce any issues with the access log and config file you sent. Here's what I did:

# goaccess -p izzy.conf access_log --persist

2024-05-12-203044_569x452_scrot

and then I ran it repeatedly using

# goaccess -p izzy.conf access_log --persist --restore --keep-last=14

2024-05-12-203107_537x474_scrot

Each time, I got identical results, except for some slight changes in percentages due to overall hits/visitors totals changing (i.e., --keep-last=14).

If you're still encountering the problem, could you walk me through step by step how you're triggering it using the exact tar.gz file you shared with me?

@IzzySoft
Copy link
Contributor Author

Now that is strange. I was able to immediately reproduce it (on Linux). My friend in the US as well (on BSD). Steps as outlined above (and in my mail, IIRC). Only difference I see on a quick glance is that we used HTML reports – but that should not be relevant to the data themselves IMHO. Let me quote the full command line (sanitized) again to make sure:

# initial run
goaccess access_log.1 --geoip-database=./GeoLite2-ASN.mmdb --config-file=./izzy.conf --html-report-title="Server Statistics" -o output/report.html --anonymize-ip --jobs=2 --db-path=./webdb --persist

# subsequent runs
goaccess access_log --geoip-database=./GeoLite2-ASN.mmdb --config-file=./izzy.conf --html-report-title="Server Statistics" -o output/report.html --anonymize-ip --jobs=2 --db-path=./webdb --persist --restore --keep-last=14

That's what I used. My friend used it without the --keep-last=14 and had all his settings in the config. He meanwhile switched to process the entire log each time, I switched to a single run per day. Both variants "work around" this issue.

@allinurl
Copy link
Owner

I noticed you've got two log files, access_log.1 and access_log. But in the tar.gz you sent over, I only see access_log. Please feel free to share access_log.1. It'll help me dig into this. Thanks.

@IzzySoft
Copy link
Contributor Author

Oof, that I copied and sanitized from my notes now (for "initializing a new vhost"). Exact steps for my test case are described in _testruns.txt in the archive and indeed only used a single log. Apologies for the confusion; my focus was rather on the parameters here. According to your comment, you just used

goaccess -p izzy.conf access_log --persist

while I specified several additional parameters (again sanitized here, the real values you have in the file):

--geoip-database=./GeoLite2-ASN.mmdb --html-report-title=server_stats -o output.html --jobs=2 --db-path=./stats

Blind guess would be --jobs could have some affect there. The GeoIP DB is just for lookup, so I don't think it's that (though the config has the GeoLite2-Country while I used GeoLite2-ASN).

@allinurl
Copy link
Owner

Here's the output I'm getting, and it's staying consistent on my end.

2024-05-13 18-49

@IzzySoft
Copy link
Contributor Author

Thanks Gerardo! I don't understand it. The only difference I can see there is that you now have each subsequent run on a "static log file" (as I only sent you the log with the latest state), while during my runs the file was (minimal) appended to¹. I'll see to find some time for another test run here, copying the logs in between to have the possibility of 100% reproducible runs (with the same changes in the files). If I again can reproduce (which I assume), I'll send you another mail with all the log file copies.

Again, thanks a lot for digging in!

¹ as the text file with the test protocol mentions how many hits/lines where added to the log, you should be able to produce the original states by removing the corresponding lines and then use each corresponding copy. That should fully reproduce my test run then. Though I'm really perplexed it does not even in the slightest show the changes I saw here…

@IzzySoft
Copy link
Contributor Author

PS: I was able to reproduce again. Mail with the 3 logs and the protocol will go out to you in a few minutes. Stats in short:

  • test1.log (642 lines); xfer: 131.14 MiB, requests: 642
  • test2.log (670 lines); xfer: 262.3 MiB, requests: 1312 — so 28 log lines added 670 requests
  • test3.log (672 lines); xfer: 393.96 MiB, requests: 1984 — so 2 more lines added 672 requests

Interesting 670 + 2 = 672; I assume if I'd only run test1.log and test3.log, I'd end up with the same results. But no: that gives 262.8 MiB and 1314 requests 🤷‍♂️

Btw: running test3.log another time doesn't change a thing anymore.

@allinurl
Copy link
Owner

Got the files, thanks for sharing those. I understand now. So basically, the current setup relies on the inode. Since each of those three logs has a different inode, it treats them as separate logs and parses them individually. But if you pipe data directly into goaccess, you should see the results you're after.

# cat test1.log | goaccess - --geoip-database=../dbip-asn-lite-2024-04.mmdb --config-file=../incrun_test/izzy.conf --html-report-title=stats.izzysoft.de -o stats.html --jobs=2 --db-path=./stats --persist

2024-05-13-205622_960x187_scrot

# cat test2.log | goaccess - --geoip-database=../dbip-asn-lite-2024-04.mmdb --config-file=../incrun_test/izzy.conf --html-report-title=stats.izzysoft.de -o stats.html --jobs=2 --db-path=./stats --persist  --restore

2024-05-13-205631_958x178_scrot

# cat test3.log | goaccess - --geoip-database=../dbip-asn-lite-2024-04.mmdb --config-file=../incrun_test/izzy.conf --html-report-title=stats.izzysoft.de -o stats.html --jobs=2 --db-path=./stats --persist  --restore

2024-05-13-205638_953x180_scrot

The thing is, when data is piped, it operates on timestamps without knowing the inode or the last line processed. Now, if you're wondering why this doesn't happen when you directly pass a log to goaccess, well, it's because that would be super time-consuming. It'd have to go through each and every line again, comparing timestamps. Right now, it just jumps straight to the last line it read and carries on from there, which is much faster.

Let me know if that makes sense.

@IzzySoft
Copy link
Contributor Author

IzzySoft commented May 14, 2024

Since each of those three logs has a different inode, it treats them as separate logs and parses them individually

That's only for this test run today. Last test in April it was the very same file the webserver appended to, with the same results. I just copied the different "stages" to separate files for you to easier reproduce.

So basically, as I described before: if you first parsed test1.log with --persist, then added the additional last 28 lines from test2.log to it with --persist --restore, then the last 2 lines from test3.log (thus imitating Apache having added to the log – which is what happened in my previous test and in the real cases bringing up this issue), you should see the same results. Can you reproduce that, too? Apache appending to the log shouldn't change the inode IMHO (at least that would very much surprise me).

Fun thing is, too, that this only happens when I have multiple runs in short sequence (until now, I only had that with multiple runs in the same hour). I do not notice such when running once every 24h. Have not tried with once per hour yet, though (which could be a common setup again, with a cron job).

@allinurl
Copy link
Owner

allinurl commented May 14, 2024

To elaborate, are you suggesting that by parsing test1.log, executing goaccess with --persist, then adding the 28 lines from test2.log and rerunning goaccess with --persist and --restore, followed by appending the additional 2 lines from test3.log to test1.log and again running goaccess with --persist and --restore, I can replicate the problem? e.g.,

goaccess test1.log --geoip-database=../dbip-asn-lite-2024-04.mmdb --config-file=../incrun_test/izzy.conf --html-report-title=stats.izzysoft.de -o stats.html --jobs=2 --db-path=./stats --persist

then

tail -28 test2.log >> test1.log

then

goaccess test1.log --geoip-database=../dbip-asn-lite-2024-04.mmdb --config-file=../incrun_test/izzy.conf --html-report-title=stats.izzysoft.de -o stats.html --jobs=2 --db-path=./stats --persist  --restore

then

tail -3 test2.log >> test1.log

then

goaccess test1.log --geoip-database=../dbip-asn-lite-2024-04.mmdb --config-file=../incrun_test/izzy.conf --html-report-title=stats.izzysoft.de -o stats.html --jobs=2 --db-path=./stats --persist  --restore

@IzzySoft
Copy link
Contributor Author

With the exception that the tail -2 should be applied from test3.log, yes. That should imitate Apache having written new lines to the log (with the inode staying the same, hopefully).

@allinurl
Copy link
Owner

Got it, thanks for letting me know about this. I've managed to reproduce the issue. Could you please build from development and thoroughly test it to see if it sorts out the problem for you? Seems like this hiccup cropped up when we introduced the multi-threading feature back in v1.9.

@IzzySoft
Copy link
Contributor Author

I've managed to reproduce the issue.

Phew! I was just about to give up and thought it must be something with the machines of my friend and me, though I'd have no idea what… Thanks for your patience and sticking to it!

multi-threading feature

Oh, so my guess with --jobs was not that far off then. Well, whatever it was, glad it got identified!

Could you please build from development and thoroughly test it to see if it sorts out the problem for you?

Will see to it, sure! (oof, needed to install autoconf and autopoint first – just mentioning that in case you want to point it out with the instructions. Compiled successfully then) Can use my stats server's logs for that so I don't disrupt the others. Will let you know then (hopefully before the weekend), thanks again!

@IzzySoft
Copy link
Contributor Author

After a few test runs: looks good! I could no longer reproduce the issue. I wouldn't exactly say I "thoroughly tested" it, but the combinations that reproduced the issue in the past (a few subsequent runs within a few minutes, some irregular runs during the same day) no longer do that, numbers seem fine. My friend prefers to wait until BSD has packaged, so no second tester unfortunately 🤷‍♂️

@allinurl
Copy link
Owner

That's great news! That should take care of it. Thanks a lot for bringing this issue to my attention. I'll be rolling out a new release in the coming days, so stay tuned. And if you encounter any more issues, please don't hesitate to let me know. Users feedback is greatly appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants