Daily rotation does not remove oldest files if one day is missing #2553

Open
daniele77 opened this issue Nov 25, 2022 · 9 comments

Comments

@daniele77

If for any reason one of the old files is missing (e.g., because the application did not log on a given day), the oldest files are no longer removed.
For example, having these files:

myapp_2022-11-25.log
myapp_2022-11-24.log
myapp_2022-11-23.log
myapp_2022-11-21.log
myapp_2022-11-20.log

myapp_2022-11-21.log and myapp_2022-11-20.log will never be deleted.

This function causes the issue:

https://github.com/gabime/spdlog/blob/v1.x/include/spdlog/sinks/daily_file_sink.h#L183

void init_filenames_q_()
{
    using details::os::path_exists;

    filenames_q_ = details::circular_q<filename_t>(static_cast<size_t>(max_files_));
    std::vector<filename_t> filenames;
    auto now = log_clock::now();
    while (filenames.size() < max_files_)
    {
        auto filename = FileNameCalc::calc_filename(base_filename_, now_tm(now));
    if (!path_exists(filename))  // <---------------- stops at the first missing file
        {
            break;
        }
        filenames.emplace_back(filename);
        now -= std::chrono::hours(24);    // <-------- goes backward 24h
    }
    for (auto iter = filenames.rbegin(); iter != filenames.rend(); ++iter)
    {
        filenames_q_.push_back(std::move(*iter));
    }
}

I guess one fix could be to change the function so that it enumerates the files matching the right pattern and removes the oldest ones based on the file timestamps.
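
Something along these lines, as a minimal sketch outside of spdlog itself: it assumes std::filesystem is available (spdlog does not depend on it, so an in-tree fix would need its own directory iteration), assumes the default "<stem>_YYYY-MM-DD<ext>" daily naming so the date embedded in each name sorts lexicographically, and the function name is hypothetical:

#include <algorithm>
#include <cstddef>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Delete all but the newest `max_files` logs matching "<stem>_*<ext>".
void remove_oldest_daily_logs(const fs::path &base_filename, std::size_t max_files)
{
    const fs::path dir = base_filename.has_parent_path() ? base_filename.parent_path() : fs::path{"."};
    const std::string prefix = base_filename.stem().string() + "_"; // e.g. "myapp_"
    const fs::path ext = base_filename.extension();                 // e.g. ".log"

    // Collect every existing log file for this base name, regardless of date gaps.
    std::vector<fs::path> logs;
    for (const auto &entry : fs::directory_iterator(dir))
    {
        const std::string name = entry.path().filename().string();
        if (entry.is_regular_file() && entry.path().extension() == ext &&
            name.rfind(prefix, 0) == 0) // "starts with prefix"
        {
            logs.push_back(entry.path());
        }
    }

    // "<stem>_YYYY-MM-DD<ext>" sorts oldest-first with plain string comparison.
    std::sort(logs.begin(), logs.end());

    // Remove everything except the newest `max_files` entries.
    const std::size_t to_delete = logs.size() > max_files ? logs.size() - max_files : 0;
    for (std::size_t i = 0; i < to_delete; ++i)
    {
        std::error_code ec;
        fs::remove(logs[i], ec); // deletion errors are ignored in this sketch
    }
}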

@gabime
Owner

gabime commented Nov 25, 2022

Or remove the break. The problem is that in both cases it might take a long time, depending on the max_files_ value.

@daniele77
Author

You can't remove the break: when there are fewer than max_files_ files, the while loop would never exit.
The solution I was proposing does not take long to execute.

@gabime
Owner

gabime commented Nov 25, 2022

I think the cleaning function could be fixed to stop relying on filenames_q_ and instead search for old files to delete.
If this is fixed, the entire filenames_q_ mechanism can be removed, which would simplify the code significantly.

PR is welcome.

@daniele77
Author

I agree.
I'm pretty busy right now, so I'm not sure I'll be able to submit a PR anytime soon.

@andy-byers

I was thinking that we can't guarantee that there won't be a ton of files to look through, so what if we just allowed a certain number of missing files in the search, something like:

int allowed_missing = 32; // tolerate up to this many missing days before giving up
while (filenames.size() < max_files_)
{
    auto filename = FileNameCalc::calc_filename(base_filename_, now_tm(now));
    if (path_exists(filename))
    {
        filenames.emplace_back(filename);
    }
    else if (--allowed_missing <= 0)
    {
        break;
    }
    now -= std::chrono::hours(24);
}

I know it's not very robust, but is that a sensible compromise? Otherwise, I'd be willing to implement searching for files to remove. It seems like we could repurpose code from count_files() to do this.

@daniele77
Author

@andy-byers this could be a partial solution, but you would need the allowed_missing constant to be large enough to cover the case where the application is started a few months after its last run (e.g., an embedded system that is used only once in a while).

To handle the general case, you would end up with an allowed_missing so large that the algorithm must check hundreds of paths before stopping, every single time, even when there is no gap in the log files.

So, which is worse:

  • checking the names of all the files in the directory (being sure the algorithm never fails), OR
  • checking the existence of hundreds of file names (with the possibility that there are still files older than the allowed_missing value we choose)?

I have no doubt the best solution is to scan all the files in the directory for matching log file names.

@gabime
Owner

gabime commented Jan 19, 2023

I have no doubt the best solution is to scan all the files in the directory for matching log file names.

I second that.

@andy-byers

@daniele77 and @gabime: that makes sense to me. I'll start a fork and see what I can do. Thanks!

@mtn241

mtn241 commented Jan 23, 2023

If checking the names of all the files in the directory is inevitable, maybe it's worth splitting max_files_ into two different options:

  • Keep files for the last N days; cleanup based on date only.
  • Keep the last N files; cleanup based on the number of files only.

This would extend the sink options and simplify the logic for the creator (date or number, but not both). A strategy for each N could be injected into the daily_file_sink constructor.
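
A rough, hypothetical sketch of what injecting such a strategy could look like (none of these types exist in spdlog today; the cleanup bodies are stubbed and would scan the directory as discussed above):

#include <cstddef>
#include <filesystem>
#include <memory>

// Hypothetical interface: the daily sink would call cleanup() after every
// rotation instead of maintaining filenames_q_.
struct log_cleanup_policy
{
    virtual ~log_cleanup_policy() = default;
    virtual void cleanup(const std::filesystem::path &base_filename) = 0;
};

// Option 1: keep files for the last N days; cleanup based on the date in the name.
struct keep_last_n_days final : log_cleanup_policy
{
    explicit keep_last_n_days(int days) : days_(days) {}
    void cleanup(const std::filesystem::path &base_filename) override
    {
        // Scan the directory and remove files whose embedded date is older than days_.
    }
    int days_;
};

// Option 2: keep the newest N files; cleanup based on the number of files only.
struct keep_last_n_files final : log_cleanup_policy
{
    explicit keep_last_n_files(std::size_t max_files) : max_files_(max_files) {}
    void cleanup(const std::filesystem::path &base_filename) override
    {
        // Scan the directory, sort by name, and keep only the newest max_files_ entries.
    }
    std::size_t max_files_;
};

// Hypothetical constructor, so the caller picks exactly one strategy:
// daily_file_sink(filename_t base_filename, int rotation_hour, int rotation_minute,
//                 std::unique_ptr<log_cleanup_policy> cleanup_policy);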
