Automatically cut off database-readable logs by id & job_id age #108

Open
wants to merge 10 commits into
base: master

Conversation

meatballhat
Contributor

@meatballhat meatballhat commented Apr 24, 2017

The idea here is to introduce an automatic time window for reading logs from the logs database, so that we can drop older records. With this change, records with an id or job_id lower than the cutoff will automatically be assumed to be "archived", meaning they will be read from S3 by travis-api. In reality, archiving tends to happen within ~3h of job completion, so this change is mostly about defining a window of time within which we allow logs to be mutated, as is done via job restart.

  • other humans are OK with this idea
  • we have a plan for if/how to message this via web and cli
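
To make the cutoff idea concrete, here is a minimal Ruby sketch of how such a check could look. It is not the code in this PR: `LogRecord`, `Cutoffs`, `cutoffs`, and `archived?` are hypothetical names, and the sketch assumes that ids and job_ids grow over time, so the smallest id seen inside the age window can serve as the cutoff.

```ruby
# Hypothetical sketch, not taken from travis-logs.
LogRecord = Struct.new(:id, :job_id, :created_at)
Cutoffs = Struct.new(:id, :job_id)

# Same value as the config default in this PR: 180 days in seconds.
MIN_READABLE_CUTOFF_AGE = 60 * 60 * 24 * 180

# Find the smallest id and job_id among records younger than max_age.
# Assumes ids increase with time, so anything below these is older.
def cutoffs(records, now: Time.now, max_age: MIN_READABLE_CUTOFF_AGE)
  recent = records.select { |r| now - r.created_at <= max_age }
  Cutoffs.new(recent.map(&:id).min, recent.map(&:job_id).min)
end

# A record below either cutoff is assumed archived (read from S3).
# If no record is inside the window, everything counts as archived.
def archived?(record, cutoffs)
  return true if cutoffs.id.nil?
  record.id < cutoffs.id || record.job_id < cutoffs.job_id
end
```

A caller would compute the cutoffs once, then route reads for archived records to S3 and refuse mutations (such as restarts) for them.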

@igorwwwwwwwwwwwwwwwwwwww
Contributor

igorwwwwwwwwwwwwwwwwwwww commented Apr 24, 2017

@meatballhat can you give a brief description of:

  • the existing behaviour
  • the proposed behaviour
  • the reason for the change

@meatballhat
Contributor Author

@igorwwwwwwwwwwwwwwwwwwww I was backfilling while you commented. Sorry about the delay.

@renee-travisci renee-travisci left a comment

Looks good to me. I'm a big fan of limiting how much historical data we allow a user to display through the web UI. However, I think we should see how many customers try to get logs older than 6 months, to better understand how many customers this will impact. I assume that if Enterprise starts using the logs API, their customers may want to set a longer cutoff, but they may also want to turn off the extra query here and allow everything. That's something for Enterprise to answer.

@@ -52,6 +52,7 @@ class Config < Travis::Config
   log_parts_autovacuum_vacuum_scale_factor: 0.001,
   log_parts_autovacuum_vacuum_threshold: 0,
   min_messages: 'warning',
+  min_readable_cutoff_age: 60 * 60 * 24 * 180,

This comment was marked as spam.

@meatballhat
Contributor Author

meatballhat commented Apr 24, 2017

@renee-travisci I'm sorry for not being more explicit about this, but this change is not intended to alter reading logs data via web/cli, but rather only to change how long we'll allow it to be mutated.

@acnagy

acnagy commented Apr 24, 2017

@meatballhat @renee-travisci Ah... this is interesting, and it makes more sense (thanks for clarifying @meatballhat!). From my experience looking at users' job stats, people generally don't mutate/restart a job more than a few days to a week after it ran. Generally, old jobs are hard to find because they get buried in the web UI.

However, conceivably, someone who's not been using Travis much will want to restart a very old job when they resume working on a project. I'm not sure how frequently that happens, if at all, but I assume that if we documented the behavior clearly in the docs, people would understand.

@meatballhat
Contributor Author

@acnagy I have a lot of sympathy for folks who need/want to restart a job that ran more than a few months in the past.

We could decide to start with a cutoff of something like 2 years, then maybe tighten it up over time? I suspect much of the potential pain could be avoided if we were to change our default mode of mutating build, job, and log records to instead create new records, but I think that's a more involved change.
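
For reference, a quick sketch of what the two candidate cutoffs come to in seconds, the unit used by `min_readable_cutoff_age` in the diff. The helper name `cutoff_age_seconds` is hypothetical, and "2 years" is taken here as 730 days, which is an assumption.

```ruby
SECONDS_PER_DAY = 60 * 60 * 24

# Hypothetical helper: express a cutoff age given in days as seconds.
def cutoff_age_seconds(days)
  SECONDS_PER_DAY * days
end

default_cutoff = cutoff_age_seconds(180)       # same as 60 * 60 * 24 * 180
two_year_cutoff = cutoff_age_seconds(365 * 2)  # assumes 2 years = 730 days

puts default_cutoff   # 15552000
puts two_year_cutoff  # 63072000
```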

@svenfuchs
Contributor

Personally I only ever restart jobs when they error, and I gotta get CI green.

@acnagy I think in the example of picking up a project after a number of months it would be pretty unlikely I'm interested in restarting the old stuff. I can't think of a single case like that. Instead I'd move forward and create new commits/builds?

@acnagy

acnagy commented Apr 24, 2017

@meatballhat @svenfuchs I think it's a matter of workflow... some people create new commits, and I feel like I've emailed with someone who just jumped in and restarted... Can't remember exactly though...

That said, I'm not sure the restarting-very-old-builds workflow is worth much support. The problem is that if they restart after a long time, the image/dependencies could have changed, and then they could get new errors... which means they'll end up needing to do more commits anyway. I know the #reproducibility-study people run into these issues... So I think 6 months is probably a fairly smart cutoff; we just need to document it.

@emdantrim
Contributor

Bumping this in the interest of getting documentation sorted out and getting this PR merged.

Where in the docs do you think this belongs?


8 participants