Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document CI / Devop pipelines #1731

Open
wants to merge 62 commits into
base: main
Choose a base branch
from
Open

Conversation

ruffsl
Copy link
Member

@ruffsl ruffsl commented May 14, 2020

Also updates and accompanying Dockerfiles.
Fixes #1716

@SteveMacenski
Copy link
Member

SteveMacenski commented May 14, 2020

@ruffsl I think you need to rebase, all of those errors have been dealt with.

Let me know when you want me to take a look at this.

.circleci/config.yml Outdated Show resolved Hide resolved
@SteveMacenski
Copy link
Member

SteveMacenski commented May 15, 2020

A couple of suggestions

  • On the main page there's a really nice chart overview of everything. I think it could be good to add the "debug" and "release" builds on the bottom circle CI part. It just shows 2 builds where 1 has its artifacts run through code coverage but it doesn't say which or why its running twice. Or if the N and N+1 isn't meant to show the 2 builds, that's not also immediately clear.
  • The diagram is missing the nightly automated runs
  • The circle CI "ros-planning/navigation2 Build Trigger" doesn't have arrows to the 2 workflows like the dockerhub section has to show that it triggers these 2 build/test runs

The section headers for circle CI cover the bases, thank you.

@ruffsl ruffsl force-pushed the ci-docs branch 2 times, most recently from 2c2a51a to 78d2bb6 Compare May 19, 2020 01:11
@SteveMacenski
Copy link
Member

@ruffsl what's the state of this?

@ruffsl
Copy link
Member Author

ruffsl commented May 27, 2020

Had a workshop deadline, but hope to finish and merge before foxy release so I can reference this in the official Docker Hub library docs as an assortment of Dockerfile and ROS2 workspace examples.

@SteveMacenski
Copy link
Member

Hope the workshop proposal went well 😄 awesome, thanks for the update.

@SteveMacenski
Copy link
Member

I know its not related to this documentation, but do you know if we can trigger codecov to upload results if debug test succeeds but release test fails? That's a common occurrence and it would be nice to have updated results there as I'm actively tracking coverage. 39% was rookie numbers, 65% is a C-, 90% or bust.

@ruffsl
Copy link
Member Author

ruffsl commented May 29, 2020

do you know if we can trigger codecov to upload results if debug test succeeds but release test fails?

No, not sure. Your seeing missing code coverage results on the web UI for nightly CI workflows where only the release jobs are failing? Would if_ci_failed do what you want?

https://docs.codecov.io/docs/commit-status#if_ci_failed

Although, given CI uploads code coverage, regardless if the debug test fails, I'm not sure we'd want to set the status to success even if the CI fails:

https://github.com/ros-planning/navigation2/blob/a9171d0233de1aac5467cb0cf89adc1de0e73dfb/.circleci/config.yml#L275-L279

Per the docs, Codecov treats a failed CI as not complete because not all tests were executed:

https://docs.codecov.io/docs/ci-service-relationship#if-ci-fails-then-the-coverage-results-are

It looks like we could ignore a CI provider, but I don't see an option for ignoring one CI job:

https://docs.codecov.io/docs/detecting-ci-services

@SteveMacenski
Copy link
Member

Your seeing missing code coverage results on the web UI for nightly CI workflows where only the release jobs are failing

Correct, I'll look into the if_fail option. I'm going off of the fact that there's no report for code coverage given when the release_test fails, it only appears when all CI tests succeed (you can see that in PRs and in the master CI summary). I've also noticed a lack of updates in the web interface, but that could also be due to other weird issues where tests that are clearly run are not caught by codecov and show 0% coverage for certain areas (commonly recoveries and DWB critics)

@ruffsl
Copy link
Member Author

ruffsl commented May 30, 2020

@SteveMacenski you can take a look at the dockerfile.md and dockerhub.md docs. Still working on the other sections.

Copy link
Member

@SteveMacenski SteveMacenski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think it needs to be more organized. There are a ton of open questions not answered about the specifics of "what's going on" and "why is it done that way". In the dockerfile file, there's a bunch of open questions about how you structured the document, you reference "stages" that don't exist in any of the dockerfiles and it seems to jump out of order. You also never actually explain how the caching happens, where it happens, how to break it, etc.

doc/continuous_integration/dockerfile.md Outdated Show resolved Hide resolved
doc/continuous_integration/dockerfile.md Outdated Show resolved Hide resolved
doc/continuous_integration/dockerfile.md Outdated Show resolved Hide resolved
doc/continuous_integration/dockerfile.md Show resolved Hide resolved
doc/continuous_integration/dockerfile.md Show resolved Hide resolved
doc/continuous_integration/dockerhub.md Show resolved Hide resolved
doc/continuous_integration/dockerhub.md Outdated Show resolved Hide resolved
doc/continuous_integration/dockerhub.md Show resolved Hide resolved
doc/continuous_integration/dockerhub.md Outdated Show resolved Hide resolved
doc/continuous_integration/dockerhub.md Outdated Show resolved Hide resolved
@SteveMacenski
Copy link
Member

SteveMacenski commented Jun 1, 2020

I think you need good introduction sections to introduce the concepts and why things are important / motivation before getting into it. I would say that I understand generally how things are going on, but I have to say that that documentation did not answer any of my questions and it was really confusing to read.

@SteveMacenski
Copy link
Member

Lightly poking on this

.circleci/config.yml Outdated Show resolved Hide resolved
@tylerjw
Copy link

tylerjw commented Jul 10, 2020

As to how CircleCI caching is used. Because you create a new cache tag for each run of CI the cache is never re-used between CI runs. You can see this in the config around line 46.

@ruffsl
Copy link
Member Author

ruffsl commented Jul 10, 2020

As to how CircleCI caching is used. Because you create a new cache tag for each run of CI the cache is never re-used between CI runs. You can see this in the config around line 46.

That is incorrect, although a unique cash key is created, the look up strategy for cashing is the most recent longest matching prefix, thus the unix epoch simply serves to version bump the cash to restore from, if you will. The restore step omits appending the epoch for this reason.

https://github.com/ros-planning/navigation2/blob/baff7aef8467cda134f375d7eabae6aaaf118539/doc/continuous_integration/circleci.md#caching

@tylerjw
Copy link

tylerjw commented Jul 10, 2020

Based on this (https://circleci.com/docs/2.0/caching/#partial-dependency-caching-strategies) and my testing I don't think that is correct. I think you have to specify to level of prefix you are ok with restoring from. For example you might do something like this:

      - restore_cache:
          name: Restoring Cache ccache-v1
          keys:
            - ccache-v1-{{ .Branch }}-{{ .Revision }}
            - ccache-v1-

      - save_cache:
          name: Saving Cache ccache-v1
          key: ccache-v1-{{ .Branch }}-{{ .Revision }}
          paths:
            - ~/.ccache
...

@tylerjw
Copy link

tylerjw commented Jul 10, 2020

One more thing I haven't figured out yet is I'm trying to cache the workspaces but it is re-building them even though i restored from a chache.

@ruffsl
Copy link
Member Author

ruffsl commented Jul 10, 2020

@tylerjw , perhaps it would help too dissect an executed workflow for nav2 as example, e.g.:

https://circleci.com/workflow-run/83b96516-b005-4f29-b587-98cf5c7f9f7f

For the release build, we can see the cache was busted given how the nightly image that was pulled from for the executor has been updated. Thus the attempts to restore the workspace from the CI cache are missed. However, given the checkout PR makes no changes to the underlay used, the underlay provided by the docker image can be saved to seed the CI cache for this PR.

https://circleci.com/gh/ros-planning/navigation2/12882

Restoring the cache for the overlay workspace similarly misses, but can used the ccache that was seeded from the docker image to speed the overlay build to a little under 4min. From the following ccache stats command, you can see the hit rate for ccache was about 99%, meaning that though the source overlay was changed, very little of it differs from the master branch. The updated ccache is then it self cached for later workflows that start from the same PR and use the same nightly image sha.

The release test job then successfully restores the workspace built by the build job

https://circleci.com/gh/ros-planning/navigation2/12883

.circleci/config.yml Outdated Show resolved Hide resolved
@SteveMacenski
Copy link
Member

Should I look at this again?

@ruffsl
Copy link
Member Author

ruffsl commented Aug 11, 2020

I need to update it a little to reflect #1934 now that it's using matrix stanzas and RAM disks for speeding up ccache disk IO.

ruffsl and others added 15 commits November 7, 2020 13:49
Co-authored-by: Steve Macenski <stevenmacenski@gmail.com>
Co-authored-by: Steve Macenski <stevenmacenski@gmail.com>
Co-authored-by: Steve Macenski <stevenmacenski@gmail.com>
Co-authored-by: Steve Macenski <stevenmacenski@gmail.com>
Co-authored-by: Steve Macenski <stevenmacenski@gmail.com>
Co-authored-by: Steve Macenski <stevenmacenski@gmail.com>
now that `Autobuild` is disabled for the build rules on Dockerhub

The caches for release and debug jobs in the same workflow don't collide thanks to the different prefix in the cache key names.
Add two arrows to make that separation more apparent.

Also, correct the fact that test jobs don't reuse caches from prior test jobs, only from build jobs in the same workflow.
Co-authored-by: Steve Macenski <stevenmacenski@gmail.com>
@SteveMacenski
Copy link
Member

SteveMacenski commented Nov 9, 2020

@ruffsl I went through everything - from here on in the review, can you not hit the "resolve" button? It will help me keep track of which of these I already went through and 'OKed' your fix vs the open items. Then once its done and I just check that the intent of my comment was met, I'll hit the resolve button. This just helps me not have to go back and click every resolved discussion and look over it again and again.

Normally its not a problem, but there are alot of comments on this one and that would suck up a lot of time

@SteveMacenski
Copy link
Member

@ruffsl any progress?

@SteveMacenski
Copy link
Member

@ruffsl any update? This is the oldest pending PR in Nav2 and I'd love to drive this to conclusion with the new galactic release

@ruffsl
Copy link
Member Author

ruffsl commented Apr 6, 2021

No updates. My writing time is being spent on thesis work. We could meet up offline and I could answer any lingering questions over a call so you could do any last editing for sections you feel are still unclear.

@SteveMacenski
Copy link
Member

Unfortunately, I really don't have the cycles to commit to finishing this myself. I'm a bit overloaded by the current stack of work I have I can't add this to the queue. I suppose this work is on pause for the foreseeable future?

@mergify
Copy link
Contributor

mergify bot commented Oct 15, 2021

This pull request is in conflict. Could you fix it @ruffsl?

@tylerjw
Copy link

tylerjw commented Oct 15, 2021

We approve mergify and it attacks! 🏴‍☠️

@ros-navigation ros-navigation deleted a comment from mergify bot Oct 15, 2021
@SteveMacenski
Copy link
Member

SteveMacenski commented Oct 15, 2021

Just Nav2, right?!

Yeah, I'm testing setting this up. Looks like one of my rules regex got messed up, but the others are working great! There's so many boilerplate things I have to tell people that this is going to help greatly to reduce my burden of. Now any time people nuke my required PR template fields, it'll give them back to me in a comment, if conflicts tell them, if builds fail tell them, if they open PRs in inappropriate branches tells them, etc

@SteveMacenski
Copy link
Member

We can talk about it more in this ticket #2593 if you like. I don't want to get off topics in this PR

@mergify
Copy link
Contributor

mergify bot commented Oct 15, 2021

@ruffsl, your PR has failed to build. Please check CI outputs and resolve issues.
You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

@mergify
Copy link
Contributor

mergify bot commented Sep 23, 2022

This pull request is in conflict. Could you fix it @ruffsl?

2 similar comments
@mergify
Copy link
Contributor

mergify bot commented Mar 20, 2023

This pull request is in conflict. Could you fix it @ruffsl?

@mergify
Copy link
Contributor

mergify bot commented Jul 27, 2023

This pull request is in conflict. Could you fix it @ruffsl?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document CI / Devop pipelines
3 participants