RLCI

An experimental CI/CD system designed to suit my needs.

I have used Jenkins in many projects and I've had a lot of success with it. However, over time I've also become more and more frustrated with it, thinking that it is not the right tool for the job. It feels like a patchwork of functionality that is difficult to get to work the way I would like a CI/CD system to work.

Vision

RLCI is an attempt to build a CI/CD system to suit my needs. What are my needs? This is what I initially came up with:

Ability to define flexible, first-class pipelines.
- A pipeline defines a process. A change flows through a pipeline. (In Jenkins, a pipeline is run producing multiple pipeline runs.)
- All steps should run in isolation. (In Jenkins, workspaces are reused, and one step can see the workspace used in another step.)
- Pipelines should have a visual representation. (Jenkins can't show the whole pipeline until all steps have been run. If some steps at the end are run seldom, they will never be visualized.)
Everything (pipelines/configuration/etc) is written in code. (In Jenkins, many things can be configured in the GUI which is a convenient way to start, but a pain in the long run.)
Ability to re-run parts of a failed pipeline.
Pipeline visualization:
- Visualise as a flow of commits passing through stages and gathering confidence.
- Show statistics on how long each stage takes and its failure rate.

Notes

Where should a pipeline be defined?

Jenkins and other CI/CD systems make us define the pipeline in the repo itself. Does that make sense?

On the one hand, it ensures that the pipeline is always in sync with the repo. If a build command changes, we can update the pipeline build step accordingly.

On the other hand, it doesn't feel right that the repo has knowledge of where it is deployed for example.

A pipeline encodes a process for software delivery. The process and the code can change independently.

If things change together, they should be together. If not, they should not.

My current thinking is that the repo should expose an interface to the pipeline for doing certain tasks. For example ./zero.py build to build and test, and ./zero.py deploy to deploy the application somewhere. If the build process changes, only a change to zero.py is needed, and the pipeline can stay the same.
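As a rough illustration of such an interface, here is a minimal sketch of a task dispatcher. It is not the actual zero.py from this repo: the `TASKS` table, the commands in it, and the injectable `run` parameter are all assumptions made for the example.

```python
import subprocess
import sys

# Hypothetical task table. The commands are placeholders; a real project
# would list its own build/test/deploy commands here.
TASKS = {
    "build": [[sys.executable, "-m", "unittest", "discover"]],
    "deploy": [["echo", "deploying (placeholder)"]],
}

def main(args, run=subprocess.run):
    """Run the task named by args[0] and return a process exit code."""
    if len(args) != 1 or args[0] not in TASKS:
        print("Usage: zero.py " + "|".join(sorted(TASKS)))
        return 1
    for command in TASKS[args[0]]:
        result = run(command)
        if result.returncode != 0:
            # Stop at the first failing command and report its exit code.
            return result.returncode
    return 0

# A real script would end with: sys.exit(main(sys.argv[1:]))
```

With something like this in place, the pipeline only needs to know the task names. Changing how the project is built means editing the task table, not the pipeline.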

Server requirements

This section documents requirements on the server that RLCI runs on. Currently these requirements are not automated, but RLCI assumes that they are in place:

SSH access (using keys) for user X

# /etc/ssh/sshd_config
PrintLastLog no
PermitRootLogin no
PasswordAuthentication no

Directory /opt/rlci present with full permissions to user X
Git configured with email/username
Web server configured to serve static content from /opt/rlci/html

# /etc/nginx/conf.d/rlci.conf
server {
    listen       80;
    server_name  ci.rickardlindberg.me;
    location / {
        root         /opt/rlci/html;
    }
}

Supervisor configuration

# /etc/supervisord.d/rlci-engine.ini
[program:rlci-engine]
command=python /opt/rlci/current/rlci-server-listen.py /tmp/rlci-engine.socket python /opt/rlci/current/rlci-engine.py
numprocs=1
autostart=true
autorestart=true
user=rlci
environment=HOME="/home/rlci"

Software installed:
- Python
- Git
- Supervisor

I'm currently not sure how/where to automate all of this, so that's why the documentation exists instead. But hopefully, we can get rid of it.

Development

I will practice agile software development in this project. Some guiding principles:

What is the simplest thing that could possibly work?
You ain't gonna need it! / Evolutionary design
TDD / Refactoring
- Get it working as fast as possible, then refactor/design
Zero Friction Development
How can we make a feature smaller?

Stories

This is the backlog of stories to serve as a reminder of what might be interesting to work on next.

More robust deploy
- ./rlci.py engine version to verify new version
  - Run in separate stage with restart in between
More responsive server
- Use asyncio to make server process multiple requests simultaneously
- Listen in while loop?
  - Handle reload-engine command
    - Requires async engine server
    - Requires more advanced parameter passing IPC
./zero.py integrate should create branch with unique name instead of BRANCH.
What is the report right after the first deploy?
Restart Supervisor if rlci-server-listen.py changed?

History

This is where I document the completed stories.

#1 Runs a hard-coded, pre-defined pipeline

Running pipelines is the core function of the CI/CD server. If we can get it to run a hard-coded, pre-defined pipeline, we have for sure demonstrated some progress.

I completed the main part of this story in a video. Watch me get all the infrastructure in place to write a test for the very first version of RLCI.

VIDEO: Rebooting RLCI with an agile approach using TDD and zero friction development.

Browse the code as it looked like at the end of the video and look at the complete diff of changes.

After the video I did some refactoring and made some more improvements to the build system:

Usage should exit with code 1.
Exit with code 0 if tests fail. (Should have been "code 1" in message.)
Update usage and include the build command.

#2 Extend hard coded pipeline to integrate a branch

The hard coded pipeline currently does nothing. We could start fleshing out this pipeline to be the pipeline that RLCI could use. The first step in that pipeline should be to integrate a branch (merge with main and run tests and promote if passed).

I completed this story along with some clean up in a video.

VIDEO: Adding continuous integration functionality to RLCI.

Browse the code as it looked like at the end of the video and look at the complete diff of changes.

#3 Extend hard coded pipeline to run in isolation

This prevents multiple pipeline runs from interfering with each other via contaminated workspaces.
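One simple way to get this kind of isolation is to give every step a brand new working directory. The sketch below shows the idea only; the function name `run_step_in_isolation` and the `rlci-step-` prefix are made up for the example and are not taken from the RLCI code.

```python
import subprocess
import sys
import tempfile

def run_step_in_isolation(command):
    """Run command in a fresh directory that is deleted afterwards."""
    with tempfile.TemporaryDirectory(prefix="rlci-step-") as workspace:
        return subprocess.run(
            command, cwd=workspace, capture_output=True, text=True
        )

# The first step leaves a file behind in its workspace...
first = run_step_in_isolation([
    sys.executable, "-c",
    "import os; open('leftover.txt', 'w').close(); print(os.listdir('.'))",
])
# ...but the second step starts from an empty directory and cannot see it.
second = run_step_in_isolation([
    sys.executable, "-c",
    "import os; print(os.listdir('.'))",
])
```

A separate directory only isolates the workspace. Steps still share the same machine, user, and environment.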

I completed this story in a video. Watch me do refactoring, internal improvements, and finally adding functionality to execute pipelines in isolation.

VIDEO: Making RLCI pipelines run in isolation.

Browse the code as it looked like at the end of the video and look at the complete diff of changes.

Retro

I am not happy with how the design turned out. I'm not sure it will allow me to move forward smoothly. I think I will spend some time researching testing/design strategies applicable for me in this situation.

But can I do that without also working on a story?

I think adding more realistic output will require that I have a better design. So perhaps I should try to refactor towards a design that will make reporting easy to implement. And then implement that.

Overall, I think that much time needs to be spent on refactoring/design. Perhaps this ratio is higher in the beginning of a project. I feel like 90/10 design/refactoring vs. implementing stories.

#4 Make pipeline print to the terminal what it is doing

This story started out with a bunch of refactoring and design. I wasn't really sure what story to work on when I started. I just knew I needed to clean up some things before I could move on. Perhaps I should have done that in the previous story already. Once I was happier with the design, it was quite natural to extend the pipeline to report what it was doing, so that's what I did.

Browse the code as it looked like at the end of the story and look at the complete diff of changes.

Retro

The article Favor real dependencies for unit testing presented a solution to a design problem I was having. For more info, see my video about it.
Functional core, imperative shell. Hexagonal architecture. A-frame architecture. They are all similar. Thinking in terms of pure/IO Haskell functions made it pretty clear to me. I feel like RLCI is quite free from pure logic at this point. It is mostly stitching together infrastructure code. But I will keep it in mind and look for opportunities to extract pure functions.
Evolutionary design is hard. What if the first step was in the wrong direction? At least a rewrite is not a rewrite of that much.
I used the TDD principle of taking every shortcut possible to get a test passing, and then improved the design with refactoring. (When writing the new infrastructure wrapper Process.) It felt awkward to do ugly things, but I got to a clean solution faster.
I caught myself having done some premature parametrization and removed it.
Tests have guided my design decisions more than they have done in the past. Mainly in the way that I try to think about why writing a test is complicated and then change the design to make testing simple.

#5 More realistic environment (run RLCI on server)

One purpose of a CI server is to provide the same environment for integration builds. That requires the CI server to not run on my laptop. Create a dedicated server to which RLCI can be deployed and run. (My CI-server is a Linode. My pipeline is the RLCI program. Linode provides same environment / integration point. RLCI provides process.)

I implemented the main part of the functionality in a video:

VIDEO: Deploying my continuous integration software to a server.

Browse the code as it looked like at the end of the video and look at the complete diff of changes.

When reviewing the work, I came up with the following list of refactorings and improvements to work on before considering the story done:

Second test case for no second argument to deploy
Always delete temporary branch
Integrate without commits?
Diffs hard to read
Assumes /opt/rlci exists
Ugly tests. How to make them better?
Don't execute zero.py through Shell? Missed failure of git checkout ''
Clean up CI server home folder

I worked on those in another video:

VIDEO: Therapeutic refactoring and polishing of a feature.

Browse the code as it looked like at the end of the video and look at the complete diff of changes.

Retro

I feel like there is a tension between extending the codebase with new functionality versus keeping it clean.

I am still not sure how to integrate refactoring and design into an agile approach. It is supposed to be done in the background somehow while the main focus is delivering stories.

#6 Custom pipelines

There is only one hard coded pipeline. Make it possible to define more and trigger them from the CLI.

I started this story by working on extracting a database class. I figured, with a database, we can store multiple pipelines, and we can also store the logs so that we can show history of pipeline runs.

I made a video about this process:

VIDEO: I made a mistake when evolving the design of RLCI to support a database.

After working on the database for a while, I realized that I had designed it too much up front.

I reverted the speculative changes and instead committed to this story only.

The following tests prove that multiple pipelines can be triggered.

>>> RLCIApp.run_in_test_mode(
...     args=["trigger", "rlci"]
... ).has("STDOUT", "Triggered RLCIPipeline")
True

>>> RLCIApp.run_in_test_mode(
...     args=["trigger", "test-pipeline"]
... ).has("STDOUT", "Triggered TEST-PIPELINE")
True

We still have to define the pipelines in the source code. Eventually though we should be able to store them in an external database instead. But that is for another story.

Browse the code as it looked like at the end of the story and look at the complete diff of changes.

#7 More realistic output

The pipeline currently writes its "report" to stdout. I imagine the CI server having a web front end to display its status. Therefore convert the stdout report to an HTML file that can be served by a web server.

I made a video about the process of working on this story:

VIDEO: I did the simplest thing that could possibly work. Here's what happened.

I did the absolute minimal and simplest thing that could possibly work. Many more improvements can be made to the HTML report, but this is a first version.
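To give a feel for how small such a first version can be, here is a sketch that wraps report lines in a bare HTML page. The function name `render_report` and the page layout are assumptions for the example; the real RLCI report may look different.

```python
import html

def render_report(lines):
    """Return a minimal HTML page showing the report lines verbatim."""
    # Escape each line so that output containing <, > or & cannot
    # break the page.
    body = "\n".join(html.escape(line) for line in lines)
    return (
        "<!DOCTYPE html>\n"
        "<html><head><title>RLCI</title></head>"
        f"<body><pre>{body}</pre></body></html>\n"
    )

page = render_report(["Triggered RLCIPipeline", "1 < 2 && done"])
```

Writing the returned string to a file under /opt/rlci/html would be enough for the web server described under Server requirements to serve it.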

Browse the code as it looked like at the end of the story and look at the complete diff of changes.

#8 Synchronize integrations

When I integrate my changes, and someone else is currently integrating, I have to wait for them to finish.

I made a video series about this change:

VIDEO: Converting RLCI to client/server architecture (part 1/3).

VIDEO: Converting RLCI to client/server architecture (part 2/3).

VIDEO: Converting RLCI to client/server architecture (part 3/3).

I found it difficult to implement this story in small steps and without manual debugging on the server. How could it have been done better?

I guess I can analyze the situations where I had to do manual work and see how I could have prevented it.

#9 Safer deployments

When I integrate my changes and deployment fails, the old version of rlci should still be running.

I modified the deploy script to deploy to a separate folder, and then make the switch only once the deployment was successful.

I got that idea from here: https://github.com/acg/dream-deploys/blob/master/deploy
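The general pattern can be sketched in a few lines of Python. This is not the actual deploy script: the `releases` directory layout, the `current.tmp` link name, and the `install` callback are assumptions for the example. Only the `current` name is borrowed from the Supervisor configuration above.

```python
import os

def deploy(root, version, install):
    """Install version into its own directory, then repoint root/current."""
    release = os.path.join(root, "releases", version)
    os.makedirs(release)
    # If install raises, we never reach the switch below, so "current"
    # keeps pointing at the old version.
    install(release)
    # Create the new link under a temporary name and rename it over
    # "current". On POSIX the rename replaces the old link atomically.
    temporary_link = os.path.join(root, "current.tmp")
    if os.path.lexists(temporary_link):
        os.remove(temporary_link)
    os.symlink(release, temporary_link)
    os.replace(temporary_link, os.path.join(root, "current"))
```

Because the switch is a single rename, anything that starts through `current` sees either the old version or the new one, never a half-deployed mix.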

Browse the code as it looked like at the end of the story and look at the complete diff of changes.

While making these changes, I modified the deploy script incrementally and integrated my changes. It worked, but it required RLCI to be deployed incrementally as well. You could not go from version 1 to version 5; you had to deploy all the versions in between.

I'm not sure how I should approach that.

I guess the deploy script would have to check to see how the current deploy is made and adjust accordingly.

#10 Don't drop connections

When two people try to trigger a pipeline at the same time, the later one will get an error because the server restarts and drops the connection.

I modified the server in small steps to become a reliable socket server that never drops connections.
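One property that makes this possible is that a listening socket queues incoming connections even while no process is calling accept(). If the listening socket outlives the process that serves requests, clients that connect during a restart wait instead of failing. The sketch below demonstrates only that property, in a single process and with a made-up socket path. I present it as a general technique, not as a description of how rlci-server-listen.py is implemented.

```python
import os
import socket
import tempfile

# Hypothetical socket path for the demo.
path = os.path.join(tempfile.mkdtemp(), "demo.socket")

# The long-lived part: bind and listen once, then keep the socket open.
listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
listener.bind(path)
listener.listen(5)

# A client connects and sends a request while nobody is accepting,
# which is the situation during a server restart.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(path)
client.sendall(b"trigger rlci\n")

# Later, the (re)started server accepts and finds the request waiting.
connection, _ = listener.accept()
request = connection.recv(1024)
connection.sendall(b"ok\n")
reply = client.recv(1024)

connection.close()
client.close()
listener.close()
```

In a real setup the listening socket would be held by a small supervising process and handed to the request-serving process each time that process is started.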

Browse the code as it looked like at the end of the story and look at the complete diff of changes.

I made two videos about this change.

First, the process of the change:

VIDEO: What does working in small steps look like?

Second, how to write reliable socket servers:

VIDEO: How to write reliable socket servers that survive crashes and restarts?

The second one I actually did before and I also have a blog post about it.
