When and How CC Updates the Runtime

Greg Cobb edited this page Nov 2, 2020 · 12 revisions

Introduction

There are multiple paths through which CC updates the runtime (Diego or Eirini). These paths are often hidden and circuitous. As a result, it can be difficult to reason about whether an API interaction will update the runtime.

This document aims to be descriptive of the current world rather than prescriptive of a future world. We will change and improve things, so this document may drift out of alignment with reality. Also, this stuff is complicated, so this document will probably get some things wrong.

Overview

The two main methods for updating the runtime are the ProcessObserver class and the Diego::Sync job. The ProcessObserver triggers when certain fields are updated on processes and immediately updates the runtime. The Diego::Sync job runs periodically and will update the runtime if certain fields have changed on the process.

Details

Base Case

In general, fields on the Cloud Controller API are desired state and do not necessarily reflect the actual state of the runtime. By default, changing a field on the API will not automatically update the runtime unless you restart your application (for example: security groups, environment variables, disk). That said, there are numerous exceptions, which we will explore below.

Process Version

To understand whether a change to a process will automatically propagate to the runtime, one must first understand process versioning. When a process's state, memory, health_check_type, health_check_http_endpoint, or ports field is updated, the process's version is set to a new random guid. Changing a process's version then results in the process's LRP changing, as will be demonstrated in the coming sections.
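As a rough illustration of the rule above, here is a minimal sketch in plain Ruby. SketchProcess and its update method are hypothetical stand-ins, not the real Cloud Controller model; the only thing taken from the text is the list of fields that trigger a new version.

```ruby
require 'securerandom'

# Hypothetical stand-in for a CC process; not the real model.
class SketchProcess
  # Fields that, per the text above, cause a new version to be assigned.
  VERSIONED_FIELDS = %i[state memory health_check_type health_check_http_endpoint ports].freeze

  attr_reader :guid, :version, :attributes

  def initialize(attributes = {})
    @guid = SecureRandom.uuid
    @version = SecureRandom.uuid
    @attributes = attributes.dup
  end

  # Applies the changes and assigns a fresh random version only when
  # a versioned field actually changes value.
  def update(changes)
    bump = changes.any? do |field, value|
      VERSIONED_FIELDS.include?(field) && @attributes[field] != value
    end
    @attributes.merge!(changes)
    @version = SecureRandom.uuid if bump
    self
  end
end

process = SketchProcess.new(state: 'STARTED', memory: 256, instances: 1)
before = process.version

process.update(instances: 3)
puts process.version == before # instances is not a versioned field

process.update(memory: 512)
puts process.version == before # memory is a versioned field
```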

Manifest Bypassing Version

Server-side manifests are special-cased to skip updating process versions if the memory field is provided. This means that as long as the process memory is set in the manifest, the process version will not change, regardless of what other fields are changed on the process.
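The special case can be pictured as an early return in front of the normal versioning rule. The sketch below is one reading of the paragraph above, not the actual manifest code; ManifestVersionSketch, bump_version?, and the via_manifest flag are hypothetical names, and the behavior for a manifest that omits memory is assumed to follow the normal rule.

```ruby
# Hypothetical sketch; names are illustrative and not part of Cloud Controller.
module ManifestVersionSketch
  VERSIONED_FIELDS = %i[state memory health_check_type health_check_http_endpoint ports].freeze

  # Returns true when the update should assign a new process version.
  def self.bump_version?(changed_fields, via_manifest: false)
    # Special case from the text: a manifest that sets memory skips the bump entirely.
    return false if via_manifest && changed_fields.include?(:memory)

    # Otherwise the normal rule applies (assumed for manifests without memory).
    changed_fields.any? { |field| VERSIONED_FIELDS.include?(field) }
  end
end

# Regular API update touching memory and the health check type.
puts ManifestVersionSketch.bump_version?(%i[memory health_check_type])
# Same fields applied through a manifest that sets memory.
puts ManifestVersionSketch.bump_version?(%i[memory health_check_type], via_manifest: true)
```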

Open Question: Why does this not apply to other manifest fields like health_check_type and health_check_http_endpoint?

Process Observer

ProcessObserver triggers when changes to processes are committed to the database. It is called using after_commit hooks in the after_save and after_destroy model hooks.

Note: Because most of Cloud Controller unit tests use transactions for database isolation, these after_commit hooks don't trigger in tests. To trigger the hooks, switch the test's database isolation to :truncation.

If the process's state, diego, enable_ssh, or ports fields are updated, then the ProcessObserver will start or update the process. Note that the method name is a bit confusing because, as we will see, the Runner.start method will also update already running processes.

Focusing on the Diego case, Diego::Runner#start forwards directly to Diego::Messenger#send_desire_request, which then forwards to Diego::DesireAppHandler.create_or_update_app, where it is finally revealed that we may also be updating an existing process, not just creating new ones.

Here is where the process version comes into play. The Diego::DesireAppHandler checks to see if the process has a matching LRP. To do this, it uses Diego::BbsAppsClient#get_app, which in turn uses Diego::ProcessGuid. Here we see that the LRP is identified using a combination of the process's guid and its version. This means that if the process's version has changed since Diego was last updated, Diego::BbsAppsClient will not find a matching LRP for the process.
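To make the lookup miss concrete, here is a small sketch. LrpIdSketch is a hypothetical helper, and joining the guid and version with a hyphen is an assumption about the identifier format; the point is only that any change to the version yields a different key.

```ruby
# Hypothetical helper; the hyphen-joined format is an assumption.
module LrpIdSketch
  def self.from(process_guid, version)
    "#{process_guid}-#{version}"
  end
end

# A toy stand-in for the LRPs Diego currently knows about, keyed by identifier.
lrps = { LrpIdSketch.from('process-guid', 'version-1') => { instances: 2 } }

same = lrps.key?(LrpIdSketch.from('process-guid', 'version-1'))
bumped = lrps.key?(LrpIdSketch.from('process-guid', 'version-2'))

puts same   # true: the version is unchanged, so the LRP is found
puts bumped # false: the version changed, so no LRP matches
```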

If a matching LRP is found (i.e. the process's version hasn't changed), then Diego::BbsAppsClient#update_app is called. Note that only instances, updated_at, and routes can be updated on an LRP. Changing other fields requires creating a new LRP.

If a matching LRP is NOT found (i.e. the process is new or the version has changed), then Diego::BbsAppsClient#desire_app creates a new LRP for the process.
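The create-or-update branch described above can be sketched against an in-memory fake. InMemoryBbs and SketchLrpProcess are hypothetical and only mimic the method names mentioned in the text (get_app, update_app, desire_app); their signatures are assumptions, and the real client talks to the BBS rather than a hash.

```ruby
# Hypothetical process record; only the fields needed for the sketch.
SketchLrpProcess = Struct.new(:guid, :version, :instances, :routes, :updated_at, :memory,
                              keyword_init: true)

# In-memory fake standing in for the BBS; not the real client.
class InMemoryBbs
  # Per the text, only these fields can change on an existing LRP.
  UPDATABLE = %i[instances updated_at routes].freeze

  attr_reader :lrps, :calls

  def initialize
    @lrps = {}
    @calls = []
  end

  def get_app(process)
    @lrps[lrp_id(process)]
  end

  def desire_app(process)
    @calls << :desire_app
    @lrps[lrp_id(process)] = process.to_h
  end

  def update_app(process)
    @calls << :update_app
    @lrps.fetch(lrp_id(process)).merge!(process.to_h.slice(*UPDATABLE))
  end

  private

  # Assumed identifier format: guid and version joined by a hyphen.
  def lrp_id(process)
    "#{process.guid}-#{process.version}"
  end
end

# Update the matching LRP if one exists, otherwise desire a new one.
def create_or_update_app(bbs, process)
  bbs.get_app(process) ? bbs.update_app(process) : bbs.desire_app(process)
end

bbs = InMemoryBbs.new
process = SketchLrpProcess.new(guid: 'g', version: 'v1', instances: 1,
                               routes: [], updated_at: 1, memory: 256)
create_or_update_app(bbs, process) # no LRP yet, so one is desired

process.instances = 3
process.memory = 512
create_or_update_app(bbs, process) # same version: updated in place, memory on the LRP is untouched

process.version = 'v2'
create_or_update_app(bbs, process) # new version: a second LRP is desired
p bbs.calls
```

In this sketch the LRP for the old version simply stays in the fake store; nothing in the create-or-update path removes it.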

Open Question: What happens to the old LRP? Do we wait for the sync job to sweep it up?

Instances

If none of the above fields are updated and the instances field is updated, then the process will be scaled via Diego::Runner#scale.

If the process's package is currently pending (calculating this is a whole can of worms), then nothing happens. In this case the sync job will be responsible for eventually scaling the process.

Open Question: In the v3 world, does this flow make sense? Theoretically uploading a new package should be independent from scaling the process.

If the process's package is NOT pending, it will call Diego::Messenger#send_desire_request and re-join the flow above.
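Putting the observer's decisions together, the dispatch can be sketched as a small pure function. ObserverSketch, action_for, and the returned symbols are hypothetical labels for the outcomes described above; the sketch ignores stop and destroy handling and treats the pending-package check as a simple boolean.

```ruby
# Hypothetical summary of the ProcessObserver dispatch described above.
module ObserverSketch
  # Fields that, per the text, trigger a start or update of the process.
  START_OR_UPDATE_FIELDS = %i[state diego enable_ssh ports].freeze

  def self.action_for(changed_fields, package_pending: false)
    if changed_fields.any? { |field| START_OR_UPDATE_FIELDS.include?(field) }
      return :start_or_update
    end
    return :nothing unless changed_fields.include?(:instances)

    # Scaling path: a pending package means the observer does nothing
    # and the sync job eventually scales the process.
    package_pending ? :left_for_sync_job : :send_desire_request
  end
end

puts ObserverSketch.action_for(%i[ports instances])
puts ObserverSketch.action_for(%i[instances])
puts ObserverSketch.action_for(%i[instances], package_pending: true)
puts ObserverSketch.action_for(%i[memory])
```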

Sync Job

The runtime sync job runs every 30 seconds (by default) on the Cloud Controller Clock. Again focusing on Diego, when the sync job runs, it checks to make sure that CC's processes match the LRPs in Diego using Diego::ProcessesSync.

Diego::ProcessesSync loops over all the processes in the ccdb and checks if there is a corresponding LRP in Diego. To do this comparison, it again uses Diego::ProcessGuid. Remember that this class identifies processes using a combination of the process's guid and its version.

At this point, the sync job behaves very similarly to the ProcessObserver. If a matching LRP is found and the process's updated_at differs from the LRP's, then it calls Diego::BbsAppsClient#update_app. If a matching LRP is NOT found, then it calls Diego::BbsAppsClient#desire_app.

Finally, the sync job deletes all remaining LRPs that haven't been matched to a process.
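The three sync outcomes (update, desire, delete) can be sketched as a planning function over plain hashes. SyncSketch and its action names are hypothetical, the hyphen-joined identifier is an assumption, and the real job works against the ccdb and the BBS rather than in-memory data.

```ruby
# Hypothetical sketch of the reconciliation described above.
module SyncSketch
  # Assumed identifier format: guid and version joined by a hyphen.
  def self.lrp_id(process)
    "#{process[:guid]}-#{process[:version]}"
  end

  # Returns the list of actions the sync would take, without performing them.
  def self.plan(processes, lrps)
    actions = []
    unmatched = lrps.keys

    processes.each do |process|
      id = lrp_id(process)
      lrp = lrps[id]
      if lrp.nil?
        actions << [:desire_app, id]
      else
        unmatched.delete(id)
        actions << [:update_app, id] if lrp[:updated_at] != process[:updated_at]
      end
    end

    # Any LRP that no process claimed is deleted.
    unmatched.each { |id| actions << [:delete_lrp, id] }
    actions
  end
end

processes = [
  { guid: 'a', version: 'v2', updated_at: 10 }, # version changed since the last sync
  { guid: 'b', version: 'v1', updated_at: 20 }, # LRP exists but is stale
  { guid: 'c', version: 'v1', updated_at: 30 }  # already in sync
]
lrps = {
  'a-v1' => { updated_at: 5 },
  'b-v1' => { updated_at: 15 },
  'c-v1' => { updated_at: 30 }
}

SyncSketch.plan(processes, lrps).each { |action, id| puts "#{action} #{id}" }
```

In this example the LRP for the old version of process a matches no process, so it ends up in the delete step.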

Restart Action

The AppRestart action is a special way to reach into the runtime without going through the ProcessObserver or the Diego::Sync job.

The ProcessObserver will not fail if there are difficulties communicating with Diego (since the sync job will be around to clean up later). This means that stopping an app will not guarantee that the app is actually stopped in the runtime. AppRestart aims to make sure that the app's processes are actually stopped before starting them again.

To do this, the AppRestart action calls ProcessRestart.restart. This in turn calls Diego::Runner#stop, Diego::Messenger#send_stop_app_request, and then Diego::BbsAppsClient#stop_app, which deletes the LRP. It then goes through a similar call stack to create a new LRP for the process.

One interesting eccentricity of bypassing the ProcessObserver is that the updated_at timestamps won't match between the process and the LRP, so the sync job on the clock will come through and do an unnecessary, albeit harmless, update on the LRP.

Deployments

In the base case, the deployment updater rolls between processes by updating those processes' instances. This then filters through the ProcessObserver to the runtime as described above. Thus deployments do not behave meaningfully differently from scaling individual processes via the API.

Non-Web Processes

The one case where deployments reach directly to the runtime is essentially the same as AppRestart described above. For non-web process types, the deployment updater calls ProcessRestart, which then follows the steps described above.

Summary

Here is a flow chart summarizing what happens when updating a process:

Process update flowchart
