Continuous Deployment Pipelines with Workflows on GCP
Cloud-native continuous deployment pipelines with Google Cloud Workflows. Workflows is a powerful tool to build CI/CD pipelines. Flow control, serverless and call many types of GCP services!
The problem
For some time, the recommended approach to doing continuous deployment on Google Cloud has been to leverage the cloudbuild service, and just jam everything you have into it. Your builds, deploys, tests, etc. Everything. This might be ok, in some very limited situations, but rarely works well when you scale. Or are bigger than say, a blogging website. Ahem.
There are many tools that have been created which fill this gap. I won’t name them now, but if you do a search for such tool sets you’re sure to find millions of search results of which there will be hundreds of options. However, I tend to try to lean on a cloud provider as much as possible where I can because usually it means there’s one less “provider” or tool set to worry about which I need to either:
Pay a licence for
Update and maintain
Run on a server somewhere.
None of these options to me fit the definition of serverless, and being a serverless enthusiast, and one to save money where possible, I looked to some of the tooling available inside GCP to see if I work out a solution.
The solution
Turns out, there’s an awesome, little-known service called Workflows which fits the bill of what we need, kinda.
On the product page, it describes a shipping example, where the request comes in, there’s a decision made about the availability of the product and then a notification is sent back if there is product with the database updated reducing stock, or a request sent to the supplier if there isn’t stock left. At the end a notice is sent back to the client.
Using the decision making nature of workflows, I wondered if I could build out a CI/CD pipeline to utilise as a mechanism for orchestrating the deployment of code, as well as enabling the pipeline to be self updating. Turns out, you can do this, and it’s actually relatively straightforward, once you get a handle on the workflow yaml syntax, which I personally think is quite nice to use.
How it works
Here’s an example. When the workflow is executed, it accepts a JSON payload of input which contains the sha
key. In it we pass it our git hash. Once the workflow starts, it assigns this sha a value with the getSha
step. Then it calls pendingGithub
step. This step invokes a cloudfunction which updates github to say that the pipeline is running. On github, this is denoted by the little yellow dot next to the commit hash. The above two steps are defined by the workflow service below in YAML.
main:
params: [input]
steps:
- getSha:
assign:
- sha: ${input.sha}
next: pendingGithub
- pendingGithub:
call: http.post
args:
url: https://australia-southeast1-cloud-pipeline-dev.cloudfunctions.net/notify-github
auth:
type: OIDC
body:
sha: ${sha}
repo: "workflow-cicd-demo"
owner: "jgunnink"
state: "pending"
context: "Workflow CD"
description: "Pipeline running..."
workflowId: ${sys.get_env("GOOGLE_CLOUD_WORKFLOW_EXECUTION_ID")}
Looking at the example, it’s fairly straightforward what’s going on. How do we do the decision making then? In my case, I am using cloudbuild triggers to invoke to run cloudbuilds and using the exit status (either zero or non-zero) to decide how to handle the next stage.
Let me show with an example.
- deployCode:
try:
call: googleapis.cloudbuild.v1.projects.triggers.run
args:
projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
triggerId: deployCode
body:
commitSha: ${sha}
result: result
except:
as: error
assign:
- pipelineError: ${error}
- deployCodeResult:
switch:
- condition: ${pipelineError != null} # Go to failure if unsuccessful
next: failGithub
Here we can see that we make use of the try/catch (except) syntax to control flow. We call the cloudbuild run to deploy code within the try block. If the cloudbuild run fails, we catch it in the except block and assign the error to a variable called pipelineError
.
In the result stage, we check for an error in the deployCodeResult
step which, if isn’t null, sends us to the end of the pipeline by running the failGithub
step.
I’ve used these switches all over the place in my particular use case. For example, I do a validation on the section of code that I’m about to deploy. I do a check to see if the code has actually changed, and if so, I then run the deploy step. If it hasn’t changed, I just skip over it and continue on my merry way in the pipeline saving both time, resources and making the pipeline more efficient in the process. Additionally, there’ll be less steps to run so when the bill comes around, there will be a reduced cost as Workflows only charges you by steps consumed.
Zooming out
So taking a step back, what does it look like? How does it all fit together. Time for another drawing of mine which is the best way I can think of to explain a concept with text in a blog.
In the diagram here, you can see a few things going on. When code is committed to github, a cloudbuild instance is started which executes the workflow. The workflow accepts an input of a commit hash. This can be seen in the kickoff-workflow.yaml
steps:
- name: gcr.io/cloud-builders/gcloud
entrypoint: bash
args:
- "-c"
- |
gcloud workflows execute workflow-1 --location=asia-southeast1 --data={\"sha\":\"$COMMIT_SHA\"}
Then, once the workflow starts it sets the sha
variable and notifies github the pipeline is running. This is done via invoking a cloud function which can be seen here:
import axios from "axios";
export interface R {
body: {
sha: string;
repo: string;
owner: string;
description: string;
state: "pending" | "success" | "failure";
context: string;
workflowId: string;
};
}
export const notifyGithub = (req: R, res: any) => {
const r = req.body;
const sha = r.sha;
const repo = r.repo;
const owner = r.owner || "jgunnink";
const url = `https://api.github.com/repos/${owner}/${repo}/statuses/${sha}`;
if (r.state === "failure") {
r.description = `Failed - ${r.description}`;
}
const data = JSON.stringify({
state: r.state,
context: r.context,
description: r.description,
target_url: `https://console.cloud.google.com/workflows/workflow/asia-southeast1/workflow-1/execution/${r.workflowId}?project=${process.env.GCP_PROJECT}`,
});
const config = {
headers: {
Accept: "application/vnd.github.v3+json",
Authorization: `token ${process.env.GITHUB_TOKEN}`,
"Content-Type": "application/json",
"Content-Length": data.length,
"User-Agent": "jgunnink-workflow-bot",
},
};
axios
.post(url, data, config)
.then(res => {
console.log(`Github reponded with: ${res.status}`);
})
.catch(error => {
console.error(error);
});
res.send("Notified Github");
return data;
};
The function accepts parameters it needs to notify github and provide the “niceties” of a github commit status update.
Then we do a comparison against the workflow file for the commit we’re working with and the previous commit, to see if there are changes between them and if so, to run the deployment. This is a cloud build runner which looks like this:
steps:
- name: gcr.io/cloud-builders/git
entrypoint: bash
args:
- "-c"
- |
git checkout master && \
git pull origin master --unshallow && \
git checkout $COMMIT_SHA && \
(git diff --quiet HEAD~1 $COMMIT_SHA ${_COMPARISON}) && \
echo "No difference detected, skipping as there's nothing to deploy." || \
echo "File difference found. Failing build so workflow can deploy changes."
The full source code - and terraform for all this infrastructure is here https://github.com/jgunnink/workflow-cicd-demo/
The code is all open source and free for you to use. Please give it a try and let me know how it works for you.
Short falls of workflows
I’ll admit there are a couple of places where workflows shows that it is still in it’s infancy.
The web user interface is a bit clunky and there isn’t a way to visualise a workflow in flight. You can see that it’s running, but not where it’s at. If it fails, you can’t see where it failed, just that it did.
Additionally, regardless of successful or failed runs, you can’t see the path it took - eg, which stages/decisions ran and how it got to the failed or succeeded state.
The visualiser can get messy if you have complicated workflows and the one for example in my source code above is a victim of this.
You can’t “pause” a workflow and resume it later. This actually is a good thing (and makes sense to me) but in the enterprise land of pipelines for software delivery, many management want a manual stage gate to block automated flows to production (for example). So whilst you can’t do it via pausing one workflow, you could, for example create a second workflow which could be invoked by an authorised user to deploy to production only.
Summarising Workflows for CI/CD
Some of the reasons I love this:
The pipeline is self-managing - it will update itself as part of it running.
The pipeline is serverless (so, no costs unless it’s being used, nothing to maintain or patch and a generous free tier!)
It’s extensible. Want to update a JIRA ticket? Notify slack channel? Report statuses back to github? Update a database? Easy!
In this sample, I’ve used two connectors, functions and cloudbuild. But there are more. There’s bigquery, cloud sql, firestore, pubsub, scheduler, translation, secret manager, google kubernetes engine and many more. There’s other features too of Workflows, like retry policies, error handling (which I’ve touched on here), authentication (which I’ve used to authenticate calls to cloud functions) and so on.
The ability to customise a pipeline to your hearts content can unlock a lot of neat features which might have previously been a pipedream. Additionally, the power to be able to use decisions and controlling the flow of the pipeline is power. Many pipelines are designed with the only way to use a pipeline is step by step, ordered and controlled flows. But having the ability to do different things, depending on previous steps is powerful for many use-cases.
Keen to hear your thoughts and if you have any questions, please ask!