Software & Cloud Engineer. Perth, Western Australia

Automatic rotation of Cloud SQL passwords with Secret Manager

Posted on April 27, 2021

A pile of gold and silver keys.
Photo by Samantha Lam on Unsplash

I came across a problem over the past few weeks regarding password rotation for Cloud SQL. Whilst GCP automates SSL/TLS certificate handling and lets you connect through the Cloud SQL proxy, rotating database passwords is not a feature it offers.

After some thought, I came up with a hacky(?) solution to automate the current manual task of rotating the db password.

Currently, compute workloads - be they functions, containers or whatever - request the database credential from Secret Manager when they start, then connect to the database with the credentials returned. When a rotation needs to occur, a human (gasp!) currently has to go into Secret Manager, update the password, and then make sure the database is given that new password. Here be dragons. A person shouldn’t have that kind of access to a running database. We need to automate this so that nobody needs access to production database passwords.

Even better, a serverless solution!

Architecture diagram depicting the flow of data through the automated system

To describe what’s going on above: a Cloud Scheduler job fires on a Unix cron schedule of your choosing, so you decide when the password gets rotated. Cloud Scheduler publishes to Cloud Pub/Sub, which acts as our message broker and notifies any downstream subscribers interested in the event. At this time, there’s just the one subscriber, which is our Cloud Function. The function has three tasks.

  1. The first is to generate a “random” password: a set of characters long and varied enough to make a suitably complex password for your database.
  2. The second task is to update the database password stored in Secret Manager.
  3. Finally, its last task is to set the updated password on the running database as the new password to use.
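The first task can be sketched with Go's `crypto/rand`. This is a minimal sketch, not part of the function further down: the `generatePassword` name, the alphabet and the length of 32 are all assumptions for illustration, so adjust them to whatever your database accepts.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// alphabet is an illustrative character set (an assumption, not a
// requirement of Cloud SQL); extend it if your policy needs symbols.
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

// generatePassword returns n characters picked uniformly from alphabet
// using the operating system's cryptographic random source.
func generatePassword(n int) ([]byte, error) {
	if n <= 0 {
		return nil, fmt.Errorf("password length must be positive, got %d", n)
	}
	limit := big.NewInt(int64(len(alphabet)))
	out := make([]byte, n)
	for i := range out {
		// rand.Int returns a uniform value in [0, limit).
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return nil, fmt.Errorf("reading random source: %w", err)
		}
		out[i] = alphabet[idx.Int64()]
	}
	return out, nil
}

func main() {
	pw, err := generatePassword(32)
	if err != nil {
		fmt.Println("could not generate password:", err)
		return
	}
	// Print only the length; the password itself should never be logged.
	fmt.Println("generated password of length", len(pw))
}
```

The returned `[]byte` could stand in for the hard-coded placeholder password in the function.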

Here’s some code I hacked together from the various documentation for our purposes, in Go.

package pwrotator

import (
  "context"
  "fmt"
  "log"

  secretmanager "cloud.google.com/go/secretmanager/apiv1"
  "golang.org/x/oauth2/google"
  sqladmin "google.golang.org/api/sqladmin/v1beta4"
  secretmanagerpb "google.golang.org/genproto/googleapis/cloud/secretmanager/v1"
)

// PubSubMessage is the payload of a Pub/Sub event.
// See the documentation for more details:
// https://cloud.google.com/pubsub/docs/reference/rest/v1/PubsubMessage
type PubSubMessage struct {
  Data []byte `json:"data"`
}

// SecretsUpdater consumes a Pub/Sub message, sets a new password on the
// Cloud SQL user and stores that password as a new secret version.
func SecretsUpdater(ctx context.Context, m PubSubMessage) error {
  parent := "projects/<PROJECT_NUMBER>/secrets/my-test-secret-1"
  password := []byte("insert your generated password here")
  // Return errors instead of ignoring them so that a failed rotation shows
  // up in the function's logs.
  if err := updateCloudSQL(ctx, password); err != nil {
    return err
  }
  return addSecretVersion(ctx, parent, password)
}

// addSecretVersion adds a new secret version to the given secret with the
// provided payload.
func addSecretVersion(ctx context.Context, parent string, payload []byte) error {
  // Create the client.
  client, err := secretmanager.NewClient(ctx)
  if err != nil {
    return fmt.Errorf("failed to create secretmanager client: %v", err)
  }
  defer client.Close()

  // Build the request.
  req := &secretmanagerpb.AddSecretVersionRequest{
    Parent: parent,
    Payload: &secretmanagerpb.SecretPayload{
      Data: payload,
    },
  }

  // Call the API.
  result, err := client.AddSecretVersion(ctx, req)
  if err != nil {
    return fmt.Errorf("failed to add secret version: %v", err)
  }
  log.Println("Updated the secret version")
  log.Println(result.Name)
  return nil
}

// updateCloudSQL sets the given password on the database user through the
// Cloud SQL Admin API.
func updateCloudSQL(ctx context.Context, password []byte) error {
  c, err := google.DefaultClient(ctx, sqladmin.CloudPlatformScope)
  if err != nil {
    return fmt.Errorf("failed to create http client: %v", err)
  }

  sqladminService, err := sqladmin.New(c)
  if err != nil {
    return fmt.Errorf("failed to create sqladmin service: %v", err)
  }

  // Project ID of the project that contains the instance.
  project := "<PROJECT_NUMBER>"
  // Database instance ID. This does not include the project ID.
  instance := "my-db-instance"
  rb := &sqladmin.User{
    Host:     "%",
    Name:     "root",
    Password: string(password),
  }

  // The user to update is identified by the name and host query parameters.
  op, err := sqladminService.Users.Update(project, instance, rb).
    Name(rb.Name).Host(rb.Host).Context(ctx).Do()
  if err != nil {
    // Return the error rather than calling log.Fatal, which would kill the
    // function instance.
    return fmt.Errorf("failed to update Cloud SQL user: %v", err)
  }

  // Users.Update returns a long-running operation; log its name only.
  log.Printf("Started Cloud SQL user update operation %s", op.Name)
  return nil
}

Hopefully the comments in the functions help point out what the various pieces are doing.

Password authentication happens whenever a new database connection is opened, so workloads that fetched the old password at startup will start failing to connect after a rotation. That means downtime until the application is rebooted or the pods are bounced so that they re-fetch the secret. However, if the password is read from a file (or volume), automated secret rotation becomes much smoother.

From the Google Cloud docs:

Mounting the secret as a volume. This makes the secret available to the function as a file. If you reference a secret as a volume, your function accesses the secret value from Secret Manager each time the file is read from disk. This makes mounting as a volume a good strategy if you want to reference the latest version of the secret instead of a pinned version of the secret. This method also works well if you plan to implement secret rotation.
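To make the volume approach concrete, here is a minimal sketch of reading the password from the mounted file each time a connection is about to be opened, rather than caching it at startup. The path `/etc/secrets/db-password` and the helper names `readMountedSecret` and `trimSecret` are assumptions for illustration; use whatever mount path you configured.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// trimSecret strips trailing newlines, which often sneak in when a secret
// is created from a file or via echo.
func trimSecret(raw []byte) string {
	return string(bytes.TrimRight(raw, "\r\n"))
}

// readMountedSecret reads the secret from disk on every call, so the caller
// always sees whatever value is currently behind the mounted file.
func readMountedSecret(path string) (string, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("reading secret file %q: %w", path, err)
	}
	return trimSecret(raw), nil
}

func main() {
	// Hypothetical mount path; set this to match your own configuration.
	password, err := readMountedSecret("/etc/secrets/db-password")
	if err != nil {
		fmt.Println("could not read secret:", err)
		return
	}
	// Use the password to open the database connection here; never log it.
	fmt.Println("read secret of length", len(password))
}
```

Calling `readMountedSecret` just before opening each new connection, instead of once at startup, is what lets a rotated password be picked up without a restart.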

Alternatively, if you’re making an API call to Secret Manager when the application starts and storing the result in a variable somewhere, you could add some logic to handle the case where password authentication starts failing. An attempt at a failsafe might look something like this.

if (passwordIsNotValid) {
  newPassword = fetchPasswordFromSecretsManager();
}

… but that would probably add unacceptable latency to your application whilst it refetches the password and then runs the database query before responding to your user. The question then becomes: is it better to have a slow application response that succeeds, or one that fails fast?
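For what it’s worth, the refetch-and-retry idea can be sketched in Go roughly as follows. Everything here is hypothetical scaffolding: `credentialCache`, `errAuthFailed` and the `dial` and `fetch` callbacks are stand-ins for your real database driver and Secret Manager call, and how you detect an authentication failure depends on the driver you use.

```go
package main

import (
	"errors"
	"fmt"
)

// errAuthFailed is a hypothetical sentinel; a real driver reports bad
// credentials in its own way, which you would map onto this.
var errAuthFailed = errors.New("authentication failed")

// credentialCache holds the password fetched at startup and knows how to
// fetch a fresh one (for example, from Secret Manager).
type credentialCache struct {
	password string
	fetch    func() (string, error)
}

// connect tries the cached password first. Only on an authentication
// failure does it refetch the password and retry, exactly once.
func (c *credentialCache) connect(dial func(password string) error) error {
	err := dial(c.password)
	if !errors.Is(err, errAuthFailed) {
		return err // success (nil) or an unrelated error: fail fast
	}
	fresh, fetchErr := c.fetch()
	if fetchErr != nil {
		return fmt.Errorf("refetching password: %w", fetchErr)
	}
	c.password = fresh
	return dial(c.password)
}

func main() {
	current := "new-password" // what the database accepts after rotation
	cache := &credentialCache{
		password: "old-password",
		fetch:    func() (string, error) { return current, nil },
	}
	dial := func(pw string) error {
		if pw != current {
			return errAuthFailed
		}
		return nil
	}
	fmt.Println(cache.connect(dial)) // prints "<nil>"
}
```

The extra latency only lands on the first request after a rotation; once the cache holds the new password, later calls go straight through.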

I hope you got something out of this. As many organisations move to the cloud, legacy applications that need to be secured without immediate rework sometimes call for solutions which are a bit left field. Or as the meme says:

Man in suit and tie pointing at camera with mouth open mid-sentence