Software & Cloud Engineer. Perth, Western Australia

Give some love to your dead letters, with the pubsub republisher

Posted on August 1, 2023

Pile of letters in envelopes tied with string.
Photo by Suzy Hazelwood from Pexels

When making use of asynchronous messaging across distributed systems, there's a promise of sorts: a "you give me this message, and I'll take care of it" mentality that works well until it doesn't. Anyone who's worked on any kind of large-scale system, or indeed any system, knows there's an inherent probability that it will fail at some point for any number of unexpected reasons. In application code you could handle that with a try/catch, but how do you do it in distributed systems, where messages between systems are delivered in an eventual manner? Many applications reach for a messaging system like RabbitMQ, or, if you're in the cloud, Pub/Sub on Google Cloud or a combination of SNS and SQS on AWS.

But what happens when the downstream system in the chain fails to handle a message properly? Dead letters. If you're reading this post you probably already knew that; if you didn't, a dead letter is simply a message that couldn't be processed by its downstream consumer, and so was taken out of the queue (usually after a number of retries) and put onto another queue for analysis by a development team.

From these dead letters, further investigation can determine why the message failed. Was it corrupt? Did it include data we weren't expecting? Do we have a unit test for this kind of message in our application code, so that it doesn't get dead-lettered again?

So what happens after this analysis is done, and the cause is determined to be outside the message subscriber's scope? That is, an external system was offline or responded in an unexpected way (the API key was changed? A bill wasn't paid?). Let's say the API key was rotated due to a security risk and our service could no longer communicate with that system. Now that we have updated to the new key, we want to reprocess those messages. But how, when there are thousands of them, or even a hundred? You don't want to do it manually.
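Stripped of the client plumbing, the job you'd otherwise script by hand boils down to a small loop: pull a batch from the dead-letter subscription, republish each message to the original topic, and acknowledge a message only once its republish has succeeded. Here's a minimal sketch of that idea — the `pull`, `publish`, and `ack` callables stand in for real Pub/Sub client calls, and the names are mine, not from any particular library:

```python
from dataclasses import dataclass

@dataclass
class DeadLetter:
    data: bytes   # original message payload
    ack_id: str   # handle used to acknowledge the pulled message

def republish_batch(pull, publish, ack, max_messages=100):
    """Pull up to max_messages dead letters and republish them.

    Each message is acknowledged only after it has been republished,
    so a failed publish leaves it safely on the dead-letter
    subscription to be retried on the next run.
    Returns the number of messages republished.
    """
    messages = pull(max_messages)
    for msg in messages:
        publish(msg.data)  # raises on failure -> msg stays un-acked
        ack(msg.ack_id)
    return len(messages)
```

In a real implementation those callables would map onto the Pub/Sub subscriber's pull/acknowledge calls and the publisher's publish call; the ack-after-publish ordering is the part worth getting right either way.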

Introducing the pubsub re-publisher tool!

I made this 10 days ago (at time of publishing) as a little tool which uses a Google Cloud serverless function to hook into your Pub/Sub subscriptions and republish those dead letters. Here's the link to the code and repo, which you can freely use:

The readme has everything you need to get started, and you can require credentials on invocation to ensure no one else can run your function without authorisation. Additionally, you can use IAM role bindings so that only the service account you designate to run the tool has permission to pull from a given subscription and publish to a given topic, if you only need the tool for a single use case. However, I wouldn't recommend that, and here's why:

  • The tool isn't automated; it's designed to be manually invoked by humans.
  • As such, programmatic invocation isn't intended (unless your IAM permissions permit it).
  • By not locking it down to a 1:1 topic/subscription pair, you can reuse the same function for different message queues in the project. The function accepts two required parameters which specify which subscription to pull from and which topic to publish to.
  • You also won't need to deploy the function multiple times, once for each use case.
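That said, if you did want to lock the function's service account down to a single subscription/topic pair, the IAM bindings would look something like this (the project, subscription, topic, and service-account names here are placeholders, not ones from the repo):

```shell
# Allow the function's service account to pull from the dead-letter subscription...
gcloud pubsub subscriptions add-iam-policy-binding my-dead-letter-sub \
  --member="serviceAccount:republisher@my-project.iam.gserviceaccount.com" \
  --role="roles/pubsub.subscriber"

# ...and to publish back onto the original topic.
gcloud pubsub topics add-iam-policy-binding my-original-topic \
  --member="serviceAccount:republisher@my-project.iam.gserviceaccount.com" \
  --role="roles/pubsub.publisher"
```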

You can also run this tool locally without deploying it to GCP as a cloud function. Simply clone the repo, then log in with application default credentials using the gcloud CLI:

gcloud auth application-default login

Then you'll be able to send POST requests to localhost:8080, with hot reload. Enjoy.
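For the curious, a local invocation might look something like the following — the parameter names here are purely illustrative, so check the repo's readme for the actual request shape:

```shell
# Hypothetical invocation: the two required parameters name the
# subscription to pull from and the topic to publish to.
curl -X POST http://localhost:8080 \
  -H "Content-Type: application/json" \
  -d '{
        "subscription": "projects/my-project/subscriptions/my-dead-letter-sub",
        "topic": "projects/my-project/topics/my-original-topic"
      }'
```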