Common traps with message-queue-based communication (AWS SQS)

Message queues are a great way to distribute data across microservices and to communicate asynchronously. Our team uses queues for the following:

communication between microservices;
extracting and decoupling long-running jobs;
handling request spikes;
pub-sub.

Queues are all but unavoidable if you want a highly scalable application and a fault-tolerant infrastructure, but used carelessly they can also be disastrous.

In this post, I’ll outline the three most common pitfalls based on our experience. Keep in mind that we work with AWS, so this article is written from an SQS perspective. If you are using other tools, the situation might be different.

Out-of-order deliveries

AWS promises that Standard queues provide “best-effort ordering”, which means that messages may occasionally be delivered in a different order than the one they were sent in. In many use cases that’s not a problem, but if your application depends on ordering (e.g. processing one message depends on another message having been delivered first), you will eventually end up in trouble.

Two options here:

configure your backend so that it returns a premature message to the queue if it can’t be processed yet;
use FIFO queues, which guarantee that messages within a message group are delivered in the order they were sent, but keep in mind that by default a FIFO queue supports up to 300 API calls per second per action (up to 3,000 messages per second with batching), whereas Standard queues support a nearly unlimited number of calls, so a FIFO queue can quickly become a bottleneck for your application.

Whenever possible, stick with option 1, as it preserves the high throughput of Standard queues.
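
Option 1 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: `sqs` is assumed to behave like a `boto3.client("sqs")`, while `is_ready`, `process` and the 30-second retry delay are hypothetical names and values picked for illustration. Instead of deleting a message that arrived too early, the handler shortens its visibility timeout so that SQS hands it out again a little later.

```python
import json

RETRY_DELAY_SECONDS = 30  # assumed delay before the message becomes visible again


def handle_message(sqs, queue_url, message, is_ready, process):
    """Process one SQS message, or hand it back to the queue if it arrived too early.

    `sqs` is expected to behave like boto3.client("sqs"); `is_ready` and
    `process` are hypothetical application callbacks.
    Returns True if the message was processed, False if it was returned.
    """
    body = json.loads(message["Body"])
    receipt_handle = message["ReceiptHandle"]

    if not is_ready(body):
        # The message this one depends on has not been handled yet:
        # do not delete it, just make it visible again after a short delay.
        sqs.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle,
            VisibilityTimeout=RETRY_DELAY_SECONDS,
        )
        return False

    process(body)
    # Delete only after successful processing, so that failures are retried.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
    return True
```

If you go this way, it is probably worth attaching a dead-letter queue with a sensible maximum receive count, so that a message whose prerequisite never arrives does not circulate forever.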

Multiple deliveries

Standard SQS queues guarantee at-least-once delivery, which means that from time to time a message will be delivered more than once. This happens because SQS stores copies of each message on multiple servers for redundancy and high availability. It may sound harmless if you are using queues to send emails or convert documents, but now imagine a customer being charged twice for the same order.

Depending on the situation, a couple of approaches might be appropriate:

attach an identifier to the message and then validate that identifier’s uniqueness;
check with the submitter whether the action has already been completed;
again, use FIFO queues while being cautious of their scalability limitations.
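
The first approach, validating an identifier's uniqueness, might look something like the sketch below. It is an illustration only: SQLite stands in for whatever shared datastore your consumers actually use, and `open_store`, `process_once` and the `processed` table are hypothetical names. The idea is that a uniqueness constraint rejects the second attempt to record the same identifier, so the action runs at most once per identifier.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create (if needed) a table of already-processed message identifiers."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed (message_id TEXT PRIMARY KEY)"
    )
    conn.commit()
    return conn


def process_once(conn, message_id, action):
    """Run `action` only if `message_id` has not been recorded before.

    Returns True if the action ran, False if the message was a duplicate.
    If `action` raises, the identifier is rolled back so a retry can succeed.
    """
    try:
        with conn:  # commits on success, rolls back if anything raises
            # The PRIMARY KEY constraint rejects a second insert of the same id.
            conn.execute(
                "INSERT INTO processed (message_id) VALUES (?)", (message_id,)
            )
            action()
    except sqlite3.IntegrityError:
        return False
    return True
```

A consumer could then call something like `process_once(conn, order_id, lambda: charge(order_id))`, where `charge` is your own function. Note that an external side effect such as a payment is not covered by the database transaction, so for anything critical the downstream call should ideally accept the same identifier as an idempotency key.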

No delivery confirmations

When a message is sent to the queue, the submitter receives a success response for the submission itself, but never a response indicating whether the message was processed successfully. That can be limiting, but in the end it’s part of the asynchronous communication paradigm.

If you need a delivery confirmation, consider the following potential solutions:

create an SNS topic, subscribe the interested parties and publish the result there;
send an email if the result is something the user is interested in.
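
The SNS option could be sketched as follows. Treat it as a rough example rather than a reference implementation: `sns` is assumed to behave like a `boto3.client("sns")`, and `publish_result`, the payload fields and the `status` attribute are hypothetical choices made for this illustration. The consumer calls it once it has finished (or failed) processing a message, and interested parties subscribe to the topic.

```python
import json


def publish_result(sns, topic_arn, job_id, succeeded, detail=""):
    """Publish the outcome of a processed message to an SNS topic.

    `sns` is expected to behave like boto3.client("sns"); the payload
    layout is an assumption made for this example.
    """
    status = "succeeded" if succeeded else "failed"
    payload = {"job_id": job_id, "status": status, "detail": detail}
    return sns.publish(
        TopicArn=topic_arn,
        Subject=f"Job {job_id} {status}",
        Message=json.dumps(payload),
        # A message attribute lets subscribers filter, e.g. on failures only.
        MessageAttributes={
            "status": {"DataType": "String", "StringValue": status},
        },
    )
```

Carrying the status as a message attribute is a design choice: it would allow a subscription filter policy to deliver only the failures to, say, an alerting endpoint.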