Yesterday, someone joined the #mcollective IRC channel to ask how to connect MCollective to Amazon’s Simple Queue Service. I explored that idea earlier this year, and decided not to pursue it, but it seems I didn’t get around to sharing the results of the experiment. Until now.
I was setting up a small infrastructure in EC2, but I wanted to have MCollective available. I couldn’t justify the cost of the extra capacity required to run ActiveMQ – I didn’t want the extra hassle, either. On the face of it, SQS seemed like a good place to start my investigation.
“Amazon Simple Queue Service (SQS) is a fast, reliable, scalable, fully managed message queuing service. SQS makes it simple and cost-effective to decouple the components of a cloud application.”
As the name implies, SQS just provides queues – to have something analogous to pub-sub topics, we’re pointed towards the Simple Notification Service.
“When combined with Amazon Simple Notification Service (SNS), developers can ‘fanout’ identical messages to multiple SQS queues in parallel. When developers want to process the messages in multiple passes, fanout helps complete this more quickly, and with fewer delays due to bottlenecks at any one stage. Fanout also makes it easier to record duplicate copies of your messages, for example in different […]”
Fast, reliable, simple – what’s not to like?
Though I maintain that I’m not much of a programmer, I thought I could take a shot at writing a suitable MCollective connector plugin. There were enough examples of working connectors for me to refer to, and it only had to be good enough to prove a point – I figured I could get someone to help tidy it up if it worked.
You can find the code on my GitHub, but don’t expect to use it. Its state is somewhere between “doesn’t work very well” and “doesn’t work at all”. To be fair, most of that is down to my code – but I came to the conclusion that I would not be able to make it work well enough that I’d want to use it.
Unhelpful characteristics of SQS:
- messages may not be delivered in FIFO order
- messages may be delivered more than once
- deleted messages may be redelivered under some circumstances
- the standard polling behaviour might not return all available messages – and if there are fewer than 1000 messages in the queue, it may return nothing at all!
- “long polling” guarantees that if messages are in the queue, at least 1 will be returned – but the definition of this behaviour is pretty vague.
- “long polling” can wait a maximum of 20 seconds for a message.
- SQS is charged per-request – polling (and receiving nothing) is chargeable.
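To put a number on the last point, here is a rough back-of-the-envelope sketch in Python. It only counts the receive requests an idle poller makes when every long poll runs to the 20-second limit and comes back empty. The function names are made up for illustration, and `price_per_million` is a placeholder you would fill in from the current price list, not a real figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MAX_LONG_POLL_SECONDS = 20  # the long-polling ceiling from the list above


def idle_requests_per_month(pollers, wait_seconds=MAX_LONG_POLL_SECONDS, days=30):
    """Receive requests made in a month by pollers that never see a message."""
    if not 1 <= wait_seconds <= MAX_LONG_POLL_SECONDS:
        raise ValueError("long poll wait must be between 1 and 20 seconds")
    # Each poller issues one request per wait interval, back to back.
    per_poller = (days * SECONDS_PER_DAY) // wait_seconds
    return pollers * per_poller


def idle_cost(pollers, price_per_million, **kwargs):
    """Cost of idle polling; price_per_million is a placeholder you supply."""
    return idle_requests_per_month(pollers, **kwargs) * price_per_million / 1_000_000


if __name__ == "__main__":
    for nodes in (1, 10, 100):
        print(nodes, "pollers:", idle_requests_per_month(nodes), "requests/month")
```

A single poller doing nothing but waiting makes 129,600 requests in a 30-day month, and that scales linearly with the number of nodes and queues being polled. Shorter waits, or actual traffic, only push the count up.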
I also ran into trouble automating subscriptions of SQS queues to SNS topics – while it worked, it was possible to end up with duplicate subscriptions that eventually resulted in silent failures.
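One way to catch that early would be to check the subscription listing for repeats before creating anything new. The sketch below is not what my plugin does; it assumes each subscription is available as a dict with `SubscriptionArn`, `TopicArn`, `Protocol` and `Endpoint` keys (roughly the shape I would expect a listing call to return, but treat that as an assumption), and the function name is hypothetical.

```python
from collections import defaultdict


def find_duplicate_subscriptions(subscriptions):
    """Return the ARNs of subscriptions that repeat an earlier one.

    Two subscriptions count as duplicates when they share the same
    topic, protocol and endpoint.
    """
    seen = defaultdict(list)
    for sub in subscriptions:
        key = (sub["TopicArn"], sub["Protocol"], sub["Endpoint"])
        seen[key].append(sub["SubscriptionArn"])
    # Keep the first ARN for each key; everything after it is surplus.
    return [arn for arns in seen.values() for arn in arns[1:]]
```

The returned ARNs are candidates for unsubscribing. Whether it is safe to remove them automatically depends on how the subscriptions were created in the first place.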
None of these issues is a show-stopper as long as you can accept MCollective latency being measured in seconds instead of milliseconds.
I’m not planning to spend any more time on this. If I were, I’d look at the following:
- Add a thread to handle long polling messages from SQS into a buffer
- Dedupe received messages to ensure each is delivered to mco only once
- Potentially fiddle with the order of received messages (whether or not this really matters will depend on your use)
- Carefully groom SNS topics and subscriptions
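The first two items could look something like the following. This is a minimal Python sketch of the idea, not code from my connector (which is Ruby and does none of this). The `receive` callable is a hypothetical stand-in for whatever makes the long-poll request, assumed to return a list of `(message_id, body)` tuples. It does not attempt reordering, and it only remembers the most recent `max_seen` ids, so a very late redelivery would still slip through.

```python
import queue
import threading
from collections import OrderedDict


class BufferedReceiver:
    """Poll in a background thread, drop repeats, and buffer the rest."""

    def __init__(self, receive, max_seen=10000):
        # `receive` is a hypothetical callable standing in for one long-poll
        # request; it returns a list of (message_id, body) tuples.
        self._receive = receive
        self._buffer = queue.Queue()
        self._seen = OrderedDict()  # recently seen message ids, oldest first
        self._max_seen = max_seen
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # Takes effect once the in-flight receive call returns.
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            for message_id, body in self._receive():
                if message_id in self._seen:
                    continue  # redelivery: already handed over once
                self._seen[message_id] = True
                if len(self._seen) > self._max_seen:
                    self._seen.popitem(last=False)  # forget the oldest id
                self._buffer.put(body)

    def get(self, timeout=None):
        """Return the next message body; raises queue.Empty on timeout."""
        return self._buffer.get(timeout=timeout)
```

The connector's receive method would then call `get()` on the buffer instead of talking to the queue directly, which keeps the up-to-20-second long poll out of the caller's path.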
I found a project on GitHub called ec2_collective, which aims to build something like MCollective on top of SQS. It claims arbitrary command execution as a feature, and I suspect it underestimates usage charges, but it’s worth a look.
As for me, I’m looking at MCollective’s Redis connector. That involves its own tradeoffs, which I’ll save for another post.
You can certainly build solid, scalable applications around SNS and SQS – but they’re not a straight replacement for a traditional STOMP or AMQP message broker. Fire-and-forget and loosely-coupled asynchronous messaging patterns should work out alright – orchestration, not so much.