Published: Sat 16 November 2013
By Zachary Stevens
In Blog.
tags: mcollective aws
Yesterday, someone joined the #mcollective IRC channel to
ask how to connect MCollective to Amazon's Simple Queue Service.
I explored that idea earlier this year, and decided not to pursue it,
but it seems I didn't get around to sharing the results of the
experiment. Until now.
Context
I was setting up a small infrastructure in EC2, but I wanted to have
MCollective available. I couldn't justify the cost of the extra
capacity required to run ActiveMQ - I didn't want the extra hassle,
either. On the face of it, SQS seemed like a good place to start my
investigation.
Amazon Simple Queue Service (SQS) is a fast, reliable, scalable, fully
managed message queuing service. SQS makes it simple and
cost-effective to decouple the components of a cloud application.
— Amazon, SQS Overview
As the name implies, SQS just provides queues - to have something
analogous to pub-sub topics, we're pointed towards the
Simple Notification Service.
When combined with Amazon Simple Notification Service (SNS),
developers can 'fanout' identical messages to multiple SQS queues in
parallel. When developers want to process the messages in multiple
passes, fanout helps complete this more quickly, and with fewer delays
due to bottlenecks at any one stage. Fanout also makes it easier to
record duplicate copies of your messages, for example in different
databases.
— Amazon, SQS Overview
Fast, reliable, simple - what's not to like?
Experiment
Though I maintain that I'm not much of a programmer, I thought I could
take a shot at writing a suitable MCollective connector plugin. There
were enough examples of working connectors for me to refer to, and it
only had to be good enough to prove a point - I figured I could get
someone to help tidy it up if it worked.
You can find
the code on my GitHub,
but don't expect to use it. Its state is somewhere between "doesn't
work very well" and "doesn't work at all". To be fair, most of that
is down to my code - but I came to the conclusion that I would not be
able to make it work well enough that I'd want to use it.
Unhelpful characteristics of SQS:
- messages may not be delivered in FIFO order
- messages may be delivered more than once
- deleted messages may be redelivered under some circumstances
- the standard polling behaviour might not return all available
  messages - and if there are fewer than 1000 messages in the queue, it
  may return nothing at all!
- "long polling" guarantees that if messages are in the queue, at
  least 1 will be returned - but the definition of this behaviour is
  pretty vague.
- "long polling" can wait a maximum of 20 seconds for a message.
- SQS is charged per-request - polling (and receiving nothing) is
  chargeable.
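To put the per-request charging in perspective, here is a rough count of the receive requests a single idle consumer makes in a month. This is back-of-the-envelope arithmetic rather than a billing calculator: the function names and the 30-day month are assumptions for illustration, and the price is left as a parameter to be filled in from the current SQS pricing page.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def idle_polls_per_month(seconds_per_request, days=30):
    """Receive requests one idle consumer issues in a month.

    seconds_per_request is how long each ReceiveMessage call takes when the
    queue is empty: 20 for a maximum-length long poll, or the sleep
    interval between calls when short polling.
    """
    return (days * SECONDS_PER_DAY) // seconds_per_request


def idle_cost(nodes, seconds_per_request, price_per_million_requests):
    """Monthly cost of doing nothing; the price is an input, not a known value."""
    requests = nodes * idle_polls_per_month(seconds_per_request)
    return requests / 1_000_000 * price_per_million_requests


if __name__ == "__main__":
    print(idle_polls_per_month(20))  # long polling with 20 second waits
    print(idle_polls_per_month(1))   # short polling once a second
```

With 20-second long polls that works out to 129,600 requests per consumer per month before a single message has been sent, and every node listening for requests would be polling like this.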
I also ran into trouble automating subscriptions of SQS queues to SNS
topics - while it worked, it was possible to end up with duplicate
subscriptions that eventually resulted in silent failures.
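One way to guard against duplicate subscriptions is to check what is already subscribed before subscribing again. The sketch below is not taken from my connector code: `ensure_subscription` is a hypothetical helper, and it assumes `sns` is a boto3 SNS client (or anything exposing the same `list_subscriptions_by_topic` and `subscribe` calls).

```python
def ensure_subscription(sns, topic_arn, queue_arn):
    """Subscribe queue_arn to topic_arn unless it already is; return the ARN.

    `sns` is assumed to behave like a boto3 SNS client.
    """
    kwargs = {"TopicArn": topic_arn}
    while True:
        # Walk every page of existing subscriptions for this topic.
        page = sns.list_subscriptions_by_topic(**kwargs)
        for sub in page.get("Subscriptions", []):
            if sub["Protocol"] == "sqs" and sub["Endpoint"] == queue_arn:
                return sub["SubscriptionArn"]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    # Nothing matched, so create the subscription.
    response = sns.subscribe(
        TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn
    )
    return response["SubscriptionArn"]
```

This is check-then-act, so two nodes running it at the same moment could still race each other; it narrows the window rather than closing it.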
None of these issues is a show-stopper, as long as you can accept
MCollective latency being measured in seconds instead of
milliseconds.
Future Possibilities?
I'm not planning to spend any more time on this. If I were, I'd look
at the following:
- Add a thread to handle long polling messages from SQS into a buffer
- Dedupe received messages to ensure each is delivered to mco only
  once
- Potentially fiddle order of received messages (whether or not this
  really matters will depend on your use)
- Carefully groom SNS topics and subscriptions
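The first two ideas could look something like the sketch below. It is Python for illustration only (a real connector would be Ruby), it is not what my plugin does, and `DedupingPoller` is a made-up name. It assumes `sqs` is a boto3 SQS client or something with the same `receive_message` and `delete_message` calls, and it dedupes on the SQS `MessageId`, which only catches redelivery of the same SQS message; an identifier from the MCollective payload might be a better key. Ordering is not addressed.

```python
import queue
import threading
from collections import OrderedDict


class DedupingPoller:
    """Long-poll an SQS queue in a background thread into a local buffer."""

    def __init__(self, sqs, queue_url, remember=10000):
        self.sqs = sqs                   # assumed boto3-style SQS client
        self.queue_url = queue_url
        self.buffer = queue.Queue()      # what the connector would read from
        self._seen = OrderedDict()       # recently seen MessageIds, oldest first
        self._remember = remember
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # May block until the in-flight long poll returns.
        self._stop.set()
        self._thread.join()

    def _is_duplicate(self, message_id):
        if message_id in self._seen:
            return True
        self._seen[message_id] = True
        if len(self._seen) > self._remember:
            self._seen.popitem(last=False)   # forget the oldest id
        return False

    def poll_once(self):
        response = self.sqs.receive_message(
            QueueUrl=self.queue_url,
            MaxNumberOfMessages=10,   # SQS maximum per request
            WaitTimeSeconds=20,       # SQS maximum long-poll wait
        )
        for message in response.get("Messages", []):
            if not self._is_duplicate(message["MessageId"]):
                self.buffer.put(message["Body"])
            # Delete duplicates too, or they return after the visibility timeout.
            self.sqs.delete_message(
                QueueUrl=self.queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )

    def _run(self):
        while not self._stop.is_set():
            self.poll_once()
```

The consumer side would then read from `poller.buffer` with a blocking `get()` instead of talking to SQS directly. Because messages are deleted as soon as they are buffered, anything still in the buffer is lost if the process dies, which seems an acceptable trade for short-lived mco requests.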
Alternatives
I found a project on GitHub called
ec2_collective,
which aims to build something like MCollective on top of SQS. It
claims arbitrary command execution as a feature, and I suspect it
underestimates usage charges, but it's worth a look.
As for me, I'm looking at MCollective's Redis connector. That
involves its own tradeoffs, which I'll save for another post.
Summary
You can certainly build solid, scalable applications around SNS and
SQS - but they're not a straight replacement for a traditional STOMP
or AMQP message broker. Fire-and-forget and loosely-coupled
asynchronous messaging patterns should work out alright -
orchestration, not so much.