
[Guide] Setting up RabbitMQ cluster over AWS EC2 instances

I could not find a proper guide on the web and had to troubleshoot my way through the setup. Hence this write-up!

Working with production systems has led me to a couple of realizations:

“There are some tasks that take time, and there are some that don’t.”

and also

“There are some tasks that are the need of the hour, while there are some that are not”.

Most of the applications we work with use APIs, so API response time is a major metric for evaluating application performance, which in turn adds to the business value. To keep the response time short, the ideal approach would be to execute only the tasks the response needs and to take as little time as possible. But we don’t live in an ideal world. Maybe we want to send emails on an event, or we need to notify some other service. Is it fair to let your user wait for all that? Of course not. Those tasks need to be queued for later execution so that the need of the hour can be addressed first.

To deal with this, I’ve been using RabbitMQ as a middleman between applications and asynchronous task workers. Here, we’ll look at how to set up RabbitMQ, along with some of the tricky issues I faced while setting up a RabbitMQ cluster on AWS EC2 instances.

The definitions:

RabbitMQ is open source message broker software or, in simple terms, a queue manager.

Applications or services connect to RabbitMQ and can then either add messages/tasks to a queue (publisher) or fetch them from a queue (subscriber). A communication platform for microservices, you can say.

Each running instance of the RabbitMQ application is a node.

A RabbitMQ cluster (or broker) can consist of multiple nodes. All of the nodes share configuration information, such as users, exchanges, and queues.
Here, we are going to cluster EC2 instances (nodes) running RabbitMQ.

RabbitMQ is an implementation of AMQP, the emerging standard for high-performance enterprise messaging. The RabbitMQ server is a robust and scalable implementation of an AMQP broker.

Why clustering?

I’ve set up RabbitMQ with a couple of our applications at Aubergine Solutions to queue our async tasks. The tasks are then received by Celery workers and executed. With distributed production servers, we face two major concerns: monitoring all the tasks being queued onto RabbitMQ and, more importantly, eliminating a single point of failure.

Therefore, to avoid a single point of failure and keep the system performant even at scale, we form a cluster of RabbitMQ nodes.

How?

Installing RabbitMQ:

sudo apt-get install rabbitmq-server

To work with RabbitMQ on the server, we’ll mostly be using rabbitmqctl, a command line tool for managing a RabbitMQ broker.

To check the status of the cluster and the node:

sudo rabbitmqctl cluster_status

The result would look something like:

[{nodes,[{disc,['rabbit@ip-172-31-33-113']}]},
 {running_nodes,['rabbit@ip-172-31-33-113']},
 {cluster_name,<<"rabbit@ip-172-31-33-113.us-west-2.compute.internal">>},
 {partitions,[]}]
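
When scripting a health check, it can help to pull the running nodes out of that output. Here is a minimal sketch, assuming the Erlang-term layout shown above, with the running_nodes entry on a single line and plain ASCII quotes as rabbitmqctl prints them (other RabbitMQ releases may print a different layout). The helper name running_nodes is my own, not part of RabbitMQ:

```shell
# Print one running node per line, reading `rabbitmqctl cluster_status`
# output on stdin. Assumes the {running_nodes,[...]} term sits on one line.
running_nodes() {
  sed -n 's/.*{running_nodes,\[\(.*\)\]}.*/\1/p' | tr -d "' " | tr ',' '\n'
}

# Example with a copy of the output shown above (no broker needed):
sample="[{nodes,[{disc,['rabbit@ip-172-31-33-113']}]},
 {running_nodes,['rabbit@ip-172-31-33-113']},
 {partitions,[]}]"
printf '%s\n' "$sample" | running_nodes
```

On a real node this would be used as: sudo rabbitmqctl cluster_status | running_nodes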

You can now use the running node to queue tasks from your application using the broker URL, which should look something like:

amqp://myuser:mypassword@127.0.0.1:5672//
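
To make the parts of that URL explicit, here is a small sketch that assembles it. The function names broker_url and create_app_user are hypothetical, and the sketch assumes the user name and password contain no characters that need URL encoding. The trailing "//" is how Celery expects the default vhost "/" to be written:

```shell
# Build a broker URL from its parts.
# $1 = user, $2 = password, $3 = host, $4 = port, $5 = vhost
broker_url() {
  printf 'amqp://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Create the user on the broker and give it full access to the "/" vhost.
# Defined only; call it on the node itself. $1 = user, $2 = password
create_app_user() {
  sudo rabbitmqctl add_user "$1" "$2" &&
    sudo rabbitmqctl set_permissions -p / "$1" ".*" ".*" ".*"
}

# A vhost of "/" produces the trailing "//" seen above.
broker_url myuser mypassword 127.0.0.1 5672 /
```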

Now, to set up a cluster with more than one node, install RabbitMQ on the other nodes following the same steps.

Suppose we have a second EC2 node:

Cluster status of node 'rabbit@ip-172-31-35-88' ...
[{nodes,[{disc,['rabbit@ip-172-31-35-88']}]},
 {running_nodes,['rabbit@ip-172-31-35-88']},
 {partitions,[]}]
...done.

Let’s attach node-2 to the cluster of node-1.

The Erlang cookie, for node authentication

RabbitMQ is an Erlang application, and for the two nodes to talk to each other they need to have the same Erlang cookie. Find the cookie on node-1:

sudo cat /var/lib/rabbitmq/.erlang.cookie

Copy the cookie and replace the one on node-2, at the same place:

sudo vim /var/lib/rabbitmq/.erlang.cookie

The cookie is a shared secret that lets Erlang nodes authenticate to each other. All the nodes in a RabbitMQ cluster must have the same Erlang cookie.

Once done, restart the service:

sudo service rabbitmq-server restart
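
If you would rather script the cookie copy than edit the file by hand, something like the following could work. It is only a sketch: install_cookie is a hypothetical helper, it has to run as root on node-2, and it assumes the cookie path shown above. Erlang expects the cookie file to be accessible by its owner only, which is why the permissions are tightened at the end:

```shell
# Write a cookie value to a cookie file and lock down its permissions.
# $1 = cookie string copied from node-1
# $2 = cookie path (defaults to the location used earlier in this guide)
install_cookie() {
  cookie_value=$1
  cookie_path=${2:-/var/lib/rabbitmq/.erlang.cookie}
  # An existing cookie file is usually read-only, so make it writable first.
  if [ -f "$cookie_path" ]; then
    chmod 600 "$cookie_path"
  fi
  printf '%s' "$cookie_value" > "$cookie_path"
  # Leave the file readable by its owner only.
  chmod 400 "$cookie_path"
}
```

After calling install_cookie with node-1's cookie, check that the file still belongs to the rabbitmq user, then restart the service as shown above.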

To cluster them together, we’ll first have to stop the RabbitMQ application on the node to be attached (node-2):

sudo rabbitmqctl stop_app

Reset the node, to detach it from its current cluster:

sudo rabbitmqctl reset

The tricky part:

On EC2, a RabbitMQ node is named after the instance’s hostname, something like rabbit@ip-XXX-XXX-XX-XX
Connecting to a node works fine if both nodes are in the same local network, but if you need to connect to a cluster outside the local EC2 network, that hostname will not resolve, and you will need to configure the hosts file on node-2:

sudo vim /etc/hosts

Then add this to your hosts file:

AA.AA.AAA.AAA ip-XXX-XX-XX-XX

Here, the address made of A’s is node-1’s public IP, and the name after it is node-1’s hostname.
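
The hosts entry can also be added from a script. The sketch below assumes the default EC2 naming, where the hostname is the private IP with dots replaced by dashes. Both function names are hypothetical, and 203.0.113.10 in the usage note is a documentation placeholder, not a real public IP:

```shell
# Derive the default EC2 private hostname from a private IPv4 address,
# e.g. 172.31.33.113 -> ip-172-31-33-113.
ec2_hostname() {
  printf 'ip-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Append "<ip> <hostname>" to a hosts file unless the hostname is already there.
# $1 = IP to map to, $2 = hostname, $3 = hosts file (default /etc/hosts)
add_host_entry() {
  entry_ip=$1
  entry_name=$2
  hosts_file=${3:-/etc/hosts}
  # Look for the hostname in any column after the address.
  if awk -v n="$entry_name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                             END { exit !found }' "$hosts_file"; then
    return 0
  fi
  printf '%s %s\n' "$entry_ip" "$entry_name" >> "$hosts_file"
}

ec2_hostname 172.31.33.113
```

Run as root on node-2, a call such as add_host_entry 203.0.113.10 "$(ec2_hostname 172.31.33.113)" would add the line once and do nothing on later runs.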

Just in case:

If something like this pops up when clustering:

Error: mnesia_unexpectedly_running

Remember, the node needs to be stopped before clustering:

sudo rabbitmqctl stop_app

Erlang nodes use two kinds of ports: the port of epmd (the Erlang port mapper daemon), 4369, for discovering other Erlang nodes, and a port picked from a dynamic range for the actual communication between nodes.

Now, obviously, we do not want to open up a wide range of ports, so we need to configure which range Erlang should use. For a test Erlang node, that can be done using:

erl -sname test -setcookie mycookie -kernel inet_dist_listen_min 44001 inet_dist_listen_max 44001

To check running epmd nodes:

$ epmd -names
 epmd: up and running on port 4369 with data:
     name test at port 44001
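
To check from a script that a node really landed on the expected port, the epmd output can be filtered. This is a sketch that assumes the "name ... at port ..." line format shown above. The helper name epmd_port is hypothetical:

```shell
# Print the distribution port that epmd reports for a given node name.
# Reads `epmd -names` output on stdin; $1 = short node name (e.g. "rabbit").
epmd_port() {
  awk -v n="$1" '$1 == "name" && $2 == n { print $5 }'
}

# Example with a copy of the output shown above:
printf '%s\n' "epmd: up and running on port 4369 with data:" \
              "name test at port 44001" | epmd_port test
```

On a RabbitMQ node the equivalent check would be: epmd -names | epmd_port rabbit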

RabbitMQ runs on Erlang, so you still need to configure it to use the same kernel values for its own node.

sudo vim /etc/rabbitmq/rabbitmq.config

You’ll need to add the following:

[
 {rabbit, [
 {cluster_nodes, {['rabbit@mynode1', 'rabbit@mynode2'], disc}},
 {cluster_partition_handling, ignore},
 {default_user, <<"guest">>},
 {default_pass, <<"guest">>}
 ]},
 {kernel, [
 {inet_dist_listen_max, 44001},
 {inet_dist_listen_min, 44001}
 ]}
].
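
With the range pinned, only port 4369 and the configured distribution port need to be reachable between the nodes (clients additionally need 5672, and the management UI uses 15672). I opened the ports by hand, but if you use the AWS CLI, a helper like the one below could print the matching security group commands for review. This is an assumption on my part rather than something from my setup: print_ingress_rules is a hypothetical name, and the group ID and address are placeholders:

```shell
# Print (not run) AWS CLI commands that open the clustering ports to one peer.
# $1 = security group ID, $2 = peer address in CIDR form,
# $3/$4 = the inet_dist_listen_min/max values from rabbitmq.config
print_ingress_rules() {
  sg_id=$1
  peer_cidr=$2
  # A single port is written as "44001", a range as "44001-44010".
  if [ "$3" = "$4" ]; then dist_ports=$3; else dist_ports="$3-$4"; fi
  for port_spec in 4369 "$dist_ports"; do
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' \
      "$sg_id" "$port_spec" "$peer_cidr"
  done
}

# Placeholder group ID and peer address; replace with your own values.
print_ingress_rules sg-0123456789abcdef0 203.0.113.20/32 44001 44001
```

Printing the commands instead of running them keeps the step reviewable before anything changes in the security group.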

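And finally, for the action. Join the stopped node-2 to node-1 (replace node1 with node-1’s hostname) and start the application again:

sudo rabbitmqctl join_cluster rabbit@node1
sudo rabbitmqctl start_app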
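
Put together, the whole sequence for the joining node can be wrapped in one function. This is a sketch, not a hardened script: join_node_to_cluster and the RABBITMQCTL variable are names I made up, and the variable exists so the steps can be previewed with echo before touching a real node:

```shell
# Command used to talk to the broker; override it (e.g. with "echo rabbitmqctl")
# to preview the steps without touching a real node.
RABBITMQCTL=${RABBITMQCTL:-sudo rabbitmqctl}

# Join the local node to the cluster that $1 (e.g. rabbit@node1) belongs to.
# Each step runs only if the previous one succeeded.
join_node_to_cluster() {
  target_node=$1
  $RABBITMQCTL stop_app &&
    $RABBITMQCTL reset &&
    $RABBITMQCTL join_cluster "$target_node" &&
    $RABBITMQCTL start_app &&
    $RABBITMQCTL cluster_status
}
```

Run on node-2 as join_node_to_cluster rabbit@node1. The final cluster_status call should then list both nodes under running_nodes.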

What next?

Now, what if one needs to check the cluster or node status, queues, or message rates?

rabbitmqctl on the command line is always an option, but there’s a more intuitive option too: the RabbitMQ management console.

As the official documentation says: The rabbitmq-management plugin provides an HTTP-based API for management and monitoring of your RabbitMQ server, along with a browser-based UI and a command line tool, rabbitmqadmin.

The plugin is included with the distribution and can be enabled by:

rabbitmq-plugins enable rabbitmq_management

The Web UI is located at: http://server-name:15672/

The UI provides visual monitoring of the RabbitMQ node’s status as well as graphs of incoming and outgoing tasks.
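
The same plugin also serves its data over the HTTP API on that port, which is handy for scripts. Here is a minimal sketch using curl. The names mgmt_url and mgmt_get are hypothetical, and it assumes a user that is allowed to use the management API (the default guest user can normally only log in from localhost):

```shell
# Build a management API URL. $1 = host, $2 = API path (e.g. nodes, queues).
mgmt_url() {
  printf 'http://%s:15672/api/%s\n' "$1" "$2"
}

# Fetch an API resource as JSON. $1 = host, $2 = path, $3 = "user:password".
# Defined only; it needs a reachable broker to return anything.
mgmt_get() {
  curl -s -u "$3" "$(mgmt_url "$1" "$2")"
}

mgmt_url server-name nodes
```

For example, mgmt_get server-name queues myuser:mypassword would return the queue list as JSON, and the nodes path returns one entry per cluster node.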

That is all. Hope it helps!