Real-time Analytics with Storm and Cassandra (2015)

Chapter 8. Cassandra Management and Maintenance

In this chapter, we will learn about the gossip protocol of Cassandra. Thereafter, we will delve into Cassandra administration and management in terms of understanding scaling and reliability in action. This will equip you to handle situations that you would rather not come across, but that do happen in production, such as handling recoverable nodes, rolling restarts, and so on.

The topics that will be covered in the chapter are as follows:

·        Cassandra—gossip protocol

·        Cassandra scaling—adding a new node to a cluster

·        Replacing a node

·        Replication factor changes

·        Nodetool commands

·        Rolling restarts and fault tolerance

·        Cassandra monitoring tools

So, this chapter will help you understand the basics of Cassandra, as well as the various options required for the maintenance and management of Cassandra activities.

Cassandra – gossip protocol

Gossip is a protocol wherein the nodes periodically exchange information with other nodes about the nodes they know; this way, all the nodes obtain information about each other via this peer-to-peer communication mechanism. It's very similar to gossip in the real world and on social media.

Cassandra executes this mechanism every second, and one node is capable of exchanging gossip information with up to three nodes in the cluster. All these gossip messages have a version associated with them to track the chronology, and the older gossip interaction updates are overwritten chronologically by newer ones.
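The chronology tracking can be sketched in a few lines of Python. This is an illustrative model, not Cassandra's actual implementation; the endpoint addresses and the (generation, version) pair used here are assumptions made for the example (generation changes when a node restarts, version increments with every gossip round):

```python
# Simplified sketch of gossip state merging: each node keeps a map of
# endpoint -> (generation, version). On receiving gossip, a node keeps
# whichever state is chronologically newer, so older updates are overwritten.

def newer(a, b):
    """Return the chronologically newer of two (generation, version) states."""
    return a if a >= b else b  # tuple comparison: generation first, then version

def merge_gossip(local, remote):
    """Merge a remote node's view of the cluster into the local view."""
    merged = dict(local)
    for endpoint, state in remote.items():
        merged[endpoint] = newer(merged.get(endpoint, (0, 0)), state)
    return merged

local  = {"10.0.0.1": (5, 120), "10.0.0.2": (3, 77)}
remote = {"10.0.0.2": (4, 10),  "10.0.0.3": (2, 900)}  # .2 restarted: newer generation wins
view = merge_gossip(local, remote)
```

Note how the newer generation of 10.0.0.2 wins even though its version number is lower; this is why a restarted node's fresh state overrides stale pre-restart gossip.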

Now that we know what Cassandra's gossip is like at a very high level, let's have a closer look at it and understand the purpose of this chatty protocol. Here are the two broad purposes served by having this in place:

·        Bootstrapping

·        Failure scenario handling—detection and recovery

Let's understand what they mean in action and what their contribution is towards the well-being and stability of a Cassandra cluster.

Bootstrapping

Bootstrapping is a process that is triggered in a cluster when a node joins the ring for the first time. It's the seed nodes that we define under the cassandra.yaml configuration file that help the new node obtain information about the cluster, ring, keyspaces, and partition ranges. It's recommended that you keep the seed node settings the same throughout the cluster; otherwise, you could run into partitions within the cluster. A node remembers which nodes it has gossiped with even after it restarts. One more point to remember about seed nodes is that their purpose is to serve nodes at the time of bootstrap; beyond this, a seed node is neither a single point of failure, nor does it serve any other purpose.

Failure scenario handling – detection and recovery

Well, the gossip protocol is Cassandra's own efficient way of knowing when a failure has occurred; that is, the entire ring gets to know about a downed host through gossip. Conversely, when a node joins the cluster, the same mechanism is employed to inform all the nodes in the ring.

Once Cassandra detects the failure of a node on the ring, it stops routing client requests to it. A failure definitely has some impact on the overall performance of the cluster; however, it's never a blocker as long as we have enough replicas for the required consistency to be served to the client.

Another interesting fact about gossip is that it happens at various levels—Cassandra gossip, like real-world gossip, could be secondhand or thirdhand and so on; this is the manifestation of indirect gossip.

Failure of a node could be actual or virtual. This means that either a node can actually fail due to its hardware giving way, or the failure could be virtual, wherein, for a while, network latency is so high that it seems the node is not responding. The latter scenario is, most of the time, self-recoverable; that is, after a while, the network returns to normalcy and the node is detected in the ring once again. The live nodes keep trying to ping and gossip with the failed node periodically to see whether it is up. If a node is to be declared permanently departed from the cluster, some admin intervention is required to explicitly remove the node from the ring.
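The distinction between a "virtual" and a permanent failure can be illustrated with a toy failure detector in Python. Cassandra actually uses a phi accrual failure detector that adapts its suspicion level to observed heartbeat intervals; this sketch uses a fixed timeout, and the endpoint names are invented, purely to show how a node can be suspected down and then recover on its own once heartbeats resume:

```python
import time

# Toy failure detector: a node is only *suspected* down when no heartbeat has
# been heard within the timeout, and is considered up again as soon as a new
# heartbeat arrives. Permanent removal still needs explicit admin action.

class SimpleFailureDetector:
    def __init__(self, timeout_seconds=10.0):
        self.timeout = timeout_seconds
        self.last_heard = {}  # endpoint -> timestamp of last gossip heartbeat

    def heartbeat(self, endpoint, now=None):
        self.last_heard[endpoint] = time.time() if now is None else now

    def is_alive(self, endpoint, now=None):
        now = time.time() if now is None else now
        last = self.last_heard.get(endpoint)
        return last is not None and (now - last) <= self.timeout

fd = SimpleFailureDetector(timeout_seconds=10.0)
fd.heartbeat("10.0.0.5", now=100.0)
# At t=105 the node is still considered up; at t=120 it is suspected down.
# A later heartbeat (say a latency spike ending) marks it up again.
```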

When a node rejoins the cluster after quite a while, it's possible that it has missed a couple of writes (inserts/updates/deletes), and thus the data on the node is far from accurate as per the latest state of the data. It's advisable to run a repair using the nodetool repair command.

Cassandra cluster scaling – adding a new node

Cassandra scales very easily, and with zero downtime. This is one of the reasons why it is chosen over many other contenders. The steps are pretty straightforward and simple:

1.    You need to set up Cassandra on the nodes to be added. Don't start the Cassandra process yet; first, follow these steps:

1.    Update the seed nodes in cassandra.yaml under seed_provider.

2.    Make sure the tmp folders are clean.

3.    Add auto_bootstrap to cassandra.yaml and set it to true.

4.    Update cluster_name in cassandra.yaml.

5.    Update listen_address/broadcast_address in cassandra.yaml.

2.    Start all the new nodes one by one, pausing for at least 5 minutes between two consecutive starts.

3.    Once a node is started, it will claim its share of data based on the token range it owns and start streaming it in. This can be verified using the nodetool netstats command, as shown in the following code:

mydomain@my-cass1:/home/ubuntu$ /usr/local/cassandra/apache-cassandra-1.1.6/bin/nodetool -h 10.3.12.29 netstats | grep -v 0%
Mode: JOINING
Not sending any streams.
Streaming from: /10.3.12.179
my_keyspace: /var/lib/cassandra/data/my_keyspace/mycf/my_keyspace-mycf-hf-461279-Data.db sections=1 progress=2382265999194/3079619547748 - 77%
Pool Name                    Active   Pending      Completed
Commands                        n/a         0             33
Responses                       n/a         0       13575829
mydomain@my-cass1:/home/ubuntu$

4.    After all the nodes have joined the cluster, it's strictly recommended that you run the nodetool cleanup command on all the pre-existing nodes. This is recommended so that they relinquish control of the keys that were formerly owned by them but now belong to the new nodes that have joined the cluster. Here is the command and the execution output:

mydomain@my-cass3:/usr/local/cassandra/apache-cassandra-1.1.6/bin$ sudo -bE ./nodetool -h 10.3.12.178 cleanup my_keyspace mycf_index
mydomain@my-cass3:/usr/local/cassandra/apache-cassandra-1.1.6/bin$ du -h /var/lib/cassandra/data/my_keyspace/mycf_index/
53G  /var/lib/cassandra/data/my_keyspace/mycf_index/
mydomain@my-cass3:/usr/local/cassandra/apache-cassandra-1.1.6/bin$ jps
27389 Jps
26893 NodeCmd
17925 CassandraDaemon

5.    Note that the NodeCmd process is actually the cleanup process for the Cassandra daemon. The disk space reclaimed after the cleanup on the preceding node is shown here:

Size before cleanup – 57G

Size after cleanup – 30G
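The percentage ownership figures printed by nodetool ring can be reproduced from the tokens themselves. The following Python sketch is illustrative; it assumes the RandomPartitioner's token space of 0 to 2^127 (the partitioner used in this chapter's 1.1.x examples), where each node owns the ring segment between the previous token (exclusive, wrapping around the ring) and its own token (inclusive):

```python
# Derive per-node ring ownership fractions from a list of tokens,
# assuming RandomPartitioner's token space of 0 .. 2**127.

TOKEN_SPACE = 2 ** 127

def ownership(tokens):
    """Map each token to the fraction of the ring its node owns."""
    ordered = sorted(tokens)
    result = {}
    for i, t in enumerate(ordered):
        prev = ordered[i - 1]  # for i == 0 this wraps to the last token
        result[t] = ((t - prev) % TOKEN_SPACE) / TOKEN_SPACE
    return result

tokens = [0,
          42535295865117307932921825928971026432,
          85070591730234615865843651857942052864]
shares = ownership(tokens)  # yields the 50% / 25% / 25% split seen in nodetool ring
```

This also explains why evenly spaced tokens give a balanced cluster: ownership is purely a function of the gap between consecutive tokens on the ring.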

Cassandra cluster – replacing a dead node

This section captures the various situations and scenarios that can occur and cause failures in a Cassandra cluster, and equips you with the steps to handle them. These situations are specific to version 1.1.6, but can be applied to other versions as well.

Say, this is the problem: you're running an n-node cluster (for example, a three-node cluster), and one of the nodes goes down due to an unrecoverable hardware failure. The solution is this: replace the dead node with a new node.

The following are the steps to achieve the solution:

1.    Confirm the node failure using the nodetool ring command:

bin/nodetool ring -h hostname

2.    The dead node will be shown as DOWN; let's assume node3 is down:

192.168.1.54 datacenter1 rack1 Up   Normal 755.25 MB 50.00% 0
192.168.1.55 datacenter1 rack1 Down Normal 400.62 MB 25.00% 42535295865117307932921825928971026432
192.168.1.56 datacenter1 rack1 Up   Normal 793.06 MB 25.00% 85070591730234615865843651857942052864

3.    Install and configure Cassandra on the replacement node. Make sure we remove the old installation, if any, from the replacement node using the following command:

sudo rm -rf /var/lib/cassandra/*

Here, /var/lib/cassandra is the path of the Cassandra data directory.

4.    Configure cassandra.yaml so that it holds the same non-default settings as those of the pre-existing Cassandra cluster.

5.    Set initial_token in the cassandra.yaml file of the replacement node to the value of the dead node's token minus 1, that is, 42535295865117307932921825928971026431.

6.    Start the new node; it will join the cluster one place prior to the dead node in the ring:

192.168.1.54 datacenter1 rack1 Up   Normal 755.25 MB 50.00% 0
192.168.1.51 datacenter1 rack1 Up   Normal 400.62 MB 0.00%  42535295865117307932921825928971026431
192.168.1.55 datacenter1 rack1 Down Normal 793.06 MB 25.00% 42535295865117307932921825928971026432
192.168.1.56 datacenter1 rack1 Up   Normal 793.06 MB 25.00% 85070591730234615865843651857942052864

7.    We are almost done. Just run nodetool repair on each node for each keyspace:

nodetool repair -h 192.168.1.54 keyspace_name -pr
nodetool repair -h 192.168.1.51 keyspace_name -pr
nodetool repair -h 192.168.1.56 keyspace_name -pr

8.    Remove the token of the dead node from the ring using the following command:

nodetool removetoken 42535295865117307932921825928971026432

This command needs to be executed on all the remaining nodes to make sure that all the live nodes know that the dead node is no longer available.

9.    This removes the dead node from the cluster; now we are done.
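The token arithmetic behind step 5 can be made explicit with a tiny, purely illustrative sketch: the replacement node bootstraps at the dead node's token minus 1, so it sits immediately before the dead node on the ring and takes over (almost all of) its range, after which the dead node's own token is removed with nodetool removetoken:

```python
# Compute the initial_token for a node replacing a dead node: one position
# before the dead node's token, wrapping around the RandomPartitioner ring.

TOKEN_SPACE = 2 ** 127

def replacement_token(dead_token):
    """Token for a replacement node, one position before the dead node."""
    return (dead_token - 1) % TOKEN_SPACE

dead = 42535295865117307932921825928971026432
new = replacement_token(dead)  # one less than the dead node's token
```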

The replication factor

Occasionally, we come across situations where we need to change the replication factor. For example, I started with a smaller cluster, so I kept my replication factor as 2. Later, I scaled out from 4 nodes to 8 nodes, and thus, to make my entire setup more fail-safe, I increased my replication factor to 4. In such situations, the following steps are to be followed:

1.    The following is the command to update the replication factor and/or change the replication strategy. Execute this command in the CQL shell:

ALTER KEYSPACE my_keyspace WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 4 };

2.    Once the keyspace has been updated, you have to execute nodetool repair on each of the nodes one by one (in succession) so that all the keys are correctly replicated as per the new replication factor:

sudo -bE ./nodetool -h 10.3.12.29 repair my_keyspace mycf -pr

mydomain@my-cass3:/home/ubuntu$ sudo -E /usr/local/cassandra/apache-cassandra-1.1.6/bin/nodetool -h 10.3.21.29 compactionstats
pending tasks: 1
compaction type  keyspace     column family  bytes compacted  bytes total   progress
Validation       my_keyspace  mycf           1826902206       761009279707  0.24%
Active compaction remaining time :        n/a
mydomain@my-cass3:/home/ubuntu$

The preceding compactionstats command is used to track the progress of the nodetool repair command.
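The motivation for raising the replication factor can be made concrete with a little quorum arithmetic. This sketch illustrates the standard quorum formula, quorum = RF/2 + 1 (integer division), which governs how many replica failures quorum reads and writes can tolerate:

```python
# Quorum math for a given replication factor (RF): quorum operations succeed
# as long as quorum replicas are up, so RF - quorum failures are tolerated.

def quorum(rf):
    """Number of replicas that must respond for a quorum operation."""
    return rf // 2 + 1

def tolerated_failures(rf):
    """Replica failures a quorum read/write can survive."""
    return rf - quorum(rf)

# RF = 2: quorum is 2, so losing a single replica blocks quorum operations.
# RF = 4: quorum is 3, so one replica can fail and quorum still succeeds.
```

This is why moving from RF 2 to RF 4, as in the example above, makes the setup more fail-safe: quorum operations go from tolerating zero replica failures to tolerating one.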

The nodetool commands

The nodetool command in Cassandra is the handiest tool in the hands of a Cassandra administrator. It has all the tools and commands required for all types of situational handling on various nodes. Let's look closely at a few widely used ones:

·        Ring: This command depicts the state of the nodes (normal, down, leaving, joining, and so on), along with the token range ownership, the percentage ownership of the keys, and the data center and rack details:

·        bin/nodetool -host 192.168.1.54 ring

The output will be something like this:

192.168.1.54 datacenter1 rack1 Up   Normal 755.25 MB 50.00% 0
192.168.1.51 datacenter1 rack1 Up   Normal 400.62 MB 0.00%  42535295865117307932921825928971026431
192.168.1.55 datacenter1 rack1 Down Normal 793.06 MB 25.00% 42535295865117307932921825928971026432
192.168.1.56 datacenter1 rack1 Up   Normal 793.06 MB 25.00% 85070591730234615865843651857942052864

·        Join: This is the option you can use with nodetool to add a new node to the cluster. When a new node joins the cluster, it starts streaming the data from other nodes until it receives all the keys as per its designated ownership based on its token in the ring. The status of this can be checked using the netstats command:

mydomain@my-cass3:/home/ubuntu$ /usr/local/cassandra/apache-cassandra-1.1.6/bin/nodetool -h 10.3.12.29 netstats | grep -v 0%
Mode: JOINING
Not sending any streams.
Streaming from: /10.3.12.179
my_keyspace: /var/lib/cassandra/data/my_keyspace/mycf/my_keyspace-mycf-hf-46129-Data.db sections=1 progress=238226599194/307961954748 - 77%
Pool Name                    Active   Pending      Completed
Commands                        n/a         0             33
Responses                       n/a         0       13575829

·        Info: This nodetool option gets all the required information about the node specified in the following command:

bin/nodetool -host 10.176.0.146 info
Token            : 137462771597874153173150284137310597304
Load Info        : 0 bytes.
Generation No    : 1
Uptime (seconds) : 697595
Heap Memory (MB) : 28.18 / 759.81

·        Cleanup: This is the option that is generally used when we scale the cluster. New nodes are added, and the existing nodes need to relinquish control of the keys that now belong to the new entrants in the cluster:

mydomain@my-cass3:/usr/local/cassandra/apache-cassandra-1.1.6/bin$ sudo -bE ./nodetool -h 10.3.12.178 cleanup my_keyspace mycf_index
mydomain@my-cass3:/usr/local/cassandra/apache-cassandra-1.1.6/bin$ du -h /var/lib/cassandra/data/my_keyspace/mycf_index/
53G  /var/lib/cassandra/data/my_keyspace/mycf_index/
mydomain@my-cass3:/usr/local/cassandra/apache-cassandra-1.1.6/bin$ sudo `which jps`
27389 Jps
26893 NodeCmd
17925 CassandraDaemon


·        Compaction: This is one of the most useful tools. It's used to explicitly issue the compact command to Cassandra. This can be done for the entire node, a keyspace, or at the column family level:

sudo -bE /usr/local/cassandra/apache-cassandra-1.1.6/bin/nodetool -h 10.3.1.24 compact
mydomain@my-cass3:/home/ubuntu$ sudo -E /usr/local/cassandra/apache-cassandra-1.1.6/bin/nodetool -h 10.3.1.24 compactionstats
pending tasks: 1
compaction type  keyspace     column family  bytes compacted  bytes total    progress
Compaction       my_keyspace  mycf           1236772          1810648499806  0.00%
Active compaction remaining time : 29h58m42s
mydomain@my-cass3:/home/ubuntu$

Cassandra has two types of compaction: minor compaction and major compaction. The minor compaction cycle gets executed whenever a new SSTable is created, and it removes eligible tombstones (that is, deleted entries) along the way.

Major compaction is triggered manually, using the preceding nodetool command. It can be applied at the node, keyspace, or column family level.
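Conceptually, a major compaction merges SSTables so that only the newest version of each key survives and tombstones can be purged. The following toy Python sketch models SSTables as dicts of key to (timestamp, value); this is an assumption made for illustration only, since real SSTables are sorted on-disk files and tombstones are only purged after gc_grace_seconds:

```python
# Toy model of compaction: merge SSTables, newest timestamp wins per key,
# then drop keys whose surviving entry is a tombstone (deletion marker).

TOMBSTONE = None

def compact(*sstables):
    """Merge SSTables; the newest write per key wins; purge tombstoned keys."""
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return {k: v for k, v in merged.items() if v[1] is not TOMBSTONE}

old = {"a": (1, "x"), "b": (1, "y"), "c": (1, "z")}
new = {"b": (2, TOMBSTONE), "c": (2, "z2")}
result = compact(old, new)  # "b" is purged, "c" keeps the newer value
```

The timestamp-wins merge is also why deleting in Cassandra writes a tombstone instead of removing data in place: the marker must outlive older copies of the row on other SSTables until compaction reconciles them.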

·        Decommission: This is, in a way, the opposite of bootstrap and is triggered when we want a node to leave the cluster. The moment a live node receives the command, it stops accepting new writes, flushes the memtables, and starts streaming the data from itself to the nodes that will be the new owners of the key range it currently owns:

·        bin/nodetool -h 192.168.1.54 decommission

·        Removenode: This command is executed when a node is dead, that is, physically unavailable. It informs the other nodes about the node being unavailable. Cassandra replication then kicks into action to restore the correct replication by creating copies of the data as per the new ring ownership:

bin/nodetool removenode <UUID>
bin/nodetool removenode force

·        Repair: The nodetool repair command is executed to fix the data on any node. This is a very important tool for ensuring data consistency, especially for nodes that rejoin the cluster after a period of downtime. Let's assume a cluster with four nodes that is catering to continuous writes through a Storm topology. Here, one of the nodes goes down and joins the ring again after an hour or two. During this period, the node might have missed some writes; to fix the data, we should execute a repair command on the node:

bin/nodetool repair
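The idea behind repair is that replicas compare digests of ranges of data rather than raw rows, and only the ranges whose digests differ need to be synchronized. Cassandra exchanges Merkle trees for this; the flat per-range digest below is a simplified, illustrative stand-in (the range ids and row data are invented for the example):

```python
import hashlib

# Simplified repair comparison: each replica hashes every range of keys it
# holds, and only ranges whose digests differ need to be streamed/synced.
# (Cassandra actually exchanges Merkle trees for hierarchical comparison.)

def range_digest(rows):
    """Digest of one range: hash of its sorted key=value pairs."""
    h = hashlib.sha256()
    for key in sorted(rows):
        h.update(f"{key}={rows[key]}".encode())
    return h.hexdigest()

def ranges_to_repair(replica_a, replica_b):
    """Range ids whose contents differ between two replicas."""
    ids = set(replica_a) | set(replica_b)
    return sorted(r for r in ids
                  if range_digest(replica_a.get(r, {})) != range_digest(replica_b.get(r, {})))

a = {0: {"k1": "v1"}, 1: {"k2": "v2"}}
b = {0: {"k1": "v1"}, 1: {"k2": "stale"}}
out = ranges_to_repair(a, b)  # only the differing range needs streaming
```

Comparing digests instead of rows is what makes repair feasible over large datasets: identical ranges cost one hash exchange, and only divergent data is shipped across the network.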

Cassandra fault tolerance

Well, one of the prime reasons for using Cassandra as a data store is its fault-tolerance capability. It's not driven by a typical master-slave architecture, where the failure of the master becomes a single point of system breakdown. Instead, it operates on a ring model, so there is no single point of failure. Whenever required, we can restart nodes without the dread of bringing the whole cluster down; there are various situations where this capability comes in handy.

There are situations where we need to restart Cassandra, and Cassandra's ring architecture equips the administrator to do this seamlessly, with zero downtime for the cluster. This means that in situations such as the following, which require a Cassandra cluster to be restarted, an administrator can restart the nodes one by one instead of bringing down the entire cluster and then starting it again:

·        Starting the Cassandra daemon with changes in the memory configuration

·        Enabling JMX on an already running Cassandra cluster

·        Performing routine machine maintenance that requires a restart

Cassandra monitoring systems

Now that we have discussed the various management aspects of Cassandra, let's explore the various dashboarding and monitoring options for the Cassandra cluster. There are various free and licensed tools available that we'll discuss now.

JMX monitoring

You can monitor Cassandra using jconsole, which connects to the node over JMX. Here are the steps to connect to Cassandra using jconsole:

1.    From the command prompt, execute the jconsole command:

(screenshot: JMX monitoring)

2.    In the next step, you have to specify the Cassandra node IP and port for connectivity:

(screenshot: JMX monitoring)

3.    Once you are connected, JMX provides a variety of graphs and monitoring utilities:

(screenshot: JMX monitoring)

Developers can monitor heap memory usage using the jconsole Memory tab. This will help you understand the utilization of node resources.

The limitation of jconsole is that it performs node-specific monitoring, not Cassandra-ring-based monitoring and dashboarding. Let's explore other tools in this context.

DataStax OpsCenter

This is a DataStax-provided utility with a graphical interface that lets the user monitor and execute administrative activities from one central dashboard. Note that the free version is available only for nonproduction usage.

DataStax OpsCenter provides a lot of graphical representations of various important system Key Performance Indicators (KPIs), such as performance trends, summaries, and so on. Its UI also provides historical data analysis and the capability to drill down on single data points. OpsCenter stores all its metrics in Cassandra itself. The key features of the OpsCenter utility are as follows:

·        KPI-based monitoring for the entire cluster

·        Alerts and alarms

·        Configuration management

·        Easy to set up

You can install and set up OpsCenter using the following simple steps:

1.    Run the following command to get started:

$ sudo service opscenterd start

2.    Connect to OpsCenter in a web browser at http://localhost:8888.

3.    You will get a welcome screen where you will have options to spawn a new cluster or connect to an existing one.

4.    Next, configure the agent; once this is done, OpsCenter is ready for use.

Here is a screenshot from the application:

(screenshot: DataStax OpsCenter)

Here, we choose the metric to be monitored and whether the operation is to be performed on a specific node or on all the nodes. The following screenshot captures OpsCenter starting up and recognizing the various nodes in the cluster:

(screenshot: DataStax OpsCenter)

The following screenshot captures various KPIs covering reads and writes to the cluster, the overall cluster latency, disk I/O, and so on:

(screenshot: DataStax OpsCenter)

Quiz time

Q.1. State whether the following statements are true or false.

1.    Cassandra has a single point of failure.

2.    A dead node is immediately detected in a Cassandra ring.

3.    Gossip is a data exchange protocol.

4.    The decommission and removenode commands are the same.

Q.2. Fill in the blanks.

1.    _______________ is the command used to run compactions.

2.    _______________ is the command to get the information about a live node.

3.    ___________ is the command that displays the entire cluster information.

Q.3. Execute the following use case to see Cassandra's high availability and replication in action:

1.    Create a four-node Cassandra cluster.

2.    Create a keyspace with a replication factor of 3.

3.    Bring down the Cassandra daemon on one of the nodes.

4.    Execute netstats on each node to see the data streaming.

Summary

In this chapter, you learned about the gossip protocol and the tools and techniques used for various scenarios such as scaling the cluster, replacing a dead node, compaction, and repair operations on Cassandra.

In the next chapter, we will discuss Storm cluster maintenance and operational aspects.