
Apache Spark Cassandra Read and Write


Spark and Cassandra make a good combination for many kinds of data processing. This quick start covers connecting Spark to Cassandra and then reading and writing data through that connection.


1.Prerequisites for Spark Cassandra connectivity

apache-cassandra-2.2.3

spark-1.5.1-bin-hadoop2.6

jdk1.7.0_45

cassandra-driver-core-2.1.5.jar

spark-cassandra-connector_2.10-1.5.0-M1.jar


1.1 Setting the environment

[Image: environment variable settings]

Note: You can also set these variables in .bash_profile or .bashrc.

2.Apache Spark standalone quick start

Please refer to chennaihug.org for Spark installation:

http://chennaihug.org/knowledgebase/spark-master-and-slaves-single-node-installation/

Use the Spark version listed in section 1 rather than the one in that guide.

3.Apache Cassandra standalone quick start

Please refer to chennaihug.org for Cassandra installation:

http://chennaihug.org/knowledgebase/cassandra-single-node-installation/

Use the Cassandra version listed in section 1 rather than the one in that guide.

4.Configuration

a.Copy all the jars from apache-cassandra-2.2.3/lib to spark-1.5.1-bin-hadoop2.6/lib, then add cassandra-driver-core-2.1.5.jar to the same directory (this jar has to be downloaded separately, as listed in section 1).

b.Open spark-1.5.1-bin-hadoop2.6/conf/

Rename spark-env.sh.template to spark-env.sh and add the following two environment variables and paths:

[Image: spark-env.sh entries]

c.Start Cassandra and Spark, then check that the daemons are running with jps (Java Process Status)

[Image: jps output]

5.Keyspace and table in Cassandra

Create the keyspace and table needed for this quick start in Cassandra.

[Image: CQL statements creating the keyspace and table]

Insert some records. This quick start uses a patient dataset as input.

[Image: sample of the patient dataset]

Bulk load these records using the following COPY command in cqlsh.

[Image: COPY command]

6.Start the spark-shell

Move the downloaded "spark-cassandra-connector_2.10-1.5.0-M1.jar" into the spark-1.5.1-bin-hadoop2.6 directory, then start the shell from that directory:

bin/spark-shell --jars spark-cassandra-connector_2.10-1.5.0-M1.jar

Then run the following steps in the shell.

6.1 Configure a new sc

[Image: code for configuring the new SparkContext]
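Since the original code for this step survives only as an image, here is a minimal sketch of what configuring a new SparkContext for Cassandra typically looks like with the connector. The host address 127.0.0.1 and the application name are assumptions for a single-node setup, not values taken from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// spark-shell already created a context named sc; stop it before building a new one
sc.stop()

// SparkConf(true) also loads the spark.* settings passed in by spark-shell
val conf = new SparkConf(true)
  // assumed address of the local single-node Cassandra
  .set("spark.cassandra.connection.host", "127.0.0.1")
  // fallbacks, used only if spark-shell did not already set these
  .setIfMissing("spark.master", "local[2]")
  .setIfMissing("spark.app.name", "cassandra-quickstart")

// New context that knows where Cassandra lives
val sc = new SparkContext(conf)
```

The import of com.datastax.spark.connector._ is what later adds the Cassandra-specific methods to sc and to RDDs.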

6.2 Access to Cassandra

[Image: code for reading from Cassandra]
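As a rough illustration of reading a table through the connector, the sketch below assumes a context sc configured for Cassandra as in step 6.1. The keyspace name "test" and table name "patient" are hypothetical stand-ins; substitute the names created in section 5.

```scala
import com.datastax.spark.connector._

// "test" and "patient" are hypothetical keyspace and table names
val patients = sc.cassandraTable("test", "patient")

// Count the rows, then print a few of them as CassandraRow objects
println(patients.count())
patients.take(5).foreach(println)
```

cassandraTable returns an RDD, so the usual Spark transformations such as filter and map can be applied to it.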

6.3 Insert data in Cassandra

[Image: code for inserting into Cassandra]
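Writing back to Cassandra could look roughly like the sketch below, again assuming a Cassandra-aware sc from step 6.1. The keyspace "test", the table "patient", the column names "id" and "name", and the sample rows are all hypothetical; they must be adapted to the actual schema from section 5.

```scala
import com.datastax.spark.connector._

// Illustrative rows only; the tuple fields must match the columns named below
val newPatients = sc.parallelize(Seq((101, "patient-a"), (102, "patient-b")))

// Hypothetical keyspace, table, and column names
newPatients.saveToCassandra("test", "patient", SomeColumns("id", "name"))
```

SomeColumns maps tuple positions to table columns in order, so the first tuple field goes to the first listed column.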

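Reference images

a.Creating the Spark context for Cassandra

[Images: spark-shell session creating the Spark context]

b.Inserting data in Cassandra

[Image: spark-shell session inserting data]

c.Checking cqlsh for the newly inserted record

[Image: cqlsh query output showing the new record]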

P Saravana kumar, Data Engineer @ DataDotz.

SB Gowtham, Data Engineer @ DataDotz.

DataDotz is a Chennai-based Big Data team primarily focused on consulting and training on technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), Search and Cloud Computing. The authors can be reached via their LinkedIn profiles:

P Saravana kumar: https://in.linkedin.com/in/saravanasaro

SB Gowtham: https://in.linkedin.com/in/sbgowtham