Apache Zeppelin: Quick start with Apache Spark


Apache Zeppelin is a web-based notebook that enables interactive data analytics with multiple language back ends. It also provides Apache Spark integration in addition to data visualization. Apache Zeppelin is currently in incubation.

For more information, please refer to the official Zeppelin site: https://zeppelin.incubator.apache.org/

Current version: zeppelin-0.5.5-incubating-bin-all

This quick start covers Apache Zeppelin with Apache Spark.

1. Prerequisites for Apache Zeppelin

Below are the prerequisites:

a. Git

b. npm (the Node.js package manager)

c. Java 1.7 or later

d. Apache Spark, for this quick start

1.1 Installing the prerequisites

This quick start is written for Ubuntu. For other operating systems, please use the equivalent commands or write to us.

sudo apt-get update

sudo apt-get install git

sudo apt-get install npm    # npm is the Node.js package manager
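
Before moving on, it can help to confirm that the tools really are on the PATH. The snippet below is only a minimal sketch: the function name missing_tools is made up for this post, and java will normally show up as missing until steps 1.2 and 1.3 are done.

```shell
# Hypothetical helper: print every listed command that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
  return 0
}

# Report which of this quick start's prerequisites are still missing.
missing_tools git npm java
```

If nothing is printed, all three commands were found.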

1.2 Download Java from Oracle. If the link is broken, check the Oracle website, and check there for the current version as well.

Download JDK 1.7 or higher, because Zeppelin depends on Java.


1.3 Set the Java environment path in .bashrc

export JAVA_HOME=/home/datadotz/jdk1.7.0_45

Note: You can also set this in .bash_profile.
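
If you prefer to script this step, the sketch below appends the export to a profile file only when JAVA_HOME is not already set there. The function name add_java_home is hypothetical, and the extra PATH line is an assumption on my part (it makes the java command itself available), not something the step above requires.

```shell
# Hypothetical helper: append JAVA_HOME and PATH exports to a profile file,
# unless the file already sets JAVA_HOME.
add_java_home() {
  profile="$1"   # for example "$HOME/.bashrc"
  jdk_dir="$2"   # for example /home/datadotz/jdk1.7.0_45
  if grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null; then
    return 0     # already configured, leave the file untouched
  fi
  {
    echo "export JAVA_HOME=$jdk_dir"
    echo 'export PATH="$JAVA_HOME/bin:$PATH"'
  } >> "$profile"
}
```

A call such as add_java_home "$HOME/.bashrc" /home/datadotz/jdk1.7.0_45, followed by opening a new terminal and running java -version, should show whether the setting took effect.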

2. Installation of Apache Zeppelin

2.1 Download Apache Zeppelin

Download Zeppelin from the Zeppelin site. If the link is broken, check the Zeppelin website, and check there for the current version as well.

http://www.us.apache.org/dist/incubator/zeppelin/0.5.5-incubating/zeppelin-0.5.5-incubating-bin-all.tgz
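
The download and unpack step can be sketched in shell as follows. The variables simply rebuild the URL given above; the function name fetch_zeppelin and the idea of unpacking into a directory of your choice are my own assumptions, and the sketch presumes wget and tar are installed.

```shell
# Version and URL as given in this post; adjust them if a newer release exists.
ZEPPELIN_VERSION="0.5.5-incubating"
ZEPPELIN_TGZ="zeppelin-${ZEPPELIN_VERSION}-bin-all.tgz"
ZEPPELIN_URL="http://www.us.apache.org/dist/incubator/zeppelin/${ZEPPELIN_VERSION}/${ZEPPELIN_TGZ}"

# Hypothetical helper: download the archive into the given directory
# and unpack it there.
fetch_zeppelin() {
  target_dir="$1"
  mkdir -p "$target_dir" &&
    wget -P "$target_dir" "$ZEPPELIN_URL" &&
    tar -xzf "$target_dir/$ZEPPELIN_TGZ" -C "$target_dir"
}
```

For example, fetch_zeppelin "$HOME" would download the archive into your home directory and unpack it there.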

2.2 Configure variables as needed for Zeppelin



Note: You can also use the latest version of Spark (1.5.2 at the time of writing).
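
The exact settings used for this post are not reproduced here, so the following is only a sketch of what conf/zeppelin-env.sh might contain (it is usually created by copying conf/zeppelin-env.sh.template). Every path and the master URL below are assumed example values; replace them with the locations on your own machine.

```shell
# conf/zeppelin-env.sh (sketch; all values below are assumed examples)
export JAVA_HOME=/home/datadotz/jdk1.7.0_45     # JDK location from step 1.3
export SPARK_HOME=/home/datadotz/spark-1.5.2    # hypothetical Spark install directory
export MASTER=spark://localhost:7077            # assumed standalone master URL; leave unset for local mode
```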

Please use the command below to start Zeppelin:

zeppelin-daemon.sh  start

This starts the Zeppelin server daemon. You can check its status with the jps command (Java process status).

Then open the web page. By default the Zeppelin UI runs on port 8080.


Note: To change the UI port number, change the port number in zeppelin-site.xml.
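
The port is held in the zeppelin.server.port property of conf/zeppelin-site.xml. As a rough sketch, the edit can be scripted with GNU sed as shown below. The function name set_zeppelin_port is hypothetical, and the sketch assumes the <value> line comes directly after the <name> line, as it does in the template file; if your file is laid out differently, edit it by hand instead.

```shell
# Hypothetical helper: rewrite the value of zeppelin.server.port in a
# zeppelin-site.xml file. Assumes GNU sed (as on Ubuntu) and that the
# <value> line immediately follows the <name> line.
set_zeppelin_port() {
  site_xml="$1"   # path to conf/zeppelin-site.xml
  port="$2"       # new port number, for example 8082
  sed -i "/<name>zeppelin.server.port<\/name>/{n;s#<value>[0-9]*</value>#<value>${port}</value>#;}" "$site_xml"
}
```

Restart the Zeppelin daemon after changing the port so that the new value is picked up.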

3. Spark installation

Please refer to chennaihug.org for Spark installation.


Command to start


4. List of all daemons (Fig 1)

This figure shows the list of all running daemons. If you are using Zeppelin only with Spark, Hadoop does not need to be up and running.
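
Since the figure may not be visible, here is a small sketch for checking the daemon list from the command line. It assumes Spark runs in standalone mode, where jps normally lists Master and Worker next to ZeppelinServer; the function name missing_daemons is hypothetical.

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
  jps_output="$(cat)"
  for name in "$@"; do
    # jps prints "<pid> <name>", so compare against the second column.
    echo "$jps_output" | awk '{print $2}' | grep -qx "$name" || echo "$name"
  done
  return 0
}
```

It would be used as jps | missing_daemons ZeppelinServer Master Worker, and prints nothing when all three are running.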



5. Web UI

a. Web UI (Fig 2)

This figure shows the default web UI of Zeppelin, which runs on localhost. The default port is 8080; in my case I changed it to 8082.


b. Notebook (Fig 3)

This is how you create a notebook. Simply put, a notebook is like an editor where you can run commands and scripts.


c. Use the Apache Spark interpreter (Fig 4)

This is just a sample that loads a file from my Linux file system into Spark and tests it by running a count.


6. Analysis of drug data using Apache Spark SQL and Zeppelin

Load the data, then create a schema and a temporary table.

Table name: customers

Input data set: datagen_10.txt (drug data set)


Query: Find the total amount for each drug

"select drug, sum(amt) from customers group by drug"

The query returns one record per drug, containing the drug and the sum of its amounts.
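
To make the result of the query concrete without a running Spark, here is the same group-and-sum done with awk on three made-up rows. This is not how Zeppelin runs the query; it only illustrates what the output means. The "id,drug,amt" layout and the sample values are hypothetical, since the real layout of datagen_10.txt is not shown in this post.

```shell
# Hypothetical sample rows in an assumed "id,drug,amt" layout;
# the real layout of datagen_10.txt is not shown in this post.
sample_rows='1,aspirin,10
2,ibuprofen,5
3,aspirin,7'

# Same aggregation as the query: sum amt for each drug.
totals="$(echo "$sample_rows" |
  awk -F, '{ sum[$2] += $3 } END { for (d in sum) print d "," sum[d] }' |
  sort)"
echo "$totals"
```

With these sample rows the output has one line per drug: aspirin with a total of 17 and ibuprofen with a total of 5.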

7. Output in various forms

Form 1 Fig 5

This web UI shows the tabular view of the output.


Form 2 Fig 6

This web UI shows the bar chart representation of the output.


Form 3 Fig 7

This web UI shows the pie chart representation of the output.


Form 4 Fig 8


Form 5 Fig 9


Form 6 Fig 10


Hope you enjoyed this blog!


SB Gowtham, Data Engineer @ DataDotz.

DataDotz is a Chennai-based Big Data team primarily focused on consulting and training on technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), search, and cloud computing. Gowtham can be reached via his LinkedIn profile (https://in.linkedin.com/in/sbgowtham).