Hadoop-Olympian

HDFS NameNode HA Configuration in Hadoop 2.x – Part 2

This entry was posted in Big Data, Blog, Hadoop.

Apache Hadoop High Availability Cluster Configuration

 

Step – 1

Software Requirements:

OS        –  Linux (CentOS, Ubuntu, RedHat)

Hadoop    –  hadoop-2.6.0

Java      –  jdk1.6.0_45

Zookeeper –  zookeeper-3.4.6

 

Step – 2

Add each node's IP address and hostname to the hosts file, on all the nodes in the cluster:

$sudo vi /etc/hosts

10.0.0.7           hadoop1

10.0.0.8           hadoop2

10.0.0.9           hadoop3
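Before moving on, it is worth confirming that every name actually resolves. A quick check (hostnames as in the table above):

```shell
# sanity check: every cluster hostname should resolve via /etc/hosts
for h in hadoop1 hadoop2 hadoop3; do
    getent hosts "$h" || echo "$h is missing from /etc/hosts"
done
```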

 

Step – 3

SSH Configuration

Run the ssh-keygen commands below on all machines:

$ssh-keygen -t rsa

$cd .ssh

$cat id_rsa.pub >> authorized_keys

$ssh 10.0.0.7

(The last command should log you in without prompting for a password.)

 

 

Copy the id_rsa.pub key from the master node to the standby and slave nodes:

$ssh-copy-id  dd@hadoop2

$ssh-copy-id  dd@hadoop3

Log in from the master node to every cluster node and confirm that no password is requested:

$ssh hadoop1

$ssh hadoop2

$ssh hadoop3
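The three logins can also be checked in one pass. A sketch, assuming the hostnames above; BatchMode makes ssh fail instead of prompting, so any node still asking for a password shows up as FAILED:

```shell
# verify passwordless SSH to every node; BatchMode forbids password prompts
for h in hadoop1 hadoop2 hadoop3; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
        echo "$h: OK"
    else
        echo "$h: FAILED - rerun ssh-copy-id for this node"
    fi
done
```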

 

Step – 4

Zookeeper Configuration in all the nodes in the cluster:

$tar -zxvf zookeeper-3.4.6.tar.gz

$cd zookeeper-3.4.6/conf

$cp zoo_sample.cfg zoo.cfg

$vi zoo.cfg

tickTime=2000

clientPort=2181

initLimit=5

syncLimit=2

dataDir=/home/dd/zookeeper/data/

dataLogDir=/home/dd/zookeeper/logs/

server.1=hadoop1:2888:3888

server.2=hadoop2:2889:3889

server.3=hadoop3:2890:3890

 

Data and Log directory creation in all the nodes in the cluster

$mkdir -p  /home/dd/zookeeper/data

$ mkdir -p  /home/dd/zookeeper/logs

hadoop1

$vi  /home/dd/zookeeper/data/myid

1 (just type 1)

save and exit (:wq)

hadoop2

$vi  /home/dd/zookeeper/data/myid

2 (just type 2)

save and exit (:wq)

hadoop3

$vi  /home/dd/zookeeper/data/myid

3 (just type 3)

save and exit (:wq)
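Typing the ids by hand is error-prone; each node's id is already encoded in zoo.cfg's server.N lines, so it can be derived instead. A sketch (the helper name myid_from_cfg is ours; paths and hostnames as configured above):

```shell
# derive a node's myid from zoo.cfg: it is the N in the matching
# server.N=<host>:... line
myid_from_cfg() {   # usage: myid_from_cfg <path-to-zoo.cfg> <hostname>
    grep "=$2:" "$1" | head -n1 | sed 's/^server\.\([0-9]*\)=.*/\1/'
}

# on each node, e.g.:
#   myid_from_cfg zookeeper-3.4.6/conf/zoo.cfg "$(hostname)" > /home/dd/zookeeper/data/myid
```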

Start Zookeeper   in all the nodes in the cluster.

$bin/zkServer.sh   start

$jps

QuorumPeerMain     (the running Zookeeper daemon)

 

Check the Status of Zookeeper in all the nodes in the cluster.

hadoop1

$bin/zkServer.sh   status

JMX enabled by default

Using config: /home/dd/zookeeper-3.4.6/bin/../conf/zoo.cfg

Mode: leader

hadoop2

$bin/zkServer.sh   status

JMX enabled by default

Using config: /home/dd/zookeeper-3.4.6/bin/../conf/zoo.cfg

Mode: follower

hadoop3

$bin/zkServer.sh   status

JMX enabled by default

Using config: /home/dd/zookeeper-3.4.6/bin/../conf/zoo.cfg

Mode: follower
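Rather than logging in to each node, the three status checks can be run from one terminal. A sketch, assuming the install path above and the passwordless SSH from Step 3:

```shell
# report each node's quorum role (expect one leader and two followers)
for h in hadoop1 hadoop2 hadoop3; do
    printf '%s: ' "$h"
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" \
        "/home/dd/zookeeper-3.4.6/bin/zkServer.sh status 2>&1 | grep Mode" \
        2>/dev/null || echo "unreachable"
done
```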

 

Step – 5

Apache Hadoop Configuration in all the nodes in the cluster:

$tar  -zxvf  hadoop-2.6.0.tar.gz

Apache Hadoop cluster main configuration files are as shown below

  • core-site.xml
  • hdfs-site.xml
  • mapred-site.xml
  • yarn-site.xml
  • hadoop-env.sh
  • yarn-env.sh
  • mapred-env.sh
  • slaves

 

$vi  hadoop-2.6.0/etc/hadoop/core-site.xml

<property>

<name>fs.defaultFS</name>

<value>hdfs://mycluster</value>

</property>

<property>

<name>dfs.journalnode.edits.dir</name>

<value>/home/dd/journal/data</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/home/dd/journal/tmp</value>

</property>

 

 

$vi  hadoop-2.6.0/etc/hadoop/hdfs-site.xml

<property>

<name>dfs.nameservices</name>

<value>mycluster</value>

<final>true</final>

</property>

<property>

<name>dfs.ha.namenodes.mycluster</name>

<value>mn1,mn2</value>

<final>true</final>

</property>

<property>

<name>dfs.namenode.rpc-address.mycluster.mn1</name>

<value>hadoop1:8020</value>

</property>

<property>

<name>dfs.namenode.rpc-address.mycluster.mn2</name>

<value>hadoop2:8020</value>

</property>

<property>

<name>dfs.namenode.http-address.mycluster.mn1</name>

<value>hadoop1:50070</value>

</property>

<property>

<name>dfs.namenode.http-address.mycluster.mn2</name>

<value>hadoop2:50070</value>

</property>

<property>

<name>dfs.namenode.shared.edits.dir</name>

<value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/mycluster</value>

</property>

<property>

<name>dfs.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

<property>

<name>ha.zookeeper.quorum</name>

<value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>

</property>


<property>

<name>dfs.client.failover.proxy.provider.mycluster</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence</value>

</property>

 

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/home/dd/.ssh/id_rsa</value>

</property>

<property>

<name>dfs.replication</name>

<value>3</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.connect-timeout</name>

<value>3000</value>

</property>

 

 

$vi  hadoop-2.6.0/etc/hadoop/yarn-site.xml

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

<property>

<name>yarn.resourcemanager.store.class</name>

<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>

</property>

 

 

$cp  hadoop-2.6.0/etc/hadoop/mapred-site.xml.template  hadoop-2.6.0/etc/hadoop/mapred-site.xml

$vi  hadoop-2.6.0/etc/hadoop/mapred-site.xml

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

 

 

$vi  hadoop-2.6.0/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/home/dd/jdk1.6.0_45

 

$vi  hadoop-2.6.0/etc/hadoop/yarn-env.sh

export JAVA_HOME=/home/dd/jdk1.6.0_45

 

$vi  hadoop-2.6.0/etc/hadoop/mapred-env.sh

export JAVA_HOME=/home/dd/jdk1.6.0_45

 

$vi  hadoop-2.6.0/etc/hadoop/slaves

List the hostnames defined in /etc/hosts:

hadoop1

hadoop2

hadoop3
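Since every node needs identical configuration, it saves time to finish the files on hadoop1 and push them out. A sketch using rsync, assuming the paths above and the passwordless SSH from Step 3:

```shell
# push the finished Hadoop config from hadoop1 to the other nodes
for h in hadoop2 hadoop3; do
    rsync -a -e "ssh -o BatchMode=yes -o ConnectTimeout=5" \
        hadoop-2.6.0/etc/hadoop/ "$h":hadoop-2.6.0/etc/hadoop/ \
        || echo "sync to $h failed"
done
```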

 

 

Step – 6

$vi  .bashrc

export JAVA_HOME=/home/dd/jdk1.6.0_45

export PATH=$JAVA_HOME/bin:$PATH

export HADOOP_PREFIX="$HOME/hadoop-2.6.0"

export PATH=$PATH:$HADOOP_PREFIX/bin

export PATH=$PATH:$HADOOP_PREFIX/sbin

export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}

export HADOOP_COMMON_HOME=${HADOOP_PREFIX}

export HADOOP_HDFS_HOME=${HADOOP_PREFIX}

export YARN_HOME=${HADOOP_PREFIX}

export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
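After saving, open a new shell (or run `source ~/.bashrc`) and sanity-check that the exports resolve. A sketch:

```shell
# sanity-check the exports: hdfs should now be locatable under HADOOP_PREFIX
if [ -x "$HADOOP_PREFIX/bin/hdfs" ]; then
    echo "HADOOP_PREFIX OK: $HADOOP_PREFIX"
else
    echo "hdfs not found under '${HADOOP_PREFIX:-unset}' - re-check .bashrc and run: source ~/.bashrc"
fi
```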

 

 

Step – 7

Start the JournalNode on all the cluster nodes (hadoop1, hadoop2 & hadoop3)

 

$mkdir -p /home/dd/journal/data

$mkdir -p /home/dd/journal/tmp

$hadoop-daemon.sh  start  journalnode

 

Initialize the HA state znode in ZooKeeper from hadoop1

 

$hdfs zkfc -formatZK

 

Format namenode in hadoop1

 

hadoop1$hdfs  namenode -format

hadoop1$hadoop-daemon.sh start namenode

hadoop2$hdfs namenode -bootstrapStandby    (copies the formatted NameNode metadata from hadoop1)

Stop & start Hadoop from the master node

$stop-all.sh

$start-all.sh

 

hadoop1$jps

QuorumPeerMain

DataNode

NameNode

JournalNode

NodeManager

ResourceManager

DFSZKFailoverController

hadoop2$jps

NodeManager

DataNode

NameNode

QuorumPeerMain

DFSZKFailoverController

JournalNode

 

hadoop3$jps

NodeManager

DataNode

QuorumPeerMain

JournalNode
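With all daemons up, automatic failover can be verified from the master: one NameNode should report active and the other standby. A sketch using the mn1/mn2 ids from hdfs-site.xml; the guard just keeps it safe to paste before the PATH is set:

```shell
# query the HA state of both NameNodes (ids mn1/mn2 from hdfs-site.xml)
if command -v hdfs >/dev/null 2>&1; then
    hdfs haadmin -getServiceState mn1   # expect: active (or standby)
    hdfs haadmin -getServiceState mn2   # expect: the opposite state
else
    echo "hdfs not on PATH - source ~/.bashrc first"
fi
```

To exercise failover, kill the active NameNode process and re-run the two commands; the standby should take over as active.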

 

Written by SaravanaKumar, Data Engineer @ DataDotz.
DataDotz is a Chennai-based Big Data team primarily focused on consulting and training on technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), Search and Cloud Computing.