Install Hadoop Multinode Cluster using CDH4 in RHEL/CentOS 6.5


Hadoop is an open-source programming framework developed by Apache for processing big data. It uses HDFS (Hadoop Distributed File System) to store data across all the datanodes in the cluster in a distributed manner, and the MapReduce model to process that data.

The Namenode (NN) is the master daemon that controls HDFS, and the Jobtracker (JT) is the master daemon for the MapReduce engine.

In this tutorial I am using two CentOS 6.3 VMs, 'master' and 'node' (master and node are my hostnames). The master IP is 172.21.17.175 and the node IP is 172.21.17.188. The following instructions also work on other RHEL/CentOS 6.x versions.

## on master ##

 hostname

master
 ifconfig|grep 'inet addr'|head -1

inet addr:172.21.17.175  Bcast:172.21.19.255  Mask:255.255.252.0
## on node ##

 hostname

node
 ifconfig|grep 'inet addr'|head -1

inet addr:172.21.17.188  Bcast:172.21.19.255  Mask:255.255.252.0

First, if you do not have DNS set up, make sure that all cluster hosts are listed in the /etc/hosts file on every node (here, on both master and node).

 cat /etc/hosts

172.21.17.175 master
172.21.17.188 node
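A missing entry is easy to catch before going further with a small check. The sketch below is a minimal example, assuming the hostnames master and node from this setup; check_hosts is a hypothetical helper, not part of Hadoop or CentOS tooling.

```shell
#!/bin/sh
# check_hosts FILE HOST... (hypothetical helper)
# Prints "missing: HOST" for every HOST that FILE does not map to an address,
# and returns 1 if anything is missing, 0 otherwise.
check_hosts() {
    hosts_file=$1
    shift
    missing=0
    for h in "$@"; do
        # Drop comment lines, then look for HOST as an exact field after the IP column.
        if ! grep -v '^[[:space:]]*#' "$hosts_file" |
            awk -v h="$h" '{ for (i = 2; i <= NF; i++) if ($i == h) found = 1 } END { exit !found }'
        then
            echo "missing: $h"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on each node (hostnames assumed from this article):
#   check_hosts /etc/hosts master node
```

Matching whole fields (rather than a plain grep for the name) keeps a host such as "master2" from satisfying a check for "master".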

Installing Hadoop Multinode Cluster in CentOS

We use the official CDH repository to install CDH4 on all hosts (Master and Node) in the cluster.

Go to the official CDH download page and grab the CDH4 version (i.e. 4.6), or use the following wget commands to download the repository package and install it, choosing the pair that matches your system architecture.

## on 32-bit System ##

# wget http://archive.cloudera.com/cdh4/one-click-install/redhat/6/i386/cloudera-cdh-4-0.i386.rpm
# yum --nogpgcheck localinstall cloudera-cdh-4-0.i386.rpm
## on 64-bit System ##

# wget http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
# yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm

Before installing the Hadoop Multinode Cluster, add the Cloudera Public GPG Key to your repository by running one of the following commands, depending on your system architecture.

## on 32-bit System ##

# rpm --import http://archive.cloudera.com/cdh4/redhat/6/i386/cdh/RPM-GPG-KEY-cloudera
## on 64-bit System ##

# rpm --import http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
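Because the repository RPM and GPG key URLs differ only in the architecture directory, the choice can be scripted. This is a sketch, assuming `uname -m` reports i386/i686 or x86_64; the function names are hypothetical, and the URLs are the ones used in this article, which may no longer be publicly served.

```shell
#!/bin/sh
# cdh4_arch [MACHINE] (hypothetical helper)
# Maps a `uname -m` value (default: this host's) to the directory name
# used in the archive.cloudera.com URLs above.
cdh4_arch() {
    machine=${1:-$(uname -m)}
    case "$machine" in
        x86_64) echo x86_64 ;;
        i?86)   echo i386 ;;
        *)      echo "unsupported architecture: $machine" >&2; return 1 ;;
    esac
}

# URL of the one-click repository RPM for the given (or detected) architecture.
cdh4_repo_rpm_url() {
    arch=$(cdh4_arch "$1") || return 1
    echo "http://archive.cloudera.com/cdh4/one-click-install/redhat/6/$arch/cloudera-cdh-4-0.$arch.rpm"
}

# URL of the Cloudera public GPG key for the given (or detected) architecture.
cdh4_gpg_key_url() {
    arch=$(cdh4_arch "$1") || return 1
    echo "http://archive.cloudera.com/cdh4/redhat/6/$arch/cdh/RPM-GPG-KEY-cloudera"
}

# Usage as root, mirroring the manual commands above:
#   url=$(cdh4_repo_rpm_url) && wget "$url" && yum --nogpgcheck localinstall "$(basename "$url")"
#   rpm --import "$(cdh4_gpg_key_url)"
```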

Next, run the following commands to install and set up the JobTracker and NameNode on the Master server.

 yum clean all 
 yum install hadoop-0.20-mapreduce-jobtracker
 yum clean all
 yum install hadoop-hdfs-namenode

Again, run the following commands on the Master server to set up the Secondary NameNode.

 yum clean all 
 yum install hadoop-hdfs-secondarynamenode

Next, set up the TaskTracker and DataNode on all cluster hosts (Node) except the JobTracker, NameNode, and Secondary (or Standby) NameNode hosts (on node in this case).

 yum clean all
 yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode

You can install the Hadoop client on a separate machine (in this case I installed it on the datanode, but you can install it on any machine).

 yum install hadoop-client
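The package split above can be summarised per role. This sketch only prints the package list for a role so it can be handed to yum; the role names (master, worker, client) and the function name are hypothetical labels for the hosts described above, and the Secondary NameNode package is assumed to be hadoop-hdfs-secondarynamenode.

```shell
#!/bin/sh
# packages_for_role ROLE (hypothetical helper)
# Prints the CDH4 MRv1 packages this article installs for each kind of host.
packages_for_role() {
    case "$1" in
        master) echo "hadoop-0.20-mapreduce-jobtracker hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode" ;;
        worker) echo "hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode" ;;
        client) echo "hadoop-client" ;;
        *)      echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Usage as root (the unquoted $(...) is intentional so each package becomes one argument):
#   yum clean all && yum install $(packages_for_role master)
```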

Now that the steps above are done, let's move on to deploying HDFS (this must be done on all nodes).

Copy the default configuration directory under /etc/hadoop to a custom one (on each node in the cluster).

 cp -r /etc/hadoop/conf.dist /etc/hadoop/conf.my_cluster

Use the alternatives command to activate your custom directory, as follows (on each node in the cluster).

 alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
reading /var/lib/alternatives/hadoop-conf

 alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster

Now open the core-site.xml file and update fs.defaultFS on each node in the cluster.

 cat /etc/hadoop/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
 <name>fs.defaultFS</name>
 <value>hdfs://master/</value>
</property>
</configuration>
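If you prefer to generate this file rather than edit it by hand on each node, a heredoc is enough. This is a minimal sketch, assuming the custom directory /etc/hadoop/conf.my_cluster created above and the NameNode hostname master; write_core_site is a hypothetical helper and overwrites any existing core-site.xml. With no port in the URI, clients use the NameNode's default RPC port (8020).

```shell
#!/bin/sh
# write_core_site CONF_DIR NAMENODE_HOST (hypothetical helper)
# Writes a minimal core-site.xml that points fs.defaultFS at the NameNode (overwrites the file).
write_core_site() {
    conf_dir=$1
    nn_host=$2
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
 <name>fs.defaultFS</name>
 <value>hdfs://$nn_host/</value>
</property>
</configuration>
EOF
}

# Usage as root on every node:
#   write_core_site /etc/hadoop/conf.my_cluster master
```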

Next, update dfs.permissions.superusergroup in hdfs-site.xml on each node in the cluster.

 cat /etc/hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
     <name>dfs.name.dir</name>
     <value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
  </property>
  <property>
     <name>dfs.permissions.superusergroup</name>
     <value>hadoop</value>
  </property>
</configuration>

Note: Please make sure the configuration above is present on all nodes (make the changes on one node and run scp to copy them to the rest of the nodes).
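The copy step from the note can be looped over the remaining nodes. This sketch assumes SSH access as root to each node and that /etc/hadoop/conf already points at the custom directory on every host; push_conf and its DRY_RUN switch are hypothetical, added so the commands can be reviewed before they run.

```shell
#!/bin/sh
# push_conf FILE NODE... (hypothetical helper)
# Copies FILE into /etc/hadoop/conf/ on each NODE with scp.
# With DRY_RUN=1 it only prints the scp commands.
push_conf() {
    file=$1
    shift
    for n in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo scp "$file" "$n:/etc/hadoop/conf/"
        else
            scp "$file" "$n:/etc/hadoop/conf/" || return 1
        fi
    done
}

# Usage on master after editing the files there:
#   DRY_RUN=1 push_conf /etc/hadoop/conf/hdfs-site.xml node
#   push_conf /etc/hadoop/conf/core-site.xml node
#   push_conf /etc/hadoop/conf/hdfs-site.xml node
```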

Next, update the storage directories in hdfs-site.xml: on the NameNode (Master) set dfs.namenode.name.dir (dfs.name.dir is its older, deprecated name), and on the DataNode (Node) set dfs.datanode.data.dir. Change the values as shown.

## on master ##

 cat /etc/hadoop/conf/hdfs-site.xml
<property>
 <name>dfs.namenode.name.dir</name>
 <value>file:///data/1/dfs/nn,/nfsmount/dfs/nn</value>
</property>
## on node ##

 cat /etc/hadoop/conf/hdfs-site.xml
<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:///data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>
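Each entry in these comma-separated lists can be written as a file:// URI; the values above mix a URI with plain paths. Hadoop accepts plain local paths here, but Hadoop 2-based releases may log a warning recommending URIs. A small helper can build a consistent list; this is a sketch and to_file_uris is a hypothetical name.

```shell
#!/bin/sh
# to_file_uris DIR... (hypothetical helper)
# Joins absolute local directories into the comma-separated file:// list
# used by dfs.namenode.name.dir and dfs.datanode.data.dir.
to_file_uris() {
    out=
    for d in "$@"; do
        # ${out:+$out,} adds the separator only after the first entry.
        out="${out:+$out,}file://$d"
    done
    echo "$out"
}

# Usage:
#   to_file_uris /data/1/dfs/nn /nfsmount/dfs/nn
#   # file:///data/1/dfs/nn,file:///nfsmount/dfs/nn
```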

Run the commands below to create the directory structure and set the ownership and permissions on the NameNode machine (Master) and the DataNode machine (Node).

## on master ##
 mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
 chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn
 chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
## on node ##
 mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
 chown -R hdfs:hdfs /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
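The per-host steps above can be wrapped in one function keyed by role. This is a minimal sketch, assuming the same paths as above (which contain no spaces); the ROOT argument is a hypothetical addition for trying it under a scratch directory, and ownership is only changed when running as root and the hdfs user (created by the CDH packages) exists.

```shell
#!/bin/sh
# make_dfs_dirs ROLE [ROOT] (hypothetical helper)
# ROLE is "namenode" (master) or "datanode" (node); ROOT defaults to "" (i.e. /).
make_dfs_dirs() {
    role=$1
    root=${2:-}
    case "$role" in
        namenode) dirs="$root/data/1/dfs/nn $root/nfsmount/dfs/nn" ;;
        datanode) dirs="$root/data/1/dfs/dn $root/data/2/dfs/dn $root/data/3/dfs/dn $root/data/4/dfs/dn" ;;
        *)        echo "unknown role: $role" >&2; return 1 ;;
    esac
    for d in $dirs; do
        mkdir -p "$d" || return 1
        # Only root can hand the directories to hdfs; skipped when run as a normal user.
        if [ "$(id -u)" -eq 0 ] && id hdfs >/dev/null 2>&1; then
            chown -R hdfs:hdfs "$d" || return 1
        fi
        # NameNode metadata directories are restricted to their owner.
        if [ "$role" = namenode ]; then
            chmod 700 "$d" || return 1
        fi
    done
}

# Usage as root:
#   make_dfs_dirs namenode    # on master
#   make_dfs_dirs datanode    # on node
```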

Format the NameNode (on Master) by issuing the following command.

 sudo -u hdfs hdfs namenode -format

On Master, add the following property to the hdfs-site.xml file and change the value as shown.

<property>
  <name>dfs.namenode.http-address</name>
  <value>172.21.17.175:50070</value>
  <description>
    The address and port on which the NameNode UI will listen.
  </description>
</property>

Note: In our case, the value should be the IP address of the master VM.
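Every setting in these files uses the same <property> layout, so a tiny printer helps avoid typos when adding entries such as the one above. A sketch; xml_property is hypothetical and does not escape XML special characters, so keep values free of <, > and &.

```shell
#!/bin/sh
# xml_property NAME VALUE (hypothetical helper)
# Prints one Hadoop configuration <property> element; NAME and VALUE are not XML-escaped.
xml_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Usage on master: paste the output inside <configuration> in hdfs-site.xml.
#   xml_property dfs.namenode.http-address 172.21.17.175:50070
```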

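Now let's deploy MRv1 (MapReduce version 1). From the /etc/hadoop/conf directory, create the 'mapred-site.xml' file (here by copying hdfs-site.xml and editing it), replace the copied HDFS properties, and set the values as shown.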

 cp hdfs-site.xml mapred-site.xml
 vi mapred-site.xml
 cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
 <name>mapred.job.tracker</name>
 <value>master:8021</value>
</property>
</configuration>

Next, copy the 'mapred-site.xml' file to the node machine using the following scp command.

 scp /etc/hadoop/conf/mapred-site.xml node:/etc/hadoop/conf/
mapred-site.xml                                                                      100%  200     0.2KB/s   00:00

Now configure the local storage directories to be used by the MRv1 daemons. Open the mapred-site.xml file again and make the changes shown below for each TaskTracker.

<property>
 <name>mapred.local.dir</name>
 <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local</value>
</property>
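Both MRv1 settings can also be written in one step instead of copying hdfs-site.xml and editing it. This is a sketch, assuming the JobTracker address master:8021 and the local directories used above; write_mapred_site is a hypothetical helper and overwrites any existing mapred-site.xml.

```shell
#!/bin/sh
# write_mapred_site CONF_DIR JOBTRACKER LOCAL_DIRS (hypothetical helper)
# Writes mapred-site.xml with mapred.job.tracker and mapred.local.dir (overwrites the file).
write_mapred_site() {
    conf_dir=$1
    jobtracker=$2
    local_dirs=$3
    cat > "$conf_dir/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
 <name>mapred.job.tracker</name>
 <value>$jobtracker</value>
</property>
<property>
 <name>mapred.local.dir</name>
 <value>$local_dirs</value>
</property>
</configuration>
EOF
}

# Usage as root on each node:
#   write_mapred_site /etc/hadoop/conf.my_cluster master:8021 \
#       /data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local
```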

After specifying these directories in the 'mapred-site.xml' file, you must create them and assign the correct file permissions on each node in your cluster.

mkdir -p /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
chown -R mapred:hadoop /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local

Now run the following command on each node in the cluster to start HDFS.

 for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
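The loop above starts whichever hadoop-hdfs-* init scripts are installed on the host, so on master it starts the NameNode and Secondary NameNode and on node the DataNode. A variant that uses a glob instead of parsing ls output is sketched below; hdfs_services and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# hdfs_services [INIT_DIR] (hypothetical helper)
# Prints the names of the hadoop-hdfs-* init scripts in INIT_DIR (default /etc/init.d).
hdfs_services() {
    init_dir=${1:-/etc/init.d}
    for f in "$init_dir"/hadoop-hdfs-*; do
        # If nothing matches, the pattern stays literal and fails the -e test.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}

# Usage as root on each node:
#   for s in $(hdfs_services); do service "$s" start; done
```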

You must create /tmp in HDFS with the appropriate permissions, exactly as shown below.

 sudo -u hdfs hadoop fs -mkdir /tmp
 sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
 sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
 sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
 sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred

Now verify the HDFS file structure.

 sudo -u hdfs hadoop fs -ls -R /

drwxrwxrwt   - hdfs   hadoop          0 2014-05-29 09:58 /tmp
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var/lib
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs
drwxr-xr-x   - hdfs   hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache
drwxr-xr-x   - mapred hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred
drwxr-xr-x   - mapred hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt   - mapred hadoop          0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging

After starting HDFS and creating '/tmp', but before starting the JobTracker, create the HDFS directory specified by the 'mapred.system.dir' parameter (by default ${hadoop.tmp.dir}/mapred/system) and change its owner to mapred.

 sudo -u hdfs hadoop fs -mkdir -p /tmp/mapred/system
 sudo -u hdfs hadoop fs -chown mapred:hadoop /tmp/mapred/system

To start MapReduce, start the TaskTracker (TT) and JobTracker (JT) services.

## on node ##

 service hadoop-0.20-mapreduce-tasktracker start

Starting Tasktracker:                               [  OK  ]
starting tasktracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-tasktracker-node.out
## on master ##

 service hadoop-0.20-mapreduce-jobtracker start

Starting Jobtracker:                                [  OK  ]
starting jobtracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-master.out

Next, create a home directory for each Hadoop user. It is recommended to do this on the NameNode; for example:

 sudo -u hdfs hadoop fs -mkdir  /user/<user>
 sudo -u hdfs hadoop fs -chown <user> /user/<user>

Note: <user> is the Linux username of each user.

Alternatively, you can create the home directory for the current user as follows.

 sudo -u hdfs hadoop fs -mkdir /user/$USER
 sudo -u hdfs hadoop fs -chown $USER /user/$USER
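For several users at once, the same two commands can be looped. This is a sketch; create_hdfs_homes is hypothetical, and the HADOOP variable is an assumed override hook (defaulting to sudo -u hdfs hadoop) so that, for example, HADOOP=echo prints the commands instead of running them. It uses -mkdir -p, as the staging directory step above does, so a missing /user parent is created as well.

```shell
#!/bin/sh
# create_hdfs_homes USER... (hypothetical helper)
# Creates /user/USER in HDFS and makes USER its owner, for each USER given.
# HADOOP overrides the command prefix (default: "sudo -u hdfs hadoop").
create_hdfs_homes() {
    hadoop_cmd=${HADOOP:-sudo -u hdfs hadoop}
    for u in "$@"; do
        # $hadoop_cmd is left unquoted on purpose so it splits into command + arguments.
        $hadoop_cmd fs -mkdir -p "/user/$u" || return 1
        $hadoop_cmd fs -chown "$u" "/user/$u" || return 1
    done
}

# Usage on the NameNode (alice and bob are hypothetical user names):
#   HADOOP=echo create_hdfs_homes alice bob   # review the commands first
#   create_hdfs_homes alice bob
```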

Open your browser and enter the URL http://ip_address_of_namenode:50070 to access the NameNode.

Open another tab in your browser and enter the URL http://ip_address_of_jobtracker:50030 to access the JobTracker.

This procedure has been successfully tested on RHEL/CentOS 5.X/6.X. Please comment below if you face any issues with the installation, and I will help you with solutions.