Install Hadoop Multinode Cluster using CDH4 in RHEL/CentOS 6.5
Hadoop is an open-source programming framework developed by Apache to process big data. It uses HDFS (Hadoop Distributed File System) to store data across all the DataNodes in the cluster in a distributed manner, and the MapReduce model to process that data.
The NameNode (NN) is the master daemon that controls HDFS, and the JobTracker (JT) is the master daemon of the MapReduce engine.
In this tutorial I am using two CentOS 6.3 VMs, 'master' and 'node' (these are their hostnames). The master's IP address is 172.21.17.175 and the node's IP address is 172.21.17.188. The following instructions also work on other RHEL/CentOS 6.x releases.
On master:
# hostname
master
# ifconfig | grep 'inet addr' | head -1
inet addr:172.21.17.175  Bcast:172.21.19.255  Mask:255.255.252.0
On node:
# hostname
node
# ifconfig | grep 'inet addr' | head -1
inet addr:172.21.17.188  Bcast:172.21.19.255  Mask:255.255.252.0
First, if you do not have DNS set up, make sure that all the cluster hosts are listed in the /etc/hosts file on every node.
On both master and node:
# cat /etc/hosts
172.21.17.175 master
172.21.17.188 node
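To double-check the hosts file on each node before going further, a small check like the one below can help. `check_hosts` is a hypothetical helper written for this sketch, not part of CDH; it only reads the file and does not query DNS.

```shell
# check_hosts FILE HOST...  (hypothetical helper, not part of CDH)
# Succeeds only if every HOST appears as a name field in FILE,
# skipping commented-out lines. It reads the file only; no DNS lookups.
check_hosts() {
    local file=$1 host missing=0
    shift
    for host in "$@"; do
        if ! awk -v h="$host" '
                /^[[:space:]]*#/ { next }   # skip comment lines
                { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                END { exit !found }' "$file"; then
            echo "missing from $file: $host" >&2
            missing=1
        fi
    done
    return "$missing"
}
# Usage on each node: check_hosts /etc/hosts master node && echo "hosts OK"
```

Matching whole fields (rather than a plain grep) avoids false positives such as a host named `node2` satisfying a check for `node`.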
Installing Hadoop Multinode Cluster in CentOS
We use the official CDH repository to install CDH4 on all the hosts (Master and Node) in the cluster.
Go to the official CDH download page and grab the CDH4 version (i.e. 4.6), or use the following wget commands to download the repository package and install it.
## on 32-bit System ##
# wget http://archive.cloudera.com/cdh4/one-click-install/redhat/6/i386/cloudera-cdh-4-0.i386.rpm
# yum --nogpgcheck localinstall cloudera-cdh-4-0.i386.rpm
## on 64-bit System ##
# wget http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
# yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
Before installing the Hadoop multinode cluster, add the Cloudera Public GPG Key to your repository by running one of the following commands, depending on your system architecture.
## on 32-bit System ##
# rpm --import http://archive.cloudera.com/cdh4/redhat/6/i386/cdh/RPM-GPG-KEY-cloudera
## on 64-bit System ##
# rpm --import http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
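If you script the installation, the architecture-specific URLs above can be chosen from `uname -m` instead of by hand. `cdh4_urls` is a hypothetical helper; it only reproduces the two URL patterns used in this tutorial and assumes that every 32-bit x86 machine (i386 through i686) uses the i386 packages.

```shell
# cdh4_urls [ARCH]  (hypothetical helper)
# Prints two lines: the CDH4 one-click-install RPM URL, then the GPG key URL.
# ARCH defaults to the output of `uname -m`.
cdh4_urls() {
    local arch=${1:-$(uname -m)}
    case "$arch" in
        x86_64) ;;                  # 64-bit packages, name used as-is
        i?86)   arch=i386 ;;        # i386/i486/i586/i686 share the i386 repo
        *)      echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
    local base=http://archive.cloudera.com/cdh4
    echo "$base/one-click-install/redhat/6/$arch/cloudera-cdh-4-0.$arch.rpm"
    echo "$base/redhat/6/$arch/cdh/RPM-GPG-KEY-cloudera"
}
# Usage (as root):
#   rpm --import "$(cdh4_urls | sed -n 2p)"
#   wget "$(cdh4_urls | sed -n 1p)"
```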
Next, run the following commands to install and set up the JobTracker and NameNode on the Master server.
yum clean all
yum install hadoop-0.20-mapreduce-jobtracker
yum clean all
yum install hadoop-hdfs-namenode
Again, run the following commands on the Master server to set up the Secondary NameNode.
yum clean all
yum install hadoop-hdfs-secondarynamenode
Next, set up the TaskTracker and DataNode on all cluster hosts (Node) except the JobTracker, NameNode, and Secondary (or Standby) NameNode hosts (on node in this case).
yum clean all
yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
You can install the Hadoop client on a separate machine (in this case I installed it on the DataNode, but you can install it on any machine).
yum install hadoop-client
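The package split above can be captured in one place so the same script works on both machines. `cdh4_packages` is a hypothetical helper that simply encodes this tutorial's layout (master runs JobTracker, NameNode and Secondary NameNode; node runs TaskTracker, DataNode and the client); adjust it if your roles differ.

```shell
# cdh4_packages ROLE  (hypothetical helper; ROLE is "master" or "node")
# Prints the CDH4 packages this tutorial installs for that role.
cdh4_packages() {
    case "$1" in
        master) echo hadoop-0.20-mapreduce-jobtracker hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode ;;
        node)   echo hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode hadoop-client ;;
        *)      echo "unknown role: $1" >&2; return 1 ;;
    esac
}
# Usage (as root); the $(...) is left unquoted on purpose so that each
# package name becomes a separate argument to yum:
#   yum clean all && yum install $(cdh4_packages master)
```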
Now that we have completed the steps above, let's move on to deploying HDFS (this must be done on all nodes).
Copy the default configuration to a custom directory under /etc/hadoop (on each node in the cluster).
cp -r /etc/hadoop/conf.dist /etc/hadoop/conf.my_cluster
Use the alternatives command to make your custom directory the active configuration, as follows (on each node in the cluster).
# alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
reading /var/lib/alternatives/hadoop-conf
# alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
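`alternatives --display hadoop-conf` shows which directory is currently selected. For an automated check, the sketch below follows the symlink chain that alternatives maintains (/etc/hadoop/conf pointing through /etc/alternatives/hadoop-conf) and compares the final target. `conf_resolves_to` is a hypothetical helper and relies on GNU `readlink -f`, as shipped with RHEL/CentOS.

```shell
# conf_resolves_to LINK EXPECTED_DIR  (hypothetical helper)
# Succeeds if LINK, after following every symlink, is the same path as
# EXPECTED_DIR. Uses GNU readlink -f to canonicalize both sides.
conf_resolves_to() {
    local link=$1 expected=$2
    [ "$(readlink -f -- "$link")" = "$(readlink -f -- "$expected")" ]
}
# Usage on each node:
#   conf_resolves_to /etc/hadoop/conf /etc/hadoop/conf.my_cluster && echo "conf OK"
```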
Now open the core-site.xml file and update fs.defaultFS on each node in the cluster.
# cat /etc/hadoop/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master/</value>
  </property>
</configuration>
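Because the same small *-site.xml files must be written on every node, generating them from a script can be less error-prone than editing by hand. The sketch below is a hypothetical helper, `write_site_xml`, that emits the same layout as the file shown above from NAME VALUE pairs. It overwrites the target file completely and does not escape XML special characters, so values must not contain `<` or `&`.

```shell
# write_site_xml FILE NAME VALUE [NAME VALUE ...]  (hypothetical helper)
# Overwrites FILE with a Hadoop-style <configuration> holding one
# <property> per NAME VALUE pair. Values are written verbatim (no escaping).
write_site_xml() {
    local file=$1
    shift
    if [ $(( $# % 2 )) -ne 0 ]; then
        echo "write_site_xml: expected NAME VALUE pairs" >&2
        return 1
    fi
    {
        echo '<?xml version="1.0"?>'
        echo '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>'
        echo '<configuration>'
        while [ $# -gt 0 ]; do
            printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
            shift 2
        done
        echo '</configuration>'
    } > "$file"
}
# Usage on each node:
#   write_site_xml /etc/hadoop/conf/core-site.xml fs.defaultFS hdfs://master/
```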
Next, update dfs.permissions.superusergroup in hdfs-site.xml on each node in the cluster.
# cat /etc/hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
  </property>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
  </property>
</configuration>
Note: Make sure the above configuration is present on all nodes (make the changes on one node, then use scp to copy the files to the remaining nodes).
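The copy step in the note above can be wrapped in a loop over the remaining hosts. `sync_hadoop_conf` is a hypothetical helper that copies only core-site.xml and hdfs-site.xml and assumes root can scp to each host; setting `DRY_RUN=1` prints the commands instead of running them, which is a cheap way to review what it will do first.

```shell
# sync_hadoop_conf HOST...  (hypothetical helper)
# Copies core-site.xml and hdfs-site.xml from this node to each HOST.
# With DRY_RUN=1 it only prints the scp commands.
sync_hadoop_conf() {
    local conf_dir=/etc/hadoop/conf host
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo scp "$conf_dir/core-site.xml" "$conf_dir/hdfs-site.xml" "$host:$conf_dir/"
        else
            scp "$conf_dir/core-site.xml" "$conf_dir/hdfs-site.xml" "$host:$conf_dir/" || return 1
        fi
    done
}
# Usage on master:
#   DRY_RUN=1 sync_hadoop_conf node   # review
#   sync_hadoop_conf node             # copy
```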
Next, set the storage directories in hdfs-site.xml: on the NameNode (master) update dfs.namenode.name.dir (or its older name, dfs.name.dir), and on the DataNode (node) update dfs.datanode.data.dir. Change the values as shown below.
On master:
# cat /etc/hadoop/conf/hdfs-site.xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/1/dfs/nn,/nfsmount/dfs/nn</value>
</property>
On node:
# cat /etc/hadoop/conf/hdfs-site.xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>
Run the commands below to create the directory structure and set the ownership and permissions on the NameNode (Master) and DataNode (Node) machines.
On master (NameNode):
mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn
chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
On node (DataNode):
mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
chown -R hdfs:hdfs /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
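One way to keep the directories you create in step with what hdfs-site.xml actually lists is to derive them from the property value itself. `dirs_from_value` is a hypothetical helper that splits a comma-separated value such as the ones above and strips an optional `file://` prefix; it does not handle other URI schemes.

```shell
# dirs_from_value "DIR1,DIR2,..."  (hypothetical helper)
# Prints one local path per line, removing a leading file:// if present.
dirs_from_value() {
    local IFS=, entry        # split on commas only, inside this function
    for entry in $1; do
        echo "${entry#file://}"
    done
}
# Usage on node (as root):
#   dirs_from_value 'file:///data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn' |
#     while read -r d; do mkdir -p "$d" && chown -R hdfs:hdfs "$d"; done
```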
Format the NameNode (on Master) by issuing the following command.
sudo -u hdfs hdfs namenode -format
On Master, add the following property to the hdfs-site.xml file and change the value as shown.
<property>
  <name>dfs.namenode.http-address</name>
  <value>172.21.17.175:50070</value>
  <description> The address and port on which the NameNode UI will listen. </description>
</property>
Note: In our case, the value must be the IP address of the master VM.
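To confirm which value a node will actually pick up, you can read a property back out of the file. `get_prop` is a hypothetical helper and deliberately not a real XML parser: it assumes each `<name>` and `<value>` element sits on its own line, as in the files shown in this tutorial.

```shell
# get_prop FILE NAME  (hypothetical helper)
# Prints the <value> of property NAME from a Hadoop *-site.xml file,
# or nothing if the property is absent. Assumes one element per line.
get_prop() {
    local file=$1 name=$2
    awk -v want="$name" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
        /<value>/ {
            if (n == want) {
                v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                print v
                exit
            }
        }' "$file"
}
# Usage on master:
#   get_prop /etc/hadoop/conf/hdfs-site.xml dfs.namenode.http-address
```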
Now let's deploy MRv1 (MapReduce version 1). In /etc/hadoop/conf, create the 'mapred-site.xml' file (here by copying hdfs-site.xml and replacing its contents) and set the values as shown.
# cp hdfs-site.xml mapred-site.xml
# vi mapred-site.xml
# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:8021</value>
  </property>
</configuration>
Next, copy the 'mapred-site.xml' file to the node machine using the following scp command.
# scp /etc/hadoop/conf/mapred-site.xml node:/etc/hadoop/conf/
mapred-site.xml  100%  200  0.2KB/s  00:00
Now configure the local storage directories used by the MRv1 daemons. Open the 'mapred-site.xml' file again and make the changes shown below for each TaskTracker.
<property>
  <name>mapred.local.dir</name>
  <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local</value>
</property>
After specifying these directories in the 'mapred-site.xml' file, you must create them and assign the correct file permissions on each node in your cluster.
mkdir -p /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
chown -R mapred:hadoop /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
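Wrong ownership on these directories is a common reason for daemons failing to start, so it can be worth verifying it after the chown. `check_owner` is a hypothetical helper that uses GNU `stat -c`, as shipped with RHEL/CentOS; it works equally for the mapred:hadoop and hdfs:hdfs directories created earlier.

```shell
# check_owner USER:GROUP DIR...  (hypothetical helper)
# Reports each DIR that is missing or not owned by USER:GROUP and
# returns non-zero if any problem was found.
check_owner() {
    local expected=$1 dir actual rc=0
    shift
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir" >&2
            rc=1
            continue
        fi
        actual=$(stat -c '%U:%G' -- "$dir")   # GNU stat: owner name:group name
        if [ "$actual" != "$expected" ]; then
            echo "$dir is owned by $actual, expected $expected" >&2
            rc=1
        fi
    done
    return "$rc"
}
# Usage on each TaskTracker node:
#   check_owner mapred:hadoop /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
```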
Now run the following command to start HDFS on each node in the cluster.
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
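The loop above starts every installed init script whose name begins with hadoop-hdfs-, so on master it starts the NameNode and Secondary NameNode and on node it starts the DataNode. The sketch below lists the same scripts with a shell glob instead of parsing `ls` output; `hdfs_services` is a hypothetical helper, and the directory argument exists only so it can be tried against a test directory.

```shell
# hdfs_services [INIT_DIR]  (hypothetical helper; default /etc/init.d)
# Prints the name of each hadoop-hdfs-* init script found in INIT_DIR.
hdfs_services() {
    local dir=${1:-/etc/init.d} f
    for f in "$dir"/hadoop-hdfs-*; do
        [ -e "$f" ] || continue   # glob matched nothing
        echo "${f##*/}"           # strip the directory part
    done
}
# Usage on each node (as root):
#   for svc in $(hdfs_services); do service "$svc" start; done
```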
You must create /tmp in HDFS with the correct permissions, exactly as shown below, followed by the MapReduce staging directory.
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
Now verify the HDFS file structure.
# sudo -u hdfs hadoop fs -ls -R /
drwxrwxrwt - hdfs hadoop 0 2014-05-29 09:58 /tmp
drwxr-xr-x - hdfs hadoop 0 2014-05-29 09:59 /var
drwxr-xr-x - hdfs hadoop 0 2014-05-29 09:59 /var/lib
drwxr-xr-x - hdfs hadoop 0 2014-05-29 09:59 /var/lib/hadoop-hdfs
drwxr-xr-x - hdfs hadoop 0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache
drwxr-xr-x - mapred hadoop 0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred
drwxr-xr-x - mapred hadoop 0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt - mapred hadoop 0 2014-05-29 09:59 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
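In the listing, `drwxrwxrwt` is mode 1777: world-writable with the sticky bit set, which is what the chmod commands above request for /tmp and the staging directory. The sketch below checks that automatically from the listing on stdin; `has_mode` is a hypothetical helper that assumes the permission string is the first column and the path is the last, as in the output shown.

```shell
# has_mode PATH MODE_STRING  (hypothetical helper)
# Reads `hadoop fs -ls -R` output on stdin and succeeds only if PATH is
# listed with exactly MODE_STRING (for example drwxrwxrwt).
has_mode() {
    local path=$1 mode=$2
    awk -v p="$path" -v m="$mode" '
        $NF == p && $1 == m { found = 1 }   # last field = path, first = mode
        END { exit !found }'
}
# Usage:
#   sudo -u hdfs hadoop fs -ls -R / | has_mode /tmp drwxrwxrwt && echo "/tmp OK"
```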
After starting HDFS and creating /tmp, but before starting the JobTracker, create the HDFS directory specified by the 'mapred.system.dir' parameter (by default ${hadoop.tmp.dir}/mapred/system) and change its owner to mapred.
sudo -u hdfs hadoop fs -mkdir -p /tmp/mapred/system
sudo -u hdfs hadoop fs -chown mapred:hadoop /tmp/mapred/system
To start MapReduce, start the TaskTracker (TT) service on node and the JobTracker (JT) service on master.
# service hadoop-0.20-mapreduce-tasktracker start
Starting Tasktracker:                                      [  OK  ]
starting tasktracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-tasktracker-node.out
# service hadoop-0.20-mapreduce-jobtracker start
Starting Jobtracker:                                       [  OK  ]
starting jobtracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-master.out
Next, create a home directory for each Hadoop user. It is recommended that you do this on the NameNode, for example:
sudo -u hdfs hadoop fs -mkdir -p /user/<user>
sudo -u hdfs hadoop fs -chown <user> /user/<user>
Note: <user> is the Linux username of each user.
Alternatively, you can create the home directory for the current user as follows:
sudo -u hdfs hadoop fs -mkdir -p /user/$USER
sudo -u hdfs hadoop fs -chown $USER /user/$USER
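With more than a few users, the two commands above can be run in a loop. `make_hdfs_homes` is a hypothetical helper; it uses `-mkdir -p` so that /user is created as well if it does not exist yet, and `DRY_RUN=1` prints the commands instead of executing them.

```shell
# make_hdfs_homes USER...  (hypothetical helper)
# Creates /user/USER in HDFS for each USER and makes that user the owner.
# With DRY_RUN=1 the commands are printed rather than run.
make_hdfs_homes() {
    local user run=
    [ "${DRY_RUN:-0}" = 1 ] && run=echo
    for user in "$@"; do
        $run sudo -u hdfs hadoop fs -mkdir -p "/user/$user" || return 1
        $run sudo -u hdfs hadoop fs -chown "$user" "/user/$user" || return 1
    done
}
# Usage on the NameNode:
#   DRY_RUN=1 make_hdfs_homes alice bob   # review
#   make_hdfs_homes alice bob             # create
```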
Open your browser and go to http://ip_address_of_namenode:50070 to access the NameNode web UI.
Open another tab in your browser and go to http://ip_address_of_jobtracker:50030 to access the JobTracker web UI.
This procedure has been tested successfully on RHEL/CentOS 5.X/6.X. If you run into any issues with the installation, please leave a comment below and I will help you find a solution.