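For the full table of contents, platform overview, installation environment, and versions, see "Spark Platform (Lite Edition) Overview".
2. Hadoop Single-Node Installation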
2.1 Install the JDK
Check whether Java is already installed: java -version
Install: sudo apt-get install default-jdk
Check the installed version: java -version
Check the installation path: update-alternatives --display java
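The check-then-install sequence above can be scripted so that the install step is only suggested when no java binary is found. This is a minimal sketch: need_install is a hypothetical helper name, and it only tests whether a command is on PATH, not which JDK version is present.

```shell
# need_install is a hypothetical helper, not part of any tool.
# It succeeds (exit status 0) when the named command is NOT on PATH.
need_install() {
    ! command -v "$1" >/dev/null 2>&1
}

if need_install java; then
    echo "java not found; run: sudo apt-get install default-jdk"
else
    # java -version writes to stderr, so redirect it before filtering.
    java -version 2>&1 | head -n 1
fi
```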
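2.2 Passwordless SSH Login
Install SSH: sudo apt-get install ssh
Install rsync: sudo apt-get install rsync
Generate the SSH key: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
List the key directory by absolute path: ll /home/hduser/.ssh
List the same directory via the home shortcut: ll ~/.ssh
Append the public key to the authorized keys file: cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys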
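Running the append step twice leaves duplicate entries in authorized_keys, and sshd may ignore the file if its permissions are too open. The sketch below appends the key only when it is missing and then tightens the permissions. authorize_key is a hypothetical helper name, and the paths in the usage comment are the ones used in this section.

```shell
# authorize_key is a hypothetical helper.
# $1 = path to the public key, $2 = path to the authorized_keys file.
authorize_key() {
    pub="$1"
    auth="$2"
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    # -q quiet, -x whole-line match, -F fixed string: append only if absent.
    grep -qxF -e "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
    chmod 700 "$(dirname "$auth")"
    chmod 600 "$auth"
}

# Usage (then verify with: ssh localhost):
#   authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```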
2.3 Download and Install Hadoop
Download: wget http://ftp.twaren.net/Unix/Web/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Extract: sudo tar -zxvf hadoop-2.6.0.tar.gz
Move into place: sudo mv hadoop-2.6.0 /usr/local/hadoop
2.4 Set Hadoop Environment Variables
Edit (append the following exports): sudo gedit ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
Apply the environment variables: source ~/.bashrc
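After sourcing ~/.bashrc it is worth confirming that the variables are visible in the current shell. The following is a minimal sketch: missing_vars is a hypothetical helper that prints the name of each listed variable that is unset or empty, and prints nothing when all of them are set.

```shell
# missing_vars is a hypothetical helper.
# Prints the name of every argument whose variable is unset or empty.
missing_vars() {
    for name in "$@"; do
        # Indirect lookup: read the value of the variable called $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

missing_vars JAVA_HOME HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME \
             HADOOP_HDFS_HOME YARN_HOME HADOOP_COMMON_LIB_NATIVE_DIR
```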
2.5 Configuration Files
2.5.1 Edit hadoop-env.sh
sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
2.5.2 Edit core-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
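In a stock Hadoop distribution each *-site.xml file has a single <configuration> root element, and the <property> blocks in this section go inside it. As a sketch of what the finished core-site.xml might look like, the hypothetical helper below writes a complete file into a directory of your choice. It overwrites any core-site.xml already there, so point it at a scratch directory first and compare before copying.

```shell
# write_core_site is a hypothetical helper.
# $1 = directory to write core-site.xml into (existing file is overwritten).
write_core_site() {
    conf_dir="$1"
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Usage (scratch directory, then inspect the result):
#   mkdir -p /tmp/conf-preview && write_core_site /tmp/conf-preview
```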
2.5.3 Edit yarn-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
2.5.4 Edit mapred-site.xml
sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
2.5.5 Edit hdfs-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
</property>
2.6 Create and Format the HDFS Directories
Create the NameNode directory: sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
Create the DataNode directory: sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
Set ownership: sudo chown hduser:hduser -R /usr/local/hadoop
Format the NameNode: hadoop namenode -format
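Formatting a NameNode that already holds data discards its HDFS metadata, so a guard before the format step can be useful. A formatted NameNode directory contains a current/VERSION file, and the sketch below checks for it. namenode_formatted is a hypothetical helper name, and NN_DIR is the directory created in this section.

```shell
# namenode_formatted is a hypothetical helper.
# Succeeds when the given NameNode directory has already been formatted.
namenode_formatted() {
    [ -f "$1/current/VERSION" ]
}

NN_DIR=/usr/local/hadoop/hadoop_data/hdfs/namenode
if namenode_formatted "$NN_DIR"; then
    echo "NameNode already formatted; skipping format"
else
    echo "NameNode not formatted; run: hadoop namenode -format"
fi
```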
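2.7 Start Hadoop
Start HDFS first: start-dfs.sh
Then start YARN: start-yarn.sh
Alternatively, start everything at once: start-all.sh
Check the running processes: jps
- HDFS requires: NameNode, SecondaryNameNode, DataNode
- MapReduce2 (YARN) requires: ResourceManager, NodeManager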
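Scanning the jps output by eye is error-prone because NameNode is also a substring of SecondaryNameNode. The sketch below reads jps-style output on standard input and prints every expected daemon that is absent. missing_daemons is a hypothetical helper name, and the daemon list is the one given above.

```shell
# missing_daemons is a hypothetical helper.
# Reads "<pid> <name>" lines on stdin and prints each expected daemon not found.
missing_daemons() {
    out=$(cat)
    for d in NameNode SecondaryNameNode DataNode ResourceManager NodeManager; do
        # Compare the whole second field so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$out" \
            | awk -v n="$d" '$2 == n { found = 1 } END { exit !found }' \
            || echo "$d"
    done
}

# Run against the live process list only when jps is available.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

No output means all five daemons are running.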
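2.7.1 Troubleshooting: NameNode Not Started
Go to the log directory: cd /usr/local/hadoop/logs
Open the NameNode log: cat hadoop-hduser-namenode-hadoop.log
The log shows: NameNode is not formatted
Stop all daemons: stop-all.sh
Reformat the NameNode: hadoop namenode -format
The output reports that formatting succeeded
Start all daemons: start-all.sh
Check the processes again: jps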
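Instead of reading the whole log with cat, the specific error can be searched for directly. This is a minimal sketch: log_reports_unformatted is a hypothetical helper, and the log file name assumes the user hduser and the host name hadoop, as in the step above.

```shell
# log_reports_unformatted is a hypothetical helper.
# Succeeds when the given log file contains the "not formatted" error.
log_reports_unformatted() {
    grep -q "NameNode is not formatted" "$1"
}

# File name assumes user "hduser" and host name "hadoop".
LOG=/usr/local/hadoop/logs/hadoop-hduser-namenode-hadoop.log
if [ -f "$LOG" ] && log_reports_unformatted "$LOG"; then
    echo "NameNode is not formatted: run stop-all.sh, then hadoop namenode -format"
fi
```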
2.8 Open the Hadoop ResourceManager Web Interface
In a browser, go to: http://192.168.0.40:8088
2.9 NameNode HDFS Web Interface
In a browser, go to: http://192.168.0.40:50070
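On a machine without a browser, the two web interfaces can be probed from the command line instead. The sketch below assumes curl is installed. hadoop_ui_urls and probe_url are hypothetical helper names, and the ports are the ones used in sections 2.8 and 2.9.

```shell
# hadoop_ui_urls is a hypothetical helper: prints both web UI URLs for a host.
hadoop_ui_urls() {
    echo "http://$1:8088"    # ResourceManager
    echo "http://$1:50070"   # NameNode (HDFS)
}

# probe_url is a hypothetical helper: prints the HTTP status code for a URL,
# giving up after 5 seconds.
probe_url() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Usage:
#   for u in $(hadoop_ui_urls 192.168.0.40); do
#       echo "$u -> $(probe_url "$u")"
#   done
```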