Spark Platform (Advanced Edition, Part 17): Appendices

For the complete table of contents, platform introduction, installation environment, and software versions, see "Spark Platform (Advanced Edition) Overview".

17. Appendices

17.1 Appendix A: Automated cluster start/stop

As more software is installed, starting and stopping the cluster becomes increasingly tedious: more and more components must be started, and it is easy to get the order wrong.

Putting the automation scripts in place:

SSH into app-11 and switch to the hadoop user: su - hadoop

All start/stop operations are performed on app-11.

Change to the hadoop directory: cd /hadoop

Remove any existing copies of the scripts: rm -rf config.conf startAll.sh stopAll.sh

Upload the three files: config.conf, startAll.sh, stopAll.sh

Make the scripts executable: chmod a+x startAll.sh stopAll.sh

Upload remoteSSH.exp (and remoteSSHNOTroot.exp, described below) to the /hadoop/tools directory.
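
For convenience, the preparation steps above can be run as the following sequence (a sketch; upload the files with whatever tool you normally use, such as scp or an SFTP client):

su - hadoop
cd /hadoop
rm -rf config.conf startAll.sh stopAll.sh
# upload config.conf, startAll.sh and stopAll.sh to /hadoop at this point
chmod a+x startAll.sh stopAll.sh
# upload remoteSSH.exp and remoteSSHNOTroot.exp to /hadoop/tools at this point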

17.1.1 config.conf 

export EXPECT2TCL_IS_INSTALL=True
export CONFIGHOSTS_IS_INSTALL=True
export CREATEHADOOPUSER_IS_INSTALL=True
export JDK_IS_INSTALL=True
export PROTOBUF_IS_INSTALL=True
export PACKAGES_IS_INSTALL=True
export SETUPHADOOPSSHENV_IS_INSTALL=True
export ZOOKEEPER_IS_INSTALL=True
export HADOOP_IS_INSTALL=True
export TEZ_IS_INSTALL=True
export MYSQL_IS_INSTALL=True
export HIVE_IS_INSTALL=True
export SCALA_IS_INSTALL=True
export SPARK_IS_INSTALL=True
export ANACONDA_IS_INSTALL=True
export JUPYTER_IS_INSTALL=True
export TOREE_IS_INSTALL=True
export OOZIE_IS_INSTALL=True

These environment variables are the start/stop switches: they decide which components startAll.sh and stopAll.sh act on and which they skip. To skip a component, comment out its line by prefixing it with #. Note that the scripts below also check DOCKER_IS_INSTALL; add that switch here if the Docker (Submarine) component is installed. An example of disabling a component follows.
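
For instance, to leave Oozie out of the automated start and stop, comment out its switch (a minimal sketch of editing config.conf; the variable names are the ones listed above):

# Oozie will now be skipped by startAll.sh and stopAll.sh
#export OOZIE_IS_INSTALL=True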

17.1.2 remoteSSH.exp

Run as the root account, for example: /hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSH.exp root Yhf_1018 app-12 "systemctl start mysqld.service"

  • First argument: the user name, e.g. root
  • Second argument: the password, e.g. Yhf_1018
  • Third argument: the host, e.g. app-12 (as configured in the hosts file)
  • Fourth argument: the command to execute
#!/usr/local/bin/expect
####################################
# Name : remoteSSH.exp
# Desc : log in to host($argv 2) as user($argv 0) with password($argv 1)
#        and execute cmd($argv 3)
# Use  : /usr/local/bin/expect remoteSSH.exp root *** app-12 "pwd"
####################################
set user [ lindex $argv 0 ]
set passwd [ lindex $argv 1 ]
set host [ lindex $argv 2 ]
set cmd [ lindex $argv 3 ]
set timeout -1
spawn ssh $user@$host
expect {
    "continue connecting (yes/no)? " {
            send "yes\n";
            exp_continue;
    }
	"password: " {
            send "$passwd\n";
            exp_continue;
    }
    "*]# " { }
    "*]$ " { }
}

send "$cmd\n"
expect {
    "*]# " { }
    "*]$ " { }
}

send "exit\n"
expect {
    "*]# " { }
    "*]$ " { }
}
return 0;
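
A quick way to confirm the wrapper works is a harmless command such as hostname (a smoke test run as hadoop on app-11; the user, password and host are the ones used throughout this guide):

/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSH.exp root Yhf_1018 app-12 "hostname"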

17.1.3 remoteSSHNOTroot.exp

Run as a non-root account, for example: /hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSHNOTroot.exp hadoop Yhf_1018 app-12 "mapred --daemon start historyserver"

  • First argument: the user name, e.g. hadoop
  • Second argument: the password, e.g. Yhf_1018
  • Third argument: the host, e.g. app-12 (as configured in the hosts file)
  • Fourth argument: the command to execute
#!/usr/local/bin/expect
####################################
# Name : remoteSSHNOTroot.exp
# Desc : log in to host($argv 2) as user($argv 0) with password($argv 1)
#        and execute cmd($argv 3)
# Use  : /usr/local/bin/expect remoteSSHNOTroot.exp hadoop *** app-12 "pwd"
####################################
set user [ lindex $argv 0 ]
set passwd [ lindex $argv 1 ]
set host [ lindex $argv 2 ]
set cmd [ lindex $argv 3 ]
set timeout -1
spawn ssh $user@$host
expect {
    "continue connecting (yes/no)? " {
            send "yes\n";
            exp_continue;
    }
	"password: " {
            send "$passwd\n";
            exp_continue;
    }
    "*]# " { }
    "*]$ " { }
}

send "$cmd\n"
expect {
    "*]# " { }
    "*]$ " { }
}

send "exit\n"
expect {
    "*]# " { }
    "*]$ " { }
}
return 0;

17.1.4 startAll.sh 

startAll.sh starts components according to the switches in config.conf. Because the switches exist as environment variables, the script must first source config.conf to bring them into effect.

Run from: /hadoop

Run as: the hadoop user

#!/bin/sh
export LOCAL_DIR=$(pwd)
source $LOCAL_DIR/config.conf

if [ "hadoop" != `whoami` ]; then echo "run in hadoop user" && exit ; fi
#start zookeeper
if [ "$ZOOKEEPER_IS_INSTALL" = "True" ]; then
	cd /hadoop/tools && ./startZookeeper.sh
fi
#start hadoop
if [ "$HADOOP_IS_INSTALL" = "True" ]; then
	cd /hadoop/Hadoop/hadoop-3.1.2/sbin && ./start-all.sh
	/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSHNOTroot.exp hadoop Yhf_1018 app-12 "mapred --daemon start historyserver"
fi
#start mysql
if [ "$MYSQL_IS_INSTALL" = "True" ]; then
	/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSH.exp root Yhf_1018 app-12 "systemctl start mysqld.service"
fi
#start hive
if [ "$HIVE_IS_INSTALL" = "True" ]; then
	/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSHNOTroot.exp hadoop Yhf_1018 app-12 "cd /hadoop/Hive/apache-hive-3.1.1-bin/bin && nohup ./hive --service metastore > /hadoop/Hive/apache-hive-3.1.1-bin/log/metastore.log 2>&1 &"
	/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSHNOTroot.exp hadoop Yhf_1018 app-12 "cd /hadoop/Hive/apache-hive-3.1.1-bin/bin && nohup ./hive --service hiveserver2 > /hadoop/Hive/apache-hive-3.1.1-bin/log/hiveserver2.log 2>&1 &"
fi
#start spark
if [ "$SPARK_IS_INSTALL" = "True" ]; then
	ssh app-11 "cd /hadoop/Spark/spark-2.4.0-bin-hadoop3.1.2/sbin && ./start-all.sh"
fi
#start oozie
if [ "$OOZIE_IS_INSTALL" = "True" ]; then
	cd /hadoop/tools && ./startOozie.sh
fi
#start submarine docker
if [ "$DOCKER_IS_INSTALL" = "True" ]; then
	cd /hadoop/tools && ./startDockerDaemon.sh
fi
#start jupyter
if [ "$JUPYTER_IS_INSTALL" = "True" ]; then
	cd /hadoop/tools && ./startJupyter.sh
fi
echo "startAll cluster finished"
17.1.4.1 Set up the runtime environment
export LOCAL_DIR=$(pwd)
source $LOCAL_DIR/config.conf
if [ "hadoop" != `whoami` ]; then echo "run in hadoop user" && exit ; fi
17.1.4.2 Start ZooKeeper
#start zookeeper
if [ "$ZOOKEEPER_IS_INSTALL" = "True" ]; then
	cd /hadoop/tools && ./startZookeeper.sh
fi
startZookeeper.sh (located in /hadoop/tools) logs in to each node, starts ZooKeeper, and then checks its status:
#!/bin/sh
nodeArray="app-13:2888:3888 app-12:2888:3888 app-11:2888:3888 "
for node in $nodeArray
do
	t=$(echo $node | cut -d ":" -f1)
	/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSHNOTroot.exp hadoop Yhf_1018 $t "zkServer.sh start"
	/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSHNOTroot.exp hadoop Yhf_1018 $t "zkServer.sh status"
done
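
On a healthy ensemble, zkServer.sh status reports Mode: leader on exactly one node and Mode: follower on the others. A quick cross-node check (a sketch; assumes passwordless SSH for the hadoop user and zkServer.sh on the PATH, both configured earlier in this guide):

for h in app-11 app-12 app-13; do
	echo "== $h =="
	ssh "$h" "zkServer.sh status 2>&1 | grep Mode"
done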
17.1.4.3 Start Hadoop, and start the historyserver on the app-12 node

Hadoop ships with a history server that lets you review completed MapReduce jobs: how many map and reduce tasks were used, when the job was submitted, started, and finished, and so on. The history server is not started by default; the command below starts it.

#start hadoop
if [ "$HADOOP_IS_INSTALL" = "True" ]; then
	cd /hadoop/Hadoop/hadoop-3.1.2/sbin && ./start-all.sh
	/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSHNOTroot.exp hadoop Yhf_1018 app-12 "mapred --daemon start historyserver"
fi
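
Once started, the history server can be checked through its REST API (a sketch; assumes the default JobHistoryServer web UI port 19888, adjust if mapred-site.xml overrides it):

curl -s http://app-12:19888/ws/v1/history/info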

17.1.4.4 Start MySQL
#start mysql
if [ "$MYSQL_IS_INSTALL" = "True" ]; then
	/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSH.exp root Yhf_1018 app-12 "systemctl start mysqld.service"
fi
17.1.4.5 Start Hive
#start hive
if [ "$HIVE_IS_INSTALL" = "True" ]; then
	/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSHNOTroot.exp hadoop Yhf_1018 app-12 "cd /hadoop/Hive/apache-hive-3.1.1-bin/bin && nohup ./hive --service metastore > /hadoop/Hive/apache-hive-3.1.1-bin/log/metastore.log 2>&1 &"
	/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSHNOTroot.exp hadoop Yhf_1018 app-12 "cd /hadoop/Hive/apache-hive-3.1.1-bin/bin && nohup ./hive --service hiveserver2 > /hadoop/Hive/apache-hive-3.1.1-bin/log/hiveserver2.log 2>&1 &"
fi
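
To confirm that HiveServer2 is accepting connections, connect once with beeline (a sketch; assumes the default Thrift port 10000 and the hadoop user, adjust to your hive-site.xml):

beeline -u jdbc:hive2://app-12:10000 -n hadoop -e "show databases;"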
17.1.4.6 Start Spark
#start spark
if [ "$SPARK_IS_INSTALL" = "True" ]; then
	ssh app-11 "cd /hadoop/Spark/spark-2.4.0-bin-hadoop3.1.2/sbin && ./start-all.sh"
fi
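
The Spark master's web UI is a quick way to confirm the standalone cluster is up and that the Workers have registered (a sketch; assumes the default master web UI port 8080):

# prints 200 when the master web UI is reachable
curl -s -o /dev/null -w "%{http_code}\n" http://app-11:8080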
17.1.4.7 Start Oozie
#start oozie
if [ "$OOZIE_IS_INSTALL" = "True" ]; then
	cd /hadoop/tools && ./startOozie.sh
fi
17.1.4.8 Start Docker
#start submarine docker
if [ "$DOCKER_IS_INSTALL" = "True" ]; then
	cd /hadoop/tools && ./startDockerDaemon.sh
fi
17.1.4.9 Start Jupyter
#start jupyter
if [ "$JUPYTER_IS_INSTALL" = "True" ]; then
	cd /hadoop/tools && ./startJupyter.sh
fi

17.1.5 stopAll.sh 

Like startAll.sh, stopAll.sh stops the components in one pass according to the switches in config.conf.

Run from: /hadoop

Run as: the hadoop user

#!/bin/sh
export LOCAL_DIR=$(pwd)
source $LOCAL_DIR/config.conf

if [ "hadoop" != `whoami` ]; then echo "run in hadoop user" && exit ; fi
#stop jupyter
if [ "$JUPYTER_IS_INSTALL" = "True" ]; then
	cd /hadoop/tools && ./stopJupyter.sh
fi
#stop submarine docker
if [ "$DOCKER_IS_INSTALL" = "True" ]; then
	cd /hadoop/tools && ./stopDockerDaemon.sh
fi
#stop oozie
if [ "$OOZIE_IS_INSTALL" = "True" ]; then
	cd /hadoop/tools && ./stopOozie.sh
fi
#stop spark
if [ "$SPARK_IS_INSTALL" = "True" ]; then
	ssh app-11 "cd /hadoop/Spark/spark-2.4.0-bin-hadoop3.1.2/sbin && ./stop-all.sh"
fi
#stop hive
if [ "$HIVE_IS_INSTALL" = "True" ]; then
	ssh app-12 "pid=\$(ps x|grep hive|awk '{print \$1}') && for i in \$pid; do kill -9 \$i; done"
fi
#stop mysql
if [ "$MYSQL_IS_INSTALL" = "True" ]; then
	/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSH.exp root Yhf_1018 app-12 "systemctl stop mysqld.service"
fi
#stop hadoop
if [ "$HADOOP_IS_INSTALL" = "True" ]; then
	cd /hadoop/Hadoop/hadoop-3.1.2/sbin && ./stop-all.sh
	/hadoop/tools/expect/bin/expect /hadoop/tools/remoteSSHNOTroot.exp hadoop Yhf_1018 app-12 "mapred --daemon stop historyserver"
fi
#stop zookeeper
if [ "$ZOOKEEPER_IS_INSTALL" = "True" ]; then
	cd /hadoop/tools && ./stopZookeeper.sh
fi

echo "stopAll cluster finished"

The shutdown order is simply the reverse of the startup order.

17.1.6 Starting the cluster

Automated start: as the hadoop user, in the /hadoop directory, run ./startAll.sh

Verify with jps. The expected processes are listed below (a cross-node check sketch follows the list):

  • QuorumPeerMain: the standalone ZooKeeper process.
  • DFSZKFailoverController: the central component of the HDFS NameNode HA implementation; a daemon responsible for overall failover control.
  • JournalNode: used for data synchronization; the NameNodes communicate with each other through this group of independent JournalNode processes.
  • NameNode: the master node, somewhat like the root directory in Linux. It manages the block mapping, handles client read/write requests, applies the replication policy, and manages the HDFS namespace.
  • DataNode: stores the data blocks sent by clients and performs block read/write operations, working under the NameNode.
  • ResourceManager: a central service that schedules and launches each job's ApplicationMaster and monitors whether that ApplicationMaster is still alive.
  • NodeManager: the per-machine agent of the framework; it runs the application containers, monitors their resource usage (CPU, memory, disk, network), and reports to the scheduler (ResourceManager).
  • JobHistoryServer: Hadoop's built-in history server, used to review completed MapReduce jobs (number of map and reduce tasks, submission, start, and completion times, and so on).
  • RunJar: performs job initialization, including obtaining the job ID and uploading the jar to HDFS.
  • Master, Worker: Spark processes; app-11 is configured as the Master, and app-11, app-12, and app-13 all run Workers.
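
A quick way to check all three nodes at once (a sketch; assumes passwordless SSH for the hadoop user, as configured earlier in this guide):

for h in app-11 app-12 app-13; do
	echo "== $h =="
	ssh "$h" "jps"
done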

17.1.7 Stopping the cluster

Automated stop: as the hadoop user, in the /hadoop directory, run ./stopAll.sh

Confirm with jps that the processes listed above are gone.

17.2 Appendix B: Modifying the hosts file

Open the hosts file; on Windows it is located at C:\Windows\System32\drivers\etc

Edit the Windows hosts file and add the following entries:

192.168.56.102 app-11
192.168.56.103 app-12
192.168.56.104 app-13
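
To confirm the entries take effect, resolve one of the names from a Windows command prompt (a minimal check):

ping app-11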

17.3 Appendix C: Environment variables

The hadoop user's environment variables are defined in ~/.bashrc

The full contents are as follows:

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
	. /etc/bashrc
fi

# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=

# User specific aliases and functions
export JAVA_HOME=/hadoop/JDK/jdk1.8.0_131
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
export PROTOBUF_HOME=/hadoop/tools/protobuf-2.5.0
export PATH=${PROTOBUF_HOME}/bin:$PATH
export ZOOKEEPER_HOME=/hadoop/ZooKeeper/zookeeper-3.4.10
export PATH=${ZOOKEEPER_HOME}/bin:$PATH
export HADOOP_HOME=/hadoop/Hadoop/hadoop-3.1.2
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HADOOP_HOME}/lib:$PATH
export HIVE_HOME=/hadoop/Hive/apache-hive-3.1.1-bin
export PATH=${HIVE_HOME}/bin:$PATH
export SCALA_HOME=/hadoop/Scala/scala-2.11.12
export PATH=${SCALA_HOME}/bin:$PATH
export SPARK_HOME=/hadoop/Spark/spark-2.4.0-bin-hadoop3.1.2
export PATH=${SPARK_HOME}/bin:$PATH
export ANACONDA_HOME=/hadoop/Anaconda/Anaconda3-2018.12-Linux-x86_64
export PATH=${ANACONDA_HOME}/bin:$PATH
export MVN_HOME=/hadoop/tools/apache-maven-3.6.0
export PATH=$PATH:${MVN_HOME}/bin
export OOZIE_HOME=/hadoop/Oozie/oozie-5.0.0
export OOZIE_CONFIG=$OOZIE_HOME/conf
export PATH=$PATH:${OOZIE_HOME}/bin

After modifying the environment variables, synchronize the file to app-12 and app-13:

scp ~/.bashrc app-12:/home/hadoop/

scp ~/.bashrc app-13:/home/hadoop/

Then activate it (on each node) with: source ~/.bashrc
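
A quick sanity check after sourcing ~/.bashrc (run on each node; every command should print a version rather than "command not found"):

java -version
hadoop version
spark-submit --version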
