Hadoop Environment Setup (4)
Fully Distributed Deployment
1. Cluster Plan
|      | hadoop1000 (100)   | hadoop1001 (101)             | hadoop1002 (102)                  |
|------|--------------------|------------------------------|-----------------------------------|
| HDFS | NameNode, DataNode | DataNode                     | SecondaryNameNode (2NN), DataNode |
| YARN | NodeManager        | ResourceManager, NodeManager | NodeManager                       |
2. Edit the configuration files
All of the files below live under /opt/module/hadoop-3.1.3/etc/hadoop/, so switch there first: cd /opt/module/hadoop-3.1.3/etc/hadoop/
(1) Edit core-site.xml:
vi core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1000:9820</value>
</property>
<!-- hadoop.data.dir is a custom property; the other config files below reference it -->
<property>
<name>hadoop.data.dir</name>
<value>/opt/module/hadoop-3.1.3/data</value>
</property>
</configuration>
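All of the *-site.xml files share the same shape: a flat list of &lt;property&gt; name/value pairs under one &lt;configuration&gt; root. As a quick illustration (plain-Python sketch, not a Hadoop API), the pairs can be read back like this:

```python
import xml.etree.ElementTree as ET

CORE_SITE = """\
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1000:9820</value>
  </property>
  <property>
    <name>hadoop.data.dir</name>
    <value>/opt/module/hadoop-3.1.3/data</value>
  </property>
</configuration>
"""

def read_site_xml(text):
    """Parse a Hadoop *-site.xml body into a {name: value} dict."""
    return {p.findtext("name"): p.findtext("value")
            for p in ET.fromstring(text).iter("property")}

conf = read_site_xml(CORE_SITE)
print(conf["fs.defaultFS"])  # hdfs://hadoop1000:9820
```

The same helper works unchanged on hdfs-site.xml and yarn-site.xml, which is handy for sanity-checking the files before distributing them.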
(2) Edit hdfs-site.xml (same directory):
vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- NameNode data directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file://${hadoop.data.dir}/name</value>
</property>
<!-- DataNode data directory -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file://${hadoop.data.dir}/data</value>
</property>
<!-- SecondaryNameNode checkpoint directory -->
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file://${hadoop.data.dir}/namesecondary</value>
</property>
<!-- DataNode restart timeout of 30 seconds; works around a compatibility issue -->
<property>
<name>dfs.client.datanode-restart.timeout</name>
<value>30</value>
</property>
<!-- Web UI address of the NameNode -->
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop1000:9870</value>
</property>
<!-- Web UI address of the SecondaryNameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop1002:9868</value>
</property>
</configuration>
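hdfs-site.xml refers to the custom hadoop.data.dir property through ${...} placeholders; Hadoop's Configuration class expands these against the other properties at load time. A minimal regex-based sketch of that substitution (illustrative only, not Hadoop's actual implementation):

```python
import re

# Properties as they appear across core-site.xml and hdfs-site.xml.
props = {
    "hadoop.data.dir": "/opt/module/hadoop-3.1.3/data",
    "dfs.namenode.name.dir": "file://${hadoop.data.dir}/name",
    "dfs.datanode.data.dir": "file://${hadoop.data.dir}/data",
}

def resolve(props, name):
    """Expand ${other.prop} references, recursively, like Hadoop does."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: resolve(props, m.group(1)),
                  props[name])

print(resolve(props, "dfs.namenode.name.dir"))
# file:///opt/module/hadoop-3.1.3/data/name
```

So changing hadoop.data.dir in core-site.xml moves the NameNode, DataNode, and checkpoint directories all at once, which is the whole point of defining the custom property.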
(3) Edit yarn-site.xml (same directory):
vi yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Auxiliary shuffle service that MapReduce jobs require -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Run the ResourceManager on hadoop1001, per the cluster plan -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1001</value>
</property>
<!-- Environment variables NodeManagers pass through to containers -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
3. Passwordless SSH login
Without this, every remote login to another host prompts for a password.
1. Generate a key pair on each node and copy the public key to all nodes.
On hadoop1000, generate the key pair:
[root@hadoop1000] ssh-keygen -t rsa
then press Enter three times to accept the defaults.
Copy the public key to every machine that should allow passwordless login:
[root@hadoop1000] ssh-copy-id hadoop1000
[root@hadoop1000] ssh-copy-id hadoop1001
[root@hadoop1000] ssh-copy-id hadoop1002
On hadoop1001, generate the key pair:
[root@hadoop1001] ssh-keygen -t rsa
then press Enter three times to accept the defaults.
Copy the public key to every machine that should allow passwordless login:
[root@hadoop1001] ssh-copy-id hadoop1000
[root@hadoop1001] ssh-copy-id hadoop1001
[root@hadoop1001] ssh-copy-id hadoop1002
On hadoop1002, generate the key pair:
[root@hadoop1002] ssh-keygen -t rsa
then press Enter three times to accept the defaults.
Copy the public key to every machine that should allow passwordless login:
[root@hadoop1002] ssh-copy-id hadoop1000
[root@hadoop1002] ssh-copy-id hadoop1001
[root@hadoop1002] ssh-copy-id hadoop1002
To log in to a remote host: ssh hadoop1001 (with passwordless login configured, you land on hadoop1001 without a password prompt).
Log out with exit. Then change back up to the etc directory for the next step: cd ..
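The three keygen/copy rounds above are identical apart from the source host. Run on each of the three nodes, they collapse into one small script (a sketch; it assumes the hostnames resolve, e.g. via /etc/hosts, and ssh-copy-id still asks for each host's password this one last time):

```shell
# Run once on each of hadoop1000, hadoop1001 and hadoop1002.
# -N '' sets an empty passphrase so ssh-keygen needs no interactive input.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa

# Push this node's public key to every node, including itself.
for host in hadoop1000 hadoop1001 hadoop1002; do
    ssh-copy-id "$host"
done
```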
4. Distribute the configuration files
From /opt/module/hadoop-3.1.3/etc/, copy the hadoop/ directory to hadoop1001:
scp -r hadoop/ root@hadoop1001:/opt/module/hadoop-3.1.3/etc/
and to hadoop1002:
scp -r hadoop/ root@hadoop1002:/opt/module/hadoop-3.1.3/etc/
5. Format the NameNode (if the NameNode fails to start, delete the data and logs directories and format again)
On hadoop1000: hdfs namenode -format
6. Start the cluster daemons one node at a time
On hadoop1000:
hdfs --daemon start namenode
hdfs --daemon start datanode
yarn --daemon start nodemanager
On hadoop1001:
yarn --daemon start resourcemanager
hdfs --daemon start datanode
yarn --daemon start nodemanager
On hadoop1002:
hdfs --daemon start secondarynamenode
hdfs --daemon start datanode
yarn --daemon start nodemanager
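With passwordless SSH in place, the per-node starts above can also be driven entirely from hadoop1000, and jps then confirms that each node runs exactly the daemons in the cluster plan. A sketch (it assumes the hdfs/yarn/jps commands are on PATH in non-interactive shells on every node):

```shell
# Start each node's daemons over SSH, following the cluster plan.
ssh hadoop1000 "hdfs --daemon start namenode; hdfs --daemon start datanode; yarn --daemon start nodemanager"
ssh hadoop1001 "yarn --daemon start resourcemanager; hdfs --daemon start datanode; yarn --daemon start nodemanager"
ssh hadoop1002 "hdfs --daemon start secondarynamenode; hdfs --daemon start datanode; yarn --daemon start nodemanager"

# Verify: each node should list the daemons from the plan
# (e.g. NameNode/DataNode/NodeManager on hadoop1000).
for host in hadoop1000 hadoop1001 hadoop1002; do
    echo "== $host =="
    ssh "$host" jps
done
```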