Hadoop Fully Distributed Cluster Setup
- 1. Preparation
- 2. Environment setup
  - 1. Set the hostnames
  - 2. Disable the firewall
  - 3. Edit the hosts file
  - 4. Configure passwordless SSH login
  - 5. Install the JDK
  - 6. Install Hadoop
    - 1. Extract the archive
    - 2. Add Hadoop to the environment variables (vi /etc/profile)
    - 3. Copy profile to the other nodes and source it
    - 4. Create the HDFS storage directories
    - 5. Edit /hadoop-2.9.2/etc/hadoop/hadoop-env.sh and set JAVA_HOME to the actual path
    - 6. Edit /hadoop-2.9.2/etc/hadoop/yarn-env.sh and set JAVA_HOME to the actual path
    - 7. Configure /hadoop-2.9.2/etc/hadoop/core-site.xml
    - 8. Configure /hadoop-2.9.2/etc/hadoop/hdfs-site.xml
    - 9. Configure /hadoop-2.9.2/etc/hadoop/mapred-site.xml
    - 10. Configure /hadoop-2.9.2/etc/hadoop/yarn-site.xml
    - 11. Configure /hadoop-2.9.2/etc/hadoop/slaves
    - 12. Copy Hadoop to the other nodes
    - 13. Format the NameNode
    - 14. Start Hadoop
    - 15. Visit the web UIs
    - 16. Run an example
1. Preparation
1.1. Software versions
JDK: 1.8
Hadoop: 2.9.2
OS: CentOS 7
All installation packages are placed under /usr/local/src
1.2. Cluster plan
No. | Hostname | IP address | Roles |
---|---|---|---|
1 | master | 192.168.1.101 | NameNode, SecondaryNameNode, ResourceManager |
2 | slave1 | 192.168.1.102 | NodeManager, DataNode |
3 | slave2 | 192.168.1.103 | NodeManager, DataNode |
2. Environment setup
1. Set the hostnames
Run the matching command on each of the three nodes:
hostnamectl set-hostname master
hostnamectl set-hostname slave1
hostnamectl set-hostname slave2
2. Disable the firewall
The firewall must be disabled on every node in the cluster:
systemctl stop firewalld
systemctl disable firewalld
3. Edit the hosts file
vi /etc/hosts
Add the three node entries below (the two localhost lines are the CentOS defaults and stay):
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.101 master
192.168.1.102 slave1
192.168.1.103 slave2
Copy the hosts file to the other nodes (type yes at the prompt, then enter the target node's root password):
scp /etc/hosts root@slave1:/etc/
scp /etc/hosts root@slave2:/etc/
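To confirm the copy worked, a small shell check can verify that every node name appears in a hosts file. A sketch; the `check_hosts` helper name is just for illustration:

```shell
# Sketch: verify every cluster node has an entry in a hosts file.
# check_hosts FILE NODE...  prints "ok", or the first missing node name.
check_hosts() {
    file="$1"; shift
    for node in "$@"; do
        # -w matches whole words, so "slave1" will not match "slave10"
        grep -qw "$node" "$file" || { echo "missing: $node"; return 1; }
    done
    echo "ok"
}
```

Running `check_hosts /etc/hosts master slave1 slave2` on each node should print `ok`.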
4. Configure passwordless SSH login
Generate the key pair:
ssh-keygen -t rsa
Press Enter at every prompt; you should see output like the following:
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:0f4Tz5jw1zbR3t9j8RH1bOcwhg1BZwC7jkf1sUDfQTM root@master
The key's randomart image is:
+---[RSA 2048]----+
| .o=o+E |
| . o.+ .=|
| . o o+o.+|
| o o.o=+*|
| S = ..o*+|
| + + * =+|
| . o * +.O|
| . o +=|
| . +|
+----[SHA256]-----+
Copy the public key to every machine you want to log in to without a password:
ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2
Test it:
[root@master src]# ssh slave1
Last login: Wed Nov 10 15:34:09 2021 from 192.168.1.17
[root@slave1 ~]#
5. Install the JDK
Extract the archive and rename the directory:
tar -xvf jdk-8u261-linux-x64.tar.gz
mv jdk1.8.0_261 jdk1.8
Append the environment variables:
vi /etc/profile
Add at the end of the file:
# java environment
export JAVA_HOME=/usr/local/src/jdk1.8 # path where the JDK was extracted
export PATH=$PATH:$JAVA_HOME/bin
Apply the changes:
source /etc/profile
Verify the installation:
[root@master src]# java -version
java version "1.8.0_261"
Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)
Copy the JDK and profile to the other nodes:
scp -r /usr/local/src/jdk1.8 root@slave1:/usr/local/src/
scp -r /usr/local/src/jdk1.8 root@slave2:/usr/local/src/
scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/
Run source /etc/profile on the other nodes to apply the environment there as well.
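The four scp commands above can also be wrapped in one loop. A sketch, assuming passwordless ssh from step 4 is in place; `distribute_jdk` and `DRY_RUN` are illustrative names, not Hadoop tooling:

```shell
# Sketch: push the JDK and /etc/profile to each worker node passed as an argument.
# Set DRY_RUN=echo to print the commands instead of running them.
distribute_jdk() {
    run="${DRY_RUN:-}"
    for h in "$@"; do
        $run scp -r /usr/local/src/jdk1.8 "root@$h:/usr/local/src/"
        $run scp /etc/profile "root@$h:/etc/"
    done
}
```

Usage: `distribute_jdk slave1 slave2`, then run `source /etc/profile` on each worker.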
6. Install Hadoop
1. Extract the archive
tar -zxvf hadoop-2.9.2.tar.gz
2. Add Hadoop to the environment variables (vi /etc/profile)
# hadoop environment
export HADOOP_HOME=/usr/local/src/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
3. Copy profile to the other nodes, then source it on each to apply:
scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/
4. Create the HDFS storage directories
(note: hadoop-2.9.2 is under /usr/local/src/)
/hadoop-2.9.2/hdfs/name -- NameNode metadata
/hadoop-2.9.2/hdfs/data -- block data
/hadoop-2.9.2/hdfs/tmp -- temporary files
cd /usr/local/src/hadoop-2.9.2
mkdir hdfs
cd hdfs
mkdir name data tmp
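The three mkdir steps can also be done in one pass with `mkdir -p`, which creates the parent hdfs/ directory as needed. A small helper sketch; the function name is illustrative:

```shell
# Sketch: create the name/data/tmp directories under <hadoop root>/hdfs.
make_hdfs_dirs() {
    for d in name data tmp; do
        mkdir -p "$1/hdfs/$d"
    done
}
```

Usage: `make_hdfs_dirs /usr/local/src/hadoop-2.9.2`.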
5. Edit /hadoop-2.9.2/etc/hadoop/hadoop-env.sh and set JAVA_HOME to the actual path
cd /usr/local/src/hadoop-2.9.2/etc/hadoop/
vi hadoop-env.sh
Comment out the original line and add the real path:
# The java implementation to use.
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/local/src/jdk1.8
6. Edit /hadoop-2.9.2/etc/hadoop/yarn-env.sh and set JAVA_HOME to the actual path
vi yarn-env.sh
Add below the original commented-out line:
# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/usr/local/src/jdk1.8
7. Configure /hadoop-2.9.2/etc/hadoop/core-site.xml
vi core-site.xml
Add inside the configuration element (note that XML comments use <!-- -->, not #):
<configuration>
  <!-- temporary storage directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/src/hadoop-2.9.2/hdfs/tmp</value>
  </property>
  <!-- HDFS filesystem address and port -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
8. Configure /hadoop-2.9.2/etc/hadoop/hdfs-site.xml
vi hdfs-site.xml
Add inside the configuration element:
<configuration>
  <!-- number of data replicas -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- NameNode storage directory -->
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/src/hadoop-2.9.2/hdfs/name</value>
  </property>
  <!-- data storage directory -->
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/src/hadoop-2.9.2/hdfs/data</value>
  </property>
  <!-- disable HDFS permission checks on upload -->
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
9. Configure /hadoop-2.9.2/etc/hadoop/mapred-site.xml
Copy it from the bundled template first:
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
Add inside the configuration element:
<configuration>
  <!-- run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
10. Configure /hadoop-2.9.2/etc/hadoop/yarn-site.xml
vi yarn-site.xml
Add inside the configuration element:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- ResourceManager address -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- how reducers fetch map output -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- skip the virtual-memory check -->
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
11. Configure /hadoop-2.9.2/etc/hadoop/slaves
vi slaves
Delete the existing content and add the worker hostnames:
slave1
slave2
12. Copy Hadoop to the other nodes
cd /usr/local/src/
scp -r hadoop-2.9.2 root@slave1:$PWD # $PWD expands to the absolute path of the current directory
scp -r hadoop-2.9.2 root@slave2:$PWD
13. Format the NameNode
hadoop namenode -format
If the output contains "has been successfully formatted", the format succeeded.
14. Start Hadoop
start-all.sh
Check the processes on each node:
jps # a JDK command
On master:
[root@master src]# jps
15636 NameNode
17014 Jps
16493 ResourceManager
16255 SecondaryNameNode
On slave1 and slave2:
[root@slave1 src]# jps
14134 NodeManager
15739 Jps
13565 DataNode
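Instead of eyeballing the jps output on every node, the expected daemons can be checked mechanically. A sketch; the `check_daemons` name is illustrative:

```shell
# Sketch: read jps output on stdin and verify the expected daemons are listed.
check_daemons() {
    out=$(cat)
    for d in "$@"; do
        echo "$out" | grep -qw "$d" || { echo "missing: $d"; return 1; }
    done
    echo "all daemons running"
}
```

Usage: on master, `jps | check_daemons NameNode SecondaryNameNode ResourceManager`; on the workers, `jps | check_daemons DataNode NodeManager`.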
15. Visit the web UIs
HDFS page: http://192.168.1.101:50070
YARN page: http://192.168.1.101:8088
16. Run an example
cd hadoop-2.9.2/share/hadoop/mapreduce/
hadoop jar hadoop-mapreduce-examples-2.9.2.jar pi 5 10
[root@master mapreduce]# hadoop jar hadoop-mapreduce-examples-2.9.2.jar pi 5 10
Number of Maps = 5
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
21/11/12 10:57:01 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.101:8032
21/11/12 10:57:01 INFO input.FileInputFormat: Total input files to process : 5
21/11/12 10:57:01 INFO mapreduce.JobSubmitter: number of splits:5
21/11/12 10:57:01 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
21/11/12 10:57:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1636685278166_0001
21/11/12 10:57:02 INFO impl.YarnClientImpl: Submitted application application_1636685278166_0001
21/11/12 10:57:02 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1636685278166_0001/
21/11/12 10:57:02 INFO mapreduce.Job: Running job: job_1636685278166_0001
21/11/12 10:57:08 INFO mapreduce.Job: Job job_1636685278166_0001 running in uber mode : false
21/11/12 10:57:08 INFO mapreduce.Job: map 0% reduce 0%
21/11/12 10:57:19 INFO mapreduce.Job: map 100% reduce 0%
21/11/12 10:57:24 INFO mapreduce.Job: map 100% reduce 100%
21/11/12 10:57:24 INFO mapreduce.Job: Job job_1636685278166_0001 completed successfully
21/11/12 10:57:24 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=116
FILE: Number of bytes written=1192839
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1300
HDFS: Number of bytes written=215
HDFS: Number of read operations=23
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=5
Launched reduce tasks=1
Data-local map tasks=5
Total time spent by all maps in occupied slots (ms)=42055
Total time spent by all reduces in occupied slots (ms)=2317
Total time spent by all map tasks (ms)=42055
Total time spent by all reduce tasks (ms)=2317
Total vcore-milliseconds taken by all map tasks=42055
Total vcore-milliseconds taken by all reduce tasks=2317
Total megabyte-milliseconds taken by all map tasks=43064320
Total megabyte-milliseconds taken by all reduce tasks=2372608
Map-Reduce Framework
Map input records=5
Map output records=10
Map output bytes=90
Map output materialized bytes=140
Input split bytes=710
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=140
Reduce input records=10
Reduce output records=0
Spilled Records=20
Shuffled Maps =5
Failed Shuffles=0
Merged Map outputs=5
GC time elapsed (ms)=4892
CPU time spent (ms)=2690
Physical memory (bytes) snapshot=1675964416
Virtual memory (bytes) snapshot=12723679232
Total committed heap usage (bytes)=1073741824
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=590
File Output Format Counters
Bytes Written=97
Job Finished in 23.816 seconds
Estimated value of Pi is 3.28000000000000000000
[root@master mapreduce]#
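With only 5 maps × 10 samples the estimate is coarse (3.28 rather than 3.14); more samples converge toward π. The idea behind the example can be sketched with plain Monte Carlo sampling in awk (Hadoop's pi example actually uses a Halton quasi-random sequence, so this is an analogy, not the same algorithm):

```shell
# Sketch: estimate pi by sampling random points in the unit square and
# counting how many fall inside the quarter circle x^2 + y^2 <= 1.
awk 'BEGIN {
    srand(1); n = 100000; inside = 0
    for (i = 0; i < n; i++) {
        x = rand(); y = rand()
        if (x * x + y * y <= 1) inside++
    }
    printf "%.2f\n", 4 * inside / n   # prints a value close to 3.14
}'
```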