Cluster Deployment
Classic cluster topology
Naming and network planning
Hostname | Management network (/16) | Compute network (/16) | IPMI network (/16) | Notes |
---|---|---|---|---|
mgt01 | 192.168.0.254 | 10.1.0.254 | 172.16.0.254 | Management node |
c01n01 | 192.168.1.1 | 10.1.1.1 | 172.16.1.1 | Blade node |
c01n02 | 192.168.1.2 | 10.1.1.2 | 172.16.1.2 | Blade node |
c01n03 | 192.168.1.3 | 10.1.1.3 | 172.16.1.3 | Blade node |
c01n04 | 192.168.1.4 | 10.1.1.4 | 172.16.1.4 | Blade node |
node01 | 192.168.0.1 | 10.1.0.1 | 172.16.0.1 | Rack compute node |
node02 | 192.168.0.2 | 10.1.0.2 | 172.16.0.2 | Rack compute node |
graphic01 | 192.168.0.101 | 10.1.0.101 | 172.16.0.101 | Graphics node |
graphic02 | 192.168.0.102 | 10.1.0.102 | 172.16.0.102 | Graphics node |
io01 | 192.168.0.201 | 10.1.0.201 | 172.16.0.201 | Storage node |
io02 | 192.168.0.202 | 10.1.0.202 | 172.16.0.202 | Storage node |
Directory planning
Directory | Description | Notes |
---|---|---|
/data | Shared directory | |
/data/apps | Application software installation directory | |
/data/home | User home directories | |
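Management node installation
In this example the management node also serves as the storage node: apart from the system disk, the local high-capacity disks are configured as RAID 5 and mounted at /share, with application software installed under /share/apps and user home directories under /share/home.
Operating system configuration
- Perform a minimal installation of CentOS 7.
- Disable the firewall, SELinux, and NetworkManager, then reboot the server.
- Configure the network IP addresses and the hostname.
- Populate /etc/hosts with the hostname-to-IP mappings of all nodes, including the compute network and the IPMI network.
- Configure the installation disc as a yum repository.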
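As a sketch of the /etc/hosts step above, the helper below prints three entries per node from the addressing plan. The `hosts_entries` function name and the `-ipmi` alias suffix are assumptions made for illustration; the `-ib0` alias for the compute network follows the naming that appears in ssh_known_hosts2 later in this guide, and `hpc.local` is the default domain used by the NIS script.

```shell
#!/bin/bash
# Sketch: print /etc/hosts entries for nodes of the addressing plan.
# Assumptions: the "-ib0" and "-ipmi" alias suffixes and the hpc.local
# domain are illustrative; adjust them to the site's conventions.
DOMAIN="${DOMAIN:-hpc.local}"

# usage: hosts_entries <hostname> <suffix>, e.g. hosts_entries c01n01 1.1
# <suffix> is the last two octets, shared by all three /16 networks.
hosts_entries() {
    local name="$1" suffix="$2"
    printf '192.168.%s\t%s.%s %s\n' "$suffix" "$name" "$DOMAIN" "$name"
    printf '10.1.%s\t%s-ib0\n' "$suffix" "$name"
    printf '172.16.%s\t%s-ipmi\n' "$suffix" "$name"
}

# Print entries for a few nodes from the table; review the output
# before appending it to /etc/hosts.
while read -r name suffix; do
    hosts_entries "$name" "$suffix"
done <<'EOF'
mgt01 0.254
c01n01 1.1
node01 0.1
EOF
```

Because every node uses the same last two octets on all three networks, one suffix per host is enough to derive the management, compute, and IPMI addresses.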
NIS service configuration
#!/bin/bash
# Install NIS
yum -y install rpcbind \
ypbind \
yp-tools \
ypserv
# Set the NIS domain name (defaults to hpc.local)
[ "x${MYDOMAINNAME}" = "x" ] && MYDOMAINNAME="hpc.local"
sed -i '/NISDOMAIN/d' /etc/sysconfig/network
echo "NISDOMAIN=${MYDOMAINNAME}" >> /etc/sysconfig/network
# Start the services
systemctl restart network
systemctl start ypserv
systemctl start yppasswdd
systemctl enable ypserv
systemctl enable yppasswdd
# Initialize the NIS database (ypinit prompts interactively for the server list)
/usr/lib64/yp/ypinit -m
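After the initial `ypinit -m`, accounts created later on the management node only become visible to clients once the NIS maps are rebuilt. The following is a minimal sketch of that follow-up step, assuming the stock /var/yp Makefile shipped with ypserv. The `nis_add_user` helper, the `DRY_RUN` switch, and the `alice` login are hypothetical, and the home directory base follows the /share/home layout assumed for this example.

```shell
#!/bin/bash
# Sketch: add an account on the NIS master and rebuild the maps.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: nis_add_user <login>
nis_add_user() {
    local user="$1"
    # Home directories live on the shared file system.
    run useradd -m -d "/share/home/${user}" "$user"
    # Rebuild the NIS maps so that clients see the new account.
    run make -C /var/yp
}

nis_add_user alice
```

Set `DRY_RUN=0` to execute the commands instead of printing them.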
NTP service configuration
Set up a local NTP server for the cluster.
Install ntp:
yum -y install ntp
Edit /etc/ntp.conf as follows:
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.rhel.pool.ntp.org iburst
#server 1.rhel.pool.ntp.org iburst
#server 2.rhel.pool.ntp.org iburst
#server 3.rhel.pool.ntp.org iburst
server 127.127.1.0
fudge 127.127.1.0 stratum 8
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
Start the ntp service:
systemctl start ntpd
systemctl enable ntpd
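Compute node installation
Operating system configuration
- Same as for the management node.
NIS client configuration
Run the following command: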
authconfig --enablenis --nisdomain=hpc.local --nisserver=mgt01 --update
NTP service configuration
Compute nodes synchronize their time from the management node.
Install ntp:
yum -y install ntp
Edit /etc/ntp.conf as follows:
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.rhel.pool.ntp.org iburst
#server 1.rhel.pool.ntp.org iburst
#server 2.rhel.pool.ntp.org iburst
#server 3.rhel.pool.ntp.org iburst
restrict mgt01.hpc.local
server mgt01.hpc.local
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
Start the ntp service:
systemctl start ntpd
systemctl enable ntpd
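To confirm that a compute node has actually locked onto the management node, the peer list from `ntpq -pn` can be inspected: ntpq marks the currently selected synchronization source with a leading `*`. The sketch below wraps that check. The `ntp_synced` function name is hypothetical, and it reads the peer list from standard input so that it can also be tried on saved output.

```shell
#!/bin/bash
# Sketch: succeed if an ntpq peer list (on stdin) contains a selected
# system peer, which ntpq flags with "*" in the first column.
ntp_synced() {
    grep -q '^\*'
}

# Typical use on a compute node (synchronization can take several
# minutes after ntpd starts):
#   ntpq -pn | ntp_synced && echo "synchronized" || echo "not yet synchronized"
```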
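Cluster SSH configuration
Passwordless SSH configuration
The root account requires user-based (public key) passwordless login.
Log in to the management node and run: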
# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:IOvGKTpMCQ/fch74ydqkkEGHE43fMrPzPxb8S83n8tU root@mgt01
The key's randomart image is:
+---[RSA 2048]----+
| .o |
| .o. |
| +..o . |
|+ o= + . |
|o+.o* . S |
| ===+. o o .|
|= .*Oo o. o . .E|
|.+ *=. o.. .o . |
|..o.. o.... oo |
+----[SHA256]-----+
Change to the /root/.ssh directory, where the following two files have been generated:
# ls
id_rsa id_rsa.pub
## Add this host's own login key
# cp id_rsa.pub authorized_keys
Log in to the local host to verify that passwordless login works:
# ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:xn5vyXzYWgRD+GWKjP0Wpi3BSEkJgm6MeIc636ROUkI.
ECDSA key fingerprint is MD5:36:4d:de:4f:1c:b1:cd:67:62:80:d8:d1:0d:ba:b9:1f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Mon Aug 27 13:27:41 2018 from 192.168.10.1
Copy the /root/.ssh directory to all other nodes in the cluster.
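Copying the directory by hand does not scale past a few nodes, so a small loop is convenient. This is a sketch only: the node list is a sample taken from the planning table, and the `push_root_ssh` function and `DRY_RUN` switch are hypothetical. Each scp will still prompt for the root password, because passwordless login is not yet in place on the target.

```shell
#!/bin/bash
# Sketch: copy root's .ssh directory to other nodes.
# DRY_RUN=1 (the default here) only prints the scp commands.
DRY_RUN="${DRY_RUN:-1}"
NODES="c01n01 c01n02 node01 node02"   # sample subset of the plan

# usage: push_root_ssh <node>...
push_root_ssh() {
    local node
    for node in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "+ scp -rp /root/.ssh ${node}:/root/"
        else
            # -r copies recursively, -p preserves modes and timestamps
            scp -rp /root/.ssh "${node}:/root/" || echo "copy to ${node} failed" >&2
        fi
    done
}

push_root_ssh $NODES
```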
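Host-based passwordless authentication
Host-based passwordless authentication applies to all ordinary users. Configure it as follows:
Log in to the management node:
Edit the /etc/ssh/sshd_config file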
HostbasedAuthentication yes
Edit the /etc/ssh/ssh_config file
HostbasedAuthentication yes
EnableSSHKeysign yes
StrictHostKeyChecking no
Create the ssh_known_hosts2 file
# ssh-keyscan -t rsa mgt01 > /etc/ssh/ssh_known_hosts2
Edit /etc/ssh/ssh_known_hosts2 so that it resembles the following:
mgt01,mgt01.hpc.local,mgt01-ib0 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDS8CfEm5K1UX8wHHdM8/xiPP0eKxZshYgG4DZqHIQByedJ5DqGU93QnHQRqIdez95iy/7Zs4822E+U7aXY+OxJyAPbrSeQHtO3sIwU+HLfRsNMWjIF0M4UQKopJWmOqUuj27pJbCF46G8kablSc2vVUq8sagzcCX39e9Gl2NrE4HPsZalYfVo1W3CRoLuEPat7raziFZ0o8rgjirH+NVOvgcWKOoDg4O/sDuLy4n78gm0yoYWXB31B8MBfmYegUP8xC30BWc6+vDaB2l3/XIIA+B7572S1+Q80/4+5gqVv77cEy0OqRak6yWO/Kn0oearaSOK/9HGnhNX1+IvOP1Sr
c01n01,c01n01.hpc.local,c01n01-ib0 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDS8CfEm5K1UX8wHHdM8/xiPP0eKxZshYgG4DZqHIQByedJ5DqGU93QnHQRqIdez95iy/7Zs4822E+U7aXY+OxJyAPbrSeQHtO3sIwU+HLfRsNMWjIF0M4UQKopJWmOqUuj27pJbCF46G8kablSc2vVUq8sagzcCX39e9Gl2NrE4HPsZalYfVo1W3CRoLuEPat7raziFZ0o8rgjirH+NVOvgcWKOoDg4O/sDuLy4n78gm0yoYWXB31B8MBfmYegUP8xC30BWc6+vDaB2l3/XIIA+B7572S1+Q80/4+5gqVv77cEy0OqRak6yWO/Kn0oearaSOK/9HGnhNX1+IvOP1Sr
node01,node01.hpc.local,node01-ib0 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDS8CfEm5K1UX8wHHdM8/xiPP0eKxZshYgG4DZqHIQByedJ5DqGU93QnHQRqIdez95iy/7Zs4822E+U7aXY+OxJyAPbrSeQHtO3sIwU+HLfRsNMWjIF0M4UQKopJWmOqUuj27pJbCF46G8kablSc2vVUq8sagzcCX39e9Gl2NrE4HPsZalYfVo1W3CRoLuEPat7raziFZ0o8rgjirH+NVOvgcWKOoDg4O/sDuLy4n78gm0yoYWXB31B8MBfmYegUP8xC30BWc6+vDaB2l3/XIIA+B7572S1+Q80/4+5gqVv77cEy0OqRak6yWO/Kn0oearaSOK/9HGnhNX1+IvOP1Sr
Create the shosts.equiv file
# vi /etc/ssh/shosts.equiv
# The file content has the following format:
node01.hpc.local
node01
node01-ib0
c01n01.hpc.local
c01n01
c01n01-ib0
...
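Both files follow a fixed pattern per node, so they can be generated from a node list rather than edited by hand. The sketch below assumes, as in the sample above, that every node presents the same RSA host key (which holds once /etc/ssh is copied from the management node) and that each node has the aliases `<name>`, `<name>.hpc.local`, and `<name>-ib0`. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: generate ssh_known_hosts2 and shosts.equiv content.
DOMAIN="${DOMAIN:-hpc.local}"

# usage: known_hosts_line <node> <keytype> <base64-key>
known_hosts_line() {
    printf '%s,%s.%s,%s-ib0 %s %s\n' "$1" "$1" "$DOMAIN" "$1" "$2" "$3"
}

# usage: shosts_lines <node>
shosts_lines() {
    printf '%s.%s\n%s\n%s-ib0\n' "$1" "$DOMAIN" "$1" "$1"
}

# Example (run on the management node, then review the generated files):
#   read -r _ keytype key < <(ssh-keyscan -t rsa mgt01 2>/dev/null)
#   for n in mgt01 c01n01 node01; do
#       known_hosts_line "$n" "$keytype" "$key"
#   done > /etc/ssh/ssh_known_hosts2
#   for n in mgt01 c01n01 node01; do shosts_lines "$n"; done > /etc/ssh/shosts.equiv
```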
Copy the ssh configuration files to every node
ssh node01 rm -rf /etc/ssh
scp -r /etc/ssh node01:/etc/
Restart the ssh service
systemctl restart sshd
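Once sshd has been restarted on every node, host-based login can be checked for an ordinary user by restricting the client to that one method. A hedged sketch: `check_hostbased` and the `DRY_RUN` switch are hypothetical, the node name is an example, and `BatchMode=yes` makes ssh fail instead of falling back to a password prompt.

```shell
#!/bin/bash
# Sketch: print (DRY_RUN=1, the default) or run a host-based login test.
DRY_RUN="${DRY_RUN:-1}"

# usage: check_hostbased <node>
check_hostbased() {
    node="$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ ssh -o PreferredAuthentications=hostbased -o BatchMode=yes ${node} hostname"
    else
        ssh -o PreferredAuthentications=hostbased -o BatchMode=yes "$node" hostname
    fi
}

# Run as an ordinary (non-root) user on the management node:
check_hostbased node01
```

If the real run prints the remote hostname without prompting, host-based authentication is working for that node.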
Cluster storage deployment and scheduler deployment
Follow-up content will be published on HPC星球; readers are welcome to join.