Cluster Deployment

Mr.Haoz

Classic Cluster Topology

[Figure: cluster topology]

Naming and Network Planning

Hostname   Management network (/16)  Compute network (/16)  IPMI (/16)     Notes
mgt01      192.168.0.254             10.1.0.254             172.16.0.254   Management node
c01n01     192.168.1.1               10.1.1.1               172.16.1.1     Blade node
c01n02     192.168.1.2               10.1.1.2               172.16.1.2     Blade node
c01n03     192.168.1.3               10.1.1.3               172.16.1.3     Blade node
c01n04     192.168.1.4               10.1.1.4               172.16.1.4     Blade node
node01     192.168.0.1               10.1.0.1               172.16.0.1     Rack-mounted compute node
node02     192.168.0.2               10.1.0.2               172.16.0.2     Rack-mounted compute node
graphic01  192.168.0.101             10.1.0.101             172.16.0.101   Graphics node
graphic02  192.168.0.102             10.1.0.102             172.16.0.102   Graphics node
io01       192.168.0.201             10.1.0.201             172.16.0.201   Storage node
io02       192.168.0.202             10.1.0.202             172.16.0.202   Storage node
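
The plan above maps almost mechanically onto /etc/hosts entries. The following is a minimal sketch, not the author's script: the function name hosts_entries is made up for illustration, the -ib0 suffix for compute-network names is borrowed from the ssh_known_hosts2 sample later in this article, the hpc.local domain is the NIS default used below, and the -ipmi suffix is purely an assumption.

```shell
#!/bin/bash
# Print the three /etc/hosts lines for one node of the plan.
# Usage: hosts_entries NAME THIRD_OCTET FOURTH_OCTET
# Assumptions: "-ib0" names the compute network, "-ipmi" (hypothetical)
# names the IPMI network, and hpc.local is the cluster domain.
hosts_entries() {
    local name="$1" net="$2" host="$3"
    printf '192.168.%s.%s\t%s.hpc.local %s\n' "$net" "$host" "$name" "$name"
    printf '10.1.%s.%s\t%s-ib0\n'             "$net" "$host" "$name"
    printf '172.16.%s.%s\t%s-ipmi\n'          "$net" "$host" "$name"
}
```

For example, `hosts_entries mgt01 0 254 >> /etc/hosts` would append the management node's three addresses; repeating the call per node covers the whole table.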

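Directory Planning

Directory    Description                                   Notes
/data        Shared directory
/data/apps   Application software installation directory
/data/home   User home directories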

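Management Node Installation

This example assumes that the management node also serves as the storage node. Apart from the system disk, the local high-capacity disks are configured as RAID 5 and mounted at /share; application software is installed under /share/apps, and user home directories live under /share/home.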

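Operating System Configuration

  • Install CentOS 7 using the minimal installation profile.
  • Disable the firewall, SELinux, and NetworkManager, then reboot the server.
  • Configure the network IP addresses and the hostname.
  • Configure /etc/hosts with the hostname-to-IP mappings of all nodes, including the compute network and the IPMI network.
  • Configure the installation disc as a yum repository.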
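
The second item of the checklist can be sketched as two small helpers. This is only an illustration under stated assumptions: the function names disable_services and disable_selinux are hypothetical, the commands must run as root, and it is assumed that the legacy network service takes over the interfaces once NetworkManager is off.

```shell
#!/bin/bash
# Stop and disable the services named in the checklist (run as root).
disable_services() {
    local svc
    for svc in firewalld NetworkManager; do
        systemctl stop "$svc"
        systemctl disable "$svc"
    done
}

# Set SELINUX=disabled in the given config file
# (default: /etc/selinux/config). Takes effect after the reboot.
disable_selinux() {
    local cfg="${1:-/etc/selinux/config}"
    # Only the SELINUX= line is rewritten; SELINUXTYPE= is left alone.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

A typical sequence would be `disable_services && disable_selinux && reboot`.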

NIS Service Configuration

Run setupnisserver.sh, shown below:

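#!/bin/bash
# Install the NIS packages
yum -y install rpcbind \
      ypbind \
      yp-tools \
      ypserv
# Set the NIS domain name; defaults to hpc.local
[ "x${MYDOMAINNAME}" = "x" ] && MYDOMAINNAME="hpc.local"

sed -i '/NISDOMAIN/d' /etc/sysconfig/network
echo "NISDOMAIN=${MYDOMAINNAME}" >> /etc/sysconfig/network
# Also set the domain name on the running system, since ypinit reads it
domainname "${MYDOMAINNAME}"
# Start the services
systemctl restart network
systemctl start ypserv
systemctl start yppasswdd
systemctl enable ypserv
systemctl enable yppasswdd
# Initialize the NIS database (ypinit prompts interactively for the server list)
/usr/lib64/yp/ypinit -m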

NTP Service Configuration

Set up a local NTP server for the cluster.

Install ntp:

yum -y install ntp

Edit /etc/ntp.conf as follows:

# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery

# Permit all access over the loopback interface.  This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1 
restrict -6 ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.rhel.pool.ntp.org iburst
#server 1.rhel.pool.ntp.org iburst
#server 2.rhel.pool.ntp.org iburst
#server 3.rhel.pool.ntp.org iburst
server 127.127.1.0
fudge 127.127.1.0 stratum 8

#broadcast 192.168.1.255 autokey	# broadcast server
#broadcastclient			# broadcast client
#broadcast 224.0.1.1 autokey		# multicast server
#multicastclient 224.0.1.1		# multicast client
#manycastserver 239.255.254.254		# manycast server
#manycastclient 239.255.254.254 autokey # manycast client

# Enable public key cryptography.
#crypto

includefile /etc/ntp/crypto/pw

# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography. 
keys /etc/ntp/keys

# Specify the key identifiers which are trusted.
#trustedkey 4 8 42

# Specify the key identifier to use with the ntpdc utility.
#requestkey 8

# Specify the key identifier to use with the ntpq utility.
#controlkey 8

# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats

Start the ntp service:

systemctl start ntpd
systemctl enable ntpd

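Compute Node Installation

Operating System Configuration

  • Same as for the management node.

NIS Client Configuration

Run the following command: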

authconfig --enablenis --nisdomain=hpc.local --nisserver=mgt01 --update
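
After running authconfig, it can be worth confirming that the node actually bound to the management node. The helper below is a sketch, not part of the original procedure: nis_bound_to is a made-up name, and it assumes that yp-tools (which provides ypwhich) is installed on the compute node.

```shell
#!/bin/bash
# Return success if this node is bound to the expected NIS server.
# ypwhich prints the name of the server the node is bound to; the name
# may come back short or fully qualified, so both forms are accepted.
nis_bound_to() {
    local expected="$1" bound
    bound="$(ypwhich 2>/dev/null)" || return 1
    case "$bound" in
        "$expected"|"$expected".*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `nis_bound_to mgt01 && echo "NIS binding OK"`.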

NTP Service Configuration

Compute nodes synchronize their time from the management node.

Install ntp:

yum -y install ntp

Edit /etc/ntp.conf as follows:

# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery

# Permit all access over the loopback interface.  This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1 
restrict -6 ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.rhel.pool.ntp.org iburst
#server 1.rhel.pool.ntp.org iburst
#server 2.rhel.pool.ntp.org iburst
#server 3.rhel.pool.ntp.org iburst
restrict mgt01.hpc.local
server mgt01.hpc.local

#broadcast 192.168.1.255 autokey	# broadcast server
#broadcastclient			# broadcast client
#broadcast 224.0.1.1 autokey		# multicast server
#multicastclient 224.0.1.1		# multicast client
#manycastserver 239.255.254.254		# manycast server
#manycastclient 239.255.254.254 autokey # manycast client

# Enable public key cryptography.
#crypto

includefile /etc/ntp/crypto/pw

# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography. 
keys /etc/ntp/keys

# Specify the key identifiers which are trusted.
#trustedkey 4 8 42

# Specify the key identifier to use with the ntpdc utility.
#requestkey 8

# Specify the key identifier to use with the ntpq utility.
#controlkey 8

# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats

Start the ntp service:

systemctl start ntpd
systemctl enable ntpd
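
Once ntpd is running, `ntpq -p` lists the configured time sources and marks the one currently selected for synchronization with a leading `*`. The filter below is a small sketch for scripting that check; selected_peer is a hypothetical helper name, and selection can take several minutes after the service starts.

```shell
#!/bin/bash
# Read `ntpq -p` output on stdin and print the peer that ntpd has
# selected for synchronization (the line starting with "*").
# Exits non-zero when no peer has been selected yet.
selected_peer() {
    awk '/^\*/ { print substr($1, 2); found = 1 } END { exit !found }'
}
```

On a compute node, `ntpq -p | selected_peer` should eventually print the management node's name.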

Cluster SSH Configuration

Passwordless SSH

The root account needs user-based (public-key) passwordless login.

Log in to the management node and run:

# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:IOvGKTpMCQ/fch74ydqkkEGHE43fMrPzPxb8S83n8tU root@mgt01
The key's randomart image is:
+---[RSA 2048]----+
| .o              |
| .o.             |
| +..o .          |
|+ o= + .         |
|o+.o* . S        |
| ===+. o  o     .|
|= .*Oo  o. o . .E|
|.+ *=. o.. .o .  |
|..o.. o.... oo   |
+----[SHA256]-----+

Change to the /root/.ssh directory, which now contains the following two files:

# ls
id_rsa  id_rsa.pub
## Authorize this host's own public key for login
# cp id_rsa.pub authorized_keys

Log in to the local host to verify that passwordless login works:

# ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:xn5vyXzYWgRD+GWKjP0Wpi3BSEkJgm6MeIc636ROUkI.
ECDSA key fingerprint is MD5:36:4d:de:4f:1c:b1:cd:67:62:80:d8:d1:0d:ba:b9:1f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Mon Aug 27 13:27:41 2018 from 192.168.10.1

Copy the /root/.ssh directory to every other node in the cluster.
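
With more than a handful of nodes, the copy is easier as a loop. This is a minimal sketch: push_root_ssh is a hypothetical helper name, and the node names passed to it are taken from the naming plan at the top of this article.

```shell
#!/bin/bash
# Copy root's key directory to every node named on the command line.
# -r copies the directory recursively; -p preserves modes and
# timestamps so the files keep the permissions sshd expects.
push_root_ssh() {
    local node
    for node in "$@"; do
        scp -rp /root/.ssh "root@${node}:/root/" \
            || echo "copy to ${node} failed" >&2
    done
}
```

For example, `push_root_ssh c01n01 c01n02 c01n03 c01n04 node01 node02`. Each node asks for the root password once, since the key is not yet in place there.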

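Host-Based Passwordless Authentication

Host-based passwordless authentication applies to all regular users. Configure it as follows.
Log in to the management node.
Edit /etc/ssh/sshd_config: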

  HostbasedAuthentication yes

Edit /etc/ssh/ssh_config:

    HostbasedAuthentication yes

    EnableSSHKeysign yes

    StrictHostKeyChecking no

Create the ssh_known_hosts2 file:

  # ssh-keyscan -t rsa mgt01 > /etc/ssh/ssh_known_hosts2

Edit /etc/ssh/ssh_known_hosts2 so that it looks similar to the following:

mgt01,mgt01.hpc.local,mgt01-ib0 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDS8CfEm5K1UX8wHHdM8/xiPP0eKxZshYgG4DZqHIQByedJ5DqGU93QnHQRqIdez95iy/7Zs4822E+U7aXY+OxJyAPbrSeQHtO3sIwU+HLfRsNMWjIF0M4UQKopJWmOqUuj27pJbCF46G8kablSc2vVUq8sagzcCX39e9Gl2NrE4HPsZalYfVo1W3CRoLuEPat7raziFZ0o8rgjirH+NVOvgcWKOoDg4O/sDuLy4n78gm0yoYWXB31B8MBfmYegUP8xC30BWc6+vDaB2l3/XIIA+B7572S1+Q80/4+5gqVv77cEy0OqRak6yWO/Kn0oearaSOK/9HGnhNX1+IvOP1Sr
c01n01,c01n01.hpc.local,c01n01-ib0 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDS8CfEm5K1UX8wHHdM8/xiPP0eKxZshYgG4DZqHIQByedJ5DqGU93QnHQRqIdez95iy/7Zs4822E+U7aXY+OxJyAPbrSeQHtO3sIwU+HLfRsNMWjIF0M4UQKopJWmOqUuj27pJbCF46G8kablSc2vVUq8sagzcCX39e9Gl2NrE4HPsZalYfVo1W3CRoLuEPat7raziFZ0o8rgjirH+NVOvgcWKOoDg4O/sDuLy4n78gm0yoYWXB31B8MBfmYegUP8xC30BWc6+vDaB2l3/XIIA+B7572S1+Q80/4+5gqVv77cEy0OqRak6yWO/Kn0oearaSOK/9HGnhNX1+IvOP1Sr
node01,node01.hpc.local,node01-ib0 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDS8CfEm5K1UX8wHHdM8/xiPP0eKxZshYgG4DZqHIQByedJ5DqGU93QnHQRqIdez95iy/7Zs4822E+U7aXY+OxJyAPbrSeQHtO3sIwU+HLfRsNMWjIF0M4UQKopJWmOqUuj27pJbCF46G8kablSc2vVUq8sagzcCX39e9Gl2NrE4HPsZalYfVo1W3CRoLuEPat7raziFZ0o8rgjirH+NVOvgcWKOoDg4O/sDuLy4n78gm0yoYWXB31B8MBfmYegUP8xC30BWc6+vDaB2l3/XIIA+B7572S1+Q80/4+5gqVv77cEy0OqRak6yWO/Kn0oearaSOK/9HGnhNX1+IvOP1Sr
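
In the sample above every line carries the same key and differs only in its host list, which is consistent with /etc/ssh being copied from the management node to the other nodes. Under that assumption the file can be generated rather than edited by hand. The helper name known_hosts_lines is hypothetical; the host-list pattern (short name, hpc.local name, -ib0 alias) follows the sample.

```shell
#!/bin/bash
# Read one ssh-keyscan line ("host keytype key") on stdin and print a
# known_hosts line for every node name given as an argument. Each line
# lists the short name, the hpc.local name, and the -ib0 alias.
known_hosts_lines() {
    local scanned keytype key node
    read -r scanned keytype key || return 1
    for node in "$@"; do
        printf '%s,%s.hpc.local,%s-ib0 %s %s\n' \
            "$node" "$node" "$node" "$keytype" "$key"
    done
}
```

For example: `ssh-keyscan -t rsa mgt01 | known_hosts_lines mgt01 c01n01 node01 > /etc/ssh/ssh_known_hosts2`.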

Create the shosts.equiv file:

# vi /etc/ssh/shosts.equiv

# The file has the following format:

node01.hpc.local
node01
node01-ib0
c01n01.hpc.local
c01n01
c01n01-ib0
...
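
Because the file lists three names per node in a fixed pattern, it can also be generated. A minimal sketch, with shosts_entries as a hypothetical helper name:

```shell
#!/bin/bash
# Print the shosts.equiv entries (hpc.local name, short name, -ib0
# alias) for every node name given as an argument, one name per line.
shosts_entries() {
    local node
    for node in "$@"; do
        printf '%s.hpc.local\n%s\n%s-ib0\n' "$node" "$node" "$node"
    done
}
```

For example: `shosts_entries node01 c01n01 > /etc/ssh/shosts.equiv`, extended with the remaining node names.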

Copy the SSH configuration files to each node:

ssh node01 rm -rf /etc/ssh
scp -r /etc/ssh node01:/etc/

Restart the ssh service:

systemctl restart sshd
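
The copy and restart have to be repeated on every node. The sketch below deliberately swaps the rm and scp pair for a tar pipeline, which is a different technique from the one shown above: tar run as root keeps the owner, group, and mode of the host key files, whereas scp does not preserve ownership. On CentOS 7 the private host keys normally belong to the ssh_keys group, so this may matter. The helper name push_ssh_config is hypothetical.

```shell
#!/bin/bash
# Overwrite /etc/ssh on one node with the management node's copy and
# restart sshd there. tar -p keeps file modes; extracting as root also
# restores the owner and group of each file.
push_ssh_config() {
    local node="$1"
    tar -C /etc -cpf - ssh \
        | ssh "root@${node}" 'tar -C /etc -xpf - && systemctl restart sshd'
}
```

For example: `for n in c01n01 c01n02 node01 node02; do push_ssh_config "$n"; done`.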

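Cluster Storage and Scheduler Deployment

The remaining content is published on HPC星球 (HPC Planet); you are welcome to join and learn there.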


Contributor: osc_72297572