Kubernetes Cluster Upgrade and High-Availability Deployment

This post records upgrading a Kubernetes cluster and deploying a highly available control plane. First the existing single-master cluster is upgraded, then a three-master HA cluster is built, using kube-vip to float a virtual IP (VIP) across the control-plane nodes.

Part 1: Cluster Upgrade

Environment

Existing cluster:

  • k8s-master: 192.168.100.20
  • k8s-node1: 192.168.100.21
  • k8s-node2: 192.168.100.22
  • Current version: v1.33.5
  • Target version: v1.34.1

The procedure follows the official kubeadm cluster upgrade documentation.

Upgrade the Master Node

1. Update the yum repo

On all nodes, switch the Kubernetes repo to the v1.34 package stream:
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.34/rpm/
enabled=1
gpgcheck=0
EOF
yum makecache fast
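
Optionally, confirm which 1.34 patch releases the new repo actually provides before settling on the target version; a quick check:

# List the kubeadm versions available from the v1.34 repo
yum list --showduplicates kubeadm | tail -n 5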

2. Prepare the images

On the harbor node, pull the images and push them to the private registry:
# Pull the images from the Aliyun mirror
docker pull registry.aliyuncs.com/google_containers/kube-apiserver:v1.34.1
docker pull registry.aliyuncs.com/google_containers/kube-controller-manager:v1.34.1
docker pull registry.aliyuncs.com/google_containers/kube-scheduler:v1.34.1
docker pull registry.aliyuncs.com/google_containers/kube-proxy:v1.34.1
docker pull registry.aliyuncs.com/google_containers/coredns:v1.12.1
docker pull registry.aliyuncs.com/google_containers/pause:3.10.1
docker pull registry.aliyuncs.com/google_containers/etcd:3.6.4-0
# Tag and push to Harbor
docker images |grep google_containers | awk '{print $1":"$2}' | awk -F/ '{system("docker tag "$0" reg.westos.org/k8s/"$3"")}'
docker images |grep reg.westos.org/k8s | awk '{system("docker push "$1":"$2"")}'

3. Update the containerd configuration

On all nodes, update the pause (sandbox) image version:
sed -i 's#sandbox_image = ".*"#sandbox_image = "reg.westos.org/k8s/pause:3.10.1"#g' /etc/containerd/config.toml
systemctl restart containerd
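
To double-check that containerd picked up the new sandbox image, the runtime config can be dumped through crictl; the exact key name varies between containerd versions, so the grep here is only a rough filter:

# The configured sandbox/pause image should show up in the runtime info
crictl info | grep -i sandbox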

4. Upgrade kubeadm

On the master node:
yum install -y kubeadm-1.34.1
# Verify the installed version
kubeadm version

Output:

kubeadm version: &version.Info{Major:"1", Minor:"34", GitVersion:"v1.34.1", ...}

5. Review the upgrade plan
kubeadm upgrade plan

This lists the versions you can upgrade to and the component changes involved.

6. Apply the upgrade
kubeadm upgrade apply v1.34.1

Wait for the upgrade to complete; you should see output similar to:

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.34.1". Enjoy!

7. Upgrade kubelet and kubectl
# Drain the node (evict its Pods)
kubectl drain k8s-master --ignore-daemonsets
# Upgrade the components
yum install -y kubelet-1.34.1 kubectl-1.34.1
# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet
# Uncordon the node
kubectl uncordon k8s-master
# Verify
kubectl get node

Upgrade the Worker Nodes

Run the following on each worker node in turn:

1. Upgrade kubeadm
yum install -y kubeadm-1.34.1

2. Upgrade the node
kubeadm upgrade node

3. Upgrade kubelet
# On the master, drain the node
kubectl drain k8s-node1 --ignore-daemonsets
# On node1, upgrade kubelet
yum install -y kubelet-1.34.1
systemctl daemon-reload
systemctl restart kubelet
# On the master, uncordon the node
kubectl uncordon k8s-node1

Repeat the same steps for node2, as sketched below.
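
A minimal sketch of that per-node loop, assuming passwordless SSH from the master to each worker and the hostnames used above; it simply serializes the drain / upgrade / uncordon sequence from the previous steps:

#!/bin/bash
# Upgrade the remaining workers one at a time (run on the master)
VERSION=1.34.1
for NODE in k8s-node1 k8s-node2; do
  kubectl drain "$NODE" --ignore-daemonsets
  ssh "root@$NODE" "yum install -y kubeadm-$VERSION && kubeadm upgrade node"
  ssh "root@$NODE" "yum install -y kubelet-$VERSION && systemctl daemon-reload && systemctl restart kubelet"
  kubectl uncordon "$NODE"
  kubectl wait --for=condition=Ready "node/$NODE" --timeout=300s
done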

Verify the upgrade result
kubectl get node

Output:

NAME         STATUS   ROLES           AGE   VERSION
k8s-master   Ready    control-plane   10d   v1.34.1
k8s-node1    Ready    <none>          10d   v1.34.1
k8s-node2    Ready    <none>          10d   v1.34.1
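
To confirm that the control-plane static Pods themselves are running the new images (kubeadm labels them with tier=control-plane), something like the following can be used:

# Print each control-plane Pod together with the image it runs
kubectl get pods -n kube-system -l tier=control-plane \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'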

Part 2: High-Availability Cluster Deployment

Environment planning

Host           IP                Role
k8s-master01   192.168.100.20    control-plane node
k8s-master02   192.168.100.21    control-plane node
k8s-master03   192.168.100.22    control-plane node
k8s-worker01   192.168.100.23    worker node
harbor         192.168.100.14    private image registry
VIP            192.168.100.200   virtual IP (floats across the masters)

Software versions:

  • k8s: v1.34.1
  • kube-vip: v1.0.1
  • Calico: v3.31.0

Harbor registry: reg.westos.org

System initialization

All nodes need the following steps (see the earlier post on building a k8s cluster for the full walkthrough).

Disable swap
swapoff -a
sed -i '/swap/s/^/#/' /etc/fstab

Adjust kernel parameters

Note that the two bridge-nf keys only take effect once the br_netfilter module is loaded (handled in the IPVS step below).
cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system

Configure hostnames and name resolution
# Set the hostname on each node
hostnamectl set-hostname k8s-master01 # on master01
hostnamectl set-hostname k8s-master02 # on master02
hostnamectl set-hostname k8s-master03 # on master03
hostnamectl set-hostname k8s-worker01 # on worker01
# Add hosts entries on all nodes
cat >> /etc/hosts <<EOF
192.168.100.20 k8s-master01
192.168.100.21 k8s-master02
192.168.100.22 k8s-master03
192.168.100.23 k8s-worker01
192.168.100.14 reg.westos.org
192.168.100.200 k8s-apiserver
EOF

Configure the yum repo
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.34/rpm/
enabled=1
gpgcheck=0
EOF

Configure IPVS
cat > /etc/modules-load.d/ipvs.conf <<EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
overlay
br_netfilter
EOF
modprobe ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack overlay br_netfilter
dnf install -y ipvsadm ipset
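
A quick way to verify that the modules and the sysctl settings above are actually in effect:

# The kernel modules should all be listed
lsmod | grep -E 'ip_vs|nf_conntrack|br_netfilter'
# Forwarding and bridge filtering should all report 1
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables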

Install containerd

Install on all nodes:
yum install -y containerd.io cri-tools
containerd config default > /etc/containerd/config.toml
# Enable the systemd cgroup driver
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml
systemctl enable --now containerd
# Configure crictl
cat <<EOF > /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF

Configure the Harbor certificate
# Create the certificate directory on all nodes
mkdir -p /etc/containerd/certs.d/reg.westos.org
# Copy the CA certificate from the harbor node (run on each node)
scp root@192.168.100.14:/etc/docker/certs.d/reg.westos.org/ca.crt /etc/containerd/certs.d/reg.westos.org/
# Update the containerd configuration
sed -i "s#config_path = ''#config_path = '/etc/containerd/certs.d'#g" /etc/containerd/config.toml
sed -i "s#registry.k8s.io/pause.*#reg.westos.org/k8s/pause:3.10.1'#g" /etc/containerd/config.toml
systemctl restart containerd
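
A simple way to confirm that containerd now trusts the Harbor CA is to pull an image through the CRI; this assumes the pause image was already pushed to Harbor as in Part 1, and a private project would additionally need --creds user:password:

# Test-pull from the private registry through the CRI
crictl pull reg.westos.org/k8s/pause:3.10.1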

Install the Kubernetes components

Install on all nodes:
yum install -y kubelet kubeadm kubectl
systemctl enable --now kubelet
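
Since the repo may later carry newer patch releases, pinning the exact version keeps all nodes consistent with the planned v1.34.1 cluster:

# Alternatively, pin the exact versions
yum install -y kubelet-1.34.1 kubeadm-1.34.1 kubectl-1.34.1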

Deploy kube-vip (on the first master)

kube-vip provides high availability for the control plane: the VIP floats between the masters, so the apiserver entry point stays reachable even if one control-plane node goes down.

Prepare the kube-vip image

On the harbor node:
docker pull ghcr.io/kube-vip/kube-vip:v1.0.1
# Tag and push
docker tag ghcr.io/kube-vip/kube-vip:v1.0.1 reg.westos.org/kube-vip/kube-vip:v1.0.1
docker push reg.westos.org/kube-vip/kube-vip:v1.0.1

Create the kube-vip static Pod

On master01, create the kube-vip manifest (note: this must be done before initializing the cluster):
# Create the manifests directory first
mkdir -p /etc/kubernetes/manifests
# Write the kube-vip static Pod manifest
cat > /etc/kubernetes/manifests/kube-vip.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - args:
    - manager
    env:
    - name: vip_arp
      value: "true"
    - name: port
      value: "6443"
    - name: vip_nodename
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: vip_interface
      value: ens160   # must match the node's NIC name (later checks use eth0)
    - name: vip_subnet
      value: "32"
    - name: dns_mode
      value: first
    - name: cp_enable
      value: "true"
    - name: cp_namespace
      value: kube-system
    - name: svc_enable
      value: "true"
    - name: svc_leasename
      value: plndr-svcs-lock
    - name: vip_leaderelection
      value: "true"
    - name: vip_leasename
      value: plndr-cp-lock
    - name: vip_leaseduration
      value: "5"
    - name: vip_renewdeadline
      value: "3"
    - name: vip_retryperiod
      value: "1"
    - name: address
      value: 192.168.100.200
    - name: prometheus_server
      value: :2112
    image: reg.westos.org/kube-vip/kube-vip:v1.0.1
    imagePullPolicy: IfNotPresent
    name: kube-vip
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        drop:
        - ALL
    volumeMounts:
    - mountPath: /etc/kubernetes/admin.conf
      name: kubeconfig
  hostAliases:
  - hostnames:
    - kubernetes
    ip: 127.0.0.1
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/super-admin.conf
    name: kubeconfig
status: {}
EOF

Prepare the cluster init configuration

Create the configuration file on master01:
cat > kubeadm-config.yaml <<EOF
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
localAPIEndpoint:
  advertiseAddress: 192.168.100.20   # master01's own IP
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: k8s-master01                 # master01's hostname
  taints: null
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v1.34.1
clusterName: kubernetes
controlPlaneEndpoint: "192.168.100.200:6443"   # the VIP
imageRepository: reg.westos.org/k8s
certificatesDir: /etc/kubernetes/pki
apiServer:
  certSANs:
  - 192.168.100.200   # add the VIP to the apiserver certificate
  - 192.168.100.20
  - 192.168.100.21
  - 192.168.100.22
etcd:
  local:
    dataDir: /var/lib/etcd
networking:
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
  dnsDomain: cluster.local
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
ipvs:
  scheduler: rr
  strictARP: true   # required for HA; avoids ARP conflicts
EOF
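
Before running kubeadm init, the control-plane images referenced by this config can be pre-pulled from Harbor, which catches registry or certificate problems early:

# Pre-pull all images referenced by the config (uses imageRepository above)
kubeadm config images pull --config kubeadm-config.yaml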

Initialize the first master

On master01, run:
kubeadm init --config=kubeadm-config.yaml --upload-certs

On success, the output includes two join commands, one for control-plane nodes and one for workers; save both:
# Join command for control-plane nodes (note --control-plane)
kubeadm join 192.168.100.200:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:xxx... \
--control-plane --certificate-key yyy...
# Join command for worker nodes
kubeadm join 192.168.100.200:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:xxx...

Configure kubectl

On master01:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
# Enable shell completion
yum install -y bash-completion
echo "source <(kubectl completion bash)" >> ~/.bashrc
source ~/.bashrc

Check the node status:
kubectl get node

Output:

NAME           STATUS     ROLES           AGE   VERSION
k8s-master01   NotReady   control-plane   2m    v1.34.1

Install the Calico network plugin

Prepare the images on the harbor node:
docker pull quay.io/calico/cni:v3.31.0
docker pull quay.io/calico/node:v3.31.0
docker pull quay.io/calico/kube-controllers:v3.31.0
# Tag and push
docker images |grep calico | awk '{print $1":"$2}' | awk -F/ '{system("docker tag "$0" reg.westos.org/calico/"$3"")}'
docker images |grep reg.westos.org/calico | awk '{system("docker push "$1":"$2"")}'

Deploy it on master01:
wget https://raw.githubusercontent.com/projectcalico/calico/v3.31.0/manifests/calico.yaml
sed -i 's#quay.io/#reg.westos.org/#g' calico.yaml
kubectl apply -f calico.yaml

Wait for Calico to start:
kubectl get pod -A |grep calico

The node then becomes Ready:
kubectl get node

Output:

NAME           STATUS   ROLES           AGE   VERSION
k8s-master01   Ready    control-plane   5m    v1.34.1

Verify kube-vip

Check that the VIP is active:
ip a s eth0 |grep 192.168.100.200

You should see:

inet 192.168.100.200/32 scope global eth0

Check the kube-vip Pod:
crictl ps |grep kube-vip
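
The kube-vip logs show the leader election and ARP announcements; a quick way to read them on the node, looking the container up by name:

# Tail the kube-vip container logs
crictl logs --tail 20 $(crictl ps --name kube-vip -q)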

Join the other master nodes

On master02 and master03, first create the kube-vip manifest (generated with the kube-vip command):
# Download the kube-vip binary (or generate the manifest from the container image; see the sketch below)
mkdir -p /etc/kubernetes/manifests
# Generate the manifest
kube-vip manifest pod --interface eth0 --address 192.168.100.200 --controlplane --services --arp --leaderElection --image reg.westos.org/kube-vip/kube-vip:v1.0.1 > /etc/kubernetes/manifests/kube-vip.yaml
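
If the kube-vip binary isn't installed on the node, the same manifest can be generated by running the image itself, the approach described in the kube-vip docs. A sketch using ctr (already present on containerd nodes) and the image mirrored to Harbor; adjust the interface name to the node's NIC:

# Generate the manifest from the kube-vip image instead of a local binary
export VIP=192.168.100.200
export INTERFACE=eth0   # adjust to the node's NIC name
export KVIMG=reg.westos.org/kube-vip/kube-vip:v1.0.1
ctr image pull --hosts-dir /etc/containerd/certs.d $KVIMG
ctr run --rm --net-host $KVIMG kube-vip /kube-vip manifest pod \
  --interface $INTERFACE --address $VIP \
  --controlplane --services --arp --leaderElection \
  --image $KVIMG > /etc/kubernetes/manifests/kube-vip.yaml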

Then run the join command:
# Use the control-plane join command recorded earlier
kubeadm join 192.168.100.200:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:xxx... \
--control-plane --certificate-key yyy...

Check from master01:
kubectl get node

Output:

NAME           STATUS   ROLES           AGE   VERSION
k8s-master01   Ready    control-plane   20m   v1.34.1
k8s-master02   Ready    control-plane   5m    v1.34.1
k8s-master03   Ready    control-plane   3m    v1.34.1

Join the worker node

On worker01, run:
# Use the worker join command recorded earlier
kubeadm join 192.168.100.200:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:xxx...

Add the node role label:
kubectl label nodes k8s-worker01 node-role.kubernetes.io/worker=

Check the cluster state:
kubectl get node

Output:

NAME           STATUS   ROLES           AGE   VERSION
k8s-master01   Ready    control-plane   30m   v1.34.1
k8s-master02   Ready    control-plane   15m   v1.34.1
k8s-master03   Ready    control-plane   13m   v1.34.1
k8s-worker01   Ready    worker          2m    v1.34.1

Verify the etcd cluster

Check the etcd cluster status:
# Get the name of an etcd Pod
ETCD_POD=$(kubectl get pod -n kube-system -l component=etcd -o jsonpath='{.items[0].metadata.name}')
# List the etcd members
kubectl -n kube-system exec ${ETCD_POD} -- sh -c \
"ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
member list -w table"

You should see three etcd members.

Check the cluster health:
kubectl -n kube-system exec ${ETCD_POD} -- sh -c \
"ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
endpoint status --cluster -w table"

Test high availability

Test VIP failover:
# Shut down the master that currently holds the VIP
# On another master, check whether the VIP has moved over
ip a s eth0 |grep 192.168.100.200

Test apiserver access:
# Access the apiserver through the VIP
kubectl --server=https://192.168.100.200:6443 get node
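
For a rougher end-to-end check, a small loop can keep probing the apiserver through the VIP while one master is powered off; if kube-vip fails over correctly, only a few probes around the failover should fail (run from any machine with a working kubeconfig):

# Probe the apiserver through the VIP every 2 seconds during the failover test
while true; do
  if kubectl --server=https://192.168.100.200:6443 get --raw /readyz >/dev/null 2>&1; then
    echo "$(date +%T) apiserver OK"
  else
    echo "$(date +%T) apiserver UNREACHABLE"
  fi
  sleep 2
done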

Common issues

A few issues encountered along the way:

  1. kube-vip does not start: check that /etc/kubernetes/super-admin.conf exists; this file is only generated when the control plane is initialized

  2. VIP conflicts: make sure strictARP: true is set so that multiple nodes do not respond to ARP for the VIP at the same time

  3. Expired token: regenerate the join command:
    kubeadm token create --print-join-command
    # Get a fresh certificate-key
    kubeadm init phase upload-certs --upload-certs
  4. Unhealthy etcd: check the firewall; etcd needs ports 2379 and 2380 open between the masters (see the firewalld sketch below)
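
If firewalld is enabled on the masters, the standard control-plane ports can be opened like this (a sketch; the port list follows the usual kubeadm requirements):

# Open the control-plane ports on each master (firewalld)
firewall-cmd --permanent --add-port=6443/tcp        # kube-apiserver (VIP traffic)
firewall-cmd --permanent --add-port=2379-2380/tcp   # etcd client and peer
firewall-cmd --permanent --add-port=10250/tcp       # kubelet API
firewall-cmd --permanent --add-port=10257/tcp       # kube-controller-manager
firewall-cmd --permanent --add-port=10259/tcp       # kube-scheduler
firewall-cmd --reload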
