Notes on upgrading a Kubernetes cluster and deploying it for high availability. Part 1 upgrades an existing single-master cluster in place; Part 2 builds a three-master HA cluster, using kube-vip to float a VIP across the control plane.
Part 1: Cluster Upgrade
Environment
Existing cluster:
- k8s-master: 192.168.100.20
- k8s-node1: 192.168.100.21
- k8s-node2: 192.168.100.22
- Current version: v1.33.5
- Target version: v1.34.1
Reference: the official kubeadm upgrade documentation.
Upgrading the Master Node
1. Update the yum repo
Point the repo at the new minor version on all nodes:
```bash
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.34/rpm/
enabled=1
gpgcheck=0
EOF
```
```bash
yum makecache fast
```
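Before going further, it's worth confirming the new repo actually serves the target version; a quick check with plain yum:

```bash
# List the kubeadm builds the repo offers; v1.34.1 should appear
yum list --showduplicates kubeadm | grep 1.34
```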
2. Prepare images
On the harbor node, pull the images and push them to the private registry:
```bash
# Pull from the Aliyun mirror
docker pull registry.aliyuncs.com/google_containers/kube-apiserver:v1.34.1
docker pull registry.aliyuncs.com/google_containers/kube-controller-manager:v1.34.1
docker pull registry.aliyuncs.com/google_containers/kube-scheduler:v1.34.1
docker pull registry.aliyuncs.com/google_containers/kube-proxy:v1.34.1
docker pull registry.aliyuncs.com/google_containers/coredns:v1.12.1
docker pull registry.aliyuncs.com/google_containers/pause:3.10.1
docker pull registry.aliyuncs.com/google_containers/etcd:3.6.4-0
```
```bash
# Tag and push to Harbor
docker images | grep google_containers | awk '{print $1":"$2}' | awk -F/ '{system("docker tag "$0" reg.westos.org/k8s/"$3)}'
docker images | grep reg.westos.org/k8s | awk '{system("docker push "$1":"$2)}'
```
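The awk pipeline is terse; for reference, an equivalent loop over the same image list does the same tag-and-push, spelled out:

```bash
# Same effect as the awk pipeline above, one image at a time
for img in kube-apiserver:v1.34.1 kube-controller-manager:v1.34.1 \
           kube-scheduler:v1.34.1 kube-proxy:v1.34.1 \
           coredns:v1.12.1 pause:3.10.1 etcd:3.6.4-0; do
  docker tag registry.aliyuncs.com/google_containers/$img reg.westos.org/k8s/$img
  docker push reg.westos.org/k8s/$img
done
```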
3. Update the containerd config
Update the pause image version on all nodes:
```bash
sed -i 's#sandbox_image = ".*"#sandbox_image = "reg.westos.org/k8s/pause:3.10.1"#g' /etc/containerd/config.toml
systemctl restart containerd
```
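A quick sanity check that the sed took effect:

```bash
grep sandbox_image /etc/containerd/config.toml
# expected: sandbox_image = "reg.westos.org/k8s/pause:3.10.1"
```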
4. Upgrade kubeadm
On the master node:
```bash
yum install -y kubeadm-1.34.1

# Verify the version
kubeadm version
```

Output:

```
kubeadm version: &version.Info{Major:"1", Minor:"34", GitVersion:"v1.34.1", ...}
```

5. Check the upgrade plan

```bash
kubeadm upgrade plan
```

This lists the versions available to upgrade to and the component changes involved.
6. Run the upgrade

```bash
kubeadm upgrade apply v1.34.1
```

Wait for the upgrade to finish; you should see output like:

```
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.34.1". Enjoy!
```

7. Upgrade kubelet and kubectl
```bash
# Drain the node (evict its Pods)
kubectl drain k8s-master --ignore-daemonsets

# Upgrade the packages
yum install -y kubelet-1.34.1 kubectl-1.34.1

# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet

# Uncordon the node
kubectl uncordon k8s-master

# Verify
kubectl get node
```

Upgrading the Worker Nodes
On each worker node in turn:
1. Upgrade kubeadm

```bash
yum install -y kubeadm-1.34.1
```

2. Upgrade the node

```bash
kubeadm upgrade node
```

3. Upgrade kubelet

```bash
# On the master, drain the node
kubectl drain k8s-node1 --ignore-daemonsets

# On node1, upgrade
yum install -y kubelet-1.34.1
systemctl daemon-reload
systemctl restart kubelet

# On the master, uncordon
kubectl uncordon k8s-node1
```

Repeat the same steps for node2; with more workers this is worth scripting, as sketched below.
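A rough sketch of that loop, assuming passwordless SSH as root to each worker (a hypothetical convenience script, not part of the original procedure):

```bash
# Drain, upgrade, and uncordon each worker in sequence; run from the master
for node in k8s-node1 k8s-node2; do
  kubectl drain $node --ignore-daemonsets
  ssh root@$node "yum install -y kubeadm-1.34.1 && \
                  kubeadm upgrade node && \
                  yum install -y kubelet-1.34.1 && \
                  systemctl daemon-reload && systemctl restart kubelet"
  kubectl uncordon $node
done
```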
Verifying the Upgrade

```bash
kubectl get node
```

Output:

```
NAME         STATUS   ROLES           AGE   VERSION
k8s-master   Ready    control-plane   10d   v1.34.1
k8s-node1    Ready    <none>          10d   v1.34.1
k8s-node2    Ready    <none>          10d   v1.34.1
```

Part 2: HA Cluster Deployment
Environment Plan

| Host | IP | Role |
|---|---|---|
| k8s-master01 | 192.168.100.20 | control-plane node |
| k8s-master02 | 192.168.100.21 | control-plane node |
| k8s-master03 | 192.168.100.22 | control-plane node |
| k8s-worker01 | 192.168.100.23 | worker node |
| harbor | 192.168.100.14 | private image registry |
| VIP | 192.168.100.200 | virtual IP (floats between masters) |
Software versions:
- k8s: v1.34.1
- kube-vip: v1.0.1
- Calico: v3.31.0
Harbor registry: reg.westos.org
System Initialization
Run the following on all nodes (see the earlier cluster-setup post for the full walkthrough).
Disable swap
```bash
swapoff -a
sed -i '/swap/s/^/#/' /etc/fstab
```

Tune kernel parameters

```bash
cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sysctl --system
```
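To confirm the values are live (note the two bridge keys only resolve once br_netfilter is loaded, which the IPVS step below takes care of):

```bash
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables
```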
Configure hostnames and resolution

```bash
# Set the hostname on each node
hostnamectl set-hostname k8s-master01   # on master01
hostnamectl set-hostname k8s-master02   # on master02
hostnamectl set-hostname k8s-master03   # on master03
hostnamectl set-hostname k8s-worker01   # on worker01
```

```bash
# Add hosts entries on all nodes
cat >> /etc/hosts <<EOF
192.168.100.20 k8s-master01
192.168.100.21 k8s-master02
192.168.100.22 k8s-master03
192.168.100.23 k8s-worker01
192.168.100.14 reg.westos.org
192.168.100.200 k8s-apiserver
EOF
```

Configure the yum repo
```bash
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.34/rpm/
enabled=1
gpgcheck=0
EOF
```

Configure IPVS
```bash
cat > /etc/modules-load.d/ipvs.conf <<EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
overlay
br_netfilter
EOF

# -a loads all listed modules (without it, extra args are treated as module parameters)
modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack overlay br_netfilter
dnf install -y ipvsadm ipset
```
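And verify the modules actually loaded:

```bash
lsmod | grep -E 'ip_vs|nf_conntrack|br_netfilter'
```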
Install containerd
Install on all nodes:
```bash
yum install -y containerd.io cri-tools
containerd config default > /etc/containerd/config.toml

# Switch to the systemd cgroup driver
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml

systemctl enable --now containerd

# Configure crictl
cat <<EOF > /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
```

Configure the Harbor certificate
```bash
# Create the cert directory on all nodes
mkdir -p /etc/containerd/certs.d/reg.westos.org

# Copy the CA cert from harbor (run on each node)
scp root@192.168.100.14:/etc/docker/certs.d/reg.westos.org/ca.crt /etc/containerd/certs.d/reg.westos.org/

# Point containerd at the cert directory and the private pause image
sed -i "s#config_path = ''#config_path = '/etc/containerd/certs.d'#g" /etc/containerd/config.toml
sed -i "s#registry.k8s.io/pause.*#reg.westos.org/k8s/pause:3.10.1'#g" /etc/containerd/config.toml

systemctl restart containerd
```

Install the k8s components
Install on all nodes:

```bash
yum install -y kubelet kubeadm kubectl
systemctl enable --now kubelet
```

Deploy kube-vip (first master)
kube-vip provides high availability for the control plane: by floating a VIP between the masters, it keeps the apiserver entry point reachable even if the node currently holding the VIP goes down.
Prepare the kube-vip image
On the harbor node:
```bash
docker pull ghcr.io/kube-vip/kube-vip:v1.0.1

# Tag and push
docker tag ghcr.io/kube-vip/kube-vip:v1.0.1 reg.westos.org/kube-vip/kube-vip:v1.0.1
docker push reg.westos.org/kube-vip/kube-vip:v1.0.1
```

Create the kube-vip static Pod
On master01, create the kube-vip manifest (note: it must be in place before the cluster is initialized):

```bash
# Create the manifests directory first
mkdir -p /etc/kubernetes/manifests

# Write the kube-vip static Pod manifest
cat > /etc/kubernetes/manifests/kube-vip.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - args:
    - manager
    env:
    - name: vip_arp
      value: "true"
    - name: port
      value: "6443"
    - name: vip_nodename
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: vip_interface
      value: ens160
    - name: vip_subnet
      value: "32"
    - name: dns_mode
      value: first
    - name: cp_enable
      value: "true"
    - name: cp_namespace
      value: kube-system
    - name: svc_enable
      value: "true"
    - name: svc_leasename
      value: plndr-svcs-lock
    - name: vip_leaderelection
      value: "true"
    - name: vip_leasename
      value: plndr-cp-lock
    - name: vip_leaseduration
      value: "5"
    - name: vip_renewdeadline
      value: "3"
    - name: vip_retryperiod
      value: "1"
    - name: address
      value: 192.168.100.200
    - name: prometheus_server
      value: :2112
    image: reg.westos.org/kube-vip/kube-vip:v1.0.1
    imagePullPolicy: IfNotPresent
    name: kube-vip
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        drop:
        - ALL
    volumeMounts:
    - mountPath: /etc/kubernetes/admin.conf
      name: kubeconfig
  hostAliases:
  - hostnames:
    - kubernetes
    ip: 127.0.0.1
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/super-admin.conf
    name: kubeconfig
status: {}
EOF
```
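Worth noting: the vip_leasename / vip_leaseduration settings above drive standard Kubernetes leader election, so once the cluster is up you can see which node currently holds the VIP by reading the Lease object named in the manifest:

```bash
# Shows the current kube-vip leader (the node holding the VIP)
kubectl -n kube-system get lease plndr-cp-lock -o jsonpath='{.spec.holderIdentity}'
```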
Prepare the cluster init configuration
Create the config file on master01:

```bash
cat > kubeadm-config.yaml <<EOF
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
localAPIEndpoint:
  advertiseAddress: 192.168.100.20   # master01's own IP
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: k8s-master01   # master01's hostname
  taints: null
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v1.34.1
clusterName: kubernetes
controlPlaneEndpoint: "192.168.100.200:6443"   # the VIP
imageRepository: reg.westos.org/k8s
certificatesDir: /etc/kubernetes/pki
apiServer:
  certSANs:
  - 192.168.100.200   # include the VIP in the cert
  - 192.168.100.20
  - 192.168.100.21
  - 192.168.100.22
etcd:
  local:
    dataDir: /var/lib/etcd
networking:
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
  dnsDomain: cluster.local
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
ipvs:
  scheduler: rr
  strictARP: true   # required for HA; avoids ARP conflicts
EOF
```

Initialize the First Master
On master01:

```bash
kubeadm init --config=kubeadm-config.yaml --upload-certs
```

On success it prints two join commands, one for masters and one for workers; record both:

```bash
# Master join command (with --control-plane)
kubeadm join 192.168.100.200:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:xxx... \
    --control-plane --certificate-key yyy...

# Worker join command
kubeadm join 192.168.100.200:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:xxx...
```

Configure kubectl
On master01:

```bash
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# Command completion
yum install -y bash-completion
echo "source <(kubectl completion bash)" >> ~/.bashrc
source ~/.bashrc
```

Check the node status:

```bash
kubectl get node
```

Output (NotReady is expected until the network plugin is installed):

```
NAME           STATUS     ROLES           AGE   VERSION
k8s-master01   NotReady   control-plane   2m    v1.34.1
```

Install the Calico Network Plugin
Prepare the images on the harbor node:

```bash
docker pull quay.io/calico/cni:v3.31.0
docker pull quay.io/calico/node:v3.31.0
docker pull quay.io/calico/kube-controllers:v3.31.0

# Tag and push
docker images | grep calico | awk '{print $1":"$2}' | awk -F/ '{system("docker tag "$0" reg.westos.org/calico/"$3)}'
docker images | grep reg.westos.org/calico | awk '{system("docker push "$1":"$2)}'
```

Deploy on master01:
```bash
wget https://raw.githubusercontent.com/projectcalico/calico/v3.31.0/manifests/calico.yaml
sed -i 's#quay.io/#reg.westos.org/#g' calico.yaml
kubectl apply -f calico.yaml
```

Wait for Calico to start:

```bash
kubectl get pod -A | grep calico
```

The node then becomes Ready:

```bash
kubectl get node
```

Output:

```
NAME           STATUS   ROLES           AGE   VERSION
k8s-master01   Ready    control-plane   5m    v1.34.1
```

Verify kube-vip
Check that the VIP is up (use the NIC that matches vip_interface in the manifest):

```bash
ip a s eth0 | grep 192.168.100.200
```

You should see:

```
inet 192.168.100.200/32 scope global eth0
```

Check the kube-vip Pod:

```bash
crictl ps | grep kube-vip
```
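From another machine on the same L2 segment you can also confirm which MAC is answering ARP for the VIP (assumes iputils arping is installed; adjust the interface name):

```bash
arping -c 3 -I eth0 192.168.100.200
```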
Join the Remaining Master Nodes
On master02 and master03, first create the kube-vip manifest (generated with the kube-vip CLI this time):

```bash
# Download the kube-vip binary (or generate via docker run)
mkdir -p /etc/kubernetes/manifests

# Generate the manifest
kube-vip manifest pod \
    --interface eth0 \
    --address 192.168.100.200 \
    --controlplane \
    --services \
    --arp \
    --leaderElection \
    --image reg.westos.org/kube-vip/kube-vip:v1.0.1 \
    > /etc/kubernetes/manifests/kube-vip.yaml
```
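The docker run variant mentioned in the comment would look roughly like this, with the same flags (the masters here only run containerd, so this would be run on a docker host such as harbor and the output copied over):

```bash
# Host networking lets kube-vip see the real interfaces when generating the manifest
docker run --network host --rm reg.westos.org/kube-vip/kube-vip:v1.0.1 \
    manifest pod \
    --interface eth0 \
    --address 192.168.100.200 \
    --controlplane --services --arp --leaderElection \
    --image reg.westos.org/kube-vip/kube-vip:v1.0.1 \
    > kube-vip.yaml
```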
Then run the join command:

```bash
# Use the master join command recorded earlier
kubeadm join 192.168.100.200:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:xxx... \
    --control-plane --certificate-key yyy...
```

Check from master01:

```bash
kubectl get node
```

Output:

```
NAME           STATUS   ROLES           AGE   VERSION
k8s-master01   Ready    control-plane   20m   v1.34.1
k8s-master02   Ready    control-plane   5m    v1.34.1
k8s-master03   Ready    control-plane   3m    v1.34.1
```

Join the Worker Node
On worker01:

```bash
# Use the worker join command recorded earlier
kubeadm join 192.168.100.200:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:xxx...
```

Label the node:

```bash
kubectl label nodes k8s-worker01 node-role.kubernetes.io/worker=
```

Check the cluster:

```bash
kubectl get node
```

Output:

```
NAME           STATUS   ROLES           AGE   VERSION
k8s-master01   Ready    control-plane   30m   v1.34.1
k8s-master02   Ready    control-plane   15m   v1.34.1
k8s-master03   Ready    control-plane   13m   v1.34.1
k8s-worker01   Ready    worker          2m    v1.34.1
```

Verify the etcd Cluster
Check the etcd cluster membership:

```bash
# Get an etcd Pod name
ETCD_POD=$(kubectl get pod -n kube-system -l component=etcd -o jsonpath='{.items[0].metadata.name}')

# List the etcd members
kubectl -n kube-system exec ${ETCD_POD} -- sh -c \
    "ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    member list -w table"
```

You should see 3 etcd members.

Check the cluster status:

```bash
kubectl -n kube-system exec ${ETCD_POD} -- sh -c \
    "ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    endpoint status --cluster -w table"
```
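A quick per-endpoint health probe works the same way, with the same cert flags:

```bash
kubectl -n kube-system exec ${ETCD_POD} -- sh -c \
    "ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    endpoint health --cluster"
```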
Test High Availability
Test VIP failover:

```bash
# Shut down the master currently holding the VIP,
# then on another master watch for the VIP to move over
ip a s eth0 | grep 192.168.100.200
```
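To put a number on the failover gap, a simple probe loop (a hypothetical helper, run from any machine with the kubeconfig) prints up/down once per second while you power off the leader:

```bash
# Probe apiserver availability through the VIP; Ctrl-C to stop
while true; do
  if kubectl --server=https://192.168.100.200:6443 get node >/dev/null 2>&1; then
    echo "$(date +%T) up"
  else
    echo "$(date +%T) down"
  fi
  sleep 1
done
```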
Test apiserver access:

```bash
# Access the apiserver through the VIP
kubectl --server=https://192.168.100.200:6443 get node
```

Common Issues
A few problems encountered along the way:

- kube-vip won't start: check that /etc/kubernetes/super-admin.conf exists; kubeadm only generates it when the cluster is initialized
- VIP conflicts: make sure strictARP: true is set, so multiple nodes don't answer ARP for the VIP at once
- Expired token: regenerate the join command:

  ```bash
  kubeadm token create --print-join-command

  # Get a new certificate-key
  kubeadm init phase upload-certs --upload-certs
  ```

- etcd unhealthy: check the firewall; etcd needs ports 2379 and 2380 open between the masters (see the sketch below)
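If firewalld is what's running, opening those ports on every master looks like this:

```bash
# Allow etcd client (2379) and peer (2380) traffic
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --reload
```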