Symptom
The k8s cluster contains pods in the Evicted state that are never cleaned up.
The cause of the problem is:
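Node-pressure eviction is the process by which the kubelet proactively terminates Pods to reclaim resources on a node.
The kubelet monitors resources such as CPU, memory, disk space, and filesystem inodes on the cluster's nodes. When one or more of these resources reach specific consumption levels, the kubelet can proactively fail one or more Pods on the node to reclaim resources and prevent starvation.
During a node-pressure eviction, the kubelet sets the PodPhase of the selected Pods to Failed. This terminates the Pods.
Node-pressure eviction is not the same as API-initiated eviction. The kubelet does not respect your configured PodDisruptionBudget or the Pod's terminationGracePeriodSeconds.
In practice, the Evicted status is not something we need to worry about: the leftover Pod objects are only records of the eviction, and they can simply be deleted.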
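Before deleting anything, it can help to see which pods are affected. The following is a minimal sketch, assuming `kubectl` is already configured for the cluster. The helper name `filter_evicted` is hypothetical; it relies on the default `kubectl get pods --all-namespaces --no-headers` output, where column 1 is the namespace, column 2 the pod name, and column 4 the status.

```shell
#!/bin/sh
# filter_evicted (hypothetical helper): reads rows in the format printed by
# `kubectl get pods --all-namespaces --no-headers` on stdin and prints
# "namespace name" for every row whose STATUS column is Evicted.
filter_evicted() {
  awk '$4 == "Evicted" { print $1, $2 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods --all-namespaces --no-headers | filter_evicted
#
# One-off manual cleanup, one delete per evicted pod:
#   kubectl get pods --all-namespaces --no-headers | filter_evicted |
#     while read -r ns pod; do kubectl -n "$ns" delete pod "$pod"; done
```

Keeping the filter separate from the delete step makes it easy to review the list first and only then pipe it into the deletion loop.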
How do we solve this? A CronJob that periodically deletes the Evicted pods is enough.
[root@k-m1 deletePodAuto]# cat ./*
apiVersion: v1
kind: Namespace
metadata:
  name: delete-evicted-pods
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: delete-evicted-pods
  namespace: delete-evicted-pods
---
# Cluster-wide permission to list and delete pods in every namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: delete-evicted-pods
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: delete-evicted-pods
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: delete-evicted-pods
subjects:
- kind: ServiceAccount
  name: delete-evicted-pods
  namespace: delete-evicted-pods
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: delete-evicted-pods
  namespace: delete-evicted-pods
spec:
  # Run every 30 minutes.
  schedule: "*/30 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: delete-evicted-pods
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:1.21.8
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |
              # Print every Failed pod as "name namespace creationTimestamp reason".
              kubectl get pods --all-namespaces -o go-template='{{range .items}}{{if eq .status.phase "Failed"}}{{.metadata.name}} {{.metadata.namespace}} {{.metadata.creationTimestamp}} {{.status.reason}}{{"\n"}}{{end}}{{end}}' |
              while read -r epod namespace ct reason; do
                # Delete only Evicted pods created more than 3 days (259200 s) ago.
                if [ "$reason" = "Evicted" ] && [ $(( $(date +%s) - $(date -d "$ct" +%s) )) -gt 259200 ]; then
                  echo "$(date '+%Y-%m-%d %H:%M:%S') delete $namespace $reason $epod"
                  kubectl -n "$namespace" delete pod "$epod"
                fi
              done
          restartPolicy: OnFailure
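The CronJob command deletes a pod only when two conditions hold: its reason is Evicted, and it was created more than 259200 seconds (3 days) ago. That decision can be tried out without a cluster. The sketch below is illustrative only; the helper name `should_delete` and its argument order are hypothetical, and it assumes GNU `date`, which accepts an RFC 3339 timestamp with `-d`.

```shell
#!/bin/sh
# should_delete (hypothetical helper) mirrors the CronJob's condition.
# Args: reason, creation timestamp (RFC 3339), current time in epoch seconds,
#       optional age threshold in seconds (default 259200 = 3 days).
# Assumes GNU date, which parses RFC 3339 timestamps with -d.
should_delete() {
  reason=$1; created=$2; now=$3; threshold=${4:-259200}
  # Pods that failed for any other reason are left alone.
  [ "$reason" = "Evicted" ] || return 1
  created_epoch=$(date -u -d "$created" +%s) || return 1
  # Succeeds only if the pod is older than the threshold.
  [ $(( now - created_epoch )) -gt "$threshold" ]
}

# Example: a pod created four days before "now" qualifies for deletion.
now=$(date -u -d "2024-01-05T00:00:00Z" +%s)
if should_delete Evicted "2024-01-01T00:00:00Z" "$now"; then
  echo "delete"
else
  echo "keep"
fi
```

Passing the current time in as an argument keeps the check deterministic; to retain evicted pods for a shorter or longer period, only the threshold needs to change.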
References:
https://www.jianshu.com/p/19dcf715bb28
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/
https://blog.51cto.com/u_14035463/5627073