Enabling Audit Logs on Rancher Downstream Clusters

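Introduction

Kubernetes auditing provides a security-relevant, chronological set of records of activity generated by users, by applications that use the Kubernetes API, and by the control plane itself. Audit records originate inside kube-apiserver. Each request generates audit events at different stages of its execution; these events are pre-processed according to a policy and written to a backend. The policy determines what is recorded, and the backend persists the records. The backends currently supported are log files and webhooks.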

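Audit Log Capabilities

Auditing lets cluster administrators answer the following questions:

  • What happened?
  • When did it happen?
  • Who initiated it?
  • On which object(s) did it happen?
  • Where was it observed?
  • From where was it initiated?
  • To where was it going?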
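
To make the questions above concrete, here is a minimal sketch that maps one JSON audit log line onto them. The field names follow the audit.k8s.io/v1 Event schema; every value in SAMPLE_EVENT_LINE (user, IP address, object, timestamps) is invented for illustration, and summarize is a hypothetical helper, not part of Kubernetes or Rancher.

```python
import json

# Hypothetical audit event: field names follow the audit.k8s.io/v1 Event
# schema, but all values below are made up for illustration.
SAMPLE_EVENT_LINE = json.dumps({
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "level": "Metadata",
    "auditID": "00000000-0000-0000-0000-000000000000",
    "stage": "ResponseComplete",
    "requestURI": "/api/v1/namespaces/default/pods/demo",
    "verb": "delete",
    "user": {"username": "alice", "groups": ["system:authenticated"]},
    "sourceIPs": ["10.0.0.5"],
    "objectRef": {"resource": "pods", "namespace": "default",
                  "name": "demo", "apiVersion": "v1"},
    "responseStatus": {"code": 200},
    "requestReceivedTimestamp": "2024-01-01T00:00:00.000000Z",
    "stageTimestamp": "2024-01-01T00:00:00.010000Z",
})


def summarize(line: str) -> dict:
    """Map one JSON audit log line onto the audit questions."""
    event = json.loads(line)
    ref = event.get("objectRef", {})
    parts = (ref.get("namespace"), ref.get("resource"), ref.get("name"))
    return {
        "what": f'{event.get("verb")} {event.get("requestURI")}',
        "when": event.get("requestReceivedTimestamp"),
        "who": event.get("user", {}).get("username"),
        "on_what": "/".join(p for p in parts if p),
        "from_where": event.get("sourceIPs", []),
        "outcome": event.get("responseStatus", {}).get("code"),
    }


if __name__ == "__main__":
    for question, answer in summarize(SAMPLE_EVENT_LINE).items():
        print(f"{question}: {answer}")
```

With the JSON log format, each line of the audit log file is one such event, so the same function could be applied line by line to a real log.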

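Audit Stages

Each request can be recorded with an associated stage. The defined stages are:

Stage             Description
RequestReceived   Generated as soon as the audit handler receives the request, before it is delegated down the handler chain.
ResponseStarted   Generated after the response headers are sent but before the response body is sent. Only long-running requests (for example, watch) generate this stage.
ResponseComplete  Generated when the response body has been completed and no more bytes will be sent.
Panic             Generated when a panic occurs.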
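
The table above can be read as a small decision procedure. The sketch below is a simplified model, not kube-apiserver code: expected_stages is a hypothetical helper, it ignores the Panic stage, and it assumes that a request is either long-running or not. The omit_stages argument mirrors the omitStages field used in audit policies.

```python
def expected_stages(long_running: bool, omit_stages=()) -> list:
    """Return the audit stages a successful request is expected to emit.

    Simplified model: Panic is ignored, and ResponseStarted is emitted
    only for long-running requests such as watch.
    """
    stages = ["RequestReceived"]
    if long_running:
        stages.append("ResponseStarted")
    stages.append("ResponseComplete")
    # omitStages in a policy suppresses events for the listed stages.
    return [s for s in stages if s not in omit_stages]


if __name__ == "__main__":
    print(expected_stages(long_running=False))
    print(expected_stages(long_running=True, omit_stages=["RequestReceived"]))
```

Omitting RequestReceived, as the policies in this article do, drops one event per request while still keeping the event that carries the response status.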

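Audit Policy

An audit policy is a list of rules, and each rule assigns one of the following audit levels to the requests it matches:

Level            Description
None             Requests matching this rule are not logged.
Metadata         Logs request metadata (requesting user, timestamp, resource, verb, and so on) but not the request or response body.
Request          Logs request metadata and the request body, but not the response body. Does not apply to non-resource requests.
RequestResponse  Logs request metadata, the request body, and the response body. Does not apply to non-resource requests.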

Audit Backends

Backend  Description
Log      Writes events to the filesystem.
Webhook  Sends events to an external HTTP API.

RKE

Edit the RKE cluster.yml file, either in the UI or locally, and enable audit logging at the following location:

services:
  kube-api:
    audit_log:
      enabled: true

The default policy is then visible in /etc/kubernetes/audit-policy.yaml:

...
rules:
- level: Metadata
...

In newer versions, the audit log parameters default to the following values:

--audit-log-maxage=30
--audit-log-maxbackup=10
--audit-log-path=/var/log/kube-audit/audit-log.json
--audit-log-maxsize=100
--audit-policy-file=/etc/kubernetes/audit-policy.yaml
--audit-log-format=json
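
As a rough capacity check for the defaults above, the following sketch estimates how much disk space the audit log can occupy. It assumes that --audit-log-maxsize is the size in megabytes at which a file is rotated and that --audit-log-maxbackup counts rotated files kept in addition to the active one; parse_flags and max_disk_mb are hypothetical helpers written for this example.

```python
FLAGS = [
    "--audit-log-maxage=30",
    "--audit-log-maxbackup=10",
    "--audit-log-path=/var/log/kube-audit/audit-log.json",
    "--audit-log-maxsize=100",
    "--audit-policy-file=/etc/kubernetes/audit-policy.yaml",
    "--audit-log-format=json",
]


def parse_flags(flags):
    """Turn ['--name=value', ...] into {'name': 'value', ...}."""
    return dict(flag.lstrip("-").split("=", 1) for flag in flags)


def max_disk_mb(flags):
    """Approximate upper bound: one active file plus the retained backups."""
    opts = parse_flags(flags)
    max_size = int(opts["audit-log-maxsize"])
    max_backup = int(opts["audit-log-maxbackup"])
    return max_size * (max_backup + 1)


if __name__ == "__main__":
    print(f"approximate upper bound: {max_disk_mb(FLAGS)} MB")
```

Under these assumptions the defaults allow roughly 1100 MB per control plane node, and rotated files older than 30 days are removed regardless of count.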

The audit policy can be configured as follows:

services:
  kube-api:
    audit_log:
      enabled: true
      configuration:
        max_age: 6
        max_backup: 6
        max_size: 110
        path: /var/log/kube-audit/audit-log.json
        format: json
        policy:
          apiVersion: audit.k8s.io/v1 # This is required.
          kind: Policy
          omitStages:
            - "RequestReceived"
          rules:
            # Log pod changes at RequestResponse level
            - level: RequestResponse
              resources:
              - group: ""
                # Resource "pods" doesn't match requests to any subresource of pods,
                # which is consistent with the RBAC policy.
                resources: ["pods"]
            # Log "pods/log", "pods/status" at Metadata level
            - level: Metadata
              resources:
              - group: ""
                resources: ["pods/log", "pods/status"]

            # Don't log requests to a configmap called "controller-leader"
            - level: None
              resources:
              - group: ""
                resources: ["configmaps"]
                resourceNames: ["controller-leader"]

            # Don't log watch requests by the "system:kube-proxy" on endpoints or services
            - level: None
              users: ["system:kube-proxy"]
              verbs: ["watch"]
              resources:
              - group: "" # core API group
                resources: ["endpoints", "services"]

            # Don't log authenticated requests to certain non-resource URL paths.
            - level: None
              userGroups: ["system:authenticated"]
              nonResourceURLs:
              - "/api*" # Wildcard matching.
              - "/version"

            # Log the request body of configmap changes in kube-system.
            - level: Request
              resources:
              - group: "" # core API group
                resources: ["configmaps"]
              # This rule only applies to resources in the "kube-system" namespace.
              # The empty string "" can be used to select non-namespaced resources.
              namespaces: ["kube-system"]

            # Log configmap and secret changes in all other namespaces at the Metadata level.
            - level: Metadata
              resources:
              - group: "" # core API group
                resources: ["secrets", "configmaps"]

            # Log all other resources in core and extensions at the Request level.
            - level: Request
              resources:
              - group: "" # core API group
              - group: "extensions" # Version of group should NOT be included.

            # A catch-all rule to log all other requests at the Metadata level.
            - level: Metadata
              # Long-running requests like watches that fall under this rule will not
              # generate an audit event in RequestReceived.
              omitStages:
                - "RequestReceived"

RKE2

In the Rancher UI, edit the RKE2 cluster, switch to the YAML editor, and add an audit-policy-file entry under machineGlobalConfig, at the position shown below:

spec:
  rkeConfig:
    machineGlobalConfig:
      audit-policy-file: |
        apiVersion: audit.k8s.io/v1
        kind: Policy
        rules:
          - level: RequestResponse
            verbs: ["create", "update", "patch", "delete"]
            resources:
              - group: "helm.cattle.io"
                resources: ["helmchartconfigs"]

          - level: None
            users: ["system:serviceaccount:kube-system:cluster-autoscaler"]
          - level: None
            verbs: ["watch", "list", "get"]

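Notes:

  • The first rule records create, update, patch, and delete operations on helmchartconfigs at the RequestResponse level. Because requests that match no rule are not logged, these are the only operations this policy records.
  • The second rule ignores every request from the cluster-autoscaler service account, and the third rule ignores all read operations (watch, list, get).
  • Audit events are written to the audit.log file under /var/lib/rancher/rke2/server/logs; check that file to confirm the policy behaves as expected.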
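
The effect of the three rules above can be checked with a small simulation of first-match rule evaluation. This is a simplified sketch, not the kube-apiserver implementation: it only models the users, verbs, and resources fields used in this policy, and rule_matches and audit_level are hypothetical helpers. It assumes that rules are evaluated in order, that the first matching rule sets the level, and that a request matching no rule is not logged.

```python
# The RKE2 policy from this section, expressed as Python data.
RULES = [
    {"level": "RequestResponse",
     "verbs": ["create", "update", "patch", "delete"],
     "resources": [{"group": "helm.cattle.io",
                    "resources": ["helmchartconfigs"]}]},
    {"level": "None",
     "users": ["system:serviceaccount:kube-system:cluster-autoscaler"]},
    {"level": "None", "verbs": ["watch", "list", "get"]},
]


def rule_matches(rule, user, verb, group, resource):
    """Check one rule; a field that is absent from the rule matches anything."""
    if "users" in rule and user not in rule["users"]:
        return False
    if "verbs" in rule and verb not in rule["verbs"]:
        return False
    if "resources" in rule:
        return any(
            gr["group"] == group
            and (not gr.get("resources") or resource in gr["resources"])
            for gr in rule["resources"]
        )
    return True


def audit_level(user, verb, group, resource, rules=RULES):
    """Return the level of the first matching rule, or "None" if none match."""
    for rule in rules:
        if rule_matches(rule, user, verb, group, resource):
            return rule["level"]
    return "None"


if __name__ == "__main__":
    print(audit_level("alice", "update", "helm.cattle.io", "helmchartconfigs"))
    print(audit_level("alice", "get", "helm.cattle.io", "helmchartconfigs"))
    print(audit_level("alice", "create", "apps", "deployments"))
```

In this model a write to a Deployment falls through all three rules and is not logged, which is why a policy that should record other writes needs an explicit catch-all rule at the end.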

Common Rules

apiVersion: audit.k8s.io/v1
kind: Policy
omitStages: ["RequestReceived"] # Skip the RequestReceived stage to reduce log volume
rules:
  # ============== Core security operations ==============
  # 1. Changes to critical resources (full record)
  # Caution: RequestResponse on secrets writes secret payloads into the audit log.
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "" # core API group
        resources: ["secrets", "configmaps"]
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
      - group: "helm.cattle.io" # user-specified resource
        resources: ["helmchartconfigs"]

  # 2. Permission changes (Metadata level)
  # Note: rule 1 already matches clusterrolebindings and the first matching
  # rule wins, so this rule never takes effect as written.
  - level: Metadata
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterrolebindings"]

  # 3. Node and namespace deletion
  - level: Request
    verbs: ["delete"]
    resources:
      - group: ""
        resources: ["nodes", "namespaces"]

  # ============== Workload operations ==============
  # 4. Workload changes (Metadata level)
  - level: Metadata
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "apps"
        resources: ["deployments", "statefulsets", "daemonsets"]
      - group: "batch"
        resources: ["cronjobs", "jobs"]

  # 5. Pod deletion
  - level: Metadata
    verbs: ["delete"]
    resources:
      - group: ""
        resources: ["pods"]

  # ============== Access control ==============
  # 6. Token reviews (authentication checks, including failed ones)
  - level: Request
    verbs: ["create"]
    resources:
      - group: "authentication.k8s.io"
        resources: ["tokenreviews"]

  # 7. Privileged access
  - level: Metadata
    userGroups: ["system:masters"]

  # ============== Noise reduction ==============
  # 8. Ignore read-only operations
  - level: None
    verbs: ["get", "list", "watch"]
    resources:
      - group: "" # core resources
        resources: ["pods", "services", "endpoints"]
      - group: "metrics.k8s.io" # metrics

  # 9. Ignore specific system components
  - level: None
    users:
      - "system:serviceaccount:kube-system:cloud-controller-manager"
      - "system:serviceaccount:kube-system:node-controller"

  # 10. Default rule (record metadata for all other write operations)
  - level: Metadata
    verbs: ["create", "update", "patch", "delete"]

References:

https://kubernetes.io/zh-cn/docs/tasks/debug/debug-cluster/audit/