Part V: Hands-On Kubernetes Performance Tuning
A complete performance-tuning methodology, from system bottlenecks to application optimization
Contents
Chapter 7: Kubernetes Performance Tuning and Troubleshooting in Practice
7.1 A Layered Model for Performance Tuning
The first step in understanding a performance problem is layered analysis.
In a K8s cluster, performance problems fall into 5 layers:
Application layer → slow program logic
Container layer → cgroup limits, CPU contention
Node layer → kernel resource contention, I/O congestion
Network layer → CNI latency, DNS stalls
Cluster layer → scheduling latency, API Server pressure
7.1.1 Troubleshooting Directions by Layer
| Layer | Key problems | Core tools | Example |
|---|---|---|---|
| Application | Code bottlenecks, blocked goroutines | pprof / trace / flamegraph | go tool pprof |
| Container | Cgroup limits, memory leaks | docker stats / cAdvisor | kubectl top pod |
| Node | CPU throttling, I/O wait | top / iostat / vmstat | iostat -x 1 |
| Network | High ping latency, poor Pod connectivity | iperf / ping / tcpdump | iperf3 -c server |
| Cluster | Slow scheduling, event backlog | kubectl get events / metrics | kubectl get events |
Layered performance analysis model
Experiment 1: A workflow for locating performance problems
#!/bin/bash
# Workflow script for locating performance problems
echo "=== Performance problem triage workflow ==="
# 1. Application layer
echo "1. Application layer:"
kubectl get pods -o wide
kubectl top pod
kubectl logs -l app=myapp --tail=100
# 2. Container layer
echo "2. Container layer:"
kubectl describe pod $(kubectl get pod -l app=myapp -o jsonpath='{.items[0].metadata.name}')
# cgroup v1 path first, falling back to the cgroup v2 unified path
kubectl exec $(kubectl get pod -l app=myapp -o jsonpath='{.items[0].metadata.name}') -- sh -c "cat /sys/fs/cgroup/cpu/cpu.stat 2>/dev/null || cat /sys/fs/cgroup/cpu.stat"
# 3. Node layer
echo "3. Node layer:"
kubectl top node
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "top -bn1 | head -20"
# 4. Network layer
echo "4. Network layer:"
kubectl exec $(kubectl get pod -l app=myapp -o jsonpath='{.items[0].metadata.name}') -- ping -c 3 8.8.8.8
kubectl exec $(kubectl get pod -l app=myapp -o jsonpath='{.items[0].metadata.name}') -- nslookup kubernetes.default
# 5. Cluster layer
echo "5. Cluster layer:"
kubectl get events --sort-by=.lastTimestamp | tail -20
kubectl get --raw /metrics | grep scheduler
Experiment 2: Deploying performance-monitoring tools
#!/bin/bash
# Deployment script for performance-monitoring tools
echo "=== Deploying performance-monitoring tools ==="
# 1. Deploy Prometheus
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-config
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090
  type: NodePort
EOF
# 2. Deploy Grafana
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana
          ports:
            - containerPort: 3000
          env:
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: admin
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  selector:
    app: grafana
  ports:
    - port: 3000
      targetPort: 3000
  type: NodePort
EOF
# 3. Deploy Node Exporter
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
        - name: node-exporter
          image: prom/node-exporter
          ports:
            - containerPort: 9100
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
spec:
  selector:
    app: node-exporter
  ports:
    - port: 9100
      targetPort: 9100
EOF
echo "Performance-monitoring tools deployed!"
echo "Prometheus: http://<node-ip>:<node-port>"
echo "Grafana: http://<node-ip>:<node-port> (admin/admin)"
CPU Performance Analysis and Tuning
7.2 CPU Performance Analysis
7.2.1 Key Metrics
| Metric | Meaning | Command |
|---|---|---|
| CPU usage | Utilization | kubectl top pod |
| CPU throttled time | Time the cgroup spent throttled | cat /sys/fs/cgroup/cpu.stat (cgroup v2) |
| Load average | Average load | uptime |
| Context switches | Context-switch count | vmstat 1 |
| Run queue length | Number of runnable processes waiting | vmstat 1 |
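The cpu.stat path differs between cgroup v1 and v2, so scripts that read it should detect the hierarchy first. A minimal sketch; the detection rule (cgroup v2 exposes cgroup.controllers at the hierarchy root) is standard, while the helper name is ours:

```shell
# Return the cpu.stat path for the current host: cgroup v2 uses a unified
# hierarchy with cgroup.controllers at its root, v1 splits per controller.
cpu_stat_path() {
  local root="${1:-/sys/fs/cgroup}"
  if [ -f "$root/cgroup.controllers" ]; then
    echo "$root/cpu.stat"        # cgroup v2
  else
    echo "$root/cpu/cpu.stat"    # cgroup v1
  fi
}

# Works on either hierarchy:
cat "$(cpu_stat_path)" 2>/dev/null | head -5
```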
7.2.2 CPU Throttling
When a container has a CPU limit set, the Linux CFS scheduler throttles ("rate-limits") it:
cpu.cfs_quota_us / cpu.cfs_period_us
- Default period: 100ms
- Exceeding the quota → the container is paused until the next period → QPS jitter
Check throttling:
cat /sys/fs/cgroup/cpu.stat
# throttled_usec (cgroup v2) / throttled_time (v1) is the total time spent throttled
Tuning strategies:
- Prefer setting only requests.cpu and omitting limits.cpu
- Or set the limit equal to the request
- Or increase the cpu.cfs_period_us period
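The throttling ratio — what fraction of CFS periods actually hit the quota — is often more telling than raw throttled time. A small sketch computing it from the nr_periods / nr_throttled counters in cpu.stat (the sample values below are hypothetical):

```shell
# Percentage of CFS scheduling periods in which the cgroup was throttled
throttle_pct() {
  awk -v p="$1" -v t="$2" 'BEGIN { if (p == 0) print "0.0"; else printf "%.1f\n", t * 100 / p }'
}

# On a live pod: read nr_periods and nr_throttled from cpu.stat, then:
#   throttle_pct "$nr_periods" "$nr_throttled"
throttle_pct 1000 250    # → 25.0, i.e. a quarter of all periods were capped
```

Ratios above a few percent under normal load usually mean the limit is set too close to the workload's real demand.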
Experiment 1: A complete CPU-throttling case
#!/bin/bash
# Complete CPU-throttling case
echo "=== CPU throttling case study ==="
# 1. Create a CPU-limited Pod
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: cpu-throttling-test
spec:
  containers:
    - name: test
      image: nixery.dev/shell/stress-ng
      command: ["stress-ng", "--cpu", "4", "--timeout", "60s"]
      resources:
        requests:
          cpu: 100m
        limits:
          cpu: 200m
EOF
# 2. Wait for the Pod to become ready
kubectl wait --for=condition=Ready pod cpu-throttling-test --timeout=60s
# 3. Check CPU usage
echo "CPU usage:"
kubectl top pod cpu-throttling-test
# 4. Inspect the cgroup configuration (v1 paths, falling back to the v2 cpu.max)
echo "Cgroup configuration:"
kubectl exec cpu-throttling-test -- sh -c "cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us /sys/fs/cgroup/cpu/cpu.cfs_period_us 2>/dev/null || cat /sys/fs/cgroup/cpu.max"
# 5. Check throttling statistics
echo "Throttling statistics:"
kubectl exec cpu-throttling-test -- sh -c "cat /sys/fs/cgroup/cpu/cpu.stat 2>/dev/null || cat /sys/fs/cgroup/cpu.stat"
# 6. Monitor throttling over time
echo "Monitoring throttling:"
for i in {1..10}; do
  echo "Check $i:"
  kubectl exec cpu-throttling-test -- sh -c "cat /sys/fs/cgroup/cpu/cpu.stat 2>/dev/null || cat /sys/fs/cgroup/cpu.stat" | grep throttled
  sleep 5
done
# 7. Clean up
kubectl delete pod cpu-throttling-test
Experiment 2: CPU affinity and NUMA optimization
#!/bin/bash
# CPU affinity and NUMA optimization
echo "=== CPU affinity and NUMA optimization ==="
# 1. Inspect the NUMA topology
echo "NUMA topology:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "numactl --hardware"
# 2. Create a Pod with integer CPU request == limit (Guaranteed QoS; with the
#    kubelet's static CPU manager policy this gets exclusive cores)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: numa-optimized
spec:
  containers:
    - name: test
      image: nixery.dev/shell/stress-ng
      command: ["stress-ng", "--cpu", "2", "--timeout", "60s"]
      resources:
        requests:
          cpu: 2
          memory: 1Gi
        limits:
          cpu: 2
          memory: 1Gi
  nodeSelector:
    kubernetes.io/hostname: node-1
EOF
# 3. Wait for the Pod to become ready
kubectl wait --for=condition=Ready pod numa-optimized --timeout=60s
# 4. Check CPU pinning (PID 1 is the container's main process)
echo "CPU pinning:"
kubectl exec numa-optimized -- taskset -cp 1
# 5. Check NUMA binding
echo "NUMA binding:"
kubectl exec numa-optimized -- numactl --show
# 6. Performance test
echo "Performance test:"
kubectl exec numa-optimized -- stress-ng --cpu 2 --timeout 10s --metrics-brief
# 7. Clean up
kubectl delete pod numa-optimized
Memory Performance and OOM Problems
7.3 Memory Performance Analysis
7.3.1 Key Metrics
(cgroup v1 paths shown; under cgroup v2 use memory.current, memory.max, and memory.stat)
| Metric | Meaning | Command |
|---|---|---|
| memory.usage_in_bytes | Actual usage | cat /sys/fs/cgroup/memory/memory.usage_in_bytes |
| memory.limit_in_bytes | Limit | cat /sys/fs/cgroup/memory/memory.limit_in_bytes |
| page_faults | Page-fault count | cat /sys/fs/cgroup/memory/memory.stat |
| rss | Resident set size | cat /sys/fs/cgroup/memory/memory.stat |
| cache | File-cache pages | cat /sys/fs/cgroup/memory/memory.stat |
7.3.2 OOMKilled Problems
When a container exceeds its memory limit, the kernel OOM killer terminates the process.
Check with:
kubectl describe pod <pod>
Log fragment:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
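To find which pods were recently OOMKilled across the whole cluster, filter on lastState.terminated.reason. A sketch — the awk filter is separated out so it can run on canned input; with a live cluster you would feed it from the jsonpath query shown in the comment:

```shell
# Print "namespace pod" for every container whose last termination was an OOM kill
oom_killed() { awk '$3 == "OOMKilled" { print $1, $2 }'; }

# Live usage:
#   kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' | oom_killed

# Hypothetical sample of that query's output:
printf 'default api-7f9 OOMKilled\ndefault web-5c4 Completed\n' | oom_killed   # → default api-7f9
```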
Tuning strategies:
| Area | Method | Example |
|---|---|---|
| Swap interference | Set vm.swappiness=0 | sysctl vm.swappiness=0 |
| Cgroup limits | Keep limit >= request | requests: {memory: "256Mi"}, limits: {memory: "512Mi"} |
| GC tuning | Go: GOMEMLIMIT; Java: -XX:+UseG1GC | GOMEMLIMIT=512MiB |
| Temp files | Use emptyDir: { medium: Memory } | emptyDir: { medium: Memory } |
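Before a pod gets anywhere near an OOM kill, it helps to watch usage as a percentage of the limit. A minimal sketch; on a live container, read usage from memory.current (cgroup v2) or memory.usage_in_bytes (v1) and the limit from memory.max / memory.limit_in_bytes:

```shell
# Memory usage as an integer percentage of the limit
mem_pct() { awk -v u="$1" -v l="$2" 'BEGIN { printf "%.0f\n", u * 100 / l }'; }

# Live usage inside a container (cgroup v2 paths shown; note memory.max prints
# the literal string "max" when no limit is set -- guard for that in real scripts):
#   mem_pct "$(cat /sys/fs/cgroup/memory.current)" "$(cat /sys/fs/cgroup/memory.max)"
mem_pct $((200 * 1024 * 1024)) $((256 * 1024 * 1024))   # → 78
```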
Experiment 1: Triggering and analyzing an OOM kill
#!/bin/bash
# Triggering and analyzing an OOM kill
echo "=== Triggering and analyzing an OOM kill ==="
# 1. Create a memory-limited Pod
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: oom-test
spec:
  containers:
    - name: test
      image: python:3.9
      command: ["python3", "-c"]
      args:
        - |
          import time
          data = []
          for i in range(1000):
              data.append('x' * 1024 * 1024)  # allocate 1 MB per iteration
              print(f'Allocated: {i+1}MB')
              time.sleep(0.1)
      resources:
        requests:
          memory: 128Mi
        limits:
          memory: 256Mi
EOF
# 2. Wait for the Pod to start
sleep 10
# 3. Check Pod status
echo "Pod status:"
kubectl get pod oom-test
# 4. Check Pod events
echo "Pod events:"
kubectl describe pod oom-test
# 5. Check the kernel OOM log on the node
echo "Kernel OOM log:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "dmesg | grep -i oom | tail -10"
# 6. Check memory statistics (fails once the Pod has been killed)
echo "Memory statistics:"
kubectl exec oom-test -- cat /sys/fs/cgroup/memory/memory.stat 2>/dev/null || echo "Pod already terminated"
# 7. Clean up
kubectl delete pod oom-test
Experiment 2: Simulating and locating a memory leak
#!/bin/bash
# Simulating and locating a memory leak
echo "=== Simulating and locating a memory leak ==="
# 1. Create the leak-test Pod
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: memory-leak-test
spec:
  containers:
    - name: test
      image: golang:1.19
      command: ["go", "run", "/app/main.go"]
      volumeMounts:
        - name: app
          mountPath: /app
  volumes:
    - name: app
      configMap:
        name: memory-leak-app
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: memory-leak-app
data:
  main.go: |
    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    // References are kept in a package-level slice so the GC can never reclaim
    // them -- this is what makes it a leak rather than allocation churn.
    var leaked [][]byte

    func main() {
        // Report memory stats every 5 seconds
        go func() {
            for {
                var m runtime.MemStats
                runtime.ReadMemStats(&m)
                fmt.Printf("Alloc = %d KB, Sys = %d KB, NumGC = %d\n",
                    m.Alloc/1024, m.Sys/1024, m.NumGC)
                time.Sleep(5 * time.Second)
            }
        }()
        // Simulate the leak: 1 MB per iteration, never released
        for i := 0; i < 1000; i++ {
            leaked = append(leaked, make([]byte, 1024*1024))
            time.Sleep(100 * time.Millisecond)
        }
        time.Sleep(30 * time.Second)
    }
EOF
# 2. Wait for the Pod to become ready (go run compiles first, so allow extra time)
kubectl wait --for=condition=Ready pod memory-leak-test --timeout=120s
# 3. Monitor memory usage
echo "Monitoring memory usage:"
for i in {1..20}; do
  echo "Check $i:"
  kubectl top pod memory-leak-test
  sleep 3
done
# 4. Check Pod logs
echo "Pod logs:"
kubectl logs memory-leak-test
# 5. Clean up
kubectl delete pod memory-leak-test
kubectl delete configmap memory-leak-app
I/O and Storage Performance Troubleshooting
7.4 Disk and I/O Performance Troubleshooting
Common causes of poor I/O performance:
| Cause | Symptom | Remedy |
|---|---|---|
| Too many OverlayFS layers | High write latency | Write to a dedicated volume |
| Inode exhaustion | Cannot create files | Clean up temporary files |
| Excessive container log writes | High disk I/O | Cap log size |
| Saturated node disk | Overall slowdown | Add capacity or clean up |
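Inode exhaustion in particular is easy to miss, because `df -h` still shows free space while `df -i` is at 100%. A sketch that flags filesystems above an inode-usage threshold by parsing `df -i` output (the sample input below is made up):

```shell
# Print "mountpoint usage%" for filesystems at or above the given inode-usage threshold
inode_pressure() {
  awk -v th="$1" 'NR > 1 { gsub(/%/, "", $5); if ($5 + 0 >= th) print $6, $5 "%" }'
}

# Live usage on a node:  df -i | inode_pressure 90
printf 'Filesystem Inodes IUsed IFree IUse%% Mounted\n/dev/sda1 100 97 3 97%% /var\n/dev/sda2 100 10 90 10%% /\n' | inode_pressure 90   # → /var 97%
```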
Experiment 1: I/O performance testing
#!/bin/bash
# I/O performance testing
echo "=== I/O performance testing ==="
# 1. Create the I/O test Pod (kept alive with sleep; fio runs are driven via kubectl exec)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: io-test
spec:
  containers:
    - name: test
      image: nixery.dev/shell/fio
      command: ["sleep", "3600"]
      volumeMounts:
        - name: test-volume
          mountPath: /data
  volumes:
    - name: test-volume
      emptyDir: {}
EOF
# 2. Wait for the Pod to become ready
kubectl wait --for=condition=Ready pod io-test --timeout=60s
# 3. Random-read test
echo "Random-read test:"
kubectl exec io-test -- fio --name=randread --filename=/data/test.file --size=1G --bs=4k --iodepth=32 --rw=randread --numjobs=4 --time_based --runtime=60 --group_reporting
# 4. Random-write test
echo "Random-write test:"
kubectl exec io-test -- fio --name=randwrite --filename=/data/test.file --size=1G --bs=4k --iodepth=32 --rw=randwrite --numjobs=4 --time_based --runtime=60 --group_reporting
# 5. Mixed test (70% read / 30% write)
echo "Mixed test:"
kubectl exec io-test -- fio --name=mixed --filename=/data/test.file --size=1G --bs=4k --iodepth=32 --rw=randrw --rwmixread=70 --numjobs=4 --time_based --runtime=60 --group_reporting
# 6. Clean up
kubectl delete pod io-test
Experiment 2: Storage performance tuning
#!/bin/bash
# Storage performance tuning (assumes the node's data disk is sda; adjust as needed)
echo "=== Storage performance tuning ==="
# 1. Check the current I/O scheduler
echo "Current I/O scheduler:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "cat /sys/block/sda/queue/scheduler"
# 2. Switch the I/O scheduler
echo "Switching I/O scheduler:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "echo mq-deadline > /sys/block/sda/queue/scheduler"
# 3. Check filesystem types
echo "Filesystem types:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "df -T"
# 4. Tune kernel writeback parameters
echo "Tuning kernel parameters:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w vm.dirty_ratio=10"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w vm.dirty_background_ratio=5"
# 5. Create the post-tuning test Pod (kept alive with sleep; fio runs via kubectl exec)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: optimized-io-test
spec:
  containers:
    - name: test
      image: nixery.dev/shell/fio
      command: ["sleep", "3600"]
      volumeMounts:
        - name: test-volume
          mountPath: /data
  volumes:
    - name: test-volume
      emptyDir: {}
EOF
# 6. Wait for the Pod to become ready
kubectl wait --for=condition=Ready pod optimized-io-test --timeout=60s
# 7. Re-run the benchmark
echo "Post-tuning benchmark:"
kubectl exec optimized-io-test -- fio --name=optimized --filename=/data/test.file --size=1G --bs=4k --iodepth=64 --rw=randread --numjobs=8 --time_based --runtime=60 --group_reporting
# 8. Clean up
kubectl delete pod optimized-io-test
Network Performance and CNI Troubleshooting
7.5 Network Performance Troubleshooting
7.5.1 Network Layers
Pod → veth pair → bridge (cni0) → host → eth0 → external network
The CNI plugin (Calico, Flannel, Cilium) shapes the performance of this path.
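One cheap sanity check on this path is MTU consistency: if the CNI interface's MTU exceeds what the host uplink allows, large packets get fragmented or dropped. A sketch; the interface names eth0/cni0 in the comment are common defaults and differ per CNI:

```shell
# Flag a CNI interface whose MTU exceeds the host uplink MTU; an overlay MTU
# *below* the host MTU is normal (it leaves room for encapsulation headers).
mtu_check() {
  if [ "$2" -gt "$1" ]; then
    echo "BAD: cni mtu $2 > host mtu $1"
  else
    echo "OK: host=$1 cni=$2"
  fi
}

# Live usage on a node (sysfs exposes the MTU directly):
#   mtu_check "$(cat /sys/class/net/eth0/mtu)" "$(cat /sys/class/net/cni0/mtu)"
mtu_check 1500 1450    # → OK: host=1500 cni=1450
```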
7.5.2 Common Problems and Triage
| Problem | Likely cause | Tools | Remedy |
|---|---|---|---|
| High Pod-to-Pod latency | Broken CNI routes / wrong MTU | ping / traceroute | Check routing tables and MTU |
| Slow DNS resolution | Overloaded CoreDNS | kubectl logs on coredns | Tune the CoreDNS config |
| Capped egress bandwidth | Egress limits / NIC bottleneck | iperf3 / iftop | Check network policies and NIC config |
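For the slow-DNS row, averaging the query time over several lookups beats a single sample. A sketch that parses the ";; Query time:" line `dig` prints (assumes dig is available in the debugging pod; the sample input is hypothetical):

```shell
# Average the ";; Query time: N msec" lines from one or more dig runs
avg_query_ms() { awk '/Query time:/ { sum += $4; n++ } END { if (n) printf "%.1f\n", sum / n }'; }

# Live usage from inside a pod (10.96.0.10 is the usual kube-dns Service IP; adjust):
#   for i in $(seq 5); do dig kubernetes.default.svc.cluster.local @10.96.0.10; done | avg_query_ms
printf ';; Query time: 10 msec\n;; Query time: 30 msec\n' | avg_query_ms   # → 20.0
```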
Experiment 1: Network performance testing
#!/bin/bash
# Network performance testing
echo "=== Network performance testing ==="
# 1. Create the network-test Pods (two iperf3 servers)
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: network-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: network-test
  template:
    metadata:
      labels:
        app: network-test
    spec:
      containers:
        - name: test
          image: nixery.dev/shell/iperf3
          command: ["iperf3", "-s"]
          ports:
            - containerPort: 5201
---
apiVersion: v1
kind: Service
metadata:
  name: network-test
spec:
  selector:
    app: network-test
  ports:
    - port: 5201
      targetPort: 5201
EOF
# 2. Wait for the Pods to become ready
kubectl wait --for=condition=Ready pod -l app=network-test --timeout=60s
# 3. Get the Pod IPs
POD1_IP=$(kubectl get pod -l app=network-test -o jsonpath='{.items[0].status.podIP}')
POD2_IP=$(kubectl get pod -l app=network-test -o jsonpath='{.items[1].status.podIP}')
echo "Pod 1 IP: $POD1_IP"
echo "Pod 2 IP: $POD2_IP"
# 4. Latency test
echo "Latency test:"
kubectl exec $(kubectl get pod -l app=network-test -o jsonpath='{.items[0].metadata.name}') -- ping -c 10 $POD2_IP
# 5. Bandwidth test (pod 1 as client, pod 2 as server)
echo "Bandwidth test:"
kubectl exec $(kubectl get pod -l app=network-test -o jsonpath='{.items[0].metadata.name}') -- iperf3 -c $POD2_IP -t 30
# 6. External connectivity test
echo "External connectivity test:"
kubectl exec $(kubectl get pod -l app=network-test -o jsonpath='{.items[0].metadata.name}') -- ping -c 5 8.8.8.8
# 7. Clean up
kubectl delete deployment network-test
kubectl delete service network-test
Experiment 2: Network performance tuning
#!/bin/bash
# Network performance tuning (assumes the node uplink is eth0; adjust as needed)
echo "=== Network performance tuning ==="
# 1. Check the current network configuration
echo "Current network configuration:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "ip link show | grep mtu"
# 2. Set the MTU (1500 is the standard Ethernet default; match your underlay and CNI overlay overhead)
echo "Setting MTU:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "ip link set dev eth0 mtu 1500"
# 3. Tune TCP parameters
echo "Tuning TCP parameters:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.core.somaxconn=65535"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.ipv4.tcp_tw_reuse=1"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.ipv4.tcp_fin_timeout=10"
# 4. Tune network buffers
echo "Tuning network buffers:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.core.rmem_max=16777216"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.core.wmem_max=16777216"
# 5. Create the post-tuning test Pods
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: optimized-network-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: optimized-network-test
  template:
    metadata:
      labels:
        app: optimized-network-test
    spec:
      containers:
        - name: test
          image: nixery.dev/shell/iperf3
          command: ["iperf3", "-s"]
          ports:
            - containerPort: 5201
---
apiVersion: v1
kind: Service
metadata:
  name: optimized-network-test
spec:
  selector:
    app: optimized-network-test
  ports:
    - port: 5201
      targetPort: 5201
EOF
# 6. Wait for the Pods to become ready
kubectl wait --for=condition=Ready pod -l app=optimized-network-test --timeout=60s
# 7. Re-run the bandwidth test
echo "Post-tuning bandwidth test:"
POD1_IP=$(kubectl get pod -l app=optimized-network-test -o jsonpath='{.items[0].status.podIP}')
POD2_IP=$(kubectl get pod -l app=optimized-network-test -o jsonpath='{.items[1].status.podIP}')
kubectl exec $(kubectl get pod -l app=optimized-network-test -o jsonpath='{.items[0].metadata.name}') -- iperf3 -c $POD2_IP -t 30
# 8. Clean up
kubectl delete deployment optimized-network-test
kubectl delete service optimized-network-test
Lab Environment Setup
Quick setup script
#!/bin/bash
# Setup script for the Kubernetes performance-tuning lab environment
set -e
echo "Setting up the Kubernetes performance-tuning lab environment..."
# 1. Create the lab cluster
kind create cluster --name performance-test --config - <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/12"
EOF
# 2. Wait for the cluster to become ready
kubectl wait --for=condition=Ready node --all --timeout=60s
# 3. Install the test tools
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: performance-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: performance-tools
  template:
    metadata:
      labels:
        app: performance-tools
    spec:
      containers:
        - name: tools
          image: nixery.dev/shell/stress-ng
          command: ["sleep", "3600"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fio-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fio-tools
  template:
    metadata:
      labels:
        app: fio-tools
    spec:
      containers:
        - name: fio
          image: nixery.dev/shell/fio
          command: ["sleep", "3600"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: network-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: network-tools
  template:
    metadata:
      labels:
        app: network-tools
    spec:
      containers:
        - name: tools
          image: nixery.dev/shell/iperf3
          command: ["sleep", "3600"]
EOF
# 4. Create the test workloads (restartPolicy: Never so finite runs do not crash-loop)
kubectl apply -f - <<EOF
# Test workload 1: CPU-bound
apiVersion: v1
kind: Pod
metadata:
  name: cpu-test
spec:
  restartPolicy: Never
  containers:
    - name: test
      image: nixery.dev/shell/stress-ng
      command: ["stress-ng", "--cpu", "2", "--timeout", "60s"]
      resources:
        requests:
          cpu: 1
          memory: 128Mi
        limits:
          cpu: 2
          memory: 256Mi
---
# Test workload 2: memory-bound
apiVersion: v1
kind: Pod
metadata:
  name: memory-test
spec:
  restartPolicy: Never
  containers:
    - name: test
      image: nixery.dev/shell/stress-ng
      command: ["stress-ng", "--vm", "1", "--vm-bytes", "256M", "--timeout", "60s"]
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          cpu: 200m
          memory: 512Mi
---
# Test workload 3: I/O-bound
apiVersion: v1
kind: Pod
metadata:
  name: io-test
spec:
  restartPolicy: Never
  containers:
    - name: test
      image: nixery.dev/shell/fio
      command: ["fio", "--name=test", "--filename=/data/test.file", "--size=1G", "--bs=4k", "--iodepth=32", "--rw=randread", "--numjobs=4", "--time_based", "--runtime=60", "--group_reporting"]
      volumeMounts:
        - name: test-volume
          mountPath: /data
  volumes:
    - name: test-volume
      emptyDir: {}
EOF
echo "Kubernetes performance-tuning lab environment is ready!"
echo "Run 'kubectl get nodes' to check cluster status"
echo "Run 'kubectl get pods' to check Pod status"
Command Cheat Sheet
Performance monitoring
| Command | Purpose | Example |
|---|---|---|
| kubectl top pod | Pod resource usage | kubectl top pod |
| kubectl top node | Node resource usage | kubectl top node |
| kubectl describe pod <pod> | Pod details | kubectl describe pod my-pod |
| kubectl logs <pod> | Pod logs | kubectl logs my-pod |
| kubectl get events --sort-by=.lastTimestamp | Cluster events | kubectl get events --sort-by=.lastTimestamp |
System monitoring
| Command | Purpose | Example |
|---|---|---|
| top | System monitor | top |
| htop | Enhanced system monitor | htop |
| iostat -x 1 | I/O monitoring | iostat -x 1 |
| vmstat 1 | Virtual-memory statistics | vmstat 1 |
| sar -n DEV 1 | Network statistics | sar -n DEV 1 |
| perf stat | Performance counters | perf stat ./myapp |
Network testing
| Command | Purpose | Example |
|---|---|---|
| ping <ip> | Connectivity test | ping 8.8.8.8 |
| traceroute <ip> | Network path | traceroute 8.8.8.8 |
| iperf3 -c <server> | Bandwidth test (client) | iperf3 -c server |
| iperf3 -s | Bandwidth test (server) | iperf3 -s |
| nslookup <hostname> | DNS resolution test | nslookup kubernetes.default |
Storage testing
| Command | Purpose | Example |
|---|---|---|
| fio --name=test --filename=/data/test.file --bs=4k --iodepth=32 --rw=randread --numjobs=4 --time_based --runtime=60 --group_reporting | Random-read test | Run inside a Pod |
| fio --name=test --filename=/data/test.file --bs=4k --iodepth=32 --rw=randwrite --numjobs=4 --time_based --runtime=60 --group_reporting | Random-write test | Run inside a Pod |
| dd if=/dev/zero of=/data/test.file bs=1M count=1000 | Sequential-write test | Run inside a Pod |
| dd if=/data/test.file of=/dev/null bs=1M | Sequential-read test | Run inside a Pod |