Part 5: Kubernetes Performance Tuning in Practice

A complete performance-tuning framework, from system bottlenecks to application optimization

Contents

  • Chapter 7: Kubernetes Performance Tuning and Troubleshooting in Practice
  • Layered performance-analysis model
  • CPU performance analysis and tuning
  • Memory performance and OOM issues
  • I/O and storage performance troubleshooting
  • Network performance and CNI troubleshooting
  • Lab environment setup
  • Command cheat sheet

Chapter 7: Kubernetes Performance Tuning and Troubleshooting in Practice

7.1 A Layered Model for Performance Tuning

The first step in understanding any performance problem is layered analysis.

In a Kubernetes cluster, performance problems can be divided into 5 layers:

Application layer → slow application code
Container layer   → cgroup limits, CPU contention
Node layer        → kernel resource contention, I/O congestion
Network layer     → CNI latency, DNS stalls
Cluster layer     → scheduling latency, API server pressure

7.1.1 Troubleshooting Directions by Layer

| Layer | Typical problems | Core tools | Example |
| --- | --- | --- | --- |
| Application | Code bottlenecks, blocked goroutines | pprof / trace / flamegraph | go tool pprof |
| Container | Cgroup limits, memory leaks | docker stats / cadvisor | kubectl top pod |
| Node | CPU throttling, I/O wait | top / iostat / vmstat | iostat -x 1 |
| Network | High ping latency, poor Pod connectivity | iperf / ping / tcpdump | iperf3 -c server |
| Cluster | Slow scheduling, event backlog | kubectl get events / metrics | kubectl get events |

Layered Performance Analysis Model

Experiment 1: Performance-Problem Triage Workflow

#!/bin/bash
# Performance-problem triage workflow script

echo "=== Performance-problem triage workflow ==="

# 1. Application layer
echo "1. Application layer:"
kubectl get pods -o wide
kubectl top pod
kubectl logs -l app=myapp --tail=100

# 2. Container layer
echo "2. Container layer:"
kubectl describe pod $(kubectl get pod -l app=myapp -o jsonpath='{.items[0].metadata.name}')
# Note: cgroup v1 path; under cgroup v2 read /sys/fs/cgroup/cpu.stat instead
kubectl exec $(kubectl get pod -l app=myapp -o jsonpath='{.items[0].metadata.name}') -- cat /sys/fs/cgroup/cpu/cpu.stat

# 3. Node layer
echo "3. Node layer:"
kubectl top node
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "top -bn1 | head -20"

# 4. Network layer
echo "4. Network layer:"
kubectl exec $(kubectl get pod -l app=myapp -o jsonpath='{.items[0].metadata.name}') -- ping -c 3 8.8.8.8
kubectl exec $(kubectl get pod -l app=myapp -o jsonpath='{.items[0].metadata.name}') -- nslookup kubernetes.default

# 5. Cluster layer
echo "5. Cluster layer:"
kubectl get events --sort-by=.lastTimestamp | tail -20
kubectl get --raw /metrics | grep scheduler

Experiment 2: Deploying Performance-Monitoring Tools

#!/bin/bash
# Performance-monitoring tool deployment script

echo "=== Deploying performance-monitoring tools ==="

# 1. Deploy Prometheus
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus-config
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
  - port: 9090
    targetPort: 9090
  type: NodePort
EOF

# 2. Deploy Grafana
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  selector:
    app: grafana
  ports:
  - port: 3000
    targetPort: 3000
  type: NodePort
EOF

# 3. Deploy Node Exporter
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - containerPort: 9100
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
spec:
  selector:
    app: node-exporter
  ports:
  - port: 9100
    targetPort: 9100
EOF

echo "Performance-monitoring tools deployed!"
echo "Prometheus: http://<node-ip>:<node-port>"
echo "Grafana: http://<node-ip>:<node-port> (admin/admin)"

CPU Performance Analysis and Tuning

7.2 CPU Performance Analysis

7.2.1 Key Metrics

| Metric | Meaning | Command |
| --- | --- | --- |
| CPU usage | utilization | kubectl top pod |
| CPU throttled time | time throttled by the cgroup | cat /sys/fs/cgroup/cpu.stat |
| Load average | average system load | uptime |
| Context switches | context switches per second | vmstat 1 |
| Run queue length | number of processes waiting to run | vmstat 1 |
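For the last two rows, `vmstat` prints one sample per line; the helper below (a sketch of ours, not part of the original text) extracts the context-switch column so it can be thresholded in monitoring scripts:

```shell
#!/bin/sh
# vmstat_cs: read `vmstat 1 N` output on stdin and print the "cs"
# (context switches per second) column, one value per sample.
# The column index is looked up from the header row, so this works
# even if the field order differs across procps versions.
vmstat_cs() {
  awk 'NR==2 {for (i = 1; i <= NF; i++) if ($i == "cs") col = i; next}
       NR>2 && col {print $col}'
}

# Usage: vmstat 1 5 | vmstat_cs
```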

7.2.2 CPU Throttling

When a container has a CPU limit, the Linux CFS scheduler enforces it by throttling:

cpu.cfs_quota_us / cpu.cfs_period_us
  • Default period: 100ms
  • When the quota is exhausted → the container is paused for the rest of the period → QPS fluctuates

Check how heavily the container is being throttled:

cat /sys/fs/cgroup/cpu.stat
# throttled_time (cgroup v1) / throttled_usec (cgroup v2) is the total time spent throttled

Optimization strategies:

  • Prefer setting only requests.cpu and omitting limits.cpu
  • Or set the limit equal to the request
  • Or increase the cpu.cfs_period_us period
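The throttling counters can be reduced to a single ratio for alerting; a minimal sketch (assuming the cgroup v1 field names nr_periods / nr_throttled; the helper name is ours):

```shell
#!/bin/sh
# throttle_ratio: given a cpu.stat file (cgroup v1: /sys/fs/cgroup/cpu/cpu.stat,
# cgroup v2: /sys/fs/cgroup/cpu.stat), print the percentage of CFS periods
# in which the cgroup was throttled.
throttle_ratio() {
  awk '/^nr_periods/   {p = $2}
       /^nr_throttled/ {t = $2}
       END {if (p > 0) printf "%.1f%%\n", 100 * t / p}' "$1"
}

# Usage (inside a container with a CPU limit):
#   throttle_ratio /sys/fs/cgroup/cpu/cpu.stat
```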

Experiment 1: A Complete CPU Throttling Case

#!/bin/bash
# Complete CPU throttling case study

echo "=== CPU throttling case study ==="

# 1. Create a CPU-limited Pod
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: cpu-throttling-test
spec:
  containers:
  - name: test
    image: nixery.dev/shell/stress-ng
    command: ["stress-ng", "--cpu", "4", "--timeout", "60s"]
    resources:
      requests:
        cpu: 100m
      limits:
        cpu: 200m
EOF

# 2. Wait for the Pod to become Ready
kubectl wait --for=condition=Ready pod cpu-throttling-test --timeout=60s

# 3. Check CPU usage
echo "CPU usage:"
kubectl top pod cpu-throttling-test

# 4. Inspect the cgroup configuration (cgroup v1 paths; read /sys/fs/cgroup/cpu.max on v2)
echo "Cgroup configuration:"
kubectl exec cpu-throttling-test -- cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
kubectl exec cpu-throttling-test -- cat /sys/fs/cgroup/cpu/cpu.cfs_period_us

# 5. Check throttling statistics
echo "Throttling statistics:"
kubectl exec cpu-throttling-test -- cat /sys/fs/cgroup/cpu/cpu.stat

# 6. Monitor throttling over time
echo "Monitoring throttling:"
for i in {1..10}; do
  echo "Check #$i:"
  kubectl exec cpu-throttling-test -- cat /sys/fs/cgroup/cpu/cpu.stat | grep throttled
  sleep 5
done

# 7. Clean up
kubectl delete pod cpu-throttling-test

Experiment 2: CPU Affinity and NUMA Optimization

#!/bin/bash
# CPU affinity and NUMA optimization

echo "=== CPU affinity and NUMA optimization ==="

# 1. Check the NUMA topology (requires numactl on the node)
echo "NUMA topology:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "numactl --hardware"

# 2. Create a Pod with integer CPU requests (Guaranteed QoS class, eligible for
#    exclusive cores when the kubelet static CPU manager policy is enabled)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: numa-optimized
spec:
  containers:
  - name: test
    image: nixery.dev/shell/stress-ng
    command: ["stress-ng", "--cpu", "2", "--timeout", "60s"]
    resources:
      requests:
        cpu: 2
        memory: 1Gi
      limits:
        cpu: 2
        memory: 1Gi
  nodeSelector:
    kubernetes.io/hostname: node-1  # adjust to a real node name
EOF

# 3. Wait for the Pod to become Ready
kubectl wait --for=condition=Ready pod numa-optimized --timeout=60s

# 4. Check CPU pinning of the container's main process (PID 1 inside the Pod)
echo "CPU pinning:"
kubectl exec numa-optimized -- taskset -cp 1

# 5. Check the NUMA policy (requires numactl in the image)
echo "NUMA policy:"
kubectl exec numa-optimized -- numactl --show

# 6. Benchmark
echo "Benchmark:"
kubectl exec numa-optimized -- stress-ng --cpu 2 --timeout 10s --metrics-brief

# 7. Clean up
kubectl delete pod numa-optimized

Memory Performance and OOM Issues

7.3 Memory Performance Analysis

7.3.1 Key Metrics

| Metric | Meaning | Command |
| --- | --- | --- |
| memory.usage_in_bytes | actual usage | cat /sys/fs/cgroup/memory/memory.usage_in_bytes |
| memory.limit_in_bytes | the limit | cat /sys/fs/cgroup/memory/memory.limit_in_bytes |
| page_faults | page-fault count | cat /sys/fs/cgroup/memory/memory.stat |
| rss | resident set size | cat /sys/fs/cgroup/memory/memory.stat |
| cache | page cache | cat /sys/fs/cgroup/memory/memory.stat |

(Paths shown are for cgroup v1; under cgroup v2 the equivalents are memory.current, memory.max, and memory.stat.)

7.3.2 OOMKilled

When a container exceeds its memory limit, the kernel OOM killer terminates the process.

Check with:

kubectl describe pod <pod>

Typical output fragment:

Last State: Terminated
Reason: OOMKilled
Exit Code: 137

Optimization strategies:

| Area | Approach | Example |
| --- | --- | --- |
| Swap interference | set vm.swappiness=0 | sysctl vm.swappiness=0 |
| Cgroup limits | make sure the limit is not smaller than the request | requests: {memory: "256Mi"}, limits: {memory: "512Mi"} |
| GC tuning | Go: GOMEMLIMIT; Java: -XX:+UseG1GC | GOMEMLIMIT=512MiB |
| Temporary files | use emptyDir: { medium: Memory } | emptyDir: { medium: Memory } |
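To find OOM victims across a whole namespace rather than inspecting one pod at a time, the container statuses can be filtered; a sketch assuming `jq` is available (the `oom_victims` helper name is ours, not a kubectl feature):

```shell
#!/bin/sh
# oom_victims: read `kubectl get pods -o json` on stdin and print
# "namespace/pod" for every container whose last termination reason
# was OOMKilled.
oom_victims() {
  jq -r '
    .items[]
    | . as $pod
    | (.status.containerStatuses // [])[]
    | select(.lastState.terminated.reason == "OOMKilled")
    | "\($pod.metadata.namespace)/\($pod.metadata.name)"'
}

# Usage: kubectl get pods -A -o json | oom_victims
```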

Experiment 1: Triggering and Analyzing an OOM Kill

#!/bin/bash
# Trigger and analyze an OOM kill

echo "=== Triggering and analyzing an OOM kill ==="

# 1. Create a memory-limited Pod
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: oom-test
spec:
  containers:
  - name: test
    image: python:3.9
    command: ["python3", "-c"]
    args:
    - |
      import time
      data = []
      for i in range(1000):
          data.append('x' * 1024 * 1024)  # allocate 1MB per iteration
          print(f'Allocated: {i+1}MB')
          time.sleep(0.1)
    resources:
      requests:
        memory: 128Mi
      limits:
        memory: 256Mi
EOF

# 2. Give the Pod time to start and exhaust its limit
sleep 10

# 3. Check the Pod status
echo "Pod status:"
kubectl get pod oom-test

# 4. Check Pod events
echo "Pod events:"
kubectl describe pod oom-test

# 5. Check the kernel OOM log on the node
echo "Kernel OOM log:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "dmesg | grep -i oom | tail -10"

# 6. Check memory statistics (cgroup v1 path)
echo "Memory statistics:"
kubectl exec oom-test -- cat /sys/fs/cgroup/memory/memory.stat 2>/dev/null || echo "Pod already terminated"

# 7. Clean up
kubectl delete pod oom-test

Experiment 2: Simulating and Locating a Memory Leak

#!/bin/bash
# Simulate and locate a memory leak

echo "=== Simulating and locating a memory leak ==="

# 1. Create the leaking test Pod
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: memory-leak-test
spec:
  containers:
  - name: test
    image: golang:1.19
    command: ["go", "run", "/app/main.go"]
    volumeMounts:
    - name: app
      mountPath: /app
  volumes:
  - name: app
    configMap:
      name: memory-leak-app
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: memory-leak-app
data:
  main.go: |
    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    func main() {
        // Report memory statistics every 5 seconds
        go func() {
            for {
                var m runtime.MemStats
                runtime.ReadMemStats(&m)
                fmt.Printf("Alloc = %d KB, Sys = %d KB, NumGC = %d\n",
                    m.Alloc/1024, m.Sys/1024, m.NumGC)
                time.Sleep(5 * time.Second)
            }
        }()

        // Simulate a leak: keep every allocation reachable so the GC
        // can never reclaim it
        var leaked [][]byte
        for i := 0; i < 1000; i++ {
            leaked = append(leaked, make([]byte, 1024*1024)) // 1MB each
            time.Sleep(100 * time.Millisecond)
        }

        time.Sleep(30 * time.Second)
        _ = leaked
    }
EOF

# 2. Wait for the Pod to become Ready
kubectl wait --for=condition=Ready pod memory-leak-test --timeout=60s

# 3. Monitor memory usage
echo "Monitoring memory usage:"
for i in {1..20}; do
  echo "Check #$i:"
  kubectl top pod memory-leak-test
  sleep 3
done

# 4. Check the Pod logs
echo "Pod logs:"
kubectl logs memory-leak-test

# 5. Clean up
kubectl delete pod memory-leak-test
kubectl delete configmap memory-leak-app

I/O and Storage Performance Troubleshooting

7.4 Disk and I/O Performance

Common causes of poor I/O performance:

| Cause | Symptom | Remedy |
| --- | --- | --- |
| Too many OverlayFS layers | high write latency | write to a dedicated volume |
| Inode exhaustion | files cannot be created | clean up temporary files |
| Excessive container log writes | high disk I/O | cap log size |
| Saturated node disk | overall performance drops | expand or clean up the disk |
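For the log-write row, the kubelet can rotate container logs itself; a minimal KubeletConfiguration fragment using the upstream containerLogMaxSize / containerLogMaxFiles fields (the values here are illustrative, not tuned recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Rotate each container log file at 10Mi, keeping at most 5 rotated files
containerLogMaxSize: 10Mi
containerLogMaxFiles: 5
```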

Experiment 1: I/O Benchmarking

#!/bin/bash
# I/O benchmarking

echo "=== I/O benchmarking ==="

# 1. Create the I/O test Pod (kept alive with sleep so fio can be run via exec)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: io-test
spec:
  containers:
  - name: test
    image: nixery.dev/shell/fio
    command: ["sleep", "3600"]
    volumeMounts:
    - name: test-volume
      mountPath: /data
  volumes:
  - name: test-volume
    emptyDir: {}
EOF

# 2. Wait for the Pod to become Ready
kubectl wait --for=condition=Ready pod io-test --timeout=60s

# 3. Random-read test (--size is required so fio can lay out the file)
echo "Random-read test:"
kubectl exec io-test -- fio --name=randread --filename=/data/test.file --size=1G --bs=4k --iodepth=32 --rw=randread --numjobs=4 --time_based --runtime=60 --group_reporting

# 4. Random-write test
echo "Random-write test:"
kubectl exec io-test -- fio --name=randwrite --filename=/data/test.file --size=1G --bs=4k --iodepth=32 --rw=randwrite --numjobs=4 --time_based --runtime=60 --group_reporting

# 5. Mixed read/write test (70% reads)
echo "Mixed test:"
kubectl exec io-test -- fio --name=mixed --filename=/data/test.file --size=1G --bs=4k --iodepth=32 --rw=randrw --rwmixread=70 --numjobs=4 --time_based --runtime=60 --group_reporting

# 6. Clean up
kubectl delete pod io-test

Experiment 2: Storage Performance Tuning

#!/bin/bash
# Storage performance tuning (the ssh steps assume root access to the node
# and a disk named sda; adjust for your environment)

echo "=== Storage performance tuning ==="

# 1. Check the current I/O scheduler
echo "Current I/O scheduler:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "cat /sys/block/sda/queue/scheduler"

# 2. Switch the I/O scheduler
echo "Switching I/O scheduler:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "echo mq-deadline > /sys/block/sda/queue/scheduler"

# 3. Check filesystem types
echo "Filesystem types:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "df -T"

# 4. Tune dirty-page writeback
echo "Tuning kernel parameters:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w vm.dirty_ratio=10"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w vm.dirty_background_ratio=5"

# 5. Create the tuned test Pod (kept alive with sleep so fio runs via exec)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: optimized-io-test
spec:
  containers:
  - name: test
    image: nixery.dev/shell/fio
    command: ["sleep", "3600"]
    volumeMounts:
    - name: test-volume
      mountPath: /data
  volumes:
  - name: test-volume
    emptyDir: {}
EOF

# 6. Wait for the Pod to become Ready
kubectl wait --for=condition=Ready pod optimized-io-test --timeout=60s

# 7. Re-run the benchmark
echo "Benchmark after tuning:"
kubectl exec optimized-io-test -- fio --name=optimized --filename=/data/test.file --size=1G --bs=4k --iodepth=64 --rw=randread --numjobs=8 --time_based --runtime=60 --group_reporting

# 8. Clean up
kubectl delete pod optimized-io-test

Network Performance and CNI Troubleshooting

7.5 Network Performance

7.5.1 The Network Path

Pod → veth pair → bridge (cni0) → host → eth0 → external network

The CNI plugin (Calico, Flannel, Cilium) determines the performance of this path.

7.5.2 Common Problems and How to Troubleshoot Them

| Problem | Likely cause | Tools | Remedy |
| --- | --- | --- | --- |
| High Pod-to-Pod latency | broken CNI routes / wrong MTU | ping / traceroute | check routing tables and MTU settings |
| Slow DNS resolution | overloaded CoreDNS | kubectl logs coredns | tune the CoreDNS configuration |
| Limited egress bandwidth | egress policies / NIC bottleneck | iperf3 / iftop | check network policies and NIC configuration |
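When scripting the latency checks from the table, the ping summary line can be parsed directly; a small sketch (the helper name is ours) for the iputils/BSD "rtt min/avg/max/mdev" summary format:

```shell
#!/bin/sh
# avg_rtt_ms: read `ping` output on stdin and print the average RTT in
# milliseconds, taken from the trailing summary line, e.g.
#   rtt min/avg/max/mdev = 0.045/0.067/0.089/0.012 ms
avg_rtt_ms() {
  awk -F'/' '/^(rtt|round-trip)/ {print $5}'
}

# Usage: kubectl exec <pod> -- ping -c 10 <peer-ip> | avg_rtt_ms
```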

Experiment 1: Network Benchmarking

#!/bin/bash
# Network benchmarking

echo "=== Network benchmarking ==="

# 1. Create the network test Pods (each replica runs an iperf3 server)
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: network-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: network-test
  template:
    metadata:
      labels:
        app: network-test
    spec:
      containers:
      - name: test
        image: nixery.dev/shell/iperf3
        command: ["iperf3", "-s"]
        ports:
        - containerPort: 5201
---
apiVersion: v1
kind: Service
metadata:
  name: network-test
spec:
  selector:
    app: network-test
  ports:
  - port: 5201
    targetPort: 5201
EOF

# 2. Wait for the Pods to become Ready
kubectl wait --for=condition=Ready pod -l app=network-test --timeout=60s

# 3. Get the Pod IPs
POD1_IP=$(kubectl get pod -l app=network-test -o jsonpath='{.items[0].status.podIP}')
POD2_IP=$(kubectl get pod -l app=network-test -o jsonpath='{.items[1].status.podIP}')

echo "Pod 1 IP: $POD1_IP"
echo "Pod 2 IP: $POD2_IP"

# 4. Measure Pod-to-Pod latency
echo "Latency:"
kubectl exec $(kubectl get pod -l app=network-test -o jsonpath='{.items[0].metadata.name}') -- ping -c 10 $POD2_IP

# 5. Measure bandwidth (client in Pod 1 against the server in Pod 2)
echo "Bandwidth:"
kubectl exec $(kubectl get pod -l app=network-test -o jsonpath='{.items[0].metadata.name}') -- iperf3 -c $POD2_IP -t 30

# 6. Test external connectivity
echo "External connectivity:"
kubectl exec $(kubectl get pod -l app=network-test -o jsonpath='{.items[0].metadata.name}') -- ping -c 5 8.8.8.8

# 7. Clean up
kubectl delete deployment network-test
kubectl delete service network-test

Experiment 2: Network Performance Tuning

#!/bin/bash
# Network performance tuning (the ssh steps assume root access to the node;
# adjust interface names and values for your environment)

echo "=== Network performance tuning ==="

# 1. Check the current MTU settings
echo "Current MTU settings:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "ip link show | grep mtu"

# 2. Set the MTU
echo "Setting MTU:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "ip link set dev eth0 mtu 1500"

# 3. Tune TCP parameters
echo "Tuning TCP parameters:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.core.somaxconn=65535"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.ipv4.tcp_tw_reuse=1"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.ipv4.tcp_fin_timeout=10"

# 4. Enlarge network buffers
echo "Enlarging network buffers:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.core.rmem_max=16777216"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.core.wmem_max=16777216"

# 5. Create the tuned test Pods (each replica runs an iperf3 server)
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: optimized-network-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: optimized-network-test
  template:
    metadata:
      labels:
        app: optimized-network-test
    spec:
      containers:
      - name: test
        image: nixery.dev/shell/iperf3
        command: ["iperf3", "-s"]
        ports:
        - containerPort: 5201
---
apiVersion: v1
kind: Service
metadata:
  name: optimized-network-test
spec:
  selector:
    app: optimized-network-test
  ports:
  - port: 5201
    targetPort: 5201
EOF

# 6. Wait for the Pods to become Ready
kubectl wait --for=condition=Ready pod -l app=optimized-network-test --timeout=60s

# 7. Re-run the benchmark
echo "Benchmark after tuning:"
POD1_IP=$(kubectl get pod -l app=optimized-network-test -o jsonpath='{.items[0].status.podIP}')
POD2_IP=$(kubectl get pod -l app=optimized-network-test -o jsonpath='{.items[1].status.podIP}')

kubectl exec $(kubectl get pod -l app=optimized-network-test -o jsonpath='{.items[0].metadata.name}') -- iperf3 -c $POD2_IP -t 30

# 8. Clean up
kubectl delete deployment optimized-network-test
kubectl delete service optimized-network-test

Lab Environment Setup

Quick Setup Script

#!/bin/bash
# Kubernetes performance-tuning lab environment setup script

set -e

echo "Setting up the Kubernetes performance-tuning lab environment..."

# 1. Create the lab cluster with kind
kind create cluster --name performance-test --config - <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/12"
EOF

# 2. Wait for the cluster to become Ready
kubectl wait --for=condition=Ready node --all --timeout=60s

# 3. Deploy the test tooling
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: performance-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: performance-tools
  template:
    metadata:
      labels:
        app: performance-tools
    spec:
      containers:
      - name: tools
        image: nixery.dev/shell/stress-ng
        command: ["sleep", "3600"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fio-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fio-tools
  template:
    metadata:
      labels:
        app: fio-tools
    spec:
      containers:
      - name: fio
        image: nixery.dev/shell/fio
        command: ["sleep", "3600"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: network-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: network-tools
  template:
    metadata:
      labels:
        app: network-tools
    spec:
      containers:
      - name: tools
        image: nixery.dev/shell/iperf3
        command: ["sleep", "3600"]
EOF

# 4. Create the test workloads
kubectl apply -f - <<EOF
# Test workload 1: CPU-intensive
apiVersion: v1
kind: Pod
metadata:
  name: cpu-test
spec:
  containers:
  - name: test
    image: nixery.dev/shell/stress-ng
    command: ["stress-ng", "--cpu", "2", "--timeout", "60s"]
    resources:
      requests:
        cpu: 1
        memory: 128Mi
      limits:
        cpu: 2
        memory: 256Mi
---
# Test workload 2: memory-intensive
apiVersion: v1
kind: Pod
metadata:
  name: memory-test
spec:
  containers:
  - name: test
    image: nixery.dev/shell/stress-ng
    command: ["stress-ng", "--vm", "1", "--vm-bytes", "256M", "--timeout", "60s"]
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 200m
        memory: 512Mi
---
# Test workload 3: I/O-intensive
apiVersion: v1
kind: Pod
metadata:
  name: io-test
spec:
  containers:
  - name: test
    image: nixery.dev/shell/fio
    command: ["fio", "--name=test", "--filename=/data/test.file", "--size=1G", "--bs=4k", "--iodepth=32", "--rw=randread", "--numjobs=4", "--time_based", "--runtime=60", "--group_reporting"]
    volumeMounts:
    - name: test-volume
      mountPath: /data
  volumes:
  - name: test-volume
    emptyDir: {}
EOF

echo "Kubernetes performance-tuning lab environment is ready!"
echo "Run 'kubectl get nodes' to check the cluster"
echo "Run 'kubectl get pods' to check the Pods"

Command Cheat Sheet

Performance Monitoring

| Command | Purpose | Example |
| --- | --- | --- |
| kubectl top pod | Pod resource usage | kubectl top pod |
| kubectl top node | node resource usage | kubectl top node |
| kubectl describe pod <pod> | Pod details | kubectl describe pod my-pod |
| kubectl logs <pod> | Pod logs | kubectl logs my-pod |
| kubectl get events --sort-by=.lastTimestamp | cluster events | kubectl get events --sort-by=.lastTimestamp |

System Monitoring

| Command | Purpose | Example |
| --- | --- | --- |
| top | system monitor | top |
| htop | enhanced system monitor | htop |
| iostat -x 1 | I/O statistics | iostat -x 1 |
| vmstat 1 | virtual-memory statistics | vmstat 1 |
| sar -n DEV 1 | network statistics | sar -n DEV 1 |
| perf stat | performance counters | perf stat ./myapp |

Network Testing

| Command | Purpose | Example |
| --- | --- | --- |
| ping <ip> | connectivity | ping 8.8.8.8 |
| traceroute <ip> | network path | traceroute 8.8.8.8 |
| iperf3 -c <server> | bandwidth test (client) | iperf3 -c server |
| iperf3 -s | bandwidth test (server) | iperf3 -s |
| nslookup <hostname> | DNS resolution | nslookup kubernetes.default |

Storage Testing

| Command | Purpose | Where |
| --- | --- | --- |
| fio --name=test --filename=/data/test.file --bs=4k --iodepth=32 --rw=randread --numjobs=4 --time_based --runtime=60 --group_reporting | random-read test | run inside a Pod |
| fio --name=test --filename=/data/test.file --bs=4k --iodepth=32 --rw=randwrite --numjobs=4 --time_based --runtime=60 --group_reporting | random-write test | run inside a Pod |
| dd if=/dev/zero of=/data/test.file bs=1M count=1000 | sequential-write test | run inside a Pod |
| dd if=/data/test.file of=/dev/null bs=1M | sequential-read test | run inside a Pod |
