Part 5: Kubernetes Performance Tuning in Practice

A complete performance-tuning framework, from system bottlenecks to application optimization

Contents

  • Chapter 7: Kubernetes Performance Tuning and Troubleshooting in Practice
  • Layered performance-analysis model
  • CPU performance analysis and tuning
  • Memory performance and OOM issues
  • I/O and storage performance troubleshooting
  • Network performance and CNI troubleshooting
  • Lab environment setup
  • Command cheat sheet

Chapter 7: Kubernetes Performance Tuning and Troubleshooting in Practice

7.1 A Layered Model for Performance Tuning

The first step in understanding any performance problem is layered analysis.

In a Kubernetes cluster, performance problems can be divided into 5 layers:

Application layer → slow application code
Container layer   → cgroup limits, CPU contention
Node layer        → kernel resource contention, I/O congestion
Network layer     → CNI latency, DNS stalls
Cluster layer     → scheduling latency, API server pressure

7.1.1 Troubleshooting Directions by Layer

| Layer | Typical problems | Core tools | Example |
| --- | --- | --- | --- |
| Application | Code bottlenecks, blocked goroutines | pprof / trace / flamegraph | go tool pprof |
| Container | Cgroup limits, memory leaks | docker stats / cadvisor | kubectl top pod |
| Node | CPU throttling, I/O wait | top / iostat / vmstat | iostat -x 1 |
| Network | High ping latency, poor Pod connectivity | iperf / ping / tcpdump | iperf3 -c server |
| Cluster | Slow scheduling, event backlog | kubectl get events / metrics | kubectl get events |

Layered Performance Analysis Model

Experiment 1: Performance-Problem Triage Workflow

#!/bin/bash
# Performance-problem triage workflow script

echo "=== Performance-problem triage workflow ==="

# 1. Application layer
echo "1. Application layer:"
kubectl get pods -o wide
kubectl top pod
kubectl logs -l app=myapp --tail=100

# 2. Container layer
echo "2. Container layer:"
kubectl describe pod $(kubectl get pod -l app=myapp -o jsonpath='{.items[0].metadata.name}')
# Note: cgroup v1 path; under cgroup v2 read /sys/fs/cgroup/cpu.stat instead
kubectl exec $(kubectl get pod -l app=myapp -o jsonpath='{.items[0].metadata.name}') -- cat /sys/fs/cgroup/cpu/cpu.stat

# 3. Node layer
echo "3. Node layer:"
kubectl top node
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "top -bn1 | head -20"

# 4. Network layer
echo "4. Network layer:"
kubectl exec $(kubectl get pod -l app=myapp -o jsonpath='{.items[0].metadata.name}') -- ping -c 3 8.8.8.8
kubectl exec $(kubectl get pod -l app=myapp -o jsonpath='{.items[0].metadata.name}') -- nslookup kubernetes.default

# 5. Cluster layer
echo "5. Cluster layer:"
kubectl get events --sort-by=.lastTimestamp | tail -20
kubectl get --raw /metrics | grep scheduler

Experiment 2: Deploying Performance-Monitoring Tools

#!/bin/bash
# Performance-monitoring tool deployment script

echo "=== Deploying performance-monitoring tools ==="

# 1. Deploy Prometheus
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus-config
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
  - port: 9090
    targetPort: 9090
  type: NodePort
EOF

# 2. Deploy Grafana
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  selector:
    app: grafana
  ports:
  - port: 3000
    targetPort: 3000
  type: NodePort
EOF

# 3. Deploy Node Exporter
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - containerPort: 9100
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
spec:
  selector:
    app: node-exporter
  ports:
  - port: 9100
    targetPort: 9100
EOF

echo "Performance-monitoring tools deployed!"
echo "Prometheus: http://<node-ip>:<node-port>"
echo "Grafana: http://<node-ip>:<node-port> (admin/admin)"

CPU Performance Analysis and Tuning

7.2 CPU Performance Analysis

7.2.1 Key Metrics

| Metric | Meaning | Command |
| --- | --- | --- |
| CPU usage | utilization | kubectl top pod |
| CPU throttled time | time throttled by the cgroup | cat /sys/fs/cgroup/cpu.stat |
| Load average | average system load | uptime |
| Context switches | context switches per second | vmstat 1 |
| Run queue length | number of processes waiting to run | vmstat 1 |
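For the last two rows, `vmstat` prints one sample per line; the helper below (a sketch of ours, not part of the original text) extracts the context-switch column so it can be thresholded in monitoring scripts:

```shell
#!/bin/sh
# vmstat_cs: read `vmstat 1 N` output on stdin and print the "cs"
# (context switches per second) column, one value per sample.
# The column index is looked up from the header row, so this works
# even if the field order differs across procps versions.
vmstat_cs() {
  awk 'NR==2 {for (i = 1; i <= NF; i++) if ($i == "cs") col = i; next}
       NR>2 && col {print $col}'
}

# Usage: vmstat 1 5 | vmstat_cs
```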

7.2.2 CPU Throttling

When a container has a CPU limit, the Linux CFS scheduler enforces it by throttling:

cpu.cfs_quota_us / cpu.cfs_period_us
  • Default period: 100ms
  • When the quota is exhausted → the container is paused for the rest of the period → QPS fluctuates

Check how heavily the container is being throttled:

cat /sys/fs/cgroup/cpu.stat
# throttled_time (cgroup v1) / throttled_usec (cgroup v2) is the total time spent throttled

Optimization strategies:

  • Prefer setting only requests.cpu and omitting limits.cpu
  • Or set the limit equal to the request
  • Or increase the cpu.cfs_period_us period
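The throttling counters can be reduced to a single ratio for alerting; a minimal sketch (assuming the cgroup v1 field names nr_periods / nr_throttled; the helper name is ours):

```shell
#!/bin/sh
# throttle_ratio: given a cpu.stat file (cgroup v1: /sys/fs/cgroup/cpu/cpu.stat,
# cgroup v2: /sys/fs/cgroup/cpu.stat), print the percentage of CFS periods
# in which the cgroup was throttled.
throttle_ratio() {
  awk '/^nr_periods/   {p = $2}
       /^nr_throttled/ {t = $2}
       END {if (p > 0) printf "%.1f%%\n", 100 * t / p}' "$1"
}

# Usage (inside a container with a CPU limit):
#   throttle_ratio /sys/fs/cgroup/cpu/cpu.stat
```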

Experiment 1: A Complete CPU Throttling Case

#!/bin/bash
# Complete CPU throttling case study

echo "=== CPU throttling case study ==="

# 1. Create a CPU-limited Pod
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: cpu-throttling-test
spec:
  containers:
  - name: test
    image: nixery.dev/shell/stress-ng
    command: ["stress-ng", "--cpu", "4", "--timeout", "60s"]
    resources:
      requests:
        cpu: 100m
      limits:
        cpu: 200m
EOF

# 2. Wait for the Pod to become Ready
kubectl wait --for=condition=Ready pod cpu-throttling-test --timeout=60s

# 3. Check CPU usage
echo "CPU usage:"
kubectl top pod cpu-throttling-test

# 4. Inspect the cgroup configuration (cgroup v1 paths; read /sys/fs/cgroup/cpu.max on v2)
echo "Cgroup configuration:"
kubectl exec cpu-throttling-test -- cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
kubectl exec cpu-throttling-test -- cat /sys/fs/cgroup/cpu/cpu.cfs_period_us

# 5. Check throttling statistics
echo "Throttling statistics:"
kubectl exec cpu-throttling-test -- cat /sys/fs/cgroup/cpu/cpu.stat

# 6. Monitor throttling over time
echo "Monitoring throttling:"
for i in {1..10}; do
  echo "Check #$i:"
  kubectl exec cpu-throttling-test -- cat /sys/fs/cgroup/cpu/cpu.stat | grep throttled
  sleep 5
done

# 7. Clean up
kubectl delete pod cpu-throttling-test

Experiment 2: CPU Affinity and NUMA Optimization

#!/bin/bash
# CPU affinity and NUMA optimization

echo "=== CPU affinity and NUMA optimization ==="

# 1. Check the NUMA topology (requires numactl on the node)
echo "NUMA topology:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "numactl --hardware"

# 2. Create a Pod with integer CPU requests (Guaranteed QoS class, eligible for
#    exclusive cores when the kubelet static CPU manager policy is enabled)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: numa-optimized
spec:
  containers:
  - name: test
    image: nixery.dev/shell/stress-ng
    command: ["stress-ng", "--cpu", "2", "--timeout", "60s"]
    resources:
      requests:
        cpu: 2
        memory: 1Gi
      limits:
        cpu: 2
        memory: 1Gi
  nodeSelector:
    kubernetes.io/hostname: node-1  # adjust to a real node name
EOF

# 3. Wait for the Pod to become Ready
kubectl wait --for=condition=Ready pod numa-optimized --timeout=60s

# 4. Check CPU pinning of the container's main process (PID 1 inside the Pod)
echo "CPU pinning:"
kubectl exec numa-optimized -- taskset -cp 1

# 5. Check the NUMA policy (requires numactl in the image)
echo "NUMA policy:"
kubectl exec numa-optimized -- numactl --show

# 6. Benchmark
echo "Benchmark:"
kubectl exec numa-optimized -- stress-ng --cpu 2 --timeout 10s --metrics-brief

# 7. Clean up
kubectl delete pod numa-optimized

Memory Performance and OOM Issues

7.3 Memory Performance Analysis

7.3.1 Key Metrics

| Metric | Meaning | Command |
| --- | --- | --- |
| memory.usage_in_bytes | actual usage | cat /sys/fs/cgroup/memory/memory.usage_in_bytes |
| memory.limit_in_bytes | the limit | cat /sys/fs/cgroup/memory/memory.limit_in_bytes |
| page_faults | page-fault count | cat /sys/fs/cgroup/memory/memory.stat |
| rss | resident set size | cat /sys/fs/cgroup/memory/memory.stat |
| cache | page cache | cat /sys/fs/cgroup/memory/memory.stat |

(Paths shown are for cgroup v1; under cgroup v2 the equivalents are memory.current, memory.max, and memory.stat.)

7.3.2 OOMKilled

When a container exceeds its memory limit, the kernel OOM killer terminates the process.

Check with:

kubectl describe pod <pod>

Typical output fragment:

Last State: Terminated
Reason: OOMKilled
Exit Code: 137

Optimization strategies:

| Area | Approach | Example |
| --- | --- | --- |
| Swap interference | set vm.swappiness=0 | sysctl vm.swappiness=0 |
| Cgroup limits | make sure the limit is not smaller than the request | requests: {memory: "256Mi"}, limits: {memory: "512Mi"} |
| GC tuning | Go: GOMEMLIMIT; Java: -XX:+UseG1GC | GOMEMLIMIT=512MiB |
| Temporary files | use emptyDir: { medium: Memory } | emptyDir: { medium: Memory } |
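To find OOM victims across a whole namespace rather than inspecting one pod at a time, the container statuses can be filtered; a sketch assuming `jq` is available (the `oom_victims` helper name is ours, not a kubectl feature):

```shell
#!/bin/sh
# oom_victims: read `kubectl get pods -o json` on stdin and print
# "namespace/pod" for every container whose last termination reason
# was OOMKilled.
oom_victims() {
  jq -r '
    .items[]
    | . as $pod
    | (.status.containerStatuses // [])[]
    | select(.lastState.terminated.reason == "OOMKilled")
    | "\($pod.metadata.namespace)/\($pod.metadata.name)"'
}

# Usage: kubectl get pods -A -o json | oom_victims
```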

Experiment 1: Triggering and Analyzing an OOM Kill

#!/bin/bash
# Trigger and analyze an OOM kill

echo "=== Triggering and analyzing an OOM kill ==="

# 1. Create a memory-limited Pod
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: oom-test
spec:
  containers:
  - name: test
    image: python:3.9
    command: ["python3", "-c"]
    args:
    - |
      import time
      data = []
      for i in range(1000):
          data.append('x' * 1024 * 1024)  # allocate 1MB per iteration
          print(f'Allocated: {i+1}MB')
          time.sleep(0.1)
    resources:
      requests:
        memory: 128Mi
      limits:
        memory: 256Mi
EOF

# 2. Give the Pod time to start and exhaust its limit
sleep 10

# 3. Check the Pod status
echo "Pod status:"
kubectl get pod oom-test

# 4. Check Pod events
echo "Pod events:"
kubectl describe pod oom-test

# 5. Check the kernel OOM log on the node
echo "Kernel OOM log:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "dmesg | grep -i oom | tail -10"

# 6. Check memory statistics (cgroup v1 path)
echo "Memory statistics:"
kubectl exec oom-test -- cat /sys/fs/cgroup/memory/memory.stat 2>/dev/null || echo "Pod already terminated"

# 7. Clean up
kubectl delete pod oom-test

Experiment 2: Simulating and Locating a Memory Leak

#!/bin/bash
# Simulate and locate a memory leak

echo "=== Simulating and locating a memory leak ==="

# 1. Create the leaking test Pod
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: memory-leak-test
spec:
  containers:
  - name: test
    image: golang:1.19
    command: ["go", "run", "/app/main.go"]
    volumeMounts:
    - name: app
      mountPath: /app
  volumes:
  - name: app
    configMap:
      name: memory-leak-app
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: memory-leak-app
data:
  main.go: |
    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    func main() {
        // Report memory statistics every 5 seconds
        go func() {
            for {
                var m runtime.MemStats
                runtime.ReadMemStats(&m)
                fmt.Printf("Alloc = %d KB, Sys = %d KB, NumGC = %d\n",
                    m.Alloc/1024, m.Sys/1024, m.NumGC)
                time.Sleep(5 * time.Second)
            }
        }()

        // Simulate a leak: keep every allocation reachable so the GC
        // can never reclaim it
        var leaked [][]byte
        for i := 0; i < 1000; i++ {
            leaked = append(leaked, make([]byte, 1024*1024)) // 1MB each
            time.Sleep(100 * time.Millisecond)
        }

        time.Sleep(30 * time.Second)
        _ = leaked
    }
EOF

# 2. Wait for the Pod to become Ready
kubectl wait --for=condition=Ready pod memory-leak-test --timeout=60s

# 3. Monitor memory usage
echo "Monitoring memory usage:"
for i in {1..20}; do
  echo "Check #$i:"
  kubectl top pod memory-leak-test
  sleep 3
done

# 4. Check the Pod logs
echo "Pod logs:"
kubectl logs memory-leak-test

# 5. Clean up
kubectl delete pod memory-leak-test
kubectl delete configmap memory-leak-app

I/O and Storage Performance Troubleshooting

7.4 Disk and I/O Performance

Common causes of poor I/O performance:

| Cause | Symptom | Remedy |
| --- | --- | --- |
| Too many OverlayFS layers | high write latency | write to a dedicated volume |
| Inode exhaustion | files cannot be created | clean up temporary files |
| Excessive container log writes | high disk I/O | cap log size |
| Saturated node disk | overall performance drops | expand or clean up the disk |
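For the log-write row, the kubelet can rotate container logs itself; a minimal KubeletConfiguration fragment using the upstream containerLogMaxSize / containerLogMaxFiles fields (the values here are illustrative, not tuned recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Rotate each container log file at 10Mi, keeping at most 5 rotated files
containerLogMaxSize: 10Mi
containerLogMaxFiles: 5
```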

Experiment 1: I/O Benchmarking

#!/bin/bash
# I/O benchmarking

echo "=== I/O benchmarking ==="

# 1. Create the I/O test Pod (kept alive with sleep so fio can be run via exec)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: io-test
spec:
  containers:
  - name: test
    image: nixery.dev/shell/fio
    command: ["sleep", "3600"]
    volumeMounts:
    - name: test-volume
      mountPath: /data
  volumes:
  - name: test-volume
    emptyDir: {}
EOF

# 2. Wait for the Pod to become Ready
kubectl wait --for=condition=Ready pod io-test --timeout=60s

# 3. Random-read test (--size is required so fio can lay out the file)
echo "Random-read test:"
kubectl exec io-test -- fio --name=randread --filename=/data/test.file --size=1G --bs=4k --iodepth=32 --rw=randread --numjobs=4 --time_based --runtime=60 --group_reporting

# 4. Random-write test
echo "Random-write test:"
kubectl exec io-test -- fio --name=randwrite --filename=/data/test.file --size=1G --bs=4k --iodepth=32 --rw=randwrite --numjobs=4 --time_based --runtime=60 --group_reporting

# 5. Mixed read/write test (70% reads)
echo "Mixed test:"
kubectl exec io-test -- fio --name=mixed --filename=/data/test.file --size=1G --bs=4k --iodepth=32 --rw=randrw --rwmixread=70 --numjobs=4 --time_based --runtime=60 --group_reporting

# 6. Clean up
kubectl delete pod io-test

Experiment 2: Storage Performance Tuning

#!/bin/bash
# Storage performance tuning (the ssh steps assume root access to the node
# and a disk named sda; adjust for your environment)

echo "=== Storage performance tuning ==="

# 1. Check the current I/O scheduler
echo "Current I/O scheduler:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "cat /sys/block/sda/queue/scheduler"

# 2. Switch the I/O scheduler
echo "Switching I/O scheduler:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "echo mq-deadline > /sys/block/sda/queue/scheduler"

# 3. Check filesystem types
echo "Filesystem types:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "df -T"

# 4. Tune dirty-page writeback
echo "Tuning kernel parameters:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w vm.dirty_ratio=10"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w vm.dirty_background_ratio=5"

# 5. Create the tuned test Pod (kept alive with sleep so fio runs via exec)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: optimized-io-test
spec:
  containers:
  - name: test
    image: nixery.dev/shell/fio
    command: ["sleep", "3600"]
    volumeMounts:
    - name: test-volume
      mountPath: /data
  volumes:
  - name: test-volume
    emptyDir: {}
EOF

# 6. Wait for the Pod to become Ready
kubectl wait --for=condition=Ready pod optimized-io-test --timeout=60s

# 7. Re-run the benchmark
echo "Benchmark after tuning:"
kubectl exec optimized-io-test -- fio --name=optimized --filename=/data/test.file --size=1G --bs=4k --iodepth=64 --rw=randread --numjobs=8 --time_based --runtime=60 --group_reporting

# 8. Clean up
kubectl delete pod optimized-io-test

Network Performance and CNI Troubleshooting

7.5 Network Performance

7.5.1 The Network Path

Pod → veth pair → bridge (cni0) → host → eth0 → external network

The CNI plugin (Calico, Flannel, Cilium) determines the performance of this path.

7.5.2 Common Problems and How to Troubleshoot Them

| Problem | Likely cause | Tools | Remedy |
| --- | --- | --- | --- |
| High Pod-to-Pod latency | broken CNI routes / wrong MTU | ping / traceroute | check routing tables and MTU settings |
| Slow DNS resolution | overloaded CoreDNS | kubectl logs coredns | tune the CoreDNS configuration |
| Limited egress bandwidth | egress policies / NIC bottleneck | iperf3 / iftop | check network policies and NIC configuration |
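When scripting the latency checks from the table, the ping summary line can be parsed directly; a small sketch (the helper name is ours) for the iputils/BSD "rtt min/avg/max/mdev" summary format:

```shell
#!/bin/sh
# avg_rtt_ms: read `ping` output on stdin and print the average RTT in
# milliseconds, taken from the trailing summary line, e.g.
#   rtt min/avg/max/mdev = 0.045/0.067/0.089/0.012 ms
avg_rtt_ms() {
  awk -F'/' '/^(rtt|round-trip)/ {print $5}'
}

# Usage: kubectl exec <pod> -- ping -c 10 <peer-ip> | avg_rtt_ms
```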

Experiment 1: Network Benchmarking

#!/bin/bash
# Network benchmarking

echo "=== Network benchmarking ==="

# 1. Create the network test Pods (each replica runs an iperf3 server)
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: network-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: network-test
  template:
    metadata:
      labels:
        app: network-test
    spec:
      containers:
      - name: test
        image: nixery.dev/shell/iperf3
        command: ["iperf3", "-s"]
        ports:
        - containerPort: 5201
---
apiVersion: v1
kind: Service
metadata:
  name: network-test
spec:
  selector:
    app: network-test
  ports:
  - port: 5201
    targetPort: 5201
EOF

# 2. Wait for the Pods to become Ready
kubectl wait --for=condition=Ready pod -l app=network-test --timeout=60s

# 3. Get the Pod IPs
POD1_IP=$(kubectl get pod -l app=network-test -o jsonpath='{.items[0].status.podIP}')
POD2_IP=$(kubectl get pod -l app=network-test -o jsonpath='{.items[1].status.podIP}')

echo "Pod 1 IP: $POD1_IP"
echo "Pod 2 IP: $POD2_IP"

# 4. Measure Pod-to-Pod latency
echo "Latency:"
kubectl exec $(kubectl get pod -l app=network-test -o jsonpath='{.items[0].metadata.name}') -- ping -c 10 $POD2_IP

# 5. Measure bandwidth (client in Pod 1 against the server in Pod 2)
echo "Bandwidth:"
kubectl exec $(kubectl get pod -l app=network-test -o jsonpath='{.items[0].metadata.name}') -- iperf3 -c $POD2_IP -t 30

# 6. Test external connectivity
echo "External connectivity:"
kubectl exec $(kubectl get pod -l app=network-test -o jsonpath='{.items[0].metadata.name}') -- ping -c 5 8.8.8.8

# 7. Clean up
kubectl delete deployment network-test
kubectl delete service network-test

Experiment 2: Network Performance Tuning

#!/bin/bash
# Network performance tuning (the ssh steps assume root access to the node;
# adjust interface names and values for your environment)

echo "=== Network performance tuning ==="

# 1. Check the current MTU settings
echo "Current MTU settings:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "ip link show | grep mtu"

# 2. Set the MTU
echo "Setting MTU:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "ip link set dev eth0 mtu 1500"

# 3. Tune TCP parameters
echo "Tuning TCP parameters:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.core.somaxconn=65535"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.ipv4.tcp_tw_reuse=1"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.ipv4.tcp_fin_timeout=10"

# 4. Enlarge network buffers
echo "Enlarging network buffers:"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.core.rmem_max=16777216"
kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}' | xargs -I {} ssh {} "sysctl -w net.core.wmem_max=16777216"

# 5. Create the tuned test Pods (each replica runs an iperf3 server)
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: optimized-network-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: optimized-network-test
  template:
    metadata:
      labels:
        app: optimized-network-test
    spec:
      containers:
      - name: test
        image: nixery.dev/shell/iperf3
        command: ["iperf3", "-s"]
        ports:
        - containerPort: 5201
---
apiVersion: v1
kind: Service
metadata:
  name: optimized-network-test
spec:
  selector:
    app: optimized-network-test
  ports:
  - port: 5201
    targetPort: 5201
EOF

# 6. Wait for the Pods to become Ready
kubectl wait --for=condition=Ready pod -l app=optimized-network-test --timeout=60s

# 7. Re-run the benchmark
echo "Benchmark after tuning:"
POD1_IP=$(kubectl get pod -l app=optimized-network-test -o jsonpath='{.items[0].status.podIP}')
POD2_IP=$(kubectl get pod -l app=optimized-network-test -o jsonpath='{.items[1].status.podIP}')

kubectl exec $(kubectl get pod -l app=optimized-network-test -o jsonpath='{.items[0].metadata.name}') -- iperf3 -c $POD2_IP -t 30

# 8. Clean up
kubectl delete deployment optimized-network-test
kubectl delete service optimized-network-test

Lab Environment Setup

Quick Setup Script

#!/bin/bash
# Kubernetes performance-tuning lab environment setup script

set -e

echo "Setting up the Kubernetes performance-tuning lab environment..."

# 1. Create the lab cluster with kind
kind create cluster --name performance-test --config - <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/12"
EOF

# 2. Wait for the cluster to become Ready
kubectl wait --for=condition=Ready node --all --timeout=60s

# 3. Deploy the test tooling
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: performance-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: performance-tools
  template:
    metadata:
      labels:
        app: performance-tools
    spec:
      containers:
      - name: tools
        image: nixery.dev/shell/stress-ng
        command: ["sleep", "3600"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fio-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fio-tools
  template:
    metadata:
      labels:
        app: fio-tools
    spec:
      containers:
      - name: fio
        image: nixery.dev/shell/fio
        command: ["sleep", "3600"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: network-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: network-tools
  template:
    metadata:
      labels:
        app: network-tools
    spec:
      containers:
      - name: tools
        image: nixery.dev/shell/iperf3
        command: ["sleep", "3600"]
EOF

# 4. Create the test workloads
kubectl apply -f - <<EOF
# Test workload 1: CPU-intensive
apiVersion: v1
kind: Pod
metadata:
  name: cpu-test
spec:
  containers:
  - name: test
    image: nixery.dev/shell/stress-ng
    command: ["stress-ng", "--cpu", "2", "--timeout", "60s"]
    resources:
      requests:
        cpu: 1
        memory: 128Mi
      limits:
        cpu: 2
        memory: 256Mi
---
# Test workload 2: memory-intensive
apiVersion: v1
kind: Pod
metadata:
  name: memory-test
spec:
  containers:
  - name: test
    image: nixery.dev/shell/stress-ng
    command: ["stress-ng", "--vm", "1", "--vm-bytes", "256M", "--timeout", "60s"]
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 200m
        memory: 512Mi
---
# Test workload 3: I/O-intensive
apiVersion: v1
kind: Pod
metadata:
  name: io-test
spec:
  containers:
  - name: test
    image: nixery.dev/shell/fio
    command: ["fio", "--name=test", "--filename=/data/test.file", "--size=1G", "--bs=4k", "--iodepth=32", "--rw=randread", "--numjobs=4", "--time_based", "--runtime=60", "--group_reporting"]
    volumeMounts:
    - name: test-volume
      mountPath: /data
  volumes:
  - name: test-volume
    emptyDir: {}
EOF

echo "Kubernetes performance-tuning lab environment is ready!"
echo "Run 'kubectl get nodes' to check the cluster"
echo "Run 'kubectl get pods' to check the Pods"

Command Cheat Sheet

Performance Monitoring

| Command | Purpose | Example |
| --- | --- | --- |
| kubectl top pod | Pod resource usage | kubectl top pod |
| kubectl top node | node resource usage | kubectl top node |
| kubectl describe pod <pod> | Pod details | kubectl describe pod my-pod |
| kubectl logs <pod> | Pod logs | kubectl logs my-pod |
| kubectl get events --sort-by=.lastTimestamp | cluster events | kubectl get events --sort-by=.lastTimestamp |

System Monitoring

| Command | Purpose | Example |
| --- | --- | --- |
| top | system monitor | top |
| htop | enhanced system monitor | htop |
| iostat -x 1 | I/O statistics | iostat -x 1 |
| vmstat 1 | virtual-memory statistics | vmstat 1 |
| sar -n DEV 1 | network statistics | sar -n DEV 1 |
| perf stat | performance counters | perf stat ./myapp |

Network Testing

| Command | Purpose | Example |
| --- | --- | --- |
| ping <ip> | connectivity | ping 8.8.8.8 |
| traceroute <ip> | network path | traceroute 8.8.8.8 |
| iperf3 -c <server> | bandwidth test (client) | iperf3 -c server |
| iperf3 -s | bandwidth test (server) | iperf3 -s |
| nslookup <hostname> | DNS resolution | nslookup kubernetes.default |

Storage Testing

| Command | Purpose | Where |
| --- | --- | --- |
| fio --name=test --filename=/data/test.file --bs=4k --iodepth=32 --rw=randread --numjobs=4 --time_based --runtime=60 --group_reporting | random-read test | run inside a Pod |
| fio --name=test --filename=/data/test.file --bs=4k --iodepth=32 --rw=randwrite --numjobs=4 --time_based --runtime=60 --group_reporting | random-write test | run inside a Pod |
| dd if=/dev/zero of=/data/test.file bs=1M count=1000 | sequential-write test | run inside a Pod |
| dd if=/data/test.file of=/dev/null bs=1M | sequential-read test | run inside a Pod |
