Chapter 3: Service Mesh
What Is a Service Mesh
Definition
Service Mesh: a dedicated infrastructure layer for handling service-to-service communication. A sidecar proxy intercepts all network traffic, so traffic management, security, and monitoring can be added without modifying application code.
Core Concepts
The Sidecar pattern:
Traditional approach:
┌──────────────┐
│  Service A   │ ──direct call──> Service B
│ (comms logic │
│  built in)   │
└──────────────┘
Service Mesh:
┌──────────────┐            ┌──────────────┐
│  Service A   │            │  Service B   │
│ (pure        │            │ (pure        │
│  business)   │            │  business)   │
└──────────────┘            └──────────────┘
       ↓                           ↑
┌──────────────┐            ┌──────────────┐
│  Sidecar A   │──network──>│  Sidecar B   │
│ (comms logic)│    call    │ (comms logic)│
└──────────────┘            └──────────────┘
The sidecar handles:
- Traffic routing
- Load balancing
- Circuit breaking and rate limiting
- Transport security (encryption)
- Monitoring and tracing
Architecture
┌──────────────────────────────────────────┐
│              Control Plane               │
│   (manages and configures the sidecars)  │
│                                          │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐   │
│  │  Pilot  │  │  Mixer  │  │ Citadel │   │
│  │(traffic)│  │(policy) │  │(security)│  │
│  └─────────┘  └─────────┘  └─────────┘   │
└──────────────────────────────────────────┘
             ↓ configuration push
┌──────────────────────────────────────────┐
│               Data Plane                 │
│    (actually processes the traffic)      │
│                                          │
│   ┌──────────┐      ┌──────────┐         │
│   │Service A │      │Service B │         │
│   │    +     │←──→ │    +     │          │
│   │ Sidecar  │      │ Sidecar  │         │
│   │ (Envoy)  │      │ (Envoy)  │         │
│   └──────────┘      └──────────┘         │
└──────────────────────────────────────────┘
(Pilot/Mixer/Citadel is the classic pre-1.5 Istio layout; since Istio 1.5 these functions have been consolidated into the single Istiod binary described below.)
Service Mesh vs API Gateway
| Dimension | API Gateway | Service Mesh |
|---|---|---|
| Traffic direction | North-south (external → internal) | East-west (service → service) |
| Deployment model | Centralized (gateway cluster) | Distributed (sidecars) |
| Role | External API management | Internal service communication |
| Invasiveness | Non-invasive | Non-invasive |
| Typical use | Exposing APIs externally | Internal microservices |
Using both together:
External Client
      ↓
┌────────────┐
│API Gateway │ ← north-south traffic
└────────────┘
      ↓
┌──────────────────────────┐
│       Service Mesh       │
│                          │
│  Service A ←→ Service B  │ ← east-west traffic
│      ↓           ↓       │
│  Service C ←→ Service D  │
└──────────────────────────┘
Istio Architecture
Core Components
Control plane:
┌─────────────────────────────────────┐
│               Istiod                │
│ (unified control plane, Istio 1.5+) │
│                                     │
│  ┌───────────────────────────────┐  │
│  │ Pilot                         │  │ service discovery, traffic management
│  │ - Service discovery           │  │
│  │ - Traffic routing config      │  │
│  │ - Resilience config           │  │
│  │   (timeouts, retries)         │  │
│  └───────────────────────────────┘  │
│                                     │
│  ┌───────────────────────────────┐  │
│  │ Citadel                       │  │ certificate management, security
│  │ - Certificate issuance        │  │
│  │ - mTLS                        │  │
│  │ - Authentication/authorization│  │
│  └───────────────────────────────┘  │
│                                     │
│  ┌───────────────────────────────┐  │
│  │ Galley                        │  │ configuration management
│  │ - Config validation           │  │
│  │ - Config distribution         │  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
Data plane:
An Envoy sidecar is injected into every Pod:
┌─────────────────────────┐
│          Pod            │
│                         │
│   ┌────────────────┐    │
│   │  Application   │    │
│   │   Container    │    │
│   └────────────────┘    │
│          ↕              │
│   ┌────────────────┐    │
│   │     Envoy      │    │ ← sidecar proxy
│   │     Proxy      │    │
│   └────────────────┘    │
└─────────────────────────┘
Envoy handles:
- Traffic interception
- Load balancing
- Circuit breaking and rate limiting
- mTLS encryption
- Metrics collection
Istio in Practice
Installing Istio
1. Download Istio:
# Download the latest release
curl -L https://istio.io/downloadIstio | sh -
cd istio-1.20.0
export PATH=$PWD/bin:$PATH
2. Install Istio:
# Install the demo profile (includes all components)
istioctl install --set profile=demo -y
# Output:
Istio core installed
Istiod installed
Ingress gateways installed
Egress gateways installed
Installation complete
3. Enable automatic sidecar injection:
# Enable auto-injection for a namespace
kubectl label namespace default istio-injection=enabled
4. Verify the installation:
# List the Istio components
kubectl get pods -n istio-system
# Output:
NAME                      READY  STATUS   RESTARTS  AGE
istio-ingressgateway-xxx  1/1    Running  0         2m
istiod-xxx                1/1    Running  0         2m
Deploying a Sample Application
Application architecture:
┌──────────┐
│ Gateway  │
└──────────┘
     ↓
┌──────────┐
│Frontend  │
└──────────┘
     ↓
┌──────────┐     ┌──────────┐
│Product   │────→│Reviews   │
│Service   │     │Service   │
└──────────┘     └──────────┘
                      ↓
                 ┌──────────┐
                 │Ratings   │
                 │Service   │
                 └──────────┘
Deployment YAML:
# productpage.yaml
apiVersion: v1
kind: Service
metadata:
name: productpage
spec:
ports:
- port: 9080
name: http
selector:
app: productpage
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: productpage-v1
spec:
replicas: 1
selector:
matchLabels:
app: productpage
version: v1
template:
metadata:
labels:
app: productpage
version: v1
spec:
containers:
- name: productpage
image: docker.io/istio/examples-bookinfo-productpage-v1:1.17.0
ports:
- containerPort: 9080
---
# reviews.yaml
apiVersion: v1
kind: Service
metadata:
name: reviews
spec:
ports:
- port: 9080
name: http
selector:
app: reviews
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: reviews-v1
spec:
replicas: 1
selector:
matchLabels:
app: reviews
version: v1
template:
metadata:
labels:
app: reviews
version: v1
spec:
containers:
- name: reviews
image: docker.io/istio/examples-bookinfo-reviews-v1:1.17.0
ports:
- containerPort: 9080
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: reviews-v2
spec:
replicas: 1
selector:
matchLabels:
app: reviews
version: v2
template:
metadata:
labels:
app: reviews
version: v2
spec:
containers:
- name: reviews
image: docker.io/istio/examples-bookinfo-reviews-v2:1.17.0
ports:
- containerPort: 9080
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: reviews-v3
spec:
replicas: 1
selector:
matchLabels:
app: reviews
version: v3
template:
metadata:
labels:
app: reviews
version: v3
spec:
containers:
- name: reviews
image: docker.io/istio/examples-bookinfo-reviews-v3:1.17.0
ports:
- containerPort: 9080
Deploy:
# Deploy the application
kubectl apply -f productpage.yaml
kubectl apply -f reviews.yaml
# Verify the Pods (each Pod has 2 containers: the app + the Envoy sidecar)
kubectl get pods
# Output:
NAME                READY  STATUS   RESTARTS  AGE
productpage-v1-xxx  2/2    Running  0         1m
reviews-v1-xxx      2/2    Running  0         1m
reviews-v2-xxx      2/2    Running  0         1m
reviews-v3-xxx      2/2    Running  0         1m
Configuring the Gateway
Istio Gateway (ingress gateway):
# gateway.yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
name: bookinfo-gateway
spec:
selector:
    istio: ingressgateway # use Istio's default ingress gateway
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- "*"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: bookinfo
spec:
hosts:
- "*"
gateways:
- bookinfo-gateway
http:
- match:
- uri:
exact: /productpage
- uri:
prefix: /static
route:
- destination:
host: productpage
port:
number: 9080
Deploy and access:
# Deploy the Gateway
kubectl apply -f gateway.yaml
# Get the ingress gateway address
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
# Access the application
curl http://$INGRESS_HOST:$INGRESS_PORT/productpage
Traffic Management
1. Traffic Routing (VirtualService)
Weight-based routing (canary release):
# virtualservice-reviews-90-10.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: reviews
spec:
hosts:
- reviews
http:
- route:
- destination:
host: reviews
subset: v1
      weight: 90 # 90% of traffic to v1
- destination:
host: reviews
subset: v2
      weight: 10 # 10% of traffic to v2
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: reviews
spec:
host: reviews
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
version: v2
- name: v3
labels:
version: v3
Header-based routing:
# virtualservice-reviews-header.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: reviews
spec:
hosts:
- reviews
http:
- match:
- headers:
user:
          exact: jason # route user "jason" to v2
route:
- destination:
host: reviews
subset: v2
  - route: # all other users go to v1
- destination:
host: reviews
subset: v1
URI-based routing:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: reviews
spec:
hosts:
- reviews
http:
- match:
- uri:
          prefix: /api/v2 # paths starting with /api/v2 go to v2
route:
- destination:
host: reviews
subset: v2
  - route: # everything else goes to v1
- destination:
host: reviews
subset: v1
2. Timeouts and Retries
Timeout configuration:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: reviews
spec:
hosts:
- reviews
http:
- route:
- destination:
host: reviews
subset: v1
    timeout: 3s # 3-second timeout
Retry configuration:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: reviews
spec:
hosts:
- reviews
http:
- route:
- destination:
host: reviews
subset: v1
    retries:
      attempts: 3       # retry up to 3 times
      perTryTimeout: 1s # 1-second timeout per attempt
      retryOn: 5xx,reset,connect-failure # retry conditions
3. Circuit Breaking
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: reviews
spec:
host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100         # max TCP connections
      http:
        http1MaxPendingRequests: 50 # max pending HTTP/1.1 requests
        http2MaxRequests: 100       # max concurrent HTTP/2 requests
        maxRequestsPerConnection: 2 # max requests per connection
    outlierDetection:
      consecutive5xxErrors: 5       # eject a host after 5 consecutive 5xx errors
                                    # (consecutiveErrors is the deprecated older field name)
      interval: 30s                 # scan interval
      baseEjectionTime: 30s         # how long an ejected host stays out
      maxEjectionPercent: 50        # eject at most 50% of the instances
      minHealthPercent: 50          # keep at least 50% of instances in the pool
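Outlier detection is a per-host circuit breaker: hosts that keep failing are temporarily removed from the load-balancing pool. A toy Go version of the consecutive-error ejection rule configured above (names are illustrative, not Envoy's implementation):

```go
package main

import "fmt"

// ejector tracks consecutive 5xx errors per host and ejects a host from
// the pool once the threshold is crossed — a simplified analogue of
// Envoy's outlierDetection.consecutive5xxErrors.
type ejector struct {
	threshold   int
	consecutive map[string]int
	ejected     map[string]bool
}

func newEjector(threshold int) *ejector {
	return &ejector{threshold, map[string]int{}, map[string]bool{}}
}

// observe records one response; any non-5xx resets the host's counter.
func (e *ejector) observe(host string, status int) {
	if status >= 500 {
		e.consecutive[host]++
		if e.consecutive[host] >= e.threshold {
			e.ejected[host] = true // leaves the pool for baseEjectionTime
		}
	} else {
		e.consecutive[host] = 0
	}
}

func main() {
	e := newEjector(5) // consecutive5xxErrors: 5
	for i := 0; i < 5; i++ {
		e.observe("10.0.1.5:9080", 503)
	}
	e.observe("10.0.1.6:9080", 200)
	fmt.Println(e.ejected["10.0.1.5:9080"], e.ejected["10.0.1.6:9080"]) // true false
}
```

The real mechanism also re-admits hosts after baseEjectionTime and respects maxEjectionPercent/minHealthPercent so that a widespread outage cannot empty the pool.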
4. Traffic Mirroring
Copy production traffic to a test environment:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: reviews
spec:
hosts:
- reviews
http:
- route:
- destination:
host: reviews
subset: v1
      weight: 100    # 100% of traffic to v1
    mirror:
      host: reviews
      subset: v2     # mirror to v2 (its responses are discarded)
    mirrorPercentage:
      value: 100     # mirror 100% of the traffic
Use cases:
- Testing a new version (v2 receives real traffic, but its responses are ignored)
- Load testing (replay production traffic against a test environment)
- Debugging (observe the new version's behavior on real traffic)
5. Fault Injection
Delay injection:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: reviews
spec:
hosts:
- reviews
http:
  - fault:
      delay:
        percentage:
          value: 10      # delay 10% of requests
        fixedDelay: 5s   # by 5 seconds
route:
- destination:
host: reviews
subset: v1
Abort injection:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: reviews
spec:
hosts:
- reviews
http:
  - fault:
      abort:
        percentage:
          value: 10      # fail 10% of requests
        httpStatus: 500  # with HTTP 500
route:
- destination:
host: reviews
subset: v1
Use cases:
- Chaos engineering (testing system resilience)
- Verifying timeout and retry behavior
- Testing fallback/degradation logic
Security
1. mTLS (Mutual TLS)
Enabling mTLS (automatically encrypts service-to-service traffic):
# Enable mTLS mesh-wide
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: default
namespace: istio-system
spec:
mtls:
    mode: STRICT # enforce mTLS
# Enable mTLS for a specific service
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: reviews-mtls
namespace: default
spec:
selector:
matchLabels:
app: reviews
mtls:
mode: STRICT
mTLS modes:
DISABLE: mTLS off
PERMISSIVE: accept both plaintext and mTLS (the default)
STRICT: mTLS required (recommended for production)
Verify mTLS:
# Check mTLS status (istioctl authn tls-check existed in pre-1.5 releases;
# on current Istio, istioctl x describe pod <pod> reports the mTLS mode instead)
istioctl authn tls-check productpage-v1-xxx.default reviews.default.svc.cluster.local
# Output:
HOST:PORT                               STATUS  SERVER  CLIENT  AUTHN POLICY
reviews.default.svc.cluster.local:9080  OK      STRICT  ISTIO   default/default
2. Authentication and Authorization
JWT authentication:
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
name: jwt-auth
namespace: default
spec:
selector:
matchLabels:
app: productpage
jwtRules:
- issuer: "https://example.com"
jwksUri: "https://example.com/.well-known/jwks.json"
audiences:
- "productpage"
Role-based access control (RBAC):
# Deny all traffic
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: deny-all
namespace: default
spec: {} # an empty spec denies everything
---
# Allow specific traffic
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: allow-productpage
namespace: default
spec:
selector:
matchLabels:
app: reviews
action: ALLOW
rules:
- from:
- source:
principals: ["cluster.local/ns/default/sa/productpage"]
to:
- operation:
methods: ["GET"]
paths: ["/reviews/*"]
User-based authorization:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: admin-only
namespace: default
spec:
selector:
matchLabels:
app: admin-service
action: ALLOW
rules:
- when:
- key: request.auth.claims[role]
values: ["admin"]
Observability
1. Metrics (Prometheus + Grafana)
Install the monitoring add-ons:
# Install Prometheus and Grafana
kubectl apply -f samples/addons/prometheus.yaml
kubectl apply -f samples/addons/grafana.yaml
# Open Grafana
istioctl dashboard grafana
Key metrics:
Service level:
- istio_requests_total: total request count
- istio_request_duration_milliseconds: request latency
- istio_request_bytes: request size
- istio_response_bytes: response size
Mesh level:
- pilot_xds_pushes: configuration pushes
- envoy_cluster_upstream_cx_active: active connections
- envoy_cluster_upstream_cx_total: total connections
Example queries:
# Requests per second (QPS)
rate(istio_requests_total[1m])
# P95 latency
histogram_quantile(0.95, sum(rate(istio_request_duration_milliseconds_bucket[1m])) by (le))
# Error rate
sum(rate(istio_requests_total{response_code=~"5.*"}[1m])) /
sum(rate(istio_requests_total[1m]))
2. Distributed Tracing (Jaeger)
Install Jaeger:
kubectl apply -f samples/addons/jaeger.yaml
# Open the Jaeger UI
istioctl dashboard jaeger
Example trace:
Request path:
Gateway → productpage → reviews → ratings
Trace ID: abc123
├─ productpage (100ms)
│   ├─ reviews (80ms)
│   │   └─ ratings (60ms)
│   └─ ...
└─ Total: 100ms
Propagating trace headers in the application:
// Go example: forward the tracing headers so spans can be stitched together
func forwardRequest(w http.ResponseWriter, r *http.Request) {
	// Build the downstream request
	proxyReq, err := http.NewRequest("GET", "http://reviews:9080/reviews", nil)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	// Copy the tracing headers from the inbound request
	tracingHeaders := []string{
		"x-request-id",
		"x-b3-traceid",
		"x-b3-spanid",
		"x-b3-parentspanid",
		"x-b3-sampled",
		"x-b3-flags",
		"x-ot-span-context",
	}
	for _, header := range tracingHeaders {
		if val := r.Header.Get(header); val != "" {
			proxyReq.Header.Set(header, val)
		}
	}
	// Send the request
	client := &http.Client{}
	resp, err := client.Do(proxyReq)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	// ...
}
3. Logging (Fluentd + Elasticsearch)
Enable access logging:
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  accessLogging:
  - providers:
    - name: envoy
    disabled: false
Log format:
{
"start_time": "2025-11-13T10:00:00.000Z",
"method": "GET",
"path": "/reviews/123",
"protocol": "HTTP/1.1",
"response_code": 200,
"response_flags": "-",
"bytes_received": 0,
"bytes_sent": 1234,
"duration": 25,
"upstream_service_time": "23",
"x_forwarded_for": "10.0.0.1",
"user_agent": "curl/7.64.1",
"request_id": "abc-123-def",
"authority": "reviews:9080",
"upstream_host": "10.0.1.5:9080",
"upstream_cluster": "outbound|9080||reviews.default.svc.cluster.local"
}
Interview Q&A
What is the difference between a Service Mesh and an API Gateway?
Answer:
| Dimension | API Gateway | Service Mesh |
|---|---|---|
| Traffic type | North-south (external → internal) | East-west (service → service) |
| Deployment model | Centralized (gateway cluster) | Distributed (sidecars) |
| Main features | AuthN/Z, rate limiting, routing | Traffic management, security, observability |
| Typical use | Exposing external APIs | Internal service communication |
| Performance cost | Central bottleneck | Distributed; per-Pod overhead |
Analogy:
API Gateway = the security desk at the front door
- Checks everyone entering the building
- Centrally managed
Service Mesh = the building's internal intercom system
- Manages communication between floors
- Every floor has its own handset (the sidecar)
Best practice:
Use them together:
External → API Gateway (north-south traffic)
    ↓
Internal → Service Mesh (east-west traffic)
How does Istio inject the sidecar?
Answer:
Injection methods:
1. Automatic injection (recommended)
# Enable auto-injection for a namespace
kubectl label namespace default istio-injection=enabled
# Pods created in that namespace afterwards get the sidecar automatically
kubectl apply -f deployment.yaml
# Verify (the Pod should have 2 containers)
kubectl get pods
NAME       READY  STATUS   RESTARTS  AGE
myapp-xxx  2/2    Running  0         1m
How it works: a Kubernetes admission webhook
1. The Pod creation request reaches the API server
2. Istio's mutating webhook intercepts the request
3. The webhook rewrites the Pod spec, adding the Envoy container
4. The modified Pod is created
2. Manual injection
# Manual injection (emits a new YAML)
istioctl kube-inject -f deployment.yaml | kubectl apply -f -
# or
kubectl apply -f <(istioctl kube-inject -f deployment.yaml)
What gets injected:
# Original Pod
spec:
  containers:
  - name: myapp
    image: myapp:v1
# After injection
spec:
  initContainers:
  - name: istio-init   # init container (sets up iptables)
    image: docker.io/istio/proxyv2:1.20.0
    # ...
  containers:
  - name: myapp
    image: myapp:v1
  - name: istio-proxy  # sidecar proxy
    image: docker.io/istio/proxyv2:1.20.0
    # ...
Traffic interception:
The istio-init container installs iptables rules that:
1. Redirect all outbound Pod traffic → Envoy (port 15001)
2. Redirect all inbound Pod traffic → Envoy (port 15006)
3. Envoy processes the traffic and forwards it to the application
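The three steps above boil down to NAT redirect rules. A simplified sketch of the kind of rules istio-init installs (the real rules are generated by Istio's iptables tooling with dedicated ISTIO_* chains and many more exclusions; this is illustrative only):

```shell
# Exempt Envoy's own traffic first: the proxy runs as a dedicated UID
# (1337 by default), so its upstream calls aren't looped back into itself
iptables -t nat -A OUTPUT -p tcp -m owner --uid-owner 1337 -j RETURN

# Outbound: everything else the Pod sends is redirected to Envoy's
# outbound listener on port 15001
iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-ports 15001

# Inbound: everything arriving at the Pod is redirected to Envoy's
# inbound listener on port 15006
iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-ports 15006
```

Because these rules live in the Pod's own network namespace, they affect only that Pod, and the application never needs to know the proxy exists.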
How do you do a canary release?
Answer:
Option 1: weight-based
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90 # 90% of traffic
    - destination:
        host: reviews
        subset: v2
      weight: 10 # 10% of traffic (the canary)
Rollout stages:
Stage 1: 10% of traffic → v2 (watch the metrics)
Stage 2: 50% of traffic → v2
Stage 3: 100% of traffic → v2 (full cutover)
Option 2: user-based
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        user:
          exact: beta-tester # beta users
    route:
    - destination:
        host: reviews
        subset: v2
  - route: # everyone else
    - destination:
        host: reviews
        subset: v1
Option 3: region-based
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        x-region:
          exact: beijing # Beijing region
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
How does Istio's mTLS work?
Answer:
How it works:
Service A → Service B:
1. Service A's Envoy initiates the request
   ├─ using a certificate issued by Citadel
   └─ and establishes an mTLS connection
2. Service B's Envoy receives the request
   ├─ verifies the certificate
   ├─ decrypts the traffic
   └─ and forwards it to the Service B application
Benefits:
- Automatic encryption (no application code changes)
- Automatic certificate rotation
- Mutual verification (prevents man-in-the-middle attacks)
Certificate management:
Citadel (the certificate authority):
1. Generates a certificate for each service
2. Rotates them automatically (every 24 hours by default)
3. Distributes them to the Envoys
Certificate delivery:
- Kubernetes Secrets
- or directly via SDS (Secret Discovery Service)
Configuring mTLS:
# STRICT mode (mTLS required)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
# PERMISSIVE mode (accepts both plaintext and mTLS)
spec:
  mtls:
    mode: PERMISSIVE
Verifying mTLS:
# Inspect the certificate
kubectl exec -it productpage-v1-xxx -c istio-proxy -- \
  openssl s_client -showcerts -connect reviews:9080
# Check mTLS status (istioctl authn tls-check existed in pre-1.5 releases;
# on current Istio, istioctl x describe pod <pod> reports the mTLS mode instead)
istioctl authn tls-check productpage-v1-xxx.default reviews.default.svc.cluster.local
How much overhead does a Service Mesh add?
Answer:
Performance cost:
Added latency:
- P50: +1-2ms
- P99: +5-10ms
Causes:
1. Sidecar proxy processing time
2. mTLS encryption/decryption
3. Policy checks
Resource usage:
- CPU: roughly 0.1-0.5 cores per sidecar
- Memory: roughly 50-200MB per sidecar
Benchmark figures (from Istio's published performance tests; exact numbers vary by version and workload):
Without a service mesh:
- QPS: 10,000
- P99 latency: 10ms
- CPU: 2 cores
- Memory: 1GB
With a service mesh (mTLS + telemetry):
- QPS: 9,500 (5% lower)
- P99 latency: 18ms (+8ms)
- CPU: 3 cores (+1 core)
- Memory: 2GB (+1GB)
Tuning tips:
1. Turn off features you don't need:
   - Lower the telemetry sampling rate
   - Disable unused plugins
2. Tune the sidecar's resource limits:
   resources:
     limits:
       cpu: 200m
       memory: 128Mi
3. Use CPU affinity:
   Keep the sidecar from competing with the application for CPU
4. Deploy selectively:
   Not every service needs to be in the mesh;
   enable it only for the services that matter