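04-Capabilities and Security Mechanisms

Learning Objectives

  • Understand the Linux Capabilities mechanism in depth
  • Master Seccomp system call filtering
  • Understand mandatory access control with AppArmor and SELinux
  • Be able to configure container security policies
  • Learn best practices for hardening containers

Prerequisites

  • Linux user and permission management
  • System call basics
  • Security model concepts
  • Container fundamentals

1. Container Security Overview

1.1 Container Security Challenges

Containers provide isolation, but they share the host kernel, so security risks remain: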

graph TD
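    A[Container security risks] --> B[Privilege escalation]
    A --> C[Resource abuse]
    A --> D[Network attacks]
    A --> E[Data leakage]
    
    B --> B1[Container escape to the host]
    B --> B2[Abuse of privileged containers]
    
    C --> C1[CPU/memory exhaustion]
    C --> C2[Disk space exhaustion]
    
    D --> D1[Exposed network ports]
    D --> D2[Man-in-the-middle attacks]
    
    E --> E1[Sensitive data leakage]
    E --> E2[Exposed configuration]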

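1.2 Container Security Defense Layers

graph LR
    A[Container security] --> B[Privilege control]
    A --> C[System call filtering]
    A --> D[Mandatory access control]
    A --> E[Resource limits]
    A --> F[Network isolation]
    
    B --> B1[Capabilities]
    C --> C1[Seccomp]
    D --> D1[AppArmor/SELinux]
    E --> E1[CGroup]
    F --> F1[Network Namespace]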

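2. Linux Capabilities in Detail

2.1 The Problem with the Traditional Permission Model

The traditional Linux permission model is binary:

  • root: holds every privilege
  • Regular users: restricted privileges

This model is a poor fit for containers:

  • Some services inside a container need root privileges to run
  • Giving the container full root privileges is too dangerous

2.2 The Capabilities Solution

Capabilities split root's privileges into independent units that can be granted or dropped individually:

graph TD
    A[Traditional root privileges] --> B[Split into Capabilities]
    B --> C[CAP_NET_ADMIN - network administration]
    B --> D[CAP_SYS_ADMIN - system administration]
    B --> E[CAP_SYS_PTRACE - process debugging]
    B --> F[CAP_MKNOD - create device files]
    B --> G[CAP_SYS_CHROOT - change the root directory]
    B --> H[CAP_DAC_OVERRIDE - bypass file permission checks]
    B --> I[CAP_SETUID - set user IDs]
    B --> J[CAP_SYS_MODULE - load kernel modules]
    B --> K[... about 40 capabilities in total]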

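2.3 Common Capabilities

| Capability | Purpose | Risk level | Needed in containers? |
|---|---|---|---|
| CAP_NET_ADMIN | Network interface management | High | Usually not |
| CAP_SYS_ADMIN | System-level administration | Extremely high | Usually not |
| CAP_SYS_PTRACE | Debugging processes | High | Only when debugging |
| CAP_MKNOD | Creating device files | Medium | Usually not |
| CAP_SYS_CHROOT | Changing the root directory | Medium | Needed by the container runtime |
| CAP_DAC_OVERRIDE | Bypassing file permission checks | High | Usually not |
| CAP_SETUID | Setting user IDs | High | Usually not |
| CAP_SYS_MODULE | Loading kernel modules | Extremely high | Never |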

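2.4 Capabilities in Practice

2.4.1 Viewing Process Capabilities

# Show the capability sets of the current process (hex bitmasks)
grep Cap /proc/self/status

# Show the file capabilities of a binary with getcap
getcap /bin/ping
# Example output (exact format depends on the libcap version): /bin/ping = cap_net_raw+ep

# Print the current capability state in readable form with capsh
capsh --print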
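
The Cap* lines in /proc/self/status are hexadecimal bitmasks, where bit N stands for capability number N. As a rough illustration, the Go sketch below decodes such a mask into names. The name table follows the numbering in linux/capability.h (bits 0 through 40); `decodeCaps` is a hypothetical helper written for this example, not part of any library.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capNames lists capability names by bit number, following the order of
// the constants in linux/capability.h (bits 0 through 40).
var capNames = []string{
	"CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
	"CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
	"CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
	"CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
	"CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
	"CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
	"CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
	"CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
	"CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
	"CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
	"CAP_CHECKPOINT_RESTORE",
}

// decodeCaps returns the names of all capabilities whose bit is set in mask,
// in ascending bit order. Unknown bits are reported as CAP_<number>.
func decodeCaps(mask uint64) []string {
	var out []string
	for bit := 0; bit < 64; bit++ {
		if mask&(uint64(1)<<uint(bit)) == 0 {
			continue
		}
		if bit < len(capNames) {
			out = append(out, capNames[bit])
		} else {
			out = append(out, fmt.Sprintf("CAP_%d", bit))
		}
	}
	return out
}

func main() {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status (Linux only):", err)
		return
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// Lines of interest look like "CapEff:\t00000000a80425fb".
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 || !strings.HasPrefix(fields[0], "Cap") {
			continue
		}
		mask, err := strconv.ParseUint(fields[1], 16, 64)
		if err != nil {
			continue
		}
		fmt.Printf("%-8s %s\n", fields[0], strings.Join(decodeCaps(mask), ","))
	}
}
```

For example, the mask 00000000a80425fb decodes to 14 capabilities, from CAP_CHOWN through CAP_SETFCAP.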

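2.4.2 Restricting Process Capabilities

# Start a shell with restricted capabilities using capsh.
# Dropping from the bounding set requires CAP_SETPCAP, so run this as root;
# "--" tells capsh to launch /bin/bash.
sudo capsh --drop=cap_net_admin,cap_sys_admin --

# Try a network administration operation in the new shell
ip link add test0 type dummy
# Output: RTNETLINK answers: Operation not permitted

# Leave the restricted shell
exit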

2.4.3 Setting File Capabilities

# Grant the ping binary the raw-socket capability
sudo setcap cap_net_raw+ep /bin/ping

# Verify the setting
getcap /bin/ping
# Example output: /bin/ping = cap_net_raw+ep

# Test ping
ping -c 1 8.8.8.8
# Should work even when run as an unprivileged user

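3. Seccomp System Call Filtering

3.1 How Seccomp Works

Seccomp (Secure Computing Mode) lets a process restrict which system calls it may make; the kernel enforces the filter on every call:

graph TD
    A[Process issues a system call] --> B[Kernel checks the Seccomp rules]
    B --> C{Is the system call allowed?}
    C -->|Yes| D[Execute the system call]
    C -->|No| E[Return an error or terminate the process]
    
    F[Seccomp rules] --> G[List of allowed system calls]
    F --> H["Default action (KILL/TRAP/ERRNO/TRACE)"]
    F --> I[Actions for specific calls]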

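3.2 Seccomp Action Types

| Action | Description | Typical use |
|---|---|---|
| KILL | Terminates the process immediately | Strictest security policy |
| TRAP | Sends the SIGSYS signal | Debugging and monitoring |
| ERRNO | Returns an error code | Graceful degradation |
| TRACE | Notifies the attached tracer (debugger) | Debugging and auditing |
| ALLOW | Lets the call execute | Allowlist entries |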

3.3 Seccomp Profile Format

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["read", "write", "open", "close"],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "names": ["mount", "umount2"],
      "action": "SCMP_ACT_ERRNO"
    },
    {
      "names": ["ptrace"],
      "action": "SCMP_ACT_KILL"
    }
  ]
}
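
To see how a profile of this shape is read, the following Go sketch parses the JSON and looks up the action for a given system call name. It only models the lookup (first matching rule, otherwise `defaultAction`); it does not install a kernel filter, and real runtimes also handle fields such as argument conditions that are omitted here. The type and function names (`seccompProfile`, `parseProfile`, `actionFor`) are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// syscallRule is one entry of the "syscalls" array.
type syscallRule struct {
	Names  []string `json:"names"`
	Action string   `json:"action"`
}

// seccompProfile mirrors the three top-level fields used in this section.
type seccompProfile struct {
	DefaultAction string        `json:"defaultAction"`
	Architectures []string      `json:"architectures"`
	Syscalls      []syscallRule `json:"syscalls"`
}

// parseProfile decodes a JSON profile.
func parseProfile(data []byte) (*seccompProfile, error) {
	var p seccompProfile
	if err := json.Unmarshal(data, &p); err != nil {
		return nil, err
	}
	return &p, nil
}

// actionFor returns the action of the first rule naming the syscall,
// or the profile's default action when no rule mentions it.
func (p *seccompProfile) actionFor(name string) string {
	for _, rule := range p.Syscalls {
		for _, n := range rule.Names {
			if n == name {
				return rule.Action
			}
		}
	}
	return p.DefaultAction
}

// exampleProfile is a small profile in the format described in this section.
const exampleProfile = `{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {"names": ["read", "write", "open", "close"], "action": "SCMP_ACT_ALLOW"},
    {"names": ["mount", "umount2"], "action": "SCMP_ACT_ERRNO"},
    {"names": ["ptrace"], "action": "SCMP_ACT_KILL"}
  ]
}`

func main() {
	p, err := parseProfile([]byte(exampleProfile))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, name := range []string{"read", "mount", "ptrace", "reboot"} {
		fmt.Printf("%-7s -> %s\n", name, p.actionFor(name))
	}
}
```

Here `reboot` is not named by any rule, so it falls through to the default action, which is how an allowlist-style profile blocks everything it does not mention.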

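3.4 Seccomp in Practice

3.4.1 Creating a Seccomp Profile

# Create a Seccomp profile that allowlists a broad set of system calls.
# Note: the list below still allows sensitive calls such as mount, ptrace,
# reboot, and init_module; remove any that your workload does not need.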
cat > seccomp-profile.json << 'EOF'
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": [
        "read", "write", "open", "close", "stat", "fstat",
        "lstat", "poll", "lseek", "mmap", "mprotect",
        "munmap", "brk", "rt_sigaction", "rt_sigprocmask",
        "rt_sigreturn", "ioctl", "pread64", "pwrite64",
        "readv", "writev", "access", "pipe", "select",
        "sched_yield", "mremap", "msync", "mincore",
        "madvise", "shmget", "shmat", "shmctl", "dup",
        "dup2", "pause", "nanosleep", "getitimer",
        "alarm", "setitimer", "getpid", "sendfile",
        "socket", "connect", "accept", "sendto", "recvfrom",
        "sendmsg", "recvmsg", "shutdown", "bind", "listen",
        "getsockname", "getpeername", "socketpair", "setsockopt",
        "getsockopt", "clone", "fork", "vfork", "execve",
        "exit", "wait4", "kill", "uname", "semget", "semop",
        "semctl", "shmdt", "msgget", "msgsnd", "msgrcv",
        "msgctl", "fcntl", "flock", "fsync", "fdatasync",
        "truncate", "ftruncate", "getdents", "getcwd",
        "chdir", "fchdir", "rename", "mkdir", "rmdir",
        "creat", "link", "unlink", "symlink", "readlink",
        "chmod", "fchmod", "chown", "fchown", "lchown",
        "umask", "gettimeofday", "getrlimit", "getrusage",
        "sysinfo", "times", "ptrace", "getuid", "syslog",
        "getgid", "setuid", "setgid", "geteuid", "getegid",
        "setpgid", "getppid", "getpgrp", "setsid", "setreuid",
        "setregid", "getgroups", "setgroups", "setresuid",
        "getresuid", "setresgid", "getresgid", "getpgid",
        "setfsuid", "setfsgid", "getsid", "capget", "capset",
        "rt_sigpending", "rt_sigtimedwait", "rt_sigqueueinfo",
        "rt_sigsuspend", "sigaltstack", "utime", "mknod",
        "uselib", "personality", "ustat", "statfs", "fstatfs",
        "sysfs", "getpriority", "setpriority", "sched_setparam",
        "sched_getparam", "sched_setscheduler", "sched_getscheduler",
        "sched_get_priority_max", "sched_get_priority_min",
        "sched_rr_get_interval", "mlock", "munlock", "mlockall",
        "munlockall", "vhangup", "modify_ldt", "pivot_root",
        "_sysctl", "prctl", "arch_prctl", "adjtimex", "setrlimit",
        "chroot", "sync", "acct", "settimeofday", "mount",
        "umount2", "swapon", "swapoff", "reboot", "sethostname",
        "setdomainname", "iopl", "ioperm", "create_module",
        "init_module", "delete_module", "get_kernel_syms",
        "query_module", "quotactl", "nfsservctl", "getpmsg",
        "putpmsg", "afs_syscall", "tuxcall", "security",
        "gettid", "readahead", "setxattr", "lsetxattr", "fsetxattr",
        "getxattr", "lgetxattr", "fgetxattr", "listxattr",
        "llistxattr", "flistxattr", "removexattr", "lremovexattr",
        "fremovexattr", "tkill", "time", "futex", "sched_setaffinity",
        "sched_getaffinity", "set_thread_area", "io_setup",
        "io_destroy", "io_getevents", "io_submit", "io_cancel",
        "get_thread_area", "lookup_dcookie", "epoll_create",
        "epoll_ctl_old", "epoll_wait_old", "remap_file_pages",
        "getdents64", "set_tid_address", "restart_syscall",
        "semtimedop", "fadvise64", "timer_create", "timer_settime",
        "timer_gettime", "timer_getoverrun", "timer_delete",
        "clock_settime", "clock_gettime", "clock_getres",
        "clock_nanosleep", "exit_group", "epoll_wait", "epoll_ctl",
        "tgkill", "utimes", "vserver", "mbind", "set_mempolicy",
        "get_mempolicy", "mq_open", "mq_unlink", "mq_timedsend",
        "mq_timedreceive", "mq_notify", "mq_getsetattr", "kexec_load",
        "waitid", "add_key", "request_key", "keyctl", "ioprio_set",
        "ioprio_get", "inotify_init", "inotify_add_watch",
        "inotify_rm_watch", "migrate_pages", "openat", "mkdirat",
        "mknodat", "fchownat", "futimesat", "newfstatat", "unlinkat",
        "renameat", "linkat", "symlinkat", "readlinkat", "fchmodat",
        "faccessat", "pselect6", "ppoll", "unshare", "set_robust_list",
        "get_robust_list", "splice", "tee", "sync_file_range",
        "vmsplice", "move_pages", "utimensat", "epoll_pwait",
        "signalfd", "timerfd_create", "eventfd", "fallocate",
        "timerfd_settime", "timerfd_gettime", "accept4", "signalfd4",
        "eventfd2", "epoll_create1", "dup3", "pipe2", "inotify_init1",
        "preadv", "pwritev", "rt_tgsigqueueinfo", "perf_event_open",
        "recvmmsg", "fanotify_init", "fanotify_mark", "prlimit64",
        "name_to_handle_at", "open_by_handle_at", "clock_adjtime",
        "syncfs", "sendmmsg", "setns", "getcpu", "process_vm_readv",
        "process_vm_writev", "kcmp", "finit_module", "sched_setattr",
        "sched_getattr", "renameat2", "seccomp", "getrandom",
        "memfd_create", "kexec_file_load", "bpf", "execveat",
        "userfaultfd", "membarrier", "mlock2", "copy_file_range",
        "preadv2", "pwritev2", "pkey_mprotect", "pkey_alloc",
        "pkey_free", "statx", "io_pgetevents", "rseq", "pidfd_send_signal",
        "io_uring_setup", "io_uring_enter", "io_uring_register",
        "open_tree", "move_mount", "fsopen", "fsconfig", "fsmount",
        "fspick", "pidfd_open", "clone3", "close_range", "openat2",
        "pidfd_getfd", "faccessat2", "process_madvise", "epoll_pwait2",
        "mount_setattr", "quotactl_fd", "landlock_create_ruleset",
        "landlock_add_rule", "landlock_restrict_self", "memfd_secret",
        "process_mrelease", "futex_waitv", "set_mempolicy_home_node"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
EOF

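3.4.2 Running a Program with Seccomp

# runc has no command-line flag for Seccomp; it reads the filter from the
# linux.seccomp section of the bundle's config.json, so put the profile
# there and then start the container
runc run test-container

# Or run with Docker
docker run --rm -it --security-opt seccomp=seccomp-profile.json ubuntu:20.04 /bin/bash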

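4. AppArmor Mandatory Access Control

4.1 How AppArmor Works

AppArmor is a path-based mandatory access control system:

graph TD
    A[Process accesses a file] --> B[AppArmor check]
    B --> C{Does a rule match the path?}
    C -->|Yes| D[Check permissions]
    C -->|No| E[Apply the default policy]
    
    D --> F{Is the permission granted?}
    F -->|Yes| G[Access allowed]
    F -->|No| H[Access denied]
    
    E --> I[Denied in enforce mode, only logged in complain mode]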
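
The decision flow above can be approximated in a few lines of Go. This is a deliberately simplified toy model, not AppArmor's actual matching engine: it assumes enforce mode, supports only the `*` (any characters except `/`) and `**` (any characters including `/`) globs, and lets an explicit deny override any allow. The `rule` type and the `allowed` and `globToRegexp` helpers are hypothetical names for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule is a simplified path rule: a glob, a permission string such as "rw",
// and a flag marking explicit deny rules.
type rule struct {
	pattern string
	perms   string
	deny    bool
}

// globToRegexp translates the two glob forms handled by this sketch:
// "**" matches any characters including "/", "*" matches any except "/".
func globToRegexp(glob string) *regexp.Regexp {
	q := regexp.QuoteMeta(glob)
	q = strings.ReplaceAll(q, `\*\*`, `.*`)
	q = strings.ReplaceAll(q, `\*`, `[^/]*`)
	return regexp.MustCompile("^" + q + "$")
}

// allowed reports whether perm ('r', 'w', ...) on path is permitted.
// A matching deny rule always wins; with no matching rule the access is
// refused, which models enforce mode.
func allowed(rules []rule, path string, perm rune) bool {
	ok := false
	for _, r := range rules {
		if !strings.ContainsRune(r.perms, perm) || !globToRegexp(r.pattern).MatchString(path) {
			continue
		}
		if r.deny {
			return false
		}
		ok = true
	}
	return ok
}

func main() {
	rules := []rule{
		{pattern: "/etc/passwd", perms: "r"},
		{pattern: "/tmp/**", perms: "rw"},
		{pattern: "/etc/shadow", perms: "r", deny: true},
	}
	fmt.Println(allowed(rules, "/etc/passwd", 'r')) // true: explicit allow
	fmt.Println(allowed(rules, "/tmp/a/b", 'w'))    // true: matched by /tmp/**
	fmt.Println(allowed(rules, "/etc/shadow", 'r')) // false: explicit deny
	fmt.Println(allowed(rules, "/home/x", 'r'))     // false: no rule matches
}
```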

4.2 AppArmor Profile Format

# Create an AppArmor profile (writing to /etc/apparmor.d requires root)
sudo tee /etc/apparmor.d/container-profile > /dev/null << 'EOF'
#include <tunables/global>

profile container-profile flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>
  
  # Allow reading basic files
  /etc/passwd r,
  /etc/group r,
  /etc/hosts r,
  
  # Allow access to the /tmp directory
  /tmp/** rw,
  
  # Deny access to sensitive files
  deny /etc/shadow r,
  deny /etc/sudoers r,
  
  # Allow network access
  network,
  
  # Allow executing basic commands
  /bin/bash ix,
  /bin/ls ix,
  /bin/cat ix,
}
EOF

4.3 AppArmor in Practice

# 1. Install the AppArmor tools
sudo apt-get install apparmor-utils

# 2. Load the profile
sudo apparmor_parser -r /etc/apparmor.d/container-profile

# 3. List the loaded profiles
sudo apparmor_status

# 4. Run a container with the profile
docker run --rm -it --security-opt apparmor=container-profile ubuntu:20.04 /bin/bash

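5. SELinux Mandatory Access Control

5.1 How SELinux Works

SELinux is a label-based mandatory access control system:

graph TD
    A[Process accesses a resource] --> B[SELinux check]
    B --> C[Get the process label]
    B --> D[Get the resource label]
    C --> E[Look up the access rule]
    D --> E
    E --> F{Does a rule allow it?}
    F -->|Yes| G[Access allowed]
    F -->|No| H[Access denied]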
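
As a rough mental model of this lookup, the Go sketch below extracts the type field from the process and resource labels and checks an allow table keyed by (source type, target type, object class). It is a toy model under stated assumptions: real SELinux policy also involves users, roles, MLS levels, and type transitions, none of which appear here, and the `policy`, `allow`, `permits`, and `typeOf` names are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// typeOf extracts the type field from a label of the form user:role:type:level.
func typeOf(label string) string {
	parts := strings.Split(label, ":")
	if len(parts) < 3 {
		return ""
	}
	return parts[2]
}

// avKey identifies one rule slot: source type, target type, object class.
type avKey struct{ source, target, class string }

// policy maps a rule slot to the set of permissions it grants.
type policy map[avKey]map[string]bool

// allow records permissions, like an "allow source target:class { perms };" rule.
func (p policy) allow(source, target, class string, perms ...string) {
	k := avKey{source, target, class}
	if p[k] == nil {
		p[k] = map[string]bool{}
	}
	for _, perm := range perms {
		p[k][perm] = true
	}
}

// permits looks up both labels and returns true only if a rule grants perm.
// Anything without a matching rule is denied.
func (p policy) permits(procLabel, objLabel, class, perm string) bool {
	return p[avKey{typeOf(procLabel), typeOf(objLabel), class}][perm]
}

func main() {
	p := policy{}
	p.allow("container_t", "passwd_file_t", "file", "read", "getattr")

	proc := "system_u:system_r:container_t:s0"
	fmt.Println(p.permits(proc, "system_u:object_r:passwd_file_t:s0", "file", "read")) // true
	fmt.Println(p.permits(proc, "system_u:object_r:shadow_file_t:s0", "file", "read")) // false
}
```

The second lookup fails without any explicit deny rule, which reflects the default-deny nature of type enforcement.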

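5.2 SELinux Label Examples

# Show the SELinux label of a file
ls -Z /etc/passwd
# Example output (column layout varies by version): -rw-r--r--. root root system_u:object_r:passwd_file_t:s0

# Show the SELinux label of a process (here PID 1)
ps -Z -p 1
# Example output: system_u:system_r:init_t:s0 1 ? 00:00:01 systemd

# Show the SELinux status
sestatus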

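5.3 SELinux Policy Configuration

# Create an SELinux policy source file
cat > container.te << 'EOF'
policy_module(container, 1.0.0)

# Declare the container types
type container_t;
type container_exec_t;

# Allow the container to perform basic operations
allow container_t self:capability { setuid setgid };
allow container_t self:process { transition signal_perms };

# Allow access to basic files
allow container_t passwd_file_t:file { read getattr };
allow container_t group_file_t:file { read getattr };

# SELinux denies anything that is not explicitly allowed, so these files are
# already off limits; dontaudit only suppresses the audit log entries
dontaudit container_t shadow_file_t:file { read getattr };
dontaudit container_t sudoers_file_t:file { read getattr };
EOF

# Compile and install the policy module
make -f /usr/share/selinux/devel/Makefile container.pp
sudo semodule -i container.pp

6. Hands-On: A Complete Container Security Configuration

6.1 Creating the Security Configuration Files

#!/bin/bash
# Create a complete container security configuration

echo "=== Creating container security configuration ==="

# 1. Create the Capabilities configuration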
cat > capabilities.json << 'EOF'
{
  "capabilities": {
    "bounding": [
      "CAP_CHOWN",
      "CAP_DAC_OVERRIDE",
      "CAP_FOWNER",
      "CAP_FSETID",
      "CAP_KILL",
      "CAP_SETGID",
      "CAP_SETUID",
      "CAP_SETPCAP",
      "CAP_NET_BIND_SERVICE",
      "CAP_NET_RAW",
      "CAP_SYS_CHROOT",
      "CAP_MKNOD",
      "CAP_AUDIT_WRITE",
      "CAP_SETFCAP"
    ],
    "effective": [
      "CAP_CHOWN",
      "CAP_DAC_OVERRIDE",
      "CAP_FOWNER",
      "CAP_FSETID",
      "CAP_KILL",
      "CAP_SETGID",
      "CAP_SETUID",
      "CAP_SETPCAP",
      "CAP_NET_BIND_SERVICE",
      "CAP_NET_RAW",
      "CAP_SYS_CHROOT",
      "CAP_MKNOD",
      "CAP_AUDIT_WRITE",
      "CAP_SETFCAP"
    ],
    "inheritable": [
      "CAP_CHOWN",
      "CAP_DAC_OVERRIDE",
      "CAP_FOWNER",
      "CAP_FSETID",
      "CAP_KILL",
      "CAP_SETGID",
      "CAP_SETUID",
      "CAP_SETPCAP",
      "CAP_NET_BIND_SERVICE",
      "CAP_NET_RAW",
      "CAP_SYS_CHROOT",
      "CAP_MKNOD",
      "CAP_AUDIT_WRITE",
      "CAP_SETFCAP"
    ],
    "permitted": [
      "CAP_CHOWN",
      "CAP_DAC_OVERRIDE",
      "CAP_FOWNER",
      "CAP_FSETID",
      "CAP_KILL",
      "CAP_SETGID",
      "CAP_SETUID",
      "CAP_SETPCAP",
      "CAP_NET_BIND_SERVICE",
      "CAP_NET_RAW",
      "CAP_SYS_CHROOT",
      "CAP_MKNOD",
      "CAP_AUDIT_WRITE",
      "CAP_SETFCAP"
    ],
    "ambient": []
  }
}
EOF

# 2. Create the Seccomp configuration
cat > seccomp.json << 'EOF'
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": [
        "read", "write", "open", "close", "stat", "fstat",
        "lstat", "poll", "lseek", "mmap", "mprotect",
        "munmap", "brk", "rt_sigaction", "rt_sigprocmask",
        "rt_sigreturn", "ioctl", "pread64", "pwrite64",
        "readv", "writev", "access", "pipe", "select",
        "sched_yield", "mremap", "msync", "mincore",
        "madvise", "shmget", "shmat", "shmctl", "dup",
        "dup2", "pause", "nanosleep", "getitimer",
        "alarm", "setitimer", "getpid", "sendfile",
        "socket", "connect", "accept", "sendto", "recvfrom",
        "sendmsg", "recvmsg", "shutdown", "bind", "listen",
        "getsockname", "getpeername", "socketpair", "setsockopt",
        "getsockopt", "clone", "fork", "vfork", "execve",
        "exit", "wait4", "kill", "uname", "semget", "semop",
        "semctl", "shmdt", "msgget", "msgsnd", "msgrcv",
        "msgctl", "fcntl", "flock", "fsync", "fdatasync",
        "truncate", "ftruncate", "getdents", "getcwd",
        "chdir", "fchdir", "rename", "mkdir", "rmdir",
        "creat", "link", "unlink", "symlink", "readlink",
        "chmod", "fchmod", "chown", "fchown", "lchown",
        "umask", "gettimeofday", "getrlimit", "getrusage",
        "sysinfo", "times", "ptrace", "getuid", "syslog",
        "getgid", "setuid", "setgid", "geteuid", "getegid",
        "setpgid", "getppid", "getpgrp", "setsid", "setreuid",
        "setregid", "getgroups", "setgroups", "setresuid",
        "getresuid", "setresgid", "getresgid", "getpgid",
        "setfsuid", "setfsgid", "getsid", "capget", "capset",
        "rt_sigpending", "rt_sigtimedwait", "rt_sigqueueinfo",
        "rt_sigsuspend", "sigaltstack", "utime", "mknod",
        "uselib", "personality", "ustat", "statfs", "fstatfs",
        "sysfs", "getpriority", "setpriority", "sched_setparam",
        "sched_getparam", "sched_setscheduler", "sched_getscheduler",
        "sched_get_priority_max", "sched_get_priority_min",
        "sched_rr_get_interval", "mlock", "munlock", "mlockall",
        "munlockall", "vhangup", "modify_ldt", "pivot_root",
        "_sysctl", "prctl", "arch_prctl", "adjtimex", "setrlimit",
        "chroot", "sync", "acct", "settimeofday", "mount",
        "umount2", "swapon", "swapoff", "reboot", "sethostname",
        "setdomainname", "iopl", "ioperm", "create_module",
        "init_module", "delete_module", "get_kernel_syms",
        "query_module", "quotactl", "nfsservctl", "getpmsg",
        "putpmsg", "afs_syscall", "tuxcall", "security",
        "gettid", "readahead", "setxattr", "lsetxattr", "fsetxattr",
        "getxattr", "lgetxattr", "fgetxattr", "listxattr",
        "llistxattr", "flistxattr", "removexattr", "lremovexattr",
        "fremovexattr", "tkill", "time", "futex", "sched_setaffinity",
        "sched_getaffinity", "set_thread_area", "io_setup",
        "io_destroy", "io_getevents", "io_submit", "io_cancel",
        "get_thread_area", "lookup_dcookie", "epoll_create",
        "epoll_ctl_old", "epoll_wait_old", "remap_file_pages",
        "getdents64", "set_tid_address", "restart_syscall",
        "semtimedop", "fadvise64", "timer_create", "timer_settime",
        "timer_gettime", "timer_getoverrun", "timer_delete",
        "clock_settime", "clock_gettime", "clock_getres",
        "clock_nanosleep", "exit_group", "epoll_wait", "epoll_ctl",
        "tgkill", "utimes", "vserver", "mbind", "set_mempolicy",
        "get_mempolicy", "mq_open", "mq_unlink", "mq_timedsend",
        "mq_timedreceive", "mq_notify", "mq_getsetattr", "kexec_load",
        "waitid", "add_key", "request_key", "keyctl", "ioprio_set",
        "ioprio_get", "inotify_init", "inotify_add_watch",
        "inotify_rm_watch", "migrate_pages", "openat", "mkdirat",
        "mknodat", "fchownat", "futimesat", "newfstatat", "unlinkat",
        "renameat", "linkat", "symlinkat", "readlinkat", "fchmodat",
        "faccessat", "pselect6", "ppoll", "unshare", "set_robust_list",
        "get_robust_list", "splice", "tee", "sync_file_range",
        "vmsplice", "move_pages", "utimensat", "epoll_pwait",
        "signalfd", "timerfd_create", "eventfd", "fallocate",
        "timerfd_settime", "timerfd_gettime", "accept4", "signalfd4",
        "eventfd2", "epoll_create1", "dup3", "pipe2", "inotify_init1",
        "preadv", "pwritev", "rt_tgsigqueueinfo", "perf_event_open",
        "recvmmsg", "fanotify_init", "fanotify_mark", "prlimit64",
        "name_to_handle_at", "open_by_handle_at", "clock_adjtime",
        "syncfs", "sendmmsg", "setns", "getcpu", "process_vm_readv",
        "process_vm_writev", "kcmp", "finit_module", "sched_setattr",
        "sched_getattr", "renameat2", "seccomp", "getrandom",
        "memfd_create", "kexec_file_load", "bpf", "execveat",
        "userfaultfd", "membarrier", "mlock2", "copy_file_range",
        "preadv2", "pwritev2", "pkey_mprotect", "pkey_alloc",
        "pkey_free", "statx", "io_pgetevents", "rseq", "pidfd_send_signal",
        "io_uring_setup", "io_uring_enter", "io_uring_register",
        "open_tree", "move_mount", "fsopen", "fsconfig", "fsmount",
        "fspick", "pidfd_open", "clone3", "close_range", "openat2",
        "pidfd_getfd", "faccessat2", "process_madvise", "epoll_pwait2",
        "mount_setattr", "quotactl_fd", "landlock_create_ruleset",
        "landlock_add_rule", "landlock_restrict_self", "memfd_secret",
        "process_mrelease", "futex_waitv", "set_mempolicy_home_node"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
EOF

echo "=== Security configuration files created ==="
echo "Capabilities configuration: capabilities.json"
echo "Seccomp configuration: seccomp.json"
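
Before handing a capabilities file like this to a runtime, it can help to sanity-check it. The Go sketch below is one possible check, not a complete validator. It relies on two kernel rules (an effective capability must also be permitted, and an ambient capability must be both permitted and inheritable) and flags the two capabilities rated "extremely high" risk in section 2.3. The `validate` function and the other names are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// capSets mirrors the five lists under "capabilities" in capabilities.json.
type capSets struct {
	Bounding    []string `json:"bounding"`
	Effective   []string `json:"effective"`
	Inheritable []string `json:"inheritable"`
	Permitted   []string `json:"permitted"`
	Ambient     []string `json:"ambient"`
}

type capConfig struct {
	Capabilities capSets `json:"capabilities"`
}

// highRisk holds the capabilities rated "extremely high" risk in section 2.3.
var highRisk = map[string]bool{"CAP_SYS_ADMIN": true, "CAP_SYS_MODULE": true}

// toSet converts a list of names into a lookup set.
func toSet(names []string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, n := range names {
		set[n] = true
	}
	return set
}

// validate returns a human-readable list of problems; an empty result means
// none of the checks in this sketch fired.
func validate(c capSets) []string {
	var problems []string
	permitted, inheritable := toSet(c.Permitted), toSet(c.Inheritable)
	for _, name := range c.Effective {
		if !permitted[name] {
			problems = append(problems, name+": effective but not permitted")
		}
	}
	for _, name := range c.Ambient {
		if !permitted[name] || !inheritable[name] {
			problems = append(problems, name+": ambient but not both permitted and inheritable")
		}
	}
	for _, name := range c.Bounding {
		if highRisk[name] {
			problems = append(problems, name+": high-risk capability in bounding set")
		}
	}
	return problems
}

func main() {
	// A deliberately flawed sample; real input would be read from capabilities.json.
	sample := `{"capabilities": {
	  "bounding": ["CAP_CHOWN", "CAP_SYS_ADMIN"],
	  "effective": ["CAP_CHOWN", "CAP_KILL"],
	  "inheritable": [],
	  "permitted": ["CAP_CHOWN"],
	  "ambient": ["CAP_CHOWN"]
	}}`
	var cfg capConfig
	if err := json.Unmarshal([]byte(sample), &cfg); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, p := range validate(cfg.Capabilities) {
		fmt.Println("problem:", p)
	}
}
```

Run against the 14-capability configuration from this section, where all four non-empty sets are identical and the ambient set is empty, these checks report no problems.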

6.2 Implementing the Security Configuration in Go

package main

import (
    "encoding/json"
    "fmt"
    "os"
    "os/exec"
    "syscall"
)

type Capabilities struct {
    Bounding    []string `json:"bounding"`
    Effective   []string `json:"effective"`
    Inheritable []string `json:"inheritable"`
    Permitted   []string `json:"permitted"`
    Ambient     []string `json:"ambient"`
}

type SecurityConfig struct {
    Capabilities Capabilities `json:"capabilities"`
}

// defaultCaps is the capability set granted to the container. It matches the
// list in capabilities.json from section 6.1.
var defaultCaps = []string{
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_NET_BIND_SERVICE", "CAP_NET_RAW",
    "CAP_SYS_CHROOT", "CAP_MKNOD", "CAP_AUDIT_WRITE", "CAP_SETFCAP",
}

func createSecureContainer() error {
    // 1. Build the security configuration; all four sets share the same list.
    config := SecurityConfig{
        Capabilities: Capabilities{
            Bounding:    defaultCaps,
            Effective:   defaultCaps,
            Inheritable: defaultCaps,
            Permitted:   defaultCaps,
            Ambient:     []string{},
        },
    }

    // 2. Save the configuration file.
    configData, err := json.MarshalIndent(config, "", "  ")
    if err != nil {
        return fmt.Errorf("failed to serialize config: %w", err)
    }

    if err := os.WriteFile("security-config.json", configData, 0644); err != nil {
        return fmt.Errorf("failed to save config file: %w", err)
    }

    // 3. Create the container process in new namespaces.
    // Note: this example only records the capability lists in the JSON file;
    // it does not apply them to the child process.
    cmd := exec.Command("/bin/bash")
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS |
                   syscall.CLONE_NEWPID |
                   syscall.CLONE_NEWNS |
                   syscall.CLONE_NEWNET |
                   syscall.CLONE_NEWIPC |
                   syscall.CLONE_NEWUSER |
                   syscall.CLONE_NEWCGROUP,
    }

    // 4. Wire up standard input and output.
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    // 5. Start the container and wait for it to exit.
    if err := cmd.Run(); err != nil {
        return fmt.Errorf("failed to start container: %w", err)
    }

    return nil
}

func main() {
    if err := createSecureContainer(); err != nil {
        fmt.Printf("failed to create secure container: %v\n", err)
        os.Exit(1)
    }
}

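7. Verification Checklist

Core Understanding

  • [ ] Understand the purpose and principles of the Capabilities mechanism
  • [ ] Understand Seccomp system call filtering
  • [ ] Know the differences between AppArmor and SELinux
  • [ ] Understand the layers of container security defense

Practical Skills

  • [ ] Can configure and test Capabilities
  • [ ] Can create and test a Seccomp policy
  • [ ] Can configure AppArmor or SELinux
  • [ ] Can create a combined security configuration

Security Skills

  • [ ] Know the best practices for hardening containers
  • [ ] Can assess security vulnerabilities
  • [ ] Can apply the principle of least privilege
  • [ ] Understand the trade-offs of security policies

Related Links

  • 03-CGroup Resource Control - resource limiting
  • 05-Container Networking Principles - network isolation
  • 14-Debugging Techniques and Tools - security debugging

Next step: container networking principles, the foundation of container communication.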
