HiHuo
首页
博客
手册
工具
首页
博客
手册
工具
  • 手撸容器系统

    • 完整手撸容器技术文档系列
    • 01-容器本质与基础概念
    • 02-Namespace隔离机制
    • 03-CGroup资源控制
    • 04-Capabilities与安全机制
    • 05-容器网络原理
    • 06-网络模式与实现
    • 07-CNI插件开发
    • 08-RootFS与文件系统隔离
    • 09-OverlayFS镜像分层
    • 10-命令行手撸容器
    • 11-Go实现最小容器
    • 12-Go实现完整容器
    • 13-容器生命周期管理
    • 14-调试技术与工具
    • 15-OCI规范与标准化
    • 16-进阶场景与优化
    • 常见问题与故障排查
    • 参考资料与延伸阅读

15-OCI规范与标准化

学习目标

  • 深入理解 OCI 规范的核心内容
  • 掌握 OCI Runtime 和 Image 规范
  • 能够实现 OCI 兼容的容器运行时
  • 了解容器标准化的重要性
  • 掌握 OCI 工具链的使用

前置知识

  • 容器基础原理
  • JSON 数据处理
  • Go 语言编程
  • 容器镜像原理

一、OCI 规范概述

1.1 OCI 规范组成

graph TD
    A[OCI 规范] --> B[Runtime Spec]
    A --> C[Image Spec]
    A --> D[Distribution Spec]
    
    B --> B1[config.json]
    B --> B2[容器生命周期]
    B --> B3[文件系统]
    B --> B4[进程]
    B --> B5[Linux 平台]
    
    C --> C1[镜像格式]
    C --> C2[层管理]
    C --> C3[配置]
    C --> C4[清单]
    
    D --> D1[镜像分发]
    D --> D2[认证]
    D --> D3[存储]

1.2 OCI 规范特点

特点说明优势
标准化统一的容器标准跨平台兼容
模块化分离运行时和镜像灵活组合
可扩展支持自定义扩展适应不同需求
开放开源标准社区驱动

二、OCI Runtime 规范

2.1 Runtime 规范结构

graph TD
    A[OCI Runtime] --> B[config.json]
    A --> C[容器状态]
    A --> D[生命周期]
    
    B --> B1[进程配置]
    B --> B2[根文件系统]
    B --> B3[挂载点]
    B --> B4[Linux 配置]
    B --> B5[Windows 配置]
    
    C --> C1[状态文件]
    C --> C2[PID 文件]
    C --> C3[退出码]
    
    D --> D1[创建]
    D --> D2[启动]
    D --> D3[停止]
    D --> D4[删除]

2.2 config.json 格式

{
  "ociVersion": "1.0.0",
  "process": {
    "terminal": true,
    "user": {
      "uid": 0,
      "gid": 0
    },
    "args": [
      "sh"
    ],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "TERM=xterm"
    ],
    "cwd": "/",
    "capabilities": {
      "bounding": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "effective": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "inheritable": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "permitted": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "ambient": []
    },
    "rlimits": [
      {
        "type": "RLIMIT_NOFILE",
        "hard": 1024,
        "soft": 1024
      }
    ],
    "noNewPrivileges": true
  },
  "root": {
    "path": "rootfs",
    "readonly": true
  },
  "hostname": "runc",
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc"
    },
    {
      "destination": "/dev",
      "type": "tmpfs",
      "source": "tmpfs",
      "options": [
        "nosuid",
        "strictatime",
        "mode=755",
        "size=65536k"
      ]
    },
    {
      "destination": "/dev/pts",
      "type": "devpts",
      "source": "devpts",
      "options": [
        "nosuid",
        "noexec",
        "newinstance",
        "ptmxmode=0666",
        "mode=0620",
        "gid=5"
      ]
    },
    {
      "destination": "/dev/shm",
      "type": "tmpfs",
      "source": "shm",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "mode=1777",
        "size=65536k"
      ]
    },
    {
      "destination": "/dev/mqueue",
      "type": "mqueue",
      "source": "mqueue",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ]
    },
    {
      "destination": "/sys",
      "type": "sysfs",
      "source": "sysfs",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "ro"
      ]
    },
    {
      "destination": "/sys/fs/cgroup",
      "type": "cgroup",
      "source": "cgroup",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "relatime",
        "ro"
      ]
    }
  ],
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/fix-mounts",
        "args": [
          "fix-mounts",
          "arg1",
          "arg2"
        ],
        "env": [
          "key1=value1"
        ]
      }
    ],
    "poststart": [
      {
        "path": "/usr/bin/notify-start",
        "timeout": 5
      }
    ],
    "poststop": [
      {
        "path": "/usr/sbin/cleanup.sh",
        "args": [
          "cleanup.sh",
          "arg1",
          "arg2"
        ],
        "timeout": 5
      }
    ]
  },
  "linux": {
    "devices": [
      {
        "path": "/dev/fuse",
        "type": "c",
        "major": 10,
        "minor": 229,
        "fileMode": 438,
        "uid": 0,
        "gid": 0
      },
      {
        "path": "/dev/sda",
        "type": "b",
        "major": 8,
        "minor": 0,
        "fileMode": 432,
        "uid": 0,
        "gid": 0
      }
    ],
    "uidMappings": [
      {
        "containerID": 0,
        "hostID": 1000,
        "size": 32000
      }
    ],
    "gidMappings": [
      {
        "containerID": 0,
        "hostID": 1000,
        "size": 32000
      }
    ],
    "sysctl": {
      "net.ipv4.ip_forward": "1"
    },
    "cgroupsPath": "myContainer",
    "resources": {
      "devices": [
        {
          "allow": false,
          "access": "rwm"
        }
      ],
      "memory": {
        "limit": 536870912,
        "reservation": 536870912
      },
      "cpu": {
        "shares": 1024,
        "quota": 1000000,
        "period": 500000
      },
      "pids": {
        "limit": 32771
      },
      "blockIO": {
        "weight": 10
      },
      "hugepageLimits": [
        {
          "pageSize": "2MB",
          "limit": 9223372036854775807
        }
      ],
      "network": {
        "classID": 1048577,
        "priorities": [
          {
            "name": "eth0",
            "priority": 500
          }
        ]
      }
    },
    "rootfsPropagation": "slave",
    "seccomp": {
      "defaultAction": "SCMP_ACT_ERRNO",
      "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
      ],
      "syscalls": [
        {
          "names": [
            "accept",
            "accept4",
            "access",
            "adjtimex",
            "alarm",
            "bind",
            "brk",
            "capget",
            "capset",
            "chdir",
            "chmod",
            "chown",
            "chroot",
            "clock_getres",
            "clock_gettime",
            "clock_nanosleep",
            "close",
            "connect",
            "copy_file_range",
            "creat",
            "dup",
            "dup2",
            "dup3",
            "epoll_create",
            "epoll_create1",
            "epoll_ctl",
            "epoll_pwait",
            "epoll_wait",
            "eventfd",
            "eventfd2",
            "execve",
            "execveat",
            "exit",
            "exit_group",
            "faccessat",
            "fadvise64",
            "fallocate",
            "fanotify_mark",
            "fchdir",
            "fchmod",
            "fchmodat",
            "fchown",
            "fchownat",
            "fcntl",
            "fdatasync",
            "fgetxattr",
            "flistxattr",
            "flock",
            "fork",
            "fremovexattr",
            "fsetxattr",
            "fstat",
            "fstatfs",
            "fsync",
            "ftruncate",
            "futex",
            "getcwd",
            "getdents",
            "getdents64",
            "getegid",
            "geteuid",
            "getgid",
            "getgroups",
            "getpeername",
            "getpgid",
            "getpgrp",
            "getpid",
            "getppid",
            "getpriority",
            "getrandom",
            "getresgid",
            "getresuid",
            "getrlimit",
            "get_robust_list",
            "getrusage",
            "getsid",
            "getsockname",
            "getsockopt",
            "get_thread_area",
            "gettid",
            "gettimeofday",
            "getuid",
            "getxattr",
            "inotify_add_watch",
            "inotify_init",
            "inotify_init1",
            "inotify_rm_watch",
            "io_cancel",
            "ioctl",
            "io_destroy",
            "io_getevents",
            "ioprio_get",
            "ioprio_set",
            "io_setup",
            "io_submit",
            "ipc",
            "kill",
            "lchown",
            "lgetxattr",
            "link",
            "linkat",
            "listen",
            "listxattr",
            "llistxattr",
            "lremovexattr",
            "lseek",
            "lsetxattr",
            "lstat",
            "madvise",
            "mincore",
            "mkdir",
            "mkdirat",
            "mknod",
            "mknodat",
            "mlock",
            "mlockall",
            "mmap",
            "mmap2",
            "mprotect",
            "mq_getsetattr",
            "mq_notify",
            "mq_open",
            "mq_timedreceive",
            "mq_timedsend",
            "mq_unlink",
            "mremap",
            "msgctl",
            "msgget",
            "msgrcv",
            "msgsnd",
            "msync",
            "munlock",
            "munlockall",
            "munmap",
            "nanosleep",
            "newfstatat",
            "_newselect",
            "open",
            "openat",
            "pause",
            "pipe",
            "pipe2",
            "poll",
            "ppoll",
            "prctl",
            "pread64",
            "preadv",
            "prlimit64",
            "pselect6",
            "ptrace",
            "pwrite64",
            "pwritev",
            "read",
            "readahead",
            "readlink",
            "readlinkat",
            "readv",
            "recv",
            "recvfrom",
            "recvmmsg",
            "recvmsg",
            "remap_file_pages",
            "removexattr",
            "rename",
            "renameat",
            "renameat2",
            "restart_syscall",
            "rmdir",
            "rt_sigaction",
            "rt_sigpending",
            "rt_sigprocmask",
            "rt_sigqueueinfo",
            "rt_sigsuspend",
            "rt_sigtimedwait",
            "rt_tgsigqueueinfo",
            "sched_get_priority_max",
            "sched_get_priority_min",
            "sched_getaffinity",
            "sched_getparam",
            "sched_getscheduler",
            "sched_rr_get_interval",
            "sched_setaffinity",
            "sched_setparam",
            "sched_setscheduler",
            "sched_yield",
            "seccomp",
            "select",
            "sendfile",
            "sendfile64",
            "sendmmsg",
            "sendmsg",
            "sendto",
            "setfsgid",
            "setfsuid",
            "setgid",
            "setgroups",
            "setitimer",
            "setpgid",
            "setpriority",
            "setregid",
            "setresgid",
            "setresuid",
            "setreuid",
            "setrlimit",
            "set_robust_list",
            "setsid",
            "setsockopt",
            "set_thread_area",
            "set_tid_address",
            "setuid",
            "setxattr",
            "shmat",
            "shmctl",
            "shmdt",
            "shmget",
            "shutdown",
            "sigaltstack",
            "signalfd",
            "signalfd4",
            "sigreturn",
            "socket",
            "socketcall",
            "socketpair",
            "splice",
            "stat",
            "statfs",
            "symlink",
            "symlinkat",
            "sync",
            "sync_file_range",
            "syncfs",
            "sysinfo",
            "syslog",
            "tee",
            "tgkill",
            "time",
            "timer_create",
            "timer_delete",
            "timerfd_create",
            "timerfd_gettime",
            "timerfd_settime",
            "timer_getoverrun",
            "timer_gettime",
            "timer_settime",
            "times",
            "tkill",
            "truncate",
            "umask",
            "uname",
            "unlink",
            "unlinkat",
            "utime",
            "utimensat",
            "utimes",
            "vfork",
            "vmsplice",
            "wait4",
            "waitid",
            "waitpid",
            "write",
            "writev"
          ],
          "action": "SCMP_ACT_ALLOW"
        }
      ]
    },
    "namespaces": [
      {
        "type": "pid"
      },
      {
        "type": "network"
      },
      {
        "type": "ipc"
      },
      {
        "type": "uts"
      },
      {
        "type": "mount"
      },
      {
        "type": "user"
      },
      {
        "type": "cgroup"
      }
    ],
    "maskedPaths": [
      "/proc/kcore",
      "/proc/latency_stats",
      "/proc/timer_list",
      "/proc/timer_stats",
      "/proc/sched_debug"
    ],
    "readonlyPaths": [
      "/proc/bus",
      "/proc/fs",
      "/proc/irq",
      "/proc/sys",
      "/proc/sysrq-trigger"
    ]
  }
}

2.3 OCI Runtime 实现

package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "os"
    "path/filepath"
    "syscall"
    "golang.org/x/sys/unix"
)

// OCIConfig OCI 配置结构
type OCIConfig struct {
    OCIVersion string `json:"ociVersion"`
    Process    struct {
        Terminal bool     `json:"terminal"`
        User     struct {
            UID int `json:"uid"`
            GID int `json:"gid"`
        } `json:"user"`
        Args []string `json:"args"`
        Env  []string `json:"env"`
        Cwd  string   `json:"cwd"`
    } `json:"process"`
    Root struct {
        Path     string `json:"path"`
        Readonly bool   `json:"readonly"`
    } `json:"root"`
    Hostname string `json:"hostname"`
    Mounts   []struct {
        Destination string   `json:"destination"`
        Type        string   `json:"type"`
        Source      string   `json:"source"`
        Options     []string `json:"options"`
    } `json:"mounts"`
    Linux struct {
        Namespaces []struct {
            Type string `json:"type"`
        } `json:"namespaces"`
        Resources struct {
            Memory struct {
                Limit int64 `json:"limit"`
            } `json:"memory"`
            CPU struct {
                Shares int64 `json:"shares"`
                Quota  int64 `json:"quota"`
                Period int64 `json:"period"`
            } `json:"cpu"`
        } `json:"resources"`
    } `json:"linux"`
}

// LoadOCIConfig 加载 OCI 配置
func LoadOCIConfig(configPath string) (*OCIConfig, error) {
    data, err := ioutil.ReadFile(configPath)
    if err != nil {
        return nil, fmt.Errorf("failed to read config file: %v", err)
    }
    
    var config OCIConfig
    if err := json.Unmarshal(data, &config); err != nil {
        return nil, fmt.Errorf("failed to parse config: %v", err)
    }
    
    return &config, nil
}

// CreateContainer 创建容器
func CreateContainer(config *OCIConfig, bundlePath string) error {
    // 创建容器目录
    containerDir := filepath.Join(bundlePath, "container")
    if err := os.MkdirAll(containerDir, 0755); err != nil {
        return fmt.Errorf("failed to create container directory: %v", err)
    }
    
    // 创建状态文件
    stateFile := filepath.Join(containerDir, "state.json")
    state := map[string]interface{}{
        "ociVersion": config.OCIVersion,
        "id":         "container-123",
        "status":     "created",
        "bundle":     bundlePath,
        "created":    "2023-01-01T00:00:00Z",
    }
    
    stateData, err := json.MarshalIndent(state, "", "  ")
    if err != nil {
        return fmt.Errorf("failed to marshal state: %v", err)
    }
    
    if err := ioutil.WriteFile(stateFile, stateData, 0644); err != nil {
        return fmt.Errorf("failed to write state file: %v", err)
    }
    
    return nil
}

// StartContainer 启动容器
func StartContainer(config *OCIConfig, bundlePath string) error {
    // 创建所有 namespace 的 flags
    flags := syscall.CLONE_NEWUTS |
             syscall.CLONE_NEWPID |
             syscall.CLONE_NEWNS |
             syscall.CLONE_NEWNET |
             syscall.CLONE_NEWIPC |
             syscall.CLONE_NEWUSER |
             syscall.CLONE_NEWCGROUP

    // 准备子进程命令
    cmd := exec.Command("/proc/self/exe", "child", bundlePath)
    
    // 设置系统调用属性
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: flags,
    }

    // 设置标准输入输出
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    // 启动子进程
    if err := cmd.Start(); err != nil {
        return fmt.Errorf("failed to start child process: %v", err)
    }

    // 等待子进程完成
    return cmd.Wait()
}

// setupContainer 设置容器环境
func setupContainer(config *OCIConfig, bundlePath string) error {
    // 设置主机名
    if err := unix.Sethostname([]byte(config.Hostname)); err != nil {
        return fmt.Errorf("failed to set hostname: %v", err)
    }

    // 设置工作目录
    if err := os.Chdir(config.Process.Cwd); err != nil {
        return fmt.Errorf("failed to change working directory: %v", err)
    }

    // 挂载文件系统
    for _, mount := range config.Mounts {
        if err := mountFilesystem(mount); err != nil {
            return fmt.Errorf("failed to mount %s: %v", mount.Destination, err)
        }
    }

    // 切换根目录
    if err := pivotRoot(config.Root.Path); err != nil {
        return fmt.Errorf("failed to pivot root: %v", err)
    }

    // 执行命令
    if err := syscall.Exec(config.Process.Args[0], config.Process.Args, config.Process.Env); err != nil {
        return fmt.Errorf("failed to exec command: %v", err)
    }

    return nil
}

// mountFilesystem 挂载文件系统
func mountFilesystem(mount struct {
    Destination string   `json:"destination"`
    Type        string   `json:"type"`
    Source      string   `json:"source"`
    Options     []string `json:"options"`
}) error {
    // 创建目标目录
    if err := os.MkdirAll(mount.Destination, 0755); err != nil {
        return fmt.Errorf("failed to create mount point: %v", err)
    }

    // 解析挂载选项
    var flags uintptr
    for _, option := range mount.Options {
        switch option {
        case "ro":
            flags |= unix.MS_RDONLY
        case "noexec":
            flags |= unix.MS_NOEXEC
        case "nosuid":
            flags |= unix.MS_NOSUID
        case "nodev":
            flags |= unix.MS_NODEV
        }
    }

    // 执行挂载
    if err := unix.Mount(mount.Source, mount.Destination, mount.Type, flags, ""); err != nil {
        return fmt.Errorf("failed to mount: %v", err)
    }

    return nil
}

// pivotRoot 切换根目录
func pivotRoot(newroot string) error {
    // 绑定挂载 newroot
    if err := unix.Mount(newroot, newroot, "", unix.MS_BIND|unix.MS_REC, ""); err != nil {
        return fmt.Errorf("failed to bind mount newroot: %v", err)
    }

    // 创建 put_old 目录
    putold := "/.oldroot"
    if err := os.MkdirAll(putold, 0700); err != nil {
        return fmt.Errorf("failed to create put_old directory: %v", err)
    }

    // 执行 pivot_root
    if err := unix.PivotRoot(newroot, putold); err != nil {
        return fmt.Errorf("failed to pivot_root: %v", err)
    }

    // 切换到新根目录
    if err := os.Chdir("/"); err != nil {
        return fmt.Errorf("failed to change working directory: %v", err)
    }

    // 卸载原根目录
    if err := unix.Unmount(putold, unix.MNT_DETACH); err != nil {
        return fmt.Errorf("failed to unmount old root: %v", err)
    }

    // 删除 put_old 目录
    if err := os.RemoveAll(putold); err != nil {
        return fmt.Errorf("failed to remove put_old directory: %v", err)
    }

    return nil
}

️ 三、OCI Image 规范

3.1 Image 规范结构

graph TD
    A[OCI Image] --> B[Manifest]
    A --> C[Config]
    A --> D[Layers]
    A --> E[Index]
    
    B --> B1[媒体类型]
    B --> B2[层引用]
    B --> B3[配置引用]
    
    C --> C1[架构信息]
    C --> C2[操作系统]
    C --> C3[配置]
    C --> C4[历史]
    
    D --> D1[层文件]
    D --> D2[压缩格式]
    D --> D3[校验和]
    
    E --> E1[多架构支持]
    E --> E2[平台信息]

3.2 Image 清单格式

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "size": 1234,
    "digest": "sha256:abc123..."
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 5678,
      "digest": "sha256:def456..."
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 9012,
      "digest": "sha256:ghi789..."
    }
  ],
  "annotations": {
    "org.opencontainers.image.title": "my-image",
    "org.opencontainers.image.description": "A sample OCI image",
    "org.opencontainers.image.version": "1.0.0"
  }
}

3.3 Image 配置格式

{
  "architecture": "amd64",
  "os": "linux",
  "config": {
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Cmd": [
      "/bin/sh"
    ],
    "WorkingDir": "/",
    "User": "root",
    "ExposedPorts": {
      "80/tcp": {}
    },
    "Volumes": {
      "/data": {}
    }
  },
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:abc123...",
      "sha256:def456...",
      "sha256:ghi789..."
    ]
  },
  "history": [
    {
      "created": "2023-01-01T00:00:00Z",
      "created_by": "RUN apt-get update"
    },
    {
      "created": "2023-01-01T00:01:00Z",
      "created_by": "RUN apt-get install -y nginx"
    }
  ]
}

3.4 Image 索引格式

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 1234,
      "digest": "sha256:abc123...",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "size": 5678,
      "digest": "sha256:def456...",
      "platform": {
        "architecture": "arm64",
        "os": "linux"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.title": "my-image",
    "org.opencontainers.image.description": "A multi-arch OCI image"
  }
}

️ 四、OCI 工具链

4.1 runc 使用

# 1. 安装 runc
wget https://github.com/opencontainers/runc/releases/download/v1.1.0/runc.amd64
chmod +x runc.amd64
sudo mv runc.amd64 /usr/local/bin/runc

# 2. 创建 OCI bundle
mkdir -p /tmp/oci-bundle/{rootfs,config}

# 3. 准备 rootfs
# (使用之前的方法准备 rootfs)

# 4. 创建 config.json
cat > /tmp/oci-bundle/config.json << 'EOF'
{
  "ociVersion": "1.0.0",
  "process": {
    "terminal": true,
    "user": {
      "uid": 0,
      "gid": 0
    },
    "args": ["sh"],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "TERM=xterm"
    ],
    "cwd": "/"
  },
  "root": {
    "path": "rootfs",
    "readonly": true
  },
  "hostname": "runc",
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc"
    },
    {
      "destination": "/dev",
      "type": "tmpfs",
      "source": "tmpfs",
      "options": ["nosuid", "strictatime", "mode=755", "size=65536k"]
    }
  ],
  "linux": {
    "namespaces": [
      {"type": "pid"},
      {"type": "network"},
      {"type": "ipc"},
      {"type": "uts"},
      {"type": "mount"},
      {"type": "user"},
      {"type": "cgroup"}
    ]
  }
}
EOF

# 5. 运行容器
runc run mycontainer

# 6. 管理容器
runc list
runc start mycontainer
runc stop mycontainer
runc delete mycontainer

4.2 skopeo 使用

# 1. 安装 skopeo
sudo apt-get install skopeo

# 2. 检查镜像
skopeo inspect docker://ubuntu:20.04

# 3. 复制镜像
skopeo copy docker://ubuntu:20.04 oci:ubuntu:20.04

# 4. 转换镜像格式
skopeo copy docker://ubuntu:20.04 dir:./ubuntu-image

# 5. 同步镜像
skopeo sync --src docker --dest dir ubuntu:20.04 ./images/

4.3 buildah 使用

# 1. 安装 buildah
sudo apt-get install buildah

# 2. 创建容器
buildah from ubuntu:20.04

# 3. 运行命令
buildah run ubuntu-working-container apt-get update
buildah run ubuntu-working-container apt-get install -y nginx

# 4. 配置容器
buildah config --cmd "nginx -g 'daemon off;'" ubuntu-working-container
buildah config --port 80 ubuntu-working-container

# 5. 提交镜像
buildah commit ubuntu-working-container my-nginx:latest

# 6. 清理
buildah rm ubuntu-working-container

五、OCI 实现示例

5.1 简单 OCI 运行时

package main

import (
    "encoding/json"
    "flag"
    "fmt"
    "io/ioutil"
    "os"
    "os/exec"
    "path/filepath"
    "syscall"
    "golang.org/x/sys/unix"
)

func main() {
    var (
        bundle = flag.String("bundle", "", "Path to OCI bundle")
        action = flag.String("action", "run", "Action to perform")
    )
    flag.Parse()

    if *bundle == "" {
        fmt.Fprintf(os.Stderr, "bundle is required\n")
        os.Exit(1)
    }

    // 加载 OCI 配置
    configPath := filepath.Join(*bundle, "config.json")
    config, err := loadOCIConfig(configPath)
    if err != nil {
        fmt.Fprintf(os.Stderr, "Failed to load config: %v\n", err)
        os.Exit(1)
    }

    // 执行操作
    switch *action {
    case "create":
        err = createContainer(config, *bundle)
    case "start":
        err = startContainer(config, *bundle)
    case "run":
        err = runContainer(config, *bundle)
    case "delete":
        err = deleteContainer(*bundle)
    default:
        fmt.Fprintf(os.Stderr, "Unknown action: %s\n", *action)
        os.Exit(1)
    }

    if err != nil {
        fmt.Fprintf(os.Stderr, "Operation failed: %v\n", err)
        os.Exit(1)
    }
}

// loadOCIConfig 加载 OCI 配置
func loadOCIConfig(configPath string) (*OCIConfig, error) {
    data, err := ioutil.ReadFile(configPath)
    if err != nil {
        return nil, fmt.Errorf("failed to read config file: %v", err)
    }

    var config OCIConfig
    if err := json.Unmarshal(data, &config); err != nil {
        return nil, fmt.Errorf("failed to parse config: %v", err)
    }

    return &config, nil
}

// createContainer 创建容器
func createContainer(config *OCIConfig, bundlePath string) error {
    // 创建容器目录
    containerDir := filepath.Join(bundlePath, "container")
    if err := os.MkdirAll(containerDir, 0755); err != nil {
        return fmt.Errorf("failed to create container directory: %v", err)
    }

    // 创建状态文件
    stateFile := filepath.Join(containerDir, "state.json")
    state := map[string]interface{}{
        "ociVersion": config.OCIVersion,
        "id":         "container-123",
        "status":     "created",
        "bundle":     bundlePath,
        "created":    "2023-01-01T00:00:00Z",
    }

    stateData, err := json.MarshalIndent(state, "", "  ")
    if err != nil {
        return fmt.Errorf("failed to marshal state: %v", err)
    }

    if err := ioutil.WriteFile(stateFile, stateData, 0644); err != nil {
        return fmt.Errorf("failed to write state file: %v", err)
    }

    fmt.Println("Container created successfully")
    return nil
}

// startContainer 启动容器
func startContainer(config *OCIConfig, bundlePath string) error {
    // 创建所有 namespace 的 flags
    flags := syscall.CLONE_NEWUTS |
             syscall.CLONE_NEWPID |
             syscall.CLONE_NEWNS |
             syscall.CLONE_NEWNET |
             syscall.CLONE_NEWIPC |
             syscall.CLONE_NEWUSER |
             syscall.CLONE_NEWCGROUP

    // 准备子进程命令
    cmd := exec.Command("/proc/self/exe", "-bundle", bundlePath, "-action", "child")
    
    // 设置系统调用属性
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: flags,
    }

    // 设置标准输入输出
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    // 启动子进程
    if err := cmd.Start(); err != nil {
        return fmt.Errorf("failed to start child process: %v", err)
    }

    // 等待子进程完成
    return cmd.Wait()
}

// runContainer 运行容器
func runContainer(config *OCIConfig, bundlePath string) error {
    // 创建容器
    if err := createContainer(config, bundlePath); err != nil {
        return err
    }

    // 启动容器
    return startContainer(config, bundlePath)
}

// deleteContainer 删除容器
func deleteContainer(bundlePath string) error {
    containerDir := filepath.Join(bundlePath, "container")
    if err := os.RemoveAll(containerDir); err != nil {
        return fmt.Errorf("failed to remove container directory: %v", err)
    }

    fmt.Println("Container deleted successfully")
    return nil
}

// child 子进程入口点
func child(config *OCIConfig, bundlePath string) error {
    // 设置主机名
    if err := unix.Sethostname([]byte(config.Hostname)); err != nil {
        return fmt.Errorf("failed to set hostname: %v", err)
    }

    // 设置工作目录
    if err := os.Chdir(config.Process.Cwd); err != nil {
        return fmt.Errorf("failed to change working directory: %v", err)
    }

    // 挂载文件系统
    for _, mount := range config.Mounts {
        if err := mountFilesystem(mount); err != nil {
            return fmt.Errorf("failed to mount %s: %v", mount.Destination, err)
        }
    }

    // 切换根目录
    rootfsPath := filepath.Join(bundlePath, config.Root.Path)
    if err := pivotRoot(rootfsPath); err != nil {
        return fmt.Errorf("failed to pivot root: %v", err)
    }

    // 执行命令
    if err := syscall.Exec(config.Process.Args[0], config.Process.Args, config.Process.Env); err != nil {
        return fmt.Errorf("failed to exec command: %v", err)
    }

    return nil
}

// mountFilesystem 挂载文件系统
func mountFilesystem(mount struct {
    Destination string   `json:"destination"`
    Type        string   `json:"type"`
    Source      string   `json:"source"`
    Options     []string `json:"options"`
}) error {
    // 创建目标目录
    if err := os.MkdirAll(mount.Destination, 0755); err != nil {
        return fmt.Errorf("failed to create mount point: %v", err)
    }

    // 解析挂载选项
    var flags uintptr
    for _, option := range mount.Options {
        switch option {
        case "ro":
            flags |= unix.MS_RDONLY
        case "noexec":
            flags |= unix.MS_NOEXEC
        case "nosuid":
            flags |= unix.MS_NOSUID
        case "nodev":
            flags |= unix.MS_NODEV
        }
    }

    // 执行挂载
    if err := unix.Mount(mount.Source, mount.Destination, mount.Type, flags, ""); err != nil {
        return fmt.Errorf("failed to mount: %v", err)
    }

    return nil
}

// pivotRoot 切换根目录
func pivotRoot(newroot string) error {
    // 绑定挂载 newroot
    if err := unix.Mount(newroot, newroot, "", unix.MS_BIND|unix.MS_REC, ""); err != nil {
        return fmt.Errorf("failed to bind mount newroot: %v", err)
    }

    // 创建 put_old 目录
    putold := "/.oldroot"
    if err := os.MkdirAll(putold, 0700); err != nil {
        return fmt.Errorf("failed to create put_old directory: %v", err)
    }

    // 执行 pivot_root
    if err := unix.PivotRoot(newroot, putold); err != nil {
        return fmt.Errorf("failed to pivot_root: %v", err)
    }

    // 切换到新根目录
    if err := os.Chdir("/"); err != nil {
        return fmt.Errorf("failed to change working directory: %v", err)
    }

    // 卸载原根目录
    if err := unix.Unmount(putold, unix.MNT_DETACH); err != nil {
        return fmt.Errorf("failed to unmount old root: %v", err)
    }

    // 删除 put_old 目录
    if err := os.RemoveAll(putold); err != nil {
        return fmt.Errorf("failed to remove put_old directory: %v", err)
    }

    return nil
}

六、验证检查清单

基础理解

  • [ ] 理解 OCI 规范的核心内容
  • [ ] 掌握 OCI Runtime 和 Image 规范
  • [ ] 了解容器标准化的重要性
  • [ ] 掌握 OCI 工具链的使用

实践能力

  • [ ] 能够创建 OCI 兼容的配置
  • [ ] 能够实现 OCI 运行时
  • [ ] 能够使用 OCI 工具
  • [ ] 能够进行 OCI 镜像操作

高级技能

  • [ ] 能够实现完整的 OCI 运行时
  • [ ] 能够处理 OCI 镜像
  • [ ] 能够进行 OCI 标准化
  • [ ] 能够集成 OCI 工具链

实战实现

完整的 OCI 实现

根据实际开发经验,以下是完整的 OCI 规范实现代码:

1. OCI 镜像构建器

package main

import (
    "archive/tar"
    "compress/gzip"
    "crypto/sha256"
    "encoding/json"
    "fmt"
    "io"
    "os"
    "path/filepath"
    "time"
    
    "github.com/opencontainers/image-spec/specs-go/v1"
)

type OCIImageBuilder struct {
    baseDir    string
    layers     []LayerInfo
    config     *v1.Image
    manifest   *v1.Manifest
    index      *v1.Index
}

type LayerInfo struct {
    Path     string
    Digest   string
    Size     int64
    MediaType string
}

func NewOCIImageBuilder(baseDir string) *OCIImageBuilder {
    return &OCIImageBuilder{
        baseDir: baseDir,
        layers:  make([]LayerInfo, 0),
        config:  &v1.Image{},
    }
}

func (b *OCIImageBuilder) AddLayer(layerPath string) error {
    // 计算层摘要
    digest, size, err := b.calculateDigest(layerPath)
    if err != nil {
        return fmt.Errorf("failed to calculate digest: %v", err)
    }
    
    // 压缩层
    compressedPath, err := b.compressLayer(layerPath)
    if err != nil {
        return fmt.Errorf("failed to compress layer: %v", err)
    }
    
    // 更新压缩后的文件大小
    if stat, err := os.Stat(compressedPath); err == nil {
        size = stat.Size()
    }
    
    layer := LayerInfo{
        Path:       compressedPath,
        Digest:     digest,
        Size:       size,
        MediaType:  "application/vnd.oci.image.layer.v1.tar+gzip",
    }
    
    b.layers = append(b.layers, layer)
    return nil
}

func (b *OCIImageBuilder) calculateDigest(filePath string) (string, int64, error) {
    file, err := os.Open(filePath)
    if err != nil {
        return "", 0, err
    }
    defer file.Close()
    
    hasher := sha256.New()
    size, err := io.Copy(hasher, file)
    if err != nil {
        return "", 0, err
    }
    
    digest := fmt.Sprintf("sha256:%x", hasher.Sum(nil))
    return digest, size, nil
}

func (b *OCIImageBuilder) compressLayer(layerPath string) (string, error) {
    compressedPath := layerPath + ".tar.gz"
    
    // 创建压缩文件
    outFile, err := os.Create(compressedPath)
    if err != nil {
        return "", err
    }
    defer outFile.Close()
    
    gzWriter := gzip.NewWriter(outFile)
    defer gzWriter.Close()
    
    tarWriter := tar.NewWriter(gzWriter)
    defer tarWriter.Close()
    
    // 遍历目录并添加到 tar
    return compressedPath, filepath.Walk(layerPath, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        
        // 创建 tar 头
        header, err := tar.FileInfoHeader(info, "")
        if err != nil {
            return err
        }
        
        // 更新路径
        relPath, err := filepath.Rel(layerPath, path)
        if err != nil {
            return err
        }
        header.Name = relPath
        
        // 写入头
        if err := tarWriter.WriteHeader(header); err != nil {
            return err
        }
        
        // 如果是文件,写入内容
        if !info.IsDir() {
            file, err := os.Open(path)
            if err != nil {
                return err
            }
            defer file.Close()
            
            _, err = io.Copy(tarWriter, file)
            return err
        }
        
        return nil
    })
}

2. 镜像配置生成

func (b *OCIImageBuilder) BuildConfig(architecture, os string) error {
    // 设置基础配置
    b.config.Architecture = architecture
    b.config.OS = os
    b.config.Created = &v1.Time{Time: time.Now()}
    
    // 设置根文件系统
    b.config.RootFS = v1.RootFS{
        Type:    "layers",
        DiffIDs: make([]v1.Hash, len(b.layers)),
    }
    
    // 计算每层的 diff ID
    for i, layer := range b.layers {
        diffID, err := b.calculateDiffID(layer.Path)
        if err != nil {
            return fmt.Errorf("failed to calculate diff ID for layer %d: %v", i, err)
        }
        b.config.RootFS.DiffIDs[i] = diffID
    }
    
    // 设置历史记录
    b.config.History = make([]v1.History, len(b.layers))
    for i, layer := range b.layers {
        b.config.History[i] = v1.History{
            Created:    &v1.Time{Time: time.Now()},
            CreatedBy:  fmt.Sprintf("layer %d", i),
            EmptyLayer: false,
        }
    }
    
    return nil
}

func (b *OCIImageBuilder) calculateDiffID(layerPath string) (v1.Hash, error) {
    file, err := os.Open(layerPath)
    if err != nil {
        return v1.Hash{}, err
    }
    defer file.Close()
    
    // 解压缩并计算摘要
    gzReader, err := gzip.NewReader(file)
    if err != nil {
        return v1.Hash{}, err
    }
    defer gzReader.Close()
    
    hasher := sha256.New()
    _, err = io.Copy(hasher, gzReader)
    if err != nil {
        return v1.Hash{}, err
    }
    
    return v1.Hash{
        Algorithm: "sha256",
        Hex:       fmt.Sprintf("%x", hasher.Sum(nil)),
    }, nil
}

3. 清单生成

func (b *OCIImageBuilder) BuildManifest() error {
    // 生成配置摘要
    configDigest, configSize, err := b.saveConfig()
    if err != nil {
        return fmt.Errorf("failed to save config: %v", err)
    }
    
    // 创建清单
    b.manifest = &v1.Manifest{
        Versioned: v1.Versioned{
            SchemaVersion: 2,
        },
        MediaType: "application/vnd.oci.image.manifest.v1+json",
        Config: v1.Descriptor{
            MediaType: "application/vnd.oci.image.config.v1+json",
            Digest:    configDigest,
            Size:      configSize,
        },
        Layers: make([]v1.Descriptor, len(b.layers)),
    }
    
    // 添加层描述符
    for i, layer := range b.layers {
        b.manifest.Layers[i] = v1.Descriptor{
            MediaType: layer.MediaType,
            Digest:    v1.Hash{Algorithm: "sha256", Hex: layer.Digest[7:]}, // 去掉 "sha256:" 前缀
            Size:      layer.Size,
        }
    }
    
    return nil
}

func (b *OCIImageBuilder) saveConfig() (v1.Hash, int64, error) {
    configPath := filepath.Join(b.baseDir, "config.json")
    
    // 序列化配置
    configData, err := json.MarshalIndent(b.config, "", "  ")
    if err != nil {
        return v1.Hash{}, 0, err
    }
    
    // 保存配置
    if err := os.WriteFile(configPath, configData, 0644); err != nil {
        return v1.Hash{}, 0, err
    }
    
    // 计算摘要
    hasher := sha256.New()
    hasher.Write(configData)
    digest := v1.Hash{
        Algorithm: "sha256",
        Hex:       fmt.Sprintf("%x", hasher.Sum(nil)),
    }
    
    return digest, int64(len(configData)), nil
}

4. 索引生成

func (b *OCIImageBuilder) BuildIndex() error {
    // 生成清单摘要
    manifestDigest, manifestSize, err := b.saveManifest()
    if err != nil {
        return fmt.Errorf("failed to save manifest: %v", err)
    }
    
    // 创建索引
    b.index = &v1.Index{
        Versioned: v1.Versioned{
            SchemaVersion: 2,
        },
        MediaType: "application/vnd.oci.image.index.v1+json",
        Manifests: []v1.Descriptor{
            {
                MediaType: "application/vnd.oci.image.manifest.v1+json",
                Digest:    manifestDigest,
                Size:      manifestSize,
                Platform: &v1.Platform{
                    Architecture: b.config.Architecture,
                    OS:           b.config.OS,
                },
            },
        },
    }
    
    return nil
}

func (b *OCIImageBuilder) saveManifest() (v1.Hash, int64, error) {
    manifestPath := filepath.Join(b.baseDir, "manifest.json")
    
    // 序列化清单
    manifestData, err := json.MarshalIndent(b.manifest, "", "  ")
    if err != nil {
        return v1.Hash{}, 0, err
    }
    
    // 保存清单
    if err := os.WriteFile(manifestPath, manifestData, 0644); err != nil {
        return v1.Hash{}, 0, err
    }
    
    // 计算摘要
    hasher := sha256.New()
    hasher.Write(manifestData)
    digest := v1.Hash{
        Algorithm: "sha256",
        Hex:       fmt.Sprintf("%x", hasher.Sum(nil)),
    }
    
    return digest, int64(len(manifestData)), nil
}

5. 镜像保存

func (b *OCIImageBuilder) Save() error {
    // 保存索引
    indexPath := filepath.Join(b.baseDir, "index.json")
    indexData, err := json.MarshalIndent(b.index, "", "  ")
    if err != nil {
        return fmt.Errorf("failed to marshal index: %v", err)
    }
    
    if err := os.WriteFile(indexPath, indexData, 0644); err != nil {
        return fmt.Errorf("failed to save index: %v", err)
    }
    
    // 移动层文件到 blobs 目录
    blobsDir := filepath.Join(b.baseDir, "blobs", "sha256")
    if err := os.MkdirAll(blobsDir, 0755); err != nil {
        return fmt.Errorf("failed to create blobs directory: %v", err)
    }
    
    for _, layer := range b.layers {
        destPath := filepath.Join(blobsDir, layer.Digest[7:]) // 去掉 "sha256:" 前缀
        if err := os.Rename(layer.Path, destPath); err != nil {
            return fmt.Errorf("failed to move layer %s: %v", layer.Digest, err)
        }
    }
    
    return nil
}

OCI 运行时实现

1. 运行时接口

type OCIRuntime struct {
    rootDir    string
    stateDir   string
    configFile string
}

type ContainerState struct {
    ID          string    `json:"id"`
    Status      string    `json:"status"`
    Pid         int       `json:"pid"`
    Bundle      string    `json:"bundle"`
    CreatedAt   time.Time `json:"createdAt"`
    StartedAt   time.Time `json:"startedAt"`
    FinishedAt  time.Time `json:"finishedAt"`
}

func NewOCIRuntime(rootDir, stateDir string) *OCIRuntime {
    return &OCIRuntime{
        rootDir:    rootDir,
        stateDir:   stateDir,
        configFile: "config.json",
    }
}

2. 容器创建

func (r *OCIRuntime) Create(containerID, bundlePath string) error {
    // 验证 bundle 路径
    if _, err := os.Stat(bundlePath); err != nil {
        return fmt.Errorf("bundle path does not exist: %v", err)
    }
    
    // 验证配置文件
    configPath := filepath.Join(bundlePath, r.configFile)
    if _, err := os.Stat(configPath); err != nil {
        return fmt.Errorf("config.json not found: %v", err)
    }
    
    // 创建容器状态
    state := &ContainerState{
        ID:        containerID,
        Status:    "created",
        Bundle:    bundlePath,
        CreatedAt: time.Now(),
    }
    
    // 保存状态
    if err := r.saveState(containerID, state); err != nil {
        return fmt.Errorf("failed to save state: %v", err)
    }
    
    return nil
}

3. 容器启动

func (r *OCIRuntime) Start(containerID string) error {
    // 加载容器状态
    state, err := r.loadState(containerID)
    if err != nil {
        return fmt.Errorf("failed to load state: %v", err)
    }
    
    if state.Status != "created" {
        return fmt.Errorf("container is not in created state")
    }
    
    // 解析配置
    config, err := r.loadConfig(state.Bundle)
    if err != nil {
        return fmt.Errorf("failed to load config: %v", err)
    }
    
    // 启动容器进程
    pid, err := r.startContainer(config)
    if err != nil {
        return fmt.Errorf("failed to start container: %v", err)
    }
    
    // 更新状态
    state.Status = "running"
    state.Pid = pid
    state.StartedAt = time.Now()
    
    if err := r.saveState(containerID, state); err != nil {
        return fmt.Errorf("failed to update state: %v", err)
    }
    
    return nil
}

4. 容器停止

func (r *OCIRuntime) Kill(containerID string, signal syscall.Signal) error {
    // 加载容器状态
    state, err := r.loadState(containerID)
    if err != nil {
        return fmt.Errorf("failed to load state: %v", err)
    }
    
    if state.Status != "running" {
        return fmt.Errorf("container is not running")
    }
    
    // 发送信号
    if err := syscall.Kill(state.Pid, signal); err != nil {
        return fmt.Errorf("failed to send signal: %v", err)
    }
    
    // 等待进程结束
    if signal == syscall.SIGKILL || signal == syscall.SIGTERM {
        // 等待进程结束
        for i := 0; i < 30; i++ { // 最多等待 30 秒
            if err := syscall.Kill(state.Pid, 0); err != nil {
                // 进程已结束
                break
            }
            time.Sleep(1 * time.Second)
        }
    }
    
    // 更新状态
    state.Status = "stopped"
    state.FinishedAt = time.Now()
    
    if err := r.saveState(containerID, state); err != nil {
        return fmt.Errorf("failed to update state: %v", err)
    }
    
    return nil
}

实战练习

练习 1:OCI 镜像构建

  1. 实现基本的 OCI 镜像构建器
  2. 支持多层镜像构建
  3. 生成标准的 OCI 格式

验证步骤:

# 1. 编译程序
go build -o oci-builder main.go

# 2. 构建镜像
./oci-builder build --base-dir=/tmp/oci-image --layers=layer1,layer2,layer3

# 3. 验证镜像格式
ls -la /tmp/oci-image/
# 应该看到 index.json, manifest.json, config.json 和 blobs 目录

# 4. 验证镜像内容
cat /tmp/oci-image/index.json | jq .
cat /tmp/oci-image/manifest.json | jq .

练习 2:OCI 运行时实现

  1. 实现基本的 OCI 运行时
  2. 支持容器创建、启动、停止
  3. 管理容器状态

验证步骤:

# 1. 编译运行时
go build -o oci-runtime main.go

# 2. 创建容器
./oci-runtime create test-container /path/to/bundle

# 3. 启动容器
./oci-runtime start test-container

# 4. 查看容器状态
./oci-runtime state test-container

# 5. 停止容器
./oci-runtime kill test-container SIGTERM

练习 3:OCI 工具集成

  1. 与现有 OCI 工具集成
  2. 支持镜像导入导出
  3. 实现镜像验证

验证步骤:

# 1. 导入镜像
./oci-runtime import --source=docker://nginx:latest --target=nginx-oci

# 2. 验证镜像
./oci-runtime verify nginx-oci

# 3. 导出镜像
./oci-runtime export nginx-oci --output=nginx-oci.tar

# 4. 使用 skopeo 验证
skopeo inspect oci:nginx-oci

性能优化

1. 镜像构建优化

  • 使用并发构建
  • 优化层压缩算法
  • 实现增量构建

2. 运行时优化

  • 优化容器启动时间
  • 减少内存占用
  • 提高并发性能

3. 存储优化

  • 使用内容寻址存储
  • 实现层去重
  • 优化存储布局

相关链接

  • 14-调试技术与工具 - 调试技术详解
  • 16-进阶场景与优化 - 进阶应用
  • 17-环境准备与依赖 - 环境配置

下一步:让我们学习进阶场景与优化,这是容器技术的高级应用!

Prev
14-调试技术与工具
Next
16-进阶场景与优化