<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Kubernetes Blog</title>
    <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/</link>
    <description>The Kubernetes blog is used by the project to communicate new features, community reports, and any news that might be relevant to the Kubernetes community.</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>zh-cn</language>
    <image>
      <url>https://raw.githubusercontent.com/kubernetes/kubernetes/master/logo/logo.png</url>
      <title>The Kubernetes project logo</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/</link>
    </image>
    
    <atom:link href="https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/feed.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Headlamp 2025 年度项目亮点</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2026/01/22/headlamp-in-2025-project-highlights/</link>
      <pubDate>Thu, 22 Jan 2026 10:00:00 +0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2026/01/22/headlamp-in-2025-project-highlights/</guid>
      <description>
        
        
        &lt;!--
title: &#34;Headlamp in 2025: Project Highlights&#34;
date: 2026-01-22T10:00:00+08:00
slug: headlamp-in-2025-project-highlights
author: &gt;
  Evangelos Skopelitis (Microsoft)
--&gt;
&lt;!--
_This announcement is a recap from a post originally [published](https://headlamp.dev/blog/2025/11/13/headlamp-in-2025) on the Headlamp blog._
--&gt;
&lt;p&gt;&lt;strong&gt;本公告是对最初在 Headlamp 博客上&lt;a href=&#34;https://headlamp.dev/blog/2025/11/13/headlamp-in-2025&#34;&gt;发布&lt;/a&gt;的帖子的回顾。&lt;/strong&gt;&lt;/p&gt;
&lt;!--
[Headlamp](https://headlamp.dev/) has come a long way in 2025. The project has continued to grow – reaching more teams across platforms, powering new workflows and integrations through plugins, and seeing increased collaboration from the broader community.
--&gt;
&lt;p&gt;&lt;a href=&#34;https://headlamp.dev/&#34;&gt;Headlamp&lt;/a&gt; 在 2025 年取得了长足的发展。该项目持续成长，覆盖了更多平台和团队；
通过插件机制支持了新的工作流和集成方式；同时也看到了来自更广泛社区的协作不断增强。&lt;/p&gt;
&lt;!--
We wanted to take a moment to share a few updates and highlight how Headlamp has evolved over the past year.
--&gt;
&lt;p&gt;我们想借此机会分享一些最新进展，并重点介绍 Headlamp 在过去一年中的演进与变化。&lt;/p&gt;
&lt;!--
## Updates
--&gt;
&lt;h2 id=&#34;updates&#34;&gt;更新&lt;/h2&gt;
&lt;!--
### Joining Kubernetes SIG UI
--&gt;
&lt;h3 id=&#34;joining-kubernetes-sig-ui&#34;&gt;加入 Kubernetes SIG UI&lt;/h3&gt;
&lt;!--
This year marked a big milestone for the project: Headlamp is now officially part of Kubernetes [SIG UI](https://github.com/kubernetes/community/blob/master/sig-ui/README.md). This move brings roadmap and design discussions even closer to the core Kubernetes community and reinforces Headlamp&#39;s role as a modern, extensible UI for the project.
--&gt;
&lt;p&gt;今年标志着该项目的一个重要里程碑：Headlamp 现已成为 Kubernetes &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-ui/README.md&#34;&gt;SIG UI&lt;/a&gt;
的正式组成部分。此举使路线图和设计讨论更贴近 Kubernetes 核心社区，并强化了 Headlamp 作为该项目现代化、可扩展 UI 的角色。&lt;/p&gt;


    
    &lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
      &lt;iframe allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&#34; allowfullscreen=&#34;allowfullscreen&#34; loading=&#34;eager&#34; referrerpolicy=&#34;strict-origin-when-cross-origin&#34; src=&#34;https://www.youtube.com/embed/Q5xkeoj6JiA?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; title=&#34;YouTube video&#34;
      &gt;&lt;/iframe&gt;
    &lt;/div&gt;

&lt;!--
As part of that, we&#39;ve also been sharing more about making Kubernetes approachable for a wider audience, including an [appearance on Enlightening with Whitney Lee](https://www.youtube.com/watch?v=VFOSyKVOPxs) and a [talk at KCD New York 2025](https://www.youtube.com/watch?v=Q7cbT2UBfE0).
--&gt;
&lt;p&gt;作为其中的一部分，我们还更多地分享了如何让更广泛的受众更容易上手 Kubernetes，
包括在 &lt;a href=&#34;https://www.youtube.com/watch?v=VFOSyKVOPxs&#34;&gt;Enlightening with Whitney Lee&lt;/a&gt;
上的亮相以及在 &lt;a href=&#34;https://www.youtube.com/watch?v=Q7cbT2UBfE0&#34;&gt;KCD New York 2025&lt;/a&gt; 上的演讲。&lt;/p&gt;
&lt;!--
### Linux Foundation mentorship
--&gt;
&lt;h3 id=&#34;linux-foundation-mentorship&#34;&gt;Linux Foundation 导师计划&lt;/h3&gt;
&lt;!--
This year, we were excited to work with several students through the Linux Foundation&#39;s Mentorship program, and our mentees have already left a visible mark on Headlamp:
--&gt;
&lt;p&gt;今年，我们很高兴通过 Linux Foundation 的导师计划与多名学生合作，我们的学员已经在 Headlamp 上留下了明显的印记：&lt;/p&gt;
&lt;!--
- [**Adwait Godbole**](https://github.com/adwait-godbole) built the KEDA plugin, adding a UI in Headlamp to view and manage KEDA resources like ScaledObjects and ScaledJobs.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/adwait-godbole&#34;&gt;&lt;strong&gt;Adwait Godbole&lt;/strong&gt;&lt;/a&gt; 构建了 KEDA 插件，
在 Headlamp 中添加了用于查看和管理 KEDA 资源（如 ScaledObjects 和 ScaledJobs）的 UI。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- [**Dhairya Majmudar**](https://github.com/DhairyaMajmudar) set up an OpenTelemetry-based observability stack for Headlamp, wiring up metrics, logs, and traces so the project is easier to monitor and debug.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/DhairyaMajmudar&#34;&gt;&lt;strong&gt;Dhairya Majmudar&lt;/strong&gt;&lt;/a&gt; 为 Headlamp 设置了基于 OpenTelemetry 的可观测性堆栈，
连接指标、日志和追踪，使项目更易于监控和调试。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- [**Aishwarya Ghatole**](https://www.linkedin.com/in/aishwarya-ghatole-506745231/) led a UX audit of Headlamp plugins, identifying usability issues and proposing design improvements and personas for plugin users.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.linkedin.com/in/aishwarya-ghatole-506745231/&#34;&gt;&lt;strong&gt;Aishwarya Ghatole&lt;/strong&gt;&lt;/a&gt; 领导了 Headlamp 插件的 UX 审计，
识别可用性问题，并提出设计改进和插件用户画像。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- [**Anirban Singha**](https://github.com/SinghaAnirban005) developed the Karpenter plugin, giving Headlamp a focused view into Karpenter autoscaling resources and decisions.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/SinghaAnirban005&#34;&gt;&lt;strong&gt;Anirban Singha&lt;/strong&gt;&lt;/a&gt; 开发了 Karpenter 插件，
为 Headlamp 提供了专注于 Karpenter 自动扩缩容资源和决策的视图。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- [**Aditya Chaudhary**](https://github.com/useradityaa) improved Gateway API support, so you can see networking relationships on the resource map, as well as improved support for many of the new Gateway API resources.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/useradityaa&#34;&gt;&lt;strong&gt;Aditya Chaudhary&lt;/strong&gt;&lt;/a&gt; 改进了 Gateway API 支持，
使你可以在资源地图上查看网络关系，并改进了对许多新的 Gateway API 资源的支持。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- [**Faakhir Zahid**](https://github.com/Faakhir30) completed a way to easily [manage plugin installation](https://headlamp.dev/docs/latest/installation/in-cluster/#plugin-management) with Headlamp deployed in clusters.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/Faakhir30&#34;&gt;&lt;strong&gt;Faakhir Zahid&lt;/strong&gt;&lt;/a&gt; 完成了一种在集群中部署 Headlamp 时
轻松&lt;a href=&#34;https://headlamp.dev/docs/latest/installation/in-cluster/#plugin-management&#34;&gt;管理插件安装&lt;/a&gt;的方法。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- [**Saurav Upadhyay**](https://github.com/upsaurav12) worked on backend caching for Kubernetes API calls, reducing load on the API server and improving performance in Headlamp.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/upsaurav12&#34;&gt;&lt;strong&gt;Saurav Upadhyay&lt;/strong&gt;&lt;/a&gt; 致力于 Kubernetes API 调用的后端缓存，
减少 API 服务器负载并提高 Headlamp 的性能。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## New changes
--&gt;
&lt;h2 id=&#34;new-changes&#34;&gt;新变更&lt;/h2&gt;
&lt;!--
### Multi-cluster view
--&gt;
&lt;h3 id=&#34;multi-cluster-view&#34;&gt;多集群视图&lt;/h3&gt;
&lt;!--
Managing multiple clusters is challenging: teams often switch between tools and lose context when trying to see what runs where. Headlamp solves this by giving you a single view to compare clusters side-by-side. This makes it easier to understand workloads across environments and reduces the time spent hunting for resources.
--&gt;
&lt;p&gt;管理多个集群颇具挑战性：团队经常在不同工具之间切换，在查看哪些工作负载运行在何处时丢失上下文。
Headlamp 通过提供一个可并排比较集群的单一视图来解决这个问题。这使得跨环境理解工作负载变得更容易，
并减少了查找资源所花费的时间。&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2026/01/22/headlamp-in-2025-project-highlights/multi-cluster-view.png&#34;
         alt=&#34;Multi-cluster view&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;View of multi-cluster workloads&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
### Projects
--&gt;
&lt;h3 id=&#34;projects&#34;&gt;项目&lt;/h3&gt;
&lt;!--
Kubernetes apps often span multiple namespaces and resource types, which makes troubleshooting feel like piecing together a puzzle. We&#39;ve added **Projects** to give you an application-centric view that groups related resources across multiple namespaces – and even clusters. This allows you to reduce sprawl, troubleshoot faster, and collaborate without digging through YAML or cluster-wide lists.
--&gt;
&lt;p&gt;Kubernetes 应用通常跨越多个命名空间和资源类型，这使得故障排除感觉像是在拼拼图。
我们添加了&lt;strong&gt;项目（Projects）&lt;/strong&gt;特性，为你提供以应用为中心的视图，将分散在多个命名空间（甚至多个集群）中的相关资源分组在一起。
这使你能够减少资源散乱、更快地排除故障，并且无需翻查 YAML 或集群范围的列表即可开展协作。&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2026/01/22/headlamp-in-2025-project-highlights/projects-feature.png&#34;
         alt=&#34;Projects feature&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;View of the new Projects feature&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
Changes:
--&gt;
&lt;p&gt;变更：&lt;/p&gt;
&lt;!--
- New &#34;Projects&#34; feature for grouping namespaces into app- or team-centric projects
--&gt;
&lt;ul&gt;
&lt;li&gt;新的&amp;quot;项目（Projects）&amp;quot;特性，用于将命名空间分组为以应用或团队为中心的项目&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Extensible Projects details view that plugins can customize with their own tabs and actions
--&gt;
&lt;ul&gt;
&lt;li&gt;可扩展的项目详细信息视图，插件可以使用自己的标签页和操作进行自定义&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
### Navigation and Activities
--&gt;
&lt;h3 id=&#34;navigation-and-activities&#34;&gt;导航和活动&lt;/h3&gt;
&lt;!--
Day-to-day ops in Kubernetes often means juggling logs, terminals, YAML, and dashboards across clusters. We redesigned Headlamp&#39;s navigation to treat these as first-class &#34;activities&#34; you can keep open and come back to, instead of one-off views you lose as soon as you click away.
--&gt;
&lt;p&gt;Kubernetes 中的日常运维通常意味着在多个集群之间周旋于日志、终端、YAML 和仪表板。
我们重新设计了 Headlamp 的导航，将这些视为可以保持打开、随时返回的头等&amp;quot;活动&amp;quot;，
而不是一经点击离开就会丢失的一次性视图。&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2026/01/22/headlamp-in-2025-project-highlights/new-task-bar.png&#34;
         alt=&#34;New task bar&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;View of the new task bar&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
Changes:
--&gt;
&lt;p&gt;变更：&lt;/p&gt;
&lt;!--
- A new task bar/activities model lets you pin logs, exec sessions, and details as ongoing activities
--&gt;
&lt;ul&gt;
&lt;li&gt;新的任务栏/活动模型允许你将日志、exec 会话和详细信息固定为正在进行的活动&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- An activity overview with a &#34;Close all&#34; action and cluster information
--&gt;
&lt;ul&gt;
&lt;li&gt;活动概览，带有&amp;quot;全部关闭&amp;quot;操作和集群信息&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Multi-select and global filters in tables
--&gt;
&lt;ul&gt;
&lt;li&gt;表格中的多选和全局过滤器&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Thanks to [Jan Jansen](https://github.com/farodin91) and [Aditya Chaudhary](https://github.com/useradityaa).
--&gt;
&lt;p&gt;感谢 &lt;a href=&#34;https://github.com/farodin91&#34;&gt;Jan Jansen&lt;/a&gt; 和 &lt;a href=&#34;https://github.com/useradityaa&#34;&gt;Aditya Chaudhary&lt;/a&gt;。&lt;/p&gt;
&lt;!--
### Search and map
--&gt;
&lt;h3 id=&#34;search-and-map&#34;&gt;搜索和地图&lt;/h3&gt;
&lt;!--
When something breaks in production, the first two questions are usually &#34;where is it?&#34; and &#34;what is it connected to?&#34; We&#39;ve upgraded both search and the map view so you can get from a high-level symptom to the right set of objects much faster.
--&gt;
&lt;p&gt;当生产环境出现问题时，最先要回答的两个问题通常是&amp;quot;它在哪里？&amp;quot;和&amp;quot;它连接到什么？&amp;quot;我们升级了搜索和地图视图，
以便你可以更快地从总体症状定位到正确的对象集。&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2026/01/22/headlamp-in-2025-project-highlights/advanced-search.png&#34;
         alt=&#34;Advanced search&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;View of the new Advanced Search feature&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
Changes:
--&gt;
&lt;p&gt;变更：&lt;/p&gt;
&lt;!--
- An Advanced search view that supports rich, expression-based queries over Kubernetes objects
--&gt;
&lt;ul&gt;
&lt;li&gt;高级搜索视图，支持对 Kubernetes 对象进行丰富的、基于表达式的查询&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Improved global search that understands labels and multiple search items, and can even update your current namespace based on what you find
--&gt;
&lt;ul&gt;
&lt;li&gt;改进的全局搜索，理解标签和多个搜索项，甚至可以根据你找到的内容更新当前命名空间&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- EndpointSlice support in the Network section
--&gt;
&lt;ul&gt;
&lt;li&gt;网络部分中的 EndpointSlice 支持&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- A richer map view that now includes Custom Resources and Gateway API objects
--&gt;
&lt;ul&gt;
&lt;li&gt;更丰富的地图视图，现在包含自定义资源和 Gateway API 对象&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Thanks to [Fabian](https://github.com/faebr), [Alexander North](https://github.com/alexandernorth), and [Victor Marcolino](https://github.com/victormarcolino) from Swisscom, and also to [Aditya Chaudhary](https://github.com/useradityaa).
--&gt;
&lt;p&gt;感谢来自 Swisscom 的 &lt;a href=&#34;https://github.com/faebr&#34;&gt;Fabian&lt;/a&gt;、&lt;a href=&#34;https://github.com/alexandernorth&#34;&gt;Alexander North&lt;/a&gt;
和 &lt;a href=&#34;https://github.com/victormarcolino&#34;&gt;Victor Marcolino&lt;/a&gt;，以及 &lt;a href=&#34;https://github.com/useradityaa&#34;&gt;Aditya Chaudhary&lt;/a&gt;。&lt;/p&gt;
&lt;!--
### OIDC and authentication
--&gt;
&lt;h3 id=&#34;oidc-and-authentication&#34;&gt;OIDC 和身份认证&lt;/h3&gt;
&lt;!--
We&#39;ve put real work into making OIDC setup clearer and more resilient, especially for in-cluster deployments.
--&gt;
&lt;p&gt;我们投入了切实的工作来使 OIDC 设置更清晰、更健壮，特别是对于集群内部署。&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2026/01/22/headlamp-in-2025-project-highlights/user-info.png&#34;
         alt=&#34;User info&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;View of user information for OIDC clusters&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
Changes:
--&gt;
&lt;p&gt;变更：&lt;/p&gt;
&lt;!--
- User information displayed in the top bar for OIDC-authenticated users
--&gt;
&lt;ul&gt;
&lt;li&gt;在顶部栏中为 OIDC 认证用户显示用户信息&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- PKCE support for more secure authentication flows, as well as hardened token refresh handling
--&gt;
&lt;ul&gt;
&lt;li&gt;支持 PKCE 以实现更安全的身份认证流程，并强化了令牌刷新处理&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Documentation for using the access token using `-oidc-use-access-token=true`
--&gt;
&lt;ul&gt;
&lt;li&gt;关于通过 &lt;code&gt;-oidc-use-access-token=true&lt;/code&gt; 使用访问令牌的文档&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Improved support for public OIDC clients like AKS and EKS
--&gt;
&lt;ul&gt;
&lt;li&gt;改进了对 AKS 和 EKS 等公共 OIDC 客户端的支持&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- New guide for setting up Headlamp [on AKS with Azure Entra-ID using OAuth2Proxy](https://headlamp.dev/docs/latest/installation/in-cluster/aks-cluster-oauth/)
--&gt;
&lt;ul&gt;
&lt;li&gt;关于在 AKS 上通过 OAuth2Proxy 使用 Azure Entra-ID 设置 Headlamp 的&lt;a href=&#34;https://headlamp.dev/docs/latest/installation/in-cluster/aks-cluster-oauth/&#34;&gt;新指南&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
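&lt;p&gt;下面给出一个在集群内部署时通过 Helm values 配置 OIDC 的简要示意。这只是一个假设性示例：其中的字段名（如 &lt;code&gt;config.oidc.clientID&lt;/code&gt;、&lt;code&gt;config.extraArgs&lt;/code&gt;）仅作演示，具体请以 Headlamp 官方 Helm chart 文档为准：&lt;/p&gt;

```yaml
# 假设性示例：Headlamp Helm chart 中与 OIDC 相关的 values 片段
# （字段名仅作示意，实际字段以官方 chart 为准）
config:
  oidc:
    clientID: "headlamp"
    clientSecret: "example-secret"
    issuerURL: "https://idp.example.com"
    scopes: "openid,profile,email"
  # 对应上文提到的 -oidc-use-access-token=true 标志
  extraArgs:
    - "-oidc-use-access-token=true"
```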
&lt;!--
Thanks to [David Dobmeier](https://github.com/daviddob) and [Harsh Srivastava](https://github.com/HarshSrivastava275).
--&gt;
&lt;p&gt;感谢 &lt;a href=&#34;https://github.com/daviddob&#34;&gt;David Dobmeier&lt;/a&gt; 和 &lt;a href=&#34;https://github.com/HarshSrivastava275&#34;&gt;Harsh Srivastava&lt;/a&gt;。&lt;/p&gt;
&lt;!--
### App Catalog and Helm
--&gt;
&lt;h3 id=&#34;app-catalog-and-helm&#34;&gt;应用目录和 Helm&lt;/h3&gt;
&lt;!--
We&#39;ve broadened how you deploy and source apps via Headlamp, specifically supporting vanilla Helm repos.
--&gt;
&lt;p&gt;我们扩展了通过 Headlamp 部署和获取应用的方式，特别是支持原生 Helm 仓库。&lt;/p&gt;
&lt;!--
Changes:
--&gt;
&lt;p&gt;变更：&lt;/p&gt;
&lt;!--
- A more capable Helm chart with optional backend TLS termination, PodDisruptionBudgets, custom pod labels, and more
--&gt;
&lt;ul&gt;
&lt;li&gt;功能更强大的 Helm chart，具有可选的后端 TLS 终止、PodDisruptionBudgets、自定义 Pod 标签等&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Improved formatting and added missing access token arg in the Helm chart
--&gt;
&lt;ul&gt;
&lt;li&gt;改进了 Helm chart 中的格式并添加了缺失的访问令牌参数&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- New in-cluster Helm support with an `--enable-helm` flag and a service proxy
--&gt;
&lt;ul&gt;
&lt;li&gt;新的集群内 Helm 支持，带有 &lt;code&gt;--enable-helm&lt;/code&gt; 标志和服务代理&lt;/li&gt;
&lt;/ul&gt;
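&lt;p&gt;作为参考，下面是一个启用集群内 Helm 支持的 chart values 片段示意。这同样是假设性示例：&lt;code&gt;config.extraArgs&lt;/code&gt; 字段名仅作演示，用于将上文提到的 &lt;code&gt;--enable-helm&lt;/code&gt; 标志传递给 Headlamp 后端，具体配置方式请以官方 Helm chart 文档为准：&lt;/p&gt;

```yaml
# 假设性示例：通过额外参数启用集群内 Helm 支持
# （字段名仅作示意，实际字段以官方 chart 为准）
config:
  extraArgs:
    - "--enable-helm"
```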
&lt;!--
Thanks to [Vrushali Shah](https://github.com/shahvrushali22) and [Murali Annamneni](https://github.com/muraliinformal) from Oracle, and also to [Pat Riehecky](https://github.com/jcpunk), [Joshua Akers](https://github.com/jda258), [Rostislav Stříbrný](https://github.com/rstribrn), [Rick L](https://github.com/rickliujh), and [Victor](https://github.com/vnea).
--&gt;
&lt;p&gt;感谢来自 Oracle 的 &lt;a href=&#34;https://github.com/shahvrushali22&#34;&gt;Vrushali Shah&lt;/a&gt; 和 &lt;a href=&#34;https://github.com/muraliinformal&#34;&gt;Murali Annamneni&lt;/a&gt;，
以及 &lt;a href=&#34;https://github.com/jcpunk&#34;&gt;Pat Riehecky&lt;/a&gt;、&lt;a href=&#34;https://github.com/jda258&#34;&gt;Joshua Akers&lt;/a&gt;、
&lt;a href=&#34;https://github.com/rstribrn&#34;&gt;Rostislav Stříbrný&lt;/a&gt;、&lt;a href=&#34;https://github.com/rickliujh&#34;&gt;Rick L&lt;/a&gt; 和 &lt;a href=&#34;https://github.com/vnea&#34;&gt;Victor&lt;/a&gt;。&lt;/p&gt;
&lt;!--
### Performance, accessibility, and UX
--&gt;
&lt;h3 id=&#34;performance-accessibility-and-ux&#34;&gt;性能、可访问性和用户体验&lt;/h3&gt;
&lt;!--
Finally, we&#39;ve spent a lot of time on the things you notice every day but don&#39;t always make headlines: startup time, list views, log viewers, accessibility, and small network UX details. A continuous accessibility self-audit has also helped us identify key issues and make Headlamp easier for everyone to use.
--&gt;
&lt;p&gt;最后，我们在那些你每天都会注意到、却不常成为头条的方面花费了大量时间：启动时间、列表视图、日志查看器、可访问性以及细微的网络 UX 细节。
持续的可访问性自我审计也帮助我们发现了关键问题，让 Headlamp 对每个人来说都更易使用。&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2026/01/22/headlamp-in-2025-project-highlights/learn-section.png&#34;
         alt=&#34;Learn section&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;View of the Learn section in docs&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
Changes:
--&gt;
&lt;p&gt;变更：&lt;/p&gt;
&lt;!--
- Significant desktop improvements, with up to 60% faster app loads and much quicker dev-mode reloads for contributors
--&gt;
&lt;ul&gt;
&lt;li&gt;桌面端的显著改进：应用加载速度最高提升 60%，并为贡献者提供快得多的开发模式重载&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Numerous table and log viewer refinements: persistent sort order, consistent row actions, copy-name buttons, better tooltips, and more forgiving log inputs
--&gt;
&lt;ul&gt;
&lt;li&gt;大量表格和日志查看器改进：持久排序顺序、一致的行操作、复制名称按钮、更好的工具提示以及更宽松的日志输入&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Accessibility and localization improvements, including fixes for zoom-related layout issues, better color contrast, improved screen reader support, and expanded language coverage
--&gt;
&lt;ul&gt;
&lt;li&gt;可访问性和本地化改进，包括修复与缩放相关的布局问题、更好的颜色对比度、改进的屏幕阅读器支持以及扩展的语言覆盖范围&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- More control over resources, with live pod CPU/memory metrics, richer pod details, and inline editing for secrets and CRD fields
--&gt;
&lt;ul&gt;
&lt;li&gt;对资源的更多控制，包括实时 Pod CPU/内存指标、更丰富的 Pod 详细信息以及 Secret 和 CRD 字段的内联编辑&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- A refreshed documentation and plugin onboarding experience, including a &#34;Learn&#34; section and plugin showcase
--&gt;
&lt;ul&gt;
&lt;li&gt;焕然一新的文档和插件入门体验，包括&amp;quot;学习（Learn）&amp;quot;部分和插件展示&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- A more complete NetworkPolicy UI and network-related polish
--&gt;
&lt;ul&gt;
&lt;li&gt;更完整的 NetworkPolicy UI 和网络相关的改进&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Nightly builds available for early testing
--&gt;
&lt;ul&gt;
&lt;li&gt;提供夜间构建版本用于早期测试&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Thanks to [Jaehan Byun](https://github.com/jaehanbyun) and [Jan Jansen](https://github.com/farodin91).
--&gt;
&lt;p&gt;感谢 &lt;a href=&#34;https://github.com/jaehanbyun&#34;&gt;Jaehan Byun&lt;/a&gt; 和 &lt;a href=&#34;https://github.com/farodin91&#34;&gt;Jan Jansen&lt;/a&gt;。&lt;/p&gt;
&lt;!--
## Plugins and extensibility
--&gt;
&lt;h2 id=&#34;plugins-and-extensibility&#34;&gt;插件和可扩展性&lt;/h2&gt;
&lt;!--
Discovering plugins is simpler now – no more hopping between Artifact Hub and assorted GitHub repos. Browse our dedicated [Plugins page](https://headlamp.dev/plugins) for a curated catalog of Headlamp-endorsed plugins, along with a showcase of featured plugins.
--&gt;
&lt;p&gt;现在发现插件更简单了——不再需要在 Artifact Hub 和各种 GitHub 仓库之间跳转。浏览我们专门的&lt;a href=&#34;https://headlamp.dev/plugins&#34;&gt;插件页面&lt;/a&gt;，
查看 Headlamp 认可的插件精选目录以及特色插件展示。&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2026/01/22/headlamp-in-2025-project-highlights/plugins-page.png&#34;
         alt=&#34;Plugins page&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;View of the Plugins showcase&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
### Headlamp AI Assistant
--&gt;
&lt;h3 id=&#34;headlamp-ai-assistant&#34;&gt;Headlamp AI 助手&lt;/h3&gt;
&lt;!--
Managing Kubernetes often means memorizing commands and juggling tools. Headlamp&#39;s new AI Assistant changes this by adding a natural-language interface built into the UI. Now, instead of typing `kubectl` or digging through YAML you can ask, &#34;Is my app healthy?&#34; or &#34;Show logs for this deployment,&#34; and get answers in context, speeding up troubleshooting and smoothing onboarding for new users. Learn more about it [here](https://headlamp.dev/blog/2025/08/07/introducing-the-headlamp-ai-assistant/).
--&gt;
&lt;p&gt;管理 Kubernetes 通常意味着记忆命令并在各种工具之间周旋。Headlamp 的新 AI 助手通过在 UI 中内置自然语言界面改变了这一点。
现在，你无需输入 &lt;code&gt;kubectl&lt;/code&gt; 命令或翻查 YAML，只需提问&amp;quot;我的应用是否健康？&amp;quot;或&amp;quot;显示此部署的日志&amp;quot;，即可在上下文中获得答案，
从而加快故障排除并让新用户更容易上手。&lt;a href=&#34;https://headlamp.dev/blog/2025/08/07/introducing-the-headlamp-ai-assistant/&#34;&gt;在此&lt;/a&gt;了解更多信息。&lt;/p&gt;


    
    &lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
      &lt;iframe allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&#34; allowfullscreen=&#34;allowfullscreen&#34; loading=&#34;eager&#34; referrerpolicy=&#34;strict-origin-when-cross-origin&#34; src=&#34;https://www.youtube.com/embed/GzXkUuCTcd4?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; title=&#34;YouTube video&#34;
      &gt;&lt;/iframe&gt;
    &lt;/div&gt;

&lt;!--
### New plugins additions
--&gt;
&lt;h3 id=&#34;new-plugins-additions&#34;&gt;新增插件&lt;/h3&gt;
&lt;!--
Alongside the new AI Assistant, we&#39;ve been growing Headlamp&#39;s plugin ecosystem so you can bring more of your workflows into a single UI, with integrations like Minikube, Karpenter, and more.
--&gt;
&lt;p&gt;除了新的 AI 助手，我们一直在发展 Headlamp 的插件生态系统，以便你可以将更多工作流集成到单个 UI 中，包括 Minikube、Karpenter 等集成。&lt;/p&gt;
&lt;!--
Highlights from the latest plugin releases:
--&gt;
&lt;p&gt;最新插件发布的亮点：&lt;/p&gt;
&lt;!--
- Minikube plugin, providing a locally stored single node Minikube cluster
--&gt;
&lt;ul&gt;
&lt;li&gt;Minikube 插件，提供本地存储的单节点 Minikube 集群&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Karpenter plugin, with support for Azure Node Auto-Provisioning (NAP)
--&gt;
&lt;ul&gt;
&lt;li&gt;Karpenter 插件，支持 Azure 节点自动预配（NAP）&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- KEDA plugin, which you can learn more about [here](https://headlamp.dev/blog/2025/07/25/enabling-event-driven-autoscaling-with-the-new-keda-plugin-for-headlamp/)
--&gt;
&lt;ul&gt;
&lt;li&gt;KEDA 插件，你可以&lt;a href=&#34;https://headlamp.dev/blog/2025/07/25/enabling-event-driven-autoscaling-with-the-new-keda-plugin-for-headlamp/&#34;&gt;在此&lt;/a&gt;
了解更多信息&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Community-maintained plugins for [Gatekeeper](https://github.com/sozercan/gatekeeper-headlamp-plugin) and [KAITO](https://github.com/kaito-project/headlamp-kaito)
--&gt;
&lt;ul&gt;
&lt;li&gt;社区维护的 &lt;a href=&#34;https://github.com/sozercan/gatekeeper-headlamp-plugin&#34;&gt;Gatekeeper&lt;/a&gt;
和 &lt;a href=&#34;https://github.com/kaito-project/headlamp-kaito&#34;&gt;KAITO&lt;/a&gt; 插件&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Thanks to [Vrushali Shah](https://github.com/shahvrushali22) and [Murali Annamneni](https://github.com/muraliinformal) from Oracle, and also to [Anirban Singha](https://github.com/SinghaAnirban005), [Adwait Godbole](https://github.com/adwait-godbole), [Sertaç Özercan](https://github.com/sozercan), [Ernest Wong](https://github.com/chewong), and [Chloe Lim](https://github.com/chloe608).
--&gt;
&lt;p&gt;感谢来自 Oracle 的 &lt;a href=&#34;https://github.com/shahvrushali22&#34;&gt;Vrushali Shah&lt;/a&gt; 和 &lt;a href=&#34;https://github.com/muraliinformal&#34;&gt;Murali Annamneni&lt;/a&gt;，
以及 &lt;a href=&#34;https://github.com/SinghaAnirban005&#34;&gt;Anirban Singha&lt;/a&gt;、&lt;a href=&#34;https://github.com/adwait-godbole&#34;&gt;Adwait Godbole&lt;/a&gt;、
&lt;a href=&#34;https://github.com/sozercan&#34;&gt;Sertaç Özercan&lt;/a&gt;、&lt;a href=&#34;https://github.com/chewong&#34;&gt;Ernest Wong&lt;/a&gt; 和 &lt;a href=&#34;https://github.com/chloe608&#34;&gt;Chloe Lim&lt;/a&gt;。&lt;/p&gt;
&lt;!--
### Other plugins updates
--&gt;
&lt;h3 id=&#34;other-plugins-updates&#34;&gt;其他插件更新&lt;/h3&gt;
&lt;!--
Alongside new additions, we&#39;ve also spent time refining plugins that many of you already use, focusing on smoother workflows and better integration with the core UI.
--&gt;
&lt;p&gt;除了新增内容，我们还花时间改进了你们许多人已经在使用的插件，专注于更流畅的工作流和与核心 UI 的更好集成。&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2026/01/22/headlamp-in-2025-project-highlights/backstage-plugin.png&#34;
         alt=&#34;Backstage plugin&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;View of the Backstage plugin&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
Changes:
--&gt;
&lt;p&gt;变更：&lt;/p&gt;
&lt;!--
- **Flux plugin**: Updated for Flux v2.7, with support for newer CRDs, navigation fixes so it works smoothly on recent clusters
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Flux 插件&lt;/strong&gt;：已更新以支持 Flux v2.7，支持较新的 CRD，并修复了导航问题，使其能在较新的集群上平稳运行&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- **App Catalog**: Now supports Helm repos in addition to Artifact Hub, can run in-cluster via /serviceproxy, and shows both current and latest app versions
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;应用目录&lt;/strong&gt;：现在除了 Artifact Hub 之外还支持 Helm 仓库，可以通过 /serviceproxy 在集群内运行，并显示当前和最新的应用版本&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- **Plugin Catalog**: Improved card layout and accessibility, plus dependency and Storybook test updates
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;插件目录&lt;/strong&gt;：改进了卡片布局和可访问性，以及依赖项和 Storybook 测试更新&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- **Backstage plugin**: Dependency and build updates, more info [here](https://headlamp.dev/blog/2025/11/05/strengthening-backstage-and-headlamp-integration/)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Backstage 插件&lt;/strong&gt;：依赖项和构建更新，&lt;a href=&#34;https://headlamp.dev/blog/2025/11/05/strengthening-backstage-and-headlamp-integration/&#34;&gt;在此&lt;/a&gt;
了解更多信息&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
### Plugin development
--&gt;
&lt;h3 id=&#34;plugin-development&#34;&gt;插件开发&lt;/h3&gt;
&lt;!--
We&#39;ve focused on making it faster and clearer to build, test, and ship Headlamp plugins, backed by improved documentation and lighter tooling.
--&gt;
&lt;p&gt;我们专注于使构建、测试和发布 Headlamp 插件更快、更清晰，并辅以改进的文档和更轻量的工具。&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2026/01/22/headlamp-in-2025-project-highlights/plugin-development.png&#34;
         alt=&#34;Plugin development&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;View of the Plugin Development guide&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
Changes:
--&gt;
&lt;p&gt;变更：&lt;/p&gt;
&lt;!--
- New and expanded guides for [plugin architecture](https://headlamp.dev/docs/latest/development/architecture#plugins) and [development](https://headlamp.dev/docs/latest/development/plugins/getting-started), including how to publish and ship plugins
--&gt;
&lt;ul&gt;
&lt;li&gt;新增和扩展的&lt;a href=&#34;https://headlamp.dev/docs/latest/development/architecture#plugins&#34;&gt;插件架构&lt;/a&gt;
和&lt;a href=&#34;https://headlamp.dev/docs/latest/development/plugins/getting-started&#34;&gt;开发&lt;/a&gt;指南，包括如何发布和交付插件&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Added [i18n support documentation](https://headlamp.dev/docs/latest/development/plugins/i18n) so plugins can be translated and localized
--&gt;
&lt;ul&gt;
&lt;li&gt;添加了 &lt;a href=&#34;https://headlamp.dev/docs/latest/development/plugins/i18n&#34;&gt;i18n 支持文档&lt;/a&gt;，以便插件可以被翻译和本地化&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Added example plugins: [ui-panels](https://github.com/kubernetes-sigs/headlamp/tree/main/plugins/examples/ui-panels), [resource-charts](https://github.com/kubernetes-sigs/headlamp/tree/main/plugins/examples/resource-charts), [custom-theme](https://github.com/kubernetes-sigs/headlamp/tree/main/plugins/examples/custom-theme), and [projects](https://github.com/kubernetes-sigs/headlamp/tree/main/plugins/examples/projects)
--&gt;
&lt;ul&gt;
&lt;li&gt;添加了示例插件：&lt;a href=&#34;https://github.com/kubernetes-sigs/headlamp/tree/main/plugins/examples/ui-panels&#34;&gt;ui-panels&lt;/a&gt;、
&lt;a href=&#34;https://github.com/kubernetes-sigs/headlamp/tree/main/plugins/examples/resource-charts&#34;&gt;resource-charts&lt;/a&gt;、
&lt;a href=&#34;https://github.com/kubernetes-sigs/headlamp/tree/main/plugins/examples/custom-theme&#34;&gt;custom-theme&lt;/a&gt;
和&lt;a href=&#34;https://github.com/kubernetes-sigs/headlamp/tree/main/plugins/examples/projects&#34;&gt;projects&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Improved type checking for Headlamp APIs, restored Storybook support for component testing, and reduced dependencies for faster installs and fewer updates
--&gt;
&lt;ul&gt;
&lt;li&gt;改进了 Headlamp API 的类型检查，恢复了用于组件测试的 Storybook 支持，并减少了依赖项以加快安装速度并减少更新&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Documented plugin install locations, UI signifiers in Plugin Settings, and labels that differentiated shipped, UI-installed, and dev-mode plugins
--&gt;
&lt;ul&gt;
&lt;li&gt;记录了插件安装位置、插件设置中的 UI 标识，以及用于区分随应用交付、通过 UI 安装和开发模式安装的插件的标签&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Security upgrades
--&gt;
&lt;h2 id=&#34;security-upgrades&#34;&gt;安全升级&lt;/h2&gt;
&lt;!--
We&#39;ve also been investing in keeping Headlamp secure – both by tightening how authentication works and by staying on top of upstream vulnerabilities and tooling.
--&gt;
&lt;p&gt;我们还在持续投入以保持 Headlamp 的安全性——一方面加强身份认证机制，另一方面密切跟进上游漏洞和工具链的更新。&lt;/p&gt;
&lt;!--
Updates:
--&gt;
&lt;p&gt;更新：&lt;/p&gt;
&lt;!--
- We&#39;ve been keeping up with security updates, regularly updating dependencies and addressing upstream security issues.
--&gt;
&lt;ul&gt;
&lt;li&gt;我们一直在跟进安全更新，定期更新依赖项并解决上游安全问题。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- We tightened the Helm chart&#39;s default security context and fixed a regression that broke the plugin manager.
--&gt;
&lt;ul&gt;
&lt;li&gt;我们加强了 Helm chart 的默认安全上下文，并修复了破坏插件管理器的回归问题。&lt;/li&gt;
&lt;/ul&gt;
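&lt;p&gt;作为参考，收紧后的默认安全上下文通常类似如下形式（以下为示意性的 values 片段，字段名来自 Kubernetes SecurityContext API，并非 Headlamp chart 的实际默认值）：&lt;/p&gt;

```yaml
# 示意：常见的容器安全上下文加固设置（仅为通用示例，非 Headlamp chart 实际默认值）
securityContext:
  runAsNonRoot: true            # 禁止以 root 用户运行
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true  # 根文件系统只读
  capabilities:
    drop:
    - ALL                       # 丢弃所有 Linux capability
```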
&lt;!--
- We&#39;ve improved OIDC security with PKCE support, helping unblock more secure and standards-compliant OIDC setups when deploying Headlamp in-cluster.
--&gt;
&lt;ul&gt;
&lt;li&gt;我们通过支持 PKCE 改进了 OIDC 安全性，有助于在集群内部署 Headlamp 时实现更安全、更符合标准的 OIDC 配置。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Conclusion
--&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;结论&lt;/h2&gt;
&lt;!--
Thank you to everyone who has contributed to Headlamp this year – whether through pull requests, plugins, or simply sharing how you&#39;re using the project. Seeing the different ways teams are adopting and extending the project is a big part of what keeps us moving forward. If your organization uses Headlamp, consider adding it to our [adopters list](https://github.com/kubernetes-sigs/headlamp/blob/main/ADOPTERS.md).
--&gt;
&lt;p&gt;感谢今年为 Headlamp 做出贡献的每个人——无论是通过合并请求、插件，还是简单地分享你如何使用该项目。
看到团队采用和扩展该项目的不同方式是我们继续前进的重要动力。如果你的组织使用 Headlamp，
请考虑将其添加到我们的&lt;a href=&#34;https://github.com/kubernetes-sigs/headlamp/blob/main/ADOPTERS.md&#34;&gt;采用者列表&lt;/a&gt;中。&lt;/p&gt;
&lt;!--
If you haven&#39;t tried Headlamp recently, all these updates are available today. Check out the latest Headlamp release, explore the new views, plugins, and docs, and share your feedback with us on Slack or GitHub – your feedback helps shape where Headlamp goes next
--&gt;
&lt;p&gt;如果你最近还没有尝试过 Headlamp，所有这些更新今天都可以使用。查看最新的 Headlamp 版本，探索新的视图、插件和文档，
并在 Slack 或 GitHub 上与我们分享你的反馈——你的反馈有助于塑造 Headlamp 的未来发展方向。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.35：云控制器管理器中的基于监视的路由协调</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2026/01/08/kubernetes-v1-35-watch-based-route-reconciliation-in-ccm/</link>
      <pubDate>Thu, 08 Jan 2026 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2026/01/08/kubernetes-v1-35-watch-based-route-reconciliation-in-ccm/</guid>
      <description>
        
        
        &lt;!--
---
layout: blog
title: &#34;Kubernetes v1.35: Watch Based Route Reconciliation in the Cloud Controller Manager&#34;
date: 2026-01-08T10:30:00-08:00
slug: kubernetes-v1-35-watch-based-route-reconciliation-in-ccm
author: &gt;
  [Lukas Metzner](https://github.com/lukasmetzner) (Hetzner)
---
--&gt;
&lt;!--
Up to and including Kubernetes v1.34,
the route controller in Cloud Controller Manager (CCM)
implementations built using the
[k8s.io/cloud-provider](https://github.com/kubernetes/cloud-provider)
library reconciles routes at a fixed interval.
This causes unnecessary API requests to the cloud provider when
there are no changes to routes. Other controllers implemented
through the same library already use watch-based mechanisms,
leveraging informers to avoid unnecessary API calls.
A new feature gate is being introduced in v1.35 to allow
changing the behavior of the route controller to use watch-based informers.
--&gt;
&lt;p&gt;在 Kubernetes v1.34 及更早版本中，使用
&lt;a href=&#34;https://github.com/kubernetes/cloud-provider&#34;&gt;k8s.io/cloud-provider&lt;/a&gt;
库构建的云控制器管理器（CCM）实现中的路由控制器会以固定的时间间隔进行路由协调。
这会导致在路由没有变化的情况下，向云提供商发出不必要的 API 请求。
其他使用同一库实现的控制器已经使用基于监听的机制，
利用 informer 来避免不必要的 API 调用。
v1.35 版本引入了一个新的特性门控，允许更改路由控制器的行为，
使其使用基于监听的 informer。&lt;/p&gt;
&lt;!--
## What&#39;s new?

The feature gate `CloudControllerManagerWatchBasedRoutesReconciliation`
has been introduced to
[k8s.io/cloud-provider](https://github.com/kubernetes/cloud-provider)
in alpha stage by
[SIG Cloud Provider](https://github.com/kubernetes/community/blob/master/sig-cloud-provider/README.md).
To enable this feature you can use
`--feature-gates=CloudControllerManagerWatchBasedRoutesReconciliation=true`
in the CCM implementation you are using.
--&gt;
&lt;h2 id=&#34;新特性&#34;&gt;新特性&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-cloud-provider/README.md&#34;&gt;SIG Cloud Provider&lt;/a&gt;
已在 &lt;a href=&#34;https://github.com/kubernetes/cloud-provider&#34;&gt;k8s.io/cloud-provider&lt;/a&gt;
引入了 Alpha 阶段的 &lt;code&gt;CloudControllerManagerWatchBasedRoutesReconciliation&lt;/code&gt;
特性门控。要启用此特性，你可以在使用的 CCM 实现中使用
&lt;code&gt;--feature-gates=CloudControllerManagerWatchBasedRoutesReconciliation=true&lt;/code&gt;
参数。&lt;/p&gt;
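&lt;p&gt;例如，在自行部署的 CCM 中，可以通过容器参数传入该特性门控（以下片段仅为示意，镜像与 provider 名称均为假设示例）：&lt;/p&gt;

```yaml
# 示意：CCM Deployment 中的容器参数片段（镜像与 provider 名称为假设示例）
containers:
- name: cloud-controller-manager
  image: example.com/cloud-controller-manager:v1.35.0
  args:
  - --cloud-provider=example
  # 启用基于监听的路由协调（Alpha 特性门控）
  - --feature-gates=CloudControllerManagerWatchBasedRoutesReconciliation=true
```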
&lt;!--
## About the feature gate
--&gt;
&lt;h2 id=&#34;关于此特性门控&#34;&gt;关于此特性门控&lt;/h2&gt;
&lt;!--
This feature gate will trigger the route reconciliation loop whenever a node is
added, deleted, or the fields `.spec.podCIDRs` or `.status.addresses` are updated.

An additional reconcile is performed in a random interval between 12h and 24h,
which is chosen at the controller&#39;s start time.
--&gt;
&lt;p&gt;此特性门控会在节点被添加、删除，或其 &lt;code&gt;.spec.podCIDRs&lt;/code&gt; 或
&lt;code&gt;.status.addresses&lt;/code&gt; 字段被更新时触发路由协调循环。&lt;/p&gt;
&lt;p&gt;此外，还会以 12 小时到 24 小时之间的随机间隔执行一次额外的协调，
该间隔在控制器启动时确定。&lt;/p&gt;
&lt;!--
This feature gate does not modify the logic within the reconciliation loop.
Therefore, users of a CCM implementation should not experience significant
changes to their existing route configurations.
--&gt;
&lt;p&gt;此特性门控不会修改协调循环内的逻辑。
因此，CCM 实现的用户不应遇到现有路由配置的重大变化。&lt;/p&gt;
&lt;!--
## How can I learn more?

For more details, refer to the [KEP-5237](https://kep.k8s.io/5237).
--&gt;
&lt;h2 id=&#34;如何了解更多&#34;&gt;如何了解更多？&lt;/h2&gt;
&lt;p&gt;更多详情请参阅
&lt;a href=&#34;https://kep.k8s.io/5237&#34;&gt;KEP-5237&lt;/a&gt;。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.35: 通过就地重启 Pod 实现更高的效率</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2026/01/05/kubernetes-v1-35-restart-all-containers/</link>
      <pubDate>Mon, 05 Jan 2026 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2026/01/05/kubernetes-v1-35-restart-all-containers/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.35: New level of efficiency with in-place Pod restart&#34;
date: 2026-01-05T10:30:00-08:00
slug: kubernetes-v1-35-restart-all-containers
author: &gt;
  [Yuan Wang](https://github.com/yuanwang04)
  [Giuseppe Tinti Tomio](https://github.com/GiuseppeTT)
  [Sergey Kanzhelev](https://github.com/SergeyKanzhelev)
translator: &gt;
  [Xin Li](https://github.com/my-git9)
--&gt;
&lt;!--
The release of Kubernetes 1.35 introduces a powerful new feature that provides a much-requested capability: the ability to trigger a full, in-place restart of the Pod. This feature, *Restart All Containers* (alpha in 1.35), allows for an efficient way to reset a Pod&#39;s state compared to resource-intensive approach of deleting and recreating the entire Pod. This feature is especially useful for AI/ML workloads allowing application developers to concentrate on their core training logic while offloading complex failure-handling and recovery mechanisms to sidecars and declarative Kubernetes configuration. With `RestartAllContainers` and other planned enhancements, Kubernetes continues to add building blocks for creating the most flexible, robust, and efficient platforms for AI/ML workloads.

This new functionality is available by enabling the `RestartAllContainersOnContainerExits` feature gate. This alpha feature extends the [*Container Restart Rules* feature](/docs/concepts/workloads/pods/pod-lifecycle/#container-restart-rules), which graduated to beta in Kubernetes 1.35.
--&gt;
&lt;p&gt;Kubernetes 1.35 版本引入了一项强大的新特性，满足了用户对 Pod 就地重启的迫切需求。
这项名为“重启所有容器”（Restart All Containers，1.35 版本为 Alpha 版）的特性，
相比于资源用量较高的删除并重建整个 Pod 的方式，能够更高效地重置 Pod 的状态。
该特性对于 AI/ML 工作负载尤为实用，使应用程序开发人员能够专注于核心训练逻辑，
同时将复杂的故障处理和恢复机制交给边车容器和声明式 Kubernetes 配置来处理。
凭借 &lt;code&gt;RestartAllContainers&lt;/code&gt; 和其他计划中的增强特性，
Kubernetes 将继续构建更灵活、更健壮、更高效的 AI/ML 工作负载平台。&lt;/p&gt;
&lt;p&gt;启用 &lt;code&gt;RestartAllContainersOnContainerExits&lt;/code&gt; 特性门控即可使用此新特性。
此 Alpha 特性扩展了&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/pods/pod-lifecycle/#container-restart-rules&#34;&gt;&lt;strong&gt;容器重启规则&lt;/strong&gt;特性&lt;/a&gt;，
该特性在 Kubernetes 1.35 中升级为 Beta 版。&lt;/p&gt;
&lt;!--
## The problem: when a single container restart isn&#39;t enough and recreating pods is too costly

Kubernetes has long supported restart policies at the Pod level (`restartPolicy`) and, more recently, at the [individual container level](/blog/2025/08/29/kubernetes-v1-34-per-container-restart-policy/). These policies are great for handling crashes in a single, isolated process. However, many modern applications have more complex inter-container dependencies. For instance:
--&gt;
&lt;h2 id=&#34;问题-当单个容器重启不足以解决问题-而重新创建-pod-成本过高时&#34;&gt;问题：当单个容器重启不足以解决问题，而重新创建 Pod 成本过高时&lt;/h2&gt;
&lt;p&gt;Kubernetes 长期以来一直支持 Pod 级别的重启策略（&lt;code&gt;restartPolicy&lt;/code&gt;），
最近也支持&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2025/08/29/kubernetes-v1-34-per-container-restart-policy/&#34;&gt;单个容器级别的重启策略&lt;/a&gt;。
这些策略非常适合处理单个独立进程中的崩溃。然而，许多现代应用程序具有更复杂的容器间依赖关系。例如：&lt;/p&gt;
&lt;!--
- An **init container** prepares the environment by mounting a volume or generating a configuration file. If the main application container corrupts this environment, simply restarting that one container is not enough. The entire initialization process needs to run again.
- A **watcher sidecar** monitors system health. If it detects an unrecoverable but retriable error state, it must trigger a restart of the main application container from a clean slate.
- A **sidecar** that manages a remote resource fails. Even if the sidecar restarts on its own, the main container may be stuck trying to access an outdated or broken connection.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;初始化容器&lt;/strong&gt;通过挂载卷或生成配置文件来准备环境。如果主应用程序容器损坏了此环境，
仅仅重启该容器是不够的，需要重新运行整个初始化过程。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;监视边车&lt;/strong&gt;监控系统健康状况。如果它检测到不可恢复但可重试的错误状态，则必须触发主应用程序容器从头开始重启。&lt;/li&gt;
&lt;li&gt;管理远程资源的&lt;strong&gt;边车&lt;/strong&gt;发生故障。即使边车自行重启，主容器也可能因为尝试访问过时或损坏的连接而卡住。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
In all these cases, the desired action is not to restart a single container, but all of them. Previously, the only way to achieve this was to delete the Pod and have a controller (like a Job or ReplicaSet) create a new one. This process is slow and expensive, involving the scheduler, node resource allocation and re-initialization of networking and storage.

This inefficiency becomes even worse when handling large-scale AI/ML workloads (&gt;= 1,000 Nodes with one Pod per Node). A common requirement for these synchronous workloads is that when a failure occurs (such as a Node crash), all Pods in the fleet must be recreated to reset the state before training can resume, even if all the other Pods were not directly affected by the failure. Deleting, creating and scheduling thousands of Pods simultaneously creates a massive bottleneck. The estimated overhead of this failure could cost [$100,000 per month in wasted resources](https://docs.google.com/document/d/16zexVooHKPc80F4dVtUjDYK9DOpkVPRNfSv0zRtfFpk/edit?tab=t.0#bookmark=id.qwqcnzf96avw).
--&gt;
&lt;p&gt;在所有这些情况下，我们期望的操作并非重启单个容器，而是重启所有容器。
此前，实现此目的的唯一方法是删除 Pod，然后由控制器（例如 Job 或 ReplicaSet）创建一个新的 Pod。
这个过程缓慢且成本高昂，涉及调度器、节点资源分配以及网络和存储的重新初始化。&lt;/p&gt;
&lt;p&gt;在处理大规模 AI/ML 工作负载（≥ 1000 个节点，每个节点一个 Pod）时，这种低效性会更加严重。
这些同步工作负载的一个常见要求是，当发生故障（例如节点崩溃）时，
必须重新创建集群中的所有 Pod 以重置状态，然后才能恢复训练，
即使其他 Pod 并未直接受到故障的影响。
同时删除、创建和调度数千个 Pod 会造成巨大的瓶颈。
此类故障的开销估计可能造成&lt;a href=&#34;https://docs.google.com/document/d/16zexVooHKPc80F4dVtUjDYK9DOpkVPRNfSv0zRtfFpk/edit?tab=t.0#bookmark=id.qwqcnzf96avw&#34;&gt;每月 10 万美元的资源浪费&lt;/a&gt;。&lt;/p&gt;
&lt;!--
Handling these failures for AI/ML training jobs requires a complex integration touching both the training framework and Kubernetes, which are often fragile and toilsome. This feature introduces a Kubernetes-native solution, improving system robustness and allowing application developers to concentrate on their core training logic.

Another major benefit of restarting Pods in place is that keeping Pods on their assigned Nodes allows for further optimizations. For example, one can implement node-level caching tied to a specific Pod identity, something that is impossible when Pods are unnecessarily being recreated on different Nodes.
--&gt;
&lt;p&gt;处理 AI/ML 训练任务的这些故障需要同时触及训练框架和 Kubernetes 的复杂集成，
而这类集成通常既脆弱又费力。
此特性引入了一种 Kubernetes 原生解决方案，
提高了系统健壮性，并使应用程序开发人员能够专注于其核心训练逻辑。&lt;/p&gt;
&lt;p&gt;就地重启 Pod 的另一个主要优势在于，将 Pod 保留在其分配的节点上可以进行进一步的优化。
例如，可以实现与特定 Pod 标识绑定的节点级缓存，
而当 Pod 不必要地在不同的节点上重新创建时，这种优化方式是无法实现的。&lt;/p&gt;
&lt;!--
## Introducing the `RestartAllContainers` action

To address this, Kubernetes v1.35 adds a new action to the container restart rules: `RestartAllContainers`. When a container exits in a way that matches a rule with this action, the kubelet initiates a fast, **in-place** restart of the Pod.
--&gt;
&lt;h2 id=&#34;引入-restartallcontainers-操作&#34;&gt;引入 &lt;code&gt;RestartAllContainers&lt;/code&gt; 操作&lt;/h2&gt;
&lt;p&gt;为了解决这个问题，Kubernetes v1.35 在容器重启规则中添加了一个新的操作：&lt;code&gt;RestartAllContainers&lt;/code&gt;。
当容器以符合此操作规则的方式退出时，kubelet 会启动对 Pod 的快速&lt;strong&gt;就地&lt;/strong&gt;重启。&lt;/p&gt;
&lt;!--
This in-place restart is highly efficient because it preserves the Pod&#39;s most important resources:
- The Pod&#39;s UID, IP address and network namespace.
- The Pod&#39;s sandbox and any attached devices.
- All volumes, including `emptyDir` and mounted volumes from PVCs.
--&gt;
&lt;p&gt;这种就地重启非常高效，因为它保留了 Pod 最重要的资源：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pod 的 UID、IP 地址和网络命名空间。&lt;/li&gt;
&lt;li&gt;Pod 的沙箱及其所有连接的设备。&lt;/li&gt;
&lt;li&gt;所有卷，包括 &lt;code&gt;emptyDir&lt;/code&gt; 和从 PVC 挂载的卷。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
After terminating all running containers, the Pod&#39;s startup sequence is re-executed from the very beginning. This means all **init containers** are run again in order, followed by the sidecar and regular containers, ensuring a completely fresh start in a known-good environment. With the exception of ephemeral containers (which are terminated), all other containers—including those that previously succeeded or failed—will be restarted, regardless of their individual restart policies.
--&gt;
&lt;p&gt;终止所有正在运行的容器后，Pod 的启动序列将从头开始重新执行。
这意味着所有&lt;strong&gt;初始化容器&lt;/strong&gt;将按顺序再次运行，随后是边车容器和常规容器，
从而确保在已知良好的环境中完全重新启动。
除了临时容器（会被终止）之外，所有其他容器——包括之前成功或失败的容器——都将重新启动，
而不管它们各自的重启策略如何。&lt;/p&gt;
&lt;!--
## Use cases

### 1. Efficient restarts for ML/Batch jobs

For ML training jobs, [rescheduling a worker Pod on failure](/blog/2025/07/03/navigating-failures-in-pods-with-devices/#roadmap-for-failure-modes-container-code-failed) is a costly operation that wastes valuable compute resources. On a 1,000-node training cluster, rescheduling overhead can waste [over $100,000 in compute resources monthly](https://docs.google.com/document/d/16zexVooHKPc80F4dVtUjDYK9DOpkVPRNfSv0zRtfFpk/edit?tab=t.0#bookmark=id.qwqcnzf96avw).
--&gt;
&lt;h2 id=&#34;应用案例&#34;&gt;应用案例&lt;/h2&gt;
&lt;h3 id=&#34;1-高效重启机器学习-批处理作业&#34;&gt;1. 高效重启机器学习/批处理作业&lt;/h3&gt;
&lt;p&gt;对于机器学习训练作业，
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2025/07/03/navigating-failures-in-pods-with-devices/#roadmap-for-failure-modes-container-code-failed&#34;&gt;在工作节点 Pod 发生故障时重新调度&lt;/a&gt;是一项代价高昂的操作，
会浪费宝贵的计算资源。
在一个拥有 1000 个节点的训练集群中，
重新调度带来的开销每月可能会浪费&lt;a href=&#34;https://docs.google.com/document/d/16zexVooHKPc80F4dVtUjDYK9DOpkVPRNfSv0zRtfFpk/edit?tab=t.0#bookmark=id.qwqcnzf96avw&#34;&gt;超过 10 万美元的计算资源&lt;/a&gt;。&lt;/p&gt;
&lt;!--
With `RestartAllContainers` actions you can address this by enabling a much faster, hybrid recovery strategy: recreate only the &#34;bad&#34; Pods (e.g., those on unhealthy Nodes) while triggering `RestartAllContainers` for the remaining healthy Pods. Benchmarks show this reduces the recovery overhead [from minutes to a few seconds](https://docs.google.com/document/d/16zexVooHKPc80F4dVtUjDYK9DOpkVPRNfSv0zRtfFpk/edit?tab=t.0#bookmark=id.cwkee8kar0i5).

With in-place restarts, a watcher sidecar can monitor the main training process. If it encounters a specific, retriable error, the watcher can exit with a designated code to trigger a fast reset of the worker Pod, allowing it to restart from the last checkpoint without involving the Job controller. This capability is now natively supported by Kubernetes.

Read more details about future development and JobSet features at [KEP-467 JobSet in-place restart](https://github.com/kubernetes-sigs/jobset/issues/467).
--&gt;
&lt;p&gt;借助 &lt;code&gt;RestartAllContainers&lt;/code&gt; 操作，你可以启用一种速度更快、混合的恢复策略来解决这个问题：
仅重新创建“故障”Pod（例如，位于不健康节点上的 Pod），同时对其余健康的 Pod
触发 &lt;code&gt;RestartAllContainers&lt;/code&gt; 操作。基准测试表明，这可以将恢复开销&lt;a href=&#34;https://docs.google.com/document/d/16zexVooHKPc80F4dVtUjDYK9DOpkVPRNfSv0zRtfFpk/edit?tab=t.0#bookmark=id.cwkee8kar0i5&#34;&gt;从几分钟降低到几秒钟&lt;/a&gt;。&lt;/p&gt;
&lt;p&gt;通过就地重启，监视器边车可以监控主训练过程。如果遇到特定的可重试错误，
监视器可以退出并返回指定的代码，从而触发工作 Pod 的快速重置，
使其能够从上一个检查点重新启动，而无需 Job 控制器的参与。Kubernetes 现在原生支持此特性。&lt;/p&gt;
&lt;p&gt;有关未来开发和 JobSet 特性的更多详细信息，请参阅
&lt;a href=&#34;https://github.com/kubernetes-sigs/jobset/issues/467&#34;&gt;KEP-467 JobSet 就地重启&lt;/a&gt;。&lt;/p&gt;
&lt;!--
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-worker-pod
spec:
  restartPolicy: Never
  initContainers:
  # This init container will re-run on every in-place restart
  - name: setup-environment
    image: my-repo/setup-worker:1.0
  - name: watcher-sidecar
    image: my-repo/watcher:1.0
    restartPolicy: Always
    restartPolicyRules:
    - action: RestartAllContainers
      onExit:
        exitCodes:
          operator: In
          # A specific exit code from the watcher triggers a full pod restart
          values: [88]
  containers:
  - name: main-application
    image: my-repo/training-app:1.0
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ml-worker-pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Never&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initContainers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 此初始化容器将在每次就地重启时重新运行。&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;setup-environment&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;my-repo/setup-worker:1.0&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;watcher-sidecar&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;my-repo/watcher:1.0&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Always&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicyRules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;action&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;RestartAllContainers&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;onExit&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;exitCodes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;In&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 监视器返回特定退出代码会触发 Pod 完全重启。&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;values&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#666&#34;&gt;88&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;main-application&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;my-repo/training-app:1.0&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
### 2. Re-running init containers for a clean state

Imagine a scenario where an init container is responsible for fetching credentials or setting up a shared volume. If the main application fails in a way that corrupts this shared state, you need the [init container to rerun](https://github.com/kubernetes/enhancements/issues/3676).

By configuring the main application to exit with a specific code upon detecting such a corruption, you can trigger the `RestartAllContainers` action, guaranteeing that the init container provides a clean setup before the application restarts.
--&gt;
&lt;h3 id=&#34;2-重新运行初始化容器以确保干净状态&#34;&gt;2. 重新运行初始化容器以确保干净状态&lt;/h3&gt;
&lt;p&gt;设想这样一种场景：初始化容器负责获取凭据或设置共享卷。
如果主应用程序发生故障并损坏了这一共享状态，则需要&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/3676&#34;&gt;重新运行初始化容器&lt;/a&gt;。&lt;/p&gt;
&lt;p&gt;通过配置主应用程序在检测到此类损坏时以特定代码退出，你可以触发 &lt;code&gt;RestartAllContainers&lt;/code&gt;
操作，从而确保初始化容器在应用程序重启之前提供一个干净的设置。&lt;/p&gt;
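&lt;p&gt;沿用上文示例的写法，这种场景大致可以配置如下（镜像名称与退出码 42 均为假设的示例约定）：&lt;/p&gt;

```yaml
# 示意：主容器以特定退出码退出时触发整个 Pod 的就地重启（镜像与退出码为假设示例）
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  restartPolicy: Never
  initContainers:
  # 每次就地重启时，此初始化容器都会重新运行，提供干净的初始状态
  - name: fetch-credentials
    image: example.com/init-setup:1.0
  containers:
  - name: main-application
    image: example.com/app:1.0
    restartPolicyRules:
    - action: RestartAllContainers
      onExit:
        exitCodes:
          operator: In
          # 主应用检测到共享状态损坏时以该退出码退出
          values: [42]
```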
&lt;!--
### 3. Handling high rate of similar tasks execution

There are cases when tasks are best represented as a Pod execution. And each task requires a clean execution. The task may be a game session backend or some queue item processing. If the rate of tasks is high, running the whole cycle of Pod creation, scheduling and initialization is simply too expensive, especially when tasks can be short. The ability to restart all containers from scratch enables a Kubernetes-native way to handle this scenario without custom solutions or frameworks. 
--&gt;
&lt;h3 id=&#34;3-处理高频率的类似任务执行&#34;&gt;3. 处理高频率的类似任务执行&lt;/h3&gt;
&lt;p&gt;有些情况下，任务最适合以一次 Pod 执行的形式呈现，而且每个任务都需要一个干净的执行环境。例如，游戏会话后端或队列项处理。
如果任务频率很高，运行完整的 Pod 创建、调度和初始化流程会非常耗费资源，
尤其是在任务执行时间可能很短的情况下。
从头重启所有容器的能力提供了一种 Kubernetes 原生的处理方式，无需自定义解决方案或框架。&lt;/p&gt;
&lt;!--
## How to use it

To try this feature, you must enable the `RestartAllContainersOnContainerExits` feature gate on your Kubernetes cluster components (API server and kubelet) running Kubernetes v1.35+. This alpha feature extends the `ContainerRestartRules` feature, which graduated to beta in v1.35 and is enabled by default.

Once enabled, you can add `restartPolicyRules` to any container (init, sidecar, or regular) and use the `RestartAllContainers` action.
--&gt;
&lt;h2 id=&#34;使用方法&#34;&gt;使用方法&lt;/h2&gt;
&lt;p&gt;要试用此特性，你必须在运行 Kubernetes v1.35 或更高版本的 Kubernetes
集群组件（API 服务器和 kubelet）上启用 &lt;code&gt;RestartAllContainersOnContainerExits&lt;/code&gt; 特性门控。
此 Alpha 特性扩展了 &lt;code&gt;ContainerRestartRules&lt;/code&gt; 特性，后者已在 v1.35 版本中升级为 Beta 版并默认启用。&lt;/p&gt;
&lt;p&gt;启用后，你可以将 &lt;code&gt;restartPolicyRules&lt;/code&gt; 添加到任何容器（Init、边车或常规容器），
并使用 &lt;code&gt;RestartAllContainers&lt;/code&gt; 操作。&lt;/p&gt;
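&lt;p&gt;特性门控既可以通过组件的 &lt;code&gt;--feature-gates&lt;/code&gt; 命令行参数启用，也可以在 kubelet 配置文件中启用（以下为示意片段）：&lt;/p&gt;

```yaml
# 示意：在 KubeletConfiguration 中启用该 Alpha 特性门控
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  RestartAllContainersOnContainerExits: true
```

&lt;p&gt;API 服务器一侧可通过 &lt;code&gt;--feature-gates=RestartAllContainersOnContainerExits=true&lt;/code&gt; 参数启用。&lt;/p&gt;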
&lt;!--
The feature is designed to be easily usable on existing apps. However, if an application does not follow some best practices, it may cause issues for the application or for observability tooling. When enabling the feature, make sure that all containers are reentrant and that external tooling is prepared for init containers to re-run. Also, when restarting all containers, the kubelet does not run `preStop` hooks. This means containers must be designed to handle abrupt termination without relying on `preStop` hooks for graceful shutdown. 
--&gt;
&lt;p&gt;该特性旨在方便现有应用程序使用。但是，如果应用程序不遵循某些最佳实践，
则可能会导致应用程序本身或可观测性工具出现问题。
启用此特性时，请确保所有容器都是可重入的，并且外部工具已为初始化容器的重新运行做好准备。
此外，重启所有容器时，kubelet 不会运行 &lt;code&gt;preStop&lt;/code&gt; 钩子。
这意味着容器必须设计为能够处理突然终止的情况，而无需依赖 &lt;code&gt;preStop&lt;/code&gt; 钩子来实现优雅关闭。&lt;/p&gt;
&lt;!--
## Observing the restart

To make this process observable, a new Pod condition, `AllContainersRestarting`, is added to the Pod&#39;s status. When a restart is triggered, this condition becomes `True` and it reverts to `False` once all containers have terminated and the Pod is ready to start its lifecycle anew. This provides a clear signal to users and other cluster components about the Pod&#39;s state.

All containers restarted by this action will have their restart count incremented in the container status.
--&gt;
&lt;h2 id=&#34;观察重启&#34;&gt;观察重启&lt;/h2&gt;
&lt;p&gt;为了使重启过程可观察，Pod 的状态中添加了一个新的条件 &lt;code&gt;AllContainersRestarting&lt;/code&gt;。
当触发重启时，此条件变为 &lt;code&gt;True&lt;/code&gt;；当所有容器终止且 Pod 准备好重新开始其生命周期时，
此条件变为 &lt;code&gt;False&lt;/code&gt;。这为用户和其他集群组件提供了关于 Pod 状态的清晰信号。&lt;/p&gt;
&lt;p&gt;所有通过此操作重启的容器，其容器状态中的重启计数都会递增。&lt;/p&gt;
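&lt;p&gt;重启进行期间，Pod 状态中会出现类似如下的条件（以下为示意性的状态片段）：&lt;/p&gt;

```yaml
# 示意：就地重启进行中的 Pod 状态片段（重启完成后 status 会变回 "False"）
status:
  conditions:
  - type: AllContainersRestarting
    status: "True"
```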
&lt;!--
## Learn more

- Read the official documentation on [Pod Lifecycle](/docs/concepts/workloads/pods/pod-lifecycle/#restart-all-containers).
- Read the detailed proposal in the [KEP-5532: Restart All Containers on Container Exits](https://kep.k8s.io/5532).
- Read the proposal for JobSet in-place restart in [JobSet issue #467](https://github.com/kubernetes-sigs/jobset/issues/467).
--&gt;
&lt;h2 id=&#34;了解更多&#34;&gt;了解更多&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;阅读 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/pods/pod-lifecycle/#restart-all-containers&#34;&gt;Pod 生命周期&lt;/a&gt;的官方文档。&lt;/li&gt;
&lt;li&gt;阅读 &lt;a href=&#34;https://kep.k8s.io/5532&#34;&gt;KEP-5532：容器退出时重启所有容器&lt;/a&gt;中的详细提案。&lt;/li&gt;
&lt;li&gt;阅读 &lt;a href=&#34;https://github.com/kubernetes-sigs/jobset/issues/467&#34;&gt;JobSet issue #467&lt;/a&gt;
中关于 JobSet 就地重启的提案。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## We want your feedback!

As an alpha feature, `RestartAllContainers` is ready for you to experiment with and any use cases and feedback are welcome. This feature is driven by the [SIG Node](https://github.com/kubernetes/community/blob/master/sig-node/README.md) community. If you are interested in getting involved, sharing your thoughts, or contributing, please join us!
--&gt;
&lt;h2 id=&#34;我们期待你的反馈&#34;&gt;我们期待你的反馈！&lt;/h2&gt;
&lt;p&gt;作为一项 Alpha 特性，&lt;code&gt;RestartAllContainers&lt;/code&gt; 现已开放试用，
欢迎你提出任何使用案例和反馈意见。
此特性由 &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-node/README.md&#34;&gt;SIG Node&lt;/a&gt; 社区驱动。
如果你有兴趣参与、分享想法或做出贡献，请加入我们！&lt;/p&gt;
&lt;!--
You can reach SIG Node through:
- Slack: [#sig-node](https://kubernetes.slack.com/messages/sig-node)
- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-node)
--&gt;
&lt;p&gt;你可以通过以下方式联系 SIG Node：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Slack：&lt;a href=&#34;https://kubernetes.slack.com/messages/sig-node&#34;&gt;#sig-node&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://groups.google.com/forum/#!forum/kubernetes-sig-node&#34;&gt;邮件列表&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.35：扩展容忍度运算符以支持数值比较（Alpha）</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2026/01/05/kubernetes-v1-35-numeric-toleration-operators/</link>
      <pubDate>Mon, 05 Jan 2026 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2026/01/05/kubernetes-v1-35-numeric-toleration-operators/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.35: Extended Toleration Operators to Support Numeric Comparisons (Alpha)&#34;
date: 2026-01-05T10:30:00-08:00
slug: kubernetes-v1-35-numeric-toleration-operators
author: &gt;
  Heba Elayoty (Microsoft)
--&gt;
&lt;!--
Many production Kubernetes clusters blend on-demand (higher-SLA) and spot/preemptible (lower-SLA) nodes to optimize costs while maintaining reliability for critical workloads. Platform teams need a safe default that keeps most workloads away from risky capacity, while allowing specific workloads to opt-in with explicit thresholds like &#34;I can tolerate nodes with failure probability up to 5%&#34;.
--&gt;
&lt;p&gt;许多生产级 Kubernetes 集群会混合使用按需（on-demand，高 SLA）节点与 spot/可抢占（preemptible，低 SLA）节点，
以在保证关键工作负载可靠性的同时优化成本。平台团队需要一个“安全默认值”，让大多数工作负载远离风险容量，
同时又允许特定工作负载用明确阈值显式选择接受（opt-in），例如“我可以容忍失败概率最高 5% 的节点”。&lt;/p&gt;
&lt;!--
Today, Kubernetes taints and tolerations can match exact values or check for existence, but they can&#39;t compare numeric thresholds. You&#39;d need to create discrete taint categories, use external admission controllers, or accept less-than-optimal placement decisions.
--&gt;
&lt;p&gt;目前，Kubernetes 的污点与容忍度（taints and tolerations）可以匹配精确值或检查键是否存在，
但&lt;strong&gt;无法进行数值阈值比较&lt;/strong&gt;。你不得不创建离散的污点类别、使用外部准入控制器，或接受不够理想的放置决策。&lt;/p&gt;
&lt;!--
In Kubernetes v1.35, we&#39;re introducing **Extended Toleration Operators** as an alpha feature. This enhancement adds `Gt` (Greater Than) and `Lt` (Less Than) operators to `spec.tolerations`, enabling threshold-based scheduling decisions that unlock new possibilities for SLA-based placement, cost optimization, and performance-aware workload distribution.
--&gt;
&lt;p&gt;在 Kubernetes v1.35 中，我们以 Alpha 形式引入 &lt;strong&gt;扩展容忍度运算符（Extended Toleration Operators）&lt;/strong&gt;。
该增强为 &lt;code&gt;spec.tolerations&lt;/code&gt; 增加 &lt;code&gt;Gt&lt;/code&gt;（Greater Than）与 &lt;code&gt;Lt&lt;/code&gt;（Less Than）运算符，
使调度器能够进行基于阈值的调度决策，从而为基于 SLA 的放置、成本优化以及面向性能的工作负载分发打开新可能。&lt;/p&gt;
&lt;!--
## The evolution of tolerations
--&gt;
&lt;h2 id=&#34;容忍度的演进&#34;&gt;容忍度的演进&lt;/h2&gt;
&lt;!--
Historically, Kubernetes supported two primary toleration operators:
--&gt;
&lt;p&gt;从历史上看，Kubernetes 主要支持两种容忍度运算符：&lt;/p&gt;
&lt;!--
- **`Equal`**: The toleration matches a taint if the key and value are exactly equal
- **`Exists`**: The toleration matches a taint if the key exists, regardless of value
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Equal&lt;/code&gt;&lt;/strong&gt;：当 key 与 value 完全相等时，容忍度匹配该污点&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Exists&lt;/code&gt;&lt;/strong&gt;：只要 key 存在（无论 value 是什么），容忍度就匹配该污点&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
While these worked well for categorical scenarios, they fell short for numeric comparisons. Starting with v1.35, we are closing this gap.
--&gt;
&lt;p&gt;这两者对“类别型”场景很好用，但在数值比较方面就显得力不从心。从 v1.35 开始，我们在补齐这一缺口。&lt;/p&gt;
&lt;!--
Consider these real-world scenarios:
--&gt;
&lt;p&gt;请看一些真实世界的场景：&lt;/p&gt;
&lt;!--
- **SLA requirements**: Schedule high-availability workloads only on nodes with failure probability below a certain threshold
- **Cost optimization**: Allow cost-sensitive batch jobs to run on cheaper nodes whose cost-per-hour falls below a specific value
- **Performance guarantees**: Ensure latency-sensitive applications run only on nodes with disk IOPS or network bandwidth above minimum thresholds
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;SLA 要求&lt;/strong&gt;：只把高可用工作负载调度到失败概率低于某个阈值的节点上&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;成本优化&lt;/strong&gt;：允许对成本敏感的批处理作业运行在“每小时成本”低于某个特定值的更便宜节点上&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;性能保障&lt;/strong&gt;：确保对延迟敏感的应用只运行在磁盘 IOPS 或网络带宽高于最低阈值的节点上&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Without numeric comparison operators, cluster operators have had to resort to workarounds like creating multiple discrete taint values or using external admission controllers, neither of which scale well or provide the flexibility needed for dynamic threshold-based scheduling.
--&gt;
&lt;p&gt;在缺少数值比较运算符的情况下，集群运维人员不得不采用一些变通方案，例如创建多个离散的污点值，
或使用外部准入控制器。但这些方案既难以规模化，也无法提供“动态阈值调度”所需的灵活性。&lt;/p&gt;
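&lt;!--
For illustration, here is a sketch of the discrete-category workaround (the `reliability-tier` key and its tier values are hypothetical): nodes are tainted with a tier name, and a pod must list one `Equal` toleration per tier it accepts, so every new tier means another toleration on every opted-in workload.
--&gt;
&lt;p&gt;作为示意，下面勾勒一下“离散污点类别”这种变通方案（其中 &lt;code&gt;reliability-tier&lt;/code&gt; 键及其档位取值均为假设）：
节点被打上档位名称的污点，而 Pod 必须为它接受的每个档位各列出一条 &lt;code&gt;Equal&lt;/code&gt; 容忍度，
因此每新增一个档位，所有选择接受的工作负载都要再加一条容忍度：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;# 变通方案（假设示例）：Pod 需要为每个可接受的档位写一条 Equal 容忍度
tolerations:
- key: &amp;#34;reliability-tier&amp;#34;
  operator: &amp;#34;Equal&amp;#34;
  value: &amp;#34;low&amp;#34;
  effect: &amp;#34;NoSchedule&amp;#34;
- key: &amp;#34;reliability-tier&amp;#34;
  operator: &amp;#34;Equal&amp;#34;
  value: &amp;#34;medium&amp;#34;
  effect: &amp;#34;NoSchedule&amp;#34;
&lt;/code&gt;&lt;/pre&gt;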
&lt;!--
## Why extend tolerations instead of using NodeAffinity?
--&gt;
&lt;h2 id=&#34;为什么要扩展容忍度-而不是用节点亲和性-nodeaffinity&#34;&gt;为什么要扩展容忍度，而不是用节点亲和性（NodeAffinity）？&lt;/h2&gt;
&lt;!--
You might wonder: NodeAffinity already supports numeric comparison operators, so why extend tolerations? While NodeAffinity is powerful for expressing pod preferences, taints and tolerations provide critical operational benefits:
--&gt;
&lt;p&gt;你可能会问：NodeAffinity 已经支持数值比较运算符，为什么还要扩展容忍度？
NodeAffinity 虽然很适合表达 Pod 的偏好，但污点与容忍度提供了一些关键的运维收益：&lt;/p&gt;
&lt;!--
- **Policy orientation**: NodeAffinity is per-pod, requiring every workload to explicitly opt-out of risky nodes. Taints invert control—nodes declare their risk level, and only pods with matching tolerations may land there. This provides a safer default; most pods stay away from spot/preemptible nodes unless they explicitly opt-in.
- **Eviction semantics**: NodeAffinity has no eviction capability. Taints support the `NoExecute` effect with `tolerationSeconds`, enabling operators to drain and evict pods when a node&#39;s SLA degrades or spot instances receive termination notices.
- **Operational ergonomics**: Centralized, node-side policy is consistent with other safety taints like disk-pressure and memory-pressure, making cluster management more intuitive.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;策略导向&lt;/strong&gt;：NodeAffinity 是按 Pod 配置的，需要每个工作负载显式选择“避开”风险节点。
污点则把控制反转：由节点声明风险等级，只有带有匹配容忍度的 Pod 才能落到这些节点上。
这提供了更安全的默认值：大多数 Pod 会默认避开 spot/可抢占节点，除非它们显式选择接受。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;驱逐语义&lt;/strong&gt;：NodeAffinity 不具备驱逐能力。污点支持 &lt;code&gt;NoExecute&lt;/code&gt; 效果以及 &lt;code&gt;tolerationSeconds&lt;/code&gt;，
使运维人员可以在节点 SLA 降级或 spot 实例收到终止通知时，排空（drain）并驱逐 Pod。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;运维易用性&lt;/strong&gt;：集中式、节点侧的策略与磁盘压力、内存压力等其他安全污点一致，让集群管理更直观。&lt;/li&gt;
&lt;/ul&gt;
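&lt;!--
For comparison, here is a sketch of the per-pod NodeAffinity equivalent (the `failure-probability` node label is hypothetical). NodeAffinity&#39;s `Gt`/`Lt` operators compare a node label against a single integer value, but every workload has to carry this stanza itself, and nothing is evicted if the label later changes.
--&gt;
&lt;p&gt;作为对照，这里给出按 Pod 配置的 NodeAffinity 等价写法的示意（其中 &lt;code&gt;failure-probability&lt;/code&gt; 节点标签为假设）。
NodeAffinity 的 &lt;code&gt;Gt&lt;/code&gt;/&lt;code&gt;Lt&lt;/code&gt; 运算符会把节点标签与单个整数值进行比较，
但每个工作负载都必须自己携带这段配置，而且标签之后发生变化时也不会触发任何驱逐：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;# 假设示例：基于节点标签的 NodeAffinity 数值比较（Pod 侧配置）
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-probability
          operator: Lt     # 要求节点标签值小于 5
          values:
          - &amp;#34;5&amp;#34;
&lt;/code&gt;&lt;/pre&gt;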
&lt;!--
This enhancement preserves the well-understood safety model of taints and tolerations while enabling threshold-based placement for SLA-aware scheduling.
--&gt;
&lt;p&gt;该增强在保留污点与容忍度这一成熟安全模型的基础上，为 SLA 感知调度提供了基于阈值的放置能力。&lt;/p&gt;
&lt;!--
## Introducing Gt and Lt operators
--&gt;
&lt;h2 id=&#34;引入-gt-与-lt-运算符&#34;&gt;引入 Gt 与 Lt 运算符&lt;/h2&gt;
&lt;!--
Kubernetes v1.35 introduces two new operators for tolerations:
--&gt;
&lt;p&gt;Kubernetes v1.35 为容忍度引入两个新运算符：&lt;/p&gt;
&lt;!--
- **`Gt` (Greater Than)**: The toleration matches if the taint&#39;s numeric value is greater than the toleration&#39;s value
- **`Lt` (Less Than)**: The toleration matches if the taint&#39;s numeric value is less than the toleration&#39;s value
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Gt&lt;/code&gt;（Greater Than）&lt;/strong&gt;：当污点的数值 &lt;strong&gt;大于&lt;/strong&gt; 容忍度的数值时，容忍度匹配&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Lt&lt;/code&gt;（Less Than）&lt;/strong&gt;：当污点的数值 &lt;strong&gt;小于&lt;/strong&gt; 容忍度的数值时，容忍度匹配&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
When a pod tolerates a taint with `Lt`, it&#39;s saying &#34;I can tolerate nodes where this metric is *less than* my threshold&#34;. Since tolerations allow scheduling, the pod can run on nodes where the taint value is less than the toleration value. Think of it as: &#34;I tolerate nodes that stay below my maximum threshold&#34;.
--&gt;
&lt;p&gt;当一个 Pod 使用 &lt;code&gt;Lt&lt;/code&gt; 来容忍某个污点时，它表达的是：“我可以容忍该指标&lt;strong&gt;小于&lt;/strong&gt;我的阈值的节点”。
由于“容忍度”本质上允许调度，因此该 Pod 可以运行在污点值&lt;strong&gt;小于&lt;/strong&gt;容忍度值的节点上。
你可以把它理解为：“我容忍不超过我阈值上限的节点”。&lt;/p&gt;
&lt;!--
These operators work with numeric taint values and enable the scheduler to make sophisticated placement decisions based on continuous metrics rather than discrete categories.
--&gt;
&lt;p&gt;这些运算符适用于数值型污点值，使调度器能基于连续指标（continuous metrics）而不是离散类别做出更精细的放置决策。&lt;/p&gt;

&lt;div class=&#34;alert alert-info&#34; role=&#34;alert&#34;&gt;&lt;h4 class=&#34;alert-heading&#34;&gt;说明：&lt;/h4&gt;&lt;!--
Numeric values for `Gt` and `Lt` operators must be positive 64-bit integers without leading zeros. For example, `&#34;100&#34;` is valid, but `&#34;0100&#34;` (with leading zero) and `&#34;0&#34;` (zero value) are not permitted.

The `Gt` and `Lt` operators work with all taint effects: `NoSchedule`, `NoExecute`, and `PreferNoSchedule`.
--&gt;
&lt;p&gt;&lt;code&gt;Gt&lt;/code&gt; 与 &lt;code&gt;Lt&lt;/code&gt; 运算符的数值必须是&lt;strong&gt;正的 64 位整数&lt;/strong&gt;，且&lt;strong&gt;不能有前导零&lt;/strong&gt;。
例如，&lt;code&gt;&amp;quot;100&amp;quot;&lt;/code&gt; 是合法的，但 &lt;code&gt;&amp;quot;0100&amp;quot;&lt;/code&gt;（带前导零）与 &lt;code&gt;&amp;quot;0&amp;quot;&lt;/code&gt;（零值）不被允许。&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Gt&lt;/code&gt; 与 &lt;code&gt;Lt&lt;/code&gt; 运算符适用于所有污点效果（effect）：&lt;code&gt;NoSchedule&lt;/code&gt;、&lt;code&gt;NoExecute&lt;/code&gt;、&lt;code&gt;PreferNoSchedule&lt;/code&gt;。&lt;/p&gt;
&lt;/div&gt;
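&lt;!--
A minimal toleration using one of the new operators might look like this (the `gpu-compute-score` key and the threshold of 800 are purely illustrative):
--&gt;
&lt;p&gt;使用新运算符的最小容忍度示意如下（其中 &lt;code&gt;gpu-compute-score&lt;/code&gt; 键与阈值 800 仅为说明目的假设）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;tolerations:
- key: &amp;#34;gpu-compute-score&amp;#34;
  operator: &amp;#34;Gt&amp;#34;
  value: &amp;#34;800&amp;#34;   # 合法：正 64 位整数、无前导零（&amp;#34;0800&amp;#34; 与 &amp;#34;0&amp;#34; 均不合法）
  effect: &amp;#34;NoSchedule&amp;#34;
&lt;/code&gt;&lt;/pre&gt;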

&lt;!--
## Use cases and examples
--&gt;
&lt;h2 id=&#34;使用场景与示例&#34;&gt;使用场景与示例&lt;/h2&gt;
&lt;!--
Let&#39;s explore how Extended Toleration Operators solve real-world scheduling challenges.
--&gt;
&lt;p&gt;下面我们通过几个例子看看扩展容忍度运算符如何解决真实调度挑战。&lt;/p&gt;
&lt;!--
### Example 1: Spot instance protection with SLA thresholds
--&gt;
&lt;h3 id=&#34;示例-1-用-sla-阈值限制-spot-实例的使用&#34;&gt;示例 1：用 SLA 阈值限制 spot 实例的使用&lt;/h3&gt;
&lt;!--
Many clusters mix on-demand and spot/preemptible nodes to optimize costs. Spot nodes offer significant savings but have higher failure rates. You want most workloads to avoid spot nodes by default, while allowing specific workloads to opt-in with clear SLA boundaries.
--&gt;
&lt;p&gt;许多集群会混合按需与 spot/可抢占节点以优化成本。Spot 节点能显著节省费用，但失败率更高。
你希望大多数工作负载默认避开 spot 节点，同时允许某些工作负载在清晰的 SLA 边界内显式选择接受。&lt;/p&gt;
&lt;!--
First, taint spot nodes with their failure probability (for example, 15% annual failure rate):
--&gt;
&lt;p&gt;首先，用“失败概率”给 spot 节点打上污点（例如：年化失败率 15%）：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Node&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;spot-node-1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;taints&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;failure-probability&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;15&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;NoExecute&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
On-demand nodes have much lower failure rates:
--&gt;
&lt;p&gt;按需节点的失败率要低得多：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Node&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ondemand-node-1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;taints&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;failure-probability&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;2&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;NoExecute&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
Critical workloads can specify strict SLA requirements:
--&gt;
&lt;p&gt;关键工作负载可以指定严格的 SLA 要求：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;payment-processor&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;failure-probability&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Lt&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;5&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;NoExecute&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerationSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;30&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;app&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;payment-app:v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This pod will **only** schedule on nodes with `failure-probability` less than 5 (meaning `ondemand-node-1` with 2% but not `spot-node-1` with 15%). The `NoExecute` effect with `tolerationSeconds: 30` means if a node&#39;s SLA degrades (for example, cloud provider changes the taint value), the pod gets 30 seconds to gracefully terminate before forced eviction.
--&gt;
&lt;p&gt;这个 Pod 将&lt;strong&gt;只会&lt;/strong&gt;被调度到 &lt;code&gt;failure-probability&lt;/code&gt; 小于 5 的节点上（也就是 2% 的 &lt;code&gt;ondemand-node-1&lt;/code&gt;，
而不是 15% 的 &lt;code&gt;spot-node-1&lt;/code&gt;）。带有 &lt;code&gt;tolerationSeconds: 30&lt;/code&gt; 的 &lt;code&gt;NoExecute&lt;/code&gt; 效果意味着：
如果节点 SLA 降级（例如云厂商改变了污点值），该 Pod 会获得 30 秒的时间用于优雅终止，然后才会被强制驱逐。&lt;/p&gt;
&lt;!--
Meanwhile, a fault-tolerant batch job can explicitly opt-in to spot instances:
--&gt;
&lt;p&gt;与此同时，一个具备容错能力的批处理作业可以显式选择接受 spot 实例：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;batch-job&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;failure-probability&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Lt&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;20&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;NoExecute&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;worker&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;batch-worker:v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This batch job tolerates nodes with failure probability up to 20%, so it can run on both on-demand and spot nodes, maximizing cost savings while accepting higher risk.
--&gt;
&lt;p&gt;该批处理作业可容忍失败概率最高 20% 的节点，因此既能运行在按需节点上，也能运行在 spot 节点上，
在接受更高风险的同时最大化节省成本。&lt;/p&gt;
&lt;!--
### Example 2: AI workload placement with GPU tiers
--&gt;
&lt;h3 id=&#34;示例-2-基于-gpu-分层的-ai-工作负载放置&#34;&gt;示例 2：基于 GPU 分层的 AI 工作负载放置&lt;/h3&gt;
&lt;!--
AI and machine learning workloads often have specific hardware requirements. With Extended Toleration Operators, you can create GPU node tiers and ensure workloads land on appropriately powered hardware.
--&gt;
&lt;p&gt;AI 与机器学习工作负载通常对硬件有明确要求。通过扩展容忍度运算符，你可以建立 GPU 节点分层，
并确保工作负载落到性能匹配的硬件上。&lt;/p&gt;
&lt;!--
Taint GPU nodes with their compute capability score:
--&gt;
&lt;p&gt;用“算力评分”给 GPU 节点打上污点：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Node&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gpu-node-a100&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;taints&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;gpu-compute-score&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;1000&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;NoSchedule&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;---&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Node&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gpu-node-t4&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;taints&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;gpu-compute-score&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;500&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;NoSchedule&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
A heavy training workload can require high-performance GPUs:
--&gt;
&lt;p&gt;繁重的训练（heavy training）工作负载可以要求高性能 GPU：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;model-training&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;gpu-compute-score&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Gt&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;800&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;NoSchedule&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;trainer&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ml-trainer:v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resources&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;limits&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nvidia.com/gpu&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This ensures the training pod only schedules on nodes with compute scores greater than 800 (like the A100 node), preventing placement on lower-tier GPUs that would slow down training.
--&gt;
&lt;p&gt;这将确保训练 Pod 只会被调度到算力评分大于 800 的节点上（如 A100 节点），避免落到低档 GPU 上而拖慢训练。&lt;/p&gt;
&lt;!--
Meanwhile, inference workloads with less demanding requirements can use any available GPU:
--&gt;
&lt;p&gt;而对性能要求没那么高的推理工作负载则可以使用任何可用 GPU：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;model-inference&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;gpu-compute-score&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Gt&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;400&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;NoSchedule&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;inference&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ml-inference:v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resources&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;limits&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nvidia.com/gpu&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
### Example 3: Cost-optimized workload placement
--&gt;
&lt;h3 id=&#34;示例-3-面向成本优化的工作负载放置&#34;&gt;示例 3：面向成本优化的工作负载放置&lt;/h3&gt;
&lt;!--
For batch processing or non-critical workloads, you might want to minimize costs by running on cheaper nodes, even if they have lower performance characteristics.
--&gt;
&lt;p&gt;对于批处理或非关键工作负载，你可能希望通过在更便宜的节点上运行来尽量降低成本，即使这些节点的性能较低。&lt;/p&gt;
&lt;!--
Nodes can be tainted with their cost rating:
--&gt;
&lt;p&gt;节点可以用成本评级来打污点：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;taints&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;cost-per-hour&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;50&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;NoSchedule&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
A cost-sensitive batch job can express its tolerance for expensive nodes:
--&gt;
&lt;p&gt;对成本敏感的批处理作业可以表达它对昂贵节点的容忍度：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;cost-per-hour&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Lt&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;100&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;NoSchedule&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This batch job will schedule on nodes costing less than $100/hour but avoid more expensive nodes. Combined with Kubernetes scheduling priorities, this enables sophisticated cost-tiering strategies where critical workloads get premium nodes while batch workloads efficiently use budget-friendly resources.
--&gt;
&lt;p&gt;该批处理作业会被调度到成本低于 100 美元/小时的节点上，并避开更昂贵的节点。
结合 Kubernetes 的调度优先级能力，你可以实现更精细的成本分层策略：关键工作负载使用高配节点，
而批处理作业高效利用更经济的资源。&lt;/p&gt;
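&lt;p&gt;例如，可以为此类批处理作业定义一个低优先级的 PriorityClass，与上面的 &lt;code&gt;Lt&lt;/code&gt; 容忍度配合使用。以下仅为示意草稿，其中 &lt;code&gt;batch-low-priority&lt;/code&gt; 这一名称及优先级取值均为假设：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low-priority   # 假设的名称
value: 1000                  # 假设的优先级取值
globalDefault: false
description: 用于对成本敏感的批处理作业
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;批处理 Pod 只需在其规约中设置 &lt;code&gt;priorityClassName: batch-low-priority&lt;/code&gt; 即可引用该优先级类。&lt;/p&gt;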
&lt;!--
### Example 4: Performance-based placement
--&gt;
&lt;h3 id=&#34;示例-4-基于性能的放置&#34;&gt;示例 4：基于性能的放置&lt;/h3&gt;
&lt;!--
Storage-intensive applications often require minimum disk performance guarantees. With Extended Toleration Operators, you can enforce these requirements at the scheduling level.
--&gt;
&lt;p&gt;存储密集型应用通常需要最低磁盘性能保障。通过扩展容忍度运算符，你可以在调度层面强制执行这些要求。&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;disk-iops&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Gt&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;3000&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;NoSchedule&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This toleration ensures the pod only schedules on nodes where `disk-iops` exceeds 3000. The `Gt` operator means &#34;I need nodes that are greater than this minimum&#34;.
--&gt;
&lt;p&gt;该容忍度确保 Pod 只会被调度到 &lt;code&gt;disk-iops&lt;/code&gt; 超过 3000 的节点上。
&lt;code&gt;Gt&lt;/code&gt; 运算符表达的是：“我需要指标高于这个最低值的节点”。&lt;/p&gt;
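&lt;p&gt;与该容忍度对应，高性能存储节点侧可以打上类似下面的污点（示意，其中 5000 这一取值为假设）：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;spec:
  taints:
  - key: &amp;#34;disk-iops&amp;#34;
    value: &amp;#34;5000&amp;#34;
    effect: &amp;#34;NoSchedule&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;由于 5000 &amp;gt; 3000，带有上述 &lt;code&gt;Gt&lt;/code&gt; 容忍度的 Pod 会容忍这个污点，因此可以被调度到该节点上。&lt;/p&gt;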
&lt;!--
## How to use this feature
--&gt;
&lt;h2 id=&#34;如何使用该特性&#34;&gt;如何使用该特性&lt;/h2&gt;
&lt;!--
Extended Toleration Operators is an **alpha feature** in Kubernetes v1.35. To try it out:
--&gt;
&lt;p&gt;扩展容忍度运算符是 Kubernetes v1.35 中的 &lt;strong&gt;Alpha 特性&lt;/strong&gt;。要试用它：&lt;/p&gt;
&lt;!--
1. **Enable the feature gate** on both your API server and scheduler:
--&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;在 API server 与 scheduler 上启用特性门控&lt;/strong&gt;：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;--feature-gates&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#b8860b&#34;&gt;TaintTolerationComparisonOperators&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a2f&#34;&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
1. **Taint your nodes** with numeric values representing the metrics relevant to your scheduling needs:
--&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;给节点打上数值型污点&lt;/strong&gt;，其值代表你调度所关心的指标：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl taint nodes node-1 failure-probability&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;5:NoSchedule
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl taint nodes node-2 disk-iops&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;5000:NoSchedule
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
1. **Use the new operators** in your pod specifications:
--&gt;
&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;在 Pod 规约中使用新运算符&lt;/strong&gt;：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;failure-probability&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Lt&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;1&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;NoSchedule&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&#34;alert alert-info&#34; role=&#34;alert&#34;&gt;&lt;h4 class=&#34;alert-heading&#34;&gt;说明：&lt;/h4&gt;&lt;!--
As an alpha feature, Extended Toleration Operators may change in future releases and should be used with caution in production environments. Always test thoroughly in non-production clusters first.
--&gt;
&lt;p&gt;作为 Alpha 特性，扩展容忍度运算符可能会在未来版本中发生变化，应谨慎用于生产环境。
请务必先在非生产集群中充分测试。&lt;/p&gt;&lt;/div&gt;

&lt;!--
## What&#39;s next?
--&gt;
&lt;h2 id=&#34;下一步计划是什么&#34;&gt;下一步计划是什么？&lt;/h2&gt;
&lt;!--
This alpha release is just the beginning. As we gather feedback from the community, we plan to:
--&gt;
&lt;p&gt;这次 Alpha 发布只是开始。随着我们收集社区反馈，我们计划：&lt;/p&gt;
&lt;!--
- Add support for [CEL (Common Expression Language) expressions](https://github.com/kubernetes/enhancements/issues/5500) in tolerations and node affinity for even more flexible scheduling logic, including semantic versioning comparisons
- Improve integration with cluster autoscaling for threshold-aware capacity planning
- Graduate the feature to beta and eventually GA with production-ready stability
--&gt;
&lt;ul&gt;
&lt;li&gt;在容忍度与节点亲和性（node affinity）中增加对 &lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/5500&#34;&gt;CEL（Common Expression Language）表达式&lt;/a&gt;
的支持，以提供更灵活的调度逻辑（包括语义化版本比较）&lt;/li&gt;
&lt;li&gt;改进与集群自动扩缩容（cluster autoscaling）的集成，以支持“阈值感知”的容量规划&lt;/li&gt;
&lt;li&gt;将该特性升级为 Beta，并最终达到具备生产级稳定性的 GA&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
We&#39;re particularly interested in hearing about your use cases! Do you have scenarios where threshold-based scheduling would solve problems? Are there additional operators or capabilities you&#39;d like to see?
--&gt;
&lt;p&gt;我们尤其希望听到你的使用场景！你是否有一些可以通过“基于阈值的调度”解决问题的场景？
你还希望看到哪些额外的运算符或能力？&lt;/p&gt;
&lt;!--
## Getting involved
--&gt;
&lt;h2 id=&#34;参与其中&#34;&gt;参与其中&lt;/h2&gt;
&lt;!--
This feature is driven by the [SIG Scheduling](https://github.com/kubernetes/community/tree/master/sig-scheduling) community. Please join us to connect with the community and share your ideas and feedback around this feature and beyond.
--&gt;
&lt;p&gt;该特性由 &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-scheduling&#34;&gt;SIG Scheduling&lt;/a&gt; 社区推动。
欢迎加入我们，与社区交流并分享你对该特性及其他相关议题的想法与反馈。&lt;/p&gt;
&lt;!--
You can reach the maintainers of this feature at:
--&gt;
&lt;p&gt;你可以通过以下方式联系该特性的维护者：&lt;/p&gt;
&lt;!--
- Slack: [#sig-scheduling](https://kubernetes.slack.com/messages/sig-scheduling) on Kubernetes Slack
- Mailing list: [kubernetes-sig-scheduling@googlegroups.com](https://groups.google.com/g/kubernetes-sig-scheduling)
--&gt;
&lt;ul&gt;
&lt;li&gt;Slack：Kubernetes Slack 上的 &lt;a href=&#34;https://kubernetes.slack.com/messages/sig-scheduling&#34;&gt;#sig-scheduling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;邮件列表：&lt;a href=&#34;https://groups.google.com/g/kubernetes-sig-scheduling&#34;&gt;kubernetes-sig-scheduling@googlegroups.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
For questions or specific inquiries related to Extended Toleration Operators, please reach out to the SIG Scheduling community. We look forward to hearing from you!
--&gt;
&lt;p&gt;如果你对扩展容忍度运算符有疑问或具体咨询，请联系 SIG Scheduling 社区。我们期待你的反馈！&lt;/p&gt;
&lt;!--
## How can I learn more?
--&gt;
&lt;h2 id=&#34;如何了解更多&#34;&gt;如何了解更多？&lt;/h2&gt;
&lt;!--
- [Taints and Tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/) for understanding the fundamentals
- [Numeric comparison operators](/docs/concepts/scheduling-eviction/taint-and-toleration/#numeric-comparison-operators) for details on using `Gt` and `Lt` operators
- [KEP-5471: Extended Toleration Operators for Threshold-Based Placement](https://kep.k8s.io/5471)
--&gt;
&lt;ul&gt;
&lt;li&gt;阅读基础概念：&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/scheduling-eviction/taint-and-toleration/&#34;&gt;污点与容忍度（Taints and Tolerations）&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;了解 &lt;code&gt;Gt&lt;/code&gt; / &lt;code&gt;Lt&lt;/code&gt; 用法细节：&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/scheduling-eviction/taint-and-toleration/#numeric-comparison-operators&#34;&gt;数值比较运算符（Numeric comparison operators）&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;阅读提案：&lt;a href=&#34;https://kep.k8s.io/5471&#34;&gt;KEP-5471：用于基于阈值放置的扩展容忍度运算符&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.35：Job Managed By 特性正式发布（GA）</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/12/18/kubernetes-v1-35-job-managedby-for-jobs-goes-ga/</link>
      <pubDate>Thu, 18 Dec 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/12/18/kubernetes-v1-35-job-managedby-for-jobs-goes-ga/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.35: Job Managed By Goes GA&#34;
date: 2025-12-18T10:30:00-08:00
slug: kubernetes-v1-35-job-managedby-for-jobs-goes-ga
author: &gt;
  [Dejan Zele Pejchev](https://github.com/dejanzele) (G-Research),
  [Michał Woźniak](https://github.com/mimowo) (Google)
--&gt;
&lt;!--
In Kubernetes v1.35, the ability to specify an external Job controller (through `.spec.managedBy`) graduates to General Availability.
--&gt;
&lt;p&gt;在 Kubernetes v1.35 中，通过 &lt;code&gt;.spec.managedBy&lt;/code&gt; 指定外部 Job 控制器的能力升级为正式可用（GA）。&lt;/p&gt;
&lt;!--
This feature allows external controllers to take full responsibility for Job reconciliation, unlocking powerful scheduling patterns like multi-cluster dispatching with [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/).
--&gt;
&lt;p&gt;该特性允许外部控制器对 Job 的调谐（reconciliation）承担完全责任，从而解锁更强大的调度模式，
例如借助 &lt;a href=&#34;https://kueue.sigs.k8s.io/docs/concepts/multikueue/&#34;&gt;MultiKueue&lt;/a&gt; 进行跨多集群派发。&lt;/p&gt;
&lt;!--
## Why delegate Job reconciliation?
--&gt;
&lt;h2 id=&#34;why-delegate-job-reconciliation&#34;&gt;为何要委派 Job 调谐？&lt;/h2&gt;
&lt;!--
The primary motivation for this feature is to support multi-cluster batch scheduling architectures, such as MultiKueue.
--&gt;
&lt;p&gt;该特性的主要动机是支持多集群批处理调度架构，例如 MultiKueue。&lt;/p&gt;
&lt;!--
The MultiKueue architecture distinguishes between a Management Cluster and a pool of Worker Clusters:
--&gt;
&lt;p&gt;MultiKueue 架构区分“管理集群（Management Cluster）”与一组“工作集群（Worker Clusters）”：&lt;/p&gt;
&lt;!--
- The Management Cluster is responsible for dispatching Jobs but not executing them. It needs to accept Job objects to track status, but it skips the creation and execution of Pods.
--&gt;
&lt;ul&gt;
&lt;li&gt;管理集群负责派发 Job，但不负责执行。
它需要接收 Job 对象以跟踪状态，但会跳过 Pod 的创建与执行。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- The Worker Clusters receive the dispatched Jobs and execute the actual Pods.
--&gt;
&lt;ul&gt;
&lt;li&gt;工作集群接收被派发的 Job，并执行实际的 Pod。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Users usually interact with the Management Cluster. Because the status is automatically propagated back, they can observe the Job&#39;s progress &#34;live&#34; without accessing the Worker Clusters.
--&gt;
&lt;ul&gt;
&lt;li&gt;用户通常与管理集群交互。由于状态会自动回传，
用户无需访问工作集群也能“实时”观察 Job 的进度。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- In the Worker Clusters, the dispatched Jobs run as regular Jobs managed by the built-in Job controller, with no `.spec.managedBy` set.
--&gt;
&lt;ul&gt;
&lt;li&gt;在工作集群中，被派发的 Job 会作为常规 Job 运行，
由内置 Job 控制器管理，且不会设置 &lt;code&gt;.spec.managedBy&lt;/code&gt;。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
By using `.spec.managedBy`, the MultiKueue controller on the Management Cluster can take over the reconciliation of a Job. It copies the status from the &#34;mirror&#34; Job running on the Worker Cluster back to the Management Cluster.
--&gt;
&lt;p&gt;通过使用 &lt;code&gt;.spec.managedBy&lt;/code&gt;，管理集群上的 MultiKueue 控制器可以接管某个 Job 的调谐。
它会将工作集群中运行的“镜像（mirror）Job”的状态复制回管理集群。&lt;/p&gt;
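&lt;p&gt;下面是一个示意性的 Job 清单（参考 Kueue 文档中的模式；其中队列名 &lt;code&gt;user-queue&lt;/code&gt; 与容器镜像均为假设值），展示在管理集群上通过 &lt;code&gt;.spec.managedBy&lt;/code&gt; 将调谐委派给 MultiKueue 控制器：&lt;/p&gt;

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  labels:
    kueue.x-k8s.io/queue-name: user-queue   # 假设的本地队列名
spec:
  managedBy: kueue.x-k8s.io/multikueue      # 内置 Job 控制器将跳过此 Job
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: registry.k8s.io/e2e-test-images/agnhost:2.53   # 示例镜像（假设值）
        command: ["sleep", "30"]
```

&lt;p&gt;被派发到工作集群的“镜像”Job 不会携带该字段，因此由内置 Job 控制器正常执行。&lt;/p&gt;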
&lt;!--
Why not just disable the Job controller? While one could theoretically achieve this by disabling the built-in Job controller entirely, this is often impossible or impractical for two reasons:
--&gt;
&lt;p&gt;为什么不直接禁用 Job 控制器？理论上可以通过完全禁用内置 Job 控制器来实现，
但这通常不可行或不现实，原因主要有两点：&lt;/p&gt;
&lt;!--
1. Managed Control Planes: In many cloud environments, the Kubernetes control plane is locked, and users cannot modify controller manager flags.
--&gt;
&lt;ol&gt;
&lt;li&gt;托管控制平面：在许多云环境中，Kubernetes 控制平面是锁定的，
用户无法修改控制器管理器的参数。&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
2. Hybrid Cluster Role: Users often need a &#34;hybrid&#34; mode where the Management Cluster dispatches some heavy workloads to remote clusters but still executes smaller or control-plane-related Jobs in the Management Cluster. `.spec.managedBy` allows this granularity on a per-Job basis.
--&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;混合集群角色：用户常常需要一种“混合”模式：
管理集群将部分重型工作负载派发到远端集群，
但仍在管理集群中执行较小的、或与控制平面相关的 Job。
&lt;code&gt;.spec.managedBy&lt;/code&gt; 让这种粒度可以按 Job 逐个控制。&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
## How `.spec.managedBy` works
--&gt;
&lt;h2 id=&#34;how-specmanagedby-works&#34;&gt;&lt;code&gt;.spec.managedBy&lt;/code&gt; 的工作机制  &lt;/h2&gt;
&lt;!--
The `.spec.managedBy` field indicates which controller is responsible for the Job, specifically there are two modes of operation:
--&gt;
&lt;p&gt;&lt;code&gt;.spec.managedBy&lt;/code&gt; 字段用于指示由哪个控制器负责该 Job。
具体而言，它有两种工作模式：&lt;/p&gt;
&lt;!--
- **Standard**: if unset or set to the reserved value `kubernetes.io/job-controller`, the built-in Job controller reconciles the Job as usual (standard behavior).
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;标准（Standard）&lt;/strong&gt;：如果未设置，或设置为保留值 &lt;code&gt;kubernetes.io/job-controller&lt;/code&gt;，
内置 Job 控制器会像往常一样调谐该 Job（标准行为）。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- **Delegation**: If set to any other value, the built-in Job controller skips reconciliation entirely for that Job.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;委派（Delegation）&lt;/strong&gt;：如果设置为任何其他值，内置 Job 控制器将完全跳过对该 Job 的调谐。&lt;/li&gt;
&lt;/ul&gt;
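&lt;p&gt;两种模式可以用如下 Job 规约片段对照说明（其中 &lt;code&gt;example.com/custom-controller&lt;/code&gt; 仅为示意值）：&lt;/p&gt;

```yaml
# 标准模式：等价于完全不设置该字段，由内置 Job 控制器照常调谐
spec:
  managedBy: kubernetes.io/job-controller
---
# 委派模式：内置 Job 控制器完全跳过该 Job，由外部控制器负责（控制器名称为示意值）
spec:
  managedBy: example.com/custom-controller
```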
&lt;!--
To prevent orphaned Pods or resource leaks, this field is immutable. You cannot transfer a running Job from one controller to another.
--&gt;
&lt;p&gt;为防止出现孤儿 Pod 或资源泄漏，该字段是不可变的（immutable）。
你不能将一个正在运行的 Job 从一个控制器转移到另一个控制器。&lt;/p&gt;
&lt;!--
If you are looking into implementing an external controller, be aware that your controller needs to be conformant with the definitions for the [Job API](/docs/reference/kubernetes-api/workload-resources/job-v1/).
--&gt;
&lt;p&gt;如果你计划实现一个外部控制器，请注意你的控制器需要符合
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/kubernetes-api/workload-resources/job-v1/&#34;&gt;Job API&lt;/a&gt;
的定义。&lt;/p&gt;
&lt;!--
In order to enforce the conformance, a significant part of the effort was to introduce the extensive Job status validation rules.
--&gt;
&lt;p&gt;为确保这种一致性，这项工作的很大一部分是引入了一套全面的 Job 状态校验规则。&lt;/p&gt;
&lt;!--
Navigate to the [How can you learn more?](#how-can-you-learn-more) section for more details.
--&gt;
&lt;p&gt;更多细节请参阅&lt;a href=&#34;#how-can-you-learn-more&#34;&gt;如何进一步了解？&lt;/a&gt;一节。&lt;/p&gt;
&lt;!--
## Ecosystem Adoption
--&gt;
&lt;h2 id=&#34;ecosystem-adoption&#34;&gt;生态采纳情况  &lt;/h2&gt;
&lt;!--
The `.spec.managedBy` field is rapidly becoming the standard interface for delegating control in the Kubernetes batch ecosystem.
--&gt;
&lt;p&gt;&lt;code&gt;.spec.managedBy&lt;/code&gt; 字段正在快速成为 Kubernetes 批处理生态中委派控制的标准接口。&lt;/p&gt;
&lt;!--
Various custom workload controllers are adding this field (or an equivalent) to allow MultiKueue to take over their reconciliation and orchestrate them across clusters:
--&gt;
&lt;p&gt;多种自定义工作负载控制器正在加入该字段（或等效字段），
以便让 MultiKueue 接管它们的调谐并在多集群之间进行编排：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes-sigs/jobset&#34;&gt;JobSet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.kubeflow.org/docs/components/training/&#34;&gt;Kubeflow Trainer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.ray.io/en/latest/cluster/kubernetes/&#34;&gt;KubeRay&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://project-codeflare.github.io/appwrapper/&#34;&gt;AppWrapper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://tekton.dev/docs/&#34;&gt;Tekton Pipelines&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
While it is possible to use `.spec.managedBy` to implement a custom Job controller from scratch, we haven&#39;t observed that yet. The feature is specifically designed to support delegation patterns, like MultiKueue, without reinventing the wheel.
--&gt;
&lt;p&gt;虽然理论上可以用 &lt;code&gt;.spec.managedBy&lt;/code&gt; 从零实现一个自定义 Job 控制器，
但我们尚未观察到这种用法。该特性专为支持委派模式（例如 MultiKueue）而设计，
以避免重复造轮子。&lt;/p&gt;
&lt;!--
## How can you learn more?
--&gt;
&lt;h2 id=&#34;how-can-you-learn-more&#34;&gt;如何进一步了解？  &lt;/h2&gt;
&lt;!--
If you want to dig deeper:
--&gt;
&lt;p&gt;如果你想进一步深入了解：&lt;/p&gt;
&lt;!--
Read the user-facing documentation for:
--&gt;
&lt;p&gt;阅读面向用户的文档：&lt;/p&gt;
&lt;!--
- [Jobs](/docs/concepts/workloads/controllers/job/),
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/job/&#34;&gt;Job&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- [Delegation of managing a Job object to an external controller](/docs/concepts/workloads/controllers/job/#delegation-of-managing-a-job-object-to-external-controller), and
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/job/#delegation-of-managing-a-job-object-to-external-controller&#34;&gt;将 Job 对象的管理委派给外部控制器&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/).
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kueue.sigs.k8s.io/docs/concepts/multikueue/&#34;&gt;MultiKueue&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Deep dive into the design history:
--&gt;
&lt;p&gt;深入了解设计历程：&lt;/p&gt;
&lt;!--
- The Kubernetes Enhancement Proposal (KEP) [Job&#39;s managed-by mechanism](https://github.com/kubernetes/enhancements/issues/4368) including introduction of the extensive [Job status validation rules](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/4368-support-managed-by-for-batch-jobs#job-status-validation).
--&gt;
&lt;ul&gt;
&lt;li&gt;Kubernetes 增强提案（KEP）&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/4368&#34;&gt;Job&#39;s managed-by mechanism&lt;/a&gt;，
其中包括引入了更全面的 &lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/4368-support-managed-by-for-batch-jobs#job-status-validation&#34;&gt;Job status validation rules&lt;/a&gt;。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- The Kueue KEP for [MultiKueue](https://github.com/kubernetes-sigs/kueue/tree/main/keps/693-multikueue).
--&gt;
&lt;ul&gt;
&lt;li&gt;Kueue 的 KEP：&lt;a href=&#34;https://github.com/kubernetes-sigs/kueue/tree/main/keps/693-multikueue&#34;&gt;MultiKueue&lt;/a&gt;。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Explore how MultiKueue uses `.spec.managedBy` in practice in the task guide for [running Jobs across clusters](https://kueue.sigs.k8s.io/docs/tasks/run/multikueue/job/).
--&gt;
&lt;p&gt;也可以通过任务指南了解 MultiKueue 在实践中如何使用 &lt;code&gt;.spec.managedBy&lt;/code&gt;：
&lt;a href=&#34;https://kueue.sigs.k8s.io/docs/tasks/run/multikueue/job/&#34;&gt;跨集群运行 Job&lt;/a&gt;。&lt;/p&gt;
&lt;!--
## Acknowledgments
--&gt;
&lt;h2 id=&#34;acknowledgments&#34;&gt;致谢  &lt;/h2&gt;
&lt;!--
As with any Kubernetes feature, a lot of people helped shape this one through design discussions, reviews, test runs,
and bug reports.
--&gt;
&lt;p&gt;与任何 Kubernetes 特性一样，这项特性也由许多人一起塑造：
他们参与设计讨论、评审、试运行与缺陷报告等工作。&lt;/p&gt;
&lt;!--
We would like to thank, in particular:
--&gt;
&lt;p&gt;我们特别感谢：&lt;/p&gt;
&lt;!--
* [Maciej Szulik](https://github.com/soltysh) - for guidance, mentorship, and reviews.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/soltysh&#34;&gt;Maciej Szulik&lt;/a&gt;——提供指导、辅导与评审。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [Filip Křepinský](https://github.com/atiratree) - for guidance, mentorship, and reviews.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/atiratree&#34;&gt;Filip Křepinský&lt;/a&gt;——提供指导、辅导与评审。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Get involved
--&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;参与其中  &lt;/h2&gt;
&lt;!--
This work was sponsored by the Kubernetes
[Batch Working Group](https://github.com/kubernetes/community/tree/master/wg-batch)
in close collaboration with the
[SIG Apps](https://github.com/kubernetes/community/tree/master/sig-apps),
and with strong input from the
[SIG Scheduling](https://github.com/kubernetes/community/tree/master/sig-scheduling) community.
--&gt;
&lt;p&gt;这项工作由 Kubernetes 的 &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/wg-batch&#34;&gt;Batch Working Group&lt;/a&gt; 主导，
并与 &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps&#34;&gt;SIG Apps&lt;/a&gt; 紧密协作，
同时也得到了 &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-scheduling&#34;&gt;SIG Scheduling&lt;/a&gt;
社区的强力支持与投入。&lt;/p&gt;
&lt;!--
If you are interested in batch scheduling, multi-cluster solutions, or further improving the Job API:
--&gt;
&lt;p&gt;如果你对批处理调度、多集群解决方案或进一步改进 Job API 感兴趣：&lt;/p&gt;
&lt;!--
- Join us in the Batch WG and SIG Apps meetings.
--&gt;
&lt;ul&gt;
&lt;li&gt;欢迎加入 Batch WG 与 SIG Apps 会议。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Subscribe to the [WG Batch Slack channel](https://kubernetes.slack.com/messages/wg-batch).
--&gt;
&lt;ul&gt;
&lt;li&gt;订阅 &lt;a href=&#34;https://kubernetes.slack.com/messages/wg-batch&#34;&gt;WG Batch Slack 频道&lt;/a&gt;。&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.35：Timbernetes（世界树版本）</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/12/17/kubernetes-v1-35-release/</link>
      <pubDate>Wed, 17 Dec 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/12/17/kubernetes-v1-35-release/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.35: Timbernetes (The World Tree Release)&#34;
date: 2025-12-17T10:30:00-08:00
evergreen: true
slug: kubernetes-v1-35-release
author: &gt;
  [Kubernetes v1.35 Release Team](https://github.com/kubernetes/sig-release/blob/master/releases/release-1.35/release-team.md)
--&gt;
&lt;!--
**Editors**: Aakanksha Bhende, Arujjwal Negi, Chad M. Crowell, Graziano Casto, Swathi Rao
--&gt;
&lt;p&gt;&lt;strong&gt;编辑&lt;/strong&gt;：Aakanksha Bhende、Arujjwal Negi、Chad M. Crowell、Graziano Casto、Swathi Rao&lt;/p&gt;
&lt;!--
Similar to previous releases, the release of Kubernetes v1.35 introduces new stable, beta, and alpha features. The consistent delivery of high-quality releases underscores the strength of our development cycle and the vibrant support from our community.
--&gt;
&lt;p&gt;与之前版本类似，Kubernetes v1.35 的发布引入了新的稳定（GA）、Beta 和 Alpha 特性。
持续交付高质量版本，彰显了我们开发周期的实力，也离不开社区的蓬勃支持。&lt;/p&gt;
&lt;!--
This release consists of 60 enhancements, including 17 stable, 19 beta, and 22 alpha features.
--&gt;
&lt;p&gt;此版本包含 60 个增强项，其中包括 17 个稳定（GA）特性、19 个 Beta 特性和 22 个 Alpha 特性。&lt;/p&gt;
&lt;!--
There are also some [deprecations and removals](#deprecations-removals-and-community-updates) in this release;
make sure to read about those.
--&gt;
&lt;p&gt;本次发布还包含一些&lt;a href=&#34;#deprecations-removals-and-community-updates&#34;&gt;弃用与移除&lt;/a&gt;内容，请务必阅读相关说明。&lt;/p&gt;
&lt;!--
## Release theme and logo
--&gt;
&lt;h2 id=&#34;release-theme-and-logo&#34;&gt;发布主题与徽标  &lt;/h2&gt;


&lt;figure class=&#34;release-logo &#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/12/17/kubernetes-v1-35-release/k8s-v1.35.png&#34;
         alt=&#34;Kubernetes v1.35 Timbernetes 徽标：世界树与三只松鼠。&#34;/&gt; 
&lt;/figure&gt;
&lt;!--
2025 began in the shimmer of Octarine: The Color of Magic (v1.33) and rode the gusts Of Wind &amp; Will (v1.34). We close the year with our hands on the World Tree, inspired by Yggdrasil, the tree of life that binds many realms. Like any great tree, Kubernetes grows ring by ring and release by release, shaped by the care of a global community.
--&gt;
&lt;p&gt;2025 年在 Octarine：The Color of Magic（v1.33）的微光中启程，
又乘着 Of Wind &amp;amp; Will（v1.34）的疾风前行。
我们在年末将双手搭在世界树上，灵感来自 Yggdrasil——那棵连接诸多世界的生命之树。
如同所有伟大的树木，Kubernetes 也在全球社区的悉心呵护下，
以年轮为记、以版本为序，不断成长。&lt;/p&gt;
&lt;!--
At its center sits the Kubernetes wheel wrapped around the Earth, grounded by the resilient maintainers, contributors and users who keep showing up. Between day jobs, life changes, and steady open-source stewardship, they prune old APIs, graft new features and keep one of the world’s largest open source projects healthy.
--&gt;
&lt;p&gt;在这棵树的中心，是环抱地球的 Kubernetes 舵轮标志。
它之所以稳固，源于那些始终如一的维护者、贡献者与用户。
在本职工作与生活变迁之间，在持续的开源维护之中，
他们修剪旧 API、嫁接新特性，让这个全球最大开源项目之一保持健康。&lt;/p&gt;
&lt;!--
Three squirrels guard the tree: a wizard holding the LGTM scroll for reviewers, a warrior with an axe and Kubernetes shield for the release crews who cut new branches, and a rogue with a lantern for the triagers who bring light to dark issue queues.
--&gt;
&lt;p&gt;三只松鼠守护着这棵树：
代表评阅者、手持 LGTM 卷轴的法师；
代表负责切出新分支的发布团队、手持战斧与 Kubernetes 盾牌的战士；
以及代表分诊者、手提灯笼照亮幽暗 Issue 队列的游侠。&lt;/p&gt;
&lt;!--
Together, they stand in for a much larger adventuring party. Kubernetes v1.35 adds another growth ring to the World Tree, a fresh cut shaped by many hands, many paths and a community whose branches reach higher as its roots grow deeper.
--&gt;
&lt;p&gt;它们共同象征着一支规模更大的冒险队伍。
Kubernetes v1.35 为世界树再添一圈年轮——这一道新切面由无数双手、
无数条路径与一个根系更深、枝叶更高的社区共同塑造。&lt;/p&gt;
&lt;!--
## Spotlight on key updates
--&gt;
&lt;h2 id=&#34;spotlight-on-key-updates&#34;&gt;重点更新速览  &lt;/h2&gt;
&lt;!--
Kubernetes v1.35 is packed with new features and improvements. Here are a few select updates the Release Team would like to highlight!
--&gt;
&lt;p&gt;Kubernetes v1.35 带来了大量新特性与改进。下面是发布团队希望重点介绍的几个更新！&lt;/p&gt;
&lt;!--
### Stable: In-place update of Pod resources
--&gt;
&lt;h3 id=&#34;stable-in-place-update-of-pod-resources&#34;&gt;稳定（GA）阶段：Pod 资源原地更新  &lt;/h3&gt;
&lt;!--
Kubernetes has graduated in-place updates for Pod resources to General Availability (GA).
--&gt;
&lt;p&gt;Kubernetes 已将 Pod 资源的原地更新特性升级为正式发布（GA）。&lt;/p&gt;
&lt;!--
This feature allows users to adjust CPU and memory resources without restarting Pods or Containers. Previously, such modifications required recreating Pods, which could disrupt workloads, particularly for stateful or batch applications. Earlier Kubernetes releases allowed you to change only infrastructure resource settings (requests and limits) for existing Pods. The new in-place functionality allows for smoother, nondisruptive vertical scaling, improves efficiency, and can also simplify development.
--&gt;
&lt;p&gt;该特性允许用户在不重启 Pod 或容器的情况下，调整 CPU 与内存资源。
此前，这类修改需要重建 Pod，可能会干扰工作负载，尤其是有状态或批处理应用。
更早的 Kubernetes 版本仅允许你为现有 Pod 修改基础设施资源设置（requests 与 limits）。
新的原地更新能力支持更平滑、不中断的纵向扩缩容，提高效率，也能简化开发流程。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #1287](https://kep.k8s.io/1287) led by SIG Node.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/1287&#34;&gt;KEP #1287&lt;/a&gt; 的一部分，由 SIG Node 牵头完成。&lt;/p&gt;
&lt;!--
### Beta: Pod certificates for workload identity and security
--&gt;
&lt;h3 id=&#34;beta-pod-certificates-for-workload-identity-and-security&#34;&gt;Beta：用于工作负载身份与安全的 Pod 证书  &lt;/h3&gt;
&lt;!--
Previously, delivering certificates to pods required external controllers (cert-manager, SPIFFE/SPIRE), CRD orchestration, and Secret management, with rotation handled by sidecars or init containers. Kubernetes v1.35 enables native workload identity with automated certificate rotation, drastically simplifying service mesh and zero-trust architectures. 
--&gt;
&lt;p&gt;此前，要向 Pod 下发证书，往往需要外部控制器（cert-manager、SPIFFE/SPIRE）、
CRD 编排以及 Secret 管理，并由边车或 Init 容器负责证书轮换。
Kubernetes v1.35 通过自动化证书轮换，实现原生工作负载身份，
大幅简化服务网格与零信任架构。&lt;/p&gt;
&lt;!--
Now, the `kubelet` generates keys, requests certificates via PodCertificateRequest, and writes credential bundles directly to the Pod&#39;s filesystem. The `kube-apiserver` enforces node restriction at admission time, eliminating the most common pitfall for third-party signers: accidentally violating node isolation boundaries. This enables pure mTLS flows with no bearer tokens in the issuance path.
--&gt;
&lt;p&gt;现在，&lt;code&gt;kubelet&lt;/code&gt; 会生成密钥，通过 PodCertificateRequest 请求证书，
并将凭据包直接写入 Pod 的文件系统。
&lt;code&gt;kube-apiserver&lt;/code&gt; 会在准入阶段强制执行节点限制，
消除第三方签名者最常见的陷阱：无意间突破节点隔离边界。
这样便可实现签发路径中不含持有者令牌（bearer token）的纯 mTLS 流程。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #4317](https://kep.k8s.io/4317) led by SIG Auth.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/4317&#34;&gt;KEP #4317&lt;/a&gt; 的一部分，由 SIG Auth 牵头完成。&lt;/p&gt;
&lt;!--
### Alpha: Node declared features before scheduling
--&gt;
&lt;h3 id=&#34;alpha-node-declared-features-before-scheduling&#34;&gt;Alpha：调度前节点声明式特性  &lt;/h3&gt;
&lt;!--
When control planes enable new features but nodes lag behind (permitted by Kubernetes skew policy), the scheduler can place pods requiring those features onto incompatible older nodes.
--&gt;
&lt;p&gt;当控制平面启用新特性、但节点侧进度滞后时（Kubernetes 版本偏差策略允许这种情况），
调度器可能会将需要这些特性的 Pod 调度到不兼容的旧节点上。&lt;/p&gt;
&lt;!--
The node-declaration features framework allows nodes to declare their supported Kubernetes features. With the new alpha feature enabled, a Node reports the features it supports, publishing this information to the control plane via a new `.status.declaredFeatures` field. Then, the `kube-scheduler`, admission controllers, and third-party components can use these declarations. For example, you can enforce scheduling and API validation constraints to ensure that Pods run only on compatible nodes.
--&gt;
&lt;p&gt;节点声明式特性框架允许节点声明其所支持的 Kubernetes 特性。
启用这一 Alpha 特性后，Node 会通过新的 &lt;code&gt;.status.declaredFeatures&lt;/code&gt; 字段上报其支持的特性，
并将信息发布到控制平面。
随后，&lt;code&gt;kube-scheduler&lt;/code&gt;、准入控制器以及第三方组件都可以使用这些声明。
例如，你可以强制执行调度与 API 校验约束，确保 Pod 只运行在兼容的节点上。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #5328](https://kep.k8s.io/5328) led by SIG Node.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/5328&#34;&gt;KEP #5328&lt;/a&gt; 的一部分，由 SIG Node 牵头完成。&lt;/p&gt;
&lt;!--
## Features graduating to Stable
--&gt;
&lt;h2 id=&#34;features-graduating-to-stable&#34;&gt;进入稳定（GA）阶段的特性  &lt;/h2&gt;
&lt;!--
*This is a selection of some of the improvements that are now stable following the v1.35 release.*
--&gt;
&lt;p&gt;&lt;strong&gt;以下列出 v1.35 发布后进入稳定（GA）阶段的一些改进。&lt;/strong&gt;&lt;/p&gt;
&lt;!--
### PreferSameNode traffic distribution
--&gt;
&lt;h3 id=&#34;prefersamenode-traffic-distribution&#34;&gt;PreferSameNode 流量分配  &lt;/h3&gt;
&lt;!--
The `trafficDistribution` field for Services has been updated to provide more explicit control over traffic routing. A new option, `PreferSameNode`, has been introduced to let services strictly prioritize endpoints on the local node if available, falling back to remote endpoints otherwise.
--&gt;
&lt;p&gt;Service 的 &lt;code&gt;trafficDistribution&lt;/code&gt; 字段已更新，以便更明确地控制流量路由。
新增选项 &lt;code&gt;PreferSameNode&lt;/code&gt;：在可用时严格优先选择本节点上的端点，
否则再回退到远端端点。&lt;/p&gt;
&lt;!--
Simultaneously, the existing `PreferClose` option has been renamed to `PreferSameZone`. This change makes the API self-explanatory by explicitly indicating that traffic is preferred within the current availability zone. While `PreferClose` is preserved for backward compatibility, `PreferSameZone` is now the standard for zonal routing, ensuring that both node-level and zone-level preferences are clearly distinguished.
--&gt;
&lt;p&gt;同时，现有的 &lt;code&gt;PreferClose&lt;/code&gt; 选项已重命名为 &lt;code&gt;PreferSameZone&lt;/code&gt;。
这一变更让 API 更加直观、自解释：它明确表示优先在当前可用区内选择流量路径。
虽然为了向后兼容仍保留 &lt;code&gt;PreferClose&lt;/code&gt;，但 &lt;code&gt;PreferSameZone&lt;/code&gt; 现在是可用区级别路由的标准选项，
确保“节点级”与“可用区级”的偏好能够清晰区分。&lt;/p&gt;
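&lt;p&gt;一个最小的 Service 清单示例（selector 与端口均为假设值），展示如何启用节点级优先路由：&lt;/p&gt;

```yaml
apiVersion: v1
kind: Service
metadata:
  name: local-first
spec:
  selector:
    app: demo            # 假设的应用标签
  ports:
  - port: 80
    targetPort: 8080
  trafficDistribution: PreferSameNode   # 优先本节点端点，不可用时回退到远端端点
```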
&lt;!--
This work was done as part of [KEP #3015](https://kep.k8s.io/3015) led by SIG Network.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/3015&#34;&gt;KEP #3015&lt;/a&gt; 的一部分，由 SIG Network 牵头完成。&lt;/p&gt;
&lt;!--
### Job API managed-by mechanism
--&gt;
&lt;h3 id=&#34;job-api-managed-by-mechanism&#34;&gt;Job API 的 managed-by 机制  &lt;/h3&gt;
&lt;!--
The Job API now includes a `managedBy` field that allows an external controller to handle Job status synchronization. This feature, which graduates to stable in Kubernetes v1.35, is primarily driven by [MultiKueue](https://github.com/kubernetes-sigs/kueue/tree/main/keps/693-multikueue), a multi-cluster dispatching system where a Job created in a management cluster is mirrored and executed in a worker cluster, with status updates propagated back. To enable this workflow, the built-in Job controller must not act on a particular Job resource so that the Kueue controller can manage status updates instead.
--&gt;
&lt;p&gt;Job API 新增 &lt;code&gt;managedBy&lt;/code&gt; 字段，允许外部控制器接管 Job 状态同步。
该特性在 Kubernetes v1.35 中进入稳定（GA）阶段，
主要由 &lt;a href=&#34;https://github.com/kubernetes-sigs/kueue/tree/main/keps/693-multikueue&#34;&gt;MultiKueue&lt;/a&gt; 推动。
MultiKueue 是一种多集群派发系统：在管理集群中创建的 Job 会被镜像到工作集群执行，并将状态更新回传。
为实现这一工作流，必须让内置 Job 控制器不处理该特定 Job 资源，
从而由 Kueue 控制器接管状态更新。&lt;/p&gt;
&lt;!--
The goal is to allow clean delegation of Job synchronization to another controller. It does not aim to pass custom parameters to that controller or modify CronJob concurrency policies.
--&gt;
&lt;p&gt;其目标是让 Job 同步能够清晰地委派给另一个控制器。
它并不意图向该控制器传递自定义参数，也不打算修改 CronJob 的并发策略。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #4368](https://kep.k8s.io/4368) led by SIG Apps.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/4368&#34;&gt;KEP #4368&lt;/a&gt; 的一部分，由 SIG Apps 牵头完成。&lt;/p&gt;
&lt;!--
### Reliable Pod update tracking with `.metadata.generation`
--&gt;
&lt;h3 id=&#34;reliable-pod-update-tracking-with-metadata-generation&#34;&gt;使用 &lt;code&gt;.metadata.generation&lt;/code&gt; 可靠跟踪 Pod 更新  &lt;/h3&gt;
&lt;!--
Historically, the Pod API lacked the `metadata.generation` field found in other Kubernetes objects such as Deployments.
Because of this omission, controllers and users had no reliable way to verify whether the `kubelet` had actually processed the latest changes to a Pod&#39;s specification. This ambiguity was particularly problematic for features like [In-Place Pod Vertical Scaling](#stable-in-place-update-of-pod-resources), where it was difficult to know exactly when a resource resize request had been enacted.
--&gt;
&lt;p&gt;在历史上，Pod API 缺少 &lt;code&gt;metadata.generation&lt;/code&gt; 字段（其他对象例如 Deployment 具备该字段）。
因此，控制器与用户无法可靠地确认 &lt;code&gt;kubelet&lt;/code&gt; 是否已经处理了 Pod 规约的最新变更。
这种不确定性在诸如&lt;a href=&#34;#stable-in-place-update-of-pod-resources&#34;&gt;Pod 资源原地纵向扩缩容&lt;/a&gt;
等特性中尤为突出，因为很难精确判断资源调整请求何时真正生效。&lt;/p&gt;
&lt;!--
Kubernetes v1.33 added `.metadata.generation` fields for Pods, as an alpha feature. That field is now stable in the v1.35 Pod API, which means that every time a Pod&#39;s `spec` is updated, the `.metadata.generation` value is incremented. As part of this improvement, the Pod API also gained a `.status.observedGeneration` field, which reports the generation that the `kubelet` has successfully seen and processed. Pod conditions also each contain their own individual `observedGeneration` field that clients can report and / or observe.
--&gt;
&lt;p&gt;Kubernetes v1.33 以 Alpha 形式为 Pod 增加了 &lt;code&gt;.metadata.generation&lt;/code&gt; 字段。
在 v1.35 的 Pod API 中，该字段已进入稳定（GA）阶段。
每当更新 Pod 的 &lt;code&gt;spec&lt;/code&gt; 时，&lt;code&gt;.metadata.generation&lt;/code&gt; 的值都会递增。
作为这一改进的一部分，Pod API 还新增了 &lt;code&gt;.status.observedGeneration&lt;/code&gt; 字段，
用于报告 &lt;code&gt;kubelet&lt;/code&gt; 已经成功看到并处理的 generation。
Pod 的各类状况（conditions）也各自包含独立的 &lt;code&gt;observedGeneration&lt;/code&gt; 字段，
客户端可以上报和/或观测这些字段。&lt;/p&gt;
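&lt;p&gt;下面的状态片段（字段取值为示意）展示这些字段如何配合：当 &lt;code&gt;.status.observedGeneration&lt;/code&gt; 追平 &lt;code&gt;.metadata.generation&lt;/code&gt; 时，说明 kubelet 已处理最近一次 spec 变更：&lt;/p&gt;

```yaml
metadata:
  generation: 3              # 每次更新 Pod 的 spec 时递增
status:
  observedGeneration: 3      # kubelet 已看到并处理的 generation
  conditions:
  - type: Ready
    status: "True"
    observedGeneration: 3    # 每个 condition 也带有自己的 observedGeneration
```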
&lt;!--
Because this feature has graduated to stable in v1.35, it is available for all workloads.
--&gt;
&lt;p&gt;由于该特性在 v1.35 进入稳定（GA）阶段，它对所有工作负载可用。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #5067](https://kep.k8s.io/5067) led by SIG Node.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/5067&#34;&gt;KEP #5067&lt;/a&gt; 的一部分，由 SIG Node 牵头完成。&lt;/p&gt;
&lt;!--
### Configurable NUMA node limit for topology manager
--&gt;
&lt;h3 id=&#34;configurable-numa-node-limit-for-topology-manager&#34;&gt;为拓扑管理器提供可配置 NUMA 节点上限  &lt;/h3&gt;
&lt;!--
The [topology manager](/docs/concepts/policy/node-resource-managers/) historically used a hard-coded limit of 8 for the maximum number of NUMA nodes it can support, preventing state explosion during affinity calculation. (There&#39;s an important detail here; a _NUMA node_ is not the same as a Node in the Kubernetes API.) This limit on the number of NUMA nodes prevented Kubernetes from fully utilizing modern high-end servers, which increasingly feature CPU architectures with more than 8 NUMA nodes.
--&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/policy/node-resource-managers/&#34;&gt;拓扑管理器&lt;/a&gt;过去使用硬编码上限 8，
作为其可支持的 NUMA 节点最大数量，以避免在亲和性计算期间出现状态爆炸。
这里有个重要细节：NUMA 节点（NUMA node）与 Kubernetes API 中的 Node 并不是同一概念。
这一 NUMA 节点数量上限，限制了 Kubernetes 对现代高端服务器的充分利用，
因为这类服务器越来越常见地采用拥有超过 8 个 NUMA 节点的 CPU 架构。&lt;/p&gt;
&lt;!--
Kubernetes v1.31 introduced a new, **beta** `max-allowable-numa-nodes` option to the topology manager policy configuration. In Kubernetes v1.35, that option is stable. Cluster administrators who enable it can use servers with more than 8 NUMA nodes.
--&gt;
&lt;p&gt;Kubernetes v1.31 为拓扑管理器策略配置引入了新的 &lt;strong&gt;Beta&lt;/strong&gt; 选项 &lt;code&gt;max-allowable-numa-nodes&lt;/code&gt;。
在 Kubernetes v1.35 中，该选项已进入稳定（GA）阶段。
启用该选项的集群管理员可以使用拥有超过 8 个 NUMA 节点的服务器。&lt;/p&gt;
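&lt;p&gt;在 kubelet 配置中，该选项通过 &lt;code&gt;topologyManagerPolicyOptions&lt;/code&gt; 映射设置；下面是一个示意片段（上限 12 为假设值）：&lt;/p&gt;

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
topologyManagerPolicy: single-numa-node
topologyManagerPolicyOptions:
  max-allowable-numa-nodes: "12"   # 允许的 NUMA 节点上限（示意值）
```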
&lt;!--
Although the configuration option is stable, the Kubernetes community is aware of the poor performance for large NUMA hosts, and there is a [proposed enhancement](https://kep.k8s.io/5726) (KEP-5726) that aims to improve on it. You can learn more about this by reading [Control Topology Management Policies on a node](/docs/tasks/administer-cluster/topology-manager/).
--&gt;
&lt;p&gt;尽管这一配置选项已进入稳定（GA）阶段，Kubernetes 社区仍注意到在大型 NUMA 主机上性能欠佳，
并提出了旨在改进该问题的&lt;a href=&#34;https://kep.k8s.io/5726&#34;&gt;增强提案&lt;/a&gt;（KEP-5726）。
要了解更多信息，请阅读&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/administer-cluster/topology-manager/&#34;&gt;在节点上控制拓扑管理策略&lt;/a&gt;。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #4622](https://kep.k8s.io/4622) led by SIG Node.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/4622&#34;&gt;KEP #4622&lt;/a&gt; 的一部分，由 SIG Node 牵头完成。&lt;/p&gt;
&lt;!--
## New features in Beta
--&gt;
&lt;h2 id=&#34;new-features-in-beta&#34;&gt;Beta 中的新特性  &lt;/h2&gt;
&lt;!--
*This is a selection of some of the improvements that are now beta following the v1.35 release.*
--&gt;
&lt;p&gt;&lt;strong&gt;以下列出 v1.35 发布后进入 Beta 阶段的一些改进。&lt;/strong&gt;&lt;/p&gt;
&lt;!--
### Expose node topology labels via Downward API
--&gt;
&lt;h3 id=&#34;expose-node-topology-labels-via-downward-api&#34;&gt;通过 Downward API 暴露节点拓扑标签  &lt;/h3&gt;
&lt;!--
Accessing node topology information, such as region and zone, from within a Pod has typically required querying the Kubernetes API server. While functional, this approach creates complexity and security risks by necessitating broad RBAC permissions or sidecar containers just to retrieve infrastructure metadata. Kubernetes v1.35 promotes the capability to expose node topology labels directly via the Downward API to beta. 
--&gt;
&lt;p&gt;过去，要在 Pod 内访问节点拓扑信息（例如区域与可用区），通常需要查询 Kubernetes API 服务器。
这种做法虽然可行，但为了获取基础设施元数据，往往需要授予较宽泛的 RBAC 权限，
或引入边车容器，从而带来复杂度与安全风险。
Kubernetes v1.35 将“通过 Downward API 直接暴露节点拓扑标签”的能力提升为 Beta。&lt;/p&gt;
&lt;!--
The `kubelet` can now inject standard topology labels, such as `topology.kubernetes.io/zone` and `topology.kubernetes.io/region`, into Pods as environment variables or projected volume files. The primary benefit is a safer and more efficient way for workloads to be topology-aware. This allows applications to natively adapt to their availability zone or region without dependencies on the API server, strengthening security by upholding the principle of least privilege and simplifying cluster configuration. 
--&gt;
&lt;p&gt;现在，&lt;code&gt;kubelet&lt;/code&gt; 可以将标准拓扑标签（例如 &lt;code&gt;topology.kubernetes.io/zone&lt;/code&gt;
与 &lt;code&gt;topology.kubernetes.io/region&lt;/code&gt;）注入到 Pod 中，
以环境变量或投射卷文件（projected volume files）的形式呈现。
其主要收益是让工作负载以更安全、更高效的方式具备拓扑感知能力。
应用可以在不依赖 API 服务器的情况下原生适配其所在可用区或区域，
通过坚持最小特权原则来增强安全性，并简化集群配置。&lt;/p&gt;
&lt;!--
**Note:** Kubernetes now injects available topology labels to every Pod so that they can be used as inputs to the [downward API](/docs/concepts/workloads/pods/downward-api/). With the v1.35 upgrade, most cluster administrators will see several new labels added to each Pod; this is expected as part of the design.
--&gt;
&lt;p&gt;&lt;strong&gt;说明：&lt;/strong&gt; Kubernetes 现在会为每个 Pod 注入可用的拓扑标签，
使其可以作为 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/pods/downward-api/&#34;&gt;Downward API&lt;/a&gt; 的输入。
升级到 v1.35 后，大多数集群管理员会看到每个 Pod 新增了若干标签；
这是设计的一部分，属于预期行为。&lt;/p&gt;
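&lt;p&gt;一个示意性 Pod 片段（容器镜像为假设值），通过 Downward API 把节点可用区标签注入为环境变量：&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: topo-aware
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10   # 示例镜像（假设值）
    env:
    - name: NODE_ZONE
      valueFrom:
        fieldRef:
          fieldPath: metadata.labels['topology.kubernetes.io/zone']
```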
&lt;!--
This work was done as part of [KEP #4742](https://kep.k8s.io/4742) led by SIG Node.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/4742&#34;&gt;KEP #4742&lt;/a&gt; 的一部分，由 SIG Node 牵头完成。&lt;/p&gt;
&lt;!--
### Native support for storage version migration
--&gt;
&lt;h3 id=&#34;native-support-for-storage-version-migration&#34;&gt;存储版本迁移的原生支持  &lt;/h3&gt;
&lt;!--
In Kubernetes v1.35, the native support for storage version migration graduates to beta and is enabled by default. This move integrates the migration logic directly into the core Kubernetes control plane (&#34;in-tree&#34;), eliminating the dependency on external tools.
--&gt;
&lt;p&gt;在 Kubernetes v1.35 中，存储版本迁移的原生支持升级为 Beta 并默认启用。
这一改动将迁移逻辑直接集成到 Kubernetes 核心控制平面（in-tree）中，
从而消除对外部工具的依赖。&lt;/p&gt;
&lt;!--
Historically, administrators relied on manual &#34;read/write loops&#34;—often piping `kubectl get` into `kubectl replace`—to update schemas or re-encrypt data at rest. This method was inefficient and prone to conflicts, especially for large resources like Secrets. With this release, the built-in controller automatically handles update conflicts and consistency tokens, providing a safe, streamlined, and reliable way to ensure stored data remains current with minimal operational overhead.
--&gt;
&lt;p&gt;在过去，管理员依赖手工的“读/写循环”（read/write loops），
常见做法是把 &lt;code&gt;kubectl get&lt;/code&gt; 的输出通过管道传给 &lt;code&gt;kubectl replace&lt;/code&gt;，
用来更新资源的模式（Schema）或重新加密静态数据。
这种方式效率低且容易产生冲突，尤其是对 Secret 这类较大的资源更是如此。
在本次发布中，内置控制器会自动处理更新冲突与一致性令牌，
以更安全、简化且可靠的方式确保存储数据保持最新，并将运维开销降到最低。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #4192](https://kep.k8s.io/4192) led by SIG API Machinery.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/4192&#34;&gt;KEP #4192&lt;/a&gt; 的一部分，由 SIG API Machinery 牵头完成。&lt;/p&gt;
&lt;!--
### Mutable Volume attach limits
--&gt;
&lt;h3 id=&#34;mutable-volume-attach-limits&#34;&gt;可变更的卷挂接上限  &lt;/h3&gt;
&lt;!--
A CSI (Container Storage Interface) driver is a Kubernetes plugin that provides a consistent way for storage systems to be exposed to containerized workloads. The `CSINode` object records details about all CSI drivers installed on a node. However, a mismatch can arise between the reported and actual attachment capacity on nodes. When volume slots are consumed after a CSI driver starts up, the `kube-scheduler` may assign stateful pods to nodes without sufficient capacity, ultimately getting stuck in a `ContainerCreating` state.
--&gt;
&lt;p&gt;CSI（Container Storage Interface）驱动是 Kubernetes 插件，
为存储系统向容器化工作负载暴露能力提供一致的方式。
&lt;code&gt;CSINode&lt;/code&gt; 对象会记录节点上安装的所有 CSI 驱动的详细信息。
不过，节点上报告的挂接容量与实际挂接容量可能出现不一致：
当 CSI 驱动启动后卷槽位被消耗时，&lt;code&gt;kube-scheduler&lt;/code&gt; 可能把有状态 Pod
调度到挂接容量不足的节点上，最终卡在 &lt;code&gt;ContainerCreating&lt;/code&gt; 状态。&lt;/p&gt;
&lt;!--
Kubernetes v1.35 makes `CSINode.spec.drivers[*].allocatable.count` mutable so that a node’s available volume attachment capacity can be updated dynamically. It also allows CSI drivers to control how frequently the `allocatable.count` value is updated on all nodes by introducing a configurable refresh interval, defined through the `CSIDriver` object. Additionally, it automatically updates `CSINode.spec.drivers[*].allocatable.count` on detecting a failure in volume attachment due to insufficient capacity. Although this feature graduated to beta in v1.34 with the feature flag `MutableCSINodeAllocatableCount` disabled by default, it remains in beta for v1.35 to allow time for feedback, but the feature flag is enabled by default.
--&gt;
&lt;p&gt;Kubernetes v1.35 使 &lt;code&gt;CSINode.spec.drivers[*].allocatable.count&lt;/code&gt; 可变更，
以便动态更新节点可用的卷挂接容量。它还通过 &lt;code&gt;CSIDriver&lt;/code&gt; 对象引入可配置的刷新间隔，
允许 CSI 驱动控制在所有节点上更新 &lt;code&gt;allocatable.count&lt;/code&gt; 值的频率。
此外，当检测到因容量不足导致的卷挂接失败时，
它会自动更新 &lt;code&gt;CSINode.spec.drivers[*].allocatable.count&lt;/code&gt;。
尽管该特性在 v1.34 中已升级为 Beta，
但当时特性门控 &lt;code&gt;MutableCSINodeAllocatableCount&lt;/code&gt; 默认关闭；
在 v1.35 中它仍处于 Beta，以便留出反馈时间，同时该特性门控默认启用。&lt;/p&gt;
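&lt;p&gt;例如，CSI 驱动可以在其 &lt;code&gt;CSIDriver&lt;/code&gt; 对象中声明刷新间隔。
以下片段仅为示意（驱动名称为假设值，字段名取自 KEP #4876，请以所用版本的 API 文档为准）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.vendor.com    # 示例驱动名
spec:
  attachRequired: true
  # 每 60 秒刷新一次各节点的 allocatable.count
  nodeAllocatableUpdatePeriodSeconds: 60
&lt;/code&gt;&lt;/pre&gt;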
&lt;!--
This work was done as part of [KEP #4876](https://kep.k8s.io/4876) led by SIG Storage.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/4876&#34;&gt;KEP #4876&lt;/a&gt; 的一部分，由 SIG Storage 牵头完成。&lt;/p&gt;
&lt;!--
### Opportunistic batching
--&gt;
&lt;h3 id=&#34;opportunistic-batching&#34;&gt;机会式批处理  &lt;/h3&gt;
&lt;!--
Historically, the Kubernetes scheduler processes pods sequentially with time complexity of `O(num pods × num nodes)`, which can result in redundant computation for compatible pods. This KEP introduces an opportunistic batching mechanism that aims to improve performance by identifying such compatible Pods via `Pod scheduling signature` and batching them together, allowing shared filtering and scoring results across them.
--&gt;
&lt;p&gt;在过去，Kubernetes 调度器按顺序处理 Pod，其时间复杂度为 &lt;code&gt;O(Pod 个数 × 节点个数)&lt;/code&gt;，
这会导致对“可兼容 Pod”执行重复计算。此 KEP 引入一种机会式批处理机制，
旨在通过 &lt;code&gt;Pod scheduling signature&lt;/code&gt; 识别这类可兼容 Pod 并将它们批量处理，
从而在这些 Pod 之间共享过滤与打分结果以提升性能。&lt;/p&gt;
&lt;!--
The pod scheduling signature ensures that two pods with the same signature are “the same” from a scheduling perspective. It takes into account not only the pod and node attributes, but also the other pods in the system and global data about the pod placement. This means that any pod with the given signature will get the same scores/feasibility results from any arbitrary set of nodes.
--&gt;
&lt;p&gt;&lt;strong&gt;Pod 调度签名（Pod Scheduling Signature）&lt;/strong&gt;机制确保从调度视角看，具有相同签名的两个 Pod 是“相同的”。
它不仅会考虑 Pod 与节点属性，还会纳入系统中的其他 Pod 以及有关放置的全局数据。
这意味着：具有给定签名的任意 Pod，在任意一组节点上都会得到相同的打分/可行性判断结果。&lt;/p&gt;
&lt;!--
The batching mechanism consists of two operations that can be invoked whenever needed - *create* and *nominate*. Create leads to the creation of a new set of batch information from the scheduling results of Pods that have a valid signature. Nominate uses the batching information from create to set the nominated node name from a new Pod whose signature matches the canonical Pod’s signature.
--&gt;
&lt;p&gt;该批处理机制包含两个可按需调用的操作：&lt;em&gt;create&lt;/em&gt; 与 &lt;em&gt;nominate&lt;/em&gt;。
create 会基于具有有效签名的 Pod 的调度结果，创建一组新的批处理信息。
nominate 会使用 create 生成的批处理信息，
为一个新 Pod（其签名与规范 Pod 的签名一致）设置提名的节点名称。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #5598](https://kep.k8s.io/5598) led by SIG Scheduling.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/5598&#34;&gt;KEP #5598&lt;/a&gt; 的一部分，由 SIG Scheduling 牵头完成。&lt;/p&gt;
&lt;!--
### `maxUnavailable` for StatefulSets
--&gt;
&lt;h3 id=&#34;maxunavailable-for-statefulsets&#34;&gt;StatefulSet 的 &lt;code&gt;maxUnavailable&lt;/code&gt;  &lt;/h3&gt;
&lt;!--
A StatefulSet runs a group of Pods and maintains a sticky identity for each of those Pods. This is critical for stateful workloads requiring stable network identifiers or persistent storage. When a StatefulSet&#39;s `.spec.updateStrategy.&lt;type&gt;` is set to `RollingUpdate`, the StatefulSet controller will delete and recreate each Pod in the StatefulSet. It will proceed in the same order as Pod termination (from the largest ordinal to the smallest), updating each Pod one at a time.
--&gt;
&lt;p&gt;StatefulSet 运行一组 Pod，并为其中每个 Pod 维护粘性身份（Sticky Identity）。
这对需要稳定网络标识符或持久存储的有状态工作负载至关重要。
当 StatefulSet 的 &lt;code&gt;.spec.updateStrategy.&amp;lt;type&amp;gt;&lt;/code&gt; 设置为 &lt;code&gt;RollingUpdate&lt;/code&gt; 时，
StatefulSet 控制器会删除并重建 StatefulSet 中的每个 Pod。
它会按 Pod 终止的顺序（从最大序号到最小序号）推进，一次只更新一个 Pod。&lt;/p&gt;
&lt;!--
Kubernetes v1.24 added a new **alpha** field to a StatefulSet&#39;s `rollingUpdate` configuration settings, called `maxUnavailable`. That field wasn&#39;t part of the Kubernetes API unless your cluster administrator explicitly opted in.
In Kubernetes v1.35 that field is beta and is available by default. You can use it to define the maximum number of pods that can be unavailable during an update. This setting is most effective in combination with `.spec.podManagementPolicy` set to Parallel.  You can set `maxUnavailable` as either a positive number (example: 2) or a percentage of the desired number of Pods (example: 10%). If this field is not specified, it will default to 1, to maintain the previous behavior of only updating one Pod at a time. This improvement allows stateful applications (that can tolerate more than one Pod being down) to finish updating faster.
--&gt;
&lt;p&gt;Kubernetes v1.24 在 StatefulSet 的 &lt;code&gt;rollingUpdate&lt;/code&gt; 配置中新增了一个 &lt;strong&gt;Alpha&lt;/strong&gt; 字段
&lt;code&gt;maxUnavailable&lt;/code&gt;，除非你的集群管理员显式选择启用，
否则该字段不会出现在 Kubernetes API 中。
在 Kubernetes v1.35 中，该字段升级为 Beta 且默认可用。
你可以用它定义更新期间最多允许不可用的 Pod 数量。
该设置与将 &lt;code&gt;.spec.podManagementPolicy&lt;/code&gt; 设为 Parallel 组合使用时最有效。
你可以把 &lt;code&gt;maxUnavailable&lt;/code&gt; 设置为一个正整数（例如：2），
或设置为期望 Pod 数量的百分比（例如：10%）。
如果未指定该字段，它默认为 1，以保持此前“一次只更新一个 Pod”的行为。
这一改进使有状态应用（可容忍多个 Pod 同时不可用）能够更快完成更新。&lt;/p&gt;
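&lt;p&gt;例如，下面的 StatefulSet 片段（仅展示相关字段）允许更新期间最多有 2 个 Pod 不可用：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;spec:
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2    # 也可以使用百分比，例如 &#34;10%&#34;
&lt;/code&gt;&lt;/pre&gt;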
&lt;!--
This work was done as part of [KEP #961](https://kep.k8s.io/961) led by SIG Apps.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/961&#34;&gt;KEP #961&lt;/a&gt; 的一部分，由 SIG Apps 牵头完成。&lt;/p&gt;
&lt;!--
### Configurable credential plugin policy in `kuberc`
--&gt;
&lt;h3 id=&#34;configurable-credential-plugin-policy-in-kuberc&#34;&gt;&lt;code&gt;kuberc&lt;/code&gt; 中可配置的凭据插件策略  &lt;/h3&gt;
&lt;!--
The optional [`kuberc` file](/docs/reference/kubectl/kuberc/) is a way to separate server configurations and cluster credentials from user preferences without disrupting already running CI pipelines with unexpected outputs.
--&gt;
&lt;p&gt;可选的 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/kubectl/kuberc/&#34;&gt;&lt;code&gt;kuberc&lt;/code&gt; 文件&lt;/a&gt;
用于将服务器配置和集群凭据与用户偏好相分离，而不会因意外输出而打断已经在运行的 CI 流水线。&lt;/p&gt;
&lt;!--
As part of the v1.35 release, `kuberc` gains additional functionality which allows users to configure credential plugin policy. This change introduces two fields `credentialPluginPolicy`, which allows or denies all plugins, and allows specifying a list of allowed plugins using `credentialPluginAllowlist`.
--&gt;
&lt;p&gt;作为 v1.35 发布的一部分，&lt;code&gt;kuberc&lt;/code&gt; 增加了允许用户配置凭据插件策略的能力。
此变更引入两个字段：&lt;code&gt;credentialPluginPolicy&lt;/code&gt; 用于允许或拒绝所有插件，
而 &lt;code&gt;credentialPluginAllowlist&lt;/code&gt; 则用于指定获准使用的插件列表。&lt;/p&gt;
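&lt;p&gt;一个可能的 &lt;code&gt;kuberc&lt;/code&gt; 片段大致如下（仅为示意，插件名为假设值，
字段取值与结构请以 kubectl 正式文档为准）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: kubectl.config.k8s.io/v1beta1
kind: Preference
# 仅允许列表中列出的凭据插件
credentialPluginPolicy: Allowlist
credentialPluginAllowlist:
  - gke-gcloud-auth-plugin
&lt;/code&gt;&lt;/pre&gt;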
&lt;!--
This work was done as part of [KEP #3104](https://kep.k8s.io/3104) as a cooperation between SIG Auth and SIG CLI.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/3104&#34;&gt;KEP #3104&lt;/a&gt; 的一部分，
由 SIG Auth 与 SIG CLI 协作完成。&lt;/p&gt;
&lt;!--
### KYAML
--&gt;
&lt;h3 id=&#34;kyaml&#34;&gt;KYAML  &lt;/h3&gt;
&lt;!--
YAML is a human-readable format of data serialization. In Kubernetes, YAML files are used to define and configure resources, such as Pods, Services, and Deployments. However, complex YAML is difficult to read. YAML&#39;s significant whitespace requires careful attention to indentation and nesting, while its optional string-quoting can lead to unexpected type coercion (see: The Norway Bug). While JSON is an alternative, it lacks support for comments and has strict requirements for trailing commas and quoted keys.
--&gt;
&lt;p&gt;YAML 是一种便于人类阅读的数据序列化格式。
在 Kubernetes 中，YAML 文件用于定义与配置资源，例如 Pod、Service 与 Deployment。
不过，复杂 YAML 很难阅读：YAML 对缩进与嵌套要求严格；
同时，其字符串加引号可选的特性也可能导致意外的类型强制转换（参见：The Norway Bug）。
虽然 JSON 可以作为一种替代方案，但它不支持注释，并对尾随逗号与键的引号有严格要求。&lt;/p&gt;
&lt;!--
KYAML is a safer and less ambiguous subset of YAML designed specifically for Kubernetes. Introduced as an opt-in alpha feature in v1.34, this feature graduated to beta in Kubernetes v1.35 and has been enabled by default. It can be disabled by setting the environment variable `KUBECTL_KYAML=false`. 
--&gt;
&lt;p&gt;KYAML 是专为 Kubernetes 设计的、更安全且更少歧义的 YAML 子集。
它在 v1.34 作为可选的 Alpha 特性引入，
并在 Kubernetes v1.35 升级为 Beta 且默认启用。
你可以通过设置环境变量 &lt;code&gt;KUBECTL_KYAML=false&lt;/code&gt; 来禁用它。&lt;/p&gt;
&lt;!--
KYAML addresses challenges pertaining to both YAML and JSON. All KYAML files are also valid YAML files. This means you can write KYAML and pass it as an input to any version of kubectl. This also means that you don’t need to write in strict KYAML for the input to be parsed.
--&gt;
&lt;p&gt;KYAML 旨在解决 YAML 与 JSON 的一些共性挑战。
所有 KYAML 文件也都是合法的 YAML 文件，
这意味着你可以编写 KYAML 并将其作为输入提供给任意版本的 kubectl。
这也意味着，即使输入并非严格 KYAML，也仍然可以被解析。&lt;/p&gt;
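&lt;p&gt;下面用一个 ConfigMap 展示 KYAML 的大致风格：字符串始终加引号、
使用花括号表达嵌套、允许尾随逗号并保留注释（示例内容为假设值）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;{
  apiVersion: &#34;v1&#34;,
  kind: &#34;ConfigMap&#34;,
  metadata: {
    name: &#34;example-config&#34;,
  },
  data: {
    # 始终加引号可以避免 “Norway Bug” 式的类型强制转换
    country: &#34;NO&#34;,
  },
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;你也可以通过 &lt;code&gt;kubectl get ... -o kyaml&lt;/code&gt; 以 KYAML 格式输出对象。&lt;/p&gt;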
&lt;!--
This work was done as part of [KEP #5295](https://kep.k8s.io/5295) led by SIG CLI.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/5295&#34;&gt;KEP #5295&lt;/a&gt; 的一部分，由 SIG CLI 牵头完成。&lt;/p&gt;
&lt;!--
### Configurable tolerance for HorizontalPodAutoscalers
--&gt;
&lt;h3 id=&#34;configurable-tolerance-for-horizontalpodautoscalers&#34;&gt;可配置的 HorizontalPodAutoscalers 容忍度  &lt;/h3&gt;
&lt;!--
The Horizontal Pod Autoscaler (HPA) has historically relied on a fixed, global 10% tolerance for scaling actions. A drawback of this hardcoded value was that workloads requiring high sensitivity, such as those needing to scale on a 5% load increase, were often blocked from scaling, while others might oscillate unnecessarily.
--&gt;
&lt;p&gt;水平 Pod 自动扩缩器（Horizontal Pod Autoscaler，HPA）长期依赖固定的全局 10% 容忍度来执行扩缩容。
这一硬编码值的缺点是：对需要高灵敏度的工作负载（例如希望在负载增加 5% 时就扩容）不够友好，
这些工作负载常常无法触发扩缩容；而另一些工作负载则可能产生不必要的振荡。&lt;/p&gt;
&lt;!--
With Kubernetes v1.35, the configurable tolerance feature graduates to beta and is enabled by default. This enhancement allows users to define a custom tolerance window on a per-resource basis within the HPA `behavior` field. By setting a specific tolerance (e.g., lowering it to 0.05 for 5%), operators gain precise control over autoscaling sensitivity, ensuring that critical workloads react quickly to small metric changes, without requiring cluster-wide configuration adjustments.
--&gt;
&lt;p&gt;在 Kubernetes v1.35 中，“可配置容忍度”特性升级为 Beta 并默认启用。
该增强允许用户在 HPA 的 &lt;code&gt;behavior&lt;/code&gt; 字段中，按资源粒度定义自定义容忍窗口。
通过设置特定容忍度（例如将其降低到 0.05 来表示 5%），运维人员可以更精确地控制自动扩缩容灵敏度，
确保关键工作负载能对小幅指标变化快速响应，而无需进行集群范围的配置调整。&lt;/p&gt;
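&lt;p&gt;例如，下面的 HPA 片段（仅展示相关字段）把扩容方向的容忍度降低到 5%：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;spec:
  behavior:
    scaleUp:
      tolerance: 0.05    # 5%，替代全局默认的 10%
&lt;/code&gt;&lt;/pre&gt;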
&lt;!--
This work was done as part of [KEP #4951](https://kep.k8s.io/4951) led by SIG Autoscaling.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/4951&#34;&gt;KEP #4951&lt;/a&gt; 的一部分，由 SIG Autoscaling 牵头完成。&lt;/p&gt;
&lt;!--
### Support for user namespaces in Pods
--&gt;
&lt;h3 id=&#34;support-for-user-namespaces-in-pods&#34;&gt;Pod 中的用户命名空间支持  &lt;/h3&gt;
&lt;!--
Kubernetes is adding support for user namespaces, allowing pods to run with isolated user and group ID mappings instead of sharing host IDs. This means containers can operate as root internally while actually being mapped to an unprivileged user on the host, reducing the risk of privilege escalation in the event of a compromise. The feature improves pod-level security and makes it safer to run workloads that need root inside the container. Over time, support has expanded to both stateless and stateful Pods through id-mapped mounts.
--&gt;
&lt;p&gt;Kubernetes 增加了对用户命名空间（user namespaces）的支持，
使 Pod 可以使用相互隔离的用户/组 ID 映射运行，而不是共享主机上的 ID。
这意味着容器在内部可以以 root 身份运行，
但在主机上实际映射为一个非特权用户，从而在发生入侵时降低提权风险。
该特性提升了 Pod 级别的安全性，使需要在容器内使用 root 的工作负载更安全。
随着时间推移，该能力也通过 ID 映射挂载（id-mapped mounts）扩展到无状态与有状态 Pod。&lt;/p&gt;
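&lt;p&gt;要为 Pod 启用用户命名空间，可以将 &lt;code&gt;hostUsers&lt;/code&gt; 设为 false
（以下清单中的名称与镜像为示例值）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: v1
kind: Pod
metadata:
  name: userns-demo    # 示例名称
spec:
  hostUsers: false     # 容器内的 root 将映射为主机上的非特权用户
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
&lt;/code&gt;&lt;/pre&gt;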
&lt;!--
This work was done as part of [KEP #127](https://kep.k8s.io/127) led by SIG Node.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/127&#34;&gt;KEP #127&lt;/a&gt; 的一部分，由 SIG Node 牵头完成。&lt;/p&gt;
&lt;!--
### VolumeSource: OCI artifact and/or image
--&gt;
&lt;h3 id=&#34;volumesource-oci-artifact-andor-image&#34;&gt;VolumeSource：OCI 工件和/或镜像  &lt;/h3&gt;
&lt;!--
When creating a Pod, you often need to provide data, binaries, or configuration files for your containers. This meant including the content into the main container image or using a custom init container to download and unpack files into an `emptyDir`. Both these approaches are still valid. Kubernetes v1.31 added support for the `image` volume type allowing Pods to declaratively pull and unpack OCI container image artifacts into a volume. This lets you package and deliver data-only artifacts such as configs, binaries, or machine learning models using standard OCI registry tools.
--&gt;
&lt;p&gt;在创建 Pod 时，你常常需要为容器提供数据、二进制文件或配置文件。
这通常意味着要么把内容打进主容器镜像，要么使用自定义 Init 容器下载并解包到 &lt;code&gt;emptyDir&lt;/code&gt; 中。
这两种方式仍然有效。Kubernetes v1.31 增加了对 &lt;code&gt;image&lt;/code&gt; 卷类型的支持，
允许 Pod 以声明的方式拉取并将 OCI 容器镜像工件解包到卷中。
这使你可以使用标准 OCI 镜像仓库（registry）工具来打包与分发纯数据工件，例如配置、二进制文件或机器学习模型。&lt;/p&gt;
&lt;!--
With this feature, you can fully separate your data from your container image and remove the need for extra init containers or startup scripts. The image volume type has been in beta since v1.33 and is enabled by default in v1.35. Please note that using this feature requires a compatible container runtime, such as containerd v2.1 or later.
--&gt;
&lt;p&gt;借助该特性，你可以将数据与容器镜像彻底分离，并去除额外 Init 容器或启动脚本的需求。
image 卷类型自 v1.33 起处于 Beta，并在 v1.35 中默认启用。
请注意，使用该特性需要兼容的容器运行时，例如 containerd v2.1 或更高版本。&lt;/p&gt;
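&lt;p&gt;下面是一个使用 image 卷的示意清单（Pod 名称、镜像与工件引用均为假设值）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: v1
kind: Pod
metadata:
  name: image-volume-demo    # 示例名称
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    volumeMounts:
    - name: model
      mountPath: /data
      readOnly: true
  volumes:
  - name: model
    image:
      reference: example.registry.io/models/demo:latest    # 示例 OCI 工件
      pullPolicy: IfNotPresent
&lt;/code&gt;&lt;/pre&gt;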
&lt;!--
This work was done as part of [KEP #4639](https://kep.k8s.io/4639) led by SIG Node.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/4639&#34;&gt;KEP #4639&lt;/a&gt; 的一部分，由 SIG Node 牵头完成。&lt;/p&gt;
&lt;!--
### Enforced `kubelet` credential verification for cached images
--&gt;
&lt;h3 id=&#34;enforced-kubelet-credential-verification-for-cached-images&#34;&gt;对缓存镜像强制执行 &lt;code&gt;kubelet&lt;/code&gt; 凭据校验  &lt;/h3&gt;
&lt;!--
The `imagePullPolicy: IfNotPresent` setting currently allows a Pod to use a container image that is already cached on a node, even if the Pod itself does not possess the credentials to pull that image. A drawback of this behavior is that it creates a security vulnerability in multi-tenant clusters: if a Pod with valid credentials pulls a sensitive private image to a node, a subsequent unauthorized Pod on the same node can access that image simply by relying on the local cache.
--&gt;
&lt;p&gt;当前，&lt;code&gt;imagePullPolicy: IfNotPresent&lt;/code&gt; 允许 Pod 使用节点上已经缓存的容器镜像，
即使 Pod 本身并不具备拉取该镜像所需的凭据。这种行为在多租户集群中会带来安全漏洞：
如果某个具备有效凭据的 Pod 把敏感的私有镜像拉取到某节点上，
同一节点上后续的未授权 Pod 只需依赖本地缓存就能访问该镜像。&lt;/p&gt;
&lt;!--
This KEP introduces a mechanism where the `kubelet` enforces credential verification for cached images. Before allowing a Pod to use a locally cached image, the `kubelet` checks if the Pod has the valid credentials to pull it. This ensures that only authorized workloads can use private images, regardless of whether they are already present on the node, significantly hardening the security posture for shared clusters.
--&gt;
&lt;p&gt;此 KEP 引入一种机制：由 &lt;code&gt;kubelet&lt;/code&gt; 对缓存镜像强制执行凭据校验。
在允许 Pod 使用本地缓存镜像之前，&lt;code&gt;kubelet&lt;/code&gt; 会检查 Pod 是否具备拉取该镜像的有效凭据。
这确保只有经授权的工作负载才能使用私有镜像，
无论该镜像是否已经存在于节点上，从而显著增强共享集群的安全性。&lt;/p&gt;
&lt;!--
In Kubernetes v1.35, this feature has graduated to beta and is enabled by default. Users can still disable it by setting the `KubeletEnsureSecretPulledImages` feature gate to false. Additionally, the `imagePullCredentialsVerificationPolicy` flag allows operators to configure the desired security level, ranging from a mode that prioritizes backward compatibility to a strict enforcement mode that offers maximum security.
--&gt;
&lt;p&gt;在 Kubernetes v1.35 中，该特性升级为 Beta 并默认启用。
用户仍可将 &lt;code&gt;KubeletEnsureSecretPulledImages&lt;/code&gt; 特性门控设为 false 来禁用它。
此外，&lt;code&gt;imagePullCredentialsVerificationPolicy&lt;/code&gt; 参数允许运维人员配置期望的安全级别，
从优先保证向后兼容的模式到提供最高安全性的严格强制模式不等。&lt;/p&gt;
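&lt;p&gt;该策略可以在 kubelet 配置中设置。以下片段仅为示意
（取值名称请以 KEP #2535 及 kubelet 配置参考为准）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# 对预加载到节点上的镜像不做校验，其余缓存镜像均需校验凭据
imagePullCredentialsVerificationPolicy: NeverVerifyPreloadedImages
&lt;/code&gt;&lt;/pre&gt;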
&lt;!--
This work was done as part of [KEP #2535](https://kep.k8s.io/2535) led by SIG Node.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/2535&#34;&gt;KEP #2535&lt;/a&gt; 的一部分，由 SIG Node 牵头完成。&lt;/p&gt;
&lt;!--
### Fine-grained Container restart rules
--&gt;
&lt;h3 id=&#34;fine-grained-container-restart-rules&#34;&gt;细粒度的容器重启规则  &lt;/h3&gt;
&lt;!--
Historically, the `restartPolicy` field was defined strictly at the Pod level, forcing the same behavior on all containers within a Pod. A drawback of this global setting was the lack of granularity for complex workloads, such as AI/ML training jobs. These often required `restartPolicy: Never` for the Pod to manage job completion, yet individual containers would benefit from in-place restarts for specific, retriable errors (like network glitches or GPU init failures).
--&gt;
&lt;p&gt;在过去，&lt;code&gt;restartPolicy&lt;/code&gt; 字段只能在 Pod 级别定义，从而强制 Pod 内所有容器采用相同行为。
这一全局设置对复杂工作负载（例如 AI/ML 训练作业）缺乏足够的粒度。
这类作业往往需要 Pod 使用 &lt;code&gt;restartPolicy: Never&lt;/code&gt; 以管理作业完成，
但某些容器仍希望能针对可重试的特定错误（如网络抖动或 GPU 初始化失败）执行原地重启。&lt;/p&gt;
&lt;!--
Kubernetes v1.35 addresses this by enabling `restartPolicy` and `restartPolicyRules` within the container API itself. This allows users to define restart strategies for individual regular and init containers that operate independently of the Pod&#39;s overall policy. For example, a container can now be configured to restart automatically only if it exits with a specific error code, avoiding the expensive overhead of rescheduling the entire Pod for a transient failure.
--&gt;
&lt;p&gt;Kubernetes v1.35 通过在容器 API 本身中启用 &lt;code&gt;restartPolicy&lt;/code&gt; 与 &lt;code&gt;restartPolicyRules&lt;/code&gt;
来解决这一问题。这允许用户为单个普通容器与 Init 容器定义重启策略，
并使其与 Pod 的整体策略相互独立。例如，你可以将容器配置为仅在以特定错误码退出时才自动重启，
从而避免因短暂故障而重调度整个 Pod 的昂贵开销。&lt;/p&gt;
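&lt;p&gt;例如，下面的 Pod 片段让容器仅在以退出码 42 退出时原地重启，
其余情况遵循 Pod 级的 &lt;code&gt;Never&lt;/code&gt; 策略（镜像与退出码为示例值）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;spec:
  restartPolicy: Never       # Pod 级策略
  containers:
  - name: trainer
    image: example.registry.io/trainer:latest    # 示例镜像
    restartPolicyRules:
    - action: Restart
      exitCodes:
        operator: In
        values: [42]         # 仅对该可重试错误码执行原地重启
&lt;/code&gt;&lt;/pre&gt;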
&lt;!--
In this release, the feature has graduated to beta and is enabled by default. Users can immediately leverage `restartPolicyRules` in their container specifications to optimize recovery times and resource utilization for long-running workloads, without altering the broader lifecycle logic of their Pods.
--&gt;
&lt;p&gt;在本次发布中，该特性升级为 Beta 并默认启用。用户可以立即在容器规约中使用 &lt;code&gt;restartPolicyRules&lt;/code&gt;，
为长时间运行的工作负载优化恢复时间与资源利用率，而无需改变 Pod 更宏观的生命周期逻辑。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #5307](https://kep.k8s.io/5307) led by SIG Node.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/5307&#34;&gt;KEP #5307&lt;/a&gt; 的一部分，由 SIG Node 牵头完成。&lt;/p&gt;
&lt;!--
### CSI driver opt-in for service account tokens via secrets field
--&gt;
&lt;h3 id=&#34;csi-driver-opt-in-for-service-account-tokens-via-secrets-field&#34;&gt;CSI 驱动可选择通过 secrets 字段获取 ServiceAccount 令牌  &lt;/h3&gt;
&lt;!--
Providing ServiceAccount tokens to Container Storage Interface (CSI) drivers has traditionally relied on injecting them into the `volume_context` field. This approach presents a significant security risk because `volume_context` is intended for non-sensitive configuration data and is frequently logged in plain text by drivers and debugging tools, potentially leaking credentials.
--&gt;
&lt;p&gt;在向 CSI（Container Storage Interface）驱动提供 ServiceAccount 令牌时，
传统上依赖把令牌注入到 &lt;code&gt;volume_context&lt;/code&gt; 字段中。
这种方式存在显著安全风险：&lt;code&gt;volume_context&lt;/code&gt; 主要用于非敏感配置数据，
并且常被驱动与调试工具以明文形式记录到日志中，从而可能泄露凭据。&lt;/p&gt;
&lt;!--
Kubernetes v1.35 introduces an opt-in mechanism for CSI drivers to receive ServiceAccount tokens via the dedicated secrets field in the NodePublishVolume request. Drivers can now enable this behavior by setting the `serviceAccountTokenInSecrets` field to true in their CSIDriver object, instructing the `kubelet` to populate the token securely.
--&gt;
&lt;p&gt;Kubernetes v1.35 引入一套可选择启用的机制，
让 CSI 驱动通过 NodePublishVolume 请求中的专用 secrets 字段获取 ServiceAccount 令牌。
驱动现在可以在其 CSIDriver 对象中将 &lt;code&gt;serviceAccountTokenInSecrets&lt;/code&gt; 设为 true 来启用此行为，
从而指示 &lt;code&gt;kubelet&lt;/code&gt; 以更安全的方式填充该令牌。&lt;/p&gt;
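&lt;p&gt;驱动侧的配置大致如下（驱动名与受众为示例值）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: secrets-store.csi.example.com    # 示例驱动名
spec:
  tokenRequests:
  - audience: example-audience
  # 令牌将通过 NodePublishVolume 的 secrets 字段传递，而非 volume_context
  serviceAccountTokenInSecrets: true
&lt;/code&gt;&lt;/pre&gt;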
&lt;!--
The primary benefit is the prevention of accidental credential exposure in logs and error messages. This change ensures that sensitive workload identities are handled via the appropriate secure channels, aligning with best practices for secret management while maintaining backward compatibility for existing drivers.
--&gt;
&lt;p&gt;其主要收益是防止凭据在日志与错误信息中被意外暴露。
这一变更确保敏感的工作负载身份通过合适的安全通道处理，
在保持对既有驱动向后兼容的同时，也更符合 Secret 管理的最佳实践。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #5538](https://kep.k8s.io/5538) led by SIG Auth in cooperation with SIG Storage.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/5538&#34;&gt;KEP #5538&lt;/a&gt; 的一部分，
由 SIG Auth 牵头并与 SIG Storage 协作完成。&lt;/p&gt;
&lt;!--
### Deployment status: count of terminating replicas
--&gt;
&lt;h3 id=&#34;deployment-status-count-of-terminating-replicas&#34;&gt;Deployment 状态：正在终止的副本计数  &lt;/h3&gt;
&lt;!--
Historically, the Deployment status provided details on available and updated replicas but lacked explicit visibility into Pods that were in the process of shutting down. A drawback of this omission was that users and controllers could not easily distinguish between a stable Deployment and one that still had Pods executing cleanup tasks or adhering to long grace periods.
--&gt;
&lt;p&gt;在过去，Deployment 状态会提供可用副本与已更新副本的详细信息，
但缺少对“正在关闭过程中的 Pod”的明确可见性。
这一缺失使用户与控制器难以区分“稳定的 Deployment”与“仍有 Pod 正在执行清理任务或处于较长优雅终止期”的 Deployment。&lt;/p&gt;
&lt;!--
Kubernetes v1.35 promotes the `terminatingReplicas` field within the Deployment status to beta. This field provides a count of Pods that have a deletion timestamp set but have not yet been removed from the system. This feature is a foundational step in a larger initiative to improve how Deployments handle Pod replacement, laying the groundwork for future policies regarding when to create new Pods during a rollout.
--&gt;
&lt;p&gt;Kubernetes v1.35 将 Deployment 状态中的 &lt;code&gt;terminatingReplicas&lt;/code&gt; 字段提升为 Beta。
该字段提供已设置删除时间戳但尚未从系统移除的 Pod 数量。该特性是一个更大计划中的基础一步，
旨在改进 Deployment 如何处理 Pod 替换，并为未来制定“在滚动发布期间何时创建新 Pod”的策略奠定基础。&lt;/p&gt;
&lt;!--
The primary benefit is improved observability for lifecycle management tools and operators. By exposing the number of terminating Pods, external systems can now make more informed decisions such as waiting for a complete shutdown before proceeding with subsequent tasks without needing to manually query and filter individual Pod lists.
--&gt;
&lt;p&gt;其主要收益是提升生命周期管理工具与运维人员的可观测性。
通过公开正在终止的 Pod 数量，可以让外部系统做出更明智的决策，
例如在继续后续任务之前等待完全关闭，而无需手工查询并筛选各个 Pod 的列表。&lt;/p&gt;
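&lt;p&gt;启用后，Deployment 状态中会出现类似下面的计数（数值仅为示意）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;status:
  replicas: 5
  updatedReplicas: 5
  availableReplicas: 4
  terminatingReplicas: 1    # 已设置删除时间戳、但尚未被移除的 Pod 数
&lt;/code&gt;&lt;/pre&gt;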
&lt;!--
This work was done as part of [KEP #3973](https://kep.k8s.io/3973) led by SIG Apps.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/3973&#34;&gt;KEP #3973&lt;/a&gt; 的一部分，由 SIG Apps 牵头完成。&lt;/p&gt;
&lt;!--
## New features in Alpha
--&gt;
&lt;h2 id=&#34;new-features-in-alpha&#34;&gt;Alpha 阶段的新特性  &lt;/h2&gt;
&lt;!--
*This is a selection of some of the improvements that are now alpha following the v1.35 release.*
--&gt;
&lt;p&gt;&lt;strong&gt;以下列出 v1.35 发布后进入 Alpha 阶段的一些改进。&lt;/strong&gt;&lt;/p&gt;
&lt;!--
### Gang scheduling support in Kubernetes
--&gt;
&lt;h3 id=&#34;gang-scheduling-support-in-kubernetes&#34;&gt;Kubernetes 中的 Gang 调度支持  &lt;/h3&gt;
&lt;!--
Scheduling interdependent workloads, such as AI/ML training jobs or HPC simulations, has traditionally been challenging because the default Kubernetes scheduler places Pods individually. This often leads to partial scheduling where some Pods start while others wait indefinitely for resources, resulting in deadlocks and wasted cluster capacity.
--&gt;
&lt;p&gt;对相互依赖的工作负载（例如 AI/ML 训练作业或 HPC 仿真）进行调度，
传统上一直很有挑战性，因为默认的 Kubernetes 调度器会逐个调度 Pod。
这常导致“部分调度”：部分 Pod 已启动，而其他 Pod 由于资源不足无限期等待，
从而引发死锁并浪费集群容量。&lt;/p&gt;
&lt;!--
Kubernetes v1.35 introduces native support for so-called &#34;gang scheduling&#34; via the new Workload API and PodGroup concept. This feature implements an &#34;all-or-nothing&#34; scheduling strategy, ensuring that a defined group of Pods is scheduled only if the cluster has sufficient resources to accommodate the entire group simultaneously.
--&gt;
&lt;p&gt;Kubernetes v1.35 通过新的 Workload API 与 PodGroup 概念，
引入对所谓成组调度（Gang Scheduling）的原生支持。
该特性实现“全有或全无”的调度策略：只有当集群有足够资源同时容纳整个 Pod 组时，才会对该组进行调度。&lt;/p&gt;
&lt;!--
The primary benefit is improved reliability and efficiency for batch and parallel workloads. By preventing partial deployments, it eliminates resource deadlocks and ensures that expensive cluster capacity is utilized only when a complete job can run, significantly optimizing the orchestration of large-scale data processing tasks.
--&gt;
&lt;p&gt;其主要收益是提升批处理与并行工作负载的可靠性与效率。通过避免部分部署，它消除了资源死锁，
并确保昂贵的集群容量只在能够运行完整作业时才会被使用，从而显著优化大规模数据处理任务的编排。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #4671](https://kep.k8s.io/4671) led by SIG Scheduling.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/4671&#34;&gt;KEP #4671&lt;/a&gt; 的一部分，由 SIG Scheduling 牵头完成。&lt;/p&gt;
&lt;!--
### Constrained impersonation
--&gt;
&lt;h3 id=&#34;constrained-impersonation&#34;&gt;受限的身份扮演（Impersonation）  &lt;/h3&gt;
&lt;!--
Historically, the `impersonate` verb in Kubernetes RBAC functioned on an all-or-nothing basis: once a user was authorized to impersonate a target identity, they gained all associated permissions. A drawback of this broad authorization was that it violated the principle of least privilege, preventing administrators from restricting impersonators to specific actions or resources.
--&gt;
&lt;p&gt;在过去，Kubernetes RBAC 中的 &lt;code&gt;impersonate&lt;/code&gt; 动词按“全有或全无”运作：
一旦用户被授权可以扮演某个目标身份，就会获得该身份所关联的全部权限。
这种宽泛授权的缺点是违背最小特权原则，使管理员难以将身份扮演者的权限限制到特定动作或特定资源上。&lt;/p&gt;
&lt;!--
Kubernetes v1.35 introduces a new alpha feature, constrained impersonation, which adds a secondary authorization check to the impersonation flow. When enabled via the `ConstrainedImpersonation` feature gate, the API server verifies not only the basic `impersonate` permission but also checks if the impersonator is authorized for the specific action using new verb prefixes (e.g., `impersonate-on:&lt;mode&gt;:&lt;verb&gt;`). This allows administrators to define fine-grained policies—such as permitting a support engineer to impersonate a cluster admin solely to view logs, without granting full administrative access.
--&gt;
&lt;p&gt;Kubernetes v1.35 引入一个新的 Alpha 特性：受限的身份扮演（Constrained Impersonation），
它在身份扮演流程中增加一次二次鉴权检查。当 &lt;code&gt;ConstrainedImpersonation&lt;/code&gt; 特性门控被启用后，
API 服务器不仅会校验基础的 &lt;code&gt;impersonate&lt;/code&gt; 权限，还会使用新的动词前缀（例如 &lt;code&gt;impersonate-on:&amp;lt;mode&amp;gt;:&amp;lt;verb&amp;gt;&lt;/code&gt;）
检查身份扮演者是否被授权执行特定动作。
这使管理员可以定义细粒度策略——例如允许支持工程师扮演集群管理员、仅用于查看日志，
而不授予完整的管理员访问权限。&lt;/p&gt;
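&lt;p&gt;一条对应的 RBAC 规则大致如下。动词写法沿用上文的
&lt;code&gt;impersonate-on:&amp;lt;mode&amp;gt;:&amp;lt;verb&amp;gt;&lt;/code&gt; 前缀模式，
具体形式请以 KEP #5284 为准（角色名、资源与动词均为示例值）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: log-viewer-impersonator    # 示例名称
rules:
# 仅允许在扮演身份时对 Pod 日志执行 get
- apiGroups: [&#34;&#34;]
  resources: [&#34;pods/log&#34;]
  verbs: [&#34;impersonate-on:serviceaccount:get&#34;]
&lt;/code&gt;&lt;/pre&gt;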
&lt;!--
This work was done as part of [KEP #5284](https://kep.k8s.io/5284) led by SIG Auth.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/5284&#34;&gt;KEP #5284&lt;/a&gt; 的一部分，由 SIG Auth 牵头完成。&lt;/p&gt;
&lt;!--
### Flagz for Kubernetes components
--&gt;
&lt;h3 id=&#34;flagz-for-kubernetes-components&#34;&gt;Kubernetes 组件的 Flagz  &lt;/h3&gt;
&lt;!--
Verifying the runtime configuration of Kubernetes components, such as the API server or `kubelet`, has traditionally required privileged access to the host node or process arguments. To address this, the `/flagz` endpoint was introduced to expose command-line options via HTTP. However, its output was initially limited to plain text, making it difficult for automated tools to parse and validate configurations reliably.
--&gt;
&lt;p&gt;在过去，要验证 Kubernetes 组件（例如 API 服务器或 &lt;code&gt;kubelet&lt;/code&gt;）的运行时配置，
通常需要对宿主机节点或进程参数具有特权访问权限。
为解决这一问题，引入了 &lt;code&gt;/flagz&lt;/code&gt; 端点，通过 HTTP 公开其命令行选项。
但其最初输出仅为纯文本，使自动化工具难以可靠地解析并校验配置。&lt;/p&gt;
&lt;!--
In Kubernetes v1.35, the `/flagz` endpoint has been enhanced to support structured, machine-readable JSON output. Authorized users can now request a versioned JSON response using standard HTTP content negotiation, while the original plain text format remains available for human inspection. This update significantly improves observability and compliance workflows, allowing external systems to programmatically audit component configurations without fragile text parsing or direct infrastructure access.
--&gt;
&lt;p&gt;在 Kubernetes v1.35 中，&lt;code&gt;/flagz&lt;/code&gt; 端点增强为支持结构化、机器可读的 JSON 输出。
经授权的用户现在可以通过标准 HTTP 内容协商请求版本化的 JSON 响应，
同时原先的纯文本格式仍保留，便于人工查看。
此更新显著改进可观测性与合规工作流，让外部系统无需脆弱的文本解析或直接基础设施访问，
即可通过编程方式审计组件配置。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #4828](https://kep.k8s.io/4828) led by SIG Instrumentation.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/4828&#34;&gt;KEP #4828&lt;/a&gt; 的一部分，由 SIG Instrumentation 牵头完成。&lt;/p&gt;
&lt;!--
### Statusz for Kubernetes components
--&gt;
&lt;h3 id=&#34;statusz-for-kubernetes-components&#34;&gt;Kubernetes 组件的 Statusz  &lt;/h3&gt;
&lt;!--
Troubleshooting Kubernetes components like the `kube-apiserver` or `kubelet` has traditionally involved parsing unstructured logs or text output, which is brittle and difficult to automate. While a basic `/statusz` endpoint existed previously, it lacked a standardized, machine-readable format, limiting its utility for external monitoring systems.
--&gt;
&lt;p&gt;传统上，排查 &lt;code&gt;kube-apiserver&lt;/code&gt; 或 &lt;code&gt;kubelet&lt;/code&gt; 等 Kubernetes 组件问题，
往往需要解析非结构化日志或文本输出，这种方式脆弱且难以自动化。
此前虽然存在基础的 &lt;code&gt;/statusz&lt;/code&gt; 端点，
但缺乏标准化、机器可读的格式，从而限制了外部监控系统的可用性。&lt;/p&gt;
&lt;!--
In Kubernetes v1.35, the `/statusz` endpoint has been enhanced to support structured, machine-readable JSON output. Authorized users can now request this format using standard HTTP content negotiation to retrieve precise status data—such as version information and health indicators—without relying on fragile text parsing. This improvement provides a reliable, consistent interface for automated debugging and observability tools across all core components.
--&gt;
&lt;p&gt;在 Kubernetes v1.35 中，&lt;code&gt;/statusz&lt;/code&gt; 端点增强为支持结构化、机器可读的 JSON 输出。
经授权的用户现在可以通过标准 HTTP 内容协商请求这一格式，
以获取精确的状态数据——例如版本信息与健康指标——而无需依赖脆弱的文本解析。
该改进为所有核心组件的自动化调试与可观测性工具提供了可靠且一致的接口。&lt;/p&gt;
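&lt;p&gt;&lt;code&gt;/statusz&lt;/code&gt; 的用法与 &lt;code&gt;/flagz&lt;/code&gt; 类似，下面是一个示意（同样假设已运行 &lt;code&gt;kubectl proxy&lt;/code&gt; 并安装了 &lt;code&gt;jq&lt;/code&gt;；响应中的具体字段名以实际输出为准）：&lt;/p&gt;

```shell
# 通过内容协商请求 JSON 格式的状态数据：
curl -s -H 'Accept: application/json' http://127.0.0.1:8001/statusz
# 配合 jq 等工具即可做自动化检查，无需解析纯文本：
curl -s -H 'Accept: application/json' http://127.0.0.1:8001/statusz | jq .
```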
&lt;!--
This work was done as part of [KEP #4827](https://kep.k8s.io/4827) led by SIG Instrumentation.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/4827&#34;&gt;KEP #4827&lt;/a&gt; 的一部分，由 SIG Instrumentation 牵头完成。&lt;/p&gt;
&lt;!--
### CCM: watch-based route controller reconciliation using informers
--&gt;
&lt;h3 id=&#34;ccm-watch-based-route-controller-reconciliation-using-informers&#34;&gt;CCM：基于 Informer 的 Watch 式路由控制器调谐  &lt;/h3&gt;
&lt;!--
Managing network routes within cloud environments has traditionally relied on the Cloud Controller Manager (CCM) periodically polling the cloud provider&#39;s API to verify and update route tables. This fixed-interval reconciliation approach can be inefficient, often generating a high volume of unnecessary API calls and introducing latency between a node state change and the corresponding route update.
--&gt;
&lt;p&gt;在云环境中管理网络路由，传统上依赖云控制器管理器（CCM）定期轮询云提供商 API 来校验并更新路由表。
这种固定间隔的调谐方式可能效率不高，
常会产生大量不必要的 API 调用，并在节点状态变化与路由更新之间引入延迟。&lt;/p&gt;
&lt;!--
For the Kubernetes v1.35 release, the cloud-controller-manager library introduces a watch-based reconciliation strategy for the route controller. Instead of relying on a timer, the controller now utilizes informers to watch for specific Node events, such as additions, deletions, or relevant field updates and triggers route synchronization only when a change actually occurs.
--&gt;
&lt;p&gt;在 Kubernetes v1.35 中，cloud-controller-manager 库为路由控制器引入基于 watch 的调谐策略。
控制器不再依赖定时器，而是利用 Informer 监听特定的 Node 事件，例如新增、删除或相关字段更新，
仅在确有变更发生时触发路由同步。&lt;/p&gt;
&lt;!--
The primary benefit is a significant reduction in cloud provider API usage, which lowers the risk of hitting rate limits and reduces operational overhead. Additionally, this event-driven model improves the responsiveness of the cluster&#39;s networking layer by ensuring that route tables are updated immediately following changes in cluster topology.
--&gt;
&lt;p&gt;其主要收益是显著减少对云提供商 API 的使用，从而降低触发速率限制的风险并减少运维开销。
此外，这种事件驱动模型通过确保路由表在集群拓扑变化后立即更新，提升了集群网络层的响应速度。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #5237](https://kep.k8s.io/5237) led by SIG Cloud Provider.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/5237&#34;&gt;KEP #5237&lt;/a&gt; 的一部分，由 SIG Cloud Provider 牵头完成。&lt;/p&gt;
&lt;!--
### Extended toleration operators for threshold-based placement
--&gt;
&lt;h3 id=&#34;extended-toleration-operators-for-threshold-based-placement&#34;&gt;用于基于阈值放置的扩展容忍度运算符  &lt;/h3&gt;
&lt;!--
Kubernetes v1.35 introduces SLA-aware scheduling by enabling workloads to express reliability requirements. The feature adds numeric comparison operators to tolerations, allowing pods to match or avoid nodes based on SLA-oriented taints such as service guarantees or fault-domain quality.
--&gt;
&lt;p&gt;Kubernetes v1.35 通过允许工作负载表达可靠性要求，引入 SLA 感知调度（SLA-aware scheduling）。
该特性为容忍度增加数值比较运算符，
让 Pod 可以依据与 SLA 相关的污点（例如服务保障或故障域质量）来匹配或避开节点。&lt;/p&gt;
&lt;!--
The primary benefit is enhancing the scheduler with more precise placement. Critical workloads can demand higher-SLA nodes, while lower priority workloads can opt into lower SLA ones. This improves utilization and reduces cost without compromising reliability.
--&gt;
&lt;p&gt;其主要收益是让调度器具备更精确的放置能力。关键工作负载可要求更高 SLA 的节点，
而低优先级工作负载则可选择使用较低 SLA 的节点。这在不牺牲可靠性的前提下提升了利用率并降低成本。&lt;/p&gt;
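&lt;p&gt;下面是一个假设性的清单示意：让 Pod 仅容忍 SLA 数值高于某个阈值的节点污点。注意其中的运算符名称（&lt;code&gt;Gt&lt;/code&gt;）与污点键均为示例，实际取值请以 KEP #5471 及正式文档为准：&lt;/p&gt;

```shell
# 通过服务端试运行校验清单（运算符名称 Gt 为假设，具体以 KEP #5471 为准）：
printf '%s\n' \
  'apiVersion: v1' \
  'kind: Pod' \
  'metadata:' \
  '  name: sla-demo' \
  'spec:' \
  '  containers:' \
  '  - name: app' \
  '    image: registry.k8s.io/pause:3.9' \
  '  tolerations:' \
  '  - key: example.com/sla' \
  '    operator: Gt' \
  '    value: "99"' \
  '    effect: NoSchedule' \
  | kubectl apply --dry-run=server -f -
```

&lt;p&gt;其中 &lt;code&gt;--dry-run=server&lt;/code&gt; 仅做服务端校验，不会真正创建 Pod。&lt;/p&gt;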
&lt;!--
This work was done as part of [KEP #5471](https://kep.k8s.io/5471) led by SIG Scheduling.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/5471&#34;&gt;KEP #5471&lt;/a&gt; 的一部分，由 SIG Scheduling 牵头完成。&lt;/p&gt;
&lt;!--
### Mutable container resources when Job is suspended
--&gt;
&lt;h3 id=&#34;mutable-container-resources-when-job-is-suspended&#34;&gt;Job 挂起时可变更的容器资源  &lt;/h3&gt;
&lt;!--
Running batch workloads often involves trial and error with resource limits. Currently, the Job specification is immutable, meaning that if a Job fails due to an Out of Memory (OOM) error or insufficient CPU, the user cannot simply adjust the resources; they must delete the Job and create a new one, losing the execution history and status.
--&gt;
&lt;p&gt;运行批处理工作负载时，经常需要对资源限制进行反复试错。
目前 Job 规约是不可变的，这意味着当 Job 因内存不足（OOM）或 CPU 不足而失败时，
用户无法直接调整资源；他们必须删除 Job 并重新创建，从而丢失执行历史与状态信息。&lt;/p&gt;
&lt;!--
Kubernetes v1.35 introduces the capability to update resource requests and limits for Jobs that are in a suspended state. Enabled via the `MutablePodResourcesForSuspendedJobs` feature gate, this enhancement allows users to pause a failing Job, modify its Pod template with appropriate resource values, and then resume execution with the corrected configuration.
--&gt;
&lt;p&gt;Kubernetes v1.35 引入一种能力：对处于挂起状态的 Job 更新资源请求与限制。
通过 &lt;code&gt;MutablePodResourcesForSuspendedJobs&lt;/code&gt; 特性门控启用后，
用户可以暂停一个失败的 Job，修改其 Pod 模板中的资源值，然后在修正配置后恢复执行。&lt;/p&gt;
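&lt;p&gt;一个可能的修复流程示意如下（需启用 &lt;code&gt;MutablePodResourcesForSuspendedJobs&lt;/code&gt; 特性门控；Job 名称、容器下标与资源数值均为示例）：&lt;/p&gt;

```shell
# 1. 挂起因 OOM 失败的 Job：
kubectl patch job my-batch-job --type merge -p '{"spec":{"suspend":true}}'
# 2. 调大 Pod 模板中的内存上限：
kubectl patch job my-batch-job --type json \
  -p '[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"4Gi"}]'
# 3. 恢复执行：
kubectl patch job my-batch-job --type merge -p '{"spec":{"suspend":false}}'
```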
&lt;!--
The primary benefit is a smoother recovery workflow for misconfigured jobs. By allowing in-place corrections during suspension, users can resolve resource bottlenecks without disrupting the Job&#39;s lifecycle identity or losing track of its completion status, significantly improving the developer experience for batch processing.
--&gt;
&lt;p&gt;其主要收益是让配置错误的 Job 具备更平滑的恢复流程。
通过允许在挂起期间进行原地修正，用户可以消除资源瓶颈，
而不会破坏 Job 的生命周期标识，也不会丢失完成状态追踪，
从而显著改善批处理场景下的开发体验。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #5440](https://kep.k8s.io/5440) led by SIG Apps.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/5440&#34;&gt;KEP #5440&lt;/a&gt; 的一部分，由 SIG Apps 牵头完成。&lt;/p&gt;
&lt;!--
## Other notable changes
--&gt;
&lt;h2 id=&#34;other-notable-changes&#34;&gt;其他值得关注的变更  &lt;/h2&gt;
&lt;!--
### Continued innovation in Dynamic Resource Allocation (DRA)
--&gt;
&lt;h3 id=&#34;continued-innovation-in-dynamic-resource-allocation-dra&#34;&gt;动态资源分配（DRA）的持续创新  &lt;/h3&gt;
&lt;!--
The [core functionality](https://kep.k8s.io/4381) was graduated to stable in v1.34, with the ability to turn it off. In v1.35 it is always enabled. Several alpha features have also been significantly improved and are ready for testing. We encourage users to provide feedback on these capabilities to help clear the path for their target promotion to beta in upcoming releases.
--&gt;
&lt;p&gt;&lt;a href=&#34;https://kep.k8s.io/4381&#34;&gt;核心能力&lt;/a&gt;在 v1.34 中进阶至稳定（GA）阶段，但当时仍允许关闭。
在 v1.35 中，此特性将始终处于启用状态。此外，若干 Alpha 特性也得到了显著改进，已准备好进行测试。
我们鼓励用户就这些能力提供反馈，以帮助它们在后续版本中顺利进阶至 Beta。&lt;/p&gt;
&lt;!--
#### Extended Resource Requests via DRA
--&gt;
&lt;h4 id=&#34;extended-resource-requests-via-dra&#34;&gt;通过 DRA 实现扩展资源（Extended Resource）请求  &lt;/h4&gt;
&lt;!--
Several functional gaps compared to Extended Resource requests via Device Plugins were addressed, for example scoring and reuse of devices in init containers.
--&gt;
&lt;p&gt;相较于通过设备插件（Device Plugins）实现的扩展资源请求，
当前版本补齐了若干特性差距，例如打分（scoring），以及在 Init 容器中对设备的复用。&lt;/p&gt;
&lt;!--
#### Device Taints and Tolerations
--&gt;
&lt;h4 id=&#34;device-taints-and-tolerations&#34;&gt;设备污点与容忍度  &lt;/h4&gt;
&lt;!--
The new &#34;None&#34; effect can be used to report a problem without immediately affecting scheduling or running pod. DeviceTaintRule now provides status information about an ongoing eviction. The &#34;None&#34; effect can be used for a &#34;dry run&#34; before actually evicting pods:
--&gt;
&lt;p&gt;新的 &lt;code&gt;None&lt;/code&gt; 效果可用于报告问题，而不会立刻影响调度或正在运行的 Pod。
DeviceTaintRule 现在还会在状态中提供正在进行的驱逐的相关信息。
在真正开始驱逐 Pod 之前，可以先用 &lt;code&gt;None&lt;/code&gt; 效果进行一次“演练”（dry run）：&lt;/p&gt;
&lt;!--
- Create DeviceTaintRule with &#34;effect: None&#34;.
- Check the status to see how many pods would be evicted.
- Replace &#34;effect: None&#34; with &#34;effect: NoExecute&#34;.
--&gt;
&lt;ul&gt;
&lt;li&gt;使用 &lt;code&gt;effect: None&lt;/code&gt; 创建 DeviceTaintRule。&lt;/li&gt;
&lt;li&gt;检查状态，了解将会驱逐多少个 Pod。&lt;/li&gt;
&lt;li&gt;将 &lt;code&gt;effect: None&lt;/code&gt; 替换为 &lt;code&gt;effect: NoExecute&lt;/code&gt;。&lt;/li&gt;
&lt;/ul&gt;
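&lt;p&gt;上述“演练”流程可以用如下清单示意（&lt;code&gt;apiVersion&lt;/code&gt; 与字段结构以官方 DRA 文档为准，驱动名与污点键均为示例）：&lt;/p&gt;

```shell
# 先以 effect: None 创建 DeviceTaintRule，确认影响范围后再改为 NoExecute：
printf '%s\n' \
  'apiVersion: resource.k8s.io/v1alpha3' \
  'kind: DeviceTaintRule' \
  'metadata:' \
  '  name: gpu-health-demo' \
  'spec:' \
  '  deviceSelector:' \
  '    driver: gpu.example.com' \
  '  taint:' \
  '    key: example.com/unhealthy' \
  '    effect: None' \
  | kubectl apply -f -
# 查看状态，了解将会驱逐多少个 Pod：
kubectl get devicetaintrule gpu-health-demo -o yaml
```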
&lt;!--
#### Partitionable Devices
--&gt;
&lt;h4 id=&#34;partitionable-devices&#34;&gt;可切分设备  &lt;/h4&gt;
&lt;!--
Devices belonging to the same partitionable devices may now be defined in different ResourceSlices.
--&gt;
&lt;p&gt;属于同一个可切分设备（partitionable device）的各个设备，
现在可以定义在不同的 ResourceSlice 中。&lt;/p&gt;
&lt;!--
You can read more in the [official documentation](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#partitionable-devices).
--&gt;
&lt;p&gt;更多信息请参阅&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#partitionable-devices&#34;&gt;官方文档&lt;/a&gt;。&lt;/p&gt;
&lt;!--
#### Consumable Capacity, Device Binding Conditions
--&gt;
&lt;h4 id=&#34;consumable-capacity-device-binding-conditions&#34;&gt;可消耗容量与设备绑定条件  &lt;/h4&gt;
&lt;!--
Several bugs were fixed and/or more tests added.
--&gt;
&lt;p&gt;该版本修复了若干缺陷并添加了更多测试。&lt;/p&gt;
&lt;!--
You can learn more about [Consumable Capacity](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#consumable-capacity) and [Binding Conditions](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-binding-conditions) in the official documentation.
--&gt;
&lt;p&gt;你可以在官方文档中进一步了解&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#consumable-capacity&#34;&gt;可消耗容量&lt;/a&gt;
与&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-binding-conditions&#34;&gt;绑定条件&lt;/a&gt;。&lt;/p&gt;
&lt;!--
### Comparable resource version semantics
--&gt;
&lt;h3 id=&#34;comparable-resource-version-semantics&#34;&gt;可比较的资源版本语义  &lt;/h3&gt;
&lt;!--
Kubernetes v1.35 changes the way that clients are allowed to interpret [resource versions](/docs/reference/using-api/api-concepts/#resource-versions).
--&gt;
&lt;p&gt;Kubernetes v1.35 改变了客户端可以如何解释&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/using-api/api-concepts/#resource-versions&#34;&gt;资源版本（resource versions）&lt;/a&gt;的规则。&lt;/p&gt;
&lt;!--
Before v1.35, the only supported comparison that clients could make was to check for string equality: if two resource versions were equal, they were the same. Clients could also provide a resource version to the API server and ask the control plane to do internal comparisons, such as streaming all events since a particular resource version.
--&gt;
&lt;p&gt;在 v1.35 之前，客户端唯一受支持的比较方式是字符串相等性检查：
如果两个资源版本相等，它们就是同一个版本。
客户端也可以向 API 服务器提供资源版本，并请求控制平面执行内部比较，
例如流式获取自某个资源版本以来的所有事件。&lt;/p&gt;
&lt;!--
In v1.35, all in-tree resource versions meet a new stricter definition: the values are a special form of decimal number. And, because they can be compared, clients can do their own operations to compare two different resource versions.
--&gt;
&lt;p&gt;在 v1.35 中，所有 in-tree 的资源版本都满足更严格的新定义：
它们的取值是一种特殊形式的十进制数。由于这些值可比较，
客户端也可以自行比较两个不同的资源版本。&lt;/p&gt;
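&lt;p&gt;基于这一语义，客户端侧的比较可以简单到如下示意（函数名与取值均为示例）：&lt;/p&gt;

```shell
# v1.35 起，in-tree 资源版本是可按十进制数比较的，
# 客户端可以自行判断哪个资源版本更新（rv_newer 为示例函数名）：
rv_newer() {
  # 当第一个 resourceVersion 严格比第二个更新时返回成功（退出码 0）
  test "$1" -gt "$2"
}

rv_newer 20357 20312; echo "newer=$?"   # 输出 newer=0
rv_newer 20312 20357; echo "newer=$?"   # 输出 newer=1
```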
&lt;!--
For example, this means that a client reconnecting after a crash can detect when it has lost updates, as distinct from the case where there has been an update but no lost changes in the meantime.
--&gt;
&lt;p&gt;例如，这意味着客户端在崩溃后重新连接时，可以检测到自己确实丢失过更新，
并将这种情况与“期间虽有更新、但并未丢失任何变更”的情况区分开来。&lt;/p&gt;
&lt;!--
This change in semantics enables other important use cases such as [storage version migration](/docs/tasks/manage-kubernetes-objects/storage-version-migration/), performance improvements to _informers_ (a client helper concept), and controller reliability. All of those cases require knowing whether one resource version is newer than another.
--&gt;
&lt;p&gt;这一语义变更还支撑了其他重要用例，例如
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/manage-kubernetes-objects/storage-version-migration/&#34;&gt;存储版本迁移&lt;/a&gt;、对 &lt;code&gt;informers&lt;/code&gt;
（一种客户端辅助概念）的性能改进，以及控制器可靠性提升。
这些用例都需要能够判断一个资源版本是否比另一个更新。&lt;/p&gt;
&lt;!--
This work was done as part of [KEP #5504](https://kep.k8s.io/5504) led by SIG API Machinery.
--&gt;
&lt;p&gt;此项工作是 &lt;a href=&#34;https://kep.k8s.io/5504&#34;&gt;KEP #5504&lt;/a&gt; 的一部分，由 SIG API Machinery 牵头完成。&lt;/p&gt;
&lt;!--
## Graduations, deprecations, and removals in v1.35
--&gt;
&lt;h2 id=&#34;deprecations-and-removals&#34;&gt;v1.35 中的特性进阶、弃用与移除  &lt;/h2&gt;
&lt;!--
### Graduations to stable
--&gt;
&lt;h3 id=&#34;graduations-to-stable&#34;&gt;进入稳定（GA）阶段的特性  &lt;/h3&gt;
&lt;!--
This lists all the features that graduated to stable (also known as *general availability*). For a full list of updates including new features and graduations from alpha to beta, see the release notes.
--&gt;
&lt;p&gt;这里列出所有进入稳定（也称为 &lt;strong&gt;正式发布（GA）&lt;/strong&gt;）阶段的特性。
要获取包含新增特性以及从 Alpha 进阶到 Beta 的特性等在内的完整更新列表，请参阅发布说明。&lt;/p&gt;
&lt;!--
This release includes a total of 15 enhancements promoted to stable:
--&gt;
&lt;p&gt;本次发布共有 15 个增强项进入稳定（GA）阶段：&lt;/p&gt;
&lt;!--
* [Add CPUManager policy option to restrict reservedSystemCPUs to system daemons and interrupt processing](https://kep.k8s.io/4540)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/4540&#34;&gt;为 CPUManager 策略增加选项，将 reservedSystemCPUs 限定用于系统守护进程与中断处理&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [Pod Generation](https://kep.k8s.io/5067)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/5067&#34;&gt;Pod Generation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [Invariant Testing](https://kep.k8s.io/5468)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/5468&#34;&gt;Invariant Testing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [In-Place Update of Pod Resources](https://kep.k8s.io/1287)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/1287&#34;&gt;Pod 资源原地更新&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [Fine-grained SupplementalGroups control](https://kep.k8s.io/3619)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/3619&#34;&gt;更细粒度的 SupplementalGroups 控制&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [Add support for a drop-in kubelet configuration directory](https://kep.k8s.io/3983)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/3983&#34;&gt;支持 drop-in kubelet 配置目录&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [Remove gogo protobuf dependency for Kubernetes API types](https://kep.k8s.io/5589)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/5589&#34;&gt;移除 Kubernetes API 类型对 gogo protobuf 的依赖&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [kubelet image GC after a maximum age](https://kep.k8s.io/4210)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/4210&#34;&gt;kubelet 镜像垃圾回收：基于最大镜像年龄&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [Kubelet limit of Parallel Image Pulls](https://kep.k8s.io/3673)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/3673&#34;&gt;kubelet 并行拉取镜像的上限&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [Add a TopologyManager policy option for MaxAllowableNUMANodes](https://kep.k8s.io/4622)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/4622&#34;&gt;为 TopologyManager 策略增加 MaxAllowableNUMANodes 选项&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [Include kubectl command metadata in http request headers](https://kep.k8s.io/859)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/859&#34;&gt;在 HTTP 请求头中包含 kubectl 命令元数据&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [PreferSameNode Traffic Distribution (formerly PreferLocal traffic policy / Node-level topology)](https://kep.k8s.io/3015)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/3015&#34;&gt;PreferSameNode 流量分配（原 PreferLocal 流量策略/节点级拓扑）&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [Job API managed-by mechanism](https://kep.k8s.io/4368)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/4368&#34;&gt;Job API 的 managed-by 机制&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [Transition from SPDY to WebSockets](https://kep.k8s.io/4006)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://kep.k8s.io/4006&#34;&gt;从 SPDY 迁移到 WebSockets&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
### Deprecations, removals and community updates
--&gt;
&lt;h3 id=&#34;deprecations-removals-and-community-updates&#34;&gt;弃用、移除与社区更新  &lt;/h3&gt;
&lt;!--
As Kubernetes develops and matures, features may be deprecated, removed, or replaced with better
ones to improve the project&#39;s overall health. See the Kubernetes
[deprecation and removal policy](/docs/reference/using-api/deprecation-policy/) for more details on this process. Kubernetes v1.35 includes a couple of deprecations.
--&gt;
&lt;p&gt;随着 Kubernetes 的发展与成熟，为提升项目整体健康度，
一些特性可能会被弃用、移除，或被更好的方案替代。
关于这一过程的更多信息，请参阅 Kubernetes 的&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/using-api/deprecation-policy/&#34;&gt;弃用与移除策略&lt;/a&gt;。
Kubernetes v1.35 包含了若干项弃用内容。&lt;/p&gt;
&lt;!--
#### Ingress NGINX retirement
--&gt;
&lt;h4 id=&#34;ingress-nginx-retirement&#34;&gt;Ingress NGINX 退役  &lt;/h4&gt;
&lt;!--
For years, the Ingress NGINX controller has been a popular choice for routing traffic into Kubernetes clusters. It was flexible, widely adopted, and served as the standard entry point for countless applications.
--&gt;
&lt;p&gt;多年来，Ingress NGINX 控制器一直是将流量路由到 Kubernetes 集群的热门选择。
它灵活、被广泛采用，并长期作为无数应用的标准入口。&lt;/p&gt;
&lt;!--
However, maintaining the project has become unsustainable. With a severe shortage of maintainers and mounting technical debt, the community recently made the difficult decision to retire it. This isn&#39;t strictly part of the v1.35 release, but it&#39;s such an important change that we wanted to highlight it here.
--&gt;
&lt;p&gt;然而，项目维护已经变得难以为继。由于维护者严重短缺且技术债不断累积，
社区近期做出了艰难决定：让该项目退役。这虽然并非严格意义上的 v1.35 发布内容，
但它影响重大，我们希望在这里特别强调。&lt;/p&gt;
&lt;!--
Consequently, the Kubernetes project announced that Ingress NGINX will receive only best-effort maintenance until **March 2026**. After this date, it will be archived with no further updates. The recommended path forward is to migrate to the [Gateway API](https://gateway-api.sigs.k8s.io/), which offers a more modern, secure, and extensible standard for traffic management.
--&gt;
&lt;p&gt;因此，Kubernetes 项目宣布 Ingress NGINX 将仅提供尽力而为的维护，
直至 &lt;strong&gt;2026 年 3 月&lt;/strong&gt;。
此日期之后，该项目将归档并不再更新。
推荐的后续路径是迁移到 &lt;a href=&#34;https://gateway-api.sigs.k8s.io/&#34;&gt;Gateway API&lt;/a&gt;，
它提供了更现代、更安全且更可扩展的流量管理标准。&lt;/p&gt;
&lt;!--
You can find more in the [official blog post](/blog/2025/11/11/ingress-nginx-retirement/).
--&gt;
&lt;p&gt;更多信息请参阅&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2025/11/11/ingress-nginx-retirement/&#34;&gt;官方博客文章&lt;/a&gt;。&lt;/p&gt;
&lt;!--
#### Removal of cgroup v1 support
--&gt;
&lt;h4 id=&#34;removal-of-cgroup-v1-support&#34;&gt;移除对 cgroup v1 的支持  &lt;/h4&gt;
&lt;!--
When it comes to managing resources on Linux nodes, Kubernetes has historically relied on cgroups (control groups). While the original cgroup v1 was functional, it was often inconsistent and limited. That is why Kubernetes introduced support for cgroup v2 back in v1.25, offering a much cleaner, unified hierarchy and better resource isolation.
--&gt;
&lt;p&gt;在 Linux 节点的资源管理方面，Kubernetes 历史上依赖 cgroups（control groups）。
尽管最初的 cgroup v1 可以工作，但它常常不一致且存在局限。
因此，Kubernetes 在 v1.25 引入对 cgroup v2 的支持，
提供了更干净的统一层级结构与更好的资源隔离能力。&lt;/p&gt;
&lt;!--
Because cgroup v2 is now the modern standard, Kubernetes is ready to retire the legacy cgroup v1 support in v1.35. This is an important notice for cluster administrators: if you are still running nodes on older Linux distributions that don&#39;t support cgroup v2, your `kubelet` will fail to start. To avoid downtime, you will need to migrate those nodes to systems where cgroup v2 is enabled.
--&gt;
&lt;p&gt;由于 cgroup v2 现已成为现代标准，
Kubernetes 准备在 v1.35 中退役遗留的 cgroup v1 支持。
这对集群管理员而言是一项重要提醒：
如果你仍在运行不支持 cgroup v2 的旧 Linux 发行版节点，
你的 &lt;code&gt;kubelet&lt;/code&gt; 将无法启动。
为避免停机，你需要将这些节点迁移到启用了 cgroup v2 的系统上。&lt;/p&gt;
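&lt;p&gt;可以用下面的命令快速检查某个节点当前使用的 cgroup 版本：&lt;/p&gt;

```shell
# 输出 cgroup2fs 表示该节点已启用 cgroup v2；
# 输出 tmpfs 则通常意味着仍在使用 cgroup v1：
stat -fc %T /sys/fs/cgroup/
```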
&lt;!--
To learn more, read [about cgroup v2](/docs/concepts/architecture/cgroups/);  
you can also track the switchover work via [KEP-5573: Remove cgroup v1 support](https://kep.k8s.io/5573).  
--&gt;
&lt;p&gt;要了解更多信息，请阅读&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/architecture/cgroups/&#34;&gt;关于 cgroup v2&lt;/a&gt;；&lt;br&gt;
你也可以通过 &lt;a href=&#34;https://kep.k8s.io/5573&#34;&gt;KEP-5573：移除 cgroup v1 支持&lt;/a&gt; 跟踪切换工作。&lt;/p&gt;
&lt;!--
#### Deprecation of ipvs mode in kube-proxy
--&gt;
&lt;h4 id=&#34;deprecation-of-ipvs-mode-in-kube-proxy&#34;&gt;kube-proxy 中 ipvs 模式的弃用  &lt;/h4&gt;
&lt;!--
Years ago, Kubernetes adopted the [`ipvs`](/docs/reference/networking/virtual-ips/#proxy-mode-ipvs) mode in `kube-proxy` to provide faster load balancing than the standard [`iptables`](/docs/reference/networking/virtual-ips/#proxy-mode-iptables). While it offered a performance boost, keeping it in sync with evolving networking requirements created too much technical debt and complexity.
--&gt;
&lt;p&gt;多年前，Kubernetes 在 &lt;code&gt;kube-proxy&lt;/code&gt; 中采用&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/networking/virtual-ips/#proxy-mode-ipvs&#34;&gt;&lt;code&gt;ipvs&lt;/code&gt;&lt;/a&gt; 模式，
以提供比标准&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/networking/virtual-ips/#proxy-mode-iptables&#34;&gt;&lt;code&gt;iptables&lt;/code&gt;&lt;/a&gt; 更快的负载均衡。
虽然它带来了性能提升，但为使其跟上不断演进的网络需求，
所积累的技术债与复杂度已经过高。&lt;/p&gt;
&lt;!--
Because of this maintenance burden, Kubernetes v1.35 deprecates `ipvs` mode. Although the mode remains available in this release, `kube-proxy` will now emit a warning on startup when configured to use it. The goal is to streamline the codebase and focus on modern standards. For Linux nodes, you should begin transitioning to [`nftables`](/docs/reference/networking/virtual-ips/#proxy-mode-nftables), which is now the recommended replacement.
--&gt;
&lt;p&gt;由于这一维护负担，Kubernetes v1.35 弃用 &lt;code&gt;ipvs&lt;/code&gt; 模式。尽管该模式在本次发布中仍可用，
但当 &lt;code&gt;kube-proxy&lt;/code&gt; 被配置为使用该模式时，将在启动时发出警告。
该弃用的目标是精简代码库并聚焦于现代标准。
对于 Linux 节点，你应开始迁移到&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/networking/virtual-ips/#proxy-mode-nftables&#34;&gt;&lt;code&gt;nftables&lt;/code&gt;&lt;/a&gt;，
它现在是推荐的替代方案。&lt;/p&gt;
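&lt;p&gt;以 kubeadm 部署的集群为例，迁移大致可以这样进行（ConfigMap 名称、标签与命名空间取决于你的部署方式，此处均为常见默认值）：&lt;/p&gt;

```shell
# 查看当前 kube-proxy 模式：
kubectl -n kube-system get configmap kube-proxy -o yaml | grep -m1 'mode:'
# 编辑配置，将 mode: ipvs 改为 mode: nftables：
kubectl -n kube-system edit configmap kube-proxy
# 重建 kube-proxy Pod 使配置生效：
kubectl -n kube-system delete pod -l k8s-app=kube-proxy
```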
&lt;!--
You can find more in [KEP-5495: Deprecate ipvs mode in kube-proxy](https://kep.k8s.io/5495).
--&gt;
&lt;p&gt;更多信息请参阅 &lt;a href=&#34;https://kep.k8s.io/5495&#34;&gt;KEP-5495：弃用 kube-proxy 的 ipvs 模式&lt;/a&gt;。&lt;/p&gt;
&lt;!--
#### Final call for containerd v1.X
--&gt;
&lt;h4 id=&#34;final-call-for-containerd-v1x&#34;&gt;containerd v1.X 的最后通告  &lt;/h4&gt;
&lt;!--
While Kubernetes v1.35 still supports containerd 1.7 and other LTS releases, this is the final version with such support. The SIG Node community has designated v1.35 as the last release to support the containerd v1.X series.
--&gt;
&lt;p&gt;尽管 Kubernetes v1.35 仍支持 containerd 1.7 与其他 LTS 版本，
但这是最后一个提供此类支持的版本。
SIG Node 社区已将 v1.35 指定为最后一个支持 containerd v1.X 系列的版本。&lt;/p&gt;
&lt;!--
This serves as an important reminder: before upgrading to the next Kubernetes version, you must switch to containerd 2.0 or later. To help identify which nodes need attention, you can monitor the `kubelet_cri_losing_support` metric within your cluster.
--&gt;
&lt;p&gt;这是一条重要提醒：
在升级到下一个 Kubernetes 版本之前，你必须切换到 containerd 2.0 或更高版本。
为帮助识别哪些节点需要关注，你可以在集群中监控 &lt;code&gt;kubelet_cri_losing_support&lt;/code&gt; 指标。&lt;/p&gt;
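&lt;p&gt;可以用类似下面的方式检查某个节点是否仍在使用即将失去支持的 containerd v1.X（节点名为示例）：&lt;/p&gt;

```shell
# 查看节点上报的容器运行时版本：
kubectl get node node-1 -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}'
# 检查 kubelet 指标中的 kubelet_cri_losing_support：
kubectl get --raw /api/v1/nodes/node-1/proxy/metrics | grep kubelet_cri_losing_support
```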
&lt;!--
You can find more in the [official blog post](/blog/2025/09/12/kubernetes-v1-34-cri-cgroup-driver-lookup-now-ga/#announcement-kubernetes-is-deprecating-containerd-v1-y-support) or in [KEP-4033: Discover cgroup driver from CRI](https://kep.k8s.io/4033).
--&gt;
&lt;p&gt;更多信息可参阅&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2025/09/12/kubernetes-v1-34-cri-cgroup-driver-lookup-now-ga/#announcement-kubernetes-is-deprecating-containerd-v1-y-support&#34;&gt;官方博客文章&lt;/a&gt;，
或阅读 &lt;a href=&#34;https://kep.k8s.io/4033&#34;&gt;KEP-4033：从 CRI 发现 cgroup driver&lt;/a&gt;。&lt;/p&gt;
&lt;!--
#### Improved Pod stability during `kubelet` restarts
--&gt;
&lt;h4 id=&#34;improved-pod-stability-during-kubelet-restarts&#34;&gt;&lt;code&gt;kubelet&lt;/code&gt; 重启期间的 Pod 稳定性改进  &lt;/h4&gt;
&lt;!--
Previously, restarting the `kubelet` service often caused a temporary disruption in Pod status. During a restart, the kubelet would reset container states, causing healthy Pods to be marked as `NotReady` and removed from load balancers, even if the application itself was still running correctly.
--&gt;
&lt;p&gt;此前，重启 &lt;code&gt;kubelet&lt;/code&gt; 服务往往会造成 Pod 状态的短暂波动。在重启期间，kubelet 会重置容器状态，
导致健康的 Pod 被标记为 &lt;code&gt;NotReady&lt;/code&gt; 并从负载均衡器中移除，即便应用本身仍在正常运行。&lt;/p&gt;
&lt;!--
To address this reliability issue, this behavior has been corrected to ensure seamless node maintenance. The `kubelet` now properly restores the state of existing containers from the runtime upon startup. This ensures that your workloads remain `Ready` and traffic continues to flow uninterrupted during `kubelet` restarts or upgrades.
--&gt;
&lt;p&gt;为解决这一可靠性问题，该行为已被修正，以确保节点维护更平滑。
&lt;code&gt;kubelet&lt;/code&gt; 现在会在启动时从运行时中正确恢复现有容器状态，
确保你的工作负载保持 &lt;code&gt;Ready&lt;/code&gt;，并使流量在 &lt;code&gt;kubelet&lt;/code&gt; 重启或升级期间持续不中断。&lt;/p&gt;
&lt;!--
You can find more in [KEP-4781: Fix inconsistent container ready state after kubelet restart](https://kep.k8s.io/4781).
--&gt;
&lt;p&gt;更多信息请参阅
&lt;a href=&#34;https://kep.k8s.io/4781&#34;&gt;KEP-4781：修复 kubelet 重启后容器就绪状态不一致问题&lt;/a&gt;。&lt;/p&gt;
&lt;!--
## Release notes
--&gt;
&lt;h2 id=&#34;release-notes&#34;&gt;发布说明  &lt;/h2&gt;
&lt;!--
Check out the full details of the Kubernetes v1.35 release in our [release notes](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.35.md).
--&gt;
&lt;p&gt;请在我们的&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.35.md&#34;&gt;发布说明&lt;/a&gt;
中查看 Kubernetes v1.35 发布的完整细节。&lt;/p&gt;
&lt;!--
## Availability
--&gt;
&lt;h2 id=&#34;availability&#34;&gt;可用性  &lt;/h2&gt;
&lt;!--
Kubernetes v1.35 is available for download on [GitHub](https://github.com/kubernetes/kubernetes/releases/tag/v1.35.0) or on the [Kubernetes download page](/releases/download/).
--&gt;
&lt;p&gt;Kubernetes v1.35 可通过&lt;a href=&#34;https://github.com/kubernetes/kubernetes/releases/tag/v1.35.0&#34;&gt;GitHub&lt;/a&gt;
或 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/releases/download/&#34;&gt;Kubernetes 下载页面&lt;/a&gt;获取。&lt;/p&gt;
&lt;!--
To get started with Kubernetes, check out these [interactive tutorials](/docs/tutorials/) or run local Kubernetes clusters using [minikube](https://minikube.sigs.k8s.io/). You can also easily install v1.35 using [kubeadm](/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/).
--&gt;
&lt;p&gt;要开始使用 Kubernetes，请查看这些&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tutorials/&#34;&gt;交互式教程&lt;/a&gt;，
或使用 &lt;a href=&#34;https://minikube.sigs.k8s.io/&#34;&gt;minikube&lt;/a&gt; 在本地运行 Kubernetes 集群。
你也可以使用 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/&#34;&gt;kubeadm&lt;/a&gt;
轻松安装 v1.35。&lt;/p&gt;
&lt;!--
## Release team
--&gt;
&lt;h2 id=&#34;release-team&#34;&gt;发布团队  &lt;/h2&gt;
&lt;!--
Kubernetes is only possible with the support, commitment, and hard work of its community. Each release team is made up of dedicated community volunteers who work together to build the many pieces that make up the Kubernetes releases you rely on. This requires the specialized skills of people from all corners of our community, from the code itself to its documentation and project management.
--&gt;
&lt;p&gt;Kubernetes 之所以成为可能，离不开社区的支持、投入与辛勤付出。
每个发布团队都由一群尽心尽力的社区志愿者组成，他们协力构建你所依赖的 Kubernetes 发布版本的方方面面。
这需要来自社区各个角落的专业能力：从代码本身到文档与项目管理。&lt;/p&gt;
&lt;!--
[We honor the memory of Han Kang](https://github.com/cncf/memorials/blob/main/han-kang.md), a long-time contributor and respected engineer whose technical excellence and infectious enthusiasm left a lasting impact on the Kubernetes community. Han was a significant force within SIG Instrumentation and SIG API Machinery, earning a [2021 Kubernetes Contributor Award](https://www.kubernetes.dev/community/awards/2021/) for his critical work and sustained commitment to the project&#39;s core stability. Beyond his technical contributions, Han was deeply admired for his generosity as a mentor and his passion for building connections among people. He was known for &#34;opening doors&#34; for others, whether guiding new contributors through their first pull requests or supporting colleagues with patience and kindness. Han’s legacy lives on through the engineers he inspired, the robust systems he helped build, and the warm, collaborative spirit he fostered within the cloud native ecosystem.
--&gt;
&lt;p&gt;我们在此缅怀&lt;a href=&#34;https://github.com/cncf/memorials/blob/main/han-kang.md&#34;&gt;Han Kang&lt;/a&gt;
——一位长期贡献者与备受尊敬的工程师，他的技术卓越与感染力十足的热情，
为 Kubernetes 社区留下了深远影响。Han 是 SIG Instrumentation 与 SIG API Machinery 中的重要力量，
并因其关键工作与对项目核心稳定性的持续投入，
获得了&lt;a href=&#34;https://www.kubernetes.dev/community/awards/2021/&#34;&gt;2021 Kubernetes Contributor Award&lt;/a&gt;。
除技术贡献之外，Han 也因其作为导师的慷慨与联结人们的热情而广受敬重。
他以“为他人打开大门”而闻名，无论是带领新贡献者完成第一次 PR，
还是以耐心与善意支持同事。Han 的影响将通过他所激励的工程师、他参与构建的健壮系统，
以及他在云原生生态中所培育的温暖协作精神延续下去。&lt;/p&gt;
&lt;!--
We would like to thank the entire [Release Team](https://github.com/kubernetes/sig-release/blob/master/releases/release-1.35/release-team.md) for the hours spent hard at work to deliver the Kubernetes v1.35 release to our community. The Release Team&#39;s membership ranges from first-time shadows to returning team leads with experience forged over several release cycles. We are incredibly grateful to our Release Lead, [Drew Hagen](https://github.com/drewhagen), whose hands-on guidance and vibrant energy not only navigated us through complex challenges but also fueled the community spirit behind this successful release.
--&gt;
&lt;p&gt;我们感谢整个&lt;a href=&#34;https://github.com/kubernetes/sig-release/blob/master/releases/release-1.35/release-team.md&#34;&gt;发布团队&lt;/a&gt;
为向社区交付 Kubernetes v1.35 所付出的辛勤工作。
发布团队成员既有初次参与的 shadow，也有在多个发布周期中积累了丰富经验、再度回归的团队负责人。
我们尤其感谢发布负责人&lt;a href=&#34;https://github.com/drewhagen&#34;&gt;Drew Hagen&lt;/a&gt;：
他既以亲力亲为的指导带领我们应对复杂挑战，也以充沛的热情点燃了这次成功发布背后的社区精神。&lt;/p&gt;
&lt;!--
## Project velocity
--&gt;
&lt;h2 id=&#34;project-velocity&#34;&gt;项目活跃度  &lt;/h2&gt;
&lt;!--
The CNCF K8s [DevStats](https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;var-period=m&amp;var-repogroup_name=All) project aggregates a number of interesting data points related to the velocity of Kubernetes and various sub-projects. This includes everything from individual contributions to the number of companies that are contributing and is an illustration of the depth and breadth of effort that goes into evolving this ecosystem.
--&gt;
&lt;p&gt;CNCF K8s 的&lt;a href=&#34;https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;var-period=m&amp;var-repogroup_name=All&#34;&gt;DevStats&lt;/a&gt;
项目汇总了与 Kubernetes 及其各子项目活跃度相关的一系列有趣数据点。
这些数据涵盖从个人贡献到参与贡献公司的数量等多个方面，
体现了推动该生态演进所投入努力的深度与广度。&lt;/p&gt;
&lt;!--
During the v1.35 release cycle, which spanned 14 weeks from 15th September 2025 to 17th December 2025, Kubernetes received contributions from as many as 85 different companies and 419 individuals. In the wider cloud native ecosystem, the figure goes up to 281 companies, counting 1769 total contributors.
--&gt;
&lt;p&gt;在 v1.35 发布周期（从 2025 年 9 月 15 日到 2025 年 12 月 17 日，共 14 周）期间，
Kubernetes 收到了来自多达 85 家公司与 419 名个人的贡献。
在更广泛的云原生生态中，这一数字上升到 281 家公司，共计 1769 名贡献者。&lt;/p&gt;
&lt;!--
Note that &#34;contribution&#34; counts when someone makes a commit, code review, comment, creates an issue or PR, reviews a PR (including blogs and documentation) or comments on issues and PRs.  
If you are interested in contributing, visit [Getting Started](https://www.kubernetes.dev/docs/guide/#getting-started) on our contributor website.
--&gt;
&lt;p&gt;请注意，这里的“贡献”统计包括：提交 commit、进行代码评审、发表评论、创建 Issue 或 PR、
评审 PR（包括博客与文档）以及对 Issue 与 PR 的评论等。&lt;br&gt;
如果你有兴趣参与贡献，请访问贡献者网站上的&lt;a href=&#34;https://www.kubernetes.dev/docs/guide/#getting-started&#34;&gt;Getting Started&lt;/a&gt;。&lt;/p&gt;
&lt;!--
Sources for this data:
--&gt;
&lt;p&gt;数据来源：&lt;/p&gt;
&lt;!--
* [Companies contributing to Kubernetes](https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;from=1757890800000&amp;to=1765929599000&amp;var-period=d28&amp;var-repogroup_name=Kubernetes&amp;var-repo_name=kubernetes%2Fkubernetes)  
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;from=1757890800000&amp;to=1765929599000&amp;var-period=d28&amp;var-repogroup_name=Kubernetes&amp;var-repo_name=kubernetes%2Fkubernetes&#34;&gt;贡献 Kubernetes 的公司&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* [Overall ecosystem contributions](https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;from=1757890800000&amp;to=1765929599000&amp;var-period=d28&amp;var-repogroup_name=All&amp;var-repo_name=kubernetes%2Fkubernetes)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;from=1757890800000&amp;to=1765929599000&amp;var-period=d28&amp;var-repogroup_name=All&amp;var-repo_name=kubernetes%2Fkubernetes&#34;&gt;整体生态的贡献&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Events update
--&gt;
&lt;h2 id=&#34;events-update&#34;&gt;活动更新&lt;/h2&gt;
&lt;!--
Explore upcoming Kubernetes and cloud native events, including KubeCon \+ CloudNativeCon, KCD, and other notable conferences worldwide. Stay informed and get involved with the Kubernetes community\!
--&gt;
&lt;p&gt;了解即将到来的 Kubernetes 与云原生活动，包括 KubeCon + CloudNativeCon、KCD 与全球其他重要会议。
保持关注并参与 Kubernetes 社区！&lt;/p&gt;
&lt;!--
**February 2026**
--&gt;
&lt;p&gt;&lt;strong&gt;2026 年 2 月&lt;/strong&gt;&lt;/p&gt;
&lt;!--
- [**KCD - Kubernetes Community Days:  New Delhi**](https://www.kcddelhi.com/index.html): Feb 21, 2026 | New Delhi, India
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.kcddelhi.com/index.html&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: New Delhi&lt;/strong&gt;&lt;/a&gt;：2026 年 2 月 21 日｜印度 New Delhi&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- [**KCD - Kubernetes Community Days:  Guadalajara**](https://community.cncf.io/events/details/cncf-kcd-guadalajara-presents-kcd-guadalajara-open-source-contributor-summit/cohost-kcd-guadalajara): Feb 23, 2026 | Guadalajara, Mexico
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-guadalajara-presents-kcd-guadalajara-open-source-contributor-summit/cohost-kcd-guadalajara&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Guadalajara&lt;/strong&gt;&lt;/a&gt;：2026 年 2 月 23 日｜墨西哥 Guadalajara&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
**March 2026**
--&gt;
&lt;p&gt;&lt;strong&gt;2026 年 3 月&lt;/strong&gt;&lt;/p&gt;
&lt;!--
- [**KubeCon + CloudNativeCon Europe 2026**](https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/): Mar 23-26, 2026 | Amsterdam, Netherlands
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/&#34;&gt;&lt;strong&gt;KubeCon + CloudNativeCon Europe 2026&lt;/strong&gt;&lt;/a&gt;：2026 年 3 月 23-26 日｜荷兰 Amsterdam&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
**May 2026**
--&gt;
&lt;p&gt;&lt;strong&gt;2026 年 5 月&lt;/strong&gt;&lt;/p&gt;
&lt;!--
- [**KCD - Kubernetes Community Days:  Toronto**](https://community.cncf.io/events/details/cncf-kcd-toronto-presents-kcd-toronto-canada-2026/): May 13, 2026 | Toronto, Canada
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/events/details/cncf-kcd-toronto-presents-kcd-toronto-canada-2026/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Toronto&lt;/strong&gt;&lt;/a&gt;：2026 年 5 月 13 日｜加拿大 Toronto&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- [**KCD - Kubernetes Community Days:  Helsinki**](https://cloudnativefinland.org/kcd-helsinki-2026/): May 20, 2026 | Helsinki, Finland
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://cloudnativefinland.org/kcd-helsinki-2026/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Helsinki&lt;/strong&gt;&lt;/a&gt;：2026 年 5 月 20 日｜芬兰 Helsinki&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
**June 2026**
--&gt;
&lt;p&gt;&lt;strong&gt;2026 年 6 月&lt;/strong&gt;&lt;/p&gt;
&lt;!--
- [**KubeCon + CloudNativeCon India 2026**](https://events.linuxfoundation.org/kubecon-cloudnativecon-india/): Jun 18-19, 2026 | Mumbai, India
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://events.linuxfoundation.org/kubecon-cloudnativecon-india/&#34;&gt;&lt;strong&gt;KubeCon + CloudNativeCon India 2026&lt;/strong&gt;&lt;/a&gt;：2026 年 6 月 18-19 日｜印度 Mumbai&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- [**KCD - Kubernetes Community Days:  Kuala Lumpur**](https://community.cncf.io/kcd-kuala-lumpur-2026/): Jun 27, 2026 | Kuala Lumpur, Malaysia
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://community.cncf.io/kcd-kuala-lumpur-2026/&#34;&gt;&lt;strong&gt;KCD - Kubernetes Community Days: Kuala Lumpur&lt;/strong&gt;&lt;/a&gt;：2026 年 6 月 27 日｜马来西亚 Kuala Lumpur&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
**July 2026**
--&gt;
&lt;p&gt;&lt;strong&gt;2026 年 7 月&lt;/strong&gt;&lt;/p&gt;
&lt;!--
- [**KubeCon + CloudNativeCon Japan 2026**](https://events.linuxfoundation.org/kubecon-cloudnativecon-japan/): Jul 29-30, 2026 | Yokohama, Japan
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://events.linuxfoundation.org/kubecon-cloudnativecon-japan/&#34;&gt;&lt;strong&gt;KubeCon + CloudNativeCon Japan 2026&lt;/strong&gt;&lt;/a&gt;：2026 年 7 月 29-30 日｜日本 Yokohama&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
You can find the latest event details [here](https://community.cncf.io/events/#/list).
--&gt;
&lt;p&gt;你可以在&lt;a href=&#34;https://community.cncf.io/events/#/list&#34;&gt;此处&lt;/a&gt;查看最新活动详情。&lt;/p&gt;
&lt;!--
## Upcoming release webinar
--&gt;
&lt;h2 id=&#34;upcoming-release-webinar&#34;&gt;即将举行的发布网络研讨会&lt;/h2&gt;
&lt;!--
Join members of the Kubernetes v1.35 Release Team on **Wednesday, January 14, 2026, at 5:00 PM (UTC)** to learn about the release highlights of this release. For more information and registration, visit the [event page](https://community.cncf.io/events/details/cncf-cncf-online-programs-presents-cloud-native-live-kubernetes-v135-release/) on the CNCF Online Programs site.
--&gt;
&lt;p&gt;欢迎在 &lt;strong&gt;2026 年 1 月 14 日（星期三）17:00（UTC）&lt;/strong&gt; 与 Kubernetes v1.35 发布团队成员一起，
了解本次发布的亮点。有关更多信息与注册方式，请访问 CNCF Online Programs 网站上的&lt;a href=&#34;https://community.cncf.io/events/details/cncf-cncf-online-programs-presents-cloud-native-live-kubernetes-v135-release/&#34;&gt;活动页面&lt;/a&gt;。&lt;/p&gt;
&lt;!--
## Get involved
--&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;参与其中&lt;/h2&gt;
&lt;!--
The simplest way to get involved with Kubernetes is by joining one of the many [Special Interest Groups](https://github.com/kubernetes/community/blob/master/sig-list.md) (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly [community meeting](https://github.com/kubernetes/community/tree/master/communication), and through the channels below. Thank you for your continued feedback and support.
--&gt;
&lt;p&gt;参与 Kubernetes 最简单的方式之一，是加入与你兴趣相符的众多&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-list.md&#34;&gt;特别兴趣小组（Special Interest Groups，SIG）&lt;/a&gt;
之一。你想向 Kubernetes 社区发布一些内容吗？欢迎在我们每周的&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/communication&#34;&gt;社区会议&lt;/a&gt;
上发声，也可以通过以下渠道参与交流。感谢你持续的反馈与支持。&lt;/p&gt;
&lt;!--
* Follow us on Bluesky [@Kubernetesio](https://bsky.app/profile/kubernetes.io) for the latest updates
--&gt;
&lt;ul&gt;
&lt;li&gt;在 Bluesky 关注我们：&lt;a href=&#34;https://bsky.app/profile/kubernetes.io&#34;&gt;@Kubernetesio&lt;/a&gt;，获取最新动态&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* Join the community discussion on [Discuss](https://discuss.kubernetes.io/)
--&gt;
&lt;ul&gt;
&lt;li&gt;在 &lt;a href=&#34;https://discuss.kubernetes.io/&#34;&gt;Discuss&lt;/a&gt; 加入社区讨论&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* Join the community on [Slack](http://slack.k8s.io/)
--&gt;
&lt;ul&gt;
&lt;li&gt;在 &lt;a href=&#34;http://slack.k8s.io/&#34;&gt;Slack&lt;/a&gt; 加入社区&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* Post questions (or answer questions) on [Stack Overflow](http://stackoverflow.com/questions/tagged/kubernetes)
--&gt;
&lt;ul&gt;
&lt;li&gt;在 &lt;a href=&#34;http://stackoverflow.com/questions/tagged/kubernetes&#34;&gt;Stack Overflow&lt;/a&gt; 提问（或解答问题）&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* Share your Kubernetes [story](https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform)
--&gt;
&lt;ul&gt;
&lt;li&gt;分享你的 Kubernetes &lt;a href=&#34;https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform&#34;&gt;故事&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* Read more about what’s happening with Kubernetes on the [blog](https://kubernetes.io/blog/)
--&gt;
&lt;ul&gt;
&lt;li&gt;在&lt;a href=&#34;https://kubernetes.io/blog/&#34;&gt;博客&lt;/a&gt;阅读 Kubernetes 的更多动态&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* Learn more about the [Kubernetes Release Team](https://github.com/kubernetes/sig-release/tree/master/release-team)
--&gt;
&lt;ul&gt;
&lt;li&gt;了解更多关于 &lt;a href=&#34;https://github.com/kubernetes/sig-release/tree/master/release-team&#34;&gt;Kubernetes 发布团队&lt;/a&gt;的信息&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.35 抢先一览</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/11/26/kubernetes-v1-35-sneak-peek/</link>
      <pubDate>Wed, 26 Nov 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/11/26/kubernetes-v1-35-sneak-peek/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#39;Kubernetes v1.35 Sneak Peek&#39;
date: 2025-11-26
slug: kubernetes-v1-35-sneak-peek
author: &gt;
  Aakanksha Bhende,
  Arujjwal Negi,
  Chad M. Crowell,
  Graziano Casto,
  Swathi Rao
--&gt;
&lt;!--
As the release of Kubernetes v1.35 approaches, the Kubernetes project continues to evolve.
Features may be deprecated, removed, or replaced to improve the project&#39;s overall health.
This blog post outlines planned changes for the v1.35 release that the release team believes
you should be aware of to ensure the continued smooth operation of your Kubernetes cluster(s),
and to keep you up to date with the latest developments.
The information below is based on the current status of the v1.35 release
and is subject to change before the final release date.
--&gt;
&lt;p&gt;随着 Kubernetes v1.35 发布的临近，Kubernetes 项目持续演进。
为了改善项目的整体健康状况，某些功能可能会被弃用、移除或替换。
本博客文章概述了 v1.35 版本的计划变更，
发布团队认为你应该了解这些变更，以确保 Kubernetes 集群的持续平稳运行，
并让你了解最新进展。
以下信息基于 v1.35 版本的当前状态，在最终发布日期之前可能会发生变化。&lt;/p&gt;
&lt;!--
## Deprecations and removals for Kubernetes v1.35
--&gt;
&lt;h2 id=&#34;deprecations-and-removals-for-kubernetes-v1-35&#34;&gt;Kubernetes v1.35 的弃用和移除&lt;/h2&gt;
&lt;!--
### cgroup v1 support
--&gt;
&lt;h3 id=&#34;cgroup-v1-support&#34;&gt;cgroup v1 支持&lt;/h3&gt;
&lt;!--
On Linux nodes, container runtimes typically rely on cgroups (short for &#34;control groups&#34;).
Support for using cgroup v2 has been stable in Kubernetes since v1.25,
providing an alternative to the original v1 cgroup support.
While cgroup v1 provided the initial resource control mechanism,
it suffered from well-known inconsistencies and limitations.
Adding support for cgroup v2 allowed use of a unified control group hierarchy,
improved resource isolation, and served as the foundation for modern features,
making legacy cgroup v1 support ready for removal.
The removal of cgroup v1 support will only impact cluster administrators
running nodes on older Linux distributions that do not support cgroup v2;
on those nodes, the `kubelet` will fail to start.
Administrators must migrate their nodes to systems with cgroup v2 enabled.
More details on compatibility requirements will be available in a blog post
soon after the v1.35 release.
--&gt;
&lt;p&gt;在 Linux 节点上，容器运行时通常依赖于 cgroups（“control groups” 的缩写）。
自 v1.25 以来，Kubernetes 中对 cgroup v2 的支持已经稳定，
为原有的 v1 cgroup 支持提供了替代方案。
虽然 cgroup v1 提供了初始的资源控制机制，
但它存在众所周知的不一致性和局限性。
添加对 cgroup v2 的支持允许使用统一的控制组层次结构，
改善了资源隔离，并为现代功能奠定了基础，
使得传统的 cgroup v1 支持可以准备移除。
移除 cgroup v1 支持只会影响在不支持 cgroup v2 的旧版 Linux 发行版上运行节点的集群管理员；
在这些节点上，&lt;code&gt;kubelet&lt;/code&gt; 将无法启动。
管理员必须将其节点迁移到启用了 cgroup v2 的系统。
关于兼容性要求的更多详细信息将在 v1.35 发布后不久在博客文章中提供。&lt;/p&gt;
&lt;!--
To learn more, read [about cgroup v2](/docs/concepts/architecture/cgroups/);
you can also track the switchover work via [KEP-5573: Remove cgroup v1 support](https://kep.k8s.io/5573).
--&gt;
&lt;p&gt;要了解更多信息，请阅读&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/architecture/cgroups/&#34;&gt;关于 cgroup v2&lt;/a&gt;；
你也可以通过 &lt;a href=&#34;https://kep.k8s.io/5573&#34;&gt;KEP-5573：移除 cgroup v1 支持&lt;/a&gt; 跟踪切换工作。&lt;/p&gt;
&lt;!--
### Deprecation of ipvs mode in kube-proxy
--&gt;
&lt;h3 id=&#34;deprecation-of-ipvs-mode-in-kube-proxy&#34;&gt;kube-proxy 中 ipvs 模式的弃用&lt;/h3&gt;
&lt;!--
Many releases ago, the Kubernetes project implemented an [ipvs](/docs/reference/networking/virtual-ips/#proxy-mode-ipvs)
mode in `kube-proxy`.
It was adopted as a way to provide high-performance service load balancing,
with better performance than the existing `iptables` mode.
However, maintaining feature parity between ipvs and other kube-proxy modes
became difficult, due to technical complexity and diverging requirements.
This created significant technical debt and made the ipvs backend impractical
to support alongside newer networking capabilities.
--&gt;
&lt;p&gt;许多版本之前，Kubernetes 项目在 &lt;code&gt;kube-proxy&lt;/code&gt; 中实现了
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/networking/virtual-ips/#proxy-mode-ipvs&#34;&gt;ipvs&lt;/a&gt; 模式。
它被采用作为一种提供高性能服务负载均衡的方式，
性能优于现有的 &lt;code&gt;iptables&lt;/code&gt; 模式。
然而，由于技术复杂性和需求分歧，
在 ipvs 和其他 kube-proxy 模式之间保持功能对等变得困难。
这造成了重大的技术债务，并使 ipvs 后端难以与更新的网络功能一起支持。&lt;/p&gt;
&lt;!--
The Kubernetes project intends to deprecate kube-proxy `ipvs` mode in the v1.35 release,
to streamline the `kube-proxy` codebase.
For Linux nodes, the recommended `kube-proxy` mode is already [nftables](/docs/reference/networking/virtual-ips/#proxy-mode-nftables).
--&gt;
&lt;p&gt;Kubernetes 项目计划在 v1.35 版本中弃用 kube-proxy &lt;code&gt;ipvs&lt;/code&gt; 模式，
以简化 &lt;code&gt;kube-proxy&lt;/code&gt; 代码库。
对于 Linux 节点，推荐的 &lt;code&gt;kube-proxy&lt;/code&gt; 模式已经是
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/networking/virtual-ips/#proxy-mode-nftables&#34;&gt;nftables&lt;/a&gt;。&lt;/p&gt;
&lt;!--
You can find more in [KEP-5495: Deprecate ipvs mode in kube-proxy](https://kep.k8s.io/5495)
--&gt;
&lt;p&gt;你可以在 &lt;a href=&#34;https://kep.k8s.io/5495&#34;&gt;KEP-5495：弃用 kube-proxy 中的 ipvs 模式&lt;/a&gt; 中找到更多信息。&lt;/p&gt;
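&lt;p&gt;如果你目前在 Linux 节点上使用 ipvs 模式，可以通过 kube-proxy 配置中的 &lt;code&gt;mode&lt;/code&gt; 字段切换到 nftables。
下面是一个最小的配置示意（仅保留与代理模式相关的字段）：&lt;/p&gt;

```yaml
# KubeProxyConfiguration 最小示意:将代理模式设置为 nftables
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "nftables"
```

&lt;p&gt;更新配置后，需要重启 kube-proxy（例如滚动重启其 DaemonSet）才会生效。&lt;/p&gt;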
&lt;!--
### Kubernetes is deprecating containerd v1.y support
--&gt;
&lt;h3 id=&#34;kubernetes-is-deprecating-containerd-v1-y-support&#34;&gt;Kubernetes 正在弃用 containerd v1.y 支持&lt;/h3&gt;
&lt;!--
While Kubernetes v1.35 still supports containerd 1.7 and other LTS releases of containerd,
as a consequence of automated cgroup driver detection,
the Kubernetes SIG Node community has formally agreed upon a final support timeline
for containerd v1.X.
Kubernetes v1.35 is the last release to offer this support (aligned with containerd 1.7 EOL).
--&gt;
&lt;p&gt;虽然 Kubernetes v1.35 仍然支持 containerd 1.7 和其他 containerd LTS 版本，
但由于自动化的 cgroup 驱动程序检测，
Kubernetes SIG Node 社区已正式商定了 containerd v1.X 的最终支持时间表。
Kubernetes v1.35 是提供此支持的最后一个版本（与 containerd 1.7 EOL 对齐）。&lt;/p&gt;
&lt;!--
This is a final warning that if you are using containerd 1.X,
you must switch to 2.0 or later before upgrading Kubernetes to the next version.
You are able to monitor the `kubelet_cri_losing_support` metric to determine
if any nodes in your cluster are using a containerd version that will soon be unsupported.
--&gt;
&lt;p&gt;这是最终警告：如果你正在使用 containerd 1.X，
必须在将 Kubernetes 升级到下一个版本之前切换到 2.0 或更高版本。
你可以监控 &lt;code&gt;kubelet_cri_losing_support&lt;/code&gt; 指标来确定
集群中的任何节点是否正在使用即将不受支持的 containerd 版本。&lt;/p&gt;
&lt;!--
You can find more in the [official blog post](/blog/2025/09/12/kubernetes-v1-34-cri-cgroup-driver-lookup-now-ga/#announcement-kubernetes-is-deprecating-containerd-v1-y-support)
or in [KEP-4033: Discover cgroup driver from CRI](https://kep.k8s.io/4033)
--&gt;
&lt;p&gt;你可以在&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2025/09/12/kubernetes-v1-34-cri-cgroup-driver-lookup-now-ga/#announcement-kubernetes-is-deprecating-containerd-v1-y-support&#34;&gt;官方博客文章&lt;/a&gt;
或 &lt;a href=&#34;https://kep.k8s.io/4033&#34;&gt;KEP-4033：从 CRI 发现 cgroup 驱动程序&lt;/a&gt; 中找到更多信息。&lt;/p&gt;
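&lt;p&gt;如果你使用 Prometheus 采集 kubelet 指标，可以基于 &lt;code&gt;kubelet_cri_losing_support&lt;/code&gt;
配置一条告警规则。下面仅为示意（假设集群中已安装 Prometheus Operator；告警名称与持续时间均为假设）：&lt;/p&gt;

```yaml
# 示意:当仍有节点运行即将不受支持的 containerd 1.x 时触发告警
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: containerd-v1-deprecation
spec:
  groups:
  - name: cri-support
    rules:
    - alert: ContainerdV1StillInUse
      expr: kubelet_cri_losing_support > 0
      for: 15m
      labels:
        severity: warning
```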
&lt;!--
## Featured enhancements of Kubernetes v1.35
--&gt;
&lt;h2 id=&#34;featured-enhancements-of-kubernetes-v1-35&#34;&gt;Kubernetes v1.35 的重点增强功能&lt;/h2&gt;
&lt;!--
The following enhancements are some of those likely to be included in the v1.35 release.
This is not a commitment, and the release content is subject to change.
--&gt;
&lt;p&gt;以下增强功能是可能包含在 v1.35 版本中的部分功能。
这不是承诺，发布内容可能会发生变化。&lt;/p&gt;
&lt;!--
### Node declared features
--&gt;
&lt;h3 id=&#34;node-declared-features&#34;&gt;节点声明式特性&lt;/h3&gt;
&lt;!--
When scheduling Pods, Kubernetes uses node labels, taints, and tolerations
to match workload requirements with node capabilities.
However, managing feature compatibility becomes challenging during cluster upgrades
due to version skew between the control plane and nodes.
This can lead to Pods being scheduled on nodes that lack required features,
resulting in runtime failures.
--&gt;
&lt;p&gt;在调度 Pod 时，Kubernetes 使用节点标签、污点和容忍度
来匹配工作负载需求与节点能力。
然而，由于控制平面和节点之间的版本偏移，
在集群升级期间管理功能兼容性变得具有挑战性。
这可能导致 Pod 被调度到缺少所需功能的节点上，从而导致运行时失败。&lt;/p&gt;
&lt;!--
The _node declared features_ framework will introduce a standard mechanism
for nodes to declare their supported Kubernetes features.
With the new alpha feature enabled, a Node reports the features it can support,
publishing this information to the control plane through a new `.status.declaredFeatures` field.
Then, the `kube-scheduler`, admission controllers and third-party components
can use these declarations.
For example, you can enforce scheduling and API validation constraints,
ensuring that Pods run only on compatible nodes.
--&gt;
&lt;p&gt;&lt;strong&gt;节点声明式特性（Node Declared Features）&lt;/strong&gt;框架将引入一种标准机制，
让节点声明其所支持的 Kubernetes 特性。
启用这一新的 Alpha 特性后，节点会报告其可以支持的特性，
通过新的 &lt;code&gt;.status.declaredFeatures&lt;/code&gt; 字段将此信息发布到控制平面。
然后，&lt;code&gt;kube-scheduler&lt;/code&gt;、准入控制器和第三方组件可以使用这些声明。
例如，你可以强制执行调度和 API 验证约束，
确保 Pod 仅在兼容的节点上运行。&lt;/p&gt;
&lt;!--
This approach reduces manual node labeling, improves scheduling accuracy,
and prevents incompatible pod placements proactively.
It also integrates with the Cluster Autoscaler for informed scale-up decisions.
Feature declarations are temporary and tied to Kubernetes feature gates,
enabling safe rollout and cleanup.
--&gt;
&lt;p&gt;这种方法可以减少手动为节点打标签的操作，提高调度准确性，
并主动防止不兼容的 Pod 放置。
它还与集群自动扩缩器（Cluster Autoscaler）集成，以便做出明智的扩容决策。
特性声明是临时性的，并与 Kubernetes 特性门控绑定，
从而实现安全的推出和清理。&lt;/p&gt;
&lt;!--
Targeting alpha in v1.35, _node declared features_ aims to solve version skew
scheduling issues by making node capabilities explicit,
enhancing reliability and cluster stability in heterogeneous version environments.
--&gt;
&lt;p&gt;&lt;strong&gt;节点声明式特性&lt;/strong&gt;的目标是在 v1.35 中达到 Alpha 阶段，它旨在通过明确节点能力
来解决版本偏移带来的调度问题，在异构版本环境中增强可靠性和集群稳定性。&lt;/p&gt;
&lt;!--
To learn more about this before the official documentation is published,
you can read [KEP-5328](https://kep.k8s.io/5328).
--&gt;
&lt;p&gt;要在官方文档发布之前了解更多信息，你可以阅读 &lt;a href=&#34;https://kep.k8s.io/5328&#34;&gt;KEP-5328&lt;/a&gt;。&lt;/p&gt;
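&lt;p&gt;启用该 Alpha 特性后，节点状态中会包含类似下面的声明。以下仅为示意
（特性名称为假设值，字段的最终形态以 KEP-5328 与正式文档为准）：&lt;/p&gt;

```yaml
# 示意:节点通过 status.declaredFeatures 上报其支持的特性
apiVersion: v1
kind: Node
metadata:
  name: node-1
status:
  declaredFeatures:
  - "ExampleFeatureA"   # 假设的特性名称
  - "ExampleFeatureB"
```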
&lt;!--
### In-place update of Pod resources
--&gt;
&lt;h3 id=&#34;in-place-update-of-pod-resources&#34;&gt;Pod 资源的原地更新&lt;/h3&gt;
&lt;!--
Kubernetes is graduating in-place updates for Pod resources to General Availability (GA).
This feature allows users to adjust `cpu` and `memory` resources
without restarting Pods or Containers.
Previously, such modifications required recreating Pods,
which could disrupt workloads, particularly for stateful or batch applications.
--&gt;
&lt;p&gt;Kubernetes 正在将 Pod 资源的原地更新提升到正式发布（GA）状态。
此特性允许用户在不重启 Pod 或容器的情况下调整 &lt;code&gt;cpu&lt;/code&gt; 和 &lt;code&gt;memory&lt;/code&gt; 资源。
以前，此类修改需要重新创建 Pod，这可能会中断工作负载，
特别是对于有状态或批处理应用程序。&lt;/p&gt;
&lt;!--
Previous Kubernetes releases already allowed you to change infrastructure resources settings
(requests and limits) for existing Pods.
This allows for smoother [vertical scaling](/docs/concepts/workloads/autoscaling/vertical-pod-autoscale/),
improves efficiency, and can also simplify solution development.
--&gt;
&lt;p&gt;之前的 Kubernetes 版本已经允许你更改现有 Pod 的基础设施资源设置（requests 和 limits）。
这允许更平滑的&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/docs/concepts/workloads/autoscaling/vertical-pod-autoscale/&#34;&gt;垂直扩缩容&lt;/a&gt;，
提高效率，还可以简化解决方案开发。&lt;/p&gt;
&lt;!--
The Container Runtime Interface (CRI) has also been improved,
extending the `UpdateContainerResources` API for Windows and future runtimes
while allowing `ContainerStatus` to report real-time resource configurations.
Together, these changes make scaling in Kubernetes faster, more flexible, and disruption-free.
The feature was introduced as alpha in v1.27, graduated to beta in v1.33,
and is targeting graduation to stable in v1.35.
--&gt;
&lt;p&gt;容器运行时接口（CRI）也得到了改进，
为 Windows 和未来的运行时扩展了 &lt;code&gt;UpdateContainerResources&lt;/code&gt; API，
同时允许 &lt;code&gt;ContainerStatus&lt;/code&gt; 报告实时的资源配置情况。
这些更改一起使 Kubernetes 中的扩缩容更快、更灵活且无中断。
此特性在 v1.27 中作为 Alpha 特性引入，在 v1.33 中升级到 Beta，
并且计划在 v1.35 中升级到稳定状态。&lt;/p&gt;
&lt;!--
You can find more in [KEP-1287: In-place Update of Pod Resources](https://kep.k8s.io/1287)
--&gt;
&lt;p&gt;你可以在 &lt;a href=&#34;https://kep.k8s.io/1287&#34;&gt;KEP-1287：Pod 资源的原地更新&lt;/a&gt; 中找到更多信息。&lt;/p&gt;
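&lt;p&gt;下面是一个使用 &lt;code&gt;resizePolicy&lt;/code&gt; 的 Pod 示例草图，
声明调整 cpu 与 memory 时无需重启容器（镜像仅为占位用途）：&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    resizePolicy:                 # 声明调整各项资源时是否需要重启容器
    - resourceName: cpu
      restartPolicy: NotRequired
    - resourceName: memory
      restartPolicy: NotRequired
    resources:
      requests:
        cpu: 500m
        memory: 128Mi
      limits:
        cpu: "1"
        memory: 256Mi
```

&lt;p&gt;之后可以通过 Pod 的 resize 子资源（例如 &lt;code&gt;kubectl patch&lt;/code&gt; 加
&lt;code&gt;--subresource resize&lt;/code&gt;）直接修改 requests 与 limits，而无需重建 Pod。&lt;/p&gt;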
&lt;!--
### Pod certificates
--&gt;
&lt;h3 id=&#34;pod-certificates&#34;&gt;Pod 证书&lt;/h3&gt;
&lt;!--
When running microservices, Pods often require a strong cryptographic identity
to authenticate with each other using mutual TLS (mTLS).
While Kubernetes provides Service Account tokens,
these are designed for authenticating to the API server,
not for general-purpose workload identity.
--&gt;
&lt;p&gt;在运行微服务时，Pod 通常需要强加密身份，
以便使用双向 TLS（mTLS）相互进行身份认证。
虽然 Kubernetes 提供服务账号令牌，
但这些令牌设计用于向 API 服务器进行身份认证，
而不是用于通用工作负载身份。&lt;/p&gt;
&lt;!--
Before this enhancement, operators had to rely on complex, external projects
like SPIFFE/SPIRE or cert-manager to provision and rotate certificates for their workloads.
But what if you could issue a unique, short-lived certificate to your Pods natively and automatically?
KEP-4317 is designed to enable such native workload identity.
It opens up various possibilities for securing pod-to-pod communication
by allowing the `kubelet` to request and mount certificates for a Pod via a projected volume.
--&gt;
&lt;p&gt;在此增强之前，集群运维人员必须依赖复杂的外部项目（如 SPIFFE/SPIRE 或 cert-manager）
来为其工作负载提供和轮换证书。
但是，如果你可以原生且自动地为 Pod 颁发唯一的短期证书呢？
KEP-4317 旨在启用这种原生工作负载身份。
它通过允许 &lt;code&gt;kubelet&lt;/code&gt; 通过投影卷为 Pod 请求和挂载证书，
为保护 Pod 到 Pod 的通信开辟了多种可能性。&lt;/p&gt;
&lt;!--
This provides a built-in mechanism for workload identity,
complete with automated certificate rotation,
significantly simplifying the setup of service meshes and other zero-trust network policies.
This feature was introduced as alpha in v1.34 and is targeting beta in v1.35.
--&gt;
&lt;p&gt;Pod 证书为工作负载身份提供了一种内置的机制，包括自动证书轮换，
显著简化了服务网格和其他零信任网络策略的设置。
该特性在 v1.34 中作为 Alpha 特性引入，目标是在 v1.35 中达到 Beta 阶段。&lt;/p&gt;
&lt;!--
You can find more in [KEP-4317: Pod Certificates](https://kep.k8s.io/4317)
--&gt;
&lt;p&gt;你可以在 &lt;a href=&#34;https://kep.k8s.io/4317&#34;&gt;KEP-4317：Pod 证书&lt;/a&gt; 中找到更多信息。&lt;/p&gt;
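&lt;p&gt;下面是这种投影卷的一个使用草图（&lt;code&gt;signerName&lt;/code&gt; 为假设值，
字段细节以 KEP-4317 与正式文档为准）：&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-cert-demo
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    volumeMounts:
    - name: workload-certs
      mountPath: /var/run/certs
      readOnly: true
  volumes:
  - name: workload-certs
    projected:
      sources:
      - podCertificate:
          signerName: example.com/my-signer   # 假设的签名者名称
          keyType: ECDSAP256
          credentialBundlePath: credentials.pem
```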
&lt;!--
### Numeric values for taints
--&gt;
&lt;h3 id=&#34;numeric-values-for-taints&#34;&gt;数值形式的污点&lt;/h3&gt;
&lt;!--
Kubernetes is enhancing [taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/)
by adding numeric comparison operators, such as `Gt` (Greater Than) and `Lt` (Less Than).
--&gt;
&lt;p&gt;Kubernetes 正在通过添加数值比较运算符（如 &lt;code&gt;Gt&lt;/code&gt;（大于）和 &lt;code&gt;Lt&lt;/code&gt;（小于））
来增强&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/scheduling-eviction/taint-and-toleration/&#34;&gt;污点和容忍度&lt;/a&gt;。&lt;/p&gt;
&lt;!--
Previously, tolerations supported only exact (`Equal`) or existence (`Exists`) matches,
which were not suitable for numeric properties such as reliability SLAs.
--&gt;
&lt;p&gt;以前，容忍度仅支持精确（&lt;code&gt;Equal&lt;/code&gt;）或存在（&lt;code&gt;Exists&lt;/code&gt;）匹配，
这不适用于可靠性 SLA 等数值属性。&lt;/p&gt;
&lt;!--
With this change, a Pod can use a toleration to &#34;opt-in&#34; to nodes
that meet a specific numeric threshold.
For example, a Pod can require a Node with an SLA taint value greater than 950
(`operator: Gt`, `value: &#34;950&#34;`).
--&gt;
&lt;p&gt;通过此更改，Pod 可以使用容忍度“选择加入”（opt-in）满足特定数值阈值的节点。
例如，Pod 可以要求 SLA 污点值大于 950 的节点（&lt;code&gt;operator: Gt&lt;/code&gt;，&lt;code&gt;value: &amp;quot;950&amp;quot;&lt;/code&gt;）。&lt;/p&gt;
&lt;!--
This approach is more powerful than Node Affinity because it supports the NoExecute effect,
allowing Pods to be automatically evicted if a node&#39;s numeric value
drops below the tolerated threshold.
--&gt;
&lt;p&gt;这种方法比节点亲和性更强大，因为它支持 NoExecute 效果，
如果节点的数值降至容忍阈值以下，允许自动驱逐 Pod。&lt;/p&gt;
&lt;!--
You can find more in [KEP-5471: Enable SLA-based Scheduling](https://kep.k8s.io/5471)
--&gt;
&lt;p&gt;你可以在 &lt;a href=&#34;https://kep.k8s.io/5471&#34;&gt;KEP-5471：启用基于 SLA 的调度&lt;/a&gt; 中找到更多信息。&lt;/p&gt;
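&lt;p&gt;以下是这种容忍度的一个示意写法（污点键为假设值；该特性仍在提案阶段，
最终语法以 KEP-5471 为准）：&lt;/p&gt;

```yaml
# 示意:仅容忍 SLA 污点值大于 950 的节点,低于阈值时驱逐 Pod
tolerations:
- key: example.com/sla    # 假设的数值型污点键
  operator: Gt
  value: "950"
  effect: NoExecute
```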
&lt;!--
### User namespaces
--&gt;
&lt;h3 id=&#34;user-namespaces&#34;&gt;用户名字空间&lt;/h3&gt;
&lt;!--
When running Pods, you can use `securityContext` to drop privileges,
but containers inside the pod often still run as root (UID 0).
This simplicity poses a significant challenge,
as that container UID 0 maps directly to the host&#39;s root user.
--&gt;
&lt;p&gt;在运行 Pod 时，你可以使用 &lt;code&gt;securityContext&lt;/code&gt; 来去除特权，
但 Pod 内的容器通常仍以 root（UID 0）运行。
这种简单性带来了重大挑战，因为容器 UID 0 直接映射到主机的 root 用户。&lt;/p&gt;
&lt;!--
Before this enhancement, a container breakout vulnerability
could grant an attacker full root access to the node.
But what if you could dynamically remap the container&#39;s root user
to a safe, unprivileged user on the host?
KEP-127 specifically allows such native support for Linux User Namespaces.
It opens up various possibilities for pod security
by isolating container and host user/group IDs.
This allows a process to have root privileges (UID 0) within its namespace,
while running as a non-privileged, high-numbered UID on the host.
--&gt;
&lt;p&gt;在此增强之前，容器逃逸漏洞可能授予攻击者对节点的完全 root 访问权限。
但是，如果你可以将容器的 root 用户动态重新映射到主机上的安全、无特权用户呢？
KEP-127 正是为 Linux 用户名字空间提供了这种原生支持。
它通过隔离容器和主机用户/组 ID 为 Pod 安全开辟了各种可能性。
这允许进程在其名字空间内拥有 root 权限（UID 0），
同时在主机上以非特权的高编号 UID 运行。&lt;/p&gt;
&lt;!--
Released as alpha in v1.25 and beta in v1.30,
this feature continues to progress through beta maturity,
paving the way for truly &#34;rootless&#34; containers
that drastically reduce the attack surface for a whole class of security vulnerabilities.
--&gt;
&lt;p&gt;该特性在 v1.25 中作为 Alpha 特性发布，并在 v1.30 中进阶到 Beta 阶段，
在 Beta 成熟度级别，此特性仍在进一步演化，
为真正的“无 root”容器铺平道路，
这些改进大大减少了一整类安全漏洞的攻击面。&lt;/p&gt;
&lt;!--
You can find more in [KEP-127: User Namespaces](https://kep.k8s.io/127)
--&gt;
&lt;p&gt;你可以在 &lt;a href=&#34;https://kep.k8s.io/127&#34;&gt;KEP-127：用户名字空间&lt;/a&gt; 中找到更多信息。&lt;/p&gt;
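&lt;p&gt;要为 Pod 启用用户名字空间，可以在 Pod spec 中设置 &lt;code&gt;hostUsers: false&lt;/code&gt;，例如：&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false        # 容器内的 UID/GID 将映射到主机上的非特权区间
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
```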
&lt;!--
### Support for mounting OCI images as volumes
--&gt;
&lt;h3 id=&#34;support-for-mounting-oci-images-as-volumes&#34;&gt;支持将 OCI 镜像挂载为卷&lt;/h3&gt;
&lt;!--
When provisioning a Pod, you often need to bundle data, binaries,
or configuration files for your containers.
Before this enhancement, people often included that kind of data
directly into the main container image,
or required a custom init container to download and unpack files into an `emptyDir`.
You can still take either of those approaches, of course.
--&gt;
&lt;p&gt;在配置 Pod 时，你经常需要为容器打包数据、二进制文件或配置文件。
在此增强之前，人们通常将此类数据直接包含在主容器镜像中，
或需要自定义 Init 容器将文件下载并解压到 &lt;code&gt;emptyDir&lt;/code&gt; 中。
当然，你仍然可以采用这两种方法中的任何一种。&lt;/p&gt;
&lt;!--
But what if you could populate a volume directly from a data-only artifact
in an OCI registry, just like pulling a container image?
Kubernetes v1.31 added support for the `image` volume type,
allowing Pods to pull and unpack OCI container image artifacts into a volume declaratively.
--&gt;
&lt;p&gt;但是，如果你可以直接使用 OCI 镜像仓库（registry）中的纯数据工件填充卷，
就像拉取容器镜像一样呢？
Kubernetes v1.31 添加了对 &lt;code&gt;image&lt;/code&gt; 卷类型的支持，
允许 Pod 以声明的方式将 OCI 容器镜像工件拉取并解压到卷中。&lt;/p&gt;
&lt;!--
This allows for seamless distribution of data, binaries, or ML models
using standard registry tooling,
completely decoupling data from the container image
and eliminating the need for complex init containers or startup scripts.
This volume type has been in beta since v1.33
and will likely be enabled by default in v1.35.
--&gt;
&lt;p&gt;这一特性使我们能够使用标准镜像仓库工具无缝分发数据、二进制文件或 ML 模型，
完全将数据与容器镜像解耦，并消除对复杂 Init 容器或启动脚本的需求。
此卷类型自 v1.33 以来一直处于 Beta 状态，并可能在 v1.35 中默认启用。&lt;/p&gt;
&lt;!--
You can try out the beta version of [`image` volumes](/docs/concepts/storage/volumes/#image),
or you can learn more about the plans from [KEP-4639: OCI Volume Source](https://kep.k8s.io/4639).
--&gt;
&lt;p&gt;你可以试用 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/storage/volumes/#image&#34;&gt;&lt;code&gt;image&lt;/code&gt; 卷&lt;/a&gt; 的 Beta 版本，
或者你可以从 &lt;a href=&#34;https://kep.k8s.io/4639&#34;&gt;KEP-4639：OCI 卷源&lt;/a&gt; 了解更多计划。&lt;/p&gt;
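&lt;p&gt;下面是一个 &lt;code&gt;image&lt;/code&gt; 卷的使用草图（制品引用为假设值）：&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: image-volume-demo
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    volumeMounts:
    - name: data
      mountPath: /data
      readOnly: true
  volumes:
  - name: data
    image:
      reference: example.com/data-artifact:v1   # 假设的纯数据 OCI 制品
      pullPolicy: IfNotPresent
```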
&lt;!--
## Want to know more?
--&gt;
&lt;h2 id=&#34;want-to-know-more&#34;&gt;想了解更多？&lt;/h2&gt;
&lt;!--
New features and deprecations are also announced in the Kubernetes release notes.
We will formally announce what&#39;s new in [Kubernetes v1.35](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.35.md)
as part of the CHANGELOG for that release.
--&gt;
&lt;p&gt;新特性和弃用也在 Kubernetes 发布说明中宣布。
我们将正式宣布 &lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.35.md&#34;&gt;Kubernetes v1.35&lt;/a&gt; 的新内容，
作为该版本 CHANGELOG 的一部分。&lt;/p&gt;
&lt;!--
The Kubernetes v1.35 release is planned for **December 17, 2025**. Stay tuned for updates!
--&gt;
&lt;p&gt;Kubernetes v1.35 版本计划于 &lt;strong&gt;2025 年 12 月 17 日&lt;/strong&gt;发布。请关注更新！&lt;/p&gt;
&lt;!--
You can also see the announcements of changes in the release notes for:
--&gt;
&lt;p&gt;你还可以在以下版本的发布说明中查看变更公告：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.34.md&#34;&gt;Kubernetes v1.34&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.33.md&#34;&gt;Kubernetes v1.33&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.32.md&#34;&gt;Kubernetes v1.32&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md&#34;&gt;Kubernetes v1.31&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md&#34;&gt;Kubernetes v1.30&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Get involved
--&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;参与进来&lt;/h2&gt;
&lt;!--
The simplest way to get involved with Kubernetes is by joining one of the many
[Special Interest Groups](https://github.com/kubernetes/community/blob/master/sig-list.md) (SIGs)
that align with your interests.
Have something you&#39;d like to broadcast to the Kubernetes community?
Share your voice at our weekly [community meeting](https://github.com/kubernetes/community/tree/master/communication),
and through the channels below.
Thank you for your continued feedback and support.
--&gt;
&lt;p&gt;参与 Kubernetes 最简单的方法是加入众多&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-list.md&#34;&gt;特别兴趣小组&lt;/a&gt;（SIG）
中与你兴趣相符的一个。有什么想向 Kubernetes 社区传达的内容吗？
在我们的每周&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/communication&#34;&gt;社区会议&lt;/a&gt;上
以及通过下面的渠道分享你的声音。感谢你持续的反馈和支持。&lt;/p&gt;
&lt;!--
- Follow us on Bluesky [@kubernetes.io](https://bsky.app/profile/kubernetes.io) for the latest updates
- Join the community discussion on [Discuss](https://discuss.kubernetes.io/)
- Join the community on [Slack](http://slack.k8s.io/)
- Post questions (or answer questions) on [Server Fault](https://serverfault.com/questions/tagged/kubernetes) or [Stack Overflow](http://stackoverflow.com/questions/tagged/kubernetes)
- Share your Kubernetes [story](https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform)
- Read more about what&#39;s happening with Kubernetes on the [blog](https://kubernetes.io/blog/)
- Learn more about the [Kubernetes Release Team](https://github.com/kubernetes/sig-release/tree/master/release-team)
--&gt;
&lt;ul&gt;
&lt;li&gt;在 Bluesky 上关注我们 &lt;a href=&#34;https://bsky.app/profile/kubernetes.io&#34;&gt;@kubernetes.io&lt;/a&gt; 获取最新动态&lt;/li&gt;
&lt;li&gt;在 &lt;a href=&#34;https://discuss.kubernetes.io/&#34;&gt;Discuss&lt;/a&gt; 上加入社区讨论&lt;/li&gt;
&lt;li&gt;在 &lt;a href=&#34;http://slack.k8s.io/&#34;&gt;Slack&lt;/a&gt; 上加入社区&lt;/li&gt;
&lt;li&gt;在 &lt;a href=&#34;https://serverfault.com/questions/tagged/kubernetes&#34;&gt;Server Fault&lt;/a&gt; 或
&lt;a href=&#34;http://stackoverflow.com/questions/tagged/kubernetes&#34;&gt;Stack Overflow&lt;/a&gt; 上发布问题（或回答问题）&lt;/li&gt;
&lt;li&gt;分享你的 Kubernetes &lt;a href=&#34;https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform&#34;&gt;故事&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;在&lt;a href=&#34;https://kubernetes.io/zh-cn/blog/&#34;&gt;博客&lt;/a&gt;上阅读更多关于 Kubernetes 正在发生的事情&lt;/li&gt;
&lt;li&gt;了解更多关于 &lt;a href=&#34;https://github.com/kubernetes/sig-release/tree/master/release-team&#34;&gt;Kubernetes 发布团队&lt;/a&gt; 的信息&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes 配置最佳实践</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/11/25/configuration-good-practices/</link>
      <pubDate>Tue, 25 Nov 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/11/25/configuration-good-practices/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes Configuration Good Practices&#34;
date: 2025-11-25T00:00:00+00:00
slug: configuration-good-practices
evergreen: true
author: Kirti Goyal
--&gt;
&lt;!--
Configuration is one of those things in Kubernetes that seems small until it&#39;s not.
Configuration is at the heart of every Kubernetes workload.
A missing quote, a wrong API version or a misplaced YAML indent can ruin your entire deploy.
--&gt;
&lt;p&gt;配置是 Kubernetes 中看似微不足道，实则关键的事情之一。
配置是每个 Kubernetes 工作负载的核心。
一个缺失的引号、错误的 API 版本或错位的 YAML 缩进都可能毁掉你的整个部署。&lt;/p&gt;
&lt;!--
This blog brings together tried-and-tested configuration best practices.
The small habits that make your Kubernetes setup clean, consistent and easier to manage.
Whether you are just starting out or already deploying apps daily,
these are the little things that keep your cluster stable and your future self sane.
--&gt;
&lt;p&gt;本博客汇集了经过验证的配置最佳实践。
这些小的习惯让你的 Kubernetes 设置更干净、一致且更易于管理。
无论你是刚刚开始还是已经在每天部署应用，
这些都是让你的集群保持稳定、让未来的你保持理智的小细节。&lt;/p&gt;
&lt;!--
_This blog is inspired by the original *Configuration Best Practices* page,
which has evolved through contributions from many members of the Kubernetes community._
--&gt;
&lt;p&gt;&lt;strong&gt;本博客的灵感源自最初的 Configuration Best Practices（配置最佳实践）页面，
该页面由 Kubernetes 社区众多成员的贡献不断演进而来。&lt;/strong&gt;&lt;/p&gt;
&lt;!--
## General configuration practices
--&gt;
&lt;h2 id=&#34;general-configuration-practices&#34;&gt;通用配置实践&lt;/h2&gt;
&lt;!--
### Use the latest stable API version
Kubernetes evolves fast. Older APIs eventually get deprecated and stop working.
So, whenever you are defining resources, make sure you are using the latest stable API version.
You can always check with
--&gt;
&lt;h3 id=&#34;use-the-latest-stable-api-version&#34;&gt;使用最新的稳定 API 版本&lt;/h3&gt;
&lt;p&gt;Kubernetes 发展很快。旧版 API 最终会被弃用并停止工作。
因此，在定义资源时，请确保使用最新的稳定 API 版本。
你可以随时使用以下命令检查：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl api-resources
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This simple step saves you from future compatibility issues.
--&gt;
&lt;p&gt;这个简单的步骤可以让你避免未来的兼容性问题。&lt;/p&gt;
&lt;!--
### Store configuration in version control
Never apply manifest files directly from your desktop.
Always keep them in a version control system like Git; it&#39;s your safety net.
If something breaks, you can instantly roll back to a previous commit,
compare changes or recreate your cluster setup without panic.
--&gt;
&lt;h3 id=&#34;store-configuration-in-version-control&#34;&gt;将配置存储在版本控制中&lt;/h3&gt;
&lt;p&gt;永远不要直接从桌面应用清单文件。
始终将它们保存在像 Git 这样的版本控制系统中，这是你的安全网。
如果出现问题，你可以立即回滚到之前的提交、
比较更改或重新创建集群设置，而不会惊慌。&lt;/p&gt;
&lt;!--
### Write configs in YAML not JSON
Write your configuration files using YAML rather than JSON.
Both work technically, but YAML is just easier for humans.
It&#39;s cleaner to read, less noisy, and widely used in the community.
--&gt;
&lt;h3 id=&#34;write-configs-in-yaml-not-json&#34;&gt;使用 YAML 而不是 JSON 编写配置&lt;/h3&gt;
&lt;p&gt;使用 YAML 而不是 JSON 编写配置文件。
两者在技术上都可以工作，但 YAML 对人类来说更容易。
它更易读、更简洁，并在社区中广泛使用。&lt;/p&gt;
&lt;!--
YAML has some sneaky gotchas with boolean values:
Use only `true` or `false`.
Don&#39;t write `yes`, `no`, `on` or  `off`.
They might work in one version of YAML but break in another.
To be safe, quote anything that looks like a Boolean (for example `&#34;yes&#34;`).
--&gt;
&lt;p&gt;YAML 在布尔值方面有一些隐藏的陷阱：
只使用 &lt;code&gt;true&lt;/code&gt; 或 &lt;code&gt;false&lt;/code&gt;。
不要写 &lt;code&gt;yes&lt;/code&gt;、&lt;code&gt;no&lt;/code&gt;、&lt;code&gt;on&lt;/code&gt; 或 &lt;code&gt;off&lt;/code&gt;。
它们可能在某一个 YAML 版本中可以工作，但在另一个版本中会失败。
为了安全起见，请给任何看起来像布尔值的内容加引号（例如 &lt;code&gt;&amp;quot;yes&amp;quot;&lt;/code&gt;）。&lt;/p&gt;
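&lt;p&gt;举个例子（示意性质的 ConfigMap，键名均为假设值）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
data:
  enable-cache: &#34;true&#34;   # 始终使用 true/false，并加引号（ConfigMap 的值必须是字符串）
  legacy-mode: &#34;no&#34;      # 看起来像布尔值的字符串务必加引号
&lt;/code&gt;&lt;/pre&gt;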
&lt;!--
### Keep configuration simple and minimal
Avoid setting default values that are already handled by Kubernetes.
Minimal manifests are easier to debug, cleaner to review and less likely to break things later.
--&gt;
&lt;h3 id=&#34;keep-configuration-simple-and-minimal&#34;&gt;保持配置简单和最小化&lt;/h3&gt;
&lt;p&gt;避免设置 Kubernetes 已经处理的默认值。
最小化的清单更容易调试、更易于审查，并且以后不太可能破坏东西。&lt;/p&gt;
&lt;!--
### Group related objects together
If your Deployment, Service and ConfigMap all belong to one app, put them in a single manifest file.
It&#39;s easier to track changes and apply them as a unit.
See the [Guestbook all-in-one.yaml](https://github.com/kubernetes/examples/blob/master/web/guestbook/all-in-one/guestbook-all-in-one.yaml) file for an example of this syntax.
--&gt;
&lt;h3 id=&#34;group-related-objects-together&#34;&gt;将相关对象分组在一起&lt;/h3&gt;
&lt;p&gt;如果你的 Deployment、Service 和 ConfigMap 都属于一个应用，
请将它们放在一个清单文件中。
这样更容易跟踪更改并将它们作为一个单元应用。
有关此语法的示例，请参阅
&lt;a href=&#34;https://github.com/kubernetes/examples/blob/master/web/guestbook/all-in-one/guestbook-all-in-one.yaml&#34;&gt;Guestbook all-in-one.yaml&lt;/a&gt; 文件。&lt;/p&gt;
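&lt;p&gt;例如，可以在同一个文件中用 &lt;code&gt;---&lt;/code&gt; 分隔多个对象（以下为简化的示意片段，省略了大部分字段）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
# ……Deployment 的其余字段
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
# ……Service 的其余字段
&lt;/code&gt;&lt;/pre&gt;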
&lt;!--
You can even apply entire directories with:
--&gt;
&lt;p&gt;你甚至可以使用以下命令应用整个目录：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f configs/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
One command and boom everything in that folder gets deployed.
--&gt;
&lt;p&gt;只需一个命令，该文件夹中的所有内容都会被部署。&lt;/p&gt;
&lt;!--
### Add helpful annotations
Manifest files are not just for machines, they are for humans too.
Use annotations to describe why something exists or what it does.
A quick one-liner can save hours when debugging later and also allows better collaboration.
--&gt;
&lt;h3 id=&#34;add-helpful-annotations&#34;&gt;添加有用的注解&lt;/h3&gt;
&lt;p&gt;清单文件不仅是为机器准备的，也是为人类准备的。
使用注解来描述某些内容存在的原因或它的作用。
快速的一行注释可以在以后调试时节省数小时，并且还可以实现更好的协作。&lt;/p&gt;
&lt;!--
The most helpful annotation to set is `kubernetes.io/description`.
It&#39;s like using a comment, except that it gets copied into the API
so that everyone else can see it even after you deploy.
--&gt;
&lt;p&gt;最有用的注解是 &lt;code&gt;kubernetes.io/description&lt;/code&gt;。
这就像使用注释一样，只是它会被复制到 API 中，
这样其他人在你部署后也能看到它。&lt;/p&gt;
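&lt;p&gt;例如（示意片段，名称和描述内容均为假设值）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;metadata:
  name: payment-gateway
  annotations:
    kubernetes.io/description: &#34;处理对外支付回调；由 checkout 服务调用&#34;
&lt;/code&gt;&lt;/pre&gt;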
&lt;!--
## Managing Workloads: Pods, Deployments, and Jobs
--&gt;
&lt;h2 id=&#34;managing-workloads-pods-deployments-and-jobs&#34;&gt;管理工作负载：Pod、Deployment 和 Job&lt;/h2&gt;
&lt;!--
A common early mistake in Kubernetes is creating Pods directly.
Pods work, but they don&#39;t reschedule themselves if something goes wrong.
--&gt;
&lt;p&gt;在 Kubernetes 中，一个常见的早期错误是直接创建 Pod。
Pod 可以工作，但如果出现问题，它们不会重新调度自己。&lt;/p&gt;
&lt;!--
_Naked Pods_ (Pods not managed by a controller, such as [Deployment](/docs/concepts/workloads/controllers/deployment/) or a [StatefulSet](/docs/concepts/workloads/controllers/statefulset/)) are fine for testing, but in real setups, they are risky.
--&gt;
&lt;p&gt;&lt;strong&gt;裸 Pod&lt;/strong&gt;（不受控制器管理的 Pod，例如
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/deployment/&#34;&gt;Deployment&lt;/a&gt; 或
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/statefulset/&#34;&gt;StatefulSet&lt;/a&gt;）
用于测试是可以的，但在实际设置中，它们是有风险的。&lt;/p&gt;
&lt;!--
Why?
Because if the node hosting that Pod dies, the Pod dies with it
and Kubernetes won&#39;t bring it back automatically.
--&gt;
&lt;p&gt;为什么？
因为如果托管该 Pod 的节点死亡，Pod 也会随之死亡，
Kubernetes 不会自动将其恢复。&lt;/p&gt;
&lt;!--
### Use Deployments for apps that should always be running
A Deployment, which both creates a ReplicaSet to ensure that the desired number of Pods is always available,
and specifies a strategy to replace Pods (such as [RollingUpdate](/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment)),
is almost always preferable to creating Pods directly.
You can roll out a new version, and if something breaks, roll back instantly.
--&gt;
&lt;h3 id=&#34;use-deployments-for-apps-that-should-always-be-running&#34;&gt;对应该始终运行的应用使用 Deployment&lt;/h3&gt;
&lt;p&gt;Deployment 既创建 ReplicaSet 以确保所需数量的 Pod 始终可用，
又指定替换 Pod 的策略（例如&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment&#34;&gt;滚动更新&lt;/a&gt;），
几乎总是比直接创建 Pod 更可取。
你可以推出新版本，如果出现问题，可以立即回滚。&lt;/p&gt;
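&lt;p&gt;一个最小化的 Deployment 清单大致如下（示意性质，名称、副本数和镜像均为假设值）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  strategy:
    type: RollingUpdate   # 默认策略，这里显式写出仅作演示
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27
&lt;/code&gt;&lt;/pre&gt;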
&lt;!--
### Use Jobs for tasks that should finish
A [Job](/docs/concepts/workloads/controllers/job/) is perfect when you need something to run once and then stop,
like a database migration or a batch processing task.
It will retry if the Pod fails and report success when it&#39;s done.
--&gt;
&lt;h3 id=&#34;use-jobs-for-tasks-that-should-finish&#34;&gt;对应该完成的任务使用 Job&lt;/h3&gt;
&lt;p&gt;当你需要某些东西运行一次然后停止时（如数据库迁移或批处理任务），
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/job/&#34;&gt;Job&lt;/a&gt; 是完美的选择。
如果 Pod 失败，它会重试，并在完成时报告成功。&lt;/p&gt;
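&lt;p&gt;一个简化的 Job 示例如下（示意性质，名称和镜像均为假设值）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 3          # Pod 失败时最多重试 3 次
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: registry.example.com/migrate:v1
&lt;/code&gt;&lt;/pre&gt;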
&lt;!--
## Service Configuration and Networking
--&gt;
&lt;h2 id=&#34;service-configuration-and-networking&#34;&gt;Service 配置和网络&lt;/h2&gt;
&lt;!--
Services are how your workloads talk to each other inside (and sometimes outside) your cluster.
Without them, your pods exist but can&#39;t reach anyone. Let&#39;s make sure that doesn&#39;t happen.
--&gt;
&lt;p&gt;Service 是你的工作负载在集群内部（有时是外部）相互通信的方式。
没有它们，你的 Pod 虽然存在，却无法与任何人通信。让我们确保这种情况不会发生。&lt;/p&gt;
&lt;!--
### Create Services before workloads that use them
When Kubernetes starts a Pod, it automatically injects environment variables for existing Services.
So, if a Pod depends on a Service, create a [Service](/docs/concepts/services-networking/service/) **before** its corresponding backend workloads (Deployments or StatefulSets),
and before any workloads that need to access it.
--&gt;
&lt;h3 id=&#34;create-services-before-workloads-that-use-them&#34;&gt;在使用它们的工作负载之前创建 Service&lt;/h3&gt;
&lt;p&gt;当 Kubernetes 启动 Pod 时，它会自动为现有 Service 注入环境变量。
因此，如果 Pod 依赖于 Service，请在其相应的后端工作负载（Deployment 或 StatefulSet）
以及任何需要访问它的工作负载&lt;strong&gt;之前&lt;/strong&gt;创建 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/services-networking/service/&#34;&gt;Service&lt;/a&gt;。&lt;/p&gt;
&lt;!--
For example, if a Service named foo exists, all containers will get the following variables in their initial environment:
--&gt;
&lt;p&gt;例如，如果存在名为 foo 的 Service，所有容器将在其初始环境中获得以下变量：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;FOO_SERVICE_HOST=&amp;lt;the host the Service runs on&amp;gt;
FOO_SERVICE_PORT=&amp;lt;the port the Service runs on&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;!--
DNS based discovery doesn&#39;t have this problem, but it&#39;s a good habit to follow anyway.
--&gt;
&lt;p&gt;基于 DNS 的发现没有这个问题，但无论如何，先创建 Service 仍是一个值得遵循的好习惯。&lt;/p&gt;
&lt;!--
### Use DNS for Service discovery
If your cluster has the DNS [add-on](/docs/concepts/cluster-administration/addons/) (most do),
every Service automatically gets a DNS entry.
That means you can access it by name instead of IP:
--&gt;
&lt;h3 id=&#34;use-dns-for-service-discovery&#34;&gt;使用 DNS 进行 Service 发现&lt;/h3&gt;
&lt;p&gt;如果你的集群有 DNS &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/cluster-administration/addons/&#34;&gt;安装扩展（Addon）&lt;/a&gt;（大多数都有），
每个 Service 都会自动获得一个 DNS 条目。
这意味着你可以通过名称而不是 IP 访问它：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;curl http://my-service.default.svc.cluster.local
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
It&#39;s one of those features that makes Kubernetes networking feel magical.
--&gt;
&lt;p&gt;这是让 Kubernetes 网络感觉神奇的特性之一。&lt;/p&gt;
&lt;!--
### Avoid `hostPort` and `hostNetwork` unless absolutely necessary
You&#39;ll sometimes see these options in manifests:
--&gt;
&lt;h3 id=&#34;avoid-hostport-and-hostnetwork-unless-absolutely-necessary&#34;&gt;除非绝对必要，否则避免使用 &lt;code&gt;hostPort&lt;/code&gt; 和 &lt;code&gt;hostNetwork&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;你有时会在清单中看到这些选项：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostPort&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8080&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostNetwork&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
But here&#39;s the thing:
They tie your Pods to specific nodes, making them harder to schedule and scale.
Because each &lt;`hostIP`, `hostPort`, `protocol`&gt; combination must be unique.
If you don&#39;t specify the `hostIP` and `protocol` explicitly,
Kubernetes will use `0.0.0.0` as the default `hostIP` and `TCP` as the default `protocol`.
Unless you&#39;re debugging or building something like a network plugin, avoid them.
--&gt;
&lt;p&gt;但问题是：
它们将你的 Pod 绑定到特定节点，使它们更难调度和扩缩容。
因为每个 &amp;lt;&lt;code&gt;hostIP&lt;/code&gt;、&lt;code&gt;hostPort&lt;/code&gt;、&lt;code&gt;protocol&lt;/code&gt;&amp;gt; 组合必须是唯一的。
如果你没有明确指定 &lt;code&gt;hostIP&lt;/code&gt; 和 &lt;code&gt;protocol&lt;/code&gt;，
Kubernetes 将使用 &lt;code&gt;0.0.0.0&lt;/code&gt; 作为默认 &lt;code&gt;hostIP&lt;/code&gt;，使用 &lt;code&gt;TCP&lt;/code&gt; 作为默认 &lt;code&gt;protocol&lt;/code&gt;。
除非你在调试或构建网络插件之类的东西，否则请避免使用它们。&lt;/p&gt;
&lt;!--
If you just need local access for testing, try [`kubectl port-forward`](/docs/reference/kubectl/generated/kubectl_port-forward/):
--&gt;
&lt;p&gt;如果你只需要本地访问进行测试，请尝试 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/kubectl/generated/kubectl_port-forward/&#34;&gt;&lt;code&gt;kubectl port-forward&lt;/code&gt;&lt;/a&gt;：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl port-forward deployment/web 8080:80
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
See [Use Port Forwarding to access applications in a cluster](/docs/tasks/access-application-cluster/port-forward-access-application-cluster/) to learn more.
Or if you really need external access, use a [`type: NodePort` Service](/docs/concepts/services-networking/service/#type-nodeport). That&#39;s the safer, Kubernetes-native way.
--&gt;
&lt;p&gt;有关更多信息，请参阅
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/access-application-cluster/port-forward-access-application-cluster/&#34;&gt;使用端口转发访问集群中的应用程序&lt;/a&gt;。
或者如果你真的需要外部访问，请使用 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/services-networking/service/#type-nodeport&#34;&gt;&lt;code&gt;type: NodePort&lt;/code&gt; Service&lt;/a&gt;。
这是更安全、更符合 Kubernetes 原生方式的做法。&lt;/p&gt;
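&lt;p&gt;一个简单的 NodePort Service 示例如下（示意性质，名称和端口均为假设值）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
    # 不指定 nodePort 时，Kubernetes 会自动从默认范围（30000-32767）中分配一个端口
&lt;/code&gt;&lt;/pre&gt;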
&lt;!--
### Use headless Services for internal discovery
Sometimes, you don&#39;t want Kubernetes to load balance traffic.
You want to talk directly to each Pod. That&#39;s where [headless Services](/docs/concepts/services-networking/service/#headless-services) come in.
--&gt;
&lt;h3 id=&#34;use-headless-services-for-internal-discovery&#34;&gt;使用无头 Service 进行内部服务发现&lt;/h3&gt;
&lt;p&gt;有时，你不想让 Kubernetes 负载均衡流量。
你想直接与每个 Pod 通信。这就是&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/services-networking/service/#headless-services&#34;&gt;无头 Service&lt;/a&gt; 的用武之地。&lt;/p&gt;
&lt;!--
You create one by setting `clusterIP: None`.
Instead of a single IP, DNS gives you a list of all Pod IPs,
perfect for apps that manage connections themselves.
--&gt;
&lt;p&gt;你通过设置 &lt;code&gt;clusterIP: None&lt;/code&gt; 来创建一个。
DNS 不是给你一个 IP，而是给你所有 Pod IP 的列表，
这非常适合自己管理连接的应用程序。&lt;/p&gt;
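&lt;p&gt;例如（示意性质，名称和端口均为假设值）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None   # 无头 Service：DNS 查询会直接返回各个 Pod 的 IP
  selector:
    app: db
  ports:
  - port: 5432
&lt;/code&gt;&lt;/pre&gt;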
&lt;!--
## Working with labels effectively
--&gt;
&lt;h2 id=&#34;working-with-labels-effectively&#34;&gt;有效使用标签&lt;/h2&gt;
&lt;!--
[Labels](/docs/concepts/overview/working-with-objects/labels/) are key/value pairs that are attached to objects such as Pods.
Labels help you organize, query and group your resources.
They don&#39;t do anything by themselves, but they make everything else from Services to Deployments work together smoothly.
--&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/overview/working-with-objects/labels/&#34;&gt;标签&lt;/a&gt;是附加到 Pod 等对象的键/值对。
标签帮助你组织、查询和分组资源。
它们本身不做任何事情，但它们使从 Service 到 Deployment 的所有其他内容都能顺利协同工作。&lt;/p&gt;
&lt;!--
### Use semantic labels
Good labels help you understand what&#39;s what, even months later.
Define and use [labels](/docs/concepts/overview/working-with-objects/labels/) that identify semantic attributes of your application or Deployment.
For example:
--&gt;
&lt;h3 id=&#34;use-semantics-labels&#34;&gt;使用语义标签&lt;/h3&gt;
&lt;p&gt;好的标签可以帮助你理解什么是什么，即使在几个月后也是如此。
定义并使用&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/overview/working-with-objects/labels/&#34;&gt;标签&lt;/a&gt;来标识应用程序或 Deployment 的语义属性。
例如：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app.kubernetes.io/name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app.kubernetes.io/component&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;web&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tier&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;frontend&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;phase&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;test&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
  - `app.kubernetes.io/name` : what the app is
  - `tier` : which layer it belongs to (frontend/backend)
  - `phase` : which stage it&#39;s in (test/prod)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;app.kubernetes.io/name&lt;/code&gt;：应用是什么&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tier&lt;/code&gt;：它属于哪一层（前端/后端）&lt;/li&gt;
&lt;li&gt;&lt;code&gt;phase&lt;/code&gt;：它处于哪个阶段（测试/生产）&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
You can then use these labels to make powerful selectors.
For example:
--&gt;
&lt;p&gt;然后你可以使用这些标签来创建强大的选择算符。
例如：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl get pods -l &lt;span style=&#34;color:#b8860b&#34;&gt;tier&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;frontend
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This will list all frontend Pods across your cluster, no matter which Deployment they came from.
Basically you are not manually listing Pod names; you are just describing what you want.
See the [guestbook](https://github.com/kubernetes/examples/tree/master/web/guestbook/) app for examples of this approach.
--&gt;
&lt;p&gt;这将列出集群中所有前端 Pod，无论它们来自哪个 Deployment。
基本上，你不需要手动列出 Pod 名称；你只是在描述你想要什么。
有关此方法的示例，请参阅 &lt;a href=&#34;https://github.com/kubernetes/examples/tree/master/web/guestbook/&#34;&gt;guestbook&lt;/a&gt; 应用。&lt;/p&gt;
&lt;!--
### Use common Kubernetes labels
Kubernetes actually recommends a set of [common labels](/docs/concepts/overview/working-with-objects/common-labels/).
It&#39;s a standardized way to name things across your different workloads or projects.
Following this convention makes your manifests cleaner, and it means that tools such as [Headlamp](https://headlamp.dev/),
[dashboard](https://github.com/kubernetes/dashboard#introduction), or third-party monitoring systems can all
automatically understand what&#39;s running.
--&gt;
&lt;h3 id=&#34;use-common-kubernetes-labels&#34;&gt;使用常见的 Kubernetes 标签&lt;/h3&gt;
&lt;p&gt;Kubernetes 实际上推荐一组&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/overview/working-with-objects/common-labels/&#34;&gt;常见标签&lt;/a&gt;。
这是在你的不同工作负载或项目中命名事物的一种标准方式。
遵循此约定使你的清单更清晰，这意味着诸如 &lt;a href=&#34;https://headlamp.dev/&#34;&gt;Headlamp&lt;/a&gt;、
&lt;a href=&#34;https://github.com/kubernetes/dashboard#introduction&#34;&gt;dashboard&lt;/a&gt; 或第三方监控系统等工具
都可以自动理解正在运行的内容。&lt;/p&gt;
&lt;!--
### Manipulate labels for debugging
Since controllers (like ReplicaSets or Deployments) use labels to manage Pods,
you can remove a label to &#34;detach&#34; a Pod temporarily.
--&gt;
&lt;h3 id=&#34;manipulate-labels-for-debugging&#34;&gt;操作标签进行调试&lt;/h3&gt;
&lt;p&gt;由于控制器（如 ReplicaSet 或 Deployment）使用标签来管理 Pod，
你可以删除标签以临时 &amp;quot;分离&amp;quot; Pod。&lt;/p&gt;
&lt;!--
Example:
--&gt;
&lt;p&gt;示例：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl label pod mypod app-
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
The `app-` part removes the label key `app`.
Once that happens, the controller won&#39;t manage that Pod anymore.
It&#39;s like isolating it for inspection, a &#34;quarantine mode&#34; for debugging.
To interactively remove or add labels, use [`kubectl label`](/docs/reference/kubectl/generated/kubectl_label/).
--&gt;
&lt;p&gt;&lt;code&gt;app-&lt;/code&gt; 部分会删除标签键 &lt;code&gt;app&lt;/code&gt;。
一旦发生这种情况，控制器将不再管理该 Pod。
这就像将其隔离以进行检查，一种用于调试的&amp;quot;隔离模式&amp;quot;。
要交互式地删除或添加标签，请使用 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/kubectl/generated/kubectl_label/&#34;&gt;&lt;code&gt;kubectl label&lt;/code&gt;&lt;/a&gt;。&lt;/p&gt;
&lt;!--
You can then check logs, exec into it and once done, delete it manually.
That&#39;s a super underrated trick every Kubernetes engineer should know.
--&gt;
&lt;p&gt;然后你可以检查 Pod 日志、exec 进入 Pod，完成后手动删除 Pod。
这是一个被严重低估的技巧，每个 Kubernetes 工程师都应该掌握。&lt;/p&gt;
&lt;!--
## Handy kubectl tips
--&gt;
&lt;h2 id=&#34;handy-kubectl-tips&#34;&gt;实用的 kubectl 技巧&lt;/h2&gt;
&lt;!--
These small tips make life much easier when you are working with multiple manifest files or clusters.
--&gt;
&lt;p&gt;这些小技巧能让你在处理多个清单文件或集群时更加轻松。&lt;/p&gt;
&lt;!--
### Apply entire directories
Instead of applying one file at a time, apply the whole folder:
--&gt;
&lt;h3 id=&#34;apply-entire-directories&#34;&gt;应用整个目录&lt;/h3&gt;
&lt;p&gt;不要一次应用一个文件，而是应用整个文件夹：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# Using server-side apply is also a good practice&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f configs/ --server-side
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This command looks for `.yaml`, `.yml` and `.json` files in that folder and applies them all together.
It&#39;s faster, cleaner and helps keep things grouped by app.
--&gt;
&lt;p&gt;此命令在该文件夹中查找 &lt;code&gt;.yaml&lt;/code&gt;、&lt;code&gt;.yml&lt;/code&gt; 和 &lt;code&gt;.json&lt;/code&gt; 文件并将它们一起应用。
它更快、更清晰，并有助于按应用分组。&lt;/p&gt;
&lt;!--
### Use label selectors to get or delete resources
You don&#39;t always need to type out resource names one by one.
Instead, use [selectors](/docs/concepts/overview/working-with-objects/labels/#label-selectors) to act on entire groups at once:
--&gt;
&lt;h3 id=&#34;use-label-selectors-to-get-or-delete-resources&#34;&gt;使用标签选择算符获取或删除资源&lt;/h3&gt;
&lt;p&gt;你不需要总是逐个输入资源名称。
相反，使用&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/overview/working-with-objects/labels/#label-selectors&#34;&gt;标签选择算符&lt;/a&gt;一次对整个组进行操作：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl get pods -l &lt;span style=&#34;color:#b8860b&#34;&gt;app&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;myapp
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl delete pod -l &lt;span style=&#34;color:#b8860b&#34;&gt;phase&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a2f&#34;&gt;test&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
It&#39;s especially useful in CI/CD pipelines, where you want to clean up test resources dynamically.
--&gt;
&lt;p&gt;这在 CI/CD 流水线中特别有用，你可以在其中动态清理测试资源。&lt;/p&gt;
&lt;!--
### Quickly create Deployments and Services
For quick experiments, you don&#39;t always need to write a manifest.
You can spin up a Deployment right from the CLI:
--&gt;
&lt;h3 id=&#34;quickly-create-deployments-and-services&#34;&gt;快速创建 Deployment 和 Service&lt;/h3&gt;
&lt;p&gt;对于快速实验，你不需要总是编写清单。
你可以直接从 CLI 启动 Deployment：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl create deployment webapp --image&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;nginx
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
Then expose it as a Service:
--&gt;
&lt;p&gt;然后将其公开为 Service：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl expose deployment webapp --port&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This is great when you just want to test something before writing full manifests.
Also, see [Use a Service to Access an Application in a cluster](/docs/tasks/access-application-cluster/service-access-application-cluster/) for an example.
--&gt;
&lt;p&gt;当你想在编写完整清单之前测试某些内容时，这非常有用。
另外，有关示例，请参阅
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/access-application-cluster/service-access-application-cluster/&#34;&gt;使用 Service 访问集群中的应用程序&lt;/a&gt;。&lt;/p&gt;
&lt;!--
## Conclusion
--&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;结论&lt;/h2&gt;
&lt;!--
Cleaner configuration leads to calmer cluster administrators.
If you stick to a few simple habits: keep configuration simple and minimal, version-control everything,
use consistent labels, and avoid relying on naked Pods, you&#39;ll save yourself hours of debugging down the road.
--&gt;
&lt;p&gt;更清晰的配置可以让集群管理员更为泰然自若。
如果你坚持几个简单的习惯：保持配置简单和最小化、对所有内容进行版本控制、
使用一致的标签，并避免依赖裸 Pod，你将为自己节省数小时的调试时间。&lt;/p&gt;
&lt;!--
The best part?
Clean configurations stay readable. Even after months, you or anyone on your team
can glance at them and know exactly what&#39;s happening.
--&gt;
&lt;p&gt;最棒的一点是什么？
清晰的配置始终保持可读。即使过了几个月，
你或团队中的任何人都能一眼看出其中到底发生了什么。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes 1.35：版本化 z-pages API 带来更强大的调试能力</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/11/13/kubernetes-1-35-structured-zpages/</link>
      <pubDate>Thu, 13 Nov 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/11/13/kubernetes-1-35-structured-zpages/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes 1.35: Enhanced Debugging with Versioned z-pages APIs&#34;
draft: true
slug: kubernetes-1-35-structured-zpages
author: &gt;
  [Richa Banker](https://github.com/richabanker),
  [Han Kang](https://github.com/cncf/memorials/blob/main/han-kang.md)
--&gt;
&lt;!--
Debugging Kubernetes control plane components can be challenging, especially when you need to quickly understand the runtime state of a component or verify its configuration. With Kubernetes 1.35, we&#39;re enhancing the z-pages debugging endpoints with structured, machine-parseable responses that make it easier to build tooling and automate troubleshooting workflows.
--&gt;
&lt;p&gt;调试 Kubernetes 控制平面组件可能很具挑战性，
尤其是在需要快速理解组件运行时状态或验证配置时。
在 Kubernetes 1.35 中，我们为 z-pages 调试端点带来结构化、可被机器解析的响应，
让构建工具和自动化排障流程变得更加轻松。&lt;/p&gt;
&lt;!--
## What are z-pages?
--&gt;
&lt;h2 id=&#34;what-are-z-pages&#34;&gt;什么是 z-pages？&lt;/h2&gt;
&lt;!--
z-pages are special debugging endpoints exposed by Kubernetes control plane components. Introduced as an alpha feature in Kubernetes 1.32, these endpoints provide runtime diagnostics for components like `kube-apiserver`, `kube-controller-manager`, `kube-scheduler`, `kubelet` and `kube-proxy`. The name &#34;z-pages&#34; comes from the convention of using `/*z` paths for debugging endpoints.
--&gt;
&lt;p&gt;z-pages 是 Kubernetes 控制平面组件所公开的特殊调试端点。
它们在 Kubernetes 1.32 中以 Alpha 特性引入，为 &lt;code&gt;kube-apiserver&lt;/code&gt;、&lt;code&gt;kube-controller-manager&lt;/code&gt;、
&lt;code&gt;kube-scheduler&lt;/code&gt;、&lt;code&gt;kubelet&lt;/code&gt; 与 &lt;code&gt;kube-proxy&lt;/code&gt; 等组件提供运行时诊断。
&amp;quot;z-pages&amp;quot; 这一名称源自使用 &lt;code&gt;/*z&lt;/code&gt; 路径来公开调试端点的惯例。&lt;/p&gt;
&lt;!--
Currently, Kubernetes supports two primary z-page endpoints:

`/statusz`
: Displays high-level component information including version information, start time, uptime, and available debug paths

`/flagz`
: Shows all command-line arguments and their values used to start the component (with confidential values redacted for security)
--&gt;
&lt;p&gt;目前，Kubernetes 支持两个主要的 z-page 端点：&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;/statusz&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;显示组件的概要信息，包括版本、启动时间、运行时长以及可用调试路径&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;/flagz&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;展示用于启动组件的全部命令行参数及其取值（敏感值会出于安全考虑被屏蔽）&lt;/dd&gt;
&lt;/dl&gt;
&lt;!--
These endpoints are valuable for human operators who need to quickly inspect component state, but until now, they only returned plain text output that was difficult to parse programmatically.
--&gt;
&lt;p&gt;这些端点对于需要快速检查组件状态的人工运维人员非常有价值，
但在此之前它们只返回难以通过程序解析的纯文本输出。&lt;/p&gt;
&lt;!--
## What&#39;s new in Kubernetes 1.35?
--&gt;
&lt;h2 id=&#34;whats-new-in-kubernetes-1-35&#34;&gt;Kubernetes 1.35 有哪些新内容？&lt;/h2&gt;
&lt;!--
Kubernetes 1.35 introduces structured, versioned responses for both `/statusz` and `/flagz` endpoints. This enhancement maintains backward compatibility with the existing plain text format while adding support for machine-readable JSON responses.
--&gt;
&lt;p&gt;Kubernetes 1.35 为 &lt;code&gt;/statusz&lt;/code&gt; 与 &lt;code&gt;/flagz&lt;/code&gt; 两个端点都引入了结构化、具备版本控制的响应。
这一增强在保留现有纯文本格式向后兼容性的同时，新增了对机器可读 JSON 响应的支持。&lt;/p&gt;
&lt;!--
### Backward compatible design
--&gt;
&lt;h3 id=&#34;backward-compatible-design&#34;&gt;向后兼容的设计&lt;/h3&gt;
&lt;!--
The new structured responses are opt-in. Without specifying an `Accept` header, the endpoints continue to return the familiar plain text format:
--&gt;
&lt;p&gt;新的结构化响应是按需启用的。
如果未指定 &lt;code&gt;Accept&lt;/code&gt; 头，端点仍会返回熟悉的纯文本格式：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;$ curl --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
  --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
  --cacert /etc/kubernetes/pki/ca.crt \
  https://localhost:6443/statusz

kube-apiserver statusz
Warning: This endpoint is not meant to be machine parseable, has no formatting compatibility guarantees and is for debugging purposes only.

Started: Wed Oct 16 21:03:43 UTC 2024
Up: 0 hr 00 min 16 sec
Go version: go1.23.2
Binary version: 1.35.0-alpha.0.1595
Emulation version: 1.35
Paths: /healthz /livez /metrics /readyz /statusz /version
&lt;/code&gt;&lt;/pre&gt;&lt;!--
### Structured JSON responses
--&gt;
&lt;h3 id=&#34;structured-json-responses&#34;&gt;结构化 JSON 响应&lt;/h3&gt;
&lt;!--
To receive a structured response, include the appropriate `Accept` header:
--&gt;
&lt;p&gt;若要获得结构化响应，需要提供合适的 &lt;code&gt;Accept&lt;/code&gt; 头：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Statusz
&lt;/code&gt;&lt;/pre&gt;&lt;!--
This returns a versioned JSON response:
--&gt;
&lt;p&gt;这样即可返回具备版本号的 JSON 响应：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;kind&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Statusz&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;apiVersion&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;config.k8s.io/v1alpha1&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;metadata&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;name&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;kube-apiserver&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;startTime&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;2025-10-29T00:30:01Z&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;uptimeSeconds&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#666&#34;&gt;856&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;goVersion&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;go1.23.2&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;binaryVersion&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;1.35.0&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;emulationVersion&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;1.35&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;paths&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;/healthz&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;/livez&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;/metrics&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;/readyz&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;/statusz&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;/version&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
Similarly, `/flagz` supports structured responses with the header:
--&gt;
&lt;p&gt;类似地，&lt;code&gt;/flagz&lt;/code&gt; 也支持结构化响应，只需设置以下头部：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Flagz
&lt;/code&gt;&lt;/pre&gt;&lt;!--
Example response:
--&gt;
&lt;p&gt;响应示例如下：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;kind&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Flagz&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;apiVersion&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;config.k8s.io/v1alpha1&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;metadata&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;name&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;kube-apiserver&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;flags&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;advertise-address&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;192.168.8.4&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;allow-privileged&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;true&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;authorization-mode&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;[Node,RBAC]&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;enable-priority-and-fairness&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;true&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;&amp;#34;profiling&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;true&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
## Why structured responses matter
--&gt;
&lt;h2 id=&#34;why-structured-responses-matter&#34;&gt;结构化响应为什么很重要&lt;/h2&gt;
&lt;!--
The addition of structured responses opens up several new possibilities:
--&gt;
&lt;p&gt;引入结构化响应使得一系列新的用例成为可能：&lt;/p&gt;
&lt;!--
### 1. **Automated health checks and monitoring**

Instead of parsing plain text, monitoring tools can now easily extract specific fields. For example, you can programmatically check if a component has been running with an unexpected emulated version or verify that critical flags are set correctly.
--&gt;
&lt;h3 id=&#34;1-automated-health-checks-and-monitoring&#34;&gt;1. &lt;strong&gt;自动化健康检查与监控&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;监控工具现在可以直接提取特定字段，而无需解析纯文本。
例如，你可以通过程序检查组件是否以非预期的模拟版本运行，或确认关键参数是否配置正确。&lt;/p&gt;
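&lt;p&gt;举例来说，下面是一个示意性的 Python 片段（&lt;code&gt;emulation_matches&lt;/code&gt; 为本文虚构的函数名，字段名取自上文 &lt;code&gt;/statusz&lt;/code&gt; 示例），用于检查模拟版本与二进制版本是否一致：&lt;/p&gt;

```python
def emulation_matches(statusz):
    """检查模拟版本是否为二进制版本的前缀（如 1.35 对应 1.35.0）。
    字段名取自上文 /statusz 示例响应；此函数仅为示意。"""
    binary = statusz.get("binaryVersion", "")
    emulated = statusz.get("emulationVersion", "")
    return binary.startswith(emulated)

status = {"binaryVersion": "1.35.0", "emulationVersion": "1.35"}
print(emulation_matches(status))  # True
print(emulation_matches({"binaryVersion": "1.35.0", "emulationVersion": "1.34"}))  # False
```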
&lt;!--
### 2. **Better debugging tools**

Developers can build sophisticated debugging tools that compare configurations across multiple components or track configuration drift over time. The structured format makes it trivial to `diff` configurations or validate that components are running with expected settings.
--&gt;
&lt;h3 id=&#34;2-better-debugging-tools&#34;&gt;2. &lt;strong&gt;更好的调试工具&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;开发者能够构建更为精巧的调试工具，用于跨多个组件比较配置或随时间追踪配置漂移。
结构化格式让对配置执行 &lt;code&gt;diff&lt;/code&gt; 或验证组件是否按预期设置运行变得轻而易举。&lt;/p&gt;
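&lt;p&gt;例如，可以用如下示意性的 Python 片段（&lt;code&gt;diff_flags&lt;/code&gt; 为本文虚构的函数名）对两份 &lt;code&gt;/flagz&lt;/code&gt; 返回的 &lt;code&gt;flags&lt;/code&gt; 字典求差异，以发现配置漂移：&lt;/p&gt;

```python
def diff_flags(a, b):
    """返回两组 flagz flags 之间的差异：{参数名: (a 中取值, b 中取值)}。
    缺失的参数以 None 表示；仅为示意实现。"""
    drift = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            drift[key] = (a.get(key), b.get(key))
    return drift

node_a = {"profiling": "true", "allow-privileged": "true"}
node_b = {"profiling": "false", "allow-privileged": "true"}
print(diff_flags(node_a, node_b))  # {'profiling': ('true', 'false')}
```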
&lt;!--
### 3. **API versioning and stability**

By introducing versioned APIs (starting with `v1alpha1`), we provide a clear path to stability. As the feature matures, we&#39;ll introduce `v1beta1` and eventually `v1`, giving you confidence that your tooling won&#39;t break with future Kubernetes releases.
--&gt;
&lt;h3 id=&#34;3-api-versioning-and-stability&#34;&gt;3. &lt;strong&gt;API 版本化与稳定性&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;通过引入带版本的 API（从 &lt;code&gt;v1alpha1&lt;/code&gt; 开始），我们为走向稳定提供了明确路径。
随着特性不断成熟，我们会引入 &lt;code&gt;v1beta1&lt;/code&gt; 并最终引入 &lt;code&gt;v1&lt;/code&gt;，
让你确信自己的工具不会因未来的 Kubernetes 版本而失效。&lt;/p&gt;
&lt;!--
## How to use structured z-pages
--&gt;
&lt;h2 id=&#34;how-to-use-structured-z-pages&#34;&gt;如何使用结构化 z-pages&lt;/h2&gt;
&lt;!--
### Prerequisites

Both endpoints require feature gates to be enabled:

- `/statusz`: Enable the `ComponentStatusz` feature gate
- `/flagz`: Enable the `ComponentFlagz` feature gate
--&gt;
&lt;h3 id=&#34;prerequisites&#34;&gt;前提条件&lt;/h3&gt;
&lt;p&gt;两个端点都需要启用相应的特性门控：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/statusz&lt;/code&gt;：启用 &lt;code&gt;ComponentStatusz&lt;/code&gt; 特性门控&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/flagz&lt;/code&gt;：启用 &lt;code&gt;ComponentFlagz&lt;/code&gt; 特性门控&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
### Example: Getting structured responses

Here&#39;s an example using `curl` to retrieve structured JSON responses from the kube-apiserver:
--&gt;
&lt;h3 id=&#34;example-getting-structured-responses&#34;&gt;示例：获取结构化响应&lt;/h3&gt;
&lt;p&gt;下面示例展示如何使用 &lt;code&gt;curl&lt;/code&gt; 从 kube-apiserver 中获取结构化 JSON 响应：&lt;/p&gt;
&lt;!--
```bash
# Get structured statusz response
curl \
  --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
  --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
  --cacert /etc/kubernetes/pki/ca.crt \
  -H &#34;Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Statusz&#34; \
  https://localhost:6443/statusz | jq .

# Get structured flagz response
curl \
  --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
  --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
  --cacert /etc/kubernetes/pki/ca.crt \
  -H &#34;Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Flagz&#34; \
  https://localhost:6443/flagz | jq .
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 获取结构化 statusz 响应&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;curl &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;&lt;/span&gt;  --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;&lt;/span&gt;  --key /etc/kubernetes/pki/apiserver-kubelet-client.key &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;&lt;/span&gt;  --cacert /etc/kubernetes/pki/ca.crt &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;&lt;/span&gt;  -H &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Statusz&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;&lt;/span&gt;  https://localhost:6443/statusz | jq .
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 获取结构化 flagz 响应&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;curl &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;&lt;/span&gt;  --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;&lt;/span&gt;  --key /etc/kubernetes/pki/apiserver-kubelet-client.key &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;&lt;/span&gt;  --cacert /etc/kubernetes/pki/ca.crt &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;&lt;/span&gt;  -H &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Flagz&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;&lt;/span&gt;  https://localhost:6443/flagz | jq .
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&#34;alert alert-info&#34; role=&#34;alert&#34;&gt;&lt;h4 class=&#34;alert-heading&#34;&gt;说明：&lt;/h4&gt;&lt;!--
The examples above use client certificate authentication and verify the server&#39;s certificate using `--cacert`. 
If you need to bypass certificate verification in a test environment, you can use `--insecure` (or `-k`), 
but this should never be done in production as it makes you vulnerable to man-in-the-middle attacks.
--&gt;
&lt;p&gt;上述示例使用客户端证书认证，并通过 &lt;code&gt;--cacert&lt;/code&gt; 验证服务器证书。
如果在测试环境中需要跳过证书验证，可以使用 &lt;code&gt;--insecure&lt;/code&gt;（或 &lt;code&gt;-k&lt;/code&gt;），
但在生产环境切勿这样做，否则会暴露在中间人攻击风险之下。&lt;/p&gt;&lt;/div&gt;

&lt;!--
## Important considerations
--&gt;
&lt;h2 id=&#34;important-considerations&#34;&gt;重要注意事项&lt;/h2&gt;
&lt;!--
### Alpha feature status

The structured z-page responses are an **alpha** feature in Kubernetes 1.35. This means:

- The API format may change in future releases
- These endpoints are intended for debugging, not production automation
- You should avoid relying on them for critical monitoring workflows until they reach beta or stable status
--&gt;
&lt;h3 id=&#34;alpha-feature-status&#34;&gt;Alpha 特性状态&lt;/h3&gt;
&lt;p&gt;结构化 z-page 响应在 Kubernetes 1.35 中仍是 &lt;strong&gt;Alpha&lt;/strong&gt; 特性，这意味着：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;API 格式可能会在未来版本中发生变化&lt;/li&gt;
&lt;li&gt;这些端点用于调试，而非生产自动化&lt;/li&gt;
&lt;li&gt;在其达到 Beta 或稳定版之前，不应把它们作为关键监控工作流的依赖&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
### Security and access control
--&gt;
&lt;h3 id=&#34;security-and-access-control&#34;&gt;安全与访问控制&lt;/h3&gt;
&lt;!--
z-pages expose internal component information and require proper access controls. Here are the key security considerations:
--&gt;
&lt;p&gt;z-pages 会公开组件内部信息，因此必须设置恰当的访问控制，重点注意以下安全事项：&lt;/p&gt;
&lt;!--
**Authorization**: Access to z-page endpoints is restricted to members of the `system:monitoring` group, which follows the same authorization model as other debugging endpoints like `/healthz`, `/livez`, and `/readyz`. This ensures that only authorized users and service accounts can access debugging information. If your cluster uses RBAC, you can manage access by granting appropriate permissions to this group.
--&gt;
&lt;p&gt;&lt;strong&gt;鉴权&lt;/strong&gt;：访问 z-page 端点仅限 &lt;code&gt;system:monitoring&lt;/code&gt; 组成员，
遵循与 &lt;code&gt;/healthz&lt;/code&gt;、&lt;code&gt;/livez&lt;/code&gt;、&lt;code&gt;/readyz&lt;/code&gt; 等调试端点相同的鉴权模型。
这样可确保只有获授权的用户和服务账号才能获取调试信息。
如果集群使用 RBAC，可以通过赋予该组适当权限来管理访问。&lt;/p&gt;
&lt;!--
**Authentication**: The authentication requirements for these endpoints depend on your cluster&#39;s configuration. Unless anonymous authentication is enabled for your cluster, you typically need to use authentication mechanisms (such as client certificates) to access these endpoints.
--&gt;
&lt;p&gt;&lt;strong&gt;身份认证&lt;/strong&gt;：这些端点的身份认证要求取决于集群配置。
除非集群启用了匿名身份认证，否则通常需要使用身份认证机制（如客户端证书）来访问这些端点。&lt;/p&gt;
&lt;!--
**Information disclosure**: These endpoints reveal configuration details about your cluster components, including:
- Component versions and build information
- All command-line arguments and their values (with confidential values redacted)
- Available debug endpoints
--&gt;
&lt;p&gt;&lt;strong&gt;信息披露&lt;/strong&gt;：这些端点会暴露集群组件的配置细节，包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;组件版本与构建信息&lt;/li&gt;
&lt;li&gt;所有命令行参数及其取值（敏感值会被屏蔽）&lt;/li&gt;
&lt;li&gt;可用的调试端点&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Only grant access to trusted operators and debugging tools. Avoid exposing these endpoints to unauthorized users or automated systems that don&#39;t require this level of access.
--&gt;
&lt;p&gt;务必仅向受信任的运维人员和调试工具授予访问权限，
避免对无关用户或不需要该访问级别的自动化系统开放这些端点。&lt;/p&gt;
&lt;!--
### Future evolution

As the feature matures, we (Kubernetes SIG Instrumentation) expect to:

- Introduce `v1beta1` and eventually `v1` versions of the API
- Gather community feedback on the response schema
- Potentially add additional z-page endpoints based on user needs
--&gt;
&lt;h3 id=&#34;future-evolution&#34;&gt;未来演进&lt;/h3&gt;
&lt;p&gt;随着特性愈发成熟，Kubernetes SIG Instrumentation 计划：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;引入 &lt;code&gt;v1beta1&lt;/code&gt; 并最终提供 &lt;code&gt;v1&lt;/code&gt; 版本的 API&lt;/li&gt;
&lt;li&gt;收集社区对响应模式的反馈&lt;/li&gt;
&lt;li&gt;根据用户需求，可能新增更多 z-page 端点&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Try it out

We encourage you to experiment with structured z-pages in a test environment:
--&gt;
&lt;h2 id=&#34;try-it-out&#34;&gt;动手试试&lt;/h2&gt;
&lt;p&gt;我们鼓励你在测试环境体验结构化 z-pages：&lt;/p&gt;
&lt;!--
1. Enable the `ComponentStatusz` and `ComponentFlagz` feature gates on your control plane components
2. Try querying the endpoints with both plain text and structured formats
3. Build a simple tool or script that uses the structured data
4. Share your feedback with the community
--&gt;
&lt;ol&gt;
&lt;li&gt;在控制平面组件上启用 &lt;code&gt;ComponentStatusz&lt;/code&gt; 与 &lt;code&gt;ComponentFlagz&lt;/code&gt; 特性门控&lt;/li&gt;
&lt;li&gt;使用纯文本与结构化两种格式查询端点&lt;/li&gt;
&lt;li&gt;构建一个使用结构化数据的简单工具或脚本&lt;/li&gt;
&lt;li&gt;向社区分享你的反馈&lt;/li&gt;
&lt;/ol&gt;
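&lt;p&gt;第 3 步中的“简单工具”可以从一行摘要做起。下面是一个示意性的 Python 片段（&lt;code&gt;summarize&lt;/code&gt; 为本文虚构的函数名，字段名取自上文 &lt;code&gt;/statusz&lt;/code&gt; 示例）：&lt;/p&gt;

```python
def summarize(statusz):
    """把 statusz 响应的关键字段拼成一行摘要；仅为示意实现。"""
    return "%s up %ds (binary %s, emulating %s)" % (
        statusz.get("metadata", {}).get("name", "unknown"),
        statusz.get("uptimeSeconds", 0),
        statusz.get("binaryVersion", "?"),
        statusz.get("emulationVersion", "?"),
    )

status = {
    "metadata": {"name": "kube-apiserver"},
    "uptimeSeconds": 856,
    "binaryVersion": "1.35.0",
    "emulationVersion": "1.35",
}
print(summarize(status))  # kube-apiserver up 856s (binary 1.35.0, emulating 1.35)
```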
&lt;!--
## Learn more

- [z-pages documentation](/docs/reference/instrumentation/zpages/)
- [KEP-4827: Component Statusz](https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/4827-component-statusz/README.md)
- [KEP-4828: Component Flagz](https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/4828-component-flagz/README.md)
- Join the discussion in the [#sig-instrumentation](https://kubernetes.slack.com/archives/C20HH14P7) channel on Kubernetes Slack
--&gt;
&lt;h2 id=&#34;learn-more&#34;&gt;了解更多&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/instrumentation/zpages/&#34;&gt;z-pages 文档&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/4827-component-statusz/README.md&#34;&gt;KEP-4827：Component Statusz&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/4828-component-flagz/README.md&#34;&gt;KEP-4828：Component Flagz&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;加入 Kubernetes Slack 中的 &lt;a href=&#34;https://kubernetes.slack.com/archives/C20HH14P7&#34;&gt;#sig-instrumentation&lt;/a&gt; 频道参与讨论&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Get involved
--&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;参与其中&lt;/h2&gt;
&lt;!--
We&#39;d love to hear your feedback! The structured z-pages feature is designed to make Kubernetes easier to debug and monitor. Whether you&#39;re building internal tooling, contributing to open source projects, or just exploring the feature, your input helps shape the future of Kubernetes observability.
--&gt;
&lt;p&gt;我们非常期待你的反馈！结构化 z-pages 旨在让 Kubernetes 调试和监控更轻松。
无论你是在构建内部工具、为开源项目做贡献，还是只是探索该特性，
你的意见都将帮助塑造 Kubernetes 可观测性的未来。&lt;/p&gt;
&lt;!--
If you have questions, suggestions, or run into issues, please reach out to SIG Instrumentation. You can find us on Slack or at our regular [community meetings](https://github.com/kubernetes/community/tree/master/sig-instrumentation).
--&gt;
&lt;p&gt;如果你有疑问、建议或遇到问题，请联系 SIG Instrumentation。
你可以在 Slack 中找到我们，或参加常规的&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-instrumentation&#34;&gt;社区会议&lt;/a&gt;。&lt;/p&gt;
&lt;!--
Happy debugging!
--&gt;
&lt;p&gt;祝你调试愉快！&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Ingress NGINX 退役：你需要了解的内容</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/11/11/ingress-nginx-retirement/</link>
      <pubDate>Tue, 11 Nov 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/11/11/ingress-nginx-retirement/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Ingress NGINX Retirement: What You Need to Know&#34;
slug: ingress-nginx-retirement
canonicalUrl: https://www.kubernetes.dev/blog/2025/11/12/ingress-nginx-retirement
date: 2025-11-11T10:30:00-08:00
author: &gt;
  Tabitha Sable (Kubernetes SRC)
--&gt;
&lt;!--
To prioritize the safety and security of the ecosystem, Kubernetes SIG Network and the Security Response Committee are announcing the upcoming retirement of [Ingress NGINX](https://github.com/kubernetes/ingress-nginx/). Best-effort maintenance will continue until March 2026. Afterward, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. **Existing deployments of Ingress NGINX will continue to function and installation artifacts will remain available.**

We recommend migrating to one of the many alternatives. Consider [migrating to Gateway API](https://gateway-api.sigs.k8s.io/guides/), the modern replacement for Ingress. If you must continue using Ingress, many alternative Ingress controllers are [listed in the Kubernetes documentation](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/). Continue reading for further information about the history and current state of Ingress NGINX, as well as next steps.
--&gt;
&lt;p&gt;为了优先考虑生态系统的安全，Kubernetes SIG Network 和安全响应委员会宣布
&lt;a href=&#34;https://github.com/kubernetes/ingress-nginx/&#34;&gt;Ingress NGINX&lt;/a&gt; 即将退役，
其尽力而为的维护将持续到 2026 年 3 月。
此后将不再发布新版本，不再修复缺陷，也不再针对可能发现的任何安全漏洞提供更新。
&lt;strong&gt;现有的 Ingress NGINX 部署将继续运行，并且安装工件仍将可用。&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;我们建议迁移到替代方案之一。考虑&lt;a href=&#34;https://gateway-api.sigs.k8s.io/guides/&#34;&gt;迁移到 Gateway API&lt;/a&gt;，
这是 Ingress 的现代替代品。如果你必须继续使用 Ingress，许多替代的 Ingress 控制器已在
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/services-networking/ingress-controllers/&#34;&gt;Kubernetes 文档中列出&lt;/a&gt;。
下文介绍有关 Ingress NGINX 的历史和当前状态以及后续步骤的更多信息。&lt;/p&gt;
&lt;!--
## About Ingress NGINX

[Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) is the original user-friendly way to direct network traffic to workloads running on Kubernetes. ([Gateway API](https://kubernetes.io/docs/concepts/services-networking/gateway/) is a newer way to achieve many of the same goals.) In order for an Ingress to work in your cluster, there must be an [Ingress controller](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/) running. There are many Ingress controller choices available, which serve the needs of different users and use cases. Some are cloud-provider specific, while others have more general applicability.

[Ingress NGINX](https://www.github.com/kubernetes/ingress-nginx) was an Ingress controller, developed early in the history of the Kubernetes project as an example implementation of the API. It became very popular due to its tremendous flexibility, breadth of features, and independence from any particular cloud or infrastructure provider. Since those days, many other Ingress controllers have been created within the Kubernetes project by community groups, and by cloud native vendors. Ingress NGINX has continued to be one of the most popular, deployed as part of many hosted Kubernetes platforms and within innumerable independent users’ clusters.
--&gt;
&lt;h2 id=&#34;关于-ingress-nginx&#34;&gt;关于 Ingress NGINX&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://kubernetes.io/zh-cn/docs/concepts/services-networking/ingress/&#34;&gt;Ingress&lt;/a&gt;
是将网络流量导向运行在 Kubernetes 上的工作负载的最初的、用户友好的方式。
（&lt;a href=&#34;https://kubernetes.io/zh-cn/docs/concepts/services-networking/gateway/&#34;&gt;Gateway API&lt;/a&gt; 是实现许多相同目标的新方法。）
为了使 Ingress 在集群中工作，你必须运行一个
&lt;a href=&#34;https://kubernetes.io/zh-cn/docs/concepts/services-networking/ingress-controllers/&#34;&gt;Ingress 控制器&lt;/a&gt;。
有多种 Ingress 控制器可供选择，可以满足不同用户和使用场景的需求。
有些是特定于云提供商的，而其他的则具有更广泛的应用性。&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.github.com/kubernetes/ingress-nginx&#34;&gt;Ingress NGINX&lt;/a&gt;
是一个在 Kubernetes 项目早期作为该 API 的示例实现而开发的 Ingress 控制器。
由于其极大的灵活性、丰富的特性以及不依赖于任何特定的云或基础设施提供商，它变得非常流行。
自那时以来，许多其他的 Ingress 控制器已经在 Kubernetes 项目中由社区小组和云原生供应商创建。
Ingress NGINX 一直是其中最受欢迎的选择之一，被部署在许多托管的 Kubernetes
平台上以及无数独立用户的集群中。&lt;/p&gt;
&lt;!--
## History and Challenges

The breadth and flexibility of Ingress NGINX has caused maintenance challenges. Changing expectations about cloud native software have also added complications. What were once considered helpful options have sometimes come to be considered serious security flaws, such as the ability to add arbitrary NGINX configuration directives via the &#34;snippets&#34; annotations. Yesterday’s flexibility has become today’s insurmountable technical debt.

Despite the project’s popularity among users, Ingress NGINX has always struggled with insufficient or barely-sufficient maintainership. For years, the project has had only one or two people doing development work, on their own time, after work hours and on weekends. Last year, the Ingress NGINX maintainers [announced](https://kccncna2024.sched.com/event/1hoxW/securing-the-future-of-ingress-nginx-james-strong-isovalent-marco-ebert-giant-swarm) their plans to wind down Ingress NGINX and develop a replacement controller together with the Gateway API community. Unfortunately, even that announcement failed to generate additional interest in helping maintain Ingress NGINX or develop InGate to replace it. (InGate development never progressed far enough to create a mature replacement; it will also be retired.)
--&gt;
&lt;h2 id=&#34;历史与挑战&#34;&gt;历史与挑战&lt;/h2&gt;
&lt;p&gt;Ingress NGINX 的广度和灵活性带来了维护上的挑战，人们对云原生软件不断变化的期望也增加了复杂性。
一些曾经被认为有用的选项，如今有时被视为严重的安全缺陷，例如通过“片段（snippets）”注解添加任意 NGINX 配置指令的能力。
昨日的灵活性已成为今天难以克服的技术债务。&lt;/p&gt;
&lt;p&gt;尽管该项目在用户中非常受欢迎，Ingress NGINX 却一直受困于维护力量不足或勉强够用的问题。
多年来，该项目仅有一两个人利用业余时间、在下班后和周末进行开发工作。去年，Ingress NGINX
维护者&lt;a href=&#34;https://kccncna2024.sched.com/event/1hoxW/securing-the-future-of-ingress-nginx-james-strong-isovalent-marco-ebert-giant-swarm&#34;&gt;宣布&lt;/a&gt;
他们的计划是逐步停止 Ingress NGINX，并与 Gateway API 社区一起开发替代控制器。
不幸的是，即使是这样的公告也未能激起更多兴趣来帮助维护 Ingress NGINX 或开发 InGate 以取代它。
（InGate 的开发从未进展到足以创建一个成熟的替代品；它也将被退役。）&lt;/p&gt;
&lt;!--
## Current State and Next Steps

Currently, Ingress NGINX is receiving best-effort maintenance. SIG Network and the Security Response Committee have exhausted our efforts to find additional support to make Ingress NGINX sustainable. To prioritize user safety, we must retire the project.

In March 2026, Ingress NGINX maintenance will be halted, and the project will be [retired](https://github.com/kubernetes-retired/). After that time, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. The GitHub repositories will be made read-only and left available for reference.
--&gt;
&lt;h2 id=&#34;当前状态与下一步&#34;&gt;当前状态与下一步&lt;/h2&gt;
&lt;p&gt;目前，Ingress NGINX 的维护模式是尽力而为的。
SIG Network 和安全响应委员会已经用尽全力寻找额外的支持来使 Ingress NGINX 可持续发展。
为了优先保障用户的安全，我们必须退役该项目。&lt;/p&gt;
&lt;p&gt;2026 年 3 月，Ingress NGINX 的维护将被停止，项目将被&lt;a href=&#34;https://github.com/kubernetes-retired/&#34;&gt;退役&lt;/a&gt;。
之后，将不再有进一步的版本发布、错误修复或更新来解决可能发现的任何安全漏洞。
GitHub 仓库将变为只读，并留作参考。&lt;/p&gt;
&lt;!--
Existing deployments of Ingress NGINX will not be broken. Existing project artifacts such as Helm charts and container images will remain available.

In most cases, you can check whether you use Ingress NGINX by running `kubectl get pods \--all-namespaces \--selector app.kubernetes.io/name=ingress-nginx` with cluster administrator permissions.
--&gt;
&lt;p&gt;现有的 Ingress NGINX 部署不会受到影响。现有的项目制品，如 Helm 图表和容器镜像，仍将保持可用。&lt;/p&gt;
&lt;p&gt;在大多数情况下，你可以通过运行 &lt;code&gt;kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx&lt;/code&gt;
来检查是否使用了 Ingress NGINX，这需要集群管理员权限。&lt;/p&gt;
&lt;!--
We would like to thank the Ingress NGINX maintainers for their work in creating and maintaining this project–their dedication remains impressive. This Ingress controller has powered billions of requests in datacenters and homelabs all around the world. In a lot of ways, Kubernetes wouldn’t be where it is without Ingress NGINX, and we are grateful for so many years of incredible effort.

**SIG Network and the Security Response Committee recommend that all Ingress NGINX users begin migration to Gateway API or another Ingress controller immediately.** Many options are listed in the Kubernetes documentation: [Gateway API](https://gateway-api.sigs.k8s.io/guides/), [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/). Additional options may be available from vendors you work with.
--&gt;
&lt;p&gt;我们想感谢 Ingress NGINX 的维护者们在创建和维护此项目中所做的工作——他们的奉献精神令人印象深刻。
这个 Ingress 控制器在全球的数据中心和家庭实验室中处理了数十亿次请求。
在很多方面，如果没有 Ingress NGINX，Kubernetes 不会取得如今的成就，我们对如此多年的杰出努力表示感激。&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SIG Network 和安全响应委员会建议所有 Ingress NGINX 用户立即开始迁移到 Gateway API
或其他 Ingress 控制器。&lt;/strong&gt;
Kubernetes 文档中列出了许多选项：&lt;a href=&#34;https://gateway-api.sigs.k8s.io/guides/&#34;&gt;Gateway API&lt;/a&gt;、
&lt;a href=&#34;https://kubernetes.io/zh-cn/docs/concepts/services-networking/ingress-controllers/&#34;&gt;Ingress&lt;/a&gt;。
与你合作的供应商可能还提供其他选项。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>公布 2025 年指导委员会选举结果</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/11/09/steering-committee-results-2025/</link>
      <pubDate>Sun, 09 Nov 2025 15:10:00 -0500</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/11/09/steering-committee-results-2025/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Announcing the 2025 Steering Committee Election Results&#34;
slug: steering-committee-results-2025
canonicalUrl: https://www.kubernetes.dev/blog/2025/11/09/steering-committee-results-2025
date: 2025-11-09T15:10:00-05:00
author: &gt;
  Arujjwal Negi
--&gt;
&lt;!--
The [2025 Steering Committee Election](https://github.com/kubernetes/community/tree/master/elections/steering/2025) is now complete. The Kubernetes Steering Committee consists of 7 seats, 4 of which were up for election in 2025. Incoming committee members serve a term of 2 years, and all members are elected by the Kubernetes Community.

The Steering Committee oversees the governance of the entire Kubernetes project. With that great power comes great responsibility. You can learn more about the steering committee’s role in their [charter](https://github.com/kubernetes/steering/blob/master/charter.md).

Thank you to everyone who voted in the election; your participation helps support the community’s continued health and success.
--&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/elections/steering/2025&#34;&gt;2025 年指导委员会选举&lt;/a&gt;现已结束。
Kubernetes 指导委员会由 7 个席位组成，其中 4 个席位在 2025 年进行了选举。
新当选的委员会成员将任职 2 年，所有成员均由 Kubernetes 社区选举产生。&lt;/p&gt;
&lt;p&gt;指导委员会负责监督整个 Kubernetes 项目的治理。权力越大，责任越大。
你可以通过他们的&lt;a href=&#34;https://github.com/kubernetes/steering/blob/master/charter.md&#34;&gt;章程&lt;/a&gt;了解指导委员会的角色。&lt;/p&gt;
&lt;p&gt;感谢每位参与投票的人；你的参与有助于支持社区的持续健康和成功。&lt;/p&gt;
&lt;!--
## Results

Congratulations to the elected committee members whose two year terms begin immediately (listed in alphabetical order by GitHub handle):
--&gt;
&lt;h2 id=&#34;结果&#34;&gt;结果&lt;/h2&gt;
&lt;p&gt;祝贺当选的委员会成员，其两年任期立即开始（按 GitHub 名称字母顺序列出）：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Kat Cosgrove (&lt;a href=&#34;https://github.com/katcosgrove&#34;&gt;@katcosgrove&lt;/a&gt;), Minimus&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Paco Xu 徐俊杰 (&lt;a href=&#34;https://github.com/pacoxu&#34;&gt;@pacoxu&lt;/a&gt;), DaoCloud&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rita Zhang (&lt;a href=&#34;https://github.com/ritazh&#34;&gt;@ritazh&lt;/a&gt;), Microsoft&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Maciej Szulik (&lt;a href=&#34;https://github.com/soltysh&#34;&gt;@soltysh&lt;/a&gt;), Defense Unicorns&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
They join continuing members:
--&gt;
&lt;p&gt;他们将与以下留任成员共事：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Antonio Ojea (&lt;a href=&#34;https://github.com/aojea&#34;&gt;@aojea&lt;/a&gt;), Google&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Benjamin Elder (&lt;a href=&#34;https://github.com/BenTheElder&#34;&gt;@BenTheElder&lt;/a&gt;), Google&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sascha Grunert (&lt;a href=&#34;https://github.com/saschagrunert&#34;&gt;@saschagrunert&lt;/a&gt;), Red Hat&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Maciej Szulik and Paco Xu are returning Steering Committee Members.

## Big thanks!

Thank you and congratulations on a successful election to this round’s election officers:
--&gt;
&lt;p&gt;Maciej Szulik 和徐俊杰（Paco Xu）是再次当选的指导委员会成员。&lt;/p&gt;
&lt;h2 id=&#34;十分感谢&#34;&gt;十分感谢！&lt;/h2&gt;
&lt;p&gt;感谢并祝贺本轮选举官员成功完成选举工作：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Christoph Blecker (&lt;a href=&#34;https://github.com/cblecker&#34;&gt;@cblecker&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Nina Polshakova (&lt;a href=&#34;https://github.com/npolshakova&#34;&gt;@npolshakova&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Sreeram Venkitesh (&lt;a href=&#34;https://github.com/sreeram-venkitesh&#34;&gt;@sreeram-venkitesh&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Thanks to the Emeritus Steering Committee Members. Your service is appreciated by the community:
--&gt;
&lt;p&gt;感谢名誉指导委员会成员，你们的服务受到社区的赞赏：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Stephen Augustus (&lt;a href=&#34;https://github.com/justaugustus&#34;&gt;@justaugustus&lt;/a&gt;), Bloomberg&lt;/li&gt;
&lt;li&gt;Patrick Ohly (&lt;a href=&#34;https://github.com/pohly&#34;&gt;@pohly&lt;/a&gt;), Intel&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
And thank you to all the candidates who came forward to run for election.
--&gt;
&lt;p&gt;感谢所有参加竞选的候选人。&lt;/p&gt;
&lt;!--
## Get involved with the Steering Committee

This governing body, like all of Kubernetes, is open to all. You can follow along with Steering Committee [meeting notes](https://bit.ly/k8s-steering-wd) and weigh in by filing an issue or creating a PR against their [repo](https://github.com/kubernetes/steering). They have an open meeting on [the first Wednesday at 8am PT of every month](https://github.com/kubernetes/steering). They can also be contacted at their public mailing list steering@kubernetes.io.

You can see what the Steering Committee meetings are all about by watching past meetings on the [YouTube Playlist](https://www.youtube.com/playlist?list=PL69nYSiGNLP1yP1B_nd9-drjoxp0Q14qM).
--&gt;
&lt;h2 id=&#34;参与指导委员会&#34;&gt;参与指导委员会&lt;/h2&gt;
&lt;p&gt;这个管理机构与所有 Kubernetes 一样，向所有人开放。
你可以关注指导委员会&lt;a href=&#34;https://github.com/orgs/kubernetes/projects/40&#34;&gt;会议记录&lt;/a&gt;，
并通过提交 Issue 或针对其 &lt;a href=&#34;https://github.com/kubernetes/steering&#34;&gt;repo&lt;/a&gt; 创建 PR 来参与。
他们在&lt;a href=&#34;https://github.com/kubernetes/steering&#34;&gt;太平洋时间每月第一个周三上午 8:00&lt;/a&gt; 举行开放的会议。
你还可以通过其公共邮件列表 &lt;a href=&#34;mailto:steering@kubernetes.io&#34;&gt;steering@kubernetes.io&lt;/a&gt; 与他们联系。&lt;/p&gt;
&lt;p&gt;你可以通过在
&lt;a href=&#34;https://www.youtube.com/playlist?list=PL69nYSiGNLP1yP1B_nd9-drjoxp0Q14qM&#34;&gt;YouTube 播放列表&lt;/a&gt;上观看过去的会议来了解指导委员会会议的全部内容。&lt;/p&gt;
&lt;hr&gt;
&lt;!--
_This post was adapted from one written by the [Contributor Comms Subproject](https://github.com/kubernetes/community/tree/master/communication/contributor-comms). If you want to write stories about the Kubernetes community, learn more about us._
--&gt;
&lt;p&gt;&lt;strong&gt;本文改编自由&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/communication/contributor-comms&#34;&gt;贡献者通信子项目&lt;/a&gt;撰写的一篇文章。
如果你想撰写有关 Kubernetes 社区的故事，请进一步了解我们。&lt;/strong&gt;&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.35：云控制器管理器中的基于 Watch 的路由协调</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/10/27/watch-based-route-reconciliation-in-ccm/</link>
      <pubDate>Mon, 27 Oct 2025 08:30:00 -0700</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/10/27/watch-based-route-reconciliation-in-ccm/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.35: Watch Based Route Reconciliation in the Cloud Controller Manager&#34;
date: 2025-10-27T08:30:00-07:00
draft: true
slug: watch-based-route-reconciliation-in-ccm
author: &gt;
  [Lukas Metzner](https://github.com/lukasmetzner) (Hetzner)
--&gt;
&lt;!--
Up to and including Kubernetes v1.34, the route controller in
Cloud Controller Manager (CCM) implementations built using the
[k8s.io/cloud-provider](https://github.com/kubernetes/cloud-provider)
library reconciles routes at a fixed interval.
This causes unnecessary API requests to the cloud provider when
there are no changes to routes. 
Other controllers implemented through the same library already
use watch-based mechanisms, leveraging informers to avoid unnecessary API calls.
A new feature gate is being introduced in v1.35 to allow changing the behavior of
the route controller to use watch-based informers.
--&gt;
&lt;p&gt;在 Kubernetes v1.34 及更早版本中，使用
&lt;a href=&#34;https://github.com/kubernetes/cloud-provider&#34;&gt;k8s.io/cloud-provider&lt;/a&gt;
库构建的云控制器管理器（CCM）实现中的路由控制器以固定间隔协调路由。
这会在路由没有变化时向云提供商发起不必要的 API 请求。
通过相同库实现的其他控制器已经使用基于 Watch 的机制，利用 Informer
来避免不必要的 API 调用。
在 v1.35 中，引入了一个新的&lt;strong&gt;特性门控&lt;/strong&gt;，允许更改路由控制器的行为以使用基于
Watch 的 Informer。&lt;/p&gt;
&lt;!--
## What&#39;s new?

The feature gate `CloudControllerManagerWatchBasedRoutesReconciliation`
has been introduced to
[k8s.io/cloud-provider](https://github.com/kubernetes/cloud-provider)
in alpha stage by
[SIG Cloud Provider](https://github.com/kubernetes/community/blob/master/sig-cloud-provider/README.md).
To enable this feature you can use
`--feature-gate=CloudControllerManagerWatchBasedRoutesReconciliation=true`
in the CCM implementation you are using.
--&gt;
&lt;h2 id=&#34;新的变化&#34;&gt;新的变化&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;CloudControllerManagerWatchBasedRoutesReconciliation&lt;/code&gt; &lt;strong&gt;特性门控&lt;/strong&gt;已由
&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-cloud-provider/README.md&#34;&gt;SIG Cloud Provider&lt;/a&gt;
在 &lt;a href=&#34;https://github.com/kubernetes/cloud-provider&#34;&gt;k8s.io/cloud-provider&lt;/a&gt;
中作为 Alpha 级别特性引入。
要启用此特性，你可以在所使用的 CCM 实现中设置
&lt;code&gt;--feature-gates=CloudControllerManagerWatchBasedRoutesReconciliation=true&lt;/code&gt;
标志。&lt;/p&gt;
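&lt;p&gt;作为示意，下面的片段展示了如何在 CCM 容器的启动参数中传递该标志。
这只是一个基于假设的草图：容器名、镜像与云提供商名称均为虚构，并非原文内容：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;# 示意片段：某个 CCM 实现的 Deployment 中容器的 args
containers:
- name: cloud-controller-manager        # 假设的容器名
  image: example.com/my-ccm:v1.35       # 假设的镜像
  args:
  - --cloud-provider=example            # 假设的云提供商名称
  - --feature-gates=CloudControllerManagerWatchBasedRoutesReconciliation=true
&lt;/code&gt;&lt;/pre&gt;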
&lt;!--
## About the feature gate

This feature gate will trigger the route reconciliation loop
whenever a node is added, deleted, or the fields `.spec.podCIDRs`
or `.status.addresses` are updated.
--&gt;
&lt;h2 id=&#34;关于特性门控&#34;&gt;关于特性门控&lt;/h2&gt;
&lt;p&gt;此特性门控将在节点被添加、删除或字段 &lt;code&gt;.spec.podCIDRs&lt;/code&gt; 或 &lt;code&gt;.status.addresses&lt;/code&gt;
被更新时，触发路由协调循环。&lt;/p&gt;
&lt;!--
An additional reconcile is performed in a random interval between 12h and 24h,
which is chosen at the controller&#39;s start time.
--&gt;
&lt;p&gt;此外，还会按一个随机间隔执行一次额外的协调，该间隔介于 12 到 24 小时之间，在控制器启动时选定。&lt;/p&gt;
&lt;!--
This feature gate does not modify the logic within the reconciliation loop.
Therefore, users of a CCM implementation should not experience significant
changes to their existing route configurations.
--&gt;
&lt;p&gt;此特性门控不会修改协调循环内的逻辑。
因此，CCM 实现的用户不会感受到其现有路由配置发生重大变化。&lt;/p&gt;
&lt;!--
## How can I learn more?

For more details, refer to the [KEP-5237](https://kep.k8s.io/5237).
--&gt;
&lt;h2 id=&#34;了解更多&#34;&gt;了解更多&lt;/h2&gt;
&lt;p&gt;欲获取更多详情，请参阅 &lt;a href=&#34;https://kep.k8s.io/5237&#34;&gt;KEP-5237&lt;/a&gt;。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>7 个常见的 Kubernetes 坑（以及我是如何避开的）</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/10/20/seven-kubernetes-pitfalls-and-how-to-avoid/</link>
      <pubDate>Mon, 20 Oct 2025 08:30:00 -0700</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/10/20/seven-kubernetes-pitfalls-and-how-to-avoid/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;7 Common Kubernetes Pitfalls (and How I Learned to Avoid Them)&#34;
date: 2025-10-20T08:30:00-07:00
slug: seven-kubernetes-pitfalls-and-how-to-avoid
author: &gt;
  Abdelkoddous Lhajouji
--&gt;
&lt;!--
It&#39;s no secret that Kubernetes can be both powerful and frustrating at times. When I first started dabbling with container orchestration, I made more than my fair share of mistakes enough to compile a whole list of pitfalls. In this post, I want to walk through seven big gotchas I&#39;ve encountered (or seen others run into) and share some tips on how to avoid them. Whether you&#39;re just kicking the tires on Kubernetes or already managing production clusters, I hope these insights help you steer clear of a little extra stress.
--&gt;
&lt;p&gt;Kubernetes 功能强大，但有时也会令人沮丧，这已不是什么秘密。
当我刚开始接触容器编排时，我犯了不少错误，足以列出一整张误区清单。
在这篇文章中，我想分享我遇到的（或看到其他人遇到的）七个常见误区，
以及如何避免它们的建议。
无论你只是刚开始尝试 Kubernetes，还是已经在管理生产集群，
我希望这些见解能帮助你避免一些额外的麻烦。&lt;/p&gt;
&lt;!--
## 1. Skipping resource requests and limits
--&gt;
&lt;h2 id=&#34;1-skipping-resource-requests-and-limits&#34;&gt;1. 忽略资源 requests 和 limits&lt;/h2&gt;
&lt;!--
**The pitfall**: Not specifying CPU and memory requirements in Pod specifications. This typically happens because Kubernetes does not require these fields, and workloads can often start and run without them—making the omission easy to overlook in early configurations or during rapid deployment cycles.
--&gt;
&lt;p&gt;&lt;strong&gt;常见误区&lt;/strong&gt;：在 Pod 规约中未指定 CPU 和内存需求。
这种情况经常发生，原因是 Kubernetes 不要求这些字段必须设置，
工作负载通常可以在没有这些字段的情况下启动和运行——
这使得在早期配置或快速部署周期中很容易忽略这些设置。&lt;/p&gt;
&lt;!--
**Context**:
In Kubernetes, resource requests and limits are critical for efficient cluster management. Resource requests ensure that the scheduler reserves the appropriate amount of CPU and memory for each pod, guaranteeing that it has the necessary resources to operate. Resource limits cap the amount of CPU and memory a pod can use, preventing any single pod from consuming excessive resources and potentially starving other pods.
When resource requests and limits are not set:
--&gt;
&lt;p&gt;&lt;strong&gt;背景&lt;/strong&gt;：
在 Kubernetes 中，资源请求和限制对于高效的集群管理至关重要。
资源请求确保调度器为每个 Pod 预留适当数量的 CPU 和内存，
保证它有必要的资源来运行。
资源限制则为 Pod 可使用的 CPU 和内存设置上限，
防止任何单个 Pod 消耗过多资源而可能导致其他 Pod 资源不足。
当未设置资源请求和限制时：&lt;/p&gt;
&lt;!--
 1. Resource Starvation: Pods may get insufficient resources, leading to degraded performance or failures. This is because Kubernetes schedules pods based on these requests. Without them, the scheduler might place too many pods on a single node, leading to resource contention and performance bottlenecks.
 2. Resource Hoarding: Conversely, without limits, a pod might consume more than its fair share of resources, impacting the performance and stability of other pods on the same node. This can lead to issues such as other pods getting evicted or killed by the Out-Of-Memory (OOM) killer due to lack of available memory.
--&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;资源不足&lt;/strong&gt;：Pod 可能未获得足够的资源，导致性能下降或运行失败。
这是因为 Kubernetes 根据这些请求来调度 Pod。
没有这些请求，调度器可能会在单个节点上放置过多的 Pod，导致资源竞争和性能瓶颈。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;资源囤积&lt;/strong&gt;：相反，没有设置限制值时，一个 Pod 可能会消耗过多的资源，
影响同一节点上其他 Pod 的性能和稳定性。
这可能导致其他 Pod 因内存不足而被驱逐或被 OOM（Out-Of-Memory）强制终止。&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
### How to avoid it:
- Start with modest `requests` (for example `100m` CPU, `128Mi` memory) and see how your app behaves.
- Monitor real-world usage and refine your values; the [HorizontalPodAutoscaler](/docs/tasks/run-application/horizontal-pod-autoscale/) can help automate scaling based on metrics.
- Keep an eye on `kubectl top pods` or your logging/monitoring tool to confirm you&#39;re not over- or under-provisioning.
--&gt;
&lt;h3 id=&#34;如何避免&#34;&gt;如何避免：&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;从适度的 &lt;code&gt;requests&lt;/code&gt; 开始（例如 &lt;code&gt;100m&lt;/code&gt; CPU、&lt;code&gt;128Mi&lt;/code&gt; 内存），观察应用的行为。&lt;/li&gt;
&lt;li&gt;监控实际使用情况并优化你的值；&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/run-application/horizontal-pod-autoscale/&#34;&gt;Pod 水平自动扩缩&lt;/a&gt;
可以帮助基于指标自动扩缩容。&lt;/li&gt;
&lt;li&gt;关注 &lt;code&gt;kubectl top pods&lt;/code&gt; 或你的日志/监控工具，
确认你没有过多或过少地配置资源。&lt;/li&gt;
&lt;/ul&gt;
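&lt;p&gt;上面建议的起始值可以写成如下的容器资源配置。
这只是一个示意片段，其中的容器名与镜像均为假设，并非原文内容，
上限值需按你的实际观测结果调整：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;containers:
- name: app                 # 假设的容器名
  image: nginx:1.27         # 示例镜像
  resources:
    requests:
      cpu: 100m             # 适度的起始请求值
      memory: 128Mi
    limits:
      cpu: 500m             # 上限值需按实际观测结果调整
      memory: 256Mi
&lt;/code&gt;&lt;/pre&gt;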
&lt;!--
**My reality check**: Early on, I never thought about memory limits. Things seemed fine on my local cluster. Then, on a larger environment, Pods got *OOMKilled* left and right. Lesson learned.
For detailed instructions on configuring resource requests and limits for your containers, please refer to [Assign Memory Resources to Containers and Pods](/docs/tasks/configure-pod-container/assign-memory-resource/)
(part of the official Kubernetes documentation).
--&gt;
&lt;p&gt;&lt;strong&gt;我的经验教训&lt;/strong&gt;：早期，我从未考虑过内存限制。
在我的本地集群上一切看起来都很好。
后来，在更大的环境中，Pod 被 &lt;strong&gt;OOMKilled&lt;/strong&gt;（内存不足终止）的情况比比皆是。
教训深刻。有关为容器配置资源请求和限制的详细说明，
请参阅&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/configure-pod-container/assign-memory-resource/&#34;&gt;为容器和 Pod 分配内存资源&lt;/a&gt;
（Kubernetes 官方文档的一部分）。&lt;/p&gt;
&lt;!--
## 2. Underestimating liveness and readiness probes
--&gt;
&lt;h2 id=&#34;2-underestimating-liveness-and-readiness-probes&#34;&gt;2. 低估了存活探针和就绪态探针的重要性&lt;/h2&gt;
&lt;!--
**The pitfall**: Deploying containers without explicitly defining how Kubernetes should check their health or readiness. This tends to happen because Kubernetes will consider a container &#34;running&#34; as long as the process inside hasn&#39;t exited. Without additional signals, Kubernetes assumes the workload is functioning—even if the application inside is unresponsive, initializing, or stuck.
--&gt;
&lt;p&gt;&lt;strong&gt;常见误区&lt;/strong&gt;：部署容器时未明确定义 Kubernetes 应如何检查其健康状态或就绪状态。
之所以会出现这种情况，是因为只要容器内的进程未退出，Kubernetes 就认为容器“正在运行”。
在没有额外信号的情况下，Kubernetes 会假设工作负载运行正常——
即使内部的应用无响应、正在初始化或已卡住。&lt;/p&gt;
&lt;!--
**Context**:  
Liveness, readiness, and startup probes are mechanisms Kubernetes uses to monitor container health and availability. 

- **Liveness probes** determine if the application is still alive. If a liveness check fails, the container is restarted.
- **Readiness probes** control whether a container is ready to serve traffic. Until the readiness probe passes, the container is removed from Service endpoints.
- **Startup probes** help distinguish between long startup times and actual failures.
--&gt;
&lt;p&gt;&lt;strong&gt;背景&lt;/strong&gt;：
存活态、就绪态和启动探针是 Kubernetes 用来监控容器健康状态和可用性的机制。&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;存活态探针&lt;/strong&gt;确定应用是否仍然存活。如果存活态检查失败，容器会被重启。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;就绪态探针&lt;/strong&gt;控制容器是否准备好接收流量。
在就绪态探针通过之前，容器会从 Service 端点中移除。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;启动探针&lt;/strong&gt;帮助区分长时间启动和实际故障。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
### How to avoid it:
- Add a simple HTTP `livenessProbe` to check a health endpoint (for example `/healthz`) so Kubernetes can restart a hung container.
- Use a `readinessProbe` to ensure traffic doesn&#39;t reach your app until it&#39;s warmed up.
- Keep probes simple. Overly complex checks can create false alarms and unnecessary restarts.
--&gt;
&lt;h3 id=&#34;如何避免-1&#34;&gt;如何避免：&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;添加一个简单的 HTTP &lt;code&gt;livenessProbe&lt;/code&gt; 来检查健康端点（例如 &lt;code&gt;/healthz&lt;/code&gt;），
以便 Kubernetes 可以重启挂起的容器。&lt;/li&gt;
&lt;li&gt;使用 &lt;code&gt;readinessProbe&lt;/code&gt; 确保在应用预热完成之前流量不会到达应用。&lt;/li&gt;
&lt;li&gt;保持探针简单。过于复杂的检查可能会产生误报和不必要的重启。&lt;/li&gt;
&lt;/ul&gt;
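&lt;p&gt;上述探针可以用几行配置表达。
下面是一个示意片段，其中 &lt;code&gt;/healthz&lt;/code&gt; 与 &lt;code&gt;/ready&lt;/code&gt; 路径以及端口均为假设，
需替换为你的应用实际暴露的端点：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;livenessProbe:
  httpGet:
    path: /healthz          # 假设的健康端点
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready            # 假设的就绪端点
    port: 8080
  periodSeconds: 5
&lt;/code&gt;&lt;/pre&gt;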
&lt;!--
**My reality check**: I once forgot a readiness probe for a web service that took a while to load. Users hit it prematurely, got weird timeouts, and I spent hours scratching my head. A 3-line readiness probe would have saved the day.
--&gt;
&lt;p&gt;&lt;strong&gt;我的经验教训&lt;/strong&gt;：我曾经忘记为一个需要一段时间才能加载的 Web 服务设置就绪态探针。
用户过早访问了它，遇到了奇怪的超时，我花了几个小时才找到问题。
一个 3 行的就绪态探针就能解决这个问题。&lt;/p&gt;
&lt;!--
For comprehensive instructions on configuring liveness, readiness, and startup probes for containers, please refer to [Configure Liveness, Readiness and Startup Probes](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
in the official Kubernetes documentation.
--&gt;
&lt;p&gt;有关为容器配置存活态、就绪态和启动探针的全面说明，
请参阅 Kubernetes 官方文档中的&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/&#34;&gt;配置存活、就绪和启动探针&lt;/a&gt;。&lt;/p&gt;
&lt;!--
## 3. &#34;We&#39;ll just look at container logs&#34; (famous last words)
--&gt;
&lt;h2 id=&#34;3-well-just-look-at-container-logs-famous-last-words&#34;&gt;3. “我们只需要查看容器日志”（著名的遗言）&lt;/h2&gt;
&lt;!--
**The pitfall**: Relying solely on container logs retrieved via `kubectl logs`. This often happens because the command is quick and convenient, and in many setups, logs appear accessible during development or early troubleshooting. However, `kubectl logs` only retrieves logs from currently running or recently terminated containers, and those logs are stored on the node&#39;s local disk. As soon as the container is deleted, evicted, or the node is restarted, the log files may be rotated out or permanently lost.
--&gt;
&lt;p&gt;&lt;strong&gt;常见误区&lt;/strong&gt;：仅依赖通过 &lt;code&gt;kubectl logs&lt;/code&gt; 检索的容器日志。
这通常是因为该命令既快速又便捷，
而且在许多环境中，日志在开发或早期故障排除阶段看起来随时都可以访问。
然而，&lt;code&gt;kubectl logs&lt;/code&gt; 只能从当前正在运行或最近终止的容器中检索日志，
这些日志存储在节点的本地磁盘上。
一旦容器被删除、驱逐或节点重启，日志文件可能会被轮换掉或永久丢失。&lt;/p&gt;
&lt;!--
### How to avoid it:
- **Centralize logs** using CNCF tools like [Fluentd](https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent) or [Fluent Bit](https://fluentbit.io/) to aggregate output from all Pods.
- **Adopt OpenTelemetry** for a unified view of logs, metrics, and (if needed) traces. This lets you spot correlations between infrastructure events and app-level behavior.
- **Pair logs with Prometheus metrics** to track cluster-level data alongside application logs. If you need distributed tracing, consider CNCF projects like [Jaeger](https://www.jaegertracing.io/).
--&gt;
&lt;h3 id=&#34;如何避免-2&#34;&gt;如何避免：&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;集中化日志&lt;/strong&gt;：使用 CNCF 工具如 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent&#34;&gt;Fluentd&lt;/a&gt;
或 &lt;a href=&#34;https://fluentbit.io/&#34;&gt;Fluent Bit&lt;/a&gt; 来聚合所有 Pod 的输出。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;采用 OpenTelemetry&lt;/strong&gt;：用于构造日志、指标和（如果需要）追踪的统一视图。
这让你能够发现基础设施事件和应用级行为之间的关联。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;将日志与 Prometheus 指标对应起来&lt;/strong&gt;：与应用日志同时跟踪集群级数据。
如果你需要分布式追踪，可以考虑 &lt;a href=&#34;https://www.jaegertracing.io/&#34;&gt;Jaeger&lt;/a&gt; 这类 CNCF 项目。&lt;/li&gt;
&lt;/ul&gt;
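&lt;p&gt;作为集中化日志的一个最小示意，下面是一段 Fluent Bit 经典格式的配置草图。
其中的输出目标 &lt;code&gt;elasticsearch.logging.svc&lt;/code&gt; 为假设，并非原文内容，
需替换为你的实际日志后端：&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[INPUT]
    Name   tail
    Path   /var/log/containers/*.log
    Tag    kube.*

[FILTER]
    Name   kubernetes
    Match  kube.*

[OUTPUT]
    # 假设的后端地址，需替换为你的实际日志后端
    Name   es
    Match  *
    Host   elasticsearch.logging.svc
    Port   9200
&lt;/code&gt;&lt;/pre&gt;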
&lt;!--
**My reality check**: The first time I lost Pod logs to a quick restart, I realized how flimsy &#34;kubectl logs&#34; can be on its own. Since then, I&#39;ve set up a proper pipeline for every cluster to avoid missing vital clues.
--&gt;
&lt;p&gt;&lt;strong&gt;我的经验教训&lt;/strong&gt;：第一次因为快速重启而丢失 Pod 日志时，
我意识到仅靠 &lt;code&gt;kubectl logs&lt;/code&gt; 是多么不可靠。
从那时起，我为每个集群都搭建了完整的日志采集管道，
以避免错过任何关键线索。&lt;/p&gt;
&lt;!--
## 4. Treating dev and prod exactly the same
--&gt;
&lt;h2 id=&#34;4-treating-dev-and-prod-exactly-the-same&#34;&gt;4. 将开发环境和生产环境视为完全相同&lt;/h2&gt;
&lt;!--
**The pitfall**: Deploying the same Kubernetes manifests with identical settings across development, staging, and production environments. This often occurs when teams aim for consistency and reuse, but overlook that environment-specific factors—such as traffic patterns, resource availability, scaling needs, or access control—can differ significantly. Without customization, configurations optimized for one environment may cause instability, poor performance, or security gaps in another.
--&gt;
&lt;p&gt;&lt;strong&gt;常见误区&lt;/strong&gt;：在开发、预发布和生产环境中使用带有相同设置的同一套 Kubernetes 清单进行部署。
这通常发生在团队追求一致性和复用、
却忽略了环境特定因素——如流量模式、资源可用性、扩缩容需求或访问控制——
可能存在显著差异的时候。
如果不做定制，针对某一环境优化的配置可能会在另一环境中导致不稳定、
性能不佳或安全漏洞。&lt;/p&gt;
&lt;!--
### How to avoid it:
- Use environment overlays or [kustomize](https://kustomize.io/) to maintain a shared base while customizing resource requests, replicas, or config for each environment.
- Extract environment-specific configuration into ConfigMaps and / or Secrets. You can use a specialized tool such as [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) to manage confidential data.
- Plan for scale in production. Your dev cluster can probably get away with minimal CPU/memory, but prod might need significantly more.
--&gt;
&lt;h3 id=&#34;如何避免-3&#34;&gt;如何避免：&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;使用环境覆盖层或 &lt;a href=&#34;https://kustomize.io/&#34;&gt;kustomize&lt;/a&gt; 来维护共享基础，
同时为每个环境定制资源请求、副本数或配置。&lt;/li&gt;
&lt;li&gt;将环境特定的配置提取到 ConfigMap 和/或 Secret 中。
你可以使用专门的工具如 &lt;a href=&#34;https://github.com/bitnami-labs/sealed-secrets&#34;&gt;Sealed Secrets&lt;/a&gt; 来管理机密数据。&lt;/li&gt;
&lt;li&gt;为生产环境中的扩缩需求做规划。
你的开发集群可能只需要最少的 CPU/内存，但生产环境可能需要显著更多。&lt;/li&gt;
&lt;/ul&gt;
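&lt;p&gt;一个常见的做法是维护共享的 base 目录，并为每个环境维护一个覆盖层目录。
下面是生产环境覆盖层中 kustomization.yaml 的示意片段，
其中 &lt;code&gt;my-app&lt;/code&gt; 等名称与副本数均为假设，并非原文内容：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;# 示意：overlays/prod/kustomization.yaml
resources:
- ../../base
patches:
- patch: |-
    - op: replace
      path: /spec/replicas
      value: 5              # 生产环境使用更多副本
  target:
    kind: Deployment
    name: my-app            # 假设的 Deployment 名称
&lt;/code&gt;&lt;/pre&gt;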
&lt;!--
**My reality check**: One time, I scaled up `replicaCount` from 2 to 10 in a tiny dev environment just to &#34;test.&#34; I promptly ran out of resources and spent half a day cleaning up the aftermath. Oops.
--&gt;
&lt;p&gt;&lt;strong&gt;我的经验教训&lt;/strong&gt;：有一次，
我在一个很小的开发环境中将 &lt;code&gt;replicaCount&lt;/code&gt; 从 2 扩展到 10，只是为了&amp;quot;测试&amp;quot;。
我立即耗尽了资源，花了半天时间清理后果。&lt;/p&gt;
&lt;!--
## 5. Leaving old stuff floating around
--&gt;
&lt;h2 id=&#34;5-leaving-old-stuff-floating-around&#34;&gt;5. 遗留未清理的旧资源&lt;/h2&gt;
&lt;!--
**The pitfall**: Leaving unused or outdated resources—such as Deployments, Services, ConfigMaps, or PersistentVolumeClaims—running in the cluster. This often happens because Kubernetes does not automatically remove resources unless explicitly instructed, and there is no built-in mechanism to track ownership or expiration. Over time, these forgotten objects can accumulate, consuming cluster resources, increasing cloud costs, and creating operational confusion, especially when stale Services or LoadBalancers continue to route traffic.
--&gt;
&lt;p&gt;&lt;strong&gt;常见误区&lt;/strong&gt;：在集群中遗留未使用或过时的资源——例如 Deployment、Service、ConfigMap 或 PersistentVolumeClaim。
这种情况经常发生，因为 Kubernetes 不会自动删除资源，除非明确指示；
同时系统也没有内建机制来追踪资源的归属或过期时间。
随着时间推移，这些被遗忘的对象可能不断累积，
占用集群资源、增加云成本，并造成运维上的混乱，
尤其是在陈旧的 Service 或 LoadBalancer 仍持续转发流量的情况下。&lt;/p&gt;
&lt;!--
### How to avoid it:
- **Label everything** with a purpose or owner label. That way, you can easily query resources you no longer need.
- **Regularly audit** your cluster: run `kubectl get all -n &lt;namespace&gt;` to see what&#39;s actually running, and confirm it&#39;s all legit.
- **Adopt Kubernetes&#39; Garbage Collection**: [K8s docs](/docs/concepts/workloads/controllers/garbage-collection/) show how to remove dependent objects automatically.
- **Leverage policy automation**: Tools like [Kyverno](https://kyverno.io/) can automatically delete or block stale resources after a certain period, or enforce lifecycle policies so you don&#39;t have to remember every single cleanup step.
--&gt;
&lt;h3 id=&#34;如何避免-4&#34;&gt;如何避免：&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;为所有资源添加标签&lt;/strong&gt;：使用用途或所有者标签。
这样，你可以轻松查询不再需要的资源。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;定期审计集群&lt;/strong&gt;：运行 &lt;code&gt;kubectl get all -n &amp;lt;namespace&amp;gt;&lt;/code&gt; 查看实际运行的内容，
并确认它们确实都是仍然需要的。&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;采用 Kubernetes 的垃圾收集&lt;/strong&gt;：&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/garbage-collection/&#34;&gt;K8s 文档&lt;/a&gt;
展示了如何自动删除依赖对象。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;利用策略自动化&lt;/strong&gt;：像 &lt;a href=&#34;https://kyverno.io/&#34;&gt;Kyverno&lt;/a&gt; 这样的工具可以在一定时间后自动删除或阻止过期的资源，
或强制执行生命周期策略，这样你就不必记住每个清理步骤。&lt;/li&gt;
&lt;/ul&gt;
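&lt;p&gt;作为上述建议的一个小示例（其中的资源名与标签键值均为假设值），可以先给一次性实验资源打上归属标签，事后再按标签批量查找待清理的对象：&lt;/p&gt;

```shell
# 为一次性实验资源打上用途/所有者标签（名称与标签均为示例）
kubectl label service test-svc owner=alice purpose=hackathon
# 活动结束后，按标签列出所有待清理的资源
kubectl get all -l purpose=hackathon --all-namespaces
```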
&lt;!--
**My reality check**: After a hackathon, I forgot to tear down a &#34;test-svc&#34; pinned to an external load balancer. Three weeks later, I realized I&#39;d been paying for that load balancer the entire time. Facepalm.
--&gt;
&lt;p&gt;&lt;strong&gt;我的经验教训&lt;/strong&gt;：在一次黑客松活动结束后，
我忘记删除一个绑定到外部负载均衡器的 &amp;quot;test-svc&amp;quot;。
三周后我才意识到，这段时间我一直在为那个负载均衡器付费。&lt;/p&gt;
&lt;!--
## 6. Diving too deep into networking too soon
--&gt;
&lt;h2 id=&#34;6-diving-too-deep-into-networking-too-soon&#34;&gt;6. 过早深入复杂的网络配置&lt;/h2&gt;
&lt;!--
**The pitfall**: Introducing advanced networking solutions—such as service meshes, custom CNI plugins, or multi-cluster communication—before fully understanding Kubernetes&#39; native networking primitives. This commonly occurs when teams implement features like traffic routing, observability, or mTLS using external tools without first mastering how core Kubernetes networking works: including Pod-to-Pod communication, ClusterIP Services, DNS resolution, and basic ingress traffic handling. As a result, network-related issues become harder to troubleshoot, especially when overlays introduce additional abstractions and failure points.
--&gt;
&lt;p&gt;&lt;strong&gt;常见误区&lt;/strong&gt;：在完全理解 Kubernetes 原生网络原语之前引入高级网络解决方案——
如服务网格、自定义 CNI 插件或多集群通信。
这通常发生在团队使用外部工具实现流量路由、可观测性或 mTLS 等功能，
而没有首先掌握核心 Kubernetes 网络的工作原理：
包括 Pod 到 Pod 通信、ClusterIP Services、DNS 解析和基本 Ingress 流量处理。
因此，网络相关问题变得更难排查，
特别是当覆盖层引入额外的抽象和故障点时。&lt;/p&gt;
&lt;!--
### How to avoid it:

- Start small: a Deployment, a Service, and a basic ingress controller such as one based on NGINX (e.g., Ingress-NGINX).
- Make sure you understand how traffic flows within the cluster, how service discovery works, and how DNS is configured.
- Only move to a full-blown mesh or advanced CNI features when you actually need them; complex networking adds overhead.
--&gt;
&lt;h3 id=&#34;如何避免-5&#34;&gt;如何避免：&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;从简单开始：部署一个 Deployment、一个 Service，
以及一个基础的 Ingress 控制器（例如基于 NGINX 的 Ingress-NGINX）。&lt;/li&gt;
&lt;li&gt;确保理解集群内的流量流向、服务发现机制以及 DNS 的配置方式。&lt;/li&gt;
&lt;li&gt;仅在确实需要时再引入完整的服务网格或高级 CNI 功能，
因为复杂的网络架构会带来额外开销。&lt;/li&gt;
&lt;/ul&gt;
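&lt;p&gt;按照&amp;quot;从简单开始&amp;quot;的思路，下面是一个最小的 Ingress 清单示意（主机名、Service 名称和端口均为假设值，并假定集群中已安装 Ingress-NGINX）：&lt;/p&gt;

```yaml
# 最小的 Ingress 示例（所有名称均为假设值）
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app   # 指向集群内已有的 Service
                port:
                  number: 80
```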
&lt;!--
**My reality check**: I tried Istio on a small internal app once, then spent more time debugging Istio itself than the actual app. Eventually, I stepped back, removed Istio, and everything worked fine.
--&gt;
&lt;p&gt;&lt;strong&gt;我的经验教训&lt;/strong&gt;：我曾经在一个小的内部应用上尝试 Istio，
然后花在调试 Istio 本身上的时间比调试实际应用还多。
最终，我退了一步，移除了 Istio，一切运行正常。&lt;/p&gt;
&lt;!--
## 7. Going too light on security and RBAC
--&gt;
&lt;h2 id=&#34;7-going-too-light-on-security-and-rbac&#34;&gt;7. 对安全性和基于角色的访问控制 (RBAC) 重视不足&lt;/h2&gt;
&lt;!--
**The pitfall**: Deploying workloads with insecure configurations, such as running containers as the root user, using the `latest` image tag, disabling security contexts, or assigning overly broad RBAC roles like `cluster-admin`. These practices persist because Kubernetes does not enforce strict security defaults out of the box, and the platform is designed to be flexible rather than opinionated. Without explicit security policies in place, clusters can remain exposed to risks like container escape, unauthorized privilege escalation, or accidental production changes due to unpinned images.
--&gt;
&lt;p&gt;&lt;strong&gt;常见误区&lt;/strong&gt;：以不安全的方式配置部署工作负载，例如以 root 用户运行容器、使用 &lt;code&gt;latest&lt;/code&gt; 镜像标签、
禁用安全上下文（security context），或分配过于宽泛的 RBAC 角色（如 &lt;code&gt;cluster-admin&lt;/code&gt;）。
这些做法之所以普遍存在，是因为 Kubernetes 默认并不会强制实施严格的安全策略——
该平台在设计上追求灵活性而非强约束性。如果未显式配置安全策略，集群可能面临容器逃逸、
未经授权的权限提升或由于未固定镜像导致的意外生产变更等风险。&lt;/p&gt;
&lt;!--
### How to avoid it:

- Use [RBAC](/docs/reference/access-authn-authz/rbac/) to define roles and permissions within Kubernetes. While RBAC is the default and most widely supported authorization mechanism, Kubernetes also allows the use of alternative authorizers. For more advanced or external policy needs, consider solutions like [OPA Gatekeeper](https://open-policy-agent.github.io/gatekeeper/) (based on Rego), [Kyverno](https://kyverno.io/), or custom webhooks using policy languages such as CEL or [Cedar](https://cedarpolicy.com/).
- Pin images to specific versions (no more `:latest`!). This helps you know what&#39;s actually deployed.
- Look into [Pod Security Admission](/docs/concepts/security/pod-security-admission/) (or other solutions like Kyverno) to enforce non-root containers, read-only filesystems, etc.
--&gt;
&lt;h3 id=&#34;如何避免-6&#34;&gt;如何避免：&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;使用 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/access-authn-authz/rbac/&#34;&gt;RBAC&lt;/a&gt; 定义在 Kubernetes 中的角色和权限。
虽然 RBAC 是默认且最广泛支持的鉴权机制，Kubernetes 也允许使用替代性的鉴权组件。
对于更高级或外部策略需求，可以考虑 &lt;a href=&#34;https://open-policy-agent.github.io/gatekeeper/&#34;&gt;OPA Gatekeeper&lt;/a&gt;（基于 Rego）、
&lt;a href=&#34;https://kyverno.io/&#34;&gt;Kyverno&lt;/a&gt; 或使用 CEL 或 &lt;a href=&#34;https://cedarpolicy.com/&#34;&gt;Cedar&lt;/a&gt; 等策略语言的自定义 Webhook 等解决方案。&lt;/li&gt;
&lt;li&gt;将镜像固定到特定版本（不要再使用 &lt;code&gt;:latest&lt;/code&gt;！）。这有助于你了解实际部署的内容。&lt;/li&gt;
&lt;li&gt;查看 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/security/pod-security-admission/&#34;&gt;Pod 安全准入&lt;/a&gt;（或 Kyverno 等其他解决方案）
以强制执行非 root 容器、只读文件系统等。&lt;/li&gt;
&lt;/ul&gt;
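&lt;p&gt;下面是一个遵循最小权限原则的 RBAC 示意（命名空间、角色名和用户名均为假设值）：&lt;/p&gt;

```yaml
# 只授予读取 Pod 权限的 Role 与 RoleBinding（名称均为示例）
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev
  name: read-pods
subjects:
  - kind: User
    name: alice
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```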
&lt;!--
**My reality check**: I never had a huge security breach, but I&#39;ve heard plenty of cautionary tales. If you don&#39;t tighten things up, it&#39;s only a matter of time before something goes wrong.
--&gt;
&lt;p&gt;&lt;strong&gt;我的经验教训&lt;/strong&gt;：我从未遇到过巨大的安全漏洞，但我听过很多警示故事。
如果你不加强安全措施，出问题只是时间问题。&lt;/p&gt;
&lt;!--
## Final thoughts
--&gt;
&lt;h2 id=&#34;final-thoughts&#34;&gt;最后的话&lt;/h2&gt;
&lt;!--
Kubernetes is amazing, but it&#39;s not psychic, it won&#39;t magically do the right thing if you don&#39;t tell it what you need. By keeping these pitfalls in mind, you&#39;ll avoid a lot of headaches and wasted time. Mistakes happen (trust me, I&#39;ve made my share), but each one is a chance to learn more about how Kubernetes truly works under the hood.
If you&#39;re curious to dive deeper, the [official docs](/docs/home/) and the [community Slack](http://slack.kubernetes.io/) are excellent next steps. And of course, feel free to share your own horror stories or success tips, because at the end of the day, we&#39;re all in this cloud native adventure together.
--&gt;
&lt;p&gt;Kubernetes 非常强大，但它并非全知全能——如果你不明确告知它你的需求，它不会&amp;quot;神奇地&amp;quot;自动做出正确的决策。
牢记这些常见误区，你就能避免许多麻烦和时间浪费。错误在所难免（相信我，我也犯过不少），但每一次失误，
都是深入理解 Kubernetes 内部工作机制的机会。如果你希望进一步探索，
可以查阅&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/home/&#34;&gt;官方文档&lt;/a&gt;或加入&lt;a href=&#34;http://slack.kubernetes.io/&#34;&gt;社区 Slack&lt;/a&gt;。
当然，也欢迎你分享自己的&amp;quot;踩坑经历&amp;quot;或成功经验——毕竟，在云原生这场旅程中，我们都在同行。&lt;/p&gt;
&lt;!--
**Happy Shipping!**
--&gt;
&lt;p&gt;&lt;strong&gt;祝你部署顺利！&lt;/strong&gt;&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.34：从存储卷扩展失效中恢复（GA）</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/09/19/kubernetes-v1-34-recover-expansion-failure/</link>
      <pubDate>Fri, 19 Sep 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/09/19/kubernetes-v1-34-recover-expansion-failure/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.34: Recovery From Volume Expansion Failure (GA)&#34;
date: 2025-09-19T10:30:00-08:00
slug: kubernetes-v1-34-recover-expansion-failure
author: &gt;
  [Hemant Kumar](https://github.com/gnufied) (Red Hat)
--&gt;
&lt;!--
Have you ever made a typo when expanding your persistent volumes in Kubernetes? Meant to specify `2TB`
but specified `20TiB`? This seemingly innocuous problem was kinda hard to fix - and took the project almost 5 years to fix.
[Automated recovery from storage expansion](/docs/concepts/storage/persistent-volumes/#recovering-from-failure-when-expanding-volumes) has been around for a while in beta; however, with the v1.34 release, we have graduated this to
**general availability**.
--&gt;
&lt;p&gt;你是否曾经在扩展 Kubernetes 中的持久卷时犯过拼写错误？本来想指定 &lt;code&gt;2TB&lt;/code&gt; 却写成了 &lt;code&gt;20TiB&lt;/code&gt;？
这个看似无害的问题实际上很难修复——项目花了将近 5 年时间才解决。
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/storage/persistent-volumes/#recovering-from-failure-when-expanding-volumes&#34;&gt;存储扩展的自动恢复&lt;/a&gt;
此特性以 Beta 状态提供已有一段时间；不过随着 v1.34 版本的发布，我们已将其提升到&lt;strong&gt;正式发布&lt;/strong&gt;（GA）状态。&lt;/p&gt;
&lt;!--
While it was always possible to recover from failing volume expansions manually, it usually required cluster-admin access and was tedious to do (See aforementioned link for more information).
--&gt;
&lt;p&gt;虽然手动从失败的卷扩展中恢复总是可能的，但这通常需要集群管理员权限，而且操作繁琐（更多信息请参见上述链接）。&lt;/p&gt;
&lt;!--
What if you make a mistake and then realize immediately?
With Kubernetes v1.34, you should be able to reduce the requested size of the PersistentVolumeClaim (PVC) and, as long as the expansion to previously requested
size hadn&#39;t finished, you can amend the size requested. Kubernetes will
automatically work to correct it. Any quota consumed by failed expansion will be returned to the user and the associated PersistentVolume should be resized to the
latest size you specified.
--&gt;
&lt;p&gt;如果你在申请存储时不小心填错了大小，并且立刻发现了这个错误怎么办？
在 Kubernetes v1.34 中，你可以&lt;strong&gt;降低 PersistentVolumeClaim（PVC）请求的存储大小&lt;/strong&gt;，只要上一次扩容操作还未完成，
就可以修改为新的大小。
Kubernetes 会自动进行修正，归还因扩容失败而暂时占用的配额，并将关联的 PersistentVolume 调整为你最新指定的大小。&lt;/p&gt;
&lt;!--
I&#39;ll walk through an example of how all of this works.
--&gt;
&lt;p&gt;我将通过一个示例来演示这一切是如何工作的。&lt;/p&gt;
&lt;!--
## Reducing PVC size to recover from failed expansion
--&gt;
&lt;h2 id=&#34;通过降低-pvc-尺寸完成从失败的扩展操作中恢复&#34;&gt;通过降低 PVC 尺寸完成从失败的扩展操作中恢复&lt;/h2&gt;
&lt;!--
Imagine that you are running out of disk space for one of your database servers, and you want to expand the PVC from previously
specified `10TB` to `100TB` - but you make a typo and specify `1000TB`.
--&gt;
&lt;p&gt;想象一下，你的某个数据库服务器磁盘空间不足，
你想将 PVC 从之前指定的 &lt;code&gt;10TB&lt;/code&gt; 扩展到 &lt;code&gt;100TB&lt;/code&gt;——但你犯了一个拼写错误，指定了 &lt;code&gt;1000TB&lt;/code&gt;。&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;PersistentVolumeClaim&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myclaim&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;accessModes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- ReadWriteOnce&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resources&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;requests&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;storage&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;1000TB&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 新的大小配置，但不正确！&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
Now, you may be out of disk space on your disk array or simply ran out of allocated quota on your cloud-provider. But, assume that expansion to `1000TB` is never going to succeed.
--&gt;
&lt;p&gt;现在，你的磁盘阵列可能空间不足，或者云平台所分配的配额已用完。
不管怎样，我们先来假设扩展到 &lt;code&gt;1000TB&lt;/code&gt; 的操作永远不会成功。&lt;/p&gt;
&lt;!--
In Kubernetes v1.34, you can simply correct your mistake and request a new PVC size,
that is smaller than the mistake, provided it is still larger than the original size
of the actual PersistentVolume.
--&gt;
&lt;p&gt;在 Kubernetes v1.34 中，你可以轻松地修正错误，重新请求一个新的 PVC 尺寸，令该尺寸比之前错误请求的更小，
但前提是它&lt;strong&gt;仍需大于最初 PersistentVolume 的实际尺寸&lt;/strong&gt;。&lt;/p&gt;
&lt;!--
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100TB # Corrected size; has to be greater than 10TB.
                     # You cannot shrink the volume below its actual size.
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;PersistentVolumeClaim&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myclaim&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;accessModes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- ReadWriteOnce&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resources&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;requests&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;storage&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;100TB&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 更正后的大小；必须大于 10TB。&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;                     &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 你不能将卷缩小到其实际大小以下。&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This requires no admin intervention. Even better, any surplus Kubernetes quota that you temporarily consumed will be automatically returned.
--&gt;
&lt;p&gt;这不需要管理员干预。更好的是，你临时消耗的任何多余 Kubernetes 配额都会被自动归还。&lt;/p&gt;
&lt;!--
This fault recovery mechanism does have a caveat: whatever new size you specify for the PVC, it **must** be still higher than the original size in `.status.capacity`.
Since Kubernetes doesn&#39;t support shrinking your PV objects, you can never go below the size that was originally allocated for your PVC request.
--&gt;
&lt;p&gt;这个故障恢复机制有一点需要注意：无论你为 PVC 指定的新尺寸是多少，
它&lt;strong&gt;必须&lt;/strong&gt;仍然高于 &lt;code&gt;.status.capacity&lt;/code&gt; 中的原始大小。
由于 Kubernetes 不支持缩小 PV 对象，你永远无法将尺寸降到最初为 PVC 请求所分配的大小以下。&lt;/p&gt;
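&lt;p&gt;如果不想编辑整个清单，也可以用一条命令来修正请求的大小（&lt;code&gt;myclaim&lt;/code&gt; 沿用上文示例中的 PVC 名称）：&lt;/p&gt;

```shell
# 将 PVC 请求的存储从错误的 1000TB 降回 100TB
# 新值必须仍大于 .status.capacity 中的实际大小
kubectl patch pvc myclaim --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"100TB"}}}}'
```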
&lt;!--
## Improved error handling and observability of volume expansion
--&gt;
&lt;h2 id=&#34;卷扩展操作的错误处理和可观测性提升&#34;&gt;卷扩展操作的错误处理和可观测性提升&lt;/h2&gt;
&lt;!--
Implementing what might look like a relatively minor change also required us to almost
fully redo how volume expansion works under the hood in Kubernetes.
There are new API fields available in PVC objects which you can monitor to observe progress of volume expansion.
--&gt;
&lt;p&gt;即便看似相对较小的更改，也需要我们几乎完全重新实现 Kubernetes 中卷扩展操作的底层工作方式。
PVC 对象中有新的 API 字段可供你监控以观察卷扩展的进度。&lt;/p&gt;
&lt;!--
### Improved observability of in-progress expansion
--&gt;
&lt;h3 id=&#34;对进行中扩展的可观测性改进&#34;&gt;对进行中扩展的可观测性改进&lt;/h3&gt;
&lt;!--
You can query `.status.allocatedResourceStatus[&#39;storage&#39;]` of a PVC to monitor progress of a volume expansion operation.
For a typical block volume, this should transition between `ControllerResizeInProgress`, `NodeResizePending` and `NodeResizeInProgress` and become nil/empty when volume expansion has finished.

If for some reason, volume expansion to requested size is not feasible it should accordingly be in states like - `ControllerResizeInfeasible` or `NodeResizeInfeasible`.

You can also observe size towards which Kubernetes is working by watching `pvc.status.allocatedResources`.
--&gt;
&lt;p&gt;你可以查询 PVC 的 &lt;code&gt;.status.allocatedResourceStatus[&#39;storage&#39;]&lt;/code&gt; 来监控卷扩展操作的进度。
对于典型的块卷，字段值应该在 &lt;code&gt;ControllerResizeInProgress&lt;/code&gt;、&lt;code&gt;NodeResizePending&lt;/code&gt; 和 &lt;code&gt;NodeResizeInProgress&lt;/code&gt; 之间转换，
并在卷扩展完成时变为 nil（空）。&lt;/p&gt;
&lt;p&gt;如果由于某种原因，无法将卷扩展到请求的尺寸，这一字段应该处于对应的 &lt;code&gt;ControllerResizeInfeasible&lt;/code&gt; 或 &lt;code&gt;NodeResizeInfeasible&lt;/code&gt; 等状态。&lt;/p&gt;
&lt;p&gt;你还可以通过观察 &lt;code&gt;pvc.status.allocatedResources&lt;/code&gt; 来观察 Kubernetes 正在处理的大小。&lt;/p&gt;
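&lt;p&gt;例如，可以用 &lt;code&gt;kubectl&lt;/code&gt; 的 JSONPath 输出来查看这些字段（PVC 名称 &lt;code&gt;myclaim&lt;/code&gt; 为示例值）：&lt;/p&gt;

```shell
# 查看当前的扩容状态，以及 Kubernetes 正在处理的目标大小
kubectl get pvc myclaim -o jsonpath='{.status.allocatedResourceStatus.storage}{"\n"}'
kubectl get pvc myclaim -o jsonpath='{.status.allocatedResources.storage}{"\n"}'
```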
&lt;!--
### Improved error handling and reporting
--&gt;
&lt;h3 id=&#34;改进的错误处理和报告&#34;&gt;改进的错误处理和报告&lt;/h3&gt;
&lt;!--
Kubernetes should now retry your failed volume expansions at a slower rate, and it should make fewer requests to both the storage system and the Kubernetes apiserver.

Errors observed during volume expansion are now reported as conditions on PVC objects and, unlike events, should persist. Kubernetes will now populate `pvc.status.conditions` with error keys `ControllerResizeError` or `NodeResizeError` when volume expansion fails.
--&gt;
&lt;p&gt;Kubernetes 现在会以较慢的速率重试失败的卷扩展操作，从而减少对存储系统和 Kubernetes apiserver 的请求。&lt;/p&gt;
&lt;p&gt;卷扩展期间观察到的错误现在会以状况（condition）的形式报告在 PVC 对象上；与事件不同，这些状况会持久保留。当卷扩展失败时，
Kubernetes 会用错误键 &lt;code&gt;ControllerResizeError&lt;/code&gt; 或 &lt;code&gt;NodeResizeError&lt;/code&gt; 填充 &lt;code&gt;pvc.status.conditions&lt;/code&gt;。&lt;/p&gt;
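&lt;p&gt;要查看这些状况，可以列出 PVC 的 &lt;code&gt;status.conditions&lt;/code&gt;（PVC 名称 &lt;code&gt;myclaim&lt;/code&gt; 为示例值）：&lt;/p&gt;

```shell
# 列出 PVC 上与扩容失败相关的状况类型和消息
kubectl get pvc myclaim \
  -o jsonpath='{range .status.conditions[*]}{.type}: {.message}{"\n"}{end}'
```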
&lt;!--
### Fixes long standing bugs in resizing workflows
--&gt;
&lt;h3 id=&#34;修复调整大小工作流中的长期错误&#34;&gt;修复调整大小工作流中的长期错误&lt;/h3&gt;
&lt;!--
This feature also has allowed us to fix long standing bugs in resizing workflow such as [Kubernetes issue #115294](https://github.com/kubernetes/kubernetes/issues/115294).
If you observe anything broken, please report your bugs to [https://github.com/kubernetes/kubernetes/issues](https://github.com/kubernetes/kubernetes/issues/new/choose), along with details about how to reproduce the problem.
--&gt;
&lt;p&gt;此功能还让我们得以修复调整大小工作流中长期存在的若干错误，例如 &lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/115294&#34;&gt;Kubernetes issue #115294&lt;/a&gt;。
如果你观察到任何问题，请将你所发现的错误及如何重现问题的详细信息报告到 &lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/new/choose&#34;&gt;https://github.com/kubernetes/kubernetes/issues&lt;/a&gt;。&lt;/p&gt;
&lt;!--
Working on this feature through its lifecycle was challenging and it wouldn&#39;t have been possible to reach GA
without feedback from [@msau42](https://github.com/msau42), [@jsafrane](https://github.com/jsafrane) and [@xing-yang](https://github.com/xing-yang).
--&gt;
&lt;p&gt;此功能的整个开发周期中充满挑战，如果没有 &lt;a href=&#34;https://github.com/msau42&#34;&gt;@msau42&lt;/a&gt;、&lt;a href=&#34;https://github.com/jsafrane&#34;&gt;@jsafrane&lt;/a&gt; 和 &lt;a href=&#34;https://github.com/xing-yang&#34;&gt;@xing-yang&lt;/a&gt; 的反馈，
就不可能达到正式发布状态。&lt;/p&gt;
&lt;!--
All of the contributors who worked on this also appreciate the input provided by [@thockin](https://github.com/thockin) and [@liggitt](https://github.com/liggitt) at various Kubernetes contributor summits.
--&gt;
&lt;p&gt;感谢所有参与此功能开发的贡献者，同时也感谢 &lt;a href=&#34;https://github.com/thockin&#34;&gt;@thockin&lt;/a&gt;
和 &lt;a href=&#34;https://github.com/liggitt&#34;&gt;@liggitt&lt;/a&gt; 在各种 Kubernetes 贡献者峰会上提供的意见。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.34: 将卷组快照推进至 v1beta2 阶段</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/09/16/kubernetes-v1-34-volume-group-snapshot-beta-2/</link>
      <pubDate>Tue, 16 Sep 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/09/16/kubernetes-v1-34-volume-group-snapshot-beta-2/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.34: Moving Volume Group Snapshots to v1beta2&#34;
date: 2025-09-16T10:30:00-08:00
slug: kubernetes-v1-34-volume-group-snapshot-beta-2
author: &gt;
   Xing Yang (VMware by Broadcom)
--&gt;
&lt;!--
Volume group snapshots were [introduced](/blog/2023/05/08/kubernetes-1-27-volume-group-snapshot-alpha/)
as an Alpha feature with the Kubernetes 1.27 release and moved to [Beta](/blog/2024/12/18/kubernetes-1-32-volume-group-snapshot-beta/) in the Kubernetes 1.32 release.
The recent release of Kubernetes v1.34 moved that support to a second beta.
The support for volume group snapshots relies on a set of
[extension APIs for group snapshots](https://kubernetes-csi.github.io/docs/group-snapshot-restore-feature.html#volume-group-snapshot-apis).
These APIs allow users to take crash consistent snapshots for a set of volumes.
Behind the scenes, Kubernetes uses a label selector to group multiple PersistentVolumeClaims
for snapshotting.
A key aim is to allow you restore that set of snapshots to new volumes and
recover your workload based on a crash consistent recovery point.

This new feature is only supported for [CSI](https://kubernetes-csi.github.io/docs/) volume drivers.
--&gt;
&lt;p&gt;卷组快照在 Kubernetes 1.27 版本中作为 Alpha 特性被&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2023/05/08/kubernetes-1-27-volume-group-snapshot-alpha/&#34;&gt;引入&lt;/a&gt;，
并在 Kubernetes 1.32 版本中移至 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/12/18/kubernetes-1-32-volume-group-snapshot-beta/&#34;&gt;Beta&lt;/a&gt; 阶段。
Kubernetes v1.34 的最近一次发布将该支持移至第二个 Beta 阶段。
对卷组快照的支持依赖于一组&lt;a href=&#34;https://kubernetes-csi.github.io/docs/group-snapshot-restore-feature.html#volume-group-snapshot-apis&#34;&gt;用于组快照的扩展 API&lt;/a&gt;。
这些 API 允许用户为一组卷获取崩溃一致性快照。在后台，Kubernetes 根据标签选择器对多个
PersistentVolumeClaim 分组，并进行快照操作。关键目标是允许你将这组快照恢复到新卷上，
并基于崩溃一致性恢复点恢复工作负载。&lt;/p&gt;
&lt;p&gt;此新特性仅支持 &lt;a href=&#34;https://kubernetes-csi.github.io/docs/&#34;&gt;CSI&lt;/a&gt; 卷驱动。&lt;/p&gt;
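&lt;p&gt;组快照通过标签选择器来圈定要一起打快照的 PVC。下面是一个示意清单（类名、标签以及 v1beta2 这一 CRD 版本号均为假设值，请以实际安装的 CRD 为准）：&lt;/p&gt;

```yaml
# 为带有 app=my-app 标签的一组 PVC 创建崩溃一致性组快照（示意）
apiVersion: groupsnapshot.storage.k8s.io/v1beta2
kind: VolumeGroupSnapshot
metadata:
  name: my-group-snapshot
  namespace: default
spec:
  volumeGroupSnapshotClassName: csi-groupsnapclass   # 假设的类名
  source:
    selector:
      matchLabels:
        app: my-app    # 选中要一起打快照的 PVC
```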
&lt;!--
## What&#39;s new in Beta 2?

While testing the beta version, we encountered an [issue](https://github.com/kubernetes-csi/external-snapshotter/issues/1271) where the `restoreSize` field is not set for individual VolumeSnapshotContents and VolumeSnapshots if CSI driver does not implement the ListSnapshots RPC call.
We evaluated various options [here](https://docs.google.com/document/d/1LLBSHcnlLTaP6ZKjugtSGQHH2LGZPndyfnNqR1YvzS4/edit?tab=t.0) and decided to make this change releasing a new beta for the API.
--&gt;
&lt;h2 id=&#34;beta-2-的新内容&#34;&gt;Beta 2 的新内容&lt;/h2&gt;
&lt;p&gt;在测试 Beta 版本时，我们遇到了一个问题：如果 CSI 驱动未实现 ListSnapshots RPC 调用，
则对于单独的 VolumeSnapshotContent 和 VolumeSnapshot 来说，&lt;code&gt;restoreSize&lt;/code&gt; 字段不会被设置。
我们在&lt;a href=&#34;https://docs.google.com/document/d/1LLBSHcnlLTaP6ZKjugtSGQHH2LGZPndyfnNqR1YvzS4/edit?tab=t.0&#34;&gt;此处&lt;/a&gt;评估了各种可选方案，
并决定通过为该 API 发布一个新的 Beta 版本来落实这一变更。&lt;/p&gt;
&lt;!--
Specifically, a VolumeSnapshotInfo struct is added in v1beta2, it contains information for an individual volume snapshot that is a member of a volume group snapshot.
VolumeSnapshotInfoList, a list of VolumeSnapshotInfo, is added to VolumeGroupSnapshotContentStatus, replacing VolumeSnapshotHandlePairList.
VolumeSnapshotInfoList is a list of snapshot information returned by the CSI driver to identify snapshots on the storage system.
VolumeSnapshotInfoList is populated by the csi-snapshotter sidecar based on the CSI CreateVolumeGroupSnapshotResponse returned by the CSI driver&#39;s CreateVolumeGroupSnapshot call.

The existing v1beta1 API objects will be converted to the new v1beta2 API objects by a conversion webhook.
--&gt;
&lt;p&gt;具体来说，在 v1beta2 中添加了一个 VolumeSnapshotInfo 结构，它包含了属于卷组快照成员的单个卷快照的信息。&lt;/p&gt;
&lt;p&gt;VolumeSnapshotInfoList，即 VolumeSnapshotInfo 的列表，被添加到 VolumeGroupSnapshotContentStatus
中，取代了 VolumeSnapshotHandlePairList。&lt;/p&gt;
&lt;p&gt;VolumeSnapshotInfoList 是由 CSI 驱动返回的快照信息列表，用于识别存储系统上的快照。&lt;/p&gt;
&lt;p&gt;VolumeSnapshotInfoList 由 csi-snapshotter 边车根据 CSI 驱动的 CreateVolumeGroupSnapshot
调用返回的 CSI CreateVolumeGroupSnapshotResponse 填充。&lt;/p&gt;
&lt;p&gt;现有的 v1beta1 API 对象将通过转换 Webhook 转换为新的 v1beta2 API 对象。&lt;/p&gt;
&lt;!--
## What’s next?

Depending on feedback and adoption, the Kubernetes project plans to push the volume
group snapshot implementation to general availability (GA) in a future release.
--&gt;
&lt;h2 id=&#34;接下来&#34;&gt;接下来？&lt;/h2&gt;
&lt;p&gt;根据反馈和采用情况，Kubernetes 项目计划在未来的版本中将卷组快照实现推进到正式发布版本（GA）。&lt;/p&gt;
&lt;!--
## How can I learn more?

- The [design spec](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/3476-volume-group-snapshot)
  for the volume group snapshot feature.
- The [code repository](https://github.com/kubernetes-csi/external-snapshotter) for volume group
  snapshot APIs and controller.
- CSI [documentation](https://kubernetes-csi.github.io/docs/) on the group snapshot feature.
--&gt;
&lt;h2 id=&#34;如何了解更多&#34;&gt;如何了解更多？&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;卷组快照特性的&lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/3476-volume-group-snapshot&#34;&gt;设计规范&lt;/a&gt;。&lt;/li&gt;
&lt;li&gt;卷组快照 API 和控制器的&lt;a href=&#34;https://github.com/kubernetes-csi/external-snapshotter&#34;&gt;代码仓库&lt;/a&gt;。&lt;/li&gt;
&lt;li&gt;CSI 关于组快照特性的&lt;a href=&#34;https://kubernetes-csi.github.io/docs/&#34;&gt;文档&lt;/a&gt;。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## How do I get involved?

This project, like all of Kubernetes, is the result of hard work by many contributors
from diverse backgrounds working together. On behalf of SIG Storage, I would like to
offer a huge thank you to the contributors who stepped up these last few quarters
to help the project reach beta:
--&gt;
&lt;h2 id=&#34;如何参与&#34;&gt;如何参与？&lt;/h2&gt;
&lt;p&gt;这个项目，如同所有的 Kubernetes 项目一样，是许多来自不同背景的贡献者共同努力的结果。
代表 SIG Storage，我想对过去几个季度中挺身而出帮助项目达到 Beta 阶段的贡献者们表示巨大的感谢：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ben Swartzlander (&lt;a href=&#34;https://github.com/bswartz&#34;&gt;bswartz&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Hemant Kumar (&lt;a href=&#34;https://github.com/gnufied&#34;&gt;gnufied&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Jan Šafránek (&lt;a href=&#34;https://github.com/jsafrane&#34;&gt;jsafrane&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Madhu Rajanna (&lt;a href=&#34;https://github.com/Madhu-1&#34;&gt;Madhu-1&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Michelle Au (&lt;a href=&#34;https://github.com/msau42&#34;&gt;msau42&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Niels de Vos (&lt;a href=&#34;https://github.com/nixpanic&#34;&gt;nixpanic&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Leonardo Cecchi (&lt;a href=&#34;https://github.com/leonardoce&#34;&gt;leonardoce&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Saad Ali (&lt;a href=&#34;https://github.com/saad-ali&#34;&gt;saad-ali&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Xing Yang (&lt;a href=&#34;https://github.com/xing-yang&#34;&gt;xing-yang&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Yati Padia (&lt;a href=&#34;https://github.com/yati1998&#34;&gt;yati1998&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
For those interested in getting involved with the design and development of CSI or
any part of the Kubernetes Storage system, join the
[Kubernetes Storage Special Interest Group](https://github.com/kubernetes/community/tree/master/sig-storage) (SIG).
We always welcome new contributors.

We also hold regular [Data Protection Working Group meetings](https://github.com/kubernetes/community/tree/master/wg-data-protection).
New attendees are welcome to join our discussions.
--&gt;
&lt;p&gt;对于那些有兴趣参与 CSI 或 Kubernetes 存储系统任何部分的设计和开发的人，可以加入
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-storage&#34;&gt;Kubernetes 存储特别兴趣小组&lt;/a&gt;（SIG）。
我们始终欢迎新的贡献者。&lt;/p&gt;
&lt;p&gt;我们还定期举行&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/wg-data-protection&#34;&gt;数据保护工作组会议&lt;/a&gt;。
欢迎新参会者加入我们的讨论。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.34：可变 CSI 节点可分配数进阶至 Beta</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/09/11/kubernetes-v1-34-mutable-csi-node-allocatable-count/</link>
      <pubDate>Thu, 11 Sep 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/09/11/kubernetes-v1-34-mutable-csi-node-allocatable-count/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.34: Mutable CSI Node Allocatable Graduates to Beta&#34;
date: 2025-09-11T10:30:00-08:00
slug: kubernetes-v1-34-mutable-csi-node-allocatable-count
author: Eddie Torres (Amazon Web Services)
--&gt;
&lt;!--
The [functionality for CSI drivers to update information about attachable volume count on the nodes](https://kep.k8s.io/4876), first introduced as Alpha in Kubernetes v1.33, has graduated to **Beta** in the Kubernetes v1.34 release! This marks a significant milestone in enhancing the accuracy of stateful pod scheduling by reducing failures due to outdated attachable volume capacity information.
--&gt;
&lt;p&gt;&lt;a href=&#34;https://kep.k8s.io/4876&#34;&gt;CSI 驱动更新节点上可挂接卷数量信息的这一功能&lt;/a&gt;在 Kubernetes v1.33
中首次以 Alpha 引入，如今在 Kubernetes v1.34 中进阶为 &lt;strong&gt;Beta&lt;/strong&gt;！
这是提升有状态 Pod 调度准确性的重要里程碑，可减少因可挂接卷容量信息过时所导致的调度失败问题。&lt;/p&gt;
&lt;!--
## Background

Traditionally, Kubernetes [CSI drivers](https://kubernetes-csi.github.io/docs/introduction.html) report a static maximum volume attachment limit when initializing. However, actual attachment capacities can change during a node&#39;s lifecycle for various reasons, such as:
--&gt;
&lt;h2 id=&#34;background&#34;&gt;背景&lt;/h2&gt;
&lt;p&gt;传统上，Kubernetes 的
&lt;a href=&#34;https://kubernetes-csi.github.io/docs/introduction.html&#34;&gt;CSI 驱动&lt;/a&gt;在初始化时会报告一个静态的最大卷挂接限制。
然而，在节点的生命周期中，实际的挂接数量可能因各种原因发生变化，例如：&lt;/p&gt;
&lt;!--
- Manual or external operations attaching/detaching volumes outside of Kubernetes control.
- Dynamically attached network interfaces or specialized hardware (GPUs, NICs, etc.) consuming available slots.
- Multi-driver scenarios, where one CSI driver’s operations affect available capacity reported by another.

Static reporting can cause Kubernetes to schedule pods onto nodes that appear to have capacity but don&#39;t, leading to pods stuck in a `ContainerCreating` state.
--&gt;
&lt;ul&gt;
&lt;li&gt;在 Kubernetes 控制之外的手动或外部卷挂接/解除挂接操作。&lt;/li&gt;
&lt;li&gt;动态挂接的网络接口或专用硬件（GPU、NIC 等）消耗可用的插槽。&lt;/li&gt;
&lt;li&gt;在多驱动场景中，一个 CSI 驱动的操作影响另一个驱动所报告的可用容量。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;静态报告可能导致 Kubernetes 将 Pod 调度到看似有容量但实际上没有容量的节点上，
从而导致 Pod 卡在 &lt;code&gt;ContainerCreating&lt;/code&gt; 状态。&lt;/p&gt;
&lt;!--
## Dynamically adapting CSI volume limits

With this new feature, Kubernetes enables CSI drivers to dynamically adjust and report node attachment capacities at runtime. This ensures that the scheduler, as well as other components relying on this information, have the most accurate, up-to-date view of node capacity.
--&gt;
&lt;h2 id=&#34;dynamically-adapting-csi-volume-limits&#34;&gt;动态调整 CSI 卷限制&lt;/h2&gt;
&lt;p&gt;借助这一新特性，Kubernetes 允许 CSI 驱动在运行时动态调整并报告节点的卷挂接数量。
这一特性可确保调度器以及依赖此信息的其他组件能够获得最准确、最新的节点容量信息。&lt;/p&gt;
&lt;!--
### How it works

Kubernetes supports two mechanisms for updating the reported node volume limits:

- **Periodic Updates:** CSI drivers specify an interval to periodically refresh the node&#39;s allocatable capacity.
- **Reactive Updates:** An immediate update triggered when a volume attachment fails due to exhausted resources (`ResourceExhausted` error).
--&gt;
&lt;h3 id=&#34;how-it-works&#34;&gt;工作原理&lt;/h3&gt;
&lt;p&gt;Kubernetes 支持两种机制来更新所报告的节点卷限制：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;周期性更新：&lt;/strong&gt; CSI 驱动指定一个时间间隔，定期刷新节点的可分配容量。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;触发式更新：&lt;/strong&gt; 当卷挂接因资源耗尽（&lt;code&gt;ResourceExhausted&lt;/code&gt; 错误）而失败时触发立即更新。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
### Enabling the feature

To use this beta feature, the `MutableCSINodeAllocatableCount` feature gate must be enabled in these components:
--&gt;
&lt;h3 id=&#34;enabling-the-feature&#34;&gt;启用特性&lt;/h3&gt;
&lt;p&gt;要使用此 Beta 特性，必须在以下组件中启用 &lt;code&gt;MutableCSINodeAllocatableCount&lt;/code&gt; 特性门控：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kube-apiserver&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubelet&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
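&lt;p&gt;作为示意（以下为假设性配置片段，具体做法取决于集群的部署方式），可以在 kubelet 配置文件（KubeletConfiguration）中启用该特性门控；kube-apiserver 侧则需通过命令行标志 &lt;code&gt;--feature-gates=MutableCSINodeAllocatableCount=true&lt;/code&gt; 启用：&lt;/p&gt;

```yaml
# 示意性 kubelet 配置片段：启用 MutableCSINodeAllocatableCount 特性门控
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MutableCSINodeAllocatableCount: true
```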
&lt;!--
### Example CSI driver configuration

Below is an example of configuring a CSI driver to enable periodic updates every 60 seconds:
--&gt;
&lt;h3 id=&#34;示例-csi-驱动配置&#34;&gt;示例 CSI 驱动配置&lt;/h3&gt;
&lt;p&gt;以下是配置 CSI 驱动以启用每 60 秒周期性更新一次的示例：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;storage.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;CSIDriver&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;example.csi.k8s.io&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeAllocatableUpdatePeriodSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;60&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This configuration directs kubelet to periodically call the CSI driver&#39;s `NodeGetInfo` method every 60 seconds, updating the node’s allocatable volume count. Kubernetes enforces a minimum update interval of 10 seconds to balance accuracy and resource usage.

### Immediate updates on attachment failures

When a volume attachment operation fails due to a `ResourceExhausted` error (gRPC code `8`), Kubernetes immediately updates the allocatable count instead of waiting for the next periodic update. The Kubelet then marks the affected pods as Failed, enabling their controllers to recreate them. This prevents pods from getting permanently stuck in the `ContainerCreating` state.
--&gt;
&lt;p&gt;此配置指示 kubelet 每隔 60 秒调用一次 CSI 驱动的 &lt;code&gt;NodeGetInfo&lt;/code&gt; 方法，以更新节点的可分配卷数。
Kubernetes 强制要求更新时间间隔最小为 10 秒，目的是在准确性与资源消耗间达成平衡。&lt;/p&gt;
&lt;h3 id=&#34;挂接失败时立即更新&#34;&gt;挂接失败时立即更新&lt;/h3&gt;
&lt;p&gt;当卷挂接操作因 &lt;code&gt;ResourceExhausted&lt;/code&gt; 错误（gRPC 代码 &lt;code&gt;8&lt;/code&gt;）而失败时，Kubernetes 会立即更新可分配数量，
而不是等待下一次周期性更新。随后 kubelet 会将受影响的 Pod 标记为 Failed，使其控制器能够重新创建这些 Pod。
这样可以防止 Pod 永久卡在 &lt;code&gt;ContainerCreating&lt;/code&gt; 状态。&lt;/p&gt;
&lt;!--
## Getting started

To enable this feature in your Kubernetes v1.34 cluster:

1. Enable the feature gate `MutableCSINodeAllocatableCount` on the `kube-apiserver` and `kubelet` components.
2. Update your CSI driver configuration by setting `nodeAllocatableUpdatePeriodSeconds`.
3. Monitor and observe improvements in scheduling accuracy and pod placement reliability.
--&gt;
&lt;h2 id=&#34;getting-started&#34;&gt;快速入门&lt;/h2&gt;
&lt;p&gt;要在 Kubernetes v1.34 集群中启用此特性：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;在 &lt;code&gt;kube-apiserver&lt;/code&gt; 和 &lt;code&gt;kubelet&lt;/code&gt; 组件上启用特性门控 &lt;code&gt;MutableCSINodeAllocatableCount&lt;/code&gt;。&lt;/li&gt;
&lt;li&gt;通过设置 &lt;code&gt;nodeAllocatableUpdatePeriodSeconds&lt;/code&gt;，更新你的 CSI 驱动配置。&lt;/li&gt;
&lt;li&gt;监控并观察调度准确性和 Pod 放置可靠性的提升。&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
## Next steps

This feature is currently in beta and the Kubernetes community welcomes your feedback. Test it, share your experiences, and help guide its evolution to GA stability.

Join discussions in the [Kubernetes Storage Special Interest Group (SIG-Storage)](https://github.com/kubernetes/community/tree/master/sig-storage) to shape the future of Kubernetes storage capabilities.
--&gt;
&lt;h2 id=&#34;next-steps&#34;&gt;下一步&lt;/h2&gt;
&lt;p&gt;此特性目前处于 Beta，Kubernetes 社区欢迎你的反馈。请测试、分享你的经验，并帮助推动其发展至 GA（正式发布）稳定版。&lt;/p&gt;
&lt;p&gt;欢迎加入 &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-storage&#34;&gt;Kubernetes SIG-Storage&lt;/a&gt;
参与讨论，共同塑造 Kubernetes 存储能力的未来。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.34: 使用 Init 容器定义应用环境变量</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/09/10/kubernetes-v1-34-env-files/</link>
      <pubDate>Wed, 10 Sep 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/09/10/kubernetes-v1-34-env-files/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.34: Use An Init Container To Define App Environment Variables&#34;
date: 2025-09-10T10:30:00-08:00
draft: true
slug: kubernetes-v1-34-env-files
author: &gt;
  HirazawaUi
--&gt;
&lt;!--
Kubernetes typically uses ConfigMaps and Secrets to set environment variables,
which introduces additional API calls and complexity,
For example, you need to separately manage the Pods of your workloads 
and their configurations, while ensuring orderly 
updates for both the configurations and the workload Pods.

Alternatively, you might be using a vendor-supplied container 
that requires environment variables (such as a license key or a one-time token),
but you don’t want to hard-code them or mount volumes just to get the job done.
--&gt;
&lt;p&gt;Kubernetes 通常使用 ConfigMap 和 Secret 来设置环境变量，
这会引入额外的 API 调用和复杂性。例如，你需要分别管理工作负载的 Pod 和它们的配置，
同时还要确保配置和工作负载 Pod 的有序更新。&lt;/p&gt;
&lt;p&gt;另外，你可能在使用一个供应商提供的、需要环境变量（例如许可证密钥或一次性令牌）的容器，
但你又不想对这些变量进行硬编码，或者仅仅为了完成工作而挂载卷。&lt;/p&gt;
&lt;!--
If that&#39;s the situation you are in, you now have a new (alpha) way to
achieve that. Provided you have the `EnvFiles`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enabled across your cluster, you can tell the kubelet to load a container&#39;s
environment variables from a volume (the volume must be part of the Pod that
the container belongs to).
this feature gate allows you to load environment variables directly from a file in an emptyDir volume
without actually mounting that file into the container.
It’s a simple yet elegant solution to some surprisingly common problems.
--&gt;
&lt;p&gt;如果你正面对这种情况，现在有一种新的（Alpha）方式来实现。只要你在集群中启用了 &lt;code&gt;EnvFiles&lt;/code&gt;
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/command-line-tools-reference/feature-gates/&#34;&gt;特性门控&lt;/a&gt;，
你就可以告诉 kubelet 从一个卷中加载容器的环境变量（此卷必须属于该容器所在的 Pod）。
这个特性门控允许你直接从 &lt;code&gt;emptyDir&lt;/code&gt; 卷中的文件加载环境变量，而不需要将该文件实际挂载到容器中。
这是一个简单而优雅的解决方案，可以应对一些出人意料地常见的问题。&lt;/p&gt;
&lt;!--
## What’s this all about?
At its core, this feature allows you to point your container to a file,
one generated by an `initContainer`,
and have Kubernetes parse that file to set your environment variables.
The file lives in an `emptyDir` volume (a temporary storage space that lasts as long as the pod does),
Your main container doesn’t need to mount the volume.
The kubelet will read the file and inject these variables when the container starts.
--&gt;
&lt;h2 id=&#34;what-s-this-all-about&#34;&gt;特性概述&lt;/h2&gt;
&lt;p&gt;从核心上来说，这个特性允许你将容器指向一个文件，该文件由 &lt;code&gt;initContainer&lt;/code&gt; 生成，
然后让 Kubernetes 解析该文件以设置你的环境变量。此文件位于一个 &lt;code&gt;emptyDir&lt;/code&gt;
卷中（这是一种临时存储空间，只要 Pod 存在就会保留），你的主容器不需要挂载此卷。
kubelet 会在容器启动时读取文件并注入这些变量。&lt;/p&gt;
&lt;!--
## How It Works
Here&#39;s a simple example:
--&gt;
&lt;h2 id=&#34;how-it-works&#34;&gt;工作原理&lt;/h2&gt;
&lt;p&gt;这里有一个简单的例子：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initContainers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;generate-config&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;busybox&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;sh&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;-c&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;echo &amp;#34;CONFIG_VAR=HELLO&amp;#34; &amp;gt; /config/config.env&amp;#39;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeMounts&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;config-volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;mountPath&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;/config&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;app-container&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gcr.io/distroless/static&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;env&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;CONFIG_VAR&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;valueFrom&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;fileKeyRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;path&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;config.env&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;config-volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;CONFIG_VAR&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;config-volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;emptyDir&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
Using this approach is a breeze.
You define your environment variables in the pod spec using the `fileKeyRef` field,
which tells Kubernetes where to find the file and which key to pull.
The file itself resembles the standard for .env syntax (think KEY=VALUE),
and (for this alpha stage at least) you must ensure that it is written into
an `emptyDir` volume. Other volume types aren&#39;t supported for this feature.
At least one init container must mount that `emptyDir` volume (to write the file),
but the main container doesn’t need to—it just gets the variables handed to it at startup.
--&gt;
&lt;p&gt;使用这种方法非常简单。你在 Pod 规约中使用 &lt;code&gt;fileKeyRef&lt;/code&gt; 字段定义环境变量，
此字段告诉 Kubernetes 去哪里找到文件以及要提取哪个键。
此文件本身类似于 &lt;code&gt;.env&lt;/code&gt; 语法的标准格式（即 &lt;code&gt;KEY=VALUE&lt;/code&gt;），
并且（至少在这个 Alpha 阶段）你必须确保它被写入到一个 &lt;code&gt;emptyDir&lt;/code&gt; 卷中。
其他类型的卷在此特性中不受支持。至少有一个 Init 容器必须挂载该 &lt;code&gt;emptyDir&lt;/code&gt; 卷（以写入文件），
但主容器不需要挂载它——它在启动时就能直接获取这些变量。&lt;/p&gt;
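&lt;p&gt;作为示意（文件内容为假设的示例），Init 容器写入 &lt;code&gt;emptyDir&lt;/code&gt; 卷的 &lt;code&gt;config.env&lt;/code&gt; 大致如下：&lt;/p&gt;

```
# config.env：由 Init 容器写入 emptyDir 卷（内容仅为示例）
CONFIG_VAR=HELLO
```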
&lt;!--
## A word on security
While this feature supports handling sensitive data such as keys or tokens, 
note that its implementation relies on `emptyDir` volumes mounted into pod.
Operators with node filesystem access could therefore 
easily retrieve this sensitive data through pod directory paths.

If storing sensitive data like keys or tokens using this feature,
ensure your cluster security policies effectively protect nodes
against unauthorized access to prevent exposure of confidential information.
--&gt;
&lt;h2 id=&#34;a-word-on-security&#34;&gt;关于安全性&lt;/h2&gt;
&lt;p&gt;虽然此特性支持处理密钥或令牌等敏感数据，但需要注意它的实现依赖于挂载到 Pod 的 &lt;code&gt;emptyDir&lt;/code&gt; 卷。
具有节点文件系统访问权限的操作人员因此可以通过 Pod 目录路径轻易获取这些敏感数据。&lt;/p&gt;
&lt;p&gt;如果使用此特性存储密钥或令牌等敏感数据，请确保你的集群安全策略能够有效保护节点免受未经授权的访问，
以防止机密信息泄露。&lt;/p&gt;
&lt;!--
## Summary
This feature will eliminate a number of complex workarounds used today, simplifying
apps authoring, and opening doors for more use cases. Kubernetes stays flexible and
open for feedback. Tell us how you use this feature or what is missing.
--&gt;
&lt;h2 id=&#34;summary&#34;&gt;总结&lt;/h2&gt;
&lt;p&gt;此特性将消除如今使用的许多复杂变通方法，简化应用编写，并为更多使用场景打开大门。
Kubernetes 保持灵活性，欢迎反馈。请告诉我们你是如何使用这个特性的，或者此特性还缺少什么。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes 中的 PSI 指标进入 Beta 阶段</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/08/08/introducing-psi-metrics-beta/</link>
      <pubDate>Fri, 08 Aug 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/08/08/introducing-psi-metrics-beta/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;PSI Metrics for Kubernetes Graduates to Beta&#34;
date: 2025-XX-XX
draft: true
slug: introducing-psi-metrics-beta
author: &#34;Haowei Cai (Google)&#34;
--&gt;
&lt;!--
As Kubernetes clusters grow in size and complexity, understanding the health and performance of individual nodes becomes increasingly critical. We are excited to announce that as of Kubernetes v1.34, **Pressure Stall Information (PSI) Metrics** has graduated to Beta.
--&gt;
&lt;p&gt;随着 Kubernetes 集群规模和复杂性的增长，了解各个节点的健康状况和性能变得越来越关键。
我们很高兴地宣布，从 Kubernetes v1.34 开始，&lt;strong&gt;压力停滞信息 (PSI) 指标&lt;/strong&gt;已升级到 Beta 版本。&lt;/p&gt;
&lt;!--
## What is Pressure Stall Information (PSI)?
--&gt;
&lt;h2 id=&#34;what-is-pressure-stall-information-psi&#34;&gt;什么是压力停滞信息 (PSI)？&lt;/h2&gt;
&lt;!--
[Pressure Stall Information (PSI)](https://docs.kernel.org/accounting/psi.html) is a feature of the Linux kernel (version 4.20 and later)
that provides a canonical way to quantify pressure on infrastructure resources,
in terms of whether demand for a resource exceeds current supply.
It moves beyond simple resource utilization metrics and instead
measures the amount of time that tasks are stalled due to resource contention.
This is a powerful way to identify and diagnose resource bottlenecks that can impact application performance.
--&gt;
&lt;p&gt;&lt;a href=&#34;https://docs.kernel.org/accounting/psi.html&#34;&gt;压力停滞信息 (PSI)&lt;/a&gt; 是 Linux 内核（4.20 及更高版本）的一项功能，
它提供了一种规范化的方式来量化基础设施资源的压力，
即资源需求是否超过当前供应。
它超越了简单的资源利用率指标，而是测量任务因资源竞争而停滞的时间。
这是识别和诊断可能影响应用程序性能的资源瓶颈的强大方法。&lt;/p&gt;
&lt;!--
PSI exposes metrics for CPU, memory, and I/O, categorized as either `some` or `full` pressure:
--&gt;
&lt;p&gt;PSI 暴露了 CPU、内存和 I/O 的指标，分为 &lt;code&gt;some&lt;/code&gt; 或 &lt;code&gt;full&lt;/code&gt; 压力：&lt;/p&gt;
&lt;!--
`some`
: The percentage of time that **at least one** task is stalled on a resource. This indicates some level of resource contention.
--&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;some&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;strong&gt;至少一个&lt;/strong&gt;任务在资源上停滞的时间百分比。这表明存在某种程度的资源竞争。&lt;/dd&gt;
&lt;/dl&gt;
&lt;!--
`full`
: The percentage of time that **all** non-idle tasks are stalled on a resource simultaneously. This indicates a more severe resource bottleneck.


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/images/psi-metrics-some-vs-full.svg&#34;
         alt=&#34;Diagram illustrating the difference between &amp;#39;some&amp;#39; and &amp;#39;full&amp;#39; PSI pressure.&#34;/&gt; &lt;figcaption&gt;
            &lt;h4&gt;PSI: &amp;#39;Some&amp;#39; vs. &amp;#39;Full&amp;#39; Pressure&lt;/h4&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
--&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;full&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;strong&gt;所有&lt;/strong&gt;非空闲任务同时在资源上停滞的时间百分比。这表明存在更严重的资源瓶颈。


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/images/psi-metrics-some-vs-full.svg&#34;
         alt=&#34;展示 &amp;#39;some&amp;#39; 与 &amp;#39;full&amp;#39; PSI 压力差异的示意图。&#34;/&gt; &lt;figcaption&gt;
            &lt;h4&gt;PSI：&amp;#39;Some&amp;#39; 与 &amp;#39;Full&amp;#39; 压力对比&lt;/h4&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;&lt;/dd&gt;
&lt;/dl&gt;
&lt;!--
These metrics are aggregated over 10-second, 1-minute, and 5-minute rolling windows, providing a comprehensive view of resource pressure over time.
--&gt;
&lt;p&gt;这些指标在 10 秒、1 分钟和 5 分钟的滚动窗口上进行聚合，提供了随时间变化的资源压力的全面视图。&lt;/p&gt;
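&lt;p&gt;这些数据来自内核的 &lt;code&gt;/proc/pressure&lt;/code&gt; 接口。例如，&lt;code&gt;/proc/pressure/memory&lt;/code&gt; 的输出形如（数值仅为示意）：&lt;/p&gt;

```
some avg10=0.12 avg60=0.08 avg300=0.02 total=1234567
full avg10=0.00 avg60=0.01 avg300=0.00 total=98765
```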
&lt;!--
## PSI metrics in Kubernetes
--&gt;
&lt;h2 id=&#34;psi-metrics-in-kubernetes&#34;&gt;Kubernetes 中的 PSI 指标&lt;/h2&gt;
&lt;!--
With the `KubeletPSI` feature gate enabled, the kubelet can now collect PSI metrics from the Linux kernel and expose them through two channels: the [Summary API](/docs/reference/instrumentation/node-metrics#summary-api-source) and the `/metrics/cadvisor` Prometheus endpoint. This allows you to monitor and alert on resource pressure at the node, pod, and container level.
--&gt;
&lt;p&gt;启用 &lt;code&gt;KubeletPSI&lt;/code&gt; 特性门控后，kubelet 现在可以从 Linux 内核收集 PSI 指标，
并通过两个渠道暴露它们：&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/instrumentation/node-metrics/#summary-api-source&#34;&gt;Summary API&lt;/a&gt;
和 &lt;code&gt;/metrics/cadvisor&lt;/code&gt; Prometheus 端点。这让你可以在节点、Pod 和容器级别对资源压力进行监控和告警。&lt;/p&gt;
&lt;!--
The following new metrics are available in Prometheus exposition format via `/metrics/cadvisor`:
--&gt;
&lt;p&gt;以下新指标可通过 &lt;code&gt;/metrics/cadvisor&lt;/code&gt; 以 Prometheus 暴露格式获得：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;container_pressure_cpu_stalled_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;container_pressure_cpu_waiting_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;container_pressure_memory_stalled_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;container_pressure_memory_waiting_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;container_pressure_io_stalled_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;container_pressure_io_waiting_seconds_total&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
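&lt;p&gt;作为示意（查询本身为假设的示例，指标名称来自上文），可以用类似下面的 PromQL 观察容器内存压力随时间的变化：&lt;/p&gt;

```
# 过去 5 分钟内容器内存 "full" 压力停滞时间的每秒增长率
rate(container_pressure_memory_stalled_seconds_total[5m])
```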
&lt;!--
These metrics, along with the data from the Summary API, provide a granular view of resource pressure, enabling you to pinpoint the source of performance issues and take corrective action. For example, you can use these metrics to:
--&gt;
&lt;p&gt;这些指标与 Summary API 的数据一起，提供了资源压力的细粒度视图，
使你能够精确定位性能问题的根源并采取纠正措施。
例如，你可以使用这些指标来：&lt;/p&gt;
&lt;!--
*   **Identify memory leaks:** A steadily increasing `some` pressure for memory can indicate a memory leak in an application.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;识别内存泄漏：&lt;/strong&gt; 内存的 &lt;code&gt;some&lt;/code&gt; 压力持续增加可能表明应用程序中存在内存泄漏。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
*   **Optimize resource requests and limits:** By understanding the resource pressure of your workloads, you can more accurately tune their resource requests and limits.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;优化资源请求和限制：&lt;/strong&gt; 通过了解你的工作负载的资源压力，你可以更准确地调整其资源请求和限制。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
*   **Autoscale workloads:** You can use PSI metrics to trigger autoscaling events, ensuring that your workloads have the resources they need to perform optimally.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;自动扩缩容工作负载：&lt;/strong&gt; 你可以使用 PSI 指标触发自动扩缩容事件，确保你的工作负载拥有最佳性能所需的资源。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## How to enable PSI metrics
--&gt;
&lt;h2 id=&#34;how-to-enable-psi-metrics&#34;&gt;如何启用 PSI 指标&lt;/h2&gt;
&lt;!--
To enable PSI metrics in your Kubernetes cluster, you need to:
--&gt;
&lt;p&gt;要在你的 Kubernetes 集群中启用 PSI 指标，你需要：&lt;/p&gt;
&lt;!--
1.  **Ensure your nodes are running a Linux kernel version 4.20 or later and are using cgroup v2.**
2.  **Enable the `KubeletPSI` feature gate on the kubelet.**
--&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;确保你的节点运行 Linux 内核版本 4.20 或更高版本，并使用 cgroup v2。&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;在 kubelet 上启用 &lt;code&gt;KubeletPSI&lt;/code&gt; 特性门控。&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
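&lt;p&gt;例如，可以在 kubelet 配置文件中启用该特性门控（以下仅为示意片段，文件路径与其余字段因环境而异）：&lt;/p&gt;

```yaml
# KubeletConfiguration 片段（示意）：启用 KubeletPSI 特性门控
# 假设 kubelet 使用 /var/lib/kubelet/config.yaml 作为配置文件
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletPSI: true
```

&lt;p&gt;修改配置后需要重启 kubelet 才能生效。&lt;/p&gt;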
&lt;!--
Once enabled, you can start scraping the `/metrics/cadvisor` endpoint with your Prometheus-compatible monitoring solution or query the Summary API to collect and visualize the new PSI metrics. Note that PSI is a Linux-kernel feature, so these metrics are not available on Windows nodes. Your cluster can contain a mix of Linux and Windows nodes, and on the Windows nodes the kubelet does not expose PSI metrics.
--&gt;
&lt;p&gt;启用后，你可以开始使用 Prometheus 兼容的监控解决方案抓取 &lt;code&gt;/metrics/cadvisor&lt;/code&gt; 端点，
或查询 Summary API 来收集和可视化新的 PSI 指标。
请注意，PSI 是 Linux 内核功能，因此这些指标在 Windows 节点上不可用。
你的集群可以包含 Linux 和 Windows 节点的混合，在 Windows 节点上，kubelet 不会暴露 PSI 指标。&lt;/p&gt;
&lt;!--
## What&#39;s next?
--&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;接下来是什么？&lt;/h2&gt;
&lt;!--
We are excited to bring PSI metrics to the Kubernetes community and look forward to your feedback. As a beta feature, we are actively working on improving and extending this functionality towards a stable GA release. We encourage you to try it out and share your experiences with us.
--&gt;
&lt;p&gt;我们很高兴为 Kubernetes 社区带来 PSI 指标，并期待你的反馈。
作为 Beta 功能，我们正在积极改进和扩展此功能，以实现稳定的 GA 发布。
我们鼓励你试用并与我们分享你的经验。&lt;/p&gt;
&lt;!--
To learn more about PSI metrics, check out the official [Kubernetes documentation](/docs/reference/instrumentation/understand-psi-metrics/). You can also get involved in the conversation on the [#sig-node](https://kubernetes.slack.com/messages/sig-node) Slack channel.
--&gt;
&lt;p&gt;要了解有关 PSI 指标的更多信息，请查看官方 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/docs/reference/instrumentation/understand-psi-metrics/&#34;&gt;Kubernetes 文档&lt;/a&gt;。
你还可以参与 &lt;a href=&#34;https://kubernetes.slack.com/messages/sig-node&#34;&gt;#sig-node&lt;/a&gt; Slack 频道的对话。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Headlamp AI 助手简介</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/08/07/introducing-headlamp-ai-assistant/</link>
      <pubDate>Thu, 07 Aug 2025 20:00:00 +0100</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/08/07/introducing-headlamp-ai-assistant/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Introducing Headlamp AI Assistant&#34;
date: 2025-08-07T20:00:00+01:00
slug: introducing-headlamp-ai-assistant
author: &gt;
  Joaquim Rocha (Microsoft)
canonicalUrl: &#34;https://headlamp.dev/blog/2025/08/07/introducing-the-headlamp-ai-assistant&#34;
--&gt;
&lt;!--
_This announcement originally [appeared](https://headlamp.dev/blog/2025/08/07/introducing-the-headlamp-ai-assistant) on the Headlamp blog._

To simplify Kubernetes management and troubleshooting, we&#39;re thrilled to
introduce [Headlamp AI Assistant](https://github.com/headlamp-k8s/plugins/tree/main/ai-assistant#readme): a powerful new plugin for Headlamp that helps
you understand and operate your Kubernetes clusters and applications with
greater clarity and ease.
--&gt;
&lt;p&gt;&lt;strong&gt;本文是 &lt;a href=&#34;https://headlamp.dev/blog/2025/08/07/introducing-the-headlamp-ai-assistant&#34;&gt;Headlamp AI 助手介绍&lt;/a&gt;这篇博客的中文译稿。&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;为了简化 Kubernetes 的管理和故障排除，我们非常高兴地推出
&lt;a href=&#34;https://github.com/headlamp-k8s/plugins/tree/main/ai-assistant#readme&#34;&gt;Headlamp AI 助手&lt;/a&gt;：
这是 Headlamp 的一个强大的新插件，可以帮助你更清晰、更轻松地理解和操作你的 Kubernetes 集群和应用程序。&lt;/p&gt;
&lt;!--
Whether you&#39;re a seasoned engineer or just getting started, the AI Assistant offers:
* **Fast time to value:** Ask questions like _&#34;Is my application healthy?&#34;_ or
  _&#34;How can I fix this?&#34;_ without needing deep Kubernetes knowledge.
* **Deep insights:** Start with high-level queries and dig deeper with prompts
  like _&#34;List all the problematic pods&#34;_ or _&#34;How can I fix this pod?&#34;_
* **Focused &amp; relevant:** Ask questions in the context of what you&#39;re viewing
  in the UI, such as _&#34;What&#39;s wrong here?&#34;_
* **Action-oriented:** Let the AI take action for you, like _&#34;Restart that
  deployment&#34;_, with your permission.
--&gt;
&lt;p&gt;无论你是经验丰富的工程师还是初学者，AI 助手都能提供：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;快速实现价值&lt;/strong&gt;：无需深入了解 Kubernetes 知识即可提出问题，例如 “我的应用程序健康吗？” 或 “我如何修复这个问题？”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;深入洞察&lt;/strong&gt;：从高层次查询开始，并通过提示深入挖掘，如 “列出所有有问题的 Pod” 或者 “我如何修复这个 Pod？”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;专注且相关&lt;/strong&gt;：根据你在 UI 中查看的内容提问，比如 “这里有什么问题？”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;面向行动&lt;/strong&gt;：让 AI 在获得你的许可后为你采取行动，例如 “重启那个部署”。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Here is a demo of the AI Assistant in action as it helps troubleshoot an
application running with issues in a Kubernetes cluster:
--&gt;
&lt;p&gt;以下是 AI 助手帮助排查 Kubernetes 集群中运行有问题的应用程序的演示：&lt;/p&gt;


    
    &lt;div class=&#34;youtube-quote-sm&#34;&gt;
      &lt;iframe allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share&#34; allowfullscreen=&#34;allowfullscreen&#34; loading=&#34;eager&#34; referrerpolicy=&#34;strict-origin-when-cross-origin&#34; src=&#34;https://www.youtube.com/embed/GzXkUuCTcd4?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0&#34; title=&#34;Headlamp AI Assistant&#34;
      &gt;&lt;/iframe&gt;
    &lt;/div&gt;

&lt;!--
## Hopping on the AI train

Large Language Models (LLMs) have transformed not just how we access data but
also how we interact with it. The rise of tools like ChatGPT opened a world of
possibilities, inspiring a wave of new applications. Asking questions or giving
commands in natural language is intuitive, especially for users who aren&#39;t deeply
technical. Now everyone can quickly ask how to do X or Y, without feeling awkward
or having to traverse pages and pages of documentation like before.
--&gt;
&lt;h2 id=&#34;搭上-ai-列车&#34;&gt;搭上 AI 列车&lt;/h2&gt;
&lt;p&gt;大型语言模型（LLM）不仅改变了我们访问数据的方式，也改变了我们与其交互的方式。
像 ChatGPT 这样的工具的兴起开启了一个充满可能性的世界，激发了一波新的应用浪潮。
用自然语言提问或给出命令是直观的，特别是对于非技术用户而言。现在每个人都可以快速询问如何做 X 或 Y，
而不会感到尴尬，也不必像以前那样遍历一页又一页的文档。&lt;/p&gt;
&lt;!--
Therefore, Headlamp AI Assistant brings a conversational UI to [Headlamp](https://headlamp.dev),
powered by LLMs that Headlamp users can configure with their own API keys.
It is available as a Headlamp plugin, making it easy to integrate into your
existing setup. Users can enable it by installing the plugin and configuring
it with their own LLM API keys, giving them control over which model powers
the assistant. Once enabled, the assistant becomes part of the Headlamp UI,
ready to respond to contextual queries and perform actions directly from the
interface.
--&gt;
&lt;p&gt;因此，Headlamp AI Assistant 将对话式 UI 带入 &lt;a href=&#34;https://headlamp.dev&#34;&gt;Headlamp&lt;/a&gt;，
由 LLM 驱动，Headlamp 用户可以使用自己的 API 密钥进行配置。它作为一个 Headlamp 插件提供，
易于集成到你的现有设置中。用户可以通过安装插件并用自己的 LLM API 密钥进行配置来启用它，
这使他们能够控制哪个模型为助手提供动力。一旦启用，助手就会成为 Headlamp UI 的一部分，
准备好响应上下文查询，并直接从界面执行操作。&lt;/p&gt;
&lt;!--
## Context is everything

As expected, the AI Assistant is focused on helping users with Kubernetes
concepts. Yet, while there is a lot of value in responding to Kubernetes
related questions from Headlamp&#39;s UI, we believe that the great benefit of such
an integration is when it can use the context of what the user is experiencing
in an application. So, the Headlamp AI Assistant knows what you&#39;re currently
viewing in Headlamp, and this makes the interaction feel more like working
with a human assistant.
--&gt;
&lt;h2 id=&#34;上下文就是一切&#34;&gt;上下文就是一切&lt;/h2&gt;
&lt;p&gt;正如预期的那样，AI 助手专注于帮助用户理解 Kubernetes 概念。然而，尽管从
Headlamp 的 UI 回答与 Kubernetes 相关的问题有很多价值，
但我们认为这种集成的最大好处在于它能够使用用户在应用程序中体验到的上下文信息。
因此，Headlamp AI 助手知道你当前在 Headlamp 中查看的内容，
这让交互感觉更像是在与人类助手一起工作。&lt;/p&gt;
&lt;!--
For example, if a pod is failing, users can simply ask _&#34;What&#39;s wrong here?&#34;_
and the AI Assistant will respond with the root cause, like a missing
environment variable or a typo in the image name. Follow-up prompts like
_&#34;How can I fix this?&#34;_ allow the AI Assistant to suggest a fix, streamlining
what used to take multiple steps into a quick, conversational flow.

Sharing the context from Headlamp is not a trivial task though, so it&#39;s
something we will keep working on perfecting.
--&gt;
&lt;p&gt;例如，如果一个 Pod 出现故障，用户只需问 &lt;strong&gt;“这里出了什么问题？”&lt;/strong&gt;，
AI 助手就会回答根本原因，如缺少环境变量或镜像名称中的拼写错误。
后续的问题如 &lt;strong&gt;“我该如何修复？”&lt;/strong&gt; 能让 AI 助手建议一个解决方案，
将原本需要多个步骤的过程简化为快速的对话流。&lt;/p&gt;
&lt;p&gt;然而，从 Headlamp 共享上下文并非易事，因此这是我们将会继续努力完善的工作。&lt;/p&gt;
&lt;!--
## Tools

Context from the UI is helpful, but sometimes additional capabilities are
needed. If the user is viewing the pod list and wants to identify problematic
deployments, switching views should not be necessary. To address this, the AI
Assistant includes support for a Kubernetes tool. This allows asking questions
like &#34;Get me all deployments with problems&#34; prompting the assistant to fetch
and display relevant data from the current cluster. Likewise, if the user
requests an action like &#34;Restart that deployment&#34; after the AI points out what
deployment needs restarting, it can also do that. In case of &#34;write&#34;
operations, the AI Assistant does check with the user for permission to run them.
--&gt;
&lt;h2 id=&#34;工具&#34;&gt;工具&lt;/h2&gt;
&lt;p&gt;UI 中的上下文很有帮助，但有时还需要额外的能力。如果用户正在查看 Pod 列表并想要识别有问题的 Deployment，
就不应需要切换视图。为此，AI 助手包含了对 Kubernetes 工具的支持。
这允许提出诸如 &lt;strong&gt;“获取所有有问题的 Deployment”&lt;/strong&gt; 这样的问题，促使助手从当前集群中获取并显示相关数据。
同样，如果用户在 AI 指出哪个 Deployment 需要重启后，请求执行类似 &lt;strong&gt;“重启那个 Deployment”&lt;/strong&gt; 的操作，
它也可以做到。对于“写”操作，AI 助手会先征得用户的许可再执行。&lt;/p&gt;
&lt;!--
## AI Plugins

Although the initial version of the AI Assistant is already useful for
Kubernetes users, future iterations will expand its capabilities. Currently,
the assistant supports only the Kubernetes tool, but further integration with
Headlamp plugins is underway. Similarly, we could get richer insights for
GitOps via the Flux plugin, monitoring through Prometheus, package management
with Helm, and more.

And of course, as the popularity of MCP grows, we are looking into how to
integrate it as well, for a more plug-and-play fashion.
--&gt;
&lt;h2 id=&#34;ai-插件&#34;&gt;AI 插件&lt;/h2&gt;
&lt;p&gt;尽管 AI 助手的初始版本已经对 Kubernetes 用户很有用，但未来的迭代将进一步扩展其功能。
目前，助手仅支持 Kubernetes 工具，但与 Headlamp 插件的进一步集成正在进行中。
同样地，我们可以通过 Flux 插件获得更丰富的 GitOps 洞察、通过 Prometheus 进行监控、
使用 Helm 进行包管理等。&lt;/p&gt;
&lt;p&gt;随着 MCP 的流行度增长，我们也在研究如何以更即插即用的方式集成它。&lt;/p&gt;
&lt;!--
## Try it out!

We hope this first version of the AI Assistant helps users manage Kubernetes
clusters more effectively and assist newcomers in navigating the learning
curve. We invite you to try out this early version and give us your feedback.
The AI Assistant plugin can be installed from Headlamp&#39;s Plugin Catalog in the
desktop version, or by using the container image when deploying Headlamp.
Stay tuned for the future versions of the Headlamp AI Assistant!
--&gt;
&lt;h2 id=&#34;试用一下&#34;&gt;试用一下！&lt;/h2&gt;
&lt;p&gt;我们希望 AI 助手的第一个版本能够帮助用户更有效地管理 Kubernetes 集群，
并帮助新用户应对学习曲线。我们邀请你试用这个早期版本，并向我们提供反馈。
AI 助手插件可以从桌面版的 Headlamp 插件目录中安装，或者在部署 Headlamp 时使用容器镜像安装。
敬请期待 Headlamp AI 助手的未来版本！&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.34 抢先一览</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/07/28/kubernetes-v1-34-sneak-peek/</link>
      <pubDate>Mon, 28 Jul 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/07/28/kubernetes-v1-34-sneak-peek/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#39;Kubernetes v1.34 Sneak Peek&#39;
date: 2025-07-28
slug: kubernetes-v1-34-sneak-peek
author: &gt;
  Agustina Barbetta,
  Alejandro Josue Leon Bellido,
  Graziano Casto,
  Melony Qin,
  Dipesh Rawat
--&gt;
&lt;!--
Kubernetes v1.34 is coming at the end of August 2025. 
This release will not include any removal or deprecation, but it is packed with an impressive number of enhancements. 
Here are some of the features we are most excited about in this cycle!  

Please note that this information reflects the current state of v1.34 development and may change before release.
--&gt;
&lt;p&gt;Kubernetes v1.34 将于 2025 年 8 月底发布。
本次发版不会移除或弃用任何特性，但包含了数量惊人的增强特性。
以下列出一些本次发版最令人兴奋的特性！&lt;/p&gt;
&lt;p&gt;请注意，以下内容反映的是 v1.34 当前的开发状态，发布前可能会发生变更。&lt;/p&gt;
&lt;!--
## Featured enhancements of Kubernetes v1.34

The following list highlights some of the notable enhancements likely to be included in the v1.34 release, 
but is not an exhaustive list of all planned changes. 
This is not a commitment and the release content is subject to change.
--&gt;
&lt;h2 id=&#34;kubernetes-v1-34-的重点增强特性&#34;&gt;Kubernetes v1.34 的重点增强特性&lt;/h2&gt;
&lt;p&gt;以下列出了一些可能会包含在 v1.34 版本中的重要增强特性，
但这并不是所有计划更改的详尽列表。
这并不构成承诺，发布内容可能会发生变更。&lt;/p&gt;
&lt;!--
### The core of DRA targets stable

[Dynamic Resource Allocation](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) (DRA) provides a flexible way to categorize, 
request, and use devices like GPUs or custom hardware in your Kubernetes cluster.
--&gt;
&lt;h3 id=&#34;dra-核心功能趋向稳定&#34;&gt;DRA 核心功能趋向稳定&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation/&#34;&gt;动态资源分配&lt;/a&gt;（DRA）
提供了一种灵活的方式来分类、请求和使用集群中的 GPU 或定制硬件等设备。&lt;/p&gt;
&lt;!--
Since the v1.30 release, DRA has been based around claiming devices using _structured parameters_ that are opaque to the core of Kubernetes.
The relevant enhancement proposal, [KEP-4381](https://kep.k8s.io/4381), took inspiration from dynamic provisioning for storage volumes.
DRA with structured parameters relies on a set of supporting API kinds: ResourceClaim, DeviceClass, ResourceClaimTemplate, 
and ResourceSlice API types under `resource.k8s.io`, while extending the `.spec` for Pods with a new `resourceClaims` field.
The core of DRA is targeting graduation to stable in Kubernetes v1.34.
--&gt;
&lt;p&gt;自 v1.30 版本起，DRA 已基于&lt;strong&gt;结构化参数&lt;/strong&gt;来申领设备，这些参数对 Kubernetes 核心而言是不透明的。
相关增强提案 &lt;a href=&#34;https://kep.k8s.io/4381&#34;&gt;KEP-4381&lt;/a&gt; 借鉴了存储卷动态制备的思路。
使用结构化参数的 DRA 依赖一组辅助 API 类别：包括 &lt;code&gt;resource.k8s.io&lt;/code&gt; 下的
ResourceClaim、DeviceClass、ResourceClaimTemplate 和 ResourceSlice，
还在 Pod 的 &lt;code&gt;.spec&lt;/code&gt; 中新增了 &lt;code&gt;resourceClaims&lt;/code&gt; 字段。
DRA 的核心功能计划在 Kubernetes v1.34 中进阶至稳定阶段。&lt;/p&gt;
&lt;!--
With DRA, device drivers and cluster admins define device classes that are available for use. 
Workloads can claim devices from a device class within device requests. 
Kubernetes allocates matching devices to specific claims and places the corresponding Pods on nodes that can access the allocated devices. 
This framework provides flexible device filtering using CEL, centralized device categorization, and simplified Pod requests, among other benefits.

Once this feature has graduated, the `resource.k8s.io/v1` APIs will be available by default.
--&gt;
&lt;p&gt;借助 DRA，设备驱动和集群管理员定义可用的设备类。
工作负载可以在设备请求中从设备类申领设备。
Kubernetes 为每个申领分配匹配的设备，并将相关 Pod 安排到可访问所分配设备的节点上。
这种框架提供了使用 CEL 的灵活设备筛选、集中式设备分类和简化的 Pod 请求等优点。&lt;/p&gt;
&lt;p&gt;一旦此特性进入稳定阶段，&lt;code&gt;resource.k8s.io/v1&lt;/code&gt; API 将默认可用。&lt;/p&gt;
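&lt;p&gt;为了直观说明，下面给出一个引用 ResourceClaim 的 Pod 片段（仅为示意，其中的名称和镜像均为假设；具体字段以正式发布的 &lt;code&gt;resource.k8s.io/v1&lt;/code&gt; API 为准）：&lt;/p&gt;

```yaml
# 示意：Pod 通过 spec.resourceClaims 引用一个已存在的 ResourceClaim
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod                      # 假设的名称
spec:
  containers:
  - name: app
    image: example.com/app:latest    # 假设的镜像
    resources:
      claims:
      - name: gpu                    # 引用下方 resourceClaims 中的条目
  resourceClaims:
  - name: gpu
    resourceClaimName: gpu-claim     # 假设集群中已创建同名 ResourceClaim
```

&lt;p&gt;调度器会将该 Pod 放置到能够访问所分配设备的节点上。&lt;/p&gt;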
&lt;!--
### ServiceAccount tokens for image pull authentication

The [ServiceAccount](/docs/concepts/security/service-accounts/) token integration for `kubelet` credential providers is likely to reach beta and be enabled by default in Kubernetes v1.34. 
This allows the `kubelet` to use these tokens when pulling container images from registries that require authentication.

That support already exists as alpha, and is tracked as part of [KEP-4412](https://kep.k8s.io/4412).
--&gt;
&lt;h3 id=&#34;使用-serviceaccount-令牌进行镜像拉取身份认证&#34;&gt;使用 ServiceAccount 令牌进行镜像拉取身份认证&lt;/h3&gt;
&lt;p&gt;ServiceAccount 令牌与 kubelet 凭据提供程序集成的特性预计将在 Kubernetes v1.34 中进入 Beta 阶段并默认启用。
这将允许 kubelet 在从需要身份认证的镜像仓库中拉取容器镜像时使用这些令牌。&lt;/p&gt;
&lt;p&gt;此特性已作为 Alpha 存在，并由 &lt;a href=&#34;https://kep.k8s.io/4412&#34;&gt;KEP-4412&lt;/a&gt; 跟踪。&lt;/p&gt;
&lt;!--
The existing alpha integration allows the `kubelet` to use short-lived, automatically rotated ServiceAccount tokens (that follow OIDC-compliant semantics) to authenticate to a container image registry. 
Each token is scoped to one associated Pod; the overall mechanism replaces the need for long-lived image pull Secrets.

Adopting this new approach reduces security risks, supports workload-level identity, and helps cut operational overhead. 
It brings image pull authentication closer to modern, identity-aware good practice.
--&gt;
&lt;p&gt;现有的 Alpha 集成允许 kubelet 使用生命期短、自动轮换的 ServiceAccount 令牌
（符合 OIDC 标准）来向容器镜像仓库进行身份认证。
每个令牌与一个 Pod 相关联；整个机制可替代长期存在的镜像拉取 Secret。&lt;/p&gt;
&lt;p&gt;采用这一新方式可以降低安全风险、支持工作负载级身份，并减少运维负担。
它让镜像拉取认证更加贴合现代、具备身份感知的最佳实践。&lt;/p&gt;
&lt;!--
### Pod replacement policy for Deployments

After a change to a [Deployment](/docs/concepts/workloads/controllers/deployment/), terminating pods may stay up for a considerable amount of time and may consume additional resources.
As part of [KEP-3973](https://kep.k8s.io/3973), the `.spec.podReplacementPolicy` field will be introduced (as alpha) for Deployments.

If your cluster has the feature enabled, you&#39;ll be able to select one of two policies:
--&gt;
&lt;h3 id=&#34;deployment-的-pod-替换策略&#34;&gt;Deployment 的 Pod 替换策略&lt;/h3&gt;
&lt;p&gt;对 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/deployment/&#34;&gt;Deployment&lt;/a&gt;
做出变更后，终止中的 Pod 可能会保留较长时间，并消耗额外资源。
作为 &lt;a href=&#34;https://kep.k8s.io/3973&#34;&gt;KEP-3973&lt;/a&gt; 的一部分，&lt;code&gt;.spec.podReplacementPolicy&lt;/code&gt;
字段将以 Alpha 形式引入到 Deployment 中。&lt;/p&gt;
&lt;p&gt;如果你的集群启用了此特性，你可以选择以下两种策略之一：&lt;/p&gt;
&lt;!--
`TerminationStarted`
: Creates new pods as soon as old ones start terminating, resulting in faster rollouts at the cost of potentially higher resource consumption.

`TerminationComplete`
: Waits until old pods fully terminate before creating new ones, resulting in slower rollouts but ensuring controlled resource consumption.
--&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;TerminationStarted&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;一旦旧 Pod 开始终止，立即创建新 Pod，带来更快的上线速度，但资源消耗可能更高。&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;TerminationComplete&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;等待旧 Pod 完全终止后才创建新 Pod，上线速度较慢，但资源消耗控制更好。&lt;/dd&gt;
&lt;/dl&gt;
&lt;!--
This feature makes Deployment behavior more predictable by letting you choose when new pods should be created during updates or scaling. 
It&#39;s beneficial when working in clusters with tight resource constraints or with workloads with long termination periods. 

It&#39;s expected to be available as an alpha feature and can be enabled using the `DeploymentPodReplacementPolicy` and `DeploymentReplicaSetTerminatingReplicas` feature gates in the API server and kube-controller-manager.
--&gt;
&lt;p&gt;此特性通过让你选择更新或扩缩容期间何时创建新 Pod，使 Deployment 的行为更可预测。
在资源受限的集群或终止时间较长的工作负载中尤为有用。&lt;/p&gt;
&lt;p&gt;预计此特性将作为 Alpha 特性推出，可通过在 API 服务器和 kube-controller-manager 上开启
&lt;code&gt;DeploymentPodReplacementPolicy&lt;/code&gt; 和 &lt;code&gt;DeploymentReplicaSetTerminatingReplicas&lt;/code&gt; 特性门控来启用。&lt;/p&gt;
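&lt;p&gt;启用特性门控后，可以在 Deployment 中这样设置该字段（示意片段，名称和镜像均为假设；Alpha 阶段的字段以实际发布为准）：&lt;/p&gt;

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example                                # 假设的名称
spec:
  replicas: 3
  podReplacementPolicy: TerminationComplete    # 等旧 Pod 完全终止后再创建新 Pod
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: app
        image: example.com/app:latest          # 假设的镜像
```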
&lt;!--
### Production-ready tracing for `kubelet` and API Server

To address the longstanding challenge of debugging node-level issues by correlating disconnected logs, 
[KEP-2831](https://kep.k8s.io/2831) provides deep, contextual insights into the `kubelet`.
--&gt;
&lt;h3 id=&#34;kubelet-和-api-服务器的生产级追踪特性&#34;&gt;kubelet 和 API 服务器的生产级追踪特性&lt;/h3&gt;
&lt;p&gt;为了解决长期以来需要关联彼此割裂的日志来调试节点级问题的难题，
&lt;a href=&#34;https://kep.k8s.io/2831&#34;&gt;KEP-2831&lt;/a&gt; 为 kubelet 提供了深入的上下文洞察。&lt;/p&gt;
&lt;!--
This feature instruments critical `kubelet` operations, particularly its gRPC calls to the Container Runtime Interface (CRI), using the vendor-agnostic OpenTelemetry standard. 
It allows operators to visualize the entire lifecycle of events (for example: a Pod startup) to pinpoint sources of latency and errors. 
Its most powerful aspect is the propagation of trace context; the `kubelet` passes a trace ID with its requests to the container runtime, enabling runtimes to link their own spans.
--&gt;
&lt;p&gt;此特性使用供应商中立的 OpenTelemetry 标准，为关键的 kubelet 操作（特别是其对容器运行时接口的 gRPC 调用）做了插桩。
它使运维人员能够可视化整个事件生命周期（例如：Pod 启动）以定位延迟或错误来源。
其强大之处在于传播链路上下文：kubelet 在向容器运行时发送请求时附带链路 ID，使运行时能够链接自身的 Span。&lt;/p&gt;
&lt;!--
This effort is complemented by a parallel enhancement, [KEP-647](https://kep.k8s.io/647), which brings the same tracing capabilities to the Kubernetes API server. 
Together, these enhancements provide a more unified, end-to-end view of events, simplifying the process of pinpointing latency and errors from the control plane down to the node. 
These features have matured through the official Kubernetes release process. 
[KEP-2831](https://kep.k8s.io/2831) was introduced as an alpha feature in v1.25, while [KEP-647](https://kep.k8s.io/647) debuted as alpha in v1.22. 
Both enhancements were promoted to beta together in the v1.27 release. 
Looking forward, Kubelet Tracing ([KEP-2831](https://kep.k8s.io/2831)) and API Server Tracing ([KEP-647](https://kep.k8s.io/647)) are now targeting graduation to stable in the upcoming v1.34 release.
--&gt;
&lt;p&gt;这一工作得到了另一个增强提案 &lt;a href=&#34;https://kep.k8s.io/647&#34;&gt;KEP-647&lt;/a&gt; 的配合，
后者为 Kubernetes API 服务器引入了相同的链路追踪能力。
两者结合提供了从控制面到节点的端到端事件视图，极大简化了定位延迟和错误的过程。
这些特性已在 Kubernetes 正式版本发布流程中逐渐成熟：&lt;br&gt;
&lt;a href=&#34;https://kep.k8s.io/2831&#34;&gt;KEP-2831&lt;/a&gt; 在 v1.25 中以 Alpha 发布，
&lt;a href=&#34;https://kep.k8s.io/647&#34;&gt;KEP-647&lt;/a&gt; 在 v1.22 中首次作为 Alpha 发布，
这两个特性在 v1.27 中一起进阶至 Beta。
展望未来，kubelet 追踪（&lt;a href=&#34;https://kep.k8s.io/2831&#34;&gt;KEP-2831&lt;/a&gt;）和
API 服务器追踪（&lt;a href=&#34;https://kep.k8s.io/647&#34;&gt;KEP-647&lt;/a&gt;）计划在 v1.34 中进入稳定阶段。&lt;/p&gt;
&lt;!--
### `PreferSameZone` and `PreferSameNode` traffic distribution for Services

The `spec.trafficDistribution` field within a Kubernetes [Service](/docs/concepts/services-networking/service/) allows users to express preferences for how traffic should be routed to Service endpoints. 
--&gt;
&lt;h3 id=&#34;service-的-prefersamezone-和-prefersamenode-流量分发&#34;&gt;Service 的 &lt;code&gt;PreferSameZone&lt;/code&gt; 和 &lt;code&gt;PreferSameNode&lt;/code&gt; 流量分发&lt;/h3&gt;
&lt;p&gt;Kubernetes &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/services-networking/service/&#34;&gt;Service&lt;/a&gt; 的
&lt;code&gt;spec.trafficDistribution&lt;/code&gt; 字段允许用户表达服务端点的流量路由偏好。&lt;/p&gt;
&lt;!--
[KEP-3015](https://kep.k8s.io/3015) deprecates `PreferClose` and introduces two additional values: `PreferSameZone` and `PreferSameNode`. 
`PreferSameZone` is equivalent to the current `PreferClose`. 
`PreferSameNode` prioritizes sending traffic to endpoints on the same node as the client.  

This feature was introduced in v1.33 behind the `PreferSameTrafficDistribution` feature gate. 
It is targeting graduation to beta in v1.34 with its feature gate enabled by default.
--&gt;
&lt;p&gt;&lt;a href=&#34;https://kep.k8s.io/3015&#34;&gt;KEP-3015&lt;/a&gt; 弃用了 &lt;code&gt;PreferClose&lt;/code&gt;，并引入了两个新值：&lt;code&gt;PreferSameZone&lt;/code&gt; 和 &lt;code&gt;PreferSameNode&lt;/code&gt;。
&lt;code&gt;PreferSameZone&lt;/code&gt; 等价于当前的 &lt;code&gt;PreferClose&lt;/code&gt;；&lt;br&gt;
&lt;code&gt;PreferSameNode&lt;/code&gt; 优先将流量发送至与客户端位于同一节点的端点。&lt;/p&gt;
&lt;p&gt;此特性在 v1.33 中引入，受 &lt;code&gt;PreferSameTrafficDistribution&lt;/code&gt; 特性门控控制。
v1.34 中此特性预计将进入 Beta，并默认启用。&lt;/p&gt;
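&lt;p&gt;在 Service 中设置该字段的方式如下（示意片段，名称均为假设）：&lt;/p&gt;

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example                          # 假设的名称
spec:
  selector:
    app: example
  ports:
  - port: 80
  trafficDistribution: PreferSameNode    # 优先将流量路由到与客户端同节点的端点
```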
&lt;!--
### Support for KYAML: a Kubernetes dialect of YAML

KYAML aims to be a safer and less ambiguous YAML subset, and was designed specifically
for Kubernetes. Whatever version of Kubernetes you use, you&#39;ll be able use KYAML for writing manifests
and/or Helm charts.
You can write KYAML and pass it as an input to **any** version of `kubectl`,
because all KYAML files are also valid as YAML.
With kubectl v1.34, we expect you&#39;ll also be able to request KYAML output from `kubectl` (as in `kubectl get -o kyaml …`).
If you prefer, you can still request the output in JSON or YAML format.
--&gt;
&lt;h3 id=&#34;支持-kyaml-kubernetes-的-yaml-方言&#34;&gt;支持 KYAML：Kubernetes 的 YAML 方言&lt;/h3&gt;
&lt;p&gt;KYAML 是为 Kubernetes 设计的更安全、更少歧义的 YAML 子集。
无论你使用哪个版本的 Kubernetes，都可以使用 KYAML 来编写清单和/或 Helm Chart。
你可以编写 KYAML 并将其作为输入传递给&lt;strong&gt;任意&lt;/strong&gt;版本的 kubectl，因为所有 KYAML 文件都是合法的 YAML。
在 kubectl v1.34 中，你还可以请求以 KYAML 格式输出（如：&lt;code&gt;kubectl get -o kyaml …&lt;/code&gt;）。
当然，如果你愿意，也可以继续使用 JSON 或 YAML 格式输出。&lt;/p&gt;
&lt;!--
KYAML addresses specific challenges with both YAML and JSON. 
YAML&#39;s significant whitespace requires careful attention to indentation and nesting, 
while its optional string-quoting can lead to unexpected type coercion (for example: [&#34;The Norway Bug&#34;](https://hitchdev.com/strictyaml/why/implicit-typing-removed/)). 
Meanwhile, JSON lacks comment support and has strict requirements for trailing commas and quoted keys.  

[KEP-5295](https://kep.k8s.io/5295) introduces KYAML, which tries to address the most significant problems by:
--&gt;
&lt;p&gt;KYAML 旨在解决 YAML 和 JSON 各自存在的一些问题。
YAML 对缩进的敏感性需要你注意空格和嵌套，
而其可选的字符串引号可能导致意外类型转换
（参见 &lt;a href=&#34;https://hitchdev.com/strictyaml/why/implicit-typing-removed/&#34;&gt;“挪威 bug”&lt;/a&gt;）。
与此同时，JSON 不支持注释，且对尾逗号和键的引号有严格要求。&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://kep.k8s.io/5295&#34;&gt;KEP-5295&lt;/a&gt; 引入了 KYAML，尝试解决这些主要问题：&lt;/p&gt;
&lt;!--
* Always double-quoting value strings

* Leaving keys unquoted unless they are potentially ambiguous

* Always using `{}` for mappings (associative arrays)

* Always using `[]` for lists
--&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;所有值字符串始终使用英文双引号&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;键不加英文引号，除非可能产生歧义&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;所有映射使用 &lt;code&gt;{}&lt;/code&gt; 表示（即关联数组）&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;所有列表使用 &lt;code&gt;[]&lt;/code&gt; 表示&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
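&lt;p&gt;按照上述规则，一个小型 ConfigMap 清单用 KYAML 写出来大致如下（仅为依据这些规则构造的示意，实际输出格式以 kubectl 的实现为准）：&lt;/p&gt;

```yaml
{
  apiVersion: "v1",
  kind: "ConfigMap",
  metadata: {
    name: "kyaml-example",
  },
  # KYAML 允许注释和尾逗号
  data: {
    logLevel: "debug",
  },
}
```

&lt;p&gt;由于这只是 YAML 的流式（flow）写法，任何符合规范的 YAML 解析器都能直接解析它。&lt;/p&gt;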
&lt;!--
This might sound a lot like JSON, because it is! But unlike JSON, KYAML supports comments, allows trailing commas, and doesn&#39;t require quoted keys.

We&#39;re hoping to see KYAML introduced as a new output format for `kubectl` v1.34.
As with all these features, none of these changes are 100% confirmed; watch this space!
--&gt;
&lt;p&gt;这听起来像 JSON？确实如此！但与 JSON 不同的是，KYAML 支持注释、允许尾逗号，且不强制键加引号。&lt;/p&gt;
&lt;p&gt;我们希望在 kubectl v1.34 中将 KYAML 引入为一种新的输出格式。
如同其他特性一样，这些变更尚未百分百确定，敬请关注！&lt;/p&gt;
&lt;!--
As a format, KYAML is and will remain a **strict subset of YAML**, ensuring that any compliant YAML parser can parse KYAML documents. 
Kubernetes does not require you to provide input specifically formatted as KYAML, and we have no plans to change that.
--&gt;
&lt;p&gt;KYAML 作为一种格式，是 YAML 的&lt;strong&gt;严格子集&lt;/strong&gt;，
这确保任何符合规范的 YAML 解析器都能解析 KYAML 文档。
Kubernetes 并不要求你必须提供 KYAML 格式的输入，也没有这方面的计划。&lt;/p&gt;
&lt;!--
### Fine-grained autoscaling control with HPA configurable tolerance

[KEP-4951](https://kep.k8s.io/4951) introduces a new feature that allows users to configure autoscaling tolerance on a per-HPA basis, 
overriding the default cluster-wide 10% tolerance setting that often proves too coarse-grained for diverse workloads. 
The enhancement adds an optional `tolerance` field to the HPA&#39;s `spec.behavior.scaleUp` and `spec.behavior.scaleDown` sections, 
enabling different tolerance values for scale-up and scale-down operations, 
which is particularly valuable since scale-up responsiveness is typically more critical than scale-down speed for handling traffic surges.
--&gt;
&lt;h3 id=&#34;hpa-支持精细化自动扩缩控制容忍度配置&#34;&gt;HPA 支持精细化自动扩缩控制容忍度配置&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://kep.k8s.io/4951&#34;&gt;KEP-4951&lt;/a&gt; 引入了一项新特性，允许用户在每个 HPA 上配置扩缩容忍度，
以覆盖默认的集群级 10% 容忍度设置，这一默认值对多样化的工作负载来说往往过于粗略。
本次增强为 HPA 的 &lt;code&gt;spec.behavior.scaleUp&lt;/code&gt; 和 &lt;code&gt;spec.behavior.scaleDown&lt;/code&gt; 部分新增了可选的 &lt;code&gt;tolerance&lt;/code&gt; 字段，
使得扩容和缩容操作可以采用不同的容忍值。
这非常有用，因为在应对突发流量时，扩容响应通常比缩容速度更为关键。&lt;/p&gt;
&lt;!--
Released as alpha in Kubernetes v1.33 behind the `HPAConfigurableTolerance` feature gate, this feature is expected to graduate to beta in v1.34.
This improvement helps to address scaling challenges with large deployments, where for scaling in,
a 10% tolerance might mean leaving hundreds of unnecessary Pods running.
Using the new, more flexible approach would enable workload-specific optimization for both
responsive and conservative scaling behaviors.
--&gt;
&lt;p&gt;此特性作为 Alpha 特性，在 Kubernetes v1.33 中引入，并受 &lt;code&gt;HPAConfigurableTolerance&lt;/code&gt; 特性门控控制。
预计将在 v1.34 中进阶为 Beta。
这项改进有助于解决大规模部署中的扩缩容难题，例如在缩容时，10% 的容忍度可能意味着会保留数百个不必要的 Pod。
通过这一更灵活的配置方式，用户可以针对不同工作负载优化扩缩容行为的响应性和保守性。&lt;/p&gt;
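&lt;p&gt;启用 &lt;code&gt;HPAConfigurableTolerance&lt;/code&gt; 特性门控后，可以像下面这样为扩容和缩容分别设置容忍度（示意片段，名称和数值仅为举例）：&lt;/p&gt;

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example              # 假设的名称
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      tolerance: 0.05        # 扩容更灵敏：5% 容忍度
    scaleDown:
      tolerance: 0.2         # 缩容更保守：20% 容忍度
```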
&lt;!--
## Want to know more?
New features and deprecations are also announced in the Kubernetes release notes. 
We will formally announce what&#39;s new in [Kubernetes v1.34](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.34.md) as part of the CHANGELOG for that release.

The Kubernetes v1.34 release is planned for **Wednesday 27th August 2025**. Stay tuned for updates!
--&gt;
&lt;h2 id=&#34;想了解更多&#34;&gt;想了解更多？&lt;/h2&gt;
&lt;p&gt;新特性和弃用项也会在 Kubernetes 发布说明中公布。我们将在
&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.34.md&#34;&gt;Kubernetes v1.34&lt;/a&gt;
变更日志中正式宣布新增内容。&lt;/p&gt;
&lt;p&gt;Kubernetes v1.34 的计划发布时间为 &lt;strong&gt;2025 年 8 月 27 日（周三）&lt;/strong&gt;。敬请期待更多更新！&lt;/p&gt;
&lt;!--
## Get involved
The simplest way to get involved with Kubernetes is to join one of the many [Special Interest Groups](https://github.com/kubernetes/community/blob/master/sig-list.md) (SIGs) that align with your interests. 
Have something you&#39;d like to broadcast to the Kubernetes community? Share your voice at our weekly [community meeting](https://github.com/kubernetes/community/tree/master/communication), and through the channels below. 
Thank you for your continued feedback and support.
--&gt;
&lt;h2 id=&#34;参与其中&#34;&gt;参与其中&lt;/h2&gt;
&lt;p&gt;参与 Kubernetes 最简单的方式就是加入与你兴趣相关的&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-list.md&#34;&gt;特别兴趣小组（SIG）&lt;/a&gt;。
有想要向社区分享的内容？欢迎在每周的&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/communication&#34;&gt;社区会议&lt;/a&gt;上发声，
或通过以下渠道参与讨论。感谢你一如既往的反馈和支持！&lt;/p&gt;
&lt;!--
* Follow us on Bluesky [@kubernetes.io](https://bsky.app/profile/kubernetes.io) for the latest updates
* Join the community discussion on [Discuss](https://discuss.kubernetes.io/)
* Join the community on [Slack](http://slack.k8s.io/)
* Post questions (or answer questions) on [Server Fault](https://serverfault.com/questions/tagged/kubernetes) or [Stack Overflow](http://stackoverflow.com/questions/tagged/kubernetes)
* Share your Kubernetes [story](https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform)
* Read more about what&#39;s happening with Kubernetes on the [blog](https://kubernetes.io/blog/)
* Learn more about the [Kubernetes Release Team](https://github.com/kubernetes/sig-release/tree/master/release-team)
--&gt;
&lt;ul&gt;
&lt;li&gt;在 Bluesky 上关注我们 &lt;a href=&#34;https://bsky.app/profile/kubernetes.io&#34;&gt;@kubernetes.io&lt;/a&gt;，获取最新动态&lt;/li&gt;
&lt;li&gt;在 &lt;a href=&#34;https://discuss.kubernetes.io/&#34;&gt;Discuss&lt;/a&gt; 上参与社区讨论&lt;/li&gt;
&lt;li&gt;加入 &lt;a href=&#34;http://slack.k8s.io/&#34;&gt;Slack 社区&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;在 &lt;a href=&#34;https://serverfault.com/questions/tagged/kubernetes&#34;&gt;Server Fault&lt;/a&gt; 或
&lt;a href=&#34;http://stackoverflow.com/questions/tagged/kubernetes&#34;&gt;Stack Overflow&lt;/a&gt; 上提问或回答问题&lt;/li&gt;
&lt;li&gt;分享你的 Kubernetes &lt;a href=&#34;https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform&#34;&gt;使用故事&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;阅读 Kubernetes &lt;a href=&#34;https://kubernetes.io/blog/&#34;&gt;官方博客&lt;/a&gt;上的更多动态&lt;/li&gt;
&lt;li&gt;了解 &lt;a href=&#34;https://github.com/kubernetes/sig-release/tree/master/release-team&#34;&gt;Kubernetes 发布团队&lt;/a&gt;的更多信息&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>云原生环境中的镜像兼容性</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/25/image-compatibility-in-cloud-native-environments/</link>
      <pubDate>Wed, 25 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/25/image-compatibility-in-cloud-native-environments/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Image Compatibility In Cloud Native Environments&#34;
date: 2025-06-25
draft: false
slug: image-compatibility-in-cloud-native-environments
author: &gt;
  Chaoyi Huang (Huawei),
  Marcin Franczyk (Huawei),
  Vanessa Sochat (Lawrence Livermore National Laboratory)
--&gt;
&lt;!--
In industries where systems must run very reliably and meet strict performance criteria such as telecommunication, high-performance or AI computing, containerized applications often need specific operating system configuration or hardware presence.
It is common practice to require the use of specific versions of the kernel, its configuration, device drivers, or system components.
Despite the existence of the [Open Container Initiative (OCI)](https://opencontainers.org/), a governing community to define standards and specifications for container images, there has been a gap in expression of such compatibility requirements.
The need to address this issue has led to different proposals and, ultimately, an implementation in Kubernetes&#39; [Node Feature Discovery (NFD)](https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/index.html).
--&gt;
&lt;p&gt;在电信、高性能或 AI 计算等必须高度可靠且满足严格性能标准的行业中，容器化应用通常需要特定的操作系统配置或硬件支持。
通常的做法是要求使用特定版本的内核、其配置、设备驱动程序或系统组件。
尽管存在&lt;a href=&#34;https://opencontainers.org/&#34;&gt;开放容器倡议 (OCI)&lt;/a&gt; 这样一个定义容器镜像标准和规范的治理社区，
但在表达这种兼容性需求方面仍存在空白。为了解决这一问题，业界提出了多个提案，并最终在 Kubernetes
的&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/index.html&#34;&gt;节点特性发现 (NFD)&lt;/a&gt; 项目中实现了相关功能。&lt;/p&gt;
&lt;!--
[NFD](https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/index.html) is an open source Kubernetes project that automatically detects and reports [hardware and system features](https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/customization-guide.html#available-features) of cluster nodes. This information helps users to schedule workloads on nodes that meet specific system requirements, which is especially useful for applications with strict hardware or operating system dependencies.
--&gt;
&lt;p&gt;&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/index.html&#34;&gt;NFD&lt;/a&gt;
是一个开源的 Kubernetes 项目，能够自动检测并报告集群节点的&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/customization-guide.html#available-features&#34;&gt;硬件和系统特性&lt;/a&gt;。
这些信息帮助用户将工作负载调度到满足特定系统需求的节点上，尤其适用于具有严格硬件或操作系统依赖的应用。&lt;/p&gt;
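&lt;!--
As a quick, illustrative sketch (the Pod name, image reference, and label value below are examples, not from the original post), workloads can already target NFD-discovered features through the standard node labels that NFD publishes:
--&gt;
&lt;p&gt;作为一个简单的示意（下面的 Pod 名称、镜像地址和标签取值均为示例，并非原文内容），
工作负载已经可以通过 NFD 发布的标准节点标签来选择具备所需特性的节点：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;apiVersion: v1
kind: Pod
metadata:
  name: feature-dependent-pod
spec:
  containers:
  - name: app
    image: registry.example/app:latest
  # 仅调度到 NFD 检测出 Intel CPU 的节点上
  nodeSelector:
    feature.node.kubernetes.io/cpu-model.vendor_id: &#34;Intel&#34;
&lt;/code&gt;&lt;/pre&gt;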
&lt;!--
## The need for image compatibility specification

### Dependencies between containers and host OS

A container image is built on a base image, which provides a minimal runtime environment, often a stripped-down Linux userland, completely empty or distroless. When an application requires certain features from the host OS, compatibility issues arise. These dependencies can manifest in several ways:
--&gt;
&lt;h2 id=&#34;the-need-for-image-compatibility-specification&#34;&gt;镜像兼容性规范的需求&lt;/h2&gt;
&lt;h3 id=&#34;容器与主机操作系统之间的依赖关系&#34;&gt;容器与主机操作系统之间的依赖关系&lt;/h3&gt;
&lt;p&gt;容器镜像是基于基础镜像构建的，基础镜像提供了最小的运行时环境，通常是一个精简的 Linux 用户态环境，
有时甚至是完全空白或无发行版的。
当应用需要来自主机操作系统的某些特性时，就会出现兼容性问题。这些依赖可能表现为以下几种形式：&lt;/p&gt;
&lt;!--
- **Drivers**:
  Host driver versions must match the supported range of a library version inside the container to avoid compatibility problems. Examples include GPUs and network drivers.
- **Libraries or Software**:
  The container must come with a specific version or range of versions for a library or software to run optimally in the environment. Examples from high performance computing are MPI, EFA, or Infiniband.
- **Kernel Modules or Features**:
  Specific kernel features or modules must be present. Examples include having support of write protected huge page faults, or the presence of VFIO
- And more…
--&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;驱动程序&lt;/strong&gt;：
主机上的驱动程序版本必须与容器内的库所支持的版本范围相匹配，以避免兼容性问题，例如 GPU 和网络驱动。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;库或软件&lt;/strong&gt;：
容器必须包含某个库或软件的特定版本或版本范围，才能在目标环境中以最优方式运行。
高性能计算方面的示例包括 MPI、EFA 或 Infiniband。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;内核模块或特性&lt;/strong&gt;：
必须存在特定的内核特性或模块，例如支持写保护巨页的缺页处理（write protected huge page faults），或存在对 VFIO 的支持。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;以及其他更多形式...&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
While containers in Kubernetes are the most likely unit of abstraction for these needs, the definition of compatibility can extend further to include other container technologies such as Singularity and other OCI artifacts such as binaries from a spack binary cache.
--&gt;
&lt;p&gt;虽然在 Kubernetes 中容器是这些需求最常见的抽象单位，但兼容性的定义可以进一步扩展，包括
Singularity 等其他容器技术以及来自 spack 二进制缓存的二进制文件等 OCI 工件。&lt;/p&gt;
&lt;!--
### Multi-cloud and hybrid cloud challenges

Containerized applications are deployed across various Kubernetes distributions and cloud providers, where different host operating systems introduce compatibility challenges.
Often those have to be pre-configured before workload deployment or are immutable.
For instance, different cloud providers will include different operating systems like:
--&gt;
&lt;h3 id=&#34;多云与混合云的挑战&#34;&gt;多云与混合云的挑战&lt;/h3&gt;
&lt;p&gt;容器化应用被部署在各种 Kubernetes 发行版和云平台上，而不同的主机操作系统带来了兼容性挑战。
这些操作系统通常需要在部署工作负载之前预配置，或者它们是不可变的。
例如，不同云平台会使用不同的操作系统，包括：&lt;/p&gt;
&lt;!--
- **RHCOS/RHEL**
- **Photon OS**
- **Amazon Linux 2**
- **Container-Optimized OS**
- **Azure Linux OS**
- And more...
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;RHCOS/RHEL&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Photon OS&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Amazon Linux 2&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Container-Optimized OS&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Azure Linux OS&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;等等...&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Each OS comes with unique kernel versions, configurations, and drivers, making compatibility a non-trivial issue for applications requiring specific features.
It must be possible to quickly assess a container for its suitability to run on any specific environment.
--&gt;
&lt;p&gt;每种操作系统都具有独特的内核版本、配置和驱动程序，对于需要特定特性的应用来说，兼容性问题并不简单。
因此必须能够快速评估某个容器镜像是否适合在某个特定环境中运行。&lt;/p&gt;
&lt;!--
### Image compatibility initiative

An effort was made within the [Open Containers Initiative Image Compatibility](https://github.com/opencontainers/wg-image-compatibility) working group to introduce a standard for image compatibility metadata.
A specification for compatibility would allow container authors to declare required host OS features, making compatibility requirements discoverable and programmable.
The specification implemented in Kubernetes Node Feature Discovery is one of the discussed proposals.
It aims to:
--&gt;
&lt;h3 id=&#34;镜像兼容性倡议&#34;&gt;镜像兼容性倡议&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/opencontainers/wg-image-compatibility&#34;&gt;OCI 镜像兼容性工作组&lt;/a&gt;正在推动引入一个镜像兼容性元数据的标准。
此规范允许容器作者声明所需的主机操作系统特性，使兼容性需求可以被发现和编程化处理。
目前已在 Kubernetes 的 Node Feature Discovery 中实现了其中一个被讨论的提案，其目标包括：&lt;/p&gt;
&lt;!--
- **Define a structured way to express compatibility in OCI image manifests.**
- **Support a compatibility specification alongside container images in image registries.**
- **Allow automated validation of compatibility before scheduling containers.**

The concept has since been implemented in the Kubernetes Node Feature Discovery project.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;在 OCI 镜像清单中定义一种结构化的兼容性表达方式。&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;支持在镜像仓库中将兼容性规范与容器镜像一同存储。&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;在容器调度之前实现兼容性自动验证。&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;这个理念目前已在 Kubernetes 的 Node Feature Discovery 项目中落地。&lt;/p&gt;
&lt;!--
### Implementation in Node Feature Discovery

The solution integrates compatibility metadata into Kubernetes via NFD features and the [NodeFeatureGroup](https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/custom-resources.html#nodefeaturegroup) API.
This interface enables the user to match containers to nodes based on exposing features of hardware and software, allowing for intelligent scheduling and workload optimization.
--&gt;
&lt;h3 id=&#34;在-node-feature-discovery-中的实现&#34;&gt;在 Node Feature Discovery 中的实现&lt;/h3&gt;
&lt;p&gt;这种解决方案通过 NFD 的特性机制和
&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/custom-resources.html#nodefeaturegroup&#34;&gt;NodeFeatureGroup&lt;/a&gt;
API 将兼容性元数据集成到 Kubernetes 中。
此接口使用户可以根据硬件和软件暴露的特性将容器与节点进行匹配，从而实现智能调度与工作负载优化。&lt;/p&gt;
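&lt;!--
For illustration, a minimal NodeFeatureGroup object (adapted from the NFD documentation; the group name and the kernel version value are examples) could look like the following:
--&gt;
&lt;p&gt;作为示意，一个最小的 NodeFeatureGroup 对象（参考 NFD 文档改写；其中的组名和内核版本取值仅为示例）大致如下：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureGroup
metadata:
  name: kernel-feature-group
spec:
  # 内核主版本号为 6 的节点将被纳入此特性组
  featureGroupRules:
  - name: &#34;kernel version&#34;
    matchFeatures:
    - feature: kernel.version
      matchExpressions:
        major: {op: In, value: [&#34;6&#34;]}
&lt;/code&gt;&lt;/pre&gt;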
&lt;!--
### Compatibility specification

The compatibility specification is a structured list of compatibility objects containing *[Node Feature Groups](https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/custom-resources.html#nodefeaturegroup)*.
These objects define image requirements and facilitate validation against host nodes.
The feature requirements are described by using [the list of available features](https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/customization-guide.html#available-features) from the NFD project.
The schema has the following structure:
--&gt;
&lt;h3 id=&#34;兼容性规范&#34;&gt;兼容性规范&lt;/h3&gt;
&lt;p&gt;兼容性规范是一个结构化的兼容性对象列表，包含
&lt;strong&gt;&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/custom-resources.html#nodefeaturegroup&#34;&gt;Node Feature Groups&lt;/a&gt;&lt;/strong&gt;。
这些对象定义了镜像要求，并支持与主机节点进行验证。特性需求通过
&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/customization-guide.html#available-features&#34;&gt;NFD 项目提供的特性列表&lt;/a&gt;进行描述。此模式的结构如下：&lt;/p&gt;
&lt;!--
- **version** (string) - Specifies the API version.
- **compatibilities** (array of objects) - List of compatibility sets.
  - **rules** (object) - Specifies [NodeFeatureGroup](https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/custom-resources.html#nodefeaturegroup) to define image requirements.
  - **weight** (int, optional) - Node affinity weight.
  - **tag** (string, optional) - Categorization tag.
  - **description** (string, optional) - Short description.
--&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;version&lt;/strong&gt;（字符串）— 指定 API 版本。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;compatibilities&lt;/strong&gt;（对象数组）— 兼容性集合列表。&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;rules&lt;/strong&gt;（对象）— 指定
&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/custom-resources.html#nodefeaturegroup&#34;&gt;NodeFeatureGroup&lt;/a&gt;
来定义镜像要求。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;weight&lt;/strong&gt;（整数，可选）— 节点亲和性权重。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;tag&lt;/strong&gt;（字符串，可选）— 分类标记。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;description&lt;/strong&gt;（字符串，可选）— 简短描述。&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
An example might look like the following:
--&gt;
&lt;p&gt;示例如下：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;version&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1alpha1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;compatibilities&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;description&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;My image requirements&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;kernel and cpu&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchFeatures&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;feature&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;kernel.loadedmodule&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchExpressions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;vfio-pci&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;op&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Exists}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;feature&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;cpu.model&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchExpressions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;vendor_id&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;op: In, value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;Intel&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;AMD&amp;#34;&lt;/span&gt;]}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;one of available nics&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchAny&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchFeatures&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;feature&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;pci.device&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchExpressions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;vendor&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;op: In, value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;0eee&amp;#34;&lt;/span&gt;]}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;class&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;op: In, value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;0200&amp;#34;&lt;/span&gt;]}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchFeatures&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;feature&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;pci.device&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchExpressions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;vendor&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;op: In, value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;0fff&amp;#34;&lt;/span&gt;]}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;class&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;op: In, value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;0200&amp;#34;&lt;/span&gt;]}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
### Client implementation for node validation

To streamline compatibility validation, we implemented a [client tool](https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/reference/node-feature-client-reference.html) that allows for node validation based on an image&#39;s compatibility artifact.
In this workflow, the image author would generate a compatibility artifact that points to the image it describes in a registry via the referrers API.
When a need arises to assess the fit of an image to a host, the tool can discover the artifact and verify compatibility of an image to a node before deployment.
The client can validate nodes both inside and outside a Kubernetes cluster, extending the utility of the tool beyond the single Kubernetes use case.
In the future, image compatibility could play a crucial role in creating specific workload profiles based on image compatibility requirements, aiding in more efficient scheduling.
Additionally, it could potentially enable automatic node configuration to some extent, further optimizing resource allocation and ensuring seamless deployment of specialized workloads.
--&gt;
&lt;h3 id=&#34;节点验证的客户端实现&#34;&gt;节点验证的客户端实现&lt;/h3&gt;
&lt;p&gt;为了简化兼容性验证，
我们实现了一个&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/reference/node-feature-client-reference.html&#34;&gt;客户端工具&lt;/a&gt;，
可以根据镜像的兼容性工件进行节点验证。在这个流程中，镜像作者会生成一个兼容性工件，
并通过 Referrers（引用者）API 在镜像仓库中将其关联到所描述的镜像。当需要评估某个镜像是否适用于某个主机节点时，
此工具可以发现工件并在部署前验证镜像对节点的兼容性。
客户端可以验证 Kubernetes 集群内外的节点，使该工具的用途不再局限于 Kubernetes 这一种场景。
未来，镜像兼容性还可能在基于镜像兼容性需求创建特定的工作负载画像（profile）方面发挥关键作用，有助于更高效的调度。
此外，它还可能在一定程度上实现节点的自动配置，进一步优化资源分配并确保专用工作负载的顺利部署。&lt;/p&gt;
&lt;!--
### Examples of usage

1. **Define image compatibility metadata**

   A [container image](/docs/concepts/containers/images) can have metadata that describes
   its requirements based on features discovered from nodes, like kernel modules or CPU models.
   The previous compatibility specification example in this article exemplified this use case.
--&gt;
&lt;h3 id=&#34;使用示例&#34;&gt;使用示例&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;定义镜像兼容性元数据&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;一个&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/containers/images&#34;&gt;容器镜像&lt;/a&gt;可以包含元数据，
基于节点所发现的特性（如内核模块或 CPU 型号）描述其需求。
上文所述的兼容性规范示例即体现了这种用法。&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
2. **Attach the artifact to the image**

   The image compatibility specification is stored as an OCI artifact.
   You can attach this metadata to your container image using the [oras](https://oras.land/) tool.
   The registry only needs to support OCI artifacts, support for arbitrary types is not required.
   Keep in mind that the container image and the artifact must be stored in the same registry.
   Use the following command to attach the artifact to the image:
--&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;将工件挂接到镜像上&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;镜像兼容性规范以 OCI 工件的形式存储。
你可以使用 &lt;a href=&#34;https://oras.land/&#34;&gt;oras&lt;/a&gt; 工具将元数据挂接到你的容器镜像上。
镜像仓库只需支持 OCI 工件，不必支持任意类型。
请注意，容器镜像和工件必须存储在同一个镜像仓库中。
使用以下命令将工件挂接到镜像上：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;oras attach &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;--artifact-type application/vnd.nfd.image-compatibility.v1alpha1 &amp;lt;image-url&amp;gt; &lt;span style=&#34;color:#b62;font-weight:bold&#34;&gt;\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;path-to-spec&amp;gt;.yaml:application/vnd.nfd.image-compatibility.spec.v1alpha1+yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
3. **Validate image compatibility**

   After attaching the compatibility specification, you can validate whether a node meets the
   image&#39;s requirements. This validation can be done using the
   [nfd client](https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/reference/node-feature-client-reference.html):

   ```bash
   nfd compat validate-node --image &lt;image-url&gt;
   ```
--&gt;
&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;验证镜像兼容性&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;在挂接兼容性规范之后，你可以验证某个节点是否满足镜像的运行要求。这种验证可以通过
&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/reference/node-feature-client-reference.html&#34;&gt;nfd 客户端&lt;/a&gt;来完成：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;nfd compat validate-node --image &amp;lt;镜像地址&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
4. **Read the output from the client**

   Finally you can read the report generated by the tool or use your own tools to act based on the generated JSON report.

   ![validate-node command output](validate-node-output.png)
--&gt;
&lt;ol start=&#34;4&#34;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;读取客户端的输出&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;你可以阅读工具生成的报告，也可以使用你自己的工具解析生成的 JSON 报告并做出决策。&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;validate-node 命令输出&#34; src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/25/image-compatibility-in-cloud-native-environments/validate-node-output.png&#34;&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
## Conclusion

The addition of image compatibility to Kubernetes through Node Feature Discovery underscores the growing importance of addressing compatibility in cloud native environments.
It is only a start, as further work is needed to integrate compatibility into scheduling of workloads within and outside of Kubernetes.
However, by integrating this feature into Kubernetes, mission-critical workloads can now define and validate host OS requirements more efficiently.
Moving forward, the adoption of compatibility metadata within Kubernetes ecosystems will significantly enhance the reliability and performance of specialized containerized applications, ensuring they meet the stringent requirements of industries like telecommunications, high-performance computing or any environment that requires special hardware or host OS configuration.
--&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;总结&lt;/h2&gt;
&lt;p&gt;通过 Node Feature Discovery 将镜像兼容性引入 Kubernetes，突显了在云原生环境中解决兼容性问题的重要性。
这只是一个起点，未来仍需进一步将兼容性深度集成到 Kubernetes 内外的工作负载调度中。
然而，借助这一功能，关键任务型工作负载现在可以更高效地定义和验证其对主机操作系统的要求。
展望未来，兼容性元数据在 Kubernetes 生态系统中的广泛采用将显著提升专用容器化应用的可靠性与性能，
确保其能够满足电信、高性能计算等行业对硬件或主机系统配置的严格要求。&lt;/p&gt;
&lt;!--
## Get involved

Join the [Kubernetes Node Feature Discovery](https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/contributing/) project if you&#39;re interested in getting involved with the design and development of Image Compatibility API and tools.
We always welcome new contributors.
--&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;加入我们&lt;/h2&gt;
&lt;p&gt;如果你有兴趣参与镜像兼容性 API 和工具的设计与开发，欢迎加入
&lt;a href=&#34;https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/contributing/&#34;&gt;Kubernetes Node Feature Discovery&lt;/a&gt;
项目。我们始终欢迎新的贡献者加入。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes Slack 变更公告</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/16/changes-to-kubernetes-slack/</link>
      <pubDate>Mon, 16 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/16/changes-to-kubernetes-slack/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Changes to Kubernetes Slack&#34;
date: 2025-06-16
canonicalUrl: https://www.kubernetes.dev/blog/2025/06/16/changes-to-kubernetes-slack-2025/
slug: changes-to-kubernetes-slack
Author: &gt;
  [Josh Berkus](https://github.com/jberkus)
--&gt;
&lt;!--
**UPDATE**: We&#39;ve received notice from Salesforce that our Slack workspace **WILL NOT BE DOWNGRADED** on June 20th. Stand by for more details, but for now, there is no urgency to back up private channels or direct messages.
--&gt;
&lt;p&gt;&lt;strong&gt;更新&lt;/strong&gt;：我们已收到 Salesforce 的通知，我们的 Slack 工作区在 6 月 20 日&lt;strong&gt;不会被降级&lt;/strong&gt;。
请等待更多细节更新，目前&lt;strong&gt;无需紧急备份&lt;/strong&gt;私有频道或私信。&lt;/p&gt;
&lt;!--
~~Kubernetes Slack will lose its special status and will be changing into a standard free Slack on June 20, 2025~~. Sometime later this year, our community may move to a new platform. If you are responsible for a channel or private channel, or a member of a User Group, you will need to take some actions as soon as you can.
--&gt;
&lt;p&gt;&lt;del&gt;Kubernetes Slack 将在 2025 年 6 月 20 日失去原有的专属支持，并转变为标准免费版 Slack&lt;/del&gt;。
今年晚些时候，我们的社区可能会迁移到新平台。
如果你是频道或私有频道的负责人，又或是用户组的成员，你需要尽快采取一些行动。&lt;/p&gt;
&lt;!--
For the last decade, Slack has supported our project with a free customized enterprise account. They have let us know that they can no longer do so, particularly since our Slack is one of the largest and more active ones on the platform. As such, they will be downgrading it to a standard free Slack while we decide on, and implement, other options.
--&gt;
&lt;p&gt;在过去十年中，Slack 一直通过免费定制企业账户支持我们的项目。
他们已告知我们无法继续提供这种支持，特别是因为我们的 Slack 是平台上最大和最活跃的社区之一。
因此，在我们决定并实施其他方案期间，他们将把我们的账户降级为标准免费版 Slack。&lt;/p&gt;
&lt;!--
On Friday, June 20, we will be subject to the [feature limitations of free Slack](https://slack.com/help/articles/27204752526611-Feature-limitations-on-the-free-version-of-Slack). The primary ones which will affect us will be only retaining 90 days of history, and having to disable several apps and workflows which we are currently using. The Slack Admin team will do their best to manage these limitations.
--&gt;
&lt;p&gt;在 6 月 20 日星期五，我们将受到&lt;a href=&#34;https://slack.com/help/articles/27204752526611-Feature-limitations-on-the-free-version-of-Slack&#34;&gt;免费版 Slack 的功能限制&lt;/a&gt;。
主要影响包括仅保留 90 天的历史记录，以及必须禁用我们当前使用的几个应用程序和工作流。
Slack 管理团队将尽最大努力管理这些限制。&lt;/p&gt;
&lt;!--
Responsible channel owners, members of private channels, and members of User Groups should [take some actions](https://github.com/kubernetes/community/blob/master/communication/slack-migration-faq.md#what-actions-do-channel-owners-and-user-group-members-need-to-take-soon) to prepare for the downgrade and preserve information as soon as possible.
--&gt;
&lt;p&gt;频道负责人、私有频道成员和用户组成员应尽快&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/communication/slack-migration-faq.md#what-actions-do-channel-owners-and-user-group-members-need-to-take-soon&#34;&gt;采取一些行动&lt;/a&gt;，
为降级做准备并保存相关信息。&lt;/p&gt;
&lt;!--
The CNCF Projects Staff have proposed that our community look at migrating to Discord. Because of existing issues where we have been pushing the limits of Slack, they have already explored what a Kubernetes Discord would look like. Discord would allow us to implement new tools and integrations which would help the community, such as GitHub group membership synchronization. The Steering Committee will discuss and decide on our future platform.
--&gt;
&lt;p&gt;CNCF 项目工作人员建议我们的社区考虑迁移到 Discord。
由于我们的使用一直在挑战 Slack 的极限并由此产生了一些问题，他们已经探索过 Kubernetes Discord 会是什么样子。
Discord 将允许我们实现新的工具和集成，以帮助社区，例如 GitHub 组成员身份同步。
指导委员会将讨论并决定我们的未来平台。&lt;/p&gt;
&lt;!--
Please see our [FAQ](https://github.com/kubernetes/community/blob/master/communication/slack-migration-faq.md), and check the [kubernetes-dev mailing list](https://groups.google.com/a/kubernetes.io/g/dev/) and the [#announcements channel](https://kubernetes.slack.com/archives/C9T0QMNG4) for further news. If you have specific feedback on our Slack status join the [discussion on GitHub](https://github.com/kubernetes/community/issues/8490).
--&gt;
&lt;p&gt;请查看我们的&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/communication/slack-migration-faq.md&#34;&gt;常见问题解答&lt;/a&gt;，
并关注 &lt;a href=&#34;https://groups.google.com/a/kubernetes.io/g/dev/&#34;&gt;kubernetes-dev 邮件列表&lt;/a&gt;和
&lt;a href=&#34;https://kubernetes.slack.com/archives/C9T0QMNG4&#34;&gt;#announcements 频道&lt;/a&gt;以获取更多新闻。
如果你对我们的 Slack 状态有具体反馈，请加入
&lt;a href=&#34;https://github.com/kubernetes/community/issues/8490&#34;&gt;GitHub 上的讨论&lt;/a&gt;。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>通过自定义聚合增强 Kubernetes Event 管理</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/10/enhancing-kubernetes-event-management-custom-aggregation/</link>
      <pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/10/enhancing-kubernetes-event-management-custom-aggregation/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Enhancing Kubernetes Event Management with Custom Aggregation&#34;
date: 2025-06-10
draft: false
slug: enhancing-kubernetes-event-management-custom-aggregation
Author: &gt;
  [Rez Moss](https://github.com/rezmoss)
--&gt;
&lt;!--
Kubernetes [Events](/docs/reference/kubernetes-api/cluster-resources/event-v1/) provide crucial insights into cluster operations, but as clusters grow, managing and analyzing these events becomes increasingly challenging. This blog post explores how to build custom event aggregation systems that help engineering teams better understand cluster behavior and troubleshoot issues more effectively.
--&gt;
&lt;p&gt;Kubernetes &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/kubernetes-api/cluster-resources/event-v1/&#34;&gt;Event&lt;/a&gt;
提供了集群操作的关键洞察信息，但随着集群的增长，管理和分析这些 Event 变得越来越具有挑战性。
这篇博客文章探讨了如何构建自定义 Event 聚合系统，以帮助工程团队更好地理解集群行为并更有效地解决问题。&lt;/p&gt;
&lt;!--
## The challenge with Kubernetes events

In a Kubernetes cluster, events are generated for various operations - from pod scheduling and container starts to volume mounts and network configurations. While these events are invaluable for debugging and monitoring, several challenges emerge in production environments:
--&gt;
&lt;h2 id=&#34;kubernetes-event-的挑战&#34;&gt;Kubernetes Event 的挑战&lt;/h2&gt;
&lt;p&gt;在 Kubernetes 集群中，从 Pod 调度、容器启动到卷挂载和网络配置，
各种操作都会生成 Event。虽然这些 Event 对于调试和监控非常有价值，
但在生产环境中出现了几个挑战：&lt;/p&gt;
&lt;!--
1. **Volume**: Large clusters can generate thousands of events per minute
2. **Retention**: Default event retention is limited to one hour
3. **Correlation**: Related events from different components are not automatically linked
4. **Classification**: Events lack standardized severity or category classifications
5. **Aggregation**: Similar events are not automatically grouped
--&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;数量&lt;/strong&gt;：大型集群每分钟可能生成数千个 Event&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;保留&lt;/strong&gt;：默认 Event 保留时间限制为一小时&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;关联&lt;/strong&gt;：不同组件的相关 Event 不会自动链接&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;分类&lt;/strong&gt;：Event 缺乏标准化的严重性或类别分类&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;聚合&lt;/strong&gt;：相似的 Event 不会自动分组&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
To learn more about Events in Kubernetes, read the [Event](/docs/reference/kubernetes-api/cluster-resources/event-v1/) API reference.
--&gt;
&lt;p&gt;要了解更多关于 Kubernetes Event 的信息，请阅读
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/kubernetes-api/cluster-resources/event-v1/&#34;&gt;Event&lt;/a&gt;
API 参考。&lt;/p&gt;
&lt;!--
## Real-World value

Consider a production environment with tens of microservices where the users report intermittent transaction failures:

**Traditional event aggregation process:** Engineers are wasting hours sifting through thousands of standalone events spread across namespaces. By the time they look into it, the older events have long since been purged, and correlating pod restarts to node-level issues is practically impossible.
--&gt;
&lt;h2 id=&#34;现实世界的价值&#34;&gt;现实世界的价值&lt;/h2&gt;
&lt;p&gt;考虑这样一个场景：在一个拥有数十个微服务的生产环境中，用户报告间歇性的事务失败：&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;传统的 Event 聚合过程：&lt;/strong&gt; 工程师浪费数小时筛选分散在各个命名空间中的成千上万的独立 Event。
等到他们查看时，较旧的 Event 早已被清除，将 Pod 重启与节点级别问题关联实际上是不可能的。&lt;/p&gt;
&lt;!--
**With custom event aggregation:** The system groups events across resources, instantly surfacing correlation patterns such as volume mount timeouts before pod restarts. History indicates it occurred during past record traffic spikes, highlighting a storage scalability issue in minutes rather than hours.

The benefit of this approach is that organizations that implement it commonly cut down their troubleshooting time significantly along with increasing the reliability of systems by detecting patterns early.
--&gt;
&lt;p&gt;&lt;strong&gt;使用自定义的 Event 聚合之后：&lt;/strong&gt; 系统跨资源对 Event 进行分组，
即时呈现诸如 Pod 重启之前出现卷挂载超时之类的关联模式。
历史数据表明这类问题曾在以往创纪录的流量高峰期间出现过，
从而在几分钟（而非几小时）内定位出存储扩缩问题。&lt;/p&gt;
&lt;p&gt;这种方法的好处是，实施它的组织通常可以显著减少故障排除时间，
并通过早期检测模式来提高系统的可靠性。&lt;/p&gt;
&lt;!--
## Building an Event aggregation system

This post explores how to build a custom event aggregation system that addresses these challenges, aligned to Kubernetes best practices. I&#39;ve picked the Go programming language for my example.
--&gt;
&lt;h2 id=&#34;构建-event-聚合系统&#34;&gt;构建 Event 聚合系统&lt;/h2&gt;
&lt;p&gt;本文探讨了如何构建一个解决这些问题的自定义 Event 聚合系统，
该系统符合 Kubernetes 最佳实践。我选择了 Go 编程语言作为示例。&lt;/p&gt;
&lt;!--
### Architecture overview

This event aggregation system consists of three main components:

1. **Event Watcher**: Monitors the Kubernetes API for new events
2. **Event Processor**: Processes, categorizes, and correlates events
3. **Storage Backend**: Stores processed events for longer retention

Here&#39;s a sketch for how to implement the event watcher:
--&gt;
&lt;h3 id=&#34;架构概述&#34;&gt;架构概述&lt;/h3&gt;
&lt;p&gt;这个 Event 聚合系统由三个主要组件组成：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Event 监视器&lt;/strong&gt;：监控 Kubernetes API 的新 Event&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Event 处理器&lt;/strong&gt;：处理、分类和关联 Event&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;存储后端&lt;/strong&gt;：存储处理过的 Event 以实现更长的保留期&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;以下是实现 Event 监视器的示例代码：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;package&lt;/span&gt; main
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;import&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;context&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    metav1 &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;k8s.io/apimachinery/pkg/apis/meta/v1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;k8s.io/client-go/kubernetes&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;k8s.io/client-go/rest&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    eventsv1 &lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;k8s.io/api/events/v1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; EventWatcher &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    clientset &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;kubernetes.Clientset
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;NewEventWatcher&lt;/span&gt;(config &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;rest.Config) (&lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;EventWatcher, &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;error&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    clientset, err &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; kubernetes.&lt;span style=&#34;color:#00a000&#34;&gt;NewForConfig&lt;/span&gt;(config)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; err &lt;span style=&#34;color:#666&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt;, err
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;&amp;amp;&lt;/span&gt;EventWatcher{clientset: clientset}, &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; (w &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;EventWatcher) &lt;span style=&#34;color:#00a000&#34;&gt;Watch&lt;/span&gt;(ctx context.Context) (&lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;chan&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event, &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;error&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    events &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#a2f&#34;&gt;make&lt;/span&gt;(&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;chan&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    watcher, err &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; w.clientset.&lt;span style=&#34;color:#00a000&#34;&gt;EventsV1&lt;/span&gt;().&lt;span style=&#34;color:#00a000&#34;&gt;Events&lt;/span&gt;(&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;).&lt;span style=&#34;color:#00a000&#34;&gt;Watch&lt;/span&gt;(ctx, metav1.ListOptions{})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; err &lt;span style=&#34;color:#666&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt;, err
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;go&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt;() {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;defer&lt;/span&gt; &lt;span style=&#34;color:#a2f&#34;&gt;close&lt;/span&gt;(events)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;for&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;select&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;case&lt;/span&gt; event &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt;watcher.&lt;span style=&#34;color:#00a000&#34;&gt;ResultChan&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; e, ok &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; event.Object.(&lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event); ok {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    events &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt; e
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;case&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;&amp;lt;-&lt;/span&gt;ctx.&lt;span style=&#34;color:#00a000&#34;&gt;Done&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                watcher.&lt;span style=&#34;color:#00a000&#34;&gt;Stop&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; events, &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;nil&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
### Event processing and classification

The event processor enriches events with additional context and classification:
--&gt;
&lt;h3 id=&#34;event-处理和分类&#34;&gt;Event 处理和分类&lt;/h3&gt;
&lt;p&gt;Event 处理器为 Event 添加额外的上下文和分类：&lt;/p&gt;
&lt;!--
```go
type EventProcessor struct {
    categoryRules []CategoryRule
    correlationRules []CorrelationRule
}

type ProcessedEvent struct {
    Event     *eventsv1.Event
    Category  string
    Severity  string
    CorrelationID string
    Metadata  map[string]string
}

func (p *EventProcessor) Process(event *eventsv1.Event) *ProcessedEvent {
    processed := &amp;ProcessedEvent{
        Event:    event,
        Metadata: make(map[string]string),
    }
    
    // Apply classification rules
    processed.Category = p.classifyEvent(event)
    processed.Severity = p.determineSeverity(event)
    
    // Generate correlation ID for related events
    processed.CorrelationID = p.correlateEvent(event)
    
    // Add useful metadata
    processed.Metadata = p.extractMetadata(event)
    
    return processed
}
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; EventProcessor &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    categoryRules []CategoryRule
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    correlationRules []CorrelationRule
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; ProcessedEvent &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Event     &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Category  &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Severity  &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    CorrelationID &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Metadata  &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;map&lt;/span&gt;[&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;]&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; (p &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;EventProcessor) &lt;span style=&#34;color:#00a000&#34;&gt;Process&lt;/span&gt;(event &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event) &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;ProcessedEvent {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    processed &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#666&#34;&gt;&amp;amp;&lt;/span&gt;ProcessedEvent{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        Event:    event,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        Metadata: &lt;span style=&#34;color:#a2f&#34;&gt;make&lt;/span&gt;(&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;map&lt;/span&gt;[&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;]&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 应用分类规则
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    processed.Category = p.&lt;span style=&#34;color:#00a000&#34;&gt;classifyEvent&lt;/span&gt;(event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    processed.Severity = p.&lt;span style=&#34;color:#00a000&#34;&gt;determineSeverity&lt;/span&gt;(event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 为相关 Event 生成关联 ID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    processed.CorrelationID = p.&lt;span style=&#34;color:#00a000&#34;&gt;correlateEvent&lt;/span&gt;(event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 添加有用的元数据
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    processed.Metadata = p.&lt;span style=&#34;color:#00a000&#34;&gt;extractMetadata&lt;/span&gt;(event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; processed
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
### Implementing Event correlation

One of the key features you could implement is a way of correlating related Events.
Here&#39;s an example correlation strategy:
--&gt;
&lt;h3 id=&#34;实现-event-关联&#34;&gt;实现 Event 关联&lt;/h3&gt;
&lt;p&gt;你可以实现的一个关键特性是对相关 Event 进行关联。下面是一个示例关联策略：&lt;/p&gt;
&lt;!--
```go
func (p *EventProcessor) correlateEvent(event *eventsv1.Event) string {
    // Correlation strategies:
    // 1. Time-based: Events within a time window
    // 2. Resource-based: Events affecting the same resource
    // 3. Causation-based: Events with cause-effect relationships

    correlationKey := generateCorrelationKey(event)
    return correlationKey
}

func generateCorrelationKey(event *eventsv1.Event) string {
    // Example: Combine namespace, resource type, and name
    return fmt.Sprintf(&#34;%s/%s/%s&#34;,
        event.Regarding.Namespace,
        event.Regarding.Kind,
        event.Regarding.Name,
    )
}
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; (p &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;EventProcessor) &lt;span style=&#34;color:#00a000&#34;&gt;correlateEvent&lt;/span&gt;(event &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event) &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 关联策略：
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 1. 基于时间的：时间窗口内的事件
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 2. 基于资源的：影响同一资源的事件
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 3. 基于因果关系的：具有因果关系的事件
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    correlationKey &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;generateCorrelationKey&lt;/span&gt;(event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; correlationKey
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;generateCorrelationKey&lt;/span&gt;(event &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;eventsv1.Event) &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 示例：结合命名空间、资源类型和名称
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; fmt.&lt;span style=&#34;color:#00a000&#34;&gt;Sprintf&lt;/span&gt;(&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;%s/%s/%s&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        event.Regarding.Namespace,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        event.Regarding.Kind,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        event.Regarding.Name,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
## Event storage and retention

For long-term storage and analysis, you&#39;ll probably want a backend that supports:
- Efficient querying of large event volumes
- Flexible retention policies
- Support for aggregation queries

Here&#39;s a sample storage interface:
--&gt;
&lt;h2 id=&#34;event-存储和保留&#34;&gt;Event 存储和保留&lt;/h2&gt;
&lt;p&gt;对于长期存储和分析，你可能需要一个支持以下功能的后端：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;大量 Event 的高效查询&lt;/li&gt;
&lt;li&gt;灵活的保留策略&lt;/li&gt;
&lt;li&gt;支持聚合查询&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;这里是一个示例存储接口：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; EventStorage &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;interface&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#00a000&#34;&gt;Store&lt;/span&gt;(context.Context, &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;ProcessedEvent) &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;error&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#00a000&#34;&gt;Query&lt;/span&gt;(context.Context, EventQuery) ([]ProcessedEvent, &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;error&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#00a000&#34;&gt;Aggregate&lt;/span&gt;(context.Context, AggregationParams) ([]EventAggregate, &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;error&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; EventQuery &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    TimeRange     TimeRange
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Categories    []&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Severity      []&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    CorrelationID &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Limit         &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;int&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; AggregationParams &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    GroupBy    []&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    TimeWindow &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Metrics    []&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
## Good practices for Event management

1. **Resource Efficiency**
   - Implement rate limiting for event processing
   - Use efficient filtering at the API server level
   - Batch events for storage operations
--&gt;
&lt;h2 id=&#34;event-管理的良好实践&#34;&gt;Event 管理的良好实践&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;资源效率&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;为 Event 处理实现速率限制&lt;/li&gt;
&lt;li&gt;在 API 服务器级别使用高效的过滤&lt;/li&gt;
&lt;li&gt;对存储操作批量处理 Event&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
2. **Scalability**
   - Distribute event processing across multiple workers
   - Use leader election for coordination
   - Implement backoff strategies for API rate limits

3. **Reliability**
   - Handle API server disconnections gracefully
   - Buffer events during storage backend unavailability
   - Implement retry mechanisms with exponential backoff
--&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;可扩缩性&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;将 Event 处理分派给多个工作线程&lt;/li&gt;
&lt;li&gt;使用领导者选举进行协调&lt;/li&gt;
&lt;li&gt;针对 API 速率限制实施退避策略&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;可靠性&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;优雅地处理 API 服务器断开连接&lt;/li&gt;
&lt;li&gt;在存储后端不可用期间缓冲 Event&lt;/li&gt;
&lt;li&gt;实施带有指数退避的重试机制&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
## Advanced features

### Pattern detection

Implement pattern detection to identify recurring issues:
--&gt;
&lt;h2 id=&#34;高级特性&#34;&gt;高级特性&lt;/h2&gt;
&lt;h3 id=&#34;模式检测&#34;&gt;模式检测&lt;/h3&gt;
&lt;p&gt;实现模式检测以识别重复出现的问题：&lt;/p&gt;
&lt;!--
```go
type PatternDetector struct {
    patterns map[string]*Pattern
    threshold int
}

func (d *PatternDetector) Detect(events []ProcessedEvent) []Pattern {
    // Group similar events
    groups := groupSimilarEvents(events)
    
    // Analyze frequency and timing
    patterns := identifyPatterns(groups)
    
    return patterns
}

func groupSimilarEvents(events []ProcessedEvent) map[string][]ProcessedEvent {
    groups := make(map[string][]ProcessedEvent)
    
    for _, event := range events {
        // Create similarity key based on event characteristics
        similarityKey := fmt.Sprintf(&#34;%s:%s:%s&#34;,
            event.Event.Reason,
            event.Event.InvolvedObject.Kind,
            event.Event.InvolvedObject.Namespace,
        )
        
        // Group events with the same key
        groups[similarityKey] = append(groups[similarityKey], event)
    }
    
    return groups
}


func identifyPatterns(groups map[string][]ProcessedEvent) []Pattern {
    var patterns []Pattern
    
    for key, events := range groups {
        // Only consider groups with enough events to form a pattern
        if len(events) &lt; 3 {
            continue
        }
        
        // Sort events by time
        sort.Slice(events, func(i, j int) bool {
            return events[i].Event.LastTimestamp.Time.Before(events[j].Event.LastTimestamp.Time)
        })
        
        // Calculate time range and frequency
        firstSeen := events[0].Event.FirstTimestamp.Time
        lastSeen := events[len(events)-1].Event.LastTimestamp.Time
        duration := lastSeen.Sub(firstSeen).Minutes()
        
        var frequency float64
        if duration &gt; 0 {
            frequency = float64(len(events)) / duration
        }
        
        // Create a pattern if it meets threshold criteria
        if frequency &gt; 0.5 { // More than 1 event per 2 minutes
            pattern := Pattern{
                Type:         key,
                Count:        len(events),
                FirstSeen:    firstSeen,
                LastSeen:     lastSeen,
                Frequency:    frequency,
                EventSamples: events[:min(3, len(events))], // Keep up to 3 samples
            }
            patterns = append(patterns, pattern)
        }
    }
    
    return patterns
}
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; PatternDetector &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    patterns &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;map&lt;/span&gt;[&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;]&lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;Pattern
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    threshold &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;int&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; (d &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;PatternDetector) &lt;span style=&#34;color:#00a000&#34;&gt;Detect&lt;/span&gt;(events []ProcessedEvent) []Pattern {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 将类似 Event 分组
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    groups &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;groupSimilarEvents&lt;/span&gt;(events)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 分析频率和时间规律
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;    patterns &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;identifyPatterns&lt;/span&gt;(groups)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; patterns
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;groupSimilarEvents&lt;/span&gt;(events []ProcessedEvent) &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;map&lt;/span&gt;[&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;][]ProcessedEvent {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    groups &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#a2f&#34;&gt;make&lt;/span&gt;(&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;map&lt;/span&gt;[&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;][]ProcessedEvent)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;for&lt;/span&gt; _, event &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;range&lt;/span&gt; events {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 根据 Event 特征创建相似性键
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;        similarityKey &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; fmt.&lt;span style=&#34;color:#00a000&#34;&gt;Sprintf&lt;/span&gt;(&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;%s:%s:%s&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            event.Event.Reason,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            event.Event.InvolvedObject.Kind,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            event.Event.InvolvedObject.Namespace,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 用相同的键对 Event 进行分组
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;        groups[similarityKey] = &lt;span style=&#34;color:#a2f&#34;&gt;append&lt;/span&gt;(groups[similarityKey], event)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; groups
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; &lt;span style=&#34;color:#00a000&#34;&gt;identifyPatterns&lt;/span&gt;(groups &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;map&lt;/span&gt;[&lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;string&lt;/span&gt;][]ProcessedEvent) []Pattern {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;var&lt;/span&gt; patterns []Pattern
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;for&lt;/span&gt; key, events &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;range&lt;/span&gt; groups {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 只考虑具有足够 Event 以形成模式的组
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#a2f&#34;&gt;len&lt;/span&gt;(events) &amp;lt; &lt;span style=&#34;color:#666&#34;&gt;3&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;continue&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 按时间对 Event 进行排序
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;        sort.&lt;span style=&#34;color:#00a000&#34;&gt;Slice&lt;/span&gt;(events, &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt;(i, j &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;int&lt;/span&gt;) &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;bool&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; events[i].Event.LastTimestamp.Time.&lt;span style=&#34;color:#00a000&#34;&gt;Before&lt;/span&gt;(events[j].Event.LastTimestamp.Time)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        })
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 计算时间范围和频率
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;        firstSeen &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; events[&lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt;].Event.FirstTimestamp.Time
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        lastSeen &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; events[&lt;span style=&#34;color:#a2f&#34;&gt;len&lt;/span&gt;(events)&lt;span style=&#34;color:#666&#34;&gt;-&lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;].Event.LastTimestamp.Time
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        duration &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; lastSeen.&lt;span style=&#34;color:#00a000&#34;&gt;Sub&lt;/span&gt;(firstSeen).&lt;span style=&#34;color:#00a000&#34;&gt;Minutes&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;var&lt;/span&gt; frequency &lt;span style=&#34;color:#0b0;font-weight:bold&#34;&gt;float64&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; duration &amp;gt; &lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            frequency = &lt;span style=&#34;color:#a2f&#34;&gt;float64&lt;/span&gt;(&lt;span style=&#34;color:#a2f&#34;&gt;len&lt;/span&gt;(events)) &lt;span style=&#34;color:#666&#34;&gt;/&lt;/span&gt; duration
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 如果满足阈值标准，则创建模式
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; frequency &amp;gt; &lt;span style=&#34;color:#666&#34;&gt;0.5&lt;/span&gt; { &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 每 2 分钟发生超过 1 个事件
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;            pattern &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; Pattern{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                Type:         key,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                Count:        &lt;span style=&#34;color:#a2f&#34;&gt;len&lt;/span&gt;(events),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                FirstSeen:    firstSeen,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                LastSeen:     lastSeen,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                Frequency:    frequency,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                EventSamples: events[:&lt;span style=&#34;color:#a2f&#34;&gt;min&lt;/span&gt;(&lt;span style=&#34;color:#666&#34;&gt;3&lt;/span&gt;, &lt;span style=&#34;color:#a2f&#34;&gt;len&lt;/span&gt;(events))], &lt;span style=&#34;color:#080;font-style:italic&#34;&gt;// 最多保留 3 个样本
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;&lt;/span&gt;            }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            patterns = &lt;span style=&#34;color:#a2f&#34;&gt;append&lt;/span&gt;(patterns, pattern)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;return&lt;/span&gt; patterns
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
With this implementation, the system can identify recurring patterns such as node pressure events, pod scheduling failures, or networking issues that occur with a specific frequency.
--&gt;
&lt;p&gt;通过此实现，系统可以识别以特定频率重复出现的模式，
例如节点压力 Event、Pod 调度失败或网络问题。&lt;/p&gt;
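&lt;p&gt;为了直观说明上面相似性键的分组思路，下面是一个可独立运行的简化草图
（miniEvent 是本示例为演示而假设的精简结构，并非正文中 ProcessedEvent 的实际定义）：&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// miniEvent 是为演示分组逻辑而假设的简化事件结构。
type miniEvent struct {
	Reason    string
	Kind      string
	Namespace string
}

// groupBySimilarity 按 Reason:Kind:Namespace 组合键统计每组事件数，
// 与正文中 groupSimilarEvents 使用的相似性键思路一致。
func groupBySimilarity(events []miniEvent) map[string]int {
	counts := make(map[string]int)
	for _, e := range events {
		key := fmt.Sprintf("%s:%s:%s", e.Reason, e.Kind, e.Namespace)
		counts[key]++
	}
	return counts
}

func main() {
	events := []miniEvent{
		{"FailedScheduling", "Pod", "default"},
		{"FailedScheduling", "Pod", "default"},
		{"BackOff", "Pod", "prod"},
	}
	counts := groupBySimilarity(events)
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // 排序以获得稳定输出
	for _, k := range keys {
		fmt.Println(k, counts[k])
	}
}
```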
&lt;!--
### Real-time alerts

The following example provides a starting point for building an alerting system based on event patterns. It is not a complete solution but a conceptual sketch to illustrate the approach.
--&gt;
&lt;h3 id=&#34;实时警报&#34;&gt;实时警报&lt;/h3&gt;
&lt;p&gt;以下示例提供了一个基于 Event 模式构建警报系统的起点。
它不是一个完整的解决方案，而是一个用于说明方法的概念性草图。&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-go&#34; data-lang=&#34;go&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;type&lt;/span&gt; AlertManager &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;struct&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    rules []AlertRule
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    notifiers []Notifier
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;func&lt;/span&gt; (a &lt;span style=&#34;color:#666&#34;&gt;*&lt;/span&gt;AlertManager) &lt;span style=&#34;color:#00a000&#34;&gt;EvaluateEvents&lt;/span&gt;(events []ProcessedEvent) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;for&lt;/span&gt; _, rule &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;range&lt;/span&gt; a.rules {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;if&lt;/span&gt; rule.&lt;span style=&#34;color:#00a000&#34;&gt;Matches&lt;/span&gt;(events) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            alert &lt;span style=&#34;color:#666&#34;&gt;:=&lt;/span&gt; rule.&lt;span style=&#34;color:#00a000&#34;&gt;GenerateAlert&lt;/span&gt;(events)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            a.&lt;span style=&#34;color:#00a000&#34;&gt;notify&lt;/span&gt;(alert)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
## Conclusion

A well-designed event aggregation system can significantly improve cluster observability and troubleshooting capabilities. By implementing custom event processing, correlation, and storage, operators can better understand cluster behavior and respond to issues more effectively.

The solutions presented here can be extended and customized based on specific requirements while maintaining compatibility with the Kubernetes API and following best practices for scalability and reliability.
--&gt;
&lt;h2 id=&#34;结论&#34;&gt;结论&lt;/h2&gt;
&lt;p&gt;一个设计良好的 Event 聚合系统可以显著提高集群的可观测性和故障排查能力。
通过实现自定义的 Event 处理、关联和存储，操作员可以更好地理解集群行为并更有效地响应问题。&lt;/p&gt;
&lt;p&gt;这里介绍的解决方案可以根据具体需求进行扩展和定制，同时保持与
Kubernetes API 的兼容性，并遵循可扩展性和可靠性方面的最佳实践。&lt;/p&gt;
&lt;!--
## Next steps

Future enhancements could include:
- Machine learning for anomaly detection
- Integration with popular observability platforms
- Custom event APIs for application-specific events
- Enhanced visualization and reporting capabilities
--&gt;
&lt;h2 id=&#34;下一步&#34;&gt;下一步&lt;/h2&gt;
&lt;p&gt;未来的增强功能可能包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;用于异常检测的机器学习&lt;/li&gt;
&lt;li&gt;与流行的可观测性平台集成&lt;/li&gt;
&lt;li&gt;面向应用 Event 的自定义 Event API&lt;/li&gt;
&lt;li&gt;增强的可视化和报告能力&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
For more information on Kubernetes events and custom [controllers](/docs/concepts/architecture/controller/),
refer to the official Kubernetes [documentation](/docs/).
--&gt;
&lt;p&gt;有关 Kubernetes Event 和自定义&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/architecture/controller/&#34;&gt;控制器&lt;/a&gt;的更多信息，
请参阅官方 Kubernetes &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/&#34;&gt;文档&lt;/a&gt;。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>介绍 Gateway API 推理扩展</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/05/introducing-gateway-api-inference-extension/</link>
      <pubDate>Thu, 05 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/05/introducing-gateway-api-inference-extension/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Introducing Gateway API Inference Extension&#34;
date: 2025-06-05
slug: introducing-gateway-api-inference-extension
draft: false
author: &gt;
  Daneyon Hansen (Solo.io),
  Kaushik Mitra (Google),
  Jiaxin Shan (Bytedance),
  Kellen Swain (Google)
--&gt;
&lt;!--
Modern generative AI and large language model (LLM) services create unique traffic-routing challenges
on Kubernetes. Unlike typical short-lived, stateless web requests, LLM inference sessions are often
long-running, resource-intensive, and partially stateful. For example, a single GPU-backed model server
may keep multiple inference sessions active and maintain in-memory token caches.

Traditional load balancers focused on HTTP path or round-robin lack the specialized capabilities needed
for these workloads. They also don’t account for model identity or request criticality (e.g., interactive
chat vs. batch jobs). Organizations often patch together ad-hoc solutions, but a standardized approach
is missing.
--&gt;
&lt;p&gt;现代生成式 AI 和大语言模型（LLM）服务在 Kubernetes 上带来独特的流量路由挑战。
与典型的短生命期的无状态 Web 请求不同，LLM 推理会话通常是长时间运行的、资源密集型的，并且具有一定的状态性。
例如，单个由 GPU 支撑的模型服务器可能会保持多个推理会话处于活跃状态，并保留内存中的令牌缓存。&lt;/p&gt;
&lt;p&gt;传统的负载均衡器注重 HTTP 路径或轮询，缺乏处理这类工作负载所需的专业能力。
它们通常也无法识别模型身份或请求重要性（例如交互式聊天与批处理任务的区别）。
各个组织往往拼凑出临时解决方案，但一直缺乏标准化的做法。&lt;/p&gt;
&lt;!--
## Gateway API Inference Extension

[Gateway API Inference Extension](https://gateway-api-inference-extension.sigs.k8s.io/) was created to address
this gap by building on the existing [Gateway API](https://gateway-api.sigs.k8s.io/), adding inference-specific
routing capabilities while retaining the familiar model of Gateways and HTTPRoutes. By adding an inference
extension to your existing gateway, you effectively transform it into an **Inference Gateway**, enabling you to
self-host GenAI/LLMs with a “model-as-a-service” mindset.
--&gt;
&lt;h2 id=&#34;gateway-api-inference-extension&#34;&gt;Gateway API 推理扩展&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/&#34;&gt;Gateway API 推理扩展&lt;/a&gt;正是为了填补这一空白而创建的，
它基于已有的 &lt;a href=&#34;https://gateway-api.sigs.k8s.io/&#34;&gt;Gateway API&lt;/a&gt; 进行构建，
添加了特定于推理的路由能力，同时保留了 Gateway 与 HTTPRoute 的熟悉模型。
通过为现有 Gateway 添加推理扩展，你就能将其转变为一个&lt;strong&gt;推理网关（Inference Gateway）&lt;/strong&gt;，
从而以“模型即服务”的理念自托管 GenAI/LLM 应用。&lt;/p&gt;
&lt;!--
The project’s goal is to improve and standardize routing to inference workloads across the ecosystem. Key
objectives include enabling model-aware routing, supporting per-request criticalities, facilitating safe model
roll-outs, and optimizing load balancing based on real-time model metrics. By achieving these, the project aims
to reduce latency and improve accelerator (GPU) utilization for AI workloads.

## How it works

The design introduces two new Custom Resources (CRDs) with distinct responsibilities, each aligning with a
specific user persona in the AI/ML serving workflow​:
--&gt;
&lt;p&gt;此项目的目标是在整个生态系统中改进并标准化对推理工作负载的路由。
关键目标包括实现模型感知路由、支持逐个请求的重要性区分、促进安全的模型发布，
以及基于实时模型指标来优化负载均衡。通过实现这些目标，此项目旨在降低延迟并提高 AI 工作负载中的加速器（如 GPU）利用率。&lt;/p&gt;
&lt;h2 id=&#34;how-it-works&#34;&gt;工作原理&lt;/h2&gt;
&lt;p&gt;该设计引入了两个职责不同的新定制资源（CRD），每个 CRD 对应 AI/ML 服务流程中的一个特定用户角色：&lt;/p&gt;
&lt;!--


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/05/introducing-gateway-api-inference-extension/inference-extension-resource-model.png&#34;
         alt=&#34;Resource Model&#34;/&gt; 
&lt;/figure&gt;
--&gt;


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/05/introducing-gateway-api-inference-extension/inference-extension-resource-model.png&#34;
         alt=&#34;资源模型&#34;/&gt; 
&lt;/figure&gt;
&lt;!--
1. [InferencePool](https://gateway-api-inference-extension.sigs.k8s.io/api-types/inferencepool/)
   Defines a pool of pods (model servers) running on shared compute (e.g., GPU nodes). The platform admin can
   configure how these pods are deployed, scaled, and balanced. An InferencePool ensures consistent resource
   usage and enforces platform-wide policies. An InferencePool is similar to a Service but specialized for AI/ML
   serving needs and aware of the model-serving protocol.

2. [InferenceModel](https://gateway-api-inference-extension.sigs.k8s.io/api-types/inferencemodel/)
   A user-facing model endpoint managed by AI/ML owners. It maps a public name (e.g., &#34;gpt-4-chat&#34;) to the actual
   model within an InferencePool. This lets workload owners specify which models (and optional fine-tuning) they
   want served, plus a traffic-splitting or prioritization policy.
--&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/api-types/inferencepool/&#34;&gt;InferencePool&lt;/a&gt;
定义了一组在共享计算资源（如 GPU 节点）上运行的 Pod（模型服务器）。
平台管理员可以配置这些 Pod 的部署、扩缩容和负载均衡策略。
InferencePool 确保资源使用情况的一致性，并执行平台级的策略。
InferencePool 类似于 Service，但专为 AI/ML 推理服务定制，能够感知模型服务协议。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/api-types/inferencemodel/&#34;&gt;InferenceModel&lt;/a&gt;
是面向用户的模型端点，由 AI/ML 拥有者管理。
它将一个公共名称（如 &amp;quot;gpt-4-chat&amp;quot;）映射到 InferencePool 内的实际模型。
这使得负载拥有者可以指定要服务的模型（及可选的微调版本），并配置流量拆分或优先级策略。&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
In summary, the InferenceModel API lets AI/ML owners manage what is served, while the InferencePool lets platform
operators manage where and how it’s served.
--&gt;
&lt;p&gt;简而言之，InferenceModel API 让 AI/ML 拥有者管理“提供什么服务”，而
InferencePool 则让平台运维人员管理“在哪儿以及如何提供服务”。&lt;/p&gt;
&lt;!--
## Request flow

The flow of a request builds on the Gateway API model (Gateways and HTTPRoutes) with one or more extra inference-aware
steps (extensions) in the middle. Here’s a high-level example of the request flow with the
[Endpoint Selection Extension (ESE)](https://gateway-api-inference-extension.sigs.k8s.io/#endpoint-selection-extension):
--&gt;
&lt;h2 id=&#34;request-flow&#34;&gt;请求流程&lt;/h2&gt;
&lt;p&gt;请求的处理流程基于 Gateway API 模型（Gateway 和 HTTPRoute），在其中插入一个或多个对推理有感知的步骤（扩展）。
以下是一个使用&lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/#endpoint-selection-extension&#34;&gt;端点选择扩展（Endpoint Selection Extension, ESE）&lt;/a&gt;
的请求流程概要示例：&lt;/p&gt;
&lt;!--


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/05/introducing-gateway-api-inference-extension/inference-extension-request-flow.png&#34;
         alt=&#34;Request Flow&#34;/&gt; 
&lt;/figure&gt;
--&gt;


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/05/introducing-gateway-api-inference-extension/inference-extension-request-flow.png&#34;
         alt=&#34;请求流程&#34;/&gt; 
&lt;/figure&gt;
&lt;!--
1. **Gateway Routing**  
   A client sends a request (e.g., an HTTP POST to /completions). The Gateway (like Envoy) examines the HTTPRoute
   and identifies the matching InferencePool backend.

2. **Endpoint Selection**  
   Instead of simply forwarding to any available pod, the Gateway consults an inference-specific routing extension—
   the Endpoint Selection Extension—to pick the best of the available pods. This extension examines live pod metrics
   (queue lengths, memory usage, loaded adapters) to choose the ideal pod for the request.
--&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Gateway 路由&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;客户端发送请求（例如向 &lt;code&gt;/completions&lt;/code&gt; 发起 HTTP POST）。
Gateway（如 Envoy）会检查 HTTPRoute，并识别出匹配的 InferencePool 后端。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;端点选择&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Gateway 不会简单地将请求转发到任一可用的 Pod，
而是调用一个特定于推理的路由扩展（端点选择扩展）从多个可用 Pod 中选出最优者。
此扩展根据实时 Pod 指标（如队列长度、内存使用量、加载的适配器等）来选择最适合请求的 Pod。&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
3. **Inference-Aware Scheduling**  
   The chosen pod is the one that can handle the request with the lowest latency or highest efficiency, given the
   user’s criticality or resource needs. The Gateway then forwards traffic to that specific pod.
--&gt;
&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;推理感知调度&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;在给定用户的重要性或资源需求的前提下，所选 Pod 是能以最低延迟或最高效率处理该请求的 Pod。
随后 Gateway 将流量转发到这个特定的 Pod。&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
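&lt;p&gt;作为参考，一个 InferencePool 资源大致形如下面的草图（字段名基于该项目当时的 alpha 阶段 API，
llama2-pool、epp-service 等名称均为假设值）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llama2-pool            # 假设的名称
spec:
  targetPortNumber: 8000       # 模型服务器监听的端口
  selector:
    app: vllm-llama2           # 选择运行模型服务器的 Pod
  extensionRef:
    name: epp-service          # 指向端点选择扩展（EPP）的 Service
&lt;/code&gt;&lt;/pre&gt;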
&lt;!--


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/05/introducing-gateway-api-inference-extension/inference-extension-epp-scheduling.png&#34;
         alt=&#34;Endpoint Extension Scheduling&#34;/&gt; 
&lt;/figure&gt;
--&gt;


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/05/introducing-gateway-api-inference-extension/inference-extension-epp-scheduling.png&#34;
         alt=&#34;端点扩展调度&#34;/&gt; 
&lt;/figure&gt;
&lt;!--
This extra step provides a smarter, model-aware routing mechanism that still feels like a normal single request to
the client. Additionally, the design is extensible—any Inference Gateway can be enhanced with additional inference-specific
extensions to handle new routing strategies, advanced scheduling logic, or specialized hardware needs. As the project
continues to grow, contributors are encouraged to develop new extensions that are fully compatible with the same underlying
Gateway API model, further expanding the possibilities for efficient and intelligent GenAI/LLM routing.
--&gt;
&lt;p&gt;这一额外步骤提供了一种更为智能的模型感知路由机制，但对客户端来说仍像一次普通的单个请求。
此外，这种设计具有良好的可扩展性，任何推理网关都可以通过添加新的特定于推理的扩展来处理新的路由策略、高级调度逻辑或特定硬件需求。
随着此项目的持续发展，欢迎社区贡献者开发与底层 Gateway API 模型完全兼容的新扩展，进一步拓展高效、智能的 GenAI/LLM 路由能力。&lt;/p&gt;
&lt;!--
## Benchmarks

We evaluated ​this extension against a standard Kubernetes Service for a [vLLM](https://docs.vllm.ai/en/latest/)‐based model
serving deployment. The test environment consisted of multiple H100 (80 GB) GPU pods running vLLM ([version 1](https://blog.vllm.ai/2025/01/27/v1-alpha-release.html))
on a Kubernetes cluster, with 10 Llama2 model replicas. The [Latency Profile Generator (LPG)](https://github.com/AI-Hypercomputer/inference-benchmark)
tool was used to generate traffic and measure throughput, latency, and other metrics. The
[ShareGPT](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json)
dataset served as the workload, and traffic was ramped from 100 Queries per Second (QPS) up to 1000 QPS.
--&gt;
&lt;h2 id=&#34;benchmarks&#34;&gt;基准测试&lt;/h2&gt;
&lt;p&gt;我们将此扩展与标准 Kubernetes Service 进行了对比测试，基于
&lt;a href=&#34;https://docs.vllm.ai/en/latest/&#34;&gt;vLLM&lt;/a&gt; 部署模型服务。
测试环境是在 Kubernetes 集群中运行 vLLM（&lt;a href=&#34;https://blog.vllm.ai/2025/01/27/v1-alpha-release.html&#34;&gt;v1&lt;/a&gt;）
的多个 H100（80 GB）GPU Pod，并部署了 10 个 Llama2 模型副本。
本次测试使用了 &lt;a href=&#34;https://github.com/AI-Hypercomputer/inference-benchmark&#34;&gt;Latency Profile Generator (LPG)&lt;/a&gt;
工具生成流量，测量吞吐量、延迟等指标。采用的工作负载数据集为
&lt;a href=&#34;https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json&#34;&gt;ShareGPT&lt;/a&gt;，
流量从 100 QPS 提升到 1000 QPS。&lt;/p&gt;
&lt;!--
### Key results



&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/05/introducing-gateway-api-inference-extension/inference-extension-benchmark.png&#34;
         alt=&#34;Endpoint Extension Scheduling&#34;/&gt; 
&lt;/figure&gt;
--&gt;
&lt;h3 id=&#34;key-results&#34;&gt;主要结果&lt;/h3&gt;


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/05/introducing-gateway-api-inference-extension/inference-extension-benchmark.png&#34;
         alt=&#34;基准测试结果&#34;/&gt;
&lt;/figure&gt;
&lt;!--
- **Comparable Throughput**: Throughout the tested QPS range, the ESE delivered throughput roughly on par with a standard
  Kubernetes Service.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;吞吐量相当&lt;/strong&gt;：在整个测试的 QPS 范围内，ESE 达到的吞吐量基本与标准 Kubernetes Service 持平。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- **Lower Latency**:
  - **Per‐Output‐Token Latency**: The ​ESE showed significantly lower p90 latency at higher QPS (500+), indicating that
  its model-aware routing decisions reduce queueing and resource contention as GPU memory approaches saturation.
  - **Overall p90 Latency**: Similar trends emerged, with the ​ESE reducing end‐to‐end tail latencies compared to the
  baseline, particularly as traffic increased beyond 400–500 QPS.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;延迟更低&lt;/strong&gt;：
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;按输出令牌计的延迟&lt;/strong&gt;：在较高 QPS（500 以上）下，ESE 的 p90 延迟明显更低，
这表明在 GPU 显存接近饱和时，其模型感知路由决策减少了排队等待和资源争用。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;整体 p90 延迟&lt;/strong&gt;：趋势类似，ESE 相比基线降低了端到端尾部延迟，在流量超过 400–500 QPS 时尤为明显。&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
These results suggest that this extension&#39;s model‐aware routing significantly reduced latency for GPU‐backed LLM
workloads. By dynamically selecting the least‐loaded or best‐performing model server, it avoids hotspots that can
appear when using traditional load balancing methods for large, long‐running inference requests.

## Roadmap

As the Gateway API Inference Extension heads toward GA, planned features include:
--&gt;
&lt;p&gt;这些结果表明，此扩展的模型感知路由显著降低了 GPU 支撑的 LLM 负载的延迟。
此扩展通过动态选择负载最轻或性能最优的模型服务器，避免了传统负载均衡方法在处理较大的、长时间运行的推理请求时会出现的热点问题。&lt;/p&gt;
&lt;h2 id=&#34;roadmap&#34;&gt;路线图&lt;/h2&gt;
&lt;p&gt;随着 Gateway API 推理扩展迈向 GA（正式发布），计划中的特性包括：&lt;/p&gt;
&lt;!--
1. **Prefix-cache aware load balancing** for remote caches
2. **LoRA adapter pipelines** for automated rollout
3. **Fairness and priority** between workloads in the same criticality band
4. **HPA support** for scaling based on aggregate, per-model metrics
5. **Support for large multi-modal inputs/outputs**
6. **Additional model types** (e.g., diffusion models)
7. **Heterogeneous accelerators** (serving on multiple accelerator types with latency- and cost-aware load balancing)
8. **Disaggregated serving** for independently scaling pools
--&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;前缀缓存感知负载均衡&lt;/strong&gt;以支持远程缓存&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LoRA 适配器流水线&lt;/strong&gt;方便自动化上线&lt;/li&gt;
&lt;li&gt;同一重要性等级下负载之间的&lt;strong&gt;公平性和优先级&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HPA 支持&lt;/strong&gt;基于聚合的模型层面指标扩缩容&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;支持大规模多模态输入/输出&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;支持额外的模型类型&lt;/strong&gt;（如扩散模型）&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;异构加速器&lt;/strong&gt;（支持多个加速器类型，并具备延迟和成本感知的负载均衡）&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;解耦式服务架构&lt;/strong&gt;，以独立扩缩资源池&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
## Summary

By aligning model serving with Kubernetes-native tooling, Gateway API Inference Extension aims to simplify
and standardize how AI/ML traffic is routed. With model-aware routing, criticality-based prioritization, and
more, it helps ops teams deliver the right LLM services to the right users—smoothly and efficiently.
--&gt;
&lt;h2 id=&#34;summary&#34;&gt;总结&lt;/h2&gt;
&lt;p&gt;通过将模型服务对齐到 Kubernetes 原生工具链，Gateway API 推理扩展致力于简化并标准化 AI/ML 流量的路由方式。
此扩展引入模型感知路由、基于重要性的优先级等能力，帮助运维团队平滑高效地将合适的 LLM 服务交付给合适的用户。&lt;/p&gt;
&lt;!--
**Ready to learn more?** Visit the [project docs](https://gateway-api-inference-extension.sigs.k8s.io/) to dive deeper,
give an Inference Gateway extension a try with a few [simple steps](https://gateway-api-inference-extension.sigs.k8s.io/guides/),
and [get involved](https://gateway-api-inference-extension.sigs.k8s.io/contributing/) if you’re interested in
contributing to the project!
--&gt;
&lt;p&gt;&lt;strong&gt;想进一步学习？&lt;/strong&gt;
参阅&lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/&#34;&gt;项目文档&lt;/a&gt;深入学习，
只需&lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/guides/&#34;&gt;简单几步&lt;/a&gt;试用推理网关扩展。
如果你想对此项目作贡献，欢迎&lt;a href=&#34;https://gateway-api-inference-extension.sigs.k8s.io/contributing/&#34;&gt;参与其中&lt;/a&gt;！&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>先启动边车：如何避免障碍</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/03/start-sidecar-first/</link>
      <pubDate>Tue, 03 Jun 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/03/start-sidecar-first/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Start Sidecar First: How To Avoid Snags&#34;
date: 2025-06-03
draft: false
slug: start-sidecar-first
author: Agata Skorupka (The Scale Factory)
--&gt;
&lt;!--
From the [Kubernetes Multicontainer Pods: An Overview blog post](/blog/2025/04/22/multi-container-pods-overview/) you know what their job is, what are the main architectural patterns, and how they are implemented in Kubernetes. The main thing I’ll cover in this article is how to ensure that your sidecar containers start before the main app. It’s more complicated than you might think!
--&gt;
&lt;p&gt;从 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/04/22/multi-container-pods-overview/&#34;&gt;&amp;quot;Kubernetes 多容器 Pod：概述&amp;quot;博客&lt;/a&gt;中，
你了解了多容器 Pod 的作用、主要的架构模式，以及它们在 Kubernetes 中是如何实现的。
本文主要介绍的是如何确保你的边车容器在主应用之前启动。这比你想象的要复杂！&lt;/p&gt;
&lt;!--
## A gentle refresher

I&#39;d just like to remind readers that the [v1.29.0 release of Kubernetes](/blog/2023/12/13/kubernetes-v1-29-release/) added native support for
[sidecar containers](/docs/concepts/workloads/pods/sidecar-containers/), which can now be defined within the `.spec.initContainers` field,
but with `restartPolicy: Always`. You can see that illustrated in the following example Pod manifest snippet:
--&gt;
&lt;h2 id=&#34;简要回顾&#34;&gt;简要回顾&lt;/h2&gt;
&lt;p&gt;我想提醒读者的是，&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2023/12/13/kubernetes-v1-29-release/&#34;&gt;Kubernetes v1.29.0 版本&lt;/a&gt;增加了对
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/pods/sidecar-containers/&#34;&gt;边车容器&lt;/a&gt;的原生支持，
现在可以在 &lt;code&gt;.spec.initContainers&lt;/code&gt; 字段中定义，但带有 &lt;code&gt;restartPolicy: Always&lt;/code&gt;。
你可以在下面的示例 Pod 清单片段中看到这一点：&lt;/p&gt;
&lt;!--
```yaml
initContainers:
  - name: logshipper
    image: alpine:latest
    restartPolicy: Always # this is what makes it a sidecar container
    command: [&#39;sh&#39;, &#39;-c&#39;, &#39;tail -F /opt/logs.txt&#39;]
    volumeMounts:
    - name: data
        mountPath: /opt
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initContainers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;logshipper&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;alpine:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Always&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 这就是它成为边车容器的原因&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;sh&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;-c&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;tail -F /opt/logs.txt&amp;#39;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeMounts&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;data&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;mountPath&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;/opt&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
What are the specifics of defining sidecars with a `.spec.initContainers` block, rather than as a legacy multi-container pod with multiple `.spec.containers`?
Well, all `.spec.initContainers` are always launched **before** the main application. If you define Kubernetes-native sidecars, those are terminated **after** the main application. Furthermore, when used with [Jobs](/docs/concepts/workloads/controllers/job/), a sidecar container should still be alive and could potentially even restart after the owning Job is complete; Kubernetes-native sidecar containers do not block pod completion.

To learn more, you can also read the official [Pod sidecar containers tutorial](/docs/tutorials/configuration/pod-sidecar-containers/).
--&gt;
&lt;p&gt;使用 &lt;code&gt;.spec.initContainers&lt;/code&gt; 块定义边车与使用多个 &lt;code&gt;.spec.containers&lt;/code&gt;
定义传统的多容器 Pod 相比，具体有什么不同？
其实，所有 &lt;code&gt;.spec.initContainers&lt;/code&gt; 总是在主应用&lt;strong&gt;之前&lt;/strong&gt;启动。
如果你定义了 Kubernetes 原生的边车容器，这些边车容器会在主应用&lt;strong&gt;之后&lt;/strong&gt;终止。
此外，当与 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/job/&#34;&gt;Job&lt;/a&gt; 一起使用时，
边车容器仍然保持运行，并且在拥有它的 Job 完成后甚至可能重启；
Kubernetes 原生边车容器不会阻止 Pod 的完成。&lt;/p&gt;
&lt;p&gt;要了解更多，你也可以阅读官方的
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tutorials/configuration/pod-sidecar-containers/&#34;&gt;Pod 边车容器教程&lt;/a&gt;。&lt;/p&gt;
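&lt;p&gt;上面提到的 Job 场景可以用如下草图来示意：即使名为 log-sidecar 的边车仍在运行，
主容器退出后 Pod 仍会正常完成（此清单仅作演示，镜像与名称均为假设值）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: log-sidecar              # 原生边车：不会阻止 Job 完成
          image: alpine:latest
          restartPolicy: Always
          command: [&#39;sh&#39;, &#39;-c&#39;, &#39;tail -F /opt/logs.txt&#39;]
      containers:
        - name: main
          image: alpine:latest
          command: [&#39;sh&#39;, &#39;-c&#39;, &#39;echo done&#39;]
&lt;/code&gt;&lt;/pre&gt;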
&lt;!--
## The problem

Now you know that defining a sidecar with this native approach will always start it before the main application. From the [kubelet source code](https://github.com/kubernetes/kubernetes/blob/537a602195efdc04cdf2cb0368792afad082d9fd/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L827-L830), it&#39;s visible that this often means being started almost in parallel, and this is not always what an engineer wants to achieve. What I&#39;m really interested in is whether I can delay the start of the main application until the sidecar is not just started, but fully running and ready to serve.
It might be a bit tricky because the problem with sidecars is there’s no obvious success signal, contrary to init containers - designed to run only for a specified period of time. With an init container, exit status 0 is unambiguously &#34;I succeeded&#34;. With a sidecar, there are lots of points at which you can say &#34;a thing is running&#34;.
Starting one container only after the previous one is ready is part of a graceful deployment strategy, ensuring proper sequencing and stability during startup. It’s also actually how I’d expect sidecar containers to work as well, to cover the scenario where the main application is dependent on the sidecar. For example, it may happen that an app errors out if the sidecar isn’t available to serve requests (e.g., logging with DataDog). Sure, one could change the application code (and it would actually be the “best practice” solution), but sometimes they can’t - and this post focuses on this use case.
--&gt;
&lt;h2 id=&#34;问题&#34;&gt;问题&lt;/h2&gt;
&lt;p&gt;现在你知道使用这种原生方法定义边车总是会在主应用之前启动它。
从 &lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/537a602195efdc04cdf2cb0368792afad082d9fd/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L827-L830&#34;&gt;kubelet 源代码&lt;/a&gt;
可以看出，这通常意味着几乎是并行启动的，而这并不总是工程师想要的结果。
我真正感兴趣的是，能否将主应用的启动延迟到边车不仅已启动、而且完全运行并准备好提供服务之后。
这可能有点棘手，因为与被设计为只运行一段时间的 Init 容器不同，边车没有明显的成功信号。
对于 Init 容器，退出状态 0 明确表示“我成功了”；而对于边车容器，
你可以在很多不同的时间点声称“它正在运行”。
仅在前一个容器就绪之后才启动下一个容器，是优雅部署策略的一部分，
可以确保启动期间的正确排序和稳定性。实际上，这也是我期望边车容器的工作方式，
以覆盖主应用依赖边车的场景。例如，如果边车无法提供服务（例如使用 DataDog 进行日志记录），
应用程序可能会报错。当然，你可以修改应用程序代码（这实际上是“最佳实践”解决方案），
但有时无法这样做，而本文关注的正是这种情况。&lt;/p&gt;
&lt;!--
I&#39;ll explain some ways that you might try, and show you what approaches will really work.
--&gt;
&lt;p&gt;我会解释一些你可能尝试的方法，并告诉你哪些方法真的有效。&lt;/p&gt;
&lt;!--
## Readiness probe

To check whether Kubernetes native sidecar delays the start of the main application until the sidecar is ready, let’s simulate a short investigation. Firstly, I’ll simulate a sidecar container which will never be ready by implementing a readiness probe which will never succeed. As a reminder, a [readiness probe](/docs/concepts/configuration/liveness-readiness-startup-probes/) checks if the container is ready to start accepting traffic and therefore, if the pod can be used as a backend for services. 
--&gt;
&lt;h2 id=&#34;就绪性检测&#34;&gt;就绪性检测&lt;/h2&gt;
&lt;p&gt;要检查 Kubernetes 原生边车是否会延迟主应用的启动直到边车准备就绪，
让我们做一个简短的实验。首先，我将通过实现一个永远不会成功的就绪探针，来模拟一个永远不会就绪的边车容器。
提醒一下，&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/configuration/liveness-readiness-startup-probes/&#34;&gt;就绪性探针&lt;/a&gt;检查容器是否准备好开始接受流量，
由此判断 Pod 是否可以用于服务的后端。&lt;/p&gt;
&lt;!--
(Unlike standard init containers, sidecar containers can have [probes](https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/) so that the kubelet can supervise the sidecar and intervene if there are problems. For example, restarting a sidecar container if it fails a health check.)
--&gt;
&lt;p&gt;（与标准的 Init 容器不同，边车容器可以拥有&lt;a href=&#34;https://kubernetes.io/zh-cn/docs/concepts/configuration/liveness-readiness-startup-probes/&#34;&gt;探针&lt;/a&gt;，
以便 kubelet 可以监督边车，并在出现问题时进行干预。例如，
如果边车容器未通过健康检查，则重启它。）&lt;/p&gt;
&lt;!--
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: alpine:latest
          command: [&#34;sh&#34;, &#34;-c&#34;, &#34;sleep 3600&#34;]
      initContainers:
        - name: nginx
          image: nginx:latest
          restartPolicy: Always
          ports:
            - containerPort: 80
              protocol: TCP
          readinessProbe:
            exec:
              command:
              - /bin/sh
              - -c
              - exit 1 # this command always fails, keeping the container &#34;Not Ready&#34;
            periodSeconds: 5
      volumes:
        - name: data
          emptyDir: {}
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;apps/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Deployment&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;replicas&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;selector&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchLabels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;template&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;alpine:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sh&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;-c&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sleep 3600&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initContainers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Always&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ports&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containerPort&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;TCP&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;readinessProbe&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;exec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- /bin/sh&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- -c&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- exit 1&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 此命令总是失败，导致容器处于&amp;#34;未就绪&amp;#34;状态&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;periodSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;5&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;data&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;emptyDir&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
The result is:
--&gt;
&lt;p&gt;结果是：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-console&#34; data-lang=&#34;console&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;controlplane $ kubectl get pods -w
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;NAME                    READY   STATUS    RESTARTS   AGE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;myapp-db5474f45-htgw5   1/2     Running   0          9m28s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;&lt;/span&gt;&lt;span style=&#34;&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#888&#34;&gt;controlplane $ kubectl describe pod myapp-db5474f45-htgw5 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;Name:             myapp-db5474f45-htgw5
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;Namespace:        default
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;(...)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;Events:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Type     Reason     Age               From               Message
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  ----     ------     ----              ----               -------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Scheduled  17s               default-scheduler  Successfully assigned default/myapp-db5474f45-htgw5 to node01
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Pulling    16s               kubelet            Pulling image &amp;#34;nginx:latest&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Pulled     16s               kubelet            Successfully pulled image &amp;#34;nginx:latest&amp;#34; in 163ms (163ms including waiting). Image size: 72080558 bytes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Created    16s               kubelet            Created container nginx
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Started    16s               kubelet            Started container nginx
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Pulling    15s               kubelet            Pulling image &amp;#34;alpine:latest&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Pulled     15s               kubelet            Successfully pulled image &amp;#34;alpine:latest&amp;#34; in 159ms (160ms including waiting). Image size: 3652536 bytes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Created    15s               kubelet            Created container myapp
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Normal   Started    15s               kubelet            Started container myapp
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;  Warning  Unhealthy  1s (x6 over 15s)  kubelet            Readiness probe failed:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
From these logs it’s evident that only one container is ready - and I know it can’t be the sidecar, because I’ve defined it so it’ll never be ready (you can also check container statuses in `kubectl get pod -o json`). I also saw that myapp has been started before the sidecar is ready. That was not the result I wanted to achieve; in this case, the main app container has a hard dependency on its sidecar.
--&gt;
&lt;p&gt;从这些日志中可以明显看出，只有一个容器准备就绪。我知道它不可能是边车，
因为我把边车定义为永远不会就绪（你也可以通过 &lt;code&gt;kubectl get pod -o json&lt;/code&gt; 检查容器状态）。
我还看到 myapp 在边车就绪之前就已经启动。这不是我想要的结果；
在这种情况下，主应用容器对其边车存在硬依赖。&lt;/p&gt;
&lt;!--
## Maybe a startup probe?

To ensure that the sidecar is ready before the main app container starts, I can define a `startupProbe`. It will delay the start of the main container until the command is successfully executed (returns `0` exit status). If you’re wondering why I’ve added it to my `initContainer`, let’s analyse what happens If I’d added it to myapp container. I wouldn’t have guaranteed the probe would run before the main application code - and this one, can potentially error out without the sidecar being up and running.
--&gt;
&lt;h2 id=&#34;或许是一个启动探针&#34;&gt;或许是一个启动探针？&lt;/h2&gt;
&lt;p&gt;为了确保边车就绪后再启动主应用容器，我可以定义一个 &lt;code&gt;startupProbe&lt;/code&gt;。
它会延迟主容器的启动，直到探针命令成功执行（返回 &lt;code&gt;0&lt;/code&gt; 退出状态）。
如果你想知道我为什么把它加到 &lt;code&gt;initContainer&lt;/code&gt; 上，
不妨分析一下如果把它加到 myapp 容器上会发生什么：
我将无法保证探针在主应用代码之前运行，而主应用代码在边车尚未启动并运行时可能会出错。&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;apps/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Deployment&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;replicas&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;selector&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchLabels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;template&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;myapp&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;alpine:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sh&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;-c&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sleep 3600&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initContainers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ports&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containerPort&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;TCP&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Always&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;startupProbe&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;httpGet&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;path&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;/&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initialDelaySeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;5&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;periodSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;30&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;failureThreshold&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;timeoutSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;20&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;data&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;emptyDir&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;{}&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This results in 2/2 containers being ready and running, and from events, it can be inferred that the main application started only after nginx had already been started. But to confirm whether it waited for the sidecar readiness, let’s change the `startupProbe` to the exec type of command: 
--&gt;
&lt;p&gt;这样一来，2/2 个容器都已就绪并运行；从事件中可以推断，主应用是在 nginx 启动之后才启动的。
但为了确认它是否等待了边车的就绪状态，让我们将 &lt;code&gt;startupProbe&lt;/code&gt; 改为 exec 类型的命令：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;startupProbe&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;exec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- /bin/sh&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- -c&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- sleep 15&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
and run `kubectl get pods -w` to watch in real time whether the readiness of both containers only changes after a 15 second delay. Again, events confirm the main application starts after the sidecar.
That means that using the `startupProbe` with a correct `startupProbe.httpGet` request helps to delay the main application start until the sidecar is ready. It’s not optimal, but it works.
--&gt;
&lt;p&gt;然后运行 &lt;code&gt;kubectl get pods -w&lt;/code&gt;，实时观察两个容器的就绪状态是否只在 15 秒延迟之后才发生变化。
事件再次证实，主应用是在边车之后启动的。
这意味着使用带有正确 &lt;code&gt;startupProbe.httpGet&lt;/code&gt; 请求的 &lt;code&gt;startupProbe&lt;/code&gt;
可以将主应用的启动推迟到边车就绪之后。这并不理想，但确实有效。&lt;/p&gt;
&lt;!--
## What about the postStart lifecycle hook?

Fun fact: using the `postStart` lifecycle hook block will also do the job, but I’d have to write my own mini-shell script, which is even less efficient.
--&gt;
&lt;h2 id=&#34;关于-poststart-生命周期钩子&#34;&gt;关于 postStart 生命周期钩子？&lt;/h2&gt;
&lt;p&gt;趣闻：使用 &lt;code&gt;postStart&lt;/code&gt; 生命周期钩子块也能达到目的，
但我必须自己编写一个迷你 Shell 脚本，这样效率更低。&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initContainers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Always&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ports&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containerPort&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;TCP&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;lifecycle&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;postStart&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;exec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;- /bin/sh&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;- -c&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;- |&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            echo &amp;#34;Waiting for readiness at http://localhost:80&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            until curl -sf http://localhost:80; do
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;              echo &amp;#34;Still waiting for http://localhost:80...&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;              sleep 5
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            echo &amp;#34;Service is ready at http://localhost:80&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
## Liveness probe

An interesting exercise would be to check the sidecar container behavior with a [liveness probe](/docs/concepts/configuration/liveness-readiness-startup-probes/).
A liveness probe behaves and is configured similarly to a readiness probe - only with the difference that it doesn’t affect the readiness of the container but restarts it in case the probe fails. 
--&gt;
&lt;h2 id=&#34;存活探针&#34;&gt;存活探针&lt;/h2&gt;
&lt;p&gt;一个有趣的练习是使用&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/configuration/liveness-readiness-startup-probes/&#34;&gt;存活探针&lt;/a&gt;检查边车容器的行为。
存活探针的配置和行为与就绪探针相似——唯一的区别是它不会影响容器的就绪状态，而是在探针失败时重启容器。&lt;/p&gt;
&lt;!--
```yaml
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - exit 1 # this command always fails, causing the kubelet to restart the container
  periodSeconds: 5
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;livenessProbe&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;exec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- /bin/sh&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- -c&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- exit 1&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 此命令总是失败，导致 kubelet 不断重启该容器&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;periodSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;5&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
After adding the liveness probe configured just as the previous readiness probe and checking events of the pod by `kubectl describe pod` it’s visible that the sidecar has a restart count above 0. Nevertheless, the main application is not restarted nor influenced at all, even though I&#39;m aware that (in our imaginary worst-case scenario) it can error out when the sidecar is not there serving requests.
What if I’d used a `livenessProbe` without lifecycle `postStart`? Both containers will be immediately ready: at the beginning, this behavior will not be different from the one without any additional probes since the liveness probe doesn’t affect readiness at all. After a while, the sidecar will begin to restart itself, but it won’t influence the main container.
--&gt;
&lt;p&gt;在添加了与之前的就绪探针配置相同的存活探针，并通过 &lt;code&gt;kubectl describe pod&lt;/code&gt;
检查 Pod 的事件后，可以看到边车的重启次数大于 0。尽管如此，主应用既没有被重启，也完全没有受到影响，
即使我知道（在我们假想的最坏情况下）当边车无法处理请求时，主应用可能会出错。
如果我在没有 &lt;code&gt;postStart&lt;/code&gt; 生命周期钩子的情况下使用 &lt;code&gt;livenessProbe&lt;/code&gt; 会怎样？
两个容器将立即就绪：一开始，这种行为与没有任何额外探针时没有区别，
因为存活探针完全不影响就绪状态。一段时间后，边车将开始不断重启，但这不会影响主容器。&lt;/p&gt;
&lt;!--
## Findings summary

I’ll summarize the startup behavior in the table below:
--&gt;
&lt;h2 id=&#34;调研总结&#34;&gt;调研总结&lt;/h2&gt;
&lt;p&gt;我将在下表中总结启动行为：&lt;/p&gt;
&lt;!--
| Probe/Hook     | Sidecar starts before the main app?                      | Main app waits for the sidecar to be ready?         | What if the check doesn’t pass?                    |
|----------------|----------------------------------------------------------|-----------------------------------------------------|----------------------------------------------------|
| `readinessProbe` | **Yes**, but it’s almost in parallel (effectively **no**)    | **No**                                                  | Sidecar is not ready; main app continues running   |
| `livenessProbe`  | Yes, but it’s almost in parallel (effectively **no**)    | **No**                                                  | Sidecar is restarted, main app continues running   |
| `startupProbe`   | **Yes**                                                      | **Yes**                                                 | Main app is not started                            |
| postStart      | **Yes**, main app container starts after `postStart` completes | **Yes**, but you have to provide custom logic for that  | Main app is not started                            |
--&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;探针/钩子&lt;/th&gt;
&lt;th&gt;边车在主应用之前启动？&lt;/th&gt;
&lt;th&gt;主应用是否等待边车准备就绪？&lt;/th&gt;
&lt;th&gt;如果检查不通过会发生什么？&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;readinessProbe&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;是&lt;/strong&gt;，但几乎是并行的（实际上为 &lt;strong&gt;否&lt;/strong&gt;）&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;否&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;边车未就绪；主应用继续运行&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;livenessProbe&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;是，但几乎是并行的（实际上为 &lt;strong&gt;否&lt;/strong&gt;）&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;否&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;边车被重启，主应用继续运行&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;startupProbe&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;是&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;是&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;主应用不会启动&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;postStart&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;是&lt;/strong&gt;，主应用容器在 &lt;code&gt;postStart&lt;/code&gt; 完成后启动&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;是&lt;/strong&gt;，但你必须为此提供自定义逻辑&lt;/td&gt;
&lt;td&gt;主应用不会启动&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
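&lt;p&gt;下面是一个示意性片段（非本文原始清单，名称、镜像与端口均为假设），展示上表中“主应用等待边车就绪”的一种常见写法：在支持原生边车的 Kubernetes（v1.29 及以上）中，把边车声明为带 &lt;code&gt;restartPolicy: Always&lt;/code&gt; 的 init 容器，并为其配置 &lt;code&gt;startupProbe&lt;/code&gt;，这样主容器会等到探针通过后才启动：&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar              # 假设的名称
spec:
  initContainers:
  - name: sidecar
    image: example.com/sidecar:latest # 假设的镜像
    restartPolicy: Always             # 声明为原生边车
    startupProbe:                     # 该探针通过之前，后续容器不会启动
      httpGet:
        path: /healthz                # 假设的健康检查端点
        port: 8080
      periodSeconds: 5
  containers:
  - name: main-app
    image: example.com/app:latest     # 假设的镜像
```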
&lt;!--
To summarize: with sidecars often being a dependency of the main application, you may want to delay the start of the latter until the sidecar is healthy.
The ideal pattern is to start both containers simultaneously and have the app container logic delay at all levels, but it’s not always possible. If that&#39;s what you need, you have to use the right kind of customization to the Pod definition. Thankfully, it’s nice and quick, and you have the recipe ready above.

Happy deploying!
--&gt;
&lt;p&gt;总结：由于边车往往是主应用的依赖项，你可能希望推迟主应用的启动，直到边车处于健康状态。&lt;/p&gt;
&lt;p&gt;理想的模式是同时启动两个容器，并让应用容器自身的逻辑在各个层面上等待边车就绪，但这并不总是可行。
如果这正是你的需求，就必须对 Pod 定义做适当的定制。
值得庆幸的是，这既简单又快速，上文已经给出了现成的方案。&lt;/p&gt;
&lt;p&gt;祝部署顺利！&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Gateway API v1.3.0：流量复制、CORS、Gateway 合并和重试预算的改进</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/02/gateway-api-v1-3/</link>
      <pubDate>Mon, 02 Jun 2025 09:00:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/02/gateway-api-v1-3/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Gateway API v1.3.0: Advancements in Request Mirroring, CORS, Gateway Merging, and Retry Budgets&#34;
date: 2025-06-02T09:00:00-08:00
draft: false
slug: gateway-api-v1-3
author: &gt;
  [Candace Holman](https://github.com/candita) (Red Hat)
--&gt;
&lt;p&gt;&lt;img alt=&#34;Gateway API logo&#34; src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/06/02/gateway-api-v1-3/gateway-api-logo.svg&#34;&gt;&lt;/p&gt;
&lt;!--
Join us in the Kubernetes SIG Network community in celebrating the general
availability of [Gateway API](https://gateway-api.sigs.k8s.io/) v1.3.0! We are
also pleased to announce that there are already a number of conformant
implementations to try, made possible by postponing this blog
announcement. Version 1.3.0 of the API was released about a month ago on
April 24, 2025.
--&gt;
&lt;p&gt;加入 Kubernetes SIG Network 社区，共同庆祝 &lt;a href=&#34;https://gateway-api.sigs.k8s.io/&#34;&gt;Gateway API&lt;/a&gt; v1.3.0 正式发布！
我们很高兴地宣布，通过推迟这篇博客的发布，现在已经有了多个符合规范的实现可供试用。
API 的 1.3.0 版本已于约一个月前，即 2025 年 4 月 24 日发布。&lt;/p&gt;
&lt;!--
Gateway API v1.3.0 brings a new feature to the _Standard_ channel
(Gateway API&#39;s GA release channel): _percentage-based request mirroring_, and
introduces three new experimental features: cross-origin resource sharing (CORS)
filters, a standardized mechanism for listener and gateway merging, and retry
budgets.
--&gt;
&lt;p&gt;Gateway API v1.3.0 为 &lt;strong&gt;Standard&lt;/strong&gt; 渠道（Gateway API 的正式发布渠道）带来了一个新功能：&lt;strong&gt;基于百分比的流量复制&lt;/strong&gt;，
并引入了三个新的实验性功能：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;跨源资源共享（CORS）过滤器&lt;/li&gt;
&lt;li&gt;Listener 和 Gateway 合并的标准化机制&lt;/li&gt;
&lt;li&gt;重试预算（Retry Budgets）&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Also see the full
[release notes](https://github.com/kubernetes-sigs/gateway-api/blob/54df0a899c1c5c845dd3a80f05dcfdf65576f03c/CHANGELOG/1.3-CHANGELOG.md)
and applaud the
[v1.3.0 release team](https://github.com/kubernetes-sigs/gateway-api/blob/54df0a899c1c5c845dd3a80f05dcfdf65576f03c/CHANGELOG/1.3-TEAM.md)
next time you see them.
--&gt;
&lt;p&gt;另请查看完整的&lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/54df0a899c1c5c845dd3a80f05dcfdf65576f03c/CHANGELOG/1.3-CHANGELOG.md&#34;&gt;发布说明&lt;/a&gt;，
下次见到 &lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/54df0a899c1c5c845dd3a80f05dcfdf65576f03c/CHANGELOG/1.3-TEAM.md&#34;&gt;v1.3.0 发布团队&lt;/a&gt; 时请为他们鼓掌。&lt;/p&gt;
&lt;!--
## Graduation to Standard channel
--&gt;
&lt;h2 id=&#34;graduation-to-standard-channel&#34;&gt;升级至 Standard 渠道&lt;/h2&gt;
&lt;!--
Graduation to the Standard channel is a notable achievement for Gateway API
features, as inclusion in the Standard release channel denotes a high level of
confidence in the API surface and provides guarantees of backward compatibility.
Of course, as with any other Kubernetes API, Standard channel features can continue
to evolve with backward-compatible additions over time, and we (SIG Network)
certainly expect
further refinements and improvements in the future. For more information on how
all of this works, refer to the [Gateway API Versioning Policy](https://gateway-api.sigs.k8s.io/concepts/versioning/).
--&gt;
&lt;p&gt;对于 Gateway API 的功能来说，升级到 Standard 渠道是一个重要的里程碑。
被纳入 Standard 发布渠道表明我们对该 API 接口的稳定性具有高度信心，并且承诺向后兼容。
当然，与任何其他 Kubernetes API 一样，Standard 渠道中的功能仍可通过向后兼容的方式不断演进。
我们（SIG Network）也确实预计未来会有进一步的优化和改进。
有关这一切如何运作的更多信息，请参阅 &lt;a href=&#34;https://gateway-api.sigs.k8s.io/concepts/versioning/&#34;&gt;Gateway API 版本控制策略&lt;/a&gt;。&lt;/p&gt;
&lt;!--
### Percentage-based request mirroring

Leads: [Lior Lieberman](https://github.com/LiorLieberman),[Jake Bennert](https://github.com/jakebennert)
GEP-3171: [Percentage-Based Request Mirroring](https://github.com/kubernetes-sigs/gateway-api/blob/main/geps/gep-3171/index.md)
--&gt;
&lt;h3 id=&#34;percentage-based-request-mirroring&#34;&gt;基于百分比的流量复制&lt;/h3&gt;
&lt;p&gt;负责人：&lt;a href=&#34;https://github.com/LiorLieberman&#34;&gt;Lior Lieberman&lt;/a&gt;、&lt;a href=&#34;https://github.com/jakebennert&#34;&gt;Jake Bennert&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;GEP-3171：&lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/main/geps/gep-3171/index.md&#34;&gt;基于百分比的流量复制&lt;/a&gt;&lt;/p&gt;
&lt;!--
_Percentage-based request mirroring_ is an enhancement to the
existing support for [HTTP request mirroring](https://gateway-api.sigs.k8s.io/guides/http-request-mirroring/), which allows HTTP requests to be duplicated to another backend using the
RequestMirror filter type.  Request mirroring is particularly useful in
blue-green deployment. It can be used to assess the impact of request scaling on
application performance without impacting responses to clients.
--&gt;
&lt;p&gt;&lt;strong&gt;基于百分比的流量复制&lt;/strong&gt;是对现有 &lt;a href=&#34;https://gateway-api.sigs.k8s.io/guides/http-request-mirroring/&#34;&gt;HTTP 流量复制&lt;/a&gt; 支持的增强，
它允许使用 RequestMirror 过滤器类型将 HTTP 请求复制到另一个后端。流量复制在蓝绿部署中特别有用。
它可用于在不影响对客户端响应的前提下，评估请求规模变化对应用程序性能的影响。&lt;/p&gt;
&lt;!--
The previous mirroring capability worked on all the requests to a `backendRef`.
Percentage-based request mirroring allows users to specify a subset of requests
they want to be mirrored, either by percentage or fraction. This can be
particularly useful when services are receiving a large volume of requests.
Instead of mirroring all of those requests, this new feature can be used to
mirror a smaller subset of them.
--&gt;
&lt;p&gt;之前的流量复制功能适用于对 &lt;code&gt;backendRef&lt;/code&gt; 的所有请求。基于百分比的流量复制允许用户指定他们想要复制的请求子集，
可以通过百分比或分数来指定。当服务接收大量请求时，这特别有用。这个新功能可以用来复制这些请求中的一小部分，
而不是复制所有请求。&lt;/p&gt;
&lt;!--
Here&#39;s an example with 42% of the requests to &#34;foo-v1&#34; being mirrored to &#34;foo-v2&#34;:
--&gt;
&lt;p&gt;以下是一个示例，将发送到 &amp;quot;foo-v1&amp;quot; 的请求中的 42% 复制到 &amp;quot;foo-v2&amp;quot;：&lt;/p&gt;
&lt;!--
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-filter-mirror
  labels:
    gateway: mirror-gateway
spec:
  parentRefs:
  - name: mirror-gateway
  hostnames:
  - mirror.example
  rules:
  - backendRefs:
    - name: foo-v1
      port: 8080
    filters:
    - type: RequestMirror
      requestMirror:
        backendRef:
          name: foo-v2
          port: 8080
        percent: 42 # This value must be an integer.
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HTTPRoute&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;http-filter-mirror&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;gateway&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;mirror-gateway&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parentRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;mirror-gateway&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostnames&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- mirror.example&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backendRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo-v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8080&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;filters&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;RequestMirror&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;requestMirror&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backendRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo-v2&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8080&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;percent&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;42&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 此值必须为整数。&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
You can also configure the partial mirroring using a fraction. Here is an example
with 5 out of every 1000 requests to &#34;foo-v1&#34; being mirrored to &#34;foo-v2&#34;.
--&gt;
&lt;p&gt;你也可以使用分数（fraction）来配置部分流量复制。
以下是一个示例，在发送到 &amp;quot;foo-v1&amp;quot; 的请求中，将每 1000 个中的 5 个复制到 &amp;quot;foo-v2&amp;quot;。&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backendRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo-v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8080&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;filters&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;RequestMirror&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;requestMirror&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backendRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo-v2&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8080&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;fraction&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;numerator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;5&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;denominator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1000&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
## Additions to Experimental channel
--&gt;
&lt;h2 id=&#34;additions-to-experimental-channel&#34;&gt;实验渠道的新特性&lt;/h2&gt;
&lt;!--
The Experimental channel is Gateway API&#39;s channel for experimenting with new
features and gaining confidence with them before allowing them to graduate to
standard.  Please note: the experimental channel may include features that are
changed or removed later.
--&gt;
&lt;p&gt;实验渠道（Experimental channel）是 Gateway API 用于试验新功能的渠道，
在允许这些功能升级至 Standard 渠道之前，先通过试验积累足够的信心。
请注意：实验渠道可能包含后续会被更改或移除的功能。&lt;/p&gt;
&lt;!--
Starting in release v1.3.0, in an effort to distinguish Experimental channel
resources from Standard channel resources, any new experimental API kinds have the
prefix &#34;**X**&#34;.  For the same reason, experimental resources are now added to the
API group `gateway.networking.x-k8s.io` instead of `gateway.networking.k8s.io`.
Bear in mind that using new experimental channel resources means they can coexist
with standard channel resources, but migrating these resources to the standard
channel will require recreating them with the standard channel names and API
group (both of which lack the &#34;x-k8s&#34; designator or &#34;X&#34; prefix).
--&gt;
&lt;p&gt;从 v1.3.0 版本开始，为了区分实验渠道资源和 Standard 渠道资源，
所有新的实验性 API 类型都带有 &amp;quot;&lt;strong&gt;X&lt;/strong&gt;&amp;quot; 前缀。
出于同样的原因，实验性资源现在被添加到 API 组 &lt;code&gt;gateway.networking.x-k8s.io&lt;/code&gt;，
而不是 &lt;code&gt;gateway.networking.k8s.io&lt;/code&gt;。
请注意，使用新的实验渠道资源意味着它们可以与 Standard 渠道资源共存，
若要将这些资源迁移到 Standard 渠道，则需要使用 Standard 渠道的名称和 API 组
（两者都不包含 &amp;quot;x-k8s&amp;quot; 标识或 &amp;quot;X&amp;quot; 前缀）来重新创建它们。&lt;/p&gt;
&lt;!--
The v1.3 release introduces two new experimental API kinds: XBackendTrafficPolicy
and XListenerSet.  To be able to use experimental API kinds, you need to install
the Experimental channel Gateway API YAMLs from the locations listed below.
--&gt;
&lt;p&gt;v1.3 版本引入了两个新的实验性 API 类型：XBackendTrafficPolicy 和 XListenerSet。
要使用实验性 API 类型，你需要从下面列出的位置安装实验渠道 Gateway API YAML 文件。&lt;/p&gt;
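&lt;p&gt;作为参考（具体安装位置请以官方发布说明为准），安装实验渠道 CRD 的一种常见方式是直接应用发布页提供的 &lt;code&gt;experimental-install.yaml&lt;/code&gt;：&lt;/p&gt;

```shell
# 安装 Gateway API v1.3.0 实验渠道的全部 CRD（需要可访问的集群）
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/experimental-install.yaml
```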
&lt;!--
### CORS filtering
--&gt;
&lt;h3 id=&#34;cors-filtering&#34;&gt;CORS 过滤&lt;/h3&gt;
&lt;!--
Leads: [Liang Li](https://github.com/liangli), [Eyal Pazz](https://github.com/EyalPazz), [Rob Scott](https://github.com/robscott)

GEP-1767: [CORS Filter](https://github.com/kubernetes-sigs/gateway-api/blob/main/geps/gep-1767/index.md)
--&gt;
&lt;p&gt;负责人：&lt;a href=&#34;https://github.com/liangli&#34;&gt;Liang Li&lt;/a&gt;、&lt;a href=&#34;https://github.com/EyalPazz&#34;&gt;Eyal Pazz&lt;/a&gt;、&lt;a href=&#34;https://github.com/robscott&#34;&gt;Rob Scott&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;GEP-1767：&lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/main/geps/gep-1767/index.md&#34;&gt;CORS 过滤器&lt;/a&gt;&lt;/p&gt;
&lt;!--
Cross-origin resource sharing (CORS) is an HTTP-header based mechanism that allows
a web page to access restricted resources from a server on an origin (domain,
scheme, or port) different from the domain that served the web page. This feature
adds a new HTTPRoute `filter` type, called &#34;CORS&#34;, to configure the handling of
cross-origin requests before the response is sent back to the client.
--&gt;
&lt;p&gt;跨源资源共享（CORS）是一种基于 HTTP Header 的机制，
允许网页从与提供网页的域不同的源（域名、协议或端口）访问受限资源。
此功能添加了一个新的 HTTPRoute &lt;code&gt;filter&lt;/code&gt; 类型，
称为 &amp;quot;CORS&amp;quot;，用于在响应发送回客户端之前配置跨源请求的处理。&lt;/p&gt;
&lt;!--
To be able to use experimental CORS filtering, you need to install the
[Experimental channel Gateway API HTTPRoute yaml](https://github.com/kubernetes-sigs/gateway-api/blob/main/config/crd/experimental/gateway.networking.k8s.io_httproutes.yaml).
--&gt;
&lt;p&gt;要使用实验性 CORS 过滤，你需要安装&lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/main/config/crd/experimental/gateway.networking.k8s.io_httproutes.yaml&#34;&gt;实验渠道 Gateway API HTTPRoute yaml&lt;/a&gt;。&lt;/p&gt;
&lt;!--
Here&#39;s an example of a simple cross-origin configuration:
--&gt;
&lt;p&gt;以下是一个简单的跨源配置示例：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HTTPRoute&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;http-route-cors&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parentRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;http-gateway&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matches&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;path&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;PathPrefix&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;value&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;/resource/foo&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;filters&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cors&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;CORS&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;allowOrigins&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- *&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;allowMethods&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- GET&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- HEAD&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- POST&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;allowHeaders&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- Accept&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- Accept-Language&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- Content-Language&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- Content-Type&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- Range&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backendRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Service&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;http-route-cors&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
In this case, the Gateway returns an _origin header_ of &#34;*&#34;, which means that the
requested resource can be referenced from any origin, a _methods header_
(`Access-Control-Allow-Methods`) that permits the `GET`, `HEAD`, and `POST`
verbs, and a _headers header_ allowing `Accept`, `Accept-Language`,
`Content-Language`, `Content-Type`, and `Range`.
--&gt;
&lt;p&gt;在这种情况下，Gateway 返回的 &lt;strong&gt;origin header&lt;/strong&gt;（&lt;code&gt;Access-Control-Allow-Origin&lt;/code&gt;）取值为 &amp;quot;*&amp;quot;，
表示所请求的资源可以被任何来源引用；
&lt;strong&gt;methods header&lt;/strong&gt;（&lt;code&gt;Access-Control-Allow-Methods&lt;/code&gt;）允许 &lt;code&gt;GET&lt;/code&gt;、&lt;code&gt;HEAD&lt;/code&gt; 和 &lt;code&gt;POST&lt;/code&gt; 方法；
此外，还会返回一个 &lt;strong&gt;headers header&lt;/strong&gt;，允许的字段包括 &lt;code&gt;Accept&lt;/code&gt;、&lt;code&gt;Accept-Language&lt;/code&gt;、
&lt;code&gt;Content-Language&lt;/code&gt;、&lt;code&gt;Content-Type&lt;/code&gt; 和 &lt;code&gt;Range&lt;/code&gt;。&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;HTTP/1.1 200 OK
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Access-Control-Allow-Origin: *
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Access-Control-Allow-Methods: GET, HEAD, POST
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Access-Control-Allow-Headers: Accept,Accept-Language,Content-Language,Content-Type,Range
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
The complete list of fields in the new CORS filter:
* `allowOrigins`
* `allowMethods`
* `allowHeaders`
* `allowCredentials`
* `exposeHeaders`
* `maxAge`

See [CORS protocol](https://fetch.spec.whatwg.org/#http-cors-protocol) for details.
--&gt;
&lt;p&gt;新的 CORS 过滤器中的完整字段列表如下：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;allowOrigins&lt;/code&gt;：允许的请求来源列表。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;allowMethods&lt;/code&gt;：允许的 HTTP 方法（如 &lt;code&gt;GET&lt;/code&gt;、&lt;code&gt;POST&lt;/code&gt; 等）。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;allowHeaders&lt;/code&gt;：允许携带的请求头字段。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;allowCredentials&lt;/code&gt;：是否允许携带凭据（如 Cookie、Authorization 头等）。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;exposeHeaders&lt;/code&gt;：允许客户端访问的响应头字段。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;maxAge&lt;/code&gt;：预检请求的缓存持续时间（单位：秒）。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;有关详细信息，请参阅 &lt;a href=&#34;https://fetch.spec.whatwg.org/#http-cors-protocol&#34;&gt;CORS 协议&lt;/a&gt;。&lt;/p&gt;
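&lt;p&gt;作为示意（并非本文原有示例），下面这个假设性的片段展示了上表中其余字段
（&lt;code&gt;allowCredentials&lt;/code&gt;、&lt;code&gt;exposeHeaders&lt;/code&gt; 和 &lt;code&gt;maxAge&lt;/code&gt;）在 CORS 过滤器中的写法，
其中的来源、响应头名称和取值仅用于演示：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;filters:
- type: CORS
  cors:
    allowOrigins:
    - https://example.com       # 仅示意的来源
    allowMethods:
    - GET
    - POST
    allowCredentials: true      # 允许携带 Cookie、Authorization 等凭据
    exposeHeaders:
    - X-Request-Id              # 假设性的自定义响应头
    maxAge: 3600                # 预检结果缓存 1 小时（单位：秒）
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;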
&lt;!--
### XListenerSets (standardized mechanism for Listener and Gateway merging){#XListenerSet}
--&gt;
&lt;h3 id=&#34;XListenerSet&#34;&gt;XListenerSets（Listener 和 Gateway 合并的标准化机制）&lt;/h3&gt;
&lt;!--
Lead: [Dave Protasowski](https://github.com/dprotaso)
--&gt;
&lt;p&gt;负责人：&lt;a href=&#34;https://github.com/dprotaso&#34;&gt;Dave Protasowski&lt;/a&gt;&lt;/p&gt;
&lt;!--
GEP-1713: [ListenerSets - Standard Mechanism to Merge Multiple Gateways](https://github.com/kubernetes-sigs/gateway-api/pull/3213)
--&gt;
&lt;p&gt;GEP-1713：&lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/pull/3213&#34;&gt;ListenerSets - 合并多个 Gateway 的标准机制&lt;/a&gt;&lt;/p&gt;
&lt;!--
This release adds a new experimental API kind, XListenerSet, that allows a
shared list of _listeners_ to be attached to one or more parent Gateway(s).  In
addition, it expands upon the existing suggestion that Gateway API implementations
may merge configuration from multiple Gateway objects.  It also:
--&gt;
&lt;p&gt;此版本添加了一个新的实验性 API 类型 XListenerSet，它允许将 &lt;strong&gt;listeners&lt;/strong&gt; 的共享列表附加到一个或多个父 Gateway。
此外，它还扩展了现有的建议，即 Gateway API 实现可以合并来自多个 Gateway 对象的配置。它还包括：&lt;/p&gt;
&lt;!--
- adds a new field `allowedListeners` to the `.spec` of a Gateway. The
`allowedListeners` field defines from which Namespaces to select XListenerSets
that are allowed to attach to that Gateway: Same, All, None, or Selector based.
--&gt;
&lt;ul&gt;
&lt;li&gt;向 Gateway 的 &lt;code&gt;.spec&lt;/code&gt; 添加了一个新字段 &lt;code&gt;allowedListeners&lt;/code&gt;。
&lt;code&gt;allowedListeners&lt;/code&gt; 字段定义了从哪些命名空间选择允许附加到该 Gateway 的 XListenerSets：
Same（同一命名空间）、All（所有命名空间）、None（不允许）、或基于选择器（Selector）的方式。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- increases the previous maximum number (64) of listeners with the addition of
XListenerSets.
--&gt;
&lt;ul&gt;
&lt;li&gt;借助 XListenerSets，突破了之前单个 Gateway 最多 64 个监听器的上限。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- allows the delegation of listener configuration, such as TLS, to applications in
other namespaces.
--&gt;
&lt;ul&gt;
&lt;li&gt;允许将监听器配置（如 TLS）委托给其他命名空间中的应用程序。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
To be able to use experimental XListenerSet, you need to install the
[Experimental channel Gateway API XListenerSet yaml](https://github.com/kubernetes-sigs/gateway-api/blob/main/config/crd/experimental/gateway.networking.x-k8s.io_xlistenersets.yaml).
--&gt;
&lt;p&gt;要使用实验性 XListenerSet，你需要安装&lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/main/config/crd/experimental/gateway.networking.x-k8s.io_xlistenersets.yaml&#34;&gt;实验渠道 Gateway API XListenerSet yaml&lt;/a&gt;。&lt;/p&gt;
&lt;!--
The following example shows a Gateway with an HTTP listener and two child HTTPS
XListenerSets with unique hostnames and certificates.  The combined set of listeners
attached to the Gateway includes the two additional HTTPS listeners in the
XListenerSets that attach to the Gateway.  This example illustrates the
delegation of listener TLS config to application owners in different namespaces
(&#34;store&#34; and &#34;app&#34;).  The HTTPRoute has both the Gateway listener named &#34;foo&#34; and
one XListenerSet listener named &#34;second&#34; as `parentRefs`.
--&gt;
&lt;p&gt;以下示例展示了一个带有 HTTP 监听器和两个子 HTTPS XListenerSets 的 Gateway，
每个 XListenerSet 都有唯一的主机名和证书。
最终附加到该 Gateway 的监听器集合包含这两个附加的 HTTPS &lt;code&gt;XListenerSet&lt;/code&gt; 监听器。
此示例说明了将监听器 TLS 配置委托给不同命名空间（&amp;quot;store&amp;quot; 和 &amp;quot;app&amp;quot;）中的应用程序所有者。
HTTPRoute 同时将名为 &lt;code&gt;&amp;quot;foo&amp;quot;&lt;/code&gt; 的 Gateway 监听器和一个名为 &lt;code&gt;&amp;quot;second&amp;quot;&lt;/code&gt; 的 &lt;code&gt;XListenerSet&lt;/code&gt;
监听器设置为其 &lt;code&gt;parentRefs&lt;/code&gt;。&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Gateway&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;prod-external&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;infra&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;gatewayClassName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;example&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;allowedListeners&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;from&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;All&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;listeners&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostname&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HTTP&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;80&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;---&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.x-k8s.io/v1alpha1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;XListenerSet&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;store&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;store&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parentRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;prod-external&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;listeners&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;first&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostname&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;first.foo.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HTTPS&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;443&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tls&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;mode&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Terminate&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;certificateRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Secret&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;group&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;first-workload-cert&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;---&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.x-k8s.io/v1alpha1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;XListenerSet&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;app&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;app&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parentRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;prod-external&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;listeners&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;second&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostname&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;second.foo.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;protocol&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HTTPS&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;port&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;443&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tls&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;mode&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Terminate&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;certificateRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Secret&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;group&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;second-workload-cert&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;---&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HTTPRoute&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;httproute-example&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parentRefs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;app&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;XListenerSet&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;sectionName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;second&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;parent-gateway&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Gateway&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;sectionName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;foo&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
Each listener in a Gateway must have a unique combination of `port`, `protocol`,
(and `hostname` if supported by the protocol) in order for all listeners to be
**compatible** and not conflicted over which traffic they should receive.
--&gt;
&lt;p&gt;Gateway 中的每个监听器必须具有唯一的 &lt;code&gt;port&lt;/code&gt;、&lt;code&gt;protocol&lt;/code&gt; 组合
（如果协议支持，还包括 &lt;code&gt;hostname&lt;/code&gt;），
以便所有监听器都&lt;strong&gt;兼容&lt;/strong&gt;，并且不会在它们应该接收的流量上发生冲突。&lt;/p&gt;
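&lt;p&gt;上述兼容性规则可以用一个简单的片段示意（假设性示例，主机名仅用于演示）：
两个监听器共享相同的 &lt;code&gt;port&lt;/code&gt; 和 &lt;code&gt;protocol&lt;/code&gt;，
但 &lt;code&gt;hostname&lt;/code&gt; 不同，因此彼此兼容：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;listeners:
- name: site-a
  hostname: a.example.com   # hostname 不同
  protocol: HTTP
  port: 80
- name: site-b
  hostname: b.example.com   # 因而与 site-a 兼容
  protocol: HTTP
  port: 80
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;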
&lt;!--
Furthermore, implementations can _merge_ separate Gateways into a single set of
listener addresses if all listeners across those Gateways are compatible.  The
management of merged listeners was under-specified in releases prior to v1.3.0.
--&gt;
&lt;p&gt;此外，如果多个 Gateway 上的所有监听器彼此兼容，实现可以将这些单独的 Gateway &lt;strong&gt;合并&lt;/strong&gt;为单个监听器地址集。
在 v1.3.0 之前的版本中，对合并监听器的管理缺乏明确规范。&lt;/p&gt;
&lt;!--
With the new feature, the specification on merging is expanded.  Implementations
must treat the parent Gateways as having the merged list of all listeners from
itself and from attached XListenerSets, and validation of this list of listeners
must behave the same as if the list were part of a single Gateway. Within a single
Gateway, listeners are ordered using the following precedence:
--&gt;
&lt;p&gt;新功能对合并规范进行了扩展。实现必须将父 Gateway 视为拥有一个合并后的监听器列表，
该列表同时包含其自身的监听器和所有附加的 XListenerSet 中的监听器；
对该监听器列表的验证行为必须与该列表属于单个 Gateway 时完全相同。
在单个 Gateway 内，监听器按以下优先级排序：&lt;/p&gt;
&lt;!--
1. Single Listeners (not a part of an XListenerSet) first,
--&gt;
&lt;ol&gt;
&lt;li&gt;不属于任何 XListenerSet 的独立监听器优先，&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
2. Remaining listeners ordered by:
   - object creation time (oldest first), and if two listeners are defined in
   objects that have the same timestamp, then
   - alphabetically based on &#34;{namespace}/{name of listener}&#34;
--&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;
&lt;p&gt;其余监听器按以下顺序排序：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;按对象创建时间排序（最早创建的优先）；&lt;/li&gt;
&lt;li&gt;如果两个监听器所在的对象具有相同的时间戳，
则按照 &lt;code&gt;{namespace}/{监听器名称}&lt;/code&gt; 的字母顺序排序&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
### Retry budgets (XBackendTrafficPolicy) {#XBackendTrafficPolicy}
--&gt;
&lt;h3 id=&#34;XBackendTrafficPolicy&#34;&gt;重试预算（Retry budgets）（XBackendTrafficPolicy）&lt;/h3&gt;
&lt;!--
Leads: [Eric Bishop](https://github.com/ericdbishop), [Mike Morris](https://github.com/mikemorris)
--&gt;
&lt;p&gt;负责人：&lt;a href=&#34;https://github.com/ericdbishop&#34;&gt;Eric Bishop&lt;/a&gt;、&lt;a href=&#34;https://github.com/mikemorris&#34;&gt;Mike Morris&lt;/a&gt;&lt;/p&gt;
&lt;!--
GEP-3388: [Retry Budgets](https://gateway-api.sigs.k8s.io/geps/gep-3388)
--&gt;
&lt;p&gt;GEP-3388：&lt;a href=&#34;https://gateway-api.sigs.k8s.io/geps/gep-3388&#34;&gt;重试预算（Retry budgets）&lt;/a&gt;&lt;/p&gt;
&lt;!--
This feature allows you to configure a _retry budget_ across all endpoints
of a destination Service.  This is used to limit additional client-side retries
after reaching a configured threshold. When configuring the budget, the maximum
percentage of active requests that may consist of retries may be specified, as well as
the interval over which requests will be considered when calculating the threshold
for retries. The development of this specification changed the existing
experimental API kind BackendLBPolicy into a new experimental API kind,
XBackendTrafficPolicy, in the interest of reducing the proliferation of policy
resources that had commonalities.
--&gt;
&lt;p&gt;此功能允许你为目标服务的所有端点配置&lt;strong&gt;重试预算（retry budget）&lt;/strong&gt;，
用于在达到配置的阈值后限制额外的客户端重试。
配置预算时，可以指定活动请求中重试所占的最大百分比，
以及计算重试阈值时所考察请求的时间区间。
在制定此规范的过程中，现有的实验性 API 类型 BackendLBPolicy 被改造为新的实验性 API 类型 XBackendTrafficPolicy，
以减少具有共同点的策略资源的扩散。&lt;/p&gt;
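&lt;p&gt;预算式限流的判定逻辑大致如下（仅为示意性草图，并非任何数据面实现的真实算法；函数名、参数与默认值均为本示例的假设）：&lt;/p&gt;

```python
# 示意：在给定重试预算下判断是否允许一次新的重试。
# 所有参数与默认值均为本示例假设，与真实实现无关。

def retry_allowed(active_requests: int,
                  active_retries: int,
                  retries_in_last_second: int,
                  budget_percent: float = 20.0,
                  min_retries_per_second: int = 3) -> bool:
    # 低于最小重试速率时总是允许，避免低流量下预算把重试压到 0
    if min_retries_per_second > retries_in_last_second:
        return True
    total = active_requests + active_retries
    if total == 0:
        return True
    # 活动请求中重试所占百分比必须低于预算百分比
    return budget_percent > active_retries / total * 100

print(retry_allowed(100, 10, 5))  # True：10/110 约 9%，低于 20%
print(retry_allowed(100, 30, 5))  # False：30/130 约 23%，超出预算
```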
&lt;!--
To be able to use experimental retry budgets, you need to install the
[Experimental channel Gateway API XBackendTrafficPolicy yaml](https://github.com/kubernetes-sigs/gateway-api/blob/main/config/crd/experimental/gateway.networking.x-k8s.io_xbackendtrafficpolicies.yaml).
--&gt;
&lt;p&gt;要使用实验性重试预算（Retry budgets），你需要安装&lt;a href=&#34;https://github.com/kubernetes-sigs/gateway-api/blob/main/config/crd/experimental/gateway.networking.x-k8s.io_xbackendtrafficpolicies.yaml&#34;&gt;实验渠道 Gateway API XBackendTrafficPolicy yaml&lt;/a&gt;。&lt;/p&gt;
&lt;!--
The following example shows an XBackendTrafficPolicy that applies a
`retryConstraint` that represents a budget that limits the retries to a maximum
of 20% of requests, over a duration of 10 seconds, and to a minimum of 3 retries
over 1 second.
--&gt;
&lt;p&gt;以下示例展示了一个 XBackendTrafficPolicy，其 &lt;code&gt;retryConstraint&lt;/code&gt;（重试约束）
表示这样一个重试预算：在 10 秒的时间窗口内，重试最多占请求的 20%，
同时保证每 1 秒最少允许 3 次重试。&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;gateway.networking.x-k8s.io/v1alpha1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;XBackendTrafficPolicy&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;traffic-policy-example&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;retryConstraint&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;budget&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;percent&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;20&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;interval&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;10s&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;minRetryRate&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;count&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;3&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;interval&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;1s&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
## Try it out
--&gt;
&lt;h2 id=&#34;try-it-out&#34;&gt;试用  &lt;/h2&gt;
&lt;!--
Unlike other Kubernetes APIs, you don&#39;t need to upgrade to the latest version of
Kubernetes to get the latest version of Gateway API. As long as you&#39;re running
Kubernetes 1.26 or later, you&#39;ll be able to get up and running with this version
of Gateway API.
--&gt;
&lt;p&gt;与其他 Kubernetes API 不同，你不需要升级到最新版本的 Kubernetes 来获取最新版本的 Gateway API。
只要你运行的是 Kubernetes 1.26 或更高版本，你就可以使用此版本的 Gateway API 启动和运行。&lt;/p&gt;
&lt;!--
To try out the API, follow the [Getting Started Guide](https://gateway-api.sigs.k8s.io/guides/).
As of this writing, four implementations are already conformant with Gateway API
v1.3 experimental channel features. In alphabetical order:
--&gt;
&lt;p&gt;要试用 API，请按照&lt;a href=&#34;https://gateway-api.sigs.k8s.io/guides/&#34;&gt;入门指南&lt;/a&gt;操作。
截至本文撰写时，已有四个实现符合 Gateway API v1.3 实验渠道功能。按字母顺序排列：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/airlock/microgateway/releases/tag/4.6.0&#34;&gt;Airlock Microgateway 4.6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/cilium/cilium&#34;&gt;Cilium main&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/envoyproxy/gateway/releases/tag/v1.4.0&#34;&gt;Envoy Gateway v1.4.0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://istio.io&#34;&gt;Istio 1.27-dev&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Get involved
--&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;参与其中  &lt;/h2&gt;
&lt;!--
Wondering when a feature will be added?  There are lots of opportunities to get
involved and help define the future of Kubernetes routing APIs for both ingress
and service mesh.
--&gt;
&lt;p&gt;想知道某个功能何时会被添加？有很多机会参与其中，帮助定义 Kubernetes 路由 API（涵盖 Ingress 和服务网格）的未来。&lt;/p&gt;
&lt;!--
* Check out the [user guides](https://gateway-api.sigs.k8s.io/guides) to see what use-cases can be addressed.
* Try out one of the [existing Gateway controllers](https://gateway-api.sigs.k8s.io/implementations/).
* Or [join us in the community](https://gateway-api.sigs.k8s.io/contributing/)
and help us build the future of Gateway API together!
--&gt;
&lt;ul&gt;
&lt;li&gt;查看&lt;a href=&#34;https://gateway-api.sigs.k8s.io/guides&#34;&gt;用户指南&lt;/a&gt;了解可以解决哪些用例。&lt;/li&gt;
&lt;li&gt;试用&lt;a href=&#34;https://gateway-api.sigs.k8s.io/implementations/&#34;&gt;现有的 Gateway 控制器&lt;/a&gt;之一。&lt;/li&gt;
&lt;li&gt;或者&lt;a href=&#34;https://gateway-api.sigs.k8s.io/contributing/&#34;&gt;加入我们的社区&lt;/a&gt;，
帮助我们共同构建 Gateway API 的未来！&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
The maintainers would like to thank _everyone_ who&#39;s contributed to Gateway
API, whether in the form of commits to the repo, discussion, ideas, or general
support. We could never have made this kind of progress without the support of
this dedicated and active community.
--&gt;
&lt;p&gt;维护者衷心感谢&lt;strong&gt;所有&lt;/strong&gt;为 Gateway API 做出贡献的人，无论是通过提交代码、讨论、想法还是提供其他支持。
没有这个充满热情和活力的社区，我们永远无法取得如此进展。&lt;/p&gt;
&lt;!--
## Related Kubernetes blog articles
--&gt;
&lt;h2 id=&#34;related-kubernetes-blog-articles&#34;&gt;相关 Kubernetes 博客文章  &lt;/h2&gt;
&lt;!--
* [Gateway API v1.2: WebSockets, Timeouts, Retries, and More](/blog/2024/11/21/gateway-api-v1-2/)
  (November 2024)
* [Gateway API v1.1: Service mesh, GRPCRoute, and a whole lot more](/blog/2024/05/09/gateway-api-v1-1/)
  (May 2024)
* [New Experimental Features in Gateway API v1.0](/blog/2023/11/28/gateway-api-ga/)
  (November 2023)
* [Gateway API v1.0: GA Release](/blog/2023/10/31/gateway-api-ga/)
  (October 2023)
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2024/11/21/gateway-api-v1-2/&#34;&gt;Gateway API v1.2：WebSockets、超时、重试等&lt;/a&gt;
（2024 年 11 月）&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/05/09/gateway-api-v1-1/&#34;&gt;Gateway API v1.1：服务网格、GRPCRoute 和更多变化&lt;/a&gt;
（2024 年 5 月）&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2023/11/28/gateway-api-ga/&#34;&gt;Gateway API v1.0 中的新实验功能&lt;/a&gt;
（2023 年 11 月）&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2023/10/31/gateway-api-ga/&#34;&gt;Gateway API v1.0：正式发布（GA）&lt;/a&gt;
（2023 年 10 月）&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33：原地调整 Pod 资源特性升级为 Beta</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/16/kubernetes-v1-33-in-place-pod-resize-beta/</link>
      <pubDate>Fri, 16 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/16/kubernetes-v1-33-in-place-pod-resize-beta/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.33: In-Place Pod Resize Graduated to Beta&#34;
slug: kubernetes-v1-33-in-place-pod-resize-beta
date: 2025-05-16T10:30:00-08:00
author: &#34;Tim Allclair (Google)&#34;
--&gt;
&lt;!--
On behalf of the Kubernetes project, I am excited to announce that the **in-place Pod resize** feature (also known as In-Place Pod Vertical Scaling), first introduced as alpha in Kubernetes v1.27, has graduated to **Beta** and will be enabled by default in the Kubernetes v1.33 release! This marks a significant milestone in making resource management for Kubernetes workloads more flexible and less disruptive.
--&gt;
&lt;p&gt;我代表 Kubernetes 项目，很高兴地宣布：&lt;strong&gt;原地 Pod 调整大小&lt;/strong&gt;特性
（也称为原地 Pod 垂直扩缩）在 Kubernetes v1.27 中作为 Alpha 版本首次引入，
现已升级为 &lt;strong&gt;Beta&lt;/strong&gt; 版本，并将在 Kubernetes v1.33 发行版中默认启用！
这是使 Kubernetes 工作负载的资源管理更加灵活、干扰更少的一个重要里程碑。&lt;/p&gt;
&lt;!--
## What is in-place Pod resize?

Traditionally, changing the CPU or memory resources allocated to a container required restarting the Pod. While acceptable for many stateless applications, this could be disruptive for stateful services, batch jobs, or any workloads sensitive to restarts.
--&gt;
&lt;h2 id=&#34;what-is-in-place-pod-resize&#34;&gt;什么是原地 Pod 调整大小？  &lt;/h2&gt;
&lt;p&gt;传统上，更改分配给容器的 CPU 或内存资源需要重启 Pod。
虽然这对于许多无状态应用来说是可以接受的，
但这对于有状态服务、批处理作业或任何对重启敏感的工作负载可能会造成干扰。&lt;/p&gt;
&lt;!--
In-place Pod resizing allows you to change the CPU and memory requests and limits assigned to containers within a *running* Pod, often without requiring a container restart.
--&gt;
&lt;p&gt;原地 Pod 调整大小允许你更改&lt;strong&gt;运行中&lt;/strong&gt;的 Pod 内容器的 CPU
和内存请求及限制，通常无需重启容器。&lt;/p&gt;
&lt;!--
Here&#39;s the core idea:
* The `spec.containers[*].resources` field in a Pod specification now represents the *desired* resources and is mutable for CPU and memory.
* The `status.containerStatuses[*].resources` field reflects the *actual* resources currently configured on a running container.
* You can trigger a resize by updating the desired resources in the Pod spec via the new `resize` subresource.
--&gt;
&lt;p&gt;核心思想如下：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pod 规约中的 &lt;code&gt;spec.containers[*].resources&lt;/code&gt; 字段现在代表&lt;strong&gt;期望的&lt;/strong&gt;资源，并且对于 CPU 和内存是可变更的。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;status.containerStatuses[*].resources&lt;/code&gt; 字段反映当前运行容器上已配置的&lt;strong&gt;实际&lt;/strong&gt;资源。&lt;/li&gt;
&lt;li&gt;你可以通过新的 &lt;code&gt;resize&lt;/code&gt; 子资源更新 Pod 规约中的期望资源来触发调整大小。&lt;/li&gt;
&lt;/ul&gt;
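&lt;p&gt;例如，通过 &lt;code&gt;resize&lt;/code&gt; 子资源提交的补丁体形如下面的结构（示意性草图：容器名 &lt;code&gt;app&lt;/code&gt; 与各资源数值均为本示例的假设）：&lt;/p&gt;

```python
# 示意：resize 子资源接受的补丁体；容器名与资源数值均为假设。
# 可配合 `kubectl patch pod POD_NAME --subresource resize` 的 `--patch` 参数使用。
import json

patch = {
    "spec": {
        "containers": [
            {
                "name": "app",  # 要调整的容器名（假设）
                "resources": {
                    "requests": {"cpu": "800m", "memory": "512Mi"},
                    "limits": {"cpu": "800m", "memory": "512Mi"},
                },
            }
        ]
    }
}
print(json.dumps(patch, indent=2))
```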
&lt;!--
You can try it out on a v1.33 Kubernetes cluster by using kubectl to edit a Pod (requires `kubectl` v1.32+):
--&gt;
&lt;p&gt;你可以在 v1.33 的 Kubernetes 集群上使用 kubectl 编辑
Pod 来尝试（需要 v1.32+ 的 kubectl）：&lt;/p&gt;
&lt;!--
```shell
kubectl edit pod &lt;pod-name&gt; --subresource resize
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl edit pod &amp;lt;Pod 名称&amp;gt; --subresource resize
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
For detailed usage instructions and examples, please refer to the official Kubernetes documentation:
[Resize CPU and Memory Resources assigned to Containers](/docs/tasks/configure-pod-container/resize-container-resources/).
--&gt;
&lt;p&gt;有关详细使用说明和示例，请参阅官方 Kubernetes 文档：
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/configure-pod-container/resize-container-resources/&#34;&gt;调整分配给容器的 CPU 和内存资源&lt;/a&gt;。&lt;/p&gt;
&lt;!--
## Why does in-place Pod resize matter?

Kubernetes still excels at scaling workloads horizontally (adding or removing replicas), but in-place Pod resizing unlocks several key benefits for vertical scaling:
--&gt;
&lt;h2 id=&#34;why-does-in-place-pod-resize-matter&#34;&gt;为什么原地 Pod 调整大小很重要？  &lt;/h2&gt;
&lt;p&gt;Kubernetes 在水平扩缩工作负载（添加或移除副本）方面仍然表现出色，但原地
Pod 调整大小为垂直扩缩解锁了几个关键优势：&lt;/p&gt;
&lt;!--
* **Reduced Disruption:** Stateful applications, long-running batch jobs, and sensitive workloads can have their resources adjusted without suffering the downtime or state loss associated with a Pod restart.
* **Improved Resource Utilization:** Scale down over-provisioned Pods without disruption, freeing up resources in the cluster. Conversely, provide more resources to Pods under heavy load without needing a restart.
* **Faster Scaling:** Address transient resource needs more quickly. For example Java applications often need more CPU during startup than during steady-state operation. Start with higher CPU and resize down later.
--&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;减少干扰：&lt;/strong&gt; 有状态应用、长时间运行的批处理作业和敏感工作负载可以在不经历
Pod 重启相关的停机或状态丢失的情况下调整资源。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;改进资源利用率：&lt;/strong&gt; 无需中断即可缩小过度配置的 Pod，从而释放集群中的资源。
相反，在重负载下的 Pod 可以在不重启的情况下获得更多的资源。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;更快的扩缩：&lt;/strong&gt; 更快速地解决瞬时资源需求。例如，Java
应用在启动期间通常比在稳定状态下需要更多的 CPU。
可以开始时使用更高的 CPU 配置，然后在之后调整减小。&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## What&#39;s changed between Alpha and Beta?

Since the alpha release in v1.27, significant work has gone into maturing the feature, improving its stability, and refining the user experience based on feedback and further development. Here are the key changes:
--&gt;
&lt;h2 id=&#34;whats-changed-between-alpha-and-beta&#34;&gt;从 Alpha 到 Beta 有哪些变化？  &lt;/h2&gt;
&lt;p&gt;自从 v1.27 的 Alpha 版本发布以来，为了完善此特性、
提高其稳定性并根据反馈和进一步开发优化用户体验，已经进行了大量工作。
以下是关键变化：&lt;/p&gt;
&lt;!--
### Notable user-facing changes

* **`resize` Subresource:** Modifying Pod resources must now be done via the Pod&#39;s `resize` subresource (`kubectl patch pod &lt;name&gt; --subresource resize ...`). `kubectl` versions v1.32+ support this argument.
* **Resize Status via Conditions:** The old `status.resize` field is deprecated. The status of a resize operation is now exposed via two Pod conditions:
    * `PodResizePending`: Indicates the Kubelet cannot grant the resize immediately (e.g., `reason: Deferred` if temporarily unable, `reason: Infeasible` if impossible on the node).
    * `PodResizeInProgress`: Indicates the resize is accepted and being applied. Errors encountered during this phase are now reported in this condition&#39;s message with `reason: Error`.
* **Sidecar Support:** Resizing &lt;a class=&#39;glossary-tooltip&#39; title=&#39;在 Pod 的整个生命期内保持运行的辅助容器。&#39; data-toggle=&#39;tooltip&#39; data-placement=&#39;top&#39; href=&#39;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/pods/sidecar-containers/&#39; target=&#39;_blank&#39; aria-label=&#39;sidecar containers&#39;&gt;sidecar containers&lt;/a&gt; in-place is now supported.
--&gt;
&lt;h3 id=&#34;显著的用户可感知的变化&#34;&gt;显著的用户可感知的变化&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;resize&lt;/code&gt; 子资源：&lt;/strong&gt; 修改 Pod 资源现在必须通过 Pod 的 &lt;code&gt;resize&lt;/code&gt;
子资源进行（&lt;code&gt;kubectl patch pod &amp;lt;name&amp;gt; --subresource resize ...&lt;/code&gt;）。
kubectl 版本 v1.32+ 支持此参数。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;通过状况显示调整大小状态：&lt;/strong&gt; 旧的 &lt;code&gt;status.resize&lt;/code&gt; 字段已被弃用。
调整大小操作的状态现在通过两个 Pod 状况暴露：
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;PodResizePending&lt;/code&gt;：表示 kubelet 无法立即批准调整大小
（例如，如果暂时不能，则 &lt;code&gt;reason: Deferred&lt;/code&gt;；如果在节点上不可能，则 &lt;code&gt;reason: Infeasible&lt;/code&gt;）。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PodResizeInProgress&lt;/code&gt;：表示调整大小已被接受并正在应用。
在此阶段遇到的错误现在会在此状况的消息中报告为 &lt;code&gt;reason: Error&lt;/code&gt;。&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;支持边车容器：&lt;/strong&gt; 现在支持对&lt;a class=&#39;glossary-tooltip&#39; title=&#39;在 Pod 的整个生命期内保持运行的辅助容器。&#39; data-toggle=&#39;tooltip&#39; data-placement=&#39;top&#39; href=&#39;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/pods/sidecar-containers/&#39; target=&#39;_blank&#39; aria-label=&#39;边车容器&#39;&gt;边车容器&lt;/a&gt;进行原地调整大小。&lt;/li&gt;
&lt;/ul&gt;
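&lt;p&gt;基于状况判断调整大小的进展大致如下（示意性草图：&lt;code&gt;pod&lt;/code&gt; 字典为手工构造的示例数据，仅模仿 Pod API 中 &lt;code&gt;status.conditions&lt;/code&gt; 的形状，并非真实客户端代码）：&lt;/p&gt;

```python
# 示意：从 Pod 状态中读取调整大小相关的状况。
# pod 字典是手工构造的示例数据，字段形状模仿 Pod API 的 status.conditions。

def resize_state(pod: dict) -> str:
    conditions = {c["type"]: c for c in pod.get("status", {}).get("conditions", [])}
    if "PodResizePending" in conditions:
        return f'pending ({conditions["PodResizePending"].get("reason", "")})'
    if "PodResizeInProgress" in conditions:
        return "in-progress"
    return "none"

pod = {
    "status": {
        "conditions": [
            {"type": "PodResizePending", "reason": "Deferred",
             "message": "Node is under memory pressure"},
        ]
    }
}
print(resize_state(pod))  # pending (Deferred)
```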
&lt;!--
### Stability and reliability enhancements

* **Refined Allocated Resources Management:** The allocation management logic with the Kubelet was significantly reworked, making it more consistent and robust. The changes eliminated whole classes of bugs, and greatly improved the reliability of in-place Pod resize.
--&gt;
&lt;h3 id=&#34;稳定性和可靠性增强&#34;&gt;稳定性和可靠性增强&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;改进的已分配资源管理：&lt;/strong&gt; 对 kubelet 的分配管理逻辑进行了重大重构，
使其更加一致和稳健。这些更改消除了整类的 Bug，并大大提高了原地 Pod 调整大小的可靠性。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* **Improved Checkpointing &amp; State Tracking:** A more robust system for tracking &#34;allocated&#34; and &#34;actuated&#34; resources was implemented, using new checkpoint files (`allocated_pods_state`, `actuated_pods_state`) to reliably manage resize state across Kubelet restarts and handle edge cases where runtime-reported resources differ from requested ones. Several bugs related to checkpointing and state restoration were fixed. Checkpointing efficiency was also improved.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;改进的检查点操作和状态跟踪操作：&lt;/strong&gt; 实现了更健壮的系统来跟踪“已分配”和“已执行”的资源，
使用新的检查点文件（&lt;code&gt;allocated_pods_state&lt;/code&gt;，&lt;code&gt;actuated_pods_state&lt;/code&gt;）以可靠地管理
kubelet 重启时的调整大小状态，并处理运行时报告的资源与请求的资源不同的边缘情况。
修复了几个与检查点和状态恢复相关的错误。还提高了检查点的效率。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* **Faster Resize Detection:** Enhancements to the Kubelet&#39;s Pod Lifecycle Event Generator (PLEG) allow the Kubelet to respond to and complete resizes much more quickly.
* **Enhanced CRI Integration:** A new `UpdatePodSandboxResources` CRI call was added to better inform runtimes and plugins (like NRI) about Pod-level resource changes.
* **Numerous Bug Fixes:** Addressed issues related to systemd cgroup drivers, handling of containers without limits, CPU minimum share calculations, container restart backoffs, error propagation, test stability, and more.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;更快的调整大小检测：&lt;/strong&gt; 对 kubelet 的 Pod 生命周期事件生成器（PLEG）进行了增强，
使 kubelet 能够更快地响应并完成大小调整。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;增强的 CRI 集成：&lt;/strong&gt; 添加了新的 &lt;code&gt;UpdatePodSandboxResources&lt;/code&gt; CRI 调用，
以更好地通知运行时和插件（如 NRI）有关 Pod 级别的资源变化。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;众多 Bug 修复：&lt;/strong&gt; 解决了与 systemd CGroup 驱动程序、未设资源限制的容器的处理、CPU
最小份额计算、容器重启退避、错误传播、测试稳定性等相关的问题。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## What&#39;s next?

Graduating to Beta means the feature is ready for broader adoption, but development doesn&#39;t stop here! Here&#39;s what the community is focusing on next:
--&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;接下来是什么？  &lt;/h2&gt;
&lt;p&gt;晋升为 Beta 意味着该特性已经准备好被更广泛地采用，但开发工作并不会止步于此！
以下是社区接下来的关注重点：&lt;/p&gt;
&lt;!--
* **Stability and Productionization:** Continued focus on hardening the feature, improving performance, and ensuring it is robust for production environments.
* **Addressing Limitations:** Working towards relaxing some of the current limitations noted in the documentation, such as allowing memory limit decreases.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;稳定性和产品化：&lt;/strong&gt; 持续关注增强特性，提升性能，并确保它在生产环境中足够稳健。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;解决限制：&lt;/strong&gt; 致力于解除文档中提到的一些当前限制，例如允许降低内存限制值。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
* **[VerticalPodAutoscaler](/docs/concepts/workloads/autoscaling/#scaling-workloads-vertically) (VPA) Integration:** Work to enable VPA to leverage in-place Pod resize is already underway. A new `InPlaceOrRecreate` update mode will allow it to attempt non-disruptive resizes first, or fall back to recreation if needed. This will allow users to benefit from VPA&#39;s recommendations with significantly less disruption.
* **User Feedback:** Gathering feedback from users adopting the beta feature is crucial for prioritizing further enhancements and addressing any uncovered issues or bugs.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/autoscaling/#scaling-workloads-vertically&#34;&gt;垂直 Pod 自动扩缩&lt;/a&gt;（VPA）集成：&lt;/strong&gt;
使 VPA 能够利用原地 Pod 调整大小的工作已经在进行中。新的 &lt;code&gt;InPlaceOrRecreate&lt;/code&gt;
更新模式将允许 VPA 首先尝试非干扰性的原地调整大小，并在需要时回退到重建 Pod。
这将使用户能够受益于 VPA 的建议，并显著减少干扰。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;用户反馈：&lt;/strong&gt; 收集采用 Beta 版特性的用户反馈，对于优先处理后续的增强特性以及解决发现的任何问题或错误至关重要。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Getting started and providing feedback

With the `InPlacePodVerticalScaling` feature gate enabled by default in v1.33, you can start experimenting with in-place Pod resizing right away!

Refer to the [documentation](/docs/tasks/configure-pod-container/resize-container-resources/) for detailed guides and examples.
--&gt;
&lt;h2 id=&#34;getting-started-and-providing-feedback&#34;&gt;开始使用并提供反馈  &lt;/h2&gt;
&lt;p&gt;随着 &lt;code&gt;InPlacePodVerticalScaling&lt;/code&gt; 特性门控在 v1.33 中默认启用，
你可以立即开始尝试原地 Pod 调整大小！&lt;/p&gt;
&lt;p&gt;参考&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/configure-pod-container/resize-container-resources/&#34;&gt;文档&lt;/a&gt;获取详细的指南和示例。&lt;/p&gt;
&lt;!--
As this feature moves through Beta, your feedback is invaluable. Please report any issues or share your experiences via the standard Kubernetes communication channels (GitHub issues, mailing lists, Slack). You can also review the [KEP-1287: In-place Update of Pod Resources](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources) for the full in-depth design details.

We look forward to seeing how the community leverages in-place Pod resize to build more efficient and resilient applications on Kubernetes!
--&gt;
&lt;p&gt;随着此特性从 Beta 阶段逐步推进，你的反馈是无价的。请通过 Kubernetes
标准沟通渠道（GitHub Issues、邮件列表、Slack）报告任何问题或分享你的经验。
你也可以查看
&lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources&#34;&gt;KEP-1287: In-place Update of Pod Resources&lt;/a&gt;
以获取完整的深入设计细节。&lt;/p&gt;
&lt;p&gt;我们期待看到社区如何利用原地 Pod 调整大小来构建更高效、弹性更好的 Kubernetes 应用！&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes 1.33：Job 的 SuccessPolicy 进阶至 GA</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/15/kubernetes-1-33-jobs-success-policy-goes-ga/</link>
      <pubDate>Thu, 15 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/15/kubernetes-1-33-jobs-success-policy-goes-ga/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes 1.33: Job&#39;s SuccessPolicy Goes GA&#34;
date: 2025-05-15T10:30:00-08:00
slug: kubernetes-1-33-jobs-success-policy-goes-ga
authors: &gt;
  [Yuki Iwai](https://github.com/tenzen-y) (CyberAgent, Inc)
--&gt;
&lt;!--
On behalf of the Kubernetes project, I&#39;m pleased to announce that Job _success policy_ has graduated to General Availability (GA) as part of the v1.33 release.
--&gt;
&lt;p&gt;我代表 Kubernetes 项目组，很高兴地宣布在 v1.33 版本中，Job 的&lt;strong&gt;成功策略&lt;/strong&gt;已进阶至 GA（正式发布）。&lt;/p&gt;
&lt;!--
## About Job&#39;s Success Policy

In batch workloads, you might want to use leader-follower patterns like [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface),
in which the leader controls the execution, including the followers&#39; lifecycle.
--&gt;
&lt;h2 id=&#34;about-jobs-success-policy&#34;&gt;关于 Job 的成功策略  &lt;/h2&gt;
&lt;p&gt;在批处理工作负载中，你可能希望使用类似
&lt;a href=&#34;https://zh.wikipedia.org/zh-cn/%E8%A8%8A%E6%81%AF%E5%82%B3%E9%81%9E%E4%BB%8B%E9%9D%A2&#34;&gt;MPI（消息传递接口）&lt;/a&gt;
的领导者跟随者（leader-follower）模式，其中领导者控制执行过程，包括跟随者的生命周期。&lt;/p&gt;
&lt;!--
In this case, you might want to mark it as succeeded
even if some of the indexes failed. Unfortunately, a leader-follower Kubernetes Job that didn&#39;t use a success policy, in most cases, would have to require **all** Pods to finish successfully
for that Job to reach an overall succeeded state.

For Kubernetes Jobs, the API allows you to specify the early exit criteria using the `.spec.successPolicy`
field (you can only use the `.spec.successPolicy` field for an [indexed Job](/docs/concept/workloads/controllers/job/#completion-mode)).
Which describes a set of rules either using a list of succeeded indexes for a job, or defining a minimal required size of succeeded indexes.
--&gt;
&lt;p&gt;在这种情况下，即使某些索引失败了，你也可能希望将 Job 标记为成功。
然而，在没有使用成功策略的情况下，Kubernetes 中的领导者跟随者
Job 通常必须要求&lt;strong&gt;所有&lt;/strong&gt; Pod 成功完成，整个 Job 才会被视为成功。&lt;/p&gt;
&lt;p&gt;对于 Kubernetes Job，API 允许你通过 &lt;code&gt;.spec.successPolicy&lt;/code&gt; 字段指定提前退出的条件
（你只能将此字段用于&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/job/#completion-mode&#34;&gt;带索引的 Job&lt;/a&gt;）。
此字段通过使用已成功的索引列表或定义成功索引的最小数量来描述一组规则。&lt;/p&gt;
&lt;!--
This newly stable field is especially valuable for scientific simulation, AI/ML and High-Performance Computing (HPC) batch workloads.
Users in these areas often run numerous experiments and may only need a specific number to complete successfully, rather than requiring all of them to succeed. 
In this case, the leader index failure is the only relevant Job exit criteria, and the outcomes for individual follower Pods are handled
only indirectly via the status of the leader index.
Moreover, followers do not know when they can terminate themselves.
--&gt;
&lt;p&gt;这个全新的稳定字段对科学仿真、AI/ML 和高性能计算（HPC）等批处理工作负载特别有价值。
这些领域的用户通常会运行大量实验，而他们可能只需要其中一部分成功完成，而不需要全部成功。
在这种情况下，领导者索引失败是对应 Job 的唯一重要退出条件，个别跟随者 Pod
的结果仅通过领导者索引的状态间接被处理。此外，跟随者自身并不知道何时可以终止。&lt;/p&gt;
&lt;!--
After Job meets any __Success Policy__, the Job is marked as succeeded, and all Pods are terminated including the running ones.

## How it works

The following excerpt from a Job manifest, using `.successPolicy.rules[0].succeededCount`, shows an example of
using a custom success policy:
--&gt;
&lt;p&gt;一旦 Job 满足任一&lt;strong&gt;成功策略&lt;/strong&gt;，此 Job 就会被标记为成功，并终止所有 Pod，包括正在运行的 Pod。&lt;/p&gt;
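&lt;p&gt;成功策略规则的判定语义可以用下面的草图来理解（仅为示意，并非 Job 控制器的真实实现；函数名与参数名均为本示例的假设）：&lt;/p&gt;

```python
# 示意：判断一组已成功完成的索引是否满足一条成功策略规则。
# succeeded_indexes 对应 succeededIndexes，succeeded_count 对应 succeededCount。

def rule_satisfied(succeeded: set,
                   succeeded_indexes: set = None,
                   succeeded_count: int = None) -> bool:
    # succeededIndexes 限定计入统计的索引范围
    pool = succeeded if succeeded_indexes is None else succeeded.intersection(succeeded_indexes)
    if succeeded_count is not None:
        # 指定 succeededCount 时，范围内成功的索引数达到该值即满足
        return len(pool) >= succeeded_count
    # 仅指定 succeededIndexes 时，要求其中所有索引均已成功
    return succeeded_indexes is not None and succeeded_indexes.issubset(succeeded)

# 领导者为索引 0、succeededCount 为 1 的规则：
print(rule_satisfied({0, 3}, succeeded_indexes={0}, succeeded_count=1))  # True
print(rule_satisfied({3}, succeeded_indexes={0}, succeeded_count=1))     # False
```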
&lt;h2 id=&#34;how-it-works&#34;&gt;工作原理  &lt;/h2&gt;
&lt;p&gt;以下是使用 &lt;code&gt;.successPolicy.rules[0].succeededCount&lt;/code&gt; 的 Job 清单片段，
这是一个自定义成功策略的例子：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parallelism&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completionMode&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Indexed&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;successPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;succeededCount&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
Here, the Job is marked as succeeded once any single index succeeds, regardless of which one.
Additionally, you can constrain which indexes count toward `succeededCount` by setting `.spec.successPolicy.rules[0].succeededIndexes`,
as shown below:
--&gt;
&lt;p&gt;在这里，只要有任意一个索引成功，Job 就会被标记为成功，而不管具体是哪个索引。
此外，你还可以通过设置 &lt;code&gt;.spec.successPolicy.rules[0].succeededIndexes&lt;/code&gt; 来限制哪些索引计入 &lt;code&gt;succeededCount&lt;/code&gt;，如下所示：&lt;/p&gt;
&lt;!--
```yaml
parallelism: 10
completions: 10
completionMode: Indexed
successPolicy:
  rules:
  - succeededIndexes: 0 # index of the leader Pod
    succeededCount: 1
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parallelism&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completionMode&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Indexed&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;successPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;succeededIndexes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 领导者 Pod 的索引&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;succeededCount&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This example shows that the Job will be marked as succeeded once a Pod with a specific index (Pod index 0) has succeeded.

Once the Job either meets one of the `successPolicy` rules, or achieves its `Complete` criteria based on `.spec.completions`,
the Job controller within kube-controller-manager adds the `SuccessCriteriaMet` condition to the Job status.
After that, the job-controller initiates cleanup and termination of Pods for Jobs with the `SuccessCriteriaMet` condition.
Eventually, the Job obtains the `Complete` condition once the job-controller has finished cleanup and termination.
--&gt;
&lt;p&gt;这个例子表示，只要具有特定索引（Pod 索引 0）的 Pod 成功，整个 Job 就会被标记为成功。&lt;/p&gt;
&lt;p&gt;一旦 Job 满足任一条 &lt;code&gt;successPolicy&lt;/code&gt; 规则，或根据 &lt;code&gt;.spec.completions&lt;/code&gt; 达到其 &lt;code&gt;Complete&lt;/code&gt; 条件，
kube-controller-manager 中的 Job 控制器就会向 Job 状态添加 &lt;code&gt;SuccessCriteriaMet&lt;/code&gt; 状况。
之后，job-controller 会为具有 &lt;code&gt;SuccessCriteriaMet&lt;/code&gt; 状况的 Job 发起 Pod 的清理和终止。
当 job-controller 完成清理和终止后，Job 会获得 &lt;code&gt;Complete&lt;/code&gt; 状况。&lt;/p&gt;
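&lt;p&gt;作为示意，下面的 Job 状态片段（字段值仅为假设）展示了这两个状况出现的先后关系：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;status:
  conditions:
  # SuccessCriteriaMet 先由 Job 控制器添加
  - type: SuccessCriteriaMet
    status: &#34;True&#34;
  # 清理和终止完成后，Job 获得 Complete 状况
  - type: Complete
    status: &#34;True&#34;
&lt;/code&gt;&lt;/pre&gt;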
&lt;!--
## Learn more

- Read the documentation for
  [success policy](/docs/concepts/workloads/controllers/job/#success-policy).
- Read the KEP for the [Job success/completion policy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3998-job-success-completion-policy)
--&gt;
&lt;h2 id=&#34;learn-more&#34;&gt;了解更多&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;阅读关于&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/job/#success-policy&#34;&gt;成功策略的文档&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;阅读关于 &lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3998-job-success-completion-policy&#34;&gt;Job 成功/完成策略的 KEP&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Get involved

This work was led by the Kubernetes
[batch working group](https://github.com/kubernetes/community/tree/master/wg-batch)
in close collaboration with the
[SIG Apps](https://github.com/kubernetes/community/tree/master/sig-apps) community.

If you are interested in working on new features in the space I recommend
subscribing to our [Slack](https://kubernetes.slack.com/messages/wg-batch)
channel and attending the regular community meetings.
--&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;加入我们&lt;/h2&gt;
&lt;p&gt;这项工作由 Kubernetes 的
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/wg-batch&#34;&gt;Batch Working Group（批处理工作组）&lt;/a&gt;牵头，并与
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps&#34;&gt;SIG Apps&lt;/a&gt; 社区密切协作。&lt;/p&gt;
&lt;p&gt;如果你对此领域的新特性开发感兴趣，推荐你订阅我们的
&lt;a href=&#34;https://kubernetes.slack.com/messages/wg-batch&#34;&gt;Slack 频道&lt;/a&gt;，并参加定期举行的社区会议。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33：容器生命周期更新</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/14/kubernetes-v1-33-updates-to-container-lifecycle/</link>
      <pubDate>Wed, 14 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/14/kubernetes-v1-33-updates-to-container-lifecycle/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.33: Updates to Container Lifecycle&#34;
date: 2025-05-14T10:30:00-08:00
slug: kubernetes-v1-33-updates-to-container-lifecycle
author: &gt;
  Sreeram Venkitesh (DigitalOcean)
--&gt;
&lt;!--
Kubernetes v1.33 introduces a few updates to the lifecycle of containers. The Sleep action for container lifecycle hooks now supports a zero sleep duration (feature enabled by default).
There is also alpha support for customizing the stop signal sent to containers when they are being terminated.
--&gt;
&lt;p&gt;Kubernetes v1.33 引入了对容器生命周期的一些更新。
容器生命周期回调的 Sleep 动作现在支持零睡眠时长（特性默认启用）。
同时还为定制发送给终止中的容器的停止信号提供了 Alpha 级别支持。&lt;/p&gt;
&lt;!--
This blog post goes into the details of these new aspects of the container lifecycle, and how you can use them.
--&gt;
&lt;p&gt;这篇博客文章深入介绍了容器生命周期的这些新内容，以及如何使用它们。&lt;/p&gt;
&lt;!--
## Zero value for Sleep action

Kubernetes v1.29 introduced the `Sleep` action for container PreStop and PostStart Lifecycle hooks. The Sleep action lets your containers pause for a specified duration after the container is started or before it is terminated. This was needed to provide a straightforward way to manage graceful shutdowns. Before the Sleep action, folks used to run the `sleep` command using the exec action in their container lifecycle hooks. If you wanted to do this you&#39;d need to have the binary for the `sleep` command in your container image. This is difficult if you&#39;re using third party images. 
--&gt;
&lt;h2 id=&#34;sleep-动作的零值&#34;&gt;Sleep 动作的零值&lt;/h2&gt;
&lt;p&gt;Kubernetes v1.29 引入了容器 PreStop 和 PostStart 生命周期回调的 &lt;code&gt;Sleep&lt;/code&gt; 动作。
Sleep 动作允许你的容器在启动后或终止前暂停指定的时长。这为管理优雅关闭提供了一种直接的方法。
在 Sleep 动作之前，人们常使用生命周期回调中的 exec 动作运行 &lt;code&gt;sleep&lt;/code&gt; 命令。
如果你想这样做，则需要在你的容器镜像中包含 &lt;code&gt;sleep&lt;/code&gt; 命令的二进制文件。
如果你使用第三方镜像，这可能会比较困难。&lt;/p&gt;
&lt;!--
The sleep action when it was added initially didn&#39;t have support for a sleep duration of zero seconds. The `time.Sleep` which the Sleep action uses under the hood supports a duration of zero seconds. Using a negative or a zero value for the sleep returns immediately, resulting in a no-op. We wanted the same behaviour with the sleep action. This support for the zero duration was later added in v1.32, with the `PodLifecycleSleepActionAllowZero` feature gate.
--&gt;
&lt;p&gt;最初添加 Sleep 动作时，并不支持零秒的睡眠时间。
然而，&lt;code&gt;time.Sleep&lt;/code&gt;（Sleep 动作底层使用的机制）是支持零秒的持续时间的。
使用负值或零值进行睡眠会立即返回，导致无操作。我们希望 Sleep 动作也有相同的行为。
后来在 v1.32 中通过&lt;strong&gt;特性门控&lt;/strong&gt; &lt;code&gt;PodLifecycleSleepActionAllowZero&lt;/code&gt; 添加了这种对零持续时间的支持。&lt;/p&gt;
&lt;!--
The `PodLifecycleSleepActionAllowZero` feature gate has graduated to beta in v1.33, and is now enabled by default.
The original Sleep action for `preStop` and `postStart` hooks has been enabled by default, starting from Kubernetes v1.30.
With a cluster running Kubernetes v1.33, you are able to set a
zero duration for sleep lifecycle hooks. For a cluster with default configuration, you don&#39;t need 
to enable any feature gate to make that possible.
--&gt;
&lt;p&gt;&lt;code&gt;PodLifecycleSleepActionAllowZero&lt;/code&gt; 特性门控在 v1.33 中已升级到 Beta 阶段，并且现在默认启用。
从 Kubernetes v1.30 开始，&lt;code&gt;preStop&lt;/code&gt; 和 &lt;code&gt;postStart&lt;/code&gt; 回调的原始 Sleep 动作默认情况下已启用。
使用运行 Kubernetes v1.33 的集群时，你可以为 Sleep 生命周期回调设置零持续时间。
对于采用默认配置的集群，你无需启用任何特性门控即可实现这一点。&lt;/p&gt;
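&lt;p&gt;例如，下面的容器规约片段（仅为示意）为 &lt;code&gt;preStop&lt;/code&gt; 回调设置了零秒的 Sleep 动作，其效果等同于无操作：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;lifecycle:
  preStop:
    sleep:
      seconds: 0  # 立即返回，相当于无操作
&lt;/code&gt;&lt;/pre&gt;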
&lt;!--
## Container stop signals

Container runtimes such as containerd and CRI-O honor a `StopSignal` instruction in the container image definition. This can be used to specify a custom stop signal
that the runtime will use to terminate containers based on that image.
Stop signal configuration was not originally part of the Pod API in Kubernetes.
Until Kubernetes v1.33, the only way to override the stop signal for containers was by rebuilding your container image with the new custom stop signal
(for example, specifying `STOPSIGNAL` in a `Containerfile` or `Dockerfile`).
--&gt;
&lt;h2 id=&#34;容器停止信号&#34;&gt;容器停止信号&lt;/h2&gt;
&lt;p&gt;容器运行时如 containerd 和 CRI-O 支持容器镜像定义中的 &lt;code&gt;StopSignal&lt;/code&gt; 指令。
这可以用来指定一个自定义的停止信号，运行时将使用该信号来终止基于此镜像的容器。
停止信号配置最初并不是 Kubernetes Pod API 的一部分。
直到 Kubernetes v1.33，覆盖容器停止信号的唯一方法是通过使用新的自定义停止信号重建容器镜像
（例如，在 &lt;code&gt;Containerfile&lt;/code&gt; 或 &lt;code&gt;Dockerfile&lt;/code&gt; 中指定 &lt;code&gt;STOPSIGNAL&lt;/code&gt;）。&lt;/p&gt;
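&lt;p&gt;例如，在 v1.33 之前，要更改停止信号，你需要像下面这样重建镜像（镜像名与信号仅为示意）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-dockerfile&#34;&gt;FROM nginx:latest
# 覆盖镜像的默认停止信号
STOPSIGNAL SIGQUIT
&lt;/code&gt;&lt;/pre&gt;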
&lt;!--
The `ContainerStopSignals` feature gate which is newly added in Kubernetes v1.33 adds stop signals to the Kubernetes API. This allows users to specify a custom stop signal in the container spec. Stop signals are added to the API as a new lifecycle along with the existing PreStop and PostStart lifecycle handlers. In order to use this feature, we expect the Pod to have the operating system specified with `spec.os.name`. This is enforced so that we can cross-validate the stop signal against the operating system and make sure that the containers in the Pod are created with a valid stop signal for the operating system the Pod is being scheduled to. For Pods scheduled on Windows nodes, only `SIGTERM` and `SIGKILL` are allowed as valid stop signals. Find the full list of signals supported in Linux nodes [here](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L2985-L3053).
--&gt;
&lt;p&gt;&lt;code&gt;ContainerStopSignals&lt;/code&gt; 特性门控是 Kubernetes v1.33 新增的，
它将停止信号添加到了 Kubernetes API。这允许用户在容器规格中指定自定义的停止信号。
停止信号作为新生命周期加入 API，连同现有的 PreStop 和 PostStart 生命周期处理器一起使用。
要使用这个特性，Pod 需要用 &lt;code&gt;spec.os.name&lt;/code&gt; 指定操作系统。这是为了能对操作系统进行停止信号的交叉验证，
确保 Pod 中的容器是以适合其调度操作系统的有效停止信号创建的。对于调度到 Windows 节点上的 Pod，
仅允许 &lt;code&gt;SIGTERM&lt;/code&gt; 和 &lt;code&gt;SIGKILL&lt;/code&gt; 作为有效的停止信号。
&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L2985-L3053&#34;&gt;这里&lt;/a&gt;可以找到
Linux 节点支持的完整信号列表。&lt;/p&gt;
&lt;!--
### Default behaviour

If a container has a custom stop signal defined in its lifecycle, the container runtime would use the signal defined in the lifecycle to kill the container, given that the container runtime also supports custom stop signals. If there is no custom stop signal defined in the container lifecycle, the runtime would fallback to the stop signal defined in the container image. If there is no stop signal defined in the container image, the default stop signal of the runtime would be used. The default signal is `SIGTERM` for both containerd and CRI-O.
--&gt;
&lt;h3 id=&#34;默认行为&#34;&gt;默认行为&lt;/h3&gt;
&lt;p&gt;如果容器在其生命周期中定义了自定义停止信号，那么只要容器运行时也支持自定义停止信号，
容器运行时就会使用生命周期中定义的信号来终止容器。如果容器生命周期中没有定义自定义停止信号，
运行时将回退到容器镜像中定义的停止信号。如果在容器镜像中也没有定义停止信号，
将会使用运行时的默认停止信号。对于 containerd 和 CRI-O，默认信号都是 &lt;code&gt;SIGTERM&lt;/code&gt;。&lt;/p&gt;
&lt;!--
### Version skew

For the feature to work as intended, both the Kubernetes version and the container runtime must support container stop signals. The changes to the Kubernetes API and kubelet are available in alpha stage from v1.33, which can be enabled with the `ContainerStopSignals` feature gate. The container runtime implementations for containerd and CRI-O are still a work in progress and will be rolled out soon.
--&gt;
&lt;h3 id=&#34;版本偏差&#34;&gt;版本偏差&lt;/h3&gt;
&lt;p&gt;为了使该特性按预期工作，Kubernetes 和容器运行时的版本都应支持容器停止信号。
对 Kubernetes API 和 kubelet 的更改从 v1.33 开始进入 Alpha 阶段，
可以通过启用 &lt;code&gt;ContainerStopSignals&lt;/code&gt; 特性门控来使用。
containerd 和 CRI-O 的容器运行时实现仍在进行中，不久将会发布。&lt;/p&gt;
&lt;!--
### Using container stop signals

To enable this feature, you need to turn on the `ContainerStopSignals` feature gate in both the kube-apiserver and the kubelet. Once you have nodes where the feature gate is turned on, you can create Pods with a StopSignal lifecycle and a valid OS name like so:
--&gt;
&lt;h3 id=&#34;使用容器停止信号&#34;&gt;使用容器停止信号&lt;/h3&gt;
&lt;p&gt;要启用此特性，你需要在 kube-apiserver 和 kubelet 中打开 &lt;code&gt;ContainerStopSignals&lt;/code&gt; 特性门控。
一旦你在节点上启用了特性门控，就可以创建带有 StopSignal 生命周期和有效操作系统名称的 Pod，如下所示：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;os&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;linux&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;nginx:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;lifecycle&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;stopSignal&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;SIGUSR1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
Do note that the `SIGUSR1` signal in this example can only be used if the container&#39;s Pod is scheduled to a Linux node. Hence we need to specify `spec.os.name` as `linux` to be able to use the signal. You will only be able to configure `SIGTERM` and `SIGKILL` signals if the Pod is being scheduled to a Windows node. You cannot specify a `containers[*].lifecycle.stopSignal` if the `spec.os.name` field is nil or unset either.
--&gt;
&lt;p&gt;请注意，此示例中的 &lt;code&gt;SIGUSR1&lt;/code&gt; 信号仅在容器的 Pod 被调度到 Linux 节点时才能使用。
因此，我们需要指定 &lt;code&gt;spec.os.name&lt;/code&gt; 为 &lt;code&gt;linux&lt;/code&gt; 才能使用该信号。
如果 Pod 被调度到 Windows 节点，则你只能配置 &lt;code&gt;SIGTERM&lt;/code&gt; 和 &lt;code&gt;SIGKILL&lt;/code&gt; 信号。
此外，如果 &lt;code&gt;spec.os.name&lt;/code&gt; 字段为 nil 或未设置，你也不能指定 &lt;code&gt;containers[*].lifecycle.stopSignal&lt;/code&gt;。&lt;/p&gt;
&lt;!--
## How do I get involved?

This feature is driven by the [SIG Node](https://github.com/Kubernetes/community/blob/master/sig-node/README.md). If you are interested in helping develop this feature, sharing feedback, or participating in any other ongoing SIG Node projects, please reach out to us!
--&gt;
&lt;h2 id=&#34;我如何参与&#34;&gt;我如何参与？&lt;/h2&gt;
&lt;p&gt;此特性由 &lt;a href=&#34;https://github.com/Kubernetes/community/blob/master/sig-node/README.md&#34;&gt;SIG Node&lt;/a&gt;
推动。如果你有兴趣帮助开发此特性、分享反馈或参与任何其他正在进行的 SIG Node 项目，请联系我们！&lt;/p&gt;
&lt;!--
You can reach SIG Node by several means:
- Slack: [#sig-node](https://kubernetes.slack.com/messages/sig-node)
- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-node)
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/sig%2Fnode)

You can also contact me directly:
- GitHub: @sreeram-venkitesh
- Slack: @sreeram.venkitesh
--&gt;
&lt;p&gt;你可以通过几种方式联系 SIG Node：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Slack：&lt;a href=&#34;https://kubernetes.slack.com/messages/sig-node&#34;&gt;#sig-node&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://groups.google.com/forum/#!forum/kubernetes-sig-node&#34;&gt;邮件列表&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/community/labels/sig%2Fnode&#34;&gt;开放社区 Issues/PRs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;你也可以直接联系我：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GitHub：@sreeram-venkitesh&lt;/li&gt;
&lt;li&gt;Slack：@sreeram.venkitesh&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33：Job 逐索引的回退限制进阶至 GA</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/13/kubernetes-v1-33-jobs-backoff-limit-per-index-goes-ga/</link>
      <pubDate>Tue, 13 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/13/kubernetes-v1-33-jobs-backoff-limit-per-index-goes-ga/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.33: Job&#39;s Backoff Limit Per Index Goes GA&#34;
date: 2025-05-13T10:30:00-08:00
slug: kubernetes-v1-33-jobs-backoff-limit-per-index-goes-ga
author: &gt;
  [Michał Woźniak](https://github.com/mimowo) (Google)
--&gt;
&lt;!--
In Kubernetes v1.33, the _Backoff Limit Per Index_ feature reaches general
availability (GA). This blog describes the Backoff Limit Per Index feature and
its benefits.
--&gt;
&lt;p&gt;在 Kubernetes v1.33 中，&lt;strong&gt;逐索引的回退限制&lt;/strong&gt;特性进阶至 GA（正式发布）。本文介绍此特性及其优势。&lt;/p&gt;
&lt;!--
## About backoff limit per index

When you run workloads on Kubernetes, you must consider scenarios where Pod
failures can affect the completion of your workloads. Ideally, your workload
should tolerate transient failures and continue running.

To achieve failure tolerance in a Kubernetes Job, you can set the
`spec.backoffLimit` field. This field specifies the total number of tolerated
failures.
--&gt;
&lt;h2 id=&#34;about-backoff-limit-per-index&#34;&gt;关于逐索引的回退限制&lt;/h2&gt;
&lt;p&gt;当你在 Kubernetes 上运行工作负载时，必须考虑 Pod 失效可能影响工作负载完成的场景。
理想情况下，你的工作负载应该能够容忍短暂的失效并继续运行。&lt;/p&gt;
&lt;p&gt;为了在 Kubernetes Job 中容忍失效，你可以设置 &lt;code&gt;spec.backoffLimit&lt;/code&gt; 字段。
此字段指定容忍的失效总数。&lt;/p&gt;
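&lt;p&gt;例如，下面的 Job 规约片段（仅为示意）允许整个 Job 最多容忍 4 次 Pod 失效：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;apiVersion: batch/v1
kind: Job
spec:
  backoffLimit: 4  # 整个 Job 容忍的 Pod 失效总数
&lt;/code&gt;&lt;/pre&gt;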
&lt;!--
However, for workloads where every index is considered independent, like
[embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel)
workloads - the `spec.backoffLimit` field is often not flexible enough.
For example, you may choose to run multiple suites of integration tests by
representing each suite as an index within an [Indexed Job](/docs/tasks/job/indexed-parallel-processing-static/).
In that setup, a fast-failing index (test suite) is likely to consume your
entire budget for tolerating Pod failures, and you might not be able to run the
other indexes.
--&gt;
&lt;p&gt;但是，对于每个索引都被视为独立单元的工作负载，
比如&lt;a href=&#34;https://zh.wikipedia.org/zh-cn/%E8%BF%87%E6%98%93%E5%B9%B6%E8%A1%8C&#34;&gt;过易并行&lt;/a&gt;的工作负载，
&lt;code&gt;spec.backoffLimit&lt;/code&gt; 字段通常不够灵活。例如，你可能会选择运行多个集成测试套件，
将每个套件作为&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/job/indexed-parallel-processing-static/&#34;&gt;带索引的 Job&lt;/a&gt;内的某个索引。
在这种情况下，快速失效的索引（测试套件）很可能消耗你全部的 Pod 失效容忍预算，你可能无法运行其他索引的 Pod。&lt;/p&gt;
&lt;!--
In order to address this limitation, Kubernetes introduced _backoff limit per index_,
which allows you to control the number of retries per index.

## How backoff limit per index works

To use Backoff Limit Per Index for Indexed Jobs, specify the number of tolerated
Pod failures per index with the `spec.backoffLimitPerIndex` field. When you set
this field, the Job executes all indexes by default.
--&gt;
&lt;p&gt;为了解决这一限制，Kubernetes 引入了&lt;strong&gt;逐索引的回退限制&lt;/strong&gt;，允许你控制逐索引的重试次数。&lt;/p&gt;
&lt;h2 id=&#34;how-backoff-limit-per-index-works&#34;&gt;逐索引回退限制的工作原理&lt;/h2&gt;
&lt;p&gt;要在带索引的 Job 中使用逐索引的回退限制，可以通过 &lt;code&gt;spec.backoffLimitPerIndex&lt;/code&gt;
字段指定每个索引允许的 Pod 失效次数。当你设置此字段后，Job 默认将执行所有索引。&lt;/p&gt;
&lt;!--
Additionally, to fine-tune the error handling:
* Specify the cap on the total number of failed indexes by setting the
  `spec.maxFailedIndexes` field. When the limit is exceeded the entire Job is
  terminated.
* Define a short-circuit to detect a failed index by using the `FailIndex` action in the
  [Pod Failure Policy](/docs/concepts/workloads/controllers/job/#pod-failure-policy)
  mechanism.
--&gt;
&lt;p&gt;另外，你可以通过以下方式微调错误处理：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;通过设置 &lt;code&gt;spec.maxFailedIndexes&lt;/code&gt; 字段，指定失效索引总数的上限。超过此限制时，整个 Job 会被终止。&lt;/li&gt;
&lt;li&gt;通过 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/job/#pod-failure-policy&#34;&gt;Pod 失效策略&lt;/a&gt;机制中的
&lt;code&gt;FailIndex&lt;/code&gt; 动作定义短路来检测失效的索引。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
When the number of tolerated failures is exceeded, the Job marks that index as
failed and lists it in the Job&#39;s `status.failedIndexes` field.

### Example

The following Job spec snippet is an example of how to combine backoff limit per
index with the _Pod Failure Policy_ feature:
--&gt;
&lt;p&gt;当超过容忍的失效次数时，Job 会将该索引标记为失效，并在 Job 的 &lt;code&gt;status.failedIndexes&lt;/code&gt; 字段中列出该索引。&lt;/p&gt;
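&lt;p&gt;失效的索引会以逗号分隔、可含区间的字符串形式记录在状态中，例如（仅为示意）：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;status:
  failedIndexes: &#34;2,5-7&#34;  # 索引 2、5、6、7 已失效
&lt;/code&gt;&lt;/pre&gt;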
&lt;h3 id=&#34;示例&#34;&gt;示例&lt;/h3&gt;
&lt;p&gt;下面的 Job 规约片段展示了如何将逐索引的回退限制与 &lt;strong&gt;Pod 失效策略&lt;/strong&gt;特性结合使用：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parallelism&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completionMode&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Indexed&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backoffLimitPerIndex&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;maxFailedIndexes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;5&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;podFailurePolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;rules&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;action&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Ignore&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;onPodConditions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;DisruptionTarget&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;action&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;FailIndex&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;onExitCodes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;In&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;values&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;42&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
In this example, the Job handles Pod failures as follows:

- Ignores any failed Pods that have the built-in
  [disruption condition](/docs/concepts/workloads/pods/disruptions/#pod-disruption-conditions),
  called `DisruptionTarget`. These Pods don&#39;t count towards Job backoff limits.
- Fails the index corresponding to the failed Pod if any of the failed Pod&#39;s
  containers finished with the exit code 42 - based on the matching &#34;FailIndex&#34;
  rule.
--&gt;
&lt;p&gt;在此例中，Job 对 Pod 失效的处理逻辑如下：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;忽略具有内置&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/pods/disruptions/#pod-disruption-conditions&#34;&gt;干扰状况&lt;/a&gt;
（称为 &lt;code&gt;DisruptionTarget&lt;/code&gt;）的失效 Pod。这些 Pod 不计入 Job 的回退限制。&lt;/li&gt;
&lt;li&gt;如果失效的 Pod 中任何容器的退出码是 42，则基于匹配的 &lt;code&gt;FailIndex&lt;/code&gt; 规则，将对应的索引标记为失效。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
- Retries the first failure of any index, unless the index failed due to the
  matching `FailIndex` rule.
- Fails the entire Job if the number of failed indexes exceeded 5 (set by the
  `spec.maxFailedIndexes` field).
--&gt;
&lt;ul&gt;
&lt;li&gt;除非索引因匹配的 &lt;code&gt;FailIndex&lt;/code&gt; 规则失效，否则会重试该索引的首次失效。&lt;/li&gt;
&lt;li&gt;如果失效索引数量超过 5 个（由 &lt;code&gt;spec.maxFailedIndexes&lt;/code&gt; 设置），则整个 Job 失效。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Learn more

- Read the blog post on the closely related feature of Pod Failure Policy [Kubernetes 1.31: Pod Failure Policy for Jobs Goes GA](/blog/2024/08/19/kubernetes-1-31-pod-failure-policy-for-jobs-goes-ga/)
- For a hands-on guide to using Pod failure policy, including the use of FailIndex, see
  [Handling retriable and non-retriable pod failures with Pod failure policy](/docs/tasks/job/pod-failure-policy/)
- Read the documentation for
  [Backoff limit per index](/docs/concepts/workloads/controllers/job/#backoff-limit-per-index) and
  [Pod failure policy](/docs/concepts/workloads/controllers/job/#pod-failure-policy)
- Read the KEP for the [Backoff Limits Per Index For Indexed Jobs](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs)
--&gt;
&lt;h2 id=&#34;进一步了解&#34;&gt;进一步了解&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;阅读与 Pod 失效策略密切相关的博客文章：&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/08/19/kubernetes-1-31-pod-failure-policy-for-jobs-goes-ga/&#34;&gt;Kubernetes 1.31：Job 的 Pod 失效策略进阶至 GA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;查看包含 FailIndex 用法在内的 Pod 失效策略实操指南：
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/job/pod-failure-policy/&#34;&gt;使用 Pod 失效策略处理可重试和不可重试的 Pod 失效&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;阅读&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/job/#backoff-limit-per-index&#34;&gt;逐索引的回退限制&lt;/a&gt;和
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/controllers/job/#pod-failure-policy&#34;&gt;Pod 失效策略&lt;/a&gt;等文档&lt;/li&gt;
&lt;li&gt;查阅 KEP：&lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs&#34;&gt;带索引的 Job 的逐索引回退限制&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Get involved

This work was sponsored by the Kubernetes
[batch working group](https://github.com/kubernetes/community/tree/master/wg-batch)
in close collaboration with the
[SIG Apps](https://github.com/kubernetes/community/tree/master/sig-apps) community.

If you are interested in working on new features in the space we recommend
subscribing to our [Slack](https://kubernetes.slack.com/messages/wg-batch)
channel and attending the regular community meetings.
--&gt;
&lt;h2 id=&#34;get-involved&#34;&gt;参与此工作&lt;/h2&gt;
&lt;p&gt;这项工作由 Kubernetes &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/wg-batch&#34;&gt;Batch Working Group（批处理工作组）&lt;/a&gt;负责，且与
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps&#34;&gt;SIG Apps&lt;/a&gt; 社区密切协作。&lt;/p&gt;
&lt;p&gt;如果你有兴趣参与此领域的新特性开发，建议订阅我们的
&lt;a href=&#34;https://kubernetes.slack.com/messages/wg-batch&#34;&gt;Slack 频道&lt;/a&gt;，并参加定期社区会议。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33：镜像拉取策略终于按你的预期工作了！</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/12/kubernetes-v1-33-ensure-secret-pulled-images-alpha/</link>
      <pubDate>Mon, 12 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/12/kubernetes-v1-33-ensure-secret-pulled-images-alpha/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.33: Image Pull Policy the way you always thought it worked!&#34;
date:  2025-05-12T10:30:00-08:00
slug: kubernetes-v1-33-ensure-secret-pulled-images-alpha
author: &gt;
  [Ben Petersen](https://github.com/benjaminapetersen) (Microsoft),
  [Stanislav Láznička](https://github.com/stlaz) (Microsoft)
--&gt;
&lt;!--
## Image Pull Policy the way you always thought it worked!

Some things in Kubernetes are surprising, and the way `imagePullPolicy` behaves might
be one of them. Given Kubernetes is all about running pods, it may be peculiar
to learn that there has been a caveat to restricting pod access to authenticated images for
over 10 years in the form of [issue 18787](https://github.com/kubernetes/kubernetes/issues/18787)!
It is an exciting release when you can resolve a ten-year-old issue.
--&gt;
&lt;h2 id=&#34;镜像拉取策略终于按你的预期工作了&#34;&gt;镜像拉取策略终于按你的预期工作了！&lt;/h2&gt;
&lt;p&gt;Kubernetes 中有些东西让人感到奇怪，&lt;code&gt;imagePullPolicy&lt;/code&gt; 的行为就是其中之一。
Kubernetes 作为一个专注于运行 Pod 的平台，居然在限制 Pod 访问经认证的镜像方面，存在一个长达十余年的问题，
详见 &lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/18787&#34;&gt;Issue 18787&lt;/a&gt;！
v1.33 解决了这个十年前的老问题，这真是一个有纪念意义的版本。&lt;/p&gt;

&lt;div class=&#34;alert alert-info&#34; role=&#34;alert&#34;&gt;&lt;h4 class=&#34;alert-heading&#34;&gt;说明：&lt;/h4&gt;&lt;!--
Throughout this blog post, the term &#34;pod credentials&#34; will be used often. In this context,
the term generally encapsulates the authentication material that is available to a pod
to authenticate a container image pull.
--&gt;
&lt;p&gt;在本博文中，“Pod 凭据”这个术语将被频繁使用。
在这篇博文的上下文中，这一术语通常指的是 Pod 拉取容器镜像时可用于身份认证的认证材料。&lt;/p&gt;&lt;/div&gt;

&lt;!--
## IfNotPresent, even if I&#39;m not supposed to have it

The gist of the problem is that the `imagePullPolicy: IfNotPresent` strategy has done
precisely what it says, and nothing more. Let&#39;s set up a scenario. To begin, *Pod A* in *Namespace X* is scheduled to *Node 1* and requires *image Foo* from a private repository.
For it&#39;s image pull authentication material, the pod references *Secret 1* in its `imagePullSecrets`. *Secret 1* contains the necessary credentials to pull from the private repository. The Kubelet will utilize the credentials from *Secret 1* as supplied by *Pod A*
and it will pull *container image Foo* from the registry.  This is the intended (and secure)
behavior.
--&gt;
&lt;h2 id=&#34;ifnotpresent-即使我本不该有这个镜像&#34;&gt;IfNotPresent：即使我本不该有这个镜像&lt;/h2&gt;
&lt;p&gt;问题的本质在于，&lt;code&gt;imagePullPolicy: IfNotPresent&lt;/code&gt; 策略正如其字面意义所示，仅此而已。
我们来设想一个场景：&lt;strong&gt;Pod A&lt;/strong&gt; 运行在 &lt;strong&gt;Namespace X&lt;/strong&gt; 中，被调度到 &lt;strong&gt;Node 1&lt;/strong&gt;，
此 Pod 需要从某个私有仓库拉取&lt;strong&gt;镜像 Foo&lt;/strong&gt;。此 Pod 在 &lt;code&gt;imagePullSecrets&lt;/code&gt; 中引用
&lt;strong&gt;Secret 1&lt;/strong&gt; 来作为镜像拉取认证材料。&lt;strong&gt;Secret 1&lt;/strong&gt; 包含从私有仓库拉取镜像所需的凭据。
kubelet 将使用 &lt;strong&gt;Pod A&lt;/strong&gt; 提供的 &lt;strong&gt;Secret 1&lt;/strong&gt; 来拉取 &lt;strong&gt;镜像 Foo&lt;/strong&gt;，这是预期的（也是安全的）行为。&lt;/p&gt;
&lt;!--
But now things get curious. If *Pod B* in *Namespace Y* happens to also be scheduled to *Node 1*, unexpected (and potentially insecure) things happen. *Pod B* may reference the same private image, specifying the `IfNotPresent` image pull policy. *Pod B* does not reference *Secret 1*
(or in our case, any secret) in its `imagePullSecrets`. When the Kubelet tries to run the pod, it honors the `IfNotPresent` policy. The Kubelet sees that the *image Foo* is already present locally, and will provide *image Foo* to *Pod B*. *Pod B* gets to run the image even though it did not provide credentials authorizing it to pull the image in the first place.
--&gt;
&lt;p&gt;但现在情况变得奇怪了。如果 &lt;strong&gt;Namespace Y&lt;/strong&gt; 中的 &lt;strong&gt;Pod B&lt;/strong&gt; 也被调度到 &lt;strong&gt;Node 1&lt;/strong&gt;，就会出现意外（甚至是不安全）的情况。
&lt;strong&gt;Pod B&lt;/strong&gt; 可以引用同一个私有镜像，指定 &lt;code&gt;IfNotPresent&lt;/code&gt; 镜像拉取策略。
&lt;strong&gt;Pod B&lt;/strong&gt; 未在其 &lt;code&gt;imagePullSecrets&lt;/code&gt; 中引用 &lt;strong&gt;Secret 1&lt;/strong&gt;（甚至未引用任何 Secret）。
当 kubelet 尝试运行此 Pod 时，它会采用 &lt;code&gt;IfNotPresent&lt;/code&gt; 策略。
kubelet 发现本地已存在&lt;strong&gt;镜像 Foo&lt;/strong&gt;，会将&lt;strong&gt;镜像 Foo&lt;/strong&gt; 提供给 &lt;strong&gt;Pod B&lt;/strong&gt;。
即便 &lt;strong&gt;Pod B&lt;/strong&gt; 一开始并未提供授权拉取镜像的凭据，却依然能够运行此镜像。&lt;/p&gt;
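上述两个 Pod 的场景可以用一个极简的清单来示意（其中的镜像地址、Secret 名称与命名空间均为假设的示例，并非真实配置）：

```yaml
# 假设的示例：Pod A 通过 imagePullSecrets 提供拉取私有镜像的凭据
apiVersion: v1
kind: Pod
metadata:
  name: pod-a
  namespace: namespace-x
spec:
  containers:
  - name: app
    image: registry.example.com/private/foo:1.0   # 私有“镜像 Foo”
    imagePullPolicy: IfNotPresent
  imagePullSecrets:
  - name: secret-1          # 包含从私有仓库拉取镜像所需的凭据
---
# 假设的示例：Pod B 未提供任何凭据，但在 v1.33 之前仍可复用节点上已有的镜像
apiVersion: v1
kind: Pod
metadata:
  name: pod-b
  namespace: namespace-y
spec:
  containers:
  - name: app
    image: registry.example.com/private/foo:1.0
    imagePullPolicy: IfNotPresent
```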
&lt;!--


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/12/kubernetes-v1-33-ensure-secret-pulled-images-alpha/ensure_secret_image_pulls.svg&#34;
         alt=&#34;Illustration of the process of two pods trying to access a private image, the first one with a pull secret, the second one without it&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;Using a private image pulled by a different pod&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
--&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/12/kubernetes-v1-33-ensure-secret-pulled-images-alpha/ensure_secret_image_pulls.svg&#34;
         alt=&#34;两个 Pod 尝试访问某个私有镜像的过程示意图，第一个 Pod 有拉取 Secret，第二个没有&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;使用由另一个 Pod 拉取的私有镜像&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
While `IfNotPresent` should not pull *image Foo* if it is already present
on the node, it is an incorrect security posture to allow all pods scheduled
to a node to have access to previously pulled private image. These pods were never
authorized to pull the image in the first place.
--&gt;
&lt;p&gt;虽然 &lt;code&gt;IfNotPresent&lt;/code&gt; 不应在节点上已存在&lt;strong&gt;镜像 Foo&lt;/strong&gt; 的情况下再拉取此镜像，
但允许调度到某节点的所有 Pod 都能访问之前已拉取的私有镜像，从安全态势上讲是不正确的，
因为这些 Pod 从一开始就未被授权拉取此镜像。&lt;/p&gt;
&lt;!--
## IfNotPresent, but only if I am supposed to have it

In Kubernetes v1.33, we - SIG Auth and SIG Node - have finally started to address this (really old) problem and getting the verification right! The basic expected behavior is not changed. If
an image is not present, the Kubelet will attempt to pull the image. The credentials each pod supplies will be utilized for this task. This matches behavior prior to 1.33.
--&gt;
&lt;h2 id=&#34;ifnotpresent-但前提是我有权限&#34;&gt;IfNotPresent：但前提是我有权限&lt;/h2&gt;
&lt;p&gt;在 Kubernetes v1.33 中，SIG Auth 和 SIG Node 终于开始解决这个（非常古老的）问题，并把凭据验证做对！
基本的预期行为没有变化：如果某镜像不存在，kubelet 会尝试拉取此镜像，
并使用各个 Pod 自己提供的凭据来完成拉取。这与 v1.33 之前的行为一致。&lt;/p&gt;
&lt;!--
If the image is present, then the behavior of the Kubelet changes. The Kubelet will now
verify the pod&#39;s credentials before allowing the pod to use the image.

Performance and service stability have been a consideration while revising the feature.
Pods utilizing the same credential will not be required to re-authenticate. This is
also true when pods source credentials from the same Kubernetes Secret object, even
when the credentials are rotated.
--&gt;
&lt;p&gt;但如果镜像存在，kubelet 的行为就变了。
kubelet 现在先要验证 Pod 的凭据，然后才会允许 Pod 使用镜像。&lt;/p&gt;
&lt;p&gt;在改进此特性时，我们也考虑到了性能和服务稳定性。
如果多个 Pod 使用相同的凭据，则无需重复认证。
即使这些 Pod 使用的是相同的 Kubernetes Secret 对象（即便其凭据已轮换），也同样适用。&lt;/p&gt;
&lt;!--
## Never pull, but use if authorized

The `imagePullPolicy: Never` option does not fetch images. However, if the
container image is already present on the node, any pod attempting to use the private
image will be required to provide credentials, and those credentials require verification.

Pods utilizing the same credential will not be required to re-authenticate.
Pods that do not supply credentials previously used to successfully pull an
image will not be allowed to use the private image.
--&gt;
&lt;h2 id=&#34;never-永不拉取-但使用前仍需鉴权&#34;&gt;Never：永不拉取，但使用前仍需鉴权&lt;/h2&gt;
&lt;p&gt;采用 &lt;code&gt;imagePullPolicy: Never&lt;/code&gt; 选项时，不会获取镜像。
但如果节点上已存在此容器镜像，任何尝试使用此私有镜像的 Pod 都需要提供凭据，并且这些凭据需要经过验证。&lt;/p&gt;
&lt;p&gt;使用相同凭据的 Pod 无需重新认证。未提供之前已成功拉取镜像所用凭据的 Pod，将不允许使用此私有镜像。&lt;/p&gt;
&lt;!--
## Always pull, if authorized

The `imagePullPolicy: Always` has always worked as intended. Each time an image
is requested, the request goes to the registry and the registry will perform an authentication
check.

In the past, forcing the `Always` image pull policy via pod admission was the only way to ensure
that your private container images didn&#39;t get reused by other pods on nodes which already pulled the images.
--&gt;
&lt;h2 id=&#34;always-鉴权通过后始终拉取&#34;&gt;Always：鉴权通过后始终拉取&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;imagePullPolicy: Always&lt;/code&gt; 一直以来都能按预期工作。
每次某镜像被请求时，请求会流转到镜像仓库，镜像仓库将执行身份认证检查。&lt;/p&gt;
&lt;p&gt;过去，为了确保你的私有容器镜像不会被节点上已拉取过镜像的其他 Pod 重复使用，
通过 Pod 准入来强制执行 &lt;code&gt;Always&lt;/code&gt; 镜像拉取策略是唯一的方式。&lt;/p&gt;
&lt;!--
Fortunately, this was somewhat performant. Only the image manifest was pulled, not the image. However, there was still a cost and a risk. During a new rollout, scale up, or pod restart, the image registry that provided the image MUST be available for the auth check, putting the image registry in the critical path for stability of services running inside of the cluster.
--&gt;
&lt;p&gt;幸运的是，这个过程相对高效：仅拉取镜像清单，而不是镜像本体。
但这依然带来代价与风险。每当发布新版本、扩容或重启 Pod 时，
提供镜像的镜像仓库必须可用以完成认证检查，这使镜像仓库成为集群内所运行服务稳定性的关键路径。&lt;/p&gt;
&lt;!--
## How it all works

The feature is based on persistent, file-based caches that are present on each of
the nodes. The following is a simplified description of how the feature works.
For the complete version, please see [KEP-2535](https://kep.k8s.io/2535).
--&gt;
&lt;h2 id=&#34;工作原理&#34;&gt;工作原理&lt;/h2&gt;
&lt;p&gt;此特性基于每个节点上存在的持久化文件缓存。以下简要说明了此特性的工作原理。
完整细节请参见 &lt;a href=&#34;https://kep.k8s.io/2535&#34;&gt;KEP-2535&lt;/a&gt;。&lt;/p&gt;
&lt;!--
The process of requesting an image for the first time goes like this:
  1. A pod requesting an image from a private registry is scheduled to a node.
  2. The image is not present on the node.
  3. The Kubelet makes a record of the intention to pull the image.
  4. The Kubelet extracts credentials from the Kubernetes Secret referenced by the pod
     as an image pull secret, and uses them to pull the image from the private registry.
--&gt;
&lt;p&gt;首次请求某镜像的流程如下：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;请求私有仓库中某镜像的 Pod 被调度到某节点。&lt;/li&gt;
&lt;li&gt;此镜像在节点上不存在。&lt;/li&gt;
&lt;li&gt;kubelet 记录一次拉取镜像的意图。&lt;/li&gt;
&lt;li&gt;kubelet 从 Pod 引用的 Kubernetes Secret 中提取凭据作为镜像拉取 Secret，并使用这些凭据从私有仓库拉取镜像。&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
  1. After the image has been successfully pulled, the Kubelet makes a record of
     the successful pull. This record includes details about credentials used
     (in the form of a hash) as well as the Secret from which they originated.
  2. The Kubelet removes the original record of intent.
  3. The Kubelet retains the record of successful pull for later use.
--&gt;
&lt;ol start=&#34;5&#34;&gt;
&lt;li&gt;镜像已成功拉取后，kubelet 会记录这次成功的拉取。
记录包括所使用凭据的细节（以哈希形式）以及这些凭据来源的 Secret。&lt;/li&gt;
&lt;li&gt;kubelet 移除原始意图记录。&lt;/li&gt;
&lt;li&gt;kubelet 保留成功拉取的记录供后续使用。&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
When future pods scheduled to the same node request the previously pulled private image:
  1. The Kubelet checks the credentials that the new pod provides for the pull.
  2. If the hash of these credentials, or the source Secret of the credentials match
     the hash or source Secret which were recorded for a previous successful pull,
     the pod is allowed to use the previously pulled image.
  3. If the credentials or their source Secret are not found in the records of
     successful pulls for that image, the Kubelet will attempt to use
     these new credentials to request a pull from the remote registry, triggering
     the authorization flow.
--&gt;
&lt;p&gt;当以后调度到同一节点的 Pod 请求之前拉取过的私有镜像：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;kubelet 检查新 Pod 为拉取镜像所提供的凭据。&lt;/li&gt;
&lt;li&gt;如果这些凭据的哈希或其源 Secret 与之前成功拉取记录的哈希或源 Secret 相匹配，则允许此 Pod 使用之前拉取的镜像。&lt;/li&gt;
&lt;li&gt;如果在该镜像的成功拉取记录中找不到这些凭据或其源 Secret，则
kubelet 将尝试使用这些新凭据向远程仓库请求拉取，从而触发鉴权流程。&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
## Try it out

In Kubernetes v1.33 we shipped the alpha version of this feature. To give it a spin,
enable the `KubeletEnsureSecretPulledImages` feature gate for your 1.33 Kubelets.

You can learn more about the feature and additional optional configuration on the
[concept page for Images](/docs/concepts/containers/images/#ensureimagepullcredentialverification)
in the official Kubernetes documentation.
--&gt;
&lt;h2 id=&#34;试用&#34;&gt;试用&lt;/h2&gt;
&lt;p&gt;在 Kubernetes v1.33 中，我们发布了此特性的 Alpha 版本。
要想试用，请为你的 v1.33 kubelet 启用 &lt;code&gt;KubeletEnsureSecretPulledImages&lt;/code&gt; 特性门控。&lt;/p&gt;
&lt;p&gt;你可以在 Kubernetes
官方文档中的&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/containers/images/#ensureimagepullcredentialverification&#34;&gt;镜像概念页&lt;/a&gt;中了解此特性和更多可选配置的细节。&lt;/p&gt;
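例如，如果你通过 KubeletConfiguration 文件来配置 kubelet，可以用如下最小片段启用此特性门控（仅作示意，文件中的其他字段需依据你的环境自行补充）：

```yaml
# kubelet 配置文件片段：启用 KubeletEnsureSecretPulledImages 特性门控（v1.33 为 Alpha）
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletEnsureSecretPulledImages: true
```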
&lt;!--
## What&#39;s next?

In future releases we are going to:
1. Make this feature work together with [Projected service account tokens for Kubelet image credential providers](https://kep.k8s.io/4412) which adds a new, workload-specific source of image pull credentials.
1. Write a benchmarking suite to measure the performance of this feature and assess the impact of
   any future changes.
1. Implement an in-memory caching layer so that we don&#39;t need to read files for each image
   pull request.
1. Add support for credential expirations, thus forcing previously validated credentials to
   be re-authenticated.
--&gt;
&lt;h2 id=&#34;下一步工作&#34;&gt;下一步工作&lt;/h2&gt;
&lt;p&gt;在未来的版本中，我们将：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;使此特性与 &lt;a href=&#34;https://kep.k8s.io/4412&#34;&gt;kubelet 镜像凭据提供程序的投射服务账号令牌&lt;/a&gt;协同工作，
后者能够添加新的、特定于工作负载的镜像拉取凭据源。&lt;/li&gt;
&lt;li&gt;编写基准测试套件，以评估此特性的性能并衡量后续变更的影响。&lt;/li&gt;
&lt;li&gt;实现内存缓存层，以免每次镜像拉取请求都需要读取文件。&lt;/li&gt;
&lt;li&gt;添加对凭据过期的支持，从而强制重新认证之前已验证过的凭据。&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
## How to get involved

[Reading KEP-2535](https://kep.k8s.io/2535) is a great way to understand these changes in depth.

If you are interested in further involvement, reach out to us on the [#sig-auth-authenticators-dev](https://kubernetes.slack.com/archives/C04UMAUC4UA) channel
on Kubernetes Slack (for an invitation, visit [https://slack.k8s.io/](https://slack.k8s.io/)).
You are also welcome to join the bi-weekly [SIG Auth meetings](https://github.com/kubernetes/community/blob/master/sig-auth/README.md#meetings),
held every other Wednesday.
--&gt;
&lt;h2 id=&#34;如何参与&#34;&gt;如何参与&lt;/h2&gt;
&lt;p&gt;阅读 &lt;a href=&#34;https://kep.k8s.io/2535&#34;&gt;KEP-2535&lt;/a&gt; 是深入理解这些变更的绝佳方式。&lt;/p&gt;
&lt;p&gt;如果你想进一步参与，可以加入 Kubernetes Slack 频道
&lt;a href=&#34;https://kubernetes.slack.com/archives/C04UMAUC4UA&#34;&gt;#sig-auth-authenticators-dev&lt;/a&gt;
（如需邀请链接，请访问 &lt;a href=&#34;https://slack.k8s.io/&#34;&gt;https://slack.k8s.io/&lt;/a&gt;）。
欢迎你参加 &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-auth/README.md#meetings&#34;&gt;SIG Auth 双周例会&lt;/a&gt;，会议于每隔一个星期三举行。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33：流式 List 响应</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/09/kubernetes-v1-33-streaming-list-responses/</link>
      <pubDate>Fri, 09 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/09/kubernetes-v1-33-streaming-list-responses/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.33: Streaming List responses&#34;
date: 2025-05-09T10:30:00-08:00
slug: kubernetes-v1-33-streaming-list-responses
author: &gt;
  Marek Siarkowicz (Google),
  Wei Fu (Microsoft)
--&gt;
&lt;!--
Managing Kubernetes cluster stability becomes increasingly critical as your infrastructure grows. One of the most challenging aspects of operating large-scale clusters has been handling List requests that fetch substantial datasets - a common operation that could unexpectedly impact your cluster&#39;s stability.

Today, the Kubernetes community is excited to announce a significant architectural improvement: streaming encoding for List responses.
--&gt;
&lt;p&gt;随着基础设施的增长，管理 Kubernetes 集群的稳定性变得愈发重要。
在大规模集群的运维中，最具挑战性的操作之一就是处理获取大量数据集的 List 请求。
List 请求是一种常见的操作，却可能意外影响集群的稳定性。&lt;/p&gt;
&lt;p&gt;今天，Kubernetes 社区非常高兴地宣布一项重大的架构改进：对 List 响应启用流式编码。&lt;/p&gt;
&lt;!--
## The problem: unnecessary memory consumption with large resources

Current API response encoders just serialize an entire response into a single contiguous memory and perform one [ResponseWriter.Write](https://pkg.go.dev/net/http#ResponseWriter.Write) call to transmit data to the client. Despite HTTP/2&#39;s capability to split responses into smaller frames for transmission, the underlying HTTP server continues to hold the complete response data as a single buffer. Even as individual frames are transmitted to the client, the memory associated with these frames cannot be freed incrementally.
--&gt;
&lt;h2 id=&#34;问题-大型资源导致的不必要内存消耗&#34;&gt;问题：大型资源导致的不必要内存消耗&lt;/h2&gt;
&lt;p&gt;当前的 API 响应编码器会将整个响应序列化为一个连续的内存块，并通过一次
&lt;a href=&#34;https://pkg.go.dev/net/http#ResponseWriter.Write&#34;&gt;ResponseWriter.Write&lt;/a&gt;
调用将数据发送给客户端。尽管 HTTP/2 能够将响应拆分为较小的帧进行传输，
但底层的 HTTP 服务器仍然会将完整的响应数据保存在一个单一缓冲区中。
即使这些帧被逐步传输到客户端，与这些帧关联的内存也无法被逐步释放。&lt;/p&gt;
&lt;!--
When cluster size grows, the single response body can be substantial - like hundreds of megabytes in size. At large scale, the current approach becomes particularly inefficient, as it prevents incremental memory release during transmission. Imagining that when network congestion occurs, that large response body’s memory block stays active for tens of seconds or even minutes. This limitation leads to unnecessarily high and prolonged memory consumption in the kube-apiserver process. If multiple large List requests occur simultaneously, the cumulative memory consumption can escalate rapidly, potentially leading to an Out-of-Memory (OOM) situation that compromises cluster stability.
--&gt;
&lt;p&gt;随着集群规模的扩大，单个响应体可能非常庞大，可能达到几百兆字节。
在大规模环境下，当前的方式显得特别低效，因为它使得系统无法在传输过程中逐步释放内存。
想象一下，如果网络发生拥堵，那么大型响应体的内存块会持续占用数十秒甚至几分钟。
这一局限性导致 kube-apiserver 进程出现不必要的高内存占用，持续时间也很长。
如果多个大型 List 请求同时发生，累计的内存消耗可能迅速飙升，最终可能触发
OOM（内存溢出）事件，从而危及集群稳定性。&lt;/p&gt;
&lt;!--
The encoding/json package uses sync.Pool to reuse memory buffers during serialization. While efficient for consistent workloads, this mechanism creates challenges with sporadic large List responses. When processing these large responses, memory pools expand significantly. But due to sync.Pool&#39;s design, these oversized buffers remain reserved after use. Subsequent small List requests continue utilizing these large memory allocations, preventing garbage collection and maintaining persistently high memory consumption in the kube-apiserver even after the initial large responses complete.
--&gt;
&lt;p&gt;&lt;code&gt;encoding/json&lt;/code&gt; 包在序列化时使用了 &lt;code&gt;sync.Pool&lt;/code&gt; 来复用内存缓冲区。
这对于一致的工作负载来说是高效的，但在处理偶发性的大型 List 响应时却带来了新的挑战。
在处理这些大型响应时，内存池会迅速膨胀。而由于 &lt;code&gt;sync.Pool&lt;/code&gt; 的设计特性，
这些膨胀后的缓冲区在使用后仍然会保留。后续的小型 List 请求继续使用这些大型内存分配，
导致垃圾回收无法生效，使得 kube-apiserver 在处理完大型响应后仍然保持较高的内存占用。&lt;/p&gt;
&lt;!--
Additionally, [Protocol Buffers](https://github.com/protocolbuffers/protocolbuffers.github.io/blob/c14731f55296f8c6367faa4f2e55a3d3594544c6/content/programming-guides/techniques.md?plain=1#L39) are not designed to handle large datasets. But it’s great for handling **individual** messages within a large data set. This highlights the need for streaming-based approaches that can process and transmit large collections incrementally rather than as monolithic blocks.
--&gt;
&lt;p&gt;此外，&lt;a href=&#34;https://github.com/protocolbuffers/protocolbuffers.github.io/blob/c14731f55296f8c6367faa4f2e55a3d3594544c6/content/programming-guides/techniques.md?plain=1#L39&#34;&gt;Protocol Buffers（协议缓冲）&lt;/a&gt;
并不适合处理大型数据集。但它非常适合处理大型数据集中的&lt;strong&gt;单个&lt;/strong&gt;消息。
这凸显出采用基于流式处理方式的必要性，这种方式可以逐步处理和传输大型集合，而不是一次性处理整个数据块。&lt;/p&gt;
&lt;!--
&gt; _As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy._
&gt;
&gt; _From https://protobuf.dev/programming-guides/techniques/_
--&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;一个通用的经验法则是：如果你处理的消息每个都大于一兆字节，那么可能需要考虑替代策略。&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;引自：https://protobuf.dev/programming-guides/techniques/&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;!--
## Streaming encoder for List responses

The streaming encoding mechanism is specifically designed for List responses, leveraging their common well-defined collection structures. The core idea focuses exclusively on the **Items** field within collection structures, which represents the bulk of memory consumption in large responses. Rather than encoding the entire **Items** array as one contiguous memory block, the new streaming encoder processes and transmits each item individually, allowing memory to be freed progressively as frame or chunk is transmitted. As a result, encoding items one by one significantly reduces the memory footprint required by the API server.
--&gt;
&lt;h2 id=&#34;list-响应的流式编码器&#34;&gt;List 响应的流式编码器&lt;/h2&gt;
&lt;p&gt;流式编码机制是专门为 List 响应设计的，它利用了这类响应通用且定义良好的集合结构。
核心思想是聚焦于集合结构中的 &lt;strong&gt;Items&lt;/strong&gt; 字段，此字段在大型响应中占用了大部分内存。
新的流式编码器不再将整个 &lt;strong&gt;Items&lt;/strong&gt; 数组编码为一个连续的内存块，而是逐个处理并传输每个 Item，
从而在传输每个帧或数据块后可以逐步释放内存。逐项编码显著减少了 API 服务器所需的内存占用。&lt;/p&gt;
&lt;!--
With Kubernetes objects typically limited to 1.5 MiB (from ETCD), streaming encoding keeps memory consumption predictable and manageable regardless of how many objects are in a List response. The result is significantly improved API server stability, reduced memory spikes, and better overall cluster performance - especially in environments where multiple large List operations might occur simultaneously.
--&gt;
&lt;p&gt;考虑到 Kubernetes 对象通常限制在 1.5 MiB（由 ETCD 限制），流式编码可使内存占用更加可预测和易于管理，
无论 List 响应中包含多少个对象。其结果是大幅提升了 API 服务器的稳定性，减少了内存峰值，
并改善了整体集群性能，尤其是在同时发生多个大型 List 操作的环境下更是如此。&lt;/p&gt;
&lt;!--
To ensure perfect backward compatibility, the streaming encoder validates Go struct tags rigorously before activation, guaranteeing byte-for-byte consistency with the original encoder. Standard encoding mechanisms process all fields except **Items**, maintaining identical output formatting throughout. This approach seamlessly supports all Kubernetes List types—from built-in **\*List** objects to Custom Resource **UnstructuredList** objects - requiring zero client-side modifications or awareness that the underlying encoding method has changed.
--&gt;
&lt;p&gt;为了确保完全向后兼容，流式编码器在启用前会严格验证 Go 结构体标签，确保与原始编码器在字节级别上保持一致。
标准编码机制仍然会处理除 &lt;strong&gt;Items&lt;/strong&gt; 外的所有字段，从而保持输出格式的一致性。
这种方法无缝支持所有 Kubernetes 的 List 类型（从内置的 &lt;strong&gt;*List&lt;/strong&gt; 对象到自定义资源的 &lt;strong&gt;UnstructuredList&lt;/strong&gt; 对象），
客户端无需任何修改，也无需感知底层编码方式已发生变化。&lt;/p&gt;
&lt;!--
## Performance gains you&#39;ll notice

*   **Reduced Memory Consumption:** Significantly lowers the memory footprint of the API server when handling large **list** requests,
    especially when dealing with **large resources**.
*   **Improved Scalability:** Enables the API server to handle more concurrent requests and larger datasets without running out of memory.
*   **Increased Stability:** Reduces the risk of OOM kills and service disruptions.
*   **Efficient Resource Utilization:** Optimizes memory usage and improves overall resource efficiency.
--&gt;
&lt;h2 id=&#34;肉眼可见的性能提升&#34;&gt;肉眼可见的性能提升&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;内存消耗降低：&lt;/strong&gt; 当处理大型 &lt;strong&gt;list&lt;/strong&gt; 请求，尤其是涉及&lt;strong&gt;大型资源&lt;/strong&gt;时，API 服务器的内存占用大幅下降。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;可扩展性提升：&lt;/strong&gt; 允许 API 服务器处理更多并发请求和更大数据集，而不会耗尽内存。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;稳定性增强：&lt;/strong&gt; 降低 OOM 终止和服务中断的风险。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;资源利用率提升：&lt;/strong&gt; 优化内存使用率，提高整体资源效率。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Benchmark results

To validate results Kubernetes has introduced a new **list** benchmark which executes concurrently 10 **list** requests each returning 1GB of data.

The benchmark has showed 20x improvement, reducing memory usage from 70-80GB to 3GB.
--&gt;
&lt;h2 id=&#34;基准测试结果&#34;&gt;基准测试结果&lt;/h2&gt;
&lt;p&gt;为了验证效果，Kubernetes 引入了一个新的 &lt;strong&gt;list&lt;/strong&gt; 基准测试，同时并发执行 10 个 &lt;strong&gt;list&lt;/strong&gt; 请求，每个请求返回 1GB 数据。&lt;/p&gt;
&lt;p&gt;此基准测试显示内存使用量改善了 &lt;strong&gt;20 倍&lt;/strong&gt;，从 70–80GB 降至 3GB。&lt;/p&gt;
&lt;!--


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/09/kubernetes-v1-33-streaming-list-responses/results.png&#34;
         alt=&#34;Screenshot of a K8s performance dashboard showing memory usage for benchmark list going down from 60GB to 3GB&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;List benchmark memory usage&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
--&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/09/kubernetes-v1-33-streaming-list-responses/results.png&#34;
         alt=&#34;K8s 性能面板截图，显示基准 list 内存使用量从 60GB 降低到 3GB&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;List 基准测试内存使用量&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes 1.33：卷填充器进阶至 GA</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/08/kubernetes-v1-33-volume-populators-ga/</link>
      <pubDate>Thu, 08 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/08/kubernetes-v1-33-volume-populators-ga/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes 1.33: Volume Populators Graduate to GA&#34;
date: 2025-05-08T10:30:00-08:00
slug: kubernetes-v1-33-volume-populators-ga
author: &gt;
  Danna Wang (Google)
  Sunny Song (Google)
--&gt;
&lt;!--
Kubernetes _volume populators_ are now  generally available (GA)! The `AnyVolumeDataSource` feature
gate is treated as always enabled for Kubernetes v1.33, which means that users can specify any appropriate
[custom resource](/docs/concepts/extend-kubernetes/api-extension/custom-resources/#custom-resources)
as the data source of a PersistentVolumeClaim (PVC).

An example of how to use dataSourceRef in PVC:
--&gt;
&lt;p&gt;Kubernetes 的&lt;strong&gt;卷填充器&lt;/strong&gt;现已进阶至 GA（正式发布）！
&lt;code&gt;AnyVolumeDataSource&lt;/code&gt; 特性门控在 Kubernetes v1.33 中设为始终启用，
这意味着用户可以将任何合适的&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/extend-kubernetes/api-extension/custom-resources/#custom-resources&#34;&gt;自定义资源&lt;/a&gt;作为
PersistentVolumeClaim（PVC）的数据源。&lt;/p&gt;
&lt;p&gt;以下是如何在 PVC 中使用 dataSourceRef 的示例：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;PersistentVolumeClaim&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;pvc1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;dataSourceRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiGroup&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;provider.example.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Provider&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;provider1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
## What is new

There are four major enhancements from beta.

### Populator Pod is optional

During the beta phase, contributors to Kubernetes identified potential resource leaks with PersistentVolumeClaim (PVC) deletion while volume population was in progress; these leaks happened due to limitations in finalizer handling.
Ahead of the graduation to general availability, the Kubernetes project added support to delete temporary resources (PVC prime, etc.) if the original PVC is deleted.
--&gt;
&lt;h2 id=&#34;what-is-new&#34;&gt;新变化  &lt;/h2&gt;
&lt;p&gt;从 Beta 进阶到 GA 后，主要有四个增强。&lt;/p&gt;
&lt;h3 id=&#34;populator-pod-is-optional&#34;&gt;填充器 Pod 成为可选  &lt;/h3&gt;
&lt;p&gt;在 Beta 阶段，Kubernetes 的贡献者们发现，在卷填充仍在进行时删除
PersistentVolumeClaim（PVC）可能导致资源泄漏；这些泄漏源于 Finalizer 处理机制的局限性。
在进阶至 GA 之前，Kubernetes 项目增加了在原始 PVC 被删除时删除临时资源（PVC 派生体等）的支持。&lt;/p&gt;
&lt;!--
To accommodate this, we&#39;ve introduced three new plugin-based functions:
* `PopulateFn()`: Executes the provider-specific data population logic.
* `PopulateCompleteFn()`: Checks if the data population operation has finished successfully.
* `PopulateCleanupFn()`: Cleans up temporary resources created by the provider-specific functions after data population is completed

A provider example is added in [lib-volume-populator/example](https://github.com/kubernetes-csi/lib-volume-populator/tree/master/example).
--&gt;
&lt;p&gt;为支持此能力，我们引入了三个基于插件的新函数：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;PopulateFn()&lt;/code&gt;：执行特定于提供程序的数据填充逻辑。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PopulateCompleteFn()&lt;/code&gt;：检查数据填充操作是否成功完成。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PopulateCleanupFn()&lt;/code&gt;：在数据填充完成后，清理由提供程序特定函数创建的临时资源。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;有关提供程序的例子，参见
&lt;a href=&#34;https://github.com/kubernetes-csi/lib-volume-populator/tree/master/example&#34;&gt;lib-volume-populator/example&lt;/a&gt;。&lt;/p&gt;
&lt;!--
### Mutator functions to modify the Kubernetes resources

For GA, the CSI volume populator controller code gained a `MutatorConfig`, allowing the specification of mutator functions to modify Kubernetes resources.
For example, if the PVC prime is not an exact copy of the PVC and you need provider-specific information for the driver, you can include this information in the optional `MutatorConfig`. 
This allows you to customize the Kubernetes objects in the volume populator.
--&gt;
&lt;h3 id=&#34;支持修改-kubernetes-资源的-mutator-函数&#34;&gt;支持修改 Kubernetes 资源的 Mutator 函数&lt;/h3&gt;
&lt;p&gt;在 GA 版本中，CSI 卷填充器控制器代码新增了 &lt;code&gt;MutatorConfig&lt;/code&gt;，允许指定 Mutator 函数用于修改 Kubernetes 资源。
例如，如果 PVC 派生体并非 PVC 的精确副本，而你需要为驱动提供一些特定于提供程序的信息，
你可以通过可选的 &lt;code&gt;MutatorConfig&lt;/code&gt; 加入这些信息。这使你能够自定义卷填充器中的 Kubernetes 对象。&lt;/p&gt;
&lt;!--
### Flexible metric handling for providers

Our beta phase highlighted a new requirement: the need to aggregate metrics not just from lib-volume-populator, but also from other components within the provider&#39;s codebase.
--&gt;
&lt;h3 id=&#34;灵活处理提供程序的指标&#34;&gt;灵活处理提供程序的指标&lt;/h3&gt;
&lt;p&gt;在 Beta 阶段，我们发现一个新需求：不仅需要从 lib-volume-populator
聚合指标，还要能够从提供程序代码库中的其他组件聚合指标。&lt;/p&gt;
&lt;!--
To address this, SIG Storage introduced a [provider metric manager](https://github.com/kubernetes-csi/lib-volume-populator/blob/8a922a5302fdba13a6c27328ee50e5396940214b/populator-machinery/controller.go#L122).
This enhancement delegates the implementation of metrics logic to the provider itself, rather than relying solely on lib-volume-populator.
This shift provides greater flexibility and control over metrics collection and aggregation, enabling a more comprehensive view of provider performance.
--&gt;
&lt;p&gt;为此，SIG Storage 引入了一个&lt;a href=&#34;https://github.com/kubernetes-csi/lib-volume-populator/blob/8a922a5302fdba13a6c27328ee50e5396940214b/populator-machinery/controller.go#L122&#34;&gt;提供程序指标管理器&lt;/a&gt;。
此增强特性将指标逻辑的实现委托给提供程序自身，而不再仅仅依赖于 lib-volume-populator。
这种转变使指标收集与聚合更加灵活、可控，有助于更全面地了解提供程序的整体性能。&lt;/p&gt;
&lt;!--
### Clean up for temporary resources

During the beta phase, we identified potential resource leaks with PersistentVolumeClaim (PVC) deletion while volume population was in progress, due to limitations in finalizer handling. We have improved the populator to support the deletion of temporary resources (PVC prime, etc.) if the original PVC is deleted in this GA release.
--&gt;
&lt;h3 id=&#34;清理临时资源&#34;&gt;清理临时资源&lt;/h3&gt;
&lt;p&gt;在 Beta 阶段，我们发现当卷填充仍在进行时删除 PVC 会由于 Finalizer 处理机制的局限性导致资源泄漏。
在本次 GA 发布中，我们改进了填充器，支持在原始 PVC 被删除时删除临时资源（如 PVC 派生体等）。&lt;/p&gt;
&lt;!--
## How to use it

To try it out, please follow the [steps](/blog/2022/05/16/volume-populators-beta/#trying-it-out) in the previous beta blog.

## Future directions and potential feature requests

For next step, there are several potential feature requests for volume populator:
--&gt;
&lt;h2 id=&#34;how-to-use-it&#34;&gt;如何使用  &lt;/h2&gt;
&lt;p&gt;如需试用，请参考之前 Beta 版本博客中的&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2022/05/16/volume-populators-beta/#trying-it-out&#34;&gt;操作步骤&lt;/a&gt;。&lt;/p&gt;
&lt;h2 id=&#34;future-directions-and-potential-feature-requests&#34;&gt;后续方向与潜在特性请求  &lt;/h2&gt;
&lt;p&gt;下一阶段，卷填充器可能会引入以下特性请求：&lt;/p&gt;
&lt;!--
* Multi sync: the current implementation is a one-time unidirectional sync from source to destination. This can be extended to support multiple syncs, enabling periodic syncs or allowing users to sync on demand
* Bidirectional sync: an extension of multi sync above, but making it bidirectional between source and destination
* Populate data with priorities: with a list of different dataSourceRef, populate based on priorities
* Populate data from multiple sources of the same provider: populate multiple different sources to one destination
* Populate data from multiple sources of the different providers: populate multiple different sources to one destination, pipelining different resources’ population
--&gt;
&lt;ul&gt;
&lt;li&gt;多次同步：当前实现是从源到目的地的一次性单向同步，可以扩展为支持周期性同步或允许用户按需同步。&lt;/li&gt;
&lt;li&gt;双向同步：多次同步的扩展版本，实现源与目的地之间的双向同步。&lt;/li&gt;
&lt;li&gt;基于优先级的数据填充：提供多个 dataSourceRef，并按优先级进行数据填充。&lt;/li&gt;
&lt;li&gt;从同一提供程序的多个源填充数据：将多个不同源填充到同一个目的地。&lt;/li&gt;
&lt;li&gt;从不同提供程序的多个源填充数据：将多个不同源填充到一个目的地，支持流水线式的不同资源的填充。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
To ensure we&#39;re building something truly valuable, Kubernetes SIG Storage would love to hear about any specific use cases you have in mind for this feature.
For any inquiries or specific questions related to volume populator, please reach out to the [SIG Storage community](https://github.com/kubernetes/community/tree/master/sig-storage).
--&gt;
&lt;p&gt;为了确保我们构建的特性真正有价值，Kubernetes SIG Storage 非常希望了解你为此特性设想的任何具体使用场景。
如有任何关于卷填充器的疑问或特定问题，请联系
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-storage&#34;&gt;SIG Storage 社区&lt;/a&gt;。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33：防止无序删除时 PersistentVolume 泄漏特性进阶到 GA</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/05/kubernetes-v1-33-prevent-persistentvolume-leaks-when-deleting-out-of-order-graduate-to-ga/</link>
      <pubDate>Mon, 05 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/05/kubernetes-v1-33-prevent-persistentvolume-leaks-when-deleting-out-of-order-graduate-to-ga/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#39;Kubernetes v1.33: Prevent PersistentVolume Leaks When Deleting out of Order graduates to GA&#39;
date: 2025-05-05T10:30:00-08:00
slug: kubernetes-v1-33-prevent-persistentvolume-leaks-when-deleting-out-of-order-graduate-to-ga
author: &gt;
  Deepak Kinni (Broadcom)
--&gt;
&lt;!--
I am thrilled to announce that the feature to prevent
[PersistentVolume](/docs/concepts/storage/persistent-volumes/) (or PVs for short)
leaks when deleting out of order has graduated to General Availability (GA) in
Kubernetes v1.33! This improvement, initially introduced as a beta
feature in Kubernetes v1.31, ensures that your storage resources are properly
reclaimed, preventing unwanted leaks.
--&gt;
&lt;p&gt;我很高兴地宣布，防止无序删除时
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/storage/persistent-volumes/&#34;&gt;PersistentVolume&lt;/a&gt;（简称 PV）
泄漏的特性已经在 Kubernetes v1.33 中进阶为正式版（GA）！这项改进最初在
Kubernetes v1.31 中作为 Beta 特性引入，
它确保你的存储资源能够被正确回收，防止不必要的泄漏。&lt;/p&gt;
&lt;!--
## How did reclaim work in previous Kubernetes releases?

[PersistentVolumeClaim](/docs/concepts/storage/persistent-volumes/#Introduction) (or PVC for short) is
a user&#39;s request for storage. A PV and PVC are considered [Bound](/docs/concepts/storage/persistent-volumes/#Binding)
if a newly created PV or a matching PV is found. The PVs themselves are
backed by volumes allocated by the storage backend.
--&gt;
&lt;h2 id=&#34;以前的-kubernetes-版本中-reclaim-是如何工作的&#34;&gt;以前的 Kubernetes 版本中 reclaim 是如何工作的？&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/storage/persistent-volumes/#Introduction&#34;&gt;PersistentVolumeClaim&lt;/a&gt;（简称 PVC）
是用户对存储的请求。如果创建了新的 PV 或找到了匹配的 PV，则认为 PV 和 PVC
是&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/storage/persistent-volumes/#Binding&#34;&gt;绑定&lt;/a&gt;的。
PV 本身由存储后端分配的卷支持。&lt;/p&gt;
&lt;!--
Normally, if the volume is to be deleted, then the expectation is to delete the
PVC for a bound PV-PVC pair. However, there are no restrictions on deleting a PV
before deleting a PVC.
--&gt;
&lt;p&gt;通常，如果要删除卷，对于已绑定的 PV-PVC 对，预期的做法是删除其中的 PVC。但是，
并没有任何限制阻止在删除 PVC 之前先删除 PV。&lt;/p&gt;
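&lt;p&gt;下面的命令序列演示了这种无序删除（其中的 PV 和 PVC 名称仅为示意）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# 对于已绑定的 PV-PVC 对，先删除 PV 即构成无序删除；
# 由于 pv-protection 保护机制，该 PV 会先进入 Terminating 状态
kubectl delete pv example-pv

# 之后再删除 PVC；在旧行为下，PV 的 Delete 回收策略不会被执行
kubectl delete pvc example-pvc
&lt;/code&gt;&lt;/pre&gt;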
&lt;!--
For a `Bound` PV-PVC pair, the ordering of PV-PVC deletion determines whether
the PV reclaim policy is honored. The reclaim policy is honored if the PVC is
deleted first; however, if the PV is deleted prior to deleting the PVC, then the
reclaim policy is not exercised. As a result of this behavior, the associated
storage asset in the external infrastructure is not removed.
--&gt;
&lt;p&gt;对于一个“已绑定”的 PV-PVC 对，PV 和 PVC 的删除顺序决定了是否遵守 PV 回收策略。
如果先删除 PVC，则会遵守回收策略；然而，如果在删除 PVC 之前删除了 PV，
则不会执行回收策略。因此，外部基础设施中相关的存储资源不会被移除。&lt;/p&gt;
&lt;!--
## PV reclaim policy with Kubernetes v1.33

With the graduation to GA in Kubernetes v1.33, this issue is now resolved. Kubernetes
now reliably honors the configured `Delete` reclaim policy, even when PVs are deleted
before their bound PVCs. This is achieved through the use of finalizers,
ensuring that the storage backend releases the allocated storage resource as intended.
--&gt;
&lt;h2 id=&#34;在-kubernetes-v1-33-中的-pv-回收策略&#34;&gt;在 Kubernetes v1.33 中的 PV 回收策略&lt;/h2&gt;
&lt;p&gt;随着在 Kubernetes v1.33 中进阶至 GA，这个问题现在得到了解决。
Kubernetes 现在可靠地遵循配置的 &lt;code&gt;Delete&lt;/code&gt; 回收策略（即使在删除 PV
时，其绑定的 PVC 尚未被删除）。这是通过使用 Finalizer 来实现的，
确保存储后端如预期释放分配的存储资源。&lt;/p&gt;
&lt;!--
### How does it work?

For CSI volumes, the new behavior is achieved by adding a [finalizer](/docs/concepts/overview/working-with-objects/finalizers/) `external-provisioner.volume.kubernetes.io/finalizer`
on new and existing PVs. The finalizer is only removed after the storage from the backend is deleted. Addition or removal of finalizer is handled by `external-provisioner`

An example of a PV with the finalizer, notice the new finalizer in the finalizers list
--&gt;
&lt;h3 id=&#34;它是如何工作的&#34;&gt;它是如何工作的？&lt;/h3&gt;
&lt;p&gt;对于 CSI 卷，新的行为是通过在新创建和现有的 PV 上添加
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/overview/working-with-objects/finalizers/&#34;&gt;Finalizer&lt;/a&gt;
&lt;code&gt;external-provisioner.volume.kubernetes.io/finalizer&lt;/code&gt; 来实现的。
只有在后端存储被删除后，Finalizer 才会被移除。Finalizer 的添加和移除由
&lt;code&gt;external-provisioner&lt;/code&gt; 负责处理。&lt;/p&gt;
&lt;p&gt;下面是一个带 Finalizer 的 PV 示例，请注意 Finalizer 列表中的新 Finalizer：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;kubectl get pv pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53 -o yaml
&lt;/code&gt;&lt;/pre&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;PersistentVolume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;annotations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;pv.kubernetes.io/provisioned-by&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;csi.example.driver.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;creationTimestamp&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;2021-11-17T19:28:56Z&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;finalizers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- kubernetes.io/pv-protection&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- external-provisioner.volume.kubernetes.io/finalizer&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resourceVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;194711&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;uid&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;087f14f2-4157-4e95-8a70-8294b039d30e&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;accessModes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- ReadWriteOnce&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;capacity&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;storage&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;1Gi&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;claimRef&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;PersistentVolumeClaim&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;example-vanilla-block-pvc&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;default&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resourceVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;194677&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;uid&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;a7b7e3ba-f837-45ba-b243-dec7d8aaed53&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;csi&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;driver&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;csi.example.driver.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;fsType&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ext4&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeAttributes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;storage.kubernetes.io/csiProvisionerIdentity&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;1637110610497-8081&lt;/span&gt;-csi.example.driver.com&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;CNS Block Volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeHandle&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;2dacf297-803f-4ccc-afc7-3d3c3f02051e&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;persistentVolumeReclaimPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Delete&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;storageClassName&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;example-vanilla-block-sc&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeMode&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Filesystem&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;status&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;phase&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Bound&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
The [finalizer](/docs/concepts/overview/working-with-objects/finalizers/) prevents this
PersistentVolume from being removed from the
cluster. As stated previously, the finalizer is only removed from the PV object
after it is successfully deleted from the storage backend. To learn more about
finalizers, please refer to [Using Finalizers to Control Deletion](/blog/2021/05/14/using-finalizers-to-control-deletion/).

Similarly, the finalizer `kubernetes.io/pv-controller` is added to dynamically provisioned in-tree plugin volumes.
--&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/overview/working-with-objects/finalizers/&#34;&gt;Finalizer&lt;/a&gt;
可防止此 PersistentVolume 从集群中被移除。如前文所述，只有在卷从存储后端被成功删除后，
Finalizer 才会从 PV 对象中被移除。要进一步了解 Finalizer，
请参阅&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2021/05/14/using-finalizers-to-control-deletion/&#34;&gt;使用 Finalizer 控制删除&lt;/a&gt;。&lt;/p&gt;
&lt;p&gt;同样，Finalizer &lt;code&gt;kubernetes.io/pv-controller&lt;/code&gt; 也被添加到动态制备的树内插件卷中。&lt;/p&gt;
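&lt;p&gt;你可以用下面的命令检查某个 PV 上当前的 Finalizer 列表（其中的 PV 名称仅为示意）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;kubectl get pv example-pv -o jsonpath=&#39;{.metadata.finalizers}&#39;
&lt;/code&gt;&lt;/pre&gt;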
&lt;!--
### Important note

The fix does not apply to statically provisioned in-tree plugin volumes.
--&gt;
&lt;h3 id=&#34;重要提示&#34;&gt;重要提示&lt;/h3&gt;
&lt;p&gt;此修复不适用于静态制备的树内插件卷。&lt;/p&gt;
&lt;!--
## How to enable new behavior?

To take advantage of the new behavior, you must have upgraded your cluster to the v1.33 release of Kubernetes
and run the CSI [`external-provisioner`](https://github.com/kubernetes-csi/external-provisioner) version `5.0.1` or later.
The feature was released as beta in v1.31 release of Kubernetes, where it was enabled by default.
--&gt;
&lt;h2 id=&#34;如何启用新行为&#34;&gt;如何启用新行为？&lt;/h2&gt;
&lt;p&gt;要利用新行为，你必须将集群升级到 Kubernetes 的 v1.33 版本，
并运行 CSI &lt;a href=&#34;https://github.com/kubernetes-csi/external-provisioner&#34;&gt;&lt;code&gt;external-provisioner&lt;/code&gt;&lt;/a&gt;
5.0.1 或更新版本。
此特性在 Kubernetes 的 v1.31 版本中作为 Beta 版发布，并且默认启用。&lt;/p&gt;
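&lt;p&gt;要确认集群中运行的 external-provisioner 版本，你可以列出各 Pod 中的容器镜像并查找
csi-provisioner 边车（实际的镜像名称和部署位置取决于你所使用的 CSI 驱动，以下命令仅为示意）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# 列出所有容器镜像，筛选出 csi-provisioner 边车，检查其标签是否不低于 v5.0.1
kubectl get pods -A -o jsonpath=&#39;{.items[*].spec.containers[*].image}&#39; | tr &#39; &#39; &#39;\n&#39; | grep csi-provisioner
&lt;/code&gt;&lt;/pre&gt;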
&lt;!--
## References

* [KEP-2644](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2644-honor-pv-reclaim-policy)
* [Volume leak issue](https://github.com/kubernetes-csi/external-provisioner/issues/546)
* [Beta Release Blog](/blog/2024/08/16/kubernetes-1-31-prevent-persistentvolume-leaks-when-deleting-out-of-order/)
--&gt;
&lt;h2 id=&#34;参考&#34;&gt;参考&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2644-honor-pv-reclaim-policy&#34;&gt;KEP-2644&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes-csi/external-provisioner/issues/546&#34;&gt;卷泄漏问题&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/08/16/kubernetes-1-31-prevent-persistentvolume-leaks-when-deleting-out-of-order/&#34;&gt;Beta 版发布博客&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## How do I get involved?

The Kubernetes Slack channel [SIG Storage communication channels](https://github.com/kubernetes/community/blob/master/sig-storage/README.md#contact) are great mediums to reach out to the SIG Storage and migration working group teams.

Special thanks to the following people for the insightful reviews, thorough consideration and valuable contribution:
--&gt;
&lt;h2 id=&#34;如何参与&#34;&gt;如何参与？&lt;/h2&gt;
&lt;p&gt;Kubernetes Slack 频道等
&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-storage/README.md#contact&#34;&gt;SIG Storage 交流渠道&lt;/a&gt;是联系
SIG Storage 和迁移工作组团队的绝佳途径。&lt;/p&gt;
&lt;p&gt;特别感谢以下人员的深入审查、细致考虑和宝贵贡献：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fan Baofa (carlory)&lt;/li&gt;
&lt;li&gt;Jan Šafránek (jsafrane)&lt;/li&gt;
&lt;li&gt;Xing Yang (xing-yang)&lt;/li&gt;
&lt;li&gt;Matthew Wong (wongma7)&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Join the [Kubernetes Storage Special Interest Group (SIG)](https://github.com/kubernetes/community/tree/master/sig-storage) if you&#39;re interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system. We’re rapidly growing and always welcome new contributors.
--&gt;
&lt;p&gt;如果你对 CSI 或 Kubernetes 存储系统的任何部分的设计和开发感兴趣，
可以加入 &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-storage&#34;&gt;Kubernetes 存储特别兴趣小组（SIG）&lt;/a&gt;。
我们正在迅速成长，并且总是欢迎新的贡献者。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33：可变的 CSI 节点可分配数</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/02/kubernetes-1-33-mutable-csi-node-allocatable-count/</link>
      <pubDate>Fri, 02 May 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/05/02/kubernetes-1-33-mutable-csi-node-allocatable-count/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.33: Mutable CSI Node Allocatable Count&#34;
date: 2025-05-02T10:30:00-08:00
slug: kubernetes-1-33-mutable-csi-node-allocatable-count
author: Eddie Torres (Amazon Web Services)
--&gt;
&lt;!--
Scheduling stateful applications reliably depends heavily on accurate information about resource availability on nodes.
Kubernetes v1.33 introduces an alpha feature called *mutable CSI node allocatable count*, allowing Container Storage Interface (CSI) drivers to dynamically update the reported maximum number of volumes that a node can handle.
This capability significantly enhances the accuracy of pod scheduling decisions and reduces scheduling failures caused by outdated volume capacity information.
--&gt;
&lt;p&gt;可靠地调度有状态应用在很大程度上依赖于节点上资源可用性的准确信息。
Kubernetes v1.33 引入了一个名为&lt;strong&gt;可变的 CSI 节点可分配计数&lt;/strong&gt;的 Alpha 特性，允许
CSI（容器存储接口）驱动动态更新所报告的节点可处理的最大卷数。
这一能力显著提升了 Pod 调度决策的准确性，并减少因卷容量信息过时而导致的调度失败。&lt;/p&gt;
&lt;!--
## Background

Traditionally, Kubernetes CSI drivers report a static maximum volume attachment limit when initializing. However, actual attachment capacities can change during a node&#39;s lifecycle for various reasons, such as:

- Manual or external operations attaching/detaching volumes outside of Kubernetes control.
- Dynamically attached network interfaces or specialized hardware (GPUs, NICs, etc.) consuming available slots.
- Multi-driver scenarios, where one CSI driver’s operations affect available capacity reported by another.
--&gt;
&lt;h2 id=&#34;background&#34;&gt;背景  &lt;/h2&gt;
&lt;p&gt;传统上，Kubernetes 中的 CSI 驱动在初始化时会报告一个静态的最大卷挂接限制。
然而，在节点生命周期内，实际的挂接容量可能会由于多种原因发生变化，例如：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;在 Kubernetes 控制之外的手动或外部操作挂接/解除挂接卷。&lt;/li&gt;
&lt;li&gt;动态挂接的网络接口或专用硬件（如 GPU、NIC 等）占用可用的插槽。&lt;/li&gt;
&lt;li&gt;在多驱动场景中，一个 CSI 驱动的操作会影响另一个驱动所报告的可用容量。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Static reporting can cause Kubernetes to schedule pods onto nodes that appear to have capacity but don&#39;t, leading to pods stuck in a `ContainerCreating` state.

## Dynamically adapting CSI volume limits

With the new feature gate `MutableCSINodeAllocatableCount`, Kubernetes enables CSI drivers to dynamically adjust and report node attachment capacities at runtime. This ensures that the scheduler has the most accurate, up-to-date view of node capacity.
--&gt;
&lt;p&gt;静态报告可能导致 Kubernetes 将 Pod 调度到看似有容量但实际没有的节点上，进而造成
Pod 长时间卡在 &lt;code&gt;ContainerCreating&lt;/code&gt; 状态。&lt;/p&gt;
&lt;h2 id=&#34;dynamically-adapting-csi-volume-limits&#34;&gt;动态适应 CSI 卷限制  &lt;/h2&gt;
&lt;p&gt;借助新的特性门控 &lt;code&gt;MutableCSINodeAllocatableCount&lt;/code&gt;，Kubernetes 允许 CSI
驱动在运行时动态调整并报告节点的挂接容量。如此确保调度器能获取到最准确、最新的节点容量信息。&lt;/p&gt;
&lt;!--
### How it works

When this feature is enabled, Kubernetes supports two mechanisms for updating the reported node volume limits:

- **Periodic Updates:** CSI drivers specify an interval to periodically refresh the node&#39;s allocatable capacity.
- **Reactive Updates:** An immediate update triggered when a volume attachment fails due to exhausted resources (`ResourceExhausted` error).
--&gt;
&lt;h3 id=&#34;how-it-works&#34;&gt;工作原理  &lt;/h3&gt;
&lt;p&gt;启用此特性后，Kubernetes 支持通过以下两种机制来更新节点卷限制的报告值：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;周期性更新：&lt;/strong&gt; CSI 驱动指定一个间隔时间，来定期刷新节点的可分配容量。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;响应式更新：&lt;/strong&gt; 当因资源耗尽（&lt;code&gt;ResourceExhausted&lt;/code&gt; 错误）导致卷挂接失败时，立即触发更新。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
### Enabling the feature

To use this alpha feature, you must enable the `MutableCSINodeAllocatableCount` feature gate in these components:
--&gt;
&lt;h3 id=&#34;enabling-the-feature&#34;&gt;启用此特性  &lt;/h3&gt;
&lt;p&gt;要使用此 Alpha 特性，你必须在以下组件中启用 &lt;code&gt;MutableCSINodeAllocatableCount&lt;/code&gt; 特性门控：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kube-apiserver&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubelet&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
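例如，在直接管理组件启动参数的集群中，可以按如下方式传递特性门控（示意片段，具体的启动方式取决于你的集群部署工具）：

```shell
# 示意：启动参数的传递方式取决于你的集群部署工具
# 在 kube-apiserver 上启用该特性门控
kube-apiserver --feature-gates=MutableCSINodeAllocatableCount=true ...

# 在每个节点的 kubelet 上启用该特性门控
kubelet --feature-gates=MutableCSINodeAllocatableCount=true ...
```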
&lt;!--
### Example CSI driver configuration

Below is an example of configuring a CSI driver to enable periodic updates every 60 seconds:
--&gt;
&lt;h3 id=&#34;example-csi-driver-configuration&#34;&gt;CSI 驱动配置示例  &lt;/h3&gt;
&lt;p&gt;以下是配置 CSI 驱动以每 60 秒进行一次周期性更新的示例：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;storage.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;CSIDriver&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;example.csi.k8s.io&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeAllocatableUpdatePeriodSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;60&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
This configuration directs Kubelet to periodically call the CSI driver&#39;s `NodeGetInfo` method every 60 seconds, updating the node’s allocatable volume count. Kubernetes enforces a minimum update interval of 10 seconds to balance accuracy and resource usage.
--&gt;
&lt;p&gt;此配置会指示 Kubelet 每 60 秒调用一次 CSI 驱动的 &lt;code&gt;NodeGetInfo&lt;/code&gt; 方法，从而更新节点的可分配卷数量。
Kubernetes 强制要求最小更新间隔时间为 10 秒，以平衡准确性和资源使用量。&lt;/p&gt;
&lt;!--
### Immediate updates on attachment failures

In addition to periodic updates, Kubernetes now reacts to attachment failures. Specifically, if a volume attachment fails with a `ResourceExhausted` error (gRPC code `8`), an immediate update is triggered to correct the allocatable count promptly.

This proactive correction prevents repeated scheduling errors and helps maintain cluster health.
--&gt;
&lt;h3 id=&#34;immediate-updates-on-attachment-failures&#34;&gt;挂接失败时的即时更新  &lt;/h3&gt;
&lt;p&gt;除了周期性更新外，Kubernetes 现在也能对挂接失败做出响应。
具体来说，如果卷挂接由于 &lt;code&gt;ResourceExhausted&lt;/code&gt; 错误（gRPC 错误码 &lt;code&gt;8&lt;/code&gt;）而失败，将立即触发更新，以快速纠正可分配数量。&lt;/p&gt;
&lt;p&gt;这种主动纠正可以防止重复的调度错误，有助于保持集群的健康状态。&lt;/p&gt;
&lt;!--
## Getting started

To experiment with mutable CSI node allocatable count in your Kubernetes v1.33 cluster:

1. Enable the feature gate `MutableCSINodeAllocatableCount` on the `kube-apiserver` and `kubelet` components.
2. Update your CSI driver configuration by setting `nodeAllocatableUpdatePeriodSeconds`.
3. Monitor and observe improvements in scheduling accuracy and pod placement reliability.
--&gt;
&lt;h2 id=&#34;getting-started&#34;&gt;快速开始   &lt;/h2&gt;
&lt;p&gt;要在 Kubernetes v1.33 集群中试用可变的 CSI 节点可分配计数特性：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;在 &lt;code&gt;kube-apiserver&lt;/code&gt; 和 &lt;code&gt;kubelet&lt;/code&gt; 组件上启用特性门控 &lt;code&gt;MutableCSINodeAllocatableCount&lt;/code&gt;。&lt;/li&gt;
&lt;li&gt;在 CSI 驱动配置中设置 &lt;code&gt;nodeAllocatableUpdatePeriodSeconds&lt;/code&gt;。&lt;/li&gt;
&lt;li&gt;监控并观察调度准确性和 Pod 放置可靠性的提升程度。&lt;/li&gt;
&lt;/ol&gt;
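驱动所报告的可分配卷数量会反映在对应的 CSINode 对象中。你可以通过类似下面的命令观察其是否随时间更新（示意命令，请将节点名替换为你集群中的真实节点）：

```shell
# 示意：列出某节点上各 CSI 驱动报告的可分配卷数量（请替换 <node-name>）
kubectl get csinode <node-name> \
  -o jsonpath='{range .spec.drivers[*]}{.name}{": "}{.allocatable.count}{"\n"}{end}'
```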
&lt;!--
## Next steps

This feature is currently in alpha and the Kubernetes community welcomes your feedback. Test it, share your experiences, and help guide its evolution toward beta and GA stability.

Join discussions in the [Kubernetes Storage Special Interest Group (SIG-Storage)](https://github.com/kubernetes/community/tree/master/sig-storage) to shape the future of Kubernetes storage capabilities.
--&gt;
&lt;h2 id=&#34;next-steps&#34;&gt;后续计划  &lt;/h2&gt;
&lt;p&gt;此特性目前处于 Alpha 阶段，Kubernetes 社区欢迎你的反馈。
无论是参与测试还是分享你的经验，都有助于推动此特性向 Beta 和 GA（正式发布）稳定版迈进。&lt;/p&gt;
&lt;p&gt;欢迎加入 &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-storage&#34;&gt;Kubernetes SIG-Storage&lt;/a&gt;
的讨论，共同塑造 Kubernetes 存储能力的未来。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33：存储动态制备模式下的节点存储容量评分（Alpha 版）</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/04/30/kubernetes-v1-33-storage-capacity-scoring-feature/</link>
      <pubDate>Wed, 30 Apr 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/04/30/kubernetes-v1-33-storage-capacity-scoring-feature/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.33: Storage Capacity Scoring of Nodes for Dynamic Provisioning (alpha)&#34;
date: 2025-04-30T10:30:00-08:00
slug: kubernetes-v1-33-storage-capacity-scoring-feature
author: &gt;
  Yuma Ogami (Cybozu)
--&gt;
&lt;!--
Kubernetes v1.33 introduces a new alpha feature called `StorageCapacityScoring`. This feature adds a scoring method for pod scheduling
with [the topology-aware volume provisioning](/blog/2018/10/11/topology-aware-volume-provisioning-in-kubernetes/).
This feature eases to schedule pods on nodes with either the most or least available storage capacity.
--&gt;
&lt;p&gt;Kubernetes v1.33 引入了一个名为 &lt;code&gt;StorageCapacityScoring&lt;/code&gt; 的新 Alpha 特性。
此特性为使用&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2018/10/11/topology-aware-volume-provisioning-in-kubernetes/&#34;&gt;拓扑感知卷制备&lt;/a&gt;的
Pod 调度添加了一种评分方法。
此特性使得将 Pod 调度到可用存储容量最多或最少的节点上变得更加容易。&lt;/p&gt;
&lt;!--
## About this feature

This feature extends the kube-scheduler&#39;s VolumeBinding plugin to perform scoring using node storage capacity information
obtained from [Storage Capacity](/docs/concepts/storage/storage-capacity/). Currently, you can only filter out nodes with insufficient storage capacity.
So, you have to use a scheduler extender to achieve storage-capacity-based pod scheduling.
--&gt;
&lt;h2 id=&#34;about-this-feature&#34;&gt;关于此特性  &lt;/h2&gt;
&lt;p&gt;此特性扩展了 kube-scheduler 的 VolumeBinding 插件，
以使用从&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/storage/storage-capacity/&#34;&gt;存储容量&lt;/a&gt;获得的节点存储容量信息进行评分。
目前，你只能过滤掉存储容量不足的节点。因此，你必须使用调度器扩展程序来实现基于存储容量的 Pod 调度。&lt;/p&gt;
&lt;!--
This feature is useful for provisioning node-local PVs, which have size limits based on the node&#39;s storage capacity. By using this feature,
you can assign the PVs to the nodes with the most available storage space so that you can expand the PVs later as much as possible.

In another use case, you might want to reduce the number of nodes as much as possible for low operation costs in cloud environments by choosing
the least storage capacity node. This feature helps maximize resource utilization by filling up nodes more sequentially, starting with the most
utilized nodes first that still have enough storage capacity for the requested volume size.
--&gt;
&lt;p&gt;此特性对于制备节点本地的 PV 非常有用，这些 PV 的大小限制取决于节点的存储容量。
通过使用此特性，你可以将 PV 指派给具有最多可用存储空间的节点，
以便以后尽可能多地扩展 PV。&lt;/p&gt;
&lt;p&gt;在另一个用例中，你可能希望通过选择可用存储容量最少的节点，
在云环境中尽可能减少节点数量以降低运维成本。
此特性会从利用率最高、且仍有足够存储容量满足所请求卷大小的节点开始，
按顺序填充节点，从而帮助最大化资源利用率。&lt;/p&gt;
&lt;!--
## How to use

### Enabling the feature

In the alpha phase, `StorageCapacityScoring` is disabled by default. To use this feature, add `StorageCapacityScoring=true`
to the kube-scheduler command line option `--feature-gates`.
--&gt;
&lt;h2 id=&#34;how-to-use&#34;&gt;如何使用  &lt;/h2&gt;
&lt;h3 id=&#34;enabling-the-feature&#34;&gt;启用此特性  &lt;/h3&gt;
&lt;p&gt;在 Alpha 阶段，&lt;code&gt;StorageCapacityScoring&lt;/code&gt; 默认是禁用的。要使用此特性，请将
&lt;code&gt;StorageCapacityScoring=true&lt;/code&gt; 添加到 kube-scheduler 命令行选项
&lt;code&gt;--feature-gates&lt;/code&gt; 中。&lt;/p&gt;
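例如（示意片段，实际的启动参数传递方式取决于你的集群部署工具）：

```shell
# 示意：在 kube-scheduler 上启用 StorageCapacityScoring 特性门控
kube-scheduler --feature-gates=StorageCapacityScoring=true ...
```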
&lt;!--
### Configuration changes

You can configure node priorities based on storage utilization using the `shape` parameter in the VolumeBinding plugin configuration.
This allows you to prioritize nodes with higher available storage capacity (default) or, conversely, nodes with lower available storage capacity.
For example, to prioritize lower available storage capacity, configure `KubeSchedulerConfiguration` as follows:
--&gt;
&lt;h3 id=&#34;configuration-changes&#34;&gt;配置更改  &lt;/h3&gt;
&lt;p&gt;你可以使用 VolumeBinding 插件配置中的 &lt;code&gt;shape&lt;/code&gt; 参数，根据存储利用率来配置节点优先级。
这允许你优先考虑具有更高可用存储容量（默认）的节点，或者相反，优先考虑具有更低可用存储容量的节点。
例如，要优先考虑更低的可用存储容量，请按如下方式配置 &lt;code&gt;KubeSchedulerConfiguration&lt;/code&gt;：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;kubescheduler.config.k8s.io/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;KubeSchedulerConfiguration&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;profiles&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;pluginConfig&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;VolumeBinding&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;args&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;shape&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;utilization&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;score&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;utilization&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;100&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;score&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;10&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
For more details, please refer to the [documentation](/docs/reference/config-api/kube-scheduler-config.v1/#kubescheduler-config-k8s-io-v1-VolumeBindingArgs).
--&gt;
&lt;p&gt;详情请参阅&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/config-api/kube-scheduler-config.v1/#kubescheduler-config-k8s-io-v1-VolumeBindingArgs&#34;&gt;文档&lt;/a&gt;。&lt;/p&gt;
&lt;!--
## Further reading
--&gt;
&lt;h2 id=&#34;further-reading&#34;&gt;进一步阅读  &lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/4049-storage-capacity-scoring-of-nodes-for-dynamic-provisioning/README.md&#34;&gt;KEP-4049: Storage Capacity Scoring of Nodes for Dynamic Provisioning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Additional note: Relationship with VolumeCapacityPriority

The alpha feature gate `VolumeCapacityPriority`, which performs node scoring based on available storage capacity during static provisioning,
will be deprecated and replaced by `StorageCapacityScoring`.
--&gt;
&lt;h2 id=&#34;附加说明-与-volumecapacitypriority-的关系&#34;&gt;附加说明：与 VolumeCapacityPriority 的关系&lt;/h2&gt;
&lt;p&gt;在静态制备期间基于可用存储容量进行节点评分的 Alpha 特性门控
&lt;code&gt;VolumeCapacityPriority&lt;/code&gt; 将被弃用，并由 &lt;code&gt;StorageCapacityScoring&lt;/code&gt; 替代。&lt;/p&gt;
&lt;!--
Please note that while `VolumeCapacityPriority` prioritizes nodes with lower available storage capacity by default,
`StorageCapacityScoring` prioritizes nodes with higher available storage capacity by default.
--&gt;
&lt;p&gt;请注意，虽然 &lt;code&gt;VolumeCapacityPriority&lt;/code&gt; 默认优先考虑可用存储容量较低的节点，
但 &lt;code&gt;StorageCapacityScoring&lt;/code&gt; 默认优先考虑可用存储容量较高的节点。&lt;/p&gt;
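若要显式地写出与 &lt;code&gt;StorageCapacityScoring&lt;/code&gt; 默认方向（优先考虑可用存储容量较高的节点）一致的 &lt;code&gt;shape&lt;/code&gt; 配置，大致如下（示意片段，仅展示 VolumeBinding 插件参数部分，具体默认取值请以官方配置文档为准）：

```yaml
# 示意：利用率越低（可用容量越高），得分越高
pluginConfig:
- name: VolumeBinding
  args:
    shape:
    - utilization: 0
      score: 10
    - utilization: 100
      score: 0
```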

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33：镜像卷进阶至 Beta！</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/04/29/kubernetes-v1-33-image-volume-beta/</link>
      <pubDate>Tue, 29 Apr 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/04/29/kubernetes-v1-33-image-volume-beta/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.33: Image Volumes graduate to beta!&#34;
date: 2025-04-29T10:30:00-08:00
slug: kubernetes-v1-33-image-volume-beta
author: Sascha Grunert (Red Hat)
--&gt;
&lt;!--
[Image Volumes](/blog/2024/08/16/kubernetes-1-31-image-volume-source) were
introduced as an Alpha feature with the Kubernetes v1.31 release as part of
[KEP-4639](https://github.com/kubernetes/enhancements/issues/4639). In Kubernetes v1.33, this feature graduates to **beta**.
--&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/08/16/kubernetes-1-31-image-volume-source&#34;&gt;镜像卷&lt;/a&gt;作为
&lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/4639&#34;&gt;KEP-4639&lt;/a&gt;
的一部分，在 Kubernetes v1.31 版本中作为 Alpha 特性引入。
在 Kubernetes v1.33 中，此特性进阶至 &lt;strong&gt;Beta&lt;/strong&gt;。&lt;/p&gt;
&lt;!--
Please note that the feature is still _disabled_ by default, because not all
[container runtimes](/docs/setup/production-environment/container-runtimes) have
full support for it. [CRI-O](https://cri-o.io) supports the initial feature since version v1.31 and
will add support for Image Volumes as beta in v1.33.
[containerd merged](https://github.com/containerd/containerd/pull/10579) support
for the alpha feature which will be part of the v2.1.0 release and is working on
beta support as part of [PR #11578](https://github.com/containerd/containerd/pull/11578).
--&gt;
&lt;p&gt;请注意，此特性目前仍默认&lt;strong&gt;禁用&lt;/strong&gt;，
因为并非所有的&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/setup/production-environment/container-runtimes&#34;&gt;容器运行时&lt;/a&gt;都完全支持此特性。
&lt;a href=&#34;https://cri-o.io&#34;&gt;CRI-O&lt;/a&gt; 自 v1.31 起就支持此初始特性，并将在 v1.33 中添加对镜像卷的 Beta 支持。
&lt;a href=&#34;https://github.com/containerd/containerd/pull/10579&#34;&gt;containerd 已合并&lt;/a&gt;对 Alpha 特性的支持，
此特性将包含在 containerd v2.1.0 版本中，并正通过
&lt;a href=&#34;https://github.com/containerd/containerd/pull/11578&#34;&gt;PR #11578&lt;/a&gt; 实现对 Beta 的支持。&lt;/p&gt;
&lt;!--
### What&#39;s new

The major change for the beta graduation of Image Volumes is the support for
[`subPath`](/docs/concepts/storage/volumes/#using-subpath) and
[`subPathExpr`](/docs/concepts/storage/volumes/#using-subpath-expanded-environment) mounts
for containers via `spec.containers[*].volumeMounts.[subPath,subPathExpr]`. This
allows end-users to mount a certain subdirectory of an image volume, which is
still mounted as readonly (`noexec`). This means that non-existing
subdirectories cannot be mounted by default. As for other `subPath` and
`subPathExpr` values, Kubernetes will ensure that there are no absolute path or
relative path components part of the specified sub path. Container runtimes are
also required to double check those requirements for safety reasons. If a
specified subdirectory does not exist within a volume, then runtimes should fail
on container creation and provide user feedback by using existing kubelet
events.
--&gt;
&lt;h3 id=&#34;whats-new&#34;&gt;新增内容  &lt;/h3&gt;
&lt;p&gt;镜像卷进阶为 Beta 的主要变化是支持通过 &lt;code&gt;spec.containers[*].volumeMounts.[subPath,subPathExpr]&lt;/code&gt;
配置容器的 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/storage/volumes/#using-subpath&#34;&gt;&lt;code&gt;subPath&lt;/code&gt;&lt;/a&gt; 和
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/storage/volumes/#using-subpath-expanded-environment&#34;&gt;&lt;code&gt;subPathExpr&lt;/code&gt;&lt;/a&gt; 挂载。
这允许最终用户挂载镜像卷中的某个子目录，该子目录仍以只读（&lt;code&gt;noexec&lt;/code&gt;）方式挂载。
这意味着默认情况下无法挂载不存在的子目录。与其他 &lt;code&gt;subPath&lt;/code&gt; 和 &lt;code&gt;subPathExpr&lt;/code&gt; 取值一样，
Kubernetes 将确保所指定的子路径中不包含绝对路径或相对路径成分。
出于安全考虑，容器运行时也需要再次验证这些要求。如果指定的子目录在卷中不存在，
则运行时应在创建容器时失败，并通过现有的 kubelet 事件向用户提供反馈。&lt;/p&gt;
&lt;!--
Besides that, there are also three new kubelet metrics available for image volumes:

- `kubelet_image_volume_requested_total`: Outlines the number of requested image volumes.
- `kubelet_image_volume_mounted_succeed_total`: Counts the number of successful image volume mounts.
- `kubelet_image_volume_mounted_errors_total`: Accounts the number of failed image volume mounts.
--&gt;
&lt;p&gt;除此之外，还为镜像卷新增三个 kubelet 指标：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kubelet_image_volume_requested_total&lt;/code&gt;：统计请求镜像卷的数量。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubelet_image_volume_mounted_succeed_total&lt;/code&gt;：统计镜像卷成功挂载的数量。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubelet_image_volume_mounted_errors_total&lt;/code&gt;：统计镜像卷挂载失败的数量。&lt;/li&gt;
&lt;/ul&gt;
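这些指标通过 kubelet 的 metrics 端点暴露。例如，可以通过 API 服务器代理抓取并过滤出镜像卷相关的条目（示意命令，请将节点名替换为你集群中的真实节点）：

```shell
# 示意：通过 API 服务器代理读取某节点 kubelet 的指标（请替换 <node-name>）
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" | grep kubelet_image_volume
```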
&lt;!--
To use an existing subdirectory for a specific image volume, just use it as
[`subPath`](/docs/concepts/storage/volumes/#using-subpath) (or
[`subPathExpr`](/docs/concepts/storage/volumes/#using-subpath-expanded-environment))
value of the containers `volumeMounts`:
--&gt;
&lt;p&gt;若要为特定镜像卷使用已有的子目录，只需将其用作容器 &lt;code&gt;volumeMounts&lt;/code&gt; 的
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/storage/volumes/#using-subpath&#34;&gt;&lt;code&gt;subPath&lt;/code&gt;&lt;/a&gt;
或 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/storage/volumes/#using-subpath-expanded-environment&#34;&gt;&lt;code&gt;subPathExpr&lt;/code&gt;&lt;/a&gt;
取值：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;image-volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;shell&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;sleep&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;infinity&amp;#34;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;debian&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeMounts&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;mountPath&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;/volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;subPath&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;dir&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumes&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;volume&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;reference&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;quay.io/crio/artifact:v2&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;pullPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;IfNotPresent&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
Then, create the pod on your cluster:
--&gt;
&lt;p&gt;然后，在集群中创建 Pod：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl apply -f image-volumes-subpath.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
Now you can attach to the container:
--&gt;
&lt;p&gt;现在你可以连接到容器：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl attach -it image-volume bash
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
And check the content of the file from the `dir` sub path in the volume:
--&gt;
&lt;p&gt;并查看卷中 &lt;code&gt;dir&lt;/code&gt; 子路径下的文件内容：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cat /volume/file
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
The output will be similar to:
--&gt;
&lt;p&gt;输出将类似于：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-none&#34; data-lang=&#34;none&#34;&gt;1
&lt;/code&gt;&lt;/pre&gt;&lt;!--
Thank you for reading through the end of this blog post! SIG Node is proud and
happy to deliver this feature graduation as part of Kubernetes v1.33.

As writer of this blog post, I would like to emphasize my special thanks to
**all** involved individuals out there!
--&gt;
&lt;p&gt;感谢你读完本博文！SIG Node 非常自豪和高兴地在 Kubernetes v1.33 中完成此特性的进阶。&lt;/p&gt;
&lt;p&gt;作为本文作者，我要特别感谢参与开发此特性的&lt;strong&gt;所有人&lt;/strong&gt;！&lt;/p&gt;
&lt;!--
If you would like to provide feedback or suggestions feel free to reach out
to SIG Node using the [Kubernetes Slack (#sig-node)](https://kubernetes.slack.com/messages/sig-node)
channel or the [SIG Node mailing list](https://groups.google.com/g/kubernetes-sig-node).
--&gt;
&lt;p&gt;如果你有任何反馈或建议，欢迎通过
&lt;a href=&#34;https://kubernetes.slack.com/messages/sig-node&#34;&gt;Kubernetes Slack (#sig-node)&lt;/a&gt;
频道或 &lt;a href=&#34;https://groups.google.com/g/kubernetes-sig-node&#34;&gt;SIG Node 邮件列表&lt;/a&gt;与 SIG Node 团队联系。&lt;/p&gt;
&lt;!--
## Further reading

- [Use an Image Volume With a Pod](/docs/tasks/configure-pod-container/image-volumes)
- [`image` volume overview](/docs/concepts/storage/volumes/#image)
--&gt;
&lt;h2 id=&#34;further-reading&#34;&gt;进一步阅读  &lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/configure-pod-container/image-volumes&#34;&gt;Pod 使用镜像卷&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/storage/volumes/#image&#34;&gt;&lt;code&gt;image&lt;/code&gt; 卷概览&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33：HorizontalPodAutoscaler 可配置容差</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/04/28/kubernetes-v1-33-hpa-configurable-tolerance/</link>
      <pubDate>Mon, 28 Apr 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/04/28/kubernetes-v1-33-hpa-configurable-tolerance/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.33: HorizontalPodAutoscaler Configurable Tolerance&#34;
slug: kubernetes-v1-33-hpa-configurable-tolerance
math: true # for formulae
date: 2025-04-28T10:30:00-08:00
author: &#34;Jean-Marc François (Google)&#34;
--&gt;
&lt;!--
This post describes _configurable tolerance for horizontal Pod autoscaling_,
a new alpha feature first available in Kubernetes 1.33.
--&gt;
&lt;p&gt;这篇文章描述了&lt;strong&gt;水平 Pod 自动扩缩的可配置容差&lt;/strong&gt;，
这是在 Kubernetes 1.33 中首次出现的一个新的 Alpha 特性。&lt;/p&gt;
&lt;!--
## What is it?

[Horizontal Pod Autoscaling](/docs/tasks/run-application/horizontal-pod-autoscale/)
is a well-known Kubernetes feature that allows your workload to
automatically resize by adding or removing replicas based on resource
utilization.
--&gt;
&lt;h2 id=&#34;它是什么&#34;&gt;它是什么？&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/run-application/horizontal-pod-autoscale/&#34;&gt;水平 Pod 自动扩缩&lt;/a&gt;
是 Kubernetes 中一个众所周知的特性，它允许你的工作负载根据资源利用率自动增减副本数量。&lt;/p&gt;
&lt;!--
Let&#39;s say you have a web application running in a Kubernetes cluster with 50
replicas. You configure the Horizontal Pod Autoscaler (HPA) to scale based on
CPU utilization, with a target of 75% utilization. Now, imagine that the current
CPU utilization across all replicas is 90%, which is higher than the desired
75%. The HPA will calculate the required number of replicas using the formula:
--&gt;
&lt;p&gt;假设你在 Kubernetes 集群中运行了一个具有 50 个副本的 Web 应用程序。
你配置了 Horizontal Pod Autoscaler（HPA）根据 CPU 利用率进行扩缩，
目标利用率为 75%。现在，假设所有副本的当前 CPU 利用率为 90%，
高于预期的 75%。HPA 将使用以下公式计算所需的副本数量：&lt;/p&gt;

&lt;div class=&#34;math&#34;&gt;$$desiredReplicas = \left\lceil currentReplicas \times \frac{currentMetricValue}{desiredMetricValue} \right\rceil$$&lt;/div&gt;&lt;!--
In this example:
--&gt;
&lt;p&gt;在此示例中：&lt;/p&gt;

&lt;div class=&#34;math&#34;&gt;$$50 \times (90/75) = 60$$&lt;/div&gt;&lt;!--
So, the HPA will increase the number of replicas from 50 to 60 to reduce the
load on each pod. Similarly, if the CPU utilization were to drop below 75%, the
HPA would scale down the number of replicas accordingly. The Kubernetes
documentation provides a
[detailed description of the scaling algorithm](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details).
--&gt;
&lt;p&gt;因此，HPA 会将副本数量从 50 个增加到 60 个，以降低每个 Pod 的负载。
同样，如果 CPU 利用率降至 75% 以下，HPA 会相应地减少副本数量。
Kubernetes 文档提供了&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details&#34;&gt;扩缩算法的详细描述&lt;/a&gt;。&lt;/p&gt;
&lt;!--
In order to avoid replicas being created or deleted whenever a small metric
fluctuation occurs, Kubernetes applies a form of hysteresis: it only changes the
number of replicas when the current and desired metric values differ by more
than 10%. In the example above, since the ratio between the current and desired
metric values is \\(90/75\\), or 20% above target, exceeding the 10% tolerance,
the scale-up action will proceed.
--&gt;
&lt;p&gt;为了避免指标出现小幅波动时就创建或删除副本，
Kubernetes 应用了一种迟滞机制：仅当当前指标值与期望指标值相差超过 10% 时，
才会改变副本数量。在上面的例子中，当前与期望指标值之比为 \(90/75\)，
即超出目标 20%，超过了 10% 的容差，因此扩容操作将继续进行。&lt;/p&gt;
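上述公式和 10% 默认容差的判定逻辑，可以用下面这段 Python 小脚本来示意（这只是帮助理解算法的简化草图，并非 HPA 的实际实现；函数名 `desired_replicas` 为假设）：

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     desired_metric: float,
                     tolerance: float = 0.1) -> int:
    """简化示意 HPA 的副本数计算：指标比率偏离 1 超过容差时才扩缩。"""
    ratio = current_metric / desired_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # 处于容差范围内，不做扩缩
    return math.ceil(current_replicas * ratio)

# 博文中的例子：50 个副本，当前利用率 90%，目标 75%
print(desired_replicas(50, 90, 75))  # 比率 1.2，超出 10% 容差 -> 60
# 若当前利用率为 80%，比率约 1.067，处于 10% 容差内 -> 保持 50
print(desired_replicas(50, 80, 75))
```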
&lt;!--
This default tolerance of 10% is cluster-wide; in older Kubernetes releases, it
could not be fine-tuned. It&#39;s a suitable value for most usage, but too coarse
for large deployments, where a 10% tolerance represents tens of pods. As a
result, the community has long
[asked](https://github.com/kubernetes/kubernetes/issues/116984) to be able to
tune this value.

In Kubernetes v1.33, this is now possible.
--&gt;
&lt;p&gt;这个 10% 的默认容差是集群范围的；在旧版本的 Kubernetes 中，
它无法进行微调。对于大多数使用场景来说，这是一个合适的值，
但对于大型部署而言则过于粗糙，因为 10% 的容差代表着数十个 Pod。
因此，社区长期以来&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/116984&#34;&gt;要求&lt;/a&gt;能够调整这个值。&lt;/p&gt;
&lt;p&gt;在 Kubernetes v1.33 中，现在这已成为可能。&lt;/p&gt;
&lt;!--
## How do I use it?

After enabling the `HPAConfigurableTolerance`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) in
your Kubernetes v1.33 cluster, you can add your desired tolerance for your
HorizontalPodAutoscaler object.
--&gt;
&lt;h2 id=&#34;我如何使用它&#34;&gt;我如何使用它？&lt;/h2&gt;
&lt;p&gt;在你的 Kubernetes v1.33 集群中启用 &lt;code&gt;HPAConfigurableTolerance&lt;/code&gt;
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/command-line-tools-reference/feature-gates/&#34;&gt;特性门控&lt;/a&gt;后，
你可以为你的 HorizontalPodAutoscaler 对象添加期望的容差。&lt;/p&gt;
&lt;!--
Tolerances appear under the `spec.behavior.scaleDown` and
`spec.behavior.scaleUp` fields and can thus be different for scale up and scale
down. A typical usage would be to specify a small tolerance on scale up (to
react quickly to spikes), but higher on scale down (to avoid adding and removing
replicas too quickly in response to small metric fluctuations).

For example, an HPA with a tolerance of 5% on scale-down, and no tolerance on
scale-up, would look like the following:
--&gt;
&lt;p&gt;容差出现在 &lt;code&gt;spec.behavior.scaleDown&lt;/code&gt; 和 &lt;code&gt;spec.behavior.scaleUp&lt;/code&gt;
字段下，因此对于扩容和缩容可以有不同的设置。一个典型的用法是在扩容时指定一个小的容差（以快速响应峰值），
而在缩容时指定较大的容差（以避免因小的指标波动而过快地添加或移除副本）。&lt;/p&gt;
&lt;p&gt;例如，一个在缩容时有 5% 容差，在扩容时没有容差的 HPA 配置如下所示：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;autoscaling/v2&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;HorizontalPodAutoscaler&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;my-app&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;behavior&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;scaleDown&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerance&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;0.05&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;scaleUp&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerance&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
## I want all the details!

Get all the technical details by reading
[KEP-4951](https://github.com/kubernetes/enhancements/tree/master/keps/sig-autoscaling/4951-configurable-hpa-tolerance)
and follow [issue 4951](https://github.com/kubernetes/enhancements/issues/4951)
to be notified of the feature graduation.
--&gt;
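上面清单中按扩缩方向分别设置的容差，其判定效果可以用如下 Python 片段来近似示意（仅为说明性的简化草图，并非 HPA 的实际实现；函数名 `hpa_decision` 为假设）：

```python
import math

def hpa_decision(current_replicas: int,
                 current_metric: float,
                 desired_metric: float,
                 scale_up_tolerance: float = 0.0,
                 scale_down_tolerance: float = 0.05) -> int:
    """按扩容/缩容方向分别应用容差的简化示意。"""
    ratio = current_metric / desired_metric
    if ratio > 1.0 and (ratio - 1.0) > scale_up_tolerance:
        return math.ceil(current_replicas * ratio)  # 扩容
    if ratio < 1.0 and (1.0 - ratio) > scale_down_tolerance:
        return math.ceil(current_replicas * ratio)  # 缩容
    return current_replicas  # 处于容差范围内，保持不变

print(hpa_decision(50, 76, 75))  # 扩容容差为 0：略超目标即扩容 -> 51
print(hpa_decision(50, 72, 75))  # 缩容偏差 4%，小于 5% 容差 -> 保持 50
```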
&lt;h2 id=&#34;所有细节&#34;&gt;所有细节&lt;/h2&gt;
&lt;p&gt;通过阅读
&lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-autoscaling/4951-configurable-hpa-tolerance&#34;&gt;KEP-4951&lt;/a&gt;
获取所有技术细节，并关注 &lt;a href=&#34;https://github.com/kubernetes/enhancements/issues/4951&#34;&gt;Issue 4951&lt;/a&gt;
以便在此特性进阶时获得通知。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes 多容器 Pod：概述</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/04/22/multi-container-pods-overview/</link>
      <pubDate>Tue, 22 Apr 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/04/22/multi-container-pods-overview/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes Multicontainer Pods: An Overview&#34;
date: 2025-04-22
draft: false
slug: multi-container-pods-overview
author: Agata Skorupka (The Scale Factory)
--&gt;
&lt;!--
As cloud-native architectures continue to evolve, Kubernetes has become the go-to platform for deploying complex, distributed systems. One of the most powerful yet nuanced design patterns in this ecosystem is the sidecar pattern—a technique that allows developers to extend application functionality without diving deep into source code.
--&gt;
&lt;p&gt;随着云原生架构的不断演进，Kubernetes 已成为部署复杂分布式系统的首选平台。
在这个生态系统中，最强大却又微妙的设计模式之一是边车（Sidecar）
模式 —— 一种允许开发者扩展应用功能而不深入源代码的技术。&lt;/p&gt;
&lt;!--
## The origins of the sidecar pattern

Think of a sidecar like a trusty companion motorcycle attachment. Historically, IT infrastructures have always used auxiliary services to handle critical tasks. Before containers, we relied on background processes and helper daemons to manage logging, monitoring, and networking. The microservices revolution transformed this approach, making sidecars a structured and intentional architectural choice.
With the rise of microservices, the sidecar pattern became more clearly defined, allowing developers to offload specific responsibilities from the main service without altering its code. Service meshes like Istio and Linkerd have popularized sidecar proxies, demonstrating how these companion containers can elegantly handle observability, security, and traffic management in distributed systems.
--&gt;
&lt;h2 id=&#34;the-origins-of-the-sidecar-pattern&#34;&gt;边车模式的起源  &lt;/h2&gt;
&lt;p&gt;可以把边车想象成摩托车旁那个可靠的挎斗。历史上，IT 基础设施一直使用辅助服务来处理关键任务。
在容器出现之前，我们依赖后台进程和辅助守护程序来管理日志记录、监控和网络。
微服务革命改变了这种方法，使边车成为一种结构化且有意图的架构选择。
随着微服务的兴起，边车模式变得更加明确，允许开发者从主服务中卸载特定职责而不改变其代码。
诸如 Istio 和 Linkerd 之类的服务网格普及了边车代理，
展示了这些伴随容器如何优雅地处理分布式系统中的可观测性、安全性和流量管理。&lt;/p&gt;
&lt;!--
## Kubernetes implementation

In Kubernetes, [sidecar containers](/docs/concepts/workloads/pods/sidecar-containers/) operate within
the same Pod as the main application, enabling communication and resource sharing.
Does this sound just like defining multiple containers along each other inside the Pod? It actually does, and
this is how sidecar containers had to be implemented before Kubernetes v1.29.0, which introduced
native support for sidecars.
Sidecar containers  can now be defined within a Pod manifest using the `spec.initContainers` field. What makes
it a sidecar container is that you specify it with `restartPolicy: Always`. You can see an example of this below, which is a partial snippet of the full Kubernetes manifest:
--&gt;
&lt;h2 id=&#34;kubernetes-implementation&#34;&gt;Kubernetes 实现  &lt;/h2&gt;
&lt;p&gt;在 Kubernetes 中，&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/pods/sidecar-containers/&#34;&gt;边车容器&lt;/a&gt;与主应用位于同一个
Pod 内，实现通信和资源共享。这听起来是不是就像在 Pod 内并列定义多个容器？实际上确实如此，
在 Kubernetes v1.29.0 引入对边车的原生支持之前，边车容器正是这样实现的。
现在，边车容器可以使用 &lt;code&gt;spec.initContainers&lt;/code&gt; 字段在 Pod 清单中定义。
使其成为边车容器的关键在于指定 &lt;code&gt;restartPolicy: Always&lt;/code&gt;。
你可以在下面看到一个示例，这是完整 Kubernetes 清单的一个片段：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;initContainers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;logshipper&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;alpine:latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;restartPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Always&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;[&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;sh&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;-c&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;tail -F /opt/logs.txt&amp;#39;&lt;/span&gt;]&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;volumeMounts&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;data&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;mountPath&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;/opt&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
That field name, `spec.initContainers` may sound confusing. How come when you want to define a sidecar container, you have to put an entry in the `spec.initContainers` array? `spec.initContainers` are run to completion just before main application starts, so they’re one-off, whereas sidecars often run in parallel to the main app container. It’s the `spec.initContainers` with `restartPolicy:Always` which differs classic [init containers](/docs/concepts/workloads/pods/init-containers/) from Kubernetes-native sidecar containers and ensures they are always up. 
--&gt;
&lt;p&gt;该字段名称 &lt;code&gt;spec.initContainers&lt;/code&gt; 可能听起来令人困惑。为何在定义边车容器时，必须在
&lt;code&gt;spec.initContainers&lt;/code&gt; 数组中添加条目？&lt;code&gt;spec.initContainers&lt;/code&gt;
在主应用启动前运行至完成，因此它们是一次性的，而边车容器通常与主应用容器并行运行。
正是通过带有 &lt;code&gt;restartPolicy:Always&lt;/code&gt; 的 &lt;code&gt;spec.initContainers&lt;/code&gt; 区分了经典的
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/pods/init-containers/&#34;&gt;Init 容器&lt;/a&gt;和
Kubernetes 原生的边车容器，并确保它们始终保持运行。&lt;/p&gt;
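在上述片段的基础上，一个可完整部署的最小示例大致如下（示意性清单：Pod 名称、写日志的命令等均为假设，仅用于说明原生边车的定义方式）：

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-with-sidecar   # 假设的名称，仅作示意
spec:
  initContainers:
    - name: logshipper
      image: alpine:latest
      restartPolicy: Always  # 关键：使其成为原生边车容器
      command: ['sh', '-c', 'tail -F /opt/logs.txt']
      volumeMounts:
        - name: data
          mountPath: /opt
  containers:
    - name: myapp
      image: alpine:latest
      # 主应用不断向共享卷写日志，由边车容器负责读取
      command: ['sh', '-c', 'while true; do echo logging >> /opt/logs.txt; sleep 1; done']
      volumeMounts:
        - name: data
          mountPath: /opt
  volumes:
    - name: data
      emptyDir: {}
```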
&lt;!--
## When to embrace (or avoid) sidecars

While the sidecar pattern can be useful in many cases, it is generally not the preferred approach unless the use case justifies it. Adding a sidecar increases complexity, resource consumption, and potential network latency. Instead, simpler alternatives such as built-in libraries or shared infrastructure should be considered first.
--&gt;
&lt;h2 id=&#34;when-to-embrace-or-avoid-sidecars&#34;&gt;何时采用（或避免使用）边车  &lt;/h2&gt;
&lt;p&gt;虽然边车模式在许多情况下非常有用，但除非使用场景证明其合理性，
否则通常不推荐优先采用这种方法。添加边车会增加复杂性、
资源消耗以及可能的网络延迟。因此，应首先考虑更简单的替代方案，
例如内置库或共享基础设施。&lt;/p&gt;
&lt;!--
**Deploy a sidecar when:**

1. You need to extend application functionality without touching the original code
1. Implementing cross-cutting concerns like logging, monitoring or security
1. Working with legacy applications requiring modern networking capabilities
1. Designing microservices that demand independent scaling and updates
--&gt;
&lt;p&gt;&lt;strong&gt;在以下情况部署边车：&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;你需要扩展应用功能，而无需修改原始代码&lt;/li&gt;
&lt;li&gt;实现日志记录、监控或安全等跨领域关注点&lt;/li&gt;
&lt;li&gt;处理需要现代网络功能的遗留应用&lt;/li&gt;
&lt;li&gt;设计需要独立扩展和更新的微服务&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
**Proceed with caution if:**

1. Resource efficiency is your primary concern
1. Minimal network latency is critical
1. Simpler alternatives exist
1. You want to minimize troubleshooting complexity
--&gt;
&lt;p&gt;&lt;strong&gt;在以下情况应谨慎行事：&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;资源效率是你的首要考虑&lt;/li&gt;
&lt;li&gt;最小网络延迟至关重要&lt;/li&gt;
&lt;li&gt;存在更简单的替代方案&lt;/li&gt;
&lt;li&gt;你希望最小化故障排查的复杂性&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
## Four essential multi-container patterns

### Init container pattern

The **Init container** pattern is used to execute (often critical) setup tasks before the main application container starts. Unlike regular containers, init containers run to completion and then terminate, ensuring that preconditions for the main application are met.
--&gt;
&lt;h2 id=&#34;four-essential-multi-container-patterns&#34;&gt;四个基本的多容器模式  &lt;/h2&gt;
&lt;h3 id=&#34;init-container-pattern&#34;&gt;Init 容器模式  &lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Init 容器&lt;/strong&gt;模式用于在主应用容器启动之前执行（通常是关键的）设置任务。
与常规容器不同，Init 容器会运行至完成然后终止，确保满足主应用的前提条件。&lt;/p&gt;
&lt;!--
**Ideal for:**

1. Preparing configurations
1. Loading secrets
1. Verifying dependency availability
1. Running database migrations

The init container ensures your application starts in a predictable, controlled environment without code modifications.
--&gt;
&lt;p&gt;&lt;strong&gt;适合于：&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;准备配置&lt;/li&gt;
&lt;li&gt;加载密钥&lt;/li&gt;
&lt;li&gt;验证依赖项的可用性&lt;/li&gt;
&lt;li&gt;运行数据库迁移&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Init 容器确保你的应用在一个可预测、受控的环境中启动，而无需修改代码。&lt;/p&gt;
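例如，下面是一个示意性的 Init 容器用法：在主应用启动前轮询等待某个依赖服务可以被解析（其中服务名 myservice、所用镜像均为假设）：

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:
    - name: wait-for-dependency
      image: busybox:1.28
      # 轮询直到依赖的 Service 可以被解析，确保前提条件满足后再启动主容器
      command: ['sh', '-c', 'until nslookup myservice; do echo waiting; sleep 2; done']
  containers:
    - name: myapp
      image: busybox:1.28
      command: ['sh', '-c', 'echo app is running && sleep 3600']
```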
&lt;!--
### Ambassador pattern

An ambassador container provides Pod-local helper services that expose a simple way to access a network service. Commonly, ambassador containers send network requests on behalf of a an application container and
take care of challenges such as service discovery, peer identity verification, or encryption in transit.
--&gt;
&lt;h3 id=&#34;ambassador-pattern&#34;&gt;Ambassador 模式  &lt;/h3&gt;
&lt;p&gt;大使（Ambassador）容器提供 Pod 本地的辅助服务，以一种简单的方式暴露对网络服务的访问。
通常，Ambassador 容器代表应用容器发送网络请求，并处理诸如服务发现、对等身份验证或传输中加密等挑战。&lt;/p&gt;
&lt;!--
**Perfect when you need to:**

1. Offload client connectivity concerns
1. Implement language-agnostic networking features
1. Add security layers like TLS
1. Create robust circuit breakers and retry mechanisms
--&gt;
&lt;p&gt;&lt;strong&gt;能够完美地处理以下需求：&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;分担客户端连接方面的职责&lt;/li&gt;
&lt;li&gt;实现语言无关的网络功能&lt;/li&gt;
&lt;li&gt;添加如 TLS 的安全层&lt;/li&gt;
&lt;li&gt;创建强大的断路器和重试机制&lt;/li&gt;
&lt;/ol&gt;
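作为示意，下面的 Pod 中应用容器只访问 localhost，由大使容器代为处理对外连接（其中镜像名 example.com/ambassador-proxy 和端口 9000 均为假设的占位符）：

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-ambassador
spec:
  containers:
    - name: myapp
      image: alpine:latest
      # 应用只访问本地端口；TLS、重试、服务发现等都交给大使容器处理
      command: ['sh', '-c', 'while true; do wget -qO- http://localhost:9000/; sleep 5; done']
    - name: ambassador
      # 假设的代理镜像：监听 9000 端口并把请求转发到真正的后端服务
      image: example.com/ambassador-proxy:latest
      ports:
        - containerPort: 9000
```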
&lt;!--
### Configuration helper

A _configuration helper_ sidecar provides configuration updates to an application dynamically, ensuring it always has access to the latest settings without disrupting the service. Often the helper needs to provide an initial
configuration before the application would be able to start successfully.
--&gt;
&lt;h3 id=&#34;configuration-helper&#34;&gt;配置助手  &lt;/h3&gt;
&lt;p&gt;一个&lt;strong&gt;配置助手&lt;/strong&gt;边车容器动态地向应用提供配置更新，
确保它始终可以访问最新的设置而不会中断服务。
通常，助手需要在应用能够成功启动之前提供初始配置。&lt;/p&gt;
&lt;!--
**Use cases:**

1. Fetching environment variables and secrets
1. Polling configuration changes
1. Decoupling configuration management from application logic
--&gt;
&lt;p&gt;&lt;strong&gt;使用场景：&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;获取环境变量和密钥&lt;/li&gt;
&lt;li&gt;轮询配置更改&lt;/li&gt;
&lt;li&gt;将配置管理与应用逻辑解耦&lt;/li&gt;
&lt;/ol&gt;
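一种示意性的做法是用原生边车容器定期把配置拉取到共享卷中，供主应用读取（其中配置服务地址 config-server.example 与文件路径均为假设）：

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-config-helper
spec:
  initContainers:
    - name: config-helper
      image: alpine:latest
      restartPolicy: Always  # 作为边车持续运行，轮询配置变更
      command: ['sh', '-c', 'while true; do wget -qO /etc/app-config/config.json http://config-server.example/config; sleep 60; done']
      volumeMounts:
        - name: config
          mountPath: /etc/app-config
  containers:
    - name: myapp
      image: alpine:latest
      command: ['sh', '-c', 'sleep 3600']
      volumeMounts:
        - name: config
          mountPath: /etc/app-config
          readOnly: true
  volumes:
    - name: config
      emptyDir: {}
```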
&lt;!--
### Adapter pattern

An _adapter_ (or sometimes _façade_) container enables interoperability between the main application container and external services. It does this by translating data formats, protocols, or APIs.
--&gt;
&lt;h3 id=&#34;adapter-pattern&#34;&gt;适配器模式  &lt;/h3&gt;
&lt;p&gt;一个&lt;strong&gt;适配器（adapter）&lt;/strong&gt;（有时也称为&lt;strong&gt;外观（façade）&lt;/strong&gt;）容器使主应用容器与外部服务之间能够互操作。
它通过转换数据格式、协议或 API 来实现这一点。&lt;/p&gt;
&lt;!--
**Strengths:**

1. Transforming legacy data formats
1. Bridging communication protocols
1. Facilitating integration between mismatched services
--&gt;
&lt;p&gt;&lt;strong&gt;优点：&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;转换遗留数据格式&lt;/li&gt;
&lt;li&gt;搭建通信协议桥梁&lt;/li&gt;
&lt;li&gt;促进互不匹配的服务之间的集成&lt;/li&gt;
&lt;/ol&gt;
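作为示意，下面的 Pod 中适配器容器读取遗留应用写入共享卷的自定义格式数据，并以统一格式对外暴露（其中两个镜像名均为假设的占位符）：

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-adapter
spec:
  containers:
    - name: legacy-app
      # 假设的遗留应用：把自定义格式的指标写入共享卷
      image: example.com/legacy-app:latest
      volumeMounts:
        - name: metrics
          mountPath: /var/metrics
    - name: adapter
      # 假设的适配器镜像：读取遗留格式并以标准格式在 9100 端口暴露
      image: example.com/metrics-adapter:latest
      ports:
        - containerPort: 9100
      volumeMounts:
        - name: metrics
          mountPath: /var/metrics
          readOnly: true
  volumes:
    - name: metrics
      emptyDir: {}
```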
&lt;!--
## Wrap-up

While sidecar patterns offer tremendous flexibility, they&#39;re not a silver bullet. Each added sidecar introduces complexity, consumes resources, and potentially increases operational overhead. Always evaluate simpler alternatives first.
The key is strategic implementation: use sidecars as precision tools to solve specific architectural challenges, not as a default approach. When used correctly, they can improve security, networking, and configuration management in containerized environments.
Choose wisely, implement carefully, and let your sidecars elevate your container ecosystem.
--&gt;
&lt;h2 id=&#34;wrap-up&#34;&gt;总结  &lt;/h2&gt;
&lt;p&gt;尽管边车模式提供了巨大的灵活性，但它并非万能。每添加一个边车容器都会引入复杂性、
消耗资源，并可能增加运维开销。请始终优先评估更简单的替代方案。
关键在于战略性实施：将边车用作解决特定架构挑战的精准工具，而不是默认选择。
正确使用时，它们可以提升容器化环境中的安全性、网络和配置管理。
明智地选择，谨慎地实施，让你的边车提升你的容器生态系统。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>kube-scheduler-simulator 介绍</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/04/07/introducing-kube-scheduler-simulator/</link>
      <pubDate>Mon, 07 Apr 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/04/07/introducing-kube-scheduler-simulator/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Introducing kube-scheduler-simulator&#34;
date: 2025-04-07
draft: false 
slug: introducing-kube-scheduler-simulator
author: Kensei Nakada (Tetrate)
--&gt;
&lt;!--
The Kubernetes Scheduler is a crucial control plane component that determines which node a Pod will run on. 
Thus, anyone utilizing Kubernetes relies on a scheduler.

[kube-scheduler-simulator](https://github.com/kubernetes-sigs/kube-scheduler-simulator) is a _simulator_ for the Kubernetes scheduler, that started as a [Google Summer of Code 2021](https://summerofcode.withgoogle.com/) project developed by me (Kensei Nakada) and later received a lot of contributions.
This tool allows users to closely examine the scheduler’s behavior and decisions. 
--&gt;
&lt;p&gt;Kubernetes 调度器（Scheduler）是一个关键的控制平面组件，负责决定 Pod 将运行在哪个节点上。&lt;br&gt;
因此，任何使用 Kubernetes 的人都依赖于调度器。&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes-sigs/kube-scheduler-simulator&#34;&gt;kube-scheduler-simulator&lt;/a&gt;
是一个 Kubernetes 调度器的&lt;strong&gt;模拟器&lt;/strong&gt;，最初是作为
&lt;a href=&#34;https://summerofcode.withgoogle.com/&#34;&gt;Google Summer of Code 2021&lt;/a&gt;
项目由我（Kensei Nakada）开发的，后来收到了许多贡献。&lt;br&gt;
该工具允许用户深入检查调度器的行为和决策。&lt;/p&gt;
&lt;!--
It is useful for casual users who employ scheduling constraints (for example, [inter-Pod affinity](/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity/#affinity-and-anti-affinity))
and experts who extend the scheduler with custom plugins.
--&gt;
&lt;p&gt;对于使用调度约束（例如，
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity&#34;&gt;Pod 间亲和性&lt;/a&gt;）
的普通用户和通过自定义插件扩展调度器的专家来说，它都是非常有用的。&lt;/p&gt;
&lt;!--
## Motivation

The scheduler often appears as a black box, 
composed of many plugins that each contribute to the scheduling decision-making process from their unique perspectives. 
Understanding its behavior can be challenging due to the multitude of factors it considers. 

Even if a Pod appears to be scheduled correctly in a simple test cluster, it might have been scheduled based on different calculations than expected. This discrepancy could lead to unexpected scheduling outcomes when deployed in a large production environment.
--&gt;
&lt;h2 id=&#34;出发点&#34;&gt;出发点&lt;/h2&gt;
&lt;p&gt;调度器通常被视为一个“黑箱”，&lt;br&gt;
由许多插件组成，每个插件从其独特的角度对调度决策过程做出贡献。&lt;br&gt;
由于调度器考虑的因素繁多，理解其行为可能会非常具有挑战性。&lt;/p&gt;
&lt;p&gt;即使在一个简单的测试集群中，Pod 似乎被正确调度，它也可能基于与预期不同的计算逻辑进行调度。
这种差异可能会在大规模生产环境中导致意外的调度结果。&lt;/p&gt;
&lt;!--
Also, testing a scheduler is a complex challenge.
There are countless patterns of operations executed within a real cluster, making it unfeasible to anticipate every scenario with a finite number of tests. 
More often than not, bugs are discovered only when the scheduler is deployed in an actual cluster.
Actually, many bugs are found by users after shipping the release, 
even in the upstream kube-scheduler. 
--&gt;
&lt;p&gt;此外，测试调度器是一个复杂的挑战。&lt;br&gt;
在实际集群中执行的操作模式数不胜数，使得通过有限数量的测试来预见每种场景变得不可行。&lt;br&gt;
通常，只有当调度器部署到实际集群时，才会发现其中的 Bug。&lt;/p&gt;
&lt;p&gt;实际上，许多 Bug 是在发布版本后由用户发现的，即使是在上游 kube-scheduler 中也是如此。&lt;/p&gt;
&lt;!--
Having a development or sandbox environment for testing the scheduler — or, indeed, any Kubernetes controllers — is a common practice.
However, this approach falls short of capturing all the potential scenarios that might arise in a production cluster 
because a development cluster is often much smaller with notable differences in workload sizes and scaling dynamics.
It never sees the exact same use or exhibits the same behavior as its production counterpart.
--&gt;
&lt;p&gt;拥有一个用于测试调度器或任何 Kubernetes 控制器的开发或沙箱环境是常见做法。&lt;br&gt;
然而，这种方法不足以捕捉生产集群中可能出现的所有潜在场景，因为开发集群通常规模要小得多，
在工作负载大小和扩展动态方面存在显著差异。&lt;br&gt;
它永远不会看到与生产环境中完全相同的使用情况或表现出相同的行为。&lt;/p&gt;
&lt;!--
The kube-scheduler-simulator aims to solve those problems.
It enables users to test their scheduling constraints, scheduler configurations, 
and custom plugins while checking every detailed part of scheduling decisions.
It also allows users to create a simulated cluster environment, where they can test their scheduler
with the same resources as their production cluster without affecting actual workloads.
--&gt;
&lt;p&gt;kube-scheduler-simulator 旨在解决这些问题。&lt;br&gt;
它使用户能够在检查调度决策每一个细节的同时，测试他们的调度约束、调度器配置和自定义插件。&lt;br&gt;
它还允许用户创建一个模拟集群环境，在该环境中，他们可以使用与生产集群相同的资源来测试其调度器，
而不会影响实际的工作负载。&lt;/p&gt;
&lt;!--
## Features of the kube-scheduler-simulator

The kube-scheduler-simulator’s core feature is its ability to expose the scheduler&#39;s internal decisions.
The scheduler operates based on the [scheduling framework](/docs/concepts/scheduling-eviction/scheduling-framework/), 
using various plugins at different extension points,
filter nodes (Filter phase), score nodes (Score phase), and ultimately determine the best node for the Pod.
--&gt;
&lt;h2 id=&#34;kube-scheduler-simulator-的特性&#34;&gt;kube-scheduler-simulator 的特性&lt;/h2&gt;
&lt;p&gt;kube-scheduler-simulator 的核心特性在于它能够揭示调度器的内部决策过程。&lt;br&gt;
调度器基于 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/scheduling-eviction/scheduling-framework/&#34;&gt;scheduling framework&lt;/a&gt;
运作，在不同的扩展点使用各种插件来过滤节点（Filter 阶段）、为节点打分（Score 阶段），
并最终确定最适合 Pod 的节点。&lt;/p&gt;
&lt;!--
The simulator allows users to create Kubernetes resources and observe how each plugin influences the scheduling decisions for Pods.
This visibility helps users understand the scheduler’s workings and define appropriate scheduling constraints.



&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/images/blog/2025-04-07-kube-scheduler-simulator/simulator.png&#34;
         alt=&#34;Screenshot of the simulator web frontend that shows the detailed scheduling results per node and per extension point&#34;/&gt; &lt;figcaption&gt;
            &lt;h4&gt;The simulator web frontend&lt;/h4&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
--&gt;
&lt;p&gt;模拟器允许用户创建 Kubernetes 资源，并观察每个插件如何影响 Pod 的调度决策。&lt;br&gt;
这种可见性帮助用户理解调度器的工作机制并定义适当的调度约束。&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/images/blog/2025-04-07-kube-scheduler-simulator/simulator.png&#34;
         alt=&#34;模拟器 Web 前端的截图，显示了每个节点和每个扩展点的详细调度结果&#34;/&gt; &lt;figcaption&gt;
            &lt;h4&gt;模拟器 Web 前端&lt;/h4&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
Inside the simulator, a debuggable scheduler runs instead of the vanilla scheduler. 
This debuggable scheduler outputs the results of each scheduler plugin at every extension point to the Pod’s annotations like the following manifest shows
and the web front end formats/visualizes the scheduling results based on these annotations.
--&gt;
&lt;p&gt;在模拟器内部，运行的是一个可调试的调度器，而不是普通的调度器。&lt;br&gt;
这个可调试的调度器会将每个调度器插件在各个扩展点的结果输出到 Pod 的注解中，
如以下清单所示，而 Web 前端则基于这些注解对调度结果进行格式化和可视化。&lt;/p&gt;
&lt;!--
# The JSONs within these annotations are manually formatted for clarity in the blog post. 
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Pod&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 为了使博客文章更清晰，这些注释中的 JSON 都是手动格式化的。&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;annotations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/bind-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{&amp;#34;DefaultBinder&amp;#34;:&amp;#34;success&amp;#34;}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/filter-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&amp;gt;-&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;node-jjfg5&amp;#34;:{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeName&amp;#34;:&amp;#34;passed&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;passed&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeUnschedulable&amp;#34;:&amp;#34;passed&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;TaintToleration&amp;#34;:&amp;#34;passed&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        },
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;node-mtb5x&amp;#34;:{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeName&amp;#34;:&amp;#34;passed&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;passed&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeUnschedulable&amp;#34;:&amp;#34;passed&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;TaintToleration&amp;#34;:&amp;#34;passed&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      }&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/finalscore-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&amp;gt;-&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;node-jjfg5&amp;#34;:{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;ImageLocality&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeAffinity&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesBalancedAllocation&amp;#34;:&amp;#34;52&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;47&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;TaintToleration&amp;#34;:&amp;#34;300&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;VolumeBinding&amp;#34;:&amp;#34;0&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        },
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;node-mtb5x&amp;#34;:{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;ImageLocality&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeAffinity&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesBalancedAllocation&amp;#34;:&amp;#34;76&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;73&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;TaintToleration&amp;#34;:&amp;#34;300&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;VolumeBinding&amp;#34;:&amp;#34;0&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      } &lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/permit-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/permit-result-timeout&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/postfilter-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/prebind-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{&amp;#34;VolumeBinding&amp;#34;:&amp;#34;success&amp;#34;}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/prefilter-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/prefilter-result-status&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&amp;gt;-&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;AzureDiskLimits&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;EBSLimits&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;GCEPDLimits&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;InterPodAffinity&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodeAffinity&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodePorts&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;success&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodeVolumeLimits&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;PodTopologySpread&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;VolumeBinding&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;VolumeRestrictions&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;VolumeZone&amp;#34;:&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      }&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/prescore-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&amp;gt;-&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;InterPodAffinity&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodeAffinity&amp;#34;:&amp;#34;success&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodeResourcesBalancedAllocation&amp;#34;:&amp;#34;success&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;success&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;PodTopologySpread&amp;#34;:&amp;#34;&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;TaintToleration&amp;#34;:&amp;#34;success&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      }&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/reserve-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#39;{&amp;#34;VolumeBinding&amp;#34;:&amp;#34;success&amp;#34;}&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/result-history&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&amp;gt;-&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      [
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/bind-result&amp;#34;:&amp;#34;{\&amp;#34;DefaultBinder\&amp;#34;:\&amp;#34;success\&amp;#34;}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/filter-result&amp;#34;:&amp;#34;{\&amp;#34;node-jjfg5\&amp;#34;:{\&amp;#34;NodeName\&amp;#34;:\&amp;#34;passed\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;passed\&amp;#34;,\&amp;#34;NodeUnschedulable\&amp;#34;:\&amp;#34;passed\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;passed\&amp;#34;},\&amp;#34;node-mtb5x\&amp;#34;:{\&amp;#34;NodeName\&amp;#34;:\&amp;#34;passed\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;passed\&amp;#34;,\&amp;#34;NodeUnschedulable\&amp;#34;:\&amp;#34;passed\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;passed\&amp;#34;}}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/finalscore-result&amp;#34;:&amp;#34;{\&amp;#34;node-jjfg5\&amp;#34;:{\&amp;#34;ImageLocality\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeAffinity\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeResourcesBalancedAllocation\&amp;#34;:\&amp;#34;52\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;47\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;300\&amp;#34;,\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;0\&amp;#34;},\&amp;#34;node-mtb5x\&amp;#34;:{\&amp;#34;ImageLocality\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeAffinity\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeResourcesBalancedAllocation\&amp;#34;:\&amp;#34;76\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;73\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;300\&amp;#34;,\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;0\&amp;#34;}}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/permit-result&amp;#34;:&amp;#34;{}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/permit-result-timeout&amp;#34;:&amp;#34;{}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/postfilter-result&amp;#34;:&amp;#34;{}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/prebind-result&amp;#34;:&amp;#34;{\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;success\&amp;#34;}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/prefilter-result&amp;#34;:&amp;#34;{}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/prefilter-result-status&amp;#34;:&amp;#34;{\&amp;#34;AzureDiskLimits\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;EBSLimits\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;GCEPDLimits\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;InterPodAffinity\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;NodeAffinity\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;NodePorts\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;success\&amp;#34;,\&amp;#34;NodeVolumeLimits\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;PodTopologySpread\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;VolumeRestrictions\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;VolumeZone\&amp;#34;:\&amp;#34;\&amp;#34;}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/prescore-result&amp;#34;:&amp;#34;{\&amp;#34;InterPodAffinity\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;NodeAffinity\&amp;#34;:\&amp;#34;success\&amp;#34;,\&amp;#34;NodeResourcesBalancedAllocation\&amp;#34;:\&amp;#34;success\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;success\&amp;#34;,\&amp;#34;PodTopologySpread\&amp;#34;:\&amp;#34;\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;success\&amp;#34;}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/reserve-result&amp;#34;:&amp;#34;{\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;success\&amp;#34;}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/score-result&amp;#34;:&amp;#34;{\&amp;#34;node-jjfg5\&amp;#34;:{\&amp;#34;ImageLocality\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeAffinity\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeResourcesBalancedAllocation\&amp;#34;:\&amp;#34;52\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;47\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;0\&amp;#34;},\&amp;#34;node-mtb5x\&amp;#34;:{\&amp;#34;ImageLocality\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeAffinity\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;NodeResourcesBalancedAllocation\&amp;#34;:\&amp;#34;76\&amp;#34;,\&amp;#34;NodeResourcesFit\&amp;#34;:\&amp;#34;73\&amp;#34;,\&amp;#34;TaintToleration\&amp;#34;:\&amp;#34;0\&amp;#34;,\&amp;#34;VolumeBinding\&amp;#34;:\&amp;#34;0\&amp;#34;}}&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;kube-scheduler-simulator.sigs.k8s.io/selected-node&amp;#34;:&amp;#34;node-mtb5x&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      ]&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/score-result&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&amp;gt;-&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;node-jjfg5&amp;#34;:{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;ImageLocality&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeAffinity&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesBalancedAllocation&amp;#34;:&amp;#34;52&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;47&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;TaintToleration&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;VolumeBinding&amp;#34;:&amp;#34;0&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        },
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        &amp;#34;node-mtb5x&amp;#34;:{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;ImageLocality&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeAffinity&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesBalancedAllocation&amp;#34;:&amp;#34;76&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;NodeResourcesFit&amp;#34;:&amp;#34;73&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;TaintToleration&amp;#34;:&amp;#34;0&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;            &amp;#34;VolumeBinding&amp;#34;:&amp;#34;0&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;      }&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kube-scheduler-simulator.sigs.k8s.io/selected-node&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node-mtb5x&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
Users can also integrate [their custom plugins](/docs/concepts/scheduling-eviction/scheduling-framework/) or [extenders](https://github.com/kubernetes/design-proposals-archive/blob/main/scheduling/scheduler_extender.md), into the debuggable scheduler and visualize their results. 

This debuggable scheduler can also run standalone, for example, on any Kubernetes cluster or in integration tests. 
This would be useful to custom plugin developers who want to test their plugins or examine their custom scheduler in a real cluster with better debuggability.
--&gt;
&lt;p&gt;用户还可以将&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/scheduling-eviction/scheduling-framework/&#34;&gt;其自定义插件&lt;/a&gt;
或&lt;a href=&#34;https://github.com/kubernetes/design-proposals-archive/blob/main/scheduling/scheduler_extender.md&#34;&gt;扩展器&lt;/a&gt;
集成到可调试调度器中，并可视化其结果。&lt;/p&gt;
&lt;p&gt;这个可调试调度器还可以独立运行，例如，在任何 Kubernetes 集群上或在集成测试中运行。&lt;br&gt;
这对于希望测试其插件或在真实集群中以更好的可调试性检查其自定义调度器的插件开发者来说非常有用。&lt;/p&gt;
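作为示意（这是一个假设性的辅助脚本，并非模拟器自带的工具），下面的 Python 片段展示了如何以编程方式解析上述清单中由可调试调度器写入的注解，汇总每个节点的过滤结果与总分：

```python
import json

# Annotation keys written by the simulator's debuggable scheduler,
# as shown in the example manifest above.
FILTER_KEY = "kube-scheduler-simulator.sigs.k8s.io/filter-result"
SCORE_KEY = "kube-scheduler-simulator.sigs.k8s.io/finalscore-result"
SELECTED_KEY = "kube-scheduler-simulator.sigs.k8s.io/selected-node"

def summarize(annotations: dict) -> dict:
    """Summarize per-node scheduling results from the simulator's annotations."""
    filters = json.loads(annotations.get(FILTER_KEY, "{}"))
    scores = json.loads(annotations.get(SCORE_KEY, "{}"))
    summary = {}
    for node, plugins in scores.items():
        summary[node] = {
            # True only if every Filter plugin reported "passed" for this node
            "passed_all_filters": all(
                v == "passed" for v in filters.get(node, {}).values()
            ),
            # Sum of the (normalized, weighted) final scores across plugins
            "total_score": sum(int(v) for v in plugins.values()),
        }
    summary["selected"] = annotations.get(SELECTED_KEY)
    return summary

# Values abbreviated from the example manifest above.
annotations = {
    FILTER_KEY: '{"node-mtb5x":{"NodeName":"passed","NodeResourcesFit":"passed"}}',
    SCORE_KEY: '{"node-mtb5x":{"NodeResourcesBalancedAllocation":"76",'
               '"NodeResourcesFit":"73","TaintToleration":"300"}}',
    SELECTED_KEY: "node-mtb5x",
}
print(summarize(annotations))
```

脚本中的注解键名取自上面的示例清单；`summarize` 函数本身只是演示性的草稿，实际使用时请以模拟器文档中的注解格式为准。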
&lt;!--
## The simulator as a better dev cluster

As mentioned earlier, with a limited set of tests, it is impossible to predict every possible scenario in a real-world cluster.
Typically, users will test the scheduler in a small, development cluster before deploying it to production, hoping that no issues arise.
--&gt;
&lt;h2 id=&#34;作为更优开发集群的模拟器&#34;&gt;作为更优开发集群的模拟器&lt;/h2&gt;
&lt;p&gt;如前所述，由于测试用例的数量有限，不可能预测真实世界集群中的每一种可能场景。&lt;br&gt;
通常，用户会在一个小型开发集群中测试调度器，然后再将其部署到生产环境中，
希望能不出现任何问题。&lt;/p&gt;
&lt;!--
[The simulator’s importing feature](https://github.com/kubernetes-sigs/kube-scheduler-simulator/blob/master/simulator/docs/import-cluster-resources.md)
provides a solution by allowing users to simulate deploying a new scheduler version in a production-like environment without impacting their live workloads.

By continuously syncing between a production cluster and the simulator, users can safely test a new scheduler version with the same resources their production cluster handles. 
Once confident in its performance, they can proceed with the production deployment, reducing the risk of unexpected issues.
--&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes-sigs/kube-scheduler-simulator/blob/master/simulator/docs/import-cluster-resources.md&#34;&gt;模拟器的导入功能&lt;/a&gt;
通过允许用户在类似生产环境的模拟中部署新的调度器版本而不影响其线上工作负载，
提供了一种解决方案。&lt;/p&gt;
&lt;p&gt;通过在生产集群和模拟器之间进行持续同步，用户可以安全地使用与生产集群相同的资源测试新的调度器版本。
一旦对其性能感到满意，便可以继续进行生产部署，从而减少意外问题的风险。&lt;/p&gt;
&lt;!--
## What are the use cases?

1. **Cluster users**: Examine if scheduling constraints (for example, PodAffinity, PodTopologySpread) work as intended.
1. **Cluster admins**: Assess how a cluster would behave with changes to the scheduler configuration.
1. **Scheduler plugin developers**: Test a custom scheduler plugins or extenders, use the debuggable scheduler in integration tests or development clusters, or use the [syncing](https://github.com/kubernetes-sigs/kube-scheduler-simulator/blob/simulator/v0.3.0/simulator/docs/import-cluster-resources.md) feature for testing within a production-like environment.
--&gt;
&lt;h2 id=&#34;有哪些使用场景&#34;&gt;有哪些使用场景？&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;集群用户&lt;/strong&gt;：检查调度约束（例如，PodAffinity、PodTopologySpread）是否按预期工作。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;集群管理员&lt;/strong&gt;：评估在调度器配置更改后集群的行为表现。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;调度器插件开发者&lt;/strong&gt;：测试自定义调度器插件或扩展器，在集成测试或开发集群中使用可调试调度器，
或利用&lt;a href=&#34;https://github.com/kubernetes-sigs/kube-scheduler-simulator/blob/simulator/v0.3.0/simulator/docs/import-cluster-resources.md&#34;&gt;同步&lt;/a&gt;
功能在类似生产的环境中进行测试。&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
## Getting started

The simulator only requires Docker to be installed on a machine; a Kubernetes cluster is not necessary.
--&gt;
&lt;h2 id=&#34;入门指南&#34;&gt;入门指南&lt;/h2&gt;
&lt;p&gt;模拟器仅要求在机器上安装 Docker；并不需要 Kubernetes 集群。&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;git clone git@github.com:kubernetes-sigs/kube-scheduler-simulator.git
cd kube-scheduler-simulator
make docker_up
&lt;/code&gt;&lt;/pre&gt;&lt;!--
You can then access the simulator&#39;s web UI at `http://localhost:3000`.

Visit the [kube-scheduler-simulator repository](https://sigs.k8s.io/kube-scheduler-simulator) for more details!
--&gt;
&lt;p&gt;然后，你可以通过访问 &lt;code&gt;http://localhost:3000&lt;/code&gt; 来使用模拟器的 Web UI。&lt;/p&gt;
&lt;p&gt;更多详情，请访问 &lt;a href=&#34;https://sigs.k8s.io/kube-scheduler-simulator&#34;&gt;kube-scheduler-simulator 仓库&lt;/a&gt;！&lt;/p&gt;
&lt;!--
## Getting involved 

The scheduler simulator is developed by [Kubernetes SIG Scheduling](https://github.com/kubernetes/community/blob/master/sig-scheduling/README.md#kube-scheduler-simulator). Your feedback and contributions are welcome!
--&gt;
&lt;h2 id=&#34;参与其中&#34;&gt;参与其中&lt;/h2&gt;
&lt;p&gt;调度器模拟器由
&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-scheduling/README.md#kube-scheduler-simulator&#34;&gt;Kubernetes SIG Scheduling&lt;/a&gt;
开发。欢迎你提供反馈并参与贡献！&lt;/p&gt;
&lt;!--
Open issues or PRs at the [kube-scheduler-simulator repository](https://sigs.k8s.io/kube-scheduler-simulator).
Join the conversation on the [#sig-scheduling](https://kubernetes.slack.com/messages/sig-scheduling) slack channel.
--&gt;
&lt;p&gt;在 &lt;a href=&#34;https://sigs.k8s.io/kube-scheduler-simulator&#34;&gt;kube-scheduler-simulator 仓库&lt;/a&gt;开启问题或提交 PR。&lt;/p&gt;
&lt;p&gt;加入 &lt;a href=&#34;https://kubernetes.slack.com/messages/sig-scheduling&#34;&gt;#sig-scheduling&lt;/a&gt;
Slack 频道参与讨论。&lt;/p&gt;
&lt;!--
## Acknowledgments

The simulator has been maintained by dedicated volunteer engineers, overcoming many challenges to reach its current form. 

A big shout out to all [the awesome contributors](https://github.com/kubernetes-sigs/kube-scheduler-simulator/graphs/contributors)!
--&gt;
&lt;h2 id=&#34;致谢&#34;&gt;致谢&lt;/h2&gt;
&lt;p&gt;模拟器由一群尽职的志愿者工程师维护，他们克服了许多挑战，才使项目发展到今天的面貌。&lt;/p&gt;
&lt;p&gt;特别感谢所有&lt;a href=&#34;https://github.com/kubernetes-sigs/kube-scheduler-simulator/graphs/contributors&#34;&gt;杰出的贡献者&lt;/a&gt;！&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.33 预览</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/03/26/kubernetes-v1-33-upcoming-changes/</link>
      <pubDate>Wed, 26 Mar 2025 10:30:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/03/26/kubernetes-v1-33-upcoming-changes/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#39;Kubernetes v1.33 sneak peek&#39;
date: 2025-03-26T10:30:00-08:00
slug: kubernetes-v1-33-upcoming-changes
author: &gt;
  Agustina Barbetta,
  Aakanksha Bhende,
  Udi Hofesh,
  Ryota Sawada,
  Sneha Yadav
--&gt;
&lt;!--
As the release of Kubernetes v1.33 approaches, the Kubernetes project continues to evolve. Features may be deprecated, removed, or replaced to improve the overall health of the project. This blog post outlines some planned changes for the v1.33 release, which the release team believes you should be aware of to ensure the continued smooth operation of your Kubernetes environment and to keep you up-to-date with the latest developments.  The information below is based on the current status of the v1.33 release and is subject to change before the final release date.
--&gt;
&lt;p&gt;随着 Kubernetes v1.33 版本的发布临近，Kubernetes 项目仍在不断发展。
为了提升项目的整体健康状况，某些特性可能会被弃用、移除或替换。
这篇博客文章概述了 v1.33 版本的一些计划变更，发布团队认为你有必要了解这些内容，
以确保 Kubernetes 环境的持续平稳运行，并让你掌握最新的发展动态。
以下信息基于 v1.33 版本的当前状态，在最终发布日期之前可能会有所变化。&lt;/p&gt;
&lt;!--
## The Kubernetes API removal and deprecation process

The Kubernetes project has a well-documented [deprecation policy](/docs/reference/using-api/deprecation-policy/) for features. This policy states that stable APIs may only be deprecated when a newer, stable version of that same API is available and that APIs have a minimum lifetime for each stability level. A deprecated API has been marked for removal in a future Kubernetes release. It will continue to function until removal (at least one year from the deprecation), but usage will result in a warning being displayed. Removed APIs are no longer available in the current version, at which point you must migrate to using the replacement.
--&gt;
&lt;h2 id=&#34;kubernetes-api-的移除与弃用流程&#34;&gt;Kubernetes API 的移除与弃用流程&lt;/h2&gt;
&lt;p&gt;Kubernetes 项目针对特性的弃用有一套完善的&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/using-api/deprecation-policy/&#34;&gt;弃用政策&lt;/a&gt;。
该政策规定，只有在同一 API 有更新的稳定版本可用时，才能弃用稳定的 API，
并且每个稳定性级别的 API 都有最低的生命周期要求。被弃用的 API 已被标记为将在未来的
Kubernetes 版本中移除。在移除之前（自弃用起至少一年内），它仍然可以继续使用，
但使用时会显示警告信息。已被移除的 API 在当前版本中不再可用，届时你必须迁移到使用替代方案。&lt;/p&gt;
&lt;!--
* Generally available (GA) or stable API versions may be marked as deprecated but must not be removed within a major version of Kubernetes.

* Beta or pre-release API versions must be supported for 3 releases after the deprecation.

* Alpha or experimental API versions may be removed in any release without prior deprecation notice; this process can become a withdrawal in cases where a different implementation for the same feature is already in place.
--&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;一般可用（GA）或稳定 API 版本可以被标记为已弃用，但在 Kubernetes
的一个主要版本内不得移除。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;测试版或预发布 API 版本在弃用后必须支持至少三个发行版本。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Alpha 或实验性 API 版本可以在任何版本中被移除，且无需事先发出弃用通知；
如果同一特性已经有了不同的实现，这个过程可能会变为撤回。&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Whether an API is removed as a result of a feature graduating from beta to stable, or because that API simply did not succeed, all removals comply with this deprecation policy. Whenever an API is removed, migration options are communicated in the [deprecation guide](/docs/reference/using-api/deprecation-guide/).
--&gt;
&lt;p&gt;无论是由于某个特性从测试阶段升级为稳定阶段而导致 API 被移除，还是因为该
API 未能成功，所有的移除操作都遵循此弃用政策。每当一个 API 被移除时，
迁移选项都会在&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/using-api/deprecation-guide/&#34;&gt;弃用指南&lt;/a&gt;中进行说明。&lt;/p&gt;
&lt;!--
## Deprecations and removals for Kubernetes v1.33

### Deprecation of the stable Endpoints API

The [EndpointSlices](/docs/concepts/services-networking/endpoint-slices/) API has been stable since v1.21, which effectively replaced the original Endpoints API. While the original Endpoints API was simple and straightforward, it also posed some challenges when scaling to large numbers of network endpoints. The EndpointSlices API has introduced new features such as dual-stack networking, making the original Endpoints API ready for deprecation.
--&gt;
&lt;h2 id=&#34;kubernetes-v1-33-的弃用与移除&#34;&gt;Kubernetes v1.33 的弃用与移除&lt;/h2&gt;
&lt;h3 id=&#34;稳定版-endpoints-api-的弃用&#34;&gt;稳定版 Endpoints API 的弃用&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/services-networking/endpoint-slices/&#34;&gt;EndpointSlices&lt;/a&gt; API
自 v1.21 起已稳定，实际上取代了原有的 Endpoints API。虽然原有的 Endpoints API 简单直接，
但在扩展到大量网络端点时也带来了一些挑战。EndpointSlices API 引入了诸如双栈网络等新特性，
使得原有的 Endpoints API 已准备好被弃用。&lt;/p&gt;
&lt;!--
This deprecation only impacts those who use the Endpoints API directly from workloads or scripts; these users should migrate to use EndpointSlices instead. There will be a dedicated blog post with more details on the deprecation implications and migration plans in the coming weeks.

You can find more in [KEP-4974: Deprecate v1.Endpoints](https://kep.k8s.io/4974).
--&gt;
&lt;p&gt;此弃用仅影响那些直接在工作负载或脚本中使用 Endpoints API 的用户；
这些用户应迁移到使用 EndpointSlices。未来几周内将发布一篇专门的博客文章，
详细介绍弃用的影响和迁移计划。&lt;/p&gt;
&lt;p&gt;你可以在 &lt;a href=&#34;https://kep.k8s.io/4974&#34;&gt;KEP-4974: Deprecate v1.Endpoints&lt;/a&gt;
中找到更多信息。&lt;/p&gt;
&lt;!--
### Removal of kube-proxy version information in node status

Following its deprecation in v1.31, as highlighted in the [release announcement](/blog/2024/07/19/kubernetes-1-31-upcoming-changes/#deprecation-of-status-nodeinfo-kubeproxyversion-field-for-nodes-kep-4004-https-github-com-kubernetes-enhancements-issues-4004), the `status.nodeInfo.kubeProxyVersion` field will be removed in v1.33. This field was set by kubelet, but its value was not consistently accurate. As it has been disabled by default since v1.31, the v1.33 release will remove this field entirely.
--&gt;
&lt;h3 id=&#34;节点状态中-kube-proxy-版本信息的移除&#34;&gt;节点状态中 kube-proxy 版本信息的移除&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;status.nodeInfo.kubeProxyVersion&lt;/code&gt; 字段已在 v1.31 中被弃用
（正如&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2024/07/19/kubernetes-1-31-upcoming-changes/#deprecation-of-status-nodeinfo-kubeproxyversion-field-for-nodes-kep-4004-https-github-com-kubernetes-enhancements-issues-4004&#34;&gt;发布公告&lt;/a&gt;中所强调的），并将在 v1.33 中被移除。
此字段由 kubelet 设置，但其值并不总是准确的。由于自 v1.31
起该字段默认已被禁用，v1.33 发行版将完全移除此字段。&lt;/p&gt;
&lt;!--
You can find more in [KEP-4004: Deprecate status.nodeInfo.kubeProxyVersion field](https://kep.k8s.io/4004).

### Removal of host network support for Windows pods
--&gt;
&lt;p&gt;你可以在 &lt;a href=&#34;https://kep.k8s.io/4004&#34;&gt;KEP-4004: Deprecate status.nodeInfo.kubeProxyVersion field&lt;/a&gt;
中找到更多信息。&lt;/p&gt;
&lt;h3 id=&#34;移除对-windows-pod-的主机网络支持&#34;&gt;移除对 Windows Pod 的主机网络支持&lt;/h3&gt;
&lt;!--
Windows Pod networking aimed to achieve feature parity with Linux and provide better cluster density by allowing containers to use the Node’s networking namespace.
The original implementation landed as alpha with v1.26, but as it faced unexpected containerd behaviours,
and alternative solutions were available, the Kubernetes project has decided to withdraw the associated
KEP. We&#39;re expecting to see support fully removed in v1.33.
--&gt;
&lt;p&gt;Windows Pod 网络旨在通过允许容器使用节点的网络命名空间来实现与 Linux 的特性对等，
并提供更高的集群密度。最初的实现作为 Alpha 版本在 v1.26 中引入，但由于遇到了未预期的
containerd 行为，且存在替代方案，Kubernetes 项目决定撤回相关的 KEP。
我们预计在 v1.33 中完全移除对该特性的支持。&lt;/p&gt;
&lt;!--
You can find more in [KEP-3503: Host network support for Windows pods](https://kep.k8s.io/3503).

## Featured improvement of Kubernetes v1.33

As authors of this article, we picked one improvement as the most significant change to call out!
--&gt;
&lt;p&gt;你可以在 &lt;a href=&#34;https://kep.k8s.io/3503&#34;&gt;KEP-3503: Host network support for Windows pods&lt;/a&gt;
中找到更多信息。&lt;/p&gt;
&lt;h2 id=&#34;kubernetes-v1-33-的特色改进&#34;&gt;Kubernetes v1.33 的特色改进&lt;/h2&gt;
&lt;p&gt;作为本文的作者，我们挑选了一项改进作为最重要的变更来特别提及！&lt;/p&gt;
&lt;!--
### Support for user namespaces within Linux Pods

One of the oldest open KEPs today is [KEP-127](https://kep.k8s.io/127), Pod security improvement by using Linux [User namespaces](/docs/concepts/workloads/pods/user-namespaces/) for Pods. This KEP was first opened in late 2016, and after multiple iterations, had its alpha release in v1.25, initial beta in v1.30 (where it was disabled by default), and now is set to be a part of v1.33, where the feature is available by default.
--&gt;
&lt;h3 id=&#34;linux-pods-中用户命名空间的支持&#34;&gt;Linux Pods 中用户命名空间的支持&lt;/h3&gt;
&lt;p&gt;当前最古老的开放 KEP 之一是 &lt;a href=&#34;https://kep.k8s.io/127&#34;&gt;KEP-127&lt;/a&gt;，
通过使用 Linux &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/pods/user-namespaces/&#34;&gt;用户命名空间&lt;/a&gt;为
Pod 提供安全性改进。该 KEP 最初在 2016 年末提出，经过多次迭代，在 v1.25 中发布了 Alpha 版本，
在 v1.30 中首次进入 Beta 阶段（在此版本中默认禁用），现在它将成为 v1.33 的一部分，
默认情况下即可使用该特性。&lt;/p&gt;
&lt;!--
This support will not impact existing Pods unless you manually specify `pod.spec.hostUsers` to opt in. As highlighted in the [v1.30 sneak peek blog](/blog/2024/03/12/kubernetes-1-30-upcoming-changes/), this is an important milestone for mitigating vulnerabilities.

You can find more in [KEP-127: Support User Namespaces in pods](https://kep.k8s.io/127).
--&gt;
&lt;p&gt;除非你手动指定 &lt;code&gt;pod.spec.hostUsers&lt;/code&gt; 以选择使用此特性，否则此支持不会影响现有的 Pod。
正如在 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2024/03/12/kubernetes-1-30-upcoming-changes/&#34;&gt;v1.30 预览博客&lt;/a&gt;中强调的那样，
就缓解漏洞的影响而言，这是一个重要里程碑。&lt;/p&gt;
&lt;p&gt;你可以在 &lt;a href=&#34;https://kep.k8s.io/127&#34;&gt;KEP-127: Support User Namespaces in pods&lt;/a&gt;
中找到更多信息。&lt;/p&gt;
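&lt;p&gt;下面给出一个启用用户命名空间的 Pod 示意性片段（Pod 名称与镜像仅作示例；关键字段即上文提到的 &lt;code&gt;pod.spec.hostUsers&lt;/code&gt;）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  # 设置为 false 表示不使用宿主机的用户命名空间，
  # 即为该 Pod 启用独立的用户命名空间
  hostUsers: false
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
&lt;/code&gt;&lt;/pre&gt;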
&lt;!--
## Selected other Kubernetes v1.33 improvements

The following list of enhancements is likely to be included in the upcoming v1.33 release. This is not a commitment and the release content is subject to change.
--&gt;
&lt;h2 id=&#34;精选的其他-kubernetes-v1-33-改进&#34;&gt;精选的其他 Kubernetes v1.33 改进&lt;/h2&gt;
&lt;p&gt;以下列出的增强特性很可能会包含在即将到来的 v1.33 发行版中。
但这并不构成承诺，发行内容仍有可能发生变化。&lt;/p&gt;
&lt;!--
### In-place resource resize for vertical scaling of Pods

When provisioning a Pod, you can use various resources such as Deployment, StatefulSet, etc. Scalability requirements may need horizontal scaling by updating the Pod replica count, or vertical scaling by updating resources allocated to Pod’s container(s). Before this enhancement, container resources defined in a Pod&#39;s `spec` were immutable, and updating any of these details within a Pod template would trigger Pod replacement.
--&gt;
&lt;h3 id=&#34;pod-垂直扩展的就地资源调整&#34;&gt;Pod 垂直扩展的就地资源调整&lt;/h3&gt;
&lt;p&gt;在制备某个 Pod 时，你可以使用诸如 Deployment、StatefulSet 等多种资源。
为了满足可扩缩性需求，可能需要通过更新 Pod 副本数量进行水平扩缩，或通过更新分配给
Pod 容器的资源进行垂直扩缩。在此增强特性之前，Pod 的 &lt;code&gt;spec&lt;/code&gt;
中定义的容器资源是不可变的，更新 Pod 模板中的这类细节会触发 Pod 的替换。&lt;/p&gt;
&lt;!--
But what if you could dynamically update the resource configuration for your existing Pods without restarting them?

The [KEP-1287](https://kep.k8s.io/1287) is precisely to allow such in-place Pod updates. It opens up various possibilities of vertical scale-up for stateful processes without any downtime, seamless scale-down when the traffic is low, and even allocating larger resources during startup that is eventually reduced once the initial setup is complete. This was released as alpha in v1.27, and is expected to land as beta in v1.33.
--&gt;
&lt;p&gt;但是如果可以在不重启的情况下动态更新现有 Pod 的资源配置，那会怎样呢？&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://kep.k8s.io/1287&#34;&gt;KEP-1287&lt;/a&gt; 正是为了实现这种就地 Pod 更新而设计的。
它为有状态进程的垂直扩缩开辟了多种可能性，例如在不停机的情况下进行扩容、
在流量较低时无缝缩容，甚至在启动时分配更多资源，待初始设置完成后减少资源分配。
该特性在 v1.27 中以 Alpha 版本发布，并预计在 v1.33 中进入 Beta 阶段。
&lt;!--
You can find more in [KEP-1287: In-Place Update of Pod Resources](https://kep.k8s.io/1287).

### DRA’s ResourceClaim Device Status graduates to beta
--&gt;
&lt;p&gt;你可以在 &lt;a href=&#34;https://kep.k8s.io/1287&#34;&gt;KEP-1287：Pod 资源的就地更新&lt;/a&gt;中找到更多信息。&lt;/p&gt;
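&lt;p&gt;作为示意（&lt;code&gt;resizePolicy&lt;/code&gt; 等字段名取自 KEP-1287，具体以正式 API 文档为准），容器可以声明某项资源在就地调整时是否需要重启容器：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: resize-demo
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired       # CPU 可在不重启容器的情况下调整
    - resourceName: memory
      restartPolicy: RestartContainer  # 调整内存时重启该容器
    resources:
      requests:
        cpu: 500m
        memory: 128Mi
&lt;/code&gt;&lt;/pre&gt;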
&lt;h3 id=&#34;dra-的-resourceclaim-设备状态升级为-beta&#34;&gt;DRA 的 ResourceClaim 设备状态升级为 Beta&lt;/h3&gt;
&lt;!--
The `devices` field in ResourceClaim `status`, originally introduced in the v1.32 release, is likely to graduate to beta in v1.33. This field allows drivers to report device status data, improving both observability and troubleshooting capabilities.
--&gt;
&lt;p&gt;在 v1.32 版本中首次引入的 ResourceClaim &lt;code&gt;status&lt;/code&gt; 中的 &lt;code&gt;devices&lt;/code&gt; 字段，
预计将在 v1.33 中升级为 Beta 阶段。此字段允许驱动程序报告设备状态数据，
从而提升可观测性和故障排查能力。&lt;/p&gt;
&lt;!--
For example, reporting the interface name, MAC address, and IP addresses of network interfaces in the status of a ResourceClaim can significantly help in configuring and managing network services, as well as in debugging network related issues. You can read more about ResourceClaim Device Status in [Dynamic Resource Allocation: ResourceClaim Device Status](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceclaim-device-status) document.
--&gt;
&lt;p&gt;例如，在 ResourceClaim 的状态中报告网络接口的接口名称、MAC 地址和 IP 地址，
可以显著帮助配置和管理网络服务，并且在调试网络相关问题时也非常有用。
你可以在&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceclaim-device-status&#34;&gt;动态资源分配：ResourceClaim 设备状态&lt;/a&gt;
文档中阅读关于 ResourceClaim 设备状态的更多信息。&lt;/p&gt;
&lt;!--
Also, you can find more about the planned enhancement in [KEP-4817: DRA: Resource Claim Status with possible standardized network interface data](https://kep.k8s.io/4817).
--&gt;
&lt;p&gt;此外，你可以在
&lt;a href=&#34;https://kep.k8s.io/4817&#34;&gt;KEP-4817: DRA: Resource Claim Status with possible standardized network interface data&lt;/a&gt;
中找到更多关于此计划增强特性的信息。&lt;/p&gt;
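&lt;p&gt;作为示意（字段名取自 KEP-4817 的草案，具体以正式 API 文档为准），驱动程序报告的网络接口信息在 ResourceClaim 的状态中大致如下：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;status:
  devices:
  - driver: net-driver.example.com   # 示意性的驱动名称
    pool: pool-1
    device: nic-1
    networkData:
      interfaceName: eth0
      hardwareAddress: &#34;00:11:22:33:44:55&#34;
      ips:
      - 10.0.0.5/24
&lt;/code&gt;&lt;/pre&gt;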
&lt;!--
### Ordered namespace deletion

This KEP introduces a more structured deletion process for Kubernetes namespaces to ensure secure and deterministic resource removal. The current semi-random deletion order can create security gaps or unintended behaviour, such as Pods persisting after their associated NetworkPolicies are deleted. By enforcing a structured deletion sequence that respects logical and security dependencies, this approach ensures Pods are removed before other resources. The design improves Kubernetes’s security and reliability by mitigating risks associated with non-deterministic deletions.
--&gt;
&lt;h3 id=&#34;有序的命名空间删除&#34;&gt;有序的命名空间删除&lt;/h3&gt;
&lt;p&gt;此 KEP 为 Kubernetes 命名空间引入了一种更为结构化的删除流程，
以确保更为安全且更为确定的资源移除。当前半随机的删除顺序可能会导致安全漏洞或意外行为，
例如在相关的 NetworkPolicy 被删除后，Pod 仍然存在。
通过强制执行尊重逻辑和安全依赖关系的结构化删除顺序，此方法确保在删除其他资源之前先删除 Pod。
这种设计通过减少与非确定性删除相关的风险，提升了 Kubernetes 的安全性和可靠性。&lt;/p&gt;
&lt;!--
You can find more in [KEP-5080: Ordered namespace deletion](https://kep.k8s.io/5080).
--&gt;
&lt;p&gt;你可以在 &lt;a href=&#34;https://kep.k8s.io/5080&#34;&gt;KEP-5080: Ordered namespace deletion&lt;/a&gt;
中找到更多信息。&lt;/p&gt;
&lt;!--
### Enhancements for indexed job management

These two KEPs are both set to graduate to GA to provide better reliability for job handling, specifically for indexed jobs. [KEP-3850](https://kep.k8s.io/3850) provides per-index backoff limits for indexed jobs, which allows each index to be fully independent of other indexes. Also, [KEP-3998](https://kep.k8s.io/3998) extends Job API to define conditions for making an indexed job as successfully completed when not all indexes are succeeded.
--&gt;
&lt;h3 id=&#34;针对带索引作业-indexed-job-管理的增强&#34;&gt;针对带索引作业（Indexed Job）管理的增强&lt;/h3&gt;
&lt;p&gt;这两个 KEP 都计划升级为 GA，以提供更好的作业处理可靠性，特别是针对索引作业。
&lt;a href=&#34;https://kep.k8s.io/3850&#34;&gt;KEP-3850&lt;/a&gt; 为索引作业中的不同索引分别支持独立的回退限制，
这使得每个索引可以完全独立于其他索引。此外，&lt;a href=&#34;https://kep.k8s.io/3998&#34;&gt;KEP-3998&lt;/a&gt;
扩展了 Job API，定义了在并非所有索引都成功的情况下将索引作业标记为成功完成的条件。&lt;/p&gt;
&lt;!--
You can find more in [KEP-3850: Backoff Limit Per Index For Indexed Jobs](https://kep.k8s.io/3850) and [KEP-3998: Job success/completion policy](https://kep.k8s.io/3998).
--&gt;
&lt;p&gt;你可以在 &lt;a href=&#34;https://kep.k8s.io/3850&#34;&gt;KEP-3850: Backoff Limit Per Index For Indexed Jobs&lt;/a&gt; 和
&lt;a href=&#34;https://kep.k8s.io/3998&#34;&gt;KEP-3998: Job success/completion policy&lt;/a&gt; 中找到更多信息。&lt;/p&gt;
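&lt;p&gt;下面是一个示意性的带索引 Job 片段（字段名取自上述两个 KEP，具体以正式 API 文档为准），同时使用了按索引的回退限制与成功策略：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-job-demo
spec:
  completions: 10
  parallelism: 5
  completionMode: Indexed
  backoffLimitPerIndex: 2   # 每个索引各自最多重试 2 次（KEP-3850）
  maxFailedIndexes: 3       # 失败索引超过 3 个时整个 Job 失败
  successPolicy:            # 满足规则即可将 Job 标记为成功（KEP-3998）
    rules:
    - succeededIndexes: 0-4
      succeededCount: 5
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.k8s.io/pause:3.9
&lt;/code&gt;&lt;/pre&gt;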
&lt;!--
## Want to know more?

New features and deprecations are also announced in the Kubernetes release notes. We will formally announce what&#39;s new in [Kubernetes v1.33](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.33.md) as part of the CHANGELOG for that release.
--&gt;
&lt;h2 id=&#34;想了解更多&#34;&gt;想了解更多？&lt;/h2&gt;
&lt;p&gt;新特性和弃用也会在 Kubernetes 发行说明中宣布。我们将在该版本的
CHANGELOG 中正式宣布 &lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.33.md&#34;&gt;Kubernetes v1.33&lt;/a&gt;
的新内容。&lt;/p&gt;
&lt;!--
Kubernetes v1.33 release is planned for **Wednesday, 23rd April, 2025**. Stay tuned for updates!

You can also see the announcements of changes in the release notes for:
--&gt;
&lt;p&gt;Kubernetes v1.33 版本计划于 &lt;strong&gt;2025 年 4 月 23 日（星期三）&lt;/strong&gt;发布。请持续关注以获取更新！&lt;/p&gt;
&lt;p&gt;你也可以在以下版本的发行说明中查看变更公告：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.32.md&#34;&gt;Kubernetes v1.32&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md&#34;&gt;Kubernetes v1.31&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md&#34;&gt;Kubernetes v1.30&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Get involved

The simplest way to get involved with Kubernetes is by joining one of the many [Special Interest Groups](https://github.com/kubernetes/community/blob/master/sig-list.md) (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly [community meeting](https://github.com/kubernetes/community/tree/master/communication), and through the channels below. Thank you for your continued feedback and support.
--&gt;
&lt;h2 id=&#34;参与进来&#34;&gt;参与进来&lt;/h2&gt;
&lt;p&gt;参与 Kubernetes 最简单的方式是加入与你兴趣相符的众多&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-list.md&#34;&gt;特别兴趣小组&lt;/a&gt;（SIG）
之一。你有什么想向 Kubernetes 社区广播的内容吗？
通过我们每周的&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/communication&#34;&gt;社区会议&lt;/a&gt;和以下渠道分享你的声音。
感谢你持续的反馈和支持。&lt;/p&gt;
&lt;!--
- Follow us on Bluesky [@kubernetes.io](https://bsky.app/profile/kubernetes.io) for the latest updates
- Join the community discussion on [Discuss](https://discuss.kubernetes.io/)
- Join the community on [Slack](http://slack.k8s.io/)
- Post questions (or answer questions) on [Server Fault](https://serverfault.com/questions/tagged/kubernetes) or [Stack Overflow](http://stackoverflow.com/questions/tagged/kubernetes)
- Share your Kubernetes [story](https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform)
- Read more about what’s happening with Kubernetes on the [blog](https://kubernetes.io/blog/)
- Learn more about the [Kubernetes Release Team](https://github.com/kubernetes/sig-release/tree/master/release-team)
--&gt;
&lt;ul&gt;
&lt;li&gt;在 Bluesky 上关注我们 &lt;a href=&#34;https://bsky.app/profile/kubernetes.io&#34;&gt;@kubernetes.io&lt;/a&gt; 以获取最新更新&lt;/li&gt;
&lt;li&gt;在 &lt;a href=&#34;https://discuss.kubernetes.io/&#34;&gt;Discuss&lt;/a&gt; 上参与社区讨论&lt;/li&gt;
&lt;li&gt;在 &lt;a href=&#34;http://slack.k8s.io/&#34;&gt;Slack&lt;/a&gt; 上加入社区&lt;/li&gt;
&lt;li&gt;在 &lt;a href=&#34;https://serverfault.com/questions/tagged/kubernetes&#34;&gt;Server Fault&lt;/a&gt; 或
&lt;a href=&#34;http://stackoverflow.com/questions/tagged/kubernetes&#34;&gt;Stack Overflow&lt;/a&gt; 上提问（或回答问题）&lt;/li&gt;
&lt;li&gt;分享你的 Kubernetes &lt;a href=&#34;https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform&#34;&gt;故事&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;在&lt;a href=&#34;https://kubernetes.io/zh-cn/blog/&#34;&gt;博客&lt;/a&gt;上阅读更多关于 Kubernetes 最新动态的内容&lt;/li&gt;
&lt;li&gt;了解更多关于 &lt;a href=&#34;https://github.com/kubernetes/sig-release/tree/master/release-team&#34;&gt;Kubernetes 发布团队&lt;/a&gt;的信息&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>ingress-nginx CVE-2025-1974 须知</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/03/24/ingress-nginx-cve-2025-1974/</link>
      <pubDate>Mon, 24 Mar 2025 12:00:00 -0800</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/03/24/ingress-nginx-cve-2025-1974/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Ingress-nginx CVE-2025-1974: What You Need to Know&#34;
date: 2025-03-24T12:00:00-08:00
slug: ingress-nginx-CVE-2025-1974
author: &gt;
  Tabitha Sable (Kubernetes Security Response Committee)
--&gt;
&lt;!--
Today, the ingress-nginx maintainers have [released patches for a batch of critical vulnerabilities](https://github.com/kubernetes/ingress-nginx/releases) that could make it easy for attackers to take over your Kubernetes cluster. If you are among the over 40% of Kubernetes administrators using [ingress-nginx](https://github.com/kubernetes/ingress-nginx/), you should take action immediately to protect your users and data.
--&gt;
&lt;p&gt;今天，ingress-nginx 项目的维护者们&lt;a href=&#34;https://github.com/kubernetes/ingress-nginx/releases&#34;&gt;发布了一批关键漏洞的修复补丁&lt;/a&gt;，
这些漏洞可能让攻击者轻易接管你的 Kubernetes 集群。目前有 40% 以上的 Kubernetes 管理员正在使用
&lt;a href=&#34;https://github.com/kubernetes/ingress-nginx/&#34;&gt;ingress-nginx&lt;/a&gt;，
如果你是其中之一，请立即采取行动，保护你的用户和数据。&lt;/p&gt;
&lt;!--
## Background

[Ingress](/docs/concepts/services-networking/ingress/) is the traditional Kubernetes feature for exposing your workload Pods to the world so that they can be useful. In an implementation-agnostic way, Kubernetes users can define how their applications should be made available on the network. Then, an [ingress controller](/docs/concepts/services-networking/ingress-controllers/) uses that definition to set up local or cloud resources as required for the user’s particular situation and needs.
--&gt;
&lt;h2 id=&#34;background&#34;&gt;背景&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/services-networking/ingress/&#34;&gt;Ingress&lt;/a&gt;
是 Kubernetes 提供的一种传统特性，可以将你的工作负载 Pod 暴露给外部世界，方便外部用户使用。
Kubernetes 用户可以用与实现无关的方式来定义应用如何在网络上可用。
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/services-networking/ingress-controllers/&#34;&gt;Ingress 控制器&lt;/a&gt;会根据定义，
配置所需的本地资源或云端资源，以满足用户的特定场景和需求。&lt;/p&gt;
&lt;!--
Many different ingress controllers are available, to suit users of different cloud providers or brands of load balancers. Ingress-nginx is a software-only ingress controller provided by the Kubernetes project. Because of its versatility and ease of use, ingress-nginx is quite popular: it is deployed in over 40% of Kubernetes clusters\!

Ingress-nginx translates the requirements from Ingress objects into configuration for nginx, a powerful open source webserver daemon. Then, nginx uses that configuration to accept and route requests to the various applications running within a Kubernetes cluster. Proper handling of these nginx configuration parameters is crucial, because ingress-nginx needs to allow users significant flexibility while preventing them from accidentally or intentionally tricking nginx into doing things it shouldn’t.
--&gt;
&lt;p&gt;为了满足不同云厂商用户或负载均衡器产品的需求，目前有许多不同类型的 Ingress 控制器。
ingress-nginx 是 Kubernetes 项目提供的纯软件的 Ingress 控制器。
ingress-nginx 由于灵活易用，非常受用户欢迎。它已经被部署在超过 40% 的 Kubernetes 集群中！&lt;/p&gt;
&lt;p&gt;ingress-nginx 会将 Ingress 对象中的要求转换为 Nginx（一个强大的开源 Web 服务器守护进程）的配置。
Nginx 使用这些配置接受请求并将其路由到 Kubernetes 集群中运行的不同应用。
正确处理这些 Nginx 配置参数至关重要，因为 ingress-nginx 既要给予用户足够的灵活性，
又要防止用户无意或有意诱使 Nginx 执行其不应执行的操作。&lt;/p&gt;
&lt;!--
## Vulnerabilities Patched Today

Four of today’s ingress-nginx vulnerabilities are improvements to how ingress-nginx handles particular bits of nginx config. Without these fixes, a specially-crafted Ingress object can cause nginx to misbehave in various ways, including revealing the values of [Secrets](/docs/concepts/configuration/secret/) that are accessible to ingress-nginx. By default, ingress-nginx has access to all Secrets cluster-wide, so this can often lead to complete cluster takeover by any user or entity that has permission to create an Ingress.
--&gt;
&lt;h2 id=&#34;vulnerabilities-patched-today&#34;&gt;今日修复的漏洞&lt;/h2&gt;
&lt;p&gt;今天修复的四个 ingress-nginx 漏洞都是对 ingress-nginx 如何处理特定 Nginx 配置细节的改进。
如果不打这些修复补丁，一个精心构造的 Ingress 资源对象就可以让 Nginx 出现异常行为，
包括泄露 ingress-nginx 可访问的 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/configuration/secret/&#34;&gt;Secret&lt;/a&gt;
的值。默认情况下，ingress-nginx 可以访问集群范围内的所有 Secret，因此这往往会导致任一有权限创建
Ingress 的用户或实体接管整个集群。&lt;/p&gt;
&lt;!--
The most serious of today’s vulnerabilities, [CVE-2025-1974](https://github.com/kubernetes/kubernetes/issues/131009), rated [9.8 CVSS](https://www.first.org/cvss/calculator/3-1#CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H), allows anything on the Pod network to exploit configuration injection vulnerabilities via the Validating Admission Controller feature of ingress-nginx. This makes such vulnerabilities far more dangerous: ordinarily one would need to be able to create an Ingress object in the cluster, which is a fairly privileged action. When combined with today’s other vulnerabilities, **CVE-2025-1974 means that anything on the Pod network has a good chance of taking over your Kubernetes cluster, with no credentials or administrative access required**. In many common scenarios, the Pod network is accessible to all workloads in your cloud VPC, or even anyone connected to your corporate network\! This is a very serious situation.
--&gt;
&lt;p&gt;本次最严重的漏洞是 &lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/131009&#34;&gt;CVE-2025-1974&lt;/a&gt;，
CVSS 评分高达 &lt;a href=&#34;https://www.first.org/cvss/calculator/3-1#CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H&#34;&gt;9.8&lt;/a&gt;，
它允许 Pod 网络中的任意实体通过 ingress-nginx 的验证性准入控制器特性利用配置注入漏洞。
这使得此类漏洞变得更加危险：通常情况下，攻击者需要能够在集群中创建 Ingress 对象（这是一种权限相当高的操作）才能加以利用。
而当与今天修复的其他漏洞结合时，
&lt;strong&gt;CVE-2025-1974 意味着 Pod 网络中的任何实体都有极大可能接管你的 Kubernetes 集群，而不需要任何凭证或管理权限&lt;/strong&gt;。
在许多常见场景下，云端 VPC 中的所有工作负载都可以访问 Pod 网络，甚至任何连接到你公司内网的人都能访问！
这是一个非常严重的安全风险。&lt;/p&gt;
&lt;!--
Today, we have [released ingress-nginx v1.12.1 and v1.11.5](https://github.com/kubernetes/ingress-nginx/releases), which have fixes for all five of these vulnerabilities.

## Your next steps

First, determine if your clusters are using ingress-nginx. In most cases, you can check this by running `kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx` with cluster administrator permissions.
--&gt;
&lt;p&gt;我们今天已经&lt;a href=&#34;https://github.com/kubernetes/ingress-nginx/releases&#34;&gt;发布了 ingress-nginx v1.12.1 和 v1.11.5&lt;/a&gt;，
这两个版本修复了所有这 5 个漏洞。&lt;/p&gt;
&lt;h2 id=&#34;your-next-steps&#34;&gt;你需要做什么  &lt;/h2&gt;
&lt;p&gt;首先，确定你的集群是否在使用 ingress-nginx。大多数情况下，你可以使用集群管理员权限运行以下命令进行检查：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;kubectl get pods --all-namespaces --selector app.kubernetes.io/name&lt;span style=&#34;color:#666&#34;&gt;=&lt;/span&gt;ingress-nginx
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
**If you are using ingress-nginx, make a plan to remediate these vulnerabilities immediately.**

**The best and easiest remedy is to [upgrade to the new patch release of ingress-nginx](https://kubernetes.github.io/ingress-nginx/deploy/upgrade/).** All five of today’s vulnerabilities are fixed by installing today’s patches.

If you can’t upgrade right away, you can significantly reduce your risk by turning off the Validating Admission Controller feature of ingress-nginx.
--&gt;
&lt;p&gt;&lt;strong&gt;如果你在使用 ingress-nginx，请立即针对这些漏洞制定补救计划。&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;最简单且推荐的补救方案是&lt;a href=&#34;https://kubernetes.github.io/ingress-nginx/deploy/upgrade/&#34;&gt;立即升级到最新补丁版本&lt;/a&gt;。&lt;/strong&gt;
安装今天的补丁，就能修复所有这 5 个漏洞。&lt;/p&gt;
&lt;p&gt;如果你暂时无法升级，可以通过关闭 ingress-nginx 的验证性准入控制器特性来显著降低风险。&lt;/p&gt;
&lt;!--
* If you have installed ingress-nginx using Helm  
  * Reinstall, setting the Helm value `controller.admissionWebhooks.enabled=false`  
* If you have installed ingress-nginx manually  
  * delete the ValidatingWebhookconfiguration called `ingress-nginx-admission`  
  * edit the `ingress-nginx-controller` Deployment or Daemonset, removing `--validating-webhook` from the controller container’s argument list
--&gt;
&lt;ul&gt;
&lt;li&gt;如果你使用 Helm 安装了 ingress-nginx
&lt;ul&gt;
&lt;li&gt;重新安装，设置 Helm 参数 &lt;code&gt;controller.admissionWebhooks.enabled=false&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;如果你是手动安装的
&lt;ul&gt;
&lt;li&gt;删除名为 &lt;code&gt;ingress-nginx-admission&lt;/code&gt; 的 ValidatingWebhookConfiguration&lt;/li&gt;
&lt;li&gt;编辑 &lt;code&gt;ingress-nginx-controller&lt;/code&gt; Deployment 或 DaemonSet，从控制器容器的参数列表中移除 &lt;code&gt;--validating-webhook&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
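&lt;p&gt;上述步骤可以用如下命令来概括。这是基于上文说明草拟的示意性片段，而非固定流程；
其中的 Helm 发布名称与命名空间 &lt;code&gt;ingress-nginx&lt;/code&gt; 均为假设值，请按你的实际安装情况调整：&lt;/p&gt;

```shell
# 方式一：使用 Helm 安装时，关闭验证性准入控制器特性
# （发布名称与命名空间 "ingress-nginx" 为假设值）
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --reuse-values \
  --set controller.admissionWebhooks.enabled=false

# 方式二：手动安装时，删除对应的 ValidatingWebhookConfiguration
kubectl delete validatingwebhookconfiguration ingress-nginx-admission

# 随后编辑 Deployment（或 DaemonSet），
# 从控制器容器的参数列表中移除 --validating-webhook 参数
kubectl edit deployment ingress-nginx-controller --namespace ingress-nginx
```

&lt;p&gt;这些命令需要集群管理员权限，且只是针对 CVE-2025-1974 的临时缓解措施；升级到补丁版本后请恢复该特性。&lt;/p&gt;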
&lt;!--
If you turn off the Validating Admission Controller feature as a mitigation for CVE-2025-1974, remember to turn it back on after you upgrade. This feature provides important quality of life improvements for your users, warning them about incorrect Ingress configurations before they can take effect.
--&gt;
&lt;p&gt;如果你为了缓解 CVE-2025-1974 造成的风险而关闭了验证性准入控制器特性，
请记得在升级完成后重新开启此特性。这个特性可以显著改善用户的使用体验，
能够在错误的 Ingress 配置生效之前及时向用户发出警告。&lt;/p&gt;
&lt;!--
## Conclusion, thanks, and further reading

The ingress-nginx vulnerabilities announced today, including CVE-2025-1974, present a serious risk to many Kubernetes users and their data. If you use ingress-nginx, you should take action immediately to keep yourself safe.

Thanks go out to Nir Ohfeld, Sagi Tzadik, Ronen Shustin, and Hillai Ben-Sasson from Wiz for responsibly disclosing these vulnerabilities, and for working with the Kubernetes SRC members and ingress-nginx maintainers (Marco Ebert and James Strong) to ensure we fixed them effectively.
--&gt;
&lt;h2 id=&#34;conclusion-thanks-and-further-reading&#34;&gt;总结、致谢与更多参考  &lt;/h2&gt;
&lt;p&gt;今天公布的包括 CVE-2025-1974 在内的 ingress-nginx 漏洞对许多 Kubernetes 用户及其数据构成了严重风险。
如果你正在使用 ingress-nginx，请立即采取行动确保自身安全。&lt;/p&gt;
&lt;p&gt;我们要感谢来自 Wiz 的 Nir Ohfeld、Sagi Tzadik、Ronen Shustin 和 Hillai Ben-Sasson，
他们负责任地披露了这些漏洞，并与 Kubernetes 安全响应委员会成员以及 ingress-nginx
维护者（Marco Ebert 和 James Strong）协同合作，确保这些漏洞被有效修复。&lt;/p&gt;
&lt;!--
For further information about the maintenance and future of ingress-nginx, please see this [GitHub issue](https://github.com/kubernetes/ingress-nginx/issues/13002) and/or attend [James and Marco’s KubeCon/CloudNativeCon EU 2025 presentation](https://kccnceu2025.sched.com/event/1tcyc/).

For further information about the specific vulnerabilities discussed in this article, please see the appropriate GitHub issue: [CVE-2025-24513](https://github.com/kubernetes/kubernetes/issues/131005), [CVE-2025-24514](https://github.com/kubernetes/kubernetes/issues/131006), [CVE-2025-1097](https://github.com/kubernetes/kubernetes/issues/131007), [CVE-2025-1098](https://github.com/kubernetes/kubernetes/issues/131008), or [CVE-2025-1974](https://github.com/kubernetes/kubernetes/issues/131009)
--&gt;
&lt;p&gt;有关 ingress-nginx 的维护和未来的更多信息，
请参阅&lt;a href=&#34;https://github.com/kubernetes/ingress-nginx/issues/13002&#34;&gt;这个 GitHub Issue&lt;/a&gt;，
或参与 &lt;a href=&#34;https://kccnceu2025.sched.com/event/1tcyc/&#34;&gt;James 和 Marco 在 KubeCon/CloudNativeCon EU 2025 的演讲&lt;/a&gt;。&lt;/p&gt;
&lt;p&gt;关于本文中提到的具体漏洞的信息，请参阅以下 GitHub Issue：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/131005&#34;&gt;CVE-2025-24513&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/131006&#34;&gt;CVE-2025-24514&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/131007&#34;&gt;CVE-2025-1097&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/131008&#34;&gt;CVE-2025-1098&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/kubernetes/issues/131009&#34;&gt;CVE-2025-1974&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>JobSet 介绍</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/03/23/introducing-jobset/</link>
      <pubDate>Sun, 23 Mar 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/03/23/introducing-jobset/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Introducing JobSet&#34;
date: 2025-03-23
slug: introducing-jobset

**Authors**: Daniel Vega-Myhre (Google), Abdullah Gharaibeh (Google), Kevin Hannon (Red Hat)
--&gt;
&lt;!--
In this article, we introduce [JobSet](https://jobset.sigs.k8s.io/), an open source API for
representing distributed jobs. The goal of JobSet is to provide a unified API for distributed ML
training and HPC workloads on Kubernetes.
--&gt;
&lt;p&gt;在本文中，我们介绍 &lt;a href=&#34;https://jobset.sigs.k8s.io/&#34;&gt;JobSet&lt;/a&gt;，这是一个用于表示分布式任务的开源 API。
JobSet 的目标是为 Kubernetes 上的分布式机器学习训练和高性能计算（HPC）工作负载提供统一的 API。&lt;/p&gt;
&lt;!--
## Why JobSet?

The Kubernetes community’s recent enhancements to the batch ecosystem on Kubernetes has attracted ML
engineers who have found it to be a natural fit for the requirements of running distributed training
workloads. 

Large ML models (particularly LLMs) which cannot fit into the memory of the GPU or TPU chips on a
single host are often distributed across tens of thousands of accelerator chips, which in turn may
span thousands of hosts.
--&gt;
&lt;h2 id=&#34;why-jobset&#34;&gt;为什么需要 JobSet？  &lt;/h2&gt;
&lt;p&gt;Kubernetes 社区近期对 Kubernetes 批处理生态系统的增强，吸引了许多机器学习工程师，
他们发现这非常符合运行分布式训练工作负载的需求。&lt;/p&gt;
&lt;p&gt;大型机器学习模型（尤其是大语言模型，LLM）通常无法装入单个主机上 GPU 或 TPU 芯片的内存，
因此往往会被分布到数以万计的加速器芯片上，而这些芯片又可能跨越数千台主机。&lt;/p&gt;
&lt;!--
As such, the model training code is often containerized and executed simultaneously on all these
hosts, performing distributed computations which often shard both the model parameters and/or the
training dataset across the target accelerator chips, using communication collective primitives like
all-gather and all-reduce to perform distributed computations and synchronize gradients between
hosts. 

These workload characteristics make Kubernetes a great fit for this type of workload, as efficiently
scheduling and managing the lifecycle of containerized applications across a cluster of compute
resources is an area where it shines. 
--&gt;
&lt;p&gt;因此，模型训练代码通常会被容器化，并在所有这些主机上同时执行，进行分布式计算。
这些计算通常会将模型参数和/或训练数据集拆分到目标加速器芯片上，并使用如
all-gather 和 all-reduce 等通信集合原语来进行分布式计算以及在主机之间同步梯度。&lt;/p&gt;
&lt;p&gt;这些工作负载的特性使得 Kubernetes 非常适合此类任务，
因为高效地调度和管理跨计算资源集群的容器化应用生命周期是 Kubernetes 的强项。&lt;/p&gt;
&lt;!--
It is also very extensible, allowing developers to define their own Kubernetes APIs, objects, and
controllers which manage the behavior and life cycle of these objects, allowing engineers to develop
custom distributed training orchestration solutions to fit their needs.

However, as distributed ML training techniques continue to evolve, existing Kubernetes primitives do
not adequately model them alone anymore.
--&gt;
&lt;p&gt;Kubernetes 还具有很强的可扩展性，允许开发者定义自己的 Kubernetes API、
对象以及管理这些对象行为和生命周期的控制器，
从而让工程师能够开发定制化的分布式训练编排解决方案以满足特定需求。&lt;/p&gt;
&lt;p&gt;然而，随着分布式机器学习训练技术的不断发展，现有的 Kubernetes
原语已经无法单独充分描述这些新技术。&lt;/p&gt;
&lt;!--
Furthermore, the landscape of Kubernetes distributed training orchestration APIs has become
fragmented, and each of the existing solutions in this fragmented landscape has certain limitations
that make it non-optimal for distributed ML training. 

For example, the KubeFlow training operator defines custom APIs for different frameworks (e.g.
PyTorchJob, TFJob, MPIJob, etc.); however, each of these job types are in fact a solution fit
specifically to the target framework, each with different semantics and behavior.
--&gt;
&lt;p&gt;此外，Kubernetes 分布式训练编排 API 的领域已经变得支离破碎，
而这个碎片化的领域中每个现有的解决方案都存在某些限制，
使得它们在分布式机器学习训练方面并非最优选择。&lt;/p&gt;
&lt;p&gt;例如，KubeFlow 训练 Operator 为不同的框架定义了自定义 API（例如 PyTorchJob、TFJob、MPIJob 等）。
然而，这些作业类型实际上分别是针对特定框架量身定制的解决方案，各自具有不同的语义和行为。&lt;/p&gt;
&lt;!--
On the other hand, the Job API fixed many gaps for running batch workloads, including Indexed
completion mode, higher scalability, Pod failure policies and Pod backoff policy to mention a few of
the most recent enhancements. However, running ML training and HPC workloads using the upstream Job
API requires extra orchestration to fill the following gaps:

Multi-template Pods : Most HPC or ML training jobs include more than one type of Pods. The different
Pods are part of the same workload, but they need to run a different container, request different
resources or have different failure policies. A common example is the driver-worker pattern.
--&gt;
&lt;p&gt;另一方面，Job API 弥补了运行批处理工作负载的许多空白，包括带索引的完成模式（Indexed Completion Mode）、
更高的可扩展性、Pod 失效策略和 Pod 回退策略等，这些都是最近的一些重要增强功能。然而，使用上游
Job API 运行机器学习训练和高性能计算（HPC）工作负载时，需要额外的编排来填补以下空白：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;多模板 Pod：大多数 HPC 或机器学习训练任务包含多种类型的 Pod。这些不同的 Pod 属于同一工作负载，
但它们需要运行不同的容器、请求不同的资源或具有不同的失效策略。
一个常见的例子是驱动器-工作节点（driver-worker）模式。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Job groups : Large scale training workloads span multiple network topologies, running across
multiple racks for example. Such workloads are network latency sensitive, and aim to localize
communication and minimize traffic crossing the higher-latency network links. To facilitate this,
the workload needs to be split into groups of Pods each assigned to a network topology.

Inter-Pod communication : Create and manage the resources (e.g. [headless
Services](/docs/concepts/services-networking/service/#headless-services)) necessary to establish
communication between the Pods of a job.
--&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;任务组：大规模训练工作负载跨越多个网络拓扑，例如在多个机架之间运行。
这类工作负载对网络延迟非常敏感，目标是将通信本地化并尽量减少跨越高延迟网络链路的流量。
为此，需要将工作负载拆分为 Pod 组，每组分配到一个网络拓扑。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pod 间通信：创建和管理建立作业中 Pod 之间通信所需的资源
（例如&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/services-networking/service/#headless-services&#34;&gt;无头服务&lt;/a&gt;）。&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Startup sequencing : Some jobs require a specific start sequence of pods; sometimes the driver is
expected to start first (like Ray or Spark), in other cases the workers are expected to be ready
before starting the driver (like MPI).

JobSet aims to address those gaps using the Job API as a building block to build a richer API for
large-scale distributed HPC and ML use cases.
--&gt;
&lt;ul&gt;
&lt;li&gt;启动顺序：某些任务需要特定的 Pod 启动顺序；有时需要驱动（driver）首先启动（例如 Ray 或 Spark），
而有时，人们期望多个工作节点（worker）在驱动启动之前就绪（例如 MPI）。&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;JobSet 旨在以 Job API 为基础，填补这些空白，构建一个更丰富的 API，
以支持大规模分布式 HPC 和 ML 使用场景。&lt;/p&gt;
&lt;!--
## How JobSet Works
JobSet models a distributed batch workload as a group of Kubernetes Jobs. This allows a user to
easily specify different pod templates for different distinct groups of pods (e.g. a leader,
workers, parameter servers, etc.). 

It uses the abstraction of a ReplicatedJob to manage child Jobs, where a ReplicatedJob is
essentially a Job Template with some desired number of Job replicas specified. This provides a
declarative way to easily create identical child-jobs to run on different islands of accelerators,
without resorting to scripting or Helm charts to generate many versions of the same job but with
different names.
--&gt;
&lt;h2 id=&#34;how-jobset-works&#34;&gt;JobSet 的工作原理  &lt;/h2&gt;
&lt;p&gt;JobSet 将分布式批处理工作负载建模为一组 Kubernetes Job。
这使得用户可以轻松为不同的 Pod 组（例如领导者 Pod、工作节点 Pod、参数服务器 Pod 等）
指定不同的 Pod 模板。&lt;/p&gt;
&lt;p&gt;它通过 ReplicatedJob 这一抽象来管理子 Job，其中 ReplicatedJob 本质上是一个指定了期望 Job 副本数的
Job 模板。这种方式提供了一种声明式的手段，能够轻松创建相同的子 Job，使其在不同的加速器岛上运行，
而无需借助脚本或 Helm Chart 来生成具有不同名称的多个相同任务版本。&lt;/p&gt;
&lt;!--


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/03/23/introducing-jobset/jobset_diagram.svg&#34;
         alt=&#34;JobSet Architecture&#34;/&gt; 
&lt;/figure&gt;

Some other key JobSet features which address the problems described above include:

Replicated Jobs : In modern data centers, hardware accelerators like GPUs and TPUs allocated in
islands of homogenous accelerators connected via a specialized, high bandwidth network links. For
example, a user might provision nodes containing a group of hosts co-located on a rack, each with
H100 GPUs, where GPU chips within each host are connected via NVLink, with a NVLink Switch
connecting the multiple NVLinks. TPU Pods are another example of this: TPU ViperLitePods consist of
64 hosts, each with 4 TPU v5e chips attached, all connected via ICI mesh. When running a distributed
training job across multiple of these islands, we often want to partition the workload into a group
of smaller identical jobs, 1 per island, where each pod primarily communicates with the pods within
the same island to do segments of distributed computation, and keeping the gradient synchronization
over DCN (data center network, which is lower bandwidth than ICI) to a bare minimum. 
--&gt;


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/03/23/introducing-jobset/jobset_diagram.svg&#34;
         alt=&#34;JobSet 架构&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;解决上述问题的其他一些关键 JobSet 特性包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;任务副本（Replicated Jobs）&lt;/strong&gt;：在现代数据中心中，硬件加速器（如 GPU 和 TPU）通常以同质加速器岛的形式分配，
并通过专用的高带宽网络链路连接。例如，用户可能会配置包含一组主机的节点，这些主机位于同一机架内，
每个主机都配备了 H100 GPU，主机内的 GPU 芯片通过 NVLink 连接，并通过 NVLink 交换机连接多个 NVLink。
TPU Pod 是另一个例子：TPU ViperLitePods 包含 64 个主机，每个主机连接了 4 个 TPU v5e 芯片，
所有芯片通过 ICI 网格连接。在跨多个这样的加速器岛运行分布式训练任务时，我们通常希望将工作负载划分为一组较小的相同任务，
每个岛一个任务，其中每个 Pod 主要与同一岛内的其他 Pod 通信，完成分布式计算的各个部分，
并将通过数据中心网络（DCN，其带宽低于 ICI）进行的梯度同步降到最低。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Automatic headless service creation, configuration, and lifecycle management : Pod-to-pod
communication via pod hostname is enabled by default, with automatic configuration and lifecycle
management of the headless service enabling this. 

Configurable success policies : JobSet has configurable success policies which target specific
ReplicatedJobs, with operators to target “Any” or “All” of their child jobs. For example, you can
configure the JobSet to be marked complete if and only if all pods that are part of the “worker”
ReplicatedJob are completed.
--&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;自动创建、配置无头服务并管理其生命周期&lt;/strong&gt;：默认情况下即可通过 Pod
主机名进行 Pod 间通信，这由无头服务的自动配置和生命周期管理来支持。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;可配置的成功策略&lt;/strong&gt;：JobSet 提供了可配置的成功策略，这些策略针对特定的 ReplicatedJob，
并可通过操作符指定 &amp;quot;Any&amp;quot; 或 &amp;quot;All&amp;quot; 子任务。例如，你可以将 JobSet 配置为仅在属于 &amp;quot;worker&amp;quot;
ReplicatedJob 的所有 Pod 完成时才标记为完成。&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
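&lt;p&gt;例如，下面是一个成功策略的最小示意片段（根据上文描述草拟，仅作说明；
名称 &lt;code&gt;success-policy-example&lt;/code&gt; 与镜像 &lt;code&gt;busybox&lt;/code&gt; 均为假设值，字段名请以你所用版本的 JobSet API 参考为准）：&lt;/p&gt;

```yaml
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: success-policy-example   # 假设的名称
spec:
  # 仅当 "workers" ReplicatedJob 的所有子任务完成时，JobSet 才标记为完成
  successPolicy:
    operator: All
    targetReplicatedJobs:
    - workers
  replicatedJobs:
  - name: workers
    replicas: 2
    template:
      spec:
        parallelism: 1
        completions: 1
        template:
          spec:
            restartPolicy: Never
            containers:
            - name: worker
              image: busybox   # 占位镜像
              command: ["sh", "-c", "echo done"]
```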
&lt;!--
Configurable failure policies : JobSet has configurable failure policies which allow the user to
specify a maximum number of times the JobSet should be restarted in the event of a failure. If any
job is marked failed, the entire JobSet will be recreated, allowing the workload to resume from the
last checkpoint. When no failure policy is specified, if any job fails, the JobSet simply fails. 
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;可配置的失效策略&lt;/strong&gt;：JobSet 提供了可配置的失效策略，允许用户指定在发生故障时
JobSet 应重启的最大次数。如果任何任务被标记为失败，整个 JobSet 将会被重新创建，
从而使工作负载可以从最后一个检查点恢复。当未指定失效策略时，如果任何任务失败，
JobSet 会直接标记为失败。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Exclusive placement per topology domain : JobSet allows users to express that child jobs have 1:1
exclusive assignment to a topology domain, typically an accelerator island like a rack. For example,
if the JobSet creates two child jobs, then this feature will enforce that the pods of each child job
will be co-located on the same island, and that only one child job is allowed to schedule per
island. This is useful for scenarios where we want to use a distributed data parallel (DDP) training
strategy to train a model using multiple islands of compute resources (GPU racks or TPU slices),
running 1 model replica in each accelerator island, ensuring that the forward and backward passes
within a single model replica occur over the high bandwidth interconnect
linking the accelerator chips within the island, and only the gradient synchronization between model
replicas occurs across accelerator islands over the lower bandwidth data center network.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;按拓扑域的独占放置&lt;/strong&gt;：JobSet 允许用户指定子任务与拓扑域（通常是加速器岛，例如机架）
之间的一对一独占分配关系。例如，如果 JobSet 创建了两个子任务，
此功能将确保每个子任务的 Pod 位于同一个加速器岛内，并且每个岛只允许调度一个子任务。
这在我们希望使用分布式数据并行（DDP）训练策略的情况下非常有用，
例如利用多个计算资源岛（GPU 机架或 TPU 切片）训练模型，在每个加速器岛内运行一个模型副本，
确保前向和反向传播过程通过岛内加速器芯片之间的高带宽互联完成，
而模型副本之间的梯度同步则通过低带宽的数据中心网络在加速器岛之间进行。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Integration with Kueue : Users can submit JobSets via [Kueue](https://kueue.sigs.k8s.io/) to
oversubscribe their clusters, queue workloads to run as capacity becomes available, prevent partial
scheduling and deadlocks, enable multi-tenancy, and more.
--&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;与 Kueue 集成&lt;/strong&gt;：用户可以通过 &lt;a href=&#34;https://kueue.sigs.k8s.io/&#34;&gt;Kueue&lt;/a&gt;
提交 JobSet，以实现集群的超额订阅、将工作负载排队等待容量可用时运行、
防止部分调度和死锁、支持多租户等更多功能。&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
## Example use case

### Distributed ML training on multiple TPU slices with Jax

The following example is a JobSet spec for running a TPU Multislice workload on 4 TPU v5e
[slices](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#slices). To learn more about
TPU concepts and terminology, please refer to these
[docs](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm).
--&gt;
&lt;h2 id=&#34;example-use-case&#34;&gt;示例用例  &lt;/h2&gt;
&lt;h3 id=&#34;使用-jax-在多个-tpu-切片上进行分布式-ml-训练&#34;&gt;使用 Jax 在多个 TPU 切片上进行分布式 ML 训练&lt;/h3&gt;
&lt;p&gt;以下示例展示了一个 JobSet 规范，用于在 4 个 TPU v5e
&lt;a href=&#34;https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#slices&#34;&gt;切片&lt;/a&gt;上运行
TPU 多切片工作负载。若想了解更多关于 TPU 的概念和术语，
请参考这些&lt;a href=&#34;https://cloud.google.com/tpu/docs/system-architecture-tpu-vm&#34;&gt;文档&lt;/a&gt;。&lt;/p&gt;
&lt;!--
This example uses [Jax](https://jax.readthedocs.io/en/latest/quickstart.html), an ML framework with
native support for Just-In-Time (JIT) compilation targeting TPU chips via
[OpenXLA](https://github.com/openxla). However, you can also use
[PyTorch/XLA](https://pytorch.org/xla/release/2.3/index.html) to do ML training on TPUs.

This example makes use of several JobSet features (both explicitly and implicitly) to support the
unique scheduling requirements of TPU multislice training out-of-the-box with very little
configuration required by the user.
--&gt;
&lt;p&gt;此示例使用了 &lt;a href=&#34;https://jax.readthedocs.io/en/latest/quickstart.html&#34;&gt;Jax&lt;/a&gt;，
这是一个通过 &lt;a href=&#34;https://github.com/openxla&#34;&gt;OpenXLA&lt;/a&gt; 提供对 TPU 芯片即时（JIT）
编译原生支持的机器学习框架。不过，你也可以使用 &lt;a href=&#34;https://pytorch.org/xla/release/2.3/index.html&#34;&gt;PyTorch/XLA&lt;/a&gt;
在 TPU 上进行机器学习训练。&lt;/p&gt;
&lt;p&gt;此示例利用了 JobSet 的多个功能（无论是显式还是隐式），以开箱即用地支持 TPU
多切片训练的独特调度需求，而用户需要的配置非常少。&lt;/p&gt;
&lt;!--
```yaml
# Run a simple Jax workload on 
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: multislice
  annotations:
    # Give each child Job exclusive usage of a TPU slice 
    alpha.jobset.sigs.k8s.io/exclusive-topology: cloud.google.com/gke-nodepool
spec:
  failurePolicy:
    maxRestarts: 3
  replicatedJobs:
  - name: workers
    replicas: 4 # Set to number of TPU slices
    template:
      spec:
        parallelism: 2 # Set to number of VMs per TPU slice
        completions: 2 # Set to number of VMs per TPU slice
        backoffLimit: 0
        template:
          spec:
            hostNetwork: true
            dnsPolicy: ClusterFirstWithHostNet
            nodeSelector:
              cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
              cloud.google.com/gke-tpu-topology: 2x4
            containers:
            - name: jax-tpu
              image: python:3.8
              ports:
              - containerPort: 8471
              - containerPort: 8080
              securityContext:
                privileged: true
              command:
              - bash
              - -c
              - |
                pip install &#34;jax[tpu]&#34; -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
                python -c &#39;import jax; print(&#34;Global device count:&#34;, jax.device_count())&#39;
                sleep 60
              resources:
                limits:
                  google.com/tpu: 4
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 运行简单的 Jax 工作负载&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;jobset.x-k8s.io/v1alpha2&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;JobSet&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;multislice&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;annotations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 为每个子任务提供 TPU 切片的独占使用权&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;alpha.jobset.sigs.k8s.io/exclusive-topology&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;cloud.google.com/gke-nodepool&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;failurePolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;maxRestarts&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;3&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;replicatedJobs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;workers&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;replicas&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;4&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 设置为 TPU 切片的数量&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;template&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;parallelism&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;2&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 设置为每个 TPU 切片的虚拟机数量&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;completions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;2&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 设置为每个 TPU 切片的虚拟机数量&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;backoffLimit&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;0&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;template&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostNetwork&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;dnsPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;ClusterFirstWithHostNet&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeSelector&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cloud.google.com/gke-tpu-accelerator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;tpu-v5-lite-podslice&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cloud.google.com/gke-tpu-topology&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;2x4&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;jax-tpu&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;python:3.8&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;ports&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containerPort&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8471&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containerPort&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;8080&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;securityContext&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;                &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;privileged&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- bash&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- -c&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;- |&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;                pip install &amp;#34;jax[tpu]&amp;#34; -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;                python -c &amp;#39;import jax; print(&amp;#34;Global device count:&amp;#34;, jax.device_count())&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b44;font-style:italic&#34;&gt;                sleep 60&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;                
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resources&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;                &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;limits&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;                  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;google.com/tpu&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;4&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
## Future work and getting involved
We have a number of features on the JobSet roadmap planned for development this year, which can be
found in the [JobSet roadmap](https://github.com/kubernetes-sigs/jobset?tab=readme-ov-file#roadmap).

Please feel free to reach out with feedback of any kind. We’re also open to additional contributors,
whether it is to fix or report bugs, or help add new features or write documentation. 
--&gt;
&lt;h2 id=&#34;furture-work-and-getting-involved&#34;&gt;未来工作与参与方式&lt;/h2&gt;
&lt;p&gt;我们今年的 JobSet 路线图中计划开发多项功能，具体内容可以在
&lt;a href=&#34;https://github.com/kubernetes-sigs/jobset?tab=readme-ov-file#roadmap&#34;&gt;JobSet 路线图&lt;/a&gt;中找到。&lt;/p&gt;
&lt;p&gt;欢迎你随时提供任何形式的反馈。我们也欢迎更多贡献者加入，无论是修复或报告问题、
帮助添加新功能，还是撰写文档，都非常欢迎。&lt;/p&gt;
&lt;!--
You can get in touch with us via our [repo](http://sigs.k8s.io/jobset), [mailing
list](https://groups.google.com/a/kubernetes.io/g/wg-batch) or on
[Slack](https://kubernetes.slack.com/messages/wg-batch).

Last but not least, thanks to all [our
contributors](https://github.com/kubernetes-sigs/jobset/graphs/contributors) who made this project
possible!
--&gt;
&lt;p&gt;你可以通过我们的&lt;a href=&#34;http://sigs.k8s.io/jobset&#34;&gt;代码仓库&lt;/a&gt;、
&lt;a href=&#34;https://groups.google.com/a/kubernetes.io/g/wg-batch&#34;&gt;邮件列表&lt;/a&gt;或者在
&lt;a href=&#34;https://kubernetes.slack.com/messages/wg-batch&#34;&gt;Slack&lt;/a&gt; 上与我们联系。&lt;/p&gt;
&lt;p&gt;最后但同样重要的是，感谢所有&lt;a href=&#34;https://github.com/kubernetes-sigs/jobset/graphs/contributors&#34;&gt;贡献者&lt;/a&gt;，
是你们让这个项目成为可能！&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>聚焦 SIG Apps</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/03/12/sig-apps-spotlight-2025/</link>
      <pubDate>Wed, 12 Mar 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/03/12/sig-apps-spotlight-2025/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Spotlight on SIG Apps&#34;
slug: sig-apps-spotlight-2025
canonicalUrl: https://www.kubernetes.dev/blog/2025/03/12/sig-apps-spotlight-2025
date: 2025-03-12
author: &#34;Sandipan Panda (DevZero)&#34;
--&gt;
&lt;!--
In our ongoing SIG Spotlight series, we dive into the heart of the Kubernetes project by talking to
the leaders of its various Special Interest Groups (SIGs). This time, we focus on 
**[SIG Apps](https://github.com/kubernetes/community/tree/master/sig-apps#apps-special-interest-group)**,
the group responsible for everything related to developing, deploying, and operating applications on
Kubernetes. [Sandipan Panda](https://www.linkedin.com/in/sandipanpanda)
([DevZero](https://www.devzero.io/)) had the opportunity to interview [Maciej
Szulik](https://github.com/soltysh) ([Defense Unicorns](https://defenseunicorns.com/)) and [Janet
Kuo](https://github.com/janetkuo) ([Google](https://about.google/)), the chairs and tech leads of
SIG Apps. They shared their experiences, challenges, and visions for the future of application
management within the Kubernetes ecosystem.
--&gt;
&lt;p&gt;在我们正在进行的 SIG 聚焦系列中，我们通过与 Kubernetes 项目各个特别兴趣小组（SIG）的领导者对话，
深入探讨 Kubernetes 项目的核心。这一次，我们聚焦于
&lt;strong&gt;&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps#apps-special-interest-group&#34;&gt;SIG Apps&lt;/a&gt;&lt;/strong&gt;，
这个小组负责 Kubernetes 上与应用程序开发、部署和操作相关的所有内容。
&lt;a href=&#34;https://www.linkedin.com/in/sandipanpanda&#34;&gt;Sandipan Panda&lt;/a&gt;（&lt;a href=&#34;https://www.devzero.io/&#34;&gt;DevZero&lt;/a&gt;）
有机会采访了 SIG Apps 的主席和技术负责人
&lt;a href=&#34;https://github.com/soltysh&#34;&gt;Maciej Szulik&lt;/a&gt;（&lt;a href=&#34;https://defenseunicorns.com/&#34;&gt;Defense Unicorns&lt;/a&gt;）
以及 &lt;a href=&#34;https://github.com/janetkuo&#34;&gt;Janet Kuo&lt;/a&gt;（&lt;a href=&#34;https://about.google/&#34;&gt;Google&lt;/a&gt;）。
他们分享了在 Kubernetes 生态系统中关于应用管理的经验、挑战以及未来愿景。&lt;/p&gt;
&lt;!--
## Introductions

**Sandipan: Hello, could you start by telling us a bit about yourself, your role, and your journey
within the Kubernetes community that led to your current roles in SIG Apps?**

**Maciej**: Hey, my name is Maciej, and I’m one of the leads for SIG Apps. Aside from this role, you
can also find me helping
[SIG CLI](https://github.com/kubernetes/community/tree/master/sig-cli#readme) and also being one of
the Steering Committee members. I’ve been contributing to Kubernetes since late 2014 in various
areas, including controllers, apiserver, and kubectl.
--&gt;
&lt;h2 id=&#34;自我介绍&#34;&gt;自我介绍&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Sandipan&lt;/strong&gt;：你好，能否先简单介绍一下你自己、你的角色，以及你在
Kubernetes 社区中的经历，这些经历是如何引导你担任 SIG Apps 的当前角色的？&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Maciej&lt;/strong&gt;：嗨，我叫 Maciej，是 SIG Apps 的负责人之一。除了这个角色，
你还可以看到我在协助 &lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-cli#readme&#34;&gt;SIG CLI&lt;/a&gt;
的工作，同时我也是指导委员会的成员之一。自 2014 年底以来，我一直为
Kubernetes 做出贡献，涉及的领域包括控制器、API 服务器以及 kubectl。&lt;/p&gt;
&lt;!--
**Janet**: Certainly! I&#39;m Janet, a Staff Software Engineer at Google, and I&#39;ve been deeply involved
with the Kubernetes project since its early days, even before the 1.0 launch in 2015.  It&#39;s been an
amazing journey!

My current role within the Kubernetes community is one of the chairs and tech leads of SIG Apps. My
journey with SIG Apps started organically. I started with building the Deployment API and adding
rolling update functionalities. I naturally gravitated towards SIG Apps and became increasingly
involved. Over time, I took on more responsibilities, culminating in my current leadership roles.
--&gt;
&lt;p&gt;&lt;strong&gt;Janet&lt;/strong&gt;：当然可以！我是 Janet，在 Google 担任资深软件工程师，
并且从 Kubernetes 项目早期（甚至在 2015 年 1.0 版本发布之前）就深度参与其中。
这是一段非常精彩的旅程！&lt;/p&gt;
&lt;p&gt;我在 Kubernetes 社区中的当前角色是 SIG Apps 的主席和技术负责人之一。
我与 SIG Apps 的结缘始于自然而然的过程。最初，我从构建 Deployment API
并添加滚动更新功能开始，逐渐对 SIG Apps 产生了浓厚的兴趣，并且参与度越来越高。
随着时间推移，我承担了更多的责任，最终走到了目前的领导岗位。&lt;/p&gt;
&lt;!--
## About SIG Apps

*All following answers were jointly provided by Maciej and Janet.*

**Sandipan: For those unfamiliar, could you provide an overview of SIG Apps&#39; mission and objectives?
What key problems does it aim to solve within the Kubernetes ecosystem?**
--&gt;
&lt;h2 id=&#34;关于-sig-apps&#34;&gt;关于 SIG Apps&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;以下所有回答均由 Maciej 和 Janet 共同提供。&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sandipan&lt;/strong&gt;：对于那些不熟悉的人，能否简要介绍一下 SIG Apps 的使命和目标？
它在 Kubernetes 生态系统中旨在解决哪些关键问题？&lt;/p&gt;
&lt;!--
As described in our
[charter](https://github.com/kubernetes/community/blob/master/sig-apps/charter.md#scope), we cover a
broad area related to developing, deploying, and operating applications on Kubernetes. That, in
short, means we’re open to each and everyone showing up at our bi-weekly meetings and discussing the
ups and downs of writing and deploying various applications on Kubernetes.

**Sandipan: What are some of the most significant projects or initiatives currently being undertaken
by SIG Apps?**
--&gt;
&lt;p&gt;正如我们在&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-apps/charter.md#scope&#34;&gt;章程&lt;/a&gt;中所描述的那样，
我们涵盖了与在 Kubernetes 上开发、部署和操作应用程序相关的广泛领域。
简而言之，这意味着我们欢迎每个人参加我们的双周会议，讨论在 Kubernetes
上编写和部署各种应用程序的经验和挑战。&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sandipan&lt;/strong&gt;：SIG Apps 目前正在进行的一些最重要项目或倡议有哪些？&lt;/p&gt;
&lt;!--
At this point in time, the main factors driving the development of our controllers are the
challenges coming from running various AI-related workloads. It’s worth giving credit here to two
working groups we’ve sponsored over the past years:
--&gt;
&lt;p&gt;在当前阶段，推动我们控制器开发的主要因素是运行各种 AI 相关工作负载所带来的挑战。
在此值得一提的是，过去几年我们支持的两个工作组：&lt;/p&gt;
&lt;!--
1. [The Batch Working Group](https://github.com/kubernetes/community/tree/master/wg-batch), which is
   looking at running HPC, AI/ML, and data analytics jobs on top of Kubernetes.
2. [The Serving Working Group](https://github.com/kubernetes/community/tree/master/wg-serving), which
   is focusing on hardware-accelerated AI/ML inference.
--&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/wg-batch&#34;&gt;Batch 工作组&lt;/a&gt;，
该工作组致力于在 Kubernetes 上运行 HPC、AI/ML 和数据分析作业。&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/wg-serving&#34;&gt;Serving 工作组&lt;/a&gt;，
该工作组专注于硬件加速的 AI/ML 推理。&lt;/li&gt;
&lt;/ol&gt;
&lt;!---
## Best practices and challenges

**Sandipan: SIG Apps plays a crucial role in developing application management best practices for
Kubernetes. Can you share some of these best practices and how they help improve application
lifecycle management?**
--&gt;
&lt;h2 id=&#34;最佳实践与挑战&#34;&gt;最佳实践与挑战&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Sandipan&lt;/strong&gt;：SIG Apps 在为 Kubernetes 开发应用程序管理最佳实践方面发挥着关键作用。
你能分享一些这些最佳实践吗？以及它们如何帮助改进应用程序生命周期管理？&lt;/p&gt;
&lt;!--
1. Implementing [health checks and readiness probes](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
ensures that your applications are healthy and ready to serve traffic, leading to improved
reliability and uptime. The above, combined with comprehensive logging, monitoring, and tracing
solutions, will provide insights into your application&#39;s behavior, enabling you to identify and
resolve issues quickly.
--&gt;
&lt;ol&gt;
&lt;li&gt;实施&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/&#34;&gt;健康检查和就绪探针&lt;/a&gt;
确保你的应用程序处于健康状态并准备好处理流量，从而提高可靠性和正常运行时间。
结合全面的日志记录、监控和跟踪解决方案，上述措施将为你提供对应用程序行为的洞察，
使你能够快速识别并解决问题。&lt;/li&gt;
&lt;/ol&gt;
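&lt;p&gt;作为第 1 条的示意，下面是一个同时配置就绪探针和存活探针的最小 Pod 清单（其中镜像名、路径和端口均为假设值，需按你的应用调整）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: app
    image: example.com/your-app:latest   # 假设的镜像
    ports:
    - containerPort: 8080
    readinessProbe:            # 就绪探针：通过后 Pod 才会接收流量
      httpGet:
        path: /ready           # 假设应用暴露了该端点
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:             # 存活探针：连续失败时 kubelet 会重启容器
      httpGet:
        path: /healthz         # 假设应用暴露了该端点
        port: 8080
      periodSeconds: 15
&lt;/code&gt;&lt;/pre&gt;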
&lt;!--
2. [Auto-scale your application](/docs/concepts/workloads/autoscaling/) based
   on resource utilization or custom metrics, optimizing resource usage and ensuring your
   application can handle varying loads.
--&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;根据资源利用率或自定义指标&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/autoscaling/&#34;&gt;自动扩缩你的应用&lt;/a&gt;，
优化资源使用并确保你的应用程序能够处理不同的负载。&lt;/li&gt;
&lt;/ol&gt;
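&lt;p&gt;作为第 2 条的示意，下面是一个基于 CPU 利用率的 HorizontalPodAutoscaler 最小清单（假设集群中已存在名为 web 的 Deployment，阈值仅供参考）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:              # 要扩缩的目标工作负载
    apiVersion: apps/v1
    kind: Deployment
    name: web                  # 假设已存在的 Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # 平均 CPU 利用率目标为 70%
&lt;/code&gt;&lt;/pre&gt;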
&lt;!--
3. Use Deployment for stateless applications, StatefulSet for stateful applications, Job
   and CronJob for batch workloads, and DaemonSet for running a daemon on each node. Use
   Operators and CRDs to extend the Kubernetes API to automate the deployment, management, and
   lifecycle of complex applications, making them easier to operate and reducing manual
   intervention.
--&gt;
&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;对于无状态应用程序使用 Deployment，对于有状态应用程序使用 StatefulSet，
对于批处理工作负载使用 Job 和 CronJob，在每个节点上运行守护进程时使用
DaemonSet。使用 Operator 和 CRD 扩展 Kubernetes API 以自动化复杂应用程序的部署、
管理和生命周期，使其更易于操作并减少手动干预。&lt;/li&gt;
&lt;/ol&gt;
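&lt;p&gt;以第 3 条中的批处理工作负载为例，下面是一个每五分钟运行一次的最小 CronJob 示意清单（镜像与命令均为假设值）：&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-job
spec:
  schedule: &amp;#34;*/5 * * * *&amp;#34;   # 每 5 分钟运行一次
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: busybox:1.36    # 假设的镜像
            command:
            - /bin/sh
            - -c
            - date                 # 假设的任务命令
&lt;/code&gt;&lt;/pre&gt;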
&lt;!--
**Sandipan: What are some of the common challenges SIG Apps faces, and how do you address them?**

The biggest challenge we’re facing all the time is the need to reject a lot of features, ideas, and
improvements. This requires a lot of discipline and patience to be able to explain the reasons
behind those decisions.
--&gt;
&lt;p&gt;&lt;strong&gt;Sandipan&lt;/strong&gt;：SIG Apps 面临的一些常见挑战是什么？你们是如何解决这些问题的？&lt;/p&gt;
&lt;p&gt;我们一直面临的最大挑战是需要拒绝许多功能、想法和改进。这需要大量的纪律性和耐心，
以便能够解释做出这些决定背后的原因。&lt;/p&gt;
&lt;!--
**Sandipan: How has the evolution of Kubernetes influenced the work of SIG Apps? Are there any
recent changes or upcoming features in Kubernetes that you find particularly relevant or beneficial
for SIG Apps?**

The main benefit for both us and the whole community around SIG Apps is the ability to extend
kubernetes with [Custom Resource Definitions](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
and the fact that users can build their own custom controllers leveraging the built-in ones to
achieve whatever sophisticated use cases they might have and we, as the core maintainers, haven’t
considered or weren’t able to efficiently resolve inside Kubernetes.
--&gt;
&lt;p&gt;&lt;strong&gt;Sandipan&lt;/strong&gt;：Kubernetes 的演进如何影响了 SIG Apps 的工作？
Kubernetes 最近是否有任何变化或即将推出的功能，你认为对
SIG Apps 特别相关或有益？&lt;/p&gt;
&lt;p&gt;对我们以及围绕 SIG Apps 的整个社区而言，
最大的好处是能够通过&lt;a href=&#34;https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/&#34;&gt;自定义资源定义（Custom Resource Definitions）&lt;/a&gt;扩展
Kubernetes。用户可以利用内置控制器构建自己的自定义控制器，
以实现他们可能面对的各种复杂用例，而我们作为核心维护者，
可能没有考虑过这些用例，或者无法在 Kubernetes 内部高效解决。&lt;/p&gt;
&lt;!--
## Contributing to SIG Apps

**Sandipan: What opportunities are available for new contributors who want to get involved with SIG
Apps, and what advice would you give them?**
--&gt;
&lt;h2 id=&#34;贡献于-sig-apps&#34;&gt;贡献于 SIG Apps&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Sandipan&lt;/strong&gt;：对于想要参与 SIG Apps 的新贡献者，有哪些机会？
你会给他们什么建议？&lt;/p&gt;
&lt;!--
We get the question, &#34;What good first issue might you recommend we start with?&#34; a lot :-) But
unfortunately, there’s no easy answer to it. We always tell everyone that the best option to start
contributing to core controllers is to find one you are willing to spend some time with. Read
through the code, then try running unit tests and integration tests focusing on that
controller. Once you grasp the general idea, try breaking it and the tests again to verify your
breakage. Once you start feeling confident you understand that particular controller, you may want
to search through open issues affecting that controller and either provide suggestions, explaining
the problem users have, or maybe attempt your first fix.
--&gt;
&lt;p&gt;我们经常被问到：“你们建议我们从哪个 good first issue（适合新手的问题）开始？” :-)
但遗憾的是，这个问题没有简单的答案。我们总是告诉大家，
为核心控制器做贡献的最佳方式是找到一个你愿意花时间研究的控制器。
阅读代码，然后尝试运行针对该控制器的单元测试和集成测试。一旦你掌握了大致的概念，
试着破坏它并再次运行测试以验证你的改动。当你开始有信心理解了这个特定的控制器后，
你可以搜索影响该控制器的待处理问题，提供一些建议，解释用户遇到的问题，
或者尝试提交你的第一个修复。&lt;/p&gt;
&lt;!--
Like we said, there are no shortcuts on that road; you need to spend the time with the codebase to
understand all the edge cases we’ve slowly built up to get to the point where we are. Once you’re
successful with one controller, you’ll need to repeat that same process with others all over again.

**Sandipan: How does SIG Apps gather feedback from the community, and how is this feedback
integrated into your work?**
--&gt;
&lt;p&gt;正如我们所说，在这条道路上没有捷径可走；你需要花时间研究代码库，
以理解我们一路走来逐步积累的所有边缘情况。
一旦你在一个控制器上取得了成功，你就需要在其他控制器上重复同样的过程。&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sandipan&lt;/strong&gt;：SIG Apps 如何从社区收集反馈，以及这些反馈是如何整合到你们的工作中的？&lt;/p&gt;
&lt;!--
We always encourage everyone to show up and present their problems and solutions during our
bi-weekly [meetings](https://github.com/kubernetes/community/tree/master/sig-apps#meetings). As long
as you’re solving an interesting problem on top of Kubernetes and you can provide valuable feedback
about any of the core controllers, we’re always happy to hear from everyone.
--&gt;
&lt;p&gt;我们总是鼓励每个人参加我们的双周&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps#meetings&#34;&gt;会议&lt;/a&gt;，
并在会上提出他们的问题和解决方案。只要你是在 Kubernetes 上解决一个有趣的问题，
并且能够对任何核心控制器提供有价值的反馈，我们都非常乐意听取每个人的意见。&lt;/p&gt;
&lt;!--
## Looking ahead

**Sandipan: Looking ahead, what are the key focus areas or upcoming trends in application management
within Kubernetes that SIG Apps is excited about? How is the SIG adapting to these trends?**

Definitely the current AI hype is the major driving factor; as mentioned above, we have two working
groups, each covering a different aspect of it.
--&gt;
&lt;h2 id=&#34;展望未来&#34;&gt;展望未来&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Sandipan&lt;/strong&gt;：展望未来，Kubernetes 中应用程序管理的关键关注领域或即将到来的趋势有哪些是
SIG Apps 感到兴奋的？SIG 是如何适应这些趋势的？&lt;/p&gt;
&lt;p&gt;当前的 AI 热潮无疑是主要的驱动因素；如上所述，我们有两个工作组，
每个工作组分别覆盖其中不同的方面。&lt;/p&gt;
&lt;!--
**Sandipan: What are some of your favorite things about this SIG?**

Without a doubt, the people that participate in our meetings and on
[Slack](https://kubernetes.slack.com/messages/sig-apps), who tirelessly help triage issues, pull
requests and invest a lot of their time (very frequently their private time) into making kubernetes
great!
--&gt;
&lt;p&gt;&lt;strong&gt;Sandipan&lt;/strong&gt;：关于这个 SIG，你们最喜欢的事情有哪些？&lt;/p&gt;
&lt;p&gt;毫无疑问，参与我们会议和
&lt;a href=&#34;https://kubernetes.slack.com/messages/sig-apps&#34;&gt;Slack&lt;/a&gt; 频道的人们是最让我们感到欣慰的。
他们不知疲倦地帮助对问题和拉取请求进行分类处理，并投入大量时间（很多时候是他们的私人时间）来让
Kubernetes 变得更好！&lt;/p&gt;
&lt;hr&gt;
&lt;!--
SIG Apps is an essential part of the Kubernetes community, helping to shape how applications are
deployed and managed at scale. From its work on improving Kubernetes&#39; workload APIs to driving
innovation in AI/ML application management, SIG Apps is continually adapting to meet the needs of
modern application developers and operators. Whether you’re a new contributor or an experienced
developer, there’s always an opportunity to get involved and make an impact.
--&gt;
&lt;p&gt;SIG Apps 是 Kubernetes 社区的重要组成部分，
帮助塑造了应用程序大规模部署和管理的方式。从改进 Kubernetes
的工作负载 API 到推动 AI/ML 应用程序管理的创新，SIG Apps
不断适应以满足现代应用程序开发者和操作人员的需求。无论你是新贡献者还是有经验的开发者，
都有机会参与其中并产生影响。&lt;/p&gt;
&lt;!--
If you’re interested in learning more or contributing to SIG Apps, be sure to check out their [SIG
README](https://github.com/kubernetes/community/tree/master/sig-apps) and join their bi-weekly [meetings](https://github.com/kubernetes/community/tree/master/sig-apps#meetings).

- [SIG Apps Mailing List](https://groups.google.com/a/kubernetes.io/g/sig-apps)
- [SIG Apps on Slack](https://kubernetes.slack.com/messages/sig-apps)
--&gt;
&lt;p&gt;如果你有兴趣了解更多关于 SIG Apps 的信息或为其做出贡献，务必查看他们的
&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps&#34;&gt;SIG README&lt;/a&gt;，
并加入他们的双周&lt;a href=&#34;https://github.com/kubernetes/community/tree/master/sig-apps#meetings&#34;&gt;会议&lt;/a&gt;。&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://groups.google.com/a/kubernetes.io/g/sig-apps&#34;&gt;SIG Apps 邮件列表&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://kubernetes.slack.com/messages/sig-apps&#34;&gt;Slack 上的 SIG Apps&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>kube-proxy 的 NFTables 模式</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/28/nftables-kube-proxy/</link>
      <pubDate>Fri, 28 Feb 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/28/nftables-kube-proxy/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;NFTables mode for kube-proxy&#34;
date: 2025-02-28
slug: nftables-kube-proxy
author: &gt;
  Dan Winship (Red Hat)
--&gt;
&lt;!--
A new nftables mode for kube-proxy was introduced as an alpha feature
in Kubernetes 1.29. Currently in beta, it is expected to be GA as of
1.33. The new mode fixes long-standing performance problems with the
iptables mode and all users running on systems with reasonably-recent
kernels are encouraged to try it out. (For compatibility reasons, even
once nftables becomes GA, iptables will still be the _default_.)
--&gt;
&lt;p&gt;Kubernetes 1.29 引入了一种新的 Alpha 特性：kube-proxy 的 nftables 模式。
目前该模式处于 Beta 阶段，并预计将在 1.33 版本中达到一般可用（GA）状态。
新模式解决了 iptables 模式长期存在的性能问题，建议所有运行在较新内核版本系统上的用户尝试使用。
出于兼容性原因，即使 nftables 成为 GA 功能，iptables 仍将是&lt;strong&gt;默认&lt;/strong&gt;模式。&lt;/p&gt;
&lt;!--
## Why nftables? Part 1: data plane latency

The iptables API was designed for implementing simple firewalls, and
has problems scaling up to support Service proxying in a large
Kubernetes cluster with tens of thousands of Services.

In general, the ruleset generated by kube-proxy in iptables mode has a
number of iptables rules proportional to the sum of the number of
Services and the total number of endpoints. In particular, at the top
level of the ruleset, there is one rule to test each possible Service
IP (and port) that a packet might be addressed to:
--&gt;
&lt;h2 id=&#34;为什么选择-nftables-第一部分-数据平面延迟&#34;&gt;为什么选择 nftables？第一部分：数据平面延迟&lt;/h2&gt;
&lt;p&gt;iptables API 的设计初衷是实现简单的防火墙，因此在扩展到支持包含数万个 Service
的大型 Kubernetes 集群中的 Service 代理时会遇到问题。&lt;/p&gt;
&lt;p&gt;通常，kube-proxy 在 iptables 模式下生成的规则集中的 iptables 规则数量与
Service 数量和总端点数量的总和成正比。
特别是，在规则集的顶层，针对数据包可能指向的每个可能的 Service IP（以及端口），
都有一条规则用于测试：&lt;/p&gt;
&lt;!--
```
# If the packet is addressed to 172.30.0.41:80, then jump to the chain
# KUBE-SVC-XPGD46QRK7WJZT7O for further processing
-A KUBE-SERVICES -m comment --comment &#34;namespace1/service1:p80 cluster IP&#34; -m tcp -p tcp -d 172.30.0.41 --dport 80 -j KUBE-SVC-XPGD46QRK7WJZT7O

# If the packet is addressed to 172.30.0.42:443, then...
-A KUBE-SERVICES -m comment --comment &#34;namespace2/service2:p443 cluster IP&#34; -m tcp -p tcp -d 172.30.0.42 --dport 443 -j KUBE-SVC-GNZBNJ2PO5MGZ6GT

# etc...
-A KUBE-SERVICES -m comment --comment &#34;namespace3/service3:p80 cluster IP&#34; -m tcp -p tcp -d 172.30.0.43 --dport 80 -j KUBE-SVC-X27LE4BHSL4DOUIK
```
--&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# 如果数据包的目标地址是 172.30.0.41:80，则跳转到 KUBE-SVC-XPGD46QRK7WJZT7O 链进行进一步处理
-A KUBE-SERVICES -m comment --comment &amp;#34;namespace1/service1:p80 cluster IP&amp;#34; -m tcp -p tcp -d 172.30.0.41 --dport 80 -j KUBE-SVC-XPGD46QRK7WJZT7O

# 如果数据包的目标地址是 172.30.0.42:443，则...
-A KUBE-SERVICES -m comment --comment &amp;#34;namespace2/service2:p443 cluster IP&amp;#34; -m tcp -p tcp -d 172.30.0.42 --dport 443 -j KUBE-SVC-GNZBNJ2PO5MGZ6GT

# 等等...
-A KUBE-SERVICES -m comment --comment &amp;#34;namespace3/service3:p80 cluster IP&amp;#34; -m tcp -p tcp -d 172.30.0.43 --dport 80 -j KUBE-SVC-X27LE4BHSL4DOUIK
&lt;/code&gt;&lt;/pre&gt;&lt;!--
This means that when a packet comes in, the time it takes the kernel
to check it against all of the Service rules is **O(n)** in the number
of Services. As the number of Services increases, both the average and
the worst-case latency for the first packet of a new connection
increases (with the difference between best-case, average, and
worst-case being mostly determined by whether a given Service IP
address appears earlier or later in the `KUBE-SERVICES` chain).



&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/28/nftables-kube-proxy/iptables-only.svg&#34;
         alt=&#34;kube-proxy iptables first packet latency, at various percentiles, in clusters of various sizes&#34;/&gt; 
&lt;/figure&gt;

By contrast, with nftables, the normal way to write a ruleset like
this is to have a _single_ rule, using a &#34;verdict map&#34; to do the
dispatch:
--&gt;
&lt;p&gt;这意味着当数据包到达时，内核检查该数据包与所有 Service 规则所需的时间是 &lt;strong&gt;O(n)&lt;/strong&gt;，
其中 n 为 Service 的数量。随着 Service 数量的增加，新连接的第一个数据包的平均延迟和最坏情况下的延迟都会增加
（最佳情况、平均情况和最坏情况之间的差异主要取决于某个 Service IP 地址在 &lt;code&gt;KUBE-SERVICES&lt;/code&gt;
链中出现的顺序是靠前还是靠后）。&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/28/nftables-kube-proxy/iptables-only.svg&#34;
         alt=&#34;kube-proxy iptables 在不同规模集群中各百分位数下的第一个数据包延迟&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;相比之下，使用 nftables，编写此类规则集的常规方法是使用一个单一规则，
并通过“判决映射”（verdict map）来完成分发：&lt;/p&gt;
&lt;!--
```
table ip kube-proxy {

        # The service-ips verdict map indicates the action to take for each matching packet.
	map service-ips {
		type ipv4_addr . inet_proto . inet_service : verdict
		comment &#34;ClusterIP, ExternalIP and LoadBalancer IP traffic&#34;
		elements = { 172.30.0.41 . tcp . 80 : goto service-ULMVA6XW-namespace1/service1/tcp/p80,
                             172.30.0.42 . tcp . 443 : goto service-42NFTM6N-namespace2/service2/tcp/p443,
                             172.30.0.43 . tcp . 80 : goto service-4AT6LBPK-namespace3/service3/tcp/p80,
                             ... }
        }

        # Now we just need a single rule to process all packets matching an
        # element in the map. (This rule says, &#34;construct a tuple from the
        # destination IP address, layer 4 protocol, and destination port; look
        # that tuple up in &#34;service-ips&#34;; and if there&#39;s a match, execute the
        # associated verdict.)
	chain services {
		ip daddr . meta l4proto . th dport vmap @service-ips
	}

        ...
}
```
--&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code class=&#34;language-none&#34; data-lang=&#34;none&#34;&gt;table ip kube-proxy {

  # service-ips 判决映射指示了对每个匹配数据包应采取的操作。
  map service-ips {
    type ipv4_addr . inet_proto . inet_service : verdict
    comment &amp;#34;ClusterIP、ExternalIP 和 LoadBalancer IP 流量&amp;#34;
    elements = { 172.30.0.41 . tcp . 80 : goto service-ULMVA6XW-namespace1/service1/tcp/p80,
                 172.30.0.42 . tcp . 443 : goto service-42NFTM6N-namespace2/service2/tcp/p443,
                 172.30.0.43 . tcp . 80 : goto service-4AT6LBPK-namespace3/service3/tcp/p80,
                 ... }
    }

  # 现在我们只需要一条规则来处理所有与映射中元素匹配的数据包。
  # （此规则表示：&amp;#34;根据目标 IP 地址、第 4 层协议和目标端口构建一个元组；
  # 在 &amp;#39;service-ips&amp;#39; 中查找该元组；如果找到匹配项，则执行与之关联的判决。&amp;#34;）
  chain services {
    ip daddr . meta l4proto . th dport vmap @service-ips
  }

  ...
}
&lt;/code&gt;&lt;/pre&gt;&lt;!--
Since there&#39;s only a single rule, with a roughly **O(1)** map lookup,
packet processing time is more or less constant regardless of cluster
size, and the best/average/worst cases are very similar:



&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/28/nftables-kube-proxy/nftables-only.svg&#34;
         alt=&#34;kube-proxy nftables first packet latency, at various percentiles, in clusters of various sizes&#34;/&gt; 
&lt;/figure&gt;
--&gt;
&lt;p&gt;由于只有一条规则，并且映射查找的时间复杂度大约为 &lt;strong&gt;O(1)&lt;/strong&gt;，因此数据包处理时间几乎与集群规模无关，
并且最佳、平均和最坏情况下的表现非常接近：&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/28/nftables-kube-proxy/nftables-only.svg&#34;
         alt=&#34;kube-proxy nftables 在不同规模集群中各百分位数下的第一个数据包延迟&#34;/&gt; 
&lt;/figure&gt;
&lt;!--
But note the huge difference in the vertical scale between the
iptables and nftables graphs! In the clusters with 5000 and 10,000
Services, the p50 (average) latency for nftables is about the same as
the p01 (approximately best-case) latency for iptables. In the 30,000
Service cluster, the p99 (approximately worst-case) latency for
nftables manages to beat out the p01 latency for iptables by a few
microseconds! Here&#39;s both sets of data together, but you may have to
squint to see the nftables results!:



&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/28/nftables-kube-proxy/iptables-vs-nftables.svg&#34;
         alt=&#34;kube-proxy iptables-vs-nftables first packet latency, at various percentiles, in clusters of various sizes&#34;/&gt; 
&lt;/figure&gt;
--&gt;
&lt;p&gt;但请注意 iptables 图与 nftables 图在纵轴刻度上的巨大差异！
在包含 5000 和 10,000 个 Service 的集群中，nftables 的 p50（平均）延迟与 iptables
的 p01（接近最佳情况）延迟大致相同。
在包含 30,000 个 Service 的集群中，nftables 的 p99（接近最坏情况）延迟比 iptables 的 p01 延迟快了几微秒！
以下是两组数据的对比图，但你可能需要仔细观察才能看到 nftables 的结果！&lt;/p&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/28/nftables-kube-proxy/iptables-vs-nftables.svg&#34;
         alt=&#34;kube-proxy iptables 与 nftables 在不同规模集群中各百分位数下的第一个数据包延迟对比&#34;/&gt; 
&lt;/figure&gt;
&lt;!--
## Why nftables? Part 2: control plane latency

While the improvements to data plane latency in large clusters are
great, there&#39;s another problem with iptables kube-proxy that often
keeps users from even being able to grow their clusters to that size:
the time it takes kube-proxy to program new iptables rules when
Services and their endpoints change.
--&gt;
&lt;h2 id=&#34;为什么选择-nftables-第二部分-控制平面延迟&#34;&gt;为什么选择 nftables？第二部分：控制平面延迟&lt;/h2&gt;
&lt;p&gt;虽然在大型集群中数据平面延迟的改进非常显著，但 iptables 模式的 kube-proxy 还存在另一个问题，
这往往使得用户无法将集群扩展到较大规模：那就是当 Service 及其端点发生变化时，kube-proxy
更新 iptables 规则所需的时间。&lt;/p&gt;
&lt;!--
With both iptables and nftables, the total size of the ruleset as a
whole (actual rules, plus associated data) is **O(n)** in the combined
number of Services and their endpoints. Originally, the iptables
backend would rewrite every rule on every update, and with tens of
thousands of Services, this could grow to be hundreds of thousands of
iptables rules. Starting in Kubernetes 1.26, we began improving
kube-proxy so that it could skip updating _most_ of the unchanged
rules in each update, but the limitations of `iptables-restore` as an
API meant that it was still always necessary to send an update that&#39;s
**O(n)** in the number of Services (though with a noticeably smaller
constant than it used to be). Even with those optimizations, it can
still be necessary to make use of kube-proxy&#39;s `minSyncPeriod` config
option to ensure that it doesn&#39;t spend every waking second trying to
push iptables updates.
--&gt;
&lt;p&gt;对于 iptables 和 nftables，规则集的整体大小（实际规则加上相关数据）与 Service
及其端点的总数呈 &lt;strong&gt;O(n)&lt;/strong&gt; 关系。最初，iptables 后端在每次更新时都会重写所有规则，
当集群中存在数万个 Service 时，这可能导致规则数量增长至数十万条 iptables 规则。
从 Kubernetes 1.26 开始，我们开始优化 kube-proxy，使其能够在每次更新时跳过对大多数未更改规则的更新，
但由于 &lt;code&gt;iptables-restore&lt;/code&gt; API 的限制，仍然需要发送与 Service 数量呈 &lt;strong&gt;O(n)&lt;/strong&gt;
比例的更新（尽管常数因子比以前明显减小）。即使进行了这些优化，有时仍需使用 kube-proxy 的
&lt;code&gt;minSyncPeriod&lt;/code&gt; 配置选项，以确保它不会每秒钟都在尝试推送 iptables 更新。&lt;/p&gt;
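作为示意（具体取值需按集群规模和变更频率调整），可以在 kube-proxy 配置文件中这样设置 &lt;code&gt;minSyncPeriod&lt;/code&gt;：

```yaml
# kube-proxy 配置文件片段（示意）：限制两次规则同步之间的最小间隔，
# 避免在 Service/端点频繁变化时 kube-proxy 持续不断地推送 iptables 更新
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: iptables
iptables:
  minSyncPeriod: 10s
```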
&lt;!--
The nftables APIs allow for doing much more incremental updates, and
when kube-proxy in nftables mode does an update, the size of the
update is only **O(n)** in the number of Services and endpoints that
have changed since the last sync, regardless of the total number of
Services and endpoints. The fact that the nftables API allows each
nftables-using component to have its own private table also means that
there is no global lock contention between components like with
iptables. As a result, kube-proxy&#39;s nftables updates can be done much
more efficiently than with iptables.

(Unfortunately I don&#39;t have cool graphs for this part.)
--&gt;
&lt;p&gt;nftables API 支持更为增量化的更新，当以 nftables 模式运行的 kube-proxy 执行更新时，
更新的规模仅与自上次同步以来发生变化的 Service 和端点数量呈 &lt;strong&gt;O(n)&lt;/strong&gt; 关系，而与总的 Service 和端点数量无关。
此外，由于 nftables API 允许每个使用 nftables 的组件拥有自己的私有表，因此不会像 iptables
那样在组件之间产生全局锁竞争。结果是，kube-proxy 在 nftables 模式下的更新可以比 iptables 模式下高效得多。&lt;/p&gt;
&lt;p&gt;（不幸的是，这部分我没有酷炫的图表。）&lt;/p&gt;
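nftables 的这种增量更新方式可以用下面的示意性 nft 脚本片段来理解（表名、链名与元素均为假设，并非 kube-proxy 实际生成的规则）：

```
# 每个组件拥有自己的私有表，更新互不干扰，也没有全局锁竞争
add table ip example-proxy
add chain ip example-proxy svc-dns
# 以 "目的 IP . 协议 . 端口" 为键的判决映射（verdict map），查找约为 O(1)
add map ip example-proxy service-ips { type ipv4_addr . inet_proto . inet_service : verdict ; }
add element ip example-proxy service-ips { 10.96.0.10 . tcp . 53 : goto svc-dns }
# 某个 Service 变化时，只需删除或添加对应元素，而无需重写整个规则集
delete element ip example-proxy service-ips { 10.96.0.10 . tcp . 53 }
```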
&lt;!--
## Why _not_ nftables? {#why-not-nftables}

All that said, there are a few reasons why you might not want to jump
right into using the nftables backend for now.

First, the code is still fairly new. While it has plenty of unit
tests, performs correctly in our CI system, and has now been used in
the real world by multiple users, it has not seen anything close to as
much real-world usage as the iptables backend has, so we can&#39;t promise
that it is as stable and bug-free.
--&gt;
&lt;h2 id=&#34;why-not-nftables&#34;&gt;不选择 nftables 的理由有哪些？&lt;/h2&gt;
&lt;p&gt;尽管如此，仍有几个原因可能让你目前不希望立即使用 nftables 后端。&lt;/p&gt;
&lt;p&gt;首先，该代码仍然相对较新。虽然它拥有大量的单元测试，在我们的 CI 系统中表现正确，
并且已经在现实世界中被多个用户使用，但其实际使用量远远不及 iptables 后端，
因此我们无法保证它同样稳定且无缺陷。&lt;/p&gt;
&lt;!--
Second, the nftables mode will not work on older Linux distributions;
currently it requires a 5.13 or newer kernel. Additionally, because of
bugs in early versions of the `nft` command line tool, you should not
run kube-proxy in nftables mode on nodes that have an old (earlier
than 1.0.0) version of `nft` in the host filesystem (or else
kube-proxy&#39;s use of nftables may interfere with other uses of nftables
on the system).
--&gt;
&lt;p&gt;其次，nftables 模式无法在较旧的 Linux 发行版上工作；目前它需要 5.13 或更高版本的内核。
此外，由于早期版本的 &lt;code&gt;nft&lt;/code&gt; 命令行工具存在缺陷，如果节点的主机文件系统中装有旧版本（早于 1.0.0）的
&lt;code&gt;nft&lt;/code&gt;，则不应在该节点上以 nftables 模式运行 kube-proxy（否则 kube-proxy
对 nftables 的使用可能会影响系统上其他程序对 nftables 的使用）。&lt;/p&gt;
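可以用如下的简短检查（仅为示意）确认节点是否满足这两个前提：

```shell
# 示意性检查：nftables 模式要求 5.13 或更新的内核
required=5.13
current=$(uname -r | cut -d. -f1-2)
# 利用 sort -V 做版本号比较：若 required 排在最前（或两者相等），则内核满足要求
if [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]; then
  echo "kernel OK ($current >= $required)"
else
  echo "kernel too old ($current < $required)"
fi
# 同时查看主机上 nft 工具的版本（若已安装）；早于 1.0.0 的版本存在已知缺陷
command -v nft >/dev/null 2>&1 && nft --version || echo "nft 未安装"
```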
&lt;!--
Third, you may have other networking components in your cluster, such
as the pod network or NetworkPolicy implementation, that do not yet
support kube-proxy in nftables mode. You should consult the
documentation (or forums, bug tracker, etc.) for any such components
to see if they have problems with nftables mode. (In many cases they
will not; as long as they don&#39;t try to directly interact with or
override kube-proxy&#39;s iptables rules, they shouldn&#39;t care whether
kube-proxy is using iptables or nftables.) Additionally, observability
and monitoring tools that have not been updated may report less data
for kube-proxy in nftables mode than they do for kube-proxy in
iptables mode.
--&gt;
&lt;p&gt;第三，你的集群中可能还存在其他网络组件，例如 Pod 网络或 NetworkPolicy 实现，
这些组件可能尚不支持以 nftables 模式运行的 kube-proxy。你应查阅相关组件的文档（或论坛、问题跟踪系统等），
以确认它们是否与 nftables 模式存在兼容性问题。（在许多情况下，它们并不会受到影响；
只要它们不尝试直接操作或覆盖 kube-proxy 的 iptables 规则，就不在乎 kube-proxy
使用的是 iptables 还是 nftables。）
此外，尚未更新的可观测性和监控工具在 nftables 模式下报告的 kube-proxy
相关数据可能少于其在 iptables 模式下报告的数据。&lt;/p&gt;
&lt;!--
Finally, kube-proxy in nftables mode is intentionally not 100%
compatible with kube-proxy in iptables mode. There are a few old
kube-proxy features whose default behaviors are less secure, less
performant, or less intuitive than we&#39;d like, but where we felt that
changing the default would be a compatibility break. Since the
nftables mode is opt-in, this gave us a chance to fix those bad
defaults without breaking users who weren&#39;t expecting changes. (In
particular, with nftables mode, NodePort Services are now only
reachable on their nodes&#39; default IPs, as opposed to being reachable
on all IPs, including `127.0.0.1`, with iptables mode.) The
[kube-proxy documentation] has more information about this, including
information about metrics you can look at to determine if you are
relying on any of the changed functionality, and what configuration
options are available to get more backward-compatible behavior.

[kube-proxy documentation]: https://kubernetes.io/docs/reference/networking/virtual-ips/#migrating-from-iptables-mode-to-nftables
--&gt;
&lt;p&gt;最后，以 nftables 模式运行的 kube-proxy 有意不与以 iptables 模式运行的 kube-proxy 完全兼容。
有一些较旧的 kube-proxy 功能，默认行为不如我们期望的那样安全、高效或直观，但我们认为更改默认行为会导致兼容性问题。
由于 nftables 模式是可选启用的，这使我们有机会在不破坏那些并未预期行为变化的用户的前提下修复这些不良默认设置。
（特别是，在 nftables 模式下，NodePort 类型的 Service 现在仅在其节点的默认 IP 上可访问，而在 iptables 模式下，
它们在所有 IP 上均可访问，包括 &lt;code&gt;127.0.0.1&lt;/code&gt;。）&lt;a href=&#34;https://kubernetes.io/zh-cn/docs/reference/networking/virtual-ips/#migrating-from-iptables-mode-to-nftables&#34;&gt;kube-proxy 文档&lt;/a&gt; 提供了更多关于此方面的信息，
包括如何通过查看某些指标来判断你是否依赖于任何已更改的特性，以及有哪些配置选项可用于实现更向后兼容的行为。&lt;/p&gt;
&lt;!--
## Trying out nftables mode

Ready to try it out? In Kubernetes 1.31 and later, you just need to
pass `--proxy-mode nftables` to kube-proxy (or set `mode: nftables` in
your kube-proxy config file).

If you are using kubeadm to set up your cluster, the kubeadm
documentation explains [how to pass a `KubeProxyConfiguration` to
`kubeadm init`]. You can also [deploy nftables-based clusters with
`kind`].
--&gt;
&lt;h2 id=&#34;尝试使用-nftables-模式&#34;&gt;尝试使用 nftables 模式&lt;/h2&gt;
&lt;p&gt;准备尝试了吗？在 Kubernetes 1.31 及更高版本中，你只需将 &lt;code&gt;--proxy-mode nftables&lt;/code&gt;
参数传递给 kube-proxy（或在 kube-proxy 配置文件中设置 &lt;code&gt;mode: nftables&lt;/code&gt;）。&lt;/p&gt;
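对应的 kube-proxy 配置文件写法如下（最小示意）：

```yaml
# kube-proxy 配置文件：启用 nftables 模式（Kubernetes 1.31 及更高版本）
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: nftables
```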
&lt;p&gt;如果你使用 kubeadm 部署集群，kubeadm 文档解释了&lt;a href=&#34;https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#config-file&#34;&gt;如何向 &lt;code&gt;kubeadm init&lt;/code&gt; 传递 &lt;code&gt;KubeProxyConfiguration&lt;/code&gt;&lt;/a&gt;。
你还可以&lt;a href=&#34;https://kind.sigs.k8s.io/docs/user/configuration/#kube-proxy-mode&#34;&gt;通过 &lt;code&gt;kind&lt;/code&gt; 部署基于 nftables 的集群&lt;/a&gt;。&lt;/p&gt;
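以 kubeadm 为例，一个传递 &lt;code&gt;KubeProxyConfiguration&lt;/code&gt; 的配置文件大致如下（示意；API 版本请以你所用的 kubeadm 版本为准）：

```yaml
# kubeadm init --config kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
---
# kubeadm 会将这一节写入 kube-proxy 的 ConfigMap
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: nftables
```

kind 则可在其集群配置的 networking 部分设置 kube-proxy 模式（详见上文链接）。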
&lt;!--
You can also convert existing clusters from iptables (or ipvs) mode to
nftables by updating the kube-proxy configuration and restarting the
kube-proxy pods. (You do not need to reboot the nodes: when restarting
in nftables mode, kube-proxy will delete any existing iptables or ipvs
rules, and likewise, if you later revert back to iptables or ipvs
mode, it will delete any existing nftables rules.)

[how to pass a `KubeProxyConfiguration` to `kubeadm init`]: /docs/setup/production-environment/tools/kubeadm/control-plane-flags/#customizing-kube-proxy
[deploy nftables-based clusters with `kind`]: https://kind.sigs.k8s.io/docs/user/configuration/#kube-proxy-mode
--&gt;
&lt;p&gt;你还可以通过更新 kube-proxy 配置并重启 kube-proxy Pod，将现有集群从
iptables（或 ipvs）模式转换为 nftables 模式。（无需重启节点：
在以 nftables 模式重新启动时，kube-proxy 会删除现有的所有 iptables 或 ipvs 规则；
同样，如果你之后切换回 iptables 或 ipvs 模式，它将删除现有的所有 nftables 规则。）&lt;/p&gt;
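对于以 kubeadm 部署、kube-proxy 作为 DaemonSet 运行的集群，一个示意性的迁移流程如下（需在集群管理机上执行）：

```
# 编辑 kube-proxy 的 ConfigMap，将其中的 mode 改为 "nftables"
kubectl -n kube-system edit configmap kube-proxy
# 滚动重启 kube-proxy DaemonSet 使新配置生效（无需重启节点）
kubectl -n kube-system rollout restart daemonset kube-proxy
```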
&lt;!--
## Future plans

As mentioned above, while nftables is now the _best_ kube-proxy mode,
it is not the _default_, and we do not yet have a plan for changing
that. We will continue to support the iptables mode for a long time.

The future of the IPVS mode of kube-proxy is less certain: its main
advantage over iptables was that it was faster, but certain aspects of
the IPVS architecture and APIs were awkward for kube-proxy&#39;s purposes
(for example, the fact that the `kube-ipvs0` device needs to have
_every_ Service IP address assigned to it), and some parts of
Kubernetes Service proxying semantics were difficult to implement
using IPVS (particularly the fact that some Services had to have
different endpoints depending on whether you connected to them from a
local or remote client). And now, the nftables mode has the same
performance as IPVS mode (actually, slightly better), without any of
the downsides:
--&gt;
&lt;h2 id=&#34;未来计划&#34;&gt;未来计划&lt;/h2&gt;
&lt;p&gt;如上所述，虽然 nftables 现在是 kube-proxy 的最佳模式，但它还不是默认模式，
我们目前还没有更改这一设置的计划。我们将继续长期支持 iptables 模式。&lt;/p&gt;
&lt;p&gt;kube-proxy 的 IPVS 模式的未来则不太确定：它相对于 iptables 的主要优势在于速度更快，
但 IPVS 的架构和 API 在某些方面对 kube-proxy 来说不够理想（例如，&lt;code&gt;kube-ipvs0&lt;/code&gt;
设备需要被分配所有 Service IP 地址），
并且 Kubernetes Service 代理的部分语义使用 IPVS 难以实现（特别是某些
Service 根据连接的客户端是本地还是远程，需要有不同的端点）。
现在，nftables 模式的性能与 IPVS 模式相同（实际上略胜一筹），而且没有任何缺点：&lt;/p&gt;
&lt;!--


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/28/nftables-kube-proxy/ipvs-vs-nftables.svg&#34;
         alt=&#34;kube-proxy ipvs-vs-nftables first packet latency, at various percentiles, in clusters of various sizes&#34;/&gt; 
&lt;/figure&gt;

(In theory the IPVS mode also has the advantage of being able to use
various other IPVS functionality, like alternative &#34;schedulers&#34; for
balancing endpoints. In practice, this ended up not being very useful,
because kube-proxy runs independently on every node, and the IPVS
schedulers on each node had no way of sharing their state with the
proxies on other nodes, thus thwarting the effort to balance traffic
more cleverly.)
--&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/28/nftables-kube-proxy/ipvs-vs-nftables.svg&#34;
         alt=&#34;kube-proxy IPVS 与 nftables 在不同规模集群中各百分位数下的第一个数据包延迟对比&#34;/&gt; 
&lt;/figure&gt;
&lt;p&gt;（理论上，IPVS 模式还具有可以使用其他 IPVS 功能的优势，例如使用替代的&amp;quot;调度器&amp;quot;来平衡端点。
但实际上，这并不太有用，因为 kube-proxy 在每个节点上独立运行，每个节点上的 IPVS
调度器无法与其他节点上的代理共享状态，从而无法实现更智能的流量均衡。）&lt;/p&gt;
&lt;!--
While the Kubernetes project does not have an immediate plan to drop
the IPVS backend, it is probably doomed in the long run, and people
who are currently using IPVS mode should try out the nftables mode
instead (and file bugs if you think there is missing functionality in
nftables mode that you can&#39;t work around).
--&gt;
&lt;p&gt;虽然 Kubernetes 项目目前没有立即放弃 IPVS 后端的计划，但从长远来看，IPVS 可能难逃被淘汰的命运。
目前使用 IPVS 模式的用户应尝试使用 nftables 模式（如果发现 nftables 模式中缺少某些无法绕过的功能，
请提交问题报告）。&lt;/p&gt;
&lt;!--
## Learn more

- &#34;[KEP-3866: Add an nftables-based kube-proxy backend]&#34; has the
  history of the new feature.

- &#34;[How the Tables Have Turned: Kubernetes Says Goodbye to IPTables]&#34;,
  from KubeCon/CloudNativeCon North America 2024, talks about porting
  kube-proxy and Calico from iptables to nftables.

- &#34;[From Observability to Performance]&#34;, from KubeCon/CloudNativeCon
  North America 2024. (This is where the kube-proxy latency data came
  from; the [raw data for the charts] is also available.)
--&gt;
&lt;h2 id=&#34;进一步了解&#34;&gt;进一步了解&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&amp;quot;&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/3866-nftables-proxy/README.md&#34;&gt;KEP-3866: Add an nftables-based kube-proxy backend&lt;/a&gt;&amp;quot; 记录了此新特性的历史。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;quot;&lt;a href=&#34;https://youtu.be/yOGHb2HjslY?si=6O4PVJu7fGpReo1U&#34;&gt;How the Tables Have Turned: Kubernetes Says Goodbye to IPTables&lt;/a&gt;&amp;quot;，来自 2024 年
KubeCon/CloudNativeCon 北美大会，讨论了将 kube-proxy 和 Calico 从 iptables 迁移到 nftables 的过程。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;quot;&lt;a href=&#34;https://youtu.be/uYo2O3jbJLk?si=py2AXzMJZ4PuhxNg&#34;&gt;From Observability to Performance&lt;/a&gt;&amp;quot;，同样来自 2024 年 KubeCon/CloudNativeCon 北美大会。
（kube-proxy 延迟数据来源于此；&lt;a href=&#34;https://docs.google.com/spreadsheets/d/1-ryDNc6gZocnMHEXC7mNtqknKSOv5uhXFKDx8Hu3AYA/edit&#34;&gt;图表的原始数据&lt;/a&gt;也可供查阅。）&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

      </description>
    </item>
    
    <item>
      <title>云控制器管理器（Cloud Controller Manager）&#39;鸡与蛋&#39;的问题</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/14/cloud-controller-manager-chicken-egg-problem/</link>
      <pubDate>Fri, 14 Feb 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/14/cloud-controller-manager-chicken-egg-problem/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;The Cloud Controller Manager Chicken and Egg Problem&#34;
date: 2025-02-14
slug: cloud-controller-manager-chicken-egg-problem
author: &gt;
  Antonio Ojea,
  Michael McCune
--&gt;
&lt;!--
Kubernetes 1.31
[completed the largest migration in Kubernetes history][migration-blog], removing the in-tree
cloud provider.  While the component migration is now done, this leaves some additional
complexity for users and installer projects (for example, kOps or Cluster API) .  We will go
over those additional steps and failure points and make recommendations for cluster owners.
This migration was complex and some logic had to be extracted from the core components,
building four new subsystems.
--&gt;
&lt;p&gt;Kubernetes 1.31
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/05/20/completing-cloud-provider-migration/&#34;&gt;完成了 Kubernetes 历史上最大的迁移&lt;/a&gt;，移除了树内云驱动（in-tree cloud provider）。
虽然组件迁移已经完成，但这为用户和安装项目（例如 kOps 或 Cluster API）带来了一些额外的复杂性。
我们将回顾这些额外的步骤和可能的故障点，并为集群所有者提供改进建议。
此次迁移非常复杂，必须从核心组件中提取部分逻辑，构建四个新的子系统。&lt;/p&gt;
&lt;!--
1. **Cloud controller manager** ([KEP-2392][kep2392])
2. **API server network proxy** ([KEP-1281][kep1281])
3. **kubelet credential provider plugins** ([KEP-2133][kep2133])
4. **Storage migration to use [CSI][csi]** ([KEP-625][kep625])

The [cloud controller manager is part of the control plane][ccm]. It is a critical component
that replaces some functionality that existed previously in the kube-controller-manager and the
kubelet.
--&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;云控制器管理器&lt;/strong&gt; (&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-cloud-provider/2392-cloud-controller-manager/README.md&#34;&gt;KEP-2392&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;API 服务器网络代理&lt;/strong&gt; (&lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1281-network-proxy&#34;&gt;KEP-1281&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;kubelet 凭证提供程序插件&lt;/strong&gt; (&lt;a href=&#34;https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2133-kubelet-credential-providers&#34;&gt;KEP-2133&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;存储迁移到使用 &lt;a href=&#34;https://github.com/container-storage-interface/spec?tab=readme-ov-file#container-storage-interface-csi-specification-&#34;&gt;CSI&lt;/a&gt;&lt;/strong&gt; (&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/625-csi-migration/README.md&#34;&gt;KEP-625&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/architecture/cloud-controller/&#34;&gt;云控制器管理器是控制平面的一部分&lt;/a&gt;。它是一个关键组件，取代了之前存在于 kube-controller-manager
和 kubelet 中的部分功能。&lt;/p&gt;
&lt;!--


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/images/docs/components-of-kubernetes.svg&#34;
         alt=&#34;Components of Kubernetes&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;Components of Kubernetes&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
--&gt;


&lt;figure&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/images/components-of-kubernetes.svg&#34;
         alt=&#34;Kubernetes 组件&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;Kubernetes 组件&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
One of the most critical functionalities of the cloud controller manager is the node controller,
which is responsible for the initialization of the nodes.

As you can see in the following diagram, when the **kubelet** starts, it registers the Node
object with the apiserver, Tainting the node so it can be processed first by the
cloud-controller-manager. The initial Node is missing the cloud-provider specific information,
like the Node Addresses and the Labels with the cloud provider specific information like the
Node, Region and Instance type information.
--&gt;
&lt;p&gt;云控制器管理器最重要的功能之一是节点控制器，它负责节点的初始化。&lt;/p&gt;
&lt;p&gt;从下图可以看出，当 &lt;strong&gt;kubelet&lt;/strong&gt; 启动时，它会向 apiserver 注册 Node 对象，并为节点设置污点，
以便云控制器管理器可以先处理该节点。初始的 Node 对象缺少与云提供商相关的信息，
例如节点地址，以及带有云提供商特定信息（如节点、区域和实例类型）的标签。&lt;/p&gt;
&lt;!--


&lt;figure class=&#34;diagram-medium &#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/14/cloud-controller-manager-chicken-egg-problem/ccm-chicken-egg-problem-sequence-diagram.svg&#34;
         alt=&#34;Chicken and egg problem sequence diagram&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;Chicken and egg problem sequence diagram&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
--&gt;


&lt;figure class=&#34;diagram-medium &#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/02/14/cloud-controller-manager-chicken-egg-problem/ccm-chicken-egg-problem-sequence-diagram.svg&#34;
         alt=&#34;鸡和蛋问题时序图&#34;/&gt; &lt;figcaption&gt;
            &lt;p&gt;鸡和蛋问题时序图&lt;/p&gt;
        &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;!--
This new initialization process adds some latency to the node readiness. Previously, the kubelet
was able to initialize the node at the same time it created the node. Since the logic has moved
to the cloud-controller-manager, this can cause a [chicken and egg problem][chicken-and-egg]
during the cluster bootstrapping for those Kubernetes architectures that do not deploy the
controller manager as the other components of the control plane, commonly as static pods,
standalone binaries or daemonsets/deployments with tolerations to the taints and using
`hostNetwork` (more on this below)
--&gt;
&lt;p&gt;这一新的初始化过程会增加节点就绪的延迟。以前，kubelet 能够在创建节点的同时完成节点初始化。
由于这部分逻辑已移至 cloud-controller-manager 中，对于那些不像部署控制平面其他组件那样部署控制器管理器
（控制平面组件通常以静态 Pod、独立二进制文件，或以容忍相关污点并使用 &lt;code&gt;hostNetwork&lt;/code&gt; 的
DaemonSet/Deployment 形式部署）的 Kubernetes 架构而言，
集群引导过程中可能会出现&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/administer-cluster/running-cloud-controller/#chicken-and-egg&#34;&gt;鸡和蛋问题&lt;/a&gt;（更多内容见下文）。&lt;/p&gt;
&lt;!--
## Examples of the dependency problem

As noted above, it is possible during bootstrapping for the cloud-controller-manager to be
unschedulable and as such the cluster will not initialize properly. The following are a few
concrete examples of how this problem can be expressed and the root causes for why they might
occur.

These examples assume you are running your cloud-controller-manager using a Kubernetes resource
(e.g. Deployment, DaemonSet, or similar) to control its lifecycle. Because these methods
rely on Kubernetes to schedule the cloud-controller-manager, care must be taken to ensure it
will schedule properly.
--&gt;
&lt;h2 id=&#34;依赖问题的示例&#34;&gt;依赖问题的示例&lt;/h2&gt;
&lt;p&gt;如上所述，在引导过程中，云控制器管理器可能无法被调度，
因此集群将无法正确初始化。以下几个具体示例说明此问题的可能表现形式及其根本原因。&lt;/p&gt;
&lt;p&gt;这些示例假设你使用 Kubernetes 资源（例如 Deployment、DaemonSet
或类似资源）来控制云控制器管理器的生命周期。由于这些方法依赖于 Kubernetes 来调度云控制器管理器，
因此必须确保其能够正确调度。&lt;/p&gt;
&lt;!--
### Example: Cloud controller manager not scheduling due to uninitialized taint

As [noted in the Kubernetes documentation][kubedocs0], when the kubelet is started with the command line
flag `--cloud-provider=external`, its corresponding `Node` object will have a no schedule taint
named `node.cloudprovider.kubernetes.io/uninitialized` added. Because the cloud-controller-manager
is responsible for removing the no schedule taint, this can create a situation where a
cloud-controller-manager that is being managed by a Kubernetes resource, such as a `Deployment`
or `DaemonSet`, may not be able to schedule.
--&gt;
&lt;h3 id=&#34;示例-由于未初始化的污点导致云控制器管理器无法调度&#34;&gt;示例：由于未初始化的污点导致云控制器管理器无法调度&lt;/h3&gt;
&lt;p&gt;如 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/administer-cluster/running-cloud-controller/#running-cloud-controller-manager&#34;&gt;Kubernetes 文档中所述&lt;/a&gt;，当 kubelet 使用命令行标志 &lt;code&gt;--cloud-provider=external&lt;/code&gt;
启动时，其对应的 &lt;code&gt;Node&lt;/code&gt; 对象将添加一个名为 &lt;code&gt;node.cloudprovider.kubernetes.io/uninitialized&lt;/code&gt;
的不可调度污点。由于云控制器管理器负责移除该不可调度污点，这可能会导致由某个 Kubernetes
资源（例如 &lt;code&gt;Deployment&lt;/code&gt; 或 &lt;code&gt;DaemonSet&lt;/code&gt;）管理的云控制器管理器无法被调度的情况。&lt;/p&gt;
&lt;!--
If the cloud-controller-manager is not able to be scheduled during the initialization of the
control plane, then the resulting `Node` objects will all have the
`node.cloudprovider.kubernetes.io/uninitialized` no schedule taint. It also means that this taint
will not be removed as the cloud-controller-manager is responsible for its removal. If the no
schedule taint is not removed, then critical workloads, such as the container network interface
controllers, will not be able to schedule, and the cluster will be left in an unhealthy state.
--&gt;
&lt;p&gt;如果在控制平面初始化期间云控制器管理器无法被调度，那么生成的 &lt;code&gt;Node&lt;/code&gt; 对象将全部带有
&lt;code&gt;node.cloudprovider.kubernetes.io/uninitialized&lt;/code&gt; 不可调度污点。这也意味着该污点不会被移除，
因为云控制器管理器负责其移除工作。如果不可调度污点未被移除，关键工作负载（例如容器网络接口控制器）
将无法被调度，集群将处于不健康状态。&lt;/p&gt;
&lt;!--
### Example: Cloud controller manager not scheduling due to not-ready taint

The next example would be possible in situations where the container network interface (CNI) is
waiting for IP address information from the cloud-controller-manager (CCM), and the CCM has not
tolerated the taint which would be removed by the CNI.

The [Kubernetes documentation describes][kubedocs1] the `node.kubernetes.io/not-ready` taint as follows:

&gt; &#34;The Node controller detects whether a Node is ready by monitoring its health and adds or removes this taint accordingly.&#34;
--&gt;
&lt;h3 id=&#34;示例-由于未就绪污点导致云控制器管理器无法调度&#34;&gt;示例：由于未就绪污点导致云控制器管理器无法调度&lt;/h3&gt;
&lt;p&gt;下一个示例可能出现在容器网络接口（CNI）正在等待来自云控制器管理器（CCM）的
IP 地址信息，而 CCM 未容忍将由 CNI 移除的污点的情况下。&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/labels-annotations-taints/#node-kubernetes-io-not-ready&#34;&gt;Kubernetes 文档&lt;/a&gt; 对 &lt;code&gt;node.kubernetes.io/not-ready&lt;/code&gt; 污点的描述如下：&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;quot;节点控制器通过监控节点的健康状态来检测节点是否已准备好，并据此添加或移除此污点。&amp;quot;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;!--
One of the conditions that can lead to a Node resource having this taint is when the container
network has not yet been initialized on that node. As the cloud-controller-manager is responsible
for adding the IP addresses to a Node resource, and the IP addresses are needed by the container
network controllers to properly configure the container network, it is possible in some
circumstances for a node to become stuck as not ready and uninitialized permanently.

This situation occurs for a similar reason as the first example, although in this case, the
`node.kubernetes.io/not-ready` taint is used with the no execute effect and thus will cause the
cloud-controller-manager not to run on the node with the taint. If the cloud-controller-manager is
not able to execute, then it will not initialize the node. It will cascade into the container
network controllers not being able to run properly, and the node will end up carrying both the
`node.cloudprovider.kubernetes.io/uninitialized` and `node.kubernetes.io/not-ready` taints,
leaving the cluster in an unhealthy state.
--&gt;
&lt;p&gt;当容器网络尚未在某节点上初始化时，可能导致 Node 资源具有此污点。由于云控制器管理器负责为
Node 资源添加 IP 地址，而容器网络控制器需要这些 IP 地址来正确配置容器网络，因此在某些情况下，
节点可能会永久处于未就绪且未初始化的状态。&lt;/p&gt;
&lt;p&gt;这种情况的发生原因与第一个示例类似，但在此情况下，&lt;code&gt;node.kubernetes.io/not-ready&lt;/code&gt;
污点使用了 NoExecute 效果，从而导致云控制器管理器无法在带有该污点的节点上运行。
如果云控制器管理器无法执行，则它将无法初始化节点。这将进一步导致容器网络控制器无法正常运行，
节点最终会同时携带 &lt;code&gt;node.cloudprovider.kubernetes.io/uninitialized&lt;/code&gt; 和
&lt;code&gt;node.kubernetes.io/not-ready&lt;/code&gt; 两个污点，从而使集群处于不健康状态。&lt;/p&gt;
&lt;!--
## Our Recommendations

There is no one “correct way” to run a cloud-controller-manager. The details will depend on the
specific needs of the cluster administrators and users. When planning your clusters and the
lifecycle of the cloud-controller-managers please consider the following guidance:

For cloud-controller-managers running in the same cluster, they are managing.
--&gt;
&lt;h2 id=&#34;我们的建议&#34;&gt;我们的建议&lt;/h2&gt;
&lt;p&gt;运行云控制器管理器并没有唯一的“正确方式”。具体细节将取决于集群管理员和用户的特定需求。
在规划你的集群以及云控制器管理器的生命周期时，请考虑以下指导。&lt;/p&gt;
&lt;p&gt;对于运行在其自身所管理的集群中的云控制器管理器，适用以下建议。&lt;/p&gt;
&lt;!--
1. Use host network mode, rather than the pod network: in most cases, a cloud controller manager
  will need to communicate with an API service endpoint associated with the infrastructure.
  Setting “hostNetwork” to true will ensure that the cloud controller is using the host
  networking instead of the container network and, as such, will have the same network access as
  the host operating system. It will also remove the dependency on the networking plugin. This
  will ensure that the cloud controller has access to the infrastructure endpoint (always check
  your networking configuration against your infrastructure provider’s instructions).
2. Use a scalable resource type. `Deployments` and `DaemonSets` are useful for controlling the
  lifecycle of a cloud controller. They allow easy access to running multiple copies for redundancy
  as well as using the Kubernetes scheduling to ensure proper placement in the cluster. When using
  these primitives to control the lifecycle of your cloud controllers and running multiple
  replicas, you must remember to enable leader election, or else your controllers will collide
  with each other which could lead to nodes not being initialized in the cluster.
--&gt;
&lt;ol&gt;
&lt;li&gt;使用主机网络模式，而不是 Pod 网络：在大多数情况下，云控制器管理器需要与基础设施相关的 API 服务端点进行通信。
将 &amp;quot;hostNetwork&amp;quot; 设置为 &lt;code&gt;true&lt;/code&gt; 可确保云控制器使用主机网络而非容器网络，从而拥有与主机操作系统相同的网络访问权限。
这还将消除对网络插件的依赖。这可以确保云控制器能够访问基础设施端点
（你应该始终检查网络配置是否与基础设施提供商所给的指导相符）。&lt;/li&gt;
&lt;li&gt;使用规模可扩缩的资源类型。&lt;code&gt;Deployment&lt;/code&gt; 和 &lt;code&gt;DaemonSet&lt;/code&gt; 对于控制云控制器的生命周期非常有用。
它们支持轻松地运行多个副本以实现冗余，并利用 Kubernetes 调度来确保在集群中的正确放置。
当使用这些原语控制云控制器的生命周期并运行多个副本时，请务必启用领导者选举，
否则控制器之间可能会发生冲突，导致集群中的节点无法初始化。&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
3. Target the controller manager containers to the control plane. There might exist other
  controllers which need to run outside the control plane (for example, Azure’s node manager
  controller). Still, the controller managers themselves should be deployed to the control plane.
  Use a node selector or affinity stanza to direct the scheduling of cloud controllers to the
  control plane to ensure that they are running in a protected space. Cloud controllers are vital
  to adding and removing nodes to a cluster as they form a link between Kubernetes and the
  physical infrastructure. Running them on the control plane will help to ensure that they run
  with a similar priority as other core cluster controllers and that they have some separation
  from non-privileged user workloads.
   1. It is worth noting that an anti-affinity stanza to prevent cloud controllers from running
     on the same host is also very useful to ensure that a single node failure will not degrade
     the cloud controller performance.
--&gt;
&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;将控制器管理器容器定位到控制平面。可能存在一些需要在控制平面之外运行的其他控制器
（例如，Azure 的节点管理器控制器），但云控制器管理器本身应部署到控制平面。
使用节点选择算符或亲和性配置将云控制器管理器定向调度到控制平面节点，以确保它们运行在受保护的空间中。
云控制器管理器在集群中添加和移除节点时至关重要，因为它们构成了 Kubernetes 与物理基础设施之间的桥梁。
将它们运行在控制平面上，有助于确保其以与其他核心集群控制器相近的优先级运行，
并与非特权用户工作负载保持一定的隔离。
&lt;ol&gt;
&lt;li&gt;值得注意的是，使用反亲和性配置以防止多个云控制器管理器运行在同一主机上也非常有用，
这可以确保单个节点故障不会影响云控制器管理器的性能。&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
4. Ensure that the tolerations allow operation. Use tolerations on the manifest for the cloud
  controller container to ensure that it will schedule to the correct nodes and that it can run
  in situations where a node is initializing. This means that cloud controllers should tolerate
  the `node.cloudprovider.kubernetes.io/uninitialized` taint, and it should also tolerate any
  taints associated with the control plane (for example, `node-role.kubernetes.io/control-plane`
  or `node-role.kubernetes.io/master`). It can also be useful to tolerate the
  `node.kubernetes.io/not-ready` taint to ensure that the cloud controller can run even when the
  node is not yet available for health monitoring.

For cloud-controller-managers that will not be running on the cluster they manage (for example,
in a hosted control plane on a separate cluster), then the rules are much more constrained by the
dependencies of the environment of the cluster running the cloud-controller-manager. The advice
for running on a self-managed cluster may not be appropriate as the types of conflicts and network
constraints will be different. Please consult the architecture and requirements of your topology
for these scenarios.
--&gt;
&lt;ol start=&#34;4&#34;&gt;
&lt;li&gt;确保污点容忍规则允许操作。在云控制器管理器容器的清单中使用污点容忍规则，以确保其能够被调度到正确的节点，
并能够在节点初始化时运行。这意味着云控制器应容忍 &lt;code&gt;node.cloudprovider.kubernetes.io/uninitialized&lt;/code&gt;
污点，还应容忍与控制平面相关的任何污点（例如，&lt;code&gt;node-role.kubernetes.io/control-plane&lt;/code&gt; 或
&lt;code&gt;node-role.kubernetes.io/master&lt;/code&gt;）。容忍 &lt;code&gt;node.kubernetes.io/not-ready&lt;/code&gt; 污点也可能很有用，
以确保即使节点尚未准备好进行健康监控时，云控制器仍能运行。&lt;/li&gt;
&lt;/ol&gt;
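&lt;p&gt;上述第 4 条中提到的几类污点容忍规则，可以概括为如下清单片段（仅作示意，具体取值请以你的云提供商文档为准）：&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tolerations:
- key: node.cloudprovider.kubernetes.io/uninitialized
  effect: NoSchedule
  operator: Exists
- key: node-role.kubernetes.io/control-plane
  effect: NoSchedule
  operator: Exists
- key: node.kubernetes.io/not-ready
  effect: NoSchedule
  operator: Exists
&lt;/code&gt;&lt;/pre&gt;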
&lt;p&gt;对于不运行在其所管理的集群上的云控制器管理器（例如，运行在另一个集群上的托管控制平面中），
其部署规则将更多地受限于运行云控制器管理器的那个集群环境的依赖项。针对自管理集群的运行建议可能不再适用，
因为冲突类型和网络约束会有所不同。在这些场景下，请参考你所用拓扑的架构和需求。&lt;/p&gt;
&lt;!--
### Example

This is an example of a Kubernetes Deployment highlighting the guidance shown above. It is
important to note that this is for demonstration purposes only, for production uses please
consult your cloud provider’s documentation.
--&gt;
&lt;h3 id=&#34;示例&#34;&gt;示例&lt;/h3&gt;
&lt;p&gt;下面是一个 Kubernetes Deployment 示例，体现了上述指导原则。需要注意的是，
此示例仅用于演示目的，生产环境中使用时请参考你的云提供商的文档。&lt;/p&gt;
&lt;!--
```
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: cloud-controller-manager
  name: cloud-controller-manager
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: cloud-controller-manager
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: cloud-controller-manager
      annotations:
        kubernetes.io/description: Cloud controller manager for my infrastructure
    spec:
      containers: # the container details will depend on your specific cloud controller manager
      - name: cloud-controller-manager
        command:
        - /bin/my-infrastructure-cloud-controller-manager
        - --leader-elect=true
        - -v=1
        image: registry/my-infrastructure-cloud-controller-manager@latest
        resources:
          requests:
            cpu: 200m
            memory: 50Mi
      hostNetwork: true # these Pods are part of the control plane
      nodeSelector:
        node-role.kubernetes.io/control-plane: &#34;&#34;
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: &#34;kubernetes.io/hostname&#34;
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: cloud-controller-manager
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 120
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 120
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        operator: Exists
      - effect: NoSchedule
        key: node.kubernetes.io/not-ready
        operator: Exists
```
--&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;apps/v1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Deployment&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app.kubernetes.io/name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;cloud-controller-manager&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;cloud-controller-manager&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;namespace&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;kube-system&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;replicas&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;2&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;selector&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchLabels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app.kubernetes.io/name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;cloud-controller-manager&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;strategy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;type&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Recreate&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;template&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;metadata&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app.kubernetes.io/name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;cloud-controller-manager&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;annotations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kubernetes.io/description&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Cloud controller manager for my infrastructure&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;    &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;spec&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;containers&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 容器的详细信息将取决于你具体的云控制器管理器&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;cloud-controller-manager&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;command&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- /bin/my-infrastructure-cloud-controller-manager&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- --leader-elect=true&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;- -v=1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;image&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;registry/my-infrastructure-cloud-controller-manager@latest&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;resources&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;requests&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cpu&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;200m&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;memory&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;50Mi&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;hostNetwork&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#080;font-style:italic&#34;&gt;# 这些 Pod 是控制平面的一部分&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;nodeSelector&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;node-role.kubernetes.io/control-plane&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;affinity&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;podAntiAffinity&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;          &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;topologyKey&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;kubernetes.io/hostname&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;            &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;labelSelector&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;              &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;matchLabels&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;                &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;app.kubernetes.io/name&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;cloud-controller-manager&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerations&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;NoSchedule&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node-role.kubernetes.io/master&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Exists&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;NoExecute&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node.kubernetes.io/unreachable&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Exists&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerationSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;120&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;NoExecute&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node.kubernetes.io/not-ready&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Exists&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;tolerationSeconds&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#666&#34;&gt;120&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;NoSchedule&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node.cloudprovider.kubernetes.io/uninitialized&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Exists&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;      &lt;/span&gt;- &lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;effect&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;NoSchedule&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;key&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;node.kubernetes.io/not-ready&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;        &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;operator&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;Exists&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
When deciding how to deploy your cloud controller manager it is worth noting that
cluster-proportional, or resource-based, pod autoscaling is not recommended. Running multiple
replicas of a cloud controller manager is good practice for ensuring high-availability and
redundancy, but does not contribute to better performance. In general, only a single instance
of a cloud controller manager will be reconciling a cluster at any given time.
--&gt;
&lt;p&gt;在决定如何部署云控制器管理器时，需要注意的是，不建议使用与集群规模成比例或基于资源的 Pod
自动扩缩。运行多个云控制器管理器副本是确保高可用性和冗余的良好实践，但并不会带来更好的性能。
通常情况下，任一时刻只有一个云控制器管理器实例在协调集群。&lt;/p&gt;
&lt;!--
[migration-blog]: /blog/2024/05/20/completing-cloud-provider-migration/
[kep2392]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-cloud-provider/2392-cloud-controller-manager/README.md
[kep1281]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1281-network-proxy
[kep2133]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2133-kubelet-credential-providers
[csi]: https://github.com/container-storage-interface/spec?tab=readme-ov-file#container-storage-interface-csi-specification-
[kep625]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/625-csi-migration/README.md
[ccm]: /docs/concepts/architecture/cloud-controller/
[chicken-and-egg]: /docs/tasks/administer-cluster/running-cloud-controller/#chicken-and-egg
[kubedocs0]: /docs/tasks/administer-cluster/running-cloud-controller/#running-cloud-controller-manager
[kubedocs1]: /docs/reference/labels-annotations-taints/#node-kubernetes-io-not-ready
--&gt;

      </description>
    </item>
    
    <item>
      <title>聚焦 SIG Architecture: Enhancements</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/01/21/sig-architecture-enhancements/</link>
      <pubDate>Tue, 21 Jan 2025 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2025/01/21/sig-architecture-enhancements/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Spotlight on SIG Architecture: Enhancements&#34;
slug: sig-architecture-enhancements
canonicalUrl: https://www.kubernetes.dev/blog/2025/01/21/sig-architecture-enhancements
date: 2025-01-21
author: &#34;Frederico Muñoz (SAS Institute)&#34;
--&gt;
&lt;!--
_This is the fourth interview of a SIG Architecture Spotlight series that will cover the different
subprojects, and we will be covering [SIG Architecture:
Enhancements](https://github.com/kubernetes/community/blob/master/sig-architecture/README.md#enhancements)._

In this SIG Architecture spotlight we talked with [Kirsten
Garrison](https://github.com/kikisdeliveryservice), lead of the Enhancements subproject.
--&gt;
&lt;p&gt;&lt;strong&gt;这是 SIG Architecture 聚光灯系列的第四次采访，我们将介绍
&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-architecture/README.md#enhancements&#34;&gt;SIG Architecture: Enhancements&lt;/a&gt;。&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;在本次 SIG Architecture 专题采访中，我们访谈了 Enhancements
子项目的负责人 &lt;a href=&#34;https://github.com/kikisdeliveryservice&#34;&gt;Kirsten Garrison&lt;/a&gt;。&lt;/p&gt;
&lt;!--
## The Enhancements subproject

**Frederico (FSM): Hi Kirsten, very happy to have the opportunity to talk about the Enhancements
subproject. Let&#39;s start with some quick information about yourself and your role.**
--&gt;
&lt;h2 id=&#34;enhancements-子项目&#34;&gt;Enhancements 子项目&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Frederico (FSM)：你好 Kirsten，很高兴有机会讨论 Enhancements
子项目。开始请先介绍一下你自己和所承担的职责。&lt;/strong&gt;&lt;/p&gt;
&lt;!--
**Kirsten Garrison (KG)**: I’m a lead of the Enhancements subproject of SIG-Architecture and
currently work at Google. I first got involved by contributing to the service-catalog project with
the help of [Carolyn Van Slyck](https://github.com/carolynvs). With time, [I joined the Release
team](https://github.com/kubernetes/sig-release/blob/master/releases/release-1.17/release_team.md),
eventually becoming the Enhancements Lead and a Release Lead shadow. While on the release team, I
worked on some ideas to make the process better for the SIGs and Enhancements team (the opt-in
process) based on my team’s experiences. Eventually, I started attending Subproject meetings and
contributing to the Subproject’s work.
--&gt;
&lt;p&gt;&lt;strong&gt;Kirsten Garrison (KG)&lt;/strong&gt;：我是 SIG-Architecture 的 Enhancements 子项目的负责人，目前就职于 Google。
我最初在 &lt;a href=&#34;https://github.com/carolynvs&#34;&gt;Carolyn Van Slyck&lt;/a&gt; 的帮助下，为 service-catalog 项目贡献代码，
后来&lt;a href=&#34;https://github.com/kubernetes/sig-release/blob/master/releases/release-1.17/release_team.md&#34;&gt;加入了 Release 团队&lt;/a&gt;，
最终成为 Enhancements Lead 和 Release Lead 影子。
在发布团队工作期间，我根据团队的经验，为 SIG 和 Enhancements 团队提出了一些改进流程的想法（即选择性加入的 opt-in 流程）。
之后，我开始参加子项目会议，并为这个子项目的工作做贡献。&lt;/p&gt;
&lt;!--
**FSM: You mentioned the Enhancements subproject: how would you describe its main goals and areas of
intervention?**

**KG**: The [Enhancements
Subproject](https://github.com/kubernetes/community/blob/master/sig-architecture/README.md#enhancements)
primarily concerns itself with the [Kubernetes Enhancement
Proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/0000-kep-process/README.md)
(_KEP_ for short)—the &#34;design&#34; documents required for all features and significant changes
to the Kubernetes project.
--&gt;
&lt;p&gt;&lt;strong&gt;FSM：你提到了 Enhancements 子项目，你如何描述它的主要目标和干预范围？&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;：&lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-architecture/README.md#enhancements&#34;&gt;Enhancements 子项目&lt;/a&gt;的核心是管理
&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/0000-kep-process/README.md&#34;&gt;Kubernetes 增强提案（KEP）&lt;/a&gt;，
这是 Kubernetes 项目所有特性和重大变更的“设计”文档。&lt;/p&gt;
&lt;!--
## The KEP and its impact

**FSM: The improvement of the KEP process was (and is) one in which SIG Architecture was heavily
involved. Could you explain the process to those that aren’t aware of it?**
--&gt;
&lt;h2 id=&#34;the-kep-and-its-impact&#34;&gt;KEP 及其影响&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;FSM：KEP 流程的改进一直是 SIG Architecture 深度参与的工作之一。你能为不了解的人介绍一下这个流程吗？&lt;/strong&gt;&lt;/p&gt;
&lt;!--
**KG**: [Every release](https://kubernetes.io/releases/release/#the-release-cycle), the SIGs let the
Release Team know which features they intend to work on to be put into the release. As mentioned
above, the prerequisite for these changes is a KEP - a standardized design document that all authors
must fill out and approve in the first weeks of the release cycle. Most features [will move
through 3
phases](https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#feature-stages):
alpha, beta and finally GA so approving a feature represents a significant commitment for the SIG.
--&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;：在&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/releases/release/#the-release-cycle&#34;&gt;每次发布版本&lt;/a&gt;时，各个
SIG 需要告知 Release Team 各自计划将哪些特性放到当前的版本发布中。
正如前面提到的，所有变更的前提是有一个 KEP，这是一种标准化的设计文档，
所有 KEP 的作者必须在发布周期的最初几周内填写完并获得批准。
大多数特性&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/reference/command-line-tools-reference/feature-gates/#feature-stages&#34;&gt;会经历三个阶段&lt;/a&gt;：
Alpha、Beta，最终进入 GA，因此批准一个特性对 SIG 来说是一项重大承诺。&lt;/p&gt;
&lt;!--
The KEP serves as the full source of truth of a feature. The [KEP
template](https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/README.md)
has different requirements based on what stage a feature is in, but it generally requires a detailed
discussion of the design and the impact as well as providing artifacts of stability and
performance. The KEP takes quite a bit of iterative work between authors, SIG reviewers, api review
team and the Production Readiness Review team[^1] before it is approved. Each set of reviewers is
looking to make sure that the proposal meets their standards in order to have a stable and
performant Kubernetes release. Only after all approvals are secured, can an author go forth and
merge their feature in the Kubernetes code base.
--&gt;
&lt;p&gt;KEP 是某个特性完整、权威的事实来源（source of truth）。
&lt;a href=&#34;https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/README.md&#34;&gt;KEP 模板&lt;/a&gt;
对处于不同阶段的特性具有不同的要求，但通常需要详细讨论其设计、影响，并提供稳定性和性能的证明材料。
KEP 通常会在作者、SIG 审查人员、API 审查团队和 Production Readiness Review 团队&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;之间进行多轮迭代后才能获批。
每组审查者都会确保提案符合其标准，以保证 Kubernetes 版本的稳定性和性能。
只有在所有审批完成后，作者才能将其特性合并到 Kubernetes 代码库。&lt;/p&gt;
&lt;!--
**FSM: I see, quite a bit of additional structure was added. Looking back, what were the most
significant improvements of that approach?**

**KG**: In general, I think that the improvements with the most impact had to do with focusing on
the core intent of the KEP. KEPs exist not just to memorialize designs, but provide a structured way
to discuss and come to an agreement about different facets of the change. At the core of the KEP
process is communication and consideration.
--&gt;
&lt;p&gt;&lt;strong&gt;FSM：我明白了，确实增加了不少额外的结构。回顾来看，你认为这种方法最重要的改进是什么？&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;：总体而言，我认为最有影响力的改进在于聚焦 KEP 的核心意图。
KEP 不仅仅是设计的存档文件，更是提供了一种结构化的方式来讨论和达成共识。
KEP 流程的核心是沟通和审慎考虑。&lt;/p&gt;
&lt;!--
To that end, some of the significant changes revolve around a more detailed and accessible KEP
template. A significant amount of work was put in over time to get the
[k/enhancements](https://github.com/kubernetes/enhancements) repo into its current form -- a
directory structure organized by SIG with the contours of the modern KEP template (with
Proposal/Motivation/Design Details subsections). We might take that basic structure for granted
today, but it really represents the work of many people trying to get the foundation of this process
in place over time.
--&gt;
&lt;p&gt;为此，一些重要的改进围绕着更详细且更易于访问的 KEP 模板展开。
我们投入了大量时间，使 &lt;a href=&#34;https://github.com/kubernetes/enhancements&#34;&gt;k/enhancements&lt;/a&gt;
仓库发展成当前的形式：目录结构按 SIG 划分，并带有现代 KEP 模板的基本轮廓，
其中包含 Proposal/Motivation/Design Details（提案/动机/设计细节）等小节。
我们今天可能认为这种基本结构是理所当然的，但它实际上凝聚了许多人长期的努力，才逐步奠定了这一流程的基础。&lt;/p&gt;
&lt;!--
As Kubernetes matures, we’ve needed to think about more than just the end goal of getting a single
feature merged. We need to think about things like: stability, performance, setting and meeting user
expectations. And as we’ve thought about those things the template has grown more detailed. The
addition of the Production Readiness Review was major as well as the enhanced testing requirements
(varying at different stages of a KEP’s lifecycle).
--&gt;
&lt;p&gt;随着 Kubernetes 的发展和成熟，我们需要考虑的不仅仅是如何合并单个特性，还需要关注稳定性、性能，以及设定并满足用户期望等问题。
因此随着我们的思考深入，KEP 模板变得更详细。例如增加了 Production Readiness Review 机制，同时对测试要求进行了强化
（这些要求会随着 KEP 生命周期的不同阶段动态调整）。&lt;/p&gt;
&lt;!--
## Current areas of focus

**FSM: Speaking of maturing, we’ve [recently released Kubernetes
v1.31](https://kubernetes.io/blog/2024/08/13/kubernetes-v1-31-release/), and work on v1.32 [has
started](https://github.com/fsmunoz/sig-release/tree/release-1.32/releases/release-1.32). Are there
any areas that the Enhancements sub-project is currently addressing that might change the way things
are done?**
--&gt;
&lt;h2 id=&#34;current-areas-of-focus&#34;&gt;当前关注领域  &lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;FSM：说到发展，我们&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/08/13/kubernetes-v1-31-release/&#34;&gt;最近发布了 Kubernetes v1.31&lt;/a&gt;，
而 v1.32 版本的开发工作&lt;a href=&#34;https://github.com/fsmunoz/sig-release/tree/release-1.32/releases/release-1.32&#34;&gt;已经开始&lt;/a&gt;。
Enhancements 子项目目前有哪些领域正在推进以改进这个流程？&lt;/strong&gt;&lt;/p&gt;
&lt;!--
**KG**: We’re currently working on two things:

  1) _Creating a Process KEP template._ Sometimes people want to harness the KEP process for
  significant changes that are more process oriented rather than feature oriented. We want to
  support this because memorializing changes is important and giving people a better tool to do so
  will only encourage more discussion and transparency.
  2) _KEP versioning._ While our template changes aim to be as non-disruptive as possible, we
  believe that it will be easier to track and communicate those changes to the community better with
  a versioned KEP template and the policies that go alongside such versioning.
--&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;：我们目前正在进行两项工作：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;创建一个 Process KEP 模板&lt;/strong&gt;。有时，人们希望使用 KEP 流程来记录重要的流程变更，而不是特性变更。
我们希望支持这一点，因为记录变更很重要，而提供更好的工具将鼓励更多的讨论并提升透明度。&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;KEP 版本化&lt;/strong&gt;。虽然我们的模板变更旨在尽量减少破坏性影响，但我们认为引入 KEP 版本化及相应的策略，
可以让变更更易于追踪并更好地与社区沟通。&lt;/li&gt;
&lt;/ol&gt;
&lt;!--
Both features will take some time to get right and fully roll out (just like a KEP feature) but we
believe that they will both provide improvements that will benefit the community at large.

**FSM: You mentioned improvements: I remember when project boards for Enhancement tracking were
introduced in recent releases, to great effect and unanimous applause from release team members. Was
this a particular area of focus for the subproject?**
--&gt;
&lt;p&gt;这两项改进都需要时间来完善和推广（就像 KEP 特性本身一样），但我们相信它们最终会给社区带来很大的好处。&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FSM：你提到了改进：我记得最近的发布引入了用于 Enhancement 追踪的项目看板（Project Board），
发布团队成员对此表示一致好评。这是 Enhancements 子项目的一个重点方向吗？&lt;/strong&gt;&lt;/p&gt;
&lt;!--
**KG**: The Subproject provided support to the Release Team’s Enhancement team in the migration away
from using the spreadsheet to a project board. The collection and tracking of enhancements has
always been a logistical challenge. During my time on the Release Team, I helped with the transition
to an opt-in system of enhancements, whereby the SIG leads &#34;opt-in&#34; KEPs for release tracking. This
helped to enhance communication between authors and SIGs before any significant work was undertaken
on a KEP and removed toil from the Enhancements team. This change used the existing tools to avoid
introducing too many changes at once to the community. Later, the Release Team approached the
Subproject with an idea of leveraging GitHub Project Boards to further improve the collection
process. This was to be a move away from the use of complicated spreadsheets to using repo-native
labels on [k/enhancement](https://github.com/kubernetes/enhancements) issues and project boards.
--&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;：Enhancements 子项目为 Release Team 的 Enhancement 团队从电子表格迁移到项目看板的工作提供了支持。
增强提案的收集和跟踪在组织协调上一直是一项挑战。在我担任 Release Team 成员期间，我帮助推动了增强提案的“选择加入”机制，
即 SIG 负责人需要主动“选择加入” KEP 进行发布追踪。
这有助于在对 KEP 实施重大工作之前，加强作者与 SIG 之间的沟通，并减少 Enhancements 团队的重复工作。
这一变更利用了现有工具，以避免一次性向社区引入过多变化。
后来，Release Team 向子项目提出了利用 GitHub 项目看板进一步改进收集流程的想法。
这一举措旨在从使用复杂的电子表格转为使用 &lt;a href=&#34;https://github.com/kubernetes/enhancements&#34;&gt;k/enhancement&lt;/a&gt;
Issues 和项目看板上的原生仓库标签。&lt;/p&gt;
&lt;!--
**FSM: That surely adds an impact on simplifying the workflow...**

**KG**: Removing sources of friction and promoting clear communication is very important to the
Enhancements Subproject.  At the same time, it’s important to give careful consideration to
decisions that impact the community as a whole. We want to make sure that changes are balanced to
give an upside and while not causing any regressions and pain in the rollout. We supported the
Release Team in ideation as well as through the actual migration to the project boards. It was a
great success and exciting to see the team make high impact changes that helped everyone involved in
the KEP process!
--&gt;
&lt;p&gt;&lt;strong&gt;FSM：这无疑简化了工作流程...&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;：减少摩擦来源、促进清晰沟通对 Enhancements 子项目至关重要。同时，我们也需要谨慎考虑影响整个社区的决策。
我们希望确保变更既带来好处，又不会在推广过程中造成回归或额外负担。
我们支持 Release Team 进行头脑风暴，并协助完成迁移到项目看板的工作。
这次变更取得了巨大成功，很高兴看到团队做出了高影响力的改进，使参与 KEP 流程的每个人都从中受益！&lt;/p&gt;
&lt;!--
## Getting involved

**FSM: For those reading that might be curious and interested in helping, how would you describe the
required skills for participating in the sub-project?**

**KG**: Familiarity with KEPs either via experience or taking time to look through the
kubernetes/enhancements repo is helpful. All are welcome to participate if interested - we can take
it from there.
--&gt;
&lt;h2 id=&#34;getting-involved&#34;&gt;如何参与  &lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;FSM：如果有人想要参与 Enhancements 子项目，你认为需要具备哪些技能？&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;：熟悉 KEP 机制会有所帮助，无论是来自实际经验，还是花时间浏览
&lt;a href=&#34;https://github.com/kubernetes/enhancements&#34;&gt;kubernetes/enhancements&lt;/a&gt; 仓库。
我们欢迎所有感兴趣的人参与，我们可以一步步引导他们。&lt;/p&gt;
&lt;!--
**FSM: Excellent! Many thanks for your time and insight -- any final comments you would like to
share with our readers?**

**KG**: The Enhancements process is one of the most important parts of Kubernetes and requires
enormous amounts of coordination and collaboration of people and teams across the project to make it
successful. I’m thankful and inspired by everyone’s continued hard work and dedication to making the
project great. This is truly a wonderful community.
--&gt;
&lt;p&gt;&lt;strong&gt;FSM：太棒了！非常感谢你的时间和分享——最后你有什么想对读者们说的吗？&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KG&lt;/strong&gt;：Enhancements 流程是 Kubernetes 最重要的组成部分之一，需要整个项目中众多人员和团队的大量协调与协作才能成功。
我很感激并敬佩大家持续不断的努力工作和奉献，让这个项目越来越好。这真是一个很棒的社区。&lt;/p&gt;
&lt;!--
[^1]: For more information, check the [Production Readiness Review spotlight
    interview](https://kubernetes.io/blog/2023/11/02/sig-architecture-production-readiness-spotlight-2023/)
    in this series.
--&gt;
&lt;div class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34;&gt;
&lt;p&gt;更多信息参考 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2023/11/02/sig-architecture-production-readiness-spotlight-2023/&#34;&gt;Production Readiness Review 专题采访&lt;/a&gt;。&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

      </description>
    </item>
    
    <item>
      <title>使用 API 流式传输来增强 Kubernetes API 服务器效率</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/12/17/kube-apiserver-api-streaming/</link>
      <pubDate>Tue, 17 Dec 2024 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/12/17/kube-apiserver-api-streaming/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#39;Enhancing Kubernetes API Server Efficiency with API Streaming&#39;
date: 2024-12-17
slug: kube-apiserver-api-streaming
author: &gt;
 Stefan Schimanski (Upbound),
 Wojciech Tyczynski (Google),
 Lukasz Szaszkiewicz (Red Hat)
--&gt;
&lt;!--
Managing Kubernetes clusters efficiently is critical, especially as their size is growing. 
A significant challenge with large clusters is the memory overhead caused by **list** requests.
--&gt;
&lt;p&gt;高效管理 Kubernetes 集群至关重要，特别是在集群规模不断增长的情况下更是如此。
大型集群面临的一个重大挑战是 &lt;strong&gt;list&lt;/strong&gt; 请求所造成的内存开销。&lt;/p&gt;
&lt;!--
In the existing implementation, the kube-apiserver processes **list** requests by assembling the entire response in-memory before transmitting any data to the client. 
But what if the response body is substantial, say hundreds of megabytes? Additionally, imagine a scenario where multiple **list** requests flood in simultaneously, perhaps after a brief network outage. 
While [API Priority and Fairness](/docs/concepts/cluster-administration/flow-control) has proven to reasonably protect kube-apiserver from CPU overload, its impact is visibly smaller for memory protection. 
This can be explained by the differing nature of resource consumption by a single API request - the CPU usage at any given time is capped by a constant, whereas memory, being uncompressible, can grow proportionally with the number of processed objects and is unbounded.
This situation poses a genuine risk, potentially overwhelming and crashing any kube-apiserver within seconds due to out-of-memory (OOM) conditions. To better visualize the issue, let&#39;s consider the below graph.
--&gt;
&lt;p&gt;在现有的实现中，kube-apiserver 在处理 &lt;strong&gt;list&lt;/strong&gt; 请求时，先在内存中组装整个响应，再将所有数据传输给客户端。
但如果响应体非常庞大，比如数百兆字节呢？再想象这样一种场景：多个 &lt;strong&gt;list&lt;/strong&gt; 请求同时涌入，比如在一次短暂的网络中断之后。
虽然 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/cluster-administration/flow-control&#34;&gt;API 优先级和公平性&lt;/a&gt;已经证明可以合理地保护
kube-apiserver 免受 CPU 过载，但其对内存保护的影响却明显较弱。这可以解释为各个 API 请求的资源消耗性质有所不同。
在任何给定时间，CPU 使用量都会受到某个常量的限制，而内存由于不可压缩，会随着处理对象数量的增加而成比例增长，且没有上限。
这种情况会带来真正的风险：任何 kube-apiserver 都可能在几秒钟内因内存不足（OOM）而不堪重负并崩溃。
为了更直观地展示这个问题，我们看看下面的图表。&lt;/p&gt;
&lt;!--


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/12/17/kube-apiserver-api-streaming/kube-apiserver-memory_usage.png&#34;
         alt=&#34;Monitoring graph showing kube-apiserver memory usage&#34;/&gt; 
&lt;/figure&gt;
--&gt;


&lt;figure class=&#34;diagram-large clickable-zoom&#34;&gt;
    &lt;img src=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/12/17/kube-apiserver-api-streaming/kube-apiserver-memory_usage.png&#34;
         alt=&#34;显示 kube-apiserver 内存使用量的监控图表&#34;/&gt; 
&lt;/figure&gt;
&lt;!--
The graph shows the memory usage of a kube-apiserver during a synthetic test.
(see the [synthetic test](#the-synthetic-test) section for more details).
The results clearly show that increasing the number of informers significantly boosts the server&#39;s memory consumption. 
Notably, at approximately 16:40, the server crashed when serving only 16 informers.
--&gt;
&lt;p&gt;以上图表显示了 kube-apiserver 在一次模拟测试中的内存使用情况。
（有关更多细节，参见&lt;a href=&#34;#the-synthetic-test&#34;&gt;模拟测试&lt;/a&gt;一节）。
结果清楚地表明，增加 informer 的数量显著提高了服务器的内存消耗量。
值得注意的是，在大约 16:40 时，服务器在仅服务 16 个 informer 时就崩溃了。&lt;/p&gt;
&lt;!--
## Why does kube-apiserver allocate so much memory for list requests?

Our investigation revealed that this substantial memory allocation occurs because the server before sending the first byte to the client must:
* fetch data from the database,
* deserialize the data from its stored format,
* and finally construct the final response by converting and serializing the data into a client requested format
--&gt;
&lt;h2 id=&#34;why-does-kube-apiserver-allocates-so-much-memory-for-list-requests&#34;&gt;为什么 kube-apiserver 为 list 请求分配这么多内存？  &lt;/h2&gt;
&lt;p&gt;我们的调查显示，这种大量内存分配的发生是因为在向客户端发送第一个字节之前，服务器必须：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;从数据库中获取数据&lt;/li&gt;
&lt;li&gt;对数据执行从其存储格式的反序列化&lt;/li&gt;
&lt;li&gt;最后通过将数据转换和序列化为客户端所请求的格式来构造最终的响应&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
This sequence results in significant temporary memory consumption. 
The actual usage depends on many factors like the page size, applied filters (e.g. label selectors), query parameters, and sizes of individual objects. 

Unfortunately, neither [API Priority and Fairness](/docs/concepts/cluster-administration/flow-control) nor Golang&#39;s garbage collection or Golang memory limits can prevent the system from exhausting memory under these conditions. 
The memory is allocated suddenly and rapidly, and just a few requests can quickly deplete the available memory, leading to resource exhaustion.
--&gt;
&lt;p&gt;这个序列导致了显著的临时内存消耗。实际使用量取决于许多因素，
比如分页大小、所施加的过滤器（例如标签选择算符）、查询参数和单个对象的体量。&lt;/p&gt;
&lt;p&gt;不巧的是，无论是 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/cluster-administration/flow-control&#34;&gt;API 优先级和公平性&lt;/a&gt;，
还是 Golang 的垃圾收集或 Golang 的内存限制，都无法在这些状况下防止系统耗尽内存。
内存是被突然且快速分配的，仅仅几个请求就可能迅速耗尽可用内存，导致资源耗尽。&lt;/p&gt;
&lt;!--
Depending on how the API server is run on the node, it might either be killed through OOM by the kernel when exceeding the configured memory limits during these uncontrolled spikes, or if limits are not configured it might have even worse impact on the control plane node.
And worst, after the first API server failure, the same requests will likely hit another control plane node in an HA setup with probably the same impact. 
Potentially a situation that is hard to diagnose and hard to recover from.
--&gt;
&lt;p&gt;取决于 API 服务器在节点上的运行方式，API 服务器可能在这些不受控制的峰值期间因为超过所配置的内存限制而被内核通过 OOM 杀死，
或者如果没有配置限制值，则可能对控制平面节点产生更糟糕的影响。更糟的是，
在第一个 API 服务器发生故障后，相同的请求很可能会命中高可用（HA）部署中的另一个控制平面节点，
并产生同样的影响。这种情况可能难以诊断，也难以恢复。&lt;/p&gt;
&lt;!--
## Streaming list requests

Today, we&#39;re excited to announce a major improvement. 
With the graduation of the _watch list_ feature to beta in Kubernetes 1.32, client-go users can opt-in (after explicitly enabling `WatchListClient` feature gate) 
to streaming lists by switching from **list** to (a special kind of) **watch** requests.
--&gt;
&lt;h2 id=&#34;streaming-list-requests&#34;&gt;流式处理 list 请求  &lt;/h2&gt;
&lt;p&gt;今天，我们很高兴地宣布一项重大改进。随着 Kubernetes 1.32 中 &lt;em&gt;watch list&lt;/em&gt; 特性进阶至 Beta，
client-go 用户可以选择（在显式启用 &lt;code&gt;WatchListClient&lt;/code&gt; 特性门控后）通过将 &lt;strong&gt;list&lt;/strong&gt; 请求切换为（某种特殊类别的）
&lt;strong&gt;watch&lt;/strong&gt; 请求来进行流式处理。&lt;/p&gt;
&lt;!--
**Watch** requests are served from the _watch cache_, an in-memory cache designed to improve scalability of read operations. 
By streaming each item individually instead of returning the entire collection, the new method maintains constant memory overhead. 
The API server is bound by the maximum allowed size of an object in etcd plus a few additional allocations. 
This approach drastically reduces the temporary memory usage compared to traditional **list** requests, ensuring a more efficient and stable system, 
especially in clusters with a large number of objects of a given type or large average object sizes where despite paging memory consumption used to be high.
--&gt;
&lt;p&gt;&lt;strong&gt;watch&lt;/strong&gt; 请求由&lt;strong&gt;监视缓存（watch cache）&lt;/strong&gt;提供服务，这是一个旨在提高读操作可扩展性的内存缓存。
通过逐个流式传输每一项，而不是返回整个集合，这种新方法保持了恒定的内存开销。
API 服务器受限于 etcd 中对象的最大允许体量加上少量额外分配的内存。
与传统的 &lt;strong&gt;list&lt;/strong&gt; 请求相比，这种方法大幅降低了临时内存使用量，确保了系统更高效和更稳定；
对于某类对象数量众多或对象平均体量较大、即使分页内存消耗也一度居高不下的集群，效果尤为明显。&lt;/p&gt;
&lt;!--
Building on the insight gained from the synthetic test (see the [synthetic test](#the-synthetic-test), we developed an automated performance test to systematically evaluate the impact of the _watch list_ feature. 
This test replicates the same scenario, generating a large number of Secrets with a large payload, and scaling the number of informers to simulate heavy **list** request patterns. 
The automated test is executed periodically to monitor memory usage of the server with the feature enabled and disabled.
--&gt;
&lt;p&gt;基于模拟测试所了解的情况（参见&lt;a href=&#34;#the-synthetic-test&#34;&gt;模拟测试&lt;/a&gt;），我们开发了一种自动化的性能测试，
以系统地评估 &lt;em&gt;watch list&lt;/em&gt; 特性的影响。此测试能够重现相同的场景，生成大量载荷较大的 Secret，
并增加 informer 的数量以模拟繁重的 &lt;strong&gt;list&lt;/strong&gt; 请求模式。
这种自动化测试被定期执行，以监控启用和禁用此特性后服务器的内存使用情况。&lt;/p&gt;
&lt;!--
The results showed significant improvements with the _watch list_ feature enabled. 
With the feature turned on, the kube-apiserver’s memory consumption stabilized at approximately **2 GB**. 
By contrast, with the feature disabled, memory usage increased to approximately **20GB**, a **10x** increase! 
These results confirm the effectiveness of the new streaming API, which reduces the temporary memory footprint.
--&gt;
&lt;p&gt;结果表明，启用 &lt;em&gt;watch list&lt;/em&gt; 特性后有显著改善。
启用此特性时，kube-apiserver 的内存消耗稳定在大约 &lt;strong&gt;2 GB&lt;/strong&gt;。
相比之下，禁用此特性时，内存使用量增加到约 &lt;strong&gt;20 GB&lt;/strong&gt;，增长了 &lt;strong&gt;10 倍&lt;/strong&gt;！
这些结果证实了新的流式 API 的有效性，减少了临时内存占用。&lt;/p&gt;
&lt;!--
## Enabling API Streaming for your component

Upgrade to Kubernetes 1.32. Make sure your cluster uses etcd in version 3.4.31+ or 3.5.13+.
Change your client software to use watch lists. If your client code is written in Golang, you&#39;ll want to enable `WatchListClient` for client-go. 
For details on enabling that feature, read [Introducing Feature Gates to Client-Go: Enhancing Flexibility and Control](/blog/2024/08/12/feature-gates-in-client-go).
--&gt;
&lt;h2 id=&#34;enabling-api-streaming-for-your-component&#34;&gt;为你的组件启用 API 流式传输  &lt;/h2&gt;
&lt;p&gt;升级到 Kubernetes 1.32。确保你的集群使用 etcd v3.4.31+ 或 v3.5.13+。将你的客户端软件更改为使用 watch list。
如果你的客户端代码是用 Golang 编写的，你将需要为 client-go 启用 &lt;code&gt;WatchListClient&lt;/code&gt;。有关启用该特性的细节，
参阅&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/08/12/feature-gates-in-client-go&#34;&gt;为 client-go 引入特性门控：增强灵活性和控制&lt;/a&gt;。&lt;/p&gt;
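&lt;p&gt;下面是一个最小的示意（假设你的组件基于 client-go，且如上文所链博客所述支持通过 &lt;code&gt;KUBE_FEATURE_&lt;/code&gt; 前缀的环境变量覆盖特性门控；具体形式请以该博客为准）：&lt;/p&gt;

```shell
# 在启动组件进程之前，通过环境变量启用 client-go 的 WatchListClient 特性门控。
# （KUBE_FEATURE_<门控名> 的环境变量覆盖方式以上文链接的博客文章为准。）
export KUBE_FEATURE_WatchListClient=true

# 随后照常启动你的控制器/组件即可，例如（示意命令）：
# ./my-controller --kubeconfig "$HOME/.kube/config"
echo "KUBE_FEATURE_WatchListClient=$KUBE_FEATURE_WatchListClient"
```

&lt;p&gt;注意，只有升级到使用 Kubernetes 1.32 客户端库并满足上述 etcd 版本要求后，该开关才会生效。&lt;/p&gt;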
&lt;!--
## What&#39;s next?
In Kubernetes 1.32, the feature is enabled in kube-controller-manager by default despite its beta state. 
This will eventually be expanded to other core components like kube-scheduler or kubelet; once the feature becomes generally available, if not earlier.
Other 3rd-party components are encouraged to opt-in to the feature during the beta phase, especially when they are at risk of accessing a large number of resources or kinds with potentially large object sizes.
--&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;接下来  &lt;/h2&gt;
&lt;p&gt;在 Kubernetes 1.32 中，尽管此特性处于 Beta 状态，但在 kube-controller-manager 中默认被启用。
此特性最终将扩展到 kube-scheduler 和 kubelet 等其他核心组件，最迟在此特性正式发布（GA）时，或许更早。
我们鼓励其他第三方组件在 Beta 阶段就选择启用此特性，特别是当它们有可能访问大量资源，或所访问对象的体量可能较大时。&lt;/p&gt;
&lt;!--
For the time being, [API Priority and Fairness](/docs/concepts/cluster-administration/flow-control) assigns a reasonable small cost to **list** requests. 
This is necessary to allow enough parallelism for the average case where **list** requests are cheap enough. 
But it does not match the spiky exceptional situation of many and large objects. 
Once the majority of the Kubernetes ecosystem has switched to _watch list_, the **list** cost estimation can be changed to larger values without risking degraded performance in the average case,
and with that increasing the protection against this kind of requests that can still hit the API server in the future.
--&gt;
&lt;p&gt;目前，&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/cluster-administration/flow-control&#34;&gt;API 优先级和公平性&lt;/a&gt;为
&lt;strong&gt;list&lt;/strong&gt; 请求分配了一个合理的较小成本估算。这是必要的，以便在 &lt;strong&gt;list&lt;/strong&gt; 请求通常开销足够低的情况下允许足够的并行性。
但这并不适用于对象数量众多、体量巨大的峰值异常情形。一旦大多数 Kubernetes 生态体系切换到 &lt;em&gt;watch list&lt;/em&gt;，
就可以将 &lt;strong&gt;list&lt;/strong&gt; 开销估算调整为更大的值，而不必担心在平均情况下出现性能下降，
从而提高对未来可能仍会影响 API 服务器的此类请求的保护。&lt;/p&gt;
&lt;!--
## The synthetic test

In order to reproduce the issue, we conducted a manual test to understand the impact of **list** requests on kube-apiserver memory usage. 
In the test, we created 400 Secrets, each containing 1 MB of data, and used informers to retrieve all Secrets.
--&gt;
&lt;h2 id=&#34;the-synthetic-test&#34;&gt;模拟测试  &lt;/h2&gt;
&lt;p&gt;为了重现此问题，我们实施了手动测试，以了解 &lt;strong&gt;list&lt;/strong&gt; 请求对 kube-apiserver 内存使用量的影响。
在测试中，我们创建了 400 个 Secret，每个 Secret 包含 1 MB 的数据，并使用 informer 检索所有 Secret。&lt;/p&gt;
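&lt;p&gt;作为参考，下面给出一个重现该数据集的假设性 shell 脚本草图（输出目录与 Secret 名称均为示意）：先在本地生成 400 份各含约 1 MB 载荷的 Secret 清单。&lt;/p&gt;

```shell
# 生成 400 份 Secret 清单，每份的 data.payload 约为 1 MB（base64 编码后更大）。
# 目录与命名均为演示用途。
mkdir -p ./secrets
payload=$(head -c 1048576 /dev/zero | base64 | tr -d '\n')
for i in $(seq 1 400); do
  {
    printf 'apiVersion: v1\n'
    printf 'kind: Secret\n'
    printf 'metadata:\n'
    printf '  name: test-secret-%s\n' "$i"
    printf 'type: Opaque\n'
    printf 'data:\n'
    printf '  payload: %s\n' "$payload"
  } > "./secrets/secret-$i.yaml"
done
ls ./secrets | wc -l
```

&lt;p&gt;随后可在测试集群中（假设你具备访问权限）执行 &lt;code&gt;kubectl apply -f ./secrets/&lt;/code&gt; 创建这些 Secret，再启动多个 informer 观察 kube-apiserver 的内存曲线。&lt;/p&gt;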
&lt;!--
The results were alarming, only 16 informers were needed to cause the test server to run out of memory and crash, demonstrating how quickly memory consumption can grow under such conditions.

Special shout out to [@deads2k](https://github.com/deads2k) for his help in shaping this feature.
--&gt;
&lt;p&gt;结果令人担忧：仅需 16 个 informer 就足以导致测试服务器内存耗尽并崩溃，这表明在此类状况下内存消耗会以多快的速度增长。&lt;/p&gt;
&lt;p&gt;特别感谢 &lt;a href=&#34;https://github.com/deads2k&#34;&gt;@deads2k&lt;/a&gt; 在打磨此特性的过程中所提供的帮助。&lt;/p&gt;
&lt;!--
## Kubernetes 1.33 update

Since this feature was started, [Marek Siarkowicz](https://github.com/serathius) integrated a new technology into the
Kubernetes API server: _streaming collection encoding_.
Kubernetes v1.33 introduced two related feature gates, `StreamingCollectionEncodingToJSON` and `StreamingCollectionEncodingToProtobuf`.
These features encode via a stream and avoid allocating all the memory at once.
This functionality is bit-for-bit compatible with existing **list** encodings, produces even greater server-side memory savings, and doesn&#39;t require any changes to client code.
In 1.33, the `WatchList` feature gate is disabled by default.
--&gt;
&lt;h2 id=&#34;kubernetes-1.33-update&#34;&gt;Kubernetes 1.33 更新  &lt;/h2&gt;
&lt;p&gt;自该功能启动以来，&lt;a href=&#34;https://github.com/serathius&#34;&gt;Marek Siarkowicz&lt;/a&gt; 在 Kubernetes API
服务器中加入了一项新技术：&lt;strong&gt;流式集合编码&lt;/strong&gt;。在 Kubernetes v1.33 中，引入了两个相关的特性门控：
&lt;code&gt;StreamingCollectionEncodingToJSON&lt;/code&gt; 和 &lt;code&gt;StreamingCollectionEncodingToProtobuf&lt;/code&gt;。它们通过流的方式进行编码，
避免一次性分配所有内存。该功能与现有的 &lt;strong&gt;list&lt;/strong&gt; 编码实现了比特级完全兼容，不仅能更显著地节省服务器端内存，
而且无需修改任何客户端代码。在 1.33 版本中，&lt;code&gt;WatchList&lt;/code&gt; 特性门控默认是禁用的。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.32 增加了新的 CPU Manager 静态策略选项用于严格 CPU 预留</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/12/16/cpumanager-strict-cpu-reservation/</link>
      <pubDate>Mon, 16 Dec 2024 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/12/16/cpumanager-strict-cpu-reservation/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#39;Kubernetes v1.32 Adds A New CPU Manager Static Policy Option For Strict CPU Reservation&#39;
date: 2024-12-16
slug: cpumanager-strict-cpu-reservation
author: &gt;
  [Jing Zhang](https://github.com/jingczhang) (Nokia)
--&gt;
&lt;!--
In Kubernetes v1.32, after years of community discussion, we are excited to introduce a
`strict-cpu-reservation` option for the [CPU Manager static policy](/docs/tasks/administer-cluster/cpu-management-policies/#static-policy-options).
This feature is currently in alpha, with the associated policy hidden by default. You can only use the
policy if you explicitly enable the alpha behavior in your cluster.
--&gt;
&lt;p&gt;在 Kubernetes v1.32 中，经过社区多年的讨论，我们很高兴地引入了
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/administer-cluster/cpu-management-policies/#static-policy-options&#34;&gt;CPU Manager 静态策略&lt;/a&gt;的
&lt;code&gt;strict-cpu-reservation&lt;/code&gt; 选项。此特性当前处于 Alpha 阶段，默认情况下关联的策略是隐藏的。
只有在你的集群中明确启用了此 Alpha 行为后，才能使用此策略。&lt;/p&gt;
&lt;!--
## Understanding the feature

The CPU Manager static policy is used to reduce latency or improve performance. The `reservedSystemCPUs` defines an explicit CPU set for OS system daemons and kubernetes system daemons. This option is designed for Telco/NFV type use cases where uncontrolled interrupts/timers may impact the workload performance. you can use this option to define the explicit cpuset for the system/kubernetes daemons as well as the interrupts/timers, so the rest CPUs on the system can be used exclusively for workloads, with less impact from uncontrolled interrupts/timers. More details of this parameter can be found on the [Explicitly Reserved CPU List](/docs/tasks/administer-cluster/reserve-compute-resources/#explicitly-reserved-cpu-list) page.

If you want to protect your system daemons and interrupt processing, the obvious way is to use the `reservedSystemCPUs` option.
--&gt;
&lt;h2 id=&#34;理解此特性&#34;&gt;理解此特性&lt;/h2&gt;
&lt;p&gt;CPU Manager 静态策略用于减少延迟或提高性能。&lt;code&gt;reservedSystemCPUs&lt;/code&gt;
定义了一个明确的 CPU 集合，供操作系统守护进程和 Kubernetes 系统守护进程使用。
此选项专为 Telco/NFV 类型的使用场景设计，在这些场景中，不受控制的中断/计时器可能会影响工作负载的性能。
你可以使用此选项为系统/Kubernetes 守护进程以及中断/计时器定义明确的 CPU 集合，
从而使系统上的其余 CPU 可以专用于工作负载，并减少不受控制的中断/计时器带来的影响。
有关此参数的更多详细信息，请参阅
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/administer-cluster/reserve-compute-resources/#explicitly-reserved-cpu-list&#34;&gt;显式预留的 CPU 列表&lt;/a&gt;
页面。&lt;/p&gt;
&lt;p&gt;如果你希望保护系统守护进程和中断处理，显而易见的方法是使用 &lt;code&gt;reservedSystemCPUs&lt;/code&gt; 选项。&lt;/p&gt;
&lt;!--
However, until the Kubernetes v1.32 release, this isolation was only implemented for guaranteed
pods that made requests for a whole number of CPUs. At pod admission time, the kubelet only
compares the CPU _requests_ against the allocatable CPUs. In Kubernetes, limits can be higher than
the requests; the previous implementation allowed burstable and best-effort pods to use up
the capacity of `reservedSystemCPUs`, which could then starve host OS services of CPU - and we
know that people saw this in real life deployments.
The existing behavior also made benchmarking (for both infrastructure and workloads) results inaccurate.

When this new `strict-cpu-reservation` policy option is enabled, the CPU Manager static policy will not allow any workload to use the reserved system CPU cores.
--&gt;
&lt;p&gt;然而，在 Kubernetes v1.32 发布之前，这种隔离仅针对请求整数个 CPU
的 Guaranteed 类型 Pod 实现。在 Pod 准入时，kubelet 仅将 CPU
&lt;strong&gt;请求量&lt;/strong&gt;与可分配的 CPU 进行比较。在 Kubernetes 中，限制值可以高于请求值；
之前的实现允许 Burstable 和 BestEffort 类型的 Pod 使用 &lt;code&gt;reservedSystemCPUs&lt;/code&gt; 的容量，
这可能导致主机操作系统服务缺乏足够的 CPU 资源 —— 并且我们已经知道在实际部署中确实发生过这种情况。
现有的行为还导致基础设施和工作负载的基准测试结果不准确。&lt;/p&gt;
&lt;p&gt;当启用这个新的 &lt;code&gt;strict-cpu-reservation&lt;/code&gt; 策略选项后，CPU Manager
静态策略将不允许任何工作负载使用预留的系统 CPU 核心。&lt;/p&gt;
&lt;!--
## Enabling the feature

To enable this feature, you need to turn on both the `CPUManagerPolicyAlphaOptions` feature gate and the `strict-cpu-reservation` policy option. And you need to remove the `/var/lib/kubelet/cpu_manager_state` file if it exists and restart kubelet.

With the following kubelet configuration:
--&gt;
&lt;h2 id=&#34;启用此特性&#34;&gt;启用此特性&lt;/h2&gt;
&lt;p&gt;要启用此特性，你需要同时开启 &lt;code&gt;CPUManagerPolicyAlphaOptions&lt;/code&gt; 特性门控和
&lt;code&gt;strict-cpu-reservation&lt;/code&gt; 策略选项。并且如果存在 &lt;code&gt;/var/lib/kubelet/cpu_manager_state&lt;/code&gt;
文件，则需要删除该文件并重启 kubelet。&lt;/p&gt;
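&lt;p&gt;上述清理步骤可以用如下 shell 片段示意（假设通过 systemd 管理 kubelet；函数名与可覆盖的路径参数仅为演示而设）：&lt;/p&gt;

```shell
# 删除 CPU Manager 的状态文件（如果存在）；
# 路径可通过第一个参数覆盖，仅为便于演示。
cleanup_cpu_manager_state() {
  state_file="${1:-/var/lib/kubelet/cpu_manager_state}"
  if [ -f "$state_file" ]; then
    rm -f "$state_file"
  fi
}
cleanup_cpu_manager_state
# 之后重启 kubelet 使新配置生效，例如（假设使用 systemd）：
# systemctl restart kubelet
```

&lt;p&gt;删除状态文件是必要的，因为 kubelet 会在启动时校验已保存的 CPU Manager 状态与新配置是否一致。&lt;/p&gt;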
&lt;p&gt;使用以下 kubelet 配置：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yaml&#34; data-lang=&#34;yaml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;kind&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;KubeletConfiguration&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;apiVersion&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;kubelet.config.k8s.io/v1beta1&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;featureGates&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;...&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;CPUManagerPolicyOptions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;CPUManagerPolicyAlphaOptions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#a2f;font-weight:bold&#34;&gt;true&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cpuManagerPolicy&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;static&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;cpuManagerPolicyOptions&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;  &lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;strict-cpu-reservation&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;true&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#008000;font-weight:bold&#34;&gt;reservedSystemCPUs&lt;/span&gt;:&lt;span style=&#34;color:#bbb&#34;&gt; &lt;/span&gt;&lt;span style=&#34;color:#b44&#34;&gt;&amp;#34;0,32,1,33,16,48&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#00f;font-weight:bold&#34;&gt;...&lt;/span&gt;&lt;span style=&#34;color:#bbb&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
When `strict-cpu-reservation` is not set or set to false:
--&gt;
&lt;p&gt;当未设置 &lt;code&gt;strict-cpu-reservation&lt;/code&gt; 或将其设置为 false 时：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-console&#34; data-lang=&#34;console&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;#&lt;/span&gt; cat /var/lib/kubelet/cpu_manager_state
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;{&amp;#34;policyName&amp;#34;:&amp;#34;static&amp;#34;,&amp;#34;defaultCpuSet&amp;#34;:&amp;#34;0-63&amp;#34;,&amp;#34;checksum&amp;#34;:1058907510}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
When `strict-cpu-reservation` is set to true:
--&gt;
&lt;p&gt;当 &lt;code&gt;strict-cpu-reservation&lt;/code&gt; 设置为 true 时：&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-console&#34; data-lang=&#34;console&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#000080;font-weight:bold&#34;&gt;#&lt;/span&gt; cat /var/lib/kubelet/cpu_manager_state
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;{&amp;#34;policyName&amp;#34;:&amp;#34;static&amp;#34;,&amp;#34;defaultCpuSet&amp;#34;:&amp;#34;2-15,17-31,34-47,49-63&amp;#34;,&amp;#34;checksum&amp;#34;:4141502832}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;!--
## Monitoring the feature

You can monitor the feature impact by checking the following CPU Manager counters:
- `cpu_manager_shared_pool_size_millicores`: report shared pool size, in millicores (e.g. 13500m)
- `cpu_manager_exclusive_cpu_allocation_count`: report exclusively allocated cores, counting full cores (e.g. 16)
--&gt;
&lt;h2 id=&#34;监控此特性&#34;&gt;监控此特性&lt;/h2&gt;
&lt;p&gt;你可以通过检查以下 CPU Manager 计数器来监控该特性的影响：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cpu_manager_shared_pool_size_millicores&lt;/code&gt;：报告共享池大小，以毫核为单位（例如 13500m）&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cpu_manager_exclusive_cpu_allocation_count&lt;/code&gt;：报告独占分配的核心数，按完整核心计数（例如 16）&lt;/li&gt;
&lt;/ul&gt;
&lt;!--
Your best-effort workloads may starve if the `cpu_manager_shared_pool_size_millicores` count is zero for prolonged time.

We believe any pod that is required for operational purpose like a log forwarder should not run as best-effort, but you can review and adjust the amount of CPU cores reserved as needed.
--&gt;
&lt;p&gt;如果 &lt;code&gt;cpu_manager_shared_pool_size_millicores&lt;/code&gt; 计数在长时间内为零，
你的 BestEffort 类型工作负载可能会因资源匮乏而受到影响。&lt;/p&gt;
&lt;p&gt;我们认为，任何出于运维目的而必需的 Pod（如日志转发器）都不应以 BestEffort 方式运行，
但你可以根据需要审查并调整预留的 CPU 核心数量。&lt;/p&gt;
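&lt;p&gt;下面给出一个示意性的查询方式（非原文内容；假设你拥有通过 API 服务器访问节点指标端点的权限，节点名 &lt;code&gt;my-node&lt;/code&gt; 仅为占位符）：&lt;/p&gt;

```shell
# 示意：通过 API 服务器代理抓取指定节点的 kubelet 指标，
# 并过滤出上文提到的 CPU Manager 计数器。
# 请将 my-node 替换为你的实际节点名。
kubectl get --raw "/api/v1/nodes/my-node/proxy/metrics" \
  | grep -E 'cpu_manager_(shared_pool_size_millicores|exclusive_cpu_allocation_count)'
```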
&lt;!--
## Conclusion

Strict CPU reservation is critical for Telco/NFV use cases. It is also a prerequisite for enabling the all-in-one type of deployments where workloads are placed on nodes serving combined control+worker+storage roles.

We want you to start using the feature and looking forward to your feedback.
--&gt;
&lt;h2 id=&#34;总结&#34;&gt;总结&lt;/h2&gt;
&lt;p&gt;严格的 CPU 预留对于 Telco/NFV 使用场景至关重要。
它也是实现一体化部署（即把工作负载放置在同时承担控制面、工作和存储角色的节点上）的前提条件。&lt;/p&gt;
&lt;p&gt;我们希望你开始使用该特性，并期待你的反馈。&lt;/p&gt;
&lt;!--
## Further reading

Please check out the [Control CPU Management Policies on the Node](/docs/tasks/administer-cluster/cpu-management-policies/)
task page to learn more about the CPU Manager, and how it fits in relation to the other node-level resource managers.
--&gt;
&lt;h2 id=&#34;进一步阅读&#34;&gt;进一步阅读&lt;/h2&gt;
&lt;p&gt;请查看&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/administer-cluster/cpu-management-policies/&#34;&gt;节点上的控制 CPU 管理策略&lt;/a&gt;任务页面，
以了解更多关于 CPU Manager 的信息，以及它如何与其他节点级资源管理器相关联。&lt;/p&gt;
&lt;!--
## Getting involved

This feature is driven by the [SIG Node](https://github.com/Kubernetes/community/blob/master/sig-node/README.md). If you are interested in helping develop this feature, sharing feedback, or participating in any other ongoing SIG Node projects, please attend the SIG Node meeting for more details.
--&gt;
&lt;h2 id=&#34;参与其中&#34;&gt;参与其中&lt;/h2&gt;
&lt;p&gt;此特性由 &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-node/README.md&#34;&gt;SIG Node&lt;/a&gt;
推动。如果你有兴趣帮助开发此特性、分享反馈或参与任何其他正在进行的 SIG Node 项目，
请参加 SIG Node 会议以获取更多详情。&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Kubernetes v1.32：内存管理器进阶至 GA</title>
      <link>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/12/13/memory-manager-goes-ga/</link>
      <pubDate>Fri, 13 Dec 2024 00:00:00 +0000</pubDate>
      
      <guid>https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/blog/2024/12/13/memory-manager-goes-ga/</guid>
      <description>
        
        
        &lt;!--
layout: blog
title: &#34;Kubernetes v1.32: Memory Manager Goes GA&#34;
date: 2024-12-13
slug: memory-manager-goes-ga
author: &gt;
  [Talor Itzhak](https://github.com/Tal-or) (Red Hat)
--&gt;
&lt;!--
With Kubernetes 1.32, the memory manager has officially graduated to General Availability (GA),
marking a significant milestone in the journey toward efficient and predictable memory allocation for containerized applications.
Since Kubernetes v1.22, where it graduated to beta, the memory manager has proved itself reliable, stable and a good complementary feature for the
[CPU Manager](/docs/tasks/administer-cluster/cpu-management-policies/).
--&gt;
&lt;p&gt;随着 Kubernetes 1.32 的发布，内存管理器已进阶至正式发布（GA），
这标志着在为容器化应用实现高效和可预测的内存分配的旅程中迈出了重要的一步。
内存管理器自 Kubernetes v1.22 进阶至 Beta 后，其可靠性、稳定性已得到证实，
是 &lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/tasks/administer-cluster/cpu-management-policies/&#34;&gt;CPU 管理器&lt;/a&gt;的一个良好补充特性。&lt;/p&gt;
&lt;!--
As part of kubelet&#39;s workload admission process, 
the memory manager provides topology hints 
to optimize memory allocation and alignment. 
This enables users to allocate exclusive
memory for Pods in the [Guaranteed](/docs/concepts/workloads/pods/pod-qos/#guaranteed) QoS class.
More details about the process can be found in the memory manager goes to beta [blog](/blog/2021/08/11/kubernetes-1-22-feature-memory-manager-moves-to-beta/).
--&gt;
&lt;p&gt;作为 kubelet 的工作负载准入过程的一部分，内存管理器提供拓扑提示以优化内存分配和对齐。这使得用户能够为
&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/zh-cn/docs/concepts/workloads/pods/pod-qos/#guaranteed&#34;&gt;Guaranteed&lt;/a&gt; QoS 类的 Pod 分配独占的内存。
有关此过程的细节，参见博客：&lt;a href=&#34;https://deploy-preview-54621--kubernetes-io-main-staging.netlify.app/blog/2021/08/11/kubernetes-1-22-feature-memory-manager-moves-to-beta/&#34;&gt;内存管理器进阶至 Beta&lt;/a&gt;。&lt;/p&gt;
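&lt;p&gt;作为补充示意（非原文内容），下面的 Pod 片段满足 Guaranteed QoS 的条件：每个容器的 CPU 与内存的 requests 与 limits 完全相等。只有这类 Pod（且 CPU 为整数个）才可能从内存管理器获得独占的内存分配：&lt;/p&gt;

```yaml
# 示意性示例：requests 与 limits 完全相等 => Guaranteed QoS
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo   # 假设的名称，仅作演示
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # 任意镜像均可
    resources:
      requests:
        cpu: "2"
        memory: "2Gi"
      limits:
        cpu: "2"
        memory: "2Gi"
```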
&lt;!--
Most of the changes introduced since the Beta are bug fixes, internal refactoring and 
observability improvements, such as metrics and better logging.
--&gt;
&lt;p&gt;自 Beta 以来引入的大部分变更是修复 Bug、内部重构以及改进可观测性（例如优化指标和日志）。&lt;/p&gt;
&lt;!--
## Observability improvements

As part of the effort
to increase the observability of memory manager, new metrics have been added
to provide some statistics on memory allocation patterns.
--&gt;
&lt;h2 id=&#34;observability-improvements&#34;&gt;改进可观测性&lt;/h2&gt;
&lt;p&gt;作为提高内存管理器可观测性工作的一部分，新增了一些指标以提供关于内存分配模式的某些统计信息。&lt;/p&gt;
&lt;!--
* **memory_manager_pinning_requests_total** -
tracks the number of times the pod spec required the memory manager to pin memory pages.

* **memory_manager_pinning_errors_total** - 
tracks the number of times the pod spec required the memory manager 
to pin memory pages, but the allocation failed.
--&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;memory_manager_pinning_requests_total&lt;/strong&gt; -
跟踪 Pod 规约要求内存管理器锁定内存页的次数。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;memory_manager_pinning_errors_total&lt;/strong&gt; -
跟踪 Pod 规约要求内存管理器锁定内存页但分配失败的次数。&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
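&lt;p&gt;作为示意（非原文内容；假设你有权访问节点的指标端点），可以通过下面的命令查看这两个计数器：&lt;/p&gt;

```shell
# 示意：抓取指定节点的 kubelet 指标并过滤内存管理器相关计数器。
# 请将 my-node 替换为你的实际节点名。
kubectl get --raw "/api/v1/nodes/my-node/proxy/metrics" | grep memory_manager_pinning
```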
&lt;!--
## Improving memory manager reliability and consistency

The kubelet does not guarantee pod ordering
when admitting pods after a restart or reboot.

In certain edge cases, this behavior could cause
the memory manager to reject some pods,
and in more extreme cases, it may cause kubelet to fail upon restart.
--&gt;
&lt;h2 id=&#34;improving-memory-manager-reliability-and-consistency&#34;&gt;提高内存管理器可靠性和一致性&lt;/h2&gt;
&lt;p&gt;kubelet 在重启或节点重新引导后重新准入 Pod 时，并不保证 Pod 的准入顺序。&lt;/p&gt;
&lt;p&gt;在某些边缘情况下，这种行为可能导致内存管理器拒绝某些 Pod，
在更极端的情况下，可能导致 kubelet 在重启时失败。&lt;/p&gt;
&lt;!--
Previously, the beta implementation lacked certain checks and logic to prevent 
these issues.

To stabilize the memory manager for general availability (GA) readiness,
small but critical refinements have been
made to the algorithm, improving its robustness and handling of edge cases.
--&gt;
&lt;p&gt;以前，Beta 实现缺乏某些检查和逻辑来防止这些问题的发生。&lt;/p&gt;
&lt;p&gt;为了使内存管理器更为稳定，以便为进阶至正式发布（GA）做好准备，
我们对算法进行了一些小而关键的改进，提高了其稳健性和对边缘场景的处理能力。&lt;/p&gt;
&lt;!--
## Future development

There is more to come for the future of Topology Manager in general,
and memory manager in particular.
Notably, ongoing efforts are underway
to extend [memory manager support to Windows](https://github.com/kubernetes/kubernetes/pull/128560),
enabling CPU and memory affinity on a Windows operating system.
--&gt;
&lt;h2 id=&#34;future-development&#34;&gt;未来发展&lt;/h2&gt;
&lt;p&gt;总体而言，未来对拓扑管理器（Topology Manager），特别是内存管理器，会有更多特性推出。
值得一提的是，目前正在进行的工作是将&lt;a href=&#34;https://github.com/kubernetes/kubernetes/pull/128560&#34;&gt;内存管理器支持扩展到 Windows&lt;/a&gt;，
使得在 Windows 操作系统上实现 CPU 和内存亲和性成为可能。&lt;/p&gt;
&lt;!--
## Getting involved

This feature is driven by the [SIG Node](https://github.com/Kubernetes/community/blob/master/sig-node/README.md) community.
Please join us to connect with the community
and share your ideas and feedback around the above feature and
beyond.
We look forward to hearing from you!
--&gt;
&lt;h2 id=&#34;getting-involved&#34;&gt;参与其中&lt;/h2&gt;
&lt;p&gt;此特性由 &lt;a href=&#34;https://github.com/kubernetes/community/blob/master/sig-node/README.md&#34;&gt;SIG Node&lt;/a&gt;
社区推动。请加入我们，与社区建立联系，分享你对上述特性及其他方面的想法和反馈。
我们期待听到你的声音！&lt;/p&gt;

      </description>
    </item>
    
  </channel>
</rss>
