KubeCon + CloudNativeCon North America 2025#
Thursday, November 13, 2025#
Total Sessions: 101
Badge Pick-Up#
Time: 8:00am EST - 4:00pm EST
Venue: Building B | Level 4 | Registration Hall B, Atlanta, GA, USA
Type: REGISTRATION
Coat + Bag Check#
Time: 8:00am EST - 4:15pm EST
Venue: Building A | Level 4 | A412, Atlanta, GA, USA
Type: REGISTRATION
Description: Please note we are unable to store any items overnight and cameras, laptop equipment or any other electronic devices cannot be stored at any time.
Keynote: Welcome Back + Opening Remarks#
Time: 9:00am EST - 9:05am EST
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Keynote: Beyond Operations: Scaling Platform Engineering in the CNCF Community#
Time: 9:07am EST - 9:23am EST
Speakers: Abby Bangser (Principal Engineer, Syntasso)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Description: Cloud native has always been about more than technology. Each wave of innovation, from early projects reshaping deployment to today’s higher-level platforms, has combined technical progress with shifts in how organisations deliver value. Yet too often these movements narrow into tool-centric quick fixes. Platform engineering sits at these crossroads. On one side it risks becoming another hype cycle, but on the other it offers a way out of today’s fragmented, unsustainable reality. This keynote explores how decades of platform experience paired with the Linux Foundation and CNCF are guiding this evolution through community groups, white papers, and foundational technologies. It will challenge us to see platform engineering not as another operational trend, but as a higher-level abstraction, one that highlights organisational patterns, tackles interoperability, and informs architectural choices to ensure platforms deliver lasting value.
Sponsored Keynote: Cloud Scale Enterprise AI: How Cohere Runs on Open Source with Oracle Cloud#
Time: 9:25am EST - 9:30am EST
Speakers: Aanand Krishnan (Vice President of Products at Oracle Cloud Infrastructure, Oracle); Autumn Moulder (Vice President of Engineering, Cohere)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Description: Open source and Kubernetes are at the heart of modern AI innovation. In this keynote, Cohere and Oracle Cloud Infrastructure (OCI) will share how Cohere builds and serves cutting-edge models on top of Kubernetes leveraging open source technologies and foundational infrastructure and tooling from OCI. Attendees will get an inside look at architectural choices and lessons learned from training and serving massive AI workloads for enterprises in regulated industries worldwide, as well as how open collaboration between cloud providers and AI pioneers is shaping the future of cloud native agentic AI.
Keynote: The Community-Driven Evolution of the Kubernetes Network Driver#
Time: 9:32am EST - 9:47am EST
Speakers: Lionel Jouin (Software Engineer, Red Hat); Antonio Ojea (Staff Software Engineer, Google)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Description: Kubernetes networking is changing significantly, moving beyond traditional technologies to meet the complex and evolving needs of AI and telecommunications applications. This talk delves into the next generation of Kubernetes networking by exploring the creation of a network driver using the Dynamic Resource Allocation (DRA) framework. We aim to simplify complex concepts, highlight the benefits of this flexible and adaptable approach, and offer a practical guide to help you get started with this innovative technology. As active Kubernetes developers in networking and part of the WG Device Management group developing DRA, we have both contributed to modeling the existing API and created DRA Networking drivers, some of which are already used in production. The presenters will provide attendees with practical, field-tested knowledge and best practices derived from their unique position in developing and deploying DRA.
Sponsored Keynote: Scaling Smarter: Simplifying Multicluster AI with KAITO and KubeFleet#
Time: 9:49am EST - 9:54am EST
Speakers: Jorge Palma (Principal PM Lead for Azure Kubernetes Services, Microsoft)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Description: As demand for AI workloads on Kubernetes grows, multicluster inferencing has emerged as a powerful yet complex architectural pattern. While multicluster support offers benefits in terms of geographic redundancy, data sovereignty, and resource optimization, it also introduces significant challenges around orchestration, traffic routing, cost control, and operational overhead. To address these challenges, we’ll introduce two CNCF projects—KAITO and KubeFleet—that work together to simplify and optimize multicluster AI operations. KAITO provides a declarative framework for managing AI inference workflows with built-in support for model versioning and performance telemetry. KubeFleet complements this by enabling seamless workload distribution across clusters based on cost, latency, and availability. Together, these tools reduce operational complexity, improve cost efficiency, and ensure consistent performance at scale.
Keynote: Cloud Native Back to the Future: The Road Ahead#
Time: 9:56am EST - 10:06am EST
Speakers: Jeremy Rickard (Principal Software Engineer, Microsoft); Alex Chircop (Chief Architect, Akamai)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Description: The Cloud Native Computing Foundation (CNCF) turns 10 this year, now home to more than 200 projects across the cloud native landscape. As we look ahead, the community faces new demands around security, sustainability, complexity, and emerging workloads like AI inference and agents. As many areas of the ecosystem transition to mature foundational building blocks, we are excited to explore the next evolution of cloud native development. The TOC will highlight how these challenges open opportunities to shape the next generation of applications and ensure the ecosystem continues to thrive. How are new projects addressing these emerging workloads? How will these new projects impact security hygiene in the ecosystem? How will existing projects adapt to meet new realities? How is the CNCF evolving to support this next generation of computing? Join us as we reflect on the first decade of cloud native—and look ahead to how this community will power the age of AI, intelligent systems, and beyond.
Keynote: Predictive Scaling and Capacity Planning With Machine Learning at Amazon.com#
Time: 10:08am EST - 10:23am EST
Speakers: Artur Souza (Principal Engineer, Amazon); Chunpeng Wang (Senior Applied Scientist, Amazon)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Description: Most cloud native services scale horizontally by reacting to increased incoming traffic throughput, CPU utilization, or another relevant metric. While this approach works most of the time, the reaction time might not be fast enough for high velocity events, resulting in customer-facing errors (like high latency and checkout failures) that impact customer experience. In this session, Chunpeng and Artur will delve into how Prime Day, Diwali, Black Friday and many other high velocity events are handled to keep all services at Amazon.com available. You will learn how we use machine learning and record keeping of future events to proactively scale thousands of services running on top of AWS and then de-scale to keep running at forecasted loads for business-as-usual.
Keynote: Closing Remarks#
Time: 10:25am EST - 10:30am EST
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Coffee Break ☕#
Time: 10:30am EST - 11:00am EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: BREAKS
Relaxation Station#
Time: 10:30am EST - 2:00pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: EXPERIENCES
Description: Take a break from the buzz of the Solutions Showcase and sit back and relax at the Relaxation Station. Enjoy a soothing massage, try your hand at crocheting, or challenge someone to a game of chess. This is the perfect spot to recharge and unwind before diving back into action.
Solutions Showcase#
Time: 10:30am EST - 2:00pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or access sponsored content. You are never required to visit third party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.
Sponsored Demo: KubeHound: Identifying attack paths in Kubernetes clusters at scale#
Time: 10:35am EST - 10:55am EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: In this talk we will introduce how KubeHound, an opinionated, scalable, offensive-minded Kubernetes attack graph tool used by security teams across Datadog, can help you pinpoint the most critical attack paths within your Kubernetes cluster. Single point security findings have little traction for either an attacker or a defender, so we will demonstrate how KubeHound, as a queryable graph database of attack paths, makes reasoning about security problems via data-driven testing of hypotheses extremely efficient. Live demos of KubeHound will be performed during the talk. At the end of the talk, we will leave you with an open-source tool designed to be run from a laptop to evaluate the attack paths within a single cluster from an attacker or defender point of view. Finally, we will discuss the approach and challenges of implementing a distributed, large-scale version of the tool at Datadog and how you might implement a similar solution in your own environment.
Learning Lounge: AI May Be the Lead Singer, But You Still Need the Band#
Time: 10:45am EST - 11:00am EST
Speakers: Clyde Seepersad (SVP & General Manager, Linux Foundation Education)
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Learning Lounge, Atlanta, GA, USA
Type: EXPERIENCES
Description: 10-Minute Tip Talk
Project Demos#
Time: 10:45am EST - 11:50am EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Project Pavilion, Atlanta, GA, USA
Type: PROJECT OPPORTUNITIES
AI-Assisted GitOps With Flux MCP Server#
Time: 11:00am EST - 11:30am EST
Speakers: Stefan Prodan (Principal Engineer, ControlPlane)
Venue: Building B | Level 2 | B206, Atlanta, GA, USA
Type: AI + ML
Description: Managing complex GitOps pipelines across multiple environments often requires deep Kubernetes expertise and time-consuming troubleshooting. What if your team could interact with Flux CD using natural language, getting instant root cause analysis and visualizations of your delivery pipelines? In this session, Stefan will introduce the Flux MCP Server, a new tool in the Flux CD ecosystem that connects AI assistants to Kubernetes clusters. Stefan will demonstrate how to compare deployments across clusters, debug GitOps pipelines end-to-end from Git repos to Flux resources to application logs, and even perform operations using conversational prompts. Stefan will discuss the security implications of using AI assistants in production and how the Flux MCP Server can be configured to prevent harmful operations and sensitive data exposure. The session concludes with the MCP Server roadmap and opportunities for community contribution.
Building Cloud Native Agentic Workflows on Kubernetes for Preventative Healthcare#
Time: 11:00am EST - 11:30am EST
Speakers: Benjamin Consolvo (AMD); Daron Yöndem (AWS)
Venue: Building B | Level 3 | B308-309, Atlanta, GA, USA
Type: AI + ML
Description: Modern preventative-care programs need to reach thousands of patients without drowning clinicians in manual outreach. In this session we show how an open-source, cloud-native stack (including Kubernetes, APISIX, and Prometheus), and the AutoGen multi-agent framework, automates the entire loop: (1) defining U.S. Preventive Services Task Force screening criteria, (2) filtering patient records, and (3) generating personalized emails via OSS LLMs (Llama 3 & DeepSeek-R1) served behind an OpenAI-compatible API on K8s-native AI accelerators. We’ll dissect the YAML and Helm chart flows that keep model endpoints, agents, and the Streamlit front end deployed. Attendees will learn how to: • Build the model inference endpoint orchestration layers with K8s and Helm charts; • Manage API traffic and authentication with APISIX (built on NGINX and etcd); • Stitch agents together with async Python; and • Oversee monitoring and observability with Prometheus and Grafana.
High-Performance AI Workloads in KubeVirt VMs With NVIDIA GPUs: Challenges and Real-World Solutions#
Time: 11:00am EST - 11:30am EST
Speakers: Ezra Silvera (IBM); Michael Hrivnak (Software Architect, Red Hat)
Venue: Building B | Level 4 | B401-402, Atlanta, GA, USA
Type: AI + ML
Description: Running AI/ML workloads in Pods on bare-metal is common for maximizing GPU performance but lacks strong isolation and flexibility. In this talk, we share how we use KubeVirt to run high-performance AI workloads inside VMs with NVIDIA GPUs and NVLink, achieving near bare-metal speeds. This enables multi-tenancy, improved security, and resource partitioning—critical for service providers and cost-efficient for customers. We’ll show how VM-based worker nodes enable virtual Kubernetes clusters on shared infrastructure, supporting both full BM nodes and partitioned node use cases. We’ll also dive into challenges like integrating NVIDIA Fabric Manager with the Kubernetes/KubeVirt workflow, optimizing NUMA and PCI topology, and aligning Kubernetes scheduling with VM-based GPU layouts. Finally, we’ll share customer use cases demonstrating the need for isolated, high-performance AI environments using Kubernetes-native tooling.
Autoscaling Spring Boot Apps in Kubernetes With KEDA#
Time: 11:00am EST - 11:30am EST
Speakers: John Coyne (Distinguished Engineer, Capital One)
Venue: Building B | Level 4 | B406b-407, Atlanta, GA, USA
Type: APPLICATION DEVELOPMENT
Description: The Horizontal Pod Autoscaler (HPA) in Kubernetes is limited to container CPU or memory as the metrics that drive pod scaling. However, the CNCF Graduated project KEDA, short for Kubernetes-based Event Driven Autoscaling, offers a wide array of options. One of these is using metrics exposed by Prometheus, another CNCF Graduated project. Another is Micrometer, an open source metrics facade that integrates nicely with Spring Boot and can expose many different metrics from an application. Adding one additional dependency exposes those metrics in a format that Prometheus understands. Putting all of these together, we can expose any metric we desire from our application and use it to drive the scaling of an application in Kubernetes.
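A minimal sketch of how these pieces can wire together, assuming a hypothetical Spring Boot Deployment named `spring-app` that exposes Micrometer metrics via the Prometheus registry: a KEDA `ScaledObject` with a Prometheus trigger.

```yaml
# Sketch only: names, namespace, and the PromQL query are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: spring-app-scaler
spec:
  scaleTargetRef:
    name: spring-app              # the Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        # Micrometer publishes Spring MVC request counts as
        # http_server_requests_seconds_count once the Prometheus
        # registry dependency is on the classpath.
        query: sum(rate(http_server_requests_seconds_count{app="spring-app"}[2m]))
        threshold: "100"          # target requests/sec per replica
```

The "one additional dependency" referenced in the abstract would typically be `micrometer-registry-prometheus`, which makes Spring Boot Actuator serve metrics in Prometheus exposition format.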
Feature Flags Suck! The Problems With Feature Flagging and How To Avoid Them#
Time: 11:00am EST - 11:30am EST
Speakers: Pete Hodgson (Software Delivery Consultant, PH1)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 2-3, Atlanta, GA, USA
Type: APPLICATION DEVELOPMENT
Description: OK, feature flags are actually pretty awesome, but if you use them wrong you could certainly be forgiven for thinking they suck. After helping many teams along their flag adoption journey, some common pitfalls become clear. In this talk we’ll avoid the pain and suffering by hearing some horror stories which illustrate the most common problems that teams run into and how to avoid them. • Learn about the most common mistake - too many flags! • Marvel at 3 neat tricks to keep the number of flags in check! • Commiserate with the many organizations who roll their own feature flagging solution, then learn how CNCF’s OpenFeature project can save them! • See how to track flag usage and impact using OpenTelemetry!
Spark on Kubernetes, a Practical Guide#
Time: 11:00am EST - 11:30am EST
Speakers: Damon Cortesi (Staff Software Engineer, Airbnb)
Venue: Building B | Level 4 | B405-406a, Atlanta, GA, USA
Type: DATA PROCESSING + STORAGE
Description: If you’re just getting started with Apache Spark on Kubernetes, this talk is for you. Especially if you’re moving on from Spark on YARN, where most components are tightly integrated with Spark. Spark on Kubernetes sounds great in theory, but as soon as you get started there are numerous decisions to be made, including how to submit your jobs (there are now 2 main operators), which shuffle service to use (again at least 2), and which scheduler and queue management system to use (about 4 this time), and that doesn’t even begin to touch how to optimize your storage and where to send logs. Join me on our journey through all of the above, including how we benchmarked Spark on Kubernetes against existing systems using TPC-DS and simulated real-world workloads. This talk is your field guide for getting started with Spark on Kubernetes, built from real-world experimentation and hard-earned lessons.
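For orientation, the simplest submission path skips the operators entirely and talks to the Kubernetes API directly. A hedged sketch, where the API server address, namespace, and image tag are placeholders:

```shell
# Cluster-mode submission straight to the Kubernetes API (no operator).
# All names and versions here are illustrative.
spark-submit \
  --master k8s://https://kubernetes.default.svc:443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=4 \
  --conf spark.kubernetes.namespace=spark-jobs \
  --conf spark.kubernetes.container.image=apache/spark:3.5.1 \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.1.jar
```

The two operators the abstract alludes to (for example, the Kubeflow Spark Operator with its `SparkApplication` CRD) wrap this same flow in a declarative resource, which is usually preferable for production scheduling and retries.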
etcd V3.6 and Beyond + etcd-operator Updates#
Time: 11:00am EST - 11:30am EST
Speakers: Siyuan Zhang (Software Engineer, Google); Justin Santa Barbara (Software Engineer, Google); Wei Fu (Software Engineer, Microsoft); Arka Saha (Software Engineer, VMware by Broadcom); Ivan Valdes Castillo (VP, Engineering, Inmar Intelligence)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: The recently released etcd 3.6 marks a significant milestone, bringing crucial advancements that directly impact the stability, performance, and operational efficiency of Kubernetes. This session will delve into the key features of etcd 3.6, provide an upgrade checklist, and highlight changes users need to make before upgrading to the 3.6 release. We will also discuss the extended support of 3.4 and the roadmap for 3.7, and bring you the latest updates on the etcd-operator. Come join us and raise your etcd questions with the on-site etcd maintainers. Even though etcd 3.6 was announced at KubeCon London, due to its importance, we want to advocate again to make sure Kubernetes users are aware of the changes and well-prepared for the upgrade by providing comprehensive guidance and support. We would also like to encourage contributions to the etcd and etcd-operator projects for further enhancements.
Longhorn: Intro, Deep Dive and Q&A#
Time: 11:00am EST - 11:30am EST
Speakers: Shuo Wu (Staff Software Engineer, SUSE)
Venue: Building C | Level 1 | C111-112, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Longhorn is a cloud-native, distributed block storage solution for Kubernetes that supports persistent volumes via CSI. It is designed for resource efficiency and flexible deployment across public cloud, private cloud, on-premises, and edge environments. Longhorn has seen widespread community adoption, with hundreds of thousands of installations worldwide, due to its high availability, resilience, and feature-rich functionality powered by its general-purpose V1 data engine. To further expand adoption, the V2 data engine is now the main focus. In the latest release, we introduced the UBLK frontend, offline replica rebuilding, and several other key capabilities. In this session, we will share upcoming plans for the V2 engine, including online replica delta chunk-based rebuilding, interruption mode for the SPDK data path to enable V2 on low-resource setups. We’ll also cover roadmap and community updates, including adoption trends, contributions, and community engagement initiatives.
Managing Data at Scale: Best Practices and Evolution of SIG-Apps#
Time: 11:00am EST - 11:30am EST
Speakers: Maciej Szulik (Staff Platform Engineer, Defense Unicorns); Janet Kuo (Staff Software Engineer, Google)
Venue: Building C | Level 3 | Georgia Ballroom 1, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Over the past year, Kubernetes has expanded support for high-volume data workloads through Jobs, while the Workload APIs (StatefulSet, ReplicaSet, PDBs, etc.) have become more mature, consistent, and full-featured. SIG Apps has been hard at work, and there’s even more on the horizon. In this session, the SIG Apps leads will provide an overview of the accomplishments over the past year. They will delve into specific changes that have been implemented and discuss potential directions for further improvements. A significant focus will be on the features requiring community input to reach completion. The session will conclude with an open discussion and Q&A, offering attendees insights into contributing to SIG Apps and becoming part of its ongoing evolution.
In-Place Pod Resize in Kubernetes: Dynamic Resource Management Without Restarts#
Time: 11:00am EST - 11:30am EST
Speakers: Tim Allclair (Software Engineer, Google); Mofi Rahman (Software Engineer, Google)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 4, Atlanta, GA, USA
Type: OPERATIONS + PERFORMANCE
Description: Adjusting resource requirements in Kubernetes traditionally meant disruptive Pod restarts. This session introduces the In-Place Pod Resize feature, which enables dynamic CPU and memory adjustments for running Pods. We’ll explore how this feature allows on-the-fly resource changes with minimal disruption to reduce downtime for stateful applications, batch jobs, and other restart-sensitive workloads. Attendees will learn about: - The core concepts and mechanics of in-place pod resizing - Practical use cases: when, why and how to use in-place resizing - The current limitations and the future roadmap Join us to discover how to leverage this powerful feature for more resilient and efficient Kubernetes deployments.
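A sketch of what this looks like in practice, assuming Kubernetes 1.33+ (where in-place resize is on by default); the pod and container names are hypothetical:

```yaml
# Per-resource resize policy controls whether a change restarts the container.
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo
spec:
  containers:
    - name: app
      image: nginx:1.27
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired      # CPU can change in place
        - resourceName: memory
          restartPolicy: RestartContainer # memory change restarts this container
      resources:
        requests: { cpu: 500m, memory: 256Mi }
        limits:   { cpu: "1",  memory: 512Mi }
```

In recent releases the resize is applied through the dedicated `resize` subresource, for example: `kubectl patch pod resize-demo --subresource resize --patch '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"750m"}}}]}}'`.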
Economics of Platforms: Building Marketplaces Beyond Golden Paths#
Time: 11:00am EST - 11:30am EST
Speakers: Atulpriya Sharma (Sr. Developer Advocate | CNCF Ambassador, InfraCloud Technologies)
Venue: Building B | Level 3 | B312-314, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: You’ve built golden paths, achieved adoption, but now face the scaling bottleneck: every new capability requires your team to build, maintain, and support it. What if instead of being the sole provider, you became the marketplace operator? This talk introduces the Internal Developer Marketplace model, which transforms platforms from centralised services into economic ecosystems where any team contributes capabilities. We’ll explore how organisations can evolve from paved paths to community-driven platforms where engineering capabilities become tradeable assets. Through practical examples, we’ll learn about contribution frameworks that turn domain expertise into platform capabilities, governance models that maintain quality without gatekeeping, and recognition systems that incentivise meaningful contributions. The result? Platform engineering that scales beyond your team’s capacity, leverages distributed expertise, and creates sustainable growth through network effects.
The Journey of Deploying Backstage in a Large Organization#
Time: 11:00am EST - 11:30am EST
Speakers: Mathieu Girard (Software Engineering Manager, Beneva); Teddy Poingt (Software Engineering Manager, Beneva)
Venue: Building B | Level 3 | B304-305, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Being a developer in 2025 means a lot more than it used to. The cognitive load and the number of development-related activities other than coding that need to be taken into consideration keep growing every day (CI/CD, infra-as-code, cloud, finops, networking, monitoring, containers, security, etc.). Internal developer portals are meant to help with this issue but can be challenging to deploy successfully in a large organization. At Beneva, we’ve been running Backstage for almost three years now and our developers are grateful. In our talk, we’ll take a deep dive into our journey, from the very beginning of sponsorship and financing for our portal initiative to the deployment of Backstage itself, and how the mindset of our development teams as portal users evolved from “one more thing they would have to care about” to considering it the one thing that could help them manage and deliver their solutions more efficiently.
The Ultimate Container Challenge: An Interactive Trivia Game on Supply Chain Security#
Time: 11:00am EST - 11:30am EST
Speakers: Aurélie Vache (OVHcloud); Sherine Khoury (Red Hat)
Venue: Building B | Level 3 | B302-303, Atlanta, GA, USA
Type: SECURITY
Description: Congratulations! You’ve successfully built and pushed your container image to a registry, but are you ready to deploy to production? Is your SecOps team confident in your container’s robustness in the face of production environments? How do you ensure the image you’ve built is the one running? Are you sure it is composed of vulnerability-free software and that your supply chain hasn’t been compromised along the way? Don’t panic! In this fun and dynamic talk, you can learn or improve your knowledge of how to secure your containers with supply chain security. With a mix of quiz questions and live demos, you will discover or dig into several supply chain concepts, frameworks, and CNCF and open source projects like SBOM, Sigstore, SLSA, OpenSSF, VEX, GUAC, in-toto and many more! Are you up for this new quiz challenge? Icing on the cake: top scores will win some swag.
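To make the concepts concrete, here is an illustrative CLI sketch of the signing and SBOM workflow those projects enable, assuming cosign v2 and syft are installed; the image reference and identity are hypothetical:

```shell
# Keyless signing via Sigstore's OIDC flow:
cosign sign ghcr.io/example/app:1.0.0

# Verification pins the expected signer identity and OIDC issuer:
cosign verify \
  --certificate-identity dev@example.com \
  --certificate-oidc-issuer https://accounts.google.com \
  ghcr.io/example/app:1.0.0

# Generate an SPDX SBOM with syft and attach it as an in-toto attestation:
syft ghcr.io/example/app:1.0.0 -o spdx-json > sbom.spdx.json
cosign attest --predicate sbom.spdx.json --type spdxjson ghcr.io/example/app:1.0.0
```

Verifying the signature and attestation at admission time is what closes the loop on "is the image I built the one running?"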
Tools and Strategies for Making the Most of Kubernetes Access Control#
Time: 11:00am EST - 11:30am EST
Speakers: Lucas Käldström (Upbound); Micah Hausler (AWS)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 1, Atlanta, GA, USA
Type: SECURITY
Description: Have you ever struggled writing least-privilege access control policies for Kubernetes? Are you concerned about the wide permissions of installed Helm charts? Do you manage to regularly audit who has access to sensitive resources? In this talk, Kubernetes contributors Micah and Lucas introduce you to open source tools that help you on your defense in depth journey for securing the Kubernetes API surface. They demonstrate how to right-size your RBAC rules semi-automatically, audit who can access sensitive resources, and check whether policy refactors are correct. This talk is part of a journey to improve Kubernetes access control in core. However, to make this initiative successful, user feedback is needed throughout the process. You’ll learn about the planned Kubernetes Conditional Authorization feature, which will make authoring right-sized policies easier. By the end of the talk, you will know how to get involved, and future directions for improved Kubernetes access control.
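As a baseline for the right-sizing the talk describes, a minimal least-privilege sketch: read access to one named Secret, bound to one ServiceAccount. All names here are hypothetical.

```yaml
# resourceNames can restrict "get" but not "list", so only "get" is granted.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: secret-reader
  namespace: payments
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
    resourceNames: ["billing-api-key"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: secret-reader-binding
  namespace: payments
subjects:
  - kind: ServiceAccount
    name: billing-sa
    namespace: payments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: secret-reader
```

A quick audit of the result: `kubectl auth can-i get secrets -n payments --as=system:serviceaccount:payments:billing-sa` should answer `yes`, while the same check in any other namespace should answer `no`.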
🚨 Contribfest: Inspektor Gadget Contribfest: Enhancing the Observability and Security of Your K8s Clusters Through an Easy-to-Use Framework#
Time: 11:00am EST - 12:15pm EST
Speakers: Mauricio Vásquez Bernal (Software Engineer, Microsoft); Jose Blanquicet (Software Engineer, Microsoft)
Venue: Building B | Level 2 | B208, Atlanta, GA, USA
Type: 🚨 CONTRIBFEST
Description: This session is the perfect opportunity to learn how you can take granular kernel level data (collected using eBPF!) and apply it to a wide range of troubleshooting, monitoring, and security use cases by putting your hands on the Inspektor Gadget project: A system inspection framework for observing Kubernetes and Linux hosts. We’ll start with a quick overview of how Inspektor Gadget works, then show you how to install and configure it in your cluster. Next, we’ll run some live demos of Gadgets like DNS tracing and identifying the pods that use the most resources, which are common issues that come up on K8s. After the introduction is done, we’ll guide you to set up the development environment, then you’ll have the opportunity to contribute in different ways by: - Fixing bugs - Developing your own creative use cases for existing gadgets - Implementing new features - Creating or extending gadgets for new use cases - Brainstorming ideas on features, use cases, etc.
🚨 Contribfest: Level up Your Open Source Journey: Hands-On Backstage Contributions!#
Time: 11:00am EST - 12:15pm EST
Speakers: André Wanlin (Customer Success Engineer, Spotify); Avantika Iyer (Engineering Manager, Spotify); Aramis Sennyey (Software Engineer, DoorDash); Kurt King (Senior Software Engineer, Procore)
Venue: Building B | Level 2 | B207, Atlanta, GA, USA
Type: 🚨 CONTRIBFEST
Description: Jump into the world of Backstage at our interactive ContribFest session! Whether you’re just starting out or already building with Backstage, this event is designed to help you contribute confidently to this CNCF project that’s transforming Internal Developer Portals. We’ll help you set up your development environment (think: Node.js, TypeScript, and more), walk you through the Backstage Contributing Guide, and connect you with beginner-friendly GitHub issues to get you started. Seasoned Backstage contributors can dive deeper—explore advanced topics, build plugins, or tackle more complex issues alongside maintainers and fellow developers. Bring your questions, share your ideas, and collaborate in real time with the Backstage community. No matter your experience level, you’ll leave with practical knowledge and the satisfaction of making a real impact on Backstage!
Sponsored Demo: No Silos, One Data Plane: Fast & Secure SW-Defined Data Ops for Kubernetes#
Time: 11:05am EST - 11:25am EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: Kubernetes workloads demand diverse data services: block CSI for databases, file CSI for analytics, and native S3 for AI/ML pipelines. Today, these are delivered through siloed backends with separate lifecycles, quotas, and policies. This session shows how to unify them into a single software-defined data plane using a centralized data intelligence platform. The platform simultaneously presents block, file, and S3 into Kubernetes via one control plane; supports multi-tenant policies and quotas; and enforces QoS with rate-limiting and priority scheduling to prevent noisy-neighbor issues. As a software-defined solution, it spans core, cloud, and hybrid models. In the demo, we will provision multi-tenant block/file/object storage, set per-tenant quotas and QoS via APIs, and show enforcement across AI training (file), inference (block), and RAG (S3). Attendees will see how a centralized platform simplifies Kubernetes data services, ensuring predictable performance and streamlined operations.
Building AI/ML Pipelines on Kubernetes#
Time: 11:45am EST - 12:15pm EST
Speakers: Susan Wu (Outbound Product Manager, Google); Ian Chakares (Cloud Builder, Google); Lu Qiu (Database Engineer, LanceDB); Anant Vyas (Sr Staff Software Engineer, Uber); Lucy Sweet (Senior Software Engineer, Uber)
Venue: Building B | Level 3 | B308-309, Atlanta, GA, USA
Type: AI + ML
Description: Kubernetes is becoming the standard for container orchestration, including for AI/ML use cases such as extracting insights, parsing unstructured data, and filling in missing multimodal data (images, text, audio, video). AI workflows used to waterfall development through one model at a time and are now trending towards data-driven, multi-path development. This means that platform engineers have to build data pipelines that support multi-model infrastructure, giving the AI app the granular ability to choose the right model for any request. Hear from a panel of platform engineers on how they build data pipelines in Kubernetes for multimodal AI that optimize for data retrieval, advanced reasoning, and generation capabilities.
Inference Awakens: Tools for the Age of GenAI#
Time: 11:45am EST - 12:15pm EST
Speakers: Alexa Griffith (Senior Software Engineer, Bloomberg); Erica Hughberg (Envoy AI Gateway Maintainer, Tetrate.io)
Venue: Building B | Level 4 | B401-402, Atlanta, GA, USA
Type: AI + ML
Description: A long time ago (okay, not that long), stateless microservices ruled the galaxy. But a new force is rising, shaped by agents, LLMs, and highly dynamic, resource-intensive workloads. As traffic shifts from simple REST to streaming tokens, prompt orchestration and GPU-aware routing, traditional gateways are showing their limits. We’ll explore what it takes to build GenAI platforms capable of serving modern inference at scale without falling to the dark side of complexity. You’ll learn from a real-world reference architecture using open-source tools like Envoy AI Gateway, KServe, and others—designed to support dynamic model-based routing, token-level rate limiting, secure upstream auth, observability, and multi-provider failover. These aren’t bonus features–they’re the new minimum requirements for reliable AI inference. You’ll leave with a practical blueprint for routing, serving and observing LLM traffic, and a clearer vision for how today’s CNCF tools are awakening in the GenAI era.
Bringing OCI Into the GitOps World#
Time: 11:45am EST - 12:15pm EST
Speakers: Jesse Suen (Argo & Kargo Project Creator & Co-Founder & CTO, Akuity)
Venue: Building B | Level 4 | B406b-407, Atlanta, GA, USA
Type: APPLICATION DEVELOPMENT
Description: Git has long been the foundation of GitOps, but what if deployment configs were treated like versioned software packages? This talk explores how the Open Container Initiative (OCI) format, originally designed for container images, is evolving to support a wide array of cloud-native artifacts, including Helm charts, WebAssembly (Wasm) modules, Lambda functions, and Kustomize configuration bundles. We’ll dive into how the OCI ecosystem is becoming a powerful, interoperable standard not just for deployment runtimes but for the full lifecycle of cloud-native applications. We’ll examine how tools like Argo CD, Flux, and Kargo are embracing OCI registries as first-class citizens and the benefits and tradeoffs of adopting this approach. Attendees will learn: how OCI extends beyond containers; how to package and distribute complex or binary artifacts; and how OCI compares to Git in a GitOps context.
Public Technical Oversight Committee (TOC) Meeting#
Time: 11:45am EST - 12:15pm EST
Speakers: Chris Aniszczyk (CTO, Cloud Native Computing Foundation)
Venue: Building C | Level 3 | Georgia Ballroom 3, Atlanta, GA, USA
Type: CLOUD NATIVE EXPERIENCE
Description: This session is a panel discussion moderated by Chris Aniszczyk with members of the Technical Oversight Committee. Feel free to come with questions, but we’ll be doing an overview of the Technical Oversight Committee’s governance structure, scope, mission and processes. To learn more about the TOC, visit https://github.com/cncf/toc
CafeGPT: Serving LLMs Like Coffee With Kubernetes#
Time: 11:45am EST - 12:15pm EST
Speakers: Madhav Jivrajani & Kartik Ramesh (Student, UIUC)
Venue: Building B | Level 3 | B302-303, Atlanta, GA, USA
Type: CLOUD NATIVE NOVICE
Description: Kubernetes is quickly becoming the de-facto platform for serving LLM workloads and with the ecosystem evolving at a staggering pace, it can get quite difficult to not only decouple the fundamentals from the diversity of features that so many Kubernetes based solutions offer today, but also understand them in a way that is not overwhelming. What if we could explore the fundamentals of LLM inference, strategies of efficient deployment, GPU scheduling and where Kubernetes comes in, while learning about how to run a cafe? Join us as we tune out the world for some time, geek out and revisit the fundamentals to explore how Kubernetes and LLM inference systems make sense together, and maybe even learn how to run a cafe!
Next Generation Extension Management Using Kubernetes Image Volumes#
Time: 11:45am EST - 12:15pm EST
Speakers: Andrew L’Ecuyer (Engineering Manager, Snowflake)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 2-3, Atlanta, GA, USA
Type: DATA PROCESSING + STORAGE
Description: A longstanding challenge within the Postgres ecosystem is the management of add-ons, such as database extensions. With hundreds of extensions available, a typical Postgres user will require at least one database extension. Yet challenges arise in immutable container environments where we want to minimize image bloat and build complexity. Thanks to the introduction of Image Volumes in Kubernetes, however, the conversation around extension management has shifted. More specifically, Image Volumes provide a powerful new way of packaging and distributing extensions using your existing cloud-native infrastructure. In this talk, you will see how Snowflake is using Image Volumes, together with the latest Postgres features, to enable seamless extension management in Kubernetes. We’ll walk through a real-world example using pgvector, demonstrating how extensions can be built into OCI images, distributed via an image registry, and then mounted into database Pods for installation and use.
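The pattern this session describes, packaging an extension as an OCI image and mounting it into a database Pod, can be sketched with a minimal Pod spec. This is an illustrative example only, not Snowflake's actual configuration; the image reference and mount path are assumptions. Image Volumes require Kubernetes 1.31+ with the ImageVolume feature gate (beta as of 1.33):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-with-pgvector
spec:
  containers:
  - name: postgres
    image: postgres:17
    volumeMounts:
    # The extension image's filesystem appears read-only inside the container.
    - name: pgvector-ext
      mountPath: /extensions/pgvector
      readOnly: true
  volumes:
  # An image volume: the OCI image itself is exposed as a volume, so the
  # extension is distributed via a registry rather than baked into the base image.
  - name: pgvector-ext
    image:
      reference: registry.example.com/extensions/pgvector:0.8.0  # hypothetical registry path
      pullPolicy: IfNotPresent
```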
Building a Kubernetes-native Experience for Your Multi-cluster Fleet#
Time: 11:45am EST - 12:15pm EST
Speakers: Andrea Tosatto & James Munnelly (Site Reliability Engineer, Apple)
Venue: Building B | Level 2 | B206, Atlanta, GA, USA
Type: EMERGING + ADVANCED
Description: Operating multiple Kubernetes clusters is becoming par for the course in any enterprise-scale Kubernetes deployment. As the complexity of your infrastructure deployments grows, so do the opportunities for inefficiencies. Once the boundaries of a single cluster have been broken, the complexity lands first on the end-user, often the party least well informed to make decisions about optimal resource placement across a global fleet. This talk is a deep dive into taming the user experience of multi-cluster scheduling so that platform engineering teams can reclaim the power of placing workloads across fleets of clusters. It will cover: multi-cluster resource views (‘pod aggregation’) and subresource proxying for a seamless multi-cluster experience; multi-cluster discovery, authentication, and authorization patterns; using MultiKueue as the engine for multi-cluster quota abstractions; and using the tenant as the scaling boundary with scoped apiserver views.
Kubernetes SIG Storage: Intro & Deep Dive#
Time: 11:45am EST - 12:15pm EST
Speakers: Xing Yang (Tech Lead, VMware by Broadcom); Michelle Au (Software Engineer, Google); Hemant Kumar (Software Engineer, Red Hat)
Venue: Building C | Level 1 | C111-112, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Kubernetes SIG Storage is responsible for ensuring that different types of file and block storage are available wherever a container is scheduled, storage capacity management (container ephemeral storage usage, volume resizing, etc.), influencing scheduling of containers based on storage (data gravity, availability, etc.), and generic operations on storage (snapshotting, etc.). SIG Storage also has a project that provides APIs for object storage support in Kubernetes. In this session, we will deep dive into some projects that SIG Storage is currently working on, provide an update on the current status, and discuss what might be coming in the future.
Open Policy Agent (OPA) Intro & Deep Dive#
Time: 11:45am EST - 12:15pm EST
Speakers: Philip Conrad (Software Engineer, Apple); Tyler Schade (Distinguished Engineer, GEICO Tech); Rita Zhang (Principal Software Engineer, Microsoft); Jaydip Gabani (Software Engineer, Microsoft)
Venue: Building C | Level 3 | Georgia Ballroom 1, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Come to this session to learn about the Open Policy Agent (OPA) project. OPA is a general-purpose policy engine that solves a number of policy-related use cases for authorization, Kubernetes, service mesh, CI/CD, infrastructure permissions, and more. This session will begin with OPA maintainers welcoming newcomers by introducing the project and covering core concepts. Current users will then receive an overview of recent changes, highlighting exciting new features and improvements across OPA and its broader ecosystem. OPA Envoy maintainer Tyler Schade will also share some details about recent changes to the OPA Envoy project, including enhancements to Envoy integration and improved support for SPIFFE identities. A short Gatekeeper update will cover VAP integration (now beta & default), a new audit violations disk export driver, Rego v1 syntax support and more. If you are interested in policy as code and security as it relates to cloud native technology, this session is for you. OPA maintainers will also be available for questions after the session.
SIG API Machinery and AI: What Comes Next?#
Time: 11:45am EST - 12:15pm EST
Speakers: Joe Betz (Staff Software Engineer, sig-api-machinery TL, Google); David Eads (Senior Principal Software Engineer, Red Hat)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Machine learning and its particular workloads, from massive training jobs to complex inference pipelines, have placed unparalleled demands on Kubernetes’s key APIs. This maintainer-track talk will provide an update on SIG API Machinery’s current work and future directions. We will consider critical AI/ML challenges such as scaling API performance for high-throughput input, evolving Custom Resource Definitions (CRDs) to manage complex machine learning workflows and specialized hardware (like TPUs and GPUs), and adapting API patterns for efficient processing of huge data sets and distributed machine learning jobs. Attendees will be introduced to the best ways to influence the design, the KEPs likely to come next, and the architectural considerations for “what’s next” to keep Kubernetes a premier platform for AI/ML.
Evicted! All the Ways Kubernetes Kills Your Pods (and How To Avoid Them)#
Time: 11:45am EST - 12:15pm EST
Speakers: Ahmet Alp Balkan (Sr. Staff Software Engineer, LinkedIn)
Venue: Building B | Level 4 | B405-406a, Atlanta, GA, USA
Type: OPERATIONS + PERFORMANCE
Description: Anyone running Kubernetes in a large-scale production environment cares deeply about having a predictable Pod lifecycle. Having unknown actors in the system that can terminate your Pods is a scary thought — especially if you run stateful systems on Kubernetes. There are many paths in the Kubernetes core that can abruptly terminate your workloads and cause your apps to dip below their Pod Disruption Budgets, risking unavailability for your customers. Documentation doesn’t go so far as to explain all these paths or how they work. In this talk, we’ll focus on the lesser-known abrupt pod eviction modes caused by Kubernetes components — ranging from kubelet to scheduler to controller-manager — and do a deep dive into Kubernetes internals to explain exactly how these pod terminations happen and what guarantees you can expect. We’ll also debunk some myths like ‘kubelet restarts are safe’. At the end, you’ll leave with a cheatsheet to help you reason about all eviction modes in Kubernetes.
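The Pod Disruption Budgets mentioned in this abstract are the standard guardrail against voluntary disruptions. A minimal, generic example (the label and threshold are illustrative) is shown below; note the talk's central caveat applies, since several eviction paths, such as kubelet node-pressure eviction, do not go through the Eviction API and therefore ignore PDBs entirely:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  # Voluntary evictions (e.g. node drains) are refused whenever they would
  # drop the number of ready Pods matching the selector below 2.
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```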
From Pull To Predict: Accelerating AI Model Deployment on Kubernetes#
Time: 11:45am EST - 12:15pm EST
Speakers: Lucas Duarte & Tiago Reichert (Sr. Specialist Solutions Architect, AWS)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 4, Atlanta, GA, USA
Type: OPERATIONS + PERFORMANCE
Description: In the era of large AI models, deployment latency and resource utilization present significant challenges for Kubernetes operators. This session demonstrates techniques to reduce model startup times and optimize cluster resources. We’ll deploy a 7B parameter LLM using Ray and vLLM for scaling and serving, implementing three key optimizations: SOCI (Seekable OCI) for lazy loading of container images, enabling containers to start without downloading the entire image first; an optimized storage layer that keeps models pre-downloaded and ready for quick access; and intelligent node provisioning using Karpenter for dynamic resource allocation. We’ll compare a standard deployment against one using these optimizations, showing the differences in startup times, resource usage, and operational costs. Attendees will learn implementation steps for these techniques, which they can apply to their own Kubernetes environments to improve AI model deployment efficiency.
Flip That Stack: Renovating Edge Infrastructure at the Home Depot#
Time: 11:45am EST - 12:15pm EST
Speakers: Dillon TenBrink (Distinguished Engineer, The Home Depot)
Venue: Building B | Level 3 | B312-314, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: At The Home Depot, we know a solid foundation is key—whether you’re laying tile or deploying edge infrastructure. In this session, we’ll walk through how we designed and built a scalable, resilient edge platform to support real-time retail operations across thousands of stores. You’ll learn how we approached hardware abstraction, workload orchestration, interactions between heritage and modern systems, and observability at the edge, all while keeping developer experience and operational simplicity front and center. We’ll share lessons learned, tools we used (and avoided), and how we’re evolving the platform to support future innovation. Whether you’re just framing your edge strategy or ready to hang drywall, this talk will give you practical insights to take home.
The Questions Not Asked: A Critical Retrospective on Platform Engineering#
Time: 11:45am EST - 12:15pm EST
Speakers: Whitney Lee (Senior Technical Advocate, Datadog); Ram Iyengar (Community Manager, OpenSSF Foundation); Daniel Bryant (Platform Engineer and Head of Product Marketing, Syntasso); Kunal Kushwaha (Senior Developer Advocate, CAST AI); Aditya Soni (CNCF Ambassador, SRE, Forrester)
Venue: Building B | Level 3 | B304-305, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Everyone is doing platform engineering — yet something still feels off. Why? Rather than celebrate what we already know, this session surfaces what we may be overlooking & the new thinking needed to move forward. We’ll unpack how the meaning of platform is evolving & challenge assumptions like developer experience above all, platform as a product, unclear ownership that stalls progress, and golden paths that feel more like restrictions. These are the contradictions many teams face today. In this panel, trusted platform thinkers from across the ecosystem come together to ask: Are we solving the right problems, in the right way, with the right teams or just reinforcing complexity & losing business context that platform engineering is meant to serve? We’ll revisit platform patterns that stood the test of time, explore shifts in open source and orchestration, and imagine what continuity and innovation should look like next. This isn’t a debate — it’s an industry check-in. A creative reset.
Pet-a-Pup#
Time: 12:00pm EST - 1:00pm EST
Venue: Building B | Level 2 | Willow Garden Foyer, Atlanta, GA, USA
Type: EXPERIENCES
Description: Take a “paws” from your busy day! Join us for a visit with some friendly therapy puppies to help reduce stress and boost your mood.
Lunch 🍲#
Time: 12:15pm EST - 1:45pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: BREAKS
Learning Lounge: A Peek Inside the LLMOps Black Box#
Time: 12:30pm EST - 12:45pm EST
Speakers: Aleks Jones (Technical Trainer, Linux Foundation Education)
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Learning Lounge, Atlanta, GA, USA
Type: EXPERIENCES
Description: 10-Minute Tip Talk
Network Nook Meetup: Share & Continue the Conversation#
Time: 12:30pm EST - 1:30pm EST
Venue: Building B | Level 1 | Solutions Showcase, Atlanta, GA, USA
Type: EXPERIENCES
Description: Join us for casual and engaging meetups at the Network Nook during lunch breaks! These informal gatherings are open to all, whether you’re a first-time attendee, a solo traveler, or simply looking to chat about shared interests. This is a great way to connect with others. Today’s theme is: Share & Continue the Conversation. What were your favorite sessions, networking events, etc., this week? Where can you contribute to the mindshare in your local community?
GitHub Actions: Project Usage and Deep Dive#
Time: 1:00pm EST - 2:00pm EST
Speakers: Jeremy Rickard (Principal Software Engineer, Microsoft); Jeffrey Sica (Head of Projects, Cloud Native Computing Foundation)
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Project Pavilion, Atlanta, GA, USA
Type: PROJECT OPPORTUNITIES
Description: Many open source projects (CNCF or otherwise) rely on GitHub Actions to handle building and testing. GitHub does a fantastic job of making simple jobs easy to implement, and has the flexibility to support extraordinarily complex or matrix-style jobs. That flexibility, however, can lead many maintainers to create needlessly complicated jobs. On top of that, running on hosted runners can cause a new set of problems that are difficult to debug and optimize. Join Jeefy and Jeremy as they dive deep on setting up GitHub Actions and discuss how the CNCF has set up hosted runners for its projects.
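For attendees new to Actions, the "simple jobs easy, matrix jobs flexible" point can be seen in a minimal workflow. This is a generic illustrative sketch, not the CNCF's actual configuration; the file name, job name, and Go versions in the matrix are assumptions:

```yaml
# .github/workflows/ci.yml
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Each version expands into its own job instance.
        go-version: ['1.22', '1.23']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: ${{ matrix.go-version }}
      - run: go test ./...
```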
Achieving Peak Performance Through Hardware Alignment in DRA#
Time: 1:45pm EST - 2:15pm EST
Speakers: Gaurav Ghildiyal (Senior Software Engineer, Google); Byonggon Chun (Senior Software Engineer, FuriosaAI)
Venue: Building B | Level 2 | B206, Atlanta, GA, USA
Type: AI + ML
Description: Are you using the newly GA’d Dynamic Resource Allocation API for your device allocation needs, but wondering why two seemingly identical pods exhibit wildly different performance characteristics? The answer often lies in the complex world of modern hardware topologies. This talk will dive into the critical importance of device alignment – ensuring your allocated CPUs, GPUs, NICs and other devices are optimally co-located. We’ll explore the nuances of multi-CPU, NUMA-aware systems, and PCIe hierarchies, demonstrating how subtle misalignments can lead to significant performance bottlenecks. We’ll show how you can leverage the DRA API to achieve various forms of alignment and discuss what the community is doing to further enhance this area. Join us to learn how to optimize your high-performance workloads by truly understanding and leveraging the underlying hardware with DRA. Together, we’ll make sure you’re getting the most bang for your buck from your powerful machines.
Effortlessly Build High-Performance AI/ML Processing Pipelines Within the ML Lifecycles#
Time: 1:45pm EST - 2:15pm EST
Speakers: Kazuki Yamamoto (Software Research Engineer, NIPPON TELEGRAPH AND TELEPHONE CORPORATION (NTT))
Venue: Building B | Level 3 | B308-309, Atlanta, GA, USA
Type: AI + ML
Description: Join us for an enlightening presentation on effortlessly building an advanced processing form called the “processing pipeline” for AI/ML workloads. In streaming processing, accelerators are typically assigned only to specific tasks in the workload. However, we can build high-performance processing infrastructure at the service level by assigning each task to a suitable accelerator and chaining them together. Native Kubernetes is a popular choice for deploying AI/ML workloads, but more is needed to deploy this new processing form: the “processing pipeline” that chains accelerators. This presentation will demonstrate how effortlessly a video inference system can be built using the “processing pipeline”, which leverages Numaflow and Dynamic Resource Allocation (DRA). It will also introduce a “processing pipeline” that can be integrated into MLOps through Kubeflow Pipelines. You will see a glimpse of future innovations, including high-speed communication between accelerators via a second NIC.
Intelligent Topology for AI Power: Network-Aware Scheduling Optimization With Volcano HyperNode#
Time: 1:45pm EST - 2:15pm EST
Speakers: Kevin Wang (Technical Expert, Lead of Cloud Native Open Source, Huawei)
Venue: Building B | Level 4 | B401-402, Atlanta, GA, USA
Type: AI + ML
Description: The increasing complexity of AI models drives demand for advanced computing, leading to the evolution of workload deployment towards intricate patterns like disaggregated-PD deployment on high-performance AI clusters. Optimizing these clusters requires sophisticated orchestration, particularly regarding network topology awareness. This presentation introduces the CNCF community’s exploration of network topology abstraction and scheduling, focusing on Volcano and Kueue. We will delve into key challenges users face in building AI infrastructure: 1. building common topology abstractions across diverse hardware, 2. efficiently managing topology data for optimal scheduling decisions, and 3. ecosystem support for typical workload abstractions like LWS, etc.
Creating and Maintaining Ephemeral Runtime Environments for 18,000 Developers#
Time: 1:45pm EST - 2:15pm EST
Speakers: Alexandre Astolpho Thomaz (Itau Unibanco)
Venue: Building B | Level 4 | B406b-407, Atlanta, GA, USA
Type: APPLICATION DEVELOPMENT
Description: In this session, we will explore how Itaú Unibanco, the largest bank in Brazil and Latin America, successfully implemented and maintained ephemeral runtime environments to support its 18,000 developers. This case study will delve into the challenges faced, the solutions developed, and the benefits realized from this large-scale implementation. Attendees will gain insights into the technical and organizational strategies that enabled Itaú Unibanco to enhance developer productivity, reduce costs, and improve system stability. Standardizing technology stacks across an organization as large as Itaú Unibanco is crucial for simplifying maintenance, enhancing interoperability, and reducing the learning curve for developers. This standardization also played a key role in the successful implementation of ephemeral runtime environments, as it allowed for predictable and repeatable setups that could be easily managed and scaled.
Supercharge Cloud Native SQL Database With Object Storage: Scaling TiKV With S3 as the Backbone#
Time: 1:45pm EST - 2:15pm EST
Speakers: Jinpeng Zhang (Principal Engineer, PingCAP)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 2-3, Atlanta, GA, USA
Type: DATA PROCESSING + STORAGE
Description: Object storage like Amazon S3 is known for its scalability, durability, and cost-efficiency—but can it serve as the foundation for high-performance transactional databases? In this talk, I’ll share how TiKV, the storage layer of the distributed SQL database TiDB, was re-architected to use object storage as the source of truth. This shift unlocked massive scalability, faster disaster recovery and backup restores, and significantly lower storage costs—without sacrificing performance. I’ll dive into the key innovations that decouple compute and storage, keeping object storage off the performance-critical path. Through smart caching, tiered storage, and optimized write paths, we achieved single-digit millisecond p99 latency and millions of TPS at cloud scale. Join us to see how this approach redefines the storage stack for cloud-native databases—and what it means for the future.
Learning Lounge: Starting your Kubestronaut Journey with the KCNA, CKA and CKAD#
Time: 1:45pm EST - 2:00pm EST
Speakers: James Spurin (Founder, DiveInto)
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Learning Lounge, Atlanta, GA, USA
Type: EXPERIENCES
Description: 10-Minute Tip Talk
Backstage Celebrations: Stable Foundations and MCP Innovations#
Time: 1:45pm EST - 2:15pm EST
Speakers: Ben Lambert & Patrik Oldsberg (Senior Engineer, Spotify)
Venue: Building C | Level 3 | Georgia Ballroom 3, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Following on from the previous talk at KubeCon EU, the Backstage project has continued its push for stability and maturity. Over the last couple of years there has been a large effort to switch to a more drop-in method of installing plugins, through the introduction of the new backend and frontend systems. This is now coming to a close with the stable release of the new frontend system, and the maintainers are excited to talk about what this means for the present and future of Backstage. You will also hear about the work towards an AI-Native Backstage, through new systems like the Actions Registry and first-class support for MCP. They’ll also share what this means for authentication in Backstage, and how that system has evolved to allow for more use-cases. As always, there will also be highlights from the different Project Areas and time for Q&A, so here’s your chance to ask any burning questions.
Butterfly Effect: What Kubernetes SIG Security Has in Flight#
Time: 1:45pm EST - 2:15pm EST
Speakers: Ian Coldwater (Security Researcher, Independent); Savitha Raghunathan (Senior Software Engineer, Red Hat); Carol Valencia (she/her, Elastic)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Kubernetes SIG Security continues to spread security across the cloud native field. Flutter in for updates about what we’ve been up to, featuring VEXing bugs, the perennial third-party audit coming back up, Security Self-Assessments emerging from dormancy to bloom again, (O)wasps, budding new contributors, and collaborating across SIGs to bee better together. Everything we do as contributors has ripple effects outward. Security is everyone’s responsibility, and every one of us can make a difference. What’s landing, and what’s taking flight? Come hear the buzz with us, and learn how you can get involved!
Package Management for Your Cluster, Reimagined#
Time: 1:45pm EST - 2:15pm EST
Speakers: Jordan Keister (Principal Software Engineer, Red Hat, Inc.); Joe Lanford (Red Hat); Attila Mészáros (Senior Software Engineer, Apple)
Venue: Building C | Level 3 | Georgia Ballroom 1, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: This year, after a significant refactor, the Operator Lifecycle Manager (OLM) reached its 1.0 release, and the Java Operator SDK introduced several improvements and new features across multiple minor releases. The Operator Framework team will discuss exciting new features in OLM such as declarative approaches to installation previews and policy-based approvals, dynamic workload configuration, and new content types. The team will also cover enhancements in the Java Operator SDK addressing common but complex challenges in building Kubernetes operators, including the expectation pattern. Please join the team to talk shop about all things operators!
Prometheus Intro, Deep Dive, and Open Q+A#
Time: 1:45pm EST - 2:15pm EST
Speakers: Owen Williams (Principal Software Engineer, Grafana Labs); David Ashpole (Google)
Venue: Building C | Level 1 | C111-112, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Prometheus is the 2nd-oldest project in the CNCF, so you have probably heard about it before. Nevertheless, the project maintainers will give you an introduction from the very beginning, followed by a deep dive into the exciting new features that have been released recently or are in the pipeline. You will learn about many opportunities to use Prometheus, and maybe we can even tempt you to contribute to the project yourself.
Fast and the Furious: CICD Pipeline for eBPF Programs at Meta Scale#
Time: 1:45pm EST - 2:15pm EST
Speakers: Theophilus Benson & Prankur Gupta (Professor, Meta)
Venue: Building B | Level 4 | B405-406a, Atlanta, GA, USA
Type: OPERATIONS + PERFORMANCE
Description: At Meta, we run over 150 eBPF programs on our machines, providing a range of custom functionality from network observability, load balancing, and network stack specialization to security features. We update these eBPF programs at least once a week to cater to changes in our microservice stack, which introduces opportunities for outages. While we maintain a custom and flexible in-house CI/CD pipeline for ensuring high velocity of our microservices, we observed that we were unable to use the same pipeline for our eBPF programs because of nuances of the eBPF ecosystem and the fact that our eBPF programs run in the kernel. Over the last five years, we have tailored our CI/CD pipeline to effectively support eBPF programs and their nuances. In this talk, we describe the challenges faced in tailoring our CI/CD pipeline, highlight lessons learned from several production outages, and discuss ongoing work to further enhance our eBPF-centric CI/CD pipeline.
GitOps Without Variables#
Time: 1:45pm EST - 2:15pm EST
Speakers: Brian Grant & Alexis Richardson (Cofounder/CTO, ConfigHub Inc)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 1, Atlanta, GA, USA
Type: OPERATIONS + PERFORMANCE
Description: ClickOps vs GitOps? Why not both? Initiating every change with an update to Git means finding the right lines of files to change, committing the fix, and updating production. But when your application is broken, speed matters. Do you have time to work out how to modify a 3000-line Helm chart and figure out which other teams, workloads, and clusters this will impact? Or you can ‘break glass’ and edit live production state; but now you have created drift from desired state! We want to end your frustration - with a new approach. We will demonstrate an evolution of GitOps where you can query and operate on configuration in bulk, use Kubernetes dashboards to fix runtime errors quickly, and synchronize bidirectionally with your clusters. Our solution is to separate all config data from the code used to update it. We store config as fully rendered plain YAML resources (“Write Every Time”) and show how to update this live without using “DRY” templates, variables, inline conditionals or loops.
Mission Abort: Intercepting Dangerous Deletes Before Helm Hits Apply#
Time: 1:45pm EST - 2:15pm EST
Speakers: Payal Godhani (Principal Engineer, Oracle)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 4, Atlanta, GA, USA
Type: OPERATIONS + PERFORMANCE
Description: What if your next Helm deployment silently deletes a LoadBalancer, a Gateway, or an entire namespace? We’ve lived that nightmare—multiple times. In this talk, we’ll share how we turned painful Sev1 outages into a resilient, guardrail-first deployment strategy. By integrating Helm Diff and Argo CD Diff, we built a system that scans every deployment for destructive changes—like the removal of LoadBalancers, KGateways, Services, PVCs, or Namespaces—and blocks them unless explicitly approved. This second-layer approval acts as a safety circuit for your release pipelines. No guesswork. No blind deploys. Just real-time visibility into what’s about to break—before it actually does. Whether you’re managing a single cluster or an entire fleet, this talk will show you how to stop fearing Helm and start trusting it again. Because resilience isn’t about avoiding failure—it’s about learning, adapting, and building guardrails that protect everyone.
GitOps and the Manifest Dilemma: Helm, Kustomize, Crossplane, Kro, and Beyond#
Time: 1:45pm EST - 2:15pm EST
Speakers: Dag Bjerre Andersen (Senior Platform Engineer, Egmont)
Venue: Building B | Level 3 | B312-314, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: As Kubernetes adoption grows, so does the need to manage manifests in a scalable and maintainable way. We used to rely mainly on Helm and Kustomize for templating, but today the ecosystem is evolving rapidly - new manifest-generation tools appear regularly, each with its own syntax, execution model, and design philosophy. This talk explores the fragmented landscape of manifest rendering tools and how well they align - or clash - with GitOps principles. We’ll examine the core differences between tools that render manifests outside the cluster (e.g., Helm, Kustomize) and those that rely on in-cluster controllers and CRDs (e.g., Crossplane, KubeVela, Kro). These two paradigms differ not only technically, but also in how they integrate with popular GitOps platforms like Argo CD and Flux. Through practical comparisons, attendees will gain a clear understanding of the core paradigms behind these tools - their strengths, limitations, and how well they align with GitOps workflows.
Transforming Kubernetes Clusters Into a Multi-Tenant Powerhouse at Electronic Arts#
Time: 1:45pm EST - 2:15pm EST
Speakers: Michael Dundek & Ruben Vasconcelos (Technical Director, Electronic Arts)
Venue: Building B | Level 3 | B304-305, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: In the dynamic landscape of application development, efficiently managing Kubernetes clusters while ensuring security, isolation, and ease of use is a persistent challenge. Electronic Arts developed a Kubernetes framework that transforms standard clusters into secure, fully segregated multi-tenant platforms. Our custom operator and CRDs enable seamless tenant onboarding with complete isolation across network, RBAC, and resources. This approach allows teams to manage fewer clusters while reducing overall costs. Our solution simplifies complex deployments by handling ingress, certificates, secrets, GitOps, and daily operations through specialized CRDs. Development teams can leverage Kubernetes capabilities without managing its underlying complexity. This presentation reveals our problem-solving strategies and technical implementation details - essential knowledge for anyone tackling Kubernetes multi-tenancy challenges or seeking to simplify Kubernetes adoption for end users.
Authenticating and Authorizing Every Connection at Uber#
Time: 1:45pm EST - 2:15pm EST
Speakers: Yangmin Zhu & Matt Mathew (Staff Software Engineer, Uber)
Venue: Building B | Level 3 | B302-303, Atlanta, GA, USA
Type: SECURITY
Description: Uber operates one of the world’s largest and most complex microservice architectures, composed of thousands of services built in diverse languages and maintained by independent teams. Ensuring consistent, secure service-to-service communication, without requiring code changes, posed a massive challenge. In this talk, we’ll share how we built and scaled a platform-level authentication and authorization solution based on Envoy, SPIRE, and the SPIFFE standard. Over a 3-year journey, we rolled out a Zero Trust architecture securing every service interaction with mTLS, authenticating workloads using SPIFFE identities, and enforcing fine-grained policies through a unified control plane. Attendees will learn about the architectural decisions, operational hurdles, and user-experience tradeoffs we faced along the way. Whether you’re starting your Zero Trust journey or looking to scale Envoy/SPIRE across a large org, this talk will offer practical insights from real-world deployment at scale.
🚨 Contribfest: Create Your Own Generators for External Secrets!#
Time: 1:45pm EST - 3:00pm EST
Speakers: Gustavo Carvalho
Venue: Building B | Level 2 | B208, Atlanta, GA, USA
Type: 🚨 CONTRIBFEST
Description: This Contribfest focuses on creating a few generators for popular SaaS providers. You’ll see how easy it is to contribute any secret-generation mechanism you need to leverage internally, making your infrastructure much safer by enabling dynamic credential rotation out of the box!
🚨 Contribfest: Meshery Contribfest: Dive Deep Into Extending Cloud Native Management#
Time: 1:45pm EST - 3:00pm EST
Speakers: Yash Sharma (Developer Advocate, DigitalOcean); Lee Calcote (Founder, Layer5); Hussaina Begum (Principal Engineer, Independent); Shivay Lamba (Developer Relations Engineer, Qualcomm)
Venue: Building B | Level 2 | B207, Atlanta, GA, USA
Type: 🚨 CONTRIBFEST
Description: Join us for an in-depth session on Meshery, a leading cloud native management plane, with Meshery maintainers and community. This is your chance to get hands-on with the tools shaping the future of collaborative cloud native management. We will walk through the Contributing Guide to help you familiarize yourself with the project and the contribution process, with opportunities to work on core functionality in the Server (Golang) or UI (React), or to extend Meshery by building your own plugin. Contribute to the documentation by incorporating your own examples of cloud native architectures using Meshery Designer. You will gain experience with cloud native technologies, spanning essentially every CNCF project, and with open source development practices. This session will be led by Meshery maintainers and contributors. No prior experience needed: we welcome contributions from all levels of experience. Join us at the Meshery Contribfest and be part of the future of collaborative cloud native management.
AI Inference Without Boundaries: Dynamic Routing With Multi-Cluster Inference Gateway#
Time: 2:30pm EST - 3:00pm EST
Speakers: Rob Scott (Software Engineer, Google); Daneyon Hansen (Senior Principal Software Engineer, Solo.io)
Venue: Building B | Level 2 | B206, Atlanta, GA, USA
Type: AI + ML
Description: Your AI inference workloads need more GPUs than any single cluster can provide. Sound familiar? When demand exceeds local capacity and your resources are spread across multiple clusters, intelligent routing becomes critical. This talk introduces Multi-Cluster Inference Gateway, a new part of the open-source Inference Gateway project that tackles distributed AI infrastructure head-on. We’ll show you how it leverages existing Gateway API and multi-cluster patterns to dynamically shift traffic where GPUs are available. Solving your GPU scarcity problem starts here. We’ll share practical deployment strategies, show you how to optimize costs by intelligently utilizing GPUs, and ensure your AI workloads remain highly available across clusters. Get ready for real-world examples that illustrate how to scale AI serving beyond the confines of a single cluster, empowering you to maximize utilization and minimize latency for your distributed AI workloads.
Evolving Kubernetes Scheduling#
Time: 2:30pm EST - 3:00pm EST
Speakers: Eric Tune & Wojciech Tyczyński (Principal Engineer, Google Cloud)
Venue: Building B | Level 3 | B308-309, Atlanta, GA, USA
Type: AI + ML
Description: Today, Kubernetes users rely on extensions to provide key scheduling functionality needed by AI workloads, including workload-aware scheduling and preemption, and topology-aware scheduling. These extensions can keep clusters full of expensive accelerators highly utilized, share them fairly between teams, and support parallel training and inference on complex hardware topologies. However, they are complex to write and maintain, and are not designed to interoperate. In this session we offer a comparison of “second-level schedulers”, including Kueue, Volcano, Ray, and Slurm, and of how they interact with the Kubernetes core scheduler (kube-scheduler). We’ll cover the current and future work to extend kube-scheduler with resource reservations, workload-awareness, and integration with infrastructure autoscaling. These changes create a clear separation of responsibilities between the core and second-level schedulers.
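To make "workload-aware scheduling" concrete, here is a toy all-or-nothing (gang) admission check of the kind second-level schedulers such as Kueue or Volcano perform before handing pods to kube-scheduler: a parallel job is admitted only if every one of its pods can be placed, so it never runs partially. The first-fit-decreasing heuristic and the GPU counts are illustrative only, not any scheduler's actual algorithm.

```python
# Toy gang-admission check: admit a group of pods all-or-nothing.
# Capacities and the FFD heuristic are illustrative, not a real scheduler.

def gang_admit(gang_gpus: list, free_gpus_per_node: list) -> bool:
    """Greedy first-fit-decreasing check: can every pod in the gang be placed?"""
    nodes = sorted(free_gpus_per_node, reverse=True)
    for need in sorted(gang_gpus, reverse=True):
        for i, free in enumerate(nodes):
            if free >= need:
                nodes[i] -= need
                break
        else:
            return False  # one unplaceable pod rejects the whole gang
    return True

print(gang_admit([4, 4], [8, 2]))     # True: both 4-GPU pods fit on the 8-GPU node
print(gang_admit([4, 4, 4], [8, 2]))  # False: the third pod cannot be placed
```

The all-or-nothing property is what keeps expensive accelerators from being held idle by partially scheduled training jobs.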
llm-d: Multi-Accelerator LLM Inference on Kubernetes#
Time: 2:30pm EST - 3:00pm EST
Speakers: Erwan Gallen (Senior Principal Product Manager, Red Hat)
Venue: Building B | Level 4 | B401-402, Atlanta, GA, USA
Type: AI + ML
Description: Large language model serving has grown beyond one GPU per pod. Kubernetes clusters now mix GPUs, TPUs and custom AI ASICs, yet the community still needs a unified recipe to harness them. llm-d is a Kubernetes-native distributed-inference stack built around vLLM. It adds a workload-aware scheduler, disaggregated prefill and decode, a tiered KV cache and visibility into interconnect bandwidth, from NIXL fabrics to GPU peer-to-peer links. This talk shows how llm-d feeds that topology data to Kubernetes so each request lands on the accelerator and network path that meets its latency target at the lowest cost. Attendees learn how llm-d reasons about accelerator classes and interconnects, and receive a clear scorecard for selecting the best hardware mix for chat, long-context or batch generation. They leave with a practical blueprint for understanding llm-d, ready to combine high performance with tight budgets.
Design Patterns for Consistent Centralized Authorization#
Time: 2:30pm EST - 3:00pm EST
Speakers: José Padilla (Auth0) & Alice Gibbons (Diagrid)
Venue: Building B | Level 4 | B406b-407, Atlanta, GA, USA
Type: APPLICATION DEVELOPMENT
Description: Modern centralized authorization systems like OpenFGA don’t just centralize policy—they also amalgamate authorization data such as users, roles, and relationships. This shift brings great flexibility and visibility, but introduces a key challenge: keeping that data in sync with distributed application state. As having a single shared database is an anti-pattern, this talk will explore options to handle this challenge reliably at scale. Using Dapr’s building block APIs—including pub/sub and state management—attendees will learn how to coordinate consistent dual writes between apps and OpenFGA. Attendees will walk away with an architecture that emits domain events from services and uses Dapr to process them asynchronously, ensuring consistency without tight coupling or fragile orchestration. José and Alice will demonstrate a real-world example where centralized authorization is integrated cleanly into an open source, microservices system.
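The dual-write coordination described above is often solved with an outbox pattern: the service records a domain event atomically alongside its state change, and an asynchronous, idempotent processor (Dapr pub/sub and state management in the talk) applies it to the authorization store. Everything below is an in-memory stand-in, including the relationship-tuple shapes, which only loosely mimic OpenFGA's `object, relation, user` form.

```python
# In-memory sketch of the outbox pattern for syncing app state with a
# centralized authorization store. Stores and tuple shapes are stand-ins.

app_db = {}        # application state
outbox = []        # events recorded atomically with state changes
authz_tuples = set()  # stand-in for the authorization store (e.g. OpenFGA)

def assign_role(doc_id: str, user: str, role: str) -> None:
    # One atomic unit: the state change and its event are appended together,
    # so neither can succeed without the other.
    app_db[(doc_id, user)] = role
    outbox.append({"type": "role_assigned", "doc": doc_id, "user": user, "role": role})

def process_outbox() -> None:
    # Runs asynchronously in a real system; adding to a set is idempotent,
    # so redelivery after a crash is safe.
    while outbox:
        ev = outbox.pop(0)
        authz_tuples.add((f"document:{ev['doc']}", ev["role"], f"user:{ev['user']}"))

assign_role("readme", "anne", "viewer")
process_outbox()
print(authz_tuples)
```

The key property is that a crash between the two writes leaves a replayable event behind rather than silently divergent stores.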
Optimized Scheduling for Big Data Workloads: The Why, What and How of K8s Schedulers#
Time: 2:30pm EST - 3:00pm EST
Speakers: Rahul Sharma; Wilfred Spiegelenburg (Principal Engineer, Cloudera)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 2-3, Atlanta, GA, USA
Type: DATA PROCESSING + STORAGE
Description: Stateful workloads like databases, caching systems, and message brokers are the backbone of many modern applications, yet they pose distinct challenges within Kubernetes. Unlike their stateless counterparts, these workloads demand persistent storage, stable network identities, and precise resource allocations. The default Kubernetes scheduler, while robust, often struggles to meet these specialized needs, potentially resulting in performance degradation, resource inefficiencies, or operational instability. In this session, we’ll take a deep dive into the art of scheduling stateful workloads, breaking it down into three essential components - why, what and how of the most popular Kubernetes schedulers out there. By the end of this talk, attendees will walk away with a clear understanding of how to adapt Kubernetes scheduling to the unique demands of big data workloads.
Peer Group Mentoring#
Time: 2:30pm EST - 3:30pm EST
Venue: Building B | Level 2 | B211-212, Atlanta, GA, USA
Type: INCLUSION + ACCESSIBILITY
Description: Peer Group Mentoring allows participants to meet with experienced open source veterans across many CNCF projects. Mentees are paired with 2 – 10 other people in a pod-like setting to explore technical, community, career, and certification questions together. If you’re interested in being a Mentor, you can sign up here
Kubeflow Ecosystem: Navigating the Cloud Native AI/ML and LLMOps Frontier#
Time: 2:30pm EST - 3:00pm EST
Speakers: Yuki Iwai (CyberAgent, Inc.); Valentina Rodriguez Sosa (Principal Architect, Red Hat); Johnu George (Technical Director, Nutanix); Akshay Chitneni (Staff Software Engineer, Apple); Josh Bottum (Founder, Indemnify AI, Inc.)
Venue: Building C | Level 1 | C111-112, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: GenAI is redefining what a cloud-native ML platform must deliver. Kubeflow answers with a modular stack, from pipeline orchestration and data processing to distributed training, tuning, and inference. In this update, we map that frontier across the entire ML workload lifecycle. We unveil reproducible blueprints for fine-tuning and serving large language models on diverse GPU fleets, driven by Kubeflow components that balance utilization with isolation, and explain how declarative components, artifact versioning, and lineage tracking keep experiments portable between on-prem and cloud. We also outline the twelve-month roadmap: security hardening, a more comprehensive ML user experience, and multi-cluster awareness. The session ends with an open Q&A and contribution on-ramps for good-first issues, mentorship, and governance, so attendees depart with step-by-step recipes, architecture guides, and a concrete way to shape Kubeflow’s next chapter.
Project Harbor Maintainers Update: Where Are We Heading?#
Time: 2:30pm EST - 3:00pm EST
Speakers: Orlin Vasilev (SUSE) & Vadim Bauer (8gears)
Venue: Building C | Level 3 | Georgia Ballroom 1, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: This talk highlights the major advancements in Harbor from version 2.12 to 2.13. In v2.12, Harbor introduced enhanced robot accounts for better CI/CD automation, added proxy cache speed limits for efficient network usage, improved LDAP onboarding for smoother authentication, and expanded integration with ACR & ACR EE registries for flexible image replication. In v2.13, Harbor continues this momentum with significant security and observability improvements: an extended audit log offers detailed user action tracking, OIDC enhancements add better session handling and PKCE support, and Redis TLS ensures secure communication—though a known TLS config issue with external Redis was noted. New CloudNativeAI integration now supports AI model storage and lifecycle management. Dragonfly preheating was optimized for large AI artifacts. Join us to get first-hand information on where we are going with 2.14 and onwards!
Supercharge Your Canary Deployments With Argo Rollouts Step Plugins#
Time: 2:30pm EST - 3:00pm EST
Speakers: Kostis Kapelonis (Developer Advocate, Octopus Deploy); Alexandre Gaudreault (Software Developer & Argo CD Maintainer, Intuit)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Argo Rollouts is a Kubernetes controller for Progressive Delivery (blue/green and canary deployments). The controller already supports a plugin system for traffic providers (Istio, Traefik, Gateway API, etc.) and for metric providers (Prometheus, Datadog, etc.). In the latest release, the Argo team completed the trilogy by implementing support for Canary Step plugins. This extends Argo Rollouts capabilities and enriches the progressive delivery experience to accommodate a multitude of scenarios. With Canary step plugins, you can now fully control what happens DURING the canary process and implement any custom functionality that you want within the canary steps. Did you always want to do canary gating? Deployment sync between different controllers? Custom notifications while the canary is running? Now you can! In this talk we will see the architecture of the new plugin mechanism and explain how you can extend canary deployments with your own custom workflows.
What’s New With Kubectl and Kustomize … and How You Can Help!#
Time: 2:30pm EST - 3:00pm EST
Speakers: Marly Salazar (Co-chair, Independent); Arda Guclu (Software Engineer, Red Hat); Maciej Szulik (Staff Platform Engineer, Defense Unicorns); Eddie Zaneski (Technical Advisor to the CTO, Defense Unicorns)
Venue: Building C | Level 3 | Georgia Ballroom 3, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Have you ever wondered how kubectl and kustomize enhancements are designed and built? Curious why your favorite feature request wasn’t accepted? Join the folks from Kubernetes SIG CLI to find out! In this session, the SIG CLI maintainers will provide an introduction to the plethora of tooling they are working on and an overview of how to get started contributing. They will share the work done over the past year and the roadmap for what is next. Join us to help shape your favorite tools!
Final Boss Fights: What Zynga Monitors When Game Teams Perform World Wide Launches & How We Prepare#
Time: 2:30pm EST - 3:00pm EST
Speakers: Molly Sheets (Director of Engineering, Kubernetes, Take-Two Interactive Software, Inc.); Krunal Soni (Take-Two Interactive Software, Inc.); Steve Phillips (Sr. Technical Account Manager, Amazon); Justin Schwartz (Sr. Architect, Zynga)
Venue: Building B | Level 4 | B405-406a, Atlanta, GA, USA
Type: OPERATIONS + PERFORMANCE
Description: Zynga, a global leader in interactive entertainment and a wholly-owned subsidiary of Take-Two Interactive Software, Inc., launches new mobile games with our platform engineering team multiple times a year on Kubernetes. Join the central infrastructure teams behind the company that made mobile hits like Words with Friends, FarmVille, Zynga Poker, and many more. Learn from Amazon and Zynga how they work together to support game teams and central teams in scaling applications before, during, and after game launches, making sure they are ready for anything.
Help! My LLM Is a Resource Hog: How We Tamed Inference With Kubernetes and Open Source Muscle#
Time: 2:30pm EST - 3:00pm EST
Speakers: Aditya Soni (CNCF Ambassador, SRE, Forrester); Hrittik Roy (Platform Advocate | CNCF Ambassador, vCluster)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 4, Atlanta, GA, USA
Type: OPERATIONS + PERFORMANCE
Description: A client came to us with a problem we’re seeing more and more, their large language model (LLM) was deployed, but inference was painfully slow, GPU usage was unpredictable, and costs were spiraling out of control. Kubernetes alone wasn’t enough, they needed a production-ready, efficient, and scalable stack. In this talk, we’ll walk through how we diagnosed and solved the issue using open-source CNCF tools, turning a chaotic deployment into a well-oiled inference machine. You’ll learn how to: 1. Use KServe and Kubeflow to serve LLMs reliably. 2. Benchmark and auto-scale workloads using Volcano and KEDA while optimizing resource usage and latency. 3. Track model performance and drift with Prometheus, Grafana, and OpenTelemetry. We’ll share benchmarks, architectures, and lessons from the field, all based on open-source tooling you can try today. Whether you’re running LLMs at scale or just exploring GenAI, this talk is packed with real-world solutions to help you do more with less.
Harmonizing Strategy and Engineering: Lessons Learnt in Building a Platform Plugin for Diverse Users#
Time: 2:30pm EST - 3:00pm EST
Speakers: Sri Chandrasekaran & Kate Klymkovska (Senior Product Manager, Spotify)
Venue: Building B | Level 3 | B312-314, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Creating a successful platform plugin or product is a strategic and technical journey requiring clear vision, deliberate engineering, and a focus on user needs. In this talk, Kate and I will share our experience at Spotify, from engineering and product perspectives, as we developed a Backstage plugin that evolved from a simple internal tool to support diverse user groups. We’ll explore Spotify’s journey, balancing challenges and opportunities at scale, and share key lessons for addressing diverse user needs. Attendees will gain a practical framework to align product strategy with engineering execution and externalize internal tools effectively. Using a real-world example, we’ll provide actionable insights on crafting a strong product vision, managing technical trade-offs, and prioritizing for impact. Whether building a plugin, platform, or product, this talk equips participants with tools to drive adoption, stakeholder satisfaction, and alignment between strategy and engineering.
Ulysses’ Odyssey Through Platform Engineering#
Time: 2:30pm EST - 3:00pm EST
Speakers: William Rizzo (Strategy Lead, Mirantis Inc - USA)
Venue: Building B | Level 3 | B304-305, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Embark on an epic journey paralleling Ulysses’ Odyssey with the challenges faced in modern platform engineering. Like Ulysses navigating mythical trials, platform teams encounter cultural resistance, integration complexities, resource constraints, and technical debt. This session creatively aligns these common blockers to Ulysses’ legendary adventure, providing practical strategies to overcome each obstacle. Join us to discover how clear vision, resilience, and strategic navigation empower teams to successfully adopt platform engineering and drive exceptional developer experiences.
Securing Data Applications at Pinterest With Finer Grained Access Control on Kubernetes#
Time: 2:30pm EST - 3:00pm EST
Speakers: Soam Acharya & William Tom (Principal Engineer, Pinterest)
Venue: Building B | Level 3 | B302-303, Atlanta, GA, USA
Type: SECURITY
Description: At Pinterest, our data processing platform runs nearly 90K jobs on 20K nodes ingesting about 200PB of data daily, powering ML models, user insights, data lakes, and more. This massive scale, while pushing the limits of cloud computing, requires secure, least-privileged data management that also has to meet evolving regulations. To address these needs, we introduced Finer Grained Access Control (FGAC) into Moka, our new Kubernetes-based processing platform. FGAC integrates Kubernetes and AWS features (namespaces, sidecars, service accounts, RBAC, STS, EKS, IRSA) to authenticate with internal services (servicemesh, mTLS, IAM proxy) for a secure multi-tenant environment supporting Spark, Ray, and Flink. In this talk, we detail our design for Moka FGAC and current migration status. We also share the trade-offs and design decisions that led to better data isolation, scale, improved resource utilization and an overall simpler approach compared to our previous Hadoop/Kerberos based solution.
You Deployed What?! Data-Driven Lessons on Unsafe Helm Chart Defaults#
Time: 2:30pm EST - 3:00pm EST
Speakers: Michael Katchinskiy & Yossi Weizman (Security Researcher, Microsoft)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 1, Atlanta, GA, USA
Type: SECURITY
Description: Most breach post-mortems start with “Which CVE?” Ours usually end with “There wasn’t one.” We analyzed 10B Kubernetes audit events and scanned over 3,000 clusters to map compromise paths that rely solely on insecure defaults shipped in widely trusted Helm charts. The pattern is painfully consistent: a world-reachable Service/Ingress, authentication off by default, and a pod that has permissions to go wild. We’ll chain those three defaults against Apache Pinot, Selenium Grid, and Meshery, all without a single vulnerability. To flip the script, we’ll walk through hardening the same workloads using existing community tools like OPA Gatekeeper, Kyverno, Pod Security Admission, and GitHub Actions to enforce guardrails before someone in your organization deploys an “official” Helm chart.
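As an illustration of how such guardrails work, here is a toy policy check over rendered manifests that flags the two defaults discussed above. In practice you would express this as a Kyverno or Gatekeeper policy; the manifest shapes and the `AUTH_ENABLED` knob below are hypothetical stand-ins for chart-specific settings.

```python
# Toy admission-style check for two insecure chart defaults:
# a world-reachable Service and authentication disabled via an env var.
# "AUTH_ENABLED" is a hypothetical stand-in for a chart-specific knob.

def violations(manifests: list) -> list:
    findings = []
    for m in manifests:
        kind = m.get("kind")
        name = m.get("metadata", {}).get("name", "?")
        if kind == "Service" and m.get("spec", {}).get("type") == "LoadBalancer":
            findings.append(f"Service/{name}: exposed via LoadBalancer")
        if kind == "Deployment":
            pod = m.get("spec", {}).get("template", {}).get("spec", {})
            for c in pod.get("containers", []):
                for env in c.get("env", []):
                    if env.get("name") == "AUTH_ENABLED" and env.get("value") == "false":
                        findings.append(f"Deployment/{name}: auth disabled by default")
    return findings

sample_manifests = [
    {"kind": "Service", "metadata": {"name": "grid"},
     "spec": {"type": "LoadBalancer"}},
    {"kind": "Deployment", "metadata": {"name": "grid"},
     "spec": {"template": {"spec": {"containers": [
         {"name": "hub", "env": [{"name": "AUTH_ENABLED", "value": "false"}]}]}}}},
]
print(violations(sample_manifests))
```

Running such checks in CI, before `helm install`, is what turns "known insecure default" into "blocked deployment" rather than a post-mortem finding.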
An Open Source AI Compute Stack: Kubernetes + Ray + PyTorch + vLLM#
Time: 3:15pm EST - 3:45pm EST
Speakers: Robert Nishihara (Co-founder, Anyscale)
Venue: Building B | Level 2 | B206, Atlanta, GA, USA
Type: AI + ML
Description: AI workloads require increasing scale for both compute and data, as well as significant heterogeneity across workloads, models, data types, and hardware accelerators. As a consequence, the software stack for running compute-intensive AI workloads is fragmented and rapidly evolving. Companies that productionize AI end up building large AI platform teams to manage these workloads. However, within the fragmented landscape, common patterns are beginning to emerge. This talk describes a popular software stack combining Kubernetes, Ray, PyTorch, and vLLM. It describes the role of each of these frameworks, how they operate together, and illustrates this combination with case studies from Pinterest, Uber, and Roblox as well as from today’s most popular post-training frameworks.
Beyond ChatOps: Agentic AI in Kubernetes—What Works, What Breaks, and What’s Next#
Time: 3:15pm EST - 3:45pm EST
Speakers: Pavneet Ahluwalia (Principal PM Lead, Microsoft); Idit Levine (Founder & CEO, Solo.io); Arik Alon (CTO, Robusta); Valeria Ortiz (Site Reliability Engineer, Akamai)
Venue: Building B | Level 3 | B308-309, Atlanta, GA, USA
Type: AI + ML
Description: Agentic AI is evolving from hype to hands-on reality—no longer just copilots, but autonomous actors in Kubernetes clusters. But how effective are these AI agents in real-world ops? This panel brings together builders and operators who’ve deployed LLM-powered agents at scale in production to share what worked, what broke, and what surprised them. Expect a candid, high-signal conversation on the true strengths and sharp limitations of AI agents for Kubernetes. SREs, platform engineers/operators—come with questions, leave with a clearer sense of where AI can reduce toil, when it still needs babysitting(human-in-the-loop), and how to experiment and deploy safely. We’ll cover: - High-efficacy use cases: RCA, triage, incident summarization - Common failure patterns: hallucinations, context loss, unpredictability, alert attention - Evaluation strategies in dynamic prod environments - Design trends: agent chaining, feedback loops, safety guardrails
GitOps for AI Agents: Building Reliable AI Pipelines With Argo#
Time: 3:15pm EST - 3:45pm EST
Speakers: Benji Kalman (VP Engineering and Co-Founder, Root.io); Shiran Melamed (DevOps Group Leader, JFrog)
Venue: Building B | Level 4 | B401-402, Atlanta, GA, USA
Type: AI + ML
Description: As AI agents evolve from toy experiments to production-grade workflows, engineering teams face new orchestration challenges. LLM-based agents don’t just need GPUs—they need state, retries, memory & coordination across tools, APIs, & data systems. Enter Argo: the battle-tested workflow engine that’s quietly becoming the backbone of serious agentic infrastructure. In this talk, we’ll explore how Argo Workflows power the runtime execution of complex agent pipelines—chaining tasks, managing dependencies & scaling based on dynamic resource needs—while Argo CD provides GitOps-style control over the K8s infra and configurations those agents rely on. Together, they enable full lifecycle management: declarative deployment, dynamic execution, and safe rollbacks. We’ll also share how layering agents on top of Argo unlocks new capabilities—from safely iterating on prompt-driven pipelines, to automatically rolling back failed steps, to integrating observability and human-in-the-loop checkpoints.
Feature Flag Driven Development: Seamlessly Integrate Feature Flags Into Your SDLC#
Time: 3:15pm EST - 3:45pm EST
Speakers: Kris Coleman (Director of Platform Engineering, TestifySec); Michael Beemer (Senior Product Manager, Dynatrace)
Venue: Building B | Level 4 | B406b-407, Atlanta, GA, USA
Type: APPLICATION DEVELOPMENT
Description: Feature flags decouple deployment from release, enabling safe rollouts and continuous delivery. But at scale, they introduce challenges: drift between environments, inconsistent definitions, and runtime bugs from undefined or mistyped flags. At one of the largest health systems in the US, daily releases using feature flags boosted delivery speed but exposed risks due to poor flag hygiene. To address this, we created Feature Flag Driven Development (FFDD); a workflow that embeds flag management into the SDLC. This talk introduces FFDD, a practice that treats feature flags as first-class citizens. Built on the CNCF OpenFeature spec and tools like the OpenFeature CLI, FFDD eliminates manual coordination, reduces errors, and enables GitOps-driven promotion. We’ll demo the full FFDD flow: defining flags, generating type-safe code, validating in CI, syncing flags, and promoting safely across environments using GitHub Actions.
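One piece of the flag-hygiene workflow described above, validating in CI that every flag referenced in code is actually defined and has the expected type, can be sketched as follows. The definition format and helper names are hypothetical illustrations, not the OpenFeature CLI's actual schema.

```python
# CI-time flag validation sketch: every flag usage found in code must
# match a definition. FLAG_DEFS and the usage tuples are illustrative.

FLAG_DEFS = {"new-checkout": "boolean", "banner-text": "string"}

def validate_usages(usages: list) -> list:
    """usages: (flag_key, expected_type) pairs extracted from code.

    Returns human-readable errors; an empty list means the build may proceed.
    """
    errors = []
    for key, expected in usages:
        if key not in FLAG_DEFS:
            errors.append(f"undefined flag: {key}")
        elif FLAG_DEFS[key] != expected:
            errors.append(
                f"type mismatch for {key}: code expects {expected}, "
                f"defined as {FLAG_DEFS[key]}"
            )
    return errors

print(validate_usages([("new-checkout", "boolean"), ("dark-mode", "boolean")]))
```

Failing the pipeline on a non-empty result is what catches the "undefined or mistyped flag" class of runtime bugs before deployment.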
Quorum-Based Consistency for Cluster Changes With CloudNativePG Operator#
Time: 3:15pm EST - 3:45pm EST
Speakers: Jeremy Schneider (Postgres Engineer, GEICO Tech); Leonardo Cecchi (CloudNativePG Maintainer, EDB)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 2-3, Atlanta, GA, USA
Type: DATA PROCESSING + STORAGE
Description: Most people don’t think of Postgres in the context of quorum or distributed systems theory, but vanilla open source Postgres has supported quorum commits across multiple replicas for almost 10 years now. Technologies like Cassandra and Dynamo popularized quorum consistency in the hot path of distributed writes and reads, but the theory also applies to cluster reconfigurations in a single-writer database like Postgres. Stateful operators at Level V of the capabilities framework require very careful end-to-end coordination between control plane and data plane algorithms to avoid data loss when providing auto-healing under circumstances like network partitions or compounded failures. This session will explore how quorum consistency can be applied in the CloudNativePG operator, offering lessons and insights to fellow maintainers of other Kubernetes operators for stateful workloads.
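The quorum rule at the heart of this is small: an operation counts as durable once a majority of replicas acknowledges it. (Postgres's `synchronous_standby_names` also allows arbitrary `ANY k (...)` quorums; majority is simply the common choice.) A minimal sketch, with replica names purely illustrative:

```python
# Majority-quorum sketch: a write or reconfiguration is durable once a
# majority of replicas has acknowledged it. Replica names are illustrative.

def quorum_size(n_replicas: int) -> int:
    """Smallest majority of n_replicas."""
    return n_replicas // 2 + 1

def is_committed(acks: set, replicas: list) -> bool:
    """True once acknowledgements cover a majority of the replica set."""
    return len(acks & set(replicas)) >= quorum_size(len(replicas))

replicas = ["pg-1", "pg-2", "pg-3"]
print(is_committed({"pg-1", "pg-3"}, replicas))  # True: 2 of 3 is a majority
print(is_committed({"pg-2"}, replicas))          # False: 1 of 3 is not
```

Because any two majorities overlap, a control plane that only promotes a replica confirmed by a quorum can never lose an acknowledged write during a partition, which is the property the talk applies to operator-driven reconfiguration.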
Dragonfly v2.3.0: Intro, Updates, Model Distribution With Cloud Native Infra#
Time: 3:15pm EST - 3:45pm EST
Speakers: Wenbo Qi & Tao Peng (Ant Group)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Dragonfly provides efficient, stable, and secure file distribution and image acceleration using P2P technology within cloud-native architectures. This talk will briefly introduce Dragonfly and highlight the features of its latest version. Key updates include enhanced security and new functionalities tailored for more efficient and robust model distribution. We will also demonstrate how Dragonfly preheats and distributes AI models (packaged as OCI Artifacts) to read-only volumes in Kubernetes, enabling faster deployments.
Mapping the Next Phase: Updating the Cloud Native Maturity Model for 2025, The AI Era and Beyond#
Time: 3:15pm EST - 3:45pm EST
Speakers: Danielle Cook (CNCF Ambassador, Co-Organizer, Cartografos Working Group, Akamai); Simon Forster (Technical Architect, Stackegy); Robert Glenn (CEO, Glennium)
Venue: Building C | Level 1 | C111-112, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: The CNCF Cloud Native Maturity Model has long been a guiding framework for organizations navigating cloud native adoption. But the ecosystem moves fast, and so must the model. In this interactive session, members of the CNCF Cartografos Working Group will unveil the latest evolution of the model, designed to meet the demands of today’s fast-changing landscape, including AI, security, platform engineering and developer experience. We’ll share what’s changed, explore how to apply the model, and open the floor for your input. Whether you’re leading transformation, building cloud native applications and platforms, or guiding your team through complexity, this session will help you benchmark where you are, align across stakeholders, and influence where the model goes next. Bring your challenges, your wins, and your voice. This is more than a presentation—it is your chance to help shape the community’s vision of what the cloud native maturity journey looks like.
Scaling and Securing CoreDNS: Performance and Resilience#
Time: 3:15pm EST - 3:45pm EST
Speakers: Yong Tang (Director of Engineering, DataDirect Networks); John Belamaric (Senior Staff Software Engineer, Google)
Venue: Building C | Level 3 | Georgia Ballroom 1, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: CoreDNS continues to evolve as the DNS backbone of Kubernetes, with recent updates focused on performance, extensibility, and operational hardening. This session explores a new plugin that improves multi-core scalability, enabling higher throughput and lower latency in large-scale clusters. We’ll share tuning strategies, production insights, and how CoreDNS is adapting to modern DNS workloads. We’ll also touch on best practices for securing DNS in Kubernetes—including common attack patterns like spoofing and cache abuse—and how CoreDNS features can help mitigate them. Finally, we’ll review recent plugin ecosystem changes and preview what’s ahead on the project roadmap. Whether you operate clusters or contribute upstream, this session offers practical guidance on running CoreDNS securely and efficiently at scale.
SIG Cloud Provider Deep Dive: Expanding Our Mission#
Time: 3:15pm EST - 3:45pm EST
Speakers: Bridget Kromhout (Principal Product Manager, Microsoft); Michael McCune (Senior Principal Software Engineer, Red Hat); Joel Speed (Principal Software Engineer, Red Hat); Walter Fender (Staff Software Engineer, Google); Jesse Butler (Principal PM, AWS)
Venue: Building C | Level 3 | Georgia Ballroom 3, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: When SIG Cloud Provider began, a major focus was the migration of cloud provider code out of the main Kubernetes code repository. The successful out-of-tree migration increased flexibility but also added challenges for testing and verification. We’ll cover where efficiencies and coordination between providers are making the ecosystem more robust. CSI, CNI, and other domain-specific API solutions often lack broader reuse across the ecosystem, so providers themselves are collaborating to build the authoritative low-level KRM layer, clearing the way with robust cloud resource management abstractions that allow builders to create solutions at higher layers. The SIG maintainers will discuss how deeper collaboration and cross-provider building blocks in the Kubernetes community will lead to a better platform for everyone. Expect to walk away from this talk with a clear vision of where SIG Cloud Provider has evolved and how you can contribute to shaping the SIG’s future!
Deep Dive: Handling Kubernetes Memory Pressure & Achieving Workload Stability With NodeSwap#
Time: 3:15pm EST - 3:45pm EST
Speakers: Ajay Sundar Karuppasamy (Google LLC); Itamar Holder (Red Hat)
Venue: Building B | Level 4 | B405-406a, Atlanta, GA, USA
Type: OPERATIONS + PERFORMANCE
Description: Effectively managing memory pressure is key to keeping Kubernetes workloads stable and preventing OOM errors. This talk shows how using disk space as temporary memory (swap) on Kubernetes nodes enhances application stability and makes nodes more resilient to memory spikes. The Kubernetes NodeSwap feature, set for GA in 1.34, promises better resource management and node stability. NodeSwap protects long-running applications from abrupt termination during sudden memory spikes. However, swap is not without trade-offs: it can decrease predictability and degrade performance for some workloads. We’ll share our stress-test learnings on potential problems and on performance tuning with kernel parameters to optimize swap on Kubernetes nodes. Attendees will learn practical swap utilization limits, recommended configurations, and their effects on node stability. We will also discuss ongoing work on a critical pod-level API for fine-grained swap control and workload compatibility.
Finally, a Cluster Inventory I Can USE!#
Time: 3:15pm EST - 3:45pm EST
Speakers: Corentin Debains (Software Engineer, Google); Ryan Zhang (Principal Software Engineering Manager, Microsoft)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 4, Atlanta, GA, USA
Type: OPERATIONS + PERFORMANCE
Description: For years, multi-cluster systems created ad hoc, proprietary lists of clusters, which led to fragmentation in the market. Ever dreamt of having a cluster inventory that you could use with every controller, on every platform? This talk introduces the newly standardized credential support for the ClusterProfile CRD, resolving a key blocker for the multi-cluster controller community. Learn how SIG Multicluster’s year-long effort has culminated in a universal cluster inventory, usable with any controller on any platform. We will also go over some of the fleet managers that support ClusterProfile, and how a multi-cluster application can leverage the ClusterProfile API to interact with multiple clusters at the same time. Additionally, we will discuss best practices and future extensions for ClusterProfile. This session presents a unified approach to multi-cluster management, eliminating ad hoc solutions and fostering interoperability.
From Code To Cluster: Orchestrating 100,000+ Kubernetes Deployments With 1 Pipeline#
Time: 3:15pm EST - 3:45pm EST
Speakers: Andrada Raducanu (DevOps Engineer, ING Hubs Romania)
Venue: Building B | Level 3 | B312-314, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: There is a sea of tools one can use for the critical deployment phase of your SDLC. To keep our environment secure and reliable, ING chose to work with Kubernetes and Azure DevOps. In this talk, we will share the success story of how 1,400 in-house developed APIs reached 100k+ production deployments in half a year, using a single pipeline. To stay in control, we use Open Policy Agent. To ensure the reliability and resilience of the APIs, we use tools such as QuotaAutoscaler (an ING open source CRD) and HorizontalPodAutoscaler, native rollback mechanisms with Helm, automatic certificates with cert-manager, and Prometheus monitoring. The pipeline deploys code to Azure Kubernetes Service and on-prem Kubernetes clusters. This solution was built as a platform, designed to be agnostic to the target system, reducing the cognitive load on teams and allowing them to focus on application development. We call it The Kingsroad.
Turbocharging Argo CD: Replacing Redis With Dragonfly for Better Performance and Lower Bills#
Time: 3:15pm EST - 3:45pm EST
Speakers: Soumya Ghosh Dastidar & Justin Marquis (Founding Engineer, Akuity Inc.)
Venue: Building B | Level 3 | B304-305, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Argo CD is a foundational tool in GitOps workflows, enabling declarative continuous delivery in Kubernetes environments. However, it relies heavily on Redis for caching and state management, making Redis a critical component that can become a bottleneck as workloads scale. Enter Dragonfly, a drop-in Redis replacement purpose-built for modern cloud hardware. Dragonfly delivers up to 25x better performance at 80% lower cost, making it a compelling alternative for performance-intensive Kubernetes-native applications. In this session, we’ll share how we seamlessly replaced Redis in Argo CD with Dragonfly, using the Dragonfly Operator without modifying Argo CD’s internals. You’ll learn how to deploy Dragonfly with Kubernetes-native tooling, manage it at scale, and configure it for high availability, security, and resilience. This talk is ideal for platform engineers, SREs, DevOps teams, and Kubernetes users seeking improved performance, lower costs, or easier Redis operations.