KubeCon + CloudNativeCon North America 2025#
Wednesday, November 12, 2025#
Total Sessions: 165
Badge Pick-Up#
Time: 8:00am EST - 6:00pm EST
Venue: Building B | Level 4 | Registration Hall B, Atlanta, GA, USA
Type: REGISTRATION
Coat + Bag Check#
Time: 8:00am EST - 6:30pm EST
Venue: Building A | Level 4 | A412, Atlanta, GA, USA
Type: REGISTRATION
Description: Please note that we are unable to store any items overnight, and cameras, laptop equipment, or any other electronic devices cannot be stored at any time.
Keynote: Welcome Back + Opening Remarks#
Time: 9:00am EST - 9:05am EST
Speakers: Jorge Castro (Developer Relations, Cloud Native Computing Foundation)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Keynote: Turn Up the Heat: Driving Cloud Native Innovation into Real-World Impact#
Time: 9:07am EST - 9:22am EST
Speakers: Maura Kelly (Director of Engineering, Mailchimp); Steven Bower (Manager, Cloud Native Compute Services, Bloomberg); Adam Kocoloski (Distinguished Engineer, Airbnb); Liguang Xie (Engineering Director, ByteDance)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Description: In this keynote, leaders from end user companies will share how their organizations leverage cloud native technologies to tackle real-world challenges, scale efficiently, and drive innovation. Through firsthand insights, you’ll learn key lessons, best practices, and the evolving needs of cloud native adopters.
MAILCHIMP: Maura Kelly, Engineering Director at Mailchimp, will share how the team migrated a massive on-prem monolith to Intuit’s cloud-native platform built on CNCF technologies, achieving 99.997% availability while development continued. She’ll highlight three key engineering practices that made this scale and reliability leap possible.
BLOOMBERG: Learn about Bloomberg’s Kubernetes journey, which started with us building AI infrastructure and has led to us building cloud native platforms. Steve Bower, Manager of Bloomberg’s Cloud Native Compute Services Engineering group, will briefly highlight some of the many different platforms our engineers have built atop Kubernetes using open source CNCF projects, and how his team is now using KServe, Karmada, and the Envoy AI Gateway to power the next generation of our enterprise Gen AI infrastructure and workloads.
AIRBNB: Description TBA
AIBRIX: AIBrix was born from our mission to run large-scale model inference efficiently across heterogeneous accelerators, regions, and even clouds, all in a Kubernetes-native way that developers already love. Over the past year, we’ve evolved AIBrix into a powerful, modular GenAI inference infrastructure built on Kubernetes, featuring innovations like distributed KVCache offloading, multimodal serving, intelligent request routing, and flexible orchestration for disaggregated inference. Today, I’m thrilled to announce that AIBrix has been invited to join the CNCF, and we warmly invite the community to collaborate with us in shaping the next generation of open, scalable AI infrastructure.
Keynote: How One Line of Code Freed 30,000 CPU Cores: Deep-Diving Fluent Bit at Petabyte Scale#
Time: 9:24am EST - 9:29am EST
Speakers: Fabian Ponce (Member of Technical Staff, OpenAI)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Description: Processing 9+ petabytes of logs daily, OpenAI’s Kubernetes fleet was maxed out; throttling events were dropping critical logs and there was no more CPU to allocate. Using perf to profile Fluent Bit revealed an unexpected culprit: millions of unnecessary syscalls from inotify’s interaction with line-by-line log flushing. A one-line configuration change cut CPU usage by 50%, freed 30,000 cores for other workloads, and eliminated the largest source of throttling events. Learn how understanding tools at the system level unlocks optimizations that benefit everyone, and why OpenAI is contributing this fix upstream to the CNCF community.
Keynote: Awards Ceremony#
Time: 9:31am EST - 9:41am EST
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Sponsored Keynote: Universal Mesh: Simplifying Modern Connectivity#
Time: 9:43am EST - 9:48am EST
Speakers: Frank Mancina (VP of Engineering and Operations, HAProxy Technologies)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Description: Universal Mesh is an architectural pattern that emerged from observing common connectivity challenges faced by HAProxy customers. Unlike traditional, more complex service meshes, Universal Mesh is a fresh approach to creating a system that facilitates more seamless and secure connectivity. This pattern lets you connect systems across different environments, like business units, cloud regions, and outside partners. It allows both Kubernetes and non-Kubernetes services to communicate seamlessly, no matter where they are located. It offers a streamlined approach to security, observability, and scalability, integrating with existing brownfield services and Kubernetes fundamentals without altering network fundamentals. Find out how Universal Mesh can connect fragmented setups, simplifying the complex connectivity issues that modern teams face.
Keynote: There’s Nothing to Fear About the EU’s New Cybersecurity Law#
Time: 9:50am EST - 9:55am EST
Speakers: Greg Kroah-Hartman (Linux Kernel Maintainer & Fellow, The Linux Foundation)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Description: There has been a lot of uncertainty around the European Union’s new Cyber Resilience Act when it comes to open source projects and developers. This short talk will cover the basics of what developers need to know about this law, which will affect them no matter where they live, and the two tiny things that project maintainers will need to do to comply with it, both of which they should already be doing.
Sponsored Keynote: Accelerating Innovation: The Evolution of Kubernetes and the Road Ahead#
Time: 9:57am EST - 10:03am EST
Speakers: Jago Macleod (Engineering Director, Kubernetes & GKE, Google)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Description: In its first decade, Kubernetes earned a critical place in modern workload and infrastructure orchestration - but new workloads and unprecedented scale were pushing the limits. Last year, we highlighted some fundamental gaps in Kubernetes for emerging workloads, and shared Google’s commitment to invest in its evolution. In this keynote, we will reflect on the foundational role and exciting future of Kubernetes as the leading platform for both traditional and AI workloads. Kubernetes evolved quickly over the past year, due largely to the declarative API, the modular and extensible nature of the platform, and the strength of the ecosystem - and innovation is only accelerating. With some key new primitives landing recently, and the broader evolution of Kubernetes well underway, we’ll close with a vision of the future and highlight some exciting projects underway.
Keynote: Cloud Native for Good#
Time: 10:05am EST - 10:20am EST
Speakers: Faseela K (Experienced Cloud-native Developer, Ericsson Software Technology); Omar Mohsine (Open Source Coordinator, United Nations); Roberto Machorro (Sr. Software Developer, Child Rescue Coalition); Bodhish Thomas (Chief Architect, Open Healthcare Network); Jayson Workman (Director of Platform Engineering, American Red Cross)
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Description: Cloud native technologies have revolutionized industries across the board, but what about their role in solving real-world societal challenges? While we often celebrate enterprise success stories, the true transformative power of Kubernetes and its wider ecosystem is also unfolding in unexpected and deeply impactful ways. Did you know Kubernetes is helping over 30 governments map, monitor, and improve internet access for schools? Or that KubeVirt and Longhorn have helped rescue 3,800+ children from abuse, leading to the arrest of 16,700+ online predators in over 105 countries? Or that Prometheus and OpenTelemetry not only power platforms that have saved millions from floods and pandemics but, closer to home, also team up with Kubernetes to steer the fleet, check its vitals, and trace every heartbeat of 2,000+ TeleICU Bed edge nodes in perfect rhythm across geographically dispersed locations? Or that Kubernetes, ArgoCD, and other CNCF tools power mission-critical services like blood supply management, disaster response, and emergency training, stretching every dollar to maximize humanitarian impact? In this compelling panel, Faseela brings together real-world end-user stories from around the globe, showcasing how open source and cloud-native technologies are being harnessed to tackle humanitarian issues. From disaster response and education to public health, child safety, and environmental conservation, learn how organizations are using cloud-native tooling to collaborate at scale, unlock powerful analytics, and reallocate resources from overhead to frontline impact. With expert speakers from diverse domains, this session offers a unique look into how Kubernetes and its ecosystem are driving innovation and, ultimately, social good, sustainability, and inclusivity beyond the enterprise world!
Keynote: Closing Remarks#
Time: 10:20am EST - 10:30am EST
Venue: Building B | Level 1 | Exhibit Hall B2, Atlanta, GA, USA
Type: KEYNOTE SESSIONS
Coffee Break ☕#
Time: 10:30am EST - 11:00am EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: BREAKS
Relaxation Station#
Time: 10:30am EST - 5:00pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: EXPERIENCES
Description: Take a break from the buzz of the Solutions Showcase and sit back and relax at the Relaxation Station. Enjoy a soothing massage, try your hand at crocheting, or challenge someone to a game of chess. This is the perfect spot to recharge and unwind before diving back into action.
Gold Sponsor In-Booth Demos#
Time: 10:30am EST - 11:00am EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description:
Sponsor: Aqua | Demo: See Aqua Secure AI in Action! | Booth Number: 230
Sponsor: Canonical | Booth Number: 821
Sponsor: Odigos | Demo: Using eBPF to Power OpenTelemetry at Scale | Booth Number: 831
In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or access sponsored content. You are never required to visit third party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.
Solutions Showcase#
Time: 10:30am EST - 5:00pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Sponsored Demo: Beware Argo CD App Promotion Anti-Patterns and Embrace Scalable Promotion in GitOps Cloud#
Time: 10:35am EST - 10:55am EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: We all love and use Argo CD to sync our applications to clusters. With thousands of applications, what’s the best way to promote changes from one application to another? In 2025, anti-patterns abound, with many teams adopting backwards and even dangerous methods to orchestrate changes across applications. In this session, we’ll expose anti-patterns like promoting SHAs, misusing branch environments, and many more. We’ll replace these with a pathway to handling app and environment promotion that is flexible, easy to configure, and incredibly scalable, even to the point of handling thousands of deployment targets with a couple of simple Kubernetes CRDs for configuration. We’ll show the value of abstracting clusters into environments, how to build relationships and customize diffing between applications, and how to streamline your change management in 2026.
Project Pavilion Tour#
Time: 10:40am EST - 11:00am EST
Speakers: Lori Lorusso (Director of Outreach, Rust Foundation)
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Project Pavilion, Atlanta, GA, USA
Type: PROJECT OPPORTUNITIES
Description: Explore the Project Pavilion, a hub of innovation and discovery! Take part in daily tours, interact with project maintainers at their kiosks, gain insights on community engagement and KCD event organization, and learn more about certification opportunities to showcase your expertise. This tour will include an introduction to the Pavilion, making introductions, interacting with maintainers, and ensuring you end up talking to the right projects!
Learning Lounge: What Platform Engineers Need to Know About Developer Experience#
Time: 10:45am EST - 11:00am EST
Speakers: Mauricio Salatino (Ecosystem Engineer, Diagrid)
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Learning Lounge, Atlanta, GA, USA
Type: EXPERIENCES
Description: 10-Minute Tip Talk
Models as Microservices, Platforms as Partners: Collaboratively Building ML Infra at Hinge#
Time: 11:00am EST - 11:30am EST
Speakers: Stephanie Pavlick (Machine Learning Platform Engineer, Hinge)
Venue: Building B | Level 4 | B401-402, Atlanta, GA, USA
Type: AI + ML
Description: What does it take to build an online model serving platform that ML engineers actually want to use? At Hinge, we’re building an ML platform that smooths the rough edges of OSS Kubernetes-native tools with interfaces aligned to internal conventions like gRPC and OTel, so ML engineers can focus on models that shine. In this talk, Stephanie Pavlick shares how Hinge’s AI Platform Core team designed a self-serve platform for deploying and monitoring online models using Ray Serve, MLflow, Grafana, and more. The platform enables ML engineers to deploy models as microservices without touching Kubernetes or Helm while offering fine-grained observability and standardized SLIs. But the story is more than just tooling. Stephanie will share how the team cut model production timelines by over 40% through early partnerships and a focus on developer experience. Along the way, they fostered a culture of collaboration and trust that helped drive broader adoption across Hinge’s engineering org.
Your Kubernetes Playbook at Your Fingertips: Advanced Troubleshooting With MCP, RAG, and K8sgpt#
Time: 11:00am EST - 11:30am EST
Speakers: David vonThenen (NetApp); Yash Sharma (DigitalOcean)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 1, Atlanta, GA, USA
Type: AI + ML
Description: Kubernetes environments evolve quickly, making static troubleshooting methods outdated soon after they’re deployed. k8sgpt has emerged as a powerful tool to simplify diagnostics, but what if it could dynamically adapt to your team’s runbooks and instantly consult the latest Kubernetes documentation? Leveraging the Model Context Protocol (MCP) and Retrieval-Augmented Generation (RAG), this session introduces a novel integration enabling k8sgpt to ingest real-time updates, ensuring your diagnostics remain perpetually relevant and accurate. Further, by harnessing RAG explicitly tailored to your organization’s unique operational playbooks, k8sgpt doesn’t just diagnose… it proactively recommends solutions aligned with your workflows. Through practical demonstrations, attendees will witness firsthand how MCP-enabled RAG transforms k8sgpt troubleshooting into a highly customized and continuously evolving practice, significantly improving response times, accuracy, and operational agility.
“Do You Even Merge?” Welcome To Maintainers Life (Please Bring Snacks and Boundaries)#
Time: 11:00am EST - 11:30am EST
Speakers: Nitish Kumar (Akuity); Verónica López (Software Engineer, AuthZed); Lee Calcote (Founder, Layer5)
Venue: Building B | Level 3 | B308-309, Atlanta, GA, USA
Type: CLOUD NATIVE EXPERIENCE
Description: Open Source maintainers are often seen as gatekeepers of progress, expected to fix bugs, review PRs, design features, answer questions, and keep the community happy. But behind that façade is often an unexpected reality, especially in high-velocity CNCF projects. In this talk, we will share what it’s like to maintain a CNCF Graduated project, not just the technical side, but the human side: how we keep the project growing, prioritize responsibly, and how governance structures like SIGs help scale responsibilities. You’ll hear stories, lessons, and hard truths that rarely make it into blog posts. The more successful a project becomes, the harder it gets to sustain, and that’s something we don’t acknowledge enough. If you’ve ever wondered why your issue didn’t get a response, why a feature was declined, or what “being supportive” really looks like in open source, this talk will help you understand the maintainer’s side and maybe rethink your own role in the ecosystem to support them.
The OS Behind the Curtain: What Happens on Your Nodes When Things Happen in Your Cluster#
Time: 11:00am EST - 11:30am EST
Speakers: Joe Thompson (Consulting Engineer, Clarity Business Solutions)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 4, Atlanta, GA, USA
Type: CLOUD NATIVE NOVICE
Description: Sometimes it seems like magic when we deploy a service with a few lines of YAML and Kubernetes starts actually routing traffic to our pods – or we create those pods with a certain configuration and Kubernetes sets everything up for us. But what is Kubernetes actually doing? In many cases, it’s actually the node’s OS that does the heavy lifting behind the scenes, through existing OS features that were used to manage workloads for years before Kubernetes appeared. In this session, Joe pulls aside the curtain to give you a tour of the OS nuts and bolts behind the magic of services, pods, and other mystical Kubernetes incantations. Whether you’re a novice who’s never heard of cgroups or iptables, or a more experienced user who just wants to learn how everything you already knew ties together, you’ll see what your nodes do to make Kubernetes possible and gain a deeper understanding of why certain things in your cluster work the way they do.
Agent-Driven MCP for AI Workloads on Kubernetes#
Time: 11:00am EST - 11:30am EST
Speakers: Ganeshkumar Ashokavardhanan & Qinghui Zhuang (Software Engineer 2, Microsoft)
Venue: Building B | Level 2 | B206, Atlanta, GA, USA
Type: EMERGING + ADVANCED
Description: Managing AI inference workloads on k8s is hard: you need to choose the right GPU instance type, configure services, balance costs vs. performance, write tedious YAML, and continuously monitor utilization and inference latency. What if you could leverage AI to determine these factors with high accuracy? Learn how to build an end-to-end AI-PaaS on k8s by combining cloud-native tools, Model Context Protocol (MCP) servers, and intelligent agents. We will show how an agent can resolve a simple text command (“deploy llama-3-70b-chat”), call out to external MCP metadata services (e.g. HuggingFace), calculate the optimal GPU topology, provision nodes via the Kubernetes AI Toolchain Operator, deploy the model, and then automatically scale based on real-time metrics, all without hand-editing a single manifest. We will also discuss how to address underspecified aspects (like model quantization levels and cost vs. latency tradeoffs), and the guardrails needed to validate before deploying.
🚩Capture The Flag Experience#
Time: 11:00am EST - 4:45pm EST
Venue: Building B | Level 2 | B203, Atlanta, GA, USA
Type: EXPERIENCES
Description: The Capture The Flag (CTF) experience runs concurrently with KubeCon + CloudNativeCon North America 2025! Delve deeper into the dark and mysterious world of Cloud Native security! Exploit a supply chain or foothold attack and start your journey deep inside the target infrastructure, utilize your position to hunt and collect the flags, and hopefully learn something new and wryly amusing along the way! Attendees can play three increasingly treacherous and demanding scenarios to bushwhack their way through the dense jungle of Cloud Native security. Everybody is welcome, from beginners to seasoned veterans, as we venture amongst the low-hanging fruits of insecure configuration and scale the lofty peaks of cluster compromise!
Building Resilience Workshop#
Time: 11:00am EST - 12:00pm EST
Venue: Building B | Level 2 | B211-212, Atlanta, GA, USA
Type: INCLUSION + ACCESSIBILITY
Description: We all encounter unexpected hurdles that can impact our well-being and productivity. This workshop offers practical tools to build resilience and navigate those challenges effectively. Through shared experiences and adaptable strategies, you’ll learn to strengthen your ability to bounce back and maintain focus, leaving with a personalized toolkit to confidently manage adversity, no matter where you are in the world.
Gateway API: Table Stakes#
Time: 11:00am EST - 11:30am EST
Speakers: Shane Utt (Senior Principal Software Engineer @ Red Hat | SIG Network Chair, Gateway API Maintainer, Red Hat); Candace Holman (Associate Manager, Engineering, Red Hat); Mike Morris (Senior Product Manager, Microsoft); Lior Lieberman (Senior Engineering Lead, Google); Kellen Swain (Software Engineer, Google)
Venue: Building C | Level 3 | Georgia Ballroom 3, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Gateway API has matured into a core building block for cloud-native application networking. As organizations increasingly converge on this technology, it’s expanding to support critical new use cases, including AI Inference. In this session, maintainers will provide a comprehensive overview of the latest progress, challenges, and roadmap, highlighting the recent v1.4 release and future milestones. They’ll dive into complex topics such as handling conflicting use cases like TLS termination across multiple routes, efforts to create a more intuitive API, and the upcoming critical features. Beyond a mere status update, the talk will offer deep insights into project decision-making and the challenges of managing a large-scale open-source initiative. Whether you’re a platform builder, SIG contributor, or someone using ingress Gateways or service mesh in production, this session invites you to help shape the future of Kubernetes networking.
Introducing Helm 4#
Time: 11:00am EST - 11:30am EST
Speakers: Matt Farina & Robert Sirchia (Distinguished Engineer, SUSE)
Venue: Building C | Level 3 | Georgia Ballroom 1, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: The wait is over! After six years with Helm v3, Helm v4 is finally here. In this session you’ll learn about Helm v4, why there were six years between major versions (from backwards-compatible feature development to maintainer ups and downs), what’s new in Helm v4, how long Helm v3 will still be supported, and what comes next. Could that include a Helm v5?
Offline, Not Off-Limits: Edge Fleet Management With Argo CD#
Time: 11:00am EST - 11:30am EST
Speakers: Alexander Matyushentsev (Chief Software Architect, Akuity)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Kubernetes at the edge is no longer experimental; it’s becoming the default choice for various use cases. As adoption grows, new challenges emerge: managing scale, handling connectivity issues, and delivering updates. Sparse, unreliable internet connectivity requires rethinking typical GitOps workflows. This talk presents a production-grade, end-to-end architecture with Argo CD at the center, enabling centralized management of a fleet of disconnected or intermittently connected clusters. The design includes an in-cluster Git server and container image registry to ensure local availability of both manifests and images during offline operation. We’ll cover controlled sync strategies, secure update propagation during reconnection windows, and how to bootstrap new clusters in isolated environments. The talk concludes with a live demo of the full setup, and attendees will walk away with a reference implementation they can adopt for industrial, telco, or remote field deployments.
Strengthening Kubernetes Trust: SIG Auth’s Latest Security Enhancements#
Time: 11:00am EST - 11:30am EST
Speakers: Anish Ramasekar; Mo Khan; Stanislav Láznička; Rita Zhang; Peter Engelbert (Microsoft)
Venue: Building C | Level 1 | C111-112, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: SIG Auth is leading efforts to strengthen Kubernetes’ authentication and authorization foundations. This session covers recent and upcoming features shaping security across the stack. Secure image pulls are being enabled using ephemeral ServiceAccount tokens, reducing reliance on long-lived secrets and node-scoped credentials. Kubernetes is gaining a new mechanism for provisioning X.509 certificates directly to pods via the kubelet, enabling strong mTLS authentication and service-to-service communication. Kubelet serving certificate validation is being hardened to prevent node impersonation, especially in dynamic or on-prem environments. In resource management, DRA adds support for privileged admin access to devices in use, enabling secure diagnostics without weakening isolation. We’ll also cover current and future improvements in authorization, such as tighter policy for image pull operations. Join us to learn how these efforts are improving the trust model across Kubernetes.
Retrofitting OTEL Collectors & Prometheus: How To Overcome Scale/Design Limitations#
Time: 11:00am EST - 11:30am EST
Speakers: Vijay Samuel & Sandeep Raveesh (Observability Tech Lead, eBay)
Venue: Building B | Level 3 | B304-305, Atlanta, GA, USA
Type: OBSERVABILITY
Description: We have all run into this problem: “OTEL Collector/Prometheus does things differently and it doesn’t fit my organization’s design or can’t really scale to my need”. Once that statement is made, the almost immediate go-to is to either customize or ditch the project completely. We ran into this issue with some of the OTEL connectors/processors, which we were unable to operate. For example, eBay’s span traffic spans multiple Kubernetes clusters and multiple regions. The service graph connector would need to see spans globally routed into a single collector. At scale, this requires a lot of memory to account for all kinds of trace durations. Prometheus has no support for long-term retention of exemplars. In this talk we discuss practical solutions: how we leveraged our ClickHouse-based internal tracestore to provide spans to the servicegraph connector in a sustainable way to generate metrics, and how we used ClickHouse to provide long-term retention of exemplars.
Rearchitecting Compute at Coinbase: Migrating To Karpenter for Fast, Reliable Scaling#
Time: 11:00am EST - 11:30am EST
Speakers: Frances Chong (Staff Software Engineer, Coinbase)
Venue: Building B | Level 4 | B406b-407, Atlanta, GA, USA
Type: OPERATIONS + PERFORMANCE
Description: At Coinbase, we run thousands of workloads, from high-throughput APIs to long-lived blockchain nodes. As traffic patterns scaled with crypto’s volatility and expectations around reliability and cost tightened, our prior cluster setup using cluster-autoscaler and EKS managed node groups couldn’t strike the right balance. This talk covers why we moved to Karpenter (including pain points around scaling latency, bin packing, and satisfying unique topology and hardware requirements) and what it took to get there. You’ll learn how we implemented a scalable and tag-driven EC2NodeClass abstraction, tuned Karpenter for burst elasticity, and built observability and guardrails to make scaling safer and more predictable. Finally, we’ll share lessons from overly aggressive node consolidation, launch template drift, VPC-CNI coordination, and how we’ve adapted our platform to support rapid, responsive scaling under crypto-scale pressure.
Adopting a Fleet-first Mindset#
Time: 11:00am EST - 11:30am EST
Speakers: Andy Beane (Senior Software Engineer, Spotify)
Venue: Building B | Level 4 | B405-406a, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Navigating platform engineering at scale requires innovation. This talk explores Spotify’s ‘Fleet-first’ mindset—a strategic shift in managing and evolving our vast software ecosystem. More than just tools, it’s a cultural change: treating our software as a fleet to drive rapid, reliable, and scalable improvements. We’ll cover how Golden Tech, Declarative Infrastructure, and Fleet Management tools support this approach. From identifying the target fleet of software, to rolling out migrations of varying complexity, we’ll share how Spotify enables efficient change at scale. Real-world examples will show how a Fleet-first approach saves engineers time and improves overall tech health. Whether you’re managing large infrastructure or evolving engineering practices, you’ll gain actionable insights and a framework to adopt Fleet-first thinking—boosting both efficiency and resilience in your organization.
Making Platform Engineering Accessible: From Newcomer To Power User#
Time: 11:00am EST - 11:30am EST
Speakers: Luke Philips (Independent); Julia Furst Morgado (Dash0)
Venue: Building B | Level 3 | B312-314, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Platform engineering promises accelerated developer productivity, but complex abstractions often create barriers instead of removing them. How do you design platforms that welcome newcomers while empowering experienced users? Drawing from real-world implementations at CNCF Orgs and community insights from organizing major cloud native events, we’ll demonstrate progressive platform experiences that grow with developers. Learn about abstractions from Helm, ArgoCD, Dapr, or CNOE that level up without sacrificing flexibility, implement progressive patterns, and create feedback loops ensuring platforms evolve with user needs. Through practical examples, whether GitOps, Dapr, or other enablers, we’ll show how to build platforms serving both overwhelmed newcomers and power users demanding control. Get actionable strategies for making sophisticated platforms approachable and effective.
Patch Me If You Can: Tackling Outdated Addons Before They Become a Risk#
Time: 11:00am EST - 11:30am EST
Speakers: Stevie Caldwell & Andy Suderman (SRE Tech Lead, Fairwinds)
Venue: Building B | Level 3 | B302-303, Atlanta, GA, USA
Type: SECURITY
Description: Kubernetes addons often sit quietly behind the scenes—until they become your biggest security liability. Whether it’s an old version of a DNS provider, metrics server, or ingress controller, these components are essential to your cluster’s operation but rarely treated as part of a regular update cycle. In this session, we’ll dive into the risks of neglecting addon maintenance and share practical strategies for getting ahead of potential failures or vulnerabilities. You’ll learn how to assess addon health, prioritize updates, and communicate the business case for proactive maintenance—even when everything seems to be working “just fine.” Walk away with tools to build a repeatable, low-friction update plan that boosts both the security and reliability of your clusters.
🚨 Contribfest: From Farm (Fork) To Table (Feature): Growing Your First (Free-range Organic) Istio PR#
Time: 11:00am EST - 12:15pm EST
Speakers: Ian Rudie (Principal Software Engineer, Solo.io); Lin Sun (Head of Open Source & CNCF TOC, Solo.io); Faseela Kundattil (Experienced Cloud-native Developer, Ericsson Software Technology); Steven Jin (Software Engineer, Microsoft)
Venue: Building B | Level 2 | B207, Atlanta, GA, USA
Type: 🚨 CONTRIBFEST
Description: Curious about how we build the most powerful and innovative service mesh available? Want to get started as an open source contributor in a popular CNCF graduated project, learn about the codebase underlying the mesh or simply take a peek behind the curtain to see how it’s done? This is your chance! Join Istio maintainers for a session diving into the codebase and learn how you can join us to help shape the future of Istio. During this session we’ll cover the architecture of the two primary operating modes for Istio, how to set up your development environment, how to interact with the community and start contributing your first PR to Istio.
🚨 Contribfest: Kyverno - Let’s Build Together!#
Time: 11:00am EST - 12:15pm EST
Speakers: Jim Bugwadia & Cortney Nickerson (Head of Community, Nirmata)
Venue: Building B | Level 2 | B208, Atlanta, GA, USA
Type: 🚨 CONTRIBFEST
Description: This hands-on session is designed to enable end-users and ecosystem partners to contribute to Kyverno, a CNCF policy as code engine that elegantly solves critical challenges across security, automation, and compliance, by understanding the internals of the project and its governance. You will learn about Kyverno’s architecture, the role of each policy type, the components, how to set up your development environment, and how to contribute to the project. This session will be led by Kyverno maintainers and contributors and is organized so that both developers and non-developers can contribute across the software base, sample policies, and documentation. Join us to shape the future of cloud native governance together!
📚 Tutorial: Getting Started With gRPC: Hands-On Codelab#
Time: 11:00am EST - 12:15pm EST
Speakers: Richard Belleville & E John Feig (Staff Software Engineer, Google)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 2-3, Atlanta, GA, USA
Type: 📚 TUTORIALS
Description: Dive into the world of gRPC with this interactive codelab. Roll up your sleeves and build a fully functional gRPC service from the ground up, in Go, Java, or Python. You’ll gain first-hand experience with: - Protocol Buffers: You’ll learn how to define service contracts and data structures using this powerful interface definition language. - gRPC Code Generation: Streamline development by automatically generating Python code from your protobuf definitions. - Client/Server Communication: Implement client and server logic to establish seamless communication between distributed components. - Error Handling and Interceptors: Explore techniques for graceful error handling and implementing middleware using gRPC interceptors.
LFX Insights#
Time: 11:05am EST - 11:30am EST
Speakers: Jonathan Reimer (The Linux Foundation)
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Project Pavilion, Atlanta, GA, USA
Type: PROJECT OPPORTUNITIES
Description: LFX Insights is a Linux Foundation initiative that provides standardized, data-driven metrics to help you understand the open source projects you rely on. We’ll explain how we identify critical projects, outline the framework used to evaluate project health, and share the data systems behind our analyses. By improving transparency and access to key indicators, LFX Insights aims to enable better decisions, identify where support is needed, and strengthen the long-term sustainability of the open source ecosystem.
Sponsored Demo: No-Touch Observability with the OpenTelemetry Operator and Dash0#
Time: 11:05am EST - 11:25am EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: Observability should empower developers - not burden them. Yet too often, instrumenting applications for traces, metrics, and logs means code changes, redeploys, and ongoing friction. In this demo-focused session, we’ll showcase the OpenTelemetry Operator, the community project that brings no-touch instrumentation to your Kubernetes workloads. You’ll see how platform engineers can: - Deploy the OpenTelemetry Operator and Collectors - Instrument applications automatically - without touching a single line of code - Correlate metrics, traces, and logs across services to unlock deep insights We’ll finish by visualizing the results in Dash0, showing how open standards like OpenTelemetry let you achieve full-stack observability with minimal effort - and without vendor lock-in. In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or access sponsored content. You are never required to visit third party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.
Sponsored Demo: Streamlining Kubernetes Operations with Model Context Protocol#
Time: 11:35am EST - 11:55am EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: Kubernetes operations can be complex, especially when it comes to writing manifests, tuning configurations, and troubleshooting live clusters. In this demo, we’ll show how the open source Amazon EKS MCP server bridges AI assistants with Kubernetes clusters to make these tasks faster and easier. You’ll see live examples of AI agents using MCP to: 1/Generate Kubernetes manifests taking advantage of EKS Auto Mode configurations 2/Provide context-aware guidance for cluster setup and management. 3/Analyze logs, events, and cluster state to troubleshoot common issues. By the end of the session, you’ll understand how MCP gives AI tools the context they need to work with Kubernetes effectively—turning complex operations into simple, guided workflows.
AI Models Are Huge, but Your GPUs Aren’t: Mastering Multi-Node Distributed Inference on Kubernetes#
Time: 11:45am EST - 12:15pm EST
Speakers: Ernest Wong (Software Engineer, Microsoft); Jiaxin Shan (Software Engineer, ByteDance)
Venue: Building B | Level 2 | B206, Atlanta, GA, USA
Type: AI + ML
Description: As AI models like DeepSeek-R1 grow beyond 600B parameters, deploying them for inference becomes a major infrastructure challenge. This talk goes beyond initial setup to show how Kubernetes can support massive AI workloads reliably and efficiently in production. We’ll cover day 0/1 operations with latency, cost, and accuracy tradeoffs in mind: selecting full-precision vs. quantized models; sizing worker nodes for GPU, memory, and networking; managing model parallelism; traffic routing; and adaptive strategies to balance cost and performance. We’ll explore Kubernetes-native challenges like topology-aware scheduling, GPU-NIC binding, and orchestrating inference phases with custom controllers. To support varied prompt lengths, we’ll discuss Prefill/Decode disaggregation in static and pooled modes. Insights come from benchmarks and production experience confirming what works at scale. Attendees leave with diagrams, checklists, and manifests to deploy confidently.
Navigating the AI/ML Networking Maze in Kubernetes: Lessons From the Trenches#
Time: 11:45am EST - 12:15pm EST
Speakers: Antonio Ojea (Staff Software Engineer, Google)
Venue: Building B | Level 4 | B401-402, Atlanta, GA, USA
Type: AI + ML
Description: AI/ML workloads are pushing HPC networking concepts like RDMA, MPI, and patterns for distributed collective operations into Kubernetes. This creates a new learning curve for many platform and infrastructure engineers, especially those looking to bridge their experience from more traditional networking paradigms. This session shares our practical lessons, learned from the trenches while developing networking solutions for these demanding environments. We’ll demystify these advanced technologies and discuss the intricacies of integrating specialized hardware, managing out-of-band RDMA, and understanding the communication patterns vital for distributed training. Discover how Kubernetes, particularly through Dynamic Resource Allocation (DRA), is adapting to expose and manage these complex resources. Gain real-world insights, drawn from our experience building a DRA-based network driver, on making advanced AI/ML networking more accessible and manageable in your cloud-native stack.
Unlocking Financial Progress: Credit Karma’s AI Assistant Powered by Kubernetes for Personalized Ins#
Time: 11:45am EST - 12:15pm EST
Speakers: Raj Kiran Gupta Katakam & Sukanya Moorthy (Staff Machine Learning Engineer, Credit Karma)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 1, Atlanta, GA, USA
Type: AI + ML
Description: Credit Karma empowers 140M+ members toward financial progress with a personalized, data-driven Financial Assistant, offering secure insights and actionable recommendations. Building this requires a robust, scalable, and secure cloud-native architecture. This session details how Kubernetes is foundational to Credit Karma’s AI Financial Assistant by showcasing its critical role in: - Retrieval Augmented Generation (RAG): K8s orchestrates RAG pipelines for accurate, context-aware LLM financial advice. - LLM Guardrails for Safety: Discover our Kubernetes-native approach to dynamic guardrails, crucial for LLM safety and compliance in finance. K8s Sidecars enable agile deployment within our multi-infrastructure serving strategy (including Vertex/AWS). - Content Quality Evaluation: K8s powers automated evaluation pipelines, continuously improving recommendation trustworthiness at scale.
Adapt, Include, Thrive: Disability-Informed Strategies for Cloud Native Resilience#
Time: 11:45am EST - 12:15pm EST
Speakers: Travis Johnson (Level 3 Engineer, Convo Communications); Catherine Paganini (Marketing Director, Buoyant); Chris Khanoyan (Tech Lead, Booz Allen Hamilton); Milad Vafaeifard (Lead Software Engineer, EPAM); Alex Stine (Site Reliability Engineer, Waystar)
Venue: Building B | Level 3 | B308-309, Atlanta, GA, USA
Type: CLOUD NATIVE EXPERIENCE
Description: In the fast-paced, ever-evolving cloud native landscape, adaptability isn’t optional—it’s essential. Engineers with disabilities have long been experts in navigating shifting environments, not by choice, but by necessity. From working around inaccessible tooling to building inclusive, high-performing teams, they’ve developed techniques for thriving in the face of friction. Join members of the CNCF Deaf and Hard of Hearing Working Group and the Blind and Visually Impaired Initiative as they share powerful, practical lessons in resilience, creativity, and community. This panel isn’t just about accessibility—it’s about learning to thrive in chaos and designing systems that welcome everyone.
Fluent Bit: Smarter Telemetry Routing, Faster Pipelines#
Time: 11:45am EST - 12:15pm EST
Speakers: Eduardo Silva (Engineering Manager, Chronosphere)
Venue: Building C | Level 3 | Georgia Ballroom 1, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Fluent Bit is a lightweight and blazing-fast telemetry processor widely adopted in cloud-native environments. As we celebrate 10 years of Fluent Bit, this session offers a fresh intro for new users and a look at what’s next. We’ll cover major updates including OpenTelemetry-native support, routing enhancements, and performance improvements that let you process logs, metrics, and traces efficiently, right at the edge. Learn how Fluent Bit enables high-performance observability pipelines, scales with your infrastructure, and continues to evolve as a core building block for modern platforms.
SIG Autoscaling Projects Update#
Time: 11:45am EST - 12:15pm EST
Speakers: Jack Francis (Principal Software Engineer, Microsoft); Jason Deal (Software Engineer, AWS)
Venue: Building C | Level 1 | C111-112, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: The continued embrace of Kubernetes as a platform for AI workloads (training and inference) has presented new challenges for ensuring that clusters and workloads can scale and make efficient use of hardware. Since the last KubeCon North America, SIG Autoscaling has been focused on enabling Kubernetes to autoscale to meet the needs of AI workloads, both for Cluster Autoscaler and Karpenter. Join us to hear about the SIG’s work over the past year on Dynamic Resource Allocation, improved support for batch workloads, and other features. Attendees will leave the session with a better understanding of the roadmap for the SIG ensuring we can meet the needs of workloads scaling on Kubernetes into the future, and how attendees can get involved.
To InGate and Beyond Ingress-nginx!#
Time: 11:45am EST - 12:15pm EST
Speakers: James Strong (Senior Solution Architect, Isovalent @ Cisco); Marco Ebert (Site Reliability Engineer, Giant Swarm)
Venue: Building C | Level 3 | Georgia Ballroom 3, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Ingress-nginx patches have been released, and we have a working demo of InGate. What more could a Kubernetes Application Developer want? James and Marco will dive into the journey of managing two open source projects: ingress-nginx and InGate. They’ll share insights on how they split the responsibilities between the two maintainers. Attendees will get an inside look at the ongoing work to implement compliance tests to support HTTPRoute and to address the most requested community features. Additionally, the maintainers will provide an updated timeline for archiving ingress-nginx, highlighting key milestones and steps involved in migrating to InGate. The presentation will outline how folks can get involved in the migration process, tackle current challenges, and shape the future of InGate. Whether you’re maintaining existing applications, planning a migration, or interested in contributing to open source, this session offers practical guidance and a collaborative vision for the future of InGate.
What’s New in gRPC#
Time: 11:45am EST - 12:15pm EST
Speakers: Kevin Nilson (Software Engineer, Google); Israel Shapiro (Cloud Native Solutions Architect, Broadcom)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: This talk will go through all the exciting new features we have recently added to gRPC. We will be covering topics such as OpenTelemetry, Service Mesh, K8s Gateway APIs and GAMMA. We will also cover tips and tricks for building a Microservices Application with gRPC.
From Panic To Peace: Making K8s Controller Observability Suck Less#
Time: 11:45am EST - 12:15pm EST
Speakers: Cat Morris & Derik Evangelista (Staff Product Manager, Syntasso)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 4, Atlanta, GA, USA
Type: OBSERVABILITY
Description: Your infra runs on K8s, and your controllers and custom resources hum happily. Until they don’t. One change later and… UH OH. The logs are a never-ending scroll of problems, dashboards flash like disco lights, and you are knee-deep in debugging hell. You check your custom resources, and they offer no guidance. You eventually fix it, add better error messages, and feel proud. And then it breaks again, with a brand new, even less helpful error. Why is this so hard? In this talk, Derik, a prolific controller-writer, and Cat, a regular listener to DevOps meltdowns, will walk you through tested strategies for making your K8s resources less mysterious and more manageable. You will learn how to: - Add meaningful, traceable errors - Make events and status your best friends - Build dashboards that humans want to use - Set up business-relevant alerts, not just CPU spikes - Automate away the slog using AI. Make your controller logs and custom resource statuses a source of clarity, not chaos!
OTel+K8s= ❤️: An Introduction To OpenTelemetry for Kubernetes Users#
Time: 11:45am EST - 12:15pm EST
Speakers: Christos Markou (Principal Software Engineer, Elastic)
Venue: Building B | Level 3 | B304-305, Atlanta, GA, USA
Type: OBSERVABILITY
Description: Running OpenTelemetry on Kubernetes offers immense potential for observability. This session is an introduction to OpenTelemetry for Kubernetes users, providing an analysis of core components, such as Collector receivers and their use cases, types of data collected, key metrics to monitor, and practical log collection approaches. In addition, this session explores various deployment strategies and architectural decisions. The session demonstrates using the OpenTelemetry Collector, Operator, and Helm charts to achieve effective observability on K8s. Last but not least, it will touch on operational challenges, trade-offs, and limitations of running OpenTelemetry at scale. Join us to learn why OpenTelemetry and Kubernetes is all about love!
Maximizing Global Potential: Cost-Optimized, High-Availability Workloads Across Regions#
Time: 11:45am EST - 12:15pm EST
Speakers: Wei Jiang (Tech Lead, CloudPilot AI); Jingkang Jiang (Co-founder and CEO, CloudPilot AI, Inc.); Michael McCune (Senior Principal Software Engineer, Red Hat); Praseeda Sathaye (Principal Product Manager, Amazon)
Venue: Building B | Level 4 | B406b-407, Atlanta, GA, USA
Type: OPERATIONS + PERFORMANCE
Description: Running stateless workloads like AI inference and video encoding demands significant compute, which can be costly and subject to shortages in a single region. Distributing workloads across regions and cloud providers helps optimize both cost and availability. This session explores how multi-cluster scheduling and Karpenter enable dynamic provisioning across preemptible (cost-effective but region-limited) and non-preemptible node VMs. By integrating with cloud providers’ pricing and availability APIs, we present a unified strategy for cost-efficient scheduling and autoscaling without sacrificing availability. As we scale resources across regions and providers, we ensure traffic gets routed the right way - so your service stays responsive and efficient no matter where the pods are running. Attendees will learn how to design a resilient, multi-cloud architecture that adapts to fluctuating costs while ensuring seamless workload execution.
Ambient Global Compute: Orchestrating the Non-Elastic Cloud With Kubernetes#
Time: 11:45am EST - 12:15pm EST
Speakers: Jago Macleod (Engineering Director, Google)
Venue: Building B | Level 4 | B405-406a, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Plot twist: cloud compute isn’t the infinitely elastic utility it once seemed! Customers increasingly rely on reservations and sophisticated capacity management to navigate this reality, especially for scarce, specialized hardware like GPUs & TPUs. This demands evolving Kubernetes beyond single clusters to become the “operating system of the cloud for distributed systems”–what we call “Ambient Global Compute”. As the engineering director leading OSS K8s at Google, I will share a comprehensive view on how K8s is adapting to orchestrate workloads across non-elastic, global environments. Learn about initiatives supporting this shift & how they fit together to support unified multi-cluster, often multi-cloud, deployments, including: DRA, Kueue, Karpenter/ComputeClasses, MCO, ArgoCD, & more. This talk equips attendees to understand and leverage Kubernetes’ evolution for modern, demanding workloads, and return to a simpler mental model despite growing complexity in infrastructure.
Managing a Million Infra Resources at Spotify: Designing the Platform To Manage Change at Scale#
Time: 11:45am EST - 12:15pm EST
Speakers: Oliver Soell & Fredrik Sommar (Staff Engineer, Spotify AB)
Venue: Building B | Level 3 | B312-314, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Spotify’s “Declarative Infrastructure platform” (a Kubernetes-based infrastructure management platform) enables Spotify’s developers to manage almost a million infrastructure resources, and enables tens of Platform teams to support first- or third-party resources through the platform. The platform empowers Platform teams to manage all resources of a specific type, like Spotify’s Storage team managing all Bigtable instances. You may ask, how do those Platform teams plug into the platform to support their specific resources? Depending on their use case, they can mix and match technologies such as KCC (GCP Config Connector), kro (Kube Resource Orchestrator), K-Poperator (an internal Spotify operator framework), and Kubebuilder to meet their needs. In this talk you’ll learn how these platform primitives are used by Spotify’s platform teams, how these primitives support Spotify’s platform principles, and lessons learned from 5 years of running the platform.
CNCF Ambassadors - Ask The Experts!#
Time: 11:45am EST - 12:10pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Project Pavilion, Atlanta, GA, USA
Type: PROJECT OPPORTUNITIES
Description: Join senior CNCF Ambassadors from around the globe as they share their experience with our program. Bring your questions to ask these experts anything relevant to their role within the cloud native ecosystem.
Quantum-Resistant Kubernetes: Realities, Risks & (Versioning) Pitfalls#
Time: 11:45am EST - 12:15pm EST
Speakers: Fabian Kammel (Principal Security Consultant, ControlPlane)
Venue: Building B | Level 3 | B302-303, Atlanta, GA, USA
Type: SECURITY
Description: Post-Quantum Cryptography (PQC) is no longer theoretical. With Go 1.24+ enabling ML-KEM by default, Kubernetes v1.33+ inherits significant quantum resistance for key exchange. This talk dives into the practical realities. We’ll briefly cover the current state of PQC standardization, such as ML-KEM (FIPS 203), and then critically examine real-world implications: how K8s “accidentally” already benefits from PQC key exchange, the subtle but critical downgrade risks from mismatched Go versions (e.g., Go 1.23’s X25519Kyber768Draft00 vs. 1.24’s X25519MLKEM768), and the “tldr.fail” issue where large PQC key shares can break TLS handshakes due to packet size limits. We’ll explore these challenges with evidence from the K8s ecosystem, offering insights for maintainers and advanced users navigating the PQC transition.
Kubernetes SIG/WG Meet + Greet, Lunch and Learn#
Time: 12:00pm EST - 2:30pm EST
Venue: Building B | Level 2 | B216-217, Atlanta, GA, USA
Type: EXPERIENCES
Description: Chart your course within the Kubernetes community. This dedicated meet and greet connects you directly with the SIGs and WGs that power the project. Representatives from each group will be here to discuss their charter, current initiatives, and how your skills can make an impact. Whether you’re an experienced contributor looking to expand your scope or a new contributor ready to get started, this session will help you find the right team to begin or continue your journey.
Lunch 🍲#
Time: 12:15pm EST - 2:15pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: BREAKS
Gold Sponsor In-Booth Demos#
Time: 12:15pm EST - 12:45pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: Sponsor: AugmentCode Booth Number: 1421 Sponsor: PerfectScale Demo: The Perfect Cluster: The 6-Step Kubernetes Optimization Framework Booth Number: 533
Sponsored Demo: Three Well-Lit Paths to Scalable LLM Inference with llm-d on Kubernetes#
Time: 12:15pm EST - 12:35pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: As GenAI workloads move from prototypes to production, engineering teams hit scaling walls: exploding GPU costs, uneven latency, and bloated, black-box inference stacks. In this live demo, Red Hat will explore these three well-lit paths for scaling LLM inference on Kubernetes using the open-source llm-d framework. Each path addresses a specific challenge in real-world GenAI operations: Intelligent Inference Scheduling reduces latency through prefix cache–aware routing and prompt/session stickiness. Prefill/Decode Disaggregation improves GPU efficiency and reduces tail latency by decoupling compute stages. Wide Expert Parallelism unlocks the ability to deploy Mixture-of-Experts (MoE) models with high throughput across multiple replicas. The session will feature a live demonstration of llm-d, deployed with vLLM, Prometheus, and Grafana. Attendees will discover how to disaggregate LLM workloads into composable services and gain actionable insights for implementation on any Kubernetes platform. A key highlight will be a walkthrough of a Mixture-of-Experts (MoE) model configuration, demonstrating llm-d’s efficient scheduling of models using expert parallelism across different nodes, leveraging AI accelerators.
Network Nook Meetup: Conference Buddies#
Time: 12:45pm EST - 1:45pm EST
Venue: Building B | Level 1 | Solutions Showcase, Atlanta, GA, USA
Type: EXPERIENCES
Description: Join us for casual and engaging meetups at the Network Nook during lunch breaks! These informal gatherings are open to all, whether you’re a first-time attendee, a solo traveler, or simply looking to chat about shared interests. This is a great way to connect with others. Today’s theme is: Conference Buddies. Meet other attendees, make new connections, and find a conference buddy to explore sessions and events together!
Project Pavilion Tour#
Time: 12:45pm EST - 1:05pm EST
Speakers: Calum Murray (CNCF Ambassador)
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Project Pavilion, Atlanta, GA, USA
Type: PROJECT OPPORTUNITIES
Description: Explore the Project Pavilion, a hub of innovation and discovery! Take part in daily tours, interact with project maintainers at their kiosks, gain insights on community engagement and KCD event organization, and learn more about certification opportunities to showcase your expertise. This tour will include an introduction to the Pavilion, making introductions, interacting with maintainers, and ensuring you end up talking to the right projects!
Sponsored Demo: Smarter Scaling for Kubernetes: Real Time Optimization for Cost and Performance#
Time: 12:45pm EST - 1:05pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: Kubernetes resource management often means choosing between wasted spend and unpredictable performance. In this session, the Global Director of Sales Engineering at ScaleOps will demonstrate how autonomous scaling delivers the right resources at the right time automatically. See how intelligent optimization improves efficiency across CPU, memory, and GPU workloads, including AI-driven environments, without manual tuning or complex configuration. About ScaleOps: The leading autonomous cloud resource management platform, trusted by enterprises like Salesforce, Wiz, DocuSign, and Coupa. ScaleOps powers thousands of workloads worldwide, ensuring performance, resilience, and cost efficiency at scale.
Learning Lounge: Ten Minutes, Ten Insights: What LF Research Reveals About Cloud, AI, and Open Source#
Time: 1:00pm EST - 1:15pm EST
Speakers: Hilary Carter (SVP Research, LF Research)
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Learning Lounge, Atlanta, GA, USA
Type: EXPERIENCES
Description: 10-Minute Tip Talk
Gold Sponsor In-Booth Demos#
Time: 1:15pm EST - 1:45pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: Sponsor: GitLab Demo: Don’t Reinvent the Wheel: A Developer’s Guide to AI Reusability Booth Number: 120 Sponsor: Harness Booth Number: 522 Sponsor: MariaDB Demo: Unified cloud GenAI platform for seamless native deployment, scaling, and management. Booth Number: 1021 Sponsor: New Relic Demo: Turn The Lights On: How New Relic Illuminates Your OTel Data Booth Number: 1420 Sponsor: SolarWinds Demo: Full Stack Observability with AI-driven Insights Booth Number: 630
Sponsored Demo: The DRA Paradigm Shift: Request a Capability, Not a Node#
Time: 1:15pm EST - 1:35pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: For years, scheduling GPUs in Kubernetes meant one thing: forcing a direct link between your workload and a specific node. Through a brittle web of taints, tolerations, and nodeSelectors, we’ve been telling our pods where to run, tightly coupling our applications to the physical infrastructure beneath them. This operational model is inefficient, hinders portability, and doesn’t scale. In this session, you’ll see a live demonstration of Dynamic Resource Allocation (DRA), a new Kubernetes standard that fundamentally changes how we orchestrate specialized hardware. We’ll show you how to stop targeting nodes and start requesting capabilities. Using a declarative ResourceClaim on GKE to request specific NVIDIA GPU attributes, we can let the Kubernetes scheduler intelligently match your workload with the right hardware anywhere in the cluster. This is the future: a truly portable, flexible, and automated approach to managing high-value resources.
Sponsored Demo: Beyond Grep: Interactive, Offline Analysis of Kubernetes Failures#
Time: 1:45pm EST - 2:05pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: Static support bundles capture logs and resource dumps, but they lack the interactivity needed for real analysis. To fully understand a Kubernetes failure, engineers must query the API as it existed during the incident. This talk introduces a technique that turns static bundles into interactive, queryable Kubernetes environments. We’ll start with troubleshoot.sh, which creates a diagnostic bundle of the failing cluster. Then we’ll show troubleshoot-live, which ingests that bundle and launches a local Kubernetes API server and etcd instance, rehydrating the cluster state. The result is a high-fidelity, offline replica accessible via kubeconfig with kubectl or any compliant tool. This enables interactive debugging, post-mortem analysis, and automated triage completely offline and without production access. Finally, we highlight that by replaying real-world failures safely and consistently, teams can train AIOps pipelines, experiment with Retrieval-Augmented Generation (RAG), and develop advanced Agentic Workflows. This capability improves day-2-day troubleshooting while paving the way for next-gen intelligent Kubernetes operations.
Becoming an Impactful CNCF Member#
Time: 2:00pm EST - 2:30pm EST
Venue: Building B | Level 1 | Solutions Showcase, Atlanta, GA, USA
Type: EXPERIENCES
Description: Learn how CNCF members directly fuel the health and growth of the cloud native community. This session goes beyond sponsorship to show how member contributions, from financial support to engineering resources, are vital for sustaining core projects, funding security audits, and enabling community programs. Learn about the tangible impact of membership and what an ideal contributing member looks like. You’ll leave with a clear understanding of the virtuous cycle that connects membership to community vitality.
Learning Lounge: Sensitive Keys in Codebases & Hidden in Layers Contest#
Time: 2:00pm EST - 2:15pm EST
Speakers: Aleks Jones (Technical Trainer, Linux Foundation Education)
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Learning Lounge, Atlanta, GA, USA
Type: EXPERIENCES
Description: Live Security Challenge
Pet-a-Pup#
Time: 2:00pm EST - 3:00pm EST
Venue: Building B | Level 2 | Willow Garden Foyer, Atlanta, GA, USA
Type: EXPERIENCES
Description: Take a “paws” from your busy day! Join us for a visit with some friendly therapy puppies to help reduce stress and boost your mood.
Agile for Every Brain#
Time: 2:00pm EST - 3:00pm EST
Venue: Building B | Level 2 | B211-212, Atlanta, GA, USA
Type: INCLUSION + ACCESSIBILITY
Description: Join this interactive discussion on adapting Agile methodologies to be more inclusive and effective for every type of thinker, particularly neurodivergent individuals. We will explore common challenges in ceremonies like stand-ups and retrospectives that can unintentionally exclude team members with different communication and processing styles. The primary goal is to crowdsource and share practical, actionable strategies that foster psychological safety and enhance team collaboration, making our work environments more equitable and accessible. This is also a great way to meet and join members of the new Cloud Native Neurodiversity Community Group!
No More GPU Cold Starts: Making Serverless ML Inference Truly Real-Time#
Time: 2:15pm EST - 2:45pm EST
Speakers: Nikunj Goyal (MTS - 2, Adobe); Aditi Gupta (Software Developer, Disney Plus Hotstar)
Venue: Building B | Level 4 | B401-402, Atlanta, GA, USA
Type: AI + ML
Description: Serverless ML inference is great, but when GPUs are involved, cold starts can turn milliseconds into minutes. Whether scaling transformer models or using custom inference services, the startup latency caused by container initialization, GPU driver loading, and heavyweight model deserialization can kill real-time performance and cost you tons of money. In this talk, we’ll break down the anatomy of GPU cold starts in modern ML serving stacks, including why GPUs introduce unique cold-path delays, how CRI and device plugins contribute to them, and what really happens when a PyTorch model boots up on a fresh pod. We’ll walk through production-ready strategies to reduce startup latency: - Pre-warmed GPU pod pools to bypass init time - Model snapshotting with TorchScript or ONNX to speed up deserialization - Lazy loading techniques that delay model initialization until the first request. Together, these strategies help you eliminate cold start pain and keep your services fast, efficient, and production-ready.
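As a rough illustration of the lazy loading strategy named in this abstract, here is a minimal Python sketch. It is not taken from the talk: the `LazyModel` class, the `fake_loader` function, and the `load_count` counter are hypothetical names, and the "model" is a trivial stand-in rather than real PyTorch weights.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Defer an expensive model load until the first request needs it."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader          # hypothetical zero-argument load function
        self._model: Optional[Any] = None
        self._lock = threading.Lock()  # guards against concurrent first requests
        self.load_count = 0            # exposed only so the behaviour is observable

    def get(self) -> Any:
        # Fast path: the model is already in memory.
        if self._model is None:
            with self._lock:
                # Re-check inside the lock so only one thread performs the load.
                if self._model is None:
                    self._model = self._loader()
                    self.load_count += 1
        return self._model


def fake_loader() -> Callable[[float], float]:
    # Stand-in for deserializing real weights; returns a trivial "model".
    return lambda x: 2.0 * x


lazy = LazyModel(fake_loader)
first = lazy.get()(3.0)   # triggers the load
second = lazy.get()(4.0)  # reuses the already loaded model
```

Note the trade-off: the pod becomes ready sooner, but the first request pays the load cost, which is why the abstract lists pre-warmed pools alongside it.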
Tuning GenAI Workloads on Kubernetes: What Actually Works (and What Doesn’t)?#
Time: 2:15pm EST - 2:45pm EST
Speakers: Ishaan Sehgal (Co-founder, Omnara); Brian Lockwood (Senior Systems Software Engineer, NVIDIA)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 1, Atlanta, GA, USA
Type: AI + ML
Description: Curious about what really moves the needle for GenAI performance on Kubernetes? We put Skyhook, a Kubernetes-native OS package manager, and Kaito, the CNCF Sandbox AI workload operator, through their paces. Running LLaMA, Falcon, and Phi-4 workloads on both single and dual GPU nodes, we tweaked everything from kernel flags and GRUB settings to sysctl values and GPU clock locking. The real breakthrough wasn’t in the numbers; it was in the process. We developed a declarative, GitOps-ready workflow that makes infrastructure benchmarking safe, repeatable, and easy to adopt. In this session, we’ll walk you through real benchmark data and Grafana dashboards, share reusable YAML examples for OS and GPU tuning, and introduce a practical A/B testing framework for GenAI. You’ll leave with a clear sense of what’s actually worth tuning, and what you can safely ignore, when running LLMs on Kubernetes.
But What About Reliability? The Multi-Million Dollar Kubernetes Cost Optimization Question#
Time: 2:15pm EST - 2:45pm EST
Speakers: Zain Malik (Exostellar); Nibir Bora (Clean Compute)
Venue: Building B | Level 3 | B308-309, Atlanta, GA, USA
Type: CLOUD NATIVE EXPERIENCE
Description: “But what about reliability?” We heard this question 865 times when staring at 9% CPU utilization. Every time followed by a VM-era horror story or a revenue shield - “We bring in millions in revenue; we deserve idle resources for peace of mind”. This session reveals 9 battle-tested Kubernetes-native strategies that took us from 9% to 50% utilization while IMPROVING reliability. The same directors who predicted “catastrophic failure” now champion optimization, panic-paging our team if costs regress. Discover practical implementations and pitfalls, such as tuning workload limits, too many pods on nodes, API server pressure, reliable spot nodes, etc. You can selectively adopt and combine these strategies to build your own multi-dimensional cost optimization blueprint, precisely tailored to address the distinct challenges of your platform. Every technique uses open-source CNCF tools, because the most expensive infrastructure isn’t compute - it’s fear.
Open Source at the Edge: Hardware, Firmware, and AI Stacks#
Time: 2:15pm EST - 2:45pm EST
Speakers: Miley Fu (WasmEdge founding member, CNCF Ambassador, Second State Inc.); Saiyam Pathak (Principal Developer Advocate, vCluster)
Venue: Building B | Level 2 | B206, Atlanta, GA, USA
Type: EMERGING + ADVANCED
Description: In edge AI, we are often locked into proprietary cloud services and black-box APIs. This talk introduces a full-stack, end-to-end open-source solution that gives you control over the entire pipeline. We will demonstrate how to use a stack built entirely in Rust to turn an edge hardware device into a fun voice AI agent. We’ll walk through a complete interactive flow: from voice activity detection (VAD) to Automatic Speech Recognition (ASR), LLM-powered reasoning, and Text-to-Speech (TTS) on a self-hosted AI agent server. This stack is completely open, including: - The Rust firmware for the edge device - The Rust-based AI agent server for coordinating and managing AI models - Plug-and-play access to any open LLM and speech models, plus MCP calling. The audience will learn: - How to build a customizable, extensible AI voice agent using this open source framework - How open hardware and software benefit AI learning and maker communities and product builders.
Efficient Kubernetes Autoscaling: News, Challenges, and Best Practices With KEDA#
Time: 2:15pm EST - 2:45pm EST
Speakers: Zbynek Roubalik & Jan Wozniak (CTO, Kedify)
Venue: Building C | Level 3 | Georgia Ballroom 1, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: This session offers a comprehensive overview of recent KEDA news, highlighting new features and integrations designed to simplify Kubernetes autoscaling, as well as discussing upcoming enhancements and future roadmap items. The talk will cover practical lessons and best practices drawn from hands-on experiences, addressing key challenges like metrics latency, overloaded infrastructure, and scaling trigger optimization. You will learn effective approaches for managing performance, minimizing scaling delays, and preventing bottlenecks. We will discuss the challenges of using (or misusing?) third-party monitoring and metrics providers for scaling decisions, exploring practical ways to overcome limitations without complex architectural changes. Learn from our direct experiences about effective strategies, performance optimization techniques, and how to build resilient, cost-optimized autoscaling solutions with KEDA.
Getting up To Date With Docsy: The Kubernetes Docs Upgrade in Progress#
Time: 2:15pm EST - 2:45pm EST
Speakers: Natali Vlatko (Open Source Lead Architect, Cisco); Rey Lejano (Specialist Adoption Architect, Red Hat); Divya Mohan (Principal Technology Advocate, SUSE); Sayak Mukhopadhyay (Technical Solutions Lead, Gemini Solutions)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: With 12 million users of the Kubernetes documentation a year, the maintenance of our docs is key. Alongside tech docs principles, Kubernetes utilizes Hugo and Docsy to deliver the best experience for contributors and users. SIG Docs is in the process of aligning with upstream Docsy, doing the work needed to upgrade several versions and adopt the latest features. While a significant amount of groundwork has been laid for this upgrade, preparation is only half the story. It’s tempting to overhaul everything at once, but it’s imperative to continue delivering docs while the upgrade is in progress. During this session, maintainers will cover why upgrading is worthwhile, how we’ve approached the refactor, and the challenges we’ve encountered as we push on. Learn how upgrading our docs infrastructure means ensuring we’re on the latest environment and dependency versions, the constant rebases a refactor requires, and the best practices SIG Docs will pass on when attempting your own upgrade.
Navigating the Rapid Evolution of Large Model Inference: Where Does Kubernetes Fit?#
Time: 2:15pm EST - 2:45pm EST
Speakers: Jiaxin Shan (Software Engineer, Bytedance); Yuan Tang (Senior Principal Software Engineer, Red Hat); Sergey Kanzhelev (SWE, Google); Rita Zhang (Principal software engineer, Microsoft)
Venue: Building C | Level 1 | C111-112, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Large model inference is evolving rapidly: model or expert parallelism, prefill/decode disaggregation, multi-LoRA, and KV cache offloading push the limits of traditional serving. As infrastructure teams, we must decide: what belongs in Kubernetes core primitives vs engines vs ecosystem projects? In this session, WG-Serving chairs and industry leaders will share real-world lessons on managing these blurry boundaries. We’ll discuss how to evaluate new patterns, balance control vs observability, and adapt infrastructure to stay ahead in this dynamic landscape. Attendees will gain practical frameworks to decide when to extend Kubernetes vs offload to runtimes, and insights into top emerging demands from large-scale LLM workloads.
Under the Hood of Vitess: Database Engineered for Scale and Resilience#
Time: 2:15pm EST - 2:45pm EST
Speakers: Matt Lord & Florent Poinsard (Vitess Maintainer, PlanetScale)
Venue: Building C | Level 3 | Georgia Ballroom 3, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Vitess powers some of the world’s largest production MySQL databases. In this session, we’ll take you under the hood of Vitess to explore how it’s engineered to deliver scalability, performance, and resilience in distributed environments. We’ll walk through key internal components and show how they work together to manage sharded, multi-tenant, and geo-distributed workloads. This talk is designed for contributors, operators, and advanced users who want to understand Vitess beyond the basics. We’ll share real-world lessons from operating Vitess at scale, recent performance improvements, and upcoming roadmap features. Whether you’re looking to contribute to the project or deepen your understanding of its internals, this session will equip you with the architectural insights needed to reason about Vitess in production.
Beyond the Dashboard: Modern Observability for Platform Engineering at Scale#
Time: 2:15pm EST - 2:45pm EST
Speakers: Danielle Cook (CNCF Ambassador, Co-Organizer, Cartografos Working Group, Akamai); Whitney Lee (Senior Technical Advocate, Datadog); Stevie Caldwell (SRE Tech Lead, Fairwinds); Khallai Taylor (Sr. Observability Engineer & Consultant, E.ON Digital Technology GmbH); Payal Bagga (Staff Product Manager, Intuit)
Venue: Building B | Level 3 | B312-314, Atlanta, GA, USA
Type: OBSERVABILITY
Description: Observability is everywhere, but understanding where to focus can be overwhelming. As platform teams build internal developer platforms and self-service capabilities, observability must evolve from a sea of dashboards to something far more strategic: actionable insights that support performance, reliability, and developer productivity. This panel brings together a lineup of observability and platform engineering leaders to share candid stories and hard-won lessons from end-user organizations like E.ON and Intuit. Together, they’ll break down the many layers of observability, from telemetry pipelines and OpenTelemetry adoption to aligning SLOs with business goals and embedding observability directly into developer workflows. We’ll also explore how teams are leveraging AI to reduce MTTR and turn noisy signals into clear, actionable intelligence, all while keeping costs in check. This conversation will offer guidance and help you rethink what “good observability” really looks like.
UX Research Report: Prometheus and OTel’s Resource Attributes.#
Time: 2:15pm EST - 2:45pm EST
Speakers: Victoria Nduka (User Experience Designer, Independent); Amy Super (Principal Product Designer, Grafana Labs)
Venue: Building B | Level 3 | B304-305, Atlanta, GA, USA
Type: OBSERVABILITY
Description: Prometheus and OpenTelemetry are two CNCF projects that focus on observability and truly excel at their main purposes. However, it’s no secret that they started their integration journey on the wrong foot. The story is getting better over time, and the improvement is being driven by data! Through CNCF’s LFX mentorship program, Victoria and Amy conducted UX research to understand how Prometheus should handle OTel’s Resource attributes. Using quantitative and qualitative approaches, they collected opinions from co-founders of both projects, active and former maintainers, and several end users. By joining this talk, the audience will learn more about what is going well and what could be improved, while listening to Victoria and Amy’s educated suggestions for the project maintainers.
Anti Patterns for Platform Teams (number 3 Will Surprise You!)#
Time: 2:15pm EST - 2:45pm EST
Speakers: David Stenglein (Missing Mass, LLC)
Venue: Building B | Level 4 | B405-406a, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: As platform engineering has experienced a surge in interest and popularity, some anti-patterns have started to emerge. If not dealt with, these situations can fester and unravel the benefits of internal platforms. Recognize any of these situations? - The dumping ground: Hey, you do ops, right? Here, take over my app. - Fixing the world: We’re being crushed under an inverted testing pyramid but we want the platform to fix our problems. - The kitchen sink: Hey, this new XYZ thing (probably AI) would make a great re-usable service! Scenarios like these are best dealt with by avoiding them altogether, and we’ll go over recognition and prevention. If you’ve already found yourself in the hole, we’ll also talk about some things you can do to dig your way out.
Harmonizing Your Platform Domain With Kubernetes and Custom Resource Definitions#
Time: 2:15pm EST - 2:45pm EST
Speakers: Sebastien Blanc (Developer Relations Engineer, Port)
Venue: Building B | Level 4 | B406b-407, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: As organizations adopt cloud-native technologies, they often find themselves managing multiple platforms, including internal developer platforms and SaaS providers. This fragmentation creates challenges for developers and SREs, who must navigate disparate systems and tools to perform routine tasks. This talk explores how to harmonize your platform domain by leveraging Kubernetes and Custom Resource Definitions (CRDs). CRDs allow you to extend the Kubernetes API to define and manage custom objects, representing your specific platform domain concepts. We’ll discuss the benefits of this approach, including streamlined developer workflows and increased platform adoption. We’ll also address potential challenges, such as API exposure and operator sprawl, and offer strategies for overcoming them. By the end of this talk, you’ll have a clear understanding of how to leverage Kubernetes and CRDs to create a cohesive and harmonious platform domain.
Managing Netflix’s Compute Infrastructure With Kubernetes and Dynamic Capacity Management#
Time: 2:15pm EST - 2:45pm EST
Speakers: Charles Zheng & Nick Parker (Netflix)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 4, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Netflix runs large, multi-tenant Kubernetes clusters that power the company’s compute infrastructure across streaming services and batch workloads. In this talk, we share lessons from managing these fleets while building a resilient and cost-efficient capacity management system at cloud scale. We explain how we use a federated cellular structure to share resources across teams, how we treat latency-sensitive services and throughput-heavy batch jobs, and how we organize hardware into managed pools of nodes across a variety of users. We cover our soft capacity reservations that monitor real-time demand with shared buffers to support traffic spikes, our extended disruption budgets that use health signals to limit service impact, and our automated scaling and predictive resizing that reduce cost. Finally, we show how unused capacity is filled with preemptible, low-priority workloads to reduce waste. Throughout all of this we will discuss what has and hasn’t worked along the way.
Red Vs. Blue: A Live Attacker-Defender Showdown in Kubernetes Security#
Time: 2:15pm EST - 2:45pm EST
Speakers: Lucy Sweet (Senior Software Engineer, Uber); Sandeep Kanabar (Lead Software Engineer, Gen (formerly NortonLifeLock))
Venue: Building B | Level 3 | B302-303, Atlanta, GA, USA
Type: SECURITY
Description: What if learning Kubernetes security could be thrilling, practical, and a little chaotic? In this interactive session, we stage a live attacker-versus-defender “chess match” inside a Kubernetes cluster. One speaker plays the role of a determined attacker, exploiting common misconfigurations, privilege escalations, and overly permissive RBAC. The other, a vigilant defender, responds with best-practice mitigations and live troubleshooting. You’ll watch a Kubernetes environment come under siege and see how thoughtful, layered defenses can stop even persistent attackers in their tracks. Expect live demos, sharp insights, and just enough chaos to keep it real. We’ll cover escalating security scenarios, from pod privilege abuse to namespace isolation, resource quotas, and admission webhooks, showing not just what to do, but why it matters. This isn’t theory; it’s security by example, performed live.
Gold Sponsor In-Booth Demos#
Time: 2:15pm EST - 2:45pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: Sponsor: DoiT Demo: From Metrics to Meaning: Real-Time Kubernetes Cost Intelligence Booth Number: 532 Sponsor: Elastic Booth Number: 931 Sponsor: Vultr Demo: Vultr Cloud Services Agent Booth Number: 731
Sponsored Demo: HolmesGPT: Agentic K8s troubleshooting in your terminal#
Time: 2:15pm EST - 2:35pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: Troubleshooting Kubernetes shouldn’t require hopping across dashboards, logs, and docs. With open-source tools like HolmesGPT and the Model Context Protocol (MCP) server, you can now bring an agentic experience directly into your CLI. In this demo, we’ll show how this OSS stack can run everywhere, from lightweight kind clusters on your laptop to production-grade clusters at scale. The experience supports any LLM provider: in-cluster, local, or cloud, ensuring data never leaves your environment and costs remain predictable. We will showcase how users can ask natural-language questions (e.g., “why is my pod Pending?”) and get grounded reasoning, targeted diagnostics, and safe, human-in-the-loop remediation steps – all without leaving the terminal. Whether you’re experimenting locally or running mission-critical workloads, you’ll walk away knowing how to extend these OSS components to build your own agentic workflows in Kubernetes.
🚨 Contribfest: Contribute To K8gb’s Journey Toward CNCF Incubation#
Time: 2:15pm EST - 3:30pm EST
Speakers: Yury Tsarev (Principal Solutions Architect, Upbound); Andre Aguas (Customer Success Engineer, Spotify)
Venue: Building B | Level 2 | B208, Atlanta, GA, USA
Type: 🚨 CONTRIBFEST
Description: Help drive k8gb toward CNCF Incubation! In this hands-on Contribfest, we’ll simplify core components and prepare k8gb for broader adoption. You’ll work on real open issues focused on modularizing the control loop, improving testability, and refactoring strategy logic. We’ll also explore how to externalize zone delegation to support dynamic, provider-agnostic configurations across clusters—an essential capability for real-world global failover setups. Whether you’re an experienced Go developer or a newcomer eager to get involved, we’ll guide you through setting up a multi-cluster local testbed and submitting your first PR. Your contributions will directly advance k8gb’s architecture, quality, and readiness for the next phase in the CNCF ecosystem.
🚨 Contribfest: Hands-On With Helm 4: Wasm Plugins, OCI, and Resource Sequencing. Oh My!#
Time: 2:15pm EST - 3:30pm EST
Speakers: Andrew Block (Distinguished Architect, Red Hat); Scott Rigby (Helm Maintainer, Replicated); George Jenkins (Senior Software Engineer, Bloomberg)
Venue: Building B | Level 2 | B207, Atlanta, GA, USA
Type: 🚨 CONTRIBFEST
Description: Join Helm maintainers for an interactive session contributing to core Helm and building integrations with some of Helm 4’s emerging features. We’ll guide contributors through creating Helm 4’s newest enhancements, including WebAssembly plugins, enhancements to how OCI content is managed, and implementing resource sequencing for controlled deployment order. Attendees will explore how to build Download/Postrender/CLI plugins in WebAssembly, develop capabilities related to changes to Helm’s management of OCI content including repository prefixes and aliases, and use approaches for sequencing chart deployments beyond Helm’s traditional mechanisms. This session is geared toward anyone interested in Helm development, including leveraging and building upon some of the latest features associated with Helm 4!
📚 Tutorial: Intelligent Failure: Using AI To Push Your Cluster To the Brink#
Time: 2:15pm EST - 3:30pm EST
Speakers: James Ilse (Principal Field Engineer, Solo.io)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 2-3, Atlanta, GA, USA
Type: 📚 TUTORIALS
Description: How do you test the resilience of your environment without risking an outage? Stress testing is often a one-time pre-production task, immediately forgotten due to the complexity of keeping it current. In this tutorial, we’ll show how to automate stress testing using AI to adapt to your ever-changing Kubernetes environment. Attendees will learn to deploy a repeatable, low-effort system using Kagent and Kgateway as the human-friendly control plane, Fortio for load generation, and Istio Ambient Mesh for enhanced observability. Think of it as your eager assistant continuously probing your system until cracks appear. You’ll leave with a working knowledge of how to set up and run the tools to create intelligent, production-grade stress tests anytime.
Sponsored Demo: Coding Agents Are The Only Agent Framework You Need#
Time: 2:45pm EST - 3:05pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: Do you really need a custom framework to build AI agents? Many teams invest in bespoke planning loops, tool registries, and orchestration layers. You may already have everything you need. Coding agents like Claude Code, Cursor, and Codex were designed to write software, but code itself is the universal language of infrastructure. This talk shows how we built an agent harness around Claude Code to go beyond coding tasks into SRE workflows, security operations, and infrastructure management. We’ll cover: Why coding agents double as powerful sysadmins and operators The architecture of our production-ready agent harness Examples of applications from on-call to incident response A live demo of Heroku Garden, our “vibe-coding” interface powered by this system Learn how to leverage existing coding agents instead of reinventing frameworks and see a production agent in action.
Multi-Cluster Wars: The Scheduler Awakens#
Time: 3:00pm EST - 3:30pm EST
Speakers: Dejan Pejchev & Priyanka Ravi (Platform Tech Advocate, G-Research)
Venue: Building B | Level 4 | B401-402, Atlanta, GA, USA
Type: AI + ML
Description: AI/ML pipelines generate millions of diverse, resource-hungry batch jobs (GPU bursts for training, or CPU- and memory-intensive preprocessing) that push single-cluster schedulers beyond etcd scalability limits and single-region failure domains. Multi-cluster batch schedulers federate Kubernetes clusters to dynamically extend capacity across on-prem and cloud environments, isolate tenants, and survive zone outages. In a multi-cluster context, preemption must be coordinated globally to reclaim capacity where it’s most needed without starving distant workloads. Fair-share depends on quota enforcement so every team gets its entitled slice of compute. Gang scheduling reserves resources across clusters and only releases them when all parts of a multi-node job are ready to launch simultaneously. This deep dive will explore the architectures of multi-cluster schedulers and how they implement core batch scheduling features across federated Kubernetes clusters.
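To make the all-or-nothing idea behind gang scheduling concrete, here is a simplified Python sketch. It is an assumption-laden illustration, not how any particular multi-cluster scheduler works: the `gang_schedule` function, the cluster names, and the GPU counts are all hypothetical, and it uses a plain check-then-commit pass instead of the reserve-and-release protocol the abstract describes.

```python
from typing import Dict, Optional


def gang_schedule(
    free_gpus: Dict[str, int], workers: int, gpus_per_worker: int
) -> Optional[Dict[str, int]]:
    """Place all workers of a job or none of them.

    Returns a mapping of cluster name -> workers placed, or None when the
    whole gang does not fit. free_gpus is only mutated on success.
    """
    placement: Dict[str, int] = {}
    remaining = workers
    for cluster, free in free_gpus.items():
        if remaining == 0:
            break
        # How many whole workers this cluster can host right now.
        fit = min(remaining, free // gpus_per_worker)
        if fit > 0:
            placement[cluster] = fit
            remaining -= fit
    if remaining > 0:
        return None  # partial placement is discarded; nothing was reserved
    # Commit only after the entire gang is known to fit.
    for cluster, count in placement.items():
        free_gpus[cluster] -= count * gpus_per_worker
    return placement


capacity = {"on-prem": 8, "cloud-a": 4}  # hypothetical free GPUs per cluster
big = gang_schedule(capacity, workers=4, gpus_per_worker=4)    # needs 16 GPUs, only 12 free
small = gang_schedule(capacity, workers=3, gpus_per_worker=4)  # fits across both clusters
```

The rejected four-worker job leaves `capacity` untouched, so the three-worker job that follows can still claim every GPU; a scheduler that placed workers one by one could strand GPUs on a job that never starts.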
The Hidden Risks in AI/ML Supply Chains: How To Secure Your Workloads#
Time: 3:00pm EST - 3:30pm EST
Speakers: Yash Pimple (Software Engineer, Chainguard)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 1, Atlanta, GA, USA
Type: AI + ML
Description: What happens when your AI model gets hacked before it even runs? The culprit: a fragile AI/ML supply chain, vulnerable to data poisoning, model tampering, and rogue dependencies. These threats can silently wreck trust in your Kubernetes workloads. In this session, we’ll dissect the AI/ML supply chain lifecycle, explore its evolving threat landscape, and understand practical security measures. By the end of this talk, attendees will walk away with actionable insights around SBOMs, model cards, and tools to ensure the transparency and integrity of the models in their Kubernetes environments.
Community Capital: Making OSS and Businesses Successful Together#
Time: 3:00pm EST - 3:30pm EST
Speakers: Liz Rice (Chief Open Source Officer, Isovalent, Cisco)
Venue: Building B | Level 3 | B308-309, Atlanta, GA, USA
Type: CLOUD NATIVE EXPERIENCE
Description: Just as open source success is about more than great code, building a successful business on OSS relies on more than pricing. This talk explores how ecosystems thrive when maintainers, vendors, and users build on shared values and trust. We’ll unpack why timing matters when open sourcing a project or contributing it to a foundation, how vendors can grow real businesses by adding value around open source rather than trying to control it, and why vendor success matters to the projects themselves. Drawing on Liz’s experience with the Cilium project and as former Chair of the TOC, she’ll look at examples from the CNCF and beyond, to show how shared values can lead to collective success, and draw out the relationships between project health and vendor viability. Expect thoughtful metaphors, practical takeaways, and a reminder that open source isn’t a zero-sum game, and commercial success can amplify community impact.
Public End User Technical Advisory Board (TAB) Town Hall#
Time: 3:00pm EST - 3:30pm EST
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: CLOUD NATIVE EXPERIENCE
Description: This session is a panel discussion moderated by Bob Killen with members of the Public End User Technical Advisory Board. Feel free to come with questions, but we’ll be doing an overview of the Public End User Technical Advisory Board’s governance structure, scope, mission and processes. To learn more about the TAB, visit https://github.com/cncf/tab
Intelligent LLM Routing: A New Paradigm for Multi-Model AI Orchestration in Kubernetes#
Time: 3:00pm EST - 3:30pm EST
Speakers: Chen Wang (Senior Research Scientist, IBM Research); Huamin Chen (Distinguished Engineer, Red Hat)
Venue: Building B | Level 2 | B206, Atlanta, GA, USA
Type: EMERGING + ADVANCED
Description: This research-driven talk introduces a novel architecture paradigm that complements recent advances in timely intelligent inference routing for large language models. By integrating proxy-based classification and reranking techniques, we’ve developed a system that efficiently routes incoming prompts to domain-specialized LLMs based on rapid content analysis. Our approach creates a meta-layer of intelligence above traditional model serving infrastructures, enabling specialized models to handle queries they’re optimized for while maintaining a unified API interface. We’ll present performance research comparing this distributed approach against monolithic inference-time scaling, demonstrating how intelligent routing can achieve superior results for complex, multi-domain workloads while reducing computational overhead. The session includes a Kubernetes-based reference implementation and quantitative analysis of throughput, latency, and accuracy across diverse prompt categories.
Sound Bath#
Time: 3:00pm EST - 3:15pm EST
Venue: Building C | Level 1 | C108, Atlanta, GA, USA
Type: EXPERIENCES
Description: Immerse yourself in a 15-min “Sound Bath” meditative experience, where you’ll be enveloped in the healing vibrations of crystal singing bowls, gongs, and chimes to release stress, restore balance, and promote deep relaxation. Simply lie down, breathe deeply, and let the resonant sounds wash over you, guiding your mind and body toward a state of peace, calm, and rejuvenation.
Karmada in Action: Scaling AI Workloads Across Multi-Cluster at Scale#
Time: 3:00pm EST - 3:30pm EST
Speakers: Hongcai Ren (Senior Software Engineer, Huawei); Tessa Pham & Wei-Cheng Lai (Senior Software Engineer, Bloomberg)
Venue: Building C | Level 3 | Georgia Ballroom 1, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: In the era of rapidly evolving AI technology, efficiently running AI workloads across multiple clusters has become a critical challenge. Karmada, a powerful open-source multi-cluster orchestration solution, is increasingly being adopted by users to run AI workloads. This session will explore the practical strategies employed and the key capabilities that Karmada has built to support these AI workloads. Session outline: why we need to run AI workloads across multiple clusters; the challenges involved; how Karmada addresses these challenges; key capabilities of Karmada for AI (multi-cluster scheduling, Resource Interpreter, FederatedResourceQuota, multi-cluster queue, FederatedHPA); real-world practices; Q&A. By the end of this session, attendees will have a comprehensive understanding of Karmada’s capabilities in running AI applications across multiple clusters and be inspired to explore new possibilities for leveraging Karmada in their own AI projects.
Kubernetes SIG-Windows Updates#
Time: 3:00pm EST - 3:30pm EST
Speakers: Mark Rossetti (Principal Software Engineer, Microsoft); Jose Valdes (Red Hat)
Venue: Building C | Level 3 | Georgia Ballroom 3, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: In this maintainer track talk, we will cover what is new in the Windows Special Interest Group. The talk will focus on recently added features such as graceful node shutdown support on Windows, recent improvements to kube-proxy, and more!
The Future of Virtualization in Kubernetes: What’s Next for KubeVirt#
Time: 3:00pm EST - 3:30pm EST
Speakers: Vladik Romanovsky (Senior Principal Software Engineer, Red Hat)
Venue: Building C | Level 1 | C111-112, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: KubeVirt is rapidly evolving. In the past year, we’ve introduced many new features, such as decentralized live migration with major stability improvements, VM SWAP support, seamless TCP migration with Passt, and enhancements to the VM rollout mechanism. We added initial support for Kubernetes Dynamic Resource Allocation (DRA) - and much more. Looking ahead, we’re exploring support for multiple VMMs beyond QEMU/KVM, expanding into Confidential Computing across architectures, and introducing a new plugin system. As part of our journey toward CNCF graduation, we’re refining how we plan and deliver features through a new Enhancements process to improve roadmap clarity and community focus. We’ll also discuss ongoing challenges with Kubernetes’ native resource quota system and how rethinking it could better serve virtualized workloads. Join us to see how KubeVirt is shaping the future of virtualization in Kubernetes.
Building Scalable End-to-end Latency Metrics From Distributed Trace#
Time: 3:00pm EST - 3:30pm EST
Speakers: Kusha Maharshi (Senior Software Engineer, Bloomberg)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 4, Atlanta, GA, USA
Type: OBSERVABILITY
Description: This talk shares how Bloomberg used distributed tracing, an OpenTelemetry standard, to address a prevalent need: timing requests from point A to point Z in a cloud native system where a whole alphabet’s worth of steps occur in between. This real-time solution ingests more than 50 billion daily spans, modeled as streaming directed acyclic graphs with deep fan-outs and fan-ins. These fan-ins create scalability chokepoints but represent real-world scenarios like queuing systems for high-volume messaging, order processing or batched notifications. This session will highlight the scalability challenges and lessons from building the architecture of an Apache Kafka and Kubernetes-based observability solution that turns complex trace data into actionable insight. The resulting end-to-end latency metrics power SLOs, alerts, and root-cause analysis. If you’re into building low-latency, high-throughput telemetry systems, applying trace at scale, or just geeking over graphs, this talk is for you!
Where’s My Pod? End-to-End Tracing for Kubernetes With OpenTelemetry#
Time: 3:00pm EST - 3:30pm EST
Speakers: Artem Tkachuk & JP Phillips (Software Engineer 4 (Compute Runtime), Netflix)
Venue: Building B | Level 3 | B304-305, Atlanta, GA, USA
Type: OBSERVABILITY
Description: A Kubernetes developer opens the dashboard to check on a critical deployment, only to find the pod is stuck in “Pending”. What happened? Where’s the bottleneck? In the journey to modernize its compute infrastructure, the Compute team at Netflix faced these same mysteries while migrating from a custom stack to open-source Kubernetes + containerd + CRI. To answer the perennial question, “What happened to my Pod?”, the Compute team built an end-to-end observability solution using OpenTelemetry tracing across the pod lifecycle. This talk demonstrates how connecting Netflix’s custom scheduler, kubelet’s syncPod, and container runtime traces enabled the team to visualize hidden delays, such as global locks being the bottleneck for launching pods at scale and container registry issues affecting container launch times. The session offers a practical, story-driven exploration of transforming pod observability—so you can finally answer your own “what happened?” with confidence.
Capabilities, APIs, and Experiences: Blueprints To Build Interoperable Platforms#
Time: 3:00pm EST - 3:30pm EST
Speakers: Kyle Penfound (Dagger); Mauricio “Salaboy” Salatino (Diagrid)
Venue: Building B | Level 4 | B405-406a, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Over the last 10 years, Kubernetes has changed and unified the underlying APIs to define and run our cloud native ecosystems. But Kubernetes is just the start, as the vibrant ecosystem of tools designed to add capabilities on top of your clusters is far from being unified and interoperable. This has the unintended consequence of introducing rigidity into your organization’s platform, which manifests as reduced iteration speed and prolonged timeframes to adopt new technologies. In this presentation, Marcos and Mauricio will showcase existing blueprints that combine different projects to build platforms. They will then cover three aspects (platform capabilities, platform APIs, and experiences) that can help cloud-native projects deliver consistent experiences for platform teams, mixing and matching tools to add capabilities to their platforms. This presentation not only focuses on the end-user platforms but also on how we can all contribute to driving consistency in the ecosystem.
Node Manager: How Yahoo Manages Thousands of Nodes at Scale?#
Time: 3:00pm EST - 3:30pm EST
Speakers: Payal Patel (Principal Software Development Engineer, Yahoo)
Venue: Building B | Level 4 | B406b-407, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Yahoo has been an early adopter of Kubernetes, operating 37 on-premises K8s clusters that host 2,700 applications across more than 8,000 physical nodes. The daily management of these clusters and their physical nodes presents significant challenges, including upgrading the K8s version, applying security updates, upgrading the OS version of nodes, addressing node failures, and safely removing faulty nodes from rotation to prevent impact on application pods. Yahoo developed the Node Manager operator, which helps execute maintenance tasks, performs controlled K8s version upgrades for control plane and worker nodes without impacting running applications, runs continuous health checks on nodes as configured per nodegroup, and auto-remediates nodes when issues arise. This talk will discuss how Yahoo manages these operations at scale through automation, reducing manual work for engineers and improving efficiency in their infrastructure management.
CLBO: ClashLoopBackOff: Attendee Edition#
Time: 3:00pm EST - 4:00pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Project Pavilion, Atlanta, GA, USA
Type: PROJECT OPPORTUNITIES
Description: Come and join a competition of people using their technical ingenuity and creativity to solve a challenge put forth by the Scheduler (host). Time is limited and stakes are high, as this isn’t just a “live demo” for the masses. Over the course of twenty minutes, competitors will attempt to resolve a broken cluster or deploy a service to production. At the end of the time, entries will be judged on four categories: Stability, Resiliency, Flexibility, and Observability. Participants won’t know what challenge they’ll be given ahead of time, but any cloud resources or APIs will be given out before the challenge is announced. During the competition, the Scheduler will engage with the audience, ask questions, and perhaps engage in some light roasting of good and bad ideas. Join us, root for our competitors, and feel free to engage live!
Sound Bath#
Time: 3:20pm EST - 3:35pm EST
Venue: Building C | Level 1 | C108, Atlanta, GA, USA
Type: EXPERIENCES
Description: Immerse yourself in a 15-min “Sound Bath” meditative experience, where you’ll be enveloped in the healing vibrations of crystal singing bowls, gongs, and chimes to release stress, restore balance, and promote deep relaxation. Simply lie down, breathe deeply, and let the resonant sounds wash over you, guiding your mind and body toward a state of peace, calm, and rejuvenation.
Coffee Break ☕#
Time: 3:30pm EST - 4:00pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: BREAKS
Appreciation for Newly Graduated Projects + Project Birthdays Celebration 🎉#
Time: 3:30pm EST - 4:00pm EST
Venue: Building B | Level 1 | Sponsor Showcase, Project Pavilion, Atlanta, GA, USA
Type: EXPERIENCES
Description: Join us in the Project Pavilion to recognize our newest Graduated Projects 🎉 Crossplane, Dragonfly + Knative! Grab a treat & stop by these projects’ kiosks to say congratulations on their recent graduation or project birthday! Project Birthdays! 10 Years: Containerd, Helm, in-toto, + OPA; 11 Years: Harbor + Prometheus; 12 Years: etcd; 14 Years: Fluent; 15 Years: Vitess; 16 Years: TUF
Learning Lounge: Don’t Cross Wires, Cross-Skill: Aligning Teams Around Smart Learning Paths#
Time: 3:30pm EST - 3:45pm EST
Speakers: Mary Campbell & Randi Armour (Account Executive, Educational Solutions, Linux Foundation Education)
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Learning Lounge, Atlanta, GA, USA
Type: EXPERIENCES
Description: 10-Minute Tip Talk
🤟 Sign Language Crash Course#
Time: 3:30pm EST - 4:30pm EST
Venue: Building B | Level 2 | B211-212, Atlanta, GA, USA
Type: INCLUSION + ACCESSIBILITY
Description: This interactive crash course introduces participants to American Sign Language (ASL) along with standardized signs for cloud-native terms such as CNCF, Kubernetes, containers, and service mesh. Through hands-on practice, attendees will gain practical signing skills they can immediately use to connect more effectively with Deaf and hard-of-hearing community members in the cloud-native ecosystem and beyond. No prior sign language experience is required, just curiosity and openness to learning. This goal supports the CNCF commitment to fostering a welcoming cloud-native community for all.
Gold Sponsor In-Booth Demos#
Time: 3:30pm EST - 4:00pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: Sponsor: Port Demo: Port: The Agentic Engineering Platform Booth Number: 430
Sponsored Demo: Running etcd in Production: Best Practices & Rescue Recipes#
Time: 3:30pm EST - 3:50pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: Running etcd in production isn’t always smooth sailing - slow disks, network hiccups, or unexpected leader elections can quickly bring clusters to their knees. In this demo, we’ll walk through proven best practices for deploying and operating etcd at scale, then showcase “rescue recipes” for common problems like instability, database bloat, and failed upgrades. You’ll leave with practical guidance to keep your clusters healthy and recover fast when things go wrong.
Sound Bath#
Time: 3:40pm EST - 3:55pm EST
Venue: Building C | Level 1 | C108, Atlanta, GA, USA
Type: EXPERIENCES
Description: Immerse yourself in a 15-min “Sound Bath” meditative experience, where you’ll be enveloped in the healing vibrations of crystal singing bowls, gongs, and chimes to release stress, restore balance, and promote deep relaxation. Simply lie down, breathe deeply, and let the resonant sounds wash over you, guiding your mind and body toward a state of peace, calm, and rejuvenation.
Optimizing Multi-Agent LLM Workloads With AMD GPUs and Kueue#
Time: 4:00pm EST - 4:30pm EST
Speakers: Yuchen Fama (Cofounder, CTO, Cognality Learning); Jodie Su (AMD); Zhiming Shen (CTO, Exostellar)
Venue: Building B | Level 4 | B401-402, Atlanta, GA, USA
Type: AI + ML
Description: LLM inference workloads present distinct compute-memory phase transitions. Prompt ingestion involves compute-bound attention calculations, whereas token generation becomes memory-bound due to repeated parameter loading from DRAM and HBM. Multi-agent systems integrate heterogeneous components with disparate resource demands that must operate synchronously. This session showcases how AMD GPUs and Kueue optimize compute and memory partitioning, binpacking, and colocation of tightly coupled agentic workflows alongside inference tasks with bursty resource patterns. Attendees will learn strategies to design advanced scheduling and binpacking for agent interaction workflows to achieve 50-70% higher throughput compared to traditional approaches. We’ll demonstrate how high-capacity, high-bandwidth GPUs such as AMD MI355x are optimized for mixed-workload AI applications and leverage unified memory access to minimize cross-component latency while preserving isolation.
The Future of Debugging Is No Debugging: Observability Is Dead#
Time: 4:00pm EST - 4:30pm EST
Speakers: Jeremy Adams (Developer Advocate, Neo4j)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 1, Atlanta, GA, USA
Type: AI + ML
Description: Modern observability stacks are complex, powerful, and… sometimes, overkill. As engineers, we’ve grown obsessed with tracing every request, logging every event, and instrumenting every line of code. But is that really making debugging easier? Or are we just drowning in data that we rarely use effectively? In this talk, I’ll challenge the dominant narrative that better observability equals better debugging. Drawing from my experiences with Dagger, I’ll argue that debugging itself is a symptom of reactive thinking, chasing down problems after they’ve already happened. Instead, we should shift our focus to proactive design: immutable pipelines where container states are frozen and reproducible, reducing variability; ephemeral, disposable environments that cleanly reset after every run; automated rollback and self-healing mechanisms that correct errors before they escalate; and AI agents that adapt workflows dynamically to avoid known failure patterns.
From Kubestronaut To Production Hero: Turning Study Paths Into Real-World Wins#
Time: 4:00pm EST - 4:30pm EST
Speakers: David Pech (Wrike); Pedro Célestin (CLDF)
Venue: Building B | Level 3 | B308-309, Atlanta, GA, USA
Type: CLOUD NATIVE EXPERIENCE
Description: What if your certification study plan could solve tomorrow’s production issue? In this session, two CNCF Golden Kubestronauts — from Brazil and Czechia — share how their certification journeys weren’t just about passing exams, but rather about preparing for actual firefights in production. We’ll unpack real-world scenarios where skills gained while preparing for certifications like Istio, OpenTelemetry, and CKA turned into practical fixes for observability gaps, traffic chaos, and service resilience challenges. We’ll also share candid advice on which learning paths really translated into job impact, how to structure your upskilling around active projects, and why some certs didn’t quite live up to their promise in day-to-day ops. If you’re looking to grow your work impact, this session will help you connect your learning path with your production path — and turn every study sprint into a career boost.
Extending Kubernetes API: The Hidden Power of Aggregated Server Objects#
Time: 4:00pm EST - 4:30pm EST
Speakers: Amir Malka (Senior Software Engineer, ARMO)
Venue: Building B | Level 2 | B206, Atlanta, GA, USA
Type: EMERGING + ADVANCED
Description: Kubernetes Custom Resource Definitions (CRDs) are the de facto method for extending the Kubernetes API. While powerful and flexible, CRDs rely on etcd for storage, making them suboptimal for managing larger objects such as Software Bill of Materials (SBOMs) or other high-volume datasets since they create a high load on etcd. This talk re-introduces API server aggregation as an alternative extension mechanism for those who need to become more familiar with it. By leveraging this lesser-known Kubernetes feature, projects like Kubescape have successfully managed oversized objects efficiently without burdening etcd. In this session we will dive into the technical architecture, use cases, and the real-world benefits of using this technology via the use-case example of Kubescape.
A Journey To Zero-Downtime Upgrades With Keycloak#
Time: 4:00pm EST - 4:30pm EST
Speakers: Martin Bartoš; Ryan Emerson (IBM)
Venue: Building C | Level 3 | Georgia Ballroom 3, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: In order to mitigate the impact of CVEs and allow continuous delivery of features, it is crucial that upgrades can be rolled out seamlessly. For stateless applications, zero-downtime upgrades are a solved problem, but for stateful applications, upgrades can present a significant challenge. As the leading open-source identity and access management solution, Keycloak is a critical component in many organizations’ infrastructure. Achieving maximum uptime is vital in order for dependent services to function. Join us to discover how Keycloak has evolved to support zero-downtime rollouts of configuration changes and patch upgrades. In this talk we explain the technical and project management challenges we faced, the measures taken to overcome them, and what best practices you can leverage in your projects to enable zero-downtime upgrades. Key focus areas will be the Keycloak Operator, how we ensure clustering compatibility, testing strategies, and our plans for the future.
KubeEdge DeepDive: Extending Kubernetes To the Edge With Real-World Industry Use Cases#
Time: 4:00pm EST - 4:30pm EST
Speakers: Tina Tsou (TikTok); Hongbing Zhang (KubeEdge TSC Member, DaoCloud); Huan Wei (Senior Technical Director, Hangzhou harmonycloud Co., Ltd); Yin Ding (KubeEdge TSC Member, KubeEdge)
Venue: Building C | Level 3 | Georgia Ballroom 1, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: In this session, KubeEdge project maintainers will provide an overview of KubeEdge’s architecture and explore its industry-specific use cases. The session will kick off with a brief introduction to edge computing and its growing importance in IoT and distributed systems. The maintainers will then delve into the core components and architecture of KubeEdge, showcasing how it extends the capabilities of Kubernetes to manage edge computing workloads efficiently. Drawing on a range of industry use cases, including smart cities, industrial IoT, edge AI, robotics, and retail, the maintainers will share success stories and insights from organizations that have deployed KubeEdge in their edge environments, highlighting the tangible benefits and transformational possibilities it offers. The session will also provide a detailed introduction to the certified KubeEdge conformance test and cover advancements in technology and community governance in KubeEdge.
The Next Decoupling: From Monolithic Cluster, To Control-Plane With Nodes#
Time: 4:00pm EST - 4:30pm EST
Speakers: Justin Santa Barbara (Google); Ciprian Hacman (Microsoft)
Venue: Building C | Level 1 | C111-112, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Historically, a Kubernetes cluster is both a control plane and worker nodes; management tools create and manage both. This model is simple to understand and effective, but accidentally introduces operational friction, particularly around upgrades, scalability, and cloud portability. The community has tackled many of those issues with projects such as cluster-api and karpenter, moving node management to the extensible Kubernetes control plane. In this session, the kOps maintainers will explore this paradigm. We propose a clear division of responsibility where tools like kOps focus on their core strength - bootstrapping a robust, production-grade control plane - and then cede node lifecycle management to common in-cluster, Kubernetes-API-driven tools like cluster-api and karpenter. We’ll show how this addresses past challenges by abstracting node management behind the Kubernetes API itself, creating a powerful “unified but interchangeable” story for “cluster” management tooling.
Debugging Your Cluster When It’s on Fire#
Time: 4:00pm EST - 4:25pm EST
Speakers: Nikola Grcevski (Grafana Labs); Tyler Yahn (Senior Software Engineer, Splunk)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 4, Atlanta, GA, USA
Type: OBSERVABILITY
Description: Imagine you have a Kubernetes cluster that’s hosting some number of services. Perhaps these services are written in different programming languages; perhaps there are some databases in the cluster too. Now, imagine that this cluster is intermittently experiencing errors and it’s not easy to tell what’s going on. In this talk we will show you how you can add detailed telemetry immediately to a problematic production cluster, without any changes to your existing cluster configuration or applications, with the new OpenTelemetry eBPF Instrumentation project. We’ll show you how to get insights into what’s wrong with your cluster or services by leveraging on-demand distributed traces and connectivity graphs, even if it’s the first time you have heard the term OpenTelemetry. We’ll discuss the design principles which make this technology safe to deploy in an already problematic environment, without further compromising the stability of your cluster.
Zero-Downtime Telemetry: Hot Reloading OpenTelemetry Collector Pipelines#
Time: 4:00pm EST - 4:30pm EST
Speakers: Amir Jakoby (CTO and Co-Founder, Sawmills); Shiran Melamed (DevOps Group Leader, JFrog)
Venue: Building B | Level 3 | B304-305, Atlanta, GA, USA
Type: OBSERVABILITY
Description: Cloud-native systems demand real-time adaptability—yet updating OpenTelemetry Collector processors often means restarts, risking data loss and breaking observability. In this talk, we introduce a powerful hot reload mechanism for dynamic reconfiguration of processors like filters, samplers, and transformers—without ever restarting the collector. Attendees will learn how to architect hot-swappable processors that respond instantly to pipeline changes, enabling seamless updates, uninterrupted data flow, and zero downtime. Whether you’re managing complex telemetry at scale or just tired of fragile restarts, this session delivers the blueprint for a more resilient, continuously observable stack.
Component Contributor Architecture: Democratizing Platform Engineering With CNCF Projects#
Time: 4:00pm EST - 4:30pm EST
Speakers: Anoop Gopalakrishnan & Jerome Guionnet (VP of Engineering, Guidewire)
Venue: Building B | Level 4 | B405-406a, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: At Guidewire, we needed a Kubernetes platform that reliably runs mission-critical insurance workloads while empowering teams to innovate rapidly. We developed the Component Contributor Model, combining Crossplane’s resource abstraction with KubeVela’s Open Application Model into a modular, extensible architecture. Application teams can now build and integrate platform components directly, eliminating centralized bottlenecks. We standardized a component lifecycle to scale across diverse cloud services. Explicit trust boundaries preserve security while granting developer autonomy. A governance layer balances speed with stability. In this session, you’ll discover how we configured a contributor-driven, secure platform; managed component lifecycles at scale; and applied practical governance frameworks. You’ll gain actionable patterns for leveraging Crossplane and KubeVela in production, plus strategies to evolve your platform as organizational demands grow.
Designing Platforms With Judgment: Agentic Flows With MCP#
Time: 4:00pm EST - 4:30pm EST
Speakers: Shivay Lamba (Developer Relations Engineer, Qualcomm); Ekansh Gupta (Software Engineer, SigNoz)
Venue: Building B | Level 4 | B406b-407, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Imagine a GitOps operator that doesn’t just blindly apply manifests but understands cost constraints, incident context, and security posture before acting. Platform engineering today isn’t just about scaling infrastructure, it’s about building intelligent systems that can interpret developer intent, adapt to change, and make responsible decisions autonomously. In this session, we introduce a new architectural pattern: agentic platforms. These platforms embed situational awareness into the delivery pipeline, transforming ops from reactive to responsive. By incorporating policy, workload, and environment signals, they enable secure-by-default provisioning, context-aware automation, and smart failure recovery. We’ll explore how the Model Context Protocol (Kagent) supports this, offering a flexible way to inject context into every automation path. Through real-world scenarios, you’ll see how MCP-powered agentic flows help platforms say no when needed and manage uncertainty gracefully.
Not Forking Around: Leveraging NRI To Extend Kubernetes at Scale#
Time: 4:00pm EST - 4:30pm EST
Speakers: Johan Jensen & Wesley Bermbach (Sr Software Engineer, Uber)
Venue: Building B | Level 3 | B312-314, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Transitioning a fleet to Kubernetes is challenging, even in the simplest environments. But what happens when hard business requirements demand features that Kubernetes doesn’t support yet? At Uber, we faced this challenge as we migrated over 200,000 database workloads from our proprietary infrastructure to Kubernetes. In this talk, we’ll share how we used the Node Resource Interface (NRI) as a powerful interface to extend Kubernetes with custom features critical to our business, without forking the core. NRI enabled us to meet our immediate needs while providing a clear path to deprecate it once upstream Kubernetes has the features we need. We’ll walk through our migration strategy, the missing features that drove our use of NRI, and how we successfully completed the transition. Finally, we’ll outline our plans to phase out NRI, adopt a fully vanilla Kubernetes stack, and share key lessons others can take from our journey.
Safely Sourcing OSS - Beyond 0 CVEs#
Time: 4:00pm EST - 4:30pm EST
Speakers: John Kjell (Principal Cloud Native Consultant, ControlPlane)
Venue: Building B | Level 3 | B302-303, Atlanta, GA, USA
Type: SECURITY
Description: As container images shrink and teams chase the elusive “0 CVE” scan, a host of other threats lurk beneath the surface of open source software. Security is more than vulnerabilities; it’s about trust, transparency, and maintainability. Open source can be: - Improperly governed – at risk of hostile takeovers - Maliciously licensed – hiding legal landmines - End-of-life – abandoned with no path forward - Poorly documented – where “read the code” is the only option - Untested – bugs waiting to detonate at scale - Insecurely released – exposing the supply chain These non-obvious risks often paralyze teams trying to make informed choices. But a new generation of tools is emerging to bring clarity. We’ll explore how CNCF projects and Linux Foundation initiatives are using OpenSSF’s Security Scorecards, SLSA, Security Baseline, and the 2025 updated TAG Security guidance on supply chain security to surface and share critical metadata that empowers safer open source adoption.
Sponsored Demo: Right-Sizing Kubernetes: Gain Performance and Efficiency in OpenShift and Beyond#
Time: 4:00pm EST - 4:20pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: As Kubernetes workloads scale across platforms like OpenShift, EKS, GKE, and AKS, right-sizing becomes essential to balance performance and efficiency. In this session, we’ll explore how the latest Kubecost capabilities help engineering and platform teams make smarter right-sizing decisions with enhanced usage visualizations, GPU-aware recommendations, and automated container-level controls. We’ll also touch on architectural upgrades that improve responsiveness and scalability. You’ll walk away with real-world strategies to cut waste, plan resources more effectively, and improve performance across cloud-native environments. In order to facilitate networking and business relationships at the event, you may choose to visit a third party’s booth or access sponsored content. You are never required to visit third party booths or to access sponsored content. When visiting a booth or participating in sponsored activities, the third party will receive some of your registration data. This data includes your first name, last name, title, company, address, email, standard demographics questions (i.e. job function, industry), and details about the sponsored content or resources you interacted with. If you choose to interact with a booth or access sponsored content, you are explicitly consenting to receipt and use of such data by the third-party recipients, which will be subject to their own privacy policies.
📚 Tutorial: Unlock the Future of Kubernetes and Accelerators With Dynamic Resource Allocation (DRA)#
Time: 4:00pm EST - 5:15pm EST
Speakers: Rey Lejano (Specialist Adoption Architect, Red Hat)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 2-3, Atlanta, GA, USA
Type: 📚 TUTORIALS
Description: At the heart of the AI revolution are GPUs, and the platform that provides access to them is Kubernetes. Workloads have historically accessed GPUs and other devices through the device plugin API, but that API lacks features. The new Dynamic Resource Allocation (DRA) feature helps maximize GPU utilization across workloads with additional capabilities such as controlling device sharing across Pods, using multiple GPU models per node, and handling dynamic allocation of multi-instance GPU (MIG) partitions. DRA is not limited to GPUs; it covers any specialized hardware that a Pod may use, including network-attached resources such as edge devices like IP cameras. DRA is a new way to request resources like GPUs and gives you precise control over how resources are shared between Pods. This tutorial introduces DRA, reviews what happens “behind the scenes” of DRA in the Kubernetes cluster, and walks through multiple ways to use DRA to request a GPU and a network-attached resource.
Project Demo#
Time: 4:30pm EST - 4:55pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5 | Project Pavilion, Atlanta, GA, USA
Type: PROJECT OPPORTUNITIES
Sponsored Demo: Building a Cloud-Native Internal Developer Platform on Kubernetes#
Time: 4:30pm EST - 4:50pm EST
Venue: Building B | Level 1 | Exhibit Hall B3-B5, Atlanta, GA, USA
Type: SOLUTIONS SHOWCASE
Description: You’ve just mastered Kubernetes, and now you’re being asked to support next-generation cloud-native workloads. But where do you start when building an internal developer platform that connects essential services and tooling without locking into a proprietary service or spending weeks integrating open source tools? In this demo-driven session, Akamai will show how to operationalize a cloud native stack using familiar CNCF technologies. Learn practical examples of deploying multi-tenant Kubernetes environments and assembling an internal developer platform (IDP).
Partitionable Devices: Putting the “Dynamic” Back in Dynamic Resource Allocation#
Time: 4:45pm EST - 5:15pm EST
Speakers: Morten Jæger Torkildsen (Google); Jan-Philip Gehrcke (NVIDIA)
Venue: Building B | Level 4 | B401-402, Atlanta, GA, USA
Type: AI + ML
Description: Not all workloads need giant GPUs. Think about inference - smaller models could use smaller GPUs. NVIDIA’s Multi-Instance GPUs (MIGs) are meant to solve this problem, but using them in Kubernetes has been a big hassle requiring static pre-provisioning of these partitions, or use of specialized CRDs and tooling. Imagine if you could just ask for how much memory your model needs, and Kubernetes would dynamically provision a partition just big enough to fit it! Dynamic Resource Allocation (DRA) can make that happen! Come learn how the latest version of DRA implements simple, on-demand provisioning of MIGs based on the resource needs of your workload. You’ll also learn about how that same feature enables similar use cases with other accelerator technologies like Google’s TPU. Discover how to optimize GPU utilization and see it in action with a demo!
Taming the AI Hydra: Real-World Lessons in Governing AI Across the Enterprise#
Time: 4:45pm EST - 5:15pm EST
Speakers: Brian Fox (Co-founder and CTO, Sonatype); Sarah Evans (Distinguished Engineer, Dell Technologies); Christopher Robinson (Security Lorax, OpenSSF)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 1, Atlanta, GA, USA
Type: AI + ML
Description: As AI races into the enterprise, security, legal, and development teams are being forced to rethink governance at scale. From hallucinating copilots to uncontrolled shadow AI adoption, organizations are grappling with how to govern models, data, and outputs without stifling innovation. In this panel, three industry leaders on the front lines of open source and enterprise AI — Brian Fox (Sonatype), Christopher “CRob” Robinson (OpenSSF), and Sarah Evans (Dell Technologies) — discuss the emerging best practices, cultural challenges, and open source opportunities in AI governance. They will explore: -Lessons from existing software governance that do and don’t translate to AI -The risks of duplicative AI policy efforts across teams -The role of open source foundations in shaping responsible AI norms -How to balance transparency, control, and velocity with AI adoption -Where regulation helps (and where it hinders) innovation
TikTok’s IPv6 Journey To Cilium: Pitfalls and Lessons Learned#
Time: 4:45pm EST - 5:15pm EST
Speakers: Giri Kuncoro & Joseph Pallamidessi (Site Reliability Engineer, Cloud Native Security @TikTok, TikTok)
Venue: Building B | Level 3 | B308-309, Atlanta, GA, USA
Type: CONNECTIVITY
Description: Cilium has been the standard for Kubernetes networking and security. TikTok migrated clusters to Cilium for its advanced security features like mutual authentication, along with high-performance networking and enhanced observability. The main challenge was executing this in TikTok’s IPv6-only data centers, as Cilium has been battle-tested with IPv4 and dual-stack, but not with IPv6-only environments. This talk shares the journey of making Cilium work for IPv6-only Kubernetes, highlighting the limitations and the techniques to overcome them. First, Cilium doesn’t support tunneling over IPv6, so native routing mode must be configured. Second, we encountered several bugs specific to IPv6-only setups: NDP traffic getting dropped by Cilium Network Policy due to incorrect identification; DNS policy not allowing traffic for IPv6 DNS servers; and broken Cilium debug tools when IPv4-related BPF maps are not found. Finally, a NodePort timeout issue was blocking us from enabling Cilium to fully replace kube-proxy.
Deploying Lightweight AI Agents at the Healthcare Edge With K8s + Ollama#
Time: 4:45pm EST - 5:15pm EST
Speakers: Gary Arora & Samarth Shah (Chief Architect Cloud and AI Solutions, Deloitte)
Venue: Building B | Level 2 | B206, Atlanta, GA, USA
Type: EMERGING + ADVANCED
Description: AI agents are reshaping healthcare operations but traditional centralized LLMs come with challenges: high latency, data privacy concerns, and steep cloud costs. In this talk, we’ll explore how lightweight, Kubernetes-native deployments of AI agents powered by Ollama and K3s/MicroK8s enable intelligent, autonomous operations directly at the healthcare edge. We’ll walk through a real-world architecture where multi-agent systems orchestrate hospital workflows like patient triage, imaging coordination, and resource scheduling all without sending sensitive data offsite. You’ll see how small LLMs deployed locally can drive powerful workflows, the K8s primitives used to scale and monitor agents, and how this approach achieves both operational efficiency and regulatory compliance (HIPAA, GDPR). This talk blends cloud-native engineering, AI orchestration, and real healthcare needs and offers a blueprint for deploying resilient, scalable AI agent ecosystems anywhere edge computing is needed.
Introduction To TAG Infrastructure#
Time: 4:45pm EST - 5:15pm EST
Speakers: Dylan Page (Eng Manager, Core Infra | CNCF TAG Infrastructure Co-Chair, Lambda.ai); Kashif Khan (Maintainer, metal3.io)
Venue: Building C | Level 1 | C111-112, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: This presentation will introduce the recently rebooted CNCF TAG Infrastructure. We’ll cover its operational structure and its collaborative efforts with CNCF projects in key areas such as data and storage, networking, DNS, compute, service mesh, infrastructure lifecycle, edge computing, sovereignty, and load balancing. We will also highlight our ongoing work in developing ecosystem guidance and whitepapers. Attendees will learn how to contribute to and participate in the CNCF Infrastructure community, and gain practical insights into leveraging cloud-native infrastructure in their own environments.
Istio Project Updates: AI Inference, Ambient Multicluster & Default Deny#
Time: 4:45pm EST - 5:15pm EST
Speakers: Keith Mattix (Principal Software Engineer Lead, Microsoft)
Venue: Building C | Level 3 | Georgia Ballroom 3, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: The Istio community has been hard at work making Istio even better for the hundreds of users and organizations that use it in production. Come hear about the most exciting new features from the Istio Technical Oversight Committee as well as a roadmap for what we aim to accomplish next year.
Kubernetes Infra SIG: Intro and Updates#
Time: 4:45pm EST - 5:15pm EST
Speakers: Mahamed Ali (Senior DevOps Engineer, Arab Center for Research & Policy Studies)
Venue: Building C | Level 3 | Georgia Ballroom 1, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: The Kubernetes Infrastructure SIG is responsible for maintaining the overall infrastructure of the Kubernetes project. In this session, we will take a deep dive into some of the projects that the SIG is currently working on, as well as existing collaborations with other platform providers and Kubernetes SIGs. We will also provide an update on the current state of the SIG and explore what’s next.
Designing for Observability: From Noise To Insight#
Time: 4:45pm EST - 5:15pm EST
Speakers: Andrea Chomiak (Product designer, Dash0)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 4, Atlanta, GA, USA
Type: OBSERVABILITY
Description: Engineers often face cognitive overload, noisy dashboards, and alert fatigue, especially when working with complex tools like OpenTelemetry. These challenges aren’t just technical; they’re usability failures that impact incident response and developer experience. Yet the success of observability platforms often hinges on something less visible: design. Behind every intuitive dashboard and frictionless workflow is a designer translating complexity into clarity. This session offers a behind-the-scenes look from a product designer working in observability. Drawing on real-world experience designing for Kubernetes-native platforms built on OpenTelemetry, the speaker shares how design decisions shape engineers’ ability to understand, trust, and act on telemetry data, especially during high-pressure incidents. Takeaways: Why design is critical to observability and DevEx, plus actionable strategies for visualizing complex telemetry.
OpenTelemetry Logs Driving a Major Shift: Events, Richer Data, and Smarter Semantics#
Time: 4:45pm EST - 5:15pm EST
Speakers: Robert Pająk (OpenTelemetry Go maintainer and specification sponsor, Splunk, a Cisco Company)
Venue: Building B | Level 3 | B304-305, Atlanta, GA, USA
Type: OBSERVABILITY
Description: OpenTelemetry Logs are no longer the least mature signal. They’re driving major changes across the project. This talk explores recent developments, including the introduction of OpenTelemetry Events, richer semantic conventions, and support for complex attribute values like nested objects and arrays. These changes are not isolated. They represent a coordinated effort to unify and modernize telemetry data, improve correlation across signals, and enable richer, more structured observability experiences. This session will dive into the technical challenges, design decisions, and emerging patterns that are turning logs into a first-class citizen in the OpenTelemetry ecosystem. It makes the case that logs are no longer “legacy”; they’re a foundation for smarter, more unified observability. Whether you’re a platform engineer, SRE, or tooling vendor, understanding this shift is key to staying ahead as OpenTelemetry evolves.
Container Runtime Customization at Netflix: A Case Study With NRI and OCI Hooks#
Time: 4:45pm EST - 5:15pm EST
Speakers: Erikson Tung (Software Engineer, Netflix)
Venue: Building B | Level 4 | B405-406a, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Titus, Netflix’s Kubernetes-based container platform, runs hundreds of thousands of containers globally. This case study details Titus’s use of containerd’s Node Resource Interface (NRI) and OCI hooks to adapt a custom compute platform to a more conventional dataplane while operating at scale and maintaining Kubernetes compatibility. These extensions support Netflix’s unique workload needs, custom business logic, sidecar management, and systemd-compatible runtime environments. The session will give an overview of Titus and its migration to a standard Kubernetes distribution while preserving specialized runtime capabilities. It will also provide a deep dive into the NRI plugin/OCI hook implementation, detailing Titus workload lifecycle management, special network configuration, storage handling, and sidecar management. The presentation will conclude with an examination of the challenges and lessons learned from scaling these runtime extensions.
How Comcast Leverages Radius in Their Internal Developer Platform#
Time: 4:45pm EST - 5:15pm EST
Speakers: Nick Beenham (Distinguished Engineer, Comcast); Jonathan Smith (Product Lead, Azure Open Source Incubations, Microsoft)
Venue: Building B | Level 4 | B406b-407, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Building cloud-native applications that can be deployed across multiple cloud environments is challenging, especially when using cloud resources beyond Kubernetes. In this session, learn about Comcast’s journey of discovering a more intuitive way to combine application and infrastructure definitions and how they are using Radius to manage resources across multiple cloud environments. As a contributor to the Radius project, Comcast is using Radius custom resource types to model application resources deployed in the cloud and on-premises. This session will explore key features and benefits of custom resource types in Radius as well as real-world use cases and best practices for implementing Radius in a multi-cloud enterprise environment.
On the Origin of Platforms: Evolution of a Capital One Enterprise Platform#
Time: 4:45pm EST - 5:15pm EST
Speakers: Bradley Whitfield & Jacob Walden (Distinguished Engineer - Platform Engineering, Capital One)
Venue: Building B | Level 3 | B312-314, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: In 2017, our team set out to build a Kubernetes developer platform for microservices at Capital One. This talk explores our platform evolution, from a proof-of-concept serving one team to a robust platform serving the enterprise. Attendees will learn how we leveraged the operator pattern to incorporate Capital One architectural patterns into the Kubernetes API. We’ll also explore how building trust and partnerships is a critical, yet often overlooked, aspect of platform engineering. Expanding on the operator pattern, our platform evolved its capabilities into managing cloud resources outside the cluster with Kubernetes controllers and a managed Kubernetes offering to deploy off-the-shelf applications. Join us to hear more about the obstacles we overcame, practical lessons learned, and what we are exploring next as the platform continues its evolution.
Securing AI Agent Infrastructure: AuthN/AuthZ Patterns for MCP and A2A#
Time: 4:45pm EST - 5:15pm EST
Speakers: Yoshiyuki Tabata (Hitachi, Ltd.)
Venue: Building B | Level 3 | B302-303, Atlanta, GA, USA
Type: SECURITY
Description: Is your AI agent infrastructure secure? As AI agents begin to exchange model context and coordinate across systems, secure interaction is no longer optional—it’s essential. To bring structure to these interactions, protocols like Model Context Protocol (MCP) and Agent-to-Agent (A2A) have emerged, offering standardized ways for agents to communicate. Adopting these protocols introduces new responsibilities. Developers must implement authentication and authorization (AuthN/AuthZ) mechanisms that comply with MCP and A2A while remaining practical for real-world deployment. In this session, Yoshiyuki Tabata shares best practices for designing AuthN/AuthZ and shows how to apply key principles from the CNCF IAM whitepaper to AI agent infrastructure—such as OAuth-based API access, P*P architecture for authorization, and workload authentication. The session includes a demo of secure AuthZ for an MCP server using Keycloak, illustrating how these practices apply in real-world agent interactions.
⚡ Lightning Talk: Tracing the Untraceable: OpenTelemetry for ‘Vibe-Coded’ LLM Apps#
Time: 4:45pm EST - 4:50pm EST
Speakers: Pranay Prateek (Maintainer, SigNoz)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: ⚡ LIGHTNING TALKS
Description: Large-language-model-powered features feel like magic, until the magic misfires in production. Unlike the rest of our cloud-native stack, LLM apps are non-deterministic: the same prompt can yield wildly different outputs depending on model state, temperature, etc. Traditional pre-prod tests catch only a fraction of these edge cases. In this lightning talk, I’ll demo a “vibe-coded” LLM application that looks fine in dev but hallucinates customer-visible issues once real traffic hits. I will then inject OpenTelemetry spans and semantic attributes (prompt, temperature, model version, vector-DB latency, user feedback) and stream them into an open source observability backend. Attendees can watch in real time as traces pinpoint where uncertainty mutates into failure, turning opaque AI behavior into measurable, alertable signals. Attendees will leave with a template for making their application more observable and reliable using OpenTelemetry.
⚡ Lightning Talk: Getting (and Staying) up To Speed on DRA With the DRA Example Driver#
Time: 4:52pm EST - 4:57pm EST
Speakers: Jon Huhn (Software Engineer, Microsoft)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: ⚡ LIGHTNING TALKS
Description: As Dynamic Resource Allocation (DRA) matures in Kubernetes, the dra-example-driver (https://github.com/kubernetes-sigs/dra-example-driver) provides an end-to-end starting point for device vendors to create their own drivers, a lightweight driver implementation that can be deployed to clusters without any special hardware for users to rapidly experiment with DRA, and examples showcasing how the latest and greatest DRA features can be implemented by drivers and leveraged by users. This talk from a project maintainer will highlight the importance of the dra-example-driver as a key component across the DRA ecosystem as DRA continues to evolve. It will also call on the community to aggregate its best practices and more examples there to reinforce the dra-example-driver as the go-to place for referencing how DRA works in practice.
⚡ Lightning Talk: Graceful Controller Operations: Achieving Leader Election Without Restarts#
Time: 4:59pm EST - 5:04pm EST
Speakers: Jeffrey Ying (Software Engineer, Google)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: ⚡ LIGHTNING TALKS
Description: Leader election in Kubernetes often carries a hidden cost: disruptive container restarts during lease transitions. This lightning talk will illuminate the critical caveat in current leader election mechanisms, where the entire container is forcefully shut down and restarted via an os.Exit call to facilitate a lease change. We will explore the challenges this poses, including service disruption and the inability to perform graceful cleanup. This session will outline the work that has gone into making the entire transition process graceful, along with the new best practices for using the leader election client library.
⚡ Lightning Talk: Young? First-Gen? Female? New? Here’s Why You Belong Here Too#
Time: 5:06pm EST - 5:11pm EST
Speakers: Jennifer Weir (Kubernetes Platform Engineer, Ford Motor Company)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: ⚡ LIGHTNING TALKS
Description: “Community” is derived from communis, meaning “shared by all”. You’re part of many communities—some by birth, others by choice—and the ones you’re born into shape the communities you choose to join. Now introduce the complexity of underrepresentation and you may start to wonder if you belong there at all. Open source is the most inclusive solution to technology’s greatest problems, but did you know only 3% of the 2017 Open Source survey’s respondents were women? CNCF has the power to accept contributions based on merit and inspire community members (yes, that’s you!) to build confidence to take a seat at the table and contribute—whether it’s the 1st time or 100th. If communities are “shared”, but individuals belong to unique groups, how can you build and support a diverse cloud native community? This talk explores key research on underrepresentation in open source contributions, highlights the value of diverse perspectives in business, and explains why representation matters to you.
⚡ Lightning Talk: How We Used Data Structures When Contributing To the Kubernetes Project#
Time: 5:13pm EST - 5:18pm EST
Speakers: Arsh Sharma (Senior DevRel Engineer, MetalBear)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: ⚡ LIGHTNING TALKS
Description: There’s often a lot of debate in software engineering circles about the value of teaching data structures and algorithms to those just starting out. Honestly, when I was beginning my journey, I used to think of them as a complete waste of time too. This talk aims to challenge such notions by demonstrating how data structures can have real world applications in open source projects like Kubernetes. I’m one of the creators and maintainers of the depstat project, which Kubernetes uses to evaluate dependency updates. In this talk, I’ll share the design decisions we made while building this tool and how we leveraged data structures like graphs and graph traversal algorithms to implement it effectively. This talk will provide attendees with an understanding of what the depstat project does and also leave them with a deeper appreciation for the application of foundational computer science concepts in the world of open source!
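As a rough illustration of the idea in the description above, a set of dependencies can be modeled as a directed graph and walked with a depth-first traversal. The sketch below is not depstat's implementation: the module names, the `deps` mapping, and both helper functions are hypothetical and exist only to show the technique.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical dependency graph: each module maps to its direct dependencies.
deps: Dict[str, List[str]] = {
    "app": ["lib-a", "lib-b"],
    "lib-a": ["lib-c"],
    "lib-b": ["lib-c", "lib-d"],
    "lib-c": [],
    "lib-d": [],
}


def transitive_deps(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """Collect every module reachable from root with an iterative DFS."""
    seen: Set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; this also makes cycles safe
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen


def max_depth(
    graph: Dict[str, List[str]], root: str, _path: FrozenSet[str] = frozenset()
) -> int:
    """Length (in edges) of the longest dependency chain starting at root."""
    path = _path | {root}
    depths = [
        1 + max_depth(graph, child, path)
        for child in graph.get(root, [])
        if child not in path  # skip edges that would close a cycle
    ]
    return max(depths, default=0)


all_deps = transitive_deps(deps, "app")
longest_chain = max_depth(deps, "app")
```

Here `all_deps` holds the four libraries reachable from `app`, and `longest_chain` counts the edges on the deepest chain (for example `app` to `lib-a` to `lib-c`).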
⚡ Lightning Talk: Know Before You Go! Speedrun Intro To Gateway API#
Time: 5:20pm EST - 5:25pm EST
Speakers: Christine Kim (Software Engineer, Isovalent at Cisco)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: ⚡ LIGHTNING TALKS
Description: Over the past few years, you may have heard a big splash around Gateway API. There seems to be a lot of confusion about what it is, what it can do, and who it is for. Especially with ingress-nginx and ingate, Gateway API will need your help to narrow down features, gather feedback, and attract contributors. This speedy talk will cover the what and how of Gateway API, how to get involved, and what’s next for the project!
⚡ Lightning Talk: Summarizing the Noise: LLM Observability With Open Data Hub, vLLM, KServe and Prometheus#
Time: 5:27pm EST - 5:32pm EST
Speakers: Twinkll Sisodia (Senior Software Engineer, Red Hat)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: ⚡ LIGHTNING TALKS
Description: As large language models (LLMs) move into production, raw metrics alone aren’t enough. This talk presents an open-source AI observability solution built on Open Data Hub (ODH) that deploys LLMs using vLLM and KServe, scrapes inference metrics using Prometheus, and feeds them into a summarization model to generate actionable insights. We’ll demonstrate a working UI that translates low-level metrics like latency, GPU usage, and token throughput into human-readable summaries—giving platform teams an intelligent way to monitor LLMs at scale. No dashboards to interpret—just straight answers from your models about your models.
LLMs on Kubernetes: Squeeze 5x GPU Efficiency With Cache, Route, Repeat!#
Time: 5:30pm EST - 6:00pm EST
Speakers: Yuhan Liu (PhD Student, University of Chicago); Suraj Deshmukh (Senior Software Engineer, Microsoft)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 1, Atlanta, GA, USA
Type: AI + ML
Description: Struggling with GPU shortages & sky-high inference costs? You’re not alone. Deploying production-grade LLMs on K8s in 2025 still feels like competing in the GPU Hunger Games. But what if you could 5x your cluster’s efficiency using battle-tested Comp. Sci. principles, not magic? Join us to explore how the open-source project “Production Stack” (a first-party vLLM project) supercharges vLLM on K8s with: - Cache: Offload KV Cache using LMCache to CPU/disk/remote storage (no redundant computations!) - Smarter Routing: Match requests to GPUs with pre-computed caches (lower TTFT) - Fault Tolerance++: Migrate live requests mid-generation during failures - RAG Revolution: Blend non-prefix caches from retrieved chunks (CacheBlend=3x faster TTFT) - Benchmarks Don’t Lie: See 5x throughput vs. vanilla vLLM in real-world tests Whether you’re an Infra Engineer, ML Developer or SRE, this deep dive will leave you with actionable patterns to deploy faster, cheaper & more reliably.
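The “Smarter Routing” item above can be pictured as sending each request to the replica whose cached prefixes overlap most with the incoming prompt. The following is a minimal sketch under stated assumptions, not the Production Stack router: token lists stand in for prompts, and the `replica_caches` mapping, `shared_prefix_len`, and `pick_replica` are all hypothetical names.

```python
from typing import Dict, List, Sequence


def shared_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_replica(
    prompt: Sequence[str], replica_caches: Dict[str, List[Sequence[str]]]
) -> str:
    """Return the replica holding the longest cached prefix of the prompt.

    Ties (including the case where nothing is cached) go to the replica
    whose name sorts first, so the choice is deterministic.
    """
    best_name, best_len = None, -1
    for name in sorted(replica_caches):
        longest = max(
            (shared_prefix_len(prompt, cached) for cached in replica_caches[name]),
            default=0,
        )
        if longest > best_len:
            best_name, best_len = name, longest
    if best_name is None:
        raise ValueError("no replicas available")
    return best_name


# Hypothetical cache contents for two replicas.
caches = {
    "gpu-0": [["system", "you", "are"]],
    "gpu-1": [["system", "you", "are", "a", "helpful"]],
}
chosen = pick_replica(["system", "you", "are", "a", "pirate"], caches)
```

In this toy setup `chosen` is `gpu-1`, because its cached entry shares four leading tokens with the prompt while `gpu-0` shares three. A real router would also weigh load and cache eviction, which this sketch ignores.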
Prepare for Disruptions: How We Upgrade the Whole ML Training Fleet Bi-weekly#
Time: 5:30pm EST - 6:00pm EST
Speakers: Cong Gu & Ankit Goyal (Manager, AI Platform, LinkedIn, LinkedIn)
Venue: Building B | Level 4 | B401-402, Atlanta, GA, USA
Type: AI + ML
Description: Machine learning jobs are particularly vulnerable to node disruptions—whether planned (like host maintenance, kernel upgrades, or security patches) or unplanned (such as GPU ECC memory errors or sudden node failures). These interruptions can derail progress and waste valuable training time. In this talk, we’ll explore how to build disruption-tolerant ML infrastructure on Kubernetes that balances platform reliability with job continuity. We’ll cover techniques we’ve developed and battle-tested at scale, including: • Automatic Multi-stage checkpoint and restore of training jobs to allow fast and seamless recovery after interruptions. • Intelligent scheduling and smart collocation to account for node health, job characteristics, and maintenance timing. • Job-aware backpressure mechanisms that coordinate updates and reduce the likelihood of disruption during critical job phases. Attendees will leave with practical strategies for managing infrastructure disruptions leveraging Kubernetes.
Drasi: A New Take on Change-driven Architectures#
Time: 5:30pm EST - 6:00pm EST
Speakers: Aman Singh (Principal Software Engineer, Microsoft)
Venue: Building B | Level 3 | B308-309, Atlanta, GA, USA
Type: APPLICATION DEVELOPMENT
Description: Modern cloud-native systems constantly generate data changes, and applications often need to react to them. Building change-driven solutions that respond to specific changes in distributed data is challenging. This talk introduces Drasi, a CNCF Sandbox project that simplifies the design and implementation of change-driven architectures by codifying continuous query and reaction patterns. For example, with Drasi you can declaratively write automation to detect and respond to running containers with newly identified vulnerabilities across pods and deployments in a Kubernetes cluster. Join us for a walkthrough of real-world use cases and live demos that show how Drasi’s approach brings structure and responsiveness to complex distributed environments—without writing custom code.
Kubernetes at the Edge – Come See It in Action!#
Time: 5:30pm EST - 6:00pm EST
Speakers: Xavier Avrillier & Antonia von den Driesch (Solutions architect, Giant Swarm)
Venue: Building B | Level 2 | B206, Atlanta, GA, USA
Type: EMERGING + ADVANCED
Description: Edge computing is still a fairly new area in the cloud native tech industry and is growing fast. As computing moves to the edge, what does Kubernetes look like beyond the cloud, and why does it matter? This session features a live demo with a Raspberry Pi, camera, and real-time AI detection. Watch as our edge device identifies raised hands, sends data to a Kubernetes cluster via KubeEdge, and visualizes results instantly. We’ll explore: - Edge Kubernetes challenges: connectivity, resources, security - KubeEdge’s approach to decentralized workloads - Real-world applications across industries Join us to see how AI, Kubernetes, and edge computing converge to enable powerful new possibilities.
Building Resilient Cloud Native Infrastructure in the Second Decade: TAG Operational Resilience#
Time: 5:30pm EST - 6:00pm EST
Speakers: Rafael Brito (Principal Engineer | CNCF TAG OpRes Co-Chair, StormForge by CloudBolt); Mario Fahlandt (Customer Delivery Architect, Kubermatic); Saiyam Pathak (Principal Developer Advocate, vCluster); Alolita Sharma (Engineering Leader, AIML Platform Engineering, Observability, Apple); Nabarun Pal (Principal Engineer, Broadcom)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 2-3, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: Not only Kubernetes but a myriad of projects have evolved to be the core of the infrastructure for many companies. With this evolution, new challenges arise, including management, Day 2 operations, sustainability, and many more. If you care about the resilience and operational health of your Cloud-Native Infrastructure, join us in this session. You will meet the recently elected leadership of the CNCF Technical Advisory Group (TAG) responsible for Operational Resilience. The recently created “OpRes” TAG covers Observability, Management, Business Continuity, Resource Optimization, Cost Efficiency, Energy, Performance, Troubleshooting, Reliability, and Day 2 Operations of Cloud Native infrastructure. You will gain a bird’s-eye view of the CNCF projects that fall under this TAG and explore how you can contribute to making the Cloud Native ecosystem more resilient and easier to manage, helping to shape the future of Cloud Native standards for Day 2.
Dapr in 2026: Durable Execution and Resilient Eventing for AI Agents#
Time: 5:30pm EST - 6:00pm EST
Speakers: Yaron Schneider (CTO, Diagrid); Rajesh Iyer (Principal Engineer, JPMC)
Venue: Building C | Level 3 | Georgia Ballroom 3, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: In this maintainer track session, we’ll cover existing and upcoming features that allow developers to more easily create complex workflow-based applications, as well as agentic AI systems. We will also showcase Dapr’s role as an Application Developer Platform that fills the gap required to govern and regulate access from applications to their underlying infrastructure, providing zero-trust security, agent-to-agent discovery, and event ingestion to AI agent frameworks like LangGraph, CrewAI, and others.
MUST/SHOULD/MAY: A Tour of TAG Security and Compliance Project Services#
Time: 5:30pm EST - 6:00pm EST
Speakers: Evan Anderson (Custcodian); Brandt Keller (Defense Unicorns)
Venue: Building C | Level 3 | Georgia Ballroom 1, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: This talk is for CNCF project maintainers and community members who want to help projects improve the maturity of their security practices. TAG Security and Compliance maintains a number of sub-projects and initiatives aimed at helping projects improve their security posture and reduce the likelihood and impact of vulnerabilities. Some of these services (like joint security assessments) are mandatory for projects looking to graduate through the CNCF lifecycle phases, while others are used to form recommendations for CNCF policy or are purely advisory. In this talk, the speakers will walk projects through the TAG services as well as complementary services such as LF code audits, OpenSSF Scorecard, and the Best Practices Badge. You’ll walk away with practical information about when projects should engage in each activity and a clear understanding of the benefits that each service provides.
SIG-Multicluster Intro and Deep Dive#
Time: 5:30pm EST - 6:00pm EST
Speakers: Stephen Kitt (Senior Principal Software Engineer, Red Hat); Pavanipriya Sajja (UX Designer, Independent); Jeremy Olmsted-Thompson (Principal Engineer, Google)
Venue: Building C | Level 1 | C111-112, Atlanta, GA, USA
Type: MAINTAINER TRACK
Description: SIG-Multicluster is focused on solving common challenges related to the management of many Kubernetes clusters, and applications deployed across many clusters, or even across cloud providers. In this session, we’ll give attendees an overview of the current status of the multi-cluster problem space in Kubernetes and of the SIG. We’ll discuss current thinking around best practices for multi-cluster deployments and what it means to be part of a ClusterSet. Then we’ll highlight current SIG projects, focused use cases, and ideas for what’s next. Most importantly, we’ll provide information on how you can get involved either as a contributor or as a user who wants to provide feedback about the SIG’s current efforts and future direction. Bring your questions, problems, and ideas - help us expand the multi-cluster Kubernetes landscape!
Diagnosing Application Performance With eBPF, Pyroscope, and Kubernetes#
Time: 5:30pm EST - 6:00pm EST
Speakers: Liam Mackie (Lead Cloud Engineer, Octopus Deploy)
Venue: Building B | Level 5 | Thomas Murphy Ballroom 4, Atlanta, GA, USA
Type: OBSERVABILITY
Description: In this case study, we’ll share how our team guarantees that the software we ship - packaged as a Helm chart and running in customer clusters - is performant under any situation a customer can put it in. When traditional metrics fell short, we turned to kernel-level eBPF profiling using Pixie and continuous profiling using Pyroscope. Along with OpenTelemetry tracing, we were able to drill down into our implementation’s hotspots. By deploying this tooling in our own clusters, pushing our software to its limits, and correlating the telemetry gathered, we uncovered inefficient loops, then rolled out fixes before our customers hit these problems themselves. Attendees will walk away with a framework of observability tooling to enable the analysis of applications running in-cluster, and allow them to find and fix performance regressions faster than ever before!
Observing Dark Matter With OpenTelemetry#
Time: 5:30pm EST - 6:00pm EST
Speakers: Sam Alipio & Mario Macías (Staff Product Manager, Pasteur Labs)
Venue: Building B | Level 3 | B304-305, Atlanta, GA, USA
Type: OBSERVABILITY
Description: Stories of time wasted (and lots of emotions). What time do we really measure when we instrument our applications and try to collect request durations? What conclusions do we draw when the request duration reported by the client doesn’t even come close to the time reported by the server? Do we blame our network, our reverse proxies, or our monitoring tools when “dark matter” gaps exist in our traces? This talk covers some subtle nuances in request duration measurement that can often lead to completely misleading results. We dig deep into the OpenTelemetry instrumentation approaches in a few popular programming languages and explain what the duration timings we see in our traces or logs actually mean. We’ll show how using the newly donated OpenTelemetry eBPF Instrumentation for application instrumentation allows us to augment our telemetry to overcome various inaccuracies in duration measurement… and get closer to the truth.
Composable Platforms in the Wild: Patterns That Work (and Fail)#
Time: 5:30pm EST - 6:00pm EST
Speakers: Daniel Bryant (Platform Engineer and Head of Product Marketing, Syntasso)
Venue: Building B | Level 4 | B405-406a, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Composable platforms are redefining how internal platform teams deliver value. While the theory is compelling, the real-world results can be mixed. In this talk, we’ll explore the patterns that actually work (and the ones that don’t) when building composable platforms using Kubernetes, and tools like Backstage, Crossplane, and Kratix. Drawing from experience across regulated enterprises, scaling startups, and open source communities, we’ll examine how teams are approaching modular service delivery, integrating self-service with governance, and balancing autonomy with consistency. You’ll learn: - Which architectural and team patterns accelerate platform adoption - Why some “composability” efforts create more confusion than clarity with janky abstractions - How to identify early warning signs of failure, such as bloated GitOps repos and overly rigid golden paths This talk is for platform engineers, architects, and leaders who want to make composable platforms deliver real outcomes.
One Dozen To One Thousand Clusters: How Argo Kept up as We Scaled#
Time: 5:30pm EST - 6:00pm EST
Speakers: Jérémy Albuixech & Kahou Lei (Staff Software Engineer, Okta)
Venue: Building B | Level 3 | B312-314, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: In 2020, a team of engineers at Auth0 selected Argo Workflows and Argo CD for their new private cloud platform because of their Kubernetes compatibility, support for Kustomize, and ability to speed up the initial implementation efforts. Since then, the platform has scaled dramatically, to nearly a thousand clusters, 1M+ resources, and hundreds of daily deploys. Argo has continued to prove its value, adapting seamlessly to the platform’s rapid growth and evolving requirements. At KubeCon NA last year, Kahou and Jérémy presented their red/black deployment strategy for zero-downtime daily updates across numerous clusters. This time they will focus on Argo, a central component of their platform’s architecture. Their presentation will cover the evolution of their Argo usage and the difficulties they faced. They will share the solutions they implemented, the present status of their Argo deployments, the existing limitations, and their rationale for choosing Argo over alternative solutions.
Why Is My Query Slow? Real AI Use Cases With Vitess + Kubernetes + Prometheus#
Time: 5:30pm EST - 6:00pm EST
Speakers: Brett Warminski & Gourav Khanijoe (Senior Engineering Manager, HubSpot)
Venue: Building B | Level 4 | B406b-407, Atlanta, GA, USA
Type: PLATFORM ENGINEERING
Description: Self-service database platforms help teams move fast—but only if engineers can help themselves. Over the past two years, the HubSpot data infrastructure team has built and evolved a suite of LLM-powered tools to assist developers working with their large Vitess installation running on Kubernetes. In this talk, the team will share what worked (and what didn’t): from interpreting slow query plans to guiding self-scaling decisions to failed attempts at RAG-based documentation search. Participants will learn about latency workarounds, integrations with chat and UI flows, prompt pitfalls, and the essential glue work that made it all usable.
Security Theater or Real Defense? Navigating Open Source Security in a Cloud Native World#
Time: 5:30pm EST - 6:00pm EST
Speakers: Rotem Refael (VP R&D, ARMO); Constanze Roedig (Key Researcher, Open Source Maintainer, SBA Research, Technical University of Vienna); Megan Wolf (Platform Engineer, Defense Unicorns); Stefana Muller (VP, Infrastructure, Salesforce); Oshrat Nir (Product Marketing, MetalBear)
Venue: Building B | Level 3 | B302-303, Atlanta, GA, USA
Type: SECURITY
Description: Kubernetes teams are drowning in dashboards, buried in YAML, and haunted by the ghost of “shift left.” Everyone says security is built-in, but breaches still happen, compliance still bites and engineers are still burned out. So what’s actually working… and what’s just performative security theater? This women-led panel cuts through the noise. Featuring OSS contributors, DevSecOps veterans, and security leads from production-grade, cloud-native environments, we’re here to talk honestly about what breaks, what works, and what’s pure illusion. They’re contributors and practitioners behind CNCF toolsets—and they’ve seen it all: what works, what fails, and what we wish we knew earlier. Explore what’s real vs. theater in Kubernetes security: how to measure impact, where CNCF tools help (or fall short), and how to stay effective under pressure. No fluff, no vendor pitches. Just battle-tested insights from engineers on the front lines of securing cloud-native infrastructure at scale.
⚡ Lightning Talk: Threat Modeling Kubernetes: Fast, Practical, and LLM-Driven#
Time: 5:34pm EST - 5:39pm EST
Speakers: Maxime Coquerel (Principal Cloud Security Architect, RBC - Royal Bank of Canada)
Venue: Building C | Level 3 | Georgia Ballroom 2, Atlanta, GA, USA
Type: ⚡ LIGHTNING TALKS
Description: Kubernetes environments are inherently complex, especially in regulated industries, due to integrations with service meshes, secrets management, and policy engines. Traditional threat modeling often struggles to keep up with this dynamic ecosystem. In this 5-minute session, I’ll introduce a streamlined, Kubernetes-specific threat modeling approach that integrates naturally into dev and SRE workflows. You’ll learn how to quickly identify risks across clusters, workloads, and third-party components. We’ll also cover how large language models (LLMs) can support this process by generating threat scenarios, mapping to MITRE ATT&CK, and helping teams continuously refine their security posture. You’ll leave with actionable tactics to embed threat modeling into your engineering practices: fast, repeatable, and built for modern cloud-native environments.
Evening Social Hosted by Heroku + AWS | SEPARATE REGISTRATION REQUIRED#
Time: 6:00pm EST - 9:00pm EST
Venue: SkyLounge at the Glenn Hotel, 110 Marietta St NW, Atlanta, GA 30303
Type: SPONSOR-HOSTED CO-LOCATED EVENT
Description: Join us at KubeCon + CloudNativeCon North America 2025 for an evening with panoramic views on Wednesday, November 12, 6pm to 9pm at SkyLounge. Heroku and AWS are hosting a spectacular happy hour at the historic SkyLounge on the rooftop of the Glenn Hotel in Atlanta. Don’t miss the chance to connect with fellow CNCF community members in a relaxed setting. Whether you’re a longtime Heroku user or new to our platform, this evening social is the perfect opportunity to network. This event is invite-only. Please register today to secure your spot. We hope to see you there! Please note that this is an off-site, sponsor-hosted co-located event.