AI Infrastructure · Explainer · Jun 28, 2026, 2:31 PM · 6 min read

OpenAI and Broadcom Unveil 'Jalapeño,' a Custom AI Inference Chip Designed in Record Nine Months to Halve LLM Serving Costs

OpenAI has partnered with Broadcom to launch its first proprietary AI inference processor, designed from scratch in just nine months. The new silicon aims to drastically reduce the cost and energy required to run advanced AI models, signaling a major shift toward custom hardware in the AI industry.

By Factlen Editorial Team

AI Model Developers 30% · Enterprise Customers 30% · Semiconductor Partners 20% · Hardware Analysts 20%

AI Model Developers: Focus on vertical integration to lower costs and optimize performance.
Enterprise Customers: Prioritize predictable token economics and scalable AI deployment.
Semiconductor Partners: View custom AI accelerators as a massive growth vector for infrastructure contracts.
Hardware Analysts: Analyze the architectural trade-offs and the challenge to incumbent GPU dominance.

What's not represented

· Environmental Advocates concerned about the massive energy footprint of gigawatt-scale AI data centers.
· Smaller AI Startups who cannot afford to design custom silicon and must rely on off-the-shelf hardware.

Why this matters

Running advanced AI models for millions of users requires astronomical amounts of electricity and expensive hardware, driving up costs for businesses and consumers. By designing a highly efficient, custom-built chip, OpenAI is fundamentally changing the economics of AI, paving the way for cheaper, faster, and more accessible intelligence tools.

Key points

OpenAI and Broadcom have unveiled Jalapeño, a custom-built AI inference processor designed specifically for large language models.
The chip was developed from initial design to manufacturing tape-out in a record-breaking nine months using AI assistance.
Jalapeño is an Application-Specific Integrated Circuit (ASIC) built exclusively for inference, not for training AI models.
The architecture utilizes a multi-chip module with eight HBM3E memory stacks to maximize data throughput and minimize latency.
Early testing indicates the chip delivers a performance-per-watt ratio substantially better than current state-of-the-art hardware.
The hardware will be deployed in gigawatt-scale data centers starting in late 2026, aiming to drastically lower AI serving costs.

9 months

Development cycle to tape-out

8

HBM3E memory stacks per chip

1.3 GW

Compute deployment in 2027

10 GW

Total agreement scale by 2029

For years, the artificial intelligence boom has been constrained by a single, expensive bottleneck: the hardware required to run the models. Now, the industry's most prominent player is taking matters into its own hands. OpenAI, in direct collaboration with semiconductor giant Broadcom, has unveiled "Jalapeño," its first proprietary AI inference processor. The custom-built chip represents a massive strategic pivot for the ChatGPT maker, marking its transition from a pure software and research company into a vertically integrated hardware architect. Designed from a blank slate to serve large language models (LLMs), Jalapeño aims to drastically reduce the energy and financial costs associated with generating AI responses.[1][7][8]

The announcement underscores a fundamental shift in the AI supply chain. Historically, companies like OpenAI have relied heavily on general-purpose graphics processing units (GPUs), primarily from Nvidia, to both train their models and serve them to users. GPUs are well suited to the mathematically intensive process of training AI, but using them for "inference" (generating live responses for millions of users) is comparatively inefficient and expensive. By designing an Application-Specific Integrated Circuit (ASIC) exclusively for inference, OpenAI is targeting the exact phase of the AI lifecycle that drives its daily operational costs.[2][5][7]

What makes Jalapeño particularly notable is the unprecedented speed of its creation. In the semiconductor industry, designing an advanced custom chip typically takes two to three years. OpenAI and Broadcom transitioned Jalapeño from an initial concept to "tape-out"—the final design phase before physical manufacturing—in just nine months. This rapid development cycle was achieved through a unique feedback loop: OpenAI used its own advanced AI models to simulate, design, and optimize portions of the hardware architecture. As OpenAI noted, the same models served to users are now helping to engineer the infrastructure required to run future models.[1][7][8]

By using its own AI models to assist in the design process, OpenAI accelerated the chip's development cycle to just nine months.

From an architectural standpoint, Jalapeño is built to address the primary bottleneck in modern interactive AI: data movement. When an LLM generates a response, it must constantly move large amounts of data between the compute cores and memory. To address this, Jalapeño uses a multi-chip module design: a large, centrally located logic tile flanked by eight stacks of HBM3E (High Bandwidth Memory), all connected via a high-speed interposer. This configuration prioritizes high throughput and low latency, so that memory can supply data to the compute logic as fast as the logic can process it.[2][5][7]
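A back-of-envelope bound helps show why data movement, rather than raw compute, tends to limit response generation. The sketch below is illustrative only: the eight-stack count comes from the article, but the per-stack bandwidth (a ballpark figure for HBM3E), the model size, and the weight precision are assumptions, not published Jalapeño specifications. It also assumes a dense model served one request at a time, and it ignores cache traffic.

```python
# Rough ceiling on decode speed when reading model weights from memory
# is the limiting factor. All inputs except the stack count are assumptions.

HBM_STACKS = 8             # from the article
GB_PER_S_PER_STACK = 1200  # assumed ballpark for one HBM3E stack (~1.2 TB/s)
MODEL_PARAMS = 70e9        # hypothetical dense model with 70 billion weights
BYTES_PER_PARAM = 1        # hypothetical 8-bit weights


def max_decode_tokens_per_second(stacks, gb_per_s_per_stack, params, bytes_per_param):
    """Bandwidth bound: each generated token streams every weight once."""
    bandwidth_bytes_per_s = stacks * gb_per_s_per_stack * 1e9
    bytes_read_per_token = params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_read_per_token


ceiling = max_decode_tokens_per_second(
    HBM_STACKS, GB_PER_S_PER_STACK, MODEL_PARAMS, BYTES_PER_PARAM
)
print(f"Bandwidth-bound ceiling: {ceiling:.0f} tokens/s per request")
```

Under these assumed inputs the ceiling is about 137 tokens per second for a single request. No amount of extra arithmetic capability raises that number, which is why a design that surrounds the logic tile with more memory bandwidth targets this limit directly.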

The chip's design is heavily informed by OpenAI's deep understanding of its own software stack. Rather than building a generic AI accelerator, OpenAI optimized Jalapeño's architecture around the specific kernels, memory movement patterns, and serving systems that power ChatGPT, Codex, and future agentic AI products. This tight integration between software and hardware is conceptually similar to Google's Tensor Processing Units (TPUs), allowing every layer of the technology stack to be tuned for maximum efficiency. By controlling the hardware, OpenAI can ensure that its future models are not constrained by the limitations of off-the-shelf silicon.[1][2][5][6]

Early laboratory testing suggests the gamble is paying off. Engineering samples of the Jalapeño chip are already running machine learning workloads, including an unreleased model dubbed "GPT-5.3-Codex-Spark," at target production frequencies and power levels. While final performance metrics are still being measured, OpenAI claims that the first-generation accelerator is delivering a performance-per-watt ratio that is "substantially better than current state-of-the-art" hardware. If these claims hold true in real-world deployments, Jalapeño could operate close to its theoretical maximum efficiency, drastically lowering the electricity required to power advanced AI.[1][5][8]

Jalapeño utilizes a multi-chip module design with eight high-bandwidth memory stacks to maximize data throughput.

Bringing a chip of this complexity to market requires a massive coalition of industry heavyweights. Broadcom provided the crucial silicon implementation expertise and integrated its Tomahawk networking technology, which is essential for linking thousands of chips together in a data center. Taiwan Semiconductor Manufacturing Company (TSMC) has been tasked with the physical fabrication of the silicon, while Celestica is managing the engineering of the server boards and rack systems. This consortium approach allows OpenAI to leverage the manufacturing scale of established hardware giants without having to build its own fabrication plants.[1][4][7]

The scale of the planned deployment is staggering. Jalapeño is not an experimental prototype; it is the foundation of a multi-generation compute platform. Broadcom executives confirmed that initial deployments in enterprise data centers will begin in late 2026. Furthermore, Broadcom has secured a contractual commitment to deploy 1.3 gigawatts of compute infrastructure for OpenAI in 2027, as part of a broader 10-gigawatt agreement extending through 2029. To put that in perspective, a single gigawatt is roughly equivalent to the power consumption of a mid-sized city, highlighting the sheer physical footprint of the next generation of AI.[1][2][4]
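To give a sense of what these power figures mean in energy terms, the arithmetic can be sketched as follows. This assumes the gigawatt figures describe electrical power capacity and that the capacity runs at full load all year (the `utilization` parameter is a hypothetical knob for relaxing that); neither assumption comes from the announcement.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def annual_energy_twh(power_gw, utilization=1.0):
    """Energy drawn over one year, in terawatt-hours, at a constant load."""
    return power_gw * utilization * HOURS_PER_YEAR / 1000


for label, gw in [("2027 commitment", 1.3), ("Full agreement by 2029", 10)]:
    print(f"{label}: {gw} GW -> {annual_energy_twh(gw):.1f} TWh per year")

print(f"2027 share of full agreement: {1.3 / 10:.0%}")
```

At full load, 1.3 GW corresponds to about 11.4 TWh per year and 10 GW to 87.6 TWh per year. The 2027 commitment is 13% of the full agreement.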

For enterprise customers, the introduction of Jalapeño could fundamentally alter the economics of artificial intelligence. Currently, businesses face explosive costs associated with "tokenomics"—the pricing model based on the volume of data processed by an AI. As companies integrate AI deeper into their operations, their token consumption and associated costs have skyrocketed. By changing the economics of converting electricity into tokens, Jalapeño gives OpenAI the financial bandwidth to lower prices. Analysts suggest this could trigger a price war in the model ecosystem, making advanced AI significantly more affordable for end-users.[6]
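The link between performance-per-watt and token pricing can be made concrete with a small calculation. The energy-per-token values and the electricity price below are hypothetical placeholders chosen for illustration; OpenAI has not published these figures, and electricity is only one component of total serving cost.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_cost_per_million_tokens(joules_per_token, usd_per_kwh):
    """Electricity cost, in US dollars, of generating one million tokens."""
    kwh_per_million_tokens = joules_per_token * 1_000_000 / JOULES_PER_KWH
    return kwh_per_million_tokens * usd_per_kwh


USD_PER_KWH = 0.08          # hypothetical industrial electricity price
BASELINE_J_PER_TOKEN = 2.0  # hypothetical incumbent hardware
IMPROVED_J_PER_TOKEN = 1.0  # hypothetical chip with 2x performance-per-watt

baseline = energy_cost_per_million_tokens(BASELINE_J_PER_TOKEN, USD_PER_KWH)
improved = energy_cost_per_million_tokens(IMPROVED_J_PER_TOKEN, USD_PER_KWH)
print(f"Baseline: ${baseline:.4f} per million tokens")
print(f"Improved: ${improved:.4f} per million tokens")
```

With these placeholder inputs the electricity cost falls from about $0.0444 to about $0.0222 per million tokens. The relationship is linear, so any gain in tokens per joule translates directly into a proportional drop in the energy share of the price.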

Broadcom is contractually committed to deploying 1.3 gigawatts of compute infrastructure for OpenAI in 2027.

Beyond cost savings, Jalapeño represents a strategic declaration of independence. By diversifying its infrastructure away from a strict reliance on Nvidia, OpenAI gains immense leverage in the hardware market. While the company will continue to use third-party GPUs for the computationally grueling task of training new models, handling inference internally protects OpenAI from supply chain bottlenecks and the steep profit margins commanded by incumbent chipmakers. It ensures that as AI transitions from a research novelty to a ubiquitous utility, the underlying engine remains firmly under OpenAI's control.[1][2][6][7]

The ripple effects of this launch will be felt across the semiconductor industry. For Broadcom, the Jalapeño partnership validates its custom AI accelerator strategy, providing tangible evidence that its silicon roadmap is progressing and strengthening its revenue visibility for years to come. Broadcom expects its AI semiconductor revenues to exceed $100 billion in 2027, driven largely by sustained demand from hyperscale customers like OpenAI. Meanwhile, incumbent GPU manufacturers will face increasing pressure to prove that their general-purpose hardware can compete with the efficiency of application-specific designs.[4][7]

Ultimately, the unveiling of Jalapeño marks the maturation of the artificial intelligence industry. Just as the smartphone era eventually required Apple to design its own silicon to achieve optimal performance, the AI era is now demanding bespoke hardware. By co-developing a chip that perfectly matches its software, OpenAI is not just cutting costs; it is paving the way for faster, more reliable, and more accessible AI. As these custom processors light up data centers around the world, they will quietly power the next wave of agentic AI, turning the theoretical promise of artificial intelligence into a scalable, everyday reality.[1][5][8]

How we got here

Oct 2025
OpenAI and Broadcom first announce plans to collaborate on a custom AI accelerator.
Late 2025
OpenAI begins using its own advanced language models to accelerate the chip's design and optimization.
Jun 2026
OpenAI and Broadcom officially unveil the Jalapeño inference processor after a record nine-month development cycle.
Late 2026
Initial deployment of Jalapeño chips is slated to begin in enterprise data centers.
2027
Broadcom is contractually committed to deploying 1.3 gigawatts of compute infrastructure for OpenAI.
2029
The multi-generation compute platform agreement is scheduled to reach a massive 10-gigawatt scale.

Viewpoints in depth

AI Model Developers

Focus on vertical integration to lower costs and optimize performance.

For companies like OpenAI, the transition to custom silicon is an existential necessity. Relying on general-purpose GPUs for inference introduces massive inefficiencies, as those chips are burdened with hardware designed for training rather than serving. By designing the Jalapeño ASIC from a blank slate, developers can perfectly align the physical hardware with their specific software kernels and memory movement patterns. This vertical integration not only slashes the electricity required per token but also frees AI companies from the steep profit margins commanded by incumbent chipmakers, granting them total control over their operational destiny.

Enterprise Customers

Prioritize predictable token economics and scalable AI deployment.

From the perspective of enterprise users integrating AI into their daily operations, the hardware architecture matters less than the resulting 'tokenomics.' Businesses are currently facing explosive cost increases as their token consumption rises, making large-scale AI deployment financially daunting. The introduction of highly efficient inference chips like Jalapeño is viewed as a critical relief valve. By fundamentally changing the economics of converting watts to tokens, these custom processors give AI vendors the financial leeway to lower prices, potentially triggering a price war that makes advanced AI affordable for widespread corporate adoption.

Hardware Analysts

Analyze the architectural trade-offs and the challenge to incumbent GPU dominance.

Semiconductor analysts view the Jalapeño chip as a fascinating case study in architectural specialization. By utilizing a massive compute chiplet flanked by expensive HBM3E memory, the design prioritizes high throughput and low latency—crucial for agentic AI workloads but costly to manufacture. Analysts note that while purpose-built ASICs offer vastly superior performance-per-watt for specific tasks, they lack the flexibility of general-purpose GPUs. The success of this hardware will depend on whether the immense upfront development costs can be offset by the operational savings achieved at gigawatt-scale deployments.
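The trade-off analysts describe is a break-even question: how long operational savings take to repay the upfront spend. A minimal sketch of that calculation follows; the dollar amounts are hypothetical placeholders, since the actual development cost and savings have not been disclosed.

```python
def breakeven_months(upfront_cost_usd, monthly_savings_usd):
    """Months of savings needed to recover the upfront cost.

    Returns None when there are no savings, since break-even never occurs.
    """
    if monthly_savings_usd <= 0:
        return None
    return upfront_cost_usd / monthly_savings_usd


# Hypothetical placeholder figures, for illustration only.
UPFRONT_COST_USD = 1_000_000_000   # design, tape-out, and bring-up
MONTHLY_SAVINGS_USD = 50_000_000   # lower serving costs versus GPUs

months = breakeven_months(UPFRONT_COST_USD, MONTHLY_SAVINGS_USD)
print(f"Break-even after {months:.0f} months")
```

Because savings scale with the number of chips deployed, a larger deployment shortens the payback period for the same upfront cost, which is why the gigawatt-scale commitment matters to the economics.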

What we don't know

The exact performance benchmarks and technical specifications of the Jalapeño chip, which OpenAI plans to release in a future report.
How incumbent GPU manufacturers like Nvidia will adjust their pricing and product roadmaps in response to the rise of custom inference ASICs.
The total financial investment required by OpenAI to fund the gigawatt-scale deployment of this custom hardware.

Key terms

Inference: The process of a trained AI model generating responses, predictions, or actions based on live user input.
ASIC (Application-Specific Integrated Circuit): A microchip designed for a very specific task—in this case, running AI models—rather than general-purpose computing.
Tape-out: The final phase of the chip design process before the blueprints are sent to a manufacturing facility to be printed into physical silicon.
HBM3E: An extended version of third-generation High Bandwidth Memory (HBM3), a type of stacked memory that allows very fast data transfer, which is crucial for preventing bottlenecks in AI workloads.
Tokenomics: The economic model of how AI companies charge for and process "tokens" (pieces of words or data) when generating text or code.

Frequently asked

What is an inference chip?

An inference chip is a specialized processor designed specifically to run trained AI models and generate responses for users, rather than being used to train the models from scratch.

Why did OpenAI build its own chip?

OpenAI built Jalapeño to reduce the massive electricity and financial costs of running ChatGPT, and to lessen its reliance on expensive, general-purpose chips from companies like Nvidia.

When will Jalapeño be deployed?

Initial deployment in enterprise data centers is slated for late 2026, with a massive scale-up planned through 2029.

Did AI help design the chip?

Yes. OpenAI used its own advanced language models to accelerate parts of the design and optimization process, helping achieve a record nine-month development cycle.

Sources

[1] OpenAI (AI Model Developers): OpenAI and Broadcom unveil LLM-optimized inference chip
[2] TechPowerUp (Hardware Analysts): OpenAI Unveils Jalapeno LLM Inferencing Accelerator Built in Collaboration with Broadcom
[3] Engadget (Hardware Analysts): OpenAI and Broadcom have unveiled the design for Jalapeño, their first jointly-made chip
[4] Yahoo Finance (Semiconductor Partners): Broadcom AVGO and OpenAI unveiled Jalapeño
[5] Tom's Hardware (Hardware Analysts): Broadcom and OpenAI unveil custom-built Jalapeño inference processor
[6] AI Business (Enterprise Customers): OpenAI and Broadcom unveiled a new LLM-optimized inference chip
[7] CRN Asia (Semiconductor Partners): OpenAI, in a direct collaboration with Broadcom, has unveiled its first proprietary Intelligence Processor
[8] Broadcom (AI Model Developers): OpenAI and Broadcom today unveiled Jalapeño, OpenAI's first Intelligence Processor
