Robotics Data Startup XDOF Secures $70 Million and Open-Sources Landmark Training Dataset
XDOF has emerged from stealth with $70 million in funding to solve the biggest bottleneck in physical AI: the lack of real-world training data for robots. Alongside the launch, the startup released what it describes as the world's largest open-source dataset for bimanual robot manipulation.
By Factlen Editorial Team
- AI Infrastructure Providers: Focus on the business model of providing 'pick-and-shovel' physical data services to AI labs.
- Academic & Open-Source Community: Value the release of the ABC-130K dataset as a massive leap forward for reproducible, accessible robotics research.
- Frontier AI Developers: View high-quality, scalable physical data as the final bottleneck to achieving general-purpose embodied AI.
What's not represented
- Human Teleoperators
- Industrial Automation Buyers
Why this matters
While the language models behind products like ChatGPT were trained on vast amounts of internet text, physical robots have lacked a comparable data source to learn from. XDOF's infrastructure and open-source dataset could dramatically accelerate the development of general-purpose robots capable of performing complex physical tasks in homes and factories.
Key points
- XDOF emerged from stealth with $70 million from investors including Thrive Capital and a16z.
- The startup provides outsourced data pipelines and teleoperation infrastructure for training physical robots.
- XDOF released ABC-130K, which it describes as the world's largest open-source dataset for bimanual robot manipulation.
- The company already serves roughly 20 customers, including major frontier AI laboratories.
As the artificial intelligence industry races to build machines that can operate in the physical world, a critical bottleneck has emerged: the lack of high-quality training data. While large language models achieved breakthroughs by ingesting vast quantities of internet text, robots require precise, physical interaction data that simply does not exist online. To solve this "chicken-and-egg" problem, robotics data infrastructure startup XDOF officially emerged from stealth this week, announcing a $70 million funding round.[1][4]
The round drew participation from a roster of heavyweight venture capital firms, including Thrive Capital, Spark Capital, Andreessen Horowitz (a16z), Lux Capital, and WndrCo. Founded in October 2024 by Philipp Wu, Fred Shentu, and Nemo Jin—alumni of UC Berkeley, Tesla, and Meta—XDOF operates as an outsourced data factory for the robotics industry. The company builds the specialized data pipelines, collection hardware, and annotation systems required to train robots for real-world physical interaction.[2][3][4]
"We didn't have large-scale data to work with," CEO Philipp Wu explained, recalling his time as a Ph.D. student. "We first needed to actually collect data before we could even ask how to train a foundation model for robotics." To prove the efficacy of its infrastructure, XDOF simultaneously released ABC-130K, which it describes as the world's largest open-source bimanual robot manipulation dataset.[1][5]

Developed in collaboration with researchers from UC Berkeley, Carnegie Mellon University, MIT, and Amazon, the landmark dataset provides the academic community with an unprecedented foundation for training physical AI. The release includes 130,000 robot manipulation trajectories, representing 3,500 hours of real-world interaction data. The dataset spans 195 distinct physical tasks, capturing a wide spectrum of manipulation primitives such as pick-and-place actions, handovers, and tool use.[2][5][6]
More impressively, the dataset includes highly dexterous behaviors that have traditionally confounded robotic systems, such as folding T-shirts, flattening cardboard boxes, and precisely placing AirPods into their plastic charging cases. By open-sourcing this massive corpus, XDOF aims to establish a universal baseline, allowing researchers to iterate on model designs without spending millions of dollars to build their own data collection warehouses.[1][5][6]
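The headline figures quoted above (130,000 trajectories, 3,500 hours, 195 tasks) imply some rough averages that help put the dataset's scale in context. The short calculation below is only back-of-the-envelope arithmetic on those reported numbers. It assumes, purely for illustration, that data is spread evenly across tasks, which the sources do not state, and the function and variable names are hypothetical.

```python
# Figures as reported in the article; everything derived from them is an estimate.
TRAJECTORIES = 130_000
HOURS = 3_500
TASKS = 195


def dataset_averages(trajectories, hours, tasks):
    """Return rough per-trajectory and per-task averages.

    Assumes an even split across tasks, which is an illustrative
    simplification and not something the sources confirm.
    """
    total_seconds = hours * 3600
    return {
        "seconds_per_trajectory": total_seconds / trajectories,
        "trajectories_per_task": trajectories / tasks,
        "hours_per_task": hours / tasks,
    }


averages = dataset_averages(TRAJECTORIES, HOURS, TASKS)
for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

On these numbers, a typical demonstration lasts a little over a minand a half, and each task would have several hundred demonstrations if the split were even.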

Generating this caliber of data requires a massive, labor-intensive operation. Existing workarounds, such as scraping YouTube videos or using low-quality footage captured by gig workers, have proven inadequate because they lack the precise spatial fidelity and control inputs needed for effective robotic learning. Instead, XDOF employs a three-tier data acquisition strategy. The primary method involves direct teleoperation, where trained human operators use specialized rigs to remotely control robotic arms, effectively demonstrating the exact physical movements the AI needs to mimic.[1][2][4]
The company's secondary and tertiary data collection tiers involve lower-cost teleoperation devices—stemming from an open-source project called GELLO—and egocentric wearable sensors that capture everyday human movements from a first-person perspective. This layered approach allows XDOF to scale its data production efficiently, transforming a bespoke research experiment into a standardized, industrial-grade infrastructure business.[1][2]
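Teleoperated demonstrations like those described above are typically used for behavior cloning, in which a model is fit to reproduce an operator's recorded actions. The toy sketch below illustrates the idea only and is not XDOF's pipeline: the one-dimensional state, the simulated "expert" (a simple proportional controller standing in for a human operator), the linear policy, and every function name are assumptions made for illustration.

```python
import random


def collect_demonstrations(n, target=1.0, expert_gain=0.5, seed=0):
    """Simulate demonstrations as (state, action) pairs.

    The 'expert' is a hypothetical proportional controller standing in
    for a human teleoperator moving toward a target position.
    """
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        state = rng.uniform(-2.0, 2.0)
        action = expert_gain * (target - state)
        demos.append((state, action))
    return demos


def fit_linear_policy(demos):
    """Fit action = w * state + b by ordinary least squares."""
    n = len(demos)
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    var = sum((s - mean_s) ** 2 for s, _ in demos)
    w = cov / var
    b = mean_a - w * mean_s
    return w, b


def rollout(w, b, start, steps=20):
    """Apply the learned policy repeatedly: each action is added to the state."""
    state = start
    for _ in range(steps):
        state += w * state + b
    return state


demos = collect_demonstrations(200)
w, b = fit_linear_policy(demos)
final_state = rollout(w, b, start=-1.5)
print(f"learned policy: action = {w:.3f} * state + {b:.3f}")
print(f"state after rollout: {final_state:.4f}")
```

Real manipulation policies map camera images and joint readings to multi-joint commands with neural networks, but the training signal is the same: recorded pairs of what the operator saw and what the operator did. That is why the fidelity of the recorded control inputs matters so much.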

The market demand for this infrastructure is already evident. Despite operating under the radar for nearly two years, XDOF has grown to 60 employees and secured approximately 20 active customers, including several of the world's leading frontier AI laboratories. The fact that these well-funded labs are choosing to pay XDOF rather than build their own internal pipelines reveals a deliberate strategic shift. By outsourcing data collection, AI developers can avoid the operational complexity of maintaining warehouses, calibrating hundreds of robots, and managing teleoperators, and keep those costs off their balance sheets.[1][2][4]
XDOF's launch arrives at a pivotal moment for the broader technology sector. Just weeks ago, OpenAI announced the revival of its internal robotics training program—an initiative it had previously shuttered in 2021 to focus exclusively on software models. This pivot signals a growing consensus that "physical AI" is the next major frontier. With its massive open-source contribution and robust commercial pipeline, XDOF is positioning itself as the foundational utility for the incoming robotics revolution.[1][4][5]
How we got here
2021
OpenAI shuts down its initial robotics research program to focus entirely on software-based models.
October 2024
XDOF is founded by researchers from UC Berkeley, Meta, and Tesla to solve the robotics data bottleneck.
May 2026
OpenAI announces the revival of its robotics training program, signaling renewed industry focus on physical AI.
June 2026
XDOF emerges from stealth, announcing $70 million in funding and releasing the ABC-130K dataset.
Viewpoints in depth
Frontier AI Laboratories
View outsourced data infrastructure as a strategic necessity.
For the world's leading AI labs, building a massive physical data operation is a distraction from their core competency: designing neural networks. By outsourcing to companies like XDOF, these labs can avoid the immense operational complexity of maintaining warehouses, calibrating hundreds of robots, and managing global teams of teleoperators, and keep those costs off their balance sheets. This allows them to scale their robotics programs rapidly without diluting their focus on model architecture.
Academic Researchers
Celebrate the open-source ABC-130K dataset as a democratizing force.
Historically, state-of-the-art robotic systems have been developed behind closed doors in well-funded corporate labs, leaving university researchers without the resources to compete. The academic community views the release of the ABC-130K dataset, along with its accompanying simulation pipelines, as a leveling of the playing field. With 3,500 hours of high-quality interaction data available for free, researchers can now test new algorithms and architectural designs without needing millions of dollars for hardware and data collection.
Robotics Infrastructure Founders
Argue that the next defensible layer in the AI boom is deeply physical.
Founders and investors in the embodied AI space believe that the era of purely software-based AI moats is ending. They argue that the next massive value creation will happen in the physical realm. Because collecting real-world interaction data requires specialized hardware, massive physical footprints, and trained human operators, it creates a highly defensible business model that cannot be easily replicated by simply renting more cloud computing power.
What we don't know
- Which specific frontier AI laboratories are among XDOF's roughly 20 active customers.
- How quickly the open-source ABC-130K dataset will translate into commercially viable robotic capabilities.
- Whether the cost of human teleoperation can be driven down enough to make physical data collection as scalable as web scraping.
Key terms
- Physical AI: Artificial intelligence systems designed to operate and interact within the physical world, typically embodied in robots.
- Teleoperation: The remote control of a machine or robot by a human operator, often used to demonstrate tasks so the robot can learn them.
- Behavior Cloning: A machine learning method where an AI model learns to perform a task by mimicking the recorded actions of a human expert.
- Bimanual Manipulation: The ability of a robot to use two arms or hands in coordination to perform complex tasks, such as folding clothes.
- Degrees of Freedom (DOF): The number of independent parameters that define the configuration or state of a mechanical system, such as the joints in a robotic arm.
Frequently asked
Why can't robots just learn from YouTube videos?
YouTube videos and gig-worker footage lack the precise, multi-dimensional spatial data and control inputs required to effectively train a robot for complex physical manipulation.
What exactly is the ABC-130K dataset?
It is an open-source dataset for bimanual robot manipulation, described by XDOF as the world's largest, containing 130,000 trajectories and 3,500 hours of real-world interaction data across 195 tasks.
Who are XDOF's customers?
Specific names have not been disclosed, but XDOF already serves about 20 customers, including several leading frontier AI laboratories racing to develop general-purpose robots.
Sources
[1] SiliconANGLE (Frontier AI Developers): Robotic teleoperation data startup XDOF launches with $70M in funding
[2] AI Weekly (AI Infrastructure Providers): XDOF Lands $70M to Build Robot Training Data Pipelines
[3] The SaaS News (AI Infrastructure Providers): XDOF Raises $70M in Funding
[4] Hyper AI (Frontier AI Developers): XDOF Raises $70M to Build Data Pipelines for Robot Training
[5] Digg (Academic & Open-Source Community): Robotics startup XDOF raises $70M and releases ABC-130K, the largest open-source bimanual teleoperation dataset
[6] ABC Bot Research (Academic & Open-Source Community): ABC: A fully open-source stack for manipulation with behavior cloning