What is ARMBench?

ARMBench is a large-scale benchmark dataset for perception and manipulation challenges across Amazon Robotics systems. The datasets are collected in Amazon warehouses and capture a wide variety of objects and configurations. They comprise images, videos, and tabular data for different stages of robotic manipulation with high-quality annotations.

ARMBench currently includes datasets from two Amazon Robotics programs:

Sparrow - Amazon's intelligent robotic system that streamlines the fulfillment process by moving individual products before they are packaged. Using computer vision and artificial intelligence, Sparrow detects, selects, and handles individual products in warehouse inventory. The Sparrow datasets focus on object segmentation, identification, and defect detection, supporting more efficient and safer operations while allowing employees to focus on higher-value tasks.

Vulcan Stow - Amazon's first robot with a sense of touch, built on advances in robotics, engineering, and physical AI. Vulcan can manipulate objects within fabric-covered pods that are divided into compartments, each holding up to 10 items. Using specialized end-of-arm tooling and force feedback sensors, Vulcan can detect when it makes contact with objects and apply appropriate force, making stowing operations more efficient and safer while achieving human levels of packing density and speed.

The datasets currently provide annotations for four main tasks across these two robotic systems.

Sparrow Tasks

Object Segmentation

450,000+ high-quality labels for object segments on 50,000+ images. Clutter and variety of objects present a novel challenge for instance segmentation algorithms.

Object Identification

Open-set object recognition challenge with 200,000+ unique objects. The benchmark will evaluate image retrieval and classification methods with uncertainty estimation.
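To make the open-set setting concrete, the sketch below shows one common way to combine image retrieval with a simple form of uncertainty handling: match a query embedding against a gallery of known objects and decline to answer when the best match is too weak. It is an illustration only, not the benchmark's evaluation code. The embeddings, the `identify` function, the object IDs, and the `0.8` threshold are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query, gallery, threshold=0.8):
    """Return (object_id, score) for the best gallery match.

    object_id is None when the best score is below `threshold`, meaning the
    query is treated as an unknown object. The threshold value is hypothetical.
    """
    best_id, best_score = None, -1.0
    for object_id, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_id, best_score = object_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Toy 3-dimensional embeddings standing in for real image features.
gallery = {"item_a": [1.0, 0.0, 0.0], "item_b": [0.0, 1.0, 0.0]}
known = identify([0.9, 0.1, 0.0], gallery)    # close to item_a, accepted
unknown = identify([0.0, 0.0, 1.0], gallery)  # matches nothing, rejected
```

In practice the embeddings would come from a learned image encoder, and the rejection threshold would be tuned on held-out data.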

Defect Detection

19,000+ images and 4,000+ videos of rare but costly robot-induced defects, such as multi-pick and packaging defects. The dataset also contains 100,000+ no-defect cases.

Vulcan Tasks

Stow Success Prediction

Complete dataset of 72,000+ stow cycles with bin images, item images, and tabular data. Predict binary outcomes (success/failure) and space creation in fabric-covered pods.
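As a rough illustration of how binary stow-outcome predictions might be scored, the sketch below computes accuracy, precision, and recall from lists of true and predicted outcomes. This page does not describe the dataset's file format or official metrics, so the `binary_metrics` function, the boolean label encoding (`True` for a successful stow), and the sample values are all hypothetical.

```python
def binary_metrics(labels, predictions):
    """Accuracy, precision, and recall for boolean outcomes (True = success)."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one example is required")

    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # predicted success, was success
    tn = sum(1 for y, p in pairs if not y and not p)  # predicted failure, was failure
    fp = sum(1 for y, p in pairs if not y and p)      # predicted success, was failure
    fn = sum(1 for y, p in pairs if y and not p)      # predicted failure, was success

    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up outcomes for five stow cycles.
labels = [True, True, False, True, False]
predictions = [True, False, False, True, True]
metrics = binary_metrics(labels, predictions)
```

Because failures are typically much rarer than successes in this kind of data, accuracy alone can be misleading, which is why the sketch reports precision and recall as well.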


Research Papers

ARMBench: An object-centric benchmark dataset for robotic manipulation

Stow: Robotic Packing of Items into Fabric Pods

Supplementary Videos

ARMBench: Object-centric Benchmark Dataset

Stow: Robotic Packing of Items into Fabric Pods

Credits

These datasets are licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0). Please use the following citations if you use these datasets in a publication.

Sparrow Dataset:

@article{mitash2023armbench,
  title = {ARMBench: An object-centric benchmark dataset for robotic manipulation},
  author = {Mitash, Chaitanya and Wang, Fan and Lu, Shiyang and Terhuja, Vikedo and Garaas, Tyler and Polido, Felipe and Nambi, Manikantan},
  journal = {arXiv preprint arXiv:2303.16382},
  year = {2023}
}

Vulcan Stow Dataset:

@article{hudson2025stow,
  title = {Stow: Robotic Packing of Items into Fabric Pods},
  author = {Hudson, Nicolas and Hooks, Josh and Warrier, Rahul and Salisbury, Curt and Hartley, Ross and Kumar, Kislay and Chandrashekhar, Bhavana and Birkmeyer, Paul and Tang, Bosch and Frost, Matt and Thakar, Shantanu and Piaskowy, Tony and Nilsson, Petter and Petersen, Josh and Doshi, Neel and Slatter, Alan and Bhatia, Ankit and Meeker, Cassie and Xue, Yuechuan and Cox, Dylan and Kyriazis, Alex and Lou, Bai and Hasan, Nadeem and Rana, Asif and Chacko, Nikhil and Xu, Ruinian and Faal, Siamak and Seraj, Esi and Agrawal, Mudit and Jamieson, Kevin and Bisagni, Alessio and Samzun, Valerie and Fuller, Christine and Keklak, Alex and Frenkel, Alex and Ratliff, Lillian and Parness, Aaron},
  journal = {arXiv preprint arXiv:2505.04572},
  year = {2025}
}