Predicting Object Interactions with Behavior Primitives:
An Application in Stowing Tasks

CoRL 2023 (Oral)

Finalist - Best Paper/Best Student Paper Awards

University of Illinois Urbana-Champaign,
(* indicates equal contribution)


Abstract

Stowing, the task of placing objects in cluttered shelves or bins, is common in warehouse and manufacturing operations. However, it is still carried out predominantly by human workers, because complex multi-object interactions and the long-horizon nature of the task make it challenging to automate. Previous works typically involve extensive data collection and costly human labeling of semantic priors across diverse object categories. This paper presents a method to learn a generalizable robot stowing policy from a predictive model of object interactions and a single demonstration with behavior primitives. We propose a novel framework that uses Graph Neural Networks (GNNs) to predict object interactions within the parameter space of behavior primitives. We further employ primitive-augmented trajectory optimization to search the parameters of a predefined library of heterogeneous behavior primitives and instantiate the control action. Our framework enables robots to proficiently execute long-horizon stowing tasks using only a few keyframes (3-4) from a single demonstration. Despite being trained solely in simulation, our framework generalizes remarkably well: it adapts efficiently to a broad range of real-world conditions, including various shelf widths, varying numbers of objects, and objects with diverse attributes such as size and shape.
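To make the GNN dynamics model more concrete, here is a minimal pure-Python sketch of the ingredients the method overview caption describes: particles as graph nodes, one round of message passing that predicts particle displacements, and the MSE training loss. The radius-based edge rule, the toy `repel` message function, and all function names are hypothetical. In the actual framework the message and update functions are learned networks, and details such as node/edge features or the number of propagation steps are not reproduced here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float, float]


def radius_edges(particles: Sequence[Vec], radius: float) -> List[Tuple[int, int]]:
    # Directed (sender, receiver) edge for every pair of distinct particles
    # closer than `radius`. Hypothetical connectivity rule, not the paper's.
    return [
        (s, r)
        for r, pr in enumerate(particles)
        for s, ps in enumerate(particles)
        if s != r and math.dist(ps, pr) < radius
    ]


def message_passing_step(
    particles: Sequence[Vec],
    edges: Sequence[Tuple[int, int]],
    message_fn: Callable[[Vec, Vec], Vec],
) -> List[Vec]:
    # Sum the messages arriving at each receiver and treat the sum as that
    # particle's predicted displacement. In a trained GNN, message_fn and the
    # node update would be learned networks, not hand-written functions.
    agg = [[0.0, 0.0, 0.0] for _ in particles]
    for s, r in edges:
        msg = message_fn(particles[s], particles[r])
        for k in range(3):
            agg[r][k] += msg[k]
    return [(p[0] + d[0], p[1] + d[1], p[2] + d[2]) for p, d in zip(particles, agg)]


def mse_loss(pred: Sequence[Vec], target: Sequence[Vec]) -> float:
    # Mean squared error over every coordinate of every particle.
    diffs = [a - b for p, t in zip(pred, target) for a, b in zip(p, t)]
    return sum(d * d for d in diffs) / len(diffs)


def repel(sender: Vec, receiver: Vec) -> Vec:
    # Toy message: nudge the receiver 10% of the way away from the sender.
    return tuple(0.1 * (r - s) for s, r in zip(sender, receiver))


particles = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
edges = radius_edges(particles, radius=1.0)  # [(1, 0), (0, 1)]; the far particle is isolated
pred = message_passing_step(particles, edges, repel)
loss = mse_loss(pred, particles)  # in training, the target is the ground-truth next state
```

Only nearby particles exchange messages, which is how a particle graph can represent contacts between objects without a hand-written contact model.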




Qualitative Results

The stowing task is accomplished with three skills: sweeping, pushing, and transporting.

Nominal setup

With heavy and trapezoid objects

With tiny object

With tiny object 2

With bottle to be grasped

With slippery roller

Small shelf setup

With deformable objects
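The three skills named above can be viewed as a small library of heterogeneous behavior primitives, each with its own parameter space. The sketch below assumes a continuous, box-bounded parameter space per skill; the `Primitive` class, the parameter names, and their ranges are hypothetical placeholders, not the paper's actual parameterization.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Primitive:
    """A behavior primitive with its own bounded parameter space."""

    name: str
    bounds: Dict[str, Tuple[float, float]]  # parameter name -> (low, high)

    def sample(self, rng: random.Random) -> Dict[str, float]:
        # Draw one parameter setting uniformly within the bounds.
        return {k: rng.uniform(lo, hi) for k, (lo, hi) in self.bounds.items()}


# Hypothetical parameterizations (names, dimensions, ranges); the real
# primitives may use different parameters.
LIBRARY = {
    p.name: p
    for p in (
        Primitive("sweeping", {"start_y": (-0.2, 0.2), "sweep_dist": (0.0, 0.3)}),
        Primitive("pushing", {"push_dist": (0.0, 0.15)}),
        Primitive("transporting", {"place_x": (0.0, 0.5), "place_y": (-0.2, 0.2), "yaw": (-0.5, 0.5)}),
    )
}

rng = random.Random(0)
sweep_params = LIBRARY["sweeping"].sample(rng)
```

Because each primitive has a different number of parameters, a planner can search each skill's own low-dimensional space instead of a single shared action space.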

Method Overview



Overview of the proposed framework. (a) A particle-based representation characterizes the object state. The GNN is trained with an MSE loss between the object state it predicts after the executed robot actions and the ground-truth object state. (b) For each skill, we apply random shooting to sample parameters within the action parameter space and use the GNN to predict the resulting object movement. We then select the action whose predicted outcome is closest to the desired state. The skills are executed in sequence.
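Part (b) of the caption describes a random-shooting planner run once per skill. A minimal sketch, assuming the state is a tuple of floats, the learned model is any callable `model(state, skill, params)`, the cost is squared Euclidean distance to the skill's keyframe goal, and parameters are sampled uniformly within box bounds. `toy_model`, `plan_sequence`, and the other names are hypothetical; a one-dimensional toy model stands in for the GNN.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

State = Tuple[float, ...]
Params = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]
# model(state, skill_name, params) -> predicted next state (the GNN's role).
Model = Callable[[State, str, Params], State]


def sq_dist(a: State, b: State) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_shooting(
    model: Model, state: State, skill: str, bounds: Bounds,
    goal: State, n_samples: int, rng: random.Random,
) -> Tuple[Params, State, float]:
    # Sample parameters uniformly, predict each outcome, keep the cheapest.
    best: Tuple[Params, State, float] = ({}, state, math.inf)
    for _ in range(n_samples):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        pred = model(state, skill, params)
        cost = sq_dist(pred, goal)
        if cost < best[2]:
            best = (params, pred, cost)
    return best


def plan_sequence(
    model: Model, state: State, skills: Sequence[Tuple[str, Bounds]],
    goals: Sequence[State], n_samples: int = 200, seed: int = 0,
) -> List[Tuple[str, Params, float]]:
    # Optimize each skill in order toward its keyframe goal. The predicted
    # state is fed forward (open loop); on a robot one would instead
    # re-observe the real state after executing each skill.
    rng = random.Random(seed)
    plan = []
    for (skill, bounds), goal in zip(skills, goals):
        params, state, cost = random_shooting(model, state, skill, bounds, goal, n_samples, rng)
        plan.append((skill, params, cost))
    return plan


def toy_model(state: State, skill: str, params: Params) -> State:
    # Hypothetical 1-D stand-in for the GNN: every skill shifts the object by "dx".
    return (state[0] + params["dx"],)


plan = plan_sequence(
    toy_model, (0.0,),
    skills=[("pushing", {"dx": (0.0, 1.0)}), ("sweeping", {"dx": (-1.0, 0.0)})],
    goals=[(0.5,), (0.2,)],
)
```

Random shooting needs no gradients through the learned model, which makes it simple to combine with a discrete sequence of skills; the trade-off is that the number of samples needed grows quickly with the dimension of a skill's parameter space.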

BibTeX

@inproceedings{chen2023predicting,
  title={Predicting Object Interactions with Behavior Primitives: An Application in Stowing Tasks},
  author={Haonan Chen and Yilong Niu and Kaiwen Hong and Shuijing Liu and Yixuan Wang and Yunzhu Li and Katherine Rose Driggs-Campbell},
  booktitle={7th Annual Conference on Robot Learning},
  year={2023},
  url={https://openreview.net/forum?id=VH6WIPF4Sj}
}