Exploring Real-World Object Navigation: A Machine Learning Blog by ML@CMU

**Empirical Study: Evaluating Approaches for Robots to Navigate to Objects in Real Homes**

**TLDR: The Importance of Semantic Navigation in Uncontrolled Environments**

Deploying mobile robots in uncontrolled environments such as homes, schools, and hospitals requires semantic navigation. Many learning-based approaches have been proposed to address the lack of semantic understanding in the classical pipeline for spatial navigation, but these learned visual navigation policies have mostly been evaluated in simulation. This study evaluates how the different methods perform on real robots.

**Object Goal Navigation: Finding Objects in Unseen Environments**

The study focuses on the Object Goal navigation task, in which a robot is placed in a completely new environment and must find an instance of a given object category, such as a toilet. Equipped with only a first-person RGB and depth camera and a pose sensor, the robot must combine spatial understanding, semantic understanding, and learned priors about where to explore.

**Methods: Classical, End-to-End Learning, and Modular Learning Approaches**

The classical approach builds a geometric map from depth sensing, explores the environment with a heuristic (e.g., frontier exploration), and uses an analytical planner to reach exploration goals and, eventually, the target object. The end-to-end learning approach predicts actions directly from raw observations using a deep neural network. The modular learning approach sits between the two: it builds a semantic map from predicted semantic segmentation combined with depth, selects an exploration goal using a learned semantic policy, and plans a path towards that goal.
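
To make the frontier heuristic mentioned above concrete, here is a minimal sketch, assuming a 2D occupancy grid with hypothetical cell labels (`UNKNOWN`, `FREE`, `OCCUPIED`). A frontier is taken to be a free cell that borders at least one unknown cell. The function names and the nearest-frontier selection rule are illustrative assumptions, not the implementation evaluated in the study.

```python
import numpy as np

# Hypothetical cell labels for a 2D occupancy grid (assumed for illustration).
UNKNOWN, FREE, OCCUPIED = -1, 0, 1


def find_frontiers(grid: np.ndarray) -> np.ndarray:
    """Return a boolean mask of free cells with at least one unknown 4-neighbor."""
    unknown = grid == UNKNOWN
    # Pad with False so cells beyond the map edge do not count as unknown.
    padded = np.pad(unknown, 1, constant_values=False)
    near_unknown = (
        padded[:-2, 1:-1]    # neighbor above
        | padded[2:, 1:-1]   # neighbor below
        | padded[1:-1, :-2]  # neighbor to the left
        | padded[1:-1, 2:]   # neighbor to the right
    )
    return (grid == FREE) & near_unknown


def nearest_frontier(grid: np.ndarray, robot_rc: tuple):
    """Pick the frontier cell closest to the robot in Manhattan distance.

    Simplification: the distance ignores obstacles. A real system would
    hand the chosen goal to a path planner instead.
    """
    cells = np.argwhere(find_frontiers(grid))
    if len(cells) == 0:
        return None  # nothing left to explore
    dists = np.abs(cells - np.array(robot_rc)).sum(axis=1)
    r, c = cells[np.argmin(dists)]
    return (int(r), int(c))


grid = np.array([
    [FREE, FREE,     UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE,     FREE],
])
goal = nearest_frontier(grid, (2, 0))
```

In this toy grid the frontier cells are `(0, 1)` and `(2, 2)`, and a robot at `(2, 0)` would head for `(2, 2)`. The modular approach described above replaces this fixed rule with a learned semantic policy that chooses the exploration goal.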

**Large-Scale Real-World Empirical Evaluation**

To address the limitations of simulation-based evaluations, this study conducts a large-scale empirical evaluation of these navigation approaches in six real homes with six different goal object categories. The approaches are compared on success rate and path efficiency.
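
As a rough illustration of the two metrics, the sketch below computes success rate and one common path-efficiency measure, Success weighted by Path Length (SPL). Using SPL here is an assumption: the text above only says "path efficiency". The `Episode` record and the episode values are made up for the example and are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical per-episode record (field names are illustrative)."""
    success: bool
    shortest_path_m: float  # shortest possible path from start to goal
    agent_path_m: float     # path length the robot actually traveled


def success_rate(episodes):
    """Fraction of episodes in which the robot reached the goal object."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length, averaged over all episodes.

    Each successful episode contributes shortest / max(agent, shortest);
    failed episodes contribute 0.
    """
    total = 0.0
    for e in episodes:
        denom = max(e.agent_path_m, e.shortest_path_m)
        if e.success and denom > 0:
            total += e.shortest_path_m / denom
    return total / len(episodes)


# Made-up episodes for illustration only.
episodes = [
    Episode(True, 5.0, 10.0),
    Episode(True, 4.0, 4.0),
    Episode(False, 3.0, 9.0),
    Episode(True, 2.0, 8.0),
]
rate = success_rate(episodes)
efficiency = spl(episodes)
```

A method can have a high success rate but a low SPL if it reaches the goal only after long detours, which is why the two numbers are reported together.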

**Results: Modular Learning Outperforms Other Approaches**

The study finds that modular learning is highly reliable, achieving a 90% success rate in real-world environments. It also explores more efficiently than the classical approach, improving the success rate by 10%. In contrast, the end-to-end learning approach fails to transfer to the real world, achieving only a 23% success rate.

**Analysis: Insights into Modular Learning’s Successful Transfer**

The study investigates why modular learning transfers effectively while end-to-end learning struggles. By reconstructing a real-world home in simulation, the researchers run identical episodes in both the simulated and the real environment. The results show that the semantic map space remains invariant between simulation and reality, while the image space exhibits a significant domain gap. This gap causes errors such as a segmentation model predicting false positives (e.g., a bed in the kitchen). Because the semantic policy operates on the domain-invariant map rather than on raw images, modular learning transfers well, whereas end-to-end learning, which acts directly on images, is hindered by the image domain gap.
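
To make the "semantic map space" concrete, the following is a minimal sketch of how per-pixel category predictions and depth might be projected into a top-down grid. It assumes a pinhole camera with hypothetical intrinsics `fx` and `cx`, finite depth values in meters, and a map expressed in the camera frame only (no height filtering, no robot pose integration). It is an illustration, not the mapping module used in the study.

```python
import numpy as np


def project_to_semantic_map(depth, labels, fx, cx, num_classes,
                            cell_m=0.05, size=64):
    """Mark top-down grid cells with the categories observed in one frame.

    depth:  (H, W) forward distance in meters (0 means no reading).
    labels: (H, W) integer category per pixel, -1 for "no category".
    Returns a (num_classes, size, size) boolean map: rows index forward
    distance, columns index lateral offset with the camera at size // 2.
    """
    h, w = depth.shape
    u = np.tile(np.arange(w), (h, 1))  # pixel column index for every pixel
    x = (u - cx) * depth / fx          # lateral offset, right is positive
    col = np.floor(x / cell_m).astype(int) + size // 2
    row = np.floor(depth / cell_m).astype(int)
    valid = (
        (depth > 0) & (labels >= 0)
        & (row < size) & (col >= 0) & (col < size)
    )
    sem_map = np.zeros((num_classes, size, size), dtype=bool)
    sem_map[labels[valid], row[valid], col[valid]] = True
    return sem_map


# Tiny made-up frame: 2x2 pixels, everything 1 m ahead.
depth = np.ones((2, 2))
labels = np.array([[0, -1],
                   [1, 1]])
sem_map = project_to_semantic_map(depth, labels, fx=1.0, cx=0.5,
                                  num_classes=2, cell_m=0.5, size=8)
```

The resulting map contains only category channels and geometry. Texture and lighting differences between simulated and real images are discarded at this step, although segmentation errors such as false positives still propagate into the map.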


In conclusion, this empirical study highlights the importance of semantic navigation in uncontrolled environments and showcases the effectiveness of modular learning for navigating robots to objects. The study also identifies the challenges faced by end-to-end learning approaches, emphasizing the need for improvements in simulators to bridge the gap between simulation and the real world. Overall, modular learning proves to be a reliable and efficient approach for object goal navigation in diverse real homes.
