**Title: The Evolution of Airbnb’s Data Catalog into a Scalable Data Warehouse Management Platform**
At Airbnb, managing and governing the complex ecosystem of data assets is crucial for driving business insights and improving products. The Data Management team, driven by the mission to empower the company to handle its data at scale, developed Metis, a platform for managing and governing data assets. This article will explore how Airbnb evolved its data catalog into a comprehensive platform and discuss the key components of Metis.
**Democratizing Data with Dataportal: Airbnb’s First Step Towards Empowering Users**
Airbnb’s journey towards efficient data management began with Dataportal, a tool designed to democratize data. Dataportal was ahead of its time, enabling data users to easily find trusted data assets, increasing productivity across the company.
**Advancing Data Reliability and Compliance with Apache Atlas**
As the importance of data reliability and compliance grew, Airbnb recognized the need for a more detailed understanding of how data was transformed. This led to the adoption of Apache Atlas as the data lineage solution. Apache Atlas facilitated the development of products like SLA Tracker, which combined landing time metadata and lineage to diagnose upstream data delays.
**Expanding Metadata Requirements and the Need for a Data Catalog**
As Airbnb’s metadata requirements expanded to areas such as cost management and data quality, the need for a comprehensive data catalog became evident. The data catalog aimed to govern both the data and metadata, provide recommendations for improving data quality, and offer auditability for debugging and governance purposes.
**Building Metis: The One-Stop-Shop for Data Metadata**
To address the growing needs for metadata management, Airbnb developed Metis, a platform incorporating three core products: Dataportal, Unified Metadata Service (UMS), and Lineage Service. Metis allows Airbnb to efficiently manage millions of data assets across different domains.
**Metis Architecture: A Comprehensive Solution for Data Metadata Management**
Metis is composed of various components that work seamlessly to provide effective data metadata management:
1. Dataportal: Serving as a catalog and management UI, Dataportal offers a flexible framework for data management and governance workflows. It utilizes React and TypeScript to deliver an intuitive user experience. The frontend communicates with UMS and other services through a GraphQL API, ensuring optimal performance.
2. Unified Metadata Service (UMS): UMS acts as the backbone of Metis, providing a centralized schema and GraphQL API to access metadata. It also facilitates the integration of siloed metadata, reducing the need for multiple integrations. UMS supports various metadata integrations and plays a crucial role in data system proxying, data governance, and managing critical business metadata.
3. Lineage Service: Built on Apache Atlas, the Lineage Service handles the data lineage requirements of Airbnb’s Data Warehouse. With extensive customization and tuning, Atlas efficiently manages the large-scale lineage graph at Airbnb, allowing for parallelism, storage optimization, and improved accessibility.
**Enhancing the Dataportal Experience: Search and Governance functionalities**
The Dataportal serves as the go-to interface for data catalog users at Airbnb. The search and discovery experience within Dataportal focuses on displaying relevant metadata directly in search results, empowering users to find the exact asset they need. High-quality and commonly used data assets are prioritized in the search results, ensuring easy access. Once an asset is located, users can perform various consumption, management, and governance actions within the Entity Page. This includes tagging columns containing personal data, managing documentation, and ensuring data quality through reviews.
**Unified Metadata Service: Centralized Management and Integration**
UMS acts as the centralized metadata management platform, unifying various metadata providers and consumers. By reducing integration points, UMS simplifies the integration process, ensuring compliance and governance requirements are met. It supports proxying read requests to different data systems, centralizes critical business metadata, and manages indexes in an Elasticsearch cluster to power data discovery. UMS leverages Airbnb’s tech stack, allowing for ingesting metadata through various mechanisms, including stream processing, ETL jobs, and direct API calls.
**Apache Atlas as the Core of Lineage Management**
Airbnb relies on Apache Atlas as its data lineage solution, with a custom implementation tailored to handle the scale of the Data Warehouse. Scaling strategies, code efficiency optimizations, and storage system tuning were implemented to facilitate efficient lineage management. Atlas’s lineage-related components allow efficient access to lineage data, enabling Airbnb to track data transformations effectively.
In conclusion, Airbnb’s evolution from a data catalog to the Metis platform demonstrates the company’s commitment to managing and governing its data assets at scale. By leveraging components like Dataportal, UMS, and Lineage Service, Airbnb can effectively manage millions of data assets, ensuring data reliability, compliance, and high-quality insights for business growth.