Building Airbnb’s Next Generation Data Management Platform: Introducing Metis

**The Evolution of Airbnb’s Data Catalog into a Scalable Data Warehouse Management Platform**

At Airbnb, managing and governing the complex ecosystem of data assets is crucial for driving business insights and improving products. The Data Management team, driven by the mission to empower the company to handle its data at scale, developed Metis, a platform for managing and governing data assets. This article will explore how Airbnb evolved its data catalog into a comprehensive platform and discuss the key components of Metis.

**Democratizing Data with Dataportal: Airbnb’s First Step Towards Empowering Users**

Airbnb’s journey towards efficient data management began with Dataportal, a tool designed to democratize data. Dataportal was ahead of its time: it let data users easily find trusted data assets, which increased productivity across the company.

**Advancing Data Reliability and Compliance with Apache Atlas**

As the importance of data reliability and compliance grew, Airbnb recognized the need for a more detailed understanding of how data was transformed. This led to the adoption of Apache Atlas as the data lineage solution. Apache Atlas facilitated the development of products like SLA Tracker, which combined landing time metadata and lineage to diagnose upstream data delays.
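
The idea of combining landing times with lineage can be illustrated with a minimal sketch. This is not Airbnb's SLA Tracker implementation; the `Landing` record, the `upstream` adjacency map, and the function names are all hypothetical. The sketch walks upstream from a late table and reports the late ancestors that have no late parents of their own, which are the likely origin of the delay.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Landing:
    """Hypothetical landing-time record for one table partition."""
    expected: datetime           # SLA deadline
    actual: Optional[datetime]   # None means the data has not landed yet


def is_late(landing: Optional[Landing]) -> bool:
    """A table is late if it missed its deadline or has not landed at all."""
    if landing is None:          # no landing metadata: do not flag it
        return False
    return landing.actual is None or landing.actual > landing.expected


def find_root_delays(
    table: str,
    upstream: Dict[str, List[str]],
    landings: Dict[str, Landing],
) -> List[str]:
    """Return the late ancestors of `table` that have no late parents."""
    if not is_late(landings.get(table)):
        return []
    roots, seen, stack = [], set(), [table]
    while stack:                 # every node on the stack is known to be late
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        late_parents = [p for p in upstream.get(node, []) if is_late(landings.get(p))]
        if late_parents:
            stack.extend(late_parents)   # the delay came from further upstream
        else:
            roots.append(node)           # late, but all of its inputs were on time
    return sorted(roots)


# Made-up example data: report <- agg <- (raw_events, dim_users)
day = datetime(2024, 1, 1)
landings = {
    "report": Landing(day.replace(hour=9), None),
    "agg": Landing(day.replace(hour=8), day.replace(hour=10)),
    "raw_events": Landing(day.replace(hour=6), day.replace(hour=9, minute=30)),
    "dim_users": Landing(day.replace(hour=6), day.replace(hour=5)),
}
upstream = {"report": ["agg"], "agg": ["raw_events", "dim_users"]}
print(find_root_delays("report", upstream, landings))  # ['raw_events']
```

In this example the report is late because `agg` is late, and `agg` is late because `raw_events` landed after its deadline, so only `raw_events` is reported.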

**Expanding Metadata Requirements and the Need for a Data Catalog**

As Airbnb’s metadata requirements expanded to areas such as cost management and data quality, the need for a comprehensive data catalog became evident. The data catalog aimed to govern both the data and metadata, provide recommendations for improving data quality, and offer auditability for debugging and governance purposes.

**Building Metis: The One-Stop-Shop for Data Metadata**

To address the growing needs for metadata management, Airbnb developed Metis, a platform incorporating three core products: Dataportal, Unified Metadata Service (UMS), and Lineage Service. Metis allows Airbnb to efficiently manage millions of data assets across different domains.

**Metis Architecture: A Comprehensive Solution for Data Metadata Management**

Metis is composed of three components that work together to provide data metadata management:

1. Dataportal: Serving as a catalog and management UI, Dataportal offers a flexible framework for data management and governance workflows. It is built with React and TypeScript, and its frontend communicates with UMS and other services through a GraphQL API.

2. Unified Metadata Service (UMS): UMS acts as the backbone of Metis, providing a centralized schema and GraphQL API to access metadata. It also facilitates the integration of siloed metadata, reducing the need for multiple integrations. UMS supports various metadata integrations and plays a crucial role in data system proxying, data governance, and managing critical business metadata.

3. Lineage Service: Built on Apache Atlas, the Lineage Service handles the data lineage requirements of Airbnb’s Data Warehouse. With extensive customization and tuning, Atlas efficiently manages the large-scale lineage graph at Airbnb, allowing for parallelism, storage optimization, and improved accessibility.

**Enhancing the Dataportal Experience: Search and Governance Functionalities**

Dataportal serves as the go-to interface for data catalog users at Airbnb. Its search and discovery experience displays relevant metadata directly in search results, so users can identify the exact asset they need without opening each result. High-quality and commonly used data assets are ranked higher in the search results. Once an asset is located, users can perform various consumption, management, and governance actions on its Entity Page, including tagging columns that contain personal data, managing documentation, and ensuring data quality through reviews.
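
As a rough illustration of how quality and usage signals could influence ranking, here is a minimal sketch. The article does not describe Dataportal's actual scoring, so the `Asset` fields (`certified`, `queries_last_30d`) and the boost formula are assumptions chosen only to show the general technique of multiplying text relevance by quality and popularity boosts.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Asset:
    """Hypothetical catalog entry; the field names are illustrative."""
    name: str
    description: str
    certified: bool          # stands in for a "high-quality" signal
    queries_last_30d: int    # stands in for a "commonly used" signal


def score(asset: Asset, query: str) -> float:
    """Text relevance multiplied by usage and quality boosts."""
    terms = query.lower().split()
    text = f"{asset.name} {asset.description}".lower()
    matched = sum(1 for term in terms if term in text)
    if matched == 0:
        return 0.0                                  # not a hit at all
    relevance = matched / len(terms)
    usage_boost = 1.0 + math.log10(1 + asset.queries_last_30d)
    quality_boost = 2.0 if asset.certified else 1.0
    return relevance * usage_boost * quality_boost


def search(assets: List[Asset], query: str) -> List[str]:
    """Return matching asset names, best score first."""
    hits = [(score(a, query), a.name) for a in assets]
    hits = [(s, name) for s, name in hits if s > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by name
    return [name for _, name in hits]


catalog = [
    Asset("bookings_tmp", "scratch copy of bookings", False, 3),
    Asset("bookings_core", "certified bookings fact table", True, 5000),
    Asset("listings", "listing dimensions", False, 900),
]
print(search(catalog, "bookings"))  # ['bookings_core', 'bookings_tmp']
```

The logarithm keeps heavily queried tables from completely drowning out newer ones, which is one common design choice; a production system would more likely express such boosts inside the search engine's own scoring.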

**Unified Metadata Service: Centralized Management and Integration**

UMS acts as the centralized metadata management platform, unifying various metadata providers and consumers. Because providers and consumers each integrate once with UMS rather than with one another, the number of integration points drops, which makes it easier to meet compliance and governance requirements. UMS supports proxying read requests to different data systems, centralizes critical business metadata, and manages indexes in an Elasticsearch cluster to power data discovery. It is built on Airbnb’s tech stack, so metadata can be ingested through several mechanisms, including stream processing, ETL jobs, and direct API calls.
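
One way to picture several ingestion paths feeding a single record per entity is the sketch below. It is not the UMS design: the `MetadataEvent` shape, the in-memory `MetadataStore`, and the "newest timestamp wins per field" merge rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class MetadataEvent:
    """Hypothetical update emitted by any ingestion path."""
    entity_id: str
    source: str              # e.g. "stream", "etl", "api"
    timestamp: int           # event time; larger means newer
    fields: Dict[str, Any]   # partial metadata for the entity


class MetadataStore:
    """Merges partial updates into one record per entity."""

    def __init__(self) -> None:
        # entity_id -> field name -> (timestamp, source, value)
        self._records: Dict[str, Dict[str, Tuple[int, str, Any]]] = {}

    def ingest(self, event: MetadataEvent) -> None:
        record = self._records.setdefault(event.entity_id, {})
        for key, value in event.fields.items():
            current = record.get(key)
            # Keep the newest value per field, regardless of which path sent it.
            if current is None or event.timestamp >= current[0]:
                record[key] = (event.timestamp, event.source, value)

    def get(self, entity_id: str) -> Dict[str, Any]:
        """Return the merged view of one entity."""
        return {k: v[2] for k, v in self._records.get(entity_id, {}).items()}

    def source_of(self, entity_id: str, key: str) -> Optional[str]:
        """Return which ingestion path supplied the current value of a field."""
        entry = self._records.get(entity_id, {}).get(key)
        return entry[1] if entry else None


store = MetadataStore()
store.ingest(MetadataEvent("db.bookings", "etl", 10, {"owner": "team_a", "row_count": 100}))
store.ingest(MetadataEvent("db.bookings", "stream", 20, {"row_count": 120}))
store.ingest(MetadataEvent("db.bookings", "api", 5, {"owner": "stale_owner"}))  # older, ignored
print(store.get("db.bookings"))  # {'owner': 'team_a', 'row_count': 120}
```

Tracking the source alongside each value also gives a simple form of auditability, since a reader can ask which path last wrote a given field.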

**Apache Atlas as the Core of Lineage Management**

Airbnb relies on Apache Atlas as its data lineage solution, with a custom implementation tailored to handle the scale of the Data Warehouse. Scaling strategies, code efficiency optimizations, and storage system tuning were implemented to facilitate efficient lineage management. Atlas’s lineage-related components allow efficient access to lineage data, enabling Airbnb to track data transformations effectively.
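
Apache Atlas models lineage through process entities that connect input datasets to output datasets. The sketch below shows, in plain Python and without calling any Atlas API, how such process records could be folded into a table-level graph and traversed downstream with a depth limit. The dictionary shape and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def build_downstream(processes: Iterable[Dict[str, List[str]]]) -> Dict[str, Set[str]]:
    """Fold process records (inputs -> outputs) into a table-level adjacency map."""
    downstream: Dict[str, Set[str]] = {}
    for proc in processes:
        for src in proc["inputs"]:
            downstream.setdefault(src, set()).update(proc["outputs"])
    return downstream


def downstream_within(start: str, downstream: Dict[str, Set[str]], max_depth: int) -> Dict[str, int]:
    """Breadth-first traversal returning each reachable table and its hop count."""
    depths: Dict[str, int] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:       # do not expand beyond the requested depth
            continue
        for child in sorted(downstream.get(node, ())):
            if child not in seen:
                seen.add(child)
                depths[child] = depth + 1
                queue.append((child, depth + 1))
    return depths


# Made-up process records
processes = [
    {"inputs": ["raw.events"], "outputs": ["core.sessions"]},
    {"inputs": ["core.sessions", "dim.users"], "outputs": ["agg.daily"]},
    {"inputs": ["agg.daily"], "outputs": ["report.kpi"]},
]
graph = build_downstream(processes)
print(downstream_within("raw.events", graph, 2))  # {'core.sessions': 1, 'agg.daily': 2}
```

Bounding the depth is one simple way to keep a single lineage query cheap on a very large graph; the article does not say which techniques Airbnb actually used beyond scaling, code optimization, and storage tuning.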

In conclusion, Airbnb’s evolution from a data catalog to the Metis platform demonstrates the company’s commitment to managing and governing its data assets at scale. By leveraging components like Dataportal, UMS, and Lineage Service, Airbnb can effectively manage millions of data assets, ensuring data reliability, compliance, and high-quality insights for business growth.
