
Enhancing Privacy in Machine Learning through PII Masking



How Grab’s Data Streaming Team Protects User Privacy with PII Masking

Introduction
Grab’s data engineers play a crucial role in designing and building the machine learning models that provide strategic insights from the data flowing through the Grab Platform. To refine these models and ensure they work effectively in production, data engineers need access to actual production data. At the same time, to preserve user privacy, they must be prevented from accessing any Personally Identifiable Information (PII).

PII Tagging
Grab leverages the Protocol Buffers (protobuf) data format to structure in-transit data. When creating a new stream, developers must describe its fields in a protobuf schema, which is then used to serialise the data wherever it is sent over the wire and to deserialise it wherever it is consumed. In this schema, developers must tag every field containing PII with a PII label such as PII_TYPE_NAME; a passengerName field, for example, contains PII and must be flagged accordingly.
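As a rough illustration of what such tagging can look like, the sketch below carries the PII label as a protobuf custom field option. The option name (pii_type), the import path, and the message fields are all assumptions for the example; Coban’s actual annotation mechanism may differ.

```protobuf
// Hypothetical stream schema illustrating PII tagging via a custom
// field option; names here are assumptions, not Grab's actual schema.
syntax = "proto3";

import "coban/pii_options.proto";  // assumed in-house extension defining pii_type

message BookingEvent {
  string bookingID = 1;                                    // not PII
  string passengerName = 2 [(pii_type) = PII_TYPE_NAME];   // PII: must be tagged
  double fareAmount = 3;                                   // not PII
}
```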

CI Pipeline
A Continuous Integration (CI) pipeline ensures that all PII fields are correctly tagged. Developers must publish the schema of their new stream to Coban’s Git repository. The CI pipeline runs an in-house Python script that scans each variable name in the committed schema and tests it against an extensive list of PII keywords. If a variable name matches a keyword but is not tagged with the expected PII label, the pipeline fails. False positives can be added to a whitelist of exceptions; updating the whitelist requires approval from the Coban team.
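A minimal sketch of such a scan is shown below, assuming the schema is a .proto file and PII tags take the custom-option form used in the earlier sketch. The keyword list, regular expressions, and whitelist handling are all illustrative, not Coban’s actual script.

```python
import re
import sys

# Illustrative PII keyword list and tag pattern; the real lists are
# maintained in-house and far more extensive.
PII_KEYWORDS = {"name", "phone", "email", "address", "passport"}
WHITELIST = {"streetName"}  # assumed exceptions approved by the Coban team
PII_TAG = re.compile(r"\[\(pii_type\)\s*=\s*PII_TYPE_\w+\]")
FIELD = re.compile(r"^\s*\w+\s+(\w+)\s*=\s*\d+\s*(.*);", re.MULTILINE)

def untagged_pii_fields(schema_text: str) -> list[str]:
    """Return field names that look like PII but carry no PII tag."""
    offending = []
    for match in FIELD.finditer(schema_text):
        field_name, trailer = match.group(1), match.group(2)
        if field_name in WHITELIST:
            continue
        looks_like_pii = any(kw in field_name.lower() for kw in PII_KEYWORDS)
        if looks_like_pii and not PII_TAG.search(trailer):
            offending.append(field_name)
    return offending

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        offenders = untagged_pii_fields(f.read())
    if offenders:
        print("Untagged PII fields:", ", ".join(offenders))
        sys.exit(1)  # non-zero exit fails the CI pipeline
```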

Production Environment
Data streaming at Grab relies on Kafka to produce and consume data. The production environment is where user-generated data is produced as users interact with the Grab superapp. The booking service, for example, generates Kafka records and produces them for other services to consume, and machine learning pipelines are among the consuming services that require PII data. Because access to the production environment is highly restricted and monitored, PII is left unmasked at this stage.
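For concreteness, a producing service might publish a serialised protobuf record along these lines. The broker address, topic name, and the protoc-generated BookingEvent bindings (from the schema sketch above) are assumptions, sketched here with the confluent-kafka client.

```python
from confluent_kafka import Producer

# Assumed protoc-generated bindings for the BookingEvent schema sketched above.
from booking_event_pb2 import BookingEvent

# Broker address and topic name are illustrative placeholders.
producer = Producer({"bootstrap.servers": "kafka-prod:9092"})

event = BookingEvent(bookingID="b-123", passengerName="Jane Doe", fareAmount=12.50)
producer.produce("bookings", value=event.SerializeToString())  # PII still unmasked here
producer.flush()
```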

PII Masking
Data engineers are not granted access to the production environment; instead, they access a staging environment where PII is masked. The masking is performed by an in-house Flink application that resides in the production environment. It consumes the original data as a regular Kafka consumer and dynamically masks the PII fields based on the PII tags in the schema. The sanitised data is then produced to a Kafka cluster in the staging environment, where it is consumed by the staging machine learning pipelines. That staging Kafka cluster is itself secured with authentication and authorisation.
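The tag-driven masking at the heart of such a job can be sketched independently of the Flink runtime: walk a deserialised protobuf message and overwrite any string field whose descriptor carries a PII option. The option check and the hash-based replacement below are assumptions; Coban’s actual application may mask values differently.

```python
import hashlib

from google.protobuf.message import Message

def mask_pii(message: Message) -> Message:
    """Overwrite string fields tagged with a PII option, in place."""
    for field, value in message.ListFields():
        # Hypothetical check for a custom PII option on the field;
        # extensions set on FieldOptions show up in ListFields().
        field_options = field.GetOptions()
        is_pii = any("pii" in ext.name for ext, _ in field_options.ListFields())
        if is_pii and isinstance(value, str):
            # Replace the value with an irreversible digest so records
            # remain distinguishable without exposing the original PII.
            digest = hashlib.sha256(value.encode()).hexdigest()[:12]
            setattr(message, field.name, f"<masked:{digest}>")
    return message
```

In the real pipeline, a function like this would sit between a Kafka source reading from the production cluster and a Kafka sink writing to the staging cluster.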

Conclusion
Grab’s data streaming team (Coban) enforces PII masking on machine learning data streaming pipelines, ensuring the security and privacy of users while still letting data engineers refine their models with sanitised production data. The CI pipeline verifies that all fields containing PII are correctly tagged, and the in-house Flink application dynamically masks them. Grab’s mature privacy programme ensures that user data is well protected against human error.


