mgaur10/dlp-demo-bundle

Cloud DLP Demo Bundle

This repository contains a Terraform-based deployment bundle designed to showcase various Google Cloud Data Loss Prevention (DLP) use cases. It automates the provisioning of infrastructure to demonstrate DLP capabilities such as API inspection, automated storage classification, BigQuery masking/tokenization, and PDF redaction.

⚠️ Disclaimer: This code is intended for demonstration purposes only and is not meant for production workloads.

📋 Overview

The Terraform script deploys a folder containing five distinct Google Cloud projects, each demonstrating a specific DLP capability:

| Project Name | Description | Key Tech Stack |
|---|---|---|
| 1. DLP API Calls | Deploys a Node.js app on a Compute Engine instance to demonstrate direct DLP API calls for string/file inspection and de-identification. | Compute Engine, Node.js, DLP API |
| 2. DLP Auto GCS Classification | Automates data classification. Files uploaded to a "Quarantine" bucket are scanned; sensitive files go to a secure bucket, non-sensitive to another. | Cloud Functions, Pub/Sub, GCS, DLP API |
| 3. DLP BigQuery UDF | Demonstrates Remote Functions (UDF) in BigQuery to de-identify, mask, and re-identify data dynamically using SQL queries. | BigQuery, Cloud Functions, KMS |
| 4. DLP BQ Findings Export | Scans a BigQuery dataset and exports findings to Security Command Center (SCC) and Dataplex. (Module disabled by default.) | BigQuery, Eventarc, SCC, Dataplex |
| 5. DLP PDF Redaction | A serverless pipeline that automatically redacts sensitive information (PII) from uploaded PDF files. | Workflows, Cloud Run, Cloud Functions, GCS |

🏗️ Deployed Architecture

Each module deploys its own isolated environment. Below is a high-level summary of the workflows:

  • API Inspection: Users SSH into a VM to run scripts that send payload data to the DLP API.
  • GCS Classification: Upload -> Quarantine Bucket -> Trigger Function -> DLP Scan -> Move to Target Bucket.
  • BigQuery UDF: SQL Query -> Remote Function -> DLP Processor -> Return Masked/Tokenized Data.
  • PDF Redaction: Upload PDF -> Trigger Workflow -> Split Pages -> Redact Images (DLP) -> Merge PDF -> Save Output.

🛠️ Prerequisites

Before deploying, ensure you have the following:

  1. Google Cloud Project/Organization: Access to a Google Cloud Organization (or a demo environment like Argolis).
  2. IAM Roles: You must have the following roles assigned to your user:
    • Billing Account User
    • Folder Creator
    • Organization Role Viewer
    • Project Creator
  3. Tools:
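This guide later runs git, terraform, and gsutil. The minimal Bash sketch below is a quick pre-flight check that reports which of those are missing from your PATH. check_tools is a hypothetical helper name and is not a script shipped in the repository.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the repository).
# Prints each tool that is not on PATH and returns non-zero if any are missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run check_tools git terraform gsutil && echo "all tools found" before you clone the repository.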

🚀 Deployment Guide

1. Clone the Repository

Open Cloud Shell or your terminal and clone this repository:

git clone https://github.com/mgaur10/dlp-demo-bundle.git
cd dlp-demo-bundle

2. Configure Variables

Navigate to the bundle folder and edit the terraform.tfvars file. You must update the following values to match your environment:

terraform.tfvars

organization_id = "YOUR_ORG_ID"
billing_account = "YOUR_BILLING_ACCOUNT_ID"
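To look up the two values, you can run gcloud organizations list and gcloud billing accounts list. The Bash sketch below is a hypothetical check, not something the repository provides. It assumes the quoted key = "value" layout shown above and catches a file that still contains the YOUR_ placeholders or holds malformed IDs. Organization IDs are numeric, and billing account IDs follow the XXXXXX-XXXXXX-XXXXXX pattern.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for terraform.tfvars (not part of the repository).
# Assumes the quoted `key = "value"` layout used in this guide.
check_tfvars() {
  local file="${1:-terraform.tfvars}"
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  # A leftover YOUR_... placeholder means the file was not edited.
  if grep -q 'YOUR_' "$file"; then
    echo "placeholders remain in $file" >&2
    return 1
  fi
  # Organization IDs are purely numeric.
  if ! grep -Eq '^[[:space:]]*organization_id[[:space:]]*=[[:space:]]*"[0-9]+"' "$file"; then
    echo "organization_id is not a quoted numeric ID" >&2
    return 1
  fi
  # Billing account IDs use the XXXXXX-XXXXXX-XXXXXX format.
  if ! grep -Eq '^[[:space:]]*billing_account[[:space:]]*=[[:space:]]*"[0-9A-Z]{6}-[0-9A-Z]{6}-[0-9A-Z]{6}"' "$file"; then
    echo "billing_account does not match XXXXXX-XXXXXX-XXXXXX" >&2
    return 1
  fi
  echo "ok: $file"
}
```

Run check_tfvars from the bundle folder before terraform plan. A non-zero exit means the file still needs attention.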

3. Apply Terraform

Run the following commands to provision the resources.

terraform init
terraform plan
terraform apply
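An optional variation on the three commands above is to save the plan to a file and then apply exactly that file, so nothing changes between review and apply. Terraform applies a saved plan without a second confirmation prompt, so review the plan output first. deploy_bundle and the default file name tfplan are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repository): write the plan to a file,
# then apply that exact plan. Stops at the first step that fails.
deploy_bundle() {
  local plan_file="${1:-tfplan}"
  terraform init &&
    terraform plan -out="$plan_file" &&
    terraform apply "$plan_file"
}
```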

4. Save Outputs

Upon successful completion, Terraform will display a list of Outputs (green text). Copy these outputs; they contain the project IDs, bucket names, and commands you will need for the demos.
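The outputs can also be written to files, so they survive a closed terminal. You can print them again at any time with terraform output, or print a single value with terraform output -raw NAME. The sketch below uses a hypothetical helper name. Note that the -json form prints values marked sensitive in plain text, so keep that file private.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): save all Terraform outputs
# as human-readable text and as JSON. Run from the directory where you applied.
save_outputs() {
  local dest="${1:-outputs}"
  terraform output -no-color > "${dest}.txt" &&
    terraform output -json > "${dest}.json"
}
```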

🧪 Demo Walkthroughs

Project 1: DLP API Calls

Goal: Inspect and redact strings/files via command line.

Go to Compute Engine in the DLP API Calls project.

SSH into dlp-demo-server using the command provided in the Terraform output (_module_dlp_api_02_iap_ssh_tunnel...).
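Use the command from your Terraform output. If you need to reconstruct it, an SSH connection through an Identity-Aware Proxy (IAP) tunnel with gcloud looks roughly like the sketch below. The project ID and zone are placeholders you must take from your own outputs, and ssh_demo_server is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: SSH to the demo VM through an IAP TCP tunnel, which
# works even if the instance has no external IP. PROJECT_ID and ZONE are
# placeholders; use the values from your Terraform outputs.
ssh_demo_server() {
  local project_id="$1" zone="$2"
  if [ -z "$project_id" ] || [ -z "$zone" ]; then
    echo "usage: ssh_demo_server PROJECT_ID ZONE" >&2
    return 2
  fi
  gcloud compute ssh dlp-demo-server \
    --project="$project_id" \
    --zone="$zone" \
    --tunnel-through-iap
}
```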

Run the sample scripts (found in Terraform outputs):

Inspect Text: node /tmp/nodejs-dlp/samples/inspectString.js ...

Inspect File: node /tmp/nodejs-dlp/samples/inspectFile.js ...

Masking: node /tmp/nodejs-dlp/samples/deidentifyWithMask.js ...

Redaction: node /tmp/nodejs-dlp/samples/redactText.js ...

Project 2: Auto GCS Classification

Goal: Upload files and watch them get sorted based on sensitivity.

Locate the Quarantine Bucket (dlp-demo-qa-xxxx) from the outputs.

Upload the sample data using the gsutil command from the Terraform outputs. It looks like the following, although your bucket's random suffix will differ:

gsutil -m cp sample_data/*sample* gs://dlp-demo-qa-touk

Check the Sensitive Bucket (dlp-demo-sens-xxxx) and Non-Sensitive Bucket (dlp-demo-nonsens-xxxx). The files will automatically move to the correct bucket based on their content.
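You can also check the buckets from the command line. Each file is scanned before it is moved, so results may take a short while to appear; rerun the listing if a bucket still looks empty. The minimal sketch below uses a hypothetical helper name, and the bucket names are placeholders for the ones in your outputs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list each bucket given on the command line so you can
# see where every uploaded sample ended up. Bucket names are placeholders.
show_classification() {
  local bucket
  for bucket in "$@"; do
    echo "== gs://${bucket}"
    gsutil ls "gs://${bucket}" || return 1
  done
}
```

For example: show_classification dlp-demo-qa-xxxx dlp-demo-sens-xxxx dlp-demo-nonsens-xxxx.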

Project 3: BigQuery UDF (Tokenization)

Goal: Use SQL to mask and re-identify PII.

Navigate to BigQuery in the DLP BigQuery UDF project.

Open the clear-data table to see the raw data.

De-identify (Tokenize): Run the query provided in the output (_module_dlp_bigquery_udf_02...). It will replace SSNs with tokens.

Save the result as a new table named udf-deid.
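As an alternative to saving the result in the console, the bq command-line tool can write query results directly to a table. This sketch makes three assumptions: you have pasted the de-identification query from the Terraform output into a local file, you run it in the DLP BigQuery UDF project, and you know which dataset holds clear-data. The file, DATASET, and save_deid_table are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a GoogleSQL (standard SQL) query stored in a file
# and write the result to DATASET.udf-deid. The file and dataset are placeholders.
save_deid_table() {
  local sql_file="$1" dataset="$2"
  if [ ! -f "$sql_file" ] || [ -z "$dataset" ]; then
    echo "usage: save_deid_table QUERY_FILE DATASET" >&2
    return 2
  fi
  bq query \
    --use_legacy_sql=false \
    --destination_table="${dataset}.udf-deid" \
    "$(cat "$sql_file")"
}
```

If the table already exists, bq query will not overwrite it unless you also pass --replace.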

Re-identify (Decrypt): Run the query from output _module_dlp_bigquery_udf_03... against the udf-deid table to retrieve the original SSNs.

Masking: Run the query from output _module_dlp_bigquery_udf_01... to mask credit card numbers with asterisks.

Project 4: BQ Findings Export (Optional)

Note: This module is disabled by default. To use it, enable it in base.tf and run terraform apply again.

This project scans a BQ dataset upon job completion.

View findings in Security Command Center under the "Data Loss Prevention" source.

View data lineage and tag counts in Dataplex.
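SCC findings can also be listed from the command line. This sketch assumes you have already looked up the numeric ID of the "Data Loss Prevention" source in the SCC console. ORG_ID, SOURCE_ID, and list_dlp_findings are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Security Command Center findings for one source
# in the organization. ORG_ID and SOURCE_ID are placeholders.
list_dlp_findings() {
  local org_id="$1" source_id="$2"
  if [ -z "$org_id" ] || [ -z "$source_id" ]; then
    echo "usage: list_dlp_findings ORG_ID SOURCE_ID" >&2
    return 2
  fi
  gcloud scc findings list "$org_id" --source="$source_id"
}
```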

Project 5: PDF Redaction

Goal: Upload a PDF resume and get a redacted copy.

Locate the Input Bucket (pdf-input-bucket-xxxx) in the DLP PDF Redaction project.

Upload a sample PDF (e.g., test_file.pdf) using the provided gsutil command.

Wait for the Cloud Workflow to finish.

Check the Output Bucket (pdf-output-bucket-xxxx) for test_file-redacted.pdf.

Open the redacted PDF to see PII (such as names and phone numbers) blacked out.
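Instead of refreshing the console while the workflow runs, you can poll the output bucket from a shell. This minimal sketch relies on gsutil -q stat, which exits 0 when the object exists and non-zero otherwise. wait_for_object and its default timeout and interval are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a Cloud Storage object exists.
# Args: object URL, timeout in seconds (default 600), poll interval (default 15).
wait_for_object() {
  local url="$1" timeout="${2:-600}" interval="${3:-15}" waited=0
  # `gsutil -q stat` succeeds only once the object is present.
  until gsutil -q stat "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "found: $url"
}
```

For example, after uploading test_file.pdf, run wait_for_object gs://pdf-output-bucket-xxxx/test_file-redacted.pdf, substituting your bucket's real suffix.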

🧹 Clean Up

To avoid incurring ongoing charges, destroy the resources once you are done with the demo:

terraform destroy
