This is the code repository for Building Modern Data Applications Using Databricks Lakehouse, published by Packt.
Develop, optimize, and monitor data pipelines on Databricks
Learn the latest Databricks features with up-to-date insights into the platform. This book will develop your skills to build scalable and secure data pipelines that ingest, transform, and deliver timely, accurate data to drive business decisions.
This book covers the following exciting features:
- Deploy near-real-time data pipelines in Databricks using Delta Live Tables
- Orchestrate data pipelines using Databricks workflows
- Implement data validation policies and monitor/quarantine bad data
- Apply slowly changing dimension (SCD) Type 1 and Type 2 updates to lakehouse tables
- Secure data access across different groups and users using Unity Catalog
- Automate continuous data pipeline deployment by integrating Git with build tools such as Terraform and Databricks Asset Bundles
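To illustrate the SCD handling mentioned above: in the book this is done with Delta Live Tables, but the underlying Type 2 bookkeeping can be sketched in plain Python. This is a minimal illustration, not the Databricks API; the function name `scd2_upsert` and the tracking columns (`start_date`, `end_date`, `is_current`) are hypothetical.

```python
from datetime import date

def scd2_upsert(dim, key, new_attrs, today):
    """SCD Type 2 sketch (illustrative, not the Databricks API):
    expire the current row for `key`, then append a new current version."""
    for row in dim:
        if row["key"] == key and row["is_current"]:
            # Close out the existing version instead of overwriting it
            row["is_current"] = False
            row["end_date"] = today
    # Append the new version as the current row
    dim.append({"key": key, **new_attrs,
                "start_date": today, "end_date": None, "is_current": True})
    return dim

# Hypothetical customer dimension with one current row
dim = [{"key": "c1", "city": "Austin",
        "start_date": date(2023, 1, 1), "end_date": None, "is_current": True}]
dim = scd2_upsert(dim, "c1", {"city": "Denver"}, date(2024, 6, 1))
# The dimension now holds both versions: the expired Austin row and the
# current Denver row, preserving full history.
```

An SCD Type 1 update, by contrast, would simply overwrite the attributes in place, discarding history.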
If you feel this book is for you, get your copy today!
This book and the associated code are intended solely for educational purposes. The examples and pipelines demonstrated are not to be used in production environments without obtaining the necessary licenses from Databricks, Inc., and signing a Master Cloud Services Agreement (MCSA) with Databricks for production use of Databricks Services, including the 'dbldatagen' library. Refer to the license here: License.

All of the code is organized into folders. For example, `chapter01`.
The code will look like the following:
```python
import dlt

@dlt.table(
    name="random_trip_data_raw",
    comment="The raw taxi trip data ingested from a landing zone.",
    table_properties={
        "quality": "bronze"
    }
)
def random_trip_data_raw():
    # The decorated function returns the DataFrame that populates the table;
    # the source path is illustrative -- see the chapter code for the actual reader.
    return spark.read.format("json").load("/path/to/landing/zone")
```
Following is what you need for this book: This book is for data engineers looking to streamline data ingestion, transformation, and orchestration tasks. Data analysts responsible for managing and processing lakehouse data for analysis, reporting, and visualization will also find this book beneficial. Additionally, DataOps/DevOps engineers will find this book helpful for automating the testing and deployment of data pipelines, optimizing table tasks, and tracking data lineage within the lakehouse. Beginner-level knowledge of Apache Spark and Python is needed to make the most out of this book.
While not mandatory, to get the most out of this book, it’s recommended that you have beginner-level knowledge of Python and Apache Spark, and some familiarity with navigating the Databricks Data Intelligence Platform. It’s also recommended to have the following dependencies installed locally in order to follow along with the hands-on exercises and code examples throughout the book (Chapters 1 to 10):
| Chapter | Software required | OS required |
|---|---|---|
| 1-10 | Python 3.6+ | Windows, macOS, or Linux |
| 1-10 | Databricks CLI 0.205+ | Windows, macOS, or Linux |
Furthermore, it’s recommended that you have a Databricks account and workspace to log in, import notebooks, create clusters, and create new data pipelines. If you do not have a Databricks account, you can sign up for a free trial on the Databricks website.
Will Girten is a lead specialist solutions architect who joined Databricks in early 2019. With over a decade of experience in data and AI, Will has worked in various business verticals, from healthcare to government and financial services. Will’s primary focus has been helping enterprises implement data warehousing strategies for the lakehouse and performance-tuning BI dashboards, reports, and queries. Will is a certified Databricks Data Engineering Professional and Databricks Machine Learning Professional. He holds a Bachelor of Science in computer engineering from the University of Delaware.
