This repository contains tools and utilities for Azure Databricks, organized into two main categories: Docker configurations and Network diagnostics.
```
Databricks/
├── docker/             # Docker configurations for Databricks environments
└── network_analysis/   # Network diagnostic tools for Azure Databricks
```
Docker configurations and custom images for Azure Databricks
This folder contains various Docker configurations for creating custom Databricks cluster images with pre-installed packages and configurations.
| Folder | Description | Base Image |
|---|---|---|
| `R/` | R-based environments | R runtime |
| `alphine/` | Alpine Linux minimal images | Alpine |
| `min20/` | Minimal Ubuntu 20.04 setup | Ubuntu 20.04 |
| `python env/` | Python environment configurations | Python |
| `rbase/` | R base configurations | R |
| `rbase-std/` | R standard configurations | R |
| `std20/` | Standard Ubuntu 20.04 images | Ubuntu 20.04 |
- Custom cluster images with pre-installed libraries
- Specialized runtime environments (R, Python, etc.)
- Minimal images for optimized performance
- Standard enterprise configurations
```bash
cd docker/<your-folder>
docker build -t your-image-name .
```

Documentation: See individual folders for specific Dockerfiles and instructions.
Comprehensive network diagnostic suite for Azure Databricks
Modular scripts for troubleshooting Azure Databricks networking issues including Private Link, VNet injection, DNS, NSGs, and serverless connectivity.
- Main Guide: network_analysis/README.md
- Databricks Scripts: network_analysis/databricks_notebooks/
- Azure CLI Scripts: network_analysis/azure_cli_scripts/
Databricks notebook scripts:
- Private Link Diagnostics (`01_private_link_diagnostics.py`)
  - Comprehensive Private Link connectivity testing
  - DNS resolution validation (private vs public)
  - TCP connectivity and reliability testing
  - ⏱️ ~2 minutes
- DNS Diagnostics (`02_dns_diagnostics.py`)
  - Workspace, control plane, and storage DNS
  - Private endpoint detection
  - ⏱️ ~30 seconds
- Serverless Diagnostics (`03_serverless_diagnostics.py`)
  - Serverless compute networking
  - Storage connectivity and egress validation
  - ⏱️ ~1 minute
Azure CLI scripts:
- Private Link Validation (`01_private_link_validation.sh`)
  - Azure VM-based connectivity validation
  - Multi-tool DNS testing
  - Infrastructure comparison
- VNet/NSG Validation (`02_classic_compute_vnet_nsg.sh`)
  - VNet injection configuration
  - NSG rules validation
  - Subnet delegation checks
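To illustrate what "TCP connectivity and reliability testing" can look like, here is a minimal sketch using only the Python standard library. It is not the code from `01_private_link_diagnostics.py`; the function name `tcp_success_rate` and its parameters are hypothetical, chosen for this example.

```python
import socket
import time


def tcp_success_rate(host: str, port: int, attempts: int = 5, timeout: float = 3.0) -> float:
    """Return the fraction of TCP connection attempts to host:port that succeed."""
    successes = 0
    for _ in range(attempts):
        try:
            # create_connection resolves the name and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                successes += 1
        except OSError:
            # Covers refused connections, timeouts, and DNS resolution failures.
            pass
        time.sleep(0.1)  # short pause so the attempts are not back-to-back
    return successes / attempts
```

A call such as `tcp_success_rate("<workspace-hostname>", 443)` returns a value between 0.0 and 1.0. A rate below 1.0 suggests intermittent drops and is worth comparing against the same check run from an Azure VM.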
Test from Databricks Notebook:

```python
# Example: DNS diagnostics
import requests
url = "https://raw.githubusercontent.com/prabakar2610/Databricks/master/network_analysis/databricks_notebooks/02_dns_diagnostics.py"
resp = requests.get(url, timeout=30)
resp.raise_for_status()  # stop on HTTP errors rather than executing an error page
exec(resp.text)
```

Test from Azure CLI:

```bash
# Download and run VNet validation
curl -o vnet_check.sh https://raw.githubusercontent.com/prabakar2610/Databricks/master/network_analysis/azure_cli_scripts/02_classic_compute_vnet_nsg.sh
chmod +x vnet_check.sh && ./vnet_check.sh
```

- Can't connect to internal API → Use Private Link diagnostics
- DNS not resolving → Use DNS diagnostics
- Serverless storage issues → Use Serverless diagnostics
- VNet injection problems → Use VNet/NSG validation
- Need infrastructure comparison → Use both Databricks + Azure CLI scripts
| Task | Go To |
|---|---|
| Create custom Databricks cluster image | docker/ |
| Add pre-installed packages to clusters | docker/ |
| Test Private Link connectivity | network_analysis/databricks_notebooks/01_private_link_diagnostics.py |
| Validate DNS configuration | network_analysis/databricks_notebooks/02_dns_diagnostics.py |
| Troubleshoot serverless networking | network_analysis/databricks_notebooks/03_serverless_diagnostics.py |
| Check VNet/NSG configuration | network_analysis/azure_cli_scripts/02_classic_compute_vnet_nsg.sh |
| Compare Azure VM vs Databricks connectivity | Use both folders in network_analysis/ |
- See individual folders for Dockerfile configurations
- Base images and package lists documented in each folder
- Complete Guide: network_analysis/README.md
- Databricks Scripts Guide: network_analysis/databricks_notebooks/README.md
- Azure CLI Scripts Guide: network_analysis/azure_cli_scripts/README.md
- Troubleshooting Flowchart: network_analysis/docs/TROUBLESHOOTING_FLOWCHART.md
- Quick Reference: network_analysis/docs/CHEAT_SHEET.txt
Building a custom Docker image:
- Browse `docker/` for appropriate base configuration
- Customize Dockerfile with your requirements
- Build and push to your container registry
- Configure Databricks cluster to use custom image
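The last step, configuring a cluster to use the custom image, can be sketched as a cluster specification for the Databricks Clusters API, which accepts a `docker_image` object with a `url` and optional `basic_auth` credentials. This is an illustrative sketch only: the helper name `cluster_spec_with_image`, the cluster name, and the placeholder runtime and node type values are assumptions, not values taken from this repository.

```python
from typing import Optional


def cluster_spec_with_image(
    image_url: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> dict:
    """Build a cluster spec dict that references a custom container image."""
    docker_image = {"url": image_url}
    if username and password:
        # Only needed for registries that require authentication.
        docker_image["basic_auth"] = {"username": username, "password": password}
    return {
        "cluster_name": "custom-image-cluster",  # hypothetical name
        "spark_version": "<runtime-version>",    # placeholder: pick a supported runtime
        "node_type_id": "<node-type>",           # placeholder: pick an Azure VM size
        "num_workers": 1,
        "docker_image": docker_image,
    }
```

The returned dictionary is meant to be serialized as JSON and sent in a cluster create request, or used as a reference when filling in the equivalent fields in the cluster UI.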
Troubleshooting Private Link:
- Run DNS diagnostics (`network_analysis/databricks_notebooks/02_dns_diagnostics.py`)
- If DNS shows public IP, run Private Link diagnostics (`01_private_link_diagnostics.py`)
- Compare with Azure VM validation (`azure_cli_scripts/01_private_link_validation.sh`)
- Follow troubleshooting guide for specific issues
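The "DNS shows public IP" check in the steps above can be approximated with the standard library. The sketch below is not the logic of `02_dns_diagnostics.py`; the function names `resolve_ipv4` and `classify_addresses` are hypothetical, and it treats "private" as whatever Python's `ipaddress` module reports as a private range.

```python
import ipaddress
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve a hostname to a sorted list of unique IPv4 addresses."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def classify_addresses(addresses: List[str]) -> str:
    """Return 'unresolved', 'private', or 'public' for a list of IP strings."""
    if not addresses:
        return "unresolved"
    if all(ipaddress.ip_address(a).is_private for a in addresses):
        return "private"
    # Any public address means traffic may bypass the private endpoint.
    return "public"
```

For example, `classify_addresses(resolve_ipv4("<workspace-hostname>"))` returning `"public"` corresponds to the case where the Private Link diagnostics should be run next.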
Troubleshooting VNet injection:
- Run VNet/NSG validation script (`azure_cli_scripts/02_classic_compute_vnet_nsg.sh`)
- Fix any NSG or delegation issues found
- Verify from Databricks with DNS diagnostics
- Test connectivity with Private Link diagnostics
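As a rough illustration of the NSG and delegation checks in the steps above, the sketch below inspects one subnet description. It assumes the input has the shape of the JSON printed by `az network vnet subnet show`, with a `delegations` list whose entries carry `serviceName` and a `networkSecurityGroup` object. The helper name `check_subnet` is hypothetical, and this is not the logic of `02_classic_compute_vnet_nsg.sh`.

```python
from typing import List

# VNet injection requires the workspace subnets to be delegated to this service.
DATABRICKS_DELEGATION = "Microsoft.Databricks/workspaces"


def check_subnet(subnet: dict) -> List[str]:
    """Return a list of problems found in one subnet description (empty if none)."""
    problems = []
    services = [d.get("serviceName") for d in subnet.get("delegations") or []]
    if DATABRICKS_DELEGATION not in services:
        problems.append(f"subnet is not delegated to {DATABRICKS_DELEGATION}")
    if not subnet.get("networkSecurityGroup"):
        problems.append("no NSG is attached to the subnet")
    return problems
```

An empty result means neither of these two problems was detected. It does not validate individual NSG rules, which the shell script covers.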
Feel free to contribute:
- Add new Docker configurations
- Improve network diagnostic scripts
- Add new diagnostic scenarios
- Enhance documentation
- For detailed troubleshooting, see network_analysis/docs/TROUBLESHOOTING_FLOWCHART.md
- For quick reference, see network_analysis/docs/CHEAT_SHEET.txt
| Category | Components | Status |
|---|---|---|
| Docker Configurations | 7 environments | Active |
| Network Diagnostic Scripts | 5 scripts | Active |
| Documentation Files | 10+ guides | Complete |
Tags: `azure-databricks` `docker` `networking` `diagnostics` `private-link` `vnet-injection` `troubleshooting` `azure` `python` `bash`
- v3.0 (2026-01-24): Reorganized into `docker/` and `network_analysis/` folders
- v2.0 (2026-01-23): Added modular network diagnostics
- v1.0: Initial Docker configurations
This repository is maintained for Azure Databricks troubleshooting and configuration purposes.
Repository: https://github.com/prabakar2610/Databricks
Quick Links:
- Docker Configurations
- Network Analysis