# B. Detailed `bdc` Usage
Just invoke `bdc` with no arguments for a quick usage message.

`bdc` can be invoked several different ways. Each is described below:
- Getting the abbreviated usage message
- Getting the full usage message
- Show only the version
- Check your build.yaml for errors
- Get a list of the notebooks in a course
- Build a course
- Upload course notebooks to a Databricks shard
- Download course notebooks from a Databricks shard
- Relative Shard Paths
### Getting the abbreviated usage message

Invoke `bdc` with no arguments to get a quick usage message.
### Getting the full usage message

```
bdc -h
bdc --help
```
### Show only the version

```
bdc --version
```
### Check your build.yaml for errors

Running `bdc --check` against a `build.yaml` file parses the file and
checks it for obvious problems, without actually doing anything else.

`bdc` performs that same validation automatically when you run a build
or use `--upload` or `--download`, but `--check` lets you force a
validation check on its own.
### Get a list of the notebooks in a course

```
bdc --list-notebooks [build-yaml]
```

With this command, `bdc` will list the full paths of all the (source)
notebooks that comprise a particular course, one per line. `build-yaml`
is the path to the course's `build.yaml` file; it defaults to
`build.yaml` in the current directory.
### Build a course

```
bdc [-o | --overwrite] [-v | --verbose] [-d DEST | --dest DEST] [build-yaml]
```

This version of the command builds a course, writing the results to the
specified destination directory, `DEST`. If you don't specify a
destination directory, it defaults to `$HOME/tmp/curriculum/<course-id>`
(e.g., `$HOME/tmp/curriculum/Spark-100-105-1.8.11`).

If the destination directory already exists, the build will fail unless
you also specify `-o` (or `--overwrite`).

If you specify `-v` (or `--verbose`), the build process will emit
various verbose messages as it builds the course.

`build-yaml` is the path to the course's `build.yaml` file; it defaults
to `build.yaml` in the current directory.
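For illustration, the default-destination rule above can be sketched in
a few lines of Python. This is only an approximation of the stated
behavior; the function name is hypothetical and is not part of `bdc`.

```python
import os

def default_dest(course_id):
    """Default build destination when -d/--dest is not given, per the
    rule described above: $HOME/tmp/curriculum/<course-id>.
    Illustrative sketch only, not bdc's actual code."""
    return os.path.join(os.environ["HOME"], "tmp", "curriculum", course_id)
```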
### Upload course notebooks to a Databricks shard

You can use `bdc` to upload all notebooks for a course to a Databricks
shard.

```
bdc --upload shard-path [build-yaml]
```
Or, if you want to use a `databricks` authentication profile other than
`DEFAULT`:

```
bdc --upload --dprofile profile shard-path [build-yaml]
```

`--dprofile` (or `-P`) corresponds directly to the `databricks`
command's `--profile` argument.
This version of the command gets the list of source notebooks from the
build file and uploads them to the shard, using a layout similar to the
build layout. You can then edit and test the notebooks in Databricks.
When you're done editing, you can use `bdc` to download the notebooks
again. (See below.)
`shard-path` is the path to the folder on the Databricks shard. For
instance: `/Users/foo@example.com/Spark-ML-301`. The folder must not
already exist in the shard; if it does, the upload will abort.

`shard-path` can be relative to your home directory. See
Relative Shard Paths, below.

`build-yaml` is the path to the course's `build.yaml` file; it defaults
to `build.yaml` in the current directory.
**Uploads and build profiles**: If two notebooks with separate profiles
("amazon" and "azure") map to the same `dest` value, `bdc` would
overwrite one of them during the upload and would arbitrarily choose one
on the download. To avoid that, it adds an "az" or "am" qualifier to the
uploaded file name. For instance, assume `build.yaml` has these two
notebooks (and assume typical values in `notebook_defaults`):

```yaml
- src: 02-ETL-Process-Overview-az.py
  dest: ${target_lang}/02-ETL-Process-Overview.py
  only_in_profile: azure
- src: 02-ETL-Process-Overview-am.py
  dest: ${target_lang}/02-ETL-Process-Overview.py
  only_in_profile: amazon
```
Both notebooks map to the same build destination. `bdc --upload` will
upload `02-ETL-Process-Overview-az.py` as
`02-az-ETL-Process-Overview.py`, and it will upload
`02-ETL-Process-Overview-am.py` as `02-am-ETL-Process-Overview.py`.

`bdc` always applies the "am" or "az" qualifier when `only_in_profile`
is specified, even if there are no destination conflicts. The qualifier
is placed after any leading numerals in the destination file name; if
there are no numerals, it's placed at the beginning.
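The qualifier-placement rule can be sketched in Python. This is an
illustrative approximation of the renaming behavior described above,
not `bdc`'s actual implementation; the function name and the exact
regular expression are my own.

```python
import re

def add_profile_qualifier(dest_name, qualifier):
    """Insert a profile qualifier (e.g. "az" or "am") into a notebook
    file name: after any leading numerals, or at the beginning if the
    name has none. Sketch of the rule described above, not bdc's code."""
    m = re.match(r"^(\d+)[-_]?(.*)$", dest_name)
    if m:
        # Leading numerals present: place the qualifier right after them.
        return f"{m.group(1)}-{qualifier}-{m.group(2)}"
    # No leading numerals: place the qualifier at the front.
    return f"{qualifier}-{dest_name}"
```

For example, `add_profile_qualifier("02-ETL-Process-Overview.py", "az")`
yields `02-az-ETL-Process-Overview.py`.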
### Download course notebooks from a Databricks shard

You can use `bdc` to download all notebooks for a course from a
Databricks shard.

```
bdc --download shard-path [build-yaml]
```
Or, if you want to use a `databricks` authentication profile other than
`DEFAULT`:

```
bdc --download --dprofile profile shard-path [build-yaml]
```

`--dprofile` (or `-P`) corresponds directly to the `databricks`
command's `--profile` argument.
This version of the command downloads the contents of the specified
Databricks shard folder to a local temporary directory. Then, for each
downloaded file, `bdc` uses the `build.yaml` file to identify the
original source file and copies the downloaded file over the original
source.
`shard-path` is the path to the folder on the Databricks shard. For
instance: `/Users/foo@example.com/Spark-ML-301`. The folder must exist
in the shard; if it doesn't, the download will abort.

`shard-path` can be relative to your home directory. See
Relative Shard Paths, below.

`build-yaml` is the path to the course's `build.yaml` file; it defaults
to `build.yaml` in the current directory.
**WARNING**: If the `build.yaml` points to your cloned Git repository,
ensure that everything is committed first. Don't download into a dirty
Git repository. If the download fails or otherwise screws things up, you
want to be able to reset the Git repository to its state before you ran
the download. To reset your repository, use:

```
git reset --hard HEAD
```

This resets your repository back to the last-committed state.
### Relative Shard Paths

`--upload` and `--download` support relative shard paths, allowing you
to specify `foo` instead of, say, `/Users/user@example.com/foo`. To
enable relative shard paths, you must do one of the following:
#### Set DB_SHARD_HOME

You can set the `DB_SHARD_HOME` environment variable (e.g., in your
`~/.bashrc`) to specify your home path on the shard. For example:

```
export DB_SHARD_HOME=/Users/user@example.com
```

#### Add a home setting to ~/.databrickscfg
You can also add a `home` variable to `~/.databrickscfg`, in the
`[DEFAULT]` section. The Databricks CLI will ignore it, but `bdc` will
honor it. For example:

```
[DEFAULT]
host = https://trainers.cloud.databricks.com
token = lsakdjfaksjhasdfkjhaslku89iuyhasdkfhjasd
home = /Users/user@example.net
```
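The resolution of a relative shard path can be sketched as follows.
This is an illustrative sketch, not `bdc`'s actual code; in particular,
the text doesn't say which source wins when both are configured, so
this sketch assumes the `DB_SHARD_HOME` environment variable takes
precedence over the `home` key in `~/.databrickscfg`, and the function
name is hypothetical.

```python
import configparser
import os

def resolve_shard_path(shard_path, databrickscfg=None):
    """Resolve a possibly relative shard path against the user's shard
    home. Absolute paths are returned unchanged. Otherwise, try the
    DB_SHARD_HOME environment variable, then a "home" key in the
    [DEFAULT] section of the given .databrickscfg file."""
    if shard_path.startswith("/"):
        return shard_path

    home = os.environ.get("DB_SHARD_HOME")
    if not home and databrickscfg:
        cfg = configparser.ConfigParser()
        cfg.read(databrickscfg)
        # configparser exposes DEFAULT-section keys via cfg.defaults()
        home = cfg.defaults().get("home")
    if not home:
        raise ValueError(
            "relative shard path given, but no shard home is configured")
    return f"{home.rstrip('/')}/{shard_path}"
```

With the configuration shown above, `resolve_shard_path("foo")` would
yield `/Users/user@example.com/foo`.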
## NOTICE
- This software is copyright © 2017-2021 Databricks, Inc., and is released under the Apache License, version 2.0. See LICENSE.txt in the main repository for details.
- Databricks cannot support this software for you. We use it internally, and we have released it as open source, for use by those who are interested in building similar kinds of Databricks notebook-based curriculum. But this software does not constitute an official Databricks product, and it is subject to change without notice.