This repository is a DataLad dataset containing procedures for setting up a DataLad dataset.
There are procedures to:
- Add a README
- Add a .gitignore file
- Add a /code folder and README
- Set up an RIA backup and associated GitLab sibling
- Add licenses for data and code
To run the procedures you will need:
- DataLad (see the DataLad installation instructions)
- A python-gitlab config file
Please go through the following steps to set your machine up to run the procedures:
- Create a GitLab account. These procedures are currently configured to use the University of Southampton's GitLab instance (https://git.soton.ac.uk).
- Generate a personal access token for your GitLab account.
- Create a python-gitlab config file. Copy and paste the following into a text file, inserting your personal access token in the appropriate field:
[soton]
url = https://git.soton.ac.uk
private_token = [insert token here]
- Save this file in your home directory (~) as .python-gitlab.cfg
- Create a new DataLad dataset for your project:
datalad create new-dataset
- Go into your new dataset
cd new-dataset
- Clone this repository into .datalad/procedures:
datalad clone -d . https://github.com/nhuneke/dataset-setup-procedures .datalad/procedures
- Check installation was successful
datalad run-procedure --discover
The output should include the following entries (the paths will reflect the location of your own dataset):
setup (/Users/user1/new-dataset/.datalad/procedures/setup.sh) [executable]
add-licenses (/Users/user1/new-dataset/.datalad/procedures/add-licenses.py) [executable]
add-code (/Users/user1/new-dataset/.datalad/procedures/add-code.py) [executable]
backup (/Users/user1/new-dataset/.datalad/procedures/backup.sh) [bash_script]
add-gitignore (/Users/user1/new-dataset/.datalad/procedures/add-gitignore.py) [python_script]
add-readme (/Users/user1/new-dataset/.datalad/procedures/add-readme.py) [executable]
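The listing above can also be checked programmatically. The following is a minimal sketch, assuming each line has the form `name (path) [type]` as shown; the helper names `parse_discover`, `missing_procedures` and `discover_output` are hypothetical and not part of this repository or of DataLad.

```python
import re
import subprocess

# Procedure names taken from the listing above.
EXPECTED = {"setup", "backup", "add-readme", "add-gitignore", "add-code", "add-licenses"}

# Assumed line format: name (path) [type]
LINE_RE = re.compile(r"^(?P<name>\S+) \((?P<path>.+)\) \[(?P<kind>[^\]]+)\]$")


def parse_discover(output):
    """Map each discovered procedure name to its (path, type) pair."""
    found = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            found[match["name"]] = (match["path"], match["kind"])
    return found


def missing_procedures(output, expected=EXPECTED):
    """Return the expected procedure names absent from the output, sorted."""
    return sorted(set(expected) - set(parse_discover(output)))


def discover_output(dataset_dir="."):
    """Run the discovery command from this README and return its text output."""
    result = subprocess.run(
        ["datalad", "run-procedure", "--discover"],
        cwd=dataset_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `missing_procedures(discover_output())` from within the dataset should return an empty list if the clone into .datalad/procedures succeeded.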
Each procedure can be run individually to complete a specific task, or as a group for initial set up of a new dataset.
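Since the GitLab-related steps depend on the python-gitlab config file created during setup, it may help to check that file before running anything. This is a minimal sketch using only the Python standard library; the function name `check_gitlab_config` is hypothetical, and the section name `soton` and the placeholder text are taken from the template above.

```python
import configparser
from pathlib import Path

# Placeholder text from the config template in this README.
PLACEHOLDER = "[insert token here]"


def check_gitlab_config(path, section="soton"):
    """Return a list of problems found in a python-gitlab config file."""
    path = Path(path).expanduser()
    if not path.is_file():
        return [f"{path} does not exist"]

    # Interpolation is disabled so a '%' in a token is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    if not parser.has_section(section):
        return [f"missing [{section}] section"]

    problems = []
    if not parser.get(section, "url", fallback="").strip():
        problems.append("url is empty or missing")
    token = parser.get(section, "private_token", fallback="").strip()
    if not token or token == PLACEHOLDER:
        problems.append("private_token has not been filled in")
    return problems
```

For example, `check_gitlab_config("~/.python-gitlab.cfg")` returns an empty list when the file exists and both fields are filled in. It does not verify that the token is accepted by GitLab.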
The backup procedure creates a remote indexed archive (RIA) backup and an associated GitLab sibling. Before running it, ensure you have completed the setup above.
Run the procedure from within your dataset with:
datalad run-procedure backup
When prompted, enter the URL for the RIA backup and the project address for the GitLab sibling. If you don't know these, ask your supervisor.
You can check this procedure has run correctly with
datalad siblings
You should see ria-backup, ria-backup-storage and gitlab siblings. Publishing to the gitlab sibling depends on the ria-backup sibling, so if you run
datalad push --to gitlab
your dataset will be backed up to both the RIA and GitLab siblings.
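The sibling check can be automated. The sketch below assumes that `datalad -f json siblings` prints one JSON record per line with a `name` field; that output shape is an assumption to confirm against your DataLad version, and the helper names `sibling_names` and `missing_siblings` are hypothetical.

```python
import json

# Sibling names that the backup procedure is expected to create (from this README).
EXPECTED_SIBLINGS = ("ria-backup", "ria-backup-storage", "gitlab")


def sibling_names(json_lines):
    """Collect sibling names from line-delimited JSON records."""
    names = set()
    for line in json_lines.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore any non-JSON line in the output
        if isinstance(record, dict) and "name" in record:
            names.add(record["name"])
    return names


def missing_siblings(json_lines, expected=EXPECTED_SIBLINGS):
    """Return the expected siblings that do not appear in the output."""
    names = sibling_names(json_lines)
    return [name for name in expected if name not in names]
```

Passing the captured output of `datalad -f json siblings` to `missing_siblings` should give an empty list once the backup procedure has completed.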
The add-readme procedure adds a README template to your dataset. From within your dataset, run
datalad run-procedure add-readme
The add-gitignore procedure adds a .gitignore template to your dataset. From within your dataset, run
datalad run-procedure add-gitignore
The add-code procedure adds a /code directory with an accompanying README template and configures the annex so that all contents of /code are tracked by Git. From within your dataset, run
datalad run-procedure add-code
The above procedures can also be run as a group with a single command for complete initial set up. From within your dataset, run
datalad run-procedure setup
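If only some of the procedures are wanted, they can be scripted individually instead of using the group. This is a minimal sketch that wraps the `datalad run-procedure` command shown in this README; the function names and the `runner` parameter are hypothetical, and the chosen subset and order are illustrative rather than what the setup procedure itself does.

```python
import subprocess


def procedure_command(name):
    """Build the command line for one procedure, as used in this README."""
    return ["datalad", "run-procedure", name]


def run_procedures(names, dataset_dir=".", runner=subprocess.run):
    """Run each named procedure in order, stopping at the first failure.

    `runner` defaults to subprocess.run; passing another callable allows a dry run.
    """
    for name in names:
        runner(procedure_command(name), cwd=dataset_dir, check=True)


# Dry run: print the commands instead of executing them.
def print_only(cmd, **kwargs):
    print(" ".join(cmd))


run_procedures(["add-readme", "add-gitignore", "add-code"], runner=print_only)
```

Dropping the `runner=print_only` argument executes the commands for real from `dataset_dir`. With `check=True`, a failing procedure raises `subprocess.CalledProcessError` and the remaining ones are skipped.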
The add-licenses procedure is not included in the initial setup group because not all datasets will be shared, so not all need a license. Further, the licenses chosen for this procedure might not be appropriate for all datasets.
This procedure adds a CC-BY 4.0 license to the dataset and an MIT license to the code directory in the form of a LICENSE file.
If you plan to share your dataset, code, or both, and these licenses are appropriate, then from within your dataset, run
datalad run-procedure add-licenses
Contributions welcome. Please use the following procedure:
- On GitHub, fork a copy of this repository into your userspace
- Install that copy locally using
datalad clone <https://github.com/url/of/fork.git>
- Make your changes, then save them and push the branch to your fork:
datalad save -m '<message>'
datalad push --to origin
- Send a pull request