forked from murphyqm/research-software-development
---
title: "Research code workflow"
---
This page is designed to be a quick reference for the stages in building a research coding project. Please refer back to the main text for detailed guidance for any of these steps or stages.
::: {.panel-tabset group="workflow"}
## Single directory project
If you select this tab, the instructions below will show information relevant to organising your project in a single project folder: this folder will contain all of your code, application, analysis, figures, and notes.
## Dual directory project
If you select this tab, the instructions below will show information relevant to organising your project in two project folders: one folder for your core code as an installable package, the other containing the application of this code, analysis, notes, figures etc.
:::
## Part One: building the core code
### 1. Brainstorm and gather requirements
- Functional requirements: what must the software do?
- Non-functional requirements: how should it work/behave?
- Constraints: what are the limitations or assumptions?
### 2. Create your project directory structure in a repository
<label>Project name: <input id="name" value="example_name" type="text" placeholder="example_name" pattern="[a-z0-9_]*" style="font-family:monospace;"></label>
:::::{.callout-note collapse="true"}
## View commands to create structure
:::: {.panel-tabset group="workflow"}
## Single directory project
<pre id="output_a" style="background-color: #f8f9fa; padding: 10px; border-radius: 5px; border-left: 4px solid #007acc;"></pre>
## Dual directory project
<pre id="output" style="background-color: #f8f9fa; padding: 10px; border-radius: 5px; border-left: 4px solid #007acc;"></pre>
<script>
const name = document.getElementById('name');
function updateOutput() {
  const sanitized = (name.value || "").replace(/[^a-z0-9_]/g, "");
output.textContent =
`mkdir -p tests src/${sanitized}
touch {pyproject.toml,environment.yml,README.md,CITATION.cff,src/${sanitized}/__init__.py,src/${sanitized}/example.py}
echo -e 'import sys\nsys.path.append("src")' > tests/__init__.py
echo -e 'name: ${sanitized}-env
channels:
  - conda-forge
  - nodefaults
dependencies:
  - python=3.13
  - pytest
  - blackd
  - isort
  # remove/modify these as needed
  - numpy
  - pandas
  # keep this to install your Python package locally
  - pip
  - pip:
    - --editable .' > environment.yml
echo -e '[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "${sanitized}"
version = "0.1.0"
description = "Brief description"
authors = [{name = "Your Name"}]
requires-python = ">=3.13"
[tool.setuptools.packages.find]
where = ["src"]' > pyproject.toml`
output_a.textContent =
`mkdir -p tests src/${sanitized} data/raw data/results notebooks reports
touch {pyproject.toml,environment.yml,README.md,CITATION.cff,src/${sanitized}/__init__.py,src/${sanitized}/example.py}
echo -e 'import sys\nsys.path.append("src")' > tests/__init__.py
echo -e 'name: ${sanitized}-env
channels:
  - conda-forge
  - nodefaults
dependencies:
  - python=3.13
  - pytest
  - blackd
  - isort
  # remove/modify these as needed
  - numpy
  - pandas
  # keep this to install your Python package locally
  - pip
  - pip:
    - --editable .' > environment.yml
echo -e '[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "${sanitized}"
version = "0.1.0"
description = "Brief description"
authors = [{name = "Your Name"}]
requires-python = ">=3.13"
[tool.setuptools.packages.find]
where = ["src"]' > pyproject.toml`
}
name.addEventListener('input', updateOutput);
updateOutput();
</script>
::::
:::::
### 3. Create a development environment
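Create the environment from your `environment.yml` file:
```bash
conda env create -f environment.yml
```
If you later change `environment.yml` (for example, to add a dependency), update the existing environment to match:
```bash
conda env update --file environment.yml --prune
```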
### 4. Write your pseudocode, comments, and code
- Write code for yourself a year from now: name variables and functions sensibly, and use comments to add context
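One way to approach this step is to write the pseudocode as numbered comments first, then fill in the code beneath each comment. The sketch below is only an illustration: `mean_cooling_rate` and its arguments are hypothetical and not part of any package in this guide.

```python
def mean_cooling_rate(temperatures_k, times_s):
    """Return the average cooling rate in kelvin per second.

    Hypothetical example function, used only to illustrate the workflow.
    """
    # 1. Check that we have matching, usable input
    if len(temperatures_k) != len(times_s) or len(times_s) < 2:
        raise ValueError("Need at least two matching temperature/time samples")

    # 2. Work out the total temperature drop and the elapsed time
    temperature_drop_k = temperatures_k[0] - temperatures_k[-1]
    elapsed_time_s = times_s[-1] - times_s[0]

    # 3. The cooling rate is the drop divided by the elapsed time
    return temperature_drop_k / elapsed_time_s


cooling_rate = mean_cooling_rate([1600.0, 1500.0, 1400.0], [0.0, 50.0, 100.0])
print(cooling_rate)
```

The comments started life as the pseudocode, and the variable names carry their units, so the function should still make sense when you come back to it later.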
### 5. Write a test suite
Create your tests in the folder called `tests`. Call the file `test_<YOUR-PYTHON-MODULE-NAME>.py`, and define the functions inside it as `def test_<YOUR-FUNCTION-NAME>():`.
Use this format:
```python
def test_example():
    '''Test for the example function'''
    # Arrange
    test_variable_1 = 0
    test_variable_2 = 1
    expected_output = 7
    # Act: replace your_function with a function imported from your package
    output = your_function(test_variable_1, test_variable_2)
    # Assert
    assert output == expected_output
    # No cleanup needed
```
All functions in your core code package should be tested.
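As a concrete instance of the Arrange/Act/Assert template, here is a minimal sketch. The function `celsius_to_kelvin` is a hypothetical stand-in for something in your package; it is defined inline only so that the example is self-contained. In a real project it would live under `src/` and be imported into the test file.

```python
import math


def celsius_to_kelvin(temperature_c):
    """Convert a temperature from degrees Celsius to kelvin (hypothetical example)."""
    return temperature_c + 273.15


def test_celsius_to_kelvin():
    """Test celsius_to_kelvin with a known value."""
    # Arrange
    temperature_c = 0.0
    expected_output = 273.15
    # Act
    output = celsius_to_kelvin(temperature_c)
    # Assert
    assert output == expected_output


def test_celsius_to_kelvin_room_temperature():
    """Compare floating point results with a tolerance rather than ==."""
    # Arrange
    temperature_c = 26.85
    expected_output = 300.0
    # Act
    output = celsius_to_kelvin(temperature_c)
    # Assert
    assert math.isclose(output, expected_output)
```

Running `pytest` from the project root should collect both functions, provided they are saved in a file whose name starts with `test_`.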
### 6. Write documentation
::: {.panel-tabset group="workflow"}
## Single directory project
You'll want to add:
- Module and function level docstrings
- A README.md file
- A `pyproject.toml` file to document your core Python package
- Possibly some example notebooks running the code
## Dual directory project
In your core code/package repository:
- Module and function level docstrings
- A README.md file
- A `pyproject.toml` file to document your core Python package
In your analysis/application repository:
- Module and function level docstrings of code
- Example notebooks
- Instructions in your README.md on how to use the code in the package repository.
:::
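For the module and function level docstrings, a sketch of what these might look like is given below. The module, the function `kelvin_to_celsius`, and the NumPy-style section layout are all assumptions for illustration; use whichever docstring convention your project has settled on.

```python
"""Tools for converting between temperature units.

Hypothetical module docstring: a line or two describing what the module is for.
"""


def kelvin_to_celsius(temperature_k):
    """Convert a temperature from kelvin to degrees Celsius.

    Parameters
    ----------
    temperature_k : float
        Temperature in kelvin.

    Returns
    -------
    float
        Temperature in degrees Celsius.
    """
    return temperature_k - 273.15


# The docstring is stored on the function and is what help() displays to users
print(kelvin_to_celsius.__doc__)
```

The docstring records units and types, which are the details that are hardest to reconstruct from the code alone.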
## Part Two: using the core code
### 1. Setting up your folder structure
::: {.panel-tabset group="workflow"}
## Single directory project
You've already sorted this back in [Step 2 of Part One](#create-your-project-directory-structure-in-a-repository).
## Dual directory project
This is more flexible, but it's generally still a good idea to keep things organised. This folder might look like this:
```text
pallasite-parent-body-evolution/ The project git repository
├── LICENSE
├── README.md
├── environment.yml The libraries I need for analysis (including the package from our package repository)
├── data I usually load in large data from storage elsewhere
│ ├── interim But sometimes do keep small summary datafiles in the repository
│ ├── processed
│ └── raw
├── docs Notes on analysis, process etc.
├── notebooks Jupyter notebooks used for analysis
├── reports For a manuscript source, e.g., LaTeX, Markdown, etc., or any project reports
│ └── figures Figures for the manuscript or reports
├── src Source code for this project
│ ├── data Scripts and programs to process data
│ ├── tools Any helper scripts go here
│ └── visualization Scripts for visualisation of your results, e.g., matplotlib, ggplot2 related.
└── tests Test code for this project, benchmarking, comparison to analytical models
```
:::
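For the dual directory case, the analysis folder layout shown in the example tree could be created with a short script. This is a sketch only: the helper name `create_analysis_layout` is an assumption, the folder names are copied from the tree, and the demonstration runs in a temporary directory so that nothing is written to a real project.

```python
import tempfile
from pathlib import Path

# Sub-folders taken from the example tree
ANALYSIS_FOLDERS = [
    "data/interim",
    "data/processed",
    "data/raw",
    "docs",
    "notebooks",
    "reports/figures",
    "src/data",
    "src/tools",
    "src/visualization",
    "tests",
]


def create_analysis_layout(project_root):
    """Create the analysis folder layout under project_root (hypothetical helper)."""
    project_root = Path(project_root)
    for folder in ANALYSIS_FOLDERS:
        (project_root / folder).mkdir(parents=True, exist_ok=True)
    # Empty placeholder files; fill these in as the project develops
    for file_name in ["README.md", "LICENSE", "environment.yml"]:
        (project_root / file_name).touch()
    return project_root


# Demonstrate in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    root = create_analysis_layout(Path(tmp_dir) / "pallasite-parent-body-evolution")
    created = sorted(
        path.relative_to(root).as_posix() for path in root.rglob("*") if path.is_dir()
    )
    print(created)
```

Because `mkdir` is called with `exist_ok=True`, rerunning the script on an existing project leaves existing folders untouched.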
### 2. Creating a research environment
::: {.panel-tabset group="workflow"}
## Single directory project
We have already done this back in [Step 3](#create-a-development-environment).
## Dual directory project
You have a few options:
### Just keep using your development environment from Step 3
As you have installed your package, you can continue to use your development environment from [Step 3](#create-a-development-environment) across your system, in other folders.
### Create a new environment linked to your local package
Copy the `environment.yml` file you created in [Step 3](#create-a-development-environment) into the analysis folder, then replace the following lines:
```yaml
  # keep this to install your Python package locally
  - pip
  - pip:
    - --editable .
```
with:
```yaml
  # keep this to install your Python package locally
  - pip
  - pip:
    - --editable /absolute/path/to/package-repo
```
This is still not easily reusable, because the absolute path only exists on your machine, but at least it's now more obvious what environment you're using, and you can still easily make changes to the package as you work.
### Install your package via GitHub
Copy the `environment.yml` file you created in [Step 3](#create-a-development-environment) into the analysis folder, then replace the following lines:
```yaml
  # keep this to install your Python package locally
  - pip
  - pip:
    - --editable .
```
with:
```yaml
  # install your Python package from its GitHub repository
  - pip
  - pip:
    - <PACKAGE_NAME>@git+https://github.com/<USER_NAME>/<REPO_NAME>
```
Replace the placeholders in `<ANGLE_BRACKETS>` with the appropriate values.
This can be a bit tricky when you're in the early stages of the project and need to frequently update your core code package (as you'll need to push the package changes to GitHub and then update your environment file every time); I often leave this until my package code is fairly stable. You'll definitely want to do this before releasing your code.
:::
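Whichever option you choose, it can be worth confirming which copy of your package the environment actually imports. The sketch below uses `importlib.util.find_spec` from the standard library; the helper name `find_package_location` is an assumption, and `json` stands in for your own package so that the example runs anywhere.

```python
import importlib.util


def find_package_location(package_name):
    """Return the file a package would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return None
    return spec.origin


# "json" is part of the standard library, so it is always available for a demonstration;
# swap in your own package name (e.g. "amazing_project") in practice.
print(find_package_location("json"))
print(find_package_location("package_that_is_not_installed"))
```

With an editable install, the reported path should point into your package repository rather than into the environment's `site-packages` folder.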
### 3. Do your research!
#### Reloading libraries
When working with an editable install, you may need to force reload the module after changing it to ensure the changes carry through.
Let's say you have updated your package `amazing_project`, which is installed in your current Conda environment as an editable install. You don't need to update the environment, but if you're working in a notebook (or any other long-running Python session), you will have to force reload the module. This is very easy using `importlib` (part of the Python standard library, so it doesn't need to be added to your environment):
```python
import amazing_project as ap
import importlib
importlib.reload(ap)
```
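To see why the reload matters, here is a self-contained sketch that imitates editing a module mid-session. The module `demo_package` is a throwaway file written to a temporary directory purely for this demonstration; it stands in for your editable-installed package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Avoid writing bytecode caches, so this quick demo always re-reads the source file
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = Path(tmp_dir) / "demo_package.py"
    module_path.write_text("def answer():\n    return 1\n")

    # Make the temporary directory importable, then import the module
    sys.path.insert(0, tmp_dir)
    import demo_package

    before = demo_package.answer()

    # Simulate editing the source file while the session is still running
    module_path.write_text("def answer():\n    return 2222\n")

    # Without this line, demo_package.answer() would still return the old value
    importlib.reload(demo_package)
    after = demo_package.answer()

    sys.path.remove(tmp_dir)

print(before, after)
```

Note that names imported with `from amazing_project import some_function` keep pointing at the old objects after a reload, so it is safer to access functions through the module (for example, `ap.some_function`).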
#### Using notebooks
Jupyter notebooks can be tricky when it comes to version control: the `.ipynb` file stores outputs, metadata, and HTML formatting alongside your code, and these change whenever you rerun cells, which can obscure actual code changes. There are a few different options if you are keen to use notebooks:
- [Jupytext](https://jupytext.readthedocs.io/en/latest/index.html): creating Jupyter notebooks in plain `.py` files
- [Marimo notebooks](https://marimo.io/): an alternative notebook framework, again in plain `.py`.
### 4. Export a record of your environment
::: {.panel-tabset group="workflow"}
## Single directory project
When you've created a "batch" of results that you're happy with, you should record your exact dependencies at that point:
```bash
conda env export > env-record.yml # from inside the activated env
```
## Dual directory project
When you've created a "batch" of results that you're happy with, you should record your exact dependencies in the environment used to produce those results at that point (so your [research environment from Step 2](#creating-a-research-environment)), and save this in your analysis folder:
```bash
conda env export > env-record.yml # from inside the activated env
```
It's often a good idea to create a release of your package (see [Step 5 below](#create-a-release-synced-with-zenodo)) and install it properly using the version number and the GitHub URL in your research environment, and *then* do your final analysis and export your research environment.
:::
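Alongside the exported environment file, you may also want to store key version numbers with the results themselves, for example at the top of a notebook. The sketch below uses `importlib.metadata` from the standard library; the helper name `record_versions` and the package names passed to it are assumptions. It complements `conda env export` rather than replacing it.

```python
import sys
from importlib import metadata


def record_versions(package_names):
    """Return a dict mapping each package name to its installed version string."""
    versions = {"python": sys.version.split()[0]}
    for package_name in package_names:
        try:
            versions[package_name] = metadata.version(package_name)
        except metadata.PackageNotFoundError:
            # Record the gap explicitly instead of failing part-way through
            versions[package_name] = "not installed"
    return versions


# Replace these names with the packages your analysis depends on
print(record_versions(["numpy", "pandas", "amazing_project"]))
```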
### 5. Create a release, synced with Zenodo
Add your DOI to your citation file!
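As a rough sketch of what the citation file might contain once the DOI is added, the snippet below builds minimal `CITATION.cff` text using keys from the Citation File Format (version 1.2.0). The helper `build_citation_cff` is hypothetical, and every value shown, including the DOI, is a placeholder to replace with your own details.

```python
def build_citation_cff(title, family_names, given_names, version, doi):
    """Return minimal CITATION.cff text (hypothetical helper for illustration)."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        "authors:",
        f'  - family-names: "{family_names}"',
        f'    given-names: "{given_names}"',
        f'version: "{version}"',
        f'doi: "{doi}"',
    ]
    return "\n".join(lines) + "\n"


citation_text = build_citation_cff(
    title="amazing_project",
    family_names="Name",
    given_names="Your",
    version="0.1.0",
    doi="10.5281/zenodo.XXXXXXX",  # placeholder: use the DOI Zenodo assigns to your release
)
print(citation_text)
```

In practice you would normally edit `CITATION.cff` by hand; the point here is only to show where the `doi` key sits relative to the other fields.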