---
title: "Session 4: Modules and Software"
subtitle: "Managing Software Environments on HPC Systems"
format: html
---
# Session content
## Session aims
By the end of this session, you will be able to:
- Understand the module system and its benefits for HPC environments
- Use basic module commands to list, load, and unload software
- Create scripts that load modules and run software
- Request new software installations through proper channels
- Explore alternative software management approaches (Spack, containers)
- Apply best practices for reproducible software environments
In this session we will learn about software on Aire and how to access it via the module system. We will also discuss some alternative ways to install software yourself.
[**View Interactive Slides: Module System on Aire**](modules-software-slides.qmd){.btn .btn-primary target="_blank"}
## What are Modules?
Modules are a way to manage software environments on HPC systems. They:
- Let users load and unload software packages dynamically
- Make it easier to manage different versions of software and their dependencies
- Simplify the user environment and avoid conflicts between software versions
- Provide a consistent and reproducible environment
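
Under the hood, loading a module typically prepends the package's directories to environment variables such as `PATH` and `LD_LIBRARY_PATH`; `module show python/3.13.0` prints exactly what a given module changes. To illustrate the bookkeeping that modules automate, the sketch below does the same thing by hand for a hypothetical install prefix, `$HOME/apps/mytool/1.0`:

```bash
#!/bin/bash
# Hand-rolled equivalent of a simple "module load" (illustration only).
# The install prefix is hypothetical; substitute a real one to try it.
prefix="$HOME/apps/mytool/1.0"

# Put the tool's directories in front of the existing search paths,
# so its executables and shared libraries are found first
export PATH="$prefix/bin:$PATH"
export LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# The first PATH entry is now the tool's bin directory
echo "First PATH entry: ${PATH%%:*}"
```

Undoing these edits cleanly, and keeping several tools and versions consistent at once, is where doing this by hand becomes error-prone; `module unload` and `module purge` reverse the changes for you.
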
### Why Use Modules?
::: {.columns}
::: {.column width="50%"}
**Benefits:**
- Clean separation of software environments
- Easy to switch between environments
- Optimized builds for HPC hardware
:::
::: {.column width="50%"}
**Without Modules:**
- Software conflicts
- Path management nightmares
- Inconsistent environments
- Difficult reproducibility
:::
:::
## Basic Module Commands
### Listing Available Modules
```bash
module avail # List all available modules
module avail python # List all Python modules
module avail gcc # List all GCC modules
```
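
`module avail` prints its listing to standard error rather than standard output, so piping it straight into `grep` does not filter anything. A small shell function can work around this; the name `find_module` is our own choice for illustration, not a built-in module command. It uses terse (`-t`) output, which lists one module per line:

```bash
# find_module PATTERN: case-insensitive search of the available modules.
# "module avail" writes to stderr, hence the 2>&1 before the pipe;
# -t (terse) prints one module per line, which suits grep.
find_module() {
  module -t avail 2>&1 | grep -i -- "$1"
}

# Example: find_module python
```

Adding the function to your `~/.bashrc` makes it available in every new login shell.
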
### Loading and Unloading Modules
```bash
# Load a module
module load python/3.13.0
# Load without specifying a version (uses the default, which can change when new versions are installed)
module load python
# Unload a module
module unload python/3.13.0
# List currently loaded modules
module list
# Unload all modules
module purge
```
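
`module list` writes to standard error in the same way. If a script needs to check whether something is already loaded before loading it, a minimal sketch (the helper name `module_loaded` is hypothetical) might look like this:

```bash
# module_loaded NAME: succeed if a module called NAME (any version), or the
# exact NAME/VERSION given, is currently loaded.
# "module list" writes to stderr, hence the 2>&1 before the pipe.
module_loaded() {
  module -t list 2>&1 | grep -q -E -- "^$1(/|$)"
}

# Example: load GCC only if no version of it is loaded yet
# if ! module_loaded gcc; then module load gcc/14.2.0; fi
```
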
## Using Modules in Scripts
Let's say we have a Python file called `hello_world.py`:
```python
print("hello world!")
```
How would we write a bash script that loads the Python module and runs the Python script?
### Creating a Module Script
Create a file called `python_test.sh`:
```bash
#!/bin/bash
module load python/3.13.0
python hello_world.py
```
Make it executable and run it:
```bash
chmod +x python_test.sh # Add executable permissions
ls -F                    # Executable files are listed with a trailing *
./python_test.sh # Run the script
```
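
By default, bash carries on after a command fails, so if `module load` fails (for example, because of a typo in the version number) the script still goes on to run `python`, either with a different interpreter or with a confusing error. A slightly more defensive sketch of the same script starts from a clean environment and stops at the first error. Depending on the module system and its version, a failed load may not always return a non-zero exit status, so it is still worth reading the output.

```bash
cat > python_test.sh << 'EOF'
#!/bin/bash
# Exit immediately if any command fails, including "module load"
set -e

# Start from a clean environment so results don't depend on
# whatever happened to be loaded in your login shell
module purge
module load python/3.13.0

python hello_world.py
EOF
chmod +x python_test.sh
```
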
::: {.callout-important}
## Best Practice for Python
When running Python jobs, we recommend using the Miniforge module to create a conda environment instead of using the basic Python install. [Read our documentation on dependency management](https://arcdocs.leeds.ac.uk/aire/usage/dependency_management.html).
:::
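
As a rough sketch of that approach, a one-off script that creates an environment could look like the following. The environment name `myproject` and the package list are examples only; follow the linked documentation for the recommended setup on Aire.

```bash
cat > create_env.sh << 'EOF'
#!/bin/bash
set -e
# Make conda available (the version used elsewhere in this session)
module load miniforge/24.3.0

# Create a named environment with a pinned Python and some packages;
# --yes skips the interactive confirmation prompt
conda create --yes --name myproject python=3.12 numpy pandas
EOF
chmod +x create_env.sh
```

Run it once with `./create_env.sh`. Afterwards, job scripts only need `module load miniforge/24.3.0` followed by `conda activate myproject`, as in the Machine Learning Workflow later in this session.
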
## Requesting New Software
### Centralized Management
- Popular software is centrally installed by the Research Computing team
- Ensures optimized performance and avoids conflicts
- Regularly updated with new versions
### How to Request New Software
If software you need isn't available:
1. **Submit a [Research Computing Query](https://bit.ly/arc-help)** with details about the software
2. Include: software name, version, and brief justification for use in your research
3. The team will evaluate and install if appropriate
## Alternative Software Management
You can also manage your own software on Aire through several routes. Many users won't need this, but it may be necessary if you want fine-grained control or need older versions of software.
### Package Managers
#### Spack
- **[Spack](https://spack.io/)**: Flexible package manager for HPC systems
- Allows users to install software without admin privileges
- Supports complex dependency management
```bash
module load spack
spack install htop
spack load htop
```
#### EasyBuild
- **[EasyBuild](https://easybuild.io/)**: Framework for building and installing software on HPC
- Automates the build process using configuration files
- Good for complex scientific software
### Other Options
#### Manual Building
- Download and compile software yourself
- Requires knowledge of build systems and dependencies
- Most control but most work
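
Many open-source packages use the `configure` and `make` build steps. A sketch of building one into your home directory, where the tarball `mytool-1.0.tar.gz` and the install path are hypothetical placeholders:

```bash
cat > build_from_source.sh << 'EOF'
#!/bin/bash
set -e
# Hypothetical package: replace the tarball and paths with your own
module purge
module load gcc/14.2.0

tar -xzf mytool-1.0.tar.gz
cd mytool-1.0

# --prefix installs into a directory you own, so no admin rights are needed
./configure --prefix="$HOME/software/mytool/1.0"
make -j 4
# Copy the built files into the --prefix directory
make install
EOF
chmod +x build_from_source.sh
```

Once installed, add `$HOME/software/mytool/1.0/bin` to your `PATH` to use the tool. Not every package follows this pattern; CMake-based projects, for example, are built as in the Compilation Workflow below.
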
#### Containers
- Encapsulate software environments using **[Apptainer](https://apptainer.org/)**
- Ensures portability and consistency across systems
- Great for complex software stacks
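
A minimal sketch of the Apptainer workflow, assuming the `apptainer` command is available (possibly after loading a module) and that the node you run it on can reach Docker Hub; the image and file names are examples:

```bash
cat > container_test.sh << 'EOF'
#!/bin/bash
set -e
# Assumption: apptainer is on PATH. On some systems you need to load a
# module first; check with: module avail apptainer
# module load apptainer

# Download a public image from Docker Hub and convert it to a SIF file
# (add --force to overwrite an existing python.sif)
apptainer pull python.sif docker://python:3.12-slim

# Run a command using the software inside the container
apptainer exec python.sif python --version
EOF
chmod +x container_test.sh
```
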
## Best Practices
### Environment Management
- Use modules to manage software environments effectively
- **Unload modules** when no longer needed to avoid conflicts
- For R or Python, use the **Miniforge module** to create conda environments
### Reproducibility
::: {.callout-important}
## Key Principles
1. **Always specify versions**: `module load gcc/14.2.0` not `module load gcc`
2. **Document everything**: Keep track of modules and versions used
3. **Use scripts**: Automate your module loading in job scripts
4. **Version control**: Keep your workflow scripts in version control
:::
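
One way to apply the "document everything" principle is to log the loaded modules every time a job runs. The helper below is a hypothetical sketch, not part of the module system:

```bash
# log_modules [FILE]: append a timestamped list of the loaded modules to FILE
# (default: modules_used.log), e.g. right after the module load lines.
# "module list" writes to stderr, so it is redirected into the log too.
log_modules() {
  local logfile="${1:-modules_used.log}"
  {
    echo "# $(date '+%Y-%m-%d %H:%M:%S') on ${HOSTNAME:-unknown host}"
    module -t list 2>&1
  } >> "$logfile"
}
```

Keeping that log next to your results gives collaborators the exact `module load` lines needed to rebuild the same environment.
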
### Collaboration
- Share module load commands with collaborators
- Use version-controlled scripts to manage workflows
- Consider containers for complex environments
## Common Module Workflows
### Data Analysis Workflow
```bash
#!/bin/bash
module load python/3.13.0
module load scipy/1.11.3
module load matplotlib/3.7.2
python analysis.py
```
### Compilation Workflow
```bash
#!/bin/bash
module load gcc/14.2.0
module load cmake/3.24.2
module load openmpi/4.1.4
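# Configure in a separate build directory to keep the source tree clean
cmake -S . -B build
# Compile using 8 parallel jobs
cmake --build build --parallel 8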
```
### Machine Learning Workflow
```bash
#!/bin/bash
module load miniforge/24.3.0
conda activate ml-env
python train_model.py
```
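
For conda-based workflows like this one, the module list alone does not capture the Python packages you use. One option, sketched below with the names from the example above, is to export the active environment alongside your results:

```bash
cat > ml_job.sh << 'EOF'
#!/bin/bash
set -e
module load miniforge/24.3.0
conda activate ml-env

# Record the exact packages (and versions) in the active environment
conda env export > environment.yml

python train_model.py
EOF
chmod +x ml_job.sh
```

`conda env create --file environment.yml` rebuilds an environment from that file. Exports include platform-specific build details, so they are most reliable when recreating the environment on the same kind of system.
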
# Exercises
Work through these exercises to practice using the module system on Aire.
### Exercise 1: Explore Available Software
Get familiar with the available software on Aire:
```bash
# List all available modules
module avail
# Search for specific software
module avail python
module avail gcc
module avail cmake
# Look for software you might need for your research
module avail R
module avail matlab
```
**Questions to consider:**
- How many versions of Python are available?
- What's the default version when multiple versions exist?
- Can you find software relevant to your research area?
### Exercise 2: Practice Loading and Managing Modules
Learn to load, check, and unload modules:
```bash
# Load a specific version and check it's loaded
module load gcc/14.2.0
module list
# Load multiple modules
module load python/3.13.0
module load cmake/3.24.2
module list
# Try loading without specifying a version: which gcc is the default?
module unload gcc
module load gcc
module list
# Clean up - unload all modules
module purge
module list
```
**Key Learning Points:**
- Always specify versions for reproducibility: `module load gcc/14.2.0`
- Use `module list` to see what's currently loaded
- Use `module purge` to start with a clean environment
### Exercise 3: Create and Test a Module Script
Create a script that uses modules to run software:
```bash
# Create a simple Python script first
cat > hello_modules.py << 'EOF'
import sys
print(f"Hello from Python {sys.version}")
print(f"Python executable: {sys.executable}")
EOF
# Create a bash script that loads modules and runs Python
cat > test_modules.sh << 'EOF'
#!/bin/bash
echo "Starting with clean environment..."
module purge
module list
echo "Loading Python module..."
module load python/3.13.0
module list
echo "Running Python script..."
python hello_modules.py
echo "Script completed!"
EOF
# Make executable and test
chmod +x test_modules.sh
./test_modules.sh
```
### Exercise 4: Create a Project Setup Script
Create a reusable script for a typical research project:
```bash
# Create a comprehensive project setup script
cat > project_setup.sh << 'EOF'
#!/bin/bash
# Project Setup Script
# Description: Loads all necessary modules for data analysis project
echo "Setting up research environment..."
# Start with clean environment
module purge
# Load essential tools
module load gcc/14.2.0 # Compiler
module load python/3.13.0 # Python
module load cmake/3.24.2 # Build system
# Optional: Load domain-specific software
# module load r/4.3.1 # For R users
# module load matlab/2023b # For MATLAB users
echo "Loaded modules:"
module list
echo "Environment ready!"
echo "Python version: $(python --version)"
echo "GCC version: $(gcc --version | head -n1)"
# Optional: Activate conda environment
# echo "Activating conda environment..."
# conda activate myproject
EOF
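chmod +x project_setup.sh
./project_setup.sh          # Runs in a separate shell: the modules are gone when it exits
source project_setup.sh     # Runs in your current shell, so the modules stay loaded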
```
### Exercise 5: Explore Software Request Process
Practice finding and understanding the software request process:
1. **Find the request form**: Navigate to the [Research Computing Query form](https://bit.ly/arc-help)
2. **Identify software needs**: Think of software you need that isn't available
3. **Draft a request**: Write a brief justification for a software package you might need
**Example request template:**
```
Software Name: [e.g., TensorFlow 2.14]
Version: [specific version if needed]
Research Purpose: [brief description of how it will be used]
Justification: [why this specific version/software is needed]
```
::: {.callout-tip}
## What You've Accomplished
- ✅ Explored available software using `module avail`
- ✅ Practiced loading and unloading modules
- ✅ Created scripts that use modules effectively
- ✅ Built a reusable project setup script
- ✅ Learned the software request process
- ✅ Applied best practices for reproducible environments
:::
---
# Summary
::: {.callout-note}
## Key Takeaways
- **Modules provide clean software environments** without conflicts
- **Always specify versions** for reproducible research
- **Use scripts** to automate and document your module usage
- **Request new software** through Research Computing queries
- **Consider alternatives** like Spack or containers for special requirements
- **Follow best practices** for collaboration and reproducibility
:::
---
## Next Steps
Now you know how to manage software on Aire! Let's move on to [Session 5: Job Scheduling and Submission](scheduling-submission.qmd) to learn how to run your code on the compute nodes.
## Additional Resources
- [Aire Software Documentation](https://arcdocs.leeds.ac.uk/aire/software/start.html)
- [Dependency Management Guide, including Miniforge and Conda](https://arcdocs.leeds.ac.uk/aire/usage/dependency_management.html)
- [Software Request Form](https://bit.ly/arc-help)