Commit 2948526

committed
Adding tool to dynamically download packages at other repos
- for now only DV and Zenodo - tested only on DV - not yet integrated into pipeline, but that is the goal
1 parent 46bfaa2 commit 2948526

3 files changed

Lines changed: 688 additions & 1 deletion

Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@
1+
(help-download_from_jira_url)=
2+
3+
# download_from_jira_url.py - Download replication packages from Jira-specified URLs
4+
5+
::::{warning}
6+
7+
This documentation was AI-generated by Claude Code and should be reviewed for accuracy. Please report any errors or inconsistencies.
8+
9+
::::
10+
11+
## Description
12+
13+
This script downloads a replication package from the repository (Dataverse, Zenodo, or OSF) named in the "Replication package URL" field of a Jira issue. It first checks whether the issue has an openICPSR deposit, then detects the repository type from the URL and calls the matching download tool with the correct parameters. OSF URLs are recognized, but OSF download is not yet fully implemented.
14+
15+
## Usage
16+
17+
```bash
18+
python3.12 tools/download_from_jira_url.py <issue-key>
19+
python3.12 tools/download_from_jira_url.py -h|--help
20+
```
21+
22+
## Arguments
23+
24+
- **issue-key** (Required) - Jira issue key, case-insensitive (e.g., `AEAREP-8983` or `aearep-8361`)
25+
26+
## Examples
27+
28+
```bash
29+
# Download replication package for a Jira issue
30+
python3.12 tools/download_from_jira_url.py AEAREP-8983
31+
32+
# Show help
33+
python3.12 tools/download_from_jira_url.py --help
34+
```
35+
36+
## Workflow
37+
38+
The script follows this sequence:
39+
40+
1. **Check openICPSR**: Checks whether the openICPSR Project Number field is populated in Jira
41+
- If yes: exits with code 2 (openICPSR handled separately)
42+
- If no: proceeds to next step
43+
44+
2. **Retrieve URL**: Gets "Replication package URL" from Jira issue
45+
46+
3. **Detect Repository**: Analyzes URL to determine repository type:
47+
- **Dataverse**: URLs containing "DVN" or "dataverse"
48+
- **Zenodo**: URLs containing "zenodo"
49+
- **OSF**: URLs containing "osf.io"
50+
51+
4. **Download**: Calls appropriate download tool:
52+
- Dataverse: `download_dv.py` (extracts DOI)
53+
- Zenodo draft: `download_zenodo_draft.py` (for /deposit/ URLs)
54+
- Zenodo public: `download_zenodo_public.sh` (for /record/ or /records/ URLs and Zenodo DOIs)
55+
- OSF: `download_osf.sh` (if available)
56+
57+
5. **Git Integration**: Handles staging/commit in CI mode
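
The detection step (step 3) can be sketched as a simple substring check. This is a minimal illustration, not the script's actual code: the function name `detect_repository` is hypothetical, and matching case-insensitively is an assumption.

```python
from typing import Optional


def detect_repository(url: str) -> Optional[str]:
    """Return 'dataverse', 'zenodo', 'osf', or None if the URL is not recognized.

    Hypothetical helper; the substrings come from the workflow description,
    and case-insensitive matching is an assumption.
    """
    lowered = url.lower()
    if "dvn" in lowered or "dataverse" in lowered:
        return "dataverse"
    if "zenodo" in lowered:
        return "zenodo"
    if "osf.io" in lowered:
        return "osf"
    return None
```

Checking Dataverse first means a DOI such as `https://doi.org/10.7910/DVN/ABC123`, which has no repository name in its host, is still classified through its `DVN` path segment.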
58+
59+
## Repository Detection
60+
61+
### Dataverse
62+
63+
Recognizes URLs matching:
64+
- `https://doi.org/10.7910/DVN/XXXXX`
65+
- `https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/XXXXX`
66+
- Any URL containing "DVN" or "dataverse"
67+
68+
Extracts DOI and passes to `download_dv.py`.
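
One way to extract the DOI is a regular expression over the URL. The sketch below is an assumption about how this could be done, not the script's confirmed implementation; `extract_dataverse_doi` and the pattern are hypothetical, and the pattern only covers three-part DOIs of the form `10.NNNN/XXX/YYYY` shown in this document.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Assumed pattern: registrant prefix, then two path segments (e.g. DVN/ABC123).
_DOI_PATTERN = re.compile(r"(10\.\d{4,9}/[A-Za-z0-9._-]+/[A-Za-z0-9._-]+)")


def extract_dataverse_doi(url: str) -> Optional[str]:
    """Return the DOI as 'doi:10.xxxx/DVN/XXXXX', or None if none is found."""
    # Decode percent-escapes so 'doi%3A10.7910%2FDVN%2FABC123' also matches.
    match = _DOI_PATTERN.search(unquote(url))
    return f"doi:{match.group(1)}" if match else None
```

A URL that yields `None` here corresponds to the "Could not extract DOI from Dataverse URL" error described under Troubleshooting.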
69+
70+
### Zenodo
71+
72+
Recognizes URLs matching:
73+
- `https://zenodo.org/record/12345678` or `https://zenodo.org/records/12345678` (public record)
74+
- `https://zenodo.org/deposit/12345678` (draft deposit)
75+
- `10.5281/zenodo.12345678` (DOI format)
76+
77+
A URL whose path contains `/deposit/` is treated as a draft deposit; all other recognized forms are treated as public records.
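
The record ID extraction and the draft check can be sketched together as follows. The names `ZenodoRef` and `parse_zenodo` and the regular expression are hypothetical; they only illustrate the URL forms listed in this document.

```python
import re
from typing import NamedTuple, Optional


class ZenodoRef(NamedTuple):
    record_id: str
    is_draft: bool


# Assumed pattern: the numeric ID follows /record/, /records/, /deposit/,
# or the 'zenodo.' suffix of a DOI such as 10.5281/zenodo.1234567.
_ZENODO_ID = re.compile(r"(?:/(?:records?|deposit)/|zenodo\.)(\d+)")


def parse_zenodo(url: str) -> Optional[ZenodoRef]:
    """Return the record ID and draft flag, or None if no ID is found."""
    match = _ZENODO_ID.search(url)
    if match is None:
        return None
    # A draft is identified only by '/deposit/' in the URL.
    return ZenodoRef(match.group(1), "/deposit/" in url)
```

For example, `parse_zenodo("https://zenodo.org/deposit/1234567")` returns `ZenodoRef(record_id='1234567', is_draft=True)`.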
78+
79+
### OSF
80+
81+
Recognizes URLs containing:
82+
- `osf.io`
83+
84+
**Note**: OSF download is not yet fully implemented in this script.
85+
86+
## Output Structure
87+
88+
Downloads create repository-specific directories:
89+
90+
- **Dataverse**: `dv-[PUBLISHER]-[DATASET_ID]/`
91+
- **Zenodo**: `zenodo-[RECORD_ID]/`
92+
- **OSF**: `osf-[PROJECT_ID]/` (when implemented)
93+
94+
## Exit Codes
95+
96+
- **0**: Success - download completed
97+
- **1**: Error - missing arguments, Jira errors, download failures, unsupported repository
98+
- **2**: openICPSR deposit found (intentional skip - handled separately)
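
A calling script needs to treat exit code 2 as a deliberate skip rather than a failure. The following is a minimal sketch of such a caller; `ExitCode`, `should_fail_pipeline`, and `run_download` are hypothetical names and are not part of the tool.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    ERROR = 1
    OPENICPSR_SKIP = 2


def should_fail_pipeline(returncode: int) -> bool:
    """Return True unless the code is success (0) or the openICPSR skip (2)."""
    return returncode not in (ExitCode.SUCCESS, ExitCode.OPENICPSR_SKIP)


def run_download(issue_key: str) -> int:
    """Run the download tool for one Jira issue and return its exit code."""
    result = subprocess.run(
        ["python3.12", "tools/download_from_jira_url.py", issue_key],
        check=False,  # inspect the code ourselves instead of raising
    )
    return result.returncode
```

Because failures of the underlying download tools are propagated, this sketch treats every code other than 0 and 2 as a failure.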
99+
100+
## Prerequisites
101+
102+
### Required Environment Variables
103+
104+
- `JIRA_USERNAME` - Your Jira email address
105+
- `JIRA_API_KEY` - API token from https://id.atlassian.com/manage-profile/security/api-tokens
106+
107+
### Optional Environment Variables
108+
109+
- `ZENODO_ACCESS_TOKEN` - Required for Zenodo draft deposits
110+
- `CI` - Set in CI/CD environments for automatic git commits
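
A quick pre-flight check of these variables might look like the sketch below. `missing_variables` is a hypothetical helper, and treating an empty value as missing is an assumption.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("JIRA_USERNAME", "JIRA_API_KEY")


def missing_variables(
    needs_zenodo_token: bool = False,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    needed = list(REQUIRED_VARS)
    if needs_zenodo_token:
        # Only Zenodo draft deposits need the access token.
        needed.append("ZENODO_ACCESS_TOKEN")
    return [name for name in needed if not env.get(name)]
```

Passing `env` explicitly makes it possible to check a candidate configuration without modifying the process environment.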
111+
112+
### Required Tools
113+
114+
- `tools/jira_get_info.py` with 'replicationurl' keyword support
115+
- Download tools for supported repositories:
116+
- `tools/download_dv.py` (Dataverse)
117+
- `tools/download_zenodo_draft.py` (Zenodo drafts)
118+
- `tools/download_zenodo_public.sh` (Zenodo public)
119+
- `tools/download_osf.sh` (OSF, optional)
120+
121+
## Git Integration
122+
123+
### In CI Environments
124+
125+
When the `CI` environment variable is set, the script:
126+
- Automatically stages downloaded files with `git add`
127+
- Commits with descriptive message including repository type and identifier
128+
- Example: `"[skip ci] Adding files from Dataverse dataset doi:10.7910/DVN/ABC123"`
129+
130+
### In Local Environments
131+
132+
- Suggests manual `git add` operation
133+
- Does not auto-commit (leaves control to user)
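
The two modes can be sketched as a single function that commits only when `CI` is set. The names `build_commit_message` and `stage_and_commit` are hypothetical, the message wording for repositories other than Dataverse is an assumption, and an empty `CI` value is treated as unset.

```python
import os
import subprocess
from typing import Mapping


def build_commit_message(repo_label: str, identifier: str) -> str:
    """Build a commit message in the style of the Dataverse example above."""
    return f"[skip ci] Adding files from {repo_label} dataset {identifier}"


def stage_and_commit(
    directory: str,
    message: str,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Stage and commit in CI; otherwise only suggest the manual command."""
    if not env.get("CI"):
        print(f"Not in CI. To stage the files, run: git add {directory}")
        return False
    subprocess.run(["git", "add", directory], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)
    return True
```

The `[skip ci]` prefix keeps the automatic commit from triggering another pipeline run on CI systems that honor it.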
134+
135+
## Error Handling
136+
137+
The script handles various error conditions:
138+
139+
- **Missing Jira credentials**: Reports error and exits
140+
- **Missing Replication package URL**: Reports error and suggests checking Jira field
141+
- **Unsupported repository**: Reports error and lists supported repositories
142+
- **Invalid URL format**: Reports that the DOI or record ID could not be extracted from the URL
143+
- **Download tool failures**: Propagates exit code from underlying tool
144+
145+
## URL Parsing Examples
146+
147+
### Dataverse
148+
149+
| Input URL | Extracted DOI |
150+
|-----------|---------------|
151+
| `https://doi.org/10.7910/DVN/ABC123` | `doi:10.7910/DVN/ABC123` |
152+
| `https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ABC123` | `doi:10.7910/DVN/ABC123` |
153+
| `https://dataverse.example.edu/file.xhtml?persistentId=doi:10.5072/DVN/XYZ789` | `doi:10.5072/DVN/XYZ789` |
154+
155+
### Zenodo
156+
157+
| Input URL | Record ID | Type |
158+
|-----------|-----------|------|
159+
| `https://zenodo.org/record/1234567` | `1234567` | Public |
160+
| `https://zenodo.org/deposit/1234567` | `1234567` | Draft |
161+
| `10.5281/zenodo.1234567` | `1234567` | Public |
162+
| `https://zenodo.org/records/12345678` | `12345678` | Public |
163+
164+
## Requirements
165+
166+
- Python 3.12+
167+
- All prerequisites from called download tools:
168+
- `requests` library (for Dataverse, Zenodo Python tools)
169+
- `zenodo_get` (for Zenodo public downloads)
170+
- Jira API credentials
171+
172+
## Integration with Pipeline
173+
174+
This script is designed to integrate with the AEA replication workflow:
175+
176+
```yaml
177+
# Example bitbucket-pipelines.yml usage
178+
script:
179+
- python3.12 tools/download_from_jira_url.py $JIRATICKET
180+
```
181+
182+
It can replace or supplement the existing openICPSR/Zenodo download logic when the replication package is hosted on another repository.
183+
184+
## See Also
185+
186+
- [jira_get_info.py](help-jira_get_info) - Retrieve Jira issue information
187+
- [download_dv.py](help-download_dv) - Download from Dataverse
188+
- [download_zenodo_draft.py](help-download_zenodo_draft) - Download from Zenodo draft deposits
189+
- [download_zenodo_public.sh](help-download_zenodo_public) - Download from public Zenodo records
190+
- [download_osf.sh](help-download_osf) - Download from OSF (if available)
191+
192+
## Troubleshooting
193+
194+
### "No Replication package URL found in Jira issue"
195+
196+
**Cause**: The "Replication package URL" field is not populated in the Jira issue.
197+
198+
**Solution**:
199+
1. Check the Jira issue in a browser
200+
2. Verify the "Replication package URL" field contains a valid URL
201+
3. Ensure Jira credentials are correctly configured
202+
203+
### "Could not extract DOI from Dataverse URL"
204+
205+
**Cause**: URL format doesn't match expected Dataverse patterns.
206+
207+
**Solution**:
208+
1. Verify the URL is a valid Dataverse URL
209+
2. Ensure the URL contains either a DOI or DVN identifier
210+
3. Check for typos in the URL
211+
212+
### "Could not extract record ID from Zenodo URL"
213+
214+
**Cause**: URL format doesn't match expected Zenodo patterns.
215+
216+
**Solution**:
217+
1. Verify the URL is a valid Zenodo URL
218+
2. Ensure the URL contains a numeric record ID
219+
3. Try using just the record ID number instead of full URL
220+
221+
### "openICPSR deposit found (exit code 2)"
222+
223+
**Cause**: The Jira issue has an openICPSR Project Number populated.
224+
225+
**Solution**: This is intentional behavior. openICPSR deposits are handled separately through `download_openicpsr-private.py` or `download_openicpsr-public.py`.
226+
227+
## Known Limitations
228+
229+
- OSF download currently reports "not yet implemented" - manual download required
230+
- Zenodo detection defaults to trying public download first; may fail for draft deposits requiring authentication
231+
- Only supports public Dataverse datasets (no authentication support)
232+
- Custom Dataverse instances must use standard API patterns
233+
234+
## Future Enhancements
235+
236+
Potential improvements:
237+
- Full OSF integration
238+
- Support for additional repositories (WorldBank, Box, etc.)
239+
- Better Zenodo draft vs. public detection
240+
- Parallel download support for multiple URLs
241+
- URL validation before attempting download
