| 1 | +(help-download_from_jira_url)= |
| 2 | + |
| 3 | +# download_from_jira_url.py - Download replication packages from Jira-specified URLs |
| 4 | + |
| 5 | +::::{warning} |
| 6 | + |
| 7 | +This documentation was AI-generated by Claude Code and should be reviewed for accuracy. Please report any errors or inconsistencies. |
| 8 | + |
| 9 | +:::: |
| 10 | + |
| 11 | +## Description |
| 12 | + |
| 13 | +This script orchestrates downloads from various repositories (Dataverse, Zenodo, OSF) using the replication package URL stored in a Jira issue. It automatically detects the repository type, checks for openICPSR deposits, and calls the appropriate download tool with the correct parameters. |
| 14 | + |
| 15 | +## Usage |
| 16 | + |
| 17 | +```bash |
| 18 | +python3.12 tools/download_from_jira_url.py <issue-key> |
| 19 | +python3.12 tools/download_from_jira_url.py -h|--help |
| 20 | +``` |
| 21 | + |
| 22 | +## Arguments |
| 23 | + |
| 24 | +- **issue-key** (Required) - Jira issue key, case-insensitive (e.g., AEAREP-8983 or aearep-8361) |
| 25 | + |
| 26 | +## Examples |
| 27 | + |
| 28 | +```bash |
| 29 | +# Download replication package for a Jira issue |
| 30 | +python3.12 tools/download_from_jira_url.py AEAREP-8983 |
| 31 | + |
| 32 | +# Show help |
| 33 | +python3.12 tools/download_from_jira_url.py --help |
| 34 | +``` |
| 35 | + |
| 36 | +## Workflow |
| 37 | + |
| 38 | +The script follows this sequence: |
| 39 | + |
| 40 | +1. **Check openICPSR**: Checks whether the openICPSR Project Number field is populated in Jira |
| 41 | + - If yes: exits with code 2 (openICPSR handled separately) |
| 42 | + - If no: proceeds to next step |
| 43 | + |
| 44 | +2. **Retrieve URL**: Gets "Replication package URL" from Jira issue |
| 45 | + |
| 46 | +3. **Detect Repository**: Analyzes URL to determine repository type: |
| 47 | + - **Dataverse**: URLs containing "DVN" or "dataverse" |
| 48 | + - **Zenodo**: URLs containing "zenodo" |
| 49 | + - **OSF**: URLs containing "osf.io" |
| 50 | + |
| 51 | +4. **Download**: Calls appropriate download tool: |
| 52 | + - Dataverse: `download_dv.py` (extracts DOI) |
| 53 | + - Zenodo draft: `download_zenodo_draft.py` (for /deposit/ URLs) |
| 54 | + - Zenodo public: `download_zenodo_public.sh` (for /record/ or /records/ URLs) |
| 55 | + - OSF: `download_osf.sh` (if available) |
| 56 | + |
| 57 | +5. **Git Integration**: Handles staging/commit in CI mode |
| 58 | + |
| 59 | +## Repository Detection |
| 60 | + |
| 61 | +### Dataverse |
| 62 | + |
| 63 | +Recognizes URLs matching: |
| 64 | +- `https://doi.org/10.7910/DVN/XXXXX` |
| 65 | +- `https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/XXXXX` |
| 66 | +- Any URL containing "DVN" or "dataverse" |
| 67 | + |
| 68 | +Extracts DOI and passes to `download_dv.py`. |
| 69 | + |
| 70 | +### Zenodo |
| 71 | + |
| 72 | +Recognizes URLs matching: |
| 73 | +- `https://zenodo.org/record/12345678` (public record) |
| 74 | +- `https://zenodo.org/deposit/12345678` (draft deposit) |
| 75 | +- `10.5281/zenodo.12345678` (DOI format) |
| 76 | + |
| 77 | +Detects draft vs. public based on `/deposit/` in URL path. |
| 78 | + |
| 79 | +### OSF |
| 80 | + |
| 81 | +Recognizes URLs containing: |
| 82 | +- `osf.io` |
| 83 | + |
| 84 | +**Note**: OSF download is not yet fully implemented in this script. |
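
The detection rules above can be sketched as a small Python function. This is a minimal illustration based only on the substring rules listed here, not the script's actual code: the function name `detect_repository` and the returned labels are hypothetical, and treating the `dataverse`, `zenodo`, and `osf.io` checks as case-insensitive is an assumption.

```python
def detect_repository(url: str) -> str:
    """Classify a replication package URL using simple substring checks.

    Hypothetical helper: names and return labels are illustrative only.
    """
    lowered = url.lower()
    # Dataverse: the URL contains "DVN" or "dataverse".
    if "DVN" in url or "dataverse" in lowered:
        return "dataverse"
    # Zenodo: a /deposit/ path segment marks a draft; anything else is treated as public.
    if "zenodo" in lowered:
        return "zenodo-draft" if "/deposit/" in lowered else "zenodo-public"
    # OSF: the URL contains "osf.io".
    if "osf.io" in lowered:
        return "osf"
    return "unsupported"
```

Checking Dataverse first mirrors the order in which the repositories are listed; the real script may order or combine its checks differently.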
| 85 | + |
| 86 | +## Output Structure |
| 87 | + |
| 88 | +Downloads create repository-specific directories: |
| 89 | + |
| 90 | +- **Dataverse**: `dv-[PUBLISHER]-[DATASET_ID]/` |
| 91 | +- **Zenodo**: `zenodo-[RECORD_ID]/` |
| 92 | +- **OSF**: `osf-[PROJECT_ID]/` (when implemented) |
| 93 | + |
| 94 | +## Exit Codes |
| 95 | + |
| 96 | +- **0**: Success - download completed |
| 97 | +- **1**: Error - missing arguments, Jira errors, download failures, unsupported repository |
| 98 | +- **2**: openICPSR deposit found (intentional skip - handled separately) |
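
A caller that wraps the script may want to treat exit code 2 as a skip rather than a failure. The sketch below shows one way to do that in Python; `interpret_exit_code` and `run_download` are hypothetical helpers, and mapping every code other than 0 and 2 to a failure is an assumption.

```python
import subprocess


def interpret_exit_code(code: int) -> str:
    """Map the documented exit codes to a short status label (labels are illustrative)."""
    if code == 0:
        return "downloaded"
    if code == 2:
        # openICPSR deposit found: an intentional skip, not an error.
        return "skipped-openicpsr"
    # Exit code 1, or any other code, is treated as a failure here.
    return "failed"


def run_download(issue_key: str) -> str:
    """Run the script for one Jira issue and return the status label."""
    result = subprocess.run(
        ["python3.12", "tools/download_from_jira_url.py", issue_key]
    )
    return interpret_exit_code(result.returncode)
```

For example, `run_download("AEAREP-8983")` would return `"skipped-openicpsr"` when the issue has an openICPSR Project Number.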
| 99 | + |
| 100 | +## Prerequisites |
| 101 | + |
| 102 | +### Required Environment Variables |
| 103 | + |
| 104 | +- `JIRA_USERNAME` - Your Jira email address |
| 105 | +- `JIRA_API_KEY` - API token from https://id.atlassian.com/manage-profile/security/api-tokens |
| 106 | + |
| 107 | +### Optional Environment Variables |
| 108 | + |
| 109 | +- `ZENODO_ACCESS_TOKEN` - Required for Zenodo draft deposits |
| 110 | +- `CI` - Set in CI/CD environments for automatic git commits |
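
Before running the script, it can help to confirm that the variables above are set. The following is a minimal pre-flight sketch; `missing_variables` is a hypothetical helper, not part of the tools, and treating an empty value as missing is an assumption.

```python
import os

# Variables the documentation lists as required.
REQUIRED_VARS = ("JIRA_USERNAME", "JIRA_API_KEY")


def missing_variables(env, needs_zenodo_draft=False):
    """Return the names of expected variables that are unset or empty in `env`."""
    required = list(REQUIRED_VARS)
    if needs_zenodo_draft:
        # Only needed when the package is a Zenodo draft deposit.
        required.append("ZENODO_ACCESS_TOKEN")
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```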
| 111 | + |
| 112 | +### Required Tools |
| 113 | + |
| 114 | +- `tools/jira_get_info.py` with 'replicationurl' keyword support |
| 115 | +- Download tools for supported repositories: |
| 116 | + - `tools/download_dv.py` (Dataverse) |
| 117 | + - `tools/download_zenodo_draft.py` (Zenodo drafts) |
| 118 | + - `tools/download_zenodo_public.sh` (Zenodo public) |
| 119 | + - `tools/download_osf.sh` (OSF, optional) |
| 120 | + |
| 121 | +## Git Integration |
| 122 | + |
| 123 | +### In CI Environments |
| 124 | + |
| 125 | +When the `CI` environment variable is set: |
| 126 | +- Automatically stages downloaded files with `git add` |
| 127 | +- Commits with descriptive message including repository type and identifier |
| 128 | +- Example: `"[skip ci] Adding files from Dataverse dataset doi:10.7910/DVN/ABC123"` |
| 129 | + |
| 130 | +### In Local Environments |
| 131 | + |
| 132 | +- Suggests manual `git add` operation |
| 133 | +- Does not auto-commit (leaves control to user) |
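
The CI versus local behavior can be sketched as a function that only plans the git commands. This is an illustration under stated assumptions, not the script's implementation: `plan_git_steps` is a hypothetical name, a non-empty `CI` value is assumed to mean "set", and the commit message format follows the example above.

```python
def plan_git_steps(env, download_dir, description):
    """Return the git commands to run after a download, as argument lists.

    Outside CI the list is empty, since staging is left to the user.
    """
    if not env.get("CI"):
        return []
    message = f"[skip ci] Adding files from {description}"
    return [
        ["git", "add", download_dir],
        ["git", "commit", "-m", message],
    ]
```

Each returned list could be passed to `subprocess.run(..., check=True)`. The directory name passed in would be whatever the download tool created, for example a `dv-[PUBLISHER]-[DATASET_ID]` directory.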
| 134 | + |
| 135 | +## Error Handling |
| 136 | + |
| 137 | +The script handles various error conditions: |
| 138 | + |
| 139 | +- **Missing Jira credentials**: Reports error and exits |
| 140 | +- **Missing Replication package URL**: Reports error and suggests checking Jira field |
| 141 | +- **Unsupported repository**: Reports error and lists supported repositories |
| 142 | +- **Invalid URL format**: Reports that the DOI or record ID could not be extracted from the URL |
| 143 | +- **Download tool failures**: Propagates exit code from underlying tool |
| 144 | + |
| 145 | +## URL Parsing Examples |
| 146 | + |
| 147 | +### Dataverse |
| 148 | + |
| 149 | +| Input URL | Extracted DOI | |
| 150 | +|-----------|---------------| |
| 151 | +| `https://doi.org/10.7910/DVN/ABC123` | `doi:10.7910/DVN/ABC123` | |
| 152 | +| `https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ABC123` | `doi:10.7910/DVN/ABC123` | |
| 153 | +| `https://dataverse.example.edu/file.xhtml?persistentId=doi:10.5072/DVN/XYZ789` | `doi:10.5072/DVN/XYZ789` | |
| 154 | + |
| 155 | +### Zenodo |
| 156 | + |
| 157 | +| Input URL | Record ID | Type | |
| 158 | +|-----------|-----------|------| |
| 159 | +| `https://zenodo.org/record/1234567` | `1234567` | Public | |
| 160 | +| `https://zenodo.org/deposit/1234567` | `1234567` | Draft | |
| 161 | +| `10.5281/zenodo.1234567` | `1234567` | Public | |
| 162 | +| `https://zenodo.org/records/12345678` | `12345678` | Public | |
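
The mappings in the tables above can be reproduced with two regular expressions. This is a sketch that covers only the URL shapes shown here; the function names and patterns are assumptions, not the script's actual parsing code.

```python
import re

# A DOI containing the "DVN" segment, e.g. 10.7910/DVN/ABC123.
DATAVERSE_DOI = re.compile(r"10\.\d{4,9}/DVN/[A-Za-z0-9]+")

# Zenodo record IDs from /record/, /records/, /deposit/ paths or a Zenodo DOI.
ZENODO_PATTERNS = (
    re.compile(r"zenodo\.org/(?:records?|deposit)/(\d+)"),
    re.compile(r"10\.5281/zenodo\.(\d+)"),
)


def extract_dataverse_doi(url):
    """Return the DOI prefixed with 'doi:', or None if no DOI is found."""
    match = DATAVERSE_DOI.search(url)
    return f"doi:{match.group(0)}" if match else None


def extract_zenodo_record_id(url):
    """Return the numeric record ID as a string, or None if no pattern matches."""
    for pattern in ZENODO_PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None
```

A `None` result corresponds to the "Could not extract" errors described under Troubleshooting.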
| 163 | + |
| 164 | +## Requirements |
| 165 | + |
| 166 | +- Python 3.12+ |
| 167 | +- All prerequisites from called download tools: |
| 168 | + - `requests` library (for Dataverse, Zenodo Python tools) |
| 169 | + - `zenodo_get` (for Zenodo public downloads) |
| 170 | + - Jira API credentials |
| 171 | + |
| 172 | +## Integration with Pipeline |
| 173 | + |
| 174 | +This script is designed to integrate with the AEA replication workflow: |
| 175 | + |
| 176 | +```yaml |
| 177 | +# Example bitbucket-pipelines.yml usage |
| 178 | +script: |
| 179 | + - python3.12 tools/download_from_jira_url.py $JIRATICKET |
| 180 | +``` |
| 181 | + |
| 182 | +It can replace or supplement the existing openICPSR/Zenodo download logic when the replication package is hosted on an alternative repository. |
| 183 | + |
| 184 | +## See Also |
| 185 | + |
| 186 | +- [jira_get_info.py](help-jira_get_info) - Retrieve Jira issue information |
| 187 | +- [download_dv.py](help-download_dv) - Download from Dataverse |
| 188 | +- [download_zenodo_draft.py](help-download_zenodo_draft) - Download from Zenodo draft deposits |
| 189 | +- [download_zenodo_public.sh](help-download_zenodo_public) - Download from public Zenodo records |
| 190 | +- [download_osf.sh](help-download_osf) - Download from OSF (if available) |
| 191 | + |
| 192 | +## Troubleshooting |
| 193 | + |
| 194 | +### "No Replication package URL found in Jira issue" |
| 195 | + |
| 196 | +**Cause**: The "Replication package URL" field is not populated in the Jira issue. |
| 197 | + |
| 198 | +**Solution**: |
| 199 | +1. Check the Jira issue in a browser |
| 200 | +2. Verify the "Replication package URL" field contains a valid URL |
| 201 | +3. Ensure Jira credentials are correctly configured |
| 202 | + |
| 203 | +### "Could not extract DOI from Dataverse URL" |
| 204 | + |
| 205 | +**Cause**: The URL format doesn't match the expected Dataverse patterns. |
| 206 | + |
| 207 | +**Solution**: |
| 208 | +1. Verify the URL is a valid Dataverse URL |
| 209 | +2. Ensure the URL contains either a DOI or DVN identifier |
| 210 | +3. Check for typos in the URL |
| 211 | + |
| 212 | +### "Could not extract record ID from Zenodo URL" |
| 213 | + |
| 214 | +**Cause**: The URL format doesn't match the expected Zenodo patterns. |
| 215 | + |
| 216 | +**Solution**: |
| 217 | +1. Verify the URL is a valid Zenodo URL |
| 218 | +2. Ensure the URL contains a numeric record ID |
| 219 | +3. Try using just the record ID number instead of the full URL |
| 220 | + |
| 221 | +### "openICPSR deposit found (exit code 2)" |
| 222 | + |
| 223 | +**Cause**: The Jira issue has an openICPSR Project Number populated. |
| 224 | + |
| 225 | +**Solution**: This is intentional behavior. openICPSR deposits are handled separately through `download_openicpsr-private.py` or `download_openicpsr-public.py`. |
| 226 | + |
| 227 | +## Known Limitations |
| 228 | + |
| 229 | +- OSF download currently reports "not yet implemented" - manual download required |
| 230 | +- Zenodo detection defaults to trying public download first; may fail for draft deposits requiring authentication |
| 231 | +- Only supports public Dataverse datasets (no authentication support) |
| 232 | +- Custom Dataverse instances must use standard API patterns |
| 233 | + |
| 234 | +## Future Enhancements |
| 235 | + |
| 236 | +Potential improvements: |
| 237 | +- Full OSF integration |
| 238 | +- Support for additional repositories (WorldBank, Box, etc.) |
| 239 | +- Better Zenodo draft vs. public detection |
| 240 | +- Parallel download support for multiple URLs |
| 241 | +- URL validation before attempting download |