---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# s3
<!-- badges: start -->
[CRAN](https://CRAN.R-project.org/package=s3)
[R-CMD-check](https://github.com/brokamp-group/s3/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
s3 is an R package designed to download files from [AWS S3](<https://aws.amazon.com/s3/>). Files are downloaded to the R user data directory (i.e., `tools::R_user_dir("s3", "data")`) so they can be cached across all of an R user's sessions and projects. Specify an alternative download location by setting the `R_USER_DATA_DIR` environment variable (see `?tools::R_user_dir`).
A file on AWS S3 is specified by its URI and downloaded with the `s3_get()` and `s3_get_files()` functions; e.g., `s3_get("s3://modis-aod-nasa/2020.05.22.tif")`. The get functions always (invisibly) return the paths to the downloaded files, making it straightforward to read them into R. Files already present in the download location are used instead of being downloaded again. As a result, a single concise call both downloads a file, if it is not already downloaded, and provides its path for reading within R.
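To see where files would be cached on a given machine, the data directory can be queried directly. The following is a minimal sketch using only base R; the temporary directory used as the override is an illustrative choice, not something the package requires.

```r
# Default location used for cached downloads on this machine
default_dir <- tools::R_user_dir("s3", "data")
default_dir

# Temporarily override the location for this session only
# (tempdir() is just an illustrative choice of directory)
old <- Sys.getenv("R_USER_DATA_DIR", unset = NA)
Sys.setenv(R_USER_DATA_DIR = tempdir())
override_dir <- tools::R_user_dir("s3", "data")
override_dir

# Restore the previous setting
if (is.na(old)) {
  Sys.unsetenv("R_USER_DATA_DIR")
} else {
  Sys.setenv(R_USER_DATA_DIR = old)
}
```

Note that `tools::R_user_dir()` appends `R/s3` to the directory given in `R_USER_DATA_DIR`, so the override points at a parent directory rather than the final download folder.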
## Installation
Install the latest CRAN release from within `R` with:
```r
install.packages("s3")
```
Install the development version from GitHub:
```r
# install.packages("remotes")
remotes::install_github("geomarker-io/s3")
```
## Usage
### Downloading Files
```{r}
library(s3)
```
Download a single file specified by its S3 URI with:
```{r, echo = FALSE}
Sys.unsetenv(c('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
Sys.setenv("R_USER_DATA_DIR" = tempdir())
```
```{r}
s3_get("s3://geomarker/testing_downloads/mtcars.rds")
```
If a file has already been downloaded, then it will not be re-downloaded:
```{r}
s3_get("s3://geomarker/testing_downloads/mtcars.rds")
```
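The caching behaviour can be pictured as a download-if-missing pattern. The sketch below is not the package's implementation; `fetch_if_missing()`, its `fetch` argument, and `fake_fetch()` are hypothetical names used only to illustrate the idea with base R.

```r
# Hypothetical helper: call `fetch(dest)` only when `dest` does not exist yet,
# and always return the path invisibly (mirroring how the get functions
# are described as returning paths).
fetch_if_missing <- function(dest, fetch) {
  if (!file.exists(dest)) {
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    fetch(dest)
  }
  invisible(dest)
}

# Stand-in for a real download: writes a small file and counts its calls
n_fetches <- 0
fake_fetch <- function(dest) {
  n_fetches <<- n_fetches + 1
  saveRDS(mtcars, dest)
}

dest <- file.path(tempfile("s3-sketch-"), "testing_downloads", "mtcars.rds")
fetch_if_missing(dest, fake_fetch) # "downloads" the file
fetch_if_missing(dest, fake_fetch) # file exists, so nothing is fetched
n_fetches
```

Returning the path in both branches is what lets calling code stay the same whether or not the file was already cached.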
Download multiple files with:
```{r}
s3_get_files(
  c(
    "s3://geomarker/testing_downloads/mtcars.rds",
    "s3://geomarker/testing_downloads/mtcars_again.rds"
  ),
  confirm = FALSE
)
```
### Private Files
Downloading private files requires the name of the S3 bucket's region (this is determined automatically when the file is public):
```{r, eval = FALSE}
s3_get("s3://geomarker/testing_downloads/mtcars_private.rds", region = "us-east-2")
```
#### Setting up AWS credentials
You must have the appropriate AWS S3 credentials set to gain access to non-public files. As with other AWS command line tools and R packages, you can use the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` to gain access to such files.
It is highly recommended to set up your environment variables outside of your R script to avoid including sensitive information within it. This can be done by exporting the environment variables before starting R (see the [AWS CLI documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html) on this) or by defining them in a `.Renviron` file (see `?.Renviron` within `R`).
You can use the internal helper function `check_for_aws_env_vars()` to check whether the AWS key environment variables are set:
```{r}
s3:::check_for_aws_env_vars()
```
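Because `:::` reaches into unexported internals, a script may prefer a check that relies only on base R. The following is a minimal sketch, not a reimplementation of the helper above; `aws_env_vars_set()` is a hypothetical name.

```r
# Hypothetical helper: report which AWS credential variables are set
# (non-empty) without ever printing their values.
aws_env_vars_set <- function(vars = c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")) {
  vapply(vars, function(v) nzchar(Sys.getenv(v)), logical(1))
}

status <- aws_env_vars_set()
status

# Name the missing variables, if any, so the problem is easy to fix
if (!all(status)) {
  message("Missing AWS variables: ", paste(names(status)[!status], collapse = ", "))
}
```

The result is a named logical vector, so the secret values themselves never appear in the console or in a rendered document.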
### Downloaded file paths
Files are saved within a directory structure matching that of the S3 URI. `s3_get()` and `s3_get_files()` both invisibly return the file path(s) of the downloaded files so that the files can be read in subsequent code. This lets users with different operating systems or different project file structures and locations use a downloaded S3 file without changing their source code:
```{r}
s3_get("s3://geomarker/testing_downloads/mtcars.rds") |>
  readRDS()
```
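To illustrate why the same code works across machines, the sketch below derives a local path from an S3 URI. It assumes that the bucket and key are simply appended beneath the data directory; the package's actual layout may differ, and `expected_local_path()` is a hypothetical name that is not part of the package.

```r
# Hypothetical helper: map an S3 URI to a path beneath a data directory.
# Assumption: bucket and key are appended as-is beneath `data_dir`.
expected_local_path <- function(uri, data_dir = tools::R_user_dir("s3", "data")) {
  stopifnot(startsWith(uri, "s3://"))
  bucket_and_key <- sub("^s3://", "", uri)
  file.path(data_dir, bucket_and_key)
}

# The prefix depends on the user and operating system; the suffix does not
expected_local_path("s3://geomarker/testing_downloads/mtcars.rds")
```

Under this assumption only the prefix varies between users, which is why code that relies on the returned path, rather than a hard-coded one, stays portable.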