openAIScientist is an R package that generates comprehensive scientific analysis using OpenAI's API. The analysis is output in markdown format, making it easy to integrate into various documentation workflows.
openAIScientist was designed to provide users with a quick overview and analysis of their datasets, helping them to understand and interpret their data more efficiently.
- Installation
- Other Dependencies
- Usage
- Setting Up Your API Key
- Documentation
- Disclaimer
- Contributing
- License
First, install the package along with its dependencies if you haven't already:
library(devtools)
install_github("noluyorAbi/openaAIScientist")
The openAIScientist package relies on the following R packages, which will be installed automatically:
- httr: For HTTP requests.
- utils: For utility functions like capturing output.
- readr: For reading and writing data.
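To see which of these packages are already present before installing, a quick check along the following lines can help. This is a minimal sketch using base R's `requireNamespace`; the names `pkgs`, `installed`, and `missing_pkgs` are illustrative and not part of openAIScientist.

```r
# Packages named in this README: devtools for installation, plus the
# runtime dependencies httr, utils, and readr.
pkgs <- c("devtools", "httr", "utils", "readr")

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything missing; install.packages() is left commented out so the
# check itself never changes your library.
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)
} else {
  message("All packages are available.")
}
```

Since the dependencies are installed automatically, this check is optional; it is mainly useful for diagnosing a failed installation.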
To use the openAIScientist package, follow these steps:
- Load the package.
- Load your API key from .Renviron.
- Use the openAIScientist_generate_scientific_analysis or openAIScientist_generate_visualization_rmd function to generate the analysis.
Parameters of openAIScientist_generate_scientific_analysis:
- data (mandatory): A data frame containing the dataset to analyze.
- api_key (mandatory): Your OpenAI API key as a string.
- output_name (optional): The name of the output markdown file (default is "Analysis").
- additional_prompt (optional): Additional instructions for the OpenAI API.
Parameters of openAIScientist_generate_visualization_rmd:
- data (mandatory): A data frame containing the dataset to analyze.
- api_key (mandatory): Your OpenAI API key as a string.
- output_name (optional): The name of the output RMarkdown file (default is "Visualization").
- additional_prompt (optional): Additional instructions for the OpenAI API.
# Load the package
library(openAIScientist)
# Load environment variables from the .Renviron file
readRenviron(".Renviron")
# Example data
data <- data.frame(
var1 = rnorm(100),
var2 = rnorm(100),
outcome = sample(c(0, 1), 100, replace = TRUE)
)
# Retrieve the API key from environment variables
api_key <- Sys.getenv("OPENAI_API_KEY")
# Generate scientific analysis
analysis <- openAIScientist_generate_scientific_analysis(data, api_key, "Analysis")
# Generate scientific analysis with additional prompt
analysis <- openAIScientist_generate_scientific_analysis(data, api_key, "Analysis-ADDITIONAL-PROMPT","Write the analysis in German")
# Generate visualization RMarkdown
visualization <- openAIScientist_generate_visualization_rmd(data, api_key, "Visualization")
# Generate visualization RMarkdown with additional prompt
visualization <- openAIScientist_generate_visualization_rmd(data, api_key, "Visualization-ADDITIONAL-PROMPT", "make the visualizations for red-green colorblind")
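The calls above pass arguments by position. The same call can be sketched with named arguments, which makes the optional parameters easier to read. This assumes the function's formal argument names match the parameter names listed in this README (`data`, `api_key`, `output_name`, `additional_prompt`); the prompt text, the output name, and the `can_run` guard are illustrative.

```r
# Small example data frame, mirroring the usage example.
df <- data.frame(
  var1 = rnorm(20),
  var2 = rnorm(20),
  outcome = sample(c(0, 1), 20, replace = TRUE)
)

# Arguments spelled out by name (assumed to match the documented parameters).
args <- list(
  data = df,
  api_key = Sys.getenv("OPENAI_API_KEY"),
  output_name = "Analysis-Named",
  additional_prompt = "Keep the analysis under 300 words"
)

# Only call the API when the package is installed and a key is set,
# so the sketch is safe to run anywhere.
can_run <- requireNamespace("openAIScientist", quietly = TRUE) &&
  nzchar(args$api_key)

if (can_run) {
  analysis <- do.call(
    openAIScientist::openAIScientist_generate_scientific_analysis,
    args
  )
} else {
  message("Skipping the API call: package not installed or OPENAI_API_KEY not set.")
}
```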
To securely store and load your OpenAI API key, you should use the .Renviron file. This file allows you to set environment variables that R can access.
- Create/Edit .Renviron File:
  - Open your .Renviron file. If it doesn't exist, create it in your home directory or in the root of your project folder.
  - Add your OpenAI API key in the following format:
    OPENAI_API_KEY=your_openai_api_key
- Save and Reload Environment Variables:
  - Save the .Renviron file.
  - In R, use the following command to reload the environment variables:
    readRenviron("~/.Renviron")
- Access the API Key in Your R Script:
  - Retrieve the API key using Sys.getenv as shown in the usage example above.
    api_key <- Sys.getenv("OPENAI_API_KEY")
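The mechanism behind these steps can be tried out without touching your real key. The following is a minimal sketch that writes a throwaway file in `.Renviron` format, loads it with `readRenviron`, and reads the value back with `Sys.getenv`. The variable name `DEMO_OPENAI_KEY` and its value are made up for the demonstration.

```r
# Write a throwaway .Renviron-style file. DEMO_OPENAI_KEY is a made-up name,
# so the real OPENAI_API_KEY is never touched.
env_file <- tempfile(fileext = ".Renviron")
writeLines("DEMO_OPENAI_KEY=sk-demo-123", env_file)

# readRenviron() sets the variables for the current session and
# (invisibly) returns TRUE when the file was read successfully.
loaded <- readRenviron(env_file)

# Sys.getenv() now sees the value from the file.
demo_key <- Sys.getenv("DEMO_OPENAI_KEY")
print(demo_key)

# Clean up the demo variable and the temporary file.
Sys.unsetenv("DEMO_OPENAI_KEY")
unlink(env_file)
```

A `.Renviron` file in your home directory is read automatically when R starts, so the explicit `readRenviron` call is only needed after editing the file in a running session or when the file lives elsewhere.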
While you can paste your API key directly as an argument to the openAIScientist_generate_scientific_analysis function, this is bad practice and results in “smelly” code: the key ends up in your source files, where it can leak through version control or shared scripts. Using environment variables via .Renviron is more secure and cleaner.
# Directly pasting the API key as an argument (not recommended)
analysis <- openAIScientist_generate_scientific_analysis(data, "your_openai_api_key", "Analysis")
Using environment variables as demonstrated in the previous examples is the recommended approach.
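One way to make the environment-variable approach more robust is to fail early when the key is missing, rather than sending an empty string to the API. The helper below is a hypothetical sketch; `require_api_key` is not part of openAIScientist.

```r
# Hypothetical helper: read a key from the environment and stop with a clear
# message if it is unset or empty.
require_api_key <- function(var = "OPENAI_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop(
      "Environment variable ", var, " is not set; add it to .Renviron.",
      call. = FALSE
    )
  }
  key
}

# Usage:
# api_key <- require_api_key()
```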
For detailed documentation, please refer to the function documentation generated by Roxygen2. You can access the documentation within R:
?openAIScientist_generate_scientific_analysis
?openAIScientist_generate_visualization_rmd
The analysis is created with GPT-4. Its output can still contain inaccuracies and formatting issues, since model responses vary from run to run. If the formatting is off, try rerunning the analysis on the dataset.
If you find any issues or have suggestions for improvements, please create an issue or a pull request on GitHub.
This package is licensed under the GPL-3 License.
Made with ♥ by noluyorAbi for FortStaSoft @ LMU Munich, July 2024.