Global Ozone Data Analysis and Modeling

This repository contains tools and scripts for analyzing global ozone datasets, converting data formats, performing exploratory data analysis (EDA), flexible-RAMP correction, and training ensemble models. Below is a summary of the key functions and their purposes.

Functions Summary

`evaluate_models.py`

collocate_data_xarray_optimized(toar_df_tagged, model_ds): Optimized collocation using xarray and vectorized operations.
create_collocated_multimodel_dataset(toar_data_path, year, model_sources): Creates a collocated dataset with observations and multiple model predictions.
calculate_performance_metrics(df_group): Computes performance metrics like RMSE, MAE, and correlation for grouped data.
run_analysis(model_sources, years_to_analyze): Executes the full analysis pipeline for specified years.
plot_metric_by_model(stats_df, metric, year, region, season): Bar plot comparing a metric across models for a given year, region, and season.
plot_faceted_comparison(stats_df, metric, year): Multi-panel plot comparing a metric across models, regions, and seasons.
plot_bias_vs_rmse(stats_df, year, region): Scatter plot showing the relationship between bias and RMSE.
plot_metric_heatmap(stats_df, metric, year): Heatmap of a metric across models and regions.
generate_single_model_report(model_name, year, collocated_df, all_stats): Generates a visual report for a single model in a given year.
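The collocation and metrics steps can be sketched with NumPy and pandas alone. This is a minimal illustration under stated assumptions, not the code in `evaluate_models.py`. It assumes a single 2-D (lat, lon) model field on ascending 1-D coordinate arrays and observations in a DataFrame with hypothetical `lat`, `lon`, `obs`, and `region` columns, and it ignores time matching and longitude wrap-around. `nearest_index`, `collocate`, and `performance_metrics` are hypothetical names. The nearest-cell lookup plays the same role as xarray's pointwise `sel(..., method="nearest")` selection, and bias is taken as model minus observation.

```python
import numpy as np
import pandas as pd


def nearest_index(grid: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Index of the nearest entry in an ascending 1-D grid for each value."""
    # Insertion points, clipped so both neighbours exist (grid needs >= 2 points).
    idx = np.clip(np.searchsorted(grid, values), 1, len(grid) - 1)
    left, right = grid[idx - 1], grid[idx]
    # Step back one cell wherever the left neighbour is strictly closer.
    return idx - ((values - left) < (right - values)).astype(int)


def collocate(obs: pd.DataFrame, field: np.ndarray,
              lats: np.ndarray, lons: np.ndarray) -> pd.DataFrame:
    """Attach the value of the nearest (lat, lon) grid cell to every observation."""
    i = nearest_index(lats, obs["lat"].to_numpy())
    j = nearest_index(lons, obs["lon"].to_numpy())
    out = obs.copy()
    out["model"] = field[i, j]  # vectorized fancy indexing, no per-row loop
    return out


def performance_metrics(g: pd.DataFrame) -> pd.Series:
    """Bias (model minus obs), RMSE, MAE and Pearson r for one group."""
    diff = g["model"] - g["obs"]
    return pd.Series({
        "n": len(g),
        "bias": diff.mean(),
        "rmse": np.sqrt((diff ** 2).mean()),
        "mae": diff.abs().mean(),
        "r": g["model"].corr(g["obs"]),
    })


# Illustrative usage with a tiny 3 x 2 grid and four observations.
lats = np.array([0.0, 1.0, 2.0])
lons = np.array([10.0, 11.0])
field = np.array([[40.0, 42.0], [45.0, 47.0], [50.0, 52.0]])
obs = pd.DataFrame({
    "region": ["A", "A", "B", "B"],
    "lat": [0.1, 0.9, 1.6, 2.4],
    "lon": [10.2, 10.8, 10.1, 11.3],
    "obs": [41.0, 46.0, 49.0, 53.0],
})
collocated = collocate(obs, field, lats, lons)
stats = collocated.groupby("region")[["model", "obs"]].apply(performance_metrics)
```

Grouping by more keys (for example a hypothetical `season` column alongside `region`) gives one row of metrics per region and season.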

`preprocess.py`

print_ozone_file_info(source, year, month): Prints basic information about an ozone data file.
analyze_yearly_ozone_dataset(source, year): Performs exploratory data analysis on a yearly ozone dataset.
analyze_ozone_data(source, years): Conducts multi-year ozone data analysis using xarray.
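A yearly EDA pass usually begins with coverage and summary statistics. The sketch below assumes a hypothetical long-format table with `station`, `time`, and `o3` columns and uses pandas rather than xarray. It is not the implementation of `analyze_yearly_ozone_dataset`, only an illustration of the kind of summary such a function might report; `summarize_yearly_ozone` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def summarize_yearly_ozone(df: pd.DataFrame, value_col: str = "o3") -> dict:
    """Coverage and summary statistics for one year of long-format ozone records."""
    times = pd.to_datetime(df["time"])
    values = df[value_col]
    return {
        "n_records": len(df),
        "n_stations": df["station"].nunique(),
        "start": times.min(),
        "end": times.max(),
        "missing_frac": float(values.isna().mean()),
        "mean": float(values.mean()),  # NaNs are skipped
        # Mean of the non-missing values in each calendar month.
        "monthly_mean": values.groupby(times.dt.month).mean().to_dict(),
    }


# Illustrative usage with four records from two stations.
df = pd.DataFrame({
    "station": ["S1", "S1", "S2", "S2"],
    "time": ["2018-01-15", "2018-02-15", "2018-01-15", "2018-02-15"],
    "o3": [30.0, np.nan, 40.0, 50.0],
})
summary = summarize_yearly_ozone(df)
```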

`csv_to_netcdf_converter.py`

convert_all_sources_direct(output_dir, selected_sources): Converts all available ozone datasets from yearly CSVs to combined NetCDF files.
convert_source_to_netcdf_direct(source, output_dir): Converts a single source's yearly CSV files to a combined NetCDF file.
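Converting yearly CSVs into one NetCDF file amounts to concatenating the tables and writing them with a (station, time) index. A minimal sketch, assuming hypothetical `station` and `time` columns and one CSV per year; the function names are hypothetical and the real converter may lay out dimensions differently. Duplicate (station, time) rows are dropped because `xarray.Dataset.from_dataframe` rejects a non-unique MultiIndex.

```python
from pathlib import Path

import pandas as pd


def combine_frames(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate yearly tables, drop duplicate (station, time) rows, and sort."""
    df = pd.concat(frames, ignore_index=True)
    # xarray cannot build a Dataset from a non-unique (station, time) index, so
    # keep only the first occurrence of any repeated record (an assumed policy).
    df = df.drop_duplicates(subset=["station", "time"], keep="first")
    return df.sort_values(["station", "time"]).reset_index(drop=True)


def combine_yearly_csvs(paths: list[Path]) -> pd.DataFrame:
    """Read each yearly CSV (hypothetical 'station' and 'time' columns) and combine."""
    return combine_frames([pd.read_csv(p, parse_dates=["time"]) for p in sorted(paths)])


def write_netcdf(df: pd.DataFrame, out_path: Path) -> None:
    """Write the combined table as a NetCDF file with station and time dimensions.

    Needs xarray plus a NetCDF backend such as netCDF4.
    """
    import xarray as xr  # imported here so the pandas steps work without xarray

    ds = xr.Dataset.from_dataframe(df.set_index(["station", "time"]))
    ds.to_netcdf(out_path)
```

For example, `write_netcdf(combine_yearly_csvs(list(Path("data/SOURCE").glob("*.csv"))), Path("SOURCE.nc"))`, where both paths are placeholders.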

`ramp_correction_parallel_v3.py`

This script applies a ramp correction to the dataset, identifying and correcting linear instrumental drift or other artifacts in the time series. It uses parallel processing to handle large volumes of data efficiently, which shortens the preprocessing workflow.

Key Features:

Uses multiprocessing to correct data in parallel.
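The parallel pattern described above can be sketched with `concurrent.futures`. The sketch follows this README's description of the correction as removing linear drift; `remove_linear_drift` and `correct_all` are hypothetical stand-ins, not the algorithm in `ramp_correction_parallel_v3.py`. Each station's series is corrected independently, so the work maps cleanly onto a process pool.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Optional

import numpy as np


def remove_linear_drift(values: np.ndarray) -> np.ndarray:
    """Subtract a least-squares linear trend from one series, preserving its mean.

    Hypothetical stand-in for the real per-series correction. NaNs are excluded
    from the fit and stay NaN in the output.
    """
    values = np.asarray(values, dtype=float)
    t = np.arange(len(values), dtype=float)
    ok = ~np.isnan(values)
    if ok.sum() < 2:
        return values.copy()
    slope, _intercept = np.polyfit(t[ok], values[ok], 1)
    # Centre time on the mean of the valid samples so the removed ramp has zero mean there.
    return values - slope * (t - t[ok].mean())


def correct_all(series: dict[str, np.ndarray], max_workers: Optional[int] = None,
                use_processes: bool = True) -> dict[str, np.ndarray]:
    """Correct every station's series in parallel, returning results keyed by station."""
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    keys = list(series)
    with pool_cls(max_workers=max_workers) as pool:
        corrected = list(pool.map(remove_linear_drift, [series[k] for k in keys]))
    return dict(zip(keys, corrected))
```

With `use_processes=True`, call `correct_all` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that use the spawn start method.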

`ozone_stacked_ensemble.py`

train_stacked_ensemble(df, feature_columns, aux_numeric_cols, output_dir, model_label, debug_rows): Trains a stacked ensemble model and saves artifacts.
predict_with_ensemble(model_path, df_new, feature_columns, aux_numeric_cols): Loads a fitted ensemble and returns predictions for new data.
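Stacking fits several base learners, trains a meta-learner on their out-of-fold predictions, and then refits the base learners on all the data. The NumPy-only sketch below illustrates that technique with two hypothetical base learners (least squares and k-nearest neighbours) and a linear meta-learner. The base learners, meta-learner, and artifact handling in `train_stacked_ensemble` are not documented here and may differ; `fit_stack` and `predict_stack` are hypothetical names. scikit-learn's `StackingRegressor` implements the same pattern.

```python
import numpy as np


class LeastSquares:
    """Ordinary least squares with an intercept term."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_


class KNearest:
    """k-nearest-neighbour regression with Euclidean distance and uniform weights."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X_, self.y_ = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        sq_dist = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=-1)
        nearest = np.argsort(sq_dist, axis=1)[:, : self.k]
        return self.y_[nearest].mean(axis=1)


def fit_stack(X, y, base_factories, n_folds=5, seed=0):
    """Fit a linear meta-learner on out-of-fold base predictions, then refit the bases."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.empty((len(X), len(base_factories)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        for m, make in enumerate(base_factories):
            model = make().fit(X[train], y[train])
            oof[held_out, m] = model.predict(X[held_out])
    meta = LeastSquares().fit(oof, y)
    # Refit every base learner on the full training set for prediction time.
    bases = [make().fit(X, y) for make in base_factories]
    return bases, meta


def predict_stack(bases, meta, X):
    """Feed base-learner predictions for new rows into the meta-learner."""
    level1 = np.column_stack([b.predict(X) for b in bases])
    return meta.predict(level1)


# Illustrative usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 0.05 * rng.normal(size=200)
bases, meta = fit_stack(X[:150], y[:150], [LeastSquares, lambda: KNearest(k=5)])
pred = predict_stack(bases, meta, X[150:])
rmse = float(np.sqrt(np.mean((pred - y[150:]) ** 2)))
```

Training the meta-learner on out-of-fold rather than in-sample predictions keeps it from over-trusting base learners that overfit, such as k-NN, whose in-sample predictions include each point's own target.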

Usage

Refer to the individual scripts for detailed usage instructions and examples. The repository handles both CSV and NetCDF/xarray formats.

Repository Files

README.md
combine_files.sh
combine_m3fusion_data.py
compare_matlab_v_python_files.py
compare_serial_parallel_ramp.py
correlation_analysis.py
csv_to_netcdf_converter.py
csv_to_netcdf_converter_new.py
evaluate_models.py
evaluate_ramp_correction.py
gpu_stress.py
m3fusion_eda.py
my_serial_job.sh
njml_preprocess.py
ozone_stacked_ensemble.py
preprocess.py
python_longleaf.sh
ramp_analysis.py
ramp_analysis_parallel.py
ramp_correction.py
ramp_correction_parallel.py
ramp_correction_parallel_v3.py
ramp_correction_v2.py
ramp_correction_v3.py
ramp_evaluate.sh
reformat_satellite_data.py
run_m3fusion_eda.sh
run_ramp.sh
run_ramp_parallel.sh
run_single_ramp.sh
submit_multiple_ramp_jobs.sh
ukml_preprocess.py
