Computational astronomy research investigating galaxy evolution, cosmic large-scale structure, and quasar physics using DESI and other modern spectroscopic surveys
This organization produces research in astronomy and data science, building analysis-ready datasets (ARDs) from large public data sources. The methodology was first validated on the Steam Dataset 2025, a multi-modal gaming analytics ARD that has seen strong engagement and downloads on both Kaggle and Zenodo, and is now being applied to spectroscopic data from DESI Data Release 1 (DR1).
Current work spans galaxy evolution in different cosmic environments, AGN feedback mechanisms, and ML-driven spectral analysis. The research runs on purpose-built infrastructure that enables reproducibility at scale, and the entire system functions as a skill multiplier across systems engineering, DevOps, security, and machine learning.
| Repository | Domain | Description | Status |
|---|---|---|---|
| proxmox-astronomy-lab | Infrastructure | Platform documentation, VM inventory, network architecture | Production |
| desi-cosmic-void-galaxies | Research | ARD factory + environmental quenching in cosmic voids | Active |
| desi-quasar-outflows | Research | AGN outflow spectral fitting and Cloudy modeling | Planned |
| desi-qso-anomaly-detection | Research | ML anomaly detection for quasar spectra | Planned |
| rbh1-validation-reanalysis | Research | Independent reanalysis of RBH-1 hypervelocity SMBH candidate | Active |
| year-of-code-2026 | Development | 2026 project sandbox: AI, ML, agentic coding, cloud infrastructure | Active |
| .github | Meta | Organization profile and templates | — |
proxmox-astronomy-lab is the infrastructure foundation for all research workloads. It documents the 7-node Proxmox cluster, VM inventory, network architecture, and automation patterns: the platform that enables reproducible, scalable research across all projects.
desi-cosmic-void-galaxies analyzes galaxy populations within cosmic voids using DESI Data Release 1 to investigate environmental quenching mechanisms. It also serves as the organization's Analysis-Ready Dataset (ARD) factory, joining nine Value-Added Catalogs (VACs) into enriched data products that feed downstream research.
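As a rough illustration of the environmental comparison this ARD enables, the sketch below computes quenched fractions for void and non-void galaxies from already-joined rows. The `Galaxy` fields (`in_void`, `log_ssfr`), the toy rows, and the quenching cut of log10(sSFR / yr^-1) < -11 (a common convention) are assumptions for illustration, not the project's actual schema or selection.

```python
from dataclasses import dataclass


@dataclass
class Galaxy:
    """One row of a hypothetical joined ARD table (field names are illustrative)."""
    targetid: int
    in_void: bool    # assumed flag derived from a DESIVAST void catalog
    log_ssfr: float  # assumed log10 specific SFR in yr^-1 from an SED-fitting VAC


# Assumed quenching cut: log10(sSFR / yr^-1) < -11 counts as quenched.
QUENCHED_LOG_SSFR = -11.0


def quenched_fraction(galaxies: list[Galaxy], in_void: bool) -> float:
    """Fraction of galaxies in the chosen environment below the sSFR cut."""
    sample = [g for g in galaxies if g.in_void == in_void]
    if not sample:
        raise ValueError("no galaxies in the requested environment")
    n_quenched = sum(1 for g in sample if g.log_ssfr < QUENCHED_LOG_SSFR)
    return n_quenched / len(sample)


if __name__ == "__main__":
    # Toy rows for illustration only, not DESI measurements.
    rows = [
        Galaxy(1, True, -9.8), Galaxy(2, True, -10.4), Galaxy(3, True, -11.6),
        Galaxy(4, False, -11.9), Galaxy(5, False, -12.3),
        Galaxy(6, False, -10.1), Galaxy(7, False, -11.2),
    ]
    print(f"void:     {quenched_fraction(rows, True):.2f}")   # void:     0.33
    print(f"non-void: {quenched_fraction(rows, False):.2f}")  # non-void: 0.75
```

A real comparison would also need matched stellar-mass and redshift distributions between environments, since the quenched fraction depends strongly on stellar mass.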
desi-quasar-outflows investigates AGN-driven outflows by combining semi-automated spectral fitting with Cloudy photoionization modeling. The goal is automated pipelines that identify and characterize outflows in massive spectroscopic datasets.
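A common non-parametric outflow diagnostic is W80, the velocity width enclosing the central 80% of an emission line's flux, often measured on [O III] λ5007. The sketch below computes W80 = v90 - v10 from a continuum-subtracted profile on a velocity grid, using the trapezoid rule for the cumulative flux and linear interpolation between grid points. It is an illustrative stand-in under those assumptions, not the project's fitting pipeline; the `w80` function and the toy Gaussian profile are hypothetical.

```python
import math


def w80(velocity: list[float], flux: list[float]) -> float:
    """Return W80 = v90 - v10 for a continuum-subtracted line profile.

    velocity: monotonically increasing grid in km/s
    flux: line flux density at each grid point (negative values clipped to zero)
    """
    if len(velocity) != len(flux) or len(velocity) < 2:
        raise ValueError("need matching velocity and flux arrays of length >= 2")
    f = [max(x, 0.0) for x in flux]
    # Cumulative flux via the trapezoid rule.
    cum = [0.0]
    for i in range(1, len(velocity)):
        dv = velocity[i] - velocity[i - 1]
        cum.append(cum[-1] + 0.5 * (f[i] + f[i - 1]) * dv)
    total = cum[-1]
    if total <= 0:
        raise ValueError("profile has no positive flux")

    def velocity_at(fraction: float) -> float:
        # Linearly interpolate the velocity where cumulative flux reaches fraction * total.
        target = fraction * total
        for i in range(1, len(cum)):
            if cum[i] >= target:
                span = cum[i] - cum[i - 1]
                t = 0.0 if span == 0 else (target - cum[i - 1]) / span
                return velocity[i - 1] + t * (velocity[i] - velocity[i - 1])
        return velocity[-1]

    return velocity_at(0.9) - velocity_at(0.1)


if __name__ == "__main__":
    # Toy single-Gaussian profile with sigma = 300 km/s on a 5 km/s grid.
    sigma = 300.0
    v = [-3000.0 + 5.0 * i for i in range(1201)]
    flux = [math.exp(-0.5 * (x / sigma) ** 2) for x in v]
    print(f"W80 = {w80(v, flux):.0f} km/s")  # ≈ 769 km/s (2.563 × 300)
```

For a single Gaussian, W80 ≈ 2.563σ, which makes this toy profile a convenient check; broad wings from an outflow push W80 above that value.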
desi-qso-anomaly-detection targets ML-based anomaly detection across millions of quasar spectra, implementing 1D convolutional variational autoencoders on Ray clusters to identify statistically unusual objects that may represent rare phenomena or new physics.
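For a variational autoencoder, one common per-object anomaly score is the negative evidence lower bound (ELBO): a reconstruction term plus the KL divergence of the encoder's diagonal-Gaussian posterior from a standard-normal prior. The sketch below scores objects from already-computed model outputs and flags those above a percentile cut. The function names, the unit-variance Gaussian decoder, the single-pass evaluation, and the percentile threshold are all assumptions; the project's actual model, loss weighting, and selection may differ.

```python
import math


def kl_to_standard_normal(mu: list[float], logvar: list[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def anomaly_score(spectrum, reconstruction, mu, logvar, beta: float = 1.0) -> float:
    """Negative-ELBO-style score, assuming a unit-variance Gaussian decoder.

    The reconstruction term is the Gaussian negative log-likelihood with
    constants dropped; beta weights the KL term (beta = 1 gives the plain ELBO).
    """
    recon = 0.5 * sum((x - y) ** 2 for x, y in zip(spectrum, reconstruction))
    return recon + beta * kl_to_standard_normal(mu, logvar)


def flag_anomalies(scores: list[float], percentile: float = 99.0) -> list[int]:
    """Indices whose score exceeds the nearest-rank percentile of all scores."""
    ranked = sorted(scores)
    k = max(0, math.ceil(percentile / 100.0 * len(ranked)) - 1)
    return [i for i, s in enumerate(scores) if s > ranked[k]]


if __name__ == "__main__":
    # Toy inputs, not real spectra or model outputs.
    scores = [
        anomaly_score([1.0, 2.0], [1.0, 2.0], [0.0], [0.0]),  # perfect fit, prior-like latent
        anomaly_score([1.0, 2.0], [0.0, 0.0], [0.0], [0.0]),  # poor reconstruction
        anomaly_score([1.0, 2.0], [1.0, 2.0], [2.0], [0.0]),  # latent far from prior
    ]
    print(scores)                                   # [0.0, 2.5, 2.0]
    print(flag_anomalies(scores, percentile=50.0))  # [1]
```

Reconstruction error alone is also widely used as a score; the KL term additionally highlights objects whose latent codes sit far from the bulk of the training distribution.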
rbh1-validation-reanalysis is an independent validation and reanalysis of the RBH-1 hypervelocity supermassive black hole (SMBH) candidate (van Dokkum et al. 2025) using Bayesian inference and GPU-accelerated computing.
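The reanalysis's actual likelihood and sampler are not described here. As a generic illustration of Bayesian parameter inference, the sketch below runs a random-walk Metropolis sampler for the mean of Gaussian measurements with known uncertainty and a flat prior. The data, step size, chain length, and burn-in are arbitrary assumptions; a real analysis would more likely use an established sampler (such as emcee) together with convergence diagnostics.

```python
import math
import random


def log_posterior(mu: float, data: list[float], sigma: float) -> float:
    """Flat prior on mu; independent Gaussian likelihood with known sigma (constants dropped)."""
    return -0.5 * sum((x - mu) ** 2 for x in data) / sigma**2


def metropolis(data, sigma, n_steps=20000, step=0.25, start=0.0, seed=42):
    """Random-walk Metropolis sampler for mu; returns the full chain."""
    rng = random.Random(seed)
    mu, logp = start, log_posterior(start, data, sigma)
    chain = []
    for _ in range(n_steps):
        proposal = mu + rng.gauss(0.0, step)
        logp_new = log_posterior(proposal, data, sigma)
        # Accept with probability min(1, p_new / p_old); min() avoids overflow in exp().
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            mu, logp = proposal, logp_new
        chain.append(mu)
    return chain


if __name__ == "__main__":
    data = [4.8, 5.3, 5.1, 4.9, 5.2, 4.7, 5.0, 5.4]  # toy measurements
    chain = metropolis(data, sigma=0.3)
    kept = chain[2000:]  # discard burn-in
    mean = sum(kept) / len(kept)
    print(f"posterior mean {mean:.2f} vs sample mean {sum(data) / len(data):.2f}")
```

With a flat prior and known σ the posterior is analytic (a normal with mean equal to the sample mean and standard deviation σ/√n), which makes this toy a convenient sanity check for a sampler before it is pointed at a real model.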
year-of-code-2026 is a 2026 project sandbox covering AI, ML, agentic coding, retrieval-augmented generation (RAG) systems, cloud infrastructure, and the occasional side project: a space for experimentation and skill development across the full technology stack.
Our research consumes DESI Data Release 1 Value-Added Catalogs (VACs), which are materialized in PostgreSQL and distributed as Parquet files.
| VAC | Purpose | Scale |
|---|---|---|
| FastSpecFit | Stellar continuum modeling, emission line fluxes | 6.4M galaxies |
| PROVABGS | Bayesian SED fitting, stellar mass, SFH | BGS sample |
| DESIVAST | Void classifications (4 algorithms) | ~10.7K voids |
| Gfinder | Group catalog, halo mass estimates | Group members |
| AGN/QSO | Systemic redshifts, BAL flags, spectral classification | 1.4M QSOs |
| CIV Absorber | Intervening CIV absorption systems | Absorber catalog |
| MgII Absorber | Intervening MgII absorption systems | Absorber catalog |
| QMassIron | Black hole masses, bolometric luminosity | QSO subset |
| Stellar Mass/EmLine | CIGALE stellar masses, emission line properties | Full sample |
PostgreSQL serves as the materialization engine where VAC joins and derived computations take place; final ARD products are exported to Parquet for distribution and analysis. The pipeline currently manages ~32 GB of catalog data in PostgreSQL and ~108 GB of spectral tiles in Parquet format.
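The join step can be sketched as a single SQL statement keyed on DESI's `TARGETID`. To stay self-contained, the example uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the tables and columns (`fastspec`, `void_membership`, `halpha_flux`, `in_void`) are hypothetical rather than the project's schema. Writing the result to Parquet (for example with `pyarrow.parquet.write_table`) is left out.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for two VACs keyed on TARGETID.
SCHEMA = """
CREATE TABLE fastspec (targetid INTEGER PRIMARY KEY, z REAL, halpha_flux REAL);
CREATE TABLE void_membership (targetid INTEGER PRIMARY KEY, in_void INTEGER);
"""

# LEFT JOIN keeps every spectroscopic row; galaxies without a void classification
# get NULL (None in Python) rather than being silently dropped or marked 0.
JOIN_SQL = """
SELECT f.targetid, f.z, f.halpha_flux, v.in_void
FROM fastspec AS f
LEFT JOIN void_membership AS v ON v.targetid = f.targetid
ORDER BY f.targetid
"""


def materialize(conn: sqlite3.Connection) -> list[tuple]:
    """Run the join and return rows ready to hand to a columnar (e.g. Parquet) writer."""
    return conn.execute(JOIN_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO fastspec VALUES (?, ?, ?)",
                     [(101, 0.12, 3.4), (102, 0.25, 1.1), (103, 0.31, 0.0)])
    conn.executemany("INSERT INTO void_membership VALUES (?, ?)", [(101, 1), (103, 0)])
    for row in materialize(conn):
        print(row)
    # (101, 0.12, 3.4, 1)
    # (102, 0.25, 1.1, None)
    # (103, 0.31, 0.0, 0)
```

In PostgreSQL the same `SELECT` could back a table or a materialized view that is refreshed when a VAC changes.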
The production research platform runs on a 7-node Proxmox cluster built from small-form-factor enterprise workstations, providing dedicated database servers, GPU compute, and Kubernetes orchestration for containerized workloads.
| Resource | Value |
|---|---|
| Nodes | 7 |
| Total Threads (logical CPUs) | 144 |
| Total RAM | 704 GB |
| Total NVMe | 26 TB |
| Network Fabric | 10G LACP per node |
| GPU | NVIDIA RTX A4000 (16 GB) |
| Node | CPU | Threads | RAM | Role |
|---|---|---|---|---|
| node01 | i9-12900H | 20 | 96 GB | Compute (K8s) |
| node02 | i5-12600H | 16 | 96 GB | Light compute + 6TB storage |
| node03 | i9-12900H | 20 | 96 GB | Compute (K8s) |
| node04 | i9-12900H | 20 | 96 GB | Compute (K8s) |
| node05 | i5-12600H | 16 | 96 GB | Light compute + 6TB storage |
| node06 | i9-13900H | 20 | 96 GB | Heavy compute (databases) |
| node07 | AMD 5950X | 32 | 128 GB | GPU compute |
Research workloads run on dedicated VMs with role-specific resource allocation.
| VM | IP | vCPU | RAM | Purpose |
|---|---|---|---|---|
| radio-k8s01 | 10.25.20.4 | 12 | 48G | Kubernetes primary node |
| radio-k8s02 | — | 12 | 48G | Kubernetes worker |
| radio-k8s03 | — | 12 | 48G | Kubernetes worker |
| radio-gpu01 | 10.25.20.10 | 12 | 48G | GPU worker (A4000) + K8s |
| radio-pgsql01 | 10.25.20.8 | 8 | 32G | Research PostgreSQL (pgvector, PostGIS) |
| radio-pgsql02 | 10.25.20.16 | 4 | 16G | Application PostgreSQL |
| radio-neo4j01 | 10.25.20.21 | 6 | 24G | Graph database |
| radio-fs02 | 10.25.20.15 | 4 | 6G | SMB file server (spectral data) |
| radio-agents01 | 10.25.20.20 | 8 | 32G | AI agents, monitoring stack |
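The cluster totals can be checked against the per-node rows, and the VM table shows how much of that capacity the listed research VMs reserve. The tally below copies the numbers from those two tables; the CPU figures are the per-node counts as listed (logical CPUs), and the script is only an illustration.

```python
# Per-node (logical CPUs, RAM in GB) and per-VM (vCPU, RAM in GB), copied from the tables.
NODES = {
    "node01": (20, 96), "node02": (16, 96), "node03": (20, 96), "node04": (20, 96),
    "node05": (16, 96), "node06": (20, 96), "node07": (32, 128),
}
VMS = {
    "radio-k8s01": (12, 48), "radio-k8s02": (12, 48), "radio-k8s03": (12, 48),
    "radio-gpu01": (12, 48), "radio-pgsql01": (8, 32), "radio-pgsql02": (4, 16),
    "radio-neo4j01": (6, 24), "radio-fs02": (4, 6), "radio-agents01": (8, 32),
}


def totals(table: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the (CPU, RAM) columns of an inventory table."""
    return sum(c for c, _ in table.values()), sum(r for _, r in table.values())


if __name__ == "__main__":
    host_cpu, host_ram = totals(NODES)
    vm_cpu, vm_ram = totals(VMS)
    print(f"cluster: {host_cpu} logical CPUs, {host_ram} GB RAM")  # 144, 704
    print(f"listed VMs: {vm_cpu} vCPU ({vm_cpu / host_cpu:.0%}), "
          f"{vm_ram} GB RAM ({vm_ram / host_ram:.0%})")           # 78 (54%), 302 GB (43%)
```

Only the VMs listed are counted; other guests and Proxmox host overhead are not included, so this is a reservation tally rather than a utilization figure.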
- Hybrid Kubernetes + VM Architecture: RKE2 orchestration for containerized workloads, with dedicated static VMs for databases and other persistent services
- Enterprise Security Baseline: CIS Controls implementation with research workflow accommodations
- Secure Remote Access: Entra ID hybrid identity with Cloudflare ZTNA
- Open Source Toolchain: GitOps automation, container orchestration, scientific computing workflows
This organization benefits from open source programs that provide tooling to qualifying public repositories.
| Program | Provides | Use Case |
|---|---|---|
| CodeRabbit | AI code review (Pro tier) | PR review, CLI integration with agentic coding tools |
| Atlassian | Jira, Confluence (Standard) | Project tracking, documentation |
| Program | Provides | Planned Use |
|---|---|---|
| Snyk | Security scanning | Dependency vulnerability detection |
| SonarCloud | Code quality | Static analysis |
| Sentry | Error tracking | Runtime monitoring |
| Datadog | Observability | Metrics, logs, APM |
We practice open science and open methodology — our version of "showing your work":
- Research methodologies are fully documented and repeatable
- Infrastructure configurations are version-controlled and automated
- Scripts and pipelines are published so others can learn, adapt, or improve them
- Learning processes are captured and shared for community benefit
Our hope is that these materials help someone facing similar challenges, or spark a collaboration that helps us in return. All projects are released under open source licenses (primarily MIT) so that others can freely reuse, verify, and reproduce the work.
- Documentation Hub: docs.radioastronomy.io
- GitHub Discussions: Technical discussions and collaboration
- Issue Tracking: Project-specific development milestones
Projects in this organization are licensed under MIT unless otherwise specified.
Computational astronomy research through open data, reproducible workflows, and enterprise infrastructure