diff --git a/GNN/RESULTS_SUMMARY.md b/GNN/RESULTS_SUMMARY.md
index 6653387..791a66a 100644
--- a/GNN/RESULTS_SUMMARY.md
+++ b/GNN/RESULTS_SUMMARY.md
@@ -24,7 +24,7 @@ The method produces spatially coherent hotspot regions that capture all high-sev
 
 ---
 
-## Approach: Hybrid Spatial Modeling + Clustering
+### Performance Winner: Hybrid GNN Modeling + Clustering
 
 This approach integrates multiple complementary components.
 
diff --git a/GNN/dashboard.html b/GNN/dashboard.html
index 7c906d9..6683f56 100644
--- a/GNN/dashboard.html
+++ b/GNN/dashboard.html
@@ -370,14 +370,14 @@
Graph Neural Network-based Spatial Hotspot Identification
+Graph Neural Network (GAT) Analysis & Hotspot Identification
-This dashboard presents an analysis of accessibility hotspot detection using Graph Neural Networks (GNN). The GAT-based model identifies high-risk accessibility hotspots by combining spatial modeling with graph-based learning. Evaluation is performed on a held-out test set (70/15/15 train/val/test split).
+This dashboard presents an analysis of accessibility hotspot detection using Graph Neural Networks
+ (GNN). The GAT-based model identifies high-risk accessibility hotspots by combining spatial modeling
+ with graph-based learning. Evaluation is performed on a held-out test set (70/15/15 train/val/test
+ split).
- Hypothesis: Accessibility issues exhibit spatial dependencies. Nearby issues may share common causes (shared infrastructure, neighborhood characteristics, urban planning patterns). Treating locations as independent points (as in KDE, thresholding) may miss these contextual relationships. A graph-based approach explicitly models spatial neighborhoods, allowing the model to learn representations that encode these dependencies. However, this requires careful graph construction (KNN k=15 chosen empirically) and comes with computational overhead. The low Jaccard similarity (0.01-0.05) with baselines suggests the GNN identifies different patterns, though this could also reflect parameter sensitivity rather than capturing true underlying structure.
+
+ Hypothesis: Accessibility issues exhibit spatial dependencies. Nearby issues may
+ share common causes (shared infrastructure, neighborhood characteristics, urban planning patterns).
+ Treating locations as independent points (as in KDE, thresholding) may miss these contextual
+ relationships. A graph-based approach explicitly models spatial neighborhoods, allowing the model to
+ learn representations that encode these dependencies. However, this requires careful graph
+ construction (KNN k=15 chosen empirically) and comes with computational overhead. The low Jaccard
+ similarity (0.01-0.05) with baselines suggests the GNN identifies different patterns, though this
+ could also reflect parameter sensitivity rather than capturing true underlying structure.
-Accessibility barriers may exhibit spatial dependencies. Nearby issues may share common causes (infrastructure age, neighborhood planning, terrain). Classical spatial statistics (KDE, Getis-Ord Gi*) treat each location independently, which may miss these contextual patterns. Graph structure explicitly models spatial neighborhoods, making these relationships learnable. However, the graph construction itself (KNN k=15, distance thresholds) introduces assumptions that affect the learned representations. Alternative graph constructions (radius-based, adaptive k) were not exhaustively explored due to 24-hour datathon constraints.
+Accessibility barriers may exhibit spatial dependencies. Nearby issues may share common causes
+ (infrastructure age, neighborhood planning, terrain). Classical spatial statistics (KDE, Getis-Ord
+ Gi*) summarize each neighborhood with a fixed kernel or fixed spatial weights rather than learned
+ relationships, which may miss these contextual patterns. Graph structure
+ explicitly models spatial neighborhoods, making these relationships learnable. However, the graph
+ construction itself (KNN k=15, distance thresholds) introduces assumptions that affect the learned
+ representations. Alternative graph constructions (radius-based, adaptive k) were not exhaustively
+ explored due to 24-hour datathon constraints.
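As an illustration of the graph construction mentioned above, here is a minimal brute-force k-nearest-neighbor sketch in plain Python. The function name `knn_edges` and the toy coordinates are hypothetical; the dashboard's actual graph-building code is not shown in this diff and may use a spatial index or a library routine instead.

```python
import math

def knn_edges(points, k):
    """Return directed edges (i, j) linking each point to its k nearest neighbors."""
    edges = []
    for i, p in enumerate(points):
        # Brute-force distances from point i to every other point (O(n^2) overall).
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.extend((i, j) for _, j in dists[:k])
    return edges

# Toy coordinates (hypothetical). The text above reports k=15 on the real data.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (0.5, 0.5)]
edges = knn_edges(points, k=2)
print(edges)
```

Because every node gets exactly k outgoing edges regardless of local density, isolated points are still connected to distant neighbors, which is one way the choice of k can shape the learned representations.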
+
-Explore barrier locations, grid cells, and detected hotspots across the study area. Use the layer controls above to switch views.
+Explore barrier locations, grid cells, and detected hotspots across the study area. Use the layer
+ controls above to switch views.
@@ -522,16 +551,29 @@ Comparison of the GNN approach against classical spatial statistics and clustering baselines.
- Jaccard vs. Moran's I Tradeoff: Methods that achieve high spatial coherence (like Getis-Ord Gi* with Moran's I = 0.75) identify very few hotspots (17), while methods that find more hotspots often sacrifice spatial coherence. The GNN approach identifies 255 hotspots while maintaining moderate spatial coherence (0.24), suggesting a balance between coverage and spatial coherence. However, this balance depends on DBSCAN clustering parameters (eps, min_samples) and risk score thresholds, which were selected empirically. The tradeoff is not necessarily optimal. It reflects our parameter choices rather than an inherent property of the GNN approach.
+
+ Jaccard vs. Moran's I Tradeoff: Methods that achieve high spatial coherence (like
+ Getis-Ord Gi* with Moran's I = 0.75) identify very few hotspots (17), while methods that find more
+ hotspots often sacrifice spatial coherence. The GNN approach identifies 255 hotspots while
+ maintaining moderate spatial coherence (0.24), suggesting a balance between coverage and spatial
+ coherence. However, this balance depends on DBSCAN clustering parameters (eps, min_samples) and risk
+ score thresholds, which were selected empirically. The tradeoff is not necessarily optimal. It
+ reflects our parameter choices rather than an inherent property of the GNN approach.
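For readers unfamiliar with the spatial coherence score quoted above, the following sketch computes global Moran's I from its standard definition. The function name `morans_i`, the binary adjacency weights, and the four-cell toy layout are assumptions for illustration; the weighting scheme behind the reported values (0.75, 0.24) is not stated in this diff.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / w_total) * (cross / denom)

# Four cells in a row, each adjacent to its immediate neighbors (hypothetical layout).
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1, 1, 0, 0]    # hotspot flags grouped together
alternating = [1, 0, 1, 0]  # hotspot flags scattered

print(morans_i(clustered, weights))
print(morans_i(alternating, weights))
```

Grouped flags give a positive score and scattered flags a negative one, which is why a method that returns many dispersed hotspots tends to score lower on this metric.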
-The low Jaccard similarity (0.01-0.05) between GNN and all baselines suggests the GNN identifies different spatial patterns than classical methods. This may reflect learned spatial-contextual relationships, though it could also indicate parameter sensitivity or overfitting. The patterns identified include:
+The low Jaccard similarity (0.01-0.05) between GNN and all baselines
+ suggests the GNN identifies different spatial patterns than classical methods. This may reflect
+ learned spatial-contextual relationships, though it could also indicate parameter sensitivity or
+ overfitting. The patterns identified include:
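The Jaccard similarity referenced above is the size of the intersection of two hotspot sets divided by the size of their union. A minimal sketch follows, assuming hotspots are represented as sets of grid-cell identifiers; that representation and the cell ids below are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of cell ids."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

# Hypothetical hotspot cell ids from two methods.
gnn_cells = {1, 2, 3}
baseline_cells = {3, 4, 5}
print(jaccard(gnn_cells, baseline_cells))  # 1 shared cell out of 5 distinct cells
```

A value in the 0.01-0.05 range therefore means that only a few percent of the cells flagged by either method are flagged by both.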
+
Graph Attention Network (GAT) with contrastive learning for spatial hotspot detection.
- In one sentence: Contrastive learning teaches the model that nearby locations with similar accessibility issues should have similar embeddings, while distant or dissimilar locations should have different embeddings, without requiring explicit labels about which locations are "hotspots."
+
+ In one sentence: Contrastive learning teaches the model that nearby locations with
+ similar accessibility issues should have similar embeddings, while distant or dissimilar locations
+ should have different embeddings, without requiring explicit labels about which locations are
+ "hotspots."
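To make the idea concrete, here is a sketch of one common formulation, a margin-based pairwise contrastive loss. This diff does not say which contrastive objective the model actually uses, so the loss form, the `margin` parameter, and the function name are all assumptions.

```python
import math

def contrastive_loss(z_a, z_b, similar, margin=1.0):
    """Pairwise contrastive loss on two embedding vectors.

    Similar pairs are penalized by their squared distance (pulled together).
    Dissimilar pairs are penalized only while closer than `margin` (pushed apart).
    """
    d = math.dist(z_a, z_b)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Hypothetical 2-D embeddings.
print(contrastive_loss((0.0, 0.0), (3.0, 4.0), similar=True))   # far apart but similar: large loss
print(contrastive_loss((0.0, 0.0), (0.5, 0.0), similar=False))  # too close but dissimilar: small loss
print(contrastive_loss((0.0, 0.0), (3.0, 4.0), similar=False))  # already beyond the margin: no loss
```

The "similar" flag would come from a pairing rule such as spatial proximity plus matching issue attributes, so no hotspot labels are needed to train the embeddings.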
-Limitations: (1) Hotspot boundaries depend on DBSCAN clustering parameters (eps, min_samples) selected empirically. While we achieve 100% coverage of high-severity issues on the test set, the exact spatial extent of hotspots may vary with different parameters. (2) Graph construction (KNN k=15) was chosen empirically; alternative constructions (radius-based, adaptive k) were not exhaustively explored due to 24-hour datathon constraints. (3) The model was trained on a single train/test split; cross-validation would provide more robust performance estimates but was not feasible within the time constraint. (4) The dataset's crowdsourced nature may introduce reporting biases (e.g., certain neighborhoods may be over-reported) that affect generalizability.
+Limitations: (1) Hotspot boundaries depend on DBSCAN clustering parameters (eps,
+ min_samples) selected empirically. While we achieve 100% coverage of high-severity issues on the
+ test set, the exact spatial extent of hotspots may vary with different parameters. (2) Graph
+ construction (KNN k=15) was chosen empirically; alternative constructions (radius-based,
+ adaptive k) were not exhaustively explored due to 24-hour datathon constraints. (3) The model
+ was trained on a single train/test split; cross-validation would provide more robust performance
+ estimates but was not feasible within the time constraint. (4) The dataset's crowdsourced nature
+ may introduce reporting biases (e.g., certain neighborhoods may be over-reported) that affect
+ generalizability.
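Limitation (1) can be seen on a toy example. The sketch below is a small pure-Python DBSCAN written for illustration only (the project presumably uses a library implementation, which this diff does not show); the points and parameter values are hypothetical.

```python
import math

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Neighborhoods include the point itself, following the usual DBSCAN convention.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (20, 20)]
print(dbscan(points, eps=1.5, min_samples=2))  # two clusters plus one noise point
print(dbscan(points, eps=1.5, min_samples=3))  # the smaller group becomes noise
print(dbscan(points, eps=8.0, min_samples=2))  # both groups merge into one cluster
```

The same six points yield two hotspots, one hotspot, or one merged hotspot depending only on `eps` and `min_samples`, which is the sensitivity the limitation describes.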
+
-Evaluation of model performance on held-out test set (15% of data, 70/15/15 train/val/test split). Metrics computed on test set only to avoid overfitting concerns.
+Evaluation of model performance on the held-out test set (15% of data, 70/15/15 train/val/test split).
+ Metrics are computed on the test set only, so they are not inflated by fitting to the training or
+ validation data.
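A 70/15/15 split can be sketched as a seeded shuffle of sample indices, as below. The diff does not say how the project's split was drawn, so the purely random assignment, the function name, and the seed are assumptions; with spatial data, a random split can place neighboring points in different partitions, so a spatially blocked split may be worth considering.

```python
import random

def split_indices(n, train=0.70, val=0.15, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/val/test partitions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = round(n * train)
    n_val = round(n * val)
    # Whatever remains after train and val becomes the test partition.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```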
+