diff --git a/rSHUD b/rSHUD index 4e6afef..badfd44 160000 --- a/rSHUD +++ b/rSHUD @@ -1 +1 @@ -Subproject commit 4e6afefb7719cfe6d62da1bf30e952709d650ebf +Subproject commit badfd445686b75709633ec82835d4918494b05b9 diff --git a/testdata/mock_river/BUG_REPORT.md b/testdata/mock_river/BUG_REPORT.md new file mode 100644 index 0000000..74d61ce --- /dev/null +++ b/testdata/mock_river/BUG_REPORT.md @@ -0,0 +1,698 @@ +# rSHUD `shud.river()` FromToNode 索引错配 Bug 报告 + +> 报告日期: 2026-02-10(最后更新: 2026-02-12) +> 分支: `fix/river-index-mismatch` +> 测试环境: R 4.5.0, sf, terra, sp, raster (无 rgeos/rgdal) + +--- + +## 0. 版本与文件指向 + +> **路径约定**:下表使用绝对路径(本机复现用)。相对于 SHUD-NC 仓库根目录 `$PROJ = /Users/danker/Desktop/Hydro-SHUD/SHUD-NC`,旧版 rSHUD 位于 `$PROJ/../rSHUD`。 + +### 0.1 旧版本 rSHUD(bug 发现源) + +| 项目 | 路径 | +|------|------| +| 旧版 rSHUD 仓库 | `/Users/danker/Desktop/Hydro-SHUD/rSHUD` | +| `shud.river()` | `/Users/danker/Desktop/Hydro-SHUD/rSHUD/R/River.R` | +| `FromToNode()` (git HEAD, rgeos) | `/Users/danker/Desktop/Hydro-SHUD/rSHUD/R/GIS_RiverProcess.R:207-221` | +| `FromToNode()` (工作区未提交, sf) | 同上文件,`rgeos::gSimplify` → `sf::st_simplify` 的本地修改 | +| `extractCoords()` | `/Users/danker/Desktop/Hydro-SHUD/rSHUD/R/Func_Misc.R` | + +### 0.2 新版本 rSHUD(SHUD-NC 子模块) + +| 项目 | 路径 | +|------|------| +| 新版 rSHUD 子模块 | `/Users/danker/Desktop/Hydro-SHUD/SHUD-NC/rSHUD` | +| `shud.river()` | `/Users/danker/Desktop/Hydro-SHUD/SHUD-NC/rSHUD/R/River.R` | +| `FromToNode()` | `/Users/danker/Desktop/Hydro-SHUD/SHUD-NC/rSHUD/R/GIS_RiverProcess.R` | + +两个版本的 `shud.river()` 代码完全相同,bug 相同。 + +### 0.3 QHH Baseline 生产数据(旧版 rSHUD 产出) + +| 项目 | 路径 | 说明 | +|------|------|------| +| 生产输出 `.sp.riv` | `/Users/danker/Desktop/Hydro-SHUD/SHUD-NC/runs/qhh/baseline/input/qhh/qhh.sp.riv` | 2022-06-08 生成,1633 条河段 | +| 真实河网几何 | `/Users/danker/Desktop/Hydro-SHUD/SHUD-NC/runs/qhh/baseline/input/qhh/gis/river.shp` | 1633 条河段 | +| 原始河网(Step1 输入) | `/Users/danker/Desktop/Hydro-SHUD/SHUD-NC/runs/qhh/baseline/DataPre/pcs/stm.shp` | 504 条河段 | +| AutoSHUD 流程 | `/Users/danker/Desktop/Hydro-SHUD/SHUD-NC/AutoSHUD/Step3_BuidModel.R:202` | 调用 `shud.river()` | + +### 0.4 SHUD 求解器 + +| 项目 | 路径 | +|------|------| +| BedSlope 读入 | `SHUD/src/ModelData/MD_readin.cpp:144` | +| BedSlope 钳制 | `SHUD/src/ModelData/MD_initialize.cpp:218` | +| MINRIVSLOPE 定义 | `SHUD/src/Model/Macros.hpp:47` | +| Manning 方程 | `SHUD/src/ModelData/MD_RiverFlux.cpp:20-45` | +| 河道状态方程 | `SHUD/src/ModelData/MD_f.cpp:162` | +| @point 读取(已注释) | `SHUD/src/ModelData/MD_readin.cpp:181` | + +--- + +## 1. 问题概述 + +`rSHUD` 包的 `shud.river()` 函数在计算河段坡度(slope)和 From/To 节点坐标时,存在**索引体系错配**问题。该 bug 导致几乎所有河段的高程提取点偏离真实端点,进而产生错误的坡度值和节点坐标和高程 +**影响范围**(仅限 `shud.river()` 第 28-44 行的 `ft` 索引路径): +- `@river$Slope`(河段坡度)— **受影响** ✗ +- `@point`(From/To 节点坐标和高程)— **受影响** ✗ + +**不受影响的字段**(通过独立的内部自洽代码路径计算): +- `@river$Down`(下游拓扑关系)— 由 `sp.RiverDown()` 独立计算 ✓ +- `@river$Type`(河流 Order)— 由 `sp.RiverOrder()` 独立计算 ✓ +- `@river$Length`(河段长度)— 由 `rgeos::gLength()` 直接计算 ✓ + +**关键发现**:河网拓扑(谁是谁的下游)始终正确,这解释了为什么过往计算的流域结果拓扑始终连通。Bug 仅影响坡度和节点坐标/高程。 + +**严重程度**:在 QHH 实际案例中,94.4% 的河段坡度受影响,From 端点高程偏差中位数达 **472 米**。 + +--- + +## 2. Bug 机制详解 + +### 2.1 涉及的代码位置 + +| 文件 | 行号 | 函数 | 作用 | +|------|------|------|------| +| `rSHUD/R/River.R` | 15 | `shud.river()` | `xy = extractCoords(sp.slt)`(默认 `unique=TRUE`)— 提取原始几何的唯一顶点表 | +| `rSHUD/R/River.R` | 28 | `shud.river()` | `ft = FromToNode(sp.slt, simplify=TRUE)` — 获取 From/To 索引 | +| `rSHUD/R/River.R` | 30-31 | `shud.river()` | `xy[ft[,1], ]` — 用索引查表提取高程 | +| `rSHUD/R/GIS_RiverProcess.R` | 207-221 | `FromToNode()` | 内部简化几何后构建索引 | + +### 2.2 根因:两套索引体系混用 + +``` +shud.river() 的执行流程: + + 第 15 行: xy = extractCoords(sp.slt) + ↓ + 原始几何 → unique 顶点表(N 行,`extractCoords()` 默认 `unique=TRUE`) + 例如 Mock3(QHH 真实子集,18 段): N = 1827 + 例如 QHH 生产全量(1633 段,river.shp): N = 41280 + + 第 28 行: ft = FromToNode(sp.slt, simplify=TRUE)[, 2:3] + ↓ + FromToNode 内部: + 1. gSimplify(sp, tol=...) → 简化几何(减少中间顶点) + 2. coord = extractCoords(sp_simplified) → 简化后的 unique 顶点表(M 行) + (R 惰性求值:coord 默认参数在 gSimplify 之后才求值) + 3. NodeIDList(sp_simplified, coord) → 索引基于 M 行表 + ↓ + ft 的索引值 ∈ {1, 2, ..., M} + 例如 Mock3(QHH 子集): M = 50 + 例如 QHH 生产全量: M = 1678(tol≈2.85 km 时简化后几乎只保留端点) + + 第 30 行: zf = raster::extract(dem, xy[ft[,1], ]) + ↓ + ft 的索引是基于简化后 coord(1..M),却被当作原始 xy(1..N)的行号使用 + → 实际取到的是 xy 表中“行号等于 ft”的点(必然落在 xy 的前 M 行) + → 端点坐标/高程被系统性错配,进而影响 `@river$Slope` 与 `@point` +``` + +### 2.3 R 惰性求值机制 + +`FromToNode` 的函数签名: + +```r +FromToNode <- function(sp, coord = extractCoords(sp, unique = TRUE), simplify = TRUE) +``` + +`coord` 是默认参数,R 采用**惰性求值**(lazy evaluation):`coord` 的表达式 `extractCoords(sp, unique=TRUE)` 不会在函数调用时立即求值,而是在 `coord` 第一次被使用时才求值。此时 `sp` 已经被 `gSimplify` 重新赋值为简化后的几何对象,因此 `coord` 实际上是简化后的顶点表。 + +这意味着 **`FromToNode` 内部是自洽的**——`coord` 和 `sp` 都来自简化后的几何,索引一致。但 `shud.river()` 的 `xy` 来自原始几何,与 `ft` 的索引体系不匹配。 + +### 2.4 图解 + +(以下以 Mock3(QHH 子集,N=1827、M=50)为例图解行号错配。) + +``` +原始几何 (N=1827 个唯一顶点): + xy[1] = 河段1的第1个顶点 + xy[2] = 河段1的第2个顶点 + ... + xy[50] = 河段1的第50个顶点(某个中间点) + ... + xy[1827] = 最后一个顶点 + +简化后几何 (M=50 个唯一顶点): + coord[1] = 河段A的起点 + coord[2] = 河段A的终点 + coord[3] = 河段B的起点 + ... + coord[50] = 最后一个端点 + +FromToNode 返回: ft[1,1] = 3 (表示"河段1的From端点是 coord 表的第3行") + +Bug: xy[3, ] ≠ coord[3, ] + xy[3] 是原始表的第3个顶点(河段1的某个中间点) + coord[3] 是简化表的第3个顶点(河段B的起点) + → 通常不是同一个节点(空间偏差可达 km 级) +``` + +补充说明:索引错配不是“随机噪声”。错配后的点由两张表的遍历/编号顺序共同决定;在树状河网的 QHH 生产数据中,错配会表现为**支流段端点高程关系的系统性翻转**,从而导致负坡度比例异常膨胀(见 §8b.2)。 + +--- + +## 3. 测试方案 + +### 3.1 Mock 数据设计 + +为验证 bug 的存在和修复方案的有效性,构建了 3 套递进复杂度的 mock 数据: + +| 数据集 | 描述 | 河段数 | 顶点数 | 拓扑 | 目的 | +|--------|------|--------|--------|------|------| +| Mock1 | Y 形合成河网 | 3 | 21 (unique 19) | 2 支流 → 1 干流 | 最小可复现案例 | +| Mock2 | 树形合成河网 | 7 | 68 (unique ~62) | 4 支流 → 2 中间 → 1 干流 | 多级汇流传播 | +| Mock3 | QHH 真实子集 | 18 | 1844 (unique 1827) | 真实河网拓扑 | 实际影响评估 | + +#### Mock1: Y 形合成河网 + +``` +拓扑结构: + Seg1: (0,100) → 5个锯齿中间点 → (50,50) [支流1] + Seg2: (100,100) → 5个锯齿中间点 → (50,50) [支流2] + Seg3: (50,50) → 5个锯齿中间点 → (50,0) [干流/出口] + +DEM: 10×10 栅格, extent(-5,105,-5,105), 高程 = y 坐标(北高南低) +种子: set.seed(42) +``` + +#### Mock2: 树形合成河网 + +``` +拓扑结构(二级汇流): + Seg1: (0,200) → (25,150) [支流 L1a, 10 顶点] + Seg2: (50,200) → (25,150) [支流 L1b, 10 顶点] + Seg3: (100,200) → (75,150) [支流 R1a, 10 顶点] + Seg4: (150,200) → (75,150) [支流 R1b, 10 顶点] + Seg5: (25,150) → (50,100) [中间 L2, 8 顶点] + Seg6: (75,150) → (50,100) [中间 R2, 8 顶点] + Seg7: (50,100) → (50,0) [干流, 12 顶点] + +DEM: 20×20 栅格, extent(-10,160,-10,210), 高程 = y 坐标 +种子: set.seed(123) +``` + +#### Mock3: QHH 真实子集 + +``` +来源: runs/qhh/baseline/DataPre/pcs/stm.shp (504 条河段) +选取策略: 以最长河段为种子,通过端点空间连通性扩展,收集 18 条拓扑连通河段 +DEM: 从 dem.tif 裁剪,bbox + 500m 缓冲 +投影: EPSG:32647 (UTM 47N) +``` + +### 3.2 河网总览 + +![图1: 三套 mock 数据河网总览](fig1_river_overview.png) + +**图 1** 展示了三套 mock 数据的河网几何。从左到右复杂度递增: +- **Mock1**(左):最简 Y 形,3 条河段各 7 个顶点,中间点有 ±2 的随机锯齿偏移 +- **Mock2**(中):二级汇流树形,7 条河段共 68 个顶点,偏移量 ±3 +- **Mock3**(右):QHH 真实河网子集,18 条河段共 1844 个顶点,复杂的自然河道弯曲 + +每条河段的中间顶点越多,`st_simplify` 简化后减少的顶点越多,N/M 比值越大,索引错配越严重。 + +--- + +## 4. 测试结果 + +### 4.1 验证 1:坐标表行数差异 + +| 数据集 | N_orig (原始) | M_simp (简化后) | N/M 比值 | 简化率 | +|--------|---------------|-----------------|----------|--------| +| Mock1 Y形 | 19 | 16 | 1.19× | 15.8% | +| Mock2 树形 | 62 | 36 | 1.72× | 41.9% | +| Mock3 QHH | **1827** | **50** | **36.5×** | **97.3%** | + +**结论**:`st_simplify` 在所有数据集上都显著减少了顶点数。Mock3(QHH 子集)的简化率高达 97.3%,意味着 1827 行的原始表被压缩到仅 50 行,索引体系完全不兼容。需要注意:QHH 生产全量(1633 段)在同一 `tol=(ext[2]-ext[1])*0.01` 规则下简化后 **M=1678**,并非“几十个点”(见 §8b.2)。 + +### 4.2 验证 2:索引错配复现 + +| 数据集 | 错配河段数 | 错配率 | 高程偏差 min | 高程偏差 median | 高程偏差 max | 高程偏差 SD | +|--------|-----------|--------|-------------|----------------|-------------|------------| +| Mock1 | 2/3 | **66.7%** | -44 | 0 | 22 | 33.6 | +| Mock2 | 6/7 | **85.7%** | -44 | 0 | 66 | 36.7 | +| Mock3 | 17/18 | **94.4%** | 0 | **472.5** | **638** | 148.0 | + +### 4.3 索引错配可视化 + +#### Mock1: Y 形河网 + +![图2a: Mock1 索引错配](fig2a_mismatch_mock1.png) + +**图 2a** 展示了 Y 形河网的索引错配情况。绿色圆点是每条河段的**真实 From 端点**(即河段第一个顶点),红色叉号是 bug 导致取到的**错误点**。红色虚线箭头表示偏移方向和距离。 + +在 3 条河段中,2 条的 From 端点被错误定位。由于 N/M 比值仅 1.19×,部分索引碰巧仍指向正确位置(第 1 条河段),但这只是巧合。 + +#### Mock2: 树形河网 + +![图2b: Mock2 索引错配](fig2b_mismatch_mock2.png) + +**图 2b** 展示了树形河网的错配。7 条河段中 6 条受影响(85.7%)。可以观察到: +- 错误点(红叉)散布在河网的各个位置,与对应河段的真实端点(绿点)不再一一对应 +- 偏移箭头没有单一方向偏置;其空间形态由两张坐标表的遍历顺序共同决定,因此看起来“无规律”,但并非随机噪声 + +#### Mock3: QHH 真实河网 + +![图2c: Mock3 索引错配](fig2c_mismatch_mock3.png) + +**图 2c** 展示了 QHH 河网子集(Mock3)的错配。这是最直观的案例之一: +- 18 条河段中 17 条错配(94.4%) +- 在该子集上 N/M 比值高达 36.5×,ft 索引最大值为 50,因此在 buggy 路径下只能访问原始 `xy` 的前 50 行(见 §2.2),这些行全部来自前几条河段的顶点 +- 在该子集上错误点高度集中在河网的一小段区域内,而非分布在各河段的真实端点上 +- 高程偏差中位数 472m,最大 638m——在山区流域中,这意味着坡度计算完全失真 +(对比:QHH 生产全量中 M=1678,错配点不再集中于极小区域,但负坡度会呈现支流段的结构性膨胀,见 §8b.2。) + +### 4.4 高程偏差对比 + +![图3: 高程偏差对比](fig3_elevation_diff.png) + +**图 3** 以柱状图对比了每条河段 From 端点的正确高程(绿色)与错误高程(红色): +- **Mock1/2**(合成数据):DEM 高程 = y 坐标,偏差反映了空间位置的偏移量 +- **Mock3**(真实数据):偏差极为显著,多数河段的错误高程与正确高程相差数百米 + +### 4.5 坡度对比 + +![图4: 修复前后坡度对比](fig4_slope_comparison.png) + +**图 4** 以散点图对比了修复前(buggy)和修复后(fixed)的河段坡度: +- 黑色虚线为 1:1 参考线,点越偏离该线说明 bug 影响越大 +- **Mock3** 的偏离最为严重,多数点远离 1:1 线,说明 bug 导致的坡度计算在真实数据上完全不可信 + +### 4.6 验证 3:修复方案验证 + +| 数据集 | fixed_max_abs_diff | 高程偏差 (修复后) | 结论 | +|--------|-------------------|-------------------|------| +| Mock1 | **0** | 全部 0 | 修复有效 | +| Mock2 | **0** | 全部 0 | 修复有效 | +| Mock3 | **0** | 全部 0 | 修复有效 | + +修复方案:`FromToNode(sl, coord=xy, simplify=FALSE)` + +修复后,`ft` 索引直接对应 `xy` 的行号,`fixed_from` 与每条河段的真实首顶点完全一致(max_abs_diff = 0),高程偏差归零。 + +### 4.7 汇总统计 + +![图5: 汇总统计](fig5_summary.png) + +**图 5** 上半部分展示了 3 套数据的原始/简化顶点数对比,下半部分展示了错配率。可以清晰看到: +- N/M 比值越大,错配率越高 +- Mock3(QHH 子集)的 N/M 比值(36.5×)和错配率(94.4%)远超合成数据 + +### 4.8 拓扑连接对比 + +![图6: Buggy vs Fixed 拓扑连接](fig6_topology_comparison.png) + +**图 6** 以 2×3 布局对比了 3 套 mock 数据在 bug(上行)和修复后(下行)的 From→To 拓扑连接: + +- **上行(Buggy)**:彩色箭头从错误的 From 点指向错误的 To 点。由于索引错配,箭头起止点不在河段端点上,视觉上呈现“看似乱飞/交叉”的状态——箭头跨越河段、指向非对应端点位置,`@point` 的空间连接失真。Mock3(QHH 子集)尤为明显:由于该子集 ft 索引最大值为 50,只能访问 1827 行原始表的前 50 行,导致箭头集中在河网的一小段区域内 +- **下行(Fixed)**:箭头精确连接每条河段的首尾端点,拓扑连通、方向一致 + +需要强调的是:**这里展示的是 `@point` 中记录的 From/To 坐标的错误**,而非 `@river$Down` 的拓扑关系。`Down` 字段(谁是谁的下游)始终正确,不受此 bug 影响。图中的"拓扑断裂"仅指 `@point` 中存储的节点坐标偏离了真实河段端点。 + +--- + +### 5.1 最小修复(推荐) + +**文件**: `rSHUD/R/River.R` 第 28 行 + +```r +# 修复前(bug): +ft = rbind(FromToNode(sp.slt, simplify=TRUE)[, 2:3]) + +# 修复后: +ft = rbind(FromToNode(sp.slt, coord=xy, simplify=FALSE)[, 2:3]) +``` + +**原理**:显式传入 `coord=xy`(原始几何的顶点表),并设置 `simplify=FALSE` 跳过内部简化。这样 `FromToNode` 使用 `xy` 作为坐标参考表,返回的索引直接对应 `xy` 的行号,消除索引体系不一致。 + +### 5.2 影响分析 + +| 项目 | 说明 | +|------|------| +| 修改范围 | 仅 1 行代码 | +| 向后兼容 | `FromToNode` 的 API 不变,只是调用方式改变 | +| 性能影响 | 跳过 `gSimplify` 后,`NodeIDList` 的 `xy2ID` 需要在更大的 coord 表上匹配,O(N×M) 开销增加。但 `shud.river()` 不是热路径,可接受 | +| 副作用 | `sp.RiverDown()` 和 `sp.RiverOrder()` 不受影响(它们内部自洽) | + +### 5.3 替代方案 A:保留 simplify 的等价修复 + +```r +# 显式传入 coord=xy,但保留 simplify=TRUE +ft = rbind(FromToNode(sp.slt, coord=xy, simplify=TRUE)[, 2:3]) +``` + +**原理**:显式传 `coord=xy` 覆盖默认参数后,惰性求值不再生效。`FromToNode` 内部仍做简化(减少中间顶点加速匹配),但 `NodeIDList` 使用的 `coord` 始终是原始的 `xy` 表。简化后的端点是原始顶点的子集,因此在 `xy` 中能精确匹配。 + +**优势**:保留 `simplify` 的性能收益(简化后河段只有起止两点,`NodeIDList` 遍历更少顶点)。 + +**前提条件**:简化算法(Douglas-Peucker)产出的端点必须是原始顶点的精确子集(无浮点漂移)。`rgeos::gSimplify` 和 `sf::st_simplify` 均满足此条件。 + +### 5.4 替代方案 B:FromToNode 内部防呆 + +```r +FromToNode <- function(sp, coord = extractCoords(sp, unique = TRUE), simplify = TRUE) { + coord <- force(coord) # 强制在 simplify 之前求值 + if (simplify) { + # ... gSimplify / st_simplify ... + } + # coord 始终来自调用时的 sp(未简化),索引自洽 +} +``` + +**优势**:从根源上消除惰性求值陷阱,任何调用方式都不会再出问题。 + +### 5.5 边界风险:xy2ID 严格匹配 + +`xy2ID` / `rowMatch` 使用 **严格相等**(`== 0`)匹配坐标。若未来简化算法产生非原始顶点(如拓扑保持简化的插值点)或发生浮点精度漂移,匹配将返回 0 索引,把"错配"问题变为"缺失"问题。 + +- 当前 fix(`simplify=FALSE` 或显式传 `coord`)可规避 +- 长期建议:在 `xy2ID` 中加入容差匹配(`abs(diff) < tol`),或改用空间索引 + +--- + +## 6. 版本对比 + +| | 旧版 rSHUD (git HEAD) | 旧版 (工作区未提交) | 新版 (SHUD-NC 子模块) | +|---|---|---|---| +| `FromToNode` 简化引擎 | `rgeos::gSimplify` | `sf::st_simplify` | `rgeos::gSimplify` | +| `shud.river()` 第 28 行 | 相同 (bug) | 相同 (bug) | 相同 (bug) | +| 索引错配 bug | **存在** | **存在** | **存在** | + +旧版工作区的唯一改动是将 `FromToNode` 中的 `rgeos::gSimplify` 替换为 `sf::st_simplify`(适配 rgeos 退役),但**未修复 `shud.river()` 的索引错配**。该 bug 自始至终存在于所有版本中。 + +--- + +## 7. 测试文件清单 + +| 文件 | 用途 | +|------|------| +| `testdata/mock_river/gen_mock1_y_shape.R` | Mock1 生成脚本 | +| `testdata/mock_river/gen_mock2_tree.R` | Mock2 生成脚本 | +| `testdata/mock_river/gen_mock3_qhh_subset.R` | Mock3 生成脚本 | +| `testdata/mock_river/mock1_y_shape.rds` | Mock1 数据 (SpatialLines + RasterLayer) | +| `testdata/mock_river/mock2_tree.rds` | Mock2 数据 | +| `testdata/mock_river/mock3_qhh_subset.rds` | Mock3 数据 | +| `testdata/mock_river/test_index_mismatch.R` | 索引错配验证脚本 | +| `testdata/mock_river/mismatch_report.json` | 结构化测试报告 | +| `testdata/mock_river/gen_report_figures.R` | 可视化生成脚本 | +| `testdata/mock_river/fig1_river_overview.png` | 河网总览图 | +| `testdata/mock_river/fig2a_mismatch_mock1.png` | Mock1 错配可视化 | +| `testdata/mock_river/fig2b_mismatch_mock2.png` | Mock2 错配可视化 | +| `testdata/mock_river/fig2c_mismatch_mock3.png` | Mock3 错配可视化 | +| `testdata/mock_river/fig3_elevation_diff.png` | 高程偏差对比 | +| `testdata/mock_river/fig4_slope_comparison.png` | 坡度对比 | +| `testdata/mock_river/fig5_summary.png` | 汇总统计 | +| `testdata/mock_river/fig6_topology_comparison.png` | Buggy vs Fixed 拓扑连接对比 | + +--- + +## 8. 复现步骤 + +```bash +cd /Users/danker/Desktop/Hydro-SHUD/SHUD-NC +git checkout fix/river-index-mismatch + +# 生成 mock 数据 +Rscript testdata/mock_river/gen_mock1_y_shape.R +Rscript testdata/mock_river/gen_mock2_tree.R +Rscript testdata/mock_river/gen_mock3_qhh_subset.R + +# 运行验证 +Rscript testdata/mock_river/test_index_mismatch.R + +# 生成可视化 +Rscript testdata/mock_river/gen_report_figures.R +``` + +--- + +## 8b. 实际 QHH 生产数据验证 + +为验证 mock 测试结果的可靠性,直接检查了 QHH baseline 的实际输出文件。 + +### 8b.1 @point 坐标验证 + +对比 `river.shp`(1633 条河段的真实几何)与 `qhh.sp.riv` 第 3 块(@point)中记录的 From/To 坐标: + +| 指标 | 值 | 说明 | +|------|-----|------| +| 样本量 | 前 100 条河段 | | +| From 坐标正确 (<1m) | **1/100 (1%)** | 仅第 1 条碰巧正确 | +| From 坐标错误 | **99/100 (99%)** | | +| 坐标偏差最小值 | 0.4 m | 第 1 条河段(巧合) | +| 坐标偏差中位数 | **28,103 m (28.1 km)** | 全部 100 条样本的中位数(排除唯一正确样本后为 28,233 m) | +| 坐标偏差最大值 | **65,429 m (65.4 km)** | | +| 坐标偏差均值 | 29,898 m (29.9 km) | 全部 100 条样本 | + +**示例对比**(前 4 条河段): + +| 河段 | river.shp 真实 From | @point From | 偏差 | +|------|---------------------|-------------|------| +| Seg1 | (439639.7, 4206130.4) | (439639.7, 4206130) | 0.4 m ✓ | +| Seg2 | (441686.3, 4205838.2) | (439712.1, 4206037) | 1984 m ✗ | +| Seg3 | (491223.3, 4208686.0) | (439710.8, 4205853) | **51,590 m** ✗ | +| Seg4 | (489759.2, 4207578.2) | (439784.0, 4205852) | **50,005 m** ✗ | + +Seg3 的 @point From 坐标偏离真实端点 **51.6 km**——这不是小误差,而是完全查错了坐标表。 + +另外:全量 1633 段的 `@point` 端点(From 与 To 坐标去重)共有 **1678 个唯一坐标**,数量级与树状河网节点数一致,说明并非“被压缩到极少数点”;问题在于这些点与各河段真实端点发生了系统性错配。 + +### 8b.2 Slope 统计分析 + +直接分析 `qhh.sp.riv` 第 1 块中 1633 条河段的坡度值: + +| 统计项 | 值 | 说明 | +|--------|-----|------| +| 负坡度 | **575/1633 (35.2%)** | 床面坡降按定义通常为非负;但用 DEM 端点估计时可出现少量负值(噪声/局部地形起伏/回水等) | +| 零坡度 | 137/1633 (8.4%) | | +| 正坡度 | 921/1633 (56.4%) | 其中部分也是错误值 | +| 低于 MINRIVSLOPE (4e-4) | **800/1633 (49.0%)** | 被 SHUD 钳制为 4e-4 | +| 最小坡度 | -0.125832 | | +| 最大坡度 | 0.334470 | | +| 中位数坡度 | 0.000450 | | + +#### 8b.2.1 自洽性检查:Slope 与 From/To.z 的关系 + +`shud.river()` 内部按 `Slope = (From.z - To.z) / Length` 计算并写出,因此在 QHH 生产输出中该关系对全部 1633 段成立(浮点误差在 `~1e-7` 量级,可视为 0)。因此 575 段负坡度 **100% 对应 `From.z < To.z`**(From 端点高程低于 To 端点高程)。 + +#### 8b.2.2 负坡度比例为何会系统性膨胀(支流段索引顺序反转) + +在 QHH 生产全量(river.shp, 1633 段)上,`FromToNode(simplify=TRUE)` 的 `tol=(ext[2]-ext[1])*0.01` 规则会将原始 unique 顶点数 **N=41280** 的几何简化到 **M=1678** 个 unique 端点(基本只保留每段首尾点)。随后 `shud.river()` 将 `ft`(基于简化后端点表的索引)误用于原始 `xy` 表行号(见 §2.2),从而把“河网节点编号”错当成“原始顶点表行号”,导致端点坐标/高程错配。 + +该错配在树状河网中会表现出明显结构性,而非“随机查到不相关点”: +- 主干优先遍历时,交汇点更早出现 → 在端点表中拿到较低索引 +- 支流后遍历时,支流源头更晚出现 → 拿到较高索引 +- 支流段更容易出现端点索引顺序与真实高程顺序相反的情形,映射到原始 `xy` 后导致 `From.z < To.z`,从而系统性产生假负坡度 +- 约 **35%** 的负坡度比例与河网中支流段的占比同量级,符合这种结构性机制 + +#### 8b.2.3 真实负坡度 vs Buggy 负坡度(逐段对照) + +负坡度比例的异常膨胀是该 bug 在生产数据上的一个关键表现:用 river.shp 真实端点 + dem.tif 重新计算,自然河道存在 **5.51%(90 段)** 的真实负坡度,但 buggy 结果中负坡度高达 **35.21%(575 段)**。交叉对比显示: + +| 分类 | 条数 | 占比 | +|------|-----:|-----:| +| 真实≥0 但 buggy<0(**纯 bug 造成的假负坡度**) | 539 | 33.01% | +| 真实<0 且 buggy<0(可能为真实地形) | 36 | 2.20% | +| 真实<0 但 buggy≥0(bug 反而"翻正") | 54 | 3.31% | + +即 575 段 buggy 负坡度中 **93.7%(539 段)** 可直接归因于索引错配。详细逐段对比见 `runs/qhh/baseline/analysis/slope_compare_true_vs_bug.csv`。 + +注意:自然河道可能存在局部负坡度,因此“出现负坡度”本身不是 bug 的充分条件;但负坡度比例从 5.5% 膨胀到 35.2% 且其中 93.7% 可归因于索引错配,是非常强的一致性证据。 + +### 8b.3 两个"拓扑"概念的澄清 + +用户指出"旧版本的结果拓扑关系是正确的"——这完全正确。需要严格区分: + +| 概念 | 数据来源 | 计算路径 | 受 bug 影响? | 说明 | +|------|----------|----------|--------------|------| +| **河网拓扑**(连通性) | `@river$Down` | `sp.RiverDown()` 独立计算 | **否** | 谁的下游是谁,始终正确 | +| **节点坐标**(空间位置) | `@point` From/To | `xy[ft[,1], ]` 索引查表 | **是** | From/To 的地理坐标,严重错误 | + +本报告图 6 (fig6_topology_comparison.png) 展示的是 `@point` 坐标连接,**不是** `Down` 拓扑连通性。图中箭头“看似乱飞/交叉”反映的是 @point 坐标错误,而非河网连通性断裂。**`@river$Down` 在 mock 和生产数据中均始终正确。** + +### 8b.4 mock 测试与生产数据一致性 + +| 对比项 | Mock3 (QHH 子集) | QHH 生产数据(前 100 条样本) | 一致? | +|--------|-------------------|-------------|--------| +| @point 错配率 | 94.4% | 99% | ✓ | +| 高程偏差量级 | 中位数 472 m | 中位数 ~28 km 坐标偏差 | ✓(全集更严重) | +| 负坡度存在 | 是 | 是 (35.2%) | ✓ | +| Down 拓扑 | 始终正确 | 始终正确 | ✓ | + +**结论**:mock 测试结果真实反映了生产数据中的 bug 状态。 + +--- + +## 9. 结论 + +1. **Bug 确认**:`shud.river()` 第 28-31 行存在索引体系错配,由 R 惰性求值机制和 `FromToNode` 内部简化导致 +2. **拓扑不受影响**:`@river$Down`(下游关系)和 `@river$Type`(河流 Order)通过独立的内部自洽代码路径计算,始终正确。这解释了过往流域计算结果拓扑始终连通 +3. **受影响字段**:`@river$Slope`(坡度)和 `@point`(From/To 坐标及高程) +4. **实际生产数据验证**:QHH baseline 前 100 条河段中,99% 的 @point 坐标错误(偏差中位数 28.1 km);全量 1633 段中原始 unique 顶点数 N=41280、简化后端点数 M=1678 的索引错配,使负坡度比例从真实的 5.51% 膨胀到 35.2%(其中 93.7% 为假负坡度,且负坡度段 100% 满足 From.z < To.z),49% 被 MINRIVSLOPE 钳制 +5. **修复简单**:仅需修改 1 行代码(`simplify=FALSE` + 显式传 `coord=xy`) +6. **修复有效**:3 套 mock 数据验证修复后高程偏差全部归零 +7. **历史遗留**:该 bug 存在于所有已知版本的 rSHUD 中,从未被修复 +8. **模拟为何仍能运行**:(a) Down 拓扑始终正确保证河水流向正确;(b) 49% 的错误坡度被 MINRIVSLOPE=4e-4 钳制;(c) SHUD 不读取 @point;(d) 全耦合求解器部分吸收残余坡度误差 + +--- + +## 附录 A:下游影响追踪 + +### A.1 数据流:shud.river() → 文件 → SHUD 求解器 + +``` +shud.river() 返回 SHUD.RIVER 对象 + │ + ├─ @river (Index, Down, Type, Slope, Length, BC) + ├─ @rivertype (Width, Depth, Manning, ...) + └─ @point (From.x, From.y, From.z, To.x, To.y, To.z) + │ + ▼ +write.riv(pr, ...) → .sp.riv 文件 + │ + ├─ 第1块: @river (6列) → SHUD 读取 Riv[i].BedSlope = 第4列 + ├─ 第2块: @rivertype (9列) → SHUD 读取 RivType[i].Width/Depth/Manning 等 + └─ 第3块: @point (6列) → SHUD 不读取(代码已注释) +``` + +**关键文件引用**: +- `AutoSHUD/Step3_BuidModel.R:202` — 调用 `shud.river()` +- `rSHUD/R/writeInput.R:118-120` — `write.riv()` 写出 3 个块 +- `SHUD/src/ModelData/MD_readin.cpp:144` — 读取 `Riv[i].BedSlope` +- `SHUD/src/ModelData/MD_readin.cpp:181` — 第3块读取代码已注释 + +### A.2 BedSlope 完整代码路径与影响分析 + +#### A.2.1 BedSlope 在 SHUD 源码中的全部引用 + +| 代码位置 | 用途 | +|----------|------| +| `SHUD/src/classes/River.hpp:69` | 结构体定义 `double BedSlope` | +| `SHUD/src/Model/Macros.hpp:47` | `MINRIVSLOPE = 4e-4` 常量 | +| `SHUD/src/ModelData/MD_readin.cpp:144` | 从 `.sp.riv` 第1块第4列读入,**无合法性检查** | +| `SHUD/src/ModelData/MD_initialize.cpp:218` | 钳制:`BedSlope = max(MINRIVSLOPE, BedSlope)` | +| `SHUD/src/ModelData/MD_RiverFlux.cpp:20` | 河道下泄坡降(toLake/outlet 边界) | +| `SHUD/src/ModelData/MD_RiverFlux.cpp:26` | 河道-河道坡降均值(i→down,Manning 输入) | +| `SHUD/src/ModelData/MD_RiverFlux.cpp:45` | 出口边界(BC=-3)坡降 | +| `SHUD/src/Model/WaterBalanceDiag.cpp:224` | 诊断输出中的出口流量重算 | +| `SHUD/src/classes/River.cpp:104,122` | 非物理用途(打印/调试) | + +#### A.2.2 完整影响传播链 + +``` +rSHUD shud.river() 索引错配 + ↓ +qhh.sp.riv 第1块 Slope 列: 35.2% 负值, 49% < 4e-4 + ↓ +MD_readin.cpp:144 读入 BedSlope(无合法性检查,负值直接接受) + ↓ +MD_initialize.cpp:218 钳制: BedSlope = max(4e-4, BedSlope) + → 49% 的河段统一变为 4e-4,丢失坡度差异性 + ↓ +MD_RiverFlux.cpp:20-49 Flux_RiverDown(): + s = (BedSlope[i] + BedSlope[down]) / 2 (:26) + s += 2 * (stage[i] - stage[down]) / (Length[i] + Length[down]) (:28, 水面梯度项) + Q = ManningEquation(CSarea, n, R, s) (调用点 :23/:34/:49) + ManningEquation 实现: Equations.hpp:54 + Q ~ sqrt(|S|) × A × R^(2/3) / n (含符号) + ↓ +MD_f.cpp:162 DY[iRIV] += QrivDown (河道水位状态方程) + ↓ +MD_f.cpp:238 PassValue → 下游入流 QrivUp + ↓ + ├→ MD_RiverFlux.cpp:104 河道-坡面交换(堰流,受水位间接驱动) + └→ MD_RiverFlux.cpp:116 河道-地下水交换(水头差 dh,受水位间接驱动) + ↓ +MD_f.cpp:68,69 元胞方程接收交换项 → 完整耦合反馈循环 +``` + +#### A.2.3 四类物理过程受影响分析 + +| 物理过程 | 影响方式 | BedSlope 参与方式 | 影响程度 | +|----------|----------|-------------------|----------| +| **河道流量 Q** | `BedSlope` 直接进入 Manning 方程 `Q~sqrt(S)` | 直接 | **最大,可达 ~16× 偏差(取决于真实坡度)** | +| **河道水位** | Q 偏小 → 泄水不畅 → 水位偏高 | 间接(Q→DY_riv) | 显著 | +| **坡面-河道交换** | 水位偏高 → 堰流增强 → 河水溢向坡面 | 间接(水位→WeirFlow) | 中等 | +| **地下水-河道交换** | 水位偏高 → 水头差 dh 变化 → 交换方向/强度改变 | 间接(水位→dh) | 中等 | + +#### A.2.4 量化误差分析 + +**场景(示例)**:真实坡度 `S_true = 0.01`;生产数据验证表明 bug 会导致端点高程关系翻转(表现为 `From.z < To.z`),从而出现 `S_raw < 0`,随后被钳制为 `S_bug = 4e-4` + +Manning 方程中坡度以 `sqrt(S)` 形式参与,因此在相同过水面积和糙率条件下: + +``` +Q_bug / Q_true = sqrt(S_bug / S_true) = sqrt(4e-4 / 0.01) = sqrt(0.04) = 0.2 +``` + +**结论**:bug 导致的 Manning 流量仅为真值的 **20%**,即真值是 bug 值的 **5 倍**。 + +更多场景: + +| 真实坡度 S_true | Bug 后原始值 S_raw(示例,<0) | 钳制后 S_bug | Q_bug/Q_true | 偏差倍数 | +|-----------------|----------|-------------|--------------|----------| +| 0.001 | -0.001 | 4e-4 | 63.2% | 1.58× | +| 0.005 | -0.005 | 4e-4 | 28.3% | 3.54× | +| **0.01** | **-0.01** | **4e-4** | **20.0%** | **5.00×** | +| 0.05 | -0.05 | 4e-4 | 8.9% | 11.2× | +| 0.10 | -0.10 | 4e-4 | 6.3% | 15.8× | + +坡度越陡的河段,bug 导致的流量衰减越严重。对于真实坡度 0.05 以上的陡坡河段,流量可被压缩到真值的不到 10%。 + +#### A.2.5 BedSlope 对截面几何的影响 + +`BedSlope` **不直接参与**截面几何计算。河道宽度/深度/过水面积的更新在 `SHUD/src/classes/River.cpp:49-55`,仅使用 `stage`、`BottomWidth`、`bankslope`、`Length`,不引用 `BedSlope`。 + +但 `BedSlope` 通过改变水位(stage)**间接影响**瞬时过水面积 `u_CSarea`、湿周 `u_CSperem`、顶宽 `u_topWidth` —— 属于二级反馈效应。 + +#### A.2.6 结论 + +错误的 `Slope` 直接进入 Manning 方程的水力坡降 `S`,通过 `Q ~ sqrt(S)` 影响河道流量计算。49% 的河段坡度被统一钳制为 `MINRIVSLOPE = 4e-4`,不仅丢失了坡度的空间差异性,更导致陡坡河段的流量被压缩至真值的 6%~28%。缓冲因素包括: + +1. **最小坡降钳制**:`MINRIVSLOPE = 4e-4` 防止坡度为零或负值导致的数值崩溃,但代价是 49% 的河段失去坡度个体差异 +2. **SHUD 全耦合求解器**:河道流量只是状态方程的一部分,坡面-地下水-河道耦合会部分吸收坡度误差 +3. **水面坡降补偿**:`Flux_RiverDown` 中实际使用 `BedSlope` 均值 + 水面梯度项,当水位差足够大时水面梯度可部分补偿床面坡降错误 + +### A.3 @point 的影响:**不影响求解器结果** + +| 使用位置 | 是否在主流程中 | 说明 | +|----------|---------------|------| +| SHUD 求解器 | **否** | `read_riv()` 只读前2块,第3块读取代码已注释 | +| `RiverLength()` | 否 | rSHUD R 侧函数,AutoSHUD 主流程未调用 | +| `RiverSlope()` | 否 | 同上 | +| `correctRiverSlope()` | 否 | AutoSHUD Step3 未调用 | +| `sp.riv2shp()` | 否 | 当前实现直接读 `gis/river.shp`,不用 `@point` | + +**结论**:`@point` 的错误坐标和高程**不进入 SHUD 动力学计算**。它们只存在于 `.sp.riv` 文件的第3块中,但求解器跳过了这个块。 + +### A.4 correctRiverSlope() 是否掩盖问题 + +- AutoSHUD Step3 **未调用** `correctRiverSlope()` +- 该函数与当前 `shud.river()` 输出存在字段名不一致(`FrNode/ToNode/Zmax` vs `From.x/From.y/From.z`) +- 即使调用,它修正的是负坡度/零坡度,不会修正"正但错误"的坡度值 + +### A.5 影响总结 + +| 字段 | 写入文件 | SHUD 是否读取 | 是否影响计算 | 影响程度 | +|------|----------|--------------|-------------|----------| +| `@river$Slope` | `.sp.riv` 第1块第4列 | **是** → `Riv[i].BedSlope` | **是** → Manning 方程 `Q~sqrt(S)` | **严重**:49% 河段失去坡度差异性,陡坡河段流量衰减至 6%~28% | +| `@point` | `.sp.riv` 第3块 | **否**(代码已注释) | **否** | 无 | +| `@river$Down` | `.sp.riv` 第1块第2列 | 是 | 不受 bug 影响 | — | +| `@river$Type` | `.sp.riv` 第1块第3列 | 是 | 不受 bug 影响 | — | +| `@river$Length` | `.sp.riv` 第1块第5列 | 是 | 不受 bug 影响 | — | + +### A.6 最终结论 + +**Slope 错误会传播到 SHUD 求解器**,通过 Manning 方程影响河道流量和水位计算,进而反馈到坡面-河道水量交换。具体而言: + +- **直接影响**:49% 的河段坡度被统一钳制为 `MINRIVSLOPE = 4e-4`,丢失空间差异性;陡坡河段(真实 S=0.01~0.1)流量被压缩至真值的 **6%~20%**(最多 5~16 倍偏差) +- **间接影响**:流量偏小 → 水位偏高 → 坡面-河道堰流增强 + 地下水-河道交换方向/强度改变 +- **缓冲因素**:(1) MINRIVSLOPE 防止数值崩溃;(2) 全耦合求解器部分吸收误差;(3) 水面梯度项部分补偿床面坡降错误 + +**@point 错误不影响计算结果**,因为 SHUD 求解器不读取该数据块。 + +**建议**:修复 bug 后,对比修复前后的 QHH baseline 出口流量时序,量化 Slope 错误对水文模拟结果的实际影响。 diff --git a/testdata/mock_river/BUG_VERIFICATION_PROMPT.md b/testdata/mock_river/BUG_VERIFICATION_PROMPT.md new file mode 100644 index 0000000..a98631c --- /dev/null +++ b/testdata/mock_river/BUG_VERIFICATION_PROMPT.md @@ -0,0 +1,166 @@ +# rSHUD shud.river() 索引错配 Bug 验证指南 + +## Bug 概述 + +`rSHUD` 包的 `shud.river()` 函数存在索引错配问题: +- `xy` 表来自原始几何(N 行 unique 顶点) +- `ft` 索引来自简化后几何(M 行 unique 端点) +- 用 M 范围的索引去查 N 行的表 → 端点坐标/高程错配 + +## 验证步骤 + +### 1. 准备测试数据 + +```r +library(rSHUD) +library(sp) +library(raster) + +# 创建简单 Y 形河网(3 条河段) +set.seed(42) +coords_list <- list( + cbind(x = c(0, 10, 20, 30, 40, 50, 50), + y = c(0, 10, 20, 30, 40, 50, 60) + runif(7, -2, 2)), + cbind(x = c(50, 60, 70, 80, 90, 100, 100), + y = c(60, 70, 80, 90, 95, 100, 100) + runif(7, -2, 2)), + cbind(x = c(50, 40, 30, 20, 10, 0, 0), + y = c(60, 70, 80, 90, 95, 100, 100) + runif(7, -2, 2)) +) +lines_list <- lapply(coords_list, Line) +lines_obj <- Lines(lines_list, ID = "river") +sp_river <- SpatialLines(list(lines_obj)) + +# 创建 DEM(高程 = y 坐标) +dem <- raster(nrows = 10, ncols = 10, + xmn = -5, xmx = 105, ymn = -5, ymx = 105) +dem[] <- matrix(rep(seq(0, 100, length.out = 10), 10), 10, 10, byrow = TRUE) +``` + +### 2. 提取关键中间变量 + +```r +# 模拟 shud.river() 内部逻辑 +xy <- data.frame(extractCoords(sp_river)) # 原始 unique 顶点表 +N <- nrow(xy) + +# FromToNode 内部会简化几何 +ft <- FromToNode(sp_river, simplify = TRUE)[, 2:3] +M <- max(ft) + +cat("N (原始 unique 顶点数):", N, "\n") +cat("M (简化后 unique 端点数/ft 最大值):", M, "\n") +cat("ft 索引范围: [", min(ft), ",", max(ft), "]\n") +``` + +### 3. 验证索引错配 + +```r +# 检查 ft 索引是否超出合理范围 +# 如果 M < N,则 ft 索引只能访问 xy 的前 M 行 +cat("\n=== 索引错配检测 ===\n") +cat("ft 索引用于查询 xy 表(", N, "行),但 ft 最大值仅为", M, "\n") + +if (M < N) { + cat("⚠️ 存在索引错配风险:ft 索引范围 [1,", M, "] 无法覆盖 xy 全部", N, "行\n") +} else { + cat("✓ 索引范围匹配\n") +} +``` + +### 4. 对比真实端点 vs buggy 端点 + +```r +# 获取每条河段的真实首尾端点 +n_seg <- length(sp_river@lines[[1]]@Lines) +true_from <- matrix(NA, n_seg, 2) +true_to <- matrix(NA, n_seg, 2) + +for (i in 1:n_seg) { + coords <- sp_river@lines[[1]]@Lines[[i]]@coords + true_from[i, ] <- coords[1, ] + true_to[i, ] <- coords[nrow(coords), ] +} + +# buggy 端点(shud.river() 实际使用的) +buggy_from <- as.matrix(xy[ft[, 1], ]) +buggy_to <- as.matrix(xy[ft[, 2], ]) + +# 计算偏差 +dist_from <- sqrt(rowSums((true_from - buggy_from)^2)) +dist_to <- sqrt(rowSums((true_to - buggy_to)^2)) + +cat("\n=== 端点坐标偏差 ===\n") +cat("From 端点偏差: min=", min(dist_from), ", max=", max(dist_from), "\n") +cat("To 端点偏差: min=", min(dist_to), ", max=", max(dist_to), "\n") + +n_from_wrong <- sum(dist_from > 0.1) +n_to_wrong <- sum(dist_to > 0.1) +cat("From 错配数:", n_from_wrong, "/", n_seg, "\n") +cat("To 错配数:", n_to_wrong, "/", n_seg, "\n") +``` + +### 5. 验证坡度计算 + +```r +# 真实坡度 +z_true_from <- raster::extract(dem, true_from) +z_true_to <- raster::extract(dem, true_to) +length_seg <- sapply(1:n_seg, function(i) { + LineLength(sp_river@lines[[1]]@Lines[[i]]) +}) +slope_true <- (z_true_from - z_true_to) / length_seg + +# buggy 坡度(shud.river() 实际计算的) +z_buggy_from <- raster::extract(dem, buggy_from) +z_buggy_to <- raster::extract(dem, buggy_to) +slope_buggy <- (z_buggy_from - z_buggy_to) / length_seg + +cat("\n=== 坡度对比 ===\n") +cat("真实坡度:", round(slope_true, 4), "\n") +cat("Buggy坡度:", round(slope_buggy, 4), "\n") +cat("坡度差异:", round(slope_buggy - slope_true, 4), "\n") + +# 负坡度统计 +cat("\n真实负坡度数:", sum(slope_true < 0), "/", n_seg, "\n") +cat("Buggy负坡度数:", sum(slope_buggy < 0), "/", n_seg, "\n") +``` + +### 6. 验证修复方案 + +```r +# 修复:显式传入 coord=xy 并设置 simplify=FALSE +ft_fixed <- FromToNode(sp_river, coord = xy, simplify = FALSE)[, 2:3] + +fixed_from <- as.matrix(xy[ft_fixed[, 1], ]) +fixed_to <- as.matrix(xy[ft_fixed[, 2], ]) + +dist_from_fixed <- sqrt(rowSums((true_from - fixed_from)^2)) +dist_to_fixed <- sqrt(rowSums((true_to - fixed_to)^2)) + +cat("\n=== 修复后验证 ===\n") +cat("修复后 From 偏差: max=", max(dist_from_fixed), "\n") +cat("修复后 To 偏差: max=", max(dist_to_fixed), "\n") + +if (max(dist_from_fixed) < 0.1 && max(dist_to_fixed) < 0.1) { + cat("✓ 修复有效:端点坐标偏差归零\n") +} else { + cat("⚠️ 修复后仍有偏差\n") +} +``` + +## 预期结果 + +如果 bug 存在,应观察到: +1. `M < N`(简化后端点数 < 原始顶点数) +2. `dist_from` 和 `dist_to` 存在非零偏差 +3. `slope_buggy` 与 `slope_true` 存在差异 +4. buggy 负坡度数 > 真实负坡度数(支流段高程关系翻转) +5. 修复后偏差归零 + +## Bug 根因 + +R 惰性求值机制导致 `FromToNode(sp, coord = extractCoords(sp, unique=TRUE), simplify=TRUE)` 中: +- `coord` 默认参数在 `simplify` 之后才被求值 +- 此时 `sp` 已被简化,`coord` 变成简化后的端点表(M 行) +- 但 `shud.river()` 的 `xy` 是原始顶点表(N 行) +- 用 M 范围索引查 N 行表 → 系统性错配 diff --git a/testdata/mock_river/fig1_river_overview.png b/testdata/mock_river/fig1_river_overview.png new file mode 100644 index 0000000..53aa845 Binary files /dev/null and b/testdata/mock_river/fig1_river_overview.png differ diff --git a/testdata/mock_river/fig2a_mismatch_mock1.png b/testdata/mock_river/fig2a_mismatch_mock1.png new file mode 100644 index 0000000..e82af53 Binary files /dev/null and b/testdata/mock_river/fig2a_mismatch_mock1.png differ diff --git a/testdata/mock_river/fig2b_mismatch_mock2.png b/testdata/mock_river/fig2b_mismatch_mock2.png new file mode 100644 index 0000000..226cad5 Binary files /dev/null and b/testdata/mock_river/fig2b_mismatch_mock2.png differ diff --git a/testdata/mock_river/fig2c_mismatch_mock3.png b/testdata/mock_river/fig2c_mismatch_mock3.png new file mode 100644 index 0000000..7315512 Binary files /dev/null and b/testdata/mock_river/fig2c_mismatch_mock3.png differ diff --git a/testdata/mock_river/fig3_elevation_diff.png b/testdata/mock_river/fig3_elevation_diff.png new file mode 100644 index 0000000..ad4b03c Binary files /dev/null and b/testdata/mock_river/fig3_elevation_diff.png differ diff --git a/testdata/mock_river/fig4_slope_comparison.png b/testdata/mock_river/fig4_slope_comparison.png new file mode 100644 index 0000000..6e03139 Binary files /dev/null and b/testdata/mock_river/fig4_slope_comparison.png differ diff --git a/testdata/mock_river/fig5_summary.png b/testdata/mock_river/fig5_summary.png new file mode 100644 index 0000000..feedd25 Binary files /dev/null and b/testdata/mock_river/fig5_summary.png differ diff --git a/testdata/mock_river/fig6_topology_comparison.png b/testdata/mock_river/fig6_topology_comparison.png new file mode 100644 index 0000000..8a619f1 Binary files /dev/null and b/testdata/mock_river/fig6_topology_comparison.png differ diff --git a/testdata/mock_river/gen_mock1_y_shape.R b/testdata/mock_river/gen_mock1_y_shape.R new file mode 100644 index 0000000..2be8b5b --- /dev/null +++ b/testdata/mock_river/gen_mock1_y_shape.R @@ -0,0 +1,85 @@ +#!/usr/bin/env Rscript + +suppressPackageStartupMessages({ + library(sf) + library(terra) + library(sp) + library(raster) +}) + +set.seed(42) + +make_segment_coords <- function(start_xy, end_xy, n_vertices = 7, jitter = 2) { + t <- seq(0, 1, length.out = n_vertices) + x <- start_xy[1] + (end_xy[1] - start_xy[1]) * t + y <- start_xy[2] + (end_xy[2] - start_xy[2]) * t + + if (n_vertices > 2) { + idx <- 2:(n_vertices - 1) + x[idx] <- x[idx] + runif(length(idx), min = -jitter, max = jitter) + y[idx] <- y[idx] + runif(length(idx), min = -jitter, max = jitter) + } + + cbind(x = x, y = y) +} + +segments <- data.frame( + seg = c("Seg1", "Seg2", "Seg3"), + role = c("trib1", "trib2", "main"), + stringsAsFactors = FALSE +) + +starts <- list( + c(0, 100), + c(100, 100), + c(50, 50) +) + +ends <- list( + c(50, 50), + c(50, 50), + c(50, 0) +) + +geom <- vector("list", nrow(segments)) +for (i in seq_len(nrow(segments))) { + coords <- make_segment_coords(starts[[i]], ends[[i]], n_vertices = 7, jitter = 2) + geom[[i]] <- sf::st_linestring(coords) +} + +sl_sf <- sf::st_sf( + segments, + geometry = sf::st_sfc(geom, crs = sf::st_crs(NA)) +) + +dem <- terra::rast( + nrows = 10, + ncols = 10, + ext = terra::ext(-5, 105, -5, 105) +) +xy <- terra::xyFromCell(dem, 1:terra::ncell(dem)) +terra::values(dem) <- xy[, 2] + +sl_sp <- as(sf::st_geometry(sl_sf), "Spatial") +dem_rl <- raster::raster(dem) + +out <- list( + sl = sl_sp, + dem = dem_rl, + desc = "Y-shape 3 segments" +) + +args <- commandArgs(trailingOnly = FALSE) +file_arg <- grep("^--file=", args, value = TRUE) +script_path <- if (length(file_arg) > 0) sub("^--file=", "", file_arg[1]) else NA_character_ +out_dir <- if (!is.na(script_path)) dirname(normalizePath(script_path)) else file.path("testdata", "mock_river") +out_path <- file.path(out_dir, "mock1_y_shape.rds") +dir.create(dirname(out_path), recursive = TRUE, showWarnings = FALSE) +saveRDS(out, out_path) + +coords <- sf::st_coordinates(sl_sf) +coords_xy <- data.frame(x = coords[, "X"], y = coords[, "Y"]) + +cat("Segments:", nrow(sl_sf), "\n") +cat("Total vertices:", nrow(coords_xy), "\n") +cat("Unique vertices:", nrow(unique(coords_xy)), "\n") diff --git a/testdata/mock_river/gen_mock2_tree.R b/testdata/mock_river/gen_mock2_tree.R new file mode 100644 index 0000000..37a26cf --- /dev/null +++ b/testdata/mock_river/gen_mock2_tree.R @@ -0,0 +1,89 @@ +#!/usr/bin/env Rscript + +suppressPackageStartupMessages({ + library(sf) + library(terra) + library(sp) + library(raster) +}) + +set.seed(123) + +make_segment_coords <- function(start_xy, end_xy, n_vertices) { + t <- seq(0, 1, length.out = n_vertices) + x <- start_xy[1] + (end_xy[1] - start_xy[1]) * t + y <- start_xy[2] + (end_xy[2] - start_xy[2]) * t + + if (n_vertices > 2) { + idx <- 2:(n_vertices - 1) + x[idx] <- x[idx] + runif(length(idx), min = -3, max = 3) + y[idx] <- y[idx] + runif(length(idx), min = -3, max = 3) + } + + cbind(x = x, y = y) +} + +segments <- data.frame( + seg = paste0("Seg", 1:7), + name = c("L1a", "L1b", "R1a", "R1b", "L2", "R2", "Main"), + n_vertices = c(10, 10, 10, 10, 8, 8, 12), + stringsAsFactors = FALSE +) + +starts <- list( + c(0, 200), + c(50, 200), + c(100, 200), + c(150, 200), + c(25, 150), + c(75, 150), + c(50, 100) +) + +ends <- list( + c(25, 150), + c(25, 150), + c(75, 150), + c(75, 150), + c(50, 100), + c(50, 100), + c(50, 0) +) + +geom <- vector("list", nrow(segments)) +for (i in seq_len(nrow(segments))) { + coords <- make_segment_coords(starts[[i]], ends[[i]], segments$n_vertices[i]) + geom[[i]] <- sf::st_linestring(coords) +} + +sl_sf <- sf::st_sf( + segments, + geometry = sf::st_sfc(geom, crs = sf::st_crs(NA)) +) + +dem <- terra::rast( + nrows = 20, + ncols = 20, + ext = terra::ext(-10, 160, -10, 210) +) +xy <- terra::xyFromCell(dem, 1:terra::ncell(dem)) +terra::values(dem) <- xy[, 2] + +sl_sp <- as(sf::st_geometry(sl_sf), "Spatial") +dem_rl <- raster::raster(dem) + +out <- list( + sl = sl_sp, + dem = dem_rl, + desc = "Tree 7 segments 2-level confluence" +) + +out_path <- file.path("testdata", "mock_river", "mock2_tree.rds") +dir.create(dirname(out_path), recursive = TRUE, showWarnings = FALSE) +saveRDS(out, out_path) + +cat("Saved:", normalizePath(out_path, winslash = "/", mustWork = FALSE), "\n\n") +cat("SpatialLines summary:\n") +print(summary(out$sl)) +cat("\nDEM summary:\n") +print(summary(out$dem)) diff --git a/testdata/mock_river/gen_mock3_qhh_subset.R b/testdata/mock_river/gen_mock3_qhh_subset.R new file mode 100644 index 0000000..7b5723c --- /dev/null +++ b/testdata/mock_river/gen_mock3_qhh_subset.R @@ -0,0 +1,185 @@ +#!/usr/bin/env Rscript + +suppressPackageStartupMessages({ + library(sf) + library(terra) + library(sp) + library(raster) +}) + +stm_path <- file.path("runs", "qhh", "baseline", "DataPre", "pcs", "stm.shp") +dem_path <- file.path("runs", "qhh", "baseline", "DataPre", "pcs", "dem.tif") +out_path <- file.path("testdata", "mock_river", "mock3_qhh_subset.rds") + +if (!file.exists(stm_path)) { + stop("Missing stm.shp: ", stm_path) +} +if (!file.exists(dem_path)) { + stop("Missing dem.tif: ", dem_path) +} + +message("Reading stm: ", stm_path) +stm <- sf::st_read(stm_path, quiet = TRUE) + +message("Reading DEM (lazy): ", dem_path) +dem <- terra::rast(dem_path) + +crs_stm <- sf::st_crs(stm) +if (is.na(crs_stm)) { + message("stm.shp CRS is NA; setting EPSG:32647 (UTM 47N).") + sf::st_crs(stm) <- 32647 +} else if (is.na(crs_stm$epsg)) { + message("stm.shp CRS EPSG is NA; setting EPSG:32647 (UTM 47N).") + sf::st_crs(stm) <- 32647 +} + +if (sf::st_is_longlat(stm)) { + stop("stm.shp CRS appears to be geographic (lon/lat); expected PCS UTM (meters).") +} + +stm$..len_m <- as.numeric(sf::st_length(stm)) +seed_idx <- which.max(stm$..len_m) + +min_n <- 15L +max_n <- 20L +target_n <- 18L + +expand_connected <- function(stm_sf, seed, buffer_m, target_n, min_n, max_n, max_iter = 80L) { + selected <- seed + stop_n <- max(min_n, min(max_n, target_n)) + + for (iter in seq_len(max_iter)) { + if (length(selected) >= stop_n) break + + endpoints <- sf::st_cast( + sf::st_boundary(sf::st_geometry(stm_sf[selected, ])), + "POINT" + ) + endpoints_buf <- sf::st_buffer(endpoints, dist = buffer_m) + + hits <- lengths(sf::st_intersects(stm_sf, endpoints_buf)) > 0 + cand <- setdiff(which(hits), selected) + if (length(cand) == 0) break + + cand <- cand[order(stm_sf$..len_m[cand], decreasing = TRUE)] + space <- stop_n - length(selected) + if (space <= 0) break + + add <- cand[seq_len(min(length(cand), space))] + selected <- unique(c(selected, add)) + } + + selected +} + +select_by_bbox <- function(stm_sf, seed, target_n, min_n, max_n) { + seed_geom <- sf::st_geometry(stm_sf[seed, ]) + base_bbox <- sf::st_bbox(seed_geom) + bbox_poly <- function(bbox) sf::st_as_sfc(bbox) + + deltas <- c(200, 300, 500, 800, 1200, 1800, 2500, 3500, 5000, 8000) + counts <- integer(length(deltas)) + picks <- vector("list", length(deltas)) + + for (i in seq_along(deltas)) { + d <- deltas[i] + bb <- base_bbox + bb[c("xmin", "ymin")] <- bb[c("xmin", "ymin")] - d + bb[c("xmax", "ymax")] <- bb[c("xmax", "ymax")] + d + hit <- lengths(sf::st_intersects(stm_sf, bbox_poly(bb))) > 0 + idx <- which(hit) + counts[i] <- length(idx) + picks[[i]] <- idx + } + + ok <- which(counts >= min_n & counts <= max_n) + if (length(ok) > 0) { + i <- ok[which.min(abs(counts[ok] - target_n))] + return(picks[[i]]) + } + + i <- which.min(abs(counts - target_n)) + idx <- picks[[i]] + if (length(idx) > max_n) { + idx <- idx[order(stm_sf$..len_m[idx], decreasing = TRUE)][seq_len(max_n)] + } + idx +} + +message("Selecting subset from ", nrow(stm), " segments...") +buffer_candidates <- c(5, 10, 25, 50) + +selected_idx <- integer() +for (buffer_m in buffer_candidates) { + message(" trying connected expansion with buffer=", buffer_m, "m ...") + idx <- expand_connected(stm, seed_idx, buffer_m, target_n, min_n, max_n) + message(" got ", length(idx), " segments") + if (length(idx) >= min_n) { + selected_idx <- idx + break + } +} + +if (length(selected_idx) < min_n) { + warning("Connected expansion did not reach ", min_n, " segments; falling back to bbox selection.") + selected_idx <- select_by_bbox(stm, seed_idx, target_n, min_n, max_n) + message(" bbox selection got ", length(selected_idx), " segments") +} + +if (length(selected_idx) > max_n) { + selected_idx <- selected_idx[seq_len(max_n)] +} + +stm_sub <- stm[selected_idx, ] +stm_sub$..len_m <- NULL + +if (nrow(stm_sub) < min_n || nrow(stm_sub) > max_n) { + warning("Final subset has ", nrow(stm_sub), " segments (expected ", min_n, "-", max_n, ").") +} + +bbox <- sf::st_bbox(stm_sub) +expand_m <- 500 +bbox_exp <- bbox +bbox_exp[c("xmin", "ymin")] <- bbox_exp[c("xmin", "ymin")] - expand_m +bbox_exp[c("xmax", "ymax")] <- bbox_exp[c("xmax", "ymax")] + expand_m + +ext_sub <- terra::ext( + as.numeric(bbox_exp["xmin"]), + as.numeric(bbox_exp["xmax"]), + as.numeric(bbox_exp["ymin"]), + as.numeric(bbox_exp["ymax"]) +) + +message("Cropping DEM to subset bbox + ", expand_m, "m ...") +dem_sub <- terra::crop(dem, ext_sub, snap = "out") +if (terra::nlyr(dem_sub) != 1) { + dem_sub <- dem_sub[[1]] +} + +sl_sp <- as(sf::st_geometry(stm_sub), "Spatial") +sl <- as(sl_sp, "SpatialLines") + +dem_rl <- raster::raster(dem_sub) +if (!raster::inMemory(dem_rl)) { + dem_rl <- raster::readAll(dem_rl) +} + +out <- list( + sl = sl, + dem = dem_rl, + desc = "QHH subset ~15-20 segments", + n_segments = nrow(stm_sub) +) + +dir.create(dirname(out_path), recursive = TRUE, showWarnings = FALSE) +saveRDS(out, out_path) + +coords <- sf::st_coordinates(stm_sub) +total_vertices <- nrow(coords) + +cat("Saved:", normalizePath(out_path, winslash = "/", mustWork = FALSE), "\n") +cat("Summary\n") +cat(" n_segments :", out$n_segments, "\n") +cat(" total_vertices:", total_vertices, "\n") +cat(" bbox :", "\n") +print(bbox) diff --git a/testdata/mock_river/gen_report_figures.R b/testdata/mock_river/gen_report_figures.R new file mode 100644 index 0000000..b56ea70 --- /dev/null +++ b/testdata/mock_river/gen_report_figures.R @@ -0,0 +1,604 @@ +#!/usr/bin/env Rscript + +options(stringsAsFactors = FALSE) + +suppressPackageStartupMessages({ + library(sf) + library(terra) + library(sp) + library(raster) +}) + +get_script_dir <- function() { + args <- commandArgs(trailingOnly = FALSE) + file_arg <- grep("^--file=", args, value = TRUE) + if (length(file_arg) < 1) { + return(normalizePath(getwd())) + } + script_path <- sub("^--file=", "", file_arg[1]) + dirname(normalizePath(script_path)) +} + +# ---------------------------------------------------------------------- +# Helper functions copied from test_index_mismatch.R +# ---------------------------------------------------------------------- + +rowMatch <- function(x, m) { + n <- length(x) + nc <- ncol(m) + if (n != nc) { + return(FALSE) + } + y <- m * 1 + for (i in seq_len(nc)) { + y[, i] <- (m[, i] - x[i]) + } + apply(y, 1, FUN = function(x) { + all(x == 0) + }) +} + +extractCoords <- function(x, unique = TRUE, aslist = FALSE) { + spl <- methods::as(x, "SpatialLines") + spp <- methods::as(spl, "SpatialPoints") + pts <- sp::coordinates(spp) + if (unique) { + return(unique(pts)) + } + pts +} + +xy2ID <- function(xy, coord) { + if (!(is.matrix(xy) || is.data.frame(xy))) { + xy <- matrix(xy, ncol = 2) + } + ng <- nrow(xy) + id <- rep(0, ng) + for (i in seq_len(ng)) { + dd <- which(rowMatch(xy[i, ], coord)) + if (length(dd) > 0) { + id[i] <- dd + } + } + id +} + +NodeIDList <- function(sp, coord = extractCoords(sp, unique = TRUE)) { + pt.list <- unlist(sp::coordinates(sp), recursive = FALSE) + lapply(pt.list, function(x) { + xy2ID(x, coord = coord) + }) +} + +FromToNode <- function(sp, coord = extractCoords(sp, unique = TRUE), simplify = TRUE) { + if (simplify) { + ext <- raster::extent(sp) + tol <- (ext[2] - ext[1]) * 0.01 + old_s2 <- sf::sf_use_s2() + on.exit(suppressMessages(sf::sf_use_s2(old_s2)), add = TRUE) + suppressMessages(sf::sf_use_s2(FALSE)) + sp.sf <- sf::st_as_sf(sp) + sp.sf <- sf::st_simplify(sp.sf, preserveTopology = FALSE, dTolerance = tol) + sp <- methods::as(sp.sf, "Spatial") + } + id.list <- NodeIDList(sp, coord = coord) + frto <- cbind( + unlist(lapply(id.list, function(x) x[1])), + unlist(lapply(id.list, function(x) x[length(x)])) + ) + frto <- cbind(seq_len(length(sp)), frto) + colnames(frto) <- c("ID", "FrNode", "ToNode") + rbind(frto) +} + +simplify_sp_lines <- function(sp) { + ext <- raster::extent(sp) + tol <- (ext[2] - ext[1]) * 0.01 + old_s2 <- sf::sf_use_s2() + on.exit(suppressMessages(sf::sf_use_s2(old_s2)), add = TRUE) + suppressMessages(sf::sf_use_s2(FALSE)) + sp.sf <- sf::st_as_sf(sp) + sp.sf <- sf::st_simplify(sp.sf, preserveTopology = FALSE, dTolerance = tol) + methods::as(sp.sf, "Spatial") +} + +get_segment_parts <- function(sp) { + sl <- methods::as(sp, "SpatialLines") + lapply(sl@lines, function(one_line) { + lapply(one_line@Lines, function(seg) { + seg@coords[, 1:2, drop = FALSE] + }) + }) +} + +segment_endpoints <- function(sp) { + seg_parts <- get_segment_parts(sp) + first_xy <- do.call(rbind, lapply(seg_parts, function(parts) { + parts[[1]][1, 1:2, drop = FALSE] + })) + last_xy <- do.call(rbind, lapply(seg_parts, function(parts) { + p <- parts[[length(parts)]] + p[nrow(p), 1:2, drop = FALSE] + })) + colnames(first_xy) <- c("X", "Y") + colnames(last_xy) <- c("X", "Y") + list(from = first_xy, to = last_xy) +} + +segment_label_points <- function(sp) { + seg_parts <- get_segment_parts(sp) + do.call(rbind, lapply(seg_parts, function(parts) { + pts <- do.call(rbind, parts) + c(mean(pts[, 1]), mean(pts[, 2])) + })) +} + +segment_lengths <- function(sp) { + seg_parts <- get_segment_parts(sp) + sapply(seg_parts, function(parts) { + sum(sapply(parts, function(coords) { + if (nrow(coords) < 2) { + return(0) + } + dxy <- coords[-1, , drop = FALSE] - coords[-nrow(coords), , drop = FALSE] + sum(sqrt(rowSums(dxy^2))) + })) + }) +} + +format_pct <- function(x, digits = 1) { + sprintf(paste0("%.", digits, "f%%"), x * 100) +} + +open_png <- function(path, width, height, res = 150) { + os_name <- tolower(Sys.info()["sysname"]) + + if (grepl("darwin", os_name)) { + try_quartz <- try( + quartz( + file = path, + type = "png", + width = width / res, + height = height / res, + dpi = res + ), + silent = TRUE + ) + if (!inherits(try_quartz, "try-error")) { + return(invisible(TRUE)) + } + } + + ok <- try( + png(filename = path, width = width, height = height, units = "px", res = res, type = "cairo"), + silent = TRUE + ) + if (!inherits(ok, "try-error")) { + return(invisible(TRUE)) + } + + png(filename = path, width = width, height = height, units = "px", res = res) + invisible(TRUE) +} + +set_png_dpi <- function(paths, res = 150) { + sips_bin <- Sys.which("sips") + if (!nzchar(sips_bin)) { + return(invisible(FALSE)) + } + + for (p in paths) { + if (!file.exists(p)) { + next + } + suppressWarnings(system2( + sips_bin, + args = c( + "--setProperty", "dpiWidth", as.character(res), + "--setProperty", "dpiHeight", as.character(res), + p + ), + stdout = FALSE, + stderr = FALSE + )) + } + invisible(TRUE) +} + +pad_range <- function(x, frac = 0.08) { + xr <- range(x, finite = TRUE) + if (length(xr) < 2 || !all(is.finite(xr))) { + return(c(0, 1)) + } + span <- xr[2] - xr[1] + if (span <= 0) { + pad <- max(1, abs(xr[1]) * frac) + return(c(xr[1] - pad, xr[2] + pad)) + } + c(xr[1] - span * frac, xr[2] + span * frac) +} + +map_diff_color <- function(x) { + pal <- grDevices::colorRampPalette(c("#2DC937", "#E7B416", "#CC3232"))(100) + out <- rep("grey60", length(x)) + ok <- is.finite(x) + if (!any(ok)) { + return(out) + } + xr <- range(x[ok]) + if (abs(xr[2] - xr[1]) < 1e-12) { + out[ok] <- pal[60] + return(out) + } + idx <- floor((x[ok] - xr[1]) / (xr[2] - xr[1]) * 99) + 1 + idx <- pmin(100, pmax(1, idx)) + out[ok] <- pal[idx] + out +} + +case_specs <- list( + list(id = "mock1_y_shape", file = "mock1_y_shape.rds", short = "Mock1", label = "Mock1: Y-shape"), + list(id = "mock2_tree", file = "mock2_tree.rds", short = "Mock2", label = "Mock2: Tree"), + list(id = "mock3_qhh_subset", file = "mock3_qhh_subset.rds", short = "Mock3", label = "Mock3: QHH subset") +) + +script_dir <- get_script_dir() +mock_dir <- script_dir + +calc_case_metrics <- function(spec) { + path <- file.path(mock_dir, spec$file) + if (!file.exists(path)) { + stop("Missing input file: ", path) + } + dat <- readRDS(path) + sl <- dat$sl + dem <- dat$dem + + xy_orig <- extractCoords(sl) + xy_simp <- extractCoords(simplify_sp_lines(sl)) + ft_buggy <- FromToNode(sl, simplify = TRUE)[, 2:3, drop = FALSE] + wrong_from <- xy_orig[ft_buggy[, 1], , drop = FALSE] + wrong_to <- xy_orig[ft_buggy[, 2], , drop = FALSE] + + ends <- segment_endpoints(sl) + right_from <- ends$from + right_to <- ends$to + + eps <- 1e-9 + mismatch_flag <- rowSums(abs(wrong_from - right_from) > eps) > 0 + mismatch_row_rate <- sum(mismatch_flag) / nrow(ft_buggy) + + z_right_from <- raster::extract(dem, right_from) + z_wrong_from <- raster::extract(dem, wrong_from) + z_right_to <- raster::extract(dem, right_to) + z_wrong_to <- raster::extract(dem, wrong_to) + + seg_len <- segment_lengths(sl) + seg_len[seg_len <= 0] <- NA_real_ + + fixed_slope <- (z_right_from - z_right_to) / seg_len + buggy_slope <- (z_wrong_from - z_wrong_to) / seg_len + + data.frame( + id = spec$id, + short = spec$short, + label = spec$label, + n_segments = length(sl), + N_orig = nrow(xy_orig), + M_simp = nrow(xy_simp), + mismatch_row_rate = mismatch_row_rate, + stringsAsFactors = FALSE + ) -> summary_row + + list( + spec = spec, + sl = sl, + dem = dem, + summary = summary_row, + label_points = segment_label_points(sl), + wrong_from = wrong_from, + wrong_to = wrong_to, + right_from = right_from, + right_to = right_to, + mismatch_flag = mismatch_flag, + z_right_from = z_right_from, + z_wrong_from = z_wrong_from, + fixed_slope = fixed_slope, + buggy_slope = buggy_slope, + slope_abs_diff = abs(fixed_slope - buggy_slope), + mean_abs_elev_diff = mean(abs(z_wrong_from - z_right_from), na.rm = TRUE) + ) +} + +case_data <- lapply(case_specs, calc_case_metrics) + +fig1_path <- file.path(mock_dir, "fig1_river_overview.png") +open_png(fig1_path, width = 1200, height = 400, res = 150) +par(mfrow = c(1, 3), mar = c(3.5, 3.5, 3.5, 1), mgp = c(2.1, 0.7, 0), cex.main = 0.95) +for (one in case_data) { + n_seg <- one$summary$n_segments[1] + seg_cols <- grDevices::rainbow(n_seg, s = 0.85, v = 0.9) + plot( + one$sl, + col = seg_cols, + lwd = 2, + axes = TRUE, + xlab = "X", + ylab = "Y", + main = sprintf("%s\nSegments=%d", one$spec$label, n_seg) + ) + text( + x = one$label_points[, 1], + y = one$label_points[, 2], + labels = seq_len(n_seg), + col = seg_cols, + cex = 0.75, + pos = 3 + ) +} +dev.off() + +fig2_paths <- c( + mock1_y_shape = file.path(mock_dir, "fig2a_mismatch_mock1.png"), + mock2_tree = file.path(mock_dir, "fig2b_mismatch_mock2.png"), + mock3_qhh_subset = file.path(mock_dir, "fig2c_mismatch_mock3.png") +) + +for (one in case_data) { + out_path <- fig2_paths[[one$spec$id]] + open_png(out_path, width = 900, height = 700, res = 150) + par(mar = c(3.8, 3.8, 4, 1), mgp = c(2.2, 0.7, 0)) + plot( + one$sl, + col = "grey70", + lwd = 2, + axes = TRUE, + xlab = "X", + ylab = "Y", + main = sprintf( + "%s From-node Index Mismatch\nMismatch rate=%s (%d/%d)", + one$spec$label, + format_pct(one$summary$mismatch_row_rate[1], digits = 1), + sum(one$mismatch_flag), + one$summary$n_segments[1] + ) + ) + + idx_arrow <- which(one$mismatch_flag) + if (length(idx_arrow) > 0) { + for (k in idx_arrow) { + arrows( + x0 = one$right_from[k, 1], + y0 = one$right_from[k, 2], + x1 = one$wrong_from[k, 1], + y1 = one$wrong_from[k, 2], + col = "red3", + lwd = 1, + lty = 2, + length = 0.08 + ) + } + } + + points(one$right_from[, 1], one$right_from[, 2], pch = 21, bg = "chartreuse3", col = "darkgreen", cex = 1.1) + points(one$wrong_from[, 1], one$wrong_from[, 2], pch = 4, col = "red3", cex = 1.2, lwd = 1.4) + text(one$right_from[, 1], one$right_from[, 2], labels = seq_len(one$summary$n_segments[1]), pos = 3, cex = 0.65) + + legend( + "topleft", + legend = c("River segments", "Correct From endpoint", "Wrong From endpoint", "Offset arrow (correct->wrong)"), + col = c("grey60", "darkgreen", "red3", "red3"), + lty = c(1, NA, NA, 2), + lwd = c(2, NA, 1.4, 1), + pch = c(NA, 21, 4, NA), + pt.bg = c(NA, "chartreuse3", NA, NA), + bty = "n", + cex = 0.85 + ) + dev.off() +} + +fig3_path <- file.path(mock_dir, "fig3_elevation_diff.png") +open_png(fig3_path, width = 1200, height = 500, res = 150) +par(mfrow = c(1, 3), mar = c(4.5, 4, 3.8, 1), mgp = c(2.2, 0.7, 0), cex.main = 0.9) +for (one in case_data) { + z_mat <- rbind(one$z_right_from, one$z_wrong_from) + colnames(z_mat) <- seq_len(ncol(z_mat)) + ylim <- pad_range(z_mat, frac = 0.12) + barplot( + z_mat, + beside = TRUE, + col = c("palegreen3", "tomato2"), + border = NA, + ylim = ylim, + names.arg = seq_len(one$summary$n_segments[1]), + xlab = "Segment ID", + ylab = "Elevation (m)", + main = sprintf("%s\nMean elev. diff=%.2f m", one$spec$label, one$mean_abs_elev_diff) + ) + legend( + "topright", + legend = c("Correct elev. (z_right)", "Wrong elev. (z_wrong)"), + fill = c("palegreen3", "tomato2"), + bty = "n", + cex = 0.8 + ) +} +dev.off() + +fig4_path <- file.path(mock_dir, "fig4_slope_comparison.png") +open_png(fig4_path, width = 1200, height = 500, res = 150) +par(mfrow = c(1, 3), mar = c(4.2, 4.2, 3.6, 1), mgp = c(2.2, 0.7, 0), cex.main = 0.92) +for (one in case_data) { + keep <- is.finite(one$fixed_slope) & is.finite(one$buggy_slope) + if (!any(keep)) { + plot.new() + title(main = sprintf("%s\nSlope comparison (no valid points)", one$spec$label)) + next + } + xs <- one$fixed_slope[keep] + ys <- one$buggy_slope[keep] + ds <- one$slope_abs_diff[keep] + cols <- map_diff_color(ds) + + xy_lim <- pad_range(c(xs, ys), frac = 0.08) + plot( + xs, + ys, + pch = 16, + col = cols, + xlim = xy_lim, + ylim = xy_lim, + xlab = "Fixed slope", + ylab = "Buggy slope", + main = sprintf("%s\nSlope scatter comparison", one$spec$label) + ) + abline(a = 0, b = 1, lty = 2, lwd = 1.2, col = "black") + legend( + "topleft", + legend = c("1:1 reference", "Small diff", "Large diff"), + lty = c(2, NA, NA), + pch = c(NA, 16, 16), + col = c("black", "#2DC937", "#CC3232"), + bty = "n", + cex = 0.8 + ) +} +dev.off() + +summary_df <- do.call(rbind, lapply(case_data, function(one) one$summary)) +summary_df$ratio_NM <- summary_df$N_orig / summary_df$M_simp + +fig5_path <- file.path(mock_dir, "fig5_summary.png") +open_png(fig5_path, width = 800, height = 600, res = 150) +par(mfrow = c(2, 1), mar = c(4.2, 4, 3.5, 1), mgp = c(2.2, 0.7, 0)) + +top_mat <- rbind(summary_df$N_orig, summary_df$M_simp) +colnames(top_mat) <- summary_df$short +bp_top <- barplot( + top_mat, + beside = TRUE, + col = c("steelblue3", "darkorange2"), + border = NA, + ylab = "Row count", + main = "Coord table size: Original N vs Simplified M" +) +text( + x = colMeans(bp_top), + y = apply(top_mat, 2, max) * 1.05, + labels = sprintf("N/M=%.1f", summary_df$ratio_NM), + cex = 0.85 +) +legend( + "topright", + legend = c("N_orig", "M_simp"), + fill = c("steelblue3", "darkorange2"), + bty = "n", + cex = 0.85 +) + +bot_vals <- summary_df$mismatch_row_rate * 100 +bp_bot <- barplot( + matrix(bot_vals, nrow = 1), + beside = TRUE, + col = "firebrick2", + border = NA, + names.arg = summary_df$short, + ylim = c(0, max(bot_vals) * 1.25), + ylab = "Mismatch rate (%)", + main = "FromToNode Row-level Index Mismatch Rate" +) +text( + x = bp_bot, + y = bot_vals + max(1, max(bot_vals) * 0.05), + labels = sprintf("%.1f%%", bot_vals), + cex = 0.9 +) +legend("topright", legend = "mismatch_row_rate", fill = "firebrick2", bty = "n", cex = 0.85) +dev.off() + +draw_topology_panel <- function(one, from_xy, to_xy, row_label) { + n_seg <- one$summary$n_segments[1] + seg_cols <- grDevices::rainbow(n_seg, s = 0.85, v = 0.9) + + panel_title <- if (identical(row_label, "Buggy")) { + sprintf( + "%s - Buggy\nMismatch rate=%s", + one$spec$short, + format_pct(one$summary$mismatch_row_rate[1], digits = 1) + ) + } else { + sprintf("%s - Fixed", one$spec$short) + } + + plot( + one$sl, + col = "grey70", + lwd = 2, + axes = TRUE, + xlab = "X", + ylab = "Y", + main = panel_title + ) + + ok <- is.finite(from_xy[, 1]) & is.finite(from_xy[, 2]) & is.finite(to_xy[, 1]) & is.finite(to_xy[, 2]) + if (any(ok)) { + arrows( + x0 = from_xy[ok, 1], + y0 = from_xy[ok, 2], + x1 = to_xy[ok, 1], + y1 = to_xy[ok, 2], + col = seg_cols[ok], + lwd = 1.5, + length = 0.08 + ) + + mid_x <- (from_xy[ok, 1] + to_xy[ok, 1]) / 2 + mid_y <- (from_xy[ok, 2] + to_xy[ok, 2]) / 2 + text( + x = mid_x, + y = mid_y, + labels = which(ok), + col = seg_cols[ok], + cex = 0.7, + pos = 3 + ) + } + + points(from_xy[, 1], from_xy[, 2], pch = 21, bg = "chartreuse3", col = "darkgreen", cex = 1.0) + points(to_xy[, 1], to_xy[, 2], pch = 24, bg = "dodgerblue2", col = "navy", cex = 1.0) +} + +fig6_path <- file.path(mock_dir, "fig6_topology_comparison.png") +open_png(fig6_path, width = 1200, height = 800, res = 150) +par(mfrow = c(2, 3), mar = c(3.6, 3.6, 3.8, 1.2), mgp = c(2.2, 0.7, 0), cex.main = 0.92) + +for (one in case_data) { + draw_topology_panel(one, one$wrong_from, one$wrong_to, row_label = "Buggy") +} +for (one in case_data) { + draw_topology_panel(one, one$right_from, one$right_to, row_label = "Fixed") +} +dev.off() + +cat("Generated figures:\n") +all_fig_paths <- c( + fig1_path, + fig2_paths[["mock1_y_shape"]], + fig2_paths[["mock2_tree"]], + fig2_paths[["mock3_qhh_subset"]], + fig3_path, + fig4_path, + fig5_path, + fig6_path +) +set_png_dpi(all_fig_paths, res = 150) + +cat(" -", fig1_path, "\n") +cat(" -", fig2_paths[["mock1_y_shape"]], "\n") +cat(" -", fig2_paths[["mock2_tree"]], "\n") +cat(" -", fig2_paths[["mock3_qhh_subset"]], "\n") +cat(" -", fig3_path, "\n") +cat(" -", fig4_path, "\n") +cat(" -", fig5_path, "\n") +cat(" -", fig6_path, "\n") diff --git a/testdata/mock_river/mismatch_report.json b/testdata/mock_river/mismatch_report.json new file mode 100644 index 0000000..e78f02e --- /dev/null +++ b/testdata/mock_river/mismatch_report.json @@ -0,0 +1,108 @@ +{ + "run_info": { + "time": "2026-02-12 12:14:49 +0800", + "r_version": "R version 4.5.0 (2025-04-11)", + "sf_use_s2": true + }, + "cases": { + "mock1_y_shape": { + "case": "mock1_y_shape", + "input": "mock1_y_shape.rds", + "n_segments": 3, + "N_orig": 19, + "M_simp": 16, + "mismatch_component_rate": 1.3333333333, + "mismatch_row_rate": 0.66666666667, + "elev_diff_wrong_right": { + "n": 3, + "n_na": 0, + "min": -44, + "p05": -39.6, + "median": 0, + "mean": -7.3333333333, + "p95": 19.8, + "max": 22, + "sd": 33.6055550963 + }, + "fixed_endpoints_ok": true, + "fixed_max_abs_diff": 0, + "elev_diff_fixed_right": { + "n": 3, + "n_na": 0, + "min": 0, + "p05": 0, + "median": 0, + "mean": 0, + "p95": 0, + "max": 0, + "sd": 0 + } + }, + "mock2_tree": { + "case": "mock2_tree", + "input": "mock2_tree.rds", + "n_segments": 7, + "N_orig": 62, + "M_simp": 36, + "mismatch_component_rate": 1.7142857143, + "mismatch_row_rate": 0.85714285714, + "elev_diff_wrong_right": { + "n": 7, + "n_na": 0, + "min": -44, + "p05": -40.7, + "median": 0, + "mean": 1.5714285714, + "p95": 52.8, + "max": 66, + "sd": 36.7190102574 + }, + "fixed_endpoints_ok": true, + "fixed_max_abs_diff": 0, + "elev_diff_fixed_right": { + "n": 7, + "n_na": 0, + "min": 0, + "p05": 0, + "median": 0, + "mean": 0, + "p95": 0, + "max": 0, + "sd": 0 + } + }, + "mock3_qhh_subset": { + "case": "mock3_qhh_subset", + "input": "mock3_qhh_subset.rds", + "n_segments": 18, + "N_orig": 1827, + "M_simp": 50, + "mismatch_component_rate": 1.8888888889, + "mismatch_row_rate": 0.94444444444, + "elev_diff_wrong_right": { + "n": 18, + "n_na": 0, + "min": 0, + "p05": 249.9, + "median": 472.5, + "mean": 456.5555555556, + "p95": 637.15, + "max": 638, + "sd": 148.0235363726 + }, + "fixed_endpoints_ok": true, + "fixed_max_abs_diff": 0, + "elev_diff_fixed_right": { + "n": 18, + "n_na": 0, + "min": 0, + "p05": 0, + "median": 0, + "mean": 0, + "p95": 0, + "max": 0, + "sd": 0 + } + } + } +} diff --git a/testdata/mock_river/mock1_y_shape.rds b/testdata/mock_river/mock1_y_shape.rds new file mode 100644 index 0000000..fddc92c Binary files /dev/null and b/testdata/mock_river/mock1_y_shape.rds differ diff --git a/testdata/mock_river/mock2_tree.rds b/testdata/mock_river/mock2_tree.rds new file mode 100644 index 0000000..4095fcb Binary files /dev/null and b/testdata/mock_river/mock2_tree.rds differ diff --git a/testdata/mock_river/mock3_qhh_subset.rds b/testdata/mock_river/mock3_qhh_subset.rds new file mode 100644 index 0000000..408659c Binary files /dev/null and b/testdata/mock_river/mock3_qhh_subset.rds differ diff --git a/testdata/mock_river/review_mock1.txt b/testdata/mock_river/review_mock1.txt new file mode 100644 index 0000000..6e78475 --- /dev/null +++ b/testdata/mock_river/review_mock1.txt @@ -0,0 +1,6 @@ +Review: testdata/mock_river/mock1_y_shape.rds +1. [PASS] 3 segments + 7 vertices each + Y-topology endpoints (Seg1->(50,50), Seg2->(50,50), Seg3->(50,0)); observed segments=3, vertices=7,7,7 +2. [PASS] DEM is 10x10, extent (-5,105,-5,105), elevation=y; observed dim=10x10 extent=(-5,105,-5,105) value_equals_y=TRUE +3. [PASS] set.seed(42) reproducibility; seed_declared=TRUE geometry_matches_seed42=TRUE +4. [PASS] sf->sp conversion correct; class=SpatialLines:TRUE roundtrip_geometry_equal=TRUE +5. [PASS] st_simplify reduces vertex count; before=21 after(dTolerance=2)=9 diff --git a/testdata/mock_river/review_mock2.txt b/testdata/mock_river/review_mock2.txt new file mode 100644 index 0000000..cc896e6 --- /dev/null +++ b/testdata/mock_river/review_mock2.txt @@ -0,0 +1,5 @@ +1. 7 segments, 2-level confluence tree topology: PASS +2. Vertex counts (4×10 + 2×8 + 1×12 = 68): PASS +3. DEM 20x20, extent (-10,160,-10,210), elevation = y: PASS +4. set.seed(123) reproducibility: PASS +5. Topology correct (Seg1..Seg7 confluences/endpoints): PASS diff --git a/testdata/mock_river/review_mock3.txt b/testdata/mock_river/review_mock3.txt new file mode 100644 index 0000000..799229d --- /dev/null +++ b/testdata/mock_river/review_mock3.txt @@ -0,0 +1,12 @@ +1) Reads stm.shp and dem.tif from runs/qhh/baseline/DataPre/pcs/: PASS +2) Subset selection (seed longest, connectivity expand, bbox fallback): PASS +3) CRS handling EPSG:32647 UTM 47N: PASS +4) DEM cropped with 500m buffer: PASS +5) DEM loaded into memory (readAll) for self-contained .rds: PASS +6) n_segments in [15,20], no rgeos/rgdal dependency: PASS + +n_segments=18 +dem_inMemory=TRUE +sl_proj4=+proj=utm +zone=47 +datum=WGS84 +units=m +no_defs +dem_extent=xmin 413907.991641 xmax 463795.995540 ymin 4164582.189220 ymax 4206622.014451 +expanded_sl_bbox_500m=xmin 413909.635604 xmax 463788.508148 ymin 4164596.832571 ymax 4206598.689899 diff --git a/testdata/mock_river/review_report.txt b/testdata/mock_river/review_report.txt new file mode 100644 index 0000000..f437281 --- /dev/null +++ b/testdata/mock_river/review_report.txt @@ -0,0 +1,31 @@ +Review of testdata/mock_river/mismatch_report.json +Generated: 2026-02-10 + +1) All 3 cases passed (no error fields) +PASS - No case contains an "error" field; fixed_endpoints_ok=true for all 3 cases. + +2) Mock1: N_orig=19, M_simp=16, mismatch_row_rate≈0.667 +PASS - Observed N_orig=19, M_simp=16, mismatch_row_rate=0.66666666667. + +3) Mock2: N_orig=62, M_simp=36, mismatch_row_rate≈0.857 +PASS - Observed N_orig=62, M_simp=36, mismatch_row_rate=0.85714285714. + +4) Mock3: N_orig=1827, M_simp=50, mismatch_row_rate≈0.944, median elev diff ~472m +PASS - Observed N_orig=1827, M_simp=50, mismatch_row_rate=0.94444444444, elev_diff_wrong_right.median=472.5 m. + +5) All fixed_max_abs_diff = 0 +PASS - fixed_max_abs_diff is 0 for mock1, mock2, and mock3. + +6) All fixed elevation diffs = 0 +PASS - For all cases, elev_diff_fixed_right stats are all zeros (min/p05/median/mean/p95/max/sd = 0). + +7) Mismatch rates increase with N/M ratio (physically expected) +PASS - N/M ratios increase monotonically (19/16=1.1875 < 62/36=1.7222 < 1827/50=36.54), and mismatch_row_rate increases accordingly (0.6667 < 0.8571 < 0.9444). + +8) Mock3 elevation diffs physically plausible for mountainous QHH terrain +PASS - Wrong-vs-right endpoint elevation differences are positive and large (median 472.5 m, p05 249.9 m, p95 637.15 m, max 638 m), which is plausible for steep mountainous terrain. + +9) Conclusive proof: bug exists and fix works +PASS - Pre-fix behavior shows high mismatch rates and large wrong-vs-right elevation discrepancies; post-fix checks show exact endpoint agreement (fixed_max_abs_diff=0 and all fixed elevation differences=0), demonstrating both bug existence and effective fix. + +Overall verdict: PASS (9/9 checklist items passed). diff --git a/testdata/mock_river/review_test_script.txt b/testdata/mock_river/review_test_script.txt new file mode 100644 index 0000000..7af8cb2 --- /dev/null +++ b/testdata/mock_river/review_test_script.txt @@ -0,0 +1,50 @@ +Review target: testdata/mock_river/test_index_mismatch.R +Date: 2026-02-10 + +1) FAIL +- Inlined `extractCoords` cannot be matched against `/Users/danker/Desktop/Hydro-SHUD/rSHUD/R/Func_GIS.R` because that file does not define `extractCoords`. +- `extractCoords` is defined in `/Users/danker/Desktop/Hydro-SHUD/rSHUD/R/Func_Misc.R` (line ~41), and the inlined implementation matches that definition semantically. + +2) FAIL +- Inlined `xy2ID` cannot be matched against `/Users/danker/Desktop/Hydro-SHUD/rSHUD/R/Func_GIS.R` because that file does not define `xy2ID`. +- `xy2ID` is defined in `/Users/danker/Desktop/Hydro-SHUD/rSHUD/R/Func_Misc.R` (line ~63), and the inlined implementation matches that definition semantically. + +3) PASS +- Inlined `FromToNode` matches `/Users/danker/Desktop/Hydro-SHUD/rSHUD/R/GIS_RiverProcess.R` lines 207-227 working copy, including `sf::st_simplify` behavior and `sf_use_s2` handling. + +4) PASS +- Validation1 correctly compares original vs simplified coordinate table sizes: + - `xy_orig <- extractCoords(sl)` + - `xy_simp <- extractCoords(simplify_sp_lines(sl))` + - asserts `N_orig != M_simp`. + +5) PASS +- Validation2 correctly reproduces the mismatch bug scenario by: + - using `FromToNode(sl, simplify = TRUE)` indices, + - confirming indices are in simplified-coordinate range, + - applying those indices to both `xy_orig` and `xy_simp`, + - asserting non-zero mismatch rate. + +6) PASS +- Validation3 correctly verifies the fix by using `FromToNode(sl, coord = xy_orig, simplify = FALSE)` and checking extracted From endpoints against geometry-derived truth (`segment_first_vertices`) with strict tolerance. + +7) PASS +- `segment_first_vertices()` is an appropriate ground truth for From endpoints in this mock dataset: + - extracts first vertex from each segment in `sp::coordinates` order, + - guards against multi-part segments (fails fast if encountered). + +8) PASS +- No `rgeos` / `rgdal` dependency is present in the test script. + +9) PASS +- JSON report is well-structured: + - top-level `run_info` + `cases`, + - each case contains core metrics and stats, + - error-case shape is explicit (`case`, `input`, `error`). + +10) PASS +- Error handling is correctly implemented with `tryCatch` per case, sets `had_error`, records error payload, and exits with non-zero status when any case fails. + +Runtime verification: +- Executed `Rscript testdata/mock_river/test_index_mismatch.R` successfully (exit code 0). +- All three bundled mock cases completed and `mismatch_report.json` was generated. diff --git a/testdata/mock_river/test_index_mismatch.R b/testdata/mock_river/test_index_mismatch.R new file mode 100644 index 0000000..76d3858 --- /dev/null +++ b/testdata/mock_river/test_index_mismatch.R @@ -0,0 +1,303 @@ +#!/usr/bin/env Rscript + +options(stringsAsFactors = FALSE) + +suppressPackageStartupMessages({ + library(sf) + library(sp) + library(raster) + library(jsonlite) +}) + +get_script_dir <- function() { + args <- commandArgs(trailingOnly = FALSE) + file_arg <- grep("^--file=", args, value = TRUE) + if (length(file_arg) < 1) { + return(normalizePath(getwd())) + } + script_path <- sub("^--file=", "", file_arg[1]) + dirname(normalizePath(script_path)) +} + +fmt_num <- function(x, digits = 6) { + if (length(x) != 1 || is.na(x) || !is.finite(x)) { + return("NA") + } + formatC(x, format = "fg", digits = digits) +} + +num_stats <- function(x) { + x <- as.numeric(x) + n_total <- length(x) + n_na <- sum(is.na(x)) + x_ok <- x[!is.na(x)] + if (length(x_ok) < 1) { + return(list(n = n_total, n_na = n_na)) + } + list( + n = n_total, + n_na = n_na, + min = min(x_ok), + p05 = unname(stats::quantile(x_ok, 0.05)), + median = stats::median(x_ok), + mean = mean(x_ok), + p95 = unname(stats::quantile(x_ok, 0.95)), + max = max(x_ok), + sd = stats::sd(x_ok) + ) +} + +stats_one_line <- function(st) { + if (is.null(st$min)) { + return(paste0("n=", st$n, ", n_na=", st$n_na)) + } + paste0( + "n=", st$n, + ", n_na=", st$n_na, + ", min=", fmt_num(st$min), + ", p05=", fmt_num(st$p05), + ", median=", fmt_num(st$median), + ", mean=", fmt_num(st$mean), + ", p95=", fmt_num(st$p95), + ", max=", fmt_num(st$max), + ", sd=", fmt_num(st$sd) + ) +} + +# ---------------------------------------------------------------------- +# Helper functions extracted from old rSHUD workspace (no rgeos/rgdal). +# Do NOT source whole files; only keep minimal functions needed. +# Sources (for reference): +# - /Users/danker/Desktop/Hydro-SHUD/rSHUD/R/Func_Misc.R +# - /Users/danker/Desktop/Hydro-SHUD/rSHUD/R/GIS_RiverProcess.R +# ---------------------------------------------------------------------- + +rowMatch <- function(x, m) { + n <- length(x) + nc <- ncol(m) + if (n != nc) { + return(FALSE) + } + y <- m * 1 + for (i in seq_len(nc)) { + y[, i] <- (m[, i] - x[i]) + } + apply(y, 1, FUN = function(x) { + all(x == 0) + }) +} + +extractCoords <- function(x, unique = TRUE, aslist = FALSE) { + spl <- methods::as(x, "SpatialLines") + spp <- methods::as(spl, "SpatialPoints") + pts <- sp::coordinates(spp) + if (unique) { + return(unique(pts)) + } + pts +} + +xy2ID <- function(xy, coord) { + if (!(is.matrix(xy) || is.data.frame(xy))) { + xy <- matrix(xy, ncol = 2) + } + ng <- nrow(xy) + id <- rep(0, ng) + for (i in seq_len(ng)) { + dd <- which(rowMatch(xy[i, ], coord)) + if (length(dd) > 0) { + id[i] <- dd + } + } + id +} + +NodeIDList <- function(sp, coord = extractCoords(sp, unique = TRUE)) { + pt.list <- unlist(sp::coordinates(sp), recursive = FALSE) + lapply(pt.list, function(x) { + xy2ID(x, coord = coord) + }) +} + +FromToNode <- function(sp, coord = extractCoords(sp, unique = TRUE), simplify = TRUE) { + if (simplify) { + ext <- raster::extent(sp) + tol <- (ext[2] - ext[1]) * 0.01 + old_s2 <- sf::sf_use_s2() + on.exit(suppressMessages(sf::sf_use_s2(old_s2)), add = TRUE) + suppressMessages(sf::sf_use_s2(FALSE)) + sp.sf <- sf::st_as_sf(sp) + sp.sf <- sf::st_simplify(sp.sf, preserveTopology = FALSE, dTolerance = tol) + sp <- methods::as(sp.sf, "Spatial") + } + id.list <- NodeIDList(sp, coord = coord) + frto <- cbind( + unlist(lapply(id.list, function(x) x[1])), + unlist(lapply(id.list, function(x) x[length(x)])) + ) + frto <- cbind(seq_len(length(sp)), frto) + colnames(frto) <- c("ID", "FrNode", "ToNode") + rbind(frto) +} + +simplify_sp_lines <- function(sp) { + ext <- raster::extent(sp) + tol <- (ext[2] - ext[1]) * 0.01 + old_s2 <- sf::sf_use_s2() + on.exit(suppressMessages(sf::sf_use_s2(old_s2)), add = TRUE) + suppressMessages(sf::sf_use_s2(FALSE)) + sp.sf <- sf::st_as_sf(sp) + sp.sf <- sf::st_simplify(sp.sf, preserveTopology = FALSE, dTolerance = tol) + methods::as(sp.sf, "Spatial") +} + +segment_first_vertices <- function(sp) { + sp <- methods::as(sp, "SpatialLines") + pt.list <- unlist(sp::coordinates(sp), recursive = FALSE) + if (length(pt.list) != length(sp)) { + stop("Unexpected SpatialLines structure: multiple Line parts per segment.") + } + ret <- do.call(rbind, lapply(pt.list, function(x) { + x[1, 1:2, drop = FALSE] + })) + colnames(ret) <- c("X", "Y") + ret +} + +assert_true <- function(ok, msg) { + if (!isTRUE(ok)) { + stop(msg, call. = FALSE) + } +} + +run_one_case <- function(case_name, rds_path) { + message("Case: ", case_name, " (", basename(rds_path), ")") + dat <- readRDS(rds_path) + sl <- dat$sl + dem <- dat$dem + + # Validation 1: coord table row count differs after simplify + xy_orig <- extractCoords(sl) + sl_simp_sp <- simplify_sp_lines(sl) + xy_simp <- extractCoords(sl_simp_sp) + N_orig <- nrow(xy_orig) + M_simp <- nrow(xy_simp) + assert_true(N_orig != M_simp, paste0( + "Validation1 failed: nrow(xy_orig) == nrow(xy_simp) == ", N_orig + )) + + # Validation 2: reproduce mismatch (ft index from simplified coord used on original coord) + ft_all <- FromToNode(sl, simplify = TRUE)[, 2:3, drop = FALSE] + assert_true(all(ft_all[, 1] >= 1) && all(ft_all[, 1] <= M_simp), paste0( + "Validation2 failed: ft FrNode index out of range for xy_simp (M_simp=", M_simp, ")" + )) + + wrong_from <- xy_orig[ft_all[, 1], , drop = FALSE] + right_from <- xy_simp[ft_all[, 1], , drop = FALSE] + + mismatch_component_rate <- sum(wrong_from != right_from) / nrow(ft_all) + mismatch_row_rate <- sum(rowSums(wrong_from != right_from) > 0) / nrow(ft_all) + assert_true(mismatch_component_rate > 0, "Validation2 failed: mismatch_component_rate == 0") + + z_wrong <- raster::extract(dem, wrong_from) + z_right <- raster::extract(dem, right_from) + dz_wrong_right <- z_wrong - z_right + + # Validation 3: fixed approach (explicit coord=xy_orig, simplify=FALSE) + ft_fixed <- FromToNode(sl, coord = xy_orig, simplify = FALSE)[, 2:3, drop = FALSE] + fixed_from <- xy_orig[ft_fixed[, 1], , drop = FALSE] + true_from <- segment_first_vertices(sl) + max_abs_diff <- suppressWarnings(max(abs(fixed_from - true_from))) + fixed_endpoints_ok <- is.finite(max_abs_diff) && max_abs_diff < 1e-9 + assert_true(fixed_endpoints_ok, paste0( + "Validation3 failed: fixed_from != segment first vertices (max_abs_diff=", fmt_num(max_abs_diff), ")" + )) + + z_fixed <- raster::extract(dem, fixed_from) + dz_fixed_right <- z_fixed - z_right + + # Print structured report block + cat("\n==============================\n") + cat("Mock river case: ", case_name, "\n", sep = "") + cat("Input: ", basename(rds_path), "\n", sep = "") + cat("Segments: ", length(sl), "\n", sep = "") + cat("Validation1 (coord rows): N_orig=", N_orig, ", M_simp=", M_simp, "\n", sep = "") + cat( + "Validation2 (index mismatch): mismatch_component_rate=", + fmt_num(mismatch_component_rate, digits = 6), + ", mismatch_row_rate=", + fmt_num(mismatch_row_rate, digits = 6), + "\n", + sep = "" + ) + cat("Elevation diff (wrong-right): ", stats_one_line(num_stats(dz_wrong_right)), "\n", sep = "") + cat( + "Validation3 (fixed endpoints): max_abs_diff=", + fmt_num(max_abs_diff, digits = 12), + "\n", + sep = "" + ) + cat("Elevation diff (fixed-right): ", stats_one_line(num_stats(dz_fixed_right)), "\n", sep = "") + cat("==============================\n") + + list( + case = case_name, + input = basename(rds_path), + n_segments = length(sl), + N_orig = N_orig, + M_simp = M_simp, + mismatch_component_rate = mismatch_component_rate, + mismatch_row_rate = mismatch_row_rate, + elev_diff_wrong_right = num_stats(dz_wrong_right), + fixed_endpoints_ok = fixed_endpoints_ok, + fixed_max_abs_diff = max_abs_diff, + elev_diff_fixed_right = num_stats(dz_fixed_right) + ) +} + +script_dir <- get_script_dir() +mock_dir <- script_dir + +cases <- c( + mock1_y_shape = file.path(mock_dir, "mock1_y_shape.rds"), + mock2_tree = file.path(mock_dir, "mock2_tree.rds"), + mock3_qhh_subset = file.path(mock_dir, "mock3_qhh_subset.rds") +) + +results <- list() +had_error <- FALSE + +for (nm in names(cases)) { + rds_path <- cases[[nm]] + if (!file.exists(rds_path)) { + message("Skip: ", nm, " (missing file: ", rds_path, ")") + had_error <- TRUE + next + } + out <- tryCatch( + run_one_case(nm, rds_path), + error = function(e) { + had_error <<- TRUE + message("ERROR in ", nm, ": ", conditionMessage(e)) + list(case = nm, input = basename(rds_path), error = conditionMessage(e)) + } + ) + results[[nm]] <- out +} + +report <- list( + run_info = list( + time = format(Sys.time(), "%Y-%m-%d %H:%M:%S %z"), + r_version = R.version.string, + sf_use_s2 = sf::sf_use_s2() + ), + cases = results +) + +report_path <- file.path(mock_dir, "mismatch_report.json") +jsonlite::write_json(report, report_path, pretty = TRUE, auto_unbox = TRUE, digits = 10) +message("Saved JSON report: ", report_path) + +if (had_error) { + quit(status = 1) +}