Summary
Tile APF0000pqj (obsid ESP_020568_0950) contains a blotch entry with radius_1=450.6, radius_2=356.6. Taken as semi-axes, these give a full extent of roughly 901 × 713 px, larger than the tile itself (840×648 px). This is not an isolated case: 4,418 blotches (1.03% of catalog) have at least one radius > 200 px, and 36 blotches have both radii > 300 px (truly tile-spanning).
Root cause
The radii sub-clustering works correctly — it groups similarly-sized markings together. The problem is that there is no physical bounds filter to reject clusters whose averaged dimensions exceed the tile.
For APF0000pqj, 4 users drew tile-spanning ellipses centered near tile center (420, 324):
| User | x | y | radius_1 | radius_2 | offset from center (px) |
| --- | --- | --- | --- | --- | --- |
| cosycoysh | 402 | 366 | 458 | 372 | (18, 42) |
| Chris.Parker | 403 | 328 | 465 | 349 | (17, 4) |
| 2715 | 407 | 320 | 377 | 320 | (14, 4) |
| andyt26767 | 414 | 348 | 429 | 349 | (6, 24) |
These users essentially marked "the whole tile is one big feature" — which is not a meaningful scientific measurement.
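As a quick arithmetic check, the catalog values (450.6, 356.6) are close to the plain mean of three of the four markings listed for this tile (all except user 2715). Which markings the sub-clustering actually grouped is an inference from the rounded values, not something confirmed from the pipeline. A minimal sketch:

```python
# Radii (radius_1, radius_2) of the four markings listed for tile APF0000pqj.
markings = {
    "cosycoysh": (458, 372),
    "Chris.Parker": (465, 349),
    "2715": (377, 320),
    "andyt26767": (429, 349),
}

# Assumption: the noticeably smaller marking from user 2715 ended up in a
# different radii sub-cluster, so only the other three were averaged.
subset = [radii for user, radii in markings.items() if user != "2715"]

mean_r1 = sum(r1 for r1, _ in subset) / len(subset)
mean_r2 = sum(r2 for _, r2 in subset) / len(subset)

print(round(mean_r1, 1), round(mean_r2, 1))  # prints 450.7 356.7
```

The small difference from the catalog entry is plausibly due to the listed marking values being rounded.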
Suggested mitigations
Option A: Physical bounds filter on cluster output
Reject any averaged blotch cluster where radius_1 or radius_2 exceeds a threshold (e.g., half the tile dimension — 420 or 324 px). Simple, conservative, catches the most egregious cases.
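A minimal sketch of such an output-side filter, assuming each averaged blotch is available as a dict with `radius_1` and `radius_2` keys. The function name `filter_clustered_blotches`, the single 420 px threshold, and the second example row are hypothetical, not part of the existing pipeline:

```python
MAX_RADIUS_PX = 420  # half the 840 px tile width; a tunable assumption


def filter_clustered_blotches(blotches, max_radius=MAX_RADIUS_PX):
    """Split averaged blotch clusters into (kept, rejected) by a radius bound."""
    kept, rejected = [], []
    for blotch in blotches:
        too_big = (
            blotch["radius_1"] > max_radius or blotch["radius_2"] > max_radius
        )
        (rejected if too_big else kept).append(blotch)
    return kept, rejected


clusters = [
    {"tile_id": "APF0000pqj", "radius_1": 450.6, "radius_2": 356.6},
    {"tile_id": "hypothetical_tile", "radius_1": 60.0, "radius_2": 35.0},
]
kept, rejected = filter_clustered_blotches(clusters)
print([b["tile_id"] for b in rejected])  # prints ['APF0000pqj']
```

Returning the rejected rows alongside the kept ones makes it easy to inspect what a given threshold removes before committing to it.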
Option B: Center-proximity + large-radius heuristic
Filter markings where the center is near the tile center AND the radii are large. The rationale: a legitimate large blotch can be centered anywhere, but a "whole tile" marking will tend toward the center. In practice the signal is weak: only 4 of the 36 tile-spanning blotches are within 100 px of tile center, so a center-proximity test would miss most of them. Most legitimate large features are not centered either, so proximity does little to separate the two groups and this option may not add much over Option A.
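The heuristic could be sketched as follows, assuming markings are dicts with `x`, `y`, `radius_1` and `radius_2` keys and that "near" means Euclidean distance from the tile center. The function name and the default cut-offs (100 px offset, 300 px radii, taken from the figures quoted in this issue) are illustrative assumptions:

```python
import math

TILE_CENTER = (420, 324)  # center of an 840x648 px tile


def looks_like_whole_tile(marking, max_center_offset=100, min_radius=300):
    """Flag a marking that is both near the tile center and large in both radii."""
    offset = math.hypot(
        marking["x"] - TILE_CENTER[0], marking["y"] - TILE_CENTER[1]
    )
    large = marking["radius_1"] > min_radius and marking["radius_2"] > min_radius
    return offset <= max_center_offset and large


# The four markings reported for tile APF0000pqj.
markings = [
    {"user": "cosycoysh", "x": 402, "y": 366, "radius_1": 458, "radius_2": 372},
    {"user": "Chris.Parker", "x": 403, "y": 328, "radius_1": 465, "radius_2": 349},
    {"user": "2715", "x": 407, "y": 320, "radius_1": 377, "radius_2": 320},
    {"user": "andyt26767", "x": 414, "y": 348, "radius_1": 429, "radius_2": 349},
]
flagged = [m["user"] for m in markings if looks_like_whole_tile(m)]

# Hypothetical large marking far from the center: not flagged by this test.
off_center = {"user": "example", "x": 700, "y": 100, "radius_1": 400, "radius_2": 350}
```

All four APF0000pqj markings are flagged with these defaults, while the off-center example passes through, which is exactly the gap that limits this option.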
Option C: Pre-filter individual markings before clustering
In filter_data() or at the start of cluster_image_id(), discard individual markings where radius_1 > max_allowed (e.g., 420 px), or, as in the recommendation, where either radius exceeds it. This prevents them from entering the clustering pipeline at all and is arguably the cleanest solution, since these markings represent user error, not scientific signal.
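A minimal sketch of the pre-filter, assuming markings are dicts with `radius_1` and `radius_2` keys. The helper name `prefilter_markings` is hypothetical, and how it would hook into filter_data() or cluster_image_id() is not shown because their signatures are not given here:

```python
def prefilter_markings(markings, max_allowed=420):
    """Keep only markings whose radii both stay within max_allowed (px)."""
    return [
        m
        for m in markings
        if m["radius_1"] <= max_allowed and m["radius_2"] <= max_allowed
    ]


# The four markings reported for tile APF0000pqj.
markings = [
    {"user": "cosycoysh", "radius_1": 458, "radius_2": 372},
    {"user": "Chris.Parker", "radius_1": 465, "radius_2": 349},
    {"user": "2715", "radius_1": 377, "radius_2": 320},
    {"user": "andyt26767", "radius_1": 429, "radius_2": 349},
]
kept = prefilter_markings(markings)
print([m["user"] for m in kept])  # prints ['2715']
```

With the 420 px example threshold, three of the four markings are dropped, but the 377 × 320 px marking from user 2715 passes. Whether a lone surviving marking can still form a cluster depends on the clustering settings, so the threshold may need to be lower than half the tile width.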
Recommendation
Option C (pre-filter) is cleanest: discard individual blotch markings where either radius exceeds half the tile width (420 px) before clustering. This removes the noise at source without adding post-hoc filters. Option A as a safety net on the output side would catch any edge cases that slip through.
Scale of impact
- 4,418 blotches with radius > 200 px (1.03% of catalog)
- 36 blotches with both radii > 300 px
- 80% of the blotches with a radius > 200 px are far from tile center (>200 px offset), suggesting many are legitimate large features, so the choice of filter threshold matters
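Counts like these could be recomputed for other candidate thresholds with a small helper. The sketch below assumes catalog rows as dicts with `x`, `y`, `radius_1`, `radius_2` keys and Euclidean offset from the tile center. The function name and the sample rows are hypothetical placeholders, not catalog data:

```python
import math

TILE_CENTER = (420, 324)  # center of an 840x648 px tile


def impact_summary(blotches, radius_threshold=200, far_offset=200):
    """Count blotches with any radius above the threshold and how many lie far from center."""
    large = [
        b for b in blotches
        if max(b["radius_1"], b["radius_2"]) > radius_threshold
    ]
    far = [
        b for b in large
        if math.hypot(b["x"] - TILE_CENTER[0], b["y"] - TILE_CENTER[1]) > far_offset
    ]
    return {
        "n_large": len(large),
        "share_of_catalog": len(large) / len(blotches) if blotches else 0.0,
        "share_far_from_center": len(far) / len(large) if large else 0.0,
    }


# Hypothetical rows purely for illustration.
rows = [
    {"x": 420, "y": 324, "radius_1": 450, "radius_2": 350},  # large, centered
    {"x": 700, "y": 100, "radius_1": 250, "radius_2": 120},  # large, off-center
    {"x": 100, "y": 500, "radius_1": 40, "radius_2": 25},    # small
    {"x": 300, "y": 300, "radius_1": 30, "radius_2": 20},    # small
]
summary = impact_summary(rows)
```

Running this over the real catalog with several values of `radius_threshold` would show how many entries each candidate cut-off removes.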
Catalog version
v3.1