Summary
Markings that enter an XY cluster in the small-object run but whose cluster is subsequently destroyed by radii (or angle) sub-clustering are permanently removed from the pool. They never reach the large-object run, even though they might form valid clusters there.
Mechanism
In `_cluster_pipeline` (05e_production.dbscan.ipynb), `_calculate_unclustered` is called immediately after XY clustering, before radii and angle sub-clustering:
```python
xyclusters = self.cluster_xy(data, eps)
xyclusters = list(xyclusters)
self._calculate_unclustered(data, xyclusters)  # removes all XY-clustered points NOW
# ... radii sub-clustering may destroy some clusters ...
# ... angle sub-clustering may destroy more ...
# ... but their members are already gone from self.remaining
```
So any marking that was part of an XY cluster, even one whose cluster does not survive sub-clustering, is excluded from `self.remaining` and never reaches the large-object run.
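A toy sketch of the two orderings, in plain Python. `unclustered` and `sub_cluster` are hypothetical stand-ins, not the notebook's methods: clusters are lists of marking indices, and the radii stage is modelled as a filter that drops some clusters. Deriving the pool from `xyclusters` (current order) loses the members of the destroyed cluster; deriving it from `finalclusters` (proposed order) keeps them.

```python
# Toy model of the two orderings; hypothetical helpers, not the notebook's code.
# Clusters are lists of marking indices.


def unclustered(n_markings, clusters):
    """Indices of markings that belong to none of `clusters`."""
    clustered = {i for members in clusters for i in members}
    return [i for i in range(n_markings) if i not in clustered]


def sub_cluster(clusters, survives):
    """Stand-in for radii/angle sub-clustering: keep only surviving clusters."""
    return [members for members in clusters if survives(members)]


n_markings = 10
xyclusters = [[0, 1, 2], [5, 6, 7]]
# Assume the radii stage destroys the second XY cluster.
finalclusters = sub_cluster(xyclusters, lambda members: 5 not in members)

remaining_current = unclustered(n_markings, xyclusters)      # pool computed right after XY
remaining_proposed = unclustered(n_markings, finalclusters)  # pool computed after all stages

print(remaining_current)   # [3, 4, 8, 9]
print(remaining_proposed)  # [3, 4, 5, 6, 7, 8, 9]
```

Markings 5, 6 and 7 are the difference: they are noise after sub-clustering, but only the proposed order hands them to the next run.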
Example: APF0000jei (ESP_021829_0985)
A large, clearly visible elliptical blotch at the bottom-center of the tile is marked by 12 different users at positions spanning x: 246–362, y: 509–584 (116×75 px spread):
| User | x | y | r1 | r2 |
|---|---|---|---|---|
| MyC0w | 246 | 509 | 227 | 123 |
| Ben Teji | 259 | 550 | 121 | 91 |
| Strufus78 | 285 | 583 | 193 | 152 |
| jmortimer | 292 | 533 | 141 | 106 |
| umajv | 294 | 548 | 195 | 146 |
| WKDCon | 304 | 552 | 138 | 104 |
| basty808 | 308 | 563 | 131 | 98 |
| Frankee64 | 314 | 555 | 240 | 113 |
| Rad awes | 327 | 570 | 161 | 121 |
| not-logged-in | 330 | 584 | 168 | 126 |
| jackielivesey | 350 | 554 | 114 | 86 |
| bruno.edwards | 362 | 583 | 174 | 130 |

What happens:
- Small run (eps_xy=10): 3 markings cluster (WKDCon, basty808, Frankee64 at x: 304–314). Radii sub-clustering with eps=30 destroys this cluster: their radii span 131–240, far beyond eps=30. All 3 become noise, but `_calculate_unclustered` has already removed them from the pool.
- Large run (eps_xy=25): Only 9 of the 12 markings remain. With eps=25 they span x: 246–362 (116 px), too wide to connect fully. A sub-group clusters, but after radii sub-clustering it falls below `min_samples` and is filtered out.
- Result: No blotch entry in the catalog, despite 12 independent users agreeing that the feature exists.

If the 3 killed-by-radii markings were returned to the pool, the large run would have all 12 markings. With eps=25, a 7-member cluster forms (x: 292–330), and radii sub-clustering with eps=50 produces a surviving 6-member cluster (r1: 131–195).
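That counterfactual can be checked against the table with a minimal pure-Python DBSCAN. This is a sketch, not the production `cluster_xy`/`cluster_radii`: it assumes `min_samples=3`, Euclidean distance with inclusive neighborhoods (`<= eps`, point itself counted, as in scikit-learn), and radii sub-clustering on the (r1, r2) pair. Under those assumptions it reproduces the 7-member XY cluster (x: 292–330) and the 6-member radii survivor (r1: 131–195), with Frankee64 dropped as radii noise.

```python
import math

# (user, x, y, r1, r2): rounded values copied from the table above.
MARKINGS = [
    ("MyC0w", 246, 509, 227, 123),
    ("Ben Teji", 259, 550, 121, 91),
    ("Strufus78", 285, 583, 193, 152),
    ("jmortimer", 292, 533, 141, 106),
    ("umajv", 294, 548, 195, 146),
    ("WKDCon", 304, 552, 138, 104),
    ("basty808", 308, 563, 131, 98),
    ("Frankee64", 314, 555, 240, 113),
    ("Rad awes", 327, 570, 161, 121),
    ("not-logged-in", 330, 584, 168, 126),
    ("jackielivesey", 350, 554, 114, 86),
    ("bruno.edwards", 362, 583, 174, 130),
]
MIN_SAMPLES = 3  # assumption; the pipeline's real setting may differ


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns one list of point indices per cluster.

    Neighborhoods use distance <= eps and include the point itself,
    following scikit-learn's DBSCAN conventions.
    """
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [None] * n  # None = unvisited, -1 = noise, k >= 0 = cluster id
    next_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_samples:
            labels[i] = -1  # may later be claimed as a border point
            continue
        labels[i] = next_id
        queue = list(nbrs[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = next_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = next_id
            if len(nbrs[j]) >= min_samples:
                queue.extend(nbrs[j])  # core point: keep growing the cluster
        next_id += 1
    return [[i for i in range(n) if labels[i] == k] for k in range(next_id)]


def names(indices):
    return [MARKINGS[i][0] for i in indices]


# Large run with all 12 markings back in the pool (eps_xy=25).
xy_clusters = dbscan([(m[1], m[2]) for m in MARKINGS], eps=25, min_samples=MIN_SAMPLES)

# Radii sub-clustering (eps=50), assumed here to use the (r1, r2) pair.
survivors = []
for members in xy_clusters:
    radii = [(MARKINGS[i][3], MARKINGS[i][4]) for i in members]
    for sub in dbscan(radii, eps=50, min_samples=MIN_SAMPLES):
        survivors.append([members[k] for k in sub])

print(names(xy_clusters[0]))  # 7 markings, x: 292-330
print(names(survivors[0]))    # 6 markings; Frankee64 is radii noise
```

Because the table values are rounded, distances that sit close to an eps threshold may not reproduce exactly, which is why the sketch only covers the large run.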
Suggested fix
Move `_calculate_unclustered` after all sub-clustering stages, using `finalclusters` instead of `xyclusters`:
```python
def _cluster_pipeline(self, kind, data, eps, eps_rad):
    xyclusters = self.cluster_xy(data, eps)
    xyclusters = list(xyclusters)
    # _calculate_unclustered was here (too early)
    if self.with_radii and eps_rad is not None:
        last = self.cluster_radii(xyclusters, eps_rad)
    else:
        last = xyclusters
    last = list(last)
    if self.with_angles and self.eps_values[kind]["angle"] is not None:
        finalclusters = self.cluster_angles(last, kind)
    else:
        finalclusters = last
    finalclusters = list(finalclusters)
    self.finalclusters = finalclusters
    self._calculate_unclustered(data, finalclusters)  # ← moved here
    averaged = get_average_objects(finalclusters, kind)
    ...
```
No downside: members of clusters that survive all stages are still excluded from `self.remaining`. Only markings left unclustered by the radii or angle stages (members of destroyed clusters, or members split off as noise from fragmented ones) are returned to the pool.
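One way to guard the fix is an invariant check: after a run, every marking should be either a member of a surviving cluster or in the pool handed to the next run, never both and never neither. `partition_problems` is a hypothetical helper, not part of the codebase; the example feeds it the small-run outcome from APF0000jei (the only XY cluster destroyed by radii sub-clustering).

```python
# Hypothetical regression check for the proposed ordering (not in the codebase).


def partition_problems(all_ids, finalclusters, remaining):
    """List violations of 'each marking is either clustered or in the pool, exactly once'."""
    clustered = {m for members in finalclusters for m in members}
    pool = set(remaining)
    problems = []
    if clustered & pool:
        problems.append("surviving cluster members leaked into remaining")
    if clustered | pool != set(all_ids):
        problems.append("markings lost between runs")
    return problems


USERS = [
    "MyC0w", "Ben Teji", "Strufus78", "jmortimer", "umajv", "WKDCon",
    "basty808", "Frankee64", "Rad awes", "not-logged-in", "jackielivesey",
    "bruno.edwards",
]
SMALL_XY_CLUSTER = {"WKDCon", "basty808", "Frankee64"}

# Small run from the example: radii sub-clustering destroys the only XY cluster.
finalclusters = []

# Pool derived from xyclusters (current) vs. from finalclusters (proposed).
remaining_current = [u for u in USERS if u not in SMALL_XY_CLUSTER]
remaining_proposed = list(USERS)

print(partition_problems(USERS, finalclusters, remaining_current))   # ['markings lost between runs']
print(partition_problems(USERS, finalclusters, remaining_proposed))  # []
```

The same check would also flag the opposite mistake, a surviving cluster's members leaking back into the pool, which is the case the "no downside" argument relies on not happening.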
Target version
v3.2