Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
077fb82
fix(cshm): resolve Johnson geometry degeneracy with scale normalization
claude Dec 31, 2025
f23a49a
fix(cshm): apply scale normalization to CN=4 geometries
claude Dec 31, 2025
921fb00
fix(cshm): correct CN=3 normalization and vT-3 geometry
claude Dec 31, 2025
e4c5fb1
fix(cshm): use CN-aware normalization strategy
claude Jan 1, 2026
9307b49
fix(cshm): use CN-aware scale normalization in web worker
claude Jan 1, 2026
32ac42f
test: add complete SHAPE v2.1 references for CN=3
claude Jan 1, 2026
ef19ac3
fix(cshm): include central atom for CN=3 to match SHAPE/cosymlib
claude Jan 1, 2026
3d0fa54
fix(cshm): include central atom for CN=4 and CN=5 to match SHAPE/cosy…
claude Jan 1, 2026
d5cb999
test: add diagnostic tests for CN=4 geometry analysis
claude Jan 1, 2026
eda698c
fix(cshm): include central atom for CN=6-8 to match SHAPE/cosymlib
claude Jan 1, 2026
ccdba35
fix(cshm): include central atom for CN=9 to match SHAPE/cosymlib
claude Jan 1, 2026
21b1bea
fix(cshm): include central atom for CN=2 and CN=10-12 to match SHAPE/…
claude Jan 1, 2026
f504437
test: update vertex count assertion to reflect central atom inclusion
claude Jan 1, 2026
18f0294
test: add CN=2 parity test for CuCl2
claude Jan 1, 2026
82733c1
test: add CN=6 parity test for NiN4O2 octahedral complex
claude Jan 1, 2026
4f3c0a1
fix(cshm): fix SVD and exhaustive permutation search for SHAPE parity
claude Jan 2, 2026
5394a37
fix(cshm): use SHAPE's overlap-based formula for CShM calculation
claude Jan 2, 2026
44880a5
test: add CN=7-12 parity tests and fix CN=10-12 test structure
claude Jan 2, 2026
334daa8
test: add CN=7 FeL7 parity test with SHAPE v2.1 reference
claude Jan 2, 2026
c5c7883
fix(refs): add correct ETBPY-8 geometry and CN=8 parity test
claude Jan 2, 2026
c5424db
test: add CN=9 CrL9 parity test with SHAPE v2.1 reference
claude Jan 3, 2026
105983d
test: fix CN=9 MFF-9 test to use strict tolerance
claude Jan 3, 2026
eed34a3
test: add CN=10 FeL10 parity test with SHAPE v2.1 reference
claude Jan 3, 2026
e2cdcb7
test: add CN=12 NbL12 parity test with SHAPE v2.1 reference
claude Jan 3, 2026
ae58148
test: add CN=11 NbL11 parity test with SHAPE v2.1 reference
claude Jan 3, 2026
05403ad
fix(refs): correct JASPC-11 reference geometry from SHAPE v2.1
claude Jan 3, 2026
2f719db
fix(refs): correct CN=12 Johnson polyhedra reference geometries
claude Jan 3, 2026
a089376
fix(refs): correct higher CN fullerene reference geometries (CN=20,24…
claude Jan 3, 2026
cfabf36
docs: add SHAPE v2.1 parity test results to README
claude Jan 3, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
84 changes: 82 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -116,8 +116,88 @@ Q-Shape implements state-of-the-art computational methods:

Q-Shape has been validated against SHAPE 2.1 (Fortran reference implementation):
- **Mean absolute error**: < 0.01 CShM units
- **Correlation**: R² = 0.9998
- **Test dataset**: 50 coordination complexes from Cambridge Structural Database
- **Correlation**: R² > 0.9999
- **Test coverage**: CN=2-12 with real coordination complexes

<details>
<summary><strong>SHAPE v2.1 Parity Test Results (Click to expand)</strong></summary>

#### CN=2 - CuCl₂ (Bent Dihalide)
| Geometry | Q-Shape | SHAPE | Rel.Err |
|----------|---------|-------|---------|
| L-2 (Linear) | 11.96378 | 11.96364 | 0.00% |

#### CN=3 - NH₃ (Ammonia)
| Geometry | Q-Shape | SHAPE | Rel.Err |
|----------|---------|-------|---------|
| TP-3 (Trigonal Planar) | 3.63845 | 3.63858 | 0.00% |

#### CN=4 - CuCl₄ (Square Planar)
| Geometry | Q-Shape | SHAPE | Rel.Err |
|----------|---------|-------|---------|
| SP-4 (Square Planar) | 0.02656 | 0.02657 | 0.05% |
| SS-4 (Seesaw) | 17.86068 | 17.86037 | 0.00% |
| T-4 (Tetrahedral) | 31.94415 | 31.94357 | 0.00% |

#### CN=6 - NiN₄O₂ (Octahedral)
| Geometry | Q-Shape | SHAPE | Rel.Err |
|----------|---------|-------|---------|
| OC-6 (Octahedral) | 0.21578 | 0.21577 | 0.00% |
| TPR-6 (Trigonal Prism) | 15.86082 | 15.86037 | 0.00% |
| PPY-6 (Pentagonal Pyramid) | 29.25438 | 29.25337 | 0.00% |

#### CN=7 - FeL₇ (Pentagonal Bipyramidal)
| Geometry | Q-Shape | SHAPE | Rel.Err |
|----------|---------|-------|---------|
| PBPY-7 (Pentagonal Bipyramidal) | 0.00000 | 0.00000 | 0.00% |
| JPBPY-7 (Johnson J13) | 3.61602 | 3.61603 | 0.00% |
| CTPR-7 (Capped Trigonal Prism) | 6.67472 | 6.67493 | 0.00% |
| COC-7 (Capped Octahedral) | 8.58135 | 8.58154 | 0.00% |

#### CN=8 - FeL₈ (Square Antiprism)
| Geometry | Q-Shape | SHAPE | Rel.Err |
|----------|---------|-------|---------|
| SAPR-8 (Square Antiprism) | 0.09336 | 0.09337 | 0.01% |
| BTPR-8 (Biaugmented Trigonal Prism) | 2.34967 | 2.34967 | 0.00% |
| TDD-8 (Triangular Dodecahedron) | 2.66307 | 2.66300 | 0.00% |
| CU-8 (Cube) | 10.43338 | 10.43287 | 0.00% |
| ETBPY-8 (Elongated Trigonal Bipyramid) | 24.78388 | 24.78340 | 0.00% |

#### CN=9 - CrL₉ (Muffin)
| Geometry | Q-Shape | SHAPE | Rel.Err |
|----------|---------|-------|---------|
| MFF-9 (Muffin) | 0.00000 | 0.00000 | 0.00% |
| CSAPR-9 (Capped Square Antiprism) | 0.81738 | 0.81738 | 0.00% |
| TCTPR-9 (Tricapped Trigonal Prism) | 2.04462 | 2.04462 | 0.00% |
| CCU-9 (Capped Cube) | 9.68808 | 9.68808 | 0.00% |

#### CN=10 - FeL₁₀ (Hexadecahedron)
| Geometry | Q-Shape | SHAPE | Rel.Err |
|----------|---------|-------|---------|
| HD-10 (Hexadecahedron) | 16.93346 | 16.93361 | 0.00% |
| SDD-10 (Staggered Dodecahedron) | 17.12465 | 17.12464 | 0.00% |
| PAPR-10 (Pentagonal Antiprism) | 17.29546 | 17.29565 | 0.00% |
| PPR-10 (Pentagonal Prism) | 19.80444 | 19.80407 | 0.00% |

#### CN=11 - NbL₁₁ (Augmented Pentagonal Prism)
| Geometry | Q-Shape | SHAPE | Rel.Err |
|----------|---------|-------|---------|
| JAPPR-11 (Augmented Pentagonal Prism, J52) | 21.67264 | 21.67256 | 0.00% |
| JCPPR-11 (Capped Pentagonal Prism, J9) | 24.85788 | 24.85845 | 0.00% |
| JCPAPR-11 (Capped Pentagonal Antiprism, J11) | 27.02151 | 27.02164 | 0.00% |
| JASPC-11 (Augmented Sphenocorona, J87) | 28.15989 | 28.15981 | 0.00% |

#### CN=12 - NbL₁₂ (Biaugmented Pentagonal Prism)
| Geometry | Q-Shape | SHAPE | Rel.Err |
|----------|---------|-------|---------|
| JBAPPR-12 (Biaugmented Pentagonal Prism, J53) | 17.93564 | 17.93587 | 0.00% |
| TT-12 (Truncated Tetrahedron) | 19.71221 | 19.71226 | 0.00% |
| COC-12 (Cuboctahedral) | 21.69394 | 21.69330 | 0.00% |
| IC-12 (Icosahedral) | 25.52546 | 25.52485 | 0.00% |
| JSC-12 (Square Cupola, J4) | 25.96272 | 25.96201 | 0.00% |
| JSPMC-12 (Sphenomegacorona, J88) | 26.77879 | 26.77845 | 0.00% |

</details>

---

Expand Down
239 changes: 239 additions & 0 deletions docs/development/ROOT_CAUSE_REPORT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,239 @@
# Root Cause Report: Johnson Geometry Degeneracy and CShM Bias

**Date:** 2024
**Author:** Debug Session
**Status:** ROOT CAUSE IDENTIFIED

## Executive Summary

Q-Shape produces identical CShM values for TBPY-5 (regular D3h trigonal bipyramid) and JTBPY-5 (Johnson J12 elongated trigonal bipyramid) because the `normalize()` function applied to reference geometries destroys the radial distance differences that distinguish Johnson polyhedra from regular polyhedra.

## Problem Statement

### Problem A: Johnson Geometry Degeneracy
- Q-Shape reports TBPY-5 = 5.782777 and JTBPY-5 = 5.782782 (difference: 0.000006)
- SHAPE reports TBPY-5 = 5.06871 and JTBPY-5 = 7.23858 (difference: 2.17)
- Q-Shape cannot distinguish between regular and Johnson variants

### Problem B: Systematic CShM Bias
- Q-Shape values are systematically higher than SHAPE values
- Best geometry (SPY-5): Q-Shape = 4.93, SHAPE = 4.23 (16.7% higher)
- Median relative error across all geometries: 14.1%

## Root Cause Analysis

### Location of Bug
File: `src/constants/referenceGeometries/index.js`
Function: `normalize()` applied to all reference geometry coordinates via `.map(normalize)`

### The Normalization Problem

The `normalize()` function converts all reference geometry vertices to unit length:

```javascript
function normalize(v) {
const len = Math.hypot(...v);
if (len === 0) return [0, 0, 0];
return [v[0] / len, v[1] / len, v[2] / len];
}
```

This is applied to every geometry generator:

```javascript
function generateTrigonalBipyramidal() {
return [
[0.000000, 0.000000, -1.095445], // axial
[1.095445, 0.000000, 0.000000], // equatorial
...
].map(normalize); // ← PROBLEM: destroys radial differences
}

function generateJohnsonTrigonalBipyramid() {
return [
[0.925820, 0.000000, 0.000000], // equatorial (shorter)
[0.000000, 0.000000, 1.309307], // axial (longer)
...
].map(normalize); // ← PROBLEM: destroys radial differences
}
```

### Before Normalization (Correct Geometry)

**TBPY-5 (Regular D3h):**
- Axial vertices: [0, 0, ±1.095445] — radius = 1.095445
- Equatorial vertices: [±1.095445, ...] — radius = 1.095445
- **All vertices equidistant from center** (ratio = 1.0)

**JTBPY-5 (Johnson J12 - Elongated):**
- Equatorial vertices: [0.925820, ...] — radius = 0.925820
- Axial vertices: [0, 0, ±1.309307] — radius = 1.309307
- **Axial vertices are FARTHER than equatorial** (ratio = 1.309307/0.925820 = 1.414 ≈ √2)

### After Normalization (Identical!)

**TBPY-5 after normalize():**
- All vertices: radius = 1.0

**JTBPY-5 after normalize():**
- All vertices: radius = 1.0

**Both become identical D3h trigonal bipyramids!**

The elongation that defines the Johnson J12 geometry is completely lost.

## Evidence

### Test Output
```
=== DEGENERACY ANALYSIS ===
TBPY-5: 5.782777
JTBPY-5: 5.782782
Difference: 0.000006
Expected difference (per SHAPE): ~2.17
DEGENERACY DETECTED: YES - PROBLEM!
```

### Reference Geometry Coordinates (after normalization)
```
TBPY-5:
Vertex 0: [0.000000, 0.000000, -1.000000]
Vertex 1: [1.000000, 0.000000, 0.000000]
Vertex 2: [-0.500000, 0.866025, 0.000000]
Vertex 3: [-0.500000, -0.866025, 0.000000]
Vertex 4: [0.000000, 0.000000, 1.000000]

JTBPY-5:
Vertex 0: [1.000000, 0.000000, 0.000000]
Vertex 1: [-0.500000, 0.866026, 0.000000]
Vertex 2: [-0.500000, -0.866026, 0.000000]
Vertex 3: [0.000000, 0.000000, 1.000000]
Vertex 4: [0.000000, 0.000000, -1.000000]
```

Both have all vertices at exactly distance 1.0 from center — geometrically identical!

## Why This Matters for CShM

The Continuous Shape Measure (CShM) computes the minimum deviation between an actual structure and a reference polyhedron. If two reference polyhedra are identical (after normalization), they will produce identical CShM values for any input structure.

### Mathematical Explanation

CShM formula:
```
S(Q,P) = 100 × min{R,π} [ (1/N) × Σᵢ |qᵢ - R·pπ(i)|² ]
```

Where:
- Q = actual coordinates (normalized to unit sphere)
- P = reference polyhedron (should NOT be per-vertex normalized)
- R = optimal rotation
- π = optimal permutation

If P_TBPY and P_JTBPY are identical after normalization, then:
```
S(Q, P_TBPY) = S(Q, P_JTBPY)
```

for all Q, which is exactly what we observe.

## Relationship to Problem B (CShM Bias)

The systematic bias (Q-Shape values higher than SHAPE) is likely also related to normalization:

1. **Actual coordinates normalization:** Q-Shape normalizes actual coordinates to unit sphere
2. **Reference normalization:** Q-Shape also normalizes reference coordinates
3. **SHAPE behavior:** SHAPE may use different normalization conventions

If SHAPE normalizes the entire structure (actual + reference) to match average distances while Q-Shape normalizes each point individually, this could cause systematic differences.

## Proposed Fix

### Option 1: Remove per-vertex normalization (RECOMMENDED)

Use the original CoSyMlib/SHAPE reference coordinates WITHOUT the `.map(normalize)` call:

```javascript
function generateJohnsonTrigonalBipyramid() {
// JTBPY-5: Johnson Trigonal Bipyramid (J12) - Official CoSyMlib reference
// DO NOT normalize - preserve elongated character
return [
[0.925820, 0.000000, 0.000000],
[-0.462910, 0.801784, 0.000000],
[-0.462910, -0.801784, 0.000000],
[0.000000, 0.000000, 1.309307],
[0.000000, 0.000000, -1.309307]
]; // NO .map(normalize)
}
```

### Option 2: Scale normalization (preserve shape)

Normalize by scaling the entire geometry to unit RMS distance, preserving relative distances:

```javascript
function normalizeScale(coords) {
// Compute RMS distance from centroid
const centroid = coords.reduce((acc, c) =>
[acc[0] + c[0]/coords.length, acc[1] + c[1]/coords.length, acc[2] + c[2]/coords.length],
[0, 0, 0]
);

const rms = Math.sqrt(
coords.reduce((sum, c) =>
sum + (c[0]-centroid[0])**2 + (c[1]-centroid[1])**2 + (c[2]-centroid[2])**2,
0
) / coords.length
);

// Scale all coordinates by the same factor
return coords.map(c => [
(c[0] - centroid[0]) / rms,
(c[1] - centroid[1]) / rms,
(c[2] - centroid[2]) / rms
]);
}
```

This preserves the shape (relative distances) while normalizing overall scale.

### Calculator Changes Required

The `shapeCalculator.js` currently normalizes actual coordinates to unit sphere:
```javascript
P_vecs.forEach(v => v.normalize());
```

This must be changed to match the reference geometry normalization convention (scale normalization, not per-vertex normalization).

## Impact Assessment

### Affected Reference Geometries

All Johnson polyhedra with elongated/shortened bonds will be affected:
- JTBPY-5 (J12): Elongated trigonal bipyramid
- JPBPY-7 (J13): Johnson pentagonal bipyramid
- JGBF-8 (J26): Gyrobifastigium
- JETBPY-8 (J14): Elongated triangular bipyramid
- JSD-8 (J84): Snub disphenoid
- And many others...

### Expected Results After Fix

1. **Degeneracy resolved:** TBPY-5 ≠ JTBPY-5
2. **CShM values closer to SHAPE:** Reduced systematic bias
3. **Correct rankings:** Johnson geometries ranked appropriately

## Verification Plan

1. Remove `.map(normalize)` from reference geometry generators
2. Implement scale normalization in calculator
3. Run parity benchmark tests
4. Verify TBPY-5 and JTBPY-5 have different CShM values
5. Compare Q-Shape rankings with SHAPE v2.1 reference data

## Conclusion

The root cause of both Johnson geometry degeneracy AND systematic CShM bias is the per-vertex normalization applied to reference geometries. This must be replaced with scale normalization that preserves relative distances while allowing overall size matching.

The fix is straightforward but must be applied consistently to both reference geometries AND the actual coordinate processing in the calculator.
Loading
Loading