At commit 94e55c7a3b4ffdaa11052e1d3d459505e6ed126c, the following test starts failing with a precision mismatch:
pytest tests/ops/test_engram_bwd.py::test_engram_gate_conv_bwd[1-32-256-dtype0-False]
The test should pass, and the output should be numerically close to the reference output.
FAILED tests/ops/test_engram_bwd.py::test_engram_gate_conv_bwd[1-32-256-dtype0-False]
AssertionError: Tensor-likes are not close!
output = tensor([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], device='cuda:0')
output_ref = tensor([[ 0.1562, -0.3986, -0.1758, ..., -0.0725, -0.5347, -0.2780],
[ 0.0719, 0.2237, 0.1933, ..., -0.3443, 0.1883, -0.0696],
[-0.3473, -0.0062, -0.2820, ..., -0.3163, 0.1547, -0.7590]],
device='cuda:0')
Mismatched elements: 388 / 1024 (37.9%)
Greatest absolute difference: 1.0608892440795898
Greatest relative difference: 1.0
It looks like the kernel output becomes all zeros after this commit, while the reference output remains valid.
The same test passes on the previous commit, so this issue appears to be introduced by commit 94e55c7.
Describe the Bug
At commit 94e55c7a3b4ffdaa11052e1d3d459505e6ed126c, the following test starts failing with a precision mismatch:
pytest tests/ops/test_engram_bwd.py::test_engram_gate_conv_bwd[1-32-256-dtype0-False]
To Reproduce
git checkout 94e55c7a3b4ffdaa11052e1d3d459505e6ed126c
pytest tests/ops/test_engram_bwd.py::test_engram_gate_conv_bwd
Expected Behavior
The test should pass, and the output should be numerically close to the reference output.
Actual Behavior
The test fails with an AssertionError.
It looks like the kernel output becomes all zeros after this commit, while the reference output remains valid.
The same test passes on the previous commit, so this issue appears to be introduced by commit 94e55c7.
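The log is consistent with a zeroed output rather than ordinary precision drift: a greatest relative difference of exactly 1.0 is what |output - ref| / |ref| gives when output is 0 and ref is nonzero. A minimal sketch of a check that separates the two cases is below. `summarize_mismatch` is a hypothetical helper, not part of this repository. The tolerances are illustrative assumptions, not the ones the test uses, and the closeness rule |output - ref| <= atol + rtol * |ref| is assumed to match the test's comparison.

```python
def summarize_mismatch(output, reference, rtol=1e-5, atol=1e-8):
    """Compare two flat sequences of floats and report how they differ.

    Hypothetical helper for triage; rtol/atol are assumed values.
    """
    if len(output) != len(reference):
        raise ValueError("output and reference must have the same length")

    mismatched = 0
    max_abs = 0.0
    max_rel = 0.0
    for out, ref in zip(output, reference):
        diff = abs(out - ref)
        max_abs = max(max_abs, diff)
        if ref != 0.0:
            max_rel = max(max_rel, diff / abs(ref))
        # Element counts as mismatched when it exceeds the combined tolerance.
        if diff > atol + rtol * abs(ref):
            mismatched += 1

    return {
        "mismatched": mismatched,
        "total": len(output),
        "max_abs_diff": max_abs,
        "max_rel_diff": max_rel,
        # True when every element is exactly zero, i.e. the kernel likely
        # never wrote its result, as opposed to writing an imprecise one.
        "output_all_zero": all(v == 0.0 for v in output),
    }


# First two reference values are taken from the log above; the third
# (0.0) is made up to show an element that still matches.
report = summarize_mismatch([0.0, 0.0, 0.0], [0.1562, -0.3986, 0.0])
print(report)
```

With the tensors from the failing test, the same helper could be fed `output.flatten().tolist()` and `output_ref.flatten().tolist()`. If `output_all_zero` is true, the regression is more likely a missing write or an early exit in the kernel than a change in accumulation precision.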