Intermittent initialization failure on some Kasli SoC banks #10
Summary
We have identified a recurring initialization failure when Fastino is connected to Kasli SoC EEM ports driven by 1.8V banks (ports 2-7, 10, 11). The issue does not appear on EEM ports driven by 2.5V banks (ports 0, 1, 8, 9).
The failure does not occur on every unit. It was identified in the latest production batch, though it may be latent in other batches. All affected units pass stackup verification and operate correctly on the 2.5V ports.
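For anyone reproducing this, a small helper like the one below may be handy for grouping per-port test results by bank voltage. It is only a sketch: the port-to-voltage mapping is copied from the summary above, while the names `BANK_VOLTAGE_BY_EEM_PORT`, `is_affected_port` and `failure_rate_by_voltage` are hypothetical and not part of any existing tooling.

```python
# Hypothetical helper for tabulating init results; not part of ARTIQ or Fastino tooling.
# Mapping taken from the summary: ports 0, 1, 8, 9 are on 2.5V banks, the rest on 1.8V banks.
BANK_VOLTAGE_BY_EEM_PORT = {
    **{port: 2.5 for port in (0, 1, 8, 9)},
    **{port: 1.8 for port in (2, 3, 4, 5, 6, 7, 10, 11)},
}


def is_affected_port(port):
    """Return True if the EEM port is on a 1.8V bank, where failures were observed."""
    if port not in BANK_VOLTAGE_BY_EEM_PORT:
        raise ValueError(f"unknown Kasli SoC EEM port: {port}")
    return BANK_VOLTAGE_BY_EEM_PORT[port] == 1.8


def failure_rate_by_voltage(results):
    """Take (port, init_ok) pairs and return {bank_voltage: fraction of failed inits}."""
    totals, failures = {}, {}
    for port, init_ok in results:
        voltage = BANK_VOLTAGE_BY_EEM_PORT[port]
        totals[voltage] = totals.get(voltage, 0) + 1
        failures[voltage] = failures.get(voltage, 0) + (0 if init_ok else 1)
    return {voltage: failures[voltage] / totals[voltage] for voltage in totals}
```

Feeding it one `(port, init_ok)` pair per initialization attempt gives a quick per-voltage failure fraction, which makes it easier to compare units with and without the test point drivers.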
Observations & Investigation
While troubleshooting, we noticed that clock signals are driven out to the CMOS test points TP22 and TP23.
Probing the signals entering the Fastino FPGA with differential and active probes showed no significant distortion or signal integrity (SI) issues, even when initialization failed.
However, disabling the drivers for these two test points consistently resolved the initialization issue in our test setup. With the test point drivers removed, the Fastino units initialize and operate correctly across all Kasli SoC EEM ports, including those on 1.8V banks.
Conclusion
While the differential probes did not reveal obvious SI issues, the fact that disabling these CMOS outputs resolves the failure suggests that some part of the design is operating with very little margin. The open questions are whether the problem lies on the gateware side or the layout side, and whether removing the TP drivers creates enough safety margin for reliable operation.