Add OSDF cache-diagnostic appendix to JRA-3Q heat-index notebook #57
Open
hrhampapura wants to merge 2 commits into
The PelicanFS team confirmed (a) a cache serving corrupt bytes is the most likely cause of the deterministic zlib failures, and (b) they have no easy way to find a bad object across their caches and asked us to print which cache Casper is routed to. They also endorsed pinning known-good caches via preferred_caches as the interim lever until client-side checksumming / automatic failover lands in PelicanFS.

This adds an appendix (run on Casper) that leaves the working direct_reads open untouched:

- Cell A: print the director-chosen cache + full candidate list + origin for d640000, and enable the fsspec.pelican logger.
- Cell B: re-open JRA-3Q through the cache, reproduce the failing read, and dump get_access_data() to name the serving/failing cache.
- Cell C: guarded preferred_caches template (inert until a good cache is set).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
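The configuration side of Cells A and C can be sketched with the standard library alone. This is a minimal sketch, not the notebook's actual cells: the helper names `enable_pelican_logging` and `pelican_kwargs` are hypothetical, the cache host is deliberately left blank, and the `preferred_caches` keyword with its `"+"` fallback marker is taken from the description in this PR rather than verified here.

```python
import logging

# Hypothetical placeholders: no healthy cache has been identified yet,
# so the host stays empty and the guard stays off.
USE_PREFERRED_CACHES = False
PREFERRED_CACHE = ""


def enable_pelican_logging(level=logging.DEBUG):
    """Make the `fsspec.pelican` logger print so cache selection is visible inline."""
    logger = logging.getLogger("fsspec.pelican")
    logger.setLevel(level)
    if not logger.handlers:
        # Attach a stderr handler only once, even if the cell is re-run.
        logger.addHandler(logging.StreamHandler())
    return logger


def pelican_kwargs(use_preferred=USE_PREFERRED_CACHES, cache=PREFERRED_CACHE):
    """Return extra filesystem kwargs; empty unless the flag is set AND a host is given."""
    if use_preferred and cache:
        # "+" keeps the director's own candidates as a fallback after the pinned cache.
        return {"preferred_caches": [cache, "+"]}
    return {}
```

With both guards at their defaults, `pelican_kwargs()` returns an empty dict, so passing `**pelican_kwargs()` to the filesystem constructor would change nothing until a cache host is filled in and the flag is flipped.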
What & why
Follow-up to the JRA-3Q `d640000` read failures debugged in #56. The PelicanFS team confirmed the most likely cause is a cache serving corrupt bytes (deterministic `zlib: incorrect header check` from Casper; the same chunks decode cleanly from the origin and over HTTPS), and asked us for two things:

- Print which cache Casper is routed to, since they have no easy way to find a bad object across their caches.
- Use `preferred_caches` to pin a known-good cache as the interim lever, until client-side checksumming / automatic failover lands in PelicanFS.

This PR adds a diagnostic appendix to `notebooks/jja_heatindex.ipynb`. The working `direct_reads=True` open is left untouched, so the notebook still runs end-to-end.

What's added (appendix, run on Casper)
- Cell A: prints the director-chosen cache for `d640000`, the full candidate list, and the origin; enables the `fsspec.pelican` logger so cache selection and `Marking cache at <url> as bad` events print inline.
- Cell B: re-opens JRA-3Q through the cache (no `direct_reads`), reproduces the failing read, and dumps `get_access_data()` to name the cache that served/failed each object (handles both the `zlib`-corrupt case, recorded as success, and the `ContentLengthError` case, recorded as failure).
- Cell C: guarded `preferred_caches` template (keeps `"+"` for director fallback); inert until a healthy cache host is filled in and `USE_PREFERRED_CACHES=True`.

Notes
- `get_working_cache` / `get_origin_url` return `(url, director_response)`; `get_access_data()` keeps the last 3 responses per object path.

🤖 Generated with Claude Code
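The symptom under "What & why" (bytes from the cache fail zlib while the same chunk from the origin decodes) can be checked offline once both copies of a chunk are in hand. The following is a rough sketch under stated assumptions: the chunk is assumed to be a plain zlib stream, `describe_chunk` and `compare_chunks` are hypothetical helpers, and the two byte strings are synthetic stand-ins rather than real JRA-3Q data.

```python
import hashlib
import zlib


def describe_chunk(raw: bytes) -> dict:
    """Summarise one raw chunk: size, SHA-256, and whether zlib can decode it."""
    try:
        zlib.decompress(raw)
        decodes, error = True, None
    except zlib.error as exc:
        decodes, error = False, str(exc)
    return {
        "nbytes": len(raw),
        "sha256": hashlib.sha256(raw).hexdigest(),
        "decodes": decodes,
        "error": error,
    }


def compare_chunks(via_cache: bytes, via_origin: bytes) -> dict:
    """Compare the copy served by a cache against the copy read from the origin."""
    cache = describe_chunk(via_cache)
    origin = describe_chunk(via_origin)
    return {
        "cache": cache,
        "origin": origin,
        "identical": cache["sha256"] == origin["sha256"],
    }


# Synthetic stand-ins: a valid zlib stream, and a copy with its first header
# byte altered to mimic a cache returning corrupt bytes.
good = zlib.compress(b"JRA-3Q" * 100)
bad = bytes([good[0] ^ 0x01]) + good[1:]
report = compare_chunks(bad, good)
```

A report where `identical` is false and only the origin copy decodes would point at the cache rather than the dataset; pairing the digest with the cache URL named by `get_access_data()` gives the PelicanFS team something concrete to look up.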