Skip to content

Add some simple naive caching to the tiff file#31

Open
mdales wants to merge 1 commit into
geocaml:mainfrom
mdales:mwd-caching-data
Open

Add some simple naive caching to the tiff file#31
mdales wants to merge 1 commit into
geocaml:mainfrom
mdales:mwd-caching-data

Conversation

@mdales
Copy link
Copy Markdown
Contributor

@mdales mdales commented May 14, 2025

No description provided.

@mdales mdales force-pushed the mwd-caching-data branch from 58e5c69 to 968d377 Compare May 14, 2025 08:33
@mdales mdales requested a review from patricoferris May 14, 2025 22:15
Comment thread src/tiff.ml
| _ -> failwith "Unsupported compression"
in

t.strip_cache <-
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure what this gets compiled to, but I think it would be better to put the matching statement outside of the assignment operation. IDK if the compiler is smart enough to recognise that if t.caching_policy = NoCachingis a no-op.

Comment thread src/tiff.ml
Lzw.decode raw_strip_buffer uncompressed_buffer;
uncompressed_buffer
| _ -> failwith "Unsupported compression"
match List.assoc_opt strip t.strip_cache with
Copy link
Copy Markdown
Contributor

@patricoferris patricoferris May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we're better off using something with O(1) lookup here. It probably doesn't matter too much but we do incur a bit of a hit the first time we load something in and have the caching policy set (as we have to loop through the entire strip_cache for every strip).

Comment thread src/tiff.ml
end

type t = { header : header; ifd : Ifd.t }
(* I imagine one day we will also have `Some of int` as an option here *)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will that do? Only cache n strips?

I do wonder if we should not make the ro more sophisticated here so the TIFF library doesn't have to deal with this. A user could already do something very similar themselves using the file_offset as the cache key?

@mdales
Copy link
Copy Markdown
Contributor Author

mdales commented May 15, 2025

Great points in the review @patricoferris - thanks.

Re the placement:

  • It should be after decompression - as LZW compression does prove to be relatively expensive in my experience - this means we'd need to move decompression into the ro, which is an option, but I don't think makes API sense as a user
  • It should be before windowing - because the if you read the left half of the image in a striped world, then read the right half, you have to go back to disk if the user caches, which is wasteful.

We could make windowing a yirgacheffe problem, and just expose strips/tiles at the TIFF library, then you solve the second of those issues.

@patricoferris
Copy link
Copy Markdown
Contributor

I feel convinced, one thought that this stirred (and this is my own ignorance showing) but how does windowing work in terms of LZW compression? Do you need to start decompressing from the start of a strip even if you are going to start somewhere in the middle of the strip because the user supplied some offset?

I think it would be good to expose strips and tiles at some point as an "expert" interface, but I would also like users to be able to work with the tiff library directly so let's keep this and add in the caching layer :)) Sorry for the conflicts, do you want me to take care of those?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants