@alamb
Contributor

@alamb alamb commented Feb 11, 2026

Related to

I wanted to get an end-to-end reproducer (i.e. one that uses an actual Parquet file) that triggers the error reported in #9370 by @jonded94.

This is what I have so far (though it requires creating a column chunk with both V1 and V2 page headers)

I could not simplify the test any further.

Here is the file produced:
v1_v2_mixed.parquet.zip

Fascinatingly, it seems to work just fine without the row selection:

(venv) andrewlamb@Andrews-MacBook-Pro-3:~/Software/arrow-rs$ datafusion-cli
DataFusion CLI v52.1.0
> select * from '/tmp/v1_v2_mixed.parquet';
+---------------------+
| s                   |
+---------------------+
| {a: [10, 20], b: 1} |
| {a: [30, 40], b: 2} |
| {a: [50, 60], b: 3} |
| {a: [70, 80], b: 4} |
+---------------------+
4 row(s) fetched.
Elapsed 0.010 seconds.
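
For context on what "row selection" means here: conceptually it is a run-length list of skip/select instructions applied to the rows of a row group. The following is only a std-only toy model of that idea, not arrow-rs code; `Selector` and `apply_selection` are hypothetical names made up for illustration, and the sketch ignores pages and page headers entirely (which is where the reported error actually arises).

```rust
/// Hypothetical stand-in for one run in a row selection:
/// either skip the next `n` rows or select the next `n` rows.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Selector {
    Skip(usize),
    Select(usize),
}

/// Apply the runs in order to `rows`, returning only the selected rows.
/// Runs that extend past the end of `rows` are clamped.
fn apply_selection<T: Clone>(rows: &[T], selection: &[Selector]) -> Vec<T> {
    let mut out = Vec::new();
    let mut pos = 0;
    for run in selection {
        match *run {
            // Skipping only advances the cursor.
            Selector::Skip(n) => pos = (pos + n).min(rows.len()),
            // Selecting copies the rows under the cursor, then advances.
            Selector::Select(n) => {
                let end = (pos + n).min(rows.len());
                out.extend_from_slice(&rows[pos..end]);
                pos = end;
            }
        }
    }
    out
}
```

Under this model, applying `[Skip(1), Select(2), Skip(1)]` to the `b` values `[1, 2, 3, 4]` from the file above keeps rows `2` and `3`. In the real reader a skip can land in the middle of a data page, so the decoder has to handle partially skipped pages, which the plain `select *` query never exercises.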

And indeed https://parquet-viewer.xiangpeng.systems/ shows the different headers

[Screenshot, 2026-02-11: viewer output showing the different page headers]

@jonded94
Contributor

@alamb please have a look at #9399.

I incorrectly stated that this was about mixing v1 and v2 data pages. There may well be other bugs, as your test in this PR seems to have discovered (?), but the problem in my original issue #9370 can actually be reproduced with v2 data pages only.

@alamb
Contributor Author

alamb commented Feb 12, 2026

#9399 is better, let's continue there instead

@alamb alamb closed this Feb 12, 2026