For most `List` objects, there are two `FieldNode` objects in the `RecordBatchList`: one for the `List` itself and one for the underlying values. For some strange reason, the Arrow creators decided that strings are special, so an array of strings has only a single `FieldNode`, representing the array itself.
The only reason I can think of to do this would be a tiny bit of compression of the metadata. Consistency of the format seems like a horrible price to pay for that.
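To make the difference concrete, here is a toy model of how field nodes get counted. It is only a sketch: `ArrowType` and `field_nodes` are made-up names for illustration, not part of any Arrow library. The one assumption is that field nodes are emitted by walking the type tree depth-first, one node per type, and that a string type has no child type in the schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrowType:
    """Hypothetical stand-in for a schema type; not a real Arrow class."""
    name: str
    children: tuple = ()  # child types, empty for non-nested types


def field_nodes(t):
    """Return one entry per field node, in depth-first pre-order."""
    nodes = [t.name]
    for child in t.children:
        nodes.extend(field_nodes(child))
    return nodes


int32 = ArrowType("Int32")
utf8 = ArrowType("Utf8")  # strings carry no child type in this model

list_of_int32 = ArrowType("List", (int32,))
list_of_utf8 = ArrowType("List", (utf8,))

print(field_nodes(list_of_int32))  # ['List', 'Int32']
print(field_nodes(utf8))           # ['Utf8']
print(field_nodes(list_of_utf8))   # ['List', 'Utf8']
```

So a reader that treats a string array as "a list of bytes" would expect two nodes and find only one, which is exactly the inconsistency described above.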
Anyway, I'm keeping this issue open because I'm worried that this problem, namely that arrays cannot be constructed from the underlying data alone but require additional metadata, might be an ongoing issue.