Description
Hello, is there a way to get more debugging information for the error below? A file called `cell_boundaries_sf.parquet` has indeed been created (currently 42 MB).
I'm running on a compute node with no outside network access.
```r
> sfes[[sid]] <- readXenium(data_dir=ddirs[[sid]], sample_id=sid,
+ segmentations=c('cell'), flip='none',
+ row.names='symbol', add_molecules=FALSE,
+ BPPARAM=MulticoreParam(workers=16))
>>> Cell segmentations are found in `.parquet` file(s)
>>> Reading cell segmentations
>>> Making POLYGON cell geometries
>>> Checking polygon validity
>>> Saving geometries to parquet files
Error: IOError: Unexpected end of stream
Timing stopped at: 70.25 2.862 75.3
> traceback()
8: stop(e)
7: value[[3L]](cond)
6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
5: tryCatchList(expr, classes, parentenv, handlers)
4: tryCatch(reader$ReadTable(), error = read_compressed_error)
3: arrow::read_parquet(fn_meta)
2: readXenium(data_dir = ddirs[[sid]], sample_id = sid, segmentations = c("cell"),
       flip = "none", row.names = "symbol", add_molecules = FALSE,
       BPPARAM = MulticoreParam(workers = 16))
1: system.time(sfes[[sid]] <- readXenium(data_dir = ddirs[[sid]],
       sample_id = sid, segmentations = c("cell"), flip = "none",
       row.names = "symbol", add_molecules = FALSE, BPPARAM = MulticoreParam(workers = 16)))
```
The traceback suggests the error actually happens while reading the cell metadata (frame 3, `arrow::read_parquet(fn_meta)`), not while writing the geometries. The `cells.parquet` file itself seems valid:
```console
$ pip install parquet-cli
$ parq cells.parquet -s

 # Schema
 <pyarrow._parquet.ParquetSchema object at 0x7fb4e1c3f480>
required group field_id=-1 root {
  optional binary field_id=-1 cell_id (String);
  optional double field_id=-1 x_centroid;
  optional double field_id=-1 y_centroid;
  optional int64 field_id=-1 transcript_counts;
  optional int64 field_id=-1 control_probe_counts;
  optional int64 field_id=-1 genomic_control_counts;
  optional int64 field_id=-1 control_codeword_counts;
  optional int64 field_id=-1 unassigned_codeword_counts;
  optional int64 field_id=-1 deprecated_codeword_counts;
  optional int64 field_id=-1 total_counts;
  optional double field_id=-1 cell_area;
  optional double field_id=-1 nucleus_area;
  optional int64 field_id=-1 nucleus_count;
  optional binary field_id=-1 segmentation_method (String);
}
```
```console
$ parq cells.parquet --tail
           cell_id   x_centroid   y_centroid  transcript_counts  \
407114  oinmkgca-1  9205.424924  4128.082317                201
407115  oinmoddo-1  9217.751420  4118.034091                 34
407116  oinmpmal-1  9211.946875  4125.975000                  6
407117  oinndlpa-1  9205.680208  4145.841667                  3
407118  oinnfkol-1  9212.738965  4138.211158                 80
407119  oinnload-1  9223.546875  4120.291667                  4
407120  oinnpdnc-1  9222.334486  4127.976770                 74
407121  oinodnoj-1  9227.546875  4118.589286                 10
407122  oinoflon-1  9209.463542  4150.187500                 27
407123  oinokmkj-1  9239.280208  4116.797222                 39

        control_probe_counts  genomic_control_counts  control_codeword_counts  \
407114                     0                       0                        0
407115                     0                       0                        0
407116                     0                       0                        0
407117                     0                       0                        0
407118                     0                       0                        0
407119                     0                       0                        0
407120                     0                       0                        0
407121                     0                       0                        0
407122                     0                       0                        0
407123                     0                       0                        0

        unassigned_codeword_counts  deprecated_codeword_counts  total_counts  \
407114                           0                           0           201
407115                           0                           0            34
407116                           0                           0             6
407117                           0                           0             3
407118                           0                           0            80
407119                           0                           0             4
407120                           0                           0            74
407121                           0                           0            10
407122                           0                           0            27
407123                           0                           0            39

        cell_area  nucleus_area  nucleus_count         segmentation_method
407114       41.0           0.0              0  Imported Cell Segmentation
407115       44.0           0.0              0  Imported Cell Segmentation
407116       40.0           0.0              0  Imported Cell Segmentation
407117       15.0           0.0              0  Imported Cell Segmentation
407118      177.0           0.0              0  Imported Cell Segmentation
407119       36.0           0.0              0  Imported Cell Segmentation
407120      113.0           0.0              0  Imported Cell Segmentation
407121       28.0           0.0              0  Imported Cell Segmentation
407122       48.0           0.0              0  Imported Cell Segmentation
407123       90.0           0.0              0  Imported Cell Segmentation
```
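Arrow's `Unexpected end of stream` suggests the reader ran out of bytes earlier than the file's metadata led it to expect, so one cheap check (a suggestion, not something `readXenium` does) is to confirm the Parquet container framing of the files involved: `cells.parquet` and the freshly written `cell_boundaries_sf.parquet`. The sketch below is stdlib-only Python and relies only on the layout in the Parquet format spec: a file begins with the magic bytes `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1` again. The function name `check_parquet_framing` is hypothetical. Passing this check does not prove the column chunks are intact, only that the file is not obviously truncated.

```python
"""Minimal Parquet framing check (stdlib only).

Assumption: this only inspects the container layout from the Parquet format
spec (leading "PAR1", trailing 4-byte little-endian footer length, trailing
"PAR1"). It does not decode the Thrift footer or any column data.
"""
import os
import struct
import sys

MAGIC = b"PAR1"


def check_parquet_framing(path):
    """Return a list of problems; an empty list means the framing looks intact."""
    size = os.path.getsize(path)
    # Smallest possible layout: magic (4) + footer length (4) + magic (4).
    if size < 12:
        return [f"file is only {size} bytes, too small to be Parquet"]
    problems = []
    with open(path, "rb") as fh:
        head = fh.read(4)
        fh.seek(size - 8)
        tail = fh.read(8)
    if head != MAGIC:
        problems.append(f"leading magic is {head!r}, expected {MAGIC!r}")
    footer_len = struct.unpack("<I", tail[:4])[0]
    if tail[4:] != MAGIC:
        # Missing trailing magic is the classic sign of a truncated write.
        problems.append(f"trailing magic is {tail[4:]!r}, expected {MAGIC!r} (truncated?)")
    elif footer_len > size - 12:
        # The footer must fit between the leading magic and the last 8 bytes.
        problems.append(f"footer length {footer_len} exceeds available {size - 12} bytes")
    return problems


if __name__ == "__main__":
    for p in sys.argv[1:]:
        issues = check_parquet_framing(p)
        print(p, "OK" if not issues else "; ".join(issues))
```

Running it from the Xenium output directory on the same compute node, e.g. `python check_framing.py cells.parquet cell_boundaries_sf.parquet` (script name hypothetical), checks exactly the files that R sees there.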
I have been able to create an sf object from this segmentation manually (running `arrow::read_parquet` and then `st_polygon` on cell groups from the resulting data frame), though I did not get as far as reading `cells.parquet`; I was only focused on the geometry with the associated cell type annotations.
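The manual workaround above amounts to grouping boundary vertices by cell and closing each ring before handing it to a polygon constructor such as `st_polygon`. Here is a stdlib-only Python sketch of that grouping step, assuming the boundary table has already been read into `(cell_id, vertex_x, vertex_y)` tuples in boundary order (column names assumed from the usual Xenium `cell_boundaries.parquet` layout). It also flags rings that are too short or have zero area, which a polygon validity check would typically reject. The helper names are hypothetical, and this is not how `readXenium` itself is implemented.

```python
"""Group boundary vertices into closed rings, one per cell (stdlib only).

Assumption: rows are (cell_id, vertex_x, vertex_y) tuples ordered along each
cell boundary, as in a Xenium-style cell_boundaries.parquet. Illustrative
only; not readXenium's implementation.
"""


def rings_by_cell(rows):
    """Map cell_id -> list of (x, y) vertices, closing each ring if needed."""
    rings = {}  # dicts keep insertion order, so cells stay in file order
    for cell_id, x, y in rows:
        rings.setdefault(cell_id, []).append((x, y))
    for ring in rings.values():
        if ring[0] != ring[-1]:
            ring.append(ring[0])  # a polygon ring must start and end on the same vertex
    return rings


def shoelace_area(ring):
    """Absolute area of a closed ring via the shoelace formula."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def degenerate_cells(rings):
    """Cells whose ring has < 4 points (3 distinct + closure) or zero area."""
    return [cid for cid, ring in rings.items()
            if len(ring) < 4 or shoelace_area(ring) == 0.0]


if __name__ == "__main__":
    rows = [("a", 0, 0), ("a", 2, 0), ("a", 2, 2), ("a", 0, 2),
            ("b", 1, 1), ("b", 3, 1)]
    rings = rings_by_cell(rows)
    print(shoelace_area(rings["a"]))  # 4.0
    print(degenerate_cells(rings))    # ['b']
```

Counting degenerate rings this way before building sf geometries can show whether a handful of malformed cells from the imported segmentation are worth looking at separately.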
BTW, I saw there was another issue referencing Proseg. In this case I am using a Xenium `outs` directory created from a Proseg (v3) segmentation (via conversion to Baysor format and then a new spaceranger run).