IOError: Unexpected end of stream while "Saving geometries to parquet files" #57

@dpschreiner

Hello, is there a way to get more debugging info for this? A file called cell_boundaries_sf.parquet has indeed been created (currently 42 MB).

I'm running on a compute node with no outside network.

>  sfes[[sid]] <- readXenium(data_dir=ddirs[[sid]], sample_id=sid,
+                           segmentations=c('cell'), flip='none',
+                           row.names='symbol', add_molecules=FALSE,
+                           BPPARAM=MulticoreParam(workers=16))

>>> Cell segmentations are found in `.parquet` file(s)
>>> Reading cell segmentations
>>> Making POLYGON cell geometries
>>> Checking polygon validity
>>> Saving geometries to parquet files
Error: IOError: Unexpected end of stream
Timing stopped at: 70.25 2.862 75.3
> traceback()
8: stop(e)
7: value[[3L]](cond)
6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
5: tryCatchList(expr, classes, parentenv, handlers)
4: tryCatch(reader$ReadTable(), error = read_compressed_error)
3: arrow::read_parquet(fn_meta)
2: readXenium(data_dir = ddirs[[sid]], sample_id = sid, segmentations = c("cell"),
       flip = "none", row.names = "symbol", add_molecules = FALSE,
       BPPARAM = MulticoreParam(workers = 16))
1: system.time(sfes[[sid]] <- readXenium(data_dir = ddirs[[sid]],
       sample_id = sid, segmentations = c("cell"), flip = "none",
       row.names = "symbol", add_molecules = FALSE, BPPARAM = MulticoreParam(workers = 16)))

The traceback suggests the error is not raised while writing the geometries but while reading the cell metadata, in the arrow::read_parquet(fn_meta) call. The cells.parquet file seems valid:

$ pip install parquet-cli
$ parq cells.parquet -s

 # Schema 
 <pyarrow._parquet.ParquetSchema object at 0x7fb4e1c3f480>
required group field_id=-1 root {
  optional binary field_id=-1 cell_id (String);
  optional double field_id=-1 x_centroid;
  optional double field_id=-1 y_centroid;
  optional int64 field_id=-1 transcript_counts;
  optional int64 field_id=-1 control_probe_counts;
  optional int64 field_id=-1 genomic_control_counts;
  optional int64 field_id=-1 control_codeword_counts;
  optional int64 field_id=-1 unassigned_codeword_counts;
  optional int64 field_id=-1 deprecated_codeword_counts;
  optional int64 field_id=-1 total_counts;
  optional double field_id=-1 cell_area;
  optional double field_id=-1 nucleus_area;
  optional int64 field_id=-1 nucleus_count;
  optional binary field_id=-1 segmentation_method (String);
}

$ parq cells.parquet --tail
           cell_id   x_centroid   y_centroid  transcript_counts  \
407114  oinmkgca-1  9205.424924  4128.082317                201   
407115  oinmoddo-1  9217.751420  4118.034091                 34   
407116  oinmpmal-1  9211.946875  4125.975000                  6   
407117  oinndlpa-1  9205.680208  4145.841667                  3   
407118  oinnfkol-1  9212.738965  4138.211158                 80   
407119  oinnload-1  9223.546875  4120.291667                  4   
407120  oinnpdnc-1  9222.334486  4127.976770                 74   
407121  oinodnoj-1  9227.546875  4118.589286                 10   
407122  oinoflon-1  9209.463542  4150.187500                 27   
407123  oinokmkj-1  9239.280208  4116.797222                 39   

        control_probe_counts  genomic_control_counts  control_codeword_counts  \
407114                     0                       0                        0   
407115                     0                       0                        0   
407116                     0                       0                        0   
407117                     0                       0                        0   
407118                     0                       0                        0   
407119                     0                       0                        0   
407120                     0                       0                        0   
407121                     0                       0                        0   
407122                     0                       0                        0   
407123                     0                       0                        0   

        unassigned_codeword_counts  deprecated_codeword_counts  total_counts  \
407114                           0                           0           201   
407115                           0                           0            34   
407116                           0                           0             6   
407117                           0                           0             3   
407118                           0                           0            80   
407119                           0                           0             4   
407120                           0                           0            74   
407121                           0                           0            10   
407122                           0                           0            27   
407123                           0                           0            39   

        cell_area  nucleus_area  nucleus_count         segmentation_method  
407114       41.0           0.0              0  Imported Cell Segmentation  
407115       44.0           0.0              0  Imported Cell Segmentation  
407116       40.0           0.0              0  Imported Cell Segmentation  
407117       15.0           0.0              0  Imported Cell Segmentation  
407118      177.0           0.0              0  Imported Cell Segmentation  
407119       36.0           0.0              0  Imported Cell Segmentation  
407120      113.0           0.0              0  Imported Cell Segmentation  
407121       28.0           0.0              0  Imported Cell Segmentation  
407122       48.0           0.0              0  Imported Cell Segmentation  
407123       90.0           0.0              0  Imported Cell Segmentation
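
Since "Unexpected end of stream" often points to a truncated or partially written file, a lower-level structural check may help narrow things down. Below is a minimal sketch, assuming the standard unencrypted Parquet layout (a 4-byte `PAR1` magic at both the start and the end of the file, with a 4-byte little-endian footer length right before the trailing magic). The function name `check_parquet_framing` is hypothetical and not part of arrow or any package mentioned here. It only inspects the file framing and does not decode the compressed column pages, so a passing result does not rule out corruption inside the data.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # magic bytes of an unencrypted Parquet file


def check_parquet_framing(path):
    """Return (ok, message) describing whether the Parquet framing looks intact.

    Hypothetical helper: checks only the leading/trailing magic bytes and
    the declared footer length, not the contents of the row groups.
    """
    size = os.path.getsize(path)
    # Smallest possible layout: leading magic + footer length + trailing magic.
    if size < 12:
        return False, f"file too small ({size} bytes)"

    with open(path, "rb") as fh:
        head = fh.read(4)
        fh.seek(-8, os.SEEK_END)
        footer_len_bytes = fh.read(4)
        tail = fh.read(4)

    if head != PARQUET_MAGIC:
        return False, "missing leading PAR1 magic"
    if tail != PARQUET_MAGIC:
        return False, "missing trailing PAR1 magic (file may be truncated)"

    (footer_len,) = struct.unpack("<I", footer_len_bytes)
    # The footer must fit between the leading magic and the 8 trailing bytes.
    if footer_len == 0 or footer_len > size - 12:
        return False, f"implausible footer length {footer_len}"

    return True, f"framing looks intact (footer {footer_len} bytes, file {size} bytes)"
```

Calling `check_parquet_framing("cells.parquet")` and `check_parquet_framing("cell_boundaries_sf.parquet")` would show whether either file is missing its trailing magic, which would be consistent with a write that was cut short.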

I have been able to create an sf object from this segmentation by manually running arrow::read_parquet and then st_polygon on cell groups from the resulting data frame, though I did not reach the point of reading cells.parquet (I was only focused on the geometry with associated cell type annotations).

BTW, I saw there was another issue referencing proseg, and in this case I am using a Xenium outs dir created from a proseg (v3) segmentation (via conversion to Baysor style and then a new spaceranger run).
