Relevant section from the CODEX preprint:
A 3D segmentation algorithm was therefore created to combine information from the nuclear staining and a ubiquitous membrane marker (in this case CD45) to define single-cell boundaries in crowded images such as lymphoid tissues. For each segmented object (i.e., cell) a marker expression profile, as well as the identities of the nearby neighbors were recorded (using Delaunay triangulation)
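The recording step in the excerpt (a marker profile per segmented cell, plus its neighbors from a Delaunay triangulation) can be sketched roughly as below. This is a minimal sketch, not the CODEX pipeline. It assumes a labelled mask (0 = background) and a marker stack with one channel per marker. It uses each cell's mean intensity per channel as its "expression profile", which is an assumption because the excerpt doesn't say how profiles are summarised. It treats two cells as neighbors when their centroids share a Delaunay simplex. `cell_profiles_and_neighbors` is a hypothetical name. The code uses NumPy and `scipy.spatial.Delaunay`.

```python
import numpy as np
from scipy.spatial import Delaunay


def cell_profiles_and_neighbors(labels, markers):
    """Hypothetical helper: per-cell marker profiles plus Delaunay neighbors.

    labels:  integer array, (Y, X) or (Z, Y, X); 0 = background, >0 = cell id
    markers: float array of shape (C, *labels.shape), one channel per marker
    """
    flat = labels.ravel().astype(np.int64)
    ids = np.unique(flat)
    ids = ids[ids != 0]
    n_bins = int(ids.max()) + 1
    counts = np.bincount(flat, minlength=n_bins)[ids]

    def per_cell_mean(values):
        # Sum the values inside each label, then divide by that label's voxel count.
        return np.bincount(flat, weights=values.ravel(), minlength=n_bins)[ids] / counts

    # Assumption: mean intensity per channel serves as the expression profile.
    profiles = np.stack([per_cell_mean(ch) for ch in markers], axis=1)  # (N, C)
    # Centroid of each cell in voxel coordinates, one column per image axis.
    grids = np.indices(labels.shape)
    centroids = np.stack([per_cell_mean(g) for g in grids], axis=1)  # (N, ndim)

    # Cells whose centroids share a Delaunay simplex are recorded as neighbors.
    tri = Delaunay(centroids)
    neighbors = {int(c): set() for c in ids}
    for simplex in tri.simplices:
        for a in simplex:
            for b in simplex:
                if a != b:
                    neighbors[int(ids[a])].add(int(ids[b]))
    return ids, profiles, centroids, neighbors
```

Delaunay needs at least `ndim + 1` cells that are not all coplanar. A single optical section should therefore be passed as a 2-D array rather than a one-slice 3-D stack. Edges along the convex hull can also link cells that are far apart, so a distance cutoff may be worth adding.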
Software
Expanding on that list a bit:
Models Specific to Medical Imaging
- U-Net (Example TF-based implementation) - This appears to be a real workhorse architecture in medical image segmentation (there are dozens of implementations in TensorFlow and Caffe).
- V-Net - A TensorFlow implementation of a 3-D extension of the U-Net
- NiftyNet (Site) - "NiftyNet is a TensorFlow-based open-source convolutional neural networks (CNN) platform for research in medical image analysis and image-guided therapy."
- If we have to retrain an architecture for segmentation, I imagine this would be a top choice.
- Supports 2-D, 2.5-D, 3-D, 4-D inputs
- It has a Model Zoo, but nothing in it yet for our modality, or anything even close.
- [Original Publication](https://arxiv.org/abs/1709.03485)
Generic Architectures
- DeepLab (Google Research Post) - A Google Research project in the vein of Detectron
- My gut says we'd never have enough data to train big general-purpose models like these, but who knows.
- SegNet - Another generic architecture for semantic segmentation, which I only mention because it was brought up alongside U-Nets in this webinar on advances in medical image analysis.
Comments from @nsamusik on some things to keep in mind:
My main thought at this point is that the segmentation itself is just the first step, there also has to be a second step, where cell boundaries are optimized concomitantly with estimating the single-cell expression vectors. This way both the optimized cell boundaries and the expression data will likely look more accurate.
As for the benchmarking, I am happy to share a hand-labelled dataset that I generated for the CODEX paper revisions. Each TIFF is matched with a TXT file that contains the coordinates of hand-labelled cell centers (X, Y, Z). There are no cell outlines labelled here, just the centers. To assess the segmentation quality, I computed several measures:
- R = Recall (% of hand-labelled centers that ended up within a segmented cell region)
- S = Singlets (of those, the % that ended up in a cell region containing exactly 1 hand-labelled center)
- FPR = False positive rate (% of cell regions without a hand-labelled center)
Then I combined the three in a harmonic mean: 3/(1/R + 1/S + 1/(1-FPR))
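A rough sketch of how the three measures and their harmonic mean could be computed from a label mask and the hand-labelled centers follows. It rests on several assumptions. The mask is indexed `[z, y, x]` with 0 as background. Centers arrive as `(x, y, z)` tuples in the TXT column order and are rounded to the nearest voxel. Rates are returned as fractions rather than percentages. The score is set to 0 when any term is 0. `segmentation_scores` is a hypothetical name, not @nsamusik's actual code.

```python
from collections import Counter

import numpy as np


def segmentation_scores(labels, centers_xyz):
    """Hypothetical helper: score a segmentation against hand-labelled centers.

    labels:      integer array indexed [z, y, x]; 0 = background, >0 = cell id
    centers_xyz: iterable of (x, y, z) coordinates, in the TXT column order
    Returns (R, S, FPR, harmonic_mean), with rates as fractions in [0, 1].
    """
    # Label under each center; note the (x, y, z) -> [z, y, x] index flip.
    hits = [int(labels[int(round(z)), int(round(y)), int(round(x))])
            for x, y, z in centers_xyz]
    inside = [h for h in hits if h != 0]
    per_cell = Counter(inside)  # number of centers falling in each cell

    # R: fraction of centers that landed inside any segmented cell.
    recall = len(inside) / len(hits)
    # S: of those, fraction whose cell holds exactly one center.
    singlets = (sum(1 for h in inside if per_cell[h] == 1) / len(inside)
                if inside else 0.0)
    # FPR: fraction of segmented cells that contain no center at all.
    cell_ids = set(np.unique(labels).tolist()) - {0}
    fpr = len(cell_ids - set(per_cell)) / len(cell_ids) if cell_ids else 0.0

    # Harmonic mean 3 / (1/R + 1/S + 1/(1 - FPR)); assumed 0 if any term is 0.
    terms = [recall, singlets, 1.0 - fpr]
    score = 0.0 if min(terms) == 0 else 3.0 / sum(1.0 / t for t in terms)
    return recall, singlets, fpr, score
```

For example, suppose there are three cells and four centers. Two centers fall in cell 1, one falls in cell 2, one falls on background, and cell 3 has none. That gives R = 3/4, S = 1/3 and FPR = 1/3, for a harmonic mean of 18/35 (about 0.514).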
Here's the link:
https://drive.google.com/open?id=1wUNaZ5dv2mDn_wwcSXlnfof6SwoQmlsq