Package to process images and their features from .cyz files from the CytoSense and upload them to EcoTaxa.
NB: As for all things Python, you should preferably install CytoProcess within a Python venv/conda environment. The package is tested with Python 3.11 and should therefore work with this or a more recent version. To create a conda environment, use
conda create -n cytoprocess python=3.11
conda activate cytoprocess
Then install the stable version with
pip install cytoprocess
or the development version with
pip install git+https://github.com/jiho/cytoprocess.git
The Python package includes a command line tool, which should become available from within a terminal. To try it and output the help message, run
cytoprocess
CytoProcess depends on Cyz2Json. To install it, run
cytoprocess install
CytoProcess uses the concept of "project". A project corresponds conceptually to a cruise, a time series, etc. Practically, it is a directory with a specific set of subdirectories that contain all files related to the cruise/time series/etc. It corresponds to a single EcoTaxa project.
Each .cyz file is considered as a "sample" (and will correspond to an EcoTaxa sample).
A project is organised like so
my_project/
    config      configuration files
    raw         source .cyz files
    meta        file storing manually-provided metadata for each sample (lat, lon, etc.)
    work        data extracted by the various processing steps
        <sample_id_1>                       in one folder per sample
            converted_data.json             file converted from .cyz by Cyz2Json
            cytometric_features.parquet     average cytometric measurement per image
            image_features.parquet          features computed on each image (area, etc.)
            images                          images with scale bar and mask for the particle
            metadata.parquet                instrument metadata extracted from the .json file
            pulses_plots                    plot of the pulse shapes of imaged particles
            pulses_summaries.parquet        polynomial summaries of the pulse shapes
        <sample_id_2>
        ...
    ecotaxa     .zip files ready for upload in EcoTaxa
    logs        logs of all commands executed on this project, split per day
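As a rough illustration of this layout, the following Python sketch checks that a directory contains the top-level subdirectories listed above. The helper name `missing_subdirs` is hypothetical and not part of CytoProcess; `cytoprocess create` is what actually sets up a project.

```python
from pathlib import Path

# Top-level subdirectories named in the layout above.
EXPECTED_SUBDIRS = ("config", "raw", "meta", "work", "ecotaxa", "logs")


def missing_subdirs(project_dir):
    """Return the expected subdirectories that are absent from project_dir.

    Hypothetical helper for illustration only; not part of CytoProcess.
    """
    root = Path(project_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

Calling `missing_subdirs("path/to/my_project")` returns an empty list when all six subdirectories exist.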
A CytoProcess command line looks like
cytoprocess --global-option command --command-option project_directory
To know which global options and which commands are available, use
cytoprocess --help
To know which options are available for a given command, use
cytoprocess command --help
To create a project, use
cytoprocess create path/to/my_project
Then copy/move the .cyz files that are relevant for this project into my_project/raw. If you have an archive of .cyz files organised differently, you should be able to symlink them into my_project/raw instead of copying them.
List available raw samples and create the meta/samples.csv file with
cytoprocess list path/to/my_project
Manually enter the required metadata (such as lon, lat, etc.) in the .csv file. You can add or remove columns as you see fit; use the --extra-fields (or -e) option to change the default columns added. The conventions follow those of EcoTaxa.
Then, perform all processing steps, for all samples, with default options
cytoprocess all path/to/my_project
If you want to know the details, or proceed manually, the steps behind all are:
# convert .cyz files into .json and create a placeholder for their metadata
cytoprocess convert path/to/project
# extract instrument provided metadata from each .json file
cytoprocess extract_meta path/to/project
# extract cytometric features for each imaged particle
cytoprocess extract_cyto path/to/project
# compute pulse shapes polynomial summaries for each imaged particle
cytoprocess summarise_pulses path/to/project
# extract images and their features
cytoprocess extract_images path/to/project
# prepare files for ecotaxa upload
cytoprocess prepare path/to/project
# upload them to EcoTaxa
cytoprocess upload path/to/project
To check how far along the processing of each sample is, you can use
cytoprocess status path/to/project
To process a subset of samples, use
cytoprocess --sample 'name_of_cyz_file' command path/to/project
which processes this single sample. Or
cytoprocess --sample '*foo*' command path/to/project
which processes all samples whose name contains foo.
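To make the pattern matching concrete, here is a small Python sketch of shell-style wildcard selection using the standard `fnmatch` module. It assumes that `--sample` patterns behave like shell globs, which the '*foo*' example suggests but which is not spelled out here; the sample names and the `select_samples` helper are made up for illustration.

```python
from fnmatch import fnmatchcase


def select_samples(names, pattern):
    """Return the names matching a shell-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Made-up sample names, for illustration only.
samples = ["station_foo_01", "foo_night", "station_bar_01"]
print(select_samples(samples, "*foo*"))           # ['station_foo_01', 'foo_night']
print(select_samples(samples, "station_bar_01"))  # ['station_bar_01']
```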
All commands will skip the processing of a given sample if the output is already present. To re-process and overwrite, use the --force option.
For metadata and cytometric features extraction (extract_meta and extract_cyto), information from the json file needs to be curated and translated into EcoTaxa metadata columns. This is defined in the configuration file my_project/config/config.yaml. It contains key: value pairs of the form json.fields.item.name: ecotaxa_name. To get the list of possible json fields, use the --list (or -l) option for extract_meta or extract_cyto; it will write a text file in config with all possibilities. You can then copy-paste them to config/config.yaml.
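To illustrate what a dotted key of the form json.fields.item.name refers to, the Python sketch below flattens a nested JSON-like dictionary into dotted paths and renames the selected ones. The field names, the target column names, and the flattening rule itself are assumptions for illustration; the authoritative list of fields is the one written by the --list option.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def apply_mapping(record, mapping):
    """Keep only the mapped fields and rename them to their target column."""
    flat = flatten(record)
    return {new: flat[old] for old, new in mapping.items() if old in flat}


# Made-up record and mapping, for illustration only.
record = {"instrument": {"name": "CytoSense", "settings": {"flow_rate": 2.5}}}
mapping = {
    "instrument.settings.flow_rate": "acq_flow_rate",
    "instrument.serial": "acq_serial",  # absent from the record, so skipped
}
print(apply_mapping(record, mapping))  # {'acq_flow_rate': 2.5}
```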
Even with all these fields available, the CytoSense does not record some relevant metadata such as latitude, longitude, and date of collection of each sample, which EcoTaxa needs to filter the data or export it to other databases. You should provide such fields manually by editing the meta/samples.csv file.
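Before preparing the upload, it can help to check that no sample is missing its position. The Python sketch below reports the data rows of a CSV in which a required column is empty. The column names `lat` and `lon` follow the examples mentioned earlier, but the `sample` column, the example values, and the `rows_missing` helper are assumptions; adapt them to the actual header of your samples.csv.

```python
import csv
import io


def rows_missing(csv_text, required=("lat", "lon")):
    """Return 1-based data row numbers where a required column is empty or absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for i, row in enumerate(reader, start=1):
        if any(not (row.get(col) or "").strip() for col in required):
            bad.append(i)
    return bad


# Made-up content, for illustration only.
example = "sample,lat,lon\nA,43.68,7.31\nB,,7.31\nC,43.70,\n"
print(rows_missing(example))  # [2, 3]
```

To check a real file, read it first, for example with `Path("my_project/meta/samples.csv").read_text()`, and pass the text to `rows_missing`.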
If you change this metadata or the mapping of fields in config.yaml and want to re-import the modified .tsv files into EcoTaxa, you can do so with
# re-generate the .tsv files with the corrected metadata
cytoprocess prepare --force path/to/project
# re-upload the .tsv only and use "Update metadata" mode
cytoprocess upload --update path/to/project
Because everything is stored in the EcoTaxa .zip files and can be re-generated from the .cyz files, you may want to remove the intermediate files, in work, as well as old log files, to reclaim disk space. For example, to remove intermediate files and log files older than 20 days
cytoprocess clean --older-than 20 path/to/project
Fork this repository, clone your fork.
Prepare your development environment by installing the dependencies within a conda environment
conda create -n cytoprocess python=3.11
conda activate cytoprocess
pip install -e .
This creates a cytoprocess.egg-info directory at the root of the package's directory. It is safely ignored by git (and you should ignore it too).
Now, either run commands as you normally would
cytoprocess --help
or call the module explicitly
python -m cytoprocess --help
Any edits made to the files are immediately reflected in the output (because the package was installed in "editable" mode: pip install -e ...; or is run directly as a module: python -m ...).