Kassandra does not seem to find all of its feature genes among the 12-13k genes in my expression matrix #12

Hi,
I used the Kassandra model to predict cell types from normalized TPM data, but it is not working as expected. The model cannot find some of the genes it uses as features among the 12-13K genes in my matrix. How many genes must overlap for the Kassandra model to work?

```
import pandas as pd

exprDis = pd.read_csv('~/Users/us/data/transcriptomics.csv', index_col=0)
exprDis.index.name = 'Gene'
exprDis
exprDis = renorm_expressions(exprDis, '/Users/us/data/data/genes_in_expression.txt')
exprDis

# The code above did not raise any error; it displayed the data frame as expected. The next call fails:
preds = model.predict(exprDis) * 100

ValueError                                Traceback (most recent call last)
Cell In[23], line 1
----> 1 preds = model.predict(expr) * 100

File ~/Users/us/Kassandra/core/model.py:154, in DeconvolutionModel.predict(self, expr, use_l2, add_other, other_coeff)
    147 def predict(self, expr, use_l2=False, add_other=True, other_coeff=0.073468):
    148     """
    149     Prediction pipeline for the model.
    150     :param expr: pd df with samples in columns and genes in rows
    151     :param predict_cells: If RNA fractions to be recalculated to cells fractions.
    152     :return: pd df with predictions for cell types in rows and samples in columns.
    153     """
--> 154     self.check_expressions(expr)
    155     expr = self.renormalize_expr(expr)
    156     preds = self.predict_l2(expr)

File ~/Users/us/Kassandra/core/model.py:171, in DeconvolutionModel.check_expressions(self, expr)
    169 diff = set(self.cell_types.genes).difference(set(expr.index))
    170 if diff:
--> 171     raise ValueError("EXPRESSION MATRIX HAS TO CONTAIN AT LEAST ALL THE GENES THAT ARE USED AS A FEATURES")
    172 diff = set(self.cell_types.genes).symmetric_difference(set(expr.index))
    173 if not diff:

ValueError: EXPRESSION MATRIX HAS TO CONTAIN AT LEAST ALL THE GENES THAT ARE USED AS A FEATURES
```
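
To see which genes trigger the error, the check shown in the traceback (`set(self.cell_types.genes).difference(set(expr.index))`) can be reproduced outside the model. The following is a minimal sketch on toy data, not part of Kassandra: the helper name `missing_feature_genes` and the gene names are hypothetical, and in the real session the feature list would presumably come from `model.cell_types.genes` and the matrix would be `exprDis`.

```python
import pandas as pd


def missing_feature_genes(feature_genes, expr):
    """Return the sorted feature genes that are absent from expr's row index.

    Hypothetical helper: it mirrors the set difference seen in the traceback.
    """
    return sorted(set(feature_genes).difference(set(expr.index)))


# Toy stand-ins (assumptions for illustration only).
feature_genes = ["CD3E", "CD19", "MS4A1"]
expr = pd.DataFrame(
    {"sample_1": [10.0, 0.5], "sample_2": [8.0, 1.5]},
    index=pd.Index(["CD3E", "CD19"], name="Gene"),
)

missing = missing_feature_genes(feature_genes, expr)
print(f"{len(missing)} of {len(feature_genes)} feature genes are missing: {missing}")
```

Because `check_expressions` raises whenever this difference is non-empty, the traceback suggests that every feature gene has to be present in the matrix index, not just a fraction of them. If the list of missing genes is long, it may be worth checking whether the index uses the same identifier type as the model (gene symbols versus Ensembl IDs, for example); that is a possibility to verify, not a confirmed cause.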
