exprDis = pd.read_csv('~/Users/us/data/transcriptomics.csv', index_col=0)
exprDis.index.name = 'Gene'
exprDis
exprDis = renorm_expressions(exprDis, '/Users/us/data/data/genes_in_expression.txt')
exprDis
# The code above ran without errors and displayed the data frame as expected; the next call is the one that fails
preds = model.predict(exprDis) * 100
ValueError Traceback (most recent call last)
Cell In[23], line 1
----> 1 preds = model.predict(expr) * 100
File ~/Users/us/Kassandra/core/model.py:154, in DeconvolutionModel.predict(self, expr, use_l2, add_other, other_coeff)
147 def predict(self, expr, use_l2=False, add_other=True, other_coeff=0.073468):
148 """
149 Prediction pipeline for the model.
150 :param expr: pd df with samples in columns and genes in rows
151 :param predict_cells: If RNA fractions to be recalculated to cells fractions.
152 :return: pd df with predictions for cell types in rows and samples in columns.
153 """
--> 154 self.check_expressions(expr)
155 expr = self.renormalize_expr(expr)
156 preds = self.predict_l2(expr)
File ~/Users/us/Kassandra/core/model.py:171, in DeconvolutionModel.check_expressions(self, expr)
169 diff = set(self.cell_types.genes).difference(set(expr.index))
170 if diff:
--> 171 raise ValueError("EXPRESSION MATRIX HAS TO CONTAIN AT LEAST ALL THE GENES THAT ARE USED AS A FEATURES")
172 diff = set(self.cell_types.genes).symmetric_difference(set(expr.index))
173 if not diff:
ValueError: EXPRESSION MATRIX HAS TO CONTAIN AT LEAST ALL THE GENES THAT ARE USED AS A FEATURES
Hi,
I used the Kassandra model to predict cell types from normalized TPM data, but it is not working as expected. My matrix has 12-13K genes, and the model cannot find some of the genes it uses as features among them. How many genes need to overlap for the Kassandra model to work?
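Going by the traceback, `check_expressions` raises as soon as `set(self.cell_types.genes).difference(set(expr.index))` is non-empty, so it looks like every feature gene has to be in the matrix index, not just a certain fraction. Listing the missing genes may help narrow this down. Below is a minimal sketch of that check on toy data; `missing_feature_genes` is a hypothetical helper (not part of Kassandra), and the gene names are placeholders.

```python
import pandas as pd


def missing_feature_genes(expr: pd.DataFrame, feature_genes) -> list:
    """Return the feature genes that are absent from the matrix index, sorted.

    Hypothetical helper: it mirrors the set difference shown in the
    traceback (model.py line 169) but is not part of Kassandra itself.
    """
    return sorted(set(feature_genes).difference(expr.index))


# Toy expression matrix: genes in rows, samples in columns.
expr_toy = pd.DataFrame(
    {"sample1": [1.0, 2.0, 3.0]},
    index=["CD3E", "CD19", "MS4A1"],
)
expr_toy.index.name = "Gene"

# Placeholder feature list standing in for model.cell_types.genes.
toy_features = ["CD3E", "CD19", "CD8A", "NKG7"]

missing = missing_feature_genes(expr_toy, toy_features)
print(f"{len(missing)} of {len(set(toy_features))} feature genes missing: {missing}")
```

With the real objects this would presumably be called as `missing_feature_genes(exprDis, model.cell_types.genes)`, assuming that attribute is accessible from outside the class the way the traceback suggests. If the missing genes turn out to be present under a different symbol or ID type, the mismatch would be in the naming rather than in the data.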