parameters used in cell type annotation tutorial #174

yueming-ding · 2024-03-28T20:59:48Z

@subercui I have a question about your cell type annotation tutorial: https://github.com/bowang-lab/scGPT/blob/main/tutorials/Tutorial_Annotation.ipynb. You use "normalize_total=1e4" in the following Preprocessor call:
preprocessor = Preprocessor(
    use_key="X",  # the key in adata.layers to use as raw data
    filter_gene_by_counts=filter_gene_by_counts,  # step 1
    filter_cell_by_counts=False,  # step 2
    normalize_total=1e4,  # 3. whether to normalize the raw data and to what sum
    result_normed_key="X_normed",  # the key in adata.layers to store the normalized data
    log1p=data_is_raw,  # 4. whether to log1p the normalized data
    result_log1p_key="X_log1p",
    subset_hvg=False,  # 5. whether to subset the raw data to highly variable genes
    hvg_flavor="seurat_v3" if data_is_raw else "cell_ranger",
    binning=n_bins,  # 6. whether to bin the raw data and to what number of bins
    result_binned_key="X_binned",  # the key in adata.layers to store the binned data
)

However, I checked the data you used. In both datasets (c_data.h5ad and filtered_ms_adata.h5ad), adata.X already holds log1p-transformed values, not raw counts. I think the correct setting is normalize_total=False rather than normalize_total=1e4. Could you explain why normalize_total=1e4 is used here when adata.X is log1p data? With that setting, the log1p values themselves get normalized by total counts. We found that the X_binned values and the final metrics (accuracy, precision, etc.) differ depending on whether normalize_total=1e4 or normalize_total=False is used for these datasets.
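For reference, here is a minimal sketch of the kind of check I mean, written in plain Python on a toy matrix rather than on the actual h5ad files. The helpers `looks_like_raw_counts` and `normalize_total` are hypothetical names for illustration only and are not part of scGPT or scanpy. The heuristic assumes raw counts are non-negative integers, so it only tells you that data is not raw; it cannot confirm that the data is specifically log1p-transformed.

```python
import math


def looks_like_raw_counts(rows, tol=1e-6):
    """Heuristic (assumption): raw counts are non-negative integers."""
    return all(
        v >= 0 and abs(v - round(v)) <= tol
        for row in rows
        for v in row
    )


def normalize_total(rows, target_sum=1e4):
    """Hypothetical stand-in: scale each row (cell) to sum to target_sum."""
    out = []
    for row in rows:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0
        out.append([v * scale for v in row])
    return out


# Toy data: two cells, four genes (made-up values, not from the tutorial data).
raw = [[0, 3, 1, 6], [2, 0, 0, 8]]
logged = [[math.log1p(v) for v in row] for row in raw]

raw_is_raw = looks_like_raw_counts(raw)
logged_is_raw = looks_like_raw_counts(logged)

# Choose the setting from the check instead of hard-coding 1e4.
normalize_arg = 1e4 if logged_is_raw else False

# What normalize_total=1e4 does to already-logged values: each cell's
# log1p values are rescaled so that they sum to 1e4.
renormed = normalize_total(logged, target_sum=1e4)

print(raw_is_raw, logged_is_raw, normalize_arg)
```

On the real data the same idea would be applied to adata.X (converted to a dense array if it is sparse) before constructing the Preprocessor.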
