@subercui I have a question about your cell type annotation tutorial: https://github.com/bowang-lab/scGPT/blob/main/tutorials/Tutorial_Annotation.ipynb. You use normalize_total=1e4 in the following Preprocessor call:
preprocessor = Preprocessor(
    use_key="X",  # the key in adata.layers to use as raw data
    filter_gene_by_counts=filter_gene_by_counts,  # step 1
    filter_cell_by_counts=False,  # step 2
    normalize_total=1e4,  # 3. whether to normalize the raw data and to what sum
    result_normed_key="X_normed",  # the key in adata.layers to store the normalized data
    log1p=data_is_raw,  # 4. whether to log1p the normalized data
    result_log1p_key="X_log1p",
    subset_hvg=False,  # 5. whether to subset the raw data to highly variable genes
    hvg_flavor="seurat_v3" if data_is_raw else "cell_ranger",
    binning=n_bins,  # 6. whether to bin the raw data and to what number of bins
    result_binned_key="X_binned",  # the key in adata.layers to store the binned data
)
But I checked the data you used. In both datasets (c_data.h5ad and filtered_ms_adata.h5ad), adata.X already holds log1p-transformed data, not raw counts. I think the correct setting is normalize_total=False rather than normalize_total=1e4. Could you explain why normalize_total=1e4 is used here when adata.X is already log1p-transformed? With that setting, the log1p values themselves get normalized by their per-cell totals. We found that the X_binned values and the final metrics (accuracy, precision, etc.) differ depending on whether normalize_total=1e4 or normalize_total=False is used for these datasets.
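To make the concern concrete, here is a minimal NumPy sketch of what I mean. It is not the scGPT or scanpy code: `normalize_total` below is a hypothetical stand-in that scales each cell to a target sum, `looks_like_raw_counts` is a rough heuristic I am assuming for spotting raw counts, and the toy matrix is made up for illustration.

```python
import numpy as np


def looks_like_raw_counts(x):
    """Rough heuristic (assumption): raw counts are non-negative integers."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= 0) and np.allclose(x, np.round(x)))


def normalize_total(x, target_sum=1e4):
    """Hypothetical stand-in: scale each row (cell) so it sums to target_sum."""
    x = np.asarray(x, dtype=float)
    totals = x.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid dividing by zero for empty cells
    return x / totals * target_sum


# Toy raw count matrix (cells x genes), invented for illustration only.
raw = np.array(
    [
        [0, 1, 4, 10],
        [2, 0, 30, 100],
        [5, 5, 5, 50],
    ],
    dtype=float,
)

# What a stored, already-processed adata.X would look like:
# total-count normalization followed by log1p.
stored = np.log1p(normalize_total(raw))

# What happens if total-count normalization is applied again to the log1p values.
renormalized = normalize_total(stored)

print("raw looks like counts:   ", looks_like_raw_counts(raw))
print("stored looks like counts:", looks_like_raw_counts(stored))
print("values unchanged by second normalization:", np.allclose(stored, renormalized))
```

In this toy case the stored matrix fails the raw-count check, and applying the normalization a second time rescales the log1p values so that each cell sums to 1e4, which is a different matrix from the one stored in adata.X. Whether and how that difference carries through to X_binned depends on the actual binning code, which this sketch does not reproduce.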