How to enable cell type annotation like previous AltAnalyze Version. #45

Open
ewijaya opened this issue Jun 8, 2020 · 4 comments

Comments

@ewijaya

ewijaya commented Jun 8, 2020

I have the following data downloadable here.

Now, I'm using the most recent version of AltAnalyze.

However, when I tried the following script:

ALTANALYZE=/home/ubuntu/storage2/Tools/altanalyze/AltAnalyze.py
/home/ubuntu/anaconda2/bin/python $ALTANALYZE \
    --runICGS yes \
    --expdir test_outdir \
    --platform RNASeq \
    --species Mm \
    --column_method hopach --rho 0.4 \
    --ExpressionCutoff 4 \
    --FoldDiff 3 \
    --SamplesDiffering 3 \
    --excludeCellCycle conservative

I cannot get this kind of plot, where the cell type is assigned on the left, as in the previous version of AltAnalyze.

IMG_20200605_130618

I have removed the old version and no longer know which previous version could create that plot.
Please advise how I can go about it.

@nsalomonis
Owner

nsalomonis commented Jun 8, 2020

Hi Edward,
As you note, when you run ICGS (currently version 2) with a command like the one you specified, the Guide3 results in the ICGS folder and the final NMF-defined clusters (marker genes visualized, typically with many more clusters) will also have cell-type predictions. Indeed, these are much better in the current version, which has marker genes for thousands of cell-type-specific signatures. First, I would confirm that you are getting the ICGS-NMF folder, which provides the primary results for ICGS2. The command you are using is fine, but it is typically too stringent for large droplet sequencing experiments. For example:

python AltAnalyze.py --platform RNASeq --species Mm --excludeCellCycle no --removeOutliers yes --ChromiumSparseMatrix /Users/saljh8/DemoData/mouse.h5 --output /Users/saljh8/DemoData/ --runICGS yes --expname test

image

image

The more verbose version of this command (displaying default options) is:

python AltAnalyze.py --platform RNASeq --species Mm --restrictBy protein_coding --excludeCellCycle no --removeOutliers yes --ChromiumSparseMatrix "/Users/saljh8/DemoData/mouse.h5" --output "/Users/saljh8/DemoData/" --runICGS yes --expname test --downsample 2500 --column_method hopach --column_metric cosine --rho 0.2 --ExpressionCutoff 1 --FoldDiff 4 --SamplesDiffering 4 --numVarGenes 500 --numGenesExp 500

ICGS2 applies a dynamic correlation cutoff which begins at 0.2 and increases by 0.1 if > 5000 correlated variable genes are obtained.
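
For readers who want to picture the dynamic cutoff described above, here is a minimal Python sketch. It is not taken from the AltAnalyze source. The function name `choose_rho`, the callback `count_correlated_genes`, and the upper bound `max_rho` are all hypothetical, and the sketch assumes the 0.1 increase repeats until the gene count is no longer above 5000.

```python
def choose_rho(count_correlated_genes, start=0.2, step=0.1,
               max_genes=5000, max_rho=0.9):
    """Raise the correlation cutoff while too many genes pass it.

    count_correlated_genes is a hypothetical callback that returns how many
    correlated variable genes are obtained at a given cutoff.
    max_rho is an assumed safety bound, not something stated in the thread.
    """
    rho = start
    while rho < max_rho and count_correlated_genes(rho) > max_genes:
        # round() keeps floating-point drift out of the reported cutoff
        rho = round(rho + step, 10)
    return rho


# Toy stand-in for the real gene count at each cutoff (made-up numbers).
toy_counts = {0.2: 12000, 0.3: 7000, 0.4: 3000}
print(choose_rho(lambda rho: toy_counts.get(round(rho, 1), 0)))
```

With the toy counts, the cutoff moves up from 0.2 until the count drops to 5000 or below.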

If you had a tab-delimited file with counts, you would add: --dataFormat counts
If you want to force a specific number of target clusters, add: --k 23

If the enriched blue terms do not display in these results (Guide3 in ICGS or FinalMarkerHeatmap in ICGS-NMF), I would assume there was an issue with downloading the GO-Elite database, which is located in the software's AltDatabase folder under EnsMart72/goelite/Hs/gene-mapp/Ensembl-BioMarkers.txt.
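
A quick way to check whether that file is present is sketched below in Python. The helper names are hypothetical, the install directory is whatever folder holds AltAnalyze.py on your machine, and the path layout simply follows the one quoted above. Passing a species code other than `Hs` is an assumption about how the database is organized.

```python
import os


def biomarker_db_path(altanalyze_dir, ensmart="EnsMart72", species="Hs"):
    """Build the expected BioMarkers path from the layout quoted above."""
    return os.path.join(altanalyze_dir, "AltDatabase", ensmart, "goelite",
                        species, "gene-mapp", "Ensembl-BioMarkers.txt")


def check_biomarker_db(altanalyze_dir, species="Hs"):
    """Return (path, found); found is True only for a non-empty file."""
    path = biomarker_db_path(altanalyze_dir, species=species)
    found = os.path.isfile(path) and os.path.getsize(path) > 0
    return path, found


# Example call; the directory below is a placeholder, not a real default.
path, found = check_biomarker_db("/path/to/altanalyze")
print(path, "found" if found else "missing or empty")
```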

If present, you can try to add these cell-type enrichment results by finding the text file corresponding to the heatmap of interest (e.g., ICGS-NMF/FinalMarkerHeatmap.txt) and supplying the --clusterGOElite BioMarkers option with the hierarchical clustering command:

python AltAnalyze.py --image hierarchical --platform RNASeq --species Mm --display False --input "/Users/saljh8/DemoData/ICGS-NMF/FinalMarkerHeatmap.txt" --contrast 5 --color_gradient yellow_black_blue --column_method None --row_method None --column_metric cosine --row_metric correlation --normalization median --clusterGOElite BioMarkers

This uses the prior clustering rather than re-clustering (replace None with hopach to re-cluster). The printed output should either indicate cell-type enrichments or report a specific error if something is missing.

@ewijaya
Author

ewijaya commented Jun 8, 2020

Hi Nathan,

Thank you so much for your prompt response.
Can you advise the exact command line I can use for the attached TSV matrix file as input?
The TSV file is downloadable here.

Using my initial command line, I looked at the ICGS-NMF subdirectory, but it contains only one file, FinalGroups.txt.

Thanks and I hope to hear from you again.

E.
P.S. I can't find the example DemoData/ICGS-NMF/FinalMarkerHeatmap.txt in your github.

@nsalomonis
Owner

Hi Edward,
The path in my example was a local path on my machine. An example in the GitHub repository (built with an older version of ICGS2, without graphics as nice) is at:
GitHub/altanalyze/DemoData/ICGS/10xGenomics/Mm-e14.5_Kidney-GSE104396/precomputed_results/ICGS-NMF

If your output contains FinalGroups.txt, please post any errors from the log file that AltAnalyze produces in the designated output directory (AltAnalyze_timestamp.log); they should be close to the end of the file.
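
One way to pull the relevant part out of a long log is sketched below in Python. This is only an illustration: the function name `tail_errors`, the keyword list, and the 40-line window are assumptions, and the log filename must be replaced with the actual timestamped file in your output directory.

```python
import io


def tail_errors(log_path, n_lines=40,
                keywords=("Traceback", "Error", "Exception")):
    """Return the end of the log, starting at the first line in the last
    n_lines that mentions one of the keywords (empty list if none do)."""
    with io.open(log_path, encoding="utf-8", errors="replace") as fh:
        lines = fh.read().splitlines()
    tail = lines[-n_lines:]
    for i, line in enumerate(tail):
        if any(word in line for word in keywords):
            return tail[i:]
    return []


# Usage (the path is a placeholder for your own log file):
# for line in tail_errors("/path/to/output/AltAnalyze_timestamp.log"):
#     print(line)
```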

The file you attached is formatted properly, but the extension should be changed to ".txt". For example:

python AltAnalyze.py --platform RNASeq --species Mm --excludeCellCycle no --removeOutliers yes --expdir "/Users/saljh8/DemoData/matrix.txt" --output "/Users/saljh8/DemoData/" --runICGS yes --expname test

@ewijaya
Author

ewijaya commented Jun 9, 2020

Hi Nathan,

Thank you for your reply.
I tried the following command:

/home/ubuntu/storage2/Tools/altanalyze/AltAnalyze.py --platform RNASeq --species Mm --excludeCellCycle no --removeOutliers yes --runICGS yes --expdir /home/ubuntu/storage2/tmp/test_altanalyze/output/rnaseq_altanalyze_fc2/ExpressionInput/result.txt --output /home/ubuntu/storage2/tmp/test_altanalyze/output --rho 0.4 --column_method None --expname rnaseq_altanalyze_fc2 --ExpressionCutoff 1 --FoldDiff 2 --SamplesDiffering 3

The input file result.txt can be downloaded here and the log file here.

I still cannot produce a plot with cell type assignment in the ICGS-NMF directory.

As you will notice in the log file, there are some errors. I'm not sure whether they are
caused by my data or by a real bug in the code:

 File "/home/ubuntu/storage2/Tools/altanalyze/stats_scripts/ICGS_NMF.py", line 1049, in CompleteICGSWorkflow
    NMFinput,Rank=NMF_Analysis.FilterGuideGeneFile(Guidefile,Guidefile_block,processedInputExpFile,iteration,platform,uniqueIDs,symbolIDs)
  File "/home/ubuntu/storage2/Tools/altanalyze/stats_scripts/NMF_Analysis.py", line 114, in FilterGuideGeneFile
    rank_Count=int(q[n-1])
ValueError: invalid literal for int() with base 10: 'NA'

Thanks and I hope to hear from you again.
E.
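
The final line of the traceback above can be reproduced in isolation: `int()` raises `ValueError` whenever it is handed the string 'NA'. The Python sketch below shows that, plus one defensive way to parse such a field. It is not the AltAnalyze code and not a fix from the maintainers. The helper `parse_rank` is hypothetical, and the thread does not show what `q[n-1]` holds or why it is 'NA' for this data set.

```python
def parse_rank(token, missing=("NA", "")):
    """Parse an integer field, returning None for placeholder values.

    Hypothetical helper for illustration only; treating 'NA' as missing
    is an assumption about what the field means.
    """
    token = token.strip()
    if token in missing:
        return None
    return int(token)


# The same failure mode as in the traceback, reproduced on its own.
try:
    int("NA")
except ValueError as err:
    print("int('NA') fails:", err)

print(parse_rank("12"))
print(parse_rank("NA"))
```

Whether skipping the value is appropriate depends on why the field is 'NA' in the first place, which only the log file and input data can show.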
