Predicted sample export failed in GUI #50

Open
aniruddh-shukla opened this issue Apr 7, 2021 · 5 comments

Comments

@aniruddh-shukla

Hello,
I am trying to map a list of FASTQ files to the Mus musculus reference for my research. My system has 32 GB of RAM and an i7 processor, but when I start the run I get the following error. I have no idea how to solve this and would be highly obliged if you could help me.
I have attached an image for your reference. Thank you for your time.

Error

@nsalomonis
Owner

Hi Aniruddh,

As you note, when performing an unsupervised analysis with ICGS2 with FASTQ files loaded, the program supports processing FASTQs through the Kallisto workflow to produce TPM quantified gene expression files. Below is an example I just ran in the software using some downsampled paired-end FASTQ files:
image

If this analysis fails, there are several possible reasons. If it fails immediately, this suggests that there is simply a naming issue, with the FASTQ files not being sufficiently recognized by AltAnalyze and communicated to Kallisto. By default, Kallisto here is a version we created called Kallisto-splice, which also produces splice-aligned BAM and BED files with genomic coordinates, used for possible downstream splicing analyses.

My guess is that AltAnalyze was not properly recognizing which files were FASTQ files, or which were read 1 and which were read 2. Can you give example names of your paired-end FASTQ files? AltAnalyze supports a number of possible FASTQ file names, including recognizing read1 and read2, but it can fail if it doesn't recognize the names. As shown in the above screenshot, the files produced in the created ExpressionInput directory include the folder Kallisto, a log file for each sample run (run_info.json) and an overall log file (log.txt). If the file names are OK, then the issue is likely recorded in these log files.
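
As a rough aid for locating those log files, the sketch below searches an output folder for run_info.json and log.txt and prints what it finds. The path "output/ExpressionInput" and the function names are assumptions made for illustration; the exact folder layout on a given system may differ, and the script only reads files, it does not call AltAnalyze or Kallisto.

```python
import json
from pathlib import Path


def collect_kallisto_logs(expression_input_dir):
    """Return (run_info_paths, log_paths) found anywhere under the directory."""
    root = Path(expression_input_dir)
    run_infos = sorted(root.rglob("run_info.json"))
    logs = sorted(root.rglob("log.txt"))
    return run_infos, logs


def load_run_info(path):
    """Load one run_info.json file and return its contents as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Hypothetical location: replace with the ExpressionInput folder
    # inside the output directory chosen in the GUI.
    run_infos, logs = collect_kallisto_logs("output/ExpressionInput")
    if not run_infos:
        print("No run_info.json files found.")
    for path in run_infos:
        print(path)
        # Print every key/value pair without assuming which keys exist.
        for key, value in load_run_info(path).items():
            print(f"  {key}: {value}")
    for path in logs:
        print(f"Overall log: {path}")
```

If the script reports no run_info.json files at all, that is worth mentioning when reporting back, since it indicates no per-sample run was logged.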

Best,
Nathan

@aniruddh-shukla
Author

Hello Nathan,
Thanks for your quick response. Yes, it fails immediately after I start the analysis. I have attached a screenshot of my FASTQ files. Here R1 is read 1 and R2 is read 2. Could you have a look and suggest whether I should rename the FASTQ files and try again, or try something else?
image
Thanking you for your time.
Aniruddh

@nsalomonis
Owner

nsalomonis commented Apr 8, 2021

Hi Aniruddh, 

Offhand, the one issue I see is that all files have "_001" at the end. Can you remove "_001"? I believe the software thinks these are all read 1, which is overriding its detection of R1 and R2. If it still fails, please look in the log files I mentioned above.
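
For a folder with many files, the renaming could be scripted. The following is a minimal sketch, assuming the files end in "_001" directly before a .fastq, .fq, .fastq.gz or .fq.gz extension (for example a hypothetical "Sample1_R1_001.fastq.gz"); the function names and the "fastq_dir" path are made up for illustration. It defaults to a dry run so the planned changes can be checked before anything is renamed.

```python
import re
from pathlib import Path

# Matches "_001" only when it sits right before the FASTQ extension,
# so lane tags such as "_L001" earlier in the name are left alone.
SUFFIX_PATTERN = re.compile(r"_001(?=\.(?:fastq|fq)(?:\.gz)?$)", re.IGNORECASE)


def strip_001(name):
    """Return the file name with a trailing "_001" removed, if present."""
    return SUFFIX_PATTERN.sub("", name)


def rename_fastqs(directory, dry_run=True):
    """List (old, new) name pairs; rename the files only when dry_run is False."""
    changes = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        new_name = strip_001(path.name)
        if new_name == path.name:
            continue  # nothing to strip for this file
        target = path.with_name(new_name)
        if target.exists():
            raise FileExistsError(f"{target} already exists; not overwriting")
        changes.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return changes


if __name__ == "__main__":
    # Hypothetical folder name; run once as a dry run, then set dry_run=False.
    for old, new in rename_fastqs("fastq_dir", dry_run=True):
        print(f"{old} -> {new}")
```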

Best,
Nathan

@aniruddh-shukla
Author

Hello Nathan,
As you can see I have changed the file names as suggested and ran
image
but I am still getting the error, and the run stops immediately after I start the analysis from the UI.
image
I also went to the Kallisto folder to check the log files, but nothing was generated.
image
image
Can you help??

@nsalomonis
Owner

Hi Aniruddh,

If no files are produced in the Kallisto directory, then Kallisto was never called. In such cases there are two options:

  1. Call AltAnalyze's version of Kallisto directly from the command-line as a test
  2. Try test FASTQ files to make sure something odd about the specific system configuration on your computer is not conflicting with AltAnalyze.

I would suggest option 2 first. Before that, however, I recommend running again, this time without selecting the ICGS option; instead, denote at least two biological groups and set comparisons in the GUI. When doing so, please send a screenshot of how the samples show up, to make sure the problem is not some hidden character in the names. It will also be good to see whether the same issue occurs. Otherwise, to proceed with option 2, see the instructions below, which are provided with the software:
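
Hidden characters in file names can also be checked outside the GUI. The sketch below is one possible way to do that, not part of AltAnalyze: it flags whitespace and anything outside printable ASCII in each file name. The function name and the "fastq_dir" path are hypothetical, and treating every such character as suspicious is an assumption of this sketch.

```python
import unicodedata
from pathlib import Path


def suspicious_characters(name):
    """Return (index, char, description) for whitespace and non-printable-ASCII characters."""
    findings = []
    for index, char in enumerate(name):
        if char.isspace() or not (32 < ord(char) < 127):
            # Fall back to the code point when Unicode has no name (e.g. tabs).
            description = unicodedata.name(char, f"U+{ord(char):04X}")
            findings.append((index, char, description))
    return findings


if __name__ == "__main__":
    # Hypothetical folder name: point this at the directory holding the FASTQ files.
    for path in sorted(Path("fastq_dir").iterdir()):
        hits = suspicious_characters(path.name)
        if hits:
            print(repr(path.name))
            for index, _char, description in hits:
                print(f"  position {index}: {description}")
```

Printing the name with repr() makes characters such as non-breaking or zero-width spaces visible as escape sequences.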


Input Files for Kallisto-Splice Analysis

The latest version of AltAnalyze introduces a new method to quickly process raw sequencing data (FASTQ files) to directly produce gene expression and alternative splicing estimates, without any additional software on your desktop or laptop computer.

Demo Files
Two zip files with very small FASTQ files for demonstration purposes are available here:
http://altanalyze.org/Data/Hs_GSE45419_FASTQs.zip (344MB - Human downsampled)
http://altanalyze.org/Data/Mm-FASTQ-GSE70245.zip (59MB - Mouse scRNA-Seq)

The Breast Cancer dataset was downsampled from the original FASTQ files as described here: https://www.synapse.org/#!Synapse:syn7286377/files/
The human breast cancer samples correspond to two subtypes (ER-positive and Triple-negative). Unzip these files before proceeding (right-click and extract, or open and extract to this directory, e.g., with WinZip).

Running Kallisto-Splice through the Graphical User Interface

  1. Double-click on the AltAnalyze executable (see Running-AltAnalyze file in the program directory for problems opening).

  2. Install the correct species database when prompted.
    a. For the demo dataset, ensure Homo Sapiens is downloaded. Any version of the database should be compatible (e.g., EnsMart72).

  3. From the main menu in AltAnalyze, select RNA-Seq as the “Select vendor/data type”. Then select the Continue button.

  4. Select the “Process RNA-Seq reads” radio button. Then select the Continue button.

  5. Dataset Location: Enter a dataset name of choice (e.g., Breast_cancer). For “Select FASTQ files to run in Kallisto”, select the location of the unzipped FASTQ directory. The program will process all FASTQ files in that directory. Select the output directory, which is the folder where all results and subsequent input files will be saved. Then select the Continue button.

  6. Expression Analysis Parameters: Choose the additional options you want to include or exclude for the pipeline analyses (optional). These include pathway analysis options and which statistical comparison tests to apply for differential expression analysis. If users wish, they can select “no” for the option “Perform alternative analysis”, which will skip the splicing analyses and process the Kallisto TPM expression file instead of the produced exon-exon junction derived gene RPKM file. When complete, select the Continue button.

  7. Pathway Analysis Options: Here, the user will be prompted to specify the statistical cutoff applied for differential gene and splicing analyses. The adjp indicates an FDR corrected p-value versus a non-corrected p-value.

  8. Alternative Splicing Analysis Options: The recommended method for splicing analysis (MultiPath-PSI) is selected by default; however, alternative and additional algorithm options are available. When finished, select the “Run Analysis” option.

  9. Groups Designation: Type a label for each FASTQ sample shown (e.g., ERpos, TripleNeg). This will create a “groups.” text file in the output directory folder ExpressionInput. This file will be reloaded when.

  10. Comparisons Designation: Select the experimental and control datasets to compare to (e.g., TripleNeg vs. ERpos). Select “Continue” to run the analysis.

  11. Analysis Progress: A black screen will appear once the analysis has begun. Be patient as the software is performing a series of in-depth analyses, including indexing of the Kallisto transcriptome (run the first time FASTQ files are processed), Kallisto pseudo-alignment to the reference transcriptome, BAM file generation with genome coordinates for all pseudo-aligned reads, gene expression quantification, differential gene expression analysis, QC analysis, network analysis, marker identification, pathway analysis and alternative splicing analysis.

Outputs of Kallisto-Splice

There is a large array of results from this workflow, which can be found in the folders described below. Note that a separate PDF file is saved to the root directory describing the files in each of these folders. Please refer to those PDFs for details.

  1. ExpressionInput: This includes all expression estimates for exon-exon junctions, Kallisto isoforms and genes as normalized values (TPM and RPKM) and counts. All Kallisto results are saved to the Kallisto_results folder along with the number and percentage of aligned reads.
  2. ExpressionOutput: This folder contains all computed differential gene expression results, primarily found in the DATASET file. The MarkerFinder folder contains the top markers assigned to each sample group (AllGenes_correlations-ReplicateBased.txt file).
  3. DataPlots: This folder contains the majority of saved plots as PDF and PNG files. Note that the MarkerFinder folder in this directory contains additional plots. Splicing-associated plots will also be saved to the folder AltResults/AlternativeOutput.
  4. AltResults: This directory contains all splicing analysis results. The most important file is “Hs_RNASeq_top_alt_junctions-PSI_EventAnnotation.txt”, which contains all MultiPath-PSI detected splicing events and associated annotations. Statistical comparison results are saved to the “Events-dPSI” folder and splicing graphs to the SashimiPlots folder in the output directory (derived from the BAM files in the output directory).
  5. SashimiPlots: This folder contains the PDF and PNG outputs for genome and exon-exon junction aligned reads associated with example top-significant alternative splicing events. Users can output additional such plots from the Additional Analyses menu option “AltExon Viewer”.
  6. GO-Elite: This folder contains all pathway and gene-set enrichment analysis results. See each folder for the “pruned-results_z-score_elite.txt” (open in Excel or equivalent). Network graphs, heatmaps comparing different comparison groups and optional colored WikiPathways are saved to these directories.

Below is an example command similar to the above analysis:

python AltAnalyze.py --platform "RNASeq" --species Mm --fastq_dir /Users/altanalyze/DemoData/Mm-FASTQ-GSE70245-DownSampled/ --groupdir /Users/altanalyze/DemoData/Mm-FASTQ-GSE70245-DownSampled/groups.Breast_cancer.txt --compdir /Users/altanalyze/DemoData/Mm-FASTQ-GSE70245-DownSampled/comp.Breast_cancer.txt --output /Users/altanalyze/DemoData/Mm-FASTQ-GSE70245-DownSampled/output --expname Breast_cancer --runGOElite yes --returnPathways all
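
Long commands like this are easy to damage when copied from a web page or document, for instance when an editor turns two hyphens into a single long dash. One way to avoid that is to assemble the command from a list, as in the sketch below. It only uses the flags that appear in the command above, written with plain double hyphens; the helper name and its parameters are hypothetical, and the sketch prints the command rather than running it.

```python
import shlex


def build_altanalyze_command(fastq_dir, output_dir, groups_file, comps_file,
                             species="Mm", expname="Breast_cancer",
                             python_exe="python"):
    """Assemble the example command as an argument list (nothing is executed)."""
    return [
        python_exe, "AltAnalyze.py",
        "--platform", "RNASeq",
        "--species", species,
        "--fastq_dir", fastq_dir,
        "--groupdir", groups_file,
        "--compdir", comps_file,
        "--output", output_dir,
        "--expname", expname,
        "--runGOElite", "yes",
        "--returnPathways", "all",
    ]


if __name__ == "__main__":
    # Paths copied from the example command above; adjust them to your system.
    base = "/Users/altanalyze/DemoData/Mm-FASTQ-GSE70245-DownSampled"
    command = build_altanalyze_command(
        fastq_dir=base + "/",
        output_dir=base + "/output",
        groups_file=base + "/groups.Breast_cancer.txt",
        comps_file=base + "/comp.Breast_cancer.txt",
    )
    # shlex.join quotes arguments where needed (requires Python 3.8 or later).
    print(shlex.join(command))
```

The printed line can be pasted into a terminal opened in the AltAnalyze program directory, or the list can be passed to subprocess.run from the same directory.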
