data for pie chart proportions #105

Open
baligpanossian opened this issue May 2, 2024 · 8 comments

Comments

@baligpanossian

Hello, I ran EarlGrey on 21 assemblies but it only produced summary files for 5/21.
All runs produced SAMPLE-families.fa files showing representative sequences for each family, but the log files show the following:

Trimming and sorting based on mreps, TRF, SA-SSR
Error: object 'trimmed_seq' not found
Execution halted
Removing temporary files
Reclassifying repeats
cp: cannot stat 'TS_515_WP-families.fa_7185/trf/515_WP-families.fa.nonsatellite': No such file or directory
/workspace/fastas/515_WP/515_WP_EarlGrey/515_WP_strainer
Compiling library
WARNING: TEstrainer failed to produce a strain file, please check the log file for more information. If you have run an intial mask with known repeats, this could be due RepeatModeler2 failing to identify any new repeats. Please check if this is expected.
    
              )  (
	     (   ) )
	     ) ( (
	   _______)_
	.-'---------|  
       ( C|/\/\/\/\/|
	'-./\/\/\/\/|
	 '_________'
	  '-------'
	<<< Identifying Repeats Using Species-Specific Library >>>
RepeatMasker version 4.1.5
Search Engine: NCBI/RMBLAST [ 2.14.1+ ]
RepeatMasker::setspecies: Could not find user specified library /workspace/fastas/515_WP/515_WP_EarlGrey/515_WP_strainer/TS_515_WP-families.fa_7185/515_WP-families.fa.strained, or the file is empty.
ERROR: RepeatMasker failed, please check logs

This doesn't seem to be an assembly-quality issue, because both the successful and the failed runs included well-assembled and poorly assembled sequences.
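To see which of the 21 runs stopped at the same point, the logs can be scanned for the messages shown above. The following is a minimal sketch, not part of Earl Grey: the function names are hypothetical, the marker strings are copied from this one log excerpt, and the `*.log` naming is an assumption, so other failures may be worded differently.

```python
from pathlib import Path

# Markers copied from the log excerpt above (assumption: other failed runs
# print the same messages; different failures will not be caught).
FAILURE_MARKERS = (
    "Error: object 'trimmed_seq' not found",
    "TEstrainer failed to produce a strain file",
    "ERROR: RepeatMasker failed",
)


def failed_markers(log_text: str) -> list[str]:
    """Return the known failure markers that occur in one log's text."""
    return [marker for marker in FAILURE_MARKERS if marker in log_text]


def scan_logs(log_dir: str, pattern: str = "*.log") -> dict[str, list[str]]:
    """Map each log file name found under log_dir to the markers it contains.

    An empty list means none of the known markers were found in that log.
    """
    report = {}
    for log_path in sorted(Path(log_dir).rglob(pattern)):
        text = log_path.read_text(errors="replace")
        report[log_path.name] = failed_markers(text)
    return report
```

Calling `scan_logs("/workspace/fastas")` would return one entry per log file, which makes it easy to check whether all failed runs share the `trimmed_seq` error or fail for different reasons.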

I would appreciate guidance on where to find the identified TEs in the temporary files, for both successful and failed runs, so that I can calculate the raw percentage of each genome covered by TEs, similar to what the pie charts show for the completed runs.

@TobyBaril
Owner

In this case, TEstrainer has failed to produce a strained version of the input TE library. This could be because no non-satellite repeats were found in the RepeatModeler run, but this needs to be verified for the runs that failed. What version of Earl Grey are you using? Which OS? What flags did you use to run Earl Grey? Providing the whole log file should help to understand where the process has failed.

Verify that the -families.fa files contain sequences that you expect. Are any families missing or unexpected?
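One way to do this check is to count the consensus sequences per classification. The sketch below is illustrative only. It assumes RepeatModeler-style headers of the form `>name#Classification ...`, where the text after `#` up to the first whitespace is the classification; the function name and the `no_classification` label are hypothetical.

```python
from collections import Counter


def summarise_families(fasta_text: str) -> Counter:
    """Count consensus sequences per classification in a families FASTA.

    Assumes headers like '>rnd-1_family-12#LINE/L1 ( ... )'. Headers with
    no '#' in the sequence name are counted under 'no_classification'.
    """
    counts = Counter()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a header
        tokens = line[1:].split()
        name = tokens[0] if tokens else ""
        if "#" in name:
            classification = name.split("#", 1)[1]
        else:
            classification = "no_classification"
        counts[classification] += 1
    return counts
```

For example, `summarise_families(open("515_WP-families.fa").read())` gives a per-classification tally whose total is the number of families; comparing that tally between successful and failed samples shows whether a failed library looks unusual.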

If you are happy to skip the filtering steps, you can just rename *-families.fa to /workspace/fastas/515_WP/515_WP_EarlGrey/515_WP_strainer/TS_515_WP-families.fa_7185/515_WP-families.fa.strained and rerun Earl Grey with the same command to skip to the masking step. I strongly recommend against this, as it risks incorrect TE annotations and leaves the consensus sequences unrefined.

@baligpanossian
Author

baligpanossian commented May 2, 2024

Thank you for the prompt response. It is possible that these samples had no non-satellite repeats, but I'd want to check before ruling that out. Also, I'm now only trying to get the raw data with which the pie charts are generated.
I used EarlGrey 4.2.3, installed via conda on Linux. My command for this sample was:
EarlGrey -t 8 -g 515_WP.fasta -s 515 -o 515

The full log file is attached:

515_WPEarlGrey.log

@TobyBaril
Owner

TobyBaril commented May 2, 2024

The script that failed was the simple repeat filter trimming step in TEstrainer... @jamesdgalbraith any ideas on this one?

It looks like all previous steps of TEstrainer completed successfully, so something in the post-processing has caused the issue.

@TobyBaril
Owner

Regarding the data used for the plots: after a successful run, it is in the _summaryFiles directory. All the plots are generated from SAMPLE.filteredrepeats.gff.
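As a rough illustration of turning that GFF into raw proportions, the sketch below sums the base pairs covered per feature type and divides by the genome size. It is not Earl Grey's own plotting code. It assumes that GFF column 3 holds the TE classification, that columns 4 and 5 are 1-based inclusive coordinates, and that the genome size is supplied by the caller; the function name is hypothetical, and Earl Grey may group classifications differently for its pie charts.

```python
from collections import defaultdict


def coverage_by_class(gff_lines, genome_size: int) -> dict[str, float]:
    """Return the percentage of the genome covered per value of GFF column 3.

    Assumptions: column 3 is the TE classification; columns 4 and 5 are
    1-based inclusive start and end. Overlapping or adjacent features of
    the same class on the same sequence are merged before counting.
    """
    intervals = defaultdict(list)  # (class, seqid) -> [(start, end), ...]
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comments/directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        seqid, te_class = fields[0], fields[2]
        start, end = int(fields[3]), int(fields[4])
        intervals[(te_class, seqid)].append((min(start, end), max(start, end)))

    covered = defaultdict(int)  # class -> covered base pairs
    for (te_class, _seqid), spans in intervals.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:  # overlaps or touches the current run
                cur_end = max(cur_end, end)
            else:
                covered[te_class] += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered[te_class] += cur_end - cur_start + 1

    return {cls: 100.0 * bp / genome_size for cls, bp in covered.items()}
```

Passing an open file handle works directly, for example `coverage_by_class(open("SAMPLE.filteredrepeats.gff"), genome_size)`. Features of different classes that overlap each other are counted once per class, so the percentages can sum to slightly more than the true repeat content if such overlaps exist in the file.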

@baligpanossian
Author

> Regarding the data used for the plots, this is in the _summaryFiles directory after a successful run - All the plots are generated using SAMPLE.filteredrepeats.gff

Thank you, this is perfect for the samples that ran successfully. For the unsuccessful runs, which generated nothing in _summaryFiles, can I calculate the data manually from another file upstream of the summary?

@TobyBaril
Owner

You will need to run the final RepeatMasker step, followed by the post-filtering. The easiest way to do this is to rename the families.fa file to the .strained filename, delete everything inside *mergedRepeats/ and *RepeatMasker_Against_Custom_Library/ (but not the directories themselves), then run exactly the same command as before. Earl Grey will continue as if the TEstrainer step had completed successfully.
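The preparation described above could be scripted roughly as follows. This is a sketch under stated assumptions rather than an Earl Grey utility: `prepare_rerun` and its parameters are hypothetical names, the paths must be supplied by the user, and the library is copied rather than renamed so the original families.fa is preserved. It deletes files, so it is worth testing on a copy of the output directory first.

```python
import shutil
from pathlib import Path


def prepare_rerun(earlgrey_dir: str, families_fa: str, strained_path: str) -> list[Path]:
    """Stage the unfiltered library as the .strained file and empty old outputs.

    earlgrey_dir  -- the sample's Earl Grey output directory (assumed to hold
                     the *mergedRepeats and *RepeatMasker_Against_Custom_Library
                     directories named in the thread)
    families_fa   -- path to the SAMPLE-families.fa library
    strained_path -- the .strained path that RepeatMasker reported as missing
    Returns the directories that were emptied (the directories are kept).
    """
    strained = Path(strained_path)
    strained.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(families_fa, strained)  # copy: the original library survives

    emptied = []
    for pattern in ("*mergedRepeats", "*RepeatMasker_Against_Custom_Library"):
        for directory in Path(earlgrey_dir).glob(pattern):
            if not directory.is_dir():
                continue
            for entry in directory.iterdir():
                if entry.is_dir() and not entry.is_symlink():
                    shutil.rmtree(entry)  # remove nested directories
                else:
                    entry.unlink()  # remove files and symlinks
            emptied.append(directory)
    return sorted(emptied)
```

After this, the original Earl Grey command is rerun unchanged. The earlier caveat still applies: the library staged this way has not been through TEstrainer, so the resulting annotations are based on unrefined consensus sequences.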

I would still recommend trying to work out why the TEstrainer step failed. Hopefully James can give us some more insight!

@jamesdgalbraith
Collaborator

Sorry for the delay, I think I've identified the problem.

In the /workspace/fastas/515_WP/515_WP_EarlGrey/515_WP_strainer/TS_515_WP-families.fa_7185/TRF/ folder, is there a FASTA file that ends with the extension .nonsatellite? If not, I think that's the error and I'll need to patch this.
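A quick way to run this check across many samples is sketched below. The helper name is hypothetical. Because the log earlier in the thread writes the folder as `trf/` while this comment writes `TRF/`, the sketch accepts either spelling, and it ignores empty files on the assumption that an empty `.nonsatellite` file is as unhelpful as a missing one.

```python
from pathlib import Path


def find_nonsatellite(strainer_run_dir: str) -> list[Path]:
    """List non-empty *.nonsatellite files in a trf/ or TRF/ subfolder.

    strainer_run_dir is the TS_* directory created by TEstrainer, for
    example .../515_WP_strainer/TS_515_WP-families.fa_7185 in this thread.
    An empty result means no usable .nonsatellite file was found.
    """
    hits = []
    for sub in Path(strainer_run_dir).iterdir():
        if sub.is_dir() and sub.name.lower() == "trf":
            hits.extend(p for p in sub.glob("*.nonsatellite") if p.stat().st_size > 0)
    return sorted(hits)
```

An empty list for a failed sample and a non-empty list for a successful one would support the explanation that the missing `.nonsatellite` file is what breaks the later steps.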

@baligpanossian
Author

Hello again, thank you for following up on this to try to find a fix.
After attempting the workaround you suggested previously in this thread, I came across an error indicating that the .nonsatellite file you mention here is not found.
I tried to fix this by adding a copy of the *-families.fa file, renamed to *-families.fa.nonsatellite, in the same directory (/TRF/), but it still didn't produce the summary files.
Hope this helps narrow it down.


3 participants