Issue plotting TWISST results #30

Open
rancilhac opened this issue Mar 16, 2022 · 3 comments

Comments

@rancilhac

rancilhac commented Mar 16, 2022

Hello,
I am currently trying to plot results from multiple TWISST runs (1 per chromosome), but I run into the following error:

> list.w <- list.files(pattern="_w") #vector of the weight files
> list.T <- list.files(pattern="_T") #vector of the topology files
> list.tsv <- list.files(pattern="data.tsv") #vector of the window files
> twisst.all <- import.twisst(weights_files = list.w, window_data_files = list.tsv, topos_file=list.T)
[1] "Reading weights and window data"
[1] "Number of regions: 31"
[1] "Computing summaries"
[1] "Cleaning data"
[1] "Getting topologies"
Error in file(file, "r") : invalid 'description' argument
In addition: Warning messages:
1: In is.na(apply(l$weights[[i]], 1, sum)) == F & l$window_data[[i]]$end -  :
  longer object length is not a multiple of shorter object length
2: In is.na(apply(l$weights[[i]], 1, sum)) == F & l$window_data[[i]]$end -  :
  longer object length is not a multiple of shorter object length
3: In is.na(apply(l$weights[[i]], 1, sum)) == F & l$window_data[[i]]$end -  :
  longer object length is not a multiple of shorter object length
4: In is.na(apply(l$weights[[i]], 1, sum)) == F & l$window_data[[i]]$end -  :
  longer object length is not a multiple of shorter object length
5: In is.na(apply(l$weights[[i]], 1, sum)) == F & l$window_data[[i]]$end -  :
  longer object length is not a multiple of shorter object length
6: In is.na(apply(l$weights[[i]], 1, sum)) == F & l$window_data[[i]]$end -  :
  longer object length is not a multiple of shorter object length
7: In is.na(apply(l$weights[[i]], 1, sum)) == F & l$window_data[[i]]$end -  :
  longer object length is not a multiple of shorter object length
8: In is.na(apply(l$weights[[i]], 1, sum)) == F & l$window_data[[i]]$end -  :
  longer object length is not a multiple of shorter object length
9: In if (file == "") file <- stdin() else { :
Error in file(file, "r") : invalid 'description' argument

If I try to run the same thing without window files, I get a different error:

> twisst.all <- import.twisst(weights_files = list.w, topos_file=list.T)
[1] "Reading weights"
Error in names(l$window_data) <- names(l$weights_raw) <- paste0("region",  : 
  'names' attribute [31] must be the same length as the vector [1]

I thought this error could be related to missing data (windows where phylogenetic inference failed) in some chromosomes, but trying to run these commands just on chromosomes without missing data gave the same results.
Note also that I can plot each chromosome individually: without a window file for chromosomes with missing data, and with a window file for chromosomes without missing data.

Is there any way to overcome this issue and plot all the chromosomes at once, preferably with the window files specified?

Thanks a lot in advance,
Loïs Rancilhac

@simonhmartin
Owner

Hi Loïs,
I haven't seen an error like this before. I could have a look if you send me the data, but I won't get to this for about a week (teaching and marking). Alternatively, you could try what I would try, which is:

  1. Try running with subsets of the chromosomes to see if only one or a few of them are problematic
    If they all cause problems:
  2. Confirm that the number of lines in the weights and windows files are the same.
    If they are:
  3. See whether it is possible to make minimal files with say 100 windows per chromosome that can be loaded - this would indicate that there is some point in the file where things start to go wrong.
    If this is possible:
  4. Make increasingly larger files to find the first line that causes failure.
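Steps 2 and 3 above could be sketched in R roughly as follows. This is only a sketch under assumptions: it assumes both file types are tab-delimited with a single header line, and that any comment lines start with `#`. The helper names `count_data_rows`, `compare_row_counts` and `truncate_file` are made up for illustration and are not part of twisst.

```r
# Count data rows in a tab-delimited file, skipping '#' comment lines
# and one header line (assumed layout; adjust if your files differ).
count_data_rows <- function(path) {
  nrow(read.table(path, header = TRUE, sep = "\t", comment.char = "#"))
}

# Compare paired weights and window files; returns one row per pair.
compare_row_counts <- function(weights_files, window_files) {
  stopifnot(length(weights_files) == length(window_files))
  n_weights <- vapply(weights_files, count_data_rows, integer(1))
  n_windows <- vapply(window_files, count_data_rows, integer(1))
  data.frame(weights_file = weights_files,
             window_file  = window_files,
             n_weights    = unname(n_weights),
             n_windows    = unname(n_windows),
             match        = unname(n_weights == n_windows),
             stringsAsFactors = FALSE)
}

# Write a reduced copy of a file: all '#' comment lines, the header,
# and the first n data rows.
truncate_file <- function(path, out, n = 100) {
  lines <- readLines(path)
  is_comment <- startsWith(lines, "#")
  body <- which(!is_comment)               # header followed by data rows
  keep <- sort(c(which(is_comment), head(body, n + 1)))
  writeLines(lines[keep], out)
  invisible(out)
}
```

Calling `compare_row_counts(list.w, list.tsv)` would then show which chromosomes, if any, have mismatched row counts, provided the two vectors list the chromosomes in the same order. Increasing `n` in `truncate_file` step by step is one way to narrow down the first line that causes a failure.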

@rancilhac
Author

Hi Simon,
thanks a lot for your reply. The problem came from specifying several topology files. With a single topology file, I can import all my weight files, but only if there are no missing data and I specify window files. If I don't specify window files, I still get the second error mentioned in my initial message. Perhaps the function is not supposed to work without window files when several weight files are imported together?
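In code, the fix described here amounts to passing a single path as `topos_file` while `weights_files` and `window_data_files` stay as vectors of equal length. That fits the first error, since R's `file()` reports an invalid 'description' argument when it is given more than one path. The following is a minimal sketch, assuming that all runs share the same set of topologies and that the files sort into the same chromosome order. `check_twisst_inputs` is a hypothetical helper, not part of twisst.

```r
# Hypothetical helper (not part of twisst): validate inputs before import.
check_twisst_inputs <- function(weights_files, window_data_files, topos_file) {
  if (length(weights_files) == 0) stop("no weights files found")
  if (length(window_data_files) != length(weights_files))
    stop("need one window data file per weights file")
  if (length(topos_file) != 1)
    stop("topos_file must be a single path, got ", length(topos_file))
  invisible(TRUE)
}

list.w   <- sort(list.files(pattern = "_w"))        # weights files
list.tsv <- sort(list.files(pattern = "data.tsv"))  # window data files
list.T   <- sort(list.files(pattern = "_T"))        # topology files

# Only attempt the import once twisst's plotting functions are loaded
# and matching files are present in the working directory.
if (exists("import.twisst") && length(list.w) > 0) {
  check_twisst_inputs(list.w, list.tsv, list.T[1])
  twisst.all <- import.twisst(weights_files = list.w,
                              window_data_files = list.tsv,
                              topos_file = list.T[1])  # one topology file only
}
```

Using `list.T[1]` assumes the topology files are identical across chromosomes; if they differ, the chromosomes probably cannot be combined this way.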

This leads me to another question: do you have an idea of how I could plot the chromosomes that have missing data together with the others? I can always plot them separately, but if I could plot all chromosomes at once that would be even better.

Cheers,
Loïs

@simonhmartin
Owner

Hi Loïs,
I'm sorry I never responded to this. Did you ever manage to solve the problem? You should definitely be able to import multiple weights files and multiple window files together, so I still don't understand why you got an error.
Simon
