erro twisst_data_smooth #12

Open
Wei-Gao-CAS opened this issue Apr 28, 2020 · 12 comments · May be fixed by #27

@Wei-Gao-CAS

twisst_data_smooth <- smooth.twisst(twisst_data, span_bp = 1000000, spacing = 50000)
Error in seq.default(twisst_object$pos[[i]][1], tail(twisst_object$pos[[i]], :
wrong sign in 'by' argument
Calls: smooth.twisst -> seq -> seq.default

@simonhmartin
Owner

I have not encountered this error before. I could look into it if you share the weights file.

@giyany

giyany commented Jan 4, 2022

Hi Simon,
I am experiencing this error too, and have not been able to plot some of the scaffolds in my dataset. Others plot fine.

I'm attaching the weights and window data files: Scaffold_15 plots fine, but Scaffold_2199 does not.

Thanks
[output.run3.weights.csv](https://github.com/simonhmartin/twisst/files/7807756/output.run3.weights.csv)
output.run4.data.tsv.gz

@simonhmartin
Owner

simonhmartin commented Jan 6, 2022

Hi giyany,
The problem appears to be that your window start and end positions do not increase consistently in the data file:

scaffold        start       end
Scaffold_1174   1000000     1010000
Scaffold_1174   100000      110000
Scaffold_1174   10040000    10050000

The R script expects each window to have larger start and end positions than the one before it. I'm not sure how this happened in your data, but you will need to correct the input files before using plot_twisst.R. If you think the results are correct and simply unordered, you could probably reorder the files quite easily in R: first infer the correct order with the order function, then use that order to reorder the rows in each file.
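The check and reordering described above can be sketched as follows. This is a minimal Python illustration rather than R, and the helper names windows_increasing and reorder_windows are hypothetical (they are not part of twisst or plot_twisst.R). It assumes the window data and weights have already been read into lists where row i of each describes the same window, and that windows within one scaffold should be ordered by numeric start position. In R the same idea is to compute order() once from the window data and index both tables with it.

```python
from typing import Dict, List, Tuple, TypeVar

T = TypeVar("T")
Window = Dict[str, object]  # assumed keys: "scaffold", "start", "end"


def windows_increasing(windows: List[Window]) -> bool:
    """True if each window starts and ends after the previous window
    on the same scaffold (only consecutive rows are compared)."""
    for prev, cur in zip(windows, windows[1:]):
        if prev["scaffold"] != cur["scaffold"]:
            continue  # a new scaffold begins, so positions may reset
        if int(cur["start"]) <= int(prev["start"]) or int(cur["end"]) <= int(prev["end"]):
            return False
    return True


def reorder_windows(windows: List[Window], weights: List[T]) -> Tuple[List[Window], List[T]]:
    """Sort windows by scaffold (in order of first appearance), then by
    numeric start, and apply the same permutation to the weights rows."""
    if len(windows) != len(weights):
        raise ValueError("window data and weights have different numbers of rows")
    # Rank scaffolds by first appearance so scaffold order is preserved.
    rank: Dict[object, int] = {}
    for w in windows:
        rank.setdefault(w["scaffold"], len(rank))
    # Compare starts as integers: as text, "900000" would sort after "1000000".
    order = sorted(
        range(len(windows)),
        key=lambda i: (rank[windows[i]["scaffold"]], int(windows[i]["start"])),
    )
    # One order, applied to both files, keeps each weights row with its window.
    return [windows[i] for i in order], [weights[i] for i in order]
```

Computing the order once and applying it to both lists is the key design choice: sorting only one file would silently attach weights to the wrong windows.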

Simon

@simonhmartin
Copy link
Owner

Actually I just realised that I've already implemented a fix for this. You can include reorder_by_start=TRUE in the import.twisst command and it will correct the data.

@giyany

giyany commented Jan 6, 2022

Thanks, that makes a lot of sense. The issue was that when using reorder_by_start=TRUE, I got this error:

Error in predLoess(object$y, object$x, newx = if (is.null(newdata)) object$x else if (is.data.frame(newdata)) as.matrix(model.frame(delete.response(terms(object)), :
NA/NaN/Inf in foreign function call (arg 5)

In addition, the output of plot.twisst also seemed off, although it was hard to say for sure, so I assumed reorder_by_start might not be doing what I expected.

Now I sort the data beforehand. It was ordered that way because I used the order in which the files appeared on the command line to pull out the coordinates. There are probably better ways to do it, but apparently I'm not the only one.

@simonhmartin simonhmartin reopened this Jan 6, 2022
@simonhmartin
Owner

I see. I will look into this. Perhaps there is still a bug in how I am doing the reordering.

@simonhmartin
Owner

Would you please attach the weights file (output.run3.weights.csv) again? I couldn't download it for some reason.

@giyany

giyany commented Jan 6, 2022

Happily:

output.run3.weights.csv.gz

@giyany

giyany commented Jan 6, 2022

Another note: the error seems to depend on span_bp. Maybe the data are simply too skewed, or contain too many NA values for that span, and the problem is not related to the order at all.

@simonhmartin
Owner

I couldn't recreate your error, but I got different errors due to the files having different numbers of lines (possibly because the weights are from run3 and the window data from run4?).
Anyway, I think you're right that the span needs to be set much larger for your data, probably at least 10 times the window size. You've used broad windows of 10kb for your trees. This is not recommended for most organisms, because in most species the span of distinct genealogies across the genome will be much less than 10kb. I think this is reflected in the fact that your data are strongly skewed toward one topology. In the original paper, we showed that this would happen if the tree spans are too large.
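The "at least 10 times the window size" rule of thumb can be turned into a quick lower bound computed from the window data file. This is a hedged Python sketch, not part of twisst: the helper suggested_min_span_bp is hypothetical and assumes window rows with numeric start and end. It takes the median window width and multiplies it by a factor (10 by default); for the 10kb windows shown earlier in this thread it gives 100,000 bp, which should be treated as a floor rather than a recommended value.

```python
from statistics import median
from typing import Dict, List


def suggested_min_span_bp(windows: List[Dict[str, int]], factor: int = 10) -> float:
    """Lower bound for span_bp: `factor` times the median window width (end - start)."""
    if not windows:
        raise ValueError("no windows supplied")
    widths = [int(w["end"]) - int(w["start"]) for w in windows]
    # Median rather than mean, so a few unusually long windows
    # (e.g. at scaffold ends) do not inflate the suggestion.
    return factor * median(widths)
```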

@giyany

giyany commented Jan 7, 2022

output.run4.weights.csv

Yes, I attached the wrong file, sorry about that.
If you still want to look, here is the correct file.

Thanks a lot for this useful input. I'll redo this considering only SNP numbers, as the paper recommends.

@simonhmartin
Owner

Great. Yes, it seems to work fine with run4 after increasing the smoothing span.

3 participants