Running netMHC #148
Hi Sofia, I am sorry for the delayed response. Thanks for trying the software. I'm happy to help you sort this out. NetMHC tool configuration is indeed often an issue, and that could be the problem here, but based on this error message:
I cannot be certain what is occurring. I agree that the most likely explanation is that the netMHC output is not as expected. In my opinion, the best path forward would be for you to try our Docker image. If it works in Docker, we know it is a configuration issue. If it does not, it could be a bug in our R package that we need to address. If you cannot use Docker, can you pare down the test data to a minimal set of predictions and see whether that works?
Hi Andrew, Thank you very much for your response. I just tried to run the test data with peptide length 9 and only the HLA-A*02:01 allele. Now I get output files from both netMHC and netMHCpan that look correct (netMHC_1245669d-3789-4c5d_o.csv and netMHCpan_fa954704-9be2-4f0d_o.csv). However, I still get the same error:
I have never tried Docker before, but do you think it would be the best solution then? Kind regards Sofia
Hi Sofia, Are you using the versions of NetMHC listed in the README?
Hi Andrew, Yes, I have downloaded these versions: NetMHCpan 4.1b: https://services.healthtech.dtu.dk/cgi-bin/sw_request Kind regards Sofia
This error is occurring here because the results from netMHC are not in the expected format. Could you take a look at the results tables? Could you also test that you are able to run the netMHC tools from the command line and that they work normally?
The output from netMHC looks normal:

```
/home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/bin/netMHC -p -l 9 -a HLA-A0201 -f netMHC_1245669d-3789-4c5d.csv
Thu May 26 10:37:54 2022
User: sofote
PWD : /home/projects/SRHgroup/projects/MuPeXI_project/scripts/antigen_garnish/ag_848f53c57a5744e6a7
Host: Linux g-01-c0024 3.10.0-1062.4.1.el7.x86_64 x86_64
-p 1 Switch on if input is a list of peptides (Peptide format)
-l 9 Peptide length (multiple lengths separated by comma e.g. 8,9,10)
-a HLA-A0201 HLA allele name
-f netMHC_1245669d-3789-4c5d.csv Input file (by default in FASTA format)
Command line parameters set to:
[-a line] HLA-A0201 HLA allele name
[-f filename] netMHC_1245669d-3789-4c5d.csv Input file (by default in FASTA format)
[-p] 1 Switch on if input is a list of peptides (Peptide format)
[-l string] 9 Peptide length (multiple lengths separated by comma e.g. 8,9,10)
[-s] 0 Sort output on decreasing affinity
[-rth float] 0.500000 Threshold for high binding peptides (%Rank)
[-rlt float] 2.000000 Threshold for low binding peptides (%Rank)
[-listMHC] 0 Print list of alleles included in netMHC
[-xls] 0 Save output to xls file
[-xlsfile filename] NetMHC_out.xls File name for xls output
[-t float] -99.900002 Threshold for output
[-thrfmt filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/data/threshold/%s.thr Format for threshold filenames
[-hlalist filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/data/allelelist File with covered HLA names
[-rdir filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64 Home directory for NetMHC
[-tdir filename] /scratch/35872875 Temporary directory (Default $$)
[-syn filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/data/synlists/%s.synlist Format of synlist file
[-v] 0 Verbose mode
[-dirty] 0 Dirty mode, leave tmp dir+files
[-inptype int] 0 Input type [0] FASTA [1] Peptide
[-version filename] /home/projects/SRHgroup/apps/antigen.garnish/netMHC/netMHC-4.0/Linux_x86_64/data/version File with version information
[-w] 0 w option for webface
NetMHC version 4.0
Input is in PEPTIDE format
Rank Threshold for Strong binding peptides 0.500
Rank Threshold for Weak binding peptides 2.000
pos HLA peptide Core Offset I_pos I_len D_pos D_len iCore Identity 1-log50k(aff) Affinity(nM) %Rank BindLevel
```
Or is it because antigen garnish is not expecting the header in my files? I can see that all the tools run except NetMHCII 2.3, so it makes sense that I didn't get those output files before, but this should not affect the minimal set of MHC I predictions I am using now, should it? Kind regards Sofia
Sorry, the beginning of the file got printed very big.
Yes, I am not sure. Could you upload the output files?
This error is occurring because the first two lines of your netMHCpan output file are not commented out like the rest of the header. I am not sure why these lines exist. The output file seems to be prepended with the temporary and working directory paths for some reason; I've never seen this. Unfortunately, parsing these minimally formatted plain-text stdout files is subject to strange OS/CLI environment formatting issues such as this. You could attempt to fix this by determining what is causing it. Alternatively, I suggest you use our Docker container, where we have control over factors such as this and where I am certain all our package tests, including those covering these input files, pass. Repro below, and then ultimately:
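If the source of the echoed lines cannot be found, one stopgap is to strip them before antigen.garnish parses the file. The sketch below assumes, as described above, that the genuine header lines start with `#` and that any stray lines appear only before the first of them. `strip_leading_noise()` is a hypothetical helper, not part of antigen.garnish, and the example input is illustrative.

```r
# Hypothetical helper (not part of antigen.garnish): drop any lines that
# precede the first "#"-commented header line of a netMHC output file.
strip_leading_noise <- function(lines) {
  first_comment <- which(startsWith(lines, "#"))[1]
  # No commented line at all, or already at the top: leave unchanged.
  if (is.na(first_comment) || first_comment == 1L) {
    return(lines)
  }
  lines[first_comment:length(lines)]
}

# Illustrative input: two echoed path lines before the real header.
example <- c(
  "<echoed temporary directory path>",
  "<echoed working directory path>",
  "# header line 1",
  "# header line 2",
  "data row"
)
cleaned <- strip_leading_noise(example)
print(cleaned)

# To rewrite a file in place (path is a placeholder):
# writeLines(strip_leading_noise(readLines(path)), path)
```

Fixing the cause (the commands echoing into the output) is still preferable, since a stripped file only hides the symptom.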
Thank you very much for your help. I think we will first try to fix the netMHC output files and then try the Docker version if that does not work. Kind regards Sofia
If I had to bet, I would say launching R with
Hi Andrew, Thanks for your help. I managed to remove the first lines from the file; they were echoed into the output file, which I hadn't noticed. I now get an output file from the test data with all peptide lengths and all HLAs, but I am not sure whether the output from antigen garnish is correct or whether netMHCII is even run. Can you confirm that the output looks correct? I only included the head of the file, as it is too big. Kind regards Sofia
Looks good. You can see the affinity predictions in the column
Thank you for all your help, then I will proceed with my own data. Kind regards Sofia :)
Hi again, sorry for all the questions, but I have a few more.
Thank you for your time. Kind regards Sofia
Hi Sofia,
Hi Andrew, thanks for the quick answer.
Are you able to run your test example with this HLA, to check whether I still have a problem with netMHCII or netMHCIIpan on our server? This seems weird, though, as the test example with your suggested alleles (dt[, MHC := c("HLA-A01:47 HLA-A02:01 HLA-DRB1*14:67")]) works fine.
Kind regards Sofia
Could you check the appropriate notation using the netMHCII command-line tool? I think it has a function to list all alleles and the correct format. Maybe the required nomenclature is different than expected? I can look into it further tomorrow if that doesn't solve the issue.
Hi Andrew, Both netMHCII and netMHCIIpan take the MHC II alleles in the same way: for DRB it is DRB1_1101 and for DQ it is HLA-DQA10102-DQB10501. So I tried to input them to antigen garnish in different ways on your test data:
If you have time to try it out, I would very much appreciate it! I have managed to run my data on MHC I, so it is only MHC II that is missing now. Kind regards Sofia
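When checking a patient allele list against what the command-line tools accept, a small conversion can help. The sketch below maps alleles into the two forms reported above (DRB1_1101 and HLA-DQA10102-DQB10501). It assumes the inputs are written as `HLA-DRB1*11:01` and `HLA-DQA1*01:02-DQB1*05:01`; `to_netmhcii()` is a hypothetical helper, not antigen.garnish's internal conversion, and the names antigen.garnish itself accepts are the ones listed by `list_mhc()`.

```r
# Hypothetical helper: convert "HLA-DRB1*11:01" -> "DRB1_1101" and
# "HLA-DQA1*01:02-DQB1*05:01" -> "HLA-DQA10102-DQB10501", the forms the
# netMHCII tools were reported to accept in this thread.
to_netmhcii <- function(alleles) {
  # Drop the "*" and ":" separators from every allele.
  compact <- gsub("[*:]", "", alleles)
  # DRB alleles lose the "HLA-" prefix and gain an underscore after the locus.
  is_drb <- grepl("^HLA-DRB[0-9]", compact)
  compact[is_drb] <- sub("^HLA-(DRB[0-9])([0-9]+)$", "\\1_\\2", compact[is_drb])
  compact
}

to_netmhcii(c("HLA-DRB1*11:01", "HLA-DQA1*01:02-DQB1*05:01"))
```

The expected strings should still be checked against the allele list shipped with your netMHCII/netMHCIIpan installation.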
Hi Sofia, Sorry for the delay. This was an issue in our codebase: the unique format of this allele name broke our creation of the netMHC commands. I think you are the first person to test this allele. Sorry for the trouble. It is fixed on master on GitHub in the commit linked below. Please use:

```r
library(magrittr)
library(data.table)
library(antigen.garnish)

# load an example VCF
dir <- system.file(package = "antigen.garnish") %>%
  file.path(., "extdata/testdata")
file <- file.path(dir, "TUMOR.vcf")

# extract variants
dt <- garnish_variants(file)

# add space-separated MHC types
# see list_mhc() for nomenclature of supported alleles
# MHC may also be set to "all_human" or "all_mouse" to use all supported alleles
dt[, MHC := c("HLA-DQA10102-DQB10501 HLA-DRB1_1101")]

# predict neoantigens
result <- dt %>% garnish_affinity(.)

# inspect the netMHCIIpan %Rank_EL predictions
result$`%Rank_EL_netMHCIIpan` %>%
  stats::na.omit() %>%
  as.numeric()
```
Hi Andrew, Though if I run it with an MHC I allele first like: dt[, MHC := c("HLA-A*02:01 HLA-DQA10102-DQB10501 HLA-DRB1_1101")], then it works fine. I want to run them separately as I have huge files, do you know why it crashes doing that? Kind regards Sofia
Sorry - just to be clear you tried the latest commit on Github?
On Jun 16, 2022, at 04:51, SofiaOtero wrote:
Hi Andrew,
thank you very much, I was finally able to run it again after waiting for them to update it on the server.
I cannot run your command: dt[, MHC := c("HLA-DQA10102-DQB10501 HLA-DRB1_1101")], because I get the error:
Calculating netMHC consensus score.
Calculating overall consensus affinity score.
Error in get(cols) : invalid first argument
Calls: %>% ... merge_predictions -> [ -> [.data.table -> eval -> eval -> get
In addition: Warning message:
In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], :
last 2 bases were ignored
Removing temporary files.
Execution halted
Though if I run it with an MHC I allele first like: dt[, MHC := c("HLA-A*02:01 HLA-DQA10102-DQB10501 HLA-DRB1_1101")], then it works fine. I want to run them separately as I have huge files, do you know why it crashes doing that?
Kind regards Sofia
Sorry, I didn't see your response. Yes, we tried the latest commit, and it does work better than before, but it crashes if I don't add an HLA-I allele at the beginning. I have 26 patient files with different HLAs, and most of them crash with the same error as before, "netMHCII: command not found". Do you think it is because no one has run those HLA II types before either? Kind regards Sofia
Hi again, I have a cohort of 26 patients with different HLA II alleles, and many of them crash when I run antigen garnish, although I know they are all available in netMHCIIpan. I don't know if it is too much to ask, but could you check whether some of them crash when you run them? I have attached a txt file with them all in the correct format to run in antigen garnish. Kind regards Sofia
I’m happy to check this, please give me a few days.
On Jun 23, 2022, at 09:20, SofiaOtero wrote:
unique_HLA_II.txt
Thanks, do you have any updates? Kind regards Sofia
Sorry @SofiaOtero for the delay. I fixed the error that was preventing correct parsing of some of the rare alleles for netMHCIIpan from the canonical format. Every allele on your list now works, which you can test as I did:

```r
library(data.table)
library(magrittr)
library(antigen.garnish)
library(parallel)

# read the space-separated alleles from the attached file
HLA <- readLines("unique_HLA_II.txt") %>%
  stringr::str_split(" ") %>%
  unlist()

dir <- system.file(package = "antigen.garnish") %>%
  file.path(., "extdata/testdata")
file <- file.path(dir, "TUMOR.vcf")

# extract variants once; each allele gets its own copy below
dt <- garnish_variants(file)

# test each allele, capturing failures instead of stopping
ret <- HLA %>% lapply(function(i) {
  print(i)
  dt_i <- data.table::copy(dt)
  dt_i[, MHC := i]
  list(
    name = i,
    result = try(garnish_affinity(dt_i))
  )
})

# TRUE for each allele whose run failed
classes <- ret %>% lapply(function(i) {
  inherits(i$result, "try-error")
}) %>% unlist()
names(classes) <- HLA
```

The table of HLA alleles is now also printed on stdout.
Thank you very much, I have updated to the new commit from GitHub. When I run e.g. the following HLAs: HLA-DQA10102-DQB10301 HLA-DQA10102-DQB10602 HLA-DQA10505-DQB10301 HLA-DQA10505-DQB10602 HLA-DRB111:01 HLA-DRB115:01, I get this after all variants are processed:
And I can see in the ag output directory that both netMHCII and netMHCIIpan have been run, but then antigen garnish crashes with the following error:
Checking netMHC scripts in antigen.garnish data directory.
So it seems like it cannot run when there are NA values in the netMHCII alleles, even though it should just proceed and run netMHCIIpan. Is this also an error you get? Kind regards Sofia
Hi Sofia, No, I do not get an error. Are you sure the paths are configured correctly?
This seems to indicate that they are not.
By design, this should be fine and not generate any errors.
Hi, I have been trying to run antigen garnish with your test data for a while, and now it seems to run fine with parallel and netMHC. The issue is that in the output folder (e.g. ag_f236b988a09e438ea2) it does not seem like all netMHC tools have been run, as I only get the following files:
netMHC_2222f95f-7e2c-4c43_o.csv netMHCpan_5bf2b41a-85f4-453e_o.csv
netMHC_29f106ef-a12f-488c_o.csv netMHCpan_71ec0664-1374-4786_o.csv
netMHC_60c9a80d-91cf-4c85_o.csv netMHCpan_756ff183-22ac-453e_o.csv
netMHC_88957201-b040-4b1c_o.csv netMHCpan_881141cd-e83d-49ab_o.csv
netMHC_992253ab-0c44-4986_o.csv netMHCpan_a1a0dfef-606a-4942_o.csv
netMHC_b61f7b8f-a6b5-45cc_o.csv netMHCpan_b3d9c7ad-7535-45f9_o.csv
netMHC_c6123f47-736a-44f1_o.csv netMHCpan_bff23973-51cc-41e7_o.csv
netMHCIIpan_eb376fad-1533-4079_o.csv netMHCpan_cd147a8a-c683-4d5b_o.csv
netMHCpan_2ec1102a-b2ce-4c07_o.csv netMHCpan_e0771477-971b-4a0b_o.csv
netMHCpan_35980300-2a74-49f9_o.csv netMHCpan_f962f91a-ce37-4784_o.csv
netMHCpan_56cb499b-ace7-453e_o.csv netMHCpan_fd56749a-9714-4d38_o.csv
The netMHC files do not contain results for all peptide lengths or all HLAs, so it seems that not all the files antigen garnish needs have been created, as I get the following error after 'Running netMHC in parallel.':
Collating netMHC output...
Read 74 items
Read 79 items
Read 84 items
Read 89 items
Read 94 items
Read 99 items
Read 103 items
Read 83 items
Error in data.table::setnames(., dt %>% names(), dtn) :
'old' is length 14 but 'new' is length 1
Calls: %>% ... collate_netMHC -> lapply -> FUN -> %>% ->
In addition: Warning message:
In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], :
last 2 bases were ignored
Removing temporary files.
Execution halted
Can you help me to resolve this issue?
Thanks in advance.
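To compare runs (for example, the full test data against a single-allele run, or a local run against Docker), it can help to tally how many output files each tool produced in the working directory. The sketch below assumes the `<tool>_<id>_o.csv` naming seen in the listing above; `count_outputs()` is a hypothetical diagnostic, not part of antigen.garnish, and it says nothing about how many files antigen.garnish expects.

```r
# Hypothetical diagnostic (not part of antigen.garnish): count the
# "<tool>_<id>_o.csv" output files produced per netMHC tool.
count_outputs <- function(files) {
  outs <- basename(files[grepl("_o\\.csv$", files)])
  # "netMHCpan_5bf2b41a-85f4-453e_o.csv" -> "netMHCpan"
  tool <- sub("_.*$", "", outs)
  vapply(split(outs, tool), length, integer(1))
}

# Usage, with the directory name from the run above:
# count_outputs(list.files("ag_f236b988a09e438ea2"))
```

For the listing above this would report 7 netMHC, 1 netMHCIIpan and 15 netMHCpan files, and no netMHCII files.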