Directly reading Thermo .raw files #9

Open
gsaxena888 opened this issue Aug 21, 2017 · 3 comments

Comments

@gsaxena888

I saw that BatMass was considering support for reading Thermo .raw files directly. Is this still the plan? If so, could it be done reasonably quickly and stably by calling the already-built Go library from Java, through some sort of Java-Go interface library? (See https://pkelchte.wordpress.com/2013/11/25/unthermo/ and https://godoc.org/bitbucket.org/proteinspector/ms/unthermo; there is probably also a paper on this somewhere.) Or would it be easy to port the Go code to Java, to avoid hard-to-debug library interface problems?

If the above is happening, do you have any plans to do the same for Sciex .wiff files? (I know that for the last two years Sciex has unofficially provided a Linux program that converts .wiff files to XML (I think mzML, but it could be mzXML), so presumably there is a way to write a non-Windows program that accesses .wiff files.)

@chhh
Owner

chhh commented Aug 27, 2017

I would love to add that sort of functionality. BatMass uses the MSFTBX library to read files, so the support would need to be added there. Thermo has almost released a new set of .NET libraries for reading their data files, so I was waiting for that to become fully official.
I don't think Sciex provides multi-platform libraries to read their data.

@gsaxena888
Author

Very cool about Thermo's new libraries. I didn't know about that. I presume that these .NET libraries can be called easily and robustly from Java? Would that work even on Linux systems?

Regarding Sciex, I had one idea and was wondering what you think of it: what if there were a Java library that used the same DLLs that come with msconvert to access the Sciex .wiff files? Obviously, this would only work on Windows systems, but it would at least give people an option to read directly from Sciex .wiff files, using Java, without having to go through the somewhat slow msconvert process.

Regarding Thermo again, do you have any guesstimate (a high-level estimate is fine) of when the MSFTBX library might support reading Thermo files? And, this is really a different question, but any thoughts on how the MSFTBX library compares to a new Java library being built for proteomics: https://msdk.github.io/

@chhh
Owner

chhh commented Sep 6, 2017

@gsaxena888 it will work on Linux; it has been tested with Mono instead of .NET.
It would all be very nice to do, but those native DLLs are built with different technologies and are not easy to interface with.
Thermo was using Microsoft COM, others built native C libraries, and yet others used .NET. It is not easy to interface with that zoo from Java.

Regarding MSDK - I'm taking part in its development as well. We've just completed updated mzML and netCDF parsers and writers.
I'd say that the mzXML and mzML parsers in MSFTBX are faster (they are multi-threaded), and there is more control over how spectra are loaded into and unloaded from memory, which MSDK does not provide. But then, most people don't need that. MSFTBX was built to cater to the needs of BatMass and, later, MSFragger. With a good SSD (or a good RAID array) the mzML parser in MSFTBX can reach 0.5-1 GB/s; you won't see that with the MSDK parsers. On the other hand, MSDK also provides a writer and can read from streams, not only files. This is helpful if your files are gzipped, for example: you can then read them through a decompressing stream.
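To make the stream point concrete, here is a minimal sketch using only the JDK. It shows how a gzipped source can be wrapped in `java.util.zip.GZIPInputStream` so that a stream-based parser sees plain XML. The class name `GzipStreamSketch` and its helper methods are hypothetical, and the in-memory byte array merely stands in for a gzipped mzML file. Handing the resulting stream to an MSDK parser is deliberately not shown, since the exact parser API is not covered in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only (not part of MSDK or MSFTBX).
final class GzipStreamSketch {

    /** Gzip-compresses text in memory; stands in for a gzipped mzML file on disk. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    /** Wraps a compressed source in a stream that decompresses on the fly. */
    static InputStream open(InputStream compressed) throws IOException {
        return new GZIPInputStream(compressed);
    }

    /** Reads the whole (already decompressed) stream back as UTF-8 text. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] packed = gzip("<mzML><run/></mzML>");
        // A stream-based parser would consume this InputStream directly.
        try (InputStream in = open(new ByteArrayInputStream(packed))) {
            System.out.println(readAll(in));
        }
    }
}
```

For a real file, the `ByteArrayInputStream` would be replaced by a file input stream; nothing is ever decompressed to disk.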

Sucking in all MS1 scans with MSFTBX is as easy as:

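// Point MSFTBX at an mzML file on disk
MZMLFile source = new MZMLFile(path);
ScanCollectionDefault scans = new ScanCollectionDefault();
scans.setDataSource(source);
// Parse the file, keeping only MS1 scans together with their spectra
scans.loadData(LCMSDataSubset.MS1_WITH_SPECTRA);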

When this last call returns, all MS1 scans are loaded into the data structure.
To navigate the loaded scans, use the various getMapXXX methods, e.g.:
scans.getMapMsLevel2index().get(1).getNum2scan()
to get a simple map from scan numbers to scans (in order, as it's a navigable map), or:
scans.getMapMsLevel2index().get(1).getRt2scan()
to get an RT mapping, which returns lists because two scans are allowed to have the same RT.

final TreeMap<Integer, IScan> num2scan = scans.getMapMsLevel2index().get(1).getNum2scan();
for (Map.Entry<Integer, IScan> e : num2scan.entrySet()) {
    int num = e.getKey();
    IScan scan = e.getValue();
}
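The reason the RT mapping returns lists can be seen in a small standalone sketch. It does not use MSFTBX at all: plain `Integer` scan numbers stand in for `IScan` objects, and the class `ScanMapSketch` with its two methods is hypothetical. It only illustrates the data-structure idea described above, a navigable map keyed by RT whose values are lists, along with the range query that such a map makes cheap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for an RT index; real MSFTBX maps hold IScan objects.
final class ScanMapSketch {

    /** Builds RT -> scan numbers; each value is a list because RTs may collide. */
    static TreeMap<Double, List<Integer>> indexByRt(double[] rts, int[] scanNums) {
        TreeMap<Double, List<Integer>> rt2scan = new TreeMap<>();
        for (int i = 0; i < rts.length; i++) {
            rt2scan.computeIfAbsent(rts[i], k -> new ArrayList<>()).add(scanNums[i]);
        }
        return rt2scan;
    }

    /** Collects all scan numbers whose RT lies in [lo, hi], in RT order. */
    static List<Integer> scansInRtRange(TreeMap<Double, List<Integer>> rt2scan,
                                        double lo, double hi) {
        List<Integer> hits = new ArrayList<>();
        for (List<Integer> atRt : rt2scan.subMap(lo, true, hi, true).values()) {
            hits.addAll(atRt);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Scans 1 and 2 share the same RT, so they end up in one list.
        TreeMap<Double, List<Integer>> rt2scan =
                indexByRt(new double[] {1.0, 1.0, 2.5, 4.0}, new int[] {1, 2, 3, 4});
        System.out.println(rt2scan.get(1.0));
        System.out.println(scansInRtRange(rt2scan, 1.0, 2.5));
        // Navigable-map lookup: the closest RT at or below 3.0
        System.out.println(rt2scan.floorKey(3.0));
    }
}
```

A plain map from RT to a single scan would silently drop one of the two scans at RT 1.0, which is why the values are lists.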
