Enhance support for non-ncs files #18

Open
jniediek opened this issue Nov 11, 2015 · 11 comments

Comments

@jniediek
Owner

This includes getting rid of all the small pieces of code that assume ncs files, such as glob('*.ncs') or fname[:3].

Thanks to @eann for pointing this out!

Also see the wiki
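As a rough illustration of what replacing glob('*.ncs') and fname[:3] might look like, here is a minimal sketch. The names find_data_files, channel_label and SUPPORTED_EXTENSIONS are hypothetical and not part of Combinato, and the sketch assumes the fname[:3] slice is used to derive a label from the filename.

```python
from pathlib import Path

# Hypothetical set of supported extensions; not taken from Combinato.
SUPPORTED_EXTENSIONS = {".ncs", ".mat", ".h5"}


def find_data_files(folder, extensions=SUPPORTED_EXTENSIONS):
    """Return files in `folder` whose suffix is in `extensions`, sorted by path."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


def channel_label(path):
    """Use the whole file stem as a label instead of a fixed-length slice."""
    return Path(path).stem
```

With a helper like this, scripts would ask for "the data files in this folder" in one place, so supporting a new format means changing one set rather than many scattered patterns.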

@eann

eann commented Nov 12, 2015

I would like to tackle this issue. I have been thinking about it and, as a first step, I thought of creating a base interface class from which all other interfaces (ncs, mat, etc.) would inherit. This would guarantee that all data files share a common interface, so the main scripts don't have to know whether the data is stored in mat or ncs files. What do you think?
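The base-class idea could be sketched roughly as follows. This is only an illustration: DataReader, InMemoryReader and describe are hypothetical names, not existing Combinato classes, and the toy reader stands in for real ncs or mat readers.

```python
from abc import ABC, abstractmethod


class DataReader(ABC):
    """Hypothetical common interface for all raw-data formats."""

    @property
    @abstractmethod
    def sampling_rate(self):
        """Sampling rate in Hz."""

    @property
    @abstractmethod
    def channel_name(self):
        """Name of the recording channel."""

    @abstractmethod
    def read(self, start, stop):
        """Return the samples with indices in [start, stop)."""


class InMemoryReader(DataReader):
    """Toy implementation backed by a list, standing in for a file reader."""

    def __init__(self, samples, sampling_rate, channel_name):
        self._samples = list(samples)
        self._sampling_rate = sampling_rate
        self._channel_name = channel_name

    @property
    def sampling_rate(self):
        return self._sampling_rate

    @property
    def channel_name(self):
        return self._channel_name

    def read(self, start, stop):
        return self._samples[start:stop]


def describe(reader):
    """Works with any DataReader, regardless of the underlying file format."""
    return f"{reader.channel_name} @ {reader.sampling_rate} Hz"
```

A script written against DataReader, like describe here, never needs to check the file extension; each format only has to provide its own subclass.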

@jniediek
Owner Author

It's great that you want to do this. Thinking about software design in general, I think it is a great idea to start with a base class as you proposed.

What I'm not sure about is whether it makes more sense to first collect all the places where the existence of ncs files is assumed. For example, I just discovered that even css-plot-extracted assumes that there are ncs files to read the header information from. These places don't need to read data; they only need some information about the names or headers of the ncs files.

This leads to the issue of meta-information. In several places, it's necessary to know the sampling rates and channel names. One option would be to create a simple text file in each data folder to store this information. It's also possible to use the attributes of h5 files, but then it's impossible to read this information without invoking python.
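The text-file option could look something like the sketch below. The file name channel_info.json, the JSON layout and both function names are assumptions made for illustration; the thread does not specify a format.

```python
import json
from pathlib import Path

# Hypothetical file name; the thread only says "a simple text file".
METADATA_FILENAME = "channel_info.json"


def write_metadata(folder, channel_name, sampling_rate_hz):
    """Store channel name and sampling rate as a small text file in `folder`."""
    info = {"channel_name": channel_name, "sampling_rate_hz": sampling_rate_hz}
    path = Path(folder) / METADATA_FILENAME
    path.write_text(json.dumps(info, indent=2))
    return path


def read_metadata(folder):
    """Read the metadata back as a dictionary."""
    return json.loads((Path(folder) / METADATA_FILENAME).read_text())
```

A file like this can be opened in any text editor, which is the advantage over h5 attributes discussed above; the cost is that it has to be kept in sync with the data files.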

@eann

eann commented Nov 13, 2015

Maybe we could look at it the other way around: where is it more convenient for python to have the metadata? Having it in a text file rather than in the h5 file only makes sense if a human is supposed to read it. I think it makes more sense to gather this information in one place and convert it into a human-readable form if and when a human needs to read it. The information could also be converted into different formats (e.g. txt, html, xml, GUI, etc.).

@jniediek
Owner Author

You are right, once the metadata is somewhere, it's easy to convert it. So far, I have had good experiences with text files (the header of ncs files is basically a text file, look at combinato/basics/nlxio.py, function ncs_nfo), but h5 attributes work, too. At least it would be no problem to change css-extract in such a way that it stores the header data in the h5 files when creating them. I already use h5 attributes in combinato/manager/create_session.py, function create_session.

The important part would then be to rewrite all scripts that read header information, so that they use the information stored in the h5 files.

If you decide to write a uniform interface, that could of course be part of the interface.

@eann

eann commented Nov 17, 2015

I was modifying css-find-concurrent, and one question occurred to me: are the times expressed in seconds or in milliseconds?

@jniediek
Owner Author

The timestamps that NcsFile.read returns are in microseconds. Everything else is in milliseconds. You might have noticed that concurrent.py divides timestamps by 1000 in two places. This is to convert from the ncs format to our h5 files.
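The unit conversion described here amounts to a division by 1000. A minimal sketch, with a hypothetical function name that is not taken from concurrent.py:

```python
def microseconds_to_milliseconds(timestamps_us):
    """Convert ncs-style timestamps (microseconds) to milliseconds."""
    return [t / 1000 for t in timestamps_us]
```

For example, microseconds_to_milliseconds([1000, 2500]) returns [1.0, 2.5].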

@eann

eann commented Nov 17, 2015

Ok, thanks!

@eann

eann commented Nov 25, 2015

One question: I have seen that you sometimes use the tag "AcqEntName" from NCS files. As I'm not familiar with Neuralynx hardware, I wanted to ask what exactly that tag is and whether it is necessary when using .ncs files (e.g. when you have many files with a different "AcqEntName" for each channel). If it is not fundamental, I would drop it and use only the essential information (basically just the channel) to build filenames.

@jniediek
Owner Author

AcqEntName is the Neuralynx name for Acquisition Entity Name, basically the name of the recording channel, which can be different from the filename.

If I had thought more carefully in the beginning, I would have created meta-data during the extraction of spikes.

The only script that should be allowed to refer to AcqEntName is css-extract when dealing with ncs files, but I didn't think hard enough about this when I programmed it.

@bergem1t

Have you ever thought about using existing tools that deal with integrating different formats for neurophysiology data?

Neo sounds like a useful tool here, since it is python based: https://pythonhosted.org/neo/

@jniediek
Owner Author

Hi @bergem1t,

Short answer

Very important idea for improvement, but I have no resources to fix it now.

Long answer

File handling in Combinato could clearly benefit from a thorough redesign. The details vary for the different css-... programs:

css-extract and css-plot-rawsignal

These should be redesigned using an abstract reader class, which could then be implemented for each file format specifically. @eann started some work in this direction. Update?

css-prepare-sorting, css-cluster, css-combine, css-plot-...

These work with hdf files already, but in some odd places they still refer to the .ncs files. These odd dependencies should be removed. Instead, more (optional) metadata (e.g., channel names) should be stored in the hdf files.

Workarounds

Make sure to check out the scripts in the combinato/tools folder. For issues with file formats, mat2h5.py, concatenate_h5.py and old_format_output.py are most relevant.

These scripts are not documented much. Sorry.
