
Removing mixtures of templates: ValueError operands could not be broadcast together #395

Open
KKKenser opened this issue Mar 18, 2024 · 10 comments

Comments

@KKKenser

KKKenser commented Mar 18, 2024

Hi,

I'm trying to run a test sorting on a few channels (NEURALYNX format) and I'm constantly getting the following error:

File          : /home/lab/Desktop/test_data_to_sort/test_8/CSC_129.ncs
Steps         : filtering, whitening, clustering, fitting, merging
Number of CPU : 12/20
Parallel HDF5 : False
Shared memory : False
Hostfile      : /home/lab/spyking-circus/circus.hosts

##################################################################


-------------------------  Informations  -------------------------
| Number of recorded channels : 8
| Number of analyzed channels : 8
| File format                 : NEURALYNX
| Data type                   : int16
| Sampling rate               : 32000 Hz
| Duration of the recording   : 44 min 42 s 208 ms
| Width of the templates      : 3 ms
| Spatial radius considered   : 300 um
| Threshold crossing          : positive and negative
| Streams                     : multi-files (1 found)
------------------------------------------------------------------
-------------------------  Informations  -------------------------
| Using only 12 out of 20 local CPUs available (-c to change)
------------------------------------------------------------------
-------------------------  Informations  -------------------------
| Filtering has already been done
------------------------------------------------------------------
Analyzing data to get whitening matrices and thresholds...
Found 30s to compute the whitening matrix...
Because of whitening, need to recompute the thresholds...
Searching spikes to construct the PCA basis...
100%|████████████████████████████████████|[00:07<00:00,  3.64s/it]
Found 11726 waveforms over 12792 requested
Searching isolated random spikes to sample amplitudes...
100%|████████████████████████████████████|[00:30<00:00, 15.45s/it]
Found 78739 spikes over 127992 requested
Estimating amplitudes distributions...
Smart Search of good isolated spikes for the clustering (1/3)...
100%|████████████████████████████████████|[00:35<00:00, 17.92s/it]
Found 75513 isolated spikes over 127992 requested (6105 rejected)
Computing density estimations...
Searching random spikes to refine the clustering (2/3)...
100%|████████████████████████████████████|[00:34<00:00, 17.33s/it]
Found 17775 spikes over 127992 requested
Refining density estimations...
Searching random spikes to refine the clustering (3/3)...
100%|████████████████████████████████████|[00:31<00:00, 15.53s/it]
Found 6042 spikes over 127992 requested
Refining density estimations...
Running density-based clustering...
100%|████████████████████████████████████|[00:00<00:00, 12.02it/s]
-------------------------  Informations  -------------------------
| Number of clusters found : 46
| Number of local merges   : 11 (method nd-bhatta, param 2)
------------------------------------------------------------------
Estimating the templates with the median-raw procedure ...
100%|████████████████████████████████████|[00:00<00:00,  4.39it/s]
Removing 44 strongly shifted or noisy/mixture templates...
-------------------------  Informations  -------------------------
| Templates on few channels only (1), cc_merge set to 1 automatically
------------------------------------------------------------------
Removing mixtures of templates...
100%|███████████████████████████████████|[00:00<00:00, 568.03it/s]
Traceback (most recent call last):
  File "/home/lab/miniconda3/envs/circus/bin/spyking-circus-subtask", line 33, in <module>
    sys.exit(load_entry_point('spyking-circus==1.1.0', 'console_scripts', 'spyking-circus-subtask')())
  File "/home/lab/miniconda3/envs/circus/lib/python3.6/site-packages/circus/scripts/subtask.py", line 54, in main
    circus.launch(task, filename, nb_cpu, nb_gpu, use_gpu)
  File "/home/lab/miniconda3/envs/circus/lib/python3.6/site-packages/circus/__init__.py", line 23, in launch
    module.main(params, nb_cpu, nb_gpu, use_gpu)
  File "/home/lab/miniconda3/envs/circus/lib/python3.6/site-packages/circus/clustering.py", line 1804, in main
    merged2 = algo.delete_mixtures(params, nb_cpu=nb_cpu, nb_gpu=nb_gpu, use_gpu=use_gpu, debug_plots=debug_plots)
  File "/home/lab/miniconda3/envs/circus/lib/python3.6/site-packages/circus/shared/algorithms.py", line 1659, in delete_mixtures
    min_sps = (limits[:, 0] * sub_norm_templates)[:, numpy.newaxis]
ValueError: operands could not be broadcast together with shapes (2,) (4,)
----------------------------  Error  -----------------------------
| Step "clustering" failed for reason Command '['mpiexec', '-np', '12', 'spyking-circus-subtask', 'clustering', '/home/lab/Desktop/test_data_to_sort/test_8/CSC_129.ncs', '12', '0', 'False', 'False']' returned non-zero exit status 1.!
------------------------------------------------------------------
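For what it's worth, the failing line multiplies two 1-D arrays element-wise, which requires equal lengths. A minimal NumPy snippet with placeholder arrays (illustrative shapes only, not the actual circus variables) reproduces the same error:

import numpy

# Illustrative placeholders only: 2 amplitude limits vs 4 template norms.
limits = numpy.zeros((2, 2))
sub_norm_templates = numpy.ones(4)
min_sps = (limits[:, 0] * sub_norm_templates)[:, numpy.newaxis]
# ValueError: operands could not be broadcast together with shapes (2,) (4,)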

And the parameters file is as follows:


[data]
file_format    = NEURALYNX  # Can be raw_binary, openephys, hdf5, ... See >> spyking-circus help -i for more info
stream_mode    = multi-files # None by default. Can be multi-files, or anything depending on the file format
mapping        = mapping.prb           # Mapping of the electrode (see http://spyking-circus.rtfd.org)
suffix         =            # Suffix to add to generated files, if needed
overwrite      = True       # Filter or remove artefacts on site (if write access is possible). Data are duplicated otherwise
parallel_hdf5  = True    # Use the parallel HDF5 feature (if available)
output_dir     =            # By default, generated data are in the same folder as the data.
shared_memory  = False

[detection]
radius         = auto       # Radius [in um] (if auto, read from the prb file)
N_t            = 3          # Width of the templates [in ms]
spike_thresh   = 5          # Threshold [in MAD] for spike detection
peaks          = both    # Can be negative (default), positive or both
dead_channels  =            # If not empty or specified in the probe, a dictionary {channel_group : [list_of_valid_ids]}
weird_thresh   =            # If not empty, threshold [in MAD] for artefact detection

[filtering]
cut_off        = 300, auto  # Min and Max (auto=nyquist) cut off frequencies for the band pass butterworth filter [Hz]
filter         = True       # If True, then a low-pass filtering is performed
remove_median  = False      # If True, medians over all channels within shanks are subtracted (movement artifacts)
common_ground  =            # If you want to use channels as ground within shanks {channel_group : ground_channel}
sat_value      =            # Values higher than sat_value are set to 0 during filtering (in percent of max dtype) [0,1]

[triggers]
trig_file      =            # External stimuli to be considered as putative artefacts [in trig units] (see documentation)
trig_windows   =            # The time windows of those external stimuli [in trig units]
trig_unit      = ms         # The unit in which times are expressed: can be ms or timestep
clean_artefact = False      # If True, external artefacts induced by triggers will be suppressed from data
dead_file      =            # Portion of the signals that should be excluded from the analysis [in dead units]
dead_unit      = ms         # The unit in which times for dead regions are expressed: can be ms or timestep
ignore_times   = False      # If True, any spike in the dead regions will be ignored by the analysis
make_plots     =            # Generate sanity plots of the averaged artefacts [Nothing or None if no plots]

[whitening]
spatial        = True       # Perform spatial whitening
max_elts       = 1000       # Max number of events per electrode (should be compatible with nb_elts)
nb_elts        = 0.8        # Fraction of max_elts that should be obtained per electrode [0,1]
output_dim     = 5          # Can be in percent of variance explained, or number of dimensions for PCA on waveforms

[clustering]
extraction     = median-raw # Can be either median-raw (default) or mean-raw
sub_dim        = 10         # Number of dimensions to keep for local PCA per electrode
max_elts       = 10000      # Max number of events per electrode (should be compatible with nb_elts)
nb_elts        = 0.8        # Fraction of max_elts that should be obtained per electrode [0,1]
nb_repeats     = 3          # Number of passes used for the clustering
smart_search   = True       # Activate the smart search mode
merging_method = nd-bhatta  # Method to perform local merges (distance, dip, folding, nd-folding, bhatta, nd-bhatta)
merging_param  = default    # Merging parameter (see docs) (3 if distance, 0.5 if dip, 1e-9 if folding, 2 if bhatta)
sensitivity    = 3          # Single parameter for clustering sensitivity. The lower the more sensitive
cc_merge       = 0.95       # If CC between two templates is higher, they are merged
dispersion     = (5, 5)     # Min and Max dispersion allowed for amplitudes [in MAD]
fine_amplitude = True       # Optimize the amplitudes and compute a purity index for each template
make_plots     =            # Generate sanity plots of the clustering [Nothing or None if no plots]
max_amplitude  = auto       # ADDED RECENTLY, IN AN ATTEMPT TO FIX THE VALUEERROR PROBLEM

[fitting]
amp_limits     = (0.3, 5)   # Amplitudes for the templates during spike detection [if not auto]
amp_auto       = True       # True if amplitudes are adjusted automatically for every template
collect_all    = False      # If True, one garbage template per electrode is created, to store unfitted spikes
ratio_thresh   = 0.9        # Ratio of the spike_threshold used while fitting [0-1]. The lower the slower
mse_error      = False      # If True, RMS is collected over time, to assess quality of reconstruction

[merging]
erase_all      = True       # If False, a prompt will ask you to remerge if merged has already been done
cc_overlap     = 0.75       # Only templates with CC higher than cc_overlap may be merged
cc_bin         = 2          # Bin size for computing CC [in ms]
default_lag    = 5          # Default length of the period to compute dip in the CC [ms]
auto_mode      = 0.75       # Between 0 (aggressive) and 1 (no merging). If empty, GUI is launched
remove_noise   = True       # If True, meta merging will remove obvious noise templates (weak amplitudes)
noise_limit    = 0.75       # Amplitude at which templates are classified as noise
sparsity_limit = 0          # Sparsity level (in percentage) for selecting templates as putative noise (in [0,1])
time_rpv       = 5          # Time [in ms] to consider for Refraction Period Violations (RPV) (0 to disable)
rpv_threshold  = 0.02       # Percentage of RPV allowed while merging
merge_drifts   = True       # Try to automatically merge drifts, i.e. non overlapping spiking neurons
drift_limit    = 1          # Distance for drifts. The higher, the more non-overlapping the activities should be
clean_merging  = False      # When templates are merged, automatically remove duplicated spikes (less than 0.5 ms apart)

[converting]
erase_all      = True       # If False, a prompt will ask you to export if export has already been done
export_pcs     = prompt     # Can be prompt [default] or in none, all, some
export_all     = False      # If True, unfitted spikes will be exported as the last Ne templates
sparse_export  = True       # For recent versions of phy, and large number of templates/channels
prelabelling   = False      # If True, putative labels (good, noise, best, mua) are pre-assigned to neurons
rpv_threshold  = 0.05       # Percentage of RPV allowed while labelling neurons as good neurons

[validating]
nearest_elec   = auto       # Validation channel (e.g. electrode closest to the ground truth cell)
max_iter       = 200        # Maximum number of iterations of the stochastic gradient descent (SGD)
learning_rate  = 1.0e-3     # Initial learning rate which controls the step-size of the SGD
roc_sampling   = 10         # Number of points to estimate the ROC curve of the BEER estimate
test_size      = 0.3        # Portion of the dataset to include in the test split
radius_factor  = 0.5        # Radius factor to modulate physical radius during validation
juxta_dtype    = uint16     # Type of the juxtacellular data
juxta_thresh   = 6          # Threshold for juxtacellular detection
juxta_valley   = False      # True if juxta-cellular spikes are negative peaks
juxta_spikes   =            # If none, spikes are automatically detected based on juxta_thresh
filter         = True       # If the juxta channel need to be filtered or not
make_plots     = png        # Generate sanity plots of the validation [Nothing or None if no plots]

[extracting]
safety_time    = 1          # Temporal zone around which spikes are isolated [in ms]
max_elts       = 1000       # Max number of collected events per templates
output_dim     = 5          # Percentage of variance explained while performing PCA
cc_merge       = 0.975      # If CC between two templates is higher, they are merged
noise_thr      = 0.8        # Minimal amplitudes are such that amp*min(templates) < noise_thr*threshold

[noedits]
filter_done    = True              #!! AUTOMATICALLY EDITED: DO NOT MODIFY !!
artefacts_done = False      # Will become True automatically after removing artefacts
median_done    = False      # Will become True automatically after removing common median
ground_done    = False      # Will become True automatically after removing common ground

I downloaded SpyKING Circus recently, so I'm assuming I have an up-to-date version. With different initial settings in the parameters file I get different shape pairs in the broadcasting ValueError: (4,) and (8,), or, as above, (2,) and (4,). So far I have loaded only 8 channels for testing, and in the mapping file the radius covers all of them.

I would be grateful for any help in solving this issue.

Sincerely,
Krzysztof

@mmagnuski

Hi, I'm working with Krzysztof to get SpyKING Circus running. We've updated to the most recent GitHub version of SpyKING Circus, but we still encounter the same problem (ValueError: operands could not be broadcast together with shapes (2,) (4,)).

Additionally, SpyKING Circus was not able to detect all of our files - it only sees one (Streams : multi-files (1 found)). We tried: a) changing file names to include an underscore (CSC_2.ncs instead of the original CSC2.ncs), b) changing file names to include an underscore and zero-based indexing (CSC_1.ncs instead of the original CSC2.ncs), and c) using the ncs_pattern configuration option (we tried the regular expression 'CSC_[0-1]+.ncs', which didn't work, and a simpler 'CSC' pattern that also failed, though with a slightly different error). So if anyone has suggestions regarding the multi-file problem, we would appreciate them too.

@KKKenser
Author

KKKenser commented Mar 21, 2024

The second issue @mmagnuski mentioned was solved by commenting out line 126 in the neuralynx.py file and writing instead

all_files = tmp_all_files

The problem was caused by the filter_name_duplicates function, which splits file names on the "_" separator and drops the last part. So if your file is named NAME_number.ncs (in our case CSC_129.ncs), the number is stripped and only the NAME part remains; in effect there appear to be multiple copies of a file named NAME to analyze, and the program keeps only the first one.
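A short, purely illustrative snippet of the collision (not the exact library code):

names = ['CSC_1.ncs', 'CSC_2.ncs', 'CSC_129.ncs']
stems = ['_'.join(n.split('_')[:-1]) for n in names]
print(stems)  # ['CSC', 'CSC', 'CSC'] -> all look like duplicates, only one file is kept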

However, our main problem

ValueError: operands could not be broadcast together with shapes (2,) (4,)

still remains unsolved.

@yger
Member

yger commented Mar 21, 2024

You could try adding remove_mixture = False in the [clustering] section of the parameter file, to skip the mixture removal that seems to crash in your case.
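That is, in the parameter file (the inline comment here is only illustrative):

[clustering]
...
remove_mixture = False      # skip the mixture-removal step that raises the ValueError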

@mmagnuski

Yes, as @KKKenser mentions, we would have to modify the naming of our files to make them work in SpyKING Circus without modifying the code. I think this could be considered a bug, because our original naming was in line with this documentation page. The files are not detected because SpyKING Circus detects duplicates by splitting file names on underscores and dropping the last part - so 'mydata_1.ncs' and 'mydata_2.ncs' both become 'mydata' and are treated as duplicates. We could later create a pull request to fix this issue.
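One possible direction for such a pull request (a purely hypothetical sketch, not the actual library code; the function name only mirrors the real filter_name_duplicates) would be to deduplicate on the full basename instead of dropping everything after the last underscore:

import os

def filter_name_duplicates(all_files):
    # Hypothetical replacement: compare full basenames (minus extension),
    # so mydata_1.ncs and mydata_2.ncs stay distinct.
    seen, kept = set(), []
    for f in all_files:
        stem = os.path.splitext(os.path.basename(f))[0]
        if stem not in seen:
            seen.add(stem)
            kept.append(f)
    return kept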

@yger
Member

yger commented Mar 21, 2024

However, while I know this is not entirely finished, and the publication will only arrive in a couple of months, you could try giving spyking-circus2 a go; it is entirely based on SpikeInterface (https://github.com/SpikeInterface/). I'll write proper documentation soon, but thanks to SpikeInterface the handling of files, preprocessing, etc. is much smoother and easier. You can simply do something like

import spikeinterface.full as si

rec = si.read_neuralynx('path')
sorting = si.run_sorter('spykingcircus2', rec)

@mmagnuski

Hi @yger, thanks for your suggestion! We can try this out.
Do you have any idea how to fix the problem without skipping this computation step? Both in our case and in #391, the two mismatched lengths differ by a factor of two.

Also, thank you for suggesting using SpyKING Circus through SpikeInterface. Indeed, we will give it a go. I actually wanted to start with SC1 because I remember seeing figures in this paper suggesting that SC2 is not as good as SC1 (for example, the number of correctly identified units in Table 2). I don't know how accurate this table is with respect to the current state of SC2, but it was the source of our bias. :)

@yger
Member

yger commented Mar 21, 2024

I would not trust the figures of this preprint regarding the numbers obtained with SC2. It was clearly misused (not necessarily on purpose) and/or taken from a too-early alpha version that was not stabilized. Internal benchmarks now clearly show that SC2 and SC1 should behave alike. But you are welcome to try SC1 first! The best option might be for you to share your data, if you can, so that I can understand what is going on under the hood.

@mmagnuski

Great, thanks for reassuring us!
I think we could share our data with you, I will just double-check and let you know tomorrow.

@KKKenser
Author

We have received approval to share the data we've used for testing the spyking-circus environment. Here's the link to a zip file on Google Drive:

https://drive.google.com/file/d/1kpyRc11xCokAkPcPjyK--KcGcSS3TcHU/view?usp=sharing

We'd be grateful if you could find the time to take a look at our issue.

@mmagnuski

We managed to run the sorting without errors last week, but we are not sure what helped in the end. We will try again on our full dataset and let you know if we still have problems. So the issue can be closed - at least temporarily.
