AnoushkaJain3/curation_machine_learning_models

To reduce the effort of manual curation, we developed a machine learning approach for Neuropixels recordings that uses quality metrics to automatically identify noise clusters and isolate single-cell activity. The method is compatible with the SpikeInterface API and generalizes across various probes and species.

We trained a machine learning model on Neuropixels recordings from 11 mice, covering V1, SC, and ALM. Each recording was labeled by at least two people, with different combinations of labelers across recordings. Agreement among labelers is 80%.
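
The text above does not say how labeler agreement was measured. As a minimal sketch, assuming it is the mean pairwise percent agreement over units, it could be computed as follows. The function name, curator names, and labels are made up for illustration and are not part of this repository.

```python
from itertools import combinations


def pairwise_agreement(labels_by_curator):
    """Mean fraction of units on which each pair of curators gives the same label."""
    if len(labels_by_curator) < 2:
        raise ValueError("need at least two curators to measure agreement")
    pair_scores = []
    for labels_a, labels_b in combinations(labels_by_curator.values(), 2):
        if len(labels_a) != len(labels_b):
            raise ValueError("all curators must label the same units")
        matches = sum(a == b for a, b in zip(labels_a, labels_b))
        pair_scores.append(matches / len(labels_a))
    return sum(pair_scores) / len(pair_scores)


# Made-up labels for four units from two hypothetical curators.
example = {
    "curator_a": ["sua", "mua", "noise", "sua"],
    "curator_b": ["sua", "mua", "noise", "mua"],
}
agreement = pairwise_agreement(example)
print(f"{agreement:.0%}")
```

Percent agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa is a common alternative when label frequencies are unbalanced.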

There are two tutorial notebooks:

Model_based_curation_tutorial.ipynb

This notebook helps you apply pre-trained models to new recordings. Simply load the models and use them to label your spike-sorted data.

We provide "noise_neuron_model.skops", which identifies noise, and "sua_mua_model.skops", which isolates SUA. Use these models to predict labels (SUA, MUA, and noise) on mouse data recorded with Neuropixels.
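
How the two models' outputs are merged into the three final labels is not spelled out here. One plausible reading is a two-stage cascade: a unit flagged as noise stays noise, and the remaining units are split into SUA and MUA. The sketch below illustrates that assumption only; `combine_predictions` and its boolean inputs are hypothetical and are not the SpikeInterface API.

```python
def combine_predictions(is_noise, is_sua):
    """Merge two per-unit binary predictions into 'noise', 'sua', or 'mua'.

    is_noise: per-unit flags from a noise-vs-neuron classifier (assumed input).
    is_sua:   per-unit flags from a SUA-vs-MUA classifier (assumed input).
    """
    if len(is_noise) != len(is_sua):
        raise ValueError("both predictions must cover the same units")
    final_labels = []
    for noise_flag, sua_flag in zip(is_noise, is_sua):
        if noise_flag:
            # The noise decision takes priority over the SUA/MUA decision.
            final_labels.append("noise")
        elif sua_flag:
            final_labels.append("sua")
        else:
            final_labels.append("mua")
    return final_labels


# Three hypothetical units: one noise, one SUA, one MUA.
print(combine_predictions([True, False, False], [False, True, False]))
```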

Steps:
1. Load your recording to create the 'recording' object; the loader depends on the acquisition software you used.
2. Load your sorting to create the 'sorting' object; the loader depends on the spike sorter you used.
3. Create a SortingAnalyzer object from the recording and sorting, then compute quality metrics.

These steps are explained in more detail in the Jupyter notebook in the files folder.
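
As a rough sketch of step 3, assuming a recent SpikeInterface release that provides `create_sorting_analyzer`, the analyzer and its metrics could be prepared as below. The helper name `build_analyzer_with_metrics` and the exact list of extensions are assumptions for illustration; the notebook remains the authoritative reference for which metrics the models expect.

```python
# Extensions to compute; this list is an assumption, check the notebook
# for the metrics the pre-trained models actually require.
EXTENSIONS = [
    "noise_levels",
    "random_spikes",
    "waveforms",
    "templates",
    "spike_locations",
    "spike_amplitudes",
    "correlograms",
    "principal_components",
    "quality_metrics",
    "template_metrics",
]


def build_analyzer_with_metrics(recording, sorting):
    """Create an in-memory SortingAnalyzer and compute the extensions above."""
    # Imported lazily so the sketch can be loaded without SpikeInterface installed.
    import spikeinterface.full as si

    analyzer = si.create_sorting_analyzer(sorting=sorting, recording=recording)
    analyzer.compute(EXTENSIONS)
    return analyzer
```

The `recording` and `sorting` arguments are the objects from steps 1 and 2. The returned analyzer can then be passed to auto_label_units.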

auto_label_units is the main function in this notebook.

API reference for the parameters: https://spikeinterface--2918.org.readthedocs.build/en/2918/api.html#spikeinterface.curation.auto_label_units

# Example use of the auto_label_units function
from spikeinterface.curation import auto_label_units

labels = auto_label_units(
    sorting_analyzer=sorting_analyzer,
    model_folder="SpikeInterface/a_folder_for_a_model",
    trusted=["numpy.dtype"],
)

Train_new_model.ipynb

If you have your own manually curated data (e.g., from other species), this notebook lets you train a new model on it. Follow the same three steps as above, and additionally provide your manually curated labels.

train_model is the main function in this notebook.

API reference for the parameters: https://spikeinterface--2918.org.readthedocs.build/en/2918/api.html#spikeinterface.curation.train_model

# Example use of the train_model function
from spikeinterface.curation.train_manual_curation import train_model

# classifiers=None defaults to Random Forest only. Other classifiers you can try:
# ["AdaBoostClassifier", "GradientBoostingClassifier", "LogisticRegression",
#  "MLPClassifier", "XGBoost", "LightGBM", "CatBoost"]
trainer = train_model(
    mode="analyzers",
    labels=labels,
    analyzers=[labelled_analyzer, labelled_analyzer],
    output_folder=str(output_folder),
    imputation_strategies=None,
    scaling_techniques=None,
    classifiers=None,
)

Acknowledgments:

I would like to thank the people who helped a lot with this project:

For code refactoring and help with the integration into SpikeInterface: Chris Halcrow, Jake Swann, Robyn Greene, Sangeetha Nandakumar (ibots)
Curators: Nilufar Lahiji, Sacha Abou Rachid, Severin Graff, Luca Koenig, Natalia Babushkina, Simon Musall
Advisors: Alessio Buccino, Matthias Hennig, and Simon Musall

Also, all my amazing lab members: https://brainstatelab.wordpress.com/