AMR prediction with LGBMClassifier models

This repository contains a Python script for predicting antimicrobial resistance (AMR) with LightGBM's LGBMClassifier. The script reads input datasets from a directory, extracts k-mer features, trains and tests the models with cross-validation, and writes the results to text files.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

This script requires the following Python libraries:

pandas
scikit-learn
numpy
tqdm
lightgbm
hyperopt
joblib
bayesian-optimization
scikit-optimize (imported as skopt)

Installing

Clone the repository to your local machine and install the required libraries:

  $ git clone https://github.com/username/repo.git
  $ cd repo
  $ pip install -r requirements.txt

Usage

To use the script, execute the following command:

  $ python main.py

Code Structure

The main script consists of several sections:

1. Import necessary libraries
2. Set seed for reproducibility
3. Define function to get list of models to evaluate
4. Load list of selected samples
5. Call function to get list of models
6. Initialize KFold cross-validation
7. Iterate over values of k to read the corresponding k-mer feature dataset
8. Iterate over the models list
9. Write results to text file
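
The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of main.py: the names SEED, get_models and evaluate, the seed value, the number of folds, and the synthetic data are all assumptions made for the example. LGBMClassifier is added only if lightgbm can be imported, so the sketch still runs with scikit-learn alone.

```python
import random

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

SEED = 42  # hypothetical seed value, chosen for this example
random.seed(SEED)
np.random.seed(SEED)


def get_models():
    """Return the models to evaluate, keyed by a display name (hypothetical helper)."""
    models = {"RandomForest": RandomForestClassifier(n_estimators=50, random_state=SEED)}
    try:
        from lightgbm import LGBMClassifier
        models["LGBM"] = LGBMClassifier(random_state=SEED, verbose=-1)
    except (ImportError, OSError):
        pass  # lightgbm unavailable: evaluate the remaining models only
    return models


def evaluate(X, y, n_splits=5):
    """Return the mean cross-validated accuracy for each model."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=SEED)
    results = {}
    for name, model in get_models().items():
        scores = []
        for train_idx, test_idx in cv.split(X):
            model.fit(X[train_idx], y[train_idx])
            predictions = model.predict(X[test_idx])
            scores.append(accuracy_score(y[test_idx], predictions))
        results[name] = float(np.mean(scores))
    return results


# Synthetic stand-in for one k-mer feature matrix
# (rows: genomes, columns: k-mer counts). Not real AMR data.
rng = np.random.default_rng(SEED)
X_demo = rng.integers(0, 20, size=(120, 16)).astype(float)
y_demo = (X_demo[:, 0] > 9).astype(int)  # label depends on a single feature
demo_results = evaluate(X_demo, y_demo)
```

In the actual script, the outer loop over values of k would load a different feature matrix for each k before calling something like evaluate.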

Data Description

The input datasets are CSV files containing bacterial genomic sequences and their corresponding resistance profiles for selected antibiotics. The script reads these files from a directory and applies k-mer feature extraction techniques to obtain numerical feature vectors.
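
As an illustration of k-mer feature extraction, the sketch below counts every overlapping substring of length k in a DNA sequence and maps the counts onto a fixed vocabulary of all 4^k possible k-mers. The function names kmer_counts and kmer_vector are hypothetical, and skipping windows that contain characters other than A, C, G, T is an assumption; the script's own extraction may differ.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_vector(sequence, k):
    """Return one count per possible k-mer, in lexicographic order."""
    counts = kmer_counts(sequence, k)
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[kmer] for kmer in vocabulary]


# The final window "GN" is skipped because N is not one of A, C, G, T.
vector = kmer_vector("ACGTACGN", k=2)
```

Because the vocabulary grows as 4^k, a dense list like this is only practical for small k; larger values usually call for a sparse representation.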

Models

The script uses two models for AMR prediction: Random Forest and LGBMClassifier.

Output

The script outputs the results of each model to a text file in the specified output directory. The results include accuracy, precision, recall, F1 score, and area under the ROC curve.
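
A minimal sketch of how these five metrics could be computed with scikit-learn and appended to a text file is shown below. It assumes a binary resistant/susceptible label; the helper names compute_metrics and write_results, the file name, the output layout, and the toy labels are made up for the example and do not reflect the script's real output format.

```python
import os
import tempfile

from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


def compute_metrics(y_true, y_pred, y_score):
    """Return the five reported metrics for a binary classification task."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        # ROC AUC needs scores or probabilities, not hard class labels.
        "roc_auc": roc_auc_score(y_true, y_score),
    }


def write_results(path, model_name, metrics):
    """Append one block of results for a model to a text file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{model_name}\n")
        for name, value in metrics.items():
            handle.write(f"{name}: {value:.4f}\n")
        handle.write("\n")


# Toy labels and scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6]
metrics = compute_metrics(y_true, y_pred, y_score)

out_path = os.path.join(tempfile.mkdtemp(), "results_demo.txt")
write_results(out_path, "demo_model", metrics)
```

With a classifier such as LGBMClassifier, y_score would typically come from predict_proba(X)[:, 1], while y_pred comes from predict(X).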

Authors

Gabriel Sousa - gabrieltxs

License

This project is licensed under the MIT License - see the LICENSE.md file for details.