Igor Santana
fixing bugs
e6f1d8f
---
language:
- en
thumbnail: "https://img.memegenerator.net/instances/38101830.jpg"
tags:
- Recommendation System
- Context-Aware Recommendation System
- Music Embeddings
- Song Embeddings
- Recurrent Neural Network
- word2vec
- doc2vec
license: "unlicense"
datasets:
- music4all
metrics:
- precision
- recall
- hitrate
- F1
- MAP
---
# RNN Embeddings
## Jointly learning music embeddings with Recurrent Neural Networks
This repository contains all the code that I did during my masters @ State University of Maringá. I do not intend to add new features to this project, as I will not continue this project in a PhD. To better understand what is the goal of this project, this quote is from my thesis and summarizes what I did:
> This work's goal is to use Recurrent Neural Networks to acquire contextual information for each song, given the sequence of songs that each user has listened to using embeddings.
If you have any doubts about the code, or want to use it in your project, let me know! I will be glad to help you in anything you need.
### Installation and Setup
As this code was written in Python, I highly recommend you to use [conda](https://docs.conda.io/en/latest/) to install all the dependencies that you'll need to run it. I have provided the [environment file](environment.yml) that I ended up with, and to create the repository using this file, you should run the following command (assuming you already have conda):
```
conda env create -f environment.yml
```
It is important to know that I used Tensorflow 1.14.0, Cuda 9.2 and Python 3.6.9 to run the experiments. If you cannot run with the environment file that I have provided, perhaps its because one of those versions.
### Directory Structure and General Instructions
```
.
|-- analysis
|-- configs
|-- dataset
| |-- dataset #1
| |-- dataset #2
| `-- ...
|-- outputs
|-- project
| |-- data
| |-- evaluation
| |-- models
| `-- recsys
|-- tmp
```
This project follows this directory structure in order to work. The main python files are in the **project** folder, and any change that you'll want to do in the code must be done in the files in this folder. The **outputs** folder will contain the output file for the models that you built.
The **dataset** contains all the datasets that you'll use in the project, and for each dataset, you should create a separate folder for it inside the **dataset** folder. The project will then look for a `listening_history.csv` file inside of this folder to run it. This file **must be** comma-separated.
A temporary folder, **tmp**, will be created while the project works. For each dataset that you'll run this project with, a folder inside the **tmp** folder will be created. There you can find the cross-validation folds, the models that you built and the individual recommendations for each user, as well as some auxiliary matrixes used in the UserKNN algorithm.
I have also included an **analysis** folder that I used to create some graphs with the results. You just have to point to the `main.py` file in the analysis folder where are the results, and it will show an graphical comparison between the models with all the metrics.
The project will only work if you provide a configuration file to it. In my case, I stored my configuration files in the **configs** folder, but feel free to delete the folder if you don' want it. The configuration file contains the parameters for the models, and I don't recommend deleting any parameter even if you are not going to use it. I've included a [sample configuration](configs/config_sample.yml) file that you can use as guideline for your project.
To run the project, you have to pass the config to the `main.py` as a parameter.
```
$ python main.py --config=configs/config_sample.yml
```
###### DISCLAIMER:
The `model` and `bi` parameters in the `models/rnn` configuration object are not working, as I hardcoded it in my project. If you want to change the layer (to a GRU or a Simple RNN), you should do it [directly in the code](project/models/rnn.py#L147).
### What is included in this project?
To better understand the project, I highly recommend you to go check the work that I used as a baseline for my model:
- [link](https://doi.org/10.1007/s10791-017-9317-7) - Wang, D., Deng, S. & Xu, G. Sequence-based context-aware music recommendation. Information Retrieval Journal (2018)
Their work, _music2vec_, is one of the baselines for my RNN model. The following embeddings are implemented in this project:
- music2vec
- doc2vec - [link](https://cs.stanford.edu/~quocle/paragraph_vector.pdf)
- GloVe - [link](https://nlp.stanford.edu/projects/glove/)
To evaluate these embeddings models, the CARS that are implemented are the ones that were proposed by Wang et. al (M-TN, SM-TN, CSM-TN, CSM-UK). Besides the metrics that were used in the paper, I have included MAP, NDCG@5 and Precision@5 as well. The cutoff of these metrics is not configurable, sorry.
---
If you have any doubts about this project, feel free to contact me!