---
language:
- en
tags:
- code
---
# Representativity-based active learning for regression using Wasserstein distance and GroupSort Neural Networks

This repository contains the code used to test the performance of the WAR model on a fully labeled dataset.

**WAR-notebook**: run the algorithm from this notebook and change the parameters as desired.


### WAR directory

**Experiment_functions.py**: functions used to visualize information about the WAR process (loss, metrics, points queried each round...).

**Models.py**: definition of the two neural networks h and phi.

**dataset_handler.py**: imports and preprocesses datasets.

**full_training_process.py**: main function that runs the full training process.

**training_and_query.py**: function that runs one round (one training and querying pass).


## Abstract
This paper proposes a new active learning strategy called Wasserstein active regression (WAR), based on the principle of distribution matching, to measure the representativeness of the labeled dataset compared to the global data distribution. We use GroupSort Neural Networks to compute the Wasserstein distance and provide theoretical foundations to justify the use of such networks, with explicit bounds for their size and depth. We combine this solution with another diversity- and uncertainty-based approach to sharpen our query strategy. Finally, we compare our method with other solutions and show empirically that we consistently achieve better estimations with less labeled data.
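
For readers unfamiliar with GroupSort networks, the activation they are named after can be sketched in a few lines. This is a minimal, dependency-free illustration and not the code from `Models.py`; the function name `group_sort` and its `group_size` parameter are hypothetical names chosen for this example.

```python
from typing import List, Sequence


def group_sort(values: Sequence[float], group_size: int = 2) -> List[float]:
    """Illustrative GroupSort activation on a single vector of pre-activations.

    The vector is split into consecutive groups of `group_size` entries and
    each group is sorted in ascending order. (Hypothetical helper, not taken
    from this repository.)
    """
    if group_size < 1 or len(values) % group_size != 0:
        raise ValueError("group_size must be >= 1 and divide len(values)")
    out: List[float] = []
    for start in range(0, len(values), group_size):
        # Sorting only reorders the entries of the group; no value changes.
        out.extend(sorted(values[start:start + group_size]))
    return out


if __name__ == "__main__":
    print(group_sort([3.0, 1.0, 2.0, 5.0]))       # groups of 2
    print(group_sort([4.0, 3.0, 2.0, 1.0], 4))    # one group of 4
```

Because the activation only permutes its inputs, it preserves the norm of the vector. This is the property that makes it a natural fit for Lipschitz-constrained networks, which are what the dual formulation of the Wasserstein distance requires.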