# Model Card for Product Return Prediction

## model details

- **person or organization developing model**: team product-return-prediction
- **model date**: 24/11/2024
- **model version**: v1.4
- **model type**: Support Vector Machine

<!-- algorithm description -->
This model is a **Support Vector Machine** classifier that predicts whether a product will be returned, based on product and transaction features. The hyperparameters (C, kernel type, and gamma) are selected with a grid search using 10-fold cross-validation.
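
As a rough illustration of the tuning procedure described above, the following sketch runs a grid search over C, kernel, and gamma with 10-fold cross-validation. The synthetic data, the candidate values in `param_grid`, the stratified folds, and the F1 scoring metric are all assumptions made for the example; the model card does not state them.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the real feature matrix and return labels (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.7, 0.3], random_state=42
)

# Hypothetical search space: the card names the tuned hyperparameters
# (C, kernel, gamma) but not the candidate values.
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", 0.01],
}

# 10-fold cross-validation; stratification and F1 scoring are assumptions.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
search = GridSearchCV(SVC(), param_grid, cv=cv, scoring="f1")
search.fit(X, y)

best_params = search.best_params_
print(best_params, round(search.best_score_, 3))
```

`search.best_estimator_` then holds an SVM refitted on all the data passed to `fit` with the selected hyperparameters.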

## intended use

### primary intended uses

<!-- description of the model's use -->
The model is intended to help e-commerce owners (here, Armani) identify likely returns among their purchases, so that inventories can be reorganized to optimize product handling and transportation costs.

### primary intended users

<!-- description of the users -->
The model was developed for Armani. Specifically, it is meant to support professionals working in logistics, product management, and marketing.

<!-- ### out-of scope use cases -->

## factors

### relevant factors

<!-- factors to consider -->
The following factors are relevant to the model's behavior:

- **product features**: characteristics like model, fabric, colour, composition, and product category may have a significant impact on the likelihood of a product being returned
- **imbalanced classes**: the class imbalance is a relevant factor that may affect the model's ability to predict the minority class (returns) accurately

### decision thresholds

<!-- description of selected thresholds -->
The SVM model uses a default decision threshold of 0.5: probabilities greater than or equal to 0.5 yield a "returned" prediction, and probabilities below 0.5 yield a "not returned" prediction.
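
The thresholding rule can be sketched as follows. The helper `apply_threshold` and the example probabilities are hypothetical. Note also that scikit-learn's `SVC` only exposes `predict_proba` when it is constructed with `probability=True`; whether this model was configured that way is not stated in the card, so the sketch starts from already-computed probabilities.

```python
import numpy as np


def apply_threshold(probabilities, threshold=0.5):
    """Return 1 ("returned") where probability >= threshold, else 0 ("not returned")."""
    return (np.asarray(probabilities) >= threshold).astype(int)


# Hypothetical probabilities of the "returned" class for four purchases.
example_probs = [0.12, 0.50, 0.49, 0.91]
labels = apply_threshold(example_probs)
print(labels.tolist())  # [0, 1, 0, 1]
```

Keeping the threshold as a parameter makes it possible to trade precision against recall on the minority class without retraining.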

## Train and Test data

### dataset description

- **dataset**: *German Sales 2023 EA*

The model was trained and tested on this dataset, following the splitting and pre-processing steps described below.

### split

Dataset splitting is as follows:
- **training**: 80%
- **validation and test**: 20%

The split is performed with the corresponding scikit-learn function, using a random state of 42.
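
A minimal sketch of such a split is shown here, assuming the function in question is `train_test_split`. The synthetic data and the `stratify=y` argument are assumptions added for illustration; only the 80/20 proportion and the random state come from the card.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption).
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.7, 0.3], random_state=0
)

# 80% training, 20% held out, with the random state reported in the card.
# stratify=y keeps the class ratio similar in both parts (assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (800, 8) (200, 8)
```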

### pre-processing

To suit the binary classification task and a numerical model such as an SVM, the dataset underwent an extensive pre-processing phase. The pre-processing steps are the following:

1. Dataset conversion from Excel to TSV
2. Specific columns removal from dataframe
3. Train and test data splitting
4. Train and save scaler
5. Scaling data with a pre-trained scaler
6. Target encoding of categorical columns
7. Preparation of inventory with sales data
8. Imputation of missing values
9. Calculation and application of return percentages by color
10. Final cleaning and processing
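
Two of the steps listed above (fitting, saving, and reusing a scaler, and computing return percentages by color) can be sketched as follows. This is an illustration under stated assumptions, not the team's pipeline: the column names, the toy data, the choice of `StandardScaler`, and the use of `joblib` for persistence are all hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy training and test frames; column names are hypothetical.
train = pd.DataFrame({
    "price": [100.0, 200.0, 300.0, 400.0],
    "color": ["black", "black", "red", "red"],
    "returned": [1, 0, 0, 0],
})
test = pd.DataFrame({"price": [250.0], "color": ["black"]})

# Fit the scaler on training data only, save it, then reload and reuse it.
scaler = StandardScaler().fit(train[["price"]])
scaler_path = os.path.join(tempfile.mkdtemp(), "scaler.joblib")
joblib.dump(scaler, scaler_path)
loaded_scaler = joblib.load(scaler_path)
test_price_scaled = loaded_scaler.transform(test[["price"]])

# Return percentage per color, computed on training data and mapped onto test rows.
return_pct_by_color = train.groupby("color")["returned"].mean() * 100
test_color_pct = test["color"].map(return_pct_by_color)

print(test_price_scaled[0][0], test_color_pct.iloc[0])
```

Computing both the scaler statistics and the per-color percentages on the training portion only avoids leaking information from the test set into the features.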

## Quantitative analysis

|           | Precision | Recall    | F1-score  | Support   |
|-----------|-----------|-----------|-----------|-----------|
| No return | 0.95      | 0.95      | 0.95      | 2086      |
| Return    | 0.89      | 0.90      | 0.89      | 960       |
| Accuracy  |           |           | 0.93      | 3046      |
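
A table of this shape can be produced with scikit-learn's `classification_report`. The sketch below uses toy labels, not the model's actual predictions, so its numbers differ from the results reported above.

```python
from sklearn.metrics import accuracy_score, classification_report

# Toy labels: 0 = "No return", 1 = "Return". Not the model's real predictions.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

class_names = ["No return", "Return"]

# Dictionary form for programmatic access, text form for display.
report = classification_report(
    y_true, y_pred, target_names=class_names, output_dict=True
)
accuracy = accuracy_score(y_true, y_pred)

print(classification_report(y_true, y_pred, target_names=class_names))
```

In the text report, the accuracy row shows its value in the F1-score column and the total number of samples in the support column.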


<!-- ### unitary results -->

<!-- ### intersectional results -->