Sébastien De Greef committed on
Commit 9735e61 • 1 Parent(s): 6a45255
feat: Add "Perplexity in AI Models" to theory section
- src/_quarto.yml +21 -8
- src/theory/dying_neurons.qmd +84 -0
- src/theory/overfitting.qmd +94 -0
- src/theory/perplexity_in_ai.qmd +27 -0
src/_quarto.yml
CHANGED
@@ -29,6 +29,13 @@ website:
|
|
29 |
- href: about.qmd
|
30 |
text: About this Cookbook
|
31 |
32 |
- section: "Theory"
|
33 |
contents:
|
34 |
- href: theory/activations.qmd
|
@@ -52,19 +59,27 @@ website:
|
|
52 |
text: "Optimizers"
|
53 |
- href: theory/quantization.qmd
|
54 |
text: "Quantization"
|
|
|
|
|
55 |
- href: theory/regularization.qmd
|
56 |
text: "Regularization"
|
57 |
-
-
|
58 |
-
59 |
- href: theory/mixture_of_models.qmd
|
60 |
text: "Mixture of Models"
|
61 |
-
- href: theory/dont_mess_with_kittens.qmd
|
62 |
-
text: "High Stakes, and... Kittens"
|
63 |
|
64 |
- section: "Large Language Models"
|
65 |
contents:
|
66 |
-
- href: theory/good_enough.qmd
|
67 |
-
text: "What is Good Enough ?"
|
68 |
- href: llms/tasks.qmd
|
69 |
text: "Tasks"
|
70 |
- href: llms/tokenizers.qmd
|
@@ -81,8 +96,6 @@ website:
|
|
81 |
- href: llms/rag_systems.qmd
|
82 |
text: "Retrival Augmented Generation"
|
83 |
|
84 |
-
|
85 |
-
|
86 |
- section: "Computer Vision Models"
|
87 |
contents:
|
88 |
- href: vision/tasks.qmd
|
|
|
29 |
- href: about.qmd
|
30 |
text: About this Cookbook
|
31 |
|
32 |
+
- section: "Thoughts"
|
33 |
+
contents:
|
34 |
+
- href: theory/dont_mess_with_kittens.qmd
|
35 |
+
text: "High Stakes, and... Kittens"
|
36 |
+
- href: theory/good_enough.qmd
|
37 |
+
text: "What is Good Enough ?"
|
38 |
+
|
39 |
- section: "Theory"
|
40 |
contents:
|
41 |
- href: theory/activations.qmd
|
|
|
59 |
text: "Optimizers"
|
60 |
- href: theory/quantization.qmd
|
61 |
text: "Quantization"
|
62 |
+
- href: theory/perplexity_in_ai.qmd
|
63 |
+
text: "Perplexity and Quantization"
|
64 |
- href: theory/regularization.qmd
|
65 |
text: "Regularization"
|
66 |
+
- section: "Training"
|
67 |
+
href: theory/training.qmd
|
68 |
+
contents:
|
69 |
+
- href: theory/training.qmd
|
70 |
+
text: "Training"
|
71 |
+
- href: theory/dying_neurons.qmd
|
72 |
+
text: "Dying Neurons"
|
73 |
+
- href: theory/overfitting.qmd
|
74 |
+
text: "Overfitting"
|
75 |
+
|
76 |
+
- href: theory/perplexity_in_ai.qmd
|
77 |
+
text: "Perplexity and Quantization"
|
78 |
- href: theory/mixture_of_models.qmd
|
79 |
text: "Mixture of Models"
|
|
|
|
|
80 |
|
81 |
- section: "Large Language Models"
|
82 |
contents:
|
|
|
|
|
83 |
- href: llms/tasks.qmd
|
84 |
text: "Tasks"
|
85 |
- href: llms/tokenizers.qmd
|
|
|
96 |
- href: llms/rag_systems.qmd
|
97 |
text: "Retrival Augmented Generation"
|
98 |
|
|
|
|
|
99 |
- section: "Computer Vision Models"
|
100 |
contents:
|
101 |
- href: vision/tasks.qmd
|
src/theory/dying_neurons.qmd
ADDED
@@ -0,0 +1,84 @@
1 |
+
# Detecting Dying Neurons in Artificial Intelligence Models
|
2 |
+
|
3 |
+
## Introduction
|
4 |
+
|
5 |
+
In the field of artificial intelligence (AI), particularly deep learning, "dying" neurons refer to a phenomenon where certain neurons within an AI model stop contributing to the network's output. This can occur due to various reasons such as improper initialization or extreme weight updates during training. Detecting and addressing dying neurons is crucial for maintaining optimal performance in deep learning models.
|
6 |
+
|
7 |
+
## Understanding Dying Neurons
|
8 |
+
|
9 |
+
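During training, a model learns by adjusting its weights in response to the gradient of the loss. A neuron is considered "dead" or "dying" when its output stays at or near zero for essentially every input, or when its gradient becomes negligible. With ReLU activations the two go together: a unit whose pre-activation is negative for all inputs outputs zero and receives zero gradient, so its weights stop updating. Layers with many such units lose capacity, which can reduce model accuracy.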
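As a rough sketch of that definition, the NumPy snippet below flags a unit as dead when its activation is at or near zero for every sample in a batch. The helper name `dead_unit_mask`, the `eps` threshold, and the synthetic pre-activations are assumptions made for illustration only.

```python
import numpy as np

def dead_unit_mask(activations, eps=1e-6):
    """Boolean mask over units; True where |activation| <= eps for every sample.

    activations has shape (n_samples, n_units).
    """
    return np.all(np.abs(activations) <= eps, axis=0)

rng = np.random.default_rng(0)
pre_activations = rng.normal(size=(128, 6))  # synthetic pre-activations, 6 hidden units
pre_activations[:, 2] = -5.0                 # unit 2 is negative for every sample
pre_activations[:, 4] = -1.0                 # so is unit 4
relu_out = np.maximum(pre_activations, 0.0)  # ReLU

mask = dead_unit_mask(relu_out)
print("dead units:", np.flatnonzero(mask))
print("dead fraction:", mask.mean())
```

A per-unit check like this is stricter than a layer-wide mean, which can look healthy even when a sizeable share of units never fire.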
|
10 |
+
|
11 |
+
## Detecting Dying Neurons in AI Models
|
12 |
+
|
13 |
+
Detecting dying neurons involves analyzing the activations of each layer within a deep learning model during training. By monitoring these activations, we can identify if any neuron is not contributing to the network's output and take appropriate measures to address it.
|
14 |
+
|
15 |
+
### Step 1: Set Up Your Environment
|
16 |
+
|
17 |
+
Firstly, ensure that you have installed all necessary libraries for your project. For this example, we will use TensorFlow as our deep learning framework. Install TensorFlow using pip:
|
18 |
+
|
19 |
+
```bash
|
20 |
+
pip install tensorflow
|
21 |
+
```
|
22 |
+
|
23 |
+
Next, import the required modules in your Python script:
|
24 |
+
|
25 |
+
```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
```
|
30 |
+
|
31 |
+
### Step 2: Create a Sample Model
|
32 |
+
|
33 |
+
For demonstration purposes, let's create a simple feedforward neural network with one hidden layer and an output layer using the Keras API in TensorFlow:
|
34 |
+
|
35 |
+
```python
model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),  # Hidden Layer
    Dense(10, activation='softmax')                    # Output Layer
])
```
|
41 |
+
|
42 |
+
### Step 3: Monitoring Neuron Activations During Training
|
43 |
+
|
44 |
+
To detect dying neurons during training, we need to monitor the activations of each layer. We can achieve this with a custom Keras callback that logs the mean absolute activation of each layer after every epoch. A callback has no direct access to the training batches, so it is given a small fixed batch of inputs to probe the layers with:

```python
class DeadNeuronDetector(tf.keras.callbacks.Callback):
    def __init__(self, sample_inputs):
        super().__init__()
        self.sample_inputs = sample_inputs  # Fixed batch used to probe activations

    def on_epoch_end(self, epoch, logs=None):
        print("\nEpoch {} --------------------------------------".format(epoch))

        for layer in self.model.layers:
            if 'activation' in layer.get_config():
                # Sub-model that maps the network inputs to this layer's output
                probe = tf.keras.Model(inputs=self.model.inputs, outputs=layer.output)
                activations = probe(self.sample_inputs, training=False)

                mean_activation = tf.reduce_mean(tf.abs(activations))
                print("Mean activation for {} is: {:.4f}".format(layer.name, mean_activation.numpy()))
```

A mean activation that stays at or near zero across epochs suggests that most units in that layer have stopped firing.

### Step 4: Train the Model and Detect Dying Neurons

Now that we have our custom callback set up, let's train our model on a sample dataset (e.g., MNIST) while monitoring neuron activations:

```python
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Flatten the 28x28 images to 784-dimensional vectors to match the model's input shape
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
dead_neuron_detector = DeadNeuronDetector(x_train[:256])  # Probe with the first 256 training samples
history = model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test), callbacks=[dead_neuron_detector])
```
|
71 |
+
|
72 |
+
### Step 5: Addressing Dying Neurons
|
73 |
+
|
74 |
+
If you detect dying neurons during training, consider the following approaches to address this issue:
|
75 |
+
|
76 |
+
1. **Weight Initialization**: For ReLU layers, use He initialization instead of the Keras default, Glorot (also known as Xavier) initialization, or a plain random normal.
|
77 |
+
2. **Learning Rate Adjustment**: Try using a smaller learning rate or implementing adaptive learning rates (e.g., Adam, AdaGrad) to prevent extreme updates that may cause neurons to die.
|
78 |
+
3. **Regularization Techniques**: Apply regularization techniques like dropout or L1/L2 regularization to encourage the model to learn more robust features and reduce overfitting.
|
79 |
+
4. **Batch Normalization**: Incorporate batch normalization layers in your network architecture, which can help maintain stable activations throughout training.
|
80 |
+
5. **Revive Dead Neurons**: If a neuron dies during training, you may try to revive it by reinitializing its weights and continuing the training process.
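The last item can be sketched with plain NumPy arrays. The function below re-draws the incoming weights of units flagged as dead using He-normal initialization (standard deviation `sqrt(2 / fan_in)`) and resets their biases to zero. The function name `reinit_dead_units`, the array shapes, and the hand-written `dead_mask` are hypothetical and chosen only for illustration; this is one possible approach, not a prescribed recipe.

```python
import numpy as np

def reinit_dead_units(weights, bias, dead_mask, rng):
    """Return copies of (weights, bias) with the dead units re-drawn.

    weights: (fan_in, n_units), bias: (n_units,), dead_mask: boolean (n_units,).
    """
    fan_in = weights.shape[0]
    new_weights, new_bias = weights.copy(), bias.copy()
    n_dead = int(dead_mask.sum())
    # He-normal initialization: zero mean, standard deviation sqrt(2 / fan_in)
    new_weights[:, dead_mask] = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, n_dead))
    new_bias[dead_mask] = 0.0
    return new_weights, new_bias

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 4))                 # hypothetical layer: 8 inputs, 4 units
bias = np.full(4, -10.0)                          # biases that drifted strongly negative
dead_mask = np.array([False, True, False, True])  # e.g. the result of an activation check

new_weights, new_bias = reinit_dead_units(weights, bias, dead_mask, rng)
print(new_bias)
```

With Keras, the arrays could be read with `layer.get_weights()` and written back with `layer.set_weights(...)`. The optimizer's internal state for those weights is not reset by this, which may or may not matter in practice.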
|
81 |
+
|
82 |
+
## Conclusion
|
83 |
+
|
84 |
+
Detecting dying neurons in AI models is essential for maintaining optimal performance and accuracy. By monitoring layer activations during training using custom callbacks, we can identify dead or dying neurons and take appropriate measures to address them. Implementing techniques such as weight initialization adjustments, learning rate tuning, regularization methods, batch normalization, and reviving dead neurons can help mitigate this issue in deep learning models.
|
src/theory/overfitting.qmd
ADDED
@@ -0,0 +1,94 @@
1 |
+
# Understanding Overfitting in Machine Learning Models
|
2 |
+
|
3 |
+
## Introduction
|
4 |
+
|
5 |
+
Overfitting occurs when a machine learning model learns to perform well on its training data but fails to generalize and make accurate predictions on new, unseen data. This leads to poor performance in real-world scenarios. In this article, we will discuss overfitting, show how to detect it using training metrics, and provide code examples with plots that illustrate the concept.
|
6 |
+
|
7 |
+
## Detecting Overfitting Using Training Metrics
|
8 |
+
|
9 |
+
To identify if a machine learning model is suffering from overfitting, you can monitor its performance on both the training set and validation set during the training process. The key indicators of overfitting are:
|
10 |
+
|
11 |
+
1. High accuracy or low error rate on the training data but poor performance on the validation data.
|
12 |
+
2. A large gap between the model's performance metrics (e.g., accuracy, precision, recall) for the training and validation sets.
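The two indicators above can be turned into a simple check. The sketch below computes the per-epoch gap between training and validation scores and reports the first epoch where it exceeds a threshold. The function names, the `max_gap=0.1` threshold, and the score lists are illustrative assumptions; a sensible threshold depends on the task and metric.

```python
def generalization_gap(train_scores, val_scores):
    """Per-epoch difference between training and validation scores."""
    return [t - v for t, v in zip(train_scores, val_scores)]

def first_overfit_epoch(train_scores, val_scores, max_gap=0.1):
    """Index of the first epoch whose gap exceeds max_gap, or None if none does."""
    for epoch, gap in enumerate(generalization_gap(train_scores, val_scores)):
        if gap > max_gap:
            return epoch
    return None

# Made-up accuracies for four epochs, for illustration only
train_acc = [0.70, 0.80, 0.90, 0.96]
val_acc = [0.68, 0.76, 0.78, 0.75]

print(first_overfit_epoch(train_acc, val_acc))
```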
|
13 |
+
|
14 |
+
### Code Example
|
15 |
+
|
16 |
+
Here is a Python code example using scikit-learn that trains a logistic regression classifier and compares its accuracy on the training and validation sets:

```{python}
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Generate synthetic data for demonstration purposes
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# Train a logistic regression classifier
clf = LogisticRegression(max_iter=100).fit(X_train, y_train)

# Evaluate the model on training and validation sets
y_pred_train = clf.predict(X_train)
y_pred_val = clf.predict(X_val)

# A training accuracy well above the validation accuracy points to overfitting
print("Training accuracy:", accuracy_score(y_train, y_pred_train))
print("Validation accuracy:", accuracy_score(y_val, y_pred_val))
```
|
41 |
+
|
42 |
+
## Visualizing Overfitting with Plots
|
43 |
+
|
44 |
+
To better understand overfitting and its impact on model performance, we can visualize the training metrics using plots. Here are two examples of code blocks that generate plots for illustrating overfitting:
|
45 |
+
|
46 |
+
### Plot 1: Training vs Validation Accuracy
|
47 |
+
|
48 |
+
```{python}
import matplotlib.pyplot as plt

# Illustrative values only: training accuracy keeps rising while validation accuracy falls
train_accuracies = [0.95, 0.96, 0.965, 0.97]  # Example training accuracies for different epochs
val_accuracies = [0.75, 0.72, 0.71, 0.70]     # Corresponding validation accuracies

plt.plot(train_accuracies, label="Training Accuracy")
plt.plot(val_accuracies, label="Validation Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Overfitting: Training vs Validation Accuracy")
plt.legend()
plt.show()
```
|
62 |
+
|
63 |
+
### Plot 2: Learning Curves for Overfitting Detection
|
64 |
+
|
65 |
+
Learning curves are a powerful tool to visualize the relationship between training and validation performance as more data is used during model training. Here's an example of generating learning curves using scikit-learn:
|
66 |
+
|
67 |
+
```{python}
from sklearn.model_selection import learning_curve
import matplotlib.pyplot as plt

# clf, X, y and np are defined in the first code cell
train_sizes, train_scores, val_scores = learning_curve(clf, X, y, cv=5)

# Calculate mean and standard deviation of training set scores
train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)

# Calculate mean and standard deviation of validation set scores
val_mean = np.mean(val_scores, axis=1)
val_std = np.std(val_scores, axis=1)

# Shaded bands show one standard deviation around each mean curve
plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, alpha=0.1, color="r")
plt.plot(train_sizes, train_mean, label="Training Score", color="r")
plt.fill_between(train_sizes, val_mean - val_std, val_mean + val_std, alpha=0.1, color="g")
plt.plot(train_sizes, val_mean, label="Cross-validation Score", color="g")
plt.xlabel("Training examples used")
plt.ylabel("Score")
plt.title("Learning Curves for Overfitting Detection")
plt.legend()
plt.show()
```
|
91 |
+
|
92 |
+
## Conclusion
|
93 |
+
|
94 |
+
Overfitting is a common challenge in machine learning, and it can lead to poor model performance on unseen data. By monitoring training metrics such as accuracy or error rates and visualizing the results using plots like training vs validation accuracy graphs and learning curves, you can detect overfitting early during the model development process. This allows for timely interventions, such as regularization techniques or adjusting hyperparameters to improve your model's generalization capabilities.
|
src/theory/perplexity_in_ai.qmd
ADDED
@@ -0,0 +1,27 @@
1 |
+
# Perplexity in AI Models
|
2 |
+
|
3 |
+
Quantization, as detailed in the [Quantization](https://sebdg-ai-cookbook.hf.space/theory/quantization.html) page, reduces the memory footprint of neural networks by using lower-precision formats. This technique is vital for deploying models on devices with limited computational power.
|
4 |
+
|
5 |
+
## Introducing the Perplexity Metric
|
6 |
+
|
7 |
+
Perplexity is a key metric used to evaluate language models, measuring their effectiveness in predicting the next word in a sequence. It essentially indicates the model's uncertainty; a lower perplexity means better predictive performance.
|
8 |
+
|
9 |
+
## What is Perplexity?
|
10 |
+
|
11 |
+
Perplexity is defined as the exponential of the average negative log-likelihood (the cross-entropy) that the model assigns to a sequence. For language models, it is computed as:
|
12 |
+
|
13 |
+
$$
\text{Perplexity}(P) = \exp \left( -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \right)
$$

Here, $N$ is the number of words in the sequence, $w_i$ represents the $i$-th word, and $P(w_i \mid w_1, w_2, \ldots, w_{i-1})$ is the conditional probability of the $i$-th word given the previous words.
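The formula can be checked with a few lines of Python. The sketch below takes the per-word conditional probabilities directly as input; the function name `perplexity` and the probability values are made up for illustration, and a real evaluation would obtain these probabilities from a model on held-out text.

```python
import math

def perplexity(token_probs):
    """Perplexity from the conditional probabilities P(w_i | w_1, ..., w_{i-1})."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    n = len(token_probs)
    avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log_likelihood)

# A model that always spreads its belief evenly over 4 candidate words
uniform = perplexity([0.25] * 8)

# A model that assigns high probability to each observed word
confident = perplexity([0.9, 0.8, 0.95, 0.85])

print(uniform, confident)
```

The uniform case gives a perplexity of about 4, matching the intuition that perplexity is the effective number of equally likely choices the model is hesitating between. The confident model scores lower.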
|
16 |
+
|
17 |
+
## Importance of Perplexity in AI
|
18 |
+
|
19 |
+
Perplexity provides a single scalar value that summarizes how well a language model predicts test data, facilitating comparisons between models or versions of the same model.
|
20 |
+
|
21 |
+
## Relating Perplexity to Quantization
|
22 |
+
|
23 |
+
Quantization does not change how perplexity is computed, but the rounding error it introduces in a model's weights and activations can shift the predicted probabilities and, as a result, raise perplexity. Comparing perplexity before and after quantization on the same held-out text shows how much predictive quality was traded for the memory savings. Balancing memory efficiency against low perplexity is crucial.
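A toy experiment makes the comparison concrete. The sketch below applies symmetric per-tensor int8 quantization to a random output projection and measures perplexity with the original and the dequantized weights. Everything here is an assumption for illustration: the random "model", the sizes, and the helper names. It says nothing about how a real language model would behave, and the size or even direction of the change on such toy data is not meaningful.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization; returns dequantized weights and the scale."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale, scale

def perplexity(logits, targets):
    """exp of the mean negative log-probability assigned to the target tokens."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(np.exp(-log_probs[np.arange(len(targets)), targets].mean()))

rng = np.random.default_rng(0)
vocab, dim, n_tokens = 50, 16, 200
weights = rng.normal(size=(dim, vocab)).astype(np.float32)    # toy output projection
hidden = rng.normal(size=(n_tokens, dim)).astype(np.float32)  # toy hidden states
targets = (hidden @ weights).argmax(axis=1)  # tokens the full-precision weights favor

weights_q, scale = quantize_int8(weights)
ppl_fp32 = perplexity(hidden @ weights, targets)
ppl_int8 = perplexity(hidden @ weights_q, targets)

print("max weight error:", np.max(np.abs(weights - weights_q)))
print("perplexity fp32:", ppl_fp32, "int8:", ppl_int8)
```

The same before-and-after pattern applies to a real model: keep the evaluation text fixed and change only the precision of the weights.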
|
24 |
+
|
25 |
+
## Conclusion
|
26 |
+
|
27 |
+
Quantization optimizes AI models for deployment on resource-constrained devices. Understanding perplexity helps in evaluating model effectiveness. For a deeper dive into quantization, visit the [Quantization](https://sebdg-ai-cookbook.hf.space/theory/quantization.html) page.
|