Sébastien De Greef committed on
Commit
9735e61
1 Parent(s): 6a45255

feat: Add "Perplexity in AI Models" to theory section

src/_quarto.yml CHANGED
@@ -29,6 +29,13 @@ website:
       - href: about.qmd
         text: About this Cookbook
 
+      - section: "Thoughts"
+        contents:
+          - href: theory/dont_mess_with_kittens.qmd
+            text: "High Stakes, and... Kittens"
+          - href: theory/good_enough.qmd
+            text: "What is Good Enough ?"
+
       - section: "Theory"
         contents:
           - href: theory/activations.qmd
@@ -52,19 +59,27 @@ website:
             text: "Optimizers"
           - href: theory/quantization.qmd
             text: "Quantization"
+          - href: theory/perplexity_in_ai.qmd
+            text: "Perplexity and Quantization"
           - href: theory/regularization.qmd
             text: "Regularization"
-          - href: theory/training.qmd
-            text: "Training"
+          - section: "Training"
+            href: theory/training.qmd
+            contents:
+              - href: theory/training.qmd
+                text: "Training"
+              - href: theory/dying_neurons.qmd
+                text: "Dying Neurons"
+              - href: theory/overfitting.qmd
+                text: "Overfitting"
+
+          - href: theory/perplexity_in_ai.qmd
+            text: "Perplexity and Quantization"
           - href: theory/mixture_of_models.qmd
             text: "Mixture of Models"
-          - href: theory/dont_mess_with_kittens.qmd
-            text: "High Stakes, and... Kittens"
 
       - section: "Large Language Models"
         contents:
-          - href: theory/good_enough.qmd
-            text: "What is Good Enough ?"
           - href: llms/tasks.qmd
             text: "Tasks"
           - href: llms/tokenizers.qmd
@@ -81,8 +96,6 @@ website:
           - href: llms/rag_systems.qmd
             text: "Retrival Augmented Generation"
 
-
-
       - section: "Computer Vision Models"
         contents:
           - href: vision/tasks.qmd

src/theory/dying_neurons.qmd ADDED
# Detecting Dying Neurons in Artificial Intelligence Models

## Introduction

In the field of artificial intelligence (AI), particularly deep learning, "dying" neurons refer to a phenomenon where certain neurons within an AI model stop contributing to the network's output. This can occur for various reasons, such as improper initialization or extreme weight updates during training. Detecting and addressing dying neurons is crucial for maintaining optimal performance in deep learning models.

## Understanding Dying Neurons

During the training process, AI models learn by adjusting their weights based on input data. However, if a neuron's output remains consistently close to zero or its gradient becomes negligible, it is considered "dead" or "dying." This can lead to suboptimal performance and reduced model accuracy.

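The classic case is a ReLU unit whose pre-activation is negative for every input: it outputs zero, the gradient flowing through it is zero, and its weights can no longer be updated. The following minimal NumPy sketch (standalone, not part of the TensorFlow walkthrough below, with a deliberately extreme bias) illustrates this failure mode:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))          # a batch of inputs
w = rng.normal(size=8)                  # weights of a single hidden unit
b = -20.0                               # a large negative bias pushes the unit into the "dead" zone

pre_activation = x @ w + b
output = np.maximum(pre_activation, 0)  # ReLU

print("Fraction of zero outputs:", np.mean(output == 0))                         # ~1.0: the unit never fires
print("Fraction of inputs with nonzero gradient:", np.mean(pre_activation > 0))  # ~0.0: no learning signal
```
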
## Detecting Dying Neurons in AI Models

Detecting dying neurons involves analyzing the activations of each layer within a deep learning model during training. By monitoring these activations, we can identify if any neuron is not contributing to the network's output and take appropriate measures to address it.

### Step 1: Set Up Your Environment

Firstly, ensure that you have installed all necessary libraries for your project. For this example, we will use TensorFlow as our deep learning framework. Install TensorFlow using pip:

```bash
pip install tensorflow
```

Next, import the required modules in your Python script:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
```

### Step 2: Create a Sample Model

For demonstration purposes, let's create a simple feedforward neural network with one hidden layer and an output layer using the Keras API in TensorFlow:

```python
model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),  # Hidden Layer
    Dense(10, activation='softmax')                     # Output Layer
])
```

### Step 3: Monitoring Neuron Activations During Training

To detect dying neurons during training, we need to monitor the activations of each layer. We can achieve this by creating a custom callback in Keras that, after every epoch, runs a small probe batch through the network and logs the mean activation and the fraction of units that never activate in each layer:

```python
class DeadNeuronDetector(tf.keras.callbacks.Callback):
    def __init__(self, sample_inputs):
        super().__init__()
        self.sample_inputs = sample_inputs  # small batch of inputs used to probe activations

    def on_epoch_end(self, epoch, logs=None):
        print("\nEpoch {} --------------------------------------".format(epoch))
        for layer in self.model.layers:
            if 'activation' not in layer.get_config():
                continue
            # Build a probe model that exposes this layer's activations
            probe = tf.keras.Model(inputs=self.model.inputs, outputs=layer.output)
            activations = probe(self.sample_inputs)
            mean_activation = tf.reduce_mean(tf.abs(activations))
            # A unit counts as "dead" if it never activates on the probe batch
            dead_fraction = tf.reduce_mean(
                tf.cast(tf.reduce_max(tf.abs(activations), axis=0) == 0, tf.float32))
            print("Layer {}: mean |activation| = {:.4f}, dead units = {:.1%}".format(
                layer.name, float(mean_activation), float(dead_fraction)))
```

### Step 4: Train the Model and Detect Dying Neurons

Now that we have our custom callback set up, let's train our model on a sample dataset (e.g., MNIST) while monitoring neuron activations:

```python
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Flatten the 28x28 images to match the model's (784,) input and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
dead_neuron_detector = DeadNeuronDetector(sample_inputs=x_test[:256])
history = model.fit(x_train, y_train, epochs=10,
                    validation_data=(x_test, y_test),
                    callbacks=[dead_neuron_detector])
```

### Step 5: Addressing Dying Neurons

If you detect dying neurons during training, consider the following approaches to address the issue (a minimal sketch of the first two follows this list):

1. **Weight Initialization**: Use an initialization suited to your activation function, for example He initialization for ReLU layers, rather than relying on the default Glorot (Xavier) initialization.
2. **Activation Function**: Replace ReLU with a variant that keeps a small gradient for negative inputs, such as Leaky ReLU or ELU, so that units can recover.
3. **Learning Rate Adjustment**: Try a smaller learning rate, or an optimizer with adaptive learning rates (e.g., Adam, AdaGrad), to prevent extreme updates that may cause neurons to die.
4. **Regularization Techniques**: Apply regularization techniques like dropout or L1/L2 regularization to encourage the model to learn more robust features and reduce overfitting.
5. **Batch Normalization**: Incorporate batch normalization layers in your network architecture, which can help maintain stable activations throughout training.
6. **Revive Dead Neurons**: If a neuron dies during training, you may try to revive it by reinitializing its weights and continuing the training process.

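The sketch below reworks the Step 2 model with He initialization and Leaky ReLU activations; it is one illustrative combination of the ideas above, not the only valid configuration.

```python
from tensorflow.keras.layers import LeakyReLU

# Same architecture as the Step 2 model, but with He initialization and a
# LeakyReLU activation that keeps a small gradient for negative inputs,
# both of which make it harder for hidden units to die.
robust_model = Sequential([
    Dense(64, kernel_initializer='he_normal', input_shape=(784,)),
    LeakyReLU(),
    Dense(10, activation='softmax')
])
```
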
## Conclusion

Detecting dying neurons in AI models is essential for maintaining optimal performance and accuracy. By monitoring layer activations during training with custom callbacks, we can identify dead or dying neurons and take appropriate measures to address them. Techniques such as better weight initialization, alternative activation functions, learning rate tuning, regularization, batch normalization, and reviving dead neurons can help mitigate this issue in deep learning models.

src/theory/overfitting.qmd ADDED
# Understanding Overfitting in Machine Learning Models

## Introduction

Overfitting occurs when a machine learning model learns to perform well on its training data but fails to generalize and make accurate predictions on new, unseen data. This phenomenon can lead to poor performance of the model in real-world scenarios. In this article, we will discuss overfitting, how to detect it using training metrics, and provide code examples with plots that illustrate the concept.

## Detecting Overfitting Using Training Metrics

To identify whether a machine learning model is suffering from overfitting, you can monitor its performance on both the training set and the validation set during the training process. The key indicators of overfitting are:

1. High accuracy or low error rate on the training data but poor performance on the validation data.
2. A large gap between the model's performance metrics (e.g., accuracy, precision, recall) on the training and validation sets.

### Code Example

Here is a Python code example using scikit-learn that trains a logistic regression classifier on a small, high-dimensional synthetic dataset, where it overfits:

```{python}
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Generate synthetic data: few samples and many features make it easy to fit noise
X, y = make_classification(n_samples=200, n_features=500, n_informative=10, random_state=42)

# Split the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# Train a logistic regression classifier; it will overfit this dataset
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate the model on the training and validation sets
y_pred_train = clf.predict(X_train)
y_pred_val = clf.predict(X_val)

print("Training accuracy:", accuracy_score(y_train, y_pred_train))
print("Validation accuracy:", accuracy_score(y_val, y_pred_val))
```

## Visualizing Overfitting with Plots

To better understand overfitting and its impact on model performance, we can visualize the training metrics using plots. Here are two examples of code blocks that generate plots illustrating overfitting:

### Plot 1: Training vs Validation Accuracy

```{python}
import matplotlib.pyplot as plt

train_accuracies = [0.95, 0.96, 0.97, 0.98]  # Example training accuracies for different epochs
val_accuracies = [0.75, 0.72, 0.71, 0.70]    # Corresponding validation accuracies

plt.plot(train_accuracies, label="Training Accuracy")
plt.plot(val_accuracies, label="Validation Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Overfitting: Training vs Validation Accuracy")
plt.legend()
plt.show()
```

### Plot 2: Learning Curves for Overfitting Detection

Learning curves are a powerful tool to visualize the relationship between training and validation performance as more data is used during model training. Here's an example of generating learning curves using scikit-learn:

```{python}
from sklearn.model_selection import learning_curve
import matplotlib.pyplot as plt

train_sizes, train_scores, val_scores = learning_curve(clf, X, y, cv=5)

# Calculate mean and standard deviation of training set scores
train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)

# Calculate mean and standard deviation of validation set scores
val_mean = np.mean(val_scores, axis=1)
val_std = np.std(val_scores, axis=1)

plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, alpha=0.1, color="r")
plt.plot(train_sizes, train_mean, label="Training Score", color="r")
plt.fill_between(train_sizes, val_mean - val_std, val_mean + val_std, alpha=0.1, color="g")
plt.plot(train_sizes, val_mean, label="Cross-validation Score", color="g")
plt.xlabel("Training examples used")
plt.ylabel("Score")
plt.title("Learning Curves for Overfitting Detection")
plt.legend()
plt.show()
```

## Conclusion

Overfitting is a common challenge in machine learning, and it can lead to poor model performance on unseen data. By monitoring training metrics such as accuracy or error rates and visualizing the results with plots like training vs validation accuracy graphs and learning curves, you can detect overfitting early in the model development process. This allows for timely interventions, such as regularization techniques or hyperparameter adjustments, to improve your model's generalization capabilities; a brief sketch of one such intervention follows.

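As a minimal sketch (reusing the data, split, and imports from the code example above, with an illustrative regularization strength), stronger L2 regularization in `LogisticRegression`, i.e. a smaller `C`, typically narrows the gap between training and validation accuracy:

```{python}
# Stronger L2 regularization (smaller C) constrains the weights,
# which usually reduces the train/validation accuracy gap on this dataset.
clf_regularized = LogisticRegression(C=0.01, max_iter=1000).fit(X_train, y_train)

print("Regularized training accuracy:  ", accuracy_score(y_train, clf_regularized.predict(X_train)))
print("Regularized validation accuracy:", accuracy_score(y_val, clf_regularized.predict(X_val)))
```
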
src/theory/perplexity_in_ai.qmd ADDED
# Perplexity in AI Models

Quantization, as detailed in the [Quantization](https://sebdg-ai-cookbook.hf.space/theory/quantization.html) page, reduces the memory footprint of neural networks by using lower-precision formats. This technique is vital for deploying models on devices with limited computational power.

## Introducing the Perplexity Metric

Perplexity is a key metric used to evaluate language models, measuring their effectiveness in predicting the next word in a sequence. It essentially indicates the model's uncertainty; a lower perplexity means better predictive performance.

## What is Perplexity?

Perplexity is the exponential of the average negative log-likelihood (the cross-entropy) that the model assigns to a sequence. For language models, it is computed as:

$$ \text{Perplexity}(P) = \exp \left( -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i | w_1, w_2, \ldots, w_{i-1}) \right) $$

Here, $N$ is the length of the sequence, $w_i$ is the $i$-th word, and $P(w_i | w_1, w_2, \ldots, w_{i-1})$ is the conditional probability of the $i$-th word given the previous words.

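As a minimal numeric sketch (using made-up per-token probabilities rather than a real model's outputs), perplexity can be computed directly from the probabilities the model assigns to each actual next word:

```python
import numpy as np

# Hypothetical probabilities a model assigned to each actual next word in a short sequence
token_probs = np.array([0.20, 0.05, 0.50, 0.10])

nll = -np.mean(np.log(token_probs))  # average negative log-likelihood (cross-entropy, in nats)
perplexity = np.exp(nll)

print(f"Average NLL: {nll:.3f} -> Perplexity: {perplexity:.2f}")
```
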
## Importance of Perplexity in AI

Perplexity provides a single scalar value that summarizes how well a language model predicts test data, facilitating comparisons between models or between versions of the same model.

## Relating Perplexity to Quantization

Quantization does not change what perplexity measures, but reducing the model's numerical precision can degrade its predictions and therefore increase its perplexity. Comparing perplexity before and after quantization is a common way to check how much predictive quality has been lost, and the goal is to balance the memory savings of quantization against keeping perplexity low.

## Conclusion

Quantization optimizes AI models for deployment on resource-constrained devices, and perplexity provides a way to measure how much predictive quality such optimizations cost. For a deeper dive into quantization, visit the [Quantization](https://sebdg-ai-cookbook.hf.space/theory/quantization.html) page.