---
license: wtfpl
inference: true
datasets:
- Onion_News
- wikitext-2-raw-v1-preprocessed
- wikipedia/wikimedia
language:
- en
metrics:
- f1
- accuracy
- precision
- perplexity
base_model:
- Wonder-Griffin/TraXL
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
- causal-LM
- question-answering
- conversational
---

# TraXLMistral

Created by: Morgan Griffin & WongrifferousAI (Wonder-Griffin)

https://github.com/GriffinDevelopments/TraXLMistraLForCausalLM

## Model Description

TraXLMistral is a custom language model based on the GPT-2 architecture, with additional enhancements for causal language modeling, sequence classification, and question answering. The model incorporates several advanced techniques, including sparse attention, memory-augmented neural networks (MANN), adaptive computation time (ACT), and latent space clustering, making it suitable for both reasoning and general-purpose text generation.

## Key Features

- Sparse Attention: an efficient attention mechanism inspired by Mistral that focuses computation on the most important elements of the sequence (see the sketch below).
- Memory-Augmented Neural Networks (MANN): external memory that increases model capacity and helps handle long-term dependencies and complex reasoning tasks.
- Adaptive Computation Time (ACT): dynamically adjusts the number of computation steps based on the complexity of the input.
- Latent Space Clustering: clusters latent representations for improved interpretability and task-specific adjustments.
- Logical Transformer Layer: integrates logical transformations to improve the model's reasoning capabilities.
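
The exact sparse-attention implementation lives in the repository linked above. Purely as an illustration of the general idea, the snippet below sketches one common variant (top-k attention masking) in PyTorch; the function name, shapes, and top-k value are assumptions, not the model's actual code.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=32):
    """Illustrative top-k sparse attention: each query attends only to its
    top_k highest-scoring keys; all other positions are masked out."""
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    topk_vals, _ = scores.topk(k=min(top_k, scores.size(-1)), dim=-1)
    threshold = topk_vals[..., -1, None]          # k-th largest score per query
    scores = scores.masked_fill(scores < threshold, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Toy shapes: batch=1, heads=4, seq_len=256, head_dim=192 (768 hidden / 4 heads)
q = k = v = torch.randn(1, 4, 256, 192)
out = topk_sparse_attention(q, k, v, top_k=16)
print(out.shape)  # torch.Size([1, 4, 256, 192])
```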

## Intended Uses & Limitations

### Use Cases

- Text Generation: coherent and contextually relevant text across a wide range of domains, including conversational agents, story generation, and creative writing.
- Question Answering: accurate and concise answers to natural-language questions.
- Sequence Classification: classifying text into predefined categories for tasks such as sentiment analysis and document categorization.
- Conversational AI: interactive, context-aware conversational applications.

### Limitations

- The model may require additional fine-tuning for domain-specific tasks where the input data differs significantly from the training data.
- Due to the sparse attention and memory modules, the model may require more resources (GPU memory) than simpler architectures.

## Training Procedure

The model was trained on the WikiText-2 raw (wikitext-2-raw-v1) dataset (further dataset details to be documented) and fine-tuned for tasks such as causal language modeling, question answering, and sequence classification.

### Training Hyperparameters

- Learning rate: 5e-05
- Train batch size: 8
- Eval batch size: 8
- Optimizer: Adam (betas = (0.9, 0.999), epsilon = 1e-08)
- LR scheduler: linear
- Training steps: 100,000
- Seed: 42
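
For reference, the hyperparameters above correspond roughly to the following Hugging Face `TrainingArguments` setup. This is a hedged sketch of an equivalent configuration, not the actual training script; the output directory is a placeholder.

```python
from transformers import TrainingArguments

# Approximate equivalent of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="./traxlmistral-output",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=100_000,
    seed=42,
)
```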

### Training Environment

- Transformers version: 4.45.0.dev0
- PyTorch version: 2.4.0+cu124
- Datasets version: 2.20.0
- Tokenizers version: 0.19.1
- GPU: training uses GPU acceleration, with checks for CUDA availability and multiple GPUs (see the snippet below).
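
As an illustration only, the kind of device check described above usually looks like the following in PyTorch; the exact logic in the training script may differ.

```python
import torch

# Use a GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_gpus = torch.cuda.device_count()
print(f"Using device: {device} ({num_gpus} GPU(s) detected)")

# With more than one GPU, the model could be wrapped for data parallelism, e.g.:
# model = torch.nn.DataParallel(model) if num_gpus > 1 else model
```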

## Model Architecture

### Configuration

- Model type: hybrid Transformer combining GPT, Mistral, and Transformer-XL components (causal LM)
- Vocab size: 50256
- Hidden size: 768
- Number of layers: 4
- Number of attention heads: 4
- Feedforward expansion factor: 4
- RNN units: 128
- Max sequence length: 256
- Dropout rate: 0.1
- Sparse attention: enabled
- Memory size: 256
- Max computation steps: 5
- Dynamic routing: enabled
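
If you need these values programmatically, they can be collected into a plain configuration dictionary as below. The key names are illustrative assumptions; the authoritative field names are those defined by the model's config class in the repository.

```python
# Configuration values from the list above; the real config class in the
# repository may use different key names.
traxlmistral_config = {
    "vocab_size": 50256,
    "hidden_size": 768,
    "num_layers": 4,
    "num_attention_heads": 4,
    "ff_expansion_factor": 4,
    "rnn_units": 128,
    "max_seq_length": 256,
    "dropout": 0.1,
    "sparse_attention": True,
    "memory_size": 256,
    "max_computation_steps": 5,
    "dynamic_routing": True,
}
```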

### Special Modules

- Sparse Attention Layer: improves efficiency by reducing unnecessary attention computation.
- Adaptive Computation Time (ACT): adjusts computation time based on input complexity (sketched below).
- Memory-Augmented Neural Networks (MANN): provides external memory to help with long-term dependencies.
- Latent Space Clustering: clusters latent representations for improved task-specific behavior.
- Logical Transformer Layer: improves reasoning and logic-based tasks.
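
The ACT details are defined in the model's own code. As a rough illustration of the general mechanism only (a halting loop in the spirit of adaptive computation time), a minimal sketch might look like this; the module name, shapes, and threshold are assumptions.

```python
import torch
import torch.nn as nn

class SimpleACT(nn.Module):
    """Toy adaptive computation time: a step function is applied repeatedly,
    and computation stops once the accumulated halting probability crosses a
    threshold (or the step budget is exhausted)."""

    def __init__(self, hidden_size=768, max_steps=5, threshold=0.99):
        super().__init__()
        self.step_fn = nn.Linear(hidden_size, hidden_size)  # stand-in for a full block
        self.halt = nn.Linear(hidden_size, 1)
        self.max_steps = max_steps
        self.threshold = threshold

    def forward(self, x):
        # x: (batch, seq_len, hidden_size)
        halting_prob = torch.zeros(x.shape[:-1], device=x.device)
        output = torch.zeros_like(x)
        for _ in range(self.max_steps):
            x = torch.tanh(self.step_fn(x))
            p = torch.sigmoid(self.halt(x)).squeeze(-1)
            still_running = (halting_prob < self.threshold).float()
            halting_prob = halting_prob + p * still_running
            output = output + (p * still_running).unsqueeze(-1) * x
            if bool((halting_prob >= self.threshold).all()):
                break
        return output

act = SimpleACT()
print(act(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 768])
```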

### Supported Tasks

- Causal Language Modeling (causal_lm): generates text sequences based on a given prompt.
- Question Answering (qa): extracts relevant answers from a context given a question.
- Sequence Classification: classifies input sequences into one of the predefined labels.

## Evaluation

The model was evaluated on several NLP benchmarks, but detailed results are pending. The primary evaluation metrics are:

- Accuracy
- F1-score
- Precision
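
No evaluation harness ships with this card; purely as a generic example, these metrics can be computed with scikit-learn from a set of predictions and reference labels.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score

# Placeholder labels purely for illustration.
references = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 0]

print("accuracy :", accuracy_score(references, predictions))
print("f1       :", f1_score(references, predictions))
print("precision:", precision_score(references, predictions))
```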

## Intended Users

This model is designed for researchers, developers, and organizations looking to implement advanced NLP models in production. It can be used to build conversational agents, question-answering systems, text generation applications, and more.

## How to Use

### Inference Example

```python
from transformers import AutoModelForCausalLM, BertTokenizerFast

# TraXLMistral is a custom architecture rather than a class shipped with
# transformers, so it is loaded here via AutoModelForCausalLM with
# trust_remote_code=True (assuming the model code is available in the repo).
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = AutoModelForCausalLM.from_pretrained(
    "Wonder-Griffin/TraXLMistral", trust_remote_code=True
)

input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations and Future Work

- Limited training data: future iterations should focus on expanding the dataset and improving performance across different languages and domains.
- Memory usage: due to its complex architecture, this model might require optimizations for resource-constrained environments.

## Acknowledgements

**Created by Morgan Griffin and WongrifferousAI (Wonder-Griffin)**