---
base_model: all-MiniLM-L6-v2
library_name: sentence-transformers
license: apache-2.0
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- ontology
- on2vec
- graph-neural-networks
- base-all-MiniLM-L6-v2
- general
- general-ontology
- fusion-cross_attention
- gnn-gcn
- small-ontology
---

# chiro_all-MiniLM-L6-v2_cross_attention_gcn_h512_o64_cosine_e128_early

This is a sentence-transformers model created with [on2vec](https://github.com/david4096/on2vec), which augments text embeddings with ontological knowledge using Graph Neural Networks.

## Model Details

- **Base Text Model**: all-MiniLM-L6-v2
- **Text Embedding Dimension**: 384
- **Ontology**: chiro.owl
- **Domain**: general
- **Ontology Concepts**: 26
- **Concept Alignment**: 26/26 (100.0%)
- **Fusion Method**: cross_attention
- **GNN Architecture**: GCN
- **Structural Embedding Dimension**: 26
- **Output Embedding Dimension**: 64
- **Hidden Dimensions**: 512
- **Dropout**: 0.0
- **Training Date**: 2025-09-19
- **on2vec Version**: 0.1.0
- **Source Ontology Size**: 0.2 MB
- **Model Size**: 91.2 MB
- **Library**: on2vec + sentence-transformers

## Technical Architecture

This model uses a multi-stage architecture:

1. **Text Encoding**: Input text is encoded using the base sentence-transformer model
2. **Ontological Embedding**: Pre-trained GNN embeddings capture structural relationships
3. **Fusion Layer**: Cross-attention fusion of the text and ontological embeddings

**Embedding Flow:**

- Text: 384 dimensions → 512 hidden → 64 output
- Structure: 26 concepts → GNN → 64 output
- Fusion: cross_attention → Final embedding

## How It Works

This model combines:

1. **Text Embeddings**: Generated using the base sentence-transformer model
2. **Ontological Embeddings**: Created by training Graph Neural Networks on OWL ontology structure
3. **Fusion Layer**: Combines both embedding types using the specified fusion method

The ontological knowledge helps the model better understand domain-specific relationships and concepts.

## Usage

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Load the model
model = SentenceTransformer('chiro_all-MiniLM-L6-v2_cross_attention_gcn_h512_o64_cosine_e128_early')

# Generate embeddings
sentences = ['Example sentence 1', 'Example sentence 2']
embeddings = model.encode(sentences)

# Compute cosine similarity between the two sentences
similarity = cos_sim(embeddings[0], embeddings[1])
```

## Training Process

This model was created using the on2vec pipeline (see the sketches after this list for what the main stages might look like):

1. **Ontology Processing**: The OWL ontology was converted to a graph structure
2. **GNN Training**: Graph Neural Networks were trained to learn ontological relationships
3. **Text Integration**: Base model text embeddings were combined with ontological embeddings
4. **Fusion Training**: The fusion layer was trained to optimally combine both embedding types
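on2vec's internals are not reproduced in this card, so the sketches that follow are assumptions rather than the library's actual code. For stage 1, a minimal approach is to index the named classes of the OWL file and keep one graph edge per `rdfs:subClassOf` assertion; the helper `owl_to_edge_index` below is hypothetical, built on rdflib and PyTorch:

```python
import torch
from rdflib import Graph
from rdflib.namespace import RDF, RDFS, OWL

def owl_to_edge_index(owl_path):
    """Hypothetical stage 1: parse an OWL file and turn its subclass
    hierarchy into a PyTorch Geometric-style edge_index tensor."""
    g = Graph()
    g.parse(owl_path)  # RDF/XML format inferred from the .owl extension

    # Index every named class in the ontology
    classes = sorted(set(g.subjects(RDF.type, OWL.Class)))
    index = {cls: i for i, cls in enumerate(classes)}

    # One edge per rdfs:subClassOf assertion between named classes
    edges = [(index[s], index[o])
             for s, o in g.subject_objects(RDFS.subClassOf)
             if s in index and o in index]

    return torch.tensor(edges, dtype=torch.long).t().contiguous(), classes

edge_index, classes = owl_to_edge_index("chiro.owl")  # 26 concepts for chiro.owl
```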
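For stage 2, a two-layer GCN over that graph would match the card's dimensions (26-dim structural input, 512 hidden, 64 output, dropout 0.0). The `cosine_e128_early` suffix in the model name hints at a cosine objective with early stopping within 128 epochs, but that is an inference from the name; `OntologyGCN` is an illustrative PyTorch Geometric sketch, not on2vec's class:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class OntologyGCN(torch.nn.Module):
    """Hypothetical two-layer GCN matching the card's dimensions:
    26 concept features -> 512 hidden -> 64-dim structural embeddings."""

    def __init__(self, in_dim=26, hidden_dim=512, out_dim=64, dropout=0.0):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)
        self.dropout = dropout

    def forward(self, x, edge_index):
        # x: [num_concepts, 26] node features (e.g. one-hot concept identity,
        # which would explain a 26-dim structural input for 26 concepts)
        h = F.relu(self.conv1(x, edge_index))
        h = F.dropout(h, p=self.dropout, training=self.training)
        return self.conv2(h, edge_index)  # [num_concepts, 64]
```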
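For stage 4, one standard way to realize cross-attention fusion is to let the projected text embedding attend over the concept embeddings with `torch.nn.MultiheadAttention`. Again, `CrossAttentionFusion` and its residual combination are assumptions for illustration, not the trained model's exact layer:

```python
import torch

class CrossAttentionFusion(torch.nn.Module):
    """Hypothetical cross-attention fusion: the projected text embedding
    queries the ontology embeddings, and the attended result is combined
    with the text query to give the final 64-dim embedding."""

    def __init__(self, text_dim=384, hidden_dim=512, onto_dim=64,
                 out_dim=64, num_heads=4):
        super().__init__()
        self.text_proj = torch.nn.Sequential(  # card's text path: 384 -> 512 -> 64
            torch.nn.Linear(text_dim, hidden_dim),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_dim, out_dim),
        )
        self.onto_proj = torch.nn.Linear(onto_dim, out_dim)
        self.attn = torch.nn.MultiheadAttention(out_dim, num_heads,
                                                batch_first=True)

    def forward(self, text_emb, onto_embs):
        # text_emb: [batch, 384]; onto_embs: [batch, num_concepts, 64]
        q = self.text_proj(text_emb).unsqueeze(1)  # [batch, 1, 64] query
        kv = self.onto_proj(onto_embs)             # keys/values from the ontology
        attended, _ = self.attn(q, kv, kv)         # [batch, 1, 64]
        return (q + attended).squeeze(1)           # residual combine -> [batch, 64]
```

In the released model this fusion runs inside the sentence-transformers pipeline, so `model.encode(...)` from the Usage section is all that is needed at inference time.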
## Intended Use

This model is particularly effective for:

- General domain text processing
- Tasks requiring understanding of domain-specific relationships
- Semantic similarity in specialized domains
- Classification tasks with domain knowledge requirements

## Limitations

- Performance may vary on domains different from the training ontology
- Ontological knowledge is limited to concepts present in the source OWL file
- May have higher computational requirements than vanilla text models

## Citation

If you use this model, please cite the on2vec framework:

```bibtex
@software{on2vec,
  title={on2vec: Ontology Embeddings with Graph Neural Networks},
  author={David Steinberg},
  url={https://github.com/david4096/on2vec},
  year={2024}
}
```

---

Created with [on2vec](https://github.com/david4096/on2vec) 🧬→🤖