Model card: AISE-TUDelft/Custom-Activations-BERT-Adaptive-GELU

Base model: RoBERTa

Configs:

  • Vocab size: 10,000
  • Hidden size: 512
  • Max position embeddings: 512
  • Number of layers: 2
  • Number of heads: 4
  • Window size: 256
  • Intermediate size: 1,024
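The configuration above can be reproduced with the standard `transformers` `RobertaConfig`. This is a minimal sketch, not the card's official training code: `window_size` is not a standard `RobertaConfig` field, so it is passed as an extra keyword argument (which `PretrainedConfig` stores as a plain attribute) on the assumption that the model's custom attention code reads it.

```python
from transformers import RobertaConfig

# Hyperparameters taken from the "Configs" section of the card.
config = RobertaConfig(
    vocab_size=10_000,
    hidden_size=512,
    max_position_embeddings=512,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=1024,
    window_size=256,  # assumed custom field, not a standard RobertaConfig parameter
)
```

Note that the per-head dimension works out to 512 / 4 = 128, and the feed-forward expansion is a modest 2x (512 to 1,024) rather than the usual 4x.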

Results:

  • Task: GLUE, score: 57.69, confidence interval: [56.75, 58.73]
  • Task: BLiMP, score: 59.25, confidence interval: [58.78, 59.65]

Model repository: AISE-TUDelft/Custom-Activations-BERT-Adaptive-GELU