arxiv:2509.00605

Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling

Published on Aug 30
· Submitted by rishiraj on Sep 3

Abstract

The Gated Associative Memory (GAM) network offers a linear-complexity alternative to the Transformer architecture, improving training speed and achieving competitive validation perplexity.

AI-generated summary

The Transformer architecture, underpinned by the self-attention mechanism, has become the de facto standard for sequence modeling tasks. However, its core computational primitive scales quadratically with sequence length (O(N^2)), creating a significant bottleneck for processing long contexts. In this paper, we propose the Gated Associative Memory (GAM) network, a novel, fully parallel architecture for sequence modeling that exhibits linear complexity (O(N)) with respect to sequence length. The GAM block replaces the self-attention layer with two parallel pathways: a causal convolution to efficiently capture local, position-dependent context, and a parallel associative memory retrieval mechanism to model global, content-based patterns. These pathways are dynamically fused using a gating mechanism, allowing the model to flexibly combine local and global information for each token. We implement GAM from scratch and conduct a rigorous comparative analysis against a standard Transformer model and a modern linear-time baseline (Mamba) on the WikiText-2 benchmark, as well as against the Transformer on the TinyStories dataset. Our experiments demonstrate that GAM is consistently faster, outperforming both baselines in training speed, and achieves superior or competitive final validation perplexity across all datasets, establishing it as a promising and efficient alternative for sequence modeling.
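
For a concrete picture of the block described above, here is a minimal PyTorch sketch of a GAM-style layer: a causal depthwise convolution for local, position-dependent context; content-based retrieval from a fixed set of learned memory slots for global context (cost linear in sequence length); and a sigmoid gate that fuses the two per token. The module name GAMBlock, the layer sizes, the fixed-slot memory design, and the residual connection are assumptions for illustration, not the authors' implementation.

# Hypothetical sketch of a GAM-style block based on the abstract's description.
# All dimensions and the slot-memory formulation are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GAMBlock(nn.Module):
    def __init__(self, d_model: int, kernel_size: int = 4, num_slots: int = 64):
        super().__init__()
        # Local pathway: depthwise causal convolution over the sequence.
        self.kernel_size = kernel_size
        self.local_conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        # Global pathway: content-based retrieval from a learned memory bank
        # with a fixed number of slots (O(N * num_slots), i.e. linear in N).
        self.query_proj = nn.Linear(d_model, d_model)
        self.mem_keys = nn.Parameter(torch.randn(num_slots, d_model) / d_model ** 0.5)
        self.mem_values = nn.Parameter(torch.randn(num_slots, d_model) / d_model ** 0.5)
        # Gating: per-token fusion of the two pathways, plus output projection.
        self.gate_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        # Local, position-dependent context via left-padded (causal) convolution.
        h = x.transpose(1, 2)                       # (B, D, N)
        h = F.pad(h, (self.kernel_size - 1, 0))     # pad on the left => causal
        local = self.local_conv(h).transpose(1, 2)  # (B, N, D)
        # Global, content-based retrieval against the memory slots.
        q = self.query_proj(x)                      # (B, N, D)
        scores = q @ self.mem_keys.t()              # (B, N, num_slots)
        attn = scores.softmax(dim=-1)
        global_ctx = attn @ self.mem_values         # (B, N, D)
        # Dynamic gated fusion of the two pathways.
        gate = torch.sigmoid(self.gate_proj(x))     # (B, N, D)
        fused = gate * local + (1.0 - gate) * global_ctx
        return x + self.out_proj(fused)             # residual connection


if __name__ == "__main__":
    block = GAMBlock(d_model=128)
    tokens = torch.randn(2, 256, 128)               # (batch, seq_len, d_model)
    print(block(tokens).shape)                      # torch.Size([2, 256, 128])

Because the number of memory slots is fixed, retrieval grows linearly with sequence length, matching the O(N) claim; whether the paper uses a slot memory or a different associative formulation is not stated on this page.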

Community

Paper author · Paper submitter

I look forward to hearing what the community thinks.

It would be interesting to see more benchmarks against the Transformer on the same large training data.


True. I'm trying to get my hands on some compute to conduct more experiments; I'll update.
