In Honour of This Year's NeurIPS Test of Time Paper Awardees

Community Article Published December 10, 2024

This year's NeurIPS Test of Time Paper Awards went to two groundbreaking papers:

  1. Generative Adversarial Nets (Goodfellow et al.)
  2. Sequence to Sequence Learning with Neural Networks (Sutskever et al.)

Let's explore how these papers helped pioneer the breakthroughs behind today's AI.

  1. Generative Adversarial Nets (Goodfellow et al.)

The GAN paper (now with over 85k citations) introduced a brilliant approach to generative modeling: an adversarial game between two neural networks, a generator (G) that creates samples and a discriminator (D) that tries to distinguish real samples from fake ones.

In a typical GAN architecture, the generator G takes random noise z and maps it to synthetic data, while the discriminator D outputs the probability that its input came from the real data rather than from G. The two are trained simultaneously: G tries to minimize log(1 - D(G(z))) while D tries to maximize log(D(x)) + log(1 - D(G(z))).
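
Putting the two objectives together gives the minimax game from the paper:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

To make this concrete, here is a minimal PyTorch sketch of one adversarial training step. It is not the paper's original code: the network shapes, `noise_dim`, and optimizer choices are illustrative assumptions, and the generator uses the non-saturating loss (maximize log D(G(z))) that the authors recommend in practice.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions for this sketch, not from the paper).
noise_dim, data_dim = 64, 784

G = nn.Sequential(nn.Linear(noise_dim, 128), nn.ReLU(),
                  nn.Linear(128, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch):
    n = real_batch.size(0)
    real_labels, fake_labels = torch.ones(n, 1), torch.zeros(n, 1)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z))).
    fake = G(torch.randn(n, noise_dim)).detach()   # detach so G is not updated here
    d_loss = bce(D(real_batch), real_labels) + bce(D(fake), fake_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: maximize log D(G(z)), i.e. fool D into predicting "real".
    g_loss = bce(D(G(torch.randn(n, noise_dim))), real_labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```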

GANs solved a fundamental problem: previous generative models relied on explicit density estimation or Markov chains. GANs bypassed this by learning the generative process directly through adversarial training, which made it possible to model much more complex distributions. The impact? GANs led directly to breakthroughs like:

  • StyleGAN for photorealistic face synthesis
  • CycleGAN for unpaired image translation
  • BigGAN for high-fidelity image generation
  • Stable Diffusion's image generation components

The authors also gave a comprehensive summary of the challenges facing generative modeling at the time, along with the pros and cons of GANs relative to other approaches.

  2. Sequence to Sequence Learning with Neural Networks (Sutskever et al.)

The paper showed that we could transform variable-length sequences end-to-end using an encoder-decoder architecture, i.e., encode the meaning of the input into a vector, then decode it into a new sequence. This breakthrough enabled efficient neural machine translation and notably influenced the architectures behind today's LLMs.

This work is widely believed to have paved the way for the much-loved Transformer architecture.

The seq2seq architecture uses two multilayered Long Short-Term Memory (LSTM) networks: one encodes the input sequence into a fixed-length vector, and the other decodes the output sequence from that vector.
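
Below is a minimal PyTorch sketch of this encoder-decoder setup. The class name, vocabulary sizes, and embedding/hidden dimensions are illustrative assumptions (the paper itself used 4-layer LSTMs with 1000 units per layer); it is meant to show the shape of the idea, not reproduce the original system.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=10_000, tgt_vocab=10_000, emb=256, hidden=512, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        # Encoder: reads the source and summarizes it in its final (hidden, cell) states.
        self.encoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        # Decoder: starts from that fixed-size summary and generates the target sequence.
        self.decoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_tokens, tgt_in_tokens):
        _, state = self.encoder(self.src_emb(src_tokens))      # fixed-length summary of input
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in_tokens), state)
        return self.out(dec_out)                                # logits over target vocabulary

# Training with teacher forcing: feed the target shifted by one position.
model = Seq2Seq()
src = torch.randint(0, 10_000, (2, 7))   # batch of 2 source sentences, length 7
tgt = torch.randint(0, 10_000, (2, 9))   # batch of 2 target sentences, length 9
logits = model(src, tgt[:, :-1])
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10_000), tgt[:, 1:].reshape(-1))
```

One detail worth remembering from the paper: the authors found that simply reversing the word order of the source sentences made optimization dramatically easier.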

This end-to-end learning framework eliminated the need for complex, hand-engineered features.

The authors pointed out that while the simplest approach to general sequence learning was to use RNNs, they turned out to be very difficult to train end-to-end "due to resulting long term dependencies" - meaning information from the beginning of a sequence gets lost by the time it reaches the end.

Or in technical terms, RNNs struggled to maintain useful information over long sequences because their gradients would either vanish or explode during training, making it hard for them to learn connections between distant parts of the sequence.
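
A toy calculation (an illustration, not anything from the paper) makes this concrete: backpropagating through T timesteps of a plain RNN repeatedly multiplies the gradient by the recurrent Jacobian, reduced here to a single scalar weight.

```python
# One multiplication per timestep during backprop through a length-T sequence.
T = 100
for w in (0.9, 1.1):
    grad = 1.0
    for _ in range(T):
        grad *= w
    print(f"recurrent weight {w}: gradient after {T} steps ~ {grad:.2e}")

# recurrent weight 0.9: gradient after 100 steps ~ 2.66e-05  (vanishes)
# recurrent weight 1.1: gradient after 100 steps ~ 1.38e+04  (explodes)
```

The LSTM's gating is what lets it carry information across such long spans, which is why the paper builds on LSTMs rather than plain RNNs.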

Now with over 27k citations, this work inspired the invention of the attention mechanism used in the Transformer architecture that fuels today's large language models.

Every step, from early sequence learning to today’s billion-parameter models, traces back to these core ideas.

There is no doubt that these papers deserve a Test of Time Award, thanks to the brilliant minds behind them: Ian Goodfellow et al. and Ilya Sutskever et al. This post is in honour of their genius contributions to the field 🫡

That's it, thanks for reading. I recently implemented the "Sequence to Sequence Learning with Neural Networks" paper; here is the notebook: https://github.com/Jaykef/ai-algorithms/blob/main/seq2seq.ipynb