AI Term · circa 2017 · Added May 29, 2026

Transformer Model

A transformer model is a neural network architecture for processing sequential data that relies on attention instead of recurrence or convolution.

The transformer model revolutionized natural language processing by building entirely on a mechanism known as 'attention', which lets it weigh the significance of different parts of an input sequence differently. This contrasts with recurrent neural networks (RNNs) and convolutional neural networks (CNNs), which process sequences through recurrence and convolution, respectively. Because self-attention relates all positions of a sequence to one another at once, instead of stepping through them one at a time as an RNN does, transformers can be trained with far more parallelism. This speeds up training and has improved results on translation, summarization, and other NLP tasks. The BERT and GPT series of models build on this architecture, showcasing its flexibility and power.
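To make "weighing the significance of different parts" concrete, here is a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, in plain Python. It assumes the queries, keys and values are given directly as lists of vectors; a real transformer derives them from the input through learned linear projections and runs several attention heads in parallel, and both of those steps are omitted here. The function names are illustrative, not taken from any library.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of query, key and value vectors.

    Simplified sketch: no learned projections, a single attention head.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Self-attention: the same toy vectors act as queries, keys and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Passing the same vectors as queries, keys and values gives self-attention: every position produces a weighted average over all positions, and each row of `weights` sums to 1.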

Examples

GPT-3 uses a decoder-only transformer to generate human-like text.
BERT uses a transformer encoder for bidirectional language understanding.
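A key mechanical difference between these two examples is what each position may attend to. GPT-style decoders apply a causal mask so each token sees only itself and earlier tokens, which is what lets them generate text left to right; BERT's encoder lets every token attend to the whole sequence in both directions. The sketch below shows only that masking difference in plain Python. It assumes raw input vectors stand in for the learned queries and keys, the function name `self_attention_weights` is hypothetical, and training objectives (such as BERT's masked-token prediction) are not modeled.

```python
import math

def self_attention_weights(x, causal=False):
    """Attention weights for self-attention over the vectors in x.

    causal=True applies a GPT-style mask: position i may attend only to
    positions 0..i. causal=False lets every position attend to every
    position, as in a BERT-style encoder.
    """
    d = len(x[0])
    n = len(x)
    weights = []
    for i in range(n):
        scores = []
        for j in range(n):
            if causal and j > i:
                scores.append(float("-inf"))  # future token: masked out
            else:
                scores.append(sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d))
        m = max(scores)  # always finite, since position i can see itself
        exps = [math.exp(s - m) for s in scores]  # exp(-inf) gives 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

With `causal=True` the first position can only attend to itself, so its weight row is `[1.0, 0.0, 0.0]` for any three-token input; with `causal=False` every weight is positive.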

Common misconceptions

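Transformers do not need to read input one token at a time, in order, as RNNs do. Self-attention by itself ignores token order, so transformers add position information through positional encodings or learned position embeddings.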
Attention does not mean focusing on only one part of the input; it assigns a weight to every position in the sequence, and each token's weights sum to one.
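Because self-attention on its own treats its input as an unordered set, position has to be injected explicitly. One scheme is the fixed sinusoidal encoding from the original 2017 transformer paper, sketched below in plain Python; many later models, including BERT and GPT, learn their position embeddings instead. Adding the encoding to the token embeddings by element-wise sum follows that paper, while the function names here are illustrative only.

```python
import math

def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal encodings.

    Even dimensions use sin(pos / 10000^(2i/d_model)); odd dimensions use
    cos of the same angle, where i indexes the sin/cos pair.
    """
    table = []
    for pos in range(num_positions):
        row = []
        for dim in range(d_model):
            i = dim // 2  # each sin/cos pair shares one frequency
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(token_embeddings):
    """Add position information to token embeddings, element by element."""
    pe = sinusoidal_positional_encoding(len(token_embeddings), len(token_embeddings[0]))
    return [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(token_embeddings, pe)]
```

Since the same token vector receives a different position vector at each position, two identical tokens at different places in the sequence end up with different inputs to attention, which is how order information reaches the model.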
