
Transformer

The neural network architecture behind modern LLMs, using self-attention mechanisms to process sequences in parallel.

The transformer architecture, introduced in the 2017 paper "Attention Is All You Need," revolutionized natural language processing and later computer vision. Unlike earlier sequential models (RNNs, LSTMs), which process tokens one at a time, transformers process all tokens in a sequence simultaneously using self-attention, making them highly parallelizable and scalable.

Self-attention allows each token in the input to attend to every other token, capturing long-range dependencies in text. This mechanism, combined with positional encoding and layer normalization, forms the backbone of models like GPT, BERT, Claude, and virtually all modern LLMs.
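The core of self-attention can be sketched in a few lines. The following is a minimal illustration (not production code) of scaled dot-product attention, where queries, keys, and values are all derived from the same input sequence; the random input and dimensions are arbitrary choices for the example:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to every key; attention weights sum to 1 per row."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) pairwise similarity matrix
    # Softmax over keys (subtracting the row max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output token is a weighted mix of all values

# Toy sequence: 4 tokens, embedding dimension 8. In self-attention,
# Q, K, and V are all (projections of) the same input sequence.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8): one output vector per input token
```

Because every token's scores against every other token are computed in one matrix multiplication, the whole sequence is processed in parallel, in contrast with the step-by-step recurrence of RNNs.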

Understanding transformer architecture is fundamental for ML engineers and researchers working on model development, fine-tuning, or optimization. Knowledge of attention mechanisms, multi-head attention, and scaling laws is frequently required in AI research roles.
