Mixture of Experts (MoE)

A model architecture composed of multiple specialized sub-networks (experts), where a learned gating mechanism routes each input to only a small subset of them. MoE models can have very large total parameter counts while keeping inference efficient, since only a few experts activate per token.
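The routing idea above can be sketched in a few lines of NumPy: a gate scores each expert, the top-k experts are selected, and their outputs are mixed by the renormalized gate weights. All names and dimensions here are illustrative assumptions, not part of the glossary definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative assumptions).
d_model, n_experts, top_k = 8, 4, 2

# Each "expert" is a small linear layer; the gate is another linear layer.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x):
    """Route one token vector x to its top-k experts and mix their outputs."""
    logits = x @ gate_w                      # one gating score per expert
    idx = np.argsort(logits)[-top_k:]        # indices of the top-k experts
    weights = softmax(logits[idx])           # renormalize over the chosen experts
    # Only the selected experts run, so compute scales with top_k, not n_experts.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, idx))

token = rng.standard_normal(d_model)
out = moe_forward(token)
```

With `top_k = 2` of 4 experts, each token pays the compute cost of two experts while the model as a whole holds the parameters of all four; real MoE layers apply the same selection per token across a batch.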

Related terms

Transformer · Parameter · Scaling Laws