Mixture of Experts (MoE)

A model architecture composed of multiple specialized sub-networks (experts), where a learned gating mechanism routes each input to only a small subset of them. MoE models can have very large total parameter counts while keeping inference efficient, since only a few experts activate per token.
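The routing idea above can be sketched in a few lines of NumPy: a gate scores each expert, the top-k experts are selected, and their outputs are mixed by the renormalized gate weights. All names and dimensions here are illustrative assumptions, not part of the glossary definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative assumptions).
d_model, n_experts, top_k = 8, 4, 2

# Each "expert" is a small linear layer; the gate is another linear layer.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x):
    """Route one token vector x to its top-k experts and mix their outputs."""
    logits = x @ gate_w                      # one gating score per expert
    idx = np.argsort(logits)[-top_k:]        # indices of the top-k experts
    weights = softmax(logits[idx])           # renormalize over the chosen experts
    # Only the selected experts run, so compute scales with top_k, not n_experts.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, idx))

token = rng.standard_normal(d_model)
out = moe_forward(token)
```

With `top_k = 2` of 4 experts, each token pays the compute cost of two experts while the model as a whole holds the parameters of all four; real MoE layers apply the same selection per token across a batch.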

Related terms

Transformer · Parameter · Scaling Laws