What is LLM beam search?

Large Language Model - Interview Questions

LLM beam search is a decoding algorithm used to generate text from large language models. Its goal is to find a high-probability output sequence given the input tokens. Because scoring every possible sequence is infeasible, beam search approximates the most likely sequence rather than guaranteeing to find it. The algorithm builds the output one token at a time, and at each step it keeps track of the k most probable partial sequences (called beams), where k, the beam width, is a hyperparameter.
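In symbols (notation introduced here for illustration): let x be the input tokens, V the vocabulary, and B_t the set of k beams kept after step t. Each partial sequence is scored by its cumulative log-probability, which equals the log of the product of its token probabilities but avoids numerical underflow. The next beam set keeps the k best one-token extensions of the current beams:

```latex
\mathrm{score}(y_{1:t}) = \sum_{i=1}^{t} \log P(y_i \mid y_{<i}, x)

B_t = \operatorname{top}_k \big\{ (y, v) : y \in B_{t-1},\ v \in V \big\}
      \quad \text{ranked by} \quad \mathrm{score}(y) + \log P(v \mid y, x)
```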

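At each step, every beam is extended with the possible next tokens, and each resulting candidate is scored by the probability of its whole sequence (in practice, the sum of its token log-probabilities). The k highest-scoring candidates across all beams become the new beams, so a beam that looked weaker early on can overtake one that started strongly. A beam is complete once it generates the end-of-sequence token, and the search stops when all beams are complete or a maximum sequence length is reached.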
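The sketch below illustrates this expand-and-prune loop on a hypothetical toy model. `TOY_MODEL`, `next_token_probs`, `beam_search`, and all the probabilities are made up for illustration; a real LLM would supply next-token probabilities from a forward pass, and production implementations (batched tensors, separate lists of finished hypotheses) differ in detail. Here a finished beam simply keeps competing for one of the k slots, which is one simple convention among several.

```python
import math
from typing import Dict, List, Tuple

EOS = "<eos>"
Seq = Tuple[str, ...]

# Hypothetical toy "model": next-token probabilities for each prefix.
# A real LLM would compute these from the input and the prefix.
TOY_MODEL: Dict[Seq, Dict[str, float]] = {
    (): {"the": 0.5, "a": 0.4, EOS: 0.1},
    ("the",): {"cat": 0.45, "dog": 0.35, EOS: 0.2},
    ("a",): {"cat": 0.9, EOS: 0.1},
}


def next_token_probs(prefix: Seq) -> Dict[str, float]:
    # Assumption: any prefix missing from the table must end the sequence.
    return TOY_MODEL.get(prefix, {EOS: 1.0})


def beam_search(beam_width: int, max_len: int) -> List[Tuple[Seq, float]]:
    """Return the final beams as (tokens, cumulative log-probability), best first."""
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[Seq, float]] = []
        for seq, score in beams:
            if seq and seq[-1] == EOS:
                # Finished beams are carried over unchanged and still compete for a slot.
                candidates.append((seq, score))
                continue
            # Extend this beam with every possible next token, adding its log-probability.
            for token, p in next_token_probs(seq).items():
                candidates.append((seq + (token,), score + math.log(p)))
        # Keep the beam_width highest-scoring candidates across ALL beams.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == EOS for seq, _ in beams):
            break
    return beams


best_seq, best_score = beam_search(beam_width=2, max_len=5)[0]
print(best_seq, round(math.exp(best_score), 3))  # ('a', 'cat', '<eos>') 0.36

greedy_seq, greedy_score = beam_search(beam_width=1, max_len=5)[0]
print(greedy_seq, round(math.exp(greedy_score), 3))  # ('the', 'cat', '<eos>') 0.225
```

With `beam_width=1` the search reduces to greedy decoding: it commits to "the" at the first step and ends with a sequence of probability 0.225, whereas a width of 2 keeps "a" alive and finds "a cat" with probability 0.36.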

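The final output is the highest-scoring sequence among the k beams. Beam search is a widely used decoding algorithm, especially for tasks with a well-defined target such as machine translation and summarization. Because it is deterministic and tends to produce repetitive or generic text in open-ended generation, it is often replaced by, or combined with, sampling techniques such as temperature sampling and nucleus (top-p) sampling to produce more diverse, high-quality text outputs.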
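One practical detail in choosing the final output: summed log-probabilities only decrease as a sequence grows, so raw scores favor short outputs. Many implementations therefore divide the score by the sequence length raised to a tunable exponent before picking the winner. The sketch below assumes that simple form; `pick_best`, `alpha`, and the example beams are hypothetical, and real libraries use their own variants of the penalty.

```python
import math
from typing import List, Tuple

Seq = Tuple[str, ...]


def pick_best(beams: List[Tuple[Seq, float]], alpha: float = 1.0) -> Seq:
    """Pick the beam with the highest length-normalized log-probability.

    alpha=0 disables normalization (raw cumulative log-probability);
    alpha=1 ranks by average log-probability per token.
    Assumes every beam is non-empty.
    """
    return max(beams, key=lambda b: b[1] / (len(b[0]) ** alpha))[0]


# Hypothetical finished beams with their cumulative log-probabilities.
beams = [
    (("yes", "<eos>"), math.log(0.2)),
    (("yes", "it", "is", "<eos>"), math.log(0.1)),
]
print(pick_best(beams, alpha=0.0))  # ('yes', '<eos>')
print(pick_best(beams, alpha=1.0))  # ('yes', 'it', 'is', '<eos>')
```

Without normalization the two-token answer wins (log 0.2 is about -1.61 versus -2.30), but per token the longer answer scores about -0.58 against -0.80, so normalization flips the choice.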