What is the difference between greedy decoding and sampling?

Large Language Model - Interview Questions

In large language models, greedy decoding and sampling are two decoding strategies: ways of choosing the next token from the probability distribution the model predicts at each generation step.

Greedy decoding: At each step of generation, the model picks the single token with the highest probability. It keeps generating until it reaches a stopping criterion, such as a predefined maximum number of tokens or a special end-of-sequence token. Because each choice is deterministic, the same prompt yields the same output every time.
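The greedy procedure can be sketched in a few lines of Python. This is only an illustration: TOY_MODEL is a hypothetical lookup table with made-up probabilities that stands in for a real language model, and greedy_decode, "<s>" and "</s>" are names assumed here for the example.

```python
# Hypothetical stand-in for a language model: maps the previous token
# to a probability distribution over the next token (made-up values).
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"cat": 0.4, "dog": 0.6},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"sat": 0.2, "</s>": 0.8},
    "sat": {"</s>": 1.0},
}


def greedy_decode(model, start="<s>", eos="</s>", max_tokens=10):
    """Pick the highest-probability token at every step."""
    tokens = []
    current = start
    for _ in range(max_tokens):  # stopping criterion 1: length limit
        probs = model[current]
        next_token = max(probs, key=probs.get)  # argmax over the distribution
        if next_token == eos:  # stopping criterion 2: end-of-sequence token
            break
        tokens.append(next_token)
        current = next_token
    return tokens


print(greedy_decode(TOY_MODEL))  # prints ['the', 'cat', 'sat']
```

Calling greedy_decode again with the same arguments returns the same list, which is the deterministic behavior described above.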

Sampling: At each step of generation, the model draws the next token at random from the predicted probability distribution, so higher-probability tokens are chosen more often but lower-probability tokens can still appear. Sampling allows for more diverse and creative text generation, since the same prompt can produce unexpected or novel text on different runs.
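A minimal sketch of sampling, under the same kind of assumptions: TOY_MODEL is a hypothetical table of made-up next-token probabilities rather than a real model, and sample_decode is a name chosen for this example. The only real library call is random.choices from the Python standard library, which draws items in proportion to the given weights.

```python
import random

# Hypothetical stand-in for a language model: maps the previous token
# to a probability distribution over the next token (made-up values).
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"cat": 0.4, "dog": 0.6},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"sat": 0.2, "</s>": 0.8},
    "sat": {"</s>": 1.0},
}


def sample_decode(model, rng, start="<s>", eos="</s>", max_tokens=10):
    """Draw each token at random, weighted by its predicted probability."""
    tokens = []
    current = start
    for _ in range(max_tokens):  # stop at the length limit
        probs = model[current]
        candidates = list(probs)
        weights = list(probs.values())
        # Weighted random draw instead of taking the argmax.
        next_token = rng.choices(candidates, weights=weights, k=1)[0]
        if next_token == eos:  # stop at the end-of-sequence token
            break
        tokens.append(next_token)
        current = next_token
    return tokens


rng = random.Random(0)  # seeded only so that a run can be reproduced
for _ in range(3):
    print(sample_decode(TOY_MODEL, rng))
```

Repeated calls can return different sequences from the same starting point, whereas always taking the most likely token would return a single fixed sequence.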