Large Language Model - Interview Questions
What is the difference between self-attention and global attention?
Both self-attention and global attention are attention mechanisms used in neural networks to weigh the importance of different parts of an input sequence, but they differ in several key ways:

Input: Self-attention, as the name suggests, attends only to the input sequence itself, while global attention can also draw on external information, such as metadata about the input sequence or a representation from a different modality.

Computation: Self-attention computes the importance of each element in the input sequence with respect to all other elements of the same sequence, whereas global attention scores each element against a single, predefined global context vector.

Weighting: In self-attention, every element contributes to the weights used to attend over all the other elements, producing a full matrix of pairwise weights. In global attention, a single weight per element is instead computed against a fixed context vector, which is typically a pooled summary of the input sequence (for example, a mean or weighted sum of its elements) or an external representation.

Usage: Self-attention is the core of transformer-based architectures for natural language processing, such as BERT and GPT, whereas global attention is often used in tasks like image or video captioning, where a global context vector, such as an image or video representation, supplies additional information to the network.