Mamba 2

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

![[Pasted image 20240623180325.png]]

Okay, it is not really Mamba-2 itself but a theory paper.

This paper mainly focuses on proving that the attention mechanism is a tensor contraction; and by doing so, they bring the Transformer architecture and SSMs together. (I think...)
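Below is a tiny numerical check of the duality idea as I understand it: the same sequence map can be computed either as a linear-time SSM recurrence or as a quadratic attention-like matrix. The scalar per-step decay `a`, the shapes, and all the variable names here are my own simplification for illustration, not the paper's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 4                      # sequence length, state size
a = rng.uniform(0.5, 1.0, T)     # scalar decay per step
B = rng.standard_normal((T, N))  # input projections
C = rng.standard_normal((T, N))  # output projections
x = rng.standard_normal(T)       # one scalar channel, for simplicity

# Linear form: run the recurrence h_t = a_t * h_{t-1} + B_t * x_t, y_t = C_t . h_t
h = np.zeros(N)
y_linear = np.empty(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_linear[t] = C[t] @ h

# Quadratic "attention" form: y = (L * (C @ B.T)) @ x, where the causal mask
# L[i, j] = a_i * a_{i-1} * ... * a_{j+1} carries the accumulated decay
L = np.zeros((T, T))
for i in range(T):
    for j in range(i + 1):
        L[i, j] = np.prod(a[j + 1:i + 1])
y_quadratic = (L * (C @ B.T)) @ x

assert np.allclose(y_linear, y_quadratic)
```

The mask `L` is where the SSM structure shows up; plain causal linear attention would be the special case where every `a_t` is 1.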

I have always thought of attention as the process of taking two tensors, generating a scalar, and using that scalar to assess the relationship between the two. Tensor contraction is a more generalized summary of that.
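A minimal sketch of that view, with attention written as two einsum contractions (single head, no masking or batching; the shapes and names are my assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8
Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))

# Contraction 1: each (query, key) pair of vectors collapses to one scalar score
scores = np.einsum('qd,kd->qk', Q, K) / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
# Contraction 2: the scalar weights contract against the values
out = np.einsum('qk,kd->qd', weights, V)

# Same thing with plain matmuls, as a sanity check
probs = np.exp(Q @ K.T / np.sqrt(d))
ref = (probs / probs.sum(axis=-1, keepdims=True)) @ V
assert np.allclose(out, ref)
```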

There are some other contributions as well, such as:

![[Pasted image 20240623181432.png]]

Decomposing the low-rank blocks, which makes the computation more efficient (see the toy sketch below).
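My rough intuition for why the decomposition helps, on the simplest possible 1-semiseparable matrix (lower-triangular all-ones, i.e. a cumulative sum): the small diagonal blocks are computed directly, and everything below them collapses to a rank-1 running carry. This is only a toy sketch under that assumption, not the paper's actual algorithm:

```python
import numpy as np

def cumsum_blockwise(x, block=4):
    """Compute y = L @ x blockwise, where L is lower-triangular all-ones."""
    T = len(x)
    y = np.empty(T, dtype=float)
    carry = 0.0  # rank-1 contribution from all earlier blocks
    for start in range(0, T, block):
        xb = x[start:start + block].astype(float)
        # diagonal block: small dense lower-triangular matmul (quadratic in `block`)
        Lb = np.tril(np.ones((len(xb), len(xb))))
        y[start:start + block] = Lb @ xb + carry
        carry += xb.sum()  # update the carried state for later blocks
    return y

x = np.arange(1, 9)
assert np.allclose(cumsum_blockwise(x), np.cumsum(x))
```

Instead of one T x T matmul, you pay for T/block small block matmuls plus a cheap carried state, which is the same diagonal-plus-low-rank split the figure above is showing.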

And... I have not fully read the paper yet, actually...