Mamba 2
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Okay, it is not really Mamba-2 the architecture, but the theory behind it.
This paper mainly focuses on showing that the attention mechanism can be written as a tensor contraction; by doing so, it brings the Transformer architecture and SSMs together (I think...).
I have always thought of attention as taking two tensors, producing a scalar from them, and using that scalar to measure the relationship between the two. Tensor contraction is a more general way to put this.
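To make that concrete, here is a minimal sketch (my own toy code, not the paper's) of single-head attention written as two einsum contractions. If you drop the softmax, the output is literally the contraction (QK^T)V, which is the linear-attention form the paper works with.

```python
# A minimal sketch (toy example, not the paper's code): single-head attention
# expressed as explicit tensor contractions with einsum.
import torch

def attention_as_contraction(Q, K, V, mask=None):
    """Q, K, V: (seq_len, d). Returns (seq_len, d)."""
    # First contraction over the feature axis: scores[i, j] = sum_d Q[i, d] * K[j, d]
    scores = torch.einsum("id,jd->ij", Q, K) / Q.shape[-1] ** 0.5
    if mask is not None:                       # e.g. a causal mask
        scores = scores.masked_fill(~mask, float("-inf"))
    A = torch.softmax(scores, dim=-1)          # row-wise normalization
    # Second contraction over the key axis: out[i, d] = sum_j A[i, j] * V[j, d]
    return torch.einsum("ij,jd->id", A, V)

T, d = 8, 16
Q, K, V = torch.randn(3, T, d).unbind(0)
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
print(attention_as_contraction(Q, K, V, causal).shape)  # torch.Size([8, 16])
```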
There are some other contributions, such as:
Decomposing the low-rank off-diagonal blocks of the semiseparable matrix, which makes the computation more efficient (a rough sketch of the idea follows below).
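Here is a very rough sketch of that block idea as I understand it (my toy code, not the paper's SSD algorithm, and it drops the decay/mask term the paper uses): for plain causal linear attention, M = tril(C B^T), so every off-diagonal block of M has rank at most N (the state size), and y = M x can be computed chunk by chunk without ever materializing the full T x T matrix.

```python
# Toy illustration of the block decomposition (not the paper's algorithm):
# diagonal blocks are handled with a small quadratic computation, while the
# low-rank off-diagonal blocks are summarized by an N-dimensional state.
import torch

def chunked_matvec(C, B, x, chunk=4):
    """C, B: (T, N); x: (T,). Computes tril(C @ B.T) @ x chunkwise."""
    T, N = C.shape
    y = torch.zeros(T)
    state = torch.zeros(N)                     # running sum_j B[j] * x[j]
    for s in range(0, T, chunk):
        e = min(s + chunk, T)
        Cc, Bc, xc = C[s:e], B[s:e], x[s:e]
        # off-diagonal (low-rank) part: contributions from all earlier chunks
        y[s:e] = Cc @ state
        # diagonal block: quadratic computation within the chunk only
        y[s:e] += torch.tril(Cc @ Bc.T) @ xc
        # fold this chunk into the state for later chunks
        state += Bc.T @ xc
    return y

T, N = 16, 4
C, B, x = torch.randn(T, N), torch.randn(T, N), torch.randn(T)
dense = torch.tril(C @ B.T) @ x                # reference: full T x T matrix
print(torch.allclose(chunked_matvec(C, B, x), dense, atol=1e-5))  # True
```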
And... I have not fully read the paper yet, actually...