Attention Wasn't All We Needed
(stephendiehl.com)
There are many modern transformer techniques that have been developed since the original Attention Is All You Need paper. Let's look at some of the most important ones developed over the years and try to implement the basic ideas as succinctly as possible. We'll use the PyTorch framework for most of the examples.
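Before looking at the newer techniques, it helps to have the original scaled dot-product attention from that paper as a baseline. A minimal PyTorch sketch (the function name and toy tensor shapes here are illustrative, not from the original text):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Scaled dot-product attention as in Attention Is All You Need."""
    d_k = q.size(-1)
    # Query-key similarity scores, scaled by sqrt(d_k) to keep
    # softmax gradients well-behaved for large head dimensions.
    scores = q @ k.transpose(-2, -1) / d_k**0.5
    if mask is not None:
        # Positions where mask == 0 receive ~zero attention weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # rows sum to 1 over keys
    return weights @ v

# Toy example: batch of 2, sequence length 4, model dimension 8.
q = torch.randn(2, 4, 8)
k = torch.randn(2, 4, 8)
v = torch.randn(2, 4, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 8])
```

The output keeps the query's shape: each position is a weighted average of the value vectors, with weights given by the softmaxed query-key scores.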