Were RNNs all we needed?
(arxiv.org)
The scalability limitations of Transformers regarding sequence length have renewed interest in recurrent sequence models that are parallelizable during training.
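To illustrate what "parallelizable during training" means for such recurrent models, here is a minimal sketch (not the paper's code) of evaluating a linear recurrence h_t = a_t * h_{t-1} + b_t with a parallel associative scan in JAX; the names `a`, `b`, `z`, and `htilde` are illustrative placeholders, and the GRU-like gating at the end is only an assumed example of a recurrence that fits this form.

```python
import jax
import jax.numpy as jnp

def parallel_linear_recurrence(a, b):
    """Compute h_t = a_t * h_{t-1} + b_t for all t (with h_0 = 0) via a parallel scan.

    a, b: arrays of shape (seq_len, hidden_dim). Returns h of the same shape.
    """
    def combine(left, right):
        # Compose two affine maps h -> a*h + b applied in sequence order.
        a_l, b_l = left
        a_r, b_r = right
        return a_l * a_r, a_r * b_l + b_r

    # associative_scan computes all prefix compositions in O(log n) depth,
    # so the whole sequence can be processed in parallel instead of step by step.
    _, h = jax.lax.associative_scan(combine, (a, b))
    return h

# Example (assumed, GRU-like gating): h_t = (1 - z_t) * h_{t-1} + z_t * htilde_t,
# which fits the linear-recurrence form with a_t = 1 - z_t and b_t = z_t * htilde_t.
seq_len, hidden = 8, 4
z = jax.nn.sigmoid(jax.random.normal(jax.random.PRNGKey(0), (seq_len, hidden)))
htilde = jax.random.normal(jax.random.PRNGKey(1), (seq_len, hidden))
h = parallel_linear_recurrence(1.0 - z, z * htilde)
print(h.shape)  # (8, 4)
```

The key design point is that the per-step update depends on the previous state only linearly, so the updates form an associative composition of affine maps and can be evaluated with a prefix scan rather than a strictly sequential loop.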