Were RNNs all we needed? (arxiv.org)
The scalability limitations of Transformers regarding sequence length have renewed interest in recurrent sequence models that are parallelizable during training.
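"Parallelizable during training" typically means the recurrence is affine in the previous hidden state, so the whole sequence of states can be computed with an associative (parallel) scan in O(log T) depth instead of a step-by-step loop. The sketch below is illustrative only, not code from the paper: the function names, shapes, and the specific recurrence h_t = a_t * h_{t-1} + b_t are assumptions chosen to show the general trick.

```python
# Minimal sketch, assuming a linear (affine) recurrence h_t = a_t * h_{t-1} + b_t.
# Such recurrences can be evaluated for all t at once with an associative scan,
# which is what makes this family of recurrent models parallelizable in training.
import jax
import jax.numpy as jnp


def combine(left, right):
    # Compose two affine maps h -> a*h + b; composition of affine maps is associative.
    a_l, b_l = left
    a_r, b_r = right
    return a_r * a_l, a_r * b_l + b_r


def parallel_linear_recurrence(a, b, h0):
    # a, b: (T, d) per-step coefficients; h0: (d,) initial state.
    # The scan yields cumulative maps (A_t, B_t) with h_t = A_t * h_0 + B_t.
    A, B = jax.lax.associative_scan(combine, (a, b))
    return A * h0 + B


def sequential_reference(a, b, h0):
    # Plain sequential loop, used here only to check the scan version.
    h, out = h0, []
    for t in range(a.shape[0]):
        h = a[t] * h + b[t]
        out.append(h)
    return jnp.stack(out)


if __name__ == "__main__":
    key_a, key_b, key_h = jax.random.split(jax.random.PRNGKey(0), 3)
    a = jax.random.uniform(key_a, (8, 4))
    b = jax.random.normal(key_b, (8, 4))
    h0 = jax.random.normal(key_h, (4,))
    assert jnp.allclose(parallel_linear_recurrence(a, b, h0),
                        sequential_reference(a, b, h0), atol=1e-5)
```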