SUS backprop: linear backpropagation algorithm for long inputs in transformers
(arxiv.org)
It is straightforward to design an unbiased gradient estimator that stochastically cuts the backpropagation flow through any part of a computational graph.
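To make the unbiasedness claim concrete, here is a minimal sketch of one generic way to cut gradient flow stochastically: keep each gradient entry with probability `keep_prob` and rescale it by `1 / keep_prob`, otherwise drop it. The names `stochastic_cut`, `mean_estimate`, and `keep_prob` are hypothetical and chosen for illustration; this is not taken from the paper and is not claimed to be its SUS algorithm.

```python
import random


def stochastic_cut(grad, keep_prob, rng):
    """Randomly cut entries of an upstream gradient (illustrative sketch).

    Each entry is kept with probability keep_prob and divided by keep_prob,
    otherwise it is replaced by 0.0. The expectation of each output entry is
    keep_prob * (g / keep_prob) + (1 - keep_prob) * 0 = g, so the estimator
    is unbiased.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [g / keep_prob if rng.random() < keep_prob else 0.0 for g in grad]


def mean_estimate(grad, keep_prob, n_samples, seed=0):
    """Average many independent cuts to check unbiasedness empirically."""
    rng = random.Random(seed)
    totals = [0.0] * len(grad)
    for _ in range(n_samples):
        for i, g in enumerate(stochastic_cut(grad, keep_prob, rng)):
            totals[i] += g
    return [t / n_samples for t in totals]


if __name__ == "__main__":
    true_grad = [1.0, -2.0, 0.5]
    # The averaged estimate should approach true_grad as n_samples grows.
    print(mean_estimate(true_grad, keep_prob=0.25, n_samples=20000))
```

The trade-off is variance: each kept entry is scaled up by `1 / keep_prob`, so a smaller `keep_prob` cuts more of the backward pass but gives a noisier estimate.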