Hacker News with Generative AI: Vision Transformers

Your ViT Is Secretly an Image Segmentation Model (arxiv.org)
Vision Transformers (ViTs) have shown remarkable performance and scalability across various computer vision tasks.
The Speed of ViTs and CNNs (eyer.be)
It is often stated that because of the quadratic self-attention, ViTs aren't practical at higher resolution.
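For a rough sense of the quadratic term that claim refers to, here is a minimal sketch. It is not taken from the linked post. It assumes square images, a patch size of 16, and no class token; the helper names `num_tokens` and `attention_entries` are illustrative. It counts only the entries of one head's attention matrix, not total model FLOPs, which is a different question from whether ViTs are practical at a given resolution.

```python
def num_tokens(image_size: int, patch_size: int = 16) -> int:
    """Number of patch tokens for a square image (class token not counted)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def attention_entries(image_size: int, patch_size: int = 16) -> int:
    """Entries in a single head's N x N attention matrix."""
    n = num_tokens(image_size, patch_size)
    return n * n


# Doubling the image side quadruples the token count and
# multiplies the attention matrix size by 16.
for size in (224, 448, 896):
    print(size, num_tokens(size), attention_entries(size))
```

Under these assumptions a 224-pixel image yields 196 tokens, and each doubling of the side length multiplies the attention matrix size by 16.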
Vision Transformers Need Registers (openreview.net)