Naturally, this is a single-authored paper: "Fast Transformer Decoding: One Write-Head is All You Need," Noam Shazeer
mat sat in the garden implementing self attention models from scratch. "it's all reshapes & matmuls & dot products" he said. "there must be a terser way". noam appeared. "md,hdk->hmk! hk,hmk->hm!" he yelled & clapped. mat was enlightened. #einsumKoan
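The two einsum equations in the koan appear to be de-batched versions of the key-projection and logit steps in the paper's multi-head attention pseudocode. A minimal NumPy sketch, assuming m is memory length, d the model dimension, h the number of heads, and k the per-head key width (all names here are illustrative, not from the source):

```python
import numpy as np

# Toy sizes (assumed for illustration).
m, d, h, k = 6, 8, 2, 4
M = np.random.randn(m, d)        # memory: the sequence being attended over
P_k = np.random.randn(h, d, k)   # per-head key projection
q = np.random.randn(h, k)        # one query, already projected per head

# "md,hdk->hmk": project the memory into per-head keys, shape (h, m, k).
K = np.einsum("md,hdk->hmk", M, P_k)

# "hk,hmk->hm": dot the query against every key to get attention logits, shape (h, m).
logits = np.einsum("hk,hmk->hm", q, K)

# Softmax over memory positions gives the attention weights.
weights = np.exp(logits - logits.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
print(weights.shape)  # (h, m)
```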
Fast Transformer Decoding: One Write-Head is All You Need. (arXiv:1911.02150v1) #NLProc