Demystifying 6ND FLOPs
\(\mathrm{FLOPs} = 6 \cdot \mathrm{params} \cdot \mathrm{tokens}.\) This is a nearly magical formula that approximates the floating-point operations required ...
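To get a feel for the numbers, here is a minimal sketch of the formula in Python. The function name `training_flops` and the example model size and token count are hypothetical, chosen only for illustration, and the sketch assumes the formula is being used as a rough estimate of total training compute.

```python
def training_flops(params: float, tokens: float) -> float:
    """Rough compute estimate: FLOPs = 6 * params * tokens.

    `params` is the number of model parameters and `tokens` is the
    number of tokens processed. This is an approximation, not an
    exact operation count.
    """
    return 6.0 * params * tokens


# Hypothetical example: a 1e9-parameter model seeing 2e10 tokens.
flops = training_flops(1e9, 2e10)
print(f"{flops:.2e}")  # 1.20e+20
```

Because the estimate is a plain product, doubling either the parameter count or the token count doubles the result.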
Personal site and blog
Talking about deep learning, programming and school