11 Apr 2026
Worth reading: A paper on pruning transformer attention heads during inference without retraining.
#links #ai