
Quick Take: Anthropic has open-sourced a toolkit that lets researchers and developers trace the internal reasoning, or “thoughts,” of LLMs such as Gemma and Llama. The tools generate “attribution graphs” from the model's internal workings (specifically, cross-layer MLP transcoders), revealing the step-by-step computational pathways a model takes. It’s a significant step for LLM interpretability,…