LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language?
In Part 1, I described how duplicating a block of seven middle layers in Qwen2-72B — no weight changes, no training — produced the #1 model on the HuggingFace Open LLM Leaderboard. The method, which I called RYS (Repeat Your Self), was discovered using nothin…
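The core trick is a change to the order in which existing layers run. Weights are not copied or trained; a contiguous block of layers is simply executed a second time. The sketch below shows that execution order. The helper names `rys_layer_order` and `forward` are hypothetical. The toy "layers" are plain functions rather than transformer blocks. The block indices in the example are illustrative only, because this excerpt does not say which seven layers were repeated.

```python
from typing import Callable, List, Sequence


def rys_layer_order(num_layers: int, start: int, end: int) -> List[int]:
    """Hypothetical helper: layer indices for a pass where layers [start, end) run twice.

    Layers 0..end-1 run once, the block start..end-1 is re-entered, and then
    end..num_layers-1 finish the pass. Repeated indices refer to the same
    layer objects, so no weights are duplicated or changed.
    """
    if not (0 <= start < end <= num_layers):
        raise ValueError("need 0 <= start < end <= num_layers")
    return list(range(end)) + list(range(start, num_layers))


def forward(layers: Sequence[Callable[[float], float]], order: Sequence[int], x: float) -> float:
    """Apply the (shared) layers in the given order; a toy stand-in for a transformer stack."""
    for i in order:
        x = layers[i](x)
    return x


if __name__ == "__main__":
    # Toy 10-layer "model": layer k just adds k to its input (illustrative only).
    layers = [lambda x, k=k: x + k for k in range(10)]

    # Repeat a hypothetical 3-layer block, layers 4..6.
    order = rys_layer_order(num_layers=10, start=4, end=7)
    print(order)                        # [0, 1, 2, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9]
    print(forward(layers, order, 0.0))  # 60.0
```

A seven-layer block corresponds to `end - start == 7`, which makes the pass seven layers deeper while the parameter count stays the same. In a real model, the same idea amounts to building a new layer list that references the existing layer modules. Framework-specific details such as KV-cache indexing are outside this sketch.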