A major architectural update to llama.cpp, merged on April 18, cuts VRAM usage by up to 40% and boosts token throughput by as ...