The release of ChatGPT in November 2022 sparked an explosion of interest in post-training and an avalanche of new preference optimization (PO) methods. These methods claim superior alignment by virtue of better correspondence with human pairwise preferences, often measured by LLM judges. In this work, we attempt to answer the following question: do LLM-judge preferences translate to progress on o..