This document explains how to run evaluations with statistical testing enabled, combine results across runs, and interpret the dashboard's comparison features. Choose this based on your compute budget ...