How good is an IR test collection? A series of papers in recent years has addressed the question by empirically enumerating the consistency of performance comparisons using alternate subsets of the collection. In this paper we propose using Test Theory, which is based on analysis of variance and is specifically designed to assess test collections. Using the method, we not only can measure test reliability after the fact, but we can estimate the test collection's reliability before it is even built or used. We can also determine an optimal allocation of resources before the fact, e.g. whether to invest in more judges or queries. The method, which is in widespread use in the field of educational testing, complements data-driven approaches to assessing test collections. Whereas the data-driven method focuses on test results, test theory focuses on test designs. It offers unique practical results, as well as insights about the variety and implications of alternative test designs.