TY - GEN
T1 - Test theory for assessing IR test collections
AU - Bodoff, David
AU - Li, Pu
PY - 2007
Y1 - 2007
N2 - How good is an IR test collection? A series of papers in recent years has addressed the question by empirically enumerating the consistency of performance comparisons using alternate subsets of the collection. In this paper we propose using Test Theory, which is based on analysis of variance and is specifically designed to assess test collections. Using the method, we not only can measure test reliability after the fact, but we can estimate the test collection's reliability before it is even built or used. We can also determine an optimal allocation of resources before the fact, e.g. whether to invest in more judges or queries. The method, which is in widespread use in the field of educational testing, complements data-driven approaches to assessing test collections. Whereas the data-driven method focuses on test results, test theory focuses on test designs. It offers unique practical results, as well as insights about the variety and implications of alternative test designs.
KW - Information retrieval
KW - Test collections
KW - Test theory
UR - http://www.scopus.com/inward/record.url?scp=36448947171&partnerID=8YFLogxK
U2 - 10.1145/1277741.1277805
DO - 10.1145/1277741.1277805
M3 - Conference contribution
AN - SCOPUS:36448947171
SN - 1595935975
SN - 9781595935977
T3 - Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07
SP - 367
EP - 374
BT - Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07
T2 - 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07
Y2 - 23 July 2007 through 27 July 2007
ER -