Abstract
A central idea of Language Models is that documents (and perhaps queries) are random variables, generated by data-generating functions that are characterized by document (query) parameters. The key new idea of this paper is to model that a relevance judgment is also generated stochastically, and that its data-generating function is also governed by those same document and query parameters. The result of this addition is that any available relevance judgments are easily incorporated as additional evidence about the true document and query model parameters. An additional aspect of this approach is that it also resolves the long-standing problem of document-oriented versus query-oriented probabilities. The general approach can be used with a wide variety of hypothesized distributions for documents, queries, and relevance. We test the approach on Reuters Corpus Volume 1, using one set of possible distributions. Experimental results show that the approach does succeed in incorporating relevance data to improve estimates of both document and query parameters, but on this data and for the specific distributions we hypothesized, performance was no better than two separate one-sided models. We conclude that the model's theoretical contribution is its integration of relevance models, document models, and query models, and that the potential for additional performance improvement over one-sided methods requires refinements.
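The core idea can be sketched in a toy form. This is an illustrative assumption on our part, not the paper's actual distributions: take unigram multinomial models for documents and queries, and treat a positive relevance judgment as evidence that the query was generated by the same parameters as the document, so its term counts can be pooled when estimating those parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 5  # toy vocabulary size (hypothetical)

# Assumed setup: a document and a relevant query are multinomial samples
# governed by the same unigram parameter vector theta_d.
theta_d = rng.dirichlet(np.ones(V))   # true (unobserved) document model
doc = rng.multinomial(200, theta_d)   # observed document term counts
qry = rng.multinomial(5, theta_d)     # observed query term counts

# Estimate from the document alone (one-sided).
est_doc_only = doc / doc.sum()

# The paper's key idea, sketched: a relevance judgment links the query to
# the document's parameters, so the query counts become extra evidence
# about those same parameters.
est_with_rel = (doc + qry) / (doc + qry).sum()
```

In this sketch the pooled estimate simply uses more samples; the paper's contribution is a principled generative account in which the relevance judgment itself is a random variable governed by the shared parameters, rather than this ad hoc count pooling.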
| Original language | English |
| --- | --- |
| Pages (from-to) | 357-380 |
| Number of pages | 24 |
| Journal | ACM Transactions on Information Systems |
| Volume | 22 |
| Issue number | 3 |
| DOIs | |
| State | Published - Jul 2004 |
| Externally published | Yes |
Keywords
- Language models
- Probabilistic models
ASJC Scopus subject areas
- Information Systems
- General Business, Management and Accounting
- Computer Science Applications