Relevance models to help estimate document and query parameters

Research output: Contribution to journal › Article › peer-review

Abstract

A central idea of language models is that documents (and perhaps queries) are random variables, generated by data-generating functions that are characterized by document (query) parameters. The key new idea of this paper is to model a relevance judgment as also being generated stochastically, by a data-generating function governed by those same document and query parameters. As a result, any available relevance judgments can easily be incorporated as additional evidence about the true document and query model parameters. The approach also resolves the long-standing problem of document-oriented versus query-oriented probabilities. The general approach can be used with a wide variety of hypothesized distributions for documents, queries, and relevance. We test the approach on Reuters Corpus Volume 1, using one set of possible distributions. Experimental results show that the approach does succeed in incorporating relevance data to improve estimates of both document and query parameters, but on this data and for the specific distributions we hypothesized, performance was no better than that of two separate one-sided models. We conclude that the model's theoretical contribution is its integration of relevance models, document models, and query models, and that realizing its potential for further performance improvement over one-sided methods will require refinements.
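The abstract does not specify the distributions used in the experiments, so the following is only a minimal Python sketch of the central idea under illustrative assumptions that are not taken from the paper. It assumes a two-word vocabulary; a document parameter p and a query parameter s, each the probability of word 0; a uniform prior on a grid; document, query, and judgments conditionally independent given (p, s); and a hypothetical relevance link rho(p, s) = ps + (1 - p)(1 - s). Under these assumptions the posterior is proportional to P(d | p) · P(q | s) · ∏ P(R | p, s), so each relevance judgment R enters the same product as the document and query text and updates both parameters at once. All function names are hypothetical.

```python
import itertools
import math


def log_binomial_kernel(k, n, p):
    """Log-likelihood of k draws of word 0 out of n, up to a constant
    (the binomial coefficient does not depend on p, so it is dropped)."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def relevance_prob(p, s):
    """Hypothetical link from parameters to P(relevant): the chance that a
    word drawn from the document model equals a word drawn from the query
    model, for a two-word vocabulary. An assumption, not the paper's choice."""
    return p * s + (1.0 - p) * (1.0 - s)


def posterior_means(doc_counts, query_counts, judgments, grid_size=19):
    """Posterior means of (p, s) on a grid with a uniform prior.

    p: probability of word 0 under the document model.
    s: probability of word 0 under the query model.
    doc_counts, query_counts: (count of word 0, count of word 1).
    judgments: 0/1 relevance judgments for this document-query pair.
    """
    # Interior grid points only, so every log() below is finite.
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    log_w = {}
    for p, s in itertools.product(grid, grid):
        lw = log_binomial_kernel(doc_counts[0], sum(doc_counts), p)
        lw += log_binomial_kernel(query_counts[0], sum(query_counts), s)
        rho = relevance_prob(p, s)
        for r in judgments:
            # Each judgment is evidence about BOTH p and s.
            lw += math.log(rho if r == 1 else 1.0 - rho)
        log_w[(p, s)] = lw
    # Normalize in log space for numerical stability.
    top = max(log_w.values())
    w = {ps: math.exp(v - top) for ps, v in log_w.items()}
    z = sum(w.values())
    mean_p = sum(ps[0] * wt for ps, wt in w.items()) / z
    mean_s = sum(ps[1] * wt for ps, wt in w.items()) / z
    return mean_p, mean_s


if __name__ == "__main__":
    doc, query = (8, 2), (1, 1)  # toy counts, not data from the paper
    cases = [("none", []), ("relevant x3", [1, 1, 1]),
             ("non-relevant x3", [0, 0, 0])]
    for label, rs in cases:
        p, s = posterior_means(doc, query, rs)
        print(f"{label:>16}: E[p]={p:.3f}  E[s]={s:.3f}")
```

With these toy counts the document text alone puts E[p] at roughly 0.75, and the balanced query leaves E[s] at 0.5 by symmetry. Positive judgments pull E[s] toward the document's dominant word and negative judgments push it away, while also shifting E[p]. A grid keeps the posterior exact and dependency-free for two parameters; a realistic vocabulary would need a different estimation method.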

Original language: English
Pages (from-to): 357-380
Number of pages: 24
Journal: ACM Transactions on Information Systems
Volume: 22
Issue number: 3
State: Published - Jul 2004
Externally published: Yes

Keywords

  • Language models
  • Probabilistic models

ASJC Scopus subject areas

  • Information Systems
  • General Business, Management and Accounting
  • Computer Science Applications
