Tackling biased baselines in the risk-sensitive evaluation of retrieval systems

Dinçer, B.T.; Ounis, I.; MacDonald, C.

dc.contributor.author	Dinçer, B.T.
dc.contributor.author	Ounis, I.
dc.contributor.author	MacDonald, C.
dc.date.accessioned	2020-11-20T16:49:06Z
dc.date.available	2020-11-20T16:49:06Z
dc.date.issued	2014
dc.identifier.isbn	9783319060279
dc.identifier.issn	0302-9743
dc.identifier.uri	https://doi.org/10.1007/978-3-319-06028-6_3
dc.identifier.uri	https://hdl.handle.net/20.500.12809/6103
dc.description	City of Amsterdam; et al.; Google; Microsoft Research; Textkernel; The Netherlands Organization for Scientific Research (NWO)	en_US
dc.description	36th European Conference on Information Retrieval, ECIR 2014, 13 April 2014 through 16 April 2014, Amsterdam, 105000	en_US
dc.description.abstract	The aim of optimising information retrieval (IR) systems using a risk-sensitive evaluation methodology is to minimise the risk of performing any particular topic less effectively than a given baseline system. Baseline systems in this context determine the reference effectiveness for topics, relative to which the effectiveness of a given IR system in minimising the risk will be measured. However, the comparative risk-sensitive evaluation of a set of diverse IR systems - as attempted by the TREC 2013 Web track - is challenging, as the different systems under evaluation may be based upon a variety of different (base) retrieval models, such as learning to rank or language models. Hence, a question arises about how to properly measure the risk exhibited by each system. In this paper, we argue that no model of information retrieval alone is representative enough in this respect to be a true reference for the models available in the current state-of-the-art, and demonstrate, using the TREC 2012 Web track data, that as the baseline system changes, the resulting risk-based ranking of the systems changes significantly. Instead of using a particular system's effectiveness as the reference effectiveness for topics, we propose several remedies including the use of mean within-topic system effectiveness as a baseline, which is shown to enable unbiased measurements of the risk-sensitive effectiveness of IR systems. © 2014 Springer International Publishing Switzerland.	en_US
dc.item-language.iso	eng	en_US
dc.publisher	Springer Verlag	en_US
dc.item-rights	info:eu-repo/semantics/closedAccess	en_US
dc.title	Tackling biased baselines in the risk-sensitive evaluation of retrieval systems	en_US
dc.item-type	conferenceObject	en_US
dc.contributor.department	MÜ	en_US
dc.contributor.departmentTemp	Dinçer, B.T., Department of Statistics and Computer Engineering, Muğla University, 48000 Muğla, Turkey; Ounis, I., School of Computing Science, University of Glasgow, Glasgow G12 8QQ, United Kingdom; MacDonald, C., School of Computing Science, University of Glasgow, Glasgow G12 8QQ, United Kingdom	en_US
dc.identifier.doi	10.1007/978-3-319-06028-6_3
dc.identifier.volume	8416 LNCS	en_US
dc.identifier.startpage	26	en_US
dc.identifier.endpage	38	en_US
dc.relation.journal	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en_US
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US

This item appears in the following collection(s).

Scopus Indexed Publications Collection [6219]