dc.contributor.author | Dinçer, B.T. | |
dc.contributor.author | Ounis, I. | |
dc.contributor.author | MacDonald, C. | |
dc.date.accessioned | 2020-11-20T16:49:06Z | |
dc.date.available | 2020-11-20T16:49:06Z | |
dc.date.issued | 2014 | |
dc.identifier.isbn | 9783319060279 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.uri | https://doi.org/10.1007/978-3-319-06028-6_3 | |
dc.identifier.uri | https://hdl.handle.net/20.500.12809/6103 | |
dc.description | City of Amsterdam;et al.;Google;Microsoft Research;Textkernel;The Netherlands Organization for Scientific Research (NWO) | en_US |
dc.description | 36th European Conference on Information Retrieval, ECIR 2014, 13 April 2014 through 16 April 2014, Amsterdam, 105000 | en_US |
dc.description.abstract | The aim of optimising information retrieval (IR) systems using a risk-sensitive evaluation methodology is to minimise the risk of performing any particular topic less effectively than a given baseline system. Baseline systems in this context determine the reference effectiveness for topics, relative to which the effectiveness of a given IR system in minimising the risk will be measured. However, the comparative risk-sensitive evaluation of a set of diverse IR systems - as attempted by the TREC 2013 Web track - is challenging, as the different systems under evaluation may be based upon a variety of different (base) retrieval models, such as learning to rank or language models. Hence, a question arises about how to properly measure the risk exhibited by each system. In this paper, we argue that no model of information retrieval alone is representative enough in this respect to be a true reference for the models available in the current state-of-the-art, and demonstrate, using the TREC 2012 Web track data, that as the baseline system changes, the resulting risk-based ranking of the systems changes significantly. Instead of using a particular system's effectiveness as the reference effectiveness for topics, we propose several remedies including the use of mean within-topic system effectiveness as a baseline, which is shown to enable unbiased measurements of the risk-sensitive effectiveness of IR systems. © 2014 Springer International Publishing Switzerland. | en_US |
dc.item-language.iso | eng | en_US |
dc.publisher | Springer Verlag | en_US |
dc.item-rights | info:eu-repo/semantics/closedAccess | en_US |
dc.title | Tackling biased baselines in the risk-sensitive evaluation of retrieval systems | en_US |
dc.item-type | conferenceObject | en_US |
dc.contributor.department | MÜ | en_US |
dc.contributor.departmentTemp | Dinçer, B.T., Department of Statistics and Computer Engineering, Muğla University, 48000 Muğla, Turkey; Ounis, I., School of Computing Science, University of Glasgow, Glasgow G12 8QQ, United Kingdom; MacDonald, C., School of Computing Science, University of Glasgow, Glasgow G12 8QQ, United Kingdom | en_US |
dc.identifier.doi | 10.1007/978-3-319-06028-6_3 | |
dc.identifier.volume | 8416 LNCS | en_US |
dc.identifier.startpage | 26 | en_US |
dc.identifier.endpage | 38 | en_US |
dc.relation.journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | en_US |
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |