Hypothesis Testing for the Risk-Sensitive Evaluation of Retrieval Systems

Dincer, B. Taner; Macdonald, Craig; Ounis, Iadh

dc.contributor.author	Dincer, B. Taner
dc.contributor.author	Macdonald, Craig
dc.contributor.author	Ounis, Iadh
dc.date.accessioned	2020-11-20T16:19:00Z
dc.date.available	2020-11-20T16:19:00Z
dc.date.issued	2014
dc.identifier.isbn	978-1-4503-2259-1
dc.identifier.uri	https://doi.org/10.1145/2600428.2609625
dc.identifier.uri	https://hdl.handle.net/20.500.12809/3678
dc.description	37th Annual International ACM Special Interest Group on Information Retrieval Conference on Research and Development in Information Retrieval - JUL 06-11, 2014 - Gold Coast, AUSTRALIA	en_US
dc.description	Dincer, Bekir Taner/0000-0002-0660-7239; Ounis, Iadh/0000-0003-4701-3223	en_US
dc.description	WOS: 000450717600003	en_US
dc.description.abstract	The aim of risk-sensitive evaluation is to measure when a given information retrieval (IR) system does not perform worse than a corresponding baseline system for any topic. This paper argues that risk-sensitive evaluation is akin to the underlying methodology of the Student's t test for matched pairs. Hence, we introduce a risk-reward tradeoff measure, T-Risk, that generalises the existing U-Risk measure (as used in the TREC 2013 Web track's risk-sensitive task) while being theoretically grounded in statistical hypothesis testing and easily interpretable. In particular, we show that T-Risk is a linear transformation of the t statistic, which is the test statistic used in the Student's t test. This inherent relationship between T-Risk and the t statistic turns risk-sensitive evaluation from a descriptive analysis into a fully-fledged inferential analysis. Specifically, we demonstrate, using past TREC data, that by using the inferential analysis techniques introduced in this paper, we can (1) decide whether an observed level of risk for an IR system is statistically significant, and thereby infer whether the system exhibits a real risk, and (2) determine the topics that individually lead to a significant level of risk. Indeed, we show that the latter permits a state-of-the-art learning-to-rank algorithm (LambdaMART) to focus on those topics in order to learn effective yet risk-averse ranking systems.	en_US
dc.description.sponsorship	ACM Special Interest Grp Informat Retrieval, Baidu, Google, Microsoft Res, Tourism & Events Queensland, eBay, Huawei, Seznam cz, Facebook, IBM, Pivotal, Yahoo, Labs, Yandex, Queensland Univ Technol, RMIT Univ, Univ Melbourne, Univ Otago	en_US
dc.item-language.iso	eng	en_US
dc.publisher	Assoc Computing Machinery	en_US
dc.item-rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Risk-Sensitive Evaluation	en_US
dc.subject	Student's T Test	en_US
dc.title	Hypothesis Testing for the Risk-Sensitive Evaluation of Retrieval Systems	en_US
dc.item-type	conferenceObject	en_US
dc.contributor.department	MÜ	en_US
dc.contributor.departmentTemp	[Dincer, B. Taner] Mugla Univ, Dept Stat & Comp Engn, Mugla, Turkey -- [Macdonald, Craig; Ounis, Iadh] Univ Glasgow, Sch Comp Sci, Glasgow, Lanark, Scotland	en_US
dc.identifier.doi	10.1145/2600428.2609625
dc.identifier.startpage	23	en_US
dc.identifier.endpage	32	en_US
dc.relation.journal	SIGIR'14: Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval	en_US
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US
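
The abstract describes T-Risk as a linear transformation of the paired t statistic that generalises U-Risk, but this record does not state the formulas. The sketch below assumes the commonly cited definitions: U-Risk as the mean per-topic difference from the baseline with losses weighted by (1 + alpha), and T-Risk as U-Risk divided by its standard error. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def weighted_deltas(run, baseline, alpha=0.0):
    """Per-topic score differences; losses are scaled by (1 + alpha).

    Assumed definition, not quoted from the record: alpha >= 0 is the
    risk-sensitivity parameter, and alpha = 0 applies no extra penalty.
    """
    deltas = [r - b for r, b in zip(run, baseline)]
    return [d if d >= 0 else (1.0 + alpha) * d for d in deltas]


def u_risk(run, baseline, alpha=0.0):
    """Assumed U-Risk: mean of the loss-weighted per-topic differences."""
    return mean(weighted_deltas(run, baseline, alpha))


def t_risk(run, baseline, alpha=0.0):
    """Assumed T-Risk: U-Risk divided by its standard error.

    With alpha = 0 this reduces to the matched-pairs t statistic.
    """
    w = weighted_deltas(run, baseline, alpha)
    standard_error = stdev(w) / math.sqrt(len(w))
    return mean(w) / standard_error


# Made-up per-topic effectiveness scores for four topics (illustration only).
run = [0.5, 0.25, 0.75, 0.5]
baseline = [0.25, 0.5, 0.5, 0.25]

print(u_risk(run, baseline, alpha=1.0))
print(t_risk(run, baseline, alpha=0.0))
print(t_risk(run, baseline, alpha=1.0))
```

Under these assumptions, a T-Risk value can be compared against critical values of the t distribution with n - 1 degrees of freedom, which is what would make the analysis inferential rather than descriptive.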

Files in this item:

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collection(s).

Scopus Indexed Publications Collection [6219]
WoS Indexed Publications Collection [6466]