Transferring learning to rank models for web search
Özet
Learning to rank techniques provide mechanisms for combining document feature values into learned models that produce effective rankings. However, issues concerning the transferability of learned models between different corpora or subsets of the same corpus are not yet well understood. For instance, is the importance of different feature sets consistent between subsets of a corpus, or whether a learned model obtained on a small subset of the corpus effectively transfer to the larger corpus? By formulating our experiments around two null hypotheses, in this work, we apply a full-factorial experiment design to empirically investigate these questions using the ClueWeb09 and ClueWeb12 corpora, combined with queries from the TREC Web track. Among other observations, our experiments reveal that ClueWeb09 remains an effective choice of training corpus for learning effective models for ClueWeb12, and also that the importance of query independent features varies among the ClueWeb09 and ClueWeb12 corpora. In doing so, this work contributes an important study into the transferability of learning to rank models, as well as empirically-derived best practices for effective retrieval on the ClueWeb12 corpus. © 2015 ACM.