Siamese BERT-Based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset

Matěj Kocián, Jakub Náplava, Daniel Štancl, Vladimír Kadlec

[IAAI-22] Deployed Highly Innovative Applications of AI
Abstract: Web search engines focus on serving highly relevant results within hundreds of milliseconds. Pre-trained transformer language models such as BERT are therefore hard to use in this scenario due to their high computational demands. We present our real-time approach to the document ranking problem leveraging a BERT-based siamese architecture. The model is already deployed in a commercial search engine and it improves production performance by more than 3%. For further research and evaluation, we release DaReCzech, a unique dataset of 1.6 million Czech user query-document pairs with manually assigned relevance levels. We also release Small-E-Czech, an Electra-small language model pre-trained on a large Czech corpus. We believe these resources will support the endeavours of both the search-relevance and the multilingual-focused research communities.
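The key idea behind the siamese (bi-encoder) architecture mentioned in the abstract is that queries and documents are encoded by the same network into separate vectors, so document embeddings can be precomputed offline and only the short query must be encoded at request time. The sketch below illustrates that serving pattern with a toy bag-of-words encoder standing in for the BERT-based tower; the encoder, the scoring function, and all names here are illustrative assumptions, not the paper's actual implementation.

```python
import math


def embed(text: str) -> dict[str, float]:
    """Toy stand-in for a siamese encoder tower: an L2-normalised
    bag-of-words vector. In the deployed system this would be a
    fine-tuned BERT-style transformer (an assumption for illustration)."""
    counts: dict[str, float] = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {token: v / norm for token, v in counts.items()}


def score(query_vec: dict[str, float], doc_vec: dict[str, float]) -> float:
    """Relevance score as the dot product of the two tower outputs."""
    return sum(w * doc_vec.get(token, 0.0) for token, w in query_vec.items())


# Offline stage: embed every document once and cache the vectors.
# This is what makes sub-second serving feasible -- the expensive
# encoder never sees the documents at query time.
docs = {
    "d1": "czech search engine relevance ranking",
    "d2": "recipe for apple pie",
}
doc_index = {doc_id: embed(text) for doc_id, text in docs.items()}

# Online stage: encode only the query, then rank cached documents.
q_vec = embed("web search relevance")
ranked = sorted(doc_index, key=lambda d: score(q_vec, doc_index[d]), reverse=True)
print(ranked)  # ['d1', 'd2'] -- the on-topic document outranks the recipe
```

A cross-encoder (feeding the query and document jointly through BERT) would typically score more accurately, but it cannot precompute anything per document, which is exactly the latency trade-off the siamese design addresses.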

Sessions where this paper appears

  • Poster Session 1

    Thu, February 24 4:45 PM - 6:30 PM (+00:00)
    Red 6

  • Poster Session 8

    Sun, February 27 12:45 AM - 2:30 AM (+00:00)
    Red 6