3 min read

So you think you know NDCG?

Table of Contents

Overview

Ever wondered, how items are sorted when you goto Amazon.com and search for your favorite item? Whats hapening behind the scenes? How is an item’s relevance gauged based on where it is ranked after being sorted?

In this blog will cover Normalized Discounted Cumulative Gain (NDCG). Its a metric and quite famously known to evaluate information retrieval systems.

Math First

NDCG@k Formula

Let us start with basics. NDCG@k (and will tell what is @k so do not worry at this point!) is given by the formula below:

NDCG@k=DCG@kIDCG@k\text{NDCG}@k = \frac{\text{DCG}@k}{\text{IDCG}@k}

As we already are aware of by now, what NDCG stands for. In the above formula DCG stands for Discounted Cumulative Gain and IDCG stands for Ideal Dicsounted Cumulative Gain.

Whats @k?

The k parameter tells us the limit of the list. Simply said, it defines how many items in a sorted list we would like to have a look at. Items beyond the k items in the list are not considered relevant. Does that mean, those items are irrelevant? The answer is No. Basically, we assume that beyond this k point, we do not find very highly relevant items for the use-case at hand. Think of a simple google search result. If the user does not find something in the first page, then anything beyond that point becomes irrelevant, even though chances are that maybe the user might be able to find relevant item in the fourth page? But to be able to make a good and relevant IR system we are always going to be striving to show relevant results in the first page. Maybe, even better - in the first page probably first five items should contain the relevant result? So here 5 becomes the limit or the k parameter.

If finding NDCG@k\text{NDCG}@k with a limit of finding five items in the list, we will write it like: NDCG@5\text{NDCG}@5. Let us understand this in detail.