
So you think you know NDCG?


Overview

Ever wondered how items are sorted when you visit Amazon.com and search for your favorite item? What's happening behind the scenes? How is an item's relevance gauged based on where it ranks in the sorted results?

In this blog we will cover Normalized Discounted Cumulative Gain (NDCG), a metric widely used to evaluate information retrieval systems.

Math First

NDCG@k Formula

Let us start with the basics. NDCG@k (we will explain what @k means shortly, so do not worry at this point!) is given by the equation below:

\begin{equation} \text{NDCG}@k = \frac{\text{DCG}@k}{\text{IDCG}@k} \end{equation}

We already know what NDCG stands for. In the above equation, DCG stands for Discounted Cumulative Gain and IDCG stands for Ideal Discounted Cumulative Gain.

What's @k?

The k parameter tells us the limit of the list. Simply put, it defines how many items at the top of a sorted list we look at.

Items beyond position k in the list are not considered in the calculation. Does that mean those items are irrelevant? The answer is no.

We assume that beyond position k we are unlikely to find highly relevant items, and that the likelihood of relevant engagement (a click, like, view, add-to-cart, etc.) decreases as the user goes down the list.

Generally, for an Information Retrieval (IR) system to be considered good, it should show relevant results on the first page.

Now, what if we only consider the first five items on the first page to be relevant? Then 5 becomes the limit, or we can say: k = 5.

So when calculating the metric for the first five items in a list of results, we write it as NDCG@5; similarly, for the first 10 items it becomes NDCG@10.

Why is NDCG important

Before we dive deeper, it's important to understand why NDCG@k is used to evaluate search engines and recommender systems.

Traditional metrics like precision@k do not account for the position of a relevant item within the list. NDCG@k, however, does.

As explained in the section above, a relevant item that appears higher in the list generally has more impact on the user than one that appears lower, and this is exactly what NDCG exploits.

Intuition

Let us take an example mimicking a real-world search system, and then expand on DCG@k and IDCG@k, as both are needed to find NDCG@k.

But before that, it is important to get the intuition right.

Gain

Gain converts an item's "Relevance Score" (higher is better; refer to Table 1) into an insightful quantity that tells us how useful the item is, or how much "gain" the item provides to the user. The higher, the better.

\begin{equation} \text{Gain} = 2^{rel_i} - 1 \end{equation}

Here $rel_i$ is the "Relevance Score" of the item at position $i$.

Take, for example, "Product ID = 1" from Table 1: its "Relevance Score" is 3, so its Gain is $2^3 - 1 = 7$. An item ranked lower in the list, such as "Product ID = 5" with "Relevance Score" 1, has a Gain of $2^1 - 1 = 1$. This means items with a higher "Relevance Score" hold more importance.

You might also sometimes see Gain referred to simply as:

\begin{equation} rel_i \end{equation}
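As a minimal sketch, both gain variants can be expressed as a small Python function (the exponential form from equation (2), with the linear form from equation (3) as an option):

```python
def gain(rel: int, exponential: bool = True) -> float:
    """Convert a graded relevance score into a gain value."""
    if exponential:
        return 2 ** rel - 1  # rewards highly relevant items much more strongly
    return float(rel)        # linear variant: gain is the relevance score itself

# From Table 1: relevance 3 -> gain 7, relevance 1 -> gain 1
print(gain(3))  # 7
print(gain(1))  # 1
```

The exponential form widens the gap between "highly relevant" and "somewhat relevant" items, which is why it is the common choice in web search evaluation.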

Discount

Intuitively, the idea is that users lose attention as they scroll down (assuming they scroll from top to bottom), an effect sometimes referred to as position bias1.

Since items later in the list receive less attention, a discount is applied as we move down the list.

For the purposes of NDCG, the discount is modeled using a logarithmic discount factor:

\begin{equation} \frac{1}{\log_2{(i + 1)}} \end{equation}

Initially there wasn't any theoretical basis for using a logarithmic discount factor, other than it performing well empirically and offering a smooth decay as we go down the list.

However, Wang et al. (2013)2 provided theoretical guarantees for the use of logarithmic discounting in NDCG.

So if a relevant item is found at the top, it is rewarded, and items further down are discounted. The importance of items lower in the list is not nullified entirely; they simply carry less weight than items at the top due to the logarithmic discount factor.
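To see how smooth that decay is, here is a short sketch that prints the discount factor from equation (4) for the first seven positions:

```python
import math

def discount(i: int) -> float:
    """Logarithmic position discount 1 / log2(i + 1) for a 1-based position i."""
    return 1 / math.log2(i + 1)

# The discount decays smoothly rather than dropping off sharply:
for i in range(1, 8):
    print(f"position {i}: discount = {discount(i):.3f}")
```

Position 1 gets a discount of 1.0 (no penalty), position 3 exactly 0.5, and position 7 still retains a third of the weight, which matches the "less important but not nullified" behaviour described above.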

Discounted Gain

Combining Gain from (2) and Discount from (4), we get:

\begin{equation} \text{Discounted Gain} = \frac{2^{rel_i} - 1}{\log_2{(i + 1)}} \end{equation}

Discounted Cumulative Gain (DCG)

Taking a cumulative sum over the items in the ranked list from the $1^{st}$ to the $k^{th}$ position, we get:

\begin{equation} \text{DCG}@k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i+1)} \end{equation}
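Equation (6) translates directly into a few lines of Python; this is a minimal sketch, with the ranked list represented simply as its relevance scores:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted Cumulative Gain over the top-k items of a ranked list,
    where `relevances` are graded relevance scores in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 1)
        for i, rel in enumerate(relevances[:k], start=1)
    )

# Relevance scores in the order returned by the system (Table 3)
returned = [0, 1, 2, 3, 2, 0, 3]
print(round(dcg_at_k(returned, 5), 2))
```

Note that `enumerate(..., start=1)` keeps positions 1-based, matching the $i$ in the formula.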

Ideal Discounted Cumulative Gain (IDCG)

Similarly, for IDCG@k we take the cumulative sum over the items in the ideal ranked list. For an example, refer to Table 2.

\begin{equation} \text{IDCG}@k = \sum_{i=1}^{k} \frac{2^{rel_i^*} - 1}{\log_2(i+1)} \end{equation}

Here $rel_i$ and $rel_i^*$ are the relevance scores for the list returned by the search query and for the judgement list, respectively.
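Since the ideal list is just the same relevance scores sorted in descending order, NDCG@k from equation (1) can be sketched by reusing the DCG computation on both orderings:

```python
import math

def dcg_at_k(rels, k):
    """DCG@k over graded relevance scores in ranked order."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, k):
    """NDCG@k: DCG of the given order divided by DCG of the ideal order."""
    idcg = dcg_at_k(sorted(rels, reverse=True), k)  # best possible ordering
    return dcg_at_k(rels, k) / idcg if idcg > 0 else 0.0
```

A perfectly ordered list scores 1.0 by construction; the `idcg > 0` guard covers the degenerate case where no item in the list is relevant.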

Example walk-through to find NDCG@k

Suppose you have a judgement list (also sometimes called a "golden set"3) for a specific query, "bluetooth headphones":

Table 1: Relevance Judgement List for Query “bluetooth headphones”

| Query | Product ID | Product Title | Relevance Score |
| --- | --- | --- | --- |
| bluetooth headphones | 1 | Sony Bluetooth Headphones | 3 |
| bluetooth headphones | 2 | Bose Wireless Headphones | 3 |
| bluetooth headphones | 3 | JBL Bluetooth Headphones | 2 |
| bluetooth headphones | 4 | Wireless Earbuds | 2 |
| bluetooth headphones | 5 | Wired Headphones | 1 |
| bluetooth headphones | 6 | Portable Bluetooth Speaker | 0 |
| bluetooth headphones | 7 | Gaming Headset | 0 |

In Table 1, the "Relevance Score" column holds a graded relevance score: on this scale, "3" denotes highly relevant and "0" denotes least relevant.

If someone searches for “bluetooth headphones”, ideally the ranked list returned should look like this:

Table 2: Ideal ranked list for Query “bluetooth headphones”

| Rank | Product Title | Relevance Score |
| --- | --- | --- |
| 1 | Sony Bluetooth Headphones | 3 |
| 2 | Bose Wireless Headphones | 3 |
| 3 | JBL Bluetooth Headphones | 2 |
| 4 | Wireless Earbuds | 2 |
| 5 | Wired Headphones | 1 |
| 6 | Portable Bluetooth Speaker | 0 |
| 7 | Gaming Headset | 0 |

However, it is possible the list returned is not in an ideal order and might look something like this:

Table 3: Returned ranked list for Query “bluetooth headphones”

| Rank | Product Title | Relevance Score |
| --- | --- | --- |
| 1 | Portable Bluetooth Speaker | 0 |
| 2 | Wired Headphones | 1 |
| 3 | JBL Bluetooth Headphones | 2 |
| 4 | Sony Bluetooth Headphones | 3 |
| 5 | Wireless Earbuds | 2 |
| 6 | Gaming Headset | 0 |
| 7 | Bose Wireless Headphones | 3 |

In Table 3 it can be seen that "highly relevant" items can appear near the end of the list (lower in the ranking); such scenarios are exactly where the NDCG@k metric is a good fit.

Calculating DCG@k at k = 5

To find NDCG@k, we will first compute DCG@k, and to do that we will use Table 3.

Plugging the numbers into equation (6), we get:

Table 4: Calculating DCG@5

| Rank | Relevance Score | Calculation | Value |
| --- | --- | --- | --- |
| 1 | 0 | $(2^0-1)/\log_2(2)$ | 0 |
| 2 | 1 | $(2^1-1)/\log_2(3)$ | 0.63 |
| 3 | 2 | $(2^2-1)/\log_2(4)$ | 1.50 |
| 4 | 3 | $(2^3-1)/\log_2(5)$ | 3.02 |
| 5 | 2 | $(2^2-1)/\log_2(6)$ | 1.16 |
| 6 | 0 | $(2^0-1)/\log_2(7)$ | 0 |
| 7 | 3 | $(2^3-1)/\log_2(8)$ | 2.33 |

From the table above we take the top 5 items in rank order, since $k = 5$. Adding their values, we get:

\begin{equation} \text{DCG}@5 = 0 + 0.63 + 1.50 + 3.02 + 1.16 = 6.31 \end{equation}

Similarly, we calculate IDCG@5.

Calculating IDCG@k at k = 5

We now need to compute IDCG@k, and to do that we will use Table 2.

Plugging the numbers into equation (7), we get:

Table 5: Calculating IDCG@5

| Rank | Relevance Score | Calculation | Value |
| --- | --- | --- | --- |
| 1 | 3 | $(2^3-1)/\log_2(2)$ | 7 |
| 2 | 3 | $(2^3-1)/\log_2(3)$ | 4.42 |
| 3 | 2 | $(2^2-1)/\log_2(4)$ | 1.50 |
| 4 | 2 | $(2^2-1)/\log_2(5)$ | 1.29 |
| 5 | 1 | $(2^1-1)/\log_2(6)$ | 0.39 |
| 6 | 0 | $(2^0-1)/\log_2(7)$ | 0 |
| 7 | 0 | $(2^0-1)/\log_2(8)$ | 0 |

We take the top 5 rows from the table above and calculate:

\begin{equation} \text{IDCG}@5 = 7 + 4.42 + 1.50 + 1.29 + 0.39 = 14.6 \end{equation}

Calculating NDCG@k at k = 5

The final step is to calculate NDCG@5. Plugging the results from (8) and (9) into equation (1), we get:

\begin{equation} \text{NDCG}@5 = \frac{6.31}{14.6} \approx 0.43 \end{equation}

NDCG lies between 0.0 and 1.0. In the above case (10) it is quite low, as highly relevant items such as the Sony and Bose headphones sit too low in the ranked list while the items at the top are irrelevant. Such a search/ranking system is likely to perform poorly, at least on the first page.
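The whole walk-through can be reproduced in a few lines; this sketch feeds the Table 3 ordering through equations (6), (7), and (1):

```python
import math

def dcg_at_k(rels, k):
    """DCG@k over graded relevance scores in ranked order (equation 6)."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels[:k], start=1))

returned = [0, 1, 2, 3, 2, 0, 3]        # relevance scores in Table 3 order
ideal = sorted(returned, reverse=True)  # Table 2 order: [3, 3, 2, 2, 1, 0, 0]

ndcg5 = dcg_at_k(returned, 5) / dcg_at_k(ideal, 5)
print(f"NDCG@5 = {ndcg5:.2f}")  # matches the hand calculation above
```

The small rounding in the hand-worked tables aside, the result agrees with equation (10) to two decimal places.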

Citation

Feel free to cite this article if you found it useful.

APA

Mehra, V. (2025, Mar). So you think you know NDCG? vipulmehra.com. https://vipulmehra.com/blog/metrics/ndcg

BibTeX

@article{NDCGMehra2025,
  title = {So you think you know NDCG?},
  author = {Mehra, Vipul},
  journal = {vipulmehra.com},
  year = {2025},
  month = {Mar},
  url = {https://vipulmehra.com/blog/metrics/ndcg}
}

Footnotes

  1. Craswell, N., Zoeter, O., Taylor, M., & Ramsey, B. (2008, February). An experimental comparison of click position-bias models. In Proceedings of the 2008 international conference on web search and data mining (pp. 87-94).

  2. Wang, Y., Wang, L., Li, Y., He, D., & Liu, T. Y. (2013, June). A theoretical analysis of NDCG type ranking measures. In Conference on learning theory (pp. 25-54). PMLR.

  3. Core Concepts — Elasticsearch Learning to Rank documentation. (2023). Readthedocs.io. https://elasticsearch-learning-to-rank.readthedocs.io/en/latest/core-concepts.html#judgments-expression-of-the-ideal-ordering