Overview
Ever wondered how items are sorted when you visit Amazon.com and search for your favorite item? What's happening behind the scenes? How is an item's relevance gauged based on where it ranks in the sorted results?
In this blog we will cover Normalized Discounted Cumulative Gain (NDCG), a metric famously used to evaluate information retrieval systems.
Math First
NDCG@k Formula
Let us start with the basics. NDCG@k (we will explain what @k means shortly, so do not worry about it at this point!) is given by the equation below:

$$\text{NDCG@}k = \frac{\text{DCG@}k}{\text{IDCG@}k} \tag{1}$$

By now we already know what NDCG stands for. In the above equation, DCG stands for Discounted Cumulative Gain and IDCG stands for Ideal Discounted Cumulative Gain.
What's @k?
The k parameter tells us the limit of the list. Simply said, it defines how many items in a sorted list we would like to have a look at.
Items beyond the first k positions in the list are simply not considered by the metric. Does that mean those items are irrelevant? The answer is no.
We assume that beyond this k point, we do not find very highly relevant items and therefore the likelihood of relevant engagement (click, like, view, add-to-cart etc.) by the user decreases as we go down the list.
Generally, for an Information Retrieval (IR) system to be considered good, it should show relevant results on the first page.
Now what if on the first page we only consider the first five items? Then 5 becomes our limit, or we can say: k = 5.
So for the first five items in a list of results we write the metric as NDCG@5; similarly, for the first 10 items it becomes NDCG@10.
Why is NDCG important
Before we dive deeper, it is important to understand why NDCG is used to evaluate search engines and recommender systems.
Traditional set-based metrics such as precision and recall do not account for the position of a relevant item in the list. NDCG, however, does.
As explained in the section above, a relevant item found higher in the list generally has more impact on the user than items found lower down, and this is exactly what NDCG exploits.
Intuition
Let us take an example mimicking a real-world search-system scenario, and then expand on DCG@k and IDCG@k, as both are needed to find NDCG@k.
But before that, it is important to get the intuition right.
Gain
Gain essentially means converting the “Relevance Score” (the higher, the better; refer to Table 1) into an insightful quantity that tells how useful an item is, or, one can say, how much “gain” the item provides to the user. Here we use the exponential form:

$$\text{Gain}_i = 2^{rel_i} - 1 \tag{2}$$

Here $rel_i$ is the “Relevance Score” of the item at position $i$.

Take, for example, “Product ID = 1” from Table 1: its “Relevance Score” is 3, so its Gain would be $2^3 - 1 = 7$. However, an item ranked lower in the list, such as “Product ID = 5” with “Relevance Score” equal to 1, will have a gain of $2^1 - 1 = 1$. This means items with a higher “Relevance Score” hold more importance.

You might also sometimes see Gain referred to in its simpler, linear form:

$$\text{Gain}_i = rel_i \tag{3}$$
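The two gain formulations can be sketched in a few lines of Python (a minimal illustration; the function names are my own):

```python
def exp_gain(rel: float) -> float:
    """Exponential gain: 2^rel - 1, strongly rewards highly relevant items."""
    return 2 ** rel - 1

def linear_gain(rel: float) -> float:
    """Linear gain: the relevance score used as-is."""
    return rel

# As in the Table 1 example: a relevance score of 3 yields a gain of 7,
# while a relevance score of 1 yields a gain of only 1.
print(exp_gain(3), exp_gain(1))  # → 7 1
```

The exponential form widens the gap between "highly relevant" and "somewhat relevant" items, which is why it is common in web-search evaluation.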
Discount
Intuitively, the idea is that as users scroll down a list they lose attention (assuming users scroll from top to bottom); this is sometimes referred to as position bias1.
Since items lower in the list receive less attention, a discount is applied as we move down the list.
For NDCG, the discount is modeled using a logarithmic discount factor:

$$\text{Discount}_i = \frac{1}{\log_2(i + 1)} \tag{4}$$
Initially there was no theoretical basis for using a logarithmic discount factor, other than it performing well empirically and offering a smoother decay as we go down the list.
However, Wang et al. (2013)2 later provided theoretical guarantees for the use of logarithmic discounts in NDCG.
So if a relevant item is found at the top, it will be preferred and items at the bottom will be discounted. The importance of items down the list will not be nullified entirely, but will hold less importance than items at the top due to logarithmic discount factor.
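To see this smooth decay in action, here is a small sketch of the discount factor (assuming 1-indexed ranks):

```python
import math

def discount(rank: int) -> float:
    """Logarithmic position discount: 1 / log2(rank + 1), with rank 1-indexed."""
    return 1 / math.log2(rank + 1)

# The discount decays smoothly: items lower in the list still count, just less.
for rank in range(1, 6):
    print(f"rank {rank}: discount = {discount(rank):.3f}")
```

At rank 1 the discount is exactly 1 (no penalty), at rank 3 it is 0.5, and it keeps shrinking without ever reaching zero.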
Discounted Gain
Combining Gain from (2) and Discount from (4), we get the discounted gain of the item at position $i$:

$$\text{DiscountedGain}_i = \frac{2^{rel_i} - 1}{\log_2(i + 1)} \tag{5}$$
Discounted Cumulative Gain (DCG)
Taking the cumulative sum over the items in the ranked list from $i = 1$ to $k$, we get:

$$\text{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)} \tag{6}$$
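Equation (6) translates almost directly into code; a minimal sketch (assuming the exponential gain from (2)):

```python
import math

def dcg_at_k(relevances, k):
    """DCG@k over a ranked list of graded relevance scores (rank 1 first)."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 1)  # discounted gain at rank i
        for i, rel in enumerate(relevances[:k], start=1)
    )

# Relevance scores in the order the system returned them (Table 3, later below):
print(round(dcg_at_k([0, 1, 2, 3, 2, 0, 3], k=5), 2))  # → 6.31
```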
Ideal Discounted Cumulative Gain (IDCG)
Similarly, for IDCG@k we take the cumulative sum over the items in the ideal ranked list (for an example, refer to Table 2):

$$\text{IDCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i^{ideal}} - 1}{\log_2(i + 1)} \tag{7}$$

Here $rel_i$ and $rel_i^{ideal}$ are the relevance scores of the list returned for the search query and of the judgement (ideal) list respectively.
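In code, IDCG@k is just DCG@k computed over the same relevance scores sorted best-first; a sketch under the same assumptions as before:

```python
import math

def dcg_at_k(relevances, k):
    """DCG@k with exponential gain and logarithmic discount."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

def idcg_at_k(relevances, k):
    """Ideal DCG@k: the DCG of the relevances re-sorted in descending order."""
    return dcg_at_k(sorted(relevances, reverse=True), k)

# Judgement-list scores for "bluetooth headphones" (Table 1 below):
print(round(idcg_at_k([3, 3, 2, 2, 1, 0, 0], k=5), 2))  # → 14.6
```

Note that sorting makes IDCG independent of the order the system actually returned: any permutation of the same scores yields the same IDCG.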
Example walk-through to find NDCG@k
Suppose you have a judgement list (also sometimes called “golden sets”3) against a specific query “bluetooth headphones”:
Table 1: Relevance Judgement List for Query “bluetooth headphones”
| Query | Product ID | Product Title | Relevance Score |
|---|---|---|---|
| bluetooth headphones | 1 | Sony Bluetooth Headphones | 3 |
| bluetooth headphones | 2 | Bose Wireless Headphones | 3 |
| bluetooth headphones | 3 | JBL Bluetooth Headphones | 2 |
| bluetooth headphones | 4 | Wireless Earbuds | 2 |
| bluetooth headphones | 5 | Wired Headphones | 1 |
| bluetooth headphones | 6 | Portable Bluetooth Speaker | 0 |
| bluetooth headphones | 7 | Gaming Headset | 0 |
To clarify: in Table 1, the column “Relevance Score” is a graded relevance score, where “3” denotes highly relevant and “0” denotes not relevant.
If someone searches for “bluetooth headphones”, ideally the ranked list returned should look like this:
Table 2: Ideal ranked list for Query “bluetooth headphones”
| Rank | Product Title | Relevance Score |
|---|---|---|
| 1 | Sony Bluetooth Headphones | 3 |
| 2 | Bose Wireless Headphones | 3 |
| 3 | JBL Bluetooth Headphones | 2 |
| 4 | Wireless Earbuds | 2 |
| 5 | Wired Headphones | 1 |
| 6 | Portable Bluetooth Speaker | 0 |
| 7 | Gaming Headset | 0 |
However, it is possible the list returned is not in an ideal order and might look something like this:
Table 3: Returned ranked list for Query “bluetooth headphones”
| Rank | Product Title | Relevance Score |
|---|---|---|
| 1 | Portable Bluetooth Speaker | 0 |
| 2 | Wired Headphones | 1 |
| 3 | JBL Bluetooth Headphones | 2 |
| 4 | Sony Bluetooth Headphones | 3 |
| 5 | Wireless Earbuds | 2 |
| 6 | Gaming Headset | 0 |
| 7 | Bose Wireless Headphones | 3 |
In Table 3 it can be seen that “highly relevant” items can appear at the end of the list (lower in the ranking); such scenarios are where the NDCG metric is a good fit.
Calculating DCG@k at k = 5
To find NDCG@5 we will first compute DCG@5, and to do that we will utilize Table 3.
Plugging the numbers into equation (6) we get:
| Rank | Relevance Score | Calculation | Value |
|---|---|---|---|
| 1 | 0 | $(2^0 - 1)/\log_2(1 + 1)$ | 0 |
| 2 | 1 | $(2^1 - 1)/\log_2(2 + 1)$ | 0.63 |
| 3 | 2 | $(2^2 - 1)/\log_2(3 + 1)$ | 1.50 |
| 4 | 3 | $(2^3 - 1)/\log_2(4 + 1)$ | 3.02 |
| 5 | 2 | $(2^2 - 1)/\log_2(5 + 1)$ | 1.16 |
| 6 | 0 | $(2^0 - 1)/\log_2(6 + 1)$ | 0 |
| 7 | 3 | $(2^3 - 1)/\log_2(7 + 1)$ | 2.33 |
From the above table we take the top 5 items in the order of their rank and add up their values to get:

$$\text{DCG@}5 = 0 + 0.63 + 1.50 + 3.02 + 1.16 = 6.31 \tag{8}$$

We calculate IDCG@5 similarly.
Calculating IDCG@k at k = 5
We now need to compute IDCG@5, and to do that we will utilize Table 2.
Plugging the numbers into equation (7) we get:
| Rank | Relevance Score | Calculation | Value |
|---|---|---|---|
| 1 | 3 | $(2^3 - 1)/\log_2(1 + 1)$ | 7 |
| 2 | 3 | $(2^3 - 1)/\log_2(2 + 1)$ | 4.42 |
| 3 | 2 | $(2^2 - 1)/\log_2(3 + 1)$ | 1.50 |
| 4 | 2 | $(2^2 - 1)/\log_2(4 + 1)$ | 1.29 |
| 5 | 1 | $(2^1 - 1)/\log_2(5 + 1)$ | 0.39 |
| 6 | 0 | $(2^0 - 1)/\log_2(6 + 1)$ | 0 |
| 7 | 0 | $(2^0 - 1)/\log_2(7 + 1)$ | 0 |
We take the top 5 values from the above list and calculate:

$$\text{IDCG@}5 = 7 + 4.42 + 1.50 + 1.29 + 0.39 = 14.60 \tag{9}$$
Calculating NDCG@k at k = 5
The final step is to calculate NDCG@5. Plugging the answers from (8) and (9) into equation (1) we get:

$$\text{NDCG@}5 = \frac{\text{DCG@}5}{\text{IDCG@}5} = \frac{6.31}{14.60} \approx 0.43 \tag{10}$$
NDCG lies between $0$ and $1$. In the above case (10) it is quite low, because highly relevant items such as the Sony and Bose headphones sit too low in the ranked list while irrelevant items appear at the top. Such a search/ranking system is likely to perform poorly, at least on the first page.
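The whole walk-through can be reproduced in a few lines of Python (a sketch using the relevance scores from Tables 2 and 3 and the exponential gain from (2)):

```python
import math

def dcg_at_k(relevances, k):
    """DCG@k with exponential gain (2^rel - 1) and log2 position discount."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

returned = [0, 1, 2, 3, 2, 0, 3]  # Table 3: order returned by the system
ideal = [3, 3, 2, 2, 1, 0, 0]     # Table 2: ideal (judgement-list) order

ndcg_5 = dcg_at_k(returned, 5) / dcg_at_k(ideal, 5)
print(round(ndcg_5, 2))  # → 0.43
```

The result matches equation (10): a score of about 0.43 out of a possible 1.0, confirming the poor ranking.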
Citation
Feel free to cite this article if you found it useful.
APA
Mehra, V. (2025, Mar). So you think you know NDCG? vipulmehra.com. https://vipulmehra.com/blog/metrics/ndcg
BibTeX
@article{NDCGMehra2025,
title = {So you think you know NDCG?},
author = {Mehra, Vipul},
journal = {vipulmehra.com},
year = {2025},
month = {Mar},
url = {https://vipulmehra.com/blog/metrics/ndcg}
}
Footnotes
1. Craswell, N., Zoeter, O., Taylor, M., & Ramsey, B. (2008, February). An experimental comparison of click position-bias models. In Proceedings of the 2008 International Conference on Web Search and Data Mining (pp. 87-94). ↩
2. Wang, Y., Wang, L., Li, Y., He, D., & Liu, T. Y. (2013, June). A theoretical analysis of NDCG type ranking measures. In Conference on Learning Theory (pp. 25-54). PMLR. ↩
3. Core Concepts — Elasticsearch Learning to Rank documentation. (2023). Readthedocs.io. https://elasticsearch-learning-to-rank.readthedocs.io/en/latest/core-concepts.html#judgments-expression-of-the-ideal-ordering ↩