Skip to main content

Window functions in SQL Server

Two guys were quarreling.


I interfered to solve the problem. Though I knew Robert was right, I decided to be very polite to James and make him understand he was wrong. I tried. But, he was shouting at me and told "Prove me then!". In order to do that, I created a table which contains Robert De Niro movies.


I showed him the table and asked him to count. He said, "What if the table has thousands of rows. Will you ask me to sit and count? You count and tell me. I am too busy".

Point! I should not have asked him to do that. What should I do now to show the result? I have to show the director names and the number of movies directed by them. Thinking about this, I was walking in a vegetable market and saw this vegetable vendor.


Thanks to her. I have implemented the idea borrowed from her.


I showed this result to James. I explained him, "See, Brian De Palma has directed Robert De Niro only 3 times but Martin Scorsese 9 times". I thought the problem was solved. He asked me, "Who are these Brian De Palma and Martin Scorsese?"


I: "Brian is the one who directed 'The Untouchables'"

James: "How can I believe you?"

I: "I just showed you"

James: "When did you? Your result has only the names of the directors and the number of movies. I WANT THEIR NAMES WITH THE MOVIE NAMES AND COUNT"

I changed the query as follows:


I: "See, Brian De Palma directed 'The Untouchables'. Will you accept now at least?"

James: "Brian might have directed 'The Untouchables'. But what I told was correct. He has directed Robert De Niro the most"

I: "What!!! you can easily count and check this"

James: "I told you I can't count"


I got it. Even he knew that he was wrong. But he was not ready to accept that. He wanted me to show something like this:


Somehow I have to prove him by showing both the movie name and total movie count against the director's name in a single row. He is playing with my inability :(

When you group anything, the data get rolled up. For example, the vegetable vendor might have had 50 single units of vegetables. But, after grouping them, the count of grouped units might become 10. So, when you do group by, the result is always lesser than or equal to the total number of original rows. Definitely, our requirement can't be achieved through GROUP BY.

Ah.. I got it. We can achieve this through WINDOW FUNCTIONS in SQL Server. Now, before jumping on to window functions, let's see why they are given the name WINDOW.

The entire table is split into different windows based on any criteria. Here, we split it by the DIRECTOR. This process is called PARTITION. Please note this is very much similar to GROUP BY. But, PARTITION BY and GROUP BY are different.

PARTITION BY is analytic (rows remains the same while aggregation is calculated) while GROUP BY is aggregate (data are rolled up).


Let's get the desired result:


James, I will come to you soon. Let me explain the syntax to my friends first.

What is window function in this query? over(partition by Director) or COUNT(Movie)It is COUNT(Movie).

Yes, I got your question. Isn't COUNT() a normal aggregate function? Yes of course, but the use of OVER() clause after that makes it a window function.


Yes, all left handers have power only on right hand. Left hand is just for supporting the bat. Vice versa for right handers. Similarly, window functions do not define the windows. They just produce and supply aggregations/ranks/values (in part 2). OVER(PARTITION BY) clause defines the windows actually. What will happen if I don't define any partitions and just use OVER()?


No windows. COUNT() acts on the whole HOUSE. But, when we add PARTITION BY, windows are created for each Director and we get the aggregated result for each window. Simple. 

Window functions are initiated by OVER() clause. Following are the three concepts behind it:

1) OVER(PARTITION BY) - defines windows
2) OVER(ORDER BY) - orders rows inside each window based on column(s)
3) ROWS/RANGE - defines window frame

Let me prove James that he is wrong. Where is he? Oh, he is again fighting with Robert. A different topic this time.


Comments

Popular posts from this blog

AWS Route53 - Private Hosted Zone

AWS - Error - An error occurred (ExpiredToken) when calling the DescribeStacks operation: The security token included in the request is expired

Error:   An error occurred (ExpiredToken) when calling the DescribeStacks operation: The security token included in the request is expired. Reason: It occurred when I ran a MAKE command with a profile having expired token (security credentials) Fix: Generate new security credentials (aws sts assume-role) and run the command again

AWS CloudTrail

AWS CloudTrail is an API monitoring service.  It records activities in your account. We can log those activities in S3 bucket It gives visibility to user activities e.g., if you want to know who created an EC2 instance, you can get the answer using CloudTrail Using CloudTrail, you can track changes to AWS resources in your accounts