SQL window functions LAG and LEAD

2024-01-15

LAG and LEAD allow you to access data from previous or subsequent rows in the same result set without any joins.

Let there be some time series data: a table of city names, dates, and daily maximum temperatures. That’s what we see in the first three columns of the following table (not necessarily ordered).

city	dt	temp	prev_temp	next_temp	diff
Leipzig	2024-01-01	0	NULL	5	NULL
Leipzig	2024-01-02	5	0	-3	5
Leipzig	2024-01-03	-3	5	-1	-8
Leipzig	2024-01-04	-1	-3	NULL	2
Paris	2024-01-01	8	NULL	3	NULL
Paris	2024-01-02	3	8	12	-5
Paris	2024-01-03	12	3	NULL	9

The remaining columns are derived from the first three with the help of LAG and LEAD (What’s new in SQL:2011, 8.2.1).

SELECT *,
    LAG(temp) OVER city_dt AS prev_temp,
    LEAD(temp) OVER city_dt AS next_temp,
    temp - LAG(temp) OVER city_dt AS diff
FROM temperature
WINDOW city_dt AS
    (PARTITION BY city ORDER BY dt);

What happens here? We partition the table by city and order each partition by date. Now we pick the previous and the next temperature for each row within each partition. If there is no previous or next data point within the same partition, we get NULL. One use case for this is computing differences between adjacent rows.

Why is it fast? The data can be partitioned in linear time using hashes of the PARTITION BY values. After this first step, we can work on each partition in parallel, starting with ORDER BY. If the number of partitions is much smaller than the number of threads, we might want to add some intra-partition parallelism, see Efficient Processing of Window Functions in Analytical SQL Queries (Leis et al., 2015).