zeitbach.com

SQL window functions LAG and LEAD

LAG and LEAD allow you to access data from previous or subsequent rows in the same result set without any joins.

Let there be some time series data: a table of city names, dates, and daily maximum temperatures. That’s what we see in the first three columns of the following table (not necessarily ordered).

citydttempprev_tempnext_tempdiff
Leipzig2024-01-010NULL5NULL
Leipzig2024-01-0250-35
Leipzig2024-01-03-35-1-8
Leipzig2024-01-04-1-3NULL2
Paris2024-01-018NULL3NULL
Paris2024-01-023812-5
Paris2024-01-03123NULL9

The remaining columns are derived from the first three with the help of LAG and LEAD (What’s new in SQL:2011, 8.2.1).

SELECT *,
    LAG(temp) OVER city_dt AS prev_temp,
    LEAD(temp) OVER city_dt AS next_temp,
    temp - LAG(temp) OVER city_dt AS diff
FROM temperature
WINDOW city_dt AS
    (PARTITION BY city ORDER BY dt);

What happens here? We partition the table by city and order each partition by date. Now we pick the previous and the next temperature for each row within each partition. If there is no previous or next data point within the same partition, we get NULL. One use case for this is computing differences between adjacent rows.

Why is it fast? The data can be partitioned in linear time using hashes of the PARTITION BY values. After this first step, we can work on each partition in parallel, starting with ORDER BY. If the number of partitions is much smaller than the number of threads, we might want to add some intra-partition parallelism, see Efficient Processing of Window Functions in Analytical SQL Queries (Leis et al., 2015).