Postgresql inefficient query #3: Restore Points

hoerup · Sep 04, 2025 1:10 pm

Case #07802680 - second query

I have found another query in postgres log that might be worthwile to investigate further, this one is flagged due to temp file
The downside of this particular query is that psql log_temp_files will not log parameters so I don't know if it's a single value or how many you typically send ?

Code: Select all

temporary file: path "base/pgsql_tmp/pgsql_tmp11784.0", size 30031872
SELECT r.restore_point_type AS "Types", max(r.restore_point_date) AS "LatestDate", r.job_id AS "JobId"
	FROM jobs AS j
	INNER JOIN restore_point_configs AS r ON j.id = r.job_id AND j.repository_id = r.repository_id
	WHERE j.id = ANY ($1)
	GROUP BY r.restore_point_type, r.job_id
	ORDER BY max(r.restore_point_date) DESC

If I try with just a single randomly selected JobID

Code: Select all

EXPLAIN ANALYZE 
        SELECT r.restore_point_type AS "Types", max(r.restore_point_date) AS "LatestDate", r.job_id AS "JobId"
	FROM jobs AS j
	INNER JOIN restore_point_configs AS r ON j.id = r.job_id AND j.repository_id = r.repository_id
	WHERE j.id = ANY (ARRAY['058ba577-a13a-4824-9a8e-d45ba9a037ee']::uuid[])
	GROUP BY r.restore_point_type, r.job_id
	ORDER BY max(r.restore_point_date) DESC;



--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2996.23..2996.68 rows=181 width=28) (actual time=35.746..35.751 rows=5 loops=1)
   Sort Key: (max(r.restore_point_date)) DESC
   Sort Method: quicksort  Memory: 25kB
   ->  GroupAggregate  (cost=2985.82..2989.44 rows=181 width=28) (actual time=34.419..35.741 rows=5 loops=1)
         Group Key: r.restore_point_type, r.job_id
         ->  Sort  (cost=2985.82..2986.27 rows=181 width=28) (actual time=34.171..34.584 rows=12196 loops=1)
               Sort Key: r.restore_point_type, r.job_id
               Sort Method: quicksort  Memory: 1242kB
               ->  Nested Loop  (cost=2338.28..2979.03 rows=181 width=28) (actual time=20.053..31.304 rows=12196 loops=1)
                     ->  Index Scan using pk_jobs on jobs j  (cost=0.28..2.50 rows=1 width=32) (actual time=0.013..0.019 rows=1 loops=1)
                           Index Cond: (id = ANY ('{058ba577-a13a-4824-9a8e-d45ba9a037ee}'::uuid[]))
                     ->  Bitmap Heap Scan on restore_point_configs r  (cost=2338.00..2970.82 rows=572 width=44) (actual time=20.029..30.467 rows=12196 loops=1)
                           Recheck Cond: ((job_id = j.id) AND (repository_id = j.repository_id))
                           Heap Blocks: exact=518
                           ->  BitmapAnd  (cost=2338.00..2338.00 rows=572 width=0) (actual time=19.945..19.947 rows=0 loops=1)
                                 ->  Bitmap Index Scan on ix_restore_point_configs_job_id  (cost=0.00..103.88 rows=10860 width=0) (actual time=8.868..8.868 rows=12199 loops=1)
                                       Index Cond: (job_id = j.id)
                                 ->  Bitmap Index Scan on ix_restore_point_configs_repository_id  (cost=0.00..2233.58 rows=237767 width=0) (actual time=10.791..10.791 rows=222241 loops=1)
                                       Index Cond: (repository_id = j.repository_id)
 Planning Time: 0.765 ms
 Execution Time: 36.155 ms

I'm surprised that repository id is part of the JOIN condition, isn't the job_id sufficient, please correct me if i'm mistaken ?
If that's the case you can avoid the index scan of ix_restore_point_configs_repository_id and the BitmapAnd step

But looking even further, it looks like you don't actually need any data from Jobs table lets try to avoid the join entirely

Code: Select all

EXPLAIN ANALYZE 
        SELECT r.restore_point_type AS "Types", max(r.restore_point_date) AS "LatestDate", r.job_id AS "JobId"
	FROM restore_point_configs AS r
	WHERE r.job_id = ANY (ARRAY['058ba577-a13a-4824-9a8e-d45ba9a037ee']::uuid[])
	GROUP BY r.restore_point_type, r.job_id
	ORDER BY max(r.restore_point_date) DESC;
                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=14980.49..14990.55 rows=4022 width=28) (actual time=4.439..4.440 rows=5 loops=1)
   Sort Key: (max(restore_point_date)) DESC
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=14699.48..14739.70 rows=4022 width=28) (actual time=4.415..4.433 rows=5 loops=1)
         Group Key: restore_point_type, job_id
         Batches: 1  Memory Usage: 217kB
         ->  Bitmap Heap Scan on restore_point_configs r  (cost=138.73..14593.32 rows=14155 width=28) (actual time=0.358..2.841 rows=12196 loops=1)
               Recheck Cond: (job_id = ANY ('{058ba577-a13a-4824-9a8e-d45ba9a037ee}'::uuid[]))
               Heap Blocks: exact=518
               ->  Bitmap Index Scan on ix_restore_point_configs_job_id  (cost=0.00..135.19 rows=14155 width=0) (actual time=0.309..0.309 rows=12199 loops=1)
                     Index Cond: (job_id = ANY ('{058ba577-a13a-4824-9a8e-d45ba9a037ee}'::uuid[]))
 Planning Time: 0.214 ms
 Execution Time: 4.510 ms
(13 rows)

Now the query is down from 36 seconds to 4 seconds and the data read from disk is significantly lower, which removes the need for a tmp file

hoerup · Post by **hoerup** » Sep 05, 2025 10:47 am this post

> Now the query is down from 36 seconds to 4 seconds and the data read from disk is significantly lower, which removes the need for a tmp file

I of course meant "down from 36milliseconds to 4milliseconds"

Post by **Polina** » Sep 05, 2025 5:41 pm this post

Noted. Please see my comment for #1

R&D Forums

Postgresql inefficient query #3: Restore Points

Re: Postgresql inefficient query #3: Restore Points

Re: Postgresql inefficient query #3: Restore Points

Who is online