Postgresql inefficient query #1: Cleanup

hoerup · Sep 04, 2025 12:33 pm

Case #07804056 - original text

We are seeing the following error in postgresql log

Seems like maintenance query times out and data is accumulating -

This could probably be made in a smarter way - eg several smaller delete batches that looped until there was nothing further to clean up

With the current failure mode the data will accumulate, with a potential of other queries getting slower due to increased data size

Code: Select all

	DELETE FROM job_session_messages AS j
	WHERE j.job_session_id IN (
	    SELECT j0.id
	    FROM job_sessions AS j0
	    WHERE j0.end_time <= $1 AND j0.status = ANY ($2)
	)

and a combination of the next 2 updates

Just did a small test to get an overview of the size of the pending deletes, query build with the logged parameters

Code: Select all

SELECT count(*) 
FROM job_session_messages AS j
WHERE j.job_session_id IN (
	SELECT j0.id
        FROM job_sessions AS j0
        WHERE j0.end_time <= 638596224007975371 AND j0.status = ANY ('{2,0,4,3}')
);
   count
-----------
 151234372
(1 row)

So its actually 151 Mio rows it tries to delete in 1 query

Any body with programming experience will probably agree that trying to do very large deletes in one big swoop is more or less doomed to go wrong

Instead use several minor deletes like this - a couple of these run manually got us through the delete backlog and a following VACCUUM got our DB size back to a more reasonable level

Code: Select all

WITH messages_to_delete AS (
        SELECT j.id
        FROM job_session_messages AS j
        WHERE j.job_session_id IN (
                SELECT j0.id
                FROM job_sessions AS j0
                WHERE j0.end_time <= 638596224007975371 AND j0.status = ANY ('{2,0,4,3}')
        )
        LIMIT 1000000
)
DELETE FROM job_session_messages
WHERE id IN (SELECT id FROM messages_to_delete);

But you should probably increase the max wait time for maintenance queries anyway.

While we're at it, maintenance on job_session_objects also seems to occasionally fail in a similar way although but looks like it's able to recover by it self after a few retries

Code: Select all

	DELETE FROM job_session_objects AS j
        WHERE j.job_session_id IN (
            SELECT j0.id
            FROM job_sessions AS j0
            WHERE j0.end_time <= $1 AND j0.status = ANY ($2)
        )

Post by **Polina** » Sep 05, 2025 5:40 pm this post

Hi Torben,

This, and the two other findings that you shared today, are actually very interesting, and I already passed all three to our RND team for review. It will take some time, but once I get any feedback from our engineering, I'll get back to you.

Thanks!

R&D Forums

Postgresql inefficient query #1: Cleanup

Re: Postgresql inefficient query #1: Cleanup

Who is online