Forcing an ARRAY() in postgres subqueries

Quick note to self, when making a complicated query, comparing to data in a subquery, the postgres query planner will sometimes decide to perform the entire outer query without the comparison, cache the results, and then run the inner query.

Example:

SELECT DISTINCT
...
FROM ...
INNER JOIN ... ON ...
WHERE
...
AND ...
AND ...id IN (
SELECT user_id
FROM ....
WHERE ...
)

Once this gets complicated enough, with large enough tables, the postgres planner may decide to run the entire outer query without filtering on the id. We can encourage it to do this by forcing the results of the inner query into an array, the equivalent of running it ourselves and simply saying WHERE id = any(array[1335, 1336, 1337]), where postgres will not hesitate to do the comparison on the fly, potentially excluding 99% of the dataset from being stored in cache.

Potentially 50x faster example for this scenario, depending on your data and the mood of the query planner:

SELECT DISTINCT
...
FROM ...
INNER JOIN ... ON ...
WHERE
...
AND ...
AND ...id = ANY ( ARRAY (
SELECT user_id
FROM ....
WHERE ...
) )

Go play with your local query planner.

Leave a Reply

Your email address will not be published. Required fields are marked *