Lists: | pgsql-performance |
---|
From: | Felipe López Montes <xocas89(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Simple query with Planner underestimating rows. |
Date: | 2025-01-28 19:29:18 |
Message-ID: | CACJPJu-0u221Bker1yn=AJ3xJbW0j29mSM=EvkA_SXecv75hqw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
Hi all,
I am using PostgreSQL 17.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
12.4.0, 64-bit.
I have been facing this issue past weeks and I don't know what else to try,
I have a very simple query where the planner is underestimating the number
of rows. So instead of estimating the ~ actual number of rows (5120), it
estimates only 1.
For the sake of clarity and simplicity, I have disabled the nestloop join
in the session because it involved a gather clause and parallel workers and
was still underestimating rows, so the same problem happens with nestloop
strategy too. Instead now the planner goes for a merge join ( if you still
prefer me to send you the nestloop plan, I can do):
SELECT t1.participant_identifier,
t2.participant_identifier,
t3.participant_identifier
FROM public.table1 t1
INNER JOIN public.table2 t2
ON t2.participant_identifier = t1.participant_identifier
INNER JOIN public.table3 t3
ON t3.participant_identifier = t1.participant_identifier
AND t3.programme_identifier = t2.programme_identifier;
Plan:
[image: image.png]
Table definitions:
create table table1
(
participant_identifier uuid not null
constraint participant_identifier_unique_idx
primary key
);
create table table2
(
participant_identifier uuid not null,
programme_identifier uuid not null,
constraint participant_identifier_programme_identifier_unique_idx
primary key (participant_identifier, programme_identifier)
);
create index programme_identifier_idx
on table2 (programme_identifier);
create index participant_identifier_idx
on table2 (participant_identifier);
create table table3
(
id varchar(18) not null
constraint table3_pk
primary key,
programme_identifier uuid,
participant_identifier uuid
);
create index participant_identifier_programme_identifier_idx
on table3 (participant_identifier, programme_identifier);
Pg_class info:
[image: image.png]
Additional info:
SELECT COUNT(*) FROM table3 WHERE participant_identifier IS NULL AND
programme_identifier IS NOT NULL; --295
SELECT COUNT(*) FROM table3 WHERE participant_identifier IS NOT NULL
AND programme_identifier IS NULL; --3122
SELECT COUNT(*) FROM table3 WHERE participant_identifier IS NULL AND
programme_identifier IS NULL; -- 15438
SELECT COUNT(*) FROM table3 WHERE participant_identifier IS NOT NULL
AND programme_identifier IS NOT NULL; -- 63339
SELECT COUNT(DISTINCT programme_identifier ) FROM table3; -- 61939
SELECT COUNT(DISTINCT participant_identifier ) FROM table3; -- 63364
SELECT COUNT(*)
FROM (
SELECT DISTINCT programme_identifier, participant_identifier
FROM table3
) subquery; --65057
SELECT COUNT(DISTINCT programme_identifier ) FROM table2; -- 62582
SELECT COUNT(DISTINCT participant_identifier ) FROM table2; -- 62120
- I have also tried creating a compound partial index on table3 with
participant_identifier and programme_identifier with a condition for
non-null values but same estimation problem.
- These tables do not receive regular updates nor deletes as I have put
them in a separate schema for debugging purposes, and besides values
previously shown in table3, there are not null values on other join fields
on the rest of the tables as you can see with the not null constraint.
Autovacuum is also enabled, I will attach the settings later on.
- In terms of Hardware, this queries are being ran in an AWS RDS with 2
CPU, 4GB RAM, instance class db.t3.medium
I have tried several things:
- Creating extended statistics.
- ANALYZE various times on all the tables.
- REINDEX.
- VACUUM ANALYZE.
- Increasing statistics on join fields.
Thank you for your help, I would appreciate some guidance :).
Cheers,
Felipe.
Settings:
[
{
"name": "allow_alter_system",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "application_name",
"current_setting": "PhpStorm 2024.2.4",
"source": "session"
},
{
"name": "archive_library",
"current_setting": "rds_archive",
"source": "configuration file"
},
{
"name": "archive_mode",
"current_setting": "on",
"source": "configuration file"
},
{
"name": "archive_timeout",
"current_setting": "5min",
"source": "configuration file"
},
{
"name": "autovacuum_analyze_scale_factor",
"current_setting": "0.05",
"source": "configuration file"
},
{
"name": "autovacuum_max_workers",
"current_setting": "3",
"source": "configuration file"
},
{
"name": "autovacuum_naptime",
"current_setting": "15s",
"source": "configuration file"
},
{
"name": "autovacuum_vacuum_cost_limit",
"current_setting": "200",
"source": "configuration file"
},
{
"name": "autovacuum_vacuum_scale_factor",
"current_setting": "0.1",
"source": "configuration file"
},
{
"name": "autovacuum_work_mem",
"current_setting": "117495kB",
"source": "configuration file"
},
{
"name": "checkpoint_completion_target",
"current_setting": "0.9",
"source": "configuration file"
},
{
"name": "client_encoding",
"current_setting": "UTF8",
"source": "client"
},
{
"name": "compute_query_id",
"current_setting": "auto",
"source": "configuration file"
},
{
"name": "DateStyle",
"current_setting": "ISO, MDY",
"source": "client"
},
{
"name": "debug_logical_replication_streaming",
"current_setting": "buffered",
"source": "configuration file"
},
{
"name": "debug_parallel_query",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "default_toast_compression",
"current_setting": "lz4",
"source": "configuration file"
},
{
"name": "effective_cache_size",
"current_setting": "1879920kB",
"source": "configuration file"
},
{
"name": "enable_nestloop",
"current_setting": "off",
"source": "session"
},
{
"name": "enable_presorted_aggregate",
"current_setting": "on",
"source": "configuration file"
},
{
"name": "extra_float_digits",
"current_setting": "3",
"source": "session"
},
{
"name": "fsync",
"current_setting": "on",
"source": "configuration file"
},
{
"name": "full_page_writes",
"current_setting": "on",
"source": "configuration file"
},
{
"name": "gss_accept_delegation",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "hot_standby",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "huge_pages",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "icu_validation_level",
"current_setting": "error",
"source": "configuration file"
},
{
"name": "idle_in_transaction_session_timeout",
"current_setting": "1d",
"source": "configuration file"
},
{
"name": "ignore_invalid_pages",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "jit",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "krb_server_keyfile",
"current_setting": "/rdsdbdata/config/keytab",
"source": "configuration file"
},
{
"name": "listen_addresses",
"current_setting": "*",
"source": "command line"
},
{
"name": "lo_compat_privileges",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "log_autovacuum_min_duration",
"current_setting": "10s",
"source": "configuration file"
},
{
"name": "log_checkpoints",
"current_setting": "on",
"source": "configuration file"
},
{
"name": "log_destination",
"current_setting": "stderr",
"source": "configuration file"
},
{
"name": "log_directory",
"current_setting": "/rdsdbdata/log/error",
"source": "configuration file"
},
{
"name": "log_file_mode",
"current_setting": "0644",
"source": "configuration file"
},
{
"name": "log_filename",
"current_setting": "postgresql.log.%Y-%m-%d-%H",
"source": "configuration file"
},
{
"name": "log_line_prefix",
"current_setting": "%t:%r:%u(at)%d:[%p]:",
"source": "configuration file"
},
{
"name": "log_rotation_age",
"current_setting": "1h",
"source": "configuration file"
},
{
"name": "log_timezone",
"current_setting": "UTC",
"source": "configuration file"
},
{
"name": "log_truncate_on_rotation",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "logging_collector",
"current_setting": "on",
"source": "configuration file"
},
{
"name": "maintenance_work_mem",
"current_setting": "64MB",
"source": "configuration file"
},
{
"name": "max_connections",
"current_setting": "403",
"source": "configuration file"
},
{
"name": "max_locks_per_transaction",
"current_setting": "64",
"source": "configuration file"
},
{
"name": "max_parallel_apply_workers_per_subscription",
"current_setting": "2",
"source": "configuration file"
},
{
"name": "max_parallel_workers",
"current_setting": "8",
"source": "configuration file"
},
{
"name": "max_prepared_transactions",
"current_setting": "0",
"source": "configuration file"
},
{
"name": "max_replication_slots",
"current_setting": "20",
"source": "configuration file"
},
{
"name": "max_stack_depth",
"current_setting": "6MB",
"source": "configuration file"
},
{
"name": "max_wal_senders",
"current_setting": "35",
"source": "configuration file"
},
{
"name": "max_wal_size",
"current_setting": "6GB",
"source": "configuration file"
},
{
"name": "max_worker_processes",
"current_setting": "8",
"source": "configuration file"
},
{
"name": "min_wal_size",
"current_setting": "192MB",
"source": "configuration file"
},
{
"name": "port",
"current_setting": "5432",
"source": "configuration file"
},
{
"name": "rds.blue_green_replication_type",
"current_setting": "physical",
"source": "configuration file"
},
{
"name": "rds.cte_materialize_mode",
"current_setting": "default",
"source": "configuration file"
},
{
"name": "rds.delegated_extension_allow_drop_cascade",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "rds.enable_pgactive",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "rds.force_autovacuum_logging_level",
"current_setting": "warning",
"source": "configuration file"
},
{
"name": "rds.internal_databases",
"current_setting": "rdsadmin,template0",
"source": "configuration file"
},
{
"name": "rds.logical_replication",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "rds.rds_reserved_connections",
"current_setting": "4",
"source": "configuration file"
},
{
"name": "recovery_init_sync_method",
"current_setting": "syncfs",
"source": "configuration file"
},
{
"name": "recovery_prefetch",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "remove_temp_files_after_crash",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "reserved_connections",
"current_setting": "2",
"source": "configuration file"
},
{
"name": "scram_iterations",
"current_setting": "4096",
"source": "configuration file"
},
{
"name": "search_path",
"current_setting": "public",
"source": "session"
},
{
"name": "send_abort_for_crash",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "send_abort_for_kill",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "session_preload_libraries",
"current_setting": "",
"source": "configuration file"
},
{
"name": "shared_buffers",
"current_setting": "939960kB",
"source": "configuration file"
},
{
"name": "shared_preload_libraries",
"current_setting": "rdsutils,pg_tle,pg_stat_statements",
"source": "configuration file"
},
{
"name": "ssl",
"current_setting": "on",
"source": "configuration file"
},
{
"name": "ssl_ca_file",
"current_setting": "/rdsdbdata/rds-metadata/ca-cert.pem",
"source": "configuration file"
},
{
"name": "ssl_cert_file",
"current_setting": "/rdsdbdata/rds-metadata/server-cert.pem",
"source": "configuration file"
},
{
"name": "ssl_key_file",
"current_setting": "/rdsdbdata/rds-metadata/server-key.pem",
"source": "configuration file"
},
{
"name": "ssl_min_protocol_version",
"current_setting": "TLSv1.2",
"source": "configuration file"
},
{
"name": "summarize_wal",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "superuser_reserved_connections",
"current_setting": "3",
"source": "configuration file"
},
{
"name": "sync_replication_slots",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "synchronous_commit",
"current_setting": "on",
"source": "configuration file"
},
{
"name": "TimeZone",
"current_setting": "UTC",
"source": "client"
},
{
"name": "track_activity_query_size",
"current_setting": "4kB",
"source": "configuration file"
},
{
"name": "track_functions",
"current_setting": "pl",
"source": "configuration file"
},
{
"name": "track_io_timing",
"current_setting": "on",
"source": "session"
},
{
"name": "track_wal_io_timing",
"current_setting": "on",
"source": "configuration file"
},
{
"name": "unix_socket_directories",
"current_setting": "/tmp",
"source": "configuration file"
},
{
"name": "unix_socket_group",
"current_setting": "rdsdb",
"source": "configuration file"
},
{
"name": "unix_socket_permissions",
"current_setting": "0666",
"source": "command line"
},
{
"name": "update_process_title",
"current_setting": "on",
"source": "configuration file"
},
{
"name": "vacuum_buffer_usage_limit",
"current_setting": "962kB",
"source": "configuration file"
},
{
"name": "vacuum_cost_page_miss",
"current_setting": "5",
"source": "configuration file"
},
{
"name": "vacuum_failsafe_age",
"current_setting": "1200000000",
"source": "configuration file"
},
{
"name": "vacuum_multixact_failsafe_age",
"current_setting": "1200000000",
"source": "configuration file"
},
{
"name": "wal_compression",
"current_setting": "zstd",
"source": "configuration file"
},
{
"name": "wal_keep_size",
"current_setting": "2GB",
"source": "configuration file"
},
{
"name": "wal_level",
"current_setting": "replica",
"source": "configuration file"
},
{
"name": "wal_receiver_create_temp_slot",
"current_setting": "off",
"source": "configuration file"
},
{
"name": "wal_receiver_timeout",
"current_setting": "30s",
"source": "configuration file"
},
{
"name": "wal_sender_timeout",
"current_setting": "30s",
"source": "configuration file"
}
]
From: | Greg Sabino Mullane <htamfids(at)gmail(dot)com> |
---|---|
To: | Felipe López Montes <xocas89(at)gmail(dot)com> |
Cc: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Simple query with Planner underestimating rows. |
Date: | 2025-01-29 02:12:03 |
Message-ID: | CAKAnmmKQcrF_psSgVy++hUfv02R0h8hgUo7fpnxmO6UFSvEYrQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
On Tue, Jan 28, 2025 at 5:30 PM Felipe López Montes <xocas89(at)gmail(dot)com>
wrote:
> For the sake of clarity and simplicity, I have disabled the nestloop join
> in the session because it involved a gather clause and parallel workers and
> was still underestimating rows, so the same problem happens with nestloop
> strategy too. Instead now the planner goes for a merge join ( if you still
> prefer me to send you the nestloop plan, I can do):
>
Yes, please, send the exact plan you are having problems with. Also, what
exactly is the performance issue? It seems your *second best plan* is
running in 36 milliseconds?
Cheers,
Greg
P.S. In the future, you can use this to quickly grab relevant settings:
select name, current_setting(name) from pg_settings where source <>
'default';
From: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
---|---|
To: | Felipe López Montes <xocas89(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org |
Cc: | htamfids(at)gmail(dot)com |
Subject: | Re: Simple query with Planner underestimating rows. |
Date: | 2025-01-29 03:05:50 |
Message-ID: | a383df07-989d-4903-8eb7-c93d4a16b628@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
On 1/29/25 02:29, Felipe López Montes wrote:
> Hi all,
>
> I am using PostgreSQL 17.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
> 12.4.0, 64-bit.
>
> I have been facing this issue past weeks and I don't know what else to
> try, I have a very simple query where the planner is underestimating the
> number of rows. So instead of estimating the ~ actual number of rows
> (5120), it estimates only 1.
Your query represents a typical PostgreSQL issue: Here, I see two
'almost' unique columns: participant_identifier and
programme_identifier. The result is that the join cardinality on a
unique column is predicted to be close to the size of the smaller relation.
But in the second join, you have a join by two columns with low selectivity:
ON t3.participant_identifier = t1.participant_identifier
AND t3.programme_identifier = t2.programme_identifier;
Postgres doesn't gather dependency statistics on two or more columns and
just multiplies the low selectivities of these clauses, reducing the
number of rows to a possible minimum - 1.
What you can do? Right now, maybe only pg_hint_plan may help in such a
situation. However, if you provide some test cases, we may check the
forgotten feature [1], which enables extended statistics in join clause
estimations and may push development efforts in that direction.
[1] using extended statistics to improve join estimates
/message-id/flat/c8c0ff31-3a8a-7562-bbd3-78b2ec65f16c%40enterprisedb.com
--
regards, Andrei Lepikhov
From: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
---|---|
To: | Felipe López Montes <xocas89(at)gmail(dot)com> |
Cc: | "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Simple query with Planner underestimating rows. |
Date: | 2025-01-29 08:57:15 |
Message-ID: | 0b75eb10-5a48-4bea-a5d1-898c52dcb932@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
On 1/29/25 15:32, Felipe López Montes wrote:
> Hi,
>
> Thank you for your reply,
>
> I have already tried creating such statistics on t3.programme_identifier
> and t3.participant_identifier and the plan is the same, however I cannot
> create them for the right part of the join as they are from different
> tables (t1 and t2)
Of course, because for now, a join clause can't be estimated by extended
statistics. It applies only to a scan filter (clause referencing only
one relation).
I have meant that with the development patch [1] applied, you may create
two statistics on t3(participant_identifier,programme_identifier) and
t2(participant_identifier,programme_identifier). These statistics would
then be used to estimate the join clause and may resolve the problem.
[1] using extended statistics to improve join estimates
/message-id/flat/c8c0ff31-3a8a-7562-bbd3-78b2ec65f16c%40enterprisedb.com
--
regards, Andrei Lepikhov