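PostgreSQL: working around INSERT INTO ... SELECT not using parallel query

The information in this article is based on PG 13.1.

Parallel query has been supported since PG 9.6. Since PG 11, CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW can also use parallel query.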
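The article does not show how its test tables were built. As a rough sketch for reproducing the plans, the script below assumes that test and test1 each hold 1,000,000 integer ids (consistent with the row counts and width=4 in the plans) and that va has a single integer column. The column name cnt and the use of generate_series are assumptions, not taken from the original.

```sql
-- Hypothetical setup: table definitions are assumed, not from the article.
CREATE TABLE test  (id integer);
CREATE TABLE test1 (id integer);
CREATE TABLE va    (cnt integer);  -- column name "cnt" is an assumption

-- Fill both source tables with 1,000,000 ids each.
INSERT INTO test  SELECT generate_series(1, 1000000);
INSERT INTO test1 SELECT generate_series(1, 1000000);

-- Refresh planner statistics so row estimates are realistic.
ANALYZE test;
ANALYZE test1;

-- Parallel plans need worker slots; this setting defaults to 2.
SHOW max_parallel_workers_per_gather;
```

Actual costs and timings will differ from the ones in the article, since they depend on hardware and configuration.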

The conclusion first:

Switch to CREATE TABLE AS or SELECT INTO, or go through an export and import.

To start, look at the execution plan of the following query:

    select count(*) from test t1,test1 t2 where t1.id = t2.id ;

								
    postgres=# explain analyze select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                                                        QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------------
     Finalize Aggregate  (cost=34244.16..34244.17 rows=1 width=8) (actual time=683.246..715.324 rows=1 loops=1)
       ->  Gather  (cost=34243.95..34244.16 rows=2 width=8) (actual time=681.474..715.311 rows=3 loops=1)
             Workers Planned: 2
             Workers Launched: 2
             ->  Partial Aggregate  (cost=33243.95..33243.96 rows=1 width=8) (actual time=674.689..675.285 rows=1 loops=3)
                   ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=0) (actual time=447.799..645.689 rows=333333 loops=3)
                         Hash Cond: (t1.id = t2.id)
                         ->  Parallel Seq Scan on test t1  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.025..74.010 rows=333333 loops=3)
                         ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=260.052..260.053 rows=333333 loops=3)
                               Buckets: 131072  Batches: 16  Memory Usage: 3520kB
                               ->  Parallel Seq Scan on test1 t2  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..104.804 rows=333333 loops=3)
     Planning Time: 0.420 ms
     Execution Time: 715.447 ms
    (13 rows)

The plan shows that two workers were planned and launched.

Now look at INSERT INTO ... SELECT:

								
    postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                                                     QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------
     Insert on va  (cost=73228.00..73228.02 rows=1 width=4) (actual time=3744.179..3744.187 rows=0 loops=1)
       ->  Subquery Scan on "*SELECT*"  (cost=73228.00..73228.02 rows=1 width=4) (actual time=3743.343..3743.352 rows=1 loops=1)
             ->  Aggregate  (cost=73228.00..73228.01 rows=1 width=8) (actual time=3743.247..3743.254 rows=1 loops=1)
                   ->  Hash Join  (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1092.295..3511.301 rows=1000000 loops=1)
                         Hash Cond: (t1.id = t2.id)
                         ->  Seq Scan on test t1  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.030..421.537 rows=1000000 loops=1)
                         ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1090.078..1090.081 rows=1000000 loops=1)
                               Buckets: 131072  Batches: 16  Memory Usage: 3227kB
                               ->  Seq Scan on test1 t2  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.021..422.768 rows=1000000 loops=1)
     Planning Time: 0.511 ms
     Execution Time: 3745.633 ms
    (11 rows)

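This plan contains no Workers lines: parallel query was not used, and execution time rises from about 715 ms to about 3746 ms.

Even with forced parallel mode turned on, the statement still does not run in parallel.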

								
    postgres=# set force_parallel_mode = on;
    SET
    postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                                                     QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------
     Insert on va  (cost=73228.00..73228.02 rows=1 width=4) (actual time=3825.042..3825.049 rows=0 loops=1)
       ->  Subquery Scan on "*SELECT*"  (cost=73228.00..73228.02 rows=1 width=4) (actual time=3824.976..3824.984 rows=1 loops=1)
             ->  Aggregate  (cost=73228.00..73228.01 rows=1 width=8) (actual time=3824.972..3824.978 rows=1 loops=1)
                   ->  Hash Join  (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1073.587..3599.402 rows=1000000 loops=1)
                         Hash Cond: (t1.id = t2.id)
                         ->  Seq Scan on test t1  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.034..414.965 rows=1000000 loops=1)
                         ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1072.441..1072.443 rows=1000000 loops=1)
                               Buckets: 131072  Batches: 16  Memory Usage: 3227kB
                               ->  Seq Scan on test1 t2  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.022..400.624 rows=1000000 loops=1)
     Planning Time: 0.577 ms
     Execution Time: 3825.923 ms
    (11 rows)

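The reason is stated in the official documentation, which lists this as one of the conditions under which the planner will not generate a parallel plan: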

The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. As an exception, the commands CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW which create a new table and populate it can use a parallel plan.
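The quoted rule also covers data-modifying CTEs. As a small illustration (not from the original article), the statement below wraps an INSERT in a CTE next to the same join. It assumes va has a single integer column. Going by the documentation text, one would expect the resulting plan to contain no Gather node, although this was not run for the article.

```sql
-- Plain EXPLAIN only plans the statement; the INSERT is not executed.
-- Assumption: va has exactly one integer column.
EXPLAIN
WITH ins AS (
    INSERT INTO va VALUES (0) RETURNING *
)
SELECT count(*)
FROM test t1, test1 t2
WHERE t1.id = t2.id;
```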

There are three workarounds:

1. select into

								
    postgres=# explain analyze select count(*) into vaa from test t1,test1 t2 where t1.id = t2.id ;
                                                                        QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------------
     Finalize Aggregate  (cost=34244.16..34244.17 rows=1 width=8) (actual time=742.736..774.923 rows=1 loops=1)
       ->  Gather  (cost=34243.95..34244.16 rows=2 width=8) (actual time=740.223..774.907 rows=3 loops=1)
             Workers Planned: 2
             Workers Launched: 2
             ->  Partial Aggregate  (cost=33243.95..33243.96 rows=1 width=8) (actual time=731.408..731.413 rows=1 loops=3)
                   ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=0) (actual time=489.880..700.830 rows=333333 loops=3)
                         Hash Cond: (t1.id = t2.id)
                         ->  Parallel Seq Scan on test t1  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.033..87.479 rows=333333 loops=3)
                         ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=266.839..266.840 rows=333333 loops=3)
                               Buckets: 131072  Batches: 16  Memory Usage: 3520kB
                               ->  Parallel Seq Scan on test1 t2  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.058..106.874 rows=333333 loops=3)
     Planning Time: 0.319 ms
     Execution Time: 783.300 ms
    (13 rows)

2. create table as

								
    postgres=# explain analyze create table vb as select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                                                                        QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------------------------------------------
     Finalize Aggregate  (cost=34244.16..34244.17 rows=1 width=8) (actual time=540.120..563.733 rows=1 loops=1)
       ->  Gather  (cost=34243.95..34244.16 rows=2 width=8) (actual time=537.982..563.720 rows=3 loops=1)
             Workers Planned: 2
             Workers Launched: 2
             ->  Partial Aggregate  (cost=33243.95..33243.96 rows=1 width=8) (actual time=526.602..527.136 rows=1 loops=3)
                   ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=0) (actual time=334.532..502.793 rows=333333 loops=3)
                         Hash Cond: (t1.id = t2.id)
                         ->  Parallel Seq Scan on test t1  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.018..57.819 rows=333333 loops=3)
                         ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=189.502..189.503 rows=333333 loops=3)
                               Buckets: 131072  Batches: 16  Memory Usage: 3520kB
                               ->  Parallel Seq Scan on test1 t2  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..77.786 rows=333333 loops=3)
     Planning Time: 0.189 ms
     Execution Time: 565.448 ms
    (13 rows)
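Both SELECT INTO and CREATE TABLE AS write to a new table, whereas the original statement targeted the existing table va. One possible way to combine the two, sketched here under assumptions, is to compute the result into a staging table and then copy it over. The staging table name va_stage and the column alias cnt are hypothetical, and va is assumed to have a single integer column.

```sql
-- Sketch only: "va_stage" and "cnt" are hypothetical names.
BEGIN;

-- The SELECT feeding CREATE TABLE AS is eligible for a parallel plan.
CREATE TABLE va_stage AS
    SELECT count(*) AS cnt
    FROM test t1, test1 t2
    WHERE t1.id = t2.id;

-- This INSERT is still planned serially, but it only reads
-- the rows that were already computed above.
INSERT INTO va SELECT cnt FROM va_stage;

DROP TABLE va_stage;
COMMIT;
```

Whether this pays off depends on how expensive the SELECT is relative to writing the intermediate rows twice; for a single aggregated row as in this article, the extra write is negligible.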

3. Or export the result and import it again, for example:

    psql -h localhost -d postgres -U postgres -c "select count(*) from test t1,test1 t2 where t1.id = t2.id " -o result.csv -A -t -F ","
    psql -h localhost -d postgres -U postgres -c "COPY va FROM 'result.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER FALSE, ENCODING 'windows-1252')"

In some scenarios this is also faster than the non-parallel INSERT.
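One caveat with the export and import route: a server-side COPY ... FROM 'file' is read by the server process, so a relative path is resolved on the server rather than in the directory where psql was started. When the file sits on the client machine, psql's \copy meta-command reads and writes it on the client side instead. The script below is a minimal sketch of that variant; the file name roundtrip.sql is hypothetical.

```sql
-- roundtrip.sql (hypothetical file name); run with:
--   psql -h localhost -d postgres -U postgres -f roundtrip.sql
-- \copy is a psql meta-command, so each one must stay on a single line.

-- Export the query result to a CSV file on the client.
\copy (select count(*) from test t1, test1 t2 where t1.id = t2.id) to 'result.csv' with (format csv)

-- Load the CSV file from the client into the existing table.
\copy va from 'result.csv' with (format csv)
```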


Original article: https://blog.csdn.net/pg_hgdb/article/details/112297250


Updated: 2023-05-14