Lists: | pgsql-sql |
---|
From: | pete(at)phillipsfamily(dot)freeserve(dot)co(dot)uk |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | problem with subqueries |
Date: | 2002-10-05 22:19:31 |
Message-ID: | 200210052219.g95MJVx01588@phillipsfamily.freeserve.co.uk |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-sql |
Hi
Any help appreciated - I have spent 2 days trying to get this query to
work!
I have an orders database and a customer database.
The orders database has a date field for each order.
Because I want to obtain a monthly breakdown, I created a view called
monthcustomer as this select:
select orders.ord_date, customer.cname,
date_part('month',orders.ord_date) AS "month",
date_part('year',orders.ord_date) AS "year",
orders.number_of_items;
Each month will have multiple numbers of items, so to get a monthly
breakdown I tried this:
select distinct year,month,
(select sum(monthcustomer.number_of_items) from monthcustomer where
monthcustomer.year=m.year and monthcustomer.month=m.month) as NumPots
from monthcustomer m;
This goes off and never comes back - CPU is hitting the top of the
chart! I have to ^C to interrupt it, as it runs for ages (I've left
this run for 10-20 minutes and it is still running).
I have indexes on the columns involved from the original tables.
Any help appreciated.
Pete
PS: Using pgsql 7.2-70 from Suse distribution.
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pete(at)phillipsfamily(dot)freeserve(dot)co(dot)uk |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: problem with subqueries |
Date: | 2002-10-06 01:01:33 |
Message-ID: | 6763.1033866093@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-sql |
pete(at)phillipsfamily(dot)freeserve(dot)co(dot)uk writes:
> select distinct year,month,
> (select sum(monthcustomer.number_of_items) from monthcustomer where
> monthcustomer.year=m.year and monthcustomer.month=m.month) as NumPots
> from monthcustomer m;
> This goes off and never comes back -
No surprise, considering the sub-select is going to be evaluated
separately for every row of monthcustomer --- and then most of those
evaluations will be thrown away by the DISTINCT :-(
A straightforward way of reducing the redundant computations would be
to do the DISTINCT first:
select year,month,
(select sum(monthcustomer.number_of_items) from monthcustomer where
monthcustomer.year=m.year and monthcustomer.month=m.month) as NumPots
from
(select distinct year, month from monthcustomer) as m;
But it appears to me that you are reinventing the wheel. Isn't this
query the equivalent of a grouped aggregation --- viz,
select year, month, sum(number_of_items) as NumPots
from monthcustomer
group by year, month
regards, tom lane
From: | Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> |
---|---|
To: | <pete(at)phillipsfamily(dot)freeserve(dot)co(dot)uk> |
Cc: | <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: problem with subqueries |
Date: | 2002-10-06 01:02:09 |
Message-ID: | 20021005175906.R58075-100000@megazone23.bigpanda.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-sql |
On Sat, 5 Oct 2002 pete(at)phillipsfamily(dot)freeserve(dot)co(dot)uk wrote:
> Because I want to obtain a monthly breakdown, I created a view called
> monthcustomer as this select:
>
> select orders.ord_date, customer.cname,
> date_part('month',orders.ord_date) AS "month",
> date_part('year',orders.ord_date) AS "year",
> orders.number_of_items;
>
> Each month will have multiple numbers of items, so to get a monthly
> breakdown I tried this:
>
> select distinct year,month,
> (select sum(monthcustomer.number_of_items) from monthcustomer where
> monthcustomer.year=m.year and monthcustomer.month=m.month) as NumPots
> from monthcustomer m;
>
> This goes off and never comes back - CPU is hitting the top of the
> chart! I have to ^C to interrupt it, as it runs for ages (I've left
That's going to run that inner select once for every row in monthcustomer
probably.
Would
select year, month, sum(monthcustomer.number_of_items) as NumPots frmo
monthcustomer m group by year, month;
have the same effect, get the sum of the items for each year/month
combination along with which year and month?
From: | pete(at)phillipsfamily(dot)freeserve(dot)co(dot)uk |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Cc: | tgl(at)sss(dot)pgh(dot)pa(dot)us, kenzo(at)kennethambrose(dot)com |
Subject: | Re: problem with subqueries |
Date: | 2002-10-06 11:13:41 |
Message-ID: | 200210061113.g96BDfG10468@phillipsfamily.freeserve.co.uk |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-sql |
Hi guys. Thanks for the rapid replies so far.
To answer some of the questions:
>you did not indicate an explicit join - or even a "from" clause for that
>matter- in the example of your create view statement.
My original post was a simplified version. Here is the actual view
creating statement:
create view monthord as select ord_date, extract (month from ord_date)
as month, extract (year from ord_date) as year,r_region,
number_of_items from orders,customer where ccode = codenum;
>But it appears to me that you are reinventing the wheel. Isn't this
>query the equivalent of a grouped aggregation
Yes - but again I was simplifying - I want to run a sub query for each
region, so I get output like this:
year month Reg1 Reg2 Reg3 Reg4
----- ----- ---- ---- ----- ----
1999 Jan 20 45 10 27
1999 Feb 30 43 18 37
...
2002 Oct 7 89 60 17
The subquery I have tried to run is actually this (there is probably a
way to do this all in SQL, but at present I would like to just
understand why my subqueries take so long).
-----start of script ---------------
#!/bin/sh
# prepare a query as shell variables
# set up timeframe
SDATE=`date --date '2 years ago' +'%Y-%m-01'`
EDATE=`date --date 'next month' +'%Y-%m-01'`
# use the QUERY env variable to build the full query
QUERY="select distinct year,month, "
for reg in `psql -U postgres -d product_db -c "select distinct r_code from regionlist order by r_code;"`
do
QUERY="$QUERY (select sum(monthcustomer.number_of_items) from monthcustomer where monthcustomer.year=m.year and monthcustomer.month=m.month and r_region like '$reg') as $reg,"
done
# now add a total column
QUERY="$QUERY (select sum(monthcustomer.number_of_items) from monthcustomer where ord_date >= '$SDATE' and ord_date < '$EDATE' and monthcustomer.year=m.year and monthcustomer.month=m.month) as TOTAL from monthcustomer m;"
# execute the query
psql -U postgres -d newmaggot -c "$QUERY"
-----end of script ---------------
As you can see, I get all the regions, and loop through them building
up a bunch of subqueries as I go along.
Hopefully this will answer some of your questions. I will try some of
the suggestions later on today (such as placing the distinct clause at
the end) but first I have to go and get some decorating done :-( (yes,
that sound you can hear is the crack of my wife's whip!).
Regards,
Pete
--
I do whatever the Voices tell me to do
From: | Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> |
---|---|
To: | <pete(at)phillipsfamily(dot)freeserve(dot)co(dot)uk> |
Cc: | <pgsql-sql(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <kenzo(at)kennethambrose(dot)com> |
Subject: | Re: problem with subqueries |
Date: | 2002-10-06 16:18:19 |
Message-ID: | 20021006090327.Q65664-100000@megazone23.bigpanda.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-sql |
On Sun, 6 Oct 2002 pete(at)phillipsfamily(dot)freeserve(dot)co(dot)uk wrote:
> Hi guys. Thanks for the rapid replies so far.
>
> To answer some of the questions:
>
> >you did not indicate an explicit join - or even a "from" clause for that
> >matter- in the example of your create view statement.
>
> My original post was a simplified version. Here is the actual view
> creating statement:
>
> create view monthord as select ord_date, extract (month from ord_date)
> as month, extract (year from ord_date) as year,r_region,
> number_of_items from orders,customer where ccode = codenum;
>
>
> >But it appears to me that you are reinventing the wheel. Isn't this
> >query the equivalent of a grouped aggregation
>
> Yes - but again I was simplifying - I want to run a sub query for each
> region, so I get output like this:
>
> year month Reg1 Reg2 Reg3 Reg4
> ----- ----- ---- ---- ----- ----
> 1999 Jan 20 45 10 27
> 1999 Feb 30 43 18 37
> ...
> 2002 Oct 7 89 60 17
>
> The subquery I have tried to run is actually this (there is probably a
> way to do this all in SQL, but at present I would like to just
> understand why my subqueries take so long).
Well, you're running <n> subqueries for each row in monthcustomer
because the distinct happens afterwards in your query. So if you've
got 4 regions and 1 total and 100,000 rows in monthcustomer, you're
looking at something on the order of 500,000 subqueries. Doing the
distinct before that step should lower the number to
((#year/month combinations) * (#regions+1)).
In any case, you may be better off with one of:
a) Doing something programatic to turn a result set like:
year|month|region|value
1999|Jan |1 |20
1999|Jan |2 |45
...
into the form you want. The above can be gotten by group by
probably and would require no subqueries.
b) Keeping a summary table that you update via triggers. This
requires a bit of work to get the triggers, but it probably
makes the query faster.