Showing posts with label SQL. Show all posts
Showing posts with label SQL. Show all posts

Wednesday, April 01, 2020

Added A Column To Every Table In The Same Schema In PostgresSQL

Sometime, we may find out that we need to add a column to every table in the same schema or same database. For example, we want to add an audit column. The following query is for that.


do $$
declare
    addcolumn record;
    dm_schema VARCHAR(8) := ‘sales_dm';
begin
for addcolumn in
select
      'ALTER TABLE '|| dm_schema ||'.'|| T.targettable ||  ' ADD COLUMN  IF NOT EXISTS last_modified_timestamp timestamp  NULL' as sqlscript
   from
      (
        select tablename as targettable from  pg_tables where schemaname = dm_schema 
      ) t
loop
execute addcolumn.sqlscript;
end loop;
end;
$$;

Sunday, November 11, 2018

SQL Query To Select Row Number, Rank, And Dese Rank

Use Analytic Functions

SELECT e.name as emp_name, d.name dept_name, e.salary,
      ROW_NUMBER() OVER(Partition by d.name ORDER BY e.salary DESC) AS row_num 
      FROM employee e join department d on e.dept_id = d.dept_id
     
 
SELECT e.name as emp_name, d.name dept_name, e.salary,
      RANK() OVER(Partition by d.name ORDER BY e.salary DESC) AS rank
      FROM  employee e join  department d on e.dept_id = d.dept_id
     

SELECT e.name as emp_name, d.name dept_name, e.salary,
      DENSE_RANK() OVER(Partition by d.name ORDER BY e.salary DESC) AS dense_rank 
      FROM  employee e join  department d on e.dept_id = d.dept_id

Without Using Analytic Function
select emp.*,
       (select count(*)
        from employee emp2
        where emp2.dept_id = emp.dept_id  
              and(emp2.salary > emp.salary or emp2.salary = emp.salary  
                   and emp2.emp_id <= emp.emp_id
                  )
       ) as "row_number",
       (select 1 + count(*)
          from employee emp2
          where emp2.dept_id = emp.dept_id and                 emp2.salary > emp.salary 
       ) as "rank",
       (select count(distinct salary)
        from employee emp2
        where emp2.dept_id = emp.dept_id and              emp2.salary >= emp.salary
       ) as "dense_rank"
from employee emp;

Wednesday, April 20, 2011

Load Slowly Changing Dimension Type 2 using Oracle Merge Statement

 Load Slowly Changing Dimension Type 2 using Oracle Merge Statement

By using Oracle Merge statement, we are able to perform an insert and update statements (sometime referred to as “upsert”) in one query. The merge statement primarily useful in data warehousing situations, especially when maintaining Type 2 Slowly changing dimension (SCD2), where large amounts of data are commonly inserted and updated

1. Tables

To explore how the merge statement works, let’s use the following set of tables

Product dimension table: dim_product.


CREATE TABLE DIM_PRODUCT
(
PRODUCT_SID NUMBER NOT NULL,
PRODUCT_ID NUMBER NOT NULL,
PRODUCT_CATEGORY_ID NUMBER,
PRODUCT_NAME VARCHAR2(50 CHAR),
PRODUCT_DESCRIPTION VARCHAR2(200 CHAR),
PRICE NUMBER(8,2),
PRICE_EFFECTIVE_DATE DATE,
LAST_MODIFIED_DATE DATE,
EFFECTIVE_START_DATE DATE,
EFFECTIVE_END_DATE DATE,
IS_CURRENT_ROW VARCHAR2(1 CHAR)
CREATED_DATE DATE,
UPDATED_DATE DATE
);

Source stage table: stg_product.
This is a stage table use to store the newly modified product records from the product table.

CREATE TABLE STG_PRODUCT
(
PRODUCT_ID NUMBER NOT NULL,
PRODUCT_CATEGORY_ID NUMBER,
PRODUCT_NAME VARCHAR2(50 CHAR),
PRODUCT_DESCRIPTION VARCHAR2(200 CHAR),
PRICE NUMBER(8,2),
PRICE_EFFECTIVE_DATE DATE,
LAST_MODIFIED_DATE DATE,
CREATED_DATE DATE
);

Let’s also created a sequence to use for the dim_product dimension table ‘s surrogate keys.

CREATE SEQUENCE s_dim_product
START WITH 1
MAXVALUE 9999999999999999999999999999
MINVALUE 1
NOCYCLE
CACHE 20
NOORDER;

2. SCD Logics and the Merge Query

A slowly changing dimension type 2 (SCD2) is defined to keep track changes of attributes in the underlying source table. While there are many attributes in a table, the business rules decide what changes to track. Here, for demonstration purpose, let’s choose to keep track of changes of each attribute in the underlying product source table, which is to keep history of production name, product description, and product category and price changes.

The merger statement provides the abilities to perform insert and update with only one select statement based upon a join condition. Rows already in the target table that match the join condition are updated; otherwise, a row is inserted using values from a separate subquery. The details of the merge statement can be found from the oracle documentation online.

http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_9016.htm

An SCD2 normally has three additional columns. The data effective start and end date of the record, and a flag to indicate the current record. To maintain histories, a new record is created to store the changing attributes while we keep the exiting records as histories. To help achieve that, a small utility table with only tow records in it is created. In this utility table, one record indicates the insert flag and the other indicates the update flag. Here the table structure:

Utility table: scd_row_type.

CREATE TABLE SCD_ROW_TYPE
(
SCD_ROW_TYPE_ID NUMBER NOT NULL,
SCD_ROW_TYPE_DESCRIPTION VARCHAR2(20 CHAR)
);

After creating the table, insert the following records into the table:

Insert into scd_row_type(scd_row_type_id,scd_row_description) values(1,’For Insert’);
Insert into scd_row_type(scd_row_type_id,scd_row_description) values(2,’For Update’);

Now we are ready to construct the merge query
MERGE INTO dim_product p
USING ( SELECT DECODE(s.scd_row_type_id,1,-6789,m.product_sid) as product_sid,
                   PRODUCT_ID,
                   PRODUCT_CATEGORY_ID,
                   PRODUCT_NAME,
                   PRODUCT_DESCRIPTION,
                   PRICE,
                   PRICE_EFFECTIVE_DATE,
                   LAST_MODIFIED_DATE,
                   CREATED_DATE,
                   m.scd_row_type_id
              FROM    (SELECT dp.product_sid,
                              sp.PRODUCT_ID,
                              sp.PRODUCT_CATEGORY_ID,
                              sp.PRODUCT_NAME,
                              sp.PRODUCT_DESCRIPTION,
                              sp.PRICE,
                              sp.PRICE_EFFECTIVE_DATE,
                              sp.LAST_MODIFIED_DATE,
                              sp.CREATED_DATE,
                              CASE
                                 WHEN dp.product_id IS NULL
                                 THEN
                                    1
                                 WHEN (dp.product_category_id !=
                                          sp.product_category_id
                                       OR dp.product_name != sp.product_name
                                       OR DP.PRODUCT_DESCRIPTION !=
                                             sp.product_description
                                       OR dp.price != sp.price
                                       OR dp.price_effective_date !=
                                             sp.price_effective_date)
                                 THEN
                                    2
                                 ELSE
                                    0
                              END
                                 AS scd_row_type_id
                         FROM    stg_product sp
                              LEFT JOIN
                                 Dim_product dp
                              ON (sp.product_id = dp.product_id and  dp.is_current_row = 'Y')
                     ) m
                   JOIN  scd_row_type s
                   ON (s.scd_row_type_id <= m.scd_row_type_id)
                   ) mp
        ON  (p.product_sid = mp.product_sid)
when matched then
   update set P.EFFECTIVE_END_DATE = mp.LAST_MODIFIED_DATE, is_current_row = 'N', updated_date = sysdate
when NOT matched then
insert (P.PRODUCT_SID,P.PRODUCT_ID,P.PRODUCT_CATEGORY_ID, P.PRODUCT_NAME, P.PRODUCT_DESCRIPTION, P.PRICE, p.PRICE_EFFECTIVE_DATE,P.LAST_MODIFIED_DATE,
 p.effective_start_date,P.EFFECTIVE_END_DATE,is_current_row, created_date,
 updated_date )
 values (s_dim_product.nextval,mp.PRODUCT_ID,mp.PRODUCT_CATEGORY_ID,mp.PRODUCT_NAME,
 mp.PRODUCT_DESCRIPTION, mp.PRICE,mp.price_effective_date,
mp.LAST_MODIFIED_DATE,mp.last_modified_date,
 to_date('2099-12-31 00:00:00','YYYY-MM-DD HH24:MI:SS'),’Y', sysdate,sysdate);
  commit;


3. How it works

To make it easier to understand how the merge statement work when we actually perform some tests with data. First, let’s issue the following insert statements to create two records in the stage table.

truncate table stg_product;

insert into stg_product ( PRODUCT_ID, PRODUCT_CATEGORY_ID,  PRODUCT_NAME,PRODUCT_DESCRIPTION,PRICE,PRICE_EFFECTIVE_DATE,  LAST_MODIFIED_DATE,  CREATED_DATE)
        values (1,8,'Apple iPad 36 GB','Apple iPad 36 GB with case', 800,sysdate,sysdate,sysdate);

insert into stg_product ( PRODUCT_ID, PRODUCT_CATEGORY_ID,  PRODUCT_NAME,PRODUCT_DESCRIPTION,PRICE,PRICE_EFFECTIVE_DATE,  LAST_MODIFIED_DATE,CREATED_DATE)
         values (2,7,'Canon 12 MPIX Digital Camera','Canon 12 MPIX Digital Camera, Zoon Len, Case', 150,sysdate,sysdate,sysdate);

commit;


Check the stage table to make sure the records are there.

Sqlplus>select * from stg_product;

Now run the merge statement which was created in section 2. Once the merge query statement was executed, run the following query to checked the records in the dim_product table.

Sqlplus>select * from dim_product;

There should be two rows in the dimension table with the column effective_start_date, effective_end_date and is_current_row populated. Now, let’s find out how the merge statement handles SCD Type 2 records. To begin, run the following queries against the stage table:

update stg_product set price = 700 , price_effective_date = sysdate, last_modified_date = sysdate where product_id = 1;

insert into stg_product (
PRODUCT_ID, 
PRODUCT_CATEGORY_ID,
PRODUCT_NAME,
PRODUCT_DESCRIPTION,
PRICE,
PRICE_EFFECTIVE_DATE,
LAST_MODIFIED_DATE,
CREATED_DATE)
values (3,6,'EPSon Color Printer',
'EPSon color print with high definition photo prints', 200,sysdate,sysdate,sysdate);
 commit;

The above statements updated a record where product_id equal to 1 to change it price to 700, and inserted one more new record, product_id 3, into the stg_product table. To help understand what the records set looks like from the using clause of the merge statement, let’s copy the query and execute it to get the record set.

SELECT DECODE(s.scd_row_type_id,1,-6789,m.product_sid) as product_sid,
                   PRODUCT_ID,
                   PRODUCT_CATEGORY_ID,
                   PRODUCT_NAME,
                   PRODUCT_DESCRIPTION,
                   PRICE,
                   PRICE_EFFECTIVE_DATE,
                   LAST_MODIFIED_DATE,
                   CREATED_DATE,
                   m.scd_row_type_id,
         s.scd_row_type_id
              FROM    (SELECT dp.product_sid,
                              sp.PRODUCT_ID,
                              sp.PRODUCT_CATEGORY_ID,
                              sp.PRODUCT_NAME,
                              sp.PRODUCT_DESCRIPTION,
                              sp.PRICE,
                              sp.PRICE_EFFECTIVE_DATE,
                              sp.LAST_MODIFIED_DATE,
                              sp.CREATED_DATE,
                              CASE
                                 WHEN dp.product_id IS NULL
                                 THEN
                                    1
                                 WHEN (dp.product_category_id !=
                                          sp.product_category_id
                                       OR dp.product_name != sp.product_name
                                       OR DP.PRODUCT_DESCRIPTION !=
                                             sp.product_description
                                       OR dp.price != sp.price
                                       OR dp.price_effective_date !=
                                             sp.price_effective_date)
                                 THEN
                                    2
                                 ELSE
                                    0
                              END
                                 AS scd_row_type_id
                         FROM    stg_product sp
                              LEFT JOIN
                                 Dim_product dp
                              ON (sp.product_id = dp.product_id and  dp.is_current_row = 'Y')
                     ) m
                   JOIN  scd_row_type s
                   ON (s.scd_row_type_id <= m.scd_row_type_id)


Examine the result set, it should look something like:

PRODUCT_SID PRODUCT_ID PRODUCT_CATEGORY_ID PRODUCT_NAME         PRODUCT_DESCRIPTION                                 PRICE   PRICE_EFFECTIVE_DATE LAST_MODIFIED_DATE  CREATED_DATE   SCD_ROW_TYPE_ID
-6789            1            8            Apple iPad 36 GB     Apple iPad 36 GB with the latest createst           700    4/19/2011 2:11        4/19/2011 2:11     4/19/2011 2:10   2
62               1            8            Apple iPad 36 GB     Apple iPad 36 GB with the latest createst           700    4/19/2011 2:11        4/19/2011 2:11     4/19/2011 2:10   2
                 3            6            EPSon Color Printer  EPSon color print with high definition photo prints 200    4/19/2011 2:11        4/19/2011 2:11     4/19/2011 2:11   1

As you can see, we have two rows for product_id 1, and one row for the new record, product_id 3. Also, at the end of the row, you can also see that the scd_record_type_id column has two ids for product id 1. One is 2 and the other one is 1. for the one the that has a scd_record_type_id equql to 1, the product_sid is changed to –6789. Therefore, this record will be used for insert. The logic that define which record to update or insert was defined in the case statement:

(CASE
   WHEN dp.product_id IS NULL
     THEN  1
   WHEN (dp.product_category_id != sp.product_category_id
          OR dp.product_name != sp.product_name
          OR DP.PRODUCT_DESCRIPTION != sp.product_description
          OR dp.price != sp.price
          OR dp.price_effective_date != sp.price_effective_date)
     THEN  2
   ELSE    0
 END) AS scd_row_type_id


If the product_id is not existing in the dim_product table, it is a new record. Therefore, it is assigned a 1 and it will be inserted. If any of the changes occurred to an existing record, it is an update and is assigned a 2. Otherwise, any identical record will be assign a 0. The join statement:

JOIN  scd_row_type s
                   ON (s.scd_row_type_id <= m.scd_row_type_id)

takes only record with a scd_record_type_id that is 1 or greater since the scd_record_type table only have two records, scd_record_type_id 1 and 2. Therefore, any record that with a scd_record_type = 0 (identically records) will not be included in the final result set. To create a type 2 record, which is a existing record that has one or more new attributes, we change the existing product_sid to a dummy number, -6789, by using the decode statement:

DECODE(s.scd_row_type_id,1,-6789,m.product_sid) as product_sid

We choose this number because we had define the surrogate key of the dim_product table to only use positive value for keys. Hence, it will be inserted into the dimension. The "when matched" and "when not matched" clauses in the merge statement are self-explanatory. We matched the record set with the dim_product table on the product_sid key. If it matched, we set the is_current_row flag to ‘N’ and sent the effective_end_date to the last_modified_date. If it was not matched, records will be insert into the dimension table. Now, run the merge statement in section 2 to load the data into the dimension table.

4. Summary

As it is stated in the Oracle documentation, the MERGE statement can be used to select rows from one or more sources for update or insertion into a table or view. You can specify conditions to determine whether to update or insert into the target table or view.

This statement is a convenient way to combine multiple operations. It lets you avoid multiple INSERT, UPDATE, and DELETE DML statements. Therefore, it is helpful in loading data into a slowly changing dimension type 2 table.

MERGE is a deterministic statement. You cannot update the same row of the target table multiple times in the same MERGE statement. Oracle Database does not implement fine-grained access control during MERGE statements.

 If you are using the fine-grained access control feature on the target table or tables, then use equivalent INSERT and UPDATE statements instead of MERGE to avoid error messages and to ensure correct access control.

Reference:


Oracle Online Documentations

Oracle Database 11g The Complete Reference,Kevin Loney, Oracle Press

The Data Warehouse toolkit, Ralph Kimball and Margy Ross

The Data Warehouse ETL Toolkit, Ralph Kimball and Joe Caserta

Other Reference on the Web

Related Note:

SQL Outer Join

Thursday, March 10, 2011

SQL Outer Join Query

Outer Join Query

SQL OUTER JOIN returns all records from both joined tables even there is no matching record found. There are three types of SQL OUTER JOIN:
  • LEFT OUTER JOIN
  • RIGHT OUTER,
  • FULL OUTER JOIN
In Outer Join, preserved table means that all the rows in the rows in preserved table will be returned from the query results with at least one row. Let's use the following two tables to show the different type of outer joins.



Employee table: employee id and name

Emp_idName
100John
110Mary
111Smith
112Bob
113Jim
114Joan
115Jill

Employee_hours_worked table: daily total hours each employee worked.

Emp_idDateTotal_hours_worked
12009-12-041
1002010-01-123
1102010-01-125
1132010-01-132
1112010-01-134
1132010-01-136
1102010-01-132
1. Left Outer Join

Left outer join preserve the table in the left side. Every row in the preserved table is presented in the result set with at least one result row.

select e.emp_id,e.name, w.date,w.total_hours_worked
from employee e
left join employee_hours_worked w
on e.emp_id = w.emp_id

emp_idnamedatetotal_hours_worked
100 John 1/12/20103
110Mary1/12/20105
110Mary1/13/20102
111Smith1/13/20104
112bob
113Jim1/12/20102
113Jim1/13/20106
114Joan
114Jill

Put search predicate (filter) in the WHERE clause in a left outer join query will filter out the result set.

select e.emp_id,e.name, w.date,w.total_hours_worked
from employee e
left join employee_hours_worked w
on e.emp_id = w.emp_id
where w.total_hours_worked <=4

emp_idnamedatetotal_hours_worked
110Joan1/12/20103
110Mary1/13/20102
111Smith1/13/20104
113Jim1/12/20102

Put search predicate in the ON clause in a left outer join query will filter out only the matched records in the non preserved table.

select e.emp_id,e.name, w.date,w.total_hours_worked
from pcdb.employee e
left join pcdb.employee_hours_worked w
on e.emp_id = w.emp_id and w.total_hours_worked <=4
emp_id name date total_hours_worked

emp_idnamedatetotal_hours_worked
100john1/12/20103
110Mary1/13/20102
111Smith1/13/20104
112Bob
113Jim1/12/20102
114Joan
114Jill

2 Right Outer Join

In Right outer join, the preserved table is on the right with is opposite to Left outer Join.

select e.emp_id,e.name, w.date,w.total_hours_worked
from pcdb.employee e
right outer join pcdb.employee_hours_worked w
on e.emp_id = w.emp_id

emp_idnamedatetotal_hours_worked
100John1/12/20103
110Mary1/12/20105
113Jim1/12/20102
111Smith1/13/20104
113Jim1/13/20102
110Mary1/13/20102
12/4/20091

Notice that the result set include the record that has not emp_id and name (last record).

3. Full Outer Join

Full Outer Join preserved both tables

select e.emp_id,e.name, w.date,w.total_hours_worked
from pcdb.employee e
full outer join pcdb.employee_hours_worked w
on e.emp_id = w.emp_id

emp_idnamedatetotal_hours_worded
100John1/12/20103
110Mary1/12/20105
110Mary1/13/20102
111Smith1/13/20104
112Bob
113Jim1/12/20102
113Jim1/13/20106
114Joan
115Jim
12/4/20091

If a database server (like MySQL) dose not support full outer join, the query can be written with an union and a where clasue.

select e.emp_id,e.name, w.date,w.total_hours_worked
from pcdb.employee e
left outer join pcdb.employee_hours_worked w
on e.emp_id = w.emp_id
union
select e.emp_id,e.name, w.date,w.total_hours_worked
from pcdb.employee e
right outer join pcdb.employee_hours_worked w
on e.emp_id = w.emp_id
where e.emp_id is null

The where clause in the second query ,'where e.emp_id is null', keep all the the records that only exists in the right table, employee_hours_worked.

Reference


Friday, April 30, 2010

SQL: SELECT THE FIRST N RECORDS

Assuming that we want to select the first 8 registrations from the following sample User table

IdFirst_NameLast_NameCityRegistration_Date
1JohnSmithNew York2009-01-01
2AdamKingLondon2008-12-01
3LinLinBeijing2009-12-01

Let's start with a MySQL query, which will use the LIMIT clause:

     Select * from user order by registration_date limit 8;

The above query will give the first 8 users by registration date.

Now, if we want to have the first 8 users from New York, you can change the query to this:

     Select * from user WHERE city=’New York’ ORDER BY registration_date limit 8;

Now let’s try the following query:

    SELECT * FROM
       ( SELECT * FROM user ORDER BY registration_date LIMIT 8 ) As  first_8
    WHERE city='New York';

This above query result actually returns the users from New York from the first 8 registrations. The number of records return will be between 0 and 8 in reality because the WHERE clause filter the records from the sub query result set.

Since each database venders use different non-standard result limit syntax, the above query needs to rewrite to accommodate the particular syntax . For example, in Oracle, the query can be rewritten in follow form:

   Select * from user order by registration_date where rownum >= 8;

And in SQL Server, it can be written like this:

   Select TOP 8 * from user order by registration_date;
 Reference:



Monday, February 23, 2009

SQL: Find Maximum and Minimum Values

Table Fruit

id fruit    cultivar       price
1 Apple FUJI          0.99
2 Apple limbertwig  1.50
3 Apple Macintosh  2.00
4 Orange valencia    1.29
5 Orange Sunkis      1.99
6 Pear bradford       2.99
7 Pear Bartlett         2.45
8 cherry Chelan       3.89
9 Cherry Bang         1.99

1. Find the most expensive (maximum price) fruit of each type. List them in a form of

fruit cultivar price
Apple Macintosh 2.00
cherry Chelan 3.89
Orange Sunkist 1.99
Pear Bradford 2.99


1) Use max() function

Most databases, MySql, Oracle, MS SQL Server, etc, has build in function to get the max value.

a) We can use the max() function to get
the maximum price of each fruit and then use a self-join to get the row.

select f.fruit, f.cultivar, f.price
from (
select fruit, max(price) as maxprice
from fruit group by fruit
) as f2 inner join fruit as f on f.fruit = f2.fruit and f.price = f2.maxprice;


b) We can also use the max() function with a correlated subquery.
This can be much less efficient.

select fruit, cultivar, price
from fruit f
where price = (select max(price) from fruit as f2 where f2.fruit = f.fruit);

2) Without using the max() function

We can also use only self-join to get the result.

SELECT f.fruit, f.cultivar, f.price
FROM fruit f
LEFT JOIN fruit f2 ON (f.fruit = f2.fruit
AND f.price <= f2.price)
GROUP BY f.fruit, f.price
 HAVING COUNT(*) = 1 ;

The following query is for demonstration only. It is an inefficient query. Also, it will not produce correct result if there are more than one record with the same value (price).

select fruit, cultivar, price
from fruit f
where price = (select price from fruit as f2 where f2.fruit = f.fruit order by price desc limit 1);

2. Find the Minimum Price of each fruit

1) Use min() function a) using a self-join to get the row.
select f.fruit, f.cultivar, f.price
  from ( select fruit, min(price) as minprice from fruit group by fruit ) as f2
 inner join fruit as f
 on f.fruit = f2.fruit and f.price = f2.minprice;

b) using a correlated subquery.

select fruit, cultivar, price
 from fruit f
where price = (select min(price) from fruit as f2 where f2.fruit = f.fruit);

2) Without using the min() function

SELECT f.fruit, f.cultivar, f.price
FROM fruit f
LEFT JOIN fruit f2 ON (f.fruit = f2.fruit AND f.price = f2.price)
GROUP BY f.fruit, f.price
HAVING COUNT(*) = 1;