Showing posts with label Data Warehouse. Show all posts
Showing posts with label Data Warehouse. Show all posts

Friday, June 15, 2012

Create Date Dimension Table in MySQL

Date Dimension Table

A date dimension is used to stored attributes of a calendar date. By assigning a day foreign key to a fact table makes it more efficient to generate date related query results. There many ways to define the table. More attributes can be added base on the data warehouse model and requirements. Here is a sample table created for MySQL database.

create table dim_day (
 day_key int not null auto_increment,   -- primay key.
 date datetime ,     -- date: 2008-08-18 00:00:00
 date_num int(8),    -- numeric value, in YYYYMMDD, 20080818
 day_num int (2),    -- numeric value, 18
 day_of_year int(4), -- the day of the year 
 day_of_week int(2), -- the day of the week
 day_of_week_name varchar(20), -- day of week name (Monday, Tuesday,etc)
 week_num int (2), --  week of the year 
 week_begin_date datetime,  -- week begin date
 week_end_date datetime, -- week end date
 last_week_begin_date datetime,  -- priore week begin date
 last_week_end_date datetime,   priore week end date
 last_2_week_begin_date  datetime,   -- priore two week begin date
 last_2_week_end_date datetime,  -- priore two ween end date
 month_num int (2) ,  -- month in number, ie. 12
 month_name varchar(20),  -- month in name, ie. December
 YEARMONTH_NUM int(6),  -- year and month in number, ie. 201212
 last_month_num int (2), -- priore month in number, ie. 11
 last_month_name varchar(20), -- priore month in name, November
 last_month_year int(4),  -- priore month in year, 2012
 last_yearmonth_num int(6), -- priore year and month in  number, ie, 2o1211
 quarter_num int (2),  -- quarter in number, ie 4
 year_num int (4), -- year in number, ie, 2012
 created_date timestamp not null  ,  -- date record was created
 updated_date timestamp not null , -- date record was updated
 primary key (day_key)
 );

Here is a porcedure use to populate the table in MySql. It takes two input parameners, start_date and end_date, for example, 1990-01-01 and 2099-12-31.

use dwdb;

drop procedure if exists sp_day_dim;

truncate table dim_day;

delimiter //

CREATE PROCEDURE sp_day_dim (in p_start_date datetime, p_end_date datetime)
BEGIN

  Declare StartDate datetime;
  Declare EndDate datetime;
  Declare RunDate datetime;

-- Set date variables

  Set StartDate = p_start_date; -- update this value to reflect the earliest date that you will use.
  Set EndDate = p_end_date; -- update this value to reflect the latest date that you will use.
  Set RunDate = StartDate;

-- Loop through each date and insert into DimTime table

WHILE RunDate <= EndDate DO

INSERT Into dim_day(
 date ,
 date_num,
 day_num ,
 Day_of_Year,
 Day_of_Week,
 Day_of_week_name,
 Week_num,
 week_begin_date,
 week_end_date,
 last_week_begin_date,
 last_week_end_date,
 last_2_week_begin_date,
 last_2_week_end_date,
 Month_num ,
 Month_Name,
 yearmonth_num,
 last_month_num,
 last_month_name,
 last_month_year,
 last_yearmonth_num,
 Quarter_num ,
 Year_num  ,
 created_date, 
 updated_date 
)
select 
RunDate date
,CONCAT(year(RunDate), lpad(MONTH(RunDate),2,'0'),lpad(day(RunDate),2,'0')) date_num
,day(RunDate) day_num
,DAYOFYEAR(RunDate) day_of_year
,DAYOFWEEK(RunDate) day_of_week
,DAYNAME(RunDate) day_of_week_name
,WEEK(RunDate) week_num
,DATE_ADD(RunDate, INTERVAL(1-DAYOFWEEK(RunDate)) DAY) week_begin_date
,ADDTIME(DATE_ADD(RunDate, INTERVAL(7-DAYOFWEEK(RunDate)) DAY),'23:59:59') week_end_date
,DATE_ADD(RunDate, INTERVAL ((1-DAYOFWEEK(RunDate))-7) DAY) last_week_begin_date
,ADDTIME(DATE_ADD(RunDate, INTERVAL ((7-DAYOFWEEK(RunDate))-7) DAY),'23:59:59')last_week_end_date
,DATE_ADD(RunDate, INTERVAL ((1-DAYOFWEEK(RunDate))-14) DAY) last_2_week_begin_date
,ADDTIME(DATE_ADD(RunDate, INTERVAL ((7-DAYOFWEEK(RunDate))-7) DAY),'23:59:59')last_2_week_end_date
,MONTH(RunDate) month_num
,MONTHNAME(RunDate) month_name
,CONCAT(year(RunDate), lpad(MONTH(RunDate),2,'0')) YEARMONTH_NUM
,MONTH(date_add(RunDate,interval -1 month)) last_month_num
,MONTHNAME(date_add(RunDate,interval -1 month)) last_month_name
,year(date_add(RunDate,interval -1 month)) last_month_year
,CONCAT(year(date_add(RunDate,interval -1 month)),lpad(MONTH(date_add(RunDate,interval -1 month)),2,'0')) Last_YEARMONTH_NUM
,QUARTER(RunDate) quarter_num
,YEAR(RunDate) year_num
,now() created_date
,now() update_date
;
-- commit;
-- increase the value of the @date variable by 1 day

Set RunDate = ADDDATE(RunDate,1);

END WHILE;
commit;
END;
//

Reference:

Thursday, January 26, 2012

Find Oracle Table Size, Number Of Columns and Rows


Query to Find Table Size, Number of Rows and Columns in a Oracle Database

 Here is a query that gives the table size which include number of columns, number of rows and table data size in MB. I found these information are helpful during the data warehouse design phase.

 select col.table_name,
col.col_cnt as column_count,
rc.row_cnt as row_count,
s.size_in_MB as table_size_in_MB
from
(
/* number of columns */
SELECT upper(table_name), COUNT(*) col_cnt,
FROM dba_tab_columns
WHERE owner = 'V500'
group by upper(table_name)
) col

join

(
/* number of rows */
select
table_name,
to_number(extractvalue(xmltype(dbms_xmlgen.getxml('select count(*) c from '||table_name)),'/ROWSET/ROW/C')) as row_cnt
from dba_tables
where (iot_type != 'IOT_OVERFLOW'or iot_type is null)
and owner = 'SCOTT'
) rc
on upper(col.table_name) = upper(rc.table_name)

join

(
/* table size in MB */
SELECT
owner, table_name, (sum(bytes)/1024/1024) size_in_MB
FROM
(SELECT segment_name table_name, owner, bytes
FROM dba_segments
WHERE segment_type = 'TABLE'
and owner = 'SCOTT'
)

group by owner, table_name

) s
on upper(col.table_name) = upper(s.table_name);

Retrieve Oracle Table Column Name And Data Type

Query to Get  Oolumn Names and Data Type of  an Oracle  Table

 
Here is at query that retrieves column names and their data type of Oracle tables in a database. I found it handy to have a searchable copy of the source tables and their column name and data types when doing data profiling, modeling and mapping.

 
select owner,
 table_name,
 column_name,
 data_type || decode(data_type,'DATE', ' ', '(' || data_length || decode(data_scale,null,'', ',' || data_scale) || ')' ) as   Data_Type,
nullable,
 data_default
from dba_tab_columns
where owner in('SCOTT','MARY') ;

 
If dba_tab_columns is not available, table user_tab_colimns will also work.

 

Related Notes:

Monday, November 21, 2011

LOAD SLOWLY CHANGING DIMENSION TYPE 2 USING MYSQL STORED PROCEDURE

LOAD SLOWLY CHANGING DIMENSION TYPE 2 USING MYSQL STORED PROCEDURE

Unlike Oracle (merge statement) or other databases, MySQL does not support “upsert” (perform update or insert in one sql statement). Therefore, it requires some coding to achieve both insert and update when loading a type 2 slowly changing dimension.

Here was my lab exercise  which I used a common dimension, product.

1. Tables.

I created the following tables during my lab exercise.

1.1 Dimension table: dim_product.

This was the product dimension table that we want to keep history of changes of production name, product description, product category and price.
Create table dim_product(
PRODUCT_SID int(12) NOT NULL auto_increment ,
PRODUCT_ID int(12) NOT NULL,
PRODUCT_CATEGORY_ID int(10) not null,
PRODUCT_NAME VARCHAR(50),
PRODUCT_DESCRIPTION VARCHAR(200),
PRICE DECIMAL(10,2),
PRICE_EFFECTIVE_DATE DATEtime,
LAST_MODIFIED_DATE DATEtime,
EFFECTIVE_START_DATE DATEtime,
EFFECTIVE_END_DATE DATEtime,
IS_CURRENT_ROW char(1),
CREATED_DATE DATEtime,
UPDATED_DATE DATEtime,
primary key (product_sid)
);

1.2 Staging table: stg_product.

This was the staging table that we will use to store new product records. Assuming production records will transfer to this table first before load into the product dimension.

create table pcdb.stg_product (
product_id int(12) not null,
product_category_id int(10) not null,
product_name varchar(50),
product_description varchar(200),
price decimal(8,2),
price_effective_date datetime,
last_modified_date datetime,
created_date datetime);

1.3 Tempary table: tmp_product.

This is a temporary table that will be used in the store procedure to process data.

create table tmp_product (
PRODUCT_SID int(12) ,
PRODUCT_ID int(12) NOT NULL,
PRODUCT_CATEGORY_ID int(10) not null,
PRODUCT_NAME VARCHAR(50),
PRODUCT_DESCRIPTION VARCHAR(200),
PRICE DECIMAL(10,2),
PRICE_EFFECTIVE_DATE DATEtime,
LAST_MODIFIED_DATE DATEtime,
created_date datetime,
scd_row_type_id tinyint(2)
);

2. The Stored Procedure

I created a stored procedure to accomplish the SCD type 2 dimension load in MySQL. Here was the simple store procedure.

use labdb
drop procedure if exists sp_dim_product;
delimiter //

CREATE PROCEDURE sp_dim_product()

BEGIN

/* Procedure name: sp_dim_prodcut
Description: user subscriptions 
SCD: Type 2.
Dependence: dim_user_product
Table size: large. keep track changes in product.
*/

Declare v_start_date datetime;
declare v_load_id int;
declare v_error_msg varchar(200);
declare v_proc_name varchar(36) default 'sp_dim_product'; 
declare v_validation tinyint;
declare v_max_data_date datetime;

DECLARE EXIT HANDLER FOR SQLEXCEPTION 
BEGIN 
set v_error_msg =concat('Procedure ',v_proc_name,' Encounter SQLEXCEPTION Error: Error code unknown');
SELECT v_error_msg; 
END;

truncate table tmp_product;

insert into tmp_product(
PRODUCT_SID ,
PRODUCT_ID ,
PRODUCT_CATEGORY_ID ,
PRODUCT_NAME ,
PRODUCT_DESCRIPTION ,
PRICE ,
PRICE_EFFECTIVE_DATE ,
LAST_MODIFIED_DATE ,
EFFECTIVE_START_DATE ,
EFFECTIVE_END_DATE ,
IS_CURRENT_ROW ,
scd_row_type_id ,
)
select 
ifnull(d.product_sid,0),
p.product_id,
p.product_category_id,
p.product_name,
p.product_description,
p.price,
p.price_effective_date,
p.last_modified_date,
p.last_modified_date as effective_start_date,
'2050/12/31' as effective_end_date,
'T',
( case when d.product_sid is null then 1 
when (d.product_sid is not null and
(p.product_category_id <> d.product_category_id or
p.product_name <> d.product_name or
p.product_description <> d.PRODUCT_DESCRIPTION or
p.price <> d.price) 
) then 2
else 0 
end) as scd_row_type_id ,
now()
from stg_product p
left join dim_product d 
on (p.product_id = d.product_id and d.is_current_row = 'T')
;

update dim_product d, tmp_product t
set d.is_current_row = 'F', 
d.EFFECTIVE_END_DATE = t.LAST_MODIFIED_DATE,
updated_date = now()
where d.product_sid = t.product_sid
and t.scd_row_type_id = 2;

insert into dim_product(
PRODUCT_ID ,
PRODUCT_CATEGORY_ID ,
PRODUCT_NAME ,
PRODUCT_DESCRIPTION ,
PRICE ,
PRICE_EFFECTIVE_DATE ,
LAST_MODIFIED_DATE ,
EFFECTIVE_START_DATE ,
EFFECTIVE_END_DATE ,
IS_CURRENT_ROW ,
CREATED_DATE ,
UPDATED_DATE 
)
select PRODUCT_ID ,
PRODUCT_CATEGORY_ID ,
PRODUCT_NAME ,
PRODUCT_DESCRIPTION ,
PRICE ,
PRICE_EFFECTIVE_DATE ,
LAST_MODIFIED_DATE ,
LAST_MODIFIED_DATE ,
EFFECTIVE_END_DATE ,
IS_CURRENT_ROW ,
now() ,
now()
from tmp_product
where scd_row_type_id in (1,2)
;

commit;

END;
//

3 How it works

A slowly changing dimension type 2 (SCD2) is defined to keep track changes of attributes in the underlying source table. While there are many attributes in a table, the business rules decide what changes to track. Here, for demonstration purpose, I choose to keep track of changes of each of the following attributes: production name, product description, and product category and price.

In the procedure, I used left outer join on product id, between the dimension table (existing records) and the staging table (new and modified records). For the new records, the product_sid will be null because they were not in the dimention table and I used a scd_row_type_id=1 flag to indicate that they were new records and assigned each of them with a ‘0’ poduct_sid. For the existing records, there would be a product_sid for each record and I used a scd_row_type_id=2 flag to indicate that they were modified exiting records.

select 
ifnull(d.product_sid,0),
p.product_id,
p.product_category_id,
p.product_name,
p.product_description,
p.price,
p.price_effective_date,
p.last_modified_date,
p.last_modified_date as effective_start_date,
'2050/12/31' as effective_end_date,
'T',
( case when d.product_sid is null then 1 
when (d.product_sid is not null and
(p.product_category_id <> d.product_category_id or
p.product_name <> d.product_name or
p.product_description <> d.PRODUCT_DESCRIPTION or
p.price <> d.price) 
) then 2
else 0 
end) as scd_row_type_id ,
now()
from stg_product p
left join dim_product d 
on (p.product_id = d.product_id and d.is_current_row = 'T')
;

update dim_product d, tmp_product t
set d.is_current_row = 'F', 
d.EFFECTIVE_END_DATE = t.LAST_MODIFIED_DATE,
updated_date = now()
where d.product_sid = t.product_sid
and t.scd_row_type_id = 2;

To identify the modified records, I used a case statement. By following the business rules, I compared each of the attributes to see if any of the attributes were modified. If it was, assigned a ‘2’ to the scd_row_type_id flag.

( case when d.product_sid is null then 1 
when (d.product_sid is not null and
(p.product_category_id <> d.product_category_id or
p.product_name <> d.product_name or
p.product_description <> d.PRODUCT_DESCRIPTION or
p.price <> d.price) )
 then 2

To indicate the current record, I created a column called IS_CURRENT_ROW and set it to ‘T’ it was a current row. For existing (history) records, I set it to the IS_CURRENT_ROW to ‘F’.

update dim_product d, tmp_product t
set d.is_current_row = 'F', 
d.EFFECTIVE_END_DATE = t.LAST_MODIFIED_DATE,
updated_date = now()
where d.product_sid = t.product_sid
and t.scd_row_type_id = 2;


At the end of the procedure, I inserted both the new and modified records which the t.scd_row_type_id was ‘1’ or ‘2’. Please note that records had not been modified, would had scd_row_type_id = 0 and would not be inserted into the dimention table.

insert into dim_product(
PRODUCT_ID ,
PRODUCT_CATEGORY_ID ,
PRODUCT_NAME ,
PRODUCT_DESCRIPTION ,
PRICE ,
PRICE_EFFECTIVE_DATE ,
LAST_MODIFIED_DATE ,
EFFECTIVE_START_DATE ,
EFFECTIVE_END_DATE ,
IS_CURRENT_ROW ,
CREATED_DATE ,
UPDATED_DATE 
)
select PRODUCT_ID ,
PRODUCT_CATEGORY_ID ,
PRODUCT_NAME ,
PRODUCT_DESCRIPTION ,
PRICE ,
PRICE_EFFECTIVE_DATE ,
LAST_MODIFIED_DATE ,
LAST_MODIFIED_DATE ,
EFFECTIVE_END_DATE ,
IS_CURRENT_ROW ,
now() ,
now()
from tmp_product
where scd_row_type_id in (1,2)
;

To test the procedure, first run the following query to insert two records into the staging table:

truncate table stg_product;

insert into stg_product ( PRODUCT_ID, PRODUCT_CATEGORY_ID, PRODUCT_NAME,PRODUCT_DESCRIPTION,PRICE,PRICE_EFFECTIVE_DATE, LAST_MODIFIED_DATE, CREATED_DATE)
values (1,8,'Apple iPad 36 GB','Apple iPad 36 GB with case', 800,now(),now(),now());

insert into stg_product ( PRODUCT_ID, PRODUCT_CATEGORY_ID, PRODUCT_NAME,PRODUCT_DESCRIPTION,PRICE,PRICE_EFFECTIVE_DATE, LAST_MODIFIED_DATE,CREATED_DATE)
values (2,7,'Canon 12 MPIX Digital Camera','Canon 12 MPIX Digital Camera, Zoon Len, Case', 150,now(),now(),now());

commit;


Check the stage tables to make sure the records are there.

MySQL>select * from stg_product;

Now run the store procedure which was created in section 2. After running the procedure, run the following query to check the records in the dim_product table.

MySQL>select * from dim_product;

There should be two rows in the dimension table with the column effective_start_date, effective_end_date and is_current_row populated. Now, let’s find out how the merge statement handles SCD Type 2 records. To begin, run the following queries against the stage table:

update stg_product set price = 700 , price_effective_date = now(), last_modified_date = now() where product_id = 1;
insert into stg_product (
PRODUCT_ID, 
PRODUCT_CATEGORY_ID,
PRODUCT_NAME,
PRODUCT_DESCRIPTION,
PRICE,
PRICE_EFFECTIVE_DATE,
LAST_MODIFIED_DATE,
CREATED_DATE)
values (3,6,'EPSon Color Printer',
'EPSon color print with high definition photo prints', 200,now(),now(),now());
commit;


The above statements updated a record where product_id equal to 1 to change it price to 700, and inserted one more new record, product_id 3, into the stg_product table. To help understand what the records set looks like from the select statement that use the left outer join, let’s copy the query and execute it to get the record set.

select 
ifnull(d.product_sid,0),
p.product_id,
p.product_category_id,
p.product_name,
p.product_description,
p.price,
p.price_effective_date,
p.last_modified_date,
p.last_modified_date as effective_start_date,
'2050/12/31' as effective_end_date,
'T',
( case when d.product_sid is null then 1 
when (d.product_sid is not null and
(p.product_category_id <> d.product_category_id or
p.product_name <> d.product_name or
p.product_description <> d.PRODUCT_DESCRIPTION or
.price <> d.price) 
) then 2
else 0 
end) as scd_row_type_id ,
now()
from stg_product p
left join dim_product d 
on (p.product_id = d.product_id and d.is_current_row = 'T')
;


As you can see, we have one rows for product_id 1, and one row for the new record, product_id 3. Also, at the end of the row, you can also see that the scd_record_type_id column has two ids for product id 1. One is 2 and the other one is 1.

Now, run the stored procedure in section 2 to load the data into the dimension table.

4. Summary

This purpose of this procedure is to show how to use outer join to maintain a Type 2 slow changing dimention in MySQL. This procedure dose not include all the necessary error checking routines. Also, there are other ‘real world’ situations need to be considered when implementing this procedure. For example, if there are large amount of records have been modified for any given time, there may be a performance issue when performing an update. Also, as the dimension table growth bigger, outer join will get slower. Therefore, other techniques need to be used to make the codes more efficient.


Reference:

Related Note:

Thursday, December 02, 2010

OLAP

OLAP and Business Intelligence





Reference:


A Song About Failed Data Warehouse




Saturday, August 07, 2010

DATA WAREHOUSE CONCEPTS

Data Warehouse Concepts

1. Introduction:

Over the years, I had done a few data warehouse projects and read a few data warehouse and business intelligence books and white pages. However, I had never really summarized the concepts, terminologies, and techniques that I had learned. Here, I will start to put together the basic concepts, and terminologies that are frequently referred to and mythologies and techniques that are frequently used and talked about.

I am going to organize the topic base on the following data warehouse concepts. In this note, I will provide summarize version of the concepts. In the future note, I will provide more explanation and example of the each concept.

  •   Data warehouse and data mart
  •  Fact and Dimension
  •  Star Schema and Snowflake Schema
  •  Extract, Transform and Load (ETL)
  •  Meta Data
  •  Cube
2. Data Warehouse and Data Mart
2.1 Data Warehouse

A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect. The goal of the data warehouse is make enterprise data easily accessible for strategic decision making.

There are two well-known authors on data warehousing: Bill Inmon and Ralph Kimball. Following are definitions from each of them:


  • Bill Inmon: A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of  data in support of management’s decisions. 




    • Subject-oriented - Data that gives information about a particular subject instead of about a company's   on-going operations.
    •  Integrated - Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
    • Time-variant - All data in the data warehouse is identified with a particular time period.
    •  Non-volatile - Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business. 


  • Ralph Kimball: The conglomeration of an organization’s data warehouse staging and presentation areas, where operational data is specifically structured for query and analysis performance and ease-of-use.




  • 2.2 Data Mart
     A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. In scope, the data may derive from an enterprise-wide database or data warehouse or be more specialized. The emphasis of a data mart is on meeting the specific demands of a particular group of knowledge users in terms of analysis, content, presentation, and ease-of-use. Users of a data mart can expect to have data presented in terms that are familiar.

    2.3 Data Warehouse vs. Data Mart:
    •  A data warehouse tends to be a strategic but somewhat unfinished concept. The design of a data warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later be used. A data warehouse is a central aggregation of data (which can be distributed physically); 
    • A data mart tends to be tactical and aimed at meeting an immediate need. The design of a data mart tends to start from an analysis of user needs. A data mart is a data repository that may derive from a data warehouse or not and that emphasizes ease of access and usability for a particular designed purpose. 
    3. Fcts and Dimensions
    3.1 Facts

    A fact table is the central table in a star join schema characterized by a composite key, each of whose elements is a foreign key drawn from a dimension table.
    • Represent a business process, i.e., models the business process as an artifact in the data model 
    • Contain the measurements or metrics or facts of business processes. 
    • Most are additive (sales this month), some are semi-additive (balance as of), some are not additive (unit price). 
    • The level of detail is called the “grain” of the table. 
    • Contain foreign keys for the dimension tables include time dimension. 
    • Reolve many-to-many relationships
    3.2 Dimensions
    • Represent the who, what, where, when and how of a measurement/artifact. 
    • Represent real-world entities not business processes. 
    • Give the context of a measurement (subject). 
    • For example for the Sales fact table, the characteristics of the 'monthly sales number' measurement can be a Location (Where), Time (When), Product Sold (What).  
    • The Dimension Attributes are the various columns in a dimension table. In the Location dimension, the attributes can be Location Code, State, Country, Zip code. Generally the Dimension Attributes are used in report labels, and query constraints such as where Country='USA'. The dimension attributes also contain one or more hierarchical relationships.  
    •  Hierarchical relationships.
    4. Star Schemas and Snowflake Schemas
    4.1 Star Schemas

    A start Schema (Dimensional Model) consists a single fact table of metrics surrounded by multiple descriptive dimension tables. A start schema will consist at less on fact table and a few dimensions tables.

      
    4.2 Snowflake Schema

    A snowflake schema is a set of tables comprised of a single, central fact table surrounded by normalized dimensions. The main different between a star schema and a snowflake schema is that star schema tends to have demoralize dimensions while a snowflake schema prefers normalized dimensions.


    5. Extract, Transform and Load (ETL)
    Extract, Transform and Load (ETL) is the name for a process or a group of processes that transport data from source system into the data warehouse or data mart.
    •  Extract is the fist part of an ETL process. It involves extracting the data from different source systems and different data formats.  
    • Transform refers to Appling business rules and logics to the data that was extracted from the source system before loading them into the data warehouse. Data quality assurance can also be performed here. 
    • Load processes load data, after the transform processes, to the data warehouse and data mart which some time also refer to as targets.  
    6. Meta Data

    Meta data literally means "data about data.". It describes the characteristics of a resource. describes how and when and by whom a particular set of data was collected, and how the data is formatted. Metadata is essential for understanding information stored in data warehouses.
    • A data dictionary is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format.
    7. Cube

    A cube contains dimensions, hierarchies, levels, and measures. Each individual point in a cube is referred to as a cell.

    Cubes are OLAP objects consisting of related measures and dimensions that you configure within an Analysis Services database. You can define and configure multiple cubes within a single database and each cube can use some or all of the same dimensions. You can also define a single cube that contains multiple measure groups in the same database rather than defining separate cubes. When you define a cube with multiple measure groups, you need to define how dimensions relate to each measure group and customize, as appropriate, dimension objects within each cube and measure group. When defining a cube, you also define advance cube properties, including calculations, KPIs, actions, partitions and aggregations, perspectives and translations.

    8. Smmary
    The concepts which described above were the basic concepts would start with any data were house project. Understand those concepts provide a starting point in any data warehouse development. Each of the concepts mentioned above represent an important part in the data warehouse development life cycle. These concepts are fundamental to design and implement a data warehouse. Following diagram show each of the concepts tied into in each data warehouse layers.


    References: