Tuesday, February 5, 2013

Data Warehousing

Question:What is data warehousing?
Answer:A data warehousing is a relational database that is designed for query and analysis rather than for transaction processing.
It usually contains historical data derived from transactional data.
Willam Inmon -- 1.Subject Oriented 2.Integrated 3.Non-Volatile 4.Time Variant.

1.Subject Oriented:DWH are designed to help you analyze data.
2.Integration:Closely related to subject orientation.DWH must put data from disparate sources into consistent.
3.Non-Volatile:Once entered into datawarehouse,it should not change with time.It is used for analysis purpose.
4.Time Variant:Datawarehouse can change over time,it needs large amounts of  data for analysis.

2.Difference between OLTP and Datawarehouse?

1.Very few indexes in OLTP,when compared to DWH.
2.Many Joins in OLTP,when compared to DWH.
3.Normalized DMBS is in OLTP and Denormalized DMBS in Datawarehouse.
4.Dervied data and aggregates are very rare  in OLTP but derived data and aggregates are very common in datawarehouse.


Differences of DWH and OLTP:

Workload -- Datawarehouses are designed to accommodate adhoc queries.You might not know the workload of your data warehouse  in advanvce,so a datawarehouse should be optimized to perform well for a wide variety of possible query opeartions.
OLTP systems support only predefined operations.
Data Modifications:
A data warehouse is updated on a regular basis by the ETL process using bulk data modification techniques.
End users of a datawarehouse do not directly update the data warehouse.
In OLTP systems,end users routinely issue individual data modifications statements to the database.OLTP database is always up to date and reflects the currenct state of each business transaction.

Schema design:
Data warehouses often use denormalized or partially denormalized schemas to optimize query performance.
OLTP systems often use fully normalized schemas to optimize update/insert/delete performance and to guarantee data consistency.
Historical data:
Datawarehouses usually store many months or years of data.
This is to support historical analysis.
OLTP systems usually store data from only a few weeks or months.



3.What is Data Warehousing schemas?

A schema is a collection of database objects,including tables,views,indexes and synonyms.
You can arrange schema objects in the schema models designed for data warehousing in a variety of ways.Most datawarehouses uses a dimensional model.

The model of your source data and the requirements of your users help you design the datawarehouse schema.

Star Schemas:

This is the simplest datawarehouse schema.It is called a star schema becuase the diagram resembles a star,which points radiating from a center.The center of the star consists of one or more fact tables and the points of the star are the dimension tables.
The most natural way to model a data warehouse is as a star schema,only one join establishes between the fact table and any one of the dimension tables.
A star schema optimizes performance by keeping queries simple and providing fast response time.All the information about each level is stored in one row.

4.What are the Data warehousing Objects?

1.Fact tables and dimension tables are two types of objects commonly used in dimensional data warehouse schemas.

Fact Tables:

A fact table typically has two types of columns:those that contain numeric facts and those that are foreign keys to dimension tables.
A fact table contains either detail-level facts or facts  that have been aggregated.
Fact tables that contain aggregated facts are often called summary tables.

A fact table usually contains facts with the same level of aggregation.Though most facts are addivitive,they can also be semi-additive or non-additive.
Additive facts can be aggregated by simple arithmetical addition.Non-additive facts cannot be added at all.
Creating a New Fact Table:
1.You must define a fact table for each star schema.From a modelling standpoint,the primary key of the fact table is usually a composite key that is made up of all its foreign keys.

Dimension Tables:

A dimension is a structure,often composed of one or more hierachies,that categorizes data.Dimensional attributes help to describe the dimensional value.
They are normally descriptive,textual values.Several dimensions,combined with facts,enable  you to answer business questions.Commonly used dimensions are customers,products and time.
Dimension data is typically collected at the lowest level of detail and then aggregated into higher level totals that are more useful for analysis.
These natural rollups or aggregations within a dimension table are called hierarchies.


Hierarchies:

Hierarchies are logical structures that use ordered levels as means of organizing data.A Hierarchy can be used to define data aggregation.For example: In a time dimension,a hierarchy might aggregate data from the month level to the quarter level to the year level.A hierarchy can also be used to define a navigational drill path and to establish a family structure.





















5.What is Logical design in datawarehouses?

1.Specific data content.
2.Relationships within and between groups of data.
3.The system environment supporting your data warehouse.
4.The data transformations required.
5.The frequency with which data is refreshed.

Logical design is more conecptual and abstract than the physical design,looking at logical relationship among the objects.


Creating a logical design :

A logical design is conceptual and abstract.E-R modeling involves identifying the things of importance entities,the properties of these things and how they are related to one another.

The process of logical design involves arranging data into a series of logical relationships called entites and attributes.

An entity represents a chunk of information.In relational databases,an entity often maps to a table.
An attribute is a component of an entity that helps define the  uniqueness of the entity.

In relational database,an attribute maps to a column.


In Physical design,we look at the most effective way of storing and retrieving the objects as well as handling them from a transpoartion and backup/recovery perspective.

No comments:

Post a Comment