
OLAP operations include rollup (increasing the level of
aggregation) and drill-down (decreasing the level of
aggregation or increasing detail) along one or more
dimension hierarchies, slice_and_dice (selection and
projection), and pivot (re-orienting the multidimensional view
of data).
Given that operational databases are finely tuned to support
known OLTP workloads, trying to execute complex OLAP
queries against the operational databases would result in
unacceptable performance. Furthermore, decision support
requires data that might be missing from the operational
databases; for instance, understanding trends or making
predictions requires historical data, whereas operational
databases store only current data. Decision support usually
requires consolidating data from many heterogeneous
sources: these might include external sources such as stock
market feeds, in addition to several operational databases.
The different sources might contain data of varying quality, or
use inconsistent representations, codes and formats, which
have to be reconciled. Finally, supporting the
multidimensional data models and operations typical of
OLAP requires special data organization, access methods,
and implementation methods, not generally provided by
commercial DBMSs targeted for OLTP. It is for all these
reasons that data warehouses are implemented separately
from operational databases.
Data warehouses might be implemented on standard or
extended relational DBMSs, called Relational OLAP
(ROLAP) servers. These servers assume that data is stored in
relational databases, and they support extensions to SQL and
special access and implementation methods to efficiently
implement the multidimensional data model and operations.
In contrast, multidimensional OLAP (MOLAP) servers are
servers that directly store multidimensional data in special
data structures (e.g., arrays) and implement the OLAP
operations over these special data structures.
There is more to building and maintaining a data warehouse
than selecting an OLAP server and defining a schema and
some complex queries for the warehouse. Different
architectural alternatives exist. Many organizations want to
implement an integrated enterprise warehouse that collects
information about all subjects (e.g., customers, products,
sales, assets, personnel) spanning the whole organization.
However, building an enterprise warehouse is a long and
complex process, requiring extensive business modeling, and
may take many years to succeed. Some organizations are
settling for data marts instead, which are departmental
subsets focused on selected subjects (e.g., a marketing data
mart may include customer, product, and sales information).
These data marts enable faster roll out, since they do not
require enterprise-wide consensus, but they may lead to
complex integration problems in the long run, if a complete
business model is not developed.
In Section 2, we describe a typical data warehousing
architecture, and the process of designing and operating a
data warehouse. In Sections 3-7, we review relevant
technologies for loading and refreshing data in a data
warehouse, warehouse servers, front end tools, and
warehouse management tools. In each case, we point out
what is different from traditional database technology, and we
mention representative products. In this paper, we do not
intend to provide comprehensive descriptions of all products
in every category. We encourage the interested reader to look
at recent issues of trade magazines such as Databased
Advisor, Database Programming and Design, Datamation,
and DBMS Magazine, and vendors’ Web sites for more
details of commercial products, white papers, and case
studies. The OLAP Council
2
is a good source of information
on standardization efforts across the industry, and a paper by
Codd, et al.
3
defines twelve rules for OLAP products. Finally,
a good source of references on data warehousing and OLAP
is the Data Warehousing Information Center
4
.
Research in data warehousing is fairly recent, and has focused
primarily on query processing and view maintenance issues.
There still are many open research problems. We conclude in
Section 8 with a brief mention of these issues.
2. Architecture and End-to-End Process
Figure 1 shows a typical data warehousing architecture.
Data sources
Operational
dbs
External
sources
Extract
Transform
Load
Refresh
Data Warehouse
Data Marts
Analysis
OLAP
Servers
Data Mining
Query/Reporting
Metadata
Repository
Monitoring & Admnistration
Tools
Serve
Figure 1. Data Warehousing Architecture
It includes tools for extracting data from multiple operational
databases and external sources; for cleaning, transforming
and integrating this data; for loading data into the data
warehouse; and for periodically refreshing the warehouse to
reflect updates at the sources and to purge data from the
warehouse, perhaps onto slower archival storage. In addition
to the main warehouse, there may be several departmental
data marts. Data in the warehouse and data marts is stored
and managed by one or more warehouse servers, which
present multidimensional views of data to a variety of front
end tools: query tools, report writers, analysis tools, and data
mining tools. Finally, there is a repository for storing and