Data Warehousing Technology (An overview of data warehousing and OLAP technology)

Uploaded 2009-02-28 23:30:44 · 233 KB PDF

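### Overview of Data Warehousing Technology

#### 1. Introduction and Background

Data warehousing is one of the key technologies behind decision support systems, and it has drawn increasing attention in the database industry in recent years. Commercial products and services keep appearing, and nearly every major database management system vendor now offers solutions in this area. Compared with traditional on-line transaction processing (OLTP) applications, decision support places different requirements on database technology. This paper gives a broad overview of data warehousing and on-line analytical processing (OLAP) technologies, with an emphasis on the new requirements they impose and on where they are heading.

#### 2. Overview of Data Warehousing Technology

##### 1. The Concept of a Data Warehouse

A data warehouse is a subject-oriented, integrated, time-varying, and non-volatile collection of data used to support managerial decision making. It usually holds data from one or more business systems, stored after preprocessing steps such as cleaning, transformation, and integration.

##### 2. Back-End Tools: Data Extraction, Cleaning, and Loading

Building a data warehouse starts with extracting data from the various source systems. This data often has inconsistent formats and uneven quality, so it must be cleaned before use. The cleaned data is then loaded into the warehouse. The whole process of extraction, transformation (including cleaning), and loading is known as ETL (Extract, Transform, Load).

##### 3. Multidimensional Data Model

The core of OLAP technology is the multidimensional data model, which lets users query and analyze data from several perspectives. Common relational representations of this model are the star schema, in which each dimension is kept in a single denormalized table, and the snowflake schema, which normalizes dimension hierarchies into separate tables. Each suits different scenarios and requirements.

##### 4. Front-End Tools: Querying and Data Analysis

To make querying and analysis convenient, data warehouse systems usually come with a set of front-end tools. These tools provide intuitive graphical interfaces, support complex queries, and can present analysis results as charts and other visualizations.

##### 5. Server Extensions: Efficient Query Processing

To improve query performance, data warehouse systems also need efficient query processing. This includes optimizing query execution plans and using indexing and partitioning to speed up data access.

##### 6. Metadata Management and Warehouse Management Tools

Metadata is data about data: it describes the structure, source, update frequency, and other properties of the data. Effective metadata management is essential for keeping the data warehouse consistent and complete. A complete set of tools is also needed to manage and monitor the entire warehouse environment.

#### 3. Introduction to OLAP Technology

##### 1. Definition of OLAP

On-line analytical processing (OLAP) is the ability to deliver multidimensional views of data quickly, so that users can carry out complex analyses rapidly. OLAP systems are usually combined with a data warehouse to provide strong decision support.

##### 2. Types of OLAP

Depending on how it is implemented, OLAP falls into three types: MOLAP (multidimensional OLAP), ROLAP (relational OLAP), and HOLAP (hybrid OLAP). Each has its own characteristics and typical use cases.

- **MOLAP**: data is stored as multidimensional arrays, which suits complex analysis and computation.
- **ROLAP**: data is stored in relational databases, which suits processing of large data sets.
- **HOLAP**: combines the strengths of MOLAP and ROLAP, storing some data in multidimensional form and some in relational tables.

##### 3. OLAP Operations

OLAP systems provide a range of operations, including slice, dice, pivot, and drill-down/roll-up, which let users analyze data in depth along different dimensions.

#### 4. Application Areas

Data warehousing technology has been widely applied across many industries:

- **Manufacturing**: order shipment, customer support;
- **Retail**: user profiling, inventory management;
- **Financial services**: claims analysis, risk analysis, credit card analysis, fraud detection;
- **Transportation**: fleet management;
- **Telecommunications**: call analysis, fraud detection;
- **Utilities**: power usage analysis;
- **Healthcare**: outcomes analysis.

#### 5. Conclusions and Future Research Directions

Although data warehousing and OLAP technologies have advanced significantly, they still face challenges and open problems, such as handling massive data volumes more effectively, improving query response times, and better supporting real-time analysis. These problems are important directions for future research and key drivers of further progress in data warehousing.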


An Overview of Data Warehousing and OLAP Technology

Surajit Chaudhuri, Microsoft Research, Redmond, surajitc@microsoft.com

Umeshwar Dayal, Hewlett-Packard Labs, Palo Alto, dayal@hpl.hp.com

Abstract

Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications. This paper provides an overview of data warehousing and OLAP technologies, with an emphasis on their new requirements. We describe back end tools for extracting, cleaning and loading data into a data warehouse; multidimensional data models typical of OLAP; front end client tools for querying and data analysis; server extensions for efficient query processing; and tools for metadata management and for managing the warehouse. In addition to surveying the state of the art, this paper also identifies some promising research issues, some of which are related to problems that the database research community has worked on for years, but others are only just beginning to be addressed. This overview is based on a tutorial that the authors presented at the VLDB Conference, 1996.

1. Introduction

Data warehousing is a collection of decision support technologies, aimed at enabling the knowledge worker (executive, manager, analyst) to make better and faster decisions. The past three years have seen explosive growth, both in the number of products and services offered, and in the adoption of these technologies by industry. According to the META Group, the data warehousing market, including hardware, database software, and tools, is projected to grow from $2 billion in 1995 to $8 billion in 1998. Data warehousing technologies have been successfully deployed in many industries: manufacturing (for order shipment and customer support), retail (for user profiling and inventory management), financial services (for claims analysis, risk analysis, credit card analysis, and fraud detection), transportation (for fleet management), telecommunications (for call analysis and fraud detection), utilities (for power usage analysis), and healthcare (for outcomes analysis). This paper presents a roadmap of data warehousing technologies, focusing on the special requirements that data warehouses place on database management systems (DBMSs).

A data warehouse is a “subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.” Typically, the data warehouse is maintained separately from the organization’s operational databases. There are many reasons for doing this. The data warehouse supports on-line analytical processing (OLAP), the functional and performance requirements of which are quite different from those of the on-line transaction processing (OLTP) applications traditionally supported by the operational databases.

OLTP applications typically automate clerical data processing tasks such as order entry and banking transactions that are the bread-and-butter day-to-day operations of an organization. These tasks are structured and repetitive, and consist of short, atomic, isolated transactions. The transactions require detailed, up-to-date data, and read or update a few (tens of) records accessed typically on their primary keys. Operational databases tend to be hundreds of megabytes to gigabytes in size. Consistency and recoverability of the database are critical, and maximizing transaction throughput is the key performance metric. Consequently, the database is designed to reflect the operational semantics of known applications, and, in particular, to minimize concurrency conflicts.

Data warehouses, in contrast, are targeted for decision support. Historical, summarized and consolidated data is more important than detailed, individual records. Since data warehouses contain consolidated data, perhaps from several operational databases, over potentially long periods of time, they tend to be orders of magnitude larger than operational databases; enterprise data warehouses are projected to be hundreds of gigabytes to terabytes in size. The workloads are query intensive with mostly ad hoc, complex queries that can access millions of records and perform a lot of scans, joins, and aggregates. Query throughput and response times are more important than transaction throughput.

To facilitate complex analyses and visualization, the data in a warehouse is typically modeled multidimensionally. For example, in a sales data warehouse, time of sale, sales district, salesperson, and product might be some of the dimensions of interest. Often, these dimensions are hierarchical; time of sale may be organized as a day-month-quarter-year hierarchy, product as a product-category-industry hierarchy. Typical OLAP operations include rollup (increasing the level of aggregation) and drill-down (decreasing the level of aggregation or increasing detail) along one or more dimension hierarchies, slice_and_dice (selection and projection), and pivot (re-orienting the multidimensional view of data).
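The rollup, drill-down, slice, dice, and pivot operations just listed can be sketched over a tiny in-memory fact table. This is a minimal illustration, not how an OLAP server implements them. The `FACTS` rows, the `CATEGORY` mapping, and the helper names (`aggregate`, `slice_facts`, `dice_facts`, `pivot`) are hypothetical. Only the selection part of slice-and-dice is shown, and just two of the example dimensions (time and product) are used for grouping.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales facts: (day of sale, product, sales district, amount).
FACTS = [
    (date(1996, 1, 15), "laptop", "West", 1200.0),
    (date(1996, 2, 3), "printer", "West", 300.0),
    (date(1996, 2, 20), "laptop", "East", 1100.0),
    (date(1996, 4, 9), "printer", "East", 250.0),
    (date(1996, 5, 30), "laptop", "West", 1300.0),
]

# Hypothetical product -> category step of the product hierarchy.
CATEGORY = {"laptop": "computers", "printer": "peripherals"}

# Levels of the day -> month -> quarter -> year time hierarchy.
TIME_LEVELS = {
    "day": lambda d: d,
    "month": lambda d: (d.year, d.month),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "year": lambda d: d.year,
}


def aggregate(facts, time_level, product_level="product"):
    """Sum amounts grouped by one level of the time and product hierarchies.

    Regrouping at a coarser level (month -> quarter) is a rollup; regrouping
    at a finer level (quarter -> month) is a drill-down. Districts are summed out.
    """
    to_time = TIME_LEVELS[time_level]
    totals = defaultdict(float)
    for day, product, _district, amount in facts:
        prod = CATEGORY[product] if product_level == "category" else product
        totals[(to_time(day), prod)] += amount
    return dict(totals)


def slice_facts(facts, district):
    """Slice: fix the district dimension to a single value."""
    return [f for f in facts if f[2] == district]


def dice_facts(facts, districts, products):
    """Dice: keep the sub-cube for a set of districts and a set of products."""
    return [f for f in facts if f[2] in districts and f[1] in products]


def pivot(view):
    """Pivot: swap the row and column axes of a 2-D view keyed by (row, col)."""
    return {(col, row): value for (row, col), value in view.items()}
```

For example, `aggregate(FACTS, "quarter", "category")` rolls the day-level facts up to quarters and categories, giving 2300.0 for computers in the first quarter of 1996. Calling it again with `"month"` drills back down to monthly detail.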

Given that operational databases are finely tuned to support known OLTP workloads, trying to execute complex OLAP queries against the operational databases would result in unacceptable performance. Furthermore, decision support requires data that might be missing from the operational databases; for instance, understanding trends or making predictions requires historical data, whereas operational databases store only current data. Decision support usually requires consolidating data from many heterogeneous sources: these might include external sources such as stock market feeds, in addition to several operational databases. The different sources might contain data of varying quality, or use inconsistent representations, codes and formats, which have to be reconciled. Finally, supporting the multidimensional data models and operations typical of OLAP requires special data organization, access methods, and implementation methods, not generally provided by commercial DBMSs targeted for OLTP. It is for all these reasons that data warehouses are implemented separately from operational databases.
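The need to reconcile inconsistent codes and formats across sources can be illustrated with a small transform step. This is a sketch under assumed inputs. The two sources (`crm`, `billing`), their gender codes and date formats, and the `Customer` record are all hypothetical. Rejected rows stand in for the data-quality checks that a real cleaning tool would apply.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Customer:
    """Common warehouse representation of a customer (hypothetical schema)."""
    customer_id: str
    gender: str  # normalized to "M" or "F"
    since: date


# Hypothetical code tables: each source encodes gender differently.
GENDER_CODES = {
    "crm": {"M": "M", "F": "F"},
    "billing": {"1": "M", "0": "F"},
}

# Hypothetical date formats used by each source.
DATE_FORMATS = {"crm": "%Y-%m-%d", "billing": "%d/%m/%Y"}


def reconcile(source, raw):
    """Map one raw record to the common format, or return None if it fails validation."""
    cust_id = raw["id"].strip()
    gender = GENDER_CODES[source].get(raw["gender"].strip().upper())
    try:
        since = datetime.strptime(raw["since"].strip(), DATE_FORMATS[source]).date()
    except ValueError:
        return None  # unparseable date: treat as a data-quality failure
    if not cust_id or gender is None:
        return None  # missing key or unknown code
    # IDs are namespaced by source; matching the same customer across
    # sources (duplicate elimination) is not attempted in this sketch.
    return Customer(f"{source}:{cust_id}", gender, since)


def consolidate(feeds):
    """Reconcile records from several sources into clean rows and rejects."""
    clean, rejects = [], []
    for source, rows in feeds.items():
        for raw in rows:
            record = reconcile(source, raw)
            if record is None:
                rejects.append((source, raw))
            else:
                clean.append(record)
    return clean, rejects
```

For instance, a `crm` row with gender `"f"` and date `"1996-03-15"`, and a `billing` row with gender `"1"` and date `"15/03/1996"`, both reconcile to records dated 1996-03-15. A `billing` row written in the CRM date format is rejected.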

Data warehouses might be implemented on standard or extended relational DBMSs, called Relational OLAP (ROLAP) servers. These servers assume that data is stored in relational databases, and they support extensions to SQL and special access and implementation methods to efficiently implement the multidimensional data model and operations.
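To make the ROLAP idea concrete, the sketch below stores a tiny sales cube in an ordinary relational database, using Python's built-in `sqlite3` module. The data is laid out as a star schema: a central fact table whose rows reference time and product dimension tables. All table and column names are hypothetical. SQLite does not offer OLAP-specific SQL extensions such as `GROUP BY ROLLUP` or `CUBE`, so each aggregation level is written as a plain join plus `GROUP BY`. A ROLAP server would add such extensions and specialized access methods on top of this basic pattern.

```python
import sqlite3

# Grouping columns for each level of the time hierarchy. Column names cannot be
# bound as query parameters, so only these fixed strings are spliced into SQL.
LEVEL_COLUMNS = {
    "month": "t.year, t.month",
    "quarter": "t.year, t.quarter",
    "year": "t.year",
}


def build_star_schema():
    """Create an in-memory star schema: one fact table plus two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, day TEXT,
                               month INTEGER, quarter INTEGER, year INTEGER);
        CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY,
                                  name TEXT, category TEXT);
        CREATE TABLE sales_fact (time_id INTEGER REFERENCES time_dim,
                                 product_id INTEGER REFERENCES product_dim,
                                 amount REAL);
        """
    )
    conn.executemany(
        "INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
        [(1, "1996-01-15", 1, 1, 1996),
         (2, "1996-02-20", 2, 1, 1996),
         (3, "1996-05-30", 5, 2, 1996)],
    )
    conn.executemany(
        "INSERT INTO product_dim VALUES (?, ?, ?)",
        [(1, "laptop", "computers"), (2, "printer", "peripherals")],
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?)",
        [(1, 1, 1200.0), (2, 2, 300.0), (2, 1, 1100.0), (3, 1, 1300.0)],
    )
    return conn


def sales_by(conn, time_level):
    """Join the fact table to its dimensions and aggregate at one time level."""
    cols = LEVEL_COLUMNS[time_level]  # raises KeyError for unknown levels
    sql = f"""
        SELECT {cols}, p.category, SUM(s.amount)
        FROM sales_fact AS s
        JOIN time_dim AS t ON s.time_id = t.time_id
        JOIN product_dim AS p ON s.product_id = p.product_id
        GROUP BY {cols}, p.category
        ORDER BY {cols}, p.category
    """
    return conn.execute(sql).fetchall()
```

Moving from `sales_by(conn, "month")` to `sales_by(conn, "quarter")` or `"year"` is a rollup expressed entirely in SQL over relational tables.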

In contrast, multidimensional OLAP (MOLAP) servers are servers that directly store multidimensional data in special data structures (e.g., arrays) and implement the OLAP operations over these special data structures.
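Unlike rows in relational tables, array storage addresses each cell by position. The sketch below uses a deliberately simplified, hypothetical `DenseCube` class and does not reproduce any product's storage format. Each dimension member maps to an index, and a cell's location in one flat row-major array is computed arithmetically from those indices. Aggregating over the other dimensions is then a single scan that decodes positions, with no joins.

```python
import math


class DenseCube:
    """A cube stored as one flat, row-major array of cells (hypothetical layout)."""

    def __init__(self, *dims):
        self.dims = dims  # ordered member lists, one per dimension
        self.index = [{member: i for i, member in enumerate(d)} for d in dims]
        self.cells = [0.0] * math.prod(len(d) for d in dims)

    def _offset(self, coords):
        # Row-major position: for three dimensions, (i0 * n1 + i1) * n2 + i2.
        offset = 0
        for index, dim, member in zip(self.index, self.dims, coords):
            offset = offset * len(dim) + index[member]
        return offset

    def add(self, coords, amount):
        self.cells[self._offset(coords)] += amount

    def get(self, coords):
        return self.cells[self._offset(coords)]

    def total_by(self, axis):
        """Sum out every dimension except `axis`, decoding each cell's position."""
        stride = math.prod(len(d) for d in self.dims[axis + 1:])
        n = len(self.dims[axis])
        totals = [0.0] * n
        for offset, value in enumerate(self.cells):
            totals[(offset // stride) % n] += value
        return dict(zip(self.dims[axis], totals))
```

With months `["1996-01", "1996-02", "1996-03"]`, products `["laptop", "printer"]`, and districts `["East", "West"]`, the array has 12 cells. Loading three sales leaves 9 of them zero, which hints at why sparse data is a concern for array-based storage.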

There is more to building and maintaining a data warehouse than selecting an OLAP server and defining a schema and some complex queries for the warehouse. Different architectural alternatives exist. Many organizations want to implement an integrated enterprise warehouse that collects information about all subjects (e.g., customers, products, sales, assets, personnel) spanning the whole organization. However, building an enterprise warehouse is a long and complex process, requiring extensive business modeling, and may take many years to succeed. Some organizations are settling for data marts instead, which are departmental subsets focused on selected subjects (e.g., a marketing data mart may include customer, product, and sales information). These data marts enable faster roll out, since they do not require enterprise-wide consensus, but they may lead to complex integration problems in the long run, if a complete business model is not developed.

In Section 2, we describe a typical data warehousing architecture, and the process of designing and operating a data warehouse. In Sections 3-7, we review relevant technologies for loading and refreshing data in a data warehouse, warehouse servers, front end tools, and warehouse management tools. In each case, we point out what is different from traditional database technology, and we mention representative products. In this paper, we do not intend to provide comprehensive descriptions of all products in every category. We encourage the interested reader to look at recent issues of trade magazines such as Databased Advisor, Database Programming and Design, Datamation, and DBMS Magazine, and vendors’ Web sites for more details of commercial products, white papers, and case studies. The OLAP Council is a good source of information on standardization efforts across the industry, and a paper by Codd, et al. defines twelve rules for OLAP products. Finally, a good source of references on data warehousing and OLAP is the Data Warehousing Information Center.

Research in data warehousing is fairly recent, and has focused primarily on query processing and view maintenance issues. There still are many open research problems. We conclude in Section 8 with a brief mention of these issues.

2. Architecture and End-to-End Process

Figure 1 shows a typical data warehousing architecture.

Figure 1. Data Warehousing Architecture. [Diagram: data sources (operational dbs, external sources) → Extract, Transform, Load, Refresh → Data Warehouse and Data Marts → OLAP Servers → Analysis, Query/Reporting, and Data Mining tools; a Metadata Repository and Monitoring & Administration tools support the warehouse.]

It includes tools for extracting data from multiple operational databases and external sources; for cleaning, transforming and integrating this data; for loading data into the data warehouse; and for periodically refreshing the warehouse to reflect updates at the sources and to purge data from the warehouse, perhaps onto slower archival storage. In addition to the main warehouse, there may be several departmental data marts. Data in the warehouse and data marts is stored and managed by one or more warehouse servers, which present multidimensional views of data to a variety of front end tools: query tools, report writers, analysis tools, and data mining tools. Finally, there is a repository for storing and […]

