Saturday, March 14, 2009

DBA and Datawarehousing






An Introduction to Oracle Warehouse Builder 10g

by Mark Rittman


Zhilst many Oracle DBAs build and maintain their data warehouses using nothing more than UNIX, SQL, and PL/SQL scripts, many are now starting to use graphical tools such as Oracle Warehouse Builder 10g (OWB) to help with development. What do tools such as OWB10g offer the DBA, and how can they help with the warehouse design and build?


Introduction
The Oracle database has many features that make it well suited to data warehousing, including support for very large databases, automated summary management, and an embedded multidimensional OLAP engine. Recent versions of Oracle have come with built-in extraction, transformation, and load (ETL) features, and it is possible to build an Oracle data warehouse using just these features, and SQL*Plus.


DBAs will, however, probably be aware that a number of vendors, including Oracle, offer tools and applications that are designed to assist with the design, build, and load of data warehouses — a type of application known as an ETL tool, with Oracle Oracle’s own offering being Oracle Warehouse Builder 10g. So, what does Oracle Warehouse Builder offer the Oracle DBA, and how can it help with the warehouse project lifecycle?


Product Architecture
Oracle Warehouse Builder is a GUI tool used for building Oracle data warehouses and data marts. Written in Java and currently now at version 10, versions are generally provided for Microsoft Windows, Intel Linux, and Solaris/HP-UX/AIX/Tru-64. OWB is licensed as part of the 10g Developer Suite, though as versions are generally released more frequently than updates to 10gDS, it’s best to check on the OTN download site to obtain the latest version.


OWB10g consists of two main components: the design environment, which is concerned with working with metadata (abstract representations of warehouse objects, together with business rules), and the runtime environment, which takes this metadata and turns it into physical database objects and process flows.




The design environment consists of the OWB client and a number of utility programs used for setting up the runtime environment, reporting on the repository, and providing a command-line interface for OWB.



All of the client tools work against metadata in the OWB repository, which is usually installed in its own schema on a development server. OWB comes with a Web-based reporting environment, which uses a desktop installation of OC4J to provide reports against the metadata in the repository.
Once the DBA has built the logical model in the repository, he needs to translate it into a physical design using the OWB Runtime component. This runtime component creates physical database objects, together with scripts and packages to extract data from source system, and populate the warehouse objects. OWB10g can extract data from any database that has either an ODBC driver, or an Oracle Gateway, together with flat files and data held in SAP. In addition, customers who license the business intelligence functionality for Oracle Applications are provided with a customized version of OWB, for extraction of applications data using business terms.


Graphical Object Definition
OWB10g allows you to create source modules that import data from flat files, relational databases, or SAP. When deployed, these import routines consist of SQL*Loader scripts, external table definitions, database links through to Oracle databases and databases accessed through Generic Connectivity or Oracle Gateways, and ABAP programs for extracting data out of SAP. Import definitions are defined using the OWB client graphical user interface, saved to the metadata repository, and generated using the OWB runtime component.



These OWB source modules are then used later on in the process as the start point for data transformations, which eventually go on to load the warehouse itself. As such, although a graphical user interface is used to define these sources, an experienced DBA or developer familiar with SQL*Loader syntax or Oracle DDL could probably hand-code the equivalent scripts in a shorter amount of time, as could an ABAP programmer working in a SAP environment. However, by creating source definitions within the OWB client, the definitions of these sources are captured within the central repository, providing a form of documentation as well as a way of automatically reconciling these definitions with possible changes in the underlying source data. Likewise, OWB provides a graphical interface for building warehouse objects, either as regular Oracle tables or with the additional ROLAP metadata available with the Enterprise Edition of Oracle 8i, 9i, and 10g.
Source-to-Target Data Mappings
Although a GUI interface for defining warehouse objects is welcome, the big productivity gain is through the mapping and transformation capability within OWB10g. OWB allows the DBA to define a mapping canvas that has one or more data sources and targets, with data being processed through SQL transformations or PL/SQL functions.
The palette on the right-hand side of the mapping allows you to drop tables, views, external tables, flat files, and materialized views onto the mapping canvas; draw data mappings between the objects; and apply a range of SQL and PL/SQL transformations to the data, including the ability to use pipelined table functions. As part of the OWB10g package, a number of additional PL/SQL transformations are provided, including a name-and-address cleansing feature that matches addresses in the warehouse with third-party name and address files, a “match-merge” feature, PL/SQL transformations that utilize the XML Toolkit, and ones used for loading data into analytic workspaces
How Efficient is the Generated Code?
ETL tools such as OWB10g usually work by generating program code that is then executed to deploy and load the warehouse. A common DBA concern about such an approach is that the generated code may not as efficient, or optimized, as code that the DBA writes himself. So what does OWB10g-generated code look like?
Firstly, it’s worth understanding that as Oracle has chosen to only support the enterprise editions of Oracle 8i, 9i, and 10g as deployment platforms, the code that is generated is Oracle SQL and PL/SQL, and leverages the data warehousing features of the Oracle database. For example, when deploying table definitions, all options such as parallel access mode, partitioning, logging mode, and so on, are available to use, and the default way to build warehouse objects is to use the ROLAP functionality (dimensions, facts, hierarchies, and so on) that comes with the enterprise edition. When deploying data mappings, OWB first tries to generate the mapping as a set-based SQL update, only falling back to a cursor-based PL/SQL update if a single SQL statement cannot be used. For a regular SQL update, the only code that OWB adds to the base SQL statement is a PL/SQL wrapper that runs the code within a package and logs the mapping results to a set of audit tables.
To take an example, a simple mapping that inserts or updates a dimension table from an external table would be built using the following OWB data mapping:
If we were writing this manually and deploying to an Oracle9i or 10g database, we would use the MERGE command to merge into the underlying CHANNEL table by selecting from the CHANNELS_EXT external table, and, if possible, use parallelism and direct path inserts. Looking at the code generated by OWB, this is indeed the approach that the tool takes:


CREATE OR REPLACE PACKAGE "MAP_CHANNEL" ASsql_stmt VARCHAR2(32767);get_abort BOOLEAN := FALSE;get_trigger_success BOOLEAN := TRUE;get_errors NUMBER(22) := 0;get_status NUMBER(22) := 0;-- Status variable for Batch cursors"CHANNEL_St" BOOLEAN;-- Function Main -- Entry point in package "MAP_CHANNEL"FUNCTION Main RETURN NUMBER; END "MAP_CHANNEL";/CREATE OR REPLACE PACKAGE BODY "MAP_CHANNEL" AS------------------------------------------------------------------------- Function "CHANNEL_Bat"-- performs batch extraction-- Returns TRUE on success-- Returns FALSE on failure------------------------------------------------------------------------FUNCTION "CHANNEL_Bat" RETURN BOOLEAN ISBEGIN EXECUTE IMMEDIATE 'ALTER SESSION ENABLE PARALLEL DML'; BEGIN MERGE /*+ APPEND PARALLEL(CHANNEL, DEFAULT, DEFAULT) */ INTO "CHANNEL" USING (SELECT "CHANNELS_EXT"."CHANNEL_CLASS" "CHANNEL_CLASS", "CHANNELS_EXT"."CHANNEL_DESC" "CHANNEL_DESC", "CHANNELS_EXT"."CHANNEL_ID" "CHANNEL_ID" FROM "CHANNELS_EXT" "CHANNELS_EXT" ) "MERGEQUERY_186" ON ( "CHANNEL"."CNL_ID" = "MERGEQUERY_186"."CHANNEL_ID" ) WHEN NOT MATCHED THEN INSERT ("CHANNEL"."CLS_CLASS_NAME", "CHANNEL"."CNL_NAME", "CHANNEL"."CNL_ID") VALUES ("MERGEQUERY_186"."CHANNEL_CLASS", "MERGEQUERY_186"."CHANNEL_DESC", "MERGEQUERY_186"."CHANNEL_ID") WHEN MATCHED THEN UPDATE SET "CLS_CLASS_NAME" = "MERGEQUERY_186"."CHANNEL_CLASS", "CNL_NAME" = "MERGEQUERY_186"."CHANNEL_DESC"; COMMIT; EXCEPTION WHEN OTHERS THEN ROLLBACK; COMMIT; RETURN FALSE; END; COMMIT; RETURN TRUE; END "CHANNEL_Bat"; FUNCTION Main RETURN NUMBER IS get_batch_status BOOLEAN := TRUE; BEGIN -- Initialize all batch status variables "CHANNEL_St" := FALSE; "CHANNEL_St" := "CHANNEL_Bat"; RETURN get_status; END Main; END "MAP_CHANNEL"; /





No comments:

Post a Comment