Designing a Corporate Information Factory
using the Zachman Architecture Framework
by George Jucan
1. Introduction
A new evolutionary phase in corporate IT policy is spreading across
As some of the old systems become functionally or technologically obsolete, companies are forced to replace them. Most are trying to apply an enterprise view to the new systems, to avoid ending up in the same state they started in.
This correlates with the growing trend of developing Business Intelligence applications to better support corporate strategic needs, over and above the operational needs of the business. This class of OLAP systems, which includes Data Warehouses, Data Marts, Decision Support Systems and so on, requires an enterprise view of the business objectives, business processes, supporting data and enabling applications.
There are many models and methodologies used by industry experts to perform enterprise wide analysis, architecture and deployment. In the Business Intelligence area, Open Data Systems Inc. applies John Zachman’s Enterprise Architecture Framework to W.H. Inmon’s Corporate Information Factory concept to define enterprise wide solutions for our clients.
2. The Corporate Information Factory
Over the last four years, Open Data Systems Inc. has developed and refined its own version of a recommended Corporate Information Factory. It is based on W.H. Inmon’s concept, and practice on client projects has allowed us to customize the generic model to reflect each particular business environment.
Our concept is organized by enterprise data flow, rather than by system type as in Bill Inmon’s concept. The complete model shows all the potential combinations a company can implement, but based on specific business criteria parts of it can easily be eliminated without affecting the architecture of the other components.
The basic architecture presented in Fig. 1 also displays an enterprise wide metadata layer. We actively promote a global metadata repository for the increased business data consistency it provides, as opposed to locally managed metadata layers. In our practice, implementing an enterprise wide metadata management system has usually cost no more than 10% over the cost of local metadata systems, while the benefits for the strategic planning of the enterprise are immeasurable. Moreover, the additional cost pays back in reduced systems integration effort, making global metadata more cost-effective in the long term than its alternative.
Fig.1 – Generic Corporate Information Factory
The main components of the generic CIF model are:
1. Source Systems, mainly represented by the OLTP applications supporting the operational business activities of the enterprise, but also by existing data archives or external data feeds.
2. The Operational Data Store represents the bridge between the OLTP and OLAP environments. The data structure is similar to that of the transactional systems, but basic cleansing and integration processes are performed on the loaded data. For example, master data objects are loaded from the system of reference (the most credible source), and any additional data elements are appended to the structure defined by the system of reference.
To summarize, the Operational Data Store is:
- integrated: data from disparate operational systems is consolidated into a consistent view of the enterprise;
- subject oriented: data is stored grouped by business subject area, rather than for optimal transactional processing;
- volatile: data is continually added, updated and deleted, to provide a snapshot of the current business environment;
- current valued: there is no long term history in an ODS; it usually stores one day/week/month worth of data;
- detail oriented: data in an ODS is at the same level of granularity as the operational systems, with no additional aggregates or summaries.
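The integration rule described for the ODS (master data taken from the system of reference, with additional data elements from other systems appended) can be sketched in a few lines. This is only an illustration under assumed names: the function `integrate_master_record`, the `crm` and `billing` systems and the customer fields are all hypothetical and do not describe any Open Data Systems implementation.

```python
def integrate_master_record(system_of_reference, other_systems):
    """Build one ODS record: reference values win, other systems only add fields."""
    record = dict(system_of_reference)
    for source in other_systems:
        for field, value in source.items():
            # Append only data elements the system of reference does not define.
            record.setdefault(field, value)
    return record


# Hypothetical sample data: the CRM is assumed to be the system of reference.
crm = {"customer_id": 42, "name": "Acme Corp", "segment": "Retail"}
billing = {"customer_id": 42, "name": "ACME CORPORATION", "credit_limit": 50000}

ods_customer = integrate_master_record(crm, [billing])
print(ods_customer)
```

The conflicting `name` from the billing system is ignored, while `credit_limit`, which the reference system does not carry, is appended to the record.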
3. The Enterprise Data Warehouse
The Enterprise Data Warehouse is characterized as:
- integrated: data is stored in an enterprise consolidated view (universal naming conventions, measurements, classifications and so on), even if the source systems are not consistent;
- subject-oriented: all relevant data regarding a business subject area is grouped together;
- non-volatile: once data is loaded it can only be read; users are not allowed to perform any update/delete/insert operations, so it can provide a consistent history;
- time-variant: data is stored for long-term periods, quantified in years; it is not unusual for detailed data to be stored for 5-10 years, and summary data for up to 25 years.
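The contrast between a volatile, current-valued ODS and a non-volatile, time-variant warehouse can be made concrete with a small sketch. The classes `OdsTable` and `WarehouseTable`, the account key and the balances are hypothetical, chosen only to show the two load behaviours side by side.

```python
from datetime import date


class OdsTable:
    """Volatile, current-valued: each load overwrites the previous value."""

    def __init__(self):
        self.rows = {}

    def load(self, key, value):
        self.rows[key] = value


class WarehouseTable:
    """Non-volatile, time-variant: each load appends a dated row; nothing is updated."""

    def __init__(self):
        self._rows = []

    def load(self, key, value, as_of):
        self._rows.append((key, as_of, value))

    def history(self, key):
        return [(as_of, value) for k, as_of, value in self._rows if k == key]


ods, dw = OdsTable(), WarehouseTable()
for as_of, balance in [(date(2004, 1, 31), 100), (date(2004, 2, 29), 250)]:
    ods.load("account-1", balance)
    dw.load("account-1", balance, as_of)

print(ods.rows["account-1"])    # only the latest value remains
print(dw.history("account-1"))  # both dated values are kept
```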
4. The Client Systems are represented by departmental Data Marts, enterprise wide information systems (e.g. web portals) or local reporting systems, as well as other OLAP components like Decision Support Systems or MOLAP cubes. The client systems can be fed from the Operational Data Store, from the Data Warehouse or from both of them simultaneously. The data source for the client systems is defined by the integration or detail level required for that particular application. For example, a Corporate Web Portal will most likely receive data from the ODS for current information, but from the Data Warehouse for long-term statistical displays.
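As a rough illustration of how a client system might pick its data source, the sketch below routes a request by the amount of history it needs. The 30-day retention window and the function `choose_source` are assumptions made for the example; real criteria would also include the integration and detail level mentioned above.

```python
ODS_RETENTION_DAYS = 30  # assumption: this ODS keeps about one month of data


def choose_source(history_days_needed):
    """Return which CIF component a client request should read from."""
    if history_days_needed <= ODS_RETENTION_DAYS:
        return "ODS"
    return "Data Warehouse"


print(choose_source(1))        # current information
print(choose_source(365 * 5))  # long-term statistical display
```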
5. The Metadata Management System is the logical and semantic layer for understanding and interpreting the information stored by the various systems. The information describing the whole environment is usually structured as:
- business metadata, such as subject area definitions, business process descriptions, definition of entities, attributes and relationships, technical implementations of business information, enterprise wide aliases and their departmental equivalents of business data elements, and so forth.
- technical metadata, describing the physical implementation of the business metadata. It is, in turn, organized into:
  - static metadata, describing objects that change very rarely over time, such as table descriptions and structure, attribute descriptions and physical definitions, unique identifiers of data elements, indexes defined for faster data access, entity relationships and the corresponding foreign keys, and so on.
  - dynamic metadata, also known as data metrics, concerning data load volume and quality quantifiers, overall data statistics, data flows, data usage patterns and other information about the usage of the static structures.
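The three kinds of metadata listed above can be pictured as a simple record structure. The sketch below is one possible shape, not a description of any particular metadata product; every class name, field and sample value is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessMetadata:
    element: str
    definition: str
    subject_area: str
    departmental_aliases: dict = field(default_factory=dict)


@dataclass
class StaticMetadata:
    table: str
    column: str
    data_type: str
    is_unique_identifier: bool = False


@dataclass
class DynamicMetadata:
    table: str
    rows_loaded: int
    rows_rejected: int

    @property
    def load_quality(self):
        """Share of rows accepted in this load (0.0 when nothing was processed)."""
        total = self.rows_loaded + self.rows_rejected
        return self.rows_loaded / total if total else 0.0


@dataclass
class MetadataEntry:
    business: BusinessMetadata
    static: StaticMetadata
    dynamic: list = field(default_factory=list)  # one DynamicMetadata per load


entry = MetadataEntry(
    business=BusinessMetadata(
        "Customer Name", "Legal name of the customer", "Customer",
        {"Sales": "Account Name"},
    ),
    static=StaticMetadata("DW_CUSTOMER", "CUST_NAME", "VARCHAR(100)"),
)
entry.dynamic.append(DynamicMetadata("DW_CUSTOMER", rows_loaded=980, rows_rejected=20))
print(entry.dynamic[-1].load_quality)
```

Keeping the business definition, the physical definition and the load metrics in one entry is what lets a global repository answer both "what does this mean?" and "how good was the last load?" for the same data element.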
3. The Zachman Framework
Enforcing order in an enterprise wide effort is a huge task in itself. Too many of today’s data integration and consolidation problems stem from a lack of enterprise perspective when the systems were built. For companies building a new IT structure, as well as for those rebuilding an existing one, the need for a systematic approach has become obvious.
Open Data Systems Inc. has for a number of years used the Zachman Framework to fit the pieces of the puzzle into an organized model. Introduced by John A. Zachman in 1987, its
The generic classification scheme shown in Fig.2 is based on analyzing the contributing factors at each level of abstraction, for each of the major activity layers. As this paper is not intended as an exhaustive presentation of the framework, we will only briefly describe the rows and columns defined in the matrix. Later in this presentation we will show specific usages of this tool to implement the Corporate Information Factory.
Fig.2 –
The framework rows represent the abstraction levels used to perform the system’s analysis:
- Scope is the highest abstraction layer, usually represented by fuzzy ideas or idealistic concepts.
-
- System Model is the level where conceptual objects receive a logical structure.
- Technology Model defines the physical objects that will represent the logical structures.
- Detailed Representation layer is composed of the fully specified physical implementations of each category.
The main enterprise activity layers are represented on the framework columns:
- Data layer reflects information representation.
- Function column is concerned with the actions performed on the data.
- Hardware layer is an encompassing column for all the computers, networking and other supporting equipment.
- People column shows the actors involved in the process.
- Time represents the scale associated with the time elements on each abstraction level.
- Motivation is the engine for getting things done.
Each of the cells defined by the intersection of the abstraction levels with the enterprise activity layers will have a different meaning and content depending on the subject the framework is applied to. In the following sections we will explore the application of the framework to two of the major Corporate Information Factory components: the Operational Data Store and the Enterprise Data Warehouse.
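One way to picture the matrix is as a grid keyed by (row, column), with each cell holding the notes for the subject being analyzed. The sketch below uses the row and column names given in the text; the "Enterprise Model" row name is taken from Zachman's published framework, since its description is missing here. The helper functions and the sample cell contents are hypothetical.

```python
# Rows (abstraction levels) and columns (activity layers) as named in the text.
# "Enterprise Model" is the row name from Zachman's published framework; its
# description is missing from this text.
ROWS = [
    "Scope",
    "Enterprise Model",
    "System Model",
    "Technology Model",
    "Detailed Representation",
]
COLUMNS = ["Data", "Function", "Hardware", "People", "Time", "Motivation"]


def empty_framework():
    """Create one empty list of notes per (row, column) cell."""
    return {(row, column): [] for row in ROWS for column in COLUMNS}


def unfilled_cells(framework):
    """Return the cells that still have no notes, in row-major order."""
    return [cell for cell, notes in framework.items() if not notes]


ods_framework = empty_framework()
# Hypothetical cell content, for illustration only.
ods_framework[("Scope", "Data")].append("List the business subject areas in scope")
ods_framework[("System Model", "Data")].append("Draft the logical data model")

print(len(ods_framework))                  # 5 rows x 6 columns = 30 cells
print(len(unfilled_cells(ods_framework)))  # 28 cells still to analyze
```

Tracking the unfilled cells gives a simple checklist of which perspectives have not yet been examined for a given system.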
4. Defining The ODS
The first recommended step in building a Corporate Information Factory is the Operational Data Store. It is usually less costly than the Data Warehouse, while providing a ‘good enough’ data consolidation across the enterprise. Even though its structure still mostly reflects the transactional systems’ design, it provides some enterprise wide reporting capabilities missing from the individual applications.
Unfortunately, the most common approach to defining an ODS is to use the largest system as the architectural reference and add the missing pieces of information from the other systems available across the enterprise. Even if (or especially because) this method allows the development team to start working very soon after the project’s approval, it is the most costly and lengthy one. The initial design will change several times during the project, almost every time important new pieces of data come into play. Changes to the data structures usually have a considerable effect on the processes already designed and/or built, extending the project duration even more.
The better approach is to perform an enterprise level analysis from the beginning of the project. Even if some executives might get upset by the time spent before the first line of code is written, this approach defines the layout into which each piece of the puzzle will fit. Because each system is analyzed from an enterprise view, the order in which the existing applications are incorporated makes almost no difference, as long as master data is incorporated before transactional data.
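The single ordering constraint mentioned above (master data before transactional data, any order otherwise) can be expressed as a stable sort. The dataset descriptions and the function `load_order` below are hypothetical, used only to show the rule.

```python
def load_order(datasets):
    """Put every master dataset before every transactional one.

    Python's sort is stable, so datasets of the same kind keep their
    original relative order.
    """
    return sorted(datasets, key=lambda d: d["kind"] != "master")


# Hypothetical source datasets, listed in the order the systems were analyzed.
datasets = [
    {"system": "billing", "name": "invoices", "kind": "transactional"},
    {"system": "crm", "name": "customers", "kind": "master"},
    {"system": "erp", "name": "orders", "kind": "transactional"},
    {"system": "erp", "name": "products", "kind": "master"},
]

for dataset in load_order(datasets):
    print(dataset["kind"], dataset["system"], dataset["name"])
```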
Fig. 3 represents a sample Zachman framework used by Open Data Systems consultants to define an Operational Data Store from a corporate perspective. While the rows and columns have the same meaning as in the basic Architectural Concepts template, the cells reflect the actions to be performed and some samples of what to look for. The template can easily be applied at the corporate level and to each system to be included in the ODS analysis.
Fig.3 – ODS Definition Framework
5. Defining The Data Warehouse
The most important piece in a Corporate Information Factory is the Enterprise Data Warehouse. It is the main data repository and the most important source of trustworthy pre-packaged information that can be used directly (with specialized tools) or off-loaded into specialized information processing environments.
There are multiple methodologies for designing and implementing a Data Warehouse. Open Data Systems has adapted the best theoretical principles to practical implementation issues in its proprietary DW build methodology. In a nutshell, we promote a global high-level analysis to formalize the ‘big picture’, followed by an iterative design-build-implement cycle. This allows our clients to achieve the most critical results in a short timeframe (6-12 months) while still having the complete DW framework defined upfront. Each cycle can then build on top of the previous one, without having to redo much of what has already been built (as is usually the case with iterative DW builds).
The most important step to enable our over 90% reusability rate is the full-scale high-level analysis. During this phase the complete Data Warehouse scope is analyzed, as well as all the corporation structures and all the existing information systems. A conceptual data model is usually created and the major systems of reference identified, to ensure the consistency of the future build cycles.
Fig. 4 presents a sample Zachman Framework used for the Data Warehouse definition. While the rows and columns have the same meaning as in the basic Architectural Concepts template, the cells reflect the actions to be performed and some samples of what to look for. The DW template can easily be applied both to the initial overall analysis and to the detailed definition of each build cycle.
Fig.4 – Data Warehouse Definition Framework
6. Summary
This paper was not intended as an exhaustive presentation of either the Corporate Information Factory concept or the Zachman Enterprise Architecture Framework. We only presented a method of applying the order defined by Zachman’s Framework to build the most complex components of the Corporate Information Factory. The same method can be applied for the structured definition of other systems that can be part of the CIF, like Data Marts, Metadata Repository, MOLAP cubes and so on.
There are, of course, other methods and methodologies to define a Corporate Information Factory. But the Zachman Framework implementation has been successfully tested by Open Data Systems’ consultants and has proved to reduce project time and costs by more than 10% compared with other methodologies.
For more information on the Corporate Information Factory concept please refer to Bill Inmon’s and Claudia Imhoff’s papers on the topic. Most of them can be found at www.billinmon.com. For details on Zachman’s Framework please visit www.zifa.com.
George Jucan is associated with Open Data Systems Inc. George frequently authors papers on various aspects of design and the associated topics of concern for a valid model.
© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved.
ISSN: 1533-3825