Cockpit: Decision Support Tool for Factory Operations and Supply Chain Management (continued)


Previous Next     Page 6 of 14

DATA ORGANIZATION

Known Issues

At the outset of this project, we faced a number of challenges from a data perspective:

  • Data was stored in a number of databases within the organization. To meet initial customer commitments, we needed to consolidate data from four separate systems: two manufacturing WIP systems, a planning system, and a safety and health system.

  • There were no existing sources for common data, such as factory calendars, product masters, or factories.

  • Use of OLAP technologies was relatively new at Intel and the selected product, Microsoft OLAP Services*, had just been released. No resources were available to draw on. There were no consultants, books, or classes.

Team Approach

To maximize utilization of resources available within the organization, and to minimize the number of developers we would need to hire, we implemented dimensional modeling techniques to create the data mart. This proved to be the easiest method to consolidate data from disparate systems, allowing for an easily supportable OLAP design. We decided to take a team approach to developing the data mart and so did the following:

  • Implemented dimensional modeling techniques to create the data mart. This proved to be the easiest method to consolidate data from disparate systems belonging to different groups, allowing for an easily supportable OLAP design.

  • Partnered with IT for technical assistance. IT procured and configured the hardware infrastructure, provided data loading expertise, and performed database modeling.

  • Worked with data owners to understand data sources and design data validation routines. We developed cross-departmental relationships to quickly understand data sources. Because these departments traditionally owned the data, they were interested in ensuring that it was accurately reported.

Database Server and OLAP Engine Selection

Due to the development time frame for delivery of the solution to upper management, little time was available to perform a complete evaluation of various database server and OLAP engines. With only three months to delivery date, we selected Microsoft's SQL Server 7.0* and OLAP Services*.

Many developers on the team were already familiar with the SQL Server and knew that it would provide adequate power to support a database size of 10-20GB.

We then began work on creation of the initial data mart, and we selected dimensional database modeling as the design path. Much of the Cockpit's design was based on the teachings of Ralph Kimball, an expert in the field of dimensional modeling [1].

Dimensional Modeling

Dimensional Modeling (DM) is a logical design technique that attempts to present data in a standard, intuitive framework allowing for high-performance access. Essentially, this is a relational database modeling technique, but with some important restrictions [3]:

  • Every dimensional model is composed of one table with a multi-part key, called the fact table. This table typically contains one or more numeric facts or measures.

  • The model also includes a set of smaller dimension tables, each having a single-part key. These tables contain descriptive information about such things as products, factories, and time. Dimension tables are used to describe the facts in the fact table.

  • When these tables are joined together, they create the characteristic star-like structure, commonly called a star join.

Figure 9: Typical star join dimensional model [6]

Dimensional models are inherently additive. Users rarely ever retrieve a single record from the table, but instead retrieve hundreds, thousands, or even millions of records at a time. The only useful operation to perform on such a large number of records is to add these items [4].

Figure 10 displays the facts Units Shipped. This fact is described at each intersection of the dimension tables; in this case, Products, Time, and Factories.

By simply filtering the result set, you easily obtain the number of units shipped for any combination of products and factories during a given time period. From an end-user viewpoint, this type of database design is easy to understand and use [2].

Figure 10: Dimensional Model composition displaying fact and Dimension Table relationships [2]

Data Population
Populating the dimension tables proved the most difficult part of this task. Most of the work was not technical in nature, but required a large amount of business analysis to determine how products are categorized, and how to gather the various shift schedules from each factory.

The product dimension was the most difficult table to populate, due to the need to cross-reference shop floor product codes used in manufacturing and planning systems, with common product nicknames used by most end-users. This information did not exist in any database. Instead it resided in spreadsheets, e-mail messages, and with individuals in factories. After gathering the data, we built and populated an entire set of cross-reference tables to maintain attributes in the product dimension table.

Initial customer requirements indicated the need to view weekly data, with a possible future desire to view daily data. Viewing data at a factory shift level was not a requirement. During the design, we decided to store it at the lowest level possible, and chose to store data in fact tables at a shift level, just as a precaution. This later proved advantageous due to a change in customer requirements requesting shift level data.

After gathering requirements and identifying data sources, data analysis consumed at least 50% of the total time allocated to development. Due to the size of the development window, full up-front data analysis was not possible. During data analysis, we leveraged relationships with other groups within the organization to identify the data experts to help construct the proper queries to extract data. Once data analysis for each indicator was 70-80% complete, Extract, Transform, and Load (ETL) work started. Developers in IT used Informatica* to visually extract data from the source, transform it using a variety of business rules, and load it into the data mart. Data loading jobs were scheduled to load all dimension tables first, followed by fact tables.

OLAP (On-Line Analytical Processing)

OLAP provides organizations with a means of accessing, viewing, and analyzing data with high flexibility and performance. First and foremost, OLAP presents data to end-users through a natural, intuitive data model. Second, OLAP accelerates delivery of information to end-users viewing these multidimensional structures by preparing some computed values in the data in advance, rather than at execution time. The combination of easy navigation and fast performance allows end-users to view and analyze data more quickly and efficiently than is possible with relational database technology only. The end result is more time spent analyzing data and less time analyzing databases.

In an OLAP data model, information is conceptually viewed as cubes, which consist of descriptive categories (dimensions) and quantitative values (measures). The multidimensional data model makes it simple for users to formulate complex queries, arrange data on a report, switch from summary to detail data, and filter or slice data into meaningful subsets. Within each dimension of an OLAP data model, data can be organized into a hierarchy that represents levels of detail on the data. For example, within the time dimension, there can be years, months, and days levels. Similarly, within the geography dimension, there can be country, region, state/province, and city levels. A particular instance of the OLAP data model would have the specific values for each level in the hierarchy. Users viewing OLAP data can move up or down between levels depending on whether they want more or less detail [5]. One common misconception is that OLAP technology is similar to that of relational databases. OLAP cubes are queried using multidimensional expressions or MDX. Though MDX shares a number of keywords with SQL, such as SELECT, FROM, and WHERE, there is little similarity between them.

Figure 11: Multidimensional OLAP cube, dimensions and relational schema [7]

Before engaging in a full OLAP development project, it is advisable to train developers in the differences between the two. For our team, however, creating OLAP cubes was relatively easy, mainly due to the design of the underlying data mart. The majority of issues encountered involved implementing MDX inside of calculated measures. Once data was cubed, data validation became increasingly important. Some of the validation was straightforward, especially where standard reports existed for comparison. However, we encountered challenges validating OLAP cubes against spreadsheets maintained by users who gathered information from a variety of sources.




Previous Next     Page 6 of 14