|
Cockpit: Decision Support Tool for Factory Operations and Supply Chain Management (continued) DATA ORGANIZATION Known Issues At the outset of this project, we faced a number of challenges from a data perspective:
Team Approach To maximize utilization of resources available within the organization, and to minimize the number of developers we would need to hire, we implemented dimensional modeling techniques to create the data mart. This proved to be the easiest method to consolidate data from disparate systems, allowing for an easily supportable OLAP design. We decided to take a team approach to developing the data mart and so did the following:
Database Server and OLAP Engine Selection Due to the development time frame for delivery of the solution to upper management, little time was available to perform a complete evaluation of various database server and OLAP engines. With only three months to delivery date, we selected Microsoft's SQL Server 7.0* and OLAP Services*. Many developers on the team were already familiar with the SQL Server and knew that it would provide adequate power to support a database size of 10-20GB. We then began work on creation of the initial data mart, and we selected dimensional database modeling as the design path. Much of the Cockpit's design was based on the teachings of Ralph Kimball, an expert in the field of dimensional modeling [1]. Dimensional Modeling Dimensional Modeling (DM) is a logical design technique that attempts to present data in a standard, intuitive framework allowing for high-performance access. Essentially, this is a relational database modeling technique, but with some important restrictions [3]:
![]() Figure 9: Typical star join dimensional model [6] Dimensional models are inherently additive. Users rarely ever retrieve a single record from the table, but instead retrieve hundreds, thousands, or even millions of records at a time. The only useful operation to perform on such a large number of records is to add these items [4]. Figure 10 displays the facts Units Shipped. This fact is described at each intersection of the dimension tables; in this case, Products, Time, and Factories. By simply filtering the result set, you easily obtain the number of units shipped for any combination of products and factories during a given time period. From an end-user viewpoint, this type of database design is easy to understand and use [2].
![]() Figure 10: Dimensional Model composition displaying fact and Dimension Table relationships [2]
Data Population The product dimension was the most difficult table to populate, due to the need to cross-reference shop floor product codes used in manufacturing and planning systems, with common product nicknames used by most end-users. This information did not exist in any database. Instead it resided in spreadsheets, e-mail messages, and with individuals in factories. After gathering the data, we built and populated an entire set of cross-reference tables to maintain attributes in the product dimension table. Initial customer requirements indicated the need to view weekly data, with a possible future desire to view daily data. Viewing data at a factory shift level was not a requirement. During the design, we decided to store it at the lowest level possible, and chose to store data in fact tables at a shift level, just as a precaution. This later proved advantageous due to a change in customer requirements requesting shift level data. After gathering requirements and identifying data sources, data analysis consumed at least 50% of the total time allocated to development. Due to the size of the development window, full up-front data analysis was not possible. During data analysis, we leveraged relationships with other groups within the organization to identify the data experts to help construct the proper queries to extract data. Once data analysis for each indicator was 70-80% complete, Extract, Transform, and Load (ETL) work started. Developers in IT used Informatica* to visually extract data from the source, transform it using a variety of business rules, and load it into the data mart. Data loading jobs were scheduled to load all dimension tables first, followed by fact tables. OLAP (On-Line Analytical Processing) OLAP provides organizations with a means of accessing, viewing, and analyzing data with high flexibility and performance. First and foremost, OLAP presents data to end-users through a natural, intuitive data model. Second, OLAP accelerates delivery of information to end-users viewing these multidimensional structures by preparing some computed values in the data in advance, rather than at execution time. The combination of easy navigation and fast performance allows end-users to view and analyze data more quickly and efficiently than is possible with relational database technology only. The end result is more time spent analyzing data and less time analyzing databases. In an OLAP data model, information is conceptually viewed as cubes, which consist of descriptive categories (dimensions) and quantitative values (measures). The multidimensional data model makes it simple for users to formulate complex queries, arrange data on a report, switch from summary to detail data, and filter or slice data into meaningful subsets. Within each dimension of an OLAP data model, data can be organized into a hierarchy that represents levels of detail on the data. For example, within the time dimension, there can be years, months, and days levels. Similarly, within the geography dimension, there can be country, region, state/province, and city levels. A particular instance of the OLAP data model would have the specific values for each level in the hierarchy. Users viewing OLAP data can move up or down between levels depending on whether they want more or less detail [5]. One common misconception is that OLAP technology is similar to that of relational databases. OLAP cubes are queried using multidimensional expressions or MDX. Though MDX shares a number of keywords with SQL, such as SELECT, FROM, and WHERE, there is little similarity between them.
![]() Figure 11: Multidimensional OLAP cube, dimensions and relational schema [7] Before engaging in a full OLAP development project, it is advisable to train developers in the differences between the two. For our team, however, creating OLAP cubes was relatively easy, mainly due to the design of the underlying data mart. The majority of issues encountered involved implementing MDX inside of calculated measures. Once data was cubed, data validation became increasingly important. Some of the validation was straightforward, especially where standard reports existed for comparison. However, we encountered challenges validating OLAP cubes against spreadsheets maintained by users who gathered information from a variety of sources. |