Electrical and Computer Engineering ETDs

Publication Date

7-12-2014

Abstract

As a part of the business intelligence activities initiated at the University of New Mexico (UNM) in the O ce of Institutional Analytics, a need for a data warehouse was established. The goal of the data warehouse is to host data related to students, faculty, sta , nance data and research and make it readily available for the purposes of university analytics. In addition, this data warehouse will be used to generate required reports and help the university better analyze student success activities. In order to build real-time reports, it is essential that the massive amounts of transactional data related to university activities be structured in a way that is op- timal for querying and reporting. This transactional data is stored in relational databases in an Operational Data Store (ODS) at UNM. But for reporting purposes, this design currently requires scores of database join operations between relational database views in order to answer even simple questions. Apart from a ecting per- formance, i.e., the time taken to run these reports, development time is also a factor, as it is very di cult to comprehend the complex data models associated with the ODS in order to generate the appropriate queries. Dimensional modeling was employed to address this issue. Dimensional mod- eling was developed by two pioneers in the eld, Bill Inmon and Ralph Kimball. This thesis explores both methods and implements Kimball's method of dimensional modeling leading to a dimensional data mart based on a star schema design that was implemented using a high performance commercial database. In addition, a data integration tool was used for performing extract-transform-load (ETL) operations necessary to develop jobs and design work ows and to automate the loading of data into the data mart. HTML reports were developed from the data mart using a reporting tool and performance was evaluated relative to reports generated directly from the ODS. On average, the reports developed on top of the data mart were at least 65% faster than those generated from directly from the ODS. One of the reason for this is because the number of joins between tables were drastically reduced. Another reason is that in the ODS, reports were built against views which when queried are slower to perform as compared to reports developed against tables.

Keywords

Thesis, Data warehousing, data mart

Sponsors

Office of Institutional Analytics, University of New Mexico

Document Type

Thesis

Language

English

Degree Name

Computer Engineering

Level of Degree

Masters

Department Name

Electrical and Computer Engineering

First Committee Member (Chair)

TURNER, TERRY

Second Committee Member

LAMB, CHRISTOPHER

Share

COinS