Improving Your Institution with Data-Centric Design
Every university in the world wants to know the answers to the following questions:
How many students should we expect this fall? How many faculty will we need to teach various disciplines in two, three or four years? How much space and what resources do we need to teach these classes? How can we accurately project our budget now and for future years? How can we tell whether our teaching is effective? How can we help students in trouble at the very earliest opportunity? What resources can we give our first-generation students, so they have the best chance of being successful? How can we target our recruitment efforts to potential students most likely to enroll at the institution? How can we also target our recruitment efforts to generate a diverse student body? Is it possible to not only target students most likely to enroll but also students most likely to be successful and, indeed, stay for all four years? And on and on…
These questions might seem simple, but getting to the right answers can be a daunting and complex task. The dirty little secret is that today, many universities simply don’t know the answers. Many make do with partial answers, or they simply guess. That, however, can be very dangerous and yield unintended consequences for the budget or for institutional goals. As those in the statistical sciences will tell you, not having all the data, using the wrong data or simply making up data will often lead to the wrong conclusions. For many institutions, this can lead to terrible decisions that, in turn, have profound consequences for the health of the organization, particularly today.
Deploying administrative systems with a data-centric design promises to give us the information we need to make good decisions. The idea behind a data-centric design is to get as much as possible of an institution’s transactional data, along with other associated information (co-curricular transcripts, admission materials, Learning Management System (LMS) statistics, etc.), into a usable, denormalized data warehouse. To make the data warehouse successful, the data must have a common syntactic and semantic base. In other words, the data in the warehouse must be linked in such a way that the data points have the same meaning, and derivative results from the data are standardized.
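To make this concrete, here is a minimal sketch in Python with pandas. All table, column and cutoff values are hypothetical assumptions, not any particular institution's schema: two transactional tables are denormalized into one wide fact table, and a derived field (full-time status) is computed once, in one place, so every downstream report uses the same definition.

```python
import pandas as pd

# Hypothetical transactional extracts; table and column names are illustrative.
students = pd.DataFrame({
    "student_id": [1, 2],
    "name": ["Ana", "Ben"],
})
enrollments = pd.DataFrame({
    "student_id": [1, 1, 2],
    "course": ["MATH101", "ENG110", "MATH101"],
    "credits": [4, 3, 4],
    "term": ["2024FA", "2024FA", "2024FA"],
})

# Denormalize: one wide row per enrollment, so analysts never re-join tables.
fact_enrollment = enrollments.merge(students, on="student_id", how="left")

# A shared semantic layer: derived fields computed once, the same way everywhere.
credit_totals = fact_enrollment.groupby("student_id")["credits"].transform("sum")
fact_enrollment["full_time"] = credit_totals >= 12  # assumed institutional cutoff

print(fact_enrollment)
```

Because the derived field lives in the warehouse rather than in each report, two offices asking "how many full-time students do we have?" get the same answer.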
If this sounds like a daunting task, it certainly can be. However, there are strategies, technologies and techniques that can help an institution empower its data. Vendors are providing much better tools, and we in higher education have learned a lot from earlier projects that ended in failure. Building a functional data warehouse, while still a complex undertaking, is now doable, even at the smallest institution.
So, where do you start? The first step is to look at the transactional system itself. Institutions can take steps that make getting to a data-centric design easier, and simplicity and commonality are key to easy implementation. While it is certainly possible to build a data warehouse in a “best of breed” environment (an Enterprise Resource Planning (ERP) system comprised of many different components from different vendors, linked together either by live, real-time data transactions, such as those found in web services, or through time-delayed batch processing), it is a much more complex challenge than building a data warehouse off an integrated, single-vendor ERP.
The reasons for this are threefold. First, in a best of breed environment, it is more difficult to get the various systems to “agree” to a common syntactic and semantic core. This is because the various solutions generally have completely different data structures that must be reconciled at some point for a data warehouse to work correctly. In an integrated ERP, this problem largely goes away because the data structure is already the same across pillars.

Second, in a best of breed environment, transactions between the various system components are generally done in the form of summarized data. In other words, only the data that has to be exchanged for the systems to work together is exchanged. The result is that a lot of the richness (detail) of the data is lost as it moves through a best of breed environment. A data warehouse, on the other hand, will be far richer and more useful if it has the full data detail. An integrated ERP solution makes it easier to get this detail into the data warehouse because there is only one system to deal with, and all the data within that system remains rich because it has not been reduced to summaries as it moves through various integrations.

Third, with only one system, the investment in the extract, transform, load (ETL) tool (the tool that actually takes the data from the transactional system and places it into the data warehouse) will be far smaller because there is only one source to connect. In a best of breed environment, multiple systems may have to be individually linked into the data warehouse, greatly increasing the complexity and time to launch.
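The ETL step described above can be sketched in a few lines. This toy example uses Python's built-in sqlite3 module, with in-memory databases standing in for the transactional ERP and the warehouse; the table names, schema and grade-code cleanup are illustrative assumptions only.

```python
import sqlite3

# Stand-ins for the real systems (assumed names and schema throughout).
src = sqlite3.connect(":memory:")   # the transactional ERP
dwh = sqlite3.connect(":memory:")   # the data warehouse

src.execute("CREATE TABLE registrations (student_id INT, course TEXT, grade TEXT)")
src.executemany("INSERT INTO registrations VALUES (?, ?, ?)",
                [(1, "MATH101", "A"), (2, "MATH101", "b"), (1, "ENG110", "C")])

# Extract full-detail rows (not summaries), transform them to a common
# standard, and load them into the warehouse fact table.
dwh.execute("CREATE TABLE fact_grades (student_id INT, course TEXT, grade TEXT)")
for student_id, course, grade in src.execute("SELECT * FROM registrations"):
    dwh.execute("INSERT INTO fact_grades VALUES (?, ?, ?)",
                (student_id, course, grade.strip().upper()))  # normalize codes
dwh.commit()

print(dwh.execute("SELECT COUNT(*) FROM fact_grades").fetchone()[0])  # 3 rows
```

Note that every registration row crosses over at full detail; nothing is pre-summarized, which is exactly the richness the article argues a warehouse needs. In a best of breed shop, a loop like this would have to be written and maintained once per source system.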
However, the simple truth is that many institutions already have a best of breed environment that has been built over many years and are unlikely—because of cost, effort and complexity—to switch to a single-vendor ERP solution anytime soon. What can they do? The good news is that vendors have made progress, particularly with integration tools, that can simplify the process and enable richer data sharing. These tools, called integration platform as a service (iPaaS), can make systems integrations far more efficient and effective. iPaaS solutions often come with prebuilt integrations that can be modified with modest effort, and many have robust user groups in which sharing integrations within the platform community is common. More importantly, they provide a common coding environment that standardizes the integration process, which, in turn, helps the IT team get more effective and efficient at connecting systems. Code can often be reused in such circumstances to build richer integrations, which means data doesn’t have to be left behind.
However, even if an institution has a unified ERP or has invested in iPaaS tools, to be successful it must look at its data strategically instead of transactionally. That means looking at data architecturally with every integration the institution puts in place. When a new system is purchased, the IT department should do a structured analysis to understand the intent of the solution, the data it will need or create, and how that data might be important for larger analytical purposes. These needs should then be built into the integration’s design from the beginning.
In addition, good coordination with the university’s institutional research department as well as the formation of a data standards committee will greatly help move things forward. This group can provide the resources to make sure that what the data warehouse reports is accurate and matches institutional definitions. Even more importantly, they can resolve disagreements as to what various data points mean. For example, who in the student body is a freshman, sophomore, junior or senior? Believe it or not, that is not always an easy question to answer, and coming to consensus on these issues is important.
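The freshman/sophomore question above is a good candidate for the committee's output: a single, agreed-upon definition encoded once so that every report classifies students the same way. The credit-hour cutoffs below are illustrative assumptions, not a real institution's policy.

```python
# One committee-approved definition of class level, used by every report.
# The 30/60/90 earned-credit cutoffs here are assumed for illustration only.
def class_level(earned_credits: float) -> str:
    if earned_credits < 30:
        return "freshman"
    if earned_credits < 60:
        return "sophomore"
    if earned_credits < 90:
        return "junior"
    return "senior"

print(class_level(45))  # sophomore under these assumed cutoffs
```

The value is not the four-line function itself but the governance around it: once the committee blesses a definition like this and it lives in the warehouse's shared semantic layer, the registrar, institutional research and the deans all report the same counts.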
Once the common transactional data is in the data warehouse, other data sources can be added. Many institutions have added information about co-curricular activities, internships, volunteer opportunities, as well as information gathered through admissions, surveys, LMS, employment and more. Some institutions are even including public information from sites like LinkedIn. The richer the data, the better.
Why is all this important? Because one of the powers of a data-centric solution is the ability to look for unexpected data correlations and to ask “what if” questions. For example, a university may believe that working while going to school harms a student’s chances of success. Properly configured, the data warehouse will show whether this is actually true. What’s more, an institution can also see at what point outside work begins to statistically harm the student. Does the effect appear at 20 hours per week? At 30? And so on.
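An analysis like the work-hours question might start as simply as the sketch below, which buckets students by weekly work hours and compares average GPA per bucket to look for a threshold. The data here is entirely made up for illustration; a real analysis would of course draw on warehouse tables and proper statistical testing.

```python
import pandas as pd

# Illustrative, fabricated data: weekly work hours and term GPA per student.
df = pd.DataFrame({
    "work_hours": [0, 5, 10, 15, 20, 25, 30, 35],
    "gpa":        [3.4, 3.5, 3.4, 3.3, 3.2, 2.9, 2.6, 2.4],
})

# Bucket by hours worked, then compare mean GPA per bucket to spot where
# (if anywhere) outside work starts to hurt.
df["bucket"] = pd.cut(df["work_hours"], bins=[-1, 10, 20, 40],
                      labels=["0-10", "11-20", "21-40"])
print(df.groupby("bucket", observed=True)["gpa"].mean())
```

With real warehouse data, the same three lines of grouping logic could be rerun against any candidate factor, which is what makes "what if" questions cheap to ask.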
Of course, privacy should be a major concern. Many data warehouses are built with de-identified data. In such a system, while it may be possible to drill down to the record of a particular student, faculty member or staff member, the system is configured so that it is impossible to tell exactly who that person is. Finally, and most importantly, all the laws that govern a transactional ERP system, particularly those found in the Family Educational Rights and Privacy Act (FERPA) or as required under the Gramm-Leach-Bliley Act (GLBA), remain in effect for the data warehouse.
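One common de-identification technique is pseudonymization: real IDs are replaced with salted hashes, so records still link consistently across warehouse tables but no token reveals who the person is. This is only a sketch of the idea; in practice the salt would be stored separately under strict access control, and the approach would be reviewed against FERPA/GLBA obligations.

```python
import hashlib

# Assumed secret; in a real deployment this lives outside the warehouse,
# under strict access control, so tokens cannot be reversed casually.
SALT = b"institution-secret-salt"

def pseudonymize(student_id: str) -> str:
    """Map a real ID to a stable, non-identifying token."""
    return hashlib.sha256(SALT + student_id.encode()).hexdigest()[:16]

# The same input always maps to the same token, so joins across tables work,
# but the token itself does not identify the student.
print(pseudonymize("S001") == pseudonymize("S001"))  # True
print(pseudonymize("S001") == pseudonymize("S002"))  # False
```

The design choice worth noting is the stability of the mapping: analysts can still follow one (anonymous) student across enrollment, LMS and co-curricular tables, which preserves the analytic richness the article calls for while limiting re-identification risk.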
That said, it is a serious mistake to downplay privacy concerns. The best thing to do to ensure that these concerns are addressed is to present the issues transparently, get community feedback and build policy and technical protections into the system right from the start. If people understand that the purpose of the system is to help everyone be more successful, they will be supportive, particularly if they also know what protections are in place. After all, collecting data this way can sound very scary. It may take time to build trust. However, with the potential rewards, it is time well spent.