These findings motivated a group of us at the University of Chicago’s Computation Institute to look at ways that research can be made more productive, focusing on those “mundane” activities that are critical to the research but that do not contribute to it directly. One such activity is data management. As research in practically every discipline becomes data-driven, investigators are spending increasing amounts of time acquiring, analyzing, moving, describing, protecting, and preserving massive amounts of data. Further, the collaborative nature of research requires that data be easily discoverable and accessible by other researchers, which makes a seemingly simple task quite daunting. For a few fortunate projects blessed with abundant funding, investing millions in infrastructure for data management is not an issue. But how can the myriad small- and medium-sized research groups on campus deal with this? Being computer scientists and engineers, we naturally focused on automation as the means to operational efficiency, and asked ourselves: How are such time-intensive but necessary activities handled more efficiently in industry?
A common answer is to use software-as-a-service (SaaS). Many companies now outsource payroll, accounting, sales management, and other administrative tasks to SaaS providers, who can serve their needs at marginal cost through economies of scale. The benefits of SaaS are well known: end users gain advanced capabilities that typically require nothing more than a web browser to access; capital expenditures are reduced since no up-front investment in software and hardware is required; operating expenses are smoothed by paying for SaaS on a subscription basis; and the operational burden on IT staff is significantly reduced since they no longer need to install, configure, maintain, and support these systems.
Using SaaS is commonplace in industry but is only just starting to make inroads in higher education. Very few institutions run their own email services nowadays, having outsourced that mundane task to the likes of Google and Microsoft. But beyond email, most IT services are still delivered by on-campus infrastructure. Research data management in particular requires substantial ongoing investment in storage and network capacity, and more staff to handle increasingly complex usage scenarios. At the Computation Institute we developed a SaaS offering to address this problem. Since late 2010 we have operated a service called Globus that enables researchers to rapidly move large data sets within and across institutions, reliably and securely. Over time we added the ability for researchers to share their data with others, directly from existing storage systems, without requiring special accounts and while maintaining data security and privacy. More recently, we introduced a data publication service that makes it easy for researchers to describe their data and for their peers to discover and use it.
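To make “reliably” concrete: one common ingredient of an unattended transfer is verifying a checksum at the destination and retrying on failure, so that nobody has to watch the copy. The following is a minimal Python sketch of that general idea, using a local file copy as a stand-in for a network transfer. It is not a description of how Globus is implemented, and the names `sha256_of` and `reliable_copy` are hypothetical, chosen only for this illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reliable_copy(source: Path, destination: Path, max_attempts: int = 3) -> int:
    """Copy source to destination and verify the checksum after each attempt.

    Hypothetical helper: returns the number of attempts used, and raises
    RuntimeError if no attempt produces a verified copy.
    """
    expected = sha256_of(source)
    for attempt in range(1, max_attempts + 1):
        try:
            shutil.copyfile(source, destination)
        except OSError:
            continue  # treat as a transient failure and try again
        if sha256_of(destination) == expected:
            return attempt
    raise RuntimeError(f"could not copy {source} after {max_attempts} attempts")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "dataset.bin"
        dst = Path(workdir) / "dataset_copy.bin"
        src.write_bytes(b"example research data" * 1000)
        attempts = reliable_copy(src, dst)
        print(f"verified copy after {attempts} attempt(s)")
```

A hosted service would add scheduling, authentication, and notification on top of a loop like this; the sketch only shows why verification plus retry removes the need for someone to supervise the copy.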
Early adoption of Globus is encouraging. More than 100 campuses and national research facilities have made the service available to their users, and by many reports, it is greatly increasing researcher productivity. In one example, a meteorologist cut data transfer times from 61 hours to 20 minutes (roughly a 180-fold reduction), and in another, a climate research group made terabytes of data readily available to thousands of investigators worldwide who otherwise would have spent countless hours and funds to access that data.
Many of these early adopters were in high-performance computing (HPC) or research computing centers, but we are starting to see central IT functions apply the same SaaS technologies. For example, a growing number of institutions are creating campus research data services that are independent of their HPC systems and designed to manage big data as cheaply as possible. Globus is increasingly being used as the only interface to such storage systems because it is useful to researchers and requires minimal administration. In a common workflow, a researcher requests an allocation of project storage on the campus data service. Once the request is approved, space is allocated on the storage system to hold project data, and the researcher sets policies that determine what data may be accessed by the project team and external collaborators. The system administration efficiencies are substantial, as is the time saved by the researcher and her collaborators.
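The workflow described above (request, approval, allocation, researcher-set access policies) can be sketched as a small data model. This is an illustrative Python sketch only: the class `ProjectAllocation`, its fields, and the path-prefix policy scheme are assumptions made for this example and do not reflect any actual campus service or the Globus API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ProjectAllocation:
    """Hypothetical project storage allocation on a campus data service."""

    owner: str
    quota_tb: int
    approved: bool = False
    # Maps a normalized path prefix to the users allowed to read beneath it.
    read_policies: Dict[str, Set[str]] = field(default_factory=dict)

    def approve(self) -> None:
        """Mark the request as approved (done by the service, not the researcher)."""
        self.approved = True

    def grant_read(self, requester: str, path_prefix: str, user: str) -> None:
        """Let `user` read everything under `path_prefix`; only the owner may do this."""
        if not self.approved:
            raise PermissionError("allocation has not been approved")
        if requester != self.owner:
            raise PermissionError("only the allocation owner may set policies")
        prefix = path_prefix.rstrip("/") + "/"
        self.read_policies.setdefault(prefix, set()).add(user)

    def can_read(self, user: str, path: str) -> bool:
        """Return True if `user` may read `path` under the current policies."""
        if not self.approved:
            return False
        if user == self.owner:
            return True
        return any(
            path.startswith(prefix) and user in users
            for prefix, users in self.read_policies.items()
        )


if __name__ == "__main__":
    allocation = ProjectAllocation(owner="pi@campus.edu", quota_tb=50)
    allocation.approve()
    allocation.grant_read("pi@campus.edu", "/results", "collaborator@other.edu")
    print(allocation.can_read("collaborator@other.edu", "/results/run1.nc"))  # True
    print(allocation.can_read("collaborator@other.edu", "/raw/run1.nc"))      # False
```

The point of the sketch is that policy decisions sit with the researcher who owns the allocation, while administrators are involved only at the approval step, which is where the administrative savings come from.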
We believe that many other groups on campus can realize these benefits of SaaS, but a number of hurdles remain. For example, our experience has shown that privacy and information-protection requirements can severely limit adoption of SaaS on campus. Data privacy is always a requirement at some level, but the regulations governing personal health information are particularly challenging. The complexity of this requirement is compounded by the very nature of SaaS, where various components of the infrastructure may be hosted by different vendors, making it more difficult for a single provider to ensure compliance of the end-to-end solution. Another issue is the reluctance of campus IT managers to cede any level of control over their resources to an external provider. In Globus, we overcame this by ensuring that our software does not directly affect any aspect of the system outside of existing control mechanisms. Along another dimension, contractual requirements may constrain the ability to deploy SaaS solutions. Contract terms vary widely from one institution to the next, and negotiating with each SaaS provider individually is a very time-consuming process. The typical terms that a university requires of vendors are rooted in more traditional purchases (i.e., licensed software) from larger vendors, while many SaaS solutions are delivered by smaller vendors that are unable to accept the burden of some of these terms. This will be a recurring issue, and institutions will need to revise their approach if they want to realize the benefits of SaaS.
Despite these challenges, the operational efficiencies are compelling and will likely result in broad adoption of SaaS over the next few years. Beyond time and cost arguments, SaaS can deliver advanced capabilities to many researchers that their campus IT would not otherwise be able to provide, thereby “democratizing” access. It is ultimately demand by these researchers that will catalyze the process, removing the barriers to adoption and creating a more productive environment for all.
Author Perspective: Administrator
On the one hand, this sounds very practical and useful, but on the other, I vehemently disagree with the idea (pervasive at some institutions) that researchers are above administrative tasks. In an ideal world, they could do nothing but research, but that’s not the reality.
Of course, it’s not practical to expect that researchers will do nothing administrative. But I hope you will agree that there are many tasks in the research workflow that are necessary but take a disproportionate amount of time. It’s in these areas that I’m suggesting automation via SaaS can be beneficial. For example, I would argue that having a postdoc “babysitting” a server while it transfers a few terabytes of data to another server for some analysis is wasted time that could be spent on more productive things.
I’m curious about the privacy regulations preventing this system from being used more widely. I wonder if it’s problems with information control itself or if it’s just stringent laws governing personal information.
It’s a combination of factors and requirements that can inhibit broader use. For example, HIPAA regulations for safeguarding personal health information are somewhat open to interpretation in requiring that every system in the chain of custody meets certain conditions. In the case of SaaS, providers may interpret these requirements in different ways such that the end user can’t easily verify if the entire system is compliant. There are also other (non-regulatory) concerns, e.g., what information does the SaaS provider store about the customer’s usage of their system, who has access to this usage data, etc.
Regardless of whose responsibility administrative duties are, the democratization of research can only be a good thing, I think. All the research in the world won’t do anybody any good if it’s not accessible.
Agreed. It’s precisely this broader accessibility to the world’s research data that’s driving our work.