The Impact of Big Data on Student Privacy and How Universities Must React
Joel Rosenblatt | Director of Computer and Network Security, Columbia University
As you go about your normal life, everything that you do online leaves “digital breadcrumbs” that can be collected and analyzed. The common buzzwords for this process are Big Data, Business Intelligence and Data Analytics. The words are not really that important, but the results are.
A quick Google search turned up a table called Student Data by Exam, which lists the information collected by the various tests that most high school students take. The categories include Demographics, Academic, College Plans, High School Courses and Activities. Today’s students, both at the K-12 and postsecondary levels, are generating huge amounts of data and institutions are simply sitting on it.
The Washington Post published an article in November 2015 called The astonishing amount of data being collected about your children, which states:
“Remember that ominous threat from your childhood, ‘This will go down on your permanent record?’ Well, your children’s permanent record is a whole lot bigger today and it may be permanent. Information about your children’s behavior and nearly everything else that a school or state agency knows about them is being tracked, profiled and potentially shared.”
As the father of two boys, who are now adults, I can say that as they were growing up, I found myself constantly reminding them that everything they do on a computer is logged somewhere, and this was back in the 1980s. Not only is the amount of data being collected today impressive, but the diversity of that data is downright scary.
The Family Educational Rights and Privacy Act (FERPA) was passed in 1974 to protect the privacy of student education records. This federal law was passed before anyone had a clue as to how important the role of computers was to become in the education process. In today’s educational environment, it is virtually impossible to participate in the learning process without access to a computer and the internet. Assignments are sent or downloaded online, and answers are emailed or uploaded for grading. Tests are taken online. Research is done online, and the papers are written using some form of word processing. Major bureaucratic work in the institution is also moving online, with registrations, enrollment, payment, add-drop and other major tasks moving from the registrar’s office to the web portal. Our society is moving to “all online, all the time.” While there are many wonderful benefits to this movement, it may be worthwhile to step back and look at some of the implications.
When storage was an expensive commodity, every byte of data stored was scrutinized, and as soon as the project was finished, data was either erased or taken offline. In today’s world, you can purchase a two-terabyte external drive for under $100. The tendency is to keep everything forever—if you use Gmail, the message displayed when you look in your Trash folder is “No messages in the Trash. Who needs to delete when you have so much storage?!”
Everywhere you look, you are encouraged to save and not delete. The danger in this is that the data you save may not be your own. In the university, we love data. We are constantly analyzing and looking for better ways to use the data to understand why things are the way they are. Every research institute has an Institutional Review Board (IRB) whose purpose is to facilitate human subjects research and to ensure the rights and welfare of human subjects are protected during their participation. One of these rights is the right to privacy. I will frequently ask researchers to consult with the IRB before giving them access to data. One of the things that can be done to prevent the misuse of data is to make sure that it is properly anonymized. Another important tool for protecting data is encryption. However, full-disk encryption only helps if your device is stolen; to protect the data from theft by hackers, you need to use field-level encryption in databases.
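To make the anonymization step above concrete, here is a minimal sketch in Python of one common approach: dropping direct identifiers and replacing the student ID with a keyed hash before a dataset is handed to a researcher. The field names, the helper names and the key handling are all assumptions made for illustration, not a description of any university's actual process. Strictly speaking this is pseudonymization, and it may not be enough on its own if the remaining fields can be combined to re-identify someone.

```python
import hashlib
import hmac

# Hypothetical field names; a real dataset would have its own list.
DIRECT_IDENTIFIERS = {"name", "email", "ssn"}


def pseudonymize(student_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    Without the secret key, the pseudonym cannot be recomputed from a
    guessed student ID, unlike a plain unsalted hash.
    """
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def strip_identifiers(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with direct identifiers removed."""
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "student_id"
    }
    # The same student always maps to the same pseudonym, so records
    # can still be linked across tables for analysis.
    cleaned["pseudonym"] = pseudonymize(record["student_id"], secret_key)
    return cleaned
```

The secret key would need to be stored separately from the released data and kept away from the people doing the analysis; otherwise the pseudonyms can simply be regenerated from a list of known IDs.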
The problems with this (and there are many) all revolve around the concepts of Big Data, searching and privacy. In this case, think of the data as the internet. You can’t get much bigger than that. Google is an amazing tool. It gives you the ability to instantly search through an unlimited amount of unstructured data. Unfortunately, a lot of it is junk and, worse than that, it is often personal information that has been posted (whether on purpose or by accident), and it is left up to the reader to decide what to do with the information. I wrote an article called Privacy Is Not Dead, But It Is On Life Support, and while I believe that there may still be some privacy left in the world, it is not to be found online. Anything placed online can be found and used in any way that the finder chooses.
So how can colleges and universities, who are full participants in today’s online-first world, protect against these forces?
An important document that every institution should have is a Data Classification Policy (Columbia University’s policy is one example). Without a policy defining the class of data, you will not be able to make good decisions about storing or protecting it.
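As a rough illustration of how a classification policy can drive storage decisions, the sketch below maps data fields to classes and classes to handling rules. The class names, the field assignments and the handling rules are hypothetical and are not taken from Columbia's policy or any other institution's.

```python
from enum import Enum


class DataClass(Enum):
    # Hypothetical classes, ordered from least to most restricted.
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    SENSITIVE = 4


# Hypothetical handling rules attached to each class.
HANDLING = {
    DataClass.PUBLIC: {"encrypt_at_rest": False, "may_share_externally": True},
    DataClass.INTERNAL: {"encrypt_at_rest": False, "may_share_externally": False},
    DataClass.CONFIDENTIAL: {"encrypt_at_rest": True, "may_share_externally": False},
    DataClass.SENSITIVE: {"encrypt_at_rest": True, "may_share_externally": False},
}

# Hypothetical assignment of fields to classes.
FIELD_CLASSES = {
    "course_catalog": DataClass.PUBLIC,
    "grades": DataClass.CONFIDENTIAL,
    "ssn": DataClass.SENSITIVE,
}


def handling_for(field: str) -> dict:
    """Look up the handling rules for a field.

    A field that has not been classified yet is treated as the most
    restricted class until someone decides otherwise.
    """
    data_class = FIELD_CLASSES.get(field, DataClass.SENSITIVE)
    return HANDLING[data_class]
```

Defaulting unclassified fields to the strictest class is a design choice in this sketch: it means that forgetting to classify something leads to over-protection rather than exposure.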
In the university, I have a few rules that I use to try to reduce the risk of exposing information:
1. You can’t lose what you don’t have: Never collect information that is not needed. We have a risk-management team that looks at applications that use sensitive or confidential information. One of the questions asked is “Why are you asking for this information?” If there is not a valid business reason, then the data should not be collected.
2. Don’t keep what you don’t need: There are many legitimate reasons to collect data; however, when the data is no longer needed, it should be deleted (see How to Turn 4 Million Into 18 Million). In 2014, a major university had a data breach, and over 200,000 names, Social Security numbers, dates of birth, and university identification numbers were taken by hackers. While I do not fault them for being hacked, I do fault them for having this information stored. Much of it belonged to alumni and not current students. That day I started a hunt for data we should not be keeping and had over 50,000 records deleted that were no longer needed.
3. When you do look for information, verify the source: Not everything you read will be fake news, but always consider the source of the information and, whenever possible, verify it (we often use the expression “trust but verify”).
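The first two rules lend themselves to a short sketch. The Python below shows one way they might look in practice: keeping only the fields an application has justified, and purging records once a retention period has passed. The field names, the `last_needed` attribute and the seven-year period are assumptions chosen for illustration; a real retention period would come from institutional policy and legal requirements.

```python
from datetime import date, timedelta

# Hypothetical: the only fields this application has a business reason for.
REQUIRED_FIELDS = {"student_id", "program"}

# Hypothetical retention period of roughly seven years.
RETENTION = timedelta(days=365 * 7)


def minimize(submitted: dict) -> dict:
    """Rule 1: keep only the fields that are actually needed."""
    return {k: v for k, v in submitted.items() if k in REQUIRED_FIELDS}


def purge_expired(records: list, today: date) -> list:
    """Rule 2: drop records whose retention period has passed.

    Each record is assumed to carry a 'last_needed' date marking when
    the institution last had a reason to hold it.
    """
    return [r for r in records if today - r["last_needed"] <= RETENTION]
```

For example, calling `purge_expired` on a list holding an alumni record last needed in 2010 and a student record last needed in 2024, with `today` set to a date in 2025, would return only the second record.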
In the security business, we are entrusted with the safety and privacy of our people. It’s not an easy task, but I believe that it is something worth doing.
Author Perspective: Administrator