Data Privacy and Security in the Age of Big Data – A Guide for Enterprises
Earlier this year, the media was awash with stories of a data compromise involving Facebook and Cambridge Analytica. The episode raised major concerns over the long-burning issue of data security and privacy. As technology has evolved, we have started leaving a data trail for almost everything we do, whether we use GPS, shop online, apply for jobs or send emails. All this data becomes fodder for companies to analyze in order to understand us better. The question is where to draw the line, and at what point this obsession with data becomes an intrusion into our privacy. In response, several governments are introducing laws to ensure the data privacy and security of their citizens.
Who owns the data? Is it the large corporations that collect and store it, or the people who generate it? The answer, according to many visionaries, lies somewhere midway, and it becomes clearer once we change the way we view data: it is an extension of ourselves, and we should own the decision of how much to reveal and how much to keep private. With this in mind, the mantra that is catching on is to leverage data analytics and data management to design for data security.
According to Cybersecurity Ventures, an attempt to breach an organization's security from outside sources occurs every 39 seconds, and the cost of such attacks is projected to reach $6 trillion by 2021. Security spending by organizations is expected to be around $1 trillion by then. The most affected sectors are government, retail, and technology.
The Steps Towards Achieving Data Security
Data security is not a one-time effort but an ongoing process involving several stakeholders. Let’s take a look at the steps.
The initial assessment is key to understanding how the data is organized within a system. Then, by leveraging human insight, one can identify the sensitive data and infer the current state of security.
The next step is to set up a robust metadata management system and a methodology to track data lineage. Metadata is data about the data being stored, and it plays a key role in enhancing data security. Together, metadata and lineage make it possible to identify the sensitive data. Data pseudonymization, anonymization, masking and encryption, along with data sub-setting, then ensure that data critical to an individual stays protected.
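As a rough illustration of pseudonymization and masking, the following Python sketch replaces identifiers with a keyed hash and partially hides an email address. The key, the field names and the helper functions are all hypothetical, and a real deployment would take the key from a key-management system rather than from source code.

```python
import hashlib
import hmac

# Hypothetical secret key, shown inline only for illustration.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a stable keyed hash (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain; hide the rest."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        # Not a well-formed address: hide everything.
        return "*" * len(email)
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def protect_record(record: dict, sensitive_fields: set) -> dict:
    """Return a copy of the record with sensitive fields masked or pseudonymized."""
    protected = {}
    for name, value in record.items():
        if name not in sensitive_fields:
            protected[name] = value
        elif name == "email":
            protected[name] = mask_email(str(value))
        else:
            protected[name] = pseudonymize(str(value))
    return protected


customer = {"customer_id": "C-1001", "email": "alice@example.com", "city": "Pune"}
safe_customer = protect_record(customer, {"customer_id", "email"})
```

Because the hash is keyed and deterministic, the same customer always maps to the same pseudonym, so records can still be joined for analysis without exposing the original identifier.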
The next step is to set up a data governance process that oversees the identification, protection and archival of sensitive data. Once data lineage is established, the sensitive data can be identified and a vulnerability assessment carried out. A central master hub needs to be put in place, linked to the various applications that ingest and consume data. This hub can detect anomalies and alert the respective owners in case of a data breach. After the data is archived, a proper authorization and access control system needs to be implemented to prevent forgotten data from being accessed maliciously.
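One way to picture how lineage helps identify sensitive data is to treat lineage as a graph and follow it downstream from the known sensitive sources. The sketch below assumes a simple dictionary that maps each dataset to the datasets derived from it; the dataset names are invented for illustration.

```python
from collections import deque


def find_sensitive_datasets(lineage: dict, sensitive_sources: list) -> set:
    """Return every dataset that is, or is derived from, a sensitive source.

    `lineage` maps a dataset name to the datasets derived directly from it.
    """
    tainted = set(sensitive_sources)
    queue = deque(sensitive_sources)
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in tainted:
                tainted.add(child)
                queue.append(child)
    return tainted


# Hypothetical lineage: customer data flows into a report and an archive.
lineage = {
    "crm.customers": ["staging.customers_clean"],
    "staging.customers_clean": ["marts.sales_report", "archive.customers_2017"],
    "web.page_views": ["marts.traffic_report"],
}
to_protect = find_sensitive_datasets(lineage, ["crm.customers"])
```

Here the archived copy is flagged along with the live tables, which is the kind of forgotten data that access control after archival is meant to cover.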
The next step is to ask users to donate data anonymously for analysis and reverse engineering. To elaborate, suppose a user who has donated data has a normal pattern of online activity: reading the news and shopping. Any change in this behavior is treated as an anomaly and investigated as a possible data breach. For example, an ad from an organization that has access to the user's private data may have made the user digress from the daily routine. On the basis of this community data, algorithms can be set up to detect abnormal patterns.
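A very simple version of such an algorithm compares a user's latest activity with that user's own history. The sketch below uses a z-score on daily page visits; the threshold of 3 standard deviations and the sample numbers are assumptions chosen for illustration, not values from any real system.

```python
from statistics import mean, stdev


def is_anomalous(history: list, observed: float, threshold: float = 3.0) -> bool:
    """Flag an observation that lies far from the user's usual behavior."""
    if len(history) < 2:
        # Not enough history to estimate what is normal.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly constant history: any change counts as a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical daily counts of pages visited by one donor.
daily_visits = [12, 15, 11, 14, 13, 12, 14]
typical_day = is_anomalous(daily_visits, 14)
unusual_day = is_anomalous(daily_visits, 40)
```

A flagged day is only a prompt for investigation. Production systems would typically look at many behavioral signals at once rather than a single count.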
To implement data protection for any organization, one needs a clear understanding of the business terminology (each word and its precise meaning in the business), the business data objects (person, place, vendor and other data points central to the business) and the metadata related to those business data objects.
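These three ingredients can be captured in a small catalog. The sketch below is one possible shape, assuming each attribute of a business data object carries its business term and a sensitivity flag as metadata; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str                 # technical column or field name
    business_term: str        # precise meaning of the field in the business
    sensitive: bool = False   # metadata flag used to drive protection


@dataclass
class BusinessDataObject:
    name: str
    attributes: list = field(default_factory=list)

    def sensitive_attributes(self) -> list:
        """List the attribute names that need protection."""
        return [a.name for a in self.attributes if a.sensitive]


# Hypothetical vendor object with one sensitive attribute.
vendor = BusinessDataObject(
    name="vendor",
    attributes=[
        Attribute("tax_id", "Tax identification number", sensitive=True),
        Attribute("city", "Registered city"),
    ],
)
fields_to_protect = vendor.sensitive_attributes()
```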
Data security leveraging big data is still very much a work in progress, so a feedback loop should be set up to bring in continuous process improvements that make the system more robust and foolproof.
Human experts play a pivotal role. As hackers get smarter, experts have to continuously evolve the system, combining various methodologies with human intuition so that the system can withstand different kinds of attacks.
In short, to provide holistic and faster data protection, the following dimensions need to be taken care of:
- Understand the organization of data in a system
- Leverage human insight
- Use predictive analytics to predict and prevent malicious behavior
- Build custom models of data management by using the best practices
- Use state-of-the-art technology to efficiently process the big data
- Have a model to correlate user behavior
- Involve a community network to create NLP and machine learning models
- Get an in-depth understanding of the complex relationships within the data
Big Data tools and practices are going to play a far more important role in ensuring security as digital technologies become more and more ubiquitous. Threat data, along with the components of attacks and their correlations, will be fed into threat intelligence systems. These systems can vastly improve the trust and reliability of an organization in the eyes of the customer. At the same time, they can protect the organization from extra expenditure in the form of recovery costs and legal fallout, as well as from damage to its brand image.