Xerox: How Privacy Engineering Is Changing Big Data


Software application developers generally start the design and construction process by trying to match customer requirements to a use case. During this planning phase, data privacy is rarely the first area of concern. But as applications become big data applications -- and as analytics functions pervade the technology stack -- data privacy concerns are coming to the fore.

Thierry Jacquin is a senior research engineer in enterprise architecture at the Xerox Research Centre Europe. He argues that data privacy is by no means a new issue, but that it carries new force and urgency in the big data era -- right down to the level of software application development.

An organic analogy

“The value of personal data can hardly be exaggerated, as businesses seek to improve competitiveness through customer insight, or create novel services and business models that respond in real time to customer data. I like to compare the rise in public concern over data privacy to what has happened in agriculture, where health and environmental concerns over food have spurred a rapid growth in organic farming. Just as farmers must choose whether to be organic or not, I believe that businesses must make a fundamental strategic choice about what kind of business they want to be when it comes to data privacy,” said Jacquin.

But the challenge lies at ground level. That new urgency for data privacy needs to be reflected in the architectural-level design work carried out by systems engineers, architects and programmers.

To implement privacy, you first have to secure the storage of any data considered private under the law of the country where the 'service' (meaning an online cloud or web application service) is deployed. Jacquin points out that this means translating legal constraints into technical requirements, which are handed to the service's software programmers for each of the identified 'private record' fields of data.

“Coding options to fulfil these requirements include anonymization, access control or both -- via well-known security libraries with encryption/decryption or resource/rules/context-based access controllers. But managing privacy is much more than simple data storage security. It needs to be considered as an interactive social process to build mutual trust between the provider and receiver of personal information,” he said.
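To make this concrete, here is a minimal Python sketch of what field-level protection along those lines might look like. The record fields, the field-to-policy mapping, and the use of hashing plus the open-source cryptography library are assumptions for illustration, not Xerox's implementation.

```python
# Illustrative sketch only: field-level anonymization and encryption for a
# 'private record', in the spirit of the options Jacquin describes.
# Field names and the policy mapping are hypothetical.
import hashlib
from cryptography.fernet import Fernet  # third-party: pip install cryptography

KEY = Fernet.generate_key()    # in practice this would come from a key vault
fernet = Fernet(KEY)
SALT = b"per-deployment-salt"  # hypothetical salt for pseudonymization

def pseudonymize(value: str) -> str:
    """Replace an identifier with a stable, non-reversible token."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()

def encrypt_field(value: str) -> bytes:
    """Encrypt a field that must remain recoverable by authorized code paths."""
    return fernet.encrypt(value.encode())

# Hypothetical mapping of record fields to the technical requirement derived
# from the legal constraints, as delivered to the service's programmers.
FIELD_POLICY = {
    "email": pseudonymize,            # analytics only needs a stable identifier
    "postal_address": encrypt_field,  # must be recoverable for fulfilment
    "purchase_total": lambda v: v,    # not classed as private in this example
}

def protect(record: dict) -> dict:
    """Apply the per-field policy; unknown fields default to encryption."""
    return {field: FIELD_POLICY.get(field, encrypt_field)(value)
            for field, value in record.items()}
```

The point of the sketch is the shape of the solution rather than the specific libraries: each private field carries its own requirement, and the default is the most protective option.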

How real information transparency works

The basis of this trust lies in transparency, as has always been the case when sharing private information. The receiver of the personal information must declare what information they need and why. Based on that understanding, the owner of the data will first provide fairly limited information and consent, and then evaluate the result. The protocol between the two evolves continuously, one step at a time, as the user comes to appreciate the benefits of the service: they extend their data and consent in exchange for a better service.
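As a rough illustration of that one-step-at-a-time protocol, the Python sketch below models consent as a ledger that the data owner widens only after seeing the benefit. The ConsentLedger class and its methods are hypothetical, not a published Xerox API.

```python
# Minimal sketch: the service declares what it wants and why, the user grants
# a narrow scope first, and extends it only after evaluating the benefit.
from dataclasses import dataclass, field

@dataclass
class ConsentLedger:
    granted: dict = field(default_factory=dict)  # data item -> declared purpose

    def request(self, data_item: str, purpose: str) -> bool:
        """The receiver declares what it needs and why; nothing is collected yet."""
        print(f"Service requests '{data_item}' for purpose: {purpose}")
        return data_item in self.granted

    def grant(self, data_item: str, purpose: str) -> None:
        """The owner extends consent one item at a time, once the benefit is clear."""
        self.granted[data_item] = purpose

# First round: fairly limited information and consent.
ledger = ConsentLedger()
ledger.grant("city", "localize search results")
ledger.request("street_address", "door-to-door delivery")  # declared, not yet granted
# After evaluating the service, the user may widen the protocol.
ledger.grant("street_address", "door-to-door delivery")
```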

Privacy Engineering (to give it its capitalized brand) must therefore become a discipline in its own right, one that supports this virtuous circle of transparency and trust -- and it will do so through a dedicated model-driven architecture.

Does more trust = better data quality?

The argument here is that with data privacy controls in place, trust starts to build further trust and, crucially, this has a knock-on benefit for data quality. The theory goes as follows: when there is a lack of trust in digital services, users are more likely to give inaccurate information, whereas trust engenders a willingness to provide accurate information, and more of it.

So what steps can we take to speak to the 'geek' software application developers and spark their interest in the data privacy aspects of software design?

“[With this generalized approach as discussed so far], Agile programmers can continue to develop incredibly creative code without having to add the complexity of implementing privacy-oriented requirements. However, it should be possible at any time to associate privacy policies with the risks identified by the regular Privacy Impact Assessments that will be run on the binary services deployed (privacy policies are similar to access rules associated with files in Unix),” said Jacquin.
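The Unix analogy can be made concrete with a short, hedged sketch: a privacy policy expressed as resource, purpose and role rules, checked at the service boundary rather than woven through the application code. The policy structure below is an assumption for illustration, not Xerox's actual policy format.

```python
# Sketch of the Unix-permissions analogy: policies attached to data resources
# after the fact and enforced at the boundary, so the deployed service code
# does not need to change when a Privacy Impact Assessment flags a risk.
from collections import namedtuple

PrivacyPolicy = namedtuple("PrivacyPolicy",
                           ["resource", "allowed_purposes", "allowed_roles"])

POLICIES = [
    PrivacyPolicy("customer.email",   {"billing"},        {"billing-service"}),
    PrivacyPolicy("customer.history", {"recommendation"}, {"analytics-service"}),
]

def check_access(resource: str, purpose: str, role: str) -> bool:
    """Like a Unix permission check: deny unless a policy explicitly allows it."""
    return any(p.resource == resource
               and purpose in p.allowed_purposes
               and role in p.allowed_roles
               for p in POLICIES)

# A flow flagged as risky by a Privacy Impact Assessment stays blocked until
# a policy explicitly permits it.
assert not check_access("customer.email", "marketing", "analytics-service")
```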

Privacy-by-default

Xerox says it champions a 'privacy-by-default' framework it has built, an architecture designed to be relatively straightforward for data processors to use. It lets software application developers express the exact nature of the data they need to collect and the precise purpose for which it will be used, and then link this meaningfully to their service code.
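A hedged sketch of what expressing the data and the purpose, and then linking it to service code, could look like in practice is shown below: a decorator that attaches a machine-readable privacy declaration to each handler. The collects decorator and the registry are hypothetical, not part of Xerox's framework.

```python
# Sketch: developers declare, next to the code itself, which personal data a
# handler touches and exactly why. The declarations accumulate in a registry
# that can be audited or published without re-reading the handler code.
PRIVACY_REGISTRY = []

def collects(data_items, purpose):
    """Attach a privacy declaration to a service handler."""
    def wrap(handler):
        PRIVACY_REGISTRY.append({
            "handler": handler.__name__,
            "data": list(data_items),
            "purpose": purpose,
        })
        return handler
    return wrap

@collects(["email", "postal_address"], purpose="ship a physical order")
def create_order(request):
    ...  # normal service logic, unchanged by the declaration

print(PRIVACY_REGISTRY)
```

The design choice being illustrated is that the declaration lives alongside the service code and stays linked to it, rather than in a separate document that drifts out of date.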

“It's very early days for everyone working on making data privacy a reality in our world of ubiquitous connectivity, social networking, cloud computing and big data analytics. Yet we are at a time whereby it’s possible to imagine the development of platforms of ‘organic’ services, ruled and branded by a shared privacy charter,” said Jacquin.

Because it is such early days, it is not always easy for businesses that want to get serious about data privacy to do so. The phrase 'competitive advantage through data privacy' is rarely heard today, but in the connected world of the cloud, privacy, identity and consent have become far more sensitive issues. Expect more vendors to weigh in on this space soon.
