By Glyn Moody
Last year, Italy’s prime minister at the time, Matteo Renzi, announced that IBM would invest $150 million building a new research center in Milan for its Watson Health division, which applies “cognitive computing” techniques to healthcare. As usual, much was made of what was presented as a big win for Italy and its citizens, before the story rapidly disappeared from public view. A year later, the Italian journalist Gianni Barbacetto obtained the relevant memorandum of understanding signed by IBM and the Italian government, which revealed the high price the latter would pay.
In return for that $150 million investment, IBM will receive the medical records of 61 million Italians, apparently in their entirety. According to Barbacetto (original in Italian), the information provided will include: demographic data; all medical conditions, diagnoses, and their treatment; emergency and other hospital visits, including dates and times; prescriptions and their costs; genomic data and information about any cancers; and much else besides.
This information will be supplied in a supposedly anonymous form, with obvious personal indicators removed. However, it has been known for decades that detailed medical records can never be considered truly anonymous. Here’s what Ross Anderson, Professor of Security Engineering at the Computer Laboratory, University of Cambridge, wrote in 1998 on the topic of de-identifying medical data:
although it is not too difficult to de-identify data that provide only a time-limited snapshot of a population’s health – such as the data which health services use to compile monthly management statistics of numbers of operations, consumption of drugs and the like – it is effectively impossible to de-identify longitudinal records, that is, records which link together all (or even many) of the health care encounters in a patient’s life.
You only need a few reasonably specific medical facts about someone – for example, when and where they broke their arm in a fall, or the dates on which they gave birth to their children – to find a health record that matches. At that point, you know the complete medical history of that person. In any case, according to the memorandum obtained by Barbacetto, IBM will have access to identifiable data too, although it’s not clear exactly how, or on what scale:
IBM…expects to be able to gain access – in ways to be defined – for processing the health data of roughly 61 million Italian citizens (understood as historical, present and future health data) both in an anonymous and an identifiable form
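To make the re-identification argument concrete, here is a minimal sketch of a linkage attack on pseudonymised longitudinal records. Everything in it is hypothetical: the record layout, the `matching_records` helper, the hospital names and the patients are invented for illustration and are not drawn from the IBM memorandum or any real dataset. Names have been replaced by random IDs, yet each record still links all of one patient’s encounters. That linkage is precisely what Anderson says makes de-identification effectively impossible.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Encounter:
    date: str   # ISO date of the health care encounter
    place: str  # hospital or clinic (hypothetical names)
    event: str  # diagnosis or procedure


@dataclass
class PseudonymousRecord:
    record_id: str  # random ID that replaced the patient's name
    birth_year: int
    encounters: list = field(default_factory=list)


# Hypothetical toy data: "anonymous" in the sense that names are gone,
# but every encounter in each patient's history is still linked together.
RECORDS = [
    PseudonymousRecord("r-001", 1975, [
        Encounter("2009-03-14", "Hospital A", "fractured left arm (fall)"),
        Encounter("2012-07-02", "Hospital B", "childbirth"),
        Encounter("2015-11-20", "Clinic C", "HIV test: positive"),
    ]),
    PseudonymousRecord("r-002", 1975, [
        Encounter("2009-03-14", "Hospital A", "fractured left arm (fall)"),
        Encounter("2016-01-09", "Hospital B", "appendectomy"),
    ]),
    PseudonymousRecord("r-003", 1982, [
        Encounter("2012-07-02", "Hospital B", "childbirth"),
        Encounter("2018-05-30", "Clinic C", "depression: prescription"),
    ]),
]


def matching_records(records, known_facts):
    """Return the records containing every encounter the attacker already knows."""
    return [r for r in records
            if all(fact in r.encounters for fact in known_facts)]


# Facts a neighbour or colleague might plausibly know about someone.
known = [
    Encounter("2009-03-14", "Hospital A", "fractured left arm (fall)"),
    Encounter("2012-07-02", "Hospital B", "childbirth"),
]

matches = matching_records(RECORDS, known)
if len(matches) == 1:
    # A unique match exposes the whole linked history, not just the known facts.
    for e in matches[0].encounters:
        print(e.date, e.place, e.event)
```

In this toy data, each known fact on its own matches two records, but the two facts together match only one, and that single match reveals an HIV diagnosis the attacker did not previously know about. With 61 million real records the numbers are vastly larger, but the same principle applies: the more encounters a record links together, the fewer outside facts are needed to single someone out.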
A couple of days after revealing the existence of the memorandum of understanding, Barbacetto obtained the related “Industrial Development Contract Proposal” for the deal. Here are some of the benefits that IBM hopes to obtain, in the translation by Walter Vannini, an Italian data and privacy expert:
“To generate strategies for appropriate, coordinated care”; “to improve the management of high-risk-, high-need-patient clusters, lowering service costs and improving patient results”; “to give citizens and businesses easier access [to] the data patrimony owned by the public administration”; and even “develop research projects on big data, infectious diseases, elder care, predictive precision oncology”.
The same document underlines that IBM alone will retain rights to the results of its research and to any new solutions and tools developed using the medical data of 61 million Italians, and that it can license these to third parties. What’s remarkable is that not only is the Italian government giving IBM access to this extremely valuable data for free, it is also providing the US company with an additional €60 million (around $66 million) in funding, according to Barbacetto. Half of that will come from the central government, and the other half from the regional government of Lombardy, whose capital is Milan.
Not surprisingly, perhaps, Italy’s data protection agency is investigating this massive transfer of extremely sensitive data, which is taking place without any kind of consent from the people most directly affected. But IBM isn’t the only tech giant running into legal problems over large-scale access to health data. In the UK, Google’s AI subsidiary DeepMind signed a data-sharing agreement with the National Health Service (NHS). As New Scientist reported at the time:
The agreement gives DeepMind access to a wide range of healthcare data on the 1.6 million patients who pass through three London hospitals run by the Royal Free NHS Trust – Barnet, Chase Farm and the Royal Free – each year. This will include information about people who are HIV-positive, for instance, as well as details of drug overdoses and abortions. The agreement also includes access to patient data from the last five years.
Although the conditions imposed on Google appear more stringent than those on IBM – Google cannot use the data elsewhere in its business, and DeepMind must delete its copy of the data once the agreement expires later this year – it now seems that the deal may be on shaky ground. Sky News has obtained a copy of a letter sent to the NHS about the Google agreement:
It reveals that the UK’s most respected authority on the protection of NHS patients’ data believes the legal basis for the transfer of information from Royal Free to DeepMind was “inappropriate”.
Irrespective of that specific development, there’s another, more general issue here. It was raised in the 2016 New Scientist article:
For [Professor] Anderson, the more important question is whether Google – already one of the world’s most powerful companies – should have so much control over health analytics. “If Google gets a monopoly on providing some kind of service to the NHS it will burn the NHS,” says Anderson.