Problems to be solved

Data is being used to improve health outcomes and to pave the way for new advances in disease treatment. However, despite the immense sums invested and the benefits promised by big-data technologies, the actual results have not met expectations. The primary reasons for this shortfall include a lack of data that is reliable, collected over an extended period, and correlated...

In simpler terms, beyond artificial intelligence or big data technologies, the quality and quantity of the foundational data are crucial for innovation in healthcare. In this chapter, we will explore the challenges associated with obtaining sufficient, high-quality healthcare data.

1. Adequate balance between protection and utilization of data

Type 1. Difficulties in Combining and Analyzing Data Due to Anonymization

An individual's medical information is among the most sensitive types of personal data, and laws worldwide increasingly mandate rigorous protection for it. The most prevalent protections are pseudonymization and anonymization. These processes de-identify data, making it difficult or even impossible to identify an individual, and thereby reduce the risk of harm from data breaches or misuse. Data that has been securely anonymized or pseudonymized can be freely used for certain purposes, such as research to develop new drugs and treatments. Consequently, major countries are enacting legislation that allows data to be used for valuable purposes while minimizing the risk of personal identification.

Nevertheless, these privacy measures inevitably limit the value that can be extracted from the information. Pseudonymization often abstracts or categorizes data values: a 33-year-old individual is recorded as being 30 years old, a person weighing 87 kilograms is recorded as weighing between 80 and 90 kilograms, and so forth. Depending on the purpose for which the data is used, this loss of precision can make the data unsuitable. Furthermore, combining data sets allows more comprehensive analysis and often yields results that would not be possible with individual data sets alone, and anonymization makes such combination very difficult, because records that can no longer be tied to an individual cannot be reliably linked across data sets.
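
The generalization described above can be sketched in a few lines. This is only a minimal illustration, assuming fixed 10-unit bands; the function names `generalize_age` and `generalize_weight` and the record layout are hypothetical and are not taken from any law, guideline, or library.

```python
def generalize_age(age: int, width: int = 10) -> int:
    """Round an age down to the start of its band (assumed band width: 10 years)."""
    return (age // width) * width


def generalize_weight(weight_kg: float, width: int = 10) -> str:
    """Replace an exact weight with a range label (assumed band width: 10 kg)."""
    low = int(weight_kg // width) * width
    return f"{low}-{low + width}"


# Hypothetical record using the values from the text.
record = {"age": 33, "weight_kg": 87}

generalized = {
    "age": generalize_age(record["age"]),
    "weight_kg": generalize_weight(record["weight_kg"]),
}

print(generalized)  # {'age': 30, 'weight_kg': '80-90'}
```

After this step the exact values cannot be recovered, which is the intended privacy gain and also the reason an analysis that needs precise ages or weights can no longer use the data.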

Type 2. Difficulties in Transmitting and Utilizing Data Due to Protection Levels Based on Where the Data Was Generated

Even if the data is not personally identifiable, it may face rigorous protection simply because it was generated in a hospital or pertains to genetic information, thus creating significant obstacles in its practical use. For instance, as of November 2022, in South Korea, if a patient measures their blood sugar at home with a personal device, it is categorized as health information and can be freely transmitted and used for any purpose. However, if the same blood sugar information is collected in a hospital and stored in an Electronic Medical Record (EMR), it falls under the Medical Act's jurisdiction. In such a scenario, even if a patient requests it, the hospital cannot directly send the data to another organization providing blood sugar analysis services. Currently, the only way to transfer this data is by the patient personally visiting the hospital, receiving the information, and delivering it to another organization.

Data subjects should have the right of self-determination: the right to decide who will know their information, to what extent, and how it will be used. For general personal information, laws such as the Personal Data Protection Act guarantee this right. However, medical information is governed by the Medical Act, which guarantees self-determination only partially, within the data subject's data portability right (access rights, the use of structured data formats, and the right to request transfer to a third party).

The problem originates from how organizations such as hospitals manage and account for patient data. Although the original intention was to protect sensitive medical information, the rules have inadvertently made it harder for patients to access quality medical and health services, because patients cannot transfer or aggregate their data across organizations at their own discretion. Consequently, patients' medical data remains fragmented, stored separately by individual healthcare organizations. In such a situation, it becomes virtually impossible to provide customized services such as precision medicine for each patient. Fortunately, 'my data' laws have already been implemented in the financial sector, and discussions are in progress on legislation to realize the portability right and full self-determination over personal medical information.

2. Obtaining proper consent and providing post-consent control rights

As explained above, the extent of data currently available for use without explicit patient consent is limited to a select few purposes, like research and statistics. Additionally, the quality of such data may be compromised. To amass as much data as possible without these issues, it's essential to inform and secure consent from data subjects such as patients, concerning the types of data to be gathered, its purpose, and the usage terms.

Securing this consent is a fundamental prerequisite for any organization seeking to use the data. Consequently, these organizations will aim to get patient consent to maximize the unrestricted use of data. However, the downside is that consent might be procured in a way that inadequately safeguards the data subject. In fact, both EU (case: German Consumer Federation v. Planet49) and Korean courts (case: 1mm notice on sweepstakes tickets) have ruled that passive consent, such as via a pre-selected checkbox, or consent gathered in a manner that is not easily recognizable to the data subject, is invalid.

Nevertheless, such "inadequate consent" isn't necessarily attributable to the organization's ill intentions. It could be due to the complexity of the language used in terms of service and privacy policies, which most people find difficult to comprehend. Alternatively, the very act of consent, intended to protect privacy, might paradoxically become burdensome for individuals, causing them to agree or disagree mindlessly rather than checking that the terms and conditions align with their best interests. Hence, even if an organization intends to better safeguard privacy (at least in terms of complying with the law in good faith), the outcome might still be inadequate consent.

On the flip side, a patient's understanding of how their data will be beneficial and the potential risks involved could also influence consent acquisition. The greater the perceived personal benefit from using their data, and the more they comprehend the risks, the more likely they are to provide informed consent.

On January 20, 2020, the U.S. Department of Health and Human Services (HHS) altered the Common Rule, allowing for the secondary use of data without additional consent, even if the data is identifiable and not used for research purposes, provided that informed, blanket consent was obtained initially. This modification aims to boost research efficiency and data value by eliminating the time and cost of acquiring patient consent for each use, except when specific risk factors are present. Gathering data with consent for a broader purpose can also be beneficial, because plausible uses often cannot be anticipated until after the data has been collected.

However, it's crucial to offer patients access to the complete history of how their data has been used and disclosed, as well as the right to revoke consent for the use of their data. An alternative approach is to implement a dynamic-consent system that collects data upfront but gives patients the chance to view more details at the point of use, and the option to opt in or opt out at any moment, even after consent has been granted.
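
A dynamic-consent record of the kind described above might be modeled roughly as follows. This is a hedged sketch, not a description of any existing system: the `ConsentRecord` class, its fields, and the purpose labels are all hypothetical, and a real deployment would also need authentication, durable storage, and tamper protection.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    """Hypothetical per-patient consent state with an append-only usage history."""
    patient_id: str
    granted: dict = field(default_factory=dict)  # purpose -> True/False
    history: list = field(default_factory=list)  # (timestamp, event, purpose)

    def _log(self, event: str, purpose: str) -> None:
        # Every consent change and every access attempt is recorded.
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append((stamp, event, purpose))

    def opt_in(self, purpose: str) -> None:
        self.granted[purpose] = True
        self._log("opt_in", purpose)

    def opt_out(self, purpose: str) -> None:
        # Consent can be revoked at any time after it was granted.
        self.granted[purpose] = False
        self._log("opt_out", purpose)

    def use(self, purpose: str) -> bool:
        # Purposes never consented to are denied by default.
        allowed = self.granted.get(purpose, False)
        self._log("use_allowed" if allowed else "use_denied", purpose)
        return allowed


consent = ConsentRecord(patient_id="patient-001")  # hypothetical identifier
consent.opt_in("research")
first_use = consent.use("research")    # allowed while consent is active
consent.opt_out("research")
second_use = consent.use("research")   # denied after revocation
events = [event for _, event, _ in consent.history]
```

The `history` list is what would let a patient review every use and disclosure, while `opt_out` models revocation after the initial consent.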

Securing informed consent and preserving the right to manage data post-consent is vital for protecting privacy and extracting value from data utilization. Achieving this will deliver a positive experience in terms of information transparency and system trust, expectations that will only increase in the future. This is a crucial consideration for organizations aiming to acquire patients and users, regardless of the legal dimensions. Thus, solutions that facilitate informed consent-based data management and usage are required from the perspectives of patients, organizations aiming to use data, and organizations managing data on behalf of patients.

3. Absence of incentives for data sharing

Major countries are implementing laws to strike a balance between privacy and data utilization through patient data self-determination. A notable instance is the 21st Century Cures Act in the U.S., which requires healthcare organizations to ensure that the patient medical information they hold is interoperable, and that patients can access, exchange, and use their medical information in the applications of their choice. Non-compliance can result in penalties of up to $1 million per violation.

However, numerous healthcare organizations still share data in formats that are difficult to read and use electronically. Elsewhere, companies and researchers are either legally prevented from sharing data with patients, even with consent, or, in countries with no such legal restrictions, hesitant to supply data because of data protection concerns (Reference 1, Reference 2).

In South Korea, the Ministry of Health and Welfare is advocating for MyData legislation and a pilot service in the medical field called MyHealthway. Recent reports indicate that medical institutions will not be compelled to comply with patient information transfer requests, but voluntary participation will be encouraged to improve service quality for individuals and patients. Private companies other than medical institutions will be eligible to participate after 2024, but this is still far from realizing the right to strict data self-determination.

As things stand, data self-determination can be achieved only through legal obligations or penalties. Ideally, it would instead be driven by the voluntary motivation of ecosystem stakeholders. However, a survey by the U.S. National Academy of Medicine found that executives from healthcare organizations reported a lack of economic incentives to share data, alongside concerns about losing their competitive edge by sharing data externally. In reality, data-sharing work such as data structuring, standardization, quality control, and data archiving often requires the resources and expertise of the healthcare organization that generates the data, while the benefits are more likely to be reaped by the organization that uses it. This misalignment of incentives discourages voluntary participation by data generators.

Required actions for sharing data
  1. Data Structuring and Standardization

    • Clinical data often presents inconsistencies in the terminology used in unstructured text formats. The challenge here is to structure this data so that it is comprehensible to a computer.

    • Additional efforts should be made to standardize data types, terminology, and formats to facilitate collaboration via data sharing.

    • Introduce searchable metadata to determine the existence of duplicate data and to identify combinable data.

  2. Quality Control

    • Identify and rectify errors, such as records for patients with multiple health conditions that contain only the diagnoses necessary for insurance claims, or information that was unintentionally omitted or entered incorrectly during manual record keeping.

    • Endeavor to resolve issues of accuracy with measurement devices, including inconsistent results depending on the proficiency of the user.

  3. Storing Data

    • Store and manage up to 200 GB of genomic data per individual (note).

    • Employ technologies that facilitate the storage, management, and transfer of data in a compact form, making it easily reanalyzable.
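
As a rough illustration of the terminology standardization step listed above, free-text terms can be mapped to one canonical form before data sets are shared or combined. The synonym table below is purely hypothetical and tiny; a real system would map terms to an established clinical vocabulary rather than a hand-written dictionary.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "high blood pressure": "hypertension",
    "htn": "hypertension",
    "dm": "diabetes mellitus",
    "diabetes": "diabetes mellitus",
}


def normalize_term(raw: str) -> str:
    """Lowercase, collapse whitespace, then map known synonyms to one term."""
    key = " ".join(raw.lower().split())
    return SYNONYMS.get(key, key)


def normalize_record(terms: list) -> list:
    """Normalize every term and drop duplicates while preserving order."""
    seen = []
    for term in terms:
        canonical = normalize_term(term)
        if canonical not in seen:
            seen.append(canonical)
    return seen


raw_terms = ["HTN", "High  blood pressure", "Diabetes"]
print(normalize_record(raw_terms))  # ['hypertension', 'diabetes mellitus']
```

Terms that are not in the table pass through unchanged (only lowercased and trimmed), so unknown terminology stays visible for later review instead of being silently dropped.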

4. The understanding of data rights and the absence of reliable records

The concept of incentives is closely tied to consensus on data rights. It's widely agreed that patients should have autonomy over their data. However, the idea of ownership, which is intricately linked to incentives, is more complex. Generally, the notion of ownership applies to physical assets like real estate or objects. For intangible assets, there are intellectual property rights like copyright and patents, but these are only recognized when there's creative input involved. Therefore, copyright only applies to compiled databases, not to raw information or data as such.

Creating a patient's medical data involves substantial work. Beyond the basic data primarily gathered by healthcare professionals and machines in medical institutions, a significant portion is produced through expert evaluation or interpretation, such as diagnoses or positive-negative test results. Furthermore, several steps need to be taken before the data can be shared and used in a meaningful way. Sometimes, additional effort is needed to merge separate data. The outcome is a dataset compiled with significant investment and expertise from medical professionals.

Moreover, it's worth noting that healthcare data has a public character, because its creation is supported either by a health insurance system or by a publicly funded healthcare system. Therefore, rather than securing exclusive revenue and usage rights for a specific entity, it would be more advantageous for the public and the data subjects to guarantee non-rivalry, that is, to ensure that one entity's use of the data doesn't limit its use by others. This would permit more extensive data use.

Currently, there's no reliable method to document this history of ownership, data sharing, and usage, and to make it openly available and usable for all stakeholders.
