The government said it would establish a “cancer big data platform” that combines health data of private and state-run medical institutions, drawing local oncologists’ attention.

The platform could open a new chapter for Korea’s cancer diagnosis, treatment, and research, observers said.

Government officials and healthcare professionals discussed the “K-Cancer” project for building big data in oncology at the 47th annual meeting of the Korea Cancer Association (KCA) and the 7th International Cancer Conference on Friday.

According to Bang Yeong-sik, director of the healthcare data promotion division at the Ministry of Health and Welfare, the government plans to collect data on 3 million Korean cancer patients under the K-Cancer project and build a clinical data utilization network.

Korea University Anam Hospital Professor Kim Yeul-hong (second from left) chairs the discussion on building national cancer data with government officials, including Health and Welfare Ministry’s Healthcare Data Promotion Division Director Bang Yeong-sik (third from left) and National Cancer Center’s Cancer Big Data Center Director Choi Kui-son (third from right), at the online meeting of the Korea Cancer Association (KCA) on Friday.

As part of the project, the ministry plans to establish a Cancer Library, a cohort database, and data collection and processing services by 2025.

The Cancer Library is a service that standardizes the item definitions of cancer data from participating medical institutions, then collects the hospital-specific data and provides it in library form.

To do so, the government will build data on the 10 most common types of cancer by 2025 and manage each hospital’s data in the Edge Cloud. The government is reviewing whether to open the data when necessary.

Also, the health authorities will establish a cohort database of six cancer types by 2025 for in-depth research on the periods before and after cancer diagnosis and for long-term follow-up studies.

The government aims to build data on 600,000 people, or 20 percent of all cancer patients, based on the national cancer registry data.

The government will construct the cohort database from the cancer-specific data built in library form and from public institutions’ data. The National Cancer Data Center will manage the cohort database. The government has yet to decide which institution will serve as the National Cancer Data Center.

To ensure that big data for cancer does not lead to the misuse or abuse of individuals’ private information, the government said it would operate “safe use centers.”

The health and welfare ministry will certify such centers against specific criteria so that only those eligible can use the big data, it said. The number of designated safe use centers will gradually increase from 2022 to 2025.

“When we start the K-Cancer project next year in earnest, we will confirm the participating institutions, standardize data, and proceed with the project,” Bang said.

“We plan to run a pilot project in late 2022 or 2023 and begin providing data for external users from 2023.”

Choi Kui-son, head of the Cancer Big Data Center at the National Cancer Center, emphasized the need for big data for local cancer treatment and research.

Choi said Korea has built up a wide range of healthcare information, and that such private and public data are worth 2 trillion won ($1.7 billion). Although there had been some difficulties in utilizing big data, the recent revision of the Personal Information Protection Act and the government’s policy support will invigorate related studies and businesses, she said.

According to Choi, good big data requires “five Vs” – volume, variety, velocity, value, and veracity.

However, existing big data in Korea lacked standardization of clinical data and quality verification, she noted. It was also difficult to create new value by combining data or to use the data in an integrated way.

Another challenge was to link clinical information in high demand with genetic information-related big data and provide it for researchers, she went on to say.

“The key to the 4th Comprehensive Cancer Management Plan, announced in April, is to utilize big data aggressively,” Choi said. “To do so, we need three strategies – establishing integrated cancer data, running a national cancer data center, and sharing and utilizing cancer data.”

At the discussion, health experts raised questions about patient consent for data collection and the specific data collection methods.

Under the Personal Information Protection Act amended last year, de-identified information can be used for statistics and research purposes without the individual's consent.

Bang at the health ministry said the K-Cancer project would mainly involve using hospitals’ accumulated data.

“We are designing the cancer clinical data network so that we pseudonymize patients’ data and build big data without their consent. As many patients are visiting hospitals, it is difficult to receive consent from all of them for research purposes,” Bang said. “But genomic information or information that is difficult to de-identify should be tracked for a long time based on the patient's consent.”

It would be ideal to collect as much detailed data as possible, but there should be a balance because unnecessary data collection at hospitals can cause operational difficulties, Bang said.

Normally, researchers pull information from an electronic medical record (EMR) system or a clinical data warehouse (CDW) and build a data set, he went on to say.

At the initial stage of the project, the government will need manpower for such work, but in the future it will need to automate the collection of research data from EMR or CDW systems, he said.

Choi said the NCC’s CONNECT platform could be utilized for joint research of 10 medical institutions.

Following the revision of the Personal Information Protection Act last year, the Cancer Control Act will be enforced this year.

Choi said that if the NCC is designated as a national cancer data center, the NCC will disclose accumulated data.

“According to the guidelines for the use of healthcare data, information excluding the whole genome or germ-line mutation can be pseudonymized, and genomic information can be used safely,” she said.

The NCC plans to collect data by automatically extracting, loading, and converting data from EMR without human input.

It is also considering introducing natural language processing to build high-value data from the unstructured data in EMR, Choi added.

Copyright © KBR Unauthorized reproduction, redistribution prohibited