Kai On Wong found that machine learning can be used to predict ethnic background from public health data, which would help fill an information gap and could eventually inform policies aimed at reducing health and social inequities.
Source: University of Alberta

AI uncovers missing info about ethnicity in population health

Machine learning can be used to fill a significant gap in Canadian public health data related to ethnicity and Aboriginal status, according to research by a University of Alberta epidemiologist.

Kai On Wong, senior data scientist at the Real World Evidence unit of the Northern Alberta Clinical Trials and Research Centre (NACTRC), said ethnicity and Aboriginal status are recognized as key social determinants of health but are often not reported in large databases that track acute and chronic diseases such as asthma, influenza, cancer, cardiovascular diseases, diabetes, disability and mental illness.

“If a database currently lacks ethnicity information, we will not be able to tell whether certain ethnic groups have higher rates of disease or worse clinical outcomes,” Wong said. “This is a way to unlock that missing dimension from existing data sources, which may help us understand, monitor and address issues such as social inequities and racism in Canada.”

It’s all in a name—and a location

Wong created a machine learning framework to analyze the names and geographic locations of 4.8 million people surveyed in the 1901 census, examining features such as spelling and phonetics to predict whether they belonged to one of 13 ethnic groups. “Different ethnic and linguistic groups have different manifestations of features such as how the name sounds, how many letters in the name, how many vowels and unique letter sequences, and so on,” said Wong, who created the program and shared it as a public GitHub repository as part of his doctoral thesis at the U of A’s School of Public Health.
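To make the idea of name features concrete, here is a minimal Python sketch of the kind of spelling features Wong describes: name length, vowel count and letter sequences. It is an illustration only, not the code from Wong's repository. The function name `name_features`, the choice of two-letter sequences and the example surname are all assumptions made for this sketch, and phonetic features are left out.

```python
from collections import Counter

# Assumption: only a, e, i, o, u count as vowels ("y" is treated as a consonant).
VOWELS = set("aeiou")


def name_features(name: str, n: int = 2) -> dict:
    """Return simple spelling features for one name (hypothetical helper)."""
    # Keep alphabetic characters only, lowercased, so "O'Brien" and "obrien" match.
    letters = [ch for ch in name.lower() if ch.isalpha()]
    text = "".join(letters)
    # Count every run of n consecutive letters (n=2 gives letter pairs).
    ngrams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {
        "length": len(letters),
        "vowels": sum(ch in VOWELS for ch in letters),
        "ngrams": dict(ngrams),
    }


features = name_features("Tremblay")
print(features["length"], features["vowels"])  # prints: 8 2
```

A classifier would take features like these for millions of names, together with location where available, and learn which patterns tend to go with which group.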

“Machine learning is like having a team of agents who are given vast amounts of information. They are instructed to detect and retain useful patterns to solve practical problems such as predicting the ethnicity from the readily available information,” he said.

Wong said the program performed best at identifying individuals of Chinese, French, Japanese and Russian heritage based on name only, while accuracy for the Aboriginal classification improved when locations were also included.

New insights from existing health records

Both the World Health Organization and the Government of Canada recognize ethnicity and Indigeneity as determinants of health, along with other factors such as income, education and gender. Wong first became interested in inequities in health care that affect Indigenous groups when he served as acting territorial epidemiologist for the Government of the Northwest Territories.

Wong said that while American health records tend to include questions about ethnicity, this information is not collected consistently in Canadian databases, which range from hospital discharge records to cancer registries.

By using machine learning to uncover this missing information, researchers and policy-makers will be able to learn more from existing records rather than having to carry out new population-level surveys, which are expensive and time-consuming. “A future step forward will be to validate this research with real-world applications using health evidence augmented with ethnicity generated by the machine learning framework and comparing it with existing literature, particularly on health and social inequities,” Wong said.

Wong recommends first updating the ethnicity prediction tool using more recent census information and testing its accuracy when applied to various health records. “It is unrealistic to expect machine learning predictions to be 100 per cent accurate at all times,” Wong said. “The goal is to make predictions that are accurate and generalizable enough to discern underlying patterns in a meaningful way for a particular problem or application.”

Wong acknowledged and expressed gratitude to his thesis advisers, Yutaka Yasui and Faith Davis in the School of Public Health, and computing science professor Osmar Zaïane in the Faculty of Science. Wong’s research was funded by the Canadian Institutes of Health Research Frederick Banting and Charles Best Doctoral Research Award, the University of Alberta President’s Doctoral Prize of Distinction and Queen Elizabeth II Doctoral Scholarship, and the Alberta Machine Intelligence Institute (Amii).

Drawing from world-leading academic research at the University of Alberta and other institutions, Amii accelerates the adoption of artificial intelligence by industry leaders, builds in-house AI capabilities through hands-on coaching and helps prepare Alberta workers for high-demand careers in AI through world-class training opportunities.

The research was published in PLOS ONE.
