Kai On Wong found that machine learning can be used to predict ethnic background from public health data, which would help fill an information gap and could eventually inform policies aimed at reducing health and social inequities.
Source: University of Alberta

AI uncovers missing info about ethnicity in population health

Machine learning can be used to fill a significant gap in Canadian public health data related to ethnicity and Aboriginal status, according to research by a University of Alberta research epidemiologist.

Kai On Wong, senior data scientist at the Real World Evidence unit of the Northern Alberta Clinical Trials and Research Centre (NACTRC), said ethnicity and Aboriginal status are recognized as key social determinants of health but are often not reported in large databases that track acute and chronic diseases such as asthma, influenza, cancer, cardiovascular diseases, diabetes, disability and mental illness.

“If a database currently lacks ethnicity information, we will not be able to tell whether certain ethnic groups have higher rates of disease or worse clinical outcomes,” Wong said. “This is a way to unlock that missing dimension from existing data sources, which may help us understand, monitor and address issues such as social inequities and racism in Canada.”

It’s all in a name—and a location

Wong created a machine learning framework to analyze the names and geographic locations of 4.8 million people surveyed in the 1901 census, examining features such as spelling and phonetics to predict whether they belonged to one of 13 ethnic groups. “Different ethnic and linguistic groups have different manifestations of features such as how the name sounds, how many letters in the name, how many vowels and unique letter sequences, and so on,” said Wong, who created the program and shared it as a public GitHub repository as part of his doctoral thesis at the U of A’s School of Public Health.
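To make the idea of name-based features concrete, here is a minimal sketch in Python of how a name might be turned into counts of letters, vowels and letter sequences. It is an illustration only and is not taken from Wong's repository: the function name `name_features`, the choice of two-letter sequences and the plain English vowel set are all assumptions, and the phonetic features mentioned above are not covered.

```python
from collections import Counter

# Assumption: a plain English vowel set; accented vowels are not counted.
VOWELS = set("aeiou")


def name_features(name: str, n: int = 2) -> dict:
    """Turn a name into simple spelling features (hypothetical helper)."""
    # Keep letters only, lowercased, so spaces and hyphens are ignored.
    letters = [ch for ch in name.lower() if ch.isalpha()]
    text = "".join(letters)

    features = {
        "length": len(letters),
        "vowels": sum(ch in VOWELS for ch in letters),
    }

    # Count overlapping letter sequences of length n (character n-grams).
    ngrams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    for gram, count in ngrams.items():
        features[f"ngram_{gram}"] = count

    return features


print(name_features("Wong"))
```

A classifier would then be trained on such feature dictionaries, together with location where available, alongside the ethnic group recorded in the census.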

“Machine learning is like having a team of agents who are given vast amounts of information. They are instructed to detect and retain useful patterns to solve practical problems such as predicting the ethnicity from the readily available information,” he said.

Wong said the program performed best at identifying individuals of Chinese, French, Japanese and Russian heritage based on name only, while the accuracy was improved for the Aboriginal classification when locations were also included.
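A comparison across groups like this one rests on measuring accuracy separately for each group. The sketch below shows one simple way that could be done in Python. The function name `per_group_accuracy` and the placeholder labels are assumptions made for illustration; they are not the study's evaluation code or data.

```python
from collections import defaultdict


def per_group_accuracy(true_labels, predicted_labels):
    """Share of correct predictions within each true group (hypothetical helper)."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == guess:
            correct[truth] += 1
    return {group: correct[group] / totals[group] for group in totals}


# Placeholder labels only, not results from the research.
truth = ["group_a", "group_a", "group_b"]
guess = ["group_a", "group_b", "group_b"]
print(per_group_accuracy(truth, guess))
```

Running the same calculation twice, once with name-only predictions and once with predictions that also use location, would show which groups benefit from the added information.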

New insights from existing health records

Both the World Health Organization and the Government of Canada recognize ethnicity and Indigeneity as determinants of health, along with other factors such as income, education and gender. Wong first became interested in inequities in health care that affect Indigenous groups when he served as acting territorial epidemiologist for the Government of the Northwest Territories.

Wong said that while American health records tend to include questions about ethnicity, this information is not collected consistently in Canadian databases, which range from hospital discharge records to cancer registries.

By using machine learning to uncover this missing information, researchers and policy-makers will be able to learn more from existing records rather than having to carry out new population-level surveys, which are expensive and time-consuming. “A future step forward will be to validate this research with real-world applications using health evidence augmented with ethnicity generated by the machine learning framework and comparing it with existing literature, particularly on health and social inequities,” Wong said.

Wong recommends first updating the ethnicity prediction tool using more recent census information and testing its accuracy when applied to various health records. “It is unrealistic to expect machine learning predictions to be 100 per cent accurate at all times,” Wong said. “The goal is to make predictions that are accurate and generalizable enough to discern underlying patterns in a meaningful way for a particular problem or application.”

Wong acknowledged and expressed gratitude to his thesis advisers, Yutaka Yasui and Faith Davis in the School of Public Health, and computing science professor Osmar Zaïane in the Faculty of Science. Wong’s research was funded by the Canadian Institutes of Health Research Frederick Banting and Charles Best Doctoral Research Award, the University of Alberta President’s Doctoral Prize of Distinction and Queen Elizabeth II Doctoral Scholarship, and the Alberta Machine Intelligence Institute (Amii).

Drawing from world-leading academic research at the University of Alberta and other institutions, Amii accelerates the adoption of artificial intelligence by industry leaders, builds in-house AI capabilities through hands-on coaching and helps prepare Alberta workers for high-demand careers in AI through world-class training opportunities.

The research was published in PLOS ONE.
