Using machine learning algorithms

Using machine learning algorithms on small data to understand why certain patients suffer more despite having similar clinical symptoms

Problem context

In a women’s health category, physicians had told us that certain patients often suffer more than others despite having similar clinical symptoms. We wanted to gain insight into the patient characteristics that could drive this suffering by applying simple machine learning algorithms to patient survey data.

Our approach

We ran our analysis on a patient survey with a sample size of over 2000. In addition to patient responses, the survey included physicians’ observations about each patient’s disease severity. As the dependent variable, we selected a relevant validated scale that measures the extent to which patients suffer because of the disease. As independent variables, we chose a range of measures covering demographics, general health, behavioural characteristics, disease management history, patient-reported experience of symptoms, the clinician’s observation of severity, and attitudes. We ran the analysis with three supervised machine learning algorithms: logistic regression, support vector machines, and random forests.
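As a rough illustration of this kind of comparison, the sketch below fits the three model types on a synthetic data set and reports 5-fold cross-validated accuracy using scikit-learn. Everything about the data is an assumption made for the example: the survey itself is not reproduced here, so the 10 numeric features, the binary "severe suffering" label, and the weights used to simulate it are all hypothetical. The model settings are likewise illustrative defaults, not the ones used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the survey (assumption: the real data is not available here).
rng = np.random.default_rng(0)
n_patients, n_features = 2000, 10
X = rng.normal(size=(n_patients, n_features))

# Hypothetical ground truth: only the first four features influence the outcome.
true_weights = np.array([1.5, -1.2, 1.0, 0.8] + [0.0] * (n_features - 4))
prob_severe = 1.0 / (1.0 + np.exp(-(X @ true_weights)))
y = (rng.random(n_patients) < prob_severe).astype(int)  # 1 = severe suffering

# The three supervised model types named in the text, with illustrative settings.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "support_vector_machine": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
cv_accuracy = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in cv_accuracy.items():
    print(f"{name}: {score:.3f}")
```

Scoring all three models with the same cross-validation folds keeps the comparison like for like, which matters on a data set of this size, where a single train/test split can be noisy.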

Insights and Impact

The models identified patients with severe suffering with an accuracy of about 70%. We identified four main predictors.
As expected given the small data set, the support vector machine and random forest models did not perform better than a basic logistic regression model.
The client is in the process of testing the utility of these findings with a variety of opinion leaders.
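The text does not say how the four main predictors were identified. One common approach with a logistic regression model, sketched below as an assumption rather than a description of the actual method, is to standardise the inputs and rank the variables by the absolute size of their fitted coefficients. The data is synthetic, the names predictor_1 to predictor_8 are placeholders, and the simulated weights are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Placeholder predictor names; the source does not name its variables.
feature_names = [f"predictor_{i + 1}" for i in range(8)]

# Synthetic data (assumption): only the first four predictors affect the outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, len(feature_names)))
true_weights = np.array([1.5, -1.2, 1.0, 0.8, 0.0, 0.0, 0.0, 0.0])
prob_severe = 1.0 / (1.0 + np.exp(-(X @ true_weights)))
y = (rng.random(2000) < prob_severe).astype(int)  # 1 = severe suffering

# Standardise so coefficient magnitudes are comparable across predictors.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=1000).fit(X_scaled, y)

# Rank predictors by absolute coefficient, largest first, and keep the top four.
coefficients = model.coef_[0]
order = np.argsort(-np.abs(coefficients))
top_predictors = [(feature_names[i], float(coefficients[i])) for i in order[:4]]

for name, coef in top_predictors:
    print(f"{name}: {coef:+.2f}")
```

Coefficient ranking of this kind is easy to explain to clinicians, which is one practical argument for a basic logistic regression when more complex models offer no gain in accuracy.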