Master's Thesis Defense: “Using Clinical Notes and Natural Language Processing To Understand Sickle Cell Disease” by Shufa Khizra

Wednesday, December 5, 2018, 2 pm to 4 pm
Campus: Dayton, 304 Russ Engineering
Audience: Current Students, Faculty

Committee: Drs. Tanvi Banerjee (Advisor), Mateen Rizki, and Michelle Cheatham

ABSTRACT:

Sickle Cell Disease (SCD) is a hereditary red blood cell disorder that can lead to excruciating pain episodes. SCD causes normally disc-shaped red blood cells to become distorted into a sickle shape. The distorted cells are inflexible and stick to the walls of blood vessels, obstructing the free flow of blood and eventually depriving tissues of oxygen. Lack of oxygen causes serious problems, including Acute Chest Syndrome (ACS), stroke, infection, and organ damage, and over a lifetime SCD can harm a person's spleen, brain, kidneys, eyes, and bones. An estimated 90,000 to 100,000 Americans are affected by SCD. Numerous studies have sought to better understand the disease and to predict pain crises and pain levels.

Our study focuses on four research problems using SCD data: patient informative, pain informative, pain sentiment, and pain score. Many notes are written for a patient during hospitalization, but only a few provide useful information; the patient informative and pain informative tasks therefore help healthcare professionals identify, among all the clinical notes maintained, the notes that provide valuable information. The pain sentiment and pain score tasks predict, for a particular note, the change in pain and the pain level, respectively. Our study experimented with two feature sets: first, features obtained from cTAKES, a Natural Language Processing (NLP) system for clinical text, and second, features extracted directly from the note text using NLP techniques. Four supervised machine learning models, namely Logistic Regression, Random Forest, Support Vector Machines, and Multinomial Naive Bayes, were built on these feature sets. The results show that cTAKES features perform well on all four research problems, with F1 scores ranging from 0.40 to 0.86. This suggests that applying NLP techniques to clinical notes is a promising means of better understanding pain in SCD patients.
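The abstract does not specify which text features or toolkit were used, so the following is only a minimal sketch under stated assumptions. It trains one of the four named models, Multinomial Naive Bayes, on simple bag-of-words counts extracted from note text for a binary "pain informative" style task, then scores predictions with F1 for the positive class. The notes, labels, and the names `BagOfWordsNB`, `tokenize`, and `f1_score` are hypothetical and invented for illustration; this is not the thesis's data, its cTAKES features, or its code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a note and keep alphabetic word tokens (a simple stand-in for real NLP preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


class BagOfWordsNB:
    """Hypothetical hand-rolled Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        docs_per_class = Counter(labels)
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        n = len(docs)
        # Log prior: fraction of training notes in each class.
        self.log_prior = {c: math.log(docs_per_class[c] / n) for c in self.classes}
        # Total token count per class, used in the smoothed likelihood denominator.
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_score(self, tokens, c):
        denom = self.total[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for t in tokens:
            if t in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
        return score

    def predict(self, docs):
        preds = []
        for doc in docs:
            tokens = tokenize(doc)
            preds.append(max(self.classes, key=lambda c: self._log_score(tokens, c)))
        return preds


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy notes; label 1 = pain informative, 0 = not (hypothetical labels).
train_notes = [
    "patient reports severe pain in lower back, pain score 8",
    "pain increased overnight, requested more medication",
    "patient states pain is worse in chest and legs",
    "vital signs stable, no acute distress",
    "dietary consult completed, tolerating meals",
    "discharge paperwork reviewed with family",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_notes = ["patient reports pain in legs", "family visited, meals tolerated"]
test_labels = [1, 0]

model = BagOfWordsNB().fit(train_notes, train_labels)
preds = model.predict(test_notes)
print(preds, f1_score(test_labels, preds))  # prints [1, 0] 1.0
```

Logistic Regression, Random Forest, and Support Vector Machines would consume the same feature vectors in place of the Naive Bayes step; in practice a library such as scikit-learn provides all four models and an F1 metric, so a hand-rolled version like this is mainly useful for seeing what the model computes.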
