Computational Methods for Electronic Health Record-driven Phenotyping
Total Pages | : 162 |
Release | : 2013 |
Each year the National Institutes of Health spends over 12 billion dollars on patient-related medical research. Accurately classifying patients into categories representing diseases, exposures, or other medical conditions important to a study is critical when conducting such research. Without rigorous characterization of patients, also referred to as phenotyping, relationships between exposures and outcomes cannot be assessed reliably, which leads to non-reproducible study results. This research focuses on developing tools to extract information from the electronic health record (EHR) and methods that augment a team's perspective or reasoning capabilities to improve the accuracy of a phenotyping model. This thesis demonstrates that employing state-of-the-art computational methods makes it possible to accurately phenotype patients based entirely on data found within an EHR, even though EHR data are not entered for that purpose. Three studies using the Marshfield Clinic EHR are described herein to support this research. The first study used a multi-modal phenotyping approach to identify cataract patients for a genome-wide association study. Structured query data mining, natural language processing, and optical character recognition were used to extract cataract attributes from the data warehouse, clinical narratives, and image documents. Using these methods increased the yield of cataract attribute information 3-fold while maintaining a high degree of accuracy. The second study demonstrates the use of relational machine learning as a computational approach for identifying unanticipated adverse drug events (ADEs). Matching and filtering methods were applied to training examples to enhance relational learning for ADE detection. The final study examines relational machine learning as a possible alternative for EHR-based phenotyping.
Several innovations, including identifying positive examples using ICD-9 codes and infusing negative examples with borderline positive examples, were employed to minimize reference expert effort and time and, to some extent, possible bias. The study found that relational learning performed significantly better than two popular decision tree learning algorithms for phenotyping when evaluated by area under the receiver operating characteristic curve. Findings from this research support my thesis, which states: Innovative use of computational methods makes it possible to more accurately characterize research subjects based on EHR data.
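
The evaluation metric named in the abstract, area under the receiver operating characteristic curve (AUC), can be illustrated with a minimal sketch. This is not the thesis's code: the function name `roc_auc`, the labels, and the two sets of model scores are hypothetical and chosen only to show how two phenotyping classifiers might be compared. The sketch uses the pairwise definition of AUC, the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Return the AUC: the fraction of (positive, negative) pairs in which
    the positive example has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = patient has the phenotype, 0 = does not.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical scores from two classifiers for the same six patients.
model_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
model_b = [0.6, 0.3, 0.4, 0.5, 0.2, 0.7]

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of positive from negative patients. The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based computation would be the usual choice for large cohorts.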