Validation of Claims-based Algorithms for Newly Diagnosed Juvenile Idiopathic Arthritis Using Machine Learning Methods
Poster presented at the American College of Rheumatology (ACR) Convergence, November 10-15, 2023, in San Diego, CA, US.
Authors: Patricia Hoffman, Lauren E Parlett, Daniel C Beachler, Daniel Reiff, Sarah McGuire, Sonia Pothraj, Lakshmi Moorthy, Cynthia Salvant, Dawn Koffman, Sanika Rege, Cecilia Huang, Matthew Iozzio, Kevin Schott, Amy Davidow, Kevin Haynes, Stephen Crystal, Tobias Gerhard, Brian L Strom, Carlos D Rose, Daniel B Horton
Presenting author: Patricia Hoffman
Abstract
Background/Purpose: Administrative claims databases represent important settings for studying large populations with juvenile idiopathic arthritis (JIA), but prior efforts to validate diagnostic algorithms for JIA using administrative data have been limited to potentially non-generalizable settings (e.g., specific clinics or health care systems). We aimed to develop and validate algorithms for new diagnoses of JIA in a large claims database using rule-based and machine learning-based approaches.
Methods: We performed a cross-sectional validation study using US commercial health plan data (2013-2020). We identified children diagnosed with JIA (ICD-9-CM: 696.0, 714, 720; ICD-10-CM: L40.5, M05, M06, M08, M45) before age 18 following ≥12 months of baseline continuous enrollment without JIA diagnosis or immunosuppression. JIA diagnoses were based on 3 previously validated definitions: 1) rheumatologist's diagnosis plus orders for ≥2 specific laboratory tests; 2) ≥2 outpatient diagnoses 8-52 weeks apart; or 3) 1 inpatient diagnosis. Charts from a random subset of subjects meeting each definition were abstracted and independently adjudicated by clinical experts; discrepancies were resolved by a third expert or, where necessary, consensus. Incident JIA was defined as definite or probable JIA diagnosed in the prior 4 months. Using data from 1 year before through 1 year after first JIA diagnosis, we then created candidate predictor variables from demographics, diagnoses, medications, procedures, and specialty of clinicians diagnosing JIA. After applying a simulation-based balancing method (Synthetic Minority Oversampling Technique, SMOTE), we selected optimal logistic regression regularization hyperparameters using 10-fold cross-validation. Model variables were used to score observations, and sensitivity, specificity, and positive predictive value (PPV) [95% confidence interval (CI)] were assessed at different thresholds of predicted JIA probability.
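To illustrate the balancing step described in the Methods, the following is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is not the study's implementation: the function name `smote_sketch`, the toy feature vectors, and the values of `k` and `seed` are assumptions made for illustration only.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class points.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # fraction of the way from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy minority class with two numeric predictors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two observed minority points, the simulated cases stay within the region spanned by the observed ones. In practice a maintained library implementation would normally be used instead of a hand-written version like this.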
Results: Of 182 eligible charts reviewed (92 ICD-9-based, 90 ICD-10-based), 133 had definite/probable JIA (ICD-9 64%, ICD-10 82%). Of JIA diagnoses, 90 were incident (ICD-9 90%, ICD-10 50%). Rule-based algorithms had limited PPV for incident JIA (ICD-9 58%, ICD-10 41%) (Table). Use of machine learning-based algorithms enabled excellent discrimination between incident and prevalent JIA (ICD-9 AUC 0.97, ICD-10 AUC 0.88) and between incident JIA and unlikely JIA (ICD-9 AUC 0.99, ICD-10 AUC 0.94) (Figure 1). Specific predicted probability thresholds yielded excellent test characteristics for differentiating incident JIA from unlikely JIA (ICD-9: sensitivity 95%, specificity 96%, PPV 96% [95% CI 96-100%]; ICD-10: sensitivity 81%, specificity 92%, PPV 91% [95% CI 84-97%]) (Figure 2).
Conclusion: Machine learning-based diagnostic algorithms for incident JIA improved on traditional rule-based algorithms in identifying new diagnoses of JIA using ICD-9 and ICD-10 codes within a large US claims database. External validation of these models is warranted, but these algorithms will facilitate use of administrative data to study JIA diagnosis, management, and outcomes in large populations.
Table. Accuracy of rule-based algorithms for incident JIA. ICD, International Classification of Diseases; Inc., incident; JIA, juvenile idiopathic arthritis; PPV, positive predictive value; Prev., prevalent. Numbers of charts are shown by claims-based definition and adjudicated diagnosis. PPV represents the percentage with definite or probable incident JIA among those meeting the claims-based definition. Specific labs eligible for Definition 1 were antinuclear antibody, rheumatoid factor, anti-cyclic citrullinated protein antibody, and HLA-B27.
Figure 1. ROC curves for machine learning-based algorithms for incident JIA. AUC, area under the curve; ROC, receiver operating characteristic; ICD, International Classification of Diseases; JIA, juvenile idiopathic arthritis. ROC curves show balance of sensitivity and specificity of machine learning-based algorithms for incident JIA based on comparisons with prevalent JIA (ICD-9 in A, ICD-10 in B) and with unlikely JIA (ICD-9 in C, ICD-10 in D). The Youden Index (dot on ROC curve) represents the maximum value of (Sensitivity + Specificity – 1).
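The Youden Index mentioned in the Figure 1 caption can be sketched as a search over candidate thresholds for the one that maximizes Sensitivity + Specificity - 1. The function name `youden_threshold` and the labels and scores below are hypothetical; they are not taken from the study data.

```python
def youden_threshold(labels, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 for incident JIA, 0 for the comparison group.
    scores: predicted probabilities; a case is called positive if score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical toy example: 4 positives and 4 negatives.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.30, 0.35, 0.60, 0.70, 0.80, 0.95]
threshold, j = youden_threshold(labels, scores)
print(threshold, j)  # prints: 0.35 0.75
```

In this toy example two thresholds (0.35 and 0.70) reach the same J, and the sketch keeps the lower one; other tie-breaking rules are equally defensible.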
Figure 2. Impact of predicted probability thresholds from machine learning-based models on accuracy of diagnoses of incident JIA compared with unlikely JIA. CI, confidence interval; ICD, International Classification of Diseases; PPV, positive predictive value. Curves show numbers of true positive (dark grey) and false positive (light grey) cases of incident JIA at different thresholds of predicted probability from machine learning-based models comparing children with incident JIA and children unlikely to have JIA based on ICD-9 codes (A) and ICD-10 codes (B). Numbers reflect the total study population, including simulated cases generated by application of a balancing method (Synthetic Minority Oversampling Technique, SMOTE). Threshold values yielding excellent test characteristics (dotted lines) were 0.97 for ICD-9 (sensitivity 95%, specificity 96%, PPV 96% [95% CI 96-100%]) and 0.22 for ICD-10 (sensitivity 81%, specificity 92%, PPV 91% [95% CI 84-97%]).
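As a rough illustration of how the test characteristics at a given predicted probability threshold are obtained, the sketch below counts true and false positives and negatives and derives sensitivity, specificity, and PPV from them. The function name `metrics_at_threshold`, the toy labels and scores, and the 0.5 cutoff are assumptions for illustration and do not reproduce the study's figures.

```python
def metrics_at_threshold(labels, scores, threshold):
    """Classify score >= threshold as incident JIA and summarize accuracy."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Undefined when no cases fall in the denominator.
        return numerator / denominator if denominator else float("nan")

    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Hypothetical toy data: 4 incident cases (1) and 6 unlikely-JIA cases (0).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.05]
result = metrics_at_threshold(labels, scores, threshold=0.5)
```

Raising the threshold generally trades sensitivity for PPV, which is the trade-off the curves in Figure 2 display.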
Tags
Analytic: machine learning | oversampling
Data Source: claims | medical record review
Research Focus: juvenile idiopathic arthritis
Study Design: cohort study
Funding Transparency
This work was possible through:
- Grant/Award
Conflict of interest statement at time of publication:
P. Hoffman: None; L. Parlett: Elevance Health, 3, Sanofi, 5; D. Beachler: Elevance Health, 3; D. Reiff: None; S. McGuire: None; S. Pothraj: None; L. Moorthy: Bristol-Myers Squibb (BMS), 12, Site PI for Abatacept registry BMS; C. Salvant: None; D. Koffman: None; S. Rege: None; C. Huang: None; M. Iozzio: None; K. Schott: None; K. Haynes: Janssen, 3; A. Davidow: None; S. Crystal: None; T. Gerhard: None; B. Strom: AbbVie/Abbott, 2, Consumer Healthcare Products Association, 2; C. Rose: None; D. Horton: Danisco USA, Inc., 5.
Entry last updated (DMY): 19-01-2025.