Doing health studies using health plan administrative claims

By Lauren Parlett, PhD
Published on February 13, 2025 @ 20:55 EST | Last updated on February 13, 2025 @ 22:29 EST

Tags: claims | secondary data analysis

In my day job, I am an epidemiologist who uses healthcare medical and pharmacy claims to drive real-world evidence. When I was in graduate school, I barely understood that claims were a valid data source for health studies. I hope that this article will inform both public health students and the general population about what these data are, how they are used, and what limitations exist.

Administrative claims

In the United States, patients covered by a health insurance plan generate claims for each interaction within the system, such as doctor's visits, prescription fills, and more. For health studies, this is pretty awesome because researchers can see the breadth of healthcare encounters for a member. When John Smith visits his primary care doctor for his annual physical, a cardiologist about his new heart health concern, and the pharmacy to fill a statin prescription, that entire journey is captured in claims. The primary care provider, cardiologist, and pharmacist will submit standard forms to Mr. Smith's insurance company to be reimbursed for the cost of the visits or the medication. One such form is the 1500 claim form developed by the National Uniform Claim Committee (1).

On a claim form, there is information related to the patient's identity, their health insurance, and, most importantly for me, their health encounter. You'll notice that the 1500 claim form has entry areas for diagnoses, dates of service, procedures, and rendering provider. However, you're not going to find "hypertension" anywhere on Mr. Smith's claim form. Instead, it is represented by a diagnosis code from the International Classification of Diseases (ICD). If the date of service was before October 1, 2015, then the code would come from version 9. On or after that date, it would come from version 10. So the values "401.1" (version 9) or "I10" (version 10) would represent Mr. Smith's diagnosis. For inpatient or emergency department claims, the diagnosis in the first position represents the primary diagnosis.
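As a minimal sketch of how the version cutoff plays out in practice, the snippet below picks the ICD version from the date of service and checks a code against a tiny hypertension list. The function names and the two-code list are illustrative assumptions; a real study would use a validated code list.

```python
from datetime import date

# US switch from ICD-9-CM to ICD-10-CM, per the date given in the text.
ICD10_START = date(2015, 10, 1)

# Illustrative code list only (assumption): the two codes named in the text.
HYPERTENSION_CODES = {
    "ICD-9": {"401.1"},
    "ICD-10": {"I10"},
}


def icd_version(service_date: date) -> str:
    """Return the ICD version expected on a claim with this service date."""
    return "ICD-10" if service_date >= ICD10_START else "ICD-9"


def is_hypertension(code: str, service_date: date) -> bool:
    """Check a diagnosis code against the list for the matching ICD version."""
    return code in HYPERTENSION_CODES[icd_version(service_date)]
```

Keying the lookup on the service date keeps a version 9 code from being matched against a version 10 list, and vice versa.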

The code lists that I routinely use to decode claims data are:

| Type of information | Standard coding |
|---|---|
| Diagnosis | International Classification of Diseases (ICD), version 9 or 10, Clinical Modification; Diagnosis Related Group (DRG), inpatient only |
| Procedures | Healthcare Common Procedure Coding System (HCPCS), Current Procedural Terminology (CPT), ICD-9 Procedure Coding System, ICD-10 Procedure Coding System |
| Services | UB-04 revenue codes |
| Provider specialty | CMS Specialty Taxonomy |
| Medication | National Drug Code (NDC), Generic Product Identifier (GPI), sometimes HCPCS |
Table 1. Standard element codes in claims data

How they are used

Typically, before claims can be used in research, they have to be processed and put into a usable structure. That structure can be proprietary or a common data model. Once the information is in a known structure, it can be used to construct cohorts of patients and create variables. Suppose I am studying the comparative effectiveness of Drug A versus Drug B among patients with hypertension (high blood pressure). Once I have developed a protocol for this study, the data can be analyzed. Here are some example inclusion and exclusion criteria:

| Criteria | Cohort - Drug A | Cohort - Drug B |
|---|---|---|
| Inclusion | First medication fill for Drug A (date = index date); at least one diagnosis for hypertension on or prior to index date; between 30 and 63 years old on index date | First medication fill for Drug B (date = index date); at least one diagnosis for hypertension on or prior to index date; between 30 and 63 years old on index date |
| Exclusion | Medication fill for Drug B on index date | Medication fill for Drug A on index date |
Table 2. Example inclusion and exclusion criteria by cohort

There are many different ways that I could pare down the data to only my Drug A and Drug B cohorts, but basically it would involve establishing the index date as the first medication fill for either Drug A or Drug B. Once that date is established, I would keep only the patients who are aged 30 to 63 as of that date. Then, I would use a medical encounters table to drop anyone without a hypertension diagnosis on or prior to that index date. Finally, any patient who filled both Drug A and Drug B on the index date would be dropped from the study.
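The steps above can be sketched in plain Python. This is only an illustration under assumed inputs: the tuple and dictionary layouts, and the names `build_cohorts` and `age_on`, are hypothetical and do not reflect any particular claims database or common data model.

```python
from datetime import date


def age_on(birth: date, on: date) -> int:
    """Age in whole years on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def build_cohorts(fills, births, htn_dx_dates):
    """Assign patients to Drug A or Drug B using the example criteria.

    fills: list of (patient_id, drug, fill_date), drug is "A" or "B"
    births: dict of patient_id -> birth date
    htn_dx_dates: dict of patient_id -> list of hypertension diagnosis dates
    Returns a dict of patient_id -> (drug, index_date).
    """
    # Index date = first fill of either study drug.
    index_dates = {}
    for pid, _drug, fill_date in fills:
        if pid not in index_dates or fill_date < index_dates[pid]:
            index_dates[pid] = fill_date

    cohorts = {}
    for pid, index_date in index_dates.items():
        # Keep only patients aged 30 to 63 on the index date.
        if not 30 <= age_on(births[pid], index_date) <= 63:
            continue
        # Require a hypertension diagnosis on or before the index date.
        if not any(d <= index_date for d in htn_dx_dates.get(pid, [])):
            continue
        # Drop patients who filled both drugs on the index date.
        drugs_on_index = {d for p, d, f in fills if p == pid and f == index_date}
        if len(drugs_on_index) != 1:
            continue
        cohorts[pid] = (drugs_on_index.pop(), index_date)
    return cohorts
```

In practice this logic would more likely be written as SQL or data-frame operations against the structured claims tables; the order of the filters does not change who ends up in each cohort.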

After you've established your study population, you need to follow them up. However, since this is an observational study using existing data for a secondary purpose, your research timeline does not have to run in parallel with the patients' real-life time, as it does in interventional studies and clinical trials. Instead, the outcome (or the absence of one) is already in the database.

Let's say that my outcome of interest was either an emergency department visit or an inpatient stay due to a myocardial infarction (heart attack). In my established study population, I would then look forward from the index date and enumerate which patients had the outcome of interest and which did not. I could then use epidemiological and statistical methods to appropriately account for confounders, mediators, effect modification, and so on that would affect the difference seen in outcomes for Drug A versus outcomes for Drug B.
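A rough sketch of the enumeration step might look like the following. The input shapes and the name `crude_risks` are assumptions for illustration, and counting only events strictly after the index date is a design choice, not something the approach requires. The result is an unadjusted proportion per cohort; it does none of the confounding adjustment described above.

```python
from datetime import date


def crude_risks(cohorts, outcome_dates):
    """Unadjusted proportion of patients with the outcome, by drug.

    cohorts: dict of patient_id -> (drug, index_date)
    outcome_dates: dict of patient_id -> list of outcome event dates
    """
    counts = {}  # drug -> (patients, patients with an event)
    for pid, (drug, index_date) in cohorts.items():
        # Look forward from the index date for any qualifying event.
        had_event = any(d > index_date for d in outcome_dates.get(pid, []))
        n, events = counts.get(drug, (0, 0))
        counts[drug] = (n + 1, events + int(had_event))
    return {drug: events / n for drug, (n, events) in counts.items()}
```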

Limitations

While claims are a bountiful data source, there are a number of limitations to bear in mind. Firstly, people move between health insurance plans, and their data do not move with them. As such, a member who is new to the claims that I see may have experienced an event of interest before, but because those data are opaque to me, I wouldn't know it. This is also called left-censoring. On the opposite side of things, a patient who is in my study population and whom I'm following to see if they experience the event may disenroll from their health plan. Now I don't know whether or not they experienced the event. This is right-censoring.
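To make right-censoring concrete, here is a small sketch that computes follow-up time for one patient. The parameter names (`enrollment_end`, `study_end`) and the function `follow_up` are hypothetical; the point is only that an event occurring after disenrollment cannot be counted, so follow-up stops at the last date the patient was observable.

```python
from datetime import date


def follow_up(index_date, enrollment_end, study_end, event_date=None):
    """Return (days_followed, event_observed) for one patient."""
    # Follow-up cannot extend past disenrollment or the end of the study.
    last_observable = min(enrollment_end, study_end)
    if event_date is not None and index_date < event_date <= last_observable:
        return (event_date - index_date).days, True
    # No observable event: the patient is right-censored.
    return (last_observable - index_date).days, False
```

Pairs like these (time followed plus an event indicator) are the usual input to time-to-event methods.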

In claims alone, you can only study diseases and procedures that have established codes. Studying cancer can be very difficult because, while different types of cancer do have different diagnosis codes, cancer stage is not explicit. Lifestyle and behavior covariates are generally not available. Obesity can definitely drive different health outcomes; however, the ICD-10 "Z68" diagnosis codes related to BMI category are not routinely used because they do not generate revenue. Since they are recorded inconsistently, we cannot assume that a patient is not obese just because we do not see an obesity-related diagnosis code in their claims. This is also true of smoking. Nicotine dependence (F17*) is a diagnosis code, but it is not frequently used.

When it comes to medication exposure, there are a host of assumptions that we make in claims. Firstly, we assume that if the pharmacy provided the medication, it was taken. Next, we assume the medication was taken as directed. Furthermore, since these pharmacoepidemiology studies rely on pharmacy dispensing records, it is difficult to study medications that are over-the-counter, illicit, purchased out of pocket, or given as doctor's office samples.
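One way these assumptions show up in code is in how an exposure window is derived from a dispensing record. The sketch below assumes each pharmacy claim carries a fill date and a days-supply value (an assumption about the data layout, not something stated above) and treats the patient as exposed for every supplied day.

```python
from datetime import date, timedelta


def exposure_window(fill_date: date, days_supply: int):
    """Assumed exposure period for one fill: fill date through last supplied day.

    This encodes the assumption that the medication was taken as directed,
    starting on the day it was dispensed.
    """
    return fill_date, fill_date + timedelta(days=days_supply - 1)
```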

Some of the limitations stated above can be mitigated with various epidemiological and data techniques. For instance, it may be possible to do data linkage where more of the patient's medical history is linked to their claims. Additionally, it may be possible to reach out to patients for a subset of the population so that covariate information like obesity, smoking, or cancer stage can be ascertained and controlled for in the analyses.

In conclusion

I hope this article provided a little more insight into how health studies are conducted using healthcare claims. Claims are super useful for tracking some healthcare journeys; however, there are a number of limitations to keep in mind. This is only one segment of my day job, but it is a pretty significant one. I think I'll go into greater detail on various points that I brought up here in future blog posts.

References

  1. National Uniform Claim Committee. Health Insurance Claim Form. Published February 1, 2012. Accessed February 13, 2025. Available from: https://www.nucc.org/images/stories/PDF/1500_claim_form_2012_02.pdf.