Design and analysis of outcomes following SARS-CoV-2 infection in … – BMC Medical Research Methodology


Study design and data

We designed a retrospective cohort study of EHR-based outcomes with a non-equivalent comparator of uninfected Veterans. To facilitate measurement of patient-reported outcomes, this retrospective cohort is paired with an embedded smaller post-only survey-based prospective cohort study. In both components, comparator non-equivalence was reduced by generating matched cohorts.

As described previously [5], we assembled a cohort of VA enrollees who tested positive for SARS-CoV-2 RNA in a respiratory specimen within the VA system based on polymerase chain reaction (PCR) tests, as well as those with evidence of SARS-CoV-2 infection identified outside the VA but documented in VA records, as identified by the VA National Surveillance Tool between March 1, 2020 and April 30, 2021. The earliest date of a documented positive test was taken as each patient's date of infection. We included only those Veterans who had an assigned VA primary care team (e.g., Patient Aligned Care Team) or at least one VA primary care clinic visit in the two-year period prior to infection to minimize missingness in EHR-based covariates that are generated from health system interaction. Cohorts were identified sequentially on a monthly basis, with assignment to a particular month for cases based on the date of the positive test or documentation in notes of non-VA evidence of infection. VA-enrolled Veterans without a positive test prior to or during the month who met the same inclusion criteria were considered uninfected potential comparators for that month. The uninfected control group members were eligible for repeated sampling and matching with replacement until they had a positive test. To avoid misclassification of first infection date based on a positive test, infected Veterans with COVID-19-related diagnostic codes (ICD-10: B97.29, U07.1, U09.9, J12.82, Z86.16) listed in fee-for-service Medicare claims 15 or more days before their VA test were excluded. In addition, Veterans from the uninfected comparator group with any such diagnostic codes were excluded from sampling for matching in the month the COVID-19-related code arose and any months thereafter.
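The two claims-based exclusion rules in the preceding paragraph can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the `(date, code)` claim tuples, and the reading of "15 or more days before" as an inclusive cutoff are all assumptions.

```python
from datetime import date, timedelta

# COVID-19-related ICD-10 codes listed in the text.
COVID_CODES = {"B97.29", "U07.1", "U09.9", "J12.82", "Z86.16"}


def exclude_infected(first_va_positive, claims, min_gap_days=15):
    """Return True if an infected Veteran should be excluded.

    `claims` is a hypothetical list of (claim_date, icd10_code) tuples
    from fee-for-service Medicare. A COVID-19 code dated 15 or more days
    before the VA positive test suggests an earlier, undocumented infection.
    """
    cutoff = first_va_positive - timedelta(days=min_gap_days)
    return any(code in COVID_CODES and d <= cutoff for d, code in claims)


def comparator_eligible(month_start, claims):
    """Return True if an uninfected Veteran may be sampled for this month.

    Eligibility ends in the calendar month in which a COVID-19 code first
    appears and in every month after it.
    """
    this_month = (month_start.year, month_start.month)
    return not any(
        code in COVID_CODES and (d.year, d.month) <= this_month
        for d, code in claims
    )
```

For example, a comparator whose first COVID-19 code is dated July 15, 2020 would remain eligible for the June 2020 cohort but not for the July 2020 cohort or any later one.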

We developed 14 separate monthly patient cohorts, one for each month (March 2020 through April 2021), for the purpose of defining index dates and matching covariates. For example, the March 2020 cohort included all VA enrollees with an initial positive test during March 2020 and all VA enrollees who were alive as of March 1, 2020 and had not been infected prior to April 1, 2020. SARS-CoV-2-infected patients were included as potential comparator patients in months before infection. In a given month, uninfected Veterans could be matched to multiple infected Veterans in that same month, and uninfected Veterans could be included in multiple month-specific cohorts as long as they remained uninfected and continued to meet other eligibility criteria. To minimize immortal time bias, the index date was defined as the date of the earliest positive test for SARS-CoV-2-infected Veterans and as the 1st day of the relevant month for uninfected Veterans [6]. Each patient's index date served as the anchor for defining matching covariates (with covariate construction starting 14 days prior to the positive test date for infected patients), based on EHR data from the prior two years.
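The monthly cohort logic described above might be sketched as below. The record layout (`id`, `first_positive`, `death_date`) and the function names are hypothetical, and other eligibility criteria such as primary care engagement are omitted for brevity.

```python
from datetime import date


def month_bounds(year, month):
    """First day of the month and first day of the following month."""
    start = date(year, month, 1)
    next_start = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, next_start


def build_monthly_cohort(patients, year, month):
    """Assign each patient to the infected or uninfected arm for one month.

    Infected: first positive test falls inside the month; the index date
    is the test date. Uninfected: alive on the 1st of the month and not
    infected before the 1st of the next month; the index date is the 1st
    of the month. Patients infected in an earlier month are dropped.
    """
    start, next_start = month_bounds(year, month)
    rows = []
    for p in patients:
        pos = p.get("first_positive")
        died = p.get("death_date")
        if died is not None and died < start:
            continue  # assumption: "alive as of" means no death before the 1st
        if pos is not None and start <= pos < next_start:
            rows.append({"id": p["id"], "group": "infected", "index_date": pos})
        elif pos is None or pos >= next_start:
            rows.append({"id": p["id"], "group": "uninfected", "index_date": start})
    return rows
```

Calling `build_monthly_cohort` once per month from March 2020 through April 2021 would let a Veteran infected in May 2020 appear as an uninfected comparator in March and April and as an infected case in May.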

Our goal was to conduct many-to-one matching that would maximize both retention of infected patients (for external validity) and covariate balance (for internal validity). A priori, we defined a suitable matching strategy as one that would result in < 5% attrition of the infected cohort and achieve covariate balance among the covariates selected for matching, based on standardized differences < 0.1 [7].
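As an illustration of the two pre-specified criteria, the sketch below computes an unweighted standardized difference for a continuous covariate and checks both thresholds. It assumes the common definition (difference in means divided by the square root of the average of the two group variances) and applies the 0.1 threshold to the absolute value; a many-to-one match with replacement would normally also need matching weights, which are left out here. All names are hypothetical.

```python
from math import inf, sqrt
from statistics import mean, variance


def standardized_difference(infected_values, uninfected_values):
    """Difference in means scaled by the pooled standard deviation."""
    diff = mean(infected_values) - mean(uninfected_values)
    pooled_sd = sqrt((variance(infected_values) + variance(uninfected_values)) / 2)
    if pooled_sd == 0:
        return 0.0 if diff == 0 else inf
    return diff / pooled_sd


def matching_is_suitable(n_infected, n_matched, std_diffs,
                         max_attrition=0.05, max_abs_diff=0.1):
    """Check the a priori criteria: < 5% attrition and all |d| < 0.1."""
    attrition = 1 - n_matched / n_infected
    balanced = all(abs(d) < max_abs_diff for d in std_diffs)
    return attrition < max_attrition and balanced
```

Under these definitions, a strategy that retains 960 of 1,000 infected patients (4% attrition) with every standardized difference below 0.1 in absolute value would pass, whereas one that fails to match 53.7% of infected patients would not.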

Coarsened exact matching (CEM) was initially attempted. Covariates used for matching were derived iteratively at a single point in time (summer 2021) with the understanding that the evidence base about causes and consequences of COVID-19 was (and is) evolving rapidly. In collaboration with clinician-investigators (see left column, Appendix 1), we identified a broad list of demographic, clinical, and health care utilization measures hypothesized to be either risk factors for pre-specified outcomes alone (e.g., survival, depression, total VA costs, disability, healthcare-related financial strain due to high out-of-pocket costs) or confounders associated with both infection and outcomes [8].

To minimize sample loss when attempting to match on many covariates in CEM [9], the five physician principal investigators then worked together to prioritize covariates for the final matching specification (see right column, Appendix 1). Modified coarsened exact matching was then implemented using this prioritized set of covariates. However, a suitable exact match could not be identified for 53.7% of infected Veterans, so we reverted to a form of combined exact and calendar time-specific propensity score matching [10], with cohorts identified by index month.

In a two-step process, infected patients were exact matched to uninfected controls based on index month, sex, immunosuppressive medication use (binary), state of residence, and COVID-19 vaccination status (effective in January-April 2021 cohorts only) because these covariates were strong potential confounders. In the second step, a total of 39 binary, categorical, and continuous covariates were included in the propensity score model, including immunosuppressive medication use (binary), nursing home residence any time in the prior two years, vaccination status (January-April 2021 cohorts), and diagnosed CDC high-risk conditions [11]: coronary heart disease, cancer (excluding non-metastatic skin cancers), chronic kidney disease, congestive heart failure, pulmonary-associated conditions (including asthma, COPD, interstitial lung disease, and cystic fibrosis), dementia, diabetes, hypertension, liver disease, sickle cell/thalassemia, solid organ or blood stem cell transplant, stroke/cerebrovascular disorders, substance use disorder, anxiety disorder, bipolar disorder, major depression, PTSD, and schizophrenia.

Other categorical variables in the propensity score model included sex, race, ethnicity, rurality of the Veteran's home ZIP code, state of residence, smoking status, and categorization of two comorbidity scores (CAN [12], Nosos [13]). Continuous covariates included age, body mass index (BMI), comorbidity score via Gagne index, distance from a Veteran's home to the nearest VA hospital, count of CDC high-risk conditions, count of mental health conditions, and four VA utilization measures (inpatient admissions, primary care visits, specialty care visits, and mental health visits in the prior 2 years).

A caliper of 0.2 times the pooled estimate of the standard deviation of the logit of the propensity score was used to bound which uninfected patients could be matched to each infected patient [14]. To provide the survey team a sufficiently deep pool of matched controls to account for survey non-participation, the 25 matched uninfected patients closest in propensity score were retained for each infected patient. Infected patients with fewer than 25 matched uninfected patients had all their comparator patients selected as eligible matches. Matching was performed by the PSMATCH procedure from SAS/STAT 15.1 in SAS 9.4M6 via the VA Informatics and Computing Infrastructure (VINCI) platform.
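The caliper rule and the retention of up to 25 controls can be illustrated with a small sketch. This is not a reproduction of the SAS PSMATCH algorithm: it assumes propensity scores have already been estimated and lie strictly between 0 and 1, that each record carries a hypothetical `stratum` key encoding the exact-match variables, that each group has at least two records, and that the pooled standard deviation is the square root of the average of the two group variances of the logit.

```python
from math import log, sqrt
from statistics import variance


def logit(p):
    return log(p / (1 - p))


def caliper_width(infected_ps, uninfected_ps, multiplier=0.2):
    """0.2 times the pooled SD of the logit of the propensity score."""
    var_i = variance([logit(p) for p in infected_ps])
    var_u = variance([logit(p) for p in uninfected_ps])
    return multiplier * sqrt((var_i + var_u) / 2)


def match_within_caliper(infected, uninfected, k=25, multiplier=0.2):
    """Return {infected_id: [up to k closest uninfected ids]}.

    Controls are drawn with replacement, only from the same exact-match
    stratum, and only if they fall within the caliper on the logit scale.
    """
    width = caliper_width([r["ps"] for r in infected],
                          [r["ps"] for r in uninfected], multiplier)
    by_stratum = {}
    for c in uninfected:
        by_stratum.setdefault(c["stratum"], []).append(c)
    matches = {}
    for t in infected:
        target = logit(t["ps"])
        distances = sorted(
            (abs(logit(c["ps"]) - target), c["id"])
            for c in by_stratum.get(t["stratum"], [])
        )
        within = [cid for dist, cid in distances if dist <= width]
        matches[t["id"]] = within[:k]  # fewer than k: keep all of them
    return matches
```

An infected patient with an empty list in the result has no eligible control and would count toward attrition of the infected cohort.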

The EHR-based clinical outcomes that we intend to compare between matched cohorts are mortality, depression, suicide, onset of new clinical diagnoses, exacerbation of prevalent conditions, development of COVID-19 sequelae, and health care use and VA health care costs. The survey-based outcomes to be compared between matched cohorts include disability, healthcare-related financial strain, and health-related quality of life. Our default approach to analyses will be per-protocol, such that uninfected patients who cross over to become infected will be censored at the time of infection. Future analyses will account for this potentially informative censoring via inverse probability of censoring weights [15] and/or censoring of the entire matched strata at time of censoring. The study team discussed inclusion of negative control outcomes, but an outcome expected to be null between comparators could not be identified due to the ubiquitous effects of SARS-CoV-2 infection and the conditioning of negative control outcomes on health care utilization that might be differential between comparators.
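For a time-to-event outcome such as mortality, the per-protocol censoring rule could look like the sketch below. The `study_end` parameter, the reason labels, and the tie-breaking order (death first, then infection, then end of follow-up) are illustrative assumptions; the inverse probability of censoring weights mentioned above are not implemented.

```python
from datetime import date


def follow_up(index_date, group, first_positive, death_date, study_end):
    """Return (days of follow-up, reason follow-up ended).

    Uninfected comparators who later test positive are censored at the
    date of that test; infected patients are followed from their test.
    """
    candidates = []
    if death_date is not None:
        candidates.append((death_date, "death"))
    if group == "uninfected" and first_positive is not None:
        candidates.append((first_positive, "censored_at_infection"))
    candidates.append((study_end, "end_of_follow_up"))
    # min() keeps the earliest date; on ties the earlier list entry wins.
    end_date, reason = min(candidates, key=lambda c: c[0])
    return (end_date - index_date).days, reason
```

A comparator indexed on March 1, 2020 who tests positive on June 1, 2020 would contribute 92 days of uninfected follow-up before being censored.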
