Introduction to noncomplyR

Non-compliance with treatment assignment is a common issue in randomized clinical trials (RCTs). The noncomplyR package provides convenient functions for using Bayesian methods to perform inference on the Complier Average Causal Effect (CACE), the focus of a compliance-based analysis. The models used in this package are based on Imbens and Rubin (1997). The package currently supports two types of outcome models: the Normal model and the Binary model. Conjugate priors are used to simplify the sampling procedure. Users can, however, supply their own hyperparameter values to change the shape of the prior distribution, which retains some flexibility in the modeling of prior information.

This document takes you through the basic workflow of a compliance-based analysis using the noncomplyR package. To illustrate the workflow, we will work through an analysis of a randomized trial investigating the effect of vitamin A supplementation on childhood mortality.

Data: vitaminA

Individuals in this study were randomly assigned to either receive or not receive vitamin A supplementation. However, some individuals in the treatment arm did not receive the treatment. The data are contained in a data frame named vitaminA. The dataset contains 23682 observations on 3 variables: survival (1 = lived, 0 = died), treatment assignment (1 = vitamin A, 0 = control), and treatment received (1 = vitamin A, 0 = no vitamin A). Note that the variables are ordered as outcome, treatment assignment, treatment received. This is the column order that noncomplyR requires.
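To make the required column order concrete, the sketch below builds a small simulated data frame with the same three-column layout. The column names (survived, assigned, received), the sample size, and the compliance and survival rates are all made up for illustration; they are not taken from the vitaminA data.

```r
# Hypothetical stand-in for a data set in the layout described above:
# column 1 = outcome, column 2 = treatment assignment, column 3 = treatment received.
set.seed(1)
n <- 200
assigned <- rbinom(n, 1, 0.5)

# One-sided non-compliance: only units assigned to treatment can receive it
would_comply <- rbinom(n, 1, 0.8)
received <- assigned * would_comply

survived <- rbinom(n, 1, 0.95)

toy <- data.frame(survived, assigned, received)
str(toy)

# Cross-tabulate assignment against receipt to see the non-compliance pattern
table(assigned = toy$assigned, received = toy$received)
```

The cross-tabulation is a quick way to confirm that nobody in the control arm received the treatment, which is the pattern the strong_access argument refers to in Step 1.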

We now demonstrate how the main functions of the noncomplyR package can be applied in sequence to fit a non-compliance model and perform posterior-based inference on the CACE.

Step 1: Fit model with compliance_chain

The primary model-fitting function is compliance_chain(). This function uses the data augmentation algorithm to obtain a sample from the posterior distribution for the full set of model parameters. The following code performs 1000 iterations of the data augmentation algorithm and discards the initial 10. Note that we have specified the outcome_model as well as the set of assumptions that we will be making when fitting the model.

model_fit <- compliance_chain(vitaminA, outcome_model = "binary", exclusion_restriction = TRUE,
                              strong_access = TRUE, n_iter = 1000, n_burn = 10)
head(model_fit)
#>        omega_c   omega_n      p_c0      p_c1       p_n
#> [1,] 0.7974922 0.2025078 0.9935898 0.9981105 0.9899783
#> [2,] 0.8027364 0.1972636 0.9938614 0.9986314 0.9880724
#> [3,] 0.8078972 0.1921028 0.9961371 0.9986386 0.9872045
#> [4,] 0.8070221 0.1929779 0.9969108 0.9983559 0.9822705
#> [5,] 0.7993206 0.2006794 0.9964803 0.9985936 0.9843990
#> [6,] 0.7997129 0.2002871 0.9960020 0.9985101 0.9828294

Since the hyper_parameters argument was not specified, compliance_chain() automatically uses a non-informative flat prior.

Step 2: Transform the fitted model into a sample from the posterior of the CACE with cace

Once the model has been fit and a posterior sample obtained for the full set of model parameters, the corresponding posterior sample for the CACE can be obtained with the cace function. This function takes the following arguments: the results of a call to the compliance_chain() function, the outcome model used in fitting the model, and a logical value indicating whether the Strong Access Monotonicity assumption was made when fitting the model.

cace_posterior <- cace(chain = model_fit, outcome_model = "binary", strong_access = TRUE)
head(cace_posterior)
#> [1] 0.004520734 0.004770000 0.002501491 0.001445098 0.002113302 0.002508159
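For the Binary model, the printed values are consistent with the CACE being the difference between the Complier outcome probabilities under treatment and under control, p_c1 - p_c0. The check below is a sketch that retypes the six rows printed in Step 1; it does not call the package, and the identity is inferred from the output shown in this document rather than from the package source.

```r
# First six draws of p_c0 and p_c1, copied from the head(model_fit) output in Step 1
p_c0 <- c(0.9935898, 0.9938614, 0.9961371, 0.9969108, 0.9964803, 0.9960020)
p_c1 <- c(0.9981105, 0.9986314, 0.9986386, 0.9983559, 0.9985936, 0.9985101)

# Draw-by-draw difference in Complier outcome probabilities
cace_by_hand <- p_c1 - p_c0

# Values printed by head(cace_posterior)
cace_printed <- c(0.004520734, 0.004770000, 0.002501491,
                  0.001445098, 0.002113302, 0.002508159)

# The two agree up to the rounding of the 7-digit printout
max(abs(cace_by_hand - cace_printed))
```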

Step 3: Obtain summary statistics with summarize_chain

Given the sample from the posterior, the user can directly compute quantities of interest. For convenience, the summarize_chain function computes the posterior mean, median, and 50%, 90%, and 95% credible intervals.

summarize_chain(cace_posterior)
#> Posterior Mean: 0.003 
#> Posterior Median: 0.003 
#> Posterior 50% Credible Interval: (0.002, 0.004) 
#> Posterior 90% Credible Interval: (0.001, 0.005) 
#> Posterior 95% Credible Interval: (0.001, 0.006)
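Because cace_posterior is a plain numeric vector of draws, the same kind of summary can also be computed with base R. The sketch below uses simulated draws as a stand-in so that it runs without the package, and it assumes equal-tailed intervals; whether summarize_chain() uses exactly this interval definition is not stated in this document.

```r
set.seed(42)
# Stand-in for cace_posterior: hypothetical draws, not output from the package
draws <- rnorm(1000, mean = 0.003, sd = 0.0015)

# Equal-tailed credible interval at a given level (an assumed definition)
equal_tailed <- function(x, level) {
  unname(quantile(x, probs = c((1 - level) / 2, (1 + level) / 2)))
}

summary_by_hand <- list(
  mean   = mean(draws),
  median = median(draws),
  ci50   = equal_tailed(draws, 0.50),
  ci90   = equal_tailed(draws, 0.90),
  ci95   = equal_tailed(draws, 0.95)
)
str(summary_by_hand)

# Any other functional of the posterior is equally easy to compute,
# for example the posterior probability that the effect is positive
mean(draws > 0)
```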

This example illustrates the basic workflow of a compliance-based analysis using noncomplyR: fit the full model using compliance_chain(), obtain the posterior for the CACE using cace(), and summarize using summarize_chain().

Model Details

The function defaults have been chosen so that a standard compliance-based analysis with non-informative priors can be easily performed. However, it may be the case that users have additional prior information that they would like to incorporate into the analysis. We now give further details about the (hyper)parameters underlying the models in this package, so that interested users can build prior information into the model by supplying their own hyperparameters.

The distribution of the compliance types Compliers, Never Takers, and Always Takers is modeled as a Multinomial distribution with probability parameters ω_c, ω_n, ω_a. The prior distribution on these parameters is a Dirichlet distribution with hyperparameters γ_c, γ_n, γ_a. Note that if the Strong Access Monotonicity assumption holds, then the (hyper)parameters for the Always Takers drop out and this reduces to a Binomial data model with a conjugate Beta prior.

For the Binary outcome model, the distribution of the outcome within compliance type t under treatment assignment z is modeled as a Bernoulli random variable with probability parameter p_tz. We then model the prior on this parameter using a Beta distribution with hyperparameters α_tz, β_tz.

For the Normal outcome model, the distribution of the outcome within compliance type t under treatment assignment z is modeled as a Normal distribution with mean μ_tz and variance σ_tz^2. We assume the conjugate Normal-Inverse Gamma prior distribution for the mean and variance parameters. That is, the prior for σ_tz^2 is Inverse Gamma with hyperparameters a_tz, b_tz, and the prior for μ_tz conditional on σ_tz^2 is Normal with prior mean θ_tz and prior variance σ_tz^2 V_tz.
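As a sketch of what this prior implies, the code below draws (mean, variance) pairs from a Normal-Inverse Gamma distribution using base R. It assumes the Inverse Gamma is parameterized by shape a and scale b, so that the reciprocal of the variance is Gamma with shape a and rate b; the package's parameterization should be checked before reusing these values. The hyperparameter values are illustrative only.

```r
set.seed(7)
# Illustrative hyperparameters for one (compliance type, assignment) cell
theta <- 0   # prior mean of mu
V     <- 10  # prior variance multiplier: Var(mu | sigma2) = sigma2 * V
a     <- 5   # Inverse Gamma shape (assumed parameterization)
b     <- 4   # Inverse Gamma scale (assumed parameterization)

n_draws <- 5000
# sigma2 ~ Inverse Gamma(a, b), drawn as the reciprocal of a Gamma(shape = a, rate = b)
sigma2 <- 1 / rgamma(n_draws, shape = a, rate = b)
# mu | sigma2 ~ Normal(theta, sigma2 * V)
mu <- rnorm(n_draws, mean = theta, sd = sqrt(sigma2 * V))

# Under this parameterization the prior mean of sigma2 is b / (a - 1) for a > 1
c(simulated = mean(sigma2), theoretical = b / (a - 1))

# Spread of the implied prior on mu
quantile(mu, c(0.025, 0.5, 0.975))
```

Plotting or tabulating such draws before fitting is a simple way to check that a chosen set of hyperparameters expresses the intended prior belief.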

For both outcome models described above, note that if the Exclusion Restriction holds, then the Never Takers can be modeled with a single set of (hyper)parameters regardless of whether they were assigned to control or treatment, and similarly for the Always Takers. If the Strong Access Monotonicity assumption holds, then any (hyper)parameters related to the Always Takers drop out of the model.
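The pieces described in this section can be assembled into a small data augmentation sampler. The sketch below covers only the Binary model with both the Exclusion Restriction and Strong Access Monotonicity, and it is written from the model description above; it is not the package's implementation, and the function name da_sketch, its arguments, and the simulated data are all hypothetical. Each iteration imputes the latent compliance type of the control-arm units and then draws the parameters from their conjugate Beta posteriors.

```r
# Illustrative data augmentation sampler (NOT noncomplyR's code).
# y = binary outcome, z = assignment, d = treatment received.
da_sketch <- function(y, z, d, n_iter = 500, n_burn = 50,
                      gamma = c(1, 1), alpha = 1, beta = 1) {
  draws <- matrix(NA_real_, n_iter, 5,
                  dimnames = list(NULL, c("omega_c", "omega_n", "p_c0", "p_c1", "p_n")))
  omega_c <- 0.5; p_c0 <- 0.5; p_n <- 0.5   # arbitrary starting values
  ctrl <- z == 0
  draw_p <- function(idx) rbeta(1, alpha + sum(y[idx]), beta + sum(1 - y[idx]))
  for (i in seq_len(n_iter)) {
    # Imputation step: in the treatment arm the type is observed (d = 1 means Complier);
    # in the control arm it is latent and drawn from its conditional distribution
    lik_c <- omega_c * dbinom(y[ctrl], 1, p_c0)
    lik_n <- (1 - omega_c) * dbinom(y[ctrl], 1, p_n)
    is_c <- logical(length(y))
    is_c[!ctrl] <- d[!ctrl] == 1
    is_c[ctrl] <- rbinom(sum(ctrl), 1, lik_c / (lik_c + lik_n)) == 1
    # Posterior step: conjugate Beta updates given the completed data
    omega_c <- rbeta(1, gamma[1] + sum(is_c), gamma[2] + sum(!is_c))
    p_c0 <- draw_p(is_c & ctrl)
    p_c1 <- draw_p(is_c & !ctrl)
    p_n  <- draw_p(!is_c)   # one parameter for both arms (Exclusion Restriction)
    draws[i, ] <- c(omega_c, 1 - omega_c, p_c0, p_c1, p_n)
  }
  draws[seq_len(n_iter) > n_burn, , drop = FALSE]
}

# Simulated data with a known CACE of 0.9 - 0.7 = 0.2
set.seed(123)
n <- 2000
z <- rbinom(n, 1, 0.5)
complier <- rbinom(n, 1, 0.8) == 1
d <- as.integer(z == 1 & complier)
y <- rbinom(n, 1, ifelse(complier, ifelse(z == 1, 0.9, 0.7), 0.6))

fit <- da_sketch(y, z, d)
colMeans(fit)
mean(fit[, "p_c1"] - fit[, "p_c0"])   # posterior mean of the CACE
```

The sketch uses a single pair of Beta hyperparameters for every outcome probability to stay short; the package lets each parameter have its own pair, as described in the ordering conventions that follow.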

Users can incorporate prior information in the fitting of the model by supplying values to the hyper_parameters argument of compliance_chain(). The number of values to supply and the order in which they should be supplied will depend on the outcome model and the set of assumptions made when fitting the model. The convention for the ordering of the (hyper)parameters is the following:

  • (hyper)parameters for the compliance type distributions come before (hyper)parameters for the outcome distributions
  • within each of these two groups of parameters, (hyper)parameters for Compliers come before (hyper)parameters for Never Takers which come before (hyper)parameters for Always Takers
  • within each compliance type, (hyper)parameters for the outcome of the control group come before (hyper)parameters for the outcome of the treatment group
  • for the Normal outcome model, mean (hyper)parameters come before variance (hyper)parameters

As an example, consider the Binary outcome model with the Exclusion Restriction but without the Strong Access Monotonicity assumption. The parameters for this model are

  • 3 parameters ω_c, ω_n, ω_a for the 3 compliance types
  • 4 parameters p_c0, p_c1, p_n, p_a for the outcomes: one for Compliers assigned to control, one for Compliers assigned to treatment, one for Never Takers, and one for Always Takers

Based on the conventions described above, the full set of model parameters would be ordered as (ω_c, ω_n, ω_a, p_c0, p_c1, p_n, p_a). The hyperparameters should therefore be ordered as (γ_c, γ_n, γ_a, α_c0, β_c0, α_c1, β_c1, α_n, β_n, α_a, β_a).
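In R, this ordering could be written out as a named vector so that each position is easy to audit. The values below are illustrative only (all ones, which makes each Beta and the Dirichlet uniform) and are not claimed to be the package defaults; the names are for readability, and my_data in the commented call is a hypothetical data frame with two-sided non-compliance.

```r
# Hyperparameters for the Binary model with the Exclusion Restriction
# and without Strong Access Monotonicity, in the order given above
hyper_binary <- c(
  gamma_c  = 1, gamma_n = 1, gamma_a = 1,  # Dirichlet prior on compliance types
  alpha_c0 = 1, beta_c0 = 1,               # Beta prior: Compliers, control
  alpha_c1 = 1, beta_c1 = 1,               # Beta prior: Compliers, treatment
  alpha_n  = 1, beta_n  = 1,               # Beta prior: Never Takers
  alpha_a  = 1, beta_a  = 1                # Beta prior: Always Takers
)
length(hyper_binary)

# Sketch of the corresponding call (requires noncomplyR; my_data is hypothetical):
# fit <- compliance_chain(my_data, outcome_model = "binary",
#                         exclusion_restriction = TRUE, strong_access = FALSE,
#                         hyper_parameters = unname(hyper_binary))
```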

Now consider the Normal outcome model without the Exclusion Restriction but with the Strong Access Monotonicity assumption. The parameters for this model are

  • 2 parameters ω_c, ω_n for the 2 compliance types
  • 8 parameters μ_c0, σ_c0^2, μ_c1, σ_c1^2, μ_n0, σ_n0^2, μ_n1, σ_n1^2 for the mean and variance in the Compliers under control, the Compliers under treatment, the Never Takers under control, and the Never Takers under treatment

The full set of model parameters should be ordered as (ω_c, ω_n, μ_c0, σ_c0^2, μ_c1, σ_c1^2, μ_n0, σ_n0^2, μ_n1, σ_n1^2) based on the ordering conventions. The corresponding hyperparameters are then ordered as (γ_c, γ_n, θ_c0, V_c0, a_c0, b_c0, θ_c1, V_c1, a_c1, b_c1, θ_n0, V_n0, a_n0, b_n0, θ_n1, V_n1, a_n1, b_n1).
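The ordering conventions can also be captured in a small helper that lists the hyperparameter names for a given outcome model and set of assumptions. The function hyper_order() below is a hypothetical illustration written from the conventions and the two worked examples in this section; it is not part of noncomplyR, and for combinations not shown above its output should be treated as an assumption.

```r
# Hypothetical helper: names of the hyperparameters, in the documented order
hyper_order <- function(outcome_model = c("binary", "normal"),
                        exclusion_restriction = TRUE, strong_access = TRUE) {
  outcome_model <- match.arg(outcome_model)
  # Compliers, then Never Takers, then Always Takers (dropped under Strong Access)
  types <- if (strong_access) c("c", "n") else c("c", "n", "a")
  # Hyperparameters attached to each outcome distribution
  per_cell <- if (outcome_model == "binary") c("alpha", "beta") else c("theta", "V", "a", "b")
  # Compliers always have control (0) and treatment (1) cells; the other types
  # collapse to a single cell when the Exclusion Restriction holds
  cells <- unlist(lapply(types, function(type) {
    if (type != "c" && exclusion_restriction) type else paste0(type, 0:1)
  }))
  c(paste0("gamma_", types),
    as.vector(outer(per_cell, cells, paste, sep = "_")))
}

# Binary model, Exclusion Restriction, no Strong Access Monotonicity
hyper_order("binary", exclusion_restriction = TRUE, strong_access = FALSE)

# Normal model, no Exclusion Restriction, Strong Access Monotonicity
hyper_order("normal", exclusion_restriction = FALSE, strong_access = TRUE)
```

Comparing names(hyper) against the output of such a helper before calling compliance_chain() is one way to catch a misordered hyperparameter vector.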

References

Guido W. Imbens and Donald B. Rubin. “Bayesian Inference for Causal Effects in Randomized Experiments with Noncompliance”. The Annals of Statistics. 25 (1), 1997.