Introduction to noncomplyR

Non-compliance to treatment assignment is a common issue in randomized clinical trials (RCTs). The noncomplyR package provides convenient functions for using Bayesian methods to perform inference on the Complier Average Causal Effect, the focus of a compliance-based analysis. The models used in this package are based on Imbens and Rubin (1997). The package currently supports two types of outcome models: the Normal model and the Binary model. Conjugate priors are used to simplify the sampling procedure. Users can, however, supply their own hyperparameter values in order to affect the shape of the prior distribution, thus maintaining some flexibility in the modeling of prior information.

This document takes you through the basic workflow of a compliance-based analysis using the noncomplyR package. To illustrate the workflow, we will work through an analysis of a randomized trial investigating the effect of vitamin A supplementation on childhood mortality.

Data: vitaminA

Individuals in this study were randomly assigned to either receive or not receive vitamin A supplementation. However, some individuals in the treatment arm did not receive the treatment. The data are contained in a dataframe named vitaminA. The dataset contains 23682 observations on 3 variables: survival (1 = lived, 0 = died), treatment assignment (1 = vitamin A, 0 = control), and treatment received (1 = vitamin A, 0 = no vitamin A). Note the order of the variables is outcome, treatment assignment, treatment received. This is the variable order that noncomplyR requires.

We now demonstrate how the main functions of the noncomplyR package can be applied in sequence to fit a non-compliance model and perform posterior-based inference on the CACE.

Step 1: Fit model with `compliance_chain`

The primary model-fitting function is compliance_chain(). This function uses the data augmentation algorithm to obtain a sample from the posterior distribution for the full set of model parameters. The following code performs 1000 iterations of the data augmentation algorithm and discards the initial 10. Note that we have specified the outcome_model as well as the set of assumptions that we will be making when fitting the model.

model_fit <- compliance_chain(vitaminA, outcome_model = "binary", exclusion_restriction = T,
                              strong_access = T, n_iter = 1000, n_burn = 10)
head(model_fit)
#>        omega_c   omega_n      p_c0      p_c1       p_n
#> [1,] 0.7974922 0.2025078 0.9935898 0.9981105 0.9899783
#> [2,] 0.8027364 0.1972636 0.9938614 0.9986314 0.9880724
#> [3,] 0.8078972 0.1921028 0.9961371 0.9986386 0.9872045
#> [4,] 0.8070221 0.1929779 0.9969108 0.9983559 0.9822705
#> [5,] 0.7993206 0.2006794 0.9964803 0.9985936 0.9843990
#> [6,] 0.7997129 0.2002871 0.9960020 0.9985101 0.9828294

Since the hyper_parameters argument was not specified, compliance_chain automatically uses a non-informative flat prior.

Step 2: Transform the fitted model into a sample from the posterior of the CACE with `cace`

Once the model has been fit and a posterior sample obtained for the full set of model parameters, the corresponding posterior sample for the CACE can be obtained with the cace function. This function takes the following arguments: the results of a call to the compliance_chain() function, the outcome model used in fitting the model, and a logical value indicating whether the Strong Access Monotonicity assumption was made when fitting the model.

cace_posterior <- cace(chain = model_fit, outcome_model = "binary", strong_access = T)
head(cace_posterior)
#> [1] 0.004520734 0.004770000 0.002501491 0.001445098 0.002113302 0.002508159

Step 3: Obtain summary statistics with `summarize_chain`

Give the sample from the posterior, the user can directly compute quantities of interest. For convenience, the summarize_chain function computes the posterior mean, median, and 50%, 90%, and 95% credible intervals.

summarize_chain(cace_posterior)
#> Posterior Mean: 0.003 
#> Posterior Median: 0.003 
#> Posterior 50% Credible Interval: (0.002, 0.004) 
#> Posterior 90% Credible Interval: (0.001, 0.005) 
#> Posterior 95% Credible Interval: (0.001, 0.006)

This example illustrates the basic workflow of a compliance-based analysis using noncomplyR: fit the full model using compliance_chain(), obtain the posterior for the CACE using cace(), and summarize using summarize_cace().

Model Details

The function defaults have been chosen so that a standard compliance-based analysis with non-informative priors can be easily performed. However, it may be the case that users have additional prior information that they would like to incorporate into the analysis. We now give further details about the (hyper)parameters underlying the models in this package, so that interested users can build prior information into the model by supplying their own hyperparameters.

The distribution of the compliance types Compliers, Never Takers, and Always Takers is modeled as a Multinomial distribution with probability parameters ω_c, ω_n, ω_a. The prior distribution on these parameters is a Dirichlet distribution with hyperparameters γ_c, γ_n, γ_a. Note that if the Strong Access Monotonicity assumption holds, then (hyper)parameters for the Always Takers drop out and this reduces to a Binomial data model with conjugate Beta prior.

For the Binary outcome model, the distribution of the outcome within compliance type t under treatment assignment z is modeled as a Bernoulli random variable with probability parameter p_tz. We then model the prior on this parameter using a Beta distribution with hyperparameters α_tz, β_tz.

For the Normal outcome model, the distribution of the outcome within compliance type t under treatment assignment z is modeled as a Normal distribution with mean μ_tz and variance σ_tz². We assume the conjugate Normal-Inverse Gamma prior distribution for the mean and variance parameters. That is, the prior for σ_tz² is Inverse Gamma with hyperparameters a_tz, b_tz and the prior for μ_tz conditional on sigma_tz² is Normal with prior mean θ_tz and prior variance σ_tz²V_tz.

For both outcome models described above, note that if the Exclusion Restriction holds then the Never Takers under both assignment to control and treatment can be modeled with a single set of (hyper)parameters, and similarly for the Always Takers. And, if the Strong Access Monotonicity assumption holds, then any hyper(parameters) related to the Always Takers drop out of the model.

Users can incorporate prior information in the fitting of the model by supplying values to the hyper_parameters argument of compliance_chain(). The number of values to supply and the order in which they should be supplied will depend on the outcome model and the set of assumptions made when fitting the model. The convention for the ordering of the (hyper)parameters is the following:

(hyper)parameters for the compliance type distributions come before (hyper)parameters for the outcome distributions
within each of these two groups of parameters, (hyper)parameters for Compliers come before (hyper)parameters for Never Takers which come before (hyper)parameters for Always Takers
within each compliance-type, (hyper)parameters for the outcome of the control group come before hyper(parameters) for the outcome of the treatment group
for the Normal outcome model, mean (hyper)parameters come before variance (hyper)parameters

As an example, consider the Binary outcome model with the Exclusion Restriction but without the Strong Access monotonicity assumption. The parameters for this model are

3 parameters ω_c, ω_n, ω_a for the 3 compliance types
4 parameters p_c0, p_c1, p_n, p_a for the outcomes: one for the outcomes for Compliers assigned to control, one for the outcomes for Compliers assigned to treatment, one for Never Takers, and one for Always Takers

Based on the conventions described above, the full set of model parameters would be ordered as (ω_c, ω_n, ω_a, p_c0, p_c1, p_n, p_a). The hyperparameters should therefore be ordered as (γ_c, γ_n, γ_a, α_c0, β_c0, α_c1, β_c1, α_n, β_n, α_a, β_a).

Now consider the Normal outcome model without the Exclusion Restriction but with the Strong Access monotonicity assumption. The parameters for this model

2 parameters ω_c, ω_n for the 2 compliance types
8 parameters μ_c0, σ_c0², μ_c1, σ_c1², μ_n0, σ_n0², μ_n1, σ_n1² for the mean and variance in the Compliers under control, the Compliers under treatment, the Never Takers under control, and the Never Takers under treatment.

The full set of model parameters should be ordered as (ω_c, ω_n, μ_c0, σ_c0², μ_c1, σ_c1², μ_n0, σ_n0², μ_n1, σ_n1²) based on the ordering conventions. The corresponding hyperparameters are then ordered as (γ_c, γ_n, θ_c0, V_c0, a_c0, b_c0, θ_c1, V_c1, a_c1, b_c1, θ_n0, V_n0, a_n0, b_n0, θ_n1, V_n1, a_n1, b_n1).

References

Guido W. Imbens and Donald B. Rubin. “Bayesian Inference for Causal Effects in Randomized Experiments with Noncompliance”. The Annals of Statistics. 25 (1), 1997.