Imagine a scenario where you need to run some type of operation over a series of variables, and also over a second list of items whose values may be numeric or string. One such scenario is estimating multiple survival models, under different parametric assumptions, for several dependent variables with the same set of predictors in each model.
A helpful way to estimate these models efficiently is to use nested do loops. That is, you can “nest” one loop (e.g., forvalues or foreach) inside another loop (which may also be a forvalues or foreach loop, though I will review other options as well). As with my prior blog post, I use this kind of syntax to reduce the number of lines of code in a do-file, thereby reducing the opportunities for error.
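To see the mechanics in isolation, here is a minimal sketch (not taken from the models below) of a foreach loop over a list of strings nested inside a forvalues loop over numbers. The list items alpha and beta and the local macro passes are purely illustrative.

```stata
* Minimal sketch: a foreach loop over strings nested inside a forvalues
* loop over numbers. The items alpha and beta are illustrative only.
local passes = 0
forvalues i = 1/2 {
    foreach label in alpha beta {
        display "outer pass `i', inner item `label'"
        local ++passes
    }
}
display "inner loop body ran `passes' times"
```

Run from a do-file, this should print one line for each combination of outer and inner values (four in total), followed by the number of passes through the inner loop.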
I’ll first create a few simulated variables for this example:
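** START FROM AN EMPTY DATASET; THE SEED VALUE IS ARBITRARY AND ONLY MAKES THE SIMULATION REPRODUCIBLE
clear
set seed 2012
set obs 1000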
generate prior_arrests=rpoisson(1)
generate prior_convictions=rpoisson(1)
generate age=round(runiform(18,65))
forvalues i=1/3 {
generate b`i'=runiform()>=.75
}
rename (b1 b2 b3) (prior_sex_conv prior_viol_conv prior_drug_conv)
generate male=runiform()>=.15
** GENERATE ARREST OUTCOME & CALENDAR (DAYS START AT 1 BECAUSE STSET EXCLUDES RECORDS THAT END AT TIME 0)
generate arr_cal=round(runiform(1,365))
generate arrest=runiform()>=.66
replace arr_cal=365 if arrest==0
** GENERATE CONVICTION OUTCOME & CALENDAR
generate conv_cal=round(runiform(1,365))
generate conviction=runiform()>=.75 & arrest==1
replace conviction=0 if arrest==0
replace conv_cal=365 if conviction==0
** GENERATE TECHNICAL REVOCATION OUTCOME & CALENDAR
generate trev_cal=round(runiform(1,365))
generate tech_revocation=runiform()>=.50 & arrest==0
replace tech_revocation=0 if arrest==1
replace trev_cal=365 if tech_revocation==0
** DEFINE GLOBAL MACRO FOR PREDICTORS
global IVs prior_arrests prior_convictions prior_sex_conv ///
prior_viol_conv prior_drug_conv age male
** DEFINE GLOBAL MACRO FOR OUTCOMES
global DVs arrest conviction tech_revocation
** DEFINE GLOBAL MACRO FOR CALENDARS
global cals arr_cal conv_cal trev_cal
** DEFINE GLOBAL MACRO FOR DISTRIBUTIONS
global distributions exponential weibull gompertz lognormal loglogistic ggamma
For help with the mechanics of global macros and loops, please see my prior blog post on these commands.
In the above code, I create 1,000 simulated observations (“set obs 1000”), then create several predictor variables (prior_arrests, prior_convictions, etc.) under different distributional assumptions (Poisson counts, a uniformly distributed age, and binary indicators). I then create an indicator for a post-release arrest (“arrest”) and a calendar variable recording the day of the follow-up year on which the arrest took place (“arr_cal”). Because the calendar is drawn for every observation, cases with a 0 on “arrest” would otherwise carry a spurious arrest day, so I replace their values with 365 to indicate that they survived the full follow-up period (i.e., they are censored at day 365). I do the same for the “conviction” and “tech_revocation” outcomes and their calendars (“conv_cal” and “trev_cal”).
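One quick way to confirm that the recoding does what is intended is to stset the data and inspect the variables stset creates (_t for analysis time, _d for the failure indicator, _st for inclusion). The sketch below rebuilds only the arrest outcome on a small hypothetical sample; the seed and the 100 observations are arbitrary assumptions, and runiformint() (Stata 14 or later) is used to draw days from 1 to 365.

```stata
* Minimal sketch on a hypothetical small sample: rebuild the arrest outcome
* and check that non-arrested cases are censored at day 365.
clear
set seed 2012
set obs 100
generate arrest = runiform() >= .66
generate arr_cal = runiformint(1, 365)
replace arr_cal = 365 if arrest == 0
quietly stset arr_cal, failure(arrest)
* every record should be usable, and non-failures should end at day 365
assert _st == 1
assert _t == 365 if _d == 0
assert _d == arrest
tabulate arrest
```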
One issue that always needs to be addressed in a survival analysis is the distributional assumption you place on the survival times (i.e., the time until the outcome occurs). For example, we can assume that survival time follows an exponential, Gompertz, Weibull, log-logistic, log-normal, or generalized gamma distribution (among others, certainly). I won’t go into the nitty-gritty details of these different assumptions here (others have already done so; see Kurlychek, Bushway, & Brame (2012)), but it should be known that the parametric assumptions you place on the model can substantially alter your inferences.
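As a brief illustration of how these assumptions differ, the hazard h(t) and survival S(t) functions for three of the families are written below in the proportional-hazards form that streg uses for them, with λ_j = exp(x_jβ) for observation j (notation chosen for this sketch).

```latex
\begin{aligned}
\text{Exponential:} \quad & h(t) = \lambda_j, & S(t) &= \exp(-\lambda_j t) \\
\text{Weibull:} \quad & h(t) = \lambda_j \, p \, t^{p-1}, & S(t) &= \exp\!\left(-\lambda_j t^{p}\right) \\
\text{Gompertz:} \quad & h(t) = \lambda_j \, e^{\gamma t}, & S(t) &= \exp\!\left\{-\frac{\lambda_j}{\gamma}\left(e^{\gamma t} - 1\right)\right\}
\end{aligned}
```

Setting p = 1 in the Weibull, or letting γ approach 0 in the Gompertz, recovers the exponential model and its constant hazard. The log-normal, log-logistic, and generalized gamma models are instead fit in the accelerated failure-time metric, where ln(t_j) = x_jβ plus an error term whose distribution defines the family.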
So, let’s say that you need to assess just how these different parametric assumptions may influence your inferences – you would then need to run the same model over the different outcomes and distribution functions available to you and compare your results. Below is an easy method for doing this using nested do loops:
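tokenize $cals  // STORE THE CALENDARS IN LOCAL MACROS 1, 2, AND 3
foreach outcome in $DVs {
    ** DECLARE SURVIVAL DATA ONCE PER OUTCOME, USING THE CALENDAR HELD IN LOCAL MACRO 1
    quietly stset `1', failure(`outcome')
    ** INNER LOOP: ONE MODEL PER DISTRIBUTION, EACH STORED UNDER A UNIQUE NAME
    foreach model in $distributions {
        streg $IVs, distribution(`model') nolog
        estimates store `outcome'_`model'
    }
    ** MOVE THE NEXT CALENDAR INTO LOCAL MACRO 1 BEFORE THE NEXT OUTCOME
    macro shift
}
** COMPARE FIT STATISTICS FOR ALL 18 STORED MODELS
estimates stats _all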
The first line of the code, “tokenize $cals”, splits the contents of the $cals global macro into separate local macros named 1, 2, and 3: `1' holds “arr_cal”, `2' holds “conv_cal”, and `3' holds “trev_cal”. The “macro shift” command, placed at the end of the outer “foreach outcome in $DVs” loop, then shifts these contents down by one position once all the models for the current outcome have been estimated, so that `1' holds “conv_cal” on the second pass and “trev_cal” on the third. This only works because the calendars in $cals are listed in the same order as their outcomes in $DVs. The outer “foreach” loop runs over the different outcomes (contained in the $DVs global macro defined above); in other words, I want the inner loop and all the code within it to be run for each of the three outcomes in $DVs (i.e., arrest, conviction, and tech_revocation).
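The behavior of these two commands is easy to check on its own, with no data loaded. This minimal sketch reuses the $cals list from above and simply displays the positional local macros before and after one macro shift.

```stata
* Minimal sketch: what tokenize and macro shift do to the positional
* local macros. No data are needed.
global cals arr_cal conv_cal trev_cal
tokenize $cals
display "1 = `1', 2 = `2', 3 = `3'"
* prints: 1 = arr_cal, 2 = conv_cal, 3 = trev_cal
macro shift
display "1 = `1', 2 = `2'"
* prints: 1 = conv_cal, 2 = trev_cal
```

If you would rather not rely on positional macros, an alternative is to loop over positions with forvalues and pull the matching calendar with the extended macro function “word # of” (for example, “local cal : word `i' of $cals”), which makes the pairing between each outcome and its calendar explicit.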
The inner “foreach” loop specifies that, for every outcome in the outer loop, one model will be estimated for each distribution in the $distributions global macro. Six models are therefore estimated for each outcome; once they are done, “macro shift” moves the next calendar into `1' and the outer loop moves on to the next outcome. Finally, the line “estimates store `outcome'_`model'” within the inner loop tells Stata to store the estimates from each model under a unique name (e.g., arrest_weibull) so that I can reference them later with “estimates stats _all”. This final line lists model statistics for all 18 models, including the number of observations, the log-likelihoods, the degrees of freedom, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). I won’t include the table here, since these are simulated data and the results are arbitrary, but you will typically look for the model with the lowest AIC or BIC (more often than not, both will point to the same model) and choose that specification. Keep in mind that these comparisons are only meaningful among the six models fit to the same outcome; AIC and BIC values from models of different outcomes are not comparable.
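Rather than reading the table by eye, you can also pick the lowest-AIC distribution programmatically. The sketch below does this for a single outcome using the cancer.dta example dataset that ships with Stata (“sysuse cancer”) rather than the simulated data above, and reads AIC from the r(S) matrix that estimates stats leaves behind. The predictors (age and i.drug) and the local macros best and bestaic are illustrative choices, not part of the original code.

```stata
* Minimal sketch: choose the lowest-AIC distribution for ONE outcome,
* using Stata's cancer.dta example data (not the simulated data above).
sysuse cancer, clear
quietly stset studytime, failure(died)
local best ""
local bestaic = .
foreach model in exponential weibull gompertz lognormal loglogistic ggamma {
    capture quietly streg age i.drug, distribution(`model')
    if _rc continue                 // skip a model that fails to fit
    quietly estimates stats
    matrix S = r(S)
    local aic = S[1, 5]             // columns of r(S): N ll0 ll df AIC BIC
    display "`model'" _col(15) "AIC = " %9.2f `aic'
    * missing (.) is larger than any number, so the first model always wins
    if `aic' < `bestaic' {
        local bestaic = `aic'
        local best `model'
    }
}
display "Lowest AIC: `best'"
```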
One final note: the above models treat alternative outcomes as “right-censored”, which may be an entirely unjustifiable specification. More recently, survival analyses in the criminal justice literature have used competing risks frameworks to incorporate additional outcomes into a single survival model. These models, however, come with a new set of issues that need to be addressed, including potential dependencies between outcomes. You can find practical applications of competing risks here and here and in many other places, though examples in the criminal justice literature do not abound (yet!). I am currently working on a manuscript that applies a competing risks framework to arrests for new crimes, with technical parole violations as a competing event; you can follow my progress on that manuscript on my ResearchGate Project Page.
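For readers who want to see what such a specification might look like in Stata, here is a minimal sketch of a Fine-Gray competing-risks regression using stcrreg, with arrest as the event of interest and technical revocation as the competing event. The data are hypothetical simulations loosely modeled on the variables above; the seed, the combined status variable, and the reduced predictor set are assumptions for illustration, not the specification used in the manuscript mentioned above.

```stata
* Minimal sketch on hypothetical simulated data: Fine-Gray competing-risks
* regression with arrest as the event and revocation as the competing event.
clear
set seed 4321
set obs 1000
generate age  = runiformint(18, 65)
generate male = runiform() >= .15
generate arrest = runiform() >= .66
generate tech_revocation = runiform() >= .50 & arrest == 0
* hypothetical event type: 0 = censored, 1 = arrest, 2 = technical revocation
generate status = arrest + 2*tech_revocation
* censored cases survive the full year; events occur on days 1 to 365
generate days = cond(status == 0, 365, runiformint(1, 365))
stset days, failure(status == 1)
stcrreg age male, compete(status == 2)
```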