This dataset combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR.
In this exercise, we will be comparing ridge regression, LASSO regression, and elastic net regularization. First we load the data.
library(RCurl)
package ‘RCurl’ was built under R version 3.2.4Loading required package: bitops
library(glmnet)
package ‘glmnet’ was built under R version 3.2.4Loading required package: Matrix
package ‘Matrix’ was built under R version 3.2.5Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loaded glmnet 2.0-5
## get column names
# specify the URL for the column names and descriptions
names.file.url <- 'http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.names'
names.file.lines <- readLines(names.file.url)
# only keep the attribute names, and discard the rest of the lines
names.dirtylines <- grep("@attribute ", names.file.lines, value = TRUE)
# split on spaces and pick the second word
names <- sapply(strsplit(names.dirtylines, " "), "[[", 2)
# drop the first 5 columns
names <- names[6:length(names)]
## download data and join in names
# specify the URL for the Crime and Communities data CSV
urlfile <-'http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.data'
# download the file
downloaded <- getURL(urlfile, ssl.verifypeer=FALSE)
# treat the text data as a steam so we can read from it
connection <- textConnection(downloaded)
# parse the downloaded data as CSV
dataset <- read.csv(connection, header=FALSE, na.strings=c("?"))
# drop irrelevant columns
dataset <- dataset[ ,6:ncol(dataset)]
# drop rows with null columns
dataset <- dataset[rowSums(is.na(dataset)) == 0,]
# fix the column names
colnames(dataset) <- names
# preview the first 5 rows
head(dataset)
We now split this dataset into train and test sets (an alternative would be to use cross-validation).
## perform train-test split
# 75% of the sample size
smp_size <- floor(0.75 * nrow(dataset))
## set the seed to make your partition reproductible
set.seed(123)
train_ind <- sample(seq_len(nrow(dataset)), size = smp_size)
df.train <- dataset[train_ind, ]
df.test <- dataset[-train_ind, ]
Q1. Fit a linear regression model on df.train. The goal is to predict ‘ViolentCrimesPerPop’ from the other columns. What is the r-squared on the train data? What about the test data?
We are going to use the glmnet
package for this exercise (as it is helpful for the next few questions as well). glmnet
fits an elastic net model by default. Recall that the loss function for elastic net was given by: \[\underset{w}{\operatorname{argmin}}{\cal L}(X) + (1 - \alpha) \cdot \lambda \sum_{i=1}^M \beta_i^2 + \alpha \cdot \lambda \sum_{i=1}^M |\beta_i|.\] Thus, for linear regression, we need to set \(\lambda = 0\).
Note that this package can fit other generalized linear models as well, including logistic regression for classification, Poisson regression for modeling count data, and Cox regression for survival analysis/churn prediction. To fit these other models, you will have to change the family
parameter in the glmnet
function call.
For details and examples on the package, visit https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html
labelColIdx = match(c("ViolentCrimesPerPop"), names)
fit.linreg <- glmnet(as.matrix(df.train[,-labelColIdx]),
as.matrix(df.train$ViolentCrimesPerPop),
alpha=1, lambda=0, family="gaussian",
standardize=FALSE)
fitted.linreg.train <- predict(fit.linreg, newx = data.matrix(df.train[,-labelColIdx]), s=0)
fitted.linreg.test <- predict(fit.linreg, newx = data.matrix(df.test[,-labelColIdx]), s=0)
cat("\nTrain correlation coefficient: ", cor(as.matrix(df.train$ViolentCrimesPerPop), fitted.linreg.train)[1])
Train correlation coefficient: 0.9281492
cat("\nTest correlation coefficient: ", cor(as.matrix(df.test$ViolentCrimesPerPop), fitted.linreg.test)[1])
Test correlation coefficient: 0.6168479
Q2. Also fit each of a ridge, lasso, and elastic net regression on the same data. Use the function cv.glmnet
to cross-validate and find the best values of \(\lambda\). For elastic net, try a few values of \(\alpha\) as well.
Hint: cv.glmnet
does not optimize for \(\alpha\). You should use the command set.seed(k)
for some fixed k
before each value of \(\alpha\) being run. You can do the cross validation for \(\alpha\) manually for the purpose of this exercise. cv.glmnet
automatically picks appropriate values of \(\lambda\) to try.
Q3. Which model performs the best?
Ridge Regression
fit.ridge <- cv.glmnet(as.matrix(df.train[,-labelColIdx]),
as.matrix(df.train$ViolentCrimesPerPop),
type.measure="mse", alpha=0, family="gaussian",
standardize=FALSE)
fitted.ridge.train <- predict(fit.ridge,
newx = data.matrix(df.train[,-labelColIdx]),
s="lambda.min")
fitted.ridge.test <- predict(fit.ridge,
newx = data.matrix(df.test[,-labelColIdx]),
s="lambda.min")
cat("\nTrain correlation coefficient: ", cor(as.matrix(df.train$ViolentCrimesPerPop), fitted.ridge.train)[1])
Train correlation coefficient: 0.8463421
cat("\nTest correlation coefficient: ", cor(as.matrix(df.test$ViolentCrimesPerPop), fitted.ridge.test)[1])
Test correlation coefficient: 0.7725274
LASSO
fit.lasso <- cv.glmnet(as.matrix(df.train[,-labelColIdx]),
as.matrix(df.train$ViolentCrimesPerPop),
type.measure="mse", alpha=1, family="gaussian",
standardize=FALSE)
fitted.lasso.train <- predict(fit.lasso,
newx = data.matrix(df.train[,-labelColIdx]),
s="lambda.min")
fitted.lasso.test <- predict(fit.lasso,
newx = data.matrix(df.test[,-labelColIdx]),
s="lambda.min")
cat("\nTrain correlation coefficient: ", cor(as.matrix(df.train$ViolentCrimesPerPop), fitted.lasso.train)[1])
Train correlation coefficient: 0.8416111
cat("\nTest correlation coefficient: ", cor(as.matrix(df.test$ViolentCrimesPerPop), fitted.lasso.test)[1])
Test correlation coefficient: 0.7647234
Elastic Net
Note that since we are doing nested grid-search, we have to split the train dataset further into train and test.
set.seed(1)
smp_size <- floor(0.75 * nrow(df.train))
train_ind <- sample(seq_len(nrow(df.train)), size = smp_size)
df.alphatrain <- df.train[train_ind, ]
df.alphatest <- df.train[-train_ind, ]
for (a in log10(seq(1,10,1))) {
set.seed(123)
cat("\nalpha =", a)
fit.elastic <- cv.glmnet(as.matrix(df.alphatrain[,-labelColIdx]),
as.matrix(df.alphatrain$ViolentCrimesPerPop),
type.measure="mse", alpha=a, family="gaussian",
standardize=FALSE)
fitted.elastic.train <- predict(fit.elastic,
newx = data.matrix(df.alphatrain[,-labelColIdx]),
s="lambda.min")
fitted.elastic.test <- predict(fit.elastic,
newx = data.matrix(df.alphatest[,-labelColIdx]),
s="lambda.min")
cat("\nTrain correlation coefficient: ", cor(as.matrix(df.alphatrain$ViolentCrimesPerPop),
fitted.elastic.train)[1])
cat("\nTest correlation coefficient: ", cor(as.matrix(df.alphatest$ViolentCrimesPerPop),
fitted.elastic.test)[1])
}
alpha = 0
Train correlation coefficient: 0.8482096
Test correlation coefficient: 0.8257798
alpha = 0.30103
Train correlation coefficient: 0.836436
Test correlation coefficient: 0.8146347
alpha = 0.4771213
Train correlation coefficient: 0.837226
Test correlation coefficient: 0.8111497
alpha = 0.60206
Train correlation coefficient: 0.8380694
Test correlation coefficient: 0.8097428
alpha = 0.69897
Train correlation coefficient: 0.8385455
Test correlation coefficient: 0.8087591
alpha = 0.7781513
Train correlation coefficient: 0.837388
Test correlation coefficient: 0.8064273
alpha = 0.845098
Train correlation coefficient: 0.837405
Test correlation coefficient: 0.8061786
alpha = 0.90309
Train correlation coefficient: 0.8360578
Test correlation coefficient: 0.8037225
alpha = 0.9542425
Train correlation coefficient: 0.8344337
Test correlation coefficient: 0.800979
alpha = 1
Train correlation coefficient: 0.8343754
Test correlation coefficient: 0.8017738
Elastic net performs the best (looking at the test correlation coefficient). This is unsurprising, since elastic net is a superset of the other models. Picking the optimal value from above, \(\alpha = 0.30103\).
optalpha <- 0.30103
fit.elastic <- cv.glmnet(as.matrix(df.train[,-labelColIdx]),
as.matrix(df.train$ViolentCrimesPerPop),
type.measure="mse", alpha=optalpha,
family="gaussian", standardize=FALSE)
Q4. Make the following scatterplot:
- Each point corresponds to one predictor in the data
- The x-value is the coefficient of that predictor under OLS regression
- The y-value is the coefficient of that predictor using ridge regularization
Q5. Do the same for OLS vs Lasso, and OLS vs ElasticNet. What do you notice about the magnitude of the parameters and numbers of zeros?
plot(coef(fit.linreg, s=0)[-1], coef(fit.ridge, s=0)[-1], main="OLS vs. Ridge parameters",
xlab="OLS", ylab="Ridge", pch=19)
plot(coef(fit.linreg, s=0)[-1], coef(fit.lasso, s="lambda.min")[-1], main="OLS vs. LASSO parameters",
xlab="OLS", ylab="LASSO", pch=19)
plot(coef(fit.linreg, s=0)[-1], coef(fit.elastic, s="lambda.min")[-1], main="OLS vs. Elastic Net parameters",
xlab="OLS", ylab="Elastic Net", pch=19)
The magnitude of the ridge regression parameters are much smaller in magnitude than those for OLS. For LASSO, many parameters are exactly zero, yielding a sparse solution. Same (but to a lesser extent) for the elastic net.
Q6. Make the following scatterplot:
- Each point corresponds to one predictor in the data
- The x-value is the coefficient of that predictor under OLS regression
- The y-value is the correlation coefficient of that predictor with the target
Q7. Based on the above plot, what can you say about the interpretation of the regression coefficients? Does the sign of the coefficient implying the relationship of that variable and the target? What are some potential issues interpreting the magnitude?
# initialize with coefficient vector to keep the row names
corrs <- array(0.0,length(coef(fit.linreg, s=0)[-1]))
cnt <- 0
for (feat in rownames(coef(fit.linreg, s=0))[-1]) {
cnt <- cnt + 1
corrs[cnt] <- cor(as.matrix(df.train$ViolentCrimesPerPop), df.train[,feat])[1]
}
plot(coef(fit.linreg, s=0)[-1], corrs, main="Regression coefficients versus correlation coefficients",
xlab="Linear regression coefficients", ylab="Correlation coefficients", pch=19)
We should be very careful in interpreting regression coefficients directly. While a predictor \(x_i\) may be postively (negatively) correlated with the target, \(\beta_i\) may be negative (positive) in the presence of other predictors, which are correlated with \(x_i\). Similarly, a small \(\beta_i\) does not mean that \(x_i\) is uninformative, but it may be that other variables have ‘captured’ the information, and left \(x_i\) with nothing left to contribute.
Additionally, we have to consider issues of scale - whether we measure distance in meters or feet, the information content does not change. But the regression coefficient would change to scale appropriately. Thus, we have to ensure that our variables are normalized to have zero mean and unit standard deviation prior to fitting the model.
LS0tCnRpdGxlOiAiQ3JpbWUgYW5kIENvbW11bml0aWVzIExpbmVhciBSZWdyZXNzaW9uIEV4ZXJjaXNlIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpUaGlzIGRhdGFzZXQgY29tYmluZXMgc29jaW8tZWNvbm9taWMgZGF0YSBmcm9tIHRoZSAxOTkwIFVTIENlbnN1cywgbGF3IGVuZm9yY2VtZW50IGRhdGEgZnJvbSB0aGUgMTk5MCBVUyBMRU1BUyBzdXJ2ZXksIGFuZCBjcmltZSBkYXRhIGZyb20gdGhlIDE5OTUgRkJJIFVDUi4KCkluIHRoaXMgZXhlcmNpc2UsIHdlIHdpbGwgYmUgY29tcGFyaW5nIHJpZGdlIHJlZ3Jlc3Npb24sIExBU1NPIHJlZ3Jlc3Npb24sIGFuZCBlbGFzdGljIG5ldCByZWd1bGFyaXphdGlvbi4gRmlyc3Qgd2UgbG9hZCB0aGUgZGF0YS4KCmBgYHtyfQpsaWJyYXJ5KFJDdXJsKQpsaWJyYXJ5KGdsbW5ldCkKCiMjIGdldCBjb2x1bW4gbmFtZXMKIyBzcGVjaWZ5IHRoZSBVUkwgZm9yIHRoZSBjb2x1bW4gbmFtZXMgYW5kIGRlc2NyaXB0aW9ucwpuYW1lcy5maWxlLnVybCA8LSAnaHR0cDovL2FyY2hpdmUuaWNzLnVjaS5lZHUvbWwvbWFjaGluZS1sZWFybmluZy1kYXRhYmFzZXMvY29tbXVuaXRpZXMvY29tbXVuaXRpZXMubmFtZXMnCm5hbWVzLmZpbGUubGluZXMgPC0gcmVhZExpbmVzKG5hbWVzLmZpbGUudXJsKQojIG9ubHkga2VlcCB0aGUgYXR0cmlidXRlIG5hbWVzLCBhbmQgZGlzY2FyZCB0aGUgcmVzdCBvZiB0aGUgbGluZXMKbmFtZXMuZGlydHlsaW5lcyA8LSBncmVwKCJAYXR0cmlidXRlICIsIG5hbWVzLmZpbGUubGluZXMsIHZhbHVlID0gVFJVRSkKIyBzcGxpdCBvbiBzcGFjZXMgYW5kIHBpY2sgdGhlIHNlY29uZCB3b3JkCm5hbWVzIDwtIHNhcHBseShzdHJzcGxpdChuYW1lcy5kaXJ0eWxpbmVzLCAiICIpLCAiW1siLCAyKQojIGRyb3AgdGhlIGZpcnN0IDUgY29sdW1ucwpuYW1lcyA8LSBuYW1lc1s2Omxlbmd0aChuYW1lcyldCgojIyBkb3dubG9hZCBkYXRhIGFuZCBqb2luIGluIG5hbWVzCiMgc3BlY2lmeSB0aGUgVVJMIGZvciB0aGUgQ3JpbWUgYW5kIENvbW11bml0aWVzIGRhdGEgQ1NWCnVybGZpbGUgPC0naHR0cDovL2FyY2hpdmUuaWNzLnVjaS5lZHUvbWwvbWFjaGluZS1sZWFybmluZy1kYXRhYmFzZXMvY29tbXVuaXRpZXMvY29tbXVuaXRpZXMuZGF0YScKIyBkb3dubG9hZCB0aGUgZmlsZQpkb3dubG9hZGVkIDwtIGdldFVSTCh1cmxmaWxlLCBzc2wudmVyaWZ5cGVlcj1GQUxTRSkKIyB0cmVhdCB0aGUgdGV4dCBkYXRhIGFzIGEgc3RlYW0gc28gd2UgY2FuIHJlYWQgZnJvbSBpdApjb25uZWN0aW9uIDwtIHRleHRDb25uZWN0aW9uKGRvd25sb2FkZWQpCiMgcGFyc2UgdGhlIGRvd25sb2FkZWQgZGF0YSBhcyBDU1YKZGF0YXNldCA8LSByZWFkLmNzdihjb25uZWN0aW9uLCBoZWFkZXI9RkFMU0UsIG5hLnN0cmluZ3M9YygiPyIpKQojIGRyb3AgaXJyZWxldmFudCBjb2x1bW5zCmRhdGFzZXQgPC0gZGF0YXNldFsgLDY6bmNvbChkYXRhc2V0KV0KIyBkcm9wIHJvd3Mgd2l0aCBudWxsIGNvbHVtbnMKZGF0YXNldCA8LSBkYXRhc2V0W3Jvd1N1bXMoaXMubmEoZGF0YXNldCkpID09IDAsXQojIGZpeCB0aGUgY29sdW1uIG5hbWVzCmNvbG5hbWVzKGRhdGFzZXQpIDwtIG5hbWVzCiMgcHJldmlldyB0aGUgZmlyc3QgNSByb3dzCmhlYWQoZGF0YXNldCkKYGBgCgpXZSBub3cgc3BsaXQgdGhpcyBkYXRhc2V0IGludG8gdHJhaW4gYW5kIHRlc3Qgc2V0cyAoYW4gYWx0ZXJuYXRpdmUgd291bGQgYmUgdG8gdXNlIGNyb3NzLXZhbGlkYXRpb24pLgoKYGBge3J9CiMjIHBlcmZvcm0gdHJhaW4tdGVzdCBzcGxpdAojIDc1JSBvZiB0aGUgc2FtcGxlIHNpemUKc21wX3NpemUgPC0gZmxvb3IoMC43NSAqIG5yb3coZGF0YXNldCkpCgojIyBzZXQgdGhlIHNlZWQgdG8gbWFrZSB5b3VyIHBhcnRpdGlvbiByZXByb2R1Y3RpYmxlCnNldC5zZWVkKDEyMykKdHJhaW5faW5kIDwtIHNhbXBsZShzZXFfbGVuKG5yb3coZGF0YXNldCkpLCBzaXplID0gc21wX3NpemUpCgpkZi50cmFpbiA8LSBkYXRhc2V0W3RyYWluX2luZCwgXQpkZi50ZXN0IDwtIGRhdGFzZXRbLXRyYWluX2luZCwgXQpgYGAKPGJyPgoKIyMjIyBRMS4gRml0IGEgbGluZWFyIHJlZ3Jlc3Npb24gbW9kZWwgb24gZGYudHJhaW4uIFRoZSBnb2FsIGlzIHRvIHByZWRpY3QgJ1Zpb2xlbnRDcmltZXNQZXJQb3AnIGZyb20gdGhlIG90aGVyIGNvbHVtbnMuIFdoYXQgaXMgdGhlIHItc3F1YXJlZCBvbiB0aGUgdHJhaW4gZGF0YT8gV2hhdCBhYm91dCB0aGUgdGVzdCBkYXRhPwoKV2UgYXJlIGdvaW5nIHRvIHVzZSB0aGUgYGdsbW5ldGAgcGFja2FnZSBmb3IgdGhpcyBleGVyY2lzZSAoYXMgaXQgaXMgaGVscGZ1bCBmb3IgdGhlIG5leHQgZmV3IHF1ZXN0aW9ucyBhcyB3ZWxsKS4gYGdsbW5ldGAgZml0cyBhbiBlbGFzdGljIG5ldCBtb2RlbCBieSBkZWZhdWx0LiBSZWNhbGwgdGhhdCB0aGUgbG9zcyBmdW5jdGlvbiBmb3IgZWxhc3RpYyBuZXQgd2FzIGdpdmVuIGJ5OgokJFx1bmRlcnNldHt3fXtcb3BlcmF0b3JuYW1le2FyZ21pbn19e1xjYWwgTH0oWCkgKyAoMSAtIFxhbHBoYSkgXGNkb3QgXGxhbWJkYSBcc3VtX3tpPTF9Xk0gXGJldGFfaV4yICsgXGFscGhhIFxjZG90IFxsYW1iZGEgXHN1bV97aT0xfV5NIHxcYmV0YV9pfC4kJApUaHVzLCBmb3IgbGluZWFyIHJlZ3Jlc3Npb24sIHdlIG5lZWQgdG8gc2V0ICRcbGFtYmRhID0gMCQuCgpOb3RlIHRoYXQgdGhpcyBwYWNrYWdlIGNhbiBmaXQgb3RoZXIgZ2VuZXJhbGl6ZWQgbGluZWFyIG1vZGVscyBhcyB3ZWxsLCBpbmNsdWRpbmcgbG9naXN0aWMgcmVncmVzc2lvbiBmb3IgY2xhc3NpZmljYXRpb24sIFBvaXNzb24gcmVncmVzc2lvbiBmb3IgbW9kZWxpbmcgY291bnQgZGF0YSwgYW5kIENveCByZWdyZXNzaW9uIGZvciBzdXJ2aXZhbCBhbmFseXNpcy9jaHVybiBwcmVkaWN0aW9uLiBUbyBmaXQgdGhlc2Ugb3RoZXIgbW9kZWxzLCB5b3Ugd2lsbCBoYXZlIHRvIGNoYW5nZSB0aGUgYGZhbWlseWAgcGFyYW1ldGVyIGluIHRoZSBgZ2xtbmV0YCBmdW5jdGlvbiBjYWxsLgoKRm9yIGRldGFpbHMgYW5kIGV4YW1wbGVzIG9uIHRoZSBwYWNrYWdlLCB2aXNpdCBodHRwczovL3dlYi5zdGFuZm9yZC5lZHUvfmhhc3RpZS9nbG1uZXQvZ2xtbmV0X2FscGhhLmh0bWwKCmBgYHtyfQpsYWJlbENvbElkeCA9IG1hdGNoKGMoIlZpb2xlbnRDcmltZXNQZXJQb3AiKSwgbmFtZXMpCgpmaXQubGlucmVnIDwtIGdsbW5ldChhcy5tYXRyaXgoZGYudHJhaW5bLC1sYWJlbENvbElkeF0pLAogICAgICAgICAgICAgICAgICAgICBhcy5tYXRyaXgoZGYudHJhaW4kVmlvbGVudENyaW1lc1BlclBvcCksCiAgICAgICAgICAgICAgICAgICAgIGFscGhhPTEsIGxhbWJkYT0wLCBmYW1pbHk9ImdhdXNzaWFuIiwKICAgICAgICAgICAgICAgICAgICAgc3RhbmRhcmRpemU9RkFMU0UpCgpmaXR0ZWQubGlucmVnLnRyYWluIDwtIHByZWRpY3QoZml0LmxpbnJlZywgbmV3eCA9IGRhdGEubWF0cml4KGRmLnRyYWluWywtbGFiZWxDb2xJZHhdKSwgcz0wKQpmaXR0ZWQubGlucmVnLnRlc3QgPC0gcHJlZGljdChmaXQubGlucmVnLCBuZXd4ID0gZGF0YS5tYXRyaXgoZGYudGVzdFssLWxhYmVsQ29sSWR4XSksIHM9MCkKCmNhdCgiXG5UcmFpbiBjb3JyZWxhdGlvbiBjb2VmZmljaWVudDogIiwgY29yKGFzLm1hdHJpeChkZi50cmFpbiRWaW9sZW50Q3JpbWVzUGVyUG9wKSwgZml0dGVkLmxpbnJlZy50cmFpbilbMV0pCmNhdCgiXG5UZXN0IGNvcnJlbGF0aW9uIGNvZWZmaWNpZW50OiAiLCBjb3IoYXMubWF0cml4KGRmLnRlc3QkVmlvbGVudENyaW1lc1BlclBvcCksIGZpdHRlZC5saW5yZWcudGVzdClbMV0pCmBgYAo8YnI+CgojIyMjIFEyLiBBbHNvIGZpdCBlYWNoIG9mIGEgcmlkZ2UsIGxhc3NvLCBhbmQgZWxhc3RpYyBuZXQgcmVncmVzc2lvbiBvbiB0aGUgc2FtZSBkYXRhLiBVc2UgdGhlIGZ1bmN0aW9uIGBjdi5nbG1uZXRgIHRvIGNyb3NzLXZhbGlkYXRlIGFuZCBmaW5kIHRoZSBiZXN0IHZhbHVlcyBvZiAkXGxhbWJkYSQuIEZvciBlbGFzdGljIG5ldCwgdHJ5IGEgZmV3IHZhbHVlcyBvZiAkXGFscGhhJCBhcyB3ZWxsLgoKIyMjIyMgSGludDogYGN2LmdsbW5ldGAgZG9lcyBub3Qgb3B0aW1pemUgZm9yICRcYWxwaGEkLiBZb3Ugc2hvdWxkIHVzZSB0aGUgY29tbWFuZCBgc2V0LnNlZWQoaylgIGZvciBzb21lIGZpeGVkIGBrYCBiZWZvcmUgZWFjaCB2YWx1ZSBvZiAkXGFscGhhJCBiZWluZyBydW4uIFlvdSBjYW4gZG8gdGhlIGNyb3NzIHZhbGlkYXRpb24gZm9yICRcYWxwaGEkIG1hbnVhbGx5IGZvciB0aGUgcHVycG9zZSBvZiB0aGlzIGV4ZXJjaXNlLiBgY3YuZ2xtbmV0YCBhdXRvbWF0aWNhbGx5IHBpY2tzIGFwcHJvcHJpYXRlIHZhbHVlcyBvZiAkXGxhbWJkYSQgdG8gdHJ5Lgo8YnI+CgojIyMjIFEzLiBXaGljaCBtb2RlbCBwZXJmb3JtcyB0aGUgYmVzdD8KCiMjIyMgUmlkZ2UgUmVncmVzc2lvbgpgYGB7cn0KZml0LnJpZGdlIDwtIGN2LmdsbW5ldChhcy5tYXRyaXgoZGYudHJhaW5bLC1sYWJlbENvbElkeF0pLAogICAgICAgICAgICAgICAgICAgICAgICBhcy5tYXRyaXgoZGYudHJhaW4kVmlvbGVudENyaW1lc1BlclBvcCksCiAgICAgICAgICAgICAgICAgICAgICAgIHR5cGUubWVhc3VyZT0ibXNlIiwgYWxwaGE9MCwgZmFtaWx5PSJnYXVzc2lhbiIsCiAgICAgICAgICAgICAgICAgICAgICAgIHN0YW5kYXJkaXplPUZBTFNFKQoKZml0dGVkLnJpZGdlLnRyYWluIDwtIHByZWRpY3QoZml0LnJpZGdlLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICBuZXd4ID0gZGF0YS5tYXRyaXgoZGYudHJhaW5bLC1sYWJlbENvbElkeF0pLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICBzPSJsYW1iZGEubWluIikKZml0dGVkLnJpZGdlLnRlc3QgPC0gcHJlZGljdChmaXQucmlkZ2UsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgbmV3eCA9IGRhdGEubWF0cml4KGRmLnRlc3RbLC1sYWJlbENvbElkeF0pLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgIHM9ImxhbWJkYS5taW4iKQoKY2F0KCJcblRyYWluIGNvcnJlbGF0aW9uIGNvZWZmaWNpZW50OiAiLCBjb3IoYXMubWF0cml4KGRmLnRyYWluJFZpb2xlbnRDcmltZXNQZXJQb3ApLCBmaXR0ZWQucmlkZ2UudHJhaW4pWzFdKQpjYXQoIlxuVGVzdCBjb3JyZWxhdGlvbiBjb2VmZmljaWVudDogIiwgY29yKGFzLm1hdHJpeChkZi50ZXN0JFZpb2xlbnRDcmltZXNQZXJQb3ApLCBmaXR0ZWQucmlkZ2UudGVzdClbMV0pCmBgYAojIyMjIExBU1NPCmBgYHtyfQpmaXQubGFzc28gPC0gY3YuZ2xtbmV0KGFzLm1hdHJpeChkZi50cmFpblssLWxhYmVsQ29sSWR4XSksCiAgICAgICAgICAgICAgICAgICAgICAgIGFzLm1hdHJpeChkZi50cmFpbiRWaW9sZW50Q3JpbWVzUGVyUG9wKSwKICAgICAgICAgICAgICAgICAgICAgICAgdHlwZS5tZWFzdXJlPSJtc2UiLCBhbHBoYT0xLCBmYW1pbHk9ImdhdXNzaWFuIiwKICAgICAgICAgICAgICAgICAgICAgICAgc3RhbmRhcmRpemU9RkFMU0UpCgpmaXR0ZWQubGFzc28udHJhaW4gPC0gcHJlZGljdChmaXQubGFzc28sCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIG5ld3ggPSBkYXRhLm1hdHJpeChkZi50cmFpblssLWxhYmVsQ29sSWR4XSksCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHM9ImxhbWJkYS5taW4iKQpmaXR0ZWQubGFzc28udGVzdCA8LSBwcmVkaWN0KGZpdC5sYXNzbywKICAgICAgICAgICAgICAgICAgICAgICAgICAgICBuZXd4ID0gZGF0YS5tYXRyaXgoZGYudGVzdFssLWxhYmVsQ29sSWR4XSksCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgcz0ibGFtYmRhLm1pbiIpCgpjYXQoIlxuVHJhaW4gY29ycmVsYXRpb24gY29lZmZpY2llbnQ6ICIsIGNvcihhcy5tYXRyaXgoZGYudHJhaW4kVmlvbGVudENyaW1lc1BlclBvcCksIGZpdHRlZC5sYXNzby50cmFpbilbMV0pCmNhdCgiXG5UZXN0IGNvcnJlbGF0aW9uIGNvZWZmaWNpZW50OiAiLCBjb3IoYXMubWF0cml4KGRmLnRlc3QkVmlvbGVudENyaW1lc1BlclBvcCksIGZpdHRlZC5sYXNzby50ZXN0KVsxXSkKYGBgCiMjIyMgRWxhc3RpYyBOZXQKCiMjIyMgTm90ZSB0aGF0IHNpbmNlIHdlIGFyZSBkb2luZyBuZXN0ZWQgZ3JpZC1zZWFyY2gsIHdlIGhhdmUgdG8gc3BsaXQgdGhlIHRyYWluIGRhdGFzZXQgZnVydGhlciBpbnRvIHRyYWluIGFuZCB0ZXN0LgoKYGBge3J9CnNldC5zZWVkKDEpCnNtcF9zaXplIDwtIGZsb29yKDAuNzUgKiBucm93KGRmLnRyYWluKSkKdHJhaW5faW5kIDwtIHNhbXBsZShzZXFfbGVuKG5yb3coZGYudHJhaW4pKSwgc2l6ZSA9IHNtcF9zaXplKQoKZGYuYWxwaGF0cmFpbiA8LSBkZi50cmFpblt0cmFpbl9pbmQsIF0KZGYuYWxwaGF0ZXN0IDwtIGRmLnRyYWluWy10cmFpbl9pbmQsIF0KCmZvciAoYSBpbiBsb2cxMChzZXEoMSwxMCwxKSkpIHsKICBzZXQuc2VlZCgxMjMpCiAgCiAgY2F0KCJcbmFscGhhID0iLCBhKQogIGZpdC5lbGFzdGljIDwtIGN2LmdsbW5ldChhcy5tYXRyaXgoZGYuYWxwaGF0cmFpblssLWxhYmVsQ29sSWR4XSksCiAgICAgICAgICAgICAgICAgICAgICAgICBhcy5tYXRyaXgoZGYuYWxwaGF0cmFpbiRWaW9sZW50Q3JpbWVzUGVyUG9wKSwKICAgICAgICAgICAgICAgICAgICAgICAgIHR5cGUubWVhc3VyZT0ibXNlIiwgYWxwaGE9YSwgZmFtaWx5PSJnYXVzc2lhbiIsCiAgICAgICAgICAgICAgICAgICAgICAgICBzdGFuZGFyZGl6ZT1GQUxTRSkKCiAgZml0dGVkLmVsYXN0aWMudHJhaW4gPC0gcHJlZGljdChmaXQuZWxhc3RpYywKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIG5ld3ggPSBkYXRhLm1hdHJpeChkZi5hbHBoYXRyYWluWywtbGFiZWxDb2xJZHhdKSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHM9ImxhbWJkYS5taW4iKQogIGZpdHRlZC5lbGFzdGljLnRlc3QgPC0gcHJlZGljdChmaXQuZWxhc3RpYywKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgbmV3eCA9IGRhdGEubWF0cml4KGRmLmFscGhhdGVzdFssLWxhYmVsQ29sSWR4XSksCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHM9ImxhbWJkYS5taW4iKQoKICBjYXQoIlxuVHJhaW4gY29ycmVsYXRpb24gY29lZmZpY2llbnQ6ICIsIGNvcihhcy5tYXRyaXgoZGYuYWxwaGF0cmFpbiRWaW9sZW50Q3JpbWVzUGVyUG9wKSwgICAgICAgCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGZpdHRlZC5lbGFzdGljLnRyYWluKVsxXSkKICBjYXQoIlxuVGVzdCBjb3JyZWxhdGlvbiBjb2VmZmljaWVudDogIiwgY29yKGFzLm1hdHJpeChkZi5hbHBoYXRlc3QkVmlvbGVudENyaW1lc1BlclBvcCksIAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGZpdHRlZC5lbGFzdGljLnRlc3QpWzFdKQp9CmBgYAoKRWxhc3RpYyBuZXQgcGVyZm9ybXMgdGhlIGJlc3QgKGxvb2tpbmcgYXQgdGhlIHRlc3QgY29ycmVsYXRpb24gY29lZmZpY2llbnQpLiBUaGlzIGlzIHVuc3VycHJpc2luZywgc2luY2UgZWxhc3RpYyBuZXQgaXMgYSBzdXBlcnNldCBvZiB0aGUgb3RoZXIgbW9kZWxzLiBQaWNraW5nIHRoZSBvcHRpbWFsIHZhbHVlIGZyb20gYWJvdmUsICRcYWxwaGEgPSAwLjMwMTAzJC4KCmBgYHtyfQpvcHRhbHBoYSA8LSAwLjMwMTAzCmZpdC5lbGFzdGljIDwtIGN2LmdsbW5ldChhcy5tYXRyaXgoZGYudHJhaW5bLC1sYWJlbENvbElkeF0pLAogICAgICAgICAgICAgICAgICAgICAgICAgYXMubWF0cml4KGRmLnRyYWluJFZpb2xlbnRDcmltZXNQZXJQb3ApLAogICAgICAgICAgICAgICAgICAgICAgICAgdHlwZS5tZWFzdXJlPSJtc2UiLCBhbHBoYT1vcHRhbHBoYSwgCiAgICAgICAgICAgICAgICAgICAgICAgICBmYW1pbHk9ImdhdXNzaWFuIiwgc3RhbmRhcmRpemU9RkFMU0UpCmBgYAo8YnI+CgojIyMjIFE0LiBNYWtlIHRoZSBmb2xsb3dpbmcgc2NhdHRlcnBsb3Q6CiMjIyMgLSBFYWNoIHBvaW50IGNvcnJlc3BvbmRzIHRvIG9uZSBwcmVkaWN0b3IgaW4gdGhlIGRhdGEKIyMjIyAtIFRoZSB4LXZhbHVlIGlzIHRoZSBjb2VmZmljaWVudCBvZiB0aGF0IHByZWRpY3RvciB1bmRlciBPTFMgcmVncmVzc2lvbgojIyMjIC0gVGhlIHktdmFsdWUgaXMgdGhlIGNvZWZmaWNpZW50IG9mIHRoYXQgcHJlZGljdG9yIHVzaW5nIHJpZGdlIHJlZ3VsYXJpemF0aW9uCjxicj4KCiMjIyMgUTUuIERvIHRoZSBzYW1lIGZvciBPTFMgdnMgTGFzc28sIGFuZCBPTFMgdnMgRWxhc3RpY05ldC4gV2hhdCBkbyB5b3Ugbm90aWNlIGFib3V0IHRoZSBtYWduaXR1ZGUgb2YgdGhlIHBhcmFtZXRlcnMgYW5kIG51bWJlcnMgb2YgemVyb3M/CgpgYGB7cn0KcGxvdChjb2VmKGZpdC5saW5yZWcsIHM9MClbLTFdLCBjb2VmKGZpdC5yaWRnZSwgcz0wKVstMV0sIG1haW49Ik9MUyB2cy4gUmlkZ2UgcGFyYW1ldGVycyIsIAogIAl4bGFiPSJPTFMiLCB5bGFiPSJSaWRnZSIsIHBjaD0xOSkKcGxvdChjb2VmKGZpdC5saW5yZWcsIHM9MClbLTFdLCBjb2VmKGZpdC5sYXNzbywgcz0ibGFtYmRhLm1pbiIpWy0xXSwgbWFpbj0iT0xTIHZzLiBMQVNTTyBwYXJhbWV0ZXJzIiwgCiAgCXhsYWI9Ik9MUyIsIHlsYWI9IkxBU1NPIiwgcGNoPTE5KQpwbG90KGNvZWYoZml0LmxpbnJlZywgcz0wKVstMV0sIGNvZWYoZml0LmVsYXN0aWMsIHM9ImxhbWJkYS5taW4iKVstMV0sIG1haW49Ik9MUyB2cy4gRWxhc3RpYyBOZXQgcGFyYW1ldGVycyIsIAogIAl4bGFiPSJPTFMiLCB5bGFiPSJFbGFzdGljIE5ldCIsIHBjaD0xOSkKYGBgClRoZSBtYWduaXR1ZGUgb2YgdGhlIHJpZGdlIHJlZ3Jlc3Npb24gcGFyYW1ldGVycyBhcmUgbXVjaCBzbWFsbGVyIGluIG1hZ25pdHVkZSB0aGFuIHRob3NlIGZvciBPTFMuIEZvciBMQVNTTywgbWFueSBwYXJhbWV0ZXJzIGFyZSBleGFjdGx5IHplcm8sIHlpZWxkaW5nIGEgc3BhcnNlIHNvbHV0aW9uLiBTYW1lIChidXQgdG8gYSBsZXNzZXIgZXh0ZW50KSBmb3IgdGhlIGVsYXN0aWMgbmV0Lgo8YnI+CgojIyMjIFE2LiBNYWtlIHRoZSBmb2xsb3dpbmcgc2NhdHRlcnBsb3Q6CiMjIyMgLSBFYWNoIHBvaW50IGNvcnJlc3BvbmRzIHRvIG9uZSBwcmVkaWN0b3IgaW4gdGhlIGRhdGEKIyMjIyAtIFRoZSB4LXZhbHVlIGlzIHRoZSBjb2VmZmljaWVudCBvZiB0aGF0IHByZWRpY3RvciB1bmRlciBPTFMgcmVncmVzc2lvbgojIyMjIC0gVGhlIHktdmFsdWUgaXMgdGhlIGNvcnJlbGF0aW9uIGNvZWZmaWNpZW50IG9mIHRoYXQgcHJlZGljdG9yIHdpdGggdGhlIHRhcmdldAo8YnI+CgojIyMjIFE3LiBCYXNlZCBvbiB0aGUgYWJvdmUgcGxvdCwgd2hhdCBjYW4geW91IHNheSBhYm91dCB0aGUgaW50ZXJwcmV0YXRpb24gb2YgdGhlIHJlZ3Jlc3Npb24gY29lZmZpY2llbnRzPyBEb2VzIHRoZSBzaWduIG9mIHRoZSBjb2VmZmljaWVudCBpbXBseWluZyB0aGUgcmVsYXRpb25zaGlwIG9mIHRoYXQgdmFyaWFibGUgYW5kIHRoZSB0YXJnZXQ/IFdoYXQgYXJlIHNvbWUgcG90ZW50aWFsIGlzc3VlcyBpbnRlcnByZXRpbmcgdGhlIG1hZ25pdHVkZT8KCmBgYHtyfQojIGluaXRpYWxpemUgd2l0aCBjb2VmZmljaWVudCB2ZWN0b3IgdG8ga2VlcCB0aGUgcm93IG5hbWVzCmNvcnJzIDwtIGFycmF5KDAuMCxsZW5ndGgoY29lZihmaXQubGlucmVnLCBzPTApWy0xXSkpCgpjbnQgPC0gMApmb3IgKGZlYXQgaW4gcm93bmFtZXMoY29lZihmaXQubGlucmVnLCBzPTApKVstMV0pIHsKICBjbnQgPC0gY250ICsgMQogIGNvcnJzW2NudF0gPC0gY29yKGFzLm1hdHJpeChkZi50cmFpbiRWaW9sZW50Q3JpbWVzUGVyUG9wKSwgZGYudHJhaW5bLGZlYXRdKVsxXQp9CgpwbG90KGNvZWYoZml0LmxpbnJlZywgcz0wKVstMV0sIGNvcnJzLCBtYWluPSJSZWdyZXNzaW9uIGNvZWZmaWNpZW50cyB2ZXJzdXMgY29ycmVsYXRpb24gY29lZmZpY2llbnRzIiwgCiAgICAJeGxhYj0iTGluZWFyIHJlZ3Jlc3Npb24gY29lZmZpY2llbnRzIiwgeWxhYj0iQ29ycmVsYXRpb24gY29lZmZpY2llbnRzIiwgcGNoPTE5KQpgYGAKCldlIHNob3VsZCBiZSB2ZXJ5IGNhcmVmdWwgaW4gaW50ZXJwcmV0aW5nIHJlZ3Jlc3Npb24gY29lZmZpY2llbnRzIGRpcmVjdGx5LiBXaGlsZSBhIHByZWRpY3RvciAkeF9pJCBtYXkgYmUgcG9zdGl2ZWx5IChuZWdhdGl2ZWx5KSBjb3JyZWxhdGVkIHdpdGggdGhlIHRhcmdldCwgJFxiZXRhX2kkIG1heSBiZSBuZWdhdGl2ZSAocG9zaXRpdmUpIGluIHRoZSBwcmVzZW5jZSBvZiBvdGhlciBwcmVkaWN0b3JzLCB3aGljaCBhcmUgY29ycmVsYXRlZCB3aXRoICR4X2kkLiBTaW1pbGFybHksIGEgc21hbGwgJFxiZXRhX2kkIGRvZXMgbm90IG1lYW4gdGhhdCAkeF9pJCBpcyB1bmluZm9ybWF0aXZlLCBidXQgaXQgbWF5IGJlIHRoYXQgb3RoZXIgdmFyaWFibGVzIGhhdmUgJ2NhcHR1cmVkJyB0aGUgaW5mb3JtYXRpb24sIGFuZCBsZWZ0ICR4X2kkIHdpdGggbm90aGluZyBsZWZ0IHRvIGNvbnRyaWJ1dGUuCgpBZGRpdGlvbmFsbHksIHdlIGhhdmUgdG8gY29uc2lkZXIgaXNzdWVzIG9mIHNjYWxlIC0gd2hldGhlciB3ZSBtZWFzdXJlIGRpc3RhbmNlIGluIG1ldGVycyBvciBmZWV0LCB0aGUgaW5mb3JtYXRpb24gY29udGVudCBkb2VzIG5vdCBjaGFuZ2UuIEJ1dCB0aGUgcmVncmVzc2lvbiBjb2VmZmljaWVudCB3b3VsZCBjaGFuZ2UgdG8gc2NhbGUgYXBwcm9wcmlhdGVseS4gVGh1cywgd2UgaGF2ZSB0byBlbnN1cmUgdGhhdCBvdXIgdmFyaWFibGVzIGFyZSBub3JtYWxpemVkIHRvIGhhdmUgemVybyBtZWFuIGFuZCB1bml0IHN0YW5kYXJkIGRldmlhdGlvbiBwcmlvciB0byBmaXR0aW5nIHRoZSBtb2RlbC4K