Intro
It all began with the fact that I’ve decided to study k-means clustering algorithm. The more resources I read, the more confused . The definition of cluster centers has been stumbling-block. The most accessible explanation was provided by Wikipedia:
Assign each observation to the cluster whose mean has the least squared Euclidean distance.
But how does this happen? So I’ve decided to reinvent a wheel and reproduce step by step all stages of the algorithm. As a basis I’ve decided to take an example from the book by Roger D. Peng Exploratory Data Analysis with R.
Because I’m a passionate fan of R I’ve decided to use fantastic package tydiverse to reproduce algorithm and gganimate to visualize result.
So, let’s begin!
1. Dive into k-means
Data preparation
I’ve decided to reproduce dataset that is provided by Roger D.Peng (Figure 12.1: Simulated dataset)
set.seed(1234)
x <- c(0.75, 1.1, 1.25, 0.55, 2.15, 2.17, 1.85, 1.86, 2.87, 2.75, 2.91, 2.71)
y <- c(0.6, 1.0, 1.25, 1.0, 1.9, 1.8, 1.85, 2.5, 1.0, 0.85, 0.86, 1.18)
df_xy <- tibble(x, y) %>% rowid_to_column()
# Length of dataset
n_xy <- dim(df_xy)[1]
Let’s take three clusters for example
Cx = c(1, 1.7, 2.5)
Cy = c(2, 1.0, 1.5)
# Number of clusters define as length of Cx or Cy vectors
n_clust <- length(Cx)
df_cl <- tibble(Cluster = factor(seq(n_clust), ordered = T), Cx, Cy)
Next create a dataset with coordinates of observations and coordinates of cluster centers
df_kmeans0 <- bind_cols(df_xy, map_dfr(df_cl, rep, length.out = length(x)))
Here’s how all this looks like at the initial stage
ggplot(df_kmeans0, aes(x, y)) +
geom_point(shape = 1, size = 5, stroke = 1) +
geom_text(aes(x, y, label = rownames(df_xy)), vjust = 2.0, size = 4, color = "black") +
geom_point(aes(Cx, Cy, color = Cluster), shape = 3, size = 5, stroke = 1) +
coord_equal() +
labs(title = "The initial location of the cluster centers") +
theme(plot.title = element_text(hjust = 0.5))
Now let’s look what’s happen under the hood of k-means
Step 1. Calculate squared Euclidean distance between each observations and each clusters
Firs of all we need to choose a metric for measuring distance between observations and cluster’s centers. Let it be Euclidean distance or rather squared Euclidean distance.
Eu_dist2 = (xi - xc)2 + (yi - yc)2
Next, it needs to determine the distance from each point to each cluster. Since we have 12 points and 3 clusters, we get 36 distances. It was the first stumbling-block.
df_dist <- bind_cols(map_dfr(df_xy, rep, n_clust),
map_dfr(df_cl, rep, each = n_xy)) %>%
mutate(Eu_dist = sqrt((.$x - .$Cx)^2 + (.$y - .$Cy)^2))
df_dist
Step.2 Assign each observation to the cluster whose mean has the least squared Euclidean distance
step2 <- df_dist %>%
select(-Cx, -Cy) %>%
spread(Cluster, Eu_dist) %>%
nest(-rowid, -x, -y) %>%
mutate(Eu_dist = map_dbl(.$data, min),
Cluster = map_dbl(.$data, which.min) %>% factor(., ordered = T)) %>%
unnest(data)
step2
Columns 1-3 represent all (3 in our case) distances of observation to each cluster. We need to choose the minimum value for each observation. They are gathered in Eu_dist column. So, the columns Eu_dist and Cluster demonstrate which observations to which clusters belong. It was the second and the last stumbling-block.
Step 3. Join cluster’s coordinates to previous dataset
Now it remains to add the coordinates of cluster centers to our previous data set
step2 %>%
left_join(df_cl, by = 'Cluster') -> step3
step3
Let’s demonstrate result
ggplot(step3, aes(x, y, color = Cluster)) +
geom_point(size = 5) +
geom_point(aes(x = step3$Cx, y = step3$Cy), shape = 3, size = 7) +
geom_text(label = rownames(step3), vjust = 1.8, size = 4, color = 'black') +
coord_fixed()
Step 4. Recalculation center’s coordinates of new clusters
step3 %>%
select(-num_range("", 1:n_clust)) %>%
group_by(Cluster) %>%
mutate(Cx2 = mean(x),
Cy2 = mean(y)) -> step4
step4
Let’s look at new clusters (smaller size - are initial values)
step4 %>%
ggplot(aes(x, y, color = Cluster)) +
geom_point(size = 5) +
geom_point(aes(x = step4$Cx, y = step4$Cy), shape = 3, size = 3) +
geom_point(aes(x = step4$Cx2, y = step4$Cy2), shape = 3, size = 6) +
geom_text(aes(label = rownames(step4)), vjust = 1.8, size = 4, color = 'black')
Put our dataset with new clusters into “canonical” form
step4 %>%
select(Cluster, Cx2, Cy2) %>%
group_by(Cluster) %>%
unique(.) %>%
arrange(Cluster) -> new_iteration
new_iteration
Next - repeat iteration from step 1 and so on, but for this purpose it needs to have something like recursive function or even two!. I’ve created k_means() function that execute k-means algorithm and wrapper_kmeans() function for calling k_means() in the best way. These function and couple others are contained in helper_functions.R file.
It must be noted, at this stage I’ve became aware of the principle of k-means, but I wanted to see the result for several iterations and compare one with a regular algorithm - the kmeans() function.
So, let’s keep on!
2. k-means from a bird’s eye view
Set number of iteration and pass it with observation (df_xy) and cluster (df_cl) tibbles to wraper_kmeans(). It return list of tibbles with observations and clusters from k_means() function from all iterations.
n_iter <- 5 # nubmer of k-means iterations
df_kmeans <- wraper_kmeans(df_xy, df_cl, n_iter)
df_kmeans[[n_iter]]
Also we can view main results of k_means algorithm for chosen iteration
my_kmeans <- get_kmeans(n_iter, df_kmeans)
my_kmeans
… and compare it this result of build-in k-means() function
And how about visualization? And it is desirable with animation!
Yes of course! But first make “long” dataframe from the df_kmeans list:
df_long <- map_dfr(df_kmeans, ~ .)
Ta da!
theme_ani <- theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
legend.title = element_text(face = "bold", colour = "black", size = 12),
legend.text = element_text(size = 11, color = "black"),
axis.text = element_text(size = 11, colour = "black"),
axis.title = element_text(size = 12, face = "bold"))
ggplot(df_long, aes(x, y, color = Cluster, frame = Iteration)) +
geom_point(aes(group = Cluster), size = 5) +
geom_point(aes(x = Cx, y = Cy), shape = 3, size = 5, color = "black", stroke = 1) +
geom_path(aes(x = Cx, y = Cy, cumulative = TRUE, group = Cluster),
color = "black") +
geom_text(aes(x, y, label = rowid), vjust = 1.8, size = 4, color = 'black') +
geom_polygon(data = df_polygon(df_kmeans),
aes(x, y, group = Cluster, fill = Cluster), alpha = .5) +
coord_equal() +
labs(title = "How k-means work:") +
theme_ani -> gg
gganimate(gg)
ggplot(df_long, aes(x = rowid, y = Eu_dist, frame = Iteration, cumulative = TRUE)) +
geom_point(aes(color = Iteration), size = 5) +
geom_line(aes(color = Iteration), size = 1) +
scale_x_continuous(breaks = 1:dim(df_xy)[1]) +
labs(x = "Observations", y = "Euclidean distance squared",
title = "Squared Euclidean distance for each observation on:") +
theme_ani -> gg_dist
gganimate(gg_dist)
df_long %>%
group_by(Iteration) %>%
summarise(tot_sum = sum(Eu_dist)) %>%
ungroup() %>%
ggplot(aes(x = Iteration, y = tot_sum, frame = Iteration, cumulative = TRUE)) +
geom_bar(stat = 'identity', aes(fill = Iteration)) +
labs(x = NULL, title = "Total within-cluster sum of square on:") +
theme_ani -> gg_tot_sum
gganimate(gg_tot_sum)
LS0tCnRpdGxlOiAiay1tZWFucyBleHBsYW5hdGlvbiIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQojIyMgLi4uIG9yIGFib3V0IHRoZSBiZW5lZml0cyBvZiByZWludmVudGluZyB0aGUgd2hlZWwgICAKCmBgYHtyIHNldHVwLCBpbmNsdWRlPUZBTFNFfQprbml0cjo6b3B0c19jaHVuayRzZXQoZWNobyA9IFRSVUUsCiAgICAgICAgICAgICAgICAgICAgICBmaWcud2lkdGggPSA4KQoKbGlicmFyeSh0aWR5dmVyc2UpCmxpYnJhcnkoZ2dhbmltYXRlKQpsaWJyYXJ5KGFuaW1hdGlvbikKCnNvdXJjZSgiaGVscGVyX2Z1bmN0aW9ucy5SIikKCmFuaS5vcHRpb25zKGFuaS53aWR0aCA9IDc4MCkKYGBgCgojIyMgKipJbnRybyoqCgpJdCBhbGwgYmVnYW4gd2l0aCB0aGUgZmFjdCB0aGF0IEkndmUgZGVjaWRlZCB0byBzdHVkeSBrLW1lYW5zIGNsdXN0ZXJpbmcgYWxnb3JpdGhtLiBUaGUgbW9yZSByZXNvdXJjZXMgSSByZWFkLCB0aGUgbW9yZSBjb25mdXNlZCAuIFRoZSAgZGVmaW5pdGlvbiBvZiBjbHVzdGVyIGNlbnRlcnMgaGFzIGJlZW4gc3R1bWJsaW5nLWJsb2NrLiBUaGUgbW9zdCBhY2Nlc3NpYmxlIGV4cGxhbmF0aW9uIHdhcyBwcm92aWRlZCBieSBbV2lraXBlZGlhXShodHRwczovL2VuLndpa2lwZWRpYS5vcmcvd2lraS9LLW1lYW5zX2NsdXN0ZXJpbmcpOgoKPiBBc3NpZ24gZWFjaCBvYnNlcnZhdGlvbiB0byB0aGUgY2x1c3RlciB3aG9zZSBtZWFuIGhhcyB0aGUgbGVhc3Qgc3F1YXJlZCBFdWNsaWRlYW4gZGlzdGFuY2UuCgpCdXQgaG93IGRvZXMgdGhpcyBoYXBwZW4/IFNvIEkndmUgZGVjaWRlZCB0byByZWludmVudCBhIHdoZWVsIGFuZCByZXByb2R1Y2Ugc3RlcCBieSBzdGVwIGFsbCBzdGFnZXMgb2YgdGhlIGFsZ29yaXRobS4KQXMgYSBiYXNpcyBJJ3ZlIGRlY2lkZWQgdG8gdGFrZSBhbiBleGFtcGxlIGZyb20gdGhlIGJvb2sgYnkgUm9nZXIgRC4gUGVuZyBbRXhwbG9yYXRvcnkgRGF0YSBBbmFseXNpcyB3aXRoIFJdKGh0dHBzOi8vYm9va2Rvd24ub3JnL3JkcGVuZy9leGRhdGEvay1tZWFucy1jbHVzdGVyaW5nLmh0bWwpLgoKQmVjYXVzZSBJJ20gYSBwYXNzaW9uYXRlIGZhbiBvZiBSIEkndmUgZGVjaWRlZCB0byB1c2UgZmFudGFzdGljIHBhY2thZ2UgW3R5ZGl2ZXJzZV0oaHR0cHM6Ly93d3cudGlkeXZlcnNlLm9yZy8pIHRvIHJlcHJvZHVjZSBhbGdvcml0aG0gYW5kIFtnZ2FuaW1hdGVdKGh0dHBzOi8vZ2l0aHViLmNvbS90aG9tYXNwODUvZ2dhbmltYXRlKSB0byB2aXN1YWxpemUgcmVzdWx0LiAgClNvLCBsZXQncyBiZWdpbiEKCiMjIyAqKjEuIERpdmUgaW50byBrLW1lYW5zKioKCiMjIyMgKipEYXRhIHByZXBhcmF0aW9uKioKSSd2ZSBkZWNpZGVkIHRvIHJlcHJvZHVjZSBkYXRhc2V0IHRoYXQgaXMgcHJvdmlkZWQgYnkgUm9nZXIgRC5QZW5nIChGaWd1cmUgMTIuMTogU2ltdWxhdGVkIGRhdGFzZXQpIVtdKHNpbXVsYXRlZF9kYXRhc2V0LnBuZykKYGBge3J9CnNldC5zZWVkKDEyMzQpCnggPC0gYygwLjc1LCAxLjEsIDEuMjUsIDAuNTUsIDIuMTUsIDIuMTcsIDEuODUsIDEuODYsIDIuODcsIDIuNzUsIDIuOTEsIDIuNzEpCnkgPC0gYygwLjYsIDEuMCwgMS4yNSwgMS4wLCAxLjksIDEuOCwgMS44NSwgMi41LCAxLjAsIDAuODUsIDAuODYsIDEuMTgpCmRmX3h5IDwtIHRpYmJsZSh4LCB5KSAlPiUgcm93aWRfdG9fY29sdW1uKCkKIyBMZW5ndGggb2YgZGF0YXNldApuX3h5IDwtIGRpbShkZl94eSlbMV0KYGBgCkxldCdzIHRha2UgdGhyZWUgY2x1c3RlcnMgZm9yIGV4YW1wbGUKYGBge3J9CkN4ID0gYygxLCAxLjcsIDIuNSkKQ3kgPSBjKDIsIDEuMCwgMS41KQojIE51bWJlciBvZiBjbHVzdGVycyBkZWZpbmUgYXMgbGVuZ3RoIG9mIEN4IG9yIEN5IHZlY3RvcnMKbl9jbHVzdCA8LSBsZW5ndGgoQ3gpCmRmX2NsIDwtIHRpYmJsZShDbHVzdGVyID0gZmFjdG9yKHNlcShuX2NsdXN0KSwgb3JkZXJlZCA9IFQpLCBDeCwgQ3kpCmBgYApOZXh0IGNyZWF0ZSBhIGRhdGFzZXQgd2l0aCBjb29yZGluYXRlcyBvZiBvYnNlcnZhdGlvbnMgYW5kIGNvb3JkaW5hdGVzIG9mIGNsdXN0ZXIgY2VudGVycwpgYGB7cn0KZGZfa21lYW5zMCA8LSBiaW5kX2NvbHMoZGZfeHksIG1hcF9kZnIoZGZfY2wsIHJlcCwgbGVuZ3RoLm91dCA9IGxlbmd0aCh4KSkpCmBgYApIZXJlJ3MgaG93IGFsbCB0aGlzIGxvb2tzIGxpa2UgYXQgdGhlIGluaXRpYWwgc3RhZ2UgIApgYGB7cn0KZ2dwbG90KGRmX2ttZWFuczAsIGFlcyh4LCB5KSkgKwogIGdlb21fcG9pbnQoc2hhcGUgPSAxLCBzaXplID0gNSwgc3Ryb2tlID0gMSkgKwogIGdlb21fdGV4dChhZXMoeCwgeSwgbGFiZWwgPSByb3duYW1lcyhkZl94eSkpLCB2anVzdCA9IDIuMCwgc2l6ZSA9IDQsIGNvbG9yID0gImJsYWNrIikgKyAKICBnZW9tX3BvaW50KGFlcyhDeCwgQ3ksIGNvbG9yID0gQ2x1c3RlciksIHNoYXBlID0gMywgc2l6ZSA9IDUsIHN0cm9rZSA9IDEpICsKICBjb29yZF9lcXVhbCgpICsKICBsYWJzKHRpdGxlID0gIlRoZSBpbml0aWFsIGxvY2F0aW9uIG9mIHRoZSBjbHVzdGVyIGNlbnRlcnMiKSArCiAgdGhlbWUocGxvdC50aXRsZSA9IGVsZW1lbnRfdGV4dChoanVzdCA9IDAuNSkpCmBgYAoKTm93IGxldCdzIGxvb2sgd2hhdCdzIGhhcHBlbiB1bmRlciB0aGUgaG9vZCBvZiBrLW1lYW5zCgojIyMjICoqU3RlcCAxLiBDYWxjdWxhdGUgc3F1YXJlZCBFdWNsaWRlYW4gZGlzdGFuY2UgYmV0d2VlbiBlYWNoIG9ic2VydmF0aW9ucyBhbmQgZWFjaCBjbHVzdGVycyoqCkZpcnMgb2YgYWxsIHdlIG5lZWQgdG8gY2hvb3NlIGEgbWV0cmljIGZvciBtZWFzdXJpbmcgZGlzdGFuY2UgYmV0d2VlbiBvYnNlcnZhdGlvbnMgYW5kIGNsdXN0ZXIncyBjZW50ZXJzLiBMZXQgaXQgYmUgRXVjbGlkZWFuIGRpc3RhbmNlCm9yIHJhdGhlciBzcXVhcmVkIEV1Y2xpZGVhbiBkaXN0YW5jZS4KCj4gRXVfZGlzdF4yXiA9ICh4fml+IC0geH5jfileMl4gKyAoeX5pfiAtIHl+Y34pXjJeCgpOZXh0LCBpdCBuZWVkcyB0byBkZXRlcm1pbmUgdGhlIGRpc3RhbmNlIGZyb20gZWFjaCBwb2ludCB0byBlYWNoIGNsdXN0ZXIuIFNpbmNlIHdlIGhhdmUgMTIgcG9pbnRzIGFuZCAzIGNsdXN0ZXJzLCB3ZSBnZXQgMzYgZGlzdGFuY2VzLiBJdCB3YXMgdGhlIGZpcnN0IHN0dW1ibGluZy1ibG9jay4KYGBge3J9CmRmX2Rpc3QgPC0gYmluZF9jb2xzKG1hcF9kZnIoZGZfeHksIHJlcCwgbl9jbHVzdCksCiAgICAgICAgICAgICAgICAgICAgIG1hcF9kZnIoZGZfY2wsIHJlcCwgZWFjaCA9IG5feHkpKSAlPiUgCiAgbXV0YXRlKEV1X2Rpc3QgPSBzcXJ0KCguJHggLSAuJEN4KV4yICsgKC4keSAtIC4kQ3kpXjIpKQpkZl9kaXN0CmBgYAojIyMjICoqU3RlcC4yIEFzc2lnbiBlYWNoIG9ic2VydmF0aW9uIHRvIHRoZSBjbHVzdGVyIHdob3NlIG1lYW4gaGFzIHRoZSBsZWFzdCBzcXVhcmVkIEV1Y2xpZGVhbiBkaXN0YW5jZSoqCgpgYGB7cn0Kc3RlcDIgPC0gZGZfZGlzdCAlPiUgCiAgc2VsZWN0KC1DeCwgLUN5KSAlPiUgCiAgc3ByZWFkKENsdXN0ZXIsIEV1X2Rpc3QpICU+JSAKICBuZXN0KC1yb3dpZCwgLXgsIC15KSAlPiUKICBtdXRhdGUoRXVfZGlzdCA9IG1hcF9kYmwoLiRkYXRhLCBtaW4pLAogICAgICAgICBDbHVzdGVyID0gbWFwX2RibCguJGRhdGEsIHdoaWNoLm1pbikgJT4lIGZhY3RvciguLCBvcmRlcmVkID0gVCkpICU+JQogIHVubmVzdChkYXRhKQpzdGVwMgpgYGAKQ29sdW1ucyAxLTMgcmVwcmVzZW50IGFsbCAoMyBpbiBvdXIgY2FzZSkgIGRpc3RhbmNlcyBvZiBvYnNlcnZhdGlvbiB0byBlYWNoIGNsdXN0ZXIuIFdlIG5lZWQgdG8gY2hvb3NlIHRoZSBtaW5pbXVtIHZhbHVlIGZvciBlYWNoIG9ic2VydmF0aW9uLiBUaGV5IGFyZSBnYXRoZXJlZCBpbiBFdV9kaXN0IGNvbHVtbi4gU28sIHRoZSBjb2x1bW5zIEV1X2Rpc3QgYW5kIENsdXN0ZXIgZGVtb25zdHJhdGUgd2hpY2ggb2JzZXJ2YXRpb25zIHRvIHdoaWNoIGNsdXN0ZXJzIGJlbG9uZy4gSXQgd2FzIHRoZSBzZWNvbmQgYW5kIHRoZSBsYXN0IHN0dW1ibGluZy1ibG9jay4gCgojIyMjICoqU3RlcCAzLiBKb2luIGNsdXN0ZXIncyBjb29yZGluYXRlcyB0byBwcmV2aW91cyBkYXRhc2V0KioKTm93IGl0IHJlbWFpbnMgdG8gYWRkIHRoZSBjb29yZGluYXRlcyBvZiBjbHVzdGVyIGNlbnRlcnMgdG8gb3VyIHByZXZpb3VzIGRhdGEgc2V0ICAKYGBge3J9CnN0ZXAyICU+JQogIGxlZnRfam9pbihkZl9jbCwgYnkgPSAnQ2x1c3RlcicpIC0+IHN0ZXAzCnN0ZXAzCmBgYApMZXQncyBkZW1vbnN0cmF0ZSByZXN1bHQgIApgYGB7cn0KZ2dwbG90KHN0ZXAzLCBhZXMoeCwgeSwgY29sb3IgPSBDbHVzdGVyKSkgKwogIGdlb21fcG9pbnQoc2l6ZSA9IDUpICsKICBnZW9tX3BvaW50KGFlcyh4ID0gc3RlcDMkQ3gsIHkgPSBzdGVwMyRDeSksIHNoYXBlID0gMywgc2l6ZSA9IDcpICsKICBnZW9tX3RleHQobGFiZWwgPSByb3duYW1lcyhzdGVwMyksIHZqdXN0ID0gMS44LCBzaXplID0gNCwgY29sb3IgPSAnYmxhY2snKSArCiAgY29vcmRfZml4ZWQoKQpgYGAKCiMjIyMgKipTdGVwIDQuIFJlY2FsY3VsYXRpb24gY2VudGVyJ3MgY29vcmRpbmF0ZXMgb2YgbmV3IGNsdXN0ZXJzKioKYGBge3J9CnN0ZXAzICU+JSAKICBzZWxlY3QoLW51bV9yYW5nZSgiIiwgMTpuX2NsdXN0KSkgJT4lIAogIGdyb3VwX2J5KENsdXN0ZXIpICU+JSAKICBtdXRhdGUoQ3gyID0gbWVhbih4KSwKICAgICAgICAgQ3kyID0gbWVhbih5KSkgLT4gc3RlcDQKc3RlcDQKYGBgCkxldCdzIGxvb2sgYXQgbmV3IGNsdXN0ZXJzIChzbWFsbGVyIHNpemUgLSBhcmUgaW5pdGlhbCB2YWx1ZXMpICAKYGBge3J9CnN0ZXA0ICU+JSAKICBnZ3Bsb3QoYWVzKHgsIHksIGNvbG9yID0gQ2x1c3RlcikpICsKICAgIGdlb21fcG9pbnQoc2l6ZSA9IDUpICsKICAgIGdlb21fcG9pbnQoYWVzKHggPSBzdGVwNCRDeCwgeSA9IHN0ZXA0JEN5KSwgc2hhcGUgPSAzLCBzaXplID0gMykgKwogICAgZ2VvbV9wb2ludChhZXMoeCA9IHN0ZXA0JEN4MiwgeSA9IHN0ZXA0JEN5MiksIHNoYXBlID0gMywgc2l6ZSA9IDYpICsKICAgIGdlb21fdGV4dChhZXMobGFiZWwgPSByb3duYW1lcyhzdGVwNCkpLCB2anVzdCA9IDEuOCwgc2l6ZSA9IDQsIGNvbG9yID0gJ2JsYWNrJykgCmBgYAoKUHV0IG91ciBkYXRhc2V0IHdpdGggbmV3IGNsdXN0ZXJzIGludG8gImNhbm9uaWNhbCIgZm9ybSAgCmBgYHtyIHdhcm5pbmc9RkFMU0V9CnN0ZXA0ICU+JSAKICBzZWxlY3QoQ2x1c3RlciwgQ3gyLCBDeTIpICU+JSAKICBncm91cF9ieShDbHVzdGVyKSAlPiUgCiAgdW5pcXVlKC4pICU+JSAKICBhcnJhbmdlKENsdXN0ZXIpIC0+IG5ld19pdGVyYXRpb24KbmV3X2l0ZXJhdGlvbgpgYGAKTmV4dCAtIHJlcGVhdCBpdGVyYXRpb24gZnJvbSBzdGVwIDEgYW5kIHNvIG9uLCBidXQgZm9yIHRoaXMgcHVycG9zZSBpdCBuZWVkcyB0byBoYXZlIHNvbWV0aGluZyBsaWtlIHJlY3Vyc2l2ZSBmdW5jdGlvbiBvciBldmVuIHR3byEuIEkndmUgY3JlYXRlZCBrX21lYW5zKCkgZnVuY3Rpb24gdGhhdCBleGVjdXRlIGstbWVhbnMgYWxnb3JpdGhtIGFuZCB3cmFwcGVyX2ttZWFucygpIGZ1bmN0aW9uIGZvciBjYWxsaW5nIGtfbWVhbnMoKSBpbiB0aGUgYmVzdCB3YXkuIFRoZXNlIGZ1bmN0aW9uIGFuZCBjb3VwbGUgb3RoZXJzIGFyZSBjb250YWluZWQgaW4gaGVscGVyX2Z1bmN0aW9ucy5SIGZpbGUuICAgCkl0IG11c3QgYmUgbm90ZWQsIGF0IHRoaXMgc3RhZ2UgSSd2ZSBiZWNhbWUgYXdhcmUgb2YgdGhlIHByaW5jaXBsZSBvZiBrLW1lYW5zLCBidXQgSSB3YW50ZWQgdG8gc2VlIHRoZSByZXN1bHQgZm9yIHNldmVyYWwgaXRlcmF0aW9ucyBhbmQgY29tcGFyZSBvbmUgd2l0aCBhIHJlZ3VsYXIgYWxnb3JpdGhtIC0gdGhlIGttZWFucygpIGZ1bmN0aW9uLiAgClNvLCBsZXQncyBrZWVwIG9uIQoKIyMjICoqMi4gay1tZWFucyBmcm9tIGEgYmlyZCdzIGV5ZSB2aWV3KioKClNldCBudW1iZXIgb2YgaXRlcmF0aW9uIGFuZCBwYXNzIGl0IHdpdGggb2JzZXJ2YXRpb24gKGRmX3h5KSBhbmQgY2x1c3RlciAoZGZfY2wpIHRpYmJsZXMgdG8gd3JhcGVyX2ttZWFucygpLiBJdCByZXR1cm4gbGlzdCBvZiB0aWJibGVzIHdpdGggb2JzZXJ2YXRpb25zIGFuZCBjbHVzdGVycyBmcm9tIGtfbWVhbnMoKSBmdW5jdGlvbiBmcm9tIGFsbCBpdGVyYXRpb25zLgpgYGB7ciB3YXJuaW5nPUZBTFNFfQpuX2l0ZXIgPC0gNSAjIG51Ym1lciBvZiBrLW1lYW5zIGl0ZXJhdGlvbnMKZGZfa21lYW5zIDwtIHdyYXBlcl9rbWVhbnMoZGZfeHksIGRmX2NsLCBuX2l0ZXIpCmRmX2ttZWFuc1tbbl9pdGVyXV0KYGBgCkFsc28gd2UgY2FuIHZpZXcgbWFpbiByZXN1bHRzIG9mIGtfbWVhbnMgYWxnb3JpdGhtIGZvciBjaG9zZW4gaXRlcmF0aW9uCmBgYHtyIHdhcm5pbmc9RkFMU0UsIHBhZ2VkLnByaW50PUZBTFNFfQpteV9rbWVhbnMgPC0gZ2V0X2ttZWFucyhuX2l0ZXIsIGRmX2ttZWFucykKbXlfa21lYW5zCmBgYAogLi4uIGFuZCBjb21wYXJlIGl0IHRoaXMgcmVzdWx0IG9mIGJ1aWxkLWluIGstbWVhbnMoKSBmdW5jdGlvbgpgYGB7ciBlY2hvPUZBTFNFfQprbWVhbnMoZGZfeHlbLCAtMV0sIGNlbnRlcnMgPSAzLCBpdGVyLm1heCA9IG5faXRlcikKYGBgCkFuZCBob3cgYWJvdXQgdmlzdWFsaXphdGlvbj8gQW5kIGl0IGlzIGRlc2lyYWJsZSB3aXRoIGFuaW1hdGlvbiEgICAKWWVzIG9mIGNvdXJzZSEgQnV0IGZpcnN0IG1ha2UgImxvbmciIGRhdGFmcmFtZSBmcm9tIHRoZSBkZl9rbWVhbnMgbGlzdDoKYGBge3J9CmRmX2xvbmcgPC0gbWFwX2RmcihkZl9rbWVhbnMsIH4gLikKYGBgClRhIGRhIQpgYGB7ciB3YXJuaW5nPUZBTFNFLCBmaWcuc2hvdyA9ICdhbmltYXRlJ30KdGhlbWVfYW5pIDwtIHRoZW1lKAogICAgICAgICAgcGxvdC50aXRsZSA9IGVsZW1lbnRfdGV4dChzaXplID0gMTQsIGZhY2UgPSAiYm9sZCIsIGhqdXN0ID0gMC41KSwKICAgICAgICAgIGxlZ2VuZC50aXRsZSA9IGVsZW1lbnRfdGV4dChmYWNlID0gImJvbGQiLCBjb2xvdXIgPSAiYmxhY2siLCBzaXplID0gMTIpLAogICAgICAgICAgbGVnZW5kLnRleHQgPSBlbGVtZW50X3RleHQoc2l6ZSA9IDExLCBjb2xvciA9ICJibGFjayIpLAogICAgICAgICAgYXhpcy50ZXh0ID0gZWxlbWVudF90ZXh0KHNpemUgPSAxMSwgY29sb3VyID0gImJsYWNrIiksCiAgICAgICAgICBheGlzLnRpdGxlID0gZWxlbWVudF90ZXh0KHNpemUgPSAxMiwgZmFjZSA9ICJib2xkIikpCgpnZ3Bsb3QoZGZfbG9uZywgYWVzKHgsIHksIGNvbG9yID0gQ2x1c3RlciwgZnJhbWUgPSBJdGVyYXRpb24pKSArCiAgZ2VvbV9wb2ludChhZXMoZ3JvdXAgPSBDbHVzdGVyKSwgc2l6ZSA9IDUpICsKICBnZW9tX3BvaW50KGFlcyh4ID0gQ3gsIHkgPSBDeSksIHNoYXBlID0gMywgc2l6ZSA9IDUsIGNvbG9yID0gImJsYWNrIiwgc3Ryb2tlID0gMSkgKwogIGdlb21fcGF0aChhZXMoeCA9IEN4LCB5ID0gQ3ksIGN1bXVsYXRpdmUgPSBUUlVFLCBncm91cCA9IENsdXN0ZXIpLAogICAgICAgICAgICBjb2xvciA9ICJibGFjayIpICsKICBnZW9tX3RleHQoYWVzKHgsIHksIGxhYmVsID0gcm93aWQpLCB2anVzdCA9IDEuOCwgc2l6ZSA9IDQsIGNvbG9yID0gJ2JsYWNrJykgKwogIGdlb21fcG9seWdvbihkYXRhID0gZGZfcG9seWdvbihkZl9rbWVhbnMpLAogICAgICAgICAgICAgICBhZXMoeCwgeSwgZ3JvdXAgPSBDbHVzdGVyLCBmaWxsID0gQ2x1c3RlciksIGFscGhhID0gLjUpICsKICBjb29yZF9lcXVhbCgpICsKICBsYWJzKHRpdGxlID0gIkhvdyBrLW1lYW5zIHdvcms6IikgKwogIHRoZW1lX2FuaSAtPiBnZwpnZ2FuaW1hdGUoZ2cpCmBgYApgYGB7ciBmaWcuc2hvdyA9ICdhbmltYXRlJ30KZ2dwbG90KGRmX2xvbmcsIGFlcyh4ID0gcm93aWQsIHkgPSBFdV9kaXN0LCBmcmFtZSA9IEl0ZXJhdGlvbiwgY3VtdWxhdGl2ZSA9IFRSVUUpKSArCiAgZ2VvbV9wb2ludChhZXMoY29sb3IgPSBJdGVyYXRpb24pLCBzaXplID0gNSkgKwogIGdlb21fbGluZShhZXMoY29sb3IgPSBJdGVyYXRpb24pLCBzaXplID0gMSkgKwogIHNjYWxlX3hfY29udGludW91cyhicmVha3MgPSAxOmRpbShkZl94eSlbMV0pICsKICBsYWJzKHggPSAiT2JzZXJ2YXRpb25zIiwgeSA9ICJFdWNsaWRlYW4gZGlzdGFuY2Ugc3F1YXJlZCIsCiAgICAgICB0aXRsZSA9ICJTcXVhcmVkIEV1Y2xpZGVhbiBkaXN0YW5jZSBmb3IgZWFjaCBvYnNlcnZhdGlvbiBvbjoiKSArCiAgdGhlbWVfYW5pIC0+IGdnX2Rpc3QKCmdnYW5pbWF0ZShnZ19kaXN0KQpgYGAKYGBge3IgZmlnLnNob3cgPSAnYW5pbWF0ZSd9CmRmX2xvbmcgJT4lIAogIGdyb3VwX2J5KEl0ZXJhdGlvbikgJT4lIAogIHN1bW1hcmlzZSh0b3Rfc3VtID0gc3VtKEV1X2Rpc3QpKSAlPiUgCiAgdW5ncm91cCgpICU+JSAKICBnZ3Bsb3QoYWVzKHggPSBJdGVyYXRpb24sIHkgPSB0b3Rfc3VtLCBmcmFtZSA9IEl0ZXJhdGlvbiwgY3VtdWxhdGl2ZSA9IFRSVUUpKSArCiAgICBnZW9tX2JhcihzdGF0ID0gJ2lkZW50aXR5JywgYWVzKGZpbGwgPSBJdGVyYXRpb24pKSArCiAgICBsYWJzKHggPSBOVUxMLCB0aXRsZSA9ICJUb3RhbCB3aXRoaW4tY2x1c3RlciBzdW0gb2Ygc3F1YXJlIG9uOiIpICsKICAgIHRoZW1lX2FuaSAtPiBnZ190b3Rfc3VtCgpnZ2FuaW1hdGUoZ2dfdG90X3N1bSkKYGBgCgojIyMjICoqQ29tbWVudCMxKiogIAoKQnkgdGhlIGVuZCBJIHdhbnQgdG8gbWVudGlvbiBzb21lIHRoaW5ncy4gICAKVGhpcyBpcyBhIGRlbW9uc3RyYXRpb24gZXhhbXBsZSwgc28gdGhlIGstbWVhbnMgYWxnb3JpdGhtIGltcGxlbWVudGVkIGhlcmUgbWF5IGJlIGZhciBmcm9tIGVmZmVjdGl2ZS4gQmVjYXVzZSBvZiB0aGlzLCBJIGRvIG5vdCByZWNvbW1lbmQgdXNpbmcgaXQgd2l0aCBsYXJnZSBkYXRhc2V0cyBhbmQgYSBsYXJnZSBudW1iZXIgb2YgaXRlcmF0aW9ucyAoPiAxMCkuIE9mIGNvdXJzZSwgeW91IGNhbiB1c2UgeW91ciBvd24gZGF0YXNldCBpZiB5b3UgYXJlIGludGVyZXN0ZWQuICAgCgojIyMjICoqQ29tbWVudCMyKiogICAKClNvLCB3aGF0IGhhdmUgSSBhY2hpZXZlZCBieSByZWludmVudGluZyB0aGUgd2hlZWw6ICAgCjEuICBVbmRlcnN0b29kIHRoZSBbay1tZWFuc10oaHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvSy1tZWFuc19jbHVzdGVyaW5nKSBhbGdvcml0aG0sIG9mIGNvdXJzZSkgICAKMi4gIEltcHJvdmVkIG15IHNraWxscyBpbiBbdHlkaXZlcnNlXShodHRwczovL3d3dy50aWR5dmVyc2Uub3JnLykgICAKMy4gIExlYXJuZWQgdG8gd3JpdGUgcmVjdXJzaXZlIGZ1bmN0aW9ucyB0aGF0IHJldHVybiB0d28gdmFsdWVzLCBvbmUgb2Ygd2hpY2ggaXMgbmVjZXNzYXJ5IGZvciByZWN1cnNpb24gd29yaywgYW5kIHRoZSBvdGhlciAtIGZvciBleHRlcm5hbCBuZWVkcyAodGhpcyBpcyB3aGF0IHdyYXBlcl9rbWVhbnMoKSByZXR1cm4pLiBUaGlzIHdhcyBhY2hpZXZlZCBieSB0aGUgYWRkaXRpb25hbCB3cmFwcGVyIGZ1bmN0aW9uIGFuZCB0aGUgY29uY2VwdCBvZiBlbnZpcm9ubWVudHMuICAKNC4gIExlYXJuZWQgdG8gbWFrZSBhbmltYXRlZCBjaGFydHMgd2l0aCBbZ2dhbmltYXRlXShodHRwczovL2dpdGh1Yi5jb20vdGhvbWFzcDg1L2dnYW5pbWF0ZSkuICAKNS4gIEV4YW1pbmVkIFtjaHVsbF0oaHR0cHM6Ly93d3cucmRvY3VtZW50YXRpb24ub3JnL3BhY2thZ2VzL2dyRGV2aWNlcy92ZXJzaW9ucy8zLjUuMS90b3BpY3MvY2h1bGwpIGZ1bmN0aW9uIGZyb20gYmFzZSBSICAgCjYuICBMZWFybmVkIGhvdyB0byBbaGlkZSBjb21tZW50c10oaHR0cHM6Ly9zdGFja292ZXJmbG93LmNvbS9xdWVzdGlvbnMvNDgyMzQ2OC9jb21tZW50cy1pbi1tYXJrZG93bikgaW4gW1JtYXJrZG93bl0oaHR0cHM6Ly9ybWFya2Rvd24ucnN0dWRpby5jb20vaW5kZXguaHRtbCkgIAogICAgCkknZCBiZSBhcHByaWNpYXRlIGZvciB5b3VyIGNyaXRpY2FsIHJlbWFya3MgYW5kIGNvbW1lbnRzLiBXZWxsLCBvZiBjb3Vyc2UgSSB3aWxsIGJlIHZlcnktdmVyeS12ZXJ5IGhhcHB5IGlmIHRoaXMgbWF0ZXJpYWwgaGVscHMgeW91IHVuZGVyc3RhbmQgay1tZWFucyBhcyBpdCBoZWxwZWQgbWUhIFlvdSBjYW4gZmluZCBjb2RlIFtoZXJlXShodHRwczovL2dpdGh1Yi5jb20vRG15dHJvUnliYWxrby9rLW1lYW5zX2V4cGxhbmF0aW9uKQo=