This analysis is based on the Bank Marketing dataset from the UCI Machine Learning Repository, which contains information from a Portuguese banking institution’s direct marketing campaigns (phone calls) conducted between May 2008 and November 2010. The primary objective of these campaigns was to encourage clients to subscribe to a term deposit product.
The data was collected through direct telephone marketing campaigns, with each record representing a client contact. Multiple contacts were often necessary to determine if the client would subscribe to the term deposit (the target variable). The dataset captures both client attributes and campaign interaction details.
The dataset consists of 45,211 client records with 16 input variables and one target variable (17 columns in total), which can be categorized as follows:
Client Demographics and Financial Information:
age: Client’s age in years (numeric)
job: Type of employment (categorical: ‘admin’, ‘blue-collar’, ‘entrepreneur’, etc.)
marital: Marital status (categorical: ‘married’, ‘divorced’, ‘single’)
education: Education level (categorical: ‘primary’, ‘secondary’, ‘tertiary’, ‘unknown’)
default: Has credit in default? (binary: ‘yes’, ‘no’)
balance: Average yearly balance in euros (numeric)
housing: Has housing loan? (binary: ‘yes’, ‘no’)
loan: Has personal loan? (binary: ‘yes’, ‘no’)
Campaign-Related Information:
contact: Contact communication type (categorical: ‘cellular’, ‘telephone’, ‘unknown’)
day: Last contact day of the month (numeric)
month: Last contact month of year (categorical: ‘jan’ to ‘dec’)
duration: Last contact duration in seconds (numeric)
campaign: Number of contacts performed during this campaign for this client (numeric)
pdays: Number of days since client was last contacted from a previous campaign (numeric; -1 means client was not previously contacted)
previous: Number of contacts performed before this campaign for this client (numeric)
poutcome: Outcome of the previous marketing campaign (categorical: ‘failure’, ‘other’, ‘success’, ‘unknown’)
Target Variable:
y: Has the client subscribed to a term deposit? (binary: ‘yes’, ‘no’)
The primary business goal is to develop a targeted marketing strategy that maximizes the subscription rate for term deposit products while optimizing resource allocation. By segmenting customers based on their characteristics and behaviors, the bank aims to:
Through this customer segmentation analysis, we seek to transform the bank’s marketing approach from a mass-marketing strategy to a more personalized, data-driven approach that aligns with modern customer expectations and business efficiency requirements.
This analysis segments bank customers based on demographic, financial, and behavioral data to optimize marketing strategies for term deposit products. Using k-means clustering, cross-checked against a Gaussian mixture model, we identified six distinct customer segments with varying propensities to subscribe to the bank’s offerings.
Key findings indicate that:
These insights enable targeted marketing strategies with significantly higher ROI potential compared to mass marketing approaches.
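# Load the packages used throughout the analysis
library(dplyr)      # data manipulation, %>%, sym()
library(tidyr)      # pivot_longer()
library(forcats)    # fct_infreq()
library(ggplot2)    # plotting
library(gridExtra)  # grid.arrange()
library(viridis)    # viridis() colour palettes
library(corrplot)   # corrplot()
library(factoextra) # fviz_pca_var(), fviz_pca_ind(), fviz_cluster()
library(cluster)    # silhouette()
library(mclust)     # Mclust()
library(knitr)      # kable()
# Set seed for reproducibility
set.seed(123)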
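# Load and sample data for analysis
# Assumes bank.csv holds the full 45,211-row dataset (UCI distributes it as bank-full.csv)
bank_data_full <- read.csv("bank.csv", sep = ";", stringsAsFactors = TRUE)
# Guard against files with fewer than 10,000 rows, where sample() would otherwise fail
sample_size <- min(10000, nrow(bank_data_full))
sample_indices <- sample(seq_len(nrow(bank_data_full)), sample_size)
bank_data <- bank_data_full[sample_indices, ]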
# Data preprocessing
# Create meaningful categorical variables
bank_data$pdays_status <- ifelse(bank_data$pdays == -1, "Not Contacted", "Contacted")
bank_data$pdays_status <- as.factor(bank_data$pdays_status)
bank_data$pdays_clean <- ifelse(bank_data$pdays == -1, NA, bank_data$pdays)
# Create age groups for easier interpretation
bank_data$age_group <- cut(bank_data$age,
breaks = c(0, 30, 40, 50, 60, 100),
labels = c("Under 30", "30-40", "40-50", "50-60", "60+"),
right = FALSE)
# Create balance groups
bank_data$balance_group <- cut(bank_data$balance,
breaks = c(-Inf, 0, 1000, 5000, 10000, Inf),
labels = c("Negative", "0-1K", "1K-5K", "5K-10K", "10K+"),
right = FALSE)
# Check for missing values
missing_values <- colSums(is.na(bank_data))
# Function to create comparison plots
create_barplot <- function(data, var_name, title) {
  # aes_string() is deprecated; .data[[var_name]] looks the column up by name
  ggplot(data, aes(x = .data[[var_name]], fill = y)) +
    geom_bar(position = "fill") +
    scale_fill_viridis_d(option = "D", begin = 0.3, end = 0.7) +
    labs(title = title, x = var_name, y = "Proportion", fill = "Subscription") +
    theme_minimal(base_size = 12) +
    theme(
      axis.text.x = element_text(angle = 45, hjust = 1),
      plot.title = element_text(hjust = 0.5, face = "bold"),
      legend.position = "right"
    )
}
# Create key visualizations
p1 <- ggplot(bank_data, aes(x = age)) +
geom_histogram(bins = 30, fill = "#3498db", color = "white", alpha = 0.8) +
labs(title = "Age Distribution", x = "Age", y = "Count") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
p2 <- ggplot(bank_data %>% filter(balance < quantile(balance, 0.99)),
aes(x = balance)) +
geom_histogram(bins = 30, fill = "#2ecc71", color = "white", alpha = 0.8) +
labs(title = "Balance Distribution (99th percentile)", x = "Balance (€)", y = "Count") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
p3 <- ggplot(bank_data, aes(x = fct_infreq(job))) +
geom_bar(fill = "#9b59b6", color = "white", alpha = 0.8) +
labs(title = "Job Distribution", x = "Job", y = "Count") +
theme_minimal(base_size = 12) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5, face = "bold")
)
p4 <- ggplot(bank_data, aes(x = fct_infreq(education))) +
geom_bar(fill = "#e74c3c", color = "white", alpha = 0.8) +
labs(title = "Education Distribution", x = "Education", y = "Count") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Arrange plots in a grid
grid.arrange(p1, p2, p3, p4, ncol = 2)
The customer base analysis reveals:
The bank’s customer portfolio primarily consists of working-age individuals with modest account balances and mid-level education.
# Create comparative visualizations for target variable
age_plot <- create_barplot(bank_data, "age_group", "Age Group vs Subscription")
balance_plot <- create_barplot(bank_data, "balance_group", "Balance Group vs Subscription")
job_plot <- create_barplot(bank_data, "job", "Job vs Subscription")
education_plot <- create_barplot(bank_data, "education", "Education vs Subscription")
# Arrange comparative plots
grid.arrange(age_plot, balance_plot, job_plot, education_plot, ncol = 2)
The subscription patterns reveal clear demographic and financial trends:
These patterns suggest differentiated marketing approaches based on customer life stage and financial capacity.
# Select numerical variables for clustering
numerical_vars <- c("age", "balance", "duration", "campaign", "previous")
cluster_data <- bank_data[, numerical_vars]
# Check correlations between numerical variables
correlation <- cor(cluster_data)
corrplot(correlation,
method = "circle",
type = "upper",
order = "hclust",
addCoef.col = "black",
tl.col = "black",
tl.srt = 45,
diag = FALSE,
title = "Correlation Matrix of Clustering Variables",
mar = c(0,0,2,0))
The correlation analysis shows minimal multicollinearity between selected clustering variables:
This low interdependence strengthens the clustering approach by avoiding redundant features.
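One way to make this check explicit is to list any variable pairs whose absolute correlation exceeds a cutoff. The sketch below runs on simulated data rather than `cluster_data`; the helper name `flag_correlated_pairs` and the 0.7 threshold are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical helper: list variable pairs whose |correlation| exceeds a cutoff
flag_correlated_pairs <- function(data, threshold = 0.7) {
  corr <- cor(data)
  # Look only at the upper triangle so each pair is reported once
  idx <- which(upper.tri(corr) & abs(corr) > threshold, arr.ind = TRUE)
  data.frame(
    var1 = rownames(corr)[idx[, "row"]],
    var2 = colnames(corr)[idx[, "col"]],
    correlation = corr[idx],
    stringsAsFactors = FALSE
  )
}

# Simulated stand-in for the clustering variables (not the bank data)
set.seed(123)
n <- 500
demo <- data.frame(
  age = rnorm(n, mean = 41, sd = 10),
  balance = rnorm(n, mean = 1300, sd = 3000),
  duration = rexp(n, rate = 1 / 260)
)
demo$duration_min <- demo$duration / 60  # deliberately redundant column

flagged <- flag_correlated_pairs(demo)
print(flagged)
```

An empty result for the real clustering variables would support the claim that none of them is redundant; any flagged pair would be a candidate for dropping one of its members before clustering.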
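# Standardize the clustering variables so that no single scale (e.g. balance in euros) dominates the distances
cluster_data_scaled <- scale(cluster_data)
# Principal Component Analysis (the input is already standardized, so center/scale. leave it unchanged)
pca_result <- prcomp(cluster_data_scaled, center = TRUE, scale. = TRUE)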
# Visualize PCA variable contributions
fviz_pca_var(pca_result,
col.var = "contrib",
gradient.cols = viridis(10, direction = -1),
repel = TRUE,
title = "Variables - PCA")
# Make sure subscription status is properly encoded
bank_data$y <- as.factor(bank_data$y)
# Visualize individuals on PCA plot
fviz_pca_ind(pca_result,
geom.ind = "point",
col.ind = bank_data$y,
palette = c("#FC8D62", "#66C2A5"),
addEllipses = TRUE,
alpha = 0.5,
legend.title = "Subscription",
title = "Individuals - PCA")
The PCA results indicate:
The even distribution of variance across components confirms that customer segmentation requires a multidimensional approach.
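The share of variance carried by each component can be read directly from the `sdev` element of a `prcomp` fit. The following is a minimal sketch on simulated, roughly uncorrelated data (an assumed stand-in for the scaled clustering variables, not the bank data itself):

```r
set.seed(123)
# Five roughly independent variables, mimicking weakly correlated features
demo <- as.data.frame(matrix(rnorm(1000 * 5), ncol = 5))
names(demo) <- c("age", "balance", "duration", "campaign", "previous")

pca_demo <- prcomp(demo, center = TRUE, scale. = TRUE)

# Variance explained per component = squared sdev divided by total variance
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
cum_explained <- cumsum(var_explained)

variance_table <- data.frame(
  PC = seq_along(var_explained),
  proportion = round(var_explained, 3),
  cumulative = round(cum_explained, 3)
)
print(variance_table)
```

When the inputs are nearly uncorrelated, each of the five components explains close to a fifth of the variance, so no two-dimensional projection captures most of the structure. Applying the same two lines to `pca_result` would give the shares for the bank data.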
# Calculate and visualize the within sum of squares for different k values
set.seed(123)
wss <- sapply(1:10, function(k) {
kmeans(cluster_data_scaled, centers = k, nstart = 25)$tot.withinss
})
# Manual plotting of elbow method
elbow_df <- data.frame(k = 1:10, wss = wss)
elbow_plot <- ggplot(elbow_df, aes(x = k, y = wss)) +
geom_line(linewidth = 1, color = "#3498db") +
geom_point(size = 3, color = "#3498db") +
scale_x_continuous(breaks = 1:10) +
labs(title = "Elbow Method for Optimal k",
x = "Number of clusters (k)",
y = "Total Within Sum of Squares") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Manual silhouette calculation
k_values <- 2:6 # Testing k from 2 to 6
silhouette_avg <- numeric(length(k_values))
# Compute the distance matrix once and reuse it for every k
dist_matrix <- dist(cluster_data_scaled)
for (i in seq_along(k_values)) {
  km <- kmeans(cluster_data_scaled, centers = k_values[i], nstart = 25)
  sil <- silhouette(km$cluster, dist_matrix)
  silhouette_avg[i] <- mean(sil[, 3])
}
# Manual plotting of silhouette method
silhouette_df <- data.frame(k = k_values, silhouette_avg = silhouette_avg)
silhouette_plot <- ggplot(silhouette_df, aes(x = k, y = silhouette_avg)) +
geom_line(linewidth = 1, color = "#e74c3c") +
geom_point(size = 3, color = "#e74c3c") +
scale_x_continuous(breaks = k_values) +
labs(title = "Silhouette Method for Optimal k",
x = "Number of clusters (k)",
y = "Average Silhouette Width") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Display the plots together
grid.arrange(elbow_plot, silhouette_plot, ncol = 2)
Two complementary methods were used to determine the optimal number of clusters:
Based on these results, a 6-cluster solution provides the best balance between model complexity and cluster separation. This is consistent with both the statistical evidence and business interpretability needs.
# Define function for cluster analysis
analyze_clusters <- function(data, cluster_var) {
# Convert cluster_var to symbol
cluster_var_sym <- sym(cluster_var)
# Numerical variables summary by cluster
num_summary <- data %>%
group_by(!!cluster_var_sym) %>%
summarise(
Count = n(),
Percentage = n() / nrow(data) * 100,
Avg_Age = mean(age),
Avg_Balance = mean(balance),
Avg_Duration = mean(duration),
Avg_Campaign = mean(campaign),
Avg_Previous = mean(previous),
Subscription_Rate = mean(y == "yes") * 100
)
# Categorical variables summary
cat_summary <- list()
# Job distribution by cluster
cat_summary$job <- data %>%
count(!!cluster_var_sym, job) %>%
group_by(!!cluster_var_sym) %>%
mutate(percentage = n / sum(n) * 100) %>%
arrange(!!cluster_var_sym, desc(percentage))
# Education distribution by cluster
cat_summary$education <- data %>%
count(!!cluster_var_sym, education) %>%
group_by(!!cluster_var_sym) %>%
mutate(percentage = n / sum(n) * 100) %>%
arrange(!!cluster_var_sym, desc(percentage))
# Marital status by cluster
cat_summary$marital <- data %>%
count(!!cluster_var_sym, marital) %>%
group_by(!!cluster_var_sym) %>%
mutate(percentage = n / sum(n) * 100) %>%
arrange(!!cluster_var_sym, desc(percentage))
return(list(numerical = num_summary, categorical = cat_summary))
}
# Set optimal number of clusters
optimal_k <- 6
# Perform K-means clustering
set.seed(123)
kmeans_result <- kmeans(cluster_data_scaled, centers = optimal_k, nstart = 25)
# Add cluster assignment to the original data
bank_data$cluster <- as.factor(kmeans_result$cluster)
# Visualize clusters in PCA space
fviz_cluster(list(data = cluster_data_scaled, cluster = kmeans_result$cluster),
palette = viridis(optimal_k, option = "D"),
ellipse.type = "convex",
repel = FALSE,
label = FALSE,
shape = 19,
pointsize = 1,
show.clust.cent = TRUE,
geom = "point",
ggtheme = theme_minimal(base_size = 12),
main = "Customer Segments - 6 Clusters",
xlab = paste0("Principal Component 1 (", round(pca_result$sdev[1]^2/sum(pca_result$sdev^2)*100, 1), "%)"),
ylab = paste0("Principal Component 2 (", round(pca_result$sdev[2]^2/sum(pca_result$sdev^2)*100, 1), "%)"))
The cluster visualization in PCA space reveals:
This visualization confirms the presence of distinct customer groups with different behavioral patterns.
# Perform GMM clustering with focused range
gmm_model <- Mclust(cluster_data_scaled, G = 4:8)
# Plot BIC for model selection
plot(gmm_model, what = "BIC",
xlab = "Number of clusters",
ylab = "BIC",
main = "BIC by Number of Clusters")
# Add GMM cluster assignment to data
bank_data$gmm_cluster <- as.factor(gmm_model$classification)
# Compare K-means with GMM results
comparison_table <- table(bank_data$cluster, bank_data$gmm_cluster)
# Visualize agreement between clustering methods
agreement_df <- as.data.frame(comparison_table)
colnames(agreement_df) <- c("KMeans", "GMM", "Count")
ggplot(agreement_df, aes(x = KMeans, y = GMM, fill = Count)) +
geom_tile() +
scale_fill_viridis_c(option = "D") +
geom_text(aes(label = Count), color = "white") +
labs(title = "Agreement Between K-means and GMM Clustering",
x = "K-means Cluster",
y = "GMM Cluster") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
The Gaussian Mixture Model analysis provides a complementary perspective:
For practical marketing purposes, the K-means solution with 6 well-defined clusters is more suitable due to its balanced cluster sizes and clear interpretability.
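The cross-tabulation of the two clusterings can be condensed into a single agreement score. The sketch below computes the adjusted Rand index from a contingency table in base R; the helper name `adjusted_rand` and the toy label vectors are assumptions for illustration. Values near 1 mean the two partitions largely coincide, and values near 0 mean agreement is no better than chance.

```r
# Hypothetical helper: adjusted Rand index between two label vectors
adjusted_rand <- function(labels_a, labels_b) {
  tab <- table(labels_a, labels_b)
  n <- sum(tab)
  count_pairs <- function(x) sum(choose(x, 2))  # unordered pairs within groups
  sum_cells <- count_pairs(tab)
  sum_rows <- count_pairs(rowSums(tab))
  sum_cols <- count_pairs(colSums(tab))
  expected <- sum_rows * sum_cols / choose(n, 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

# Toy labels: 'relabelled' is the same partition as 'base' with permuted ids,
# 'shifted' moves one observation per group into a different group
base       <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
relabelled <- c(2, 2, 2, 3, 3, 3, 1, 1, 1)
shifted    <- c(1, 1, 2, 2, 2, 3, 3, 3, 1)

ari_same <- adjusted_rand(base, relabelled)
ari_shifted <- adjusted_rand(base, shifted)
print(c(same = ari_same, shifted = ari_shifted))
```

Because the index ignores label ids, it is unaffected by k-means and GMM numbering their clusters differently. With the objects from this report the call would be `adjusted_rand(bank_data$cluster, bank_data$gmm_cluster)`; the mclust package also provides `adjustedRandIndex()` for the same purpose.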
# Analyze K-means clusters
kmeans_analysis <- analyze_clusters(bank_data, "cluster")
# Display numerical summary of clusters
kable(kmeans_analysis$numerical,
caption = "Numerical Characteristics by Customer Segment",
digits = 1,
format.args = list(big.mark = ","))
| cluster | Count | Percentage | Avg_Age | Avg_Balance | Avg_Duration | Avg_Campaign | Avg_Previous | Subscription_Rate |
|---|---|---|---|---|---|---|---|---|
| 1 | 5,041 | 50.4 | 33.8 | 789.9 | 200.8 | 2.2 | 0.3 | 7.6 |
| 2 | 323 | 3.2 | 39.5 | 853.5 | 138.4 | 16.1 | 0.1 | 2.2 |
| 3 | 316 | 3.2 | 41.5 | 1,264.6 | 237.9 | 2.7 | 8.6 | 28.2 |
| 4 | 795 | 8.0 | 39.5 | 1,111.0 | 912.8 | 2.3 | 0.4 | 48.6 |
| 5 | 3,183 | 31.8 | 52.3 | 1,082.4 | 196.0 | 2.5 | 0.3 | 9.3 |
| 6 | 342 | 3.4 | 44.0 | 12,198.9 | 245.4 | 2.4 | 0.3 | 15.8 |
# Visualize numerical features by cluster
kmeans_num_long <- kmeans_analysis$numerical %>%
select(cluster, Avg_Age, Avg_Balance, Avg_Duration, Avg_Campaign, Avg_Previous) %>%
pivot_longer(cols = starts_with("Avg_"),
names_to = "Feature",
values_to = "Value")
ggplot(kmeans_num_long, aes(x = Feature, y = Value, fill = cluster)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_viridis_d(option = "D") +
facet_wrap(~Feature, scales = "free_y") +
labs(title = "Numerical Features by Cluster",
y = "Average Value",
fill = "Cluster") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5, face = "bold"),
strip.text = element_text(face = "bold")
)
# Subscription rate by cluster
ggplot(kmeans_analysis$numerical, aes(x = cluster, y = Subscription_Rate, fill = cluster)) +
geom_bar(stat = "identity") +
scale_fill_viridis_d(option = "D") +
geom_text(aes(label = sprintf("%.1f%%", Subscription_Rate)),
position = position_stack(vjust = 0.5),
color = "white",
fontface = "bold") +
labs(title = "Subscription Rate by Cluster",
y = "Subscription Rate (%)",
fill = "Cluster") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
Based on the numerical characteristics, we can define the following customer segments:
| Cluster | Profile_Name | Description | Marketing_Strategy |
|---|---|---|---|
| Cluster 1 | Mass Market Customers | Youngest customer group (avg. 33.8 years) with the lowest balances (790€), forming the largest segment (50.4%). Average engagement with moderate call duration (201s). | Low-cost, broad digital campaigns with basic product offers. Focus on increasing engagement and identifying high-potential customers within this group. |
| Cluster 2 | Marketing Resistant Group | Customers averaging 39.5 years who were contacted very frequently (16.1 contacts per client) with brief interactions (138s). One of the smallest segments (3.2%), with the lowest subscription rate (2.2%). | Reduce marketing contact frequency. Test alternative channels and messaging to find a more effective approach, or consider lower priority. |
| Cluster 3 | Engaged Mid-Value Clients | Middle-aged customers (41.5 years) with above-average balances (1265€) and longer call durations (238s). Small but highly responsive segment (3.2%). | Personalized relationship-building approach with dedicated account managers. Focus on financial advisory and service upgrades. |
| Cluster 4 | High Conversion Segment | Customers averaging 39.5 years with moderate balances (1111€) and extremely long call durations (913s). Most responsive segment to marketing efforts (48.6% subscription rate), covering 8% of customers. | Priority segment for intensive marketing efforts. Extended conversations focused on specific product benefits and detailed explanations. |
| Cluster 5 | Senior Conservative Customers | Oldest customer group (52.3 years) with average balances (1082€). Second-largest segment (31.8%) with standard engagement patterns. | Life-stage appropriate offerings focusing on security and stability. Conservative investment products with emphasis on long-term benefits. |
| Cluster 6 | Affluent Clients | Middle-aged affluent customers (44 years) with extremely high balances (12,199€). Small segment (3.4%) with longer-than-average call durations. | Premium wealth management services, exclusive investment opportunities, and preferential rates. Focus on retention and share-of-wallet growth. |
The subscription rate analysis reveals dramatic performance differences:
These differences highlight the potential efficiency gains from targeted marketing versus mass campaigns.
# Age distribution by cluster
p1 <- ggplot(bank_data, aes(x = age, fill = cluster)) +
geom_density(alpha = 0.7) +
scale_fill_viridis_d(option = "D") +
labs(title = "Age Distribution by Cluster", x = "Age", y = "Density") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Balance distribution by cluster (with outlier treatment)
p2 <- ggplot(bank_data %>% filter(balance < quantile(balance, 0.99)),
aes(x = balance, fill = cluster)) +
geom_density(alpha = 0.7) +
scale_fill_viridis_d(option = "D") +
labs(title = "Balance Distribution by Cluster", x = "Balance", y = "Density") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Duration distribution by cluster
p3 <- ggplot(bank_data %>% filter(duration < quantile(duration, 0.99)),
aes(x = duration, fill = cluster)) +
geom_density(alpha = 0.7) +
scale_fill_viridis_d(option = "D") +
labs(title = "Call Duration Distribution by Cluster", x = "Duration (seconds)", y = "Density") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Campaign distribution by cluster
p4 <- ggplot(bank_data, aes(x = campaign, fill = cluster)) +
geom_histogram(position = "dodge", bins = 10, alpha = 0.7) +
scale_fill_viridis_d(option = "D") +
labs(title = "Number of Campaigns by Cluster", x = "Number of Contacts", y = "Count") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Arrange plots
grid.arrange(p1, p2, p3, p4, ncol = 2)
The feature distribution analysis reveals:
These distinctive patterns validate the clustering approach and provide clear targeting dimensions for marketing strategies.
# Job distribution within each cluster
job_cluster <- bank_data %>%
count(cluster, job) %>%
group_by(cluster) %>%
mutate(percent = n / sum(n) * 100) %>%
arrange(cluster, desc(percent)) %>%
group_by(cluster) %>%
top_n(5, percent)
ggplot(job_cluster, aes(x = reorder(job, percent), y = percent, fill = cluster)) +
geom_bar(stat = "identity") +
scale_fill_viridis_d(option = "D") +
facet_wrap(~cluster, scales = "free_y") +
coord_flip() +
labs(title = "Top 5 Jobs within Each Cluster",
x = "Job",
y = "Percentage (%)") +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
strip.text = element_text(face = "bold")
)
The occupational analysis shows both common patterns and segment-specific characteristics:
These occupational patterns provide additional targeting dimensions for customized marketing approaches.
# Calculate efficiency metrics
subscription_performance <- bank_data %>%
group_by(cluster) %>%
summarise(
Total = n(),
Total_Percentage = n() / nrow(bank_data) * 100,
Subscribed = sum(y == "yes"),
Subscription_Percentage = Subscribed / sum(bank_data$y == "yes") * 100,
Not_Subscribed = sum(y == "no"),
Subscription_Rate = Subscribed / Total * 100,
Efficiency_Index = (Subscription_Percentage / Total_Percentage)
)
# Create efficiency visualization
ggplot(subscription_performance, aes(x = Total_Percentage, y = Subscription_Percentage, color = cluster)) +
geom_point(aes(size = Subscription_Rate)) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray") +
scale_color_viridis_d(option = "D") +
scale_size_continuous(range = c(3, 15)) +
geom_text(aes(label = paste0("Cluster ", cluster)), hjust = -0.3, vjust = 1.5) +
labs(title = "Marketing Efficiency by Customer Segment",
x = "Percentage of Total Customers",
y = "Percentage of Total Subscriptions",
size = "Subscription Rate (%)") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Display the performance table
kable(subscription_performance %>%
select(cluster, Total_Percentage, Subscription_Percentage, Subscription_Rate, Efficiency_Index) %>%
arrange(desc(Efficiency_Index)),
caption = "Marketing Efficiency Metrics by Segment",
col.names = c("Cluster", "% of Customers", "% of Subscriptions", "Conversion Rate (%)", "Efficiency Index"),
digits = 1)
| Cluster | % of Customers | % of Subscriptions | Conversion Rate (%) | Efficiency Index |
|---|---|---|---|---|
| 4 | 8.0 | 31.8 | 48.6 | 4.0 |
| 3 | 3.2 | 7.3 | 28.2 | 2.3 |
| 6 | 3.4 | 4.5 | 15.8 | 1.3 |
| 5 | 31.8 | 24.3 | 9.3 | 0.8 |
| 1 | 50.4 | 31.5 | 7.6 | 0.6 |
| 2 | 3.2 | 0.6 | 2.2 | 0.2 |
The marketing efficiency analysis reveals dramatic differences in segment performance:
These metrics provide clear guidance for optimizing marketing spend across segments.
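The efficiency index is each segment's share of subscriptions divided by its share of customers, which is algebraically the same as the segment's conversion rate divided by the overall conversion rate. The sketch below re-derives it from the segment counts and rounded subscription rates reported in the tables above; the subscriber counts are back-calculated from those rounded rates, so they are approximations rather than exact figures.

```r
# Counts and subscription rates copied from the segment summary table
segments <- data.frame(
  cluster = 1:6,
  customers = c(5041, 323, 316, 795, 3183, 342),
  rate_pct = c(7.6, 2.2, 28.2, 48.6, 9.3, 15.8)
)

# Approximate subscribers per segment (rates are rounded to one decimal)
segments$subscribers <- segments$customers * segments$rate_pct / 100
overall_rate_pct <- sum(segments$subscribers) / sum(segments$customers) * 100

# Route 1: share of subscriptions divided by share of customers
segments$share_customers <- segments$customers / sum(segments$customers) * 100
segments$share_subs <- segments$subscribers / sum(segments$subscribers) * 100
segments$efficiency <- segments$share_subs / segments$share_customers

# Route 2: segment rate relative to the overall rate (lift)
segments$lift <- segments$rate_pct / overall_rate_pct

print(round(segments[order(-segments$efficiency),
                     c("cluster", "share_customers", "share_subs", "efficiency", "lift")], 1))
```

Both routes give the same values, and rounded to one decimal they reproduce the efficiency column of the table. An index above 1 marks a segment that converts better than the customer base as a whole, whose overall rate works out to roughly 12.2% under these inputs.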
Based on the comprehensive cluster analysis, we recommend the following segment-specific marketing approaches:
Cluster 4: High Conversion Segment (48.6% conversion)
- Allocate highest marketing budget share despite small size (8% of customers)
- Emphasize detailed product explanations during extended conversations
- Design specialized training for representatives handling these customers
- Develop exclusive early-access offers to maintain engagement
Cluster 3: Engaged Mid-Value Clients (28.2% conversion)
- Implement relationship-based marketing with assigned account managers
- Create tailored financial advisory services matching their moderate wealth level
- Develop loyalty programs that reward their high responsiveness
- Focus on service upgrades and complementary product offerings
Cluster 6: Affluent Clients (15.8% conversion)
- Deploy premium wealth management solutions
- Design exclusive investment opportunities matching their high financial capacity
- Emphasize high-end benefits and prestige positioning
- Focus on retention and share-of-wallet growth rather than simple acquisition
Cluster 5: Senior Conservative Customers (9.3% conversion)
- Create age-appropriate messaging focused on security and stability
- Develop retirement-oriented financial products
- Use traditional communication channels matching preferences
- Emphasize long-term benefits and risk minimization
Cluster 1: Mass Market Customers (7.6% conversion)
- Implement low-cost digital campaigns only
- Test microsegmentation to identify high-potential subgroups
- Develop entry-level products with clear value proposition
- Focus on improving engagement metrics before pushing conversions
Cluster 2: Marketing Resistant Group (2.2% conversion)
- Drastically reduce contact frequency (currently excessive at 16.1 contacts)
- Test alternative channels and messaging approaches
- Consider deprioritizing for active marketing
- Monitor for changes in behavior patterns
To operationalize these insights, we’ll apply the clustering model to the entire customer database:
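# Scale the full data with the sample's means and standard deviations,
# so the sample centroids and the full data share one coordinate system
full_numerical_vars <- c("age", "balance", "duration", "campaign", "previous")
full_cluster_data <- bank_data_full[, full_numerical_vars]
sample_means <- colMeans(bank_data[, full_numerical_vars])
sample_sds <- apply(bank_data[, full_numerical_vars], 2, sd)
full_cluster_data_scaled <- scale(full_cluster_data, center = sample_means, scale = sample_sds)
# Re-run k-means on all customers, initialized at the sample centroids
# (this refines the centroids on the full data, so they may shift slightly)
set.seed(123)
full_kmeans_result <- kmeans(full_cluster_data_scaled, centers = kmeans_result$centers, nstart = 1)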
# Add cluster assignments to the full dataset
bank_data_full$cluster <- as.factor(full_kmeans_result$cluster)
# Export the segmented data for business use
write.csv(bank_data_full, "bank_segmented_full_k6.csv", row.names = FALSE)
write.csv(bank_data, "bank_segmented_sample_k6.csv", row.names = FALSE)
# Confirm success
"Customer segmentation model successfully applied to the full dataset."
By implementing a segmentation-based approach, the bank can expect:
This customer segmentation analysis has revealed six distinct customer groups with dramatically different product adoption propensities and engagement patterns. The findings demonstrate that:
By redirecting marketing resources based on these insights, the bank can significantly improve campaign effectiveness, customer experience, and ultimately, profitability. The segmentation approach also provides a foundation for future product development and service design tailored to specific customer needs.
The implementation of this segmentation framework represents a shift from mass marketing to precision targeting, enabling the bank to compete more effectively in an increasingly personalized financial services landscape.