Roberto Busby

👋 Hi, I'm a Data Analyst with experience in data manipulation, business intelligence, and process automation. I love turning complex data into insights that drive positive change.


SkyLink Communications Customer Cluster Analysis

Overview

In this project, I analyzed customer data from SkyLink Communications to identify distinct customer segments using cluster analysis in R. I first prepared the data and standardized the continuous variables so that no single variable would dominate the distance calculations. I then used the Elbow Method to choose the number of clusters and applied the k-means algorithm to segment the customers. The analysis revealed four clusters: Global Communicators, Data Enthusiasts, Emerging Users, and Local Loyalists. Each group shows distinct usage behaviors and preferences, giving SkyLink Communications a basis for tailoring its marketing strategies and service offerings to the needs of each segment.

Key Findings Summary

Cluster | Characteristics | Marketing Strategies
Global Communicators | Long-term customers; high international call usage; high international charges | Loyalty programs; premium international calling plans
Data Enthusiasts | High data usage; moderate international call usage | Unlimited or high-data plans; bundled data and international calling features; promotions focusing on data services
Emerging Users | Newer customers; minimal service usage | Engagement strategies to increase service utilization; introductory offers; educational marketing on services
Local Loyalists | Long-term customers; high local call usage; low international call usage | Bundled services emphasizing local features; incentives for increased local usage; tailored local communication packages or discounts

Key Steps and Insights

Data Preparation and Exploration

I began by preparing and exploring the data, focusing on key continuous variables that influence customer behavior:

Data Normalization

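Because k-means groups customers by Euclidean distance, variables measured on large scales (such as minutes of usage) would otherwise outweigh variables on small scales (such as charges). I therefore standardized each variable to a mean of 0 and a standard deviation of 1 with scale(), so that every variable contributes on a comparable scale to the clustering.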

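# Standardize each continuous variable (mean 0, standard deviation 1)
quant_vars_n <- scale(quant_vars)
# Inspect the result (View() opens the RStudio data viewer)
View(quant_vars_n)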
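To show what scale() computes, here is a minimal sketch on a small hypothetical data frame standing in for quant_vars; the column names and values are invented for illustration and are not SkyLink data. Each column becomes a z-score, (x - mean) / sd.

```r
# Hypothetical stand-in for quant_vars: illustrative columns and values only
quant_vars <- data.frame(
  account_length = c(12, 30, 45, 8, 60, 27),
  local_minutes  = c(320, 150, 410, 90, 500, 210),
  intl_minutes   = c(5, 40, 12, 2, 35, 8)
)

# scale() centers each column on its mean and divides by its standard deviation
quant_vars_n <- scale(quant_vars)

# Manual z-score for one column, to show the same computation explicitly
z_manual <- (quant_vars$local_minutes - mean(quant_vars$local_minutes)) /
  sd(quant_vars$local_minutes)

# Each standardized column now has mean (numerically) 0 and standard deviation 1
round(colMeans(quant_vars_n), 10)
apply(quant_vars_n, 2, sd)
```

After standardization, a one-unit difference means "one standard deviation" in every column, which is what lets minutes and charges be compared in the same distance calculation.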

Determining the Optimal Number of Clusters

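I used the Elbow Method to choose the number of clusters. It plots the total within-cluster sum of squares (WSS) against the number of clusters k; the "elbow" is the point beyond which adding more clusters yields only small reductions in WSS. For this data, the elbow appeared at four clusters.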

library(ggplot2)

# Plot the elbow curve: WSS against the number of clusters k.
# The optimal k lies near the "elbow", where the decrease in the
# within-cluster sum of squares (WSS) begins to level off.
ggplot(elbowdf, mapping = aes(x = k_values, y = wss_values)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = seq(1, 10, 1)) +
  labs(x = "Number of clusters (k)", y = "Total within-cluster sum of squares")
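The plotting code expects a data frame elbowdf with columns k_values and wss_values. The page does not show how it was built, so the following is a minimal sketch of one common approach, assuming k ranges from 1 to 10 (matching the axis breaks) and using kmeans()'s tot.withinss as the WSS. A synthetic matrix stands in for quant_vars_n so the example runs on its own; nstart = 25 and iter.max = 50 are illustrative choices, not the author's settings.

```r
# Synthetic stand-in for the standardized matrix; replace with your own quant_vars_n
set.seed(42)
quant_vars_n <- scale(matrix(rnorm(200 * 3), ncol = 3))

k_values <- 1:10

# For each k, fit k-means and record the total within-cluster sum of squares.
# nstart = 25 tries several random starting centers and keeps the best fit,
# reducing the chance that one poor initialization distorts the curve.
wss_values <- sapply(k_values, function(k) {
  kmeans(quant_vars_n, centers = k, nstart = 25, iter.max = 50)$tot.withinss
})

elbowdf <- data.frame(k_values = k_values, wss_values = wss_values)
elbowdf
```

At k = 1 the WSS equals the total sum of squares of the data; it then falls as k grows, and the elbow is read off where that decline flattens.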

K-Means Clustering

I performed k-means clustering with k = 4, which grouped the customers into four distinct segments:

  1. Global Communicators
  2. Data Enthusiasts
  3. Emerging Users
  4. Local Loyalists
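A minimal sketch of this step, assuming a data frame customers of unscaled continuous variables (the columns and values here are hypothetical and synthetic): fit k-means with four centers, attach the cluster labels to the original data, and compare cluster means. Segment names such as "Global Communicators" are interpretive labels assigned by reading these profiles; k-means itself only returns cluster numbers.

```r
# Hypothetical synthetic customer data; not the SkyLink dataset
set.seed(123)
customers <- data.frame(
  account_length = round(runif(120, 1, 72)),
  local_minutes  = round(runif(120, 50, 600)),
  intl_minutes   = round(runif(120, 0, 60))
)

# Standardize, then fit k-means with k = 4 and several random starts
quant_vars_n <- scale(customers)
km <- kmeans(quant_vars_n, centers = 4, nstart = 25)

# Attach cluster assignments to the unscaled data for interpretation
customers$cluster <- factor(km$cluster)

# Mean of each variable per cluster, on the original units
profiles <- aggregate(. ~ cluster, data = customers, FUN = mean)
profiles

# Number of customers in each cluster
table(customers$cluster)
```

Profiling on the unscaled data keeps the cluster means in interpretable units (months, minutes) even though the clustering itself ran on standardized values.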

Visualizations

Distribution of Account Length by Cluster

I visualized the distribution of customer tenure across clusters. The red dashed line indicates the overall mean account length of approximately 32 months.

Figure: Distribution of Account Length in Months by Cluster
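The exact chart type is not specified on this page, so the following sketch assumes a boxplot of account length per cluster with a red dashed horizontal line at the overall mean. It uses base R graphics to stay dependency-free, and the data are synthetic. In ggplot2, the equivalent reference line would be geom_hline(yintercept = overall_mean, linetype = "dashed", color = "red"), or geom_vline(xintercept = ...) for a histogram.

```r
# Hypothetical synthetic data: account length in months and a cluster label
set.seed(7)
customers <- data.frame(
  account_length = round(runif(100, 1, 72)),
  cluster        = factor(rep(1:4, length.out = 100))
)

overall_mean <- mean(customers$account_length)

boxplot(account_length ~ cluster, data = customers,
        xlab = "Cluster", ylab = "Account length (months)",
        main = "Distribution of Account Length by Cluster")
# h = draws a horizontal line; lty = 2 makes it dashed
abline(h = overall_mean, col = "red", lty = 2)

# Per-cluster means, to compare each cluster against the overall mean
cluster_means <- tapply(customers$account_length, customers$cluster, mean)
cluster_means
```

Clusters whose boxes sit mostly above the red line are the longer-tenure segments.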

Local vs. International Minutes by Cluster

The scatter plot below shows the relationship between local and international minutes for each cluster.

Figure: Local Minutes vs. International Minutes by Cluster
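A scatter plot like this one can be sketched in base R by mapping each cluster to a color. The data, column names, and palette below are hypothetical and chosen only for illustration.

```r
# Hypothetical synthetic data: usage minutes and a cluster label
set.seed(11)
customers <- data.frame(
  local_minutes = round(runif(80, 50, 600)),
  intl_minutes  = round(runif(80, 0, 60)),
  cluster       = factor(rep(1:4, length.out = 80))
)

# One color per cluster; the factor's integer codes index the palette
palette_cols <- c("steelblue", "darkorange", "forestgreen", "firebrick")
point_cols <- palette_cols[as.integer(customers$cluster)]

plot(customers$local_minutes, customers$intl_minutes,
     col = point_cols, pch = 19,
     xlab = "Local minutes", ylab = "International minutes",
     main = "Local vs. International Minutes by Cluster")
legend("topright", legend = levels(customers$cluster),
       col = palette_cols, pch = 19, title = "Cluster")
```

With real cluster assignments, segments such as Local Loyalists (high local, low international usage) and Global Communicators (high international usage) would be expected to occupy different regions of this plane.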

Conclusion

Summary of Insights

Recapping the main discoveries:

Reflection on the Process

Through this project, I learned:

See this file for the detailed analysis.