Chapter 3 Deep into data

Scenario. While your first attempt to show the company’s performance to Lucy was not bad, clearly she has a lot more requirements than what you provided. She wants to know more about the year over year sales, sliced into different categories, sub-categories, and countries. She also wants to see additional information such as customer demographics.

Jack has provided you with a different data source. This time the data has more than one hundred thousand rows.

Before you can create additional reports to Lucy, first you need to prepare the data.

Load libraries

library(tidyverse)
library(lubridate, warn.conflicts = F)
library(formattable)
library(DT)
library(kableExtra)

Load data for this Lab:

my_data <- readRDS("data/processing/data4week2.rds")

3.1 What is the total revenue for all the sales in the United States?

ans_1 <- my_data %>%
  filter(Country == "United States") %>%
  summarise(`US Revenue` = sum(Revenue)) %>%
  currency(digits = 0)

Answer: $27,975,547


Now, you need to add several columns, derived from existing columns in the data. First, let’s add a Month column. Insert a new column to the left of the Customer ID column.

my_data <- my_data %>%
    mutate(Month = month(Date)) %>%
    select(Date:Year, Month, everything())

3.2 What is the total revenue for all the sales in the month of December?

my_data %>%
  filter(Month == 12) %>%
  summarise(`Total Revenue` = sum(Revenue)) %>%
  currency(digits = 0) -> ans_2

Answer: $9,086,931


Next, let’s add an “Age Group” column. Insert a new column to the left of the Customer Gender, and use formula to derive the age group from the Customer Age column. Let’s group the customers based on the following criteria:

  • Youth (<25);
  • Young Adults (25-34);
  • Adults (35-64);
  • Seniors (>64)

Define function age_group()

age_group <- function(age) {
        if (age < 25) {
            "Youth"
        } else if (age > 24 & age < 35) {
            "Young Adults"
        } else if (age > 34 & age < 65) {
            "Adults"
        } else {
            "Seniors"
        }
}

Add Age Group column

my_data <- my_data %>%
    mutate('Age Group' = map_chr(.$`Customer Age`, age_group)) %>%
    select(Date:`Customer Age`, `Age Group`, everything())

3.3 What is the total revenue for all the sales for young adults age group?

my_data %>%
  filter(`Age Group` == "Young Adults") %>%
  summarise(Total = sum(Revenue)) %>%
  currency(digits = 0) -> ans_3

Answer: $30,655,614


Now, let’s add a Frame Size column. Insert a new column to the left of the Order Quantity use formula to derive the frame size of a bicycle from the last two characters of the Product column, when the Product Category is Bikes, otherwise - NA.

my_data <- my_data %>%
    mutate('Frame Size' = ifelse(my_data$`Product Category` == 'Bikes', my_data$Product, NA) %>%
               str_extract("\\d\\d$") %>%
               as.numeric()) %>%
    select(Date:Product, 'Frame Size', everything())

3.4 What is the total revenue for all the bikes with frame size 62 for the customer age group seniors?

ans_4 <- my_data %>%
    filter(`Product Category` == "Bikes",
           `Frame Size` == 62,
           `Age Group` == "Seniors") %>%
    summarise(`Total Revenue` = sum(Revenue)) %>%
    currency(digits = 0)

Answer: $12,452


Let’s add a Profit column. Insert a new column to the right of the Revenue, and use formula to derive the Profit from both the Revenue and Cost columns. Show the Total for the Profit column.

my_data <- my_data %>%
    mutate(Profit = Revenue - Cost)

3.5 What is the total profit for United States sales in the month of October 2015, for customer age group adults?

ans_5 <- my_data %>%
    filter(Country == "United States", Year == 2015, Month == 10, `Age Group` == "Adults") %>%
    summarise(`Total Profit` = sum(Profit)) %>%
    currency(digits = 0)

Answer: $138,419