Hotel Booking Analysis: Understanding Guest Cancellations

--

Key insights into cancellation trends and their contributing factors

Introduction

The hotel booking dataset consists of over 119,000 records capturing reservation details for two types of hotels — City Hotels and Resort Hotels — across a specific time period. This dataset provides insights into guest behavior, booking channels, lead times, and cancellation patterns. By analyzing these features, hoteliers and analysts can identify areas for optimizing revenue, improving customer retention, and enhancing overall operational efficiency.

Problem Statement

High cancellation rates and unpredictability in hotel bookings present a critical challenge to revenue management. A significant portion of bookings never materializes into actual stays, resulting in lost revenue opportunities, misallocation of room inventory, and operational inefficiencies. This project aims to:

  1. Investigate the root causes of cancellations (e.g., lead time, market segment, deposit types).
  2. Understand booking behavior patterns (e.g., group vs. individual, previous cancellations, total guests).
  3. Develop actionable insights and strategies to reduce cancellation rates and improve overall hotel performance.

About the Dataset

The dataset includes 119,388 observations for bookings at two types of hotels: City Hotels and Resort Hotels. Each observation represents a booking made between July 2015 and August 2017, with key details captured across 33 columns. These columns include (but are not limited to):

  • hotel: Specifies whether the booking was for a City Hotel or a Resort Hotel.
  • is_canceled: Indicates whether the booking was canceled (1) or not (0).
  • lead_time: Number of days between the booking date and the arrival date.
  • arrival_date: Date the guest(s) arrived (or were meant to arrive) at the hotel.
  • market_segment: The source of the booking, such as Online Travel Agencies (OTA), Direct, or Group bookings.
  • deposit_type: Indicates whether the booking required a deposit (No Deposit, Refundable, or Non-Refundable).
  • adr (Average Daily Rate): Revenue generated per room per night.

Some Terminologies in the Dataset

  1. Lead Time: The number of days between the booking date and the arrival date.
  2. Average Daily Rate (ADR): The average revenue earned per room per night, calculated by dividing the total lodging revenue by the number of occupied room nights.
  3. Market Segment: The source of the booking, such as Direct, Online Travel Agencies (OTAs), or Group bookings.
  4. Deposit Type: Indicates the type of payment or deposit made when booking (e.g., No Deposit, Refundable, Non-Refundable).
  5. Special Requests: Specific requests made by the guest, such as a high floor, extra pillows, or late check-out.
  6. Cancellation Rate: The percentage of total bookings that were canceled.
  7. Previous Cancellations: The number of bookings a guest had canceled before their current booking.
  8. Total Guests: The sum of adults, children, and babies in a single booking.

Data Transformation

Step 1: Initial Cleaning in Excel

The raw dataset was first reviewed and cleaned in Excel to ensure data quality and consistency. The following steps were performed:

  1. Duplicate Removal: Duplicate rows were identified and removed to avoid double-counting bookings or skewing the analysis.
  2. Creating the Arrival Date Field: The arrival_date_year, arrival_date_month, and arrival_date_day_of_month columns were combined into a single Arrival Date field, simplifying date-related analysis and enabling trend visualization.
  3. Creating the Total Guests Field: A new Total Guests column was generated by summing the adults, children, and babies columns, providing a clearer view of the total number of guests per booking.

Step 2: Advanced Transformation in Power BI (Power Query and DAX)

Once the data was prepared in Excel, further transformations were carried out in Power Query within Power BI:

  1. Column Renaming: Fields were renamed where necessary to improve clarity (e.g., adr was renamed to "Average Daily Rate").
  2. Data Type Corrections: Ensured that all numerical columns (e.g., lead_time, adr) and date fields (e.g., arrival_date) were formatted correctly for accurate analysis.
  3. Filter and Segmentation: Filters were applied to isolate key data segments, such as canceled vs. non-canceled bookings or specific market segments, enabling targeted insights.
  4. Creating Lead Time Range Groups:
    To analyze booking patterns based on lead times, a new column lead_time_range was created with the following DAX formula.
lead_time_range = 
SWITCH (
TRUE(),
hotel_bookings[lead_time] <= 180, "Under 6 Months",
hotel_bookings[lead_time] <= 365, "6 Months - 1 Year",
hotel_bookings[lead_time] <= 547, "1 Year - 18 Months",
"Over 18 Months"
)

5. Grouping Guests with Previous Cancellations:
A calculated column named has_previous_cancellations was created using the following formula to categorize bookings:

has_previous_cancellations =
IF (
hotel_bookings[previous_cancellations] = 0,
0,
1
)

Insights

1. Overall Cancellation Rate (37%)

Out of 119,388 total bookings, 44,224 were canceled, resulting in a 37% cancellation rate.

2. City Hotel vs. Resort Hotel

The City Hotel has a cancellation rate of about 41.7%, whereas the Resort Hotel’s rate is around 27.8%.

3. Non-Refundable Deposit Cancellation Rate (99.4%)

  • Bookings with a Non-Refundable deposit type exhibit a 99.4% cancellation rate, meaning nearly all such bookings ended up canceled.
  • “No Deposit” bookings show about a 28.4% cancellation rate, while Refundable deposits hover around 22.2%.

4. Lead Time Impact

  • Bookings made more than six months in advance have a higher cancellation rate, which reaches 99.2% for those booked over 18 months ahead.
  • Shorter lead times (under six months) reflect a lower cancellation rate of approximately 31.8%.

5. Market Segment Patterns

  • Group bookings record the highest cancellation rate (approximately 61.1%).
  • Online Travel Agencies (OTA) follow at about 36.7%, while Direct (15.3%) and Corporate (18.7%) channels have lower rates.

6. Room Type Variations

Certain room types (e.g., Room Type A) account for a large share of total and canceled bookings. Rates generally range between 30–40% across different room categories.

7. Cancellation Trends Over Time

While total cancellations decreased from 2016 to 2017, the proportion of canceled bookings increased from 35.9% to 38.7% in the same period, indicating fewer total bookings but a higher share of cancellations.

Recommendations

1. Revisit Deposit and Cancellation Policies

The extremely high 99.4% cancellation rate for non-refundable bookings indicates a need to reassess policy terms. To reduce unexpected cancellations consider:

  • Partial refunds,
  • Flexible date changes, or
  • Better customer communication.

2. Targeted Strategies for Group Bookings

Since Group bookings show the highest cancellation rate (61.1%), implement stricter deposit requirements, milestone payments, or attractive retention incentives (e.g., discounts for groups that finalize on schedule).

3. Encourage Bookings Closer to Arrival Dates

Given the high cancellation rate for long lead times, create promotions for nearer-term reservations or offer more flexible policies to disincentivize cancellations for those who book far in advance.

4. Promote Direct and Corporate Channels

Direct (15.3% cancellation rate) and corporate (18.7%) channels are more stable. Strengthen these avenues with loyalty programs, corporate partnerships, or exclusive packages to mitigate revenue loss.

5. Refine City Hotel Strategy

The 41.7% cancellation rate for City Hotels suggests a focus on pricing optimization and tailored offerings. Analyze competitive landscapes, local events, and typical booking patterns to reduce last-minute cancellations.

Conclusion

This project identified key factors contributing to high cancellation rates, such as lead time, deposit type, and market segment. Long lead times and non-refundable deposits significantly increase the likelihood of cancellations. By refining these policies and targeting more reliable booking channels, hotels can reduce cancellations, improve revenue predictability, and enhance the guest experience. These findings provide practical steps for improving operational efficiency and financial performance in the hospitality industry.

Access the Interactive Dashboard

Explore the data further through the interactive dashboard, where you can analyze key trends, cancellation patterns, and booking behaviors in real-time. Access the dashboard here.

--

--

No responses yet