Expert Systems with Applications

Volume 40, Issue 12, 15 September 2013, Pages 4944-4956
Bagged Clustering and its application to tourism market segmentation

https://doi.org/10.1016/j.eswa.2013.03.005

Abstract

The aim of this paper is to propose a segmentation technique based on the Bagged Clustering (BC) method. In the partitioning step of the BC method, B bootstrap samples are generated by drawing with replacement from the original sample. The fuzzy C-medoids Clustering (FCMdC) method is run on each bootstrap sample, yielding (B × C) medoids and the membership degrees of each unit to the different clusters. In the second step, a hierarchical clustering algorithm is run on the (B × C) medoids, and the best partition of the medoids is obtained by inspecting the dendrogram. Each unit is then assigned to a cluster based on the membership degrees observed in the partitioning step. The effectiveness of the proposed procedure is shown on a tourism segmentation problem. We analyze two samples of tourists, each attending a different cultural attraction, and highlight differences among clusters in socio-economic characteristics and in the motivations behind visit behavior.

Highlights

► A novel segmentation technique based on Bagged Clustering is proposed.
► In the partitioning step, the Fuzzy C-Medoids Clustering method is run.
► Visitors of two cultural events are clustered with respect to their motivations.

Introduction

For a long time, visitors of cultural attractions were treated as a homogeneous group of people. The recent tourism literature instead tends to consider them as heterogeneous groups with different characteristics, perceptions and needs (Hughes, 2002). Brida, Disegna, and Osti (2012) showed that visitors of Christmas Markets in Northern Italy clustered into three groups, according to a set of motivational factors that affect visit behavior. Other studies showed that tourists who visited art museums had different socio-demographic characteristics (in particular regarding level of education, income and occupation) from those who attended festivals, musical activities, theme parks, amusement parks, local fairs, and events (Bennett and Council, 1994; Kim et al., 2007; Schuster, 1991).

Market segmentation is a process used to discover homogeneous subgroups in the market, according to specific characteristics of customers. In tourism market segmentation, tourists grouped in the same segment are similar to each other (and different from those in other segments) in the way they react to internal stimuli, such as desires or emotions, and/or external stimuli, such as promotions or advertising.

Understanding the characteristics and the behavior of tourists can be crucial for marketing success. This information allows marketers to direct their efforts toward the most economically significant groups of tourists, improving the overall survival and profitability of cultural attractions, businesses, firms, and destinations in an increasingly competitive market.

Since the introduction of market segmentation in the late 1950s, the number and type of segmentation approaches have grown enormously (Dolnicar and Leisch, 2004; Liao et al., 2012). Boone and Roehm (2002) pointed out that over 50 methods can be applied to market segmentation problems. Since each method conducts a multivariate description of the data, grouping units based on their similarity, “different methods present different views of data” (Leisch, 2006). Unfortunately, as emphasized by many researchers, no absolutely “correct” segmentation method exists in the literature (Beane and Ennis, 1987; Dolnicar et al., 2008; Kotler et al., 2010; Tkaczynski and Rundle-Thiele, 2011): the underlying relationships among units have different structures, depending on the data at hand, and the researcher must find the segmentation method that best captures this hidden structure. In addition, the researcher intervenes at different moments of the estimation process, “creating” an ever-increasing number of new segmentation methods and giving subjective interpretations of the final results.

Segmentation techniques can be classified into two groups, namely supervised and unsupervised classification techniques. Supervision means that “membership of data points which can illustrate the general structure of the group is required in order to derive the classification rules” (Budayan, Dikmen, & Birgonul, 2009). Unsupervised classification implies that there is no rule for initiating the classification and that the empirical distribution and characteristics of the data determine the segments’ membership.

Cluster analysis methods are the most widely used unsupervised market segmentation techniques in the literature and comprise a set of different techniques, which can be broadly divided into partitioning and hierarchical methods (Saarenvirta, 1998). Given a set of selected segmentation variables, these methodologies aggregate units into groups in such a way that each group contains the most similar units and, at the same time, is as dissimilar as possible from the remaining groups.

Beyond more traditional methods, non-linear techniques, such as Neural Network (NN) algorithms and the Kohonen Map (Kohonen, 1989), or Self-Organizing Map (SOM), have also been used in tourism research. Mazanec (1992) was one of the first scholars to use NNs, applying this technique to a market segmentation analysis of Austrian tourists in the “Euro-Sports Region”. Dolnicar (1997) used the SOM to identify the characteristics of summer tourists visiting Austria. The latter method was also used, for example, to identify strategic groups of UK hotels (Curry, Davies, Phillips, Evans, & Moutinho, 2001); to segment senior travelers in Western Australia (Kim, Wei, & Ruys, 2003); to segment the international tourist market in Cape Town, South Africa (Bloom, 2005); and to segment the visitors of a particular cultural attraction in Northern Italy, the Christmas Markets (Brida et al., 2012).

More recently, the Bagged Clustering (BC) algorithm, based on the Bagging (“bootstrap aggregating”) procedure (Breiman, 1996), has been introduced into tourism market segmentation (Dolnicar and Leisch, 2000; Dolnicar and Leisch, 2003; Dolnicar and Leisch, 2004; Leisch, 1999).

BC sequentially combines partitioning and hierarchical clustering methods to overcome some limitations of both procedures. In its initial formulation, a partitioning method, namely the classic k-means algorithm, is first applied to B bootstrap samples generated from the data set. A hierarchical method is then applied to the results of the partitioning step. This procedure has two main advantages over more traditional clustering techniques (Leisch, 1999): (i) it is not necessary to impose the number of clusters in advance; (ii) the final solution is less dependent on the initialization of the algorithm.
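The two stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of Leisch (1999): the function names (`kmeans`, `partitioning_step`, `hierarchical_step`), the squared Euclidean distance, the average-linkage rule, and the toy data are all hypothetical choices made for this example. In practice the number of final clusters is chosen by inspecting the dendrogram; here it is passed as `n_clusters` only to keep the sketch short.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd-style k-means; returns the k final centers."""
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as its group mean; keep the old one if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers


def partitioning_step(data, n_boot, k, seed=0):
    """Run k-means on n_boot bootstrap samples and pool the n_boot * k centers."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # drawn with replacement
        pooled.extend(kmeans(sample, k, rng))
    return pooled


def hierarchical_step(centers, n_clusters):
    """Average-linkage agglomerative clustering of the pooled centers (assumed linkage)."""
    clusters = [[c] for c in centers]

    def linkage(a, b):
        return sum(sq_dist(p, q) for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i is unaffected by the pop
        clusters[i].extend(merged)
    return clusters


# Toy data (hypothetical): two well-separated groups of 40 points each.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(40)]
data += [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(40)]

pooled_centers = partitioning_step(data, n_boot=10, k=3)
final_clusters = hierarchical_step(pooled_centers, n_clusters=2)
```

Note that k is deliberately set larger than the number of groups one expects: the hierarchical step is what merges the pooled centers into the final segments.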

The aim of this paper is to propose a novel segmentation technique based on the BC algorithm. The main difference is that the fuzzy C-medoids Clustering (FCMdC) algorithm (Krishnapuram, Joshi, Nasraoui, & Yi, 2001) is adopted in the partitioning step. FCMdC inherits both the benefits of the partitioning-around-medoids clustering approach and the flexibility and other benefits of the fuzzy approach (see, e.g., D’Urso, Di Lallo, & Maharaj, 2013). With this approach, units are classified into homogeneous classes characterized by prototypal observed units (the medoids), which synthesize the structural information of each cluster. Units are assigned to different clusters with fuzzy membership degrees, which measure the uncertainty of the assignment. Unlike crisp clustering, in which membership degrees can only take the values 0 or 1, in fuzzy clustering the membership degrees take values between 0 and 1. This makes it possible to capture the vague (fuzzy) behavior of particular units. This is reasonable in market segmentation, where some customers may share characteristics with more than one segment, so that assigning them to a single cluster entails a loss of information.
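The contrast between crisp and fuzzy membership degrees can be illustrated with a short Python sketch. It assumes the usual fuzzy c-means-type membership formula with fuzziness parameter m and squared Euclidean distance; the excerpt does not state the exact formula or distance used in the paper, so the functions `fuzzy_memberships` and `crisp_memberships`, the value m = 2, and the example medoids are assumptions made for illustration only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_memberships(unit, medoids, m=2.0):
    """Membership degrees of one unit to each medoid; they lie in [0, 1] and sum to 1."""
    dists = [sq_dist(unit, med) for med in medoids]
    # A unit that coincides with a medoid belongs fully to that medoid's cluster.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    # Assumed standard form: weight proportional to (1 / distance)^(1 / (m - 1)).
    weights = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def crisp_memberships(unit, medoids):
    """Crisp assignment: membership 1 for the nearest medoid, 0 for all others."""
    dists = [sq_dist(unit, med) for med in medoids]
    nearest = dists.index(min(dists))
    return [1.0 if c == nearest else 0.0 for c in range(len(medoids))]


# Hypothetical example: two medoids and two units on a line.
medoids = [(0.0, 0.0), (4.0, 0.0)]
near_first = fuzzy_memberships((1.0, 0.0), medoids)   # mostly the first cluster
in_between = fuzzy_memberships((2.0, 0.0), medoids)   # shared equally
crisp = crisp_memberships((2.0, 0.0), medoids)        # forced into a single cluster
```

For the unit halfway between the two medoids, the fuzzy degrees are equal, whereas the crisp rule has to pick one cluster and discards the information that the unit resembles both.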

To show the effectiveness of the procedure, an empirical analysis on tourism data is provided and discussed. The analysis is carried out on two different surveys. The first considers tourists who visited the Museum of Modern and Contemporary Art of Trento and Rovereto, Italy, during the summer season of 2011. In the second survey, Italian visitors of the Christmas Market held in Merano, Italy, in 2011 were interviewed. Both surveys collected a set of socio-economic characteristics of the tourists and information about the trip. In addition, respondents were asked questions aimed at detecting the motivational factors affecting their visit behavior. In both applications, the segmentation variables are the items regarding the motivational factors, while the socio-economic characteristics and other additional information serve for the ex-post analysis of the obtained clusters.

The paper is organized as follows: Section 2 gives an overview of the proposed clustering technique; Section 3 presents the samples and questionnaires employed in the empirical application and discusses the clustering results; Section 4 offers final considerations and remarks for future research.

Methodology

Two main issues in market segmentation based on cluster analysis deserve to be mentioned: first, the detection of the appropriate number of market segments (clusters) in the dataset and, second, the allocation of customers to these clusters, assessing the accuracy of the cluster assignment for each unit.

The Bagging (“bootstrap aggregating”) procedure (Breiman, 1996) is a resampling method applied in the field of supervised and unsupervised learning to improve the accuracy of prediction.
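In its supervised form, bagging fits a base learner on each bootstrap sample and aggregates the resulting predictions. The Python sketch below shows this mechanism for regression with averaging; the base learner (a one-split regression stump), the function names `fit_stump` and `bagged_predict`, and the toy data are hypothetical choices for illustration and are not taken from the paper.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump by least squares; returns a predictor function."""
    best = None
    for t in sorted(set(xs))[1:]:  # candidate thresholds leave both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:  # all x values identical in this sample: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x_new: mean
    _, threshold, left_mean, right_mean = best
    return lambda x_new: left_mean if x_new < threshold else right_mean


def bagged_predict(xs, ys, x_new, n_boot=50, seed=0):
    """Average the predictions of stumps fitted on n_boot bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # indices drawn with replacement
        model = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(model(x_new))
    return sum(predictions) / n_boot


# Toy data (hypothetical): a noisy increasing relationship.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
single = fit_stump(xs, ys)(2.4)          # prediction of one stump on the full data
bagged = bagged_predict(xs, ys, x_new=2.4)  # aggregated over bootstrap stumps
```

A single stump can only output one of two values, while the bagged prediction averages over stumps whose split points vary from one bootstrap sample to the next.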

Tourism market segmentation: a case study

In this section we apply the proposed clustering method to the tourism market. Profiling visitors can be of crucial importance for local policymakers, managers and marketing analysts. Identifying homogeneous clusters of consumer-visitors can be an essential step both for planning and developing appropriate strategies and for directing political and economic actions towards the most economically relevant groups.

Two datasets drawn from surveys conducted

Conclusions

In this paper we propose a clustering method based on the “bagging” (bootstrap aggregating) procedure. The bagging procedure is recognized to enhance the stability of results and classification accuracy (Breiman, 1996).

Building on the BC method proposed by Leisch (1999), we make use of the fuzzy C-medoids Clustering (FCMdC) algorithm in the partitioning phase of the procedure. Once the hierarchical phase, which is carried out on the medoids identified, is completed, we attribute each unit to a cluster

Acknowledgments

The authors thank Professor Juan Gabriel Brida, Free University of Bolzano, for providing the data used in the empirical applications of this paper. The data collections were supported by the Autonomous Province of Bolzano project “Le attrazioni culturali e naturali come motore dello sviluppo turistico. Un’analisi del loro impatto economico, sociale e culturale”, Research Funds 2009.

References (42)

  • Leisch, F. (2006). A toolbox for k-centroids cluster analysis. Computational Statistics & Data Analysis.
  • McBratney, A., et al. (1985). Application of fuzzy sets to climatic classification. Agricultural and Forest Meteorology.
  • Tkaczynski, A., et al. (2011). Event segmentation: A review and research agenda. Tourism Management.
  • Wedel, M., et al. (1989). A fuzzy clusterwise regression approach to benefit segmentation. International Journal of Research in Marketing.
  • Yang, M., et al. (1996). On a class of fuzzy c-numbers clustering procedures for fuzzy data. Fuzzy Sets and Systems.
  • Beane, T., et al. (1987). Market segmentation: A review. European Journal of Marketing.
  • Bennett, T., & Council, A. (1994). The Reluctant Museum Visitor: A study of non-goers to history museums and art...
  • Breiman, L. (1996). Bagging predictors. Machine Learning.
  • Brida, J. G., Disegna, M., & Scuderi, R. (2012b). Segmenting visitors of cultural event: the case of the Christmas...
  • Brida, J. G., Disegna, M., & Osti, L. (2013a). Authenticity perception of cultural events: A host-tourist analysis....
  • Brida, J. G., Disegna, M., & Scuderi, R. (2013b). Visitors of two types of museums: a segmentation study. Expert...