Business Intelligence and Data Mining in Tourism
Business intelligence is nowadays used as an umbrella term for concepts and methods that improve business decision making through fact-based support systems, typically covering techniques such as data warehousing, reporting and OLAP (online analytical processing), and data mining. Data mining aims at discovering correlations, patterns and trends by sifting through large amounts of data using pattern recognition and statistical/mathematical techniques. The application of business intelligence and data mining techniques is gaining momentum in the tourism domain and currently constitutes an important issue and research challenge concerning the use of ICT in tourism.
The competitiveness of tourism organisations of all kinds strongly depends on how well their information needs are satisfied by information and communication technologies (Buhalis, 2006; Back, et al., 2007). However, although huge amounts of information on customers, products, processes, and competitors are electronically available in tourism (e.g. web servers store tourists’ website navigation, computer reservation systems (CRS) save bookings and customer profiles, and property management systems (PMS) and destination management systems (DMS) store tourism offers and supplier information), these valuable knowledge sources are not used adequately in tourism destinations (Pyo, 2005; Höpken, et al., 2011, p. 417). Thus, managerial competences and organisational learning in tourism destinations could be significantly enhanced by applying methods of business intelligence (BI) and data mining (DM), which offer reliable, up-to-date and strategically relevant information about tourists’ travel motives and service expectations, channel use and related conversion rates, booking trends, and estimates of the quality of the service experience and the value added per guest segment (Min & Emam, 2002; Pyo, et al., 2002; Sambamurthy & Subramani, 2005; Wong, et al., 2006).
Since the widespread adoption of computerized reservation and booking systems in the 1980s, comprehensive databases have been available for all types of tourism transactions, covering the complete booking and consumption behavior (e.g. the Passenger Name Record (PNR) databases of global distribution systems (GDS) or the airline on-time performance database of the Bureau of Transportation Statistics; BTS, 2012). Airline companies in particular soon started to analyze such data as input to process and product optimization. A first prominent example in the area of revenue and yield management is the DINAMO system, introduced by American Airlines in 1988 (Smith et al., 1992). Further early examples can be found in the areas of demand forecasting (McGill & Van Ryzin, 1999; Subramanian et al., 1999), prediction of cancellation or no-show behavior (Hueglin & Vannotti, 2001; Lawrence et al., 2003; Garrow & Koppelman, 2004), and customer segmentation (Min & Emam, 2002).
Only recently has data mining (DM) become increasingly important for the tourism sector, owing to its ability to discover previously unknown patterns in huge databases through explorative techniques and, compared to most statistical methods, to also identify non-linear relationships (Fuchs & Höpken, 2009; Fuchs et al., 2010; Höpken et al., 2011). Although the potential of DM is not yet fully exploited in tourism, all major DM techniques are, in principle, already applied. More precisely, descriptive data analysis is widely used in the form of reports or online analytical processing (OLAP), e.g. to visualize tourist arrivals along dimensions like time/season, travel type or customer origin (e.g. TourMIS; Wöber, 1998; Destinometer; Fuchs & Weiermair, 2004). Methods of supervised learning, like classification, estimation and prediction, are used to explain tourists’ booking/cancellation or consumption behavior (Law, 1998; Iliescu, 2006; Morales & Wang, 2008) and to predict tourism demand (Chu, 2004; Goh et al., 2008; Vlahogianni & Karlaftis, 2010). Among the methods of unsupervised learning, clustering is one of the most heavily used DM techniques in tourism, mostly applied to customer segmentation as input to product differentiation, dynamic pricing or customer relationship management (Bloom, 2004; Xia et al., 2010; Kuo et al., 2012).
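As an illustration of clustering-based customer segmentation, the following minimal k-means sketch in Python groups guests by two attributes. The attributes (length of stay in nights, spending per night), the data values and the function name are hypothetical; the cited studies use a variety of linear and non-linear clustering techniques, of which k-means is only one common choice.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], k: int, iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain k-means: assign each guest to the nearest centroid, then move each
    centroid to the mean of its assigned guests, until assignments stop changing."""
    # Deterministic initialisation (the first k guests) keeps the sketch reproducible;
    # real applications would use k-means++ or several random restarts.
    centroids: List[Point] = [tuple(points[i]) for i in range(k)]
    labels = [-1] * len(points)
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no guest changed segment
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a segment becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical guests: (length of stay in nights, spending per night)
guests = [(2, 80), (3, 95), (2, 90), (7, 200), (8, 210), (6, 190)]
centroids, labels = kmeans(guests, k=2)
```

With these six hypothetical guests, the sketch separates short-stay, lower-spending guests (label 0) from longer-stay, higher-spending guests (label 1). In practice the attributes would first be standardised, since otherwise the attribute with the largest numeric range (here, spending) dominates the Euclidean distance.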
With the uptake of the World Wide Web and its tremendous adoption in tourism, web DM has gained more and more attention. Web content mining, i.e. the analysis of content from online platforms and websites, primarily deals with the analysis of user generated content (UGC), i.e. tourists’ feedback and comments in blogs or on review platforms, which currently constitutes one of the most intensively researched topics in tourism (Bronner & Hoog, 2011; Lexhagen, et al., 2012; Kuttainen, et al., 2012). Methods of text mining are applied to the tasks of feedback aggregation and opinion mining or sentiment detection, typically based on statistical or linguistic approaches (Kasper & Vela, 2011; Gräbner et al., 2012; Schmunk, et al., 2014). Additionally, web content mining increasingly deals with the extraction of knowledge about tourism markets and offers (market and competitor analysis) (Walchhofer et al., 2010). Web usage mining deals with the analysis of tourists’ behavior when using online platforms or websites. Although current applications typically focus on descriptive analyses, like the number of clicks or sessions per time period, user origin or URL, supervised and unsupervised learning techniques have also been applied, e.g. customer segmentation for website adaptation and product recommendation (Wallace et al., 2004; Pitman et al., 2010) or (sequential) association rule mining for click-stream analysis (Jiang & Gruenwald, 2006).
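To make sequential association rule mining on click streams concrete, the sketch below mines rules of the form "A -> B", read as "a session that visits page A later visits page B", and keeps those that reach minimum support (share of all sessions containing A before B) and minimum confidence (share of sessions containing A in which B follows). The page names, thresholds and function name are hypothetical, and this batch variant is a simplification for illustration only.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def sequential_rules(
    sessions: List[List[str]], min_support: float, min_confidence: float
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return {(A, B): (support, confidence)} for rules 'A is later followed by B'."""
    n = len(sessions)
    page_count: Counter = Counter()  # number of sessions containing page A
    pair_count: Counter = Counter()  # number of sessions where A occurs before B
    for pages in sessions:
        page_count.update(set(pages))
        ordered_pairs = set()  # count each ordered pair at most once per session
        for i, j in combinations(range(len(pages)), 2):  # i < j preserves click order
            if pages[i] != pages[j]:
                ordered_pairs.add((pages[i], pages[j]))
        pair_count.update(ordered_pairs)
    rules = {}
    for (a, b), cnt in pair_count.items():
        support = cnt / n
        confidence = cnt / page_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


# Hypothetical click streams of four website sessions
sessions = [
    ["home", "hotels", "booking"],
    ["home", "events"],
    ["hotels", "booking"],
    ["home", "hotels", "events"],
]
rules = sequential_rules(sessions, min_support=0.5, min_confidence=0.6)
```

For long sessions or continuous click streams this pairwise enumeration becomes expensive; Jiang and Gruenwald (2006) discuss the additional issues that arise when association rules are mined from data streams.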
Examples / Research
A prominent example in the area of revenue and yield management in the airline industry is the DINAMO system introduced by American Airlines in 1988 (Smith, et al., 1992). DINAMO builds on American Airlines’ GDS SABRE as its data source, which provides comprehensive information on all transactions related to the business processes reservation/booking, cancellation (no-show), and offerings/resource management. Mathematically, the yield management problem is represented as a nonlinear, stochastic, mixed-integer program; to reduce complexity, it is broken down into the sub-problems of overbooking, discount allocation and traffic management. The overbooking problem is solved by forecasting cancellations, no-shows, and oversale (i.e. compensation) costs, while a subsequent revenue optimization finds the optimal overbooking level, at which the marginal revenue gained from an additional booking equals the marginal oversale cost. The discount allocation problem is represented by a decision tree, based on demand predictions for multiple fare types obtained with exponential smoothing time-series techniques, and on a passenger-choice model reflecting customer reactions to schedule and price changes. Finally, traffic management addresses the problem that, due to connecting flights in a hub-and-spoke network, a single flight serves several different markets; it is handled by clustering a multitude of market/fare combinations into a limited and manageable number of similar-valued groups, called buckets (Smith, et al., 1992).
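The marginal logic of the overbooking sub-problem can be sketched as follows. This is a simplified textbook model, not DINAMO’s actual formulation: it assumes that each booking shows up independently with a fixed probability, that the fare is earned only from passengers who show up, and that each denied passenger costs a fixed oversale compensation. Under these assumptions, one more booking is accepted as long as its expected fare exceeds its expected oversale cost; the function names and figures are hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), e.g. the chance that k or more of n bookings show up."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def optimal_booking_limit(
    capacity: int, show_prob: float, fare: float, oversale_cost: float, max_extra: int = 100
) -> int:
    """Raise the booking limit above capacity while the marginal booking is still profitable."""
    limit = capacity
    for _ in range(max_extra):
        # The additional booking shows up with probability show_prob and then earns the fare ...
        marginal_revenue = show_prob * fare
        # ... but is denied boarding if the `limit` bookings already accepted fill the cabin.
        marginal_oversale = show_prob * oversale_cost * prob_at_least(capacity, limit, show_prob)
        if marginal_revenue <= marginal_oversale:
            break  # marginal revenue no longer exceeds marginal oversale cost
        limit += 1
    return limit


# Hypothetical flight: 100 seats, 90% show-up rate, fare 200, oversale compensation 600
limit = optimal_booking_limit(100, 0.9, 200.0, 600.0)
```

Rearranging the stopping condition shows that, in this model, the booking limit is raised until the probability that the already accepted bookings fill the cabin reaches the ratio of fare to oversale cost (here 1/3).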
Early applications of BI can also be found in the area of tourism destinations and the hospitality industry. A typical example is the Austrian tourism marketing information system TourMIS (Wöber, 1998), which offers market research information and decision support for tourism destinations and their stakeholders. Based on a homogeneous data model for tourist arrivals, overnight stays and visits to tourism attractions, TourMIS collects data directly from destination management organisations through a manual data input process, which restricts data granularity mostly to yearly, or in some cases monthly, aggregates. TourMIS especially supports descriptive (i.e. OLAP-like) analyses of tourism performance indicators, like arrivals, overnights or visits, aggregated at the level of tourism destinations, regions or countries, or by customer characteristics, like sending country. Additionally, trend analysis techniques and prediction models are applied in order to identify seasonal or long-term trends and to predict future tourism demand or changes in the guest mix.
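An OLAP-like roll-up of this kind amounts to summing a measure (here, arrivals) over the values of the chosen dimensions. The following Python sketch assumes a flat, hypothetical fact table of yearly arrivals with destination, region, sending country and year as dimensions; the names and figures are illustrative and do not reflect TourMIS’s actual data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical fact records: (destination, region, sending_country, year, arrivals)
Fact = Tuple[str, str, str, int, int]
DIMENSION_POSITION = {"destination": 0, "region": 1, "country": 2, "year": 3}


def roll_up(facts: Iterable[Fact], *dims: str) -> Dict[Tuple, int]:
    """Sum arrivals grouped by the requested dimensions (an OLAP 'roll-up')."""
    totals: Dict[Tuple, int] = defaultdict(int)
    for fact in facts:
        key = tuple(fact[DIMENSION_POSITION[d]] for d in dims)
        totals[key] += fact[4]  # position 4 holds the arrivals measure
    return dict(totals)


facts = [
    ("A", "North", "DE", 2012, 500),
    ("A", "North", "NL", 2012, 200),
    ("B", "North", "DE", 2012, 300),
    ("C", "South", "DE", 2012, 400),
]
by_region = roll_up(facts, "region")                       # roll up to regions
by_country = roll_up(facts, "country")                     # slice by sending country
by_region_country = roll_up(facts, "region", "country")    # two-dimensional view
```

Rolling up to coarser dimensions, or adding a dimension to drill down, only changes the grouping key, which is why such analyses work well on the aggregated data that TourMIS collects.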
The Tyrolean (Austria) benchmarking tool Destinometer™ analyses representative survey data on customers’ satisfaction with the destination offer (e.g. accommodation, gastronomy, animation, wellness, sport, shopping, etc.) and offers various benchmarking functions. The first analysis approach supplements and combines these data with data on customers’ price satisfaction, thus showing the perceived value for money along the major areas of the destination value chain (Fuchs, 2004a). The second analysis approach uses Kano’s (1984) factor structure model of customer satisfaction and employs Brandt’s (1988) dummy-based regression method to identify those destination activities and value-chain areas with the highest relative potential to delight the customer. The third and final analysis approach further adds supply-side data of the destination, such as output data (e.g. overnight stays, price levels for the various accommodation categories) and destination resource data (i.e. inputs), like the bed base, marketing costs, costs for energy, water and recycling, as well as aggregated wages for tourism personnel. By employing data envelopment analysis (DEA), the relative efficiency level of the destination is determined, and optimal strategies to enhance customer satisfaction can be deduced, which, in turn, also improve the aggregated level of destination efficiency (Fuchs & Höpken, 2005; Weiermair & Fuchs, 2007).
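In general, DEA solves one linear program per destination over multiple inputs and outputs. In the special case of a single input and a single output under constant returns to scale (the CCR model), the DEA efficiency score reduces to each unit’s output/input ratio divided by the best ratio in the sample, which the following sketch computes. The destination names, the choice of marketing cost as input and overnight stays as output, and the figures are hypothetical; the multi-input, multi-output model used in the cited studies requires an LP solver.

```python
from typing import Dict, Tuple


def ccr_efficiency_single(units: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """units maps destination -> (input, output); returns efficiency scores in (0, 1].

    Valid only for one input and one output under constant returns to scale,
    where DEA reduces to comparing each productivity ratio with the best one."""
    ratios = {name: out / inp for name, (inp, out) in units.items()}
    best = max(ratios.values())
    return {name: r / best for name, r in ratios.items()}


# Hypothetical destinations: (marketing cost in thousand EUR, overnight stays)
scores = ccr_efficiency_single({
    "A": (100.0, 5000.0),
    "B": (80.0, 2000.0),
    "C": (50.0, 2000.0),
})
```

Here destination A defines the efficient frontier (score 1.0), while a score of 0.5 for B can be read, in input-oriented terms, as B being able to achieve its output with half of its current input if it were as productive as A.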
MANOVA WEBMARK (Kepplinger, 2006), a management information system for Austrian tourism stakeholders, supports tourism destinations, accommodation providers, attraction providers and ski lift operators in their operative and strategic decision making. Tourism indicators, like arrivals, overnights, visits, and passengers carried, as well as guest feedback and satisfaction, are gathered either manually at a yearly or monthly aggregation level or through online surveys. MANOVA WEBMARK supports the analysis of guest satisfaction (based on guests’ demographic characteristics, travel motives and consumption behaviour), performance indicators and trends, benchmarking, as well as strategic analyses, like SWOT or importance/performance analysis (IPA).
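In its common form, IPA places each service attribute in one of four quadrants defined by cut-off values for mean importance and mean performance. The sketch below assumes the grand means across attributes as cut-offs (scale midpoints are another frequent choice) and uses hypothetical attribute scores on a 1-5 scale; it is not a description of how MANOVA WEBMARK implements IPA.

```python
from statistics import mean
from typing import Dict, Tuple


def ipa_quadrants(scores: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """scores maps attribute -> (mean importance, mean performance); returns its IPA quadrant."""
    imp_cut = mean(imp for imp, _ in scores.values())     # grand mean of importance
    perf_cut = mean(perf for _, perf in scores.values())  # grand mean of performance
    quadrant = {}
    for attr, (imp, perf) in scores.items():
        if imp >= imp_cut and perf < perf_cut:
            quadrant[attr] = "concentrate here"        # important, but underperforming
        elif imp >= imp_cut:
            quadrant[attr] = "keep up the good work"   # important and performing well
        elif perf < perf_cut:
            quadrant[attr] = "low priority"            # unimportant and underperforming
        else:
            quadrant[attr] = "possible overkill"       # unimportant, yet performing well
    return quadrant


# Hypothetical guest survey results: (importance, performance)
result = ipa_quadrants({
    "accommodation": (4.6, 4.4),
    "gastronomy": (4.2, 3.6),
    "shopping": (3.0, 4.0),
    "animation": (2.8, 3.0),
})
```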
DestiMetrics (www.destimetrics.com) supports performance analyses and decision making for tourism destinations and accommodation providers in the United States and Canada. Detailed (i.e. non-aggregated) reservation data on different accommodation types (i.e. hotel and non-hotel facilities) are imported monthly from property management companies and vacation rental units, enabling detailed analyses of past and upcoming arrivals and overnights. DestiMetrics offers performance indicators, like occupancy rate, average daily room rate, or revenue per available room (RevPAR), links them with contextual factors that influence tourism demand, like holiday information, and offers benchmarking functionalities for tourism suppliers within a destination as well as between tourism destinations.
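The three accommodation indicators named above follow standard hotel-industry definitions, sketched below with hypothetical figures for a single property and period. The class and field names are assumptions for illustration; DestiMetrics’ own calculation details are not documented here.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    rooms_available: int  # room-nights available in the period (rooms x nights)
    rooms_sold: int       # room-nights sold in the period
    room_revenue: float   # revenue from the sold room-nights

    @property
    def occupancy(self) -> float:
        """Share of available room-nights that were sold."""
        return self.rooms_sold / self.rooms_available

    @property
    def adr(self) -> float:
        """Average daily rate: room revenue per room-night sold."""
        return self.room_revenue / self.rooms_sold

    @property
    def revpar(self) -> float:
        """Revenue per available room-night; equivalently occupancy * ADR."""
        return self.room_revenue / self.rooms_available


# Hypothetical property: 50 rooms over 30 nights, 1,200 room-nights sold
stats = PeriodStats(rooms_available=50 * 30, rooms_sold=1200, room_revenue=120_000.0)
```

Because RevPAR combines occupancy and price, it is a natural benchmarking figure when comparing suppliers that follow different pricing strategies.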
t-stats (www.t-stats.co.uk), a management information system (MIS) for tourism destinations, supports descriptive analyses and benchmarking in the areas of accommodation (e.g. indicators like occupancy rates, average room rate, RevPAR), attractions (e.g. the number of visitors, expenditures per visit), general tourism statistics (e.g. arrivals, expenditures, car parking, visitors to information centres, visits to events and festivals, weather data, exchange rates), customer feedback and satisfaction (based on customizable surveys), and website hits (i.e. web navigation behaviour). Source data are mainly entered manually by tourism stakeholders or destinations at a monthly or daily aggregation level.
In the area of web content mining, the analysis of UGC, like tourists’ comments in blogs or on review platforms, in the form of feedback aggregation, opinion mining or sentiment analysis, attracts the most attention in research as well as in practical applications. Tourism destinations and accommodation providers in particular can benefit from monitoring, collecting and analysing UGC, and different software tools are therefore available and already in practical use by tourism stakeholders, e.g. comprehensive social media monitoring and analysis tools like Trackur (www.trackur.com) and Alterian SM2 (Laine & Frühwirth, 2010); the social media search engine Social Mention, which focuses on real-time aggregation of social media content and point-in-time social media search (www.socialmention.com); Tweettronics, which tracks words and phrases on Twitter and supports competitive and trend analyses of product mentions and customer sentiment (www.tweettronics.com); or even basic tools like Google Alerts (www.google.com/alerts). Kuttainen et al. (2012) evaluated tools and methods for collecting UGC related to tourism destinations, as well as the current attitudes of destination managers and stakeholders, and argue that destination stakeholders do make use of software tools for analysing UGC but still lack a well-structured and efficient analysis approach.
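The statistical and linguistic approaches to sentiment analysis range from machine-learning classifiers to lexicon-based scoring. As an illustration of the latter, simpler family only, the sketch below scores each mention against a tiny hypothetical polarity lexicon with naive negation handling and averages the scores per entity, roughly the kind of aggregate a monitoring dashboard might display. The lexicon, the negation rule and the hotel names are assumptions, not the method of any tool named above.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Tiny hypothetical lexicon; real systems use curated or learned lexicons.
LEXICON = {"great": 1, "friendly": 1, "clean": 1, "dirty": -1, "rude": -1, "noisy": -1}
NEGATORS = {"not", "never", "no"}


def score(text: str) -> int:
    """Sum lexicon polarities; a negator flips the polarity of the word directly after it."""
    total, flip = 0, False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATORS:
            flip = True
            continue
        polarity = LEXICON.get(token, 0)
        total += -polarity if flip else polarity
        flip = False  # negation only affects the next token
    return total


def aggregate(mentions: List[Tuple[str, str]]) -> Dict[str, float]:
    """Average sentiment score per entity (e.g. hotel or destination) across its mentions."""
    scores: Dict[str, List[int]] = defaultdict(list)
    for entity, text in mentions:
        scores[entity].append(score(text))
    return {entity: sum(s) / len(s) for entity, s in scores.items()}


summary = aggregate([
    ("Hotel A", "Great location, friendly staff"),
    ("Hotel A", "The room was not clean and quite noisy"),
    ("Hotel B", "Rude reception"),
])
```

Such word-level scoring misses sarcasm, intensifiers and aspect-specific opinions, which is one reason why the research cited in this section also investigates statistical and more elaborate linguistic approaches.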
DMIS™ (Höpken et al., 2014) offers cross-process knowledge extraction and decision support for tourism destinations, supporting information extraction, data warehousing, and data visualization and analysis for all business processes relevant to a tourism destination, like information requests, web navigation, booking, consumption, location tracking and customer feedback. A homogeneous and comprehensive multi-dimensional data model enables the integration of heterogeneous data from different business processes and data sources into a central, process-overarching destination data warehouse. Conformed dimensions (i.e. dimensions defined uniformly across several processes) facilitate the identification of relationships and patterns across different business processes and the extraction of previously unknown and unavailable knowledge.
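The benefit of conformed dimensions is easiest to see in a "drill-across" query. The sketch below uses Python’s built-in sqlite3 module with a hypothetical schema (table and column names are assumptions, not the DMIS data model): bookings and guest feedback are stored in separate fact tables but share one guest dimension, so booking revenue and satisfaction can be compared per sending country.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Conformed dimension: one definition of 'guest' shared by all processes.
        CREATE TABLE dim_guest (guest_id INTEGER PRIMARY KEY, country TEXT);
        -- Fact tables for two different business processes.
        CREATE TABLE fact_booking (guest_id INTEGER REFERENCES dim_guest, revenue REAL);
        CREATE TABLE fact_feedback (guest_id INTEGER REFERENCES dim_guest, satisfaction INTEGER);
        INSERT INTO dim_guest VALUES (1, 'DE'), (2, 'DE'), (3, 'SE');
        INSERT INTO fact_booking VALUES (1, 300.0), (2, 500.0), (3, 250.0), (3, 150.0);
        INSERT INTO fact_feedback VALUES (1, 4), (2, 2), (3, 5);
    """)
    return con


# Drill-across: aggregate each fact table separately on the conformed attribute,
# then join the two result sets on that attribute.
DRILL_ACROSS = """
    WITH b AS (
        SELECT g.country, SUM(f.revenue) AS revenue
        FROM fact_booking f JOIN dim_guest g USING (guest_id)
        GROUP BY g.country
    ), s AS (
        SELECT g.country, AVG(f.satisfaction) AS avg_satisfaction
        FROM fact_feedback f JOIN dim_guest g USING (guest_id)
        GROUP BY g.country
    )
    SELECT b.country, b.revenue, s.avg_satisfaction
    FROM b JOIN s ON b.country = s.country
    ORDER BY b.country
"""

con = build_demo_warehouse()
rows = con.execute(DRILL_ACROSS).fetchall()
```

Aggregating each process before joining matters: joining the two fact tables row by row would repeat guest 3’s feedback once per booking and distort the averages. The join is meaningful only because both fact tables reference the same, uniformly defined guest dimension.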
Back, A., Enkel, E., & von Krogh, G. (2007). Knowledge networks for business, Springer, NY.
Bloom, J. (2004). Tourist market segmentation with linear and non-linear techniques. Tourism Management, 25(6), 723-733.
Brandt, R. (1988). How service marketers can identify value enhancing service elements. Journal of Services Marketing, 2(3), 35–41.
Bronner, F., & Hoog, R. (2011). Vacationers and eWOM: Who posts, and why, where, and what?. Journal of Travel Research, 50(1), 15-26.
BTS (2012). Transportation on-time performance database. Bureau of Transportation Statistics. Retrieved July 19, 2012, from http://www.transtats.bts.gov/.
Buhalis, D. (2006). The impact of ICT on tourism competition. In: Papatheodorou, A. (ed.), Corporate rivalry and market power: Competition issues in the tourism industry, IB Tauris, London, 143-171.
Chu, F.L. (2004). Forecasting tourism demand: A cubic polynomial approach. Tourism Management, 25, 209-218.
Fuchs, M. (2004a). Pilot Project Destinometer™: The Tyrolean Benchmarking System (In German: “Pilotprojekt DESTINOMETER® - Benchmarkingsystem des Tiroler Tourismus”). Tourismus Journal, 7(1), 65-76.
Fuchs, M. & Höpken, W. (2005). Towards @Destination: A Data Envelopment Analysis based Decision Support Framework. In: A. Frew, ed. Information and Communication Technologies in Tourism. New York: Springer, 57-66.
Fuchs, M., & Höpken, W. (2009). Data mining in tourism (In German: „Data Mining im Tourismus“), Praxis der Wirtschaftsinformatik, 270(12), 73-81.
Fuchs, M., Höpken, W., Föger, A., & Kunz, M. (2010). E-business readiness, intensity, and impact – an Austrian destination management organization study, Journal of Travel Research, 49(2), 165-178.
Garrow, L., & Koppelman, F. (2004). Predicting air travelers’ no-show and standby behavior using passenger and directional itinerary information. Journal of Air Transport Management, 10, 401-411.
Goh, C., Law, R., & Mok, H.M.K. (2008). Analyzing and forecasting tourism demand: A rough sets approach. Journal of Travel Research, 46(3), 327-338.
Gräbner, D., Zanker, M., Fliedl, G., & Fuchs, M. (2012). Classification of customer reviews based on sentiment analysis. In: Fuchs, M., Ricci, F., & Cantoni, L. (eds.). Information and Communication Technologies in Tourism, Springer, Wien NewYork, 460-470.
Höpken, W., Fuchs, M., Keil, D., & Lexhagen, M. (2011). The knowledge destination – a customer information-based destination management information system. In: Law, R., Fuchs, M., & Ricci, F. (eds.). Information and Communication Technologies in Tourism, Springer, New York, 417-429.
Höpken, W., Fuchs, M. & Lexhagen, M. (2014). The Knowledge Destination – Applying Methods of Business Intelligence to Tourism Applications. In: Encyclopedia of Business Analytics and Optimization. Hershey, PA: IGI Global, 2542-2556.
Hueglin, C., & Vannotti, F. (2001). Data mining techniques to improve forecast accuracy in airline business. KDD ’01, San Francisco, CA, USA, 438-442.
Iliescu, D.C. (2006). Analysis of U.S. airline passengers refund and exchange behaviour across multiple airlines. AGIFORS Anna Valicek Competition.
Jiang, N., & Gruenwald, L. (2006). Research issues in data stream association rule mining. SIGMOD Record, 35(1), 14-19.
Kano, N. (1984). Attractive Quality and Must-be Quality. The Journal of the Japanese Society for Quality Control, 14(2), 39–48.
Kasper, W., & Vela, M. (2011). Sentiment analysis for hotel reviews. Proceedings of the Computational Linguistics-Applications Conference, 45–52.
Kepplinger, D. (2006). Tourism WEBMART - Interactive data collection and presentation of results via online databases (In German: „Tourismus WEBMART - Interaktive Datenerfassung und Ergebnisdarstellung durch Online-Datenbanken“). In: R. Bachleitner, R. Egger & T. Herdin, eds. Innovations in tourism research: Methods and applications (In German: „Innovationen in der Tourismusforschung: Methoden und Anwendungen“). Wien: Lit Verlag, 63-76.
Kuo, R.J., Akbaria, K., & Subroto, B. (2012). Application of particle swarm optimization and perceptual map to tourist market segmentation. Expert Systems with Applications, doi: 10.1016/j.eswa.2012.01.208.
Kuttainen, C., Lexhagen, M., Fuchs, M. & Höpken, W. (2012). Social media monitoring and analysis in tourism. In: E. Christou, D. Chionis, D. Gursoy & M. Sigala, eds. Advances in Hospitality and Tourism Marketing & Management.
Laine, M. & Frühwirth, C. (2010). Monitoring Social Media: Tools, Characteristics and Implications. Business Information Processing, 51(2), 193-198.
Law, R. (1998). Room occupancy rate forecasting – A neural network approach. International Journal of Contemporary Hospitality Management, 10(6), 234-239.
Lawrence, R.D., Hong, S.J., & Cherrier, J. (2003). Passenger-based predictive modeling of airline no-show rates. SIGKDD ’03.
Lexhagen, M., Kuttainen, C., Fuchs, M. & Höpken, W. (2012). Destination Talk in Social Media: A Content Analysis for Innovation. In: E. Christou, D. Chionis, D. Gursoy & M.
McGill, J.I., & Van Ryzin, G.J. (1999). Revenue management: Research overview and prospects. Transportation Science, 33(2), 233-255.
Min, H., & Emam, A. (2002). A data mining approach to develop the profile of hotel customers. International Journal of Contemporary Hospitality Management, 14(6), 274-285.
Morales, D.R., & Wang, J. (2008). Passenger name record data mining based cancellation forecasting for revenue management. European Journal of Operational Research, 202(2), 554-562.
Pitman, A., Zanker, M., Fuchs, M., & Lexhagen, M. (2010). Web usage mining in tourism – a query term analysis and clustering approach. In: Gretzel, U., Law, R., & Fuchs, M. (eds.). Information and Communication Technologies in Tourism. Springer, Wien NewYork, 393-403.
Pyo, S., Uysal, M., & Chang, H. (2002). Knowledge discovery in databases for tourist destinations. Journal of Travel Research, 40(4), 396-403.
Pyo, S. (2005). Knowledge map for tourist destinations. Tourism Management, 26(4), 583-594.
Sambamurthy, V., & Subramani, M. (2005). Information technologies and knowledge management, Management Information Systems Quarterly, 29(1), 1-7.
Schmunk, S., Höpken, W., Fuchs, M. & Lexhagen, M. (2014). Sentiment analysis – extracting decision-relevant knowledge from UGC. In: Z. Xiang & I. Tussyadiah, eds. Information and Communication Technologies in Tourism. Heidelberg: Springer, 253-265.
Smith, B. C., Leimkuhler, J. F., & Darrow, R. M. (1992). Yield management at American airlines, Interfaces, 22(1), 8-31.
Subramanian, J., Stidham, S., & Lautenbacher, C. J. (1999). Airline yield management with overbooking, cancellations, and no-shows. Transportation Science, 33(2), 147-167.
Vlahogianni, E.I., & Karlaftis, M.G. (2010). Advanced computational approaches for predicting tourist arrivals: The case of charter air-travel. Evans, T. (ed.). Nonlinear Dynamics, 309-324.
Walchhofer, N., Hronsky, M., Pöttler, M., Baumgartner, R., & Fröschl, K.A. (2010). Semantic online tourism market monitoring. In: Gretzel, U., Law, R., & Fuchs, M. (eds.). Information and Communication Technologies in Tourism. Springer, Wien NewYork, 629-641.
Wallace, M., Maglogiannis, I., Karpouzis, K., Kormentzas, G., & Kollias, S. (2004). Intelligent one-stop-shop travel recommendations using an adaptive neural network. Information Technology & Tourism, 6, 181-193.
Weiermair, K. & Fuchs, M. (2007). Productivity Differentials across Tourist Destinations - A Theoretical / Empirical Analysis. In: P. Keller & T. Bieger, eds. Productivity in Tourism - Fundamentals and Concepts for Achieving Growth and Competitiveness. Berlin: Erich Schmidt Verlag, 41-54.
Wöber, K.W. (1998). Global statistical sources: TourMIS: An adaptive distributed marketing information system for strategic decision support in national, regional, or city tourist offices. Pacific Tourism Review, 2(3), 273-286.
Wong, J.-Y., Chen, H.-J., Chung, P.-H., & Kao, N.-C. (2006). Identifying valuable travellers by the application of data mining. Asia Pacific J. of Tourism Research, 11(4), 355-373.
Xia, J., Evans, F. H., Spilsbury, K., Ciesielski, V., Arrowsmith, C., & Wright, G. (2010). Market segments based on the dominant movement patterns of tourists. Tourism Management, 31, 464-469.
Cho, V., & Leung, P. (2002). Knowledge discovery techniques in database marketing for the tourism industry, J. of Quality Assurance in Hospitality & Tourism, 3(3), 109-131.
Fuchs, M., Chekalina, T., & Lexhagen, M. (2011). Destination brand equity modelling and measurement – a summer tourism case from Sweden, In: Tsiotsou, R. H., & Goldsmith, R.E (eds.), Strategic Marketing in Tourism Services, Emerald Group Publishing Ltd., London, 95-116.
Golfarelli, M., Maio, D., & Rizzi, S. (1998). The dimensional fact model: A conceptual model for data warehouses. Bologna, University of Bologna.
Harren, A. (2001). Multidimensional modeling language (MML) and multidimensional UML (MUML). In: Data warehouse systems: Architecture, development, application (In German: „Data-Warehouse-Systeme: Architektur, Entwicklung, Anwendung“). dpunkt Verlag, 163-167.
Höpken, W. et al. (2012). Digitalizing Loyalty Cards in Tourism. New York, Springer, 272-283.
Höpken, W., Scheuringer, M., Linke, D. & Fuchs, M. (2008). Context-based Adaptation of ubiquitous Web Applications in Tourism. In: P. O’Connor, W. Höpken & U. Gretzel, eds. Information and Communication Technologies in Tourism. New York: Springer, 533-544.
Inmon, W.H. (2002). Building the data warehouse. Wiley & Sons, New York.
Kasavana, M.L., & Knutson, B.J. (1999). A primer on software: warehousing, marting and mining hospitality data for more effective marketing decisions. Journal of Hospitality and Leisure Marketing, 6(1), 83-96.
Kimball, R., Ross, M., Thornthwaite, W., Mundy, J., & Becker, B. (2008). The data warehouse lifecycle toolkit: Practical techniques for building data warehouse and business intelligence systems. 2nd Edition, Indianapolis, Indiana, Wiley.
Lau, K-N., Lee, K.-H., & Ho, Y. (2005). Text mining for the hotel industry. Cornell Hotel and Restaurant Administration Quarterly, 46, August, 344-362.
Liu, B. (2008). Web data mining. New York, Springer.
Lujan-Mora, S., Trujillo, J., & Song, I.-Y. (2006). A UML profile for multidimensional modeling in data warehouses. Data & Knowledge Engineering, 59(3), 725-769.
Manning, C. D., & Schütze, H. (2001). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
Magnini, V., Honeycutt, E. Jr., & Hodge, S. (2003). Data mining for hotel firms: Use and limitations. Cornell Hotel and Restaurant Administration Quarterly, 44, December, 94-105.
Olmeda, I., & Sheldon, P. (2002). Data Mining Techniques and Applications for Tourism Internet Marketing. Journal of Travel & Tourism Marketing, 11(2/3), 1-20.
Sapia, C., Blaschka, M., Höfling, G., & Dinter, B. (1998). Extending the E/R model for the multidimensional paradigm. Proc. International Workshop on Data Warehouse and Data Mining (DWDM, in connection with ER'98), Nov 19-20, Singapore.
Wong, J.-Y., Chen, H.-J., Chung, P.-H. & Kao, N.-C. (2006). Identifying valuable Travellers by the Application of Data Mining. Asia Pacific J. of Tourism Research, 11(4), 355-373.
Zanker, M., Jessenitschnig, M. & Fuchs, M. (2010). Automated Semantic Annotation of Tourism Resources based on Geo-Spatial Data. Information Technology and Tourism, 11(4), 341-354.