
First Challenge Competition A4: Airline Passenger Information Mining

2013-04-22 TIPDM Zt
Report cover

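Title: Airline Passenger Information Mining
Team leader: Wang Junxiao (王军晓)
Team members: Zhou Yulai (周雨来), Ding Cheng (丁铖)
School (department): Dalian Maritime University (School of Information Science and Technology, class of 2010)
Advisor: Feng Shigang (冯士刚)
Completion date: April 15
Overall score:
Judges' comments:
Judges' signatures:

Airline Frequent Flyer Information Mining

Abstract (translated from the Chinese original): Raising the load factor of passenger flights makes fuller use of airline resources and can markedly increase an airline's revenue. To this end, we start from a large volume of airline membership data and use data mining to build customer segmentation, customer value evaluation, and churn prediction models, distinguish customer groups, and propose corresponding marketing strategies, with the goal of raising load factor and revenue.

First, we build the customer segmentation model. Drawing on the literature and after preprocessing the available data, we select five indicators, L, R, F, M, and C, as the core dimensions of airline customer segmentation, and use LRFMC cluster analysis to group customers and give them a preliminary score. The method uses the analytic hierarchy process (AHP) to compute the weight of each core dimension, standardizes the data, and applies K-means clustering in SPSS to divide all customers into 32 categories, each with its own characteristics. The weights are then used to compute a composite score for each customer group, which sorts the airline's customers into five levels: important customers to maintain, important customers to develop, important customers to retain, loyal general customers, and low-value customers.

Next, we build the customer value evaluation model. We preprocess the data for the five customer groups produced by the segmentation model and select the 14 attributes with the greatest influence on customer value as the inputs to principal component analysis. Using SPSS, we perform factor analysis and dimensionality reduction. The results show that the 14 attributes can be represented by 2 principal components, and they also give the weight of each attribute in each component. After standardizing the data and applying these weights, we compute a composite score for each customer group, which serves as the basis for the value ranking.

Then we build the customer churn model. We define the customer return rate (the ratio of a customer's number of flights in the second year to that in the first year) and use 0.5 and 0.8 as two thresholds to divide existing customers into three types: churned, near-churn, and retained customers. We select a number of dimensions, together with dimensions derived from them, and build the churn model with two methods, decision trees and neural networks, to identify the key factors behind customer churn. Comparing the two methods, we conclude that the more important factors affecting churn are the average discount rate, the fare per unit mileage, and the points earned per unit mileage.

Finally, based on the conclusions of the segmentation, value evaluation, and churn models, we propose different service and marketing strategies for the different customer groups, to attract customers to fly and so raise load factor and revenue.

Keywords: LRFMC cluster analysis; principal component analysis; SPSS; decision tree; neural network

Airline frequent flyer information mining

Abstract: Improving the load factor of scheduled flights not only makes full use of aviation resources but also effectively increases an airline's revenue. To meet this goal, we apply data mining to a large volume of airline member data to build a customer segmentation model, a customer value model, and a customer churn model, distinguish customer groups, and put forward corresponding marketing strategies, so that the airline can improve its load factor and revenue.

First, the customer segmentation model is built.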
Following the literature, we first preprocess the data and then select five indicators (L, R, F, M, and C) as the core dimensions of customer segmentation. We use LRFMC cluster analysis to segment customers and give them a preliminary evaluation. This method calculates the weights of the core dimensions by AHP, standardizes the data, and then applies K-means clustering in SPSS to segment all customers into 32 groups, whose characteristics are also identified. Finally, we calculate composite scores using the weights of the core dimensions and divide the airline's customer groups into five levels: important-maintaining clients, important-developing clients, important-retaining clients, general clients, and low-value clients.

Second, we build the customer value model. We preprocess the data resulting from the customer segmentation model and choose the 14 attributes that most influence customer value as the factors used in principal component analysis. Factor analysis and dimensionality reduction are carried out in SPSS. The results show that 2 principal components can stand in for the 14 attributes, and they also give the weight of each attribute in the 2 principal components. After standardizing the data, we use these weights to calculate composite scores for all 5 customer groups, and these scores are what we use to evaluate the customers.

Third, the customer churn model is built. We define the customer return rate (the ratio of a customer's number of flights in the second year to that in the first year) and, using 0.5 and 0.8 as critical values, divide regular customers into three types: lost customers, prospective lost customers, and non-lost customers.
At the same time, we choose some dimensions, along with dimensions derived from them, to build the customer churn model with two methods, decision trees and neural networks, so that we can find the key factors that influence customer churn. After comparing the two methods, we find that the relatively important factors affecting customer churn are the average discount rate, the fare per unit mileage, and the points earned per unit mileage.

Finally, according to the conclusions of the three models, different service and marketing strategies are put forward for different customer groups. With these strategies, more customers may choose to fly with this airline, and the load factor should improve as well.

Key words: LRFMC cluster analysis; principal component analysis; SPSS; decision tree; neural network

Contents

1. Mining objectives
3. Data extraction
3.1 Data reduction techniques
4.1 Data cleaning
4.2 Data integration