
Third Challenge A2: Data Mining Analysis of Consumer Demand and Products for Home Appliances on E-commerce Platforms

2015-11-23 TIPDM 米软绵gogo

National College Student Data Mining Competition, Outstanding Entry
Title: Data Mining Analysis of Consumer Reviews of Home Appliances on E-commerce Platforms
Award: First Prize
Institution: Southeast University (东南大学)
Team members: 杨情吴俊康
Advisor:

Abstract:

In a traditional market, salespeople and consumers deal face to face. Throughout the sale, an experienced salesperson can pick up on what consumers need and how they compare products, and from this infer their consumption patterns. In e-commerce there is no person-to-person interaction; all that remains are the traces consumers leave on the platform. These traces link consumers to their needs, so data mining is required to uncover the relationship between the traces and what consumers need and want. How to mine this relationship is the subject of this paper.

The study is based mainly on consumer review data. We first segment the Chinese text with the ICTCLAS word segmentation tool. So that a computer can work with Chinese words, we then use word2vec to convert each word into a corresponding word vector. Finally, we classify and cluster the word vectors. On a small subset of the data, a support vector machine classifies reviews as positive or negative with accuracy as high as 95%, which indicates that the approach mines the review data well.

We find that users buy water purifiers because purifiers improve their water quality and protect their health. Not having to buy bottled water also saves them time and money. Ease of installation is the concern shared by all users when buying a water purifier; service, purified-water quality, and product convenience are also high priorities. Consumers on different shopping platforms have distinct needs: users on Gome (国美) care whether the product is genuine, users on Suning (苏宁) pay more attention to water quality after purification, and users on Taobao (淘宝) have high expectations for seller dispatch and logistics speed. We also compare the major brands and identify each one's main selling points and areas for improvement. 3M performs well on every aspect users care about, has a large user base and a strong brand effect, and provides standardized service. Qinyuan (沁园) has a clear advantage in product appearance and has dedicated service staff, but users must book installation in advance, and the purification process produces a relatively large amount of wastewater. Tehaoen (特浩恩) is not strongly competitive among the four brands, and its delivery service in particular needs improvement. Dalton (道尔顿) offers fair prices and enthusiastic service and has many repeat customers, but its audience is small and it needs to step up its publicity.

Keywords: Chinese word segmentation, word vector, support vector machine (SVM), K-means clustering

Contents
1. Mining objectives
2. Overall workflow
3.1. Removing spaces and punctuation
3.2. Removing default reviews and paid-poster ("water army") reviews
3.3. Removing stop words
4.1. Overview of Chinese word segmentation methods
4.2. Word segmentation with ICTCLAS
4.3. Sample segmentation results
4.4. Chapter summary
6. Document model
6.1. Overview of document models
6.2. Word vectors based on the vector space model
6.3. The vector model and its applications
7. Word vector clustering and classification