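National College Student Data Mining Competition: Outstanding Entry

Entry title: Data Mining Analysis of Consumer Reviews of Home Appliances on E-Commerce Platforms
Award: Second Prize
Institution: Hanshan Normal University (韩山师范学院)
Team members: He Ganghao (何钢浩), Gao Tongqiang (高童强), Wei Zhenyu (韦振宇)
Advisor: Liu Bo (刘波)

Data Mining Analysis of Consumer Demand and Products for Home Appliances on E-Commerce Platforms

Abstract

This paper uses the Octopus (八爪鱼) scraping software to crawl water purifier review and purchase data from several e-commerce platforms. Sentiment analysis, statistical analysis, and a multiple regression model are then applied to analyze consumers' personalized needs from several angles, to characterize each brand and the current state of the e-commerce platforms, and to identify the operational strengths and weaknesses of each brand and platform. Finally, some consumption habits are inferred from an analysis of consumers' purchase paths, and suggestions are offered to the platforms and brands.

For Problem 1, a web crawler is used to collect consumers' water purifier choices and reviews. Keywords are extracted in R to identify the main attributes of each product, and a word cloud is drawn. The attributes consumers care about turn out to be price, after-sales service, quality, logistics, and brand. The crawled data are then summarized statistically to find the peak (most frequent) value of each attribute. Working through the attributes step by step shows that the typical personalized demand is for a product with an acceptable price level of around 2500, a purification level of 4 or higher, and free installation.

For Problems 2, 3, and 4, sentiment analysis is used to compute a sentiment score for each attribute of each brand. The scores are tabulated and then plotted as bar charts by attribute for comparison. The brands differ markedly in their characteristics, but overall, on the product attributes that consumers generally care about, the leading brands hold an advantage over their peers. Because different consumers have different needs, sales follow accordingly, and these attribute differences give each brand its own character. Given the personalized needs found in Problem 1, the sales advantage of 3M and Midea (美的) is inseparable from how closely their attributes match those needs, which is why their products outsell the others. From this perspective, whether a brand's attribute profile matches consumers' personalized needs largely shapes purchasing behavior.

For the final problem, consumer purchasing behavior is analyzed with a multiple regression model, which fits well. In general, consumers choose a brand mainly on product quality and after-sales service. In terms of purchase path, the keywords consumers search for when shopping online and the points they focus on when buying are closely tied to the differentiating features of each brand, and consumers' post-purchase use and reviews in turn feed back into their later personalized needs.

Keywords: text mining, sentiment analysis, stepwise regression, personalized needs, R language

Abstract (English version)

Review data on water purifiers are crawled from several e-commerce platforms with the Octopus software. Sentiment analysis, a clustering model, statistical analysis, and other methods are used to analyze consumers' personalized needs from multiple angles and to characterize the brands and the existing e-commerce platforms, leading to conclusions about the operational advantages and disadvantages of each brand and platform. Finally, some consumption habits are drawn from an analysis of consumers' purchase paths, and opinions and suggestions are offered to the platforms and brands.

For Problem 1, a web crawler collects consumers' water purifier choices and reviews. Keywords for the main product attributes are extracted in R and rendered as a word cloud to find the attributes consumers care about. The crawled data are summarized with mathematical statistics to obtain the peak value of each attribute, a clustering model is built on the basis of this analysis, and the general personalized needs of consumers are derived.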

For Problems 2, 3, and 4, sentiment analysis is used to compute sentiment scores for the attributes of each brand. The scores are tabulated, and bar charts of the attributes are drawn for comparison. The brands differ significantly in their characteristics, but overall, on the product attributes that consumers generally care about, the leading brands hold an advantage over their competitors. Because different consumers have different needs, sales follow closely, and these attribute differences form the distinctive character of each brand. Given the personalized needs found in Problem 1, the sales advantage of 3M and Midea is inseparable from how well their attributes match those needs, which is why their products outsell the others. From this perspective, whether a brand's attribute differences match consumers' personalized needs affects consumer behavior to a large extent.

For the final problem, consumer purchasing behavior is analyzed. From the perspective of purchase path, the keywords consumers search for when shopping online and the points they focus on when purchasing are closely related to each brand's differentiating features, and consumers' use and reviews after purchase in turn affect their later personalized needs.

Keywords: sentiment analysis, stepwise regression, clustering analysis, personalized needs, R language

Contents

1. Problem Restatement
  1.1. Problem Background
  1.2. Problem Statement
2. Problem Analysis
  2.1. Overview
  2.2. Analysis of Model Construction
4. Model Construction and Solution
  4.1. Data Preprocessing
  4.2. Model Preparation
    4.2.1. Text Sentiment Analysis
    4.2.2. Regression Model
  4.3. Model Construction and Solution
    4.3.1. Personalized Demand Analysis
    4.3.2