
Want To Reduce Labeling Cost? GPT-3 Can Help

2021-08-30 · Microsoft

Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng
Microsoft Cognitive Services Research Group
{shuowa,yaliu10,yicxu,chezhu,nzeng}@microsoft.com

Abstract

Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although there exist various methods to produce pseudo data labels, they are often task-specific and require a decent amount of labeled data to start with. Recently, the immense language model GPT-3, with 175 billion parameters, has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that, to make the downstream model achieve the same performance on a variety of NLU and NLG tasks, it costs 50% to 96% less to use labels from GPT-3 than to use labels from humans. Furthermore, we propose a novel framework that combines pseudo labels from GPT-3 with human labels, which leads to even better performance under a limited labeling budget. These results present a cost-effective data labeling methodology that is generalizable to many practical applications.

1 Introduction

Data always plays a crucial role in developing machine learning models. However, collecting human-labeled data is a costly and time-consuming process, especially in multi-task scenarios. With the success of pre-trained models (Zhang et al., 2020; Raffel et al., 2020; Liu et al., 2019; Devlin et al., 2019) on unlabeled data, the performance of models under few-shot and zero-shot settings has been greatly enhanced. In particular, the large-scale language model GPT-3 (Brown et al., 2020), with 175 billion parameters, is the state-of-the-art few-shot learner on many NLP tasks.

However, GPT-3 is constrained by its immense model size and requires a large amount of resources to be deployed for real applications. Moreover, GPT-3 does not provide a free lunch: its public API charges in proportion to the number of processed tokens¹. Thus, an interesting problem arises: instead of directly deploying GPT-3 for downstream tasks, how can we leverage GPT-3 to achieve a more cost-effective and efficient training of other models?

In this paper, we employ GPT-3 to label unannotated data to train smaller models, which are then deployed for inference. Although data labeled by GPT-3 is usually noisier than human-labeled data, the process is much cheaper, faster, and generalizable to multiple tasks. For example, for the Stanford Sentiment Treebank (SST-2) task (Socher et al., 2013), it takes as little as 0.002 dollars on average to annotate one label with the GPT-3 API, whereas it costs 0.11 dollars to label an instance on crowd-sourcing platforms. In addition, the GPT-3 API can label data non-stop and at a much faster speed than human labelers.

In our extensive empirical analysis, we find that to make in-house models (e.g., PEGASUS (Zhang et al., 2020), RoBERTa (Liu et al., 2019)) achieve the same performance on various NLU and NLG tasks, data labeled by GPT-3 incurs a much lower cost (e.g., 50%-95% lower) than data labeled by humans, especially in low-resource settings. Moreover, we find that these in-house models trained with data labeled by GPT-3 can outperform GPT-3 itself under the few-shot setting, for which we give theoretical justifications.

In addition to using labeled data from a single source, we explore ways to smartly assign unlabeled data to different labelers, i.e., GPT-3 and humans, under a fixed budget. We frame this as a dual supervision problem (Jung and Shim, 2020) with cost and budget constraints. In detail, we tried mixing data labeled by GPT-3 and humans at different ratios: 25%, 50%, and 75% of the budget. Moreover, we propose an active labeling strategy in which humans re-annotate the data labeled by GPT-3 with the lowest confidence scores. Both strategies manifest clear improvement over using a single source of labels.

We conduct a comprehensive empirical analysis of our proposed cost-effective labeling strategies on 9 NLP tasks, including text entailment (Dagan et al., 2005; De Marneffe et al., 2019), sentiment analysis (Socher et al., 2013), topic classification (Zhang et al., 2015), answer type classification (Voorhees and Tice, 2000), summarization (Rush et al., 2015; Narayan et al., 2018), and question generation (Rajpurkar et al., 2016). We show that our labeling strategy can significantly reduce labeling cost while achieving the same performance as human-labeled data. For instance, our method saves 96% of the cost on the sentence classification task SST-2, 93.8% on the summarization task Gigaword, and 50-75% on other tasks.

Table 1: Cost ($) per GPT-3 and human label. #Tok is the average number of tokens in the corresponding dataset. Different GPT-3 few-shot labeling strategies are charged differently based on sequence length. The final cost per label for n-shot GPT-3 is #tok × 4×10⁻⁵ × (n+1), where 4×10⁻⁵ is the cost GPT-3 charges per token. For human labeling, it costs $0.11 per 50 input tokens with a min
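The cost formula in the Table 1 caption can be turned into a small worked example. The sketch below uses only the rates stated in the caption ($4×10⁻⁵ per GPT-3 token, $0.11 per 50 input tokens for humans); the function names and the 50-token example dataset are ours, and the caption's truncated minimum-charge clause for human labeling is ignored:

```python
def gpt3_cost_per_label(avg_tokens: float, n_shot: int) -> float:
    """Cost ($) of one n-shot GPT-3 label: #tok x 4e-5 x (n+1).

    The prompt contains roughly #tok tokens once per in-context
    example plus once for the instance being labeled, hence (n+1).
    """
    return avg_tokens * 4e-5 * (n_shot + 1)


def human_cost_per_label(avg_tokens: float) -> float:
    """Human labeling at $0.11 per 50 input tokens (any minimum
    charge from the truncated caption is not modeled here)."""
    return 0.11 * avg_tokens / 50


# Hypothetical dataset averaging 50 tokens per instance:
zero_shot = gpt3_cost_per_label(50, 0)   # ~ $0.002 per label
one_shot = gpt3_cost_per_label(50, 1)    # ~ $0.004 per label
human = human_cost_per_label(50)         # ~ $0.11 per label
```

Under these assumed numbers, zero-shot GPT-3 labeling is roughly 55 times cheaper per instance than human labeling, matching the $0.002 vs. $0.11 comparison given for SST-2 in the introduction.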
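The active labeling strategy from the introduction (humans re-annotate the GPT-3 labels with the lowest confidence scores) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the routing output format, and the budget accounting are our assumptions, and the confidence scores are taken as given (in practice they could come from GPT-3 token log-probabilities):

```python
def active_relabel(examples, gpt3_confidence, human_budget, cost_per_human_label):
    """Route the GPT-3-labeled examples with the lowest confidence
    scores to human annotators, as many as the budget allows; keep
    the GPT-3 labels for everything else."""
    # Indices sorted by ascending GPT-3 confidence: least certain first.
    order = sorted(range(len(examples)), key=lambda i: gpt3_confidence[i])
    n_human = int(human_budget // cost_per_human_label)
    to_human = set(order[:n_human])
    return [(ex, "human" if i in to_human else "gpt3")
            for i, ex in enumerate(examples)]


# With $0.22 of human budget at $0.11 per label, the two
# least-confident examples ("b" and "c") are re-annotated:
routed = active_relabel(["a", "b", "c", "d"],
                        gpt3_confidence=[0.9, 0.2, 0.5, 0.95],
                        human_budget=0.22,
                        cost_per_human_label=0.11)
```

The design point mirrors the paper's framing: the human budget is spent exactly where GPT-3 is least reliable, rather than on a random subset of the data.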