[O’Reilly] How to Use Synthetic Data to Advance Artificial Intelligence and Machine Learning (2020)



Accelerating AI with Synthetic Data
Generating Data for AI Projects
by Khaled El Emam

Copyright © 2020 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Jonathan Hassell
Development Editor: Melissa Potter
Production Editor: Daniel Elfanbaum
Copyeditor: Sharon Wilkey
Proofreader: Shannon Turlington
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

June 2020: First Edition

Revision History for the First Edition
2020-06-03: First Release

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Accelerating AI with Synthetic Data, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk.
If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

This work is part of a collaboration between O’Reilly and NVIDIA. See our statement of editorial independence.

Table of Contents

1. Defining Synthetic Data
   What Is Synthetic Data?
   The Benefits of Synthetic Data
   Learning to Trust Synthetic Data
   Other Approaches to Accessing Data
   Generating Synthetic Data from Real Data
   Conclusions

2. The Synthesis Process
   Data Synthesis Projects
   The Data Synthesis Pipeline
   Synthesis Program Management
   Best Practices for Implementing Data Synthesis
   Conclusions

3. Synthetic Data Case Studies
   Manufacturing and Distribution
   Health Care
   Financial Services
   Transportation
   Conclusions

4. The Future of Data Synthesis
   Creating a Data Utility Framework
   Removing Information from Synthetic Data
   Using Data Watermarking
   Generating Synthesis from Simulators
   Conclusions

Defining Synthetic Data

Interest in synthetic data has been growing quite rapidly over the last few years. This has been driven by two simultaneous trends. The first is the demand for large amounts of data to train and build artificial intelligence and machine learning (AIML) models. The second is recent work that has demonstrated effective methods to generate high-quality synthetic data. Both have resulted in the recognition that synthetic data can solve some difficult problems quite effectively, especially within the AIML community.
Groups and businesses within companies like NVIDIA, IBM, and Alphabet, as well as agencies such as the US Census Bureau, have adopted different types of data synthesis to support model building, application development, and data dissemination.

This report provides a general overview of synthetic data generation, with a focus on the business value and use cases, and high-level coverage of techniques and implementation practices. We aim to answer the questions that a business reader would typically ask (and has typically asked), but at the same time provide some direction to analytics leadership seeking to understand the options available and where to look to get started.

We show how synthetic data can accelerate AIML projects. Some problems that can be tackled by using synthetic data would be too costly or dangerous (e.g., in the case of training models controlling autonomous vehicles) to solve using more traditional methods, or simply cannot be done otherwise.

AIML projects run in different industries, and the multiple industry use cases that we include in this report are intended to give you a flavor of the broad applications of data synthesis. We define an AIML project quite broadly as well, to include, for example, the development of software applications that have AIML components.

The report is divided into four chapters. This introductory chapter covers basic concep