Practical Synthetic Data Generation

Balancing Privacy and the Broad Availability of Data

By Khaled El Emam

Building and testing machine learning models requires access to large and diverse data. But where can you find usable datasets without running into privacy issues? This practical book introduces techniques for generating synthetic data (fake data generated from real data) so you can perform secondary analyses for research, understand customer behavior, develop new products, or generate new revenue.

Data scientists will learn how synthetic data generation provides a way to make such data broadly available for secondary purposes while addressing many privacy concerns. Analysts will learn the principles and steps for generating synthetic data from real datasets. And business leaders will see how synthetic data can help shorten the time it takes to deliver a product or solution.

This book describes:

  • Steps for generating synthetic data using multivariate normal distributions
  • Methods for distribution fitting covering different goodness-of-fit metrics
  • How to replicate the simple structure of the original data
  • An approach for modeling data structure to consider complex relationships
  • Multiple approaches and metrics you can use to assess data utility
  • How analysis performed on real data can be replicated with synthetic data
  • Privacy implications of synthetic data and methods to assess identity disclosure
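To make the first topic above concrete, here is a minimal sketch of the multivariate normal approach: estimate the mean vector and covariance matrix from the real data, then draw new records from that fitted distribution. It is an illustration only, not the book's code. It assumes NumPy is available and that every column is continuous and roughly jointly normal. The function name `synthesize_mvn` is hypothetical, and the "real" dataset is itself simulated here for demonstration.

```python
import numpy as np


def synthesize_mvn(real: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: fit a multivariate normal to `real`
    (rows = records, columns = variables) and sample synthetic records."""
    mean = real.mean(axis=0)            # per-variable means
    cov = np.cov(real, rowvar=False)    # variable-by-variable covariance
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Illustrative "real" data: two correlated continuous variables
# (true correlation 60 / (10 * 15) = 0.4), simulated for this example only.
rng = np.random.default_rng(42)
real = rng.multivariate_normal(
    [50.0, 120.0], [[100.0, 60.0], [60.0, 225.0]], size=5000
)

synthetic = synthesize_mvn(real, n_samples=5000)

# Compare the correlation between the two variables in real vs. synthetic data.
corr_real = np.corrcoef(real, rowvar=False)[0, 1]
corr_synth = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(synthetic.shape, round(corr_real, 3), round(corr_synth, 3))
```

Because only the means and covariances are carried over, this sketch preserves linear relationships between variables but not skewed marginals, categorical variables, or nonlinear structure; those limitations are why the other topics listed above (distribution fitting and modeling more complex relationships) matter.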