Synthetic Data Generation - Service Excellence by Kinetic Vision

A lack of sufficient training data is often cited as the major obstacle in developing deep learning systems. Creating and labeling enough data from physical testing and other non-algorithmic methods, such as photography, can be extremely time-consuming or even impossible. The problem is compounded when the product or process being studied is still under development and no physical data exists, or when the items of interest are rare and therefore underrepresented in the physical dataset.

We create datasets using a variety of proprietary synthetic data generation methods that virtually eliminate these obstacles to machine learning development. The synthetic datasets can be multi-class and are developed for both regression and classification problems. Data annotation is automatic, adds no cost, and is 100% accurate. Our synthetic datasets are delivered in a database with a labeling schema designed to your requirements. Contact us to discuss your particular machine learning data needs.

Learn more about synthetic data at Wikipedia.

Synthetic Data Generation for a Retail Merchandising Audit System

In this example created by Deep Vision Data, a deep learning model based on the ResNet101 architecture was trained to classify product SKUs, stock-outs, and mis-merchandised products for a retail store merchandising audit system. The model was trained on 20,000 synthetic product images, split 50-50 between structured and unstructured domain-randomized subsets, with an 80-20 training/validation split. Validation was likewise performed entirely on synthetic data. The test set consisted of real photographs; a sample of the resulting images is shown to the right.
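The split described above can be made concrete with a little arithmetic. A 50-50 structured/unstructured mix of 20,000 images gives 10,000 images per subset. An 80-20 training/validation split then leaves 16,000 images for training and 4,000 for validation.

The Python sketch below is illustrative only. Splitting each domain-randomized subset separately, so that both the training and validation sets keep the 50-50 mix, is an assumption and not a documented detail of the project. The function name `split_dataset` and the integer image IDs are also hypothetical.

```python
import random

# Illustrative sketch: the figures come from the example above (20,000 images,
# 50-50 structured/unstructured DR subsets, 80-20 train/validation split).
# Splitting each subset separately (a stratified split) is an assumption.
TOTAL_IMAGES = 20_000
TRAIN_PERCENT = 80


def split_dataset(total, train_percent, seed=0):
    """Return (train, val) lists of (image_id, subset_name) pairs."""
    rng = random.Random(seed)
    half = total // 2
    subsets = [
        [(i, "structured") for i in range(half)],
        [(i, "unstructured") for i in range(half, total)],
    ]
    train, val = [], []
    for items in subsets:
        rng.shuffle(items)  # randomize order before cutting
        cut = len(items) * train_percent // 100  # integer math avoids float rounding
        train.extend(items[:cut])
        val.extend(items[cut:])
    return train, val


train, val = split_dataset(TOTAL_IMAGES, TRAIN_PERCENT)
print(len(train), len(val))  # 16000 4000
```

Splitting per subset guarantees that the validation set is not accidentally dominated by one style of randomization. A single shuffled split over all 20,000 images would only preserve the 50-50 mix approximately.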

Domain randomization (DR) is a powerful tool available with synthetic data. It creates variability that spans both expected and unexpected real-world inputs, forcing the model to focus on the features most important to the problem. DR is much more costly and difficult to implement with physical data. For example, a dataset of thousands of products, each shown in thousands of poses on dozens of backgrounds, requires many millions of labeled images. Such a dataset is easily generated synthetically but virtually impossible to create from physical product photos. In addition, the distribution of rare (but possibly very important) events or conditions is easily controlled. Physical data offers no such control, since rare occurrences are by definition poorly represented.
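For scale, even the low end of the figures above (1,000 products × 1,000 poses × 24 backgrounds) already comes to 24 million labeled images.

To make the DR idea concrete, the Python sketch below draws a fresh random scene description for each image to be rendered. It exposes a single `rare_prob` knob that sets how often rare items appear, independent of how rare they are in the real world.

Everything in the sketch is hypothetical:

- The `SceneParams` fields, the parameter ranges, the 24 backgrounds, and the SKU names are placeholders.
- A real pipeline would pass these parameters to a renderer rather than just collecting them.

```python
import random
from dataclasses import dataclass

# Hypothetical scene parameters and ranges, for illustration only.
BACKGROUNDS = [f"background_{i:02d}" for i in range(24)]


@dataclass
class SceneParams:
    product_id: str
    yaw_deg: float          # rotation about the vertical axis
    pitch_deg: float        # camera tilt relative to the product
    distance_m: float       # camera-to-product distance
    light_intensity: float  # relative intensity of the key light
    background: str


def sample_scene(rng, common_products, rare_products, rare_prob):
    """Draw one randomized scene; rare_prob controls how often rare items appear."""
    pool = rare_products if rng.random() < rare_prob else common_products
    return SceneParams(
        product_id=rng.choice(pool),
        yaw_deg=rng.uniform(0.0, 360.0),
        pitch_deg=rng.uniform(-30.0, 60.0),
        distance_m=rng.uniform(0.5, 3.0),
        light_intensity=rng.uniform(0.2, 2.0),
        background=rng.choice(BACKGROUNDS),
    )


rng = random.Random(42)
common = [f"sku_{i:04d}" for i in range(1000)]
rare = ["rare_sku_0", "rare_sku_1"]
scenes = [sample_scene(rng, common, rare, rare_prob=0.2) for _ in range(10_000)]
rare_share = sum(s.product_id in rare for s in scenes) / len(scenes)
print(f"share of rare items: {rare_share:.3f}")  # should land near the requested 0.2
```

Here the two rare SKUs make up roughly 20% of the generated scenes, even though they are only 2 of 1,002 products. This is the kind of distribution control described above that physical photo collection cannot easily offer.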

Synthetic Data Generation Methods

Our synthetic data generation methodology combines multiple input sources and software systems. It draws on real or simulated data from CAD models and engineering simulations, computer graphics code, real-time simulators, industrial computed tomography (CT) scanning, and custom software applications. Using this captured data and our state-of-the-art NVIDIA AI Server, our data scientists and engineers create fully annotated synthetic datasets containing millions of samples. To help automate data creation, Kinetic Vision has developed AiVision™ Generate, a proprietary software system that creates datasets with full variability control, including randomization, exposure compensation, and scene parameter controls. For more information, watch this video on AiVision™ Generate.
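AiVision™ Generate itself is proprietary, so the Python sketch below does not reproduce it. It only illustrates what per-image scene parameter controls with exposure compensation can look like in general. The parameter names and ranges in `SCENE_RANGES` and the helper names `apply_exposure` and `sample_scene_params` are hypothetical.

Exposure compensation is modeled the conventional photographic way, in EV steps. Each +1 EV doubles the recorded light, so linear-light pixel values are scaled by 2**EV and clipped to the valid range.

```python
import random

# Hypothetical sketch, not AiVision Generate's actual interface.
# Exposure compensation follows the usual EV convention: +1 EV doubles
# the light, -1 EV halves it.


def apply_exposure(linear_pixels, ev):
    """Scale linear-light pixel values by 2**ev and clip them to [0, 1]."""
    gain = 2.0 ** ev
    return [min(1.0, max(0.0, p * gain)) for p in linear_pixels]


# Hypothetical randomization ranges for a few scene parameters.
SCENE_RANGES = {
    "exposure_ev": (-1.5, 1.5),
    "camera_height_m": (1.0, 2.2),
    "light_temperature_k": (3000.0, 6500.0),
}


def sample_scene_params(rng):
    """Draw one value uniformly from each parameter's range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SCENE_RANGES.items()}


rng = random.Random(7)
params = sample_scene_params(rng)
print(params)
print(apply_exposure([0.05, 0.2, 0.45, 0.8], 1.0))  # [0.1, 0.4, 0.9, 1.0]
```

In a full pipeline, `params["exposure_ev"]` would be fed to `apply_exposure` (or to the renderer's camera settings) for each generated image. Clipping at 1.0 mimics sensor saturation, so over-exposed samples are included in the training variability rather than silently rescaled.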
