News Article

DrivenData and HeroX Announce Winners of NIST's Synthetic Data Challenge
Date: Jun 16, 2021
Author: AIT News Desk
Source: Aithority

Featured firm in this article: Drivendata Inc of Denver, CO

DrivenData, the host of data science competitions that advance solutions for social good, and HeroX, the social network for innovation and the world's leading platform for crowdsourced solutions, announced the winners of the third and final sprint of the Algorithm Contest of the Differential Privacy Temporal Map Challenge, which was sponsored by the Public Safety Communications Research (PSCR) Division of the National Institute of Standards and Technology (NIST).

The challenge carried a total prize purse of $161,000, and the third algorithm sprint awarded $25,000 to its first-place winner. First place went to team "N-CRiPT", a group of differential privacy researchers from the National University of Singapore and Alibaba Group, whose goal was to bring differential privacy into a practical setting. Second place went to the "Minutemen" team, a group of differential privacy graduate students from the University of Massachusetts Amherst.

The focus of this prize challenge was to create synthetic data that preserves the characteristics of a dataset containing time and geographic information. Synthetic data can offer greater privacy protections than traditional anonymization techniques. Differentially private synthetic data can be shared with researchers, policy makers, and even the public without the risk of exposing individuals in the original data. However, the synthetic records are useful only if they preserve the trends and relationships in the original data.

Contestants of this challenge were charged with developing algorithms that de-identify datasets while maintaining a high level of accuracy. This ensures the data is both private and useful. Top contestants of the final sprint demonstrated algorithms that produce records with both more privacy and greater accuracy than the typical subsampling techniques used by many government agencies to release records.

The first sprint featured data captured from 911 calls made in Baltimore, MD over the course of one year. Participants in this sprint were tasked with developing de-identification algorithms that generate privatized datasets from the monthly reported incident counts for each type of incident by neighborhood. The winners of this sprint were announced previously.
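As a rough illustration of what privatizing monthly incident counts can look like, the sketch below adds Laplace noise to a table of counts keyed by neighborhood, month, and incident type. It is not any contestant's method. The function names, the record layout, and the choice of one call as the unit of privacy are all assumptions made for the example.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_counts(records, cells, epsilon, rng):
    """Return noisy counts for every cell in `cells`.

    records: iterable of (neighborhood, month, incident_type), one per call
             (hypothetical layout).
    cells:   the full list of cells to publish, fixed in advance and
             independent of the data, so empty cells are also noised.
    """
    true = Counter(records)
    # Assumption: the privacy unit is a single call, so adding or removing
    # one record changes exactly one cell by 1 (L1 sensitivity 1).
    scale = 1.0 / epsilon
    return {c: true.get(c, 0) + laplace_noise(scale, rng) for c in cells}


if __name__ == "__main__":
    rng = random.Random(0)
    records = [("Canton", 1, "fire"), ("Canton", 1, "fire"), ("Hampden", 2, "theft")]
    cells = [("Canton", 1, "fire"), ("Hampden", 2, "theft"), ("Hampden", 1, "fire")]
    for cell, value in private_counts(records, cells, epsilon=1.0, rng=rng).items():
        print(cell, round(value, 2))
```

A smaller epsilon means more noise and stronger privacy. Synthetic records could then be sampled from the noisy table, which is one simple way to trade accuracy against privacy.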

The second sprint used demographic data from the U.S. Census Bureau's American Community Survey, which surveyed individuals in various U.S. states from 2012 to 2018. The dataset included 35 different survey features (such as age, sex, income, education, work and health insurance data) for every individual surveyed. Simulated longitudinal data was created by linking different individual records across multiple years, which increased the difficulty of protecting each simulated person's privacy. To succeed in this sprint, participants needed to build de-identification algorithms that generate a set of synthetic, privatized survey records that most accurately preserve the patterns in the original data. The winners of this sprint were also announced previously.

The third sprint centered on taxi rides taken in Chicago, Illinois. Because the sprint focused on protecting the taxi drivers rather than just their trips, competitors needed to provide privacy for up to 200 records per individual driver, a very challenging problem. Submissions were evaluated over 77 Chicago community areas. The de-identified synthetic data needed to preserve the characteristics of taxi trips in each community area, the patterns of traffic between communities, and the population characteristics of the taxi drivers themselves (typical working times and locations). The top two winning teams were each able to produce synthetic data that provided very strong privacy protection and was also more accurate for analysis than data protected by traditional privacy techniques such as subsampling.
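To see why protecting a driver is harder than protecting a single trip, consider a simple count of trips per community area. One driver can contribute up to 200 trips, so removing that driver can shift the counts by up to 200 in total, and the noise must grow to match. The sketch below is an illustration under assumed names and an assumed (driver_id, pickup_area) record layout, not a winning solution. It caps each driver's contribution and scales Laplace noise by the cap.

```python
import random
from collections import Counter, defaultdict

MAX_TRIPS_PER_DRIVER = 200  # per-driver bound stated in the article


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def cap_trips(trips, cap):
    """Keep at most `cap` trips per driver; trips are (driver_id, pickup_area)."""
    kept, seen = [], defaultdict(int)
    for driver, area in trips:
        if seen[driver] < cap:
            seen[driver] += 1
            kept.append((driver, area))
    return kept


def private_area_counts(trips, areas, epsilon, cap, rng):
    """Noisy trip counts per community area with driver-level protection."""
    capped = cap_trips(trips, cap)
    true = Counter(area for _, area in capped)
    # Removing one driver removes at most `cap` trips, so the L1
    # sensitivity of the whole histogram is `cap`, not 1.
    scale = cap / epsilon
    return {a: true.get(a, 0) + laplace_noise(scale, rng) for a in areas}


if __name__ == "__main__":
    rng = random.Random(1)
    trips = [("d1", 8)] * 300 + [("d2", 32)] * 50  # hypothetical data
    areas = range(1, 78)  # 77 community areas, assumed to be numbered 1..77
    noisy = private_area_counts(trips, areas, 1.0, MAX_TRIPS_PER_DRIVER, rng)
    print(round(noisy[8], 1), round(noisy[32], 1))
```

With a cap of 200 the noise scale is 200 times larger than in the one-record-per-person case at the same epsilon, which suggests why naive approaches lose accuracy on this problem.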

Challenge participants are now eligible to earn up to $5,000 for creating and executing a development plan that further improves the code quality of their solutions and advances their usefulness to the public safety community. Participants can also earn the Open Source prize, an additional $4,000, by releasing their solutions in an open source repository. Winning solutions will be those that satisfy differential privacy after being uploaded to an open source repository.