SBIR-STTR Award

Open Machine Learning Competitions with Private Data
Award last edited on: 10/20/21

Sponsored Program
SBIR
Awarding Agency
NSF
Total Award Amount
$256,000
Award Phase
1
Solicitation Topic Code
IT
Principal Investigator
Peter Bull

Company Information

Drivendata Inc

1644 Platte Street Suite 400
Denver, CO 80202
   (774) 276-1299
   info@drivendata.org
   www.drivendata.org
Location: Single
Congr. District: 01
County: Denver

Phase I

Contract Number: 2038067
Start Date: 8/1/21    Completed: 3/31/22
Phase I year
2021
Phase I Amount
$256,000
The broader impact of this Small Business Innovation Research (SBIR) Phase I project will be to expand access to artificial intelligence (AI) talent and spur innovation to solve hard problems while protecting privacy. Machine learning and AI are bringing transformational change to governments, private companies, and social sector organizations. Yet in the coming years, innovation will be hamstrung by limited access to AI talent. Open innovation, such as machine learning (ML) competitions, provides governments and firms with the ability to tap into a global talent pool to solve some of their most pressing and vexing challenges. With data talent in increasingly high demand, government agencies, companies, and others have demonstrated a willingness to invest in such competitions. Yet there is currently an immense barrier to running them: the data must be made available to participants, which can preclude a competition entirely if the associated data are too sensitive to release due to concerns about privacy, security, or confidentiality. The proposed project develops a method to maintain data privacy at scale.
This Small Business Innovation Research (SBIR) Phase I project will develop an end-to-end competition system that provides privacy guarantees for data used to build crowdsourced algorithmic solutions. Open ML challenges typically work by providing participants with training data to learn underlying patterns, then evaluating the resulting predictions on unlabeled test data. For many important problems, making training data available in this way raises privacy concerns or enables abuse. The critical gap is preserving the privacy of training data while enabling participants to build models that can learn from it. This project will bring together recent advances in three of the most promising approaches in privacy-preserving data analysis: homomorphic encryption, federated learning, and differential privacy. Each technique will be developed and tested in a dedicated challenge structure with two core properties: 1) it preserves the privacy of sensitive data; and 2) it ensures competitors are able to get feedback on submitted models during the competition to inform algorithm improvements. Each competition system will yield a set of performance measures, including benchmarked algorithm performance and data privacy guarantees, to assess system feasibility.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
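To make the idea of giving competitors feedback without exposing sensitive data more concrete, the following is a minimal sketch of one of the three named approaches, differential privacy, applied to a leaderboard score. It is an illustration under stated assumptions, not the system developed under this award: the function names (accuracy, laplace_noise, private_leaderboard_score), the choice of accuracy as the metric, and the use of the Laplace mechanism are all hypothetical choices made for this example.

```python
import random


def accuracy(predictions, labels):
    """Fraction of predictions that match the held-out labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_leaderboard_score(predictions, labels, epsilon, rng=None):
    """Return an accuracy score perturbed with the Laplace mechanism.

    Assumption (hypothetical setup): the private data are the held-out
    labels, and changing one label moves accuracy by at most 1/n, so the
    sensitivity of the score is 1/n.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    score = accuracy(predictions, labels)
    sensitivity = 1.0 / len(labels)
    noisy = score + laplace_noise(sensitivity / epsilon, rng)
    # Clamping is post-processing, so it does not weaken the guarantee.
    return min(1.0, max(0.0, noisy))


if __name__ == "__main__":
    preds = [1, 0, 1, 1, 0, 1, 0, 0]
    truth = [1, 0, 0, 1, 0, 1, 1, 0]
    print(private_leaderboard_score(preds, truth, epsilon=0.5))
```

In this sketch a smaller epsilon gives a stronger privacy guarantee but noisier feedback. Each scored submission also spends privacy budget, so a real competition would presumably need to cap or account for the number of submissions per participant.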

Phase II

Contract Number: ----------
Start Date: 00/00/00    Completed: 00/00/00
Phase II year
----
Phase II Amount
----