The biopharmaceutical industry faces fast-growing data movement challenges. For example, the latest genome sequencers can produce multiple TBs of data in each run, and in an active lab the amount of personalized medicine research data grows by multiple TBs per day. Large entities in the industry are often distributed, even global, in nature, and multi-site collaboration is mandatory for research progress. These factors, plus other needs such as disaster recovery, make the problems even more acute. Existing data transfer solutions, both commercial and free, have proven woefully inadequate, so a new way of thinking and doing is needed. Zettar has introduced a hyperscale data distribution software platform that has been conclusively shown, in the environment of a large global biotech company, to be 10X superior to all existing data transfer solutions in terms of performance, ease of use, robustness, and scalability. With the two proposed non-incremental innovations, it will be valuable to all distributed data-intensive industries. During Phase I, Zettar will conduct research in 1) parallel streaming, integrating new results with reference data in real time at very high speed, and 2) a machine-learning (ML) based approach to optimizing any data transfer setup. This helps manage data across distributed environments and/or heterogeneous architectures. The project will enable and facilitate nearly real-time analysis even when petascale data sets must be employed in a distributed manner. For example, this is valuable for the effective and efficient use of output data from modern sequencing machines and electron-microscopy devices, among others.