This proposal describes the development of a generalized toolkit that automates and improves the mapping of partitioned subdomains onto available distributed compute nodes for applications running in pure-MPI or hybrid-MPI parallel runtime environments. The toolkit may be invoked either as an independent pre-processing step or as a dynamic library, improving an application's real-time domain decomposition and placement decisions based on the available hardware nodes. Its purpose is to significantly reduce the runtime bottlenecks incurred when messages are passed inefficiently across the interconnect topologies of modern compute platforms. Our Phase I effort will begin with the available information on hardware node organization for HPC platforms. Software libraries will be developed to help standard partitioning algorithms organize subdomains so that communication costs across nodes are minimized. Topology mapping will be incorporated into partitioning decisions to supplement the overall workload-balancing strategy. A ping test routine will be constructed to measure current system communication latency during simulation runtime. Several partitioning strategies, varying both the decomposition approach and the degree of hybrid parallelism, will be evaluated for their ability to minimize inter-node communication traffic. Communication profiling information will also be collected with the TAU profiling, display, and database management tools to help develop an optimized topology-awareness strategy.
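
To illustrate what topology-aware placement of subdomains means in practice, the following is a minimal sketch and not the proposed toolkit itself. It assumes the communication volume between pairs of subdomains is already known (for example from profiling) and that every node offers the same number of process slots. The names `inter_node_traffic`, `greedy_placement`, `comm`, and `slots_per_node` are hypothetical, and the greedy heuristic is a stand-in for whatever mapping strategy the Phase I work would actually evaluate.

```python
from collections import defaultdict


def inter_node_traffic(comm, placement):
    """Sum the communication volume between subdomains on different nodes.

    comm      -- dict mapping (subdomain_a, subdomain_b) to a volume (assumed input)
    placement -- dict mapping subdomain id to node id
    """
    return sum(vol for (a, b), vol in comm.items()
               if placement[a] != placement[b])


def greedy_placement(comm, num_subdomains, num_nodes, slots_per_node):
    """Place each subdomain on the node holding most of its partners.

    Hypothetical greedy heuristic: subdomains are visited in order of
    decreasing total communication volume, and each one goes to the
    non-full node that already hosts the largest share of its traffic.
    """
    if num_subdomains > num_nodes * slots_per_node:
        raise ValueError("not enough slots for all subdomains")

    # Build a symmetric neighbor table from the pairwise volumes.
    neighbors = defaultdict(dict)
    for (a, b), vol in comm.items():
        neighbors[a][b] = neighbors[a].get(b, 0) + vol
        neighbors[b][a] = neighbors[b].get(a, 0) + vol

    order = sorted(range(num_subdomains),
                   key=lambda s: -sum(neighbors[s].values()))

    placement = {}
    load = [0] * num_nodes
    for s in order:
        best_node, best_gain = None, -1
        for n in range(num_nodes):
            if load[n] >= slots_per_node:
                continue  # node is full
            # Traffic that stays on-node if s is placed on node n.
            gain = sum(v for t, v in neighbors[s].items()
                       if placement.get(t) == n)
            if gain > best_gain:
                best_node, best_gain = n, gain
        placement[s] = best_node
        load[best_node] += 1
    return placement


# Four subdomains, two nodes with two slots each (illustrative data only).
comm = {(0, 1): 10, (2, 3): 10, (1, 2): 1, (0, 3): 1}
round_robin = {s: s % 2 for s in range(4)}
aware = greedy_placement(comm, num_subdomains=4, num_nodes=2, slots_per_node=2)

print(inter_node_traffic(comm, round_robin))  # 22
print(inter_node_traffic(comm, aware))        # 2
```

In this toy case a topology-oblivious round-robin placement sends all of the heavy traffic across nodes, while the greedy placement keeps the two heavily communicating pairs on-node. A real mapping would also need to account for workload balance and measured latency between specific node pairs, which this sketch ignores.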