Tensors are fundamental building blocks in a wide range of high-performance computing applications, including artificial intelligence (e.g., deep neural networks) and numerical modeling and simulation (e.g., finite element codes). High-performance computing platforms are increasingly heterogeneous, with upcoming exascale platforms using heterogeneous processors (e.g., Intel Sapphire Rapids, Nvidia Grace, and AMD EPYC processors) that include vector engines, matrix engines, and heterogeneous cores, coupled with compute accelerators from a variety of vendors (e.g., Intel Xe HPC/Ponte Vecchio, Nvidia A100, AMD MI100). Developing applications that compile and run efficiently across this wide range of computing environments is a massive challenge. Tools such as Kokkos and KokkosKernels help reduce the burden, but they currently lack a performance-portable tensor library.

The proposed project will develop an optimized KokkosTensor API that supports tensor transpose and tensor contraction, as well as optimization of tensor expressions involving contractions and other element-wise tensor operators. The implementations will use Kokkos primitives to enable performance-portable code, will leverage high-performance vendor libraries where they exist (e.g., cuTENSOR), and will include architecture-aware tuned implementations where appropriate (e.g., for Nvidia, AMD, and/or Intel accelerators).

The Phase I effort will develop a prototype KokkosTensor implementation that includes tensor contraction and tensor transpose. The library will support multiple GPU architectures and will be demonstrated in existing Kokkos finite element applications. KokkosTensor will benefit many DOE and commercial applications, such as the higher-order finite element discretizations used in hypersonic reentry, mechanics, and fire simulations at Sandia; further examples include solvers for matrix-free multigrid methods, among several other use cases.