This Phase I proposal evaluates the feasibility of a system for (i) automatic extraction and representation of important attributes of image and video data, and (ii) inference of high-level scene descriptions from these attributes. Specifically, the proposed work explores the feasibility of methodically incorporating into a single system the capabilities to: (1) discover salient patterns prevalent in image and video data, (2) learn definitions of complex visual concepts characterized by increasingly complex patterns, (3) recognize occurrences of these concepts in previously unseen data, (4) search and retrieve all parts of a video database containing occurrences of previously learned concepts, and (5) summarize a given image or video in terms of the spatial and chronological relationships exhibited by the concepts recognized in the data. These capabilities are of broad general importance, as they can serve as the foundation for solutions to problems in contexts such as the military, law enforcement, commerce, and Internet usage. Synergistic integration of these capabilities into a single system, as proposed, will significantly amplify their individual strengths, yielding a powerful tool for information processing and knowledge representation in automated image understanding.
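To make capability (4) concrete, the following is a minimal toy sketch, not the proposed system: learned concepts are represented here as prototype feature vectors, frames as extracted feature vectors, and retrieval as a cosine-similarity match. The names `concept_index` and `frame_features`, the example vectors, and the similarity threshold are all illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Learned concept definitions (assumed): concept name -> prototype vector.
concept_index = {
    "vehicle": [0.9, 0.1, 0.0],
    "person":  [0.1, 0.8, 0.3],
}

# Features extracted per video frame (assumed): frame id -> feature vector.
frame_features = {
    0: [0.85, 0.15, 0.05],
    1: [0.05, 0.90, 0.20],
    2: [0.50, 0.50, 0.50],
}

def retrieve(concept, threshold=0.95):
    """Return ids of frames whose features match the concept prototype."""
    proto = concept_index[concept]
    return [fid for fid, feat in sorted(frame_features.items())
            if cosine(proto, feat) >= threshold]

print(retrieve("vehicle"))  # frames matching the learned "vehicle" concept
```

In a real system the prototype vectors would be replaced by learned concept models and the frame vectors by automatically extracted perceptual attributes; the sketch only illustrates the indexing-and-retrieval pattern the proposal describes.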
Keywords: Object Recognition, Category Learning, Knowledge Representation, Semantic Search, Perceptual Content, Video Syntax, Conceptual Summarization, Video Grammar