SPATIAL LANGUAGE UNDERSTANDING: DEEP LEARNING, REASONING, AND EVALUATION

Spatial language understanding plays an essential role in human communication and perception of the physical world. It encompasses how people describe, understand, and communicate spatial relationships between objects and environmental entities, such as location, orientation, distance, and relative position. Spatial language processing presents numerous challenges, which often stem from the inherent ambiguity of natural language in describing spatial relations or from the complexity of the spatial reasoning needed to infer indirect relations, particularly when multi-hop reasoning is required. Despite the remarkable achievements of pretrained language models (PLMs) in various natural language processing (NLP) tasks, their effectiveness in spatial language processing has not yet been thoroughly examined. In this thesis, we therefore first evaluate these models’ performance in multi-hop spatial reasoning. Second, we propose deep learning methods and models that achieve better multi-hop spatial reasoning performance in both controlled and real-world settings.
This thesis makes four main contributions to the understanding of and reasoning over spatial language. The first contribution is a set of novel question-answering (QA) benchmarks for evaluating the spatial reasoning capability of deep neural models. These benchmarks include complex and realistic spatial phenomena not covered in previous work, making them more challenging for state-of-the-art PLMs. The second contribution is an approach for generating large-scale distant supervision for spatial question answering and spatial role labeling, with the goal of enhancing models’ spatial language understanding. We design grammar and reasoning rules that automatically generate spatial descriptions of scenes and corresponding QA pairs. The approach integrates a diverse set of spatial relation types and expressions, complemented by additional functions that make the data generation process more flexible and extensible. Further training PLMs on this data significantly improves their spatial understanding, enabling them to perform better on other benchmarks and external datasets. The third contribution explores the potential benefits of disentangling information extraction from reasoning in neural models to address the challenges of multi-hop spatial reasoning. To this end, we design various models that separate extraction from reasoning (either symbolic or neural) and compare them with state-of-the-art baselines that have no explicit design for these components. Our experimental results consistently demonstrate the efficacy of disentangling, showing that it enhances models’ generalizability within realistic data domains. Finally, the fourth contribution investigates the role of Large Language Models (LLMs) in multi-hop spatial reasoning tasks, focusing on their performance with and without in-context learning. In addition, we integrate LLMs as extraction modules within a pipeline that combines extraction with symbolic reasoning. While our case studies in controlled environments indicate the benefits of this idea, our experiments in real-world settings reveal that the pipeline’s performance degrades because of increasing errors in the extraction process. We also utilize probabilistic logical reasoning and LLMs’ commonsense knowledge, which improves the pipeline model’s performance in real-world applications. Despite these enhancements, the pipeline model still performs worse than standalone LLMs.
