Kate Sanders

ksande25@jhu.edu

Google Scholar

LinkedIn

CV

Site last updated August 2025

I am a final-year Ph.D. student at the Center for Language and Speech Processing at Johns Hopkins University, advised by Professor Benjamin Van Durme. During my Ph.D., I have been researching transparent and reliable reasoning, multimodal understanding, and uncertainty.

I spent this past summer as an intern at AWS, designing reward functions for training reasoning models, where I was lucky to be mentored by Nathaniel Weir. The previous summer, I co-organized and facilitated the 10-week SCALE 2024 Summer Research Workshop at the HLTCOE.

Before starting my Ph.D., I received my B.A. in Computer Science from UC Berkeley, where I conducted AI and robotics research in the AUTOLab, advised by Professor Ken Goldberg.

Research areas

Please visit my Google Scholar profile for a full list of publications.

Benchmarks for complex visual event understanding

I have spent the last few years working with collaborators to develop ways of thinking about and formulating events in visual data. This work has inspired a collection of benchmarks designed to evaluate agents' abilities to recognize and explain such events across different languages and cultures. The effort began as a small video retrieval task [1] and was then extended to a full event extraction benchmark [2]. In that paper we emphasize the notion of "partially-defined events": we argue that it is critical to model the epistemic and aleatoric uncertainty associated with identifying, in visual content, events that are more commonly described through natural language. More recent extensions of this line of work include a massive video retrieval dataset built on these initial benchmarks that better mirrors datasets developed by the information retrieval community [3], and a report generation benchmark that highlights the difficulty of piecing together multiple videos that only jointly describe some partially-defined event [4].


  1. Sanders, K.*, Etter, D.*, Kriz, R.*, Van Durme, B. MultiVENT: Multilingual Videos of Events with Aligned Natural Text. NeurIPS 2023 D&B.
  2. Sanders, K.*, Kriz, R.*, Etter, D.*, Recknor, H., Martin, A., Carpenter, C., Lin, J., Van Durme, B. Grounding Partially-Defined Events in Multimodal Data. EMNLP 2024 Findings.
  3. Kriz, R.*, Sanders, K.*, Etter, D., Murray, M., Carpenter, C., Recknor, H., Blasco, J., Martin, A., Yang, E., Van Durme, B. MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval. CVPR 2025.
  4. Martin, A., Kriz, R., Walden, W., Sanders, K., Recknor, H., Yang, E., Ferraro, F., Van Durme, B. WikiVideo: Article Generation from Multiple Videos. 2025 arXiv preprint.
  5. Sanders, K., Van Durme, B. A Survey of Video Datasets for Grounded Event Understanding. CVPR 2024 Workshops.

Interpretable and factually accurate reasoning

I am very lucky to have been able to spend a lot of time discussing notions of factuality and transparency with my labmates. I collaborated on extending ideas from informal logic to assess the quality of compositional entailment in neuro-symbolic reasoning systems [1], and extended these reasoning systems to transparently verify content in the video-language domain [2]. I built on this framework to incorporate uncertainty modeling (described further under "Benchmarks for complex visual event understanding" above) and to generalize to modalities beyond vision and text [3]. I also worked with labmates to develop a subclaim selection component that improves the trustworthiness of factuality metrics like FActScore [4], and to evaluate LLMs' abilities to verify claims in scientific reports [5]. A minimal sketch of the subclaim-based scoring idea appears after the publication list below.

More publications on trustworthy reasoning are coming soon!


  1. Weir, N., Sanders, K., Weller, O., Sharma, S., Jiang, D., Jiang, Z., ..., Van Durme, B. Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic. EMNLP 2024.
  2. Sanders, K., Weir, N., Van Durme, B. TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning. EMNLP 2024.
  3. Sanders, K., Van Durme, B. Bonsai: Interpretable Tree-Adaptive Grounded Reasoning. 2025 arXiv preprint.
  4. Jiang, Z., Zhang, J., Weir, N., Ebner, S., Wanner, M., Sanders, K., Khashabi, D., Liu, A., Van Durme, B. Core: Robust Factual Precision Scoring with Informative Sub-Claim Identification. ACL 2025 Findings.
  5. Ou, J.*, Walden, W.*, Sanders, K., Jiang, Z., Sun, K., Cheng, J., ..., Van Durme, B. CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers? 2025 arXiv preprint.
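
For intuition, here is a minimal sketch of the subclaim-based factual precision idea behind metrics like FActScore: decompose a generation into atomic subclaims, keep only the informative ones, verify each against a knowledge source, and report the supported fraction. The decompose, select, and is_supported components below are illustrative Python stubs, not the actual procedures used in Core or FActScore.

from typing import Callable, List

def factual_precision(
    generation: str,
    decompose: Callable[[str], List[str]],     # text -> atomic subclaims
    select: Callable[[List[str]], List[str]],  # keep informative, non-redundant subclaims
    is_supported: Callable[[str], bool],       # verify one subclaim against a knowledge source
) -> float:
    """Fraction of selected atomic subclaims supported by the knowledge source."""
    subclaims = select(decompose(generation))
    if not subclaims:
        return 0.0
    return sum(is_supported(c) for c in subclaims) / len(subclaims)

# Toy usage with stub components (real systems use an LLM decomposer
# and a retrieval-backed verifier):
decompose = lambda text: [s.strip() for s in text.split(".") if s.strip()]
select = lambda claims: list(dict.fromkeys(claims))  # drop exact duplicates
knowledge = {"Paris is the capital of France"}
is_supported = lambda claim: claim in knowledge
text = "Paris is the capital of France. Paris has 40 million residents."
print(factual_precision(text, decompose, select, is_supported))  # 0.5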

Video retrieval for real-world queries

I spent a summer at SCALE 2024 hacking on the MultiVENT video retrieval dataset to find new ways to perform video retrieval for more realistic, real-world queries about visual events. In addition to the MultiVENT 2.0 benchmark (which served as a shared task for our ACL 2025 workshop on multimodal retrieval and generation), we produced several methods papers for tackling these problems. Video-ColBERT [1] draws inspiration from late interaction retrieval methods, combining token-wise interaction on static frame features with temporally contextualized video features for improved retrieval. MMMORRF [2] takes a different approach, balancing the contributions of vision and audio by building distinct data processing pipelines for each modality and fusing them through modality-aware weighted reciprocal rank fusion; a minimal sketch of this fusion idea appears after the publication list below. FORTIFY [3] addresses the challenging problem of modeling OCR content for retrieval by leveraging generative models to rewrite and synthesize these noisy, multilingual fragments into documents.


  1. Reddy, A., Martin, A., Yang, E., Yates, A., Sanders, K., Murray, K., Kriz, R., M de Melo, C., Van Durme, B., Chellappa, R. Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval. CVPR 2025.
  2. Samuel, S., DeGenaro, D., Guallar-Blasco, J., Sanders, K., Eisape, O., ..., Kriz, R. MMMORRF: Multimodal Multilingual MOdularized Reciprocal Rank Fusion. SIGIR 2025 Demo.
  3. DeGenaro, D., Yang, E., Etter, D., Carpenter, C., Sanders, K., Martin, A., Murray, K., Kriz, R. FORTIFY: Generative Model Fine-tuning with ORPO for ReTrieval Expansion of InFormal NoisY Text. ACL 2025 Workshops.
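
To give a flavor of the fusion step, here is a minimal Python sketch of weighted reciprocal rank fusion across modality-specific rankers. The weights, the constant k, and the toy rankings are illustrative assumptions; MMMORRF's actual modality-aware weighting is described in the paper.

from collections import defaultdict
from typing import Dict, List

def weighted_rrf(
    rankings: Dict[str, List[str]],  # modality name -> ranked list of video IDs
    weights: Dict[str, float],       # modality name -> fusion weight
    k: int = 60,                     # standard RRF smoothing constant
) -> List[str]:
    """Fuse per-modality rankings into one ranking via weighted reciprocal rank fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for modality, ranked_ids in rankings.items():
        w = weights.get(modality, 1.0)
        for rank, vid in enumerate(ranked_ids, start=1):
            scores[vid] += w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy usage: fuse a vision-based ranking with an audio/OCR-based ranking
# for one query (video IDs are hypothetical).
rankings = {"vision": ["vid_3", "vid_1", "vid_7"], "audio": ["vid_1", "vid_7", "vid_3"]}
weights = {"vision": 0.6, "audio": 0.4}
print(weighted_rrf(rankings, weights))  # fused ranking, highest combined score first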

Miscellaneous

I recently worked on a fun collaboration with MIT CSAIL's Computer-Aided Programming Group, where we explored how well LLMs and reasoning models can learn low-level language syntax via in-context learning [1]. I also have experience developing benchmarks for web agents [2] and for assessing the calibration of vision classification models by soliciting human judgments [3]. I published a handful of robotics papers as an undergrad; my two favorites concern identifying failure modes for bin-picking systems and suction grippers [4, 5].


  1. Gupta, K., Sanders, K., Solar-Lezama, A. Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs. ICLR 2025 Workshops.
  2. Xu, K., Kordi, Y., Nayak, T., Asija, A., Wang, Y., Sanders, K., Byerly, A., Zhang, J., Van Durme, B., Khashabi, D. Tur[k]ingBench: A Challenge Benchmark for Web Agents. NAACL 2025.
  3. Sanders, K., Kriz, R., Liu, A., Van Durme, B. Ambiguous Images With Human Judgments for Robust Visual Event Classification. NeurIPS 2022 D&B.
  4. Sanders, K., Danielczuk, M., Mahler, J., Tanwani, A., Goldberg, K. Non-Markov Policies to Reduce Sequential Failures in Robot Bin Picking. CASE 2020.
  5. Huh, T. M., Sanders, K., Danielczuk, M., Li, M., Chen, Y., Goldberg, K., Stuart, H. S. A Multi-Chamber Smart Suction Cup for Adaptive Gripping and Haptic Exploration. IROS 2021.

News

Aug 2025 Co-organized the first workshop on multimodal retrieval and generation at ACL 2025.
July 2025 CORE and FORTIFY presented @ ACL 2025.
July 2025 MMMORRF presented @ SIGIR 2025.
June 2025 MultiVENT 2.0 and Video-ColBERT presented @ CVPR 2025.
May 2025 Interning at AWS to work on reasoning model reward functions.
April 2025 Randomly Sampled Language Reasoning Problems presented @ ICLR 2025 VerifAI workshop.
April 2025 Tur[k]ingBench presented @ NAACL 2025.
April 2025 Bonsai now on arXiv.