ksande25@jhu.edu
Site last updated August 2025
I am a final-year Ph.D. student at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University. I am advised by Professor Benjamin Van Durme. During my Ph.D., I have been researching transparent and reliable reasoning, multimodal understanding, and uncertainty.
I spent this past summer as an intern at AWS, where I designed reward functions for training reasoning models and was lucky to be mentored by Nathaniel Weir. The previous summer, I co-organized and facilitated the 10-week SCALE 2024 Summer Research Workshop at the HLTCOE.
Before starting my Ph.D., I received my BA in Computer Science from UC Berkeley, where I conducted AI and robotics research in the AUTOLab, advised by Professor Ken Goldberg.
Please visit my Google Scholar profile for a full list of publications.
I have spent the last few years working with collaborators to develop ways of thinking about and formulating events in visual data. This has inspired a collection of benchmarks designed to evaluate agents' abilities to recognize and explain these events across different languages and cultures. The effort began as a small video retrieval task that was later extended into a full event extraction benchmark. In that work we emphasize the notion of "partially-defined events": we argue that in visual content it is critical to model the epistemic and aleatoric uncertainty associated with identifying events that are more commonly described through natural language. More recent extensions to this line of work include a massive video retrieval dataset, built on these initial benchmarks, that better mirrors the datasets developed by the information retrieval community, and a report generation benchmark that highlights the difficulty of piecing together multiple videos that only in combination describe some partially-defined event.
I am very lucky to have spent a lot of time discussing notions of factuality and transparency with my labmates. I collaborated to extend ideas from informal logic to assess the quality of compositional entailment in neuro-symbolic reasoning systems, and extended these reasoning systems to transparently verify content in the video-language domain. I built on this framework to incorporate uncertainty modeling (described further in "benchmarks for complex visual event understanding" above) and to generalize to modalities beyond vision and text. I also worked with labmates to develop a subclaim selection component that improves the trustworthiness of factuality metrics like FActScore, and to evaluate LLMs' abilities to verify claims in scientific reports.
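To give a feel for how subclaim selection interacts with a FActScore-style metric, here is a minimal sketch: the generation is decomposed into atomic subclaims, a selection step filters out claims not worth verifying, and the score is the fraction of remaining claims supported by a knowledge source. The helper names (decompose, is_checkworthy, is_supported) are placeholders for model- or retrieval-backed components, not the actual FActScore API or our system's implementation.

```python
# Sketch of a FActScore-style factuality metric with a subclaim selection step.
# The helpers passed in (decompose, is_checkworthy, is_supported) are
# hypothetical placeholders for model- or retrieval-backed components.

def factuality_score(generation, knowledge_source,
                     decompose, is_checkworthy, is_supported):
    # 1. Break the generation into atomic subclaims.
    subclaims = decompose(generation)
    # 2. Subclaim selection: keep only claims worth verifying, so that
    #    trivial or vacuous claims cannot inflate the final score.
    selected = [c for c in subclaims if is_checkworthy(c)]
    if not selected:
        return None  # nothing verifiable to score
    # 3. Score = fraction of selected subclaims supported by the source.
    supported = sum(is_supported(c, knowledge_source) for c in selected)
    return supported / len(selected)
```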
More publications on trustworthy reasoning are coming soon!
I spent a summer at SCALE 2024 hacking on the MultiVENT video retrieval dataset, looking for new ways to perform video retrieval over more realistic, real-world queries about visual events. In addition to the MultiVENT 2.0 benchmark (which served as a shared task for our ACL 2025 workshop on multimodal retrieval and generation), we wrote several methods papers for tackling these sorts of problems: Video-ColBERT draws inspiration from late interaction retrieval methods, combining token-wise interaction on static frame features and temporally contextualized video features for improved retrieval. MMMORRF takes a separate approach, balancing the contributions of vision and audio by establishing distinct processing pipelines for each and fusing them through modality-aware weighted reciprocal rank fusion. FORTIFY addresses the challenging problem of modeling OCR content for retrieval by leveraging generative models to rewrite and synthesize these fragments as noisy, multilingual documents.
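To make the fusion step concrete, here is a minimal sketch of weighted reciprocal rank fusion in the spirit of MMMORRF's modality-aware fusion. The function name, example weights, and smoothing constant k are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative weighted reciprocal rank fusion (RRF) across modalities.
# Not the exact MMMORRF formulation: names, weights, and k are assumptions.
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse per-modality rankings into a single ranking.

    ranked_lists: dict mapping modality name -> list of doc ids, best first.
    weights:      dict mapping modality name -> non-negative weight.
    k:            smoothing constant from the standard RRF formula.
    """
    scores = defaultdict(float)
    for modality, docs in ranked_lists.items():
        w = weights.get(modality, 1.0)
        for rank, doc_id in enumerate(docs, start=1):
            # Each modality contributes w / (k + rank) for every doc it returns.
            scores[doc_id] += w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse vision- and audio-based rankings, trusting vision slightly more.
fused = weighted_rrf(
    {"vision": ["v3", "v1", "v7"], "audio": ["v1", "v9", "v3"]},
    weights={"vision": 0.6, "audio": 0.4},
)
print(fused)  # ['v1', 'v3', 'v7', 'v9']
```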
I recently worked on a fun collaboration with MIT CSAIL's Computer-Aided Programming Group, where we explored how well LLMs and reasoning models can learn low-level language syntax via in-context learning [1]. I also have experience developing benchmarks for web agents [2] and for assessing the calibration of vision classification models via human judgment solicitation [3]. I published a handful of robotics papers as an undergrad; my two favorites concern identifying failure modes for bin-picking systems and suction grippers [4, 5].
Aug 2025 | Co-organized the first workshop on multimodal retrieval and generation at ACL 2025. |
July 2025 | CORE and FORTIFY presented @ ACL 2025. |
July 2025 | MMMORRF presented @ SIGIR 2025. |
June 2025 | MultiVENT 2.0 and Video-ColBERT presented @ CVPR 2025. |
May 2025 | Interning at AWS to work on reasoning model reward functions. |
April 2025 | Randomly Sampled Language Reasoning Problems presented @ ICLR 2025 VerifAI workshop. |
April 2025 | Tur[k]ingBench presented @ NAACL 2025. |
April 2025 | Bonsai now on arXiv. |