Meet the Patronus team: Varun Gangal

Bio:
I am a language generation researcher with current interests in LLM evaluation and alignment. Since my first A* publication in 2017, I've had the privilege of being engaged in the battle, watching with invested eyes as the frontier of language generation ability advanced from the high noon of task-specific recurrent architectures, to contextual embeddings and BERTology, to the GPT-2/T5-era LLMs, to the post-ChatGPT timeline.

Most recently, I was at Amazon AGI working on LLM evaluation and post-training / preference tuning (reasoning benchmarks, reward models, LLM judges, long-context ability, and multi-turn agentic benchmarks), and I was a contributor to the Nova family of LLMs. Before Amazon, from Dec '22 to Aug '24, I was a researcher at ASAPP Inc, a post-Series C, B2B SaaS startup building an LLM-driven customer support agent, where I contributed both to the multi-turn, tool-calling GenerativeAgent product and to research on preference tuning, hallucination detection, and sparsification.

I started my NLP research journey with my PhD (2016-22) at the CMU Language Technologies Institute (LTI), advised by Prof. Ed Hovy, with research foci in Natural Language Generation (particularly tasks requiring commonsense and creativity, such as figurative language and reordered narratives) and in Data Augmentation, both for finetuning and for better evaluation.
Q: Why are you excited to join Patronus AI?
Evaluating the frontier of LLM ability has always been a topic close to both my heart and my interests: from my early 2020-2023 work on dialog evaluation and creative generation, and co-organizing workshops on generation evaluation (the GEM workshops) and controllability; through creating in-house test suites and environments for multi-turn ability and safety at ASAPP; to recently contributing to Humanity's Last Exam and evaluating the Nova LLMs on many fronts at Amazon AGI.

LLM evaluation has been a significant and growing factor in the flywheel of LLMs and the march to AGI/ASI ever since we transcended basic fluency and coherence in the early 2020s, but I posit that since mid-2024 we have entered a regime where benchmarks and evaluations themselves carry significant alpha. This can be seen in how specific benchmarks (NIAH, SWE-bench, AIME, FrontierMath, and ARC-AGI) have shaped the nature of LLM post-training, arguably driven investment and compute decisions, and spawned reasoning and inference-time scaling.

Patronus AI brings a troika of a strong starting point (the Lynx family, FinanceBench), a coherent business thesis, and recent momentum (benchmarks for abilities at the edge like BLUR, multimodal LLM judges), all three threaded through LLM evaluation at the edge. It is poised to capitalize thoughtfully on the aforementioned alpha, and starting at such a place at such a time truly excites me.
Q: What do you like to do in your free time?
In an ideal alternative timeline, I would have bet all my life's force on being a historian or a paleontologist, or both. Though I have diverged irreversibly from that fork in the road, I enjoy consuming and discussing non-fiction, podcasts, and sometimes more involved content such as papers and primary sources on these and allied topics such as geopolitics and evolutionary biology, inter alia.