Harshit Dave
Bridging state-of-the-art AI research with production-grade fintech solutions.
I am a Data Scientist at slice, where I build production-scale LLM-powered agentic systems and multi-agent architectures. These systems bridge cutting-edge research and real-world applications in the fintech space. My work centers on LLMs, multimodal AI, and intelligent conversational systems.
My journey in AI/ML began at IIITM Gwalior, where I researched and engineered the Integrated-Particle Swarm Optimization (i-PSO) technique and analyzed it with various ML algorithms for malware detection, under the supervision of Dr. Saumya Bhadauria and Prof. Aditya Trivedi. This early work taught me how to blend deep theoretical concepts with practical, real-world problem-solving.
Later, as a remote research intern with the Artificial Intelligence Institute of the University of South Carolina (AIISC), I worked with a team on multimodal fact verification. We used Stable Diffusion 2 to enhance the multimodal data and built a more robust dataset (FACTIFY3M) by injecting adversarial attacks in the form of fake news. This research, which was accepted at EMNLP 2023, gave me firsthand experience working on multimodal AI.
During my internship at IBM Research, I benchmarked open-source LLMs for enterprise tasks and developed a 3-stage LLM-based approach for NBA (next best agent) recommendation in cold-start settings, work that was accepted at IAAI-24 (co-located with AAAI-24).
My Work at slice
Since joining slice, I've had the opportunity to build several impactful systems.
sliceMCP: I architected a pipeline that combines a Qdrant vector database built from Confluence content with a knowledge graph (KG) to answer complex questions, reducing the onboarding time for new projects from weeks to just a few hours. To keep it fast, I designed a data pipeline using Merkle trees that made vector database synchronization 100 times faster.
Pay-Via-Voice: Built a voice-first payment system using Whisper ASR and multi-agent orchestration. We achieved 1.013s average transaction latency with full UPI integration and policy validation.
ConvoBot: Developed a context-aware chatbot using GPT-4o prompt-chaining that achieves 98.9% precision, 96.7% recall, and 37-42% end-to-end resolution rates.
SMS Processing: Ran batched GPU inference with LLMs such as Qwen2.5-3B and Llama-3.1-8B to reduce latency, reaching 79% accuracy at 0.55s per 3 SMS. Then trained an HRM (Hierarchical Reasoning Model) using DAPT (Domain-Adaptive Pre-Training) and SFT for SMS categorization, achieving an 86% macro-F1 score with latency of around 2ms per SMS.
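For readers curious how Merkle trees can speed up synchronization like the one mentioned under sliceMCP, here is a minimal sketch. It is an illustration under my own assumptions, not the production pipeline: the two-level bucketed tree, the `NUM_BUCKETS` value, and the `build_tree` / `changed_docs` helpers are all hypothetical names chosen for this example. The idea is that matching hashes let whole groups of documents be skipped, so only the documents that actually changed need to be re-embedded and written to the vector database.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical fan-out, chosen only for illustration


def _h(data: str) -> str:
    """SHA-256 hex digest of a string."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def build_tree(docs: dict[str, str]) -> dict:
    """Build a two-level Merkle tree: doc hashes -> bucket hashes -> root."""
    buckets = [dict() for _ in range(NUM_BUCKETS)]
    for doc_id, text in docs.items():
        # Assign each document to a stable bucket based on its id.
        buckets[int(_h(doc_id), 16) % NUM_BUCKETS][doc_id] = _h(text)
    bucket_hashes = [
        _h("".join(f"{k}:{v};" for k, v in sorted(b.items()))) for b in buckets
    ]
    return {
        "root": _h("".join(bucket_hashes)),
        "bucket_hashes": bucket_hashes,
        "buckets": buckets,
    }


def changed_docs(old: dict, new: dict) -> set[str]:
    """Return ids of documents added, removed, or modified between two trees."""
    if old["root"] == new["root"]:
        return set()  # identical roots: nothing to sync
    changed = set()
    for i in range(NUM_BUCKETS):
        if old["bucket_hashes"][i] == new["bucket_hashes"][i]:
            continue  # the whole bucket is unchanged, skip it
        o, n = old["buckets"][i], new["buckets"][i]
        for doc_id in o.keys() | n.keys():
            if o.get(doc_id) != n.get(doc_id):
                changed.add(doc_id)
    return changed


before = {"page-1": "intro", "page-2": "setup", "page-3": "faq"}
after = {"page-1": "intro", "page-2": "setup v2", "page-4": "new"}
print(sorted(changed_docs(build_tree(before), build_tree(after))))
# ['page-2', 'page-3', 'page-4']
```

In this sketch, an unchanged corpus costs a single root comparison, and a small edit only requires inspecting the buckets whose hashes differ.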
Research Interests
I am interested in developing AI systems capable of symbolic reasoning and in building efficient collaborative systems that work alongside people in meaningful ways. My current focus is on building collaborative, agentic models that can understand goals, share context, and adapt through interaction, ideas that connect closely with Human-Computer Interaction (HCI) and Human-Centered AI (HCAI).
Currently Exploring
Research Publications
Research contributions in AI, NLP, and machine learning across prestigious conferences
Multi-Stage Prompting for Next Best Agent Recommendations in Adaptive Workflows
IAAI-24 [Co-located with AAAI-24], 2024
Traditional business processes such as loan processing, order processing, or procurement have a series of steps that are pre-defined at design and executed by enterprise systems. Recent advancements in new-age businesses, however, focus on having adaptive and ad-hoc processes by stitching together a set of functions or steps enabled through autonomous agents. Further, to enable business users to execute a flexible set of steps, there have been works on providing a conversational interface to interact and execute automation. Often, it is necessary to guide the user through the set of possible steps in the process (or workflow). Existing work on recommending the next agent to run relies on historical data. However, with changing workflows and new automation constantly getting added, it is important to provide recommendations without historical data. Additionally, hand-crafted recommendation rules do not scale. The adaptive workflow being a combination of structured and unstructured information, makes it harder to mine. Hence, in this work, we leverage Large Language Models (LLMs) to combine process knowledge with the meta-data of agents to discover NBAs specifically at cold-start. We propose a multi-stage approach that uses existing process knowledge and agent meta-data information to prompt LLM and recommend meaningful next best agent (NBA) based on user utterances.
FACTIFY3M: A benchmark for multimodal fact verification with explainability through 5W Question-Answering
EMNLP 2023
Combating disinformation is one of the burning societal crises - about 67% of the American population believes that disinformation produces a lot of uncertainty, and 10% of them knowingly propagate disinformation. Evidence shows that disinformation can manipulate democratic processes and public opinion, causing disruption in the share market, panic and anxiety in society, and even death during crises. Therefore, disinformation should be identified promptly and, if possible, mitigated. With approximately 3.2 billion images and 720,000 hours of video shared online daily on social media platforms, scalable detection of multimodal disinformation requires efficient fact verification. Despite progress in automatic text-based fact verification (e.g., FEVER, LIAR), the research community lacks substantial effort in multimodal fact verification. To address this gap, we introduce FACTIFY 3M, a dataset of 3 million samples that pushes the boundaries of the domain of fact verification via a multimodal fake news dataset, in addition to offering explainability through the concept of 5W question-answering. Salient features of the dataset include: (i) textual claims, (ii) ChatGPT-generated paraphrased claims, (iii) associated images, (iv) stable diffusion-generated additional images (i.e., visual paraphrases), (v) pixel-level image heatmap to foster image-text explainability of the claim, (vi) 5W QA pairs, and (vii) adversarial fake news stories.
Interested in discussing AI and machine learning? I enjoy connecting with fellow researchers, engineers, and teams working on interesting problems in conversational AI, multimodal systems, applied ML, reinforcement learning and HCAI.
Blog
Insights and explorations in AI, ML, and research
Work in Progress
Currently working on blog posts covering my experiences with LLMs