Data Scientist
Dell Technologies
- Proposed an entropy-based metric to assess and cluster SQL queries for improved confidence estimation.
- Integrated entropy-guided refinement into CHESS and CHASE pipelines to reduce redundant generations.
- Built a modular agentic framework using Mixtral and DeepSeek models for retrieval, generation, and selection.
- Applied MarkupLM with DBSCAN to cluster execution-based embeddings of SQL outputs.
- Used entropy thresholds as stopping criteria in query refinement to cut down computation costs.
- Ran detailed experiments on the BIRD benchmark using diverse generation strategies such as Divide-and-Conquer, Query Plan, and Online Synthetic Examples.
- Automated 6,000 quarterly flux analyses using a custom-built LLM pipeline, saving 32K+ hours of manual effort annually.
- Conducted extensive prompt engineering experiments and adopted Chain-of-Thought prompting for best results.
- Fine-tuned Llama-3.1-8B-Instruct using PEFT with LoRA on historical finance data to improve accuracy.
- Designed and implemented a tool-based agentic framework to orchestrate data preprocessing, variance analysis, and auto-commentary generation.
2025 Accepted Research Paper: How Does Entropy Influence Modern Text-to-SQL Systems?
AI-Powered Balance Sheet Flux Commentary Automation