Animesh Chowdhury — AI & Data Product Leader

What I do

My edge is combining hands-on technical depth (LLM evaluation, RAG, observability, experimentation) with the product judgment to weigh customer experience, safety, cost, and scale in a single decision.

Build AI products

Take customer-facing GenAI products from problem definition and requirements through launch, experimentation, and iteration.

Decide what ships

Define the quality KPIs, evaluation standards, and experiments that tell Product and Engineering what to build next.

Make AI trustworthy

Apply grounding, observability, and human-in-the-loop governance so AI systems are measurable, auditable, and safe at consumer scale.

Projects you can open right now

Five live, self-contained apps spanning the AI-product lifecycle: build, evaluate, experiment, monitor, explain. Each runs on synthetic or real public data, so there's nothing proprietary. Just click and explore.

Monitor

LLM Observability & Evals

Model-health monitoring across quality, safety, performance, cost & drift, on a SQL-backed pipeline with alerting and PDF/PPTX export.

Open demo ↗ Code

Evaluate

Chat Quality Score (CQS)

An LLM-as-a-judge evaluation that scores conversations on a 4-dimension rubric, calibrated against human labels.

Open demo ↗ Code

Explain

Product Recommendation Quality

Tracks AI recommendation relevance week over week and surfaces the drivers behind any change.

Open demo ↗ Code

Experiment

A/B Experimentation Framework

Hypothesis design, randomization, guardrail metrics, and ship / iterate / stop decisioning.

Open demo ↗ Code

Build

LedgerIQ — Finance RAG Agent

A finance-ops RAG agent over two sources (real SEC EDGAR filings and FP&A planning documents) that gives grounded, cited answers and refuses questions outside its corpus, plus token-minimization controls and MCP retrieval servers.

Open demo ↗ Code

Experience

Principal Product Data Analyst · Walmart

Oct 2022 – Present · Product, Evaluation & Quality Lead, Agentic AI

Data product lead for Sparky, Walmart's AI shopping assistant. Defined the platform's first standardized quality KPI and built its evaluation standards from scratch; own the analytics, experimentation, and measurement strategy that steer the roadmap. Two-time Bravo Award recipient.

Senior Associate, Finance Operations · Skillz

Aug 2021 – Oct 2022

Owned the BI and data-reporting infrastructure powering financial budgets, forecasts, and analyses, partnering closely with Finance and Business Operations.

Business Analyst · Electronic Arts

Aug 2020 – Aug 2021

Built strategic frameworks and automated reporting (SQL, R, Tableau) for mobile-studio workforce planning, translating leadership questions into decision-ready analytics.

Senior Analytics Specialist · Deloitte & Touche

Nov 2015 – Aug 2020

Led a self-service analytics platform, managing a team of 6–7 and driving adoption across US and global partner firms. Innovation Challenge Winner (selected from 218 submissions).

What I work with

Product

Product strategy & roadmap · Feature prioritization · PRDs & requirements · A/B testing · KPI ownership · Stakeholder management

GenAI & AI/ML

LLM evaluation · RAG & grounding · Agentic AI & MCP · Model observability · AI safety evaluation · Human-in-the-loop governance · Token & cost optimization

Data & Platform

SQL · Python · R · BigQuery · Snowflake · PostgreSQL · Kafka

Analytics & BI

Tableau · Power BI · Streamlit · Excel (modeling) · Jira · Miro