LangSmith supports two types of evaluations based on when and where they run:

Offline Evaluation

Test before you ship. Run evaluations on curated datasets during development to compare versions, benchmark performance, and catch regressions.

Online Evaluation

Monitor in production. Evaluate real user interactions in real time to detect issues and measure quality on live traffic.

Evaluation workflow

Flow diagrams: offline evaluation flow and online evaluation flow.
1. Create a dataset

Create a dataset from manually curated test cases, historical production traces, or synthetic data generation.
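As a sketch of this step with the Python SDK (the dataset name and the question/answer pairs below are placeholders, and the client assumes your API key is set in the environment):

```python
from langsmith import Client

client = Client()  # reads LANGSMITH_API_KEY from the environment

# Create a dataset and seed it with a few manually curated test cases.
dataset = client.create_dataset(
    "qa-regression-suite",
    description="Curated question/answer pairs for offline evaluation.",
)
client.create_examples(
    inputs=[
        {"question": "What is LangSmith?"},
        {"question": "What are the two types of evaluation?"},
    ],
    outputs=[
        {"answer": "A platform for observability, evaluation, and deployment."},
        {"answer": "Offline and online evaluation."},
    ],
    dataset_id=dataset.id,
)
```

You could equally populate the dataset from historical production traces or generated examples; the SDK call is the same once you have inputs and reference outputs.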
2. Define evaluators

Create evaluators to score your application's performance.
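A custom evaluator can be as simple as a function that scores a run against its reference example. The exact-match check below is a minimal stand-in for whatever heuristic or LLM-as-judge scoring you actually need; the `answer` key matches the placeholder dataset above:

```python
def exact_match(run, example):
    # Compare the application's output to the dataset's reference answer.
    predicted = (run.outputs or {}).get("answer", "")
    expected = (example.outputs or {}).get("answer", "")
    # Return a named feedback score between 0 and 1.
    return {"key": "exact_match", "score": int(predicted.strip() == expected.strip())}
```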
3. Run an experiment

Execute your application on the dataset to create an experiment. Configure repetitions, concurrency, and caching to optimize runs.
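Putting the pieces together, `evaluate` runs a target function over every example in the dataset and applies your evaluators. This sketch assumes a recent langsmith Python SDK and reuses the dataset and evaluator from the previous steps; `my_app` and the repetition/concurrency values are illustrative:

```python
from langsmith import evaluate

def my_app(inputs: dict) -> dict:
    # Placeholder target: call your chain, agent, or model here.
    return {"answer": "Offline and online evaluation."}

results = evaluate(
    my_app,
    data="qa-regression-suite",    # dataset created in step 1
    evaluators=[exact_match],      # evaluator defined in step 2
    experiment_prefix="baseline",  # groups runs under a named experiment
    num_repetitions=1,             # rerun each example to smooth out variance
    max_concurrency=4,             # parallelize calls to speed up the run
)
```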
4. Analyze results

Compare experiments for benchmarking, unit tests, regression tests, or backtesting.
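Experiment comparison happens in the LangSmith UI, but the object returned by `evaluate` can also be inspected locally. This assumes pandas is installed and that your SDK version exposes `to_pandas()` on the results object:

```python
# Quick local look at per-example results before comparing experiments in the UI.
df = results.to_pandas()  # one row per example, including feedback scores
print(df.head())
```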
For more on the differences between offline and online evaluation, refer to the Evaluation concepts page.

Get started

To set up a LangSmith instance, visit the Platform setup section to choose between cloud, hybrid, and self-hosted options. All options include observability, evaluation, prompt engineering, and deployment.
