Open-source AI development and evaluation tools

MLflow

Open-source AI engineering platform for experiments, evaluations, observability, and model management.

MIT 21.8K stars


promptfoo

Open-source tool for testing prompts, agents, RAG systems, and AI security behavior.

Apache-2.0 14.2K stars

Ragas

Open-source evaluation framework for LLM applications and RAG workflows.

Apache-2.0 2.5K stars

OpenLIT

OpenTelemetry-native open-source AI engineering platform for LLM observability, evaluations, guardrails, prompts, and GPU monitoring.

Apache-2.0 1.1K stars

28.3K stars

Future AGI

Open-source platform for evaluating, observing, and improving LLM and AI agent applications.

Open source, self-hosted

Langfuse

LLM engineering platform for observability, evals, prompt management, datasets, and traces.

Observability and debugging tools

Tools for tracing, logging, inspecting, and debugging agent behavior and LLM interactions.

Apache-2.0 2.5K stars

OpenLIT

OpenTelemetry-native open-source AI engineering platform for LLM observability, evaluations, guardrails, prompts, and GPU monitoring.

Apache-2.0 1.1K stars

28.3K stars

Future AGI

Open-source platform for evaluating, observing, and improving LLM and AI agent applications.

Open source, self-hosted

Langfuse

LLM engineering platform for observability, evals, prompt management, datasets, and traces.

10K stars

MCP Inspector

Visual testing tool for Model Context Protocol servers.

MCP

Deployment and serving tools

Utilities for deploying, serving, and managing AI models and agent systems in production.

49K stars

LiteLLM

AI gateway and Python SDK for calling many LLM providers through OpenAI-compatible or native formats.

More resources to compare

Additional published resources that do not yet fit a single narrow subcategory.

MIT 43.6K stars


nanobot

Lightweight open-source AI agent that connects to your tools, chats, and workflows for automation.