Building, testing, observing, and deploying agents each call for a different category of tool.
Open AI development and evaluation tools
Browse practical utilities for building, testing, benchmarking, and operating AI products and agent systems.
Tools that support standard formats, APIs, and export paths integrate more easily into an existing stack.
Documentation quality, update cadence, and community responsiveness matter most for tooling that runs in production.
Evaluation and benchmark tools
Frameworks and tools for benchmarking, testing, and evaluating AI agent and model performance.
MLflow
Open-source AI engineering platform for experiments, evaluations, observability, and model management.
promptfoo
Open-source tool for testing prompts, agents, RAG systems, and AI security behavior.
Ragas
Open-source evaluation framework for LLM applications and RAG workflows.
OpenLIT
OpenTelemetry-native open-source AI engineering platform for LLM observability, evaluations, guardrails, prompts, and GPU monitoring.
Future AGI
Open-source platform for evaluating, observing, and improving LLM and AI agent applications.
Langfuse
LLM engineering platform for observability, evals, prompt management, datasets, and traces.
Observability and debugging tools
Tools for tracing, logging, inspecting, and debugging agent behavior and LLM interactions.
OpenLIT
OpenTelemetry-native open-source AI engineering platform for LLM observability, evaluations, guardrails, prompts, and GPU monitoring.
Future AGI
Open-source platform for evaluating, observing, and improving LLM and AI agent applications.
Langfuse
LLM engineering platform for observability, evals, prompt management, datasets, and traces.
MCP Inspector
Visual testing tool for Model Context Protocol servers.
Deployment and serving tools
Utilities for deploying, serving, and managing AI models and agent systems in production.
More resources to compare
Additional published resources that do not yet fit a single subcategory.