Tools directory

Open AI development and evaluation tools

Browse practical utilities for building, testing, benchmarking, and operating AI products and agent systems.

Match the workflow stage

Building, testing, observing, and deploying agents require different tool categories.

Prefer composable tools

Tools that work with standard formats, APIs, and export paths integrate better into existing stacks.

Consider operational maturity

Documentation quality, update cadence, and community responsiveness matter most when a tool will run in production.

Evaluation and benchmark tools

Frameworks and tools for benchmarking, testing, and evaluating AI agent and model performance.

Observability and debugging tools

Tools for tracing, logging, inspecting, and debugging agent behavior and LLM interactions.

Deployment and serving tools

Utilities for deploying, serving, and managing AI models and agent systems in production.

More resources to compare

Additional published resources that do not yet fit a single subcategory.