Models
llama.cpp
A fast, portable inference stack for running open-weight language models across local machines, servers, and edge devices.