Models

GLM-OCR

Open OCR model and pipeline for turning complex document images into usable text.

MIT model / Apache-2.0 code License
Open source
GLM-OCR MIT model / Apache-2.0 code License zai-org/GLM-OCR verified 2026-04-19
About

GLM-OCR overview

GLM-OCR is an open OCR model and document pipeline from Z.ai, focused on accurate, fast, and comprehensive image-to-text extraction for documents, tables, formulas, and complex layouts.

Document-first model focus

GLM-OCR targets OCR and image-to-text extraction rather than general chat.

Specialization is valuable when a workflow depends on layout, tables, equations, and structured document text.

Open model and pipeline licensing

The repository states MIT licensing for the model and Apache-2.0 licensing for code components.

Clear licensing makes it easier to evaluate for production document workflows.

Useful for agent intake

OCR output can feed downstream agents, search indexes, and retrieval systems.

Agents are only as useful as the documents and screens they can accurately read.
Use cases

When to use GLM-OCR

PDF and document ingestion

Convert scans and visual documents into text before indexing or summarization.

Research workflow automation

Extract usable text from papers, reports, forms, and tables for downstream analysis.

RAG preprocessing

Use OCR as the first stage before chunking, embedding, and retrieval.

Compare

How it compares

Choose GLM-OCR for document pipelines vs general VLMs

A general multimodal model may describe an image, but GLM-OCR is the better starting point when the job is faithful document extraction.

FAQ

Questions

What should I check before using GLM-OCR?

Run GLM-OCR on a fixed prompt set from your own workflow. Compare quality, latency, context handling, retry behavior, deployment path, and license fit against nearby open models before adopting it.

Is GLM-OCR open source?

GLM-OCR is listed with MIT model / Apache-2.0 code based on the official source links in this profile. Re-check the repository, model card, or docs before production use.

Who should evaluate GLM-OCR?

GLM-OCR is most worth evaluating for builders working on document AI, PDF processing, or knowledge ingestion.

Tags

Capabilities

local inferencetool callingopen sourceopen weightsdeveloper workflow
Decision brief

Should you use GLM-OCR?

JSON
Best for
  • Builders working on document AI, PDF processing, or knowledge ingestion
  • Teams that need an open OCR component before RAG or agent workflows
  • Researchers comparing modern OCR pipelines beyond generic vision-language models
Not for
  • Users who want a fully managed consumer product with no setup work
  • Teams that cannot review the linked source, license, and operational requirements before adoption
Trust and freshness
  • Verified 2026-04-19
  • License: MIT model / Apache-2.0 code
  • Repo: zai-org/GLM-OCR
  • Open-source signal
Deployment

Check source

Permission surface

memory

Decision signals

No extra signals recorded

Agent packet

Structured decision data for GLM-OCR

This packet is the compact machine-readable view agents should use before following source links or taking action.

Capabilities

local inference, tool calling

Constraints

open source, open weights

Deployment

Check source

Permission surface

memory

Recommended workflows

Coding agent workflow, Local or private AI stack, Memory or RAG workflow

Overview

What GLM-OCR does

What it is

GLM-OCR is an open model resource to evaluate by workload, serving path, context behavior, license terms, and how reliably it supports the agent or local AI tasks you actually plan to run.

Why it matters

GLM-OCR matters because many real AI workflows begin with messy documents, not clean chat messages. A strong open OCR layer can become the front door for PDF analysis, retrieval systems, research workflows, and agent tools that need reliable document ingestion.

How to evaluate it

Run GLM-OCR on a fixed prompt set from your own workflow. Compare quality, latency, context handling, retry behavior, deployment path, and license fit against nearby open models before adopting it.

Facts

Known metadata and operating surface

These fields are separated from editorial interpretation so agents can reason over facts and missing checks.

Resource type model
Category Models
Maturity active
Difficulty Unknown
License MIT model / Apache-2.0 code
Pricing open source
Verified 2026-04-19
Source confidence medium
Risk level low
Fit matrix

Where GLM-OCR fits in an agent stack

strong

Coding agent workflow

GLM-OCR has multiple signals for coding agent workflow, including matching tags, capabilities, category, or positioning.

  • Run a small repository change and inspect the diff, tests, and rollback path.
  • Confirm official docs, current maintenance, license, and runtime constraints before production use.
strong

Local or private AI stack

GLM-OCR has multiple signals for local or private ai stack, including matching tags, capabilities, category, or positioning.

  • Verify hardware requirements, data path, storage, and whether all calls stay in your environment.
  • Confirm official docs, current maintenance, license, and runtime constraints before production use.
strong

Memory or RAG workflow

GLM-OCR has multiple signals for memory or rag workflow, including matching tags, capabilities, category, or positioning.

  • Create, update, retrieve, correct, and delete memory or retrieval objects with real data.
  • Confirm official docs, current maintenance, license, and runtime constraints before production use.
partial

Browser automation

GLM-OCR has at least one signal for browser automation, but should be checked against a real task before adoption.

  • Run one non-sensitive website task and inspect clicks, waits, retries, and changed URLs.
  • Confirm official docs, current maintenance, license, and runtime constraints before production use.
partial

Reusable skill workflow

GLM-OCR has at least one signal for reusable skill workflow, but should be checked against a real task before adoption.

  • Run one skill end to end and check whether it produces evidence or structured output.
  • Confirm official docs, current maintenance, license, and runtime constraints before production use.
weak

Connector or protocol layer

GLM-OCR is not primarily positioned for connector or protocol layer in the current metadata.

  • Connect one low-risk service, then inspect schemas, auth scope, errors, and logs.
  • Confirm official docs, current maintenance, license, and runtime constraints before production use.
Inputs and outputs

What an agent should inspect

Likely inputs

  • Repositories, files, issues, terminal output, and test results
  • Documents, user facts, entities, context, or retrieval queries
  • Prompts, messages, documents, images, or model inputs
  • Official setup instructions and a small real workflow

Likely outputs

  • Diffs, commits, explanations, test results, or review notes
  • Retrieved context, memory updates, graph relations, or citations
  • A decision on whether this resource fits the target workflow
Evidence

Sources, claims, and missing checks

Claims are marked separately from source links so future crawlers and reviewers can update them without rewriting the page.

verified

GLM-OCR is listed as open source.

License metadata: MIT model / Apache-2.0 code
verified

GLM-OCR has a recorded GitHub repository: zai-org/GLM-OCR.

Resource facts and GitHub source link.
inferred

GLM-OCR is tagged with local inference, tool calling capabilities.

OpenAgent capability taxonomy.
Missing checks
  • Dedicated docs link is missing.
  • Repository freshness has not been recorded.
Next action

How to start evaluating GLM-OCR

Inspect repository

Check license, recent activity, issues, examples, and security-sensitive code paths.

Open source

Clone GLM-OCR

Use the official repository examples for the current vLLM or local inference setup.

git clone https://github.com/zai-org/GLM-OCR.git
Compare

Alternatives and nearby resources

Use related resources to compare category fit, license, deployment model, and first-workflow behavior.

FAQ

Common questions about GLM-OCR

What should I check before using GLM-OCR?

Run GLM-OCR on a fixed prompt set from your own workflow. Compare quality, latency, context handling, retry behavior, deployment path, and license fit against nearby open models before adopting it.

Is GLM-OCR open source?

GLM-OCR is listed with MIT model / Apache-2.0 code based on the official source links in this profile. Re-check the repository, model card, or docs before production use.

Who should evaluate GLM-OCR?

GLM-OCR is most worth evaluating for builders working on document AI, PDF processing, or knowledge ingestion.