Tsinghua University

CogVLM

Vision-language understanding with strong grounding.

  • Visually grounded text generation
  • State-of-the-art VQA accuracy
  • Efficient architecture
  • Open-source availability
Today's score
82.0


Best for / Not great for

Best for
  • Detailed image descriptions
  • Visual reasoning tasks
  • Developing specialized vision assistants
  • Educational tools for visual learning
Not great for
  • Audio or video analysis
  • Large-scale text generation
  • Real-time conversational agents
  • Complex coding tasks

Why it ranks here

CogVLM remains a notable academic achievement in multimodal AI, offering strong visual grounding and reasoning. Its open-source release and benchmark performance secure its place in specialized vision-language research and development.


Score breakdown

Search trends: 83
Benchmarks: 84
Developer buzz: 81
News mentions: 81

Pricing

API: $0.00 in · $0.00 out per 1M tokens · Consumer: $0.00/mo

Pricing plans

Open Source Download
Access CogVLM research code.
Free
  • Research-focused
  • Requires significant setup
  • Community support via GitHub
  • Model weights available
Hugging Face Inference API
Hosted inference for CogVLM.
$0 per request
  • Pay-per-request
  • Easy integration
  • Managed infrastructure
Snapshot 2026-05-19