Microsoft / University of Wisconsin-Madison

LLaVA 1.6

Open-source vision-language understanding.

Strong image-text alignment · Open-source availability · Good performance on VQA tasks · Relatively lightweight
Today's score
88.0

Best for / Not great for

Best for
  • Visual Question Answering (VQA)
  • Image captioning
  • Developing custom vision tools
  • Research in vision-language models
Not great for
  • Audio or video processing
  • Complex multi-turn dialogues
  • Generating novel images

Why it ranks here

LLaVA remains a leading choice for open-source vision-language tasks. Its freely available weights and solid performance on image-text understanding make it popular with researchers and developers building task-specific visual AI applications.

Score breakdown

Search trends: 87
Benchmarks: 89
Developer buzz: 90
News mentions: 88

Pricing

API: $0.00 in · $0.00 out per 1M tokens · Consumer: $0.00/mo

Pricing plans

Popular
Open Source
Freely available for research and development.
Free
  • Model weights available
  • Requires self-hosting
  • Customizable
  • Active community support
Snapshot 2026-05-13