OpenAI

CLIP ViT-L/14

Name: CLIP ViT-L/14
Brand: OpenAI
Rating: 8.7 (1 review)

Pioneering multimodal embeddings for vision and text.

Cross-modal understanding
Image search
Zero-shot image classification

Today's score

87.0

Where it ranks today

Embeddings & Search

Best for / Not great for

Best for

Image and video search
Multimodal RAG
Content moderation
Visual question answering

Not great for

Purely text-based tasks
Very high-dimensional text-only data
Low-resource environments

Why it ranks here

CLIP is not a text-only embedding model, but its ability to link images and text in a shared embedding space makes it a core building block for multimodal search and RAG. Its strong performance on cross-modal tasks keeps it relevant for content understanding.
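To make the image-text linking concrete, here is a minimal sketch of the ranking step in a text-to-image search. It assumes the embeddings have already been produced by the model; the 3-dimensional vectors, file names, and the `rank_images` helper are hypothetical stand-ins for illustration, and real CLIP embeddings are much higher-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings):
    """Return (image_id, score) pairs, most similar first."""
    scored = [
        (image_id, cosine_similarity(query_embedding, embedding))
        for image_id, embedding in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): in practice these come from the image encoder.
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Toy vector (assumption): in practice this comes from the text encoder,
# for example for the query "a photo of a dog".
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_images(query_embedding, image_embeddings)
for image_id, score in ranking:
    print(f"{image_id}: {score:.3f}")
```

Because both encoders write into the same vector space, the same similarity function can score text against images, images against images, or text against text. A production system would typically replace the linear scan with a vector index.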

Score breakdown

Search trends: 88

Benchmarks: 87

Developer buzz: 86

News mentions: 87

Pricing

API: $0.00 in · $0.00 out per 1M tokens · Consumer: $0.00/mo

Pricing plans

Popular

Open Source

Use and adapt the powerful multimodal model.

Free

Pre-trained model weights
Supports image and text
Customizable
Extensive research applications

Get on Hugging Face

API Access (via third-party)

Convenient API for multimodal tasks.

$0/usage

Managed inference
Pay-per-use
Easy integration
Supports image/text search

Explore OpenAI API

Snapshot 2026-05-23