The easiest and cheapest inference engine

Deploy any open-source model, auto-scale instantly, and pay only for what you use

20X cheaper than GPT-4o

Deploy any model in seconds

Set up inference in minutes

  • Deploy any open-source or fine-tuned model
  • Serverless and Dedicated endpoints for any model
  • Customize your hardware configuration

Pricing

Serverless

Model type                 Model size    Price (per 1M tokens)
Llama 3.2                  1B & 3B       $0.03
Llama 3.1 & Llama 3.2      8B & 11B      $0.09
Llama 3.1 & Llama 3.3      70B           $0.60
Llama 3.2                  90B           $0.90
Llama 3.1                  405B          $2.00

Dedicated

Model type                 Model size    Price (per minute)
Llama 3.2                  1B & 3B       $0.01
Llama 3.1 & Llama 3.2      8B & 11B      $0.03
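
To compare the two pricing modes, here is a rough Python sketch of the arithmetic. The prices are copied from the tables above for a subset of tiers. Everything else is an assumption made for illustration: the tier labels and function names are hypothetical, token counts are taken to be input and output combined, and a dedicated endpoint is assumed to be billed for every minute it is running, whatever the traffic.

```python
from decimal import Decimal

# Prices in USD, copied from the pricing tables (subset of tiers).
# The dictionary keys are illustrative labels, not official tier names.
SERVERLESS_PER_1M_TOKENS = {
    "1B-3B": Decimal("0.03"),
    "8B-11B": Decimal("0.09"),
    "70B": Decimal("0.60"),
}
DEDICATED_PER_MINUTE = {
    "1B-3B": Decimal("0.01"),
    "8B-11B": Decimal("0.03"),
}


def serverless_cost(tier: str, tokens: int) -> Decimal:
    """Cost of processing `tokens` tokens on a serverless endpoint."""
    return SERVERLESS_PER_1M_TOKENS[tier] * Decimal(tokens) / Decimal(1_000_000)


def dedicated_cost(tier: str, minutes: int) -> Decimal:
    """Cost of running a dedicated endpoint for `minutes` minutes.

    Assumes billing for every minute of uptime, independent of traffic.
    """
    return DEDICATED_PER_MINUTE[tier] * Decimal(minutes)


def break_even_tokens_per_minute(tier: str) -> Decimal:
    """Sustained tokens per minute at which both modes cost the same."""
    return (
        DEDICATED_PER_MINUTE[tier]
        / SERVERLESS_PER_1M_TOKENS[tier]
        * Decimal(1_000_000)
    )


if __name__ == "__main__":
    tier = "8B-11B"
    print("Serverless, 50M tokens:", serverless_cost(tier, 50_000_000))
    print("Dedicated, 24 hours:", dedicated_cost(tier, 24 * 60))
    print("Break-even tokens/min:", int(break_even_tokens_per_minute(tier)))
```

Under these assumptions, 50M tokens on an 8B or 11B model cost $4.50 serverless, while keeping a dedicated endpoint up for 24 hours costs $43.20. Dedicated only wins once sustained throughput exceeds roughly 333,000 tokens per minute.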