LLM API Landscape (Free and Cost-Effective Options)

A practical overview of major LLM API options with a focus on free tiers, speed, and cost-performance.

Google Gemini API (Best Free Tier)

Google currently offers one of the most generous free quotas among major providers, largely to promote the Gemini lineup. Pricing/details: https://ai.google.dev/gemini-api/docs/pricing?hl=zh-cn

Models: Gemini 3 Flash Preview, Gemini 2.5 Pro (as of 2026-02-12). In general, the newest top-end Pro model may have tighter free limits, while many other models still provide free usage.

Pros:

  • Even top-tier models may include free quota.
  • Very large context window (1M+ tokens).
  • Strong multimodal support (image/video input).

Cons:

  • Data privacy: free-tier inputs may be used by Google to improve models (use with caution in production).
  • IP restrictions: strict regional policy; requests from unsupported regions may fail with a 403 "User Location Not Supported" error.
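
The free tier can be called directly over REST with nothing but the standard library. A minimal sketch of building a `generateContent` request; the model name and API key below are placeholders, so check the pricing page for which models currently have free quota:

```python
import json
import urllib.request

GEMINI_ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"

def build_gemini_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a generateContent request for the Gemini REST API."""
    url = GEMINI_ENDPOINT.format(model=model) + f"?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# No network call happens until urlopen(); "gemini-2.5-pro" is a placeholder id.
req = build_gemini_request("gemini-2.5-pro", "Hello", api_key="YOUR_API_KEY")
# resp = urllib.request.urlopen(req)  # uncomment with a real key
```

Passing the key as a `?key=` query parameter is the documented pattern for this endpoint; keep the key out of logs and version control.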

Groq (Speed King)

Groq runs inference on its custom LPU (Language Processing Unit) hardware and delivers extremely fast inference. Pricing/details: https://groq.com/pricing

Models: GPT OSS / Kimi K2 / Llama 3 & 4 / Qwen3. Quota: no free tier, but relatively low prices.

Pros:

  • Very low latency; time to first token (TTFT) is often under 200 ms.
  • Great for real-time chat and voice assistants.

Cons:

  • Model scope is mostly open-source models; no GPT-4 or Claude hosted directly.
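
Groq exposes an OpenAI-compatible chat completions API, so standard OpenAI-style request bodies work as-is. A stdlib-only sketch; the model id is an assumption, so verify it against Groq's current model list:

```python
import json
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_chat_request(model, messages, api_key, stream=False):
    """Build an OpenAI-compatible chat completion request for Groq."""
    body = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        GROQ_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    "llama-3.3-70b-versatile",  # placeholder model id -- check Groq's model list
    [{"role": "user", "content": "Hi"}],
    api_key="YOUR_API_KEY",
    stream=True,  # streaming lets you benefit from the low TTFT
)
```

Because the API is OpenAI-compatible, the official OpenAI SDK also works by pointing its base URL at `https://api.groq.com/openai/v1`.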

SiliconCloud (Strong Domestic Option)

A fast-growing China-based inference platform that aggregates many high-quality domestic open-source models. Pricing/details: https://siliconflow.cn/pricing

Models: Qwen 2.5 (7B/14B/72B), DeepSeek-V2, Yi-1.5, Kimi K2. Quota: some models (for example Qwen 7B, GLM-4-9B) currently offer permanently free calls.

Pros:

  • Fast domestic connectivity.
  • New domestic open-source models are usually available quickly.

Cons:

  • Free access is mainly for smaller models.
  • Top models (such as 72B / DeepSeek 236B) are usually paid.
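
SiliconCloud is also OpenAI-compatible, so the same request shape carries over; only the base URL and model id change. In this sketch both the `api.siliconflow.cn` base URL and the Qwen model id are assumptions to verify against the pricing page:

```python
import json
import urllib.request

# Base URL and model id are assumptions -- verify in SiliconCloud's docs.
SILICONCLOUD_URL = "https://api.siliconflow.cn/v1/chat/completions"

def build_sc_request(prompt: str, api_key: str,
                     model: str = "Qwen/Qwen2.5-7B-Instruct") -> urllib.request.Request:
    """Build a chat completion request for SiliconCloud's OpenAI-compatible API."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        SILICONCLOUD_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )

def extract_reply(response_json: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat response."""
    return response_json["choices"][0]["message"]["content"]

# Full round trip (requires a real key):
# with urllib.request.urlopen(build_sc_request("hello", "YOUR_API_KEY")) as resp:
#     print(extract_reply(json.load(resp)))
```

Keeping the client OpenAI-compatible makes it easy to swap between the free small models here and a paid provider by changing only the URL and model id.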