Google Gemini API (Best Free Tier)
To promote the Gemini lineup, Google currently provides one of the most generous free quotas. Pricing/details: https://ai.google.dev/gemini-api/docs/pricing?hl=zh-cn
Models: Gemini 3 Flash Preview, Gemini 2.5 Pro (as of 2026-02-12). In general, the newest top-end Pro model may have tighter free limits, while many other models still provide free usage.
Pros:
- Even top-tier models may include free quota.
- Very large context window (1M+ tokens).
- Strong multimodal support (image/video input).
Cons:
- Data privacy: free-tier inputs may be used by Google to improve models (use with caution in production).
- IP restrictions: strict regional policy; requests from unsupported locations may fail with a 403 or a "User Location Not Supported" error.
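Gemini can be called through an OpenAI-compatible endpoint, which makes it easy to reuse generic chat-completions tooling. A minimal sketch below builds (but does not send) such a request with the standard library; the base URL and the model name are assumptions, so verify them against the docs page linked above.

```python
# Sketch: building a chat request for Gemini's OpenAI-compatible endpoint.
# The base URL and model ID are assumptions -- check the official docs.
import json
import urllib.request

GEMINI_OPENAI_BASE = "https://generativelanguage.googleapis.com/v1beta/openai"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{GEMINI_OPENAI_BASE}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request is left to the caller; a 403 response here typically corresponds to the regional restriction mentioned above.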
Groq (Speed King)
Groq uses its self-developed LPU (Language Processing Unit) hardware and provides extremely fast inference. Pricing/details: https://groq.com/pricing
Models: GPT OSS / Kimi K2 / Llama 3 / Llama 4 / Qwen3. Quota: no free tier, but relatively low prices.
Pros:
- Very low latency; time to first token (TTFT) is often under 200 ms.
- Great for real-time chat and voice assistants.
Cons:
- Hosts mostly open-source models; proprietary models such as GPT-4 or Claude are not available.
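Since the headline metric here is latency, it helps to measure TTFT explicitly: the delay between issuing the request and receiving the first streamed token. A small helper below works over any token iterator (the streaming client itself, e.g. Groq's OpenAI-compatible SDK, is out of scope and assumed).

```python
# Sketch: measuring time-to-first-token (TTFT) for any streaming token source.
import time
from typing import Iterable, Tuple

def measure_ttft(tokens: Iterable[str]) -> Tuple[float, str]:
    """Return (seconds until the first token, full concatenated text)."""
    start = time.perf_counter()
    it = iter(tokens)
    first = next(it)  # blocks until the first token arrives from the stream
    ttft = time.perf_counter() - start
    return ttft, first + "".join(it)
```

In practice you would pass in the generator yielded by a streaming chat-completions call and log the returned TTFT alongside total generation time.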
SiliconCloud (Strong Domestic Option)
A fast-growing China-based inference platform that aggregates many high-quality domestic open-source models. Pricing/details: https://siliconflow.cn/pricing
Models: Qwen 2.5 (7B/14B/72B), DeepSeek-V2, Yi-1.5, Kimi K2. Quota: some models (for example Qwen 7B and GLM-4-9B) are currently free to call with no expiry.
Pros:
- Fast domestic connectivity.
- New domestic open-source models are usually available quickly.
Cons:
- Free access is mainly for smaller models.
- Top models (such as 72B / DeepSeek 236B) are usually paid.
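Because the free tier covers small models while the large ones are paid, a common pattern is to make the model ID a single configuration switch. The sketch below illustrates this; the base URL and model IDs are assumptions based on the section above, so check the pricing page before relying on them.

```python
# Sketch: choosing between SiliconCloud's free small models and paid large ones.
# Base URL and model IDs are assumptions -- verify against the pricing page.
SILICONFLOW_BASE = "https://api.siliconflow.cn/v1"  # assumed OpenAI-compatible

MODEL_TIERS = {
    "free": "Qwen/Qwen2.5-7B-Instruct",    # small model, free per the text above
    "paid": "Qwen/Qwen2.5-72B-Instruct",   # large model, typically paid
}

def pick_model(allow_paid: bool) -> str:
    """Prefer the free tier; use the larger paid model only when allowed."""
    return MODEL_TIERS["paid"] if allow_paid else MODEL_TIERS["free"]
```

Since the platform is OpenAI-compatible, swapping tiers is just a model-name change; no other client code needs to differ.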