AI models and features
fold.run includes a curated catalog of AI models and AI-powered capabilities for code generation and error diagnosis. AI features are available on the Pro plan.
Supported models
Every model uses a @fold/-prefixed ID. Pass the model ID to fold.ai.run() in your function code.
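For example, a handler might pass a model ID from the catalog below to fold.ai.run(). This is a minimal sketch: the exact FoldContext and fold.ai.run() signatures here are assumptions for illustration, not taken from the SDK reference.

```typescript
// Illustrative types: the real @fold-run/runtime shapes may differ.
type AiRunResult = { response: string };

interface FoldContext {
  ai: {
    run: (modelId: string, input: { prompt: string }) => Promise<AiRunResult>;
  };
}

// A handler step that asks a text model to summarize its input.
async function summarize(fold: FoldContext, text: string): Promise<string> {
  const result = await fold.ai.run("@fold/meta/llama-3.1-8b-instruct", {
    prompt: `Summarize in one sentence: ${text}`,
  });
  return result.response;
}
```

In a real function, `fold` would be the context object passed to your defineHandler callback.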
Text generation
| Model ID | Name | Notes |
|---|---|---|
| @fold/meta/llama-3.3-70b-instruct-fp8-fast | Llama 3.3 70B | Flagship large model, best overall quality |
| @fold/meta/llama-4-scout-17b-16e-instruct | Llama 4 Scout 17B | Latest-generation model with mixture of experts |
| @fold/meta/llama-3.1-8b-instruct | Llama 3.1 8B | Efficient general-purpose model |
| @fold/meta/llama-3.2-3b-instruct | Llama 3.2 3B | Lightweight, fast responses |
| @fold/qwen/qwen2.5-coder-32b-instruct | Qwen 2.5 Coder 32B | Optimized for code generation |
| @fold/qwen/qwq-32b | QwQ 32B | Reasoning-focused model |
| @fold/mistralai/mistral-small-3.1-24b-instruct | Mistral Small 3.1 24B | Strong multilingual support |
| @fold/google/gemma-3-12b-it | Gemma 3 12B | Balanced size and performance |
| @fold/deepseek-ai/deepseek-r1-distill-qwen-32b | DeepSeek R1 32B | Advanced reasoning capabilities |
| @fold/openai/gpt-oss-120b | GPT OSS 120B | Largest available open model |
Embeddings
| Model ID | Name | Notes |
|---|---|---|
| @fold/baai/bge-base-en-v1.5 | BGE Base EN | Standard English embeddings |
| @fold/baai/bge-large-en-v1.5 | BGE Large EN | Higher-quality English embeddings |
| @fold/baai/bge-m3 | BGE M3 | Multilingual embeddings |
Image generation
| Model ID | Name | Notes |
|---|---|---|
| @fold/black-forest-labs/flux-1-schnell | FLUX.1 Schnell | Fast, high-quality generation |
| @fold/stabilityai/stable-diffusion-xl-base-1.0 | Stable Diffusion XL | Versatile image generation |
| @fold/leonardo/phoenix-1.0 | Leonardo Phoenix | Creative image generation |
Vision
| Model ID | Name | Notes |
|---|---|---|
| @fold/meta/llama-3.2-11b-vision-instruct | Llama 3.2 11B Vision | Image understanding and analysis |
Speech to text
| Model ID | Name | Notes |
|---|---|---|
| @fold/openai/whisper-large-v3-turbo | Whisper Large V3 Turbo | Best accuracy transcription |
| @fold/deepgram/nova-3 | Deepgram Nova 3 | Real-time transcription |
Text to speech
| Model ID | Name | Notes |
|---|---|---|
| @fold/deepgram/aura-2-en | Deepgram Aura 2 EN | High-quality English speech |
| @fold/myshell-ai/melotts | MeloTTS | Lightweight text-to-speech |
Classification
| Model ID | Name | Notes |
|---|---|---|
| @fold/huggingface/distilbert-sst-2-int8 | DistilBERT Sentiment | Sentiment analysis |
| @fold/microsoft/resnet-50 | ResNet-50 | Image classification |
Summarization
| Model ID | Name | Notes |
|---|---|---|
| @fold/facebook/bart-large-cnn | BART Large CNN | Document summarization |
Translation
| Model ID | Name | Notes |
|---|---|---|
| @fold/meta/m2m100-1.2b | M2M100 1.2B | Multilingual translation |
Reranking
| Model ID | Name | Notes |
|---|---|---|
| @fold/baai/bge-reranker-base | BGE Reranker Base | Search result reranking |
Safety
| Model ID | Name | Notes |
|---|---|---|
| @fold/meta/llama-guard-3-8b | Llama Guard 3 | Content safety classification |
Model lifecycle
Each model has a lifecycle status:
- Active — fully supported and recommended for use.
- Deprecated — still works, but you should migrate to the recommended successor.
- Retiring — will stop working on the listed retirement date; new deploys are blocked.
- Retired — no longer available; all deploys are blocked.
When a model is deprecated or retiring, deploy warnings will indicate the successor model and any deadline. All models listed above are currently active.
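The deploy rules above can be sketched as a small predicate. This helper is purely illustrative of the lifecycle policy described on this page; it is not part of the fold.run API.

```typescript
type ModelStatus = "active" | "deprecated" | "retiring" | "retired";

// Encodes the lifecycle rules above:
// retiring models block *new* deploys, retired models block all deploys.
function deployAllowed(status: ModelStatus, isNewDeploy: boolean): boolean {
  switch (status) {
    case "active":
    case "deprecated":
      return true; // deprecated models still work, with a migration warning
    case "retiring":
      return !isNewDeploy; // existing deploys keep working until retirement
    case "retired":
      return false;
  }
}
```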
Code generation
Generate a complete function from a natural language description:
```bash
curl -X POST https://api.fold.run/ai/generate \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "An API that returns the current time in multiple timezones"
  }'
```

Response:

```json
{
  "code": "import { defineHandler, type FoldContext } from '@fold-run/runtime';\n\nexport default defineHandler(async (fold: FoldContext) => {\n  // ...\n});",
  "description": "Returns current time for UTC, EST, PST, and CET timezones"
}
```

The generated code always uses @fold-run/runtime's defineHandler pattern with no external dependencies.
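The same request can be issued from code. The sketch below only builds the request options (URL, headers, and body mirror the curl example above); the helper itself is an illustration, not part of an official SDK.

```typescript
// Builds fetch-style options for POST /ai/generate.
// Illustrative helper: pass the result to fetch() or any HTTP client.
function buildGenerateRequest(token: string, description: string) {
  return {
    url: "https://api.fold.run/ai/generate",
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ description }),
  };
}
```

Usage: `const req = buildGenerateRequest(token, "…"); await fetch(req.url, req);`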
Tool generation
Generate a function with an input schema, ready for MCP or A2A:
```bash
curl -X POST https://api.fold.run/ai/generate-tool \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Look up the weather for a given city",
    "input_schema": {
      "type": "object",
      "properties": {
        "city": { "type": "string" }
      }
    }
  }'
```

The response includes code, description, and the finalized input_schema.
Via the CLI
```bash
# Generate and deploy in one step
fold create-tool weather-lookup "Look up weather for a city"
```

Error diagnosis
When a function is throwing errors, AI can analyze the source code and recent error logs to suggest a fix:
```bash
curl -X POST https://api.fold.run/functions/fn_abc123/diagnose \
  -H "Authorization: Bearer YOUR_TOKEN"
```

Response:

```json
{
  "function_id": "fn_abc123",
  "error_count": 10,
  "diagnosis": "The function is failing because `env.API_KEY` is undefined. You need to add an API_KEY secret for this organization before deploying. Run: curl -X POST https://api.fold.run/secrets -H 'Authorization: Bearer <token>' -d '{\"name\": \"API_KEY\", \"value\": \"...\"}'"
}
```

The diagnosis examines the last 10 error activations and the function's source code to provide actionable suggestions.
Via MCP
AI agents can diagnose errors using the MCP diagnose_function tool:
"Diagnose the errors on my weather-lookup function"

See MCP integration for details.
Plan requirement
AI features require the Pro plan. On the Free plan, AI endpoints return a 403 with error code PRO_PLAN_REQUIRED.
The Pro plan includes 5,000,000 AI tokens per month. Usage beyond this cap is metered at $10.00 per 1,000,000 tokens (see Billing for details). The Free plan has no AI token allowance.
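As a worked example of the metering above (rates taken from this page; the helper itself is illustrative, not a billing API):

```typescript
const INCLUDED_TOKENS = 5_000_000; // Pro plan monthly AI token allowance
const OVERAGE_USD_PER_MILLION = 10; // $10.00 per 1,000,000 tokens beyond the cap

// Monthly AI overage cost in USD for a given token usage on the Pro plan.
function aiOverageUsd(tokensUsed: number): number {
  const overage = Math.max(0, tokensUsed - INCLUDED_TOKENS);
  return (overage / 1_000_000) * OVERAGE_USD_PER_MILLION;
}

// e.g. 7,500,000 tokens used → 2,500,000 over the cap → $25.00
```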
Upgrade from the billing page or see Billing.
Reliability
AI features are fire-and-forget — they never block deploys or activations. If the AI model is temporarily unavailable, the platform continues operating normally without AI-generated summaries or diagnoses.