AI models and features
fold.run includes a curated catalog of AI models and AI-powered capabilities for code generation and error diagnosis. AI features are available on the Pro plan.
Supported models
Every model uses a @fold/-prefixed ID. Pass the model ID to fold.ai.run() in your function code.
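For example, a handler might pass a model ID from the catalog below to fold.ai.run(). This is a minimal sketch: the exact FoldContext and fold.ai.run() signatures here are assumptions for illustration, not taken from the SDK reference.

```typescript
// Illustrative types: the real @fold-run/runtime shapes may differ.
type AiRunResult = { response: string };

interface FoldContext {
  ai: {
    run: (modelId: string, input: { prompt: string }) => Promise<AiRunResult>;
  };
}

// A handler step that asks a text model to summarize its input.
async function summarize(fold: FoldContext, text: string): Promise<string> {
  const result = await fold.ai.run("@fold/meta/llama-3.1-8b-instruct", {
    prompt: `Summarize in one sentence: ${text}`,
  });
  return result.response;
}
```

In a real function, `fold` would be the context object passed to your defineHandler callback.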
Text generation
| Model ID | Name | Notes |
|---|---|---|
| @fold/meta/llama-3.3-70b-instruct-fp8-fast | Llama 3.3 70B | Flagship large model, best overall quality |
| @fold/meta/llama-4-scout-17b-16e-instruct | Llama 4 Scout 17B | Latest-generation model with mixture of experts |
| @fold/meta/llama-3.1-8b-instruct | Llama 3.1 8B | Efficient general-purpose model |
| @fold/meta/llama-3.2-3b-instruct | Llama 3.2 3B | Lightweight, fast responses |
| @fold/qwen/qwen2.5-coder-32b-instruct | Qwen 2.5 Coder 32B | Optimized for code generation |
| @fold/qwen/qwq-32b | QwQ 32B | Reasoning-focused model |
| @fold/mistralai/mistral-small-3.1-24b-instruct | Mistral Small 3.1 24B | Strong multilingual support |
| @fold/google/gemma-3-12b-it | Gemma 3 12B | Balanced size and performance |
| @fold/deepseek-ai/deepseek-r1-distill-qwen-32b | DeepSeek R1 32B | Advanced reasoning capabilities |
| @fold/openai/gpt-oss-120b | GPT OSS 120B | Largest available open model |
Embeddings
| Model ID | Name | Notes |
|---|---|---|
| @fold/baai/bge-base-en-v1.5 | BGE Base EN | Standard English embeddings |
| @fold/baai/bge-large-en-v1.5 | BGE Large EN | Higher-quality English embeddings |
| @fold/baai/bge-m3 | BGE M3 | Multilingual embeddings |
Image generation
| Model ID | Name | Notes |
|---|---|---|
| @fold/black-forest-labs/flux-1-schnell | FLUX.1 Schnell | Fast, high-quality generation |
| @fold/stabilityai/stable-diffusion-xl-base-1.0 | Stable Diffusion XL | Versatile image generation |
| @fold/leonardo/phoenix-1.0 | Leonardo Phoenix | Creative image generation |
Vision
| Model ID | Name | Notes |
|---|---|---|
| @fold/meta/llama-3.2-11b-vision-instruct | Llama 3.2 11B Vision | Image understanding and analysis |
Speech to text
| Model ID | Name | Notes |
|---|---|---|
| @fold/openai/whisper-large-v3-turbo | Whisper Large V3 Turbo | Best accuracy transcription |
| @fold/deepgram/nova-3 | Deepgram Nova 3 | Real-time transcription |
Text to speech
| Model ID | Name | Notes |
|---|---|---|
| @fold/deepgram/aura-2-en | Deepgram Aura 2 EN | High-quality English speech |
| @fold/myshell-ai/melotts | MeloTTS | Lightweight text-to-speech |
Classification
| Model ID | Name | Notes |
|---|---|---|
| @fold/huggingface/distilbert-sst-2-int8 | DistilBERT Sentiment | Sentiment analysis |
| @fold/microsoft/resnet-50 | ResNet-50 | Image classification |
Summarization
| Model ID | Name | Notes |
|---|---|---|
| @fold/facebook/bart-large-cnn | BART Large CNN | Document summarization |
Translation
| Model ID | Name | Notes |
|---|---|---|
| @fold/meta/m2m100-1.2b | M2M100 1.2B | Multilingual translation |
Reranking
| Model ID | Name | Notes |
|---|---|---|
| @fold/baai/bge-reranker-base | BGE Reranker Base | Search result reranking |
Safety
| Model ID | Name | Notes |
|---|---|---|
| @fold/meta/llama-guard-3-8b | Llama Guard 3 | Content safety classification |
Model lifecycle
Each model has a lifecycle status:
- Active — fully supported and recommended for use.
- Deprecated — still works, but you should migrate to the recommended successor.
- Retiring — will stop working on the listed retirement date; new deploys are blocked.
- Retired — no longer available; all deploys are blocked.
When a model is deprecated or retiring, deploy warnings will indicate the successor model and any deadline. All models listed above are currently active.
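The deploy rules above can be sketched as a small predicate. This helper is purely illustrative of the lifecycle policy described on this page; it is not part of the fold.run API.

```typescript
type ModelStatus = "active" | "deprecated" | "retiring" | "retired";

// Encodes the lifecycle rules above:
// retiring models block *new* deploys, retired models block all deploys.
function deployAllowed(status: ModelStatus, isNewDeploy: boolean): boolean {
  switch (status) {
    case "active":
    case "deprecated":
      return true; // deprecated models still work, with a migration warning
    case "retiring":
      return !isNewDeploy; // existing deploys keep working until retirement
    case "retired":
      return false;
  }
}
```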
Code generation
Generate a complete function from a natural language description:
```bash
curl -X POST https://api.fold.run/ai/generate \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "An API that returns the current time in multiple timezones"
  }'
```

Response:

```json
{
  "code": "import { defineHandler, type FoldContext } from '@fold-run/runtime';\n\nexport default defineHandler(async (fold: FoldContext) => {\n  // ...\n});",
  "description": "Returns current time for UTC, EST, PST, and CET timezones"
}
```

The generated code always uses @fold-run/runtime's defineHandler pattern with no external dependencies.
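The same request can be issued from code. The sketch below only builds the request options (URL, headers, and body mirror the curl example above); the helper itself is an illustration, not part of an official SDK.

```typescript
// Builds fetch-style options for POST /ai/generate.
// Illustrative helper: pass the result to fetch() or any HTTP client.
function buildGenerateRequest(token: string, description: string) {
  return {
    url: "https://api.fold.run/ai/generate",
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ description }),
  };
}
```

Usage: `const req = buildGenerateRequest(token, "…"); await fetch(req.url, req);`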
Tool generation
Generate a function with an input schema, ready for MCP or A2A:
```bash
curl -X POST https://api.fold.run/ai/generate-tool \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Look up the weather for a given city",
    "input_schema": {
      "type": "object",
      "properties": {
        "city": { "type": "string" }
      }
    }
  }'
```

The response includes code, description, and the finalized input_schema.
Via the CLI
```bash
# Generate and deploy in one step
fold create-tool weather-lookup "Look up weather for a city"
```

Error diagnosis
When a function is throwing errors, AI can analyze the source code and recent error logs to suggest a fix:
```bash
curl -X POST https://api.fold.run/functions/fn_abc123/diagnose \
  -H "Authorization: Bearer YOUR_TOKEN"
```

Response:

```json
{
  "function_id": "fn_abc123",
  "error_count": 10,
  "diagnosis": "The function is failing because `env.API_KEY` is undefined. You need to add an API_KEY secret for this organization before deploying. Run: curl -X POST https://api.fold.run/secrets -H 'Authorization: Bearer <token>' -d '{\"name\": \"API_KEY\", \"value\": \"...\"}'"
}
```

The diagnosis examines the last 10 error activations and the function's source code to provide actionable suggestions.
Via MCP
AI agents can diagnose errors using the MCP diagnose_function tool:
"Diagnose the errors on my weather-lookup function"

See MCP integration for details.
Plan requirement
AI features require the Pro plan. On the Free plan, AI endpoints return a 403 with error code PRO_PLAN_REQUIRED.
The Pro plan includes 5,000,000 AI tokens per month. Usage beyond this cap is metered at $10.00 per 1,000,000 tokens (see Billing for details). The Free plan has no AI token allowance.
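As a worked example of the metering above (rates taken from this page; the helper itself is illustrative, not a billing API):

```typescript
const INCLUDED_TOKENS = 5_000_000; // Pro plan monthly AI token allowance
const OVERAGE_USD_PER_MILLION = 10; // $10.00 per 1,000,000 tokens beyond the cap

// Monthly AI overage cost in USD for a given token usage on the Pro plan.
function aiOverageUsd(tokensUsed: number): number {
  const overage = Math.max(0, tokensUsed - INCLUDED_TOKENS);
  return (overage / 1_000_000) * OVERAGE_USD_PER_MILLION;
}

// e.g. 7,500,000 tokens used → 2,500,000 over the cap → $25.00
```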
Upgrade from the billing page or see Billing.
Reliability
AI features are fire-and-forget — they never block deploys or activations. If the AI model is temporarily unavailable, the platform continues operating normally without AI-generated summaries or diagnoses.