Providers are the bridge between Goose and AI models. They abstract different LLM APIs behind a common interface, letting Goose work with more than 25 AI services, from Anthropic and OpenAI to locally hosted models.
What is a Provider?
A provider in Goose is a component that:
Connects to an AI model service (cloud or local)
Translates Goose’s conversation format to the model’s API format
Handles authentication and API keys
Streams responses back to the agent
Manages tool calling protocols
Tracks token usage and costs
// Core provider trait (simplified)
#[async_trait]
pub trait Provider: Send + Sync {
    /// Send a completion request to the model
    async fn complete(
        &self,
        system: String,
        messages: Vec<Message>,
        tools: Vec<Tool>,
    ) -> Result<BoxStream<ProviderMessage>>;

    /// List available models
    fn list_models(&self) -> Vec<ModelInfo>;

    /// Generate a session name from conversation
    async fn generate_session_name(
        &self,
        messages: &[Message],
    ) -> Result<String>;
}
Built-in Providers
Goose includes native support for many popular AI providers:
Cloud Providers
Anthropic: Claude models (Sonnet, Opus, Haiku); native tool calling, prompt caching, extended context windows
OpenAI: GPT models (GPT-4o, GPT-4, GPT-3.5); function calling, vision support, structured outputs
Google Gemini: Gemini models via Vertex AI; multi-modal support, large context windows, OAuth authentication
AWS Bedrock: Multiple model families (Anthropic Claude, Meta Llama); authenticates with AWS credentials
Local & Open Source
Ollama: Local model execution; privacy-first, no API keys required; runs Qwen, Llama, Mistral, and more
LiteLLM: Unified gateway to 100+ models; consistent API across providers, load balancing, fallback handling
OpenAI Compatible: Custom OpenAI-compatible servers (vLLM, LocalAI, etc.); self-hosted models and custom endpoints
Local Inference: Direct local model execution; llama.cpp integration, GGUF model support, CPU/GPU acceleration
Enterprise Providers
Azure OpenAI: Enterprise OpenAI deployment
Databricks: Databricks model serving
Snowflake Cortex: Snowflake’s AI models
GitHub Copilot: GitHub’s code models
OpenRouter: Multi-provider routing
Venice.ai: Privacy-focused inference
Provider Architecture
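At a high level, the agent holds a boxed Provider trait object and never touches backend-specific APIs directly. The following minimal sketch shows that relationship; the Agent type and reply function are illustrative, not Goose's actual structs, while Provider, Message, Tool, and ProviderMessage are the types from the trait above.

// Sketch only: the common trait decouples the agent from any particular backend.
struct Agent {
    provider: Box<dyn Provider>, // Anthropic, OpenAI, Ollama, ... behind one interface
}

impl Agent {
    async fn reply(&self, system: String, messages: Vec<Message>, tools: Vec<Tool>) -> Result<()> {
        // The agent only sees the trait, never provider-specific request formats
        let mut stream = self.provider.complete(system, messages, tools).await?;
        while let Some(_msg) = stream.next().await {
            // Handle ProviderMessage::Text / ToolUse / Usage / Done ...
        }
        Ok(())
    }
}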
Configuration
Via Environment Variables
The simplest way to configure a provider:
# Anthropic
export GOOSE_PROVIDER=anthropic
export GOOSE_MODEL=claude-sonnet-4-20250514
export ANTHROPIC_API_KEY=sk-ant-...

# OpenAI
export GOOSE_PROVIDER=openai
export GOOSE_MODEL=gpt-4o
export OPENAI_API_KEY=sk-...

# Ollama (local)
export GOOSE_PROVIDER=ollama
export GOOSE_MODEL=qwen3-coder:latest
export OLLAMA_HOST=http://localhost:11434
Via Configuration File
# ~/.config/goose/config.yaml
GOOSE_PROVIDER: anthropic
GOOSE_MODEL: claude-sonnet-4-20250514
API keys are stored separately, in the system keyring or a secrets file:
# ~/.config/goose/secrets.yaml (if GOOSE_DISABLE_KEYRING=1)
ANTHROPIC_API_KEY: sk-ant-...
Via Recipe
# recipe.yaml
title: GPT-4o Code Review
settings:
  goose_provider: openai
  goose_model: gpt-4o
  temperature: 0.2
Provider Implementation
Example: Anthropic Provider
// From crates/goose/src/providers/anthropic.rs
pub struct AnthropicProvider {
    api_key: String,
    base_url: String,
    http_client: reqwest::Client,
}

#[async_trait]
impl Provider for AnthropicProvider {
    async fn complete(
        &self,
        system: String,
        messages: Vec<Message>,
        tools: Vec<Tool>,
    ) -> Result<BoxStream<ProviderMessage>> {
        // Convert to Anthropic API format
        let request = self.build_request(system, messages, tools)?;

        // Make streaming API call
        let response = self.http_client
            .post(format!("{}/v1/messages", self.base_url))
            .header("x-api-key", &self.api_key)
            .header("anthropic-version", "2023-06-01")
            .json(&request)
            .send()
            .await?;

        // Stream and parse chunks
        let stream = response
            .bytes_stream()
            .map(|chunk| self.parse_chunk(chunk))
            .boxed();

        Ok(stream)
    }

    fn list_models(&self) -> Vec<ModelInfo> {
        vec![
            ModelInfo::with_cost(
                "claude-sonnet-4-20250514",
                200_000,   // context window
                0.000003,  // input cost per token
                0.000015,  // output cost per token
            ),
            // ... other models
        ]
    }
}
Different providers have different tool calling formats. Goose translates between them:
// Anthropic format
{
  "tools": [{
    "name": "read_file",
    "description": "Read a file",
    "input_schema": {
      "type": "object",
      "properties": {
        "path": { "type": "string" }
      }
    }
  }]
}

// OpenAI format
{
  "tools": [{
    "type": "function",
    "function": {
      "name": "read_file",
      "description": "Read a file",
      "parameters": {
        "type": "object",
        "properties": {
          "path": { "type": "string" }
        }
      }
    }
  }]
}
Goose providers handle these translations automatically.
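As a rough sketch of that translation, a provider's request builder only needs to re-wrap the same name, description, and JSON Schema for each backend. The Tool field names below are assumptions for illustration, not Goose's exact struct.

use serde_json::{json, Value};

// Illustrative Tool shape; Goose's real type may differ.
struct Tool {
    name: String,
    description: String,
    parameters: Value, // JSON Schema for the tool's input
}

// Anthropic expects the schema under "input_schema"...
fn to_anthropic(tool: &Tool) -> Value {
    json!({
        "name": tool.name,
        "description": tool.description,
        "input_schema": tool.parameters,
    })
}

// ...while OpenAI nests everything under a "function" envelope.
fn to_openai(tool: &Tool) -> Value {
    json!({
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": tool.parameters,
        }
    })
}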
Model Capabilities
Providers expose model capabilities through ModelInfo:
pub struct ModelInfo {
    pub name: String,
    pub context_limit: usize,                  // Max tokens
    pub input_token_cost: Option<f64>,         // Cost per input token
    pub output_token_cost: Option<f64>,        // Cost per output token
    pub supports_cache_control: Option<bool>,  // Prompt caching
}
Goose uses this information to:
Manage context windows
Estimate costs
Enable/disable features (like prompt caching)
Choose appropriate models for subagents
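For instance, choosing a model for a subagent can be driven entirely by ModelInfo. The selection policy in this minimal sketch is illustrative, not Goose's actual logic.

// Pick the cheapest model whose context window fits the request.
// Illustrative policy only; assumes the ModelInfo fields shown above.
fn pick_model(provider: &dyn Provider, needed_tokens: usize) -> Option<ModelInfo> {
    provider
        .list_models()
        .into_iter()
        .filter(|m| m.context_limit >= needed_tokens)
        .min_by(|a, b| {
            let cost = |m: &ModelInfo| m.input_token_cost.unwrap_or(f64::MAX);
            cost(a)
                .partial_cmp(&cost(b))
                .unwrap_or(std::cmp::Ordering::Equal)
        })
}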
Custom Providers
You can add custom providers without modifying Goose’s code:
1. Declarative Provider (JSON)
For OpenAI-compatible APIs:
// ~/.config/goose/custom_providers/my-provider.json
{
  "name": "my_provider",
  "engine": "openai",
  "display_name": "My Custom LLM",
  "description": "Internal LLM endpoint",
  "api_key_env": "MY_PROVIDER_API_KEY",
  "base_url": "https://llm.company.internal/v1",
  "models": [
    {
      "name": "company-llm-v1",
      "context_limit": 32768,
      "input_token_cost": 0.000001,
      "output_token_cost": 0.000002
    }
  ],
  "supports_streaming": true,
  "requires_auth": true
}
Supported engines:
openai: OpenAI-compatible API
anthropic: Anthropic-compatible API
ollama: Ollama-compatible API
2. Code-Based Provider (Rust)
For custom protocols:
// 1. Create a new file: crates/goose/src/providers/my_provider.rs
use super::base::{Provider, ModelInfo, ProviderMessage};
use async_trait::async_trait;
use anyhow::Result;

pub struct MyProvider {
    api_key: String,
    endpoint: String,
}

impl MyProvider {
    pub fn new(api_key: String, endpoint: String) -> Self {
        Self { api_key, endpoint }
    }
}

#[async_trait]
impl Provider for MyProvider {
    async fn complete(
        &self,
        system: String,
        messages: Vec<Message>,
        tools: Vec<Tool>,
    ) -> Result<BoxStream<ProviderMessage>> {
        // Your implementation here
        todo!()
    }

    fn list_models(&self) -> Vec<ModelInfo> {
        vec![ModelInfo::new("my-model-v1", 8192)]
    }
}
// 2. Register in crates/goose/src/providers/init.rs
pub fn create_provider(config: &Config) -> Result<Box<dyn Provider>> {
    match config.provider.as_str() {
        "anthropic" => Ok(Box::new(AnthropicProvider::new(config)?)),
        "openai" => Ok(Box::new(OpenAIProvider::new(config)?)),
        "my_provider" => Ok(Box::new(MyProvider::new(
            config.get_secret("MY_PROVIDER_API_KEY")?,
            config.get("MY_PROVIDER_ENDPOINT")?,
        ))),
        // ...
    }
}
Provider Selection
Goose determines which provider to use via configuration precedence:
1. Subagent settings (highest priority)
   subagent(settings: {provider: "openai", model: "gpt-4o-mini"})
2. Recipe settings
   settings:
     goose_provider: anthropic
     goose_model: claude-sonnet-4-20250514
3. Environment variables
   GOOSE_PROVIDER=ollama
   GOOSE_MODEL=qwen3-coder:latest
4. Config file
   GOOSE_PROVIDER: anthropic
5. Default (Anthropic Claude)
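Conceptually, resolution is a first-match-wins chain over those sources. A small sketch of that order (function and argument names are illustrative, not Goose's actual config API):

// The first source that specifies a provider wins; otherwise fall back to the default.
fn resolve_provider(
    subagent: Option<String>,
    recipe: Option<String>,
    env_var: Option<String>,
    config_file: Option<String>,
) -> String {
    subagent
        .or(recipe)
        .or(env_var)
        .or(config_file)
        .unwrap_or_else(|| "anthropic".to_string()) // default
}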
Streaming
All providers support streaming responses:
pub enum ProviderMessage {
    Text(String),          // Text chunk
    ToolUse(ToolRequest),  // Tool call request
    Thinking(String),      // Model reasoning (if supported)
    Usage(TokenUsage),     // Token counts
    Done,                  // Stream complete
}

// The agent consumes the stream:
let mut stream = provider.complete(system, messages, tools).await?;
while let Some(msg) = stream.next().await {
    match msg {
        ProviderMessage::Text(text) => {
            // Stream to user immediately
            send_to_user(text).await?;
        }
        ProviderMessage::ToolUse(tool) => {
            // Execute tool
            execute_tool(tool).await?;
        }
        ProviderMessage::Done => break,
        _ => {} // Thinking, Usage, etc.
    }
}
Token Usage Tracking
Providers report token usage for cost estimation:
pub struct TokenUsage {
    pub input_tokens: usize,
    pub output_tokens: usize,
    pub cache_read_tokens: Option<usize>,   // For prompt caching
    pub cache_write_tokens: Option<usize>,
}

// Stored in the session
session.accumulated_input_tokens += usage.input_tokens;
session.accumulated_output_tokens += usage.output_tokens;

// Calculate cost
let cost = (usage.input_tokens as f64 * model_info.input_token_cost.unwrap_or(0.0))
    + (usage.output_tokens as f64 * model_info.output_token_cost.unwrap_or(0.0));
Error Handling
Providers return standardized errors:
pub enum ProviderError {
    AuthenticationError(String),  // Invalid API key
    RateLimitError(String),       // Rate limit hit
    InvalidRequestError(String),  // Bad request
    ModelNotFoundError(String),   // Unknown model
    NetworkError(String),         // Connection issues
    StreamError(String),          // Streaming failure
    // ...
}
The agent’s retry manager handles transient errors automatically:
loop {
    match provider.complete(...).await {
        Ok(stream) => return Ok(stream),
        Err(ProviderError::RateLimitError(_)) => {
            // Wait and retry
            sleep(backoff.next_delay()).await;
        }
        Err(ProviderError::NetworkError(_)) => {
            // Retry with backoff
            sleep(backoff.next_delay()).await;
        }
        Err(e) => return Err(e), // Don't retry auth errors, etc.
    }
}
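The backoff object used above can be as simple as an exponential delay capped at a maximum, mirroring the retry settings shown under Best Practices. This is a minimal sketch, not Goose's actual retry manager.

use std::time::Duration;

// Exponential backoff: each retry waits `multiplier` times longer, up to `max`.
struct Backoff {
    delay: Duration,
    max: Duration,
    multiplier: f64,
}

impl Backoff {
    fn next_delay(&mut self) -> Duration {
        let current = self.delay;
        self.delay = self.delay.mul_f64(self.multiplier).min(self.max);
        current
    }
}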
Multi-Provider Workflows
You can use different providers for different tasks:
title: Multi-Provider Analysis
instructions: |
  Use different models for different tasks:
  - GPT-4o-mini for simple file operations
  - Claude Sonnet for complex analysis
  - Local Ollama for privacy-sensitive data
settings:
  goose_provider: anthropic
  goose_model: claude-sonnet-4-20250514
prompt: |
  # Use a cheap model for file listing
  subagent(
    instructions: "List all Python files",
    settings: {provider: "openai", model: "gpt-4o-mini"}
  )

  # Use a powerful model for analysis
  subagent(
    instructions: "Analyze the architecture for security issues",
    settings: {provider: "anthropic", model: "claude-sonnet-4-20250514"}
  )

  # Use a local model for sensitive data
  subagent(
    instructions: "Process customer data locally",
    settings: {provider: "ollama", model: "qwen3-coder:latest"}
  )
Provider Comparison
| Provider | Tool Calling | Streaming | Vision | Local | Cost |
|---|---|---|---|---|---|
| Anthropic | Native | Yes | Yes (Claude 3.5+) | No | $$$ |
| OpenAI | Function calling | Yes | Yes (GPT-4V) | No | $$$ |
| Ollama | Via toolshim | Yes | Some models | Yes | Free |
| Google Gemini | Native | Yes | Yes | No | $$ |
| AWS Bedrock | Model-dependent | Yes | Model-dependent | No | $$ |
| LiteLLM | Pass-through | Yes | Model-dependent | No | Varies |
| Local Inference | Via toolshim | Yes | No | Yes | Free |
Native: Provider API has built-in tool calling support
  Anthropic, OpenAI, Google Gemini
  Best accuracy and performance
Toolshim: Goose adds tool calling via system prompts (see the sketch below)
  Ollama, local models
  Works but less reliable
  Good for experimentation
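The general toolshim idea is to describe the available tools in the system prompt and parse structured tool calls back out of plain text. The prompt wording and parsing below are an illustrative sketch, not Goose's actual toolshim implementation.

use serde_json::Value;

// Ask the model to answer with a JSON object when it wants to call a tool.
fn toolshim_system_prompt(tools: &[(String, String)]) -> String {
    let mut prompt = String::from(
        "To call a tool, reply with only a JSON object of the form \
         {\"tool\": \"<name>\", \"arguments\": { ... }}.\n\nAvailable tools:\n",
    );
    for (name, description) in tools {
        prompt.push_str(&format!("- {name}: {description}\n"));
    }
    prompt
}

// Try to parse a tool call back out of the model's plain-text reply.
fn parse_tool_call(reply: &str) -> Option<(String, Value)> {
    let v: Value = serde_json::from_str(reply.trim()).ok()?;
    Some((
        v.get("tool")?.as_str()?.to_string(),
        v.get("arguments")?.clone(),
    ))
}

Because the model is only prompted, not constrained, to emit this shape, parsing can fail, which is why toolshim-based providers are less reliable than native tool calling.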
Best Practices
Choose models appropriate to the task
Simple tasks: Use cheaper/faster models (GPT-4o-mini, Claude Haiku)
Complex reasoning: Use powerful models (Claude Sonnet, GPT-4o)
Code generation: Use code-specialized models (Qwen Coder, Claude)
Privacy-sensitive: Use local models (Ollama)
Set appropriate context limits
// Check the model's context window before sending
let token_count = count_tokens(&messages);
let model_info = provider
    .list_models()
    .into_iter()
    .find(|m| m.name == model_name)?;

if token_count as f64 > model_info.context_limit as f64 * 0.75 {
    // Compact messages
    messages = compact_messages(messages);
}
Handle rate limits gracefully
# Configure retry behavior
retry:
  max_retries: 3
  initial_delay_ms: 1000
  max_delay_ms: 30000
  backoff_multiplier: 2.0
Use prompt caching when available
Anthropic and some other providers support caching system prompts:
// Automatically enabled for supported models;
// significantly reduces cost for repeated requests.
if model_info.supports_cache_control.unwrap_or(false) {
    // The system prompt is cached, so you only pay full price
    // for new user messages.
}
Troubleshooting
Common Issues
“Authentication failed”
# Check API key is set
echo $ANTHROPIC_API_KEY
# Verify in config
goose configure get ANTHROPIC_API_KEY
“Model not found”
# List available models
goose configure list-models
# Check provider documentation for exact model names
“Rate limit exceeded”
Wait and retry (automatic)
Upgrade API tier
Use multiple API keys with load balancing (via LiteLLM)
“Context length exceeded”
# Reduce max_turns or enable aggressive compaction
settings:
  max_turns: 20  # Limit conversation length
Next Steps
Extensions: Learn about the tools providers can use
Recipes: Configure providers in recipes
Configuration: Advanced provider configuration
Custom Distributions: Bundle custom providers