Create chat completion

POST https://api.poe.com/v1/chat/completions

Overview

Creates a chat completion response for the given conversation.

Features:

  • Streaming support
  • Tool calling (function calling)
  • Multi-modal inputs (text, images)
  • OpenAI-compatible format

Important notes:

  • Private bots are not currently supported
  • Image/video/audio bots should use stream: false for best results
  • Custom parameters require the Poe Python SDK

Authentication

Send your Poe API key in the Authorization header:

Authorization: Bearer sk_test_51SAMPLEKEY

All requests must be made over HTTPS.
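
A minimal sketch of an authenticated request using Node 18+ built-in fetch. The POE_API_KEY environment variable and the bot name "Example-Bot" are placeholders, not values defined by this API:

const response = await fetch('https://api.poe.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.POE_API_KEY}`, // placeholder env var for your Poe API key
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'Example-Bot', // placeholder; substitute any Poe bot name
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
console.log((await response.json()).choices[0].message.content);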

Parameters

This endpoint does not accept query or path parameters.

Request body

  • model (string, required): ID of the model to use. Use Poe bot names. Note: Poe UI-specific system prompts are skipped.
  • messages (object[], required): A list of messages comprising the conversation so far.
  • messages[].role ("system" | "user" | "assistant" | "tool", required): The role of the message author. Allowed values: system, user, assistant, tool.
  • messages[].content (string | object[], optional): The contents of the message.
  • messages[].name (string, optional): The name of the author of this message.
  • messages[].tool_calls (object[], optional): Tool calls generated by the model.
  • messages[].tool_call_id (string, optional): Tool call that this message is responding to.
  • max_tokens (integer | null, optional): Maximum number of tokens to generate.
  • max_completion_tokens (integer | null, optional): Maximum number of completion tokens to generate.
  • temperature (number | null, optional): Sampling temperature between 0 and 2. Min: 0 · Max: 2.
  • top_p (number | null, optional): Nucleus sampling parameter. Min: 0 · Max: 1.
  • stream (boolean, optional): Whether to stream back partial progress. Default: false.
  • stream_options (object | null, optional): Options for streaming.
  • stop (string | string[], optional): Up to 4 sequences where the API will stop generating.
  • tools (array | null, optional): List of tools the model may call.
  • tool_choice (string | object, optional): Controls which (if any) function is called by the model.
  • parallel_tool_calls (boolean | null, optional): Whether to enable parallel function calling.
  • n (integer, optional): Number of chat completion choices to generate (must be 1). Default: 1 · Allowed values: 1.
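
For illustration, an example request body exercising the common fields above. The bot name is a placeholder; substitute any Poe bot name:

{
  "model": "Example-Bot",
  "messages": [
    { "role": "system", "content": "You are a concise assistant." },
    { "role": "user", "content": "Summarize the plot of Hamlet in two sentences." }
  ],
  "temperature": 0.7,
  "max_tokens": 256,
  "stream": false
}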

Responses

  • id (string, optional): Unique identifier for the chat completion.
  • object ("chat.completion", optional): Allowed values: chat.completion.
  • created (integer, optional): Unix timestamp.
  • model (string, optional): The model used.
  • choices (object[], optional)
  • choices[].index (integer, optional): The index of this choice.
  • choices[].message (object, optional)
  • choices[].message.role ("system" | "user" | "assistant" | "tool", required): The role of the message author. Allowed values: system, user, assistant, tool.
  • choices[].message.content (string | object[], optional): The contents of the message.
  • choices[].message.name (string, optional): The name of the author of this message.
  • choices[].message.tool_calls (object[], optional): Tool calls generated by the model.
  • choices[].message.tool_call_id (string, optional): Tool call that this message is responding to.
  • choices[].finish_reason ("stop" | "length" | "tool_calls" | "content_filter", optional): Reason the model stopped generating. Allowed values: stop, length, tool_calls, content_filter.
  • usage (object, optional)
  • usage.prompt_tokens (integer, optional): Number of tokens in the prompt.
  • usage.completion_tokens (integer, optional): Number of tokens in the completion.
  • usage.total_tokens (integer, optional): Total number of tokens used.
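
An illustrative response built from the fields above; the identifier, timestamp, content, and token counts are placeholder values, not real output:

{
  "id": "example-completion-id",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "Example-Bot",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Prince Hamlet feigns madness while seeking revenge..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 58,
    "total_tokens": 100
  }
}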

❌ Error codes

  • 400 invalid_request_error: Bad request. Malformed JSON or missing required fields.
  • 401 authentication_error: Authentication failed. Invalid API key.
  • 402 insufficient_credits: Insufficient credits. Point balance is zero or negative.
  • 429 rate_limit_error: Rate limit exceeded (500 requests per minute).
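
One way to translate the documented status codes into client-side actions. This is a sketch that branches only on the HTTP status; the exact error body shape is not specified here:

// Map documented HTTP status codes to a suggested client action.
function classifyError(status) {
  switch (status) {
    case 400: return 'fix-request';   // invalid_request_error: correct the payload
    case 401: return 'check-api-key'; // authentication_error: verify the API key
    case 402: return 'add-credits';   // insufficient_credits: top up the point balance
    case 429: return 'retry-later';   // rate_limit_error: back off and retry
    default:  return 'retry-or-report';
  }
}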

Best Practices

Streaming vs Non-Streaming

Use streaming (stream: true) for a better user experience in chat interfaces. Users see responses as they are generated rather than waiting for completion; a consumption sketch follows at the end of this subsection.

For most text-based models, streaming provides a better user experience:

  • Users see responses immediately as they are generated
  • Lower perceived latency
  • Better for long-form content

For image/video/audio generation models, use non-streaming mode:

  • These models typically return complete outputs
  • Streaming may not work as expected
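
A minimal sketch of consuming a streamed completion in Node 18+, assuming the OpenAI-compatible SSE format ("data: {...}" lines terminated by "data: [DONE]") with incremental text at choices[0].delta.content. Verify the exact chunk shape against your own responses; the POE_API_KEY environment variable is a placeholder:

async function streamChat(payload) {
  const response = await fetch('https://api.poe.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.POE_API_KEY}`, // placeholder env var
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ ...payload, stream: true }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep any partial line for the next read
    for (const line of lines) {
      const data = line.replace(/^data: /, '').trim();
      if (!data || data === '[DONE]') continue;
      const parsed = JSON.parse(data);
      // Assumption: incremental text arrives at choices[0].delta.content,
      // as in the OpenAI streaming format.
      const delta = parsed.choices?.[0]?.delta?.content;
      if (delta) process.stdout.write(delta);
    }
  }
}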

Error Handling

Always implement retry logic with exponential backoff for production applications. Rate limits (429) and temporary failures (503) should be retried.

Implement proper error handling for all API calls:

async function chatWithRetry(payload, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch('https://api.poe.com/v1/chat/completions', {
        method: 'POST',
        headers: {
          // Assumes the API key is stored in the POE_API_KEY environment variable
          'Authorization': `Bearer ${process.env.POE_API_KEY}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(payload),
      });

      if (!response.ok) {
        // Retry on rate limit or transient server errors
        if ([429, 500, 502, 503].includes(response.status) && attempt < maxRetries) {
          // Exponential backoff: 2s, 4s, 8s, capped at 10s
          const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
          await new Promise(resolve => setTimeout(resolve, delay));
          continue;
        }

        const error = await response.json();
        throw new Error(`API error: ${error.message}`);
      }

      return await response.json();
    } catch (err) {
      // Rethrow once all retries are exhausted; otherwise fall through and retry
      if (attempt === maxRetries) throw err;
    }
  }
}
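
Example usage of the helper above (the bot name is a placeholder):

const result = await chatWithRetry({
  model: 'Example-Bot',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(result.choices[0].message.content);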

Rate Limiting

The API has a rate limit of 500 requests per minute. Monitor the X-RateLimit-Remaining response header to track your usage.

Monitor rate limits in production:

  • Check the X-RateLimit-Remaining header (a sketch follows this list)
  • Implement request queuing when approaching limits
  • Consider caching responses when appropriate
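
A sketch of reading the documented X-RateLimit-Remaining header after a call. requestOptions stands in for your usual request options, and the threshold and pause length are assumptions to tune for your workload:

const response = await fetch('https://api.poe.com/v1/chat/completions', requestOptions);
const remaining = Number(response.headers.get('x-ratelimit-remaining'));
if (Number.isFinite(remaining) && remaining < 10) {
  // Assumption: a brief pause when the remaining budget is low
  await new Promise(resolve => setTimeout(resolve, 2000));
}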

🔁 Callbacks & webhooks

No callbacks or webhooks are associated with this endpoint.