Create chat completion
POST https://api.poe.com/v1/chat/completions
Overview
Creates a chat completion response for the given conversation.
Features:
- Streaming support
- Tool calling (function calling)
- Multi-modal inputs (text, images)
- OpenAI-compatible format
Important notes:
- Private bots are not currently supported
- Image/video/audio bots should use `stream: false` for best results
- Custom parameters require the Poe Python SDK
Authentication
Send your Poe API key in the Authorization header:
`Authorization: Bearer sk_test_51SAMPLEKEY`
All requests must be made over HTTPS.
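For example, a minimal authenticated request from JavaScript (the bot name below is a placeholder; substitute one your key can access):

```javascript
// Minimal non-streaming call; replace the key and bot name with your own.
const response = await fetch("https://api.poe.com/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer sk_test_51SAMPLEKEY", // your Poe API key
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "SomeBotName", // placeholder bot name
    messages: [{ role: "user", content: "Hello!" }],
  }),
});
console.log((await response.json()).choices[0].message.content);
```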
Parameters
This endpoint does not accept query or path parameters.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Required | ID of the model to use. Use Poe bot names. Note: Poe UI-specific system prompts are skipped. |
| messages | object[] | Required | A list of messages comprising the conversation so far. |
| messages[].role | "system" \| "user" \| "assistant" \| "tool" | Required | The role of the message author. Allowed values: system, user, assistant, tool |
| messages[].content | string \| object[] | Optional | The contents of the message. |
| messages[].name | string | Optional | The name of the author of this message. |
| messages[].tool_calls | object[] | Optional | Tool calls generated by the model. |
| messages[].tool_call_id | string | Optional | Tool call that this message is responding to. |
| max_tokens | integer \| null | Optional | Maximum number of tokens to generate. |
| max_completion_tokens | integer \| null | Optional | Maximum number of completion tokens to generate. |
| temperature | number \| null | Optional | Sampling temperature between 0 and 2. Min: 0 · Max: 2 |
| top_p | number \| null | Optional | Nucleus sampling parameter. Min: 0 · Max: 1 |
| stream | boolean | Optional | Whether to stream back partial progress. Default: false |
| stream_options | object \| null | Optional | Options for streaming. |
| stop | string \| string[] | Optional | Up to 4 sequences where the API will stop generating. |
| tools | array \| null | Optional | List of tools the model may call. |
| tool_choice | string \| object | Optional | Controls which (if any) function is called by the model. |
| parallel_tool_calls | boolean \| null | Optional | Whether to enable parallel function calling. |
| n | integer | Optional | Number of chat completion choices to generate (must be 1). Default: 1 · Allowed values: 1 |
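For reference, a request body exercising several of the fields above, written as a JavaScript object (the bot name and tool schema are illustrative, not fixed by the API):

```javascript
// Illustrative payload; "model" must be a real Poe bot name.
const payload = {
  model: "SomeBotName",
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "What's the weather in Paris?" },
  ],
  max_tokens: 256,
  temperature: 0.7,
  stream: false,
  n: 1, // must be 1
  // Optional function-calling definition; this particular schema is made up.
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
  tool_choice: "auto",
};
```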
Responses
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Optional | Unique identifier for the chat completion |
| object | "chat.completion" | Optional | Allowed values: chat.completion |
| created | integer | Optional | Unix timestamp |
| model | string | Optional | The model used |
| choices | object[] | Optional | |
| choices[].index | integer | Optional | The index of this choice |
| choices[].message | object | Optional | |
| choices[].message.role | "system" \| "user" \| "assistant" \| "tool" | Required | The role of the message author. Allowed values: system, user, assistant, tool |
| choices[].message.content | string \| object[] | Optional | The contents of the message |
| choices[].message.name | string | Optional | The name of the author of this message |
| choices[].message.tool_calls | object[] | Optional | Tool calls generated by the model |
| choices[].message.tool_call_id | string | Optional | Tool call that this message is responding to |
| choices[].finish_reason | "stop" \| "length" \| "tool_calls" \| "content_filter" | Optional | Reason the model stopped generating. Allowed values: stop, length, tool_calls, content_filter |
| usage | object | Optional | |
| usage.prompt_tokens | integer | Optional | Number of tokens in the prompt |
| usage.completion_tokens | integer | Optional | Number of tokens in the completion |
| usage.total_tokens | integer | Optional | Total number of tokens used |
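A successful non-streaming response, once parsed, has roughly this shape (every value below is illustrative):

```javascript
// Illustrative parsed response; the values are made up.
const exampleResponse = {
  id: "chatcmpl-abc123",
  object: "chat.completion",
  created: 1700000000, // Unix timestamp
  model: "SomeBotName",
  choices: [
    {
      index: 0,
      message: { role: "assistant", content: "Hello! How can I help?" },
      finish_reason: "stop", // or "length", "tool_calls", "content_filter"
    },
  ],
  usage: { prompt_tokens: 9, completion_tokens: 8, total_tokens: 17 },
};
```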
❌ Error codes
| HTTP | Type | Description |
|---|---|---|
| 400 | invalid_request_error | Bad request: malformed JSON or missing required fields |
| 401 | authentication_error | Authentication failed: invalid API key |
| 402 | insufficient_credits | Insufficient credits: point balance is zero or negative |
| 429 | rate_limit_error | Rate limit exceeded (500 requests per minute) |
Best Practices
Streaming vs Non-Streaming
Use streaming (`stream: true`) in chat interfaces; a minimal example follows the list below. For most text-based models, streaming provides a better user experience:
- Users see responses immediately as they generate, rather than waiting for the full completion
- Lower perceived latency
- Better for long-form content
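A minimal streaming sketch for Node 18+, assuming the API emits OpenAI-style server-sent events (`data:` lines ending with `data: [DONE]`); the bot name is illustrative:

```javascript
// Node 18+ sketch: consume the response as server-sent events.
// Assumes OpenAI-style "data:" lines terminated by "data: [DONE]".
const res = await fetch("https://api.poe.com/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.POE_API_KEY}`, // your Poe API key
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "SomeBotName", // placeholder bot name
    messages: [{ role: "user", content: "Tell me a short story." }],
    stream: true,
  }),
});

const decoder = new TextDecoder();
let buffer = "";
for await (const chunk of res.body) {
  buffer += decoder.decode(chunk, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep a possibly incomplete trailing line
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice("data: ".length).trim();
    if (data === "[DONE]") continue; // stream closes after this sentinel
    const delta = JSON.parse(data).choices[0].delta;
    if (delta.content) process.stdout.write(delta.content);
  }
}
```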
For image/video/audio generation models, use non-streaming mode:
- These models typically return complete outputs
- Streaming may not work as expected
Error Handling
Always implement retry logic with exponential backoff for production applications. Rate limits (429) and temporary failures (503) should be retried.
Implement proper error handling for all API calls:
```javascript
// Retry with capped exponential backoff on rate limits and server errors.
// Assumes the API key is supplied via the POE_API_KEY environment variable.
const url = "https://api.poe.com/v1/chat/completions";

async function chatWithRetry(payload, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.POE_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(payload),
      });
      if (!response.ok) {
        const error = await response.json();
        // Retry on rate limit or server errors
        if ([429, 500, 502, 503].includes(response.status) && attempt < maxRetries) {
          const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
          await new Promise((resolve) => setTimeout(resolve, delay));
          continue;
        }
        throw new Error(`API error: ${error.message}`);
      }
      return await response.json();
    } catch (err) {
      if (attempt === maxRetries) throw err;
    }
  }
}
```

Rate Limiting
The API has a rate limit of 500 requests per minute. Monitor the `X-RateLimit-Remaining` response header to track your usage.
Monitor rate limits in production:
- Check the `X-RateLimit-Remaining` header (see the sketch after this list)
- Implement request queuing when approaching limits
- Consider caching responses when appropriate
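As a sketch, reusing `url` from the retry example above (`requestOptions`, the threshold, and the delay are all arbitrary illustrations, not API requirements):

```javascript
// Sketch: check remaining quota after each call and pause when it runs low.
// requestOptions stands in for the method/headers/body shown in chatWithRetry.
const response = await fetch(url, requestOptions);
const remaining = Number(response.headers.get("X-RateLimit-Remaining"));
if (Number.isFinite(remaining) && remaining < 10) {
  console.warn(`Only ${remaining} requests remain in this window; backing off`);
  await new Promise((resolve) => setTimeout(resolve, 5000)); // arbitrary 5 s pause
}
```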
🔁 Callbacks & webhooks
No callbacks or webhooks are associated with this endpoint.