Enterprise-grade AI model API load-balancing service, providing high availability, intelligent routing, and real-time monitoring.
Chat with AI models, with support for multi-turn conversations, streaming responses, and rich parameter configuration.
import httpx
import json
# Configuration
API_BASE = "http://localhost:3811"
API_KEY = "your-api-key-here"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}
# Non-streaming request
payload = {
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Hello, please introduce yourself."}
    ],
    "temperature": 0.7,
    "max_tokens": 1000,
    "stream": False
}
response = httpx.post(
    f"{API_BASE}/v1/chat/completions",
    headers=headers,
    json=payload
)
result = response.json()
print(result['choices'][0]['message']['content'])
# Streaming request
payload["stream"] = True
with httpx.stream(
    "POST",
    f"{API_BASE}/v1/chat/completions",
    headers=headers,
    json=payload,
    timeout=None  # disable the default read timeout for long-lived streams
) as response:
    # The stream is delivered as server-sent events, one "data: ..." line per chunk
    for chunk in response.iter_lines():
        if chunk.strip():
            if chunk.startswith("data: "):
                data = chunk[6:]  # strip the "data: " prefix
                if data.strip() != "[DONE]":
                    try:
                        json_data = json.loads(data)
                        content = json_data['choices'][0]['delta'].get('content', '')
                        if content:
                            print(content, end='', flush=True)
                    except json.JSONDecodeError:
                        continue
# Non-streaming request
curl -X POST http://localhost:3811/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key-here" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'
# Streaming request
curl -X POST http://localhost:3811/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key-here" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
      {"role": "user", "content": "Please give a detailed overview of the history of artificial intelligence."}
    ],
    "stream": true
  }' \
  --no-buffer
Retrieve the list of currently available AI models, including each model's ID, status, and basic information.
import httpx
headers = {"Authorization": f"Bearer {API_KEY}"}
response = httpx.get(f"{API_BASE}/v1/models", headers=headers)
models = response.json()
print("可用模型列表:")
print("-" * 50)
for model in models["data"]:
print(f"模型ID: {model['id']}")
print(f"创建时间: {model.get('created', 'N/A')}")
print(f"拥有者: {model.get('owned_by', 'N/A')}")
print("-" * 50)
curl -H "Authorization: Bearer your-api-key-here" \
http://localhost:3811/v1/models | jq .
The traditional text completion API, suitable for simple text generation and completion tasks.
payload = {
    "model": "deepseek-ai/DeepSeek-R1",
    "prompt": "The future trends in artificial intelligence are",
    "max_tokens": 200,
    "temperature": 0.7,
    "top_p": 0.9,
    "stream": False
}
response = httpx.post(
    f"{API_BASE}/v1/completions",
    headers=headers,
    json=payload
)
result = response.json()
print(result['choices'][0]['text'])
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Identifier of the AI model to use, e.g. "deepseek-ai/DeepSeek-R1" |
| messages | array | Required | Array of conversation messages; each message contains a role and content |
| temperature | number | Optional | Controls response randomness, range 0-2, default 0.7. Higher values produce more random output |
| max_tokens | integer | Optional | Maximum number of tokens in the generated response; controls output length |
| stream | boolean | Optional | Whether to enable streaming responses, default false. When enabled, generated content is received in real time |
| top_p | number | Optional | Nucleus sampling parameter, range 0-1. Works together with temperature to control output randomness |
| frequency_penalty | number | Optional | Frequency penalty, range -2 to 2. Positive values reduce repeated tokens |
| presence_penalty | number | Optional | Presence penalty, range -2 to 2. Positive values encourage the model to discuss new topics |
Use the GET /v1/models endpoint to retrieve the complete, real-time list of available models and their details.
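To tie the parameter table and the note above together, here is a minimal sketch that first asks /v1/models for an available model id and then sends a chat request exercising the optional sampling parameters. It assumes the response shapes shown earlier in this document; the prompt text and parameter values are arbitrary examples, not recommendations.

import httpx

API_BASE = "http://localhost:3811"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer your-api-key-here",
}

# Look up an available model id instead of hard-coding one,
# as suggested by the GET /v1/models note above.
models = httpx.get(f"{API_BASE}/v1/models", headers=headers).json()
model_id = models["data"][0]["id"]

# Arbitrary example values for the optional parameters in the table above.
payload = {
    "model": model_id,
    "messages": [
        {"role": "user", "content": "Suggest three names for a robotics startup."}
    ],
    "temperature": 0.9,        # more random than the 0.7 default
    "top_p": 0.8,              # nucleus sampling: keep the top 80% of probability mass
    "max_tokens": 256,         # cap the output length
    "frequency_penalty": 0.5,  # positive value discourages repeated tokens
    "presence_penalty": 0.3,   # positive value nudges the model toward new topics
    "stream": False,
}

response = httpx.post(f"{API_BASE}/v1/chat/completions", headers=headers, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])

Lowering temperature and top_p makes the output more deterministic; the two penalties mainly matter for longer generations.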
| HTTP Status | Error Type | Description | Resolution |
|---|---|---|---|
| 401 | Unauthorized | API key is invalid, expired, or malformed | Check that the Authorization header uses the format "Bearer your-api-key" |
| 429 | Rate Limited | Request rate exceeds the limit | Reduce request frequency and retry with an exponential backoff strategy |
| 400 | Bad Request | Malformed request body or missing required parameters | Check the request body format and parameter validity |
| 502 | Bad Gateway | The upstream AI service is temporarily unavailable | Retry later or try a different model |
| 503 | Service Unavailable | The service is under maintenance or overloaded | Wait for the service to recover or contact an administrator |
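The 429, 502, and 503 rows above all recommend retrying. The sketch below shows one way to implement exponential backoff around the chat endpoint using httpx; the helper name post_with_backoff, the retry count, base delay, and jitter are arbitrary choices for illustration, not values prescribed by the service.

import random
import time

import httpx

API_BASE = "http://localhost:3811"
headers = {"Authorization": "Bearer your-api-key-here"}
RETRYABLE = {429, 502, 503}  # status codes the table marks as retryable

def post_with_backoff(payload, max_retries=5, base_delay=1.0):
    """POST to /v1/chat/completions, retrying retryable errors with exponential backoff."""
    for attempt in range(max_retries + 1):
        response = httpx.post(f"{API_BASE}/v1/chat/completions", headers=headers, json=payload)
        if response.status_code not in RETRYABLE:
            response.raise_for_status()  # surface 400/401 and other non-retryable errors
            return response.json()
        if attempt == max_retries:
            response.raise_for_status()  # out of retries: raise the last retryable error
        # Exponential backoff with jitter: 1s, 2s, 4s, ... plus up to 1s of noise.
        delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
        time.sleep(delay)

result = post_with_backoff({
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [{"role": "user", "content": "Hello!"}],
})
print(result["choices"][0]["message"]["content"])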