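assistant/: Assistant Core
Directory: src/assistant/
assistant/ is the core of Claude Code's LLM conversation engine: it wraps Claude API calls, thinking (chain of thought), and tool use.
Responsibilities
It complements the query/ directory:
- query/: coordinates the tool-call loop (who/when)
- assistant/: focuses on the details of calling the LLM (how)
Core Interface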
interface Assistant {
  complete(params: CompleteParams): Promise<Message>
  stream(params: StreamParams): AsyncIterable<AssistantEvent>
  countTokens(messages: Message[]): Promise<number>
}
Request Construction
function buildRequest(
  messages: Message[],
  tools: Tool[],
  systemPrompt: string
): APIRequest {
  return {
    model: currentModel,
    system: [
      // Mark the system prompt as cacheable so repeated requests can reuse it
      { type: 'text', text: systemPrompt, cache_control: { type: 'ephemeral' } }
    ],
    messages: normalizeMessages(messages),
    tools: tools.map(toAPITool),
    max_tokens: 8192,
    temperature: 0, // favor deterministic output by default
    metadata: { user_id: hashedUserId }
  }
}
Streaming
async function* stream(params: StreamParams): AsyncIterable<AssistantEvent> {
  const raw = apiClient.stream(params)
  // Translate raw API events into the smaller internal event vocabulary
  for await (const event of raw) {
    switch (event.type) {
      case 'message_start':
        yield { type: 'start', message: event.message }
        break
      case 'content_block_start':
        yield { type: 'block_start', block: event.content_block }
        break
      case 'content_block_delta':
        yield { type: 'delta', delta: event.delta }
        break
      case 'content_block_stop':
        yield { type: 'block_end' }
        break
      case 'message_delta':
        yield { type: 'usage', usage: event.usage }
        break
      case 'message_stop':
        yield { type: 'end' }
        break
    }
  }
}
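As a rough illustration of how a caller might consume these normalized events, the sketch below accumulates text deltas into one reply string. The `AssistantEvent` shape is an assumption inferred from the switch statement, the `text_delta` delta type follows the Anthropic streaming format, and `appendDelta` and `collectText` are hypothetical helper names.

```typescript
// Assumed event shape, inferred from the switch statement in stream().
type AssistantEvent =
  | { type: 'start' }
  | { type: 'block_start' }
  | { type: 'delta'; delta: { type: string; text?: string } }
  | { type: 'block_end' }
  | { type: 'usage'; usage: { output_tokens: number } }
  | { type: 'end' }

// Pure reducer: appends the text of a text delta, ignores everything else.
function appendDelta(text: string, event: AssistantEvent): string {
  if (event.type === 'delta' && event.delta.type === 'text_delta') {
    return text + (event.delta.text ?? '')
  }
  return text
}

// Drains a stream of events and returns the accumulated reply text.
async function collectText(events: AsyncIterable<AssistantEvent>): Promise<string> {
  let text = ''
  for await (const event of events) {
    if (event.type === 'end') break
    text = appendDelta(text, event)
  }
  return text
}
```

A caller would then write something like `const reply = await collectText(stream(params))`. Keeping the reducer pure makes the accumulation logic testable without a live stream.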
Thinking Support
Claude 4.6 supports thinking (an exposed reasoning process):
{
  thinking: { type: 'enabled', budget_tokens: 10000 },
  messages: [...]
}
Response:
{
  content: [
    { type: 'thinking', thinking: '...' }, // reasoning
    { type: 'text', text: '...' },         // reply
  ]
}
Claude Code enables thinking by default, which makes the agent's decisions more reliable.
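To make the response shape concrete, here is a minimal sketch that separates reasoning from the visible reply. It assumes only the two block types shown above; `ContentBlock` and `splitContent` are hypothetical names, and a real response may carry other block types (for example tool use) that this sketch does not handle.

```typescript
// Assumed block shapes, limited to the two types shown in the response above.
type ContentBlock =
  | { type: 'thinking'; thinking: string }
  | { type: 'text'; text: string }

// Splits a content array into the reasoning trace and the user-facing reply.
function splitContent(content: ContentBlock[]): { reasoning: string; reply: string } {
  let reasoning = ''
  let reply = ''
  for (const block of content) {
    if (block.type === 'thinking') {
      reasoning += block.thinking
    } else {
      reply += block.text
    }
  }
  return { reasoning, reply }
}
```

With this split, the reasoning can be logged or shown in a collapsed view while only `reply` is rendered as the answer.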
Reasoning Effort
Opus 4.6 supports adjusting reasoning depth:
{
  reasoning: {
    effort: 'low' | 'medium' | 'high' // how much to think
  }
}
- low: fast, shallow reasoning
- medium: the default
- high: deep, slow reasoning
Claude Code picks the level dynamically based on task complexity:
function selectEffort(task: Task): 'low' | 'medium' | 'high' {
  if (task.type === 'quick_answer') return 'low'
  if (task.type === 'refactor') return 'high'
  return 'medium'
}
System Prompt
assistant/systemPrompt.ts builds the system prompt:
function buildSystemPrompt(ctx: Context): string {
  const sections = [
    corePersona(),                   // "You are Claude Code..."
    currentDate(),                   // "Today's date is 2026-04-05"
    workingDirectory(ctx.cwd),       // "Primary working directory: ..."
    environment(),                   // OS, shell
    memoryIndex(ctx.memory),         // memory index
    availableTools(ctx.tools),
    skillDescriptions(ctx.skills),
    currentMode(ctx.mode),           // "You are in plan mode" (if applicable)
    customInstructions(ctx.config),
  ]
  return sections.join('\n\n')
}
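Some sections apply only conditionally (the plan-mode line, for instance), so a plain `join` could leave blank gaps in the prompt. The sketch below shows one way to drop empty sections before joining. `Section`, `joinSections`, and this version of `currentMode` are illustrative assumptions, not the actual implementation.

```typescript
// A section builder may return null when it has nothing to contribute.
type Section = string | null

// Joins only the sections that carry non-blank text.
function joinSections(sections: Section[]): string {
  return sections
    .filter((s): s is string => s !== null && s.trim() !== '')
    .join('\n\n')
}

// Hypothetical conditional section: emits text only in plan mode.
function currentMode(mode: 'default' | 'plan'): Section {
  return mode === 'plan' ? 'You are in plan mode.' : null
}

const prompt = joinSections([
  'You are Claude Code...',
  currentMode('default'), // contributes nothing outside plan mode
  'Primary working directory: /repo',
])
```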
Model Selection
function selectModel(task: Task): string {
  // Haiku for compaction and simple tasks
  if (task.type === 'compact') return 'claude-haiku-4-5-20251001'
  // Sonnet as the balanced choice
  if (task.type === 'routine') return 'claude-sonnet-4-6'
  // Opus for complex tasks
  return 'claude-opus-4-6'
}
Multi-model strategy: not every task needs the most expensive model.
Token Budget
async function completeWithBudget(
  messages: Message[],
  budget: number
): Promise<Message> {
  const estimated = await countTokens(messages)
  // The input must leave at least one token of room for the output
  if (estimated >= budget) {
    throw new BudgetExceeded(estimated, budget)
  }
  return complete({ messages, max_tokens: budget - estimated })
}
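One detail the function above leaves open is that `budget - estimated` may exceed what a single response is allowed to produce. A minimal sketch of the arithmetic, assuming a per-response cap of 8192 (the `max_tokens` value used in `buildRequest`); `outputAllowance` is a hypothetical helper:

```typescript
// Computes how many output tokens a request may ask for, given the
// estimated input size, the total budget, and a per-response cap.
function outputAllowance(estimated: number, budget: number, cap = 8192): number {
  const remaining = budget - estimated
  if (remaining <= 0) {
    throw new RangeError(`input uses ${estimated} tokens, budget is ${budget}`)
  }
  return Math.min(remaining, cap)
}
```

For example, with 1000 input tokens and a budget of 3000, the allowance is 2000; with a budget of 50000 it is clamped to 8192.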
Stop Sequences
{
  stop_sequences: ['</answer>', 'USER:']
}
Generation halts as soon as the model emits any of the listed strings.
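A caller usually wants to know whether a response ended because of one of these sequences. The sketch below assumes the response exposes `stop_reason` and `stop_sequence` fields, as the Anthropic Messages API does; `StopInfo` and `matchedStopSequence` are hypothetical names.

```typescript
// Assumed subset of the response fields that describe why generation ended.
interface StopInfo {
  stop_reason: 'end_turn' | 'max_tokens' | 'stop_sequence' | 'tool_use'
  stop_sequence: string | null
}

// Returns the configured stop sequence that ended generation, or null
// if the response stopped for any other reason.
function matchedStopSequence(res: StopInfo, configured: string[]): string | null {
  if (res.stop_reason !== 'stop_sequence' || res.stop_sequence === null) {
    return null
  }
  return configured.includes(res.stop_sequence) ? res.stop_sequence : null
}
```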
Assistant Hooks
Every event in a response can be hooked:
assistant.on('thinking', (chunk) => logThinking(chunk))
assistant.on('text', (chunk) => updateUI(chunk))
assistant.on('tool_use', (tool) => logToolUse(tool))
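The `on` calls above imply some kind of typed event registry. A minimal sketch of one possible shape follows; the payload types in `HookEvents` and the `HookRegistry` class are assumptions for illustration only.

```typescript
// Assumed payload type for each hookable event.
type HookEvents = {
  thinking: string
  text: string
  tool_use: { name: string; input: unknown }
}

class HookRegistry {
  // Handlers are stored per event name; payload typing is enforced by on/emit.
  private handlers = new Map<keyof HookEvents, Array<(payload: any) => void>>()

  // Registers a handler for one event type.
  on<K extends keyof HookEvents>(event: K, handler: (payload: HookEvents[K]) => void): void {
    const list = this.handlers.get(event) ?? []
    list.push(handler)
    this.handlers.set(event, list)
  }

  // Calls every handler registered for the event, in registration order.
  emit<K extends keyof HookEvents>(event: K, payload: HookEvents[K]): void {
    for (const handler of this.handlers.get(event) ?? []) {
      handler(payload)
    }
  }
}
```

The generic `K` ties each event name to its payload, so `registry.on('tool_use', (tool) => tool.name)` type-checks while passing a string payload for `tool_use` does not.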
Error Recovery
async function completeWithRetry(
  params: CompleteParams,
  attempt = 0
): Promise<Message> {
  try {
    return await complete(params)
  } catch (e: any) {
    // Give up after three retries
    if (attempt >= 3) throw e
    if (e.type === 'rate_limit') {
      // Wait as long as the server asked for
      await sleep(e.retryAfter * 1000)
      return completeWithRetry(params, attempt + 1)
    }
    if (e.type === 'overloaded') {
      await sleep(exponentialBackoff(attempt))
      return completeWithRetry(params, attempt + 1)
    }
    throw e // other errors are not retried
  }
}
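The helpers `sleep` and `exponentialBackoff` are not shown in the source. One common way to write them is sketched below, using capped exponential growth with full jitter. The base delay, the cap, and the injectable `random` parameter are assumptions chosen for illustration.

```typescript
// Resolves after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

// Capped exponential backoff with full jitter: the delay is drawn uniformly
// from [0, min(capMs, baseMs * 2^attempt)). `random` is injectable so the
// function can be tested deterministically.
function exponentialBackoff(
  attempt: number,
  baseMs = 500,
  capMs = 30_000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt)
  return Math.floor(random() * ceiling)
}
```

Jitter spreads retries from many clients over time, which helps avoid synchronized retry bursts against an overloaded service.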
Token Count Caching
const tokenCountCache = new Map<string, number>()
async function countTokensCached(messages: Message[]): Promise<number> {
  const key = hashMessages(messages)
  const cached = tokenCountCache.get(key)
  if (cached !== undefined) return cached
  const count = await api.countTokens(messages)
  tokenCountCache.set(key, count)
  return count
}
Token-count calls are not free: each one is a separate API request, so caching results for identical inputs avoids redundant calls.
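The cache depends on `hashMessages` producing a stable key, which the source does not show. A minimal sketch, assuming messages have string content and using SHA-256 from Node's built-in `crypto` module; the simplified `Message` shape here is an assumption:

```typescript
import { createHash } from 'node:crypto'

// Simplified message shape, assumed for this sketch.
interface Message {
  role: 'user' | 'assistant'
  content: string
}

// Derives a cache key from the ordered roles and contents. Serializing
// [role, content] pairs keeps the key independent of object key order.
function hashMessages(messages: Message[]): string {
  const canonical = JSON.stringify(messages.map((m) => [m.role, m.content]))
  return createHash('sha256').update(canonical).digest('hex')
}
```

Note that the `Map` in the original snippet grows without bound; a long-running process would likely want an eviction policy such as LRU.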
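Message Preprocessing
function preprocessMessages(messages: Message[]): Message[] {
  let out = messages
  out = mergeConsecutive(out)   // merge adjacent messages with the same role
  out = sanitize(out)           // filter out unsafe content
  out = truncateTooLong(out)    // keep any single message from growing too long
  out = validateToolPairs(out)  // verify that each tool use has a matching result
  return out
}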
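To illustrate the first step of the pipeline, here is a minimal sketch of `mergeConsecutive`, assuming string content and a blank-line separator between merged texts. The source does not show the real implementation, and real messages may hold structured content blocks instead.

```typescript
// Simplified message shape, assumed for this sketch.
interface Message {
  role: 'user' | 'assistant'
  content: string
}

// Merges runs of adjacent messages that share a role, so that roles
// strictly alternate in the result. The input array is not mutated.
function mergeConsecutive(messages: Message[]): Message[] {
  const out: Message[] = []
  for (const msg of messages) {
    const last = out[out.length - 1]
    if (last !== undefined && last.role === msg.role) {
      out[out.length - 1] = {
        role: last.role,
        content: `${last.content}\n\n${msg.content}`,
      }
    } else {
      out.push({ ...msg })
    }
  }
  return out
}
```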
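Key Takeaways
- Assistant vs. Query split: how vs. who/when
- Thinking enabled: reasoning is observable
- Dynamic effort: scaled to task complexity
- Multi-model strategy: Haiku, Sonnet, and Opus each have a role
- Token count caching: fewer API calls
- Message preprocessing: keeps requests protocol-compliant
- Hooks throughout: observability at every stage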