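RAG and Memory Systems: Giving AI Knowledge and Memory 📚

"An LLM's knowledge is static; RAG lets it fetch information dynamically."

1. RAG Basics

1.1 What Is RAG

RAG (Retrieval-Augmented Generation) is a technique that gives an LLM access to external knowledge: relevant documents are retrieved at query time and added to the prompt before the model generates its answer.

Traditional LLM:
  Question → LLM (training knowledge only) → Answer

RAG:
  Question → Retrieve relevant documents → LLM (question + documents) → Answer

1.2 The RAG Workflow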

┌──────────────────────────────────────────────────────────────────────┐
│                             RAG Pipeline                             │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌───────────────────────────────────────────────────────────────┐   │
│  │                       Offline Indexing                        │   │
│  │                                                               │   │
│  │  Documents → Chunking → Embedding → Vector database           │   │
│  └───────────────────────────────────────────────────────────────┘   │
│                                                                      │
│  ┌───────────────────────────────────────────────────────────────┐   │
│  │                         Online Query                          │   │
│  │                                                               │   │
│  │  Question → Embed → Vector search → Top docs → Build prompt   │   │
│  │                                                      ↓        │   │
│  │                                                LLM → Answer   │   │
│  └───────────────────────────────────────────────────────────────┘   │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
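
The two stages in the diagram can be sketched end to end without any external service. This is a toy illustration under stated assumptions, not a production setup: `embed` is a hypothetical bag-of-words counter standing in for a real embedding model, the "vector database" is a plain array, and the names `buildIndex`, `search` and `buildPrompt` are made up for this sketch.

```javascript
// Toy stand-in for an embedding model: word counts over a shared vocabulary.
// (Assumption for illustration only; a real system calls an embedding model here.)
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

function embed(text, vocabulary) {
  const vector = new Array(vocabulary.length).fill(0);
  for (const word of tokenize(text)) {
    const i = vocabulary.indexOf(word);
    if (i !== -1) vector[i] += 1;
  }
  return vector;
}

function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Offline indexing: chunks -> embeddings -> in-memory "database"
function buildIndex(chunks) {
  const vocabulary = [...new Set(chunks.flatMap(chunk => tokenize(chunk)))];
  const entries = chunks.map(content => ({ content, vector: embed(content, vocabulary) }));
  return { vocabulary, entries };
}

// Online query: question -> embedding -> vector search -> top-k chunks
function search(index, question, k = 2) {
  const queryVector = embed(question, index.vocabulary);
  return index.entries
    .map(e => ({ content: e.content, score: cosine(queryVector, e.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Build the prompt that would be sent to the LLM
function buildPrompt(question, hits) {
  const context = hits.map(h => h.content).join('\n---\n');
  return `Context:\n${context}\n\nQuestion: ${question}`;
}

const index = buildIndex([
  'Chroma is a lightweight open source vector database.',
  'BM25 is a keyword ranking function.',
  'Overlap between chunks keeps context continuous.'
]);
const hits = search(index, 'Which vector database is lightweight?', 1);
console.log(buildPrompt('Which vector database is lightweight?', hits));
```

The first chunk shares the most words with the question, so it is the one placed in the prompt. The sections that follow replace each toy piece with a real component.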

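2. Document Chunking Strategies

2.1 Basic Chunking

javascript

// Chunk by character count; neighboring chunks share `overlap` characters
function chunkBySize(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  const chunks = [];
  let start = 0;
  
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({
      content: text.slice(start, end),
      start,
      end
    });
    if (end === text.length) break;  // last chunk reached; without this check the loop never terminates
    start = end - overlap;  // the overlap keeps context continuous across chunk boundaries
  }
  
  return chunks;
}

// Chunk by paragraph: pack whole paragraphs into chunks of at most maxSize characters
// (a single paragraph longer than maxSize still becomes one oversized chunk)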
function chunkByParagraph(text, maxSize = 1000) {
  const paragraphs = text.split(/\n\n+/);
  const chunks = [];
  let currentChunk = '';
  
  for (const para of paragraphs) {
    if ((currentChunk + para).length > maxSize && currentChunk) {
      chunks.push(currentChunk.trim());
      currentChunk = para;
    } else {
      currentChunk += '\n\n' + para;
    }
  }
  
  if (currentChunk.trim()) {
    chunks.push(currentChunk.trim());
  }
  
  return chunks;
}
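
To see what `overlap` does in fixed-size chunking, it helps to list the chunk positions for a concrete length. The sketch below computes only the boundaries; `chunkBoundaries` is a hypothetical helper introduced for this example, and it assumes each chunk starts `chunkSize - overlap` characters after the previous one.

```javascript
// List [start, end) boundaries for fixed-size chunking with overlap.
// Each chunk starts (chunkSize - overlap) characters after the previous one.
function chunkBoundaries(textLength, chunkSize, overlap) {
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize');
  const stride = chunkSize - overlap;
  const boundaries = [];
  for (let start = 0; start < textLength; start += stride) {
    const end = Math.min(start + chunkSize, textLength);
    boundaries.push([start, end]);
    if (end === textLength) break;  // the last chunk reaches the end of the text
  }
  return boundaries;
}

console.log(chunkBoundaries(2500, 1000, 200));
// → [ [ 0, 1000 ], [ 800, 1800 ], [ 1600, 2500 ] ]
```

With a chunk size of 1000 and an overlap of 200, the effective step is 800 characters, so a 2500-character text yields three chunks and every boundary region appears in two of them.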

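2.2 Semantic Chunking

javascript

// Chunk along semantic boundaries using an LLM.
// `llm` is assumed to be a chat client whose chat({ messages }) resolves to { content }.
async function semanticChunking(text) {
  const response = await llm.chat({
    messages: [{
      role: 'user',
      content: `Split the following text into semantically complete passages.
Each passage should:
1. Cover one complete topic
2. Be understandable on its own
3. Be roughly 200-500 words

Separate passages with "---CHUNK---".

Text:
${text}`
    }]
  });
  
  return response.content
    .split('---CHUNK---')
    .map(c => c.trim())
    .filter(Boolean);  // drop empty pieces, e.g. after a trailing separator
}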

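2.3 Code Chunking

javascript

// Chunk code by function / class.
// parseAST and extractRelevantImports are assumed helpers (not defined here); parseAST is
// expected to return an ESTree-style AST whose nodes carry start/end offsets.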
function chunkCode(code, language) {
  const ast = parseAST(code, language);
  const chunks = [];
  
  for (const node of ast.body) {
    if (node.type === 'FunctionDeclaration' || 
        node.type === 'ClassDeclaration' ||
        node.type === 'ExportNamedDeclaration') {
      chunks.push({
        type: node.type,
        name: node.id?.name || 'anonymous',
        content: code.slice(node.start, node.end),
        // Include the import statements this chunk depends on
        imports: extractRelevantImports(ast, node)
      });
    }
  }
  
  return chunks;
}

// Example: chunking a TypeScript (TSX) file
const chunks = chunkCode(`
import React from 'react';
import { useState } from 'react';

function Counter() {
  const [count, setCount] = useState(0);
  return <button onClick={() => setCount(c => c + 1)}>{count}</button>;
}

function App() {
  return <Counter />;
}

export default App;
`, 'typescript');

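// Result (`export default App` is an ExportDefaultDeclaration, which chunkCode does not collect):
// [
//   { type: 'FunctionDeclaration', name: 'Counter', content: 'function Counter...', imports: ['useState'] },
//   { type: 'FunctionDeclaration', name: 'App', content: 'function App...', imports: [] }
// ]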

3. Vector Embeddings

3.1 Using an Embedding Model

javascript

import OpenAI from 'openai';

const openai = new OpenAI();

async function getEmbedding(text) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',  // or text-embedding-3-large
    input: text
  });
  
  return response.data[0].embedding;  // embedding vector (1536 dimensions for -small, 3072 for -large)
}

// Batch embedding
async function getEmbeddings(texts) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts  // an array of strings is accepted as well
  });
  });
  
  return response.data.map(d => d.embedding);
}
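
A single request can only carry a limited number of inputs, so a large corpus is usually embedded in batches. The sketch below shows one way to do that while preserving input order. The batch size of 100 is an arbitrary assumption, and `toBatches` and `embedAll` are hypothetical names; `embedFn` stands in for a function like `getEmbeddings` above.

```javascript
// Split an array into consecutive batches of at most `size` items
function toBatches(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed all texts batch by batch, preserving input order.
// embedFn(texts) must resolve to one vector per input text.
async function embedAll(texts, embedFn, batchSize = 100) {
  const vectors = [];
  for (const batch of toBatches(texts, batchSize)) {
    const result = await embedFn(batch);
    if (result.length !== batch.length) {
      throw new Error('Embedding count does not match input count');
    }
    vectors.push(...result);
  }
  return vectors;
}
```

Batches are sent one after another here, which keeps the example simple and is gentle on rate limits; sending them concurrently is a possible refinement.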

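3.2 Choosing an Embedding Model

Model	Dimensions	Characteristics	Best for
`text-embedding-3-small`	1536	Fast, inexpensive	General use
`text-embedding-3-large`	3072	More accurate	High-precision needs
`Cohere embed-v3`	1024	Strong multilingual support	Multilingual documents
`BGE-M3`	1024	Open source, runs locally	Privacy-sensitive scenarios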

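4. Vector Databases

4.1 Common Vector Databases

Database	Type	Characteristics
Pinecone	Managed cloud service	Fully managed, scales automatically
Weaviate	Open source / cloud	Hybrid search (vector + keyword)
Qdrant	Open source	Written in Rust, high performance
Chroma	Open source	Lightweight, well suited to development
pgvector	PostgreSQL extension	Integrates with an existing Postgres database

4.2 Chroma Example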

javascript

import { ChromaClient } from 'chromadb';

const client = new ChromaClient();

// Create a collection
const collection = await client.createCollection({
  name: 'codebase',
  metadata: { 'hnsw:space': 'cosine' }  // use cosine similarity
});

// Add documents (embedding1..3 are vectors computed beforehand, e.g. with getEmbeddings)
await collection.add({
  ids: ['doc1', 'doc2', 'doc3'],
  embeddings: [embedding1, embedding2, embedding3],
  documents: ['Document content 1', 'Document content 2', 'Document content 3'],
  metadatas: [
    { source: 'src/App.tsx', type: 'code' },
    { source: 'README.md', type: 'doc' },
    { source: 'src/utils.ts', type: 'code' }
  ]
});

// Query
const results = await collection.query({
  queryEmbeddings: [queryEmbedding],
  nResults: 5,
  where: { type: 'code' }  // optional: metadata filter
});

4.3 PostgreSQL + pgvector

sql

-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create the table
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536),  -- 1536-dimensional vector
  metadata JSONB
);

-- Create an index (IVFFlat, for large datasets; build it once the table holds data)
CREATE INDEX ON documents 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Find the most similar documents
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1  -- <=> is the cosine distance operator
LIMIT 5;
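
From application code, the `$1` parameter has to arrive in pgvector's text form, `[x1,x2,...]`. The sketch below shows that conversion and the distance-to-similarity relation used in the SELECT above. `toVectorLiteral` and `distanceToSimilarity` are hypothetical helper names, and the database call itself is left out.

```javascript
// Format a JS number array as a pgvector text literal, e.g. [0.1,0.2,0.3]
function toVectorLiteral(embedding, expectedDim) {
  if (expectedDim !== undefined && embedding.length !== expectedDim) {
    throw new Error(`Expected ${expectedDim} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every(x => Number.isFinite(x))) {
    throw new Error('Embedding contains a non-finite value');
  }
  return `[${embedding.join(',')}]`;
}

// <=> returns cosine distance; the query above turns it into similarity as 1 - distance
function distanceToSimilarity(cosineDistance) {
  return 1 - cosineDistance;
}

console.log(toVectorLiteral([0.1, 0.2, 0.3]));  // → [0.1,0.2,0.3]
```

Checking the dimension before sending the query gives a clearer error than the one the database raises when a vector does not match the `vector(1536)` column. An explicit cast such as `$1::vector` in the SQL keeps the parameter type unambiguous.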

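5. Retrieval Strategies

5.1 Basic Vector Retrieval

javascript

async function retrieve(query, k = 5) {
  const queryEmbedding = await getEmbedding(query);
  
  const results = await collection.query({
    queryEmbeddings: [queryEmbedding],
    nResults: k
  });
  
  // Chroma returns one result list per query embedding. Reshape the first (and only) list
  // into { id, content, metadata } objects so later stages can rely on doc.id and doc.content.
  return results.ids[0].map((id, i) => ({
    id,
    content: results.documents[0][i],
    metadata: results.metadatas[0][i]
  }));  // top-k documents
}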

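5.2 Hybrid Search

Hybrid search combines vector retrieval with keyword retrieval, so that semantic matches and exact-term matches both contribute to the ranking:

javascript

// vectorSearch, bm25Search and getDocument are assumed helpers; each search returns
// documents (with an id) ordered from most to least relevant
async function hybridSearch(query, k = 5) {
  // Vector retrieval
  const vectorResults = await vectorSearch(query, k * 2);
  
  // Keyword retrieval (BM25)
  const keywordResults = await bm25Search(query, k * 2);
  
  // Fuse the two rankings with RRF (Reciprocal Rank Fusion)
  const scores = new Map();
  const RRF_K = 60;  // smoothing constant commonly used with RRF
  
  for (const results of [vectorResults, keywordResults]) {
    results.forEach((doc, index) => {
      const score = 1 / (RRF_K + index + 1);  // RRF: 1 / (k + rank), with rank starting at 1
      scores.set(doc.id, (scores.get(doc.id) || 0) + score);
    });
  }
  
  // Sort by fused score
  const topIds = [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([id]) => id);
  
  // Promise.all works whether getDocument is synchronous or asynchronous
  return Promise.all(topIds.map(id => getDocument(id)));
}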
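
A small worked example makes the fusion step concrete. The sketch below applies RRF to two hand-written rankings of document ids; `reciprocalRankFusion` is a name chosen for this example, and ranks are counted from 1.

```javascript
// Reciprocal Rank Fusion over any number of ranked id lists.
// Each list is ordered best-first; k = 60 is the commonly used constant.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;  // ranks start at 1
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

const vectorRanking = ['a', 'b', 'c'];
const keywordRanking = ['b', 'd', 'a'];
const fused = reciprocalRankFusion([vectorRanking, keywordRanking]);
console.log(fused.map(r => r.id));
// → [ 'b', 'a', 'd', 'c' ]
```

Document `b` comes first because it ranks high in both lists (1/62 + 1/61), ahead of `a` (1/61 + 1/63). Documents found by only one retriever follow, ordered by their single rank. Only ranks enter the formula, so the two retrievers' raw scores never need to be on the same scale.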

5.3 Re-ranking

A more precise (and slower) model re-scores the small candidate set returned by first-stage retrieval:

javascript

async function retrieveWithRerank(query, k = 5) {
  // Stage 1: coarse retrieval (recall more candidates than needed)
  const candidates = await retrieve(query, k * 3);
  
  // Stage 2: re-rank
  const reranked = await rerank(query, candidates);
  
  return reranked.slice(0, k);
}

async function rerank(query, documents) {
  // Uses Cohere Rerank; a cross-encoder model would also work.
  // `cohere` is assumed to be an already initialized Cohere client.
  const response = await cohere.rerank({
    model: 'rerank-english-v3.0',
    query: query,
    documents: documents.map(d => d.content)
  });
  
  // Results come back ordered by relevance; r.index points into the input array
  return response.results.map(r => documents[r.index]);
}

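5.4 Query Expansion

javascript

async function expandQuery(query) {
  // Use an LLM to generate several variants of the query
  const response = await llm.chat({
    messages: [{
      role: 'user',
      content: `Generate 3 questions that mean the same as the query below but are worded differently:

Original query: "${query}"

One per line, no numbering.`
    }]
  });
  
  const variations = response.content
    .split('\n')
    .map(v => v.trim())
    .filter(Boolean);  // drop blank lines
  return [query, ...variations];
}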

async function retrieveWithExpansion(query, k = 5) {
  const queries = await expandQuery(query);
  
  // Retrieve for each query variant
  const allResults = await Promise.all(
    queries.map(q => retrieve(q, k))
  );
  
  // Merge and de-duplicate (relies on each document carrying an id)
  const seen = new Set();
  const merged = [];
  
  for (const results of allResults) {
    for (const doc of results) {
      if (!seen.has(doc.id)) {
        seen.add(doc.id);
        merged.push(doc);
      }
    }
  }
  
  return merged.slice(0, k);
}
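
The merge above appends each query's results in turn, so the first `k` unique documents come mostly from the original query. One alternative, sketched here as an assumption rather than as the author's method, interleaves the lists round-robin so that every variant contributes to the top of the merged list. `interleaveUnique` is a hypothetical helper.

```javascript
// Round-robin merge of several ranked lists, skipping documents already taken
function interleaveUnique(resultLists, k) {
  const seen = new Set();
  const merged = [];
  const longest = Math.max(0, ...resultLists.map(list => list.length));

  for (let rank = 0; rank < longest && merged.length < k; rank++) {
    for (const list of resultLists) {
      const doc = list[rank];
      if (doc && !seen.has(doc.id)) {
        seen.add(doc.id);
        merged.push(doc);
        if (merged.length === k) break;
      }
    }
  }
  return merged;
}

const lists = [
  [{ id: 1 }, { id: 2 }],
  [{ id: 3 }, { id: 1 }],
  [{ id: 4 }]
];
console.log(interleaveUnique(lists, 3).map(d => d.id));
// → [ 1, 3, 4 ]
```

With `k = 3`, each of the three queries places its best result, whereas a sequential merge would return ids 1, 2 and 3 and leave the third query unrepresented.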

6. RAG Applications

6.1 A Complete RAG Example

javascript

class RAGSystem {
  constructor(collection, llm) {
    this.collection = collection;
    this.llm = llm;
  }
  
  async query(question) {
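    // 1. Retrieve relevant documents
    const docs = await this.retrieve(question);
    
    // 2. Build the augmented prompt
    const context = docs.map(d => d.content).join('\n\n---\n\n');
    
    // 3. Generate the answer
    const response = await this.llm.chat({
      messages: [
        {
          role: 'system',
          content: `You are a helpful assistant. Answer the question using the provided context.
If the context does not contain the relevant information, say "I could not find any relevant information."

Context:
${context}`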
        },
        {
          role: 'user',
          content: question
        }
      ]
    });
    
    return {
      answer: response.content,
      sources: docs.map(d => d.metadata.source)
    };
  }
  
  async retrieve(question, k = 5) {
    const embedding = await getEmbedding(question);
    
    const results = await this.collection.query({
      queryEmbeddings: [embedding],
      nResults: k
    });
    
    return results.documents[0].map((content, i) => ({
      content,
      metadata: results.metadatas[0][i]
    }));
  }
}

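6.2 Codebase RAG

javascript

import fs from 'node:fs/promises';
import { glob } from 'glob';  // assumes the promise-based API of the glob package

// Extends RAGSystem from section 6.1 to reuse its constructor (collection, llm) and retrieve().
// getLanguage is an assumed helper that maps a file path to a language name.
class CodebaseRAG extends RAGSystem {
  async indexRepository(repoPath) {
    const files = await glob(`${repoPath}/**/*.{ts,tsx,js,jsx}`);
    
    for (const file of files) {
      const content = await fs.readFile(file, 'utf-8');
      const chunks = chunkCode(content, getLanguage(file));
      if (chunks.length === 0) continue;
      
      // Embed all chunks of the file in one batch
      const embeddings = await getEmbeddings(chunks.map(c => c.content));
      
      await this.collection.add({
        // The chunk index keeps ids unique even when several chunks are named 'anonymous'
        ids: chunks.map((c, i) => `${file}:${i}:${c.name}`),
        embeddings,
        documents: chunks.map(c => c.content),
        metadatas: chunks.map(c => ({
          file,
          type: c.type,
          name: c.name,
          imports: c.imports.join(',')
        }))
      });
    }
  }
  
  async findRelatedCode(question) {
    // Retrieve candidate code chunks
    const results = await this.retrieve(question, 10);
    
    // Let the LLM pick the most relevant ones
    const filtered = await this.llm.chat({
      messages: [{
        role: 'user',
        content: `User question: ${question}

Below are the retrieved code snippets. Select the 3-5 most relevant to the question.

${results.map((r, i) => `[${i}] ${r.metadata.file}\n${r.content}`).join('\n\n')}

Output the selected numbers, comma-separated.`
      }]
    });
    
    // Ignore anything in the reply that is not a valid index into results
    const indices = filtered.content.split(',').map(s => parseInt(s.trim(), 10));
    return indices.map(i => results[i]).filter(Boolean);
  }
}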

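7. Memory Systems

7.1 Conversation Memory

javascript

class ConversationMemory {
  constructor(maxMessages = 20) {
    this.messages = [];
    this.maxMessages = maxMessages;
  }
  
  add(role, content) {
    this.messages.push({ role, content, timestamp: Date.now() });
    
    // Over the limit: keep the system prompts plus the most recent messages
    if (this.messages.length > this.maxMessages) {
      const systemMessages = this.messages.filter(m => m.role === 'system');
      const others = this.messages.filter(m => m.role !== 'system');
      const keep = Math.max(this.maxMessages - systemMessages.length, 0);
      
      // slice(-0) would return every message, so keep === 0 is handled explicitly
      const recentMessages = keep > 0 ? others.slice(-keep) : [];
      
      this.messages = [...systemMessages, ...recentMessages];
    }
  }
  
  getContext() {
    return this.messages.map(({ role, content }) => ({ role, content }));
  }
  
  // Summarize older messages to compress the context
  async summarize() {
    if (this.messages.length < 10) return;
    
    // System prompts are kept as they are; only non-system messages are summarized
    const systemMessages = this.messages.filter(m => m.role === 'system');
    const others = this.messages.filter(m => m.role !== 'system');
    const oldMessages = others.slice(0, -5);
    if (oldMessages.length === 0) return;
    
    const summary = await llm.chat({
      messages: [{
        role: 'user',
        content: `Summarize the key information in the following conversation:
${oldMessages.map(m => `${m.role}: ${m.content}`).join('\n')}`
      }]
    });
    
    // Replace the old messages with the summary
    this.messages = [
      ...systemMessages,
      { role: 'system', content: `Summary of the earlier conversation: ${summary.content}` },
      ...others.slice(-5)
    ];
  }
}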

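7.2 Long-Term Memory

javascript

class LongTermMemory {
  constructor(vectorDB) {
    this.db = vectorDB;      // a ChromaClient, as in section 4.2
    this.collection = null;  // resolved lazily, because the lookup is asynchronous
  }
  
  async getCollection() {
    if (!this.collection) {
      this.collection = await this.db.getOrCreateCollection({ name: 'memories' });
    }
    return this.collection;
  }
  
  async store(content, metadata = {}) {
    const collection = await this.getCollection();
    const id = crypto.randomUUID();
    const embedding = await getEmbedding(content);
    
    await collection.add({
      ids: [id],
      embeddings: [embedding],
      documents: [content],
      metadatas: [{
        ...metadata,
        timestamp: Date.now()
      }]
    });
    
    return id;
  }
  
  async recall(query, k = 5) {
    const collection = await this.getCollection();
    const embedding = await getEmbedding(query);
    
    const results = await collection.query({
      queryEmbeddings: [embedding],
      nResults: k
    });
    
    return results.documents[0].map((content, i) => ({
      content,
      metadata: results.metadatas[0][i]
    }));
  }
  
  async forget(id) {
    const collection = await this.getCollection();
    await collection.delete({ ids: [id] });
  }
  
  // Automatically remember important information
  async processConversation(messages) {
    // Ask the LLM to extract what is worth remembering.
    // response_format is an OpenAI-style option; that `llm` supports it is an assumption.
    const response = await llm.chat({
      messages: [{
        role: 'user',
        content: `Extract information worth remembering long-term from the conversation below (user preferences, project details, important decisions, and so on):

${messages.map(m => `${m.role}: ${m.content}`).join('\n')}

Output a JSON object with a "memories" array; each item has "content" and "type" fields.
If nothing is worth remembering, output {"memories": []}.`
      }],
      response_format: { type: 'json_object' }
    });
    
    const memories = JSON.parse(response.content).memories || [];
    
    for (const memory of memories) {
      await this.store(memory.content, { type: memory.type });
    }
  }
}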

7.3 Working Memory (Scratchpad)

javascript

class WorkingMemory {
  constructor() {
    this.scratchpad = {};  // temporary storage
    this.facts = [];       // confirmed facts
    this.goals = [];       // current goals
  }
  
  // Store an intermediate result
  set(key, value) {
    this.scratchpad[key] = {
      value,
      timestamp: Date.now()
    };
  }
  
  get(key) {
    return this.scratchpad[key]?.value;
  }
  
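  // Record a confirmed fact
  addFact(fact) {
    this.facts.push({
      content: fact,
      timestamp: Date.now()
    });
  }
  
  // Record a current goal (without a method like this, goals would always stay empty)
  addGoal(goal) {
    this.goals.push(goal);
  }
  
  // Get the current context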
  getContext() {
    return {
      scratchpad: Object.entries(this.scratchpad).map(([k, v]) => `${k}: ${v.value}`),
      facts: this.facts.map(f => f.content),
      goals: this.goals
    };
  }
  
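  // Render the working memory as text for injection into the system prompt
  toSystemPrompt() {
    const ctx = this.getContext();
    const sections = [
      ctx.scratchpad.length > 0 ? `Temporary data:\n${ctx.scratchpad.join('\n')}` : '',
      ctx.facts.length > 0 ? `Confirmed facts:\n${ctx.facts.join('\n')}` : '',
      ctx.goals.length > 0 ? `Current goals:\n${ctx.goals.join('\n')}` : ''
    ].filter(Boolean);  // skip empty sections instead of leaving blank lines
    
    return ['## Current working memory', ...sections].join('\n');
  }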
}

8. Context Management

8.1 Context Window Optimization

javascript

class ContextManager {
  constructor(maxTokens = 100000) {
    this.maxTokens = maxTokens;
    this.priorities = {
      system_prompt: 1,      // highest priority
      user_question: 2,
      relevant_code: 3,
      recent_messages: 4,
      retrieved_docs: 5,
      history_summary: 6     // lowest priority
    };
  }
  
  buildContext(components) {
    // Sort by priority (lower number first)
    const sorted = Object.entries(components)
      .sort(([a], [b]) => this.priorities[a] - this.priorities[b]);
    
    let totalTokens = 0;
    const included = [];
    
    for (const [type, content] of sorted) {
      const tokens = estimateTokens(content);
      
      if (totalTokens + tokens <= this.maxTokens) {
        included.push({ type, content });
        totalTokens += tokens;
      } else {
        // Try to fit a truncated version
        const remaining = this.maxTokens - totalTokens;
        if (remaining > 500) {  // only worth including if more than 500 tokens remain
          const truncated = truncateToTokens(content, remaining);
          included.push({ type, content: truncated, truncated: true });
        }
        break;
      }
    }
    
    return included;
  }
}
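
`buildContext` relies on two helpers, `estimateTokens` and `truncateToTokens`, that are not defined in this section. A rough sketch of both follows. It assumes about 4 characters per token, which is only a ballpark for English text; where accuracy matters, the tokenizer of the target model should replace it.

```javascript
const CHARS_PER_TOKEN = 4;  // rough heuristic (assumption), not a real tokenizer

// Estimate the token count of a string from its length
function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Cut text so that its estimated token count does not exceed maxTokens
function truncateToTokens(text, maxTokens) {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars);
}

console.log(estimateTokens('x'.repeat(1000)));               // → 250
console.log(truncateToTokens('x'.repeat(1000), 10).length);  // → 40
```

Because the estimate is approximate, leaving some headroom below the model's real context limit is safer than filling `maxTokens` exactly. Text in other scripts, such as Chinese, typically needs a different ratio.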

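8.2 Dynamic Context Selection

javascript

async function selectContext(question, availableContext) {
  // Let the LLM choose the most relevant context items
  const response = await llm.chat({
    messages: [{
      role: 'user',
      content: `User question: ${question}

Below are the available context items. Select those most relevant to the question (at most 5).

${availableContext.map((ctx, i) => `[${i}] ${ctx.title}\n${ctx.preview}`).join('\n\n')}

Output the selected numbers, comma-separated. Output only the numbers, nothing else.`
    }]
  });
  
  // Unparseable or out-of-range entries map to undefined and are filtered out
  const indices = response.content.split(',').map(s => parseInt(s.trim(), 10));
  return indices.map(i => availableContext[i]).filter(Boolean);
}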
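
Model output does not always follow the format instruction, so the parsing step benefits from being defensive. The sketch below is a stricter parser; `parseIndices` is a hypothetical helper that pulls out every integer, drops out-of-range values and duplicates, and caps the count.

```javascript
// Extract valid, unique indices from free-form LLM output such as "[0], 2 and 7"
function parseIndices(output, itemCount, maxSelected = 5) {
  const matches = output.match(/\d+/g) || [];
  const unique = new Set();
  for (const m of matches) {
    if (unique.size >= maxSelected) break;  // enforce the cap
    const i = parseInt(m, 10);
    if (i < itemCount) unique.add(i);  // \d+ never includes a minus sign, so i >= 0
  }
  return [...unique];
}

console.log(parseIndices('Selected: [0], 2, 2 and 9', 5));
// → [ 0, 2 ]
```

Here the duplicate `2` is collapsed and `9` is dropped because only five items (indices 0 to 4) were offered.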

9. In Practice: A Codebase Q&A System

javascript

class CodebaseQA {
  constructor() {
    this.vectorDB = new ChromaClient();
    this.collection = null;
    this.memory = new ConversationMemory();
    this.longTermMemory = new LongTermMemory(this.vectorDB);
  }
  
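  // indexRepository, retrieveCode and buildContext are not shown in this listing
  async initialize(repoPath) {
    // getOrCreateCollection avoids an error when the collection already exists
    this.collection = await this.vectorDB.getOrCreateCollection({ name: 'codebase' });
    await this.indexRepository(repoPath);
  }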
  
  async chat(question) {
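    // 1. Retrieve relevant code
    const relevantCode = await this.retrieveCode(question);
    
    // 2. Recall related long-term memories
    const memories = await this.longTermMemory.recall(question, 3);
    
    // 3. Build the context
    const context = this.buildContext(relevantCode, memories);
    
    // 4. Get the conversation history. Read it before recording the new question:
    //    the question is appended explicitly below and must not be sent twice.
    const history = this.memory.getContext();
    this.memory.add('user', question);
    
    // 5. Generate the answer
    const response = await llm.chat({
      messages: [
        {
          role: 'system',
          content: `You are an expert on this codebase. Answer the question using the context.

## Relevant code
${context.code}

## Relevant memories
${context.memories}

## Rules
- When quoting code, state its file path
- If you are not sure, say so
- Provide concrete code examples`
        },
        ...history,
        { role: 'user', content: question }
      ]
    });
    
    // 6. Save the answer to conversation memory
    this.memory.add('assistant', response.content);
    
    // 7. Check whether anything is worth storing in long-term memory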
    await this.longTermMemory.processConversation([
      { role: 'user', content: question },
      { role: 'assistant', content: response.content }
    ]);
    
    return {
      answer: response.content,
      sources: relevantCode.map(c => c.metadata.file)
    };
  }
}

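10. Key Takeaways

Chunking strategy matters: semantically complete chunks improve retrieval quality
Hybrid search helps: combining vector and keyword retrieval catches both semantic and exact-term matches
Re-ranking improves precision: re-score retrieved results with a more precise model
Layered memory: working memory + conversation memory + long-term memory
Context management: prioritize what goes into the limited context window
Continuous improvement: monitor retrieval quality and iterate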
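
Monitoring retrieval quality needs a number to track. The sketch below computes two common metrics, hit rate@k and mean reciprocal rank (MRR), over a small labeled set. The data shape (`retrievedIds` plus a single `relevantId` per query) is an assumption made for this example.

```javascript
// Each case: the ids a retriever returned (best first) and the id that should be found
function hitRateAtK(cases, k) {
  if (cases.length === 0) return 0;
  const hits = cases.filter(c => c.retrievedIds.slice(0, k).includes(c.relevantId));
  return hits.length / cases.length;
}

// Mean reciprocal rank: average of 1 / (rank of the relevant id), counting 0 when it is missing
function meanReciprocalRank(cases) {
  if (cases.length === 0) return 0;
  const total = cases.reduce((sum, c) => {
    const index = c.retrievedIds.indexOf(c.relevantId);
    return sum + (index === -1 ? 0 : 1 / (index + 1));
  }, 0);
  return total / cases.length;
}

const evalCases = [
  { retrievedIds: ['d1', 'd2', 'd3'], relevantId: 'd1' },  // found at rank 1
  { retrievedIds: ['d4', 'd5', 'd6'], relevantId: 'd5' },  // found at rank 2
  { retrievedIds: ['d7', 'd8', 'd9'], relevantId: 'd0' }   // not found
];
console.log(hitRateAtK(evalCases, 1));       // 1 of 3 cases hit at k = 1
console.log(meanReciprocalRank(evalCases));  // (1 + 0.5 + 0) / 3 = 0.5
```

Running the same labeled set before and after a change to chunking, hybrid search, or re-ranking shows whether the change actually improved retrieval.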
