Microsoft GraphRAG Framework
GraphRAG Methodology
Introduction to GraphRAG

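GraphRAG is a structured, hierarchical approach to retrieval-augmented generation (RAG), in contrast to naive semantic search over plain-text snippets. The GraphRAG process extracts a knowledge graph from raw text, builds a community hierarchy, generates summaries for those communities, and then uses these structures when performing RAG-based tasks.
Limitations of Traditional RAG
- Baseline RAG struggles to connect separate pieces of information. This happens when answering a question requires traversing scattered facts through their shared attributes in order to produce new, synthesized insights.
- Baseline RAG performs poorly when asked to understand, as a whole, the summarized semantic concepts of a large dataset or even a single large document.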
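Key Concepts of GraphRAG
GraphRAG uses an LLM to build a knowledge graph from the input corpus and uses that graph to augment prompts at query time. GraphRAG shows a substantial improvement on the two classes of questions described above.
- Knowledge graph modeling
- Local search
- Global search
- Vector search
- Text search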
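Microsoft GraphRAG
A modular, graph-based retrieval-augmented generation (RAG) system. The repository presents a method for using knowledge graph memory structures to enhance LLM outputs. Note that the provided code is for demonstration only and is not an officially supported Microsoft product.
⚠️ Warning: GraphRAG indexing can be an expensive operation. Read all of the documentation to understand the process and costs involved, and start small.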
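Indexing Pipeline
- Split the input corpus into a series of text units, which serve as the analyzable units for the rest of the process and provide fine-grained references in the output.
- Extract all entities, relationships, and key claims from the text units.
- Perform hierarchical clustering of the graph using the Leiden technique. For a visual view of the result, see Figure 1. Each circle represents an entity (for example, a person, place, or organization); its size represents the entity's degree, and its color represents the community it belongs to.
- Generate summaries of each community and its members from the bottom up. This helps build a holistic understanding of the dataset.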
Figure 1: An LLM-generated knowledge graph built using GPT-4 Turbo.
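The first indexing step, splitting the corpus into text units, can be pictured with a small sketch. This is only an illustration under simplifying assumptions: it windows over whitespace-separated words, whereas a real pipeline would normally count model tokens (and unsegmented Chinese text would need a tokenizer). The `TextUnit` class, the function name, and the `size`/`overlap` parameters are hypothetical and are not GraphRAG APIs.

```python
from dataclasses import dataclass


@dataclass
class TextUnit:
    """One analyzable chunk of a source document (hypothetical structure)."""
    doc_id: str
    index: int
    text: str


def split_into_text_units(doc_id: str, text: str, size: int = 200, overlap: int = 50) -> list[TextUnit]:
    """Split text into overlapping windows of whitespace-separated words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each window advances
    units: list[TextUnit] = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        units.append(TextUnit(doc_id, len(units), " ".join(window)))
        if start + size >= len(words):
            break  # this window already reaches the end of the document
    return units


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for unit in split_into_text_units("doc-1", sample, size=4, overlap=1):
        print(unit.index, unit.text)
```

The overlap keeps an entity mention that straddles a chunk boundary visible in at least one unit, at the cost of processing some words twice.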
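Query Pipeline
- Local Search: reasons about a specific entity by fanning out to its neighbors and associated concepts.
- Global Search: reasons about holistic questions over the corpus by leveraging the community summaries.
- DRIFT Search: reasons about a specific entity by fanning out to its neighbors and associated concepts, with the added context of community information.
- Basic Search: standard RAG, for cases where a query is best answered by baseline RAG (standard top-k vector search).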
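As a rough picture of what "standard top-k vector search" means for Basic Search, the sketch below ranks text units by cosine similarity to a query vector. It assumes the embeddings have already been computed elsewhere; `cosine_similarity` and `top_k` are illustrative names rather than GraphRAG functions, and a real deployment would delegate this step to a vector store.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed_units: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_units]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy two-dimensional "embeddings" chosen by hand for illustration.
    units = [("unit a", [1.0, 0.0]), ("unit b", [0.0, 1.0]), ("unit c", [1.0, 1.0])]
    for score, text in top_k([1.0, 0.0], units, k=2):
        print(f"{score:.3f} {text}")
```

The retrieved texts would then be placed in the prompt as context. The other search modes differ mainly in what they retrieve (entity neighborhoods or community summaries) rather than in this ranking step.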
Quick Start
Install and Initialize the Configuration
# python >= 3.10
pip install graphrag
# Initialize the configuration
mkdir -p ./christmas/
cd christmas
graphrag init --root .
Project Directory Structure
.
./settings.yaml
./cache
./cache/summarize_descriptions
./cache/.DS_Store
./cache/extract_graph
./.env.qwen
./input
./input/三国演义.txt
./output
./output/text_units.parquet
./output/documents.parquet
./output/context.json
./output/stats.json
./logs
./logs/indexing-engine.log
./.env
./prompts
./prompts/extract_graph.txt
./prompts/summarize_descriptions.txt
./prompts/extract_claims.txt
./prompts/drift_search_system_prompt.txt
./prompts/local_search_system_prompt.txt
./prompts/community_report_graph.txt
./prompts/global_search_map_system_prompt.txt
./prompts/global_search_knowledge_system_prompt.txt
./prompts/basic_search_system_prompt.txt
./prompts/question_gen_system_prompt.txt
./prompts/drift_reduce_prompt.txt
./prompts/community_report_text.txt
./prompts/global_search_reduce_system_prompt.txt
Configuration
diff --git a/settings.yaml b/settings.yaml
index 861a551..871ca6c 100644
--- a/settings.yaml
+++ b/settings.yaml
@@ -10,11 +10,11 @@ models:
model_provider: openai
auth_type: api_key # or azure_managed_identity
api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file, or remove if managed identity
- model: gpt-4-turbo-preview
- # api_base: https://<instance>.openai.azure.com
- # api_version: 2024-05-01-preview
+ model: ${GRAPHRAG_MODEL}
+ api_base: ${GRAPHRAG_API_BASE}
+ api_version: ${GRAPHRAG_API_VERSION}
model_supports_json: true # recommended if this is available for your model.
- concurrent_requests: 25
+ concurrent_requests: 1
async_mode: threaded # or asyncio
retry_strategy: exponential_backoff
max_retries: 10
@@ -24,11 +24,11 @@ models:
type: embedding
model_provider: openai
auth_type: api_key
- api_key: ${GRAPHRAG_API_KEY}
- model: text-embedding-3-small
- # api_base: https://<instance>.openai.azure.com
- # api_version: 2024-05-01-preview
- concurrent_requests: 25
+ api_key: ${GRAPHRAG_EMBEDDING_API_KEY}
+ model: ${GRAPHRAG_EMBEDDING_MODEL}
+ api_base: ${GRAPHRAG_EMBEDDING_API_BASE}
+ api_version: ${GRAPHRAG_EMBEDDING_API_VERSION}
+ concurrent_requests: 1
async_mode: threaded # or asyncio
retry_strategy: exponential_backoff
max_retries: 10
@@ -116,7 +116,7 @@ umap:
enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)
snapshots:
- graphml: false
+ graphml: true
embeddings: false
### Query settings ###
.env Configuration Example
# Contents of .env
GRAPHRAG_API_BASE=http://127.0.0.1:8001/compatible-mode/v1
GRAPHRAG_API_KEY=${QWEN_TOKEN}
GRAPHRAG_MODEL=qwen-flash
GRAPHRAG_API_VERSION=qwen-flash
GRAPHRAG_EMBEDDING_API_BASE=http://127.0.0.1:8003/_hy/v1
GRAPHRAG_EMBEDDING_API_KEY=ceshiren.com
GRAPHRAG_EMBEDDING_MODEL=qwen3-embedding
GRAPHRAG_EMBEDDING_API_VERSION=qwen3-embedding
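To see how the `${NAME}` placeholders in settings.yaml relate to the values in .env, the following sketch parses simple `KEY=VALUE` lines and substitutes them into a template string. It is a simplified illustration, not GraphRAG's actual configuration loader: `parse_env` and `expand` are hypothetical helpers, quoting and `export` prefixes are not handled, and unknown names are deliberately left untouched.

```python
import re

# Matches placeholders of the form ${NAME}.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()
    return values


def expand(template: str, env: dict[str, str]) -> str:
    """Replace ${NAME} with env[NAME]; leave unknown names as they are."""
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), template)


if __name__ == "__main__":
    env = parse_env(
        "# Contents of .env\n"
        "GRAPHRAG_MODEL=qwen-flash\n"
        "GRAPHRAG_API_BASE=http://127.0.0.1:8001/compatible-mode/v1\n"
    )
    print(expand("model: ${GRAPHRAG_MODEL}", env))
    print(expand("api_base: ${GRAPHRAG_API_BASE}", env))
```

A value that itself contains a placeholder, such as `GRAPHRAG_API_KEY=${QWEN_TOKEN}`, stays unresolved in this sketch unless `QWEN_TOKEN` is also present in the parsed values and `expand` is applied a second time.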
Basic Microsoft GraphRAG Workflow
# Download a document
mkdir -p input
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o input/book.txt
# Build the index
graphrag index --root .
# Query (the question asks how many check-in methods can be chosen and requests a reply in Chinese)
graphrag query --root . --method local -q "选择打卡方式有几种, 使用中文回复"
graphrag query --root . --method global -q "选择打卡方式有几种, 使用中文回复"
graphrag query --root . --method drift -q "选择打卡方式有几种, 使用中文回复"
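When the same question is run through several search methods, it can help to build the command line programmatically so that a misspelled method name such as `glboal` fails early. The helper below is a hypothetical convenience wrapper, not part of GraphRAG; it uses only the `--root`, `--method`, and `-q` options shown in the commands above and restricts itself to the three methods used there.

```python
import shlex

# Methods taken from the commands in this section.
SEARCH_METHODS = ("local", "global", "drift")


def build_query_command(method: str, question: str, root: str = ".") -> list[str]:
    """Return the argument list for a `graphrag query` invocation."""
    if method not in SEARCH_METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {SEARCH_METHODS}")
    return ["graphrag", "query", "--root", root, "--method", method, "-q", question]


if __name__ == "__main__":
    question = "选择打卡方式有几种, 使用中文回复"
    for method in SEARCH_METHODS:
        # Print a shell-safe rendering of each command instead of running it.
        print(shlex.join(build_query_command(method, question)))
```

Passing the returned list to `subprocess.run(..., check=True)` would execute the query, provided the `graphrag` command is installed and the index has already been built.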
Prompt
{
"messages": [
{
"role": "user",
"content": "\n-Goal-\nGiven a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.\n \n-Steps-\n1. Identify all entities. For each identified entity, extract the following information:\n- entity_name: Name of the entity, capitalized\n- entity_type: One of the following types: [organization,person,geo,event]\n- entity_description: Comprehensive description of the entity's attributes and activities\nFormat each entity as (\"entity\"<|><entity_name><|><entity_type><|><entity_description>)\n \n2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.\nFor each pair of related entities, extract the following information:\n- source_entity: name of the source entity, as identified in step 1\n- target_entity: name of the target entity, as identified in step 1\n- relationship_description: explanation as to why you think the source entity and the target entity are related to each other\n- relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entity\n Format each relationship as (\"relationship\"<|><source_entity><|><target_entity><|><relationship_description><|><relationship_strength>)\n \n3. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use **##** as the list delimiter.\n \n4. When finished, output <|COMPLETE|>\n \n######################\n-Examples-\n######################\nExample 1:\nEntity_types: ORGANIZATION,PERSON\nText:\nThe Verdantis's Central Institution is scheduled to meet on Monday and Thursday, with the institution planning to release its latest policy decision on Thursday at 1:30 p.m. PDT, followed by a press conference where Central Institution Chair Martin Smith will take questions. 
Investors expect the Market Strategy Committee to hold its benchmark interest rate steady in a range of 3.5%-3.75%.\n######################\nOutput:\n(\"entity\"<|>CENTRAL INSTITUTION<|>ORGANIZATION<|>The Central Institution is the Federal Reserve of Verdantis, which is setting interest rates on Monday and Thursday)\n##\n(\"entity\"<|>MARTIN SMITH<|>PERSON<|>Martin Smith is the chair of the Central Institution)\n##\n(\"entity\"<|>MARKET STRATEGY COMMITTEE<|>ORGANIZATION<|>The Central Institution committee makes key decisions about interest rates and the growth of Verdantis's money supply)\n##\n(\"relationship\"<|>MARTIN SMITH<|>CENTRAL INSTITUTION<|>Martin Smith is the Chair of the Central Institution and will answer questions at a press conference<|>9)\n<|COMPLETE|>\n\n######################\nExample 2:\nEntity_types: ORGANIZATION\nText:\nTechGlobal's (TG) stock skyrocketed in its opening day on the Global Exchange Thursday. But IPO experts warn that the semiconductor corporation's debut on the public markets isn't indicative of how other newly listed companies may perform.\n\nTechGlobal, a formerly public company, was taken private by Vision Holdings in 2014. 
The well-established chip designer says it powers 85% of premium smartphones.\n######################\nOutput:\n(\"entity\"<|>TECHGLOBAL<|>ORGANIZATION<|>TechGlobal is a stock now listed on the Global Exchange which powers 85% of premium smartphones)\n##\n(\"entity\"<|>VISION HOLDINGS<|>ORGANIZATION<|>Vision Holdings is a firm that previously owned TechGlobal)\n##\n(\"relationship\"<|>TECHGLOBAL<|>VISION HOLDINGS<|>Vision Holdings formerly owned TechGlobal from 2014 until present<|>5)\n<|COMPLETE|>\n\n######################\nExample 3:\nEntity_types: ORGANIZATION,GEO,PERSON\nText:\nFive Aurelians jailed for 8 years in Firuzabad and widely regarded as hostages are on their way home to Aurelia.\n\nThe swap orchestrated by Quintara was finalized when $8bn of Firuzi funds were transferred to financial institutions in Krohaara, the capital of Quintara.\n\nThe exchange initiated in Firuzabad's capital, Tiruzia, led to the four men and one woman, who are also Firuzi nationals, boarding a chartered flight to Krohaara.\n\nThey were welcomed by senior Aurelian officials and are now on their way to Aurelia's capital, Cashion.\n\nThe Aurelians include 39-year-old businessman Samuel Namara, who has been held in Tiruzia's Alhamia Prison, as well as journalist Durke Bataglani, 59, and environmentalist Meggie Tazbah, 53, who also holds Bratinas nationality.\n######################\nOutput:\n(\"entity\"<|>FIRUZABAD<|>GEO<|>Firuzabad held Aurelians as hostages)\n##\n(\"entity\"<|>AURELIA<|>GEO<|>Country seeking to release hostages)\n##\n(\"entity\"<|>QUINTARA<|>GEO<|>Country that negotiated a swap of money in exchange for hostages)\n##\n##\n(\"entity\"<|>TIRUZIA<|>GEO<|>Capital of Firuzabad where the Aurelians were being held)\n##\n(\"entity\"<|>KROHAARA<|>GEO<|>Capital city in Quintara)\n##\n(\"entity\"<|>CASHION<|>GEO<|>Capital city in Aurelia)\n##\n(\"entity\"<|>SAMUEL NAMARA<|>PERSON<|>Aurelian who spent time in Tiruzia's Alhamia Prison)\n##\n(\"entity\"<|>ALHAMIA 
PRISON<|>GEO<|>Prison in Tiruzia)\n##\n(\"entity\"<|>DURKE BATAGLANI<|>PERSON<|>Aurelian journalist who was held hostage)\n##\n(\"entity\"<|>MEGGIE TAZBAH<|>PERSON<|>Bratinas national and environmentalist who was held hostage)\n##\n(\"relationship\"<|>FIRUZABAD<|>AURELIA<|>Firuzabad negotiated a hostage exchange with Aurelia<|>2)\n##\n(\"relationship\"<|>QUINTARA<|>AURELIA<|>Quintara brokered the hostage exchange between Firuzabad and Aurelia<|>2)\n##\n(\"relationship\"<|>QUINTARA<|>FIRUZABAD<|>Quintara brokered the hostage exchange between Firuzabad and Aurelia<|>2)\n##\n(\"relationship\"<|>SAMUEL NAMARA<|>ALHAMIA PRISON<|>Samuel Namara was a prisoner at Alhamia prison<|>8)\n##\n(\"relationship\"<|>SAMUEL NAMARA<|>MEGGIE TAZBAH<|>Samuel Namara and Meggie Tazbah were exchanged in the same hostage release<|>2)\n##\n(\"relationship\"<|>SAMUEL NAMARA<|>DURKE BATAGLANI<|>Samuel Namara and Durke Bataglani were exchanged in the same hostage release<|>2)\n##\n(\"relationship\"<|>MEGGIE TAZBAH<|>DURKE BATAGLANI<|>Meggie Tazbah and Durke Bataglani were exchanged in the same hostage release<|>2)\n##\n(\"relationship\"<|>SAMUEL NAMARA<|>FIRUZABAD<|>Samuel Namara was a hostage in Firuzabad<|>2)\n##\n(\"relationship\"<|>MEGGIE TAZBAH<|>FIRUZABAD<|>Meggie Tazbah was a hostage in Firuzabad<|>2)\n##\n(\"relationship\"<|>DURKE BATAGLANI<|>FIRUZABAD<|>Durke Bataglani was a hostage in Firuzabad<|>2)\n<|COMPLETE|>\n\n######################\n-Real Data-\n######################\nEntity_types: organization,person,geo,event\nText: �机有相应的人数上限,不能选择人数超限的考勤机作为打卡设备。\n5 / 设置打卡时间\n添加企业的上下班时间,你还可以在高级设置中设置允许迟到、早退的分钟数等,设计更人性化的打卡方式。\n\n6 / 关于外出打卡\n与上下班打卡不同,外出打卡是一种特殊的打卡方式,它适合出差外勤等无法在办公区域打卡的成员。配置外出打卡规则后,外出人员只需要在企业微信手机客户端中上传位置定位和拍照来作为打卡依据。为保证打卡的真实性,管理员可设置添加的照片只能通过拍照来提交,防止作弊。\n\n设置特殊打卡规则\n为适应不同企业和员工的个性需求,你可以在管理后台对添加的打卡规则进行特殊设置。如设置拍照打卡、补卡申请,添加特殊日期打卡等。\n\n设置入口:【管理后台】>【企业应用】>【打卡】>【上下班打卡】>【设置】查看\n\n1 / 
拍照打卡\n为防止员工打卡作弊,可设置打卡时必须拍照。\n设置方法:在添加规则页面底部勾选“员工打卡必须拍照”。\n设置异常打卡时提交的备注图片只能拍照上传。\n设置方法:在添加规则页面底部勾选“备注不允许上传本地图片,只能拍照”。\n2 / 补卡申请\n员工因为特殊情况迟到/早退时,可提交补卡申请,审批通过后可智能校准打卡状态。\n设置方法:在添加打卡规则的页面底部勾选“员工异常打卡时可提交申请,审批通过后修正异常”。设置后,【补卡申请】将会出现在【审批】应用中,员工提交补卡申请,上级审批通过后,会自动将异常打卡状态校正为正常。\n\n3 / 特殊日期打卡\n除常规打卡时间,你可以根据公司的特殊放假安排(如公司周年庆、年会、集体团建等),设置必须打卡或者不用打卡的日期,人性化打卡方式更受员工的欢迎。\n设置方法:在添加打卡规则页面,点击特殊日期的添加按钮。选择需要打卡的日期、时段和打卡事由。如需添加多个时间,可新增时段。设置不用打卡日期的方法同上。\n如果公司在非工作日有加班需求,也可设置非工作日允许打卡,员工可自由签到,记录工作时长。\n设置方法:在添加打卡规则页面,勾选“非工作日允许打卡”。\n4 / 设置打卡提醒\n在新打卡方式的普及期间,有些员工可能会忘记打卡。为避免这种情况,你可以设置打卡提醒,在打卡时间准点或提前提醒员工打卡。\n设置方法:在【添加打卡规则页面】>【提醒】的下拉列表中,选择提醒时间。届时员工的企业微信将在指定时间收到打卡提醒。\n\n5 / 设置打卡应用的可见范围\n可见范围是指在手机中可以看到打卡应用并使用该应用的人群范围。你可以在打卡应用首页进行配置。点击【修改】,从通讯录列表中选择部门和成员即可。系统默认打卡应用为所有人可见。\n\n查看/导出打卡记录\n设置入口:【管理后台】>【企业应用】>【打卡】>【上下班打卡】>【查看】查看\n\n在按日统计和按月统计Tab右上角,导出Excel打卡数据。\n在顶部工具栏根据时间、部门、打卡人员、打卡状态筛选打卡记录。\n打卡如何与审批关联\n员工的某\n######################\nOutput:"
}
],
"model": "qwen3",
"frequency_penalty": 0.0,
"n": 1,
"presence_penalty": 0.0,
"temperature": 0.0,
"top_p": 1.0
}
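The prompt above asks the model to emit records separated by `##`, with fields separated by `<|>`, and to finish with `<|COMPLETE|>`. A minimal sketch of parsing such output into entities and relationships follows. The function name and the dictionary keys are illustrative assumptions, and GraphRAG's own parser may behave differently, for example in how it treats malformed records.

```python
RECORD_DELIMITER = "##"
FIELD_DELIMITER = "<|>"
COMPLETION_MARKER = "<|COMPLETE|>"


def parse_extraction(output: str):
    """Parse model output into (entities, relationships) lists of dicts."""
    entities, relationships = [], []
    body = output.replace(COMPLETION_MARKER, "")
    for record in body.split(RECORD_DELIMITER):
        record = record.strip()
        if not (record.startswith("(") and record.endswith(")")):
            continue  # skip empty or malformed records
        fields = [f.strip().strip('"') for f in record[1:-1].split(FIELD_DELIMITER)]
        if fields[0] == "entity" and len(fields) == 4:
            entities.append({"name": fields[1], "type": fields[2], "description": fields[3]})
        elif fields[0] == "relationship" and len(fields) == 5:
            try:
                strength = float(fields[4])
            except ValueError:
                continue  # strength was not numeric; drop the record
            relationships.append({
                "source": fields[1],
                "target": fields[2],
                "description": fields[3],
                "strength": strength,
            })
    return entities, relationships


if __name__ == "__main__":
    sample = (
        '("entity"<|>MARTIN SMITH<|>PERSON<|>Martin Smith is the chair of the Central Institution)\n'
        "##\n"
        '("relationship"<|>MARTIN SMITH<|>CENTRAL INSTITUTION<|>'
        "Martin Smith is the Chair of the Central Institution<|>9)\n"
        "<|COMPLETE|>"
    )
    ents, rels = parse_extraction(sample)
    print(ents)
    print(rels)
```

Empty records, such as the doubled `##` in Example 3 of the prompt, are skipped by the parenthesis check.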
Graph Visualization Tools
- Gephi + Leiden Algorithm Plugin
- Neo4j import/export (official site)
- Neo4j GraphRAG

Visualization with graphrag-visualizer
