Building an AI-Powered Competitive Intelligence Monitoring System
Published: January 15, 2026, 15:21 GMT+8
7 min read
Source: Dev.to
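Competitive Intelligence Monitoring
Staying ahead of competitors requires constant vigilance: tracking product launches, funding rounds, partnerships, and strategic moves across the web. The open-source Competitive Intelligence Monitor project shows how to automate this process with CocoIndex, Tavily Search, and LLM extraction, continuously tracking competitor news and storing it in structured form in a queryable PostgreSQL database.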
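How It Works
The system automates web monitoring with three components:
- Tavily AI Search – fetches full-text articles.
- LLM extraction (GPT-4o-mini) – detects structured "competitive events".
- PostgreSQL – stores events and source articles for query-based intelligence analysis.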
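Extracted Event Types
- Product launches and feature releases
- Partnerships and collaborations
- Funding rounds and financial news
- Key executive hires / departures
- Acquisitions and mergers
These events and their source articles are stored in PostgreSQL, so teams can ask natural-language questions such as:
- "What has Anthropic been doing recently?"
- "Which competitors had the most news this week?"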
Architecture Diagram
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Tavily AI │────▶│ CocoIndex │────▶│ PostgreSQL │
│ Search │ │ Pipeline │ │ Database │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
▼ ▼ ▼
Articles Extraction Intelligence
(web data) (GPT‑4o‑mini) (structured)
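Data flows from Tavily search results into the LLM extraction step, which produces CompetitiveEvent objects. These are then written to two tables: one holding raw articles and the other holding normalized events.
Data Model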
import dataclasses


@dataclasses.dataclass
class CompetitiveEvent:
    """A competitive intelligence event extracted from text.

    Examples:
    - Product Launch: "OpenAI released GPT-5 with multimodal capabilities"
    - Partnership: "Anthropic partnered with Google Cloud for enterprise AI"
    - Funding: "Mistral AI raised $400M Series B led by Andreessen Horowitz"
    - Key Hire: "Former Meta AI director joined Cohere as Chief Scientist"
    - Strategic Move: "Microsoft acquired AI startup Inflection for $650M"
    """

    event_type: str  # "product_launch", "partnership", "funding",
                     # "key_hire", "acquisition", "other"
    competitor: str  # Company name (e.g., "OpenAI")
    description: str  # Brief description of the event
    significance: str  # "high", "medium", "low"
    related_companies: list[str]  # Other companies mentioned
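LLM output does not always match the label sets documented in the comments above, so it can help to normalize events before storing them. The sketch below is illustrative only: `normalize_event`, `EVENT_TYPES`, and `SIGNIFICANCE_LEVELS` are hypothetical names, and falling back to "other" and "low" for unknown labels is an assumption rather than the project's behavior.

```python
import dataclasses

# Label sets copied from the field comments in the data model.
EVENT_TYPES = {"product_launch", "partnership", "funding",
               "key_hire", "acquisition", "other"}
SIGNIFICANCE_LEVELS = {"high", "medium", "low"}


@dataclasses.dataclass
class CompetitiveEvent:
    event_type: str
    competitor: str
    description: str
    significance: str
    related_companies: list[str]


def normalize_event(event: CompetitiveEvent) -> CompetitiveEvent:
    """Return a copy whose labels are coerced to the documented values."""
    event_type = event.event_type.strip().lower().replace(" ", "_")
    if event_type not in EVENT_TYPES:
        event_type = "other"  # assumption: unknown types collapse to "other"
    significance = event.significance.strip().lower()
    if significance not in SIGNIFICANCE_LEVELS:
        significance = "low"  # assumption: unknown significance is treated as low
    return dataclasses.replace(
        event,
        event_type=event_type,
        significance=significance,
        competitor=event.competitor.strip(),
    )
```

Normalizing at write time keeps later filters such as `event_type = 'funding'` simple, because the stored values can be compared exactly.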
Tavily Search Source Connector
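from collections.abc import AsyncIterator

# Import path assumed: CocoIndex's custom-source helpers in cocoindex.op.
from cocoindex.op import (
    PartialSourceRow,
    PartialSourceRowData,
    SourceSpec,
    source_connector,
)
from tavily import TavilyClient


class TavilySearchSource(SourceSpec):
    """Fetches competitive intelligence using Tavily AI Search API."""

    competitor: str
    days_back: int = 7
    max_results: int = 10


@source_connector(
    spec_cls=TavilySearchSource,
    key_type=_ArticleKey,
    value_type=_Article,
)
class TavilySearchConnector:
    # _ArticleKey, _Article, self._spec, and self._api_key are defined in
    # parts of the project that this excerpt does not show.
    async def list(self) -> AsyncIterator[PartialSourceRow[_ArticleKey, _Article]]:
        """List articles from Tavily search."""
        search_query = (
            f"{self._spec.competitor} AND "
            f"(funding OR partnership OR product launch OR acquisition OR executive hire)"
        )
        client = TavilyClient(api_key=self._api_key)
        response = client.search(
            query=search_query,
            search_depth="advanced",
            max_results=self._spec.max_results,
            include_raw_content=True,
        )
        # Each result URL becomes the row key; the ordinal is its position in the results.
        for ordinal, result in enumerate(response.get("results", [])):
            url = result["url"]
            yield PartialSourceRow(
                key=_ArticleKey(url=url),
                data=PartialSourceRowData(ordinal=ordinal),
            )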
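The query string and the URL-keyed rows in the connector can be exercised without calling Tavily. The following is a minimal sketch of that logic as plain functions. `build_search_query`, `dedupe_by_url`, and `DEFAULT_TOPICS` are hypothetical helpers introduced here for illustration; the only assumption about the search response is that each result is a dict with a "url" key, as in the connector above.

```python
# Topic terms taken from the query string used in the connector.
DEFAULT_TOPICS = (
    "funding",
    "partnership",
    "product launch",
    "acquisition",
    "executive hire",
)


def build_search_query(competitor: str, topics: tuple[str, ...] = DEFAULT_TOPICS) -> str:
    """Combine a competitor name with OR-joined topic terms."""
    return f"{competitor.strip()} AND ({' OR '.join(topics)})"


def dedupe_by_url(results: list[dict]) -> list[dict]:
    """Keep the first result per URL, preserving the original order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for result in results:
        url = result.get("url")
        if not url or url in seen:
            continue  # skip results with no URL or a URL already emitted
        seen.add(url)
        unique.append(result)
    return unique
```

Because the URL is used as the row key, dropping repeated URLs before yielding rows avoids emitting the same key twice within a single listing.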
Main Pipeline (CocoIndex Flow)
import os
from datetime import timedelta

import cocoindex


@cocoindex.flow_def(name="CompetitiveIntelligence")
def competitive_intelligence_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
) -> None:
    """Main pipeline for competitive intelligence monitoring."""
    competitors = os.getenv("COMPETITORS", "OpenAI,Anthropic").split(",")
    refresh_interval = int(os.getenv("REFRESH_INTERVAL_SECONDS", "3600"))

    # Add Tavily search source for each competitor
    for competitor in competitors:
        data_scope[f"articles_{competitor.strip()}"] = flow_builder.add_source(
            TavilySearchSource(
                competitor=competitor.strip(),
                days_back=7,
                max_results=10,
            ),
            refresh_interval=timedelta(seconds=refresh_interval),
        )

    articles_index = data_scope.add_collector()
    events_index = data_scope.add_collector()

    # Process each competitor's articles
    for competitor in competitors:
        articles = data_scope[f"articles_{competitor.strip()}"]

        with articles.row() as article:
            # Extract competitive events using GPT-4o-mini via OpenRouter
            article["events"] = article["content"].transform(
                cocoindex.functions.ExtractByLlm(
                    llm_spec=cocoindex.LlmSpec(
                        api_type=cocoindex.LlmApiType.OPENAI,
                        model="openai/gpt-4o-mini",
                        address="https://openrouter.ai/api/v1",
                    ),
                    output_type=list[CompetitiveEvent],
                    instruction=(
                        "Extract competitive intelligence events from this article. "
                        "Focus on: product launches, partnerships, funding rounds, "
                        "key hires, acquisitions, and other strategic moves."
                    ),
                )
            )
            # Collecting into articles_index / events_index and exporting to
            # PostgreSQL are not shown in this excerpt.
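The flow derives one source field per competitor from the COMPETITORS variable, so names such as "Google AI" end up inside the field name with a space. A hedged sketch of one way to derive cleaner, de-duplicated field names follows; `source_field_names` is a hypothetical helper, and whether CocoIndex restricts the characters allowed in field names is not something this excerpt establishes.

```python
import re


def source_field_names(raw_competitors: str) -> dict[str, str]:
    """Map a slugged field name to each distinct competitor in a comma-separated list."""
    fields: dict[str, str] = {}
    for part in raw_competitors.split(","):
        name = part.strip()
        if not name:
            continue  # ignore empty entries such as a trailing comma
        # Replace runs of non-alphanumeric characters with a single underscore.
        slug = re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").lower()
        # setdefault keeps the first spelling if two names share a slug.
        fields.setdefault(f"articles_{slug}", name)
    return fields
```

For the sample configuration value `OpenAI,Anthropic,Google AI`, this yields the keys `articles_openai`, `articles_anthropic`, and `articles_google_ai`, each mapped back to the display name passed to the search source.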
Query Handler
@competitive_intelligence_flow.query_handler()
def search_events(query: str) -> list[dict]:
    """
    Execute a natural-language query against the PostgreSQL store.

    Example queries:
    - "What product launches did Anthropic announce this month?"
    - "List all funding rounds for competitors in the last week."
    """
    # Implementation omitted for brevity; it uses SQL generation from the LLM.
    pass
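Summary
- Tavily AI fetches the latest web articles about the target competitors.
- CocoIndex orchestrates the pipeline and runs GPT-4o-mini extraction to produce structured CompetitiveEvent records.
- All raw articles and normalized events are persisted in PostgreSQL, supporting natural-language queries and dashboards for continuous competitive intelligence.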
def ch_by_competitor(
    competitor: str,
    event_type: str | None = None,
    limit: int = 20,
) -> cocoindex.QueryOutput:
    """Find recent competitive intelligence about a specific competitor."""
    with connection_pool().connection() as conn:
        with conn.cursor() as cur:
            # Table names are interpolated; user-supplied values go through %s placeholders.
            sql = f"""
                SELECT e.competitor,
                       e.event_type,
                       e.description,
                       e.significance,
                       e.related_companies,
                       a.title,
                       a.url,
                       a.source,
                       a.published_at
                FROM {events_table} e
                JOIN {articles_table} a ON e.article_id = a.id
                WHERE LOWER(e.competitor) LIKE LOWER(%s)
            """
            params: list[object] = [f"%{competitor}%"]
            if event_type:
                sql += " AND e.event_type = %s"
                params.append(event_type)
            sql += " ORDER BY a.published_at DESC LIMIT %s"
            params.append(limit)  # bind the LIMIT placeholder added on the line above
            cur.execute(sql, params)
            return cocoindex.QueryOutput(results=[...])
Configuration (Environment Variables)
DATABASE_URL=postgresql://user:password@localhost:5432/competitive_intel
COCOINDEX_DATABASE_URL=postgresql://user:password@localhost:5432/competitive_intel
OPENAI_API_KEY=sk-or-v1-...
TAVILY_API_KEY=tvly-...
COMPETITORS=OpenAI,Anthropic,Google AI,Meta AI,Mistral AI
REFRESH_INTERVAL_SECONDS=3600
SEARCH_DAYS_BACK=7
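These variables can be read into a typed object at startup so that missing or malformed values fail early. The following is a minimal sketch, not the project's loader: `MonitorConfig` and `load_config` are hypothetical names, the defaults mirror the values shown in this article, and treating only DATABASE_URL and TAVILY_API_KEY as required is an assumption.

```python
import dataclasses
from collections.abc import Mapping


@dataclasses.dataclass(frozen=True)
class MonitorConfig:
    database_url: str
    tavily_api_key: str
    competitors: tuple[str, ...]
    refresh_interval_seconds: int
    search_days_back: int


def load_config(env: Mapping[str, str]) -> MonitorConfig:
    """Build a MonitorConfig from an environment-like mapping."""
    required = ("DATABASE_URL", "TAVILY_API_KEY")  # assumption: these two are mandatory
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    competitors = tuple(
        name.strip()
        for name in env.get("COMPETITORS", "OpenAI,Anthropic").split(",")
        if name.strip()
    )
    return MonitorConfig(
        database_url=env["DATABASE_URL"],
        tavily_api_key=env["TAVILY_API_KEY"],
        competitors=competitors,
        refresh_interval_seconds=int(env.get("REFRESH_INTERVAL_SECONDS", "3600")),
        search_days_back=int(env.get("SEARCH_DAYS_BACK", "7")),
    )
```

In an application this would typically be called as `load_config(os.environ)`; accepting any mapping keeps the function easy to test with a plain dict.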
First-Time Setup
python3 run_interactive.py
Automated Deployment (CocoIndex)
cocoindex update main -f # Initial sync
cocoindex update -L main.py # Continuous monitoring
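How Monitoring Works
- AI-native search – Tavily extracts article content, avoiding brittle scraping.
- De-duplication – CocoIndex tracks already-processed articles through incremental processing.
- Signal extraction – structured events are scored for significance.
- Flexible analysis – dual indexing (raw + extracted) lets queries use either the original articles or the extracted events.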
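The de-duplication idea can be illustrated conceptually: remember a fingerprint per article URL and skip work when it has not changed. This sketch is not CocoIndex's implementation, which is internal to the framework and not shown in this article. `ProcessedTracker` is a hypothetical class, and SHA-256 over the article text is an assumed choice of fingerprint.

```python
import hashlib


class ProcessedTracker:
    """Remembers a content fingerprint per URL to skip unchanged articles."""

    def __init__(self) -> None:
        self._fingerprints: dict[str, str] = {}

    def needs_processing(self, url: str, content: str) -> bool:
        """Return True for new or changed articles and record their fingerprint."""
        fingerprint = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self._fingerprints.get(url) == fingerprint:
            return False  # same URL, same content: nothing to redo
        self._fingerprints[url] = fingerprint
        return True
```

Skipping unchanged articles matters here because each processed article triggers an LLM extraction call, which is the most expensive step in the pipeline.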
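Supported Query Types
- Search by competitor name
- Filter by event type (funding, partnership, acquisition, etc.)
- Rank by significance (weighted scoring: high = 3, medium = 2, low = 1)
- Trend analysis across time periods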
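The weighted significance ranking can be sketched in a few lines. The weights (high = 3, medium = 2, low = 1) come from the list above; `rank_competitors` is a hypothetical helper, and summing weights per competitor with an alphabetical tie-break is an assumption about how the ranking is aggregated.

```python
from collections import defaultdict

# Weights as stated in the supported query types.
SIGNIFICANCE_WEIGHTS = {"high": 3, "medium": 2, "low": 1}


def rank_competitors(events: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Sum significance weights per competitor and sort by descending score.

    Each event is a (competitor, significance) pair. Unknown significance
    labels contribute 0. Ties are broken alphabetically by competitor name.
    """
    scores: dict[str, int] = defaultdict(int)
    for competitor, significance in events:
        scores[competitor] += SIGNIFICANCE_WEIGHTS.get(significance, 0)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


events = [
    ("OpenAI", "high"),
    ("Anthropic", "medium"),
    ("OpenAI", "low"),
    ("Anthropic", "medium"),
    ("Mistral AI", "high"),
]
ranking = rank_competitors(events)
# ranking == [("Anthropic", 4), ("OpenAI", 4), ("Mistral AI", 3)]
```

The same aggregation could be expressed in SQL with a CASE expression over the significance column; the Python form is shown only because it is easy to test in isolation.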
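Project Overview
Competitive Intelligence Monitoring
Track competitor mentions across the web using AI-powered search and LLM extraction. The pipeline automatically:
- Searches the web using Tavily AI (optimized for agents)
- Extracts competitive intelligence events using DeepSeek LLM analysis
- Indexes raw articles and extracted events together in PostgreSQL
Captured Event Types
- Product launches and feature updates
- Partnerships and collaborations
- Funding rounds and financial news
- Key executive hires / departures
- Acquisitions and mergers
Example Queries
- "What has OpenAI been doing recently?"
- "Which competitors have the most news?"
- "Find all partnership announcements"
- "What are the most significant competitive moves this week?"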
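Prerequisites
- PostgreSQL (local installation or cloud service)
- Python 3.11+ (required by CocoIndex)
- API keys (required):
  - Tavily API key (free tier: 1,000 requests/day)
  - OpenAI / OpenRouter API key (for LLM extraction)
Built With
- CocoIndex – modern data pipeline framework
- Tavily AI Search – AI-native search engine
- OpenRouter – multi-model API gateway
Contributing
Have questions or want to contribute? Leave a comment below or open an issue on GitHub!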
License: MIT
Repository: