How to Fetch Real-Time News in Python (3 Practical Examples)

Published: 1 month ago (December 31, 2025, 08:34 GMT+8)

9 min read

Original: Dev.to


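Access to real-time news data matters for many projects, including data analysis, sentiment analysis, and recommendation systems. This article shows three common ways to fetch real-time news in Python, with complete code for each so you can get started quickly.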

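Example 1: Using NewsAPI

NewsAPI is a popular news aggregation service with a simple RESTful interface that can filter news by keyword, source, language, and more.

Steps

Sign up on the NewsAPI website and get an API key.
Install requests (if you have not already):

pip install requests

Use the following code to fetch the latest top headlines: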

import requests

API_KEY = "YOUR_NEWSAPI_KEY"
BASE_URL = "https://newsapi.org/v2/top-headlines"

params = {
    "country": "us",      # Target country code; change as needed
    "category": "technology",
    "apiKey": API_KEY
}

response = requests.get(BASE_URL, params=params, timeout=10)
response.raise_for_status()  # Raise on HTTP error responses
data = response.json()

# Print the first 5 headlines
for article in data.get("articles", [])[:5]:
    print(article["title"])

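Notes

The country parameter selects the country whose headlines are returned (for example us, gb, cn).
category accepts business, entertainment, general, health, science, sports, and technology.
The returned JSON contains an articles list; each article includes fields such as title, description, url, and publishedAt.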
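
Because individual articles may lack some of these fields, it can help to read them defensively. The following is a minimal sketch, not NewsAPI's own client code: `SAMPLE_RESPONSE` is a hand-written stand-in with invented values, and `summarize_articles` is a hypothetical helper name.

```python
from typing import Any, Dict, List

# Hand-written stand-in for a parsed top-headlines response; the values are
# invented for illustration and are not real API output.
SAMPLE_RESPONSE: Dict[str, Any] = {
    "status": "ok",
    "articles": [
        {
            "title": "Example headline",
            "description": "Short summary of the story.",
            "url": "https://example.com/story-1",
            "publishedAt": "2025-01-01T08:00:00Z",
        },
        # Some entries may omit fields, so the reader has to cope with gaps.
        {"title": "Headline without a summary", "url": "https://example.com/story-2"},
    ],
}


def summarize_articles(payload: Dict[str, Any], limit: int = 5) -> List[Dict[str, str]]:
    """Return up to `limit` articles, keeping only the four fields named above."""
    rows = []
    for article in payload.get("articles", [])[:limit]:
        rows.append({
            "title": article.get("title") or "(untitled)",
            "description": article.get("description") or "",
            "url": article.get("url") or "",
            "published_at": article.get("publishedAt") or "",
        })
    return rows


for row in summarize_articles(SAMPLE_RESPONSE):
    print(f"{row['published_at'] or 'unknown date'}  {row['title']}")
```

With a live response, `data` from the request would take the place of `SAMPLE_RESPONSE`.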

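Example 2: Using an RSS Feed (`feedparser`)

Many news sites still publish RSS feeds, and feedparser makes it easy to parse these XML subscription sources.

Steps

Install feedparser:

pip install feedparser

Read and parse an RSS feed (BBC Technology in this example):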

import feedparser

RSS_URL = "http://feeds.bbci.co.uk/news/technology/rss.xml"
feed = feedparser.parse(RSS_URL)

# Print the title and link of the 5 most recent items
for entry in feed.entries[:5]:
    print(f"Title: {entry.title}")
    print(f"Link: {entry.link}\n")

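Notes

feed.entries is a list containing every item in the feed.
Each entry object usually has attributes such as title, link, published, and summary.
RSS suits tracking real-time updates from a specific site, and it needs no API key.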
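
Since entries only usually carry all four attributes, a small conversion step can guard against missing ones. This is a sketch under stated assumptions: `entries_to_records` is a hypothetical helper, and `SimpleNamespace` objects stand in for real feed entries so the code runs without a network connection or feedparser installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List

FIELDS = ("title", "link", "published", "summary")


def entries_to_records(entries: Iterable[Any]) -> List[Dict[str, str]]:
    """Copy the common attributes of each entry, using "" for any that are absent."""
    return [
        {name: getattr(entry, name, "") for name in FIELDS}
        for entry in entries
    ]


# Stand-in entries with invented values; with a live feed you would pass
# feed.entries instead.
fake_entries = [
    SimpleNamespace(title="First story", link="https://example.com/1",
                    published="Wed, 01 Jan 2025 08:00:00 GMT", summary="Summary text"),
    SimpleNamespace(title="Second story", link="https://example.com/2"),  # no date or summary
]

for record in entries_to_records(fake_entries):
    print(record["title"], "|", record["published"] or "no date")
```

Plain dictionaries are also easier to store or serialize than the parser's own objects.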

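Example 3: Using a Web Scraper (`requests` + `BeautifulSoup`)

When a news source offers neither a public API nor an RSS feed, you can scrape the page directly. The following shows how to scrape the latest headlines from The Verge.

Steps

Install the dependencies:

pip install requests beautifulsoup4

Write the scraper: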

import requests
from bs4 import BeautifulSoup

URL = "https://www.theverge.com/tech"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/92.0.4515.131 Safari/537.36"
}

response = requests.get(URL, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, "html.parser")

# Extract headlines based on the page structure (<h2> tags in this example)
articles = soup.find_all("h2", class_="c-entry-box--compact__title")
for article in articles[:5]:
    title = article.get_text(strip=True)
    anchor = article.find("a")  # May be None if the heading has no link
    link = anchor["href"] if anchor and anchor.has_attr("href") else "(no link)"
    print(f"Title: {title}")
    print(f"Link: {link}\n")

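Notes

To reduce the chance of being blocked by the target site, include a User-Agent in the request headers.
The CSS selector (such as h2.c-entry-box--compact__title) must be adjusted to match the target site's HTML structure.
Keep the crawl rate low: add time.sleep() between requests and follow the site's robots.txt rules.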
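
The robots.txt and rate-limiting advice can be sketched with the standard library's `urllib.robotparser`. The robots.txt text, the `my-news-bot` user agent, and the `allowed_urls` helper below are all hypothetical; the rules are parsed from an inline string so the example runs offline.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "my-news-bot"  # hypothetical bot name

# Inline robots.txt text so the sketch runs offline. For a real site you would
# call parser.set_url(...) with the site's robots.txt URL, then parser.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed_urls(urls, delay_seconds=None):
    """Yield only the URLs robots.txt permits, pausing between them."""
    if delay_seconds is None:
        # Fall back to 1 second when the file declares no Crawl-delay.
        delay_seconds = parser.crawl_delay(USER_AGENT) or 1
    yielded_any = False
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # Skip paths the site asks crawlers to avoid
        if yielded_any:
            time.sleep(delay_seconds)
        yielded_any = True
        yield url  # The caller would issue its HTTP request here


candidates = ["https://example.com/news", "https://example.com/private/draft"]
print(list(allowed_urls(candidates, delay_seconds=0)))
```

Passing `delay_seconds=0` only keeps the demonstration fast; a real crawler would leave the default in place.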

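Summary

NewsAPI: suited to pulling news from many sources and languages through one interface; simple to use but subject to quota limits.
RSS feed: lightweight and needs no authentication; suited to following real-time updates from specific sites.
Web scraper: the most flexible option and works for sites without an API, but you must deal with anti-scraping measures and comply with applicable laws.

Pick whichever fits your project. If you need further processing (such as keyword filtering or sentiment analysis), you can apply nltk, spaCy, or transformers after fetching the news. Have fun, and happy news hunting!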
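
Keyword filtering, the simplest of these follow-up steps, needs no NLP library at all. The sketch below uses invented sample data and a hypothetical `filter_by_keywords` helper; it assumes each article is a dictionary with `title` and `description` keys.

```python
from typing import Dict, Iterable, List


def filter_by_keywords(articles: Iterable[Dict[str, str]],
                       keywords: Iterable[str]) -> List[Dict[str, str]]:
    """Keep articles whose title or description contains any keyword (case-insensitive)."""
    wanted = [keyword.lower() for keyword in keywords]
    matches = []
    for article in articles:
        text = f"{article.get('title') or ''} {article.get('description') or ''}".lower()
        if any(keyword in text for keyword in wanted):
            matches.append(article)
    return matches


# Invented sample data; real input would come from any of the three methods.
sample = [
    {"title": "Chip maker unveils new GPU", "description": "Aimed at data centers."},
    {"title": "Local team wins final", "description": "A late goal decided it."},
    {"title": "Startup funding slows", "description": "Investors turn to GPU makers."},
]

for article in filter_by_keywords(sample, ["gpu"]):
    print(article["title"])
```

This is plain substring matching, so a short keyword such as "ai" would also match "said"; word-boundary matching or a tokenizer would be the next refinement.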


Prerequisites

Python 3.7 or later
The requests library
```
pip install requests
```
A free API key from NewsMesh (a free tier is available)

Example 1: Fetch Trending News

import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.newsmesh.co/v1"

def get_trending_news(limit: int = 10):
    """Fetch currently trending news articles."""
    response = requests.get(
        f"{BASE_URL}/trending",
        params={"apiKey": API_KEY, "limit": limit}
    )
    response.raise_for_status()
    return response.json()["data"]

# Usage
for article in get_trending_news():
    print(f"📰 {article['title']}")
    print(f"   Source: {article['source']} | Category: {article['category']}")
    print(f"   Link: {article['link']}\n")

Sample output

📰 Fed Signals Rate Cut in Early 2025
   Source: Reuters | Category: business
   Link: https://reuters.com/...

📰 OpenAI Announces GPT‑5 Preview
   Source: TechCrunch | Category: technology
   Link: https://techcrunch.com/...

Example 2: Filter News by Category and Country

from typing import Optional  # Keeps the hints valid on Python 3.7-3.9

def get_latest_news(category: Optional[str] = None,
                    country: Optional[str] = None,
                    limit: int = 10):
    """
    Fetch latest news with optional filters.

    Args:
        category: politics, technology, business, health, entertainment,
                  sports, science, lifestyle, environment, world
        country: ISO code like 'us', 'gb', 'de', 'fr', etc.
        limit: max 25 articles per request
    """
    params = {"apiKey": API_KEY, "limit": limit}
    if category:
        params["category"] = category
    if country:
        params["country"] = country

    response = requests.get(f"{BASE_URL}/latest", params=params)
    response.raise_for_status()
    return response.json()["data"]

# Technology news from the US
tech_news = get_latest_news(category="technology", country="us")

# Sports news from the UK
uk_sports = get_latest_news(category="sports", country="gb")

# All business news (max 25)
business = get_latest_news(category="business", limit=25)

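Available categories

Category	Description
`politics`	Political news
`technology`	Tech, startups, gadgets
`business`	Markets, finance, economy
`health`	Medicine, wellness
`entertainment`	Celebrities, movies, music
`sports`	All sports coverage
`science`	Scientific discoveries
`lifestyle`	Culture, trends
`environment`	Climate, sustainability
`world`	International news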
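
Because the category values form a fixed set, they can be checked locally before a request is sent. The sketch below is one possible approach, not part of the NewsMesh API: `build_latest_params` is a hypothetical helper, and the lower bound of 1 on `limit` is an assumption (only the maximum of 25 appears in the `get_latest_news` docstring).

```python
from typing import Dict, Optional, Union

# Category values copied from the table above.
VALID_CATEGORIES = frozenset({
    "politics", "technology", "business", "health", "entertainment",
    "sports", "science", "lifestyle", "environment", "world",
})
MAX_LIMIT = 25  # Per-request maximum stated in the get_latest_news docstring


def build_latest_params(api_key: str,
                        category: Optional[str] = None,
                        country: Optional[str] = None,
                        limit: int = 10) -> Dict[str, Union[str, int]]:
    """Build query parameters, rejecting bad input before any request is made."""
    if category is not None and category not in VALID_CATEGORIES:
        raise ValueError(
            f"Unknown category {category!r}; expected one of {sorted(VALID_CATEGORIES)}"
        )
    if not 1 <= limit <= MAX_LIMIT:  # Lower bound of 1 is an assumption
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}, got {limit}")
    params: Dict[str, Union[str, int]] = {"apiKey": api_key, "limit": limit}
    if category:
        params["category"] = category
    if country:
        params["country"] = country
    return params


print(build_latest_params("your_api_key_here", category="sports", country="gb"))
```

Failing locally on a typo such as `"sport"` avoids spending a request on a call that cannot succeed.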

Supported countries

More than 35 countries are supported, including us, gb, ca, au, de, fr, jp, in, and br.

Example 3: Search News by Keyword

from typing import Optional  # Keeps the hints valid on Python 3.7-3.9

def search_news(query: str,
                from_date: Optional[str] = None,
                to_date: Optional[str] = None,
                sort_by: str = "date_descending"):
    """
    Search news articles by keyword.

    Args:
        query: Search term (e.g., "bitcoin", "climate change")
        from_date: Start date as YYYY-MM-DD
        to_date: End date as YYYY-MM-DD
        sort_by: 'date_descending', 'date_ascending', or 'relevant'
    """
    params = {
        "apiKey": API_KEY,
        "q": query,
        "sortBy": sort_by,
        "limit": 25
    }
    if from_date:
        params["from"] = from_date
    if to_date:
        params["to"] = to_date

    response = requests.get(f"{BASE_URL}/search", params=params)
    response.raise_for_status()
    return response.json()["data"]

# Search for Bitcoin news from the last week
bitcoin_news = search_news(
    query="bitcoin",
    from_date="2024-12-23",
    sort_by="relevant"
)

for article in bitcoin_news[:5]:
    print(f"• {article['title']}")
    print(f"  {article['published_date'][:10]} - {article['source']}\n")
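
The hard-coded `from_date` above goes stale as soon as the week passes. A small helper can compute the range instead. This is a sketch: `last_n_days` is a hypothetical name, and the optional `today` argument exists only to make the example reproducible.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def last_n_days(days: int, today: Optional[date] = None) -> Tuple[str, str]:
    """Return (from_date, to_date) as YYYY-MM-DD strings covering the last `days` days."""
    end = today or date.today()
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


# Pinning `today` reproduces the date used in the search example above.
from_date, to_date = last_n_days(7, today=date(2024, 12, 30))
print(from_date, to_date)  # 2024-12-23 2024-12-30
```

The two strings can then be passed as `from_date` and `to_date` to `search_news`.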

Bonus: A Simple News Monitoring Script

A practical script that monitors news for specific keywords and prints new articles as they appear.

import requests
import time

API_KEY = "your_api_key_here"
KEYWORDS = ["AI", "OpenAI", "GPT"]   # Topics to monitor
CHECK_INTERVAL = 300                # Check every 5 minutes

seen_articles = set()

def check_for_news():
    """Check for new articles matching our keywords."""
    for keyword in KEYWORDS:
        response = requests.get(
            "https://api.newsmesh.co/v1/search",
            params={"apiKey": API_KEY, "q": keyword, "limit": 5}
        )
        if response.status_code != 200:
            continue

        for article in response.json().get("data", []):
            article_id = article["article_id"]
            if article_id not in seen_articles:
                seen_articles.add(article_id)
                print(f"\n🔔 NEW: {article['title']}")
                print(f"   Keyword: {keyword} | Source: {article['source']}")
                print(f"   Link: {article['link']}")

if __name__ == "__main__":
    print(f"Monitoring news for: {', '.join(KEYWORDS)}")
    print("Press Ctrl+C to stop\n")

    while True:
        check_for_news()
        time.sleep(CHECK_INTERVAL)
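
The `seen_articles` set lives only in memory, so restarting the script reports every article again. One way to avoid that, sketched here under assumptions, is to keep the IDs in a JSON file. `load_seen` and `save_seen` are hypothetical helpers, and the sketch assumes article IDs are strings.

```python
import json
import tempfile
from pathlib import Path
from typing import Set


def load_seen(path: Path) -> Set[str]:
    """Load previously seen article IDs; a missing or corrupt file means none."""
    try:
        return set(json.loads(path.read_text(encoding="utf-8")))
    except (FileNotFoundError, json.JSONDecodeError):
        return set()


def save_seen(path: Path, seen: Set[str]) -> None:
    """Write the IDs back as a sorted JSON list."""
    path.write_text(json.dumps(sorted(seen)), encoding="utf-8")


# Demonstration in a throwaway directory; a real script would use a fixed path.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "seen_articles.json"
    seen = load_seen(state_file)          # Empty on the first run
    seen.update({"id-2", "id-1"})
    save_seen(state_file, seen)
    print(sorted(load_seen(state_file)))  # ['id-1', 'id-2']
```

In the monitor, `load_seen` would run once at startup and `save_seen` after each pass of `check_for_news`.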

Handling Pagination

To fetch more than 25 articles, use the next_cursor returned in the response:

def get_all_articles(category, max_articles=100):
    """Fetch multiple pages of articles."""
    all_articles = []
    cursor = None

    while len(all_articles) < max_articles:
        params = {"apiKey": API_KEY, "category": category, "limit": 25}
        if cursor:
            params["cursor"] = cursor

        response = requests.get(f"{BASE_URL}/latest", params=params)
        response.raise_for_status()  # Stop on HTTP errors instead of parsing an error body
        data = response.json()

        all_articles.extend(data["data"])
        cursor = data.get("next_cursor")

        if not cursor:  # No more pages
            break

    return all_articles[:max_articles]
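
The cursor loop is easier to reason about, and to test, when the network call is separated from the paging logic. The sketch below is a generic variant with hypothetical names (`collect_pages`, `fake_fetch`); the stub pages are invented and only mimic the `data` and `next_cursor` keys used above.

```python
from typing import Callable, Dict, List, Optional

Page = Dict[str, object]


def collect_pages(fetch_page: Callable[[Optional[str]], Page],
                  max_items: int = 100) -> List[dict]:
    """Follow next_cursor until it is absent or max_items have been gathered."""
    items: List[dict] = []
    cursor: Optional[str] = None
    while len(items) < max_items:
        page = fetch_page(cursor)
        batch = page.get("data") or []
        items.extend(batch)
        cursor = page.get("next_cursor")
        if not cursor or not batch:  # Last page, or an empty page
            break
    return items[:max_items]


# Stub pages standing in for three successive API responses.
FAKE_PAGES = {
    None: {"data": [{"id": 1}, {"id": 2}], "next_cursor": "c1"},
    "c1": {"data": [{"id": 3}, {"id": 4}], "next_cursor": "c2"},
    "c2": {"data": [{"id": 5}]},  # No next_cursor: final page
}


def fake_fetch(cursor: Optional[str]) -> Page:
    return FAKE_PAGES[cursor]


print(len(collect_pages(fake_fetch)))               # 5
print(len(collect_pages(fake_fetch, max_items=3)))  # 3
```

Stopping on an empty batch as well as a missing cursor is a defensive choice, so a response that keeps returning a cursor with no data cannot loop forever.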

Error Handling Best Practices

Always handle API errors gracefully:

def safe_api_call(endpoint, params):
    """Make API call with proper error handling."""
    try:
        response = requests.get(endpoint, params=params, timeout=10)
        response.raise_for_status()
        return response.json()

    except requests.exceptions.Timeout:
        print("Request timed out. Try again.")
        return None

    except requests.exceptions.HTTPError as e:
        try:
            message = e.response.json().get("message", "Unknown error")
        except ValueError:  # Error body was not valid JSON
            message = e.response.text or "Unknown error"
        print(f"API Error: {message}")
        return None

    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None
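
Since `safe_api_call` returns `None` on failure, transient problems such as timeouts can be retried by a thin wrapper. Retrying with exponential backoff is an addition here rather than something the examples above do; `call_with_retries` is a hypothetical helper, and the stub in the demo replaces the real API call so the sketch runs offline.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], Optional[T]],
                      attempts: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Run `call` until it returns something other than None, doubling the wait each time."""
    for attempt in range(attempts):
        result = call()
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return None


# Stub that fails twice and then succeeds; waits are recorded instead of slept.
outcomes = iter([None, None, {"data": []}])
waits = []
result = call_with_retries(lambda: next(outcomes), sleep=waits.append)
print(result, waits)  # {'data': []} [1.0, 2.0]
```

In real use the lambda would wrap `safe_api_call(endpoint, params)` and the default `time.sleep` would stay in place. Retrying every failure is a simplification: errors such as an invalid API key will not succeed on a second attempt.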

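What Can You Build?

News digest bot – a daily email/Slack roundup of the topics you care about
Trading signals – react to financial news in real time
Content aggregator – build your own Google News
Research tool – track coverage of a specific company or topic
News widget – add a live news feed to your website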

Wrap-Up

Fetching news in Python is straightforward once you have a reliable API. The examples above should get you started quickly.

What are you building with news data? Leave a comment below; I would love to hear about your project!
