使用 Python 创建和自定义 Excel 数据透视表

发布: 1个月前 (2025年12月16日 GMT+8 09:47)

8 min read

原文: Dev.to

Source: Dev.to

使用 Python 自动化 Excel 数据透视表创建

Excel 数据透视表是用于汇总和分析大型数据集的强大工具，帮助用户快速从原始数据中发现趋势和洞察。然而，手动创建和维护数据透视表既耗时又容易出错，尤其是在需要定期更新报告时。

Python 提供了一种高效的方式来自动化此过程。通过以编程方式生成数据透视表，您可以确保一致性，减少人工工作量，并提升报告工作流的可靠性。

在本文中，您将学习如何使用 Python 创建和自定义 Excel 数据透视表。我们将涵盖：

环境搭建
数据准备
基础数据透视表创建
常见自定义（行/列字段、聚合方式、报告过滤器）

设置环境并准备数据

要开始我们的自动化之旅，我们需要一个可靠的 Python 库来操作 Excel 文件。本教程将使用 Spire.XLS for Python，这是一个用于读取、写入和操作 Excel 文档的强大库。

安装库

pip install spire.xls

创建示例数据并写入 Excel 工作表

from spire.xls import *

# 1. Sample data
headers = [
    "Region", "Product", "Salesperson",
    "Units Sold", "Revenue", "Date"
]

data = [
    ["East",  "A", "Alice",   10, 1000, "2023-01-15"],
    ["West",  "B", "Bob",     15, 1500, "2023-01-20"],
    ["East",  "A", "Alice",   12, 1200, "2023-02-10"],
    ["South", "C", "Charlie",  8,  800, "2023-02-25"],
    ["West",  "B", "Bob",     20, 2000, "2023-03-05"],
    ["North", "A", "Alice",   11, 1100, "2023-03-12"],
    ["East",  "C", "Charlie",  9,  900, "2023-04-01"],
    ["South", "B", "Bob",     14, 1400, "2023-04-18"],
    ["North", "C", "Alice",    7,  700, "2023-05-01"],
    ["West",  "A", "Charlie", 18, 1800, "2023-05-10"],
]

# 2. Initialize workbook and sheet
workbook = Workbook()
workbook.CreateEmptySheets(1)
sheet = workbook.Worksheets[0]
sheet.Name = "RawData"

# 3. Write headers
for col_index, header in enumerate(headers, start=1):
    sheet.Range[1, col_index].Text = header

# 4. Write data rows
for row_index, row in enumerate(data, start=2):
    for col_index, value in enumerate(row, start=1):
        sheet.Range[row_index, col_index].Value = str(value)

# 5. Auto‑fit columns
for col in range(1, len(headers) + 1):
    sheet.AutoFitColumn(col)

print("Sample data written to 'RawData' sheet.")

# Optional: Save the workbook
# workbook.SaveToFile("DataForPivot.xlsx", ExcelVersion.Version2016)

生成工作表的预览

使用 Python 生成的 Excel 工作表

核心：创建基本数据透视表

# Continue from the previous script

# Determine the last row (header + data rows)
last_row = len(data) + 1

# Define the data range (A1:F{last_row})
data_range = sheet.Range[f"A1:F{last_row}"]

# Add a worksheet for the pivot table
pivot_sheet = workbook.Worksheets.Add("PivotReport")

# Create a pivot cache based on the data range
cache = workbook.PivotCaches.Add(data_range)

# Add the pivot table to the new sheet
pt = pivot_sheet.PivotTables.Add(
    "SalesPivot",          # Pivot table name
    pivot_sheet.Range["A1"],  # Start cell
    cache                 # Pivot cache
)

print("Basic pivot table structure created.")

此时工作簿包含一个空的数据透视表壳。打开文件在 Excel 中会显示一个空白的数据透视区域，准备进行字段配置。

定制透视表以获取更深入的洞察

透视表的真正威力在于对其字段的配置：行、列、数据（聚合）以及报表筛选器。

添加行字段

# Add 'Region' and 'Product' as row fields
row_field_region = pt.PivotFields["Region"]
row_field_region.Axis = AxisTypes.Row

row_field_product = pt.PivotFields["Product"]
row_field_product.Axis = AxisTypes.Row

print("Row fields 'Region' and 'Product' added.")

添加列字段

# Add 'Salesperson' as a column field
col_field_salesperson = pt.PivotFields["Salesperson"]
col_field_salesperson.Axis = AxisTypes.Column

print("Column field 'Salesperson' added.")

添加数据字段（聚合）

# Add 'Revenue' as a data field with SUM aggregation
data_field_revenue = pt.PivotFields["Revenue"]
data_field_revenue.Function = AggregateFunction.Sum
data_field_revenue.Name = "Total Revenue"

# Add 'Units Sold' as a data field with SUM aggregation
data_field_units = pt.PivotFields["Units Sold"]
data_field_units.Function = AggregateFunction.Sum
data_field_units.Name = "Total Units Sold"

print("Data fields added with SUM aggregation.")

添加报表筛选器

# Add 'Date' as a report filter
filter_field_date = pt.PivotFields["Date"]
filter_field_date.Axis = AxisTypes.Page

print("Report filter 'Date' added.")

完成并保存工作簿

# Optional: Refresh the pivot table to apply changes
pt.RefreshData()
pt.Calculate()

# Save the workbook with the pivot table
workbook.SaveToFile("AutomatedPivotReport.xlsx", ExcelVersion.Version2016)

print("Workbook saved as 'AutomatedPivotReport.xlsx'.")

添加列字段（交叉制表）

# 继续前面的脚本

# 将 'Salesperson' 添加为列字段
col_field_salesperson = pt.PivotFields["Salesperson"]
col_field_salesperson.Axis = AxisTypes.Column   # 分配到列轴

print("Column field 'Salesperson' added.")

定义数据字段（值字段）

数据字段是实际进行计算的地方。您需要指定要聚合的字段以及使用哪种聚合函数（例如，求和、计数、平均值）。

# Continue from the previous script

# Add 'Units Sold' as a data field with Sum aggregation
data_field_units = pt.PivotFields["Units Sold"]
pt.DataFields.Add(data_field_units, "Sum of Units Sold", SubtotalTypes.Sum)

# Add 'Revenue' as a data field with Sum aggregation
data_field_revenue = pt.PivotFields["Revenue"]
pt.DataFields.Add(data_field_revenue, "Sum of Revenue", SubtotalTypes.Sum)

# You can also add the same field with a different aggregation, e.g., Average Revenue
pt.DataFields.Add(data_field_revenue, "Average Revenue", SubtotalTypes.Average)

print("Data fields 'Units Sold' (Sum) and 'Revenue' (Sum, Average) added.")

应用报告筛选

报告筛选允许用户根据特定条件过滤整个数据透视表，提供交互式数据探索。

# Continue from the previous script

# Add 'Date' as a report filter
report_filter_date = PivotReportFilter("Date", True)
pt.ReportFilters.Add(report_filter_date)

print("Report filter 'Date' added.")

可选：布局和样式

虽然在 Excel 中可以交互式地进行大量样式设置，spire.xls 仍提供了一些基本布局和内置样式选项，以提升可读性。

# Continue from the previous script

# Apply a built‑in pivot table style
pt.BuiltInStyle = PivotBuiltInStyles.PivotStyleMedium12

# Set the report layout to tabular form for better readability
# This repeats item labels for better clarity in hierarchical row fields
pt.Options.RowLayout = PivotTableLayoutType.Tabular
pt.Options.RowHeaderCaption = "Region/Product"   # Custom caption for row header

# Ensure item labels are repeated for clarity in tabular layout
pt.PivotFields["Region"].RepeatItemLabels = True
pt.PivotFields["Product"].RepeatItemLabels = True

# Calculate the pivot table to ensure all changes are applied
pt.CalculateData()

print("Pivot table style and layout applied.")

保存并查看自动化报告

# Continue from the previous script

# Save the workbook to a file
output_file = "Automated_Sales_Report.xlsx"
workbook.SaveToFile(output_file, ExcelVersion.Version2016)
workbook.Dispose()

print(f"Pivot table report saved to '{output_file}'")
print("Automation complete!")

生成的报告将如下所示：

现在，当您打开 Automated_Sales_Report.xlsx 时，您会看到两个工作表：

RawData – 包含您的原始数据集。
PivotReport – 包含一个预先配置好的数据透视表，按地区和产品汇总销售数据，按业务员细分，并且可以按日期进行筛选。

整个过程，从数据准备到完整格式化的报告，都是通过一段 Python 脚本完成的。

结论

使用 Python 自动化 Excel 数据透视表的创建，可在数据分析和报告效率方面实现显著提升。我们已经演示了如何利用 Spire.XLS for Python 来：

设置环境并准备数据以创建数据透视表。
初始化基本的数据透视表结构。
通过添加行字段、列字段、带有各种聚合函数的数据字段以及报告筛选器来自定义数据透视表。
应用基本的样式和布局选项，以提升可读性。
保存最终的自动化 Excel 报告。

优势

提高效率。
减少人工错误。
工作流可复现。

将这些自动化技术整合到您自己的项目中，探索更多自定义功能，并将其应用于您的独特数据集。数据分析的未来在于智能自动化，而 Python 是打开它的钥匙。

使用 Python 创建和自定义 Excel 数据透视表

设置环境并准备数据

安装库

创建示例数据并写入 Excel 工作表

生成工作表的预览

核心：创建基本数据透视表

定制透视表以获取更深入的洞察

添加行字段

添加列字段

添加数据字段（聚合）

添加报表筛选器

完成并保存工作簿

添加列字段（交叉制表）

定义数据字段（值字段）

应用报告筛选

可选：布局和样式

保存并查看自动化报告

结论

相关文章

提升我的数据科学技能的第30天

Python 优化在跳过这一步时失败

Crier：跨平台发布您的内容

数据架构师大师专业工作簿

设置环境并准备数据

安装库

创建示例数据并写入 Excel 工作表

生成工作表的预览

核心：创建基本数据透视表

定制透视表以获取更深入的洞察

添加行字段

添加列字段

添加数据字段（聚合）

添加报表筛选器

完成并保存工作簿

添加列字段（交叉制表）

定义数据字段（值字段）

应用报告筛选

可选：布局和样式

保存并查看自动化报告

结论

相关文章

提升我的数据科学技能的第30天

Python 优化 在 跳过 这一步 时 失败

Crier：跨平台发布您的内容

数据架构师大师专业工作簿

Python 优化在跳过这一步时失败