使用 Python 创建和自定义 Excel 数据透视表
Source: Dev.to

Excel 数据透视表是用于汇总和分析大型数据集的强大工具,帮助用户快速从原始数据中发现趋势和洞察。然而,手动创建和维护数据透视表既耗时又容易出错,尤其是在需要定期更新报告时。
Python 提供了一种高效的方式来自动化此过程。通过以编程方式生成数据透视表,您可以确保一致性,减少人工工作量,并提升报告工作流的可靠性。
在本文中,您将学习如何使用 Python 创建和自定义 Excel 数据透视表。我们将涵盖:
- 环境搭建
- 数据准备
- 基础数据透视表创建
- 常见自定义(行/列字段、聚合方式、报告过滤器)
设置环境并准备数据
要开始我们的自动化之旅,我们需要一个可靠的 Python 库来操作 Excel 文件。本教程将使用 Spire.XLS for Python,这是一个用于读取、写入和操作 Excel 文档的强大库。
安装库
pip install spire.xls
创建示例数据并写入 Excel 工作表
from spire.xls import *
# 1. Sample data
headers = [
"Region", "Product", "Salesperson",
"Units Sold", "Revenue", "Date"
]
data = [
["East", "A", "Alice", 10, 1000, "2023-01-15"],
["West", "B", "Bob", 15, 1500, "2023-01-20"],
["East", "A", "Alice", 12, 1200, "2023-02-10"],
["South", "C", "Charlie", 8, 800, "2023-02-25"],
["West", "B", "Bob", 20, 2000, "2023-03-05"],
["North", "A", "Alice", 11, 1100, "2023-03-12"],
["East", "C", "Charlie", 9, 900, "2023-04-01"],
["South", "B", "Bob", 14, 1400, "2023-04-18"],
["North", "C", "Alice", 7, 700, "2023-05-01"],
["West", "A", "Charlie", 18, 1800, "2023-05-10"],
]
# 2. Initialize workbook and sheet
workbook = Workbook()
workbook.CreateEmptySheets(1)
sheet = workbook.Worksheets[0]
sheet.Name = "RawData"
# 3. Write headers
for col_index, header in enumerate(headers, start=1):
sheet.Range[1, col_index].Text = header
# 4. Write data rows
for row_index, row in enumerate(data, start=2):
for col_index, value in enumerate(row, start=1):
sheet.Range[row_index, col_index].Value = str(value)
# 5. Auto‑fit columns
for col in range(1, len(headers) + 1):
sheet.AutoFitColumn(col)
print("Sample data written to 'RawData' sheet.")
# Optional: Save the workbook
# workbook.SaveToFile("DataForPivot.xlsx", ExcelVersion.Version2016)
生成工作表的预览

核心:创建基本数据透视表
# Continue from the previous script
# Determine the last row (header + data rows)
last_row = len(data) + 1
# Define the data range (A1:F{last_row})
data_range = sheet.Range[f"A1:F{last_row}"]
# Add a worksheet for the pivot table
pivot_sheet = workbook.Worksheets.Add("PivotReport")
# Create a pivot cache based on the data range
cache = workbook.PivotCaches.Add(data_range)
# Add the pivot table to the new sheet
pt = pivot_sheet.PivotTables.Add(
"SalesPivot", # Pivot table name
pivot_sheet.Range["A1"], # Start cell
cache # Pivot cache
)
print("Basic pivot table structure created.")
此时工作簿包含一个空的数据透视表壳。打开文件在 Excel 中会显示一个空白的数据透视区域,准备进行字段配置。
定制透视表以获取更深入的洞察
透视表的真正威力在于对其字段的配置:行、列、数据(聚合)以及报表筛选器。
添加行字段
# Add 'Region' and 'Product' as row fields
row_field_region = pt.PivotFields["Region"]
row_field_region.Axis = AxisTypes.Row
row_field_product = pt.PivotFields["Product"]
row_field_product.Axis = AxisTypes.Row
print("Row fields 'Region' and 'Product' added.")
添加列字段
# Add 'Salesperson' as a column field
col_field_salesperson = pt.PivotFields["Salesperson"]
col_field_salesperson.Axis = AxisTypes.Column
print("Column field 'Salesperson' added.")
添加数据字段(聚合)
# Add 'Revenue' as a data field with SUM aggregation
data_field_revenue = pt.PivotFields["Revenue"]
data_field_revenue.Function = AggregateFunction.Sum
data_field_revenue.Name = "Total Revenue"
# Add 'Units Sold' as a data field with SUM aggregation
data_field_units = pt.PivotFields["Units Sold"]
data_field_units.Function = AggregateFunction.Sum
data_field_units.Name = "Total Units Sold"
print("Data fields added with SUM aggregation.")
添加报表筛选器
# Add 'Date' as a report filter
filter_field_date = pt.PivotFields["Date"]
filter_field_date.Axis = AxisTypes.Page
print("Report filter 'Date' added.")
完成并保存工作簿
# Optional: Refresh the pivot table to apply changes
pt.RefreshData()
pt.Calculate()
# Save the workbook with the pivot table
workbook.SaveToFile("AutomatedPivotReport.xlsx", ExcelVersion.Version2016)
print("Workbook saved as 'AutomatedPivotReport.xlsx'.")
添加列字段(交叉制表)
# 继续前面的脚本
# 将 'Salesperson' 添加为列字段
col_field_salesperson = pt.PivotFields["Salesperson"]
col_field_salesperson.Axis = AxisTypes.Column # 分配到列轴
print("Column field 'Salesperson' added.")
定义数据字段(值字段)
数据字段是实际进行计算的地方。您需要指定要聚合的字段以及使用哪种聚合函数(例如,求和、计数、平均值)。
# Continue from the previous script
# Add 'Units Sold' as a data field with Sum aggregation
data_field_units = pt.PivotFields["Units Sold"]
pt.DataFields.Add(data_field_units, "Sum of Units Sold", SubtotalTypes.Sum)
# Add 'Revenue' as a data field with Sum aggregation
data_field_revenue = pt.PivotFields["Revenue"]
pt.DataFields.Add(data_field_revenue, "Sum of Revenue", SubtotalTypes.Sum)
# You can also add the same field with a different aggregation, e.g., Average Revenue
pt.DataFields.Add(data_field_revenue, "Average Revenue", SubtotalTypes.Average)
print("Data fields 'Units Sold' (Sum) and 'Revenue' (Sum, Average) added.")
应用报告筛选
报告筛选允许用户根据特定条件过滤整个数据透视表,提供交互式数据探索。
# Continue from the previous script
# Add 'Date' as a report filter
report_filter_date = PivotReportFilter("Date", True)
pt.ReportFilters.Add(report_filter_date)
print("Report filter 'Date' added.")
可选:布局和样式
虽然在 Excel 中可以交互式地进行大量样式设置,spire.xls 仍提供了一些基本布局和内置样式选项,以提升可读性。
# Continue from the previous script
# Apply a built‑in pivot table style
pt.BuiltInStyle = PivotBuiltInStyles.PivotStyleMedium12
# Set the report layout to tabular form for better readability
# This repeats item labels for better clarity in hierarchical row fields
pt.Options.RowLayout = PivotTableLayoutType.Tabular
pt.Options.RowHeaderCaption = "Region/Product" # Custom caption for row header
# Ensure item labels are repeated for clarity in tabular layout
pt.PivotFields["Region"].RepeatItemLabels = True
pt.PivotFields["Product"].RepeatItemLabels = True
# Calculate the pivot table to ensure all changes are applied
pt.CalculateData()
print("Pivot table style and layout applied.")
保存并查看自动化报告
# Continue from the previous script
# Save the workbook to a file
output_file = "Automated_Sales_Report.xlsx"
workbook.SaveToFile(output_file, ExcelVersion.Version2016)
workbook.Dispose()
print(f"Pivot table report saved to '{output_file}'")
print("Automation complete!")
生成的报告将如下所示:
现在,当您打开 Automated_Sales_Report.xlsx 时,您会看到两个工作表:
- RawData – 包含您的原始数据集。
- PivotReport – 包含一个预先配置好的数据透视表,按地区和产品汇总销售数据,按业务员细分,并且可以按日期进行筛选。
整个过程,从数据准备到完整格式化的报告,都是通过一段 Python 脚本完成的。
结论
使用 Python 自动化 Excel 数据透视表的创建,可在数据分析和报告效率方面实现显著提升。我们已经演示了如何利用 Spire.XLS for Python 来:
- 设置环境并准备数据以创建数据透视表。
- 初始化基本的数据透视表结构。
- 通过添加行字段、列字段、带有各种聚合函数的数据字段以及报告筛选器来自定义数据透视表。
- 应用基本的样式和布局选项,以提升可读性。
- 保存最终的自动化 Excel 报告。
优势
- 提高效率。
- 减少人工错误。
- 工作流可复现。
将这些自动化技术整合到您自己的项目中,探索更多自定义功能,并将其应用于您的独特数据集。数据分析的未来在于智能自动化,而 Python 是打开它的钥匙。
