How to Programmatically Split Word Documents in Java using Spire.Doc
Source: Dev.to
Introducing Spire.Doc for Java and Installation
Spire.Doc for Java is a professional Java library designed for creating, writing, editing, converting, and printing Word documents without requiring Microsoft Office to be installed. It supports DOC, DOCX, RTF, and XML formats. Its comprehensive API allows developers to perform complex document manipulations such as splitting, merging, or extracting content with high fidelity.
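To integrate Spire.Doc for Java into a Maven project, add the e-iceblue repository and the dependency to your pom.xml. The repository is declared in its own <repositories> block, not inside the <dependency> element:
<repositories>
    <repository>
        <id>e-iceblue</id>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.e-iceblue</groupId>
        <artifactId>spire.doc</artifactId>
        <version>13.11.2</version>
    </dependency>
</dependencies>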
After adding the dependency, synchronize your project to download the necessary libraries.
Splitting Word Documents by Page Breaks
Splitting a Word document by page breaks is ideal when each logical unit (e.g., a chapter or report section) begins on a new page. This method works well when visual page separation aligns with content separation.
Process for Splitting by Page Breaks
- Create a Document instance and load the source file with Document.loadFromFile().
- Create a new Word document and add a section to it.
- Loop through all body child objects in each section of the original document, identifying paragraphs and tables.
- If the object is a table, add it directly to the new document’s section.
- If the object is a paragraph, add the paragraph to the new section, then check its child objects for page breaks.
- When a page break is found, remove it from the paragraph, save the current new document, and start a new document for the next content block.
- Repeat until all content is processed.
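Stripped of any Word API, the steps above amount to an accumulate-and-flush loop. The sketch below models that control flow on a plain list of strings. The PAGE_BREAK marker and the splitOnBreaks helper are hypothetical names used only for illustration and are not part of Spire.Doc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BreakSplitSketch {
    // Hypothetical marker standing in for a page break object
    static final String PAGE_BREAK = "<page-break>";

    /** Groups items into parts, starting a new part after each break marker. */
    static List<List<String>> splitOnBreaks(List<String> items) {
        List<List<String>> parts = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String item : items) {
            if (PAGE_BREAK.equals(item)) {
                parts.add(current);           // "save" the part built so far
                current = new ArrayList<>();  // start the next part
            } else {
                current.add(item);            // ordinary content joins the current part
            }
        }
        parts.add(current);                   // the final part has no trailing break
        return parts;
    }

    public static void main(String[] args) {
        List<String> body = Arrays.asList(
                "Intro", PAGE_BREAK, "Chapter 1", "Table A", PAGE_BREAK, "Chapter 2");
        // Prints [[Intro], [Chapter 1, Table A], [Chapter 2]]
        System.out.println(splitOnBreaks(body));
    }
}
```

The real implementation follows the same shape, with "save" meaning a call to saveToFile() and "start the next part" meaning a fresh Document with one section.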
Java Example
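import com.spire.doc.*;
import com.spire.doc.documents.*;
public class SplitDocByPageBreak {
    public static void main(String[] args) throws Exception {
        // Load the original document
        Document original = new Document();
        original.loadFromFile("E:\\Files\\SplitByPageBreak.docx");
        // Prepare the first output document
        Document newWord = new Document();
        Section section = newWord.addSection();
        int index = 0;
        // Traverse all sections of the original document
        for (int s = 0; s < original.getSections().getCount(); s++) {
            Section sec = original.getSections().get(s);
            // Traverse the body child objects (paragraphs and tables) of each section
            for (int c = 0; c < sec.getBody().getChildObjects().getCount(); c++) {
                DocumentObject obj = sec.getBody().getChildObjects().get(c);
                if (obj instanceof Paragraph) {
                    Paragraph para = (Paragraph) obj;
                    // Add a copy of the paragraph to the current output document
                    Paragraph copy = (Paragraph) para.deepClone();
                    section.getBody().getChildObjects().add(copy);
                    int trimmed = 0; // children already removed from the front of copy
                    for (int i = 0; i < para.getChildObjects().getCount(); i++) {
                        DocumentObject child = para.getChildObjects().get(i);
                        if (child instanceof Break
                                && ((Break) child).getBreakType().equals(BreakType.Page_Break)) {
                            // Remove the page break (and anything after it) from the current part
                            while (copy.getChildObjects().getCount() > i - trimmed) {
                                copy.getChildObjects().removeAt(copy.getChildObjects().getCount() - 1);
                            }
                            // Save the current part and start a new output document
                            newWord.saveToFile("output/result" + index + ".docx", FileFormat.Docx);
                            index++;
                            newWord = new Document();
                            section = newWord.addSection();
                            // Carry the paragraph over, minus the page break and everything before it
                            copy = (Paragraph) para.deepClone();
                            section.getBody().getChildObjects().add(copy);
                            int breakIdx = i;
                            while (breakIdx >= 0) {
                                copy.getChildObjects().removeAt(breakIdx);
                                breakIdx--;
                            }
                            trimmed = i + 1;
                        }
                    }
                } else if (obj instanceof Table) {
                    // Add tables directly to the new document
                    section.getBody().getChildObjects().add(obj.deepClone());
                }
            }
        }
        // Save the final part
        newWord.saveToFile("output/result" + index + ".docx", FileFormat.Docx);
    }
}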
Splitting Word Documents by Section Breaks
Splitting by section breaks provides more granular control, especially for documents with varying headers/footers, page orientations, or other layout differences. A section break denotes a logical division that can have its own formatting properties.
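To judge which of the two strategies suits a given file, it can help to count its explicit page breaks and section breaks first. The sketch below does this with the Java standard library only, relying on the fact that a .docx file is a ZIP archive whose body is WordprocessingML. It assumes the main part is stored at word/document.xml, where Word normally writes it. The class and method names are illustrative and not part of Spire.Doc, and Document here is org.w3c.dom.Document rather than Spire's class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DocxBreakInspector {
    // WordprocessingML main namespace (conventionally bound to the "w" prefix)
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Parses the XML of the main document part into a namespace-aware DOM. */
    static Document parse(InputStream xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(xml);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document XML", e);
        }
    }

    /** Counts explicit page breaks, i.e. <w:br w:type="page"/> elements. */
    static int countPageBreaks(Document doc) {
        NodeList breaks = doc.getElementsByTagNameNS(W_NS, "br");
        int count = 0;
        for (int i = 0; i < breaks.getLength(); i++) {
            Element br = (Element) breaks.item(i);
            if ("page".equals(br.getAttributeNS(W_NS, "type"))) {
                count++;
            }
        }
        return count;
    }

    /** Counts <w:sectPr> elements; each section normally carries one. */
    static int countSectionProperties(Document doc) {
        return doc.getElementsByTagNameNS(W_NS, "sectPr").getLength();
    }

    /** Opens a .docx file as a ZIP archive and parses word/document.xml. */
    static Document load(String docxPath) {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IllegalArgumentException("No word/document.xml in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return parse(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A file with several w:sectPr elements is a candidate for section-based splitting, while a file with a single section and several page breaks fits the page-break approach. Page breaks produced by the "page break before" paragraph setting are not counted by this sketch.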
Process for Splitting by Section Breaks
- Create a Document instance and load the source file.
- Loop through all sections in the original document.
- For each section, create a new Word document.
- Clone the section using Section.deepClone().
- Add the cloned section to the new document with Document.getSections().add().
- Save each new document under its own file name using Document.saveToFile().
Java Example
import com.spire.doc.*;
public class SplitDocBySectionBreak {
    public static void main(String[] args) throws Exception {
        // Load the original document
        Document original = new Document();
        original.loadFromFile("E:\\Files\\SplitBySectionBreak.docx");
        // Write each section to its own output document
        for (int i = 0; i < original.getSections().getCount(); i++) {
            Document newDoc = new Document();
            // Clone the section so the original document is left untouched
            Section srcSection = original.getSections().get(i);
            Section clonedSection = (Section) srcSection.deepClone();
            newDoc.getSections().add(clonedSection);
            // Save this part under an indexed file name
            newDoc.saveToFile("output/sectionSplit" + i + ".docx", FileFormat.Docx);
        }
    }
}
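Both examples write into an output/ folder and build file names by string concatenation. Whether saveToFile() creates a missing folder is not stated here, so a cautious sketch is to create the folder up front with the standard library and to generate zero-padded names so the parts sort in order. The OutputPaths class and its partPath method are hypothetical helpers, not part of Spire.Doc.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

class OutputPaths {
    /**
     * Creates the folder if needed and returns the path of one part,
     * with a file name such as result-003.docx for ("result", 3).
     */
    static String partPath(String dir, String baseName, int index) {
        try {
            Path folder = Files.createDirectories(Paths.get(dir));
            String fileName = String.format(Locale.ROOT, "%s-%03d.docx", baseName, index);
            return folder.resolve(fileName).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this helper, a save call could read newWord.saveToFile(OutputPaths.partPath("output", "result", index), FileFormat.Docx).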