
How to convert HTML to PDF in Java

admin
2025-06-02

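In Java, HTML can be converted to PDF with third-party libraries such as iText, Flying Saucer, or OpenPDF. The library parses the HTML into a DOM tree, lays it out according to the CSS, and writes the result as a PDF file. Common toolkits include iText's XML Worker add-on and wrappers around wkhtmltopdf.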

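Converting HTML to PDF is a common requirement in Java development, particularly for generating reports, electronic contracts, and archived data. The sections below cover the mainstream approaches, with code examples and key caveats for each.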

Mainstream approaches and implementation steps

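iText + Flying Saucer

How it works: Flying Saucer parses XHTML and CSS, lays out the page, and writes the PDF through an iText-based backend. (XML Worker is a separate iText add-on and is not used here.)
Pros: open source (note that iText 5.x is AGPL-licensed); supports CSS 2.1
Limitations: limited support for CSS 3 and complex layouts; the input must be well-formed XHTML
Maven dependencies: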

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.13.3</version>
</dependency>
<dependency>
    <groupId>org.xhtmlrenderer</groupId>
    <artifactId>flying-saucer-pdf</artifactId>
    <version>9.1.22</version>
</dependency>

Example code:


import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class HtmlToPdf {
    public static void convert(String htmlPath, String pdfPath) throws Exception {
        // ITextRenderer writes the whole PDF itself, so no separate
        // iText Document/PdfWriter should be opened on the same stream.
        try (OutputStream os = new FileOutputStream(pdfPath)) {
            ITextRenderer renderer = new ITextRenderer();
            renderer.setDocument(new File(htmlPath)); // input must be well-formed XHTML
            renderer.layout();                        // resolve CSS and compute page layout
            renderer.createPDF(os);                   // write the PDF to the output stream
        }
    }
}
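
Flying Saucer hands its input to an XML parser, so HTML that browsers tolerate (an unclosed `<br>`, for instance) fails to load. Below is a minimal sketch of a pre-flight check that uses only the JDK's XML parser. `XhtmlCheck` and `isWellFormed` are hypothetical names for illustration, not part of iText or Flying Saucer.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: checks that markup is well-formed XML before rendering.
final class XhtmlCheck {
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow parse errors instead of letting the default handler print them.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<html><body><p>a<br/>b</p></body></html>")); // true
        System.out.println(isWellFormed("<html><body><p>a<br>b</p></body></html>"));  // false
    }
}
```

If the check fails, the markup can be cleaned up with an HTML tidying step before it is passed to `renderer.setDocument(...)`.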

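Apache PDFBox (PDModel API)

How it works: manipulates the PDF document structure directly; PDFBox does not parse or render HTML
Best for: cases that need fine-grained control over individual PDF elements
Maven dependency: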

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.0</version>
</dependency>

Basic workflow:

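import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;

public class PdfBoxDemo {
    public static void main(String[] args) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage();
            document.addPage(page);
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                // Add text/images by hand (you have to parse the HTML yourself)
                contentStream.beginText();
                // PDFBox 3.x: the 2.x constant PDType1Font.HELVETICA_BOLD no longer exists
                contentStream.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA_BOLD), 12);
                contentStream.newLineAtOffset(100, 700);
                contentStream.showText("Hello PDFBox!");
                contentStream.endText();
            }
            document.save("output.pdf");
        }
    }
}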
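
Because PDFBox has no HTML parser, the text has to be pulled out of the markup before it can be passed to `showText`. Here is one possible sketch, assuming the input is well-formed XHTML and that only `<p>` elements matter. `ParagraphExtractor` is a hypothetical helper built on the JDK's DOM parser, not a PDFBox class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects the text of every <p> element in an XHTML string.
final class ParagraphExtractor {
    static List<String> paragraphs(String xhtml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
        NodeList nodes = doc.getElementsByTagName("p");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() flattens nested inline tags such as <b>;
            // collapse whitespace so each paragraph is a single line.
            String text = nodes.item(i).getTextContent().trim().replaceAll("\\s+", " ");
            if (!text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p>Hello <b>PDFBox</b>!</p><p> second\n line </p></body></html>";
        for (String line : paragraphs(xhtml)) {
            System.out.println(line);
        }
    }
}
```

Each returned string is a single line, which suits `showText`, since it does not wrap or break lines on its own. Positioning, fonts, and line wrapping still have to be handled by the calling code.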

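Open HTML to PDF (open source)

Pros: builds on Flying Saucer with a PDFBox backend; supports CSS 2.1 plus a subset of CSS 3, though not flexbox or grid
Maven dependency: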

<dependency>
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-pdfbox</artifactId>
    <version>1.0.10</version>
</dependency>

Example code:

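import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class OpenHtmlToPdfConverter {
    public void convertHtmlToPdf(String htmlFile, String pdfFile) throws Exception {
        try (OutputStream os = new FileOutputStream(pdfFile)) {
            PdfRendererBuilder builder = new PdfRendererBuilder();
            builder.useFastMode();                 // use the faster renderer
            builder.withFile(new File(htmlFile));  // input must be well-formed XHTML
            builder.toStream(os);                  // destination for the PDF bytes
            builder.run();                         // lay out the document and write the PDF
        }
    }
}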

Aspose.HTML (commercial library)

Pros: high-fidelity conversion; enterprise-grade support
Code example:

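import com.aspose.html.HTMLDocument;
import com.aspose.html.converters.Converter;
import com.aspose.html.saving.PdfSaveOptions;

public class AsposeConverter {
    public static void convert(String inputPath, String outputPath) {
        HTMLDocument document = new HTMLDocument(inputPath);
        try {
            PdfSaveOptions options = new PdfSaveOptions();
            // Converter.convertHTML is the conversion entry point shown in Aspose's documentation
            Converter.convertHTML(document, options, outputPath);
        } finally {
            document.dispose(); // release the document's resources
        }
    }
}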

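Common problems and solutions

Problem	Solution
Garbled Chinese text	Embed a font that contains CJK glyphs, registered with Identity-H encoding: `renderer.getFontResolver().addFont("font.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);`, and name that font in the CSS `font-family`
Missing CSS styles	Avoid CSS 3 features, or switch to Open HTML to PDF
Images not displayed	Make sure image paths are absolute, or inline the images as Base64 data URIs
Page breaks	Add a print rule to the HTML's CSS: `@media print { .page-break { page-break-after: always; } }`
Performance	Use the fast renderer (`builder.useFastMode();`) and, for repeated conversions, share a font-metrics cache through `builder.useCacheStore(...)`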
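
The page-break rule from the table can be applied when the HTML is assembled in Java. This is a sketch, assuming each section is already a well-formed XHTML fragment. `PagedHtml` is a hypothetical helper, and whether the rule takes effect depends on the renderer's paged-media support.

```java
import java.util.List;

// Hypothetical helper: wraps sections in <div>s so that every section
// except the last one is followed by a page break.
final class PagedHtml {
    private static final String PRINT_CSS =
            "@media print { .page-break { page-break-after: always; } }";

    static String build(List<String> sections) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><style>").append(PRINT_CSS).append("</style></head><body>");
        for (int i = 0; i < sections.size(); i++) {
            boolean last = (i == sections.size() - 1);
            // No break after the final section, otherwise the PDF ends with a blank page.
            html.append(last ? "<div>" : "<div class=\"page-break\">")
                .append(sections.get(i))
                .append("</div>");
        }
        return html.append("</body></html>").toString();
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("<p>Page one</p>", "<p>Page two</p>")));
    }
}
```

Leaving the class off the last section is a deliberate choice: a trailing `page-break-after: always` typically produces an empty final page.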

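Choosing an approach

Open-source projects → Open HTML to PDF (balances performance and compatibility)
Enterprise requirements → Aspose.HTML (paid, but low-maintenance)
Simple text conversion → iText/Flying Saucer
Custom PDF structure → PDFBox, using its API directly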

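Best practices

Font embedding: always embed fonts so the PDF displays correctly on the reader's machine
Resource paths: convert image and CSS references to absolute paths, or inline the resources
Resource management: use try-with-resources so streams are always closed
Error handling: catch IOException and DocumentException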
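
Inlining a resource usually means embedding it as a Base64 `data:` URI, so the renderer never has to resolve a file path. Below is a minimal sketch using only the JDK. `DataUri` and its methods are hypothetical names, and the MIME type is passed in by the caller rather than detected.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper: turns image bytes into a data: URI for use in <img src="...">.
final class DataUri {
    static String of(byte[] bytes, String mimeType) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(bytes);
    }

    static String fromFile(Path file, String mimeType) {
        try {
            return of(Files.readAllBytes(file), mimeType);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + file, e);
        }
    }

    static String imgTag(byte[] imageBytes, String mimeType) {
        // Self-closing tag keeps the markup valid XHTML.
        return "<img src=\"" + of(imageBytes, mimeType) + "\" />";
    }

    public static void main(String[] args) {
        System.out.println(of(new byte[] {1, 2, 3}, "image/png")); // data:image/png;base64,AQID
    }
}
```

Inlined images make the HTML larger by roughly a third, so for big or frequently reused images an absolute path may still be the better trade-off.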

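// Robust error handling example (logger and ConversionException are application-defined)
try (PDDocument doc = new PDDocument()) {
    // build and save the document here
} catch (IOException e) {
    logger.error("PDF generation failed", e);
    // Keep the original exception as the cause so the stack trace is not lost
    throw new ConversionException("Failed to create PDF file", e);
}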
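
The fragment above uses a `logger` and a `ConversionException` without defining them. The sketch below shows one way they might look, assuming `ConversionException` is an application-defined unchecked exception and using `java.util.logging` in place of whatever logging framework the project uses. `SafeWriter` is a hypothetical class that stands in for the PDF-writing step.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Level;
import java.util.logging.Logger;

// Assumed application-defined exception; not part of any PDF library.
class ConversionException extends RuntimeException {
    ConversionException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical class standing in for the code that writes the finished PDF.
final class SafeWriter {
    private static final Logger logger = Logger.getLogger(SafeWriter.class.getName());

    static void write(Path target, byte[] pdfBytes) {
        // try-with-resources closes the stream even when write() fails.
        try (OutputStream out = Files.newOutputStream(target)) {
            out.write(pdfBytes);
        } catch (IOException e) {
            logger.log(Level.SEVERE, "PDF generation failed", e);
            // Wrap with the cause attached so callers can still see the root error.
            throw new ConversionException("Could not write " + target, e);
        }
    }
}
```

Callers then deal with a single exception type, while the underlying `IOException` remains available through `getCause()`.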