
Why HTML to PDF breaks on long tables, and how to fix it

Long tables are where HTML to PDF falls apart: missing headers, rows cut in half, orphaned totals. Here is why it happens and how a declarative document model fixes it.

Tags: html-to-pdf, tables, pagination, pdf

HTML to PDF feels solved until you print a long table. A short one looks fine. Add enough rows to spill onto a second page and the cracks show: the header is gone, a row is sliced through the middle, and the totals sit alone at the top of a fresh page. This is the single most common way PDF generation breaks, and it is worth understanding why.

The web was not built for pages

A web page is one long scroll. There is no page two. When you print HTML to PDF with a headless browser, you are forcing a layout that assumes infinite height onto fixed sheets of paper. The browser slices the tall layout into pages wherever the page boundary happens to land. It does not know that a table header should repeat, or that a row should not be split, or that totals belong with the table above them.
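To see why a short table hides the problem, here is a minimal sketch of that naive slicing. It is a toy model, not how any real browser paginates: the page height, header height, and row height are made-up numbers, and every row is assumed to be the same height.

```python
# Toy model of naive pagination: stack the rows into one tall layout,
# then cut it every PAGE_HEIGHT units. All sizes are assumed, arbitrary units.
PAGE_HEIGHT = 1000   # usable height of one page
HEADER_HEIGHT = 30   # table header, drawn once at the very top
ROW_HEIGHT = 24      # every row is the same height, for simplicity


def split_rows(row_count: int) -> list[int]:
    """Return the indexes of rows that straddle a page boundary."""
    cut = []
    for i in range(row_count):
        top = HEADER_HEIGHT + i * ROW_HEIGHT
        bottom = top + ROW_HEIGHT
        # A row is sliced when its first and last unit fall on different pages.
        if top // PAGE_HEIGHT != (bottom - 1) // PAGE_HEIGHT:
            cut.append(i)
    return cut


print(split_rows(10))    # [] : a short table never reaches a boundary
print(split_rows(200))   # [40, 82, 123, 165] : one sliced row per page break
```

With ten rows nothing is cut, which is exactly the case you tend to test. With two hundred, a row is sliced at every page break, and the header exists only on page one.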

So you reach for CSS to teach it. And that is where the day disappears.

The CSS you end up writing

To make a long table survive pagination, you start adding print rules:

thead { display: table-header-group; } /* repeat the header */
tr { break-inside: avoid; } /* keep rows whole */
tfoot { break-inside: avoid; } /* keep totals together */
.totals { break-before: avoid; } /* keep totals with the table */

In theory this works. In practice, support varies between rendering engines, the rules interact in surprising ways, and sooner or later one of them quietly fails. display: table-header-group repeats the header until a cell is too tall. break-inside: avoid is only a request, and it is ignored when a row is taller than the page. Margins and running headers fight each other for the same space. You test with ten rows, it looks great, you ship, and a customer with two hundred rows gets a broken document. Now you are debugging print CSS in a headless browser you cannot see.
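If you stay on this path, at least test at the sizes your customers will hit. The sketch below generates the same table at several lengths with the print rules above baked in. The function names, file names, and sample values are made up for illustration, and putting the totals in a tfoot with class totals is just one way to exercise all four rules.

```python
# Generate test pages with the print rules from this post at several table lengths.
PRINT_CSS = """\
thead { display: table-header-group; } /* repeat the header */
tr { break-inside: avoid; } /* keep rows whole */
tfoot { break-inside: avoid; } /* keep totals together */
.totals { break-before: avoid; } /* keep totals with the table */
"""


def build_table_html(row_count: int) -> str:
    """Return a standalone HTML page holding one table with row_count rows."""
    rows = "\n".join(
        f"<tr><td>Item {i}</td><td>1</td><td>$10.00</td><td>$10.00</td></tr>"
        for i in range(1, row_count + 1)
    )
    total = f"${row_count * 10:,.2f}"
    return f"""<!doctype html>
<html><head><meta charset="utf-8"><style>
{PRINT_CSS}</style></head>
<body><table>
<thead><tr><th>Description</th><th>Qty</th><th>Unit</th><th>Amount</th></tr></thead>
<tbody>
{rows}
</tbody>
<tfoot class="totals"><tr><td colspan="3">Total</td><td>{total}</td></tr></tfoot>
</table></body></html>"""


def write_fixtures(sizes=(10, 200, 2000)) -> None:
    """Write table-10.html, table-200.html, ... into the current directory."""
    for n in sizes:
        with open(f"table-{n}.html", "w", encoding="utf-8") as f:
            f.write(build_table_html(n))


if __name__ == "__main__":
    write_fixtures()
```

Print each file to PDF with whatever browser or pipeline you use in production and look at page two, not page one. The ten-row file will almost always look fine; the larger ones are where the rules get tested.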

Why this keeps happening

The root problem is that HTML to PDF is two layers glued together. The first layer is your content. The second is a pagination engine you are steering indirectly through CSS hints. You never get to say “this is a table, repeat its header and keep its rows whole.” You can only suggest it and hope the browser agrees. When the content grows past what you tested, the suggestion breaks and you find out in production.

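To the browser's pagination step, a table is mostly a stack of boxes to slice. It does not know the table in the way you mean it: that the header labels every page, that a row is one record, that the totals belong to the rows above them. That gap is why long tables break.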

The fix: tell the renderer it is a table

The way out is to stop hinting and start declaring. Instead of HTML plus print CSS, describe the document as data and let a renderer that understands pages do the work. That is the kove model. A table is a real primitive, not a stack of divs:

{
  "type": "table",
  "columns": ["Description", "Qty", "Unit", "Amount"],
  "rows": [
    ["Consulting", "40", "$120.00", "$4,800.00"],
    ["Hosting", "1", "$84.00", "$84.00"]
  ]
}

Because kove knows this is a table, it can do the right thing automatically across any number of pages:

  • The column header repeats at the top of every page.
  • Rows are never cut in half.
  • The totals block stays with the table instead of drifting onto its own page.
  • Page numbers are added without you writing a single rule.

You do not set break-inside. You do not declare a header group. You do not test with ten rows and pray about two hundred. The rules are built into the renderer because it knows what a table is.
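To make "knows what a table is" concrete, here is a minimal sketch of the kind of rule a table-aware paginator can apply. This is not kove's implementation, which this post does not show. It is an illustration under simple assumptions: every row is the same height, a page holds a fixed number of rows, and the names Page and paginate are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    header: list[str]          # column header, repeated on every page
    rows: list[list[str]]      # whole rows only, never a fragment
    has_totals: bool = False   # True on the page that carries the totals block


def paginate(columns, rows, rows_per_page, totals_rows=1):
    """Split rows into pages, repeating the header and keeping the
    totals block on the same page as at least one table row."""
    if rows_per_page <= totals_rows:
        raise ValueError("page too small to hold a row plus the totals block")
    if not rows:
        return [Page(columns, [], has_totals=True)]

    pages = []
    remaining = list(rows)
    while remaining:
        # Everything left fits together with the totals: this is the last page.
        if len(remaining) + totals_rows <= rows_per_page:
            pages.append(Page(columns, remaining, has_totals=True))
            break
        # Otherwise fill the page, but hold back one row if filling it would
        # leave the totals alone at the top of the next page.
        take = min(rows_per_page, len(remaining) - 1)
        pages.append(Page(columns, remaining[:take]))
        remaining = remaining[take:]
    return pages


columns = ["Description", "Qty", "Unit", "Amount"]
rows = [[f"Item {i}", "1", "$10.00", "$10.00"] for i in range(90)]
print([len(p.rows) for p in paginate(columns, rows, rows_per_page=45)])  # [45, 44, 1]
```

The point is where the decisions live. The unit of work is a row, so a row cannot be split. The header is attached to every page by construction, and the orphaned-totals case is handled once in the paginator instead of being hinted at in a stylesheet.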

See it in one command

The fastest way to feel the difference is to render a long table yourself. Put a few hundred rows in a document and render it locally, free and with no account:

npx kove render report.json -o report.pdf

Open the PDF and scroll to page two. The header is there. The rows are whole. The totals are where they should be. No print CSS in sight.
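You do not have to type a few hundred rows by hand. The sketch below builds a table node in the shape shown earlier and writes it to report.json. Two assumptions to flag: the row values are invented sample data, and this post only shows the table node itself, so how that node sits inside a complete kove document is not covered here. Check the kove docs for the full document shape before rendering.

```python
import json


def build_table(row_count: int) -> dict:
    """Build a table node in the shape shown earlier, with generated sample rows."""
    unit = 120.00
    rows = []
    for i in range(1, row_count + 1):
        qty = (i % 5) + 1
        rows.append([f"Line item {i}", str(qty), f"${unit:,.2f}", f"${qty * unit:,.2f}"])
    return {
        "type": "table",
        "columns": ["Description", "Qty", "Unit", "Amount"],
        "rows": rows,
    }


def write_report(path: str = "report.json", row_count: int = 300) -> None:
    # Assumption: the file holds only the table node; a real kove document
    # may need more structure around it.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_table(row_count), f, indent=2)


if __name__ == "__main__":
    write_report()
```

Change row_count to whatever your worst-case customer looks like and render again. The document stays the same apart from the number of rows.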

If you would rather generate it from your app, the same document goes to the hosted API, authenticated with a Bearer key, and you run none of the rendering yourself. Your coding agent can even wire that call in for you: kove ships AI-friendly docs, so you ask the agent to add document generation and it reads the spec and writes the integration. The document is the same in every case, so the pagination is the same too.

When HTML to PDF is still fine

If your document is always one page, like a simple receipt or a label, HTML to PDF is fine and you do not need anything else. The trouble is specific to content that grows past a page, and tables are the most common version of that. Reports, invoices with many line items, statements, and order lists all hit it.

The moment your table can be long, you have a choice. You can keep tuning print CSS and re-testing every edge, or you can describe the document once and let a renderer that understands pages handle the rest. The declarative path is less code, and it does not break on the customer with two hundred rows.