Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for widows when determining the location of page breaks #1356

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
38 changes: 28 additions & 10 deletions src/FrameDecorator/Page.php
Original file line number Diff line number Diff line change
Expand Up @@ -357,14 +357,33 @@ protected function _page_break_allowed(Frame $frame)

// Rule C
$block_parent = $frame->find_block_parent();
if (count($block_parent->get_line_boxes()) < $frame->get_style()->orphans) {
$block_line_boxes = $block_parent->get_line_boxes();
$frame_line_number = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be initialized as $frame_line_number = count($block_line_boxes), basically as before.
Reason being that the frame might not have been added to the line boxes yet, because we are currently in the process of reflowing its children. This can definitely happen in the case of inline-block frames (which would become a concern with #2497, when line breaks before inline-block frames are enabled).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense. The line count will always be the current number of rendered lines. And if Dompdf has to walk back through previous siblings it should be able to get a valid line number.

$i = 1;
$first_node_on_line = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The $first_node_on_line variable doesn't seem to be used. I'm not sure why it's there so perhaps we can go ahead and remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable is indeed unused.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps was meant to be used at some point but never fully developed. I think removing the fist-line logic makes sense since it's not current used. This should be easy enough to re-implement if we need it later.

foreach ($block_line_boxes as &$line_box) {
if ($line_box === $frame->get_containing_line()) {
$frame_line_number = $i;
$frames = $line_box->get_frames();
if (count($frames) > 0) {
$first_node_on_line = $frames[0]->get_node();
}
break;
}
$i++;
}

if ($frame_line_number <= $frame->get_style()->orphans) {
Helpers::dompdf_debug("page-break", "orphans");

return false;
}

// FIXME: Checking widows is tricky without having laid out the
// remaining line boxes. Just ignore it for now...
if (count($block_line_boxes) - $frame_line_number + 1 < $frame->get_style()->widows) {
Helpers::dompdf_debug("page-break", "widows");

return false;
}

// Rule D
$p = $block_parent;
Expand Down Expand Up @@ -392,13 +411,6 @@ protected function _page_break_allowed(Frame $frame)
return false;
}

// Skip breaks on empty text nodes
if ($frame->is_text_node() &&
$frame->get_node()->nodeValue == ""
) {
return false;
}

Helpers::dompdf_debug("page-break", "inline: break allowed");

return true;
Expand Down Expand Up @@ -546,6 +558,12 @@ function check_page_break(Frame $frame)
}

if ($next = $iter->get_prev_sibling()) {
while ($next->is_text_node() && $next->get_node()->nodeValue == "") {
$next = $next->get_prev_sibling();
if (!$next)
break 2;
}

Comment on lines 560 to +566
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks with a whitespace node as first node, in which case it does not check for a parent in the following if block (because of the break 2).

Proposal:

            $next = $iter->get_prev_sibling();
            // Skip empty text nodes
            while ($next && $next->is_text_node() && $next->get_node()->nodeValue === "") {
                $next = $next->get_prev_sibling();
            }

            if ($next) {

Helpers::dompdf_debug("page-break", "following prev sibling.");

if ($next->is_table() && !$iter->is_table())
Expand Down
28 changes: 17 additions & 11 deletions src/FrameReflower/Block.php
Original file line number Diff line number Diff line change
Expand Up @@ -792,26 +792,32 @@ function reflow(BlockFrameDecorator $block = null)

// Set the containing blocks and reflow each child
foreach ($this->_frame->get_children() as $child) {

// Bail out if the page is full
if ($page->is_full()) {
break;
}

$child->set_containing_block($cb_x, $cb_y, $w, $cb_h);

$this->process_clear($child);

$child->reflow($this->_frame);

// Don't add the child to the line if a page break has occurred
if ($page->check_page_break($child)) {
break;
}

$this->process_float($child, $cb_x, $w);
}

// Go back through and handle any page breaks
foreach ($this->_frame->get_children() as $child) {
// Skip children that are empty text nodes
if (!$child->is_text_node() ||
$child->get_node()->nodeValue != ""
) {
if ($page->check_page_break($child)) {
break;
}

// Bail out if the page is full
if ($page->is_full()) {
break;
}
}
}

Comment on lines +804 to +820
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is problematic with the current infrastructure: A page break in a child element (when reflowing the child) can now result in siblings before it staying on the original page, even though there is not enough space for them; they are checked after the page break has already occurred and so no page break is possible anymore.

Copy link
Member

@Mellthas Mellthas Aug 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To illustrate that see the following HTML:

<!DOCTYPE html>
<html>

<head>
<meta charset="UTF-8">
<style type="text/css">
@page {
	size: 400pt 300pt;
	margin: 50pt;
}

body {
	margin: 0;
}

.block {
	height: 200pt;
	background-color: red;
}
</style>
</head>

<body><div class="block"></div><div class="block"></div><div class="block"></div><div class="block"></div><div class="block"><div></div></div></body>

</html>

And the before and after rendering:
swallow-before.pdf
swallow-after.pdf

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not quite right but I think something along these lines is what we'll need to do in order to support widows. Otherwise we don't have a valid count of the number of lines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also fails if a page break is forced using page-break styling.

For sure to fully support this would require a deep rework of how the document is reflowed, e.g. reflow the content then page it instead of doing both at the same time, maybe followed by additional steps. Changing the process this way might allow us to implement some functionality that we might have trouble with otherwise. This would be a major re-work, though, so best to pursue in a future release.

// Determine our height
list($height, $margin_top, $margin_bottom, $top, $bottom) = $this->_calculate_restricted_height();
$style->height = $height;
Expand Down
121 changes: 121 additions & 0 deletions tests/Dompdf/Tests/DompdfTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -98,4 +98,125 @@ public function testSpaceAtStartOfSecondInlineTag()
$this->assertEquals("one", $text_frame_contents[0]);
$this->assertEquals(" - two", $text_frame_contents[1]);
}

private function _pageBreak($orphans, $widows, $lines_in_paragraph, $lines_that_could_fit_on_first_page)
{
$dompdf = new Dompdf();

$callback_count = 0;
$number_of_lines_on_second_page = 0;

// Use a callback to inspect the frame tree; otherwise FrameReflower\Page::reflow()
// will dispose of it before dompdf->render finishes
$dompdf->setCallbacks(array('test' => array(
'event' => 'end_page_render',
'f' => function($params) use (&$number_of_lines_on_second_page, &$callback_count) {
if ($callback_count == 1) {
$frame = $params["frame"];
foreach ($frame->get_children() as $child) {
foreach ($child->get_children() as $grandchild) {
if ($grandchild->get_node()->nodeName == "#text")
$number_of_lines_on_second_page++;
}
}
}
$callback_count++;
}
)));

$html = "<p>";
$filler = array();
$MAX_FILLER = 47; // Is there a better way to know how many lines fit on a page?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a fixed, and smaller page size together with a fixed line height should make this easier to handle.

for ($i = 1; $i <= $MAX_FILLER - $lines_that_could_fit_on_first_page; $i++) {
$filler[] = "Filler $i";
}
$html .= join("<br>", $filler);
$html .= "</p>";

$html .= "<p style=\"widows: $widows; orphans: $orphans;\">";
$lines = array();
for ($i = 1; $i <= $lines_in_paragraph; $i++) {
$lines[] = "Line $i";
}
$html .= join("<br>", $lines);
$html .= "</p>";

$dompdf->loadHtml($html);
$dompdf->render();

return $number_of_lines_on_second_page;
}

public function testPageBreak1orphan1widow()
{
$orphans = 1; // Minimum lines on first page
$widows = 1; // Minimum lines on second page
$lines_in_paragraph = 4; // Expected: 2 on first, 2 on second
$lines_that_could_fit_on_first_page = 2;
$number_of_lines_on_second_page = $this->_pageBreak($orphans, $widows, $lines_in_paragraph, $lines_that_could_fit_on_first_page);

$this->assertEquals(2, $number_of_lines_on_second_page, "Unexpected number of lines on the second page");
}

public function testPageBreak1orphan2widows()
{
$orphans = 1;
$widows = 2; // Minimum lines on second page
$lines_in_paragraph = 4;
$lines_that_could_fit_on_first_page = 2; // Expected: 2 on first, 2 on second
$number_of_lines_on_second_page = $this->_pageBreak($orphans, $widows, $lines_in_paragraph, $lines_that_could_fit_on_first_page);

$this->assertEquals(2, $number_of_lines_on_second_page, "Unexpected number of lines on the second page");
}

public function testPageBreak1orphan3widows()
{
$orphans = 1;
$widows = 3; // Minimum lines on second page
$lines_in_paragraph = 4;
$lines_that_could_fit_on_first_page = 2; // Expected: 1 on first, 3 on second
$number_of_lines_on_second_page = $this->_pageBreak($orphans, $widows, $lines_in_paragraph, $lines_that_could_fit_on_first_page);
$this->assertEquals(3, $number_of_lines_on_second_page, "Unexpected number of lines on the second page");
}

public function testPageBreak1orphan4widows()
{
$orphans = 1;
$widows = 4; // Minimum lines on second page
$lines_in_paragraph = 4;
$lines_that_could_fit_on_first_page = 2; // Expected: 0 on first, 4 on second
$number_of_lines_on_second_page = $this->_pageBreak($orphans, $widows, $lines_in_paragraph, $lines_that_could_fit_on_first_page);
$this->assertEquals(4, $number_of_lines_on_second_page, "Unexpected number of lines on the second page");
}

public function testPageBreak2orphans1widow()
{
$orphans = 2;
$widows = 1; // Minimum lines on second page
$lines_in_paragraph = 4;
$lines_that_could_fit_on_first_page = 2; // Expected: 2 on first, 2 on second
$number_of_lines_on_second_page = $this->_pageBreak($orphans, $widows, $lines_in_paragraph, $lines_that_could_fit_on_first_page);
$this->assertEquals(2, $number_of_lines_on_second_page, "Unexpected number of lines on the second page");
}

public function testPageBreak3orphans1widow()
{
$orphans = 3;
$widows = 1; // Minimum lines on second page
$lines_in_paragraph = 4;
$lines_that_could_fit_on_first_page = 2; // Expected: 0 on first, 4 on second
$number_of_lines_on_second_page = $this->_pageBreak($orphans, $widows, $lines_in_paragraph, $lines_that_could_fit_on_first_page);
$this->assertEquals(4, $number_of_lines_on_second_page, "Unexpected number of lines on the second page");
}

public function testPageBreak4orphans1widow()
{
$orphans = 4;
$widows = 1; // Minimum lines on second page
$lines_in_paragraph = 4;
$lines_that_could_fit_on_first_page = 2; // Expected: 0 on first, 4 on second
$number_of_lines_on_second_page = $this->_pageBreak($orphans, $widows, $lines_in_paragraph, $lines_that_could_fit_on_first_page);
$this->assertEquals(4, $number_of_lines_on_second_page, "Unexpected number of lines on the second page");
}

}