New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement support for widows when determining the location of page breaks #1356
base: master
Are you sure you want to change the base?
Conversation
Nice! I'll try to take a look for the next release. |
@davidsickmiller you available to work on this PR. Since we took so long to review it's a bit out of date now. I'm happy to pull it in but if you have time to update that'd be awesome. |
$block_line_boxes = $block_parent->get_line_boxes(); | ||
$frame_line_number = 0; | ||
$i = 1; | ||
$first_node_on_line = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The $first_node_on_line
variable doesn't seem to be used. I'm not sure why it's there so perhaps we can go ahead and remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable is indeed unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps was meant to be used at some point but never fully developed. I think removing the fist-line logic makes sense since it's not current used. This should be easy enough to re-implement if we need it later.
I think this looks good in general! The whitespace changes alone would be really nice to have: Before, page breaks before empty text nodes were not allowed, but they were still checked; and as they report their line height as the space needed, they could force a page break, which then would continue to search backwards in the tree for an element that allowed a page break (even though it would fit on the current page). With the changes, empty text nodes are not checked for page breaks, circumventing the problem with their non-zero reported height. |
if ($next = $iter->get_prev_sibling()) { | ||
while ($next->is_text_node() && $next->get_node()->nodeValue == "") { | ||
$next = $next->get_prev_sibling(); | ||
if (!$next) | ||
break 2; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks with a whitespace node as first node, in which case it does not check for a parent in the following if
block (because of the break 2
).
Proposal:
$next = $iter->get_prev_sibling();
// Skip empty text nodes
while ($next && $next->is_text_node() && $next->get_node()->nodeValue === "") {
$next = $next->get_prev_sibling();
}
if ($next) {
@@ -357,14 +357,33 @@ protected function _page_break_allowed(Frame $frame) | |||
|
|||
// Rule C | |||
$block_parent = $frame->find_block_parent(); | |||
if (count($block_parent->get_line_boxes()) < $frame->get_style()->orphans) { | |||
$block_line_boxes = $block_parent->get_line_boxes(); | |||
$frame_line_number = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be initialized as $frame_line_number = count($block_line_boxes)
, basically as before.
Reason being that the frame might not have been added to the line boxes yet, because we are currently in the process of reflowing its children. This can definitely happen in the case of inline-block frames (which would become a concern with #2497, when line breaks before inline-block frames are enabled).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense. The line count will always be the current number of rendered lines. And if Dompdf has to walk back through previous siblings it should be able to get a valid line number.
|
||
$html = "<p>"; | ||
$filler = array(); | ||
$MAX_FILLER = 47; // Is there a better way to know how many lines fit on a page? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a fixed, and smaller page size together with a fixed line height should make this easier to handle.
Before, page breaks before empty text nodes were not allowed, but they were still checked; and as they report their line height as the space needed, they could force a page break, which then would continue to search backwards in the tree for an element that allowed a page break (even though it would fit on the current page). Based on work in dompdf#1356.
// Go back through and handle any page breaks | ||
foreach ($this->_frame->get_children() as $child) { | ||
// Skip children that are empty text nodes | ||
if (!$child->is_text_node() || | ||
$child->get_node()->nodeValue != "" | ||
) { | ||
if ($page->check_page_break($child)) { | ||
break; | ||
} | ||
|
||
// Bail out if the page is full | ||
if ($page->is_full()) { | ||
break; | ||
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is problematic with the current infrastructure: A page break in a child element (when reflowing the child) can now result in siblings before it staying on the original page, even though there is not enough space for them; they are checked after the page break has already occurred and so no page break is possible anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To illustrate that see the following HTML:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<style type="text/css">
@page {
size: 400pt 300pt;
margin: 50pt;
}
body {
margin: 0;
}
.block {
height: 200pt;
background-color: red;
}
</style>
</head>
<body><div class="block"></div><div class="block"></div><div class="block"></div><div class="block"></div><div class="block"><div></div></div></body>
</html>
And the before and after rendering:
swallow-before.pdf
swallow-after.pdf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not quite right but I think something along these lines is what we'll need to do in order to support widows. Otherwise we don't have a valid count of the number of lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also fails if a page break is forced using page-break styling.
For sure to fully support this would require a deep rework of how the document is reflowed, e.g. reflow the content then page it instead of doing both at the same time, maybe followed by additional steps. Changing the process this way might allow us to implement some functionality that we might have trouble with otherwise. This would be a major re-work, though, so best to pursue in a future release.
As the change to the reflow process (see last review comment) is problematic, the PR in the current form is unfortunately less ready than I thought. This requires more thought on the interaction (and order) of reflow and page break. I have distilled and fixed the changes about skipping empty text nodes into #2506. #2507 is just about fixing the orphans check, which would have been handled by this PR. |
Before, page breaks before empty text nodes were not allowed, but they were still checked; and as they report their line height as the space needed, they could force a page break, which then would continue to search backwards in the tree for an element that allowed a page break (even though it would fit on the current page). Based on work in dompdf#1356 Fixes dompdf#2468
I'm thinking @davidsickmiller is not available to work on this. I'm going to pull into another branch so that it can, ideally, be finalized for the next release. Or recreate the commits from a fresh branch of Develop, but I prefer to let people claim their work in the repo. |
Before, page breaks before empty text nodes were not allowed, but they were still checked; and as they report their line height as the space needed, they could force a page break, which then would continue to search backwards in the tree for an element that allowed a page break (even though it would fit on the current page). Based on work in dompdf#1356 Fixes dompdf#2468 Co-authored-by: David Sickmiller <david@sickmiller.com>
This effectively forbids line breaks before the first line. Based on work in dompdf#1356 Fixes dompdf#1617 Co-authored-by: David Sickmiller <david@sickmiller.com>
I have added @davidsickmiller as co-author to #2506 and #2507, to preserve the attribution for these parts. If these commits are in, we should be able to rebase the |
Before, page breaks before empty text nodes were not allowed, but they were still checked; and as they report their line height as the space needed, they could force a page break, which then would continue to search backwards in the tree for an element that allowed a page break (even though it would fit on the current page). Based on work in #1356 Fixes #2468
This effectively forbids line breaks before the first line. Also, check the orphans property on the block parent, as it only applies to block container elements per spec. Based on work in dompdf#1356 Fixes dompdf#1617 Co-authored-by: David Sickmiller <david@sickmiller.com>
This effectively forbids line breaks before the first line. Also, check the orphans property on the block parent, as it only applies to block container elements per spec. Based on work in dompdf#1356 Fixes dompdf#1617 Co-authored-by: David Sickmiller <david@sickmiller.com>
* Moving the PDF version to a `const` * Remove unused import * Coding standards: Enable more PSR-2 rules Enable rules which should be uncontroversial and easy to fix. * Apply coding-standards changes to files * Travis: Also check tests dir for coding standards * Update IMagick version check logic fixes dompdf#1296 * Fix scoping issue for IMagick version check * fix Trying to access array offset on value of type null (dompdf#2227) * fix Trying to access array offset on value of type null * space * fix css selector issue * Fall back to decimals for roman numerals, if out of range Consistent with how web browsers do it. See https://en.wikipedia.org/wiki/Roman_numerals#Standard_form * Stylesheet: Rename and document `_image` method for clarity * Stylesheet: Make `resolve_url` public to reduce code duplication * FontMetrics: Fix text-width calculations when using different backends Fixes issue dompdf#2403. * FontMetrics: Use direct access for protected properties * Extend tests for resolving URLs in stylesheets Without `$sheet->set_base_path(__DIR__)`, the stylesheet location would be the location of phpunit. * Fix instantiating CPDF Adapter without Dompdf * Avoid offset error in CPDF line trasparency logic related to dompdf#2227 * Use strict comparison in CPDF transparency logic * Stylesheet: Improve return-value check * Remove temporary file created when caching an image. (dompdf#2411) * Remove temporary file created when caching an image. dompdf#2377 * move unlink() to exception catch block * Fix text-width calulation with letter spacing (dompdf#2414) * Consistent doc for `get_text_width` arguments To clear up the confusion about whether word and char spacing can be float. * CPDF: Fix text-width calulation with letter spacing The letter spacing was counted twice for every space character. * PDFLib: Fix text-width calculation with letter spacing See discussion in dompdf#2414. * Use OS-indepenent file path for unit test * BlockFrameReflower: Always resolve `auto` widths * Fix undefined variable (dompdf#2466) * Bump version to 0.8.6 * Reset version string to commit hash * Bump version to 1.0.1 * Reset version string to commit hash * Bump version to 1.0.2 * Reset version string to commit hash * Fix undefined variable * Fix indentation * Fix undefined variable * Update src/FontMetrics.php * Update src/FontMetrics.php Co-authored-by: Brian Sweeney <brian@eclecticgeek.com> * Add support for `page-break-before/-after` on table rows pr dompdf#2512: * Fix some typos * Update documentation text for `_page_break_allowed` Mainly to clear up a few unclear points that have changed. The comment about table headers does not apply anymore. * Add support for `page-break-before/-after` on table rows Caveat: `page-break-after: always` does not work if the row is the last of its table row group. I want to keep things simple now; this might be better fixed in a proper rework of the page-break-properties handling, e.g. by properly propagating the property values as per https://www.w3.org/TR/css-break-3/#break-propagation. Fixes dompdf#329 * BlockFrameReflower: Fixes to height calculations (dompdf#2511) Brings the code closer to the spec: * Always resolve `auto` heights * Apply `min-/max-height` independently of the `overflow` property value * `min-height` overrules `max-height` (swapping removed) * Prevent case exception in text renderer (dompdf#2423) In case of " auto " width is throws an error as following and breaks the PDF render. ErrorException: A non-numeric value encountered in /Path_To_Project/vendor/dompdf/dompdf/src/Renderer/Text.php:158 type casting the $width to float will prevent the ErrorException * Resolve auto margins in more cases (dompdf#2531) Fixes dompdf#1482 * Properly skip checking for page breaks on empty text nodes (dompdf#2506) Before, page breaks before empty text nodes were not allowed, but they were still checked; and as they report their line height as the space needed, they could force a page break, which then would continue to search backwards in the tree for an element that allowed a page break (even though it would fit on the current page). Based on work in dompdf#1356 Fixes dompdf#2468 * Fix table column widths after page break (dompdf#2502) Node attributes are always string, so when a table was laid out, and then a page break occurred before the table at a later point, `$colspan` and `$rowspan` would always be string values and fail the checks below, most notably the check where the minimum column width is set, so that all columns received `0` minimum width. Fixes dompdf#2395 * Add support for any inline-positioned frames in text alignment This notably adds support for inline-block. Fixes dompdf#1761 * Fix `justify` text alignment with letter spacing The old logic seemed to correct for wrong text-width calculation involving letter spacing when using the CPDF backend. With the fix in dompdf#2414 this is no longer needed. * Improve support for relative positioning * Move handling of relative positioning to after the reflow process, so that it does not affect layout. * Handle `right` and `bottom` properties plus `auto` values (except for RTL language support, which is an easy FIXME). * As a side-effet, this adds support for relative positioning on inline and inline-block elements. Fixes dompdf#516 * Fix logic error in `add_frame_to_line` function The block must check its own content-box width, not the content-box width of its containing block when placing child frames. Fixes dompdf#2476. * Apply line shift to justified lines with break * Tweak Style border/outline method color handling fixes dompdf#2367 * BlockFrameReflower: Fix handling of `100%` width The condition is wrong, as an `auto` width can resolve to more than 100% if the minimum width of the content is larger. * BlockFrameReflower: Remove swapping of `min-/max-width` Per spec, `min-width` overrules `max-width`. https://www.w3.org/TR/CSS21/visudet.html#min-max-widths * Remove erroneous ‘degenerate’ case handling in `get_min_max_width` If the frame has no children, it might still have a fixed width set. Fixes dompdf#2366 * Small code cleanups (dompdf#2547) * Clean up `get_margin_height/width` functions For simplicity, return a `float` value consistently. The `get_break_margins` method was unused. * TextReflower: Small cleanup for `get_min_max_width` Also, add type declarations to `_collapse_white_space`. * Clean up `split` method signatures Consistent argument names and more type declarations. Co-authored-by: Thomas Landauer <thomas@landauer.at> Co-authored-by: Till Berger <till@mellthas.de> Co-authored-by: Brian Sweeney <brian@eclecticgeek.com> Co-authored-by: Edwin Manuel Cerrón Angeles <xerron.angels@gmail.com> Co-authored-by: Ion Bazan <ion.bazan@gmail.com> Co-authored-by: bradbulger <1234634+bradbulger@users.noreply.github.com> Co-authored-by: Jáchym Toušek <enumag@gmail.com> Co-authored-by: Madi <mudasser.ismail@gmail.com>
I took a shot at implementing the code for CSS widows. This was previously a "FIX ME" comment in the code.
While testing my new code, I noticed and corrected some issues in how whitespace in the HTML affected the page breaks. I'm not sure whether or not these problems were introduced by my widows changes.