Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for widows when determining the location of page breaks #1356

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

davidsickmiller
Copy link
Contributor

I took a shot at implementing the code for CSS widows. This was previously a "FIX ME" comment in the code.

While testing my new code, I noticed and corrected some issues in how whitespace in the HTML affected the page breaks. I'm not sure whether or not these problems were introduced by my widows changes.

@bsweeney bsweeney added this to the 0.7.1 milestone Jan 23, 2017
@bsweeney
Copy link
Member

Nice! I'll try to take a look for the next release.

@bsweeney bsweeney modified the milestones: 0.7.2, 0.7.1 Feb 2, 2017
@bsweeney bsweeney removed this from the 0.8.1 milestone May 21, 2017
@bsweeney bsweeney added this to the 0.8.4 milestone Feb 23, 2019
@bsweeney bsweeney added css and removed On Deck labels Feb 23, 2019
@bsweeney
Copy link
Member

@davidsickmiller you available to work on this PR. Since we took so long to review it's a bit out of date now. I'm happy to pull it in but if you have time to update that'd be awesome.

@bsweeney bsweeney modified the milestones: 0.8.4, dompdf-next Jan 2, 2020
$block_line_boxes = $block_parent->get_line_boxes();
$frame_line_number = 0;
$i = 1;
$first_node_on_line = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The $first_node_on_line variable doesn't seem to be used. I'm not sure why it's there so perhaps we can go ahead and remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable is indeed unused.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps was meant to be used at some point but never fully developed. I think removing the fist-line logic makes sense since it's not current used. This should be easy enough to re-implement if we need it later.

Mellthas added a commit to Mellthas/dompdf that referenced this pull request Mar 11, 2021
Mellthas added a commit to Mellthas/dompdf that referenced this pull request Jul 27, 2021
@Mellthas
Copy link
Member

Mellthas commented Jul 29, 2021

I think this looks good in general! The whitespace changes alone would be really nice to have: Before, page breaks before empty text nodes were not allowed, but they were still checked; and as they report their line height as the space needed, they could force a page break, which then would continue to search backwards in the tree for an element that allowed a page break (even though it would fit on the current page). With the changes, empty text nodes are not checked for page breaks, circumventing the problem with their non-zero reported height.

Comment on lines 560 to +566
if ($next = $iter->get_prev_sibling()) {
while ($next->is_text_node() && $next->get_node()->nodeValue == "") {
$next = $next->get_prev_sibling();
if (!$next)
break 2;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks with a whitespace node as first node, in which case it does not check for a parent in the following if block (because of the break 2).

Proposal:

            $next = $iter->get_prev_sibling();
            // Skip empty text nodes
            while ($next && $next->is_text_node() && $next->get_node()->nodeValue === "") {
                $next = $next->get_prev_sibling();
            }

            if ($next) {

@@ -357,14 +357,33 @@ protected function _page_break_allowed(Frame $frame)

// Rule C
$block_parent = $frame->find_block_parent();
if (count($block_parent->get_line_boxes()) < $frame->get_style()->orphans) {
$block_line_boxes = $block_parent->get_line_boxes();
$frame_line_number = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be initialized as $frame_line_number = count($block_line_boxes), basically as before.
Reason being that the frame might not have been added to the line boxes yet, because we are currently in the process of reflowing its children. This can definitely happen in the case of inline-block frames (which would become a concern with #2497, when line breaks before inline-block frames are enabled).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense. The line count will always be the current number of rendered lines. And if Dompdf has to walk back through previous siblings it should be able to get a valid line number.


$html = "<p>";
$filler = array();
$MAX_FILLER = 47; // Is there a better way to know how many lines fit on a page?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a fixed, and smaller page size together with a fixed line height should make this easier to handle.

Mellthas added a commit to Mellthas/dompdf that referenced this pull request Aug 1, 2021
Before, page breaks before empty text nodes were not allowed, but they
were still checked; and as they report their line height as the space
needed, they could force a page break, which then would continue to
search backwards in the tree for an element that allowed a page break
(even though it would fit on the current page).

Based on work in dompdf#1356.
Comment on lines +804 to +820
// Go back through and handle any page breaks
foreach ($this->_frame->get_children() as $child) {
// Skip children that are empty text nodes
if (!$child->is_text_node() ||
$child->get_node()->nodeValue != ""
) {
if ($page->check_page_break($child)) {
break;
}

// Bail out if the page is full
if ($page->is_full()) {
break;
}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is problematic with the current infrastructure: A page break in a child element (when reflowing the child) can now result in siblings before it staying on the original page, even though there is not enough space for them; they are checked after the page break has already occurred and so no page break is possible anymore.

Copy link
Member

@Mellthas Mellthas Aug 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To illustrate that see the following HTML:

<!DOCTYPE html>
<html>

<head>
<meta charset="UTF-8">
<style type="text/css">
@page {
	size: 400pt 300pt;
	margin: 50pt;
}

body {
	margin: 0;
}

.block {
	height: 200pt;
	background-color: red;
}
</style>
</head>

<body><div class="block"></div><div class="block"></div><div class="block"></div><div class="block"></div><div class="block"><div></div></div></body>

</html>

And the before and after rendering:
swallow-before.pdf
swallow-after.pdf

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not quite right but I think something along these lines is what we'll need to do in order to support widows. Otherwise we don't have a valid count of the number of lines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also fails if a page break is forced using page-break styling.

For sure to fully support this would require a deep rework of how the document is reflowed, e.g. reflow the content then page it instead of doing both at the same time, maybe followed by additional steps. Changing the process this way might allow us to implement some functionality that we might have trouble with otherwise. This would be a major re-work, though, so best to pursue in a future release.

@Mellthas
Copy link
Member

Mellthas commented Aug 1, 2021

As the change to the reflow process (see last review comment) is problematic, the PR in the current form is unfortunately less ready than I thought. This requires more thought on the interaction (and order) of reflow and page break.

I have distilled and fixed the changes about skipping empty text nodes into #2506. #2507 is just about fixing the orphans check, which would have been handled by this PR.

Mellthas added a commit to Mellthas/dompdf that referenced this pull request Aug 1, 2021
Before, page breaks before empty text nodes were not allowed, but they
were still checked; and as they report their line height as the space
needed, they could force a page break, which then would continue to
search backwards in the tree for an element that allowed a page break
(even though it would fit on the current page).

Based on work in dompdf#1356
Fixes dompdf#2468
@bsweeney
Copy link
Member

bsweeney commented Aug 27, 2021

I'm thinking @davidsickmiller is not available to work on this. I'm going to pull into another branch so that it can, ideally, be finalized for the next release. Or recreate the commits from a fresh branch of Develop, but I prefer to let people claim their work in the repo.

@bsweeney bsweeney changed the base branch from develop to widow-support August 27, 2021 01:31
@bsweeney bsweeney modified the milestones: dompdf-next, 1.1.0 Aug 27, 2021
@bsweeney bsweeney changed the base branch from widow-support to develop August 27, 2021 01:35
Mellthas added a commit to Mellthas/dompdf that referenced this pull request Aug 27, 2021
Before, page breaks before empty text nodes were not allowed, but they
were still checked; and as they report their line height as the space
needed, they could force a page break, which then would continue to
search backwards in the tree for an element that allowed a page break
(even though it would fit on the current page).

Based on work in dompdf#1356
Fixes dompdf#2468

Co-authored-by: David Sickmiller <david@sickmiller.com>
Mellthas added a commit to Mellthas/dompdf that referenced this pull request Aug 27, 2021
This effectively forbids line breaks before the first line.

Based on work in dompdf#1356
Fixes dompdf#1617

Co-authored-by: David Sickmiller <david@sickmiller.com>
@Mellthas Mellthas mentioned this pull request Aug 27, 2021
@Mellthas
Copy link
Member

I have added @davidsickmiller as co-author to #2506 and #2507, to preserve the attribution for these parts. If these commits are in, we should be able to rebase the widow-support branch on the develop branch, which should preserve the commit authorship. This should give a clean slate for further work on widows support. As this requires quite a bit more work in my understanding, potentially even some major refactors, I think it can safely be pushed to a later release.

@bsweeney bsweeney modified the milestones: 1.1.0, 1.1.1 Aug 27, 2021
bsweeney pushed a commit that referenced this pull request Sep 6, 2021
Before, page breaks before empty text nodes were not allowed, but they
were still checked; and as they report their line height as the space
needed, they could force a page break, which then would continue to
search backwards in the tree for an element that allowed a page break
(even though it would fit on the current page).

Based on work in #1356
Fixes #2468
Mellthas added a commit to Mellthas/dompdf that referenced this pull request Sep 13, 2021
This effectively forbids line breaks before the first line. Also, check
the orphans property on the block parent, as it only applies to block container
elements per spec.

Based on work in dompdf#1356
Fixes dompdf#1617

Co-authored-by: David Sickmiller <david@sickmiller.com>
Mellthas added a commit to Mellthas/dompdf that referenced this pull request Sep 14, 2021
This effectively forbids line breaks before the first line. Also, check
the orphans property on the block parent, as it only applies to block container
elements per spec.

Based on work in dompdf#1356
Fixes dompdf#1617

Co-authored-by: David Sickmiller <david@sickmiller.com>
liam-milbel added a commit to Milestone-Belanova/dompdf that referenced this pull request Sep 16, 2021
* Moving the PDF version to a `const`

* Remove unused import

* Coding standards: Enable more PSR-2 rules

Enable rules which should be uncontroversial and easy to fix.

* Apply coding-standards changes to files

* Travis: Also check tests dir for coding standards

* Update IMagick version check logic

fixes dompdf#1296

* Fix scoping issue for IMagick version check

* fix Trying to access array offset on value of type null (dompdf#2227)

* fix Trying to access array offset on value of type null

* space

* fix css selector issue

* Fall back to decimals for roman numerals, if out of range

Consistent with how web browsers do it.

See https://en.wikipedia.org/wiki/Roman_numerals#Standard_form

* Stylesheet: Rename and document `_image` method for clarity

* Stylesheet: Make `resolve_url` public to reduce code duplication

* FontMetrics: Fix text-width calculations when using different backends

Fixes issue dompdf#2403.

* FontMetrics: Use direct access for protected properties

* Extend tests for resolving URLs in stylesheets

Without `$sheet->set_base_path(__DIR__)`, the stylesheet location would
be the location of phpunit.

* Fix instantiating CPDF Adapter without Dompdf

* Avoid offset error in CPDF line trasparency logic

related to dompdf#2227

* Use strict comparison in CPDF transparency logic

* Stylesheet: Improve return-value check

* Remove temporary file created when caching an image. (dompdf#2411)

* Remove temporary file created when caching an image.
dompdf#2377

* move unlink() to exception catch block

* Fix text-width calulation with letter spacing (dompdf#2414)

* Consistent doc for `get_text_width` arguments

To clear up the confusion about whether word and char spacing can be
float.

* CPDF: Fix text-width calulation with letter spacing

The letter spacing was counted twice for every space character.

* PDFLib: Fix text-width calculation with letter spacing

See discussion in dompdf#2414.

* Use OS-indepenent file path for unit test

* BlockFrameReflower: Always resolve `auto` widths

* Fix undefined variable (dompdf#2466)

* Bump version to 0.8.6

* Reset version string to commit hash

* Bump version to 1.0.1

* Reset version string to commit hash

* Bump version to 1.0.2

* Reset version string to commit hash

* Fix undefined variable

* Fix indentation

* Fix undefined variable

* Update src/FontMetrics.php

* Update src/FontMetrics.php

Co-authored-by: Brian Sweeney <brian@eclecticgeek.com>

* Add support for `page-break-before/-after` on table rows

pr dompdf#2512:

* Fix some typos

* Update documentation text for `_page_break_allowed`

Mainly to clear up a few unclear points that have changed. The comment
about table headers does not apply anymore.

* Add support for `page-break-before/-after` on table rows

Caveat: `page-break-after: always` does not work if the row is the last
of its table row group. I want to keep things simple now; this might be
better fixed in a proper rework of the page-break-properties handling,
e.g. by properly propagating the property values as per
https://www.w3.org/TR/css-break-3/#break-propagation.

Fixes dompdf#329

* BlockFrameReflower: Fixes to height calculations (dompdf#2511)

Brings the code closer to the spec:
* Always resolve `auto` heights
* Apply `min-/max-height` independently of the `overflow` property
value
* `min-height` overrules `max-height` (swapping removed)

* Prevent case exception in text renderer (dompdf#2423)

In case of " auto " width is throws an error as following and breaks the PDF render.
ErrorException: A non-numeric value encountered in /Path_To_Project/vendor/dompdf/dompdf/src/Renderer/Text.php:158

type casting the $width to float will prevent the ErrorException

* Resolve auto margins in more cases (dompdf#2531)

Fixes dompdf#1482

* Properly skip checking for page breaks on empty text nodes (dompdf#2506)

Before, page breaks before empty text nodes were not allowed, but they
were still checked; and as they report their line height as the space
needed, they could force a page break, which then would continue to
search backwards in the tree for an element that allowed a page break
(even though it would fit on the current page).

Based on work in dompdf#1356
Fixes dompdf#2468

* Fix table column widths after page break (dompdf#2502)

Node attributes are always string, so when a table was laid out, and
then a page break occurred before the table at a later point, `$colspan`
and `$rowspan` would always be string values and fail the checks below,
most notably the check where the minimum column width is set, so that
all columns received `0` minimum width.

Fixes dompdf#2395

* Add support for any inline-positioned frames in text alignment

This notably adds support for inline-block.

Fixes dompdf#1761

* Fix `justify` text alignment with letter spacing

The old logic seemed to correct for wrong text-width calculation
involving letter spacing when using the CPDF backend. With the fix in
dompdf#2414 this is no longer needed.

* Improve support for relative positioning

* Move handling of relative positioning to after the reflow process, so
that it does not affect layout.
* Handle `right` and `bottom` properties plus `auto` values (except for
RTL language support, which is an easy FIXME).
* As a side-effet, this adds support for relative positioning on inline
and inline-block elements.

Fixes dompdf#516

* Fix logic error in `add_frame_to_line` function

The block must check its own content-box width, not the content-box
width of its containing block when placing child frames.

Fixes dompdf#2476.

* Apply line shift to justified lines with break

* Tweak Style border/outline method color handling

fixes dompdf#2367

* BlockFrameReflower: Fix handling of `100%` width

The condition is wrong, as an `auto` width can resolve to more than
100% if the minimum width of the content is larger.

* BlockFrameReflower: Remove swapping of `min-/max-width`

Per spec, `min-width` overrules `max-width`.

https://www.w3.org/TR/CSS21/visudet.html#min-max-widths

* Remove erroneous ‘degenerate’ case handling in `get_min_max_width`

If the frame has no children, it might still have a fixed width set.

Fixes dompdf#2366

* Small code cleanups (dompdf#2547)

* Clean up `get_margin_height/width` functions

For simplicity, return a `float` value consistently.
The `get_break_margins` method was unused.

* TextReflower: Small cleanup for `get_min_max_width`

Also, add type declarations to `_collapse_white_space`.

* Clean up `split` method signatures

Consistent argument names and more type declarations.

Co-authored-by: Thomas Landauer <thomas@landauer.at>
Co-authored-by: Till Berger <till@mellthas.de>
Co-authored-by: Brian Sweeney <brian@eclecticgeek.com>
Co-authored-by: Edwin Manuel Cerrón Angeles <xerron.angels@gmail.com>
Co-authored-by: Ion Bazan <ion.bazan@gmail.com>
Co-authored-by: bradbulger <1234634+bradbulger@users.noreply.github.com>
Co-authored-by: Jáchym Toušek <enumag@gmail.com>
Co-authored-by: Madi <mudasser.ismail@gmail.com>
bsweeney pushed a commit that referenced this pull request Sep 18, 2021
This effectively forbids line breaks before the first line. Also, check
the orphans property on the block parent, as it only applies to block container
elements per spec.

Based on work in #1356
Fixes #1617

Co-authored-by: David Sickmiller <david@sickmiller.com>
@bsweeney bsweeney changed the base branch from develop to master November 8, 2021 19:26
@bsweeney bsweeney modified the milestones: 2.0.1, 2.0.2 Aug 25, 2022
@bsweeney bsweeney modified the milestones: 2.0.2, 2.0.3 Dec 29, 2022
@bsweeney bsweeney modified the milestones: 3.0.1, 3.1.0 May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants