White Space is Stripped from HTML #668

Open
linusbobcat opened this issue Jul 27, 2021 · 5 comments

Comments

@linusbobcat

It seems that Harp strips all whitespace indentation from the HTML when it serves or compiles web pages, although the blank lines where EJS tags used to be are, strangely, preserved.

This is typically a non-issue as compiled HTML isn't supposed to be directly interacted with. However, it's also removing all the indentation in my <pre> and <code> tags.

I noticed some commented out CLI flags to preserve white space and indentation, and while they don't point anywhere, I was wondering if it were possible to enable them somehow? I would rather not manually edit my HTML after compilation.

Additional details can be provided if necessary.

And regardless of everything, thank you for maintaining Harp.

@gbielskiqt

First of all, thank you for taking care of Harp!

We recently updated to the new Harp and unfortunately, this is a serious problem for us as we have many code examples with <pre> and <code> tags that need precise formatting to be copy-pasteable. Updating it manually is not really feasible without a significant time commitment.

Do you know what could have caused the issue and if there is some potential workaround?

@sintaxi
Owner

sintaxi commented Sep 20, 2021

Hmm, a bunch of redundant minification that didn't provide significant value to Harp got removed. Are you able to provide me with an example of one of your templates so I can have a look?

@linusbobcat
Author

linusbobcat commented Sep 21, 2021

This doesn't seem to be specific to particular layouts, templates, or EJS features.
Compiling the index.ejs shown under "Original" produces the output shown under "Compiled":

Original

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title>test</title>
    </head>
    <body>
        <h1>a header</h1>
        <p>a paragraph</p>
        <% if(locals.test) { %>
        <% }; %>
    <pre>
        <code>
p {
    font-size: 16px;
    font-family: sans-serif;
}
        </code>
    </pre>
    </body>
</html>

Compiled

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>test</title>
</head>
<body>
<h1>a header</h1>
<p>a paragraph</p>

<pre>
<code>
p {
font-size: 16px;
font-family: sans-serif;
}
</code>
</pre>
</body>
</html>

The EJS block is just a dummy, but notice that the indentation inside the pre and code tags has been stripped, and that a blank line is left where the EJS tags were.

To gbielskiqt: I used a very hacky workaround. I prefixed every line inside my code tags like so:

<pre>
<code>
@@ p {
@@    color: red;
@@ }
</code>
</pre>

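Then I ran a hacky sed script to remove the "@@" markers after compilation, something like so:

# Replace the first "@@" on each line with a single space, editing files in place.
# `-i ''` is the BSD/macOS form; GNU sed takes `-i` without the empty argument.
for FILE in ./writing/*.html; do
    sed -i '' -e 's/@@/ /' "$FILE"
done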

Obviously, if your actual code has lots of "@@" characters, use something else.
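
For anyone who cannot rely on sed, the same post-processing step can be sketched in Node. This is only a minimal sketch, not part of Harp: `stripMarkers` is a hypothetical helper name, and unlike the sed command above it removes the marker outright rather than replacing it with a space, so the marker should sit directly in front of the indentation you want to keep.

```javascript
// Hypothetical helper (not part of Harp): removes a leading marker from
// every line, keeping whatever indentation follows it.
function stripMarkers(html, marker = '@@') {
    // Escape the marker so it is matched literally inside the RegExp.
    const escaped = marker.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // `m` makes ^ match at each line start; `g` applies it to every line.
    return html.replace(new RegExp('^' + escaped, 'gm'), '');
}

const compiled = [
    '<pre>',
    '<code>',
    '@@p {',
    '@@    color: red;',
    '@@}',
    '</code>',
    '</pre>',
].join('\n');

console.log(stripMarkers(compiled));
```

To apply it to compiled pages, read each file with `fs.readFileSync`, pass the contents through `stripMarkers`, and write the result back with `fs.writeFileSync`. Only markers at the start of a line are touched, so an "@@" in the middle of a line of real code survives.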

@smnsc

smnsc commented Sep 28, 2021

I have the same issue, for very similar reasons: I often post <pre><code> snippets for easy copy/pasting.

I worked around it by adding some client-side post processing. I adapted a function I found on SO to re-format the HTML.

The function below takes an HTML string and returns it as a string of formatted HTML, which you can re-insert into your document.

/* Adapted from: https://stackoverflow.com/a/26361620/216104 */
function formatHtml(htmlString) {
    // Parse the string into a detached element so the DOM can be walked.
    var div = document.createElement('div');
    div.innerHTML = htmlString.trim();

    var format = function (node, level) {
        // Two spaces per nesting level: children get `level` units, and the
        // parent's closing tag gets one unit fewer (never less than zero).
        var indentBefore = new Array(level++ + 1).join('  '),
            indentAfter = new Array(level - 1).join('  '),
            textNode;

        for (var i = 0; i < node.children.length; i++) {
            // Put each child element on its own indented line.
            textNode = document.createTextNode('\n' + indentBefore);
            node.insertBefore(textNode, node.children[i]);

            format(node.children[i], level);

            // After the last child, break the line before the parent's closing tag.
            if (node.lastElementChild == node.children[i]) {
                textNode = document.createTextNode('\n' + indentAfter);
                node.appendChild(textNode);
            }
        }

        return node;
    };

    return format(div, 0).innerHTML;
}

@Prinzhorn
Contributor

Prinzhorn commented Oct 11, 2021

Took a quick look at it https://github.com/sintaxi/terraform/blob/cbd673212b246e76d64c33a43c9059625640e32c/lib/template/processors/ejs.js#L8

👀

Introduced here sintaxi/terraform@13bfd03#diff-7bc4d9d7c5ecce5be75d3e86cf54c1ba33481b10d78426d46290850f2ec17a9bR8

rmWhitespace Remove all safe-to-remove whitespace, including leading and trailing whitespace. It also enables a safer version of -%> line slurping for all scriptlet tags (it does not strip new lines of tags in the middle of a line).
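
To see why that option flattens <pre> blocks, its effect can be approximated with a plain string transform. This is only a sketch of the behaviour described above, assuming it amounts to trimming every line of the template text; `approximateRmWhitespace` is a hypothetical name, not an EJS function, and EJS's real implementation may differ in detail.

```javascript
// Hypothetical approximation of what rmWhitespace does to template text:
// collapse runs of newlines, then trim leading/trailing whitespace per line.
// It is not EJS's actual code and knows nothing about <pre> or <code>.
function approximateRmWhitespace(templateText) {
    return templateText
        .replace(/[\r\n]+/g, '\n')
        .replace(/^\s+|\s+$/gm, '');
}

const template = [
    '    <pre>',
    '        <code>',
    'p {',
    '    font-size: 16px;',
    '}',
    '        </code>',
    '    </pre>',
].join('\n');

console.log(approximateRmWhitespace(template));
```

Because the transform works line by line on the raw text, it cannot tell indentation that is significant (inside <pre>) from indentation that is only cosmetic, which matches the compiled output reported earlier in the thread.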
