[rebased, test WIP] i559 TexyServiceEngine #485

OndraZizka · 2018-07-23T16:56:09Z

Here is a working extension of a MarkupEngine that calls a remote service to convert Texy syntax.

I would be glad if it got into JBake, so please tell me what you would like me to do with the code to meet the JBake project's criteria.

Thanks :)

Edit:

I am migrating to JBake from my own JTexy + Wicket + Wildfly + OpenShift based solution. So I have quite a few Texy files that I would prefer not to edit.
However, the header is not actually needed. Everything either has a default (type, status), or can be extracted from the document or filesystem metadata (title, date).

This PR contains changes that allow a document to have no header at all, if the defaults are set.
I will separate it from the TexyService changes and create 2nd PR, and clean up this one.
I hope I don't annoy you with the multitude of issues and PRs. If so, let me know, I will throttle down :) On the other hand, I will do a batch of PRs and then phase out for a while unless you want some help with 3.0.
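
To make the header-less case above concrete, here is a minimal sketch of how missing header values could be filled in. It is not the code from this PR: the class `HeaderDefaults`, the method `fillMissing`, and the choice of the file name as the title fallback are assumptions made for illustration only.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

class HeaderDefaults {

    /**
     * Returns a copy of the parsed header with missing entries filled in.
     * Values already present in the header always win.
     * Hypothetical helper, not part of JBake.
     */
    static Map<String, Object> fillMissing(Map<String, Object> header,
                                           Map<String, Object> defaults,
                                           String fileName,
                                           long lastModifiedMillis) {
        Map<String, Object> result = new HashMap<>(header);

        // type and status: fall back to the configured defaults, if any.
        for (String key : new String[] {"type", "status"}) {
            if (!result.containsKey(key) && defaults.containsKey(key)) {
                result.put(key, defaults.get(key));
            }
        }

        // title: assumed fallback is the file name without its extension.
        if (!result.containsKey("title")) {
            int dot = fileName.lastIndexOf('.');
            result.put("title", dot > 0 ? fileName.substring(0, dot) : fileName);
        }

        // date: fall back to the file's last-modified time.
        if (!result.containsKey("date")) {
            result.put("date", new Date(lastModifiedMillis));
        }
        return result;
    }
}
```

With defaults of `type=post` and `status=published`, an empty header for `my-post.texy` would end up with all four values set, while a document that declares its own `title` keeps it.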

Related to that - is there any document describing the processes? Like, if I suspect a bug, should it be discussed first in Google Groups or can I file a bug right away (when confident enough)? Or, if I see a refactoring opportunity, should that go as an issue or a jbake-dev post?
Are the tests supposed to pass? The build fails with

Error resolving plugin [id: 'io.sdkman.vendors', version: '1.1.1', apply: false]

coveralls · 2018-07-23T17:01:50Z

Coverage decreased (-1.3%) to 78.817% when pulling c3d1201 on OndraZizka:TexyServiceEngine into a402b14 on jbake-org:master.

ancho

Ok. This is a very quick review. First things first. Thank you :)

It would be nice to cover the service and the new engine with tests. I added an inline comment. We need some kind of error handling, if the service is not available or does not respond as expected.
That's all I have to say for now.

In the long term we want to add new markup and template engines as extensions to the core, so that we can maintain them separately and keep the core stable and clean.

But for now I think we should add them and separate them out as soon as some kind of extension mechanism evolves.

ancho · 2018-07-24T06:17:28Z

jbake-core/src/main/java/org/jbake/parser/texy/TexyServiceEngine.java

+        String documentBody = context.getBody();
+
+        try (InputStream stream = new ByteArrayInputStream(documentBody.getBytes(StandardCharsets.UTF_8))){
+            TexyRestService texyService = new TexyRestService(new URL("http://localhost:8022/TexyService.rest.php"));


This seems like a candidate for another configuration option. Maybe a check if the service is available and handle the error?

Config - sure, I am looking at how the config is propagated around.
In the end it will have around 8 config entries, which fine-tune Texy's behavior.

Check for availability - yes. I would basically just throw an exception and it should be up to JBake to tell the user that "couldn't convert document XY" + the reason.

You can find the configuration in the ParserContext class.
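
A rough sketch of the two points discussed in this thread: reading the service URL from configuration instead of hard-coding it, and failing with a clear message when the service is unreachable. `java.util.Properties` stands in for the JBake configuration here, and the key `texy.service.url`, the class `TexyServiceSettings`, and both method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Properties;

class TexyServiceSettings {

    // Hypothetical configuration key; not an existing JBake option.
    static final String SERVICE_URL_KEY = "texy.service.url";
    // Default taken from the hard-coded URL in the diff above.
    static final String DEFAULT_SERVICE_URL = "http://localhost:8022/TexyService.rest.php";

    /** Reads the service URL from the configuration, falling back to the default. */
    static URL resolveServiceUrl(Properties config) {
        String raw = config.getProperty(SERVICE_URL_KEY, DEFAULT_SERVICE_URL);
        try {
            return new URL(raw);
        } catch (MalformedURLException e) {
            throw new IllegalStateException("Invalid " + SERVICE_URL_KEY + ": " + raw, e);
        }
    }

    /** Throws with the document name and the reason if the service cannot be used. */
    static void requireAvailable(URL serviceUrl, String documentName) {
        try {
            HttpURLConnection connection = (HttpURLConnection) serviceUrl.openConnection();
            connection.setConnectTimeout(2000);
            connection.setReadTimeout(2000);
            int status = connection.getResponseCode();
            connection.disconnect();
            // Assumption: only server-side errors count as "unavailable".
            if (status >= 500) {
                throw new IllegalStateException("Couldn't convert document " + documentName
                        + ": Texy service at " + serviceUrl + " returned HTTP " + status);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Couldn't convert document " + documentName
                    + ": Texy service at " + serviceUrl + " is not reachable", e);
        }
    }
}
```

The engine would call `requireAvailable` once before converting and let the exception propagate, so JBake can report which document failed and why.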

jonbullock · 2018-07-24T08:20:03Z

Wow lots of activity right now, apologies for the radio silence from me, I've been preparing for a JBake talk I'm giving tonight. Once that's out of the way I can properly reply to this and all the other activity.

OndraZizka · 2018-07-24T17:09:22Z

jbake-core/src/main/java/org/jbake/parser/MarkupEngine.java

@@ -32,6 +33,9 @@
 public abstract class MarkupEngine implements ParserEngine {
    private static final Logger LOGGER = LoggerFactory.getLogger(MarkupEngine.class);

+    public static final int MAX_HEADER_LINES = 50;


This is a PoC so this would also be a config.
However, I think I will put the MarkupEngine changes aside from this PR.

Yep, good idea.

Try to limit the changes to those needed for the TexyService work. Please have a look at the failing tests and fix them.

That's in my plan. Sorry for the noise, I was getting familiar with JBake. Right now this is a bit of a mess, but I will split it soon, perhaps during the weekend, and update (reduce) this PR.

OndraZizka · 2018-07-24T17:11:15Z

jbake-core/src/main/java/org/jbake/parser/MarkupEngine.java

+
+
+        //for (String line : contents) {
+        for (int i = 0; i < contents.size() && i < MAX_HEADER_LINES; i++) {


I have some files which are thousands of lines long, and JBake scans them 3 times. I think we can optimize a bit and apply some configurable limit.
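
As an illustration of the limit, a bounded scan for the header separator could look roughly like the sketch below. The class `HeaderScan` and the method `findHeaderSeparator` are hypothetical; only `MAX_HEADER_LINES` and the loop bound come from the diff, and the separator is passed in rather than assumed.

```java
import java.util.List;

class HeaderScan {

    // Same constant as in the diff; in the final version this would be configurable.
    static final int MAX_HEADER_LINES = 50;

    /**
     * Returns the index of the header separator line, or -1 if it does not
     * occur within the first maxLines lines. Lines beyond the limit are never read.
     */
    static int findHeaderSeparator(List<String> contents, String separator, int maxLines) {
        int limit = Math.min(contents.size(), maxLines);
        for (int i = 0; i < limit; i++) {
            if (contents.get(i).trim().equals(separator)) {
                return i;
            }
        }
        return -1;
    }
}
```

The trade-off is that a document whose header is longer than the limit is treated as having no header, which is why the limit should be configurable.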

jonbullock · 2018-08-01T12:28:17Z

In answer to your original questions, if you think you've found a bug please raise an issue. For enhancements I've tried to encourage users to suggest them on the mailing list first (via the contributors guide) but this doesn't tend to happen in practice and we seem to get more input on GH issues rather than anywhere else.

jonbullock · 2018-08-01T12:30:27Z

I'm still catching up on activity right now.... but we do have some outstanding PR's that when merged will most likely break other PR's that have recently been raised so it might be an idea to throttle back on new PR's for the time being...

OndraZizka · 2018-10-29T20:55:04Z

I'll rebase this to the new master, soon.

OndraZizka · 2018-10-30T04:04:11Z

Rebased, not tested. Will test soon.
I think I will also squash this to 1 commit and make a new PR. Rebasing was too painful :)

imports cleanup.
Final touches.
Extract the title from the Texy documents.
Improve hasHeader(): only scan first N lines, skip #... lines, skip blank lines, test against a regex; do not require status and type if defaults are set.
Add DebugUtil with map printing, since JBake code uses maps heavily
Wrap Crawler iteration into a try/catch
Start refactoring MarkupEngine so it supports files without a header.
Make DebugUtil more generic
Improve MarkupEngine: Support no header if all values are known; Improve validation; Refactor.
Extract title from Texy documents.
Implement RawMarkupEngine: Extract title from the HTML; Normalize HTML; Pretty print HTML; Export as XHTML; Change exported charset; Introduce `input.charset`.
Fix MarkupEngine - don't return headers map if the header separator is not found.
Allow .-_ in the header names
Refactor HtmlUtil#fixImageSourceUrls(). Keeps the same behavior.
Fix jbake-org#499 file names encoding.
Implements jbake-org#500 Make URL fixing optional jbake-org#500
Refactor createUri() and createNoExtensionUri() into one.
Make index creation bit more readable (just reorder)
Make index creation bit more readable (reuse the attrib name)
Refactor Crawler#crawlSourceFile() logic around updating cache flag.
Implement ContentStore#mergeDocument(Map<String, Object>) to update docs.
Implement Make "relative <img src> points to assets" optional jbake-org#502 Implement Make URL fixing optional jbake-org#500 These two are hitting the same code, so it's hard to split them.
MarkupUtil and RawMarkupUtil cleanup
DebugUtil call
Force normalize HTML files if they contain <body>.
Rename vars
Allow deduplication of title autodetected from the document's header - mark that header with a CSS class.
Fix: Storing the altered DOM wrapped in <div> resulted in this <div> being serialized too. This removes it.
Make innerXml more robust.

OndraZizka · 2018-10-31T16:25:36Z

Squashed. I might need some help with getting the tests to pass.

ancho · 2018-10-31T23:24:31Z

All right this review will take some time. Did you try to create a branch that first of all just integrates the new Engine you want to add?

It looks like this includes changes that are addressed in other PRs you originally split out, but no test case was changed to reflect the new behavior.

I'm missing a test for the new Engine.

I don't know when I will find the time to see which change exactly breaks the tests at the moment.

ancho · 2018-10-31T23:34:38Z

jbake-core/src/main/java/org/jbake/app/Oven.java

+
+        // If this is enabled, then this already happened in Crawler.
+        // TODO: Remove the fixing from Crawler.
+        //       We should keep the pristine doc body as long as possible, or change it locally.


Yes. That's a thing we should discuss. But I don't think it's necessary to change this behavior to introduce the new Engine. It really should be integrated into the rendering phase to produce alternative paths for index files for types that live in a totally different location than the listed document of the specific type.

ancho · 2018-10-31T23:38:50Z

jbake-core/src/main/java/org/jbake/app/configuration/DefaultJBakeConfiguration.java

+     * Handle invalid or unavailable charset.
+     */
+    private Charset getAsCharset(String key, Charset defaultCharset)
+    {


Please don't mix styles. The unwritten convention is to place the opening bracket on the same line as the method definition or control statement.

Yes:

if ( foo ) {
    return bar;
}

No:

if ( foo )
{
    return bar;
}

Ok I will align the styles. I blame IDEA. Or my little finger twitching above enter :)
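
For reference, a charset lookup along the lines of the `getAsCharset` Javadoc ("Handle invalid or unavailable charset"), written with the brace on the same line, might look like the sketch below. The method in the PR takes only `key` and `defaultCharset`; the `Map` parameter and the class name `CharsetLookup` are stand-ins so the example is self-contained.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Map;

class CharsetLookup {

    /**
     * Looks up a charset name under the given key and resolves it.
     * Falls back to defaultCharset if the value is missing, syntactically
     * invalid, or names a charset this JVM does not support.
     */
    static Charset getAsCharset(Map<String, String> config, String key, Charset defaultCharset) {
        String name = config.get(key);
        if (name == null || name.trim().isEmpty()) {
            return defaultCharset;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Invalid name or unsupported charset: keep going with the default.
            return defaultCharset;
        }
    }
}
```

Whether to fall back silently or log a warning is a design choice; the sketch falls back silently to stay short.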

ancho · 2018-10-31T23:48:35Z

jbake-core/src/main/java/org/jbake/util/HtmlUtil.java


-            if (isRelative(source)) {
-                source = uri + source.replaceFirst("\\./", "");


The removal of this line causes a few tests to fail.
If it really is unnecessary to replace paths starting with "./" you need to change the tests accordingly.
https://github.com/jbake-org/jbake/blob/master/jbake-core/src/test/java/org/jbake/util/HtmlUtilTest.java#L53 for example.

Hi @ancho, that's something for discussion (WIP). I just rebased and didn't have time to test. But IIRC, I needed some per-case way to turn the site URL placing on or off. The most natural way is to use such a "neutral" prefix in src and give it a meaning.
I will finish this and comment here.
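
To make the discussion concrete, here is a small sketch of what the removed lines do, combined with an on/off switch in the spirit of "Make URL fixing optional". It is not the `HtmlUtil` code: the class `ImageSourceFixer`, the `fixingEnabled` flag, and this `isRelative` check are simplified, hypothetical stand-ins.

```java
class ImageSourceFixer {

    // Simplified stand-in for the isRelative() check used in the diff.
    static boolean isRelative(String source) {
        return !source.startsWith("/") && !source.contains("://");
    }

    /**
     * Prefixes a relative image source with the document URI, dropping a
     * leading "./" as the removed lines did. Returns the source unchanged
     * when fixing is disabled or the source is not relative.
     */
    static String fixSource(String uri, String source, boolean fixingEnabled) {
        if (!fixingEnabled || !isRelative(source)) {
            return source;
        }
        return uri + source.replaceFirst("\\./", "");
    }
}
```

With fixing enabled, `./first.jpg` and `first.jpg` both resolve against the document URI; with it disabled, the source is left as written.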

ancho · 2018-10-31T23:50:06Z

jbake-dist/src/dist/lib/logging/logback.xml

@@ -14,6 +14,7 @@
    <logger name="org.eclipse" level="WARN"/>
    <logger name="org.apache" level="WARN"/>
    <logger name="org.jbake" level="INFO"/>
+    <logger name="org.jbake.parser.texy" level="DEBUG"/>


Maybe INFO is better for production.

OndraZizka · 2018-11-01T00:16:42Z

Regarding isolating a branch that only integrates Texy: that is possible. I had some branches which took some changes from this PR, and most were merged. After that, this became quite tedious to rebase.

Some things are still necessary, like getExtractTitleFromDoc() - Texy documents typically rely on that.

So I will take what's needed to make Texy markup work and do a smaller PR. ETA 2 to 4 weeks.
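
As a rough idea of what title extraction from a converted document involves, the sketch below pulls the text of the first `<h1>` out of an HTML string. This is an assumption about the approach, not the PR's `getExtractTitleFromDoc()` logic; the class `TitleExtractor` is hypothetical, and a regex is used only to keep the example dependency-free (a real implementation would more likely use an HTML parser).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleExtractor {

    // First <h1>...</h1> element, case-insensitive, allowing attributes and line breaks.
    private static final Pattern FIRST_H1 =
            Pattern.compile("<h1\\b[^>]*>(.*?)</h1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the text of the first h1 with nested tags stripped, if there is one. */
    static Optional<String> extractTitle(String html) {
        Matcher matcher = FIRST_H1.matcher(html);
        if (!matcher.find()) {
            return Optional.empty();
        }
        String text = matcher.group(1).replaceAll("<[^>]+>", "").trim();
        return text.isEmpty() ? Optional.empty() : Optional.of(text);
    }
}
```

An empty result would then fall through to whatever default or file-based title is configured.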

ancho reviewed Jul 24, 2018

OndraZizka commented Jul 24, 2018

OndraZizka changed the title ~~Request for comments TexyServiceEngine~~ [PR cleanup in progress] TexyServiceEngine Jul 25, 2018

OndraZizka changed the title ~~[PR cleanup in progress] TexyServiceEngine~~ [PR rebase and cleanup in progress] TexyServiceEngine Oct 29, 2018

OndraZizka mentioned this pull request Oct 29, 2018

Support for Texy markup language #559

Open

OndraZizka changed the title ~~[PR rebase and cleanup in progress] TexyServiceEngine~~ [PR rebase and cleanup in progress] i559 TexyServiceEngine Oct 29, 2018

OndraZizka force-pushed the TexyServiceEngine branch 2 times, most recently from 7b6ad40 to 7fe542f Compare October 30, 2018 03:49

OndraZizka changed the title ~~[PR rebase and cleanup in progress] i559 TexyServiceEngine~~ [rebased, WIP] i559 TexyServiceEngine Oct 30, 2018

OndraZizka changed the title ~~[rebased, WIP] i559 TexyServiceEngine~~ [rebased, test WIP] i559 TexyServiceEngine Oct 30, 2018

OndraZizka added 6 commits October 31, 2018 17:24

Fixes after grand rebase

fe97417

Add .../out/ to gitignore

5172c3a

Make javadoc lint happy

9bbf009

Don't require output.html.charset, use UTF-8

c7a3b5d

Fix bugs revealed by tests

b95f8d0

OndraZizka force-pushed the TexyServiceEngine branch from fd6e6b9 to b95f8d0 Compare October 31, 2018 16:25

ancho reviewed Oct 31, 2018

jonbullock added enhancement templates labels Nov 16, 2018

jonbullock added this to the v2.7.0 milestone Nov 16, 2018

jonbullock added this to PR needs further review/discussion in v2.7.0 Release May 4, 2021

jonbullock modified the milestones: v2.7.0, v2.8.0 May 25, 2021

jonbullock removed this from PR needs further review/discussion in v2.7.0 Release May 25, 2021
