Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rebased, test WIP] i559 TexyServiceEngine #485

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

OndraZizka
Copy link
Contributor

@OndraZizka OndraZizka commented Jul 23, 2018

Here is a working extension of an MarkupEngine that calls a remote service to convert Texy syntax.

I would be glad if it got to JBake, so please tell me what would you like me to do with the code to meet JBake project's criteria.

Thanks :)

Edit:

  1. I am migrating to JBake from my own JTexy + Wicket + Wildfly + OpenShift based solution. So I have quite a few Texy files that I would prefer not to edit.
    However, the header is not actually needed. Everything either has a default (type, status), or can be extracted from the document or filesystem metadata (title, date).

    This PR contains changes that allow not to have a header at all, if the defaults are set.
    I will separate it from the TexyService changes and create 2nd PR, and clean up this one.

  2. I hope I don't annoy you with the multitude of issues and PRs. If so, let me know, I will throttle down :) On the other hand, I will do a batch of PRs and then phase out for a while unless you want some help with 3.0.

    Related to that - is there any document with the processes? Like, if I have a suspicion for a bug, should it be discussed first in Google Groups or can I file a bug right away (when confident enough)? Or, if I see a refactoring oportunity, should that go as an issue or jbake-dev post?

  3. Are the tests supposed to pass? It fails with

    Error resolving plugin [id: 'io.sdkman.vendors', version: '1.1.1', apply: false]

@coveralls
Copy link

coveralls commented Jul 23, 2018

Coverage Status

Coverage decreased (-1.3%) to 78.817% when pulling c3d1201 on OndraZizka:TexyServiceEngine into a402b14 on jbake-org:master.

Copy link
Member

@ancho ancho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. This is a very quick review. First things first. Thank you :)

It would be nice to cover the service and the new engine with tests. I added an inline comment. We need some kind of error handling, if the service is not available or does not respond as expected.
That's all I have to say for now.

In the long term we want to add new markup and template engines as extensions to the core.
To maintain them seperately and keep the core stable and clean.

But for now I think we should add them and seperate them out as soon as some kind of extension mechanism evolves.

String documentBody = context.getBody();

try (InputStream stream = new ByteArrayInputStream(documentBody.getBytes(StandardCharsets.UTF_8))){
TexyRestService texyService = new TexyRestService(new URL("http://localhost:8022/TexyService.rest.php"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a candidate for another configuration option. Maybe a check if the service is available and handle the error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config - sure, I am looking at how the config is propagated around.
In the end it will have around 8 config entries, which fine-tune Texy's behavior.

Check for availability - yes. I would basically just throw an exception and it should be up to JBake to tell the user that "couldn't convert document XY" + the reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can find the configuration in the ParserContext class.

@jonbullock
Copy link
Member

Wow lots of activity right now, apologies for the radio silence from me, I've been preparing for a JBake talk I'm giving tonight. Once that's out of the way I can properly reply to this and all the other activity.

@@ -32,6 +33,9 @@
public abstract class MarkupEngine implements ParserEngine {
private static final Logger LOGGER = LoggerFactory.getLogger(MarkupEngine.class);

public static final int MAX_HEADER_LINES = 50;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a PoC so this would also be a config.
However, I think I will put the MarkupEngine changes aside from this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jap. good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to limit changes that are needed for the TexyService thing. Please have a look at the failing tests and fix them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's in my plan. Sorry for the noise, I was getting familiar with JBake. Now this is a bit mess but I will split soon, perhaps during the weekend, and update (reduce) this PR.



//for (String line : contents) {
for (int i = 0; i < contents.size() && i < MAX_HEADER_LINES; i++) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some files which are thousands lines long. And JBake scans them 3 times. I think we can optimize a bit and apply some configurable limit.

@OndraZizka OndraZizka changed the title Request for comments TexyServiceEngine [PR cleanup in progress] TexyServiceEngine Jul 25, 2018
@jonbullock
Copy link
Member

In answer to your original questions, if you think you've found a bug please raise an issue. For enhancements I've tried to encourage users to suggest them on the mailing list first (via the contributors guide) but this doesn't tend to happen in practice and we seem to get more input on GH issues rather than anywhere else.

@jonbullock
Copy link
Member

I'm still catching up on activity right now.... but we do have some outstanding PR's that when merged will most likely break other PR's that have recently been raised so it might be an idea to throttle back on new PR's for the time being...

@OndraZizka
Copy link
Contributor Author

I'll rebase this to the new master, soon.

@OndraZizka OndraZizka changed the title [PR cleanup in progress] TexyServiceEngine [PR rebase and cleanup in progress] TexyServiceEngine Oct 29, 2018
@OndraZizka OndraZizka changed the title [PR rebase and cleanup in progress] TexyServiceEngine [PR rebase and cleanup in progress] i559 TexyServiceEngine Oct 29, 2018
@OndraZizka OndraZizka force-pushed the TexyServiceEngine branch 2 times, most recently from 7b6ad40 to 7fe542f Compare October 30, 2018 03:49
@OndraZizka OndraZizka changed the title [PR rebase and cleanup in progress] i559 TexyServiceEngine [rebased, WIP] i559 TexyServiceEngine Oct 30, 2018
@OndraZizka OndraZizka changed the title [rebased, WIP] i559 TexyServiceEngine [rebased, test WIP] i559 TexyServiceEngine Oct 30, 2018
@OndraZizka
Copy link
Contributor Author

OndraZizka commented Oct 30, 2018

Rebased, not tested. Will test soon.
I think I will also squash this to 1 commit and make a new PR. Rebasing was too painful :)

imports cleanup.
Final touches.
Extract the title from the Texy documents.
Improve hasHeader(): only scan first N lines, skip #... lines, skip blank lines, test against a regex; do not require status and type if defaults are set.
Add DebugUtil with map printing, since JBake code uses maps heavily
Wrap Crawler iteration into a try/catch
Start refactoring MarkupEngine so it supports files without a header.
Make DebugUtil more generic
Improve MarkupEngine: Support no header if all values are known; Improve validation; Refactor.
Extract title from Texy documents.
Implement RawMarkupEngine: Extract title from the HTML; Normalize HTML; Pretty print HTML; Export as XHTML; Change exported charset;
Introduce `input.charset`. Fix MarkupEngine - don't return headers map if the header separator is not found.
Allow .-_ in the header names
Refactor HtmlUtil#fixImageSourceUrls(). Keeps the same behavior.
Fix jbake-org#499 file names encoding.
Implements jbake-org#500  Make URL fixing optional jbake-org#500
Refactor createUri() and createNoExtensionUri() into one.
Make index creation bit more readable (just reorder)
Make index creation bit more readable (reuse the attrib name)
Refactor Crawler#crawlSourceFile() logic around updating cache flag.
Implement ContentStore#mergeDocument(Map<String, Object>) to update docs.
Implement Make "relative <img src> points to assets" optional jbake-org#502
Implement Make URL fixing optional jbake-org#500
These two are hitting the same code, so it's hard to split them.
MarkupUtil and RawMarkupUtil cleanup DebugUtil call
Force normalize HTML files if they contain <body>.
Rename vars
Allow deduplication of title autodetected from the document's header - mark that header with a CSS class.
Fix: Storing the altered DOM wrapped in <div> resulted in this <div> being serialized too. This removes it.
Make innerXml more robust.
@OndraZizka
Copy link
Contributor Author

OndraZizka commented Oct 31, 2018

Squashed. I might need some help with getting the tests to pass.

@ancho
Copy link
Member

ancho commented Oct 31, 2018

All right this review will take some time. Did you try to create a branch that first of all just integrates the new Engine you want to add?

It looks like there are changes which are addressed in different other PR's you originaly split out but no changed test case which reflects the new behavior.

I'm missing a test for the new Engine.

Don't know when I find some time to see what change exactly breaks the tests at the moment.


// If this is enabled, then this already happened in Crawler.
// TODO: Remove the fixing from Crawler.
// We should keep the pristine doc body as long as possible, or change it locally.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That's a thing we should discuss. But I don't think it it's necessary to change this behavior to introduce the new Engine. It really should be integrated into the rendering phase to produce alternative paths for index files for types, that live in a totally different location than the listed document of the specific type.

* Handle invalid or unavailable charset.
*/
private Charset getAsCharset(String key, Charset defaultCharset)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't mix styles. The unwritten convention is placing the bracket on the same line as the method definition or controll statement.

Yes:

if ( foo ) {
  return bar;
}

No:

if ( foo )
{
   return bar;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I will align the styles. I blame IDEA. Or my little finger twitching above enter :)


if (isRelative(source)) {
source = uri + source.replaceFirst("\\./", "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of this line causes a few tests to fail.
If it really is unnecessary to replace paths starting with "./" you need to change the tests accordingly.
https://github.com/jbake-org/jbake/blob/master/jbake-core/src/test/java/org/jbake/util/HtmlUtilTest.java#L53 for example.

Copy link
Contributor Author

@OndraZizka OndraZizka Nov 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ancho , that's something for discussion (WIP). I just rebased and din't have time to test. But IIRC, I needed some per-case way to turn on or off the site URL placing. The most natural way is to use such "neutral" prefix in src and give it a meaning.
I will finish this and comment here.

@@ -14,6 +14,7 @@
<logger name="org.eclipse" level="WARN"/>
<logger name="org.apache" level="WARN"/>
<logger name="org.jbake" level="INFO"/>
<logger name="org.jbake.parser.texy" level="DEBUG"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe INFO is better for production.

@OndraZizka
Copy link
Contributor Author

OndraZizka commented Nov 1, 2018

Regarding isolating a branch that only integrates Texy. That is possible. I had some branches which took some changes from this PR, and most were merged. After that, this become quite tedious to rebase.

Some things are still necessary, like getExtractTitleFromDoc() - Texy documents typically rely on that.

So I will take what's needed to make Texy markup work and do a smaller PR. ETA 2 to 4 weeks.

@jonbullock jonbullock added this to the v2.7.0 milestone Nov 16, 2018
@jonbullock jonbullock added this to PR needs further review/discussion in v2.7.0 Release May 4, 2021
@jonbullock jonbullock modified the milestones: v2.7.0, v2.8.0 May 25, 2021
@jonbullock jonbullock removed this from PR needs further review/discussion in v2.7.0 Release May 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants