Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[simple-api] simplified method for JNI bridges #221

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

NPavie
Copy link
Contributor

@NPavie NPavie commented Nov 20, 2023

  • Handle URIs in options
  • Return the job object for external monitoring
  • Generalize withArgument to allow the use of java objects and raw types instead of string in options
  • handle null arguments
  • when writing result, decode the idx to avoid url-encoded file names

@bertfrees
Copy link
Member

Generalize withArgument to allow the use of java objects and raw types instead of string in options

@NPavie I wonder why you need this. Also I'm not sure whether your change actually did what you intended: I don't see any Java objects provided to startJob() actually going into BoundScript.Builder without being first converted into strings. Or am I missing something? After all, script options are always strings. This is how the script API currently works. You currently can't pass an integer value as an int or a boolean value as a boolean.

Also note that CommandLineJobParser is intended to be a simplfied API on top of the script API for users who want to parse command line arguments. Command line arguments are strings, that is why SimpleAPI doesn't support non-String values for inputs and outputs either.

If you want to provide inputs or fetch outputs through other ways than through (string) paths, CommandLineJobParser is not your friend. For these cases it probably makes more sense to use BoundScript.Builder directly.

handle null arguments

Why do you need this? Actually I'd rather throw an IllegalArgumentException for null values, because null values are more likely to indicate an error than be intended.

Handle URIs in options

Why do you need this? The idea of CommandLineJobParser was to simplify the API for use cases where inputs are always files on the local file system. I'm not totally against making the API a bit more robust against wrong use though. But I'd like to understand your use case better. (For one thing, I don't know which options still exist today with type anyFileURI or anyDirURI and that are not inputs or outputs.)

when writing result, decode the idx to avoid url-encoded file names

Why do you do that? In which case did you get an URL-encoded idx? If this happens, that might be an error in the Pipeline itself (and not something we need to fix in SimpleAPI).

@@ -334,7 +358,7 @@ else if (file.list().length > 0)
if (file.isDirectory()) {
if (file.list().length > 0)
throw new IllegalArgumentException("Directory is not empty: " + file);
resultLocations.put(port, URI.create(file.toURI() + "/"));
resultLocations.put(port, file.toURI());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor Author

@NPavie NPavie Nov 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i recall, It could lead to invalid/malformed path on windows ending by "//" or "\/"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Shouldn't we instead change the

if (result.endsWith("/")) {

above to

if (result.endsWith("/") || result.endsWith("\\")) {

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I did not treat this in the results side (only the parameters initialization side was creating issues during developement of the connection between word addin and pipeline)

@NPavie
Copy link
Contributor Author

NPavie commented Nov 23, 2023

Generalize withArgument to allow the use of java objects and raw types instead of string in options

@NPavie I wonder why you need this. Also I'm not sure whether your change actually did what you intended: I don't see any Java objects provided to startJob() actually going into BoundScript.Builder without being first converted into strings. Or am I missing something? After all, script options are always strings. This is how the script API currently works. You currently can't pass an integer value as an int or a boolean value as a boolean.

Also note that CommandLineJobParser is intended to be a simplfied API on top of the script API for users who want to parse command line arguments. Command line arguments are strings, that is why SimpleAPI doesn't support non-String values for inputs and outputs either.

If you want to provide inputs or fetch outputs through other ways than through (string) paths, CommandLineJobParser is not your friend. For these cases it probably makes more sense to use BoundScript.Builder directly.

Oh Ok, I might need to change the plugin code in that case, or make an alternative entrypoint.
I use Java object because the JNI layer i use autmatically converts csharp raw types and objects to Java on the JVM side equivalent.
I can't use string directly without adding a lot of boilercode in the addin on chsarp side as Java toString and CSharp toString are different (like boolean tostring would produce 'False' on csharp, not 'false').
Using Java object directly on my side seemed the safest as to ensure the correct value is reported for the options at the end.
If you want i can revert back my changes on that and make an alternative copy of the changes in a "JNIJob" and "JNIJobParser" specifically for that.

handle null arguments

Why do you need this? Actually I'd rather throw an IllegalArgumentException for null values, because null values are more likely to indicate an error than be intended.

If i recall, at first i had "unspecified" options in some scripts, and i don't remember why but in that case it would produce "null" values for the options and it was making the script crash.
So i handled it just in case.

Handle URIs in options

Why do you need this? The idea of CommandLineJobParser was to simplify the API for use cases where inputs are always files on the local file system. I'm not totally against making the API a bit more robust against wrong use though. But I'd like to understand your use case better. (For one thing, I don't know which options still exist today with type anyFileURI or anyDirURI and that are not inputs or outputs.)

URI handling in the code was incorrect in the previous code, it would raised exception on windows, especially when trying to check if path is absolute.
I discovered that passing the URI directly to the file object without casting it to URI first was at the origin of the issue.

when writing result, decode the idx to avoid url-encoded file names

Why do you do that? In which case did you get an URL-encoded idx? If this happens, that might be an error in the Pipeline itself (and not something we need to fix in SimpleAPI).

When i checked, the idx values are URI encoded and you create a system path in the code (not an URI path):
for example if you have spaces in file names, they are written as %20 in the final result.
The "path" in the jobresult on the other end is correct, but i was not able to find in time the correct way to retrieve the internal job path as to extract and reuse the relative path of the result in it.

@bertfrees
Copy link
Member

So if I understand correctly you made the withArgument function accept any object because you want to use Java's Object.toString() method, so as to get string values that Pipeline will interpret correctly? That feels a bit of a hack to me. Accepting specific types such as integer, boolean, File and URI would be acceptable, but accepting any object and then converting it to string does not feel right.

Perhaps the real issue we need to solve is that Pipeline should be more relaxed about which values to accept. For example, a "boolean" value should currently be either "true", "false", "1" or "0". In addition we could allow also "True", "False", "TRUE", "FALSE", etc.

URI handling in the code was incorrect in the previous code, it would raised exception on windows, especially when trying to check if path is absolute. I discovered that passing the URI directly to the file object without casting it to URI first was at the origin of the issue.

I think an example would help. What kind of input is causing exceptions?

When i checked, the idx values are URI encoded and you create a system path in the code (not an URI path)

Aha I see. You're right. I think I would fix it as follows though:

File dest = new File(u.resolve(r.strip().getIdx()));

@NPavie
Copy link
Contributor Author

NPavie commented Nov 30, 2023

Hum i have rested the simple API from the current version and i can't reproduce the issue i had before when forwarding URIs.

I realised that the URI fix and most changes i made was based on the first version of the SimpleAPI and with an older JRE so it's possible the fixes i made was only needed with that version, and/or that some fixes like the absolute check for URI path was done at the JRE level.

I'll rework the PR to only provided the fixes that i think are really of use for (my) usage of the SimpleAPI through JNI :

  • the _startJob will now return the job object it creates
  • I'll propose a new static method called startMonitorableJob to start a command line job and return it for external monitoring
  • i'll add your proposed fix on file destination (tested, it works)

@NPavie NPavie changed the title [simple-api] fixes for usage through jni [simple-api] simplified method for JNI bridges Nov 30, 2023
@bertfrees
Copy link
Member

@NPavie So you prefer to use the JobMonitor object then the getNewMessages() convenience method? Then there is no reason to keep getNewMessages() and getLastJobStatus().

Regarding the new startMonitorableJob() method: I think it makes sense to return a CommandLineJob object from startJob() as well, especially if we drop getNewMessages() and getLastJobStatus(). Would it be OK if I replace the startMonitorableJob() method with a new static method to help with the correct serializing?

public static Map<String,Iterable<String>> stringifyOptions(Map<String,Object> options) { ... }

@bertfrees
Copy link
Member

Perhaps the real issue we need to solve is that Pipeline should be more relaxed about which values to accept. For example, a "boolean" value should currently be either "true", "false", "1" or "0". In addition we could allow also "True", "False", "TRUE", "FALSE", etc.

I still think this problem can also (at least partly) be solved by being less strict in the validation of option values. Can you show me what a typical option map coming from the Word add-in looks like?

@NPavie
Copy link
Contributor Author

NPavie commented Dec 13, 2023

@NPavie So you prefer to use the JobMonitor object then the getNewMessages() convenience method? Then there is no reason to keep getNewMessages() and getLastJobStatus().

Yes i prefer, mainly because i have a project in house to do some batch conversions from another C# project with possibly multiple jobs running.

Regarding the new startMonitorableJob() method: I think it makes sense to return a CommandLineJob object from startJob() as well, especially if we drop getNewMessages() and getLastJobStatus(). Would it be OK if I replace the startMonitorableJob() method with a new static method to help with the correct serializing?

public static Map<String,Iterable<String>> stringifyOptions(Map<String,Object> options) { ... }

Oh indeed it would be a good alternative !
I could use that to rely on the original startJob method while having the correction conversion of parameters from the C# JNI code.
i'll update the code on both side to do that.

I still think this problem can also (at least partly) be solved by being less strict in the validation of option values. Can you show me what a typical option map coming from the Word add-in looks like?

I'll try to make an example, but i remember i was facing 2 problems when i had the need of going with the more generic Map<String, Object> options type :

  • An issue with boolean value that are stringified as "True" or "False"
    (that is easy to fix on the addin side for this specifc type, but that made me worry about other types stringification)

  • Something that we did not mention, but there was a "Math equation" map as parameter of the word-to-dtbook script under development at some point that i was testing through the addin at the time.

I think the second point was the main issue that made me change the start function to accept more generic objects at the time.
But I realise just now that with the progress on the word-to-dtbook side with the ongoing port of preprocessing stage; this could not be an issue anymore (and if i recall, the map type for parameters was not ready yet to be used in calabash).

Anyway, i think the "stringifyOptions" approach will help ensure parameters are correctly stringified for usage of SimpleAPI Through JNI, while keeping the normal startJob behavior as targeted for command line usage.

- New stringifyOptions method to help convert options object value into compatible string values:
This method get a map of java objects and convert the values back to a Map of list of string values to ensure the parameters passed to the startJob method are well-formed.
- _startJob and startJob now returns the job it has created and launched for external monitoring.
- CommandLineJob has now a getNewMessages and a getErrors methods to help logging
  - the CommandLineJob getNewMessages method replaces the previous static SimpleAPI.getNewMessages method for finer per-job logging.
- Fix uri-encoded result file names.
@NPavie
Copy link
Contributor Author

NPavie commented Dec 14, 2023

@bertfrees I added a bit more documentation to the SimpleAPI and did the following:

  • Replaced the previous startMonitorableJob by a stringifyOptions method to get a map compatible with the startJob method
  • Made the startJob method to return a CommandLineJob object
  • Moved the getNewMessages, andgetErrors to the CommandLineJob object
  • Removed static getLastStatus in favour of the CommandLineJob.getStatus method
  • I kept the getMonitor with a small warning that it should be reserved for advance usage if needing a finer grained monitoring
    (I simplified a bit the API for my usage to get the new messages and the errors through a java function in the CommandLineJob object itself, but i'm planning to use the monitor getter in debug mode to get debug messages)
  • Updated the provided main to reflect the changes

It makes the API more CommandLineJob centric for logging and status checking.
This should ease the use of the API for JNI calls and also for batching if others want to use the pipeline through this API.

(And I also changed the job info messages output of the main to be on the standard output)

I'm thinking of making on the side a little C or C++ version of the main just to illustrate the usage through JNI, maybe in another repo for demo.

@bertfrees
Copy link
Member

Something that we did not mention, but there was a "Math equation" map as parameter of the word-to-dtbook script under development at some point that i was testing through the addin at the time.

I think there has been a bit of a misunderstanding. Calabash indeed supports options with any type of XDM value, and the word-to-dtbook step indeed had an option that accepted maps. However Pipeline scripts are an abstraction layer on top of XProc steps, and script options are unfortunately simple strings. It is of course also possible to create JNI bindings for the XProc API, and if I remember correctly we discussed that, and at one point I had even added it to the SimpleAPI class. But I changed my mind when I realized it was not a good idea. There is a reason we have the abstraction layer, and we should not try to work around it unless we have a very good reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants