Nuxeo - Talend integration #127

Open
mdutoo opened this issue Sep 25, 2012 · 5 comments
@mdutoo
Member

mdutoo commented Sep 25, 2012

Several solutions are possible, and some of them already work, so we need to decide how far to take this, depending on the partners' interests.

Talend - Nuxeo integration, Service-oriented :

This is a matter of telling Talend about the endpoints and letting it fetch their (versioned) definitions. When a service version goes up, developers should refresh & update the definition in the Talend job definition, fix any newly appeared inconsistencies and voilà.

Talend - Nuxeo integration, Data-oriented :

The first concern is making it easier for the Talend developer to map Nuxeo fields to other Talend-known fields. There are several solutions :

  • using Talend's ability to map XML (tXMLMap), and working with XML in and out of Nuxeo using the right Nuxeo XML services or even the bulk export's zipped XML. Limitation : users have to specify themselves (mainly using XPath) where the document boundaries are and which parts of the XML are data fields.
  • using Nuxeo services that expose Nuxeo data as service definitions that Talend can process, such as WSDLs, and then going back to tree data (XML) mapping. Limitations : this is still remote from the document-oriented paradigm, and some things could be easier to do (e.g. facets, ACLs, relations ?).
  • defining Talend-known fields the Nuxeo way, by parsing Nuxeo document type definitions and automatically generating Talend-known fields, which the user can then map easily. That's what I did for the Alfresco ETL Connector for Talend (see Talend doc and http://knowledge.openwide.fr/Main/AlfrescoETLConnector ) : I wrote a custom Talend Studio dialog box plugin (see source) that lets the user choose document definition files, and then generates Talend data fields out of the document fields. This allowed the user to import documents of only one type & field definitions, but at any path in the document tree (which may itself be the request of a first folder-creating job). The rest of the Alfresco connector Studio UI was done by merely writing the right plugin definition (see source) and letting Talend Studio generate the form out of it. Talking from Talend Studio to a remote Nuxeo's services is then done using a custom client library that uses those field definitions, along with the Talend data flow, to generate the right requests. This library then has to be provided to the Talend plugin's runtime and called from its .JET Java code templates (see runtime libraries as well as begin / main / end templates in source). This library is also the most easily testable part of it all (see mine), so put as much code as possible in it rather than generating it.
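To make the third option more concrete, here is a minimal sketch of generating Talend-style fields from a document schema. It assumes the schema is a flat XSD file; the sample schema, the `fields_from_xsd` helper, the `prefix:name` property path and the type mapping are all hypothetical and only illustrate the idea, they are not the Alfresco connector's actual code.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical flat schema, for illustration only.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="created" type="xs:date"/>
  <xs:element name="pages" type="xs:integer"/>
</xs:schema>"""

# Assumed mapping from XSD simple types to Talend column type ids.
XSD_TO_TALEND = {
    "xs:string": "id_String",
    "xs:date": "id_Date",
    "xs:integer": "id_Integer",
    "xs:boolean": "id_Boolean",
}


def fields_from_xsd(xsd_text, prefix):
    """Return one field description per top-level element of the schema."""
    root = ET.fromstring(xsd_text)
    fields = []
    for element in root.findall(XS + "element"):
        name = element.get("name")
        xsd_type = element.get("type", "xs:string")
        fields.append({
            # Column name shown to the Talend user.
            "name": f"{prefix}_{name}",
            # Assumed property path on the document side.
            "path": f"{prefix}:{name}",
            # Unknown types fall back to a plain string column.
            "type": XSD_TO_TALEND.get(xsd_type, "id_String"),
        })
    return fields


if __name__ == "__main__":
    for field in fields_from_xsd(SAMPLE_XSD, "dc"):
        print(field)
```

Keeping this logic in a plain library function, rather than in generated template code, is what makes it easy to unit test.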

In my experience (e.g. Alfresco ETL Connector), here are useful requirements for remote Nuxeo services to be used for Talend integration :

  • provide per-document commit and error return, to allow for iterative import (e.g. writing a first Job that works for 80% of the mass import, then tweaking it for the remaining part...). Calling Nuxeo once for each document to import does the job, but when scaling up there may be too many requests, in which case it should...
  • ...allow scalable per-request mass import of documents, each request importing a lot of documents (e.g. 100k of XML) and returning an import result code (success or error message) per document.
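The two requirements above could look roughly like this on the service side. This is only a sketch: `import_batch`, `ImportResult`, `failed_docs` and the `demo_import_one` callback are hypothetical names, and the real per-document commit would happen inside whatever import function Nuxeo provides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ImportResult:
    doc_id: str
    ok: bool
    error: Optional[str] = None


def import_batch(docs: List[Dict], import_one: Callable[[Dict], None]) -> List[ImportResult]:
    """Import every document of one request, reporting success or error per document."""
    results = []
    for doc in docs:
        try:
            # Each document is imported (and committed) on its own,
            # so one failure does not abort the rest of the batch.
            import_one(doc)
            results.append(ImportResult(doc["id"], True))
        except Exception as exc:
            results.append(ImportResult(doc["id"], False, str(exc)))
    return results


def failed_docs(docs: List[Dict], results: List[ImportResult]) -> List[Dict]:
    """Select the documents to replay once the Job has been tweaked."""
    failed = {r.doc_id for r in results if not r.ok}
    return [d for d in docs if d["id"] in failed]


def demo_import_one(doc: Dict) -> None:
    # Hypothetical stand-in for the real import call.
    if not doc.get("title"):
        raise ValueError("missing title")


if __name__ == "__main__":
    batch = [{"id": "a", "title": "A"}, {"id": "b"}]
    outcome = import_batch(batch, demo_import_one)
    print(outcome)
    print(failed_docs(batch, outcome))
```

The per-document result list is what allows the iterative workflow: a second Job only needs to process what `failed_docs` returns.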
@ghost ghost assigned tiry Sep 25, 2012
@tiry
Member

tiry commented Nov 26, 2012

Raw XML Mapping

We could use CoreIO to manage the XML data interface with Nuxeo.

This should work easily; however, it would be more of a data-level integration than a service-level integration :

  • document level granularity
  • limited to import/export

SOAP

Using the SOAP web services exposed by Nuxeo is an option.

However :

Using Automation

As for any WebService issue in Nuxeo, I would prefer to use Automation, which provides adaptability of both the API and its granularity.

Inside Nuxeo Automation there is a list of well-known types :

  • document
  • lists of documents
  • blobs
  • JSON Objects
  • ...

And for each operation we have a definition of :

  • operation name
  • operation input type
  • operation output type
  • operation parameters

Ideally, when using Talend I would like to be able to call any Automation Chain or Automation Operation that is exposed by Nuxeo.
Then, depending on the Input/Output of the target Operation/Chain, I can do the mapping inside Talend.

On the Talend side, naively I would say that it requires :

  • fetching the Operation definitions from Nuxeo
  • building a dynamic configuration dialog box
  • calling the Operation with the right parameters
  • processing the result

NB : If needed, we can easily extend the default Automation Marshaling to provide XML instead of JSON.
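The steps listed above can be sketched without any network access, starting from one operation definition that has already been fetched. The JSON shape of `SAMPLE_OPERATION`, the `doc:/...` input reference and the `input` / `params` request body are assumptions made for illustration, and `dialog_fields` and `build_request` are hypothetical helpers, not part of any existing client.

```python
import json

# Assumed shape of one operation definition as returned by the server.
SAMPLE_OPERATION = {
    "id": "Document.Create",
    "signature": ["document", "document"],  # assumed: input type, output type
    "params": [
        {"name": "type", "type": "string", "required": True},
        {"name": "name", "type": "string", "required": False},
    ],
}


def dialog_fields(operation):
    """Describe the form fields a dynamic configuration dialog would show."""
    return [(p["name"], p["type"], p["required"]) for p in operation["params"]]


def build_request(operation, input_ref, values):
    """Validate user-supplied values against the definition and build the body."""
    known = {p["name"] for p in operation["params"]}
    unknown = sorted(set(values) - known)
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    missing = [p["name"] for p in operation["params"]
               if p["required"] and p["name"] not in values]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    # Assumed request body layout: an input reference plus named parameters.
    return json.dumps({"input": input_ref, "params": values}, sort_keys=True)


if __name__ == "__main__":
    print(dialog_fields(SAMPLE_OPERATION))
    print(build_request(SAMPLE_OPERATION, "doc:/default-domain", {"type": "File"}))
```

Driving both the dialog and the validation from the same definition means that a new or changed Operation would need no connector code change, only a refresh of the definitions.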

@tiry
Member

tiry commented Nov 26, 2012

As already discussed, I think the best option would be to have 1 Nuxeo dev + 1 Talend dev working together for 1 or 2 days to define what can be done.

@tiry
Member

tiry commented Dec 10, 2012

An initial Connector was committed here : https://github.com/tiry/nuxeo-talend-components.

For now this code is in a Sandbox because :

  • an import connector is still needed (I'll work on this ASAP)
  • some improvements are needed on Nuxeo side (see https://jira.nuxeo.com/browse/NXP-10615)
  • I would like to make the connector code better but I need help on that (if there are known ways of doing that)
    • how to avoid committing jars (and simply fetch them via maven) ?
    • how to do unit tests ?
    • how to configure the schema mapper without having to "hack" using the comment field for XPath ?

Of course, it would be even better to have a "custom UI" where we can select an Operation and display parameters according to the selected Operation.

=> This would also give us a kind of playground to test the Automation API from within TOS

Tiry

@mdutoo
Member Author

mdutoo commented Dec 11, 2012

Nice !

Cédric, any suggestions regarding Thierry's questions about connector best practices ?

About avoiding the comment field "hack" for XPath : I'd say have another field of type list, of the same size as the number of rows (arguments), and store each XPath there at the same row index. A custom UI would allow resizing & filling it transparently.
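As a rough illustration of that parallel list, the sketch below keeps column names and XPath expressions in two lists of the same size and reads one row out of a document's XML. The sample XML layout, the column names and the `extract_row` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Schema columns and their XPath expressions, matched by row index.
COLUMNS = ["title", "creator"]
XPATHS = [
    "./schema[@name='dublincore']/title",
    "./schema[@name='dublincore']/creator",
]

# Hypothetical exported document, for illustration only.
SAMPLE_DOCUMENT = (
    '<document id="42">'
    '<schema name="dublincore"><title>Report</title><creator>alice</creator></schema>'
    '</document>'
)


def extract_row(document_xml, columns, xpaths):
    """Build one data row by evaluating the XPath stored at each column's index."""
    if len(columns) != len(xpaths):
        raise ValueError("columns and xpaths must have the same size")
    root = ET.fromstring(document_xml)
    row = {}
    for column, xpath in zip(columns, xpaths):
        node = root.find(xpath)
        # A missing node yields an empty (None) value rather than an error.
        row[column] = node.text if node is not None else None
    return row


if __name__ == "__main__":
    print(extract_row(SAMPLE_DOCUMENT, COLUMNS, XPATHS))
```

The size check is the part a custom UI would take care of transparently by resizing the XPath list whenever the schema changes.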

About TOS Automation playground : now I guess it would be much more work... Being able to generate the right Talend data schema (and XPath list field) from your Automation Operation definitions would be the first step. From there, TOS itself could be used as a "smart playground-like client".

@tiry
Member

tiry commented Dec 11, 2012

I just added support for RecordSet objects in Automation, so I should be able to leverage this from Talend.
I'll try to work on that tomorrow.
