
Sharing pipeline configuration between Python (train env) and Java (prod env) #6315

Answered by maziyarpanahi
mwunderlich asked this question in Q&A

Hi @mwunderlich

Spark NLP natively extends the Spark ML Pipeline; in that spirit, every model or PipelineModel is saved together with its metadata (both default and explicitly set parameters).

Regardless of where they are trained/saved and where they are loaded (Python, Scala, Java, or R), the metadata saved for each stage (annotator) is loaded alongside it, and the pipeline behaves exactly the same.

They look something like this:

  • metadata for the whole pipeline
{"class":"org.apache.spark.ml.PipelineModel","timestamp":1632168876633,"sparkVersion":"3.0.2","uid":"RECURSIVE_PIPELINE_b04dd1c887aa","paramMap":{"stageUids":["document_811d40a38b24","SENTENCE_ce56851acebe","REGEX_TOKENIZER_78daa3b4692f","SPELL_79c88…
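Because this metadata is plain JSON, any loader (PySpark or a JVM application) can reconstruct the same stages in the same order. A minimal sketch, using only the Python standard library, of reading the fields shown in the record above; the JSON here is illustrative, with the stage list shortened to uids taken from the snippet rather than a complete real pipeline:

```python
import json

# Illustrative metadata record shaped like the PipelineModel metadata above.
# The stage uid list is shortened; a real saved pipeline contains all stages.
metadata_json = """
{
  "class": "org.apache.spark.ml.PipelineModel",
  "timestamp": 1632168876633,
  "sparkVersion": "3.0.2",
  "uid": "RECURSIVE_PIPELINE_b04dd1c887aa",
  "paramMap": {
    "stageUids": ["document_811d40a38b24",
                  "SENTENCE_ce56851acebe",
                  "REGEX_TOKENIZER_78daa3b4692f"]
  }
}
"""

metadata = json.loads(metadata_json)

# Any loader, regardless of language, reads the same fields:
print(metadata["class"])                  # Spark ML class to instantiate
print(metadata["sparkVersion"])           # Spark version used when saving
print(metadata["paramMap"]["stageUids"])  # ordered stage (annotator) uids
```

This is why a pipeline trained and saved from Python can be loaded unchanged in a Java production environment with `PipelineModel.load(path)`: the loader resolves `class` to the right implementation and restores each stage's parameters from its own metadata file.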

Answer selected by mwunderlich