
SparkNLP 996 - Introducing Phi-2 #14178

Draft
prabod wants to merge 23 commits into base: master
Conversation

prabod (Contributor) commented Feb 19, 2024

Description

Introducing Phi-2
Phi-2 is a Transformer with 2.7 billion parameters. It was trained on the same data sources as Phi-1.5, augmented with a new data source consisting of various synthetic NLP texts and filtered websites (selected for safety and educational value). On benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 demonstrated nearly state-of-the-art performance among models with fewer than 13 billion parameters.
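The PR title and commit history indicate that Phi-2 is implemented as a causal LM for text generation, reusing the LLAMA2-style generation path with beam-search support. As a conceptual illustration only (not Spark NLP's actual Generate.scala implementation), here is a minimal, self-contained beam-search decoder over a toy next-token distribution; the vocabulary and probabilities are invented for the example:

```python
import math

# Toy next-token probabilities over a 4-token vocabulary, keyed by the
# previous token only (None = start of sequence). This is an invented
# stand-in for the scores a causal LM such as Phi-2 would produce.
VOCAB = ["<eos>", "a", "b", "c"]
PROBS = {
    None: [0.05, 0.50, 0.30, 0.15],
    "a":  [0.10, 0.20, 0.50, 0.20],
    "b":  [0.50, 0.20, 0.10, 0.20],
    "c":  [0.25, 0.25, 0.25, 0.25],
}

def beam_search(beam_size=2, max_len=5):
    """Return the highest-scoring token sequence ending in <eos> (or at max_len)."""
    beams = [([], 0.0)]          # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            last = seq[-1] if seq else None
            for tok, p in zip(VOCAB, PROBS[last]):
                candidates.append((seq + [tok], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            if seq[-1] == "<eos>":
                finished.append((seq, score))   # hypothesis is complete
            elif len(beams) < beam_size:
                beams.append((seq, score))      # keep top-k open hypotheses
        if not beams:
            break
    finished.extend(beams)                      # fall back to unfinished beams
    return max(finished, key=lambda c: c[1])[0]

print(beam_search(beam_size=2))  # beam search with 2 hypotheses
print(beam_search(beam_size=1))  # beam of 1 is equivalent to greedy decoding
```

With beam_size=1 this reduces to greedy decoding, which in this toy example commits to the locally best first token "a" and misses the globally more probable sequence starting with "b" — the behavior that motivates carrying beam-search support in the shared generation code.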

Screenshots (if appropriate):

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

jiamaozheng and others added 23 commits February 6, 2024 12:55
* SPARKNLP-942: MPNetForSequenceClassification

* SPARKNLP-942: MPNetForQuestionAnswering

* SPARKNLP-942: MPNet Classifiers Documentation

* Restore RobertaforQA bugfix
* introducing LLAMA2

* Added option to read model from model path to onnx wrapper

* Added option to read model from model path to onnx wrapper

* updated text description

* LLAMA2 python API

* added method to save onnx_data

* added position ids

* - updated Generate.scala to accept onnx tensors
- added beam search support for LLAMA2

* updated max input length

* updated python default params
changed test to slow test

* fixed serialization bug
* Added retrieval interface to the doc sim rank approach

* Added Python interface as retriever in doc sim ranker

---------

Co-authored-by: Stefano Lori <s.lori@izicap.com>
* adding code

* adding notebook for import

---------

Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
…4155)

* Added Scala code for M2M100

* Documentation for scala code

* Python API for M2M100

* added more tests for scala

* added tests for python

* added pretrained

* rewording

* fixed serialization bug

* fixed serialization bug

---------

Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
Some annotators might have different naming schemes
for their files. Added a parameter to control this.
…bs/spark-nlp into release/530-release-candidate
#14167)

* [SPARKNLP-940] Adding changes to correctly copy cluster index storage when defined

* [SPARKNLP-940] Moving local mode control to its right place

* [SPARKNLP-940] Refactoring sentToCLuster method
@prabod prabod changed the title from "Sparknlp 996 implement phi as casual lm similar to llama 2 for text generation phi 2" to "SparkNLP 996 - Introducing Phi-2" Feb 19, 2024
@prabod prabod self-assigned this Feb 19, 2024
@prabod prabod added the new-feature (Introducing a new feature), new model, and DON'T MERGE (Do not merge this PR) labels Feb 19, 2024
Labels
DON'T MERGE (Do not merge this PR), new model, new-feature (Introducing a new feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants