Skip to content

donyori/gocorenlp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

50 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

gocorenlp

Go Report Card Go Reference

A Go (Golang) client for Stanford CoreNLP server.

Installation

Similar to getting other Go libraries, run the go get command for the latest version:

go get github.com/donyori/gocorenlp@latest

Or specify a particular version (for example, v0.1.0):

go get github.com/donyori/gocorenlp@v0.1.0

For more information, see go get documentation here and here.

Usage

0. Prerequisites

We assume that you are familiar with the Stanford CoreNLP server. If not, see its official page.

Before using this library, you need to have Stanford CoreNLP installed or have a Stanford CoreNLP server available somewhere. To install Stanford CoreNLP, see its download page.


1. Start with the client package

The client package provides functionality to annotate your human language text with the CoreNLP server. See the documentation of package client for details.

Now we assume that you have already launched a CoreNLP server on 127.0.0.1:9000. (If your server is elsewhere, see Section 2.)

Here is a simple example of using this server to annotate the text

The quick brown fox jumped over the lazy dog.

package main

import (
	"fmt"

	"github.com/donyori/gocorenlp/client"
	"github.com/donyori/gocorenlp/model/v4.5.6-eb50467fa8e3/pb"
)

func main() {
	text := "The quick brown fox jumped over the lazy dog."
	annotators := "tokenize,ssplit,pos,lemma"

	// Specify the document model.
	// Depending on your CoreNLP version, use the appropriate model.
	// See package github.com/donyori/gocorenlp/model for details.
	doc := new(pb.Document)

	// Annotate the text with the specified annotators
	// and store the result in doc.
	err := client.AnnotateString(text, annotators, doc)
	if err != nil {
		panic(err) // handle error
	}

	// Print some annotation results.
	fmt.Println("Original text:", doc.GetText())
	fmt.Println("+--------+-----+--------+")
	fmt.Println("| Word   | POS | Lemma  |")
	fmt.Println("+--------+-----+--------+")
	for _, token := range doc.GetSentence()[0].GetToken() {
		fmt.Printf(
			"| %-7s| %-4s| %-7s|\n",
			token.GetWord(),
			token.GetPos(),
			token.GetLemma(),
		)
	}
	fmt.Println("+--------+-----+--------+")
}

It outputs:

Original text: The quick brown fox jumped over the lazy dog.
+--------+-----+--------+
| Word   | POS | Lemma  |
+--------+-----+--------+
| The    | DT  | the    |
| quick  | JJ  | quick  |
| brown  | JJ  | brown  |
| fox    | NN  | fox    |
| jumped | VBD | jump   |
| over   | IN  | over   |
| the    | DT  | the    |
| lazy   | JJ  | lazy   |
| dog    | NN  | dog    |
| .      | .   | .      |
+--------+-----+--------+

2. Create a client with custom settings

The previous example uses the default client, which can only connect to the CoreNLP server on 127.0.0.1:9000.

If you want to use a CoreNLP server elsewhere, you need to create a new client.

To create a client, use function New. The supported options are in struct Options.

Here is an example snippet:

c, err := client.New(&client.Options{
	Hostname:   "localhost", // Set the hostname here. If omitted, "127.0.0.1" is used.
	Port:       8080,        // Set the port number here. If omitted, 9000 is used.
	StatusPort: 8081,        // Set the port number of the status server here. If omitted, it is the same as Port.

	ClientTimeout: time.Second * 15,      // Set a timeout for each request here.
	Charset:       "utf-8",               // Set the charset of your text here. If omitted, "utf-8" is used.
	Annotators:    "tokenize,ssplit,pos", // Set the default annotators here.

	// Set the username and password here
	// if your server requires a basic auth.
	Username: "Alice",
	Password: "Alice's password",

	// If your server has a server ID
	// (i.e., server name, set by -server_id),
	// set it here.
	ServerID: "CoreNLPServer",
})
if err != nil {
	panic(err) // handle error
}
// Now you can work with the new client c.

3. Check status and stop server

You can check if the server is online (liveness) and ready to accept connections (readiness) using functions/methods Live and Ready:

if err := client.Live(); err == nil {
	fmt.Println("Server is live.")
} else {
	fmt.Println("Server is offline.")
}

if err := client.Ready(); err == nil {
	fmt.Println("Server is ready to accept connections.")
} else {
	fmt.Println("Server is not ready.")
}

In addition, you can shut down the server through the client.

To shut down a local server, use function Shutdown or client's method ShutdownLocal:

err := client.Shutdown()
if err == nil {
	fmt.Println("Server has been shut down.")
} else {
	panic(err) // handle error
}

To shut down a remote server, you need to provide the shutdown key and using client's method Shutdown. (Don't know what the shutdown key is? See here.)


4. Cache annotation results

You can cache the annotation results for future use. To do this, use functions/methods AnnotateRaw or AnnotateStringRaw.

Here is an example snippet to cache the annotation results in a local file:

// Create a file to save the annotation result.
filename := "./annotation.ann"
f, err := os.Create(filename)
if err != nil {
	panic(err) // handle error
}
defer f.Close()

// Annotate the text with the specified annotators
// and store the result in f.
_, err = client.AnnotateStringRaw(text, annotators, f)
if err != nil {
	panic(err) // handle error
}
// Then you can use the annotation by reading it from the file.

AnnotateRaw and AnnotateStringRaw output data without parsing. You can decode it into the document model by function DecodeResponseBody in our package model:

doc := new(pb.Document) // specify your document model
err := model.DecodeResponseBody(data, doc) // data is that output by AnnotateRaw or AnnotateStringRaw
if err != nil {
	panic(err) // handle error
}

If you put many annotation results together in a large file, you can decode them using ResponseBodyDecoder:

// Open your annotation result file.
filename := "./annotation.ann"
f, err := os.Open(filename)
if err != nil {
	panic(err) // handle error
}
defer f.Close()

// Create a ResponseBodyDecoder on it.
dec := model.NewResponseBodyDecoder(f)

var doc pb.Document // specify your document model
// Decode the annotation results until EOF.
for {
	err := dec.Decode(&doc)
	if err != nil {
		if errors.Is(err, io.EOF) {
			break
		}
		panic(err) // handle error
	}
	// Work with doc.
}

5. Annotation data model

The CoreNLP server provides several forms to present annotation results, such as JSON, XML, text, and serialized formats (including ProtoBuf). (See this page for details.)

Our client asks the CoreNLP server to serialize the results in Protocol Buffers (ProtoBuf).

At the current stage, we provide the models supporting CoreNLP 3.6.0, and 4.0.0 to 4.5.6. These models are organized into subpackages of model named in the form

github.com/donyori/gocorenlp/model/vX.Y.Z-abcdefabcdef/pb

where vX.Y.Z-abcdefabcdef means:

  • X.Y.Z is the version of CoreNLP.
  • abcdefabcdef is a 12-character prefix of the commit hash of the retrieved ProtoBuf file in the Stanford CoreNLP project.

For example, the model for CoreNLP 4.5.0 is in

github.com/donyori/gocorenlp/model/v4.5.0-45b47e245c36/pb

See the documentation of package model for details.

The following table shows the correspondence between models and CoreNLP versions:

Model Package CoreNLP Version
model/v3.6.0-29765338a2e8/pb 3.6.0
model/v4.0.0-2b3dd38abe00/pb 4.0.0
model/v4.1.0-a1427196ba6e/pb 4.1.0
model/v4.2.0-3ad83fc2e42e/pb 4.2.0
model/v4.2.1-d8d09b2c81a5/pb 4.2.1, 4.2.2
model/v4.3.0-f885cd198767/pb 4.3.0, 4.3.1, 4.3.2
model/v4.4.0-e90f30f13c40/pb 4.4.0
model/v4.5.0-45b47e245c36/pb 4.5.0, 4.5.1
model/v4.5.2-9c3dfee5af50/pb 4.5.2
model/v4.5.3-5250f9faf9f1/pb 4.5.3, 4.5.4
model/v4.5.5-f1b929e47a57/pb 4.5.5
model/v4.5.6-eb50467fa8e3/pb 4.5.6

(CoreNLP 4.2.1 and 4.2.2 use exactly the identical ProtoBuf model, as do 4.3.0, 4.3.1, 4.3.2, and 4.5.0, 4.5.1, and 4.5.3, 4.5.4.)

Our library also accepts the ProtoBuf model generated by you. You can retrieve the CoreNLP ProtoBuf file (edu.stanford.nlp.pipeline.CoreNLP.proto) from your CoreNLP JAR file (usually stanford-corenlp-X.Y.Z-sources.jar), edit it, and compile it to .go file. Then pass your Document struct to our API to make it work with your CoreNLP server:

doc := new(mymodel.Document) // mymodel.Document is the document struct compiled by you
err := client.AnnotateString(text, annotators, doc)
if err != nil {
	panic(err) // handle error
}

You can also retrieve the CoreNLP ProtoBuf file from its GitHub repository.

About how to compile ProtoBuf to Go, see this tutorial.


For more documentation about this library, see on pkg.go.dev.

License

The GNU Affero General Public License 3.0 (AGPL-3.0) - Yuan Gao. Please have a look at the LICENSE.

Contact

You can contact me by email: <donyoridoyodoyo@outlook.com>.

About

A Go (Golang) client for Stanford CoreNLP server.

Topics

Resources

License

Stars

Watchers

Forks

Languages