This repository has been archived by the owner on May 14, 2020. It is now read-only.
Kenny Bastani edited this page Aug 26, 2014 · 2 revisions

Graphify

Graphify is a Neo4j unmanaged extension that provides plug-and-play natural language text classification.

Graphify gives you a mechanism to train natural language parsing models that extract features of a text using deep learning. When training models, you send text documents or sentences extracted from a document and provide a set of labels that the text belongs to. Over time the natural language parsing model in Neo4j will grow to identify those features that optimally disambiguate a text.

Feature Hierarchy

The feature hierarchy is generated probabilistically from a statistical analysis of the words neighboring a feature. This makes it possible to recognize a large set of features in test data by eliminating possibilities at each layer.

The lowest level representation of a feature is closest to the root pattern. In the case of Graphify, the root pattern is a space character. As training increases the number of examples that match the space character, deeper levels of representations will be generated by choosing features with the highest probability of being matched to the left or right of a feature.
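The idea of extending a feature with its most probable left or right neighbor can be illustrated with a toy sketch. This is not Graphify's implementation: it works on whole words rather than patterns, and the function name `most_likely_neighbor` is hypothetical.

```python
from collections import Counter


def most_likely_neighbor(sentences, feature):
    """Return (side, word, probability) for the word seen most often
    directly to the left or right of `feature`, or None if it never occurs.

    Toy illustration only; the real extension grows patterns inside Neo4j.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word != feature:
                continue
            if i > 0:
                counts[("left", words[i - 1])] += 1
            if i + 1 < len(words):
                counts[("right", words[i + 1])] += 1
    if not counts:
        return None
    (side, neighbor), hits = counts.most_common(1)[0]
    return side, neighbor, hits / sum(counts.values())


examples = [
    "graph database stores data",
    "a graph database scales",
    "the graph is large",
]
best = most_likely_neighbor(examples, "graph")
```

In this sketch the winning neighbor would become the next, deeper level of the hierarchy under the feature it extends.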

An advantage of using Neo4j for this is that each feature can be tagged with the labels of the training text it matched.

Features Have Classes

Using the 3D visualization tool UbiGraph, the feature hierarchy can be rendered to show how deep feature representations grow over time.

Training the feature hierarchy

Vector Space Model

Graphify generates a Vector Space Model when classifying text on test data. There are two endpoints that provide classification and similarity features.

Classify unlabeled text

The first endpoint is http://localhost:7474/service/graphify/classify, which supports the HTTP POST method. When you post the following JSON model, the value of the text property is compared against the feature vectors of all previously trained classes, and the classes are sorted by the cosine similarity between these vectors.

{
    "text": "Interoperability is the ability of making systems and organizations work together."
}
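A minimal sketch of how such a request could be built with the Python standard library is shown here. The `Content-Type` header and the helper name `build_classify_request` are assumptions; the source only specifies the URL, the POST method, and the JSON model.

```python
import json
import urllib.request

CLASSIFY_URL = "http://localhost:7474/service/graphify/classify"


def build_classify_request(text, url=CLASSIFY_URL):
    """Build (but do not send) a POST request carrying the JSON model."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the endpoint expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_classify_request(
    "Interoperability is the ability of making systems and organizations work together."
)

# To send it against a running Neo4j server with Graphify installed:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```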

The result returned from Neo4j is a list of matches sorted by the cosine similarity between the feature vector of the text and the feature vector of each class in the database.

{
    "classes": [
        {
            "class": "Interoperability",
            "similarity": 0.01478629324290398
        },
        {
            "class": "Natural language",
            "similarity": 0.014352533094325508
        },
        {
            "class": "Artificial intelligence",
            "similarity": 0.008389954131481638
        },
        {
            "class": "Graph database",
            "similarity": 0.006780234851792194
        },
        {
            "class": "Inference engine",
            "similarity": 0.005775135975571818
        },
        {
            "class": "Neo4j",
            "similarity": 0.005011493979094744
        },
        {
            "class": "Expert system",
            "similarity": 0.0045493507614881076
        },
        {
            "class": "Knowledge representation and reasoning",
            "similarity": 0.0035488311479422202
        },
        {
            "class": "Speech recognition",
            "similarity": 0.0035459146405026746
        },
        {
            "class": "Knowledge acquisition",
            "similarity": 0.0033585907499658666
        },
        {
            "class": "Memory",
            "similarity": 0.003286652624915932
        },
        {
            "class": "Cognitive robotics",
            "similarity": 0.0026605991849062826
        },
        {
            "class": "Hierarchical control system",
            "similarity": 0.0024852750266223995
        },
        {
            "class": "NoSQL",
            "similarity": 0.002359964627061625
        },
        {
            "class": "Hierarchical database model",
            "similarity": 0.0016629332691377717
        },
        {
            "class": "Never-Ending Language Learning",
            "similarity": 0.0014433749914281816
        },
        {
            "class": "Multilayer perceptron",
            "similarity": 0.0014070718231579983
        },
        {
            "class": "Sentence (linguistics)",
            "similarity": 0.0012682029230640021
        },
        {
            "class": "Argument",
            "similarity": 0.0012446298877431268
        },
        {
            "class": "Deep learning",
            "similarity": 0.0011171501184315629
        },
        {
            "class": "Inductive reasoning",
            "similarity": 0.0010671296082781958
        },
        {
            "class": "Machine translation",
            "similarity": 0.0010150803638098256
        },
        {
            "class": "Automatic Language Translator",
            "similarity": 0.001008811074376599
        },
        {
            "class": "Relational database",
            "similarity": 0.0009875922800915275
        },
        {
            "class": "Storage (memory)",
            "similarity": 0.000980910572273953
        },
        {
            "class": "Clause",
            "similarity": 0.0009355842513276578
        },
        {
            "class": "Dependency grammar",
            "similarity": 0.0006764745128168179
        },
        {
            "class": "Autoencoder",
            "similarity": 0.0005224831369792641
        },
        {
            "class": "Phrase",
            "similarity": 0.00029583989661492754
        }
    ]
}
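The ranking above relies on cosine similarity between feature vectors. The following sketch shows that calculation on sparse vectors stored as dictionaries. The vector representation and the function names are assumptions for illustration; how Graphify weights its features is not described here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {feature: weight}."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_classes(text_vector, class_vectors):
    """Return matches sorted by descending similarity, shaped like the response."""
    scored = [
        {"class": name, "similarity": cosine_similarity(text_vector, vector)}
        for name, vector in class_vectors.items()
    ]
    return sorted(scored, key=lambda match: match["similarity"], reverse=True)


# Hypothetical feature weights, not taken from a real Graphify database.
ranking = rank_classes(
    {"graph": 1.0, "database": 1.0},
    {
        "Speech recognition": {"audio": 1.0},
        "NoSQL": {"database": 1.0},
        "Graph database": {"graph": 1.0, "database": 1.0},
    },
)
```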

Get similar classes

To get the classes most similar to a given class, use the endpoint http://localhost:7474/service/graphify/similar/{class}, where {class} is the name of a class that was provided as a label during training. Again, this uses a vector space model generated from the hierarchy of features mined in the pattern recognition tree.

The result is a sorted list of classes ordered by the cosine similarity of each of the feature vectors associated with a class.

For example, issuing an HTTP GET request to the endpoint http://localhost:7474/service/graphify/similar/NoSQL returns the following results:

{
    "classes": [
        {
            "class": "Graph database",
            "similarity": 0.09574535643836013
        },
        {
            "class": "Relational database",
            "similarity": 0.07991318266439677
        },
        {
            "class": "Machine translation",
            "similarity": 0.07693041732140395
        },
        {
            "class": "Deep learning",
            "similarity": 0.07027180553561777
        },
        {
            "class": "Speech recognition",
            "similarity": 0.06491846260229797
        },
        {
            "class": "Knowledge representation and reasoning",
            "similarity": 0.061825794099321346
        },
        {
            "class": "Artificial intelligence",
            "similarity": 0.059426927894936345
        },
        {
            "class": "Multilayer perceptron",
            "similarity": 0.056943365042175544
        },
        {
            "class": "Hierarchical database model",
            "similarity": 0.05617955585333319
        },
        {
            "class": "Interoperability",
            "similarity": 0.05541367925131132
        },
        {
            "class": "Memory",
            "similarity": 0.05514558364443694
        },
        {
            "class": "Expert system",
            "similarity": 0.04869202636766413
        },
        {
            "class": "Inductive reasoning",
            "similarity": 0.04542968846354395
        },
        {
            "class": "Argument",
            "similarity": 0.04473621436021445
        },
        {
            "class": "Clause",
            "similarity": 0.03686385050753761
        },
        {
            "class": "Dependency grammar",
            "similarity": 0.035584209032388084
        },
        {
            "class": "Sentence (linguistics)",
            "similarity": 0.03329025076397098
        },
        {
            "class": "Inference engine",
            "similarity": 0.031225512897898145
        },
        {
            "class": "Neo4j",
            "similarity": 0.03101280823703653
        },
        {
            "class": "Storage (memory)",
            "similarity": 0.02979918393661567
        },
        {
            "class": "Hierarchical control system",
            "similarity": 0.028800749676585427
        },
        {
            "class": "Autoencoder",
            "similarity": 0.02527201414259688
        },
        {
            "class": "Cognitive robotics",
            "similarity": 0.023697018076748396
        },
        {
            "class": "Never-Ending Language Learning",
            "similarity": 0.021246276238820964
        },
        {
            "class": "Phrase",
            "similarity": 0.019941608021991825
        },
        {
            "class": "Natural language",
            "similarity": 0.019809613865907624
        },
        {
            "class": "Automatic Language Translator",
            "similarity": 0.017520049172816868
        },
        {
            "class": "Knowledge acquisition",
            "similarity": 0.01264614704679436
        }
    ]
}
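Many class names contain spaces or punctuation, so a client will likely need to encode the name before placing it in the URL path. The sketch below assumes the server accepts standard percent-encoding, which the source does not confirm. The helper names `similar_url` and `top_matches` are hypothetical.

```python
import json
from urllib.parse import quote

SIMILAR_URL = "http://localhost:7474/service/graphify/similar/"


def similar_url(class_name, base=SIMILAR_URL):
    """Build the GET URL for a class name, percent-encoding unsafe characters."""
    return base + quote(class_name, safe="")


def top_matches(response_text, limit=3):
    """Return the first `limit` (class, similarity) pairs from a JSON response."""
    classes = json.loads(response_text)["classes"]
    return [(match["class"], match["similarity"]) for match in classes[:limit]]


# A shortened copy of the response shown above.
sample_response = """
{
    "classes": [
        {"class": "Graph database", "similarity": 0.09574535643836013},
        {"class": "Relational database", "similarity": 0.07991318266439677}
    ]
}
"""
matches = top_matches(sample_response, limit=1)
```

Because the server already returns the list sorted, `top_matches` only needs to slice it.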

Training

The training endpoint is located at http://localhost:7474/service/graphify/training. To train the model, issue an HTTP POST request to this endpoint with the following model:

{
    "text": [
        "Interoperability is the ability of making systems and organizations work together."
    ],
    "label": [
        "Interoperability"
    ]
}

Features are learned through repetition. The more training text that contains similar phrases (n-grams), the more likely those features are to be extracted and associated with any classes contained in prior training data.
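A training call could be prepared along these lines. This is a sketch under assumptions: the `Content-Type` header and the helper name `build_training_request` are not from the source, which only gives the URL, the POST method, and the `text` and `label` arrays.

```python
import json
import urllib.request

TRAINING_URL = "http://localhost:7474/service/graphify/training"


def build_training_request(texts, labels, url=TRAINING_URL):
    """Build (but do not send) a POST request with the training model."""
    body = json.dumps({"text": list(texts), "label": list(labels)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the endpoint expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


training_request = build_training_request(
    ["Interoperability is the ability of making systems and organizations work together."],
    ["Interoperability"],
)

# To send it against a running server:
# with urllib.request.urlopen(training_request) as response:
#     response.read()
```

Since features are learned through repetition, a client would typically call this once per labeled sentence or document in its training set.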
