mariamihai/neo4j-related

Neo4j

My completed courses

To finish:

Cypher

Pattern:

  • nodes with (): (Person)
  • labels with :: (:Person)
  • relationships with --, adding > or < for direction (-->, <--): (:Person)--(:Movie) or (:Person)-->(:Movie)
  • type of relationship with []: [:ACTED_IN]
  • properties are specified in JSON-like syntax: {name: 'Tom Hanks'}

Example of pattern: (m:Movie {title: 'Cloud Atlas'})<-[:ACTED_IN]-(p:Person)


  • labels, property keys and variables are case-sensitive
  • cypher keywords are not case-sensitive
  • best practices:
    • name labels with CamelCase
    • property keys and variables with camelCase
    • cypher keywords with UPPERCASE
    • relationship types in UPPERCASE with _ characters
    • have at least one label for a node but no more than four (labels should help with most of the use cases)
    • labels should have nothing to do with one another
    • better not to use the same type of label in different contexts
    • don't label the nodes to represent hierarchies
    • eliminate duplicate data by creating new nodes and relationships where necessary. When the same value is duplicated as a property on many nodes, queries on that value have to retrieve all of those nodes.

MATCH

  • read data
  • similar to the FROM clause in an SQL statement
  • a read-only query must end with a RETURN clause
  • you don't need to specify a direction in the MATCH pattern; without one, the query engine looks for all connected nodes, regardless of the direction of the relationship
Code examples

Return all nodes:

MATCH (n)
RETURN n

Return all nodes with the label Person:

MATCH (p:Person)
RETURN p

Return a person based on a property:

MATCH (p:Person {name: 'Tom Hanks'})
RETURN p

Return a property:

MATCH (p:Person {name: 'Tom Hanks'})
RETURN p.born

Return a property based on a relation:

MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)
RETURN m.title
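
Because direction is optional in a MATCH pattern, the last query can also be written without the arrowhead. A minimal sketch, assuming the same movie dataset as the examples above:

```cypher
// No arrowhead: matches ACTED_IN relationships stored in either direction.
MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]-(m:Movie)
RETURN m.title
```

If ACTED_IN always points from a Person to a Movie, both forms should return the same rows. Specifying the direction still matters when a relationship type can run both ways between the same labels.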

WHERE

Code examples

Filter by specifying the property value:

MATCH (p:Person)
WHERE p.name = 'Tom Hanks' OR p.name = 'Rita Wilson'
RETURN p.name, p.born

Filter by node labels:

MATCH (p)-[:ACTED_IN]->(m)
WHERE p:Person AND m:Movie AND m.title='The Matrix'
RETURN p.name

is the same as:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title='The Matrix'
RETURN p.name

Filter with ranges:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE 2000 <= m.released <= 2003
RETURN p.name, m.title, m.released
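
The chained comparison above is shorthand for two comparisons joined with AND. The equivalent explicit form, using the same labels and properties:

```cypher
// Same filter as 2000 <= m.released <= 2003, written as two conditions.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released >= 2000 AND m.released <= 2003
RETURN p.name, m.title, m.released
```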

Filter by existence of a property:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name='Jack Nicholson' AND m.tagline IS NOT NULL
RETURN m.title, m.tagline

Filter strings:

  • partial strings (STARTS WITH, ENDS WITH, CONTAINS):
MATCH (p:Person)-[:ACTED_IN]->()
WHERE p.name STARTS WITH 'Michael'
RETURN p.name
  • string tests are case-sensitive
  • toLower(), toUpper() functions
MATCH (p:Person)-[:ACTED_IN]->()
WHERE toLower(p.name) STARTS WITH 'michael'
RETURN p.name

Filter by patterns in the graph:

// Find all people who wrote a movie but did not direct it
MATCH (p:Person)-[:WROTE]->(m:Movie)
WHERE NOT exists( (p)-[:DIRECTED]->(m) )
RETURN p.name, m.title

Filter using lists:

  • of numeric or string values
MATCH (p:Person)
WHERE p.born IN [1965, 1970, 1975]
RETURN p.name, p.born
  • existing lists in the graph
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE  'Neo' IN r.roles AND m.title='The Matrix'
RETURN p.name, r.roles

Filter based on the existence of a relationship:

MATCH (p:Person)
WHERE exists ((p)-[:ACTED_IN]-()) // or WHERE NOT exists ((p)-[:ACTED_IN]-())
SET p:Actor

MERGE

  • MERGE first tries to find the pattern in the graph. If the pattern is found, the data already exists and nothing is created; if it is not found, the data is created
  • when using MERGE on a node, specify at least one property that acts as the unique primary key for the node
Code examples
MERGE (p:Person {name: 'Michael Cain'})

Multiple MERGE clauses can be chained in a single query:

MERGE (p:Person {name: 'Katie Holmes'})
MERGE (m:Movie {title: 'The Dark Knight'})
RETURN p, m

Create a relationship based on 2 existing nodes:

MATCH (p:Person {name: 'Michael Cain'})
MATCH (m:Movie {title: 'The Dark Knight'})
MERGE (p)-[:ACTED_IN]->(m)

Create the nodes and the relationship

  • using multiple clauses:
MERGE (p:Person {name: 'Chadwick Boseman'})
MERGE (m:Movie {title: 'Black Panther'})
MERGE (p)-[:ACTED_IN]-(m)

(if the direction of the relationship is not specified, MERGE creates it left-to-right)

  • in single clause
MERGE (p:Person {name: 'Emily Blunt'})-[:ACTED_IN]->(m:Movie {title: 'A Quiet Place'})
RETURN p, m

Customized MERGE behavior

  • customize MERGE at runtime: ON CREATE SET sets properties only when the node is created, ON MATCH SET only when it is found, and a plain SET in both cases
Code example
// Find or create a person with this name
MERGE (p:Person {name: 'McKenna Grace'})

// Only set the `createdAt` property if the node is created during this query
ON CREATE SET p.createdAt = datetime()

// Only set the `updatedAt` property if the node was created previously
ON MATCH SET p.updatedAt = datetime()

// Set the `born` property regardless
SET p.born = 2006

RETURN p

CREATE

  • it doesn't look up the primary key before adding the node
  • provides greater speed during import
  • MERGE eliminates duplication of nodes
Code examples

Create nodes:

CREATE (n);

CREATE (n:Person);

CREATE (n:Person {name: 'Andy', title: 'Developer'});

Create relationships:

MATCH
  (a:Person),
  (b:Person)
WHERE a.name = 'A' AND b.name = 'B'
CREATE (a)-[r:RELTYPE]->(b)
RETURN type(r)
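
To see the difference between CREATE and MERGE described above, the sketch below runs each clause twice. The DemoPerson label and the names are hypothetical, chosen only so the example does not touch existing data; it assumes no DemoPerson nodes exist beforehand.

```cypher
// CREATE never checks for an existing node: two runs, two nodes.
CREATE (:DemoPerson {name: 'Andy'});
CREATE (:DemoPerson {name: 'Andy'});

// MERGE looks the pattern up first: two runs, still one node.
MERGE (:DemoPerson {name: 'Bea'});
MERGE (:DemoPerson {name: 'Bea'});

// Count the nodes per name: Andy -> 2, Bea -> 1
MATCH (d:DemoPerson)
RETURN d.name AS name, count(*) AS nodes
ORDER BY name;

// Clean up the demo nodes.
MATCH (d:DemoPerson)
DETACH DELETE d;
```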

SET

  • set a property value
  • this can be done with MERGE as well
Code examples

Set one or more properties:

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Cain' AND m.title = 'The Dark Knight'
SET r.roles = ['Alfred Penny'], r.year = 2008
RETURN p, r, m

Update existing properties:

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Cain' AND m.title = 'The Dark Knight'
SET r.roles = ['Mr. Alfred Penny']
RETURN p, r, m

Add new label to a node:

MATCH (p:Person {name: 'Jane Doe'})
SET p:Developer
RETURN p

Unsetting a property

Code example

Remove property:

MATCH (p:Person)
WHERE p.name = 'Gene Hackman'
SET p.born = null
RETURN p

REMOVE

Code examples

Remove a property:

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Cain' AND m.title = 'The Dark Knight'
REMOVE r.roles
RETURN p, r, m

Remove a label from a node:

MATCH (p:Person {name: 'Jane Doe'}) // Same as MATCH (p:Person:Developer {name: 'Jane Doe'})
REMOVE p:Developer
RETURN p

DELETE

  • attempting to delete a node with a relationship will throw an error - Neo4j prevents orphaned relationships in the graph
Code examples
MATCH (p:Person)
WHERE p.name = 'Jane Doe'
DELETE p

Remove a relationship:

MATCH (p:Person {name: 'Jane Doe'})-[r:ACTED_IN]->(m:Movie {title: 'The Matrix'})
DELETE r
RETURN p, m

Using DETACH

Code examples

Delete a node and all its relationships:

MATCH (p:Person {name: 'Jane Doe'})
DETACH DELETE p

Delete all nodes and all relationships in the graph:

MATCH (n)
DETACH DELETE n

(this can exhaust memory on a large database)
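
For a large graph, one way to avoid holding the whole deletion in a single transaction is to delete in batches. This is a sketch that assumes Neo4j 4.4 or later, where CALL { ... } IN TRANSACTIONS is available; the batch size of 10000 is an arbitrary choice.

```cypher
// Must run as an auto-commit (implicit) transaction;
// in Neo4j Browser, prefix the query with :auto
MATCH (n)
CALL {
  WITH n
  DETACH DELETE n
} IN TRANSACTIONS OF 10000 ROWS
```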

UNWIND

  • expand a list into a sequence of rows
  • no rows are returned if the list is empty or the expression is null; any other non-list value is treated as a single-element list
Code examples
UNWIND [1, 2, 3, null] AS x // null is returned as well
RETURN x, 'val' AS y 
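
The empty-list and null cases can be checked directly. A small sketch; count(*) is used so that each query returns a visible row even when UNWIND produces none:

```cypher
// Empty list: UNWIND produces no rows, so the count is 0.
UNWIND [] AS x
RETURN count(*) AS rows;

// null: also produces no rows, so the count is 0 again.
UNWIND null AS x
RETURN count(*) AS rows;
```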

Create a distinct list:

WITH [1, 1, 2, 2] AS coll
UNWIND coll AS x
WITH DISTINCT x
RETURN collect(x) AS setOfVals // [1,2]

Using UNWIND with any expression returning a list:

WITH
  [1, 2] AS a,
  [3, 4] AS b
UNWIND (a + b) AS x
RETURN x // the lists are concatenated and 4 rows are returned

Use multiple UNWIND clauses with a nested list:

WITH [[1, 2], [3, 4], 5] AS nested
UNWIND nested AS x
UNWIND x AS y
RETURN y // 5 rows

Replace empty list with null with CASE:

WITH [] AS list
UNWIND
  CASE
    WHEN list = [] THEN [null]
    ELSE list
  END AS emptylist
RETURN emptylist

Example of splitting the languages from movies to own nodes:

MATCH (m:Movie)
UNWIND m.languages AS language
WITH  language, collect(m) AS movies
MERGE (l:Language {name:language})
WITH l, movies
UNWIND movies AS m
WITH l,m
MERGE (m)-[:IN_LANGUAGE]->(l);
MATCH (m:Movie)
SET m.languages = null

Example of splitting genres to own nodes:

MATCH (m:Movie)
UNWIND m.genres AS genre
MERGE (g:Genre {name: genre})
MERGE (m)-[:IN_GENRE]->(g)
SET m.genres = null

Other

  • keys() - get the property keys of a node
MATCH (p:Person)
RETURN p.name, keys(p) 
  • get all node labels defined in the graph
CALL db.labels()
  • get all property keys defined (even if there are no nodes or relationships with them anymore)
CALL db.propertyKeys()
  • date specific uses

    • datetime() - current date and time
    • date("2019-09-30") = 2019-09-30
    • datetime({epochMillis: ms}) = 2019-09-25T06:29:39Z (ms is the number of milliseconds since the Unix epoch)
    • use APOC functions for more specific needs (apoc.temporal)
  • use transactions by wrapping the queries with :BEGIN and :COMMIT (cypher-shell commands):

:BEGIN

MATCH (u:User)
SET u.name = "Steve"

:COMMIT 
  • produce a query plan showing the operations that occurred during a query:
PROFILE MATCH (p:Person)-[:ACTED_IN]-()
WHERE p.born < 1950
RETURN p.name 
  • use APOC for creating new and specialized relationships
MATCH (n:Actor)-[r:ACTED_IN]->(m:Movie)
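// Arguments: start node, relationship type, identifying properties,
// properties set on create, end node.
// left() needs a string, so this assumes m.released is stored as a string.
CALL apoc.merge.relationship(n,
                              'ACTED_IN_' + left(m.released,4),
                              {},
                              {},
                              m ) YIELD rel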
RETURN COUNT(*) AS `Number of relationships merged`
  • view the schema with :schema

  • visualize: CALL db.schema.visualization

Graph Data Modeling

The process to create a graph data model:

  • understand the domain and define use cases

    • describe the app in detail
    • identify the users of the app (people, systems)
    • identify the use cases
    • rank them based on importance
  • develop the initial model

    • model the nodes (the entities)
    • model the relationships between nodes

    Types of models:

    • data model - describe the labels, relationships and properties of the graph
    • instance model - sample data used to test against the use cases

    The node properties are used to uniquely identify a node, answer specific details of the use cases and / or return data.

    They are defined based on the use cases and the steps required to answer them. Examples:

    • What people acted in a movie?
      • Retrieve a movie by its title.
      • Return the names of the actors.
    • What movies did a person act in?
      • Retrieve a person by their name.
      • Return the titles of the movies.
    • What is the highest rated movie in a particular year according to IMDb?
      • Retrieve all movies released in a particular year.
      • Evaluate the IMDb ratings.
      • Return the movie title.

    Relationships are usually between 2 different nodes, but they can also be to the same node.

    Specialized relationships can be added when they reduce the number of nodes a query has to traverse, while the original generic relationships are kept as well. For example, besides ACTED_IN you can also add ACTED_IN_2023.

    Can create intermediate nodes when you need to:

    • connect more than 2 nodes in a single context (hyperedges, n-ary relationships)
    • relate something to a relationship
    • share data in the graph between entities
  • test the use cases against the initial data model

  • create the instance model with test data using Cypher

  • test the use cases including performance against the graph

  • refactor the graph data model in case of changes in the key use cases or for performance reasons

  • implement the refactoring on the graph and retest using Cypher
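
Two of the refactorings mentioned in the modeling steps above can be sketched in Cypher. The ACTED_IN_1995 and PLAYED relationship types, the Role label, the IN_MOVIE relationship type and the role name are hypothetical, used only for illustration:

```cypher
// Specialized relationship: only 1995 roles are traversed, instead of
// following every ACTED_IN relationship and filtering on the movie's year.
MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN_1995]->(m:Movie)
RETURN m.title;

// Intermediate node: a Role node sits between the person and the movie,
// so other nodes can relate to the role itself.
MERGE (p:Person {name: 'Tom Hanks'})
MERGE (m:Movie {title: 'Cloud Atlas'})
MERGE (p)-[:PLAYED]->(r:Role {name: 'Example Role'})-[:IN_MOVIE]->(m);
```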

Import data

  • Cypher has a built-in clause for importing CSV (LOAD CSV); importing JSON or XML requires the APOC library

  • default field terminator is ,

  • the types of data that you can store as properties in Neo4j include:

    • String
    • Long (integer values)
    • Double (decimal values)
    • Boolean
    • Date/Datetime
    • Point (spatial)
    • StringArray (comma-separated list of strings)
    • LongArray (comma-separated list of integer values)
    • DoubleArray (comma-separated list of decimal values)
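
A minimal LOAD CSV sketch, assuming a hypothetical people.csv file in the Neo4j import directory with name and born columns separated by semicolons. LOAD CSV reads every field as a string, so numeric values need an explicit conversion:

```cypher
// FIELDTERMINATOR overrides the default comma separator.
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row FIELDTERMINATOR ';'
MERGE (p:Person {name: row.name})
SET p.born = toInteger(row.born)
```

FIELDTERMINATOR can be left out when the file uses the default comma.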

Random

  • Neo4j’s Cypher statement language is optimized for node traversal so that relationships are not traversed multiple times
  • each relationship must have a direction in the graph. The relationship can be queried in either direction, or ignored completely at query time
  • Neo4j stores nodes and relationships as objects that are linked to each other via pointers
    • index-free adjacency - a reference to the relationship is stored with both start and end nodes

golang-migrate

Docs: migrate, go.

go install -tags 'neo4j' github.com/golang-migrate/migrate/v4/cmd/migrate@latest

migrate -h

# ext specifies the file extension to use when creating migrations file.
# dir specifies which directory to create the migrations in.
migrate create -ext cypher -dir db/migrations <filename>

# neo4j://user:password@host:port/
export DB_URL='...'

# run migrations
migrate -database "${DB_URL}" -path db/migrations up
migrate -database <db> -path db/migrations up
# rollback migrations
migrate -database <db> -path db/migrations down

# run the first two migrations
migrate -source file://db/migrations -database <db> up 2
# migrations hosted on github
migrate -source github://mattes:personal-access-token@mattes/migrate_test \
        -database <db> down 2

# docker usage
docker run -v {{ migration dir }}:/migrations --network host migrate/migrate \
    -path=/migrations/ -database <db> up
    
# drop everything inside the db (verbose)
migrate -database <db> -path db/migrations -verbose drop

Issues

Running the migration on an empty db

error: Server error: [Neo.ClientError.Statement.SyntaxError] Invalid constraint syntax, ON and ASSERT should not be used. Replace ON with FOR and ASSERT with REQUIRE. (line 1, column 1 (offset: 0))
"CREATE CONSTRAINT ON (a:SchemaMigration) ASSERT a.version IS UNIQUE"

Fix:

Create the constraint manually:

CREATE CONSTRAINT FOR (a:SchemaMigration) REQUIRE a.version IS UNIQUE

Issue coming from here.
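
To confirm the constraint is in place before re-running the migration, listing the constraints should be enough. This assumes a Neo4j version that supports the SHOW CONSTRAINTS command, which versions accepting the FOR ... REQUIRE syntax do:

```cypher
// Look for an entry on the SchemaMigration label and its version property.
SHOW CONSTRAINTS
```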

Dirty database version

Dirty database version xxx. Fix and force version.

Check schema migration:

MATCH(sm:SchemaMigration) RETURN sm

This will return something like this with dirty = true:

{
  "identity": 0,
  "labels": [
    "SchemaMigration"
  ],
  "properties": {
    "dirty": true,
    "version": 20230120122715,
    "ts": "2023-01-20T13:52:44.802000000Z"
  },
  "elementId": "0"
}

Fix:

Clean up the data left behind by the failed migration, then clear the dirty flag on the SchemaMigration node and roll the version number back to the last migration that was applied successfully.

MATCH(sm:SchemaMigration) SET sm.dirty = false, sm.version = <previous-version> RETURN sm

The version can also be set from the CLI with force:

migrate force V  # Set version V but don't run migration (ignores dirty state)

migrate -database <db> -path db/migrations -verbose force <version>

Links