Neo4j

Table of Contents

My completed courses
Cypher
- MATCH
- WHERE
- MERGE
  - Customized MERGE behavior
- CREATE
- SET
- REMOVE
- DELETE
  - Using DETACH
- UNWIND
- Other
Graph Data Modeling
Import Data
Random
golang-migrate
- Issues
  - Running the migration on an empty db
  - Dirty database version
Links

My completed courses

Neo4j Fundamentals
Cypher Fundamentals
Graph Data Modeling Fundamentals
Building Neo4j Applications with Go (my implementation here)

To finish:

Cypher

Pattern:

nodes with (): (p)
labels with :: (:Person)
relationships with --, or with > or < for direction (-->, <--): (:Person)--(:Movie) or (:Person)-->(:Movie)
type of relationship with []: [:ACTED_IN]
properties are specified in JSON like syntax: {name: 'Tom Hanks'}

Example of pattern: (m:Movie {title: 'Cloud Atlas'})<-[:ACTED_IN]-(p:Person)

labels, property keys and variables are case-sensitive
cypher keywords are not case-sensitive
best practices:
- name labels with CamelCase
- property keys and variables with camelCase
- cypher keywords with UPPERCASE
- relationships are UPPERCASE with _ characters
- have at least one label for a node but no more than four (labels should help with most of the use cases)
- labels should have nothing to do with one another
- better not to use the same type of label in different contexts
- don't label the nodes to represent hierarchies
- eliminate duplicate data by creating new nodes and relationships where necessary; otherwise, queries on the duplicated information require that every node holding it be retrieved.

MATCH

read data
similar to the FROM clause in an SQL statement
need to return something
you don't need to specify direction in the MATCH pattern, the query engine will look for all nodes that are connected, regardless of the direction of the relationship

Code examples

Return all nodes:

MATCH (n)
RETURN n

Return all nodes with the label Person:

MATCH (p:Person)
RETURN p

Return a person based on a property:

MATCH (p:Person {name: 'Tom Hanks'})
RETURN p

Return a property:

MATCH (p:Person {name: 'Tom Hanks'})
RETURN p.born

Return a property based on a relation:

MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)
RETURN m.title
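
Match without specifying the direction:

As noted above, the direction can be left out of a MATCH pattern. A minimal sketch, assuming the same Person/Movie data as the other examples:

```cypher
// No arrowhead: ACTED_IN relationships are matched in either direction
MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]-(m:Movie)
RETURN m.title
```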

WHERE

Code examples

Filter by specifying the property value:

MATCH (p:Person)
WHERE p.name = 'Tom Hanks' OR p.name = 'Rita Wilson'
RETURN p.name, p.born

Filter by node labels:

MATCH (p)-[:ACTED_IN]->(m)
WHERE p:Person AND m:Movie AND m.title='The Matrix'
RETURN p.name

is the same as:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title='The Matrix'
RETURN p.name

Filter with ranges:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE 2000 <= m.released <= 2003
RETURN p.name, m.title, m.released

Filter by existence of a property:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name='Jack Nicholson' AND m.tagline IS NOT NULL
RETURN m.title, m.tagline

Filter strings:

partial strings (STARTS WITH, ENDS WITH, CONTAINS):

MATCH (p:Person)-[:ACTED_IN]->()
WHERE p.name STARTS WITH 'Michael'
RETURN p.name

string tests are case-sensitive
toLower(), toUpper() functions

MATCH (p:Person)-[:ACTED_IN]->()
WHERE toLower(p.name) STARTS WITH 'michael'
RETURN p.name

Filter by patterns in the graph:

// Find all people who wrote a movie but not directed it
MATCH (p:Person)-[:WROTE]->(m:Movie)
WHERE NOT exists( (p)-[:DIRECTED]->(m) )
RETURN p.name, m.title

Filter using lists:

of numeric or string values

MATCH (p:Person)
WHERE p.born IN [1965, 1970, 1975]
RETURN p.name, p.born

existing lists in the graph

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE  'Neo' IN r.roles AND m.title='The Matrix'
RETURN p.name, r.roles

Filter based on the existence of a relationship:

MATCH (p:Person)
WHERE exists ((p)-[:ACTED_IN]-()) // or WHERE NOT exists ((p)-[:ACTED_IN]-())
SET p:Actor

MERGE

MERGE works by first trying to find the pattern in the graph. If the pattern is found, the data already exists and nothing is created. If the pattern is not found, the data is created
when using MERGE, specify at least one property that acts as the unique primary key for the node

Code examples

MERGE (p:Person {name: 'Michael Cain'})

Multiple MERGE clauses can be chained together:

MERGE (p:Person {name: 'Katie Holmes'})
MERGE (m:Movie {title: 'The Dark Knight'})
RETURN p, m

Create a relationship based on 2 existing nodes:

MATCH (p:Person {name: 'Michael Cain'})
MATCH (m:Movie {title: 'The Dark Knight'})
MERGE (p)-[:ACTED_IN]->(m)

Create the nodes and the relationship

using multiple clauses:

MERGE (p:Person {name: 'Chadwick Boseman'})
MERGE (m:Movie {title: 'Black Panther'})
MERGE (p)-[:ACTED_IN]-(m)

(if the direction of the relationship is not set, it is assumed to be left-to-right)

in single clause

MERGE (p:Person {name: 'Emily Blunt'})-[:ACTED_IN]->(m:Movie {title: 'A Quiet Place'})
RETURN p, m

Customized MERGE behavior

customize the behavior at runtime: set properties only when the node is created (ON CREATE SET), only when it is found (ON MATCH SET), or in both cases (SET)

Code example

// Find or create a person with this name
MERGE (p:Person {name: 'McKenna Grace'})

// Only set the `createdAt` property if the node is created during this query
ON CREATE SET p.createdAt = datetime()

// Only set the `updatedAt` property if the node was created previously
ON MATCH SET p.updatedAt = datetime()

// Set the `born` property regardless
SET p.born = 2006

RETURN p

CREATE

it doesn't look up the primary key before adding the node
provides greater speed during import
MERGE eliminates duplication of nodes

Code examples

Create nodes:

CREATE (n);

CREATE (n:Person);

CREATE (n:Person {name: 'Andy', title: 'Developer'});

Create relationships:

MATCH
  (a:Person),
  (b:Person)
WHERE a.name = 'A' AND b.name = 'B'
CREATE (a)-[r:RELTYPE]->(b)
RETURN type(r)
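
CREATE compared to MERGE:

The difference described at the top of this section (CREATE does not look up existing data, MERGE does) can be illustrated with a small sketch. The names 'Andy' and 'Beth' are sample values, and the counts in the comments assume no such nodes exist beforehand:

```cypher
// CREATE never checks for existing data: running it twice adds two nodes
CREATE (:Person {name: 'Andy'});
CREATE (:Person {name: 'Andy'});

// MERGE looks for the pattern first: the second run finds the node and creates nothing
MERGE (:Person {name: 'Beth'});
MERGE (:Person {name: 'Beth'});

// Under the assumption above: 2 rows for Andy, 1 row for Beth
MATCH (p:Person)
WHERE p.name IN ['Andy', 'Beth']
RETURN p.name, count(*) AS nodes;
```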

SET

set a property value
this can be done with MERGE as well

Code examples

Set one or more properties:

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Cain' AND m.title = 'The Dark Knight'
SET r.roles = ['Alfred Penny'], r.year = 2008
RETURN p, r, m

Update existing properties:

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Cain' AND m.title = 'The Dark Knight'
SET r.roles = ['Mr. Alfred Penny']
RETURN p, r, m

Add new label to a node:

MATCH (p:Person {name: 'Jane Doe'})
SET p:Developer
RETURN p

Unsetting a property

Code example

Remove property:

MATCH (p:Person)
WHERE p.name = 'Gene Hackman'
SET p.born = null
RETURN p

REMOVE

Code examples

Remove a property:

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Cain' AND m.title = 'The Dark Knight'
REMOVE r.roles
RETURN p, r, m

Remove a label from a node:

MATCH (p:Person {name: 'Jane Doe'}) // Same as MATCH (p:Person:Developer {name: 'Jane Doe'})
REMOVE p:Developer
RETURN p

DELETE

attempting to delete a node with a relationship will throw an error - Neo4j prevents orphaned relationships in the graph

Code examples

MATCH (p:Person)
WHERE p.name = 'Jane Doe'
DELETE p

Remove a relationship:

MATCH (p:Person {name: 'Jane Doe'})-[r:ACTED_IN]->(m:Movie {title: 'The Matrix'})
DELETE r
RETURN p, m

Using DETACH

Code examples

Delete a node and all its relationships:

MATCH (p:Person {name: 'Jane Doe'})
DETACH DELETE p

Delete all nodes and all relationships in the graph:

MATCH (n)
DETACH DELETE n

(this will exhaust memory on a large db)
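
Delete in batches:

One way around the memory problem mentioned above is to commit the deletion in batches. This is a sketch that assumes Neo4j 4.4 or later (where CALL { ... } IN TRANSACTIONS is available). The batch size of 10000 is an arbitrary choice. In Neo4j Browser the query has to be prefixed with :auto, because it needs an implicit transaction.

```cypher
// Delete every node with its relationships, committing after each batch of 10000 rows
MATCH (n)
CALL {
  WITH n
  DETACH DELETE n
} IN TRANSACTIONS OF 10000 ROWS
```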

UNWIND

expand a list into a sequence of rows
no rows are returned if the list is empty or the expression is null; any other non-list value is treated as a single-element list

Code examples

UNWIND [1, 2, 3, null] AS x // null is returned as well
RETURN x, 'val' AS y

Create a distinct list:

WITH [1, 1, 2, 2] AS coll
UNWIND coll AS x
WITH DISTINCT x
RETURN collect(x) AS setOfVals // [1,2]

Using UNWIND with any expression returning a list:

WITH
  [1, 2] AS a,
  [3, 4] AS b
UNWIND (a + b) AS x
RETURN x // the lists are concatenated and 4 rows are returned

Use multiple UNWIND clauses with a nested list:

WITH [[1, 2], [3, 4], 5] AS nested
UNWIND nested AS x
UNWIND x AS y
RETURN y // 5 rows

Replace empty list with null with CASE:

WITH [] AS list
UNWIND
  CASE
    WHEN list = [] THEN [null]
    ELSE list
  END AS emptylist
RETURN emptylist

Example of splitting the languages from movies to own nodes:

MATCH (m:Movie)
UNWIND m.languages AS language
WITH  language, collect(m) AS movies
MERGE (l:Language {name:language})
WITH l, movies
UNWIND movies AS m
WITH l,m
MERGE (m)-[:IN_LANGUAGE]->(l);
MATCH (m:Movie)
SET m.languages = null

Example of splitting genres to own nodes:

MATCH (m:Movie)
UNWIND m.genres AS genre
MERGE (g:Genre {name: genre})
MERGE (m)-[:IN_GENRE]->(g)
SET m.genres = null

Other

keys() - get the property keys of a node

MATCH (p:Person)
RETURN p.name, keys(p)

get all node labels defined in the graph

CALL db.labels()

get all property keys defined (even if there are no nodes or relationships with them anymore)

CALL db.propertyKeys()

date specific uses
- datetime() - current date and time
- date("2019-09-30") = 2019-09-30
- datetime({epochMillis: ms}) = 2019-09-25T06:29:39Z
- use APOC functions for more specific needs (apoc.temporal)
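
The date functions listed above can be tried in a single query. A minimal sketch; the millisecond value is a sample chosen to match the timestamp shown in the list:

```cypher
RETURN
  datetime() AS now,                                    // current date and time
  date('2019-09-30') AS day,                            // Date value 2019-09-30
  datetime({epochMillis: 1569392979000}) AS fromMillis  // 2019-09-25T06:29:39Z
```
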
use transactions by wrapping the queries with :BEGIN and :COMMIT:

:BEGIN

MATCH (u:User)
SET u.name = "Steve"

:COMMIT

produce a query plan showing the operations that occurred during a query:

PROFILE MATCH (p:Person)-[:ACTED_IN]-()
WHERE p.born < '1950'
RETURN p.name
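
PROFILE executes the query. To see the planned operations without running it, EXPLAIN can be used instead. A sketch with the same query:

```cypher
// Shows the execution plan only; the query itself is not executed
EXPLAIN MATCH (p:Person)-[:ACTED_IN]-()
WHERE p.born < '1950'
RETURN p.name
```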

use APOC for creating new and specialized relationships

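MATCH (n:Actor)-[:ACTED_IN]->(m:Movie)
CALL apoc.merge.relationship(n,                                // start node
                              'ACTED_IN_' + left(m.released,4), // relationship type
                              {},                               // identifying properties
                              {},                               // properties set on create
                              m,                                // end node
                              {}) YIELD rel                     // properties set on match
RETURN count(*) AS `Number of relationships merged`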

view the schema with :schema
visualize the schema: CALL db.schema.visualization()

Graph Data Modeling

The process to create a graph data model:

understand the domain and define use cases
- describe the app in detail
- identify the users of the app (people, systems)
- identify the use cases
- rank them based on importance
develop the initial model
- model the nodes (the entities)
- model the relationships between nodes
Types of models:
- data model - describe the labels, relationships and properties of the graph
- instance model - sample data used to test against the use cases
The node properties are used to uniquely identify a node, answer specific details of the use cases and / or return data.

They are defined based on the use cases and the steps required to answer them. Examples:
- What people acted in a movie?
  - Retrieve a movie by its title.
  - Return the names of the actors.
- What movies did a person act in?
  - Retrieve a person by their name.
  - Return the titles of the movies.
- What is the highest rated movie in a particular year according to IMDb?
  - Retrieve all movies released in a particular year.
  - Evaluate the IMDb ratings.
  - Return the movie title.
Relationships are usually between 2 different nodes, but they can also be to the same node.

Specialized relationships can be added, alongside the original generic ones, when they reduce the number of nodes a query has to filter. E.g., besides ACTED_IN you can add ACTED_IN_2023 as well.

Can create intermediate nodes when you need to:
- connect more than 2 nodes in a single context (hyperedges, n-ary relationships)
- relate something to a relationship
- share data in the graph between entities
test the use cases against the initial data model
create the instance model with test data using Cypher
test the use cases including performance against the graph
refactor the graph data model in case of changes in the key use cases or for performance reasons
implement the refactoring on the graph and retest using Cypher
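
The steps of creating an instance model and testing a use case against it can be sketched in Cypher. This is only an illustration; the people and the movie are sample test data, and the use case is "What people acted in a movie?" from the list above:

```cypher
// Instance model: a small set of test data
MERGE (tom:Person {name: 'Tom Hanks'})
MERGE (meg:Person {name: 'Meg Ryan'})
MERGE (m:Movie {title: 'Sleepless in Seattle'})
MERGE (tom)-[:ACTED_IN]->(m)
MERGE (meg)-[:ACTED_IN]->(m);

// Use case test: retrieve the movie by its title, return the names of the actors
MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Sleepless in Seattle'})
RETURN p.name;
```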

Import data

Cypher has a built-in clause for importing CSV (LOAD CSV); importing JSON or XML requires the APOC library
default field terminator is ,
the types of data that you can store as properties in Neo4j include:
- String
- Long (integer values)
- Double (decimal values)
- Boolean
- Date/Datetime
- Point (spatial)
- StringArray (comma-separated list of strings)
- LongArray (comma-separated list of integer values)
- DoubleArray (comma-separated list of decimal values)
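
A minimal LOAD CSV sketch for the points above. The file movies.csv, its column names and the ';' and '|' separators are assumptions made for illustration. LOAD CSV reads every field as a string, so values are converted explicitly to the types listed above:

```cypher
// Assumed header row in movies.csv: title;released;rating;languages
LOAD CSV WITH HEADERS
FROM 'file:///movies.csv' AS row
FIELDTERMINATOR ';'                            // overrides the default ,
MERGE (m:Movie {title: row.title})
SET m.released = toInteger(row.released),      // Long
    m.rating = toFloat(row.rating),            // Double
    m.languages = split(row.languages, '|')    // StringArray
```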

Random

Neo4j’s Cypher statement language is optimized for node traversal so that relationships are not traversed multiple times
each relationship must have a direction in the graph. The relationship can be queried in either direction, or ignored completely at query time
Neo4j stores nodes and relationships as objects that are linked to each other via pointers
- index-free adjacency - a reference to the relationship is stored with both start and end nodes

golang-migrate

Docs: migrate, go.

go install -tags 'neo4j' github.com/golang-migrate/migrate/v4/cmd/migrate@latest

migrate -h

# ext specifies the file extension to use when creating migrations file.
# dir specifies which directory to create the migrations in.
migrate create -ext cypher -dir db/migrations <filename>

# neo4j://user:password@host:port/
export DB_URL='...'

# run migrations
migrate -database ${DB_URL} -path db/migrations up
migrate -database <db> -path db/migrations up
# rollback migrations
migrate -database <db> -path db/migrations down

# run the first two migrations
migrate -source db/migrations -database <db> up 2
# migrations hosted on github
migrate -source github://mattes:personal-access-token@mattes/migrate_test \
        -database <db> down 2

# docker usage
docker run -v {{ migration dir }}:/migrations --network host migrate/migrate \
    -path=/migrations/ -database <db> up
    
# drop everything inside the db (verbose)
migrate -database <db> -path db/migrations -verbose drop
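
The create command above generates a pair of files, one for up and one for down. A sketch of what such a pair could contain; the file names, the constraint name and the Person label are hypothetical, and the syntax assumes a Neo4j version that accepts FOR ... REQUIRE:

```cypher
// db/migrations/20230120122715_person_name_unique.up.cypher (hypothetical name)
CREATE CONSTRAINT person_name_unique FOR (p:Person) REQUIRE p.name IS UNIQUE

// db/migrations/20230120122715_person_name_unique.down.cypher (hypothetical name)
DROP CONSTRAINT person_name_unique
```

Each statement goes in its own file; the comments above only mark which file it belongs to.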

Issues

Running the migration on an empty db

error: Server error: [Neo.ClientError.Statement.SyntaxError] Invalid constraint syntax, ON and ASSERT should not be used. Replace ON with FOR and ASSERT with REQUIRE. (line 1, column 1 (offset: 0))
"CREATE CONSTRAINT ON (a:SchemaMigration) ASSERT a.version IS UNIQUE"

Fix:

Create the constraint manually:

CREATE CONSTRAINT FOR (a:SchemaMigration) REQUIRE a.version IS UNIQUE

Issue coming from here.

Dirty database version

Dirty database version xxx. Fix and force version.

Check schema migration:

MATCH(sm:SchemaMigration) RETURN sm

This will return something like this with dirty = true:

{
  "identity": 0,
  "labels": [
    "SchemaMigration"
  ],
  "properties": {
    "dirty": true,
    "version": 20230120122715,
    "ts": "2023-01-20T13:52:44.802000000Z"
  },
  "elementId": "0"
}

Fix:

Clean up the database, then reset the dirty flag on SchemaMigration and roll the version number back to the last migration that was successfully applied.

MATCH(sm:SchemaMigration) SET sm.dirty = false, sm.version = <previous-version> RETURN sm

Can set version with:

migrate force V  # Set version V but don't run migration (ignores dirty state)

migrate -database <db> -path db/migrations -verbose force <version>
