Neo4j

Basic Cypher

Creating Node

MERGE keyword is used to create a pattern in the database. It checks for duplication before creating.

CREATE can also be used to create node. The benefit is that it does not look up the primary key before adding the node. Use it when you are sure your data is clean and you want greater speed during import.

MERGE (p:Person {name: 'Katie Holmes'})
CREATE (m:Movie {title: 'The Dark Knight'})
RETURN p, m

Creating nodes and relationships using multiple clauses

MATCH (p:Person {name: 'Michael Caine'})
MATCH (m:Movie {title: 'The Dark Knight'})
MERGE (p)-[:ACTED_IN]->(m)

MERGE (p:Person {name: 'Chadwick Boseman'})
MERGE (m:Movie {title: 'Black Panther'})
MERGE (p)-[:ACTED_IN]-(m)

Adding properties for a node or relationship

Inline as part of the MERGE clause

MERGE (p:Person {name: 'Michael Caine'})
MERGE (m:Movie {title: 'Batman Begins'})
MERGE (p)-[:ACTED_IN {roles: ['Alfred Penny']}]->(m)
RETURN p,m

Using the SET keyword

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Caine' AND m.title = 'The Dark Knight'
SET r.roles = ['Alfred Penny']
RETURN p, r, m

Setting label

MATCH (p:Person {name: 'Jane Doe'})
SET p:Developer
RETURN p

Setting mutliple properties

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Caine' AND m.title = 'The Dark Knight'
SET r.roles = ['Alfred Penny'], r.year = 2008
RETURN p, r, m

Update properties

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Caine' AND m.title = 'The Dark Knight'
SET r.roles = ['Mr. Alfred Penny']
RETURN p, r, m

Remove properties

You should never remove the property that is used as the primary key for a node.

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Caine' AND m.title = 'The Dark Knight'
REMOVE r.roles
SET p.born = null
RETURN p, r, m

Customize MERGE behaviour

// Find or create a person with this name
MERGE (p:Person {name: 'McKenna Grace'})

// Only set the `createdAt` property if the node is created during this query
ON CREATE SET p.createdAt = datetime()

// Only set the `updatedAt` property if the node was created previously
ON MATCH SET p.updatedAt = datetime()

// Set the `born` property regardless
SET p.born = 2006

RETURN p

Deleting node/relationship/labels

MATCH (p:Person)
WHERE p.name = 'Jane Doe'
DELETE p

Neo4j prevents orphaned relationships in the graph. So node can't be deleted if it has relationship connected to it.

// Direct execution raises error
MATCH (p:Person {name: 'Jane Doe'})
DELETE p

// First remove relationship so node can be deleted
MATCH (p:Person {name: 'Jane Doe'})-[r:ACTED_IN]->(m:Movie {title: 'The Matrix'})
DELETE r
RETURN p, m

Use DETACH to delete node and relationship together. Note that only do this on relatively small databases as trying to do this on a large database exhausts memory.

MATCH (p:Person {name: 'Jane Doe'})
DETACH DELETE p

// This deletes entire graph
MATCH (n)
DETACH DELETE n

To remove label:

MATCH (p:Person {name: 'Jane Doe'})
REMOVE p:Developer
RETURN p

Returning Path It's sometimes useful to work with path object using built-in functions:
- length(p) returns the length of a path.
- nodes(p) returns a list containing the nodes for a path.
- relationships(p) returns a list containing the relationships for a path.
```
MATCH p = ((person:Person)-[]->(movie))
WHERE person.name = 'Walt Disney'
RETURN p
```

Graph Traversal

When the execution plan is created, it determines the set of nodes, referred as anchor, that will be the starting points for the query. The anchor is often based upon a MATCH clause, typically determined by meta-data that is stored in the graph or a filter that is provided inline or in a WHERE clause. The anchor for a query will be based upon the fewest number of nodes that need to be retrieved into memory.

Once anchor node is extracted, path will then be expanded from them if provided

Find anchor nodes

// p is anchor, since Person node amount is less than general nodes m
PROFILE MATCH (p:Person)-[:ACTED_IN]->(m)
RETURN p.name, m.title LIMIT 100

// m is anchor, since Movie node amount is less than Person node
PROFILE MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, m.title LIMIT 100

// p is anchor, since there is only one Person node with name Eminem
PROFILE MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Eminem'
RETURN p.name, m.title

Varying length traversal

Shortest path For shortestPath() and allShortestPaths() you can provide an upper bound on the length of the path(s), but not a lower bound.

// This query finds the shortest path between two nodes regardless of relationship types
MATCH p = shortestPath((p1:Person)-[*]-(p2:Person))
WHERE p1.name = "Eminem"
AND p2.name = "Charlton Heston"
RETURN  p

// This query finds the shortest path between two nodes using only ACTED_IN relationship
MATCH p = shortestPath((p1:Person)-[:ACTED_IN*]-(p2:Person))
WHERE p1.name = "Eminem"
AND p2.name = "Charlton Heston"
RETURN  p

Varying length traversal When exact hops are given, paths are traversed in a depth-first manner.

// This query returns Person node that are exactly two hops away from Eminem, if any
MATCH (p:Person {name: 'Eminem'})-[:ACTED_IN*2]-(others:Person)
RETURN  others.name

// This query returns Person node that are up to four hops away from Eminem, if any
MATCH (p:Person {name: 'Eminem'})-[:ACTED_IN*1..4]-(others:Person)
RETURN  others.name

Advanced Cypher

Get node by id
```
MATCH (n) where ID(n)=<your_id>
RETURN n
```

Label related

// Get list of all labels
CALL db.labels()

// Find node with label
MATCH (p)
WHERE  p.born.year > 1960
AND p:Actor
AND p:Director
RETURN p.name, p.born, labels(p)

Schema Related

// View data model
CALL db.schema.visualization()

// View node property type
CALL db.schema.nodeTypeProperties()

// View relationship property type
CALL db.schema.relTypeProperties()

// Show constraints
SHOW CONSTRAINTS

Patterns-related In general, using a single MATCH clause will perform better than multiple MATCH clauses.

// Multiple patterns
MATCH (a:Person)-[:ACTED_IN]->(m:Movie), (m)<-[:DIRECTED]-(d:Person)
WHERE m.year > 2000
RETURN a.name, m.title, d.name

// This query will have better performance then the above one
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
WHERE m.year > 2000
RETURN a.name, m.title, d.name

// Multi-relationship match
MATCH (p:Person) -[r:ACTED_IN|DIRECTED]->(m:Movie)
RETURN p.name, r.role, m.title

// Test pattern existence
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE  p.name = 'Tom Hanks'
AND exists {(p)-[:DIRECTED]->(m)}
RETURN p.name, labels(p), m.title

Filtering Note that string matching in Cypher is case-sensitive.

Warning

Query engine turns off the use of the index if a string property is transformed uring query, such as using toUpper() or toLower().

A best practice for handling property values that need to be evaluated as upper, lower, or mixed case is to use fulltext schema indexes.

// Check string property value starts with prefix or ends with suffix
MATCH (m:Movie)
WHERE  m.title STARTS WITH 'Toy Story'
WHERE  m.title ENDS WITH '2'
RETURN m.title, m.released

// Check value containse substring
MATCH (p:Person)
WHERE toUpper(p.name) CONTAINS ' DE '
RETURN p.name

// To avoid missing data due to mismatch in case
MATCH (p:Person)
WHERE toLower(p.name) ENDS WITH 'demille'
OR toUpper(p.name) ENDS WITH 'DEMILLE'
RETURN p.name

// Testing inequality of a property using the <> predicate
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name <> 'Tom Hanks'
AND m.title = 'Captain Phillips'
RETURN p.name

// Less/Greater than
MATCH (m:Movie) WHERE m.title = 'Toy Story'
RETURN
    m.year < 1995 AS lessThan, //  Less than (false)
    m.year <= 1995 AS lessThanOrEqual, // Less than or equal(true)
    m.year > 1995 AS moreThan, // More than (false)
    m.year >= 1995 AS moreThanOrEqual // More than or equal (true)

// Range
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE 2005 <= m.year <= 2010
RETURN m.title, m.released

// Or condition
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
OR m.title = 'Captain Phillips'
RETURN p.name, m.title

// Null testing
MATCH (p:Person)
WHERE p.died IS NOT NULL
AND p.born.year >= 1985
RETURN p.name, p.born, p.died

MATCH (p:Person)
WHERE p.died IS NULL
AND p.born.year <= 1922
RETURN p.name, p.born, p.died

Profiling The difference between using EXPLAIN and PROFILE is that EXPLAIN provides estimates of the query steps where PROFILE provides the exact steps and number of rows retrieved for the query.

PROFILE MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p)
WHERE  p.name = 'Tom Hanks'
RETURN  m.title

EXPLAIN MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p)
WHERE  p.name = 'Tom Hanks'
RETURN  m.title

Optional Match OPTIONAL MATCH matches patterns with your graph, just like MATCH does. The difference is that if no matches are found, OPTIONAL MATCH will use nulls for missing parts of the pattern. OPTIONAL MATCH could be considered the Cypher equivalent of the outer join in SQL.
```
MATCH (m:Movie) WHERE m.title = "Kiss Me Deadly"
MATCH (m)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
OPTIONAL MATCH (m)<-[:ACTED_IN]-(a:Actor)-[:ACTED_IN]->(rec)
RETURN rec.title, a.name
```

Order result

There is no limit to the number of properties you can order by.

MATCH (p:Person) WHERE p.born.year > 1980 RETURN p
ORDER BY p.born

MATCH (p:Person) WHERE p.born.year > 1980 RETURN p
ORDER BY p.born DESC

// Multiple sort expression
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
OR p.name = 'Keanu Reeves'
RETURN  m.year, m.title
ORDER BY m.year DESC , m.title

Paganition

MATCH (p:Person)
WHERE p.born.year = 1980
RETURN  p.name as name,
p.born AS birthDate
ORDER BY p.born SKIP 40 LIMIT 10

Distinct You can use DISTINCT to eliminate duplication of:

rows returned (you have just learned this)
property values
nodes

MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN DISTINCT m.title, m.released
ORDER BY m.titles

Customize returned result

// Return partial property
MATCH (p:Person)
WHERE p.name CONTAINS "Thomas"
RETURN p { .name, .born } AS person
ORDER BY p.name

// Adding custom data
MATCH (m:Movie)<-[:DIRECTED]-(d:Director)
WHERE d.name = 'Woody Allen'
RETURN m {.*, favorite: true} AS movie

// Custom string/numeric operation
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
WHERE m.title CONTAINS 'Toy Story' AND
p.died IS NULL
RETURN 'Movie: ' + m.title AS movie,
p.name AS actor,
p.born AS dob,
date().year - p.born.year AS ageThisYear

// Conditionally changine returned data
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
WHERE p.name = 'Henry Fonda'
RETURN m.title AS movie,
CASE
WHEN m.year < 1940 THEN 'oldies'
WHEN 1940 <= m.year < 1950 THEN 'forties'
WHEN 1950 <= m.year < 1960 THEN 'fifties'
WHEN 1960 <= m.year < 1970 THEN 'sixties'
WHEN 1970 <= m.year < 1980 THEN 'seventies'
WHEN 1980 <= m.year < 1990 THEN 'eighties'
WHEN 1990 <= m.year < 2000 THEN 'nineties'
ELSE  'two-thousands'
END
AS timeFrame

// Return a list
MATCH (p:Person)
RETURN p.name, [p.born, p.died] AS lifeTime

Aggregation

List of aggregation functions can be checked here.

// Return a list
MATCH (p:Person)
RETURN p.name, [p.born, p.died] AS lifeTime

// Using collect() to create a list
// collect() removes duplicates as well
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN a.name AS actor,
count(*) AS total,
collect(m.title) AS movies
ORDER BY total DESC LIMIT 10

// Collect nodes
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name ='Tom Cruise'
RETURN collect(m) AS tomCruiseMovies

// Access element of a list
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN m.title AS movie,
collect(a.name)[0] AS castMember,
size(collect(a.name)) as castSize

// Returning a slice of collection 
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN m.title AS movie,
collect(a.name)[2..] AS castMember,
size(collect(a.name)) as castSize

// List comprehension
MATCH (m:Movie)
RETURN m.title as movie,
[x IN m.countries WHERE x CONTAINS 'USA' OR x CONTAINS 'Germany']
AS country LIMIT 500

tip

count() may be more efficient because it gets its values for node counts or relationships from a node from the internal count store of the graph.

Pattern comprehension

MATCH (m:Movie)
WHERE m.year = 2015
RETURN m.title,
[(dir:Person)-[:DIRECTED]->(m) | dir.name] AS directors,
[(actor:Person)-[:ACTED_IN]->(m) | actor.name] AS actors

Map projection

MATCH (m:Movie)
WHERE m.title CONTAINS 'Matrix'
RETURN m { .title, .released } AS movie

Setting variable

WITH 'Tom Hanks' AS actorName
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = actorName
RETURN m.title AS movies

Utilizing scope

// Limit return amount using `with` scope
WITH  'Tom Hanks' AS theActor
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = theActor
WITH m  LIMIT 2
// possibly do more with the two m nodes
RETURN m.title AS movies

// Sort before scope
WITH  'Tom Hanks' AS theActor
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = theActor
WITH m ORDER BY m.year LIMIT 5
// possibly do more with the five m nodes in a particular order
RETURN m.title AS movies, m.year AS yearReleased

Map projection and aggregation

MATCH (n:Movie)
WHERE n.imdbRating IS NOT NULL
AND n.poster IS NOT NULL
WITH n {
  .title,
  .year,
  .languages,
  .plot,
  .poster,
  .imdbRating,
  directors: [ (n)<-[:DIRECTED]-(d) | d { tmdbId:d.imdbId, .name } ]
}
ORDER BY n.imdbRating DESC LIMIT 4
RETURN collect(n)

tip

Although this is nice for processing on the client side, it takes more memory on the server as records cannot be streamed to the client but are collected into the list structure on the server.

UNWIND

UNWIND returns a row for each element of a list.

MATCH (m:Movie)-[:ACTED_IN]-(a:Actor)
WHERE a.name = 'Tom Hanks'
UNWIND m.languages AS lang
RETURN m.title AS movie,
m.languages AS languages,
lang AS language

Date Manipulation

Create date properties

MERGE (x:Test {id: 1})
SET x.date = date(),
    x.datetime = datetime(),
    x.time = time()
RETURN x

CALL apoc.meta.nodeTypeProperties()

Extract date data

MATCH (x:Test {id: 1})
RETURN x.date.day, x.date.year,
x.datetime.year, x.datetime.hour,
x.datetime.minute

Setting date values

MATCH (x:Test {id: 1})
SET x.date1 = date('2022-01-01'),
    x.date2 = date('2022-01-15')
RETURN x

Setting datetime values

MATCH (x:Test {id: 1})
SET x.datetime1 = datetime('2022-01-04T10:05:20'),
    x.datetime2 = datetime('2022-04-09T18:33:05')
RETURN x

Working with durations

// Match between duration
MATCH (x:Test {id: 1})
RETURN duration.between(x.date1,x.date2)

// Calculate duration in days
MATCH (x:Test {id: 1})
RETURN duration.inDays(x.datetime1,x.datetime2).days

// Add duration to date
MATCH (x:Test {id: 1})
RETURN x.date1 + duration({months: 6})

Format date and time with apoc

MATCH (x:Test {id: 1})
RETURN x.datetime as Datetime,
apoc.temporal.format( x.datetime, 'HH:mm:ss.SSSS')
AS formattedDateTime

MATCH (x:Test {id: 1})
RETURN apoc.date.toISO8601(x.datetime.epochMillis, "ms")
AS iso8601

RETURN apoc.temporal.toZonedTemporal('2012-12-23 23:59:59',"yyyy-MM-dd HH:mm:ss") AS output;

Return path in curry

MATCH p = ((person:Person)-[]->(movie))
WHERE person.name = 'Walt Disney'
RETURN p

Reducing Memory

In Cypher, a query represents a single transaction against the graph.

Subquery
- A subquery returns values referred to by the variables in the RETURN clause.
- A subquery cannot return variables with the same name used in the enclosing query.
- You must explicitly pass in variables from the enclosing query to a subquery.
```
CALL {
  MATCH (m:Movie) WHERE m.year = 2000
  RETURN m ORDER BY m.imdbRating DESC LIMIT 10
}
MATCH  (:User)-[r:RATED]->(m)
RETURN m.title, avg(r.rating)
```

Union

tip

UNION ALL returns all results which is more efficient on memory but can lead to duplicates. UNION returns distinct results.

MATCH (m:Movie) WHERE m.year = 2000
RETURN {type:"movies", theMovies: collect(m.title)} AS data
UNION ALL
MATCH (a:Actor) WHERE a.born.year > 2000
RETURN { type:"actors", theActors: collect(DISTINCT a.name)} AS data

Union in subquery

MATCH (p:Person)
WITH p LIMIT 100
CALL {
  WITH p
  OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
  RETURN m.title + ": " + "Actor" AS work
UNION
  WITH p
  OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
  RETURN m.title+ ": " +  "Director" AS work
}
RETURN p.name, collect(work)

Using parameters

A parameter name is prefixed with a $ symbol in Cypher statement.

// Setting paramter with string value
:param actorName: 'Tom Hanks'

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName
RETURN m.released AS releaseDate,
m.title AS title
ORDER BY m.released DESC

// JSON syntax to set multiple parameters
:params {actorName: 'Tom Cruise', movieName: 'Top Gun'}

// List all parameters
:params

// Clear all parameters
:params {}

However, special care is required when handling integer value due to a discrepancy between integers in JavaScript and the Neo4j type system. To avoid any data loss on large numbers, any integers are converted to floating point values when the parameter is set.

:param number: 10   // outputs: { "number": 10.0 }

// Force type
:param number=> 10  // outputs: { "number": 10}

Using constraints

Constraint is interally implemented as an index, and is typically created before the data is loaded into the graph.

There are three types of constraints:

Uniqueness for a single node property value.
Existence for a property of a node or relationship.
Existence and uniqueness for a set of node property values (called a Node key).

When creating a uniqueness constraint on a property, a node label that the constraint will be associated with must be specified. Attempt to create a uniqueness constraint on a property that is not unique for all nodes with that label, the constraint creation will fail.

Creating basic constraints

// Create a uniqueness constraint for a single property
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE x.<property_key> IS UNIQUE

// Create a uniqueness constraint for multiple properties
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE (x.<property_key1>, x.<property_key2>)  IS UNIQUE

// Create a existance constraint on node
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE x.<property_key>  IS NOT NULL

// Create a existance constraint on relationship
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
REQUIRE x.<property_key> IS NOT NULL

// List all constraints
SHOW CONSTRAINTS

Creating node key constraint

Node key constraint is a specialized type of constraint for the properties of a node. It combines existence with uniqueness.

// Creating a Node key constraint for a single property
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE x.<property_key> IS NODE KEY

// Creating a Node key constraint for mulitple properties
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE (x.<property_key1>, x.<property_key2>)  IS NODE KEY

Drop constraint

DROP CONSTRAINT <constraint_name>

// Creating list of constraints to drop
SHOW CONSTRAINTS 
YIELD name  
RETURN collect('DROP CONSTRAINT ' + name + ';') AS Statements

// Drop constraints using APOC
CALL apoc.schema.assert({},{},true)

Using indexes

Neo4j supports two types of index: search-performace index and semantic index Searh-performance index type enables quicker retrieval of exact matches between an index and the primary data storage, including: RANGE, LOOKUP, TEXT, POINT. Semantic index type captures the semantic meaning or context of the data by returning an approximation score, which indicates the similarity between a query string and the data in a database. It includes: full-text and vector.

note

Unlike search-performance indexes, semantic indexes are not automatically used by the Cypher® planner. To use semantic indexes, they must be explicitly called with specific procedures.

During query planning, at most one index is used. So it is important to plan what properties to index how to create them.

To list and drop indexes created for the graph: below Cypher command can be used:

SHOW INDEXES
DROP INDEX <index_name>

Range index

Range index can be defined on a property of node or relationship type. It is implemented with B tree which is often used to enable efficient sorting. It can speed up following Cypher queries:

Equality checks = (TEXT index may perform better for string properties)
Range comparisons >, >=, <, <=
STARTS WITH string comparisons (performs better than TEXT index)
Existence checks IS NOT NULL

Creating RANGE index

// Create index on node
CREATE INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON x.<property_key>

// Create index on relationship
CREATE INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<relationship_label>]-()
ON x.<property_key>

TEXT index

A TEXT index supports node or relationship property types that must be strings. It speeds up the following Cypher code:

Equality checks: =
String comparisons:ENDS WITH, CONTAINS
List membership: x.prop in ["a","b","c"]

Using TEXT index also brings below benefits:

Take up less space
More performant than RANGE indexes when the property being indexed contains a lot of duplicate values.

Tip 1

For ENDS WITH ands CONTAINS comparisons, TEXT index might perform better than RANGE index.

Tip 2

Total db hits is often a useful indicator to evaluate the performance of a query. For TEXT index, however, this might not be the case. It can have higher db hits but providing less elapsed time.

Creating TEXT index

// Create TEXT index on node
CREATE TEXT INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON x.<property_key>

// Create TEXT index on relationship
CREATE TEXT INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
ON (x.<property_key>)

Composite index

A composite index combines values from multiple properties, can be of different type, for a node label or for relationship type. This index comes in handy when multiple properties are often queried together.

Creating Composite index

// Composite index on node
CREATE INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON (x.<property_key1>,x.<property_key2>,...)

// Composite index on relationship
CREATE INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
ON (x.<property_key1>, x.<property_key2>,...)

However, queries that simply check if property exist or only one property of the index is tested won't utilize this kind of index. For instance, composite index won't be used in below query even if a composite index, say (m.year, m.runtime), is created.

PROFILE MATCH (m:Movie)
WHERE m.year IS NULL AND m.runtime IS NULL
RETURN m.title, m.year, m.runtime

// OR

PROFILE MATCH (m:Movie)
WHERE m.year = 1994
RETURN m.title, m.year, m.runtime

Full-Text index

A full-text index is based upon string values only, but provides additional search capabilities that RANGE or TEXT indexes don't offer for it is built on Apache Lucene. It can be used on:

Node or relationship properties
Single property or multiple properties
Single or multiple types of nodes (labels)
Single or multiple types of relationships

Full-text index has below benefits:

Utilize Lucene to offer performant full-text search
Enables using regular expressions for complex query predicates
Allow specifying an index on multiple properties associated with multiple labels

Full-text index can be created for:

A string property in each node with a given label.
Multiple string properties across multiple node labels.
A string property in each relationship with a given type.
Multiple string properties across multiple relationship types.

But not:

❌ Properties across node and relationship

Using full-text index

i Explicit instruction must be included in Cypher to hint the query engine to use full-text index by calling a special procedure, db.index.fulltext.queryNodes() or db.index.fulltext.queryRelationships() ,whereas other indexs are used automatically.

// Create FULLTEXT index for node
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON EACH [x.<property_key>]

// Create FULLTEXT index for relationship
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
ON EACH [x.<property_key>]

// Query node utilizing FULLTEXT index
CALL db.index.fulltext.queryNodes
("<index_name>", "<lucene_query>")
YIELD node
RETURN node.<property>

// Query relationship utilizing FULLTEXT index
CALL db.index.fulltext.queryRelationships
("<index_name>", "<lucene_query>")
YIELD relationship
RETURN relationship.<property>

tip

The total db hits statistics do not reflect the true db hits because the PROFILE will never expose details of a procedure. All that you can measure for queries that call procedures is elapsed time.

LOOKUP index

LOOKUP indexes are used to look up nodes with a specific label or relationships of a specific type. They are always created over all labels or relationship types. Therefore, databases can have a maximum of two token LOOKUP indexes - one for nodes and one for relationships.

Controlling index usage

MATCH clause uses an index by default, specific hint USING must be added to utilize specific index.

PROFILE MATCH
(p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p2:Person)
USING INDEX p:Person(name)
WHERE
p.name CONTAINS 'John'
AND
p2.name CONTAINS 'George'
RETURN p.name, p2.name,  m.title

Multiple USING can be used:

PROFILE MATCH
(p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p2:Person)
USING INDEX p:Person(name)
USING INDEX p2:Person(name)
WHERE
p.name CONTAINS 'John'
AND
p2.name CONTAINS 'George'
RETURN p.name, p2.name,  m.title

However, not all queries will benefit from query hints, so always test the effectiveness of the queries/indexes.

Recommend reading

Index hints for the Cypher planner

Index limitations

Recommed REad

Recommend Reading

Cypher Aggregation

Creating List

MATCH (m:Movie)
RETURN [m.title, m.released, date().year - date(m.released).year + 1 ]

collect can be used to aggregate nods related to specific nodes into a list.

MATCH (a:label1)--(m:label2)
WITH a, collect (m) AS col2
RETURN a AS col1, col2

Neo4j

Basic Cypher​

Creating Node​

Creating nodes and relationships using multiple clauses​

Adding properties for a node or relationship​

Graph Traversal​

Varying length traversal​

Advanced Cypher​

Date Manipulation​

Reducing Memory​

Using parameters​

Using constraints​

Creating basic constraints​

Creating node key constraint​

Drop constraint​

Using indexes​

Range index​

Creating RANGE index​

TEXT index​

Creating TEXT index​

Composite index​

Creating Composite index​

Full-Text index​

Using full-text index​

LOOKUP index​

Controlling index usage​

Index limitations​

Cypher Aggregation​

Creating List​