Neo4j
Basic Cypher
Creating Node
MERGE
keyword is used to create a pattern in the database. It checks for duplication before creating.
CREATE
can also be used to create node. The benefit is that it does not look up the primary key before adding the node. Use it when you are sure your data is clean and you want greater speed during import.
MERGE (p:Person {name: 'Katie Holmes'})
CREATE (m:Movie {title: 'The Dark Knight'})
RETURN p, m
Creating nodes and relationships using multiple clauses
MATCH (p:Person {name: 'Michael Caine'})
MATCH (m:Movie {title: 'The Dark Knight'})
MERGE (p)-[:ACTED_IN]->(m)
MERGE (p:Person {name: 'Chadwick Boseman'})
MERGE (m:Movie {title: 'Black Panther'})
MERGE (p)-[:ACTED_IN]-(m)
Adding properties for a node or relationship
Inline as part of the MERGE clause
MERGE (p:Person {name: 'Michael Caine'})
MERGE (m:Movie {title: 'Batman Begins'})
MERGE (p)-[:ACTED_IN {roles: ['Alfred Penny']}]->(m)
RETURN p,mUsing the SET keyword
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Caine' AND m.title = 'The Dark Knight'
SET r.roles = ['Alfred Penny']
RETURN p, r, mSetting label
MATCH (p:Person {name: 'Jane Doe'})
SET p:Developer
RETURN pSetting mutliple properties
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Caine' AND m.title = 'The Dark Knight'
SET r.roles = ['Alfred Penny'], r.year = 2008
RETURN p, r, mUpdate properties
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Caine' AND m.title = 'The Dark Knight'
SET r.roles = ['Mr. Alfred Penny']
RETURN p, r, mRemove properties
You should never remove the property that is used as the primary key for a node.
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE p.name = 'Michael Caine' AND m.title = 'The Dark Knight'
REMOVE r.roles
SET p.born = null
RETURN p, r, mCustomize
MERGE
behaviour// Find or create a person with this name
MERGE (p:Person {name: 'McKenna Grace'})
// Only set the `createdAt` property if the node is created during this query
ON CREATE SET p.createdAt = datetime()
// Only set the `updatedAt` property if the node was created previously
ON MATCH SET p.updatedAt = datetime()
// Set the `born` property regardless
SET p.born = 2006
RETURN pDeleting node/relationship/labels
MATCH (p:Person)
WHERE p.name = 'Jane Doe'
DELETE pNeo4j prevents orphaned relationships in the graph. So node can't be deleted if it has relationship connected to it.
// Direct execution raises error
MATCH (p:Person {name: 'Jane Doe'})
DELETE p
// First remove relationship so node can be deleted
MATCH (p:Person {name: 'Jane Doe'})-[r:ACTED_IN]->(m:Movie {title: 'The Matrix'})
DELETE r
RETURN p, mUse
DETACH
to delete node and relationship together. Note that only do this on relatively small databases as trying to do this on a large database exhausts memory.MATCH (p:Person {name: 'Jane Doe'})
DETACH DELETE p
// This deletes entire graph
MATCH (n)
DETACH DELETE nTo remove label:
MATCH (p:Person {name: 'Jane Doe'})
REMOVE p:Developer
RETURN pReturning Path It's sometimes useful to work with
path
object using built-in functions:- length(p) returns the length of a path.
- nodes(p) returns a list containing the nodes for a path.
- relationships(p) returns a list containing the relationships for a path.
MATCH p = ((person:Person)-[]->(movie))
WHERE person.name = 'Walt Disney'
RETURN p
Graph Traversal
When the execution plan is created, it determines the set of nodes, referred as anchor
, that will be the starting points for the query. The anchor is often based upon a MATCH clause, typically determined by meta-data that is stored in the graph or a filter that is provided inline or in a WHERE clause. The anchor for a query will be based upon the fewest number of nodes that need to be retrieved into memory.
Once anchor node is extracted, path will then be expanded from them if provided
Find anchor nodes
// p is anchor, since Person node amount is less than general nodes m
PROFILE MATCH (p:Person)-[:ACTED_IN]->(m)
RETURN p.name, m.title LIMIT 100
// m is anchor, since Movie node amount is less than Person node
PROFILE MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, m.title LIMIT 100
// p is anchor, since there is only one Person node with name Eminem
PROFILE MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Eminem'
RETURN p.name, m.title
Varying length traversal
Shortest path For
shortestPath()
andallShortestPaths()
you can provide an upper bound on the length of the path(s), but not a lower bound.// This query finds the shortest path between two nodes regardless of relationship types
MATCH p = shortestPath((p1:Person)-[*]-(p2:Person))
WHERE p1.name = "Eminem"
AND p2.name = "Charlton Heston"
RETURN p
// This query finds the shortest path between two nodes using only ACTED_IN relationship
MATCH p = shortestPath((p1:Person)-[:ACTED_IN*]-(p2:Person))
WHERE p1.name = "Eminem"
AND p2.name = "Charlton Heston"
RETURN pVarying length traversal When exact hops are given, paths are traversed in a depth-first manner.
// This query returns Person node that are exactly two hops away from Eminem, if any
MATCH (p:Person {name: 'Eminem'})-[:ACTED_IN*2]-(others:Person)
RETURN others.name
// This query returns Person node that are up to four hops away from Eminem, if any
MATCH (p:Person {name: 'Eminem'})-[:ACTED_IN*1..4]-(others:Person)
RETURN others.name
Advanced Cypher
Get node by id
MATCH (n) where ID(n)=<your_id>
RETURN nLabel related
// Get list of all labels
CALL db.labels()
// Find node with label
MATCH (p)
WHERE p.born.year > 1960
AND p:Actor
AND p:Director
RETURN p.name, p.born, labels(p)Schema Related
// View data model
CALL db.schema.visualization()
// View node property type
CALL db.schema.nodeTypeProperties()
// View relationship property type
CALL db.schema.relTypeProperties()
// Show constraints
SHOW CONSTRAINTSPatterns-related In general, using a single MATCH clause will perform better than multiple MATCH clauses.
// Multiple patterns
MATCH (a:Person)-[:ACTED_IN]->(m:Movie), (m)<-[:DIRECTED]-(d:Person)
WHERE m.year > 2000
RETURN a.name, m.title, d.name
// This query will have better performance then the above one
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
WHERE m.year > 2000
RETURN a.name, m.title, d.name
// Multi-relationship match
MATCH (p:Person) -[r:ACTED_IN|DIRECTED]->(m:Movie)
RETURN p.name, r.role, m.title
// Test pattern existence
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
AND exists {(p)-[:DIRECTED]->(m)}
RETURN p.name, labels(p), m.titleFiltering Note that string matching in Cypher is case-sensitive.
WarningQuery engine turns off the use of the index if a string property is transformed uring query, such as using
toUpper()
ortoLower()
.A best practice for handling property values that need to be evaluated as upper, lower, or mixed case is to use fulltext schema indexes.
// Check string property value starts with prefix or ends with suffix
MATCH (m:Movie)
WHERE m.title STARTS WITH 'Toy Story'
WHERE m.title ENDS WITH '2'
RETURN m.title, m.released
// Check value containse substring
MATCH (p:Person)
WHERE toUpper(p.name) CONTAINS ' DE '
RETURN p.name
// To avoid missing data due to mismatch in case
MATCH (p:Person)
WHERE toLower(p.name) ENDS WITH 'demille'
OR toUpper(p.name) ENDS WITH 'DEMILLE'
RETURN p.name
// Testing inequality of a property using the <> predicate
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name <> 'Tom Hanks'
AND m.title = 'Captain Phillips'
RETURN p.name
// Less/Greater than
MATCH (m:Movie) WHERE m.title = 'Toy Story'
RETURN
m.year < 1995 AS lessThan, // Less than (false)
m.year <= 1995 AS lessThanOrEqual, // Less than or equal(true)
m.year > 1995 AS moreThan, // More than (false)
m.year >= 1995 AS moreThanOrEqual // More than or equal (true)
// Range
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE 2005 <= m.year <= 2010
RETURN m.title, m.released
// Or condition
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
OR m.title = 'Captain Phillips'
RETURN p.name, m.title
// Null testing
MATCH (p:Person)
WHERE p.died IS NOT NULL
AND p.born.year >= 1985
RETURN p.name, p.born, p.died
MATCH (p:Person)
WHERE p.died IS NULL
AND p.born.year <= 1922
RETURN p.name, p.born, p.diedProfiling The difference between using EXPLAIN and PROFILE is that EXPLAIN provides estimates of the query steps where PROFILE provides the exact steps and number of rows retrieved for the query.
PROFILE MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p)
WHERE p.name = 'Tom Hanks'
RETURN m.title
EXPLAIN MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p)
WHERE p.name = 'Tom Hanks'
RETURN m.titleOptional Match
OPTIONAL MATCH
matches patterns with your graph, just likeMATCH
does. The difference is that if no matches are found,OPTIONAL MATCH
will use nulls for missing parts of the pattern.OPTIONAL MATCH
could be considered the Cypher equivalent of the outer join in SQL.MATCH (m:Movie) WHERE m.title = "Kiss Me Deadly"
MATCH (m)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
OPTIONAL MATCH (m)<-[:ACTED_IN]-(a:Actor)-[:ACTED_IN]->(rec)
RETURN rec.title, a.nameOrder result
There is no limit to the number of properties you can order by.
MATCH (p:Person) WHERE p.born.year > 1980 RETURN p
ORDER BY p.born
MATCH (p:Person) WHERE p.born.year > 1980 RETURN p
ORDER BY p.born DESC
// Multiple sort expression
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
OR p.name = 'Keanu Reeves'
RETURN m.year, m.title
ORDER BY m.year DESC , m.titlePaganition
MATCH (p:Person)
WHERE p.born.year = 1980
RETURN p.name as name,
p.born AS birthDate
ORDER BY p.born SKIP 40 LIMIT 10Distinct You can use DISTINCT to eliminate duplication of:
- rows returned (you have just learned this)
- property values
- nodes
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN DISTINCT m.title, m.released
ORDER BY m.titlesCustomize returned result
// Return partial property
MATCH (p:Person)
WHERE p.name CONTAINS "Thomas"
RETURN p { .name, .born } AS person
ORDER BY p.name
// Adding custom data
MATCH (m:Movie)<-[:DIRECTED]-(d:Director)
WHERE d.name = 'Woody Allen'
RETURN m {.*, favorite: true} AS movie
// Custom string/numeric operation
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
WHERE m.title CONTAINS 'Toy Story' AND
p.died IS NULL
RETURN 'Movie: ' + m.title AS movie,
p.name AS actor,
p.born AS dob,
date().year - p.born.year AS ageThisYear
// Conditionally changine returned data
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
WHERE p.name = 'Henry Fonda'
RETURN m.title AS movie,
CASE
WHEN m.year < 1940 THEN 'oldies'
WHEN 1940 <= m.year < 1950 THEN 'forties'
WHEN 1950 <= m.year < 1960 THEN 'fifties'
WHEN 1960 <= m.year < 1970 THEN 'sixties'
WHEN 1970 <= m.year < 1980 THEN 'seventies'
WHEN 1980 <= m.year < 1990 THEN 'eighties'
WHEN 1990 <= m.year < 2000 THEN 'nineties'
ELSE 'two-thousands'
END
AS timeFrame
// Return a list
MATCH (p:Person)
RETURN p.name, [p.born, p.died] AS lifeTimeAggregation
List of aggregation functions can be checked here.
// Return a list
MATCH (p:Person)
RETURN p.name, [p.born, p.died] AS lifeTime
// Using collect() to create a list
// collect() removes duplicates as well
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN a.name AS actor,
count(*) AS total,
collect(m.title) AS movies
ORDER BY total DESC LIMIT 10
// Collect nodes
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name ='Tom Cruise'
RETURN collect(m) AS tomCruiseMovies
// Access element of a list
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN m.title AS movie,
collect(a.name)[0] AS castMember,
size(collect(a.name)) as castSize
// Returning a slice of collection
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN m.title AS movie,
collect(a.name)[2..] AS castMember,
size(collect(a.name)) as castSize
// List comprehension
MATCH (m:Movie)
RETURN m.title as movie,
[x IN m.countries WHERE x CONTAINS 'USA' OR x CONTAINS 'Germany']
AS country LIMIT 500tipcount()
may be more efficient because it gets its values for node counts or relationships from a node from the internal count store of the graph.Pattern comprehension
MATCH (m:Movie)
WHERE m.year = 2015
RETURN m.title,
[(dir:Person)-[:DIRECTED]->(m) | dir.name] AS directors,
[(actor:Person)-[:ACTED_IN]->(m) | actor.name] AS actorsMap projection
MATCH (m:Movie)
WHERE m.title CONTAINS 'Matrix'
RETURN m { .title, .released } AS movieSetting variable
WITH 'Tom Hanks' AS actorName
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = actorName
RETURN m.title AS moviesUtilizing scope
// Limit return amount using `with` scope
WITH 'Tom Hanks' AS theActor
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = theActor
WITH m LIMIT 2
// possibly do more with the two m nodes
RETURN m.title AS movies
// Sort before scope
WITH 'Tom Hanks' AS theActor
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = theActor
WITH m ORDER BY m.year LIMIT 5
// possibly do more with the five m nodes in a particular order
RETURN m.title AS movies, m.year AS yearReleased
Map projection and aggregation
MATCH (n:Movie)
WHERE n.imdbRating IS NOT NULL
AND n.poster IS NOT NULL
WITH n {
.title,
.year,
.languages,
.plot,
.poster,
.imdbRating,
directors: [ (n)<-[:DIRECTED]-(d) | d { tmdbId:d.imdbId, .name } ]
}
ORDER BY n.imdbRating DESC LIMIT 4
RETURN collect(n)tipAlthough this is nice for processing on the client side, it takes more memory on the server as records cannot be streamed to the client but are collected into the list structure on the server.
UNWIND
UNWIND returns a row for each element of a list.
MATCH (m:Movie)-[:ACTED_IN]-(a:Actor)
WHERE a.name = 'Tom Hanks'
UNWIND m.languages AS lang
RETURN m.title AS movie,
m.languages AS languages,
lang AS language
Date Manipulation
Create date properties
MERGE (x:Test {id: 1})
SET x.date = date(),
x.datetime = datetime(),
x.time = time()
RETURN x
CALL apoc.meta.nodeTypeProperties()Extract date data
MATCH (x:Test {id: 1})
RETURN x.date.day, x.date.year,
x.datetime.year, x.datetime.hour,
x.datetime.minuteSetting date values
MATCH (x:Test {id: 1})
SET x.date1 = date('2022-01-01'),
x.date2 = date('2022-01-15')
RETURN xSetting datetime values
MATCH (x:Test {id: 1})
SET x.datetime1 = datetime('2022-01-04T10:05:20'),
x.datetime2 = datetime('2022-04-09T18:33:05')
RETURN xWorking with durations
// Match between duration
MATCH (x:Test {id: 1})
RETURN duration.between(x.date1,x.date2)
// Calculate duration in days
MATCH (x:Test {id: 1})
RETURN duration.inDays(x.datetime1,x.datetime2).days
// Add duration to date
MATCH (x:Test {id: 1})
RETURN x.date1 + duration({months: 6})Format date and time with apoc
MATCH (x:Test {id: 1})
RETURN x.datetime as Datetime,
apoc.temporal.format( x.datetime, 'HH:mm:ss.SSSS')
AS formattedDateTime
MATCH (x:Test {id: 1})
RETURN apoc.date.toISO8601(x.datetime.epochMillis, "ms")
AS iso8601
RETURN apoc.temporal.toZonedTemporal('2012-12-23 23:59:59',"yyyy-MM-dd HH:mm:ss") AS output;Return path in curry
MATCH p = ((person:Person)-[]->(movie))
WHERE person.name = 'Walt Disney'
RETURN p
Reducing Memory
In Cypher, a query represents a single transaction against the graph.
Subquery
- A subquery returns values referred to by the variables in the RETURN clause.
- A subquery cannot return variables with the same name used in the enclosing query.
- You must explicitly pass in variables from the enclosing query to a subquery.
CALL {
MATCH (m:Movie) WHERE m.year = 2000
RETURN m ORDER BY m.imdbRating DESC LIMIT 10
}
MATCH (:User)-[r:RATED]->(m)
RETURN m.title, avg(r.rating)Union
tipUNION ALL
returns all results which is more efficient on memory but can lead to duplicates.UNION
returns distinct results.MATCH (m:Movie) WHERE m.year = 2000
RETURN {type:"movies", theMovies: collect(m.title)} AS data
UNION ALL
MATCH (a:Actor) WHERE a.born.year > 2000
RETURN { type:"actors", theActors: collect(DISTINCT a.name)} AS dataUnion in subquery
MATCH (p:Person)
WITH p LIMIT 100
CALL {
WITH p
OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
RETURN m.title + ": " + "Actor" AS work
UNION
WITH p
OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
RETURN m.title+ ": " + "Director" AS work
}
RETURN p.name, collect(work)
Using parameters
A parameter name is prefixed with a $
symbol in Cypher statement.
// Setting paramter with string value
:param actorName: 'Tom Hanks'
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName
RETURN m.released AS releaseDate,
m.title AS title
ORDER BY m.released DESC
// JSON syntax to set multiple parameters
:params {actorName: 'Tom Cruise', movieName: 'Top Gun'}
// List all parameters
:params
// Clear all parameters
:params {}
However, special care is required when handling integer value due to a discrepancy between integers in JavaScript and the Neo4j type system. To avoid any data loss on large numbers, any integers are converted to floating point values when the parameter is set.
:param number: 10 // outputs: { "number": 10.0 }
// Force type
:param number=> 10 // outputs: { "number": 10}
Using constraints
Constraint is interally implemented as an index, and is typically created before the data is loaded into the graph.
There are three types of constraints:
- Uniqueness for a single node property value.
- Existence for a property of a node or relationship.
- Existence and uniqueness for a set of node property values (called a
Node key
).
When creating a uniqueness constraint on a property, a node label that the constraint will be associated with must be specified. Attempt to create a uniqueness constraint on a property that is not unique for all nodes with that label, the constraint creation will fail.
Creating basic constraints
// Create a uniqueness constraint for a single property
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE x.<property_key> IS UNIQUE
// Create a uniqueness constraint for multiple properties
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE (x.<property_key1>, x.<property_key2>) IS UNIQUE
// Create a existance constraint on node
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE x.<property_key> IS NOT NULL
// Create a existance constraint on relationship
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
REQUIRE x.<property_key> IS NOT NULL
// List all constraints
SHOW CONSTRAINTS
Creating node key constraint
Node key constraint is a specialized type of constraint for the properties of a node. It combines existence with uniqueness.
// Creating a Node key constraint for a single property
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE x.<property_key> IS NODE KEY
// Creating a Node key constraint for mulitple properties
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE (x.<property_key1>, x.<property_key2>) IS NODE KEY
Drop constraint
DROP CONSTRAINT <constraint_name>
// Creating list of constraints to drop
SHOW CONSTRAINTS
YIELD name
RETURN collect('DROP CONSTRAINT ' + name + ';') AS Statements
// Drop constraints using APOC
CALL apoc.schema.assert({},{},true)
Using indexes
Neo4j supports two types of index: search-performace index and semantic index
Searh-performance index type enables quicker retrieval of exact matches between an index and the primary data storage, including: RANGE
, LOOKUP
, TEXT
, POINT
.
Semantic index type captures the semantic meaning or context of the data by returning an approximation score, which indicates the similarity between a query string and the data in a database. It includes: full-text
and vector
.
Unlike search-performance indexes, semantic indexes are not automatically used by the Cypher® planner. To use semantic indexes, they must be explicitly called with specific procedures.
During query planning, at most one index is used. So it is important to plan what properties to index how to create them.
To list and drop indexes created for the graph: below Cypher command can be used:
SHOW INDEXES
DROP INDEX <index_name>
Range index
Range
index can be defined on a property of node or relationship type. It is implemented with B tree which is often used to enable efficient sorting. It can speed up following Cypher queries:
- Equality checks
=
(TEXT index may perform better for string properties) - Range comparisons
>
,>=
,<
,<=
STARTS WITH
string comparisons (performs better thanTEXT
index)- Existence checks
IS NOT NULL
Creating RANGE index
// Create index on node
CREATE INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON x.<property_key>
// Create index on relationship
CREATE INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<relationship_label>]-()
ON x.<property_key>
TEXT index
A TEXT
index supports node or relationship property types that must be strings. It speeds up the following Cypher code:
- Equality checks:
=
- String comparisons:
ENDS WITH
,CONTAINS
- List membership: x.prop in ["a","b","c"]
Using TEXT index also brings below benefits:
- Take up less space
- More performant than RANGE indexes when the property being indexed contains a lot of duplicate values.
For ENDS WITH
ands CONTAINS
comparisons, TEXT
index might perform better than RANGE
index.
Total db hits is often a useful indicator to evaluate the performance of a query. For TEXT index, however, this might not be the case. It can have higher db hits but providing less elapsed time.
Creating TEXT index
// Create TEXT index on node
CREATE TEXT INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON x.<property_key>
// Create TEXT index on relationship
CREATE TEXT INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
ON (x.<property_key>)
Composite index
A composite index combines values from multiple properties, can be of different type, for a node label or for relationship type. This index comes in handy when multiple properties are often queried together.
Creating Composite index
// Composite index on node
CREATE INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON (x.<property_key1>,x.<property_key2>,...)
// Composite index on relationship
CREATE INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
ON (x.<property_key1>, x.<property_key2>,...)
However, queries that simply check if property exist or only one property of the index is tested won't utilize this kind of index. For instance, composite index won't be used in below query even if a composite index, say (m.year, m.runtime), is created.
PROFILE MATCH (m:Movie)
WHERE m.year IS NULL AND m.runtime IS NULL
RETURN m.title, m.year, m.runtime
// OR
PROFILE MATCH (m:Movie)
WHERE m.year = 1994
RETURN m.title, m.year, m.runtime
Full-Text index
A full-text index is based upon string values only, but provides additional search capabilities that RANGE or TEXT indexes don't offer for it is built on Apache Lucene. It can be used on:
- Node or relationship properties
- Single property or multiple properties
- Single or multiple types of nodes (labels)
- Single or multiple types of relationships
Full-text index has below benefits:
- Utilize Lucene to offer performant full-text search
- Enables using regular expressions for complex query predicates
- Allow specifying an index on multiple properties associated with multiple labels
Full-text index can be created for:
- A string property in each node with a given label.
- Multiple string properties across multiple node labels.
- A string property in each relationship with a given type.
- Multiple string properties across multiple relationship types.
But not:
- ❌ Properties across node and relationship
Using full-text index
i
Explicit instruction must be included in Cypher to hint the query engine to use full-text index by calling a special procedure, db.index.fulltext.queryNodes()
or db.index.fulltext.queryRelationships()
,whereas other indexs are used automatically.
// Create FULLTEXT index for node
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON EACH [x.<property_key>]
// Create FULLTEXT index for relationship
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
ON EACH [x.<property_key>]
// Query node utilizing FULLTEXT index
CALL db.index.fulltext.queryNodes
("<index_name>", "<lucene_query>")
YIELD node
RETURN node.<property>
// Query relationship utilizing FULLTEXT index
CALL db.index.fulltext.queryRelationships
("<index_name>", "<lucene_query>")
YIELD relationship
RETURN relationship.<property>
The total db hits statistics do not reflect the true db hits because the PROFILE will never expose details of a procedure. All that you can measure for queries that call procedures is elapsed time.
LOOKUP index
LOOKUP indexes are used to look up nodes with a specific label or relationships of a specific type. They are always created over all labels or relationship types. Therefore, databases can have a maximum of two token LOOKUP indexes - one for nodes and one for relationships.
Controlling index usage
MATCH
clause uses an index by default, specific hint USING
must be added to utilize specific index.
PROFILE MATCH
(p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p2:Person)
USING INDEX p:Person(name)
WHERE
p.name CONTAINS 'John'
AND
p2.name CONTAINS 'George'
RETURN p.name, p2.name, m.title
Multiple USING
can be used:
PROFILE MATCH
(p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p2:Person)
USING INDEX p:Person(name)
USING INDEX p2:Person(name)
WHERE
p.name CONTAINS 'John'
AND
p2.name CONTAINS 'George'
RETURN p.name, p2.name, m.title
However, not all queries will benefit from query hints, so always test the effectiveness of the queries/indexes.
Index limitations
Cypher Aggregation
Creating List
MATCH (m:Movie)
RETURN [m.title, m.released, date().year - date(m.released).year + 1 ]
collect
can be used to aggregate nods related to specific nodes into a list.
MATCH (a:label1)--(m:label2)
WITH a, collect (m) AS col2
RETURN a AS col1, col2