Skip to main content

Neo4j

Basic Cypher

Creating Node

MERGE keyword is used to create a pattern in the database. It checks for duplication before creating.

CREATE can also be used to create node. The benefit is that it does not look up the primary key before adding the node. Use it when you are sure your data is clean and you want greater speed during import.

MERGE (p:Person {name: 'Katie Holmes'})
CREATE (m:Movie {title: 'The Dark Knight'})
RETURN p, m

Creating nodes and relationships using multiple clauses

MATCH (p:Person {name: 'Michael Caine'})
MATCH (m:Movie {title: 'The Dark Knight'})
MERGE (p)-[:ACTED_IN]->(m)
MERGE (p:Person {name: 'Chadwick Boseman'})
MERGE (m:Movie {title: 'Black Panther'})
MERGE (p)-[:ACTED_IN]-(m)

Adding properties for a node or relationship

  • Inline as part of the MERGE clause

    MERGE (p:Person {name: 'Michael Caine'})
    MERGE (m:Movie {title: 'Batman Begins'})
    MERGE (p)-[:ACTED_IN {roles: ['Alfred Penny']}]->(m)
    RETURN p,m
  • Using the SET keyword

    MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
    WHERE p.name = 'Michael Caine' AND m.title = 'The Dark Knight'
    SET r.roles = ['Alfred Penny']
    RETURN p, r, m
  • Setting label

    MATCH (p:Person {name: 'Jane Doe'})
    SET p:Developer
    RETURN p
  • Setting mutliple properties

    MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
    WHERE p.name = 'Michael Caine' AND m.title = 'The Dark Knight'
    SET r.roles = ['Alfred Penny'], r.year = 2008
    RETURN p, r, m
  • Update properties

    MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
    WHERE p.name = 'Michael Caine' AND m.title = 'The Dark Knight'
    SET r.roles = ['Mr. Alfred Penny']
    RETURN p, r, m
  • Remove properties

    You should never remove the property that is used as the primary key for a node.

    MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
    WHERE p.name = 'Michael Caine' AND m.title = 'The Dark Knight'
    REMOVE r.roles
    SET p.born = null
    RETURN p, r, m
  • Customize MERGE behaviour

    // Find or create a person with this name
    MERGE (p:Person {name: 'McKenna Grace'})

    // Only set the `createdAt` property if the node is created during this query
    ON CREATE SET p.createdAt = datetime()

    // Only set the `updatedAt` property if the node was created previously
    ON MATCH SET p.updatedAt = datetime()

    // Set the `born` property regardless
    SET p.born = 2006

    RETURN p
  • Deleting node/relationship/labels

    MATCH (p:Person)
    WHERE p.name = 'Jane Doe'
    DELETE p

    Neo4j prevents orphaned relationships in the graph. So node can't be deleted if it has relationship connected to it.

    // Direct execution raises error
    MATCH (p:Person {name: 'Jane Doe'})
    DELETE p

    // First remove relationship so node can be deleted
    MATCH (p:Person {name: 'Jane Doe'})-[r:ACTED_IN]->(m:Movie {title: 'The Matrix'})
    DELETE r
    RETURN p, m

    Use DETACH to delete node and relationship together. Note that only do this on relatively small databases as trying to do this on a large database exhausts memory.

    MATCH (p:Person {name: 'Jane Doe'})
    DETACH DELETE p

    // This deletes entire graph
    MATCH (n)
    DETACH DELETE n

    To remove label:

    MATCH (p:Person {name: 'Jane Doe'})
    REMOVE p:Developer
    RETURN p
  • Returning Path It's sometimes useful to work with path object using built-in functions:

    • length(p) returns the length of a path.
    • nodes(p) returns a list containing the nodes for a path.
    • relationships(p) returns a list containing the relationships for a path.
    MATCH p = ((person:Person)-[]->(movie))
    WHERE person.name = 'Walt Disney'
    RETURN p

Graph Traversal

When the execution plan is created, it determines the set of nodes, referred as anchor, that will be the starting points for the query. The anchor is often based upon a MATCH clause, typically determined by meta-data that is stored in the graph or a filter that is provided inline or in a WHERE clause. The anchor for a query will be based upon the fewest number of nodes that need to be retrieved into memory.

Once anchor node is extracted, path will then be expanded from them if provided

  • Find anchor nodes

    // p is anchor, since Person node amount is less than general nodes m
    PROFILE MATCH (p:Person)-[:ACTED_IN]->(m)
    RETURN p.name, m.title LIMIT 100

    // m is anchor, since Movie node amount is less than Person node
    PROFILE MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
    RETURN p.name, m.title LIMIT 100

    // p is anchor, since there is only one Person node with name Eminem
    PROFILE MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
    WHERE p.name = 'Eminem'
    RETURN p.name, m.title

Varying length traversal

  • Shortest path For shortestPath() and allShortestPaths() you can provide an upper bound on the length of the path(s), but not a lower bound.

    // This query finds the shortest path between two nodes regardless of relationship types
    MATCH p = shortestPath((p1:Person)-[*]-(p2:Person))
    WHERE p1.name = "Eminem"
    AND p2.name = "Charlton Heston"
    RETURN p

    // This query finds the shortest path between two nodes using only ACTED_IN relationship
    MATCH p = shortestPath((p1:Person)-[:ACTED_IN*]-(p2:Person))
    WHERE p1.name = "Eminem"
    AND p2.name = "Charlton Heston"
    RETURN p
  • Varying length traversal When exact hops are given, paths are traversed in a depth-first manner.

    // This query returns Person node that are exactly two hops away from Eminem, if any
    MATCH (p:Person {name: 'Eminem'})-[:ACTED_IN*2]-(others:Person)
    RETURN others.name

    // This query returns Person node that are up to four hops away from Eminem, if any
    MATCH (p:Person {name: 'Eminem'})-[:ACTED_IN*1..4]-(others:Person)
    RETURN others.name

Advanced Cypher

  • Get node by id

    MATCH (n) where ID(n)=<your_id>
    RETURN n
  • Label related

    // Get list of all labels
    CALL db.labels()

    // Find node with label
    MATCH (p)
    WHERE p.born.year > 1960
    AND p:Actor
    AND p:Director
    RETURN p.name, p.born, labels(p)
  • Schema Related

    // View data model
    CALL db.schema.visualization()

    // View node property type
    CALL db.schema.nodeTypeProperties()

    // View relationship property type
    CALL db.schema.relTypeProperties()

    // Show constraints
    SHOW CONSTRAINTS
  • Patterns-related In general, using a single MATCH clause will perform better than multiple MATCH clauses.

    // Multiple patterns
    MATCH (a:Person)-[:ACTED_IN]->(m:Movie), (m)<-[:DIRECTED]-(d:Person)
    WHERE m.year > 2000
    RETURN a.name, m.title, d.name

    // This query will have better performance then the above one
    MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
    WHERE m.year > 2000
    RETURN a.name, m.title, d.name

    // Multi-relationship match
    MATCH (p:Person) -[r:ACTED_IN|DIRECTED]->(m:Movie)
    RETURN p.name, r.role, m.title

    // Test pattern existence
    MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
    WHERE p.name = 'Tom Hanks'
    AND exists {(p)-[:DIRECTED]->(m)}
    RETURN p.name, labels(p), m.title
  • Filtering Note that string matching in Cypher is case-sensitive.

    Warning

    Query engine turns off the use of the index if a string property is transformed uring query, such as using toUpper() or toLower().

    A best practice for handling property values that need to be evaluated as upper, lower, or mixed case is to use fulltext schema indexes.

    // Check string property value starts with prefix or ends with suffix
    MATCH (m:Movie)
    WHERE m.title STARTS WITH 'Toy Story'
    WHERE m.title ENDS WITH '2'
    RETURN m.title, m.released

    // Check value containse substring
    MATCH (p:Person)
    WHERE toUpper(p.name) CONTAINS ' DE '
    RETURN p.name

    // To avoid missing data due to mismatch in case
    MATCH (p:Person)
    WHERE toLower(p.name) ENDS WITH 'demille'
    OR toUpper(p.name) ENDS WITH 'DEMILLE'
    RETURN p.name

    // Testing inequality of a property using the <> predicate
    MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
    WHERE p.name <> 'Tom Hanks'
    AND m.title = 'Captain Phillips'
    RETURN p.name

    // Less/Greater than
    MATCH (m:Movie) WHERE m.title = 'Toy Story'
    RETURN
    m.year < 1995 AS lessThan, // Less than (false)
    m.year <= 1995 AS lessThanOrEqual, // Less than or equal(true)
    m.year > 1995 AS moreThan, // More than (false)
    m.year >= 1995 AS moreThanOrEqual // More than or equal (true)

    // Range
    MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
    WHERE 2005 <= m.year <= 2010
    RETURN m.title, m.released

    // Or condition
    MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
    WHERE p.name = 'Tom Hanks'
    OR m.title = 'Captain Phillips'
    RETURN p.name, m.title

    // Null testing
    MATCH (p:Person)
    WHERE p.died IS NOT NULL
    AND p.born.year >= 1985
    RETURN p.name, p.born, p.died

    MATCH (p:Person)
    WHERE p.died IS NULL
    AND p.born.year <= 1922
    RETURN p.name, p.born, p.died
  • Profiling The difference between using EXPLAIN and PROFILE is that EXPLAIN provides estimates of the query steps where PROFILE provides the exact steps and number of rows retrieved for the query.

    PROFILE MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p)
    WHERE p.name = 'Tom Hanks'
    RETURN m.title

    EXPLAIN MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p)
    WHERE p.name = 'Tom Hanks'
    RETURN m.title
  • Optional Match OPTIONAL MATCH matches patterns with your graph, just like MATCH does. The difference is that if no matches are found, OPTIONAL MATCH will use nulls for missing parts of the pattern. OPTIONAL MATCH could be considered the Cypher equivalent of the outer join in SQL.

    MATCH (m:Movie) WHERE m.title = "Kiss Me Deadly"
    MATCH (m)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
    OPTIONAL MATCH (m)<-[:ACTED_IN]-(a:Actor)-[:ACTED_IN]->(rec)
    RETURN rec.title, a.name
  • Order result

    There is no limit to the number of properties you can order by.

    MATCH (p:Person) WHERE p.born.year > 1980 RETURN p
    ORDER BY p.born

    MATCH (p:Person) WHERE p.born.year > 1980 RETURN p
    ORDER BY p.born DESC

    // Multiple sort expression
    MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
    WHERE p.name = 'Tom Hanks'
    OR p.name = 'Keanu Reeves'
    RETURN m.year, m.title
    ORDER BY m.year DESC , m.title
  • Paganition

    MATCH (p:Person)
    WHERE p.born.year = 1980
    RETURN p.name as name,
    p.born AS birthDate
    ORDER BY p.born SKIP 40 LIMIT 10
  • Distinct You can use DISTINCT to eliminate duplication of:

    • rows returned (you have just learned this)
    • property values
    • nodes
    MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
    WHERE p.name = 'Tom Hanks'
    RETURN DISTINCT m.title, m.released
    ORDER BY m.titles
  • Customize returned result

    // Return partial property
    MATCH (p:Person)
    WHERE p.name CONTAINS "Thomas"
    RETURN p { .name, .born } AS person
    ORDER BY p.name

    // Adding custom data
    MATCH (m:Movie)<-[:DIRECTED]-(d:Director)
    WHERE d.name = 'Woody Allen'
    RETURN m {.*, favorite: true} AS movie

    // Custom string/numeric operation
    MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
    WHERE m.title CONTAINS 'Toy Story' AND
    p.died IS NULL
    RETURN 'Movie: ' + m.title AS movie,
    p.name AS actor,
    p.born AS dob,
    date().year - p.born.year AS ageThisYear

    // Conditionally changine returned data
    MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
    WHERE p.name = 'Henry Fonda'
    RETURN m.title AS movie,
    CASE
    WHEN m.year < 1940 THEN 'oldies'
    WHEN 1940 <= m.year < 1950 THEN 'forties'
    WHEN 1950 <= m.year < 1960 THEN 'fifties'
    WHEN 1960 <= m.year < 1970 THEN 'sixties'
    WHEN 1970 <= m.year < 1980 THEN 'seventies'
    WHEN 1980 <= m.year < 1990 THEN 'eighties'
    WHEN 1990 <= m.year < 2000 THEN 'nineties'
    ELSE 'two-thousands'
    END
    AS timeFrame

    // Return a list
    MATCH (p:Person)
    RETURN p.name, [p.born, p.died] AS lifeTime
  • Aggregation

    List of aggregation functions can be checked here.


    // Return a list
    MATCH (p:Person)
    RETURN p.name, [p.born, p.died] AS lifeTime

    // Using collect() to create a list
    // collect() removes duplicates as well
    MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
    RETURN a.name AS actor,
    count(*) AS total,
    collect(m.title) AS movies
    ORDER BY total DESC LIMIT 10

    // Collect nodes
    MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
    WHERE p.name ='Tom Cruise'
    RETURN collect(m) AS tomCruiseMovies

    // Access element of a list
    MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
    RETURN m.title AS movie,
    collect(a.name)[0] AS castMember,
    size(collect(a.name)) as castSize

    // Returning a slice of collection
    MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
    RETURN m.title AS movie,
    collect(a.name)[2..] AS castMember,
    size(collect(a.name)) as castSize

    // List comprehension
    MATCH (m:Movie)
    RETURN m.title as movie,
    [x IN m.countries WHERE x CONTAINS 'USA' OR x CONTAINS 'Germany']
    AS country LIMIT 500
    tip

    count() may be more efficient because it gets its values for node counts or relationships from a node from the internal count store of the graph.

  • Pattern comprehension

    MATCH (m:Movie)
    WHERE m.year = 2015
    RETURN m.title,
    [(dir:Person)-[:DIRECTED]->(m) | dir.name] AS directors,
    [(actor:Person)-[:ACTED_IN]->(m) | actor.name] AS actors
  • Map projection

    MATCH (m:Movie)
    WHERE m.title CONTAINS 'Matrix'
    RETURN m { .title, .released } AS movie
  • Setting variable

    WITH 'Tom Hanks' AS actorName
    MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
    WHERE p.name = actorName
    RETURN m.title AS movies
  • Utilizing scope

    // Limit return amount using `with` scope
    WITH 'Tom Hanks' AS theActor
    MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
    WHERE p.name = theActor
    WITH m LIMIT 2
    // possibly do more with the two m nodes
    RETURN m.title AS movies

    // Sort before scope
    WITH 'Tom Hanks' AS theActor
    MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
    WHERE p.name = theActor
    WITH m ORDER BY m.year LIMIT 5
    // possibly do more with the five m nodes in a particular order
    RETURN m.title AS movies, m.year AS yearReleased
  • Map projection and aggregation

    MATCH (n:Movie)
    WHERE n.imdbRating IS NOT NULL
    AND n.poster IS NOT NULL
    WITH n {
    .title,
    .year,
    .languages,
    .plot,
    .poster,
    .imdbRating,
    directors: [ (n)<-[:DIRECTED]-(d) | d { tmdbId:d.imdbId, .name } ]
    }
    ORDER BY n.imdbRating DESC LIMIT 4
    RETURN collect(n)
    tip

    Although this is nice for processing on the client side, it takes more memory on the server as records cannot be streamed to the client but are collected into the list structure on the server.

  • UNWIND

    UNWIND returns a row for each element of a list.

    MATCH (m:Movie)-[:ACTED_IN]-(a:Actor)
    WHERE a.name = 'Tom Hanks'
    UNWIND m.languages AS lang
    RETURN m.title AS movie,
    m.languages AS languages,
    lang AS language

Date Manipulation

  • Create date properties

    MERGE (x:Test {id: 1})
    SET x.date = date(),
    x.datetime = datetime(),
    x.time = time()
    RETURN x

    CALL apoc.meta.nodeTypeProperties()
  • Extract date data

    MATCH (x:Test {id: 1})
    RETURN x.date.day, x.date.year,
    x.datetime.year, x.datetime.hour,
    x.datetime.minute
  • Setting date values

    MATCH (x:Test {id: 1})
    SET x.date1 = date('2022-01-01'),
    x.date2 = date('2022-01-15')
    RETURN x
  • Setting datetime values

    MATCH (x:Test {id: 1})
    SET x.datetime1 = datetime('2022-01-04T10:05:20'),
    x.datetime2 = datetime('2022-04-09T18:33:05')
    RETURN x
  • Working with durations

    // Match between duration
    MATCH (x:Test {id: 1})
    RETURN duration.between(x.date1,x.date2)

    // Calculate duration in days
    MATCH (x:Test {id: 1})
    RETURN duration.inDays(x.datetime1,x.datetime2).days

    // Add duration to date
    MATCH (x:Test {id: 1})
    RETURN x.date1 + duration({months: 6})
  • Format date and time with apoc

    MATCH (x:Test {id: 1})
    RETURN x.datetime as Datetime,
    apoc.temporal.format( x.datetime, 'HH:mm:ss.SSSS')
    AS formattedDateTime

    MATCH (x:Test {id: 1})
    RETURN apoc.date.toISO8601(x.datetime.epochMillis, "ms")
    AS iso8601

    RETURN apoc.temporal.toZonedTemporal('2012-12-23 23:59:59',"yyyy-MM-dd HH:mm:ss") AS output;
  • Return path in curry

    MATCH p = ((person:Person)-[]->(movie))
    WHERE person.name = 'Walt Disney'
    RETURN p

Reducing Memory

In Cypher, a query represents a single transaction against the graph.

  • Subquery

    • A subquery returns values referred to by the variables in the RETURN clause.
    • A subquery cannot return variables with the same name used in the enclosing query.
    • You must explicitly pass in variables from the enclosing query to a subquery.
    CALL {
    MATCH (m:Movie) WHERE m.year = 2000
    RETURN m ORDER BY m.imdbRating DESC LIMIT 10
    }
    MATCH (:User)-[r:RATED]->(m)
    RETURN m.title, avg(r.rating)
  • Union

    tip

    UNION ALL returns all results which is more efficient on memory but can lead to duplicates. UNION returns distinct results.

    MATCH (m:Movie) WHERE m.year = 2000
    RETURN {type:"movies", theMovies: collect(m.title)} AS data
    UNION ALL
    MATCH (a:Actor) WHERE a.born.year > 2000
    RETURN { type:"actors", theActors: collect(DISTINCT a.name)} AS data
  • Union in subquery

    MATCH (p:Person)
    WITH p LIMIT 100
    CALL {
    WITH p
    OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie)
    RETURN m.title + ": " + "Actor" AS work
    UNION
    WITH p
    OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
    RETURN m.title+ ": " + "Director" AS work
    }
    RETURN p.name, collect(work)

Using parameters

A parameter name is prefixed with a $ symbol in Cypher statement.

// Setting paramter with string value
:param actorName: 'Tom Hanks'

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName
RETURN m.released AS releaseDate,
m.title AS title
ORDER BY m.released DESC

// JSON syntax to set multiple parameters
:params {actorName: 'Tom Cruise', movieName: 'Top Gun'}

// List all parameters
:params

// Clear all parameters
:params {}

However, special care is required when handling integer value due to a discrepancy between integers in JavaScript and the Neo4j type system. To avoid any data loss on large numbers, any integers are converted to floating point values when the parameter is set.

:param number: 10   // outputs: { "number": 10.0 }

// Force type
:param number=> 10 // outputs: { "number": 10}

Using constraints

Constraint is interally implemented as an index, and is typically created before the data is loaded into the graph.

There are three types of constraints:

  • Uniqueness for a single node property value.
  • Existence for a property of a node or relationship.
  • Existence and uniqueness for a set of node property values (called a Node key).

When creating a uniqueness constraint on a property, a node label that the constraint will be associated with must be specified. Attempt to create a uniqueness constraint on a property that is not unique for all nodes with that label, the constraint creation will fail.

Creating basic constraints

// Create a uniqueness constraint for a single property
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE x.<property_key> IS UNIQUE

// Create a uniqueness constraint for multiple properties
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE (x.<property_key1>, x.<property_key2>) IS UNIQUE

// Create a existance constraint on node
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE x.<property_key> IS NOT NULL

// Create a existance constraint on relationship
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
REQUIRE x.<property_key> IS NOT NULL

// List all constraints
SHOW CONSTRAINTS

Creating node key constraint

Node key constraint is a specialized type of constraint for the properties of a node. It combines existence with uniqueness.

// Creating a Node key constraint for a single property
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE x.<property_key> IS NODE KEY

// Creating a Node key constraint for mulitple properties
CREATE CONSTRAINT <constraint_name> IF NOT EXISTS
FOR (x:<node_label>)
REQUIRE (x.<property_key1>, x.<property_key2>) IS NODE KEY

Drop constraint

DROP CONSTRAINT <constraint_name>

// Creating list of constraints to drop
SHOW CONSTRAINTS
YIELD name
RETURN collect('DROP CONSTRAINT ' + name + ';') AS Statements

// Drop constraints using APOC
CALL apoc.schema.assert({},{},true)

Using indexes

Neo4j supports two types of index: search-performace index and semantic index Searh-performance index type enables quicker retrieval of exact matches between an index and the primary data storage, including: RANGE, LOOKUP, TEXT, POINT. Semantic index type captures the semantic meaning or context of the data by returning an approximation score, which indicates the similarity between a query string and the data in a database. It includes: full-text and vector.

note

Unlike search-performance indexes, semantic indexes are not automatically used by the Cypher® planner. To use semantic indexes, they must be explicitly called with specific procedures.

During query planning, at most one index is used. So it is important to plan what properties to index how to create them.

To list and drop indexes created for the graph: below Cypher command can be used:

SHOW INDEXES
DROP INDEX <index_name>

Range index

Range index can be defined on a property of node or relationship type. It is implemented with B tree which is often used to enable efficient sorting. It can speed up following Cypher queries:

  • Equality checks = (TEXT index may perform better for string properties)
  • Range comparisons >, >=, <, <=
  • STARTS WITH string comparisons (performs better than TEXT index)
  • Existence checks IS NOT NULL

Creating RANGE index

// Create index on node
CREATE INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON x.<property_key>

// Create index on relationship
CREATE INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<relationship_label>]-()
ON x.<property_key>

TEXT index

A TEXT index supports node or relationship property types that must be strings. It speeds up the following Cypher code:

  • Equality checks: =
  • String comparisons:ENDS WITH, CONTAINS
  • List membership: x.prop in ["a","b","c"]

Using TEXT index also brings below benefits:

  • Take up less space
  • More performant than RANGE indexes when the property being indexed contains a lot of duplicate values.
Tip 1

For ENDS WITH ands CONTAINS comparisons, TEXT index might perform better than RANGE index.

Tip 2

Total db hits is often a useful indicator to evaluate the performance of a query. For TEXT index, however, this might not be the case. It can have higher db hits but providing less elapsed time.

Creating TEXT index

// Create TEXT index on node
CREATE TEXT INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON x.<property_key>

// Create TEXT index on relationship
CREATE TEXT INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
ON (x.<property_key>)

Composite index

A composite index combines values from multiple properties, can be of different type, for a node label or for relationship type. This index comes in handy when multiple properties are often queried together.

Creating Composite index

// Composite index on node
CREATE INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON (x.<property_key1>,x.<property_key2>,...)

// Composite index on relationship
CREATE INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
ON (x.<property_key1>, x.<property_key2>,...)

However, queries that simply check if property exist or only one property of the index is tested won't utilize this kind of index. For instance, composite index won't be used in below query even if a composite index, say (m.year, m.runtime), is created.

PROFILE MATCH (m:Movie)
WHERE m.year IS NULL AND m.runtime IS NULL
RETURN m.title, m.year, m.runtime

// OR

PROFILE MATCH (m:Movie)
WHERE m.year = 1994
RETURN m.title, m.year, m.runtime

Full-Text index

A full-text index is based upon string values only, but provides additional search capabilities that RANGE or TEXT indexes don't offer for it is built on Apache Lucene. It can be used on:

  • Node or relationship properties
  • Single property or multiple properties
  • Single or multiple types of nodes (labels)
  • Single or multiple types of relationships

Full-text index has below benefits:

  • Utilize Lucene to offer performant full-text search
  • Enables using regular expressions for complex query predicates
  • Allow specifying an index on multiple properties associated with multiple labels

Full-text index can be created for:

  • A string property in each node with a given label.
  • Multiple string properties across multiple node labels.
  • A string property in each relationship with a given type.
  • Multiple string properties across multiple relationship types.

But not:

  • ❌ Properties across node and relationship

Using full-text index

i Explicit instruction must be included in Cypher to hint the query engine to use full-text index by calling a special procedure, db.index.fulltext.queryNodes() or db.index.fulltext.queryRelationships() ,whereas other indexs are used automatically.

// Create FULLTEXT index for node
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON EACH [x.<property_key>]

// Create FULLTEXT index for relationship
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
ON EACH [x.<property_key>]

// Query node utilizing FULLTEXT index
CALL db.index.fulltext.queryNodes
("<index_name>", "<lucene_query>")
YIELD node
RETURN node.<property>

// Query relationship utilizing FULLTEXT index
CALL db.index.fulltext.queryRelationships
("<index_name>", "<lucene_query>")
YIELD relationship
RETURN relationship.<property>
tip

The total db hits statistics do not reflect the true db hits because the PROFILE will never expose details of a procedure. All that you can measure for queries that call procedures is elapsed time.

LOOKUP index

LOOKUP indexes are used to look up nodes with a specific label or relationships of a specific type. They are always created over all labels or relationship types. Therefore, databases can have a maximum of two token LOOKUP indexes - one for nodes and one for relationships.

Controlling index usage

MATCH clause uses an index by default, specific hint USING must be added to utilize specific index.

PROFILE MATCH
(p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p2:Person)
USING INDEX p:Person(name)
WHERE
p.name CONTAINS 'John'
AND
p2.name CONTAINS 'George'
RETURN p.name, p2.name, m.title

Multiple USING can be used:

PROFILE MATCH
(p:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(p2:Person)
USING INDEX p:Person(name)
USING INDEX p2:Person(name)
WHERE
p.name CONTAINS 'John'
AND
p2.name CONTAINS 'George'
RETURN p.name, p2.name, m.title

However, not all queries will benefit from query hints, so always test the effectiveness of the queries/indexes.

Index limitations

Recommed REad

Cypher Aggregation

Creating List

MATCH (m:Movie)
RETURN [m.title, m.released, date().year - date(m.released).year + 1 ]

collect can be used to aggregate nods related to specific nodes into a list.

MATCH (a:label1)--(m:label2)
WITH a, collect (m) AS col2
RETURN a AS col1, col2