Py2neo¶
Py2neo is a Python package for working with Neo4j graph databases. It is not officially supported by Neo4j (as the neo4j package is), but it offers a worthwhile suite of tools for working with graph databases in a more Pythonic way.
Before you get too invested, a word of warning. The package is not as widely used as something like pandas, and it has very little usage documentation, just a straightforward reference for its API. You will struggle to find current documentation for the problems you're facing (besides this wonderful guide, of course). When you do find a solution, it will almost certainly be out of date and no longer work. There's almost nothing on YouTube posted in the last 5 years. You will probably end up digging through its source code at some point trying to figure something out. If you're still here, read on!
Tip
Always start at the home page of the py2neo handbook (py2neo.org) when looking for documentation. If you search Google for the specific function or issue you are having, the documentation you land on will likely be out of date.
The missing quick start guide¶
To get up to speed, we'll work through a quick canonical example.
0. Installation¶
To install the latest release of py2neo, simply use:
pip install --upgrade py2neo
1. Create a graph database¶
First, you'll need to create an empty graph database using Neo4j Desktop. Check out the Neo4j tutorial for the first steps. Start the database. The database should be exposed on port 7687. If not, you will have to update the URI in the next step. Make sure you remember the password.
2. Connect with py2neo¶
Once the graph database has been started, you can connect to it.
from py2neo import Graph
URI = 'bolt://localhost:7687'
USERNAME = 'neo4j'
PASSWORD = '<password>'
graph = Graph(URI, auth=(USERNAME, PASSWORD))
The first argument to Graph is the URI string for the database. The auth parameter is a tuple of the username (generally 'neo4j') and your password.
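To keep the password out of your scripts, the connection settings can be read from the environment instead. The following is a minimal sketch, not part of py2neo: the helper name and the variable names NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD are assumptions chosen for illustration.

```python
import os


def load_connection_settings(env=os.environ):
    """Return (uri, auth) suitable for py2neo's Graph(uri, auth=auth).

    The environment variable names are assumptions made for this sketch;
    use whatever names suit your setup.
    """
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    username = env.get("NEO4J_USERNAME", "neo4j")
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise RuntimeError("Set NEO4J_PASSWORD before connecting")
    return uri, (username, password)


# Usage (requires py2neo and a running database):
# from py2neo import Graph
# uri, auth = load_connection_settings()
# graph = Graph(uri, auth=auth)
```

The defaults mirror the URI and username used above, so only the password has to be set.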
3. Create nodes and relationships¶
It is quite easy to create nodes and relationships.
from py2neo import Node, Relationship
a = Node("Person", name="Alice", age=33)
b = Node("Person", name="Bob", age=44)
KNOWS = Relationship.type("KNOWS")
graph.merge(KNOWS(a, b), "Person", "name")
The nodes assigned to a and b and the relationship KNOWS are not actually created in the database until the merge command is called.
Let's look at an alternative pattern that uses a transaction to create the nodes and relationships. This is going to be safer in most cases, as a transaction will be rolled back if any part fails.
from py2neo import Node, Relationship
tx = graph.begin()
a = Node("Person", name="Alice")
tx.create(a)
b = Node("Person", name="Bob")
ab = Relationship(a, "KNOWS", b)
tx.create(ab)
tx.commit()
Note that some commands run as their own transaction, so you shouldn't wrap them in an explicit one (e.g., updating the properties of a node with graph.push()). If you have up to about 20,000 operations to run, you can batch them in a single transaction, which is both safe and efficient. Beyond 20,000 you may run into memory issues and will need to commit periodically as you go (committing each of significantly more than 20,000 operations individually might take a long time).
See more on adding nodes.
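The batching advice above can be sketched as a small helper that commits once per batch. This is an illustration rather than a py2neo feature: chunked and create_in_batches are hypothetical names, and the graph argument is assumed to be a py2neo Graph, of which only begin(), create() and commit() are used, as in the transaction example.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def create_in_batches(graph, subgraphs, batch_size=20_000):
    """Create nodes/relationships, committing once per batch.

    `graph` is assumed to be a py2neo Graph. Returns the number of commits.
    """
    commits = 0
    for batch in chunked(subgraphs, batch_size):
        tx = graph.begin()           # one transaction per batch
        for subgraph in batch:
            tx.create(subgraph)
        tx.commit()                  # a failure only affects the current batch
        commits += 1
    return commits
```

The trade-off is that a failure part-way through leaves earlier batches committed, so this fits loads that can safely be re-run (for example with merge instead of create).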
4. Query graph¶
You can run any Cypher query with the graph.run() command:
results = graph.run("MATCH (p:Person {name: 'Alice'})-[k:KNOWS]-(p2) RETURN p2.name")
Usage¶
Add nodes¶
A node has:
- a label (e.g., 'Person'),
- a suite of properties (e.g., name='Alice'),
- a primary key (the property with which to identify the node, e.g., 'name'), and
- a primary label (e.g., 'Person')
tx = graph.begin()
# Keyword arguments to Node() are all stored as properties, so the primary
# label and key are passed to merge() rather than to the constructor.
new_node = Node('<LABEL>', <id_property>=<>)
tx.merge(new_node, '<LABEL>', '<id_property>')
tx.commit()
You can also use a dictionary to add properties. Any kwargs are read as properties, any args are read as labels. This can be helpful when reading from a pandas DataFrame (see Import nodes from DataFrame rows below).
property_dict = {
    '<property1>': '<prop_val1>',
    '<property2>': '<prop_val2>'
}
# Every keyword argument, including the unpacked dictionary, becomes a property
new_node = Node('<LABEL>', <id_property>=<>, **property_dict)
Add relationships¶
If
Match Nodes¶
To match nodes, first set up a matcher object:
from py2neo import NodeMatcher
matcher = NodeMatcher(graph)
Update properties (one property)¶
graph.push() runs as its own transaction, so don't wrap it in an explicit one. Note that updates to properties occur only locally until pushed using graph.push.
node = matcher.match('<LABEL>', <id_property>=<>).first()
if node:
    node[<property>] = <>
    graph.push(node)
Update properties (multiple properties)¶
node = matcher.match('<LABEL>', <id_property>=<>).first()
if node:
    node.update(**properties)
    graph.push(node)
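If you update nodes in several places, the match, update and push steps can be wrapped in one function. The sketch below is only an illustration: update_properties is a hypothetical helper, and it assumes graph is a py2neo Graph and matcher is a NodeMatcher created as shown earlier.

```python
def update_properties(graph, matcher, label, key, value, properties):
    """Update several properties on the first node matching label and key=value.

    Returns True if a node was found and pushed, False otherwise.
    """
    # Unpack the key/value pair so the property name can be a variable
    node = matcher.match(label, **{key: value}).first()
    if node is None:
        return False
    node.update(properties)   # local change only
    graph.push(node)          # send the change to the database
    return True


# Usage (requires a live connection):
# update_properties(graph, matcher, "Person", "name", "Alice", {"age": 34})
```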
Delete nodes from list¶
tx = graph.begin()
for node_id in bad_node_list:
    node_matches = matcher.match('<LABEL>', <id_property>=node_id)
    for node in node_matches:
        tx.delete(node)  # delete inside the open transaction
tx.commit()
Query Graph¶
To get data from the graph:
results = graph.run("<CYPHER QUERY>")  # returns a cursor that streams results
for result in results:
    print(result)  # do something with each record
Instead of streaming results, data can be read into a list of dictionaries:
results = graph.run(
f"""
MATCH(n:Node)-[]-()
WHERE n.name = "{<name>}"
RETURN n.name, n.prop_1, n.prop_2
"""
).data()
# returns:
[
{'n.name': '', 'n.prop_1': '', 'n.prop_2': ''},
{'n.name': '', 'n.prop_1': '', 'n.prop_2': ''},
...
]
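The query above builds the Cypher string with an f-string, which breaks if the value contains quotes. As an alternative sketch, graph.run() also accepts keyword arguments that are sent to the database as Cypher parameters and referenced with a $ prefix in the query. The function name find_by_name is hypothetical, and graph is assumed to be a py2neo Graph.

```python
def find_by_name(graph, name):
    """Run a parameterized query and return the rows as a list of dictionaries."""
    query = """
    MATCH (n:Node)-[]-()
    WHERE n.name = $name
    RETURN n.name, n.prop_1, n.prop_2
    """
    # The keyword argument `name` fills the $name parameter in the query
    return graph.run(query, name=name).data()


# Usage (requires a live connection):
# rows = find_by_name(graph, "Alice")
```

Parameters cover values only. Labels and property names cannot be parameterized this way and still have to be written into the query text.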
If property names contain spaces, use bracket indexing:
results = graph.run(
f"""
MATCH(n:Node)-[]-()
WHERE n.name = "{<name>}"
RETURN n['my name'], n['my prop_1'], n['my prop_2']
"""
).data()
If labels have spaces, use backticks:
results = graph.run(
f"""
MATCH(n:`My Node`)-[]-()
WHERE n.name = "{<name>}"
RETURN n['my name'], n['my prop_1'], n['my prop_2']
"""
).data()
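When labels come from data (for example DataFrame column headers), the backticks can be added programmatically. The helper below is a hypothetical sketch, not a py2neo function. It assumes, per my reading of Cypher's naming rules, that a backtick inside a name is escaped by doubling it.

```python
def quote_identifier(name):
    """Wrap a label or property name in backticks for use in a Cypher query."""
    # Assumption: a literal backtick inside the name is escaped by doubling it
    return "`" + name.replace("`", "``") + "`"


def match_label_query(label):
    """Build a simple MATCH query for a label that may contain spaces."""
    return f"MATCH (n:{quote_identifier(label)}) RETURN n"


print(match_label_query("My Node"))
```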
If you need relationships from one central node to multiple other nodes, use OPTIONAL MATCH:
results = graph.run(
f"""
MATCH(n:Node)-[]-(o:other_node)
WHERE n.name="{<name>}"
OPTIONAL MATCH (o)-[]-(p)
WHERE p:<Label1> OR p:<Label2>
RETURN n.name, o.name, labels(p), p.name
"""
).data()
This query returns the nodes labelled Label1 or Label2 that are connected, through an other_node node, to the node labelled Node. Note that in the above example the identifying property for all additional nodes must be the same, namely name.
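Each row of the result repeats n.name and o.name, and rows where the OPTIONAL MATCH found nothing carry null values (None in Python). A minimal sketch of regrouping such rows by intermediate node follows. The function name and the sample rows are made up for illustration, and the keys are assumed to be the column names from the RETURN clause above.

```python
from collections import defaultdict


def group_by_other_node(rows):
    """Collect (labels, name) pairs of optional matches under each o.name."""
    grouped = defaultdict(list)
    for row in rows:
        related = grouped[row["o.name"]]   # creates the entry even with no match
        if row["p.name"] is not None:      # None means OPTIONAL MATCH found nothing
            related.append((tuple(row["labels(p)"]), row["p.name"]))
    return dict(grouped)


# Made-up rows in the shape returned by .data() for the query above
rows = [
    {"n.name": "A", "o.name": "X", "labels(p)": ["Label1"], "p.name": "P1"},
    {"n.name": "A", "o.name": "X", "labels(p)": ["Label2"], "p.name": "P2"},
    {"n.name": "A", "o.name": "Y", "labels(p)": None, "p.name": None},
]
print(group_by_other_node(rows))
```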
Working with pandas¶
Py2neo integrates well with pandas.
Export to DataFrame¶
Note that nodes may not conform well to pandas expectations, and unexpected errors can occur.
df = graph.run("MATCH (a:Person) RETURN a.name, a.born").to_data_frame()
Import nodes from DataFrame columns¶
If lists of nodes of the same type are stored in DataFrame columns, you can create a unique list from each column and create a node using that list with the label as the column header. Labels may contain spaces.
tx = graph.begin()
for column in df.columns:
    node_set = df[column].unique().astype(str)  # cast to str; some data types are not supported
    for node_id in node_set:
        # The primary label and key go to merge(); Node() kwargs become properties
        new_node = Node(column, <id_property>=node_id)
        tx.merge(new_node, column, '<id_property>')
tx.commit()
https://stackoverflow.com/questions/45738180/neo4j-create-nodes-and-relationships-from-pandas-dataframe-with-py2neo
Import nodes from DataFrame rows¶
Alternatively, if the nodes correspond to rows and the columns are properties (you should find this with tidy datasets), read the DataFrame with the primary key as the index column, convert it to a dictionary, and iterate through the dictionary items to add nodes. Note that the index column must be unique.
import pandas as pd
df = pd.read_csv('path/to/data', index_col='<id_property>')
node_dict = df.to_dict('index')  # {index value: {column: cell value}}
tx = graph.begin()
for node_id, properties in node_dict.items():
    # The primary label and key go to merge(); Node() kwargs become properties
    node = Node('<LABEL>', <id_property>=node_id, **properties)
    tx.merge(node, '<LABEL>', '<id_property>')
tx.commit()
# needs to be tested
Using **properties passes the dictionary of properties, where each column is a key and the data in the cell is a value, to the Node object.
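To make the shape of the data concrete, here is a small sketch that works on a plain dictionary shaped like the output of df.to_dict('index'), so it runs without pandas or a database. The helper name node_properties and the sample data are assumptions made for illustration.

```python
def node_properties(node_dict, id_property):
    """Yield one property dictionary per row, including the identifier."""
    for node_id, properties in node_dict.items():
        # Put the index value back in under the name of the id property
        yield {id_property: node_id, **properties}


# Made-up data in the shape produced by df.to_dict('index')
node_dict = {"Alice": {"age": 33}, "Bob": {"age": 44}}

for props in node_properties(node_dict, "name"):
    print(props)
    # With py2neo: tx.merge(Node("Person", **props), "Person", "name")
```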
Add relationships from DataFrame columns¶
Where nodes are stored in columns, and nodes have already been imported, you can use either df.iterrows() or convert the DataFrame to a dictionary to relate all nodes in a single row.
import pandas as pd
from py2neo import NodeMatcher, Relationship
matcher = NodeMatcher(graph)
df = pd.read_csv('path/to/data', index_col='<id_property>')
entity_dict = df.to_dict()  # {column: {index value: cell value}}
tx = graph.begin()
for node_label, node_dict in entity_dict.items():
    for project_id, entity_name in node_dict.items():
        # Look up the project node and the entity node for this cell
        project_node = matcher.match('Project',
                                     project_number=project_id).first()
        entity_node = matcher.match(node_label, name=entity_name).first()
        if project_node and entity_node:
            relationship = Relationship(project_node, "IN", entity_node)
            tx.create(relationship)
tx.commit()
How this works:
The entity_dict looks like:
{
'Country': {'AID-512-A-00-08-00005': 'Brazil',
'AID-512-A-00-08-00015': 'Brazil',
'AID-512-A-10-00004': 'Brazil',
'AID-512-A-11-00004': 'Brazil',
'AID-512-A-16-00001': 'Brazil'},
'Income Group': {'AID-512-A-00-08-00005': 'Upper Middle Income Country',
'AID-512-A-00-08-00015': 'Upper Middle Income Country',
'AID-512-A-10-00004': 'Upper Middle Income Country',
'AID-512-A-11-00004': 'Upper Middle Income Country',
'AID-512-A-16-00001': 'Upper Middle Income Country'}
}
Each column has the relationship required between the index (in this case a project number) and the node of the type contained in that column. You can create all of the relationships required and then repeat the process by specifying a new index column, if needed.
For each column (node_label), we use the associated dictionary to match each project ID and each entity node. If a match is found for both, we create a relationship. Don't forget to commit the transaction.
You can use another dictionary to specify the label for the relationship if you want to have different relationship labels for different columns. Simply look up the relationship name in place of "IN".
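The lookup could be sketched as follows, working on plain dictionaries so it runs without a database. The function name relationship_plan and the relationship type "CLASSIFIED_AS" are made up for illustration. Columns missing from the lookup fall back to "IN".

```python
def relationship_plan(entity_dict, rel_types, default="IN"):
    """Yield (project_id, relationship type, node label, entity name) tuples."""
    for node_label, node_dict in entity_dict.items():
        rel_type = rel_types.get(node_label, default)  # per-column lookup
        for project_id, entity_name in node_dict.items():
            yield project_id, rel_type, node_label, entity_name


# A cut-down entity_dict in the shape shown above
entity_dict = {
    "Country": {"AID-512-A-00-08-00005": "Brazil"},
    "Income Group": {"AID-512-A-00-08-00005": "Upper Middle Income Country"},
}
# Hypothetical mapping from column name to relationship type
rel_types = {"Income Group": "CLASSIFIED_AS"}

for item in relationship_plan(entity_dict, rel_types):
    print(item)
```

In the import loop, the second element of each tuple would take the place of "IN" when constructing the Relationship.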
Using objects¶
from py2neo.ogm import GraphObject, Property
class Person(GraphObject):
    name = Property()
    born = Property()
[(a.name, a.born) for a in Person.match(graph).limit(3)]
# returns
[('Laurence Fishburne', 1961),
('Hugo Weaving', 1960),
('Lilly Wachowski', 1967)]