Querying graph database data

Querying graph database data

The original form of the Utb table can be restored by means of an ordinary SQL query:

select B.b_datetime, Q.q_id b_q_id, V.v_id b_v_id, B.b_vol
from utbG B join utqG Q on B.$to_id = Q.$node_id
join utvG V on B.$from_id = V.$node_id;

However, to make graph navigation easier SQL Server introduces the MATCH function that defines a search pattern for nodes based on their relations. The pattern syntax resembles the one used by Cypher, the Neo4j’s graph query language.

In a simplified form, a pattern definition looks as follows:

< node the edge originates from > - (< edge >) -> < node the edge terminates at >

The node is represented by the table name or alias. The use of aliases becomes indispensable if a table is used in a pattern more than once, e.g. in case of a self-join. The MATCH function patterns can be combined using the AND operator. Let’s continue with some examples. For a start, we rewrite the previous query, making it graph-oriented:

select B.b_datetime, Q.q_id b_q_id, V.v_id b_v_id, B.b_vol
from utbG B, utqG Q, utvG V
where match(V-(B)->Q);

Here, we use aliases for the sake of shortness only. We can rearrange the nodes, but the direction of the edge (from the spray can to the square) has to be retained:

select B.b_datetime, Q.q_id b_q_id, V.v_id b_v_id, B.b_vol
from utbG B, utqG Q, utvG V
where match(Q<-(B)-V);

Unlike in Cypher, the direction of the relationship has to be specified, in other words, the following function call

match(V-(B)-Q);

will result in an error. Cypher is more flexible in this respect; the relationship direction may be omitted if it can be unambiguously identified; say, for a certain kind of relationship only one direction is possible.

Get the squares dyed with red paint. Display the square ID and the amount of red paint.

Relational schema

Select b_q_id, SUM(b_vol) qty
from utB join utV on B_V_ID =V_ID
where V_COLOR ='R'
group by b_q_id;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Graph-oriented schema

Select q_id, SUM(b_vol) qty
from utbG B, utvG V, utqG Q
where match(V-(B)->Q) AND V_COLOR ='R'
group by q_id;
Get the squares dyed both with red and blue paint. Display the names of the squares.

Relational schema

select q_name from utQ join (
select b_q_id from utB join utV on B_V_ID =V_ID
where V_COLOR ='R'
intersect
select b_q_id from utB join utV on B_V_ID =V_ID
where V_COLOR ='B'
) X on Q_ID=b_q_id;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Graph-oriented schema

select distinct q_name
from utbG B1,utbG B2, utvG VR, utvG VB, utqG Q
where
match(VR-(B1)->Q<-(B2)-VB) AND VR.V_COLOR ='R' AND VB.V_COLOR ='B';

Note that you have to include the corresponding table into the FROM clause for each (generally different) node used in the pattern, as well as the relationship table. If the same node is implied, as the square in our case, we can use it for a bidirectional relationship. The predicate

match(VR-(B1)->Q<-(B2)-VB)

could be rewritten as

match(VR-(B1)->Q) AND match(VB-(B2)->Q)

or as

match(VR-(B1)->Q AND VB-(B2)->Q)

DISTINCT is necessary in these queries, because without it, duplicate records will occur for squares painted by more than one red (and/or blue) spray cans.

Get squares dyed with all three colors.

Relational schema

The most straightforward approach would be INTERSECTing our previous example with an additional query. However, let’s present a more effective solution that uses the fact the number of distinct colors the square has been painted with is three:

select q_name from utQ join (
select b_q_id
from utB join utV on B_V_ID =V_ID
group by b_q_id
having count(distinct v_color)=3
) X on q_id=b_q_id;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Graph-oriented schema

select distinct Q.q_name
from utbG B1,utbG B2,utbG B3, utvG VR, utvG VB, utvG VG, utqG Q
where
match(VR-(B1)->Q<-(B2)-VB and VG-(B3)->Q)
AND VG.V_COLOR ='G' AND VR.V_COLOR ='R' AND VB.V_COLOR ='B';
Get the spray cans used for painting more than one square.

Relational schema

select v_name from
utB join utV on B_V_ID =V_ID
group by b_v_id,v_name
having count(distinct b_q_id)>1;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Graph-oriented schema

select distinct v_name
from utbG B1,utbG B2, utvG V, utqG Q,utqG Q2
where
match(Q2<-(B2)-V-(B1)->Q) and Q.q_id <> Q2.q_id;

The condition match(Q2<-(B2)-V-(B1)->Q) states a spray can participated in two paint events, while Q.q_id <> Q2.q_id claims there were different squares involved.

Preliminary conclusions

At first glance, relational and graph-oriented queries don’t seem to be radically different. Moreover, one can omit the graph-oriented syntax completely, keeping within SQL boundaries. However, the examples about aren’t very illustrative, since the advantages of graph databases become apparent for subject areas with complex relationships between the data entities.