Querying graph database data
The original form of the Utb table can be restored by means of an ordinary SQL query:
select B.b_datetime, Q.q_id b_q_id, V.v_id b_v_id, B.b_vol
from utbG B join utqG Q on B.$to_id = Q.$node_id
join utvG V on B.$from_id = V.$node_id;
However, to make graph navigation easier SQL Server introduces the MATCH function that defines a search pattern for nodes based on their relations. The pattern syntax resembles the one used by Cypher, the Neo4j’s graph query language.
In a simplified form, a pattern definition looks as follows:
< node the edge originates from > - (< edge >) -> < node the edge terminates at >
The node is represented by the table name or alias. The use of aliases becomes indispensable if a table is used in a pattern more than once, e.g. in case of a self-join. The MATCH function patterns can be combined using the AND operator. Let’s continue with some examples. For a start, we rewrite the previous query, making it graph-oriented:
select B.b_datetime, Q.q_id b_q_id, V.v_id b_v_id, B.b_vol
from utbG B, utqG Q, utvG V
where match(V-(B)->Q);
Here, we use aliases for the sake of shortness only. We can rearrange the nodes, but the direction of the edge (from the spray can to the square) has to be retained:
select B.b_datetime, Q.q_id b_q_id, V.v_id b_v_id, B.b_vol
from utbG B, utqG Q, utvG V
where match(Q<-(B)-V);
Unlike in Cypher, the direction of the relationship has to be specified, in other words, the following function call
match(V-(B)-Q);
will result in an error. Cypher is more flexible in this respect; the relationship direction may be omitted if it can be unambiguously identified; say, for a certain kind of relationship only one direction is possible.
Relational schema
Select b_q_id, SUM(b_vol) qty
from utB join utV on B_V_ID =V_ID
where V_COLOR ='R'
group by b_q_id;
[[ column ]] |
---|
[[ value ]] |
Graph-oriented schema
Select q_id, SUM(b_vol) qty
from utbG B, utvG V, utqG Q
where match(V-(B)->Q) AND V_COLOR ='R'
group by q_id;
Relational schema
select q_name from utQ join (
select b_q_id from utB join utV on B_V_ID =V_ID
where V_COLOR ='R'
intersect
select b_q_id from utB join utV on B_V_ID =V_ID
where V_COLOR ='B'
) X on Q_ID=b_q_id;
[[ column ]] |
---|
[[ value ]] |
Graph-oriented schema
select distinct q_name
from utbG B1,utbG B2, utvG VR, utvG VB, utqG Q
where
match(VR-(B1)->Q<-(B2)-VB) AND VR.V_COLOR ='R' AND VB.V_COLOR ='B';
Note that you have to include the corresponding table into the FROM clause for each (generally different) node used in the pattern, as well as the relationship table. If the same node is implied, as the square in our case, we can use it for a bidirectional relationship. The predicate
match(VR-(B1)->Q<-(B2)-VB)
could be rewritten as
match(VR-(B1)->Q) AND match(VB-(B2)->Q)
or as
match(VR-(B1)->Q AND VB-(B2)->Q)
DISTINCT is necessary in these queries, because without it, duplicate records will occur for squares painted by more than one red (and/or blue) spray cans.
Relational schema
The most straightforward approach would be INTERSECTing our previous example with an additional query. However, let’s present a more effective solution that uses the fact the number of distinct colors the square has been painted with is three:
select q_name from utQ join (
select b_q_id
from utB join utV on B_V_ID =V_ID
group by b_q_id
having count(distinct v_color)=3
) X on q_id=b_q_id;
[[ column ]] |
---|
[[ value ]] |
Graph-oriented schema
select distinct Q.q_name
from utbG B1,utbG B2,utbG B3, utvG VR, utvG VB, utvG VG, utqG Q
where
match(VR-(B1)->Q<-(B2)-VB and VG-(B3)->Q)
AND VG.V_COLOR ='G' AND VR.V_COLOR ='R' AND VB.V_COLOR ='B';
Relational schema
select v_name from
utB join utV on B_V_ID =V_ID
group by b_v_id,v_name
having count(distinct b_q_id)>1;
[[ column ]] |
---|
[[ value ]] |
Graph-oriented schema
select distinct v_name
from utbG B1,utbG B2, utvG V, utqG Q,utqG Q2
where
match(Q2<-(B2)-V-(B1)->Q) and Q.q_id <> Q2.q_id;
The condition match(Q2<-(B2)-V-(B1)->Q) states a spray can participated in two paint events, while Q.q_id <> Q2.q_id claims there were different squares involved.
Preliminary conclusions
At first glance, relational and graph-oriented queries don’t seem to be radically different. Moreover, one can omit the graph-oriented syntax completely, keeping within SQL boundaries. However, the examples about aren’t very illustrative, since the advantages of graph databases become apparent for subject areas with complex relationships between the data entities.