SQL Server graph databases

SQL Server graph databases

A drastic difference between SQL Server graph databases and NoSQL graph databases is, SQL Server uses tables for modeling graph structures. Those are tables of two special types. One type of table is used to create nodes, and the other one to create edges (relations between nodes). Let’s note that this tabular representation of nodes and edges allows writing SQL queries to access these tables.

The node table describes some entity. The instances of this entity are represented by the table rows and are characterized the same set of attributes (table columns).  An edge table defines the type and the direction of a relationship between nodes. Say, a table named ToBeFriends might describe a friendly relationship between instances of the same or different entities.

It’s probably time for an example. I suggest we take the educational Painting database and present its relational structure in the form of a graph structure, without any data loss. This will allow us to write queries and compare results for two models presenting the same information in different ways.

Thus, we have two types of nodes representing the square and spray can entities, and a relationship between them that can be described as follows: the spray can paints the square. The relationship is directed from the spray can to the square.

Let’s create the node tables:

--Squares
CREATE TABLE utqG (
    q_id INT PRIMARY KEY,
    q_name VARCHAR(35) NOT NULL,
) AS NODE;
--Spray cans
CREATE TABLE utvG (
    v_id INT PRIMARY KEY,
    v_name VARCHAR(35) NOT NULL,
    v_color char(1) not null
) AS NODE;

As you can see the tables are created exactly like conventional relational ones, except for the type specification AS NODE. Now let’s look at the structure of the tables we just created:

select table_name, column_name, data_type
from information_schema.columns
where table_name='utqG';
table_namecolumn_namedata_type
utqGgraph_id_AA21DCF7CB44457BB308B21482806B87bigint
utqG$node_id_EEFDE0FB86F243E4A6667A5CE470F4F6nvarchar
utqGq_idint
utqGq_namevarchar

When a node table is created, in addition to the columns specified by user, two pseudocolumns are created, named graph_id и $node_id. Let’s populate our table with data to find out what is stored in these columns

insert into utqG(q_id, q_name
select * from [sql-ex]..utq;

Here, we just use data already present in the educational sql-ex database.

select top 2 * from utqG;
$node_id_EEFDE0FB86F243E4A6667A5CE470F4F6q_idq_name
{“type”:“node”,“schema”:“dbo”,“table”:“utqG”,“id”:0}1Square # 01
{“type”:“node”,“schema”:“dbo”,“table”:“utqG”,“id”:1}2Square # 02

The graph_id column is missing from the result set. This is because it is used internally by the database kernel and can’t be accessed by the user directly. Indeed, if we run the query

select graph_id from utqG;

it will generate the following error message:

Invalid column name 'graph_id'.

The  $node_id column holds the unique node identifier presented in JSON format. The hexadecimal column name suffix warranties this name to be globally unique; however, it’s not used to access the column. Here’s an example:

select top 2 $node_id, q_id from utqG;
$node_id_EEFDE0FB86F243E4A6667A5CE470F4F6q_id
{“type”:“node”,“schema”:“dbo”,“table”:“utqG”,“id”:0}1
{“type”:“node”,“schema”:“dbo”,“table”:“utqG”,“id”:1}2

The utvG table is similar to utqG, thus, we won’t display its contents here.

Now, let’s create an edge table.

CREATE TABLE utbG (
    b_datetime datetime not null,
    b_vol tinyint NOT NULL,
) AS EDGE;

This table contains property columns – time of the painting event (B_datetime) and the amount of paint sprayed from the can (b_vol). It differs from the node tables by its type – now, it’s EDGE, not NODE. Let’s have a look at this table’s structure:

select column_name, data_type
from information_schema.columns where table_name='utbG';
column_namedata_type
graph_id_9D6121CFD14948A5B03BBD6A4BDB4774bigint
$edge_id_EAB5B85BC07649ED89435D6F2A2ACE83nvarchar
from_obj_id_D0D5B23329B5409895672D5C32283E51int
from_id_526FECFDEBB84E86B2F7440FF625BCF1bigint
$from_id_BF1E69FB306E4225A58F4DAFAA8FBE94nvarchar
to_obj_id_14C309807A224F7BA7BD8F88425A15ABint
to_id_295F2F2BD42A425CAAD6ACC9F2174BC7bigint
$to_id_79504454FC184B7F849B1AB28BCCC670nvarchar
b_datetimedatetime
b_voltinyint

As well as for the nodes, the system automatically creates several pseudocolumns, the following of which are accessible to the user:

  • $edge_id – (automatically formed) edge ID;
  • $from_id – ID of the node the edge originates from;
  • $to_id – ID of the node the edge terminates at.

Let’s populate this table with data, as well. Keeping in mind the structure of the original utb table, we have to put into the $from_id column the ID of the node contained in the utvG table whose v_id is equal to b_v_id in the utb table. Then, the $to_id should contain the node ID from the utqG table whose q_id, in its turn, corresponds to the value of b_q_id  of the same record in the utb table. It’s probably easier for the reader to comprehend the statement inserting the data described above:

insert into utbG(b_datetime, b_vol, $from_id, $to_id)
select b_datetime, b_vol,
(select $node_id from utvG where v_id = orig.b_v_id),
(select $node_id from utqG where q_id = orig.b_q_id)
from [sql-ex]..utb orig

Now, we can have a look at the data in the edge table connecting two nodes – the spray can and the square dyed.

select top 1 * from utbG;

Since the width of the resulting table significantly exceeds the page width, let’s present the result in key:value format.

$edge_id_EAB5B85BC07649ED89435D6F2A2ACE83: {“type”:“edge”,“schema”:“dbo”,“table”:“utbG”,“id”:0}
$from_id_BF1E69FB306E4225A58F4DAFAA8FBE94: {“type”:“node”,“schema”:“dbo”,“table”:“utvG”,“id”:49}
$to_id_79504454FC184B7F849B1AB28BCCC670: {“type”:“node”,“schema”:“dbo”,“table”:“utqG”,“id”:21}
b_datetime: “2000-01-01 01:13:36.000”
b_vol: 50