SQL Server graph databases
A drastic difference between SQL Server graph databases and NoSQL graph databases is, SQL Server uses tables for modeling graph structures. Those are tables of two special types. One type of table is used to create nodes, and the other one to create edges (relations between nodes). Let’s note that this tabular representation of nodes and edges allows writing SQL queries to access these tables.
The node table describes some entity. The instances of this entity are represented by the table rows and are characterized the same set of attributes (table columns). An edge table defines the type and the direction of a relationship between nodes. Say, a table named ToBeFriends might describe a friendly relationship between instances of the same or different entities.
It’s probably time for an example. I suggest we take the educational Painting database and present its relational structure in the form of a graph structure, without any data loss. This will allow us to write queries and compare results for two models presenting the same information in different ways.
Thus, we have two types of nodes representing the square and spray can entities, and a relationship between them that can be described as follows: the spray can paints the square. The relationship is directed from the spray can to the square.
Let’s create the node tables:
--Squares
CREATE TABLE utqG (
q_id INT PRIMARY KEY,
q_name VARCHAR(35) NOT NULL,
) AS NODE;
--Spray cans
CREATE TABLE utvG (
v_id INT PRIMARY KEY,
v_name VARCHAR(35) NOT NULL,
v_color char(1) not null
) AS NODE;
As you can see the tables are created exactly like conventional relational ones, except for the type specification AS NODE. Now let’s look at the structure of the tables we just created:
select table_name, column_name, data_type
from information_schema.columns
where table_name='utqG';
table_name | column_name | data_type |
---|---|---|
utqG | graph_id_AA21DCF7CB44457BB308B21482806B87 | bigint |
utqG | $node_id_EEFDE0FB86F243E4A6667A5CE470F4F6 | nvarchar |
utqG | q_id | int |
utqG | q_name | varchar |
When a node table is created, in addition to the columns specified by user, two pseudocolumns are created, named graph_id и $node_id. Let’s populate our table with data to find out what is stored in these columns
insert into utqG(q_id, q_name
select * from [sql-ex]..utq;
Here, we just use data already present in the educational sql-ex database.
select top 2 * from utqG;
$node_id_EEFDE0FB86F243E4A6667A5CE470F4F6 | q_id | q_name |
---|---|---|
{“type”:“node”,“schema”:“dbo”,“table”:“utqG”,“id”:0} | 1 | Square # 01 |
{“type”:“node”,“schema”:“dbo”,“table”:“utqG”,“id”:1} | 2 | Square # 02 |
The graph_id column is missing from the result set. This is because it is used internally by the database kernel and can’t be accessed by the user directly. Indeed, if we run the query
select graph_id from utqG;
it will generate the following error message:
The $node_id column holds the unique node identifier presented in JSON format. The hexadecimal column name suffix warranties this name to be globally unique; however, it’s not used to access the column. Here’s an example:
select top 2 $node_id, q_id from utqG;
$node_id_EEFDE0FB86F243E4A6667A5CE470F4F6 | q_id |
---|---|
{“type”:“node”,“schema”:“dbo”,“table”:“utqG”,“id”:0} | 1 |
{“type”:“node”,“schema”:“dbo”,“table”:“utqG”,“id”:1} | 2 |
The utvG table is similar to utqG, thus, we won’t display its contents here.
Now, let’s create an edge table.
CREATE TABLE utbG (
b_datetime datetime not null,
b_vol tinyint NOT NULL,
) AS EDGE;
This table contains property columns – time of the painting event (B_datetime) and the amount of paint sprayed from the can (b_vol). It differs from the node tables by its type – now, it’s EDGE, not NODE. Let’s have a look at this table’s structure:
select column_name, data_type
from information_schema.columns where table_name='utbG';
column_name | data_type |
---|---|
graph_id_9D6121CFD14948A5B03BBD6A4BDB4774 | bigint |
$edge_id_EAB5B85BC07649ED89435D6F2A2ACE83 | nvarchar |
from_obj_id_D0D5B23329B5409895672D5C32283E51 | int |
from_id_526FECFDEBB84E86B2F7440FF625BCF1 | bigint |
$from_id_BF1E69FB306E4225A58F4DAFAA8FBE94 | nvarchar |
to_obj_id_14C309807A224F7BA7BD8F88425A15AB | int |
to_id_295F2F2BD42A425CAAD6ACC9F2174BC7 | bigint |
$to_id_79504454FC184B7F849B1AB28BCCC670 | nvarchar |
b_datetime | datetime |
b_vol | tinyint |
As well as for the nodes, the system automatically creates several pseudocolumns, the following of which are accessible to the user:
- $edge_id – (automatically formed) edge ID;
- $from_id – ID of the node the edge originates from;
- $to_id – ID of the node the edge terminates at.
Let’s populate this table with data, as well. Keeping in mind the structure of the original utb table, we have to put into the $from_id column the ID of the node contained in the utvG table whose v_id is equal to b_v_id in the utb table. Then, the $to_id should contain the node ID from the utqG table whose q_id, in its turn, corresponds to the value of b_q_id of the same record in the utb table. It’s probably easier for the reader to comprehend the statement inserting the data described above:
insert into utbG(b_datetime, b_vol, $from_id, $to_id)
select b_datetime, b_vol,
(select $node_id from utvG where v_id = orig.b_v_id),
(select $node_id from utqG where q_id = orig.b_q_id)
from [sql-ex]..utb orig
Now, we can have a look at the data in the edge table connecting two nodes – the spray can and the square dyed.
select top 1 * from utbG;
Since the width of the resulting table significantly exceeds the page width, let’s present the result in key:value format.
$edge_id_EAB5B85BC07649ED89435D6F2A2ACE83: {“type”:“edge”,“schema”:“dbo”,“table”:“utbG”,“id”:0} |
$from_id_BF1E69FB306E4225A58F4DAFAA8FBE94: {“type”:“node”,“schema”:“dbo”,“table”:“utvG”,“id”:49} |
$to_id_79504454FC184B7F849B1AB28BCCC670: {“type”:“node”,“schema”:“dbo”,“table”:“utqG”,“id”:21} |
b_datetime: “2000-01-01 01:13:36.000” |
b_vol: 50 |