SQL Server graph databases

A drastic difference between SQL Server graph databases and NoSQL graph databases is, SQL Server uses tables for modeling graph structures. Those are tables of two special types. One type of table is used to create nodes, and the other one to create edges (relations between nodes). Lets note that this tabular representation of nodes and edges allows writing SQL queries to access these tables.

The node table describes some entity. The instances of this entity are represented by the table rows and are characterized the same set of attributes (table columns).  An edge table defines the type and the direction of a relationship between nodes. Say, a table named ToBeFriends might describe a friendly relationship between instances of the same or different entities.

Its probably time for an example. I suggest we take the educational Painting database and present its relational structure in the form of a graph structure, without any data loss. This will allow us to write queries and compare results for two models presenting the same information in different ways.

Thus, we have two types of nodes representing the square and spray can entities, and a relationship between them that can be described as follows: the spray can paints the square. The relationship is directed from the spray can to the square.

Lets create the node tables:

  1. --Squares
  2. CREATE TABLE utqG (
  3. q_id INT PRIMARY KEY,
  4. q_name VARCHAR(35) NOT NULL,
  5. ) AS NODE;
  6. --Spray cans
  7. CREATE TABLE utvG (
  8. v_id INT PRIMARY KEY,
  9. v_name VARCHAR(35) NOT NULL,
  10. v_color char(1) NOT NULL
  11. ) AS NODE;

As you can see the tables are created exactly like conventional relational ones, except for the type specification AS NODE. Now lets look at the structure of the tables we just created:

  1. SELECT table_name, column_name, data_type
  2. FROM information_schema.COLUMNS
  3. WHERE table_name='utqG';

table_name column_name data_type
utqG graph_id_AA21DCF7CB44457BB308B21482806B87 bigint
utqG $node_id_EEFDE0FB86F243E4A6667A5CE470F4F6 nvarchar
utqG q_id int
utqG q_name varchar

When a node table is created, in addition to the columns specified by user, two pseudocolumns are created, named graph_id  $node_id. Lets populate our table with data to find out what is stored in these columns

  1. INSERT INTO utqG(q_id, q_name
  2. SELECT * FROM [sql-ex]..utq;

Here, we just use data already present in the educational sql-ex database.

  1. SELECT TOP 2 * FROM utqG;

$node_id_EEFDE0FB86F243E4A6667A5CE470F4F6 q_id q_name
{"type":"node","schema":"dbo","table":"utqG","id":0} 1 Square # 01
{"type":"node","schema":"dbo","table":"utqG","id":1} 2 Square # 02

The graph_id column is missing from the result set. This is because it is used internally by the database kernel and cant be accessed by the user directly. Indeed, if we run the query

  1. SELECT graph_id FROM utqG;
it will generate the following error message:

Invalid column name 'graph_id'.

The  $node_id column holds the unique node identifier presented in JSON format. The hexadecimal column name suffix warranties this name to be globally unique; however, its not used to access the column. Heres an example:

  1. SELECT TOP 2 $node_id, q_id FROM utqG;

$node_id_EEFDE0FB86F243E4A6667A5CE470F4F6 q_id
{"type":"node","schema":"dbo","table":"utqG","id":0} 1
{"type":"node","schema":"dbo","table":"utqG","id":1} 2

The utvG table is similar to utqG, thus, we wont display its contents here.

Now, lets create an edge table.

  1. CREATE TABLE utbG (
  2. b_datetime datetime NOT NULL,
  3. b_vol tinyint NOT NULL,
  4. ) AS EDGE;

This table contains property columns time of the painting event (B_datetime) and the amount of paint sprayed from the can (b_vol). It differs from the node tables by its type now, its EDGE, not NODE. Lets have a look at this tables structure:

  1. SELECT column_name, data_type
  2. FROM information_schema.COLUMNS WHERE table_name='utbG';

column_name data_type
graph_id_9D6121CFD14948A5B03BBD6A4BDB4774 bigint
$edge_id_EAB5B85BC07649ED89435D6F2A2ACE83 nvarchar
from_obj_id_D0D5B23329B5409895672D5C32283E51 int
from_id_526FECFDEBB84E86B2F7440FF625BCF1 bigint
$from_id_BF1E69FB306E4225A58F4DAFAA8FBE94 nvarchar
to_obj_id_14C309807A224F7BA7BD8F88425A15AB int
to_id_295F2F2BD42A425CAAD6ACC9F2174BC7 bigint
$to_id_79504454FC184B7F849B1AB28BCCC670 nvarchar
b_datetime datetime
b_vol tinyint

As well as for the nodes, the system automatically creates several pseudocolumns, the following of which are accessible to the user:

  • $edge_id (automatically formed) edge ID;
  • $from_id ID of the node the edge originates from;
  • $to_id ID of the node the edge terminates at.

Lets populate this table with data, as well. Keeping in mind the structure of the original utb table, we have to put into the $from_id column the ID of the node contained in the utvG table whose v_id is equal to b_v_id in the utb table. Then, the $to_id should contain the node ID from the utqG table whose q_id, in its turn, corresponds to the value of b_q_id  of the same record in the utb table. Its probably easier for the reader to comprehend the statement inserting the data described above:

  1. INSERT INTO utbG(b_datetime, b_vol, $from_id, $to_id)
  2. SELECT b_datetime, b_vol,
  3. (SELECT $node_id FROM utvG WHERE v_id = orig.b_v_id),
  4. (SELECT $node_id FROM utqG WHERE q_id = orig.b_q_id)
  5. FROM [sql-ex]..utb orig

Now, we can have a look at the data in the edge table connecting two nodes the spray can and the square dyed.

  1. SELECT TOP 1 * FROM utbG;

Since the width of the resulting table significantly exceeds the page width, lets present the result in key:value format.

$edge_id_EAB5B85BC07649ED89435D6F2A2ACE83: {"type":"edge","schema":"dbo","table":"utbG","id":0}
$from_id_BF1E69FB306E4225A58F4DAFAA8FBE94: {"type":"node","schema":"dbo","table":"utvG","id":49}
$to_id_79504454FC184B7F849B1AB28BCCC670: {"type":"node","schema":"dbo","table":"utqG","id":21}
b_datetime: "2000-01-01 01:13:36.000"
b_vol: 50

Bookmark and Share
aggregate functions Airport ALL AND AS keyword ASCII AVG Battles Bezhaev Bismarck C.J.Date calculated columns Cartesian product CASE cast CHAR CHARINDEX Chebykin check constraint classes COALESCE common table expressions comparison predicates Computer firm CONSTRAINT CONVERT correlated subqueries COUNT CROSS APPLY CTE data type conversion data types database schema date/time functions DATEADD DATEDIFF DATENAME DATEPART DATETIME DDL DEFAULT DEFAULT VALUES DELETE DISTINCT DML duplicates edge equi-join EXCEPT exercise (-2) More tags
The book was updated
several days ago
©SQL-EX,2008 [Evolution] [Feedback] [About] [Links] [Team]
All right reserved.
Rambler's Top100