loading..
Ðóññêèé    English
19:05

SQL Server graph databases

A drastic difference between SQL Server graph databases and NoSQL graph databases is, SQL Server uses tables for modeling graph structures. Those are tables of two special types. One type of table is used to create nodes, and the other one to create edges (relations between nodes). Let’s note that this tabular representation of nodes and edges allows writing SQL queries to access these tables.

The node table describes some entity. The instances of this entity are represented by the table rows and are characterized the same set of attributes (table columns).  An edge table defines the type and the direction of a relationship between nodes. Say, a table named ToBeFriends might describe a friendly relationship between instances of the same or different entities.

It’s probably time for an example. I suggest we take the educational Painting database and present its relational structure in the form of a graph structure, without any data loss. This will allow us to write queries and compare results for two models presenting the same information in different ways.

Thus, we have two types of nodes representing the square and spray can entities, and a relationship between them that can be described as follows: the spray can paints the square. The relationship is directed from the spray can to the square.

Let’s create the node tables:

  1. --Squares
  2. CREATE TABLE utqG (
  3.     q_id INT PRIMARY KEY,
  4.     q_name VARCHAR(35) NOT NULL,
  5. ) AS NODE;
  6. --Spray cans
  7. CREATE TABLE utvG (
  8.     v_id INT PRIMARY KEY,
  9.     v_name VARCHAR(35) NOT NULL,
  10.     v_color char(1) NOT NULL
  11. ) AS NODE;

As you can see the tables are created exactly like conventional relational ones, except for the type specification AS NODE. Now let’s look at the structure of the tables we just created:

  1. SELECT table_name, column_name, data_type
  2. FROM information_schema.COLUMNS
  3. WHERE table_name='utqG';

table_name    column_name    data_type
utqG    graph_id_AA21DCF7CB44457BB308B21482806B87    bigint
utqG    $node_id_EEFDE0FB86F243E4A6667A5CE470F4F6    nvarchar
utqG    q_id   int
utqG    q_name    varchar

When a node table is created, in addition to the columns specified by user, two pseudocolumns are created, named graph_id è $node_id. Let’s populate our table with data to find out what is stored in these columns

  1. INSERT INTO utqG(q_id, q_name
  2. SELECT * FROM [sql-ex]..utq;

Here, we just use data already present in the educational sql-ex database.

  1. SELECT TOP 2 * FROM utqG;

$node_id_EEFDE0FB86F243E4A6667A5CE470F4F6    q_id    q_name
{"type":"node","schema":"dbo","table":"utqG","id":0}    1    Square # 01
{"type":"node","schema":"dbo","table":"utqG","id":1}    2    Square # 02

The graph_id column is missing from the result set. This is because it is used internally by the database kernel and can’t be accessed by the user directly. Indeed, if we run the query

  1. SELECT graph_id FROM utqG;
it will generate the following error message:

Invalid column name 'graph_id'.

The  $node_id column holds the unique node identifier presented in JSON format. The hexadecimal column name suffix warranties this name to be globally unique; however, it’s not used to access the column. Here’s an example:

  1. SELECT TOP 2 $node_id, q_id FROM utqG;

$node_id_EEFDE0FB86F243E4A6667A5CE470F4F6    q_id
{"type":"node","schema":"dbo","table":"utqG","id":0}    1
{"type":"node","schema":"dbo","table":"utqG","id":1}    2

The utvG table is similar to utqG, thus, we won’t display its contents here.

Now, let’s create an edge table.

  1. CREATE TABLE utbG (
  2.     b_datetime datetime NOT NULL,
  3.     b_vol tinyint NOT NULL,
  4. ) AS EDGE;

This table contains property columns – time of the painting event (B_datetime) and the amount of paint sprayed from the can (b_vol). It differs from the node tables by its type – now, it’s EDGE, not NODE. Let’s have a look at this table’s structure:

  1. SELECT column_name, data_type
  2. FROM information_schema.COLUMNS WHERE table_name='utbG';

column_name    data_type
graph_id_9D6121CFD14948A5B03BBD6A4BDB4774    bigint
$edge_id_EAB5B85BC07649ED89435D6F2A2ACE83    nvarchar
from_obj_id_D0D5B23329B5409895672D5C32283E51    int
from_id_526FECFDEBB84E86B2F7440FF625BCF1    bigint
$from_id_BF1E69FB306E4225A58F4DAFAA8FBE94    nvarchar
to_obj_id_14C309807A224F7BA7BD8F88425A15AB    int
to_id_295F2F2BD42A425CAAD6ACC9F2174BC7    bigint
$to_id_79504454FC184B7F849B1AB28BCCC670    nvarchar
b_datetime    datetime
b_vol    tinyint

As well as for the nodes, the system automatically creates several pseudocolumns, the following of which are accessible to the user:

  • $edge_id – (automatically formed) edge ID;
  • $from_id – ID of the node the edge originates from;
  • $to_id – ID of the node the edge terminates at.

Let’s populate this table with data, as well. Keeping in mind the structure of the original utb table, we have to put into the $from_id column the ID of the node contained in the utvG table whose v_id is equal to b_v_id in the utb table. Then, the $to_id should contain the node ID from the utqG table whose q_id, in its turn, corresponds to the value of b_q_id  of the same record in the utb table. It’s probably easier for the reader to comprehend the statement inserting the data described above:

  1. INSERT INTO utbG(b_datetime, b_vol, $from_id, $to_id)
  2. SELECT b_datetime, b_vol,
  3. (SELECT $node_id FROM utvG WHERE v_id = orig.b_v_id),
  4. (SELECT $node_id FROM utqG WHERE q_id = orig.b_q_id)
  5. FROM [sql-ex]..utb orig

Now, we can have a look at the data in the edge table connecting two nodes – the spray can and the square dyed.

  1. SELECT TOP 1 * FROM utbG;

Since the width of the resulting table significantly exceeds the page width, let’s present the result in key:value format.

$edge_id_EAB5B85BC07649ED89435D6F2A2ACE83: {"type":"edge","schema":"dbo","table":"utbG","id":0}
$from_id_BF1E69FB306E4225A58F4DAFAA8FBE94: {"type":"node","schema":"dbo","table":"utvG","id":49}
$to_id_79504454FC184B7F849B1AB28BCCC670: {"type":"node","schema":"dbo","table":"utqG","id":21}
b_datetime: "2000-01-01 01:13:36.000"
b_vol: 50

Tags
aggregate functions Airport ALL AND AS keyword ASCII AVG Battles Bezhaev Bismarck C.J.Date calculated columns Cartesian product CASE cast CHAR CHARINDEX Chebykin check constraint classes COALESCE common table expressions comparison predicates Computer firm CONSTRAINT CONVERT correlated subqueries COUNT CROSS APPLY CTE data type conversion data types database schema DATEADD DATEDIFF DATENAME DATEPART DATETIME date_time functions DDL DEFAULT DEFAULT VALUES DELETE DISTINCT DML duplicates edge equi-join EXCEPT exercise (-2) More tags
The book was updated
several days ago
©SQL-EX,2008 [Evolution] [Feedback] [About] [Links] [Team]
All right reserved.