SQL Server graph databases |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
A drastic difference between SQL Server graph databases and NoSQL graph databases is, SQL Server uses tables for modeling graph structures. Those are tables of two special types. One type of table is used to create nodes, and the other one to create edges (relations between nodes). Let’s note that this tabular representation of nodes and edges allows writing SQL queries to access these tables. The node table describes some entity. The instances of this entity are represented by the table rows and are characterized the same set of attributes (table columns). An edge table defines the type and the direction of a relationship between nodes. Say, a table named ToBeFriends might describe a friendly relationship between instances of the same or different entities. It’s probably time for an example. I suggest we take the educational Painting database and present its relational structure in the form of a graph structure, without any data loss. This will allow us to write queries and compare results for two models presenting the same information in different ways. Thus, we have two types of nodes representing the square and spray can entities, and a relationship between them that can be described as follows: the spray can paints the square. The relationship is directed from the spray can to the square. Let’s create the node tables:
As you can see the tables are created exactly like conventional relational ones, except for the type specification AS NODE. Now let’s look at the structure of the tables we just created:
When a node table is created, in addition to the columns specified by user, two pseudocolumns are created, named graph_id è $node_id. Let’s populate our table with data to find out what is stored in these columns
Here, we just use data already present in the educational sql-ex database.
The graph_id column is missing from the result set. This is because it is used internally by the database kernel and can’t be accessed by the user directly. Indeed, if we run the query
Invalid column name 'graph_id'. The $node_id column holds the unique node identifier presented in JSON format. The hexadecimal column name suffix warranties this name to be globally unique; however, it’s not used to access the column. Here’s an example:
The utvG table is similar to utqG, thus, we won’t display its contents here. Now, let’s create an edge table.
This table contains property columns – time of the painting event (B_datetime) and the amount of paint sprayed from the can (b_vol). It differs from the node tables by its type – now, it’s EDGE, not NODE. Let’s have a look at this table’s structure:
As well as for the nodes, the system automatically creates several pseudocolumns, the following of which are accessible to the user:
Let’s populate this table with data, as well. Keeping in mind the structure of the original utb table, we have to put into the $from_id column the ID of the node contained in the utvG table whose v_id is equal to b_v_id in the utb table. Then, the $to_id should contain the node ID from the utqG table whose q_id, in its turn, corresponds to the value of b_q_id of the same record in the utb table. It’s probably easier for the reader to comprehend the statement inserting the data described above:
Now, we can have a look at the data in the edge table connecting two nodes – the spray can and the square dyed.
Since the width of the resulting table significantly exceeds the page width, let’s present the result in key:value format.
|