loading..
Русский    English
17:25

LEN() function, trailing spaces and uniqueness

Recently, I was faced with the fact that I could not add the two values of VARCHAR type, which differ only in trailing spaces, in the column of a composite primary key (SQL Server 2008). Perhaps for some people is obvious, but it seemed strange that, in principle, different values are considered as duplicates. For values of CHAR(n) type, which has a fixed length, all is clear, because short strings padded with spaces to length n. Therefore, input values, which differ only in trailing spaces, are indistinguishable. But trailing space in value of a VARCHAR type is like a normal character. Here's a simple experiment.

  1. CREATE TABLE Test_Trailing_Space2
  2. ( num int NOT NULL, name VARCHAR(10) NOT NULL,
  3. PRIMARY KEY(num, name) );
  4. GO
  5. INSERT INTO Test_Trailing_Space2 VALUES(1, 'John');
  6. INSERT INTO Test_Trailing_Space2 VALUES(1, 'John ');
  7. GO

The second row will not be inserted into the table, while you receive a message on the primary key constraint violation. Ie second row is considered as a duplicate of the first. May be the fact is that when you insert, a trailing space was cut off. But no, insert another unique string with trailing space and check trailing space existence:

  1. INSERT INTO Test_Trailing_Space2 VALUES(2, 'John ');
  2. GO
  3. SELECT *, LEN(name) len1,DATALENGTH(name) len2
  4. FROM Test_Trailing_Space2;

Result:

num name len1 len2
1 John 4 4
2 John 4 5

The value in the len2 column shows that a space exists in the data, but, nevertheless, the values of 'John' and 'John ' are duplicates:

  1. SELECT DISTINCT name
  2. FROM Test_Trailing_Space2;

name
John

It is obvious that the whole matter in the LEN() function, which, as we know, does ignore trailing spaces. I did not find this information in BOL, but apparently, it is this function is used when comparing the values of any string types. I was curious about the behavior of other databases in this case, and I repeated the experiment for MySQL and PostgreSQL. Following results were obtained.

MySQL (version 5.0)

  1. SELECT *, OCTET_LENGTH(name) AS len1, LENGTH(name) AS len2
  2. FROM Test_Trailing_Space2;


1 John 4 4
2 John 5 5


  1. SELECT DISTINCT name
  2. FROM Test_Trailing_Space2;

John

PostgreSQL (version 8.3.6)

  1. SELECT *, OCTET_LENGTH(name) AS len1, LENGTH(name) AS len2
  2. FROM Test_Trailing_Space2;


1 John 4 4
2 John 5 5


  1. SELECT DISTINCT name
  2. FROM Test_Trailing_Space2;


"John"
"John "

As can be seen, MySQL, and PostgreSQL take into account the space in the number of characters, and in the number of bytes used to store the string value. Unlike PostgreSQL, MySQL and SQL Server consider the rows, which differ only in trailing spaces, duplicates. Therefore, PostgreSQL allows you to insert a row in the tables:

  1. INSERT INTO Test_Trailing_Space2 VALUES(1, 'John ')

Instead of conclusions

I am far from being able to say on the correctness of any implementation and, moreover, argue about which DBMS is better. I believe that you need to know closely the other DBMS you are using in their professional activities. Study the documentation and all subject to the tests rather than relying on intuition and common sense.

Suggested exercises: 131

Tags
aggregate functions Airport ALL AND AS keyword ASCII AVG Battles Bezhaev Bismarck C.J.Date calculated columns Cartesian product CASE cast CHAR CHARINDEX Chebykin check constraint classes COALESCE common table expressions comparison predicates Computer firm CONSTRAINT CONVERT correlated subqueries COUNT CROSS APPLY CTE data type conversion data types database schema DATEADD DATEDIFF DATENAME DATEPART DATETIME date_time functions DDL DEFAULT DEFAULT VALUES DELETE DISTINCT DML duplicates edge equi-join EXCEPT exercise (-2) More tags
The book was updated
several days ago
©SQL-EX,2008 [Evolution] [Feedback] [About] [Links] [Team]
All right reserved.