Using MSSQL to define distinct cross-column relationships...

Posted by Dan on Feb 15, 2008 @ 9:21 AM

Someone asked me this morning how they could determine distinct relationships. They wanted to be able to retrieve all the distinct rows where someone had defined another user as a friend or any users who had defined them as a friend. The table schema looked like this:

friend_user1 friend_user2
1 10
2 6
10 1
6 1
2 1
10 2

As you can see friend 1 is pretty popular amongst other users. However, he only has 1 person defined as friend of his—friend 10. What we need to do is to find all distinct cross-column relationships in the table—which means only one row should be returned for the relationship between friend 1 and friend 10.

From a schema standpoint even though both columns are foreign keys to the same column (a user id,) the columns in this table have different meanings. Column 1 (friend_user1 ) is the user who has friends and Column 2 (friend_user2 ) defines users who relate to Column 1.

In order to get distinct rows, we need to derive a table where both Column 1 and Column 2 have the same meaning. We can do this by creating two queries—one which uses the friend_user1 as the primary key and friend_user2 as the related foreign key and one which uses the friend_user2 as the primary key and friend_user1 as the related foreign key. We then can join the two tables together using the union statement.

The result of this query would look like this:

-- grab column 1 as the primay key and column 2 as the foreign key select distinct friend_user1 as user_key, friend_user2 as friend_key from friend_relationships union -- grab column 2 as the primay key and column 1 as the foreign key select distinct friend_user2 as user_key, friend_user1 as friend_key from friend_relationships

What the query above does is flatten the table so that every relationship is defined in both columns—which doubles the size of the table. By restructuring the table data like this, we can now use a where clause to find all cross-column relationships between users.

However, in order to make sure we only get distinct records back, we need to actually do a select distinct against this query to make sure duplicates are removed. One way to do this would be to convert the above query into a View. This is extremely useful if you need to query against this data frequently. Newer version of SQL Server will even allow you to index Views—which can really help your performance querying for this data.

I'm going to show you another method using a derived table. A derived table is a table generated by a SQL statement which you can use as a virtual table. A derived table is temporary in nature and you can use the derived table to do anything a regular table can do—such as joining tables.

In the SQL below we're retrieving all unique friendship relations for friend 1.

select distinct user_key, friend_key from /* here's our "derived" table – you could make this a view instead */ ( -- grab column 1 as the primay key and column 2 as the foreign key select distinct friend_user1 as user_key, friend_user2 as friend_key from friend_relationships -- remove this if you're doing this as a view where friend_user1 = 1 union -- grab column 2 as the primay key and column 1 as the foreign key select distinct friend_user2 as user_key, friend_user1 as friend_key from friend_relationships -- remove this if you're doing this as a view where friend_user2 = 1 ) as derived_table order by friend_key

When you execute the table above you'd get a resultset of:

user_key friend_key
1 2
1 6
1 10

If you want to test this for yourself, here's a MS SQL script you can run against your database. It uses temp tables to create the table, so you can just copy and paste the SQL into Query Analyzer and run the script. Make sure to change the value of the @find_friend_for variable to see how the resultset changes.

-- declare the user to find friendship relations declare @find_friend_for int set @find_friend_for = 1 go create table #friend_relationships( friend_user1 [int] NOT NULL, friend_user2 [int] NOT NULL ) go insert into #friend_relationships (friend_user1, friend_user2) values (1,10) insert into #friend_relationships (friend_user1, friend_user2) values (2,6) insert into #friend_relationships (friend_user1, friend_user2) values (10,1) insert into #friend_relationships (friend_user1, friend_user2) values (6,1) insert into #friend_relationships (friend_user1, friend_user2) values (2,1) insert into #friend_relationships (friend_user1, friend_user2) values (10,2) go select distinct user_key, friend_key from /* here's our "derived" table – you could make this a view instead */ ( -- grab column 1 as the primay key and column 2 as the foreign key select distinct friend_user1 as user_key, friend_user2 as friend_key from #friend_relationships -- remove this if you're doing this as a view where friend_user1 = @find_friend_for union -- grab column 2 as the primay key and column 1 as the foreign key select distinct friend_user2 as user_key, friend_user1 as friend_key from #friend_relationships -- remove this if you're doing this as a view where friend_user2 = @find_friend_for ) as derived_table /* this where clause is unneed if using the derived table above since records are already filtered. */ where user_key = @find_friend_for order by friend_key go drop table #friend_relationships go

If you have alternative methods or recommendations for a better way, please make share your comments below!

Categories: SQL, Source Code

3 Comments

  • Good solution.
  • ***NOTE that this is not a real-world solution, I'm just being ornery.***

    SELECT DISTINCT
     CASE WHEN friend_user1 < friend_user2 THEN friend_user1 ELSE friend_user2 END AS user_key,
     CASE WHEN friend_user1 < friend_user2 THEN friend_user2 ELSE friend_user1 END AS friend_key
    FROM friend_relationships
    WHERE (friend_user1 <> friend_user2)

    That should return the same result set.
  • My brother pointed out an error in the first row of the table, so I've fixed it.

Comments for this entry have been disabled.