dans.blog: Using MSSQL to define distinct cross-column relationships...

Someone asked me this morning how they could determine distinct relationships. They wanted to be able to retrieve all the distinct rows where someone had defined another user as a friend or any users who had defined them as a friend. The table schema looked like this:

friend_user1	friend_user2
1	10
2	6
10	1
6	1
2	1
10	2

As you can see friend 1 is pretty popular amongst other users. However, he only has 1 person defined as friend of his—friend 10. What we need to do is to find all distinct cross-column relationships in the table—which means only one row should be returned for the relationship between friend 1 and friend 10.

From a schema standpoint even though both columns are foreign keys to the same column (a user id,) the columns in this table have different meanings. Column 1 (friend_user1 ) is the user who has friends and Column 2 (friend_user2 ) defines users who relate to Column 1.

In order to get distinct rows, we need to derive a table where both Column 1 and Column 2 have the same meaning. We can do this by creating two queries—one which uses the friend_user1 as the primary key and friend_user2 as the related foreign key and one which uses the friend_user2 as the primary key and friend_user1 as the related foreign key. We then can join the two tables together using the union statement.

The result of this query would look like this:

-- grab column 1 as the primay key and column 2 as the foreign key
select distinct
    friend_user1 as user_key, friend_user2 as friend_key
from
    friend_relationships

union

-- grab column 2 as the primay key and column 1 as the foreign key
select distinct
    friend_user2 as user_key, friend_user1 as friend_key
from
    friend_relationships

What the query above does is flatten the table so that every relationship is defined in both columns—which doubles the size of the table. By restructuring the table data like this, we can now use a where clause to find all cross-column relationships between users.

However, in order to make sure we only get distinct records back, we need to actually do a select distinct against this query to make sure duplicates are removed. One way to do this would be to convert the above query into a View. This is extremely useful if you need to query against this data frequently. Newer version of SQL Server will even allow you to index Views—which can really help your performance querying for this data.

I'm going to show you another method using a derived table. A derived table is a table generated by a SQL statement which you can use as a virtual table. A derived table is temporary in nature and you can use the derived table to do anything a regular table can do—such as joining tables.

In the SQL below we're retrieving all unique friendship relations for friend 1.

select distinct
    user_key, friend_key
from
    /* here's our "derived" table â€“ you could make this a view instead */
    (
        -- grab column 1 as the primay key and column 2 as the foreign key
        select distinct
            friend_user1 as user_key, friend_user2 as friend_key
        from
            friend_relationships
        -- remove this if you're doing this as a view
        where
            friend_user1 = 1

        union

        -- grab column 2 as the primay key and column 1 as the foreign key
        select distinct
            friend_user2 as user_key, friend_user1 as friend_key
        from
            friend_relationships
        -- remove this if you're doing this as a view
        where
            friend_user2 = 1
    ) as derived_table
order by
    friend_key

When you execute the table above you'd get a resultset of:

user_key	friend_key
1	2
1	6
1	10

If you want to test this for yourself, here's a MS SQL script you can run against your database. It uses temp tables to create the table, so you can just copy and paste the SQL into Query Analyzer and run the script. Make sure to change the value of the @find_friend_for variable to see how the resultset changes.

-- declare the user to find friendship relations
declare @find_friend_for int
set @find_friend_for = 1

go

create table #friend_relationships( friend_user1 [int] NOT NULL, friend_user2 [int] NOT NULL )

go

insert into #friend_relationships (friend_user1, friend_user2) values (1,10)
insert into #friend_relationships (friend_user1, friend_user2) values (2,6)
insert into #friend_relationships (friend_user1, friend_user2) values (10,1)
insert into #friend_relationships (friend_user1, friend_user2) values (6,1)
insert into #friend_relationships (friend_user1, friend_user2) values (2,1)
insert into #friend_relationships (friend_user1, friend_user2) values (10,2)

go

select distinct
    user_key, friend_key
from
    /* here's our "derived" table â€“ you could make this a view instead */
    (
        -- grab column 1 as the primay key and column 2 as the foreign key
        select distinct
            friend_user1 as user_key, friend_user2 as friend_key
        from
            #friend_relationships
        -- remove this if you're doing this as a view
        where
            friend_user1 = @find_friend_for

        union

        -- grab column 2 as the primay key and column 1 as the foreign key
        select distinct
            friend_user2 as user_key, friend_user1 as friend_key
        from
            #friend_relationships
        -- remove this if you're doing this as a view
        where
            friend_user2 = @find_friend_for
    ) as derived_table
/*
    this where clause is unneed if using the derived table above
    since records are already filtered.
*/
where
    user_key = @find_friend_for
order by
    friend_key

go

drop table #friend_relationships

go

If you have alternative methods or recommendations for a better way, please make share your comments below!

Categories: SQL, Source Code

3 Comments

Ben Nadel
Feb 15, 2008 @ 10:52 AM

Good solution.
Rick O
Feb 15, 2008 @ 11:29 AM

***NOTE that this is not a real-world solution, I'm just being ornery.***

SELECT DISTINCT
CASE WHEN friend_user1 < friend_user2 THEN friend_user1 ELSE friend_user2 END AS user_key,
CASE WHEN friend_user1 < friend_user2 THEN friend_user2 ELSE friend_user1 END AS friend_key
FROM friend_relationships
WHERE (friend_user1 <> friend_user2)

That should return the same result set.
Dan G. Switzer, II
Feb 15, 2008 @ 5:23 PM

My brother pointed out an error in the first row of the table, so I've fixed it.

Comments for this entry have been disabled.