Easily manage one-to-many data in your SQL database without using multiple INSERT INTO statements

Posted by Dan on May 18, 2011 @ 11:16 AM

A task that most web developers have faced at one point or another is how to cleanly manage updating data in a one-to-many relationship in a relational database. For example, let's say that you have schema looks something like this:

image

The User table holds the information on all your Users, the UserRole table holds all the roles of each User and the Role table contains a list of valid roles that a user can hold. This type of schema gives us a ton of flexibility, but management of the UserRole table can often be tricky. The most common method I've seen developers use to maintain a table like this is to delete all the data in the UserRole table for a user and use a loop to do an insert on each role that needs to be assigned. This leads to SQL that looks something like:

delete
from
  UserRole
where
  UserId = 1

insert into
  UserRole
(
  UserId, RoleId
) values (
  1, 1
)

insert into
  UserRole
(
  UserId, RoleId
) values (
  1, 2
)

insert into
  UserRole
(
  UserId, RoleId
) values (
  1, 3
)

While this works, it has several issues:

  1. If there are many roles being modified, this can lead to a very lengthy SQL statement and/or many database transactions (depending on how your code is implemented.)
  2. There is no inherit constraint to ensure that an invalid role does not get inserted. If your database has foreign key relationships defined between the UserRole.RoleId and Role.RoleId columns your database would throw an error, but wouldn't it be better just to ignore invalid values?
  3. Since you are deleting all the roles on each update, your assigning new UserRoleId to roles that may not have changed. This can lead to more work on your database's side for managing indexes, etc.

Wouldn't it be nicer if you could just supply a list of the RoleIds you want to update instead of doing a bunch of individual INSERT INTO statements?

Well there is actually a very nice way to manage this that should work in any modern database. The trick is to use an INSERT INTO with SELECT statement to manage your data. The basic idea is instead of doing a bunch of individual INSERT INTO statements, we can write a single SQL statement that looks like this:

insert into
  UserRole
(
  UserId, RoleId
)

select
  1 as UserId
  , RoleId
from
  Role
where
  RoleId in (1,2,3)

What's nice about this code is work's around our first two issues by reducing the SQL statement and ensures that only valid Roles will actually get inserted into the the UserRole table—since we're selecting the valid rows right from the Role table.

But how do we solve issue #3, so that we don't alter roles that haven't changed? The trick is to only delete the roles not in our new list of values, then modify our insert to only insert rows not already assigned to us. So our complete code now looks like this:

delete
from
  UserRole
where
  UserId = 1
-- IF WE HAVE NO ROLES TO ASSIGN, DO NOT RUN ANY CODE BELOW THIS LINE
-- this deletes only rows not in our new selection
and
  RoleId not in (1,2,3)

insert into
  UserRole
(
  UserId, RoleId
)

select
  1 as UserId
  , RoleId
from
  Role
where
  RoleId in (1,2,3)
-- only grab roles not already inserted into the database
and
  RoleId not in (
    select
      RoleId
    from
      UserRole
    where
      UserId = 1
  )

This new code gives us 2 simple SQL statements (that can be run in a single transaction) which will now remove any roles that are no longer assigned to a user and only insert the new rows that need to be added. We're not altering data that hasn't changed.

Coming shortly, I'll post a follow up article that's specific to SQL Server 2005 (and above) on how you can modify the above example to return a query that will contains an audit of all the changes applied to the database.

Categories: SQL, HTML/ColdFusion

4 Comments

  • Dan, I really like the fact that your method removes the need for cfloop completely, but it needn't be used to generate multiple INSERTs. In MySQL at least, you can have just a single statement with multiple rows in the VALUES clause, created by cfloop:

    INSERT INTO UserRole(
    UserID,RoleID
    )
    VALUES(
    1,1
    ,1,2
    ,1,3
    )

    I think I agree that only deleting and inserting items that need to be is better than deleting all and starting over, but I wonder if it is really faster given that the extra conditional clauses will add overhead, particularly the sub-query in your INSERT.

    I do love the fact that your solution is CFLOOP-free though. Thanks for sharing.
  • @Julian:

    The other issue that you solve using my technique is that you're also follow your database constraints. It's going to be impossible to insert an invalid foreign key into your table.

    While there is some overhead, it's generally extremely negligible. The table holding the chooses (in this example, the "Role" table) usually ends up being a small table that doesn't involve much overhead.

    As with anything, not every solution is the optimal in every situation. However, in my experience this technique works really well and I love that it prevents "dirty" entries (i.e. invalid foreign keys.) Even if your database has constraints in place, you still run into concurrency issues where the UI the user was posting changes to became invalid because of a change someone else made, so this just enforces everything when writing to the the db.

    Lastly, as a non-MySQL user, I was unaware of the multiple row insert syntax, so that's a nice tip!
  • Dan, yes I forgot to mention that I also like the "validation" element of your technique.

    Again possibly MySQL only, but I frequently use the INSERT IGNORE command so that if a constraint is violated, the insert is silently skipped, but your method gives you "belt and braces" (US=suspenders) and will protect you even without keys in place.

    Overall this strikes me as a good approach and I'm going to try it out. Thanks again.
  • I have a lot of these one-to-manys in my code - very clever Dan!

Comments for this entry have been disabled.