Adam Howitt's Blog

Mar 22
2007

When a NULL is not a NULL

I've just come across a little logic flaw in a SQL construct I have relied on for some time.  Needless to say I have some work ahead of me to find places I have used it and fixed it. 

To keep my stored procedures tidy but still allow optional parameters (like CF can deliver by using CFIF around the extra WHERE clause), I have been using this type of thing:

    SELECT *
    FROM any_table
    WHERE field_id = ISNULL(@field_id,field_id)

This code is supposed to work as follows - give me all rows where field_id in the row matches the passed in parameter @field_id but if the @field_id wasn't passed in was passed in as a NULL show all rows. It shows all rows because in all cases the field_id in the row should match itself, correct?
WRONG!  If field_id is actually a NULL value then the row becomes WHERE NULL = NULL.  I just ran some tests and was baffled but then remembered the rules of three valued LOGIC (where there are three possible values of true, false or NULL) - NULL compared to anything is always an unknown value.  This article gives it a good wrap:

After realizing my mistake the code now looks like this to achieve the same effect but correctly:
    SELECT *
    FROM any_table
    WHERE field_id = @field_id OR @field_id IS NULL


Hope that helps someone.

Comments (Comment Moderation is enabled. Your comment will not appear until approved.)
[Add Comment] [Subscribe to Comments]
  1. That is interesting. I know this is not the intended point of the post, but I think this is another argument that NULL values are not all that useful in a database.

    See, to me, filed_id, "feels" like something that should always exist, and if it is not "set", then it should just be zero.... now, yes, I know that zero and NULL are not the same thing, but if you live in a world where zero is "not-set" and non-zero is "set" for pretty much everything, then you would never run into issues like this.

    Not meant to start any war here, I just think its funny that I see problems that NULL causes, but I rarely every see problems that "0" causes.

  2. I recently learned from our DBA that the "field = @field or @field is null" construct can be very inefficient. He said that including the "field" (even if it ends up not being used) will force a full table scan.

    Custom-building the SQL with and without the comparison is slightly messier but more efficient.

    I'll ping him tomorrow to make sure I'm not making shit up, or maybe a DBA will read this and elaborate.

  3. @Ben - It depends. The system I'm working in now uses 0 to mean both 0 and NULL. How do you tell when something really has the value of 0 and when it doesn't really have a value but the 0 is just a placeholder? You have to guess based on other fields, which breaks all kinds of normalization rules.

    @Scott - That's not necessarily true. If there's an index on [field], no table scan will be done. But if there's no index then you will do a scan. Generally using = or IS do not cause scans. It's pesky things like LIKE and in some cases BETWEEN that cause table scans. That is, any case where you are working on just a small part of the field and not the whole thing at once.

  4. Sorry, that should read:

    "Generally using = or IS do not *inherently* cause scans. It's pesky things like LIKE and in some cases BETWEEN that *will almost always* cause table scans."

  5. @Rick,

    Yeah, if zero is going to mean more than one thing, it will cause all kinds of problems. However, in my universe, i have lots of tables that have foreign keys. For instance, a [user] record might have a user_type_id which refers to the [user_type] primary key. Now, since my systems all use INT primary keys starting at ONE, I know that any zero foreign key is not-set. It works perfectly. I have never ran into a snag where this would cause issues.

  6. @Rick,

    ps. Someone asked me a rather high-level database question the other day... I had no idea, so I referred them to your website; told them you were a sql-genius ;)

  7. Hi all and thanks for the comments. @ben - the reason for the nulls in the database is that the fields are uniqueidentifiers. I know it's prettier to use ints but the tradeoff is a predictable progressions of ids like userid where it makes it easier to guess someone else's userid, not to mention the tricks around guaranteeing the next int goes to just one person in a highly concurrent system. @scott and @rick - this is of course database specific but for the SQL server I use, the code causes a clustered index scan, not a table scan (per the execution plan). I'm not sure if that is the case in other databases. Here is some example code to recreate the table and then select from the results. Clustered index scans are faster than table scans, especially in tables with more fields because scanning a clustered index requires less pages to be read: CREATE TABLE [dbo].[class_attributes_tb]( [class_attribute_id] [uniqueidentifier] NOT NULL PRIMARY KEY, [rel_type_id] [uniqueidentifier] NULL ) GO SELECT * FROM class_attributes_tb WHERE rel_type_id IS NULL OR rel_type_id = NEWID()

  8. @Adam,

    I think I misscommunicated; I do not mean to say that pkey fields (unique identifiers) should have zeros if not set. Actually, I don't think they should have NULLs as well. What I mean is that foreign keys should have zeros if they are "not set". Foreign keys are generally not unique and therefore there is no need for predicitability.

  9. @Adam:

    You should be able to use the COALESCE() function in your first example. COALESCE() returns the first *non* null result, so if @FieldId was null, then field itself should be used:

    SELECT * FROM any_table WHERE field_id = COALESCE(@field_id,field_id)

[Add Comment]