Code Answer: Allowing nulls vs default values

I'm working on an ASP.NET project that replaces many existing paper forms. One of the requirements is that the user can save the form in any state, i.e. they could create a new blank form and immediately save it with no data or with partial data. I'm validating for data type on every save but validation for required fields does not occur until the user marks the form as completed.

I'm not sure what the best approach is to handle this requirement in the database and domain model. As I see it, I have two options:

Allow nulls for any field that may not have data. This feels like the "correct" approach but it requires that almost every database field allow nulls and I have to code around a lot of nullable types. Also, when the form is finalized none of the required fields are enforced in the database.
Populate my business objects with meaningful default values. In some cases, there are meaningful default values for many (but not all) fields that I could use. This approach verges on "magic numbers" which makes me uncomfortable.

Which approach is best? Or is there a third way? I'm not willing to go to extremes, such as splitting the tables.

Edited to add: I wanted to expand on this a bit since I accepted a response. The primary reason that I'm not interested in splitting the tables is that once a project is submitted, the data on the forms is used to generate data for another system that is the system of record. At that point the original form data is unlikely to be revised or used for reporting.

From stackoverflow

If there isn't a sensible default, and you don't want to split the data, then nulls are almost certainly your best option. Re the db not being to verify that they are not null when completed... well, if you don't want to split the table there isn't much you can do (short of using a CHECK constraint, or an INSTEAD OF trigger to run validation). But the DB isn't the only place responsible for data validation. Your app logic can do that too.
I don't understand why you don't want to split the tables. I don't know what domain you're in but in any I could imagine there are two classes of people:
- people who have submitted the form
- people who haven't
And as a business executive I don't care about the second. But the first I care deeply about, and they need to have all their data in correctly.

It also improves efficiency - most of your queries about aggregate data will be over the first table, not the second. The second table will only be used for index seeks.

Jamie Ide : There aren't well defined steps in the process that let me split the table logically. To really make it work, I would have to maintain two sets of tables, one for in-progress forms and one for completed forms. This project doesn't warrant that amount of effort.
I'd take the first option but add a column to the database tables so that when the form is completed this is flagged. Then for anything using the form data it merely needs to check that the form has been completed.

That's my suggestion for a way around this.

Jamie Ide : Yes, I already have that flag. It literally is a matter of the user checking a box to indicate that the form is finalized.

JB King : I meant putting that value into the database and changing the object model to use a check on that column to ensure that the values are set. The required fields is enforced within the business logic as the finalized column isn't set unless all the required fields are filled in.
You could use a temporary table with "allow nulls" on every column to store the form containing partial or no data and copy / move the data to the final table when the user marks the form as completed. This way, you do not depend on default values (which the user may forget to change), you can save in any state, and you still have the validation in the end.
NULL values are not searchable by the indexes.

If you'll need to issue a query like "select first 10 forms with a certain field unfilled", this query will use a FULL TABLE SCAN which may be not efficient.

Oracle does not distinguish between NULL and empty string, but other databases do. You'll probably want to make an empty string to be the DEFAULT for unfilled fields and use it in a search.

If you don't need to search on unfilled fields, then just make them NULL.
If splitting the table(s) (are there more than one?) is not an option, I would consider creating single table to store serialisations of objects of incomplete forms, and only commit a form to the "real" tables when the form is fully submitted by the user.
CHECK CONSTRAINT + VIEW

if you don't have a status field add one so you can tell that it is finished.
- add a check constraint on that status field so it can't be marked finished if any of the columns are null.
When you write your queries on "finished" forms you can ignore checking for nulls everywhere if you do one of these two options:
- just add Status="F"inished in the where clause
- make a view of only finished ones
when using the "finished view" you don't have to do all the validation checks or worry about unfinished ones showing up in the results
NULL generally means "Don't Know" (in a database) whereas an empty string could actually represent an empty string.

I would tend to use NULL as the "Don't Know" value in your case. When you print out data you'll just have to assume that any NULL value means an empty string.

Kezzer : On the other hand, NULL could very well represent something. What if you have a date field called "EndDate" which, when left as null, indicates that there is no end date to the current process? Then that NULL value has a meaning and purpose.
This is a situation that cries out for split tables. I know you said you don't want to do that, and in a comment even said "this project doesn't warrant that level of effort". but it's really the best solution.

Set up preliminary table(s) with everything except your key nullable. When the user marks the form complete, and it passes validation, move it to the final table(s). not only is this The Right Thing To Do, but it's probably less effort than "coding around nullable values" when working with finished forms.

If you need to see all forms, finished or not, make a Union view.
I've had a similar situation, and while I haven't yet come up with a solution, I have been toying with the idea of just using simple XML serialization to store the temporary document data. If you generate simple classes that model the data in the objects (using nullable types where needed, perhaps), it would be easy to stuff data from the screen into those objects, serialize them to XML and then store them in a temporary "staging" table. When your users are done working and want to submit or finalize the document, then you perform all of your needed validation against the serialized data, eventually putting into the "real" table with the proper data structures and constraints.

Code Answer

Sunday, May 1, 2011

Allowing nulls vs default values

0 comments:

Post a Comment

Blog Archive