In my previous post I described a three-pronged approach to software security that is summed up by "Constrain, Reject and Sanitize". In this article, I'll discuss the "Constrain" part in more detail.
Developers usually focus on what their end users will want to do with the system, and that rarely involves launching XSS or SQL injection attacks against it. In fact, the target audience for most software knows nothing of these things. So when developers design a piece of software, they look at it from the perspective of a benign user who just wants the software to work. It goes without saying, however, that the benign user doesn't want their personal details divulged to hackers, so at some point the developer has to consider what a hacker might want to "inject" into the software. The general principle is that ALL user input should be considered evil until proven otherwise, and the first part of that process is to constrain ALL user input.
There are a number of ways that input can be constrained, based on the type of information you are expecting.
1. If the possible field values are a single or multiple selection from a fairly small, well-defined set of values, only allow the user to choose from this set. This can be done using UI elements such as a list of radio buttons, a group of check boxes, a list box or a drop-down list.
2. If you require more freedom than this but the data has a strict pattern that you can check for, then ensure that you validate the data the user enters. This can take a number of forms:
a. Data of a particular type (e.g. Decimal / Integer / Date) should be cast to that type as soon as possible, and the user notified if the cast fails.
b. Valid ranges and lengths of all data should be enforced, e.g. an age field may be required to be > 18 but < 130, a name field may be limited to 30 characters or fewer, etc.
c. Regular expressions should be used for things like email addresses, post codes, Tax File Numbers, etc.
This kind of validation is called "white list" validation, because it approaches the problem from the point of view of a set of allowable formats for the input data; anything that does not satisfy the format is rejected outright. For example, if you have a post code field, no one is going to be able to write any kind of attack that contains only 4 characters, all of which are numeric (0-9). Similarly, it is very difficult to form an attack that looks enough like an email address to validate against a suitably strict email-checking regular expression.
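As a rough sketch of the checks listed above, here is how they might look in Python. Everything named here (`ValidationError`, the `constrain_*` helpers, the `ALLOWED_TITLES` set) is hypothetical and only for illustration. The age and name limits and the four-digit post code come from the examples above; the set of titles and the rule that a name cannot be empty are my own assumptions.

```python
import re

# Hypothetical closed set of values, standing in for a drop-down list.
ALLOWED_TITLES = {"Mr", "Mrs", "Ms", "Dr"}

# Four characters, all numeric (0-9), as in the post code example.
POSTCODE_RE = re.compile(r"[0-9]{4}")


class ValidationError(ValueError):
    """Raised when input falls outside what the field allows."""


def constrain_choice(raw: str, allowed: set[str]) -> str:
    # Only values from the well-defined set are accepted.
    if raw not in allowed:
        raise ValidationError("value is not one of the allowed options")
    return raw


def constrain_age(raw: str) -> int:
    # Cast to the expected type as early as possible...
    try:
        age = int(raw.strip())
    except ValueError:
        raise ValidationError("age must be a whole number") from None
    # ...then enforce the valid range (> 18 but < 130).
    if not 18 < age < 130:
        raise ValidationError("age must be greater than 18 and less than 130")
    return age


def constrain_name(raw: str, max_length: int = 30) -> str:
    # Assumption: a name must contain at least one character.
    name = raw.strip()
    if not 1 <= len(name) <= max_length:
        raise ValidationError(f"name must be 1 to {max_length} characters")
    return name


def constrain_postcode(raw: str) -> str:
    # fullmatch requires the whole string to fit the pattern.
    if not POSTCODE_RE.fullmatch(raw):
        raise ValidationError("post code must be exactly four digits")
    return raw
```

Each helper either returns a value of the expected type or raises, so the calling code can notify the user in one place and nothing unvalidated slips further into the system.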
The issue, though, is that some fields don't really lend themselves to this form of validation. Description fields, for instance, are generally large text fields that can contain virtually any type of character. Even so, I would suggest you attempt to define a set of allowed characters and limit what the user can type into these fields.
This article is not specific to web development; however, I do want to say something specific to web development. Sometimes you want to give your users the ability to enter rich HTML content. This is all well and good, but of course it makes constraining the input quite difficult. You have to allow tags, which means that you are potentially opening yourself up to Cross Site Scripting attacks. You might say that we can just reject any <script> tags, and we will get on to the rejection phase in my next post, but if you spend a bit of time looking at the XSS Cheat Sheet you'll very quickly realise that there are literally hundreds of ways to phrase an XSS attack. Also, as the web evolves and browsers implement the new standards (and new proprietary tags), the list of possible attack vectors keeps growing, so a site that may have been safe in the IE5 days may, without any change on your part, be vulnerable if the end user is using Firefox 2.0 or IE7.
This is a difficult problem, and one solution I have seen in the past is for the rich editor control to use its own format for storing the rich content. The idea is to have a format that allows for a supported subset of HTML. The content is stored in the backend in this format and only transformed into HTML when it needs to be rendered. This is still white list constraining, and it works because the format the control stores its data in does not have the syntax to express tags that it doesn't know about.
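To make the "own format" idea a little more concrete, here is a minimal sketch in Python. It is not how any particular editor control works; the `Run` type, the `ALLOWED_STYLES` table and `render_html` are all names I have made up for illustration. The stored format is just a list of text runs, each tagged with one of a few known styles, and HTML only appears at render time.

```python
import html
from dataclasses import dataclass

# Hypothetical white list: the only styles the stored format can express,
# mapped to the HTML element used when rendering (None means no element).
ALLOWED_STYLES = {"plain": None, "bold": "strong", "italic": "em"}


@dataclass(frozen=True)
class Run:
    """One stretch of text with a single style; this is what gets stored."""
    style: str
    text: str


def render_html(runs: list[Run]) -> str:
    parts = []
    for run in runs:
        if run.style not in ALLOWED_STYLES:
            # The format has no way to express anything outside the white list.
            raise ValueError(f"unsupported style: {run.style!r}")
        tag = ALLOWED_STYLES[run.style]
        # The text itself is always escaped, so it can never become markup.
        escaped = html.escape(run.text)
        parts.append(f"<{tag}>{escaped}</{tag}>" if tag else escaped)
    return "".join(parts)
```

Because the only tags that ever reach the page are the ones the renderer writes itself, a new browser feature or proprietary tag does not widen the attack surface unless someone deliberately adds it to the white list.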
The final thing I want to cover is where this constraining should be done. The defence in depth paradigm requires that input be checked at every boundary. The first boundary is the client side, i.e. when the user first enters the data into the UI. Secondly, the server side should validate the instant it receives any data. From this point onwards, the data should be checked at the boundary of every layer until it is finally placed safe and secure into the database. If your application is designed well, you should often be able to re-use validation logic between layers. This ensures that if the user finds a way to enter data at a lower level, the data is still constrained.
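Here is a small sketch of what re-using validation logic between layers might look like, again in Python. The layers are hypothetical stand-ins: `handle_request` plays the server-side entry point and `CustomerRepository` plays the data layer, with an in-memory list in place of a real database.

```python
import re

POSTCODE_RE = re.compile(r"[0-9]{4}")


def validate_postcode(value: str) -> str:
    """Single shared rule, called at every boundary the data crosses."""
    if not POSTCODE_RE.fullmatch(value):
        raise ValueError("post code must be exactly four digits")
    return value


class CustomerRepository:
    """Hypothetical data layer; a list stands in for the database."""

    def __init__(self) -> None:
        self.rows: list[str] = []

    def save_postcode(self, postcode: str) -> None:
        # Checked again here, in case a caller bypassed the upper layers.
        self.rows.append(validate_postcode(postcode))


def handle_request(form: dict[str, str], repo: CustomerRepository) -> str:
    # Checked the instant the server receives the data.
    postcode = validate_postcode(form.get("postcode", ""))
    repo.save_postcode(postcode)
    return "ok"
```

The point of the design is that the rule lives in one place, so the layers cannot drift apart, and a caller that goes straight to the repository hits exactly the same check.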
In summary, constraining input is your first (and, in my opinion, best) line of defence against potentially malicious users. Next I will discuss the rejection phase.