Sunday 20 November 2011

Why parameterized queries stop SQL injection attacks

An updated version of this article can be found here (How do prepared statements protect against SQL Injection?)

I've recently started a new job, and as such have been going through a lot of documentation and 'recommended reading' (which I actually read, because I had so much free time). Among the many topics covered were various types of vulnerabilities, how they work, and, surprisingly, various ways to actually defend against these attacks.
Now, the other day I was down the pub talking with a friend who has recently got a developer job (we're both just out of university, so many people I know have either just got a job or hopefully will in the near future), and he didn't understand parameterized queries, or how they actually stop SQL injection attacks. This I found totally understandable, as when articles talk about parameterized queries stopping SQL attacks they don't really explain why. It's often a case of "It does, so don't ask why" -- possibly because the authors don't know themselves. A sure sign of a bad educator is one who can't admit they don't know something. But I digress.
To see why the confusion is understandable, imagine a dynamic SQL query:

sqlQuery='SELECT * FROM custTable WHERE User=' + Username + ' AND Pass=' + password

so a simple SQL injection would be to enter the username as ' OR 1=1--
This effectively makes the SQL query:

sqlQuery='SELECT * FROM custTable WHERE User='' OR 1=1-- ' AND Pass=' + password

This says: select all customers where their username is blank ('') or 1=1, a boolean expression that always evaluates to true. Then -- comments out the rest of the query. So this will dump the whole customer table, or let you do whatever you want with it; if it's a login query, you will be logged in with the first user's privileges, which is often the administrator.
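To make the string splicing concrete, here's a quick sketch in Python (the variable names are my own) showing what the concatenation produces when that username is supplied:

```python
# Simulating the dynamic query building above with string concatenation
username = "' OR 1=1--"
password = ""
sqlQuery = ("SELECT * FROM custTable WHERE User='" + username +
            "' AND Pass='" + password + "'")
print(sqlQuery)
# SELECT * FROM custTable WHERE User='' OR 1=1--' AND Pass=''
```

The single quote supplied by the attacker closes the User value early, and everything after it becomes part of the SQL itself.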

Now parameterized queries do it differently, with code like:

sqlQuery='SELECT * FROM custTable WHERE User=? AND Pass=?'

parameters.add("User", username)
parameters.add("Pass", password)

where username and password are variables pointing to the associated inputted username and password

Now at this point, you may be thinking, this doesn't change anything at all. Surely you could still just put into the username field something like Nobody OR 1=1'--, effectively making the query:

sqlQuery='SELECT * FROM custTable WHERE User=Nobody OR 1=1'-- AND Pass=?'

And this would seem like a valid argument. But, you would be wrong.

The way parameterized queries work is that the sqlQuery is sent as a query first, and the database knows exactly what this query will do; only then are the username and password inserted, merely as values. This means they cannot affect the query, because the database already knows what the query will do. So in this case it would look for a username of "Nobody OR 1=1'--" and a blank password, which should come up false.
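Here's a minimal, runnable sketch of the difference, using Python's sqlite3 module (the table and credentials are made up for the demo). The ? placeholders bind the inputs as values after the query is parsed, so the injection string gets compared as a literal username rather than executed:

```python
import sqlite3

# throwaway in-memory database with one hypothetical user
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE custTable (User TEXT, Pass TEXT)")
conn.execute("INSERT INTO custTable VALUES ('admin', 'secret')")

def login(username, password):
    # the database parses the query first; the values are bound afterwards
    cur = conn.execute("SELECT * FROM custTable WHERE User=? AND Pass=?",
                       (username, password))
    return cur.fetchall()

print(login("admin", "secret"))    # [('admin', 'secret')] - the real user
print(login("' OR 1=1--", ""))     # [] - the injection is just a weird username
```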

This isn't a complete solution though, and input validation still needs to be done, since parameterized queries won't fix other problems, such as XSS attacks: you could still put JavaScript into the database, and if that is read out onto a page it will run as normal JavaScript, depending on any output validation. So really the best thing to do is still use input validation, but also use parameterized queries or stored procedures to stop any SQL injection attacks.

Monday 12 September 2011

Annoying geeks...a World of Warcraft DoS & security of tokens

Ok, you may have seen that title and thought OMG! 1337 DoS to take down Blizzard!!!1! Well, you'd be wrong; it's not actually specific to World of Warcraft, it applies to anything that uses a token.

Note: this is a post explaining how this is possible, not suggesting that you should do it. And it's not permanent; it will just make the victim hate you until it's fixed.

So I noticed about a year or so ago that a friend of mine who plays World of Warcraft uses a token device called "The Blizzard Authenticator" (more information can be found here).
It generates a 6-digit numeric code that has to be entered along with the user's username and password. One simple way to effectively DoS the user is to steal the keychain. A smarter way, which will probably make them think something is wrong with the token, and spend lots of time trying to fix it, is simply to press the button around 10 times or so.
Since tokens generate a (pseudo-)random number, pressing the button enough times will make the device show every code in the whole keyspace of 000000-999999. Because of this, technically every number is a valid number, but if the server accepted any 6-digit number as correct, the token would add no security at all. The way around this is a window of acceptance. The server takes the 6-digit code from the user's last successful login and applies the same function the token device uses to produce the next code. That gives the next expected code, but it also means the window of acceptance is only one code, which is pretty ridiculous: accidentally press the button twice and you've wasted all your money on a token. So instead the window of acceptance is larger, perhaps 5 or 10 codes: the server applies the function to each subsequent result and keeps a list of codes it will accept.
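To illustrate the chain-and-window idea, here's a toy model (this is only an illustration of the scheme as described above, not Blizzard's actual algorithm; real authenticators are more sophisticated) where the 'token' steps a deterministic PRNG and the server walks the same chain forward from the last code it saw:

```python
import random

WINDOW = 10  # hypothetical window of acceptance

def next_code(code):
    # stand-in for the token's next-code function: a deterministic PRNG step
    return random.Random(code).randint(0, 999999)

def server_accepts(last_login_code, submitted, window=WINDOW):
    # walk the chain forward from the last successful login's code
    code = last_login_code
    for _ in range(window):
        code = next_code(code)
        if code == submitted:
            return True
    return False

# the token starts in sync with the server; press the button WINDOW times
code, presses = 123456, []
for _ in range(WINDOW):
    code = next_code(code)
    presses.append(code)

print(server_accepts(123456, presses[0]))   # True: the very next code
print(server_accepts(123456, presses[-1]))  # True: still inside the window
```

Press the button an eleventh time and the displayed code falls outside the server's list, which is exactly the desync this post is about.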

World of Warcraft isn't the only application of tokens, the reason I used WoW for this blog post is partly because I know somebody that uses one of the devices, and partly because of what one has to do to fix it. On the WoW wiki, it shows that if your token breaks, you need to contact the World of Warcraft billing department with the following information:

  • Your full real name.
  • Your full address including postal or zip code.
  • Your full email address (currently registered on the account).
  • Your account name.
  • The authentication key used to create the account.
  • Your Secret question and answer.
  • The last 4 digits of the credit card used on the account plus the expiration date OR the full code of a game card activated on the account.
  • A legible fax, scan or photo of a piece of government-issued photo identification, such as a passport or driving license matching the first and last name of the registered account owner. (No idea if this is kept on record but I sure hope not)
  • A legible fax, scan or photo of the Authenticator token, with the code on the back fully visible.

This is an immense amount of security for a game (perhaps even too much). I'm really impressed, as there are banks with less security than this (probably because sending photos of ID and the broken token would take so long).
So in the end, if you know somebody who plays WoW, and they use a token device, don't go around pressing the token device 10 or 20 times (or giving them to small children who will just do the same). Or if you do, expect them to be annoyed until they manage to get it reset, which judging by how much information they need to give, may take a while. 

p.s. I don't play World of Warcraft, I'm not a fan, but I think Blizzard are doing a decent job to make people feel protected against others stealing their account, and in the end, I feel tokens are a good idea.

Tuesday 28 June 2011

Get your fuzzers ready...

I've been slacking a lot on blog posts of late, partly because I've been doing my exams and relaxing thereafter, but it's also partly down to how busy I've been, and how lazy I am. Anyway, since my last blog post, I've finished my final year project, done my exams and been to both BSidesLondon and BSidesVienna. Both conferences were great, and if I had the money to go to Las Vegas I'm sure I would be going to BSidesLV too (and probably DEF CON while I'm out there). However, money being a little tighter means I am unable to do this.

What I have really been doing though is focussing on fuzzing, the theory behind it, and setting up my home lab for both bug hunting and general trying to get better at pentesting. The idea of fuzzing and bug hunting is really interesting to me, since there are practically unlimited different ways to try to break a program.
It's the same idea as for all security -- an attacker has to only find one way to get in, a defender has to attempt to stop every single possibility, and you could use the same thing with a security-minded programmer versus a bug hunter.
But back to fuzzing in particular, in case you don't know the basic techniques of fuzzing, there are two main ones -- mutation-based and generation-based.

Generation-based fuzzing is much more comprehensive, since the input is generated from scratch and can be crafted so that it tests every known feature of a program. The major flaw with this technique (and the main reason it's used less than the other) is purely time. You could argue it's also down to knowledge, but I'm lumping knowledge in with time. The reason building a generation-based fuzzer takes so long is that in order to generate the data, you need to know how the data is constructed. With simple protocols and file types this isn't such a problem, as there may not be much to learn before you can generate data, but with proprietary protocols and formats that haven't been well documented or reverse engineered, it becomes a major one. You might spend months reverse engineering a protocol and making the best fuzzer possible, just to find out that there were no problems with the particular program (or at least none your fuzzer could find). The argument goes that since generation-based fuzzers are specific, they pay off over time: if you're testing many different programs, you need just the one fuzzer and it can test all of them for a given file format or protocol.
However, the startup cost is usually so great that it is often easier to just use mutation-based fuzzing instead. With this technique the tester takes sample data, changes it, and feeds it into the program being tested. The problem is that this technique will not get past security measures (you're pretty much stuck against encrypted traffic), checksums, or anything else calculated over parts of the data.

Luckily, there are two compromises:
-- find out some of the protocol/file type
-- mutate a lot of files

You may think that finding out about the protocol/file type is the same as generation-based fuzzing, but here it is meant to be used alongside mutation-based fuzzing. You may also wonder: if you're already going this far, why not just go and build a generation fuzzer?
This is a valid point; however, this way you need to know a lot less. For example, say you are reverse engineering a proprietary file type and you've spent days trying to find out what a particular section of 10 bytes is for. If you're making a generation fuzzer, you have to keep getting other files and reversing them so that you can generate files correctly, but if you're doing mutation-based fuzzing you can simply mark the section as "fuzz it" or leave it static. The bare minimum you need to know for this type of fuzzing is what stays the same in each sample and what differs; then, for each section, you can say "fuzz it" or "it's static, don't fuzz it". Of course, the further into the protocol/file type you get, the better the results, and there are programs that can try to do this for you (I haven't personally researched them, but there may be a blog post about them at a later date). Another advantage of this approach is that if there is a checksum or hash that needs computing, this technique can handle it. If checksums aren't recalculated, the data will likely just be rejected, effectively losing a potentially huge chunk of the program that cannot be tested with plain mutation-based fuzzing.

Mutating a lot of files basically means you will go through a lot more branches of the code, and you will get much better code coverage (a program could potentially be used to measure this). This technique has been covered well by Charlie Miller here. It will most likely give a good compromise between time spent and problems found. As with the technique above, there is a lot of startup time: collecting samples, getting them into an easy form to fuzz, and the fuzzing itself will take a long time since there are a lot of documents (in the slides above, Charlie Miller says it would take around 3-5 weeks to fuzz using his own code running in parallel on 1-5 systems at a time). However, this is still often shorter than the time it would take to create a generation fuzzer for a proprietary file format or protocol.
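The mutation step itself really is tiny; here's a dumb byte-flipping sketch in Python (file handling trimmed down, and the sample bytes here are just a placeholder):

```python
import random

def mutate(data, num_flips=8):
    # dumb mutation: overwrite a handful of random byte positions
    buf = bytearray(data)
    for _ in range(num_flips):
        buf[random.randrange(len(buf))] = random.randrange(256)
    return bytes(buf)

# normally you'd read one of the collected samples, e.g. open("A.pdf", "rb").read()
sample = b"%PDF-1.4 stand-in sample file contents"
fuzzed = mutate(sample)
print(len(fuzzed) == len(sample))  # True: same length, (usually) different bytes
```

Each fuzzed output then gets fed to the target program while you watch for crashes.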

I have personally spent some time setting up machines for fuzzing the latter way, and here are some problems and solutions I have found so far.
(Note: I have created scripts for many of these jobs. I am not telling you to use them, since in some cases they may possibly go against websites' Terms of Use.)

So I decided I would start off on my desktop by fuzzing PDF software, so I first investigated what software actually uses PDFs. This isn't just Adobe PDF reader: there are different viewers, editors, converters, creators and other things applications can do with PDFs. A decent list (with specific platforms) can be found on Wikipedia, although it isn't complete, and I'm pretty sure it doesn't include any web browser plugins.
So now I have a list of possible software - what am I going to fuzz and what platform am I going to use?
Well, I decided that since I know Linux and Windows best, I would use those (I do have the possibility of testing Mac software, but I don't want to fuck up my Mac, and as far as I know it's still not really possible to put OS X in a VM without hacks).
Since I have a desktop able to stay on all night and day fuzzing (a Core i5 with 12GB of DDR3), I have installed VMware Workstation and will set up a few VMs of each OS, all with exactly the same environment, since Workstation is able to clone virtual machines.

For downloading files from the internet, I created a script that simply downloads them from Google. Since I couldn't be bothered to work out how to walk through each page of a search result, I just put a different number in with the filetype in each search. It isn't the best script ever, and won't find every possible PDF in the results, as I had to compromise between how long I wanted it to take and what strings the regex would find; adding even more possibilities to the regex made the runtime grow rapidly. I believe I found a decent compromise and have put it here. As I noted before, Google may have something against scripting in its Terms of Service, and as such this should not be used. There is also a limit of around 650 results, at which point Google brings up a CAPTCHA (and if you can script your way around a CAPTCHA, you deserve a medal). If you wish to download more, you could try from different IPs (which is how I imagine it detects scripting) or just wait the required time, which I believe to be around 30-60 minutes.

#!/bin/bash
# Usage: pass the filetype to search for, e.g. pdf
if [ $# -ne 1 ]; then
  echo "Usage: $0 <filetype>"
  exit 1
fi

bottomLimit=0
topLimit=64
userAgent="Mozilla/5.0"
num=0

echo "Getting files.."
for ((i=$bottomLimit; i<=$topLimit; i++)); do
  num=$((num+1))
  echo "$num of $topLimit completed"
  # the search URL was stripped from the original post; a plain Google search is assumed here
  wget -q "http://www.google.com/search?q=$i+filetype%3A$1" -U "$userAgent" -O Temp
  egrep -o "http:\/\/(\w*\.)*\w*(\/\w*|\w*\-\w*)*\.$1" Temp >> Temp2
done
rm Temp
sort Temp2 | uniq > $1List.txt
rm Temp2
resultsNum=`wc -l $1List.txt | awk '{print $1}'`
echo "Searching complete! Total of $resultsNum found. Check file $1List.txt for results"
mkdir $1s
cd $1s
echo "Downloading documents to new directory $1s..."

num=0
for i in $(cat ../$1List.txt); do
  num=$((num+1))
  echo "$num of $resultsNum downloaded"
  wget -q -U "$userAgent" "$i"
done

Note as well that the second part, actually downloading the files, was taking a very long time, so I made a script of just that part to run on several different machines, which sped the download up considerably.
This is that script:

peter@LinuxDesktop:~/PDF$ cat
#!/bin/bash
if [ $# -ne 1 ]; then
  echo "Usage: $0 <filetype>"
  exit 1
fi

userAgent="Mozilla/5.0"
num=0
resultsNum=`wc -l $1List.txt | awk '{print $1}'`
cd $1s
echo "Downloading documents to new directory $1s..."

for i in $(cat ../$1List.txt); do
  num=$((num+1))
  echo "$num of $resultsNum downloaded"
  wget -q -U "$userAgent" "$i"
done

This was just used by splitting up the list in the textfile created in the first script and using it on different VMs.

Once this was done, I decided I wanted to give all the files generic names, as I would be making lots of slightly different copies of each file. So for this I created (yet another) script, so that all the files would be named A.pdf, B.pdf, C.pdf .. AA.pdf, AB.pdf....
This was going to take upper- and lower-case names; however, after 2 hours of frustration I remembered that it's a case-insensitive drive (damn Macs!!) and therefore ab.pdf is the same as AB.pdf (on the drive, not in the terminal, before anyone tries to argue with me).
Anyway, this script is really bad and I'm sure there's a better way to increment a letter after Z using mod, but I couldn't figure it out and was tired, so here's my script for this part too:

peter@LinuxDesktop:~/PDF$ cat
import subprocess
from sys import argv, exit

if len(argv)!=3:
  print "Usage: %s <directory> <extension>" % argv[0]
  exit(1)
if argv[1][-1]!='/':
  argv[1]= argv[1]+'/'

# the alphabet A-Z to build the new names from
array = [chr(i) for i in range(65, 91)]

cmd='ls -1 ' + argv[1]
p=subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
files = p.stdout.read().split()
filesLen = len(files)

for i in files:
  index= files.index(i)
  if index%200==0: #just to have some sort of update to user
    print str(index) + " of " + str(filesLen) + " done."
  #want to make sure am only changing files of same type, so will check extension
  if i[-(len(argv[2])):]!=argv[2]:
    print "File with mismatching extension. Continuing over this file. Best to run on directory filled only with the associated files"
    continue

  if index<len(array):
    newFile = array[index] + '.'  + argv[2]
  elif index<(len(array)**2):
    firstLetter = array[(index // len(array)) - 1]
    secondLetter = array[index % len(array)]
    newFile = firstLetter + secondLetter + '.' + argv[2]
  elif index<(len(array)**3):
    firstLetter = array[(index // len(array)**2) - 1]
    secondLetter = array[(index // len(array)) % len(array)]
    thirdLetter = array[index % len(array)]
    newFile = firstLetter + secondLetter + thirdLetter + '.' + argv[2]
  cmd='mv ' + argv[1] + i + ' ' + argv[1] + newFile
  output = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

You simply run the script, stating the directory all the files are in and their extension (note: I later found out this only works on Linux when all the extensions are the same case, e.g. png isn't treated the same as PNG, which I should have caught earlier, but I can't be bothered to change the script now). This will only work for a maximum of 26^3 files (and I think it's so badly written it won't actually manage that many, but I don't care, I'm giving it away on a blog for crying out loud).
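For what it's worth, the cleaner "increment a letter after Z" approach can be done with itertools; a sketch of just the name generation (not a drop-in replacement for the script above):

```python
from itertools import product
from string import ascii_uppercase

def labels(max_len=3):
    # A..Z, then AA..ZZ, then AAA..ZZZ
    for n in range(1, max_len + 1):
        for combo in product(ascii_uppercase, repeat=n):
            yield ''.join(combo)

names = list(labels(2))
print(names[:3])     # ['A', 'B', 'C']
print(names[25:28])  # ['Z', 'AA', 'AB']
```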

And now how to fuzz it?

This I'll leave up to you. I've looked into both the Sulley and Peach fuzzing frameworks. Sulley looks really good, but I think it's Windows-only. Peach is multi-platform but looks as though it may be harder to understand; I would still recommend checking it out if you haven't already, because it looks like it can do just about anything.
If you want, you can fuzz any way you like. You could use the '5 lines of python' that Charlie Miller swears by (which was actually about 20-35 lines by the time I got it into a working Python program -- still fairly small, but that's only for generating the fuzzed files). If going this last route on Windows, one way to test the fuzzed files would be FileFuzz. It's not amazing, but I can't see any reason why you couldn't put a lot of fuzzed files into a directory and tell it to use them. I'm just not sure exactly what FileFuzz checks for, and some trial-and-error testing, one sample at a time, may be needed to get the time per test down properly.

Anyway I hope if you try to do any fuzzing that this gives you some help on the basics, and ways you could go about it.


Sunday 27 March 2011

The power of redirection

URL redirects are becoming less frequent but they are not going to disappear any time soon. You may not think it’s a big deal, but I’m guessing that the majority of people would not know the difference between clicking on<ype=dl_dlnow&pid=11355671&mfgId=50119&merId=50119&pguid=V95vrAoOYJAAABI7QlIAAABo&ttag=tdw_dltext;&

And clicking on<ype=dl_dlnow&pid=11355671&mfgId=50119&merId=50119&pguid=V95vrAoOYJAAABI7QlIAAABo&ttag=tdw_dltext;&destUrl=%68%74%74%70%3a%2f%2f%77%77%77%2e%6c%61%76%61%6d%75%6e%6b%79%2e%63%6f%6d
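The only difference between the two links is the destUrl parameter, which is just a percent-encoded URL. Decoding it (a one-liner in Python) shows where the second link really goes:

```python
try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote        # Python 2

encoded = "%68%74%74%70%3a%2f%2f%77%77%77%2e%6c%61%76%61%6d%75%6e%6b%79%2e%63%6f%6d"
print(unquote(encoded))  # http://www.lavamunky.com
```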

It’s surprising that spammers have not made more use of this method. If a mail server checks the content of emails for links to blacklisted websites, a redirect could get around the check, especially when obfuscated as above.

This is a hacker’s dream. People go onto a legitimate website that they’ve used hundreds of times before, they click on a link for, say Windows 7 SP1 (as is the one above), and a link that should take them to the Microsoft website instead takes them to a malicious website that contains malware that uses vulnerabilities fixed with SP1 (unlike the one above, promise ;) ).

Especially in recent times, with such huge catastrophes happening around the world, people are a bit wary but still want to give money. This is a huge opportunity for attackers, so large websites, especially foundations taking people’s donations, need to make sure that there aren’t open redirects on their sites.

To show the ease of taking a normal page, finding redirects and exploiting them I have written this tool in Python:

(Note: This tool is for education purposes only. The author does not take any responsibility for any damage caused and does not condone this being used for illegal purposes.)

from sys import exit, argv
import re
from optparse import OptionParser

defaultUsage = 'usage: %prog -f <file> [-u <url> [-o]]'
parser = OptionParser(usage=defaultUsage)
parser.add_option('-f', '--file', action="store", type="string", dest="file", help="File for searching for redirects")
parser.add_option('-u', '--url', action="store", type="string", dest="url", help="A URL to replace the redirect")
parser.add_option('-o', '--obfuscate', action="store_true", dest="obfuscate", help="Obfuscate the URL by converting to hex then URL encoding. Note: -u option also needed")
(options, args) = parser.parse_args()

separator = '---------------------------------------------------------------------------\n\n'

if not options.file:
  print defaultUsage
  exit(1)

filename = options.file
file = open(filename, 'r')
text = file.read()

urlPattern = 'http((\:\/\/)|(\%3A\%2F\%2F))\w*[.\w]+[\/\?+\=+\&+\%+\.+\;+\-+\_+\++\w+]*\"'
redir = 'http((\:\/\/)|(\%3A\%2F\%2F))\S*redir\S*\='+urlPattern #won't match all redirects but is good enough for my needs
match = re.findall(r'('+redir+')', text)
if not match:
  print "No redirects found!"
  exit(0)

uniqueMatch = []
for elem in match:
  if elem not in uniqueMatch:
    uniqueMatch.append(elem)

if options.obfuscate and not options.url:
  print "A url is needed with -u in order to obfuscate"
  exit(1)

if options.url:
  url = options.url
  if url[:4]!='http':
    url = 'http://'+url
  if options.obfuscate: #convert to hex then effectively url encode, so A becomes %41 etc
    url = url.encode('hex')
    tempList = list(url)
    i = 0
    j = len(tempList)
    while i < j:
      tempList.insert(i, '%')
      i += 3
      j = len(tempList)
    url = ''.join(tempList)
  for elem in uniqueMatch:
    print elem[0] + '\n\nbecomes:\n\n'
    replaced = re.sub(r'\='+urlPattern, '='+url, elem[0]) #replace the strings
    print replaced+'\n\n'
    print separator #just presents it in an easy to read way
else:
  for elem in uniqueMatch:
    print elem[0] + '\n'

redirects = len(uniqueMatch)
if redirects!=0:
  print str(redirects) + ' redirects found\n' #tells you how many found for good measure

I originally wanted to create a proxy server, which would find all the redirects as I surfed the Internet, but I wanted something I could create in a couple of hours.
This program takes in a file such as the source code from a webpage with the -f option and prints out the redirects. If you specify -u you can specify a URL you want changed into the redirect, and -o to then obfuscate this.

To test this out you can use the source from the web page:

As you can see from the URL, this came from searching for windows 7 sp1 on CNET’s website, and the redirect at the top of this post came from that page.

There seem to be quite a few redirects that require the user to log in first, but this doesn’t fix the problem: in a targeted attack, the attacker will simply use a website the target probably has a login for.

Either way redirects can be very dangerous, and shouldn’t be a problem that gets put off.

Friday 18 March 2011

Compilers: Constructing an SLR(1) parser table & parsing a string

Compilers is a very difficult subject and this isn't a dirt basics course. Before reading this you should probably learn about grammars and how they are used in parsing, and the differences between the different parsers.

Here is just a simple (well, I can't think of a simpler) way to explain how to construct an SLR(1) table from a grammar and then use it to parse a string with the largest-match strategy.

Note: you may have to go over this a few times to understand it, and it's the sort of thing you only get good at by practising.
Note 2: I don't care for mathematical notation. Only a tiny fraction of the population understands it, and for everyone else it is confusing/scary. If you want the mathematical treatment of SLR(1) parsers, I suggest using a different source. Wikipedia is a good starting block.

Now after getting formalities out of the way, let’s get down to business.

We will start off with the grammar:

1. S ::= bAS
2. S ::= ab
3. A ::= bA
4. A ::= ϵ

The capital letters are non-terminals and the lower-case letters are called terminals.

For the sake of making it easier to work with, we’ll write it in this way:

S’ -> S
S -> bAS | ab
A -> bA | ϵ

The S’ always goes to S. This is just something that’s done; it gives the parser a single, unambiguous start (and accepting) item, and in the end it’s often better to just do it than to worry about why it’s needed.

So we will go through each position trying to get to the end of the string while doing a closure each time.

We always start off with the first point in, which is always S’ -> S

We do a closure on this. This looks like:

{ (S’ -> .S), (S-> .bAS), (S->.ab) }

What we have done here is gone from S’ to the start position of the right-hand side. Once we have S’ -> .S, we then look and ask: are there any non-terminals after the . (dot)?

And as we can quite clearly see, there is an S after the dot, so we expand on this by recursively doing the same thing: we put in everything that S can go to and look at any non-terminals directly after the dot in those items.

So once we have { (S’ -> .S), (S-> .bAS), (S->.ab) } we can say let’s call this state 0.
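If it helps, the closure computation can be written out in a few lines of Python (my own representation: an item is a (left-hand side, right-hand side, dot position) tuple, and an empty right-hand side stands for ϵ):

```python
# the example grammar from above
GRAMMAR = {
    "S'": [('S',)],
    'S': [('b', 'A', 'S'), ('a', 'b')],
    'A': [('b', 'A'), ()],   # () is the epsilon production
}

def closure(items):
    # keep expanding items with a non-terminal after the dot until nothing changes
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

state0 = closure({("S'", ('S',), 0)})
print(len(state0))  # 3 items: S'->.S, S->.bAS, S->.ab
```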

You then have to go through 0, trying to get to the next position through a terminal or non-terminal. This sounds confusing but it’s really not once you see it, it’s just a case of moving the . across.

So we then do:

0S = { (S’ -> S.) }

we have travelled from 0 along S, and the only item where we can do this is here (note the . is now on the right-hand side of the S).

as you can see above, we can also go along b in S -> .bAS and along a in S -> ab

So we then do these

0a = { (S -> a.b) }

0b = { ( S -> b.AS) }

Now in the first two we did, there are no more non-terminals you can go along.
S’ -> S. can’t go anywhere because it’s finished, and
S -> a.b can only go along b, which is a terminal.

So we now appoint numbers to these as well. So

0S = { (S’ -> S.) } = 1

0a = { (S -> a.b) } = 2

Technically you could also say 0 along A goes to the empty set, but because it’s nothing I don’t see any point since it would take up a lot of room later.

0b = { ( S -> b.AS) } can go along a non-terminal, however. Therefore we do a closure on this.

cl(0b) = { (S -> b.AS), (A -> .bA), (A -> .) } = 3

here again we have started off by first putting in the set 0b, then looking at the next non-terminal and adding everything it can go to.
As you can see, one of the items is A -> . (dot at the end). This is because A goes to ϵ, effectively the nothing character. You never move the dot past epsilon because, it being nothing, you effectively already have; that's probably the best way I can describe it (unfortunately).

As you can see here I’ve called this state 3. This is because whenever you do a closure, you then name it a new state, if the set isn’t the same as one you already have.
This is actually the reason we named 1 and 2 before.
If you have for instance
0a = { (S -> a.b) } – then nothing within the set has a non-terminal after the . which means the closure of this will just be itself, meaning we can just miss it out completely.

Anyway, we have completely evaluated 0, and we now have:

0S = { (S’ -> S.) } = 1

0a = { (S -> a.b) } = 2

cl(0b) = { (S -> b.AS), (A -> .bA), (A -> .) } = 3

State 1 has parsed to the end of the string (the . is right at the end), so we can’t do anything further with this.


2b = { (S -> ab.) } = 4

again there is no non-terminal after the dot, so we give it a new state number, since this set hasn’t appeared before (it isn’t the same as state 2 -- the placement of the . counts).

So now we can’t do anything else we move onto 3.

3A = { (S -> bA.S) }
3b = { (A -> b.A) }

and there’s no 3ϵ since as I said we can’t move past the epsilon.
Everything 3 went to has a non-terminal after the dot, so no new states yet. Again we repeat the process doing the closure of each

cl(3A) = { (S -> bA.S), (S -> .bAS), (S->.ab) } = 5
cl(3b) = { (A->b.A), (A->.bA), (A->.) } = 6

Now this is done we go back to where we last evaluated a state. The last state we evaluated was 3, so we now try 4, but this is at the end so we can’t go any further. So we go onto 5 and 6:

5S = { (S-> bAS.) } = 7
5b = { (S-> b.AS) }
now if you notice, this is actually the same as 0b. This means the closure will be the same, meaning we won’t have to redo it again. Therefore we just say
5b = { (S-> b.AS) } = 3
5a = { (S -> a.b) } = 0a, cl(5a) = 2

6A = { (A -> bA.) } = 8
6b ={ (A ->b.A) } = 3b, cl(6b) = 6

If this looks confusing, don’t worry I’ve literally just done the same thing I said above but in a little bit of shorthand. If you are confused, I suggest you re-read over the couple of paragraphs before.

We now look through every number since the last ones evaluated. So we’ve done 5 and 6. We look at 7 and see the . is in the last position, therefore can’t go any further, and the same holds true for 8, and there is no higher number which means we are done!

Now comes actually building the table.

We have a row for each state (the numbers) and a column for each terminal, $ and each non-terminal.
Now, $ is actually the end-of-string symbol. Reaching it means you have completely parsed the input, and the corresponding entry is called the accepting state.

This means that in the table we have, the state that has the final accepting state, i.e. where S’ -> S. is where you will put ‘acc’ in the table.
So the row of the state, in the $ column.
We also need to go through every state we have and look to see if there are any other points where we have a . in the end position. We then make a rule for them.

All the states we have are:

1 = { (S’ -> S.) }

2 = { (S -> a.b) }

3 = { (S -> b.AS), (A -> .bA), (A -> .) } #rule 1

4 = { (S -> ab.) } #rule 2

5 = { (S -> bA.S), (S -> .bAS), (S->.ab) }

6 = { (A->b.A), (A->.bA), (A->.) } #rule 1 (again)

7 = { (S-> bAS.) } #rule 3

8 = { (A -> bA.) } #rule 4

It doesn’t really matter what numbers you give the rules, just as long as you keep them consistent.
These rules are actually reductions, and we need them in the table.
To calculate where to put these we need to work out the follow sets.
(I’m not going to go over how to work out the follow set, you must look this up yourself)

The follow set is written FW(X), where X is a non-terminal. Follow sets are made up of terminals only, since they describe which terminals can come next after the non-terminal.

FW(S) = {$}
FW(A) = {a, b}
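The post deliberately skips how follow sets are worked out, but as a cross-check, here is a sketch of the standard fixed-point computation for this grammar. The representation and names are my own; the grammar is the one used throughout (S’ -> S, S -> ab | bAS, A -> bA | epsilon):

```python
GRAMMAR = {
    "S'": [["S"]],
    "S": [["a", "b"], ["b", "A", "S"]],
    "A": [["b", "A"], []],           # [] is the epsilon production
}
NONTERMS = set(GRAMMAR)
EPS = "eps"                          # marker meaning "can derive nothing"

def first_of_seq(seq, first):
    """FIRST of a sequence of symbols, given FIRST sets of the non-terminals."""
    out = set()
    for sym in seq:
        if sym not in NONTERMS:      # a terminal starts the sequence
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:    # sym can't vanish, so stop here
            return out
    out.add(EPS)                     # every symbol could derive epsilon
    return out

def compute_follow():
    # First compute FIRST sets, iterating until nothing changes.
    first = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, prods in GRAMMAR.items():
            for rhs in prods:
                add = first_of_seq(rhs, first)
                if not add <= first[lhs]:
                    first[lhs] |= add
                    changed = True
    # Then FOLLOW: whatever can appear directly after a non-terminal.
    follow = {n: set() for n in NONTERMS}
    follow["S'"].add("$")            # end-of-string follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, prods in GRAMMAR.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in NONTERMS:
                        continue
                    rest = first_of_seq(rhs[i + 1:], first)
                    add = rest - {EPS}
                    if EPS in rest:  # the tail can vanish: inherit FOLLOW(lhs)
                        add |= follow[lhs]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow

fw = compute_follow()
# fw["S"] == {"$"} and fw["A"] == {"a", "b"}, matching FW(S) and FW(A) above.
```

Running it reproduces exactly the two sets quoted above, which is a nice sanity check on the hand calculation.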

Therefore we look at a rule, for instance rule 2.
Rule 2 only appears in state 4, so we know which row of the table it will be in. The rule is S -> ab., and the part on the left of the arrow is the part you look up: its follow set tells you which column(s) the rule goes in.
Since FW(S) = {$}, you put r2 in column $, row 4.
And do this for every rule.

We also need to put shifts and gotos in the table.
A shift is for terminals and a goto is for non-terminals.

It’s probably best to explain with an example.

I’ll use state 3, since it’s one of the most complicated (and I hate when people give you the easiest example and expect you can do the hardest):

cl(3A) = { (S -> bA.S), (S -> .bAS), (S->.ab) } = 5
cl(3b) = { (A->b.A), (A->.bA), (A->.) } = 6

For this part we need to know what states you can get to from state 3.

As you can see, from state 3 via A, you can get to 5,
So we put g5 in row 3, column A. (g for goto since A is non-terminal)

And we can see, from state 3 via b, you can get to 6,
Meaning we put s6 in row 3, column b.

We do this for every state and we get the table:
     a     b       $     S    A
0    s2    s3            g1
1                  acc
2          s4
3    r1    s6/r1              g5
4                  r2
5    s2    s3            g7
6    r1    s6/r1              g8
7                  r3
8    r4    r4

Now this is the SLR(1) parse table we’ve been wanting to get!
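To make the table easier to check mechanically, here it is written out as a Python dictionary. The encoding is my own, not part of the method: sN means shift to state N, rN means reduce by rule N, gN means goto state N, acc means accept, and blank cells are simply absent.

```python
# The SLR(1) parse table built above, one dictionary per state (row),
# keyed by the terminal or non-terminal (column).
TABLE = {
    0: {"a": "s2", "b": "s3", "S": "g1"},
    1: {"$": "acc"},
    2: {"b": "s4"},
    3: {"a": "r1", "b": "s6/r1", "A": "g5"},
    4: {"$": "r2"},
    5: {"a": "s2", "b": "s3", "S": "g7"},
    6: {"a": "r1", "b": "s6/r1", "A": "g8"},
    7: {"$": "r3"},
    8: {"a": "r4", "b": "r4"},
}
```

Note the two cells written "s6/r1": those are the shift/reduce conflicts in rows 3 and 6, which is exactly where the shift-over-reduce choice mentioned below comes into play.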

Now actually for the tricky bit (this will probably need going over at least a couple of times).

Parsing a string with our table.
We’re going to be parsing the string bbbab with this parse table and hopefully getting back to the accepting state.

Since SLR(1) parsers are ‘bottom-up’ parsers, they take the string and repeatedly reduce it until they get back to the original accepting state, which means it is a correct string for the grammar.

And this is also where the largest match strategy comes in (trust me, I did mention it at the start). It simply means that if you reach a cell in the table where you could do either a shift or a reduction, you choose the shift.
You have a stack and the input. You’re trying to reduce the stack down to $0 plus the symbol that takes you to the row with the accepting state. So in this case we’re trying to get the stack to $0S1, the input down to the end terminal, $, and the action to accept (acc).

Stack            Input      Actions

$0               bbbab$     s3

So this is where we will always start, with our stack and our input.
The action is obtained by looking at our parse table. You look at the last number on the stack (0 in this case) and the next input, b.
This gives you the action s3.
Shifting means you take the terminal from the input and put it on the stack, with the row number (3 in this case).

And then you keep on doing this.
$0b3             bbab$      s6
$0b3b6           bab$       s6
$0b3b6b6         ab$        r1

Now that we have a different input, a, it means the action is now r1, or rule 1.
This means you look at the row you got this action from, 6 (also the last number on the stack, if you’d forgotten), and then at the rule itself.
Rule 1 is (A -> .). When you do a reduction, you pop the right-hand side of the rule off the stack.
In this case the right-hand side is epsilon, i.e. nothing, so there is nothing to pop.
Once you have done this, you push the left-hand side of the rule onto the stack.
Now you need to find the number that goes with it. The non-terminal on the left-hand side is A, so we look up state 6, column A, where we see g8, meaning we put 8 on the stack after the A.

$0b3b6b6A8       ab$        r4

As you can see above the input did not change. Each time a reduction is done, it does not change the input!

So now we look at r4, which is (A -> bA.).
We look at the right-hand side, bA, and pop it off the stack, state numbers and all, leaving $0b3b6.

We take note of the number now at the top, 6, push the left-hand side A, and look up the number for A in row 6. Again it’s g8, so the stack becomes $0b3b6A8.
And now we carry on:

$0b3b6A8         ab$        r4
$0b3A5           ab$        s2
$0b3A5a2         b$         s4
$0b3A5a2b4       $          r2
$0b3A5S7         $          r3
$0S1             $          acc

And now that we have reached the accepting action, we have finished, and the string has been parsed!
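The whole trace above can be sketched as a small shift-reduce driver. The table encoding, rule encoding and names here are my own; conflicted cells like "s6/r1" are resolved in favour of the shift, exactly as the largest match strategy says:

```python
# SLR(1) parse table from above: sN = shift, rN = reduce, gN = goto, acc = accept.
TABLE = {
    0: {"a": "s2", "b": "s3", "S": "g1"},
    1: {"$": "acc"},
    2: {"b": "s4"},
    3: {"a": "r1", "b": "s6/r1", "A": "g5"},
    4: {"$": "r2"},
    5: {"a": "s2", "b": "s3", "S": "g7"},
    6: {"a": "r1", "b": "s6/r1", "A": "g8"},
    7: {"$": "r3"},
    8: {"a": "r4", "b": "r4"},
}
# Rules, numbered as in the text: rule N -> (left-hand side, length of right-hand side)
RULES = {1: ("A", 0), 2: ("S", 2), 3: ("S", 3), 4: ("A", 2)}

def parse(string):
    tokens = list(string) + ["$"]
    states = [0]                               # the state numbers on the stack
    while True:
        action = TABLE.get(states[-1], {}).get(tokens[0], "")
        action = action.split("/")[0]          # on a shift/reduce conflict, shift
        if action == "acc":
            return True                        # reached the accepting state
        if action.startswith("s"):             # shift: consume input, push state
            states.append(int(action[1:]))
            tokens.pop(0)
        elif action.startswith("r"):           # reduce: pop the RHS, then goto
            lhs, length = RULES[int(action[1:])]
            if length:
                del states[-length:]
            states.append(int(TABLE[states[-1]][lhs][1:]))
        else:
            return False                       # empty cell: not in the grammar

print(parse("bbbab"))   # the worked example above: prints True
print(parse("ba"))      # not derivable from this grammar: prints False
```

Only the state numbers are kept on the stack here; the symbols shown in the hand trace ($0b3b6...) are implicit, since each state already records which symbol was pushed to reach it.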

I fully expect this may need a couple of read-throughs and it still might not make sense. The key is to just take it slowly, being careful to look up at the table and look up the rules when doing reductions.


Saturday 12 February 2011

What makes a great security conference?

A friend recently suggested we start up a security conference, and it has got me thinking a lot about security cons in general.
What makes a really good conference?
Is it the presentations? workshops? free stuff? just being able to hang out with other interesting people? (The drinking? ;-))

There are lots of fundamental questions you need to think about from the get-go:
Who is the conference audience?
-Is it general security? Offensive? Defensive?
Very technical? What sort of subjects would you want talks on?
How many people do you want going? (Roughly)
What area do you want it in?
-City or at least region
How are you going to get the word out to get people excited & want to come?
-blogs? twitter? website? connections? podcasts?
Do you want to make it a free conference? If not, rough cost?
--is free even feasible? Sponsors? Do we have enough money to invest? Favours?
How long do you want the conference to be?
How many presentations do you want?
How are you going to get people to enter the CFP?

I personally feel the best things about a conference come down to 3 things:
getting people to come, a good place for people to hang out & get to know each other, and great presentations.

I'm really liking the idea and challenge of organising and creating a conference (and all the problems that go with that), but I think there are some fundamental ideas that need to be sorted first.
But perhaps the best thing to do is just try to get on with it and not get too distracted, otherwise it may just get put off indefinitely.

Seeing as I want to know what people like most,
what are the key points that make a conference great to you?

Please leave comments for some suggestions


Friday 28 January 2011

Why education on exploitation isn't helping security

To start off, I feel it imperative I give a little background: I'm a student at university in London (don't worry, I know the little value a degree has), and I want to get into infosec.*
And as I want the best chance of getting into the industry, I'm generalising the things I'm learning: networking, exploiting machines on Windows and Linux, different exploits and how they work, web application hacking. But the one thing that bothers me by far the most, out of everything I've read, is that you are nearly never told how to prevent the problem. You are told why something has a vulnerability in it, but not how to stop it occurring.
I know some vulnerabilities are easy to fix, whether that means not using strcpy() and using strncpy() instead, but it isn't always the case. Whenever you read about webapp hacking, all the different problems can supposedly be sorted by input sanitisation, but you are nearly never told how to do this effectively. Reading about it once may make you believe you can easily sort the problem, but you may not consider Unicode encoding, or recursive sanitisation: a program may delete %22 if it occurs in the string, but that could let %%2222 through, since the filter only checks the string once and the removal itself leaves a fresh %22 behind.
In the end, I can only think of 3 reasons why authors and trainers do not write about how to defend against or fix these vulnerabilities:
  • It may be a case by case basis,
  • It isn't what people want to read about. We want to see how to get into somebody's machine, our 'end result'
  • The person writing actually has no idea about how to defend against or fix the vulnerability.

Case by case
Now I can understand this for quite a few vulnerabilities, as it may be a slight coding error, or even a design fault. The first may be specific to the program or project, where they haven't thought about everything that could happen, and the latter can't really be changed unless you take the 'feature' out, or change it so that it cannot be used in that way.
But this isn't to say you couldn't tell people how to protect against the vulnerability. For example, if somebody is willing to write a short C program showing exploitable code, why can't they show the fixed program? The programs often used to show vulnerabilities are very short programs that clearly show the vulnerability, also meaning they are usually very easy to fix and easy to show the differences in code.
I fully understand that not all cases can be shown, but I think just a general helping could go a long way.

Reader doesn't want to know
This reason (presuming it is a valid one) especially annoys me. If you ever read the part of a book on hacking or exploitation that says who the book was written for, it states something like "hackers, people in the information security industry and system administrators wanting to understand how hackers work" — but how is this true for the last group? Somebody on the defensive side wants to know what causes the exploit, and then, more importantly, how to fix it. There's no point finding a vulnerability in your own code, then doing a bad job of fixing the problem and not actually fixing it completely.
The same goes for courses I have taken: they may show you methodology, or information about hacking and vulnerabilities, but at best give only a general explanation of how to fix the problem.
Plus, if you're a pentester and you don't believe fixing the problem is part of your 'end result', then I think you should be in another job, since you are not helping the customer at all, just getting paid to hack into a company's network. The main reason for a company to do a pentest or vulnerability assessment is their ROI (return on investment). Some findings don't need much detail, like specific patches that are needed, but if you're writing "SQL injection possible in form xyz, better input sanitisation needed", that doesn't give them much information. The only way to properly help them is to know how to fix it yourself, because if you give some example code, they implement it, and a similar problem happens, then it is you, not the company who hired you, who is at fault (except maybe they are at fault for hiring you in the first place).

Author doesn't know how to fix the vulnerability

This reason doesn't annoy me, but would really worry me.
I couldn't give a crap who the author is, or who is giving the course: if they don't know how to fix the basic vulnerability or exploit they are showing you, then I don't want to be learning from them, as I don't think they have the right to teach anybody anything. I wouldn't teach somebody how to take a car engine apart if I didn't know how to put it back together, so why should they teach me how to break something without being able to tell me how to fix it?

People in information security are constantly complaining about the same old vulnerabilities in 'crappy' code, but is it any surprise when you often have to go the extra mile to find out how to fix the problems? It just shows that a lot of people don't really care about fixing stuff, because they're not on the programmers' side. They just want to break stuff and be a "1337 h4X0R Wh0 pWNd uR M4cH1N3."

In the end, not everything is bad, but there is a lot of information out there, and most of it doesn't help security.**


p.s. If you know of any good books or courses that go over exploitation & then how to fix the vulnerability, please leave a comment.

*I know getting into the information security industry from university is often not the best way, but I feel the fact I want to get into infosec now, and that I'm willing to spend days researching old and new ways to find vulnerabilities, and trying to discover vulnerabilities myself, whilst also taking the OSCP certification at the same time as my degree shows some of my dedication, and I feel although the better way, it is not necessarily the right way for me to get into infosec.
**It can be debated that finding more vulnerabilities & making the vendor create a patch is helping security, though I am basing this more on a penetration testing/vulnerability assessment view or the view of a vulnerability researcher trying to find vulnerabilities in their own company's programs.

Monday 24 January 2011

Self Improvement working within information security

.....and why it makes it more and more difficult to get into the industry.

Everything in IT is constantly changing, and it is exactly the same within infosec, whether you're working as a penetration tester, a social engineer, a hardware hacker, or even as somebody defending your company's servers against the hordes of people trying to hack in. None of it changes the fact that the field is constantly moving.
But this constant advancement is a double-edged sword: it forces people to constantly strive to learn new things, while at the same time making it difficult, if not impossible, to keep up. And with all this constant change making it hard even for insiders to keep up, how is anybody new supposed to get in on the action?

Since technology is constantly changing, many companies are still running legacy systems, meaning you need to know not only the new exploits and architectures coming in, and the ever-growing attack surface, but also the old systems that still plague offices (and banks and supermarkets and bedrooms and ....) all over the world.
Even if you're not a vulnerability researcher, and are just somebody who dips their toes into a bit of everything in infosec, you may not need to know how things work down to the bit level, but you still need to keep up. You need to keep pushing yourself to learn, and that's something people in information security all have in common: curiosity. They want to keep learning, keep reading blog posts and whitepapers, keep listening to podcasts. They want to know why something works, how it does it, and then how to break it.

But this can also make it very difficult for people trying to get into the industry.
To me, there are 3 kinds of people wanting to get into infosec:
  • People coming straight from university (like myself) who have become really interested in computer security
  • Those coming from another department of IT who want to move into security
  • Those who are neither: often younger people who didn't get to go to university, and who might have a low-level job in IT, or no job at all.

The first category is the student, or somebody younger, who has become really interested in the infosec industry, wants to learn more and more about it, and then, obviously, tries to make it a profession.
The biggest problem here is experience. Since they won't have as much as somebody who has been working with technology day in, day out, it can feel as if there is a huge amount to catch up on before they have even started. They might not have administered lots of different servers over many years, and they won't necessarily know about certain configuration problems, or what certain errors mean. These are mostly small things, but a lot of them add up. Is all this to say they are no good? Of course not; it would be stupid to dismiss somebody just because they haven't been working in IT for years.
The second way into infosec, and I believe the more common one, is to come from somewhere else in IT.
But this brings its own disadvantage: a lot of jobs in IT don't have the same growth and don't move as fast as infosec, so people coming from those areas may not be used to the pace, and the overload of information can come as a shock to the system.
And the last option is somebody who has just become really enthusiastic about security (I'm not including script kiddies who are just going through a phase), but again doesn't have the experience, and may not have the qualifications to get their resume past the HR department. But just because they don't, who is to say they cannot make it? I would rather talk to somebody enthusiastic about security, who sees it as a way of life and wants the constant learning, than a master's student straight out of university who doesn't care about it and just wants a job.

But either way, there is the HUGE problem of having to learn everything that everyone in infosec already knows, while at the same time trying to keep up with everything that is currently coming out, and I think this may be putting people off and keeping new people out of the industry.
Either way, I believe getting into security is really challenging, let alone staying there, and more importantly still wanting to stay in the industry.