Sunday 27 March 2011

The power of redirection

URL redirects are becoming less frequent but they are not going to disappear any time soon. You may not think it’s a big deal, but I’m guessing that the majority of people would not know the difference between clicking on<ype=dl_dlnow&pid=11355671&mfgId=50119&merId=50119&pguid=V95vrAoOYJAAABI7QlIAAABo&ttag=tdw_dltext;&

And clicking on<ype=dl_dlnow&pid=11355671&mfgId=50119&merId=50119&pguid=V95vrAoOYJAAABI7QlIAAABo&ttag=tdw_dltext;&destUrl=%68%74%74%70%3a%2f%2f%77%77%77%2e%6c%61%76%61%6d%75%6e%6b%79%2e%63%6f%6d

It’s surprising that spammers have not considered this method. If a mail server checks the content of emails for links of black listed websites, a redirection could get around this, especially if obfuscated in such a way as above.

This is a hacker’s dream. People go onto a legitimate website that they’ve used hundreds of times before, they click on a link for, say Windows 7 SP1 (as is the one above), and a link that should take them to the Microsoft website instead takes them to a malicious website that contains malware that uses vulnerabilities fixed with SP1 (unlike the one above, promise ;) ).

Especially in recent times, with such huge catastrophes happening around the world, people are a bit wary, and want to give money. This is a huge opportunity for attackers, and as such large websites, especially those that are foundations taking people’s donations need to make sure that there aren’t redirects on their websites.

To show the ease of taking a normal page, finding redirects and exploiting them I have written this tool in Python:

(Note: This tool is for education purposes only. The author does not take any responsibility for any damage caused and does not condone this being used for illegal purposes.)

from sys import exit, argv
import re
from optparse import OptionParser
defaultUsage = 'usage: %prog '
parser = OptionParser(usage=defaultUsage)
parser.add_option('-f', '--file', action="store", type="string", dest="file", help="File for searching for redirects")
parser.add_option('-u', '--url', action="store", type="string", dest="url", help="A URL to replace the redirect, e.g.")
parser.add_option('-o', '--obfuscate', action="store_true", dest="obfuscate", help="Obfuscate a URL by converting to hex then URL encode. Note: -u option also needed.\nE.g. becomes \nhttp%3A%2F%2F%77%77%77%2E%6C%61%76%61%6D%75%6E%6B%79%2E%63%6F%6D%2F")
(options, args) = parser.parse_args()

separator = '---------------------------------------------------------------------------\n\n'

if not (options.file):
  print defaultUsage

filename = options.file
file = open(filename, 'r')
text =

urlPattern = 'http((\:\/\/)|(\%3A\%2F\%2F))\w*[.\w]+[\/\?+\=+\&+\%+\.+\;+\-+\_+\++\w+]*\"'
redir = 'http((\:\/\/)|(\%3A\%2F\%2F))\S*redir\S*\='+urlPattern #won't match all redirects but is good enough for my needs
match = re.findall(r'('+redir+')', text)
if not match:
  print "No redirects found!"
uniqueMatch = []
for elem in match:
  if elem not in uniqueMatch:

if options.obfuscate:
  if not options.url:
    print "A url is needed with -u in order to obfuscate"

if (options.url):
  url = options.url
  if url[:4]!='http':
    url = 'http://'+url
  if options.obfuscate: #convert to hex then effectively url encode, so A becomes %41 etc
    url = url.encode('hex')
    tempList = list(url)
    i = 0
     j = len(tempList)
     while (i < j):        tempList.insert(i, '%')        i+=3        j = len(tempList)      url = ''.join(tempList)    for elem in uniqueMatch:      original = elem[0] + '\n\nbecomes:\n\n'      print original      replaced = re.sub(r'\='+urlPattern, '='+url, elem[0]) #replace the strings
     print replaced+'\n\n'
     print separator #just presents it in a easy to read way
   for elem in uniqueMatch:
     print elem[0] + '\n'

redirects = len(uniqueMatch)
if redirects!=0:
   print str(redirects) + ' redirects found\n' #tells you how many found for good measure

I originally wanted to create a proxy server, which would then find all the redirects as a surfed the Internet, however I wanted something I could create in a couple of hours.
This program takes in a file such as the source code from a webpage with the -f option and prints out the redirects. If you specify -u you can specify a URL you want changed into the redirect, and -o to then obfuscate this.

To test this out you can use the source from the web page:

As you can see from the URL, this came from searching for windows 7 sp1 on cnet’s website, and the redirect at the top of the page came from this page.

There seems to be quite a few redirects which require the user to login first but this doesn’t fix the problem, since if it is a targeted attack, they will use a website that the target probably has a login for.

Either way redirects can be very dangerous, and shouldn’t be a problem that gets put off.

Friday 18 March 2011

Compilers: Constructing an SLR(1) parser table & parsing a string

Compilers is a very difficult subject and this isn't a dirt basics course. Before reading this you should probably learn about grammars and how they are used in parsing, and the differences between the different parsers.

Here is just a simple (well I can't think of a simpler way to put it) way to explain how to construct an SLR(1) table from a grammar and then use this to parse a string using the largest match strategy.

Note: this may take you may have to go over this a few times to understand, and it's not exactly something you'll get good at just by doing it.
Note 2: I don't care for mathematical notation. Only about 0.1% of the population understands it and for everyone else it is confusing/scary. If you want the mathematical information about SLR1 parsers, I suggest using a different source. Wikipedia is a good starting block.

Now after getting formalities out of the way, let’s get down to business.

We will start off with the grammar:

1. S::=bAS 2. S::=ab 3. A::=bA 4. A::= ϵ

The capital letters are non-terminals and the small letters are called terminals.

For the sake of it just looking easier to use we’ll right it in this way:

S’ -> S
S -> bAS | ab
A -> bA | ϵ

The S’ always goes to S. This is just something that’s done. I don’t know the direct reasoning behind it, but in the end it’s often to know better to just do it than to worry why it’s needed.

So we will go through each position trying to get to the end of the string while doing a closure each time.

We always start off with the first point in, which is always S’ -> S

We do a closure on this. This looks like:

{ (S’ -> .S), (S-> .bAS), (S->.ab) }

What we have done here is gone from S’ to the start position of the right hand side. Once we had S’ -> .S, we then have to look and say “Are there and non-terminals after the . (dot).

And as we can quite clearly see, there is an S after the dot, so we expand on this by also recursively doing the same thing. We put in all that S can go to and look at any non-terminals directly after the .

So once we have { (S’ -> .S), (S-> .bAS), (S->.ab) } we can say let’s call this state 0.

You then have to go through 0, trying to get to the next position through a terminal or non-terminal. This sounds confusing but it’s really not once you see it, it’s just a case of moving the . across.

So we then do:

0S = { (S’ -> S.) }

we have travelled 0 along S. And the only point we can do this is here (note the . now on the right hand side of S).

as you can see above, we can also go along b in S -> .bAS and along a in S -> ab

So we then do these

0a = { (S -> a.b) }

0b = { ( S -> b.AS) }

Now in the first two we did, there are no more non-terminals you can go along.
S’ -> S. can’t go anywhere because it’s finished, and
S -> a.b can only go along b, which is a terminal.

So we now appoint numbers to these as well. So

0S = { (S’ -> S.) } = 1

0a = { (S -> a.b) } = 2

Technically you could also say 0 along A goes to the empty set, but because it’s nothing I don’t see any point since it would take up a lot of room later.

0b = { ( S -> b.AS) } can go along a non-terminal, however. Therefore we do a closure on this.

cl(0b) = { (S -> b.AS), (A -> .bA), (A -> .) } = 3

here again we have started off by first putting in the set 0b, then looking at the next non-terminal. We then evaluate this non-terminal, putting everything that can go to.
As you can see A -> . (dot). This is because it goes to ϵ, then nothing character, effectively. Therefore you won’t ever go past epsilon as you already have since it’s nothing, is probably the best way I can describe it (unfortunately).

As you can see here I’ve called this state 3. This is because whenever you do a closure, you then name it a new state, if the set isn’t the same as one you already have.
This is actually the reason we named 1 and 2 before.
If you have for instance
0a = { (S -> a.b) } – then nothing within the set has a non-terminal after the . which means the closure of this will just be itself, meaning we can just miss it out completely.

Anyway, we have completely evaluated 0, and we now have:

0S = { (S’ -> S.) } = 1

0a = { (S -> a.b) } = 2

cl(0b) = { (S -> b.AS), (A -> .bA), (A -> .) } = 3

State 1 has parsed to the end of the string (the . is right at the end), so we can’t do anything further with this.


2b = { (S -> ab.) } = 4

again no non-terminal after the dot, so we give it a new state since this hasn’t appeared before (this isn’t the same as state 2, the placement of the . does count).

So now we can’t do anything else we move onto 3.

3A = { (S -> bA.S) }
3b = { (A -> b.A) }

and there’s no 3ϵ since as I said we can’t move past the epsilon.
Everything 3 went to has a non-terminal after the dot, so no new states yet. Again we repeat the process doing the closure of each

cl(3A) = { (S -> bA.S), (S -> .bAS), (S->.ab) } = 5
cl(3b) = { (A->b.A), (A->.bA), (A->.) } = 6

Now this is done we go back to where we last evaluated a state. The last state we evaluated was 3, so we now try 4, but this is at the end so we can’t go any further. So we go onto 5 and 6:

5S = { (S-> bAS.) } = 7
5b = { (S-> b.AS) }
now if you notice, this is actually the same as 0b. This means the closure will be the same, meaning we won’t have to redo it again. Therefore we just say
5b = { (S-> b.AS) } = 3
5a = { (S -> a.b) } = 0a, cl(5a) = 2

6A = { (A -> bA.) } = 8
6b ={ (A ->b.A) } = 3b, cl(6b) = 6

If this looks confusing, don’t worry I’ve literally just done the same thing I said above but in a little bit of shorthand. If you are confused, I suggest you re-read over the couple of paragraphs before.

We now look through every number since the last ones evaluated. So we’ve done 5 and 6. We look at 7 and see the . is in the last position, therefore can’t go any further, and the same holds true for 8, and there is no higher number which means we are done!

Now comes actually building the table.

We have a row for each state (the numbers) and a column for each terminal, $ and each non-terminal.
Now the $ is actually the end of string symbol. This means that you have absolutely completed anything and is called the accepting state.

This means that in the table we have, the state that has the final accepting state, i.e. where S’ -> S. is where you will put ‘acc’ in the table.
So the row of the state, in the $ column.
We also need to go through every state we have and look to see if there are any other points where we have a . in the end position. We then make a rule for them.

All the states we have are:

1 = { (S’ -> S.) }

2 = { (S -> a.b) }

3 = { (S -> b.AS), (A -> .bA), (A -> .) } #rule 1

4 = { (S -> ab.) } #rule 2

5 = { (S -> bA.S), (S -> .bAS), (S->.ab) }

6 = { (A->b.A), (A->.bA), (A->.) } #rule 1 (again)

7 = { (S-> bAS.) } #rule 3

8 = { (A -> bA.) } #rule 4

It doesn’t really matter what numbers you give the rules, just as long as you make the consistent.
These rules are actually reductions, and we need them in the table.
To calculate where to put these we need to work out the follow sets.
(I’m not going to go over how to work out the follow set, you must look this up yourself)

But the Follow set is written FW(X), where X is a non-terminal, and the follow sets are made up of terminals, since they are effectively finding what terminal will come next after the non-terminal.

FW(S) = {$}
FW(A) = {a, b}

Therefore we look at the rule, for instance rule 2.
Rule 2 only happens in state 4. So we know the row it will be in our table. And the rule is S -> ab. – the part on the left of the rule is the part you’re looking for. The follow set of this is the column where you put the rule.
So you need to put r2 in column $, row 2.
And do this for every rule.

We also need to put shifts and gotos in the table.
A shift is for terminals and a goto is for non-terminals.

It’s probably best to explain with an example.

I’ll use state 3, since it’s one of the most complicated (and I hate when people give you the easiest example and expect you can do that hardest):

cl(3A) = { (S -> bA.S), (S -> .bAS), (S->.ab) } = 5
cl(3b) = { (A->b.A), (A->.bA), (A->.) } = 6

For this part we need to know what states you can get to from state 3.

As you can see, from state 3 via A, you can get to 5,
So we put g5 in row 3, column A. (g for goto since A is non-terminal)

And we can see, from state 3 via b, you can get to 6,
Meaning we put s6 in row 3, column b.

We do this for every state and we get the table:
a b $ S A
0 s2 s3 g1
1 acc
2 s4
3 r1 s6/r1 g5
4 r2
5 s2 s3 g7
6 r1 s6/r1 g8
7 r3
8 r4 r4

Now this is the SLR(1) parse table we’ve been wanting to get!

Now actually for the tricky bit (this will probably need going over at least a couple of times).

Parsing a string with our table.
We’re going to be parsing the string bbbab with this parse table and hopefully getting back to the accepting state.

Since SLR(1) parsers are ‘bottom-up’ parsers, they take the string, and then try to go through the different variations until it can get back to the original accepting state, meaning it is a correct string for the grammar.

And this is also where the largest match strategy comes in (trust me I did mention it at the start). This basically means that if you come to a point in the table where you can do a shift or a reduction, choose the shift.
You actually have a stack, and the input. You’re trying to basically derive the stack down to $0 and the something that will make you go to the row that has the accepting state. So in this case we’re trying to get the stack to $0S1, and the input to the end terminal, $, and the action as accept (acc).

Stack                                        Input                                        Actions

$0                                              bbbab$                                       s3

So this is where we will always start, with our stack and our input.
The action is obtained by looking at our parse table. You look at the last number on the stack (0 in this case) and the next input, b.
This gives you the action s3.
Shifting means you take the terminal from the input and put it on the stack, with the row number (3 in this case).

And then you keep on doing this.
$0b3                                           bbab$                                             s6
$0b3b6                                        bab$                                               s6
$0b3b6b6                                    ab$                                                 r1

Now that we have a different input, a, it means the action is now r1, or rule 1.
This means you look at the row you got this from, 6 (also the last number on the stack if you forgot) and then you have to look at the rule.
Rule 1 is (A -> .). When you do a reduction, you look at the right hand side and pop this off (take it off) the stack.
So in this case the right hand side of the rule is epsilon, so nothing, which means you don’t need to pop anything off of the stack.
Once you have done this you push the left hand side of the rule onto the stack, making
Now you need to find the number that goes here. The non-terminal on the left hand side of the rule is A, so we look up state 6, column A and we can see there is a g8, so you put the number associated with A on the stack as 8.

$0b3b6b6A8                              ab$                                                   r4

As you can see above the input did not change. Each time a reduction is done, it does not change the input!

So now we look at r4, which is (A -> bA.)
We look at the right hand side and we see bA, so we pop this off the stack.

We take a note of the number at the end here, and then we pop on the right hand side
and look up the number for A. For this we have to use the number we took a note of (the number before the A) and find this in the table. Again it’s g8 so we have:


And now we carry on:

$0b3b6A8                                   ab$                                                   r4
$0b3A5                                      ab$                                                    s2
$0b3A5a2                                   b$                                                     s4
$0b3A5a2b4                                $                                                      r2
$0b3A5S7                                   $                                                       r3
$0S1                                           $                                                      acc

And now we have got to the accepting state action, we have finished, and the string has been parsed!

I fully expect this may need a couple of read-throughs and it still might not make sense. The key is to just take it slowly, being careful to look up at the table and look up the rules when doing reductions.