      ::/ \::::::.
     :/___\:::::::.
    /|    \::::::::.
   :|   _/\:::::::::.
   :| _|\  \::::::::::.                                        Oct/Nov 98
   :::\_____\::::::::::.                                          Issue 1
   ::::::::::::::::::::::.........................................................

          A S S E M B L Y   P R O G R A M M I N G   J O U R N A L
                      http://asmjournal.freeservers.com
                           asmjournal@mailcity.com

   T A B L E   O F   C O N T E N T S
   ----------------------------------------------------------------------
   Introduction...................................................mammon_
   "VGA Programming in Mode 13h".............................Lord Lucifer
   "SMC Techniques: The Basics"...................................mammon_
   "Going Ring0 in Windows 9x".....................................Halvar
   Column: Win32 Assembly Programming
     "The Basics"..............................................Iczelion
     "MessageBox"..............................................Iczelion
   Column: The C standard library in Assembly
     "_itoa, _ltoa and _ultoa"...................................Xbios2
   Column: The Unix World
     "x86 ASM Programming for Linux"............................mammon_
   Column: Issue Solution
     "11-byte Solution"..........................................Xbios2
   ----------------------------------------------------------------------
   +++++++++++++++++++++++Issue Challenge++++++++++++++++++++
     Write a program that displays its command line in 11 bytes
   ----------------------------------------------------------------------

      ::/ \::::::.
     :/___\:::::::.
    /|    \::::::::.
   :|   _/\:::::::::.
   :| _|\  \::::::::::.
   :::\_____\:::::::::::..............................................INTRODUCTION
                                                                   by mammon_

Welcome to the first issue of Assembly Programming Journal. Assembly language
has become the subject of renewed interest for a lot of programmers, in what
must be a backlash to the surge of poor-quality RAD-developed programs (from
Delphi, VB, etc.) released as free/shareware over the past few years.
Assembly language code is tight, fast, and often well-coded -- you tend to
find fewer inexperienced coders writing in assembly language than you do
writing in, say, Visual Basic.

The selection of articles is somewhat eclectic and should demonstrate the
focus of this magazine: i.e., it targets the assembly-language programming
community, not any particular type of coding such as Win32, virus, or demo
programming. As the magazine is newly born and much of its purpose may seem
unclear, I will devote the rest of this column to the most common questions I
have received via email regarding the mag.

How often will an issue be released?
------------------------------------
Barring hazard, an issue will be released every other month.

What types of articles will be accepted?
----------------------------------------
Anything to do with assembly language. Obviously repeats of previously
presented material are not necessary unless they enhance or clarify the
earlier material. The focus will be on Intel x86 instruction sets; however
coding for other processors is acceptable (though out of courtesy it would be
good to point to an x86 emulator for the processor you write on).

Personally I am looking for articles on the areas of assembly language that
interest me: code optimization, demo/graphics programming, virus coding, unix
and other-OS asm coding, and OS-internals. Demos (with source) and quality
ASCII art (for issue covers, column logos, etc.) are especially welcome.

For what level of coding experience is the mag intended?
--------------------------------------------------------
The magazine is intended to appeal to asm coders of all levels. Each issue
will contain mostly beginner and intermediate level code/techniques, as these
will by nature be in the greatest demand; however one of the goals of APJ is
to include enough advanced material to make the magazine appeal to "pros" as
well.

How will the mag be distributed?
--------------------------------
Assembly Programming Journal has its own web page at
http://asmjournal.freeservers.com, which will contain the current issue and
an archive of previous issues. The page also contains a guestbook and a
discussion board for article writers and readers.

An email subscription may be obtained by sending an email to
asmjournal@mailcity.com with the subject "SUBSCRIBE"; starting with the next
issue, Assembly Programming Journal will be emailed to the address you sent
the mail from.

Wrap-up
-------
That's the bulk of the "faq". Enjoy the mag!

      ::/ \::::::.
     :/___\:::::::.
    /|    \::::::::.
   :|   _/\:::::::::.
   :| _|\  \::::::::::.
   :::\_____\:::::::::::...........................................FEATURE.ARTICLE

                        VGA Programming in Mode 13h
                             by Lord Lucifer

This article will describe how to program VGA graphics Mode 13h using
assembly language. Mode 13h is the 320x200x256 graphics mode, and is fast and
very convenient from a programmer's perspective.

The video buffer begins at address A000:0000 and ends at address A000:F9FF.
This means the buffer is 64000 bytes long and that each pixel in mode 13h is
represented by one byte.

It is easy to set up mode 13h and the video buffer in assembly language:

    mov ax,0013h    ; Int 10 - Video BIOS Services
    int 10h         ; ah = 00 - Set Video Mode
                    ; al = 13 - Mode 13h (320x200x256)
    mov ax,0A000h   ; point segment register es to A000h
    mov es,ax       ; we can now access the video buffer as
                    ; offsets from register es

At the end of your program, you will probably want to restore the text mode.
Here's how:

    mov ax,0003h    ; Int 10 - Video BIOS Services
    int 10h         ; ah = 00 - Set Video Mode
                    ; al = 03 - Mode 03h (80x25x16 text)

Accessing a specific pixel in the buffer is also very easy:

    ; bx = x coordinate
    ; ax = y coordinate
    mov dx,320      ; MUL has no immediate form, so load 320 first
    mul dx          ; dx:ax = y*320 (dx is overwritten)
    add ax,bx       ; add the x coord to get the offset
    mov di,ax       ; ax can't be a base register in 16-bit addressing
    mov cl,es:[di]  ; pixel x,y is the byte at es:[di]

Hmm...
That was easy, but that multiplication is slow and we should get rid of it.
That's easy to do too, simply by using bit shifting instead of multiplication.
Shifting a number left by one bit is the same as multiplying it by 2. We want
to multiply by 320, which is not a power of 2, but 320 = 256 + 64, and 256
(2^8) and 64 (2^6) both are. So a faster way to access a pixel is:

    ; bx = x coordinate
    ; ax = y coordinate
    mov di,ax       ; copy the y coord to di
    shl ax,8        ; shift left by 8, which is the same as
                    ; multiplying by 2^8 = 256
    shl di,6        ; now shift left by 6, which is the same as
                    ; multiplying by 2^6 = 64
    add di,ax       ; now add those two together, which is
                    ; effectively multiplying by 320
    add di,bx       ; finally add the x coord to this value
    mov cl,es:[di]  ; now pixel x,y can be accessed as es:[di]

Well, the code is a little bit longer and looks more complicated, but it
avoids the slow MUL instruction, so it is much faster on the processors of
the day. (Shifting by an immediate count other than 1 needs a 186 or later.)

To plot colors, we use a color look-up table. This look-up table is a 768
(3x256) byte array: entry number i begins at byte offset i*3. The 3 bytes of
each entry hold the red, green, and blue components (0-63 each). This gives
64*64*64 = 262144 possible colors. However, since the table has only 256
entries, only 256 different colors can be shown at a given time.

Changing the color palette is accomplished through the I/O ports of the VGA
card:

    Port 03C7h is the Palette Register Read port
    Port 03C8h is the Palette Register Write port
    Port 03C9h is the Palette Data port

Here is how to change the color palette:

    ; al = palette index
    ; bl = red component   (0-63)
    ; cl = green component (0-63)
    ; ch = blue component  (0-63)  (dx is needed for the port number)
    mov dx,03C8h    ; 03C8h = Palette Register Write port
    out dx,al       ; choose index (byte write; "out dx,ax" would
                    ; also send ah to the data port)
    mov dx,03C9h    ; 03C9h = Palette Data port
    mov al,bl
    out dx,al       ; set red value
    mov al,cl
    out dx,al       ; set green value
    mov al,ch
    out dx,al       ; set blue value

That's all there is to it.
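The offset arithmetic and the palette layout above can be checked outside of DOS. The C sketch below is only a model: it assumes plain arrays (the hypothetical `framebuffer` and `palette`) in place of the real video memory at A000:0000 and the VGA DAC, which are reachable only through the assembly shown in this article. It computes y*320 + x with the same two shifts and stores palette entry i at byte offset i*3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: plain arrays instead of the real video buffer at
   A000:0000 and the VGA DAC, which only the real-mode asm can reach. */
enum { SCREEN_W = 320, SCREEN_H = 200 };

static uint8_t framebuffer[SCREEN_W * SCREEN_H]; /* 64000 bytes, 1 per pixel   */
static uint8_t palette[256 * 3];                 /* 768 bytes of R,G,B triples */

/* Offset of pixel (x,y): y*320 + x, computed as (y<<8) + (y<<6) + x. */
static unsigned pixel_offset(unsigned x, unsigned y)
{
    assert(x < SCREEN_W && y < SCREEN_H);
    return (y << 8) + (y << 6) + x;
}

/* Model of the "plot a pixel" routine: one byte per pixel. */
static void put_pixel(unsigned x, unsigned y, uint8_t color)
{
    framebuffer[pixel_offset(x, y)] = color;
}

/* Palette entry i occupies bytes i*3 .. i*3+2; components are 6-bit (0-63). */
static void set_palette_entry(uint8_t index, uint8_t r, uint8_t g, uint8_t b)
{
    assert(r < 64 && g < 64 && b < 64);
    uint8_t *entry = &palette[index * 3];
    entry[0] = r;
    entry[1] = g;
    entry[2] = b;
}
```

For every valid coordinate the result stays below 64000, so it fits in the 16-bit DI register used by the assembly version.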
Reading the color palette is similar:

    ; al = palette index
    ; on return:
    ; bl = red component   (0-63)
    ; cl = green component (0-63)
    ; ch = blue component  (0-63)
    mov dx,03C7h    ; 03C7h = Palette Register Read port
    out dx,al       ; choose index
    mov dx,03C9h    ; 03C9h = Palette Data port
    in al,dx
    mov bl,al       ; get red value
    in al,dx
    mov cl,al       ; get green value
    in al,dx
    mov ch,al       ; get blue value

Now all we need to know is how to plot a pixel of a certain color at a
certain location. It's very easy, given what we already know:

    ; bx = x coordinate
    ; ax = y coordinate
    ; dl = color (0-255)
    mov di,ax       ; copy the y coord to di
    shl ax,8        ; multiply by 2^8 = 256
    shl di,6        ; multiply by 2^6 = 64
    add di,ax       ; y*256 + y*64 = y*320
    add di,bx       ; add the x coord
    mov es:[di],dl  ; write the one-byte color into the video buffer
                    ; that's all there is to it

Ok, we now know how to set up Mode 13h, set up the video buffer, plot a
pixel, and edit the color palette.

My next article will go on to show how to draw lines, utilize the vertical
retrace for smoother rendering, and anything else I can figure out by that
time...

      ::/ \::::::.
     :/___\:::::::.
    /|    \::::::::.
   :|   _/\:::::::::.
   :| _|\  \::::::::::.
   :::\_____\:::::::::::...........................................FEATURE.ARTICLE

                        SMC Techniques: The Basics
                               by mammon_

One of the benefits of coding in assembly language is that you have the
option to be as tricky as you like: the binary gymnastics of viral code
demonstrate this above all else. One of the viral "tricks" that has made its
way into standard protection schemes is SMC: self-modifying code.

In this article I will not be discussing polymorphic viruses or mutation
engines; I will not go into any specific software protection scheme, or cover
any anti-debugger/anti-disassembler tricks, or even touch on the matter of
the PIQ (prefetch instruction queue).
This is intended to be a simple primer on self-modifying code, for those new
to the concept and/or implementation.

Episode 1: Opcode Alteration
----------------------------
One of the purest forms of self-modifying code is to change the value of an
instruction before it is executed...sometimes as the result of a comparison,
and sometimes to hide the code from prying eyes. This technique essentially
has the following pattern:

    mov reg1, code-to-write
    mov [addr-to-write-to], reg1

where 'reg1' would be any register, and where '[addr-to-write-to]' would be a
pointer to the address to be changed. Note that 'code-to-write' would ideally
be an instruction in hexadecimal format, but by placing the code elsewhere in
the program--in an uncalled subroutine, or in a different segment--it is
possible to simply transfer the compiled code from one location to another
via indirect addressing, as follows:

        call changer
        mov dx, offset string   ; this will be performed but ignored
patch_here:
        mov ah, 09              ; this will never be performed
        int 21h                 ; this will exit the program
        ....
changer:
        mov di, offset to_write           ; load address of code-to-write in DI
        mov ax, word ptr cs:[di]          ; x86 has no memory-to-memory MOV, so
        mov word ptr cs:[patch_here], ax  ; copy both bytes (B4 4C) through AX
        ret                               ; return from call
to_write:
        mov ah, 4Ch             ; terminate to DOS function

This small routine will cause the program to exit, though in a disassembler
it at first appears to be a simple print string routine. ('label' is a
reserved word in MASM, hence the name 'patch_here'.) Both 'mov ah, 09' and
'mov ah, 4Ch' start with the same opcode byte, B4h, so the whole 2-byte
instruction has to be copied; copying only the first byte would change
nothing. Note that by combining indirect addressing with loops, entire
subroutines--even programs--can be overwritten, and the code to be
written--which may be stored in the program as data--can be encrypted with a
simple XOR to disguise it from a disassembler.
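To see why the patch has to copy the whole instruction, here is a byte-level model in C of what 'changer' does. It executes nothing: `code`, `to_write` and `patch` are hypothetical names, and the arrays simply hold the 16-bit encodings of the instructions at 'patch_here' (B4h 09h for `mov ah, 09`, CDh 21h for `int 21h`) and at 'to_write' (B4h 4Ch for `mov ah, 4Ch`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the bytes at 'patch_here':
   B4 ib = mov ah, imm8;  CD 21 = int 21h. */
static uint8_t code[] = {
    0xB4, 0x09,   /* patch_here: mov ah, 09h  (print string) */
    0xCD, 0x21,   /*             int 21h                     */
};

/* Hypothetical model of the bytes at 'to_write': mov ah, 4Ch (terminate). */
static const uint8_t to_write[] = { 0xB4, 0x4C };

/* What 'changer' does: copy len bytes of code-to-write over the target. */
static void patch(uint8_t *target, const uint8_t *src, size_t len)
{
    assert(target != NULL && src != NULL);
    memcpy(target, src, len);
}
```

Copying a single byte leaves `mov ah, 09` unchanged because both instructions share the B4h opcode; copying both bytes turns the print call into a DOS terminate call while `int 21h` stays in place.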
The following is a complete asm program to demonstrate patching "live" code;
it asks the user for a password, then changes the string to be printed
depending on whether or not the password is correct:

; smc1.asm ==================================================================
.286
.model small
.stack 200h
.DATA
;buffer for Keyboard Input, formatted for easy reference:
MaxKbLength     db 05h          ; up to 4 characters plus the CR
KbLength        db 00h
KbBuffer        db 5 dup (00h)  ; room for 4 characters plus the CR
;strings: note the password is not encrypted, though it should be...
szGuessIt       db 'Care to guess the super-secret password?',0Dh,0Ah,'$'
szString1       db 'Congratulations! You solved it!',0Dh,0Ah, '$'
szString2       db 'Ah, damn, too bad eh?',0Dh,0Ah,'$'
secret_word     db "this"

.CODE
;===========================================
start:
        mov ax,@data                    ; set segment registers
        mov ds, ax                      ; (ASSUME only informs the assembler;
        mov es, ax                      ;  the registers must be loaded here)
        call Query                      ; prompt user for password
        mov ah, 0Ah                     ; DOS 'Get Keyboard Input' function
        mov dx, offset MaxKbLength      ; start of buffer
        int 21h
        call Compare                    ; compare passwords and patch
exit:
        mov ah,4ch                      ; 'Terminate to DOS' function
        int 21h
;===========================================
Query   proc
        mov dx, offset szGuessIt        ; Prompt string
        mov ah, 09h                     ; 'Display String' function
        int 21h
        ret
Query   endp
;===========================================
Reply   proc
PatchSpot:
        mov dx, offset szString2        ; 'You failed' string
        mov ah, 09h                     ; 'Display String' function
        int 21h
        ret
Reply   endp
;===========================================
Compare proc
        mov cx, 4                       ; # of bytes in password
        mov si, offset KbBuffer         ; start of password-input in Buffer
        mov di, offset secret_word      ; location of real password
        repe cmpsb                      ; compare them
        jne bad_guess                   ; mismatch (ZF clear): do not patch
        mov word ptr cs:PatchSpot[1], offset szString1 ;patch to GoodString
bad_guess:
        call Reply                      ; output string to display result
        ret
Compare endp
end start
; EOF =======================================================================

PatchSpot[1] is the 16-bit operand of 'mov dx, offset szString2', whose
opcode (BAh) is a single byte. Note also that 'rep cmpsb' assembles as REPE,
which stops at the first mismatch; if the mismatch is in the last byte, CX is
0 anyway, so the result has to be taken from the zero flag (JNE) rather than
from CX.

Episode 2: Encryption
---------------------
Encryption is undoubtedly the most common form of SMC code used today. It is
used by packers and exe-encryptors to either compress or hide code, by
viruses to disguise their contents, and by protection schemes to hide data.
The basic format of encryption SMC would be:

    mov reg1, addr-to-write-to
    mov reg2, [reg1]
    manipulate reg2
    mov [reg1], reg2

where 'reg1' would be a register containing the address (offset) of the
location to write to, and 'reg2' would be a temporary register which loads
the contents of the first and then modifies them via rotate (ROL) or logical
(XOR) operations. The address to be patched is stored in reg1, its contents
modified within reg2, and then written back to the original location still
stored in reg1.

The program given in the preceding section can be modified so that it
decrypts the password by overwriting it (so that it remains decrypted until
the program is terminated) by first changing the 'secret_word' value as
follows:

secret_word     db 06Ch, 04Dh, 082h, 0D0h

and then by changing the 'Compare' routine to patch the 'secret_word'
location in the data segment:

;===========================================
magic_key       db 18h, 25h, 0EBh, 0A3h ;not very secure!
Compare proc
;Step 1: Decrypt password
        mov al, [magic_key]             ; put byte1 of XOR mask in al
        mov bl, [secret_word]           ; put byte1 of password in bl
        xor al, bl
        mov byte ptr secret_word, al    ; patch byte1 of password
        mov al, [magic_key+1]           ; put byte2 of XOR mask in al
        mov bl, [secret_word+1]         ; put byte2 of password in bl
        xor al, bl
        mov byte ptr secret_word[1], al ; patch byte2 of password
        mov al, [magic_key+2]           ; put byte3 of XOR mask in al
        mov bl, [secret_word+2]         ; put byte3 of password in bl
        xor al, bl
        mov byte ptr secret_word[2], al ; patch byte3 of password
        mov al, [magic_key+3]           ; put byte4 of XOR mask in al
        mov bl, [secret_word+3]         ; put byte4 of password in bl
        xor al, bl
        mov byte ptr secret_word[3], al ; patch byte4 of password
;Step 2: Compare Passwords...no changes from here
        mov cx, 4
        mov si, offset KbBuffer
        mov di, offset secret_word
        repe cmpsb
        jne bad_guess
        mov word ptr cs:PatchSpot[1], offset szString1
bad_guess:
        call Reply
        ret
Compare endp

Note the addition of the 'magic_key' location, which contains the XOR mask
for the password. This whole thing could have been made more sophisticated
with a loop, but with only four bytes the above shortened debugging time
(and, thereby, article-writing time). Note how the password is loaded, XORed,
and re-written one byte at a time; using 32-bit code, the whole (dword)
password could be loaded, XORed and re-written at once.

Episode 3: Fooling with the stack
---------------------------------
This is a trick I learned while disassembling some of SunTzu's code. What
happens here is pretty interesting: the stack is moved into the code segment
of the program, such that the top of the stack is set to the first address to
be patched (which, BTW, should be the one closest to the end of the program
due to the way the stack works); the byte at this address is then POPed into
a register, manipulated, and PUSHed back to its original location.
The stack pointer (SP) is then decremented so that the next address to be
patched (1 byte lower in memory) is now at the top of the stack. In addition,
the bytes are being XORed with a portion of the program's own code, which
somewhat disguises the actual value of the XOR mask.

In the following code, I chose to use the bytes from Start: (200h when
compiled) up to--but not including--Exit: (214h when compiled; Exit-1 =
213h). However, as with SunTzu's original code, I kept the "reverse" sequence
of the XOR mask, such that byte 213h is the first byte of the XOR mask, and
byte 200h is the last. After some experimentation I found this was the
easiest way to sync a patch program--or a hex editor--to the
stack-manipulative code; since the stack moves backwards (a forward-moving
stack is more trouble than it is worth), using a "reverse" XOR mask allows
both file pointers in a patcher to be INCed or DECed in sync.

Why is this an issue? Unlike the previous two examples, the following does
not contain the encrypted version of the code-to-be-patched. It simply
contains the source code, which when compiled results in the unencrypted
bytes. Left as they are, these would be run through the XOR routine, thereby
scrambled, and then executed (which, if you have followed thus far, you will
see is no good... though it is a fantastic way of crashing the DOS VM!). Once
the program is compiled you must either patch the bytes-to-be-decrypted
manually, or write a patcher to do the job for you. The former is more
expedient; the latter is more certain and is a must if you plan on
maintaining the code.

In the following example I have embedded two CCh bytes (INT 3) at each end of
the bytes-to-be-decrypted section; a patcher need simply search for these,
count the bytes in between, and then XOR them with the bytes between 200h and
213h.

Once again, this sample is a continuation of the previous example.
In it, I have written a routine to decrypt the entire 'Compare' routine of
the previous section by XORing it with the bytes between 'Start' and 'Exit'.
This is accomplished by setting the stack segment equal to the code segment,
then setting the stack pointer equal to the end (highest) address of the code
to be modified. A byte is POPed from the stack (i.e. its original location),
XORed, and PUSHed back to its original location. The next byte is loaded by
decrementing the stack pointer. Once all of the code is decrypted, control is
returned to the newly-decrypted 'Compare' routine and normal execution
resumes. Interrupts are disabled while SS:SP points into the code, since an
interrupt would push its return address over the bytes still waiting to be
decrypted, and the mask is read with a CS: override because in the small
model DS points to the data segment, not to the code holding the mask.

;===========================================
magic_key       db 18h, 25h, 0EBh, 0A3h
Compare proc
        mov cx, offset EndPatch         ;first byte past the area to decrypt
        sub cx, offset patch_pwd        ;cx = # of bytes to decrypt
        mov ax, cs
        mov dx, ss                      ;save stack segment--important!
        mov bx, sp                      ;save stack pointer
        cli                             ;no interrupts while SS:SP is in code
        mov ss, ax                      ;set stack segment to code segment
        mov sp, offset EndPatch-1       ;highest addr-to-write-to
        mov si, offset Exit-1           ;start addr of XOR mask
XorLoop:
        pop ax                          ;get byte-to-patch into AL
        xor al, cs:[si]                 ;XOR AL with XOR mask (in code seg)
        push ax                         ;write byte-to-patch back to memory
        dec sp                          ;load next byte-to-patch
        dec si                          ;load next byte of XOR mask
        cmp si, offset Start            ;end addr of XOR mask
        jae GoLoop                      ;if not at end of mask, keep going
        mov si, offset Exit-1           ;start XOR mask over
GoLoop:
        loop XorLoop                    ;XOR next byte
        mov ss, dx                      ;restore stack segment
        mov sp, bx                      ;restore stack pointer
        sti
        jmp patch_pwd
        db 0CCh,0CCh                    ;Identification mark: START
patch_pwd:                              ;no changes from here
        mov al, [magic_key]
        mov bl, [secret_word]
        xor al, bl
        mov byte ptr secret_word, al
        mov al, [magic_key+1]
        mov bl, [secret_word+1]
        xor al, bl
        mov byte ptr secret_word[1], al
        mov al, [magic_key+2]
        mov bl, [secret_word+2]
        xor al, bl
        mov byte ptr secret_word[2], al
        mov al, [magic_key+3]
        mov bl, [secret_word+3]
        xor al, bl
        mov byte ptr secret_word[3], al
                                        ;compare password
        mov cx, 4
        mov si, offset KbBuffer
        mov di, offset secret_word
        repe cmpsb
        jne bad_guess
        mov word ptr cs:PatchSpot[1], offset szString1
bad_guess:
        call Reply
        ret
Compare endp
EndPatch:
        db 0CCh, 0CCh                   ;Identification Mark: END

This kind of program is very hard to debug. For testing, I substituted
'xor al, cs:[si]' first with 'xor al, 00h', which causes no encryption and is
useful for testing code for final bugs, and then with 'xor al, 0EBh', which
allowed me to verify that the correct bytes were being encrypted (it never
hurts to check, after all).

Episode 4: Summation
--------------------
That should demonstrate the basics of self-modifying code. There are a few
techniques to consider to make development easier, though really any SMC
program will be tricky.

The most important thing is to get your program running completely before
you start overwriting any of its code segments. Next, always create a program
that performs the reverse of any decryption/encryption code--not only does
this speed up compilation and testing by automating the encryption of code
areas that will be decrypted at runtime, it also provides a good tool for
error checking using a disassembler (i.e. encrypt the code, disassemble,
decrypt the code, disassemble, compare). In fact, it is a good idea to
encapsulate the SMC portion of your program in a separate executable and test
it on the compiled "release product" until all of the bugs are out of the
decryption routine, and only then add the decryption routine to your final
code. The CCh 'landmarks' (codemarks?) are extremely useful as well.

Finally, do your debugging with debug.com for DOS applications--the debugger
is quick, small, and if it crashes you simply lose a Windows DOS box. The
ability to view the program address space after the program has terminated
but before it is unloaded is another distinct advantage.

More complex examples of SMC programs can be found in Dark Angel's code, the
Rhince engine, or in any of the permutation engines used in polymorphic
viruses.
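The summation recommends writing the reverse of the runtime decryptor. Here is a minimal C sketch of such a patcher for Episode 3. It assumes the image is already in memory as a byte array, that the mask is the bytes from Start: to Exit-1 copied out of the same compiled file, and that neither the plain nor the encrypted code happens to contain a CCh CCh pair. The names `find_marker`, `xor_reverse` and `patch_between_markers` are hypothetical. As in XorLoop, the highest byte between the marks is paired with the last mask byte, both walk downward, and the mask wraps around. A real tool would also have to translate load offsets into file offsets past the EXE header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the first CC CC pair at or after 'from',
   or -1 if there is none. */
static long find_marker(const uint8_t *buf, size_t len, size_t from)
{
    for (size_t i = from; i + 1 < len; i++)
        if (buf[i] == 0xCC && buf[i + 1] == 0xCC)
            return (long)i;
    return -1;
}

/* XOR buf[lo..hi) from the top down, pairing the highest byte with the LAST
   mask byte and walking the mask backwards with wrap-around, like XorLoop
   (SI starts at Exit-1 and is reset when it drops below Start). */
static void xor_reverse(uint8_t *buf, size_t lo, size_t hi,
                        const uint8_t *mask, size_t mask_len)
{
    assert(mask_len > 0);
    size_t m = mask_len - 1;
    for (size_t i = hi; i > lo; ) {
        --i;
        buf[i] ^= mask[m];
        m = (m == 0) ? mask_len - 1 : m - 1;
    }
}

/* Hypothetical patcher: encrypt (or, run again, decrypt) the bytes between
   the START and END marks. Returns 0 on success, -1 if a mark is missing. */
static int patch_between_markers(uint8_t *buf, size_t len,
                                 const uint8_t *mask, size_t mask_len)
{
    long start = find_marker(buf, len, 0);
    if (start < 0)
        return -1;
    long end = find_marker(buf, len, (size_t)start + 2);
    if (end < 0)
        return -1;
    xor_reverse(buf, (size_t)start + 2, (size_t)end, mask, mask_len);
    return 0;
}
```

Because XOR is its own inverse, running the same function a second time restores the original bytes, which gives the encrypt/disassemble/decrypt/compare check described above. The same arithmetic confirms the Episode 2 key: 6Ch^18h = 74h, 4Dh^25h = 68h, 82h^EBh = 69h and D0h^A3h = 73h, which spells "this".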
Acknowledgements go to Sun-Tzu for the stack technique used in his
ghf-crackme program.

      ::/ \::::::.
     :/___\:::::::.
    /|    \::::::::.
   :|   _/\:::::::::.
   :| _|\  \::::::::::.
   :::\_____\:::::::::::...........................................FEATURE.ARTICLE

                        Going Ring0 in Windows 9x
                             by Halvar Flake

This article gives a short overview of two ways to go Ring0 in Windows 9x in
an undocumented way, exploiting the fact that none of the important system
tables in Win9x are on pages which are protected from low-privilege access.

A basic knowledge of Protected Mode and OS Internals is required; refer to
your Assembly Book for that :-) The techniques presented here are in no way a
good/clean way to get to a higher privilege level, but since they require
only a minimal coding effort, they are sometimes more desirable to implement
than a full-fledged VxD.

1. Introduction
---------------
Under all modern Operating Systems, the CPU runs in protected mode, taking
advantage of the special features of this mode to implement virtual memory,
multitasking etc. To manage access to system-critical resources (and thus to
provide stability) an OS needs privilege levels, so that a program can't just
switch out of protected mode etc. These privilege levels are represented on
the x86 (I refer to x86 meaning 386 and following) CPU by 'Rings', with Ring0
being the most privileged and Ring3 being the least privileged level.
Theoretically, the x86 is capable of 4 privilege levels, but Win32 uses only
two of them, Ring0 as 'Kernel Mode' and Ring3 as 'User Mode'.

Since Ring0 is not needed by 99% of all applications, the only documented way
to use Ring0 routines in Win9x is through VxDs. But VxDs, while being the
only stable and recommended way, are big and take work to write, so in a
couple of specialized situations, other ways to go Ring0 are useful.

The CPU itself handles privilege level transitions in two ways: through
Exceptions/Interrupts and through Callgates.
Callgates can be put in the LDT or GDT; Interrupt Gates are found in the IDT.
We'll take advantage of the fact that these tables can be freely written to
from Ring3 in Win9x (NOT IN NT!).

2. The IDT method
-----------------
If an exception occurs (or is triggered), the CPU looks up the corresponding
descriptor in the IDT. This descriptor gives the CPU an address and segment
to transfer control to. An Interrupt Gate descriptor looks like this:

+--------------------------------+--+-----+--+-----------+-----+---------+
|        1. Offset (16-31)       |P | DPL |0 |  1 1 1 0  |0 0 0|R R R R R|  +4
+--------------------------------+--+-----+--+-----------+-----+---------+
|       2. Segment Selector      |           3. Offset (0-15)            |  +0
+--------------------------------+---------------------------------------+

DPL == Two bits containing the Descriptor Privilege Level
P   == Present bit
R   == Reserved bits

The word at +0 (field 3) contains the low-order word of the 32-bit address of
the exception handler. The word at +6 (field 1) contains the high-order word.
The word at +2 (field 2) is the selector of the segment in which the handler
resides. The word at +4 identifies the descriptor as an Interrupt Gate and
contains its privilege level and the present bit.

Now, to use the IDT to go Ring0, we'll create a new Interrupt Gate which
points to our Ring0 procedure, save an old one and replace it with ours. Then
we'll trigger that exception. Instead of passing control to Windows' own
handler, the CPU will now execute our Ring0 code. As soon as we're done,
we'll restore the old Interrupt Gate.

In Win9x, the selector 0028h always points to a Ring0 code segment, which
spans the entire 4 GB address range. We'll use this as our segment selector.
The DPL has to be 3, as we're calling from Ring3, and the present bit must be
set. So the word at +4 will be 1110111000000000b => EE00h. These values can
be hardcoded into our program; we just have to add the offset of our Ring0
procedure to the descriptor.
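The hardcoded descriptor can be cross-checked by building it bit by bit. This C sketch, with the hypothetical names `struct gate`, `gate_attributes`, `make_interrupt_gate` and `idt_entry_address`, only computes the 8 bytes; it does not touch a real IDT. It assumes the field layout described above (lowest address first) and a 32-bit handler address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 8-byte gate, lowest address first. */
struct gate {
    uint16_t offset_low;   /* +0: handler offset bits 0-15  (field 3) */
    uint16_t selector;     /* +2: code segment selector     (field 2) */
    uint16_t attributes;   /* +4: P, DPL, type                        */
    uint16_t offset_high;  /* +6: handler offset bits 16-31 (field 1) */
};

/* Attribute word: P in bit 15, DPL in bits 13-14, type in bits 8-11. */
static uint16_t gate_attributes(unsigned present, unsigned dpl, unsigned type)
{
    assert(present <= 1 && dpl <= 3 && type <= 0xF);
    return (uint16_t)((present << 15) | (dpl << 13) | (type << 8));
}

/* Present, DPL 3, type 1110b (32-bit interrupt gate). */
static struct gate make_interrupt_gate(uint32_t handler, uint16_t selector)
{
    struct gate g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.attributes  = gate_attributes(1, 3, 0xE);
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}

/* Linear address of the descriptor for a vector: IDT base + 8*vector. */
static uint32_t idt_entry_address(uint32_t idt_base, unsigned vector)
{
    return idt_base + 8u * vector;
}
```

With P=1, DPL=3 and type 1110b the attribute word comes out as EE00h, the value hardcoded in the TASM listing, and vector 9 sits 8*9 = 72 bytes past the IDT base.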
As the exception, you should preferably use one that rarely occurs, so do not
use int 0Eh (the page fault) ;-) I'll use int 9h, since it is (to my
knowledge) not used on 486+.

Example code follows (to be assembled with TASM 5):

-------------------------------- bite here -----------------------------------
.386P
LOCALS
JUMPS
.MODEL FLAT, STDCALL

EXTRN ExitProcess : PROC

.data
IDTR            df 0            ; This will receive the contents of the IDTR
                                ; register
SavedGate       dq 0            ; We save the gate we replace in here
OurGate         dw 0            ; Offset low-order word
                dw 028h         ; Segment selector
                dw 0EE00h       ; P=1, DPL=3, 32-bit interrupt gate
                dw 0            ; Offset high-order word

.code
Start:
        mov eax, offset Ring0Proc
        mov [OurGate], ax               ; Put the offset words
        shr eax, 16                     ; into our descriptor
        mov [OurGate+6], ax

        sidt fword ptr IDTR
        mov ebx, dword ptr [IDTR+2]     ; load IDT Base Address
        add ebx, 8*9                    ; Address of int9 descriptor in ebx

        mov edi, offset SavedGate
        mov esi, ebx
        movsd                           ; Save the old descriptor
        movsd                           ; into SavedGate

        mov edi, ebx
        mov esi, offset OurGate
        movsd                           ; Replace the old handler
        movsd                           ; with our new one

        int 9h                          ; Trigger the exception, thus
                                        ; passing control to our Ring0
                                        ; procedure

        mov edi, ebx
        mov esi, offset SavedGate
        movsd                           ; Restore the old handler
        movsd

        call ExitProcess, LARGE -1

Ring0Proc PROC
        mov eax, CR0
        iretd
Ring0Proc ENDP

end Start
-------------------------------- bite here -----------------------------------

3. The LDT Method
-----------------
Another possibility for executing Ring0 code is to install a so-called
callgate in either the GDT or the LDT. Under Win9x it is a little bit easier
to use the LDT, since the first 16 descriptors in it are always empty, so I
will only give source for that method here.

A Callgate is similar to an Interrupt Gate and is used in order to transfer
control from a low-privileged segment to a high-privileged segment using a
CALL instruction.
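Two more encodings matter for the callgate method: the selector used in the far call and the attribute word of the gate. A hedged C sketch with the hypothetical names `make_selector`, `callgate_attributes` and `descriptor_base` computes them, together with the way the base address of a segment descriptor (such as the LDT's own descriptor in the GDT) is scattered over bytes +2, +3, +4 and +7. It assumes the 386 call gate type 1100b and a selector made of index, table-indicator bit and RPL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: selector = index*8 | TI<<2 | RPL (TI = 1: LDT). */
static uint16_t make_selector(unsigned index, unsigned in_ldt, unsigned rpl)
{
    assert(index < 8192 && in_ldt <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (in_ldt << 2) | rpl);
}

/* Hypothetical helper: word at +4 of a 386 call gate,
   P=1, DPL, type 1100b, dword count in bits 0-4. */
static uint16_t callgate_attributes(unsigned dpl, unsigned dword_count)
{
    assert(dpl <= 3 && dword_count <= 31);
    return (uint16_t)((1u << 15) | (dpl << 13) | (0xCu << 8) | dword_count);
}

/* Hypothetical helper: base of a segment descriptor,
   bits 0-15 at +2/+3, bits 16-23 at +4, bits 24-31 at +7. */
static uint32_t descriptor_base(const uint8_t d[8])
{
    return (uint32_t)d[2]
         | ((uint32_t)d[3] << 8)
         | ((uint32_t)d[4] << 16)
         | ((uint32_t)d[7] << 24);
}
```

Descriptor index 1 in the LDT with RPL 3 gives selector 000Fh, and a present DPL 3 call gate with no copied parameters has the attribute word EC00h.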
The format of a callgate is:

+--------------------------------+--+-----+--+-----------+-----+---------+
|        1. Offset (16-31)       |P | DPL |0 |  1 1 0 0  |0 0 0|   DWC   |  +4
+--------------------------------+--+-----+--+-----------+-----+---------+
|       2. Segment Selector      |           3. Offset (0-15)            |  +0
+--------------------------------+---------------------------------------+

P   == Present bit
DPL == Descriptor Privilege Level
DWC == Dword Count (5 bits), number of arguments copied to the ring0 stack

So all we have to do is to create such a callgate, write it into one of the
first 16 descriptors, then do a far call to that descriptor to execute our
Ring0 code.

Example Code:

-------------------------------- bite here -----------------------------------
.386P
LOCALS
JUMPS
.MODEL FLAT, STDCALL

EXTRN ExitProcess : PROC

.data
GDTR            df 0            ; This will receive the contents of the GDTR
                                ; register
CallPtr         dd 00h          ; As we're using descriptor 1 (offset 8) and
                dw 0Fh          ; it's located in the LDT and the privilege
                                ; level is 3, our selector will be 000Fh.
                                ; That is because the low-order two bits of the
                                ; selector are the privilege level, and the 3rd
                                ; bit is set if the selector is in the LDT.
OurGate         dw 0            ; Offset low-order word
                dw 028h         ; Segment selector
                dw 0EC00h       ; P=1, DPL=3, 386 call gate, DWC=0
                dw 0            ; Offset high-order word

.code
Start:
        mov eax, offset Ring0Proc
        mov [OurGate], ax               ; Put the offset words
        shr eax, 16                     ; into our descriptor
        mov [OurGate+6], ax

        xor eax, eax

        sgdt fword ptr GDTR
        mov ebx, dword ptr [GDTR+2]     ; load GDT Base Address
        sldt ax
        and ax, 0FFF8h                  ; clear the TI/RPL bits of the selector
        add ebx, eax                    ; Address of the LDT descriptor in
                                        ; ebx
        mov al, [ebx+4]                 ; Load the base address
        mov ah, [ebx+7]                 ; of the LDT itself into
        shl eax, 16                     ; eax, refer to your pmode
        mov ax, [ebx+2]                 ; manual for details

        add eax, 8                      ; Skip entry 0; our selector is index 1
        mov edi, eax
        mov esi, offset OurGate
        movsd                           ; Move our custom callgate
        movsd                           ; into the LDT

        call fword ptr [CallPtr]        ; Execute the Ring0 Procedure

        xor eax, eax                    ; Clean up the LDT
        sub edi, 8
        stosd
        stosd

        call ExitProcess, LARGE -1

Ring0Proc PROC
        mov eax, CR0
        retf
Ring0Proc ENDP

end Start
-------------------------------- bite here -----------------------------------

Well, that's all for now folks. This method can easily be changed to use the
GDT instead, which would save a few bytes in case you have to optimize
heavily. Anyway, do use these methods with care: they will NOT run on NT and
are generally not exactly a clean or stable way to do these things.

Credits & Thanks
----------------
The IDT method is taken from the CIH virus & Stone's example source at
http://www.cracking.net. The LDT method was done by me, but without IceMan's
& The_Owl's help I would still be stuck, so all credits go to them.

      ::/ \::::::.
     :/___\:::::::.
    /|    \::::::::.
   :|   _/\:::::::::.
   :| _|\  \::::::::::.
   :::\_____\:::::::::::................................WIN32.ASSEMBLY.PROGRAMMING

                          Win32 ASM: The Basics
                               by Iczelion

The required tools:

-Microsoft Macro Assembler 6.1x : MASM supports Win32 programming starting
 with version 6.1. The latest version is 6.13, which is a patch to the
 previous version, 6.11.
 The Win98 DDK includes MASM 6.11d, which you can download from Microsoft at
 http://www.microsoft.com/hwdev/ddk/download/win98ddk.exe
 But be warned, this monstrosity is huge, 18.5 MB in size. The MASM 6.13
 patch can also be downloaded from
 ftp://ftp.microsoft.com/softlib/mslfiles/ml613.exe
-Microsoft import libraries : You can use the import libraries from Visual
 C++. Some are included in the Win98 DDK.
-Win32 API Reference : You can download it from Borland's site:
 ftp://ftp.borland.com/pub/delphi/techpubs/delphi2/win32.zip

Here's a brief description of the assembly process. MASM 6.1x comes with two
essential tools: ml.exe and link.exe. ml.exe is the assembler. It takes in
the assembly source code (.asm) and produces an object file (.obj). An object
file is an intermediate file between the source code and the executable file.
It needs some address fixups, which are among the services provided by
link.exe. Link.exe turns an object file into an executable file by several
means, such as adding the code from other modules to the object files,
providing the address fixups, adding resources, etc. For example:

    ml /c skeleton.asm  ---> this produces skeleton.obj (/c = assemble only)
    link skeleton.obj   ---> this produces skeleton.exe

The above lines are a simplification, of course. In the real world, you must
add several switches to ml.exe and link.exe to customize your application.
Also there will be several files you must link with the object file in order
to create your application.

Win32 programs run in protected mode, which has been available since the
80286. But the 80286 is now history, so we only have to concern ourselves
with the 80386 and its descendants. Windows runs each Win32 program in a
separate virtual address space. That means each Win32 program has its own
4 GB address space. Each program is alone in its address space. This is in
contrast to the situation in Win16, where all programs can *see* each other.
Not so in Win32. This feature helps reduce the chance of one program writing
over another program's code/data.
The memory model is also drastically different from the old days of the
16-bit world. Under Win32, we need not be concerned with memory models or
segments anymore! There's only one memory model: the flat memory model. There
are no more 64K segments. The memory is a large continuous space of 4 GB. That
also means you don't have to play with segment registers. You can use any
segment register to address any point in the memory space. That's a GREAT
help to programmers. This is what makes Win32 assembly programming as easy as
C.

We will examine a minimal skeleton of a Win32 assembly program. We'll add
more flesh to it later. Here's the skeleton program. If you don't understand
some of the code, don't panic. I'll explain each part later.

.386
.MODEL Flat, STDCALL
.DATA
    ......
.DATA?
    <Your uninitialized data>
    ......
.CONST
    ......
.CODE