what's mov esi,esi useful for??

Started by bobl, November 27, 2009, 02:09:21 PM


bobl

I downloaded an archive and discovered that the pre-assembled floppy image "fd.img" in it, which works,
was not built from the sources that accompany it. I built "fd.img.using_makefile" from those sources and it does not work
(although the two images are close).

Using vbindiff shows the first differences (highlighted in square brackets)

fd.img                                                                         
0000 0020: 40 0B 00 00 00 00 00 00  00 01 02 12 1B FF [89 F6]  @....... ........
0000 0030: 09 00 00 00 17 00 [40] 00  00 00 [8D B6 00 00] 00 00  ......@. ........

fd.img.using_makefile                                                           
0000 0020: 40 0B 00 00 00 00 00 00  00 01 02 12 1B FF [66 90]  @....... ......f.
0000 0030: 09 00 00 00 17 00 [00] 00  00 00 [66 0F 1F 44] 00 00  ........ ..f..D..

There are differences at addresses 2e, 2f, 36 and 3a-3d.
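
The list of differing addresses can be double-checked with a few lines of Python. This is only a sketch: the two byte strings are the rows at 0x20 and 0x30 copied from the vbindiff dump above, and the helper name `diff_offsets` is made up for illustration.

```python
def diff_offsets(a: bytes, b: bytes) -> list[int]:
    """Return every offset at which the two byte strings differ."""
    n = min(len(a), len(b))
    return [i for i in range(n) if a[i] != b[i]]


# Bytes 0x20-0x3f of each image, copied from the vbindiff dump.
BASE = 0x20
working = bytes.fromhex(
    "40 0B 00 00 00 00 00 00 00 01 02 12 1B FF 89 F6 "
    "09 00 00 00 17 00 40 00 00 00 8D B6 00 00 00 00"
)
rebuilt = bytes.fromhex(
    "40 0B 00 00 00 00 00 00 00 01 02 12 1B FF 66 90 "
    "09 00 00 00 17 00 00 00 00 00 66 0F 1F 44 00 00"
)

# Convert offsets within the excerpt back to addresses in the image.
differing = [hex(BASE + i) for i in diff_offsets(working, rebuilt)]
print(differing)
```

To compare the whole images instead of the excerpt, the two byte strings could be read with `open(path, "rb").read()` and `BASE` set to 0.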

I used ht editor to look at the images and it tells me that, in the image that works, the first two different bytes
represent the instruction "mov esi,esi" or "mov si,si" in 32 or 16 bit mode.

In the image that doesn't work the instruction that "replaces" it is .balign 4

I wonder if "mov esi,esi"  serves any useful purpose or whether these two bytes are data cos their addresses are in the area of a data block
ie

  57 0025 00        command:.byte 0
  58 0026 00        .byte 0# head, drive
  59              cylinder:
  60 0027 00        .byte 0
  61 0028 00        .byte 0# head
  62 0029 01        .byte 1# sector
  63 002a 02        .byte 2# 512 bytes/sector
  64 002b 12        .byte 18# sectors/track
  65 002c 1B        .byte 0x1b# gap
  66 002d FF        .byte 0x0ff
  67 002e 6690      .balign 4                      <===========first address of differences between images


Any thoughts appreciated



dedndave

if it were MOV SI,SI or ESI,ESI, it might be some kind of a 2-byte NOP (NOP is really XCHG AX,AX or EAX,EAX)
it has been replaced with 66 90 - a size override prefix and a NOP - a 2-byte NOP, of sorts
neither one really makes sense - lol
i would say it is data

i don't understand what you are trying to do, though
are you trying to create an assembler listing ?

EDIT - what you may be looking at is a data table for formatting a disk
the F6 may be a "gap fill byte" (usually, they use the media descriptor byte for that)
it has been a while since i messed with the format service
but, i can give you clues to find info about it
acquire a copy of Ralf Brown's Interrupt List
look up the BIOS INT 13h function for "Format a Track"
that should give you the structure definition for that call
http://www.cs.cmu.edu/~ralf/files.html

bobl

Sorry had to nip out to the dentist!

Thx for the advice!

Here's what I'm trying to do...
At this url

http://www.dnd.utwente.nl/~tim/colorforth/bochs/

there's an archive containing executable "fd.img" that runs under bochs
together with what appears to be the sources used to derive it.

Since Charles Moore invented Forth lots of people have written their own.
Decades later Charles Moore attempted to show his reviewed ideas of what Forth should be.
fd.img is that idea modified slightly just to enable it to run under bochs so you can "poke it with a stick"
It's very close to the "tin" indeed.

The problem is that I don't think fd.img was derived from the supplied sources even though they're similar so I'm trying to reverse the executable to get the sources for it.
I'm doing this cos the supplied executable works whereas the one made from the sources doesn't.
Much of the sources are correct though so I'm just looking for the differences and changing them to those in the working executable.
I've created a listing file of the sources supplied ( ie color.s which includes all other asm files) so that I can work out which
bytes and ultimately instructions are different in both image files so I know which instructions/data to replace.
Having worked out which instructions to replace I then need to work out what instructions to replace them with
e.g. .balign 4 with mov esi,esi.

To start with, it might be easier to replace the differing instructions without regard for "sensibleness",
e.g. .balign 4 with mov esi,esi even if it is rubbish or a NOP,
just to get source that builds a byte-for-byte identical executable that runs,
cos it might then be easier to work out what they should be, i.e. with something that runs and can be read.

What do you reckon?
Back in an hour



bobl

Your suggestion is in keeping with the code cos the instruction that differs in both images occurs in the source straight after byte data under a label called "cylinder".  Thx for the advice on where to check it out.

dedndave

well - i have no experience with bochs
i did debug the first 512 bytes with symdeb
it looks as though bochs leaves some data lying around in places that a normal BIOS does not
at that point, you have to be familiar enough with bochs boot sequence to understand what is what
still, my guess is those are data values
sometimes data is an address or a pointer, of course (may be the case in the later discrepancies)

bobl

Thx for trying I'll have a look at symdeb.
In bochs I broke at the first instruction

lb 0x7c00
c

& tried a read watch point on the first byte in question to see how it was used ie

watch read 0x2e
c

but nothing read it so I tried a write watch point

watch write 0x2e
c

and got a couple of "bites" ie

(0) Caught write watch point at 0x0000002c
(0) [0x00007c9a] 0000:7c9a (unk. ctxt): rep movsd dword ptr es:[di], dword ptr ds:[si] ; f366a5

above is just moving the image down to start @ address 0

00003918211e[FDD  ] non DMA mode not fully implemented yet
00003919097i[CPU0 ] [3919097] Caught write watch point
(0) Caught write watch point at 0x0000002e
(0) [0x000001d4] 0008:000001d4 (unk. ctxt): dec ecx                   ; 49           

dec ecx is the first part of "next 0b" in the source, i.e. a jnz follows it (I had to look this up!)

Here's the chunk where bochs broke on the second supposed write to 0x2e.
Note this is the source listing, but the bochs image looks very similar so it's prolly doing the same thing,
and both sit immediately below a block labeled dma: in the sources. So both "stink" of floppy disk reading.

in color.s list file
228 01bb E8B6FFFF read: call seek
228      FF
229 01c0 B0E6      movb $0x0e6, %al# Read normal data
230 01c2 E877FFFF         call transfer
230      FF
231 01c7 66B90048         movw $18 * 2 * 512, %cx
232 01cb E85FFFFF 0:         call ready
232      FF
233 01d0 EC        in %dx, %al
234 01d1 E6E1      outb %al, $0x0e1
235 01d3 AA                stosb                                  (write AL at address ES:(E)DI depending on instruction size)   
236 01d4 4975F4            next 0b      <======== address where write watch point on addr 0x2e stopped
237 01d7 C3        ret



I can't see how "next" could write to 0x2e and so assumed that it must be stosb so I broke just before it and looked at the registers
ie

(0) Breakpoint 2, 0x000001d3 in ?? ()
Next at t=3918498
(0) [0x000001d3] 0008:000001d3 (unk. ctxt): stosb byte ptr es:[edi], al ; aa
<bochs:5> r
eax: 0x000000eb 235
ecx: 0x00004800 18432
edx: 0x000003f5 1013
ebx: 0x00000000 0
esp: 0x0009fff8 655352
ebp: 0xe0000000 -536870912
esi: 0x0009f448 652360
edi: 0x00000000 0
eip: 0x000001d3

I can't see how ES:(E)DI amounts to 0x2e so am at a bit of a loss at this point.
Having said that my hex maths is rubbish.

Is bochs misleading me do you think re writing to 0x2e the second time?
ie I can see how it would write to this address when relocating the image to free space
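
One way to square the register dump with the watch point: the breakpoint at 0x1d3 was hit on the first pass of the loop, when edi was still 0, and stosb moves edi on by one byte per pass. The sketch below works out which pass stores to 0x2e. It assumes ES has a base of 0 and the direction flag is clear (neither is shown in the dump), and the function name is made up for illustration.

```python
def stosb_iteration_for(target: int, edi_start: int, count: int):
    """Return the 1-based loop pass whose stosb writes to `target`, or None.

    Assumes ES has base 0 and the direction flag is clear, so each stosb
    stores one byte at EDI and then increments EDI by 1.
    """
    offset = target - edi_start
    if 0 <= offset < count:
        return offset + 1
    return None


EDI_START = 0x00000000   # edi from the register dump
LOOP_COUNT = 0x4800      # ecx from the register dump (18 * 2 * 512)

iteration = stosb_iteration_for(0x2E, EDI_START, LOOP_COUNT)
last_address = EDI_START + LOOP_COUNT - 1
print(iteration, hex(last_address))
```

Under those assumptions the loop fills 0x0000 to 0x47ff, so 0x2e is written on the 47th pass. The watch point showing dec ecx at 0x1d4 would fit too, if bochs reports the hit once the stosb at 0x1d3 has finished and then displays the next instruction; that part is a guess.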



 




dedndave

well - symdeb is less than ideal for this, as they are booting up with some 32-bit code
bochs lets them do that, i guess
symdeb displays "db 66" for the size override prefix - but i know what it means
some instructions, it does not recognize at all
you may be better off trying to debug it with Olly
start with the first 512 bytes - that is the boot sector, which will tell you how the rest is loaded
it gets loaded at 0000:7C00 and executed - that much is the same as DOS
you will have to account for that when translating addresses and keep an eye on the segment registers
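
a rough sketch of that address translation in Python, for real mode only (segment * 16 + offset). the function names are made up for illustration, and it only applies before the code switches to protected mode:

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Real-mode linear address: segment * 16 + offset."""
    return (segment << 4) + offset


# The boot sector is loaded at 0000:7C00.
BOOT_LOAD_ADDR = real_mode_linear(0x0000, 0x7C00)


def image_offset(segment: int, offset: int) -> int:
    """Offset within the boot sector image for a real-mode address."""
    return real_mode_linear(segment, offset) - BOOT_LOAD_ADDR


# The rep movsd that bochs reported at 0000:7c9a.
print(hex(image_offset(0x0000, 0x7C9A)))
```

so an instruction bochs shows at 0000:7c9a sits 0x9a bytes into the image file.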

MichaelW

Quote: Having worked out which instructions to replace I then need to work out what instructions to replace them with e.g. .balign 4 with mov esi,esi.

.balign is not an instruction, it's a directive that causes the assembler to add padding as necessary to place the next item at the specified alignment. Here is a listing of the NOP sequences that a fairly recent version of GAS uses in a code section (I did not test in a data section, but I would expect the padding there to be zeros):

No-op sequences inserted for .balign, 1 to 15 bytes
For GNU assembler version 2.18.50 (i686-pc-mingw32)

00401001 90                     nop

00401006 6690                   nop

00401009 8D7600                 lea     esi,[esi]

00401014 8D742600               lea     esi,[esi]

0040101B 90                     nop
0040101C 8D742600               lea     esi,[esi]

00401022 8DB600000000           lea     esi,[esi]

00401029 8DB42600000000         lea     esi,[esi]

00401038 90                     nop
00401039 8DB42600000000         lea     esi,[esi]

00401047 89F6                   mov     esi,esi
00401049 8DBC2700000000         lea     edi,[edi]

00401056 8D7600                 lea     esi,[esi]
00401059 8DBC2700000000         lea     edi,[edi]

00401065 8D742600               lea     esi,[esi]
00401069 8DBC2700000000         lea     edi,[edi]

00401074 8DB600000000           lea     esi,[esi]
0040107A 8DBF00000000           lea     edi,[edi]

00401083 8DB600000000           lea     esi,[esi]
00401089 8DBC2700000000         lea     edi,[edi]

00401092 8DB42600000000         lea     esi,[esi]
00401099 8DBC2700000000         lea     edi,[edi]

Unlike the other sequences, this 15-byte sequence
starts with a jump past the nops

004010A1 EB0D                   jmp     loc_004010B0
004010A3 90                     nop
004010A4 90                     nop
004010A5 90                     nop
004010A6 90                     nop
004010A7 90                     nop
004010A8 90                     nop
004010A9 90                     nop
004010AA 90                     nop
004010AB 90                     nop
004010AC 90                     nop
004010AD 90                     nop
004010AE 90                     nop
004010AF 90                     nop
004010B0                    loc_004010B0:
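
The amount of padding .balign has to add follows from the current address alone. Here is a minimal Python sketch of that arithmetic, with the 1 to 7 byte single-instruction fills copied from the listing above. The names `balign_padding`, `KNOWN_FILLS` and `fill_for` are made up for illustration, and the table only reflects the GAS version named in the listing.

```python
def balign_padding(address: int, alignment: int) -> int:
    """Bytes needed to advance `address` to the next multiple of `alignment`."""
    return (-address) % alignment


# Fill bytes copied from the listing, keyed by length in bytes.
# The 5-byte fill in the listing is two instructions (90 + 8D742600), so it is omitted.
KNOWN_FILLS = {
    1: "90",
    2: "6690",
    3: "8D7600",
    4: "8D742600",
    6: "8DB600000000",
    7: "8DB42600000000",
}


def fill_for(address: int, alignment: int):
    """Return (padding length, fill bytes from the table or None)."""
    pad = balign_padding(address, alignment)
    return pad, KNOWN_FILLS.get(pad)


# The .balign 4 at address 0x2e in the color.s listing.
print(fill_for(0x2E, 4))
```

For address 0x2e and an alignment of 4 this gives 2 bytes, which agrees with the 6690 shown next to .balign 4 in the source listing.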


bobl

Given your matching information, i.e.

"if it were MOV SI,SI or ESI,ESI, it might be some kind of a 2-byte NOP"
&
"Here is a listing of the NOP sequences...
00401047 89F6                   mov     esi,esi"

I took 89 F6 to be padding, zeroed the two bytes, and the Forth/OS still boots fine.
I suppose the test will come when I get far enough to save the modified image back to disk, if bochs lets me do this.
Your help is very much appreciated cos this is the first change of my first reverse engineering project and wasn't obvious.
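
For the record, zeroing the two bytes can be sketched in Python like this. The helper name `zero_bytes` is made up, and the demo uses a small stand-in buffer rather than the real fd.img.

```python
def zero_bytes(image: bytes, offset: int, length: int) -> bytes:
    """Return a copy of `image` with `length` bytes at `offset` set to zero."""
    if offset < 0 or offset + length > len(image):
        raise ValueError("patch range lies outside the image")
    patched = bytearray(image)
    patched[offset:offset + length] = bytes(length)
    return bytes(patched)


# Stand-in buffer: zeros up to 0x2e, then the two padding bytes 89 F6.
sample = bytes(0x2E) + bytes.fromhex("89F6") + bytes(2)
patched = zero_bytes(sample, 0x2E, 2)
print(patched[0x2E:0x30].hex())
```

On the real image the bytes would come from `open(path, "rb").read()` and the result would be written to a copy, so the original fd.img stays untouched.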