In my program I have a block of memory holding byte-sized flags, each either zero or one. There are a number of ways of setting & resetting these flags; here are some methods :-
esi is the pointer to the start of the flags. Setting a flag can be done as follows, where n is the position of the flag.
inc byte ptr [esi+n]
add byte ptr [esi+n],1
mov byte ptr [esi+n],1
or byte ptr [esi+n],1
resetting the flag would be the opposite of this :-
dec byte ptr [esi+n]
sub byte ptr [esi+n],1
mov byte ptr [esi+n],0
and byte ptr [esi+n],0
My query is: which is the better (faster) method? Probably another method would be to treat each flag as a variable with a unique label, but then the same problem would arise as to how to deal with them.
from my experience, you have to try each one with the code that will surround it
i.e. the one that will be fastest may depend on adjacent instructions :P
of all the instructions you show, i would lean toward the MOV's however - they are likely to be fastest
that is because, with the other instructions, the processor has to read and write the memory operand
also, in some cases, calculating the entire address in a single register may or may not help
that is, if you can stick some other instructions between the steps
you have to try it both ways and time it - lol
for example
mov ebx,n
;some other instruction(s)
add ebx,esi
;some other instruction(s)
mov byte ptr [ebx],0
of course, if you have a register that happens to contain a 0, this is best
;al = 0
mov ebx,n
;some other instruction(s)
add ebx,esi
;some other instruction(s)
mov [ebx],al
or
;al = 0
mov [esi+n],al
using a register rather than an immediate is one byte smaller (88 46 nn vs. C6 46 nn 00 for the [esi+n] form with a byte displacement), and may be faster, too
reading your question, i get the impression that each section of code accesses one flag value
if you happen to be accessing the flag several times, it may be advantageous to place it in the stack frame
you can do that by calculating (n - (n mod 4)) and pushing the flag set
mov ebx,n
and bl,0FCh
push dword ptr [esi+ebx]
;then access it with ebp
mov [ebp-m],al
;and store the flag set when done
pop dword ptr [esi+ebx]
that method probably won't help you - but there are cases where it may
on a similar note, i have had cases where placing a dword constant in the stack frame was faster than coding it as an immediate
most modern processors are designed to optimize stack frame references
The MOV is the only instruction listed that only WRITES; the others are READ-MODIFY-WRITE (RMW) type instructions. Most of the slowness is absorbed by the processor with caches and write buffers these days. Add a LOCK prefix to see how slow they are if the RMW operation is performed atomically.
The x86 also has the BTR and BTS instructions to test and reset/set bits.
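A minimal sketch of how those instructions could be used, assuming the flags are packed as bits in a dword rather than stored as bytes (FlagBits, the bit number 5 and the label are made-up names for illustration):

```asm
.data
FlagBits dd 0                 ; hypothetical: 32 flags packed into one dword

.code
    bts FlagBits, 5           ; CF = old value of bit 5, then bit 5 is set
    btr FlagBits, 5           ; CF = old value of bit 5, then bit 5 is cleared
    bt  FlagBits, 5           ; CF = bit 5, memory is not modified
    jc  flag_is_set           ; taken if the flag is 1
    ; ... flag is 0 ...
flag_is_set:
```

Note that BTS and BTR on memory are themselves read-modify-write, so they buy compactness and a free test of the old value rather than speed over a plain MOV.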
Thanks Dave for your comments, you've given me something to think about.
Clive I hadn't thought of using BTR, BTS, I'll have to look into that, check clock cycles etc
Thanks guys :U
MOV's are probably still your best bet :P
Yes, mov immediate seems to be the way to go, but first I'll check if any register contains zero, I have my doubts but still worth a look.
i forgot to mention something...
memory likes to be accessed as 4-aligned dwords
again, if you are accessing the flag several times, and have a register available,
you may want to get 4 flags into, say, EDX at the onset
then access the flag in-register and store the result from EDX when finished
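A rough sketch of that idea, assuming the flag table itself starts on a 4-byte boundary and using flag n = 6 purely as an example (so the aligned offset is 4 and, x86 being little-endian, the flag sits in byte 2 of EDX, bits 16..23):

```asm
    mov  edx, [esi+4]         ; flags 4..7 in one aligned dword read
    or   edx, 00010000h       ; set flag 6   (byte 2 of EDX becomes 1)
    test edx, 00010000h       ; ZF = 0 while flag 6 is set
    ; ... other work, testing/changing the flags in EDX ...
    and  edx, 0FF00FFFFh      ; clear flag 6 (byte 2 of EDX becomes 0)
    mov  [esi+4], edx         ; one aligned dword write when finished
```

Whether this beats four separate byte accesses depends on how often the flags are touched in between, so it still needs timing.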
Yes, that's a good thought Dave, something else to look into. My next problem is to check whether the flags are zero or one, I seem to remember there was rather a long discussion on that very thing not so long ago :toothy
Anyone know the link to the check for zero or one topic? I can't find it. It got so involved that I can't remember what the decision was, that's if there was one.
lol - i think that thread was for in-register checks
but - give us some clues ????
i take it, upon entry, that the flag may be either 0 or 1
do you always clear it on exit ?
I can't remember, all I know is that it was a test for zero or one something along the lines of TEST EAX,EAX.
The flags can be either zero or one on entry, & may or may not be cleared on exit.
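For reference, a byte flag in memory can be checked directly, or loaded and checked in-register along the lines of TEST EAX,EAX. A small sketch (the label names are made up, and this is not necessarily what that other thread settled on):

```asm
    ; direct check in memory
    cmp byte ptr [esi+n], 0   ; ZF = 1 if the flag is 0
    jne flag_is_one
    ; ... flag is 0 ...
    jmp check_done
flag_is_one:
    ; ... flag is 1 ...
check_done:

    ; in-register check
    movzx eax, byte ptr [esi+n]
    test  eax, eax            ; ZF = 1 if EAX is 0
    jz    flag_is_zero
    ; ... flag is 1 ...
flag_is_zero:
```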
that being the case, i would try to get a 4-aligned dword into register
then use Clive's suggestion of using bit test/set instructions on that register
if you don't clear it on exit, you simply do nothing :bg
besides using MOV, that's really all i can think of
Yes, that's probably a good idea, but I'd still like to read that post to see if there was a decision. I've done lots of searches & come up blank, just a matter of finding the correct search parameters. I don't think it was that long ago, but, then again it could have been last year.
If you need no more than 32 different flags, here are two macros from the MasmBasic library:
Flags MACRO pos
bt MbFlags, pos
EXITM <CarrY?>
ENDM
SetFlags MACRO pos, mode
.if mode
bts MbFlags, pos
.else
btr MbFlags, pos
.endif
ENDM
Usage:
include \masm32\MasmBasic\MasmBasic.inc
Init
Whatever = 3
SetFlags Whatever, 1
.if Flags(Whatever)
MsgBox 0, "Flag Whatever set", "Hi", MB_OK
.else
MsgBox 0, "Flag Whatever clear", "Hi", MB_OK
.endif
SetFlags Whatever, 0
.if Flags(Whatever)
MsgBox 0, "Flag Whatever set", "Hi", MB_OK
.else
MsgBox 0, "Flag Whatever clear", "Hi", MB_OK
.endif
Exit
end start
That's interesting jj, but unfortunately there are 100+ flags :(
Quote from: Neil on November 06, 2010, 04:20:06 PM
That's interesting jj, but unfortunately there are 100+ flags :(
Would still work if you access them with immediates:
include \masm32\include\masm32rt.inc
Flags MACRO pos1
LOCAL posMem, posBit
posMem = pos1/32
posBit = pos1 mod 32 ; bit index within that dword (pos1/32-posMem is always 0)
bt MyFlags[posMem*4], posBit
EXITM <CarrY?>
ENDM
SetFlags MACRO pos1, mode
LOCAL posMem, posBit
posMem = pos1/32
posBit = pos1 mod 32 ; bit index within that dword (pos1/32-posMem is always 0)
if mode
bts MyFlags[posMem*4], posBit
else
btr MyFlags[posMem*4], posBit
endif
ENDM
.data?
MyFlags dd 4 dup(?) ; 128 flags
.code
start:
SetFlags 0, 1
SetFlags 99, 1
.if Flags(99)
MsgBox 0, "Flag 99 set", "Hi", MB_OK
.else
MsgBox 0, "Flag 99 clear", "Hi", MB_OK
.endif
SetFlags 99, 0
.if Flags(99)
MsgBox 0, "Flag 99 set", "Hi", MB_OK
.else
MsgBox 0, "Flag 99 clear", "Hi", MB_OK
.endif
exit
end start
To get that to work I would have to change my flag definition from bytes to dwords, but still worth thinking about.
Neil,
you say there are ~100 flags
is it safe to assume there are more than one set of flags ?
i say that because you have ESI holding the base address
how many sets of flags are there ?
Quote from: Neil on November 07, 2010, 11:28:27 AM
To get that to work I would have to change my flag definition from bytes to dwords, but still worth thinking about.
It actually works with bits, and testing is done with
.if Carry? etc, or if you prefer, with the
jc/jnc instructions.
Dave there is only one set of flags. esi is only loaded up when required.
jj I thought that BTR & BTS only worked with words & dwords, or am I getting a bit confused somewhere :dazzled:
Quote from: Neil on November 07, 2010, 12:48:44 PM
jj I thought that BTR & BTS only worked with words & dwords, or am I getting a bit confused somewhere
It is confusing indeed. You test for a bit in a dword register, but it can also be memory - and then a bt MyMem, eax has apparently no limits, i.e. you can define the bit offset in eax freely. If I find time, I will design The Perfect Macro :wink
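To illustrate the register form (a sketch only; MyFlags is the 4-dword array from the earlier example, flag 99 is just a sample value and the label is made up): with a memory operand and a 32-bit register index, the processor uses the dword at MyFlags + 4*(99/32) = MyFlags+12 and works on bit 99 mod 32 = 3 within it.

```asm
    mov eax, 99               ; flag number, can be computed at run time
    bts MyFlags, eax          ; set flag 99
    bt  MyFlags, eax          ; CF = flag 99
    jnc not_set
    btr MyFlags, eax          ; clear flag 99
not_set:
```

The memory-plus-register forms tend to be noticeably slower than the immediate forms on many processors, so this is worth timing before relying on it.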
I look forward to that :bg
if there is only one set of flags, then ESI need not contain the base address - it may be specified as a direct address
instead of...
mov byte ptr [esi+n],0
you can use
mov byte ptr FlagTable[n],0
that simplifies the addressing mode
furthermore, if there are only ~100 flags, use 4-aligned dwords for each one
who cares about a few hundred extra bytes ?
if you want fast....
mov dword ptr FlagTable[4*n],0
(note - this assembles as a single address value)
even better if the 0/1 are in register :bg
if the flag data needs to be saved to a file at some point, reduce it with an algo and expand it on read
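One possible way to do that reduction, as a sketch: pack each dword flag into one bit before saving, and expand again after loading. FlagTable is the name used above; Packed, NUM_FLAGS and the loop labels are made-up names, and 128 flags is an assumption.

```asm
NUM_FLAGS = 128                  ; assumed flag count (multiple of 32)

.data?
FlagTable dd NUM_FLAGS dup(?)    ; one 4-aligned dword per flag, 0 or 1
Packed    dd NUM_FLAGS/32 dup(?) ; one bit per flag, this goes to the file

.code
; reduce: FlagTable -> Packed
    xor ecx, ecx
pack_loop:
    btr Packed, ecx              ; start with the bit clear
    cmp FlagTable[ecx*4], 0
    je  @F
    bts Packed, ecx              ; flag is non-zero, set its bit
@@:
    inc ecx
    cmp ecx, NUM_FLAGS
    jb  pack_loop

; expand: Packed -> FlagTable
    xor ecx, ecx
expand_loop:
    xor  eax, eax                ; clear EAX before BT (XOR changes the flags)
    bt   Packed, ecx             ; CF = stored bit
    setc al                      ; EAX = 0 or 1
    mov  FlagTable[ecx*4], eax
    inc  ecx
    cmp  ecx, NUM_FLAGS
    jb   expand_loop
```

Speed hardly matters here since it only runs around file access, so the simple bit-at-a-time loop is probably good enough.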
I like that one Dave :8) I would never have thought of that method & you're right, a few hundred bytes is neither here nor there.
i think it's faster
at least it was on an 8086, if i remember correctly - lol