I'm kinda newish to assembly and I was hoping that someone could help me with this. I made a program to calculate the prime numbers up to 2^32 by division (I know that there are faster methods but the idea was to show a friend how slow this method was - which soon became me trying to optimize it as best as possible). This is the generation procedure, the rest of the code is two api calls to requisition memory via globalalloc and globallock and a procedure for saving the data.
For some reason that I haven't been able to figure out, a divide by zero error takes place within the .REPEAT loop. It happens somewhere between testing 7FFFFFFFh and 8FFFFFFFh for primality. Based on an output file it should take place roughly .5 million primes after testing 7FFFFFFFh. (Output file is 410mb when ending at 7FFFFFFFh, divide by zero takes place at around 413mb of memory usage and primes are always stored in a full DWORD).
Any help, even unrelated to the error - such as how to do something faster / more efficiently or just properly would be appreciated. And if theres a proper methodology on how to debug these sorts of problems (I've tried ollydbg but it slowed it down massively, and by the time it did get to divisor = 0 I couldn't figure out exactly what had caused it).
Quote
exp proc
mov ebx, pMemory ;start initial data entry, pMemory = pointer to memory location
mov [ebx], dword ptr 1h
add ebx, 4h
mov [ebx], dword ptr 2h
add ebx, 4h
mov [ebx], dword ptr 3h
add ebx, 4h
mov [ebx], dword ptr 5h ;end initial data entry
add ebx, 4h
mov writeloc, ebx ;move next possible writeloc into writeloc
push writeloc
FLDCW FPSTATE ;load FP control word with round to 1 for sqrt function
mov esi, 5h ;move last prime into esi (might as well be an immediate in this case)
add pMemory, 4h ;move pointer to dword 2h so that next add moves it to 3h
restart:
mov ebx, pMemory ;return divisor pointer to point at the first divisor, which = 3
cmp esi, 0FFFFFFFFh ;check if prime candidate is last acceptable 2^32 number
je endsave ;if so, end and save output
add esi, 2h ;increment prime candidate
push esi
fild DWORD PTR[esp]
fsqrt ;FP square root,
fistp DWORD PTR[esp]
pop edi
.REPEAT ;check prime loop
add ebx, 4h ;move forward to next divisor
mov ecx, DWORD PTR [ebx] ;move divisor into ecx
mov eax, DWORD PTR esi ;esi = current prime candidate
xor edx, edx ;clear remainder/modulo register
div dword ptr ecx ;divide
cmp edx, 0h ;check if remainder = 0
je restart ;next prime candidate
.UNTIL ecx > edi ;until ecx > sqrt(prime candidate)
isaprime:
pop edx ;writeloc pop'd into ebx
mov [edx], dword ptr ESI ;save checked prime
add edx, 4h
push edx ;push writeloc onto stack
jmp restart
endsave: ;prepares for save
pop edx ;writeloc pop'd into ebx
sub pMemory, 4h ;remove bias
sub edx, 4h
sub edx, pMemory ;writeloc - initial memory pointer = number of bytes written
mov SizeReadWrite, edx
ret
exp endp
My first guess (without checking your code thoroughly) would be that it's caused due to the representation of negative numbers by 2's-complement. Which basically means that 7FFFFFFFh+1 and 'above' are negative.
Anyway, using the fpu is slow, and unnecessary for this. (But at least you're only checking up to the square-root :wink)
A completely integer based method (I wrote this quite a while ago :lol):
prime proc eks:DWORD ;return true if prime, false if not
push esi
mov esi,pList
add esi,4
@@:
mov eax,[esi]
mov ecx,eks
mul eax
cmp eax,ecx
ja @fin ;when all divisors (upto sqrt) have been checked it must be prime
mov eax,ecx
xor edx,edx
div DWORD PTR [esi]
test edx,edx
jz @fail ;if no remainder then is whole divisor - so eks not prime!
add esi,4
jmp @B
@fin:
pop esi
mov eax,TRUE
ret
@fail:
pop esi
mov eax,FALSE
ret
prime endp
pList is a pointer to the list of currently found primes (first element is 2) -- which we use as the list of numbers to divide with (e.g. there's no point checking with any even numbers as dividing by 2 already covers them; you only need to check against any primes up to the sqrt of your candidate.)
I used the FPU square root function because its supposed to be the fastest method. I searched forum topics and found someone else's testing of different square root algorithms and the fastest was using the FP sqrt at about 26 cycles (including moving the data back into the x86 registers). Squaring the divisor seemed like a slower way since MUL is 3 cycles and the amount of divisors will quickly increase past 26/3.
The FPU square root function is fine in this situation and your logic is sound.
The divide by zero I believe will be what Tedd said. However, the DIV instruction is not at fault here (it normally is, since there are different instructions for signed and unsigned). I think the issue is that loading 80000000h into the FPU will translate into a negative number. Taking the square root of a negative number is not possible in the real number set and either the floating point exception thrown here is misinterpreted at a divide by zero or the number zero is returned.
The only workaround AFAIK is to zero-extend the original integer into a QWORD. This will ensure that the number is not loaded as a negative number. (It also sets you up to go further than 2^32 numbers (if you have the time :wink )).
Cheers,
Zooba :U
When loading an integer to the FPU, it will ALWAYS be considered as a SIGNED integer. The FPU will then consider extracting the square root of a negative number as an invalid oeration and replace the value in the FPU register with a NAN (Not-A-Number). When trying to store such a NAN as an integer, this is also considered as an invalid operation and a value of 80000000h will be stored at a dword destination address.
You were then transfering that value into EDI, using it as the limit for increasing ECX. By increasing the pointer into the primes' memory area, you were eventually loading a 0 into ECX and use it as the divisor. :red
Actually, you don't need the square root nor the square to terminate the loop. Simply terminate it when the divisor exceeds the quotient.
;assuming N=the number being tested and ESI=pointer to the prime list
@@:
mov eax,N
xor edx,edx
div dword ptr[esi]
test edx,edx
jz NOT_prime
cmp eax,[esi]
jb IS_prime
add esi,4
jmp @B
Raymond
N.B. Using the FPU without proper knowledge can create problems. As a starter, you may want to have a peak at the following if you are not yet aware of its existence.
http://www.ray.masmcode.com/tutorial/index.html
Thanks for the replies. Looks like I have some reading to do (thanks for the link too).
I've already tried terminating when divisor>n/3, since no even number other than two is a prime and the first divisor is 3 but it was very slow - taking about 4 hours to hit 10mb of memory usage (by comparison its about 8 minutes till 70mb of usage for terminating when divisor>sqrt(n)).