News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Raise x to the y power improvements?

Started by devilhorse, October 30, 2008, 06:05:36 PM

Previous topic - Next topic

devilhorse

I found the following code on Google and converted it to asm. I would like to know if there are any improvements anyone thinks I should make to the procedure.

C++ Version
#include <iostream>
using std::cout;
using std::cin;
using std::endl;

int main()
{
   int base; 
   int exponent;
   int i;
   int power;

   i = 1;
   power = 1;

   cout << "Enter base as an integer: "; 
   cin >> base;

   cout << "Enter exponent as an integer: ";
   cin >> exponent;

   while ( i <= exponent )
   {
      power *= base;
      i++;
   }

   cout << power << endl;
   return 0;
}

My Asm Version

                    ;int base   ;int exponent
RAISE_TO_POWER Proc Base:DWord, Exponent:DWord, szOut:DWord
Local locBase:DWord
Local locExponent:DWord
Local i:DWord ;int i
Local power:DWord ;int power
Local szResult:DWord
Local szprivateBuffer[3]:Byte

Mov i, 1 ;i = 1
Mov power, 1 ;power = 1


Invoke Value, Base ;Enter base as an integer:
Mov locBase, Eax ;cin >> base

Invoke Value, Exponent ;Enter exponent as an integer:
Mov Edi, Eax ;cin >> exponent

Invoke IntMul, locBase, locBase

_While:
Cmp i, Edi ;while ( i <= exponent )
Jng _Raise
Jmp _DoneRaising

_Raise:

Mov Eax, power    ;{ power *= base;
Mul locBase
Mov power, Eax

Add i, 1    ;i++; }
Cmp i, Edi
Jng _While

_DoneRaising:

;cout << power << endl;
Invoke dwtoa, power, Addr szprivateBuffer
Invoke lstrcpy, szOut, Addr szprivateBuffer
   Ret
RAISE_TO_POWER EndP


KeepingRealBusy

A more efficient algo would be the following:

    mov esi POWER
    mov  ebx,BASE
    mov  ecx,1
top:
    test esi,01h
    jz    skip
    mov eax,ecx
    mul  ebx
    mov ecx,eax
skip:
    mov eax,ebx
    mul   ebx
    mov  ebx,eax
    shr   esi,1
    jnz   top
done:         ;result in ecx

Of course, check that you do not have the base = 0, and without too many iterations you will exceed the range of a register (edx will not be 0 any more) and you are then in the area of multiple precision arithmetic.

Dave.


cmpxchg

#2
If 32bit number is the only thing you want then its not interesting at all

  mov  ebx, 17  ;exponent
  mov  esi, 3    ;base
;------------------------
  mov  eax, esi
  xor  edx, edx
  sub  ebx, 2
  je   .1
  jb   .done
.loop:
  mul  esi
  mul  esi
  sub  ebx, 2
  ja   .loop
  jnz  .done
.1:
  mul  esi
.done:
  cmp  ebx, -2
  mov  esi, 1
  cmove eax, esi       


However there are not that many iterations in 32bit range so you may consider simpler to understand approach.

  mov  ebx, 17   ;exponent
  mov  esi, 3    ;base
  ;------------------------
x_power_y:
  mov  eax, esi
  xor  ecx, ecx
  sub  ebx, 1        ;same as "cmp ebx,1" except 1 is subtracted from ebx
  je   .done          ;ebx = 1
  jb   .zero           ;ebx = 0

.loop:
  imul eax, esi      ;eax = eax * esi
  adc  ecx, 0        ;jump out of the loop could be faster
  sub  ebx, 1
  jnz  .loop          ;reached 0?
  test ecx, ecx
  jnz  .overflow     ;...

.done:
  rep ret

.zero:
  xor  eax, eax      ;one byte
  inc  eax             ; smaller   
  jmp  .done     


both results are in EAX

KeepingRealBusy

The only problem with the first and third examples is one of efficiency. Consider a base of 1 and an exponent of 500000000. Both of these examples would result in multiplying 500000000 (5 hundred million) times. My method would only use 13 + 28 multiplies (use calculator and enter 500,000,000 and display in binary) - there are 13 '1' bits and 28 bits total. The top section only multiplies the result by the current power of the base if the bit for that power of the base is represented in the exponent expressed as a binary number. The skip section squares the current power and shifts the exponent down by one bit (to check the next power of the exponent). If the exponent was 1, only two multiplies would be done - 1 by base, and base by base. If the exponent was 2, then three multiplies would be done - base by base, 1 by base^2, and base^2 by base^2. If the exponent was 3 then 4 multiplies would be done - 1 by base, base by base, base by base^2, and  base^2 by base^2. If the shift and test were moved in front of the base by base multiply, then that final base^2n by base^2n could also be eliminated.

Of course, for any base larger than 1, you would quickly exceed the range of a 32 bit register. OBTW, the output field of the result was only specified as a 3 byte local variable so could only hold a 3 digit number with a trailing null.

Dave.

Slugsnack

Perhaps consider recursion by checking whether the exponent is odd.  eg. consider the equation x^y

If y = odd then x*x^(y-1)

Now y is even then you can say the original equation can be represented by x*(x^(y-1)/2)^2.  Simply recurse through a function that checks the odd/even and acts on it then once it hits the base case (ie. when y = 0) then the return is 1 by which time you can 'expand' it out.  This changes the amount of 'effort' of the problem to a logarithmic 2 scale instead of linear.  Of course you might have to do a bit more work to get recursion working effectively ;)

ptoulis

Actually, the algorithm from KeepingRealBusy is near optimal.
It is called repeated squaring and is tremendously faster than the first algorithm.
To compute power(x,n) you need O(log n) calculations.

You can check out the relevant Wikipedia page at http://en.wikipedia.org/wiki/Exponentiating_by_squaring

KeepingRealBusy

I have also considered this for calculating the natural anti-logrithm (ln) i.e. e^xyz.mno but to extreme accuracy (extended precision). First create a table of powers of e up to e^32 by squaring, and fractional powers by sqrt (e^1/2, 1/4, 1/8...) for 32 roots. then express the power as a binary number xyzmno, split the number at the integer bit count, and start the process at e^0 by setting the result to 1.0 then for every shifted bit that matches the integer portion (actually as I expressed it, check for odd) and if a match, then multiply the result by the next power of e, then shift the integer power down and repeat until the power goes to 0, then process the fractional bits the same way, only multiplying by the roots of e, and checking for the most significant bit set, then shifting left for the next fractional power. Expanding a Maclaurin power series converges very slowly if the value is not much less than 1.

Dave.

KeepingRealBusy

And, of course, you always have Booth's algorithm. This was originally designed for additive circuits, but can be extended to the power series. If, for instance, the power in binary is 01011111b, this would require multiplying by the base, base^2,base^4,base^8,base^16,base^64. However, 011111b is 31, so instead, multiply by base^32 then divide by base then multiply by base^64. All you need is a string of one bits (usually > 3 bits because the divide is longer than the multiply) to implement this. A table lookup on 8 bit values of the power could be used to give appropriate sequence of multiplies and divides, using a different table for each byte of the multiplier.

Dave.

KeepingRealBusy

I have just thought of an amazingly easy solution to the divide problem. Have another zeroed power word, but reserved for powers to divide by. Change my scan to shift a bit left and testing for that bit set for each iteration instead of shifting the power right and testing for odd. In addition, don't do the multiplies yet, just look for a one bit in the multiply power that is preceded by a one bit. Add the shifted bit to the divide power and add the shifted bit to the multiply power (this will propagate in the multiply power to remove the preceding one bit and any one bits preceding it and in addition will set the next preceding zero bit). Continue on until all bits in the multiply power have been checked. Then multiply as before to create the base^adjusted_power. Now do the same thing for the divide powers, creating the divisor by multiplying by the base for any matched shifted bit. Finally, divide the result of the multiply power by the result from the divide power. This results in a single divide, and a single multiply for a lone one bit, and only two multiplies for any island of adjacent one bits. In addition, consider the following: if the multiply power is 0111101111000b and divisor power 0b, the multiply power will be adjusted to 0111110000000b and the divide power adjusted to 1000b, then the multiply power will be adjusted to 01000000000000b and the divide power adjusted to 10001000b. This leaves you with only 3 multiplies and one divide. In essence, two islands of adjacent ones separated by a single zero bit are connected, resulting in a single one and two ones in the divisor. Only for isolated 01b iterations (01010101b) would there be a multiply for each bit in the power.

Remember, the multiplies and divides are not a single DWORD by a single DWORD, but possibly many hundreds of multiplies or divides in multiple precision arithmetic.

Can this algorithm  be copyrighted? I have never seen it mentioned before.

Dave.