Converting binary integers of an arbitrary size to ascii decimal

Started by formalproof, November 18, 2008, 02:19:28 PM

Previous topic - Next topic

formalproof

I'd like to write a procedure to convert a binary integer of arbitrary (huge) size (implemented as an array of DWORDs, each DWORD representing 32 bits of the integer) to ascii decimal. But I cannot figure out a way to do this, even though I've been thinking of it for a long time.

Probably the way to go would be to first "convert" the x highest order bits to ascii decimal, display the result, and then somehow to "convert" the next x bits to ascii decimal so that the result would no longer affect those decimal digits which have already been displayed. I'm not sure if this is possible (I think it should be) as I've been unable to do the math. Or perhaps there is some other way to do it?

Can anyone help? Thank you in advance.

Mark Jones

Hi, if each DWORD entry is 32 binary bits of the final integer, how about starting at the last bit and working backwards towards the most significant bit. Each "set" bit is worth 2n decimal, where the exponent n starts at 0 and increases by one for every bit counted. i.e.,

10001001  :

10001001  = 20 = 1
10001001 = 23 = 8
10001001 = 27 = 128

Total = 1 + 8 + 128 = 137
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

herge

 Hi formalproof:

Try this:

; DW2HEX.ASM Wednesday, November 12, 2008 6:25 AM
; ########################################################################
;
;               This original module was written by f0dder.
;
;      Part of the code has been optimised by Alexander Yackubtchik
;
; ########################################################################
   include \masm32\include\masm32rt.inc

   .386
    option casemap :none   ; case sensitive
   .data
Num    dd 0   ; Binary Number
Buf    dd 0    ; Pointer to Print Buffer
HexOutS db 9    
HexOut  db 32 dup(0)
      db 0
CrLf db 13, 10, 0
PutX dw 0
    db 0    
   .code

; ########################################################################

dw2hex proc source:DWORD, lpBuffer:DWORD
   push esi
   mov edx, lpBuffer
   mov esi, source
   xor eax, eax
   mov ecx, eax
   mov [edx+8], al         ; put terminator at correct length
   mov cl, 7               ; length of hex string - 1
 @@:
   mov eax, esi            ; we're going to work on AL
   and al, 00001111b       ; mask out high nibble
   cmp al, 10
   sbb al, 69h
   das
   mov [edx + ecx], al     ; store the ascii hex(AL) in the string
   shr esi, 4              ; next nibble
   dec cl                  ; decrease counter (one byte less than dec cl :-)
   jns @B                  ; eat them if there's any more
   pop esi
   ret
dw2hex endp
findNotEbx proc uses ecx eax
   mov ecx, 8
@@:    
   mov al, byte ptr [esi]
   inc esi
   cmp al, bl
   jnz @F
   dec cl
   jnz @B
@@:    
   dec esi
   ret
findNotEbx endp
findIsEbx proc uses ecx eax
   mov ecx, 8
@@:    
   mov al, byte ptr [esi]
   inc esi
   cmp al, bl
   jz @F
   dec cl
   jnz @B
@@:    
   dec esi
   ret
findIsEbx endp
trailEax proc    
   push esi
   mov ebx, 0
   call findIsEbx
   mov word ptr [esi], ax
   pop esi
   ret
trailEax endp  
PutAx proc
   push esi
   mov esi, offset PutX
   mov word ptr [esi], ax
   invoke StdOut, esi
   pop esi
   ret
PutAx endp

; #########################################################################
Start:
   mov eax, offset HexOut
   mov Buf, eax
   mov eax, -1
   mov ecx, 23
@@:  
   mov Num, eax  
   push eax
   push ecx
   mov esi, offset HexOut
   invoke dw2hex, eax, esi
   push ebx
   mov ebx, '0'
   call findNotEbx
   mov eax, 'h'
   call trailEax
   pop ebx
   invoke StdOut, esi
   invoke crt__ultoa, Num, Buf, 10
   mov esi, offset HexOutS
   invoke StdOut, esi
   invoke crt__ultoa, Num, Buf, 8
   mov esi, offset HexOutS
   invoke StdOut, esi
   mov ax, 'o'
   call PutAx
   invoke crt__ultoa, Num, Buf, 2
   mov esi, offset HexOutS
   invoke StdOut, esi
   mov ax,'B'
   call PutAx
   invoke StdOut, addr CrLf
   pop ecx
   pop eax
   inc eax
   dec ecx
   jnz @B
   inkey
   exit
end Start


Regards herge
// Herge born  Brussels, Belgium May 22, 1907
// Died March 3, 1983
// Cartoonist of Tintin and Snowy

formalproof

@Mark Jones: Yes, but the problem is how to store that temporary sum result in memory, as the registers are only 32-bit. I already have a procedure which converts one 32-bit integer to ascii decimal, but for binary integers of length n*32 bits, I have no idea what to do.

@Herge: As far as I understand, the code you posted converts bin to ascii hex. What I'd like to do is to convert to ascii decimal. Anyway, if it's possible to convert arbitrary sized binary numbers to ascii hex, then it should definitely be possible to converto ascii dec as well. I probably could try to modify the code you posted to convert it to ascii dec instead. But I'm still learning ASM. Could you explain in principle how the code works?

Thanks for the answers!

qWord

hi,

Quote from: formalproof on November 18, 2008, 02:19:28 PM
Probably the way to go would be to first "convert" the x highest order bits to ascii decimal, display the result, and then somehow to "convert" the next x bits to ascii decimal so that the result would no longer affect those decimal digits which have already been displayed. I'm not sure if this is possible ...
afaik it's not possible - i think you must use the normal "divide by 10" algorithm. So you need to write a division algorithm that return the quotient and remainder.
Here some pseudo-code for a division(unsigned) with arbitrary-size-integers:

// Quotient, Dividend, Divisor and  Remainder are 128bit-integers

   Quotient = Dividend
   Remainder = 0

   for (i=0,i<128,i++)
{
   // Quotient = bits 0-127, Rmainder = bits 128-255
   //        256bit               256bit
   // |----------------|   |----------------|   
      Remainder:Quotient = Remainder:Quotient SHL 1

      if(Remainder >= Divisor)
      {
         Remainder = Remainder - Divisor
         Quotient = Quotient + 1
      }
    }


regards, qWord
FPU in a trice: SmplMath
It's that simple!

FORTRANS

Hi,

   This is cobbled together from pieces of code collected from
the newsgroups, and what I remember from trying it myself long
ago.  Untested, can't find the real thing.  If it doesn't work,
what you will need to search for is "extended precision math".
EDI will end up pointing just before the high digit of the
answer.  Note this is unsigned math if that is important.

HTH,

Steve N.


; - - - Binary to decimal example. - - -
        STD     ; Make strings go backward (that's how the digits come out).
        MOV     EDI,OFFSET DestinationEnd ; Point to last digit of the string.
; - - -
OuterLoop:
        XOR     EDX,EDX ; This requires EDX to start out as zero.
        MOV     ECX,4   ; For this example.
        MOV     EBX,OFFSET BinArray
        MOV     [TestZer],0

; - - - Multiple word division. - - -
InnerLoop:
        MOV     EAX,[EBX]; Get part of binary number.
        DIV     TEN
        MOV     [EBX],EAX; Save remainder.

        TEST    EAX,EAX
        JZ      IL_1
        MOV     [TestZer],1    ; Mark if non-zero.
IL_1:

        SUB     EBX,4    ; Point to next part of binary number.
        LOOP    InnerLoop
; - - -
        MOV     AL,DL    ; Put digit in AL for string operation.
        ADD     AL,'0'   ; Convert to ASCII,
        STOSB            ; and save it.

        TEST    [TestZer],0FFH
        JNZ     OuterLoop; Anything left to do?

; - - - Data area. - - -
BinArray DD     0, 0, 0, 0     ; Fill in with your number.
TEN     DD      10             ; Decimal conversion.
TestZer DB      0              ; See if done.
DestinationArray DB 'A whole bunch of bytes to hold your answer, too lazy to calc.'
DestinationEnd  DB  '0'        ; The last digit of the ASCII number.
; - - - End example. - - -

KeepingRealBusy

Divide the big number by 10^9, saving the quotient and saving the remainder in another array, repeat until the quotient is 0. This gives you an array of unsigned integers, each < 10^9. Divide each of the remainders by 10 9 times to get 9 remainders, adding a '0' to each to convert to characters. The array of unsigned integers is in little endian format (first unsigned integer is the lowest), as are the characters derived from the unsigned integers.

I'v got the code somewhere (this is not the first time this question has come up), but not this forum, bur it is "fun" to roll your own, isn't it?

Dave.

RuiLoureiro

Quote from: KeepingRealBusy on November 18, 2008, 08:21:14 PM
Divide the big number by 10^9,
Hi Dave,

Sorry, but i have some doubts

When you say «Divide the big number by 10^9»

1st: whats a big number ?
     can be a big number like this defined by «var» ?

     666666665555555544444444333333332222222211111111h
     
var    dd 11111111h
        dd 22222222h
        dd 33333333h
        dd 44444444h
        dd 55555555h
        dd 66666666h

       
2nd: Why 10^9 = 1000000000 = 3b9aca00h ?
      And why not by a 64 bits value 10^N (or a 128 bits value...)

3rd: How to do it ?

Thanks
Rui

KeepingRealBusy

Rui,

A big number is not expressed in decimal. It is a binary number. If you have a maximum sized unsigned binary number in a DWORD, then it would be expressed as 0FFFFFFFFh, if it was 64 bits then it would be expressed as 0FFFFFFFFFFFFFFFFh. In keeping with the Intel format of little endian, the most significant DWORD would be highest in memory, the least significant DWORD lowest in memory. A bignum is an arbitrarily long sequence of DWORDS which express a value. I have routines that handle up to 64K bit long numbers (2048 DWORDS). You divide by 10^9 because the remainder is the largest power of 10 that will fit in a DWORD. You could divide by 10 repeatedly, but this would take too long, or 10^2, or any other power of 10. In fact, the general case is that you can convert the bignum to charcters in any base that had a character representation for the unique digits of the number set for that base - ASCII64 uses (./0-9A-Zaz) to have 64 characters (used with PGP and other implementations of Public Private encryption with a need to distribute huge keys).

KeepingRealBusy

formalproof,

I hope you don't want "formal proof" of the following, I really hate working on mathematical "proofs" that always involved some sort of trick.

You proceed just as you would for dividing a decimal number by 10, but your digits are DWORDS. You work from the top down. Start by zeroing edx. Divide the most significant DWORD pair (edx,eax) by 10^9. Save the eax result on top of the input, the remainder is in edx (already where it needs to be, isn't that neat), then pick up the next DWORD in eax and divide again, until all DWORDS are divided. The bignum is now smaller by a bit count of 10^9, or almost. Depending on the two most significant bits of two binary numbers, the result of multiplying the numbers has a bit count equal to the sum of the bit counts of the two numbers, if either number is 10b, then the result will only be the sum - 1. Division is just the opposite, you may shrink the dividend by n, or maybe n-1. 10^9 starts with 3 and is 30 bits long so the dividend will shrink either 30 bits or 29 bits, but will always be right justified in big endian form or left justified in little endian form. The last remainder in edx is saved in another array. Repeat the process until the dividend (bignum) is 0. Now convert the remainders by dividing by 10 nine times and convert the remainders to characters by adding '0'. This is brutal, but works, there are quicker ways to convert a DWORD to decimal. The DWORD count times 9 + 1 will give you the size of the character array to save the decimal string. Start at the end of the string and work in reverse, save a terminating null, start with the first remainder DWORD and calculate each least significant digit and save in front of the prior digit. You will possibly end up with leading zeros in the first 9 characters of the string. There are three ways to handle this. Return a pointer to the whole string. Return a pointer to the first non-zero digit but leave at least one 0 if the value was 0. The final way would be to convert the last DWORD to 9 characters at the front of the string, left justify the string to save it starting with the first non-zero digit, then calculate the count of remaining DWORDS, multiply by 9, and add to the save pointer (now points to the position for the NULL), and calculate the remaining digits as before.

As I have indicated in my prior post, there is no reason why this has to be in decimal. Any base that has a character set representation will work. For a divisor, use the largest base^power that will fit in a DWORD., use the remainders from dividing the remainder DWORDs by the base to index into the character set for the base to get the characters to save. If using ASCII64 remember that the leading zeros are actually '.' characters when skipping by leading zeros.

Simple?

Dave.

RuiLoureiro

Quote from: KeepingRealBusy on November 18, 2008, 09:59:39 PM
A big number is not expressed in decimal. It is a binary number.
Hi Dave,
        Seems i understood the idea.
        After the convertion we should see a decimal number, (if we print it) no ?     
Rui       

KeepingRealBusy

Rui,

Yes, but be careful. A 65K bit number (2048 DWORDS) has 19729 digits, 247 lines of 80 characters. To quickly get the character count, use calculator, enter 2 x^y xxx (where xxx is the power of 2 just beyond the highest DWORD). This will result in a number such as 2.0035299304068464649790723515603e+19728 (where 19728 are the number of zeros following the "2."), thus 19729 digits. I have the this exact number, but I don't wish to paste it in this post - Hutch would probably ban me from the MASM32 forum.

Dave.

formalproof

@KeepingRealBusy: It wasn't simple at all to implement what you said (at least for a noob like me.) But I finally got it working! Thank you so much, and everyone else who responded too!

RuiLoureiro

Quote from: KeepingRealBusy on November 19, 2008, 04:49:50 PM
Rui,
Yes, but be careful.

Hi Dave,
        «Yes, but be careful» 
Quote
        but you said «A big number is not expressed in decimal.It is a binary number.»

        A number is a number. We can see it in binary, decimal, hexa, ...
       
        «To quickly get the character count, use calculator, enter 2 x^y xxx
         (where xxx is the power of 2 just beyond the highest DWORD).»

        the next decimal is 2^65536, N < 0.30103 * 65536, N+1 is de number of decimal digits
        for 65536 bits.       
        For 64 bits, N < 0.30103 * 64  => 20 digits

        I believe you got this number
       
                              2.0035299304068464649790723515603e+19728

        with windows calculator in this way:  2   x^y   65536    =

        It gives:   2^65536 = 2,0035299304068464649790723515603e+19728
        And it is the next decimal number (with one more binary digit)
        (2^65536 - 1 is the last decimal number with 65536 bits).
        Instead of base e you could use base 10 to see the first decimal digit
       
        I dont like to use useless things, and for me a decimal number with 536 bits is
        useless. number of my bank notes, it would be different, Dave ! :U
Rui       


KeepingRealBusy

Rui,

e+19728 does not indicate using the base e (the base of the natural logs), instead it is a scientific notation for 10 raised to a power of 19728.

FYI, the last 8 decimal digits of 2^65536-1 are "19156735". You cannot get these digits by subtracting 1 from a floating representation of 2^65536, you have to do the actual divides.

QuoteI dont like to use useless things, and for me a decimal number with 536 bits is useless.

Whether or not you you think they are useless, they exist. You never mentioned that your conversion would be limited to 64 bits, in fact in a a prior post you were talking about a 6 DWORD sized number which is 192 bits long.

QuoteAfter the convertion we should see a decimal number, (if we print it) no ?

My reply was to caution you about trying to print a big number like 2^65536-1. Again, you never mentioned that your conversion would be limited to 64 bits, in fact in a a prior post you were talking about a 6 DWORD sized number which is 192 bits long.

You also asked:

QuoteAnd why not by a 64 bits value 10^N (or a 128 bits value...)

This could be done, but only if you have a system that currently supports 64 bit registers. I am still running XP in 32 bit mode, so I stick to 32 bit divides. I haven't yet looked into the possibility of using SSE for 128 bit arithmetic, I don't think MMX is suitable for 64 bit multiplies or adds, I may be wrong about this.

Dave