The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: bomz on December 08, 2011, 07:13:49 AM

Title: SSE
Post by: bomz on December 08, 2011, 07:13:49 AM
Can somebody give me really working code for the 128-bit registers? Or better, all the instructions with examples.
http://www.mark.masmcode.com/
Quote:
        mov     ecx,16384           ;write 16384 16-byte values, 16384*16 = 256KB.
                                    ; So we are copying a 256KB array
        mov     esi,offset src_arr  ;pointer to the source array which has to be
                                    ; 16-byte aligned or you will get an exception.
        mov     edi,offset dst_arr  ;pointer to the destination array which has to be
                                    ; 16-byte aligned or you will get an exception.
looper:
        movdqa  xmm0,[esi]          ;SSE2: works on P4 and up (movaps would work on P3)
        movntps [edi],xmm0          ;SSE: works on P3 and up (non-temporal store)
        add     esi,16
        add     edi,16
        dec     ecx
        jnz     looper
        sfence                      ;order the non-temporal stores before later stores
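A note on the alignment requirement in the comments above: MOVDQA, MOVAPS and MOVNTPS fault on memory operands that are not 16-byte aligned, while MOVUPS does not. The C sketch below is a portable model, not SSE code: `is_aligned16` and `copy_blocks16` are hypothetical names, memcpy stands in for the 16-byte load/store pair, and the function returns -1 where the assembly loop would raise an exception.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1 if p meets the 16-byte alignment that MOVDQA/MOVAPS/MOVNTPS require. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper mirroring the loop above: copies `blocks` 16-byte
   blocks from src to dst. Returns -1 (instead of faulting, as the SSE loop
   would) if either pointer is misaligned, 0 on success. */
static int copy_blocks16(void *dst, const void *src, size_t blocks)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (!is_aligned16(dst) || !is_aligned16(src))
        return -1;
    while (blocks--) {          /* dec ecx / jnz looper */
        memcpy(d, s, 16);       /* movdqa xmm0,[esi] + movntps [edi],xmm0 */
        s += 16;                /* add esi,16 */
        d += 16;                /* add edi,16 */
    }
    return 0;
}
```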
Quote:
.686
.xmm

.model flat, stdcall
option casemap :none

include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib

.data
align 16
var1   db "1234567812345678",0
var2   db "0000000000000000",0

.code
start:
lea esi, var1
lea edi, var2
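;movd  xmm(0),[esi]     ; MOVD would copy only 4 bytes
;movd [edi],xmm(0)
movq  xmm(0),[esi]      ; MOVQ loads the low 8 bytes (the upper 8 are zeroed)
movq [edi],xmm(0)       ; and stores 8 bytes: var2 becomes "1234567800000000"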
invoke MessageBox,0,ADDR var2,0,MB_ICONASTERISK
invoke ExitProcess,0
end start
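What MOVD and MOVQ actually move can be modelled in a few lines of portable C (a sketch, not SSE code; `move_via_xmm` is a hypothetical helper): the register image is 16 bytes, but MOVD transfers 4 bytes and MOVQ 8, so with MOVQ only the first 8 characters of var2 change.

```c
#include <string.h>

/* Hypothetical model of a load/store pair through an XMM register:
   width = 4 for MOVD, 8 for MOVQ, 16 for MOVDQA/MOVUPS. Only `width`
   bytes of dst are written; the rest of dst is left untouched. */
static void move_via_xmm(char *dst, const char *src, size_t width)
{
    unsigned char xmm[16] = {0};   /* MOVD/MOVQ loads zero the upper bytes */
    memcpy(xmm, src, width);       /* load:  movq xmm0,[esi] */
    memcpy(dst, xmm, width);       /* store: movq [edi],xmm0 */
}
```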
Title: Re: SSE
Post by: jj2007 on December 08, 2011, 07:38:59 AM
Search the forum for SSE2 - 14 pages of results. Many have attachments.
Or try a search for pcmpeqb - 2 pages, most of them on timing fast SSE2 algos (and many found their way into MasmBasic).
If you have a more specific need, be more specific and somebody will help.
Title: Re: SSE
Post by: bomz on December 08, 2011, 08:58:03 AM
Does somebody have an example (any code plus a make batch file) for MASM 6.15?

An application with SSE instructions that is compatible with both AMD and Intel processors?
Title: Re: SSE
Post by: hutch-- on December 08, 2011, 09:37:00 AM
bomz,

Do yourself a favour, get ML 9, 10 or 11, you can have real PHUN with SSE4.2  :P
Title: Re: SSE
Post by: bomz on December 08, 2011, 09:44:18 AM
Quote:
Pentium 4 2.26: sSpec SL6RY, stepping C1, 2.26 GHz, 512 KB L2 cache, 533 MT/s FSB, 17× multiplier, 1.53 V, 58 W TDP, Socket 478, part number RK80532PE051512

Northwood (130 nm)

    * Intel Family 15 Model 2
    * All models support: MMX, SSE, SSE2

I need to delete duplicate URLs from a list of 20,000 HTTP addresses. I wrote a little application that at first took 10 minutes to do this. Then I optimized it... and optimized it... and now it takes 0.375 s. This is a good task for SSE training.
Title: Re: SSE
Post by: bomz on December 08, 2011, 09:56:10 AM
http://www.masm32.com/board/index.php?PHPSESSID=b16411be671a312294b80470f76cd95d&topic=16430.0
Title: Re: SSE
Post by: bomz on December 08, 2011, 10:20:42 AM
Quote:
.686
.xmm

.model flat, stdcall
option casemap :none

include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib

.data
align 16
var1   db "1234567812345678",0
var2   db "0000000000000000",0
buffer db 512 dup (0)

.code
start:
lea esi, var1
lea edi, var2

movups  xmm1, [esi]; XMMWORD PTR[esi]
movups  [edi],xmm1

invoke MessageBox,0,ADDR var2,0,MB_ICONASTERISK
invoke ExitProcess,0
end start

http://www.microsoft.com/downloads/en/details.aspx?familyid=7A1C9DA0-0510-44A2-B042-7EF370530C64&displaylang=en

Quote from: dedndave on April 16, 2011, 01:10:18 PM
It is far easier to just use 7-Zip to extract the file from the MASM 8 setup:
1) Right-click on the setup program, then 7-Zip, Extract files, OK.
2) Inside the resulting folder is another file; repeat the same thing.
3) Inside that, there are 2 files, an MSI and a CAB.
4) Again, use 7-Zip to extract the files from the CAB file.
5) Inside the CAB is a file named FL_ml_exe_____X86.3643236F_FC70_11D3_A536_0090278A1BB8
6) Rename it to ML.exe
7) Replace ML.EXE in C:\masm32\bin with it.

bomz, don't post files that are not yours. It is a licence violation to post a Microsoft owned binary. Just use the normal Microsoft link.
Title: Re: SSE
Post by: bomz on December 08, 2011, 11:38:01 AM
http://neilkemp.us/src/sse_tutorial/sse_tutorial.html
Intel SSE Tutorial : An Introduction to the SSE Instruction Set
Title: Re: SSE
Post by: jj2007 on December 08, 2011, 11:43:52 AM
Quote from: bomz on December 08, 2011, 09:44:18 AM
I need to delete duplicate URLs from a list of 20,000 HTTP addresses. I wrote a little application that at first took 10 minutes to do this. Then I optimized it... and optimized it... and now it takes 0.375 s. This is a good task for SSE training.

Zip the list and post it here. We can do it in less than 0.1 seconds.
Title: Re: SSE
Post by: bomz on December 08, 2011, 11:51:08 AM
These are password URLs for file access, 92 characters long. I doubt it can be done more quickly. Make your own random list.
Title: Re: SSE
Post by: bomz on December 08, 2011, 12:18:22 PM
It's hard for me to formulate this in English.

To each URL I assign, with a tricky algorithm, a logical summary of all its characters, stored in a huge matrix (32 MB for at most 500,000 strings). So to compare two strings it is enough to compare their 32-bit "hashes" (the lengths are equal). If the hashes are equal, a character-by-character compare is needed; if not, no compare is needed at all. In a list of 20,000, only about 5-10 may share the same "hash", if any.
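The hash-then-compare idea can be sketched in C as follows. This is not bomz's algorithm: his "logical summary" is not shown, so 32-bit FNV-1a is assumed as the hash, and `dedup_by_hash` is a hypothetical helper built on a small open-addressing table. As described above, two URLs are compared character by character only when their 32-bit hashes match.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 32-bit FNV-1a, an assumed stand-in for the "logical summary" hash. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper: copies the first occurrence of each string in `in`
   to `out` (capacity n) and returns the number of unique strings, or
   (size_t)-1 if memory allocation fails. A full strcmp is done only when
   two 32-bit hashes are equal. */
static size_t dedup_by_hash(const char **in, size_t n, const char **out)
{
    size_t cap = 1, kept = 0;
    while (cap < 2 * n)                 /* power of two, at most half full */
        cap <<= 1;

    size_t *slot = calloc(cap, sizeof *slot);     /* 0 = empty, else index+1 */
    uint32_t *hash = malloc(cap * sizeof *hash);
    if (!slot || !hash) {
        free(slot);
        free(hash);
        return (size_t)-1;
    }
    for (size_t i = 0; i < n; i++) {
        uint32_t h = fnv1a(in[i]);
        size_t j = h & (cap - 1);
        int dup = 0;
        while (slot[j]) {               /* linear probing */
            if (hash[j] == h && strcmp(out[slot[j] - 1], in[i]) == 0) {
                dup = 1;                /* same hash and same text */
                break;
            }
            j = (j + 1) & (cap - 1);
        }
        if (!dup) {
            out[kept] = in[i];
            slot[j] = ++kept;
            hash[j] = h;
        }
    }
    free(slot);
    free(hash);
    return kept;
}
```

Because the table is kept at most half full, each lookup touches only a few slots on average, so the whole pass is expected to be linear in the number of URLs.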
Title: Re: SSE
Post by: jj2007 on December 08, 2011, 06:03:48 PM
Quote from: bomz on December 08, 2011, 11:51:08 AM
These are password URLs for file access, 92 characters long. I doubt it can be done more quickly. Make your own random list.

This creates a file with 20,000 URLs, of which roughly half are distinct, and writes only the unique URLs to a second file.
30 lines, 32...47 ms on my slow old Celeron, reading the old and writing the new file included.

include \masm32\MasmBasic\MasmBasic.inc   ; download (http://www.masm32.com/board/index.php?topic=12460)
  Init
  mov ecx, 19999      ; we need a random file with 20,000 URLs
  Dim My$(ecx)
  .Repeat
     Let My$(ecx)="http://go"+Str$(Rand(10000))+"site.htm"
     dec ecx
  .Until Sign?
  Store "MyURLs.txt", My$()
  push Timer            ; ------- timing includes reading and writing of files ------
  Recall "MyURLs.txt", Mu$()    ; file contains multiple URLs, about 50% are unique
  xchg eax, ecx         ; save # of lines
  QSort Mu$()
  Dim URL$(ecx)
  xor edi, edi
  dec ecx
  .Repeat
     mov esi, Mu$(ecx)
     .Repeat
          dec ecx
     .Until Sign? || StringsDiffer(esi, Mu$(ecx))
     Let URL$(edi)=esi
     inc edi
  .Until signed ecx<=0
  Store "MyUniqueURLs.txt", URL$(), edi
  void Timer
  pop edx
  Inkey Str$("The action took %i ms", eax-edx)
  Exit
end start

EDIT: .Until signed ecx<=0 ; signed is a simple equate for sdword ptr. Without "signed" the comparison is unsigned, so once ecx dropped below zero it would look like a huge positive number, the loop would continue, and trouble was ahead. Not for n=20000, but e.g. for 50000 strings.
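The same sort-then-compare-neighbors approach can be sketched in plain C for readers without MasmBasic. This is an assumed equivalent, not jj2007's code: C's `qsort` with `strcmp` replaces QSort, a `strcmp` on neighbors replaces StringsDiffer, and `sort_unique` is a hypothetical name. The loop walks forward instead of backward, which does not change the result.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C-string pointers. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts the pointer array in place, then compacts it so that each distinct
   string appears once; returns the number of unique strings. After sorting,
   duplicates are adjacent, so one pass comparing neighbors is enough. */
static size_t sort_unique(const char **v, size_t n)
{
    size_t kept = 1;

    if (n == 0)
        return 0;
    qsort(v, n, sizeof *v, cmp_str);
    for (size_t i = 1; i < n; i++)
        if (strcmp(v[i], v[kept - 1]) != 0)   /* like StringsDiffer() */
            v[kept++] = v[i];
    return kept;
}
```

Sorting costs O(n log n) string comparisons, versus an expected O(n) for a hash table, but each comparison usually stops after a few characters and the final pass is strictly sequential, so either approach can win in practice.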

@dancho: Thanxalot :bg
Title: Re: SSE
Post by: dancho on December 08, 2011, 06:48:09 PM
A little off topic here.
@jj2007
I didn't notice this before, but your MasmBasic is a really top-notch product,
really nice and clean code.
Congrats on that...
Title: Re: SSE
Post by: dedndave on December 08, 2011, 09:50:37 PM
yah - he has spent a lot of time on it
it is pretty fast, too   :U
Title: Re: SSE
Post by: bomz on December 09, 2011, 12:14:02 AM
Quote:
MasmBasic.lib(libtmpAB.obj) : warning LNK4078: multiple ".drectve" sections found with different attributes (00000240)
Title: Re: SSE
Post by: bomz on December 09, 2011, 12:32:19 AM
Quote:
.686
.xmm

.model flat, stdcall
option casemap :none

include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib

.data
align 16
var1   db "1111222233334444"
mess   db "MOVAPS         ",9
var5   db "0000000000000000"
   db 13,10,"SHUFPS 0D8h  ",9
var6   db "0000000000000000"
   db 13,10,"SHUFPS 01Eh  ",9
var7   db "0000000000000000"
   db 13,10,"MOVUPS       ",9
var8   db "0000000000000000"
   db 13,10,"UNPCKHPS     ",9
var9   db "0000000000000000"
   db 13,10,"UNPCKLPS     ",9
var10   db "0000000000000000"
   db 13,10,"MOVDQA       ",9
var11   db "0000000000000000"
   db 13,10,"PINSRW       ",9
var12   db "0000000000000000"
   db 13,10,"PEXTRW       ",9
var13   db "0000000000000000",0

.code
start:
lea esi, var1
lea edi, var5
MOVAPS      xmm1, XMMWORD PTR[esi]   ; SSE
MOVAPS      [edi],xmm1
SHUFPS      XMM1, XMM1, 0D8h         ; SSE
MOVAPS      [edi+32],xmm1
MOVAPS      xmm1, XMMWORD PTR[esi]
SHUFPS      XMM1, XMM1, 01Eh
MOVAPS      [edi+64],xmm1
MOVUPS      xmm1, [esi]              ; SSE, no alignment required
MOVAPS      [edi+96],xmm1
MOVAPS      xmm1, XMMWORD PTR[esi]
UNPCKHPS   XMM1, XMM1                ; SSE
MOVAPS      [edi+128],xmm1
MOVAPS      xmm1, XMMWORD PTR[esi]
UNPCKLPS   XMM1, XMM1                ; SSE
MOVAPS      [edi+160],xmm1
MOVDQA      XMM0, [esi]              ; SSE2
MOVDQA      [edi+192], XMM0
MOVDQA      XMM0, [esi]
MOV      eax, '**'
PINSRW      XMM0, eax, 4             ; SSE2 (XMM form): replace word 4 (bytes 8-9)
MOVDQA      [edi+224], XMM0
MOVDQA      XMM0, [esi]
PEXTRW      eax, XMM0, 7             ; SSE2 (XMM form): read word 7 (bytes 14-15)
PINSRW      XMM0, eax, 0             ; and write it into word 0
MOVDQA      [edi+256], XMM0

invoke MessageBox,0,ADDR mess,0,MB_ICONASTERISK
invoke ExitProcess,0
end start
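The results of this demo can be checked with a portable C model of the lane operations. It is a sketch under the assumption that each 4-character group of "1111222233334444" is one dword and each 2-character pair is one word; `shufps`, `unpckps`, `pextrw` and `pinsrw` are hypothetical helper names, not compiler intrinsics. For example, 0D8h = 11 01 10 00b selects dwords 0, 2, 1, 3 (low field first), giving "1111333322224444".

```c
#include <string.h>

typedef struct { unsigned char b[16]; } xmm_t;   /* one 128-bit register image */

static xmm_t load16(const char *s) { xmm_t r; memcpy(r.b, s, 16); return r; }

/* SHUFPS x, y, imm: result dwords 0-1 come from x, dwords 2-3 from y;
   each 2-bit field of imm (lowest field first) picks the source dword. */
static xmm_t shufps(xmm_t x, xmm_t y, unsigned imm)
{
    xmm_t r;
    for (int i = 0; i < 4; i++) {
        const xmm_t *src = (i < 2) ? &x : &y;
        unsigned sel = (imm >> (2 * i)) & 3u;
        memcpy(r.b + 4 * i, src->b + 4 * sel, 4);
    }
    return r;
}

/* UNPCKLPS (high = 0) / UNPCKHPS (high = 1) x, y: interleave the low or
   high two dwords of x and y as x, y, x, y. */
static xmm_t unpckps(xmm_t x, xmm_t y, int high)
{
    xmm_t r;
    int base = high ? 2 : 0;
    for (int i = 0; i < 2; i++) {
        memcpy(r.b + 8 * i, x.b + 4 * (base + i), 4);
        memcpy(r.b + 8 * i + 4, y.b + 4 * (base + i), 4);
    }
    return r;
}

/* PEXTRW r32, x, n: zero-extended word n (0..7), little-endian. */
static unsigned pextrw(xmm_t x, unsigned n)
{
    n &= 7u;
    return x.b[2 * n] | (unsigned)x.b[2 * n + 1] << 8;
}

/* PINSRW x, r32, n: replace word n with the low 16 bits of v. */
static xmm_t pinsrw(xmm_t x, unsigned v, unsigned n)
{
    n &= 7u;
    x.b[2 * n] = (unsigned char)v;
    x.b[2 * n + 1] = (unsigned char)(v >> 8);
    return x;
}
```

With the same model, 01Eh gives "3333444422221111", UNPCKHPS gives "3333333344444444", UNPCKLPS gives "1111111122222222", PINSRW with '**' at word 4 gives "11112222**334444", and the PEXTRW/PINSRW pair gives "4411222233334444".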
Title: Re: SSE
Post by: bomz on December 09, 2011, 12:50:27 AM
360 ms mine, 141 ms yours.

QSort Mu$()

StringsDiffer(esi, Mu$(ecx))

What is this?


As I understand it, first you sort the list into order, and then you only need to compare neighboring strings.
When I wrote mine, I thought sorting would need many more steps than one full pass over the list.
Title: Re: SSE
Post by: jj2007 on December 09, 2011, 06:29:57 AM
Quote from: bomz on December 09, 2011, 12:50:27 AM
360 ms mine, 141 ms yours.

QSort Mu$()  ; QuickSort of strings

StringsDiffer(esi, Mu$(ecx))  ; what the name says

What is this?

> As I understand it, first you sort the list into order, and then you only need to compare neighboring strings. ; YES
> When I wrote mine, I thought sorting would need many more steps than one full pass over the list.
The logic is interesting, but it might take longer.

> warning LNK4078
Thanks, will look into it. It's a harmless warning, though.
Title: Re: SSE
Post by: bomz on December 09, 2011, 06:41:19 AM
It's strange, because while sorting you already compare the strings, and you could put them into the duplicate and unique lists right then.

It needs thinking and a fresh head. Yesterday I thought about the reason for such a difference. First I thought that yours was faster because it read the list from the cache, but then I tried it with my own list. Mine doesn't need a "clean" list: it can find the URLs inside junk text. Then I rebuilt mine as a console program. Maybe...... it is still not clear to me why this algorithm needs half as many steps.

When I first wrote it, I just needed to solve my problem, so it didn't matter whether it took 10 minutes or 20. Then I found a program that did this in about 1 minute. I thought that program must use the standard algorithm from common theory, so I tried another way.

Sadly it's only in Russian. It lets you work with lists:
http://zalil.ru/32233341
Title: Re: SSE
Post by: bomz on December 09, 2011, 10:14:26 AM
Are the examples above SSE1 or SSE2?

http://www.tommesani.com/Docs.html
Title: Re: SSE
Post by: qWord on December 09, 2011, 10:29:42 AM
You can use AMD's documentation (http://developer.amd.com/documentation/guides/Pages/default.aspx) to determine the instruction set (SSEx, AVX)
AMD64 Architecture Programmer's Manual Volume 4: 128-bit and 256 bit media instructions (http://support.amd.com/us/Processor_TechDocs/APM_V4_26568.pdf)
Title: Re: SSE
Post by: bomz on December 09, 2011, 11:25:22 AM
Quote:
.686
.MMX
.XMM

.model flat, stdcall
option casemap :none

include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib

.data
align 16
var1   db "1234567812345678"
var2   db "0000000000000000",0

.code
start:
lea esi, var1
lea edi, var2
movdqa  xmm0,[esi]
movntps [edi],xmm0
invoke MessageBox,0,ADDR var2,0,MB_ICONASTERISK
invoke ExitProcess,0
end start
http://www.mark.masmcode.com/

Under DOS I don't see any difference between SSE and the 386 REP MOVSD.

Quote:
@@:
   movaps  xmm0, ds:[esi]
   movaps  xmm1, ds:[esi+16]
   movaps  xmm2, ds:[esi+32]
   movaps  xmm3, ds:[esi+48]
   movaps  xmm4, ds:[esi+64]
   movaps  xmm5, ds:[esi+80]
   movaps  xmm6, ds:[esi+96]
   movaps  xmm7, ds:[esi+112]
   movaps  es:[edi],xmm0
   movaps  es:[edi+16],xmm1
   movaps  es:[edi+32],xmm2
   movaps  es:[edi+48],xmm3
   movaps  es:[edi+64],xmm4
   movaps  es:[edi+80],xmm5
   movaps  es:[edi+96],xmm6
   movaps  es:[edi+112],xmm7
   add esi, 128      ; the addresses use ESI/EDI, so advance the full 32-bit
   add edi, 128      ; registers (ADD SI/DI would wrap at 64 KB)
   sub cx, 1
   jnz @B
Title: Re: SSE
Post by: bomz on December 11, 2011, 09:51:55 AM
How do I convert a REAL4 to a string?
Quote:
;and ebx, 111111111111111111111111b
;and eax, 00111111000000000000000000000000b
;rol eax, 8

PINSRB and PINSRQ don't work with SSE2. Is it possible to ADD 64-bit integers, or only REAL4?
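For the manual bit-masking approach quoted above, it helps to know the REAL4 layout. The C sketch assumes `float` is IEEE-754 single precision (true on x86) and uses a hypothetical `split_real4` helper: bit 31 is the sign, bits 30..23 hold the exponent biased by 127, and bits 22..0 hold the fraction (mask 0x7FFFFF), with an implicit leading 1 that is not stored.

```c
#include <stdint.h>
#include <string.h>

/* REAL4 (IEEE-754 binary32) fields; assumes float is that format, as on x86. */
typedef struct {
    unsigned sign;        /* bit 31 */
    unsigned exponent;    /* bits 30..23, biased by 127 */
    uint32_t fraction;    /* bits 22..0, implicit leading 1 not stored */
} real4_fields;

/* Hypothetical helper: splits a REAL4 into its three bit fields. */
static real4_fields split_real4(float f)
{
    uint32_t bits;
    real4_fields r;

    memcpy(&bits, &f, sizeof bits);     /* the same 4 bytes mov eax,VAR1 loads */
    r.sign = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.fraction = bits & 0x7FFFFFu;
    return r;
}
```

For 11111.0 = 1.0101101100111b × 2^13 this gives sign 0, exponent 127 + 13 = 140 and fraction 0x2D9C00.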
Title: Re: SSE
Post by: jj2007 on December 11, 2011, 10:13:17 AM
Quote from: bomz on December 11, 2011, 09:51:55 AM
How convert REAL4 to string?
Beware of the lousy precision...

Quote:
include \masm32\MasmBasic\MasmBasic.inc
.data
MyReal4   REAL4 3.14159265358979324
   Init
   DefNum 9
   fldpi
   fstp MyReal4
   Let esi=Str$(MyReal4)
   DefNum 19
   Inkey "PI=", Tb$, esi, CrLf$, "Exact=", Tb$, Str$(PI)
   Exit
end start

PI=     3.14159274
Exact=  3.141592653589793238
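The same precision loss can be reproduced in standard C (a sketch; `real4_vs_real8` is a hypothetical name). A REAL4 holds 24 significant bits, about 7 decimal digits, so rounding π to float and printing 9 significant digits gives 3.14159274, while a REAL8 printed with 17 digits gives 3.1415926535897931.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats x once after rounding to REAL4 (float) with
   9 significant digits, and once as REAL8 (double) with 17 digits. */
static void real4_vs_real8(double x, char *f4, char *f8, size_t n)
{
    snprintf(f4, n, "%.9g", (double)(float)x);   /* 24-bit mantissa */
    snprintf(f8, n, "%.17g", x);                 /* 53-bit mantissa */
}
```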
Title: Re: SSE
Post by: bomz on December 11, 2011, 10:23:50 AM
Is this BASIC?
Title: Re: SSE
Post by: jj2007 on December 11, 2011, 10:26:42 AM
Quote from: bomz on December 11, 2011, 10:23:50 AM
Is this BASIC?

No, it's Assembler. To be precise: it assembles with ml.exe versions 6.15 ... 10.0 or JWasm :bg

Hint: if you prefer C, go for crt_sprintf.
Title: Re: SSE
Post by: bomz on December 11, 2011, 10:28:35 AM
I prefer MASM

Does it convert through the FPU?
Title: Re: SSE
Post by: jj2007 on December 11, 2011, 11:29:33 AM
Quote from: bomz on December 11, 2011, 10:28:35 AM
Does it convert through the FPU?

Yes, Str$() (http://www.masm32.com/board/index.php?topic=12460) uses the FPU and has a REAL10 internal precision. You can set output precision either with DefNum n (n=1...19) or with a sprintf type format string:

   PrintLine "Precision:", Str$("\n%3f", PI), Str$("\n%7f", PI), Str$("\n%Cf", PI), Str$("\n%Gf", PI), Str$("\n%Jf", PI)

Precision:
3.14
3.141593
3.14159265359
3.141592653589793
3.141592653589793238
Title: Re: SSE
Post by: bomz on December 11, 2011, 11:39:02 AM
I'll try to do this with MASM; I'm reading about REAL and BCD.
Title: Re: SSE
Post by: bomz on December 11, 2011, 12:15:30 PM
Quote:
.386
.MMX
.XMM


.model flat, stdcall
option casemap :none

   include \MASM32\INCLUDE\windows.inc
   include \MASM32\INCLUDE\masm32.inc
   include \MASM32\INCLUDE\gdi32.inc
   include \MASM32\INCLUDE\user32.inc
   include \MASM32\INCLUDE\kernel32.inc
   include \MASM32\INCLUDE\fpu.inc
   includelib \MASM32\LIB\masm32.lib
   includelib \MASM32\LIB\gdi32.lib
   includelib \MASM32\LIB\user32.lib
   includelib \MASM32\LIB\kernel32.lib
   includelib \MASM32\LIB\fpu.lib

.data
   mestitle      db "Bomz",0
   VAR1         REAL4 11111.0
   VAR2         dt ?
.data?
   buffer         db 512 dup(?)
.code
start:
   finit
   fld VAR1
   ;fstp VAR2

   invoke FpuFLtoA, 0, 10, ADDR buffer, SRC1_FPU or SRC2_DIMM
   invoke MessageBox,0, ADDR buffer,ADDR mestitle,MB_ICONASTERISK
   invoke ExitProcess,0
end start
Title: Re: SSE
Post by: bomz on December 11, 2011, 12:43:52 PM
Quote:
.386
.MMX
.XMM

.model flat, stdcall
option casemap :none

   include \MASM32\INCLUDE\windows.inc
   include \MASM32\INCLUDE\masm32.inc
   include \MASM32\INCLUDE\gdi32.inc
   include \MASM32\INCLUDE\user32.inc
   include \MASM32\INCLUDE\kernel32.inc
   include \MASM32\INCLUDE\fpu.inc
   includelib \MASM32\LIB\masm32.lib
   includelib \MASM32\LIB\gdi32.lib
   includelib \MASM32\LIB\user32.lib
   includelib \MASM32\LIB\kernel32.lib
   includelib \MASM32\LIB\fpu.lib

.data
   align 16                  ; MOVDQA needs 16-byte-aligned VAR1/VAR2/VAR3
   VAR1         REAL4 0.0, 0.0, 0.0, 0.0
   VAR2         REAL4 1.0, 1.0, 1.0, 1.0
   VAR3         REAL4 1.0, 1.0, 1.0, 1.0
   VAR4         REAL4 1.0, 1.0, 1.0, 1.0
   mestitle      db "SSE",0
   shell32         db 'shell32.dll',0
   MessBox         MSGBOXPARAMS <sizeof MSGBOXPARAMS, 0, 0, offset buffer,\
            offset mestitle, MB_OK OR MB_USERICON, 48, 0, 0, 0>
.data?
   buffer         db 512 dup(?)
   string         db 32  dup(?)

.code
start:
   lea   esi, VAR2
   lea   edi, VAR1
   movdqa   XMM0, [esi]
   movdqa   XMM1, [esi+16]
   ADDPS   XMM0, XMM1
   movdqa   [edi], XMM0
   finit
   fld   dword ptr[VAR1]
   invoke   FpuFLtoA, 0, 1110h, ADDR string, SRC1_FPU or SRC2_DIMM
   invoke   lstrcpy, addr buffer, addr string
   fld   dword ptr[VAR1+4]
   invoke   FpuFLtoA, 0, 1110h, ADDR string, SRC1_FPU or SRC2_DIMM
   invoke   lstrcat, addr buffer, addr string
   fld   dword ptr[VAR1+8]
   invoke   FpuFLtoA, 0, 1110h, ADDR string, SRC1_FPU or SRC2_DIMM
   invoke   lstrcat, addr buffer, addr string
   fld   dword ptr[VAR1+12]
   invoke   FpuFLtoA, 0, 1110h, ADDR string, SRC1_FPU or SRC2_DIMM
   invoke   lstrcat, addr buffer, addr string

   invoke   LoadLibrary, addr shell32
   mov   MessBox.hInstance, eax
   invoke   MessageBeep, MB_ICONASTERISK
   invoke   MessageBoxIndirect, addr MessBox
   invoke   ExitProcess,0
end start
Quote:
.386
.MMX
.XMM

.model flat, stdcall
option casemap :none

include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib
include \MASM32\INCLUDE\masm32.inc
includelib \MASM32\LIB\masm32.lib

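.data
align 16                        ; MOVDQA needs 16-byte-aligned var1/var3
var1      real4 4.1f, 1.0f, 1.0f, 1.0f
var3      real4 1.0f, 1.0f, 1.0f, 1.0f
var2      REAL8 0.0             ; FSTP writes a REAL8 here, the type FloatToStr2 takes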
mestitle   db "Bomz",0
form      db "0.%ldx10^%ld", 0
shell32      db 'shell32.dll',0
MessBox      MSGBOXPARAMS <sizeof MSGBOXPARAMS, 0, 0,\
      offset buffer, 0, MB_OK OR MB_USERICON, 48, 0, 0, 0>

.data?
buffer db 512 dup(?)

.code
start:
   invoke   LoadLibrary, addr shell32
   mov   MessBox.hInstance, eax

   lea   esi, var1
   lea   edi, var3

   movdqa   XMM0, [esi]
   movdqa   XMM1, [edi]
   ADDPS   XMM1, XMM0
   movdqa   [edi], XMM1
   finit
   fld   dword ptr[var3]
   fstp   var2
   invoke  FloatToStr2,var2,addr buffer
   invoke MessageBeep, MB_ICONASTERISK
   invoke   MessageBoxIndirect, addr MessBox
   invoke   ExitProcess,0
end start
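ADDPS simply performs four independent single-precision additions. A portable C model (a sketch; `addps` here is a hypothetical function, not the `_mm_add_ps` intrinsic) makes the lane-by-lane behavior explicit: with var1 and var3 from the example above, lane 0 becomes 4.1 + 1.0, roughly 5.1 in REAL4 precision, and the other three lanes become 2.0.

```c
/* Portable model of ADDPS xmm1, xmm0: four independent REAL4 additions,
   one per 32-bit lane, each rounded to single precision. */
static void addps(float dst[4], const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = a[i] + b[i];
}
```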
Title: Re: SSE
Post by: jj2007 on December 11, 2011, 02:58:56 PM
You got it! Here is the MB equivalent:

include \masm32\MasmBasic\MasmBasic.inc
.data
align 16                ; MOVDQA needs 16-byte-aligned VAR1/VAR2/VAR3
VAR1   REAL4 0.0, 0.0, 0.0, 0.0
VAR2   REAL4 1.0, 1.0, 1.0, 1.0
VAR3   REAL4 1.0, 1.0, 1.0, 1.0

   Init
   lea   esi, VAR2
   lea   edi, VAR1
   movdqa XMM0, [esi]
   movdqa XMM1, [esi+16]
   ADDPS XMM0, XMM1
   movdqa [edi], XMM0
   fld REAL4 PTR VAR1
   fld REAL4 PTR VAR1+4
   fld REAL4 PTR VAR1+8
   fld REAL4 PTR VAR1+12
   deb 1, "Result:", ST(0), ST(1), ST(2), ST(3)
   Exit
end start
Title: Re: SSE
Post by: bomz on December 11, 2011, 06:59:02 PM
I know Turbo Basic well, but this is a new language... Still, I like your examples.
I found QuickSort in the MASM32 help.

http://turbo-basic.narod.ru/index.html
http://turbo-basic.narod.ru/TBDEMO-256-H101.rar

I found two errors in the Turbo compiler. First, it doesn't return the correct offset of string variables; but for assembler you can use a numeric array with the same success. This was corrected only in QuickBASIC.
Second (not an error, but undocumented): a 64 KB string can be cut from the left, but only a 32 KB string can be cut from the right.
Title: Re: SSE
Post by: jj2007 on December 11, 2011, 10:23:07 PM
Porting your example should be very easy, see attachment.
Title: Re: SSE
Post by: bomz on December 11, 2011, 10:34:48 PM
Is it possible to compile it with MASM?

DownLoadMaster has a problem, and this program I wrote cuts a file from the end back to the first nonzero byte.
Title: Re: SSE
Post by: jj2007 on December 12, 2011, 06:40:40 AM
Quote from: bomz on December 11, 2011, 10:34:48 PM
Is it possible to compile it with MASM?

Sure, that's the whole point: MasmBasic is a library for ML.exe or JWasm.exe, and it works in parallel to the Masm32 installation, i.e. you can use both the Masm32 macros and library functions and the whole BASIC stuff.
Title: Re: SSE
Post by: bomz on December 12, 2011, 09:14:13 AM
64-bit integer (INT64) addition with SSE2: http://www.tommesani.com/SSE2MMX.html
Quote:
.386
.MMX
.XMM

.model flat, stdcall
option casemap :none

include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib

.data
   align 16               ; MOVDQA needs 16-byte-aligned VAR1/VAR2
;   VAR1      dd   0,   1,   0,   0
;   VAR2      dd   1,   0,   0,   0
   VAR1      INT64   4294967295,   0
   VAR2      INT64   1,      0
.data?
   String      db 20 dup(?)
   RESULT      dt ?

.code
start:
   movdqa   XMM0, XMMWORD PTR[VAR1]
   movdqa   XMM1, XMMWORD PTR[VAR2]
   PADDQ   XMM1, XMM0
   movdqa   XMMWORD PTR[VAR2], XMM1

   finit
   fild qword ptr[VAR2]
   fbstp RESULT
   lea esi,String
   add esi, 18
   lea edi,RESULT
   xor edx, edx
   xor ebx, ebx
   mov ecx, 9
@@:
   dec esi
   dec esi
   mov bl, byte ptr [edi]
   shl ebx,4
   add bh, 48
   mov dl, bh
   xor bh, bh
   shr ebx, 4
   add bl, 48
   mov dh, bl
   mov word ptr [esi], dx
   inc edi
   loop @B
   invoke MessageBox,0,addr String,0,MB_ICONASTERISK
   invoke ExitProcess,0
end start
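The two steps of this program can be modelled in portable C (a sketch; all function names are hypothetical). PADDQ adds two independent 64-bit lanes with wraparound and no carry between them. FBSTP cannot be called from portable C, so `to_packed_bcd` stands in for it, assuming a non-negative value below 10^18: the 10-byte result holds two decimal digits per byte, least significant byte first, with the sign in byte 9. `bcd_to_string` then does what the loop above does, emitting the high nibble of each byte first and filling the 18-digit string from the right.

```c
#include <stdint.h>
#include <string.h>

/* PADDQ model: two independent 64-bit lanes, each wrapping modulo 2^64. */
static void paddq(uint64_t dst[2], const uint64_t a[2], const uint64_t b[2])
{
    dst[0] = a[0] + b[0];
    dst[1] = a[1] + b[1];
}

/* Stand-in for FBSTP (assumes 0 <= v < 10^18): 10-byte packed BCD, two
   digits per byte, least significant byte first, byte 9 = sign (0 = +). */
static void to_packed_bcd(uint64_t v, unsigned char bcd[10])
{
    memset(bcd, 0, 10);
    for (int i = 0; i < 9; i++) {
        unsigned lo = (unsigned)(v % 10); v /= 10;
        unsigned hi = (unsigned)(v % 10); v /= 10;
        bcd[i] = (unsigned char)(hi << 4 | lo);
    }
}

/* Like the assembly loop: each BCD byte becomes two ASCII digits (high
   nibble first), filled from the right; writes 18 digits plus NUL. */
static void bcd_to_string(const unsigned char bcd[10], char out[19])
{
    for (int i = 0; i < 9; i++) {
        out[16 - 2 * i] = (char)('0' + (bcd[i] >> 4));
        out[17 - 2 * i] = (char)('0' + (bcd[i] & 0x0F));
    }
    out[18] = '\0';
}
```

With VAR1 = 4294967295 and VAR2 = 1 as in the program, the model produces "000000004294967296", leading zeros included, just as the MessageBox string would.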
Title: Re: SSE
Post by: bomz on April 29, 2012, 04:46:35 PM
Quote:
.686
.xmm
CSEG segment use16
assume cs:CSEG, ds:CSEG, es:CSEG, ss:CSEG

   movaps  xmm0,xmmword ptr es:[esi]

The program hangs when it tries to execute an SSE instruction. Is it impossible to use SSE with USE16 and 32-bit addressing?
Title: Re: SSE
Post by: bomz on April 29, 2012, 04:52:14 PM
Sorry, it's a Microsoft Virtual Machine SSE problem.

*As always, when I try to explain a problem to other people, a good idea comes.