Can somebody give really working code for the 128-bit registers? Or better, all the instructions with examples?
http://www.mark.masmcode.com/
Quote:
mov ecx,16384 ;write 16384 16-byte values, 16384*16 = 256KB.
; So we are copying a 256KB array
mov esi,offset src_arr ;pointer to the source array which has to be
; 16-byte aligned or you will get an exception.
mov edi,offset dst_arr ;pointer to the destination array which has to be
; 16-byte aligned or you will get an exception.
looper:
movdqa xmm0,[esi] ;SSE2, works on P4 and up
movntps [edi],xmm0 ;SSE non-temporal store, works on P3 and up
add esi,16
add edi,16
dec ecx
jnz looper
Quote:
.686
.xmm
.model flat, stdcall
option casemap :none
include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib
.data
align 16
var1 db "1234567812345678",0
var2 db "0000000000000000",0
.code
start:
lea esi, var1
lea edi, var2
;movd xmm(0),[esi]
;movd [edi],xmm(0)
movq xmm(0),[esi]
movq [edi],xmm(0)
invoke MessageBox,0,ADDR var2,0,MB_ICONASTERISK
invoke ExitProcess,0
end start
(http://s1.ipicture.ru/uploads/20111208/TRHaaRsE.png)
Search the forum for SSE2 - 14 pages of results. Many have attachments.
Or try a search for pcmpeqb - 2 pages, most of them on timing fast SSE2 algos (and many found their way into MasmBasic).
If you have a more specific need, be more specific and somebody will help.
Does somebody have an example (any code plus a make batch file) for MASM 6.15?
An application with SSE instructions that is compatible with both AMD and Intel processors?
bomz,
Do yourself a favour, get ML 9, 10 or 11, you can have real PHUN with SSE4.2 :P
Quote:
Pentium 4 2.26 SL6RY C1 2.26 GHz 512 KB 533 MT/s 17× 1.53 V 58 W Socket 478 RK80532PE051512
Northwood (130 nm)
* Intel Family 15 Model 2
* All models support: MMX, SSE, SSE2
I need to remove duplicate URLs from a list of 20,000 HTTP addresses. I wrote a small application that at first took 10 minutes to do this. Then I optimized it, and optimized again; now it takes 0.375 sec. This is a good task for SSE practice.
http://www.masm32.com/board/index.php?PHPSESSID=b16411be671a312294b80470f76cd95d&topic=16430.0
Quote:
.686
.xmm
.model flat, stdcall
option casemap :none
include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib
.data
align 16
var1 db "1234567812345678",0
var2 db "0000000000000000",0
buffer db 512 dup (0)
.code
start:
lea esi, var1
lea edi, var2
movups xmm1, [esi]; XMMWORD PTR[esi]
movups [edi],xmm1
invoke MessageBox,0,ADDR var2,0,MB_ICONASTERISK
invoke ExitProcess,0
end start
(http://s2.ipicture.ru/uploads/20111208/FxOSkSU3.png)
http://www.microsoft.com/downloads/en/details.aspx?familyid=7A1C9DA0-0510-44A2-B042-7EF370530C64&displaylang=en
Quote from: dedndave on April 16, 2011, 01:10:18 PM
it is far easier to just use 7-zip to extract the file from the masm8 setup
1) right-click on the setup program, 7-zip, Extract files, OK
2) inside the resulting folder, another file - repeat the same thing
3) inside that, there are 2 files, an MSI, and a CAB
4) again, use 7-zip to extract files from the CAB file
5) inside the CAB is a file named FL_ml_exe_____X86.3643236F_FC70_11D3_A536_0090278A1BB8
6) rename it to ML.exe
7) replace ML.EXE in C:\masm32\bin with it
bomz, don't post files that are not yours. It is a licence violation to post a Microsoft owned binary. Just use the normal Microsoft link.
http://neilkemp.us/src/sse_tutorial/sse_tutorial.html
Intel SSE Tutorial : An Introduction to the SSE Instruction Set
Quote from: bomz on December 08, 2011, 09:44:18 AM
I need to remove duplicate URLs from a list of 20,000 HTTP addresses. I wrote a small application that at first took 10 minutes to do this. Then I optimized it, and optimized again; now it takes 0.375 sec. This is a good task for SSE practice.
Zip the list and post it here. We can do it in less than 0.1 seconds.
These are password URLs for file access, each 92 characters long. I doubt it can be done faster. Make your own random list.
It is hard for me to formulate this in English.
With a tricky algorithm I map each URL to a logical summary of all its characters and store it in a huge matrix (32 MB for a maximum of 500,000 strings). To compare two strings it is then enough to compare their 32-bit "hashes" (the lengths are equal): if the hashes are equal, a character-by-character comparison is needed; if not, no further comparison is needed. In a list of 20,000, only 5-10 strings or so share the same "hash".
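A minimal sketch of the hash-first idea described in this post, written in Python rather than MASM for brevity. The actual "logical summary" hash and the 32 MB matrix are not shown in the thread, so `zlib.crc32` and a dictionary are stand-ins chosen here as assumptions; `unique_urls` is a hypothetical name.

```python
import zlib

def unique_urls(urls):
    """Return urls with duplicates removed, keeping first occurrences in order."""
    buckets = {}   # 32-bit hash -> distinct strings already seen with that hash
    result = []
    for url in urls:
        # Stand-in 32-bit hash (assumption; the poster's own hash is not shown).
        h = zlib.crc32(url.encode("utf-8"))
        candidates = buckets.setdefault(h, [])
        # A full character comparison happens only among strings sharing a hash.
        if url not in candidates:
            candidates.append(url)
            result.append(url)
    return result

if __name__ == "__main__":
    print(unique_urls(["http://a", "http://b", "http://a"]))
```

Because most strings land in a bucket of their own, almost every lookup is settled by the hash alone, which is the effect the post describes.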
Quote from: bomz on December 08, 2011, 11:51:08 AM
These are password URLs for file access, each 92 characters long. I doubt it can be done faster. Make your own random list.
Creates a file with 20,000 different URLs, of which roughly half are unique. Writes to a second file only the unique URLs.
30 lines, 32...47 ms on my slow old Celeron, reading the old and writing the new file included.
include \masm32\MasmBasic\MasmBasic.inc ; download (http://www.masm32.com/board/index.php?topic=12460)
Init
mov ecx, 19999 ; we need a random file with 20,000 URLs
Dim My$(ecx)
.Repeat
Let My$(ecx)="http://go"+Str$(Rand(10000))+"site.htm"
dec ecx
.Until Sign?
Store "MyURLs.txt", My$()
push Timer ; ------- timing includes reading and writing of files ------
Recall "MyURLs.txt", Mu$() ; file contains multiple URLs, about 50% are unique
xchg eax, ecx ; save # of lines
QSort Mu$()
Dim URL$(ecx)
xor edi, edi
dec ecx
.Repeat
mov esi, Mu$(ecx)
.Repeat
dec ecx
.Until Sign? || StringsDiffer(esi, Mu$(ecx))
Let URL$(edi)=esi
inc edi
.Until signed ecx<=0
Store "MyUniqueURLs.txt", URL$(), edi
void Timer
pop edx
Inkey Str$("The action took %i ms", eax-edx)
Exit
end start
EDIT: .Until signed ecx<=0 ; signed is a simple equate: sdword ptr - without "signed", the code would continue if ecx was below zero, and trouble was ahead. Not for n=20000, but e.g. for 50000 strings.
@dancho: Thanxalot :bg
little off topic here
@jj2007
I didn't notice this before, but your MasmBasic is a really top-notch product,
really nice and clean code,
gratz on that...
yah - he has spent a lot of time on it
it is pretty fast, too :U
Quote:
MasmBasic.lib(libtmpAB.obj) : warning LNK4078: multiple ".drectve" sections found with different attributes (00000240)
Quote:
.686
.xmm
.model flat, stdcall
option casemap :none
include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib
.data
align 16
var1 db "1111222233334444"
; each label is padded to 16 bytes so that var5...var13 stay 16-byte aligned and 32 bytes apart
mess db "MOVAPS         ",9
var5 db "0000000000000000"
db 13,10,"SHUFPS 0D8h  ",9
var6 db "0000000000000000"
db 13,10,"SHUFPS 01Eh  ",9
var7 db "0000000000000000"
db 13,10,"MOVUPS       ",9
var8 db "0000000000000000"
db 13,10,"UNPCKHPS     ",9
var9 db "0000000000000000"
db 13,10,"UNPCKLPS     ",9
var10 db "0000000000000000"
db 13,10,"MOVDQA       ",9
var11 db "0000000000000000"
db 13,10,"PINSRW       ",9
var12 db "0000000000000000"
db 13,10,"PEXTRW       ",9
var13 db "0000000000000000",0
.code
start:
lea esi, var1
lea edi, var5
MOVAPS xmm1, XMMWORD PTR[esi]
MOVAPS [edi],xmm1
SHUFPS XMM1, XMM1, 0D8h
MOVAPS [edi+32],xmm1
MOVAPS xmm1, XMMWORD PTR[esi]
SHUFPS XMM1, XMM1, 01Eh
MOVAPS [edi+64],xmm1
MOVUPS xmm1, [esi]
MOVAPS [edi+96],xmm1
MOVAPS xmm1, XMMWORD PTR[esi]
UNPCKHPS XMM1, XMM1
MOVAPS [edi+128],xmm1
MOVAPS xmm1, XMMWORD PTR[esi]
UNPCKLPS XMM1, XMM1
MOVAPS [edi+160],xmm1
MOVDQA XMM0, [esi]
MOVDQA [edi+192], XMM0
MOVDQA XMM0, [esi]
MOV eax, '**'
PINSRW XMM0, eax, 4
MOVDQA [edi+224], XMM0
MOVDQA XMM0, [esi]
MOVDQA XMM0, [esi]
PEXTRW eax, XMM0, 7
PINSRW XMM0, eax, 0
MOVDQA [edi+256], XMM0
MOVDQA XMM0, [esi]
invoke MessageBox,0,ADDR mess,0,MB_ICONASTERISK
invoke ExitProcess,0
end start
(http://s1.ipicture.ru/uploads/20111211/5C1jmapU.png)
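To read the shuffle results in the listing above, it may help to model the register as four 4-character lanes. The following Python sketch emulates how SHUFPS decodes its immediate byte and how UNPCKHPS/UNPCKLPS interleave lanes; the function names are hypothetical helpers, and a 16-character string stands in for an XMM register (lane 0 is the lowest address).

```python
def lanes(text):
    """Split a 16-character string into four 4-character lanes (lane 0 first)."""
    assert len(text) == 16
    return [text[i:i + 4] for i in range(0, 16, 4)]

def shufps(dst, src, imm8):
    """Lanes 0-1 are picked from dst, lanes 2-3 from src, two selector bits each."""
    d, s = lanes(dst), lanes(src)
    return (d[imm8 & 3] + d[(imm8 >> 2) & 3]
            + s[(imm8 >> 4) & 3] + s[(imm8 >> 6) & 3])

def unpckhps(dst, src):
    """Interleave the two high lanes of dst and src."""
    d, s = lanes(dst), lanes(src)
    return d[2] + s[2] + d[3] + s[3]

def unpcklps(dst, src):
    """Interleave the two low lanes of dst and src."""
    d, s = lanes(dst), lanes(src)
    return d[0] + s[0] + d[1] + s[1]

if __name__ == "__main__":
    v = "1111222233334444"
    print(shufps(v, v, 0xD8))   # 1111333322224444
    print(shufps(v, v, 0x1E))   # 3333444422221111
    print(unpckhps(v, v))       # 3333333344444444
    print(unpcklps(v, v))       # 1111111122222222
```

With both operands set to the same register, as in the listing, 0D8h swaps the two middle lanes and 01Eh moves the high half to the bottom while reversing the original low half.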
Mine: 360 ms, yours: 141 ms.
QSort Mu$()
StringsDiffer(esi, Mu$(ecx))
What are these?
As I understand it, you first sort the list in ascending order, and then you only need to compare neighbouring strings.
When I wrote mine, I thought sorting would need many more steps than one full pass over the list.
Quote from: bomz on December 09, 2011, 12:50:27 AM
Mine: 360 ms, yours: 141 ms.
QSort Mu$() ; QuickSort of strings
StringsDiffer(esi, Mu$(ecx)) ; what the name says
What are these?
> As I understand it, you first sort the list in ascending order, and then you only need to compare neighbouring strings. ; YES
> When I wrote mine, I thought sorting would need many more steps than one full pass over the list.
the logic is interesting, but it might take longer
> warning LNK4078
Thanks, will look into it. It's a harmless warning, though.
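The sort-then-compare-neighbours approach confirmed above can be sketched in a few lines of Python. This is only an illustration of the idea, not the MasmBasic implementation; `unique_sorted` is a hypothetical name, and Python's built-in sort stands in for QSort.

```python
def unique_sorted(urls):
    """Sort the list, then keep a string only if it differs from its predecessor."""
    ordered = sorted(urls)          # duplicates become neighbours
    unique = []
    for url in ordered:
        if not unique or url != unique[-1]:
            unique.append(url)
    return unique

if __name__ == "__main__":
    print(unique_sorted(["http://b", "http://a", "http://b"]))
```

Note that the output comes back in sorted order, not in the order of the input file. Sorting costs on the order of n log n comparisons, after which a single pass over the list finds every duplicate.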
That is strange, because while sorting you already compare the strings with each other, and could put them into the duplicate and unique lists right then.
This needs thought and a fresh head. Yesterday I was thinking about the reason for such a difference. At first I thought yours was faster because it builds the list first and then reads it from cache, but then I tried my own list. Mine does not need a "clean" list; it can find the URLs in garbage. Then I rebuilt mine as a console application. Maybe ...... it is still not clear to me why this algorithm needs half as many steps.
When I first wrote it I just needed to solve my problem, so it did not matter whether it took 10 minutes or 20. Then I found a program that does this in about 1 minute. I assumed that program used the algorithm from standard theory, so I tried another way.
Sadly, it is only in Russian. It lets you work with lists.
http://zalil.ru/32233341
Are the examples above SSE1 or SSE2?
http://www.tommesani.com/Docs.html
You can use AMD's documentation (http://developer.amd.com/documentation/guides/Pages/default.aspx) to determine the instruction set (SSEx, AVX)
AMD64 Architecture Programmer's Manual Volume 4: 128-bit and 256 bit media instructions (http://support.amd.com/us/Processor_TechDocs/APM_V4_26568.pdf)
Quote:
.686
.MMX
.XMM
.model flat, stdcall
option casemap :none
include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib
.data
align 16
var1 db "1234567812345678"
var2 db "0000000000000000",0
.code
start:
lea esi, var1
lea edi, var2
movdqa xmm0,[esi]
movntps [edi],xmm0
invoke MessageBox,0,ADDR var2,0,MB_ICONASTERISK
invoke ExitProcess,0
end start
http://www.mark.masmcode.com/
Under DOS I don't see any difference between SSE and 386 rep movsd.
Quote:
@@:
movaps xmm0, ds:[esi]
movaps xmm1, ds:[esi+16]
movaps xmm2, ds:[esi+32]
movaps xmm3, ds:[esi+48]
movaps xmm4, ds:[esi+64]
movaps xmm5, ds:[esi+80]
movaps xmm6, ds:[esi+96]
movaps xmm7, ds:[esi+112]
movaps es:[edi],xmm0
movaps es:[edi+16],xmm1
movaps es:[edi+32],xmm2
movaps es:[edi+48],xmm3
movaps es:[edi+64],xmm4
movaps es:[edi+80],xmm5
movaps es:[edi+96],xmm6
movaps es:[edi+112],xmm7
add si, 128
add di, 128
sub cx, 1
jnz @B
How do I convert a REAL4 to a string?
Quote:
;and ebx, 111111111111111111111111b
;and eax, 00111111000000000000000000000000b
;rol eax, 8
PINSRB and PINSRQ don't work with SSE2. Is it possible to add 64-bit integers, or only REAL4?
Quote from: bomz on December 11, 2011, 09:51:55 AM
How do I convert a REAL4 to a string?
Beware of the lousy precision...
Quote:
include \masm32\MasmBasic\MasmBasic.inc
.data
MyReal4 REAL4 3.14159265358979324
Init
DefNum 9
fldpi
fstp MyReal4
Let esi=Str$(MyReal4)
DefNum 19
Inkey "PI=", Tb$, esi, CrLf$, "Exact=", Tb$, Str$(PI)
Exit
end start
PI= 3.14159274
Exact= 3.141592653589793238
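The precision loss shown above comes from the REAL4 format itself, which keeps only about 7 significant decimal digits. A small Python sketch (not part of the thread; `to_real4` is a hypothetical helper) reproduces the rounded value by passing a double through a 4-byte float:

```python
import math
import struct

def to_real4(x):
    """Round a Python float (REAL8) to the nearest single-precision (REAL4) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

if __name__ == "__main__":
    print("PI=    %.8f" % to_real4(math.pi))   # 3.14159274
    print("REAL8= %.15f" % math.pi)            # 3.141592653589793
```

The digits after 3.141592 in the REAL4 result are an artefact of the 24-bit mantissa, not of the conversion routine.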
Is this Basic?
Quote from: bomz on December 11, 2011, 10:23:50 AM
Is this Basic?
No, it's Assembler. To be precise: it assembles with ml.exe versions 6.15 ... 10.0 or JWasm :bg
Hint: if you prefer C, go for crt_sprintf.
I prefer MASM
Does it convert through the FPU?
Quote from: bomz on December 11, 2011, 10:28:35 AM
Does it convert through the FPU?
Yes, Str$() (http://www.masm32.com/board/index.php?topic=12460) uses the FPU and has a REAL10 internal precision. You can set output precision either with DefNum n (n=1...19) or with a sprintf type format string:
PrintLine "Precision:", Str$("\n%3f", PI), Str$("\n%7f", PI), Str$("\n%Cf", PI), Str$("\n%Gf", PI), Str$("\n%Jf", PI)
Precision:
3.14
3.141593
3.14159265359
3.141592653589793
3.141592653589793238
I am trying to do this with MASM, and reading about REAL to BCD conversion.
Quote:
.386
.MMX
.XMM
.model flat, stdcall
option casemap :none
include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\masm32.inc
include \MASM32\INCLUDE\gdi32.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
include \MASM32\INCLUDE\fpu.inc
includelib \MASM32\LIB\masm32.lib
includelib \MASM32\LIB\gdi32.lib
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib
includelib \MASM32\LIB\fpu.lib
.data
mestitle db "Bomz",0
VAR1 REAL4 11111.0
VAR2 dt ?
.data?
buffer db 512 dup(?)
.code
start:
finit
fld VAR1
;fstp VAR2
invoke FpuFLtoA, 0, 10, ADDR buffer, SRC1_FPU or SRC2_DIMM
invoke MessageBox,0, ADDR buffer,ADDR mestitle,MB_ICONASTERISK
invoke ExitProcess,0
end start
Quote:
.386
.MMX
.XMM
.model flat, stdcall
option casemap :none
include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\masm32.inc
include \MASM32\INCLUDE\gdi32.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
include \MASM32\INCLUDE\fpu.inc
includelib \MASM32\LIB\masm32.lib
includelib \MASM32\LIB\gdi32.lib
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib
includelib \MASM32\LIB\fpu.lib
.data
VAR1 REAL4 0.0, 0.0, 0.0, 0.0
VAR2 REAL4 1.0, 1.0, 1.0, 1.0
VAR3 REAL4 1.0, 1.0, 1.0, 1.0
VAR4 REAL4 1.0, 1.0, 1.0, 1.0
mestitle db "SSE",0
shell32 db 'shell32.dll',0
MessBox MSGBOXPARAMS <sizeof MSGBOXPARAMS, 0, 0, offset buffer,\
offset mestitle, MB_OK OR MB_USERICON, 48, 0, 0, 0>
.data?
buffer db 512 dup(?)
string db 32 dup(?)
.code
start:
lea esi, VAR2
lea edi, VAR1
movdqa XMM0, [esi]
movdqa XMM1, [esi+16]
ADDPS XMM0, XMM1
movdqa [edi], XMM0
finit
fld dword ptr[VAR1]
invoke FpuFLtoA, 0, 1110h, ADDR string, SRC1_FPU or SRC2_DIMM
invoke lstrcpy, addr buffer, addr string
fld dword ptr[VAR1+4]
invoke FpuFLtoA, 0, 1110h, ADDR string, SRC1_FPU or SRC2_DIMM
invoke lstrcat, addr buffer, addr string
fld dword ptr[VAR1+8]
invoke FpuFLtoA, 0, 1110h, ADDR string, SRC1_FPU or SRC2_DIMM
invoke lstrcat, addr buffer, addr string
fld dword ptr[VAR1+12]
invoke FpuFLtoA, 0, 1110h, ADDR string, SRC1_FPU or SRC2_DIMM
invoke lstrcat, addr buffer, addr string
invoke LoadLibrary, addr shell32
mov MessBox.hInstance, eax
invoke MessageBeep, MB_ICONASTERISK
invoke MessageBoxIndirect, addr MessBox
invoke ExitProcess,0
end start
Quote:
.386
.MMX
.XMM
.model flat, stdcall
option casemap :none
include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib
include \MASM32\INCLUDE\masm32.inc
includelib \MASM32\LIB\masm32.lib
.data
var1 real4 4.1f, 1.0f, 1.0f, 1.0f
var3 real4 1.0f, 1.0f, 1.0f, 1.0f
var2 INT64 0
mestitle db "Bomz",0
form db "0.%ldx10^%ld", 0
shell32 db 'shell32.dll',0
MessBox MSGBOXPARAMS <sizeof MSGBOXPARAMS, 0, 0,\
offset buffer, 0, MB_OK OR MB_USERICON, 48, 0, 0, 0>
.data?
buffer db 512 dup(?)
.code
start:
invoke LoadLibrary, addr shell32
mov MessBox.hInstance, eax
lea esi, var1
lea edi, var3
movdqa XMM0, [esi]
movdqa XMM1, [edi]
ADDPS XMM1, XMM0
movdqa [edi], XMM1
finit
fld dword ptr[var3]
fstp var2
invoke FloatToStr2,var2,addr buffer
invoke MessageBeep, MB_ICONASTERISK
invoke MessageBoxIndirect, addr MessBox
invoke ExitProcess,0
end start
You got it! Here is the MB equivalent:
include \masm32\MasmBasic\MasmBasic.inc
.data
VAR1 REAL4 0.0, 0.0, 0.0, 0.0
VAR2 REAL4 1.0, 1.0, 1.0, 1.0
VAR3 REAL4 1.0, 1.0, 1.0, 1.0
Init
lea esi, VAR2
lea edi, VAR1
movdqa XMM0, [esi]
movdqa XMM1, [esi+16]
ADDPS XMM0, XMM1
movdqa [edi], XMM0
fld REAL4 PTR VAR1
fld REAL4 PTR VAR1+4
fld REAL4 PTR VAR1+8
fld REAL4 PTR VAR1+12
deb 1, "Result:", ST(0), ST(1), ST(2), ST(3)
Exit
end start
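For readers without an assembler at hand, the packed add in the examples above can be modelled in Python. This is a sketch under the assumption that a list of four floats stands in for an XMM register; `real4` and `addps` are hypothetical helper names, and each lane is rounded to single precision the way ADDPS would.

```python
import struct

def real4(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def addps(dst, src):
    """Lane-wise single-precision add of two 4-element vectors."""
    return [real4(real4(a) + real4(b)) for a, b in zip(dst, src)]

if __name__ == "__main__":
    print(addps([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]))  # [2.0, 2.0, 2.0, 2.0]
    print(addps([4.1, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]))
```

All four lanes are added by one instruction, so the first call corresponds to the VAR2 + VAR3 case and the second to the 4.1 + 1.0 case from the earlier post.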
I know Turbo Basic well. But now a new language... Still, I like your examples.
I found Quick Sort in the MASM32 help.
http://turbo-basic.narod.ru/index.html
http://turbo-basic.narod.ru/TBDEMO-256-H101.rar
I found two errors in the Turbo compiler. First, it does not return the correct offset for character strings; for assembler, though, a numeric array can be used just as well. This was corrected only in Quick Basic.
Second (not an error, but undocumented): a 64 KB string can be cut from the left, but from the right you can only cut a 32 KB string.
Porting your example should be very easy, see attachment.
Is it possible to compile it with MASM?
DownLoadMaster has a problem, so I wrote this program to cut a file from the end back to the first nonzero byte.
Quote from: bomz on December 11, 2011, 10:34:48 PM
Is it possible to compile it with MASM?
Sure, that's the whole point: MasmBasic is a library for ML.exe or JWasm.exe, and it works in parallel to the Masm32 installation, i.e. you can use both the Masm32 macros and library functions and the whole BASIC stuff.
INT64 SSE2 http://www.tommesani.com/SSE2MMX.html
Quote:
.386
.MMX
.XMM
.model flat, stdcall
option casemap :none
include \MASM32\INCLUDE\windows.inc
include \MASM32\INCLUDE\user32.inc
include \MASM32\INCLUDE\kernel32.inc
includelib \MASM32\LIB\user32.lib
includelib \MASM32\LIB\kernel32.lib
.data
; VAR1 dd 0, 1, 0, 0
; VAR2 dd 1, 0, 0, 0
VAR1 INT64 4294967295, 0
VAR2 INT64 1, 0
.data?
String db 20 dup(?)
RESULT dt ?
.code
start:
movdqa XMM0, XMMWORD PTR[VAR1]
movdqa XMM1, XMMWORD PTR[VAR2]
PADDQ XMM1, XMM0
movdqa XMMWORD PTR[VAR2], XMM1
finit
fild qword ptr[VAR2]
fbstp RESULT
lea esi,String
add esi, 18
lea edi,RESULT
xor edx, edx
xor ebx, ebx
mov ecx, 9
@@:
dec esi
dec esi
mov bl, byte ptr [edi]
shl ebx,4
add bh, 48
mov dl, bh
xor bh, bh
shr ebx, 4
add bl, 48
mov dh, bl
mov word ptr [esi], dx
inc edi
loop @B
invoke MessageBox,0,addr String,0,MB_ICONASTERISK
invoke ExitProcess,0
end start
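The listing above does two things: a 64-bit lane-wise add (PADDQ) and a conversion of the FBSTP packed-BCD result into ASCII digits. The Python sketch below models both steps so the expected string can be checked; all function names are hypothetical, and a (low, high) tuple stands in for the XMM register.

```python
MASK64 = (1 << 64) - 1

def paddq(dst, src):
    """Add two (low, high) pairs of 64-bit integers lane by lane, wrapping on overflow."""
    return tuple((a + b) & MASK64 for a, b in zip(dst, src))

def to_packed_bcd(value):
    """Encode a non-negative integer below 10**18 the way FBSTP stores it:
    9 bytes of two digits each, least significant byte first, then a sign byte."""
    assert 0 <= value < 10 ** 18
    out = bytearray(10)
    for i in range(9):
        value, pair = divmod(value, 100)
        out[i] = ((pair // 10) << 4) | (pair % 10)
    return bytes(out)

def bcd_to_ascii(bcd):
    """Turn the 9 digit bytes into 18 decimal characters, most significant first."""
    digits = []
    for byte in reversed(bcd[:9]):
        digits.append(chr(48 + (byte >> 4)))     # high nibble, as 'add bh, 48'
        digits.append(chr(48 + (byte & 0x0F)))   # low nibble, as 'add bl, 48'
    return "".join(digits)

if __name__ == "__main__":
    var2 = paddq((1, 0), (4294967295, 0))
    print(bcd_to_ascii(to_packed_bcd(var2[0])))   # 000000004294967296
```

The sum 4294967295 + 1 no longer fits in 32 bits, which is why the example demonstrates that PADDQ really carries across the full 64-bit lane.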
Quote:
.686
.xmm
CSEG segment use16
assume cs:CSEG, ds:CSEG, es:CSEG, ss:CSEG
movaps xmm0,xmmword ptr es:[esi]
The program hangs when it tries to execute an SSE instruction. Is it impossible to use 32-bit addressing and SSE in a USE16 segment?
Sorry, it is a Microsoft Virtual Machine SSE problem.
*As always, a good idea comes when you try to formulate the problem for other people.