Possible problems with SSE usage.

Started by KeepingRealBusy, July 07, 2010, 12:57:11 AM

dedndave

just a thought, here - it may or may not offer a speed advantage
copy the buffer contents into a "safe" buffer that is known to have adequate tail-end space for the over-shoot
i know it takes time to copy, but at least you wouldn't have to test inside the loop
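A minimal C sketch of the "safe buffer" idea, written in C rather than MASM so it compiles anywhere. The function name, the 16-byte block size and the trick of filling the tail with a value that cannot match are assumptions for illustration. The inner per-block loop stands in for one SSE compare-and-mask step. The scratch buffer is rounded up to whole blocks, so a loop that always reads full blocks never leaves memory it owns.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK 16  /* bytes examined per step: the width of one SSE register */

/* Hypothetical helper. Copies src into a scratch buffer rounded up to whole
   BLOCKs (always at least one spare byte), then scans full blocks only.
   Returns the index of the first byte equal to c, len if c does not occur,
   or (size_t)-1 if the scratch buffer cannot be allocated. */
static size_t find_byte_padded(const unsigned char *src, size_t len,
                               unsigned char c)
{
    size_t cap = (len / BLOCK + 1) * BLOCK;   /* multiple of BLOCK, > len */
    unsigned char *buf = malloc(cap);
    if (buf == NULL)
        return (size_t)-1;

    if (len > 0)
        memcpy(buf, src, len);
    /* Fill the tail with a value that can never match, so the scan needs
       no per-byte "am I past len?" test. */
    memset(buf + len, (unsigned char)~c, cap - len);

    size_t found = len;
    for (size_t i = 0; i < cap && found == len; i += BLOCK) {
        /* Stands in for one 16-byte compare plus mask test (for example
           PCMPEQB/PMOVMSKB) in an SSE version. */
        for (size_t j = 0; j < BLOCK; j++) {
            if (buf[i + j] == c) {
                found = i + j;
                break;
            }
        }
    }

    free(buf);
    return found;
}
```

This only works when the length is known before the copy. For a strlen-style scan you would need the length first, so the copy would cost a full extra pass, which is the trade-off dedndave mentions.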

KeepingRealBusy

Quote from: jj2007 on July 15, 2010, 06:33:42 AM
Dave,
Thanks for looking at that.

Re Lingo's pop-the-ret-address technique: interrupts seem not to be a problem, although this is apparently not documented anywhere.

Re point 2: you are perfectly right; there is a risk of reading past the end of a VirtualAlloc buffer. Any suggestions? The routine is already a bit slow ::)

As for popping two args from the stack, I remember Lingo doing:


    pop ecx    ;    Get return.
    pop eax    ;    Get first.
    pop ebx    ;    Get second.
    push eax  ;     Save relocated  return.


You now have a protected return address, with the two args in eax and ebx; the copies left on the stack are now below esp and unprotected. This will work, but don't count on those stack copies. Another trick was


    mov    eax,[esp+4]    ;    Get arg
    mov    [esp+4],esi     ;    Save esi over the arg.


This will also work.
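To make "protected" versus "unprotected" concrete, here is a toy C model (not MASM) of the first trick. It uses push ecx for the relocated return address, since ecx is the register that holds it. All names are hypothetical. The model shows that the old copy of the first arg ends up below ESP, where the next push overwrites it. It also shows that the final return leaves ESP where the caller had it before pushing the args. In other words, the callee has removed both args, as a stdcall RET 8 would, so a caller that also did its own add esp,8 would end up 8 bytes off.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a 32-bit stack that grows downward: mem[] is memory, esp is
   the index of the word at the top of the stack. Illustration only. */
enum { STACK_WORDS = 16 };

typedef struct {
    uint32_t mem[STACK_WORDS];
    size_t   esp;
} toy_stack;

static void push(toy_stack *s, uint32_t v) { s->mem[--s->esp] = v; }
static uint32_t pop(toy_stack *s)          { return s->mem[s->esp++]; }

typedef struct {
    uint32_t eax, ebx;        /* the two args, now in registers              */
    uint32_t stale_before;    /* old arg1 slot (below ESP) before a push     */
    uint32_t stale_after;     /* the same slot after that push               */
    uint32_t ret_addr;        /* what RET pops                               */
    size_t   esp_at_entry;    /* caller's ESP before it pushed the args      */
    size_t   esp_after_ret;   /* ESP once the callee has returned            */
} trick_result;

static trick_result run_pop_trick(uint32_t arg1, uint32_t arg2, uint32_t ret)
{
    toy_stack s = { {0}, STACK_WORDS };
    trick_result r;
    r.esp_at_entry = s.esp;

    push(&s, arg2);                 /* caller: push arg2                 */
    push(&s, arg1);                 /* caller: push arg1                 */
    push(&s, ret);                  /* CALL pushes the return address    */

    uint32_t ecx = pop(&s);         /* pop ecx   ; get return            */
    r.eax = pop(&s);                /* pop eax   ; get first             */
    r.ebx = pop(&s);                /* pop ebx   ; get second            */
    push(&s, ecx);                  /* push ecx  ; relocated return      */

    size_t arg1_slot = s.esp - 1;   /* old arg1 copy now lies below ESP  */
    r.stale_before = s.mem[arg1_slot];
    push(&s, 0xDEADBEEFu);          /* any later push, e.g. in a macro   */
    r.stale_after = s.mem[arg1_slot];
    (void)pop(&s);                  /* balance that push                 */

    r.ret_addr = pop(&s);           /* RET                               */
    r.esp_after_ret = s.esp;
    return r;
}
```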

If you can wait just a bit: I am working on an entire set of string routines that are safe and (mostly) SSE. Right now I have towlower, towupper, wcslwr, wcsupr, wcslwr_s and wcsupr_s, and am working on wcscpy (a modification of WordAlign from my zip here), then wcslen, wcschr, wcscmp, wcsstr, and then the wcsn.... Then I'll work on the normal string versions. These are all for my own use, but I'll publish them in a source zip for others to blatantly steal (right, Lingo, isn't that what they do to yours?).

I have a question about how to handle error returns like the ones the crt__ functions return. I was thinking of returning error codes in edx and the normal result in eax. Each function would end with an "or edx,edx" so that the caller could just "jz Good" or "jnz Bad". Since these are not CDECL, INVOKE emits no add esp,n after the call, so nothing would destroy the flags before the caller tests them.
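A C sketch of the proposed convention. A struct stands in for the eax/edx pair. The routine krb_wcslwr_s is hypothetical and does not reproduce the CRT's _wcslwr_s. Its ASCII-only case mapping and its use of EINVAL as the error value are assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Stand-in for the register pair: normal result in "eax", error in "edx".
   In asm the routine would end with "or edx,edx", so the caller can test
   the flags directly with jz Good / jnz Bad. */
typedef struct {
    uintptr_t eax;   /* normal return: here, the buffer pointer */
    uint32_t  edx;   /* 0 on success, otherwise an errno-style code */
} reg_pair;

/* Hypothetical bounds-checked lower-caser (ASCII A-Z only). It fails
   without touching buf if buf is NULL or has no terminator within cap. */
static reg_pair krb_wcslwr_s(wchar_t *buf, size_t cap)
{
    reg_pair r = { 0, 0 };
    if (buf == NULL) {
        r.edx = EINVAL;
        return r;
    }

    size_t n = 0;
    while (n < cap && buf[n] != L'\0')
        n++;
    if (n == cap) {                  /* unterminated within cap elements */
        r.edx = EINVAL;
        return r;
    }

    for (size_t i = 0; i < n; i++)
        if (buf[i] >= L'A' && buf[i] <= L'Z')
            buf[i] = (wchar_t)(buf[i] + (L'a' - L'A'));

    r.eax = (uintptr_t)buf;
    return r;
}
```

On the caller side this reads as `r = krb_wcslwr_s(buf, cap); if (r.edx) { /* Bad */ }`. The asm version gets the same effect with a single jnz, provided nothing between the callee's ret and the jump changes the flags.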

I have even more questions about some of the crt_ comments in \crt\src like "the return string can be shorter or longer than the input string". Maybe for MBCS, but for Unicode?

The following are some of my times:

AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ (SSE3)
558     cycles for wInstr (MasmBasic)
16444   cycles for StrStrIW
37      cycles for crt_towlower
10      cycles for KRBtowlower
1002    cycles for crt__wcslwr
723     cycles for KRBwcslwr
383     cycles for KRBwcslwr2
32      cycles for crt_towupper
10      cycles for KRBtowupper
576     cycles for crt__wcsupr
829     cycles for KRBwcsupr
411     cycles for KRBwcsupr2
--- done ---

Dave.

jj2007

Quote from: KeepingRealBusy on July 15, 2010, 09:59:01 PM
As far as the pop two args from the stack, I remember Lingo doing:


    pop ecx    ;    Get return.
    pop eax    ;    Get first.
    pop ebx    ;    Get second.
    push eax  ;     Save relocated  return.


Dave,
You probably meant push ecx, not eax.
However, is it really needed?

Q. Why does MS-DOS switch stacks for hardware interrupts?
Quote:
APPLIES TO Microsoft Windows 3.1 Standard Edition

http://en.wikipedia.org/wiki/Task_State_Segment#Inner_Level_Stack_Pointers
Quote:
The TSS contains 6 fields for specifying the new stack pointer when a privilege level change happens. The field SS0 contains the stack segment selector for CPL=0, and the field ESP0/RSP0 contains the new ESP/RSP value for CPL=0. When an interrupt happens in protected (32-bit) mode, the x86 CPU will look in the TSS for SS0 and ESP0 and load their values into SS and ESP respectively. This allows for the kernel to use a different stack than the user program, and also have this stack be unique for each user program.
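For reference, here is the 32-bit TSS the quote refers to, written as a C struct. The layout follows the Intel SDM, Vol. 3 (32-bit task-state segment, 104 bytes). The struct and field names are illustrative. The static asserts check the offsets of the fields the quote talks about: ESP0 at offset 4 and SS0 at offset 8.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit Task State Segment. Each 16-bit selector field is followed by a
   reserved 16-bit half so that every field keeps its documented offset. */
typedef struct {
    uint16_t link, res0;                 /* previous task link             */
    uint32_t esp0; uint16_t ss0, res1;   /* stack loaded on entry to CPL 0 */
    uint32_t esp1; uint16_t ss1, res2;   /* ... CPL 1                      */
    uint32_t esp2; uint16_t ss2, res3;   /* ... CPL 2                      */
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, res4, cs, res5, ss, res6;
    uint16_t ds, res7, fs, res8, gs, res9;
    uint16_t ldt, res10;                 /* LDT segment selector           */
    uint16_t trap;                       /* bit 0: T (debug trap) flag     */
    uint16_t iomap_base;                 /* offset of I/O permission map   */
} tss32;

_Static_assert(offsetof(tss32, esp0) == 4,  "ESP0 at offset 4");
_Static_assert(offsetof(tss32, ss0) == 8,   "SS0 at offset 8");
_Static_assert(offsetof(tss32, cr3) == 28,  "CR3 at offset 28");
_Static_assert(offsetof(tss32, es) == 72,   "ES at offset 72");
_Static_assert(offsetof(tss32, iomap_base) == 102, "I/O map base at 102");
_Static_assert(sizeof(tss32) == 104, "32-bit TSS is 104 bytes");
```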

http://stackoverflow.com/questions/866672/switching-stacks-in-c
Quote:
On 16-bit DOS, an interrupt could occur and this interrupt would be initially running on the same stack. If you got interrupted in the middle of the operation, the interrupt could crash because you only updated ss and not sp.

On Windows, and any other modern environment, each user mode thread gets its own stack. If your thread is interrupted for whatever reason, its stack and context are safely preserved.

KeepingRealBusy

JJ,

But how do you get into ring 0 from your program, and how does the system know how to get back to you? The CPU needs to save your return information somewhere, and that somewhere is your current stack; THEN it can swap the stacks and ensure that the interrupt stack is big enough for the processing.

Anyone else, am I wrong here?

Dave.


KeepingRealBusy

Quote from: E^cube on July 16, 2010, 12:54:28 AM
Yes.
In what way? I mean, how does the hardware change the stack on the fly without destroying any registers?

Dave.

ecube

it doesn't, I just really felt like saying yes :) I apologize

KeepingRealBusy

Apology accepted, but not necessary.

Are there any experts around who understand and can explain a privilege-level switch?

sinsi

Intel manuals, especially volume 3a chapter 6.3 "task switching"
Light travels faster than sound, that's why some people seem bright until you hear them.

KeepingRealBusy

Quote from: sinsi on July 16, 2010, 02:23:55 AM
Intel manuals, especially volume 3a chapter 6.3 "task switching"


sinsi,

Thank you. I knew that someday I would have to go through all of this. About 40 pages of documentation and diagrams later (AMD PDFs), I can safely say that nothing we are doing here will be affected by a task switch. The first thing that happens is that the stack pointer is saved in the TSS (system) and loaded with an appropriate new stack pointer, then the flags and eip are pushed onto the NEW frame, then all regs are saved in the TSS. The opposite sequence of actions restarts the task.
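For comparison, 32-bit Windows handles ordinary interrupts through interrupt and trap gates, not hardware task switches. Per the Intel SDM (Vol. 3, stack switching for interrupt and exception handlers), a CPL 3 to CPL 0 transition works like this. The CPU loads SS:ESP from the SS0/ESP0 fields of the current TSS. It then pushes the old SS, ESP, EFLAGS, CS and EIP onto that new ring-0 stack, plus an error code for some exceptions. The user-mode stack is not written at all, and IRET uses the saved SS:ESP to switch back. Below is a C sketch of that frame, with a toy push routine. The names and the example selector values in the test are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Frame left on the ring-0 stack after a CPL 3 -> CPL 0 transition through
   a 32-bit interrupt or trap gate, lowest address first. IRET pops it back.
   Exceptions that carry an error code push it below EIP (not modelled). */
typedef struct {
    uint32_t eip;      /* where the interrupted code resumes         */
    uint32_t cs;       /* old CS selector, in the low 16 bits         */
    uint32_t eflags;
    uint32_t esp;      /* old user-mode ESP ...                       */
    uint32_t ss;       /* ... and SS, so IRET can switch stacks back  */
} priv_frame;

_Static_assert(sizeof(priv_frame) == 20, "five 32-bit slots");

/* Pushes the frame in the processor's order (SS first, EIP last) onto a
   toy downward-growing kernel stack whose initial top is esp0; returns the
   index of the new top, where the frame begins. */
static size_t push_priv_frame(uint32_t *kstack, size_t esp0,
                              uint32_t ss, uint32_t esp, uint32_t eflags,
                              uint32_t cs, uint32_t eip)
{
    size_t top = esp0;
    kstack[--top] = ss;
    kstack[--top] = esp;
    kstack[--top] = eflags;
    kstack[--top] = cs;
    kstack[--top] = eip;
    return top;
}
```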

Only something you do in your task (push, mov [esp+n],DataOrReg, etc) would wipe out an unprotected stack location.

So, JJ, your code is safe. And yes, I meant "push ecx", and I would use it rather than leave the return address unprotected. With some of the MASM32 macros, I would not trust that an invocation wouldn't push a register for a calculation or a call and wipe out an unprotected return address ("print" comes to mind).

Dave.

dedndave

Dave...
this topic has been beaten to death a few times
it seems the members are split (50-50 ?) on this issue
some say it is ok to use space under [ESP] - some say it is not
the best we seem to manage is to agree to disagree  :P

out of old-school habit, i avoid using stack space under the stack pointer
those who argue it is ok say that windows protects that space, as interrupts, other threads, etc, are never allowed to access it
you'll have to decide for yourself   :bg

sinsi

I've used parameters as storage before with no problems

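myproc:
  xchg ebx,[esp+4]   ; save ebx in the first arg's slot, load the first arg into ebx
  xchg esi,[esp+8]   ; save esi in the second arg's slot, load the second arg into esi
  ...
  pop ecx            ; return address
  pop ebx            ; restore ebx from the first arg's slot
  pop esi            ; restore esi from the second arg's slot
  jmp ecx            ; return; both arg slots are popped (callee cleans up, like stdcall)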

I figure that if you reserve space (sub esp,xxx) it's yours, and that params pushed by the caller ([esp+x]) are fair game too.
On the other hand, anything below esp ([esp-x]) is undefined and likely to get zapped at some stage (is that what you mean by 'under [esp]', dedndave?), especially when you call a proc with a stack frame or simply forget what you did 50 lines ago  :bdg

It's all personal, that's why we have the freedom of asm and not the constraints of a hll.

jj2007

Quote from: KeepingRealBusy on July 16, 2010, 04:05:51 AM
Only something you do in your task (push, mov [esp+n],DataOrReg, etc) would wipe out an unprotected stack location.

So, JJ, your code is safe, and yes, I meant "push ecx", and I would use this instead of leaving the return address unprotected. With some of the MASM32 macros, I would not trust that some invocation wouldn't push a register for a calculation or a call and wipe out an unprotected return address ("print" comes to mind).

Dave.

Dave, thanks for reading up on this in the "official" manuals. My Wiki quote on TSS said something similar, but Intel is a more reliable source.
So it boils down to "yes, you can do it but make sure you know what you are doing in that proc". And, for example, print obviously pushes parameters.

dedndave

Quote:
Only something you do in your task (push, mov [esp+n],DataOrReg, etc) would wipe out an unprotected stack location.

hang on - is that a quote from the intel manual?
and - if so - the OS could possibly alter that, no?

clive

Quote from: sinsi
I've used parameters as storage before with no problems

myproc:
  xchg ebx,[esp+4]
  xchg esi,[esp+8]


I would worry about the speed of XCHG with a memory operand: it is atomic (implicitly locked) and exposes the speed of the underlying memory (DRAM).

On my 3 GHz Prescott the "xchg ebx,[ebp+8]" takes ~100 machine cycles, or 33 ns; the memory access time is ~17 ns.
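XCHG with a memory operand is always executed as a locked operation, whether or not a LOCK prefix is written (Intel SDM, XCHG instruction reference). The arithmetic checks out: 100 cycles at 3 GHz is 100 / (3 x 10^9) s, about 33 ns. The same distinction shows up in C11. On x86, compilers typically emit XCHG for atomic_exchange, while an ordinary swap becomes plain loads and stores. The sketch below is single-threaded, so both versions give the same values, and the function names are illustrative. In sinsi's prolog, the plain version corresponds to a mov/mov pair through a scratch register, as in the mov eax,[esp+4] / mov [esp+4],esi trick earlier in the thread.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Ordinary swap through a register. On x86 this typically compiles to
   MOV loads/stores: no locked bus cycle, but not atomic either. */
static uint32_t swap_plain(uint32_t *slot, uint32_t reg)
{
    uint32_t old = *slot;
    *slot = reg;
    return old;
}

/* Atomic swap. On x86, compilers typically emit XCHG reg,[mem] here,
   which the processor always executes as a locked operation. */
static uint32_t swap_atomic(_Atomic uint32_t *slot, uint32_t reg)
{
    return atomic_exchange(slot, reg);
}
```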
It could be a random act of randomness. Those happen a lot as well.