Possible problems with SSE usage.

Started by KeepingRealBusy, July 07, 2010, 12:57:11 AM

KeepingRealBusy

Quote from: Rockoon on July 14, 2010, 07:30:30 AM

Ah, now here is the rub.

On the one hand we have code that is going to overshoot its buffer on purpose, and then from time to time it is going to just catch an exception if one is raised because it not only overshot the buffer, it also overshot the contiguous memory pages the buffer resides in.

On the other we have code that will divide by zero from time to time, where the programmer is going to just catch the exception if one is raised, and then execute default-value semantics, error out, or whatever.

One of these is not like the other. In the buffer case, not only are we overshooting the buffer on purpose, but sometimes we also overshoot the process space too? Wow. Just wow. In the divide by zero case, it's accidental or incidental, and not on purpose.

One area where I normally just want to swallow exceptions is fsqrt. Luckily the FPU lets us do just that.

Specifically "not only are we overshooting the buffer on purpose, but sometimes we also overshoot the process space too". Are you saying that the exception handler should correct this? In the case of loading 16 bytes of a string using SSE, the string is absolutely valid and null terminated and in the buffer, but if it is short and at the end of the buffer, then you would get the fault. What should the exception handler do, or are you saying that this should not be handled by an exception handler?

Dave.

Rockoon

Quote from: KeepingRealBusy on July 14, 2010, 01:23:41 PM
Specifically "not only are we overshooting the buffer on purpose, but sometimes we also overshoot the process space too". Are you saying that the exception handler should correct this? In the case of loading 16 bytes of a string using SSE, the string is absolutely valid and null terminated and in the buffer, but if it is short and at the end of the buffer, then you would get the fault. What should the exception handler do, or are you saying that this should not be handled by an exception handler?

Dave.

I'm saying that you normally shouldn't read 16 bytes from a buffer that ends in less than 16 bytes, and absolutely never read any bytes beyond the end of your allocation space without something having gone terribly wrong.

If you are intent on reading 16 bytes at a time then clearly performance is your concern. Since performance is your concern..

(A) align your strings so that they are 16-byte aligned.
(B) allocate space in 16-byte multiples so that you never overshoot your buffer.
(C) stop relying on NULL to terminate your strings. Store the length instead. You can still use a NULL to make them compatible with other routines.
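
Points (A) to (C) can be combined in one small container. The following is a minimal sketch in C rather than MASM, and it is not code from this thread: `PaddedStr`, `round_up16`, `padded_str_init` and `padded_str_free` are hypothetical names, and it assumes a C11 library that provides `aligned_alloc` (MSVC does not; `_aligned_malloc` would be the stand-in there).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container, for illustration only. */
typedef struct {
    size_t len;   /* (C) length is stored, not searched for        */
    char  *data;  /* (A) 16-byte aligned, (B) capacity is n * 16   */
} PaddedStr;

/* Round n up to the next multiple of 16. */
static size_t round_up16(size_t n)
{
    return (n + 15) & ~(size_t)15;
}

/* Copy src into a fresh aligned, padded buffer. Returns 0 on success. */
int padded_str_init(PaddedStr *ps, const char *src)
{
    size_t len = strlen(src);
    size_t cap = round_up16(len + 1);   /* room for the terminator */
    char *p = aligned_alloc(16, cap);   /* cap is a multiple of 16 */
    if (p == NULL)
        return -1;
    memcpy(p, src, len);
    memset(p + len, 0, cap - len);      /* zero terminator and padding */
    ps->len = len;
    ps->data = p;
    return 0;
}

void padded_str_free(PaddedStr *ps)
{
    free(ps->data);
    ps->data = NULL;
    ps->len = 0;
}
```

With this layout, any 16-byte load that starts at a 16-byte multiple below the capacity stays inside the allocation, and the zero padding keeps the buffer usable by routines that expect a terminator.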

When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

dedndave

i think that "overshoot" happens quite often
a lot of functions are passed string/buffer pointers without buffer size values
they assume the null terminator to be valid, i guess   :bg

it probably also happens when functions try to dword align themselves inside a buffer
care isn't always taken to ensure that accesses a few bytes above and below the buffer are avoided

when i wrote the ling long kai fang routines, i was extra careful to avoid this, and you have to specify both in and out sizes as parms
those routines dword-align themselves inside both buffers
i may have specified an aligned input buffer base - i don't remember at the moment
but, there is some code in there to avoid overshoot above the input value buffer
and some more code to avoid overshoot at the end of the output buffer

KeepingRealBusy

Quote from: Rockoon on July 14, 2010, 05:26:22 PM
Quote from: KeepingRealBusy on July 14, 2010, 01:23:41 PM
Specifically "not only are we overshooting the buffer on purpose, but sometimes we also overshoot the process space too". Are you saying that the exception handler should correct this? In the case of loading 16 bytes of a string using SSE, the string is absolutely valid and null terminated and in the buffer, but if it is short and at the end of the buffer, then you would get the fault. What should the exception handler do, or are you saying that this should not be handled by an exception handler?

Dave.

I'm saying that you normally shouldn't read 16 bytes from a buffer that ends in less than 16 bytes, and absolutely never read any bytes beyond the end of your allocation space without something having gone terribly wrong.

If you are intent on reading 16 bytes at a time then clearly performance is your concern. Since performance is your concern..

(A) align your strings so that they are 16-byte aligned.
(B) allocate space in 16-byte multiples so that you never overshoot your buffer.
(C) stop relying on NULL to terminate your strings. Store the length instead. You can still use a NULL to make them compatible with other routines.

You missed the operative sentence in my first post:

Quote
Note:

This coding fix only applies to a general purpose library routine in which the library routine has no knowledge of how the data was created or where it was saved.

Dave.

Rockoon

Quote from: KeepingRealBusy on July 14, 2010, 06:03:00 PM
You missed the operative sentence in my first post:

Quote
Note:

This coding fix only applies to a general purpose library routine in which the library routine has no knowledge of how the data was created or where it was saved.

Dave.


No, I didn't. A general purpose routine wouldn't be swallowing protection violations. That's decidedly not general purpose.

You have made the decision to not be general purpose when you started reading 16-bytes at a time.

jj2007

Quote from: Rockoon on July 14, 2010, 06:11:53 PM
A general purpose routine wouldn't be swallowing protection violations. That's decidedly not general purpose.

lstrcpy is a general purpose routine... and I fully agree, it should not swallow protection violations

Quote
You have made the decision to not be general purpose when you started reading 16-bytes at a time.

Although there are still a few non-SSE2 machines around, it might be time to declare reading 16-bytes at a time "normal".

KeepingRealBusy

Quote from: Rockoon on July 14, 2010, 06:11:53 PM
Quote from: KeepingRealBusy on July 14, 2010, 06:03:00 PM
You missed the operative sentence in my first post:

Quote
Note:

This coding fix only applies to a general purpose library routine in which the library routine has no knowledge of how the data was created or where it was saved.

Dave.


No, I didn't. A general purpose routine wouldn't be swallowing protection violations. That's decidedly not general purpose.

You have made the decision to not be general purpose when you started reading 16-bytes at a time.

My routines:

    Don't swallow protection violations.
    Read 16 bytes at a time.
    Require valid null terminated strings.
    Handle both Wide character (Unicode) and normal character strings.

There is nothing that says that a 16 BYTE reading function cannot be a general purpose routine; the two are not mutually exclusive.

What, exactly, do you not like about my routines?

Dave.

Rockoon

Quote from: jj2007 on July 14, 2010, 06:53:13 PM
Quote from: Rockoon on July 14, 2010, 06:11:53 PM
A general purpose routine wouldn't be swallowing protection violations. That's decidedly not general purpose.

lstrcpy is a general purpose routine...

Not if it swallows protection violations.


Quote from: jj2007 on July 14, 2010, 06:53:13 PM
Although there are still a few non-SSE2 machines around, it might be time to declare reading 16-bytes at a time "normal".

It really isn't an issue of "support." This is about design. If you swallow the page faults, then you are special purpose.

My last post was in error, however, since clearly the routine could be constructed to only make aligned reads even for unaligned input (result: never cross a page boundary in error when valid data was supplied to it) and that would make it general purpose.
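
The aligned-reads idea can be sketched as follows. This is a minimal illustration in C with SSE2 intrinsics rather than MASM, and it is not one of the routines posted in this thread: `strlen_sse2_aligned` is a hypothetical name, `__builtin_ctz` assumes GCC or Clang, and the code relies on the fact that an aligned 16-byte block never straddles a page boundary. Reading the few bytes in front of the string is outside what the C standard allows, so take it as a model of the machine-level behaviour, not as portable C.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routine: string length using only aligned 16-byte loads. */
size_t strlen_sse2_aligned(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    uintptr_t addr = (uintptr_t)s;
    unsigned mis = (unsigned)(addr & 15);            /* bytes before s in its block */
    const __m128i *p = (const __m128i *)(addr - mis);

    /* First block: aligned load, then shift out the bytes that precede s. */
    unsigned mask = (unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_load_si128(p), zero));
    mask >>= mis;
    if (mask != 0)
        return (size_t)__builtin_ctz(mask);

    /* Remaining blocks: each load starts on a 16-byte boundary, so it lies
       entirely inside one page, and that page holds string data. */
    for (;;) {
        ++p;
        mask = (unsigned)_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128(p), zero));
        if (mask != 0)
            return (size_t)((const char *)p - s) + (size_t)__builtin_ctz(mask);
    }
}
```

Because every block that is read contains at least one byte of the string or its terminator, a valid null-terminated string can sit at the very end of a committed page without the routine touching the next page.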

Rockoon

Quote from: KeepingRealBusy on July 14, 2010, 07:32:00 PM
What, exactly, do you not like about my routines?

I never said that I didn't like your routines. I never even looked at them prior to just now. I said that swallowing the page faults is not general purpose, which some posters seemed to consider a valid strategy (that a page fault wasn't an error, that the string could still have been terminated validly).

KeepingRealBusy

I stand corrected. I thought you were addressing your comments to my code and not to the other side discussion about SEH and faults.

What you see here of my code was only a little piece to handle unaligned (odd BYTE aligned) wide characters. From what I have learned, I will redo all my character routines, and implement wide character routines as well.

Dave.

jj2007

So here is an attempt to start a "16-byte safe collection": good ol' string len.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
44      cycles for StrLen1 (safe)
47      cycles for StrLen2 (safe)
25      cycles for MasmBasic (unsafe)
132     cycles for MasmLib

44      cycles for StrLen1
47      cycles for StrLen2
25      cycles for MasmBasic
132     cycles for MasmLib

Results:
100 bytes for StrLen1
100 bytes for StrLen2
100 bytes for MasmBasic

Code sizes:
75      for StrLen1
75      for StrLen2
87      for MasmBasic

KeepingRealBusy

JJ,

I have looked at the code, and have one question. You pop the first 2 stack parameters into eax, leaving the return address on the stack, but it is unprotected by the esp value. I am not familiar with what happens during interrupts, but I do not think this is safe. If an interrupt comes in, where can the CPU save the current ip or any regs that need to be used?

Dave.

KeepingRealBusy

JJ,

A second problem. In the iteration loop you load two xmm regs (from [eax] and [eax+16]) without checking the first for nulls. This is not safe at the end of a VirtualAlloc buffer.

Dave.

jj2007

Dave,
Thanks for looking at that.

Re lingo's pop-the-ret-address technique: interrupts seem not to be a problem, although this is apparently documented nowhere.

Re point 2: You are perfectly right, there is a risk at the end of a VirtualAlloc buffer. Any suggestions? The routine is already a bit slow ::)
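
One possible way to keep the two-loads-per-iteration loop without the risk, sketched in C with SSE2 intrinsics rather than MASM: align the pointer to 32 bytes before entering the unrolled loop, so both loads fall inside one 32-byte aligned block and therefore inside one page. This is an assumption-laden illustration, not the StrLen1/StrLen2 code timed above: `zero_mask16` and `strlen_sse2_pairs` are hypothetical names, `__builtin_ctz` assumes GCC or Clang, and no timings are claimed for it.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Bit i of the result is set if byte i of the aligned block at p is zero. */
static unsigned zero_mask16(const char *p)
{
    __m128i v = _mm_load_si128((const __m128i *)p);
    return (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128()));
}

/* Hypothetical routine: two aligned loads per iteration, same page always. */
size_t strlen_sse2_pairs(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    unsigned mis = (unsigned)(addr & 15);
    const char *p = (const char *)(addr - mis);   /* 16-byte aligned */

    /* Head block: ignore the bytes that precede s. */
    unsigned mask = zero_mask16(p) >> mis;
    if (mask != 0)
        return (size_t)__builtin_ctz(mask);
    p += 16;

    /* If p is not yet 32-byte aligned, check one more single block. */
    if (((uintptr_t)p & 16) != 0) {
        mask = zero_mask16(p);
        if (mask != 0)
            return (size_t)(p - s) + (size_t)__builtin_ctz(mask);
        p += 16;
    }

    /* p is 32-byte aligned: [p] and [p+16] share one 32-byte block,
       so the second load cannot reach into the next page. */
    for (;;) {
        unsigned lo = zero_mask16(p);
        unsigned hi = zero_mask16(p + 16);
        mask = lo | (hi << 16);
        if (mask != 0)
            return (size_t)(p - s) + (size_t)__builtin_ctz(mask);
        p += 32;
    }
}
```

The second load may still read bytes past the terminator, but only bytes in the same page as the terminator, which is what matters at the end of a VirtualAlloc buffer.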

ecube

try this



; register the handler once at startup (1 = call this handler first)
invoke AddVectoredExceptionHandler,1,handlexcept

;do everything...


; vectored handler: steps over an int 3 and resumes; passes on everything else
handlexcept proc pExceptionInfo:DWORD
    mov ecx, pExceptionInfo    ; ecx, not edi: a callback must preserve edi
    mov eax, [ecx].EXCEPTION_POINTERS.pExceptionRecord
    mov edx, [ecx].EXCEPTION_POINTERS.ContextRecord
    cmp [eax].EXCEPTION_RECORD.ExceptionCode,STATUS_BREAKPOINT
    jne @F
    ;cmp [eax].EXCEPTION_RECORD.ExceptionAddress,; is it our code address
    ;jne @F ;if not let others have a go

    add [edx].CONTEXT.regEip,1    ; skip the 1-byte int 3 instruction
    mov eax,EXCEPTION_CONTINUE_EXECUTION
    ret
@@:
    mov eax,EXCEPTION_CONTINUE_SEARCH ;let others have a go
    ret
handlexcept endp


VEH is Windows XP and later only, but it's a beautiful thing: it gets exceptions before SEH and the others do, and you can add as many handlers as you like, though only one is really needed. Returning EXCEPTION_CONTINUE_SEARCH passes the exception on to the other vectored handlers, then on to SEH, and so on down the list.