
LOCK prefix question

Started by theunknownguy, November 08, 2010, 01:03:19 AM


theunknownguy

Hey guys, can somebody give me a good explanation of how the LOCK prefix really works?

From what I understand, LOCK mostly ensures that the memory being accessed isn't used by other code at the same time (multiprocessor).

From that point of view, emulating the LOCK prefix in my sandbox wouldn't be needed at all...

I still want to know in depth how the LOCK prefix works.

Thanks

redskull

It "locks" the memory bus so that, generally, other CPUs on multi-CPU systems don't access the same memory at the same time (much like a hardware-level mutex or semaphore).  With everything being in the cache these days, it's not as important.

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

theunknownguy

Quote from: redskull on November 08, 2010, 01:07:23 AM
It "locks" the memory bus so that, generally, other CPUs on multi-CPU systems don't access the same memory at the same time (much like a hardware-level mutex or semaphore).  With everything being in the cache these days, it's not as important.

-r

Now what happens if something tries to access memory that is being used with the LOCK prefix? Would it raise an exception, or would it simply wait until the LOCKed operation is done?

redskull

I can't speak for certain, but I believe the second processor will simply wait until the first processor is finished.  I am fairly sure it doesn't throw an exception.

-r

clive

You expose the true latency of the memory subsystem, because the read can only occur after all other pending writes to the location have completed (i.e. write buffers, write back, flushing of matching dirty cache lines on all processors), and during the actual read and modification the bus is held exclusively (owned) until the write itself completes.

Remember the buses will likely be running at different speeds and different widths, and ownership (CPUs, DMA, etc.) is arbitrated.

It is a very expensive operation. An exchange on memory (implicitly locked) will cost at least twice the access time of the underlying memory, plus whatever setup and turnaround times that might have.

It does not fault, it just stalls everything.
It could be a random act of randomness. Those happen a lot as well.

theunknownguy

Thanks both, redskull and clive. All perfectly clear in my mind  :clap:

clive

Quote from: clive
It does not fault, it just stalls everything.

I might refine that. It stalls everything on the common memory bus. It would not have to stall a second processor from running its own code out of cache or local memory. Any conflicting cache lines should have been voided, so the processor should/could act autonomously. Attempts to refill the cache from the common memory will stall.

If you have NUMA-style multiprocessors, they should be able to proceed if the locked region is outside of their local memory arena, whereas SMP would block.

Much of this will clearly depend on the CPU, chipset, busing and memory architecture.

dedndave

with the 8086, i seem to recall we sometimes used lock with I/O instructions, too
maybe my memory is bus-locked - lol

clive

Quote from: dedndave on November 08, 2010, 11:54:32 PM
with the 8086, i seem to recall we sometimes used lock with I/O instructions, too
maybe my memory is bus-locked - lol

Not sure I can help you there Dave. I can't think of any IO code using RMW. There's DMA and video refresh, but LOCKing a CGA access would just snow things up more.

dedndave

the point was - i think bus lock applies to I/O the same as it applies to memory
unfortunately, i don't recall any specific instance for discussion - lol
it may be used that way in device drivers, and is probably why we don't see it too often
there is INSB and OUTSB that can be REP'd, but i don't think REP and LOCK go together well   :P

in any case, the instruction prefix is rarely used at all
and, there is no great way to emulate its behavior, other than using the prefix itself

dioxin

From the Intel manual:
The LOCK prefix can be prepended only to the following instructions and to those forms of the
instructions that use a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG,
DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. An undefined opcode
exception will be generated if the LOCK prefix is used with any other instruction. The XCHG
instruction always asserts the LOCK# signal regardless of the presence or absence of the LOCK
prefix.
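
For the sandbox question at the top of the thread, the quoted rule can be turned into a decode-time check. The following is only a minimal sketch in C: the function name `lock_prefix_is_legal` and its string-based interface are made up for illustration, and the table holds just the mnemonics from the manual excerpt above (later manual revisions word the rule in terms of the destination operand being memory).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mnemonics the quoted manual text lists as accepting a LOCK prefix. */
static const char *const lockable[] = {
    "ADD", "ADC", "AND", "BTC", "BTR", "BTS", "CMPXCHG", "DEC", "INC",
    "NEG", "NOT", "OR", "SBB", "SUB", "XOR", "XADD", "XCHG"
};

/* Hypothetical emulator helper: returns true if LOCK + this instruction
   is legal, false if the emulator should raise an undefined opcode
   exception (#UD) instead of executing it. */
bool lock_prefix_is_legal(const char *mnemonic, bool has_memory_operand)
{
    assert(mnemonic != NULL);

    /* Register-only forms are never lockable. */
    if (!has_memory_operand)
        return false;

    for (size_t i = 0; i < sizeof lockable / sizeof lockable[0]; i++) {
        if (strcmp(mnemonic, lockable[i]) == 0)
            return true;
    }
    return false;
}
```

A single-threaded sandbox could call something like this when it sees the F0 prefix byte, raise its emulated #UD when the result is false, and otherwise execute the instruction as if the prefix were absent.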

dedndave

thanks Paul
guess my memory is useless, nowadays   :P

that also explains why XCHG is such a slug
however, i doubt that applies to XCHG reg,reg
they probably mean to say...
Quote
The XCHG instruction always asserts the LOCK# signal regardless of the presence or absence of the LOCK prefix, if a memory operand is used.

FORTRANS

Hi,

   Well it is a bit unlikely that another processor is going to
mess with a register's contents.  Whereas memory is fair
game.  As far as I/O and memory being similar, it started
out as being the state of one pin on the CPU package that
specified whether I/O or memory was accessed.

Regards,

Steve N.

theunknownguy

Real life examples powered by microsoft  :lol:

kernel32.InterlockedExchange
7C80981E     8B4C24 04      MOV ECX,DWORD PTR SS:[ESP+4]            ; ntdll.7C920208
7C809822     8B5424 08      MOV EDX,DWORD PTR SS:[ESP+8]
7C809826     8B01           MOV EAX,DWORD PTR DS:[ECX]              ; ntdll.7C91DC9C
7C809828     F0:0FB111      LOCK CMPXCHG DWORD PTR DS:[ECX],EDX     ; LOCK prefix
7C80982C   ^ 75 FA          JNZ SHORT kernel32.7C809828
7C80982E     C2 0800        RETN 8

kernel32.InterlockedCompareExchange
7C809832     8B4C24 04      MOV ECX,DWORD PTR SS:[ESP+4]            ; ntdll.7C920208
7C809836     8B5424 08      MOV EDX,DWORD PTR SS:[ESP+8]
7C80983A     8B4424 0C      MOV EAX,DWORD PTR SS:[ESP+C]
7C80983E     F0:0FB111      LOCK CMPXCHG DWORD PTR DS:[ECX],EDX     ; LOCK prefix
7C809842     C2 0C00        RETN 0C

kernel32.InterlockedExchangeAdd
7C809846     8B4C24 04      MOV ECX,DWORD PTR SS:[ESP+4]            ; ntdll.7C920208
7C80984A     8B4424 08      MOV EAX,DWORD PTR SS:[ESP+8]
7C80984E     F0:0FC101      LOCK XADD DWORD PTR DS:[ECX],EAX        ; LOCK prefix
7C809852     C2 0800        RETN 8
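
For anyone who reads C more easily than a debugger dump, the three routines can be approximated with C11 atomics. This is a rough sketch, not the kernel32 source: the `sketch_` names are invented here, and whether a compiler actually emits LOCK CMPXCHG or LOCK XADD for these calls depends on the compiler and target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Mirrors the InterlockedExchange listing: load the current value,
   then retry the compare-exchange until it succeeds. */
long sketch_exchange(_Atomic long *target, long value)
{
    assert(target != NULL);
    long expected = atomic_load(target);          /* MOV EAX,[ECX] */

    /* On failure, expected is refreshed with the current contents of
       *target, much as EAX is reloaded by a failing CMPXCHG, so the
       loop can simply try again (the JNZ back to the CMPXCHG). */
    while (!atomic_compare_exchange_weak(target, &expected, value)) {
        /* retry */
    }
    return expected;                              /* previous value */
}

/* Mirrors InterlockedCompareExchange: one attempt, no loop. */
long sketch_compare_exchange(_Atomic long *target, long value, long comparand)
{
    assert(target != NULL);
    atomic_compare_exchange_strong(target, &comparand, value);
    /* comparand now holds what was in *target before the call,
       whether or not the store happened. */
    return comparand;
}

/* Mirrors InterlockedExchangeAdd: atomic add, returns the old value. */
long sketch_exchange_add(_Atomic long *target, long addend)
{
    assert(target != NULL);
    return atomic_fetch_add(target, addend);
}
```

Note how the first routine needs a loop because CMPXCHG can fail, while the XADD-style routine cannot fail and so needs none.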








dedndave

i can see where you might use it while writing semaphores between threads
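
As an illustration of that use, here is a small spinlock (a binary semaphore of sorts) built on an atomic test-and-set. It is a sketch only: the `spin_` names are hypothetical, and the assumption that the test-and-set ends up as an XCHG or other locked instruction on x86 is typical of compilers rather than guaranteed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock type; initialise with { ATOMIC_FLAG_INIT }. */
typedef struct {
    atomic_flag held;
} spin_lock_t;

/* One atomic read-modify-write: set the flag and learn whether it was
   already set. Returns true if this caller now owns the lock. */
bool spin_try_acquire(spin_lock_t *lock)
{
    assert(lock != NULL);
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Busy-wait until the lock is obtained. */
void spin_acquire(spin_lock_t *lock)
{
    while (!spin_try_acquire(lock)) {
        /* spin: another thread holds the lock */
    }
}

/* Clear the flag so another thread's test-and-set can succeed. */
void spin_release(spin_lock_t *lock)
{
    assert(lock != NULL);
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

A thread would wrap its critical section as `spin_acquire(&lock); ... spin_release(&lock);`. A waiting thread just keeps retrying the atomic operation, which matches the earlier point in the thread that contention causes waiting rather than a fault.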