News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Owned Lock Implementation

Started by Neo, May 23, 2009, 08:14:27 AM

Previous topic - Next topic

Neo

Hi everyone!  :bg  It's been a while, and I'll be posting much more soon, since progress on Inventor IDE has been going well.  However, I've got a question mostly (I think) to do with respect to memory synchronization between CPU caches.  Sorry that it's pretty specific stuff this time.

I've been implementing a 64-bit OS, (and finally making progress on it thanks to my IDE; the next release will be able to assemble/link 16-bit, 32-bit, and 64-bit all together), and I've got most of an implementation of a binary lock that keeps track of the owning thread.  I think I've found a way for an app to stay in privilege level 3 when calling either:

  • GetLock if no other thread currently has access to the lock and no thread is waiting on the lock, or
  • ReleaseLock if no thread is waiting on the lock
:U

The main idea is using a bus-locked cmpxchg16b, but details get a bit more complicated to handle the cases that require switching to privilege level 0.  The big question is: does using "lock cmpxchg16b [rsi]" correctly update/invalidate any read-cache copies of the 16-byte operand on other CPU cores?  Conversely, will "lock cmpxchg16b [rsi]" correctly force any other CPU cores to write out their write caches of those 16 bytes before the instruction executes?  The code may or may not require the latter, but it'd be useful information to know.  I'm also curious as to whether I've used sfence correctly to make sure updates have finished before letting others update the lock, and how pause should be used in the WaitForAccess spinloops (see below), since I can't find any examples with reputable sources of info.

Here's the code (not for use outside of this forum thread, please, as specified in the forum rules) for the lock.  You can assume that pCurrentThread is calculated somewhere (I'm still undecided on the best way to track it), that SleepForLock will put the thread pointed to by rbx to sleep indefinitely, and that WakeUpThreadInRdi will cause the thread pointed to by rdi to be scheduled to run when time is available.  AllocateSystemMemory and FreeSystemMemory allocate or free memory on the system heap.  In the final version, there will be copies of GetLock and ReleaseLock that will make a system call to get to GetLockExtended or ReleaseLockExtended from privilege level 3, but the idea is the same.

.nolist
;********************************************************************************************************************************
;*                                                                                                                              *
;*  File: Lock.inc                                                                                                              *
;*                                                                                                                              *
;*  This file contains the structures used for basic access-locking mechanisms.                                                 *
;*                                                                                                                              *
;*  Author:                                                                                                                     *
;*      - Neil G. Dickson                                                                                                       *
;*                                                                                                                              *
;********************************************************************************************************************************

;********************************************************************************************************************************
;*                                                                                                                              *
;*  Structure: LockStruct                                                                                                       *
;*                                                                                                                              *
;*  This stores the owning thread and waiting threads for a lock.  A LockStruct can be initialized to all zero (no owner and no *
;*  waiting threads), but one may want an initial owner or initial waiting threads.                                             *
;*                                                                                                                              *
;*  Members:                                                                                                                    *
;*  pOwner  - The thread that currently has the access controlled by this lock, or NULL if none                                 *
;*  pList   - Address of the list of threads waiting for the access controlled by this lock, or NULL if none, or -1 if the      *
;*  LockStruct is being modified non-trivially                                                                                  *
;*                                                                                                                              *
;********************************************************************************************************************************
LockStruct  struct
    pOwner  PTR Thread          ?
    pList   PTR LockWaitingList ?
LockStruct  ends

;********************************************************************************************************************************
;*                                                                                                                              *
;*  Structure: LockWaitingList                                                                                                  *
;*                                                                                                                              *
;*  This stores an array of threads waiting for the access controlled by a LockStruct.                                          *
;*                                                                                                                              *
;*  Members:                                                                                                                    *
;*  NextElementOffset   - Offset from the start of the LockWaitingList to where the next element would be put if the capacity   *
;*  is large enough                                                                                                             *
;*  Capacity            - Offset from the start of the LockWaitingList that marks the end of available space in the array       *
;*  ThreadList          - The array containing pointers to waiting threads; the minimum array length is 5, but the maximum is   *
;*  over 400 million (limited by Capacity being a DWORD)                                                                        *
;*                                                                                                                              *
;********************************************************************************************************************************
LockWaitingList struct
    NextElementOffset   DWORD       ?
    Capacity            DWORD       ?
    ThreadList          PTR Thread  5 dup (?)
LockWaitingList ends

.list


;********************************************************************************************************************************
;*                                                                                                                              *
;*  File: Lock.asm                                                                                                              *
;*                                                                                                                              *
;*  This file provides basic access-locking mechanisms for threads.                                                             *
;*                                                                                                                              *
;*  Author:                                                                                                                     *
;*      - Neil G. Dickson                                                                                                       *
;*                                                                                                                              *
;********************************************************************************************************************************

.code
;********************************************************************************************************************************
;*                                                                                                                              *
;*  Function: GetLock                                                                                                           *
;*                                                                                                                              *
;*  Parameters:                                                                                                                 *
;*  pLock   -                                                                                                                   *
;*                                                                                                                              *
;********************************************************************************************************************************
GetLock proc    pLock:PTR LockStruct
    xor     eax,eax
    xor     edx,edx
    xor     ecx,ecx
    mov     rbx,pCurrentThread
    mov     rsi,pLock
ASSUME RSI:PTR LockStruct
    lock cmpxchg16b [rsi]
    jz      HaveLock        ;If LockStruct was completely zero before, this thread now owns the lock.
    cmp     rbx,rax         ;Check if this thread already owns this lock
    je      HaveLock        ;
    call    GetLockExtended ;If not, we probably need to wait, so do things the long way
HaveLock:
    ret
GetLock endp

;********************************************************************************************************************************
;*                                                                                                                              *
;*  Function: GetLockExtended                                                                                                   *
;*                                                                                                                              *
;********************************************************************************************************************************
GetLockExtended proc
ASSUME RSI:PTR LockStruct
    or      rax,-1
WaitForAccess:
    lock xchg   [rsi].pList,rax
    cmp     rax,-1
    je      WaitForAccess
    cmp     [rsi].pOwner,0
    je      HaveItNow
    test    rax,rax
    jnz     AlreadyHaveList
;NOTE: AllocateSystemMemory here won't cause a deadlock because the system heap doesn't use a full lock, it just uses a spinlock.
    invoke  AllocateSystemMemory,sizeof LockWaitingList
ASSUME RAX:PTR LockWaitingList
    mov     [rax].NextElementOffset,offset LockWaitingList.ThreadList   ;length = 1; save (length+1)*8=10h
    mov     [rax].Capacity,sizeof LockWaitingList                       ;capacity = 5; save (cap+1)*8=30h
    mov     [rax].ThreadList[0],rbx                                     ;NOTE: Assuming rbx unchanged
    sfence
    mov     [rsi].pList,rax                                             ;NOTE: Assuming rsi unchanged
    SleepForLock
    ret

AlreadyHaveList:
    mov     ecx,[rax].NextElementOffset
    cmp     ecx,[rax].Capacity
    je      NeedNewList
    mov     qword ptr [rax+rcx],rbx
    add     [rax].NextElementOffset,8
    sfence
    mov     [rsi].pList,rax
    SleepForLock
    ret
   
NeedNewList:
    lea     rbp,[rcx*2]
    mov     rdi,rax
ASSUME RDI:PTR LockWaitingList
    invoke  AllocateSystemMemory,rbp        ;(see note about AllocateSystemMemory above)
    mov     ecx,[rdi].NextElementOffset
    mov     [rax].NextElementOffset,ecx
    mov     [rax].Capacity,ebp
    mov     rdx,8
CopyNextElement:
    mov     r8,qword ptr [rdi+rdx]
    mov     qword ptr [rax+rdx],r8
    add     rdx,8
    cmp     rdx,rcx
    jne     CopyNextElement
    mov     qword ptr [rax+rdx],rbx
    xchg    rax,rdi
    invoke  FreeSystemMemory,rax            ;(see note about AllocateSystemMemory above)
    sfence
    mov     [rsi].pList,rdi
    SleepForLock
    ret


HaveItNow:
    mov     [rsi].pOwner,rbx
    sfence
    mov     [rsi].pList,rax
    ret
GetLockExtended endp

;********************************************************************************************************************************
;*                                                                                                                              *
;*  Function: ReleaseLock                                                                                                       *
;*                                                                                                                              *
;*  Parameters:                                                                                                                 *
;*  pLock   -                                                                                                                   *
;*                                                                                                                              *
;********************************************************************************************************************************
ReleaseLock proc    pLock:PTR LockStruct
    mov     rax,pCurrentThread
    xor     edx,edx
    xor     ebx,ebx
    xor     ecx,ecx
    mov     rsi,pLock
ASSUME RSI:PTR LockStruct
    lock cmpxchg16b [rsi]
    jz      LockReleased
    call    ReleaseLockExtended
LockReleased:
    ret
ReleaseLock endp

;********************************************************************************************************************************
;*                                                                                                                              *
;*  Function: ReleaseLockExtended                                                                                               *
;*                                                                                                                              *
;********************************************************************************************************************************
ReleaseLockExtended proc
ASSUME RSI:PTR LockStruct
    or      rax,-1
WaitForAccess:
    lock xchg   [rsi].pList,rax
    cmp     rax,-1
    je      WaitForAccess
    test    rax,rax
    jz      NobodyToWake
ASSUME RAX:PTR LockWaitingList
    mov     rdi,[rax].ThreadList[0]
    cmp     [rax].NextElementOffset,10h
    jne     NotEmptyList
;NOTE: FreeSystemMemory here won't cause a deadlock because the system heap doesn't use a full lock, it just uses a spinlock.
    invoke  FreeSystemMemory,rax
    xor     eax,eax
    mov     [rsi].pOwner,rdi
    sfence
    mov     [rsi].pList,rax
    WakeUpThreadInRdi
    ret
   
NotEmptyList:   
    mov     ecx,[rax].NextElementOffset
    sub     rcx,offset LockWaitingList.ThreadList + 8
    shr     rcx,3
    lea     rdi,[rax].ThreadList[0]
    lea     rsi,[rax].ThreadList[8]
    rep movsq
    sub     [rax].NextElementOffset,8
    mov     [rsi].pOwner,rdi
    sfence
    mov     [rsi].pList,rax
    WakeUpThreadInRdi
    ret

NobodyToWake:
    mov     [rsi].pOwner,rax
    sfence
    mov     [rsi].pList,rax
    ret
ReleaseLockExtended endp

UtillMasm

Inventor IDE?

like RadASM?
also masm6148444 open source?
:eek

###

oh my fo! it's a windows java virtual machine client program!

sinsi

About the caches, look at "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide, Part 1"

Quote from: 7.1.4 Effects of a LOCK Operation on Internal Processor CachesBecause frequently used memory locations are often
cached in a processor's L1 or L2 caches, atomic operations can often be carried out
inside a processor's caches without asserting the bus lock. Here the processor's
cache coherency protocols insure that other processors that are caching the same
memory locations are managed properly while atomic operations are performed on
cached memory locations.
Light travels faster than sound, that's why some people seem bright until you hear them.

Neo

Quote from: sinsi on May 23, 2009, 08:32:02 AM
About the caches, look at "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide, Part 1"

Quote from: 7.1.4 Effects of a LOCK Operation on Internal Processor CachesBecause frequently used memory locations are often
cached in a processor's L1 or L2 caches, atomic operations can often be carried out
inside a processor's caches without asserting the bus lock. Here the processor's
cache coherency protocols insure that other processors that are caching the same
memory locations are managed properly while atomic operations are performed on
cached memory locations.


Thanks!  How did I miss that?  I could've sworn I've read that chapter.  :wink

Neo

Quote from: UtillMasm on May 23, 2009, 08:20:15 AM
Inventor IDE?

like RadASM?
also masm6148444 open source?
:eek

###

oh my fo! it's a windows java virtual machine client program!

Hehe, it's actually not much like RadASM if you know about the best features, but I'll take that as a compliment, since I've used RadASM for years.  It has assembling and linking built in (the code for it is across several files in here and here, but the bulk is in ASM.xml, in the download, not the repository), and it has error detection (though not warnings yet), and as you rename things, all of their references get updated automatically.  However, it is still an alpha, so it's missing major features like search.  I'll be posting a thread about it shortly (when I get Alpha 5a out, since it'll be much more stable than Alpha 5).

To be clear, there's no intended association with MASM.  The code is mostly compatible with MASM, so I had named some things like that earlier-on, but I've tried to remove references like that, since I want to maintain half-decent relations with Microsoft.  It should work on linux/Mac too, but it doesn't yet create linux/Mac executables, so it'll probably crash if you hit Run.  Also, it won't be in Java forever, since the memory it takes up is ghastly; a friend of mine is starting the long task of porting to C/C++.