C++ Name Mangling / Assembly function access to C++ class members

Started by csusie1, October 12, 2011, 08:23:43 PM


csusie1

BACKGROUND: Alright, I am writing some C++ bit manipulation routines as a personal academic exercise.  I always code in 64 bit, because why bother with 32 bit for my own stuff . . . might as well not live in the past, right?  I wanted to develop a bit rotation method that works on unsigned 64-bit integers (QWORDs).  I coded something myself in C++ that works, but I thought there had to be an "easier" way.  I looked into compiler intrinsics and found functions that work on bytes and 16-bit words but nothing for 64.  I then looked at Intel's developer manual and found there was indeed an assembler instruction that could easily do what I want (ror, rol).  So I wrote a function (my first working assembly language function!) that did what I want:
C++ declaration:


typedef unsigned long long CPSWORD;
typedef unsigned char byte;
extern "C" CPSWORD _fastcall Rotate_Left(CPSWORD someWord, byte someByte);



Assembly code definition / implementation:



; Rotate Left Assembly Function
.data
amount BYTE ? ; the byte goes here

.code
PUBLIC Rotate_Left
Rotate_Left PROC
and rdx, 255d ;#? the shift_amount comes in on dl, the low byte of this register 
  ; start by clearing all the other gunk out by clearing all but the dl seg
mov amount, dl ;#? now store the amount in the variable
xor rdx, rdx ;#? clear the rdx register hi mom
mov rax, rcx ;#? move the word into rax, the register where return val will be stored
xor rcx, rcx ;#? clear the rcx register
mov cl, amount;#2 move shift_amount into cl (low byte of rcx)
rol rax, cl ;#? rotate the word left by cl (shift_amount)
ret ;#1 terminate procedure
Rotate_Left ENDP
END




Well, it works and does what I want (I'm pretty impressed with myself). The next step was for me to encapsulate it in a class.  I figured out I couldn't declare the member function in the class with extern "C" because, well, C++ doesn't allow that for member functions.  When I removed extern "C" from inside the class declaration, of course I got a linker error complaining that Rotate_Left was an unresolved symbol.  This seemed to be the result of name mangling. So here is what I did:

Class definition:

class MyClass
{
private:

    static CPSWORD Rotate_Left(CPSWORD  someWord, byte someByte);
.....
};



then I just defined a dummy method in C++ in the implementation file thus:

CPSWORD MyClass::Rotate_Left(CPSWORD someWord, byte someByte)
{
////bla bla bla do nothing
      return 0;


}


Then I looked in the disassembly and found Rotate_Left come out as: ?Rotate_Left@MyClass@cpsasm@@SA_K_KE@Z

So, I went into my code and just swapped Rotate_Left with ?Rotate_Left@MyClass@cpsasm@@SA_K_KE@Z then I deleted the dummy C++ implementation.

What do you know, it worked!

Ok so END BACKGROUND

THE ACTUAL QUESTIONS

1) Is there any better way I can id my C++ static class member functions implemented in assembly so I don't have to do all this sleuthing?  Is there some command in assembly to look for the id in the C++ code or vice versa?

2)  What if I wanted to make it a non-static member function that acted on the private (or even public) members of the class?  Are there any tutorials on writing class member functions in assembly?   How do I know how to properly reference the data members?

PLEASE NOTE:  I cannot use inline asm because it doesn't work on 64 bit.  Also, I think it is cleaner and a better learning experience to do them in separate files.  Any help is much appreciated!




clive

Quote from: csusie1
Assembly code definition / implementation:



; Rotate Left Assembly Function
.data
amount BYTE ? ; the byte goes here

.code
PUBLIC Rotate_Left
Rotate_Left PROC
and rdx, 255d ;#? the shift_amount comes in on dl, the low byte of this register 
  ; start by clearing all the other gunk out by clearing all but the dl seg
mov amount, dl ;#? now store the amount in the variable
xor rdx, rdx ;#? clear the rdx register hi mom
mov rax, rcx ;#? move the word into rax, the register where return val will be stored
xor rcx, rcx ;#? clear the rcx register
mov cl, amount;#2 move shift_amount into cl (low byte of rcx)
rol rax, cl ;#? rotate the word left by cl (shift_amount)
ret ;#1 terminate procedure
Rotate_Left ENDP
END

Parking "amount" with global scope renders it hideously unsafe for threading. Consider either a local variable, or holding it in a register. Consider rbx (it is nonvolatile in the Win64 convention, so it would have to be saved and restored), and not bothering to mask with 255 or to clear rcx.

Perhaps something less complicated?

Rotate_Left PROC
mov rax, rcx ;#? move the word into rax, the register where return val will be stored
mov rcx, rdx
rol rax, cl  ;#? rotate the word left by cl (shift_amount)
xor rdx, rdx ;#? clear the rdx register
ret      ;#1 terminate procedure
Rotate_Left ENDP
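
For cross-checking the assembly routine from the C++ side, the same rotation can be written in plain C++. This is only a minimal sketch, not code from the thread: the names rotate_left_ref and rotate_left_ref_demo are hypothetical. The count is masked to 0..63, which matches what rol does with cl for a 64-bit operand. (MSVC also documents _rotl64 and _rotr64 intrinsics for 64-bit rotation, which may make the assembly unnecessary for this particular job.)

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference implementation (not from the thread): rotates a
// 64-bit value left by `count` bits, with the count reduced modulo 64.
inline std::uint64_t rotate_left_ref(std::uint64_t value, unsigned char count)
{
    const unsigned n = count & 63u;
    // (64 - n) & 63 keeps the right shift in range when n == 0.
    return (value << n) | (value >> ((64u - n) & 63u));
}

// Small usage example: the top bit wraps around to bit 0.
inline void rotate_left_ref_demo()
{
    assert(rotate_left_ref(0x8000000000000000ULL, 1) == 1ULL);
    assert(rotate_left_ref(1ULL, 4) == 16ULL);
}
```

Comparing Rotate_Left against a function like this over a handful of inputs is a cheap way to confirm that the register handling in the assembly version is right.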


There are functions to do the unmangling of the names, and there are probably ones to mangle them too, but your method looks sufficiently effective to get the job done. You could always use cdecl to ease the calling of externalized functions.
It could be a random act of randomness. Those happen a lot as well.

csusie1

Thanks for the advice; I'm just learning assembly language now.  You don't know of any resources on how you can write non-static member functions of a class in assembler do you?  Specifically, I am wondering how you can gain access to the class data members and (perhaps) member functions from assembly language.  How to identify their symbol or address in particular.

drizz

Quote from: csusie1 on October 12, 2011, 08:23:43 PM
Then I looked in the disassembly and found Rotate_Left come out as: ?Rotate_Left@MyClass@cpsasm@@SA_K_KE@Z

So, I went into my code and just swapped Rotate_Left with ?Rotate_Left@MyClass@cpsasm@@SA_K_KE@Z then I deleted the dummy C++ implementation.

What do you know, it worked!
Yes, that's how it's done. Also remember that the mangled name is specific to this compiler, so this isn't compatible with other C++ compilers.

extern "C" CPSWORD ASM_Rotate_Left(CPSWORD  someWord, byte someByte);

inline CPSWORD MyClass::Rotate_Left(CPSWORD someWord, byte someByte)
{
     return ASM_Rotate_Left(someWord,someByte);
}
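
The wrapper idea above can be sketched as one self-contained file. Some assumptions to flag: the wrapper is made public here so it can be called from outside the class (the original declaration was private), and ASM_Rotate_Left is given a C++ stand-in body so the sketch links without MASM. In the real project that body would be deleted and the symbol would come from the .asm file.

```cpp
#include <cassert>

typedef unsigned long long CPSWORD;
typedef unsigned char byte;

// Unmangled symbol; in the real project this is defined in the .asm file.
extern "C" CPSWORD ASM_Rotate_Left(CPSWORD someWord, byte someByte);

class MyClass
{
public:
    // Inline static wrapper: the class gets its member function, while the
    // linker only ever sees the plain C name of the assembly routine.
    static CPSWORD Rotate_Left(CPSWORD someWord, byte someByte)
    {
        return ASM_Rotate_Left(someWord, someByte);
    }
};

// Stand-in body (assumption, for illustration only) so this sketch links
// without an assembler.
extern "C" CPSWORD ASM_Rotate_Left(CPSWORD someWord, byte someByte)
{
    const unsigned n = someByte & 63u;
    return (someWord << n) | (someWord >> ((64u - n) & 63u));
}

// Usage example.
inline void wrapper_demo()
{
    assert(MyClass::Rotate_Left(1ULL, 3) == 8ULL);
}
```

With this layout the mangled name never has to be copied into the assembly source, so the .asm file stays the same if the class or namespace is renamed.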


You can't force C name mangling on C++ class member functions. If you make your wrapper methods inline, the asm code should blend in well; pass your class variables this way too.

>> I am wondering how you can gain access to the class data members and (perhaps) member functions from assembly language.
Think of asm as C, how would you gain access to class data members from C?
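
As a sketch of what "the C way" looks like: a C-style function receives the object's address as an explicit first parameter and reaches the data member at a fixed byte offset from it. The type SomeData and both function names below are hypothetical. For a given class definition, compiler, and packing/alignment settings, that offset is a compile-time constant, so it is the same for every instance and every run; it can change if the class or those settings change, which is why offsetof is preferable to a hand-measured number.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standard-layout type, so offsetof is well defined.
struct SomeData
{
    unsigned long long word;
    unsigned short     counter;
};

// C-style "member function": the object's address is an explicit parameter.
extern "C" void bump_counter(SomeData* self)
{
    self->counter = static_cast<unsigned short>(self->counter + 1);
}

// The same access spelled out as base address + byte offset, which is
// roughly what an assembly routine would do with the pointer it is given.
extern "C" void bump_counter_by_offset(void* self)
{
    unsigned char* base = static_cast<unsigned char*>(self);
    unsigned short* p =
        reinterpret_cast<unsigned short*>(base + offsetof(SomeData, counter));
    *p = static_cast<unsigned short>(*p + 1);
}

// Usage example: both functions touch the same member.
inline void offset_demo()
{
    SomeData d = {0, 0};
    bump_counter(&d);
    bump_counter_by_offset(&d);
    assert(d.counter == 2);
}
```

A non-static member function works the same way under the hood: the compiler passes the object's address as a hidden first argument.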

The truth cannot be learned ... it can only be recognized.

csusie1

Having never learned C beyond occasionally doing things the C way in C++, I am not 100% sure how to do it in C, because C doesn't have member functions .... I wouldn't know how to get a member function's address. I have made non-class-based interfaces to public members of C++ classes (to simplify access to a native class (in a native DLL) from C#).  To do this, in C++ I wrote a function to dynamically instantiate a new class instance and return a void * pointer to it.  I then wrote other functions that took the void pointer as their first parameter (necessary because C# has no idea what a pointer to a class instantiated in C++ is, and has no built-in way to call its methods); the remaining parameters were the same as those of the corresponding public member function.  They cast the void * back to a pointer to the class, called the corresponding member function, and passed the value back to the C# caller.  (There was also a function to delete the instance, of course.)  The problem with this approach as I see it is that it creates overhead that I want to avoid -- what is the point of writing a specialized member function in assembly language if I have to play all those games to use it conveniently?   Moreover, I wish to encapsulate it so I never have to worry about its internals again once it's done -- so I don't want to use a free function.  The situation was different when I was trying to link to a native C++ class in a C# program.

Two questions as possible solutions come to mind:

How do you determine the distance in bytes of class member X from the address of the instantiated object?  That is, consider the following:



class SomeClass
{

public:

    inline CPSWORD EncapsulatingPublicMethodForASMMethod(int some_parameter, bool some_other_parameter)
    {

              return some_method_implemented_in_assembly_language(some_parameter, some_other_parameter);
    }
    ///bla bla bla some other methods and accessors and such and thus

private:

    CPSWORD some_method_implemented_in_assembly_language(int some_parameter, bool some_other_parameter);

    //blablabla a bunch of other data and methods
    unsigned short data_that_some_method_implemented_in_assembly_language_manipulates;

};




How can I determine the distance (in bytes, I suppose) of the address of data_that_some_method_implemented_in_assembly_language_manipulates from the address of SomeClass (or, I guess more properly speaking, from the address of an object of type SomeClass)?  If I can determine that, I can manipulate the data in assembly language, correct?  Is there no macro or anything in masm that just figures that sort of thing out automatically? I mean, C++ classes are compiled to assembly all the time; there has to be some built-in way of accessing member variables and member functions (I would think, at any rate).  I suppose one method to do it, absent some macro or directive in masm, would be simply to write a test program, instantiate the class, make a pointer that points to an instance, then increment the address stored in the pointer by 1 until I find the data that I had put in the member variable, then modify my assembly language implementation function accordingly.  But this raises other questions to my beginner's mind: a) would the distance be the same each time I instantiated the object, i.e. is the distance between the address of any given SomeClass and its member data_that...... ALWAYS X bytes? b) would the distance be the same each time the program was executed? c) would the distance be the same each time the program is COMPILED, assuming I do not modify the class or the compilation options? d) (I can almost bet the answer to d is no) would the distance be the same every time the program is compiled if I modify the class or change compiler settings?  Am I on the right track here?  How should I approach this problem (assuming I want it to be a member function -- so I can encapsulate it -- and so I can learn to do this -- this is more about learning for me than devising a solution to some particular problem)?

The second idea that comes to mind would be to have the method be a free function outside the class but not export its symbol so I can't access it directly later when I make my library / dll --- inside the class I would have a constant function pointer to that free function as a method.  This raises two issues: 1) because I am implementing the method in assembly language, will I be able to access the private data of the class still assuming I can tag the address -- masm doesn't enforce C++ public/private restrictions, does it?  2) does the fact that the method is a pointer to a function and not a function itself introduce extra overhead?  If so, to what extent, if any, does the fact that it is a constant pointer -- i.e. the function it points to cannot be changed, alleviate that?

I appreciate any help you guys provide!

NOTE -- edited previous version because I wrote it on my phone and it was ungrammatical to the point of being almost unintelligible.  I also added example to clarify my problem.

csusie1

Oooh, one other thing .... If I specify extern "C", can it still pass parameters in registers via the fastcall convention if I so desire?

drizz


  • You are thinking too much about how you can manipulate a class instance from your asm code. Don't. While using this pointer to access class members is of course possible, it is (IMO) hackish.

  • Use asm to do a particular task. Create asm functions that work on basic types, structures, etc. Don't enforce oop onto asm.
    Then again, there is no "don't" and "can't do" for assembly-knowledgeable programmers - that's why there is ObjAsm32 ObjAsm64 :U


    Next,

class SomeClass
{

public:

    inline CPSWORD EncapsulatingPublicMethodForASMMethod(int some_parameter, bool some_other_parameter)
    {

              return some_method_implemented_in_assembly_language(some_parameter, some_other_parameter);
    }
    ///bla bla bla some other methods and accessors and such and thus

private:

    CPSWORD some_method_implemented_in_assembly_language(int some_parameter, bool some_other_parameter);

    //blablabla a bunch of other data and methods
    unsigned short data_that_some_method_implemented_in_assembly_language_manipulates;

};

  • Pointers. Use pointers as parameters for asm functions that work on basic types and structures (if necessary).
    Also, I think the encapsulating method does not need to be public.

some_FUNCTION_implemented_in_assembly_language ( unsigned short* pdata , ..............

some_FUNCTION_implemented_in_assembly_language(&this->data_that_some_method_implemented_in_assembly_language_manipulates, ...);


  • If you are going to write a lot of asm code, then why start from a C++ class? Why not use a C++ class to wrap around the final "product"?

  • Speaking of C++ and interfaces... Interfaces are great! They make C++ programming bearable and are certainly the right way to write compatible, reusable C++ code. I expect to see a future Windows with interfaces/COM only, with the API obsolete.

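
The pointer-passing suggestion in the list above can be sketched as follows. The routine name ASM_Increment_U16, the member name data_, and the accessors are all hypothetical, and the routine is given a C++ stand-in body so the sketch builds without an assembler. The point is that the assembly side only ever sees a pointer to a basic type and needs no knowledge of the class layout.

```cpp
#include <cassert>

// Stand-in for an assembly routine (assumption, for illustration only).
// It works on a plain unsigned short through a pointer.
extern "C" void ASM_Increment_U16(unsigned short* pdata, unsigned short amount)
{
    *pdata = static_cast<unsigned short>(*pdata + amount);
}

class SomeClass
{
public:
    explicit SomeClass(unsigned short start) : data_(start) {}

    // The member function is the only code that knows where data_ lives;
    // it hands the routine the member's address.
    void Add(unsigned short amount)
    {
        ASM_Increment_U16(&this->data_, amount);
    }

    unsigned short Value() const { return data_; }

private:
    unsigned short data_;
};

// Usage example.
inline void pointer_passing_demo()
{
    SomeClass s(40);
    s.Add(2);
    assert(s.Value() == 42);
}
```

Because the private member is only touched through a pointer handed out by the class itself, the class can be rearranged later without editing the assembly source.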


HTH!

The truth cannot be learned ... it can only be recognized.

baltoro

CSUSIE1,   
You really ask terrific questions !!!
There was a MASM Forum thread awhile ago that deals with using C++ class member functions, coded in assembly language: Clipping Rectangles With Use of MMX Instruction Set.
If you visit the website of the person who posted, you will find a description of the technique: Implementing C++ Class Methods in Pure Assembly Language in Microsoft Visual C++ and MASM
...DRIZZ is one of the most experienced and knowledgeable programmers here,...
Here is a suggestion: If I were doing what you are attempting to do,...my first effort would be to simply declare a dummy member function (it will have the correct offset, and you can specify whatever parameters you desire),...and then, from within the code block of that member function, call your assembly language routines. The assembly language routines can be assembled to an object file, and this can be inserted into your Visual Studio project simply by putting it into the project directory,...and the project should build correctly if your calling conventions are compatible and the assembly routines are either exported or public.
Baltoro