Problem description
I'm trying to understand how objects work at the assembly level. How exactly are objects stored in memory, and how do member-functions access them?
(editor's note: the original version was way too broad, and had some confusion over how assembly and structs work in the first place.)
Recommended answer
Classes are stored exactly the same way as structs, except when they have virtual members. In that case, there's an implicit vtable pointer as the first member (see below).
A struct is stored as a contiguous block of memory (if the compiler doesn't optimize it away or keep the member values in registers). Within a struct object, the addresses of its elements increase in the order in which the members were defined (source: http://en.cppreference.com/w/c/language/struct). I linked the C definition because in C++ `struct` means `class` (with `public:` as the default instead of `private:`).
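A minimal sketch of the declaration-order rule. It is checked at compile time with `offsetof` and at run time with member addresses. `Point3` and `check_member_order` are hypothetical names. `offsetof` is only guaranteed to work on standard-layout types like this one.

```cpp
#include <cassert>
#include <cstddef>  // offsetof

// Hypothetical standard-layout struct, used only for illustration.
struct Point3 {
    int x;
    int y;
    int z;
};

// Compile-time checks: the first member sits at offset 0 and each later
// member sits at a higher offset, in declaration order.
static_assert(offsetof(Point3, x) == 0, "x starts the object");
static_assert(offsetof(Point3, x) < offsetof(Point3, y), "y follows x");
static_assert(offsetof(Point3, y) < offsetof(Point3, z), "z follows y");

// Run-time version of the same check on one particular object.
void check_member_order(const Point3& p) {
    // A standard-layout object has the same address as its first member.
    assert(static_cast<const void*>(&p) == static_cast<const void*>(&p.x));
    // Later-declared members (with the same access control) compare greater.
    assert(&p.x < &p.y);
    assert(&p.y < &p.z);
}
```

Call `check_member_order(Point3{1, 2, 3})` from `main` to run the run-time half. The exact offsets (0, 4, 8 with a 4-byte `int`) are fixed by the ABI, not by the language.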
Think of a `struct` or `class` as a block of bytes that might be too big to fit in a register, but which is copied around as a "value". Assembly language doesn't have a type system; bytes in memory are just bytes. It doesn't take any special instructions to store a `double` from a floating-point register and reload it into an integer register, or to do an unaligned load and get the last 3 bytes of one `int` and the first byte of the next. A `struct` is just part of building C's type system on top of blocks of memory, since blocks of memory are useful.
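Here is a sketch of "bytes are just bytes" as seen from C++. It reinterprets a `double`'s bytes as an integer, and it does an unaligned 4-byte load that straddles two adjacent 32-bit integers. Both use `std::memcpy`, which is the well-defined way to do this in C++ (C++20's `std::bit_cast` also works for the first case). The function names are hypothetical. The expected values in the checks assume an IEEE 754 `double` and a little-endian machine such as x86.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>  // std::memcpy

// Reinterpret the 8 bytes of a double as a 64-bit integer. Compilers
// typically turn this into a single register move (e.g. movq on x86-64).
std::uint64_t double_bits(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Load 4 bytes starting 1 byte into a two-element array: the last 3 bytes of
// words[0] plus the first byte of words[1]. memcpy makes the unaligned access
// well-defined; on x86 it usually compiles to a plain unaligned mov.
std::uint32_t straddling_load(const std::uint32_t (&words)[2]) {
    std::uint32_t result;
    std::memcpy(&result, reinterpret_cast<const unsigned char*>(words) + 1, sizeof result);
    return result;
}

void check_bytes() {
    // 1.0 in IEEE 754 binary64: sign 0, biased exponent 0x3FF, mantissa 0.
    assert(double_bits(1.0) == 0x3FF0000000000000ULL);
    const std::uint32_t words[2] = {0x44332211u, 0x88776655u};
    // Little-endian bytes in memory: 11 22 33 44 55 66 77 88.
    // Bytes 1..4 are 22 33 44 55, read back as 0x55443322.
    assert(straddling_load(words) == 0x55443322u);
}
```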
These blocks of bytes can have static (global or `static`), dynamic (`malloc` or `new`), or automatic storage (local variables: temporaries on the stack or in registers, in normal C/C++ implementations on normal CPUs). The layout within a block is the same regardless (unless the compiler optimizes away the actual memory for a struct local variable; see the example below of inlining a function that returns a struct).
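This sketch shows that the layout is a property of the type, not of where the object lives. The byte offset of a member is the same for a static, an automatic, and a dynamically allocated object. `Sample`, `value_offset`, and the check function are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::ptrdiff_t
#include <memory>   // std::make_unique

// Hypothetical struct for illustration.
struct Sample {
    char tag;
    double value;
};

// Byte distance from the start of an object to its `value` member.
std::ptrdiff_t value_offset(const Sample& s) {
    return reinterpret_cast<const char*>(&s.value) - reinterpret_cast<const char*>(&s);
}

Sample static_sample;  // static storage duration

void check_layout_independent_of_storage() {
    Sample automatic_sample{};                          // automatic storage
    auto dynamic_sample = std::make_unique<Sample>();   // dynamic storage
    const auto expected = static_cast<std::ptrdiff_t>(offsetof(Sample, value));
    assert(value_offset(static_sample) == expected);
    assert(value_offset(automatic_sample) == expected);
    assert(value_offset(*dynamic_sample) == expected);
}
```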
A struct or class is the same as any other object. In C and C++ terminology, even an `int` is an object (http://en.cppreference.com/w/c/language/object), i.e. a contiguous block of bytes that you can `memcpy` around (except for non-POD types in C++; more precisely, types that aren't trivially copyable).
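A sketch of the "`memcpy` it around" rule, using `std::is_trivially_copyable`. That is the precise C++ term for types whose bytes can be copied with `memcpy`. `Plain` and `Owning` are hypothetical types. The `std::string` member is just one example of something that makes a type non-trivially-copyable.

```cpp
#include <cassert>
#include <cstring>      // std::memcpy
#include <string>
#include <type_traits>  // std::is_trivially_copyable

// Hypothetical plain struct: trivially copyable, so copying its bytes copies its value.
struct Plain {
    int id;
    float weight;
};

// A member that owns resources makes the type non-trivially-copyable;
// memcpy-ing an Owning would be undefined behaviour.
struct Owning {
    int id;
    std::string name;
};

static_assert(std::is_trivially_copyable<Plain>::value, "Plain is just bytes");
static_assert(!std::is_trivially_copyable<Owning>::value, "Owning is not");

Plain copy_bytes(const Plain& src) {
    Plain dst;
    std::memcpy(&dst, &src, sizeof dst);  // for Plain, a byte copy is a value copy
    return dst;
}

void check_copy() {
    Plain a{7, 2.5f};
    Plain b = copy_bytes(a);
    assert(b.id == 7 && b.weight == 2.5f);
}
```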
The ABI rules for the system you're compiling for specify when and where padding is inserted to make sure each member has sufficient alignment, even if you do something like `struct { char a; int b; };`. For example, the x86-64 System V ABI, used on Linux and other non-Windows systems, specifies that `int` is a 32-bit type that gets 4-byte alignment in memory. The ABI is what nails down some things that the C and C++ standards leave "implementation dependent", so that all compilers for that ABI can make code that can call each other's functions.
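Below is the `struct { char a; int b; }` example, given a hypothetical name so its layout can be inspected. The `static_assert`s are the portable guarantees. On x86-64 System V (4-byte `int` with 4-byte alignment) you'd expect `b` at offset 4, three bytes of padding after `a`, and a `sizeof` of 8.

```cpp
#include <cstddef>  // offsetof, std::size_t

// The struct from the text, named so we can inspect it.
struct CharInt {
    char a;
    int b;
};

// Portable facts: b is placed at a multiple of int's alignment, after a,
// and the total size is rounded up to a multiple of the struct's alignment
// (so every b in an array of CharInt stays aligned).
static_assert(offsetof(CharInt, b) % alignof(int) == 0, "b is aligned");
static_assert(offsetof(CharInt, b) >= sizeof(char), "b comes after a");
static_assert(sizeof(CharInt) % alignof(CharInt) == 0, "size is a multiple of alignment");

// Number of padding bytes the ABI inserted between a and b
// (3 on x86-64 System V).
constexpr std::size_t padding_after_a() {
    return offsetof(CharInt, b) - sizeof(char);
}
```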
Note that you can use `offsetof(struct_name, member)` to find out about struct layout (it has been available in both C and C++ since their first standards). See also `alignof` in C++11, or `_Alignof` in C11.
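This sketch uses `offsetof` and `alignof` together on a hypothetical struct with mixed member types. In C11 you'd write `_Alignof` instead, or `alignof` after including `<stdalign.h>`. The comments give the values expected on x86-64 System V; other ABIs may differ.

```cpp
#include <cstddef>  // offsetof, std::size_t

// Hypothetical struct mixing member alignments.
struct Mixed {
    char c;    // x86-64 SysV: offset 0
    double d;  // x86-64 SysV: offset 8 (7 padding bytes before it)
    short s;   // x86-64 SysV: offset 16, then 6 bytes of tail padding
};

// Every member offset is a multiple of that member's alignment...
static_assert(offsetof(Mixed, c) % alignof(char) == 0, "c aligned");
static_assert(offsetof(Mixed, d) % alignof(double) == 0, "d aligned");
static_assert(offsetof(Mixed, s) % alignof(short) == 0, "s aligned");
// ...and the struct is at least as aligned as its most-aligned member.
static_assert(alignof(Mixed) >= alignof(double), "struct alignment");

// The layout of Mixed, collected at compile time.
struct LayoutInfo {
    std::size_t off_c, off_d, off_s, size, align;
};

constexpr LayoutInfo mixed_layout() {
    return {offsetof(Mixed, c), offsetof(Mixed, d), offsetof(Mixed, s),
            sizeof(Mixed), alignof(Mixed)};
}
```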
It's up to the programmer to order struct members well to avoid wasting space on padding, since C rules don't let the compiler sort your struct for you. (e.g. if you have some `char` members, put them in groups of at least 4, rather than alternating them with wider members. Sorting from large to small is an easy rule, remembering that pointers may be 64-bit or 32-bit on common platforms.)
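Here is the ordering advice applied to a hypothetical record: the same five members in two orders. The sizes in the comments are what x86-64 System V gives (8-byte `double`, 4-byte `int`). The `static_assert` only checks the portable claim that the sorted order is no larger.

```cpp
// Hypothetical record with members in a careless order: each char is
// followed by a wider member, so the ABI pads after it.
// x86-64 SysV: flag1@0, value@8, flag2@16, count@20, flag3@24, sizeof 32.
struct Interleaved {
    char flag1;
    double value;
    char flag2;
    int count;
    char flag3;
};

// Same members sorted from largest to smallest: the chars end up together
// and only the tail needs padding.
// x86-64 SysV: value@0, count@8, flags@12..14, sizeof 16.
struct Sorted {
    double value;
    int count;
    char flag1;
    char flag2;
    char flag3;
};

static_assert(sizeof(Sorted) <= sizeof(Interleaved), "sorted order is no larger");
```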
More details of ABIs and so on can be found at https://stackoverflow.com/tags/x86/info. Agner Fog's excellent site includes an ABI guide, along with optimization guides.
```cpp
class foo {
    int m_a;
    int m_b;
    void inc_a(void){ m_a++; }
    int inc_b(void);
};

int foo::inc_b(void) { return m_b++; }
```
This compiles to (using http://gcc.godbolt.org/):
```asm
foo::inc_b():                       # args: this in RDI
    mov     eax, DWORD PTR [rdi+4]  # eax = this->m_b
    lea     edx, [rax+1]            # edx = eax+1
    mov     DWORD PTR [rdi+4], edx  # this->m_b = edx
    ret
```
As you can see, the `this` pointer is passed as an implicit first argument (in `rdi`, in the SysV AMD64 ABI). `m_b` is stored 4 bytes from the start of the struct/class. Note the clever use of `lea` to implement the post-increment operator, leaving the old value in `eax`.
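To make the implicit `this` concrete, here is a sketch of `foo::inc_b()` rewritten as a free function that takes the object pointer explicitly, next to a member-function version. `foo_data` and `foo_inc_b` are hypothetical names. For an out-of-line definition of either one, you'd expect GCC at `-O2` to emit the same `mov`/`lea`/`mov` sequence shown above.

```cpp
#include <cassert>

// Hypothetical C-style struct with the same data members as class foo.
struct foo_data {
    int m_a;
    int m_b;
};

// foo::inc_b() as a free function: `self` is the explicit version of the
// implicit `this` (passed in RDI under the x86-64 SysV calling convention).
// Post-increment: return the old value, store old + 1.
int foo_inc_b(foo_data* self) {
    return self->m_b++;
}

// Member-function version for comparison.
class foo {
    int m_a = 0;
    int m_b = 0;
public:
    int inc_b() { return m_b++; }
    int b() const { return m_b; }
};

// Usage: both versions return the old value and leave m_b incremented.
void check_inc_b() {
    foo_data d{10, 20};
    assert(foo_inc_b(&d) == 20 && d.m_b == 21 && d.m_a == 10);
    foo f;
    assert(f.inc_b() == 0 && f.b() == 1);
}
```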
No code for `inc_a` is emitted. It's defined inside the class declaration, which makes it implicitly `inline`, and nothing in this file calls it. It's treated the same as an `inline` non-member function: if it were really big and the compiler decided not to inline it at a call site, it could emit a stand-alone version of it.
Where C++ objects really differ from C structs is when virtual member functions are involved. Each copy of the object has to carry around an extra pointer (to the vtable for its actual type).
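This compile-time sketch shows that extra pointer. Two hypothetical structs have the same data members, but only one has a virtual function. On a 64-bit ABI like x86-64 System V you'd expect 8 and 16 bytes: the 8-byte vtable pointer, then `m_a` and `m_b`. That matches the layout in the asm comments of the `foo`/`bar` example that follows.

```cpp
#include <type_traits>  // std::is_polymorphic

// Same data members, with and without a virtual function.
struct NoVirtual {
    int m_a;
    int m_b;
    void inc_b() { m_b++; }  // non-virtual member functions take no space in the object
};

struct WithVirtual {
    int m_a;
    int m_b;
    virtual void inc_v() { m_b++; }
    virtual ~WithVirtual() = default;
};

static_assert(!std::is_polymorphic<NoVirtual>::value, "plain struct, no vtable pointer");
static_assert(std::is_polymorphic<WithVirtual>::value, "has a hidden vtable pointer");
static_assert(sizeof(NoVirtual) == 2 * sizeof(int), "just the two ints");
// The hidden vtable pointer makes every WithVirtual object bigger.
static_assert(sizeof(WithVirtual) > sizeof(NoVirtual), "the vtable pointer costs space");
```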
```cpp
class foo {
public:
    int m_a;
    int m_b;
    void inc_a(void){ m_a++; }
    void inc_b(void);
    virtual void inc_v(void);
};

void foo::inc_b(void) { m_b++; }

class bar: public foo {
public:
    virtual void inc_v(void);  // overrides foo::inc_v even for users that access it through a pointer to class foo
};

void foo::inc_v(void) { m_b++; }
void bar::inc_v(void) { m_a++; }
```
This compiles to:
```asm
; This time I made the functions return void, so the asm is simpler
; The in-memory layout of the class is now:
;   vtable ptr (8B)
;   m_a (4B)
;   m_b (4B)
foo::inc_v():
    add     DWORD PTR [rdi+12], 1   # this_2(D)->m_b,
    ret
bar::inc_v():
    add     DWORD PTR [rdi+8], 1    # this_2(D)->D.2657.m_a,
    ret

# if you uncheck the hide-directives box, you'll see
    .globl foo::inc_b()
    .set foo::inc_b(),foo::inc_v()
# inc_b has the same definition as foo's inc_v, so gcc saves space by making one an alias for the other.
# you can also see the directives that define the data that goes in the vtables
```
Fun fact: `add m32, imm8` is faster than `inc m32` on most Intel CPUs (micro-fusion of the load+ALU uops). This is one of the rare cases where the old Pentium 4 advice to avoid `inc` still applies. gcc always avoids `inc`, though, even when it would save code size with no downsides :/ (see INC instruction vs ADD 1: Does it matter?)
```cpp
void caller(foo *p){
    p->inc_v();
}
```

compiles to:

```asm
    mov     rax, QWORD PTR [rdi]    # p_2(D)->_vptr.foo, p_2(D)->_vptr.foo
    jmp     [QWORD PTR [rax]]       # *_3
```
(This is an optimized tailcall: `jmp` replacing `call`/`ret`.)
The `mov` loads the vtable address from the object into a register. The `jmp` is a memory-indirect jump, i.e. it loads a new RIP value from memory. The jump-target address is `vtable[0]`, i.e. the first function pointer in the vtable. If the call were to a second virtual function, the `mov` wouldn't change, but the `jmp` would use `jmp [rax + 8]`.
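To make the `mov`/`jmp` pair concrete, here is a hand-written, C-style imitation of what the compiler builds. Each object starts with a pointer to a table of function pointers, and a call loads that pointer, then calls through slot 0. This is only a sketch of the idea. It is not the real Itanium C++ ABI vtable layout, which also has offset-to-top and RTTI entries in front of the function pointers. All names are hypothetical.

```cpp
#include <cassert>

struct Object;  // forward declaration

// Hypothetical hand-rolled vtable: just a table of function pointers.
struct VTable {
    void (*inc_v)(Object* self);  // slot 0
};

// Every "object" starts with its vtable pointer, like a class with virtual functions.
struct Object {
    const VTable* vptr;
    int m_a;
    int m_b;
};

void foo_inc_v(Object* self) { self->m_b++; }  // plays the role of foo::inc_v
void bar_inc_v(Object* self) { self->m_a++; }  // plays the role of bar::inc_v

const VTable foo_vtable = {foo_inc_v};
const VTable bar_vtable = {bar_inc_v};

// Roughly what caller(foo* p) { p->inc_v(); } does:
void caller(Object* p) {
    const VTable* vt = p->vptr;  // mov rax, QWORD PTR [rdi]
    vt->inc_v(p);                // jmp [QWORD PTR [rax]]  (slot 0)
}

void check_dispatch() {
    Object f{&foo_vtable, 0, 0};
    Object b{&bar_vtable, 0, 0};
    caller(&f);  // reaches foo_inc_v
    caller(&b);  // reaches bar_inc_v
    assert(f.m_a == 0 && f.m_b == 1);
    assert(b.m_a == 1 && b.m_b == 0);
}
```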
The order of entries in the vtable matches the order of declaration in the class (the Itanium C++ ABI used by GCC and Clang on Linux specifies this). So reordering the class declaration in one translation unit would send virtual calls to the wrong target, just as reordering the data members would change the class's ABI.
If the compiler had more information, it could devirtualize the call. For example, if it could prove that the `foo *` was always pointing to a `bar` object, it could inline `bar::inc_v()`.
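One way to hand the compiler that proof is to mark the derived class `final`. Then no class can derive from it, so a call through a pointer to it can only reach its own `inc_v()`. GCC and Clang will usually devirtualize, and often inline, such calls at `-O2`. This is a sketch: whether a given call is actually devirtualized is up to the optimizer, and `bar_final` is a hypothetical name.

```cpp
#include <cassert>

class foo {
public:
    int m_a = 0;
    int m_b = 0;
    virtual void inc_v() { m_b++; }
    virtual ~foo() = default;
};

// `final` promises that nothing derives from bar_final, so the dynamic type
// behind a bar_final* is known exactly.
class bar_final final : public foo {
public:
    void inc_v() override { m_a++; }
};

// The compiler may turn this into a direct call, or inline it to something
// like `add DWORD PTR [rdi+8], 1`, with no vtable load at all.
void caller_bar(bar_final* p) {
    p->inc_v();
}

// Through a foo*, the call still needs the vtable unless the compiler can
// prove more about p.
void caller_foo(foo* p) {
    p->inc_v();
}

void check_final_dispatch() {
    bar_final b;
    caller_bar(&b);
    caller_foo(&b);  // a foo* pointing to a bar_final still reaches bar_final::inc_v
    assert(b.m_a == 2 && b.m_b == 0);
}
```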
GCC will even speculatively devirtualize when it can figure out what the type probably is at compile time. In the above code, the compiler can't see any classes that inherit from `bar`, so it's a good bet that a `bar*` is pointing to a `bar` object rather than some derived class.
```cpp
void caller_bar(bar *p){
    p->inc_v();
}
```

```asm
# gcc5.5 -O3
caller_bar(bar*):
    mov     rax, QWORD PTR [rdi]            # load vtable pointer
    mov     rax, QWORD PTR [rax]            # load target function address
    cmp     rax, OFFSET FLAT:bar::inc_v()   # check it
    jne     .L6                             #,
    add     DWORD PTR [rdi+8], 1            # inlined version of bar::inc_v()
    ret
.L6:
    jmp     rax                             # otherwise tailcall the derived class's function
```
Remember, a `foo *` can actually point to a derived `bar` object, but a `bar *` is not allowed to point to a pure `foo` object.
It is just a bet though; part of the point of virtual functions is that types can be extended without recompiling all the code that operates on the base type. This is why it has to compare the function pointer and fall back to the indirect call (jmp tailcall in this case) if it was wrong. Compiler heuristics decide when to attempt it.
Notice that it's checking the actual function pointer, rather than comparing the vtable pointer. It can still use the inlined `bar::inc_v()` as long as the derived type didn't override that virtual function. Overriding other virtual functions wouldn't affect this one, but would require a different vtable.
Allowing extension without recompilation is handy for libraries, but also means looser coupling between parts of a big program (i.e. you don't have to include all the headers in every file).
But this imposes some efficiency costs for some uses. C++ virtual dispatch only works through pointers (or references) to objects, so you can't have a polymorphic array without hacks or expensive indirection through an array of pointers. That indirection defeats a lot of hardware and software optimizations (see Fastest implementation of simple, virtual, observer-sort of, pattern in c++?).
If you want some kind of polymorphism / dispatch but only for a closed set of types (i.e. all known at compile time), you can do it manually with a `union` + `enum` + `switch`, or with `std::variant<D1,D2>` to make a union and `std::visit` to dispatch, or various other ways. See also Contiguous storage of polymorphic types and Fastest implementation of simple, virtual, observer-sort of, pattern in c++?.
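Here is a sketch of both closed-set approaches on a hypothetical pair of shape types. The `std::variant` version stores the shapes by value, contiguously, in one `std::vector`. The hand-written version uses an `enum` tag plus a `union`, dispatched with `switch`. Neither needs a vtable pointer or a heap allocation per object. The type and function names are made up for illustration.

```cpp
#include <cassert>
#include <variant>
#include <vector>

// Hypothetical closed set of types, all known at compile time.
struct Square { int side; };
struct Rect { int w, h; };

// std::variant version: a tagged union managed by the library.
using Shape = std::variant<Square, Rect>;

struct AreaVisitor {
    int operator()(const Square& s) const { return s.side * s.side; }
    int operator()(const Rect& r) const { return r.w * r.h; }
};

int total_area(const std::vector<Shape>& shapes) {
    int total = 0;
    for (const Shape& s : shapes) {
        total += std::visit(AreaVisitor{}, s);  // dispatch on the stored index
    }
    return total;
}

// Hand-written version: enum tag + union + switch.
struct ManualShape {
    enum class Kind { Square, Rect } kind;
    union {
        Square square;
        Rect rect;
    };
};

int area(const ManualShape& s) {
    switch (s.kind) {
    case ManualShape::Kind::Square: return s.square.side * s.square.side;
    case ManualShape::Kind::Rect:   return s.rect.w * s.rect.h;
    }
    return 0;  // not reached for a valid tag
}

void check_shapes() {
    std::vector<Shape> shapes{Square{3}, Rect{2, 5}};
    assert(total_area(shapes) == 9 + 10);
    ManualShape m;
    m.kind = ManualShape::Kind::Rect;
    m.rect = Rect{2, 5};
    assert(area(m) == 10);
}
```

The hand-written version makes you keep `kind` in sync with the active union member yourself. `std::variant` does that bookkeeping for you, at the cost of a slightly more opaque layout.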
Using a `struct` doesn't force the compiler to actually put stuff in memory, any more than a small array or a pointer to a local variable does. For example, an inline function that returns a `struct` by value can still be fully optimized.
The as-if rule applies: even if a struct logically has some memory storage, the compiler can make asm that keeps all the needed members in registers (and do transformations that mean that values in registers don't correspond to any value of a variable or temporary in the C++ abstract machine "running" the source code).
```cpp
struct pair {
    int m_a;
    int m_b;
};

pair addsub(int a, int b) {
    return {a+b, a-b};
}

int foo(int a, int b) {
    pair ab = addsub(a,b);
    return ab.m_a * ab.m_b;
}
```
That compiles (with g++ 5.4) to:
```asm
# The non-inline definition which actually returns a struct
addsub(int, int):
    lea     edx, [rdi+rsi]  # add result
    mov     eax, edi
    sub     eax, esi        # sub result
    # then pack both struct members into a 64-bit register, as required by the x86-64 SysV ABI
    sal     rax, 32
    or      rax, rdx
    ret

# But when inlining, it optimizes away
foo(int, int):
    lea     eax, [rdi+rsi]  # a+b
    sub     edi, esi        # a-b
    imul    eax, edi        # (a+b) * (a-b)
    ret
```
Notice how even returning a struct by value doesn't necessarily put it in memory. The x86-64 SysV ABI passes and returns small structs packed together into registers. Different ABIs make different choices for this.
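The register packing in `addsub` is not arbitrary. `sal rax, 32` / `or rax, rdx` puts `m_b` in the high 32 bits of RAX and `m_a` in the low 32 bits. That is the same 64-bit value you get by reading the struct's 8 bytes from memory on a little-endian machine, because `m_a` is stored first. This sketch computes both and compares them. It assumes a 4-byte `int` and little-endian byte order (true on x86-64). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>  // std::memcpy

struct pair {
    int m_a;
    int m_b;
};

// What `sal rax, 32` / `or rax, rdx` computes: m_b high, m_a low.
std::uint64_t pack_like_sysv(pair p) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(p.m_b)) << 32)
         | static_cast<std::uint32_t>(p.m_a);
}

// The struct's 8 bytes read from memory as one integer.
std::uint64_t bytes_as_u64(pair p) {
    static_assert(sizeof(pair) == sizeof(std::uint64_t), "assumes two 4-byte ints, no padding");
    std::uint64_t v;
    std::memcpy(&v, &p, sizeof v);
    return v;
}

void check_packing() {
    pair p{5, -1};
    // Little-endian assumption: m_a occupies the low-addressed (low-order) bytes.
    assert(pack_like_sysv(p) == bytes_as_u64(p));
    assert(pack_like_sysv(p) == 0xFFFFFFFF00000005ULL);
}
```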