Memory footprint of objects in Ruby 1.8, EE, 1.9, and OCaml

Now that Ruby 1.9.1 has been out for a while, it's time to review my calculations regarding the memory footprint of objects, since 1.9 incorporates some optimizations that improve significantly on 1.8. I also measured the footprint of OCaml objects while I was at it.

Addendum: added note about Ruby Enterprise Edition (Ruby EE), a patched Ruby 1.8.6 used with Phusion Passenger; see below.

This table summarizes the results. Sizes are in bytes on x86; on x86-64 they are about twice as large (exactly twice for OCaml), although the malloc overhead might differ:

                                    Ruby 1.8   Ruby EE   Ruby 1.9   OCaml
object with no IVs                        20        20         20      12
object with 1 IV                         120        96         20      16
object with 2 IVs                        144       112         20      20
object with 3 IVs                        168       128         20      24
object with 4 IVs                        192       144         48      28
struct (Struct or record), 1 elm.         32        24         20       8
struct, 2 elms.                           36        28         20      12
struct, 3 elms.                           40        32         20      16
struct, 4 elms.                           44        36         44      20

(The Ruby EE gains come from the TCMalloc allocator and these are best case figures; the actual footprint will be between them and those for Ruby 1.8.)

Keep in mind that both Ruby 1.8 and 1.9 can suffer from heavy memory fragmentation (both internal and external) when allocating many objects. Objects might also be retained for an arbitrarily long time because the GC is conservative. OCaml has no such problem: it has a generational, exact GC, with a copying collector in the minor heap and an incremental mark & sweep & compact collector in the major heap.

In Ruby 1.8, an object with one instance variable (IV) takes:

  • 5 words for the object slot

  • 4 (+2 = malloc overhead) words for the IV table (st_table struct)

  • 11 (+2) words for the bins

  • 4 (+2) words for the entry

That is, given

class X; def initialize(x); @x = x end end

X.new(1) will take 30 words, or 120 bytes on x86 (24 of which are used by malloc for internal bookkeeping).

Additional IVs cost 6 words (24 bytes) per IV until we reach 11 IVs (at which point the hash table resizes to 19 bins).
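The arithmetic above can be written down as a small calculator. This is only a sketch of the figures given in the text (4-byte words, 2 words of malloc overhead per block); the method name ruby18_object_bytes is made up for illustration, and the model is only meant for the small IV counts discussed here, before the table is resized.

```ruby
# Back-of-the-envelope model of the Ruby 1.8 figures above (x86).
R18_WORD_BYTES      = 4   # word size on x86
R18_MALLOC_OVERHEAD = 2   # words of bookkeeping per malloc'ed block, per the text

# Estimated footprint in bytes of an object with n_ivs instance variables.
def ruby18_object_bytes(n_ivs)
  words = 5                                        # object slot
  if n_ivs > 0
    words += 4 + R18_MALLOC_OVERHEAD               # IV table (st_table struct)
    words += 11 + R18_MALLOC_OVERHEAD              # bins
    words += n_ivs * (4 + R18_MALLOC_OVERHEAD)     # one entry per IV
  end
  words * R18_WORD_BYTES
end

(0..4).each { |n| puts "#{n} IVs: #{ruby18_object_bytes(n)} bytes" }
```

For 0 to 4 IVs this gives 20, 120, 144, 168 and 192 bytes, which matches the Ruby 1.8 column of the table.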

Ruby Enterprise Edition

Ruby EE is a patched Ruby 1.8.6 that uses Google's TCMalloc, which is much faster than the most common allocator, the one based on Doug Lea's malloc. There are no changes to the runtime representation of objects, so all the possible space gains come from TCMalloc. According to its documentation, small blocks can be allocated with virtually no overhead, so in the best case Ruby EE takes 24 fewer bytes for an object with one IV (plus 8 fewer for each additional IV, as in the table above) and 8 fewer bytes per Struct.
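To see where the best-case Ruby EE figures come from, the same word counts can be evaluated with the allocator overhead as a parameter. This is a rough sketch: the helper name object_bytes_with_overhead is hypothetical, and "zero overhead" is the idealized TCMalloc case, not a measured value.

```ruby
# Footprint of a Ruby 1.8-style object on x86, with the per-block
# allocator overhead (in words) passed in instead of hard-coded.
def object_bytes_with_overhead(n_ivs, overhead_words)
  words = 5                                  # object slot
  if n_ivs > 0
    words += 4 + overhead_words              # IV table (st_table struct)
    words += 11 + overhead_words             # bins
    words += n_ivs * (4 + overhead_words)    # one entry per IV
  end
  words * 4                                  # 4-byte words
end

(1..4).each do |n|
  best  = object_bytes_with_overhead(n, 0)   # idealized TCMalloc: no overhead
  worst = object_bytes_with_overhead(n, 2)   # same as plain Ruby 1.8
  puts "#{n} IVs: between #{best} and #{worst} bytes"
end
```

With no overhead the results are 96, 112, 128 and 144 bytes for 1 to 4 IVs, the Ruby EE column of the table; the real footprint should fall somewhere between the two bounds.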

Ruby 1.9

Ruby 1.9 doesn't use a symbol -> value hash table for IVs anymore. Instead, there's one IV index table per class, which maps each IV name to an index. That index is then used to look up the value in a per-object IV array.

(Note that the IV index table is shared amongst all the objects of the same class. If each one uses different names for the IVs, the indexes will keep increasing, so in a pathological case the IV array could become arbitrarily large even when the object has got only one IV.)
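The pathological case is easier to see with a toy model. The classes below (ToyClass, ToyObject) are invented for illustration and are not how the interpreter is written; they only mimic the idea of a shared per-class name -> index table and a per-object array.

```ruby
# Toy model: each "class" owns one name -> index table,
# each "object" owns an array indexed through that table.
class ToyClass
  def initialize
    @iv_index = {}            # shared by every object of this class
  end

  # Return the index for an IV name, assigning the next free one if it is new.
  def index_for(name)
    @iv_index[name] ||= @iv_index.size
  end
end

class ToyObject
  attr_reader :ivs

  def initialize(klass)
    @klass = klass
    @ivs = []                 # per-object IV array
  end

  def set(name, value)
    @ivs[@klass.index_for(name)] = value   # array grows (nil-padded) as needed
  end
end

klass = ToyClass.new
objs = (0...100).map do |i|
  o = ToyObject.new(klass)
  o.set(:"@iv#{i}", i)        # every object uses a different IV name
  o
end

puts objs.first.ivs.size      # array length for the first object
puts objs.last.ivs.size       # array length for the last object
```

In this model the first object needs an array of length 1, while the last one needs length 100 even though it holds a single IV, because the shared table has handed out 100 indexes by then.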

Ruby 1.9 stores up to 3 instance variables in the object slot without using an external table, so an object with one IV will only take 5 words. Beyond 3 instance variables, it switches to an external IV array, which grows by a factor of 1.25 as new elements are added. For an object with 4 IVs, the array will have 5 slots, and the overall footprint will be:

  • 5 words for the object slot

  • 5 (+2) words for the IV array
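The two items above add up to 12 words, or 48 bytes on x86. A small sketch of that calculation follows; the growth rule used for the external array (new capacity = n + n/4 whenever the array is full) is an assumption chosen to reproduce the "5 slots for 4 IVs" case, and the method name ruby19_object_bytes is made up, so treat results beyond 4 IVs as a guess.

```ruby
# Sketch of the Ruby 1.9 figures (x86, 4-byte words).
def ruby19_object_bytes(n_ivs)
  words = 5                              # object slot; holds up to 3 IVs inline
  if n_ivs > 3
    capacity = 0
    (4..n_ivs).each do |n|               # IVs are added one at a time
      # ASSUMPTION: grow by a factor of 1.25 when the array is full
      capacity = n + n / 4 if n > capacity
    end
    words += capacity + 2                # external IV array + malloc overhead
  end
  words * 4
end

(0..4).each { |n| puts "#{n} IVs: #{ruby19_object_bytes(n)} bytes" }
```

For 0 to 3 IVs this stays at 20 bytes, and it jumps to 48 bytes at 4 IVs, as in the Ruby 1.9 column of the table.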

OCaml

I'd never bothered to look into the size of OCaml objects before (you go for records when you want speed anyway), even though it's really easy with the low-level Obj module, which gives information about the runtime representation:

# open Obj;;
# let value_size o = let t = repr o in if is_block t then 1 + size t else 0;;
val value_size : 'a -> int = <fun>
# value_size 0;;
- : int = 0
# type foo = A | B of int;;
type foo = A | B of int
# value_size A;;
- : int = 0
# value_size (B 1);;
- : int = 2
# value_size (object end);;
- : int = 3
# value_size (object val a = 1 end);;
- : int = 4
# value_size (object val a = 1 method x = 1 end);;
- : int = 4

The value_size function returns the size in words of the value in the heap, and returns 0 for immediate values (bool, char, int, constant constructors).

After a look at CamlinternalOO.ml, I now know that, in addition to the 1 word overhead taken for all values in the heap (the block header used by the GC and the runtime), objects take:

  • 1 word for the method table

  • 1 word for a unique object ID

  • 1 additional word per instance variable
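These word counts translate directly into the OCaml column of the table. The snippet below is written in Ruby only to stay consistent with the other examples; the method names are made up, and the record formula (one header word plus one word per field) assumes fields that each fit in a word, so records of floats are not covered.

```ruby
# OCaml sizes on x86 (4-byte words), from the word counts listed above.
def ocaml_object_bytes(n_ivs)
  (1 + 1 + 1 + n_ivs) * 4    # block header + method table + object ID + IVs
end

# ASSUMPTION: a record is a block header plus one word per field.
def ocaml_record_bytes(n_fields)
  (1 + n_fields) * 4
end

(0..4).each { |n| puts "object, #{n} IVs: #{ocaml_object_bytes(n)} bytes" }
(1..4).each { |n| puts "record, #{n} fields: #{ocaml_record_bytes(n)} bytes" }
```

This gives 12 to 28 bytes for objects with 0 to 4 IVs and 8 to 20 bytes for records with 1 to 4 fields, in line with the table.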
