Bridging the Gap on Objective-C ARC

Some Starting Words

 Objective-C has three memory management schemes:

  • manual retain release (MRR)
  • automatic reference counting (ARC) 
  • garbage collection (GC) 

Luckily, GC is dead and so there is no use discussing it. However, it is still incredibly easy to stumble across both ARC and MRR wherever you go. Understanding ARC is critical to being an iOS or Mac developer these days and, as it turns out, it's a little more involved than this graphic Apple made. The image's overall sentiment is correct and does give a basis for the difference in code: you no longer have to write retain and release explicitly. Yay! Now, we're done, right?

Reference Counting

If you already know what reference counting is then feel free to skip this section.  I am going to keep it brief anyway, so if you have never heard of this concept, then you might want to read up elsewhere first.

Every object in Objective-C has a piece of state called its reference count. It is an unsigned integer that is used to keep track of the number of references to the object and keep it alive until there are none. One could interpret this as saying that if an object has a reference count of 3 then there are 3 semantic scopes in which the object must be kept alive.

The rules of reference counting in Objective-C are as follows:

  • A created object has a retain count of +1 (usually from +alloc followed by an -init method).
  • When the count is 1 and the object is released, it is deallocated.
  • The -retain method increases the count by one.
  • The -release method decreases the count by one.

These rules lead to some pretty standard patterns, which can be summed up as "if you want to start using the object you must ensure that it is retained through your use." This manifests itself primarily in two ways:

  1. When you start using an object -retain it and when you finish, -release it.
  2. Never use an object without retaining it UNLESS you have retained another object that has retained it. In reality this is just repeating point #1 but with the transitive property.

Retained References

Then, with this in mind, we arrive at the standard retain-setter:

Compiled with MRR (-fno-objc-arc)

 1  // A retain-setter (acquiring a "strong" reference)
 2  - (void)setFoo:(Foo *)aFoo {
 3    [aFoo retain];
 4    [_foo release];
 5    _foo = aFoo;
 6  }

The pattern is easily learned. We must release the old value (we are done with it), retain the new value (we want to start using it), and actually assign the pointer. This is the semantic handoff for replacing ownership (and we would release _foo in the -dealloc method to seal the deal).

The only point worth making is that the method must be implemented in the order shown. If aFoo happens to be the same pointer as _foo then we wouldn't want to release it and potentially have it be deallocated if the intent was to immediately retain it afterward (sorry, it's gone!). 
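The same-pointer hazard described above is often sidestepped with an explicit identity check instead of relying on ordering. The following is a minimal sketch of that variation under MRR; the classes `Foo` and `FooHolder` and the helper `setterSurvivesSameValue` are hypothetical names made up for illustration.

```objc
// Build with MRR, e.g.: clang -fno-objc-arc -framework Foundation
#import <Foundation/Foundation.h>

// Hypothetical value class, used only for illustration.
@interface Foo : NSObject
@end
@implementation Foo
@end

// Hypothetical owner that holds a retained ("strong") reference to a Foo.
@interface FooHolder : NSObject {
  Foo *_foo;
}
- (Foo *)foo;
- (void)setFoo:(Foo *)aFoo;
@end

@implementation FooHolder
- (Foo *)foo {
  return _foo;
}

- (void)setFoo:(Foo *)aFoo {
  // Skip the work when the same pointer is assigned again, so the
  // object is never released before it is retained.
  if (_foo != aFoo) {
    [_foo release];
    _foo = [aFoo retain];
  }
}

- (void)dealloc {
  [_foo release];  // Give up ownership when the holder goes away.
  [super dealloc];
}
@end

// Returns YES if the holder still references the object after the same
// value is assigned twice and the caller has given up its own ownership.
static BOOL setterSurvivesSameValue(void) {
  FooHolder *holder = [[FooHolder alloc] init];
  Foo *foo = [[Foo alloc] init];  // +1, owned by this function
  [holder setFoo:foo];            // the holder now owns it as well
  [foo release];                  // the holder is the only owner left
  [holder setFoo:[holder foo]];   // same pointer: must not deallocate it
  BOOL stillSet = ([holder foo] == foo);
  [holder release];
  return stillSet;
}
```

Both forms are correct; the retain-first ordering avoids a branch, while the identity check makes the intent explicit.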

Unretained References

It's just as easy to make an assign-setter where we don't actually bump the reference count. That means that keeping this object alive is someone else's job. It could be deallocated and we'd end up with _foo being a dangling pointer. Un-retained objects must be handled with care.

Compiled with MRR (-fno-objc-arc)

 1  // An assign-setter (acquiring a "weak" reference)
 2  - (void)setFoo:(Foo *)aFoo {
 3    _foo = aFoo;
 4  }

Scope-Retained References

The only piece of MRR that remains is the method -autorelease. Often you want to create an object to return from a function (reminder: its retain count would be +1). Imagine this semi-contrived function:

Compiled with MRR (-fno-objc-arc)

 1  NSArray *ensureArray(id object) {
 2    if ([object isKindOfClass:[NSArray class]]) {
 3      return object;
 4    }
 5    return [[NSArray alloc] initWithObjects:object, nil];
 6  }

Is it the responsibility of the caller to release the return value? In one branch an existing object is returned and in the other a new object is created. If the first branch executes and the caller releases the value at some point then the number of retains and releases will be unbalanced and the program will suffer an overrelease. If the second branch executes and the caller doesn't release it then the object will never be released and it will leak.

The problem is that the first branch returns an unowned object whereas the second returns an owned object. There is no correct behavior for the caller. One possible solution would be to make all branches return an owned object.

Compiled with MRR (-fno-objc-arc)

 1  NSArray *ensureArray(id object) {
 2    if ([object isKindOfClass:[NSArray class]]) {
 3      [object retain];
 4      return object;
 5    }
 6    return [[NSArray alloc] initWithObjects:object, nil];
 7  }

While this works, it is unfortunate. This means that every branch of every method must return an owned reference and every caller of every function must release the return value. Ugh! Can you imagine all the bugs and chaos that would ensue?! 

Instead we want a way to return an unowned object. Sadly if we release the value in the second branch we'd end up deallocating the object before the function even returns. We need a way to say "release this in the future" such that there is sufficient time for the caller to retain it if desired or allow it to deallocate and be forever forgotten.

We arrive at autorelease. An autorelease pool is a collection that, when "drained", sends a -release message to every object in it. Thus all we need to do is put our object into an autorelease pool that will drain at some point in the future. Luckily we don't have to worry about where the autorelease pool is or when it drains (there's one created for each processed event in the event loop and you can create your own). We simply need to call the -autorelease method and everything will work out in the end.

Compiled with MRR (-fno-objc-arc)

 1  NSArray *ensureArray(id object) {
 2    if ([object isKindOfClass:[NSArray class]]) {
 3      return object;
 4    }
 5    return [[[NSArray alloc] initWithObjects:object, nil] autorelease];
 6  }
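To watch "release this in the future" happen, here is a small sketch under MRR that creates its own pool with @autoreleasepool and checks when the object actually dies. The class `Tracked`, the counter `gDeallocCount`, and both helper functions are hypothetical names introduced only for this illustration.

```objc
// Build with MRR, e.g.: clang -fno-objc-arc -framework Foundation
#import <Foundation/Foundation.h>

// Counts deallocations of the hypothetical Tracked class below.
static int gDeallocCount = 0;

@interface Tracked : NSObject
@end
@implementation Tracked
- (void)dealloc {
  gDeallocCount++;
  [super dealloc];
}
@end

// Returns an unowned (autoreleased) object, like the new-object branch
// of ensureArray.
static Tracked *makeTracked(void) {
  return [[[Tracked alloc] init] autorelease];
}

// Returns YES if the object survives until the pool drains and is
// deallocated once it has drained.
static BOOL autoreleasedObjectDiesWithPool(void) {
  int before = gDeallocCount;
  BOOL aliveInsidePool = NO;
  @autoreleasepool {
    Tracked *t = makeTracked();
    (void)t;  // The caller could retain t here if it wanted to keep it.
    aliveInsidePool = (gDeallocCount == before);
  }  // The pool drains here and sends the pending -release.
  return aliveInsidePool && (gDeallocCount == before + 1);
}
```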

Core Foundation

There is one more thing to go over before I go into ARC, since it causes the pain points and complexity of ARC: void pointers. 

All Objective-C objects are on the heap (yes, there are a few exceptions). That means that we refer to them via pointers and whenever you have pointers you are in The Wild West.  You can shove stuff into them, dereference them, add 4 and see what's next, and do any other crazy thing you want. When the compiler tries to figure out how to manage your memory for you it needs to make sure you don't go do any crazy pointer magic behind its back. But sadly you will because of a cute little animal called Core Foundation.

In reality it's just a bunch of C objects (meaning structs, pointers, malloced memory, and glue to hold it all together) with a bunch of C functions to operate on them. Many of the APIs are more feature-filled than objects in Foundation; so, they tend to pop up quite often. One example is CFMutableDictionaryRef:

Compiled with MRR (-fno-objc-arc)

 1  CFMutableDictionaryRef dRef = CFDictionaryCreateMutable(...);
 2  CFDictionarySetValue(dRef, aKey, aValue);
 3
 4    ...
 5
 6  CFRelease(dRef);

It behaves just like NSMutableDictionary except that it has a few extra bells and whistles. It even follows the same patterns of MRR. You can see that the "create" function returns a +1 object and that it must be CFReleased at the end (lest it leak).
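For readers who want something they can compile, here is one way the elided snippet above could be filled in. It is only a sketch: the helper name `countAfterOneInsert` is hypothetical, and the choice of the standard CFType callbacks and CFSTR constants is an assumption of this example, not something the snippet above specifies.

```objc
#import <CoreFoundation/CoreFoundation.h>

// Creates a dictionary, stores one entry, and returns the entry count.
static CFIndex countAfterOneInsert(void) {
  // "Create" in the function name means the caller owns the result (+1).
  CFMutableDictionaryRef dRef = CFDictionaryCreateMutable(
      kCFAllocatorDefault, 0,
      &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
  CFDictionarySetValue(dRef, CFSTR("key"), CFSTR("value"));
  CFIndex count = CFDictionaryGetCount(dRef);
  CFRelease(dRef);  // Balance the create, or the dictionary leaks.
  return count;
}
```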

Toll-Free Bridging

CFMutableDictionaryRef and NSMutableDictionary are more than just similar; they are essentially the same object. Under the hood they have the exact same data layout and are completely interchangeable. You are free to take either of the two, cast it to the other, and use methods or functions from either on it. It's cool and it's crazy and when you add ARC to the mix, it's tricky.

Automatic Reference Counting (ARC)

A few years ago Apple introduced a new feature called ARC to the Objective-C language (here's the spec). The idea was that you would no longer need to call retain, release, or autorelease because the patterns had become so well established that the compiler could figure it out for you. For a lot of high-level application code (read: code nowhere near Core Foundation) this was basically true, and so was the graphic shown at the top of this post (from Apple's transition guide).

ARC added the ability to explicitly specify retained and unretained references directly and the compiler would handle the rest. Now you could write '__strong id foo' and have a reference that, when assigned to, would release the old value, retain the new one, and use primitive assignment for the actual value. Thankfully, all variables (including instance variables in classes) are strong by default; feel free to omit the qualifier.

Along with __strong they added __weak, for unretained pointers. And because "who doesn't love a magic show?" they made it so that __weak references were self-zeroing. That means that when the object gets deallocated all __weak references to it become nil (it's actually just a memory barrier and a little bit of bookkeeping, but meh, call it magic).
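The self-zeroing behavior is easy to observe. The sketch below assumes ARC is enabled; the helper name `weakReferenceIsZeroed` is hypothetical.

```objc
// Build with ARC, e.g.: clang -fobjc-arc -framework Foundation
#import <Foundation/Foundation.h>

// Returns YES if a __weak variable sees the object while a strong
// reference exists and reads as nil once that reference is gone.
static BOOL weakReferenceIsZeroed(void) {
  __weak NSObject *weakRef = nil;
  BOOL aliveWhileStrong = NO;
  @autoreleasepool {
    NSObject *strongRef = [[NSObject alloc] init];
    weakRef = strongRef;
    aliveWhileStrong = (weakRef != nil);
    // Use strongRef after the check so that ARC cannot release it early.
    (void)[strongRef hash];
  }
  // strongRef has gone out of scope, so the object has been deallocated
  // and the runtime has set weakRef to nil.
  return aliveWhileStrong && (weakRef == nil);
}
```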

So now the compiler figures out what to retain, what to release, what to autorelease (with some crazy runtime optimizations to avoid it), and then at compile time can optimize the number of calls to the minimum set needed for a given block of code. It's pretty cool, and well, I love my job.

Hmmm... it's just got dark out. Oh! CF is here. 

Cast All The Things

Now that we're all ARC experts, let's look at some code! :D
Here's an example where a CFDictionaryRef and NSDictionary are used with Toll-Free Bridging.

 

Compiled with MRR (-fno-objc-arc)

 1  {
 2    CFDictionaryRef dRef = CFDictionaryCreate(...);
 3    NSDictionary *d = (id)dRef;
 4
 5    ...
 6
 7    [d release];
 8  }

Notice that the code is MRR and it works. dRef is created with a retain count of +1. Then after line 3 both d and dRef point to the same object, which still has retain count +1. Finally at line 7 d is released, the retain count goes to 0, and the object is deallocated (remember, in reality there was only ever one object).

Let's use our awesome new knowledge and convert this to ARC.

Attempt at an ARC solution

 1  {
 2    CFDictionaryRef dRef = CFDictionaryCreate(...);
 3    NSDictionary *d = (id)dRef;
 4
 5    ...
 6  }

We just remove the -release call and we're done, right? Well, if we were dealing completely with NSDictionary and there was never any CF in sight, then that would be 100% correct, life would be too easy, and the header of that code snippet would not be yellow nor contain the word "attempt." 

So, what went wrong? Well, the problem is with the interaction of the calls that the compiler inserts and the Toll-Free Bridging.

There are a number of functions whose calls are inserted during ARC compilation. The most obvious three are objc_retain(id value), objc_release(id value), and objc_autorelease(id value). There are also some crazy cool ones like objc_retainAutoreleasedReturnValue(id value) and objc_initWeak(id *object, id value), but we don't really care about those for this post.

The one that we are going to focus on quite a bit is objc_storeStrong(id *object, id value). It is semantically the equals operator when used with a __strong variable. Its behavior is equivalent to the following MRR:

Compiled with MRR (-fno-objc-arc)

 1  id objc_storeStrong(id *object, id value) {
 2    [value retain];
 3    id oldValue = *object;
 4    *object = value;
 5    [oldValue release];
 6    return value;
 7  }

I am not saying it is actually implemented that way, but oh well, close enough! Also notice that same pattern of releasing the old value, retaining the new value, and doing a primitive assignment.

So, let's get back to that example. I'd like to remark that it won't even compile, but for now let's ignore that fact and I'll tell you why later.

 

Attempt at an ARC solution

 1  {
 2    CFDictionaryRef dRef = CFDictionaryCreate(...);
 3    NSDictionary *d = (id)dRef;
 4
 5    ...
 6  }

This is roughly what the compiler is going to emit (from here on a blue header with the title "Compiler emitted ARC code" means a rough approximation of the generated/emitted ARC code):

Compiler emitted ARC code

 1  {
 2    CFDictionaryRef dRef = CFDictionaryCreate(...);
 3    NSDictionary *d;
 4    objc_storeStrong(&d, (id)dRef);
 5
 6    ...
 7
 8    objc_release(d);
 9  }

The compiler has to store strongly into d and then release d when it goes out of scope. 

Transferring Ownership

When dRef is created on line 2 it has a retain count of +1. On line 4 it is stored strongly into d, which bumps the retain count again (it's a strong reference). So, after line 4, d and dRef, which are the same object, have a retain count of 2. Then at line 8 d is released and the retain count drops to 1. That means that the object behind d and dRef is NEVER deallocated. This code leaks. Sadness ensues.

What happened? Well, d is an Objective-C object so ARC was able to handle it perfectly well. The issue is that dRef, a Core Foundation object, was created but never cleaned up. The solution is super simple.

Attempt at an ARC solution

 1  {
 2    CFDictionaryRef dRef = CFDictionaryCreate(...);
 3    NSDictionary *d = (id)dRef;
 4    CFRelease(dRef);
 5
 6    ...
 7  }

Now the compiler will insert its calls and the object will be correctly deallocated. This code is semantically correct, but as mentioned above, it doesn't actually compile.

If we look at the code as a whole then we can actually say that lines 3 and 4 represent a transfer of ownership. We had ownership of dRef on line 2 (it had a retain count of +1 with the ownership being this block, lexical scope, or area between the squiggly braces). Then after line 4 we instead have a +1 retain count in d. The ownership of the object has been transferred from CF-land to ARC-land and this pattern is quite common.

Bridging Transfers

ARC introduced a new language feature __bridge_transfer as an ownership qualifier explicitly to be used in casts. Its entire goal was to assist with this pattern. See, I told you it's common enough!

Compiled with ARC (-fobjc-arc)

 1  {
 2    CFDictionaryRef dRef = CFDictionaryCreate(...);
 3    NSDictionary *d = (__bridge_transfer id)dRef;
 4
 5    ...
 6  }

Now we have a correct solution. This works and does not leak. We use the __bridge_transfer qualifier to tell ARC that when storing dRef into d it should also release dRef so that ownership is properly transferred and there isn't a "dangling retain."

The reason that it didn't compile before is that, under ARC, casting a Core Foundation object, a CFTypeRef, to an Objective-C object (casting to id) with a plain cast is not allowed. In reality CFTypeRef is just const void * and that's what matters. If you want to cast a void * to an object you need to tell the compiler how to do it. And if what you want is to change from a CFTypeRef to an id and hand over ownership, then what you want is a bridge-transfer and ARC will supply the release for you.

Foundation even includes a function called CFBridgingRelease to perform the same semantic behavior. This gives us a second possible solution. 

 

Compiled with ARC (-fobjc-arc)

 1  {
 2    CFDictionaryRef dRef = CFDictionaryCreate(...);
 3    NSDictionary *d = CFBridgingRelease(dRef);
 4
 5    ...
 6  }

But don't mistake CFBridgingRelease for anything too cool; it's just the same silly cast under the hood.

Foundation/NSObject.h

 1  NS_INLINE id CFBridgingRelease(CFTypeRef CF_CONSUMED X) {
 2    return  (__bridge_transfer id)X;
 3  }
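Since the dictionary examples above elide their arguments, here is a compilable sketch of both transfer spellings, using a CFString instead. It assumes ARC is enabled; the function names `greetingViaCast` and `greetingViaFunction` are hypothetical.

```objc
// Build with ARC, e.g.: clang -fobjc-arc -framework Foundation
#import <Foundation/Foundation.h>

// Creates a CF string (+1) and hands its ownership to ARC via a cast.
static NSString *greetingViaCast(void) {
  CFStringRef sRef = CFStringCreateWithCString(kCFAllocatorDefault, "hello",
                                               kCFStringEncodingUTF8);
  // After the transfer ARC owns the object, so no CFRelease is written.
  return (__bridge_transfer NSString *)sRef;
}

// The same transfer, spelled with the Foundation helper.
static NSString *greetingViaFunction(void) {
  CFStringRef sRef = CFStringCreateWithCString(kCFAllocatorDefault, "hello",
                                               kCFStringEncodingUTF8);
  return CFBridgingRelease(sRef);
}
```

Which spelling to use is largely a matter of taste; the cast keeps the static type visible, while the function reads more like ordinary CF code.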

Bridging Retain

Let's look at another example of code that we want to convert to ARC. 

Compiled with MRR (-fno-objc-arc)

 1  {
 2    NSDictionary *d = [[NSDictionary alloc] initWithObjectsAndKeys:@"foo", @"bar", nil];
 3    CFDictionaryRef dRef = (CFDictionaryRef)d;
 4
 5    ...
 6
 7    CFRelease(dRef);
 8  }

In MRR this is perfectly valid. After line 2 d has a retain count of +1, then it is cast to dRef and released at the end. But what would happen if we compiled this same code with ARC? Let's look at what the compiler would emit.

Compiler emitted ARC code

 1  {
 2    NSDictionary *d;
 3    $eax = [[NSDictionary alloc] initWithObjectsAndKeys:@"foo", @"bar", nil];
 4    objc_storeStrong(&d, $eax);
 5    objc_release($eax);
 6
 7    CFDictionaryRef dRef = (CFDictionaryRef)d;
 8
 9    ...
10
11    CFRelease(dRef);
12    objc_release(d);
13  }

I hope you'll pardon me here for making up some syntax. I am using the temporary variable $eax to hold the newly created dictionary. I couldn't do this with normal Objective-C because under ARC the = operator already has a meaning (a strong store), and I didn't want to use a weak reference because that would not match the actual behavior.

What this says is that the compiler allocates the dictionary and, after doing the strong assignment, releases the temporary since it is never used again in the scope. The ARC optimizer would notice the silliness of this and clean it up slightly to:

Compiler emitted ARC code

 1  {
 2    NSDictionary *d ≡ [[NSDictionary alloc] initWithObjectsAndKeys:@"foo", @"bar", nil];
 3    CFDictionaryRef dRef = (CFDictionaryRef)d;
 4
 5    ...
 6
 7    CFRelease(dRef);
 8    objc_release(d);
 9  }

where I intend for ≡ to mean "primitive assignment". 

Now it should be obvious that this is an overrelease. After line 2 d has a retain count of +1. After line 3 d and dRef point to the same object, which still has a retain count of +1, and then we release it once through each name: the first release deallocates it and the second hits an object that is already gone. Whoops, crash McGee has arrived!

And you probably guessed that the code we are talking about doesn't compile. If you can't cast from a CFType (void*) to an id then why would you be able to go the other way?

So, what we want to do is to tell ARC to retain d on its way into CF. That way we can be good, law-abiding citizens of CF-land and properly release it when we are done (as we have been taught to do and know is correct). For this we don't want to transfer ownership across the bridge, but to retain the object on its way across. Time for a new qualifier it seems.

Compiled with ARC (-fobjc-arc)

 1  {
 2    NSDictionary *d = [[NSDictionary alloc] initWithObjectsAndKeys:@"foo", @"bar", nil];
 3    CFDictionaryRef dRef = (__bridge_retained CFDictionaryRef)d;
 4
 5    ...
 6
 7    CFRelease(dRef);
 8  }

and just like last time, we are also provided with a semantically equivalent function

Compiled with ARC (-fobjc-arc)

 1  {
 2    NSDictionary *d = [[NSDictionary alloc] initWithObjectsAndKeys:@"foo", @"bar", nil];
 3    CFDictionaryRef dRef = (CFDictionaryRef)CFBridgingRetain(d);
 4
 5    ...
 6
 7    CFRelease(dRef);
 8  }

which is nothing more than an inlined function that performs the cast

Foundation/NSObject.h

 1  // After using a CFBridgingRetain on an NSObject, the caller must
 2  // take responsibility for calling CFRelease at an appropriate time.
 3  NS_INLINE CF_RETURNS_RETAINED CFTypeRef CFBridgingRetain(id X) {
 4    return  (__bridge_retained CFTypeRef)X;
 5  }

and even reminds us to clean up after we are done playing with our toys. Thanks!
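As a compilable counterpart, the sketch below sends an ARC-managed string into CF-land at +1 and balances it with CFRelease. It assumes ARC is enabled; `lengthViaCF` is a hypothetical helper, and the nil guard is a choice of this sketch because the CF calls used here do not accept NULL.

```objc
// Build with ARC, e.g.: clang -fobjc-arc -framework Foundation
#import <Foundation/Foundation.h>

// Hands an ARC-managed string to CF at +1 and returns its length.
static CFIndex lengthViaCF(NSString *text) {
  if (text == nil) {
    return 0;  // CFStringGetLength and CFRelease must not be passed NULL.
  }
  CFStringRef sRef = (__bridge_retained CFStringRef)text;  // +1 for CF-land
  CFIndex length = CFStringGetLength(sRef);
  CFRelease(sRef);  // Balance the retain made by the cast.
  return length;
}
```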

Quick Recap:

  • id → CF (retain on the way in) should use __bridge_retained.
  • CF → id (release/transfer on the way in) should use __bridge_transfer.

Casting Just for a Moment

Consider this example. 

 

Attempt at an ARC solution

 1  {
 2    CFDictionaryRef dRef = CFDictionaryCreate(...);
 3    NSArray *a = @[ @"foo", @"bar", (id)dRef ];
 4
 5    ...
 6
 7    CFRelease(dRef);
 8  }

Surely by now we know that this won't compile! You can't cast a void* to an id! Let's try the one trick we know. 

Attempt at an ARC solution

 1  {
 2    CFDictionaryRef dRef = CFDictionaryCreate(...);
 3    NSArray *a = @[ @"foo", @"bar", (__bridge_transfer id)dRef ];
 4
 5    ...
 6
 7    CFRelease(dRef);
 8  }

Well shoot. That's an overrelease. We don't actually want ARC to release it. The array can retain it and release it when it is done with it. We just want to convert it to an id without any retains or releases. Please please?

Compiled with ARC (-fobjc-arc)

 1  {
 2    CFDictionaryRef dRef = CFDictionaryCreate(...);
 3    NSArray *a = @[ @"foo", @"bar", (__bridge id)dRef ];
 4
 5    ...
 6
 7    CFRelease(dRef);
 8  }

Note that __bridge is bidirectional. And there we have it, the three bridge casts in all their glory:

  • id → CF (retain on the way in) should use __bridge_retained.
  • CF → id (release/transfer on the way in) should use __bridge_transfer.
  • CF ↔︎ id (no change in ownership) should use __bridge.
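Because __bridge works in both directions, a single function can show both. This is a sketch assuming ARC is enabled; `countAfterBridging` is a hypothetical name.

```objc
// Build with ARC, e.g.: clang -fobjc-arc -framework Foundation
#import <Foundation/Foundation.h>

// Stores a CF string in an NSMutableArray, then reads the array back
// through a CF function. Neither cast changes ownership.
static CFIndex countAfterBridging(void) {
  CFStringRef sRef = CFStringCreateWithCString(kCFAllocatorDefault, "foo",
                                               kCFStringEncodingUTF8);
  NSMutableArray *items = [NSMutableArray array];

  // CF to id with no ownership change: the array takes its own retain.
  [items addObject:(__bridge id)sRef];

  // id to CF with no ownership change: ARC still owns the array.
  CFIndex count = CFArrayGetCount((__bridge CFArrayRef)items);

  CFRelease(sRef);  // We still own the +1 from the create call.
  return count;
}
```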

Playing With Pointers

Oh yeah, so Objective-C objects are just pointers, right?  This MRR code

Compiled with MRR (-fno-objc-arc)

 1    NSArray *a = [[NSArray alloc] initWithObjects:@"foo", nil];
 2    void *p = a;
 3    NSLog(@"The address of a is %p.", p);

will clearly work in ARC, right?  No! You can't convert an object to a plain pointer (or vice versa) without crossing the bridge! And here we don't want any change in ownership, so __bridge it up.

Compiled with ARC (-fobjc-arc)

 1    NSArray *a = [[NSArray alloc] initWithObjects:@"foo", nil];
 2    void *p = (__bridge void *)a;
 3    NSLog(@"The address of a is %p.", p);

Put it in the Pointer, Please

We're almost done. There's only one topic left to address: out variables. That means passing around an id* and setting it explicitly. Well you can actually just declare `id __strong *` or `id __weak *` and the compiler will do exactly what you want when you dereference it for assignment. So, what's the issue? I bet you guessed it: void pointers, again!

Imagine the following two silly functions that move src into dest. Have you ever seen anything so contrived? Oh well, it will allow me to illustrate the point. 

Compiled with ARC (-fobjc-arc)

 1  // Store src into dest without retaining it.
 2  void storeArgUnretained(void *dest, id src);
 3
 4  // Store src into dest and retain it.
 5  void storeArgRetained(void *dest, id src);

Let's imagine a simple use of the one that retains the argument during the store. 

Compiled with ARC (-fobjc-arc)

 1  {
 2    id arg = nil;
 3    id src = ...;
 4    storeArgRetained(&arg, src);
 5
 6    ...
 7  }

This works perfectly. The compiler will clean up arg and src at the end of the scope and that will be fine. The contents of arg will have a +1 retain count because storeArgRetained retains across the store, and the release at the end of the scope will balance it. Sweet!  But then what about the other one?

Attempt at an ARC solution

 1  {
 2    id arg = nil;
 3    id src = ...;
 4    storeArgUnretained(&arg, src);
 5
 6    ...
 7  }

Well shucks! This one doesn't work. Let's take a look at some pretend emitted code again to try and see why. 

Compiler emitted ARC code

 1  {
 2    id arg, src;
 3    objc_storeStrong(&arg, nil);
 4    objc_storeStrong(&src, ...);
 5
 6    storeArgUnretained(&arg, src);
 7
 8    ...
 9
10    objc_release(arg);
11    objc_release(src);
12  }

The problem is that the storeStrong for arg is storing nil and then we put an object into the pointer without retaining it. Thus we have a release downstream for an object we never retained. Whoops! That's an overrelease if I ever saw one.

So, we need a way to tell ARC, "don't try to release this object because I am not retaining it!" If we expand this a little more we are actually saying "hey ARC never retain or release this object. I don't want a strong variable. I want an unretained one!"

Oh, weak. That's unretained. Let's use that! Sorry, nope, can't. The problem will actually be the same. The contents of the pointer are assigned without ARC's knowledge. It wouldn't be able to properly bookkeep a weak variable in this case and the self-zeroing wouldn't work. Sadly, if you go behind the compiler's back then your only choice here is to give up a safe, unretained pointer and go with an unsafe, unretained pointer.

Compiled with ARC (-fobjc-arc)

 1  {
 2    __unsafe_unretained id arg = nil;
 3    id src = ...;
 4    storeArgUnretained(&arg, src);
 5
 6    ...
 7  }

Another common place you'll use/see __unsafe_unretained is for storing objects in structs. C structs don't have any real kind of teardown so proper cleanup is impossible (however in Objective-C++ structs are actually classes and have destructors, so storing __strong or __weak objects in them is completely fine).

The unfortunate part with the solution above is that now you are using an unsafe reference. It could dangle at any minute! You have to somehow ensure that wherever its contents came from continues to stay alive for as long as you want to use it.

This is the price you pay for flexibility and for using a void*. NSInvocation's getArgument:atIndex: suffers from this exact problem. It takes a void* because it could stuff anything into it, including a non-object, and so to pay for that flexibility, you have to let things get just a tad unsafe.
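Here is a sketch of the NSInvocation case under ARC. The helper name `readsArgumentBack` and the choice of stringByAppendingString: as the invoked selector are arbitrary picks for illustration; the invocation is never actually invoked.

```objc
// Build with ARC, e.g.: clang -fobjc-arc -framework Foundation
#import <Foundation/Foundation.h>

// Stores an object argument in an NSInvocation and reads it back.
// Returns YES if the pointer read back is the one that was stored.
static BOOL readsArgumentBack(void) {
  NSString *target = @"foo";
  NSString *suffix = [NSString stringWithFormat:@"%d", 42];
  SEL selector = @selector(stringByAppendingString:);

  NSMethodSignature *signature = [target methodSignatureForSelector:selector];
  NSInvocation *invocation =
      [NSInvocation invocationWithMethodSignature:signature];
  invocation.target = target;
  invocation.selector = selector;
  [invocation setArgument:&suffix atIndex:2];  // 0 is self, 1 is _cmd

  // getArgument:atIndex: copies raw pointer bytes through a void *, so the
  // destination must not be a __strong variable that ARC would release later.
  __unsafe_unretained NSString *argument = nil;
  [invocation getArgument:&argument atIndex:2];

  // suffix is still used here, which keeps the object alive long enough
  // for the unsafe reference to be compared.
  return argument == suffix;
}
```

If the value needs to outlive the original owner, assigning `argument` to an ordinary strong variable right after the call is one way to get back to safe territory.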

BUT if you know that you are only ever going to stuff an object in there and that you don't want to retain it, then you could put in an autoreleased object. The following function will simply do the right thing.

Compiled with ARC (-fobjc-arc)

 1  // Store src into dest and retain-autorelease it.
 2  void storeArgRetainAutoreleased(id __autoreleasing *dest, id src);

Don't forget that id is an object pointer. So id* is an object pointer pointer. This is the exact pattern you will see for the out-variable pattern with NSError. The arg will be `NSError *__autoreleasing *`  and the compiler will make all your dreams come true.
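A minimal sketch of that NSError pattern, assuming ARC is enabled, might look like the following. The error domain `ExampleErrorDomain`, the error code 1, and the functions `firstElement` and `errorCodeForEmptyArray` are all hypothetical names made up for this example.

```objc
// Build with ARC, e.g.: clang -fobjc-arc -framework Foundation
#import <Foundation/Foundation.h>

// Hypothetical error domain for this sketch.
static NSString *const ExampleErrorDomain = @"ExampleErrorDomain";

// Returns the first element of array, or nil plus an error if it is empty.
static id firstElement(NSArray *array, NSError *__autoreleasing *outError) {
  if (array.count == 0) {
    if (outError != NULL) {
      // The stored error is retained and autoreleased by the compiler.
      *outError = [NSError errorWithDomain:ExampleErrorDomain
                                      code:1
                                  userInfo:nil];
    }
    return nil;
  }
  return array[0];
}

// The caller passes the address of an ordinary strong local; the compiler
// routes it through an autoreleasing temporary and writes the result back.
static NSInteger errorCodeForEmptyArray(void) {
  NSError *error = nil;
  id value = firstElement(@[], &error);
  return (value == nil && error != nil) ? error.code : 0;
}
```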

Just in case you want to see how and when these functions do and don't crash, here are some simple yet illustrative examples to poke and prod.

Compiled with ARC (-fobjc-arc)

 1  // Store src into dest without retaining it.
 2  void storeArgUnretained(void *dest, id src) {
 3    *((void **)dest) = (__bridge void *)src;
 4  }
 5
 6  // Store src into dest and retain it.
 7  void storeArgRetained(void *dest, id src) {
 8    *((__strong id *)dest) = src;
 9  }
10
11  // Store src into dest and retain-autorelease it.
12  void storeArgRetainAutoreleased(id __autoreleasing *dest, id src) {
13    *dest = src;
14  }

Stop the Snark, Get out of the Dark, and Just Use ARC

At first ARC seems so simple and easy. You don't have to write retain, release, and autorelease; wow. But then you hit the bridging and the address-taking and it gets tricky. Hopefully this helps demystify ARC and provides some depth and intuition about what it's doing and how to use it. With time and practice you'll find that it's really not that bad.

Whenever you find yourself thinking "how can I make the compiler emit a retain here and a release there" realize that you are approaching it incorrectly. At some point after the creation of the first compilers people had to stop thinking "how do I get it to produce such and such assembly" and instead use it as intended and accept it as a decrease in burden on the developer, as long as you operate within it. I don't miss MRR one bit.

 

ARC is awesome and I love it. Apple and the clang committers are working hard to keep improving it and it's only going to keep getting better.