Automatic Reference Counting

I was bored, so I decided to write a blog post on what “Automatic Reference Counting” (ARC) is and, more importantly, how it can act as a mitigation for Use-After-Free (UAF) vulnerabilities, as well as other heap-based memory management bugs such as memory leaks.

Introduction

Most of you will have probably heard of garbage collection, most likely in the context of Java. Someone might have said to you before, “Java garbage collection is horrible”. Well, they’re right, it is horrible, mostly because it uses so many resources, and on a phone those resources are absolutely critical. However, it does serve a vital and necessary purpose: removing the onus of memory cleanup from the developer.

If you’re like me, you probably like doing memory cleanup. As much of a pain in the ass as it is, more control is better. But more control (and more onus) on the programmer also means vulnerabilities when they get it wrong.

Garbage Collection

Garbage collection, therefore, is the process of reclaiming unused memory automatically at runtime, i.e. the automatic destruction of unused objects. This has some benefits. The biggest one for a developer is that they have less to worry about and therefore less code to write. The biggest benefit for security is that, unless there is a vulnerability in the garbage collector itself, bugs such as Use-After-Free become much, much less prevalent and much harder to exploit.

Languages like C and C++ rely on free() and delete respectively, and calling them correctly is the developer’s burden. But as I mentioned, in Java this is done automatically once an object is no longer referenced. There are a few ways to unreference an object:

  • Nulling the reference.
  • Reassigning the reference to a different object.
  • Using an anonymous object, i.e. calling a method on a newly created object without ever assigning it to a reference.

Okay but what does this have to do with ARC?

Automatic Reference Counting

ARC is a feature of Clang. It is a compile-time memory management technology in which the compiler inserts two kinds of messages into the object code:

  • retain
  • release

These messages increase and decrease an object’s reference count at runtime. When the number of references to an object hits zero, it is deallocated. ARC is somewhat like GC. In essence, all ARC does is track and manage an application’s memory usage. This provides similar behavior to GC in that memory management “just works”: the developer does not bear the burden of memory management.
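To make the retain/release idea concrete, here is a toy model written by hand in Swift. It is only a sketch: the class name CountedBox and its members are made up for illustration, and this is not how Clang or the Swift runtime actually implement ARC. It just spells out the bookkeeping that the compiler-inserted messages conceptually perform.

```swift
// Toy model of reference counting. Hypothetical names, NOT the real ARC runtime.
final class CountedBox {
    private(set) var retainCount = 1       // the creator holds the first reference
    private(set) var isDeallocated = false
    let name: String

    init(name: String) {
        self.name = name
    }

    // Conceptually what a compiler-inserted "retain" does.
    func retain() {
        precondition(!isDeallocated, "retain on a deallocated object")
        retainCount += 1
    }

    // Conceptually what a compiler-inserted "release" does.
    func release() {
        precondition(!isDeallocated, "release on a deallocated object")
        retainCount -= 1
        if retainCount == 0 {
            isDeallocated = true           // a real runtime would free the memory here
        }
    }
}

let box = CountedBox(name: "example")
box.retain()    // a second reference is taken
box.release()   // ...and dropped again
box.release()   // the creator's reference is dropped, so the count hits zero
```

With real ARC you never write these calls yourself; the point is that the compiler works out where they belong so you don’t have to.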

ARC does this in a similar (but not the same) way to GC. Yes, it automatically frees the memory used by class instances when they’re no longer needed, but it does so without needing as many resources as GC does.

Differences Between ARC and GC

I should explain why ARC is only “similar (but not the same)” to GC. A tracing GC typically runs as a background process that deallocates objects asynchronously at runtime. The background process achieves this by periodically traversing the graph of managed objects to find those that are no longer reachable.

ARC, on the other hand, has no background process deallocating asynchronously at runtime, and it does not automatically handle reference cycles like GC does. As long as there are “strong” references to an object, it will not be deallocated, so two objects that strongly reference each other keep each other alive even when nothing else can reach them. This is sometimes known as litter collection.
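The reference cycle problem is easier to see in code. The following is a minimal Swift sketch; Recorder, Node and the two helper functions are names I made up for illustration. The first function builds two objects that strongly reference each other, the second makes the back-reference weak instead.

```swift
// Hypothetical helper that records which objects were deinitialized.
final class Recorder {
    var events: [String] = []
}

final class Node {
    let name: String
    let recorder: Recorder
    var strongPeer: Node?       // strong reference: keeps the peer alive
    weak var weakPeer: Node?    // weak reference: does not keep the peer alive

    init(name: String, recorder: Recorder) {
        self.name = name
        self.recorder = recorder
    }

    deinit {
        recorder.events.append("deinit \(name)")
    }
}

func makeStrongCycle(recorder: Recorder) {
    let a = Node(name: "a", recorder: recorder)
    let b = Node(name: "b", recorder: recorder)
    a.strongPeer = b
    b.strongPeer = a    // a -> b -> a: neither count can ever reach zero
}

func makeWeakBackReference(recorder: Recorder) {
    let c = Node(name: "c", recorder: recorder)
    let d = Node(name: "d", recorder: recorder)
    c.strongPeer = d
    d.weakPeer = c      // back-reference is weak, so no strong cycle forms
}

let recorder = Recorder()
makeStrongCycle(recorder: recorder)        // "a" and "b" are leaked
makeWeakBackReference(recorder: recorder)  // "c" and "d" are deinitialized
```

After both calls the recorder only contains entries for "c" and "d". The first pair is never cleaned up, which is exactly the case a tracing GC would catch and ARC does not.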

Further details on these kinds of differences are outside the scope of this post, but that should explain why they’re not the same yet are similar in what they intend to achieve.

ARC in Depth

Each time you create a new instance of a class, ARC allocates a chunk of memory to store information about that instance. This chunk holds the type of the instance as well as the values of any stored properties associated with it.

As mentioned, when an instance is no longer needed, ARC frees the memory that instance was using so that it can be reused for other purposes. However, if ARC deallocated an instance that was still in use, it would no longer be possible to call the instance’s methods or access its properties. Most of the time this would cause a crash. Essentially, a UAF.

To ensure that instances don’t get deallocated while they’re still needed, ARC tracks how many properties, constants and variables are currently referencing each class instance. ARC will then prevent deallocation as long as at least one active reference still exists. To make this possible, whenever a class instance is assigned to a property, constant or variable, that property, constant or variable makes what is known as a “strong reference” to the instance. The instance cannot be deallocated for as long as that strong reference exists.
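Here is a small Swift sketch of strong references in action. The Session and Recorder classes are hypothetical names of my own, not anything from Apple’s documentation. Three variables hold strong references to the same instance, and the deinitializer only runs once the last of them is cleared.

```swift
// Hypothetical helper that records lifecycle events.
final class Recorder {
    var events: [String] = []
}

final class Session {
    let recorder: Recorder

    init(recorder: Recorder) {
        self.recorder = recorder
        recorder.events.append("init")
    }

    deinit {
        recorder.events.append("deinit")
    }
}

let recorder = Recorder()

// Three strong references to one Session instance.
var first: Session? = Session(recorder: recorder)
var second: Session? = first
var third: Session? = first

first = nil
second = nil
// One strong reference (third) remains, so the instance must still be alive.
let aliveAfterTwo = !recorder.events.contains("deinit")

third = nil   // last strong reference gone: the deinitializer runs now
```

Note that nothing here frees the Session explicitly. Clearing the last variable is what triggers the cleanup.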

Apple has a really good example of this in the Swift documentation, so I won’t go into it further; you can find it here.

How does ARC Prevent Use After Free?

Simply put, ARC maintains a count of how many locations can reference an object. Once that counter hits zero, i.e. your last reference has gone out of scope, and assuming the counting did not go wrong, you have proven by definition that the memory can be freed without any chance of a UAF occurring.
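One way to see this in Swift is with a weak reference, which observes an object without contributing to its count. The sketch below uses a made-up Buffer class. Once the last strong reference is dropped, the weak reference reads as nil rather than pointing at freed memory, so there is nothing stale left to dereference.

```swift
// Hypothetical class used only for illustration.
final class Buffer {
    var bytes: [UInt8]

    init(bytes: [UInt8]) {
        self.bytes = bytes
    }
}

var owner: Buffer? = Buffer(bytes: [1, 2, 3])   // the only strong reference
weak var observer: Buffer? = owner              // weak: does not add to the count

let sizeWhileAlive = observer?.bytes.count      // the object is still alive here

owner = nil                                     // last strong reference dropped

let sizeAfterRelease = observer?.bytes.count    // the weak reference is now nil
```

In C, the equivalent of observer would be a dangling pointer after free(). Here the access just yields nil.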

Further Reading

There is, of course, a lot more detail you could add to all of this. In fact, there are a few good papers which go into implementing the principle of ARC as a GCC compile-time optimization, so as to allow developers to write C/C++ without the need for manual memory management. Two such papers I would recommend are:

Perhaps I will write a summary post for each of these papers in the future. They are really interesting, so I would highly recommend reading them.

There’s always more to learn about a given technology. Expect future posts on this subject. I am quite interested in compile-time exploit mitigations so this is definitely something I’d like to talk more about in the future.

References