Wednesday, February 9, 2011

C++: Multithreading and refcounted object

I'm currently trying to pass a mono threaded program to multithread. This software do heavy usage of "refCounted" objects, which lead to some issues in multithread. I'm looking for some design pattern or something that might solve my problem.

The main problem is object deletion between thread, normally deletion only decrement the reference counting, and when refcount is equal to zero, then the object is deleted. This work well in monothread program, and allow some great performance improvement with copy of big object.

However, in multithread, two threads might want to delete the same object concurrently, as the object is protected by a mutex, only one thread delete the object and block the other one. But when it releases the mutex, then the other thread continue its execution with invalid (freed object), which lead to memory corruption.

Here is an example with this class RefCountedObject

class RefCountedObject
{
public:
RefCountedObject()
: _refCount( new U32(1) )
{}

RefCountedObject( const RefCountedObject& obj )
: _refCount( obj._refCount )
{
 ACE_Guard< ACE_Mutex > guard( _refCountMutex );
 ++(*_refCount);
}

~RefCountedObject()
{
 Destroy();
}

RefCountedObject& operator=( const RefCountedObject& obj )
{
    if( this != &obj )
    {
        Destroy();
  ACE_Guard< ACE_Mutex > guard( _refCountMutex );
        _refCount = obj._refCount;
        ++(*_refCount);
    }

    return *this;
}

private:
    void Destroy()
 {
  ACE_Guard< ACE_Mutex > guard( _refCountMutex );  // thread2 are waiting here
        --(*_refCount);         // This cause a free memory write by the thread2
        if( 0 == *_refCount )
            delete _refCount;
 }

private:
    mutable U32* _refCount;
    mutable ACE_Mutex _refCountMutex; // BAD: this mutex only protect the refCount pointer, not the refCount itself
};

Suppose that two threads want to delete the same RefCountedObject, both are in ~RefCountedObject and call Destroy(), the first thread has locked the mutex and the other one is waiting. After the deletion of the object by the first thread, the second will continue its execution and cause a free memory write.

Anyone has experience with a similar problem and found a solution ?


Thanks all for your help, I realize my mistake: The mutex is only protecting refCount pointer, not the refCount itself! I've created a RefCount class which is mutex protected. The mutex is now shared between all refCounted object.

Now all works fine.

  • Surely each thread simply needs to manage the reference counts correctly... That is, if ThreadA and ThreadB are both working with Obj1 then BOTH ThreadA and ThreadB should own a reference to the object and BOTH should call release when they're done with the object.

    In a single threaded application it's likely that you have a point where a reference counted object is created, you then do work on the object and eventually call release. In a multi-threaded program you would create the object and then pass it to your threads (however you do that). Before passing the object to the thread you should call AddRef() on your object to give the thread its own reference count. The thread that allocated the object can then call release as it's done with the object. The threads that are working with the object will then call release when they're done and when the last reference is released the object will be cleaned up.

    Note that you dont want the code that's running on the threads themselves to call AddRef() on the object as you then have a race condition between the creating thread calling release on the object before the threads that you've dispatched to get a chance to run and call AddRef().

  • Any object that you are sharing between threads should be protected with a mutex, and the same applies to refcount handles ! That means you will never be deleting the last one handle to an object from two threads. You might be concurrently deleting two distinct handles that happen to point to one object.

    In Windows, you could use InterlockedDecrement. This ensures that precisely one of the two decrements will return 0. Only that thread will delete the refcounted object.

    Any other thread cannot have been copying one of the two handles either. By common MT rules one thread may not delete an object still used by another thread, and this extends to refcount handles too.

    From MSalters
  • One solution is to make the reference counter an atomic value, so that each concurrent call to destroy can safely proceed with only 1 deletion actually occurring, the other merely decrementing the atomic reference count.

    The Intel Thread Building Blocks library (TBB) provides atomic values.

    Also, so does the ACE library in the ACE_Atomic_Op template.

    The Boost library provides a reference counting smart pointer library that already implements this.

    http://www.dre.vanderbilt.edu/Doxygen/Current/html/ace/a00029.html http://www.boost.org/doc/libs/release/libs/smart_ptr/shared_ptr.htm

    From grrussel
  • I believe something along this line would solve your problem:

    private:
        void Destroy()
        {
    ACE_Guard< ACE_Mutex > guard( _refCountMutex ); // thread2 are waiting here if (_refCount != 0) { --(*_refCount); // This cause a free memory write by the thread2 if( 0 == *_refCount ) { delete _refCount; _refcount = 0; } } } private: mutable U32* _refCount; mutable ACE_Mutex _refCountMutex;

    From tloach
  • If the count is part of the object then you have an inherent problem if one thread can be trying to increase the reference count whilst another is trying to remove the last reference. There needs to be an extra value on the ref count for each globally accessible pointer to the object, so you can always safely increase the ref count if you've got a pointer.

    One option would be to use boost::shared_ptr (see the docs). You can use the free functions atomic_load, atomic_store, atomic_exchange and atomic_compare_exchange (which are conspicuously absent from the docs) to ensure suitable protection when accessing global pointers to shared objects. Once your thread has got a shared_ptr referring to a particular object you can use the normal non-atomic functions to access it.

    Another option is to use Joe Seigh's atomic ref-counted pointer from his atomic_ptr_plus project

  • thinking about your issue a little... what you're saying is that you have 1 object (if the refcount is 1) and yet 2 threads both call delete() on it. I think this is where your problem truly lies.

    The other way round this issue, if you want a threaded object you can safely reuse between threads, is to check that the refcount is greater than 1 before freeing internal memory. Currently you free it and then check whether the refcount is 0.

    From gbjbaanb
  • This isn't an answer, but just a bit of advice. In a situation like this, before you start fixing anything, please make sure you can reliably duplicate these problems. Sometimes this is a simple as running your unit tests in a loop for a while. Sometimes putting some clever Sleeps into your program to force race conditions is helpful.

    Ref counting problems tend to linger, so an investment in your test harness will pay off in the long run.

    Ian Hickman : +1 the only real way of solving multi-threaded issues is to make them repeatable.
    From twk

0 comments:

Post a Comment