Skip navigation

Fast Thread-Safe Map Cache

Recently I came across an interesting programming problem I want to share with you. In short this post discusses thread-safe and fast java implementation of a cache for some special case.

To give you some background, this problem arose in scope of my pet project PUNKSearch (local network indexer and searcher based on Lucene). PUNKSearch shows online/offline status of the hosts displayed in search results. These statuses are cached for several minutes to avoid “dogpile effect” on remote hosts and increase overall responsiveness of the application. Cache is organized as HashMap for which keys are IPs and values are pairs of Date and Boolean wrapped in some extra object named HostStatus. The Date is used to determine if we need to refresh the status for the host and Boolean defines whatever host is online. Obtaining status is a Slow Process (timeout usually is 5 seconds).

Since PUNKSearch is a web application this cache must be thread-safe. Another property we want it to have is speed. Locking the whole cache for the period some thread tries to determine if a host is online is no way.

Lets try to construct such a cache.
Try 1: The simple not thread-safe implementation (cache renewal is dropped for the sake of clearness)


class MapCacheSimpleFastBroken {

	private Map<String, Object> cache       = new HashMap<String, Object>();
	private SlowProcess         slowProcess = new SlowProcess();

	public Object get(String key) {
		Object value = cache.get(key);
		if (value == null) {
			value = slowProcess.get(key);
			cache.put(key, value);
		}
		return value;
	}
}

This class obviously not thread-safe. What can we do to make it thread-safe? Of course, make the method “synchronized”!

Try 2: The simple thread-safe implementation (cache renewal is dropped for the sake of clearness)


class MapCacheSimpleSlowSafe {

	private Map<String, Object> cache       = new HashMap<String, Object>();
	private SlowProcess         slowProcess = new SlowProcess();

	public synchronized Object get(String key) {
		Object value = cache.get(key);
		if (value == null) {
			value = slowProcess.get(key);
			cache.put(key, value);
		}
		return value;
	}
}

This implementation is thread-safe, good. But it is extremely slow! How can we avoid locking the whole cache object? What if we lock only required portion of the map? The map key in particular. Lets try.

Try 3: The fine-grained want-to-be thread-safe implementation (cache renewal is dropped for the sake of clearness)


class MapCacheTrickyFastBroken {

	private Map<String, Object> cache       = new HashMap<String, Object>();
	private SlowProcess         slowProcess = new SlowProcess();
	private static final Object STUB        = new Object();

	public Object get(String key) throws InterruptedException {
		String lock = "";
		// sync on the whole cache to: 1) get cached data, 2) insert stub if necessary
		synchronized (cache) {
			Object value = cache.get(key);
			if (value != null && value != STUB) { // "null" only if this is the first time we see this key
				return value;
			} else if (value == null) { // we see the key for the first time, add the stub to create a key in map keySet
				cache.put(key, STUB);
			}
			// init lock with the key for the required object
			for (String cacheKey : cache.keySet()) {
				if (cacheKey.equals(key)) {
					lock = cacheKey;
					break;
				}
			}
		}
		// grab the lock only on the part of the cache map, so online checks for different hosts can go simultaneously
		// can't use "key" argument as lock here, since we want to sync on cache's key object
		synchronized (lock) {
			Object value = cache.get(key);
			// maybe object was already updated by other thread while we were waiting for lock?
			// call the slow process only if that was not happened
			if (value == STUB) {
				value = slowProcess.get(key);
				cache.put(key, value);
			}
			return value;
		}
	}
}

This implementation seems to be ok. We lock the whole map for a short period of time to verify if we have required object in the cache and possibly modify the map. If the object was not found we put a stub into the map to sync against its key later.
Bad news. This implementation has a serious flaw. Can you find it? Look under the cut for the explanation and the correct implementation.

Indeed, the Try 3 is broken. Assume two threads want to obtain an object for the same key “foo”.
Now imagine first thread has left the 33 line and the second thread just fall into “else” branch at the line 14 (i.e. it is at the line 15 now). Obviously the second thread drops all that hard work done by the first thread and have to call the slow process again. Not good. Additionally notice the crap we have at lines 18-23 to obtain the key object. What we can do to avoid these 2 issues? Lets try.

Last Try: The fine-grained thread-safe implementation (cache renewal is dropped for the sake of clearness)


class MapCacheTrickyFastSafe {

	private ConcurrentHashMap<String, Object> cache       = new ConcurrentHashMap<String, Object>();
	private Map<String, String>               keys        = new HashMap<String, String>();
	private SlowProcess                       slowProcess = new SlowProcess();
	private static final Object               STUB        = new Object();

	public Object get(String key) throws InterruptedException {
		String lock = "";
		// sync on the whole cache to: 1) get cached data, 2) insert stub if necessary
		synchronized (cache) {
			Object value = cache.putIfAbsent(key, STUB);
			if (value != null && value != STUB) { // "null" only if this is the first time we see this key
				return value;
			} else if (value == null) { // we see the key for the first time, store it
				keys.put(key, key);
			}
			// init lock with the stored reference to the key for the required object
			lock = keys.get(key);
		}
		// grab the lock only on the part of the cache map, so online checks for different hosts can go simultaneously
		// can't use "key" argument as lock here, since we want to sync on cache's key object
		synchronized (lock) {
			Object value = cache.get(key);
			// maybe object was already updated by other thread while we were waiting for lock?
			// call the slow process only if that was not happened
			if (value == STUB) {
				value = slowProcess.get(key);
				cache.put(key, value);
			}
			return value;
		}
	}
}

Obviously, we added the “keys” map to search for required lock object quickly. This solves that crap 18-23 lines of Try 3.

What about thread safety? It is putIfAbsent() method of ConcurrentHashMap which helps us a lot.
The method is equal to the following code snippet (except that the action is performed atomically):


if (!map.containsKey(key))
	return map.put(key, value);
else
	return map.get(key);

Finally, we have fast and thread-safe cache implementation!
Please note, however, all implementations lack renewal of cached objects and this was done by intention for this post. Moreover, implementations suffer from memory leak. Can you see it? That simple — all distinct keys create respective objects in cache and that objects never dropped from the cache (may be replaced, yes). So the map is growing constantly. For the production implementation one may want to implement periodical cleanup of the map from very old objects using TimerTask or as side effect of the get() method call.

I hope this post was not extra boring and more than that I hope the last try is indeed thread-safe :) Have I missed something? Let me know ;)

8 Comments

  1. Posted August 19, 2008 at 4:42 am | Permalink

    Yury you made me thinking more :) We were having conversation about that problem over IM yesterday. I still don’t have good solution for all the problems, but to me it’s more about trade-offs. I think it’s Ok (for your application) to have a few threads trying to obtain connection to the same host. But to avoid race-condition I was talking about there is a solution

    
    class Cache {
        private ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<String, Object>();
        private SlowProcess slowProcess = new SlowProcess();
        public Object get(String key) {
            Object value = cache.get(key);
                if (value == null) {
                    Object newValue = slowProcess.get(key);
                    //returns the existing value or null
    	            value = cache.putIfAbsent(key, newValue);
    	        }
                if (value == null){
                    value = newValue;
                }
            return value;
    }
    

    Spied here.

  2. Posted August 19, 2008 at 1:34 pm | Permalink

    Yeah, thanks for proposal. This may work sometimes. The problem is if you have > 10 users searching for something and cache becomes invalidated by timeout then you in trouble.

    The thing is in networking hardware. Switches/hubs (at least in our network) can’t process more than 10 simultaneous requests, timeout occur or smth like that. This is why (for example) I forced to use max 10 threads while indexing local network with PUNKSearch.

    And thanks for the link! A lot of interesting info there (especially in comments with further links)!

  3. Zsolt
    Posted August 21, 2008 at 11:11 am | Permalink

    Check Brian Goetz’s sample code for his Java Concurrency in Practice book (highly recommended)

    http://www.javaconcurrencyinpractice.com/listings/Memoizer.java

  4. Bob Hansen
    Posted August 21, 2008 at 2:57 pm | Permalink

    ZSolt: Heh.

    Miller’s Hypothesis: “Any interesting Java question has already been thoroughly discussed and decided in either Java Concurrency in Practice or Effective Java.”

  5. Stephen Colebourne
    Posted August 21, 2008 at 5:50 pm | Permalink

    The last code is not thread-safe. You use a String as a lock which you should never do – in particular its an interned String (as it is a constant). Any other class can synchronize on a “” and that will lock the same object as your map. A good lock object is new Object(). Or use a Java 5 Lock.

  6. Posted August 21, 2008 at 9:55 pm | Permalink

    @Zsolt: Thanks a lot!
    I was suspecting “java.util.concurrent” has something to solve the problem. Thanks to pointing at the exact solution and Brian Goetz’s book.
    For all who wants a description why this Memoizer class is the solution, see here:
    http://codeidol.com/java/java-concurrency/Building-Blocks/Building-an-Efficient,-Scalable-Result-Cache/

    @Stephen Colebourne: I guess you missed line 19 of the last try. Lock is assigned to the “key” object (yes, that is String also, but not “”, smth like “192.168.1.2″).
    Line 09 may look like “String lock;” and code compiles and works after what. I was trying to cleanup the code from all unnecessary statements, but missed this one.

  7. Posted November 15, 2008 at 5:14 pm | Permalink

    Yurik, if you look into implementation of data structures in concurrent.* you’ll see that often there are magic numbers like 32 segments for concurrent access. Just tune that parameter for your needs ;-)


Post a Comment

Your email is never published nor shared. Required fields are marked *
*
*