M-A's

technology blog

Wednesday, 13 February 2013

Exception handling and non-critical components


Most software developers consider exception handling is different than error handling but should these be handled differently? Wikipedia has a full page on exception handling.

Exception handling in theory

For example let's take memcached clients across various languages. Let's speculate that an error could be the fact that a key is not in memcache and an exception could be that the client failed to connect to the server. Note that it's the library that decides what is an exception or an error and different libraries disagree with each other. It becomes even more confusing in languages like C++ where there isn't a common idiom on where the separation line should lie.

Exception handling in practice

Let's use the 4 partial snippets below to describe the difference in practice. While I'm referring to the AppEngine documentation, it's by pure laziness and this post has nothing to do specifically with AppEngine and has only little to do with memcache client libraries themselves.

It's all about exception and error handling idioms in each language:

C (ref)

memcached_st* memc = memcached(...);
size_t length = 0;
uint32_t flags = 0;
char* value = memcached_get(memc, "item", 4, &length, &flags, &error);
if (value == NULL) {
  // Regen the value.
}

Java (ref)

import com.google.appengine.api.memcache
MemcacheService syncCache = MemcacheServiceFactory.getMemcacheService();
byte[] value = syncCache.get("item");
if (value == null) {
  // Regen the value.
}

python (ref)

from google.appengine.api import memcache
value = memcache.get("item")
if not value:
  # Regen the value.

Go (ref)

import "appengine/memcache"
if item, err := memcache.Get(c, "lyric"); err != nil {
  // Regen the value.
}

Exception handling and components

Can you spot the errors above? To help you, read this post about the chaos monkey
http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
and figure out what would happen with each of the 4 clients above if a chaos monkey decided to take a memcached server down. In particular, spot the client library design error versus the necessary components to reply to an HTTP request.

It's simple, memcache is not a necessary component in a http request success. In practice, any request to the web server must always succeed in either of the languages above even if memcached is not available. But it won't in 2 out of 4 because they handle exceptions differently from errors. In particular, for an optional module, this turns an innocuous error into an HTTP 500! The fix is to rewrite these as the following:

Java

import com.google.appengine.api.memcache
byte[] value = null;
try {
  MemcacheService syncCache = MemcacheServiceFactory.getMemcacheService();
  value = syncCache.get("item");
} catch ( <Figure out which exception to catch> ) {
  // Nothing to do.
}
if (value == null) {
  // Regen the value.
}

python

from google.appengine.api import memcache
value = None
try:
  value = memcache.get("item")
except <Figure out which exception to catch>:
  # Nothing to do.
  pass
if not value:
  # Regen the value.
At first look, it turns the shortest snippets into horribly long sequences of code for non-important code paths. But there's more issues in there.

Handle a runtime exception, or not

Let's focus on deciding which exception to catch. In particular, this has to be decided at each call site and it's very easy to catch different exceptions at different call site if one is not careful. To make things worse, the exhaustive list of runtime exceptions that can be raised is not documented! In practice, one has to see the exception happening to be able to figure out which exception to handle. But is not always easy in practice to "kill the memcached server and see what happens".

Preemptive handling

A junior developer would be quickly tempted to put a catch all handler like "} catch (Exception e) {" or "except Exception:". It is worse because then you'd catch exceptions like DeadlineExceededException or DeadlineExceededError, which could make an HTTP that could have possibly returned into one that has no chance of completing. https://developers.google.com/appengine/articles/deadlineexceedederrors

Exception documentation

Java tries to solve this problem with checked exception signature but the core problem with exception signature in Java is that it forces the developer to handle the exception he probably doesn't care about, the one explicitly listed, and simultaneously fails to document or help the developer handle the runtime exceptions, which is the one he cares about! It's even worse in Oracle's Unchecked Exceptions — The Controversy, the author uses the following falacy:
Runtime exceptions represent problems that are the result of a programming problem, and as such, the API client code cannot reasonably be expected to recover from them or to handle them in any way.
While inside the memcache client library, a component, the fact that a network connection fails may be considered an exception. But this thinking assumes all the components are necessary to achieve work item completion which, as demonstrated above, is not true, especially in distributed systems. Another example is dynamic content with static content fallback, which netflix also implements. So the hypothesis of linear control flow in the unchecked exception doesn't hold; in modern system, the control flow can adapt to "exceptional" runtime situations and more often than not, doesn't care if a value wasn't in the key store or if the whole db is temporarily unavailable.

Throw or return null?

So next time someone asks to be able to use exceptions in C++, just say no. If you are stuck with Java or python for web development, assume unplanned downtime. If you are starting a new project, strongly consider a language not using exceptions like Golang or C or in the case of C++, with libraries not using exceptions. Personally, I'm totally sold to Golang because exception safe code is shorter.