Logging

A personal bugbear of mine is developers not being able to write clear, effective logging. This seemingly trivial task appears to cause a great number of developers no end of problems, and yet it shouldn’t necessarily be that hard. Why is it then, that when I go to interrogate a log file I have to trawl though kilobytes (or worse) of meaningless rubbish to determine that:

24/04/2013 09:26:19 [ERROR] [Batch941-Thread5]: Unhandled exception has occurred null

That’s a real error message from a system I work on, and that’s all it had to say about the matter. I despair.

There’s a few basic things you can consider that will make you logging a lot more effective.

Use the appropriate log level

Fatal should be reserved for when an application, or a thread, is about to be terminated in a fashion that really isn’t expected. Errors should indicate something recoverable [at an application level] that’s gone wrong that wasn’t expected. The vast majority of fatal and error log messages are really warnings, that is messages indicating that an error has occurred but we’ve been able to carry on. Any occurrences of a fatal or error level message in your logs should have attendant bug reports with either code or configuration fixes to remove those errors. Informational messages should relate to things that people will care about day-to-day, or as additional log output for an initial higher level log output. Everything else is a debug message and will generally be turned off in production systems.

Provide the right level of information

Logs are text, are often large (I’m looking at 2 production log files that are in excess of 12Mb) and are often going to be parsed with simple tools. If I use grep ERROR *.log I should be able to see each error, and enough information about that error to give me a high level overview of what is happening. More diagnostic information can follow the initial error at lower logging levels. There should be enough information following the error that someone reading the log file with default production settings can diagnose the issue, but not so much that they’re drowning in log lines.

Format the messages correctly

Be mindful that when you concatenate strings you may need spaces and other delimiters between output. When you split your output over multiple lines those lines may not be seen on an initial parse of the file. Also, be mindful of how values are going to be displayed. With Java the default toString method on an object isn’t the most useful of outputs in a log file. In contrast, some objects are verbose in the extreme and may break the formatting of your error message by spewing a multiline string onto your single line error message.

Some real world examples

I regularly check our production log files for a number of reasons and find myself facing such doozies as:

30/04/2013 08:45:41 [ERROR] [[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)']: Error

The kicker here is that this could be one of a number of errors. If I see this twice in a log file I have no way of knowing if it’s the same error twice, or two different errors. The error message is badly formatted with the information [hundreds of lines of it] on the next line. Sadly I see this more than a couple of times a day and, as it’s a third party bit of code that’s responsible, theres’ not much I can do about it.

30/04/2013 15:29:28 [INFO ] [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)']: Email transport connection established
30/04/2013 15:29:29 [INFO ] [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)']: Email transport connection closed

126 occurrences in todays log files. This needs to be a debug message and an error output if it fails to establish or close the email transport connection. Ironically enough, as I was digging into the code to fix this I discovered that when it does go wrong it reports the error 3 times in 3 different places, resulting in 3 error lines in the logs. Worse still 2 of these lines only state “Error sending email” with no other information other than a stack trace from the exception. That’s three slightly different stack traces, two useless error lines and one useful error line for 1 error which could easily add 15 minutes to the diagnosis time while the developer tries to work out what’s going on.

01:17:45 [ERROR] [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)']: Failed to create URI Object

Well over 1000 occurrences today alone! My grepfu is reasonable so it I altered it to show the line after that. Turns that someone is trying something nasty with some of our search URLs, but that wasn’t immediately obvious from the error, instead I was resorting to the stack trace. Not only could the error be improved, but we can also downgrade this to a warning. Looking at the code this is only going to happen in two cases: either someone has put a bug in the code, in which case we’ll see thousands of identical warnings and do something about it; someone is trying something nasty when we’d see hundreds of variations on the warning which we can investigate and then improve or ignore depending on the outcome of the investigation. Bug raised to fix both the handling of dodgy URLs and the logging level.

I’m sure I could dig out loads more and thats just with a cursory glance of the files. Logging is an important diagnostic tool, but it’s only as good as you make it.

One thought on “Logging

  1. gibbja

    There should be some form of punishment for developers who can’t provide proper logging. I also would go so far as to say, when coding it should be possible to record at least the major error types, perhaps number them, and then have that as information made available as part of help files. Or to the support organisation.

    I also particularly like where the choice of logging level is either everything, thereby minimising the chance of finding anything useful, or nothing, genuinely nothing even when a crash occurs. I’ve seen this recently on a commercial product. It made me unhappy.

Comments are closed.