Category Archives: Thoughts & Scribbles

Chasing 100% Coverage

Unit tests, as we all know, are A Good Thing™. It stands to reason, then, that 100% unit test coverage must also be A Good Thing™, and therefore something that should be striven for at all costs. I’d argue this is wrong though.

100% unit¹ test coverage can be exceptionally difficult to achieve in real-world applications without jumping through a whole load of hoops that end up being counterproductive. The edges of your system often talk to other systems which can be difficult to isolate or mock out. You can be left chasing those last few percent and making counterintuitive changes to win them. At this point I’d say it’s better to leave the clean, readable, untested code in and accept that 100% coverage isn’t always possible.

This leads to another problem though. Once you’re not hitting 100% coverage you need to be sure that the code that isn’t covered really is code you can’t cover, and as your code base gets bigger, the effect a single missed line of code has on your coverage percentage gets smaller.

PHPUnit takes a pragmatic approach to this issue: it allows you to mark blocks of code as untestable. The net result is that it simply ignores these blocks in its coverage calculations, allowing you to get 100% coverage of testable code.

Quite a few people I’ve told about this have declared it ‘cheating’; however, let’s look at a very real issue I have in one of my bits of Java code. I have code that uses the list of locales that can be retrieved by Java. It uses the US locale as a default, since it’s reasonable to assume the US locale will be set up on a system. While highly improbable, it’s not impossible for a system to lack the US locale, and the code handles this gracefully. Unit testing this code path is impossible as it involves changes to the environment. I could conceivably handle it in functional tests, but it’s not easy. I could remove the check and just let the code fall over in a crumpled heap, but then if it ever does happen someone is going to have a nasty stack trace to deal with rather than a clear and concise error message.
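
For flavour, here’s a minimal sketch of the kind of check involved (the class and method names are illustrative, not the actual code):

import java.util.Arrays;
import java.util.Locale;

public final class DefaultLocaleProvider {

    /**
     * Returns the default locale for the application, having confirmed
     * it is actually installed on this system.
     *
     * @return the US locale
     *
     * @throws IllegalStateException if the US locale is missing; this is
     * the graceful-but-untestable branch discussed above
     */
    public static Locale defaultLocale() {
        if (!Arrays.asList(Locale.getAvailableLocales()).contains(Locale.US)) {
            // Unit testing this branch would mean changing the
            // environment the JVM is running in.
            throw new IllegalStateException(
                    "US locale is not installed; cannot select a default locale");
        }
        return Locale.US;
    }
}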

If I could mark that single block of code as untestable I would then be able to see at a glance whether the rest of my code was 100% covered. As it is I’ve got 99-point-something coverage and I need to drill into the coverage results to confirm the missing coverage comes from that one class. Royal pain in the behind.

I am willing to concede that the ability to mark code as untestable can be abused, but then that’s what code reviews are for. If someone knows how to unit test a block of code that’s marked as untestable they can fail the review and give advice to the developer who submitted the code.


¹ People seem to get unit and functional tests muddled up. A unit test should be able to run in isolation, with no dependency on the environment or other systems. Functional tests, on the other hand, can make assumptions about the environment and can require that other systems are running and correctly configured. The classic example is a database: unit tests should mock out the database connection and use canned data, while functional tests can connect to a real database pre-populated with test data. Typically functional tests also exercise much larger blocks of code than individual unit tests do.
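
To make that distinction concrete, here’s a minimal sketch (every name in it is hypothetical): the unit test feeds canned data through a stub that stands in for the database, where the functional equivalent would talk to a real, pre-populated database.

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class UserServiceTest {

    /** Hypothetical interface the service uses to reach the database. */
    interface UserDao {
        String findNameById(int id);
    }

    /** The class under test; it delegates lookups to the DAO. */
    static class UserService {
        private final UserDao dao;
        UserService(UserDao dao) { this.dao = dao; }
        String greetingFor(int id) { return "Hello, " + dao.findNameById(id); }
    }

    @Test
    public void greetsUserByName() {
        // The lambda is the canned data; no environment, no database,
        // the test runs in complete isolation.
        UserDao cannedDao = id -> "Alice";
        assertEquals("Hello, Alice", new UserService(cannedDao).greetingFor(42));
    }
}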

Compromised, or How Not To Do Customer Services

Today I went from being a huge PlayStation fan to being ardently anti-Sony. The reason? Their customer services. It all started with an email saying I’d bought an add-on for Call Of Duty today at 12:45. The problem with this is that I don’t own Call Of Duty, and haven’t had the PS3 on for a couple of weeks. Obviously my account has been compromised.

First things first: I changed the password, then confirmed the purchase had actually been made and the email wasn’t a phishing attempt, then I called Sony. This call did not go well.

Initially I was told that “you must have turned the console on today to make the purchase”. Really. So you’re calling me a liar, or an idiot. Then, after some investigation, it’s revealed that another PS3 has been associated with my account, and it was this PS3 that was used to make the purchase. When I stated that I had never shared my account I was told that the second console had logged in first time, therefore they had to have had my details and, while it was never explicitly stated, I was practically accused of handing over my account details to someone else.

To add insult to injury, the money stolen from my wallet couldn’t be refunded. In fact, there was nothing Sony could, or would, do. They wouldn’t believe that I hadn’t shared my details, couldn’t refund the stolen money and wouldn’t cancel my account over the phone. I am livid. OK, the money stolen wasn’t a huge amount, but that’s not the point. Sony are hiding behind their terms of service and don’t give a crap that I’ve been hacked and robbed. I don’t want to deal with a company like that. I’ve requested my account be cancelled and will relegate the PS3 to a glorified DVD and Blu-ray player.

I work in IT. I know that systems get compromised and people’s details get leaked; all I was looking for was some positive action to avoid any further damage to my account. Instead I got disbelief and no help. Suffice it to say I won’t be getting a PS4 now.

Why you should be documenting your code

It seems there are two schools of thought when it comes to documenting code. In one corner you’ve got those who feel that the “code should be expressive”; those who feel that the “unit tests should document the code”; and those who argue that developers are lazy and the documentation gets out of date. These people will argue that there is no point documenting code. And in the other corner there’s me… well, me and the rest of my team, who have all seen sense after I’ve pointed out the error of their ways. 🙂

The rule for Javadoc in our team is “do it properly, or not at all”. I don’t mandate it, because if you do you will just end up with rubbish. I do strongly encourage it though. The key word, however, is properly. I’m not just talking about ending the first sentence with a full stop and making sure you have all the correct @param, @return and @throws tags; I mean well written, well thought out documentation.

I’ll handle the lazy developers argument first. The same argument can be made for not unit testing, and yet I doubt there are many people who would take the stance that not unit testing is OK. It’s part of development. Writing good Javadoc should also be a part of development.

“But it’ll get out of date!”, I hear you cry. So not only has a developer done a half-arsed job by not updating the documentation when they updated the code, everyone who peer reviewed the change missed that too. Please, feel free to hide behind that excuse. You may also want to stop doing reviews, because you’re not doing them correctly. And what else are you missing?

“The unit tests should document the code”. Let’s consider a hypothetical situation. I’ve got your object and I’m calling a getter. I want to know if you’ll return null or not, because I may have some extra processing to do if I get a null value. You expect me to stop, mid-flow, go find the unit tests, locate the relevant tests and work out if null is something that might come back? Really? If there are no tests dealing with null return values can I safely assume that null is never returned, or did you just forget to include that test? I’ve never bought that argument.

As to the code being expressive: well yes, it should be. But it’s not always evident what’s going on. I can leap into our hypothetical get method and have a poke about to see what’s happening, but what if we’re getting the value from a delegate? What if that talks to a cache and a DAO? How far do I need to delve into your code to work out what’s going on?

And then we come to the best argument of all: “You only need to document APIs”. It’s all API.

But what is good documentation? To help with that we’ll expand our scenario. Our getter has the following interface:

public String getName(int id);

What would most people put?

/**
 * Gets the name for the given id.
 * 
 * @param id the id
 *
 * @return the name
 */
public String getName(int id);

It’s valid Javadoc, I’ll give you that, but it’s this kind of rubbish that has given the anti-documentation lot such great ammo. It’s told me nothing that the code hasn’t already told me. Worse still, it’s told me nothing I couldn’t have worked out from the autocomplete. I’d agree with you that having no Javadoc is better than that. This is why I say do it properly, or not at all.

The documentation needs to tell me things I can’t work out from the interface. The method takes an ID; you may want to be more explicit about what that ID is. Are there lower or upper bounds on it? You return a String; are there any assumptions I can make about that String? Could the method return null?

/**
 * Returns the name of the user identified by the provided
 * user ID, or <code>null</code> if no user with the
 * specified ID exists. The user ID must be valid, that is,
 * greater than zero.
 * 
 * @param id a valid user ID
 *
 * @return the name associated with the given ID, or
 * <code>null</code> if no user exists
 *
 * @throws IllegalArgumentException if the user ID is less
 * than 1
 */
public String getName(int id);

This Javadoc does three things.

  1. It means I can easily get all the information I may need about the method from within the autocomplete dialog in my IDE, so I don’t need to leave my code.
  2. It defines a contract for the method so anyone implementing the code (or overriding it in the case of already implemented code) knows the basic assumptions that existing users already have.
  3. It provides me with a number of unit tests (or BDD scenarios) that I already know I have to write. While I may not be able to enforce the contract within the framework of the code (especially with interfaces), I can ensure the basic contract is followed by providing tests that check an ID below the lower bound is rejected and that a non-existent ID returns null, as sketched below.
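
As a sketch of those contract tests, assuming JUnit 4 and some implementation of the documented method to run them against (the implementation name here is a hypothetical stand-in):

import static org.junit.Assert.assertNull;

import org.junit.Test;

public class GetNameContractTest {

    // Any implementation of the documented interface will do; this
    // name is a stand-in, not a real class.
    private final InMemoryUserDirectory directory = new InMemoryUserDirectory();

    @Test(expected = IllegalArgumentException.class)
    public void rejectsUserIdsBelowOne() {
        directory.getName(0); // contract: IDs less than 1 are invalid
    }

    @Test
    public void returnsNullWhenNoUserHasTheGivenId() {
        // contract: an unknown (but valid) ID yields null, not an exception
        assertNull(directory.getName(999_999));
    }
}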

Writing good documentation forces you to think about your code. If you can’t describe what a method is doing succinctly then that in itself is a code smell. It forces you to consider the edge cases and to think about how people are going to use your code. When updating a method it forces you to consider your actions. If you’re changing our get method so that it always returns a default value rather than null, will that affect other people’s logic and flow? What about the other way round, when you’ve got a method that states it doesn’t return null? Change that and you’re likely to introduce NPEs in other code. Yes, the unit tests should pick that up, but let’s work it out up front, before you’ve gone hacking about with the code.

Those of you familiar with BDD may spot that what you’re doing here is performing deliberate discovery with yourself. The conversation is between you, and you as a potential future developer. If, during that conversation, you can’t find anything to say that the interface doesn’t already tell you, then by all means leave out the Javadoc. To my mind that’s pretty much just the getters and setters on JavaBeans, where you know they’ll take and return anything. Everything else can and should benefit from good documentation.

Programmed to fail

One aspect of programming that fascinates me is the psychology of software development. To study Agile is to study people and their interactions, and there are some interactions that seem incredibly hard to break.

Given how often software projects overrun, it astounds me that the same patterns occur again and again. Developers seem very reluctant to admit they’re late and tend not to question deadlines until it’s clear the deadline is now ridiculous. There is a hope that, somehow, everything will fall into place and work come delivery time. And yet experience tells us that things invariably go wrong – the hope is irrational and yet near universal.

Those managing the team can easily spot the signs that a project is drifting into trouble. Confident answers about progress become couched in qualifiers. Progress reports become terse, or filled with excuses. They may also start sounding very similar week on week. At this point the project sponsors should be alerted to the issue, but again people cling to the hope that everything will be OK. Reporting up is rarely done early enough or emphatically enough.

Reports of delays need to be emphatic and early because, the closer to the release date you get, the less willing sponsors seem to be to accept delays. This may be due to hard and fast deadlines (e.g. shipping physical boxes of software), but I’ve seen it in teams where the deadline is nothing more than a notional line in the sand. Yes, moving that line may be problematic, but not nearly as problematic as putting poorly functioning software live, or missing the deadline with no contingency at all. Simply defining delays in the project as “unacceptable” doesn’t make them go away. Software development is an art, not a science, and delivery dates should be treated as malleable until the software is actually delivered.

With all three levels refusing to face facts, it’s little wonder that we have the issues in software development that we do. Agile helps by giving us tools to counter these issues, but until people can get out of the Big Project Mentality, the psychology of large deadlines in the distant future becoming looming deadlines in the very near future will prevail.

Side Projects

A number of years ago I was interviewing someone for a position as a senior software developer at a market-leading financial institution. This was a job with a large pay packet, an excellent benefits package and the potential for reasonable bonuses, so we were being understandably picky. One of my [fairly standard, even now] questions was to ask what development activities he did outside of work. His response was that he spent all day working with computers and that he didn’t want anything to do with them during his free time. The interview was effectively over from that point on.

Software development, like most other technology disciplines, moves at a rapid pace. If you’re standing still you’re moving backwards, and it’s frightening how fast you can become obsolete. While you can learn on the job, it does require the job to contain useful elements you can learn from, and the chances are that what you work on day to day isn’t going to be bleeding edge.

The easiest way to reconcile this is with side projects: something worked on during your spare time that allows you to investigate new technology, new languages and new ways of doing things. Interestingly though, I generally find that the developers who have side projects are not doing them out of a professional need to keep up, but from a personal need to learn. They’d be playing with this stuff even if they didn’t have to.

For me it’s this hunger to learn that makes a good developer. Experience helps, but you can gain experience; I’m not sure you can learn the hunger. It’s not just me who thinks this either: an increasing number of top-name technology firms allow their developers to spend a certain portion of their work time on side projects. Even if the project itself ends up providing no benefit to the company, the experience, lessons learned and morale boost this practice provides can only help the developer become better and more productive.

There is another side to this though. Developers can be fickle creatures and they can get bored easily. When this happens productivity crashes, and more than likely they’ll start looking for a new job. The day job is never going to be as interesting as the side projects by its very nature – the product has to make money, and has to satisfy more criteria than simply “be fun”. By accepting that your top guys are going to have side projects, and by allowing them to find what interests them and spend a little bit of work time on it, you’ll find they keep themselves interested.

The question shouldn’t be “why is this developer spending a long lunch working on something not work related?“, the question should be “why aren’t all our developers doing this?“.

Logging

A personal bugbear of mine is developers not being able to write clear, effective logging. This seemingly trivial task appears to cause a great number of developers no end of problems, and yet it shouldn’t be that hard. Why is it, then, that when I go to interrogate a log file I have to trawl through kilobytes (or worse) of meaningless rubbish, only to be faced with entries like this:

24/04/2013 09:26:19 [ERROR] [Batch941-Thread5]: Unhandled exception has occurred null

That’s a real error message from a system I work on, and that’s all it had to say about the matter. I despair.

There are a few basic things you can consider that will make your logging a lot more effective.

Use the appropriate log level

Fatal should be reserved for when an application, or a thread, is about to be terminated in a fashion that really isn’t expected. Errors should indicate that something unexpected, but recoverable [at an application level], has gone wrong. The vast majority of fatal and error log messages are really warnings, that is, messages indicating that a problem has occurred but we’ve been able to carry on. Any occurrence of a fatal or error level message in your logs should have an attendant bug report with either code or configuration fixes to remove it. Informational messages should relate to things that people will care about day-to-day, or provide additional detail for an earlier, higher-level message. Everything else is a debug message and will generally be turned off in production systems.
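
As a sketch of those distinctions in code (assuming SLF4J, since nothing here names a logging framework; SLF4J has no separate fatal level, so error plus shutdown context plays that role):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LevelExamples {

    private static final Logger log = LoggerFactory.getLogger(LevelExamples.class);

    void onFatalStartupFailure(Exception e) {
        // The application is about to terminate unexpectedly.
        log.error("Cannot bind listening socket, shutting down", e);
    }

    void onBadRecord(int recordId, Exception e) {
        // Something unexpected went wrong, but we carry on: a warning.
        // If this fires regularly it deserves an attendant bug report.
        log.warn("Skipping record {} after parse failure: {}", recordId, e.getMessage());
    }

    void onBatchComplete(int processed, int skipped) {
        // Day-to-day information people actually care about.
        log.info("Batch complete: {} records processed, {} skipped", processed, skipped);
    }

    void onRecordProcessed(int recordId) {
        // Diagnostic detail, normally switched off in production.
        log.debug("Processed record {}", recordId);
    }
}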

Provide the right level of information

Logs are text, are often large (I’m looking at two production log files that are each in excess of 12MB) and are often going to be parsed with simple tools. If I run grep ERROR *.log I should be able to see each error, with enough information about it to give me a high-level overview of what is happening. More diagnostic information can follow the initial error at lower logging levels. There should be enough information following the error that someone reading the log file with default production settings can diagnose the issue, but not so much that they’re drowning in log lines.

Format the messages correctly

Be mindful that when you concatenate strings you may need spaces and other delimiters between values. When you split your output over multiple lines, the extra lines may not be seen on an initial parse of the file. Also be mindful of how values are going to be displayed: in Java the default toString method on an object isn’t the most useful of outputs in a log file, while some objects are verbose in the extreme and may break the formatting of your message by spewing a multiline string onto your single-line error message.
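
A before-and-after sketch of those pitfalls (illustrative code, assuming an SLF4J-style logger as above). The missing delimiter in the first line is plausibly how entries like the “Unhandled exception has occurred null” message above come about: a null getMessage() concatenated straight into the string.

// Bad: no delimiter, and a null message renders as "occurrednull".
log.error("Unhandled exception has occurred" + e.getMessage());

// Better: delimited, single-line, and carrying the values a reader
// needs; the exception goes last so the stack trace is preserved.
log.error("Failed to process order {} for customer {}: {}",
        orderId, customerId, e.getMessage(), e);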

Some real world examples

I regularly check our production log files for a number of reasons and find myself facing such doozies as:

30/04/2013 08:45:41 [ERROR] [[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)']: Error

The kicker here is that this could be one of a number of errors. If I see this twice in a log file I have no way of knowing if it’s the same error twice, or two different errors. The error message is badly formatted, with the information [hundreds of lines of it] on the next line. Sadly I see this more than a couple of times a day and, as it’s a third-party bit of code that’s responsible, there’s not much I can do about it.

30/04/2013 15:29:28 [INFO ] [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)']: Email transport connection established
30/04/2013 15:29:29 [INFO ] [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)']: Email transport connection closed

126 occurrences in today’s log files. This needs to be a debug message, with an error logged only if establishing or closing the email transport connection fails. Ironically enough, as I was digging into the code to fix this I discovered that when it does go wrong it reports the error three times in three different places, resulting in three error lines in the logs. Worse still, two of these lines only state “Error sending email” with no information other than a stack trace from the exception. That’s three slightly different stack traces, two useless error lines and one useful error line for one error, which could easily add 15 minutes to the diagnosis time while the developer tries to work out what’s going on.
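
The fix is roughly this shape (a sketch assuming JavaMail, which the transport wording suggests; the variable names are mine). Report the failure once, at the point it happens, with the context needed to diagnose it:

try {
    transport.connect(smtpHost, smtpUser, smtpPassword);
    log.debug("Email transport connection established"); // was INFO
} catch (MessagingException e) {
    // One error line, with context; don't log the same exception
    // again further up the stack.
    log.error("Failed to establish email transport connection to {}", smtpHost, e);
    throw e;
}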

01:17:45 [ERROR] [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)']: Failed to create URI Object

Well over 1,000 occurrences today alone! My grep-fu is reasonable, so I altered the command to show the line after each match. It turns out that someone is trying something nasty with some of our search URLs, but that wasn’t immediately obvious from the error; I had to resort to the stack trace instead. Not only could the error message be improved, we can also downgrade this to a warning. Looking at the code, this is only going to happen in two cases: either someone has put a bug in the code, in which case we’ll see thousands of identical warnings and do something about it; or someone is trying something nasty, in which case we’ll see hundreds of variations on the warning, which we can investigate and then improve or ignore depending on the outcome of the investigation. Bug raised to fix both the handling of dodgy URLs and the logging level.

I’m sure I could dig out loads more, and that’s just from a cursory glance at the files. Logging is an important diagnostic tool, but it’s only as good as you make it.

Replacing our Git branching strategy

The branching strategy we use at work is one that has evolved from us tentatively learning how Git works, reacting to our mistakes and avoiding the problems that have arisen. I have, on more than one occasion, had to rebuild the history due to errant merges going into master. Mistakes can also happen that contaminate release branches, or that leave required branches, including master, not updated. While it works for us, the fact that it can occasionally bite us on the bum, and the fact that I’m having difficulty documenting why we do what we do, both point to the conclusion that the strategy is flawed. So I’m binning it, although not until I have a working alternative that we’re happy with.

Key to this new strategy will be how changes are applied back to the main and release branches. Rather than simply taking people’s word for why you should or shouldn’t be doing merges or rebases in Git, I’ve gone right back to basics and made sure I fully understand what is going on when you merge or rebase and what the implications are – to the point of setting up parallel git repositories and testing the same operations with different strategies on each.

Secondly, I need to look at our practice of pushing branches back to origin. Being distributed can make git a pain in the backside for some things, and sometimes you really do just need one place to go and look for data. A prime example is Fisheye/Crucible, which we use for viewing our git repositories and performing code reviews. Since our JIRA workflow looks to Fisheye/Crucible for information about code changes and code reviews, we push all branches to origin. Would a move to Stash remove this need?

Thirdly, there’s our habit of keeping all branches. This leads to a lot of clutter and may or may not be related to the first two points; I’ll need to do more investigation on that front. However, I suspect I’ll be able to greatly reduce the number of branches we have.

What I suspect we’ll end up with is a strategy where most branches are taken from master, with master being rebased back into the branch before the branch is merged into master and then deleted. Release branches will also be taken from master as needed. Fixes to release branches will be branched from the release branch, have the release branch rebased back in when the work is done, and then be merged into the release branch. The release branch will then be merged back into master, and at some point the release will be tagged and its branch deleted. Pull requests using Stash will hopefully obviate the need to push feature branches to origin. How well that plan survives contact with reality I don’t know.
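
For the common case, that cycle would look something like this (a sketch; the branch name is illustrative):

git checkout -b PROJ-123 master    # feature branch taken from master
# ...work and commit on the branch...
git rebase master                  # rebase master back into the branch
git checkout master
git merge PROJ-123                 # now a simple fast-forward
git branch -d PROJ-123             # delete the branch once merged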

Revisiting Git

I first discovered Vincent Driessen’s branching model when I was investigating Git a couple of years ago. At the time I was desperate to get away from the merge issues we were having with Subversion, and was looking into both Git and different branching models using it. Given the steep learning curve with Git we decided not to complicate things further and to stick with our old branching model, albeit with a slight change in naming convention borrowed from Vincent.

Now that I’m a little more comfortable with Git it’s time to revisit our branching strategy, and our use of Git overall. I’ve looked at Vincent’s branching model again, and looked at git-flow in great depth, and have concluded it’s still not for us. However, the idea of codifying the workflow, and providing git extensions to aid using it, appealed to me immensely.

Currently we’re doing little more than branching off master for issues, with each issue (be it a new feature, or bug fix) living in its own branch. Issues are rebased into master when they’re done. Releases are also cut from master and any patches (either to the release version in QA, or to the release version in live) are cut from the release branch, rebased into that and then rebased into master.

In order to simplify our JIRA and Fisheye/Crucible setup we also push issue branches to origin, making them visible from a central point as well as providing a backup. These branches are considered private and not used by others.

Since we have tight control over all the branches, a rebase-only strategy works fine for our team, although we have had some problems caused by accidental merges. The next step is to improve the workflow and the merge/rebase strategies to survive all situations, codify these steps and then script them – something I’ve already started doing.

I’m also looking at this as a potential opportunity to move away from Fisheye and Crucible to Stash, saving some money whilst keeping the tight JIRA integration.

Interviews

Interviewing effectively is a hard skill to master. Not only do you need to work out if the person being interviewed can do the job, but also if they’ll be the best out of all the candidates and if they’ll fit in to the team dynamic. All this in a very finite space of time. You’re not going to manage that if you just ask a few questions you’ve downloaded from the web.

I’m unashamedly evil when it comes to interviews. Once the niceties are over you’re going to find yourself face to face with my laptop and a copy of Eclipse. What happens over the next 45 minutes or so is entirely down to you.

This method of trial by fire is something I’ve endured a number of times, and every time I’ve hated it. You’re in the spotlight, you’re under pressure and you’ve got someone questioning everything you do. I recognised very quickly that this is a very powerful interview tool because it leaves you wide open and quickly cuts through any front.

What I ask you to write (which, incidentally, will be decided based on the preamble, the telephone interview if there was one, and your CV) is pretty much irrelevant, as is whether you finish it. The discussion and discovery that happens in the time is what counts. I’ll probe as many subjects as I can, dropping them as soon as I’ve determined if you’re strong or weak in that area. All of this on a platform you may not be familiar with because I use OSX.

You can use the internet because it’s not about what you can remember, it’s about how you apply knowledge; you can ask questions – in fact failure to ask any questions will probably have you flunking the interview; you can say you don’t know, we’ll move on – it is actually OK not to know every tiny thing I question you on.

It’s brutal, I know. I’ve been there. You’re constantly off guard, the goal posts keep moving, the questions keep coming and during all that someone is expecting you to code at the same time. Do well in that and the job itself will be a cakewalk. But hey, if you’ve made it to the interview then I obviously think your CV matches what I’m looking for, so you shouldn’t have a problem 😉

Job Specs and CVs

Job specs annoy me. There seems to be a standard formula which contains a number of canned phrases that are absolutely pointless for certain job types, especially those I’d be applying for or, as I am currently, advertising for. These are some snippets I’ve pulled from a similar job advert on the internet:

“Strong Analytical capabilities”

It’s a Java development role. Strong analytical capabilities should be a given. I could understand it if you were advertising for a role that paid below £20K, but there are certain things you can start assuming once you get into the realms of higher-paying jobs.

“A thorough attention to detail”

I’m actually hard pressed to think of an employer who wouldn’t want that.

“Presentable and organized” [sic]

Coming, as that did, after “A thorough attention to detail” you have to love the irony in the use of the American spelling in a British job advertisement.

This isn’t just a one-off either. Most job adverts are like this, and once they’ve dispensed with the banal and pointless buzzwords there comes the smorgasbord of required skills. This is usually just a list of all the technology in use, without proper consideration of whether it’s actually required. It also tends to bear no relation to the salary on offer and, worse still, adverts these days carry the wonderfully vague £competitive or £negotiable. In effect, most job specs say the following:

“Wanted, one development guru willing to work on junior wages. Social skills desirable.”

Norwich is not a hotbed of Java shops, so it’s understandably light on Java developers. Restricting myself with a “standard” job spec would just further limit my options. Instead, let’s focus on what I really want, in the order I want it: a developer who is an A player, who has a voracious appetite for learning new things, and who has a good working knowledge of Java. That’s it; everything else is a bonus, because with that base they can learn everything they need quickly, and let’s face it, it’s rare that you can just walk into a new job without having to learn something.

WebLogic? We use it, but I’ve got step-by-step instructions on how to use it, so even a non-technical person should be able to cope with the amount of WebLogic knowledge that’s required. Oracle? Provided you can string together some SQL for at least one database you’ll probably be fine. JSPs, Servlets, Hibernate, Spring, JUnit, TDD, BDD, Ant, Maven, Eclipse? Yes, all lovely to have, and the more experience the better, but I don’t want to prevent an otherwise excellent developer from applying just because I’ve taken the shotgun approach to skills. I’ll list them, but only so you can make the bits of your CV that include the relevant skills shine.

You’ll also note I’ve not listed how many years of Java experience I want. I’ve never thought that was productive. I’ve known people with just a few months of experience in a language produce far superior code to people with several years’ experience. The last person I hired was technically an ActionScript developer with some Java knowledge. They were an excellent find. I’d rather take an enthusiastic, bright and eager junior developer who will grow over the years than an experienced developer who is now resting on their laurels. This is software development; evolve or die.

Degree? I don’t have one. They’re useful when getting into the job market, but once you’ve got a few years experience under your belt they become less of an issue. Mandating one is just another way of artificially limiting the available pool of candidates for no good reason.

This doesn’t mean I’ll take anyone though. Your CV has to convince me you’re an A player. It has to tell me you’ve got the appetite for learning that I want. It needs to show me that you’ll grow in your role. Sadly, in today’s world of keyword matching, not many CVs do that.