Tag Archives: development

The Joy Of Development

It would appear someone has stolen my week. It’s Thursday afternoon already and I left the office wishing I could stay for another hour or two, as I was on a roll and wanted to finish what I was working on. While I do have the ability to work on the train, it’s offline for most of the journey and introduces a 30 minute delay, which is enough to derail any train of thought. Instead I need to wait until I get home to finish off.

It’s been a long time since I’ve been caught up in The Joy of Development at work. Too long, in fact. I’d almost forgotten how much fun it is to take a complex problem and provide an elegant solution. I’d also forgotten how much of an arse it is to get OSX to play nicely as a web server – something Apple make even harder when you’ve got OSX Server running on the box. Still, judicious amounts of Googling and sudoing have fixed that issue, so shortly we shall have beautiful reports and pretty graphs coming from our CI builds, which we can proudly display in the office on our dashboard screens.

Next job is to take the rather meta PostIt note from the whiteboard1 saying “build Kanban board” and turn it from a makeshift board with a few things stuck on it into a work of functional art. Then we can take the newly purchased PostIt notes, write down everything that needs to be done between now and the 21st2, panic at the size of the backlog and then wish we hadn’t visualised it.

All of which is a nice segue into a weekend of learning about continuous delivery into the cloud and blue/green deployments; although I could quite happily handle several more working days in this week before the weekend. I’ve missed enjoying work.


1 Actually, there’s two of them, and this one is on the left, so technically it’s the wheftboard, not the whiteboard… anyone? … no? … One is on the wheft, one is on the white? … really? No-one? Fine, forget it.

2 Release date for our open beta; sign up if you haven’t, it’s rather cool stuff. Also the date of my next talk, which I really need to get finished.

Interview

Depending how you look at it, my interview with RainBird was either non-existent, featuring at best an informal chat; or it was a gruelling 5 year affair where I had to prove myself by working my way up the ranks of a totally different company. Either way it wasn’t your standard technical interview.

I’ve written on the subject of interviews before, but that was for an established company hiring a developer. At a startup you’re hiring a manager/secretary/handyman who can also code and do a million other things that need to be done, which is a very tall order. I’m not entirely sure how you’d go about doing that without knowing the person and seeing, first hand, what they were capable of over a prolonged period of time.

This approach to hiring means you can dispense with the incredibly narrow (and often counterproductive) fallacy that you must hire someone with X years’ experience in technology Y1, because that’s what you use. RainBird needs developers who can code in Node.js, AngularJS, plus a smattering of C++ and Prolog. If we’re charitable, I have one month’s worth of industry C++ experience… from over 15 years ago.

Despite that seeming handicap, I like to think I have a good understanding of a number of programming languages (including Javascript), a good grasp of architecting systems, the ability to manage a team, a broad set of organisational skills and the ability to build furniture. That means, regardless of the technology being used, the longer term benefit I bring to the company far and away beats the incredibly short term drawback of me having to get up to speed with some new stuff.


1 And don’t get me started on the whole “must be a self starter; must work well by themselves or as part of a team; must have excellent communication skills”; what does that even mean? You’d be unlikely to hire a lazy illiterate who didn’t play well with others for something as simple as a job at McDonalds, let alone put them into a development role – please, for the love of God, stop putting this crap into job specs.

Eclipse, OSX and JDK 1.7

Despite being a massive Mac fanboi I am the first to admit that as soon as you start going a little off piste with OSX you run into problems that require technical knowledge to fix. Java development on the Mac falls into the category of off piste and it has always been more than a little fun getting things set up.

Now that Oracle are providing the JDK it seems that things no longer live quite where they used to, which left me scratching my head when trying to get Eclipse working with JDK 1.7.

Installing JDK 1.7 is easy: go to the Oracle download page, grab the 64-bit OSX DMG, open it, run it, job done.

$ java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

Now to tell Eclipse where the JDK is:

$ ls -l `which java`
lrwxr-xr-x  1 root  wheel  74 24 Oct 15:37 /usr/bin/java -> /System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/java

Great… except Eclipse doesn’t recognise /System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/ or /System/Library/Frameworks/JavaVM.framework/Versions/Current/ as a valid JDK location.

After a bit of Googling I discovered the magic java_home command.

$ /usr/libexec/java_home -v 1.7
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home

Giving that directory to Eclipse made it happy and I’m now able to use an up to date version of Java for my code.

Chasing 100% Coverage

Unit tests, as we all know, are A Good Thing™. It stands to reason then that 100% unit test coverage must also be A Good Thing™, and therefore is something that should be strived for at all costs. I’d argue this is wrong though.

100% unit1 test coverage can be exceptionally difficult to achieve in real world applications without jumping through a whole load of hoops that end up being counterproductive. The edges of your system often talk to other systems which can be difficult to isolate or mock out. You can often be left chasing the last few percent and making counterintuitive changes to achieve it. At this point I’d say it’s better to leave the clean, readable, untested code in and just accept that 100% coverage isn’t always possible.

This leads to another problem though. Once you’re not hitting 100% coverage you need to be sure that the code that isn’t covered is actually code you can’t cover. As your code base gets bigger, the amount by which a single missed line of code affects your coverage figure gets smaller.

PHPUnit takes a pragmatic approach to this issue; it allows you to mark blocks of code as being untestable. The net result is that it simply ignores these blocks of code in its coverage calculations allowing you to get 100% coverage of testable code.

Quite a few people I’ve told about this have declared it to be ‘cheating’; however, let’s look at a very real issue I have in one of my bits of Java code. I have code that uses the list of locales that can be retrieved by Java. It uses the US as the default, since it’s reasonable to assume that the US locale will be set up on a system. While highly improbable, it’s not impossible for a system to lack the US locale, and the code handles this gracefully. Unit testing this code path is impossible as it involves changes to the environment. I could conceivably handle this in functional tests, but it’s not easy. I could remove the check and just let the code fall over in a crumpled heap in this case, but then if it ever does happen someone is going to have a nasty stack trace to deal with rather than a clear and concise error message.
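For illustration, the offending branch looks something like this (a sketch, not the real code; the class name and message are invented):

import java.util.Arrays;
import java.util.Locale;

public class DefaultLocale {

    // Returns the US locale if the system has it, otherwise fails with a
    // clear and concise message rather than a nasty stack trace later on.
    public static Locale get() {
        if (Arrays.asList(Locale.getAvailableLocales()).contains(Locale.US)) {
            return Locale.US;
        }
        // Highly improbable, but not impossible. This branch can't be reached
        // from a unit test without changing the environment itself, which is
        // exactly the block I'd like to be able to mark as untestable.
        throw new IllegalStateException(
                "The US locale is not available on this system; "
                + "install it or configure an alternative default locale.");
    }
}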

If I could mark that single block of code as untestable I would then be able to see at a glance if the rest of my code was 100% covered. As it is I’ve got 99-point-something percent coverage and I need to drill into the coverage results to ensure the missing coverage comes from that one class. Royal pain in the behind.

I am willing to concede that the ability to mark code as untestable can be abused, but then that’s what code reviews are for. If someone knows how to unit test a block of code that’s marked as untestable they can fail the review and give advice to the developer that submitted the code.


1 People seem to get unit and functional tests muddled up. A unit test should be able to run in isolation with no dependency on the environment or other systems. Functional tests, on the other hand, can make assumptions about the environment and can require that other systems are running and correctly configured. The classic example is a database. Unit tests should mock out the database connection and use canned data within the unit tests. Functional tests can connect to a real database pre-populated with test data. Typically functional tests also test much larger blocks of code than individual unit tests do.
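As a rough sketch of the unit test half of that distinction (JUnit and Mockito purely as example tools, and the classes are invented):

import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.Test;

public class CustomerLookupTest {

    // Invented collaborators, purely to illustrate the point.
    public interface CustomerRepository {
        String findName(long id);
    }

    public static class CustomerLookup {
        private final CustomerRepository repository;
        public CustomerLookup(CustomerRepository repository) { this.repository = repository; }
        public String nameFor(long id) { return repository.findName(id); }
    }

    // Unit test: the "database" is a mock fed canned data, so the test runs
    // in isolation with no environment dependencies. The functional test
    // equivalent would talk to a real database pre-populated with test data.
    @Test
    public void returnsNameFromRepository() {
        CustomerRepository repository = mock(CustomerRepository.class);
        when(repository.findName(42L)).thenReturn("Alice");

        assertEquals("Alice", new CustomerLookup(repository).nameFor(42L));
    }
}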

Is PHP an enterprise language?

During the last nor(DEV): we tried something a little different: a Question Time style panel where the ‘audience’ asked a panel of guests a series of contentious questions. These were then debated by both the panel and the audience. While the questions may have been prepared in advance, the debate that followed was not, which resulted in some interesting discussion.

One of the questions that interested me the most was regarding PHP and its status as an ‘Enterprise’ level language. With my recent experience with PHP I’ve found my attitudes to it have changed drastically. This has resulted in, for me, a startling conclusion.

Historically I’ve been a Java developer. While we know Java is an enterprise language (it says so on the tin), I didn’t start out that way. While I’ve dabbled in Ada (university) and C++ (early graphics programming for fun) my first commercial programming was done in Perl and TCL. I may have been at a large enterprise, but, by my own definition, I come from a Script Kiddie background. Perhaps I’m a little more open to scripting languages than others.

That said, I’ve always felt that scripting languages have somehow been less than ‘proper‘ languages like Java. Still, right tool for the job and all that, and given I’m not getting away from PHP anytime soon1 I decided to look at exactly why PHP wasn’t as ‘good’ as Java.

It’s not Object Oriented

Having done OO for a large part of my programming life I find it hard to use anything that doesn’t support objects. I suspect this is more a failing of the developer than of functional programming; however, it’s a moot point as PHP is a fully OO language; I just needed to RTFM.

It’s not got a proper IDE

Around 1 or 2KLoc2 I find I’ve now got more objects than I can remember the interfaces for, which results in lots of bouncing between files to refresh my memory. This can get problematic when it breaks my train of thought. I’ve tried a number of editors and found Sublime Text was OK, but it wasn’t Eclipse. Thankfully I found PHPStorm, a proper IDE for PHP that works on a Mac. It shares a number of features with Eclipse (including being as buggy as hell) and has massively improved life when writing PHP. It costs money, but the investment is well worth it if you’re doing anything more than dabbling in PHP.

You can’t test it easily

Actually, there’s PHPUnit which I now prefer to JUnit. OK, so unit testing the code that actually renders your web pages isn’t so easy, but if you’ve got proper separation of responsibilities there should be minimal code before it hands everything off to your fully tested backend code.

It lacks ‘modern’ features

Such as…
PHP can handle closures so it’s actually more feature rich than Java in some respects. The way PHP handles method overloading is convoluted to say the least, but method overloading isn’t exactly modern and I suspect it stems from the way that PHP handles varargs (something else Java only got recently).

The code is hard to read

Yes, a lot of PHP is hard to read (go look at the WordPress codebase) but I don’t think PHP is to blame here. I spent the longest time trying to bludgeon PHP into looking like Java when I should have been bludgeoning my brain into thinking the PHP way – something I’ve finally done.

I think this last point is key when it comes to the problems with PHP. The barriers to entry with PHP are very low as you can throw together a usable web app with a UI very quickly, even as a novice developer. The result is a lot of bad PHP out there in the wild which lacks the design, testing and layout that we ‘proper’ developers would use with our ‘grownup’ languages. I’ll freely admit that some of that bad PHP belongs to me. Having matured as a PHP developer I’m hoping my next application will see me being a ‘grownup’ PHP developer.

This is a similar conclusion to the one the nor(DEV): panel came to. The question isn’t “is PHP3 an enterprise language?“, it’s “are your developers enterprise level developers?“.


1 My hosting provider, while nice and cheap, is rather limited in what I can run, thus PHP for my personal projects.

2 KLoC: Kilo-Lines of Code, or thousands of lines of code.

3 You can replace PHP here with a number of languages.

PDD

Development Strategies

The Development Strategy triangle.


Most [all?] discussions on Agile (or lean, or XP, or whatever the strategy du jour is currently) seem to use a sliding scale of “Agileness” with a pure Waterfall process on the left and a pure Agile process on the right. You then place teams somewhere along this axis, with very few teams being truly pure Waterfall or pure Agile. I don’t buy this. I think it’s a triangle with Waterfall at one point, Agile at the second, and Panic Driven Development at the third. Teams live somewhere within this triangle.

So what is Panic Driven Development? Panic Driven Development, or PDD, is the knee jerk reaction of the business to various external stimuli. There’s no planning process and detailed spec as per Waterfall, there are no discrete chunks and costing as per Agile, there is just “Do It Now!” because “The sky is falling!“; or “All our competitors are doing it!“; or “It’ll make the company £1,000,000“1; or purely “Because I’m the boss and I said so“. Teams high up the PDD axis will often lurch from disaster to disaster, never really finishing anything as the Next Big Thing trumps everything else, but even the most Agile team will have some PDD in their lives; it happens every time there is a major production outage.

If you’re familiar with the Cynefin framework you’ll recognise PDD as living firmly in the chaotic space. As such, a certain amount of PDD in any organisation is absolutely fine – you could even argue it’s vital for handling unexpected emergencies – but beyond this PDD is very harmful to productivity, morale and code quality. Past a certain point it doesn’t matter whether you’re Agile or Waterfall: such high levels of PDD mean you are probably going to fail.

Sadly, systemic PDD is often something that comes from The Business and it can be hard for the development team to push back and gain some order. If you find yourself in this situation you need to track all the unplanned incoming work and its effect on the work you should be doing, and feed this data back to the business. Only when they see the harm that this sort of indecision is causing, and the effect on the bottom line, will they be able to change.


1 I have worked on quite a few “million pound” projects or deals. The common denominator is that all of them failed to produce the promised million, often by many orders of magnitude.


“PDD” originally appeared as part of Agile In The Real World and appears here in a revised and expanded form.


Three Bin Scrum

Allan Kelly blogged recently about using three backlogs with Scrum rather than the more traditional two. Given this is a format we currently use at Virgin Wines he asked if I would do a writeup of how it’s used so he could know more. I’ve already covered our setup in passing, but thought I would write it up in a little more detail and in the context of Allan’s blog.

Our agile adoption has gone from pure PDD, to Scrum-ish, to Kanban, to something vaguely akin to Scrumban taking the bits we liked from Scrum and Kanban. It works for us and our business, although we do regularly tweak it and improve it.

With Kanban we had a “Three-bin System“. The bin on the factory floor was the stuff the team was actively looking at, or about to look at; the bin in the factory store was a WIP limited set of issues to look at in the near future; and the bin at the supplier was everything else.

When we moved to our hybrid system we really didn’t want to replace our three bins, or backlogs, with just a sprint backlog and product backlog, because the product backlog would just be unworkable (as in 1072 issues sitting in it unworkable!). So we kept our three backlogs.

The Product Backlog

The Product Backlog (what Allan calls the Opportunity backlog, which is a much better name) is a dumping ground. Every minor bug, every business whim, every request is recorded and, unless it meets certain criteria, dumped in the Product Backlog. There are 925 issues in the product backlog at the moment; a terrifyingly large number of those are bugs!

I can already hear people telling me that those aren’t really bugs or feature requests: how can they be when they’re not prioritised, and therefore not important? They’re bugs alright, mostly to do with our internal call centre application or internal processes where there are workarounds. I would dearly love to get those bugs fixed, but this is the Real World and I have finite resources and a demanding business.

I am open and honest about the Product Backlog. Once an issue goes in there, it’s not coming out again without a business sponsor to champion it. It’s not on any “long term road map”. It’s buried. I am no longer thinking about it.

Our QA team act as the business sponsor for the bugs. Occasionally they’ll do a sweep and close any that have been fixed by other work, and if they get people complaining about a bug in the Product Backlog they’ll prioritise it.

The Product Backlog is too big to view in its entirety. We use other techniques, such as labels and heat maps, to give an overview of what’s in this backlog at a glance.

The Sprint Backlog

Bad name, I know, but this equates to Allan’s Validated Backlog. This is the list of issues that could conceivably be picked up and put into the next sprint. The WIP limit for this backlog is roughly 4 x velocity which, with our week long sprints, puts it at about a month’s work deep.

To make it into the Sprint Backlog an issue must be costed, prioritised and have a business sponsor. Being in this backlog doesn’t guarantee that the work will get done, and certainly doesn’t guarantee it’ll get done within a month. It simply means it’s on our radar and has a reasonable chance of being completed. The more active the product sponsor, the higher that chance.

The Current Sprint

With a WIP limit of last week’s velocity, adjusted for things like holidays and the like, this forms the List Of Things We Hope To Do This Week. We don’t have “Sprint Failures“, so if an issue doesn’t make it all the way to the Completed column it simply gets dumped back into the Sprint Backlog at sprint completion. The majority of uncompleted issues will get picked up in the next sprint, but it’s entirely possible for something to make it all the way to the current sprint, not get worked on, then get demoted all the way back to the Product Backlog, possibly never to be heard from again.

Because issues that span sprints get put back in exactly the same place they were when the last sprint ended we end up with something that’s akin to punctuated kanban. It’s not quite the hard stop and reset that pure Scrum advocates, but it’s also not continuous flow.

The current sprint is not set in stone. While I discourage The Business from messing about with it once it’s started (something they’ve taken on board) we are able to react to events. Things can be dropped from the sprint, added to the sprint or re-costed. Developers who run out of work can help their colleagues, or go to the Sprint Backlog to pull some more work into the sprint. Even the sprint end date and the timing of the planning meeting and retrospective are movable if need be.

The Expedited Queue

There is a fourth backlog, the Expedited Queue. This is a pure Kanban setup with WIP limits of 0 on every column and should be empty at all times. QA failures and bugs requiring a patch get put in this queue and should be fixed ASAP by the first available developer. Story points are used to record how much work was spent in the Expedited Queue, but it’s not attributed to the sprint’s velocity. The logic here is that it’s taking velocity away from the sprint, as this is work caused by items that aren’t quite as “done” as we had hoped.

Programmed to fail

One aspect of programming that fascinates me is the psychology of software development. To study Agile is to study people and their interactions, and there are some interactions that seem incredibly hard to break.

Given how often software projects overrun it astounds me that the same patterns occur again and again. Developers seem very reluctant to admit they’re late and tend not to question deadlines until it’s clear the deadline is now ridiculous. There is a hope that, somehow, everything will fall into place and everything will work come delivery time. And yet experience tells us that things invariably go wrong – the hope is irrational and yet near universal.

Those managing the team can easily spot the signs that a project is drifting into trouble. Confident answers about progress become couched in qualifiers. Progress reports become terse, or filled with excuses. Progress reports may also start sounding very similar week on week. At this point the project sponsors should be alerted to the issue, but again people seem to cling onto the hope that everything will be OK. Reporting up is rarely done early enough or emphatically enough.

Reports of delays need to be emphatic and early as, the closer to the release date you get, the less willing sponsors seem to be to accept delays. This may be due to hard and fast deadlines (e.g. shipping physical boxes of software), but I’ve seen it in teams where the deadline is nothing more than a notional line in the sand. Yes, moving that line may be problematic, but not nearly as problematic as putting poorly functioning software live, or missing the deadline with no contingency at all. Simply defining delays in the project as “unacceptable” doesn’t make them go away. Software development is an art, not a science, and delivery dates should be treated as malleable until the software is actually delivered.

With all three levels refusing to face facts it’s little wonder that we have the issues in software development that we do. Agile helps us by giving us tools to counter these issues, but until people can get out of the Big Project Mentality, the psychology of large deadlines in the distant future becoming looming deadlines in the very near future will prevail.

Norfolk Developers (NorDev)

I’ve got to admit, when Paul Grenyer from Naked Element approached me a couple of weeks ago about setting up a new group specifically for developers I was a bit sceptical. Leaving aside my concerns about being available on a regular basis to run the group – easily solved by sharing the job with Paul and Ben Taylor from Validus – there was also the question of how many people would be interested.

I was wrong to be concerned; there are loads of you out there. At barely a week old we currently boast 64 members (which as a geek makes me smile), and 23 of those have said they’re coming to our first event. An event we haven’t even named or confirmed any speakers for!

What I do know is that it’s on the 26th of June, will be held at the Virgin Wines offices since they’ve kindly sponsored the group, and there will almost certainly be cake and some wine to try. No doubt there will be some great speakers, and some interesting conversations sparked from it. Hopefully I’ll see you there.

Logging

A personal bugbear of mine is developers not being able to write clear, effective logging. This seemingly trivial task appears to cause a great number of developers no end of problems, and yet it shouldn’t necessarily be that hard. Why is it, then, that when I go to interrogate a log file I have to trawl through kilobytes (or worse) of meaningless rubbish to determine that:

24/04/2013 09:26:19 [ERROR] [Batch941-Thread5]: Unhandled exception has occurred null

That’s a real error message from a system I work on, and that’s all it had to say about the matter. I despair.

There are a few basic things you can consider that will make your logging a lot more effective.

Use the appropriate log level

Fatal should be reserved for when an application, or a thread, is about to be terminated in a fashion that really isn’t expected. Errors should indicate something recoverable [at an application level] that’s gone wrong and wasn’t expected. The vast majority of fatal and error log messages are really warnings, that is, messages indicating that an error has occurred but we’ve been able to carry on. Any occurrence of a fatal or error level message in your logs should have an attendant bug report with either a code or configuration fix to remove that error. Informational messages should relate to things that people will care about day-to-day, or provide additional detail following an initial higher level message. Everything else is a debug message and will generally be turned off in production systems.
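As a rough sketch of how I’d expect those levels to be used (Log4j here purely as an example API, and the messages are invented):

import org.apache.log4j.Logger;

public class LogLevelExamples {

    private static final Logger LOG = Logger.getLogger(LogLevelExamples.class);

    public static void main(String[] args) {
        // Debug: diagnostic detail, generally turned off in production.
        LOG.debug("Loaded 42 pricing rules from the cache");

        // Info: something people care about day-to-day.
        LOG.info("Nightly price recalculation completed for 1204 products");

        // Warn: an error occurred but we were able to carry on.
        LOG.warn("Price feed was empty; keeping yesterday's prices");

        // Error: recoverable at an application level but unexpected; this
        // should have an attendant bug report with a code or config fix.
        LOG.error("Could not write price history to the reporting database",
                new IllegalStateException("connection refused"));

        // Fatal: the application (or thread) is about to die unexpectedly.
        LOG.fatal("Unable to load application configuration; shutting down");
    }
}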

Provide the right level of information

Logs are text, are often large (I’m looking at two production log files that are in excess of 12MB) and are often going to be parsed with simple tools. If I use grep ERROR *.log I should be able to see each error, and enough information about that error to give me a high level overview of what is happening. More diagnostic information can follow the initial error at lower logging levels. There should be enough information following the error that someone reading the log file with default production settings can diagnose the issue, but not so much that they’re drowning in log lines.

Format the messages correctly

Be mindful that when you concatenate strings you may need spaces and other delimiters between values. When you split your output over multiple lines those lines may not be seen on an initial parse of the file. Also, be mindful of how values are going to be displayed. With Java the default toString method on an object isn’t the most useful of outputs in a log file. In contrast, some objects are verbose in the extreme and may break the formatting of your error message by spewing a multiline string onto your single line error message.
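A sketch of the sort of thing I mean (the domain object is invented, and Log4j is again purely an example):

import org.apache.log4j.Logger;

public class LogFormattingExamples {

    private static final Logger LOG = Logger.getLogger(LogFormattingExamples.class);

    // Invented domain object, purely to illustrate the point.
    static class Order {
        long getId() { return 12345L; }
        long getCustomerId() { return 678L; }
    }

    public void logFailure(Order order, Exception e) {
        // Bad: no delimiter between text and value, and Order has no useful
        // toString(), so this renders as something like
        // "Order failedLogFormattingExamples$Order@1db9742".
        LOG.error("Order failed" + order);

        // Better: delimited, specific values rather than whole objects, and the
        // exception passed separately so its stack trace follows the message
        // rather than mangling the single log line.
        LOG.error("Order " + order.getId() + " failed for customer "
                + order.getCustomerId() + ": " + e.getMessage(), e);
    }
}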

Some real world examples

I regularly check our production log files for a number of reasons and find myself facing such doozies as:

30/04/2013 08:45:41 [ERROR] [[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)']: Error

The kicker here is that this could be one of a number of errors. If I see this twice in a log file I have no way of knowing if it’s the same error twice, or two different errors. The error message is badly formatted, with the information [hundreds of lines of it] on the next line. Sadly I see this more than a couple of times a day and, as it’s a third party bit of code that’s responsible, there’s not much I can do about it.

30/04/2013 15:29:28 [INFO ] [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)']: Email transport connection established
30/04/2013 15:29:29 [INFO ] [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)']: Email transport connection closed

126 occurrences in today’s log files. This needs to be a debug message, with an error output only if it fails to establish or close the email transport connection. Ironically enough, as I was digging into the code to fix this I discovered that when it does go wrong it reports the error 3 times in 3 different places, resulting in 3 error lines in the logs. Worse still, 2 of these lines only state “Error sending email” with no information other than a stack trace from the exception. That’s three slightly different stack traces, two useless error lines and one useful error line for one error, which could easily add 15 minutes to the diagnosis time while the developer tries to work out what’s going on.

01:17:45 [ERROR] [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)']: Failed to create URI Object

Well over 1000 occurrences today alone! My grep-fu is reasonable, so I altered the command to show the line after each match. Turns out that someone is trying something nasty with some of our search URLs, but that wasn’t immediately obvious from the error; instead I had to resort to the stack trace. Not only could the error be improved, but we can also downgrade this to a warning. Looking at the code this is only going to happen in two cases: either someone has put a bug in the code, in which case we’ll see thousands of identical warnings and do something about it; or someone is trying something nasty, in which case we’d see hundreds of variations on the warning, which we can investigate and then improve or ignore depending on the outcome of the investigation. Bug raised to fix both the handling of dodgy URLs and the logging level.
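The fix I have in mind looks roughly like this (a sketch, not the code in the bug report; the names are invented):

import java.net.URI;
import java.net.URISyntaxException;

import org.apache.log4j.Logger;

public class SearchUriParser {

    private static final Logger LOG = Logger.getLogger(SearchUriParser.class);

    // Returns the parsed URI, or null if the raw value wasn't a valid URI.
    public URI parse(String rawUrl) {
        try {
            return new URI(rawUrl);
        } catch (URISyntaxException e) {
            // A warning, not an error: we carry on regardless, and the offending
            // value is right there in the message so dodgy URLs are obvious at
            // a glance without digging through stack traces.
            LOG.warn("Failed to create URI from request value '" + rawUrl + "': " + e.getMessage());
            return null;
        }
    }
}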

I’m sure I could dig out loads more and that’s just with a cursory glance at the files. Logging is an important diagnostic tool, but it’s only as good as you make it.