Tag Archives: problems

Eclipse, OSX and JDK 1.7

Despite being a massive Mac fanboi I am the first to admit that as soon as you start going a little off piste with OSX you run into problems that require technical knowledge to fix. Java development on the Mac falls into the category of off piste and it has always been more than a little fun getting things set up.

Now that Oracle are providing the JDK it seems that things no longer live quite where they used to, which left me scratching my head when trying to get Eclipse working with JDK 1.7.

Installing JDK 1.7 is easy: go to the Oracle download page, grab the 64-bit OSX DMG, open, run, job done.

$ java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

Now to tell Eclipse where the JDK is:

$ ls -l `which java`
lrwxr-xr-x  1 root  wheel  74 24 Oct 15:37 /usr/bin/java -> /System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/java

Great… except Eclipse doesn’t recognise /System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/ or /System/Library/Frameworks/JavaVM.framework/Versions/Current/ as a valid JDK location.

After a bit of Googling I discovered the magic java_home command.

$ /usr/libexec/java_home -v 1.7
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home

Giving that directory to Eclipse made it happy and I’m now able to use an up to date version of Java for my code.
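
A quick way to confirm that Eclipse really is using the new JDK is to ask the running JVM where it lives. This is only a minimal sketch: the class name WhichJdk is made up for illustration, and it relies on nothing beyond the standard java.version and java.home system properties.

```java
// Hypothetical helper: reports the version and install location of the JVM running it.
final class WhichJdk {
    // Builds a one-line summary from two standard system properties.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + " java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run from inside Eclipse, I would expect java.home to sit somewhere under the directory that java_home printed (on a JDK 7 install, typically its jre subdirectory). If it points elsewhere, the project is probably still bound to a different installed JRE.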

iOS 7 Music Problems

So, like pretty much every other fanboi out there, I now have iOS7 installed on my iPad and iPhone. In the main I’ve been quite impressed. I’m still running an iPhone 4S and I was worried it would struggle. Two things I did notice, though: music playback and battery life were both shocking. I’ve had this problem in the past when the iPhone 4 came out. I was on an iPhone 3 and the latest version of iOS struggled to play music without stuttering. I had joked that Apple deliberately caused older hardware to do this to force upgrades, and then duly went and got an iPhone 4 which solved any speed issues. This time round I wasn’t so sure it was hardware related. For one thing, every time I opened the music app there was network access. For another, music wasn’t stuttering; it was just stopping, or refusing to play.

Googling the situation wasn’t helpful. The internet is rife with stories about iOS7, the new music player and people having unrelated problems, meaning my searches were bringing up useless news articles and forum posts. To that end I’m going to describe the problems I had so maybe others might find this and get a solution.

The first problem was music would just stop. Press play again and nothing would happen, or maybe it would play a second or two, and then stop.

Next up was the bizarre behaviour of me pressing ‘next’ and seeing my iPhone keep skipping tracks. It was almost as if it was considering the track, and then discounting it, moving onto the next one. Sometimes it would skip a number of tracks before finally deciding it would play one.

Then there was the issue of near constant network access when the music app was open, and really poor battery life, probably because of the network access.

Lastly there were some odd songs on my iPhone. I rate all my music and have rules that put 5*, 4* and a random selection of 3* tracks onto my phone. This does mean that each time I sync I get a slightly different selection of tunes, but I was sure I’d set some of these tracks not to sync.

It turns out the explanation, and the solution, were very simple. The problems seemed to occur when I had limited network access so I wondered if iOS7 was doing anything funky with the music app and phoning home. A quick check of the Music app preferences yielded:

image

Seems my phone was now trying to play music I’d bought off the iTunes Music Store, but that wasn’t on my phone. A quick change of settings to:

image

And the number of tracks on my phone dropped by 1,000, those that were left played instantly and the problems all went away.

Issues with GlassFish on OSX

I’ve been trying to get GlassFish 4 to install on my laptop on and off for the past two days now. Needless to say, it’s not been going well. Initially I tried the native install (export DISPLAY=localhost:0.0 btw 🙂 ), which hung trying to configure the domain. Trying to configure a new domain from the command line yielded:

./asadmin create-domain --adminport 4848 --instanceport 8000 domain2
You do not have permission to use port 4848 for domain2. Try a different port number or login to a more privileged account.

Nothing was listening on port 4848 and even running the command as root via sudo didn’t work. Next up, the zip install. That comes with a preconfigured domain and a new error trying to start it:

./asadmin start-domain domain1
There is a process already using the admin port of 4848 -- it probably is another instance of a GlassFish Server

Resorting to Google, I finally worked out that my hostname wasn’t in /etc/hosts. Being on the work network I’ve been assigned a hostname by the DHCP server rather than the sulaco.local it usually is. A quick Google for setting the hostname on a Mac got me this post (http://blog.psyrendust.com/2008/05/23/change-the-hostname-in-mac-os-x-osx/) and I was able to run

sudo scutil --set HostName sulaco.local
./asadmin start-domain domain1

Lo and behold, GlassFish has started.
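
If you suspect the same problem, you can check whether a JVM is able to resolve the machine's own hostname before touching anything. The following is a sketch only: HostnameCheck is a made-up class name, and I have not checked what GlassFish does internally, so treat this as a test of the condition the fix addresses rather than a reproduction of GlassFish's own port check.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic: can this JVM resolve the local machine's hostname?
final class HostnameCheck {
    // Returns a one-line result beginning with "OK: " or "FAILED: ".
    static String check() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return "OK: " + local.getHostName() + " resolves to " + local.getHostAddress();
        } catch (UnknownHostException e) {
            // Typically means the hostname is missing from /etc/hosts and DNS.
            return "FAILED: local hostname does not resolve (" + e.getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(check());
    }
}
```

A FAILED result (or a very long pause before an answer) suggests the hostname needs either setting, as above, or adding to /etc/hosts.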

PDD

Development Strategies

The Development Strategy triangle.


Most [all?] discussions on Agile (or lean, or XP, or whatever the strategy du jour is currently) seem to use a sliding scale of “Agileness” with a pure Waterfall process on the left and a pure Agile process on the right. You then place teams somewhere along this axis with very few teams being truly pure Waterfall or pure Agile. I don’t buy this. I think it’s a triangle with Waterfall at one point, Agile at the second, and Panic Driven Development at the third. Teams live somewhere within this triangle.

So what is Panic Driven Development? Panic Driven Development, or PDD, is the knee-jerk reaction from the business to various external stimuli. There’s no planning process and detailed spec as per Waterfall, there are no discrete chunks and costing as per Agile, there is just “Do It Now!” because “The sky is falling!”; or “All our competitors are doing it!”; or “It’ll make the company £1,000,000” [1]; or purely “Because I’m the boss and I said so”. Teams high up the PDD axis will often lurch from disaster to disaster, never really finishing anything as the Next Big Thing trumps everything else. Even the most Agile team will have some PDD in their lives, though: it happens every time there is a major production outage.

If you’re familiar with the Cynefin framework you’ll recognise PDD as living firmly in the chaotic space. As such, a certain amount of PDD in any organisation is absolutely fine – you could even argue it’s vital to handle unexpected emergencies – but beyond this PDD is very harmful to productivity, morale and code quality. Over a certain point it doesn’t matter if you’re Agile or Waterfall, the high levels of PDD mean you are probably going to fail.

Sadly, systemic PDD is often something that comes from The Business and it can be hard for the development team to push back and gain some order. If you find yourself in this situation you need to track all the unplanned incoming work and its effect on the work you should be doing, and feed this data back to the business. Only when they see the harm that this sort of indecision is causing, and the effect on the bottom line, will they be able to change.


[1] I have worked on quite a few “million pound” projects or deals. The common denominator is that all of them failed to produce the promised million, often by many orders of magnitude.


“PDD” originally appeared as part of Agile In The Real World and appears here in a revised and expanded form.


Logging

A personal bugbear of mine is developers not being able to write clear, effective logging. This seemingly trivial task appears to cause a great number of developers no end of problems, and yet it shouldn’t necessarily be that hard. Why is it then, that when I go to interrogate a log file I have to trawl through kilobytes (or worse) of meaningless rubbish to determine that:

24/04/2013 09:26:19 [ERROR] [Batch941-Thread5]: Unhandled exception has occurred null

That’s a real error message from a system I work on, and that’s all it had to say about the matter. I despair.

There are a few basic things you can consider that will make your logging a lot more effective.

Use the appropriate log level

Fatal should be reserved for when an application, or a thread, is about to be terminated in a fashion that really isn’t expected. Errors should indicate something recoverable [at an application level] that’s gone wrong that wasn’t expected. The vast majority of fatal and error log messages are really warnings, that is, messages indicating that an error has occurred but we’ve been able to carry on. Any occurrences of a fatal or error level message in your logs should have attendant bug reports with either code or configuration fixes to remove those errors. Informational messages should relate to things that people will care about day-to-day, or serve as additional output following an initial higher-level message. Everything else is a debug message and will generally be turned off in production systems.
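
To make those rules concrete, here is a rough sketch of them as a lookup. Everything in it is an assumption for illustration: the Outcome names are invented, and it uses java.util.logging only because it ships with the JDK. That library has no separate FATAL level, so fatal and error both map to SEVERE here, and debug maps to FINE.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical mapping from "what just happened" to a log level.
final class LogLevels {
    enum Outcome {
        TERMINATING,            // app or thread is about to die unexpectedly ("fatal")
        UNEXPECTED_FAILURE,     // something unexpected went wrong ("error")
        HANDLED_AND_CONTINUED,  // a problem occurred but we carried on ("warning")
        DAY_TO_DAY_EVENT,       // something people care about day-to-day ("info")
        INTERNAL_DETAIL         // everything else ("debug")
    }

    static Level levelFor(Outcome outcome) {
        switch (outcome) {
            case TERMINATING:
            case UNEXPECTED_FAILURE:
                return Level.SEVERE;   // should have a bug report attached
            case HANDLED_AND_CONTINUED:
                return Level.WARNING;
            case DAY_TO_DAY_EVENT:
                return Level.INFO;
            default:
                return Level.FINE;     // normally switched off in production
        }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger(LogLevels.class.getName());
        log.log(levelFor(Outcome.HANDLED_AND_CONTINUED),
                "Payment retry succeeded on second attempt");
    }
}
```

The useful question to ask at each log call is which Outcome you are really in. If the honest answer is "we carried on", it is a warning, however alarming the exception looked.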

Provide the right level of information

Logs are text, are often large (I’m looking at 2 production log files that are in excess of 12MB) and are often going to be parsed with simple tools. If I use grep ERROR *.log I should be able to see each error, and enough information about that error to give me a high level overview of what is happening. More diagnostic information can follow the initial error at lower logging levels. There should be enough information following the error that someone reading the log file with default production settings can diagnose the issue, but not so much that they’re drowning in log lines.

Format the messages correctly

Be mindful that when you concatenate strings you may need spaces and other delimiters between output. When you split your output over multiple lines those lines may not be seen on an initial parse of the file. Also, be mindful of how values are going to be displayed. With Java the default toString method on an object isn’t the most useful of outputs in a log file. In contrast, some objects are verbose in the extreme and may break the formatting of your error message by spewing a multiline string onto your single line error message.
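
As a sketch of what that can look like in practice, the helper below builds a message that stays on one line no matter what is passed in. The class and method names are hypothetical, and the message layout is just one reasonable choice, not a standard.

```java
// Hypothetical helper for building single-line, grep-friendly log messages.
final class SingleLineLog {
    // Collapses line breaks and tabs (plus surrounding whitespace) into single
    // spaces so a verbose value cannot spill across several log lines.
    static String flatten(Object value) {
        return String.valueOf(value).replaceAll("\\s*[\\r\\n\\t]+\\s*", " ").trim();
    }

    // Builds "what: key=value (ExceptionType: message)" with explicit delimiters.
    static String describe(String what, String key, Object value, Throwable cause) {
        return what + ": " + key + "=" + flatten(value)
                + " (" + cause.getClass().getSimpleName()
                + ": " + flatten(cause.getMessage()) + ")";
    }

    public static void main(String[] args) {
        Throwable cause = new IllegalStateException("connection refused");
        // Prints: Failed to send email: recipient=a@example.com (IllegalStateException: connection refused)
        System.out.println(describe("Failed to send email", "recipient", "a@example.com\n", cause));
    }
}
```

The full stack trace can still follow at a lower level; the point is that the first line alone says what failed, for which value, and why.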

Some real world examples

I regularly check our production log files for a number of reasons and find myself facing such doozies as:

30/04/2013 08:45:41 [ERROR] [[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)']: Error

The kicker here is that this could be one of a number of errors. If I see this twice in a log file I have no way of knowing if it’s the same error twice, or two different errors. The error message is badly formatted with the information [hundreds of lines of it] on the next line. Sadly I see this more than a couple of times a day and, as it’s a third party bit of code that’s responsible, there’s not much I can do about it.

30/04/2013 15:29:28 [INFO ] [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)']: Email transport connection established
30/04/2013 15:29:29 [INFO ] [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)']: Email transport connection closed

126 occurrences in today’s log files. This needs to be a debug message, with an error output if it fails to establish or close the email transport connection. Ironically enough, as I was digging into the code to fix this I discovered that when it does go wrong it reports the error 3 times in 3 different places, resulting in 3 error lines in the logs. Worse still, 2 of these lines only state “Error sending email” with no other information other than a stack trace from the exception. That’s three slightly different stack traces, two useless error lines and one useful error line for 1 error, which could easily add 15 minutes to the diagnosis time while the developer tries to work out what’s going on.

01:17:45 [ERROR] [[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)']: Failed to create URI Object

Well over 1000 occurrences today alone! My grep-fu is reasonable so I altered it to show the line after that. Turns out that someone is trying something nasty with some of our search URLs, but that wasn’t immediately obvious from the error; instead I was resorting to the stack trace. Not only could the error be improved, but we can also downgrade this to a warning. Looking at the code this is only going to happen in two cases: either someone has put a bug in the code, in which case we’ll see thousands of identical warnings and do something about it; or someone is trying something nasty, in which case we’d see hundreds of variations on the warning which we can investigate and then improve or ignore depending on the outcome of the investigation. Bug raised to fix both the handling of dodgy URLs and the logging level.
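
For illustration, this is roughly the shape of fix I have in mind. It is not the code from the system in question: SearchUriParser and parseOrNull are invented names, and it uses java.util.logging purely to stay self-contained. The idea is simply that the offending input and the parser's reason appear on the warning line itself.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.logging.Logger;

// Hypothetical sketch: log a bad search URL as a warning, with the input on the same line.
final class SearchUriParser {
    private static final Logger LOG = Logger.getLogger(SearchUriParser.class.getName());

    // Returns the parsed URI, or null when the input is not a valid URI.
    static URI parseOrNull(String raw) {
        try {
            return new URI(raw);
        } catch (URISyntaxException e) {
            // A warning rather than an error: the request is rejected and we carry on.
            LOG.warning("Failed to create URI object: input=[" + raw
                    + "] reason=" + e.getReason());
            return null;
        }
    }

    public static void main(String[] args) {
        parseOrNull("http://example.com/search?q=java");      // valid, nothing logged
        parseOrNull("http://example.com/search?q=<script>");  // invalid, one warning line
    }
}
```

With the input on the line, thousands of identical warnings (a bug) and hundreds of varied ones (someone probing) look obviously different in a plain grep.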

I’m sure I could dig out loads more and that’s just with a cursory glance at the files. Logging is an important diagnostic tool, but it’s only as good as you make it.

Overlooking Social Channels

We recently suffered an 8-hour outage from our payment provider. The most frustrating thing about this outage was the complete lack of information from the payment provider about the problem, or indeed the lack of any communication whatsoever. Yesterday we got reports from our front office staff that they were having problems with payments again. A quick check of the logs confirmed that, yes, there was a problem somewhere. Given the nature of the issue it was likely to be a problem with our payment provider but we needed to be sure. We approached getting this information in two ways.

My boss took the traditional approach, contacting the account manager to see what light they could shed on the problem. Net result: there may be a problem, further information would be forthcoming in 30 minutes after a meeting their side.

Given the informational black hole from the last outage I took a slightly tangential approach: Twitter. In seconds I was able to confirm that others were seeing the same problem and that it had started at least 3 minutes earlier. Two minutes after that, and only 5 minutes after the outage started, I had the entire company either in, or preparing to enter, a BCP stance. Part of this involved speaking to our social media team because they needed to be poised to inform customers and handle customer queries and complaints.

25 minutes later the outage ended. Again I was able to confirm that there were no intermittent problems through a combination of our logs, talking to our staff and responses from people on Twitter. We still hadn’t been called back by the account manager, there was still no official communication about the outage. As far as I’m aware, some 24 hours on, there is still no official acknowledgment.

These days companies, especially large ones, need to understand that they have a social media presence, even if it’s not official. Search for our payment provider during an outage and you find a torrent of negative opinion and pleas for information. In this case the presence of, and silence from, the official Twitter account only fuelled this frustration. People expect frequent and honest updates, especially when it’s something as important as a payment provider. BCP should include informing customers of the outage, the extent, estimated duration and any other pertinent information. Even if it is “We are aware of issues with the payment gateway. Engineers are looking into it, update to follow in 10 minutes”. Not wanting to say anything for fear of negative reaction is pointless. The negative reaction is already out there. How you present the information is also critical. Use of the word “intermittent” for a problem that is affecting 99 out of 100 transactions, while technically accurate, is clouding the situation. “We are suffering from intermittent problems” in this case sounds like spin which sticks out like a sore thumb in a sea of negative statements.

Effective management of the various social media channels is something that is overlooked by ‘traditional’ companies far too often.

Tumbler and low level stories

I’ve run into a bit of a brick wall with Tumbler in so far as I think I’m using it at far too low a level. I’ve got a fairly simple object at the moment, little more than a bean, which I’m using as a working example to try these new techniques out. While my first story and group of scenarios were easy to write I started running into issues with the second group. The issues are twofold. Firstly I’m having to learn to rethink how I group my tests to fit into stories and scenarios. While working this through I started butting into problems with long class and method names which I can’t really shorten as there will, eventually, be literally hundreds of tests and I need to be able to distinguish between them.

After fiddling about with different ways of framing the stories and scenarios I discovered that its annotations aren’t picked up by the JUnit plugin for Eclipse so I can’t rely on them to make readable tests, I have to use readable class and method names.

Then, I discovered that Tumbler isn’t hierarchical. Stories are listed on the index page, then you can drill down into a story and see its scenarios. That’s it. If I had 100 stories I’d have to wade through all of them on the front page. What I need is epics.

This all rather makes sense for a tool that’s going to be used at a higher level, detailing 20 or 30 user stories that constitute an application, but I want the ability to test at different levels. After all, as I understand it, BDD can be considered fractal in its nature and is as easily applied to a user’s interaction with a save dialog box as it is to the save method call on some object somewhere. Yes, the players change, and yes the granularity and precision of the inputs and outputs change, but it’s the same fundamental thought process when developing the tests.

In order to shed some light on the issue I took a look at the Tumbler source, specifically the test cases, but they were all at the user level. Tumbler itself isn’t that complicated a program so it may be that these user stories suffice for testing the majority of the code, but I want to know at an object level that they do what they say on the tin.

Sadly, the majority of this discovery has been performed on the train with its connectivity issues so performing research into alternative tools is proving hard. That said, given Tumbler isn’t massively complex I may just put my current project to one side, fork that and get it to work at both the low and high levels. In the meantime it seems like I need to do more research.

Maven and Eclipse

Yesterday’s issue with Maven turned out to be a little more severe than just not having an Internet connection. Not only could Eclipse not create my new project, it couldn’t build an existing one. This problem persisted even with an Internet connection. From the command line everything worked though.

After fiddling about with a few things I tried a software update for Eclipse. It failed updating GWT. The last time I used Eclipse on my laptop I was buggering about with Google Web Toolkit and Maven. Given the failure to update maybe I’d broken something. I uninstalled GWT. No joy.

Finally I stumbled across something on Google that suggested I blow away a large chunk of my local Maven repository, rebuild clean from the command line, refresh the Eclipse project and run Maven -> Update from within the project. That worked.

As far as I can work out I had newer versions of the jars in my repository than Eclipse wanted and, for whatever reason, it wasn’t downloading the versions I needed. The command line was happy with the versions I had. Deleting and updating obviously downloaded versions that everyone was happy with. And people wonder why I like Ant.

The Beginning Is A Very Difficult Time

This morning was meant to see me start development of one of my project ideas. Maven had other ideas. I’ve been [ab]using Ant for longer than I can remember and would consider myself to be an advanced hacker. The build system at work will automatically switch between building JARs, WARs and EARs based on project structure; a project structure which is based on Maven. It’s not exactly an elegant build system, woe betide the poor bastard that inherits it from me, but it works and adding new projects is very simple.

The use of the Maven project structure stems from a number of years ago when the team I was working with decided that Maven was far too hideously complex to consider using, but had some sensible ideas of project layout. I’ve maintained this view until recently when I decided that perhaps there is a better option than thousands of lines of hand crafted XML to handle our builds. This coincided with me using more and more tools that are built using Maven, prompting a decision to start learning how to use it. Since you learn best by doing I began to use Maven for any new projects.

This is the fourth time I’ve used Maven in a project from scratch, or at least it would be had I not been tripped up by something that keeps tripping me up: Maven often requires Internet access; something which, at this moment in time, I don’t really have… and certainly not to the level that I suspect Maven is going to need as it runs off and downloads all the dependencies I’m going to want.

Still, all is not lost. I have a name for the project (something that can often take me days to think up), I know what archetype I want to use and I’ve done a little research into a couple of things I’m going to include in the project just so I can have a go at using them. I’ll create the project over lunch and begin coding this afternoon.