Sunday, March 22, 2009

Deployment: March 21, 2009

This should have been a fairly typical deployment. We were putting in a new J2EE application, with a database change. The minor difference between this one and my normal deployments is that this wasn’t an update; it was a brand-new J2EE application, so if we screwed up and it wasn’t properly deployed on time, there would be no previous version to roll back to.

22:00–22:01 (1 minute): We all logged onto the conference bridge, and confirmed that we were ready to go.

22:01–23:33 (1 hour and 32 minutes): The DBA ran the database scripts. There were some errors, so this took a lot longer than expected, but she managed to work through the problems and get everything done.

22:01–22:34 (33 minutes): We configured the J2EE cluster (connection pools, security, etc.). We then had to wait for the database scripts to finish before we could move on to the next part of the deployment.

23:33–02:30 (2 hours and 57 minutes): We deployed the J2EE application. Again there were difficulties, and they were hard to troubleshoot because they involved the internal workings of our vendor’s software rather than our own code. We were getting errors between the vendor’s software and their database objects. After troubleshooting as much as we could, we called it a night.

Overall deployment: 22:00–02:30 (4 hours and 30 minutes). We left the bridge knowing that there was a good chance we’d need to rebuild the environment from scratch; it was quite possibly a configuration issue, but the likelihood of being able to track it down was slim. We tentatively planned to retry the next night, schedules (and permissions) pending.
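The durations in these timelines are simple clock arithmetic, with one wrinkle: a window like 23:33–02:30 crosses midnight, so the end time has to be treated as falling on the next day. Here is a minimal Python sketch of that calculation. It assumes each entry is a pair of 24-hour “HH:MM” times less than a day apart, and the function names are hypothetical, written only for illustration.

```python
from datetime import datetime, timedelta


def span_minutes(start: str, end: str) -> int:
    """Minutes from start to end ("HH:MM", 24-hour clock).

    Assumes the window is shorter than a day, so an end time earlier
    than the start time means the window crossed midnight.
    """
    fmt = "%H:%M"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 < t0:
        # The end time falls on the next day (e.g. 23:33 to 02:30).
        t1 += timedelta(days=1)
    return int((t1 - t0).total_seconds() // 60)


def describe(minutes: int) -> str:
    """Render a duration the way the timelines above do, e.g. '1 hour and 32 minutes'."""
    hours, mins = divmod(minutes, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hour{'s' if hours != 1 else ''}")
    if mins or not hours:
        parts.append(f"{mins} minute{'s' if mins != 1 else ''}")
    return " and ".join(parts)


if __name__ == "__main__":
    for start, end in [("22:01", "23:33"), ("23:33", "02:30"), ("22:00", "02:30")]:
        print(f"{start}–{end} ({describe(span_minutes(start, end))})")
```

Doing the midnight check explicitly, rather than subtracting raw minutes, is what keeps an entry like 23:33–02:30 from coming out negative.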

Sanity/Landing Tests Defined/Explained

I use the terms Sanity Test and Landing Test a lot, so I should define what I mean by them. Others might use these terms in different ways, but this is how my team and I use them.

When we’re doing a deployment, two groups are involved: the technical team, which does the actual work of the deployment and troubleshoots any technical issues, and the business users, who want to make sure that the application meets their standards before they hand it over to the end users.

The Sanity Tests are done by the technical team. Once we’ve deployed the application and everything is up and running, we quickly run through it to make sure that it really is up and running. As the name implies, this is just a quick sanity check to confirm that everything is up; it’s not meant to be exhaustive.

The Landing Tests are done by the business users. Once the technical team has given the go-ahead that the application is up and running (as verified by sanity testing), the business users verify that the application is working according to their specifications. As with the sanity tests, landing tests are not meant to be exhaustive; the exhaustive testing should have been done long before the deployment! (We have a phase of testing called User Acceptance Testing, or UAT, which is where the exhaustive testing should have been done.) The purpose of the landing tests is simply for the business to confirm that everything is running. If a bug is discovered at this stage, it must be one of two things:

  1. An environmental issue, where the production environment differs from the other environments in a way that causes a problem, or
  2. A gap in the UAT, where something wasn’t tested properly.

We do our best to plan for the former. We get very annoyed by the latter, but to err is human.

If either the sanity tests or the landing tests fail, we do our best to fix the problems right then and there, but that’s not always possible, especially at 2:00 in the morning. So when there are issues, it’s up to the business to decide whether they are “show stoppers,” meaning we have to abort the deployment and roll back to the previous version of the application, or whether they are minor enough that the business can live with them.

Thursday, March 19, 2009

Deployment: March 19, 2009

Not a complex deployment this time: just a simple J2EE EAR file to be updated, with no database scripts to worry about and no configuration changes to make.

01:30–01:43 (13 minutes): We all logged onto the conference bridge, and confirmed that we were ready to go. It took longer than it was supposed to; some people joined late.

01:43–02:02 (19 minutes): We un-deployed the old version of the EAR file, and deployed the new version.

02:02–02:08 (6 minutes): We did our Sanity Testing, and ensured that the app came back online successfully.

02:08–02:16 (8 minutes): We waited for the business to join the conference call, to do their Landing Tests; we were a bit ahead of schedule, so they hadn’t been expecting to join the call this early.

02:16–02:25 (9 minutes): The client did their Landing Testing, and signed off that everything was working as it was supposed to.

Overall deployment: 01:30–02:25 (55 minutes).