Sunday, September 7, 2008

Deployment: September 7, 2008

This was a slightly more intricate deployment than usual, because there were a few moving parts. We had not one but two sets of database changes to go through, plus three separate applications to deploy and/or re-deploy on the application servers. Plus, because of our new “global delivery model,” we had a DBA in India who would be making our database changes, which was new. So I was hoping it wouldn’t be a bad deployment, but I was a bit worried.

00:00: As I was getting ready for the deployment, I logged into my computer, and noticed an email from the DBA in India, dated twenty-four hours prior to the deployment, saying that he wasn’t able to join the bridge, but would I please call his cell phone when we were ready for him to start. Not a good sign. I looked in the database, and found that the changes had already been executed (presumably twenty-four hours earlier than they were supposed to be done). This could have been very bad, but we got lucky, and it turns out that the database changes didn’t break the existing application; the old version of the app was able to run for the last twenty-four hours with the database changes.

01:30–01:50 (20 minutes): We all logged onto the conference bridge, and confirmed that we were ready to go. This took much longer than usual, because we had to confirm with the DBA what had actually been done, and what he hadn’t. It turned out to be even more confusing than I’d thought: out of the two DB changes we needed, one was partially done, and one was not done at all.

01:50–01:52 (2 minutes): We shut down the application servers.

01:52–02:21 (29 minutes): We had the DBA finish the database changes.

02:21–02:24 (3 minutes): We brought back up the application servers

02:24–02:42 (18 minutes): We deployed the updated and new applications.

02:42–05:00 (2 hours 18 minutes): We did our Sanity Testing. We found an issue with one piece of functionality, and determined that it might have been due to a back-end system, not ours. We spent some time troubleshooting it, and ended up not being sure which system was the culprit.

03:40–05:00 (1 hour 20 minutes): The client did their Landing Tests. This was overlapped with the troubleshooting we did, for the functionality that wasn’t working.

05:00–05:30 (30 minutes): One of the back-end systems that we depend on went down for scheduled maintenance, and we had to wait for it to come back up before we could go back to our testing.

05:30–06:20 (50 minutes): We finished testing, and then called it a wrap. There was still an oustanding issue—the one we weren’t able to troubleshoot—but it wasn’t a show-stopper.

Overall deployment: 01:30–06:20 (4 hours and 50 minutes).

No comments: