Saturday, September 20, 2008

[Mini] Deployment: September 20, 2008

This was actually just a simple database script to update one of our data tables: no structural changes, just data. But someone decided to treat it like a full deployment anyway, complete with the conference call and everything. So…

01:30–01:37 (7 minutes): We all logged onto the conference bridge, and confirmed that we were ready to go. It took a bit longer than we’d expected, because we weren’t sure if one particular person was going to join or not.

01:37–01:37 (about 30 seconds): We backed up the database.

01:37–01:41 (4 minutes): We executed the DB scripts.

01:41–01:48 (7 minutes): We did our Sanity Test—mostly just checking the logs that the scripts worked fine, as well as a quick glance at the application, to verify that the data was showing up correctly.

01:43–01:53 (10 minutes): The client performed their Landing Test. (Further tests that the data was showing up correctly in the application…) In this case we let them start before Sanity was done, since all indications were that things had worked fine, and we were just waiting for an email to show up in my Inbox confirming the scripts.

Overall, we were done ahead of schedule, and everything worked well. Total deployment: 01:30–01:53 (23 minutes).

Thursday, September 11, 2008

Deployment: September 11, 2008

This was a bug fix deployment, to fix the issue discovered with the previous deployment.

01:30–01:34 (4 minutes): We all logged onto the conference bridge, and confirmed that we were ready to go.

01:34–02:01 (27 minutes): We backed up the database and executed the database scripts. This took longer than usual, because the first time around the DBA accidentally ran the scripts on the wrong database.

01:34–01:56 (22 minutes): We un-deployed the old version of the application, and shut down the application servers.

02:06–02:19 (13 minutes): We deployed the new version(s) of the application(s). (Some were actually still deployed from the last time, and just had to be re-activated, while others—the ones with bug fixes—actually had to be redeployed.)

02:19–03:04 (45 minutes): We conducted the Sanity Test. During Sanity, we discovered an issue unrelated to our deployment—one that had been part of the application for the previous four months.

03:00–05:23 (2 hours and 23 minutes): The client did their Landing Tests. (This overlapped with the Sanity Test a bit.) Some additional issues were discovered, but they were all deemed minor enough that we didn’t need to back out. (The most serious one turned out not to be a bug in our application, but in a back-end system that we depend on. So the bug fix will be for them, not for us.)

Overall deployment: 01:30–05:23 (3 hours and 53 minutes).

Deployment: September 8, 2008

No timeline with this one.

There had been an issue with the previous deployment. Every once in a while, the CPU on the application servers would climb close to 100%, and the Support Team had to reboot them to keep the application alive.

Unfortunately, rolling back wasn’t working; we tried un-deploying the application and re-deploying the old version, but the new version seemed to be cached on the application servers. So we had to call the application server vendor and get help. (It turned out we had to un-deploy, remove the temp directory, and then re-deploy the old version.)

Sunday, September 7, 2008

Deployment: September 7, 2008

This was a slightly more intricate deployment than usual, because there were a few moving parts. We had not one but two sets of database changes to go through, plus three separate applications to deploy and/or re-deploy on the application servers. Plus, because of our new “global delivery model,” we had a DBA in India who would be making our database changes, which was new. So I was hoping it wouldn’t be a bad deployment, but I was a bit worried.

00:00: As I was getting ready for the deployment, I logged into my computer and noticed an email from the DBA in India, dated twenty-four hours prior to the deployment, saying that he wasn’t able to join the bridge, but would I please call his cell phone when we were ready for him to start. Not a good sign. I looked in the database and found that the changes had already been executed (presumably twenty-four hours earlier than they were supposed to be done). This could have been very bad, but we got lucky: it turned out that the database changes didn’t break the existing application, and the old version of the app was able to run for the last twenty-four hours with the database changes in place.

01:30–01:50 (20 minutes): We all logged onto the conference bridge, and confirmed that we were ready to go. This took much longer than usual, because we had to confirm with the DBA what had actually been done, and what he hadn’t. It turned out to be even more confusing than I’d thought: out of the two DB changes we needed, one was partially done, and one was not done at all.

01:50–01:52 (2 minutes): We shut down the application servers.

01:52–02:21 (29 minutes): We had the DBA finish the database changes.

02:21–02:24 (3 minutes): We brought the application servers back up.

02:24–02:42 (18 minutes): We deployed the updated and new applications.

02:42–05:00 (2 hours 18 minutes): We did our Sanity Testing. We found an issue with one piece of functionality, and determined that it might have been due to a back-end system, not ours. We spent some time troubleshooting it, and ended up not being sure which system was the culprit.

03:40–05:00 (1 hour 20 minutes): The client did their Landing Tests. This was overlapped with the troubleshooting we did, for the functionality that wasn’t working.

05:00–05:30 (30 minutes): One of the back-end systems that we depend on went down for scheduled maintenance, and we had to wait for it to come back up before we could go back to our testing.

05:30–06:20 (50 minutes): We finished testing, and then called it a wrap. There was still an outstanding issue (the one we weren’t able to troubleshoot), but it wasn’t a show-stopper.

Overall deployment: 01:30–06:20 (4 hours and 50 minutes).