Monday, September 24, 2007

Deployment: September 2007 (Take Two)

This was the second attempt at the deployment I wrote about earlier. Since we were convinced we'd fixed the problems discovered during the first deployment, we decided to tackle this attempt exactly the same way.

Of course, there was one major difference this time: I decided to handle this one from home. On the previous attempt, they'd shut down the power in the building, forcing me to go home partway through; this time, they were going to shut down the phones and the network. Luckily, I knew about it ahead of time, so I was able to plan to work from home instead of being forced to leave halfway through.

10:00PM: We're scheduled to start at 10:00PM, but an accident on the 401 keeps me from getting home in time for the start of the deployment. Luckily, I only need to log on for a couple of minutes, verify that everything is going smoothly, and then log back off until midnight. So I dial into the conference bridge from my car, on my cell. The “database split team” begins their work (at this point, they're backing up the database in case we need to roll back), and I log back off. By now, I've just walked into my house.

10:00–12:00: I start watching another movie to kill the time: The Spy Who Loved Me, this time. (Still on my 007 kick.) Since I'm home, I also make some tapioca pudding and spend some time preparing my “work environment”: setting up a phone with a long cord (since I don't know how long my cordless phone batteries will last), keeping a couple of cordless phones handy (since I don't like my “corded” phone), and getting some drinks ready.

12:00AM: I log back on. The backup has completed successfully. We shut down the Application Servers for “Application 1” so that the team can proceed with the database split. Then I log back off the bridge, since there won't be any further activity until 12:45. (For a reminder of what “Application 1” and “Application 2” are, see the post from the first attempt.)

12:05–12:45: More movie.

12:45: I log back onto the bridge.

12:45–12:55: We sit around on the bridge, wondering where in the world everyone is. We then decide to proceed without them for the time being; the clock is ticking, after all.

12:55: We shut down the App Servers for “Application 2”.

1:00–1:01: We back up the database for “Application 2”.

1:01: After the quickest backup in history, we begin the database changes for “Application 2”. (We did double-check, of course, to make sure the backup was successful; when a backup is that quick, you have to wonder whether it really backed anything up at all…)

1:05: We’re informed that half of the members of the client team won’t be showing up. Yes, you read that right: They’re just not coming. (Since we’d already done it once, I guess they got bored with the whole thing…) We’re told, of course, that we can call them, if anything goes wrong. (How magnanimous.)

1:10: The database changes are finished. We bring the App Servers for “Application 2” back online, so that the deployment of that application can begin.

1:10–1:35: We deploy “Application 2”.

1:35–1:40: We conduct our Sanity Test. Our testing is positive, which means we've fixed the first issue we ran into on the last deployment. (Phew!) The client also jumps on the application to start testing, but I give him a verbal slap on the wrist for it; I'd prefer that we finish our own testing before handing it over to the client. Once my testing is done (which only takes a couple of minutes anyway), I give the client the go-ahead to do his testing.

1:40: At this point, we're ready to begin the deployment for “Application 1”, but, fortunately and unfortunately, we're about an hour ahead of schedule. It's fortunate because it means there's a chance of getting to bed early; it's unfortunate because the people we need for the next phase of the deployment aren't on the bridge. In fact, they're probably sleeping, since they don't expect to be needed yet.

1:40–1:45: We call them on their home numbers, but they sleep through the calls. This is fairly normal for late-night deployments; the human body is used to being asleep at this hour. So we take it in stride, keep trying, and maintain our good humour. We finally get hold of them, and they join the bridge.

1:45–2:05: We deploy “Application 1”.

2:05–2:25: We do our Sanity Testing. This takes a bit longer than usual for this particular application (we can normally run through it in 10–15 minutes), but I think some of us are being extra careful. The good news is that everything is working fine, which means the second problem we'd had last week is also fixed.

2:25–3:40: The client does their Landing Testing, and everything goes well. This is always the nerve-wracking part for me; there is so much chatter on the bridge, and every time someone asks a question (“Is it supposed to work like this?” or “How come this page is taking so long to load?”), I get nervous that we'll have a “show stopper”: a bug serious enough to force us to back out the application. I guess this is where I earn part of my salary, though: thinking on my feet in the wee hours of the morning, answering questions, and deciding when a problem is serious and when it's just a transient blip. (For example, a link in the application that appears not to work, only because it points to another application that is also being deployed right now.) You don't want to be wrong and assume something is transient when it isn't, or you'll spend the next week fielding calls from the Help Desk with your cell phone glued to your ear. On this morning, though, we don't have any bugs, serious or otherwise.

And, at 3:45AM, we have another successful deployment under our belts.
