Treat Every System as a Production System
Those of you familiar with my background know that I spent a year and a half of my Microsoft tenure with the team that built out the enterprise hosted SharePoint platform (we were called MMS), which is now the dedicated side of Office 365 (formerly known as BPOS-D). One of the reasons I was interested in joining the team was that I could leverage my prior experience helping build a hosted collaboration platform from scratch (while at E2open). Much of what the team went through in 2006 and 2007 I had already lived through from 2001 to 2003, and unfortunately, the org was slow to learn from the experiences of its team members.
I don’t want to rehash complaints about a team that has been reorganized a dozen times since my time there, but I do want to point to one particular episode that illustrates the problem with treating production systems like any old internal system. Having helped build a real-time hosted software platform with customers around the world, I am, you could say, sensitive to any changes to (or heavy breathing around) production systems. I played a minor role in building a network operations center (NOC), but it was enough to let me experience first-hand the creation of processes and procedures for managing customer expectations for a real-time, customer-facing platform.
While I was in MMS, we had paying customers on our platform, financial implications tied to our SLAs, and a support team with little or no experience running customer-facing environments.
The dialog (mostly in email) went something like this:
Systems: We need to make some changes to PROD001 through PROD048, with an estimated downtime of 4 to 6 hours.
MMS (my team): These are live servers, with customers actively using them. You can’t take them down now.
Systems: We’ll try to improve on that downtime.
MMS: You don’t understand – these are LIVE SYSTEMS. We have negotiated service windows, and financial impacts for any downtime outside of those windows.
Systems: We’ve already taken them down for the updates. We’ll let you know if there are any changes to our 4-6 hour estimate.
While my version lacks detail, the circumstances were very much real. In the middle of a weekday, our systems team took down a few dozen production systems without regard for customer impact, service level agreements, or communication protocols. We had to scramble to repair the damage, and it was a black mark on our team’s record.
Why am I sharing this story? To make the point that every customer-facing system and every customer-facing activity should be treated as a production system. I no longer manage back-end systems, and I am no longer directly involved in building or supporting a real-time hosted platform, but every week I participate in one or more customer-facing events where "uptime" and managing customer expectations are just as real-time and just as important.
Presentation is important. Customer expectations are important. Handling every customer or partner interaction in a defined, measured, repeatable way is important.
In short, if a customer can see it, it’s a production system.