Black Boxes

October 17, 2013

If you’ve been responsible for deploying and maintaining software your colleagues develop, you’ll understand phrases like “throwing it over the wall” or “they don’t get it.”  I used to utter those phrases in frustration all the time.  I don’t anymore, because I know whose fault it was.  It was mine.

I took a position as an application operations admin over 7 years ago.  The company was still in the start-up phase — and that meant areas like automation and operability were suffocating.  There was no way to build a fresh QA or Dev instance of the product from scratch.  There was no way for QA to push a button and get the latest code compiled and deployed.  Managing production was rocket science and we were always paging the developers whenever the slightest thing went wrong.

I changed that — which sounds great, but was actually a Trojan horse of sorts.  I was thanked profusely for taking the bull by the horns and automating things and diving into the code to understand what the logs were saying so I didn’t have to page developers.  The architect’s wife thanked me for how much sleep he was now getting (true story).  Great, right?  Wrong.

Our development team quickly lost touch with production and what it took to run it.  I did a poor job of communicating the top pain points of operating the platform.  We never got back-end improvements prioritized on the product backlog (a typical problem).  Developers would start writing code and 3-6 months later just assume it would all work in production; it was my job to figure it out.  And I always did because we were always strapped for resources.  Well, who isn’t?  But I didn’t understand that.  Next thing you know, I had animosity towards the development team for throwing things over the wall and not listening to me screaming “please fix these problems!”

Fast forward a few years: we’d continued to ignore the back end and no one knew how to install the platform anywhere, even on their developer workstations.  I wrote the documentation for how to build and deploy the software.  I ended up writing the automated installer for it.  I was the release engineer, the build master, the operations guy, the only guy who knew the product end-to-end.  We suffered a massive outage (why is this the only way we learn?).  I really wanted to scream “I told you so!” … but I didn’t.  I couldn’t.  I realized I’d failed at my job.  I’d failed to get the business to understand the risks and poor decisions that were being made.

It was at this point that I started to realize that what I’d done was turn Operations into a giant black box.  I would consume the source code, perform some black magic, and get it out to production.  Me, the hero for figuring it all out and letting the developers just code.  Turns out, I was the villain.  We’ve started to undo some of the damage I’d done, but it’s a long uphill battle.

About 2 years ago, we’d come to believe it was finally time to have someone responsible for release engineering, build automation, deployment automation, etc.  I was contemplating my next career move at the time, and this looked like a good opportunity to manage a brand new team.  So that’s what I did.  When forming this team, there was a lot of buzz around “DevOps” so I did a lot of research.  Sadly, it looks like a lot of people are making the same Black Box mistake with respect to DevOps.

One way to get the code from source control to production is to introduce a team that’s responsible for bridging the gap between Ops and Dev.  And this is incorrectly labeled as a DevOps team.  Check out the Wikipedia entry for DevOps if you’re scratching your head.  DevOps is not a team, it’s a mentality — a way of working together.  If you create a DevOps team to bridge the gap, all you have done is move the black box.  You’ve increased the size of your organization and added another hand-off point.  You’ve decreased efficiency and increased the chance for error.

I was getting a lot of pressure to name this new team of ours DevOps, but I refused based on this realization.  Instead we named it the PD Tools team (I now wish we’d named it the Engineering Services team).  Our mission is to enable our engineers to focus on delighting our customers because we provide them with best-in-class tools, systems, and processes.  It’s always in the back of my head that we’re here to enable our developers to create a buildable, testable, deployable, and operable product for Operations.  It is not to do that job for them.  If we did that job for them, we’d be that villainous Black Box.

The PD Tools team is a little over a year old now and recent events have me thinking about its charter.  I’d like our responsibilities to shrink even more by pushing more of them to the developers.  We shouldn’t be in charge of the tools they use; that’s creating another black box!  So if we shed that responsibility, what does that leave us?  We’d supply the systems or services needed to run those tools, and we’d act as subject-matter experts to guide and educate.  We’d keep track of what the large central organization offers (we’re a 150-person business unit inside an 8,000-person company) and make sure our local team was aware of it.

Ultimately, the end-to-end processes to define, build, test, deploy, and operate a product are owned by the entire organization.  The lines of responsibility should blur and merge.  This requires a different mindset and process (which Agile, Scrum, and DevOps methodologies solve for).  It’s making sure everyone understands each other’s challenges and works together as a team to solve for the end-to-end process.  Drawing lines and passing the buck around is just connecting Black Boxes together and hoping it’ll work out.