Look at your data: DevOps is not just measurement but knowing what to measure

Looking back at DevOps, to move forward

Nov 8, 2017

I wrote an internal post a few years back, and some recent online articles, including some stellar, on-point pieces by Cindy Sridharan, prompted me to revisit it with fresh eyes and re-post it here on Medium.

Back in 2011

While reading an article on Ars Technica, I came across the following:

“With TFS 11, Microsoft is building in greater built-in support for agile methodologies (in particular scrum iterative development), integrated support for code reviews, and more.”

It only really dawned on me at that point just how mainstream Agile had become since its inception more than ten years earlier, when its manifesto was announced.

Development teams have a wide range of Agile software development methodologies to choose from, including Scrum. I attended a Scrum Master course with agile practitioner and Scrum author Geoff Watts, who delved into Scrum’s core values and how it attempts to introduce a software development framework that will, at the end of each sprint, produce a potentially “deploy-able” package.

Scrum endeavours to address any bottlenecks that were encountered in the traditional waterfall delivery. But during those two days on that Scrum course, I just kept thinking that despite potential advantages, Scrum still stops just short and doesn’t involve operations folks enough.

I often looked with envy on the Agile movement but wondered why it couldn’t include a role for operations. I mean, you had a situation where development teams were talking to the business more frequently, while the support team was more or less out of the loop. Alexander Grosse, who was Engineering VP at SoundCloud at the time, alluded to this in his 2011 post:

There has to be something like a “Scrum 2.0”, which should address the current shortcomings, especially that Operations people are not part of the Scrum teams and that Scrum is introduced without software engineering best practices.

Scrum - Quo Vadis?

Ten years after the Agile Manifesto, Scrum is a mainstream software development approach. However, where has Scrum…

klangberater.tumblr.com

In the operations world, there is the ITIL framework, which doesn’t really include development teams outside of the release management function. So what if there were a bridge between these two worlds, where items such as monitoring, logging, metrics, plumbing, communication, documentation and automating tasks where possible could be regular topics of conversation?

It was in late 2011 that I noticed the hashtag #DevOps appearing more and more on Twitter. I had seen it used in the past but hadn’t really taken a great deal of notice. With my curiosity sufficiently aroused, I began my journey to discover just what DevOps is all about.

One of the first posts I read was on Patrick Debois’s blog: a guest post by Stephen Nelson-Smith, another of the founding fathers of the DevOps movement.

It turned out, DevOps was a cultural movement that had grown from the Cloud start-up Web companies, bringing the principles of Agile to the full life cycle, while attempting to break down barriers or that “wall of confusion” between development, operations (and the business).

After an invitation, Stephen came over and spoke onsite in a session called The What, Why, When, How, Where, Who of DevOps.

Devops applies agile principles and practices to development, project management and system administration to bridge the gap between projects and operations.
This improves collaboration and communication between departments, and encourages continuous deployment and delivery of business value. All this collaboration starts from the premise that the whole is greater than the sum of its parts.
Devops is the complete package of culture, automation, measurement and sharing required to deliver the expected services to the business and user community.

It was the “Silos are for farmers” phrase from another DevOps founder, Julian Simpson, in one of his presentations that really resonated with me.

It re-emphasised that this is not a technology problem but a cultural/organisational problem. I loved that phrase and began to immerse myself further in the culture, consuming slide decks, Velocity conference videos and any online articles I could find in this area. People like John Rauser, a quality speaker, continue to raise valid points about monitoring, and books such as “Web Operations: Keeping the Data on Time” by John Allspaw and Jesse Robbins helped me get a handle on DevOps.

“If you’re not looking at your data, then you don’t understand your business”.

When I started in IT back in ’99 as a support help desk rep, I was fortunate enough to be indoctrinated into a culture which already focused on getting the end user back working first and addressing any gaps in the support process afterwards, because we were in business technology. As a result, I felt familiar with many aspects of the DevOps culture, which aren’t new as such but just needed to be reiterated.

Reading more about the DevOps culture reminded me of a quote from the former F1 driver Niki Lauda.

“You have to co-ordinate the activities of a large number of good people. I like to work like that; I like to get together a group of people that are very good technically, and get them to work as a team and gel together”.

Present day

Currently I still work in an operational role in business technology, but just because operations is my area of focus does not mean that I am opposed to enhanced features or continuous release deployments. If there is an issue that needs to be addressed, I want to avail of the smoothest process to get a fix tested and implemented in the least amount of time. Similar to what a Scrum master should be aspiring to, an operations support engineer’s long-term focus should be to try to work herself/himself out of a job. I also believe that the application domain and business knowledge that system admins and operations staff pick up over time is vital to the success of the applications, and it can often help development folks understand just how the business is using their application in reality. That bridge is what I still believe DevOps is trying to promote as a way forward: a shared responsibility model between Operations and Dev teams, rather than a tug of war.

Operations support teams need improved application metrics. Every application needs a baseline and a simple method for perusing its logs. Dev teams can help by adding the right logging statements and exposing critical metrics, because you simply can’t improve something that you can’t measure. (This is something that was reiterated in a great talk by Yevgeniy Brikman.)

Talk by Yevgeniy Brikman
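
To make the point about baselines a little more concrete, here is a minimal sketch in Python. It is not taken from any real system: the response times, the function names and the three-standard-deviations threshold are all assumptions chosen purely for illustration. The idea is simply that once you have recorded what "normal" looks like, you can ask whether a new measurement is out of the ordinary.

```python
import statistics


def baseline(samples):
    """Return (mean, population standard deviation) of historical samples."""
    return statistics.mean(samples), statistics.pstdev(samples)


def is_anomalous(value, mean, stdev, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean.

    The default threshold of 3.0 is an illustrative assumption, not a rule.
    """
    if stdev == 0:
        # A perfectly flat history: anything different counts as unusual.
        return value != mean
    return abs(value - mean) / stdev > threshold


# Hypothetical response times in milliseconds, collected during a quiet week.
history = [120, 118, 125, 122, 119, 121, 124, 120]
mean, stdev = baseline(history)

print(is_anomalous(123, mean, stdev))  # close to the baseline
print(is_anomalous(400, mean, stdev))  # far outside the baseline
```

A real baseline would of course need to allow for time of day, seasonality and so on, but even something this simple only works if the application exposes the metric in the first place.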

But we still need more published case studies of DevOps working in larger organisations. We also need a coherent and consistent message promoting the idea that the four pillars of Culture, Sharing, Automation and Measurement are of equal importance.

DevOps is more than a combination of Agile and software engineering best practices. It is more encompassing, and now that the technology and tools are there to back it up, a “Culture of openness around engineering” is what is needed. Maybe then the idea will penetrate wider and become adopted more, with the adoption driven down from senior management in larger enterprises and up from the grass roots?

The tools are definitely there now, and they are more mature, including the likes of Ansible, Docker, Jenkins and GitHub. These tools allow for the quick spin-up and management of applications, such as a centralised log solution like the Elastic stack and monitoring tools such as Prometheus, in easy-to-manage containers. From an operations support perspective, being able to visualise metrics in aesthetically pleasing Grafana dashboards and search logged errors in Kibana is so much better than what it was once like. ** Note, there are some real pain points with the self-management of these tools, but they are both technically rewarding and challenging when it all comes together. **

For example, I work on an application which is serviced by a wide variety of distributed components. With all these moving parts, and allowing for the fact that different applications log in different ways and in different formats, we wanted a platform that could bring all of those logs together without requiring the user to be a system expert. Our goal was to standardise the logging infrastructure and provide an inclusive monitoring approach. After many iterations of experimentation, we came up with a Dockerised solution, where the Elastic stack (Elasticsearch, Logstash and Kibana + Filebeat) is fully deployed through automated Ansible/Jenkins jobs, including a Kafka buffer, automatic log shipping from all our components to Kafka and performant Logstash consumers working to ingest millions of log entries per hour.
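
To illustrate what "standardising" logs that arrive in different formats can mean in practice, here is a small Python sketch. To be clear, this is not our actual pipeline (there, that kind of parsing happens inside the Elastic stack); the two log formats, the field names and the `normalise` function are all hypothetical, made up for this example.

```python
import json
import re

# Hypothetical format A: "2017-11-08T10:15:00Z ERROR billing Timeout calling gateway"
PLAIN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<service>\S+) (?P<message>.*)$"
)

# Hypothetical format B: 'ts=2017-11-08T10:15:01Z level=warn svc=search msg="slow query"'
KEY_VALUE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def normalise(line):
    """Map a raw log line onto one common schema (an assumed, illustrative one)."""
    line = line.strip()
    match = PLAIN.match(line)
    if match:
        return match.groupdict()

    pairs = {key: value.strip('"') for key, value in KEY_VALUE.findall(line)}
    if {"ts", "level", "svc", "msg"} <= pairs.keys():
        return {
            "timestamp": pairs["ts"],
            "level": pairs["level"].upper(),
            "service": pairs["svc"],
            "message": pairs["msg"],
        }

    # Unknown format: keep the raw line rather than dropping it.
    return {"timestamp": None, "level": "UNKNOWN", "service": None, "message": line}


for raw in (
    "2017-11-08T10:15:00Z ERROR billing Timeout calling gateway",
    'ts=2017-11-08T10:15:01Z level=warn svc=search msg="slow query"',
):
    print(json.dumps(normalise(raw)))
```

Once every component's logs share the same fields, a single dashboard or query can cover all of them, which is what spares the user from having to be an expert in each individual system.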

So, can we continue to learn from DevOps?

Yes!

A cornerstone of DevOps is not just measurement but knowing what to measure. The benefits of investing in structured application logging are common to both Dev and Operations teams: it provides a basis for anomaly detection and deeper problem management. As the quote goes, “If you’re not looking at your data, then you don’t understand your business”.
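
For anyone wondering what structured application logging might look like, here is a minimal sketch using Python's standard `logging` module. The `JsonFormatter` class, the `orders` logger name and the `order_id` and `duration_ms` fields are assumptions invented for illustration, not something lifted from a real application. Each log entry is emitted as a single JSON object, so a tool like Logstash or Kibana can filter on fields instead of grepping free text.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "context" is a convention assumed for this sketch: callers pass
        # extra={"context": {...}} to attach searchable fields.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("orders")  # hypothetical service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Hypothetical fields: pick the ones that matter to your own business.
logger.info("order processed", extra={"context": {"order_id": 42, "duration_ms": 187}})
```

The interesting decision is not the formatter but the fields: choosing which ones to log is exactly the "knowing what to measure" part.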

I thoroughly enjoy working with dev teams and learning something with each interaction, but I especially like the iterative approach and incremental improvements, as we learn more together!

This blog post was drafted while listening to a post-rock playlist, which also happens to be the perfect musical accompaniment to searching through application log files.