QCon was one of those very productive conferences. I learnt a lot about microservices and Agile from other people in the industry who are either at the same stage of transformation as we are or further along, and who can talk about the pain they felt and what they learnt along the way.
I attended the third day of talks, and these are the ones I watched:
Applying Failure Testing Research @Netflix
“The whole is greater than the sum of its parts”: that is how they started the presentation. The point was that they got very good results by bringing academic and industry people together to think about software. “Freedom & Responsibility” is one of Netflix’s values, which means employees can do whatever they think is best and the company will support them, but they are responsible for the outcome.
They explained that in a system with 100 layers of services talking to each other, the number of possible combinations would be 2^100, which is a VERY big number, so testing every end-to-end scenario becomes impossible.
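To get a feel for how big 2^100 is, here is a quick back-of-the-envelope calculation. It assumes each of the 100 services is simply either healthy or failed, and the rate of one billion tests per second is a made-up figure used only for illustration.

```python
# Assumption: each of the 100 services is either healthy or failed,
# so there are 2**100 distinct combinations to consider.
SERVICES = 100
combinations = 2 ** SERVICES

# Hypothetical test rate, chosen only for illustration.
TESTS_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Whole years needed to try every combination exactly once.
years_needed = combinations // (TESTS_PER_SECOND * SECONDS_PER_YEAR)

print(f"combinations: {combinations:,}")
print(f"years at a billion tests per second: {years_needed:,}")
```

Even at that unrealistic speed, the answer comes out in the tens of trillions of years, which is why exhaustive end-to-end testing is off the table.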
They came up with the concept of a failure-driven architecture by combining a paper written by the academic, “Lineage-driven fault injection”, with the data and know-how of the industry engineer. The idea is that you test your software against various failure scenarios as you build it, so that it tolerates them. For example, you could deploy your packages to a server with a faulty hard disk or an unstable network connection.
That way you learn more about your software and find out where to direct your testing efforts.
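As a rough illustration of testing against an injected failure, here is a minimal sketch. It is not Netflix’s tooling and it is not lineage-driven fault injection itself; `FlakyDisk` and `save_with_retry` are hypothetical names invented for this example, standing in for the “faulty hard disk” scenario.

```python
class FlakyDisk:
    """Hypothetical fake disk that fails on chosen write attempts."""

    def __init__(self, fail_on_calls):
        self.fail_on_calls = set(fail_on_calls)  # 1-based call numbers that fail
        self.calls = 0
        self.data = {}

    def write(self, key, value):
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise IOError("injected disk failure")
        self.data[key] = value


def save_with_retry(disk, key, value, attempts=3):
    """Code under test: retry the write a few times before giving up."""
    for _ in range(attempts):
        try:
            disk.write(key, value)
            return True
        except IOError:
            continue
    return False


# Two injected failures: the third attempt succeeds.
recovering = FlakyDisk(fail_on_calls=[1, 2])
print(save_with_retry(recovering, "order-1", "paid"))

# Three injected failures: the retries are exhausted.
broken = FlakyDisk(fail_on_calls=[1, 2, 3])
print(save_with_retry(broken, "order-1", "paid"))
```

The point of a test like this is to find out how the code behaves when a dependency misbehaves, before production finds out for you.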
Industry and academia need each other. Far from the tire fires of production, university researchers have the time to ask big questions. Sometimes they get lucky and obtain answers that change how we think about large-scale systems! But detached from real world constraints, systems research in academia risks irrelevance: inventing and solving imaginary problems. Industry owns the data, the workloads and the know-how to realize large-scale infrastructures. They want answers to the big questions, but often fear the risks associated with research. Academics, for their part, seek real-world validation of their ideas, but are often unwilling to adapt their “beautiful” models to the gritty realities of production deployments. Collaborations between industry and academia — despite their deep interdependence — are rare.
In this talk, we present our experience: a fruitful industry/academic collaboration. We describe how a “big idea”, lineage-driven fault injection, evolved from a theoretical model into an automated failure testing system that leverages Netflix’s state-of-the-art fault injection and tracing infrastructures. This collaboration required us to take risks, to accept defeats, and to constantly evolve our approach to “make it work”. We sketch the architecture of the automated failure testing system we built and some of its discoveries, while providing intuition for why it works. Along the way, we will describe the challenges (expected as well as unexpected, technical as well as ideological) that arose, and how we overcame them.
Test-Driven Microservices: System Confidence
I am not sure whether it was intentional, but this talk felt very vague to me; more practical examples would have made the concepts much easier to understand. It started with Russ Miles playing “Highway to Hell” with custom lyrics about microservices: “Highway to Microservices Hell”.
He asked questions such as “Why Microservices?” and said things like:
- Testing individual pieces of software is better than testing the whole thing every time;
- Microservices are like jigsaw puzzles;
- Microservices are like theaters, where each actor has a responsibility.
The main idea of the talk was that you can learn about a system through stories, and tests can tell those stories.
Tests should speed you up not slow you down, and nowhere is this more important than when building microservice-based systems where speed of delivery and adaption is, often, everything.
In this talk Russ Miles will show how you can build production-level confidence in your polyglot microservices by applying the test-driven approach to synchronous (REST) and asynchronous (Messaging) services. In a massively distributed system such as microservices there are a lot of variables at play and this could be a real headache for testing. At the same time testing is critical to having the confidence to take advantage of the speed of adaption that the microservice-based approach promises.
With a selection of demonstrations and code snippets (along with a little guitar thrown in for good measure!), Russ will show that with the right approach to testing and deploying you can have confidence in your individual microservices AND in having the right impact on the surrounding system.
Taking a technical dive they’ll show how by applying specific constraints to the system, testing not only can be successfully applied to the microservices themselves but that this can be done simply, easily and can embrace speed of change rather than be an impediment.
Hacking bank mobile apps
This was one of the best talks. Stevie Graham has a company (http://teller.io/) that builds apps on top of banks’ mobile apps. He showed how he managed to connect to Barclays’ API (one of the most complex on the market), obtain a session and fetch a list of his own card transactions. He did that just by analyzing the messages exchanged between the app and the public API.
What do you do when you want a fully transactional banking API but PSD 2 is years away and you doubt that banks will ever make a good faith effort to ship a usable API anyway? You attack the bank’s own mobile app and reverse engineer its API, that’s what.
The Microservices and DevOps Journey
Wix, a free website builder, converted its monolithic system into a microservice architecture over 5 years, and Aviran Mordo, the Head of Engineering, walked through every step they took. It was a good talk to watch because they learnt a lot during the process, even at the very beginning when they had only split the system into two parts. He said that at the beginning you don’t need to worry about a lot of the things you see other people doing, such as service discovery or shared logging. Just keep it simple and improve the architecture over time, when you start to feel the pain. For example, they only implemented shared logging once they had 150 components.
- A microservice is not a library, it is a production system;
- The size of a microservice is the size of the team building it;
- Microservice or library? That is a discussion worth having;
- Limit the tech stack to known products;
- You can’t do microservices without a DevOps culture, because every microservice has a DevOps overhead.
Switching from a monolithic to a microservices architecture is no easy task. At Wix, it was a 5 year journey, but we now have over 200 microservices successfully running on a battle-tested production environment.
I will share what we learned as we worked toward this milestone—how microservices and DevOps go hand in hand, and what it takes to operate and build a successful microservices architecture from development to production.
Microservices Chaos Testing at Jet
The main idea was that functional programming languages are commonly used with microservices. Rachel Reese gave examples of how F# can be better than C# by being simpler and reducing the size of the architecture.
She showed how they implemented testing at Jet.com with something called chaos testing, where they try out random scenarios against the system’s normal behaviour, similar to Netflix’s Chaos Monkey. You can run these chaos tests even in production if you really want a stable, well-defined environment, because “if availability matters you should be testing for it”.
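To make the idea concrete, here is a toy sketch of a chaos experiment. It is not Jet.com’s code (theirs is in F#) and not Chaos Monkey; `Instance`, `Service`, `chaos_round` and `run_experiment` are hypothetical names made up for this example. Each round randomly takes down some instances and then checks that the service still answers.

```python
import random


class Instance:
    """Hypothetical service instance that can be switched off."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class Service:
    """Tries each instance in turn until one responds."""

    def __init__(self, instances):
        self.instances = instances

    def handle(self, request):
        for instance in self.instances:
            try:
                return instance.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy instance available")


def chaos_round(service, rng, max_kills):
    """Randomly take down between 0 and max_kills instances."""
    victims = rng.sample(service.instances, k=rng.randint(0, max_kills))
    for victim in victims:
        victim.alive = False
    return [victim.name for victim in victims]


def run_experiment(rounds=20, seed=42):
    """Run several chaos rounds and check availability after each one."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    for _ in range(rounds):
        service = Service([Instance(f"node-{i}") for i in range(3)])
        chaos_round(service, rng, max_kills=2)
        # With at most 2 of 3 instances down, a request must still succeed.
        assert "handled ping" in service.handle("ping")
    return rounds


print(run_experiment())
```

A real setup would kill actual processes or machines rather than flipping a flag, but the shape is the same: break something at random, then assert that the behaviour you care about still holds.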
Some of the biggest growing pains we’ve experienced with our microservice architecture at Jet are in preparing for system outages. In this talk, Rachel will discuss Jet.com’s chaos testing methods and code in depth, as well as lay out a path to implementation that everyone can use.
The Journey of Agile Transformation @Barclays
Barclays has 150,000 employees in 50 countries, and by January 2015 only 4% of them were using Agile methodologies. They then started an Agile transformation, and now about 40% are already doing it. It was great to see how much they improved in such a short amount of time. Even senior managers are using Agile in their work.
They talked about:
- Taylorism and standards, and why attitudes such as “We’ve always done it this way” affect large corporations so badly;
- Don’t scale agile, descale the work instead;
- Targets are bad for teams, but they can be good if they fit the culture;
- Management should remove impediments and teams should implement principles;
- Honest feedback is a great way to know whether a transformation project is going well, but it is quite hard to get.
Agile at scale means different things to different people depending on their context.
We want to share our unfolding story of Agile transformation across Barclays, what we have done and how we’ve done it, not only in IT but beyond IT. The transformation is across a very large and very diverse organisation with a 325 year history.
We will look at the challenge of using Agile to bring about culture change from different perspectives, comparing and contrasting what needs to be done at an organisational level to make change happen and how that change can be perceived at a team level.
We will share both what has gone well and what we would do differently if we could do it all over again. We hope to inspire you with our ideas on how to get started and which next steps to take in order to advance your organisations towards greater agility.