Cloud Scale Challenge
How do you get around 20 people to invest crazy amounts of personal time and energy into learning technologies that are important to know, but do not directly impact their day-to-day work lives? Turn it into a competition!
Over the past several months, eight teams from the Pariveda Solutions Dallas office have worked tirelessly to design and implement a cloud-based architecture that a prospective client could be interested in today. Each team had to meet a minimum set of requirements (more on that below), as well as implement their cloud architecture in both AWS and Microsoft Azure. We called this event the 2014 Cloud Scale Challenge - the first of its kind.
Last week, we hosted Demo Day where each team had an opportunity to show the group what they built and try to beat the competition in a measure of cost vs. performance. We had some interesting solutions, approaches, and lessons learned that are outlined below.
The Rules
The goal of this competition was to have each team design a simple E-Commerce application that can process large transaction volumes. This involved creating a suite of services to power the application, as well as a simple front-end to demonstrate functionality. Finally, each team had to come up with their own method of proving performance.
Each team was required to meet these requirements on both AWS and Azure, but could choose which of the two solutions to use during the competition. To make things even more interesting, each cloud solution had to use at least three services from its provider (EC2, S3, etc.).
User Features - Create a website that performs the following actions
- List products (100+ unique SKUs)
- Search for a product by name or SKU
- Create an order
- Add an item to an existing order
- Submit order
Service SLAs - The system must handle the following request volumes
- 10,000 concurrent product search requests (10,000/second)
- 30,000 add item requests per minute (500+/second)
- 3,000 submit order requests per minute (50+/second)
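The per-minute and per-second figures in the list above are the same targets in different units. As a quick sanity check, here is a minimal sketch of the conversion. The dictionary keys and the function name are illustrative, not something the teams used.

```python
# Request volumes from the Service SLAs, normalized to requests per minute.
# The key names are hypothetical labels chosen for this sketch.
SLA_PER_MINUTE = {
    "search": 10_000 * 60,   # stated as 10,000/second
    "add_item": 30_000,      # stated as 30,000/minute
    "submit_order": 3_000,   # stated as 3,000/minute
}


def per_second(requests_per_minute: int) -> float:
    """Convert a per-minute request volume to a per-second rate."""
    return requests_per_minute / 60


rates = {name: per_second(volume) for name, volume in SLA_PER_MINUTE.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} requests/second")
```

Note that the search target is 20 times the add item target and 200 times the submit order target, so the read path dominates the load.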
Performance SLAs
- All orders and line items must persist to the backing store within 10 seconds of the request
- OK response must come within 500 milliseconds of the request
- Search results must return to the user in less than 1 second
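Each team had to find its own way of proving it met these limits. The sketch below shows one possible way to check recorded measurements against the three thresholds. The `Sample` record and its field names are hypothetical, and it assumes the 500 millisecond OK response applies to the order requests (create order, add item, submit order).

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the Performance SLAs list.
ACK_LIMIT_MS = 500        # OK response within 500 milliseconds
PERSIST_LIMIT_S = 10      # persisted within 10 seconds of the request
SEARCH_LIMIT_MS = 1000    # search results in less than 1 second


@dataclass
class Sample:
    """One measured request (hypothetical record layout)."""
    kind: str                          # "search" or an order request type
    response_ms: float                 # time until the response arrived
    persist_s: Optional[float] = None  # time until the write was durable


def meets_sla(sample: Sample) -> bool:
    """Return True if a single measurement satisfies its SLA."""
    if sample.kind == "search":
        return sample.response_ms < SEARCH_LIMIT_MS
    # Order requests need a fast OK and a durable write.
    return (
        sample.response_ms <= ACK_LIMIT_MS
        and sample.persist_s is not None
        and sample.persist_s <= PERSIST_LIMIT_S
    )


def pass_rate(samples: list) -> float:
    """Fraction of samples that met their SLA (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return sum(meets_sla(s) for s in samples) / len(samples)
```

For example, `pass_rate([Sample("search", 250), Sample("add_item", 620, 3.0)])` counts the search as passing and the add item request as failing, because its OK response took longer than 500 milliseconds.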
The Scoring
The team that wins must have the best score in two of the three categories above. In the event of a tie, the leading teams must compete with their other solution.
Technology Choices
I was surprised to find that the teams were split right down the middle: 50% competed with their AWS solution and 50% with their Azure solution. Here are the unique technology stack decisions made across the competition.
- Azure - PaaS - C# - Windows
- Azure - PaaS - Node.js - Linux
- AWS - Java - Linux
- AWS - Node.js - Linux
- AWS - Node.js - Windows
Three of the teams ended up starting with one technology stack, and then completely pivoting to a different language and OS midway through the competition because they weren't getting the results they expected.
Architecture Approaches
As you might expect, the architecture implementation a team ended up with was the key driver of success or failure in this competition. Any software developer on the planet can create a good enough solution and throw tons of computing power at the problem to meet their performance demands - a strategy that will bankrupt the infrastructure budget of even the largest companies. However, it takes a skilled cloud architect to analyze the various trade-offs each decision incurs and strive to make lots of really good small decisions to succeed.
Here's the AWS architecture that one of the better-performing teams put together - in both cost and performance.
Lessons Learned
This was my favorite part of the day. Each team had a wealth of knowledge and experience to speak about that didn't exist just a few months prior to the competition. Everyone agreed that they are much more comfortable with cloud technologies, and feel confident they could join any cloud-based project and hit the ground running. Mission accomplished.
In addition, there were some more nuanced lessons learned that are outlined below:
- The teams had a unique opportunity to create a solution from scratch: File->New Project. This is something that some of them haven't done in a long time due to working on larger, more established projects.
- The knowledge gained with one cloud provider (e.g., Azure) does not necessarily help you with the other.
- Teams that started focusing on Azure’s Platform as a Service (PaaS) tended to use their Azure solution for the competition.
- Creating a solution that works across both cloud providers with minimal code changes restricted teams from using services that are unique to a single provider (PaaS on Azure).
- AWS charges you for the whole hour, even if you spin an EC2 instance up for just a few seconds.
- The team that had the best raw performance focused intently on the low-level aspects of their solution: frameworks, web servers, operating systems, and programming languages. They didn't pick what was familiar to them; they picked what would perform best.
- Scaling out (adding more machines) is more effective than scaling up (adding more compute resources to a single machine).
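The hourly billing lesson above matters a lot in a cost vs. performance contest, because a short load test across many instances is billed as if each one ran for a full hour. Here is a minimal sketch of that arithmetic, assuming the whole-hour rounding described in the list. The function names and the hourly rate in the example are made up for illustration and are not real AWS prices.

```python
import math


def billed_hours(runtime_seconds: float) -> int:
    """Round a runtime up to whole hours, with a one-hour minimum."""
    return max(1, math.ceil(runtime_seconds / 3600))


def run_cost(instances: int, runtime_seconds: float, hourly_rate: float) -> float:
    """Cost of running identical instances under whole-hour billing."""
    return instances * billed_hours(runtime_seconds) * hourly_rate


# Hypothetical example: 10 instances, a 90-second test, $0.50 per instance-hour.
print(run_cost(instances=10, runtime_seconds=90, hourly_rate=0.50))
```

Under this model, a 90-second test and a 59-minute test cost the same, so it pays to batch test runs into the hour that has already been paid for.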
Conclusion
When we set out to create this competition, our goal was simple: incentivize people to get up to speed on technology platforms that are desperately needed in the industry. We didn't know what the response would be, how applicable the experience would be, or even how many teams would be able to finish.
At the end of Demo Day, we felt like the results far exceeded our expectations. The teams impressed us with their tenacity and intuitive problem-solving abilities, and there was a high level of excitement and passion around the subject area.
We consider the 2014 Cloud Scale Challenge an overwhelming success, and I can't wait to see what the teams come up with next year.