Week 2 is coming to an end. After two weeks of the virtual re:Invent 2020, we are left with a bit of a mixed bag of feelings. And that probably shows on the AWS execution side as well.
Don’t get us wrong: the keynote performances in particular are well rehearsed, the message is polished, and the content is well thought out. But the feeling of danger that comes with a live setting is missing. Usually there has also been a thread woven through the whole event, like those “Go Build” types of slogans. There is an empty space where they used to be.
There were some hints in the first keynote of what could become those themes and slogans, but they just didn’t seem to flow into the other portions of the content.
One trend that started a few years back is a change in how the announcements are made. Nowadays there is more of a “re:Invent season” than just the conference itself. This year, more than half of the announcements have been made somewhere other than the keynotes. We don’t expect that balance to change drastically, as there is only one more keynote to go next week.
But back to this week. Today’s keynote is a natural place to start, so first, let’s cover a few details from the Infrastructure keynote by Peter DeSantis, SVP of AWS Infrastructure and Support.
AWS Infrastructure Keynote
Peter’s sessions are something that we always look forward to. During normal re:Invents (remember those?), his slot has traditionally been the late-night one right at the beginning of the week. It has not usually been jam-packed with new launches, but one or two have always sneaked in there as well. It’s mostly been about beer, snacks and war stories of how specific pieces of AWS infrastructure have been built.
This session, along with perhaps the Security Leadership session, has traditionally been the showcase session, reaching out to existing clients as well as new and potential customers. The message is clear: in infrastructure, we are the best in the world, and it is an achievement if anyone even comes close. Nobody does these things better. Whether these claims are valid is for everyone to decide for themselves. We are simply stating that this is the tone they have chosen.
Today he promises to talk about three main themes: how AWS operates; the AWS custom silicon story and how Graviton was designed; and lastly Amazon’s and AWS’s take on sustainability and where they are on that journey. So let’s see how he takes on the kind of “official” keynote status and how that has affected his way of presenting things.
You can also find AWS’ liveblog of the keynote, with a detailed minute-by-minute account, here: https://aws.amazon.com/blogs/aws/reinvent-2020-liveblog-infrastructure-keynote/
How AWS operates its data centers
In the infrastructure keynote, we got a glimpse into how AWS thinks about its data centers and how they are operated. If you’re still running your own infrastructure, there is a lot you can learn from AWS and apply to your own business, although they are probably working at a different scale.
At AWS, the data center design anticipates failures at all levels. The design principles are to limit the blast radius, to reduce complexity, and to keep enough redundancy that systems can be maintained while they stay in service.
All IT revolves around electricity; without it, there is no cloud. Redundancy in power grids, switches, generators, UPSs and power supplies is a must if you want to reach six nines (99.9999%) availability. Everything fails – eventually. It’s just a matter of time, and that is why the metric is even called “Mean Time Between Failures”, MTBF. This is also why it’s important to add more Availability Zones into the mix. Another facility with a completely separate set of power utilities, placed geographically miles apart, helps to lower the risk of natural disasters affecting workloads, while remaining close enough to have sub-millisecond latency.
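To put the availability figures above into perspective, here is a rough back-of-the-envelope sketch in Python. The function names, the MTTR (mean time to repair) input and the 99.9% per-component figure are our own illustrative assumptions, not AWS numbers. The redundancy formula also assumes that failures are independent, which real facilities can only approximate.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def downtime_seconds_per_year(availability: float) -> float:
    """Downtime per year that a given availability (0..1) still allows."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from MTBF and an assumed repair time (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of N redundant copies, assuming independent failures:
    the system is down only when every copy is down at the same time."""
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    budget = downtime_seconds_per_year(0.999999)
    print(f"Six nines allows about {budget:.1f} s of downtime per year")
    for copies in (1, 2, 3):
        combined = redundant_availability(0.999, copies)  # 99.9% is illustrative
        print(f"{copies} x 99.9% component(s) -> {combined:.7%}")
```

Under these assumptions, six nines leaves roughly half a minute of downtime per year, and two independent 99.9% components in parallel are already enough to get there on paper. In practice the hard part is making the failures truly independent, which is exactly what separate power utilities and separate facilities are for.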
Putting enough distance between facilities while keeping the latency low by keeping that distance short is a balancing act. There is what is referred to as the “Goldilocks zone” (terminology borrowed from astronomy, where it means the circumstellar habitable zone), where the distance is just right for minimizing risk without affecting production network latency. As with the other measures mentioned here, most of this is about balancing risk, complexity, cost and performance. When Dr Matt Wood referred to “Danceability” in the ML keynote, I guess this counts as “Sleepability” from an infrastructure manager’s point of view.
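The distance-versus-latency trade-off described above can be put into rough numbers. The sketch below is ours, not something shown in the keynote: it assumes a typical refractive index of 1.47 for optical fibre and looks only at propagation delay, ignoring switching, queuing and the fact that fibre rarely runs in a straight line.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBRE_REFRACTIVE_INDEX = 1.47      # assumed typical value for optical fibre

# Light in fibre travels at roughly two thirds of its speed in vacuum.
FIBRE_SPEED_KM_S = SPEED_OF_LIGHT_KM_S / FIBRE_REFRACTIVE_INDEX


def one_way_latency_ms(fibre_km: float) -> float:
    """Propagation delay in milliseconds over a given length of fibre."""
    return fibre_km / FIBRE_SPEED_KM_S * 1000.0


def max_fibre_km(round_trip_budget_ms: float) -> float:
    """Longest fibre path that fits a round-trip propagation budget."""
    return round_trip_budget_ms / 1000.0 * FIBRE_SPEED_KM_S / 2.0


if __name__ == "__main__":
    for km in (10, 50, 100):
        print(f"{km:>4} km of fibre: {one_way_latency_ms(km):.3f} ms one way")
    print(f"1 ms round trip fits about {max_fibre_km(1.0):.0f} km of fibre")
```

With these assumptions, a one-millisecond round trip is used up by propagation alone at roughly 100 km of fibre, so the distance that is “just right” has a hard physical upper bound long before equipment delays are counted.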
AWS has designed its own hardware for the data centers, for example switchgear and micro-UPSs. All switchgear, whether delivered by hardware partners or built by AWS itself, is controlled by the same software in all regions. The micro-UPSs are housed in the heart of every AWS rack, and they have their own independent batteries and software built by Amazon, so that they have just the required features and can be developed at the pace we are accustomed to from AWS. While micro-UPSs are not the most cost-effective way of securing your electricity, they drastically lower the blast radius when a UPS failure or maintenance work affects power. Only a single rack is affected in case of a total failure where internal redundancy is not enough.
AWS conducts all its maintenance operations, both hardware and software updates, in such a way that only one Availability Zone can be affected at a time. We can learn from this, as human error is one of the key reasons for outages. Amazon has managed to have no global outages in its services since the beginning – can you say the same?
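If you want to borrow the one-zone-at-a-time idea for your own environment, the principle can be sketched in a few lines of Python. This is purely our illustration and says nothing about how AWS tooling works: the function name, the zone and host names, and the `apply_update` and `is_healthy` callbacks are all hypothetical.

```python
from typing import Callable, Dict, List


def rollout_zone_by_zone(
    zones: Dict[str, List[str]],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Update hosts one zone at a time.

    The rollout stops before touching the next zone if any host in the
    current zone is unhealthy after the update, so a bad change can only
    ever affect a single zone. Returns the zones that completed cleanly.
    """
    completed: List[str] = []
    for zone, hosts in zones.items():
        for host in hosts:
            apply_update(host)
        if not all(is_healthy(host) for host in hosts):
            break  # halt: the remaining zones are left untouched
        completed.append(zone)
    return completed


if __name__ == "__main__":
    # Hypothetical inventory; host "b2" is simulated as failing its check.
    zones = {"zone-a": ["a1", "a2"], "zone-b": ["b1", "b2"], "zone-c": ["c1"]}
    touched: List[str] = []
    done = rollout_zone_by_zone(zones, touched.append, lambda host: host != "b2")
    print("completed zones:", done)
    print("hosts touched:", touched)
```

In this simulated run the rollout halts in the second zone and the third zone is never touched, which is the whole point: a mistake, human or otherwise, stays contained within one zone.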
AWS Custom Silicon approach
Ever since the acquisition of Annapurna Labs in 2015, AWS has really been pushing the development of its own silicon. In the past years, we’ve already seen custom AWS silicon like Graviton, the 64-bit ARM processor for servers, and Inferentia, the chip targeted solely at machine learning inference. In the machine learning landscape, the majority of the cost comes from inference, and this is where Inferentia shines, bringing more bang for less buck. This is also where Amazon is eating its “own” dog food, as the Alexa product line has swapped from Nvidia-based environments to Inferentia to gain lower cost and higher performance.
The drivers for developing their own custom silicon are clear. First of all, achieving better efficiency. Really a key thing! Secondly, and tied closely to their sustainability targets, saving power. And finally, being more secure.
Looking back, this is exactly what they have achieved. Peter mentioned that Graviton2 is the best-performing CPU on AWS while being the most power-efficient at the same time. Great achievement! Peter also spent quite some time on the history and evolution of CPU design and explained why Graviton2 processors are as good as they are. Some points to highlight: the design does without SMT (simultaneous multithreading, or Hyper-Threading in Intel terms), a technique that causes performance variability and raises security concerns such as cryptographic side-channel attacks. Graviton processors also have four times more core-local cache than x86 architecture processors.
Interestingly enough, AWS does not allow SMT to be shared between different guests, so one can wonder whether they are running their whole infrastructure with HT turned off to begin with. But in the performance/price diagrams, they make a point of the high physical core count of the ARM architecture working in its favour. Or in favour of AWS, in most cases, which is one of the natural reasons why they are pushing towards it. But as long as it benefits both the user and the provider, it’s a win-win.
This year, in his keynote, Andy Jassy announced some new fruits of the acquisition. One of them was Trainium (I want the same pills these people at the AWS marketing department are having), which is briefly explained in our blog post on Andy’s keynote. In short, it does the same thing for ML training as Inferentia does for inference: more performance at a lower cost. Who could say no to that?
One of the most interesting parts of AWS custom silicon development is definitely the AWS Nitro System, whose custom silicon is responsible for the security of the system. The Nitro card is also what powers and made possible the first cloud-based macOS instances, which AWS launched just last week.
Nitro cards can turn all kinds of hardware into EC2 machines, as is the case with the Mac instance as well. It’s a Mac mini shoehorned into a rack enclosure, combined with a Nitro controller. Pretty cool, don’t you think?
Sustainability has been a big-ticket item for the whole of Amazon for a while already, and there is the pledge to become carbon neutral by 2040. They have previously stated that they are committed to running their business in the most environmentally friendly way possible and to achieving 100% renewable energy usage for the AWS global infrastructure.
These targets and this agenda are especially close to our hearts here at Cybercom, because we are very focused on sustainability and committed to the UN Global Compact. Working alongside AWS is therefore a natural choice for us from that perspective as well, as our visions and ambitions are aligned.
A study from 451 Research states that AWS infrastructure is, even as of now, 3.6x more energy-efficient than U.S.-based enterprise data centers. Also, the enhancements in data center power design discussed in the segment above have resulted in smaller losses in energy conversion. Those losses are effectively energy that nobody uses, and energy that is never wasted is the best kind of saving. This is also where it makes sense to consider whether it is worthwhile to move to energy-efficient CPUs, such as Graviton2-based instances.
AWS is also embarking on new renewable energy projects in Italy, France, South Africa and Germany, where new wind and solar farm projects are starting. These contribute to the updated target of being fully powered by renewable energy by 2025, five years ahead of the original schedule. Additionally, Amazon co-founded The Climate Pledge, a commitment to be net-zero carbon across its business by 2040, 10 years ahead of the Paris Agreement.
Energy consumption, and its production, is relatively easy to understand as a direct source of CO2 emissions. Every business also has a collection of indirect emissions, which are created as byproducts of the business rather than directly from running it. In IT infrastructure, the largest indirect emissions come from building the infrastructure, including the buildings, and from assembling the electronics and other hardware needed to produce the service.
There are also some challenging components in concrete production, one of which is carbon-intensive clinker. There are currently plans and projects under way to replace it with other materials in concrete.
An interesting tidbit was that AWS is partnering with a company called CarbonCure to reduce the carbon footprint of its concrete use. They promise to deliver a technology that introduces recycled CO₂ into fresh concrete to reduce its carbon footprint without compromising performance.
Another scarce resource is fresh water. Depending on the geographical location, it might be relatively cheap and available, or a costly, scarce and constrained resource. In data centers water is used mostly for cooling, that is, for transferring heat out, and a lot of potential would be wasted if the water were used only once and then pumped into a sewer. There has been some innovation in how to reclaim the water from that process. The same water is already used multiple times within the data center facility, and afterwards it is still essentially clean: not necessarily drinkable, but usable, for example, in farming. There are already projects where the reclaimed water is repurposed for irrigation, which saves huge amounts of fresh water that would otherwise be needed.
In addition to its own energy production, Amazon also buys energy from grid providers that produce sustainable energy. In total, Amazon procures 6.5 GW of renewables, which is comparable to the full capacity of up to five regular-sized nuclear power plants.
On a related note, it was announced today that Amazon has signed a ten-year corporate power purchase agreement with Ørsted to offtake 250 MW of the output from Ørsted’s planned 900 MW Borkum Riffgrund 3 offshore wind farm in Germany.
When it comes to the future of our planet, environmental actions are not only worth taking, they are worth taking well. Or, to put it another way, do we have any alternatives? Environmental effects also cascade, so when you acquire goods and services from sustainable providers and partners, you automatically contribute to a net-positive outcome through that relationship. Worth a thought.
As expected, Peter does a great job of voicing what is special about the “undifferentiated heavy lifting department” at AWS and of being its spokesperson.
This year, we were left without any new announcements in this keynote. But we got to scratch a little bit beneath the surface of some of the things going on under the hood.
For us as authors, the Infrastructure keynote always holds a special place in our hearts, because we are all old server huggers. Even though our focus is elsewhere nowadays, we still understand the gravity and importance of the things we were shown today, and especially how important it is for every customer’s success that someone who cares takes care of these issues now that we no longer have to.
Coming up next week
We are at full speed, and there have been 105 announcements so far at the time of writing this post. It seems we are going to be somewhere around 150 new things when the dust settles, and everyone heads to the holiday season.
From a keynote perspective, the event concludes next week on Tuesday the 15th with Werner Vogels’ keynote. At the beginning of Werner’s keynote, there will also be the DeepRacer finals. Naturally, we at Cybercom are very interested to see how Jouni is going to rank this year, but next week will tell more about that.
Additionally, AWS Community Nordics is going to host virtual viewing sessions for Werner’s keynote. Links to local events can be found, for example, here:
After the keynote, there is going to be a Community Leader livestream session with Gunnar (AWS), Rolf (Finland), Lezgin (Sweden), Angela (Denmark) and Anders (Norway) discussing the keynote and the whole event. Details of the stream will be announced in the meetup events above.
Key announcements from this week
The continuously updated list of both top announcements and all announcements can be found in the links below:
Top announcements: https://aws.amazon.com/blogs/aws/aws-reinvent-announcements-2020/
All announcements: https://aws.amazon.com/new/reinvent
Happy re:Inventing – catch you next week!