2023 Infrastructure Report

One of our Year End Wrap-up Blogs. Others include 2023 Year in Review, 2023 GiveSignup Product Recap, 2023 TicketSignup Product Recap, 2023 Infrastructure Report, 2024 Company Strategy, 2024 RunSignup Roadmap, 2024 GiveSignup Roadmap, 2024 TicketSignup Roadmap.

Solid Infrastructure for our Customers

0

Zero downtime in 2023, with only 4 minutes of downtime total since 2015

2,000

Releases of Software
All done between user clicks

$470,000,000

Transactions processed and paid to our event customers with 100% accurate, on time payments

While no one ever really notices our infrastructure, we spend a lot of time, effort, and resources to ensure that we have reliable, scalable, secure systems in place. Our customers need to be able to count on the technology services we provide and, most importantly, on us processing transactions from their participants, donors, and members and paying them on time. We also take care of things like sales tax collection and remittance for our customers. In 2023 we collected $5,000,000 in sales tax for 8,845 jurisdictions in the US.

Our customers count on us for critical services – their event and sometimes even organization websites, their email, reporting, participant management, and countless features that allow them to grow and operate their events.

9,000,000

Registrations and Tickets Sold

600,000,000

Free Emails Sent

29,000

Event Websites

In our Thanksgiving Infrastructure Report we highlighted the busiest day of the year, with our system providing flawless, fast service.

920,000

Turkey Trot Participants

40,000

Page Views per Minute at peak load with average 100 millisecond response time

534,000

Participants checked in with our RaceDay CheckIn App

We know having reliable systems is important to our customers. It is also exciting for us to have grown to a size where we can afford to do a great job with quality infrastructure as well as continuously deliver and improve on high quality software.

2023 Infrastructure Overview

Please refer to our Thanksgiving Infrastructure Report for a review of each layer of our system architecture shown above, along with metrics of how each layer performed on our busiest day of the year.

Infrastructure Deep Dive

In addition to this systems infrastructure, the other key part of our infrastructure is how we develop and deploy software.

People

We start by thinking about the people. We are lucky to have a talented team that has been with us a long time and works together very well. You can see interviews with a number of members of the team on our company video page. These are some of the people who create the great software our customers use. Bruce Kratz, our VP of Development, does a nice job of explaining our people as well as the processes we use.

As Jonathan Farrell puts it in his video when asked what the greatest strength of the development team is:

“The Teamwork. We’re built up of some of the greatest and smartest and most talented individuals I’ve worked with. And we work as one. Even though we are a group of great individuals, there’s no individual, we are a team in every aspect. We work together and we work strongly and we help each other and we teach each other. We grow together. And there’s nothing better than that.”

Code Review Process

100% of our code is reviewed line by line by another person before it is ready for release. This has the benefit of people learning from each other, and increases the consistency and maintainability of our code base. We have the strategy of “Aggressive Patience” with software development and do not put deadlines on our software releases.

We have also invested in automated testing over the past few years. We have a set of unit tests that run automatically multiple times during our code review and release process to check for any errors. We also have full integration test suites that check more complex multi-step flows, like signing up for a race and paying, which developers are able to run on their Macs and PCs. Here is a graph that shows our progress in growing our automated testing capabilities (we currently have over 1,500 tests with over 50,000 assertions):

Deployment

Over a decade ago we set the foundation for being able to deploy new versions of software as well as upgrade our systems without affecting any users. In essence, we upgrade the software our customers are using between clicks of editing their race or signing up for an event. This is what allows us to do over 2,000 releases each year. This basic capability supports our “Continuous Improvement” philosophy. Our customers know that our software will just keep getting better and better. And many of them notice, including Tom Jordan who sent this nice note to Bob a while ago:

2023 Infrastructure Improvements

Monthly System Updates
We made our usual improvements in 2023, which involve monthly updates to all of our services. We use a third party tool that looks for vulnerabilities in our systems and compares them with the CVE lists that track all security vulnerabilities. If you want to be scared, look at how many are reported, and how frequently (hourly), on the public list. This is part of our overall PCI Level 1 security as well as our fraud efforts to keep our systems and data safe. In addition, we update the various layers of our software, such as upgrading to new versions of AWS Aurora MySQL Database, Smarty, PHP, Lambdas, NodeJS, TinyMCE, etc. This year we made some pretty major version upgrades to several of these critical underlying components, which took person-months of effort. While this does not create any new functionality for our customers, it is the kind of unseen investment we make continuously.

Upgraded Servers
We upgraded a number of our servers (we run over 50 servers and a number of AWS services like Lambda, SQS, SES, SNS, S3, CloudFront, CloudWatch, Route 53, etc.). There were two areas where we took advantage of new technology. The first was the new Graviton based servers, the higher performance and lower cost servers that AWS has rolled out, which we adopted especially in our Database tier.

The second major change was moving our web server tier to the newer m6i servers, which are designed for memory intensive applications. This had the advantage of doubling our total memory at the web server tier.

In addition, we went from 4 servers at a “4XL” size to 8 servers at a “2XL” size. This has the benefit of distributing the load during peak usage times, as well as increasing availability if issues happen. For example, an AWS Availability Zone went offline a couple of years ago and took out many cloud services like Ticketmaster, Netflix, Delta and a couple of competing registration providers, while our services survived because of our distributed model.
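The availability benefit of more, smaller servers comes down to simple arithmetic. The sketch below assumes that one “4XL” server has twice the capacity of one “2XL” server (the usual AWS sizing convention) and measures capacity in “2XL units”. The function name is illustrative only.

```python
def capacity_after_one_failure(server_count: int, units_per_server: int) -> float:
    """Return the fraction of total capacity left when one server is lost."""
    total = server_count * units_per_server
    remaining = (server_count - 1) * units_per_server
    return remaining / total


# Assumption: one "4XL" server equals two "2XL" units of capacity,
# so both fleets start with the same total of 8 units.
old_fleet = capacity_after_one_failure(server_count=4, units_per_server=2)
new_fleet = capacity_after_one_failure(server_count=8, units_per_server=1)

print(f"4 x 4XL: {old_fleet:.1%} of capacity remains")  # 75.0%
print(f"8 x 2XL: {new_fleet:.1%} of capacity remains")  # 87.5%
```

Under that assumption, losing a single server costs a quarter of the web tier in the old layout but only an eighth in the new one.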

Auto-Scaling
Perhaps the biggest advancement we have made is improving our auto-scaling. We now watch for either CPU or memory thresholds being exceeded across a minimum of 2 servers, and automatically start up new servers when that happens. We also increased the number of reserve servers to 8, enabling us to double capacity in about 5 minutes automatically with no manual intervention. Fortunately, we have not needed it since we rolled it out (but it was tested!).
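As a rough illustration of this kind of rule, here is a minimal sketch. It is not the production implementation: the threshold values, the `ServerMetrics` class, and the function names are all hypothetical, and it takes one possible reading of the rule, namely that a server counts as overloaded when either its CPU or its memory exceeds a threshold, and that scaling starts once at least 2 servers are overloaded.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; the real values are not stated.
CPU_THRESHOLD = 0.80
MEMORY_THRESHOLD = 0.80
MIN_SERVERS_OVER = 2  # from the text: a minimum of 2 servers


@dataclass
class ServerMetrics:
    name: str
    cpu: float     # utilization between 0.0 and 1.0
    memory: float  # utilization between 0.0 and 1.0


def overloaded_servers(fleet: list[ServerMetrics]) -> list[ServerMetrics]:
    """Return servers whose CPU or memory utilization exceeds its threshold."""
    return [
        s for s in fleet
        if s.cpu > CPU_THRESHOLD or s.memory > MEMORY_THRESHOLD
    ]


def should_scale_out(fleet: list[ServerMetrics]) -> bool:
    """Scale out only when enough servers are overloaded at the same time."""
    return len(overloaded_servers(fleet)) >= MIN_SERVERS_OVER


# One hot server alone does not trigger scaling ...
quiet_fleet = [
    ServerMetrics("web-1", cpu=0.91, memory=0.40),
    ServerMetrics("web-2", cpu=0.35, memory=0.42),
    ServerMetrics("web-3", cpu=0.30, memory=0.45),
]
# ... but two servers over either threshold do.
busy_fleet = [
    ServerMetrics("web-1", cpu=0.91, memory=0.40),
    ServerMetrics("web-2", cpu=0.35, memory=0.88),
    ServerMetrics("web-3", cpu=0.30, memory=0.45),
]

print(should_scale_out(quiet_fleet))  # False
print(should_scale_out(busy_fleet))   # True
```

Requiring more than one server to cross a threshold keeps a single misbehaving machine from triggering a scale-out on its own.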

We also updated our console, which allows us to control servers in a more intuitive manner than the AWS Console. It allows us to reboot, shut down, remove from, and add to our load balancers any of the running web servers as shown below. It also allows us to manually add some or all of the 8 reserve servers very quickly, as well as add any number of additional servers with a bit more of a delay while they are configured in the background before coming online.

Summary

We have come a long way since 2010 when we had a single server at GoDaddy. We are thankful to our many customers who have helped us grow to the point where we can build the best technology platform in the endurance industry. We will keep working hard and improving for you all.