Many of us have been wondering why some websites cannot handle the traffic they are exposed to. We have reached critical mass in the industry, and large internet companies need to take this more seriously and understand that scaling goes beyond adding hardware. Here are a few aspects I would like to point out, based on my experience.
Prediction
This is one big issue. Even Flipkart has pointed this out in their apology. The dynamics of marketing have changed. The various mediums have evolved, but the way marketing is measured has not made much progress. Traffic estimation has to become more scientific, and we need to measure both online and offline mediums.
Testing
This is another area that lots of companies ignore. A strong testing organization is the need of the hour. Software has become much more complex and distributed, and testing is not just about overall functionality. In most cases a single user action invokes multiple components, and the interactions between those components need to be carefully modeled to handle edge cases. Always remember, you are only as strong as your weakest link. The testing strategy needs to take the weakest component into account as well.
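To make the weakest-link point concrete, here is a minimal sketch. It assumes a user request passes through each component in series, so the sustained throughput of the whole path is capped by the slowest one. The component names and capacity figures are made up for illustration; they are not measurements from any real site.

```python
def weakest_link(capacities):
    """Return (name, capacity) of the component that caps a serial request path.

    capacities maps a component name to the requests per second it can sustain.
    """
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


# Hypothetical load-test results, in requests per second.
components = {
    "web_frontend": 5000,
    "cart_service": 1200,
    "payment_gateway": 300,
    "inventory_db": 800,
}

bottleneck, limit = weakest_link(components)
print(f"The path is capped at {limit} req/s by {bottleneck}")
```

In this made-up example, making the frontend ten times faster changes nothing for the user until the payment component is fixed, which is why the test plan should load each component on its own as well as the end-to-end flow.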
Hardware is not a magic bullet
We cannot solve everything by adding hardware. Business owners sometimes assume that just adding hardware will help, but it can also make your deployments more complex, and that added complexity can sometimes cause more harm than good.
Horizontal or Vertical
This is an old technical debate about the right way to scale: horizontally, by adding more machines, or vertically, by moving to bigger ones. At internet scale, you will probably need both.
Choose your open source wisely
Gone are the days when people wrote code from scratch. Nowadays it is common to use components from all over the place. It is important that you test all of them properly for scale before using them. Just because they worked well in some other scenario does not mean that they will work well for you. The way you use them might be different.
Performance
Performance is still relevant and important. In these days of quick development cycles, it is often ignored and pushed to the bottom of the priority list. But bad performance in a single component can have a cascading effect through the entire cycle, so individual components need to be designed and tested to perform under stress.
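One way to see the cascading effect is through Little's law, which says the average number of requests in flight equals the arrival rate multiplied by the time each request spends in the system. The sketch below assumes a request calls each component in turn and holds one worker for the whole duration. The latencies, the arrival rate, and the worker pool size are hypothetical numbers chosen only to illustrate the idea.

```python
def requests_in_flight(arrival_rate_per_s, latencies_s):
    """Little's law: average concurrency = arrival rate * time in system."""
    return arrival_rate_per_s * sum(latencies_s.values())


def pool_exhausted(arrival_rate_per_s, latencies_s, pool_size):
    """True when the average concurrency exceeds the available workers."""
    return requests_in_flight(arrival_rate_per_s, latencies_s) > pool_size


# Hypothetical per-component latencies, in seconds.
healthy = {"search": 0.02, "cart": 0.03, "payment": 0.05}
degraded = dict(healthy, payment=2.0)  # only one component slows down

RATE = 500   # assumed arrival rate, requests per second
POOL = 200   # assumed number of workers in the frontend pool

for label, latencies in (("healthy", healthy), ("degraded", degraded)):
    load = requests_in_flight(RATE, latencies)
    print(f"{label}: about {load:.0f} in flight, exhausted={pool_exhausted(RATE, latencies, POOL)}")
```

With these assumed numbers, one slow component is enough to tie up every worker, so requests that never touch that component start queuing as well. Stress testing each component individually is how you find this before your users do.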
Do add other things if you think they are important.