Cloud Computing

2024 Infrastructure Tech Predictions

Ganesh Srinivasan, partner at Venrock, co-authored this article.

2023 was a rollercoaster like none other; from the death of the modern data stack sprawl to the birth of generative AI, we are only at the beginning of a new era in the ‘art of the possible.’ We guarantee 2024 won’t be a disappointment. With a new year approaching, it’s the perfect time for us to examine what we anticipate will be the biggest developments in the year ahead. Here is what we think is going to happen in 2024:

1. OpenAI’s Reign Challenged

With the emerging learnings in core neural net architectures that led to the transformer and to OpenAI’s dominance, it is likely that OpenAI’s imminent release of GPT-5 will be surpassed in specific performance benchmarks by a new entrant on the back of more efficient architectures, improved multimodal capabilities, better contextual understanding of the world and enhanced transfer learning. These new models will be built on emerging research in spatial networks, graph structures and combinations of various neural networks, leading to more efficient, versatile and powerful capabilities.

2. Apple: The New Leader in Generative AI

One of the most important players in the generative AI space is only starting to show its cards. 2024 will be the year Apple launches its first set of generative AI capabilities, unlocking the true potential of an AI-on-the-edge, closed architecture with full access to your personal data and showing that Apple is actually the most important company in the generative AI race.

3. Building for Client-First

The last decade reflected a shift away from fat clients toward server-side rendering and compute, but the world is changing back to the client. Mobile-first experiences will be required to work in offline mode, real-time experiences require ultra-low-latency transactions, and LLMs will increasingly need to run on the device to increase performance and reduce costs.

4. Death of Data Infrastructure Sprawl

The rapid growth of enterprises’ data infrastructure needs has led to an increasing sprawl of point solutions, from data catalogs, data governance, reverse ETL (extract, transform, load) and Airflow alternatives to vector databases and yet another lakehouse. Going into 2024, the pendulum will swing back to unified platforms and fewer silos to bring down the total cost of ownership and operating overhead.

5. Approaching the AI Winter

Generative AI in 2023 could best be characterized as the ‘art of the possible,’ with 2024 being the true test of whether prototypes convert into production use cases. With the peak of the hype cycle likely here, 2024 will bring the trough of disillusionment, where enterprises discover where generative AI can create margin-positive impact and where the costs outweigh the benefits.

6. The Misinformation Threat

While image and video diffusion models have unlocked a new era of digital creation and artistic expression, there’s no doubt that their dark side has yet to take its toll. With a presidential election waiting in the wings, diffusion models will emerge as the next major weapon of choice for political disinformation.

7. AI’s Real-World Breakthrough

Coming out of the ‘field of dreams’ era for AI, 2024 will represent a breakthrough for commercial use cases in AI, particularly in the physical world. Using AI for physical-world modalities will unlock our ability to […]

Read More

How to Solve the GPU Shortage Problem With Automation

GPU instances have never been as precious and sought-after as they have been since generative AI captured the industry’s attention. Whether it’s due to broken supply chains or the sudden spike in demand, one thing is clear: getting a GPU-powered virtual machine is harder than ever, even if a team is fishing in the relatively large pond of the top three cloud providers. One analysis confirmed “a huge supply shortage of NVIDIA GPUs and networking equipment from Broadcom and NVIDIA due to a massive spike in demand.” Even OpenAI, the company behind the rise of generative AI, suffers from a lack of GPUs. Companies have started adopting rather unusual tactics to get their hands on these machines, like repurposing old video gaming chips.

What can teams do when facing a quota issue and a cloud provider that has run out of GPU-based instances? And once they somehow score the right instance, how can they make sure no GPUs go to waste? Automation is the answer. Teams can use it to accomplish two goals: find the best GPU instances for their needs and maximize their utilization to get more bang for their buck.

Automation Makes Finding GPU Instances Easier

The three major cloud providers offer many types and sizes of GPU-powered instances, and they’re constantly rolling out new ones; an excellent example is AWS P5, launched in July 2023. To give a complete picture, here’s an overview of instance families with GPUs from AWS, Google Cloud and Microsoft Azure:

AWS
- P3
- P4d
- G3
- G4 (this group includes G4dn and G4ad instances)
- G5
Note: AWS also offers Inferentia machines optimized for deep learning inference apps and Trainium for deep learning training of 100B+ parameter models.

Google Cloud

Microsoft Azure
- NCv3-series
- NC T4_v3-series
- ND A100 v4-series
- NDm A100 v4-series

When picking instances manually, teams can easily miss out on opportunities to snatch up golden GPUs from the market. Cloud automation solutions help them find a much larger supply of GPU instances with the right performance and cost parameters.

Considering GPU Spot Instances

Spot instances offer significant discounts, even 90% off on-demand rates, but they come at a price: the potential interruptions make them a risky choice for important jobs. However, running some jobs on GPU spot instances is a good idea, as they accelerate the training process and lead to savings. ML training usually takes a very long time, from hours to even weeks. If an interruption occurs, the deep learning job must start over, resulting in significant data loss and higher costs. Automation can prevent that, allowing teams to get attractively priced GPUs still available on the market to cut training and inference expenses while reducing the risk of interruptions.

In machine learning, checkpointing is the practice of saving model states at intervals during training. It is especially beneficial in lengthy, resource-intensive training runs because it enables training to resume from a checkpoint after an interruption rather than starting anew. Checkpointing also facilitates the evaluation of models at different stages of training, which can be enlightening for understanding training dynamics.

Zoom in on Checkpointing

PyTorch, a popular ML framework, provides native functionality for checkpointing models and optimizers during training. Higher-level libraries such as PyTorch Lightning abstract away much of the boilerplate code associated with training, evaluation and checkpointing in PyTorch.
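To make the native PyTorch approach concrete, here is a minimal sketch of save-and-resume checkpointing with torch.save and torch.load; the tiny model, optimizer and file path are illustrative placeholders rather than anything from the article:

    import torch
    import torch.nn as nn

    # Illustrative stand-ins for a real training setup.
    model = nn.Linear(128, 10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    checkpoint_path = "checkpoint.pt"  # hypothetical location

    def save_checkpoint(epoch: int) -> None:
        # Persist everything needed to resume: weights, optimizer state and progress.
        torch.save(
            {
                "epoch": epoch,
                "model_state_dict": model.state_dict(),
                "optimizer_state_dict": optimizer.state_dict(),
            },
            checkpoint_path,
        )

    def load_checkpoint() -> int:
        # Restore weights and optimizer state; return the epoch to resume from.
        checkpoint = torch.load(checkpoint_path)
        model.load_state_dict(checkpoint["model_state_dict"])
        optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
        return checkpoint["epoch"] + 1

    # Inside a training loop, writing a checkpoint every few epochs means a
    # preempted spot instance loses only the work done since the last save.
    for epoch in range(10):
        # ...training steps would run here...
        if epoch % 2 == 0:
            save_checkpoint(epoch)

Saving the optimizer state alongside the model weights matters for spot workloads: restoring only the weights would reset optimizer state such as momentum, subtly changing the training trajectory after each interruption.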
Let’s take a […]
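Picking up the higher-level route mentioned above, the sketch below shows roughly how PyTorch Lightning’s ModelCheckpoint callback can handle the same save-and-resume pattern with far less boilerplate; the LightningModule, synthetic data and checkpoint directory are illustrative assumptions, not code from the article:

    import torch
    import torch.nn as nn
    import pytorch_lightning as pl
    from pytorch_lightning.callbacks import ModelCheckpoint
    from torch.utils.data import DataLoader, TensorDataset

    class TinyRegressor(pl.LightningModule):
        # Illustrative model; a real project would define its own LightningModule.
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(16, 1)

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = nn.functional.mse_loss(self.layer(x), y)
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=1e-3)

    # Synthetic data keeps the sketch self-contained.
    dataset = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
    loader = DataLoader(dataset, batch_size=32)

    # The callback writes a checkpoint each epoch; dirpath is a hypothetical location.
    checkpoint_cb = ModelCheckpoint(dirpath="checkpoints/", every_n_epochs=1, save_last=True)

    trainer = pl.Trainer(max_epochs=5, callbacks=[checkpoint_cb])
    trainer.fit(TinyRegressor(), loader)

    # After a spot interruption, a fresh run can resume from the saved checkpoint:
    # trainer.fit(TinyRegressor(), loader, ckpt_path="checkpoints/last.ckpt")

In practice, the checkpoint directory should live on storage that outlasts the instance, such as an object store or network volume, so a replacement spot VM can pick the run back up where it left off.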

Read More