AI | The IT Hollow

This post is part of the Red Hat Platform series. If you want the full picture of what we’re building toward, start there.

Well, these days it’s hard to consider myself a technologist unless there is some AI stuff on my blog. So, in this post we’ll focus on an introduction to OpenShift AI, specifically on inferencing.

AI has taken the world by storm, for better or worse. It’s not just technologists who are buzzing about how AI is going to impact them; the whole world has an opinion on it. While the debates rage on, technologists are trying to find ways to make AI easier to manage.

Many people get access to AI models the SaaS way. They use an AI endpoint such as Claude (Anthropic), ChatGPT (OpenAI), or Gemini (Google) and pay for each request by the token, and it’s simple to get started. But as your workloads consume more and more tokens, and the SaaS providers charge more and more money for those tokens, this way of leveraging AI gets very expensive very quickly.

So you look to buy your own GPUs and servers and plan to host your models yourself. Now we’re met with new problems. How do I manage the lifecycle of my models and replace old models when new ones are available? How do I change model types without updating my applications for a new API spec each time? How do I match my finite GPUs to the right models, and how do I keep users from accessing models they don’t need? The challenge is daunting… unless you have OpenShift AI. ...