Optimizing Cost of Cloud Deployment for Custom AI Solutions


Deloitte Global predicts that, over the next few years, companies will accelerate their use of cloud-based Artificial Intelligence (AI) software and services. That use will combine AI features in cloud-based enterprise software (70%) with custom AI components built using cloud-based development services (65%).

The number of products deployed on the cloud, and the number gaining AI-enabled features, is constantly increasing. Three factors largely contribute to this: the availability of open source AI algorithms and models, access to open source pre-trained models that kick-start AI feature development, and access to AI-ready infrastructure and services from the cloud platform providers.

These options empower product teams to assemble and deploy AI-based solutions at speed. As customers sign up and use the solutions, a changed monthly bill often draws the CIO’s attention to cost optimization.

Linking Resources and Costs

The change in the monthly bill is directly connected to the number of resources and features used on the cloud infrastructure. Although this is widely known, I have noticed that, for a variety of reasons, AI-based solutions contribute to rising costs once they reach production.

To optimize the costs of cloud-based AI solutions systematically, I use a framework that helps architects, DevOps professionals, and even product managers keep tabs on them.

Attention to Increased Resource Demands

The way in which an AI component is deployed in the overall architecture dictates how it contributes to the resource requirements. These requirements depend on model size and on the complexity of the AI algorithms. Given the specialized requirements of AI components, it is usually easier to allocate separate, dedicated resources to them in the overall deployment. But at what cost?

Separating the resources dedicated to AI works well because the skills required to develop AI components differ from those required for web services and web application components. However, this separation sometimes results in over-provisioning in the early stages of AI feature usage by end users. In my opinion, the design itself is one of the largest contributors to rising cloud expenditure.

How to Think About Solutions and Pricing

Traditionally, performance analysis and capacity planning methods played a role in optimal resource allocation based on planned load.

In the pay-per-use model of cloud (where resource quantity and duration of use contribute to overall costs), predicting costs beforehand becomes challenging. Here, cost is proportional to resource usage and the resource-specific pricing model. For overprovisioned AI resources, the implementation is therefore set to begin with a higher cost.
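The cost relationship described above can be sketched as follows. This is a minimal illustration that assumes a simple hourly pricing model; the instance names, hourly rates, and the `Resource` and `monthly_cost` names are hypothetical and do not reflect any provider's actual prices or APIs.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    hourly_rate: float   # hypothetical price per instance-hour
    count: int           # number of instances provisioned
    hours: float         # hours the instances run in the billing period


def monthly_cost(resources):
    """Cost = sum over resources of rate x quantity x duration."""
    return sum(r.hourly_rate * r.count * r.hours for r in resources)


# Hypothetical deployment: a GPU node dedicated to AI plus two web nodes.
overprovisioned = [
    Resource("gpu-large", hourly_rate=3.0, count=1, hours=720),
    Resource("web-small", hourly_rate=0.1, count=2, hours=720),
]
# Same deployment with a smaller (hypothetical) GPU node.
rightsized = [
    Resource("gpu-small", hourly_rate=1.0, count=1, hours=720),
    Resource("web-small", hourly_rate=0.1, count=2, hours=720),
]

print(f"overprovisioned: {monthly_cost(overprovisioned):.2f}")
print(f"rightsized:      {monthly_cost(rightsized):.2f}")
```

With these assumed numbers, the dedicated GPU node dominates the bill from day one, whether or not end users are exercising the AI feature yet.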

Hence, a combination of methods is needed to arrive at a systematic approach for optimizing costs.

Applying Cloud-based AI Solutions

Cloud-based AI solutions fall into three categories:

1. The AI component of the solution is developed entirely using a SaaS AI offering

2. The AI component is built from the ground up, or built using public domain models and algorithms

3. A combination of 1 and 2, where a ready-made SaaS solution solves some aspects of the problem and custom modules use its output to produce the final result that the product/application can readily use

For category 1 solutions (e.g. using a cloud-based AI service such as Google CTS for job candidate matching), the cost of the AI feature is directly proportional to the number of calls made to the API, as priced by the API pricing model. Cost optimization in such cases boils down to estimating the volume of calls (based on projected load) and finding opportunities to reduce API usage.
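For a category 1 solution, the estimate can be sketched as below. The per-call price and request volume are hypothetical, and caching repeated requests is only one assumed way of reducing API usage; the function `estimate_api_cost` is illustrative and not part of any vendor SDK.

```python
def estimate_api_cost(monthly_requests, price_per_1000_calls, cache_hit_rate=0.0):
    """Estimate the monthly bill for a pay-per-call AI API.

    cache_hit_rate is the fraction of requests answered from a local cache
    (an assumed optimization) and therefore never sent to the API.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    billable_calls = monthly_requests * (1.0 - cache_hit_rate)
    return billable_calls / 1000.0 * price_per_1000_calls


# Hypothetical projection: 2 million requests at 1.50 per 1,000 calls.
baseline = estimate_api_cost(2_000_000, 1.50)
with_cache = estimate_api_cost(2_000_000, 1.50, cache_hit_rate=0.25)
print(f"baseline: {baseline:.2f}, with cache: {with_cache:.2f}")
```

Running the same function over several load projections shows how sensitive the bill is to call volume before any commitment is made.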

On the other hand, for solutions of categories 2 and 3, product teams need to choose the infrastructure during development for deploying the solution.

As we all know, cloud providers present a menu of infrastructure options: CPU- or GPU-based machines, number of cores, memory, bandwidth, etc. are bundled as items on the menu. When AI components use algorithms and models that need GPU-based infrastructure, a typical consequence is that this infrastructure is not used to its full capacity until product usage reaches high volumes. Until then, the provisioned high capacity shows up as increased cost.

For AI solutions in categories 2 and 3 above, our methodology helps us carry out the right kind of analysis: one that looks at the overall deployment and its resource usage and identifies opportunities to optimize costs.

How Does This Work in Practice?

At a high level, analysis of incoming requests and observation of the corresponding system resource usage lead us to opportunities for optimizing the resources used, and thereby the costs.
At a lower level, depending on the high-level observations of resource usage, one needs to zoom in on specific processes and threads and their use of resources such as memory, network, and storage.

The overall process comprises:

Deployment architecture

Understanding deployment architecture including allocation of each logical component to physical infrastructure (for AI as well as non-AI).

Cost data

Compiling cost data and mapping the costs to components in the deployment architecture. Marking top contributors to cost for further analysis.
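A minimal sketch of this mapping step, assuming the billing export is available as (resource id, cost) pairs and that the resource-to-component mapping has been compiled by hand. The resource ids, component names, amounts, and the 80% cut-off are all hypothetical.

```python
from collections import defaultdict


def top_cost_contributors(line_items, component_of, share=0.8):
    """Return (component, cost) pairs, largest first, that together cover
    at least `share` of the total bill."""
    totals = defaultdict(float)
    for resource_id, cost in line_items:
        # Resources missing from the mapping are grouped as "unmapped".
        totals[component_of.get(resource_id, "unmapped")] += cost

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    selected, running = [], 0.0
    for component, cost in ranked:
        if running >= share * grand_total:
            break
        selected.append((component, cost))
        running += cost
    return selected


# Hypothetical billing export: (resource id, monthly cost).
line_items = [("vm-1", 2000.0), ("vm-2", 300.0), ("vm-3", 300.0),
              ("bucket-1", 150.0), ("lb-1", 50.0)]
# Hypothetical mapping from resource to logical component.
component_of = {"vm-1": "ai-inference", "vm-2": "web-api", "vm-3": "web-api",
                "bucket-1": "storage"}

for component, cost in top_cost_contributors(line_items, component_of):
    print(component, cost)
```

Grouping unmapped resources under their own label keeps costs that nobody has claimed visible instead of silently dropping them.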

Usage data

This includes collecting: a) input request data at the web entry point as well as at the entry point of each physical server in the deployment, b) system resource usage data for the provisioned infrastructure (e.g. CPU, memory), c) cloud service usage data (network bandwidth, storage, other SaaS services, etc.), and d) data about user categories by pricing plan and location (present and forecast).

Product/component usage

This involves analyzing how resource usage correlates with input requests in order to discover patterns. These patterns are later used to identify opportunities for optimizing resources.

The typical activities include correlation of the request data with resource usage, and discovering usage patterns by time, request types, location, user types, etc.
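One simple way to quantify such a correlation is the Pearson coefficient between request volume and a resource metric over the same time windows. The sketch below uses hypothetical hourly samples; the choice of Pearson correlation is an assumption made for illustration, not necessarily the method used in the framework.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hourly samples for one server.
requests_per_hour = [100, 200, 300, 400, 500]
cpu_percent = [12, 21, 33, 39, 52]   # rises with request volume
gpu_percent = [5, 5, 6, 5, 5]        # stays nearly flat regardless of load

print("requests vs CPU:", round(pearson(requests_per_hour, cpu_percent), 3))
print("requests vs GPU:", round(pearson(requests_per_hour, gpu_percent), 3))
```

In this made-up data, CPU usage tracks the request volume closely while GPU usage does not, which is the kind of pattern that flags a resource for closer inspection. The same calculation can be repeated per request type, location, or user category.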

Cost optimization opportunities

The results of the analysis phase are used to discover optimization opportunities. The steps in this phase identify:

1. Underutilized resources

2. Possible level of reduction in allocated resources on that infrastructure (e.g. moving to smaller-capacity infrastructure)

3. Opportunities to drop an infrastructure component completely and move its software components to another existing infrastructure component

4. Opportunities to shut down components without affecting usage of the solution
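The first two steps in the list above can be sketched as a simple right-sizing check. The size catalog, prices, observed peaks, and the 30% headroom are hypothetical values chosen for illustration; a real analysis would use the provider's actual instance catalog and measured utilization.

```python
def recommend_size(peak_utilization, current_size, catalog, headroom=0.3):
    """Pick the cheapest catalog size that still covers observed peak demand.

    peak_utilization: observed peak as a fraction (0..1) of the current size.
    catalog: dict of size name -> (capacity units, hourly price).
    headroom: spare capacity to keep above the observed peak.
    """
    current_capacity, _ = catalog[current_size]
    required = peak_utilization * current_capacity * (1.0 + headroom)
    candidates = [
        (price, name) for name, (capacity, price) in catalog.items()
        if capacity >= required
    ]
    if not candidates:
        return current_size  # nothing in the catalog is large enough
    return min(candidates)[1]


# Hypothetical size catalog: capacity in vCPUs, price per hour.
catalog = {"small": (2, 0.10), "medium": (4, 0.20), "large": (8, 0.40)}

# Hypothetical servers: (current size, observed peak utilization).
servers = {"ai-node": ("large", 0.20), "web-node": ("medium", 0.75)}

for name, (size, peak) in servers.items():
    target = recommend_size(peak, size, catalog)
    status = "keep" if target == size else f"resize {size} -> {target}"
    print(name, status)
```

Sizing against the observed peak plus headroom, rather than against average utilization, is a deliberately conservative choice so that a recommendation does not degrade the solution under load.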

Recommendations

Finally, the process involves preparing a recommendations report that documents the present versus the proposed deployment plan, the cost optimizations that can be achieved, and the activities needed to transition successfully to the new deployment model.

The cloud does not offer a one-size-fits-all approach, and as its capabilities expand, so will the ways in which companies use it to their advantage. It may seem daunting to choose from a variety of service offerings and deployment models to determine the best fit for your own AI solution. Rest assured: for AI and non-AI workloads alike, the cloud cost optimization problem can be solved with a resource utilization analysis framework based on time-tested methods of application performance analysis and resource consumption.

Reference

https://www2.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/cloud-based-artificial-intelligence.html
