Sara just received the cloud bill and no way in the universe did she see this coming. The bill is at least twice what she expected.
Sounds familiar?
It happens more frequently than you think, with more leaders than you can imagine.
Once in the cloud, approximately 80% of companies report receiving bills 2-3 times what they expected. Cloud costs keep creeping up despite the many measures and steps taken by cloud architects.
What's more, these soaring, unpredictably high bills hold leaders back from taking the next step toward BI adoption, which is quite a contradiction given the objective of cloud investment.
Cloud investments become so big that leaders are left with less money to spend on data initiatives. Resources are wasted, goals are unmet, and CIOs are burdened with huge costs.
In a conversation with one of the world’s leading FMCG brands, they said, “We wanted to drive data analytics adoption, but the cost of moving data to cloud was so high that it didn’t seem worth it.”
Cloud is an enabler of BI adoption, but without the right understanding and insights, it does the opposite. Based on our experience, we have identified 3 key drivers to optimize cloud costs. Take a look –
1. Cloud Data Dispersion Costs
Costs related to distribution of data across different data centres and cloud providers, across different regions of the world.
What is your data dispersion strategy? What is your data portfolio – how much of your data is static and how much is dynamic?
Who needs ‘what data’ at ‘what frequency’? Deploying a BI solution for all key members when all they need is a simple monthly table in email (or chat) would increase your cost.
How is the data coming into your cloud, and how much of it is dynamic?
If your data is consumed at monthly intervals, then keeping real-time data streaming services running 24x7 is not worth the cost.
These costs can climb steeply for applications that deal heavily with dynamic data.
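To make the streaming-versus-batch point concrete, here is a rough sketch of the arithmetic. The hourly rate, the per-run cost and the 730-hour month are all hypothetical figures chosen for illustration, not prices from any provider.

```python
HOURS_PER_MONTH = 730  # assumed average number of hours in a month


def monthly_streaming_cost(hourly_rate: float) -> float:
    """Cost of keeping a streaming service running around the clock for a month."""
    return hourly_rate * HOURS_PER_MONTH


def monthly_batch_cost(cost_per_run: float, runs_per_month: int) -> float:
    """Cost of loading the same data with scheduled batch jobs instead."""
    return cost_per_run * runs_per_month


# Hypothetical figures: a stream billed at $0.50/hour vs. one $5.00 batch load a month.
streaming = monthly_streaming_cost(hourly_rate=0.50)
batch = monthly_batch_cost(cost_per_run=5.00, runs_per_month=1)

print(f"Always-on streaming: ${streaming:.2f}/month")
print(f"Monthly batch load:  ${batch:.2f}/month")
```

Plug in your own rates; the gap narrows quickly if the data really is consumed daily or hourly.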
2. Cloud Computation Cost
Costs involved in delivery of computation services, including resource allocation and consumption.
What kind of services are being used and how much of the cloud is underutilized?
Are you using an on-demand computation strategy (pay as you go, as with BigQuery or Athena), or are you investing in reserved capacity or hybrid pricing (subscription plus pay as you go)?
Most of the time, businesses conduct ad-hoc analysis. You need to establish a pattern of analysis.
While on-demand serverless services like BigQuery and Athena lower your costs for ad-hoc analysis at first, the charges build up over time.
If your BigQuery bill has stayed above $3000 a month for a long time, it would be wise to invest in flat-rate pricing.
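One way to turn that rule of thumb into a routine check is sketched below. The $3000 threshold comes from the text above; the six-month window standing in for "a long time" and the sample bills are assumptions for illustration.

```python
def months_above_threshold(monthly_bills, threshold=3000.0):
    """Length of the most recent unbroken run of months billed above the threshold.

    monthly_bills is ordered oldest first.
    """
    streak = 0
    for bill in reversed(monthly_bills):
        if bill <= threshold:
            break
        streak += 1
    return streak


def should_review_flat_rate(monthly_bills, threshold=3000.0, min_months=6):
    """True when on-demand spend has stayed above the threshold for min_months.

    min_months=6 is an assumed stand-in for "a long time".
    """
    return months_above_threshold(monthly_bills, threshold) >= min_months


# Hypothetical on-demand bills in dollars, oldest month first.
bills = [2100, 2800, 3200, 3400, 3100, 3600, 3900, 4100]

print(months_above_threshold(bills))   # 6
print(should_review_flat_rate(bills))  # True
```

A check like this only flags when to review pricing; whether a commitment actually pays off depends on your provider's current rates.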
Another common problem is that no clean-up strategy is set for the tables. So, while you need at most the past 12 months of data for your active queries, the table holds data for the past 3 years.
You could be computing values for the past 3 years and then filtering them, when you could have done the job at one-third of the computation cost. Doesn’t make sense, right?
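The one-third figure follows directly if data volume is roughly even from month to month and cost scales with the data scanned. A minimal sketch under those assumptions, with a made-up 100 GB per month:

```python
def scan_savings(monthly_gb, months_needed):
    """Compare scanning the whole table with scanning only the recent months.

    monthly_gb lists the data volume per month, oldest first.
    Returns (full_scan_gb, pruned_scan_gb, pruned_fraction).
    """
    if months_needed <= 0:
        raise ValueError("months_needed must be positive")
    full = sum(monthly_gb)
    if full == 0:
        raise ValueError("table is empty")
    pruned = sum(monthly_gb[-months_needed:])
    return full, pruned, pruned / full


# Hypothetical table: 36 months of data at 100 GB per month.
table = [100] * 36
full, pruned, fraction = scan_savings(table, months_needed=12)

print(f"Full scan:   {full} GB")
print(f"Pruned scan: {pruned} GB")
print(f"Fraction of the full scan: {fraction:.2f}")
```

If your data grows over time, the recent months weigh more and the saving is smaller, so run the numbers on your actual monthly volumes.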
What is your cloud spend optimization strategy?
Regularly monitor where you are overspending and underutilizing.
Have you consulted all stakeholders on their analytics requirements? Have you asked them which time periods they usually need data for?
Are you conducting regular clean-ups and auditing of the services for loads and usage and taking corrective actions against any aberration?
Let’s take an example.
Keeping a large Databricks cluster for daily model execution, when its full capacity is only required for the model training that runs every 3 months, would be a bad idea. Separating the two scenarios, with model execution on a smaller cluster daily and model training on the larger cluster once every three months, will save you a lot of money.
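A back-of-the-envelope comparison of the two setups might look like this. The hourly rates and run durations are hypothetical, and the sketch assumes the daily job takes about as long on the smaller cluster.

```python
def annual_cost(hourly_rate, hours_per_run, runs_per_year):
    """Yearly cost of a job billed only for the hours its cluster is running."""
    return hourly_rate * hours_per_run * runs_per_year


# Hypothetical figures, not actual Databricks pricing.
LARGE_RATE = 40.0        # $/hour for the large cluster
SMALL_RATE = 8.0         # $/hour for the small cluster
DAILY_RUN_HOURS = 2      # daily model execution
TRAINING_RUN_HOURS = 12  # quarterly model training

# Option 1: everything runs on the large cluster.
single_cluster = (
    annual_cost(LARGE_RATE, DAILY_RUN_HOURS, 365)
    + annual_cost(LARGE_RATE, TRAINING_RUN_HOURS, 4)
)

# Option 2: daily execution on the small cluster, training on the large one.
split_clusters = (
    annual_cost(SMALL_RATE, DAILY_RUN_HOURS, 365)
    + annual_cost(LARGE_RATE, TRAINING_RUN_HOURS, 4)
)

print(f"Single large cluster: ${single_cluster:,.2f}/year")
print(f"Split clusters:       ${split_clusters:,.2f}/year")
print(f"Savings:              ${single_cluster - split_clusters:,.2f}/year")
```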
3. Cloud Storage Cost
Costs relating to storing the desired quantities of data in the cloud.
Which tier do you want to store your data in: at the edge, in files, or in the cloud?
This decision depends on how quickly you need to use the data, since each tier is priced differently. If you just need a file-sharing environment, a simple SharePoint or Google Drive setup might be enough for you.
In the cloud, you should classify your data into immediately required data that needs to be in active storage, infrequently required data that should be in nearline storage, and rarely required data that can go to archival storage.
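The classification above can be sketched as a simple rule on how recently the data was accessed. The 30-day and 365-day cut-offs are assumptions for illustration; pick thresholds that match your own access patterns and your provider's tiers.

```python
from datetime import date


def classify_tier(last_accessed, today, nearline_after_days=30, archive_after_days=365):
    """Suggest a storage tier from the number of days since the last access.

    The default cut-offs are hypothetical, not provider recommendations.
    """
    age_days = (today - last_accessed).days
    if age_days >= archive_after_days:
        return "archive"
    if age_days >= nearline_after_days:
        return "nearline"
    return "active"


# Hypothetical datasets with their last access dates.
today = date(2024, 6, 30)
datasets = {
    "daily_sales": date(2024, 6, 25),
    "quarterly_report": date(2024, 4, 1),
    "legacy_orders": date(2022, 1, 1),
}

for name, last_accessed in datasets.items():
    print(f"{name}: {classify_tier(last_accessed, today)}")
```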
What are your networking and transaction needs? What are your data replication and duplication strategies?
These depend on –
- how often do you need to replicate data or move data in and out of the cloud (egress fee)?
- how often do your applications transact with the cloud (a small fee, but potentially high impact if unaccounted for)?
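To see how both fees add up, here is a rough monthly estimate. The per-GB egress rate, the per-10,000-request rate and the volumes are hypothetical; substitute the figures from your provider's price list.

```python
def monthly_network_cost(egress_gb, egress_rate_per_gb, requests, rate_per_10k_requests):
    """Estimate monthly egress and transaction fees.

    Returns (egress_fee, transaction_fee, total).
    """
    egress_fee = egress_gb * egress_rate_per_gb
    transaction_fee = (requests / 10_000) * rate_per_10k_requests
    return egress_fee, transaction_fee, egress_fee + transaction_fee


# Hypothetical month: 5 TB moved out of the cloud and 200 million requests.
egress_fee, transaction_fee, total = monthly_network_cost(
    egress_gb=5_000,
    egress_rate_per_gb=0.10,
    requests=200_000_000,
    rate_per_10k_requests=0.05,
)

print(f"Egress fee:      ${egress_fee:,.2f}")
print(f"Transaction fee: ${transaction_fee:,.2f}")
print(f"Total:           ${total:,.2f}")
```

With these made-up numbers the "small" per-request fee ends up larger than the egress fee, which is exactly the kind of cost that goes unnoticed.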
What sort of data do you want to archive?
With a focus on accelerating BI and data analytics initiatives, CIOs are investing heavily in the cloud, to the tune of millions of dollars annually, in an attempt to derive maximum value from their data.
But are the investments justified? Are they able to extract the full value of their cloud investments?
Why not?
In a survey, 32% of IT leaders worry business units are overspending on cloud without centralized spending oversight. The typical company wastes as much as 35% of its cloud budget, according to some estimates.