Data Collection in Cloud Management: Part 1 - What to measure?
KPIs which matter for the business
Consumption is the hallmark of capitalism: it drives the free market. In the 90s, consumerism was largely restricted to physical goods. Today, with the free internet, it is just as much about digital goods. Just look at the time we spend on social media!
Consumption involves three major activities: collection, processing and action.
Let’s take a basic example: humans. We collect data through our senses in different formats: sight (eyes), sound (ears), touch (skin) and smell (nose). Each sense sends its data to the brain as signals, which are then processed and acted upon based on the patterns we have built up over the years.
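The collect, process, act sequence can be sketched as a tiny pipeline. This is only an illustration of the analogy: `Signal`, `run_pipeline` and the `patterns` lookup table are hypothetical names, and "processing" is reduced to matching an observation against previously learned patterns.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Signal:
    sense: str    # where the data came from, e.g. "eyes" or "ears"
    payload: str  # the raw observation


def run_pipeline(
    collect: Callable[[], Iterable[Signal]],
    patterns: dict[str, str],
    act: Callable[[str], None],
    default: str = "ignore",
) -> int:
    """Collect -> process -> act. Returns how many signals were handled."""
    handled = 0
    for signal in collect():                             # collection
        decision = patterns.get(signal.payload, default)  # processing: match a learned pattern
        act(decision)                                     # action
        handled += 1
    return handled


# Usage: unknown observations fall back to the default action.
actions: list[str] = []
patterns = {"red light": "stop", "horn": "look around"}
count = run_pipeline(
    lambda: [Signal("eyes", "red light"), Signal("ears", "horn"), Signal("nose", "coffee")],
    patterns,
    actions.append,
)
```

Keeping the three stages as separate callables mirrors the point of the analogy: the collector does not need to know how the data will be processed or acted upon.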
If we look at artificial systems, we see a similar sequence of events. Here are some examples from our daily lives:
Uber
The system collects trip data and fares, along with the personal information we provide during signup. Based on this, it builds a persona and acts on it by offering differentiated pricing.
Amazon
Based on our past purchases and browsing history, Amazon provides personalised recommendations through various ML engines.
WhatsApp
WhatsApp collects our messages, albeit in an “encrypted format”, and supposedly uses them to personalise ads on Facebook. This is speculation, but the intent here is to show what can be done.
Google
Google Search, Maps, etc.: the whole business model is built on collecting data and serving personalised ads.
I want to focus on the collection aspect. All these systems have a separate data collector.
What is a data collector supposed to do? Collect data. Yes, at its core, that is its job. But with a variety of data sources providing data in various formats, simple collection is not enough; more capability is required. Think of Uber! It now lets drivers and passengers issue commands via speech, and those commands need to be both authenticated and authorised.
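Here is a minimal sketch of a collector that authenticates a source, checks it is authorised to send that kind of data, and only then hands the record to the processor. The class name, the shared-secret token scheme, the per-source permission sets and the in-memory `Queue` standing in for the processor are all assumptions for illustration, not how Uber or any specific product does it.

```python
from __future__ import annotations

import hmac
import time
from dataclasses import dataclass, field
from queue import Queue


@dataclass
class Record:
    source_id: str
    kind: str       # e.g. "voice_command" or "gps" (hypothetical kinds)
    payload: bytes
    received_at: float = field(default_factory=time.time)  # lets the processor judge timeliness


class DataCollector:
    """Hypothetical collector: authenticate, authorise, then forward to the processor."""

    def __init__(self, tokens: dict[str, str], permissions: dict[str, set[str]]) -> None:
        self._tokens = tokens            # source_id -> shared secret token (assumed scheme)
        self._permissions = permissions  # source_id -> data kinds it may send (assumed)
        self.outbox = Queue()            # stands in for the processor's input queue
        self.rejected = 0

    def _authenticated(self, source_id: str, token: str) -> bool:
        expected = self._tokens.get(source_id)
        # compare_digest compares in constant time, so timing does not leak the token
        return expected is not None and hmac.compare_digest(expected, token)

    def _authorised(self, source_id: str, kind: str) -> bool:
        return kind in self._permissions.get(source_id, set())

    def collect(self, source_id: str, token: str, kind: str, payload: bytes) -> bool:
        """Accept and forward the record only if both checks pass."""
        if not (self._authenticated(source_id, token) and self._authorised(source_id, kind)):
            self.rejected += 1
            return False
        self.outbox.put(Record(source_id, kind, payload))
        return True
```

For example, a driver allowed to send voice commands is accepted, while a wrong token or an unexpected data kind is rejected and counted, which gives a simple signal to watch for security problems.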
So, a data collector collects data from authorised sources and passes it to the processor in a timely manner. If I were to track the health of a data collector, I would look at the following metrics:
Health - Is my data collector functional, and can it handle the load as expected? If it is too slow, the user experience suffers.
Availability - No system is 100% available, so prioritisation is required based on users, their needs, or revenue from the company’s perspective. For example, Uber might choose to provide a highly resilient system for passengers travelling to hospitals, so that the route can be tracked or cleared by traffic controllers.
Time to set up - How easily can the system be set up? This is where user research is extremely important. For example, some Uber riders may prefer voice controls, while others prefer text.
Tickets - The developer of the system might do everything right on their end, yet changes in the user's environment can still impact collection. Take the case of a bad network. As product builders, we need to build resiliency into the product to handle unforeseen circumstances in the user's environment. If possible, predict and prevent; this requires the right data infrastructure to be in place. In the worst case, if the system breaks, provide the right enablement to help the user troubleshoot.
Security breaches - Need I say anything? Data is precious.
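The health and availability metrics above can be made concrete with synthetic probes against the collector. This is a sketch under assumptions: the `Probe` record, the nearest-rank p95 and the thresholds passed to `is_healthy` are illustrative choices, not a standard definition.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Probe:
    """One synthetic health check against the collector (hypothetical)."""
    ok: bool           # did the collector respond at all?
    latency_ms: float  # response time; ignored when ok is False


def availability(probes: list[Probe]) -> float:
    """Fraction of probes the collector answered."""
    if not probes:
        return 0.0
    return sum(p.ok for p in probes) / len(probes)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def is_healthy(probes: list[Probe], latency_slo_ms: float, min_availability: float) -> bool:
    """Healthy = answers often enough AND p95 latency of answered probes meets the SLO."""
    latencies = [p.latency_ms for p in probes if p.ok]
    if not latencies:
        return False
    return (availability(probes) >= min_availability
            and percentile(latencies, 95) <= latency_slo_ms)
```

To reflect the prioritisation point, the same functions could be run per user segment, for instance with a stricter `min_availability` for hospital trips than for regular rides.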
It is important to keep an eye on these metrics because, though it may seem trivial, the collection system is at the heart of any product. Today, a resilient data collector (DC) is not a differentiator, but a bad one can certainly take the business down.
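One common way to build the resiliency mentioned for bad networks is to buffer records locally and retry with exponential backoff. The sketch below assumes a hypothetical `send` callable that raises `ConnectionError` on failure; the buffer size, attempt count and delays are illustrative defaults, not recommendations.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable


class ResilientSender:
    """Hypothetical sketch: buffer records locally and retry delivery with
    exponential backoff when the network (or the processor) is unreachable."""

    def __init__(
        self,
        send: Callable[[bytes], None],
        max_buffer: int = 1000,
        max_attempts: int = 5,
        base_delay_s: float = 0.5,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._send = send  # assumed to raise ConnectionError when delivery fails
        self._buffer: deque[bytes] = deque(maxlen=max_buffer)  # when full, the oldest record is dropped
        self._max_attempts = max_attempts
        self._base_delay_s = base_delay_s
        self._sleep = sleep  # injectable so tests do not actually wait

    def enqueue(self, record: bytes) -> None:
        self._buffer.append(record)

    def pending(self) -> int:
        return len(self._buffer)

    def flush(self) -> int:
        """Deliver buffered records in order and return how many were sent.

        Stops early, keeping the undelivered record at the head of the buffer,
        if it still fails after max_attempts tries."""
        sent = 0
        while self._buffer:
            record = self._buffer[0]
            for attempt in range(self._max_attempts):
                try:
                    self._send(record)
                    break
                except ConnectionError:
                    if attempt < self._max_attempts - 1:
                        # with the defaults: wait 0.5 s, 1 s, 2 s, ... before retrying
                        self._sleep(self._base_delay_s * 2 ** attempt)
            else:
                return sent  # network still down: try again on a later flush
            self._buffer.popleft()
            sent += 1
        return sent
```

The bounded `deque` is a deliberate trade-off: on a long outage the collector loses the oldest data rather than exhausting the device's memory. Counting those drops would also feed naturally into the tickets metric.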
Here are two types of collectors used today.