As product people, we all deal with a lot of data. Here are some questions I’ve learned to ask through my work on full-text search products; they apply whether I’m dealing with a search implementation, adoption analytics, or setting up statistical experimentation on a new feature.
Six Questions to Ask
Is it clean? I recall looking at a report on top search queries where the first several hundred entries were null queries with sub-second timestamps. What we eventually figured out was that various code modules were using the search APIs to populate screen data at high speed. This wasn’t a bad practice for read-only page renderings, nor was it a case of garbage in. It simply meant we needed to re-instrument a bit so that the captured analytics reflected searches issued by living, breathing human beings rather than a rapid series of automated null queries.
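To make that concrete, here’s a minimal sketch of the kind of filter we ended up needing. It assumes a hypothetical event log where each event carries a client_id, query_text, and timestamp; those field names and the one-second threshold are placeholders for illustration, not our actual instrumentation.

```python
def human_queries(events):
    """Drop events that look like automated screen renders, not real searches.

    Assumes each event is a dict with 'client_id', 'query_text', and a
    'timestamp' in seconds -- hypothetical fields for this sketch.
    """
    last_seen = {}  # client_id -> timestamp of that client's previous event
    kept = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        previous = last_seen.get(event["client_id"])
        last_seen[event["client_id"]] = event["timestamp"]
        if not event["query_text"]:
            continue  # null/empty query: a page render, not a person searching
        if previous is not None and event["timestamp"] - previous < 1.0:
            continue  # sub-second burst from the same client: almost certainly UI code
        kept.append(event)
    return kept
```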
Is it normal? January has three more days than February, and that three-day difference alone can create a significant discrepancy in month-to-month comparisons. This is resolved via normalization, a process that puts metrics on a common scale (for instance, converting monthly totals into per-day rates) so individual line items can be compared fairly. Put another way, normalizing your data ensures you’re not incorrectly comparing extra-large apples to small apples when what you really want is to compare mid-sized apples to mid-sized apples.
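As a minimal sketch of one common normalization, here are made-up monthly totals converted into per-day averages before comparing months of different lengths:

```python
import calendar

# Made-up monthly sales totals: (year, month) -> total units sold.
monthly_sales = {(2023, 1): 9300, (2023, 2): 8400}

# Normalize to a per-day rate so 28-day February isn't penalized
# for simply having fewer days than 31-day January.
per_day = {
    (year, month): total / calendar.monthrange(year, month)[1]
    for (year, month), total in monthly_sales.items()
}

print(per_day)  # {(2023, 1): 300.0, (2023, 2): 300.0} -- the apparent "drop" disappears
```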
Is it seasonal? While December and January each have 31 days, it’s not hard to see why sales might look like they’re falling off a cliff from one month to the next, or why returns might spike in the latter month. Other seasonal effects may not be as easy to detect, so just keep in mind that they may explain significant swings in your measures.
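One simple way to sanity-check for seasonality is to compare each month against the same month a year earlier rather than against the previous month; the figures below are purely illustrative:

```python
# Illustrative sales figures only.
sales = {
    ("2022", "Dec"): 120_000, ("2023", "Jan"): 70_000,
    ("2023", "Dec"): 132_000, ("2024", "Jan"): 77_000,
}

mom = sales[("2024", "Jan")] / sales[("2023", "Dec")] - 1  # month-over-month
yoy = sales[("2024", "Jan")] / sales[("2023", "Jan")] - 1  # year-over-year

print(f"MoM: {mom:+.0%}")  # roughly -42%: looks like falling off a cliff
print(f"YoY: {yoy:+.0%}")  # +10%: January is simply always slower than December
```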
What about outliers? Remember my first example of cleaning data? We figured out we had an issue because it was a deviation from the norm. This is where various mathematical and statistical models and algorithms come into play, and which ones you employ depends very much on the nature of your data. This is why most awesome organizations have at least one data scientist on board; they are worth their weight in gold in helping you figure out which tool is best for the job (in other words, pick up the tab when you take them out to lunch … often!).
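For illustration only, here’s a minimal sketch of one of the simplest checks, flagging values more than two standard deviations from the mean; real data often calls for more robust methods (IQR, median absolute deviation, isolation forests), which is exactly the kind of choice a data scientist can help you make. The counts are made up.

```python
import statistics

# Illustrative daily query counts; one day clearly doesn't belong.
daily_query_counts = [980, 1010, 995, 1023, 1002, 7450, 1008]

mean = statistics.mean(daily_query_counts)
stdev = statistics.stdev(daily_query_counts)

# Flag anything more than two standard deviations from the mean.
outliers = [x for x in daily_query_counts if abs(x - mean) / stdev > 2]
print(outliers)  # [7450] -- e.g., the day UI code started hammering the search API
```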
Is it contextual? How do you interpret a query for “a formal suit”? It depends. If you’re talking to a teenager looking forward to the prom, the meaning is entirely different from that of a legal assistant engaged in client research. You and I have other senses and reasoning capabilities to help us figure that out. Computers, by their underlying binary nature, aren’t so attuned to such subtleties. This is why more mature data ingestion systems eventually include a natural language processing pipeline of some sort, or at least leverage tools such as word2vec rather than a plain bag-of-words in their treatment of incoming text.
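As a rough illustration (using gensim’s Word2Vec, one option among many), the sketch below trains word vectors on a toy corpus. Unlike a bag-of-words, which treats every term as an isolated token, word2vec learns from the company a word keeps, so co-occurring terms land near each other and give downstream systems some surrounding context to work with. The corpus and parameters are placeholders, not a recommendation.

```python
from gensim.models import Word2Vec

# Toy corpus standing in for real query or document text.
corpus = [
    ["formal", "suit", "prom", "tuxedo", "dance"],
    ["rent", "a", "tuxedo", "for", "the", "prom"],
    ["civil", "suit", "plaintiff", "filing", "court"],
    ["class", "action", "suit", "court", "damages"],
]

# Train small vectors; the parameters here are illustrative, not tuned.
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=100)

# Terms that share contexts end up near each other in the vector space --
# information a plain bag-of-words representation simply doesn't carry.
print(model.wv.most_similar("prom", topn=3))
```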
Is it actionable? How can you spot a first-time user of Tableau or Kibana? Usually, the first sign is a dashboard that looks like a hoarder’s garage, with everything but the kitchen sink thrown in. The other sure sign is a focus on what’s been labeled ‘vanity analytics’: data that makes us feel good but isn’t really all that helpful in shaping more productive behaviors. For that, you want to focus on actionable measures.
TL;DR
Whether you’re creating KPIs for your newest feature, piping data through an NLP pipeline into a full-text search service, or building out a Siri-like recommendation system, my point is simply that while data is awesome, it also requires you as a product manager to ask important questions about its quality and reliability.
YMMV