18 ways to turn analysis projects into a nightmare
Posted: 05 Sep 2016 12:49 AM PDT
Every week, we read news about failed analysis projects. If you listen carefully, you can hear the grunts, screams and curses of thousands of analysts all over the world wrestling with their analysis nightmares.
At Chandoo.org, we talk a lot about best practices for data analytics. So today, let’s peek into the dark side and understand the mistakes that can turn your analysis project into a nightmare.
There are 3 parts in any analysis project
To understand these worst practices in the analysis world, first let’s break analysis projects into 3 parts.
- Requirements
- Data Structure
- Tools & Construction
Let’s dive deep into each area of the analysis project to see what can go wrong.
Requirements – Worst practices
Any analysis work must begin with requirement gathering. This is when you set out the objective for your work. So what can go wrong at this stage?
- Asking wrong questions: Start with a wrong question and you will end up analyzing the wrong things. It’s that simple.
- Asking wrong people: In most analysis projects, the people who will use the outputs (consumers) are not the people who pay for them (buyers). For example, you may be designing a customer service dashboard that will be used by supervisors and store managers, but the people paying for it could be in the marketing department. So if you ask the wrong people (i.e. buyers, gatekeepers, figureheads) instead of the right ones (end users), you will end up with the wrong set of requirements.
- Not enough whys: Anytime you get a requirement from your user, follow up with, “Why do you need this…?”. Ask it at least a few times until you hit the bottom. This will give you a correct idea of what your users actually need.
- Too much or too little detail: If you gather too much detail during the requirements phase, you will waste a lot of time on tiny things and may lose the big picture. On the other hand, with too little detail you will end up making too many assumptions. So what is the right level of detail? This comes with practice. The best way to tackle it is to collect everything, prioritize the needs and build only for the top 5–10 requirements. Once you have a version ready, showcase it and iterate.
Data Structure – Worst practices
Once you know what your user wants, you can start going through the data to prepare the analysis. So what can go wrong at this stage? Oh, so much.
At this stage, there are usually 2 kinds of mistakes.
Table level worst practices:
- Not using tables: If your data is not structured as tables (either well-organized ranges or Excel Tables), you are asking for trouble.
- Separated data: Data that belongs together, but is split into multiple sets. For example, student data maintained in a separate spreadsheet (or workbook) for each class just makes the analysis work extra hard.
- Duplicate data: The same data appearing several times across various data sets is a common problem. If you don’t know what is duplicated, you end up doing lots of extra steps even before starting the analysis.
- Wrong structure: Tall and narrow. That is what works best for tables. Few columns, lots of rows. Anytime you have a structure that is not like this (i.e. a pivot-style table as source data, rows with a variable number of columns, etc.), you are in for a long night and gallons of coffee.
- Too much data: Want to analyze the trends for 2016, but have data from 1990? You have too much data. Trim it down to just what you need before starting your formula engine, or else you will be stuck in “Calculating: x%” hell.
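The “tall and narrow” advice above applies in any tool, not just Excel. Here is a minimal Python sketch of the same idea (the store names, months and numbers are invented purely for illustration): unpivoting a wide, pivot-style table into tall, narrow rows that formulas and pivot tables digest easily.

```python
# Unpivot a wide, pivot-style table (one column per month) into the
# tall, narrow shape (one row per store per month) that is easiest
# to analyze. Sample data and column names are made up for illustration.

wide_rows = [
    {"store": "North", "Jan": 120, "Feb": 95, "Mar": 140},
    {"store": "South", "Jan": 80, "Feb": 110, "Mar": 75},
]

# Every (month, value) pair in a wide row becomes its own tall row.
tall_rows = [
    {"store": row["store"], "month": month, "sales": value}
    for row in wide_rows
    for month, value in row.items()
    if month != "store"
]

for record in tall_rows:
    print(record)
```

In Excel, Power Query’s Unpivot Columns command does the same reshaping with a few clicks.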
Row / cell level worst practices:
- Multiple values in a cell: IF(cell has more than 1 value, analyst SAD, analyst HAPPY). Parsing cells and splitting values is tricky, error-prone and a waste of time. If possible, avoid it at the source by keeping one value per cell.
- Inconsistent formats: This is quite common for date values, but you can have it for other types too. A date maintained as 1-SEP-2016 in one cell, 01/09/2016 in another and 01-09-2016 in yet another spells lots of trouble.
- Blanks: Cells, rows or columns that have arbitrary blank values are hard to track and fix.
- Merged cells: These can cause a lot of heartache and slow down your workbooks. Keep merged cells to the headers / display parts, and don’t merge anything inside the data section.
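When you cannot fix inconsistent formats at the source, you end up normalizing them yourself. A quick Python sketch of one defensive approach (the list of formats is an assumption; extend it to match whatever your data actually contains): try each known date format in turn until one parses.

```python
from datetime import datetime

# Candidate formats, matching the examples in the text above
# (1-SEP-2016, 01/09/2016, 01-09-2016). This list is an assumption;
# add whatever formats your own data uses.
KNOWN_FORMATS = ["%d-%b-%Y", "%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d"]

def parse_date(text):
    """Try each known format until one parses; fail loudly otherwise."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {text!r}")

# All three inconsistent spellings normalize to the same date.
print(parse_date("1-SEP-2016"))
print(parse_date("01/09/2016"))
print(parse_date("01-09-2016"))
```

Failing loudly on an unknown format is deliberate: silently guessing a date is exactly how analysis nightmares start.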
Tools & Construction – Worst practices
Even if you have awesome requirements and perfect data, you can still screw up your analysis project by choosing the wrong tools and following shabby construction practices. So let’s see what these are:
- Using wrong tools for the job: Imagine writing your own VBA code to find a matching value in a column and return the corresponding value from another column. You would spend lots of time thinking about all the scenarios for the code. But hey, why not use VLOOKUP() instead? By using the wrong tools for the job, you end up doing extra work and getting poor results. Use the mapping below to find the best tools (from best to good, in that order) for any part of your analysis work.
- Loyalty: If you become too loyal to a particular tool or way of doing things, you will end up with extra work even for simple problems. For example, if you are a super fan of Excel formulas and use them for everything, you would end up writing long formulas for simple things like removing duplicates from a list. The same can be achieved with just a few clicks using Power Query, Pivot Tables or Remove Duplicates. Likewise, if you want to use pivot tables for everything, you may end up compromising on certain types of analysis because they are simply not possible with pivot tables.
- Volatile Workbooks: Using too many OFFSETs, NOWs and other volatile functions in your complex workbook can slow things down to a grinding halt. Use smarter, non-volatile functions (such as INDEX) where possible.
- Spaghetti Spreadsheets: You know the kind, where the formulas are an interconnected hot mess. Once you make spaghetti of your workbook formulas, you can say goodbye to maintainability. Layer your workbooks like a pizza instead: build formulas / logic one on top of another in an orderly fashion.
- Lazy referencing: Writing your formulas with A:A, 1:1, etc. is lazy. Once you are lazy, your analysis gets crazy. Avoid lazy referencing by using Tables or dynamic references instead.
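The VLOOKUP point above (don’t hand-roll a search when a built-in lookup exists) translates directly to other tools. A minimal Python sketch, with invented product names and prices: instead of looping through rows to find a match, build an index once and look values up directly.

```python
# Instead of writing a loop that scans every row for a match (the
# Python equivalent of hand-rolled VBA), build a dictionary once and
# look values up in constant time. Products and prices are invented
# for illustration.

price_table = [("Apples", 1.20), ("Bananas", 0.50), ("Cherries", 3.75)]

# One-time index build -- like pointing VLOOKUP at a lookup table.
price_by_product = dict(price_table)

def lookup_price(product):
    """Return the price, with a default for missing products
    (similar in spirit to wrapping VLOOKUP in IFERROR)."""
    return price_by_product.get(product, 0.0)

print(lookup_price("Bananas"))
print(lookup_price("Mangoes"))
```

The design point is the same in both worlds: let the tool’s built-in lookup handle matching, missing values and edge cases, rather than re-inventing it.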
50 ways to make your analysis projects awesome:
Of course, for every worst way to analyze data, there are dozens of ways to make your analysis awesome. If you want to learn the best ways to analyze any kind of data, along with a deep dive into advanced Excel features and case studies, check out our 50 ways to analyze data program. This online course will make you awesome in analytics.
Share your analytics worst practice:
Do you come across any of the above worst practices in your day-to-day work? What is the worst you have seen? Please share in the comments section so we can all learn from these mistakes.