I wanted to understand at a higher level what we do as business analysts. During a placement you can feel a bit like a Tableau monkey, just executing what a stakeholder tells you. But I think it’s really important to understand our “why” and purpose, and I ended up having some really interesting conversations with people I have worked with.
What does Business Intelligence mean?
“Business intelligence is a technological term that covers data, computing and analytics within business operations. Much more than a specific “thing”, business intelligence is rather an umbrella term that covers the processes and methods of collecting, storing and analysing data from business operations or activities to optimise performance.” – https://www.tableau.com/en-gb/learn/articles/business-intelligence?__src=liftigniter&__widget=blog-widget&li_source=LI&li_medium=blog-widget
What is our role?
In-betweeners – we work as the facilitators between end users and data scientists. We don’t ‘mine’ the data or look for data (i.e. collating different data sources, predictive modelling, number crunching) – this would be what data scientists do.
What we do is take the data we are given and try to find value/insight to give to end users. But how do we know what counts as valuable insight? It is a chicken-and-egg conundrum and will always be a back-and-forth process. It is up to us as analysts to understand the business need and to communicate it to the data scientist/engineer to get the correct data.
Therefore, key skill = communication!
Future role of analytics
- being able to do ETL, so that data scientists can focus on more interesting and harder, complex work e.g. more machine learning, AI etc.
Tableau is now edging towards ETL with Prep. Yes, it is not that great at the moment, but what about in 5-10 years’ time? What does this mean for the role of the data analyst? It means that we can take the laborious job of ETL away from data scientists, freeing them up to do more of the harder, complicated stuff e.g. modelling, predictive analytics, machine learning, AI etc.
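To make "ETL" a bit more concrete, here is a minimal sketch of the three steps using only the Python standard library. This is not how Tableau Prep works internally; the CSV content, column names and the `orders` table are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the column names and values are made up for illustration.
RAW_CSV = """order_id,region,amount
1,north,10.50
2,South ,7.25
3,north,
4,SOUTH,2.25
"""

def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount, tidy region names, cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["region"].strip().lower(), float(row["amount"]))
        )
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a table that a BI tool could query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

The point is that the analyst only ever queries the cleaned `orders` table; the messy region names and the incomplete row are dealt with once, in the transform step.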
One of my colleagues from The Information Lab blew my mind on this concept and made me think of the future of work very differently – https://www.oecd.org/employment/future-of-work/Future-of-work-infographic-web-full-size.pdf
In the meantime, the more skilled end users get in understanding data, the more questions/demand there will be for analysts.
What other components are there to being a data analyst? What other skills? What other roles do they play? How do we see our role changing in 5-10 years’ time?
Infrastructure and Data Eco-systems
Why do companies need systems in place?
All about utilising and exploiting data (living in the digital era, so don’t get left behind competitors). But really, it is about finding VALUE.
What is valuable?
Getting water from a stream = not valuable
BUT, after processing it and cleaning it = becomes valuable and consumable
Same thing with data!
Side-thought: what is classified as valuable? The answer is part of the iterative process: as it develops, more questions get asked and more people become skilled and data literate, so companies can begin to narrow this definition and become more focused and targeted in their system design choices.
Limitation: infrastructure is not always led by design; it often grows through a more iterative process. This can mean companies are limited when it comes to changing/shifting systems, as they are reliant on what has already been built.
Key stages within a data eco-system:
1) Data collection
2) Data capture -> a way of transporting data from the sources to data storage
3) Data storage -> storage platforms e.g. cloud, database, data warehouse – can find differences here: https://panoply.io/data-warehouse-guide/the-difference-between-a-database-and-a-data-warehouse/
Key considerations are size and type of data:
Need a system that can handle various types of data e.g.
Structured – schemas/tables
Semi-structured – JSON, XML, etc.
Unstructured – video, MP3, images
Examples: Hadoop, AWS, Spark, Hive etc.
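A rough sketch of what the three types of data look like in practice, using the Python standard library. The table, the JSON payload and the byte values are invented examples, not taken from any of the systems mentioned above.

```python
import json
import sqlite3

# Structured: a fixed schema, every row has the same typed columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")
structured_row = conn.execute("SELECT id, name FROM users").fetchone()

# Semi-structured: self-describing, fields can be nested or missing.
# (Hypothetical event payload, made up for illustration.)
event = json.loads('{"user": {"id": 1}, "tags": ["signup", "web"]}')
tag_count = len(event["tags"])
has_email = "email" in event["user"]  # nothing forces this field to exist

# Unstructured: just bytes; meaning has to be extracted by other tooling.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file
```

This is why storage choice matters: a table can answer a query straight away, JSON needs to be parsed and checked for missing fields, and raw bytes need something else entirely before an analyst can use them.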
Concept of Data Temperature:
Hot, warm and cold data? – Concept by David A Spezia
Cold being data that is quite hard to get/pull out e.g. Hive, Hadoop. Tableau is breaking this convention by allowing analysts to connect to these data sources more easily, whereas previously this was very hard to do.
4) Analytics -> WHERE WE LIVE, WHOOP! Once we have all the data, turn it into insight and something communicable/tangible.
Analytics – Tableau and Alteryx
Where do Tableau and Alteryx sit within this data structure?
This is the Tableau view of a data ecosystem, where Tableau and Alteryx probably sit more towards the right hand side.
Interesting angle from a CTO at one of the companies I worked with: push vs pull of data. Tableau and Alteryx can’t handle streaming data at all (i.e. you’re always pulling data from a data source). A good example is the Twitter Firehose versus the Twitter API, and the differences between them: https://brightplanet.com/2013/06/25/twitter-firehose-vs-twitter-api-whats-the-difference-and-why-should-you-care/
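The push vs pull difference can be sketched in a few lines of Python. The `PullSource` and `PushSource` classes below are hypothetical stand-ins written for illustration; they are not the Twitter API or anything from Tableau or Alteryx.

```python
class PullSource:
    """Pull: the consumer asks for whatever is available, on its own schedule."""

    def __init__(self, records):
        self._records = list(records)

    def fetch(self):
        # Hand over everything collected so far and start again from empty.
        batch, self._records = self._records, []
        return batch


class PushSource:
    """Push: the source calls subscribers as soon as a record arrives."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, record):
        for callback in self._subscribers:
            callback(record)


# Pull: nothing happens until we ask.
pull = PullSource(["tweet 1", "tweet 2"])
pulled = pull.fetch()

# Push: we register once, then records are delivered to us as they appear.
pushed = []
push = PushSource()
push.subscribe(pushed.append)
push.publish("tweet 3")
push.publish("tweet 4")
```

With pull, the data is only as fresh as the last time you asked; with push, the consumer has to be ready to receive records whenever the source decides to send them.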
How is Tableau being deployed? Automating this? How can we automate some of our consultancy engagements?
What is this thing? Would love to get some ideas on how this is being understood by clients and what we are doing to assist with that: is it establishing internal standards and best practices, or is it more about legislation and regulation (GDPR)? Demands can differ across industries -> banks vs tech start-ups. Is it technical vs business, industry vs generic, regulation vs operational?
Spoke with the Head of Data Infrastructure at one of the companies I have worked with, about their experiences and considerations when maintaining and governing their data infrastructure and eco-system. Got loads of interesting insight from the 30 min conversation we had:
–> Their ecosystem was built “through evolution rather than by design”.
Problems encountered -> solutions were built ad hoc and on the fly, without really thinking about how that solution fit into the overall structure/system.
How do they decide which products to go with?
- Mostly tied to other existing products and what they support. E.g. if a product supported AWS and not Google Cloud, they would opt for AWS as it is easier to integrate with the current system.
Problems with this approach
- Multiplication of products and systems. You end up with a variety of products that essentially do the same thing because of existing products and their limitations. It is also costly and logistically very difficult to switch from one product to another.
- Hard to have a common policy of development, usage and retention -> e.g. how long do I need to keep data stored for, given that it gets expensive to store data for a long time? People ask them whether that data is useful. They say “don’t know, is there something useful there?”, which is a problem in itself -> no one knows what is useful.
Part of their job is to bring those systems together and to make people more conscious of using common data storage in the future.
What does their ecosystem look like?
1) Data source:
- Internal platforms i.e. data they have collected on their own via their website and the applications they have built to track usage
- External – suppliers that they pay for data
- APIs and web scraping
2) Data Capture:
- Streaming technologies e.g. Kafka, and cloud provider versions e.g. Kinesis (Amazon)
3) Storage Platforms:
- AWS – storage of object files (think of a disc drive), using Amazon S3.
What is the main purpose of collecting this data?
To improve decision-making processes. They would often have a gut feeling that something was not working, but now they actually have data to back that up.
Mostly used by: senior business decision makers e.g. CTO, CFO and heads of departments.
Also used for process monitoring to see what is taking up a lot of space and time to run.