This article builds on parts 1 and 2 of the technology series on the data value chain. In the first two parts, Dilyana Bossenz and Björn Leffler from m2 have already addressed data purpose and data visualization as well as data transformation and data analysis.
ERP systems provide standard reports. Today, reports are expected to be available online and in real time, and they are often enriched with eye-catching graphics. However, that alone does not make them a “real” dashboard. There are key differences between a report and a dashboard.
A dashboard is supposed to highlight the most important information from the abundance of data and variables. True, but is that all a dashboard is meant to do? A dashboard must be the result of considerations tied to a digital strategy: it needs to show whether the strategy being applied is producing results. If it doesn’t, it is more of a report, assessment, overview or summary of the company’s activities, but that doesn’t make it a dashboard. It is the close connection with the company strategy that makes a dashboard a dashboard!
In part two of this article series, we looked at steps three and four in the data value chain. As an example, we focused on a fictitious printing company with 100 employees. The print shop wants to build a dashboard and asks itself the following question: In what quantity were print products such as brochures, flyers or postcards produced in Germany over the past five years (2016 – 2021)?
This blog article, the last part of the series, is about preparing the data and data sources for the final data analysis.
Data Prep & Integration
How do I connect and load different data sources?
Nowadays, all BI tools offer a connection to the most common databases and can read files in different formats.
In our example, we need certain data from the database: number of pieces, printed products, location and date. We can retrieve this data with a MySQL query. There are two ways to do this:
- Accessing the data directly in the database and then exporting it as a CSV file.
- Connecting to the database through Tableau, which makes the data available directly in Tableau.
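The first option, querying the data and exporting it as CSV, can be sketched in a few lines of Python. This is a minimal illustration only: it uses the standard library’s sqlite3 module as a stand-in for MySQL (a real setup would use a MySQL driver instead), and the `production` table and its columns are hypothetical names chosen to match the fields our question needs.

```python
import csv
import sqlite3  # stand-in for MySQL in this sketch; a real setup would use a MySQL driver

# Hypothetical production table with the fields our question needs.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE production (
        product  TEXT,     -- e.g. 'brochure', 'flyer', 'postcard'
        quantity INTEGER,  -- number of pieces
        location TEXT,
        date     TEXT      -- ISO date of the production run
    )
""")
conn.execute("INSERT INTO production VALUES ('flyer', 5000, 'Berlin', '2021-03-15')")

# Query the data directly and export the result as CSV.
rows = conn.execute("SELECT product, quantity, location, date FROM production")
with open("production_export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["product", "quantity", "location", "date"])  # header row
    writer.writerows(rows)
```

The resulting CSV file can then be read by any BI tool, including Tableau.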
In practice, data from different data sources often needs to be merged. For example, part of your data may live in an Excel file on your computer and part in a database. If answering your questions requires both data sources, Tableau Desktop lets you connect them in one place.
Note here: if the data comes from different data sources, the tables may have different formats and aggregation levels, and the structure of the data and data sources may differ significantly. To clean the data and reconcile these differences so that it is well prepared for analysis, a tool such as Tableau Prep can be used.
Tableau Prep is another tool in the Tableau portfolio.
This tool provides simple and convenient ways to merge data from different data sources and aggregation levels without programming knowledge. As in Tableau Desktop, Tableau Prep also provides the possibility to connect to data sources, regardless of whether they are stored locally on the computer or in a database.
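To make the idea of merging sources over a shared key concrete, here is a minimal sketch in plain Python. It again uses the standard library’s sqlite3 module as a stand-in for a real database, and the `orders`/`products` tables and their columns are hypothetical; in practice, Tableau Desktop or Tableau Prep would perform this join without any code.

```python
import sqlite3

# Hypothetical example: order quantities live in the database,
# while product names were maintained in a local CSV/Excel export.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (product_id INTEGER, quantity INTEGER);
    INSERT INTO orders VALUES (1, 5000), (2, 2500);
""")

# Rows as they might come out of the local file (hard-coded here for brevity).
file_rows = [(1, "flyer"), (2, "postcard")]
conn.execute("CREATE TABLE products (product_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)", file_rows)

# Merge both sources over the shared key column.
merged = conn.execute("""
    SELECT p.name, o.quantity
    FROM orders o JOIN products p ON o.product_id = p.product_id
    ORDER BY o.product_id
""").fetchall()
print(merged)  # [('flyer', 5000), ('postcard', 2500)]
```

The essential point is the shared key column (`product_id`): without it, the two sources could not be combined reliably, regardless of the tool used.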
How do I need to model the data?
Data can be modelled in different ways. A distinction is made between the following options:
- Flat data model: A two-dimensional matrix of data elements.
- Hierarchical model: Data is stored in a tree-like structure. Each entry has exactly one parent, up to the root.
- Network model: This model builds on the hierarchical model but also allows n:m relationships, which are mapped via link records.
- Relational model: Data is organized in tables (relations) of rows and columns, with constraints on the permitted values and value combinations; tables are related to each other via key columns.
- Star schema model: A central fact table holding the measures is linked to denormalized dimension tables holding descriptive, low-cardinality attributes; this layout is optimized for data aggregation.
- Data vault model: Long-term, historized data from various data sources is arranged in hub, satellite and link tables and related across them.
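To make one of these options tangible, the following sketch builds a tiny star schema: a central fact table referencing a dimension table, then aggregates across the join. It uses the standard library’s sqlite3 module purely for illustration, and all table and column names (`dim_product`, `fact_production`, …) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive, low-cardinality attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Central fact table: measures plus foreign keys into the dimensions.
    CREATE TABLE fact_production (
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity   INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'flyer');
    INSERT INTO fact_production VALUES (1, 5000), (1, 2500);
""")

# Aggregate the measures, grouped by a dimension attribute.
totals = conn.execute("""
    SELECT d.name, SUM(f.quantity)
    FROM fact_production f JOIN dim_product d USING (product_id)
    GROUP BY d.name
""").fetchall()
print(totals)  # [('flyer', 7500)]
```

A real star schema would have several dimension tables (product, location, date) around the fact table, but the pattern of joining and aggregating stays the same.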
The decision to model data is influenced by various factors. We need to know which database we are using. And we should be clear about which visualization we want to create in the end.
For our example, the relational model is the natural choice, since we use MySQL as the storage location for our data. In this model, the data is organized by topic. For example, authors and designers (names, addresses, categorization) are stored in one table, publisher information in a separate table, and further information in additional tables. In this way, several tables are created, each covering a specific topic. Separating the information into different tables keeps our database queries fast and performant. For this to work, each table should contain a column with a unique key, usually an ID number. Using these key columns, we can join tables and build a customized result set.
The figure below shows such a schema. The schema defines what data is stored in the database and how this data is related to each other. The process of creating a schema is called data modelling.
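The key-column mechanism described above can be sketched as follows. As before, sqlite3 stands in for MySQL, and the example data (`authors`, `publishers`, the names) is invented for illustration.

```python
import sqlite3  # stand-in for MySQL in this sketch

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each topic gets its own table; the ID columns act as unique keys.
    CREATE TABLE publishers (
        publisher_id INTEGER PRIMARY KEY,
        name         TEXT
    );
    CREATE TABLE authors (
        author_id    INTEGER PRIMARY KEY,
        name         TEXT,
        publisher_id INTEGER REFERENCES publishers(publisher_id)
    );
    INSERT INTO publishers VALUES (10, 'Example Press');
    INSERT INTO authors VALUES (1, 'A. Author', 10);
""")

# The key columns let us combine the separate tables into one result set.
result = conn.execute("""
    SELECT a.name, p.name
    FROM authors a JOIN publishers p ON a.publisher_id = p.publisher_id
""").fetchall()
print(result)  # [('A. Author', 'Example Press')]
```

Queries against the small, topic-specific tables stay fast, and the join reassembles the full picture only when it is needed.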
Now we have defined our question from the beginning of the process and know what data is needed. The initial question we asked was in what quantity print products such as brochures, flyers or postcards were produced in Germany in the past five years (2016 – 2021).
To answer the question, we need the following data: print products, number of pieces, location and date. If this information is available within the company, it must be clarified how access to the data can be granted. Depending on the evaluation requirements, publicly available external data can also be added to the internal company data. In our example, however, the required data comes exclusively from the internal data source.
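Once the data is accessible, the initial question reduces to a simple aggregation over the date range. The sketch below shows this with sqlite3 standing in for MySQL; the `production` table, its columns, and the sample rows (German locations, quantities) are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE production (
        product TEXT, quantity INTEGER, location TEXT, date TEXT
    )
""")
conn.executemany("INSERT INTO production VALUES (?, ?, ?, ?)", [
    ("brochure", 3000, "Berlin",  "2019-05-02"),
    ("flyer",    5000, "Hamburg", "2020-11-20"),
    ("postcard", 1500, "Berlin",  "2015-07-01"),  # outside the five-year period
])

# Total pieces per print product produced between 2016 and 2021.
rows = conn.execute("""
    SELECT product, SUM(quantity)
    FROM production
    WHERE date BETWEEN '2016-01-01' AND '2021-12-31'
    GROUP BY product
    ORDER BY product
""").fetchall()
print(rows)  # [('brochure', 3000), ('flyer', 5000)]
```

In Tableau, the same result is produced without SQL by dragging the product dimension and the quantity measure onto a view and filtering on the date field.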
In a self-service environment with multiple publishers, it is not uncommon for a project to contain a variety of content in the database that is similarly named or based on the same or similar underlying data. In addition, content is often published without descriptive information. For example, a report could have different names:
- 20221003 Report_v1
- 20221004 Report_v1
- 20221004 Report_v2
- 20221004 Report_final
The reason for the different names could be, for example, that minimal changes were made in each version. This complicates the work of analysts, who, faced with unclear file names or version histories, may lack confidence in the data they are supposed to use for their analysis and visualization.
To help your users find data that is reliable and recommended for their type of analysis, a clear and standardized naming syntax and delivery process should be part of data preparation.
Many organizations have developed their own conventions for designating the correct or relevant data sources. For example, a data source might carry a certification seal or be named according to a traffic-light scheme. Once this is in place, the analysis and visualization of the data can begin in Tableau.
This was the last part of the series on the data value chain. We wanted to give you an overview of how to proceed in a classic business intelligence project.
M2 technology & project consulting GmbH, M2 for short, was founded in Berlin in 2009. The company specializes in next-generation business intelligence solutions. As the first partner of Tableau Software in Germany and a long-standing partner of Alteryx, Amazon Web Services, Exasol and Snowflake, the company has since accompanied the development of modern BI platforms and data-driven projects along the entire data value chain at DAX corporations, SMEs, public sector companies and start-ups.
If you want to try Tableau yourself, join a free workshop for Tableau beginners.