Data Visualization in Data Science
In our increasingly data-driven world, it’s more important than ever to have accessible ways to view and understand data. After all, the demand for data skills in employees is steadily increasing each year. Employees and business owners at every level need to have an understanding of data and of its impact.
That’s where data visualization comes in handy. With the goal of making data more accessible and understandable, data visualization in the form of dashboards is the go-to tool for many businesses to analyze and share information.
Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. Additionally, it provides an excellent way for employees or business owners to present data to non-technical audiences without confusion. (tableau.com)
The advantages of data visualization include:
- Easily share information.
- Interactively explore opportunities.
- Visualize patterns and relationships.
The disadvantages include:
- Biased or inaccurate information.
- Correlation doesn’t always mean causation.
- Core messages can get lost in translation. (https://www.tableau.com/learn/articles/data-visualization)
Let’s jump into the case!
Shopedia is a large e-commerce company that sells a wide range of products. It wants to analyze its 2019 performance with help from the BI team. We have already cleansed the data, and it is ready to visualize in Looker Studio.
The first step is to open Looker Studio and click Blank Report to create a new dashboard.
Next, add a data source from one of the available options (mine is Google Sheets), then add a chart to start visualizing the data.
From the resulting dashboard, the BI team has the following findings:
1. Sales performance over time: in which month do our sales peak, and in which month do they drop the most?
Answer: Based on the graphs below, sales peaked in March, which had the highest distinct count of OrderID (993 orders) and the highest total revenue ($204K) compared to other months. Sales dropped the most in May, with a distinct OrderID count of 57 orders and total revenue of $9.2K.
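For readers who want to check the dashboard numbers outside Looker Studio, the monthly aggregation can be sketched in plain Python. This is only a minimal sketch: the row layout (order ID, month, price, quantity) and the sample values are assumptions made up for illustration, not Shopedia’s actual data, and revenue is assumed to be price times quantity.

```python
from collections import defaultdict

# Hypothetical sample rows: (order_id, month, price, quantity).
# These values are invented for illustration only.
rows = [
    ("A1", 3, 700.0, 1),
    ("A1", 3, 12.0, 2),   # second line item of the same order
    ("A2", 3, 150.0, 1),
    ("A3", 5, 12.0, 1),
    ("A4", 4, 100.0, 2),
    ("A5", 4, 50.0, 1),
]

def monthly_summary(rows):
    """Return {month: (distinct_order_count, total_revenue)}."""
    orders = defaultdict(set)      # month -> set of distinct order IDs
    revenue = defaultdict(float)   # month -> summed price * quantity
    for order_id, month, price, quantity in rows:
        orders[month].add(order_id)
        revenue[month] += price * quantity
    return {m: (len(orders[m]), revenue[m]) for m in orders}

summary = monthly_summary(rows)
# Rank months by total revenue to find the peak and the lowest month.
peak_month = max(summary, key=lambda m: summary[m][1])
low_month = min(summary, key=lambda m: summary[m][1])
print(summary, peak_month, low_month)
```

Using a set per month mirrors the distinct count on OrderID: an order with several line items is counted once, while every line item still adds to revenue.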
2. Which cities had the most and the fewest orders in 2019?
3. Which city contributed the most and the least to our revenue in 2019?
Answer: The graph below answers both questions 2 and 3. For question 2:
- The city with the most orders in 2019 is San Francisco, with 887 orders in total.
- The city with the fewest orders in 2019 is Austin, with 210 orders in total.
For question 3:
- The city that contributed the most revenue in 2019 is San Francisco, with $166K in total.
- The city that contributed the least revenue in 2019 is Austin, with $39.1K in total.
4. Is there any correlation between the price of each product and their revenue contribution to the company?
Answer: The scatterplot lets us assess the correlation between each product’s price and its revenue. Based on the graph below, there is a positive correlation between product price and revenue: higher-priced products tend to bring in more revenue.
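A scatterplot shows the direction of the relationship, and a Pearson correlation coefficient puts a number on it. The sketch below is one hedged way to compute it by hand. The `pearson` helper and the per-product figures are hypothetical and only illustrate the calculation; they are not taken from the Shopedia dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-product figures, invented for illustration only.
prices = [10.0, 20.0, 30.0, 40.0]
revenues = [1000.0, 1500.0, 2600.0, 3900.0]

r = pearson(prices, revenues)
print(f"r = {r:.3f}")  # values close to +1 indicate a strong positive linear relationship
```

A coefficient near +1 supports the reading of the scatterplot, but it only measures linear association. As noted in the disadvantages above, correlation doesn’t always mean causation.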
5. At what hour of the day is it best to roll out our advertisement campaign?
Answer: The best times to launch the advertisement campaign are around 12 pm and 8 pm, because those two hours have the highest order counts (270 orders) compared to other hours, based on the graph below. One hypothesis for why orders peak around those two hours is that 12 pm is lunchtime, when people often browse e-commerce sites during their break, and the same applies at 8 pm, when they are resting before going to sleep.
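The hour-of-day chart boils down to extracting the hour from each order timestamp and counting distinct orders per hour. Here is a minimal sketch of that idea. The timestamp format, the `orders_by_hour` helper, and the sample orders are all assumptions for illustration, not the real data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (order_id, order_timestamp) pairs, invented for illustration only.
orders = [
    ("B1", "2019-03-01 12:05:00"),
    ("B2", "2019-03-01 12:40:00"),
    ("B3", "2019-03-02 20:15:00"),
    ("B4", "2019-03-02 20:59:00"),
    ("B5", "2019-03-03 09:30:00"),
    ("B1", "2019-03-01 12:05:00"),  # repeated row for the same order
]

def orders_by_hour(orders, fmt="%Y-%m-%d %H:%M:%S"):
    """Return {hour_of_day: distinct_order_count}."""
    seen = defaultdict(set)  # hour -> set of distinct order IDs
    for order_id, timestamp in orders:
        hour = datetime.strptime(timestamp, fmt).hour
        seen[hour].add(order_id)
    return {hour: len(ids) for hour, ids in seen.items()}

counts = orders_by_hour(orders)
# Two busiest hours; ties are broken by the earlier hour.
best_hours = sorted(counts, key=lambda h: (-counts[h], h))[:2]
print(counts, best_hours)
```

Hours are reported on a 24-hour clock, so 12 corresponds to 12 pm and 20 to 8 pm.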
6. Which product category generates the most orders, and is that aligned with the revenue from its sales?
Answer: By OrderID, the product category with the most orders is Headphones, with 998 orders in 2019. Looking at the details, however, order count and revenue are not aligned: the category that contributes the most revenue is Laptop, not Headphones.
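The comparison in this answer can be sketched as two rankings over the same categories, one by distinct orders and one by revenue. The line items below are hypothetical values chosen only to show how the two rankings can disagree. The field layout and the `category_summary` helper are assumptions, not part of the original dashboard.

```python
from collections import defaultdict

# Hypothetical line items: (order_id, category, price, quantity).
# These values are invented for illustration only.
items = [
    ("C1", "Headphones", 12.0, 1),
    ("C2", "Headphones", 12.0, 2),
    ("C3", "Headphones", 150.0, 1),
    ("C4", "Laptop", 1700.0, 1),
    ("C5", "Laptop", 600.0, 1),
]

def category_summary(items):
    """Return {category: {"orders": distinct_order_count, "revenue": total}}."""
    orders = defaultdict(set)
    revenue = defaultdict(float)
    for order_id, category, price, quantity in items:
        orders[category].add(order_id)
        revenue[category] += price * quantity
    return {c: {"orders": len(orders[c]), "revenue": revenue[c]} for c in orders}

summary = category_summary(items)
top_by_orders = max(summary, key=lambda c: summary[c]["orders"])
top_by_revenue = max(summary, key=lambda c: summary[c]["revenue"])
# True only when the same category leads both rankings.
aligned = top_by_orders == top_by_revenue
print(top_by_orders, top_by_revenue, aligned)
```

In this made-up sample, a low-priced category leads on order count while a high-priced category leads on revenue, which is the same kind of mismatch described in the answer.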
Call to Action:
- Based on chart number 1, orders peaked only in March, and sales in the other months tend to be uneven and low. Given this, we should run a campaign every month, each featuring different categories. To increase orders further, we could also create bundling programs that pair high-selling products with low-selling products from related categories.
- Relating to chart number 5 on advertisement hours, those peak hours are also an opportunity to acquire new users by offering voucher codes for discounts or free delivery to increase transactions.
- To increase transactions in the low-contributing city (charts number 2 and 3), we need to dive deeper into the merchants in that city. One assumption is that orders are low because the city has fewer merchants than other cities, or because its share of inactive merchants is too high relative to active ones.
- We should regularly review the price of each product against other e-commerce sites so that we can increase profit while keeping our prices competitive.
Many thanks to MySkill, Ronny Fahrudin, Kak Riza, and all the mentors who helped me understand the field of data science.