Visualizing Time Series Data in Python
URL: http://datascienceanywhere.com/timeseries/
In this article, I will explain how to visualize time series data. In general, there are four kinds of charts that are used to visualize time series data.
- Line Chart
- Bar Plot
- Area Plot
- Heatmaps
The examples use the Excel workbook “Debourgh-Sales.xlsx”, which holds two columns, “Years” and “Sales”, with one sales figure per month starting from 01-01-2001 and covering about nine years (108 monthly records).
1. Line Chart
A line chart is the most common way of visualizing time series data. Time goes on the x-axis, and the measured (dependent) values go on the y-axis, for example a stock price or the sales in each month or quarter.
Now let’s see how to draw a line plot in Python.
import numpy as np # numerical python
import pandas as pd # panel data
import matplotlib.pyplot as plt
import seaborn as sns
Once the necessary libraries are imported, load the dataset.
# load the data using pandas
df = pd.read_excel('./data/Debourgh-Sales.xlsx',sheet_name='Sheet2')
df.head()
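The workbook itself is not reproduced in this article. For readers who want to run the later snippets without it, here is a minimal sketch of a stand-in data frame with the same layout. The name df_demo, the random seed, and all the sales values are assumptions made for illustration; they are not the real Debourgh figures.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Debourgh-Sales.xlsx: 108 month-start dates
# beginning 2001-01-01 and made-up sales values (NOT the real figures).
rng = np.random.default_rng(seed=0)
dates = pd.date_range(start="2001-01-01", periods=108, freq="MS")

trend = np.linspace(100.0, 300.0, num=len(dates))                # slow upward drift
season = 25.0 * np.sin(2 * np.pi * dates.month.to_numpy() / 12)  # yearly cycle
noise = rng.normal(loc=0.0, scale=10.0, size=len(dates))         # random wiggle

df_demo = pd.DataFrame({"Years": dates, "Sales": trend + season + noise})
print(df_demo.head())
```

Assigning df = df_demo would let the remaining code in this article run unchanged, although the charts will of course look different from the ones built on the real data.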
The data consists of two columns, “Years” and “Sales”. The “Years” column holds dates and “Sales” holds numeric values. Before plotting, it is worth checking for missing values, the data type of each column, and the number of rows in the dataset. df.info() reports all three.
df.info()
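The three things that df.info() summarizes can also be checked one at a time. The sketch below does so on a small made-up frame that only borrows the column names; the name demo and its six values are illustrative assumptions.

```python
import pandas as pd

# Illustrative frame with the same two column names; the values are made up.
demo = pd.DataFrame({
    "Years": pd.date_range("2001-01-01", periods=6, freq="MS"),
    "Sales": [120.5, 98.0, 143.2, 110.0, 131.7, 125.4],
})

n_rows = len(demo)           # number of instances
missing = demo.isna().sum()  # count of missing values per column
dtypes = demo.dtypes         # data type of each column

print(n_rows)
print(missing)
print(dtypes)
```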
The df.info() output shows 108 entries with no missing values. The data type of Years is datetime64[ns] and that of Sales is float64, so the data can be plotted as is. I will show two ways to draw the line plot.
- Using Seaborn
plt.figure(figsize=(15,6))
sns.lineplot(x='Years',y='Sales',data=df)
plt.show()
- Using Pandas dataframe
To plot directly from a pandas data frame, we first set the datetime column (Years) as the index and then draw the graph.
df1 = df.set_index('Years') # setting "Years" as index
fig,ax = plt.subplots(figsize=(15,6))
df1.plot(kind='line',ax=ax)
plt.show()
Coloring the Line Plot with hue
You can color the line plot by year, so that each year’s stretch of the line gets its own color. To do that, we need to extract the year from the date. We will see how to extract both the year and the quarter from the Years column.
df['year'] = df['Years'].dt.year # extracting year
df['quarter'] = df['Years'].dt.quarter # extracting quarter
df.head()
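To make the .dt accessor concrete, here is a small sketch on three hand-picked dates. The dates and the name parts are illustrative and do not come from the workbook.

```python
import pandas as pd

# Three arbitrary month-start dates, chosen only for illustration.
dates = pd.Series(pd.to_datetime(["2001-01-01", "2001-05-01", "2002-11-01"]))

parts = pd.DataFrame({
    "Years": dates,
    "year": dates.dt.year,        # calendar year of each date
    "quarter": dates.dt.quarter,  # 1 = Jan-Mar, 2 = Apr-Jun, 3 = Jul-Sep, 4 = Oct-Dec
})
print(parts)
```

January falls in quarter 1, May in quarter 2, and November in quarter 4.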
As the results show, pandas makes it easy to extract the year and quarter from the Years column. Using this data frame, we now draw the line chart with a different color for each year.
plt.figure(figsize=(15,6))
sns.lineplot(x='Years',y='Sales',data=df,hue='year',palette='Set1')
plt.show()
2. Bar Plot
A bar plot cannot be drawn directly from time series data, because bar plots are meant for categorical data. We first need to derive categories from the dates and then plot against those categories. We will look at three kinds of bar plot: the “Bar plot”, the “Proposition Bar plot” (a grouped bar plot) and the “Stacked Bar Plot”. Let’s go through them one by one in Python.
(a) Bar Plot
The bar plot is used for categorical data. Our data is continuous, but we can make it categorical by aggregating, for example by taking the sum or average of sales in each year or in each quarter.
Let’s visualize the sum of all sales in each year. We select the “Sales” and “year” columns, group by “year”, and take the sum. The code is shown below.
df_year = df[['Sales','year']].groupby(by='year').sum().reset_index()
df_year
We now have the total sales for each year. Let’s visualize them as a bar plot with seaborn, as shown below.
sns.catplot(x='year',y='Sales',data=df_year,kind='bar',aspect=3)
Let’s draw the same bar plot with a pandas data frame.
fig, ax = plt.subplots(figsize=(15,6))
df_year.set_index('year').plot(kind='bar',ax=ax)
plt.show()
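The yearly aggregation behind this bar plot can be checked by hand on a tiny example. The sketch below computes both the sum and the average mentioned earlier; the frame toy and its four values are made up for illustration.

```python
import pandas as pd

# Four made-up sales records spread over two years.
toy = pd.DataFrame({
    "year":  [2001, 2001, 2002, 2002],
    "Sales": [10.0, 20.0, 30.0, 50.0],
})

# One row per year, with the total and the average of Sales side by side.
by_year = toy.groupby("year")["Sales"].agg(["sum", "mean"]).reset_index()
print(by_year)
```

Either column can then be passed as the y value of the bar plot, depending on whether totals or averages are the better fit for the question being asked.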
(b) Proposition Bar Plot
Now let’s look at the proposition bar plot, which places the bars of each group side by side (a grouped bar plot). It is a good way of comparing sales by quarter and by year, and it is especially useful when you want to compare more than two variables. To build it, we group by both year and quarter and then reshape the result into the desired format, as shown in the code below.
sales = df[['Sales','year','quarter']].groupby(by=['year','quarter']).sum().values
df_sales = pd.DataFrame(sales.reshape(9,4),index=list(range(2001,2010)),columns= ['Q1','Q2','Q3','Q4'])
df_sales
The resulting data frame has the years as its index, the quarters as its columns, and the total sales of each quarter as its values. With this data you can draw the proposition bar plot using pandas, as shown below.
fig,ax = plt.subplots(figsize=(15,6))
df_sales.plot(kind='bar',ax=ax)
plt.show()
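The reshape(9,4) step above works only when every year has all four quarters and the number of years is known in advance. As an alternative, here is a sketch that builds the same year-by-quarter layout with pivot_table. This is not the approach used in this article; the frame toy, the name wide, and the sales values are assumptions for illustration.

```python
import pandas as pd

# Made-up monthly data for two years; Sales simply counts 1..24.
toy = pd.DataFrame({
    "Years": pd.date_range("2001-01-01", periods=24, freq="MS"),
    "Sales": [float(i) for i in range(1, 25)],
})
toy["year"] = toy["Years"].dt.year
toy["quarter"] = toy["Years"].dt.quarter

# Sum Sales for every (year, quarter) pair and spread the quarters across columns.
wide = toy.pivot_table(index="year", columns="quarter", values="Sales", aggfunc="sum")
wide.columns = [f"Q{q}" for q in wide.columns]  # 1..4 -> Q1..Q4
print(wide)
```

Because the row and column labels come from the data itself, the shape does not have to be hard-coded, and a missing quarter would show up as NaN rather than shifting values into the wrong cell.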
(c) Stacked Bar Plot
The stacked bar plot uses the same data as the proposition bar plot, but the bars for each quarter are placed on top of one another instead of side by side. We can draw a stacked bar plot by setting the parameter `stacked` to True.
fig,ax = plt.subplots(figsize=(15,6))
df_sales.plot(kind='bar',ax=ax,stacked=True)
plt.show()
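One way to see what stacking does is to check that the top of each stack equals the yearly total. The sketch below does this on made-up quarterly totals laid out like df_sales; the name toy_sales and its values are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up quarterly totals in the same layout as df_sales (years x quarters).
toy_sales = pd.DataFrame(
    [[6.0, 15.0, 24.0, 33.0],
     [42.0, 51.0, 60.0, 69.0]],
    index=[2001, 2002],
    columns=["Q1", "Q2", "Q3", "Q4"],
)

fig, ax = plt.subplots(figsize=(8, 4))
toy_sales.plot(kind="bar", stacked=True, ax=ax)

# There is one container of bars per quarter; the last one sits on top of the stack.
top_bars = ax.containers[-1]
stack_tops = [bar.get_y() + bar.get_height() for bar in top_bars]
yearly_totals = toy_sales.sum(axis=1).tolist()

print(stack_tops)
print(yearly_totals)
# In a script, call plt.show() here to display the figure.
```

The two printed lists should match, which is why a stacked bar plot shows both the quarterly split and the yearly total in a single bar.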
3. Area Plot
The area plot is a combination of a line chart and a bar plot. Just as with bar plots, there are different types of area plot: the area chart, the proposition area chart, and the stacked area chart.
We can draw area charts with pandas simply by setting the parameter kind to “area”. The result is very similar to the line plot, but the region under the line is filled with color (blue by default).
fig,ax = plt.subplots(figsize=(15,6))
df1.plot(kind='area',ax=ax)
plt.show()
Stacked Area Plot
The stacked area plot is similar to the stacked bar plot, and we will use the same data to draw it. The intuition is also the same: each quarter’s sales is drawn in its own color, and the quarters add up to the total sales for that year.
fig,ax = plt.subplots(figsize=(15,6))
df_sales.plot(kind='area',ax=ax,stacked=True)
plt.show()
Proposition Area Plot
In the proposition area plot, each quarter’s sales is drawn from the same baseline on a common scale, so the areas overlap, whereas in the stacked area plot the areas are summed to give the total sales.
fig,ax = plt.subplots(figsize=(15,6))
df_sales.plot(kind='area',ax=ax,stacked=False)
plt.show()
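The difference between the two area plots is easiest to see side by side. The following sketch draws both from the same made-up quarterly totals and prints the running totals that form the upper edges of the stacked bands; toy_sales and its values are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up quarterly totals in the same layout as df_sales (years x quarters).
toy_sales = pd.DataFrame(
    [[6.0, 15.0, 24.0, 33.0],
     [42.0, 51.0, 60.0, 69.0]],
    index=[2001, 2002],
    columns=["Q1", "Q2", "Q3", "Q4"],
)

fig, (ax_stacked, ax_overlap) = plt.subplots(1, 2, figsize=(12, 4))
toy_sales.plot(kind="area", stacked=True, ax=ax_stacked, title="stacked=True")
toy_sales.plot(kind="area", stacked=False, ax=ax_overlap, title="stacked=False")

# In the stacked version, the upper edge of each band is the running total
# across quarters, so the Q4 edge equals the yearly total.
stacked_edges = toy_sales.cumsum(axis=1)
print(stacked_edges)
# In a script, call plt.show() here to display the figure.
```

With stacked=False every area starts at zero, so the y-axis only needs to reach the largest single quarter rather than the largest yearly total.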
4. Heatmap
Finally, a heatmap is a powerful way to visualize all your data in one chart. Here we plot the sales for every quarter of every year. The quarters with the highest and the lowest sales are easy to identify from the colors.
plt.figure(figsize=(10,8))
sns.heatmap(df_sales,annot=True,fmt='.2f',cmap='gnuplot',linewidths=1)
plt.show()
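The high and low cells that the heatmap highlights can also be located numerically, which is a handy cross-check on the colors. A minimal sketch on made-up quarterly totals follows; toy_sales and its values are assumptions for illustration.

```python
import pandas as pd

# Made-up quarterly totals in the same layout as df_sales (years x quarters).
toy_sales = pd.DataFrame(
    [[6.0, 15.0, 24.0, 33.0],
     [42.0, 51.0, 60.0, 69.0]],
    index=[2001, 2002],
    columns=["Q1", "Q2", "Q3", "Q4"],
)

# Flatten the table to one value per (year, quarter) pair, then find the extremes.
cells = toy_sales.stack()
best_cell = cells.idxmax()   # (year, quarter) label of the highest value
worst_cell = cells.idxmin()  # (year, quarter) label of the lowest value

print(best_cell, cells.max())
print(worst_cell, cells.min())
```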
There are many more charts for visualizing time series data, but these are the ones most commonly used to get insights from the data.
For more details on time series and forecasting with ARIMA, SARIMA visit