From: https://towardsdatascience.com/a-complete-guide-to-time-series-data-visualization-in-python-da0ddd2cfb01
## This Should Give You Enough Resources to Make Great Visuals with Time Series Data

Time series data is very important in many different industries. It is especially important in research, finance, pharmaceuticals, social media, web services, and more. Analysis of time series data is also becoming more and more essential, and no data analysis is complete without some good visuals, because one good plot can provide a better understanding than a 20-page report. So, this article is all about time-series data visualization.
I need to make one more thing clear before starting: it is not possible to cover every visualization technique in a single article.
But this article should provide you with enough tools and techniques to tell a story and to understand and visualize time series data clearly. I tried to cover some simple, easy techniques as well as some more advanced ones.

## Dataset

If you are reading this to learn, the best way is to follow along and run all the code yourself. Please feel free to download the dataset from this link:

This is a stock dataset. Let's import the necessary packages and the dataset:

```python
import pandas as pd
import matplotlib.pyplot as plt

# the filename here is a placeholder for the downloaded file
df = pd.read_csv('stock_data.csv', parse_dates=['Date'], index_col='Date')
```

I used the ‘parse_dates’ parameter in the read_csv function to convert the ‘Date’ column to datetimes, and set it as the index so the DataFrame has a DatetimeIndex. Most of the time, dates are stored in string format, which is not the right format for time series analysis. When the data has a DatetimeIndex, it is a lot easier to handle as a time series. You will see that soon.

I have a detailed article on time series data analysis. If you are new to time series data, it will be helpful to have a look at that article first: I explained some important Pandas functions there that will also be used in this article. I will give a brief idea of them here as well, but if you need fuller examples, please feel free to have a look at that previous article.

## Basic Plots First

As I said before, I want to start with some basic plots. The most basic is a line plot using Pandas. I will plot the ‘Volume’ column here. See how it looks:

`df['Volume'].plot()`

This plot of the ‘Volume’ data looks pretty busy, with some big spikes. It is a good idea to plot all the other columns as well, as subplots, to examine all the curves at the same time:

`df.plot(subplots=True, figsize=(10, 12))`

The curves for the ‘Open’, ‘Close’, ‘High’ and ‘Low’ data all have the same shape; only ‘Volume’ is different.
The line plot I used above is great for showing seasonality. Resampling to months or weeks and making bar plots is another very simple and widely used method of finding seasonality. Here I am making a bar plot of the monthly data for 2016 and 2017. For the index, I will use [2016:], because our dataset contains data until 2017, so slicing from 2016 to the end gives us both 2016 and 2017.

```python
df_month = df.resample("M").mean()
fig, ax = plt.subplots(figsize=(10, 6))
```

There are 24 bars, and each bar represents a month. There is a huge spike in July 2017; otherwise, there is no obvious monthly seasonality here.

One more way to look for seasonality is with a set of box plots. Here I am going to make a box plot for each month, using the ‘Open’, ‘Close’, ‘High’ and ‘Low’ data.

`import seaborn as sns`

It shows the monthly differences in values clearly. There are more ways to show seasonality; I discuss one more at the end.

## Resampling and Rolling

Remember the first line plot of the ‘Volume’ data above? As we discussed, it was too busy. That can be fixed by resampling: instead of plotting daily data, plotting the monthly average fixes the issue to a large extent. I will use the df_month dataset I already prepared for the bar plot and box plots above.

`df_month['Volume'].plot(figsize=(8, 6))`

Much clearer and more understandable! It gives a better idea of the long-term trend. Resampling is very common with time series data. Most of the time, resampling is done to a lower frequency, so this article only deals with that; resampling to a higher frequency is also necessary sometimes, especially for modeling purposes, but not so much for data analysis.

In the ‘Volume’ data we are working with right now, we can observe some big spikes here and there. These spikes are not helpful for data analysis or modeling. To smooth them out, resampling to a lower frequency and rolling averages are both very helpful.
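To make the resampling step concrete, here is a minimal sketch on a toy daily series (the numbers are synthetic, not the stock data):

```python
import pandas as pd

# two months of synthetic daily values: 0, 1, 2, ...
idx = pd.date_range('2017-01-01', '2017-02-28', freq='D')
s = pd.Series(range(len(idx)), index=idx, dtype=float)

# 'M' groups the days by calendar month (labeled with the month-end date)
# and .mean() averages each month's values
monthly = s.resample('M').mean()
# January holds values 0..30, so its mean is 15.0;
# February holds values 31..58, so its mean is 44.5
```

Note that newer pandas versions spell the month-end alias ‘ME’ instead of ‘M’, but the idea is the same.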
Now, plot the daily data and weekly average ‘Volume’ in the same plot. First, make a weekly average dataset using the resampling method. `df_week = df.resample("W").mean()`
This ‘df_week’ and ‘df_month’ will be useful for later visualizations as well. Let's plot the daily and weekly data in the same plot.

`start, end = '2015-01', '2015-08'`

Look: the weekly average plot has smaller spikes than the daily data. Rolling is another very helpful way of smoothing out a curve. It takes the average over a specified window of data: a 7-day rolling mean gives the 7-day average at each point. Let's include the 7-day rolling data in the above plot.

```python
df_7d_rolling = df.rolling(7, center=True).mean()
start, end = '2016-06', '2017-05'
fig, ax = plt.subplots()
```

There is a lot going on in this one plot, but if you look at it carefully it is still understandable. Notice that the 7-day rolling average is a bit smoother than the weekly average. It is also common to take a 30-day or 365-day rolling average to make the curve even smoother. Please try it yourself.

## Plotting the Change

A lot of the time, it is more useful to see how the data changes over time than to look at the raw daily values. There are a few different ways to calculate and visualize that change.
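The rolling computation itself can be illustrated on a toy series (synthetic numbers, not the stock data):

```python
import pandas as pd

idx = pd.date_range('2017-01-01', periods=10, freq='D')
s = pd.Series(range(1, 11), index=idx, dtype=float)

# a centered 7-day window averages each day together with
# the 3 days before and the 3 days after it
roll = s.rolling(7, center=True).mean()
# the 4th day (value 4.0) averages values 1..7, giving 4.0;
# the first 3 and last 3 days are NaN because their windows are incomplete
```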
The shift function shifts the data backward or forward by the specified number of periods. If I do not specify an amount, it shifts the data by one period by default, which for this dataset means one day: each row gets the previous day's value. In financial data like this, it is helpful to see the previous day's data and today's data side by side. As this article is dedicated to visualization only, I will just compute and plot the day-over-day change:

`df['Change'] = df.Close.div(df.Close.shift())`
In the code above, .div() performs element-wise division: df.div(6) would divide each element in df by 6. Here I divided by ‘df.Close.shift()’, so each element of ‘Close’ is divided by the previous day's closing price. The result is a day-over-day ratio: a value of 1.02 means the close rose 2% from the previous day. Note that the first row of ‘Change’ is NaN, because shift() has no previous value to supply there.
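A tiny example with made-up prices shows exactly what this shift-and-divide pattern computes:

```python
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0])

# close.shift() holds the previous row's value, so the division
# gives each day's close as a ratio of the day before
change = close.div(close.shift())
# -> NaN (no previous day), 1.10 (up 10%), 0.90 (down 10%)
```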
Here is the output. You can take a specific period and plot it to have a clearer look. This is the plot of 2017 only:

`df.loc['2017', 'Change'].plot(figsize=(10, 6))`

Though the shift is useful in many ways, I find percent change more useful on many occasions.
I will use the monthly data that was calculated in the beginning. This time I chose a bar plot, as it shows the percent change clearly. Pandas provides a pct_change function to compute the percent change directly. Note that it has to be computed from the monthly closing prices, df_month.Close, to match the monthly index:

```python
df_month.loc[:, 'pct_change'] = df_month.Close.pct_change() * 100
fig, ax = plt.subplots()
```

I plotted the monthly percent change in the closing price here.
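For reference, pct_change is the same shift-and-divide idea from before, minus one (made-up numbers again):

```python
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0])

# (current / previous - 1) * 100, i.e. percent change from the prior value
pct = close.pct_change() * 100
# -> NaN, 10.0, -10.0
```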
Differencing takes the difference between values a specified distance apart. By default the distance is one. If you specify 2, as in ‘df.High.diff(2)’, it takes the difference between the first and third elements of the ‘High’ column, then the second and fourth, and so on. It is a popular method for removing the trend from data, since a trend in the data is a problem for many forecasting and modeling methods.

`df.High.diff().plot(figsize=(10, 6))`
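The diff distances can be checked on a short made-up series:

```python
import pandas as pd

high = pd.Series([10.0, 12.0, 15.0, 11.0])

d1 = high.diff()   # each element minus the previous one
d2 = high.diff(2)  # each element minus the one two positions back
# d1 -> NaN, 2.0, 3.0, -4.0
# d2 -> NaN, NaN, 5.0, -1.0
```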
The expanding window is another way of transforming the data: it keeps accumulating. For example, if you apply an expanding function to the ‘High’ column, the first element remains the same, the second element becomes the cumulative result of the first two elements, the third of the first three, and so on. You can use aggregate functions like mean, median, and standard deviation with it too; that gives you the running mean, median, sum, or standard deviation over time. Isn't that really useful for financial data, or for business sales or profit data? Here I plot the daily ‘High’ data with its expanding mean and standard deviation:

```python
fig, ax = plt.subplots()
ax.plot(df['High'], label='Daily high')
ax.plot(df['High'].expanding().mean(), label='Expanding mean')
ax.plot(df['High'].expanding().std(), label='Expanding standard deviation')
ax.legend()
```

Look at the daily data and the expanding mean. At the end of 2017, the daily data shows a huge spike, but the expanding average does not, because it averages over the entire history up to each point. If you took only the 2017 data, the expanding average would probably look different. Please feel free to try it.

## Heat Map

A heat map is a common type of data visualization used everywhere, and heat maps can be very useful for time series data too. But before diving into the heat map, we need to build a calendar that holds each year's and each month's data from our dataset. For this demonstration, I will import the calendar package and use the pivot_table function to generate the values:

```python
import calendar
import seaborn as sns

# monthly average 'Open', with months as rows and years as columns
all_month_year_df = pd.pivot_table(df, values='Open',
                                   index=df.index.month,
                                   columns=df.index.year,
                                   aggfunc='mean')
all_month_year_df.index = [calendar.month_abbr[m] for m in all_month_year_df.index]
```

The calendar of monthly average ‘Open’ data is ready. Now, generate the heat map with it:

```python
ax = sns.heatmap(all_month_year_df, cmap='RdYlGn_r', robust=True,
                 annot=True, fmt='.2f')
```

The heat map is ready! Darker red means a very high opening price and darker green means a very low one.

## Decomposition

Decomposition shows the observations and these three elements in the same plot:

- Trend: a consistent upward or downward slope in the time series
- Seasonality: a clear periodic pattern in the time series
- Noise: the residual variation left over, including outliers

Using the statsmodels library, it is easy to do. Here I decompose the monthly closing price with an additive model:

```python
from pylab import rcParams
import statsmodels.api as sm

rcParams['figure.figsize'] = 11, 9
decomposition = sm.tsa.seasonal_decompose(df_month['Close'], model='additive')
fig = decomposition.plot()
```

Here the trend component is a moving average.
To give you a high-level idea of the residuals in the last row, here is the general formula for the additive model:

Original observations = Trend + Seasonality + Residuals

Though the documentation for seasonal_decompose itself says it is a very naive representation, it is still popular.

## Conclusion

If you could run all the code above, congratulations! You learned enough today to make a great range of time series visualizations. As I mentioned in the beginning, there are a lot more cool visualization techniques available. I will write about more of them in the future.
