Python
A Complete Guide to Time Series Data Visualization in Python
From: https://towardsdatascience.com/a-complete-guide-to-time-series-data-visualization-in-python-da0ddd2cfb01
This Should Give You Enough Resources to Make Great Visuals with Time Series DataTime series data is very important in so many different industries. It is especially important in research, financial industries, pharmaceuticals, social media, web services, and many more. Analysis of time series data is also becoming more and more essential. What is better than some good visualizations in the analysis. Any type of data analysis is not complete without some visuals. Because one good plot can provide you with a better understanding than a 20-page report. So, this article is all about time-series data visualization.
I need to make one more thing clear before starting.
But this article should provide you with enough tools and techniques to tell a story or understand and visualize a time series data clearly. I tried to explain some simple and easy ones and some advanced techniques. DatasetIf you are reading this for learning, the best way is to follow along and run all the code by yourself. Please feel free to download the dataset from this link: This is a stock dataset. Let’s import some necessary packages and the dataset: import pandas as pd ![]() I used the ‘parse_dates’ parameter in the read_csv function to convert the ‘Date’ column to the DatetimeIndex format. Most of the time, Dates are stored in string format which is not the right format for time series data analysis. When it is in the DatetimeIndex format, it is a lot helpful to deal with as a time series data. You will see it soon. I have a detailed article on Time-series data analysis. If you are new to time series data, it will be helpful if you have a look at this article first: I explained some important Pandas function in the article above that will be used in this article. Though I will provide a brief idea here as well. But if you need an example to understand better, please feel free to have a look at that previous article. Basic Plots FirstAs I said before, I want to start with some basic plots. The most basic plot should be a line plot using Pandas. I will plot the ‘Volume’ data here. See how it looks: df['Volume'].plot() ![]() This is our plot of ‘Volume’ data that looks pretty busy with some big spikes. It will be a good idea to plot all the other columns as well in a plot to examine the curves of all of them at the same time. df.plot(subplots=True, figsize=(10,12)) ![]() The shape of the curve for ‘Open’, ‘Close’, ‘High’ and ‘Low’ data have the same shape. Only the ‘Volume’ has a different shape.
The line plot I used above is great for showing seasonality. Resampling for months or weeks and making bar plots is another very simple and widely used method of finding seasonality. Here I am making a bar plot of month data for 2016 and 2017. For the index, I will use [2016:]. Because our dataset contains data until 2017. So, 2016 to end should bring 2016 and 2017. df_month = df.resample("M").mean()fig, ax = plt.subplots(figsize=(10, 6)) ![]() There are 24 bars. Each bar represents a month. A huge spike in July 2017. Otherwise, there is no monthly seasonality here. One way to find seasonality is by using a set of boxplots. Here I am going to make boxplots for each month. I will use ‘Open’, ‘Close’, ‘High’ and ‘Low’ data to make this plot. import seaborn as sns ![]() It shows the monthly difference in values clearly. There are more ways to show seasonality. I discussed it one more way at the end. Resampling and RollingRemember that first line plot of ‘Volume’ data above. As we discussed before, it was too busy. It can be fixed by resampling. Instead of plotting daily data, plotting monthly average will fix this issue to a large extent. I will use the df_month dataset I prepared already for the bar plot and box plots above for this. df_month['Volume'].plot(figsize=(8, 6)) ![]() Much more understandable and clearer! It gives a better idea about a trend in long term. Resampling is very common in time-series data. Most of the time resampling is done to a lower frequency. So, this article will only deal with the resampling of lower frequencies. Though resampling of higher frequency is also necessary especially for modeling purposes. Not so much in data analysis purpose. In the ‘Volume’ data we are working on right now, we can observe some big spikes here and there. These types of spikes are not helpful for data analysis or for modeling. normally to smooth out the spikes, resampling to a lower frequency and rolling is very helpful. Now, plot the daily data and weekly average ‘Volume’ in the same plot. First, make a weekly average dataset using the resampling method. df_week = df.resample("W").mean()
This ‘df_week’ and ‘df_month’ will be useful for us in later visualization as well. Let’s plot the daily and weekly data in the same plot. start, end = '2015-01', '2015-08' ![]() Look, the weekly average plot has smaller spikes than daily data. Rolling is another very helpful way of smoothing out the curve. It takes the average of a specified amount of data. If I want a 7-day rolling, it gives us the 7-d average data. Let’s include the 7-d rolling data in the above plot. df_7d_rolling = df.rolling(7, center=True).mean()start, end = '2016-06', '2017-05'fig, ax = plt.subplots() ![]() A lot going on in this one plot. But If you look at it carefully it is still understandable. If you notice 7-d rolling average is a bit smoother than the weekly average. It is also common to take a 30-d or 365-d rolling average to make the curve smoother. Please try it yourself. Plotting the ChangeLots of time it is more useful to see how the data change over time instead of just everyday data. There are a few different ways to calculate and visualize the change in data.
The shift function shifts the data before or after the specified amount of time. If I do not specify the time it will shift the data by one day by default. That means you will get the previous day's data. In financial data like this one, it is helpful to see previous day data and today's data side by side. As this article is dedicated to visualization only, I will only plot the previous day data: df['Change'] = df.Close.div(df.Close.shift())
In the code above, .div() helps to fill up the missing data. Actually, div() means division. df. div(6) will divide each element in df by 6. But here I used ‘df.Close.shift()’. So, Each element of df will be divided by each element of ‘df.Close.shift()’. We do this to avoid the null values that are created by the ‘shift()’ operation.
Here is the output: ![]() You can simply take a specific period and plot to have a clearer look. This is the plot of 2017 only. df['2017']['Change'].plot(figsize=(10, 6)) ![]() Though the shift is useful in many ways. But I find percent change useful on many occasions.
I will use the monthly data that was calculated in the beginning. This time I chose bar plots. It shows the percent change clearly. There is a percent change function available to get the percent_change data. df_month.loc[:, 'pct_change'] = df.Close.pct_change()*100fig, ax = plt.subplots() ![]() I plotted the percent change in closing data here. I used monthly percent change here.
Differencing takes the difference in values of a specified distance. By default, it’s one. If you specify 2 like “df.High.diff(2)’, it will take the difference of first and third element of ‘High’ column, second and fourth element, and so on. It is a popular method to remove the trend in the data. The trend is not good for forecasting or modeling. df.High.diff().plot(figsize=(10, 6)) ![]()
Another way of transformation. It keeps adding the cumulative. For example, if you add an expanding function to the ‘High’ column first element remains the same. The second element becomes cumulative of the first and second element, the third element becomes cumulative of the first, second, and third element, and so on. You can use aggregate functions like mean, median, standard deviation, etc. on it too. In that way, it will provide you with the changing mean, median, sum, or standard deviation with time. Isn’t it really useful for financial data or business sales or profit data? fig, ax = plt.subplots() ![]() Here I added expanding mean and standard deviation. Look at the daily data and the mean. At the end of 2017, daily data shows a huge spike. But it doesn’t show a spike in the average. Probably if you take the 2017 data only, the expanding average will look different. Please feel free to try it. Heat MapA heat map is generally a common type of data visualization that is used everywhere. In time-series data also heat maps can be very useful. But before diving into the heat map, we need to develop a calendar that will represent each year and month data of our dataset. Let’s see it in an example. For this demonstration, I will import a calendar package and use the pivot table function to generate the values. import calendar ![]() The calendar is ready with monthly average ‘Open’ data. Now, generate the heat map with it. ax = sns.heatmap(all_month_year_df, cmap='RdYlGn_r', robust=True, fmt='.2f', ![]() The heat map is ready! Darker red means very high opening and dark green mean very low opening. DecompositionDecomposition will show the observations and these three elements in the same plot: Trend: Consistent upward or downward slope of a time series. Seasonality: Clear periodic pattern of a time series Noise: Outliers or missing values Using the stats model library, it is easy to do it: from pylab import rcParams ![]() Here the trend is the moving average. To give you a high-level idea of residuals in the last row, here is the general formula: Original observations = Trend + Seasonality + Residuals Though the documentation for decomposition itself says that it’s a very naive representation but it is still popular. ConclusionIf you could run all the code above, Congratulation! You learned enough today to make a great level of time series of data visualization. As I mentioned in the beginning, there are a lot of cool visualization techniques available. I will write more in the future. |
The Next Level of Functional Programming in Python
From: https://towardsdatascience.com/the-next-level-of-functional-programming-in-python-bc534b9bdce1 Useful Python tips and tricks to take your functional skills to the next level![]()
Python is one of the world’s most popular and in-demand programming languages. Indeed, if you are in the hot field of Data Science, Python is, most probably, your daily driver. But why?
However, this story is not for beginners. Some experience with the language is required to understand the examples and their value. Thus, without further ado let’s dive into the
DefinitionsEvery function that we examine here is part of Python’s
count(start=10, step=1) --> 10 11 12 13 14 ...
islice('ABCDEFG', 2, None, 1) --> C D E F G
tee(it, [n=2])
repeat(elem=10, n=3) --> 10 10 10
cycle('ABCD') --> A B C D A B C D ...
chain('ABC', 'DEF') --> A B C D E F
accumulate(p=[1,2,3,4,5], func=add) --> 1 3 6 10 15
As we said, to make something out of these primitive functions, we would have to compose them and create higher abstractions. So let’s do that and create our Custom extensionsTo begin with, let’s create a function to realize the first def take(it, n):
As we do that, let’s also create a function that removes the first def drop(it, n):
Now, we are able to create our own primitive functions, that return the head = next
If you haven’t use Next, we want to create a new function, that we will call compose. It works like that: compose(x, f) --> x, f(x), f(f(x)), f(f(f(x))), ...
So, how can we do that? It turns out we can do this very efficiently now that we have yielded the power of the def compose(x, f):
I admit it looks a bit complex, but if you squint hard enough, you’ll understand its inner workings. take(compose(2, lambda x: x**2), 5) --> [2, 4, 16, 256, 65536]
Putting it all togetherNow, we are onto something here! So let’s use some functions that we have defined to create the infamous Fibonacci numbers sequence. def next_num(pair): Let’s walk it through step by step. The compose function creates a sequence of tuples. The first tuple is take(fibonacci(), 10) --> [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
This is a super-efficient implementation of the Fibonacci sequence, which runs under 35μs! It showcases the power of good, functional design! |
Develop and sell a Python API — from start to end tutorial
From: https://towardsdatascience.com/develop-and-sell-a-python-api-from-start-to-end-tutorial-9a038e433966 You can also read this article directly on Github (for better code formatting) I recently read a blog post about setting up your own API and selling it. I was quite inspired and wanted to test if it works. In just 5 days I was able to create an API from start to end. So I thought I share issues I came across, elaborate on concepts that the article was introducing, and provide a quick checklist to build something yourself. All of this by developing another API. Table of Contents
About this articleThis article can be considered as a tutorial and comprehension of other articles (listed in my “Inspiration” section). It paints a picture for developing a Python API from start to finish and provides help in more difficult areas like the setup with AWS and Rapidapi. I thought it will be useful for other people trying to do the same. I had some issues on the way, so I thought I share my approach. It is also a great way to build side projects and maybe even make some money. As the Table of content shows, it consists of 4 major parts, namely:
You will find all my code open sourced on Github: You will find the end result here on Rapidapi: If you found this article helpful let me know and/or buy the functionality on Rapidapi to show support. DisclaimerI am not associated with any of the services I use in this article. I do not consider myself an expert. If you have the feeling that I am missing important steps or neglected something, consider pointing it out in the comment section or get in touch with me. Also, always make sure to monitor your AWS costs to not pay for things you do not know about. I am always happy for constructive input and how to improve. Stack usedWe will use
1. Create project formalitiesIt’s always the same but necessary. I do it along with these steps:
git remote add origin URL_TO_GIT_REPO Now we have:
2. Create a solution for a problemThen we need to create a solution to some problem. For the sake of demonstration, I will show how to convert an excel csv file into other formats. The basic functionality will be coded and tested in a Jupyter Notebook first.
Install packagesInstall jupyter notebook and jupytext: pip install notebook jupytext sets a hook in #!/bin/shjupytext --from ipynb --to jupytext_conversion//py:light --pre-commit Develop a solution to a problempip install pandas requests Add a Download dataDownload an example dataset (titanic dataset) and save it into a data folder: def download(url: str, dest_folder: str): Create functionalityTransform format df = pd.read_csv('./data/titanic.csv') Build server to execute a function with RESTAfter developing the functionality in jupyter notebook we want to actually provide the functionality in a python app. There are ways to use parts of the jupyter notebook, but for the sake of simplicity we create it again now. Add an We want the user to upload an excel file and return the file converted into JSON for example. Browsing through the internet we can see that there are already packages that work with flask and excel formats. So let's use them. pip install Flask Start Flask server with env FLASK_APP=app.py FLASK_ENV=development flask run Tipp: Test your backend functionality with Postman. It is easy to set up and allows us to test the backend functionality quickly. Uploading an excel is done in the “form-data” tab: ![]() Here you can see the uploaded titanic csv file and the returned column names of the dataset. Now we simply write the function to transform the excel into json, like: import json (Check out my repository for the full code.) Now we have the functionality to transform csv files into json for example. 3. Deploy to AWSAfter developing it locally we want to get it in the cloud.
Set up zappaAfter we created the app locally we need to start setting up the hosting on a real server. We will use zappa.
pip install zappa As we are using a conda environment we need to specify it: which python will give you remove the export VIRTUAL_ENV=/Users/XXXX/opt/anaconda3/envs/XXXXX/ Now we can do zappa init to set up the config. Just click through everything and you will have a { Note that we are not yet ready to deploy. First, we need to get some AWS credentials. Set up AWSAWS credentialsFirst, you need te get an AWS You might think it is as easy as: To get the credentials you need to
But no. There is more to permissions in AWS! Set up credentials with users and roles in IAMI found this article from Peter Kazarinoff to be very helpful. He explains the next section in great detail. My following bullet point approach is a quick summary and I often quote his steps. Please check out his article for more details if you are stuck somewhere. I break it down as simple as possible:
My Custom policy: { NOTE: Replace XXXXXXXXXXX in the inline policy by your AWS Account Number. Your AWS Account Number can be found by clicking “Support → “Support Center. Your Account Number is listed in the Support Center on the upper left-hand side. The json above is what worked for me. But, I expect this set of security permissions may be too open. To increase security, you could slowly pare down the permissions and see if Zappa still deploys. The settings above are the ones that finally worked for me. You can dig through this discussion on GitHub if you want to learn more about specific AWS permissions needed to run Zappa: https://github.com/Miserlou/Zappa/issues/244. Add credentials in your projectCreate a mkdir ~/.aws and paste your credentials from AWS [dev] Same with the code open ~/.aws/config[default] Note that Save the AWS access key id and secret access key assigned to the user you created in the file ~/.aws/credentials. Note the .aws/ directory needs to be in your home directory and the credentials file has no file extension. Now you can do deploy your API with zappa deploy dev ![]() There shouldn’t be any errors anymore. However, if there are still some, you can debug with: zappa status The most common errors are permission related (then check your permission policy) or about python libraries that are incompatible. Either way, zappa will provide good enough error messages for debugging. If you update your code don’t forget to update the deployment as well with zappa update dev AWS API GatewayTo set up the API on a market we need to first restrict its usage with an API-key and then set it up on the market platform. I found this article from Nagesh Bansal to be helpful. He explains the next section in great detail. My following bullet point approach is a quick summary and I often quote his steps. Please check out his article for more details if you are stuck somewhere. Again, I break it down:
it looks like this ![]() Now you have restricted access to your API. 4. Set up Rapidapi
Create API on Rapidapi
![]() 5. In the security tab you can check everything 6. Then go to “endpoints” to add the routes from you Python app by clicking “create REST endpoint” ![]() 7. Add an image for your API 8. Set a pricing plan. Rapidapi published an own article on pricing options and strategies. As they conclude, it is up to your preferences and product on how to price it. 9. I created a freemium pricing plan. The reason for that is that I want to give the chance to test it without cost, but add a price for using it regularly. Also, I want to create a plan for supporting my work. For example: ![]() 10. Create some docs and a tutorial. This is pretty self-explaining. It is encouraged to do so as it is easier for people to use your API if it is documented properly. 11. The last step is to make your API publicly available. But before you do that it is useful to test it for yourself. Test your own APICreate a private plan for testingHaving set up everything, you of course should test it with the provided snippets. This step is not trivial and I had to contact the support to understand it. Now I am simplifying it here. Create a private plan for yourself, by setting no limits. The go to the “Users” section of your API, then to “Users on free plans”, select yourself and “invite” you to the private plan. ![]() ![]() Create code to consume APITo consume the API now you can simply copy the snippet that Rapidapi provides. For example with Python and the requests library: import requestsurl = "https://excel-to-other-formats.p.rapidapi.com/upload"payload = "" End resultInspirationThe article “API as a product. How to sell your work when all you know is a back-end” by Artem provided a great idea, namely to
For the setting everything I found the articles from Nagesh Bansal very helpful:
Also this article from Peter Kazarinoff: https://pythonforundergradengineers.com/deploy-serverless-web-app-aws-lambda-zappa.html I encourage you to have a look at those articles as well. You can also read my article directly on Github (for better code formatting) |
The right and wrong way to set Python 3 as default on a Mac
There are several ways to get started with Python 3 on macOS, but one way is better than the others.What's so hard about this?The version of Python that ships with macOS is well out of date from what Python recommends using for development. Python runtimes are also comically challenging at times, as noted by XKCD. |
8 Advanced Tips to Master Python Strings
Geo Heatmap
This is a script that generates an interactive geo heatmap from your Google location history data using Python, Folium and OpenStreetMap. |
8 Advanced Python Tricks Used by Seasoned Programmers
From: https://towardsdatascience.com/8-advanced-python-tricks-used-by-seasoned-programmers-757804975802Apply these tricks in your Python code to make it more concise and performant1. Sorting Objects by Multiple KeysSuppose we want to sort the following list of dictionaries: people = [
But we don’t just want to sort it by name or age, we want to sort it by both fields. In SQL, this would be a query like: SELECT * FROM people ORDER by name, age
There’s actually a very simple solution to this problem, thanks to Python’s guarantee that sort functions offer a stable sort order. This means items that compare equal retain their original order. To achieve sorting by name and age, we can do this: import operator
Notice how I reversed the order. We first sort by age, and then by name. With This gives us the result we were looking for: [
The names are sorted primarily, the ages are sorted if the name is the same. So all the Johns are grouped together, sorted by age. Inspired by this StackOverflow question. 2. List ComprehensionsA list comprehension can replace ugly for loops used to fill a list. The basic syntax for a list comprehension is: [ expression for item in list if conditional ]
A very basic example to fill a list with a sequence of numbers: mylist = [i for i in range(10)]
print(mylist)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
And because you can use an expression, you can also do some math: squares = [x**2 for x in range(10)]
print(squares)
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Or even call an external function: def some_function(a):
return (a + 5) / 2
my_formula = [some_function(i) for i in range(10)]
print(my_formula)
# [2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
And finally, you can use the ‘if’ to filter the list. In this case, we only keep the values that are dividable by 2: filtered = [i for i in range(20) if i%2==0]
print(filtered)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
3. Check memory usage of your objectsWith sys.getsizeof() you can check the memory usage of an object: import sys
mylist = range(0, 10000)
print(sys.getsizeof(mylist))
# 48
Woah… wait… why is this huge list only 48 bytes? It’s because the range function returns a class that only behaves like a list. A range is a lot more memory efficient than using an actual list of numbers. You can see for yourself by using a list comprehension to create an actual list of numbers from the same range: import sys
myreallist = [x for x in range(0, 10000)]
print(sys.getsizeof(myreallist))
# 87632
So, by playing around with 4. Data classesSince version 3.7, Python offers data classes. There are several advantages over regular classes or other alternatives like returning multiple values or dictionaries:
Here’s an example of a data class at work: from dataclasses import dataclass
@dataclass
class Card:
rank: str
suit: str
card = Card("Q", "hearts")
print(card == card)
# True
print(card.rank)
# 'Q'
print(card)
Card(rank='Q', suit='hearts')
An in-depth guide can be found here. 5. The
|
Python | Read Text from Image with One Line Code
Python | Read Text from Image with One Line CodeDealing with images is not a trivial task. To you, as a human, it’s easy to look at something and immediately know what is it you’re looking at. But computers don’t work that way. Tasks that are too hard for you, like complex arithmetics, and math in general, is something that a computer chews without breaking a sweat. But here the exact opposite applies — tasks that are trivial to you, like recognizing is it cat or dog in an image are really hard for a computer. In a way, we are a perfect match. For now at least. While image classification and tasks that involve some level of computer vision might require a good bit of code and a solid understanding, reading text from a somewhat well-formatted image turns out to be a one-liner in Python —and can be applied to so many real-life problems. And in today’s post, I want to prove that claim. There will be some installation to go though, but it shouldn’t take much time. These are the libraries you’ll need:
I don’t want to prolonge this intro part anymore, so why don’t we jump into the good stuff now. OpenCVNow, this library will only be used to load the images(s), you don’t actually need to have a solid understanding of it beforehand (although it might be helpful, you’ll see why). According to the official documentation:
In a nutshell, you can use OpenCV to do any kind of image transformations, it’s fairly straightforward library. If you don’t already have it installed, it’ll be just a single line in terminal:
And that’s pretty much it. It was easy up until this point, but that’s about to change. PyTesseractWhat the heck is this library? Well, according to Wikipedia:
I’m sure there are more sophisticated libraries available now, but I’ve found this one working out pretty well. Based on my own experience, this library should be able to read text from any image, provided that the font isn’t some bulls*** that even you aren’t able to read. If it can’t read from your image, spend more time playing around with OpenCV, applying various filters to make the text stand out. Now the installation is a bit of a pain in the bottom. If you are on Linux it all boils down to a couple of sudo-apt get commands:
I’m on Windows, so the process is a bit more tedious. First, open up THIS URL, and download 32bit or 64bit installer: The installation by itself is straightforward, boils down to clicking Next a couple of times. And yeah, you also need to do a pip installation:
Is that all? Well, no. You still need to tell Python where Tesseract is installed. On Linux machines, I didn’t have to do so, but it’s required on Windows. By default, it’s installed in Program Files. If you did everything correctly, executing this cell should not yield any error: Is everything good? You may proceed. Reading the TextLet’s start with a simple one. I’ve found a couple of royalty-free images that contain some sort of text, and the first one is this: It should be the easy one, and there exists a possibility that Tesseract will read those blue ‘objects’ as brackets. Let’ see what will happen: My claim was true. It’s not a problem though, you could easily address those with some Python magic. The next one could be more tricky: I hope it won’t detect that ‘B’ on the coin: Looks like it works perfectly. Now it’s up to you to apply this to your own problem. OpenCV skills could be of vital importance here if the text blends with the background. Before you leaveReading text from an image is a pretty difficult task for a computer to perform. Think about it, the computer doesn’t know what a letter is, it only works only with numbers. What happens behind the hood might seem like a black box at first, but I encourage you to investigate further if this is your area of interest. I’m not saying that PyTesseract will work perfectly every time, but I’ve found it good enough even on some trickier images. But not straight out of the box. Some image manipulation is required to make the text stand out. It’s a complex topic, I know. Take it one day at a time. One day it will be second nature to you. References |