Python

Develop and sell a Python API — from start to end tutorial

posted Sep 20, 2020, 3:05 PM by Chris G   [ updated Sep 20, 2020, 3:08 PM ]

From: https://towardsdatascience.com/develop-and-sell-a-python-api-from-start-to-end-tutorial-9a038e433966

You can also read this article directly on Github (for better code formatting)

The article paints a picture for developing a Python API from start to end and provides help in more difficult areas.

I recently read a blog post about setting up your own API and selling it.

I was quite inspired and wanted to test whether it works. In just 5 days I was able to create an API from start to end. So I thought I would share the issues I came across, elaborate on concepts that the article introduced, and provide a quick checklist for building something yourself. All of this by developing another API.

Table of Contents

About this article

This article can be considered a tutorial and a summary of other articles (listed in my “Inspiration” section).

It paints a picture for developing a Python API from start to finish and provides help in more difficult areas like the setup with AWS and Rapidapi.

I thought it would be useful for other people trying to do the same. I had some issues along the way, so I thought I would share my approach. It is also a great way to build side projects and maybe even make some money.

The article consists of four major parts:

  1. Setting up the environment
  2. Creating a problem solution with Python
  3. Setting up AWS
  4. Setting up Rapidapi

You will find all my code open sourced on Github:

You will find the end result here on Rapidapi:

If you found this article helpful let me know and/or buy the functionality on Rapidapi to show support.

Disclaimer

I am not associated with any of the services I use in this article.

I do not consider myself an expert. If you have the feeling that I am missing important steps or neglected something, consider pointing it out in the comment section or get in touch with me. Also, always make sure to monitor your AWS costs to not pay for things you do not know about.

I am always happy to receive constructive input on how to improve.

Stack used

We will use

  • Github (Code hosting),
  • Anaconda (Dependency and environment management),
  • Jupyter Notebook (code development and documentation),
  • Python (programming language),
  • AWS (deployment),
  • Rapidapi (market to sell)

1. Create project formalities

It’s always the same but necessary. I do it along with these steps:

  1. Create a local folder mkdir NAME
  2. Create a new repository on Github with NAME
  3. Create conda environment conda create --name NAME python=3.7
  4. Activate conda environment conda activate PATH_TO_ENVIRONMENT
  5. Create git repo git init
  6. Connect to Github repo. Add Readme file, commit it and
git remote add origin URL_TO_GIT_REPO
git push -u origin master

Now we have:

  • local folder
  • github repository
  • anaconda virtual environment
  • git version control

2. Create a solution for a problem

Then we need to create a solution to some problem. For the sake of demonstration, I will show how to convert an excel csv file into other formats. The basic functionality will be coded and tested in a Jupyter Notebook first.

Install packages

Install jupyter notebook and jupytext:

pip install notebook jupytext

Then set a hook in .git/hooks/pre-commit so that notebook changes are tracked properly in git:

#!/bin/sh
jupytext --from ipynb --to jupytext_conversion//py:light --pre-commit

Develop a solution to a problem

pip install pandas requests

Add a .gitignore file and add the data folder (data/) to not upload the data to the hosting.

Download data

Download an example dataset (titanic dataset) and save it into a data folder:

import os

import requests


def download(url: str, dest_folder: str):
    # Create the destination folder if it does not exist yet
    if not os.path.exists(dest_folder):
        os.makedirs(dest_folder)

    # Derive the file name from the last part of the URL
    filename = url.split('/')[-1].replace(" ", "_")
    file_path = os.path.join(dest_folder, filename)

    r = requests.get(url, stream=True)
    if r.ok:
        print("saving to", os.path.abspath(file_path))
        with open(file_path, 'wb') as f:
            # Write the response to disk chunk by chunk
            for chunk in r.iter_content(chunk_size=1024 * 8):
                if chunk:
                    f.write(chunk)
                    f.flush()
                    os.fsync(f.fileno())
    else:
        print("Download failed: status code {}\n{}".format(r.status_code, r.text))


url_to_titanic_data = 'https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv'
download(url_to_titanic_data, './data')

Create functionality

Transform format

import pandas as pd

df = pd.read_csv('./data/titanic.csv')
df.to_json(r'./data/titanic.json')

Figure: Conversion example in Jupyter Notebook
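
For comparison, the same kind of conversion can be sketched with nothing but Python's standard library. This is not the approach used in this article (which relies on pandas); the function name csv_text_to_json and the sample rows are made up for illustration.

```python
import csv
import io
import json


def csv_text_to_json(csv_text: str) -> str:
    """Convert CSV text (first row = header) into a JSON list of row objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)  # one dict per data row, keyed by the header names
    return json.dumps(rows)


sample = "Name,Age\nAlice,29\nBob,41\n"  # made-up sample data
converted = csv_text_to_json(sample)
print(converted)
```

Note that this sketch keeps every value as a string, whereas pandas infers column types such as integers and floats.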

Build server to execute a function with REST

After developing the functionality in jupyter notebook we want to actually provide the functionality in a python app.

There are ways to use parts of the jupyter notebook, but for the sake of simplicity we create it again now.

Add an app.py file.

We want the user to upload an excel file and return the file converted into JSON for example.

Browsing through the internet we can see that there are already packages that work with flask and excel formats. So let's use them.

pip install Flask

Start Flask server with

env FLASK_APP=app.py FLASK_ENV=development flask run

Tip: Test your backend functionality with Postman. It is easy to set up and allows us to test the backend functionality quickly. Uploading an excel file is done in the “form-data” tab:

Figure: Testing backend with Postman

Here you can see the uploaded titanic csv file and the returned column names of the dataset.

Now we simply write the function to transform the excel into json, like:

import pandas as pd
from flask import Flask, request

app = Flask(__name__)


@app.route('/get_json', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        # The uploaded file arrives under the form-data key "file"
        provided_data = request.files.get('file')
        if provided_data is None:
            return 'Please enter valid excel format ', 400

        df = pd.read_csv(provided_data)
        transformed = df.to_json()
        result = {
            'result': transformed,
        }
        # Flask (1.1+) turns a returned dict into a JSON response
        return result
    # GET requests carry no file, so answer with a short usage hint
    return 'Send a POST request with a csv file in the form field "file".'


if __name__ == '__main__':
    app.run()

(Check out my repository for the full code.)

Now we have the functionality to transform csv files into json for example.

3. Deploy to AWS

After developing it locally we want to get it in the cloud.

Set up zappa

After we created the app locally we need to start setting up the hosting on a real server. We will use zappa.

Zappa makes it super easy to build and deploy server-less, event-driven Python applications (including, but not limited to, WSGI web apps) on AWS Lambda + API Gateway. Think of it as “serverless” web hosting for your Python apps. That means infinite scaling, zero downtime, zero maintenance — and at a fraction of the cost of your current deployments!

pip install zappa

As we are using a conda environment we need to specify it:

which python

will give you /Users/XXX/opt/anaconda3/envs/XXXX/bin/python (for Mac)

remove the bin/python/ and export

export VIRTUAL_ENV=/Users/XXXX/opt/anaconda3/envs/XXXXX/

Now we can do

zappa init

to set up the config.

Just click through everything and you will have a zappa_settings.json like

{
    "dev": {
        "app_function": "app.app",
        "aws_region": "eu-central-1",
        "profile_name": "default",
        "project_name": "pandas-transform-format",
        "runtime": "python3.7",
        "s3_bucket": "zappa-pandas-transform-format"
    }
}

Note that we are not yet ready to deploy. First, we need to get some AWS credentials.

Set up AWS

AWS credentials

First, you need to get an AWS access key ID and a secret access key.

You might think it is as easy as:

To get the credentials you need to

  • Go to: http://aws.amazon.com/
  • Sign Up & create a new account (they’ll give you the option for 1 year trial or similar)
  • Go to your AWS account overview
  • Account menu; sub-menu: Security Credentials

But no. There is more to permissions in AWS!

Set up credentials with users and roles in IAM

I found this article from Peter Kazarinoff to be very helpful. He explains the next section in great detail. My following bullet point approach is a quick summary and I often quote his steps. Please check out his article for more details if you are stuck somewhere.

I break it down as simple as possible:

  1. Within the AWS Console, type IAM into the search box. IAM is the AWS user and permissions dashboard.
  2. Create a group
  3. Give your group a name (for example zappa_group)
  4. Create our own specific inline policy for your group
  5. In the Permissions tab, under the Inline Policies section, choose the link to create a new Inline Policy
  6. In the Set Permissions screen, click the Custom Policy radio button and click the “Select” button on the right.
  7. Create a Custom Policy written in json format
  8. Read through and copy a policy discussed here: https://github.com/Miserlou/Zappa/issues/244
  9. Scroll down to “My Custom policy” to see a snippet of my policy.
  10. After pasting and modifying the json with your AWS Account Number, click the “Validate Policy” button to ensure you copied valid json. Then click the “Apply Policy” button to attach the inline policy to the group.
  11. Create a user and add the user to the group
  12. Back at the IAM Dashboard, create a new user with the “Users” left-hand menu option and the “Add User” button.
  13. In the Add user screen, give your new user a name and select the Access Type for Programmatic access. Then click the “Next: Permissions” button.
  14. In the Set permissions screen, select the group you created earlier in the Add user to group section and click “Next: Tags”.
  15. Tags are optional. Add tags if you want, then click “Next: Review”.
  16. Review the user details and click “Create user”
  17. Copy the user’s keys
  18. Don’t close the AWS IAM window yet. In the next step, you will copy and paste these keys into a file. At this point, it’s not a bad idea to copy and save these keys into a text file in a secure location. Make sure you don’t save keys under version control.

My Custom policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:AttachRolePolicy",
                "iam:GetRole",
                "iam:CreateRole",
                "iam:PassRole",
                "iam:PutRolePolicy"
            ],
            "Resource": [
                "arn:aws:iam::XXXXXXXXXXXXXXXX:role/*-ZappaLambdaExecutionRole"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "lambda:CreateFunction",
                "lambda:ListVersionsByFunction",
                "logs:DescribeLogStreams",
                "events:PutRule",
                "lambda:GetFunctionConfiguration",
                "cloudformation:DescribeStackResource",
                "apigateway:DELETE",
                "apigateway:UpdateRestApiPolicy",
                "events:ListRuleNamesByTarget",
                "apigateway:PATCH",
                "events:ListRules",
                "cloudformation:UpdateStack",
                "lambda:DeleteFunction",
                "events:RemoveTargets",
                "logs:FilterLogEvents",
                "apigateway:GET",
                "lambda:GetAlias",
                "events:ListTargetsByRule",
                "cloudformation:ListStackResources",
                "events:DescribeRule",
                "logs:DeleteLogGroup",
                "apigateway:PUT",
                "lambda:InvokeFunction",
                "lambda:GetFunction",
                "lambda:UpdateFunctionConfiguration",
                "cloudformation:DescribeStacks",
                "lambda:UpdateFunctionCode",
                "lambda:DeleteFunctionConcurrency",
                "events:DeleteRule",
                "events:PutTargets",
                "lambda:AddPermission",
                "cloudformation:CreateStack",
                "cloudformation:DeleteStack",
                "apigateway:POST",
                "lambda:RemovePermission",
                "lambda:GetPolicy"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucketMultipartUploads",
                "s3:CreateBucket",
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::zappa-*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": "arn:aws:s3:::zappa-*/*"
        }
    ]
}

NOTE: Replace the X placeholder in the inline policy with your AWS Account Number (a 12-digit number).

Your AWS Account Number can be found by clicking “Support” → “Support Center”. Your Account Number is listed in the Support Center on the upper left-hand side. The json above is what finally worked for me, but I expect this set of security permissions may be too open. To increase security, you could slowly pare down the permissions and see if Zappa still deploys. You can dig through this discussion on GitHub if you want to learn more about the specific AWS permissions needed to run Zappa: https://github.com/Miserlou/Zappa/issues/244.
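
To avoid copy and paste mistakes, the account number can be filled into the policy with a few lines of Python that also confirm the result is still valid JSON. This is only a sketch: the ACCOUNT_ID placeholder, the shortened template, the helper name fill_account_id and the dummy account number are assumptions made for illustration.

```python
import json

# Shortened, illustrative template; ACCOUNT_ID is a made-up placeholder
POLICY_TEMPLATE = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["iam:GetRole", "iam:PassRole"],
            "Resource": ["arn:aws:iam::ACCOUNT_ID:role/*-ZappaLambdaExecutionRole"]
        }
    ]
}
"""


def fill_account_id(template: str, account_id: str) -> dict:
    """Insert the account number and parse the result to confirm it is valid JSON."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account numbers are 12 digits long")
    return json.loads(template.replace("ACCOUNT_ID", account_id))


policy = fill_account_id(POLICY_TEMPLATE, "123456789012")  # dummy account number
resource = policy["Statement"][0]["Resource"][0]
print(resource)
```

json.dumps(policy, indent=4) then gives a string that can be pasted into the policy editor.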

Add credentials in your project

Create a credentials file in the .aws folder of your home directory with

mkdir ~/.aws
code ~/.aws/credentials

and paste your credentials from AWS

[dev]
aws_access_key_id = YOUR_KEY
aws_secret_access_key = YOUR_KEY

The profile name in brackets has to match the profile_name in your zappa_settings.json.

Same with the config

code ~/.aws/config

[default]
region = YOUR_REGION (eg. eu-central-1)

Note that code is the command-line launcher of vscode, my editor of choice.

Save the AWS access key id and secret access key assigned to the user you created in the file ~/.aws/credentials. Note the .aws/ directory needs to be in your home directory and the credentials file has no file extension.
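
Because a mismatch between the profile name in the credentials file and the profile_name in zappa_settings.json is easy to overlook, a quick check can help. The sketch below parses credentials text with the standard library's configparser; the helper name has_profile and the example text are assumptions for illustration, not part of Zappa or the AWS tooling.

```python
import configparser


def has_profile(credentials_text: str, profile: str) -> bool:
    """Return True if the profile exists and contains both key entries."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    return (
        parser.has_section(profile)
        and parser.has_option(profile, "aws_access_key_id")
        and parser.has_option(profile, "aws_secret_access_key")
    )


example = """
[dev]
aws_access_key_id = YOUR_KEY
aws_secret_access_key = YOUR_KEY
"""

dev_found = has_profile(example, "dev")
default_found = has_profile(example, "default")
print(dev_found, default_found)
```

To check the real file, read it first, for example with pathlib.Path.home() / ".aws" / "credentials", and pass its text to the function.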

Now you can deploy your API with

zappa deploy dev

Figure: Deploying app with zappa

There shouldn’t be any errors anymore. However, if there are still some, you can debug with:

zappa status
zappa tail

The most common errors are permission related (then check your permission policy) or about python libraries that are incompatible. Either way, zappa will provide good enough error messages for debugging.

If you update your code don’t forget to update the deployment as well with

zappa update dev

AWS API Gateway

To set up the API on a market we need to first restrict its usage with an API-key and then set it up on the market platform.

I found this article from Nagesh Bansal to be helpful. He explains the next section in great detail. My following bullet point approach is a quick summary and I often quote his steps. Please check out his article for more details if you are stuck somewhere.

Again, I break it down:

  1. Go to your AWS Console and open API Gateway
  2. Click on your API
  3. We want to create an x-api-key to restrict undesired access to the API and to meter usage
  4. Create a usage plan for the API with the desired throttle and quota limits
  5. Create an associated API stage
  6. Add an API key
  7. In the API key overview section, click “show” next to the API key and copy it
  8. Then associate the API with the key so that all requests that come without the key are discarded
  9. Go back to the API overview. Under Resources, click “/ ANY” and go to “Method Request”. Then, in Settings, set “API Key Required” to true
  10. Do the same for the “/{proxy+}” methods

It looks like this:

Figure: Set restrictions in AWS API Gateway

Now you have restricted access to your API.
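
From the client side, the restriction means every request has to carry the key in an x-api-key header. The sketch below only builds such a request with the standard library and does not send it; the URL and the key are placeholders, and build_request is a hypothetical helper name.

```python
import urllib.request

# Placeholder values; use the URL printed by zappa deploy and your own key
API_URL = "https://example.execute-api.eu-central-1.amazonaws.com/dev/get_json"
API_KEY = "YOUR_API_KEY"


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Create a GET request that carries the API key header."""
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


req = build_request(API_URL, API_KEY)
print(req.get_method(), req.full_url)
print(req.header_items())

# urllib.request.urlopen(req) would actually send the request.
```

Once the key is required, a request without the header should be rejected by the gateway before it ever reaches your Python code.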

4. Set up Rapidapi

Create API on Rapidapi

  1. Go to “My APIs” and “Add new API”
  2. Add the name, description, and category. Note that you cannot change your API name afterward anymore
  3. In settings, add the URL of your AWS API (it was displayed when you deployed with zappa)
  4. In the section “Access Control” under “Transformations”, add the API key you added in AWS

Figure: Access Control in Rapidapi

5. In the security tab you can check everything

6. Then go to “endpoints” to add the routes from your Python app by clicking “create REST endpoint”

Figure: Add a REST endpoint

7. Add an image for your API

8. Set a pricing plan. Rapidapi published its own article on pricing options and strategies. As they conclude, how you price it depends on your preferences and your product.

9. I created a freemium pricing plan. The reason for that is that I want to give the chance to test it without cost, but add a price for using it regularly. Also, I want to create a plan for supporting my work. For example:

Figure: Set price plans

10. Create some docs and a tutorial. This is pretty self-explanatory. It is encouraged to do so as it is easier for people to use your API if it is documented properly.

11. The last step is to make your API publicly available. But before you do that it is useful to test it for yourself.

Test your own API

Create a private plan for testing

Having set up everything, you of course should test it with the provided snippets. This step is not trivial and I had to contact the support to understand it. Now I am simplifying it here.

Create a private plan for yourself, by setting no limits.

Then go to the “Users” section of your API, then to “Users on free plans”, select yourself and “invite” yourself to the private plan.

Figure: Add yourself to your private plan

Now you are subscribed to your own private plan and can test the functionality with the provided snippets.

Test endpoint with Rapidapi

Upload an example excel file and click on “test endpoint”. Then you will get a 200 ok response.

Figure: Test an endpoint in Rapidapi

Create code to consume API

To consume the API now you can simply copy the snippet that Rapidapi provides. For example with Python and the requests library:

import requests

url = "https://excel-to-other-formats.p.rapidapi.com/upload"
payload = ""
headers = {
    'x-rapidapi-host': "excel-to-other-formats.p.rapidapi.com",
    'x-rapidapi-key': "YOUR_KEY",
    'content-type': "multipart/form-data"
}
response = requests.request("POST", url, data=payload, headers=headers)
print(response.text)

End result

Inspiration

The article “API as a product. How to sell your work when all you know is a back-end” by Artem provided a great idea, namely to

Make an API that solves a problem

Deploy it with a serverless architecture

Distribute through an API Marketplace

For setting everything up, I found the articles from Nagesh Bansal very helpful:

Also this article from Peter Kazarinoff: https://pythonforundergradengineers.com/deploy-serverless-web-app-aws-lambda-zappa.html

I encourage you to have a look at those articles as well.



The right and wrong way to set Python 3 as default on a Mac

posted Aug 9, 2020, 10:11 AM by Chris G   [ updated Aug 9, 2020, 10:12 AM ]

There are several ways to get started with Python 3 on macOS, but one way is better than the others.

What's so hard about this?

The version of Python that ships with macOS is well out of date from what Python recommends using for development. Python runtimes are also comically challenging at times, as noted by XKCD.

Python environment webcomic by xkcd

So what's the plan? 

Python GUI For Humans! Create full function Python interfaces with PySimpleGUI

posted Aug 9, 2020, 10:04 AM by Chris G   [ updated Aug 9, 2020, 10:06 AM ]


$ pip install pysimplegui

GitHub https://github.com/PySimpleGUI/
Docs https://pysimplegui.readthedocs.io/en/latest/

This Code

import PySimpleGUI as sg

sg.theme('DarkAmber')   # Add a touch of color
# All the stuff inside your window.
layout = [  [sg.Text('Some text on Row 1')],
            [sg.Text('Enter something on Row 2'), sg.InputText()],
            [sg.Button('Ok'), sg.Button('Cancel')] ]

# Create the Window
window = sg.Window('Window Title', layout)
# Event Loop to process "events" and get the "values" of the inputs
while True:
    event, values = window.read()
    if event == sg.WIN_CLOSED or event == 'Cancel': # if user closes window or clicks cancel
        break
    print('You entered ', values[0])

window.close()

Makes This Window

and returns the value input as well as the button clicked.

8 Advanced Tips to Master Python Strings

posted Jul 4, 2020, 12:27 PM by Chris G   [ updated Jul 4, 2020, 12:35 PM ]



Learn These Tips to Master Python Strings

Python strings appear simple, but they're incredibly flexible and they’re everywhere!

It may not seem like strings are something to master for data science, but with the abundance of unstructured, qualitative data available, it’s incredibly helpful to dive into strings!

1. Check for Membership with ‘in’

When working with unstructured data, it can be really helpful to identify particular words or other substrings in a larger string. The easiest way to do this is by using the in operator.

Say you’re working with a list, series, or dataframe column, and you want to identify whether a substring exists in a string.

In the example below, you have a list of different regions and want to know if the string “West” is in each list item.


sample_list = ['North West', 'West', 'North East', 'East', 'South', 'North']
is_west = ['Yes' if 'West' in location else 'No' for location in sample_list]
print(is_west)

# Returns:
# ['Yes', 'Yes', 'No', 'No', 'No', 'No']


Checking string membership. Source: Nik Piepenbreier

2. Do Magic with F-Strings

F-strings were introduced in Python 3.6 and they don’t get enough credit.

There’s a reason I say they’re magic. They:

  • Allow for much more flexibility,
  • Are much more readable than other methods, and
  • Execute much faster.

But what are they? F-strings (or formatted string literals) allow you to place variables (or any expression) into strings. The expressions are then executed at run time.

To write an f-string, prefix a string with ‘f’.

Let’s take a look at an example:


name = 'Nik'
birthyear = 1987
print(f'My name is {name} and I am {2020-birthyear} years old.')


F-strings are amazing. Source: Nik Piepenbreier 
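
Beyond inserting plain values, the expression in an f-string can be followed by a format specification after a colon. The short sketch below shows a few common ones; the variable names and numbers are made up for illustration.

```python
price = 1234.5678
share = 0.256
name = 'Nik'

two_decimals = f'{price:.2f}'     # round to two decimal places
with_commas = f'{price:,.2f}'     # add a thousands separator
as_percent = f'{share:.1%}'       # multiply by 100 and append a percent sign
padded_repr = f'{name!r:>10}'     # repr() of the value, right-aligned in 10 characters

print(two_decimals)
print(with_commas)
print(as_percent)
print(padded_repr)
```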

3. Reverse a String with [::-1]

Strings can be reversed (like other iterables), by slicing the string. To reverse any iterable, simply use [::-1].

The -1 acts as a step argument, by which Python starts at the last value and increments by -1:


string = 'pythonisfun'
print(string[::-1])

# Returns: nufsinohtyp


Reversing a string. Source: Nik Piepenbreier

4. Replace Substrings with .replace()

To replace substrings, you can use the replace method. This works for any substring, including a simple space (Python has no dedicated method for removing every space in a string; strip() only trims leading and trailing whitespace).

Let’s take a look at an example:


sample = 'Python is kind of fun.'
print(sample.replace('kind of', 'super'))

# Returns:
# Python is super fun.


Replacing substrings. Source: Nik Piepenbreier

5. Iterating over a String with a For-Loop

Python strings are iterable objects (just like lists, sets, etc.).

If you wanted to return each letter of a string, you could write:


sample = 'python'
for letter in sample:
    print(letter)

# Returns:
# p
# y
# t
# h
# o
# n


6. Format Strings with .upper(), .lower(), and .title()

Python strings can be a little quirky. You might get yourself a file in all caps, all lower cases, etc. And you might need to actually format these for presenting them later on.

  • .upper() will return a string with all characters in upper case
  • .lower() will return a string with all characters in lower case
  • .title() will capitalize each word of a string.

Let’s see these in action:


sample = 'THIS is a StRiNg'
print(sample.upper())
print(sample.lower())
print(sample.title())

# Returns:
# THIS IS A STRING
# this is a string
# This Is A String


7. Check for Palindromes and Anagrams

Combining what you’ve learned so far, you can easily check if a string is a Palindrome by using the [::-1] slice.

A word or phrase is a palindrome if it’s the same spelled forward as it is backward.

Similarly, you can return a sorted version of a string by using the sorted function. If two sorted strings are the same, they are anagrams:


string = 'taco cat'

def palindrome(string_to_check):
    # Compare the cleaned-up argument (not the global variable) with its reverse
    cleaned = string_to_check.lower().replace(' ', '')
    if cleaned == cleaned[::-1]:
        print("You found a palindrome!")
    else:
        print("Your string isn't a palindrome")

palindrome(string)

# Returns:
# You found a palindrome!


An anagram is a word or phrase that is formed by rearranging another word. In short, two words are anagrams if they have the same letters.

If you want to see if two words are anagrams, you can sort the two words and see if they are the same:


def anagram(word1, word2):
    if sorted(word1) == sorted(word2):
        print(f"{word1} and {word2} are anagrams!")
    else:
        print(f"{word1} and {word2} aren't anagrams!")

anagram('silent', 'listen')

# Returns:
# silent and listen are anagrams!
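
If the result should be reused rather than printed, both checks can return a boolean and share one normalization step. This is a small variation on the idea in this tip, not the original author's code; the helper names normalize, is_palindrome and is_anagram are made up for illustration.

```python
def normalize(text: str) -> str:
    """Lower-case the text and drop spaces so that phrases compare fairly."""
    return text.lower().replace(' ', '')


def is_palindrome(text: str) -> bool:
    cleaned = normalize(text)
    return cleaned == cleaned[::-1]


def is_anagram(first: str, second: str) -> bool:
    return sorted(normalize(first)) == sorted(normalize(second))


print(is_palindrome('Taco cat'))
print(is_anagram('Dormitory', 'dirty room'))
print(is_anagram('python', 'java'))
```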


8. Split a String with .split()

Say you’re given a string that contains multiple pieces of data. It can be helpful to split this string to parse out individual pieces of data.

In the example below, a string contains the region, the last name of a sales rep, as well as an order number.

You can use .split() to split these values:


order_text = 'north-doe-001'
print(order_text.split('-'))

# Returns:
# ['north', 'doe', '001']
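
Two related details are worth a quick sketch: the pieces can be unpacked straight into named variables, and the optional second argument of .split() limits the number of splits. The log line below is made up for illustration.

```python
order_text = 'north-doe-001'

# Unpack the three pieces directly into variables
region, rep, order_number = order_text.split('-')

# Split only once, so separators inside the message are left alone
log_line = 'ERROR: disk full: /dev/sda1'  # made-up example text
level, message = log_line.split(': ', 1)

# join() is the inverse operation of split()
rebuilt = '-'.join([region, rep, order_number])

print(region, rep, order_number)
print(level)
print(message)
print(rebuilt)
```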



Geo Heatmap

posted Jul 4, 2020, 12:15 PM by Chris G   [ updated Jul 4, 2020, 12:15 PM ]


This is a script that generates an interactive geo heatmap from your Google location history data using Python, Folium and OpenStreetMap.




8 Advanced Python Tricks Used by Seasoned Programmers

posted Jun 27, 2020, 10:10 AM by Chris G   [ updated Jun 27, 2020, 1:36 PM ]

Apply these tricks in your Python code to make it more concise and performant

Here are eight neat Python tricks, some of which I’m sure you haven’t seen before. Apply these tricks in your Python code to make it more concise and performant!

1. Sorting Objects by Multiple Keys

Suppose we want to sort the following list of dictionaries:

people = [
{ 'name': 'John', "age": 64 },
{ 'name': 'Janet', "age": 34 },
{ 'name': 'Ed', "age": 24 },
{ 'name': 'Sara', "age": 64 },
{ 'name': 'John', "age": 32 },
{ 'name': 'Jane', "age": 34 },
{ 'name': 'John', "age": 99 },
]

But we don’t just want to sort it by name or age, we want to sort it by both fields. In SQL, this would be a query like:

SELECT * FROM people ORDER by name, age

There’s actually a very simple solution to this problem, thanks to Python’s guarantee that sort functions offer a stable sort order. This means items that compare equal retain their original order.

To achieve sorting by name and age, we can do this:

import operator
people.sort(key=operator.itemgetter('age'))
people.sort(key=operator.itemgetter('name'))

Notice how I reversed the order. We first sort by age, and then by name. With operator.itemgetter() we get the age and name fields from each dictionary inside the list in a concise way.

This gives us the result we were looking for:

[
{'name': 'Ed', 'age': 24},
{'name': 'Jane', 'age': 34},
{'name': 'Janet','age': 34},
{'name': 'John', 'age': 32},
{'name': 'John', 'age': 64},
{'name': 'John', 'age': 99},
{'name': 'Sara', 'age': 64}
]

The names are sorted primarily, the ages are sorted if the name is the same. So all the Johns are grouped together, sorted by age.

Inspired by this StackOverflow question.
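
As an alternative sketch, the same ordering can be produced in a single pass by sorting on a tuple of both fields. A tuple key also makes mixed directions possible for numeric fields. The variable names below are chosen for illustration.

```python
import operator

people = [
    {'name': 'John', 'age': 64},
    {'name': 'Janet', 'age': 34},
    {'name': 'Ed', 'age': 24},
    {'name': 'Sara', 'age': 64},
    {'name': 'John', 'age': 32},
    {'name': 'Jane', 'age': 34},
    {'name': 'John', 'age': 99},
]

# itemgetter with two keys returns a (name, age) tuple for each dictionary
by_name_then_age = sorted(people, key=operator.itemgetter('name', 'age'))

# Name ascending, age descending: negate the numeric field
by_name_then_oldest = sorted(people, key=lambda p: (p['name'], -p['age']))

print([(p['name'], p['age']) for p in by_name_then_age])
print([(p['name'], p['age']) for p in by_name_then_oldest])
```

Unlike list.sort(), sorted() returns a new list and leaves people unchanged.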


2. List Comprehensions

A list comprehension can replace ugly for loops used to fill a list. The basic syntax for a list comprehension is:

[ expression for item in list if conditional ]

A very basic example to fill a list with a sequence of numbers:

mylist = [i for i in range(10)]
print(mylist)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

And because you can use an expression, you can also do some math:

squares = [x**2 for x in range(10)]
print(squares)
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Or even call an external function:

def some_function(a):
    return (a + 5) / 2
    
my_formula = [some_function(i) for i in range(10)]
print(my_formula)
# [2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]

And finally, you can use the ‘if’ to filter the list. In this case, we only keep the values that are divisible by 2:

filtered = [i for i in range(20) if i%2==0]
print(filtered)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

3. Check memory usage of your objects

With sys.getsizeof() you can check the memory usage of an object:

import sys

mylist = range(0, 10000)
print(sys.getsizeof(mylist))
# 48

Woah… wait… why is this huge list only 48 bytes?

It’s because range returns a range object that behaves like a sequence but computes its values on demand instead of storing them. A range is a lot more memory efficient than using an actual list of numbers.

You can see for yourself by using a list comprehension to create an actual list of numbers from the same range:

import sys

myreallist = [x for x in range(0, 10000)]
print(sys.getsizeof(myreallist))
# 87632

So, by playing around with sys.getsizeof() you can learn more about Python and your memory usage. The exact numbers depend on your Python version and platform.
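
One caveat is worth a sketch: sys.getsizeof() is shallow, meaning it reports the size of the container itself and not of the objects it refers to. The example prints sizes without stating them, since they differ between Python versions; the variable names are made up.

```python
import sys

numbers_range = range(0, 10000)
numbers_list = list(numbers_range)

range_size = sys.getsizeof(numbers_range)
list_size = sys.getsizeof(numbers_list)
print(range_size, list_size)

# A list holding one big list is itself tiny: only the reference is counted
inner = [0] * 1000
outer = [inner]
outer_size = sys.getsizeof(outer)
inner_size = sys.getsizeof(inner)
print(outer_size, inner_size)
```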


4. Data classes

Since version 3.7, Python offers data classes. There are several advantages over regular classes or other alternatives like returning multiple values or dictionaries:

  • a data class requires a minimal amount of code
  • you can compare data classes because __eq__ is implemented for you
  • you can easily print a data class for debugging because __repr__ is implemented as well
  • data classes require type hints, reducing the chances of bugs

Here’s an example of a data class at work:

from dataclasses import dataclass

@dataclass
class Card:
    rank: str
    suit: str
    
card = Card("Q", "hearts")

print(card == card)
# True

print(card.rank)
# Q

print(card)
# Card(rank='Q', suit='hearts')

An in-depth guide can be found here.
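
The dataclass decorator also accepts options. The sketch below uses order=True to generate comparison methods and frozen=True to make instances read-only; the Version class and its fields are invented for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int = 0
    # compare=False keeps this field out of equality and ordering checks
    tags: tuple = field(default=(), compare=False)


v1 = Version(1, 2)
v2 = Version(1, 10, tags=('beta',))

print(v1 < v2)            # compared field by field, like tuples
print(sorted([v2, v1]))

try:
    v1.major = 5          # frozen instances reject assignment
except FrozenInstanceError:
    print('Version objects are read-only')
```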


5. The attrs Package

Instead of data classes, you can use attrs. There are two reasons to choose attrs:

  • You are using a Python version older than 3.7
  • You want more features

The attrs package supports all mainstream Python versions, including PyPy; older attrs releases also support CPython 2.7. Some of the extras attrs offers over regular data classes are validators and converters. Let’s look at some example code:

from attr import attrs, attrib

@attrs
class Person(object):
    name = attrib(default='John')
    surname = attrib(default='Doe')
    age = attrib(init=False)
    
p = Person()
print(p)
p = Person('Bill', 'Gates')
p.age = 60
print(p)

# Output: 
#   Person(name='John', surname='Doe', age=NOTHING)
#   Person(name='Bill', surname='Gates', age=60)

The authors of attrs have, in fact, worked on the PEP that introduced data classes. Data classes are intentionally kept simpler (easier to understand), while attrs offers the full range of features you might want!

For more examples, check out the attrs examples page.


6. Merging dictionaries (Python 3.5+)

Since Python 3.5, it’s easier to merge dictionaries:

dict1 = { 'a': 1, 'b': 2 }
dict2 = { 'b': 3, 'c': 4 }
merged = { **dict1, **dict2 }
print (merged)
# {'a': 1, 'b': 3, 'c': 4}

If there are overlapping keys, the values from the first dictionary will be overwritten by those from the second.

In Python 3.9, merging dictionaries becomes even cleaner. The above merge in Python 3.9 can be rewritten as:

merged = dict1 | dict2
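
A short sketch makes the precedence rule concrete: whichever dictionary comes last wins, and neither input is modified. The variable names reverse_merged and updated_copy are made up, and the | operator is only exercised when the interpreter is Python 3.9 or newer.

```python
import sys

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}

merged = {**dict1, **dict2}          # dict2 wins for the shared key 'b'
reverse_merged = {**dict2, **dict1}  # dict1 wins for the shared key 'b'

# The same result without unpacking: copy first, then update the copy
updated_copy = dict1.copy()
updated_copy.update(dict2)

print(merged)
print(reverse_merged)
print(dict1, dict2)  # the inputs are unchanged

if sys.version_info >= (3, 9):
    print((dict1 | dict2) == merged)
```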

7. Find the Most Frequently Occurring Value

To find the most frequently occurring value in a list or string:

test = [1, 2, 3, 4, 2, 2, 3, 1, 4, 4, 4]
print(max(set(test), key = test.count))
# 4

Do you understand why this works? Try to figure it out for yourself before reading on.

You didn’t try, did you? I’ll tell you anyway:

  • max() returns the largest item of an iterable. The key argument takes a single-argument function that determines what the items are compared by; in this case, it’s test.count. The function is applied to each item of the iterable.
  • test.count is a built-in method of list. It takes an argument and will count the number of occurrences for that argument. So test.count(1) will return 2 and test.count(4) returns 4.
  • set(test) returns all the unique values from test, so {1, 2, 3, 4}

So what we do in this single line of code is take all the unique values of test, which is {1, 2, 3, 4}. Next, max applies the list.count method to each of them and returns the value with the highest count.

And no — I didn’t invent this one-liner.

Update: a number of commenters rightfully pointed out that there’s a much more efficient way to do this:

from collections import Counter
Counter(test).most_common(1)
# [(4, 4)]
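most_common(1) returns a list holding a single (value, count) tuple, so in practice you will usually unpack it. A short sketch (the string example is made up for illustration):

```python
from collections import Counter

test = [1, 2, 3, 4, 2, 2, 3, 1, 4, 4, 4]

# Take the first (and only) tuple from the list and unpack it.
value, count = Counter(test).most_common(1)[0]
print(value, count)
# 4 4

# Counter accepts any iterable of hashable items, including a string.
letter, occurrences = Counter('abracadabra').most_common(1)[0]
print(letter, occurrences)
# a 5
```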

8. Return Multiple Values

Functions in Python can return more than one value without the need for a dictionary, a list, or a class. The values are packed into a tuple, which the caller can unpack right away. It works like this:

def get_user(user_id):
    # fetch user from database
    # ....
    name, birthdate = 'Bill', '1955-10-28'  # placeholder values standing in for the lookup
    return name, birthdate

name, birthdate = get_user(4)

This is alright for a limited number of return values. But anything past 3 values should be put into a (data) class.
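As a rough sketch of the data class alternative: the User class, its fields, and the returned values below are all made up for illustration, and a real get_user would read them from a database.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    birthdate: str
    email: str
    country: str


def get_user(user_id):
    # Placeholder values; a real implementation would query a database.
    return User(name='Jane Doe', birthdate='1990-01-01',
                email='jane@example.com', country='NL')


user = get_user(4)
print(user.name, user.country)
# Jane Doe NL
```

The caller now accesses fields by name instead of relying on the order of a long tuple.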

Python | Read Text from Image with One Line Code

posted Jun 20, 2020, 1:24 PM by Chris G   [ updated Jun 20, 2020, 1:25 PM ]

Python | Read Text from Image with One Line Code

Dealing with images is not a trivial task. To you, as a human, it’s easy to look at something and immediately know what it is you’re looking at. But computers don’t work that way.

Tasks that are too hard for you, like complex arithmetic and math in general, are something a computer chews through without breaking a sweat. But here the exact opposite applies: tasks that are trivial to you, like recognizing whether an image shows a cat or a dog, are really hard for a computer. In a way, we are a perfect match. For now at least.

While image classification and tasks that involve some level of computer vision might require a good bit of code and a solid understanding, reading text from a somewhat well-formatted image turns out to be a one-liner in Python, and it can be applied to so many real-life problems.

And in today’s post, I want to prove that claim. There will be some installation to go through, but it shouldn’t take much time. These are the libraries you’ll need:

  • OpenCV
  • PyTesseract

I don’t want to prolong this intro part anymore, so why don’t we jump into the good stuff now.

OpenCV

Now, this library will only be used to load the image(s), so you don’t actually need to have a solid understanding of it beforehand (although it might be helpful, you’ll see why).

According to the official documentation:

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in the commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.[1]

In a nutshell, you can use OpenCV to do any kind of image transformation; it’s a fairly straightforward library.

If you don’t already have it installed, it’ll be just a single line in terminal:

pip install opencv-python

And that’s pretty much it. It was easy up until this point, but that’s about to change.

PyTesseract

What the heck is this library? Well, according to Wikipedia:

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License, Version 2.0, and development has been sponsored by Google since 2006.[2]

I’m sure there are more sophisticated libraries available now, but I’ve found this one working out pretty well. Based on my own experience, this library should be able to read text from any image, provided that the font isn’t some bulls*** that even you aren’t able to read.

If it can’t read from your image, spend more time playing around with OpenCV, applying various filters to make the text stand out.

Now the installation is a bit of a pain in the bottom. If you are on Linux, it all boils down to a couple of sudo apt-get commands:

sudo apt-get update
sudo apt-get install tesseract-ocr
sudo apt-get install libtesseract-dev

I’m on Windows, so the process is a bit more tedious.

First, open up THIS URL and download the 32-bit or 64-bit installer:

[Image: screenshot of the download page]

The installation by itself is straightforward; it boils down to clicking Next a couple of times. And yeah, you also need to do a pip installation:

pip install pytesseract

Is that all? Well, no. You still need to tell Python where Tesseract is installed. On Linux machines, I didn’t have to do so, but it’s required on Windows. By default, it’s installed in Program Files.

If you did everything correctly, executing this cell should not yield any error:

[Image: screenshot of the notebook cell]
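A cell along these lines might look like the sketch below. The Windows path is an assumption (a common default install location) and the find_tesseract helper is made up for illustration; adjust the path to wherever Tesseract was installed on your machine. The pytesseract.pytesseract.tesseract_cmd attribute is how PyTesseract is told where the executable lives.

```python
import os
import shutil

# Assumed default install location on Windows; change it if yours differs.
DEFAULT_WINDOWS_PATH = r'C:\Program Files\Tesseract-OCR\tesseract.exe'


def find_tesseract(default=DEFAULT_WINDOWS_PATH):
    """Return a path to the Tesseract executable, or None if it is not found."""
    on_path = shutil.which('tesseract')  # works when Tesseract is on PATH (e.g. Linux)
    if on_path:
        return on_path
    return default if os.path.isfile(default) else None


tesseract_path = find_tesseract()

try:
    import pytesseract
except ImportError:
    pytesseract = None  # not installed yet: pip install pytesseract

if pytesseract is not None and tesseract_path is not None:
    # Tell PyTesseract where the Tesseract executable is.
    pytesseract.pytesseract.tesseract_cmd = tesseract_path
```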

Is everything good? You may proceed.

Reading the Text

Let’s start with a simple one. I’ve found a couple of royalty-free images that contain some sort of text, and the first one is this:

[Image: first sample image]

It should be the easy one, and there exists a possibility that Tesseract will read those blue ‘objects’ as brackets. Let’s see what will happen:

[Image: screenshot of the code and its output]

My claim was true. It’s not a problem though, you could easily address those with some Python magic.
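The call behind a result like this is short. Here is a minimal sketch, assuming the image is loaded with OpenCV and passed to pytesseract.image_to_string. The function names read_text and strip_brackets, the set of bracket characters, and the sample string are made up for illustration; the actual output depends on your image.

```python
def read_text(image_path):
    """Load an image with OpenCV and return the text Tesseract finds in it."""
    # Imported inside the function so this sketch still loads when the
    # libraries are not installed.
    import cv2
    import pytesseract

    img = cv2.imread(image_path)  # returns None if the file cannot be read
    if img is None:
        raise FileNotFoundError(image_path)
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR, Tesseract expects RGB
    return pytesseract.image_to_string(rgb)     # the actual "one-liner"


def strip_brackets(text):
    """Remove bracket characters and collapse whitespace in OCR output."""
    table = str.maketrans('', '', '()[]{}<>')
    return ' '.join(text.translate(table).split())


# Hypothetical OCR output, for illustration only.
raw = '( Hello World )\n'
print(strip_brackets(raw))
# Hello World
```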

The next one could be more tricky:

[Image: second sample image, showing a coin]

I hope it won’t detect that ‘B’ on the coin:

[Image: screenshot of the code and its output]

Looks like it works perfectly.

Now it’s up to you to apply this to your own problem. OpenCV skills could be of vital importance here if the text blends with the background.
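One common way to make text stand out is to convert the image to grayscale and binarize it before handing it to Tesseract. The sketch below is just one possible starting point, not the only filter worth trying; the function name preprocess_for_ocr is made up for illustration.

```python
def preprocess_for_ocr(image_path):
    """Return a black-and-white version of the image for Tesseract."""
    import cv2  # imported here so the sketch loads without OpenCV installed

    img = cv2.imread(image_path)  # returns None if the file cannot be read
    if img is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the threshold automatically from the histogram.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

The returned array can then be passed to pytesseract.image_to_string in place of the original image.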

Before you leave

Reading text from an image is a pretty difficult task for a computer to perform. Think about it: the computer doesn’t know what a letter is, it only works with numbers. What happens under the hood might seem like a black box at first, but I encourage you to investigate further if this is your area of interest.

I’m not saying that PyTesseract will work perfectly every time, but I’ve found it good enough even on some trickier images. But not straight out of the box. Some image manipulation is required to make the text stand out.

It’s a complex topic, I know. Take it one day at a time. One day it will be second nature to you.



https://morioh.com/p/177cde94de0e?f=5c21fb01c16e2556b555ab32&_lrsc=3e30293b-e197-4a5c-b336-addd285eb852


References

[1] https://opencv.org/about/

[2] https://en.wikipedia.org/wiki/Tesseract_(software)

handling errors & exceptions when using Boto3

posted Jun 20, 2020, 1:15 PM by Chris G   [ updated Jun 20, 2020, 1:16 PM ]

New to AWS & Boto3? Learn the best practices for handling errors & exceptions when using Boto3, the AWS SDK for Python: https://go.aws/30tWTHs


Overview

AWS services require clients to use a variety of parameters, behaviors, or limits when interacting with their APIs. Boto3 provides many features to assist in navigating the errors and exceptions that you might encounter when interacting with AWS services.

Specifically, this guide provides details on the following:

  • How to find what exceptions there are to catch when using Boto3 and interacting with AWS services
  • How to catch/handle exceptions thrown by both Boto3 and AWS services
  • How to parse error responses from AWS services

Why catch exceptions from AWS and Boto

  • Retries - Your call rate to an AWS service might be too frequent, or you might have reached a specific AWS service quota. In either case, without proper error handling you wouldn’t know or wouldn’t handle them.
  • Parameter validation/checking - API requirements can change, especially across API versions. Catching these errors helps to identify if there’s an issue with the parameters you provide to any given API call.
  • Proper logging/messaging - Catching errors and exceptions means you can log them. This can be instrumental in troubleshooting any code you write when interacting with AWS services.

Determining what exceptions to catch

Exceptions that you might encounter when using Boto3 will come from one of two sources: botocore or the AWS services your client is interacting with.

Botocore exceptions

These exceptions are statically defined within the botocore package, a dependency of Boto3. The exceptions are related to issues with client-side behaviors, configurations, or validations. You can generate a list of the statically defined botocore exceptions using the following code:

import botocore.exceptions

for key, value in sorted(botocore.exceptions.__dict__.items()):
    if isinstance(value, type):
        print(key)
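Once you know which exceptions exist, catching them and parsing the error response could look like the sketch below. The helper names summarize_error and call_safely and the sample response are made up for illustration; ClientError and ParamValidationError are the botocore exception classes, and the 'Error' dictionary with 'Code' and 'Message' keys is the shape of ClientError.response.

```python
def summarize_error(response):
    """Pull the error code and message out of an AWS error response dict."""
    error = response.get('Error', {})
    return error.get('Code', 'Unknown'), error.get('Message', '')


def call_safely(api_call, **params):
    """Run a Boto3 client method and report AWS-side errors."""
    import botocore.exceptions  # imported here so the sketch loads without botocore

    try:
        return api_call(**params)
    except botocore.exceptions.ParamValidationError as error:
        # Raised on the client side, before any request is sent.
        raise ValueError('Invalid parameters: {}'.format(error))
    except botocore.exceptions.ClientError as error:
        # Raised when the AWS service itself returns an error.
        code, message = summarize_error(error.response)
        print('AWS returned {}: {}'.format(code, message))
        raise


# Hypothetical response, shaped like the dict attached to ClientError.response.
sample = {'Error': {'Code': 'ThrottlingException', 'Message': 'Rate exceeded'}}
print(summarize_error(sample))
# ('ThrottlingException', 'Rate exceeded')
```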


Jupyter Dash

posted May 30, 2020, 7:25 AM by Chris G   [ updated May 30, 2020, 7:25 AM ]

Jupyter Dash


This library makes it easy to develop Plotly Dash apps interactively from within Jupyter environments (e.g. classic Notebook, JupyterLab, Visual Studio Code notebooks, nteract, PyCharm notebooks, etc.).


