It may not seem like strings are something to master for data science, but with the abundance of unstructured, qualitative data available, it’s incredibly helpful to dive into strings!
1. Check for Membership with ‘in’
When working with unstructured data, it can be really helpful to identify particular words or other substrings in a larger string. The easiest way to do this is by using the in operator.
Say you’re working with a list, series, or dataframe column, and you want to identify whether a substring exists in a string.
In the example below, you have a list of different regions and want to know if the string “West” is in each list item.
sample_list = ['North West', 'West', 'North East', 'East', 'South', 'North']
is_west = ['Yes' if 'West' in location else 'No' for location in sample_list]
print(is_west)
# Returns:
# ['Yes', 'Yes', 'No', 'No', 'No', 'No']
Checking string membership. Source: Nik Piepenbreier
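Keep in mind that `in` is case-sensitive, so 'West' will not match 'north-west'. A minimal sketch, assuming you want case-insensitive matches, is to lowercase both sides first. The helper name `contains_ci` and the `regions` list are hypothetical, for illustration only:

```python
def contains_ci(substring, text):
    # Hypothetical helper: case-insensitive membership check
    return substring.lower() in text.lower()

regions = ['North West', 'West', 'north-west', 'East', 'South', 'North']

# Plain 'in' misses 'north-west' because of the lowercase 'w'
case_sensitive = ['West' in location for location in regions]
case_insensitive = [contains_ci('West', location) for location in regions]

print(case_sensitive)
# Returns: [True, True, False, False, False, False]
print(case_insensitive)
# Returns: [True, True, True, False, False, False]
```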
2. Do Magic with F-Strings
F-strings were introduced in Python 3.6 and they don’t get enough credit.
There’s a reason I say they’re magic. They:
- Let you embed any Python expression directly in the string,
- Are more readable than %-formatting or str.format(), and
- Generally execute faster than those older methods.
But what are they? F-strings (formatted string literals) let you place variables, or any expression, directly into strings. The expressions are evaluated at runtime.
To write an f-string, prefix a string with ‘f’.
Let’s take a look at an example:
name = 'Nik'
birthyear = 1987
print(f'My name is {name} and I am {2020-birthyear} years old.')
# Returns:
# My name is Nik and I am 33 years old.
F-strings are amazing. Source: Nik Piepenbreier
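F-strings go beyond plain substitution. After a colon inside the braces, they accept the same format specification mini-language as str.format(). Here is a short sketch using made-up values:

```python
price = 1234.5678
count = 1500000
share = 0.256
name = 'Nik'

formatted = [
    f'{price:.2f}',   # two decimal places
    f'{count:,}',     # thousands separator
    f'{share:.1%}',   # multiply by 100 and add a percent sign
    f'{name!r}',      # use repr() of the value (adds quotes)
    f'{name:>6}',     # right-align in a 6-character field
]
print(formatted)
# Returns: ['1234.57', '1,500,000', '25.6%', "'Nik'", '   Nik']
```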
3. Reverse a String with [::-1]
Like other sequences such as lists and tuples, strings can be reversed by slicing. To reverse any sequence, use [::-1].
The -1 is the step argument: Python starts at the last element and steps backward one position at a time:
string = 'pythonisfun'
print(string[::-1])
# Returns: nufsinohtyp
Reversing a string. Source: Nik Piepenbreier
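You can get the same result with the built-in reversed() function. It returns an iterator rather than a string, so you have to join the characters back together, which makes the slice the more concise choice. A small sketch of both:

```python
string = 'pythonisfun'

# reversed() returns an iterator, so join the characters back into a string
reversed_string = ''.join(reversed(string))
print(reversed_string)
# Returns: nufsinohtyp

# The same slice works on other sequences, such as lists
numbers = [1, 2, 3, 4]
print(numbers[::-1])
# Returns: [4, 3, 2, 1]
```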
4. Replace Substrings with .replace()
To replace substrings, use the .replace() method. It works for any substring, including a single space. That makes it useful for removing all spaces, since Python has no dedicated method for that (.strip() only removes leading and trailing whitespace).
Let’s take a look at an example:
sample = 'Python is kind of fun.'
print(sample.replace('kind of', 'super'))
# Returns:
# Python is super fun.
Replacing substrings. Source: Nik Piepenbreier
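The sketch below contrasts removing every space with .replace() against trimming the ends with .strip(). It also shows .replace()'s optional third argument, which caps the number of replacements. The `messy` string is a made-up example:

```python
messy = '  north  west  '

# Remove every space, including those between words
no_spaces = messy.replace(' ', '')
print(no_spaces)
# Returns: northwest

# .strip() only removes leading and trailing whitespace
stripped = messy.strip()
print(repr(stripped))
# Returns: 'north  west'

# The optional third argument limits how many replacements are made
order_text = 'north-doe-001'
first_only = order_text.replace('-', '_', 1)
print(first_only)
# Returns: north_doe-001
```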
5. Iterating over a String with a For-Loop
Python strings are iterable objects (just like lists, sets, etc.).
If you wanted to print each letter of a string, you could write:
sample = 'python'
for letter in sample:
    print(letter)
# Returns:
# p
# y
# t
# h
# o
# n
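Because strings are iterable, anything that consumes an iterable works on them too. In this small sketch, enumerate() gives you each character's position, and collections.Counter tallies how often each character appears. The word 'banana' is just an example:

```python
from collections import Counter

sample = 'banana'

# enumerate() yields (index, character) pairs while iterating
for index, letter in enumerate(sample):
    print(index, letter)

# Counter iterates over the string for you and tallies each character
letter_counts = Counter(sample)
print(letter_counts.most_common(2))
# Returns: [('a', 3), ('n', 2)]
```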
6. Format Strings with .upper(), .lower(), and .title()
Python strings can be a little quirky. You might receive a file in all caps or all lowercase, and you may need to normalize the text before presenting it.
- .upper() will return a string with all characters in upper case
- .lower() will return a string with all characters in lower case
- .title() will capitalize the first letter of each word and lowercase the rest.
Let’s see these in action:
sample = 'THIS is a StRiNg'
print(sample.upper())
print(sample.lower())
print(sample.title())
# Returns:
# THIS IS A STRING
# this is a string
# This Is A String
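One quirk to watch for: .title() treats apostrophes as word boundaries. The sketch below shows that behavior and string.capwords() from the standard library, which splits on whitespace only. It also shows .casefold(), a more aggressive version of .lower() intended for case-insensitive comparisons. The sample text is made up:

```python
import string

text = "they're from the north west"

# .title() treats the apostrophe as a word boundary
print(text.title())
# Returns: They'Re From The North West

# string.capwords() splits on whitespace only, avoiding that quirk
print(string.capwords(text))
# Returns: They're From The North West

# .casefold() also folds characters like 'ß' to 'ss' for comparisons
print('Straße'.casefold() == 'STRASSE'.casefold())
# Returns: True
```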
7. Check for Palindromes and Anagrams
Combining what you’ve learned so far, you can easily check whether a string is a palindrome by using the [::-1] slice.
A word or phrase is a palindrome if it reads the same forward as backward.
Similarly, the sorted() function returns a string's characters as a sorted list. If two strings produce the same sorted list, they are anagrams. Let's start with the palindrome check:
string = 'taco cat'
def palindrome(string_to_check):
    # Normalize: lowercase and remove spaces before comparing
    cleaned = string_to_check.lower().replace(' ', '')
    if cleaned == cleaned[::-1]:
        print("You found a palindrome!")
    else:
        print("Your string isn't a palindrome")
palindrome(string)
# Returns:
# You found a palindrome!
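The palindrome function only strips spaces, so punctuation would break the check. The sketch below assumes you want to ignore anything that isn't a letter or digit and would rather get a boolean back than a printed message. The name `is_palindrome` is hypothetical:

```python
def is_palindrome(text):
    # Keep only letters and digits, lowercased, so spaces and punctuation are ignored
    cleaned = ''.join(char.lower() for char in text if char.isalnum())
    return cleaned == cleaned[::-1]

print(is_palindrome('taco cat'))
# Returns: True
print(is_palindrome('A man, a plan, a canal: Panama'))
# Returns: True
print(is_palindrome('pythonisfun'))
# Returns: False
```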
An anagram is a word or phrase formed by rearranging the letters of another word or phrase. In short, two words are anagrams if they contain exactly the same letters, each used the same number of times.
If you want to see if two words are anagrams, you can sort the two words and see if they are the same:
def anagram(word1, word2):
    if sorted(word1) == sorted(word2):
        print(f"{word1} and {word2} are anagrams!")
    else:
        print(f"{word1} and {word2} aren't anagrams!")
anagram('silent', 'listen')
# Returns:
# silent and listen are anagrams!
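Because sorted() compares characters case-sensitively, 'Listen' and 'silent' would not match. The sketch below swaps in collections.Counter to compare letter counts directly and ignores case and spaces. The name `is_anagram` is hypothetical:

```python
from collections import Counter

def is_anagram(phrase1, phrase2):
    # Normalize: lowercase and drop spaces so phrases can be compared too
    def normalize(phrase):
        return phrase.lower().replace(' ', '')
    # Counter compares letter frequencies without sorting
    return Counter(normalize(phrase1)) == Counter(normalize(phrase2))

print(is_anagram('Dormitory', 'dirty room'))
# Returns: True
print(is_anagram('silent', 'listens'))
# Returns: False
```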
8. Split a String with .split()
Say you’re given a string that contains multiple pieces of data. It can be helpful to split this string to parse out individual pieces of data.
In the example below, a string contains the region, the last name of a sales rep, as well as an order number.
You can use .split() to split these values:
order_text = 'north-doe-001'
print(order_text.split('-'))
# Returns:
# ['north', 'doe', '001']
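Since .split() returns a list, you can unpack it directly into variables when you know how many pieces to expect. The sketch below reuses the order_text example. It also shows the optional maxsplit argument, .rsplit(), and calling .split() with no separator:

```python
order_text = 'north-doe-001'

# Unpack the pieces straight into named variables
region, rep, order_id = order_text.split('-')
print(region, rep, order_id)
# Returns: north doe 001

# maxsplit limits the number of splits; rsplit() counts from the right
print(order_text.split('-', 1))
# Returns: ['north', 'doe-001']
print(order_text.rsplit('-', 1))
# Returns: ['north-doe', '001']

# With no argument, split() splits on runs of whitespace and drops empty strings
print('  north   doe  001 '.split())
# Returns: ['north', 'doe', '001']
```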