Python Refactoring
Trying to understand what is refactoring, follow the guide from Real Python. https://realpython.com/python-refactoring
1. Functions That Should Be Objects
Without reading the code, you will not know if it will modify the original image or create a new image.
def load_image(path):
with open(path, "rb") as file:
fb = file.load()
image = img_lib.parse(fb)
return image
def crop_image(image, width, height):
...
return image
def get_image_thumbnail(image, resolution=100):
...
return image
To call the codes:
from imagelib import load_image, crop_image, get_image_thumbnail
image = load_image('~/face.jpg')
image = crop_image(image, 400, 500)
thumb = get_image_thumbnail(image)
Symptoms of code using functions that could be refactored into classes:
Similar arguments across functions Higher number of Halstead h2 unique operands (All the variables and constants are considered operands) Mix of mutable and immutable functions Functions spread across multiple Python files
Here is a refactored version of those 3 functions, where the following happens:
.init() replaces load_image(). crop() becomes a class method. get_image_thumbnail() becomes a property.
The thumbnail resolution has become a class property, so it can be changed globally or on that particular instance:
class Image(object):
thumbnail_resolution = 100
def __init__(self, path):
...
def crop(self, width, height):
...
@property
def thumbnail(self):
...
return thumb
This is how the refactored example would look:
from imagelib import Image
image = Image('~/face.jpg')
image.crop(400, 500)
thumb = image.thumbnail
In the resulting code, we have solved the original problems:
It is clear that thumbnail returns a thumbnail since it is a property, and that it doesn’t modify the instance. The code no longer requires creating new variables for the crop operation.
2. Objects That Should Be Functions
Here are some tell-tale signs of incorrect use of classes:
Classes with 1 method (other than .init()) Classes that contain only static methods
class Authenticator(object):
def __init__(self, username, password):
self.username = username
self.password = password
def authenticate(self):
# Do something
return result
It would make more sense to just have a simple function named authenticate() that takes username and password as arguments:
def authenticate(username, password):
# Do something
return result
3. Converting “Triangular” Code to Flat Code
These are the symptoms of highly nested code:
A high cyclomatic complexity because of the number of code branches A low Maintainability Index because of the high cyclomatic complexity relative to the number of lines of code
def contains_errors(data):
if isinstance(data, list):
for item in data:
if isinstance(item, str):
if item == "error":
return True
return False
Refactor this function by “returning early”
def contains_errors(data):
if not isinstance(data, list):
return False
return data.count("error") > 0
Another technique to reduce nesting by list comprehension
Common practise to create list, loop through and check for criteria.
results = []
for item in iterable:
if item == match:
results.append(item)
Replace with:
results = [item for item in iterable if item == match]
4. Handling Complex Dictionaries With Query Tools
It does have one major side-effect: when dictionaries are highly nested, the code that queries them becomes nested too.
data = {
"network": {
"lines": [
{
"name.en": "Ginza",
"name.jp": "銀座線",
"color": "orange",
"number": 3,
"sign": "G"
},
{
"name.en": "Marunouchi",
"name.jp": "丸ノ内線",
"color": "red",
"number": 4,
"sign": "M"
}
]
}
}
If you wanted to get the line that matched a certain number, this could be achieved in a small function:
def find_line_by_number(data, number):
matches = [line for line in data if line['number'] == number]
if len(matches) > 0:
return matches[0]
else:
raise ValueError(f"Line {number} does not exist.")
find_line_by_number(data["network"]["lines"], 3)
There are third party tools for querying dictionaries in Python. Some of the most popular are JMESPath, glom, asq, and flupy.
import jmespath
jmespath.search("network.lines", data)
[{'name.en': 'Ginza', 'name.jp': '銀座線',
'color': 'orange', 'number': 3, 'sign': 'G'},
{'name.en': 'Marunouchi', 'name.jp': '丸ノ内線',
'color': 'red', 'number': 4, 'sign': 'M'}]
If you wanted to get the line number for every line, you could do this:
jmespath.search("network.lines[*].number", data) [3, 4]
You can provide more complex queries, like a ==
or <
. The syntax is a little unusual for Python developers, so keep
the
documentation handy for reference.
Find the line with the number 3
> > > jmespath.search("network.lines[?number==`3`]", data)
> > > [{'name.en': 'Ginza', 'name.jp': '銀座線', 'color': 'orange', 'number': 3, 'sign': 'G'}]