Python Refactoring

April 11, 2022 · 4 min read

Just a person

Trying to understand what is refactoring, follow the guide from Real Python. https://realpython.com/python-refactoring

1. Functions That Should Be Objects

Without reading the code, you will not know if it will modify the original image or create a new image.

imagelib.py
def load_image(path):
    with open(path, "rb") as file:
        fb = file.load()
    image = img_lib.parse(fb)
    return image


def crop_image(image, width, height):
    ...
    return image


def get_image_thumbnail(image, resolution=100):
    ...
    return image

To call the codes:

from imagelib import load_image, crop_image, get_image_thumbnail

image = load_image('~/face.jpg')
image = crop_image(image, 400, 500)
thumb = get_image_thumbnail(image)

Symptoms of code using functions that could be refactored into classes:

Similar arguments across functions Higher number of Halstead h2 unique operands (All the variables and constants are considered operands) Mix of mutable and immutable functions Functions spread across multiple Python files

Here is a refactored version of those 3 functions, where the following happens:

.init() replaces load_image(). crop() becomes a class method. get_image_thumbnail() becomes a property.

The thumbnail resolution has become a class property, so it can be changed globally or on that particular instance:

imagelib-refactored.py
class Image(object):
    thumbnail_resolution = 100

    def __init__(self, path):
        ...

    def crop(self, width, height):
        ...

    @property
    def thumbnail(self):
        ...
        return thumb

This is how the refactored example would look:

from imagelib import Image

image = Image('~/face.jpg')
image.crop(400, 500)
thumb = image.thumbnail

In the resulting code, we have solved the original problems:

It is clear that thumbnail returns a thumbnail since it is a property, and that it doesn’t modify the instance. The code no longer requires creating new variables for the crop operation.

2. Objects That Should Be Functions

Here are some tell-tale signs of incorrect use of classes:

Classes with 1 method (other than .init()) Classes that contain only static methods

authentication class
class Authenticator(object):
    def __init__(self, username, password):
        self.username = username
        self.password = password


def authenticate(self):
    # Do something
    return result

It would make more sense to just have a simple function named authenticate() that takes username and password as arguments:

authenticate.py
def authenticate(username, password):
    # Do something
    return result

3. Converting “Triangular” Code to Flat Code

These are the symptoms of highly nested code:

A high cyclomatic complexity because of the number of code branches A low Maintainability Index because of the high cyclomatic complexity relative to the number of lines of code

def contains_errors(data):
    if isinstance(data, list):
        for item in data:
            if isinstance(item, str):
                if item == "error":
                    return True
    return False

Refactor this function by “returning early”

def contains_errors(data):
    if not isinstance(data, list):
        return False
    return data.count("error") > 0

Another technique to reduce nesting by list comprehension

Common practise to create list, loop through and check for criteria.

results = []
for item in iterable:
    if item == match:
        results.append(item)

Replace with:

results = [item for item in iterable if item == match]

4. Handling Complex Dictionaries With Query Tools

It does have one major side-effect: when dictionaries are highly nested, the code that queries them becomes nested too.

data = {
  "network": {
    "lines": [
      {
        "name.en": "Ginza",
        "name.jp": "銀座線",
        "color": "orange",
        "number": 3,
        "sign": "G"
      },
      {
        "name.en": "Marunouchi",
        "name.jp": "丸ノ内線",
        "color": "red",
        "number": 4,
        "sign": "M"
      }
    ]
  }
}

If you wanted to get the line that matched a certain number, this could be achieved in a small function:

def find_line_by_number(data, number):
    matches = [line for line in data if line['number'] == number]
    if len(matches) > 0:
        return matches[0]
    else:
        raise ValueError(f"Line {number} does not exist.")

find_line_by_number(data["network"]["lines"], 3)

There are third party tools for querying dictionaries in Python. Some of the most popular are JMESPath, glom, asq, and flupy.

import jmespath

jmespath.search("network.lines", data)
[{'name.en': 'Ginza', 'name.jp': '銀座線',
  'color': 'orange', 'number': 3, 'sign': 'G'},
 {'name.en': 'Marunouchi', 'name.jp': '丸ノ内線',
  'color': 'red', 'number': 4, 'sign': 'M'}]

If you wanted to get the line number for every line, you could do this:

jmespath.search("network.lines[*].number", data) [3, 4]

You can provide more complex queries, like a ==or <. The syntax is a little unusual for Python developers, so keep the documentation handy for reference.

Find the line with the number 3

    > > > jmespath.search("network.lines[?number==`3`]", data)
    > > > [{'name.en': 'Ginza', 'name.jp': '銀座線', 'color': 'orange', 'number': 3, 'sign': 'G'}]

1. Functions That Should Be Objects​

2. Objects That Should Be Functions​

3. Converting “Triangular” Code to Flat Code​

4. Handling Complex Dictionaries With Query Tools​

1. Functions That Should Be Objects

2. Objects That Should Be Functions

3. Converting “Triangular” Code to Flat Code

4. Handling Complex Dictionaries With Query Tools