Dunghill Anti-Pattern - Why do utility classes and modules smell?

Sep 5, 2023 · 17 min read

Introduction

The Dunghill Anti-Pattern is worth knowing about, even if the name sounds weird. By avoiding dunghills you can increase the maintainability of your codebase.

I call directories or files that contain a large mix of random functions “dunghills”. They tend to accumulate a massive pile of code pieces over time that are not related to each other, which turns them into a metaphorical dunghill. Developers often use names like utils.js, common.py or Helpers.java for these catch-all dumps.

There have been discussions about whether utility classes or files are a bad practice. They are not an incarnation of evil, but they may lead you into a trap if used carelessly. When used correctly, they are a useful tool to structure a software project, and in this article, I’ll show you how to do that.

What are utility functions?

Utility functions are basic functions that do one simple thing and can be used in different parts of an application.

Usually, developers make utility functions to avoid repeating the same code in many places.

Examples of utility functions:

is_email_valid(email): Validates if a string has an email format.
flatten(arr): Flattens a nested array.
hex_to_rgb(hex_value): Converts a hex color code to RGB values.
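
As a rough illustration, the three helpers listed above might look like this in Python. This is a minimal sketch rather than production code: the email check uses a deliberately loose pattern (an assumption, not a full standards-compliant validator), and hex_to_rgb assumes a six-digit code with an optional leading #.

```python
import re

# Deliberately loose pattern: something@something.tld with no whitespace.
_EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_email_valid(email):
    """Return True if the string looks like an email address."""
    return _EMAIL_PATTERN.fullmatch(email) is not None


def flatten(arr):
    """Flatten arbitrarily nested lists into a single flat list."""
    result = []
    for item in arr:
        if isinstance(item, list):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result


def hex_to_rgb(hex_value):
    """Convert a hex color such as '#ff8800' to an (r, g, b) tuple."""
    digits = hex_value.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hex digits, got {hex_value!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
```

Each function takes plain values in and returns plain values out, with no shared state, which is what makes them candidates for reuse across an application.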

Sometimes you hear the term helper function being used. It means the same thing as a utility function.

What are utility files and utility classes?

A utility file or utility module is simply a collection of utility functions.

In languages such as Python and JavaScript, which allow functions outside of classes, developers often use utility files instead of utility classes. These files contain commonly used helper functions that don’t fit naturally within other parts of the project or are used throughout the project.

A utility class, on the other hand, refers to a class that is not meant to be instantiated and that provides a collection of utility functions as static methods. Utility classes are often used to provide helper methods in languages such as Java that don’t allow package-level functions.

Example of a utility class:

// A sample utility class in Java.

public final class StringUtils {

    // Private constructor to prevent instantiation
    private StringUtils() {}

    public static boolean isNullOrEmpty(String str) {
        return str == null || str.isEmpty();
    }

    public static String capitalizeFirstLetter(String str) {
        if (isNullOrEmpty(str)) {
            return str;
        }
        return Character.toUpperCase(str.charAt(0)) 
            + str.substring(1);
    }

    public static String reverse(String str) {
        if (isNullOrEmpty(str)) {
            return str;
        }
        return new StringBuilder(str).reverse().toString();
    }

}

Why do we need utility classes or modules?

Utility classes and modules collect commonly used simple methods in one place for easy reuse throughout the application. They are necessary as utility functions have to exist somewhere.

For example, think about the Math.sin(double a) method in Java. It makes sense that the method exists only once, and the Math class is a logical home for it. The method is also independent: all it requires is the parameter a, so it can be a static utility method instead of an instance method.

The very basic idea of why utility classes and methods exist is to avoid reinventing the wheel for common, basic operations.

The dunghill anti-pattern

The problems with utility files arise when functions that have little to do with each other are grouped into a single large, generic file. I call these files dunghills because developers just dump stuff into them and they smell from far away.

An example of a probable dunghill file in a project:

src
│
├─ app.py
├─ login.py
├─ users.py  
├─ utils.py   <────  A probable dunghill

A typical dunghill is a file or class that began with a few helpers and then grew into a massive collection of them. Often the file has a name such as Utils.java or helpers.js.

An example of utils.py dunghill file:

# An example of the dunghill anti-pattern:
#
# A large number of utility functions that have little
# to do with each other in a single Python module.

def log(message):
    ...

def post_request(url, headers, body):
    ...

def validate_username(username):
    ...

def validate_email_address(email):
    ...

def calculate_order_total(order):
    ...

def send_email(email):
    ...

def export_order_pdf(order):
    ...

def format_timestamp(timestamp):
    ...

# ...
# ... large number of random helper functions
# ...

Sometimes the dung pile has grown very large, even to thousands of lines of code, as new dung (functions) was simply shoveled onto it. And, man, that smells from far, far away.

Sometimes a sneaky developer has tried to disguise the dunghill by giving it a more respectable name, such as core.py, base.py, or toolbox.py. This does not make it any better.

💡

The dunghill anti-pattern is having a lot of unrelated helper functions in a single file or class.

Any utility file, class, or module that tempts you to add many kinds of utility functions to it is a dunghill. A strong smell of a dunghill is a file name that says nothing about what should be in the file (e.g. misc.php).

Why are utility classes and files considered bad practice?

To be honest, this question is misleading.

Utility files are not something evil that you should avoid using at all cost. They are one tool in your toolbox and there is a time and place for them, when used correctly.

However, the large, generic utility files are a very bad practice. Here’s why:

Hard to find anything: When you dump many unrelated functions or methods into a single file, it becomes challenging to keep them organized. A utility file/class grows over time into a massive, unstructured collection of code snippets, which makes it difficult to find anything in that file. And if you don’t even know what the file should contain, you might not look into it at all. Eventually, this can lead to code duplication because you didn’t know that the functionality already existed and re-implemented it. By dumping unrelated functions into a single file, you’re essentially creating a tangled web that hinders proper code organization.
Unclear purpose of the functions: When a utility file/class grows into a massive pile of random methods, the methods serve various purposes. The code blob lacks cohesion, as the functions do not relate to each other. You are violating the Single Responsibility Principle (SRP). This chaotic mix of code snippets obscures the purpose of the methods and makes the codebase difficult to understand, which slows down development.
Refactoring challenges: Refactoring a large utility file can be daunting, as changes to one part of the file may inadvertently affect other parts. In other words, parts of the file can be tightly coupled and prone to breaking when you make changes.
Testing challenges: A large pile of functions needs numerous tests. And if you have a utils_tests.py file, you actually have two dunghills that are both hard to maintain and work with. Since the utility file contains unrelated functions, it might also need a wide range of test scenarios with very different setup needs.
You don’t know what should be in there: Because the file name is so vague, it’s unclear what the file should contain. What kind of code exactly should you put there?
It’s just getting worse over time: Generic utility files have a cumulative nature. It’s (too) easy to simply add a new function to the utility file without considering its proper place in the codebase. As you add more and more functions to the file, this accumulation of unrelated functions exacerbates the existing maintenance challenges. The codebase becomes even harder to work with.

I consider dunghills to be a sign of laziness or lack of self-discipline. Developers just put anything in the utils files because they don’t want to spend time considering where the function or method should really be so that the structure would be clear. In the long run, they’ll regret that, so do yourself a favor and clean up utility files as early as possible.

Where do I put my utility functions? Alternatives to generic utility classes.

Okay, if you cannot use the utilities.py or common.js files, where should you place the utility functions? And how can you avoid using utility classes?

My answer is that you should put the code “where it really belongs”.

What I mean is that when you look closely at a helper method, it usually turns out to be a specific part of some functionality. You should put it in the file that matches that functionality.

Let’s examine some concrete examples of this and review some best practices.

1. Next to the feature that is using it

If your application has more than a few features, a good way to organize files is by feature. This is called a feature-based directory structure or a Folder-By-Feature structure. Often a function is related to a specific feature, and in this case the best place for it is in the corresponding directory. The file should be named after its true responsibilities.

For example, let’s consider the function calculate_order_total(order).

If we are building an e-commerce application, we can put the function under the orders feature. In this case, it also looks like it could be a method of an order class.

E-Commerce-App
│
├─── auth
│    |
|    ...
│
├─── products
│    |
|    ...
│
├─── orders
|    ├── order.py  <--- You can place the code here 
|    |                  as order.calculate_total() method
|    ...
...
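
Here is a minimal sketch of what orders/order.py could then contain. The OrderLine class and the field names are assumptions made for illustration; the only point is that the total calculation lives on the order itself instead of in a generic utility file.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class OrderLine:
    # Hypothetical line item: a unit price and a quantity.
    unit_price: Decimal
    quantity: int


@dataclass
class Order:
    lines: list = field(default_factory=list)

    def calculate_total(self):
        """Sum the price of every line; replaces calculate_order_total(order)."""
        return sum(
            (line.unit_price * line.quantity for line in self.lines),
            Decimal("0"),
        )
```

A caller now writes order.calculate_total() and finds the logic by looking in the orders feature, which is the first place anyone would search for it.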

2. To a function-specific, smaller utility file

You can split a large, generic utility file into smaller, function-specific files.

Instead of having a massive utils.py file, split it into files such as string_utils.py, date_utils.py and file_utils.py. This approach makes it easier to locate and maintain the needed methods, as they are grouped by functionality.

I recommend using this method when the functions contain truly shared functionality and are not related to some specific feature or business logic.

For example, the format_timestamp(timestamp) function is most likely used in multiple views, so a logical place for it is a file called formatting.py or date_utils.py. This is also much better than having the formatting rules defined in multiple locations.

💡

A function-specific utility file should be small and only contain functions that are closely related to each other.

When splitting a large helper file into smaller ones, make sure that the functions in each file are related to one another. Think about the math module in Python, for example. It has functions such as sqrt, sin, tan and floor. It is a collection of mathematical operations. The fact that all of the functions are closely tied to one another is important. The module as a whole is cohesive.

Our formatting.py could also contain format_money and format_number functions, as they are all related to formatting.
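
A formatting.py along these lines might look like the sketch below. The concrete output formats are assumptions chosen for the example (Unix timestamps in seconds rendered in UTC, comma as the thousands separator, a currency code after the amount); a real project would follow its own formatting rules.

```python
from datetime import datetime, timezone


def format_timestamp(timestamp):
    """Format a Unix timestamp (seconds) as 'YYYY-MM-DD HH:MM' in UTC."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M")


def format_number(value, decimals=0):
    """Format a number with thousands separators, e.g. 1,234,567."""
    return f"{value:,.{decimals}f}"


def format_money(amount, currency="EUR"):
    """Format an amount with two decimals followed by a currency code."""
    return f"{format_number(amount, 2)} {currency}"
```

All three functions turn a value into a display string, so the file stays cohesive and its name tells you what to expect inside.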

Here are some more examples to get the idea:

Function	Where to place it?
post_request(url, headers, body)	http.py or request.py
validate_email_address(email)	email.py or validation.py
format_timestamp(timestamp)	date.py or formatting.py
to_camel_case(text)	strings.py
area_of_circle(circle)	math_utils.py

And here’s how the structure of the e-commerce app would look after splitting the dunghill file:

E-Commerce-App
│
...
│
├─── orders
|    ├── order.py  
|    |             
|    ...
|
├─── utils
|    ├── logger.py       <─────  Use function-specific small
|    ├── http.py         <────┤  utility files instead of a 
|    ├── email.py        <────┤  large utils.py file.
|    ├── date_utils.py   <────┤
|    ├── string_utils.py <────┤
...  ...

Now it’s much easier, for example, to check what functions there are for handling dates, isn’t it?

3. To a service class

A service class is an alternative to having a function-specific utility file.

While utility files/classes contain small, commonly used functions such as format_timestamp, a service is a higher level of abstraction with a public interface and often hides many implementation details.

A service provides a way to interact with specific functionality of the system. For example, a LoggingService provides an interface for logging, and an OrderExportService provides an interface for exporting orders.

Services can have multiple concrete implementations. As an example, the LoggingService interface can have concrete implementations such as FileLoggingService and ConsoleLoggingService like in the illustration below. Similarly, the OrderExportService can have OrderPdfExportService and OrderCsvExportService implementations.

Figure: A LoggingService interface with two concrete implementations, FileLoggingService and ConsoleLoggingService.

While in the illustration above, there is a separate LoggingService interface, it is not a requirement to have an interface for having a service. For example, you can have a ProductSearchService implementation that searches for products from the SQL database without having a separate IProductSearchService interface. If you expect that there won’t be multiple implementations, it is often even preferable to avoid over-complexity by not creating an extra abstraction layer.
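
In Python, the LoggingService example could be sketched with an abstract base class from the standard abc module. The class names come from the text above; the log method, the constructor parameters, and the place_order caller are assumptions added for illustration.

```python
import sys
from abc import ABC, abstractmethod


class LoggingService(ABC):
    """The service interface: callers depend only on this."""

    @abstractmethod
    def log(self, message):
        ...


class ConsoleLoggingService(LoggingService):
    def __init__(self, stream=None):
        # Defaults to standard output; a stream can be injected for testing.
        self._stream = stream if stream is not None else sys.stdout

    def log(self, message):
        self._stream.write(message + "\n")


class FileLoggingService(LoggingService):
    def __init__(self, path):
        self._path = path

    def log(self, message):
        # Append mode keeps earlier log entries.
        with open(self._path, "a", encoding="utf-8") as log_file:
            log_file.write(message + "\n")


def place_order(order_id, logger):
    """Hypothetical caller that works with any LoggingService."""
    logger.log(f"Order {order_id} placed")
```

Because place_order only knows about the log method, you can swap ConsoleLoggingService for FileLoggingService without touching the calling code. If only one implementation is ever expected, the abstract class can be dropped and the concrete class used directly.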

What’s the difference between services and utility classes?

The table below compares utility files and services. Both contain commonly used, shared code, but they serve slightly different purposes.

Utility files or classes	Services
Implements commonly used general utility functions. Examples: • DateUtils.toDate(dateString) • DateUtils.formatTimestamp(dateString)	Implements an interface for some specific functionality of the system. Examples: • LoggingService • FileStoreService • ProductSearchService • OrderService
Has one concrete implementation of each function.	Can have multiple concrete implementations of an interface. Examples: • FileLoggingService • ConsoleLoggingService • SQLLoggingService
Is stateless.	Can have a state such as: • A file handle • A database connection
Contains no business logic.	May include business logic.
A collection of public functions.	Hides the complexity of the functionality in private class members (encapsulation).

Sometimes it may not be clear whether you should create a utility class or a service. As a rule of thumb, go with a service if any of the following are true:

There is any kind of state.
There might be multiple concrete implementations.
There is business logic.
There is communication with external systems.
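
To make the rule of thumb concrete, here is a small sketch that puts a utility function next to a service. The products table, the search method, and the to_search_pattern helper are hypothetical; the standard sqlite3 module stands in for whatever SQL database the application uses.

```python
import sqlite3


def to_search_pattern(text):
    """Utility: stateless, no business logic, one obvious implementation."""
    return f"%{text.strip().lower()}%"


class ProductSearchService:
    """Service: holds state (a connection) and talks to an external system."""

    def __init__(self, connection):
        self._connection = connection

    def search(self, text):
        rows = self._connection.execute(
            "SELECT name FROM products WHERE lower(name) LIKE ? ORDER BY name",
            (to_search_pattern(text),),
        )
        return [name for (name,) in rows]
```

The helper would be at home in a small string or search utility file, while the class keeps a database connection and hides the query details, so it belongs with the services.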

Let’s see what our E-Commerce App looks like after refactoring it to use service modules:

E-Commerce-App
│
...
│
├─── orders
|    ├── order.py  
|    ├── order_export_service.py
|    ├── order_pdf_export_service.py
|    ├── order_csv_export_service.py
|    ...
|
├─── services
|    ├── logging_service.py
|    ├── file_logging_service.py
|    ├── console_logging_service.py
|    ├── email_sending_service.py
|
├─── utils
|    ├── http.py
|    ├── date_utils.py
|    ├── string_utils.py
...  ...

We have added order-related export functionality to order_pdf_export_service.py and order_csv_export_service.py. Both classes implement the interface defined in order_export_service.py.

(As a side note: You can create interface-like behavior in Python by using abstract base classes from the standard abc module, via abc.ABC or the abc.ABCMeta metaclass.)

We have also created a services directory for the services that are not related to any specific feature but are used throughout the system.

Large utility directories - A new dunghill ahead

Okay, we got rid of the massive utils.py file and happily continued hacking on our project. However, time has passed, and one morning we notice a new smell in our project. We now have a massive number of files in the utils/ directory, and once again, we are having a hard time finding anything.

E-Commerce-App
│
...
├─── utils                   <─────  An example of dunghill directory 
     ├── api_helpers.py              (Anti-pattern)
     ├── auth_util.py
     ├── caching.py
     ├── constants.py
     ├── csv_utilities.py
     ├── data_conversion.py
     ├── data_export.py
     ├── data_manager.py
     ├── data_parser.py
     ├── data_processing.py
     ├── date_utils.py
     ├── db_manager.py
     ├── debug.py
     ├── decorators.py
     ├── email.py
     ├── encryption.py
     ├── error_handler.py
     ├── excel_import.py
     ├── form_validation.py
     ├── formatting.py
     ├── html_utils.py
     ├── http.py
     ├── json_utils.py
     ├── markup.py
     ├── math_functions.py
     ├── network.py
     ├── parsing.py
     ├── pdf_generation.py
     ├── regex_utils.py
     ├── search.py
     ├── string_utils.py
     ├── time_utils.py
     ├── unicorn_utils.py
     ├── xml_utils.py

I call these massive utility directories dunghill directories. They are catch-all folders containing (too) many files that are not related to each other.

The directory might have a name like utils, shared, common, helpers, misc or some other meaningless word.

The situation is definitely not as bad as it would be with one huge utility file, because the code is split into multiple files. But it’s not optimal either.

💡

The dunghill directory is a directory containing a lot of utility files that aren’t related.

Problems of dunghill directories:

Difficulty finding files: With numerous utility files in the directory, it can take a long time to find the specific functions or methods you need. This can be frustrating and can reduce productivity.
Increased complexity: When there are more utility files in a directory, it gets harder to navigate and understand how files and functions depend on each other. This can cause problems when making changes to existing code.
Maintenance challenges: The maintenance problems associated with a single utility file also exist in utility directories. These directories often contain unrelated files and functions, making it difficult to understand the overall structure and organization of the codebase. It can also be challenging to determine where to add a new function that you need.

How to clean up dunghill directories?

Group similar files in a subdirectory: Instead of dumping all utility files into a single directory, group them according to their functionality or purpose.
- Examples:
  - utils/formatting/
  - utils/file_io/
Adopt a feature-based directory structure: Organize your codebase based on the features it provides. For each feature, create a dedicated directory that contains all related files and modules. Move utility files next to the feature they belong to. This approach improves code organization and enhances maintainability by ensuring that all code related to a specific feature is located in one place.
- Example: If your application has an authentication feature, create an authentication/ directory and move auth_util.py there.
- Example: For an e-commerce application, create directories such as authentication, products, and orders to organize code based on the specific features they provide.
Get rid of files with meaningless names: Move the functions to where they really belong, or rename the files with more descriptive names.
- Examples of meaningless file names:
  - data_parser.py
  - data_manager.py
  - data_processing.py
Avoid file names like these. Group the functions to files whose names describe the functionalities better.

After cleaning up the dunghill directory, our e-commerce app could look like this:


E-Commerce-App
│
├─── authentication
|    ├──
|    ...
|
├─── orders
|    ├── order.py  
|    ├─── services/
|    |    ├──
|    ...  ...
|
├─── search
|    ├──
|    ...
|
├─── services
|    ├── logging
|    |   ├── logging_service.py
|    |   ├── file_logging_service.py
|    |   ├── console_logging_service.py
|    ├── cache_service.py
|    ├── database_service.py
|    ├── email_sending_service.py
|    ...
...
|
├─── utils
     ├── errors
     |   ├── debug_utils.py
     |   ├── error_handler.py
     |   
     ├── files
     |   ├── csv_utils.py
     |   ├── excel_import.py
     |   ├── html_utils.py
     |   ├── pdf_generation.py
     |   ├── json_utils.py
     |   ├── xml_utils.py
     |
     ├── datatypes
     |   ├── date_utils.py
     |   ├── string_utils.py
     |   ├── time_utils.py
     |
     ├── network	   
     |   ├── api_utils.py
     |   ├── network_utils.py
     |   ├── http.py
     |
     ├── encryption.py
     ├── form_validation.py
     ├── formatting.py
     ├── math_utils.py
     ├── regex_utils.py

Note that there are still files left at the root level of the utils directory. This is perfectly fine when there are only a few of them and they do not form any logical groups.

Cleaning up is not a one-time task. Restructure again whenever new logical groups can be identified or the number of files has increased.

Extra tips for avoiding and eliminating utility dunghills

Create safeguards: Project-level safeguards can prevent dunghills from being created.
- File length limit: Enforce a maximum length for code files using a static code analysis tool. In Python, you can use pylint, which has a default maximum of 1000 code lines per file. This is quite a large file, so I recommend using a smaller value, around 400-500 lines.
- Maximum number of files in a directory: When working on a very large project with many teams, consider setting a maximum number of files per directory to avoid dunghill directories. Implement the check in your CI/CD pipeline.
Use AI to split large files: Take advantage of AI-based tools to analyze your file and suggest ways to split large, monolithic utility files into smaller, more specific files.

A sample prompt for AI:
```
Organize the functions in the file below into logical groups,
with a maximum of 10 functions per group. Move each group to its
own file with a descriptive name. Output each file in its own 
code block.

File:
"""
< paste the file contents here >
"""
```
Always remember to review the results and make any necessary changes manually.
Avoid meaningless words: Avoid words in file and directory names that do not have any specific meaning. This helps you keep dunghills from forming.
- Examples of meaningless names: helper, util, misc, common, data, processor, manager, parser, handler
  
👿 Bad name	🙂 Good name
user-stuff.js	user-registration.js
data.sql	getUserData.sql
utils.css	buttonStyles.css
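
The "maximum number of files in a directory" safeguard mentioned earlier in this section can be sketched as a small script for a CI pipeline. The limit of 15 files, the restriction to .py files, and the function names are assumptions for illustration; adjust them to your project.

```python
from pathlib import Path

MAX_FILES_PER_DIRECTORY = 15  # assumed limit; tune for your project


def find_crowded_directories(root, max_files=MAX_FILES_PER_DIRECTORY):
    """Return (directory, file_count) pairs with more than max_files .py files."""
    root = Path(root)
    crowded = []
    # Check the root itself and every directory below it.
    for directory in [root, *sorted(p for p in root.rglob("*") if p.is_dir())]:
        file_count = sum(1 for p in directory.glob("*.py") if p.is_file())
        if file_count > max_files:
            crowded.append((directory, file_count))
    return crowded


def main(argv):
    """Print crowded directories and return a process exit code."""
    root = argv[1] if len(argv) > 1 else "."
    crowded = find_crowded_directories(root)
    for directory, file_count in crowded:
        print(f"{directory}: {file_count} files (limit {MAX_FILES_PER_DIRECTORY})")
    # A non-zero exit code makes the CI job fail.
    return 1 if crowded else 0
```

In a pipeline, the script would pass the return value of main(sys.argv) to sys.exit so that the build fails when a directory grows beyond the limit.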

Conclusion

Utility classes, modules, and libraries are essential in software development and can be useful if you use them wisely. They have their pros and cons, but without proper consideration, they can lead to stinking “dunghills”.

Large, generic utility files and directories can slow down your development. Instead, place the code where it really belongs, and create small, specific helper files and services. Use a feature-based directory structure to enhance code organization.

👿 Bad - Dunghill anti-pattern	🙂 Good - Try instead
Generic utility modules or classes: • utils.py • helpers.js • shared.ts • Common.java	Files next to the feature using it: • products/qr_code.js Small, focused utility files / classes: • encryption.py • form_validation.py • regex_utils.py Services: • EmailService • FileService • TranslationService
utils/ or helpers/ directory with a large number of utility files	Utility files and services next to the feature using them. Shared services in their own directory. Shared utility files grouped logically in subdirectories: • utils/validation • utils/diagnostics

With these methods in mind, you can improve the quality of your code base. Start by reorganizing your current utility files and directories to enjoy the benefits.
