Refactoring 025 - Decompose Regular Expressions

by Maximiliano Contieri, 5 min read, March 31st, 2025

Make Regular Expressions Testable and Understandable

TL;DR: You can break down a complex validation regex into smaller parts to test each part individually and report accurate errors.

Problems Addressed 😔

Hard-to-test regular expressions
Unclear error reporting
Debugging nightmares
Maintenance challenges
Too long lines and methods
Unmaintainable expressions
Primitive Obsession
Error isolation
Knowledge silos
Obsolete comments
Errors without empathy to end users

Related Code Smells 💨

Code Smell 276 - Untested Regular Expressions | HackerNoon

Use clear and concise regular expressions, and test them thoroughly.

https://hackernoon.com/how-to-find-the-stinky-parts-of-your-code-part-xxv

https://hackernoon.com/how-to-find-the-stinky-parts-of-your-code-part-i-xqz3evd

https://hackernoon.com/how-to-find-the-stinky-parts-of-your-code-part-xxxvii

https://hackernoon.com/how-to-find-the-stinky-parts-of-your-code-part-xx-we-have-reached-100

https://hackernoon.com/how-to-find-the-stinky-parts-of-your-code-part-ix-7rr33ol

Steps 👣

Analyze the regex to identify its logical components.
Break the regex into smaller, named sub-patterns for each component.
Write unit tests for each sub-pattern to ensure it works correctly.
Combine the tested sub-patterns into the full validation logic.
Refactor the code to provide clear error messages for every failing part.

Sample Code 💻

Before 🚨

function validateURL(url) {
  const urlRegex =
    /^(https?:\/\/)([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})(\/.*)?$/;
  // Cryptic and untestable
  return urlRegex.test(url);
}

After 👉

// Step 1: Define individual regex components
const protocolPattern = /^(https?:\/\/)/; 
// Dot-separated labels, so consecutive dots ("domain..com") are rejected
const domainPattern =
  /^[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,}$/;
const pathPattern = /^\/.*$/;

// Step 2: Write unit tests for each component
describe("Protocol Validation", () => {
  test("should pass for http://", () => {
    expect(protocolPattern.test("http://")).toBe(true);
  });

  test("should pass for https://", () => {
    expect(protocolPattern.test("https://")).toBe(true);
  });

  test("should fail for invalid protocols", () => {
    expect(protocolPattern.test("ftp://")).toBe(false);
  });
});

describe("Domain Validation", () => {
  test("should pass for valid domains", () => {
    expect(domainPattern.test("example.com")).toBe(true);
    expect(domainPattern.test("sub.domain.org")).toBe(true);
  });

  test("should fail for invalid domains", () => {
    expect(domainPattern.test("example")).toBe(false);
    expect(domainPattern.test("domain..com")).toBe(false);
  });
});

describe("Path Validation", () => {
  test("should pass for valid paths", () => {
    expect(pathPattern.test("/path/to/resource")).toBe(true);
    expect(pathPattern.test("/")).toBe(true);
  });

  test("should fail for invalid paths", () => {
    expect(pathPattern.test("path/to/resource")).toBe(false);
    expect(pathPattern.test("")).toBe(false);
  });
});

// Step 3: Validate each part and report errors
function validateURL(url) {
  if (!protocolPattern.test(url)) {
    throw new Error("Invalid protocol. Use http:// or https://.");
  }

  const domainStartIndex = url.indexOf("://") + 3;
  const domainEndIndex = url.indexOf("/", domainStartIndex);
  const domain = domainEndIndex === -1 ? 
        url.slice(domainStartIndex) :
        url.slice(domainStartIndex, domainEndIndex);

  if (!domainPattern.test(domain)) {
    throw new Error("Invalid domain name.");
  }

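  // indexOf returns -1 when the URL has no path;
  // slice(-1) would wrongly return the last character
  const path = domainEndIndex === -1 ? "" : url.slice(domainEndIndex);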
  if (path && !pathPattern.test(path)) {
    throw new Error("Invalid path.");
  }

  return true;
}

// Step 4: Add integration tests for the full URL validation
describe("Full URL Validation", () => {
  test("should pass for valid URLs", () => {
    expect(validateURL("https://lesluthiers.com/tour/")).toBe(true);
    expect(validateURL("https://bio.lesluthiers.org/")).toBe(true);
  });

  test("should fail for invalid URLs", () => {
    expect(() => validateURL("ftp://mastropiero.com")).
      toThrow("Invalid protocol");
    expect(() => validateURL("http://estherpsicore..com")).
      toThrow("Invalid domain name");
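    // A hyphenated last label fails the domain check, not the path check
    expect(() => validateURL("http://book.warren-sanchez")).
      toThrow("Invalid domain name");
    // A line break inside the path is not matched by /^\/.*$/
    expect(() => validateURL("http://lesluthiers.com/tour\nguide")).
      toThrow("Invalid path");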
  });
});

Type 📝

[x]Semi-Automatic

Safety 🛡️

This refactoring is safe if you follow the steps carefully.

Testing each component ensures that you catch errors early.

Why is the Code Better? ✨

The refactored code is better because it improves readability, maintainability, and testability.

Breaking down the regex into smaller parts makes understanding what each part does easier.

You can also report specific errors when validation fails, which helps users fix their input.
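One way to take this further is to collect every failing part instead of stopping at the first one. The sketch below is a minimal illustration, not the article's implementation: the function name `collectURLErrors`, the message texts, and the position-based splitting are assumptions, and the domain pattern is one possible variant that rejects consecutive dots.

```javascript
// Hypothetical helper: names and messages are illustrative assumptions.
const protocolPattern = /^https?:\/\//;
const domainPattern = /^[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,}$/;
const pathPattern = /^\/.*$/;

function collectURLErrors(url) {
  const errors = [];

  // Split the input by position so every part can be checked
  // independently, even when an earlier part is invalid.
  const separatorIndex = url.indexOf("://");
  const rest = separatorIndex === -1 ? url : url.slice(separatorIndex + 3);
  const slashIndex = rest.indexOf("/");
  const domain = slashIndex === -1 ? rest : rest.slice(0, slashIndex);
  const path = slashIndex === -1 ? "" : rest.slice(slashIndex);

  if (!protocolPattern.test(url)) {
    errors.push("Invalid protocol. Use http:// or https://.");
  }
  if (!domainPattern.test(domain)) {
    errors.push(`Invalid domain name: "${domain}".`);
  }
  if (path !== "" && !pathPattern.test(path)) {
    errors.push(`Invalid path: "${path}".`);
  }
  return errors;
}

// Usage: an empty array means the URL passed every check.
console.log(collectURLErrors("ftp://domain..com/tour"));
```

Returning a list lets the caller show all problems at once, which may suit form validation better than an exception on the first failure.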

This is also a great opportunity to apply the Test-Driven Development technique, gradually increasing complexity by introducing new subparts.
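As a small illustration of growing the validation one subpart at a time, the sketch below introduces an optional port component together with its checks. The name `portPattern`, the one-to-five digit rule, and the tiny `check` helper (a stand-in for a real test framework) are assumptions made for this example.

```javascript
// Hypothetical new sub-pattern: an optional port such as ":8080".
// The name and the 1-5 digit rule are assumptions for this sketch.
const portPattern = /^:\d{1,5}$/;

// Tiny stand-in for a test framework so the sketch runs on its own.
function check(description, condition) {
  if (!condition) {
    throw new Error(`Failed: ${description}`);
  }
}

// In a TDD cycle you would write these checks first,
// watch them fail, and then define portPattern.
check("accepts a numeric port", portPattern.test(":8080"));
check("rejects a missing colon", !portPattern.test("8080"));
check("rejects letters", !portPattern.test(":http"));
check("rejects more than five digits", !portPattern.test(":123456"));
```

Once the new subpart passes on its own, you can wire it into the full validation and add an integration test for it.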

How Does it Improve the Bijection? 🗺️

By breaking down the regex into smaller, meaningful components, you create a closer mapping between the Real-World requirements (e.g., "URL must have a valid protocol") and the code.

This reduces ambiguity and ensures the code reflects the problem domain accurately.
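The mapping can also be made visible in the pattern itself. The following sketch rebuilds the original single regex from named pieces using `RegExp.prototype.source` and named capture groups, so each requirement has a name in the code. The variable names and the `parseURL` helper are assumptions for illustration.

```javascript
// Each requirement gets its own named building block (unanchored).
const protocol = /https?:\/\//;
const domain = /[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/;
const path = /\/.*/;

// Compose the blocks into one anchored pattern with named groups.
// This matches the same strings as the original one-line regex.
const urlPattern = new RegExp(
  `^(?<protocol>${protocol.source})` +
  `(?<domain>${domain.source})` +
  `(?<path>${path.source})?$`
);

// Hypothetical helper: returns the named parts, or null on no match.
function parseURL(url) {
  const match = urlPattern.exec(url);
  return match ? { ...match.groups } : null;
}

console.log(parseURL("https://lesluthiers.com/tour/"));
```

Named groups keep the single-pass match of the original expression while still letting tests and error messages refer to each part by name.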

Limitations ⚠️

This approach might add some overhead for very simple regex patterns where breaking them down would be unnecessary.

Refactor with AI 🤖

You can use AI tools to help identify regex components.

Ask the AI to explain what each part of the regex does, then guide you in breaking it into smaller, testable pieces. For example, you can ask, "What does this regex do?" and follow up with, "How can I split it into smaller parts?".

It's 2025. No programmer should write new regular expressions anymore.

You should leave this mechanical task to AI.

Suggested Prompt:

1. Analyze the regex to identify its logical components.
2. Break the regex into smaller, named sub-patterns for each component.
3. Write unit tests for each sub-pattern to ensure it works correctly.
4. Combine the tested sub-patterns into the full validation logic.
5. Refactor the code to provide clear error messages for every failing part.

Without Proper Instructions	With Specific Instructions
ChatGPT	ChatGPT
Claude	Claude
Perplexity	Perplexity
Copilot	Copilot
Gemini	Gemini
DeepSeek	DeepSeek
Meta AI	Meta AI
Qwen	Qwen

Tags 🏷️

Testability

Level 🔋

[x]Intermediate

Credits 🙏

Image by Gerd Altmann on Pixabay

This article is part of the Refactoring Series.

About Author

Maximiliano Contieri (@mcsee)

I’m a senior software engineer specialized in Clean Code, Design, and TDD. Book: "Clean Code Cookbook". 500+ articles written.
