Don't redo the parsing when normalizing the url. #163

bjornkellner · 2025-09-12T11:31:55Z

The parsing isn't redone when calling normalized.
I don't want to make any big changes in the this gem right now. But this is minimal change that will do a significant speedup with what I consider a low risk of introducing changed behavior/bugs.

Copilot

Pull Request Overview

This PR optimizes the URL normalization process by avoiding redundant parsing when creating normalized URLs. The main change replaces a call to parse with direct object creation in the normalized method.

Replaces parse call with direct object instantiation in the normalized method to avoid re-parsing
Extracts public suffix domain parsing logic into a separate private method for reuse
Updates test expectations and RSpec configuration

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
lib/twingly/url.rb	Optimizes normalized method by avoiding re-parsing and extracts public suffix domain logic
spec/lib/twingly/url_spec.rb	Updates test expectation for constructor visibility change
spec/spec_helper.rb	Adds focus filter configuration for RSpec

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

lib/twingly/url.rb

roback

Had one suggestion.

One "disadvantage" with this change, is that when we now call #normalize, we won't rescue potential PublicSuffix::DomainInvalid errors (since we don't call the .parse method inside the #normalize method anymore, which rescues these errors).

However, I'm not sure if it's possible for that error to get raised in this case, considering the non-normalized URL has already passed through the parse method before.

roback · 2025-09-22T10:32:21Z

lib/twingly/url.rb

      normalized_url.path   = normalized_path

-      self.class.parse(normalized_url)
+      public_suffix_domain = self.class.send(:get_public_suffix_domain, normalized_url.host)


Instead of calling this private class method here, then passing the result to the .new on the line below, we could make get_public_suffix_domain a private instance method instead and call it inside the initializer.

We still need to call the private .new method, but I don't see how we could get around that.

diff --git a/lib/twingly/url.rb b/lib/twingly/url.rb index 83a734b..af99fcf 100644 --- a/lib/twingly/url.rb +++ b/lib/twingly/url.rb @@ -72,10 +72,7 @@ module Twingly # URLs that can't be normalized should not be valid try_addressable_normalize(addressable_uri) - host = addressable_uri.host - public_suffix_domain = get_public_suffix_domain(addressable_uri.host) - - new(addressable_uri, public_suffix_domain) + new(addressable_uri) rescue *ERRORS_TO_EXTEND => error error.extend(Twingly::URL::Error) raise @@ -120,13 +117,6 @@ module Twingly label.match?(LETTERS_DIGITS_HYPHEN) end - def get_public_suffix_domain(host) - public_suffix_domain = PublicSuffix.parse(host, list: CUSTOM_PSL, default_rule: nil) - raise Twingly::URL::Error::ParseError if public_suffix_domain.nil? - raise Twingly::URL::Error::ParseError if public_suffix_domain.sld.nil? - public_suffix_domain - end - private :new private :internal_parse private :clean_input @@ -134,12 +124,11 @@ module Twingly private :try_addressable_normalize private :valid_hostname? private :valid_label? - private :get_public_suffix_domain end - def initialize(addressable_uri, public_suffix_domain) + def initialize(addressable_uri) @addressable_uri = addressable_uri - @public_suffix_domain = public_suffix_domain + @public_suffix_domain = get_public_suffix_domain(addressable_uri.host) end def scheme @@ -193,8 +182,7 @@ module Twingly normalized_url.host = normalized_host normalized_url.path = normalized_path - public_suffix_domain = self.class.send(:get_public_suffix_domain, normalized_url.host) - self.class.send(:new, normalized_url, public_suffix_domain) + self.class.send(:new, normalized_url) end def normalized_scheme @@ -262,6 +250,13 @@ module Twingly attr_reader :addressable_uri, :public_suffix_domain + def get_public_suffix_domain(host) + public_suffix_domain = PublicSuffix.parse(host, list: CUSTOM_PSL, default_rule: nil) + raise Twingly::URL::Error::ParseError if public_suffix_domain.nil? + raise Twingly::URL::Error::ParseError if public_suffix_domain.sld.nil? + public_suffix_domain + end + def normalize_blogspot(host, domain) if domain.sld.downcase == "blogspot" host.sub(STARTS_WITH_WWW, "").sub(/#{domain.tld}\z/i, "com")

bjornkellner requested review from Copilot and roback September 12, 2025 11:31

Copilot AI reviewed Sep 12, 2025

View reviewed changes

bjornkellner force-pushed the optimize branch from 4c1f92b to e22e706 Compare September 12, 2025 11:36

Don't redo the parsing on when normalizing the url.

98eedc3

bjornkellner force-pushed the optimize branch from e22e706 to 98eedc3 Compare September 12, 2025 11:38

twingly deleted a comment from Copilot AI Sep 12, 2025

bjornkellner requested a review from Copilot September 12, 2025 11:39

Copilot AI reviewed Sep 12, 2025

View reviewed changes

lib/twingly/url.rb Show resolved Hide resolved

bjornkellner changed the title ~~Don't redo the parsing on when normalizing the url.~~ Don't redo the parsing when normalizing the url. Sep 15, 2025

roback reviewed Sep 22, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Don't redo the parsing when normalizing the url. #163

Don't redo the parsing when normalizing the url. #163

bjornkellner commented Sep 12, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

roback left a comment

Uh oh!

roback Sep 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Don't redo the parsing when normalizing the url. #163

Are you sure you want to change the base?

Don't redo the parsing when normalizing the url. #163

Conversation

bjornkellner commented Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Uh oh!

roback left a comment

Choose a reason for hiding this comment

Uh oh!

roback Sep 22, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

bjornkellner commented Sep 12, 2025 •

edited

Loading