Skip to content

Conversation

@bjornkellner
Copy link

@bjornkellner bjornkellner commented Sep 12, 2025

The parsing isn't redone when calling normalized.
I don't want to make any big changes in the this gem right now. But this is minimal change that will do a significant speedup with what I consider a low risk of introducing changed behavior/bugs.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the URL normalization process by avoiding redundant parsing when creating normalized URLs. The main change replaces a call to parse with direct object creation in the normalized method.

  • Replaces parse call with direct object instantiation in the normalized method to avoid re-parsing
  • Extracts public suffix domain parsing logic into a separate private method for reuse
  • Updates test expectations and RSpec configuration

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
lib/twingly/url.rb Optimizes normalized method by avoiding re-parsing and extracts public suffix domain logic
spec/lib/twingly/url_spec.rb Updates test expectation for constructor visibility change
spec/spec_helper.rb Adds focus filter configuration for RSpec

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@twingly twingly deleted a comment from Copilot AI Sep 12, 2025
@twingly twingly deleted a comment from Copilot AI Sep 12, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@bjornkellner bjornkellner changed the title Don't redo the parsing on when normalizing the url. Don't redo the parsing when normalizing the url. Sep 15, 2025
Copy link
Member

@roback roback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had one suggestion.

One "disadvantage" with this change, is that when we now call #normalize, we won't rescue potential PublicSuffix::DomainInvalid errors (since we don't call the .parse method inside the #normalize method anymore, which rescues these errors).

However, I'm not sure if it's possible for that error to get raised in this case, considering the non-normalized URL has already passed through the parse method before.

normalized_url.path = normalized_path

self.class.parse(normalized_url)
public_suffix_domain = self.class.send(:get_public_suffix_domain, normalized_url.host)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of calling this private class method here, then passing the result to the .new on the line below, we could make get_public_suffix_domain a private instance method instead and call it inside the initializer.

We still need to call the private .new method, but I don't see how we could get around that.

diff --git a/lib/twingly/url.rb b/lib/twingly/url.rb
index 83a734b..af99fcf 100644
--- a/lib/twingly/url.rb
+++ b/lib/twingly/url.rb
@@ -72,10 +72,7 @@ module Twingly
         # URLs that can't be normalized should not be valid
         try_addressable_normalize(addressable_uri)
 
-        host = addressable_uri.host
-        public_suffix_domain = get_public_suffix_domain(addressable_uri.host)
-
-        new(addressable_uri, public_suffix_domain)
+        new(addressable_uri)
       rescue *ERRORS_TO_EXTEND => error
         error.extend(Twingly::URL::Error)
         raise
@@ -120,13 +117,6 @@ module Twingly
         label.match?(LETTERS_DIGITS_HYPHEN)
       end
 
-      def get_public_suffix_domain(host)
-        public_suffix_domain = PublicSuffix.parse(host, list: CUSTOM_PSL, default_rule: nil)
-        raise Twingly::URL::Error::ParseError if public_suffix_domain.nil?
-        raise Twingly::URL::Error::ParseError if public_suffix_domain.sld.nil?
-        public_suffix_domain
-      end
-
       private :new
       private :internal_parse
       private :clean_input
@@ -134,12 +124,11 @@ module Twingly
       private :try_addressable_normalize
       private :valid_hostname?
       private :valid_label?
-      private :get_public_suffix_domain
     end
 
-    def initialize(addressable_uri, public_suffix_domain)
+    def initialize(addressable_uri)
       @addressable_uri      = addressable_uri
-      @public_suffix_domain = public_suffix_domain
+      @public_suffix_domain = get_public_suffix_domain(addressable_uri.host)
     end
 
     def scheme
@@ -193,8 +182,7 @@ module Twingly
       normalized_url.host   = normalized_host
       normalized_url.path   = normalized_path
 
-      public_suffix_domain = self.class.send(:get_public_suffix_domain, normalized_url.host)
-      self.class.send(:new, normalized_url, public_suffix_domain)
+      self.class.send(:new, normalized_url)
     end
 
     def normalized_scheme
@@ -262,6 +250,13 @@ module Twingly
 
     attr_reader :addressable_uri, :public_suffix_domain
 
+    def get_public_suffix_domain(host)
+      public_suffix_domain = PublicSuffix.parse(host, list: CUSTOM_PSL, default_rule: nil)
+      raise Twingly::URL::Error::ParseError if public_suffix_domain.nil?
+      raise Twingly::URL::Error::ParseError if public_suffix_domain.sld.nil?
+      public_suffix_domain
+    end
+
     def normalize_blogspot(host, domain)
       if domain.sld.downcase == "blogspot"
         host.sub(STARTS_WITH_WWW, "").sub(/#{domain.tld}\z/i, "com")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants