-
Notifications
You must be signed in to change notification settings - Fork 2
Don't redo the parsing when normalizing the url. #163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes the URL normalization process by avoiding redundant parsing when creating normalized URLs. The main change replaces a call to parse with direct object creation in the normalized method.
- Replaces
parsecall with direct object instantiation in thenormalizedmethod to avoid re-parsing - Extracts public suffix domain parsing logic into a separate private method for reuse
- Updates test expectations and RSpec configuration
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lib/twingly/url.rb | Optimizes normalized method by avoiding re-parsing and extracts public suffix domain logic |
| spec/lib/twingly/url_spec.rb | Updates test expectation for constructor visibility change |
| spec/spec_helper.rb | Adds focus filter configuration for RSpec |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
4c1f92b to
e22e706
Compare
e22e706 to
98eedc3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
roback
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had one suggestion.
One "disadvantage" with this change, is that when we now call #normalize, we won't rescue potential PublicSuffix::DomainInvalid errors (since we don't call the .parse method inside the #normalize method anymore, which rescues these errors).
However, I'm not sure if it's possible for that error to get raised in this case, considering the non-normalized URL has already passed through the parse method before.
| normalized_url.path = normalized_path | ||
|
|
||
| self.class.parse(normalized_url) | ||
| public_suffix_domain = self.class.send(:get_public_suffix_domain, normalized_url.host) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of calling this private class method here, then passing the result to the .new on the line below, we could make get_public_suffix_domain a private instance method instead and call it inside the initializer.
We still need to call the private .new method, but I don't see how we could get around that.
diff --git a/lib/twingly/url.rb b/lib/twingly/url.rb
index 83a734b..af99fcf 100644
--- a/lib/twingly/url.rb
+++ b/lib/twingly/url.rb
@@ -72,10 +72,7 @@ module Twingly
# URLs that can't be normalized should not be valid
try_addressable_normalize(addressable_uri)
- host = addressable_uri.host
- public_suffix_domain = get_public_suffix_domain(addressable_uri.host)
-
- new(addressable_uri, public_suffix_domain)
+ new(addressable_uri)
rescue *ERRORS_TO_EXTEND => error
error.extend(Twingly::URL::Error)
raise
@@ -120,13 +117,6 @@ module Twingly
label.match?(LETTERS_DIGITS_HYPHEN)
end
- def get_public_suffix_domain(host)
- public_suffix_domain = PublicSuffix.parse(host, list: CUSTOM_PSL, default_rule: nil)
- raise Twingly::URL::Error::ParseError if public_suffix_domain.nil?
- raise Twingly::URL::Error::ParseError if public_suffix_domain.sld.nil?
- public_suffix_domain
- end
-
private :new
private :internal_parse
private :clean_input
@@ -134,12 +124,11 @@ module Twingly
private :try_addressable_normalize
private :valid_hostname?
private :valid_label?
- private :get_public_suffix_domain
end
- def initialize(addressable_uri, public_suffix_domain)
+ def initialize(addressable_uri)
@addressable_uri = addressable_uri
- @public_suffix_domain = public_suffix_domain
+ @public_suffix_domain = get_public_suffix_domain(addressable_uri.host)
end
def scheme
@@ -193,8 +182,7 @@ module Twingly
normalized_url.host = normalized_host
normalized_url.path = normalized_path
- public_suffix_domain = self.class.send(:get_public_suffix_domain, normalized_url.host)
- self.class.send(:new, normalized_url, public_suffix_domain)
+ self.class.send(:new, normalized_url)
end
def normalized_scheme
@@ -262,6 +250,13 @@ module Twingly
attr_reader :addressable_uri, :public_suffix_domain
+ def get_public_suffix_domain(host)
+ public_suffix_domain = PublicSuffix.parse(host, list: CUSTOM_PSL, default_rule: nil)
+ raise Twingly::URL::Error::ParseError if public_suffix_domain.nil?
+ raise Twingly::URL::Error::ParseError if public_suffix_domain.sld.nil?
+ public_suffix_domain
+ end
+
def normalize_blogspot(host, domain)
if domain.sld.downcase == "blogspot"
host.sub(STARTS_WITH_WWW, "").sub(/#{domain.tld}\z/i, "com")
The parsing isn't redone when calling normalized.
I don't want to make any big changes in the this gem right now. But this is minimal change that will do a significant speedup with what I consider a low risk of introducing changed behavior/bugs.