HTTP URL Validation Improved

I found the HTTP URL Validator for Rails interesting and well coded, but it lacked a few things, such as URL format restrictions. I added them and came up with a sweet solution.
It checks the format of the given URL, the content type, and whether the page was permanently moved. I may add to this in the future.
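
For illustration, the format part of that check might look something like the sketch below. This is not the plugin's code: well_formed_http_url? is a hypothetical helper name, and it assumes that "valid format" means an absolute http or https URL with a host.

```ruby
require 'uri'

# Hypothetical helper (not part of the plugin): true only for absolute
# http/https URLs that include a host.
def well_formed_http_url?(value)
  uri = URI.parse(value.to_s)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  # Unparseable input (for example, a string containing spaces) is simply invalid.
  false
end
```

Rejecting bad formats before any network request is made lets the validator report a format error separately from an unreachable site.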


#Check for content type:
  validates_http_url :url, :content_type => "text/html"

#Do not check for content type, just make sure the site is accessible:
  validates_http_url :website

#Make sure there is a DNS entry for a domain
  validates_http_domain :domain
# Domain must be in 'www.site.com' or 'site.com' form.
# No http://, no path.
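
As a rough sketch of what the :content_type option implies, the comparison can be a substring match against the response's Content-Type header, which is what the snippet quoted in the comments appears to do with index. The helper name content_type_matches? and the case-insensitive comparison are assumptions made for this example, not the plugin's actual code.

```ruby
# Hypothetical helper (not part of the plugin): does the Content-Type header
# value contain the expected media type?
def content_type_matches?(header_value, expected)
  return true if expected.nil?        # no :content_type option given, nothing to check
  return false if header_value.nil?   # the server sent no Content-Type header
  # Substring match, so "text/html; charset=UTF-8" still counts as "text/html".
  header_value.downcase.include?(expected.downcase)
end
```

A substring match is forgiving about charset parameters, which is usually what you want when you only care about the media type.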

Update (6/26/06)

Added the validates_http_domain method:


require 'socket'

def validates_http_domain(*attr_names)
  validates_each(attr_names) do |record, attr_name, value|
    # Clear the failed flag on the first successful lookup
    # (all we need is one, one is all we need)
    failed = true
    possibilities = [value, "www." + value]
    possibilities.each do |url|
      begin
        Socket.gethostbyname(url)
      rescue SocketError
        # This candidate does not resolve; try the next one
        next
      end
      failed = false
      break
    end
    record.errors.add(attr_name, "cannot be resolved.") if failed
  end
end

Now I can just use


  validates_http_domain :website

in my model, and everything comes up roses. ;)
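
If you want to try the lookup logic outside of Rails, here is a minimal standalone sketch of the same idea. The method name resolvable_domain? and the resolver parameter are assumptions added for this example; the parameter exists only so the lookup can be stubbed in tests. It uses Addrinfo.getaddrinfo rather than Socket.gethostbyname, since newer Ruby versions deprecate the latter.

```ruby
require 'socket'

# Hypothetical helper (not part of the plugin): true if the domain, or the
# domain with "www." prepended, resolves. The default resolver performs a
# real lookup; pass a lambda to stub it out.
def resolvable_domain?(domain, resolver = ->(host) { Addrinfo.getaddrinfo(host, nil) })
  [domain, "www.#{domain}"].any? do |host|
    begin
      resolver.call(host)
      true   # one successful lookup is enough
    rescue SocketError
      false  # this candidate failed; any? moves on to the next one
    end
  end
end
```

For example, resolvable_domain?("example.com") performs real lookups, while passing a lambda as the second argument avoids touching the network.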

Update (9/30/06)

Through an exchange of emails and the comments below, it was brought to my attention that people need a simple way to modify the plugin to accept different response codes depending on their needs, or at least that I need a simple way to change the default accepted codes. So I added an array to the library called allowed_codes:


  allowed_codes = [
    Net::HTTPMovedPermanently,
    Net::HTTPOK,
    Net::HTTPCreated,
    Net::HTTPAccepted,
    Net::HTTPNonAuthoritativeInformation,
    Net::HTTPPartialContent,
    Net::HTTPFound,
    Net::HTTPTemporaryRedirect,
    Net::HTTPSeeOther
  ]

Soon I’ll make it possible to add your own custom codes from the model. This is what I envision:


  validates_http_url :website, :extra_codes => [ Net::HTTPResetContent, Net::HTTPPartialContent ]

I’ll post here when this is reality, and probably make another blog post as well so the aggregators get it.
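
Until then, here is a minimal sketch of how such a check could work. It is only an illustration of the idea: DEFAULT_ALLOWED_CODES and acceptable_response? are hypothetical names, and the plugin may end up doing this differently.

```ruby
require 'net/http'

# Same classes as the allowed_codes array above, under a hypothetical constant name.
DEFAULT_ALLOWED_CODES = [
  Net::HTTPMovedPermanently,
  Net::HTTPOK,
  Net::HTTPCreated,
  Net::HTTPAccepted,
  Net::HTTPNonAuthoritativeInformation,
  Net::HTTPPartialContent,
  Net::HTTPFound,
  Net::HTTPTemporaryRedirect,
  Net::HTTPSeeOther
].freeze

# Hypothetical helper: a response is acceptable if it is an instance of any
# default class or of any class passed in extra_codes.
def acceptable_response?(response, extra_codes = [])
  (DEFAULT_ALLOWED_CODES + extra_codes).any? { |klass| response.is_a?(klass) }
end
```

Matching on response classes rather than numeric codes keeps the list readable, and extra codes become a simple array concatenation.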


You can download it with svn:


svn co https://modzer0.cs.uaf.edu/repos/hank/code/http_url_validation_improved

Or use it as a plugin:


./script/plugin install -x https://modzer0.cs.uaf.edu/repos/hank/code/http_url_validation_improved

The above command only works if your entire Rails project is in Subversion. If it is not, which I don’t recommend, either add it to a repository or remove the *-x* from the command. Of course, without *-x* you lose the ability to update to the new code when I make a change.



Comments

  1. Apie

    June 16, 2006 at 4:40 PM

    Is the [:content_uype] perhaps a typo?

    unless configuration[:content_uype].nil?
      record.errors.add(attr_name, configuration[:message_wrong_content]) if response['content-type'].index(configuration[:content_type]).nil?
    end


  2. Hank

    June 16, 2006 at 4:40 PM

    Ahhh. Thank you! Fixing it now.


  3. Apie

    June 16, 2006 at 4:40 PM

    Thank you for the nice code. I used it to validate a number of URLs. I just wanted to know whether the domain is real. I decided that response codes > 400 except
    good_codes = [401,402,403,405,406,407,408] are okay. Any thoughts on this?


  4. Hank

    June 16, 2006 at 4:40 PM

    Perhaps I should make a new method, validates_http_domain, so you can just check the DNS entry for the domain. Not a bad idea. I’ll revise the code and add this method. Thanks for the input!


  5. Apie

    June 16, 2006 at 4:40 PM

    That would be splendid! I’m going to add this post to my ebtags and will check back in a while. Looking forward to integrating the code. My code is a bit of a hack at this stage, and I would feel much better knowing it’s properly developed :)
    Will this method return true when, say, yahoo.com redirects you to http://www.yahoo.com? That would be great, since from a human perspective yahoo.com is as valid as http://www.yahoo.com.
    Many thanks!


  6. Hank

    June 16, 2006 at 4:40 PM

    OK – it’s all set – test it out – I know I am. ;)


  7. chemp

    June 16, 2006 at 4:40 PM

    Interesting point!


  8. Caleb

    June 16, 2006 at 4:40 PM

    Hmm, this script seems to be having trouble with https://google.com.

    Any ideas?


  9. Caleb

    June 16, 2006 at 4:40 PM

    Not able to find a way to contact you directly, I thought I’d just post a modified block of your code that allows the validator to validate both http AND https urls:

    ======================

    url = URI.parse(value)
    url.path = "/" if url.path.length < 1
    http = Net::HTTP.new(url.host, (url.scheme == 'https') ? 443 : 80)
    if url.scheme == 'https'
      http.use_ssl = true
      http.verify_mode = OpenSSL::SSL::VERIFY_NONE
    end
    response, body = http.get(url.path)

    ===========================


  10. Hank

    June 16, 2006 at 4:40 PM

    Thanks, Caleb. I’ll merge it.


  11. sid137@gmail.com

    June 16, 2006 at 4:40 PM

    Hi,

    When I try to use this to validate the following URL

    http://www.youtube.com/watch?v=vFP-MktgOKU&eurl=

    It tells me that the page is inaccessible.

    I use a simple

    validates_http_url :url, :on => :create

    for my model, and am using svn revision 136, I think.

    Any ideas?

    Thanks


  12. Hank

    June 16, 2006 at 4:40 PM

    Here’s what I get (rev 138):

    Website is not accessible Net::HTTPSeeOther

    This is because a 303 response doesn’t count as valid in the current state of the plugin. I’ll make it valid, since it obviously is. **svn up** at your leisure. In the future, you can just edit the **allowed_codes** array in lib/http_url_validation_improved.rb and submit a patch if you’d like:


    allowed_codes = [
      Net::HTTPMovedPermanently,
      Net::HTTPOK,
      Net::HTTPCreated,
      Net::HTTPAccepted,
      Net::HTTPNonAuthoritativeInformation,
      Net::HTTPPartialContent,
      Net::HTTPFound,
      Net::HTTPTemporaryRedirect,
      Net::HTTPSeeOther,
    ]

    Thanks for bringing this up. What a weird response.


  13. Walter McGinnis

    June 16, 2006 at 4:40 PM

    Nice. You fixed the problems I was having with the validates_http_url plugin. The only thing I still want is better handling of the URL’s format. Your code just kicks it to the rescue, but doesn’t give the user back anything meaningful in a validation message.

    For now I’m just going to validate the format separately with validates_format_of, but it would be a nice future enhancement.

    Cheers,
    Walter


  14. Walter McGinnis

    June 16, 2006 at 4:40 PM

    Here’s a patch that finishes the work you started for formatting, starting at line 76:

    # if response is nil, then it’s a format issue
    if response.nil?
      record.errors.add(attr_name, configuration[:message_url_format])
    else
      # Just plain non-accessible
      record.errors.add(attr_name, configuration[:message_not_accessible]+" "+response.class.to_s)
    end


  15. riki

    June 16, 2006 at 4:40 PM

    Looks great but I’m getting a password dialog when trying to download the plugin.