Our local newspaper, the San Francisco Chronicle, is occasionally a source for data stories posted to Swivel, and that's how we discovered that there are a couple of edge cases that Rails's auto_link doesn't handle perfectly.
A typical Chronicle link looks something like this: <a href="http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2007/08/24/WIJKRIC6E.DTL">. Nothing too peculiar there. However, when passed through the auto_link function, the / character at the beginning of the value of the query string variable causes the entire value to be dropped, leaving us with lots of links to <a href="http://www.sfgate.com/cgi-bin/article.cgi?f=">.
This bug has been noted by Satya among others, and the monkey-patch solution is fairly straightforward. The first step is to write a functional test:
  test("URL querystring variables can begin with a slash") do
    url = 'www.example.com?url=/rails/cannot/deal/with/this/'
    post(:create, :resource => { :name => "test", :link => url })
    get(:show, :id => assigns(:resource).id)
    assert_select("a[href=http://#{url}]")
  end
Our test fails. The URL is truncated at the =.
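To see why the link stops at the =, it helps to look at the query-string part of the pattern in isolation. The following is a minimal sketch in plain Ruby, not the Rails code itself: both constant names are hypothetical, and the "without slash" character class is an assumption, reconstructed by removing the / from the patched class shown later in this post.

```ruby
# Hypothetical, simplified fragments: only the query-string part of the pattern.
# ASSUMPTION: the unpatched class is the patched one minus the "/" character.
QUERY_WITHOUT_SLASH = /\?[\w\+@%&=.;:-]+/
QUERY_WITH_SLASH    = /\?[\w\+@%&=.;:\/-]+/

url = 'www.example.com?url=/rails/cannot/deal/with/this/'

# String#[] with a regex returns the first matching substring.
before = url[QUERY_WITHOUT_SLASH]
after  = url[QUERY_WITH_SLASH]

puts before  # => ?url=
puts after   # => ?url=/rails/cannot/deal/with/this/
```

Without / in the class, the match ends at the first character it cannot consume, which is the slash right after the =.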
ActionView's TextHelper includes a regular expression constant, AUTO_LINK_RE, which tells Rails how to recognize URLs. We prefer not to tinker with the vendor code itself, to avoid upgrade gotchas or other weirdness, so I copied the relevant regex to config/initializers/core_ext.rb, made the necessary (one-character!) tweak, and overrode the private TextHelper constant with the new regex. The changed line is the one commented "query string", whose character class now includes /:
  ActionView::Helpers::TextHelper::AUTO_LINK_RE = %r{
    (                          # leading text
      <\w+.*?>|                # leading HTML tag, or
      [^=!:'"/]|               # leading punctuation, or
      ^                        # beginning of line
    )
    (
      (?:https?://)|           # protocol spec, or
      (?:www\.)                # www.*
    )
    (
      [-\w]+                   # subdomain or domain
      (?:\.[-\w]+)*            # remaining subdomains or domain
      (?::\d+)?                # port
      (?:/(?:[~\w\+@%=\(\)-]|(?:[,.;:'][^\s$]))*)* # path
      (?:\?[\w\+@%&=.;:/-]+)?  # query string
      (?:\#[\w\-]*)?           # trailing anchor
    )
    ([[:punct:]]|<|$|)         # trailing text
  }x
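One caveat with assigning over a constant that already exists: Ruby prints an "already initialized constant" warning when it happens. If that noise is unwelcome, one possible approach is to remove the old constant before defining the new one. This is only a sketch of the idea; VendorHelper and LINK_RE are hypothetical stand-ins, not names from Rails.

```ruby
# Hypothetical stand-in for a module that ships with a framework.
module VendorHelper
  LINK_RE = /old/
end

# Module#remove_const is private, so it is invoked through send.
VendorHelper.send(:remove_const, :LINK_RE)

# Define the replacement; no warning, because the old constant is gone.
VendorHelper.const_set(:LINK_RE, /new/)

puts VendorHelper::LINK_RE.source  # => new
```

Code that captured the old value before the swap would keep using it, so the replacement should run as early as possible, which is what an initializer is for.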
Our test passes, and all is well.
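The patched pattern can also be checked outside Rails. The sketch below copies the regex from this post under a local name and applies it with a small helper. PATCHED_RE and link_urls are hypothetical names, and link_urls is a simplified stand-in for auto_link (no HTML escaping, no options, no check for text that is already linked), not the Rails implementation.

```ruby
# Local copy of the patched pattern from this post (hypothetical constant name).
PATCHED_RE = %r{
  (                          # leading text
    <\w+.*?>|                # leading HTML tag, or
    [^=!:'"/]|               # leading punctuation, or
    ^                        # beginning of line
  )
  (
    (?:https?://)|           # protocol spec, or
    (?:www\.)                # www.*
  )
  (
    [-\w]+                   # subdomain or domain
    (?:\.[-\w]+)*            # remaining subdomains or domain
    (?::\d+)?                # port
    (?:/(?:[~\w\+@%=\(\)-]|(?:[,.;:'][^\s$]))*)* # path
    (?:\?[\w\+@%&=.;:/-]+)?  # query string
    (?:\#[\w\-]*)?           # trailing anchor
  )
  ([[:punct:]]|<|$|)         # trailing text
}x

# Simplified stand-in for auto_link: wraps each matched URL in an anchor tag.
def link_urls(text)
  text.gsub(PATCHED_RE) do
    lead, scheme, rest, trail = $1, $2, $3, $4
    # Bare "www." links get an explicit http:// scheme in the href.
    href = (scheme == 'www.' ? 'http://www.' : scheme) + rest
    %(#{lead}<a href="#{href}">#{scheme}#{rest}</a>#{trail})
  end
end

html = link_urls('See www.example.com?url=/rails/cannot/deal/with/this/ for details')
puts html
```

The href in the output should carry the full query string, including the value that starts with a slash.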