>> text = " &nbsp; hello &nbsp; "
=> " &nbsp; hello &nbsp; "
>> text.strip
=> "&nbsp; hello &nbsp;"
>> require 'htmlentities'
=> []
>> HTMLEntities.new.decode(text)
=> "   hello   "
>> HTMLEntities.new.decode(text).strip
=> "  hello  "
>> text.gsub(/^(\s|&nbsp;)+|(\s|&nbsp;)+$/,'') # *groan*
=> "hello"

Upon further inspection, one can see that &nbsp; is decoded as 0xC2 0xA0:
>> HTMLEntities.new.decode(text)[0]
=> 32
>> HTMLEntities.new.decode(text)[1]
=> 194
>> HTMLEntities.new.decode(text)[2]
=> 160
>> HTMLEntities.new.decode(text)[3]
=> 32
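Indexing a string with an integer returns a byte value only on Ruby 1.8; on 1.9 and later it returns a one-character string. A rough equivalent of the inspection above for newer Rubies might look like the sketch below. The `decoded` literal is an assumption standing in for the result of `HTMLEntities.new.decode(text)`, so that the snippet runs without the gem.

```ruby
# Assumed stand-in for HTMLEntities.new.decode(text):
# a plain space, a non-breaking space (U+00A0), a plain space, then the word.
decoded = " \u00A0 hello \u00A0 "

# Raw UTF-8 bytes: U+00A0 is encoded as the two bytes 0xC2 0xA0 (194, 160).
first_bytes = decoded.bytes.to_a.first(4)

# Code points of the first three characters, for comparison.
first_codepoints = decoded.chars.to_a.first(3).map { |c| c.unpack("U").first }

p first_bytes
p first_codepoints
```

The byte view shows the 194/160 pair, while the character view shows a single code point 160 between the two plain spaces.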
It's understandable why you'd want to keep those in the decoding process so that you can re-encode the text later, but it would have been nice if there had been an option to force non-breaking spaces to be converted to plain whitespace.
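In the absence of such an option, one workaround is to clean up the string after decoding. The sketch below assumes Ruby 1.9 or later and UTF-8 strings. The helper names `nbsp_to_space` and `strip_with_nbsp` are made up for illustration and are not part of htmlentities, and the `decoded` literal again stands in for the decoder's output.

```ruby
NBSP = "\u00A0" # non-breaking space, 0xC2 0xA0 in UTF-8

# Hypothetical helper: turn every non-breaking space into a plain space.
def nbsp_to_space(str)
  str.tr(NBSP, " ")
end

# Hypothetical helper: trim both ends, treating U+00A0 as whitespace.
# [[:space:]] matches Unicode whitespace, whereas \s and String#strip do not
# cover U+00A0. Non-breaking spaces inside the string are left alone.
def strip_with_nbsp(str)
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
end

# Assumed stand-in for HTMLEntities.new.decode(text).
decoded = " \u00A0 hello \u00A0 "

p strip_with_nbsp(decoded)
p nbsp_to_space(decoded).strip
```

Both calls should leave just the word here. The first keeps any non-breaking spaces in the middle of the text, which matters if you plan to re-encode it. The second flattens all of them to plain spaces.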
Thank you!!! It took me several hours to find your entry that solved my problem, but thank you!! :)
Posted by: Gromadusi | September 15, 2009 at 07:08 AM