Swivel wants to enable people to upload Excel spreadsheets, view them, save them, and later on, visualize them. Here's how we're doing it.
Excel 2007 (Excel 2008 Mac) is saved as a zipped set of XML files. These XML files include: sharedStrings.xml, styles.xml, worksheets/sheet1.xml (depending on the number of sheets in the workbook), and workbook.xml.
First off, you need to unzip the file. I used Zip::ZipFile library to do this.
Remember to install the rubyzip gem: sudo gem install rubyzip. This is how it works:
require 'zip/zipfilesystem'Zip::ZipFile.open('myspreadsheet.xlsx') do |zipFile|
puts zipfile.read('xl/sheet1.xml')
end
Each one of these XML files contain specific information about the workbook.
workbook.xmltells you how many sheets there are in the spreadsheet.styles.xmlcontains all the style that is applied to the sheets.- All the strings are saved in the
sharedStrings.xml. - Finally, the actual data for each sheet is stored in
sheet1.xml,sheet2.xml, etc.
I used Hpricot to parse these xml files.
You start by reading (parsing) the workbook.xml file to figure out how many sheets you have to parse. Here is how:
workbook = Hpricot::XML zipFile.file.read('xl/workbook.xml')
(workbook / 'sheet').each_with_index do |name, i|
doc = Hpricot::XML zipFile.file.read("xl/worksheets/sheet#{i+1}.xml")
end
So now I can parse each sheet and print out the data. This is how sheet1.xml looks (this is just a small portion of xml which contains the cell data):
<sheetData>
<row r="1" spans="1:3">
<c r="A1" t="s">
<v>0</v>
</c>
<c r="B1" t="s">
<v>1</v>
</c>
<c r="C1" t="s">
<v>2</v>
</c>
</row>
<row r="2" spans="1:3">
<c r="A2">
<v>34</v>
</c>
<c r="B2" s="1">
<v>34</v>
</c>
<c r="C2">
<v>56</v>
</c>
</row>
<row r="3" spans="1:3">
<c r="A3">
<v>45</v>
</c>
<c r="B3">
<v>46</v>
</c>
<c r="C3" s="2">
<v>65</v>
</c>
</row>
</sheetData>
And here is how to parse it:
(doc / 'worksheet').each do |worksheet|
(worksheet / 'row').each do |rows|
cols = (row / 'c')
cols.each do |col|
cell_index = col.attributes['r'] # you will need the cell index later to figure out rowspan, colspan
puts col.at('v').inner_html
end
end
end
Strings are stored in a different file called sharedStrings.xml. Each column will have a "t" attribute if the cell contains a string. So it would be something like this: <c r="A1" t="s">. The value for this column (the v tag) would be the index to the "si" tag in sharedString.xml. When you read the t value from the sharedStrings remember to do an inner_html on it since it is surrounded by t tags and you just want the value.
Now let's say your spreadsheet has styles like font family, font size, font decoration, cell height, etc. All these are stored in styles.xml. Parsing styles.xml is not as easy as you might think. If you take a look at sheet1.xml each row (if it has a style) has an "s" attribute with a number. This number refers to the index of "xf" tags within "cellXfs". You will find all the information you need for a cell style in xf tag attributes.
For example in the xf tag, fontId="2" means you have to look up font tags and the index of the font you need is 2 (the index starts from 0).
The font tag itself can have 3 children: sz, color, and name. The value in the sz tag is the font size in pixels. The indexed attribute in the color tag is an index into a color table which you can find in this document. The value in the name tag is the font family. So to get the font attributes out you need to something like this:
styles = Hpricot::XML zipFile.file.read('xl/styles.xml')
# you can store everything in a hash table
fonts = (styles / 'font').map do |font|
color = font.at('color')
{
:fontFamily => font.at('name').attributes['val'],
:fontSize => font.at('sz').attributes['val'] + 'px',
:fontWeight => font.at('b') != nil ? 'bold' : 'normal',
:fontStyle => font.at('i') != nil ? 'italic' : 'normal',
:textDecoration => font.at('u') != nil ? 'underline' : 'none',
:color => color.attributes['indexed'].to_i
}
end
Another example is border styles. Borders are stored in the borders tag and have 5 children. They are: left, top, bottom, diagonal. Each one of them can be empty depending on the border style. Again, the index of the border style tag is given in xf tag in borderId and remember that the index always start from 0.
It might be very confusing at first to figure out how to decode the style.xml and how to map each style to the right cell, but once you figure it, and it will be straightforward and you can pretty much apply it to any style. If you want to do more with style, you can read the Office Open XML SpreadsheetML specification PDF document. It has a comprehensive explanation of every possible style that you might find in a spreadsheet.
Students that were looking for the issue close to write my essay at the writing service should read your best idea just about this post. In other case, it is available to utilize the support of the essay writing service.
Posted by: KiLily | February 01, 2010 at 12:10 PM