This was a monster for me to track down, so I’m hoping it helps many others.
The issue I ran into was that when trying to read an Excel file using Python’s xlrd package that was generated by PHPExcel, the following error was generated:
File read error [row 1]: Workbook corruption: seen == 4
In search of a fix, preferably one that didn’t assume PHPExcel was just buggy (which it may be, but that’s another topic), I perused dozens of online articles with the actual developer of xlrd (John Machin) commenting on threads from people with nearly identical issues, but all the solutions were just specific workarounds that didn’t seem to apply to me.
Somewhere, I caught the tiniest glimpse of the phrase “Compound File Binary.” At first it didn’t lead to much, but it ultimately led me here:
A quick “pip install OleFileIO-PL” and the deployment of this code:
data = file.read()
f = StringIO.StringIO(data)
ole = OleFileIO_PL.OleFileIO(f)
except Exception as e:
And I was on my way.
Note that “Workbook” stream might be specific to PHPExcel, but there is a list_dir() method on the ole object that will show you the streams available. There’s also decent documentation in the zip download from the above link.