Reading Compound File Binary Format Files generated by PHPExcel with Python/Django xlrd

19 Sep 2013

This was a monster for me to track down, so I’m hoping it helps many others.

The issue I ran into was that when trying to read an Excel file using Python’s xlrd package that was generated by PHPExcel, the following error was generated:

File read error [row 1]: Workbook corruption: seen[2] == 4

In search of a fix, preferably one that didn’t assume PHPExcel was just buggy (which it may be, but that’s another topic), I perused dozens of online articles with the actual developer of xlrd (John Machin) commenting on threads from people with nearly identical issues, but all the solutions were just specific workarounds that didn’t seem to apply to me.

Somewhere, I caught the tiniest glimpse of the phrase “Compound File Binary.” At first it didn’t lead to much, but it ultimately led me here:

OleFileIO_PL – a Python module to read MS OLE2 files

A quick “pip install OleFileIO-PL” and the deployment of this code:

data =
f = StringIO.StringIO(data)
ole = OleFileIO_PL.OleFileIO(f)
    if ole.exists('Workbook'):
         d = ole.openstream('Workbook')
         data =

except Exception as e:
     # try to read it the normal way

And I was on my way.

Note that “Workbook” stream might be specific to PHPExcel, but there is a list_dir() method on the ole object that will show you the streams available.  There’s also decent documentation in the zip download from the above link.


XSL Template for converting XSD to complete XML prototype with inline specifications

13 Apr 2012

For a project relating to Amazon, I found a need for an XSL template that could convert their XSD files–which are both numerous and meaty–into complete XML prototypes.  This would be useful for two reasons.

First, it would generate a complete sample call for basically any API request for which the provide an XSD.  Second, it would give me XML that I could programmatically convert to a self-generating form if necessary.

After digging around the web, to my surprise I was unable to find an XSL already made that accomplishes this, so I set about making my own.

The resulting XSL is included below.

Please note that while this appears (as of this writing) to be 100% functional for what I need on Amazon, it does not cover all possible XSD configurations, so may need to be modified for more exotic XSDs.  Also, it’s possible (and quite likely) that the XML generated will not validate to the given XSD. This is because I had to find ways to pass value restriction meta data in a way that made sense in XML.

The two obvious points are nodes with enumerations in which the valid values are passed as <Value> subnodes and nodes with extendable attributes that had to be constructed in the way of other nodes.  In both cases, a flag is set as an attribute in the parent node.

If you make enhancements or fixes or if you find problems, feel free to let me know, and I will revise this.

The XSL is free for anyone to use, modify, etc. without restraint, but I’d certainly be happy to hear about it if it helps you out.

Here is the code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl=""

<xsl:output method=”xml”/>

<!– Ignore nodes that may have text values as they will output directly otherwise –>
<xsl:template match=”*[not(xs:element) and not(xs:include)]” />

<!– Uncomment to process any included files inline–>
<xsl:template match=”//xs:include”>
<xsl:for-each select=”document(@schemaLocation)/xs:schema/xs:element”>
<xsl:call-template name=”element” />

<!– ######################
####################### –>

<xsl:template match=”/xs:schema/xs:element” name=”element”>


<!– Node is referencing a base element – use that for the prototype either from this file or from any included files–>
<xsl:when test=”@ref”>
<xsl:variable name=”elementRef” select=”current()/@ref” />
<xsl:for-each select=”//xs:element[@name=$elementRef]“><xsl:call-template name=”element”/></xsl:for-each>
<xsl:for-each select=”//xs:include”>
<xsl:for-each select=”document(@schemaLocation)//xs:element[@name=$elementRef]“><xsl:call-template name=”element”/></xsl:for-each>

<!– The node is a standard element – output its XML Prototype –>

<xsl:element name=”{@name}”>

<!– Carry all attributes through other than name (which is used for the element nodeName) and type (which we have to handle cases for) –>
<xsl:copy-of select=”@*[local-name() != 'name' and local-name() != 'type']” />

<!– Bring in any annotations for the element as a “note” attribute –>
<xsl:if test=”./xs:annotation/xs:documentation”>
<xsl:attribute name=”note”><xsl:value-of select=”./xs:annotation/xs:documentation” /></xsl:attribute>


<!– If there is a type attribute, it implies that it’s a complexType node and so we’ll be handling its subNodes independently –>
<xsl:when test=”contains(@type,’:')”>
<xsl:variable name=”elementType” select=”substring-after(@type,’:')” />
<xsl:for-each select=”//xs:complexType[@name=$elementType]“><xsl:call-template name=”complexType”/></xsl:for-each>
<xsl:for-each select=”//xs:simpleType[@name=$elementType]“><xsl:call-template name=”simpleType”/></xsl:for-each>
<xsl:for-each select=”//xs:include”>
<xsl:for-each select=”document(@schemaLocation)//xs:complexType[@name=$elementType]“><xsl:call-template name=”complexType”/></xsl:for-each>
<xsl:for-each select=”document(@schemaLocation)//xs:simpleType[@name=$elementType]“><xsl:call-template name=”simpleType”/></xsl:for-each>
<xsl:when test=”@type”>
<xsl:variable name=”elementType” select=”@type” />
<xsl:for-each select=”//xs:complexType[@name=$elementType]“><xsl:call-template name=”complexType”/></xsl:for-each>
<xsl:for-each select=”//xs:simpleType[@name=$elementType]“><xsl:call-template name=”simpleType”/></xsl:for-each>
<xsl:for-each select=”//xs:include”>
<xsl:for-each select=”document(@schemaLocation)//xs:complexType[@name=$elementType]“><xsl:call-template name=”complexType”/></xsl:for-each>
<xsl:for-each select=”document(@schemaLocation)//xs:simpleType[@name=$elementType]“><xsl:call-template name=”simpleType”/></xsl:for-each>

<!– This is a direct element definition but it still may have inline definitions –>
<xsl:for-each select=”./xs:complexType”><xsl:call-template name=”complexType”/></xsl:for-each>
<xsl:for-each select=”./xs:simpleType”><xsl:call-template name=”simpleType”/></xsl:for-each>
<xsl:for-each select=”./xs:element”><xsl:call-template name=”element”/></xsl:for-each>






<!– ######################
####################### –>

<xsl:template name=”complexType”>

<!– Pass along all attributes –>
<xsl:copy-of select=”@*[local-name() != 'name']” />

<!– If this complexType needs additional attributes, pass a way to use those –>
<xsl:if test=”./xs:simpleContent/xs:extension/xs:attribute”>

<!– Custom node to indicate that this is node has extended attributes –>
<xsl:attribute name=’isExtended’>1</xsl:attribute>

<!– Element is complexType because it’s an extends the attribute set of a simpleType –>
<xsl:if test=”./xs:simpleContent/xs:extension/@base”>
<xsl:when test=”contains(./xs:simpleContent/xs:extension/@base,’:')”>
<xsl:variable name=”baseSimpleType” select=”substring-after(./xs:simpleContent/xs:extension/@base,’:')” />
<xsl:for-each select=”//xs:simpleType[@name=$baseSimpleType]“><xsl:call-template name=”simpleType”/></xsl:for-each>
<xsl:for-each select=”//xs:include”>
<xsl:for-each select=”document(@schemaLocation)//xs:simpleType[@name=$baseSimpleType]“><xsl:call-template name=”simpleType”/></xsl:for-each>
<xsl:variable name=”baseSimpleType” select=”./xs:simpleContent/xs:extension/@base” />
<xsl:for-each select=”//xs:simpleType[@name=$baseSimpleType]“><xsl:call-template name=”simpleType”/></xsl:for-each>
<xsl:for-each select=”//xs:include”>
<xsl:for-each select=”document(@schemaLocation)//xs:simpleType[@name=$baseSimpleType]“><xsl:call-template name=”simpleType”/></xsl:for-each>

<!– Pass along all other attributes (other than @base) –>
<xsl:copy-of select=”./xs:simpleContent/xs:extension/@*[local-name() != 'base']” />

<!– Give a container for the actual value of the node –>
<xsl:element name=”NodeValue” />

<!– Show all attributes as sub nodes –>
<xsl:for-each select=”./xs:simpleContent/xs:extension/xs:attribute”>
<xsl:element name=”{@name}”>
<xsl:variable name=”elementType” select=”current()/@type” />
<xsl:for-each select=”//xs:complexType[@name=$elementType]“><xsl:call-template name=”complexType”/></xsl:for-each>
<xsl:for-each select=”//xs:simpleType[@name=$elementType]“><xsl:call-template name=”simpleType”/></xsl:for-each>
<xsl:for-each select=”//xs:include”>
<xsl:for-each select=”document(@schemaLocation)//xs:complexType[@name=$elementType]“><xsl:call-template name=”complexType”/></xsl:for-each>
<xsl:for-each select=”document(@schemaLocation)//xs:simpleType[@name=$elementType]“><xsl:call-template name=”simpleType” /></xsl:for-each>


<!– Handle any sub nodes –>
<xsl:for-each select=”./xs:sequence/xs:element | ./xs:choice/xs:element”><xsl:call-template name=”element” /></xsl:for-each>


<!– #####################
###################### –>

<xsl:template name=”simpleType”>

<!– If there is a base type, analyze and output its attributes first –>
<xsl:when test=”contains(./xs:restriction/@base,’:')”>
<xsl:variable name=”baseSimpleType” select=”substring-after(./xs:restriction/@base,’:')” />
<xsl:attribute name=’{$baseSimpleType}’>{type}</xsl:attribute>
<xsl:for-each select=”//xs:simpleType[@name=$baseSimpleType]“><xsl:call-template name=”simpleType” /></xsl:for-each>
<xsl:for-each select=”//xs:include”>
<xsl:for-each select=”document(@schemaLocation)//xs:simpleType[@name=$baseSimpleType]“><xsl:call-template name=”simpleType” /></xsl:for-each>
<xsl:when test=”./xs:restriction/@base”>
<xsl:variable name=”baseSimpleType” select=”./xs:restriction/@base” />
<xsl:attribute name=’{$baseSimpleType}’>{type}</xsl:attribute>
<xsl:for-each select=”//xs:simpleType[@name=$baseSimpleType]“><xsl:call-template name=”simpleType” /></xsl:for-each>
<xsl:for-each select=”//xs:include”>
<xsl:for-each select=”document(@schemaLocation)//xs:simpleType[@name=$baseSimpleType]“><xsl:call-template name=”simpleType” /></xsl:for-each>

<!– Pass all type attributes back –>
<xsl:copy-of select=”./@*[local-name() != 'name' and local-name() != 'id']” />

<!– Pass all non-enumeration nodes as attributes (minLength, etc.) –>
<xsl:for-each select=”./xs:restriction/*[local-name() != 'enumeration']“>
<xsl:attribute name=”{local-name()}”><xsl:value-of select=”@value” /></xsl:attribute>

<!– If enumerations exist, pass an attribute indicating so and pass the values as sub nodes (must come last because it outputs XML directly as text) –>
<xsl:for-each select=”./xs:restriction/xs:enumeration”>
<xsl:if test=”position() = 1″><xsl:attribute name=”isList”>1</xsl:attribute></xsl:if>
<xsl:element name=”Value”><xsl:value-of select=”current()/@value” /></xsl:element>




The Solution: Use PHP’s pcntl_fork to limit execution time of MySQL queries

16 Dec 2011

One of the worst parts of a web application can be the variability of mysql queries that get sent into your database.  You can add indices, tweak hardware configurations, etc., but wouldn’t it be nice to simply kill any database query that takes longer than whatever you deem is “too long”?

Well, no, not any query.  I would never want to kill a write — just a read-only query; in particular: search queries.

So, as it turns out, the hurdles for this are immense, and, because my solution uses PHP’s pcntl_fork() function, even my solution, while it works, it has to make assumptions and is not perfect.

That being said, it would seem easy enough for this to be built into PHP or for some similar mechanism to be built into MySQL: Execute this query but only if it takes less than n seconds.  If not, kill it.  This is not the case, however, so we’re left to our own cleverness.

There are hundreds of reasons why you would never want to do this, but I only need one reason to want to do it to try to implement it.

So here is my solution steps in techno-layman’s terms, followed by the necessary code:

  • Call a function to execute a MySQL query (again, preferably read-only)
  • Open a shared memory space so that we can pass the query results back to the parent from the child
  • Store process state information in a database
  • Fork, and execute the query in the child process
  • Keep time in the parent process
  • Kill the child process if it takes longer than n seconds
  • Return the results


Just to forwarn, I have tested this as proof of concept, but I am uncertain about the particulars of PHP’s shared memory and am not confident how reliable the shared memory implementation will in a production environment.  I plan to try it out, but right now I’m just getting the info out there.

So the state database will be defined as follows:

`idx` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`status` enum('0','1') NOT NULL DEFAULT '0',
`output` longblob NOT NULL,

The PHP class is available to download here: (I had it inline, but WordPress did not want to format it properly)

And you would use it something like this

$m = new mysqlRestrictor();
$results = $m->dbQuery("SELECT * FROM `table`");

Since this took me an extremely long time to build and just verify that it even works, I welcome comments for improvement, and by all means use it for yourself. Let me know how it does in production!


A simple question on Sandusky and Penn State’s chopping block

15 Nov 2011

I have seen a few posts generally asking the question, but I want to weigh in on it as well.  The United States is still equipped with a system called due process, but Penn State has already pulled the trigger on many people (most notably Joe Paterno) before any legal judgment has been handed to Sandusky.

Not that I think it will happen, but what if Sandusky is cleared of all charges?  Will that not completely invalidate all the action taken by the University?


A review: Science Story Poster Publishing (

29 Sep 2011

A few weeks ago, I got it in my head that I really want to get a poster of the solar system, complete with as many major moons, dwarf planets, otherwise significant solar system objects, etc. as possible.  Let me tell you, it’s not an easy thing to find.  There are many great posters, but they are often old and outdated, or poor quality, or simply not filled with the level of objects I was looking for.  To top it all off, they cost at least $100, and really don’t have  great durability.  Considering I was looking for posters in the width range of about 8 feet, standard poster paper was not really going to last long on the wall of a 3-year-old.

Yes he’s 3, so maybe the poster was more for me, but as long as it lasts, eventually it will be for him too (again, of course, assuming he ultimately cares about that stuff, and why woudn’t he ?!??! I digress …)

Finally I happened across (note the hyphen).  They had this incredible poster:

and it was only $49.  The only obvious thing I saw missing were Nix and Hydra (Pluto’s 2 smaller moons, and even when I’d ordered its 4th moon had just been discovered).

To top it off, the poster was almost 8′ in length and advertised as basically “untearable.”  Rock!

So, I placed the order.  The total was $62.

Then I got a little concerned.  My invoice was #7, meaning they’d only had 7 orders.  So, the price was very cheap, and no one had really ordered from this company.  I was concerned.

I emailed right away, hoping to get a reply.  Well, I got it.  Scott was very polite, kept me well informed.  Still, there was a delay of nearly a month.  He kept me informed, and it turned out the coating on the poster was having issues.  I, of course, was still very skeptical, but remained positive and kept emailing.

Finally, Scott very courteously sent me this poster:

to appease my impatience, and even offered a refund (after the topographical map poster had already shipped).  I was happy for the free gift, but really wanted the big poster, so I kept waiting.

Finally the big poster arrived, with yet another free gift (The Mona Lisa):

Of course, the highlight for me is still the giant poster, but the service was exceptional, and the product is exactly what I wanted (for a phenomenal price).

If you’re like me, or you’re a teacher looking for great classroom posters, or for any other reason you stumble upon, do not hesitate to order.  In my view, you won’t be disappointed.

I will certainly order from them again …

No Comments

Posted in Reviews


Here’s a big +1 for AquaSource Customer Service

30 Aug 2011

So I am currently redoing my downstairs bathroom (in brief for those that don’t know: completely gutted and replaced with awesomeness).  Cecily and I found what we considered to be the perfect faucet at Lowes:  AquaSource #0476590A with pull-down sprayer.

The problem is, I installed it, and somewhere along the line it started leaking on the top side of the sprayhead.  Fortunately, I kept the documentation, gave the AquaSource customer service department a call, and told them the part I needed (which was clearly listed in the documentation).  They told me there was no charge, and the item would be shipped out this week.

Talk about flooring me.  I’m honest, so being as I didn’t know what caused the problem (me or factory defect), I was expecting to pay for it.  Since I just got off the phone, there may be updates to this post, but assuming the part comes as expected (which I have no reason to doubt at this moment), now would be a good time to read this post’s title.  :)


2011 Washington DC Earthquake and insights on technology

23 Aug 2011

I love how the whole world is abuzz within minutes after the record setting earthquake in Virginia, DC, but it also keeps me entertained keeping an eye on the Wikipedia articles:

Also, dicks.


PHP Random String and POST Form Generator

08 Jul 2011

Just a little snippet of sample code.  Sometimes you just need a random string generator, and on top of that, a random form to test a page.

Maybe I’m just keeping this for my own future reference, but maybe someone else out there could use it too.  :)

The function generates a string containing numbers and letters only (it’s easily customizable to contain other chars).  The form just creates inputs with random names from the string generator and random values from the string generator.

function randomString() {
        $chars = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
        for ($i=0; $i<rand(10,20); $i++) { $string .= $chars[rand(0,strlen($chars))]; }
        return $string;

?><form action="" method="POST"><?php
        for ($i=1; $i<=rand(5,10); $i++) { ?><p /><input name="<?php echo randomString(); ?>" value="<?php echo randomString(); ?>" /><?php }
        ?><p /><input type="submit" /><?php


How to process PayPal Express Checkout for third party merchants

18 May 2011

This is a very simple one, but one that can take a lot of google-search-query-nuance tweaking to find.

It’s easy to find the documentation to do PayPal Express Checkout, but to find that one little field where you send an alternate user as the recipient of the payments, well that’s downright impossible.  It is not in the docs (at least not as of this writing 5-18-2011).

How simple is it? Very — the variable is “SUBJECT”

Yes, you specify an alternate “SUBJECT” of the transaction.

Normally your SetExpressCheckout request looks something like:


Now it will look like this:


Payee_PayPal_Account is the email address/username the user uses to log in.

Hope this helps!  Took us valuable time to find …


Nano global search and replace tabs to spaces or spaces to tabs

17 May 2011

I have been an avid user of nano/pico since about 1999, and yes, many naysayers think it is crap for programming, but it works for me, and I like it.

That being said, one of the major issues I’ve had is that the Xorg select/paste always copies tab characters as the corresponding number of spaces.  So, when I select text in one file, paste into another, I have to replace all the spaces with tabs.

Typically I paste into gedit first, do the replace there, then c/p into the file.  This preserves the tabs.

(If you’re still wondering why I use nano, I just like having my editor accessible as long as I have server access.  I never got used to vi, and nano is more than effective for me.)

I have always thought nano should be able to search and replace tabs and spaces, but I could never get it to work.  Even without the gedit technique, I would typically just replace all double spaces to nothing, then manually insert the tabs.  My workarounds, again, are generally sufficient.  I have not NEEDED search and replace of tabs within nano, but today I decided I wanted to really find out if I could.

And it required some digging …

But, I finally found it: verbatim input!

Nano has a feature to disable character interpretation, and for one character, accept input literally.

To turn it on (again it’s for just the first character typed), hit alt-SHIFT-V (alt-V without shift may trigger x-windows menus), then just hit the tab key (it may or may not show a note that you’re in verbatim input mode).

You only need to do this in the search / replace prompts.  Obviously, you can type tabs directly into the file.

So an example — let’s say you want to convert any instance of 8 spaces to a tab character.

Here is the command stack:

control-W (search)
control-R (replace)
hit space 8 times, then hit enter
alt-shift-V (verbatim input)
hit the tab key, hit enter

Proceed as normal.

Hopefully this post is easier to find than the seemingly impossible digging I just undertook …