mandag den 5. maj 2008

SDE 9.2's ST_GEOMETRY

http://www.palladiumconsulting.com/blog/sebastian/2008/05/sde-92s-stgeometry-part-one-perhaps.html

Last time, I played a little with SDE 9.2's new ST_GEOMETRY support and was glad I waited for SP5 (!). More experimentation revealed that writing spatial SQL was easy. I've waited a long time to use ESRI-sanctioned methods for asking things like
select *
from zip zi
join county co on (st_intersects(zi.shape, co.shape)=1)
where co.population > 1000000;
.
And it does work. The spatial indexing gets used intelligently, and you can throw (variants of the above) into ArcMap and render spatial queries on the fly. I even started experimenting with doing on-the-fly geoprocessing with things like shape intersections.
I soon noticed that doing anything with more than a few hundred shapes was slow. Of course at first I blamed my queries, the Oracle optimizer, the spatial indexes, anything. But this ain't my first rodeo. That stuff was all fine. No, I was noticing that no matter what I did, ST_GEOMETRY couldn't deal with more than about 1000 shapes per second. Fancy spatial indexes don't do much good when the final geometry-to-geometry filter maxes out on so few shapes.
Why oh why?
Part of the problem seems to be because all ST_GEOMETRY functionality is implemented with an external C DLL, st_shapelib. This is how Oracle wants you to do it. Every call to an ST_GEOMETRY function is made out of process from the Oracle process serving your connection (oracle.exe on Windows) to a spawned executable (extproc.exe on Windows) via pipes or sockets. When intercepting a CREATE TABLE command and jumping in to create an index, these two process switches are no big deal. When determining whether one polygon overlaps another during a join operation over tens of thousands of individual polygons, it is murder. Quantifying the cost of a context switch is tricky, but at the very least it is thousands of cycles. Figure four thousand cycles, plus a few thousand for copying the data back (and forth), plus the (perhaps greater) cost of killing cache locality and dumping all your registers, and it's not a pretty picture. Compare that to the cost of doing the few dozen or hundred operations needed for determining overlap of my typical polygons and it's likely this out-of-process trick is killing performance by a factor of ten or more.
To make sure this wasn't just a fancy theory, I pulled up trusty perfmon.exe and had it count context switches. Here it is ticking along for a while on my idle machine, then executing a very simple query for a couple of seconds.

Can you spot the query? The average number of context switches per second hovers around 850, then spikes to 20,000 during the query, then back.
For completeness, I wrote a test program which simply created two threads which did nothing but relinquish control and loop, trying to get an upper bound on the number of switches I could do. Trusty F# with its concision was useful here.
open System.Threading;;
let rec cswitch () = Thread.Sleep(0); cswitch();;
let makethread () = (new Thread(cswitch)).Start();;
makethread();;
makethread();;

And I was off to about a million context switches per second. That's an unrealistically high number, since the above code is just an infinite loop with no overhead whatsoever. But I would guess that 1 million no-overhead process switches translates to roughly the ~100,000 order of magnitude real-world process switches I was seeing with the st_shapelib calls.
This seems to be a fatal flaw. Oracle's EXTPROC isn't meant for this kind of fine-grained work. The next investigations involve checking out whether Oracle can be sweet talked into running these external processes in-process, but I'm willing to guess the answer is no. So, it was promising, but there's no way this architectural decision gives anyone the kind of speed they're looking for. I'm not enough of an Oracle guru to know what kind of choice ESRI had when they wrote this; let's hope a gun was put to their head.
If nothing can be improved on this front, it's back to Oracle Spatial for SQLy goodness.

Ingen kommentarer:

Send en kommentar