shrug-l: Data Drift in GeoDataBases

William Pollock WilliamPollock@WilsonMiller.com
Mon, 21 Oct 2002 09:49:44 -0400


Mark - 

If there is a 2.1 billion GDB unit limit that is exceeded along the 
Y-axis of your dataset, could you not "break" your dataset into two or 
more smaller pieces? 
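
Rough arithmetic with the numbers from your post (plain Python, with a 
hypothetical four-way split just to illustrate the idea):

    MAX_GDB_UNITS = 2100000000         # the per-axis limit you mentioned
    extent_y = 733000                  # meters, full north-south extent
    per_piece = extent_y // 4          # each piece covers ~183,250 m
    print(MAX_GDB_UNITS // per_piece)  # 11459 -- enough for 10,000/meter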

Ciao, bill 


-----Original Message-----
From: Mark.Welsh@dot.state.fl.us [mailto:Mark.Welsh@dot.state.fl.us]
Sent: Monday, October 21, 2002 9:34 AM
To: shrug-l@lists.dep.state.fl.us
Subject: RE: shrug-l: Data Drift in GeoDataBases


Shrugians,

Thanks to everyone who replied to my initial query regarding GDBs.  In
particular, Eric Brockwell provided a very good description of what's going
on behind the scenes in GDB land.  Thanks to Linc Clay for posting FDEP's
replies on this.  Also, ESRI-Charlotte got in the mix Friday afternoon.
Since I was out that afternoon, I didn't have a chance to summarize what
I've discovered, but here are some thoughts so far (I apologize for the
long-winded discussion; I guess it makes up for my previous lack of
postings...).

(1) First, I'm not hung up on the fact that the data are "off" by such a
small amount.  Several replies (from FDEP and ESRI) suggested setting the
precision parameter to a higher value in order to make the GDB data
"better".  I tried this (before posting to SHRUG) and found that
you can change the degree of difference, but that there is an upper limit
to how high you can set the precision, especially for data that cover large
geographic extents (like statewide data or regional data).  I'm still
waiting to find a way to make the GDB data the same as the source data.  I
guess my main concern is that no matter how high I set the
precision (up to the limit), the data are no longer the same as the
original.  The source data are accurate to 4 decimal places and the GDB
data, in effect, lose some of this information.  Spatial joins (or select
by contains) back to the source data or other data built from the source
data are not possible.  This creates a disconnect between GDB data and
source data.  Also, the argument that the data change is so small is kind
of like telling people that the bank will no longer keep track of the cents
portion of their accounts since these pesky little coins don't really
matter....
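
To see what that loss looks like in practice, here's a quick
back-of-the-envelope illustration (plain Python, with a made-up
coordinate; the two grid sizes are the ones discussed below):

    # Snap a coordinate to an integer grid, the way the GDB stores it.
    def snap(coord, units_per_meter):
        return round(coord * units_per_meter) / units_per_meter

    x = 512345.1234          # source value, accurate to 4 decimal places
    print(snap(x, 10000))    # 512345.1234 -- a 0.0001 m grid keeps it
    print(snap(x, 1500))     # 512345.12333... -- a coarser grid loses it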

(2) Eric Brockwell provided some good leads and I was able to track down a
geodatabase training book from ESRI.  The following is my attempt to
interpret this information.  In essence, to get data into the GDB, the
software needs to translate your map units into what ESRI terms
"Geodatabase units".  The GDB unit conditions the spatial precision of the
data.  If your data are in feet, and you want precision tracking in the GDB
to 1/8 of an inch, you need to be able to divide each foot into 96 GDB
units (this will give the GDB the ability to keep data to 1/8 inch).  The
precision setting in this case would be 96.  So far so good.  However, the
next step requires you to determine whether the desired precision will work
in the GDB given the extent of the data you are trying to import.  And here's
where it all breaks down, I think.  There is a limit of 2.1 billion GDB
units (in X or Y) that the database can maintain.  Therefore, if the
largest extent of your data (in X or Y) times the GDB precision exceeds
2.1 billion, then the precision is too high.  This may tie into a point
that Jacob made about 32-bit architecture: 2.1 billion is roughly 2^31,
the largest value a signed 32-bit integer can hold.  For a statewide data
source (like ours) in meters, originally accurate to the ten-thousandths
place, I would need to divide 1 meter by my desired precision (0.0001 m),
which yields a precision factor of 10,000.  However, if I try to set this
precision level and import the data, the import fails.  And no wonder since
the extent of our data is approximately 733,000 meters north-south.  If I
multiply this value by 10,000, I get 7.33 billion and exceed the 2.1 billion
limit along the Y axis by about 5 billion units.  I want to divide a meter
into 10,000 units to capture the precision of the source data, but in
reality I can only divide a meter into 1,500 units and successfully import
the data into a GDB.  Since 1,500 is smaller than 10,000 by roughly a
factor of seven, it makes sense that the imported data are off by
approximately 0.001 meters.  The key here is that there
are limits in the GDB on how precisely the data can be stored, and these
are conditioned by two factors: the 2.1 billion limit and the geographic
extent of the original data.  Unless I'm totally missing something, this
makes it impossible for us to have our GDB match our original coverage.
Both people I've worked with at ESRI have been able to suggest how to make
the offset smaller, but not how to make it disappear.
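
For anyone who wants to run this check on their own data, here's the
arithmetic as a small Python sketch (the constants come from the
discussion above; note that the simple division gives a theoretical
ceiling near 2,900 units per meter, while in practice the import only
succeeded for me at 1,500, so treat it as a rough upper bound):

    # Feasibility check: does the desired precision fit the 2.1 billion
    # per-axis GDB unit limit for a given extent?
    MAX_GDB_UNITS = 2100000000          # roughly 2^31

    def max_precision(extent_in_map_units):
        # Largest precision (GDB units per map unit) the extent allows.
        return MAX_GDB_UNITS // extent_in_map_units

    extent_y = 733000                   # meters, statewide north-south
    desired = 10000                     # units/meter for 0.0001 m data

    print(extent_y * desired)           # 7330000000 -- over the limit
    print(max_precision(extent_y))      # 2864 -- theoretical ceiling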

(3) Does this matter to anyone?  For many purposes, it's a non-issue, but I
would like to have the option to keep my data the same in the GDB as they
were originally.  I'm curious how others feel about this.  Are there any other
unforeseen consequences?

Thanks to everyone who replied.  I'm going to do some more data
exploration/experimentation and I'll post any new things I may discover.

Mark

_______________________________________________
SHRUG-L mailing list
SHRUG-L@lists.dep.state.fl.us
http://lists.dep.state.fl.us/mailman/listinfo/shrug-l