shrug-l: US Census Data extract
Rick Labs
rick at clbcm.com
Tue May 28 14:32:09 EDT 2019
Jim,
Census is a large database. Suggest you start by looking at PDFs of the
actual census questionnaire *booklets*. In my mind that is the easiest
way to get oriented to exactly what is and isn't covered. You then
need to skim the *methodology* documentation to understand the samples,
the sampling frequencies, and the masking they apply to protect the
confidentiality of individual information. (Much better masking than big
data!)
Personally, I like the /TIGER boundary files/ to get the polygons
(block, block group, tract, county, ZIP code...). Note that boundaries
change pretty frequently; you need to line up the correct version by year.
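That vintage-matching point is worth automating. A minimal sketch (Python; the helper name is mine, but the URL pattern matches the public layout on www2.census.gov -- verify it for your vintage, since directory names have shifted over the years):

```python
# Sketch: build the TIGER/Line shapefile URL for one state and vintage
# year, so boundary polygons line up with same-year attribute data.

def tiger_tract_url(year: int, state_fips: str) -> str:
    """URL of the census-tract boundary shapefile for one state/vintage."""
    return (
        f"https://www2.census.gov/geo/tiger/TIGER{year}/TRACT/"
        f"tl_{year}_{state_fips}_tract.zip"
    )

# Florida (FIPS 12) tracts, 2019 vintage -- match this year to the year
# of the tabular data you plan to join, or the polygons won't line up.
print(tiger_tract_url(2019, "12"))
```

Swap `TRACT`/`tract` for other levels (`BG`, `TABBLOCK`, `COUNTY`) following the same directory layout.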
Block is the most granular. The more granular the reporting, the greater
the number of completed surveys they need (and the cost for that goes
up). Pretty sure that data at the block level is 100% sampled ONLY every
10 years, and very few data elements are actually included in that (see
the actual PDF of that level of the survey). Remember too: even though
they have the data, not much will be shared with you at the block
level. Above that, they survey (just SAMPLE) at the block group level,
on a rotating /multiple-year basis/, and use ample inference.
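One practical consequence of those nesting levels: census GEOIDs are built by concatenating fixed-width codes, so block-level identifiers roll up to block group, tract, or county by simple truncation. A sketch (function name mine; the digit widths -- state 2, county 3, tract 6, block 4, with block group as the first block digit -- are the standard ones):

```python
# Sketch: split a 15-digit block GEOID into its nested components.

def parse_block_geoid(geoid: str) -> dict:
    """Decompose a block GEOID; higher levels are prefixes of lower ones."""
    assert len(geoid) == 15, "block GEOIDs are 15 digits"
    return {
        "state": geoid[0:2],
        "county": geoid[2:5],
        "tract": geoid[5:11],
        "block_group": geoid[11:12],  # first digit of the block code
        "block": geoid[11:15],
    }

# A made-up Leon County, FL block:
parts = parse_block_geoid("120730001011001")
print(parts["tract"], parts["block_group"])
```

Aggregating block data to block groups is then just a group-by on the first 12 digits.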
American FactFinder is the database name & homegrown query tool you
use to get to the census database and pluck out a slice.
https://www.census.gov/programs-surveys/sis/resources/data-tools/aff.html
The UI is HTML-forms based, clunky, but usable with some practice. After
composing your query via the forms, you run it, and it spits back
your data in a download file.
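In my experience those CSV downloads carry two header rows, a machine-readable code row and a human-readable label row, which trips up loaders expecting one. A sketch of handling that (column names here are illustrative, not from any specific table; check your own export's layout):

```python
import csv
import io

# Sketch: read a FactFinder-style CSV with a code header row followed
# by a human-readable label row; keep the labels as a data dictionary.

sample = """GEO.id,GEO.id2,GEO.display-label,HD01_VD01
Id,Id2,Geography,Estimate; Total
1400000US12073000101,12073000101,"Census Tract 1.01, Leon County, Florida",4321
"""

reader = csv.reader(io.StringIO(sample))
codes = next(reader)    # machine-readable column codes
labels = next(reader)   # human-readable labels -- worth saving separately
rows = [dict(zip(codes, r)) for r in reader]

print(rows[0]["GEO.id2"], rows[0]["HD01_VD01"])
```

Skipping the label row up front means the data rows parse with consistent types downstream.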
Virtually all the census data available at FactFinder is pure
relational except for the boundary (TIGER) files. My personal philosophy:
do as much as you can with your relational database of choice (or a
spreadsheet if it's small and simple) prior to looking at the mapping
(spatial) aspects. Microsoft Excel w/ Power BI tools is pretty good at
ETL, and its Power Query compresses the data as it goes into RAM/swapped
memory. It can then spin that compressed data ball pretty well (I think
by only uncompressing operative rows or columns, as needed, on the fly).
So even on a modest desktop machine you can handle quite a bit. The
trick is to ETL only what you really need vs. trying to recreate the
entire census DB locally as step one.
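The "ETL only what you need" idea works the same in any relational store. A sketch using SQLite (table and column names are made up; the point is staging just the slice your study area needs rather than mirroring the whole census DB):

```python
import sqlite3

# Sketch: stage only the geographies and columns of interest locally,
# then query that small table instead of the full download.

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tract_pop (geoid TEXT PRIMARY KEY,
                            county TEXT, population INTEGER)
""")

# Pretend this slice was pulled out of a much larger extract:
extract = [
    ("12073000101", "073", 4321),
    ("12073000102", "073", 3890),
    ("12031000100", "031", 5102),
]
conn.executemany("INSERT INTO tract_pop VALUES (?, ?, ?)", extract)

# Query only what the study area needs (Leon County, FIPS 073):
total = conn.execute(
    "SELECT SUM(population) FROM tract_pop WHERE county = '073'"
).fetchone()[0]
print(total)
```

Filtering at load time keeps the working set small enough for a modest desktop machine, which is the same thing Power Query's compression is buying you.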
Since there tend to be many steps in the ETL tool chain, I strongly
suggest you *keep good notes* and "breadcrumbs" along the trail, so you
can repeat it in six months if you have that need. What seems "so obvious
and unforgettable" just after you work all through it magically
evaporates from memory in about 90 days, or less.
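One cheap breadcrumb habit is writing a small manifest next to every extract recording where it came from and what the query was, so the pull can be repeated later. A sketch (field names and the helper are mine; the table code shown, B01003, is the ACS total-population table, but substitute whatever you actually pulled):

```python
import datetime
import json

# Sketch: record the provenance of a data extract in a sidecar file.

def write_manifest(path: str, source_url: str, query: dict) -> dict:
    manifest = {
        "retrieved": datetime.date.today().isoformat(),
        "source_url": source_url,
        "query": query,
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest

m = write_manifest(
    "leon_tracts.manifest.json",
    "https://factfinder.census.gov/",  # or wherever the pull came from
    {"geography": "tract", "county_fips": "073", "table": "B01003"},
)
print(m["query"]["table"])
```

Six months later the manifest, not your memory, tells you which vintage, which table, and which geography to re-pull.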
If you look at your time and effort dealing with it all, you may find
3rd-party forecasting firms better at keeping up with the constant
survey frequency and detailed changes, and at providing enhanced current
projections at the granularity you may need. However, at other times it
pays to dive in yourself so you know just how statistically flimsy some
of those "enhanced" projections actually are (both Census estimates and
3rd-party "enhanced" estimates).
The data is only made "accurate enough" to serve the general needs of
the government, plus or minus. If your application requires purer /
more recent data, you may have to find ways to supplement census data to
get a better focus on the situation.
It's all available (except for the masking); it just takes some digging.
Use the questionnaires up front as your "treasure map".
Rick
PS: The Survey of Current /Business/ is its own can of worms. Operations
and plants vs. corporations and subsidiaries, economic vs. financial
accounting, masking at the firm and industry level... it takes a while to
unravel all that. Try to find "high tech payroll" last quarter vs. 4
quarters ago by county or ZIP... Only the IRS or state disability
operations have that data, and they are not sharing much! Yes, they have
the individual's information (payroll, occupational classification,
etc.), but relate it, mask it, and report it? "Not on our budget"...
On 5/28/2019 12:52 PM, Thomas, Jim wrote:
>
> Hi All, what is the best data source for US Census GIS-formatted
> data? Ideally, I’m looking for a source where I can extract data
> based on a geographic area such as block, block group, tract, county,
> or user-defined area. I need to report population data based on
> custom buffers and identify feature data such as institutions, parks,
> etc. within those buffers. In the past, I’ve downloaded the data from
> Esri and extracted it myself, but it’s a lot of data and takes time to
> query. Additionally, I’ve used BAO to run the population reports for
> the study area, but the maps from that service lack the detail we
> need. That’s why we create custom maps using the census data. Our
> sites can potentially be anywhere in the US. Suggestions?
>
> Thanks,
>
> Jim.
>
> *Jim Thomas, PG, GISP*
> /Senior Project Geoscientist/
> Golder Associates Inc.
> 9428 Baymeadows Road, Suite 400, Jacksonville, Florida, USA 32256
> *T:* +1 904 363-3430 | *D:* +1 904 421-4254 |
> golder.com <http://www.golder.com/>
> LinkedIn <https://www.linkedin.com/company/golder/> |
> Instagram <https://www.instagram.com/golderassociates/> |
> Facebook <https://facebook.com/golderassociates/> |
> Twitter <https://twitter.com/GolderAssociate/>
> _______________________________________________
> SHRUG-L mailing list
> SHRUG-L at lists.dep.state.fl.us
> http://lists.dep.state.fl.us/mailman/listinfo/shrug-l
--
Richard J. Labs, CFA, CPA
CL&B Capital Management, LLC
Phone: 315-637-0915
E-mail (preferred for efficiency): rick at clbcm.com
3213 Yorktown Dr, Tallahassee, FL 32312-2015
June-August: 408B Holiday Harbour, Canandaigua, NY 14424