shrug-l: US Census Data extract
Rick Labs
rick at clbcm.com
Tue May 28 14:32:09 EDT 2019
Jim,
Census is a large database. Suggest you start by looking at PDFs of the
actual census questionnaire *booklets*. In my mind that is the easiest
way to get oriented to exactly what is and isn't covered. You then
need to skim the *methodology* documentation to understand the samples,
the sampling frequencies, and the masking they apply to protect the
confidentiality of individual information. (Much better masking than big
data!)
Personally, I like the /TIGER boundary files/ to get the polygons
(block, block group, tract, county, ZIP code...). Note that boundaries
change pretty frequently; you need to line up the correct version by year.
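That vintage-matching point is worth automating. A minimal sketch (Python; the helper name is mine, but the URL pattern matches the public layout on www2.census.gov -- verify it for your vintage, since directory names have shifted over the years):

```python
# Sketch: build the TIGER/Line shapefile URL for one state and vintage
# year, so boundary polygons line up with same-year attribute data.

def tiger_tract_url(year: int, state_fips: str) -> str:
    """URL of the census-tract boundary shapefile for one state/vintage."""
    return (
        f"https://www2.census.gov/geo/tiger/TIGER{year}/TRACT/"
        f"tl_{year}_{state_fips}_tract.zip"
    )

# Florida (FIPS 12) tracts, 2019 vintage -- match this year to the year
# of the tabular data you plan to join, or the polygons won't line up.
print(tiger_tract_url(2019, "12"))
```

Swap `TRACT`/`tract` for other levels (`BG`, `TABBLOCK`, `COUNTY`) following the same directory layout.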
Block is the most granular. The more granular the reporting, the greater
the number of completed surveys they need (and the cost for that goes
up). Pretty sure that data at the block level is 100% sampled ONLY every
10 years, and very few data elements are actually included in that (see
the actual PDF of that level of the survey). Remember too: even though
they have the data, not much will be shared with you at the block
level. Above that, they survey (just SAMPLE) at the block group level,
on a rotating /multiple-year basis/, and use ample inference.
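One practical consequence of those nesting levels: census GEOIDs are built by concatenating fixed-width codes, so block-level identifiers roll up to block group, tract, or county by simple truncation. A sketch (function name mine; the digit widths -- state 2, county 3, tract 6, block 4, with block group as the first block digit -- are the standard ones):

```python
# Sketch: split a 15-digit block GEOID into its nested components.

def parse_block_geoid(geoid: str) -> dict:
    """Decompose a block GEOID; higher levels are prefixes of lower ones."""
    assert len(geoid) == 15, "block GEOIDs are 15 digits"
    return {
        "state": geoid[0:2],
        "county": geoid[2:5],
        "tract": geoid[5:11],
        "block_group": geoid[11:12],  # first digit of the block code
        "block": geoid[11:15],
    }

# A made-up Leon County, FL block:
parts = parse_block_geoid("120730001011001")
print(parts["tract"], parts["block_group"])
```

Aggregating block data to block groups is then just a group-by on the first 12 digits.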
American FactFinder is the database name & homegrown query tool you
use to get to the census database and pluck out a slice.
https://www.census.gov/programs-surveys/sis/resources/data-tools/aff.html
The UI is HTML-forms based, clunky, but usable with some practice. After
composing your query via the forms, you run it, and it spits back
your data in a download file.
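In my experience those CSV downloads carry two header rows, a machine-readable code row and a human-readable label row, which trips up loaders expecting one. A sketch of handling that (column names here are illustrative, not from any specific table; check your own export's layout):

```python
import csv
import io

# Sketch: read a FactFinder-style CSV with a code header row followed
# by a human-readable label row; keep the labels as a data dictionary.

sample = """GEO.id,GEO.id2,GEO.display-label,HD01_VD01
Id,Id2,Geography,Estimate; Total
1400000US12073000101,12073000101,"Census Tract 1.01, Leon County, Florida",4321
"""

reader = csv.reader(io.StringIO(sample))
codes = next(reader)    # machine-readable column codes
labels = next(reader)   # human-readable labels -- worth saving separately
rows = [dict(zip(codes, r)) for r in reader]

print(rows[0]["GEO.id2"], rows[0]["HD01_VD01"])
```

Skipping the label row up front means the data rows parse with consistent types downstream.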
Virtually all the census data available at FactFinder is pure
relational except for the boundary (TIGER) files. My personal philosophy:
do as much as you can with your relational database of choice (or a
spreadsheet if it's small and simple) prior to looking at the mapping
(spatial) aspects. Microsoft Excel w/ Power BI tools is pretty good at
ETL, and its Power Query compresses the data as it goes into RAM/swapped
memory. It can then spin that compressed data ball pretty well (I think
by only uncompressing operative rows or columns, as needed, on the fly).
So even on a modest desktop machine you can handle quite a bit. The
trick is to ETL only what you really need vs. trying to recreate the
entire census DB locally as step one.
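The "ETL only what you need" idea works the same in any relational store. A sketch using SQLite (table and column names are made up; the point is staging just the slice your study area needs rather than mirroring the whole census DB):

```python
import sqlite3

# Sketch: stage only the geographies and columns of interest locally,
# then query that small table instead of the full download.

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tract_pop (geoid TEXT PRIMARY KEY,
                            county TEXT, population INTEGER)
""")

# Pretend this slice was pulled out of a much larger extract:
extract = [
    ("12073000101", "073", 4321),
    ("12073000102", "073", 3890),
    ("12031000100", "031", 5102),
]
conn.executemany("INSERT INTO tract_pop VALUES (?, ?, ?)", extract)

# Query only what the study area needs (Leon County, FIPS 073):
total = conn.execute(
    "SELECT SUM(population) FROM tract_pop WHERE county = '073'"
).fetchone()[0]
print(total)
```

Filtering at load time keeps the working set small enough for a modest desktop machine, which is the same thing Power Query's compression is buying you.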
Since there tend to be many steps in the ETL tool chain, I strongly
suggest you *keep good notes* and "breadcrumbs" along the trail, so you
can repeat it in six months if you have that need. What seems "so obvious
and unforgettable" just after you work all through it magically
evaporates from memory in about 90 days, or less.
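One cheap breadcrumb habit is writing a small manifest next to every extract recording where it came from and what the query was, so the pull can be repeated later. A sketch (field names and the helper are mine; the table code shown, B01003, is the ACS total-population table, but substitute whatever you actually pulled):

```python
import datetime
import json

# Sketch: record the provenance of a data extract in a sidecar file.

def write_manifest(path: str, source_url: str, query: dict) -> dict:
    manifest = {
        "retrieved": datetime.date.today().isoformat(),
        "source_url": source_url,
        "query": query,
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest

m = write_manifest(
    "leon_tracts.manifest.json",
    "https://factfinder.census.gov/",  # or wherever the pull came from
    {"geography": "tract", "county_fips": "073", "table": "B01003"},
)
print(m["query"]["table"])
```

Six months later the manifest, not your memory, tells you which vintage, which table, and which geography to re-pull.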
If you look at your time and effort dealing with it all, you may find
3rd-party forecasting firms better at keeping up with the constant
survey frequency and detailed changes, and at providing enhanced current
projections at the granularity you may need. However, at other times it
pays to dive in yourself so you know just how statistically flimsy some
of those "enhanced" projections actually are (both Census estimates and
3rd-party "enhanced" estimates).
The data is only made "accurate enough" to serve the general needs of
the government, plus or minus. If your application requires purer /
more recent data, you may have to find ways to supplement census data to
get a better focus on the situation.
It's all available (except for the masking); it just takes some digging.
Use the questionnaires up front as your "treasure map".
Rick
PS: The Survey of Current /Business/ is its own can of worms. Operations
and plants vs. corporations and subsidiaries, economic vs. financial
accounting, masking at the firm and industry level... it takes a while to
unravel all that. Try to find "high tech payroll" last quarter vs. 4
quarters ago by county or ZIP... Only the IRS or state disability
operations have that data, and they are not sharing much! Yes, they have
the individual's information (payroll, occupational classification,
etc.), but relate it, mask it, and report it? "Not on our budget"...
On 5/28/2019 12:52 PM, Thomas, Jim wrote:
>
> Hi All, what is the best data source for US Census GIS-formatted
> data? Ideally, I’m looking for a source where I can extract data
> based on a geographic area such as block, block group, tract, county,
> or user-defined area. I need to report population data based on
> custom buffers and identify feature data such as institutions, parks,
> etc. within those buffers. In the past, I’ve downloaded the data from
> Esri and extracted it myself, but it’s a lot of data and takes time to
> query. Additionally, I’ve used BAO to run the population reports for
> the study area, but the maps from that service lack the detail we
> need. That’s why we create custom maps using the census data. Our
> sites can potentially be anywhere in the US. Suggestions?
>
> Thanks,
>
> Jim.
>
> *Jim Thomas, PG, GISP*
> /Senior Project Geoscientist/
> Golder Associates Inc.
> 9428 Baymeadows Road, Suite 400, Jacksonville, Florida, USA 32256
> *T:* +1 904 363-3430 | *D:* +1 904 421-4254 |
> golder.com <http://www.golder.com/>
> LinkedIn <https://www.linkedin.com/company/golder/> |
> Instagram <https://www.instagram.com/golderassociates/> |
> Facebook <https://facebook.com/golderassociates/> |
> Twitter <https://twitter.com/GolderAssociate/>
> _______________________________________________
> SHRUG-L mailing list
> SHRUG-L at lists.dep.state.fl.us
> http://lists.dep.state.fl.us/mailman/listinfo/shrug-l
--
Richard J. Labs, CFA, CPA
CL&B Capital Management, LLC
Phone: 315-637-0915
E-mail (preferred for efficiency): rick at clbcm.com
3213 Yorktown Dr, Tallahassee, FL 32312-2015
June-August: 408B Holiday Harbour, Canandaigua, NY 14424