shrug-l: Been a while since I've done this...
Rick Labs
rick at clbcm.com
Tue Nov 27 15:16:22 EST 2018
Duane,
At the risk of being cast out as a heretic (especially on this list) I'd
argue that much of your analysis is not spatial; it's more analytical:
statistical, operational, business intelligence, machine learning, etc.
Your most relevant prediction tools will not be "map based."
First thought was: yes, totally agree, Excel has all the basic stat tools
and is easy to use. Correlation is a snap to calculate. Linear (and some
non-linear) regression analysis functions are included and work well.
Chi-square is included. (Note: the Analysis ToolPak ships with Excel,
but you have to tell Excel to load it.)
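(For comparison, the same basics are one-liners in Python with scipy,
which comes up again below. The numbers and names here are made up,
just to show the calls:)

    import numpy as np
    from scipy import stats

    # hypothetical per-block data: is the street lit, and crime call counts
    lighting = np.array([0, 1, 1, 0, 1, 0, 1, 1])
    calls = np.array([3, 1, 0, 4, 1, 5, 0, 1])

    # correlation (Excel: CORREL)
    r, p = stats.pearsonr(lighting, calls)

    # simple linear regression (Excel: LINEST / Analysis ToolPak)
    slope, intercept, r_val, p_val, stderr = stats.linregress(lighting, calls)

    # chi-square on a 2x2 table, e.g. lit/unlit vs. call/no-call
    # (Excel: CHISQ.TEST)
    table = np.array([[20, 5], [10, 15]])
    chi2, p, dof, expected = stats.chi2_contingency(table)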
Suggest a current master's-level or PhD-candidate statistician, or
someone in the applied social sciences with a recent and solid stats
background, to /briefly/ consult on your stats end here, especially if
you intend to publish. Unless you use those skills frequently they fade
from memory pretty quickly. It's always good to get two sets of eyes on
the stats approach anyway.
Preliminary thoughts:
*Independent variables*
dilapidated housing (housing conditions data)
street lighting (street maps of lighting)
prior crime occurred (geo code from street address)
prior fire occurred (geo code from street address)
*Dependent variables:*
/future/ crime call occurs (roll officers)
/future/ fire call occurs (roll truck)
One of the biggest errors in prediction is using independent variables
that are /not yet available/ for FUTURE predictions. You have to lag the
independent variables back in time so they are published and loadable
within the time frame of making the FUTURE prediction. A high
correlation with data that is only available concurrently is worthless
for /prediction/.
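In pandas that lagging step is a one-liner per variable. Here's a
sketch; the monthly layout and column names are my own invention:

    import pandas as pd

    # one row per grid cell per month (hypothetical layout)
    df = pd.DataFrame({
        "cell_id": [1, 1, 1, 2, 2, 2],
        "month": pd.to_datetime(["2018-01-01", "2018-02-01",
                                 "2018-03-01"] * 2),
        "prior_crime": [3, 1, 2, 0, 4, 1],
        "fire_calls": [0, 1, 0, 1, 0, 2],
    })
    df = df.sort_values(["cell_id", "month"])

    # shift predictors back one period within each cell: the model that
    # predicts March can only see data published through February
    df["prior_crime_lag1"] = df.groupby("cell_id")["prior_crime"].shift(1)

    # the target stays in the current period; drop rows with no history
    model_frame = df.dropna(subset=["prior_crime_lag1"])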
One approach might be: create a grid over the area and assign every
variable to its grid cell. Then treat each cell as a sample. At this
point the analysis is "flat," not really spatial.
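A sketch of that gridding step in Python (cell size and coordinates are
placeholders; pick a size that suits your area):

    import pandas as pd

    CELL = 0.005  # grid cell size in degrees (roughly 500 m)

    events = pd.DataFrame({
        "lat": [30.441, 30.447, 30.452],
        "lon": [-84.281, -84.279, -84.290],
        "kind": ["crime", "crime", "fire"],
    })

    # snap each event to the lower-left corner of its cell
    events["cell_lat"] = (events["lat"] // CELL) * CELL
    events["cell_lon"] = (events["lon"] // CELL) * CELL

    # one row per cell, with counts by event type: the "flat" sample table
    cells = (events
             .groupby(["cell_lat", "cell_lon", "kind"])
             .size()
             .unstack("kind", fill_value=0)
             .reset_index())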
Then you start to think: what about "nearby" cells' potential influence?
And counts/trends over time, recency, occurrence nearby? Perhaps more
independent variables? Did the assessor's office or other departments
know the property was vacant? Owner occupied or rented? Occupied by a
business with a SIC code known to use highly flammable supplies? How
about known prior bad actors and where they live and visit? Things can
get complex quickly. (Can we scan license plates? Run facial ID?)
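The "nearby" idea is easy to prototype once you have grid indices,
though. A sketch (integer row/col indices and all names hypothetical):

    import pandas as pd

    cells = pd.DataFrame({
        "row": [10, 10, 11, 12],
        "col": [5, 6, 5, 7],
        "crime": [3, 1, 2, 4],
    })
    counts = {(r, c): v
              for r, c, v in zip(cells["row"], cells["col"], cells["crime"])}

    def neighbor_sum(r, c):
        """Total crime count in the 8 cells surrounding (r, c)."""
        return sum(counts.get((r + dr, c + dc), 0)
                   for dr in (-1, 0, 1)
                   for dc in (-1, 0, 1)
                   if not (dr == 0 and dc == 0))

    cells["crime_nearby"] = [neighbor_sum(r, c)
                             for r, c in zip(cells["row"], cells["col"])]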
As the thirst for more and better independent variables continues
unabated, the *Extract, Transform, Load* (ETL) workload will rise
dramatically. The tool chain used to produce more or less continuous
predictions needs to be efficient. You need to be able to add in future
data /streams/ (in near real time, vs. batch)... You need to automate as
much "data acquisition" as possible: write the converter(s) once and use
them many times. If the people doing the data collection and clean-up
work will turn over in their jobs, the process itself will need to make
very clear what was done, so the next person doesn't start from scratch
again and again. ETL /as a separate function/ is not to be
underestimated. None of it is mapping oriented.
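"Write the converter once" can be as plain as one function per feed that
maps raw columns onto a shared schema; the mapping then documents itself
for the next person. A sketch (file names and columns invented):

    import pandas as pd

    def load_feed(path, column_map):
        """Read one raw CSV feed and rename its columns to the shared
        schema. column_map maps raw column names to standard names."""
        raw = pd.read_csv(path)
        out = raw.rename(columns=column_map)[list(column_map.values())]
        out["event_date"] = pd.to_datetime(out["event_date"])
        return out

    # written once, reused for every source
    crime = load_feed("crime_calls.csv",
                      {"CallDate": "event_date", "Lat": "lat", "Long": "lon"})
    fires = load_feed("fire_calls.csv",
                      {"IncidentDt": "event_date", "Y": "lat", "X": "lon"})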
Of course you want to do some really solid interviews with the fire
chief and police personnel with long experience: extract the data
sources they use and the "intelligence" they apply in predicting
trouble. Let those in-depth interviews guide the process of ferreting
out the best independent variables you can grab in the time frame you
need them.
Here are some key terms/phrases for this area for police work (I'm sure
there are parallels for fire; from source #2 below):
"machine learning" AND crime/offense
crime/offense AND predict*/forecast*/map*
"predictive policing"
"risk terrain modeling"
"prospective hot spot/hot-spot analysis/mapping"
"prospective hot-spotting"
"spatiotemporal crime forecasting"
"predictive/prospective crime mapping/analysis"
*Background Papers*
Office of Justice Programs (unit of DOJ)
RAND report on Predictive Policing
https://www.ncjrs.gov/pdffiles1/nij/grants/243830.pdf
[RAND tends to do great work but may be slightly dated?]
A Scoping Review of Predictive Analysis Techniques for Predicting
Criminal Events
https://www.researchgate.net/profile/Lieven_Pauwels/publication/321833027_A_scoping_review_of_predictive_analysis_techniques_for_predicting_criminal_events/links/5a33e45b45851532e82c9411/A-scoping-review-of-predictive-analysis-techniques-for-predicting-criminal-events.pdf
[Good literature review]
Most of the above is likely to be way overkill, especially to start.
Still, there may be some nuggets in there to help avoid a false start.
If you wanted to "try out" current predictive technology, perhaps fund a
grad student or two at FSU? I'm thinking use Python code and standard
(well understood) *Python code libraries* for statistics and machine
learning. Keep it all /real simple/. Let them focus on *demonstrating*
the prediction/learning side. Output those results with lat/long info
attached. To start, just import their predictive output into your
mapping systems. Up front, the Python analytics would play really well
with traditional mapping products downstream. Best of all, Python is now
mainstream, with incredible pre-written libraries that will be around a
long time, already "on the shelf" and ready to be strung together.
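To make that concrete, here's roughly what the "standard libraries"
version looks like end to end with scikit-learn. Every column name here
is hypothetical:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    cells = pd.DataFrame({
        "cell_lat": [30.440, 30.445, 30.450, 30.455],
        "cell_lon": [-84.280, -84.285, -84.290, -84.295],
        "prior_crime_lag1": [5, 0, 2, 7],
        "dilapidated": [1, 0, 0, 1],
        "future_crime": [1, 0, 0, 1],  # label: call occurred next period?
    })

    X = cells[["prior_crime_lag1", "dilapidated"]]
    y = cells["future_crime"]
    model = LogisticRegression().fit(X, y)

    # attach predicted risk to coordinates and export for the GIS side
    cells["risk"] = model.predict_proba(X)[:, 1]
    cells[["cell_lat", "cell_lon", "risk"]].to_csv("predicted_risk.csv",
                                                   index=False)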
Avoid a "one off" totally custom solution. Make sure FSU knows you want
the simplest solution possible using only the most standard Python
coding and well established libraries. Don't let it get esoteric.
Rick
--
Richard J. Labs, CFA, CPA
CL&B Capital Management, LLC
Phone: 315-637-0915
E-mail (preferred for efficiency): rick at clbcm.com
3209 Yorktown Dr, Tallahassee, FL 32312
June-August: 408B Holiday Harbour, Canandaigua, NY 14424