shrug-l: Been a while since I've done this...

Rick Labs rick at clbcm.com
Tue Nov 27 15:16:22 EST 2018


Duane,

At the risk of being cast out as a heretic (especially on this list) I'd 
argue that much of your analysis is not spatial; it's more analytical, 
statistical, operational, business intelligence, machine learning, etc. 
Your most relevant prediction tools will not be "map based."

First thought was: yes, totally agree, Excel has all the basic stat tools 
and is easy to use. Correlation is a snap to calculate. Linear (and some 
non-linear) regression analysis functions are included and work great. 
Chi-square is included. (Note: the Analysis ToolPak ships with Excel, 
but you have to tell Excel to load it.)
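
If you later go the Python route suggested further down, the same three 
basics are just a few lines. A minimal sketch with scipy (assumed 
installed; the numbers are made up):

    # Sketch: correlation, linear regression, and chi-square in Python
    # with scipy -- illustrative numbers only.
    from scipy import stats

    lighting_score = [2, 5, 1, 4, 3, 5, 2, 1]   # hypothetical per-block scores
    crime_calls    = [9, 3, 8, 4, 6, 2, 7, 9]   # hypothetical call counts

    r, p = stats.pearsonr(lighting_score, crime_calls)     # correlation
    fit = stats.linregress(lighting_score, crime_calls)    # linear regression
    chi2, p_chi = stats.chisquare([18, 22, 30, 30])        # chi-square fit
    print(r, fit.slope, fit.intercept, chi2)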

Suggest finding a current master's-level or PhD-candidate-level person in 
statistics or applied social sciences (with a recent and solid stats 
background) to /briefly/ consult on your stats end here, especially if 
you intend to publish. Unless you use those skills frequently they fade 
from memory pretty quickly. It's always good to get two sets of eyes on 
the stats approach anyway.

Preliminary thoughts.

*Independent variables:*
dilapidated housing (housing conditions data)
street lighting (street maps of lighting)
prior crime occurred (geocode from street address)
prior fire occurred (geocode from street address)

*Dependent variables:*
/future/ crime call occurs (roll officers)
/future/ fire call occurs (roll truck)

One of the biggest errors in prediction is using independent variables 
that are /not yet available/ for FUTURE predictions. You have to lag the 
independent variables back in time so they are published and loadable in 
the time frame of making the FUTURE prediction. A high correlation with 
data that is only available concurrently is worthless for /prediction/.
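
A minimal sketch of that lagging step with pandas, assuming a monthly 
per-cell table (the column names and the two-month publication delay are 
hypothetical):

    import pandas as pd

    # Hypothetical monthly table: one row per grid cell per month.
    df = pd.DataFrame({
        "cell_id": [1, 1, 1, 1, 2, 2, 2, 2],
        "month": list(pd.period_range("2018-01", periods=4, freq="M")) * 2,
        "prior_crime_count": [4, 6, 5, 7, 1, 0, 2, 1],
        "future_crime_call": [1, 0, 1, 1, 0, 0, 1, 0],
    })

    df = df.sort_values(["cell_id", "month"])
    # If crime stats publish two months late, a January prediction can only
    # use November's counts: shift the predictor down two rows per cell.
    df["prior_crime_count_lag2"] = (
        df.groupby("cell_id")["prior_crime_count"].shift(2)
    )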

One approach might be: create a grid over the area and assign all 
variables to a grid cell. Then consider each cell a sample. At this 
point the analysis is "flat", not really spatial.
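
A crude gridding pass might look like this in Python (cell size and 
coordinates are made up):

    import numpy as np
    import pandas as pd

    events = pd.DataFrame({
        "lat": [30.451, 30.455, 30.462, 30.452],   # hypothetical incidents
        "lon": [-84.281, -84.279, -84.290, -84.282],
    })

    cell_deg = 0.005                         # cell size in degrees (~500 m)
    events["row"] = np.floor(events["lat"] / cell_deg).astype(int)
    events["col"] = np.floor(events["lon"] / cell_deg).astype(int)

    # One row per cell = one "flat" sample for the stats tools.
    cells = (events.groupby(["row", "col"])
                   .size().rename("event_count").reset_index())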

Then you start to think: how about "nearby" cells' potential influence? 
And counts/trends over time, recency, occurrence nearby? Perhaps more 
independent variables? Did the assessor's office or other departments 
know the property was vacant? Owner occupied or rented? Occupied by a 
business with a SIC code known to use highly flammable supplies? How 
about known prior bad actors and where they live and visit? Things can 
get complex quickly. (Can we scan license plates? Run facial ID?)
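
One simple way to fold "nearby" influence into the flat table, sketched 
under the grid assumption above (not the only way): add each cell's 
neighbor total as an extra predictor via a convolution.

    import numpy as np
    from scipy.ndimage import convolve

    grid = np.array([[0, 2, 1],
                     [3, 0, 0],              # hypothetical counts per cell
                     [1, 1, 4]])

    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],            # zero center: neighbors only
                       [1, 1, 1]])

    # neighbor_counts[i, j] = total events in the 8 cells around (i, j)
    neighbor_counts = convolve(grid, kernel, mode="constant", cval=0)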

As the thirst for more and better independent variables continues 
unabated, the *Extract, Transform, Load* (ETL) workload will rise 
dramatically. The tool chain used to produce more or less continuous 
predictions needs to be efficient. You need to be able to add in future 
data /streams/ (in near real time, vs. batch)... You need to automate as 
much "data acquisition" as possible: write the converter(s) once and use 
them many times. If the people doing the data collection and cleanup 
work will turn over in their jobs, the process itself will need to 
document clearly what they did, so the next person doesn't start from 
scratch again and again. ETL, /as a separate function/, is not to be 
underestimated. None of it is mapping oriented.
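
The "write the converter once" idea in sketch form: one documented 
function per source, all emitting one shared cell/month schema (the file 
and column names here are hypothetical):

    import pandas as pd

    def load_fire_calls(path: str) -> pd.DataFrame:
        """Fire dispatch export -> standard (cell_id, month, fire_calls) table."""
        raw = pd.read_csv(path, parse_dates=["dispatch_time"])
        raw["month"] = raw["dispatch_time"].dt.to_period("M")
        return (raw.groupby(["cell_id", "month"])
                   .size()
                   .rename("fire_calls")
                   .reset_index())

    # Every new feed gets the same treatment: one documented function,
    # one shared output schema, reused month after month, e.g.
    # features = load_fire_calls("fire_2018.csv")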

Of course you want to do some really solid interviews with the fire 
chief and police people with long experience - extract the data sources 
they use, and the "intelligence" they apply in predicting trouble. Let 
those in-depth interviews guide the process of ferreting out the best 
independent variables you can grab in the time frame you need them.

Here are some key terms/phrases for this area for police work (I'm sure 
there are parallels for fire; these come from source #2 below):

    "machine learning" AND crime/offense
    crime/offense AND predict*/forecast*/map*
    "predictive policing"
    "risk terrain modeling"
    "prospective hot spot/hot-spot analysis/mapping"
    "prospective hot-spotting"
    "spatiotemporal crime forecasting"
    "predictive/prospective crime mapping/analysis"

*Background Papers*

Office of Justice Programs (unit of DOJ)
RAND report on Predictive Policing
https://www.ncjrs.gov/pdffiles1/nij/grants/243830.pdf
[RAND tends to do great work but may be slightly dated?]

A Scoping Review of Predictive Analysis Techniques for Predicting 
Criminal Events
https://www.researchgate.net/profile/Lieven_Pauwels/publication/321833027_A_scoping_review_of_predictive_analysis_techniques_for_predicting_criminal_events/links/5a33e45b45851532e82c9411/A-scoping-review-of-predictive-analysis-techniques-for-predicting-criminal-events.pdf
[Good literature review]

Most of the above is likely to be way overkill, especially to start. 
Still, there may be some nuggets in there to help avoid a false start.

If you wanted to "try out" current predictive technology, perhaps fund a 
grad student or two at FSU? I'm thinking use Python code and standard 
(well understood) *Python code libraries* for statistics and machine 
learning. Keep it all /real simple/. Let them focus on *demonstrating* 
the prediction/learning side. Output the results with lat/long info 
attached. To start, just import their predictive output into your 
mapping systems. Up front, the Python analytics would play really well 
with traditional mapping products downstream. Best of all, Python is now 
mainstream, with incredible pre-written libraries that will be around a 
long time, already "on the shelf" and ready to be strung together.
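
For a flavor of what that demo might look like with only standard 
libraries (pandas and scikit-learn; every file and column name below is 
hypothetical):

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    cells = pd.read_csv("cells.csv")    # hypothetical: one row per cell/month
    features = ["dilapidated_pct", "lighting_score", "prior_crime_lag2"]

    model = LogisticRegression()
    model.fit(cells[features], cells["future_crime_call"])

    # Score every cell and keep lat/long so the CSV drops straight into
    # a mapping system.
    cells["risk"] = model.predict_proba(cells[features])[:, 1]
    cells[["cell_lat", "cell_lon", "risk"]].to_csv("risk_scores.csv",
                                                   index=False)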

Avoid a "one-off," totally custom solution. Make sure FSU knows you want 
the simplest solution possible using only the most standard Python 
coding and well established libraries. Don't let it get esoteric.

Rick

-- 

Richard J. Labs, CFA, CPA
CL&B Capital Management, LLC
Phone: 315-637-0915
E-mail (preferred for efficiency): rick at clbcm.com
3209 Yorktown Dr, Tallahassee, FL 32312
June-August: 408B Holiday Harbour, Canandaigua, NY 14424


