import React from "react";
import { Accordion } from "react-bootstrap";

export default function project2() {
  return (
    <>
      <div className="jumbotron jumbotron-fluid custom-jumbotron shadow-lg">
        <div className="container">
          <h1 className="display-5 jumbotron-text-spacing text-center">
            Predicting Crime
          </h1>
          <p className="lead text-center pb-5">Linear Regression Application</p>
        </div>
      </div>

      <div className="container border border-top-0 mt-3 text-black">
        <h3>Background/Motivation</h3>
        <p>
          The unemployment rate is one of the many statistics used to gauge the
          well-being of a country and its people. Simply put, a higher
          unemployment rate indicates a weaker economy, and in turn signifies
          more people are facing financial hardship. Because of this, countries
          place a great emphasis on monitoring their unemployment rate and
          introducing programs to lower it. Apart from gauging the health of an
          economy, some use the unemployment rate to predict the crime rate.
          Early in the pandemic, I remember my elderly neighbors expressing
          concern that as unemployment increased, crime would increase as well.
          Their concerns were not unfounded, for we can explain crime using
          simple cost-benefit analysis.
        </p>
        <p>
          Most people do not commit crimes because the cost is too high; jail
          time, public scorn, reduced job opportunities, to name a few, all
          serve as powerful deterrents. However, when people face increasing
          economic hardship, these deterrents are sure to weaken. For some,
          there will be more to gain from committing a crime than from
          continuing to abide by the law. From this reasoning, we can argue that
          property crimes should become more prevalent as unemployment
          increases. However, is the unemployment rate actually the best
          predictor of crime? In other words, are there other predictors that
          do a better job of predicting crime? Further, if there are, how good
          a prediction can we get if we combine them? We can hope to answer
          these questions with statistical methods, in particular, linear
          regression.
        </p>
        {/* end of Motivation */}
        <Accordion defaultActiveKey={["3"]} alwaysOpen flush>
          <Accordion.Item eventKey="0">
            <Accordion.Header>
              Design Choices and Data Collection (Click to View More)
            </Accordion.Header>
            <Accordion.Body>
              {" "}
              <p>
                I accessed all data from datacommons.org. In short, this website
                serves as a data repository by aggregating data from multiple
                government agencies such as the Bureau of Labor Statistics, FBI,
                and Census Bureau. Aside from the convenience of having all data
                consolidated, the site offers a Python and Pandas wrapper
                (highly recommended), thereby reducing the amount of code
                needed. The site gives statistics at different levels of
                governance. At the national level, I felt I would have too few
                data points, casting doubt on the effectiveness of any model.
                That left me choosing between data at the city, county, or state
                level. Unfortunately, crime statistics are only consistently
                found at the state level, so every other statistic I chose had
                to be at the state level as well.
              </p>
              <div className="row">
                <figure className="figure">
                  <div className="table-responsive">
                    <table className="table table-striped">
                      <thead>
                        <tr>
                          <th scope="col">year/state</th>
                          <th scope="col">Alabama</th>
                          <th scope="col">Alaska</th>
                          <th scope="col">...</th>
                          <th scope="col">Wyoming</th>
                        </tr>
                      </thead>
                      <tbody>
                        <tr>
                          <th scope="row">2012</th>
                          <td></td>
                          <td></td>
                          <td></td>
                          <td></td>
                        </tr>
                        <tr>
                          <th scope="row">2013</th>
                          <td></td>
                          <td></td>
                          <td></td>
                          <td></td>
                        </tr>
                        <tr>
                          <th scope="row">...</th>
                          <td></td>
                          <td></td>
                          <td></td>
                          <td></td>
                        </tr>
                        <tr>
                          <th scope="row">2019</th>
                          <td></td>
                          <td></td>
                          <td></td>
                          <td></td>
                        </tr>
                      </tbody>
                    </table>
                  </div>

                  <figcaption className="figure-caption text-center">
                    Example dataframe generated from the Data Commons library.
                  </figcaption>
                </figure>
              </div>
              <p>
                After exploring the Data Commons site, I chose the following
                statistics for my regression model:
              </p>
              <ul>
                <li>
                  Count of Property Crimes - sourced from the FBI. My goal is to
                  use the remaining statistics to predict property crimes.
                </li>
                <li>
                  Below Poverty Level, in the Past 12 Months Population -
                  sourced from the Census Bureau. Like the unemployment rate,
                  the population below the poverty level is a statistic used to
                  monitor the well-being of a populace and their economy. Hence
                  it would be interesting to see how this statistic compares to
                  the unemployment rate in predicting property crime.
                </li>
                <li>
                  Unemployment Rate (share of the labor force that is looking
                  for a job) - sourced from the Bureau of Labor Statistics. My
                  initial goal was to measure how good the unemployment rate was
                  at predicting crime, so I needed to use it. NOTE: the BLS
                  provides the unemployment rate for each month, so I calculated
                  and used the yearly average instead.
                </li>
                <li>
                  Number of People with No Schooling - sourced from the Census
                  Bureau. It is no secret that economic opportunities increase
                  with education level. Thus if the lack of economic
                  opportunities compels individuals to commit a crime, then no
                  education should be a good predictor of the crime rate.
                </li>
                <li>
                  Population Density (people per km<sup>2</sup>) - sourced from
                  the Organisation for Economic Co-operation and Development.
                  Crime is more likely to occur in highly urbanized areas. The
                  reasoning is simple - there will be more people willing to
                  commit a crime. In short, as the population of an area
                  increases, the resources available to each person decrease,
                  compelling some towards crime. Population density should then
                  be a good predictor of crime.
                </li>
              </ul>
              <p>
                To create apples-to-apples comparisons, I transformed some of
                the data into rates. The main reason for this was the high
                variability in population size among states. Some states have a
                much higher population than others (think California versus
                Vermont), so comparing raw counts would pit values of
                substantially different magnitudes against each other, leading
                to, for lack of a better term, strange comparisons. As such, I
                opted to express the "counts" as "X per 100,000." While I could
                have reported the data as "per capita," the "per 100,000" form
                is the same rate scaled by a constant and avoids very small
                decimals.
              </p>
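              {/*
A minimal Python sketch of the "X per 100,000" conversion described above.
The function name per_100k and the sample figures are hypothetical,
chosen only for illustration; they are not taken from the project's data.

```python
def per_100k(count, population):
    """Convert a raw count into a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


# Made-up figures for a large and a small state (assumption, not real data).
large_state_rate = per_100k(count=900_000, population=39_000_000)
small_state_rate = per_100k(count=9_000, population=600_000)

# The raw counts differ by a factor of 100, but the rates are comparable.
print(round(large_state_rate, 1), round(small_state_rate, 1))
```

Dividing by population and multiplying by a constant keeps the ordering of
states intact while putting them on a common scale.
              */}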
              <p>
                The table below shows whether I applied the "X per 100,000"
                conversion and the name I will use from this point on to refer
                to each statistic.{" "}
              </p>
              <div className="row">
                <div className="table-responsive">
                  <table className="table table-striped">
                    <thead>
                      <tr>
                        <th scope="col">Statistic</th>
                        <th scope="col">Applied Conversion?</th>
                        <th scope="col">Measurement</th>
                        <th scope="col">New Name</th>
                      </tr>
                    </thead>
                    <tbody>
                      <tr>
                        <th scope="row">Count of Property Crimes</th>
                        <td>Yes</td>
                        <td>X per 100,000</td>
                        <td>Crime Rate</td>
                      </tr>
                      <tr>
                        <th scope="row">
                          Below Poverty Level, in the Past 12 Months Population
                        </th>
                        <td>Yes</td>
                        <td>X per 100,000</td>
                        <td>Poverty Rate</td>
                      </tr>
                      <tr>
                        <th scope="row">Unemployment Rate</th>
                        <td>No</td>
                        <td>% of available workforce</td>
                        <td>Unemployment Rate</td>
                      </tr>
                      <tr>
                        <th scope="row">Number of People with No Schooling</th>
                        <td>Yes</td>
                        <td>X per 100,000</td>
                        <td>No Schooling Rate</td>
                      </tr>
                      <tr>
                        <th scope="row">Population Density</th>
                        <td>No</td>
                        <td>
                          People per km<sup>2</sup>
                        </td>
                        <td>Population Density</td>
                      </tr>
                    </tbody>
                  </table>
                </div>
              </div>
              <p>
                Since each statistic had varying historical data availability, I
                elected to only look at data from 2012 to 2019. Lastly, rather
                than using every yearly observation, I calculated and used the
                mean of each state's statistic over that time frame. Had I not
                done so, I would have had "fake clusters," i.e., a small cluster
                of points for each state, since statistics are unlikely to
                change dramatically from one year to the next. Left unaccounted
                for, these repeated observations would be correlated with one
                another for reasons that have nothing to do with the
                relationship between a predictor and the response variable,
                which in turn would limit what we could draw from our
                conclusions. Taking the mean of the historical data is one way
                to address this issue.
              </p>
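              {/*
A minimal Python sketch of the averaging step described above: each
state's yearly values within the 2012 to 2019 window are collapsed into a
single mean. The function name state_means, the nested-dict layout, and
the sample numbers are assumptions made for illustration only.

```python
from statistics import mean


def state_means(yearly, start=2012, end=2019):
    """Collapse {state: {year: value}} into {state: mean over [start, end]}."""
    result = {}
    for state, by_year in yearly.items():
        values = [v for year, v in by_year.items() if start <= year <= end]
        if values:  # skip states with no data inside the window
            result[state] = mean(values)
    return result


# Made-up yearly values (assumption, not real data). 2011 falls outside
# the window and is therefore ignored.
example = {
    "State A": {2011: 9.0, 2012: 8.0, 2013: 7.0, 2019: 3.0},
    "State B": {2012: 4.0, 2013: 4.5},
}
print(state_means(example))
```

After this step every state contributes exactly one point to the
regression, so year-to-year repeats no longer form per-state clusters.
              */}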
            </Accordion.Body>
          </Accordion.Item>
          <Accordion.Item eventKey="1">
            <Accordion.Header>
              Analyzing Predictors (Click to View More)
            </Accordion.Header>
            <Accordion.Body>
              <p>
                Before creating a model, it is imperative to analyze the
                predictors to better understand the results of any models we
                develop.
              </p>
              <div className="text-center">
                <img
                  src="/project_two_images/correlation.png"
                  className="img-fluid border mb-2"
                  alt="correlation matrix of predictors"
                ></img>
              </div>
              <p>
                From the correlation matrix shown above, the most apparent
                correlations are between the unemployment rate and poverty rate
                (+0.52 correlation), and between the unemployment rate and no
                schooling rate (+0.57 correlation). These observations make
                sense: if more individuals are facing unemployment, then more
                individuals are likely to be below the poverty level. Likewise,
                more individuals with no schooling means a greater portion of
                the population has fewer economic opportunities and a higher
                likelihood of facing unemployment. While we would normally be
                wary of using predictors with this level of correlation, it is
                best to check by calculating the variance inflation factors
                (VIFs), shown below:
              </p>
              <div className="row text-center justify-content-md-center">
                <div className="col-3">
                  <table className="table table-striped border">
                    <thead>
                      <tr>
                        <th scope="col">Predictor</th>
                        <th scope="col">VIF</th>
                      </tr>
                    </thead>
                    <tbody>
                      <tr>
                        <th scope="row">Poverty Rate</th>
                        <td>2.120794</td>
                      </tr>
                      <tr>
                        <th scope="row">Unemployment Rate</th>
                        <td>2.021844</td>
                      </tr>
                      <tr>
                        <th scope="row">No Schooling Rate</th>
                        <td>1.767233</td>
                      </tr>
                      <tr>
                        <th scope="row">Population Density</th>
                        <td>1.789392</td>
                      </tr>
                    </tbody>
                  </table>
                </div>
              </div>
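              {/*
A short Python sketch of how a variance inflation factor is defined:
VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on
the remaining predictors. The function name vif_from_r_squared is
hypothetical. The example uses the simplified two-predictor case, where
R_j^2 is just the squared pairwise correlation, so it will not reproduce
the table above, which accounts for all four predictors at once.

```python
def vif_from_r_squared(r_squared):
    """Return 1 / (1 - R^2) for a predictor regressed on the others."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# Two-predictor illustration using the +0.52 correlation between the
# unemployment rate and poverty rate quoted in the text.
r = 0.52
print(round(vif_from_r_squared(r ** 2), 3))

# An R^2 of 0.8 corresponds to the common rule-of-thumb cutoff of 5.
print(round(vif_from_r_squared(0.8), 3))
```

A VIF of 1 means the predictor is uncorrelated with the others; larger
values mean its coefficient's variance is inflated by collinearity.
              */}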
              <p>
                Analyzing the VIFs for each predictor, we notice that all the
                values are below 5, a common rule-of-thumb threshold, indicating
                that we are somewhat "safe" in applying linear regression.
                Granted, because we know that these predictors are naturally
                correlated, we should be wary of the type of conclusions we can
                draw. Applying a preliminary linear regression between each
                predictor and crime rate, we get the following:
              </p>
              <div className="container">
                <div className="row text-center">
                  <div className="col-6 border text-center p-2">
                    <img
                      src="/project_two_images/reg1.png"
                      className="img-fluid"
                      alt="preliminary regression on crime rate versus poverty rate"
                    ></img>
                  </div>
                  <div className="col-6 border p-2">
                    <img
                      src="/project_two_images/reg2.png"
                      className="img-fluid"
                      alt="preliminary regression on crime rate versus unemployment rate"
                    ></img>
                  </div>
                </div>
                <div className="row text-center">
                  <div className="col-6 border p-2">
                    <img
                      src="/project_two_images/reg3.png"
                      className="img-fluid"
                      alt="preliminary regression on crime rate versus no schooling rate"
                    ></img>
                  </div>
                  <div className="col-6 border p-2">
                    <img
                      src="/project_two_images/reg4.png"
                      className="img-fluid"
                      alt="preliminary regression on crime rate versus population density"
                    ></img>
                  </div>
                </div>
              </div>
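              {/*
A minimal Python sketch of a single-predictor least-squares fit like the
preliminary regressions plotted above. This is not the code that produced
the figures; the function name simple_ols and the sample points are
assumptions made for illustration.

```python
from statistics import mean


def simple_ols(x, y):
    """Fit y = intercept + slope * x; return (intercept, slope, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For one predictor, R^2 equals the squared correlation of x and y.
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


# Made-up points lying exactly on y = 1 + 2x (assumption, not real data).
intercept, slope, r_squared = simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope, r_squared)
```

The intercept is the fitted value at x = 0, which is why it can fall
outside the range where any state was actually observed.
              */}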
              <p className="mt-3">Some observations:</p>
              <ul>
                <li>
                  Compared to the other predictors, "Poverty Rate" appears to be
                  the best predictor of the crime rate. This observation is
                  interesting, as I had presumed that the unemployment rate
                  would be the best predictor. One explanation is that{" "}
                  <b>
                    unemployment does not sufficiently represent a person's
                    welfare
                  </b>
                  . For example, a person could be unemployed and rely on
                  savings to keep them afloat for months on end. Similarly, a
                  person could be employed and live paycheck to paycheck. The
                  point is that employment status oversimplifies an
                  individual's financial security. On the other hand,{" "}
                  <b>living under the poverty level</b> shows a person's{" "}
                  <b>true socioeconomic status</b>. The caveat is that the
                  "Poverty Rate" is calculated from federal and not city/state
                  thresholds. The variable cost of living in the U.S. means that
                  even if a person is not living in poverty by the federal
                  threshold, that might not hold true at a city or state level.
                  Hence, if city/county data were easier to collect, it would be
                  better to use said data. Nonetheless, the Poverty Rate better
                  represents a person's financial well-being and the likelihood
                  that they may view crime as a means to better their condition,
                  or rather, that said individuals are exposed to an environment
                  where crime is much more commonplace and less likely to be
                  looked down upon, if not encouraged.
                </li>
                <li>
                  All figures depict a positive correlation (of varying
                  strength) between the predictor and crime, except for
                  population density, which exhibits a negative correlation.
                  Among the positively correlated predictors, the poverty rate
                  performs the best, followed by the unemployment rate, with no
                  schooling coming in last. We can reason that no schooling does
                  poorly for a variety of reasons. The first might be explained
                  by a state's job market - in some states receiving higher
                  education won't result in more job opportunities. In short, if
                  an area does not need a highly specific occupation, then
                  receiving higher education will have done little to improve a
                  person's financial security. Another reason might be a general
                  shift in the job market - in recent years trade jobs have
                  increased in demand. Therefore, in those sectors specifically,
                  receiving higher education will be of little use. Since the
                  necessity of education is variable, education can't accurately
                  represent a person's financial status or be used to argue a
                  person's propensity for crime.
                </li>
                <li>
                  Recalling an earlier argument, as population density
                  increases, there will be a reduction in available resources
                  (job availability in particular) leading some to look for
                  alternate means to better their condition. Therefore, the
                  negative correlation between crime and population density
                  depicted does not make sense as it suggests that property
                  crime reduces as population density increases. Looking at the
                  figure for "Crime Rate vs Population Density," we can see that
                  the correlation is "pulled" negatively by a couple of data
                  points. By measuring population density at the state level, I
                  inadvertently oversimplified each state's population
                  distribution. For instance, take California - Northern
                  California is sparsely populated but Southern California is
                  not. Population density also varies within Southern
                  California, most notably between Orange and Imperial
                  counties. Therefore, I would need to take data at the
                  city/county level to better gauge the effectiveness of
                  population density in predicting property crimes. In short,
                  population density shouldn't be used as a predictor in a
                  multivariate model, as it doesn't represent living conditions
                  for most people in a given state (unless, of course, a new
                  set of city/county data is used).
                </li>
                <li>
                  The intercepts of all linear models (except for population
                  density) make intuitive sense - even in the absence of these
                  predictors, we would still expect property crimes to exist.
                  For population density, the intercept is suggesting that in a
                  state with no individuals, property crimes would still exist.
                  This cannot be true (for obvious reasons) - another reason why
                  we should be reluctant to use population density as a
                  predictor.
                </li>
              </ul>
              <p>
                From these observations, we can reason that no schooling rate
                and population density should be dropped as predictors. For the
                sake of argument, they will still be used, but only to compare
                with other models.
              </p>
            </Accordion.Body>
          </Accordion.Item>
          <Accordion.Item eventKey="2">
            <Accordion.Header>Models (Click to View More)</Accordion.Header>
            <Accordion.Body>
              <p>
                Model A uses all predictors while Model B uses all predictors
                except for No Schooling Rate and Population Density. OLS
                Regression results are shown below:
              </p>
              <div className="container text-center">
                <div className="row border">
                  <div className="col-6">Model A</div>
                  <div className="col-6">Model B</div>
                  <div className="col-6 justify-content-center">
                    <img
                      className="img-fluid"
                      src="/project_two_images/model_a.png"
                      alt="regression output on all predictors"
                    ></img>
                  </div>
                  <div className="col-6">
                    <img
                      className="img-fluid"
                      src="/project_two_images/model_b.png"
                      alt="regression output on unemployment and poverty rates"
                    ></img>
                  </div>
                </div>
              </div>
              <p>
                Comparing the adjusted R-squared values of both models, Model A
                has an adjusted R-squared of 0.261, whereas Model B has an
                adjusted R-squared of 0.223. That Model A gains so little from
                its two extra predictors was to be expected - as mentioned
                earlier, there were inherent issues with using No Schooling and
                Population Density as predictors of the crime rate (namely the
                choice to collect data at the state level). By removing these
                problematic predictors in Model B and examining the p-values of
                the remaining predictors, we see that the Poverty Rate is a
                better predictor of the Crime Rate than the Unemployment Rate.
                However, the Durbin-Watson statistic and the condition number
                are both unacceptable (the condition number more so than the
                Durbin-Watson statistic). These values indicate issues with the
                model, so we should be wary of concluding that the Poverty Rate
                is a better predictor than the Unemployment Rate. We can explain
                the issues with Model B by remembering the moderate positive
                correlation between the Unemployment Rate and Poverty Rate. As
                such, we should view these predictors separately to better gauge
                which is better at predicting crime. Model C uses only the
                Poverty Rate as a predictor, whereas Model D uses only the
                Unemployment Rate.
              </p>
              <div className="container text-center">
                <div className="row border">
                  <div className="col-6">Model C</div>
                  <div className="col-6">Model D</div>
                  <div className="col-6">
                    <img
                      className="img-fluid"
                      src="/project_two_images/model_c.png"
                      alt="regression output on poverty rate"
                    ></img>
                  </div>
                  <div className="col-6">
                    <img
                      className="img-fluid"
                      src="/project_two_images/model_d.png"
                      alt="regression output on unemployment rate"
                    ></img>
                  </div>
                </div>
              </div>
              <p>
                Looking at the R-squared (and adjusted R-squared) values, the
                Poverty Rate is a better predictor of property crime incidence
                than the Unemployment Rate, as both values are larger for the
                Poverty Rate model. We can further support this by looking at
                the p-values of both predictors. For the Poverty Rate, it is
                (basically) 0, but for the Unemployment Rate, it is 0.083, which
                is not statistically significant at the 5% level (95%
                confidence). Granted, it would be significant at the 10% level
                (90% confidence), so we should exercise some caution here. Put
                together, these two statistics suggest that the Poverty Rate is
                a better predictor of property crime incidence than the
                Unemployment Rate. However, looking at the Durbin-Watson
                statistic and condition number, we can see that they are better
                (read: smaller) for Model D, which casts some doubt on the
                conclusion we drew. A Durbin-Watson statistic near 2 indicates
                little to no first-order autocorrelation in the residuals; since
                the value here is greater than 2.5, it suggests that the data
                points are not independent, which is a requirement for linear
                regression. We can explain this dependence by considering
                geographic proximity. At a surface level, we might argue that
                issues "overflow" onto neighboring states, for example, that
                crime in one state might spread to a neighboring state. But we
                can better reason that neighboring states face the same set of
                issues due to intrinsic similarities. For example, coastal
                states have historically been centers of commerce, leading to
                greater population density. There will therefore be a higher
                rate of crime in these states. Similarly, some states are
                naturally better suited to some jobs than others, such as
                central states for farming or California (specifically Silicon
                Valley) for tech jobs. In short, we should expect neighboring
                states to have similar statistics, which results in a dependence
                or similarity between data points.
              </p>
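              {/*
A minimal Python sketch of the Durbin-Watson statistic discussed above:
the sum of squared differences between consecutive residuals divided by
the sum of squared residuals. The function name durbin_watson and the
sample residuals are assumptions for illustration; the values in the
regression output images were not computed with this code.

```python
def durbin_watson(residuals):
    """Return sum((e[t] - e[t-1])^2) / sum(e[t]^2) for ordered residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals must not all be zero")
    return numerator / denominator


# Made-up residuals (assumption, not real data).
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips at every step
constant = [1.0, 1.0, 1.0, 1.0]       # identical neighbours
print(durbin_watson(alternating), durbin_watson(constant))
```

The statistic ranges from 0 to 4: values well above 2 point to negative
autocorrelation and values well below 2 to positive autocorrelation. It
also depends on the order in which the residuals are listed.
              */}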
            </Accordion.Body>
          </Accordion.Item>
          <Accordion.Item eventKey="3">
            <Accordion.Header>Summary of Findings</Accordion.Header>
            <Accordion.Body>
              <p>
                Limited availability of city/county data meant statewide data
                had to be used. Doing so overgeneralized the populace, which
                weakened the ability of multiple predictors to predict crime. At
                a minimum, we can argue that the poverty rate is a better
                indicator of financial security than the unemployment rate and,
                in turn, a better predictor of the property crime rate. These
                two predictors can also be used in tandem to predict crime, but
                their intrinsic correlation means we should refrain from doing
                so. The general assumption is that crime goes up when economic
                opportunities are low as a consequence of a poor economy. This
                is primarily based on observed behavior, and studies are
                somewhat mixed on it (the answer boils down to which statistic
                is being used). GDP, stock prices, or unemployment rates are
                usually used as metrics for the well-being of the economy.
                However, these statistics aren't really representative of the
                population's well-being. For instance, at the start of the
                pandemic Wall Street was doing extremely well, but a good
                proportion of the population was laid off. If we only used
                metrics such as those, we would incorrectly believe that the
                general population was financially secure. In contrast, the
                poverty rate better captures how a population as a whole is
                doing, and thus can better predict a population's crime rate.
              </p>
              <p>
                While this project was certainly insightful, there are some
                considerations that must be made. The first is that since
                statewide data was used, the project could be improved by
                looking at data at the city/county level. At these levels there
                is greater variation in population sizes and therefore a
                (possibly) better representation of how various social pressures
                can turn individuals to crime. Unfortunately, as mentioned in
                the Design Choices section, data at these levels is highly
                non-uniform, if not downright publicly unavailable, making it
                difficult to argue which data point(s) should be omitted. The
                second is that, if we consider crime theory, predicting the
                crime rate is intrinsically difficult. In brief, while some
                crime is premeditated, some is not and is entirely due to
                chance/opportunity - the exact ratio is heavily disputed. Linear
                regression assumes that variables/processes can be explained,
                i.e., that the trends we observe are minimally affected by
                randomness. Therefore, if property crime is primarily due to
                chance/opportunity, we should be aware of how linear regression
                might "fail" us. Clearly, crime is a complex issue, which is why
                it takes a coordinated effort of multiple government agencies to
                address.
              </p>
            </Accordion.Body>
          </Accordion.Item>
        </Accordion>
      </div>
    </>
  );
}
