Adjusting Misreported Count Data in Forensic Analysis

A.E. Rodriguez
June 1, 2024

Count Series Suspected of Being Misreported




  • insurance or Medicare claims
  • excess deaths
  • fire starts
  • medical visits
  • products consumed
  • arrivals
  • species per geographic unit
  • applicants



Old School




Suppose you know the following:

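  • number of months over which claims were filed: 100
  • some indication of the rate of pilfering, i.e., the share of months with inflated claims: 25%
  • the average number of reported claims per month: 58, from which it follows that
    0.75 × L1 + 0.25 × L2 = 58,
    where L1 is the average actual (unaltered) monthly claim count and L2 is the average inflated count
  • if we have some indication of what the average inflated count is, say L2 = 60,
  • then we can solve for the unknown actual level of claims and determine the level of graft: L1 = (58 − 0.25 × 60) / 0.75 ≈ 57.3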
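The back-of-the-envelope calculation above can be sketched in a few lines of Python. The function name `actual_level` is hypothetical, and expressing the level of graft as excess claims over the 100 months, p × (L2 − L1) × months, is an illustrative assumption rather than a figure from the slide.

```python
def actual_level(reported_mean: float, cheat_rate: float, inflated_mean: float) -> float:
    """Solve (1 - p) * L1 + p * L2 = reported_mean for L1 (hypothetical helper).

    reported_mean: observed average of reported monthly claims
    cheat_rate:    assumed share p of months with inflated claims
    inflated_mean: assumed average L2 of the inflated counts
    """
    if not 0 <= cheat_rate < 1:
        raise ValueError("cheat_rate must be in [0, 1)")
    return (reported_mean - cheat_rate * inflated_mean) / (1 - cheat_rate)


# Figures from the slide: 25% pilfering, reported mean 58, inflated mean 60.
l1 = actual_level(reported_mean=58, cheat_rate=0.25, inflated_mean=60)
print(round(l1, 2))  # 57.33

# Assumption: graft expressed as excess claims over the 100 months.
excess_claims = 0.25 * (60 - l1) * 100
print(round(excess_claims, 1))  # 66.7
```

The same excess can be read as (58 − L1) × 100, since the reported mean exceeds the actual level only through the inflated months.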



The Proposed Mixture Model




Math Expression

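\( f(x; \Phi) = \sum_{j=1}^{g} \pi_j f_j(x; \theta_j) \)

where g is the number of components, the mixing proportions satisfy \( \pi_j \ge 0 \) and \( \sum_{j=1}^{g} \pi_j = 1 \), \( f_j \) is the j-th component density with parameters \( \theta_j \), and \( \Phi = (\pi_1, \ldots, \pi_g, \theta_1, \ldots, \theta_g) \) collects all model parameters.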
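As a minimal sketch, the mixture density can be evaluated directly once the component form is fixed. Here the components are assumed to be Poisson (the count DGP used in this deck), and the two-component weights 0.75/0.25 and rates 50/60 are hypothetical values chosen for illustration.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability lam**y * exp(-lam) / y!, computed on the log scale."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def mixture_pmf(x: int, weights: list, lams: list) -> float:
    """f(x; Phi) = sum_j pi_j * f_j(x; theta_j), assuming Poisson components f_j."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to one")
    return sum(w * poisson_pmf(x, lam) for w, lam in zip(weights, lams))


# Hypothetical example: 75% honest months (lambda = 50), 25% inflated (lambda = 60).
weights, lams = [0.75, 0.25], [50.0, 60.0]
support = range(200)  # mass above 200 is negligible for these rates
total = sum(mixture_pmf(x, weights, lams) for x in support)
mean = sum(x * mixture_pmf(x, weights, lams) for x in support)
print(round(total, 6), round(mean, 3))  # 1.0 52.5
```

The mixture mean is the weighted average of the component means, 0.75 × 50 + 0.25 × 60 = 52.5, which is the same weighting used in the back-of-the-envelope equation.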



Poisson DGP

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring within a fixed interval of time or space. These events must occur with a known constant mean rate and independently of the time since the last event. It is particularly useful for modeling the number of events that occur randomly over a given period or in a specific area.

Math Expression

\( f(y) = \frac{\lambda^y}{y!} e^{-\lambda} \)


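where y = 0, 1, 2, … is the observed count and λ > 0 is the average rate of events; λ is both the mean and the variance of the distribution.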
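A quick numerical check of the Poisson formula: summing over a truncated support recovers a total probability of one and a mean and variance both equal to λ. The rate of 50 is an illustrative value, not an estimate from the data.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """f(y) = lam**y * exp(-lam) / y!, evaluated via logs to avoid overflow in y!."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


lam = 50.0  # illustrative rate, e.g. average claims per month
support = range(0, 200)  # truncation: mass above 200 is negligible for lam = 50
probs = [poisson_pmf(y, lam) for y in support]
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(round(mean, 4), round(var, 4))  # 50.0 50.0
```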

Bernoulli DGP



The Bernoulli distribution is often used to model binary outcomes, such as success/failure, yes/no, true/false, or cheat/did-not-cheat scenarios.

Math Expression

Probability Mass Function

The probability mass function of a Bernoulli-distributed random variable is given by:

\( P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases} \)


Properties

  • Mean (Expected Value): E[X] = p
  • Variance: Var(X) = p(1 - p)
  • Support: x ∈ {0, 1}
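The Bernoulli indicator and the Poisson counts combine into a data-generating process for misreported claims: each month a Bernoulli(p) draw decides whether the reported count comes from the honest or the inflated Poisson rate. This is a minimal sketch assuming the 25 percent cheat rate from the earlier example, an honest rate of 50 (the set mean in the simulation results) and a hypothetical inflated rate of 60. The names `simulate_claims` and `poisson_draw` are illustrative, and Knuth's multiplication sampler is used only to avoid third-party dependencies.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for moderate lam (well below ~700)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_claims(n_months, cheat_rate, lam_honest, lam_inflated, seed=0):
    """Return (counts, cheated): monthly counts and their Bernoulli cheat indicators."""
    rng = random.Random(seed)
    counts, cheated = [], []
    for _ in range(n_months):
        z = 1 if rng.random() < cheat_rate else 0  # Bernoulli(cheat_rate) draw
        lam = lam_inflated if z else lam_honest    # pick the Poisson rate
        counts.append(poisson_draw(lam, rng))
        cheated.append(z)
    return counts, cheated


counts, cheated = simulate_claims(100, cheat_rate=0.25, lam_honest=50, lam_inflated=60, seed=1)
print(len(counts), sum(cheated))  # 100 months and how many of them were inflated
```

In a real analysis only `counts` is observed; `cheated` is the latent indicator that the mixture model tries to recover.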

Clustering and Classification Libraries


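| Package  | Version | Non-Gaussian components | Classification |
|----------|---------|-------------------------|----------------|
| Rmixmod  | 2.1.10  | Yes                     | Yes            |
| mixR     | 0.2.0   | Yes                     | Yes            |
| MixAll   | 1.5.1   | Yes                     | Yes            |
| mixtools | 2.0.0   | Yes                     | Yes            |
| mclust   | 6.0.0   | No                      | Yes            |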

Claims Analysis: Results






Estimated model parameters from mclust are λ1 = 49 and λ2 = 56.

The estimated levels are reasonably close to the simulated ones. Classifying each month's claims into one of the two components allows us to estimate the rate of pilfering.
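A minimal sketch of the classification step, assuming Poisson components and the maximum a posteriori (MAP) rule: each month goes to the component with the higher posterior probability. The rates λ1 = 49 and λ2 = 56 come from the slide (where they were estimated with mclust, which fits Gaussian components, so using them in a Poisson likelihood is an assumption); the mixing proportion of 0.5, the example counts and the function name `classify` are hypothetical.

```python
import math


def poisson_logpmf(y: int, lam: float) -> float:
    """log(lam**y * exp(-lam) / y!)."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def classify(counts, pi_inflated, lam_honest, lam_inflated):
    """MAP rule: 0 = latent/honest, 1 = altered/inflated."""
    labels = []
    for y in counts:
        log_honest = math.log(1 - pi_inflated) + poisson_logpmf(y, lam_honest)
        log_inflated = math.log(pi_inflated) + poisson_logpmf(y, lam_inflated)
        labels.append(1 if log_inflated > log_honest else 0)
    return labels


counts = [44, 47, 50, 52, 53, 55, 58, 61]  # hypothetical monthly claims
labels = classify(counts, pi_inflated=0.5, lam_honest=49.0, lam_inflated=56.0)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With equal weights the rule reduces to a cutoff: a count is classified as altered when y > (λ2 − λ1) / ln(λ2 / λ1) = 7 / ln(56/49) ≈ 52.4. Unequal weights shift this cutoff.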


Identifying the Pilfering Rate


| Series  | Mean | Number of months |
|---------|------|------------------|
| Latent  | 44.3 | 42               |
| Altered | 59   | 58               |


Estimated pilfering rate: 58 of 100 months classified as altered, i.e., 58/100 = 58 percent.
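Given a classification, the summary table and the pilfering rate are simple tallies. This sketch assumes 0/1 labels (1 = altered); the helper name `pilfering_summary` is hypothetical.

```python
def pilfering_summary(counts, labels):
    """Per-class (mean, number of months) plus the share of months classified as altered."""
    summary = {}
    for name, flag in (("Latent", 0), ("Altered", 1)):
        members = [y for y, z in zip(counts, labels) if z == flag]
        mean = sum(members) / len(members) if members else float("nan")
        summary[name] = (mean, len(members))
    rate = summary["Altered"][1] / len(counts)
    return summary, rate


# With the slide's class sizes (42 latent, 58 altered months) the rate is
# simply 58 / (42 + 58).
print(58 / (42 + 58))  # 0.58
```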

Simulation Results


The average estimated cheat rate is 45.5 percent, well above the cheat rate of 25 percent set in the simulation.

The average of the estimated series means is 49.4, which conforms closely to the set mean of 50.
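A simulation study of this kind can be sketched as a Monte Carlo loop: simulate a Bernoulli-Poisson series, fit a two-component mixture, classify each month, and average the estimated cheat rate and latent-series mean over replications. This sketch swaps in a plain EM algorithm for a two-component Poisson mixture in place of the R packages used in the deck, and it assumes an honest rate of 50, a hypothetical inflated rate of 60, 100 months per replication, and that the "estimated series mean" is the mean of the months classified as latent. It illustrates the procedure and is not expected to reproduce the figures above.

```python
import math
import random


def poisson_draw(lam, rng):
    """One Poisson(lam) variate via Knuth's multiplication method."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def logpmf(y, lam):
    """Poisson log-probability log(lam**y * exp(-lam) / y!)."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def fit_two_poisson(counts, n_iter=200, tol=1e-8):
    """EM for a two-component Poisson mixture.

    Returns (pi, lam_low, lam_high, resp): pi is the weight of the higher-rate
    component and resp[i] the posterior probability that counts[i] belongs to it.
    """
    s, n = sorted(counts), len(counts)
    half = n // 2
    lam1 = max(sum(s[:half]) / half, 0.5)                # start: mean of lower half
    lam2 = max(sum(s[half:]) / (n - half), lam1 + 0.5)   # start: mean of upper half
    pi, prev_ll, resp = 0.5, -math.inf, [0.0] * n
    for _ in range(n_iter):
        ll = 0.0
        for i, y in enumerate(counts):                   # E-step (log-sum-exp)
            a = math.log(1 - pi) + logpmf(y, lam1)
            b = math.log(pi) + logpmf(y, lam2)
            m = max(a, b)
            log_norm = m + math.log(math.exp(a - m) + math.exp(b - m))
            resp[i] = math.exp(b - log_norm)
            ll += log_norm
        if ll - prev_ll < tol:                           # log-likelihood has converged
            break
        prev_ll = ll
        r = min(max(sum(resp), 1e-9), n - 1e-9)          # M-step (closed form)
        pi = min(max(r / n, 1e-9), 1 - 1e-9)
        lam2 = max(sum(p * y for p, y in zip(resp, counts)) / r, 1e-9)
        lam1 = max(sum((1 - p) * y for p, y in zip(resp, counts)) / (n - r), 1e-9)
    if lam1 > lam2:                                      # keep the inflated component second
        lam1, lam2, pi = lam2, lam1, 1 - pi
        resp = [1 - p for p in resp]
    return pi, lam1, lam2, resp


def monte_carlo(n_reps=100, n_months=100, cheat_rate=0.25,
                lam_honest=50.0, lam_inflated=60.0, seed=0):
    """Average estimated cheat rate and latent-series mean over n_reps replications."""
    rng = random.Random(seed)
    rates, latent_means = [], []
    for _ in range(n_reps):
        # Bernoulli indicator picks the honest or inflated Poisson rate each month
        counts = [poisson_draw(lam_inflated if rng.random() < cheat_rate else lam_honest, rng)
                  for _ in range(n_months)]
        _, _, _, resp = fit_two_poisson(counts)
        labels = [p > 0.5 for p in resp]                 # MAP classification
        rates.append(sum(labels) / n_months)             # share classified as altered
        latent = [y for y, z in zip(counts, labels) if not z]
        if latent:
            latent_means.append(sum(latent) / len(latent))
    avg_latent = sum(latent_means) / len(latent_means) if latent_means else float("nan")
    return sum(rates) / len(rates), avg_latent


avg_rate, avg_latent_mean = monte_carlo(n_reps=20, seed=42)
print(f"average estimated cheat rate: {avg_rate:.3f}")
print(f"average latent-series mean:   {avg_latent_mean:.1f}")
```

Because rates of 50 and 60 produce heavily overlapping components, the MAP classification can assign many honest months to the inflated component, which is one plausible way a gap between the set and estimated cheat rates can arise.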

References

  • Brody et al. (2022). The Effects of Cognitive Bias on Fraud Examiner Judgments and Decisions. Journal of Forensic Accounting Research, 7(1), 50-63.

  • Ioannidis, J. P. (2021). Over- and under-estimation of COVID-19 deaths. European Journal of Epidemiology, 36(6), 581-588.

  • Li, T., et al. (2003). Modeling Response Bias in Count: A Structural Approach with an Application to the National Crime Victimization Survey Data. Sociological Methods and Research, 31(4), 514-544.

  • Neubauer, G., Djuras, G., & Friedl, H. (2011). Models for Underreporting: A Bernoulli Sampling Approach for Reported Counts. Austrian Journal of Statistics, 40(1 & 2), 85-92.

  • Pararai, M., Famoye, F., & Lee, C. (2010). Generalized Poisson-Poisson Mixture Model for Misreported Counts with an Application to Smoking Data. Journal of Data Science, 8(4), 607-617.

  • Rodriguez, A. E., & Kucsma, K. (2023). Appraising Audit Error in Medicaid Audits. International Journal of Accounting and Financial Reporting, 13(3), 2162-3082.

  • Schennach, S. (2022). Measurement Systems. Journal of Economic Literature, 60(4), 1223-1263.

  • Scrucca, L., Fraley, C., Murphy, T. B., & Raftery, A. E. (2023). Model-Based Clustering, Classification, and Density Estimation Using mclust in R. Chapman and Hall/CRC.

  • Stamey, J. D., & Young, D. M. (2005). Maximum Likelihood Estimation for a Poisson Rate Parameter With Misclassified Counts. Australian & New Zealand Journal of Statistics, 47(2).

Thank you






arodriguez@newhaven.edu
Department of Economics & Business Analytics, Pompea College of Business, University of New Haven