Fraud Detection Model: A Machine Learning Case Study

Project Overview

This project demonstrates how machine learning can be applied to detect financial fraud. Using a dataset of ~200 companies’ financial statements from Kaggle, I developed a classification model that achieved 91% accuracy in identifying fraud. The goal was not only predictive performance but also interpretability, so that investigators could understand and trust the results.

Data Set

  • Size: ~200 companies’ financial statements, 32 features

  • Features included: transaction type, amount, balance data, origin/destination accounts, fraud label

  • Target variable: binary classification — fraudulent (1) vs. legitimate (0)

  • Preprocessing: handled null values, encoded categorical features (e.g., transaction type), and normalized numerical fields
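The preprocessing steps listed above can be sketched in plain Python. This is a minimal sketch, not the project's code. The column names (`transaction_type`, `amount`, `balance`, `Fraud`), the `"Yes"` label value, the toy records, the helper `preprocess`, and the choice of median imputation plus z-score scaling are all illustrative assumptions.

```python
from statistics import mean, median, pstdev


def preprocess(rows, numeric_cols, categorical_col, label_col, positive_label):
    """Impute, scale, and one-hot encode a list of dict records (hypothetical helper)."""
    # 1. Fill missing numeric values with the column median (assumed strategy).
    medians = {c: median(r[c] for r in rows if r[c] is not None) for c in numeric_cols}
    filled = [
        {**r, **{c: medians[c] if r[c] is None else r[c] for c in numeric_cols}}
        for r in rows
    ]
    # 2. Record mean and standard deviation per column for z-score scaling;
    #    a zero spread falls back to 1.0 to avoid division by zero.
    scale = {}
    for c in numeric_cols:
        values = [r[c] for r in filled]
        scale[c] = (mean(values), pstdev(values) or 1.0)
    # 3. Fixed, sorted category order so every row gets the same dummy columns.
    categories = sorted({r[categorical_col] for r in filled})
    X, y = [], []
    for r in filled:
        row = [(r[c] - scale[c][0]) / scale[c][1] for c in numeric_cols]
        row += [1.0 if r[categorical_col] == cat else 0.0 for cat in categories]
        X.append(row)
        # 4. Binary target: 1 = fraud, 0 = legitimate.
        y.append(1 if r[label_col] == positive_label else 0)
    return X, y, categories


# Hypothetical toy records; None marks a missing value.
records = [
    {"transaction_type": "TRANSFER", "amount": 1200.0, "balance": None, "Fraud": "Yes"},
    {"transaction_type": "PAYMENT", "amount": 80.0, "balance": 500.0, "Fraud": "No"},
    {"transaction_type": "CASH_OUT", "amount": None, "balance": 150.0, "Fraud": "Yes"},
    {"transaction_type": "PAYMENT", "amount": 40.0, "balance": 900.0, "Fraud": "No"},
]

X, y, categories = preprocess(records, ["amount", "balance"], "transaction_type", "Fraud", "Yes")
print(categories)  # ['CASH_OUT', 'PAYMENT', 'TRANSFER']
print(y)           # [1, 0, 1, 0]
```

In a real evaluation the medians and scaling statistics should be computed on the training split only and then applied to the test split, to avoid leaking information; scikit-learn's `SimpleImputer`, `StandardScaler`, and `OneHotEncoder` inside a `Pipeline` handle that automatically.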

Methodology

  • Data Preparation

    • Handled missing values and reformatted categorical fields.

    • Encoded the Fraud column as binary (1 = fraud, 0 = non-fraud).

    • Normalized numeric features to improve comparability.

  • Feature Engineering

    • Created dummy variables for categorical entries.

    • Examined correlations between financial ratios and fraud outcomes.

  • Modeling

    • Trained Logistic Regression, Decision Trees, and Random Forest classifiers.

    • Applied cross-validation to evaluate the stability of results across data splits.

    • Tuned the Random Forest's hyperparameters to maximize recall (i.e., to minimize false negatives).

  • Evaluation Metrics

    • Accuracy, Precision, Recall, F1 Score.

    • Confusion matrix to visualize false positives/negatives.

    • ROC curve and AUC to measure how well the model separates fraudulent from legitimate cases across classification thresholds.
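The modeling steps above (three classifiers compared under cross-validation, then a Random Forest tuned for recall) might look roughly like this in scikit-learn. This is a sketch under stated assumptions: the data comes from `make_classification` as a synthetic stand-in shaped like the described dataset, and the five stratified folds and the hyperparameter grid are hypothetical, since the project's actual folds and search space are not documented.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: about 200 rows, 32 features, balanced classes. NOT the Kaggle data.
X, y = make_classification(
    n_samples=200, n_features=32, n_informative=8, weights=[0.5, 0.5], random_state=0
)

# Stratified folds keep the fraud/legitimate ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    # Scaling sits inside the pipeline so it is fitted on each training fold only.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "recall"])
    results[name] = {m: scores[f"test_{m}"].mean() for m in ("accuracy", "recall")}
    # The spread of accuracy across folds is one way to gauge stability.
    print(f"{name}: accuracy {results[name]['accuracy']:.3f} "
          f"+/- {scores['test_accuracy'].std():.3f}, recall {results[name]['recall']:.3f}")

# Hypothetical grid; GridSearchCV keeps the configuration with the best mean CV recall.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, scoring="recall", cv=cv)
search.fit(X, y)
print("best params:", search.best_params_, "CV recall:", round(search.best_score_, 3))
```

Optimizing recall alone can push a model toward flagging too many legitimate cases, so it is worth tracking precision alongside it; `GridSearchCV` accepts a dict of scorers with `refit="recall"` for exactly that.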

Findings

  • Random Forest achieved ~91% accuracy, outperforming the Logistic Regression and Decision Tree baselines.

  • Key predictive features included debt ratios, liabilities, and profit measures.

  • Because the dataset was relatively balanced, accuracy was not inflated by a dominant legitimate class, and the model also achieved strong recall (a low rate of missed fraud cases).

  • Demonstrated that machine learning can add measurable value in fraud risk assessment, complementing traditional auditing methods.
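The accuracy and recall figures in these findings are derived from a confusion matrix. The plain-Python sketch below (hypothetical helper names and toy labels, not the project's results) computes the metrics listed under Evaluation Metrics, including AUC via its rank interpretation, and shows why class balance matters: on a 95/5 split, a model that never flags fraud still scores 95% accuracy with zero recall.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with 1 = fraud as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # set to 0 when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual fraud that was caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random fraud case outscores a random legitimate one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Imbalanced toy case: 5 fraud, 95 legitimate, and a model that never flags fraud.
lazy = classification_metrics([1] * 5 + [0] * 95, [0] * 100)
print(lazy)  # accuracy 0.95, but recall 0.0

# Small toy case: 3 fraud, 5 legitimate; one fraud missed, one false alarm.
print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for ~200 rows; for larger data a rank-based or library implementation such as scikit-learn's `roc_auc_score` is the usual choice.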

