Fraud Detection

Sep 24, 2018

Fraud case detection: customer-biker pairs with repeated successful bookings.

This is a multi-phase process:
1. Identify (customer, biker) pairs with a high repeat count, i.e. more than 30 bookings together.
2. For each pair, run a conditional probability analysis:
- P(cancel | different biker): what is the probability that the customer cancels a booking when a different biker accepts the task?
- P(cancel | same biker)
- P(success | different biker)
- P(success | same biker)

3. Based on the analysis, label the biker with a probability of fraud, P(fraud).

— —

1. Suspicious Pair Scanning

## Ultimately, we want our fraud system to mark each booking with fraud: 0 or 1.
## - input: {customer_id, biker_id, device_ids, location, time, ...}
## Who benefits from fraud? The BIKER. A biker needs customers to earn more
## money.
##
## The number of repeat bookings for a (customer, biker) pair alone
## is not strong enough evidence to mark it as a fraud case.
##
## Within these data, drill further into customer/biker behavior:
## - customer: look at the cancelling pattern. If he gets another biker, does he cancel?
##   (personally, if P(cancel | other biker) > 0.7, this is fraud)
## - biker: he can cheat and appear to do a good job at the same time: P(cancel | other customer)
# min(): the pair's earliest booking; a bare t.created_at is rejected under ONLY_FULL_GROUP_BY
select count(*) as `matched bookings`, t.customer_id, t.sp_id, min(t.created_at) as `since`, c.email as `customer_email`, c.full_name as `customer_fullname`, sp.email as `biker_email`
from gb_m_task t
# exclude internal gobike.asia accounts on both sides
inner join gb_m_sp sp on sp.sp_id = t.sp_id and sp.email not like '%gobike.asia'
inner join gb_m_customer c on c.customer_id = t.customer_id and c.email not like '%gobike.asia'
where t.sp_id is not null and t.task_type != 5
group by t.customer_id, t.sp_id
having count(*) > 30
order by count(*) desc
limit 2000;
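For prototyping the scan outside MySQL, here is a minimal Python sketch of the same filter, count and threshold logic. It assumes the task rows are already loaded as hypothetical `(customer_id, sp_id, task_type)` tuples; the joins and the internal `gobike.asia` email filter are left out, and `suspicious_pairs` is an illustrative name.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical row shape: (customer_id, sp_id, task_type); sp_id is None
# when no biker accepted the task.
Booking = Tuple[int, Optional[int], int]

EXCLUDED_TASK_TYPE = 5  # mirrors `task_type != 5` in the query
MIN_REPEATS = 30        # mirrors `having count(*) > 30`


def suspicious_pairs(
    bookings: Iterable[Booking], threshold: int = MIN_REPEATS, limit: int = 2000
) -> List[Tuple[Tuple[int, int], int]]:
    """Return ((customer_id, sp_id), count) for pairs seen more than
    `threshold` times, most frequent first, capped at `limit` rows."""
    counts = Counter(
        (customer_id, sp_id)
        for customer_id, sp_id, task_type in bookings
        if sp_id is not None and task_type != EXCLUDED_TASK_TYPE
    )
    return [(pair, n) for pair, n in counts.most_common() if n > threshold][:limit]
```

As with the SQL, the count only nominates candidate pairs; the next phase decides whether they look fraudulent.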

2. Conditional Probabilistic Analysis

# case study 1, biker: 1041584, customer: 1005218
# case study 2, biker: 1049917, customer: 1020884 (confirmed cheater)
# ob = other bikers, sb = same biker; stage 5 = cancelled, stage 6 = success
SET @SP_ID := 1045597;
SET @CUST_ID := 1011316;
SELECT
# cancel_ob_count/ob_count
sum(case when stage=5 and sp_id != @SP_ID then 1 else 0 end) /sum(case when sp_id != @SP_ID then 1 else 0 end) as `P(cancel | different biker)`,
# success_ob_count/ob_count
sum(case when stage=6 and sp_id != @SP_ID then 1 else 0 end) /sum(case when sp_id != @SP_ID then 1 else 0 end) as `P(success | different biker)`,
# cancel_sb_count/sb_count
sum(case when stage=5 and sp_id = @SP_ID then 1 else 0 end) /sum(case when sp_id = @SP_ID then 1 else 0 end) as `P(cancel | same biker)`,
# success_sb_count/sb_count
sum(case when stage=6 and sp_id= @SP_ID then 1 else 0 end) /sum(case when sp_id = @SP_ID then 1 else 0 end) as `P(success | same biker)`,
sum(case when stage=6 and sp_id= @SP_ID then 1 else 0 end) as `success_sb_count`,
sum(case when stage=6 and sp_id != @SP_ID then 1 else 0 end) as `success_ob_count`,
sum(case when stage=5 and sp_id = @SP_ID then 1 else 0 end) as `cancel_sb_count`,
sum(case when stage=5 and sp_id != @SP_ID then 1 else 0 end) as `cancel_ob_count`,
sum(case when sp_id != @SP_ID then 1 else 0 end) as `ob_count`,
sum(case when sp_id = @SP_ID then 1 else 0 end) as `sb_count`
FROM gb_m_task
WHERE customer_id=@CUST_ID
AND sp_id IS NOT NULL;
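The same four ratios can be computed in Python from one customer's task history, which makes the definitions easy to unit-test. This is a sketch, assuming tasks arrive as hypothetical `(sp_id, stage)` tuples with no null `sp_id` (matching `sp_id IS NOT NULL`), and that stage 5 means cancelled and stage 6 means completed, as the query's aliases suggest. A ratio with an empty denominator comes back as `None`, mirroring MySQL's `NULL` for division by zero.

```python
from typing import Dict, Iterable, Optional, Tuple

CANCELLED = 5  # stage=5 in the query
SUCCESS = 6    # stage=6 in the query


def _ratio(num: int, den: int) -> Optional[float]:
    # MySQL yields NULL for x / 0; None plays that role here.
    return num / den if den else None


def pair_probabilities(
    tasks: Iterable[Tuple[int, int]], sp_id: int
) -> Dict[str, Optional[float]]:
    """tasks: (sp_id, stage) for every task of one customer."""
    sb = ob = cancel_sb = cancel_ob = success_sb = success_ob = 0
    for task_sp_id, stage in tasks:
        if task_sp_id == sp_id:  # same biker (sb)
            sb += 1
            cancel_sb += int(stage == CANCELLED)
            success_sb += int(stage == SUCCESS)
        else:  # other bikers (ob)
            ob += 1
            cancel_ob += int(stage == CANCELLED)
            success_ob += int(stage == SUCCESS)
    return {
        "P(cancel | different biker)": _ratio(cancel_ob, ob),
        "P(success | different biker)": _ratio(success_ob, ob),
        "P(cancel | same biker)": _ratio(cancel_sb, sb),
        "P(success | same biker)": _ratio(success_sb, sb),
    }
```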

Given a suspected (customer, biker) pair, find the probability that the customer cancels a task when it is accepted by a different biker. I'm using conditional probability here:

P(cancel | other bikers) = ?

If this is high, we can classify the pair as a fraud case.
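The comments in the pair-scanning query suggest a concrete cut-off: P(cancel | other biker) > 0.7. A minimal sketch of that rule follows. The 0.7 default comes from that comment and should be treated as a tunable assumption, and `label_pair` is a hypothetical name. A customer who has never booked with another biker gives no evidence either way, so the sketch returns `None` instead of guessing.

```python
from typing import Optional

# Default cut-off from the author's note: "if P(cancelling | !biker) > 0.7, this is fraud".
CANCEL_OTHER_THRESHOLD = 0.7


def label_pair(
    p_cancel_other: Optional[float], threshold: float = CANCEL_OTHER_THRESHOLD
) -> Optional[bool]:
    """True = suspected fraud, False = not suspected,
    None = no bookings with other bikers, so nothing to judge."""
    if p_cancel_other is None:
        return None
    return p_cancel_other > threshold


# Case study 1 reports P(cancel | other bikers) = 100%.
print(label_pair(1.0))  # True
```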

The results in the picture were calculated for this pair:
- customer: 1005218
- biker: 1041584

You can see that:
P(cancel | other bikers) = 100%
P(cancel | same biker) = 0.2%
P(success | same biker) = 99.73%
P(success | other bikers) = 0%

  • To minimise false positives (marked as fraud but it is not), we need to drill into the details.
  • To minimise false negatives (marked as non-fraud but it is indeed fraud), we need more variables.

False negatives include:

  • injection (what DiDi calls `打针`, literally "giving an injection").
  • teleport.
  • GPS driving simulation.
  • different customers with the same biker (the same customer with the same biker is easy to detect).

Our existing biker selection algorithm is nearest wins: the nearest biker gets the task. This is an open exploit that lets bikers cheat easily.
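To see why nearest wins is exploitable, here is a hypothetical sketch of such a selector; the post does not describe the real implementation. It assumes each biker's location is whatever their phone reports, and ranks bikers by great-circle (haversine) distance to the pick-up point. Because the ranking trusts reported coordinates, a biker who teleports or simulates GPS onto the pick-up point always wins. The coordinates below are made up for illustration.

```python
import math
from typing import Dict, Tuple

LatLng = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLng, b: LatLng) -> float:
    """Great-circle distance between two points, in km."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_biker(pickup: LatLng, reported: Dict[int, LatLng]) -> int:
    """Nearest wins: the sp_id whose *reported* location is closest to pick-up."""
    return min(reported, key=lambda sp_id: haversine_km(pickup, reported[sp_id]))


pickup = (13.7563, 100.5018)  # hypothetical pick-up point
reported = {
    1: (13.7600, 100.5100),  # honest biker, roughly 1 km away
    2: (13.7563, 100.5018),  # spoofed GPS placed exactly on the pick-up point
}
print(nearest_biker(pickup, reported))  # 2
```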

I had to stop here and go for a quicker solution.

In summary, gobike-motion detects fraudulent *COMPLETED* tasks based on 3 criteria:

1. Has the biker reached the pick-up point?
2. Has the biker reached the drop-off point?
3. Has the biker travelled the entire journey (from pick-up to drop-off)?

Note: fraud is positive if any of these checks fails.
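The three criteria can be sketched as checks over a task's recorded GPS trace. This is not gobike-motion's code: the 0.2 km arrival radius and the rule that the travelled distance must reach at least half of the estimated distance are hypothetical thresholds, and `fraud_reasons` is an illustrative name. The reason strings are copied from the API responses shown later in this post; fraud is positive whenever the returned list is non-empty.

```python
import math
from typing import List, Sequence, Tuple

LatLng = Tuple[float, float]  # (latitude, longitude) in degrees

ARRIVAL_RADIUS_KM = 0.2   # hypothetical: "reached" means within 200 m
MIN_JOURNEY_RATIO = 0.5   # hypothetical: must travel >= 50% of the estimate


def haversine_km(a: LatLng, b: LatLng) -> float:
    """Great-circle distance between two points, in km."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def fraud_reasons(track: Sequence[LatLng], pickup: LatLng, dropoff: LatLng,
                  estimated_km: float) -> List[str]:
    """Return the failed checks for one completed task; empty means no fraud."""
    reasons = []
    # 1. Has the biker reached the pick-up point?
    if not any(haversine_km(p, pickup) <= ARRIVAL_RADIUS_KM for p in track):
        reasons.append("Fail to reach pick-up point")
    # 2. Has the biker reached the drop-off point?
    if not any(haversine_km(p, dropoff) <= ARRIVAL_RADIUS_KM for p in track):
        reasons.append("Fail to reach drop-off point")
    # 3. Has the biker travelled the journey? Compare trace length with the estimate.
    travelled_km = sum(haversine_km(p, q) for p, q in zip(track, track[1:]))
    if travelled_km < MIN_JOURNEY_RATIO * estimated_km:
        reasons.append("Fail to travel entire journey")
    return reasons
```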

— -

General test steps:

  1. Create and complete a task as usual (you might need a fake-GPS tool to travel as the biker).
  2. In Postman, under the collection ‘gobike-motion (fraud radar)’, open the API ‘check- fraudulent detection by task’.
  3. Take the task ID from step 1, substitute it into the request, and fire the API.
  4. On success, you should see one of the following 2 responses:

Fraud is Positive (biker cheats)

{
  "data": {
    "id": 119447, // taskId
    "spId": 2016768,
    "customerId": 1037826,
    "actual": {
      "distance": {
        "text": "0.0011 km",
        "value": 0.0011227540394162098
      }
    },
    "estimation": {
      "distance": {
        "text": "3.1800 km",
        "value": 3.18
      }
    },
    "fraud": {
      "positive": true,
      "reasons": [
        "Fail to reach pick-up point",
        "Fail to reach drop-off point",
        "Fail to travel entire journey"
      ]
    }
  }
}

Fraud is Negative (no cheat)


{
  "data": {
    "id": 191927,
    "spId": 1046174,
    "customerId": 1030150,
    "actual": {
      "distance": {
        "text": "6.6836 km",
        "value": 6.6836433676888864
      }
    },
    "estimation": {
      "distance": {
        "text": "4.3100 km",
        "value": 4.31
      }
    },
    "fraud": {
      "positive": false
    }
  }
}
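When running many tasks through the API, a small helper can reduce each response to a one-line verdict. This is a minimal sketch, assuming the response body has exactly the fields shown above. Note that `reasons` is absent when fraud is negative, and that a strict JSON parser rejects the `// taskId` annotation in the first sample, so treat that annotation as documentation only. `summarise` and its `distance_ratio` field (actual over estimated distance) are hypothetical additions, not part of the API.

```python
import json
from typing import Any, Dict


def summarise(response_text: str) -> Dict[str, Any]:
    """Condense a fraud-radar response into the fields a tester checks."""
    data = json.loads(response_text)["data"]
    actual_km = data["actual"]["distance"]["value"]
    estimated_km = data["estimation"]["distance"]["value"]
    return {
        "task_id": data["id"],
        "biker_id": data["spId"],
        "customer_id": data["customerId"],
        "fraud": data["fraud"]["positive"],
        # "reasons" only appears when fraud is positive
        "reasons": data["fraud"].get("reasons", []),
        "distance_ratio": actual_km / estimated_km if estimated_km else None,
    }
```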


Written by .

Engineer & runner based in Bangkok, Thailand.
