Fraud Detection

Sep 24, 2018

Fraud case detection: customer-biker pairs with repeated successful bookings.

This is a multi-phase process:
1. Identify (customer, biker) pairs with high repeat counts, i.e. more than 30 bookings together.
2. For each pair, run a conditional-probability analysis:
   - P(cancel | different biker), i.e. what is the probability that the customer cancels a booking when a different biker accepts the task?
   - P(cancel | same biker)
   - P(success | different biker)
   - P(success | same biker)
3. Based on the analysis, label the biker with a probability of fraud, P(fraud).
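For a suspected pair (customer c, biker b), the four quantities in step 2 are relative frequencies over that customer's bookings. Only bookings with an assigned biker are counted. In the sketch below, O is the set of the customer's bookings taken by other bikers and S is the set taken by b. The stage codes come from the query in section 2, where 5 means cancelled and 6 means success. The set notation is introduced here for illustration.

```latex
\begin{aligned}
O &= \{\, t : \mathrm{cust}(t) = c,\ \mathrm{biker}(t) \neq b \,\}, \qquad
S = \{\, t : \mathrm{cust}(t) = c,\ \mathrm{biker}(t) = b \,\} \\
P(\text{cancel} \mid \text{different biker}) &= \frac{|\{t \in O : \mathrm{stage}(t) = 5\}|}{|O|}, \qquad
P(\text{success} \mid \text{different biker}) = \frac{|\{t \in O : \mathrm{stage}(t) = 6\}|}{|O|} \\
P(\text{cancel} \mid \text{same biker}) &= \frac{|\{t \in S : \mathrm{stage}(t) = 5\}|}{|S|}, \qquad
P(\text{success} \mid \text{same biker}) = \frac{|\{t \in S : \mathrm{stage}(t) = 6\}|}{|S|}
\end{aligned}
```

Bookings can end in stages other than 5 or 6. Because of that, P(cancel | ·) + P(success | ·) need not sum to exactly 1.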

---

1. Suspicious Pair Scanning

```sql
## Ultimately, we want our fraud system to mark each booking as fraud: 0 or 1.
## Input: {customer_id, biker_id, device_ids, location, time, etc.}
## Who benefits from the fraud? The BIKER: a biker needs customers to earn more money.
##
## Relying solely on the number of repeat bookings (same customer-biker pair)
## is not strong enough to mark a case as fraud.
##
## Within this data, drill further into customer/biker behavior:
## - customer: check the cancelling pattern. If he gets another biker, will he cancel?
##   (Personally, if P(cancelling | !biker) > 0.7, this is fraud.)
## - biker: he can cheat and pretend to do a good job at the same time: P(cancelling | !customer).
select count(*) as `matched bookings`,
       t.customer_id,
       t.sp_id,
       min(t.created_at) as `since`,  # first booking of the pair
       c.email as `customer_email`,
       c.full_name as `customer_fullname`,
       sp.email as `biker_email`
from gb_m_task t
inner join gb_m_sp sp on sp.sp_id = t.sp_id and sp.email not like '%gobike.asia'
inner join gb_m_customer c on c.customer_id = t.customer_id and c.email not like '%gobike.asia'
where t.sp_id is not null and t.task_type != 5
group by t.customer_id, t.sp_id, c.email, c.full_name, sp.email
having count(*) > 30
order by count(*) desc
limit 2000;
```
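The pair scan can also be mirrored in Python for offline experiments, for example on an export of gb_m_task. This is a minimal sketch that assumes the rows are already joined with the customer and biker emails. `TaskRow` and `suspicious_pairs` are hypothetical names, not part of the existing system.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple, Optional

INTERNAL_DOMAIN = "gobike.asia"  # mirrors: email not like '%gobike.asia'
EXCLUDED_TASK_TYPE = 5           # mirrors: task_type != 5


class TaskRow(NamedTuple):
    """Hypothetical flattened row: gb_m_task joined with both email columns."""
    customer_id: int
    sp_id: Optional[int]
    task_type: int
    customer_email: str
    biker_email: str


def is_internal(email: str) -> bool:
    # MySQL LIKE is usually case-insensitive, so compare in lower case.
    return email.lower().endswith(INTERNAL_DOMAIN)


def suspicious_pairs(rows: Iterable[TaskRow], min_count: int = 30,
                     limit: int = 2000) -> list[tuple[tuple[int, int], int]]:
    """Return ((customer_id, sp_id), count) with count > min_count, most frequent first."""
    counts = Counter(
        (r.customer_id, r.sp_id)
        for r in rows
        if r.sp_id is not None
        and r.task_type != EXCLUDED_TASK_TYPE
        and not is_internal(r.customer_email)
        and not is_internal(r.biker_email)
    )
    hits = [(pair, n) for pair, n in counts.items() if n > min_count]  # HAVING count(*) > 30
    hits.sort(key=lambda item: item[1], reverse=True)                  # ORDER BY count(*) DESC
    return hits[:limit]                                                # LIMIT 2000
```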

2. Conditional Probabilistic Analysis

```sql
# case study 1, biker: 1041584, customer: 1005218
# case study 2, biker: 1049917, customer: 1020884 (confirmed cheater)
SET @SP_ID := 1045597;
SET @CUST_ID := 1011316;

# ob = bookings taken by other bikers, sb = bookings taken by the same biker
# stage 5 = cancelled, stage 6 = success
SELECT
  # cancel_ob_count / ob_count
  sum(case when stage = 5 and sp_id != @SP_ID then 1 else 0 end)
    / sum(case when sp_id != @SP_ID then 1 else 0 end) as `P(cancel | different biker)`,
  # success_ob_count / ob_count
  sum(case when stage = 6 and sp_id != @SP_ID then 1 else 0 end)
    / sum(case when sp_id != @SP_ID then 1 else 0 end) as `P(success | different biker)`,
  # cancel_sb_count / sb_count
  sum(case when stage = 5 and sp_id = @SP_ID then 1 else 0 end)
    / sum(case when sp_id = @SP_ID then 1 else 0 end) as `P(cancel | same biker)`,
  # success_sb_count / sb_count
  sum(case when stage = 6 and sp_id = @SP_ID then 1 else 0 end)
    / sum(case when sp_id = @SP_ID then 1 else 0 end) as `P(success | same biker)`,
  # raw counts behind the ratios
  sum(case when stage = 6 and sp_id = @SP_ID then 1 else 0 end) as `success_sb_count`,
  sum(case when stage = 6 and sp_id != @SP_ID then 1 else 0 end) as `success_ob_count`,
  sum(case when stage = 5 and sp_id = @SP_ID then 1 else 0 end) as `cancel_sb_count`,
  sum(case when stage = 5 and sp_id != @SP_ID then 1 else 0 end) as `cancel_ob_count`,
  sum(case when sp_id != @SP_ID then 1 else 0 end) as `ob_count`,
  sum(case when sp_id = @SP_ID then 1 else 0 end) as `sb_count`
FROM gb_m_task
WHERE customer_id = @CUST_ID
  AND sp_id IS NOT NULL;
```
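The same four conditionals can be computed outside MySQL. This minimal sketch assumes bookings are (customer_id, sp_id, stage) records. It reads stage 5 as cancelled and stage 6 as success, as the query's comments imply. `Booking` and `pair_probabilities` are hypothetical names. As in the SQL, a ratio with a zero denominator comes back empty: None here, NULL in MySQL.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple, Optional

CANCELLED = 5  # stage code counted as a cancel in the query above
SUCCESS = 6    # stage code counted as a success in the query above


class Booking(NamedTuple):
    customer_id: int
    sp_id: Optional[int]
    stage: int


def _ratio(num: int, den: int) -> Optional[float]:
    # MySQL yields NULL on division by zero; mirror that with None.
    return num / den if den else None


def pair_probabilities(bookings: Iterable[Booking], cust_id: int,
                       sp_id: int) -> dict[str, Optional[float]]:
    """Same four conditionals as the SQL, for one (customer, biker) pair."""
    sb = ob = cancel_sb = cancel_ob = success_sb = success_ob = 0
    for b in bookings:
        if b.customer_id != cust_id or b.sp_id is None:
            continue  # WHERE customer_id = @CUST_ID AND sp_id IS NOT NULL
        if b.sp_id == sp_id:
            sb += 1
            cancel_sb += b.stage == CANCELLED  # bool adds as 0 or 1
            success_sb += b.stage == SUCCESS
        else:
            ob += 1
            cancel_ob += b.stage == CANCELLED
            success_ob += b.stage == SUCCESS
    return {
        "P(cancel | different biker)": _ratio(cancel_ob, ob),
        "P(success | different biker)": _ratio(success_ob, ob),
        "P(cancel | same biker)": _ratio(cancel_sb, sb),
        "P(success | same biker)": _ratio(success_sb, sb),
    }
```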

Given a suspected pair (customer, biker), find the probability that the customer cancels a task when a different biker takes it. I'm using conditional probability here:

P(cancel | other bikers) = ?

If this is high, we can confidently classify it as a fraud case.

The numbers in the picture were calculated for this pair:
- customer: 1005218
- biker: 1041584

You can see that:
- P(cancel | other bikers) = 100%
- P(cancel | same biker) = 0.2%
- P(success | same biker) = 99.73%
- P(success | other bikers) = 0%
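The text does not spell out how P(fraud) in step 3 is computed. A binary label could combine the two thresholds that do appear in this post. The first is the "more than 30 bookings" cut from the pair scan. The second is the author's personal P(cancelling | !biker) > 0.7 rule from the notes in the first query. This is a minimal sketch only. `label_pair` and its constants are hypothetical, not the production rule.

```python
from __future__ import annotations

from typing import Optional

MIN_PAIR_BOOKINGS = 30        # pair-scan threshold: count(*) > 30
CANCEL_OTHER_THRESHOLD = 0.7  # author's rule of thumb: P(cancel | other biker) > 0.7


def label_pair(pair_bookings: int, p_cancel_other: Optional[float]) -> int:
    """Return 1 (fraud) or 0, the binary label the notes in the first query aim for."""
    if pair_bookings <= MIN_PAIR_BOOKINGS:
        return 0  # not a suspicious pair in the first place
    if p_cancel_other is None:
        return 0  # customer never booked another biker: no evidence, so do not flag
    return int(p_cancel_other > CANCEL_OTHER_THRESHOLD)
```

For the pair above, P(cancel | other bikers) = 100%. This rule would return 1, provided the pair passed the > 30 scan.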

To minimise false positives (marked as fraud, but it's not), we need to drill into the details.
To minimise false negatives (marked as non-fraud, but it is indeed fraud), we need more variables.

False negatives include:

- Injection (quoting DiDi's term `打针`, "injection").
- Teleport.
- GPS driving simulation.
- Different customers, same biker (the same customer with the same biker is easy to detect).
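Of the cases above, teleport is the easiest to illustrate. It shows up as consecutive GPS fixes that are too far apart for the time between them. This minimal sketch assumes the trip is available as timestamped (lat, lon) fixes and uses great-circle distance. `Fix`, `teleport_jumps` and the 120 km/h cap are hypothetical. The sketch does not address injection or GPS simulation, which need other signals.

```python
import math
from typing import List, NamedTuple

MAX_SPEED_KMH = 120.0  # hypothetical speed cap for a motorbike; tune on real data


class Fix(NamedTuple):
    ts: float   # unix time, seconds
    lat: float
    lon: float


def haversine_km(a: Fix, b: Fix) -> float:
    """Great-circle distance in km between two fixes."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def teleport_jumps(trace: List[Fix], max_speed_kmh: float = MAX_SPEED_KMH) -> List[int]:
    """Indices i where moving from trace[i-1] to trace[i] needs an implausible speed."""
    jumps = []
    for i in range(1, len(trace)):
        prev, cur = trace[i - 1], trace[i]
        dist_km = haversine_km(prev, cur)
        hours = (cur.ts - prev.ts) / 3600.0
        if hours <= 0:
            if dist_km > 0:
                jumps.append(i)  # moved with no time elapsed
        elif dist_km / hours > max_speed_kmh:
            jumps.append(i)
    return jumps
```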

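In the biker selection algorithm we currently have, the nearest biker wins. This is an open exploit that lets bikers cheat easily, since a biker who fakes his position (e.g. with GPS simulation) can make himself the nearest.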
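Nearest-wins looks only at the position each biker reports, so whoever controls their GPS controls the outcome. This minimal sketch assumes positions are (lat, lon) pairs compared by great-circle distance. `pick_nearest` is hypothetical, not the production algorithm.

```python
import math
from typing import Dict, Tuple

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_nearest(customer: LatLon, bikers: Dict[int, LatLon]) -> int:
    """Nearest-wins: the biker whose *reported* position is closest gets the task."""
    return min(bikers, key=lambda sp_id: haversine_km(customer, bikers[sp_id]))
```

A biker who reports a spoofed position on top of the customer gets distance 0 and wins every time. Nothing in this rule can tell that position apart from a real one.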

We had to stop here in favour of a quicker solution.

In summary, gobike-motion detects fraudulent *COMPLETED* tasks based on 3 criteria:

1. Has biker reached pick-up point?
2. Has biker reached drop-off point?
3. Has biker travelled the entire journey (from pick-up to drop-off)?

Note: fraud is positive if any of them fails.
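The three criteria can be sketched as follows. The sketch assumes the task's GPS trace is a list of (lat, lon) points and that "reached" means coming within a fixed radius. It also assumes "entire journey" means the travelled distance covers most of the estimated distance. The radius, the 0.8 ratio and `check_task` are all hypothetical, and gobike-motion's real thresholds are not given here. The reason strings are copied from the API responses shown later in this post.

```python
import math
from typing import Dict, List, Tuple, Union

LatLon = Tuple[float, float]
REACH_RADIUS_KM = 0.2    # hypothetical: how close counts as "reached"
MIN_JOURNEY_RATIO = 0.8  # hypothetical: actual / estimated distance required


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def check_task(trace: List[LatLon], pickup: LatLon, dropoff: LatLon,
               estimated_km: float) -> Dict[str, Union[bool, List[str]]]:
    """Apply the three criteria to a completed task's GPS trace."""
    reasons: List[str] = []
    if not any(haversine_km(p, pickup) <= REACH_RADIUS_KM for p in trace):
        reasons.append("Fail to reach pick-up point")
    if not any(haversine_km(p, dropoff) <= REACH_RADIUS_KM for p in trace):
        reasons.append("Fail to reach drop-off point")
    actual_km = sum(haversine_km(a, b) for a, b in zip(trace, trace[1:]))
    if actual_km < MIN_JOURNEY_RATIO * estimated_km:
        reasons.append("Fail to travel entire journey")
    verdict: Dict[str, Union[bool, List[str]]] = {"positive": bool(reasons)}  # any failure => positive
    if reasons:
        verdict["reasons"] = reasons
    return verdict
```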

---

General test steps:

1. Create and complete a task as usual (you might need a fake-GPS tool to travel as the biker).
2. In Postman, under the collection 'gobike-motion (fraud radar)', open the API 'check- fraudulent detection by task'.
3. Take the task ID from step 1, substitute it into the request, and fire the API.
4. On success, you should see one of the following 2 outcomes:

Fraud is Positive (biker cheats)

```
{
  "data": {
    "id": 119447, // taskId
    "spId": 2016768,
    "customerId": 1037826,
    "actual": {
      "distance": {
        "text": "0.0011 km",
        "value": 0.0011227540394162098
      }
    },
    "estimation": {
      "distance": {
        "text": "3.1800 km",
        "value": 3.18
      }
    },
    "fraud": {
      "positive": true,
      "reasons": [
        "Fail to reach pick-up point",
        "Fail to reach drop-off point",
        "Fail to travel entire journey"
      ]
    }
  }
}
```

Fraud is Negative (no cheat)

```
{
  "data": {
    "id": 191927,
    "spId": 1046174,
    "customerId": 1030150,
    "actual": {
      "distance": {
        "text": "6.6836 km",
        "value": 6.6836433676888864
      }
    },
    "estimation": {
      "distance": {
        "text": "4.3100 km",
        "value": 4.31
      }
    },
    "fraud": {
      "positive": false
    }
  }
}
```
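Step 4 can be checked in a script instead of by eye in Postman. This minimal sketch reads a response body with the shape shown above. It assumes the live API returns plain JSON without the `// taskId` annotation, which is not valid JSON. `FraudVerdict` and `parse_fraud_response` are hypothetical names.

```python
import json
from typing import List, NamedTuple


class FraudVerdict(NamedTuple):
    task_id: int
    positive: bool
    reasons: List[str]
    actual_km: float
    estimated_km: float


def parse_fraud_response(body: str) -> FraudVerdict:
    """Pull the verdict and both distances out of a 'check- fraudulent detection by task' response."""
    data = json.loads(body)["data"]
    fraud = data["fraud"]
    return FraudVerdict(
        task_id=data["id"],
        positive=fraud["positive"],
        reasons=fraud.get("reasons", []),  # negative responses carry no "reasons" key
        actual_km=data["actual"]["distance"]["value"],
        estimated_km=data["estimation"]["distance"]["value"],
    )
```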