Fraud case detection: repeated successful bookings between a customer-biker pair.
This is a multi-phase process:
1. Identify (customer, biker) pairs with a high number of repeat bookings, e.g. more than 30.
2. For each pair, run a conditional probability analysis:
- P(cancel | different biker), i.e. what is the probability that the customer cancels a booking when a different biker accepts the task?
- P(cancel | same biker)
- P(success | different biker)
- P(success | same biker)
3. Based on the analysis, label the biker with a probability of fraud, P(fraud).
— —
1. Suspicious Pair Scanning
## Ultimately, we want our fraud system to mark a booking as fraud: 0 or 1.
## - input: {customer_id, biker_id, device_ids, location, ..., time}
##
## Who is the beneficiary of the fraud? The BIKER. A biker needs a customer
## to earn more money.
##
## Relying solely on the number of repeat bookings (customer-biker)
## is not strong enough to mark a pair as a fraud case.
##
## Within this data, drill down further into customer/biker behaviour:
## - customer: look at the cancelling pattern. If he gets a different biker, will he cancel?
##   (Personally, if P(cancel | different biker) > 0.7, I treat this as fraud.)
## - biker: he can cheat and pretend to do a good job at the same time.
##   Check P(cancel | different customer).
select count(*) as `matched bookings`,
       t.customer_id,
       t.sp_id,
       min(t.created_at) as `since`,  # earliest booking of the pair
       c.email as `customer_email`,
       c.full_name as `customer_fullname`,
       sp.email as `biker_email`
from gb_m_task t
inner join gb_m_sp sp on sp.sp_id = t.sp_id and sp.email not like '%gobike.asia'
inner join gb_m_customer c on c.customer_id = t.customer_id and c.email not like '%gobike.asia'
where t.sp_id is not null and t.task_type != 5
group by t.customer_id, t.sp_id
having count(*) > 30
order by count(*) desc
limit 2000;
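The scan above boils down to counting bookings per (customer, biker) pair and keeping the pairs over a threshold. Here is a minimal Python sketch of that idea, assuming tasks are available as dictionaries with `customer_id` and `sp_id` keys (the column names from the query). The function name `find_suspicious_pairs` and the sample rows are hypothetical, and the email and `task_type` filters are left out for brevity.

```python
from collections import Counter


def find_suspicious_pairs(tasks, min_count=30):
    """Return [((customer_id, sp_id), count), ...] for pairs with more than
    min_count bookings, most frequent first (mirrors HAVING count(*) > 30)."""
    counts = Counter(
        (t["customer_id"], t["sp_id"])
        for t in tasks
        if t["sp_id"] is not None  # skip tasks that no biker accepted
    )
    return [(pair, n) for pair, n in counts.most_common() if n > min_count]


# Hypothetical data: one pair with 31 bookings, one with 2, one unassigned task.
tasks = (
    [{"customer_id": 1, "sp_id": 10}] * 31
    + [{"customer_id": 2, "sp_id": 10}] * 2
    + [{"customer_id": 3, "sp_id": None}]
)
suspicious = find_suspicious_pairs(tasks)
print(suspicious)  # [((1, 10), 31)]
```

Only the pair with 31 bookings survives the threshold; raising `min_count` to 31 would return an empty list.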
2. Conditional Probabilistic Analysis
# case study 1, biker: 1041584, customer: 1005218
# case study 2, biker: 1049917, customer: 1020884 (confirmed cheater)
SET @SP_ID := 1045597;
SET @CUST_ID := 1011316;

# stage = 5 counts as cancelled, stage = 6 as success
# sb = same biker (@SP_ID), ob = other bikers
SELECT
  # cancel_ob_count / ob_count
  sum(case when stage = 5 and sp_id != @SP_ID then 1 else 0 end) / sum(case when sp_id != @SP_ID then 1 else 0 end) as `P(cancel | different biker)`,
  # success_ob_count / ob_count
  sum(case when stage = 6 and sp_id != @SP_ID then 1 else 0 end) / sum(case when sp_id != @SP_ID then 1 else 0 end) as `P(success | different biker)`,
  # cancel_sb_count / sb_count
  sum(case when stage = 5 and sp_id = @SP_ID then 1 else 0 end) / sum(case when sp_id = @SP_ID then 1 else 0 end) as `P(cancel | same biker)`,
  # success_sb_count / sb_count
  sum(case when stage = 6 and sp_id = @SP_ID then 1 else 0 end) / sum(case when sp_id = @SP_ID then 1 else 0 end) as `P(success | same biker)`,
  sum(case when stage = 6 and sp_id = @SP_ID then 1 else 0 end) as `success_sb_count`,
  sum(case when stage = 6 and sp_id != @SP_ID then 1 else 0 end) as `success_ob_count`,
  sum(case when stage = 5 and sp_id = @SP_ID then 1 else 0 end) as `cancel_sb_count`,
  sum(case when stage = 5 and sp_id != @SP_ID then 1 else 0 end) as `cancel_ob_count`,
  sum(case when sp_id != @SP_ID then 1 else 0 end) as `ob_count`,
  sum(case when sp_id = @SP_ID then 1 else 0 end) as `sb_count`
FROM gb_m_task
WHERE customer_id = @CUST_ID
  AND sp_id IS NOT NULL;
Given a suspected (customer, biker) pair, find the probability that the customer cancels a task when a different biker takes it. I'm using conditional probability here.
P(cancel | other bikers) = ?
If this is high, we can confidently classify the pair as a fraud case.
The figures in the picture were calculated for this pair:
- customer: 1005218
- biker: 1041584
You can see that:
P(cancel | other bikers) = 100%
P(cancel | same biker) = 0.2%
P(success | same biker) = 99.73%
P(success | other bikers) = 0%
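The four conditional probabilities are ratios of counts over one customer's tasks, split by whether the suspected biker took the task. Here is a minimal Python sketch of that calculation, assuming each task is a dictionary with `sp_id` and `stage` keys and that stage 5 means cancelled and stage 6 means success, as in the query. The function names and the toy data are hypothetical, and the 0.7 cut-off is the personal rule of thumb mentioned earlier, not a validated threshold.

```python
CANCELLED, SUCCESS = 5, 6  # stage codes as used in the SQL query


def conditional_rates(tasks, sp_id):
    """Compute cancel/success rates for one customer's tasks, split by
    whether the suspected biker (sp_id) or another biker took the task."""
    same = [t for t in tasks if t["sp_id"] == sp_id]
    other = [t for t in tasks if t["sp_id"] is not None and t["sp_id"] != sp_id]

    def rate(group, stage):
        # The rate is undefined when the group is empty.
        if not group:
            return None
        return sum(1 for t in group if t["stage"] == stage) / len(group)

    return {
        "P(cancel | different biker)": rate(other, CANCELLED),
        "P(success | different biker)": rate(other, SUCCESS),
        "P(cancel | same biker)": rate(same, CANCELLED),
        "P(success | same biker)": rate(same, SUCCESS),
    }


def looks_fraudulent(rates, threshold=0.7):
    """Flag the pair when the customer mostly cancels on other bikers."""
    p = rates["P(cancel | different biker)"]
    return p is not None and p > threshold


# Toy data (hypothetical): 8 successes with biker 10, 2 cancellations with others.
tasks = [{"sp_id": 10, "stage": SUCCESS}] * 8 + [
    {"sp_id": 11, "stage": CANCELLED},
    {"sp_id": 12, "stage": CANCELLED},
]
rates = conditional_rates(tasks, sp_id=10)
print(rates["P(cancel | different biker)"])  # 1.0
print(looks_fraudulent(rates))  # True
```

Returning `None` for an empty group keeps a customer who has only ever ridden with one biker from being flagged on a division by zero.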
- To minimise false positives (marked as fraud when it is not), we need to drill into the details.
- To minimise false negatives (marked as non-fraud when it is in fact fraud), we need more variables.
False negatives include:
- injection (DiDi's term: `打针`, literally "getting an injection").
- teleport.
- GPS driving simulation.
- different customers, same biker (whereas same customer, same biker is easy to detect).
The existing biker selection algorithm we implemented is "nearest wins". This is an open exploit that lets bikers cheat easily.
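To make the exploit concrete, here is a minimal Python sketch of a nearest-wins selection, assuming each biker reports a (lat, lon) position and that distance is measured as great-circle (haversine) distance. The function names, biker ids, and coordinates are hypothetical; this is not the production algorithm.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_nearest(pickup, bikers):
    """Return the id of the biker whose reported position is closest to pickup."""
    return min(bikers, key=lambda sp_id: haversine_km(pickup, bikers[sp_id]))


pickup = (3.1390, 101.6869)  # hypothetical pick-up point
bikers = {
    "honest": (3.1400, 101.6900),
    "far": (3.2000, 101.7500),
}
print(pick_nearest(pickup, bikers))  # honest

# A colluding biker who spoofs a position on top of the pick-up point always wins.
bikers["spoofed"] = pickup
print(pick_nearest(pickup, bikers))  # spoofed
```

Because the winner depends only on the self-reported position, a customer and biker who coordinate (or a biker with a fake-GPS tool) can steer the match every time.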
We had to stop here in favour of a quicker solution.
In summary, gobike-motion detects fraudulent *COMPLETED* tasks based on 3 criteria:
1. Has biker reached pick-up point?
2. Has biker reached drop-off point?
3. Has biker travelled the entire journey (from pick-up to drop-off)?
Note: Fraud is positive if any of these checks fails.
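Here is a minimal Python sketch of how the three checks could be combined, assuming the biker's GPS trace is a list of (lat, lon) points. The 0.2 km arrival radius, the 0.8 distance ratio, the function names, and the coordinates are all hypothetical choices for illustration; the thresholds gobike-motion really uses are not stated here. The reason strings are the ones returned by the API.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def check_fraud(trace, pickup, dropoff, estimated_km, radius_km=0.2, min_ratio=0.8):
    """Apply the three checks to a completed task; any failure marks it as fraud."""
    reasons = []
    # 1. Did any trace point come within radius_km of the pick-up point?
    if not any(haversine_km(p, pickup) <= radius_km for p in trace):
        reasons.append("Fail to reach pick-up point")
    # 2. Did any trace point come within radius_km of the drop-off point?
    if not any(haversine_km(p, dropoff) <= radius_km for p in trace):
        reasons.append("Fail to reach drop-off point")
    # 3. Is the travelled distance at least min_ratio of the estimated distance?
    actual_km = sum(haversine_km(p, q) for p, q in zip(trace, trace[1:]))
    if actual_km < min_ratio * estimated_km:
        reasons.append("Fail to travel entire journey")
    return {"positive": bool(reasons), "reasons": reasons}


pickup, dropoff = (3.1390, 101.6869), (3.1590, 101.7069)  # roughly 3.1 km apart
honest = check_fraud([pickup, (3.1490, 101.6969), dropoff], pickup, dropoff, 3.2)
parked = check_fraud([(3.1000, 101.6500)] * 3, pickup, dropoff, 3.2)
print(honest["positive"])  # False
print(parked["reasons"])
```

A biker who never moves fails all three checks at once, which matches the shape of the fraud-positive response; a real implementation would also need to handle GPS noise and detours.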
— -
General test steps:
1. Create and complete a task as usual. (You might need to use a fake-GPS tool to travel as the biker.)
2. Go to Postman; under the collection ‘gobike-motion (fraud radar)’, open the API ‘check- fraudulent detection by task’.
3. Take the task ID from step 1, substitute it into the request, and fire the API.
4. On success, you will see one of the following two responses:
Fraud is Positive (biker cheats)
{
  "data": {
    "id": 119447, // taskId
    "spId": 2016768,
    "customerId": 1037826,
    "actual": {
      "distance": {
        "text": "0.0011 km",
        "value": 0.0011227540394162098
      }
    },
    "estimation": {
      "distance": {
        "text": "3.1800 km",
        "value": 3.18
      }
    },
    "fraud": {
      "positive": true,
      "reasons": [
        "Fail to reach pick-up point",
        "Fail to reach drop-off point",
        "Fail to travel entire journey"
      ]
    }
  }
}
Fraud is Negative (no cheat)
{
  "data": {
    "id": 191927,
    "spId": 1046174,
    "customerId": 1030150,
    "actual": {
      "distance": {
        "text": "6.6836 km",
        "value": 6.6836433676888864
      }
    },
    "estimation": {
      "distance": {
        "text": "4.3100 km",
        "value": 4.31
      }
    },
    "fraud": {
      "positive": false
    }
  }
}
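Here is a minimal Python sketch of reading these responses on the client side, assuming the body arrives as a JSON string shaped like the two examples (without the `// taskId` annotation, which is not valid JSON). The helper name `summarise` is hypothetical, and the sample bodies are trimmed to the fields the helper reads.

```python
import json


def summarise(body):
    """Turn a fraud-check response body into a one-line verdict."""
    data = json.loads(body)["data"]
    fraud = data["fraud"]
    reasons = fraud.get("reasons", [])  # the key is absent when fraud is negative
    verdict = "FRAUD" if fraud["positive"] else "OK"
    line = "task {}: {}".format(data["id"], verdict)
    if reasons:
        line += " (" + "; ".join(reasons) + ")"
    return line


positive_body = """
{"data": {"id": 119447, "fraud": {"positive": true, "reasons": [
    "Fail to reach pick-up point",
    "Fail to reach drop-off point",
    "Fail to travel entire journey"]}}}
"""
negative_body = '{"data": {"id": 191927, "fraud": {"positive": false}}}'

print(summarise(positive_body))
print(summarise(negative_body))  # task 191927: OK
```

Using `fraud.get("reasons", [])` keeps the helper working for the negative case, where the response omits the `reasons` list.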