Client-Driver Matching for NYCabs

Scenario

NYCabs is an upcoming (fictional) private cab/rideshare app. I was tasked with implementing the core part of the service: matching client requests to available drivers. Since each driver sent a position update roughly every 5 seconds, the input was a large stream of data that had to be handled in real time. To handle the scale, the best approach was to use a stream-processing model of computation with the frameworks Apache Kafka and Apache Samza.

What did I use?

Java
Apache Kafka
Apache Samza
AWS

Implementation

The program received a live stream of data in which drivers updated their locations at regular intervals and clients requested rides at arbitrary times. Each line of input was a record containing one event. The program parsed each event and, based on its type, wrote the message to one of two Kafka topics, driver-locations or events. These topics resided in a Samza cluster, where a Samza job read the streams and produced client-driver matches. Each matched client-driver pair was written to a third stream, match. The messages in each stream were JSON objects with the following attributes:

driver-locations
This stream carried the locations of free drivers as they moved through the city. It included both the client and the drivers. Example:

{"driverId":131,"blockId":3214,"latitude":40.7519871,"longitude":-74.0047584,"type":"DRIVER_LOCATION"}

events
This stream carried events other than driver location updates. Examples:

{"blockId":5647,"driverId":7806,"latitude":40.7901188,"longitude":-73.9747985,"type":"ENTERING_BLOCK","status":"AVAILABLE","rating":2.14,"salary":11,"gender":"F"}

{"blockId":1930,"clientId":6343,"latitude":40.731471,"longitude":-73.9901805,"type":"RIDE_REQUEST","gender_preference":"N"}

match
This stream carried the client-driver matches produced by the Samza job. Example:

{"clientId":902,"driverId":434}
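
The routing step described at the start of this section, where each parsed event is sent to a topic based on its type, can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class EventRouter and its methods are hypothetical names, the "type" field is extracted with a simple regular expression rather than a JSON library, and the rule that every type other than DRIVER_LOCATION goes to events is an assumption inferred from the stream descriptions above. The actual Kafka producer call is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: picks the Kafka topic for one raw input record. */
final class EventRouter {
    // Topic names as given in the write-up.
    static final String DRIVER_LOCATIONS = "driver-locations";
    static final String EVENTS = "events";

    // Matches the value of the "type" field in a flat JSON record.
    private static final Pattern TYPE =
            Pattern.compile("\"type\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the event type, or null if the record has no "type" field. */
    static String typeOf(String record) {
        Matcher m = TYPE.matcher(record);
        return m.find() ? m.group(1) : null;
    }

    /**
     * Assumed rule: DRIVER_LOCATION updates go to driver-locations,
     * every other event type goes to events.
     */
    static String topicFor(String record) {
        String type = typeOf(record);
        if (type == null) {
            throw new IllegalArgumentException("record has no type field: " + record);
        }
        return "DRIVER_LOCATION".equals(type) ? DRIVER_LOCATIONS : EVENTS;
    }
}
```

With this sketch, the DRIVER_LOCATION example above would map to driver-locations, while the ENTERING_BLOCK and RIDE_REQUEST examples would map to events. A real producer would then send the record to the chosen topic.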

Stream matching

Besides the driver’s distance from the client, the matching also took into account the client’s gender preference and the driver’s rating and salary. These factors were combined into a score for each driver, and the scores were then used to match a driver with the client.
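
One way such a score could be computed is a weighted sum of the four factors. The sketch below is only an illustration under stated assumptions, not the formula the project used: the class DriverScorer, the weights, the rating and salary caps, the straight-line distance in coordinate degrees, and the choice to favor lower salaries are all hypothetical. The only details taken from the examples above are the field meanings and that a gender preference of "N" is treated here as "no preference".

```java
/** Hypothetical scorer: combines the four factors into one number in [0, 1]. */
final class DriverScorer {
    // Illustrative weights (assumptions, not from the project); they sum to 1.
    static final double W_DISTANCE = 0.4;
    static final double W_GENDER = 0.2;
    static final double W_RATING = 0.2;
    static final double W_SALARY = 0.2;

    // Assumed upper bounds used to normalize rating and salary.
    static final double MAX_RATING = 5.0;
    static final double MAX_SALARY = 100.0;

    static double score(double clientLat, double clientLon, String genderPreference,
                        double driverLat, double driverLon, String driverGender,
                        double rating, double salary) {
        // Straight-line distance in coordinate degrees; closer drivers score higher.
        double distance = Math.hypot(clientLat - driverLat, clientLon - driverLon);
        double distanceScore = 1.0 / (1.0 + distance);

        // "N" is treated as no preference; otherwise the genders must match.
        boolean genderOk = "N".equals(genderPreference)
                || (genderPreference != null && genderPreference.equals(driverGender));
        double genderScore = genderOk ? 1.0 : 0.0;

        // Higher rating is better; lower salary is assumed to be better.
        double ratingScore = Math.min(rating, MAX_RATING) / MAX_RATING;
        double salaryScore = 1.0 - Math.min(salary, MAX_SALARY) / MAX_SALARY;

        return W_DISTANCE * distanceScore
                + W_GENDER * genderScore
                + W_RATING * ratingScore
                + W_SALARY * salaryScore;
    }
}
```

Under these assumptions, the matcher would evaluate score for each available driver in the client's block and pick the driver with the highest value. Keeping every factor normalized to the range 0 to 1 makes the weights directly comparable.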

Conclusion

This was a really interesting project: I got first-hand experience designing a solution to a problem that companies like Uber and Lyft solve at a much larger scale. It gave me insight into live data processing and the intricacies of designing and debugging such systems.