Concurrency issues when dealing with webhooks
Our application creates/updates database entries based on webhooks from external services. The webhook sends the external ID of the object so we can fetch more data to process. Processing a webhook, including the round trips to fetch that data, takes 400-1200 ms.
Sometimes multiple hooks for the same object ID are sent within milliseconds of each other. Here are the timestamps of the most recent occurrence:
2020-11-21 12:42:45.812317+00:00
2020-11-21 20:03:36.881120+00:00 <-
2020-11-21 20:03:36.881119+00:00 <-
Around this time other objects can also be sent for processing. The problem is that concurrent processing of the two hooks highlighted above will create two new database entries for the same single object.
Q: What is the best way to prevent the two highlighted entries from being processed at the same time?
Things I've tried: Currently, at the beginning of the incoming hook, I create a database entry in the Changes table that stores the object ID. Before processing, the Changes table is checked for an entry created for this ID in the last 10 seconds; if one is found, the handler exits to let the other process do its work.
In the above case, two database entries are created, and since they are very close in time, they both arrive at the detection point at the same time, discover each other and exit, thus doing nothing.
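The failure mode described above can be reproduced with a small deterministic simulation. This is only a sketch: the in-memory `changes` list stands in for the Changes table, and `should_process`, the row IDs, and the `"obj-42"` object ID are hypothetical names used for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=10)


def should_process(changes, my_id, object_id, now):
    """Return True if no *other* recent Changes row exists for object_id."""
    for row_id, row_object_id, created in changes:
        is_other = row_id != my_id
        is_recent = now - created <= WINDOW
        if is_other and row_object_id == object_id and is_recent:
            return False
    return True


# Both hooks insert their marker row first...
t0 = datetime(2020, 11, 21, 20, 3, 36, 881119)
t1 = datetime(2020, 11, 21, 20, 3, 36, 881120)
changes = [(1, "obj-42", t0), (2, "obj-42", t1)]

# ...and only then run the check, so each one sees the other's row.
now = t1 + timedelta(milliseconds=5)
decisions = [should_process(changes, my_id, "obj-42", now) for my_id in (1, 2)]
print(decisions)  # [False, False]: both handlers exit, nobody processes
```

Because the check is symmetric, neither handler can tell that it should be the one to continue.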
I've thought about adding some jitter timeout before the check (increasing processing time), or locking the table (increasing processing time again), but it all feels like I'm fighting the wrong battle.
Any suggestions?
Our API is Django 3.1 with a Postgres database.
If you look at the Acuity webhook documentation, they provide a field called action, which is the key to making the webhook idempotent. Here's the quote I could salvage:
action: either scheduled, rescheduled, canceled, changed, or order.completed, depending on the action that initiated the webhook call
The different actions:
- scheduled is called when an appointment is initially booked
- rescheduled is called when an appointment is rescheduled to a new time
- canceled is called whenever an appointment is cancelled
- changed is called when the appointment is changed in any way. This includes when it is initially scheduled, rescheduled, or canceled, as well as when appointment details such as email address or intake forms are updated
- order.completed is called when the order is complete
Based on the wording, I think scheduled, canceled, and order.completed are each unique per object_id, which means you can use a unique-together constraint on these messages:
from django.db import models

class AcquityAction(models.Model):
    # One row per webhook action name, e.g. 'scheduled'.
    id = models.CharField(max_length=17, primary_key=True)

class AcquityTransaction(models.Model):
    action = models.ForeignKey(AcquityAction, on_delete=models.PROTECT)
    object_id = models.IntegerField()

    class Meta:
        # At most one row per (object, action) pair.
        unique_together = [['object_id', 'action_id']]
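To see what a composite unique constraint buys you, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for Postgres. The table name, the `record_once` helper, and the object ID 1001 are assumptions made for illustration, not part of any Acuity or Django API.

```python
import sqlite3


def make_db():
    """Create an in-memory table with a unique (object_id, action) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE acquity_transaction ("
        " id INTEGER PRIMARY KEY,"
        " object_id INTEGER NOT NULL,"
        " action TEXT NOT NULL,"
        " UNIQUE (object_id, action))"
    )
    return conn


def record_once(conn, object_id, action):
    """Return True if this (object_id, action) was newly recorded."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO acquity_transaction (object_id, action)"
                " VALUES (?, ?)",
                (object_id, action),
            )
        return True
    except sqlite3.IntegrityError:
        # The database rejected the duplicate pair.
        return False


conn = make_db()
results = [
    record_once(conn, 1001, "scheduled"),
    record_once(conn, 1001, "scheduled"),  # duplicate delivery
    record_once(conn, 1001, "canceled"),   # same object, different action
]
print(results)  # [True, False, True]
```

The point is that the database, not application code, decides which delivery is the first one.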
You can replace the AcquityAction model with an enumeration field if you want, but I prefer to keep the actions in the database.
I would ignore the changed event entirely, as by their vague definition it seems to fire on every event. For the rescheduled event, I would create a model that lets you put a unique constraint on the new date, something like this:
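class Reschedule(models.Model):
    # MyScheduleModel stands for your existing schedule/appointment model.
    schedule = models.ForeignKey(MyScheduleModel, on_delete=models.CASCADE)
    schedule_date = models.DateTimeField()

    class Meta:
        # A given schedule can be moved to a given date only once.
        unique_together = [['schedule', 'schedule_date']]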
Also, you may want a task dedicated to updating your schedule model with the rescheduled date, so that the update stays idempotent.
Now, in your view, you would do the following:
from django.db import IntegrityError
from django.http import HttpResponse

ACQUITY_ACTIONS = {'scheduled', 'canceled', 'order.completed'}

def webhook_view(request):
    validate(request)
    action = get_action(request)
    if action in ACQUITY_ACTIONS:
        try:
            # Fails on the unique constraint if this hook was already recorded.
            insert_transaction()
        except IntegrityError:
            # Duplicate delivery: acknowledge it and do nothing else.
            return HttpResponse(status=200)
        webhook_task.delay()
    elif action == 'rescheduled':
        other_webhook_task.delay()
    ...
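The claim-then-enqueue flow of the view can be exercised under real concurrency without Django. The sketch below is an assumption-laden stand-in: sqlite3 replaces Postgres, `handle_hook` plays the role of the view, and appending to the `enqueued` list replaces `webhook_task.delay()`. All of these names are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading


def handle_hook(db_path, object_id, action, start, enqueued, lock):
    """Claim the (object_id, action) pair, then enqueue work if we won."""
    conn = sqlite3.connect(db_path, timeout=10, isolation_level=None)
    try:
        start.wait()  # line all workers up so the inserts really race
        conn.execute(
            "INSERT INTO hook_event (object_id, action) VALUES (?, ?)",
            (object_id, action),
        )
    except sqlite3.IntegrityError:
        return  # another worker already claimed this event
    finally:
        conn.close()
    with lock:
        enqueued.append((object_id, action))  # stands in for the task


def run_race(workers=8):
    """Deliver the same hook from several threads at once."""
    enqueued, lock = [], threading.Lock()
    start = threading.Barrier(workers)
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "hooks.db")
        setup = sqlite3.connect(db_path)
        setup.execute(
            "CREATE TABLE hook_event ("
            " object_id INTEGER NOT NULL,"
            " action TEXT NOT NULL,"
            " UNIQUE (object_id, action))"
        )
        setup.commit()
        setup.close()
        threads = [
            threading.Thread(
                target=handle_hook,
                args=(db_path, 1001, "scheduled", start, enqueued, lock),
            )
            for _ in range(workers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return enqueued


enqueued = run_race()
print(len(enqueued))  # 1: only the winning insert enqueues the task
```

No sleep, jitter, or table lock is involved; the losers simply hit the constraint and return.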