January 29, 2024
I run a large marketplace application that has a Shopify sales channel app. Our users can connect this Shopify app to synchronize product data from their Shopify shops into our application, and to do that, we register webhooks via the Shopify API. Shopify uses webhooks to notify your application when an interesting event happens in a connected shop - for example, a new order, a change to an order, a new product, a change to a product, a new shipment, etc. When these events happen, you receive an HTTP POST with a JSON payload at a URL of your choosing. Our application primarily needs to know about product changes, so we register a few:
the product_listings webhook topics (add, update, and remove), plus a few mandatory webhooks for basic app functionality. That way we're notified when one of our connected shops adds, updates, or removes a product from our sales channel.
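Before trusting any of these payloads, verify the signature Shopify attaches to every webhook: the `X-Shopify-Hmac-Sha256` header carries a base64-encoded HMAC-SHA256 of the raw request body, keyed with your app's shared secret. Our app is Laravel, but the check is language-agnostic; here's a minimal sketch in Python (the function name is mine):

```python
import base64
import hashlib
import hmac

def verify_shopify_webhook(raw_body: bytes, hmac_header: str, shared_secret: str) -> bool:
    """Check the X-Shopify-Hmac-Sha256 header against the raw request body.

    Shopify sends a base64-encoded HMAC-SHA256 of the body, keyed with your
    app's shared secret; reject the request if the header doesn't match.
    """
    digest = hmac.new(shared_secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode("utf-8")
    return hmac.compare_digest(expected, hmac_header)
```

Use `hmac.compare_digest` (constant-time comparison) rather than `==` so the check doesn't leak timing information.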
Shopify can send a large amount of traffic to your application, and it can be quite bursty. We actually run two Shopify apps - one with primarily B2C shops generates approximately 300 webhook requests/shop/day, the other is primarily B2B and generates approximately 30 webhook requests/shop/day. Our install base is quite different for the two apps, so my theory is that the type of shop matters more than the number of shops connected. Since webhooks are sent when inventory changes, a shop that receives many orders per day will generate more webhook requests for your application.
Shopify has a webhook timeout of 5 seconds and recommends that your application return an HTTP 200 within 2 seconds. Your application needs to handle many webhook requests in a burst of traffic and still respond within 2 seconds, so I recommend dispatching a queued job with the webhook data and returning a quick HTTP 200 to keep Shopify happy. Your queued job can then execute asynchronously without affecting your webhook endpoint's response time.
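The shape of that endpoint is simple: parse nothing you don't need, enqueue, return. A sketch of the pattern in Python, using an in-process `queue.Queue` as a stand-in for a real backend like Redis (the function and queue names are illustrative, not from our app):

```python
import json
import queue

# Stand-in for a real queue backend (Redis, SQS, a database table, ...).
webhook_jobs: "queue.Queue[dict]" = queue.Queue()

def handle_webhook(raw_body: bytes) -> int:
    """Webhook endpoint: enqueue the payload and return immediately.

    All of the slow work (DB writes, API calls) happens in the worker,
    so the HTTP response stays well inside Shopify's 2-second window.
    """
    webhook_jobs.put(json.loads(raw_body))
    return 200

def process(payload: dict) -> None:
    """Placeholder for the real sync logic."""
    print("processing product", payload.get("id"))

def run_worker_once() -> None:
    """Drain one job; in production this loops in a background worker."""
    payload = webhook_jobs.get()
    process(payload)
    webhook_jobs.task_done()
```

In Laravel this is just `dispatch()` on a job class; the sketch only shows why the endpoint stays fast - the handler's cost is one enqueue, independent of how slow the sync logic is.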
Generally speaking, it's a good idea to avoid serializing more data than you need when queueing a job. I looked at how much memory a batch of webhook jobs might consume and found that a typical webhook payload is 3-5 kB of JSON. If your queue backed up to 10,000 jobs at 5 kB each, you'd use about 50 MB of memory. I'm not concerned about that level of consumption in our application.
Laravel gives us job retrying for free, so it's trivial to implement. We periodically experience database lock timeouts when processing large batches of webhooks and jobs are almost always successful on the second try. I do have an unresolved concern about processing webhook data out of order - see below.
You should schedule a regular reconciliation job that syncs your data with Shopify data via the Admin API. It's possible that you had failed jobs, and it's also possible that a webhook wasn't sent for every single update. A nightly reconciliation job should resolve discrepancies and is recommended by Shopify.
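The reconciliation itself boils down to diffing your local state against what the Admin API reports. A hypothetical sketch (Python, with products represented as plain dicts keyed by product id - names and shapes are mine, not our schema):

```python
def reconcile(local: dict, remote: dict) -> dict:
    """Diff local product state against what the Shopify Admin API reports.

    Both arguments map product id -> the fields we sync. Returns the
    changes needed to make `local` match `remote`.
    """
    return {
        "create": {pid: p for pid, p in remote.items() if pid not in local},
        "update": {pid: p for pid, p in remote.items()
                   if pid in local and local[pid] != p},
        "delete": [pid for pid in local if pid not in remote],
    }
```

A nightly job fetches the remote state, computes this diff, and applies it; any update a missed or failed webhook left behind is corrected within a day.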
This has not been a problem for us yet, but theoretically we could be processing webhook data out of order. Consider two webhook requests for a single product, received in quick succession: the first sets the product's inventory to 10, and the second sets it to 0.
We will dispatch a queued job for each request in order. If the first job (inventory 10) fails, it's put back on the end of the queue. The second job (inventory 0) would then be processed, and update the product's inventory to 0. Finally, the first job (inventory 10) would be re-processed, and update the product's inventory to 10. We now have erroneous data because of a transient job failure.
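You can simulate the race with a toy retry-by-requeue loop (everything here is illustrative, not from our codebase):

```python
def run_queue(jobs: list) -> dict:
    """Simulate naive retry behaviour: a failed job is requeued at the back."""
    state = {}
    pending = list(jobs)
    while pending:
        job = pending.pop(0)
        if job.get("fail_once"):
            # Transient failure: the retry goes to the back of the queue,
            # behind any newer webhooks for the same product.
            pending.append({**job, "fail_once": False})
            continue
        state[job["product_id"]] = job["inventory"]
    return state

final = run_queue([
    {"product_id": 1, "inventory": 10, "fail_once": True},  # older, fails once
    {"product_id": 1, "inventory": 0},                      # newer, succeeds
])
```

Running this leaves `final` as `{1: 10}`, even though the last webhook Shopify sent said the inventory was 0 - the retried older job clobbers the newer data.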
This is quite an edge case: we'd have to receive webhooks for the same product in quick succession, have a job fail, and have the retry land after the newer update. That said, I think the solution is quite simple:
Instead of dispatching a queued job per webhook, store each payload in a database table as it arrives, and have a frequently scheduled command or daemon process the rows in order, depending on your application's requirements.
Cons: this approach requires a database insert every time you receive a webhook, and inserts are certainly higher impact than Redis writes. The table would also need to be pruned to keep it from growing without bound.
Pros: job failures would not result in invalid data. Queue memory consumption would be negligible. Metrics would be very easy to summarize.
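Here's what that table-driven flow could look like, sketched in Python with SQLite standing in for the real database (table and function names are hypothetical): the endpoint does one cheap insert, and the scheduled command processes rows strictly in arrival order, so a transient failure can halt and retry in place instead of being reordered behind newer data.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE webhooks (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        topic TEXT NOT NULL,
        payload TEXT NOT NULL,
        processed_at TEXT
    )
""")

inventory: dict = {}  # stand-in for the real product table

def store_webhook(topic: str, payload: dict) -> None:
    """Webhook endpoint: one insert, then an immediate HTTP 200."""
    conn.execute("INSERT INTO webhooks (topic, payload) VALUES (?, ?)",
                 (topic, json.dumps(payload)))
    conn.commit()

def process_pending() -> int:
    """Scheduled command: handle unprocessed rows strictly in arrival order.

    If a row fails, the next run retries from the same spot, so a
    transient failure never reorders updates for a product.
    """
    rows = conn.execute("SELECT id, payload FROM webhooks "
                        "WHERE processed_at IS NULL ORDER BY id").fetchall()
    for row_id, raw in rows:
        payload = json.loads(raw)
        inventory[payload["product_id"]] = payload["inventory"]  # real sync logic here
        conn.execute("UPDATE webhooks SET processed_at = datetime('now') "
                     "WHERE id = ?", (row_id,))
    conn.commit()
    return len(rows)
```

The `processed_at` column doubles as the pruning hook (delete old processed rows) and makes per-day metrics a simple `GROUP BY`.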
I'm feelin' a PR coming on!
This blog is an experiment: less about education and more about documenting the oddities I've experienced while building web apps.
My hope is that you're here because of a random search and you're experiencing something I've had to deal with.