Webhook Security for API Providers

If you are building a SaaS product that sends webhooks to your customers, you are the provider. That means the security responsibility is yours. It is not just about protecting your own infrastructure, but also about giving your customers the tools to receive events safely and verify them correctly. This post covers what that looks like in practice: what to build, where things break in production, and what to document so your customers can hold up their side.

Signing payloads

Every webhook you send should be signed. Without a signature, your customer has no way to verify that an event came from you rather than someone who discovered their endpoint URL. The mechanism is straightforward. You compute an HMAC-SHA256 signature of the request body using a shared secret, include it in a header, and your customer recomputes it on their side to verify. What matters more than the mechanism is getting the implementation details right on both sides.

Sign the timestamp alongside the body, not separately. If the timestamp lives only in a header and is not part of what you sign, it can be modified without invalidating the signature. Include both in the signed string.
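A minimal provider-side sketch of this step, assuming the signed string is the timestamp and the raw body joined with a period. The header names, the "v1=" prefix, and the function name are illustrative assumptions, not a fixed standard.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional


def sign_payload(secret: bytes, body: bytes, timestamp: Optional[int] = None) -> Dict[str, str]:
    """Return headers carrying a timestamp and an HMAC-SHA256 signature.

    The timestamp is part of the signed string, so changing it in transit
    invalidates the signature. Header names here are hypothetical.
    """
    ts = int(time.time()) if timestamp is None else timestamp
    signed_string = str(ts).encode() + b"." + body
    digest = hmac.new(secret, signed_string, hashlib.sha256).hexdigest()
    return {
        "X-Webhook-Timestamp": str(ts),
        "X-Webhook-Signature": f"v1={digest}",
    }
```

The body passed in should be the exact bytes that go on the wire, so that the customer signs the same bytes when verifying.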

For your customers, two things need to be explicit in your documentation. First, they must verify the signature against the raw request body before parsing it. Many web frameworks parse the JSON body automatically, often before application code ever sees the request. The re-serialised output can have different whitespace or key ordering than what you signed, causing verification to fail in ways that are difficult to diagnose. Tell them to read raw bytes first, verify, then parse.

Second, they must use constant-time comparison when checking signatures, not regular string equality. Regular equality leaks timing information that can be exploited to reconstruct the correct signature incrementally. In Python this is hmac.compare_digest. In Node.js it is crypto.timingSafeEqual. In PHP it is hash_equals. Put this in your documentation clearly. Most verification failures come from customers who did not know about either of these requirements.
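Both requirements fit in a few lines on the customer side. This is a sketch of what your documentation could show, assuming the signature is a hex HMAC-SHA256 over the timestamp, a period, and the raw body; the function and parameter names are hypothetical.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, raw_body: bytes, timestamp: str, signature_hex: str) -> bool:
    """Verify an HMAC-SHA256 signature over "<timestamp>.<raw body>".

    raw_body must be the exact bytes received, before any JSON parsing.
    """
    signed_string = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret, signed_string, hashlib.sha256).hexdigest()
    # compare_digest avoids the content-based short-circuiting of ==.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

Only after this returns True should the handler parse the JSON and act on the event.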

On the provider side, signing secrets need proper storage. A database column works for one customer. For hundreds of customers, each with their own secret, secrets belong in a dedicated secrets manager such as AWS Secrets Manager, HashiCorp Vault, or GCP Secret Manager, with access scoped to delivery workers only.

Replay prevention

A valid signed event can be captured and replayed later. The signature is still correct because the secret has not changed. The customer processes a duplicate payment, provisions a duplicate account, or triggers a workflow that should only run once.

Two things working together prevent this.

Timestamp validation. Include a timestamp in every event as part of the signed payload. Document that customers should reject events outside a defined window. Five minutes is standard. Since the timestamp is included in what you sign, it cannot be modified without invalidating the signature.

One operational reality worth knowing: customer server clocks drift. A customer whose NTP synchronisation is misconfigured will start seeing timestamp validation failures on legitimate events. Your error responses need to be specific enough for them to diagnose this themselves. “Timestamp outside acceptable range” is useful. “Validation failed” is not.
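A receiver-side sketch of the window check with a diagnosable error message. It assumes the timestamp is an integer Unix time in seconds and that the signature has already been verified; the names and the exact wording of the message are illustrative.

```python
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # the five-minute window


class TimestampError(ValueError):
    """Raised when an event's signed timestamp cannot be accepted."""


def check_timestamp(timestamp: str, now: Optional[float] = None,
                    tolerance: int = TOLERANCE_SECONDS) -> None:
    """Raise TimestampError if the event is too old or too far in the future."""
    try:
        sent_at = int(timestamp)
    except ValueError:
        raise TimestampError("Timestamp is not an integer Unix time") from None
    current = time.time() if now is None else now
    skew = current - sent_at
    if abs(skew) > tolerance:
        raise TimestampError(
            f"Timestamp outside acceptable range: event is {skew:.0f}s away from "
            f"the receiver clock, tolerance is {tolerance}s. Check NTP synchronisation."
        )
```

Reporting the measured skew alongside the tolerance is what lets a customer with a drifting clock recognise the cause without opening a ticket.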

Stable event IDs. Every event you send should carry a unique ID that stays the same across all retry attempts for that event. Document that customers should store processed event IDs and reject duplicates. This is a shared responsibility. You provide the stable ID. They deduplicate on it. Both sides need to hold up for the protection to work.
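The customer half of that contract can be sketched as follows. This assumes a SQLite table is an acceptable store for processed IDs; the table and class names are hypothetical, and a production receiver would likely use its main database and expire old IDs.

```python
import sqlite3


class ProcessedEvents:
    """Remembers which event IDs have already been handled."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)"
        )

    def mark_if_new(self, event_id: str) -> bool:
        """Record the ID and return True only the first time it is seen."""
        cur = self.conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
        self.conn.commit()
        # rowcount is 1 when the row was inserted, 0 when the ID already existed.
        return cur.rowcount == 1
```

Because the primary key enforces uniqueness, two retries arriving at the same moment cannot both be treated as new.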

SSRF protection

When customers configure their webhook endpoint URLs, your delivery workers make outbound HTTP requests to those URLs. This means you are making HTTP requests based on user-supplied input, which creates a Server-Side Request Forgery risk.

A URL pointing to http://169.254.169.254/latest/meta-data/, the AWS instance metadata endpoint, will return IAM credentials if your workers can reach it. Internal admin services, databases with HTTP interfaces, and internal dashboards all become reachable if your delivery workers have access to private IP ranges.

The architectural fix is running delivery workers in a network segment with no routing to internal services. Regardless of what URL is supplied, connections to internal addresses are blocked at the network level.

Add DNS-level filtering as a second layer. A filtering proxy that resolves the destination hostname itself, rejects any result in a private IP range, and then connects to the address it just checked catches DNS rebinding attacks, where a public domain passes an initial check but resolves to an internal address by the time the connection is made.
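The resolve-then-check step could look like the sketch below, using only the Python standard library. The function names are hypothetical, and the set of rejected ranges is an assumption you should adapt to your own network.

```python
import ipaddress
import socket
from typing import List


def is_forbidden_ip(ip: str) -> bool:
    """True for addresses a delivery worker should never connect to."""
    addr = ipaddress.ip_address(ip.split("%")[0])  # drop any IPv6 zone suffix
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped  # judge ::ffff:10.0.0.1 as 10.0.0.1
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


def resolve_public_ips(host: str, port: int = 443) -> List[str]:
    """Resolve host and return its addresses, or raise if any is internal."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    ips = sorted({info[4][0] for info in infos})
    blocked = [ip for ip in ips if is_forbidden_ip(ip)]
    if blocked:
        raise ValueError(f"{host} resolves to a non-public address: {blocked}")
    return ips
```

To be effective against rebinding, the worker then has to connect to one of the returned addresses rather than resolving the hostname a second time.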

Rolling secrets

Customers need to rotate signing secrets. The implementation detail that matters most is that rotation should not cause any gap in webhook delivery. When a customer generates a new secret, there is a window between when the new secret is active and when they have deployed updated verification code. If the old secret is invalidated immediately, every event sent during that deployment window will fail verification.

Rolling secrets keep both the old and new secrets valid for an overlap period, typically a few hours to 24 hours. During this window you sign each event with both secrets and send both signatures, so verification succeeds whether the customer's code still holds the old secret or has already switched to the new one. When the window closes, the old secret is automatically revoked.
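One way to make the overlap invisible to the customer is to send one signature per active secret and let the receiver accept a match on any of them. The sketch below assumes a comma-separated header value with a "v1=" prefix per signature; that layout and the function names are illustrative.

```python
import hashlib
import hmac
from typing import Iterable


def sign_with_active_secrets(secrets: Iterable[bytes], signed_string: bytes) -> str:
    """Provider side: one signature per active secret, comma-separated."""
    digests = [hmac.new(s, signed_string, hashlib.sha256).hexdigest() for s in secrets]
    return ",".join(f"v1={d}" for d in digests)


def verify_any(secret: bytes, signed_string: bytes, header_value: str) -> bool:
    """Receiver side: accept if any listed signature matches our secret."""
    expected = hmac.new(secret, signed_string, hashlib.sha256).hexdigest().encode()
    parts = [p.strip() for p in header_value.split(",")]
    candidates = [p[3:] for p in parts if p.startswith("v1=")]
    matched = False
    for candidate in candidates:
        # Check every candidate rather than returning on the first match.
        matched |= hmac.compare_digest(expected, candidate.encode())
    return matched
```

A customer holding either secret verifies successfully during the window, and starts failing on the old one only after it drops out of the active list.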

This becomes more complex across hundreds of customers. Your infrastructure needs to track rotation state per tenant, know which secrets are in their overlap window, and handle edge cases. A customer who initiates rotation and never completes it. A customer who rotates twice before the first rotation closes. Events in flight when the overlap window expires.

Planned rotation and emergency revocation are different operations and both need to exist. Planned rotation benefits from the overlap window. A suspected secret compromise needs immediate revocation with no overlap. Document both procedures clearly enough that a customer can execute either one without contacting support.
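A sketch of per-tenant rotation state that keeps the two operations separate. All class and method names are hypothetical, times are plain Unix seconds, and persistence is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SigningSecret:
    value: bytes
    expires_at: Optional[float] = None  # None means no scheduled expiry


@dataclass
class TenantSecrets:
    secrets: List[SigningSecret] = field(default_factory=list)

    def active(self, now: float) -> List[SigningSecret]:
        """Secrets that are still valid at time `now`."""
        return [s for s in self.secrets if s.expires_at is None or s.expires_at > now]

    def rotate(self, new_value: bytes, now: float, overlap_seconds: float) -> None:
        """Planned rotation: older secrets stay valid until the overlap ends."""
        still_valid = self.active(now)
        deadline = now + overlap_seconds
        for s in still_valid:
            # Never extend a secret already scheduled to expire sooner.
            s.expires_at = deadline if s.expires_at is None else min(s.expires_at, deadline)
        self.secrets = still_valid + [SigningSecret(new_value)]

    def revoke_all_and_replace(self, new_value: bytes) -> None:
        """Emergency revocation: drop every existing secret with no overlap."""
        self.secrets = [SigningSecret(new_value)]
```

Keeping each secret's own deadline means a customer who rotates twice in quick succession does not accidentally extend the life of the oldest secret.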

Transport security

Every webhook you send should be over HTTPS. Reject plain HTTP endpoint URLs in production. Do not just warn. Block them.

Beyond that, the operational detail that matters most is what happens when a customer’s TLS certificate expires or is misconfigured. Your delivery will start failing. The behaviour you build here determines how bad the outcome is.

Retry with exponential backoff. Notify the customer after a defined failure threshold. Disable the endpoint before the queue becomes unmanageable. The notification needs to say exactly what happened and exactly what the customer needs to do to re-enable delivery. A vague error message at this moment generates a support ticket. A specific one lets the customer fix it themselves.
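The retry, notify, and disable sequence can be expressed as two small functions. Every number below is an assumed placeholder rather than a recommendation from this post, and the action names are hypothetical.

```python
BASE_DELAY = 30          # seconds before the first retry (assumed)
MAX_DELAY = 6 * 60 * 60  # cap any single wait at six hours (assumed)
NOTIFY_AFTER = 5         # consecutive failures before notifying (assumed)
DISABLE_AFTER = 12       # consecutive failures before disabling (assumed)


def next_delay(attempt: int) -> int:
    """Seconds to wait before retry number `attempt` (1-based), doubling each time."""
    return min(BASE_DELAY * 2 ** (attempt - 1), MAX_DELAY)


def action_after_failure(consecutive_failures: int) -> str:
    """Decide what the delivery worker does after another failed attempt."""
    if consecutive_failures >= DISABLE_AFTER:
        return "disable"
    if consecutive_failures == NOTIFY_AFTER:
        return "notify_and_retry"
    return "retry"
```

Real schedulers usually add random jitter to each delay so that many failing endpoints do not all retry at the same instant; it is omitted here to keep the schedule easy to read.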

Document the full failure behaviour. How long do you retry before disabling? What does the notification say? How does the customer re-enable the endpoint after fixing the certificate? These are questions customers will ask when it happens to them.

For customers in financial services, healthcare, and other regulated industries, mutual TLS is often a requirement. mTLS authenticates both sides of the connection cryptographically rather than just the server. If these sectors are part of your target market, mTLS support belongs on your roadmap.

Delivery logs

Full delivery logs are what let you and your customer reconstruct exactly what happened when something goes wrong. Every delivery attempt should capture the timestamp, destination endpoint, request headers, request body, response status code, response body, response time, and delivery status. Every retry attempt for the same event should be logged separately and linked to the same event ID so the full delivery history of any event is traceable.
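As a data structure, one log record per attempt might look like this sketch. The field names mirror the list above; the status values and the helper function are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DeliveryAttempt:
    event_id: str                      # same ID across all retries of one event
    attempt_number: int
    timestamp: float
    endpoint: str
    request_headers: Dict[str, str]
    request_body: bytes
    response_status: Optional[int]     # None when no response was received
    response_body: Optional[bytes]
    response_time_ms: Optional[float]
    status: str                        # e.g. "success" or "failed" (assumed values)


def history_by_event(attempts: List[DeliveryAttempt]) -> Dict[str, List[DeliveryAttempt]]:
    """Group attempts by event ID, ordered by attempt number."""
    grouped: Dict[str, List[DeliveryAttempt]] = defaultdict(list)
    for attempt in attempts:
        grouped[attempt.event_id].append(attempt)
    for items in grouped.values():
        items.sort(key=lambda a: a.attempt_number)
    return dict(grouped)
```

Storing the event ID on every attempt is what makes the full delivery history of one event a single lookup.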

Customers need access to these logs directly through a dashboard and through an API. API access matters because it lets customers pull logs into their own systems, run their own queries, and satisfy their audit requirements without depending on your dashboard.

Be deliberate about what you log and what you do not. Full request and response bodies are useful for debugging but may contain personally identifiable information that creates compliance obligations if stored.

How Convoy handles this

None of these pieces is particularly complex in isolation. In production, across hundreds of customers, all at once, they become ongoing operational work that your team has to own.

You are not just implementing signing. You are managing secrets per tenant, handling rotation edge cases, isolating delivery infrastructure from SSRF risks, dealing with certificate failures, and maintaining delivery logs that customers depend on for debugging and audits.

This is the layer Convoy is built to handle.

Convoy is an open source webhooks gateway for API providers and SaaS teams that send webhooks at scale. You connect your application, configure your endpoints, and the security and delivery concerns described in this post are handled for you: signing, replay prevention, SSRF protection, rolling secrets, TLS enforcement, mTLS, and full delivery logs available through both a dashboard and an API.

Your customers get a self-service portal to manage endpoints, rotate secrets, and debug delivery issues without contacting your team. Your team gets to focus on building the product instead of maintaining webhook infrastructure.

Convoy can be self-hosted in your own infrastructure or fully managed on Convoy Cloud. Get started free at getconvoy.io.

Written by

Motunrayo Koyejo
