
Webhook inactive test in sandbox stopped succeeding

We're trying to develop webhook handling in our payment application. For development, we have set up an AWS ELB to forward traffic to an internal development machine, and we have pointed the webhook at the public address published by the ELB. We got the "Test Webhook" button to report "ping successful" exactly once. It showed "ping successful" that one time even though our application returned a 404 error, because the code to decode the webhook payload hasn't been written yet.

 

But since then, we always get the message "Error occured in connecting to the endpoint" in the Auth.net control panel when we run that test. We can check the network logging from the ELB, and none of the later test requests are reaching it, much less our development workstation. Is there something that could cause the "Test Webhook" button to stop working after it gets a 404 response? Failing that, has anyone used an AWS ELB this way and seen similar problems with webhooks in the sandbox?

dsandberg
Contributor
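The question notes that the code to decode the webhook payload had not been written yet, which is why the endpoint answered with a 404. As a starting point, here is a minimal Python sketch of a handler that checks the payload signature and acknowledges with a 2xx. It assumes Authorize.net sends an X-ANET-Signature header of the form sha512=<hex digest>, where the digest is an HMAC-SHA512 of the raw request body keyed with the account's Signature Key; check that assumption against the current Authorize.net webhook documentation. The names verify_anet_signature and handle_webhook are hypothetical and not part of any SDK.

```python
import hashlib
import hmac
import json
from typing import Dict, Optional, Tuple

# Assumption: the X-ANET-Signature header looks like "sha512=<hex digest>".
SIGNATURE_PREFIX = "sha512="


def verify_anet_signature(body: bytes, header_value: str, signature_key: str) -> bool:
    """Return True if header_value matches HMAC-SHA512(signature_key, body)."""
    if not header_value or not header_value.lower().startswith(SIGNATURE_PREFIX):
        return False
    # The received hex digest may be upper case; hexdigest() below is lower case.
    received = header_value[len(SIGNATURE_PREFIX):].strip().lower()
    expected = hmac.new(signature_key.encode("utf-8"), body, hashlib.sha512).hexdigest()
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))


def handle_webhook(
    body: bytes, headers: Dict[str, str], signature_key: str
) -> Tuple[int, Optional[dict]]:
    """Hypothetical handler: returns (HTTP status, parsed payload or None)."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if not verify_anet_signature(body, lowered.get("x-anet-signature", ""), signature_key):
        return 401, None  # unsigned or mis-signed request
    try:
        payload = json.loads(body)
    except ValueError:
        return 400, None  # signed, but not valid JSON
    # Acknowledge with a 2xx; slower processing can happen after responding.
    return 200, payload
```

Wiring handle_webhook into whatever web framework the application uses is left out on purpose; the status code it returns is what the framework would send back.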
Accepted Solutions
@dsandberg

A single 404 response (or dozens, or hundreds) won't inactivate your webhooks. Your endpoint is tied to a domain name, so as long as your ELB is routing traffic properly, the ELB is also a non-factor. How long have the ELB and the domain been set up? It sounds like a DNS issue. Auth.net is slow to pick up DNS changes, so if you've only recently set up or changed your DNS records, delivery will work intermittently. About 3 days seems to be the benchmark before you get consistent hits; until then it will be on and off. So if your server logs do not show a 500 response, I would say it is likely DNS resolution.


Renaissance
All Star
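Since the accepted answer points at DNS, one quick sanity check is whether the webhook hostname resolves at all from a machine outside your network. The Python sketch below uses only socket.getaddrinfo from the standard library; resolve_host is a hypothetical helper name. It reflects only the resolver of the machine it runs on, so a successful lookup does not prove that Authorize.net's resolvers have picked up the record, but a failed lookup points to a problem with the record itself.

```python
import socket
from typing import List


def resolve_host(hostname: str, port: int = 443) -> List[str]:
    """Return the sorted, unique IP addresses the local resolver gives for hostname.

    An empty list means the lookup failed from this machine. This reflects only
    the resolver used here, not the one Authorize.net uses.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    import sys

    # Usage: python resolve_check.py your.webhook.hostname [more hostnames...]
    for name in sys.argv[1:]:
        addresses = resolve_host(name)
        print(name, "->", ", ".join(addresses) if addresses else "DOES NOT RESOLVE")
```

Running it against the webhook hostname from a couple of different networks gives a rough picture of whether the record has propagated; it cannot show what Authorize.net's own resolvers have cached.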
Also, your localhost cannot receive webhooks directly; you need a publicly reachable address or a service that forwards traffic to it. To test your webhooks, use one of the free online endpoints. If pings reach that endpoint but not yours, you know the problem is DNS (again, assuming your server logs show no 500 response).
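A free online endpoint of this kind simply records every request it receives, which separates "Authorize.net cannot reach my domain" from "my application rejected the request". The Python sketch below is a minimal local stand-in built on the standard library's http.server: it answers every POST with 200 and keeps a record of each one. It is only reachable from Authorize.net if something like the ELB in this thread forwards public traffic to it; the names RECEIVED, start_server and send_test_post are hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # every POST this server has seen, in arrival order


class RecordingHandler(BaseHTTPRequestHandler):
    """Records each POST and always answers 200, like a request-bin service."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        RECEIVED.append({"path": self.path, "headers": dict(self.headers), "body": body})
        reply = b'{"status": "received"}'
        self.send_response(200)  # also prints an access-log line to stderr
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def start_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Serve in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), RecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_test_post(url: str, payload: dict) -> int:
    """POST payload as JSON to url and return the HTTP status code."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    # An empty ProxyHandler ignores proxy environment variables, so local tests go direct.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return response.status
```

To accept traffic forwarded from a load balancer rather than only local test posts, start it with start_server(host="0.0.0.0", port=...) on the port the forwarding targets. Watching RECEIVED, or the log line http.server prints for each request, then shows whether webhook deliveries are arriving at all, independently of whether the real handler is finished.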

Thanks for the input. We have had the second-level domain published for years, and the specific third-level domain was created more than a week ago, so as much as I wish it were, I don't think DNS resolution is to blame.

 

Regarding localhost: our ELB forwards these webhook messages to the internal development workstation over a VPN that knows the IP addresses of our internal systems. We can manually hit the same URL that is configured for the webhook from an external browser, and it reaches the endpoint in our application running on that development workstation. We see the ELB's traffic logs for every such access, and we see the request hit our application code.

But when we test the webhook pointed at that exact same URL (or when we set the webhook to active and then send a payment through the sandbox), Auth.net reports (for the webhook test only) that it couldn't resolve that URL, and we never see any request reach the ELB for those tests. The one exception, as I said, was the initial webhook test, which made it all the way through to our development workstation. Nothing has arrived since then, which is why I was concerned that the webhook had been silently disabled after a 404 or something similar. I was also concerned that AWS might have some sort of DDoS protection that kicked in after repeated webhook messages from Authorize.net, but our network/AWS specialist assures me this is not the case.

 

Thanks again, and please let me know if this brings to mind any other possible explanations.

Okay, as of this afternoon it looks like you were probably right that DNS resolution was the culprit, because it has started working consistently without any further changes on our side. That is a little troubling in its own way: if this DNS record has existed for upwards of two weeks, and it can take that long for Auth.net's DNS to update, I can imagine the same issue affecting new production environments as well (if production webhooks have the same kind of delay). But for now I'm just happy that I can start developing the code for handling the webhooks. Thanks much.

@dsandberg 

 

I imagine that somehow in the mix your ELB caused a longer delay. I haven't seen 2 weeks, but I've consistently seen 3 days. It might also have been a hiccup on Auth.net's side. Your TTL could also have played a part if you've been changing where the record points.

 

You will be fine for production, I think.