Contracts and expectations with data APIs

The phrase “application programming interface” (API) is quite generic. It covers interactions between software components as tight as data structures (e.g. the C++ Standard Template Library) and as loose as structured queries over the Internet (e.g. GitHub, Yelp, Yahoo! Finance). A piece of code that exposes an API offers a contract to its clients: “if you interact with me in this particular way, I will perform these actions or return this data.”

APIs work best when they can be evaluated objectively. If I query the GitHub API about which repositories belong to user ‘mattspitz’, and the list it returns is the list of my repositories, then the API has done its job. If it returns anything else, it’s wrong. If I ask the Yelp API for the Mexican restaurants within five miles of me that are open right now and it returns a post office, a Japanese restaurant, a Mexican restaurant in Arizona, or a restaurant that’s closed for the night, the API is wrong. Similarly, if I use the ‘cos()’ function to compute the cosine of 0, the API is correct if it returns 1 and incorrect otherwise.

It’s worth noting that correctness is different from ease of use. If I have to call the GitHub API with a particular username, the name of my favorite nearby Mexican restaurant, and the square root of 7894 in order to get a user’s repositories as a backwards, hex-encoded string, that’s annoying, but the API is still correct. Fortunately, I have the means to get the names of nearby Mexican restaurants and the square root of 7894, so that makes my life a little easier.
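As a rough illustration of what “evaluated objectively” means, the sketch below checks a response against the query itself. The `Place` type, its fields, the sample data, and both helper functions are hypothetical and are not part of the Yelp API. The check only confirms that every returned result is valid, not that nothing was left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Place:
    """Hypothetical result record; real APIs will have different fields."""
    name: str
    category: str
    distance_miles: float
    is_open: bool


def satisfies_query(place: Place, category: str, max_miles: float) -> bool:
    """True if a single result meets every constraint stated in the query."""
    return (
        place.category == category
        and place.distance_miles <= max_miles
        and place.is_open
    )


def is_correct_response(results: list[Place], category: str, max_miles: float) -> bool:
    """A response is objectively wrong if any result violates the query."""
    return all(satisfies_query(p, category, max_miles) for p in results)


# Made-up sample responses for the query "open Mexican restaurants within 5 miles".
good_response = [Place("Taqueria A", "mexican", 1.2, True)]
bad_response = good_response + [
    Place("Post office", "post office", 0.4, True),   # wrong category
    Place("Taqueria B", "mexican", 300.0, True),      # too far away
    Place("Taqueria C", "mexican", 2.0, False),       # closed for the night
]

print(is_correct_response(good_response, "mexican", 5.0))
print(is_correct_response(bad_response, "mexican", 5.0))
print(math.cos(0) == 1.0)  # the same kind of check for cos()
```

Nothing in this check involves opinion: the same inputs always yield the same verdict, which is what makes this style of API easy to test.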

Over the last couple of years, I’ve worked on both internal and external recommendations APIs. Given a description of a user (either a username or a set of preferences describing the user’s taste), these APIs produce personalized recommendations of a certain sort. As an example, “given that Lisa has been to these twenty-five restaurants in New York and San Francisco, recommend for her some Mexican restaurants within five miles of where she lives.” The same correctness expectation applies as above: if anything other than nearby Mexican restaurants is recommended, the API is incorrect. Now, though, there is also a subjective notion of correctness. Maybe Lisa has been to Casa de Guapo and doesn’t like it. Maybe she’s a vegan. Maybe she actually prefers Rosario’s (ranked 5th) to La Isla (ranked 3rd). The easy thing to say is that if the API produces good recommendations, it is correct, but even for the same set of results for the same user, two clients might (and often do) disagree about the results returned and their ranking.
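To make that disagreement concrete, one possible measure is the number of restaurant pairs that two rankings order differently. The function name and the three-item rankings below are illustrative assumptions (the rank positions are simplified from the example above), not something a real recommendations API provides.

```python
from itertools import combinations


def discordant_pairs(ranking_a: list[str], ranking_b: list[str]) -> int:
    """Count pairs of items that the two rankings place in opposite order.

    Assumes both rankings contain exactly the same items.
    """
    position_in_b = {name: i for i, name in enumerate(ranking_b)}
    return sum(
        1
        for first, second in combinations(ranking_a, 2)  # first precedes second in A
        if position_in_b[first] > position_in_b[second]  # but follows it in B
    )


# Hypothetical rankings of the same three valid results.
api_ranking = ["Casa de Guapo", "La Isla", "Rosario's"]
lisa_preference = ["Rosario's", "La Isla", "Casa de Guapo"]
other_client_preference = ["La Isla", "Casa de Guapo", "Rosario's"]

print(discordant_pairs(api_ranking, lisa_preference))          # 3
print(discordant_pairs(api_ranking, other_client_preference))  # 1
```

All three lists pass the objective test (every entry is a nearby Mexican restaurant), yet each judge would score the API’s ordering differently.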

This subjectivity can be reduced with explainability. Clients need to trust your results, and part of that trust comes from being able to see why the API returned them. If the API recommends La Isla with the context “Lisa has been to Jimmy’s Roadside Grill, and 80% of people who liked Jimmy’s also liked La Isla”, the client can understand the nature of that recommendation and is less likely to question the result. In fact, I’ve seen that, with context, users are likely to blame themselves if the recommendations are perceived as incorrect. If a cat owner is recommended dog food but sees that the reason is the dog toy she bought for her best friend last month, she’s less likely to get upset. At this point, the API is less subjective and easier to evaluate, much like the GitHub and Yelp APIs described above.
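One way to carry that context is to return the reason alongside each recommendation. The sketch below is only an assumption about how such a response might be shaped: the `Recommendation` class, its field names, and the hard-coded 80% figure (taken from the example above) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """Hypothetical recommendation record that carries its own justification."""
    name: str            # the recommended item
    because_of: str      # an item the user is known to have visited
    co_like_rate: float  # share of people who liked `because_of` and also liked `name`

    def explanation(self, user: str) -> str:
        """Render the observation behind the recommendation as a sentence."""
        return (
            f"{user} has been to {self.because_of}, and "
            f"{self.co_like_rate:.0%} of people who liked {self.because_of} "
            f"also liked {self.name}"
        )


rec = Recommendation(
    name="La Isla",
    because_of="Jimmy's Roadside Grill",
    co_like_rate=0.80,  # illustrative value, not real data
)
print(rec.explanation("Lisa"))
```

Because the explanation is built from a direct observation, a client can check the claim itself instead of arguing about taste.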

If you can’t explain recommendations because your algorithm relies on inference rather than direct observations, your job becomes one of managing your clients’ expectations. From there, it’s a slippery slope to adding knobs and levers to your API for every single client, and soon enough, you don’t have a generic recommendations API anymore.