Since 2002, when Bezos issued a mandate requiring all teams at Amazon to expose data and functionality through APIs, service-oriented architecture has come a long way. Most companies now expose and consume internal and external data and functionality through APIs, because APIs provide loose coupling: individual services can be managed, scaled, and evolved independently. The foundations of SOA also enable teams to move quickly in conceiving and building new functionality, to run faster testing cycles, and to deploy continuously through CI/CD. Over the last decade, REST has become the default modus operandi for implementing software services.

This article is called the API Manifesto because, as APIs have become extremely important to organizations, getting them right is critical. A manifesto is a public document proclaiming the aim of an organization, a team, or an individual. It declares not only the goal but also the means to achieve it. In this case, the manifesto is about achieving the best API outcome by employing the right design principles, implementing the right semantics, managing the expectations of API consumers, and using the right tools to monitor, control, and debug.

This page tries to cover the entire area of API management, which includes API design, implementation, authentication and authorization, rate limiting, audit logging, metrics, monetization, documentation, health checks, reporting, and so on. In other words, it is a discussion of the best practices for the design and implementation of REST APIs that are followed across the industry.

Best Practices in API Design

REST is an architectural style for modeling distributed systems as a set of resources. Resources can be data, objects, or services that clients can access. Every resource is represented by a URI which uniquely identifies it. REST APIs (based on HTTP) are built around the HTTP methods such as GET, POST, PUT, DELETE, and PATCH. A well-designed API will have the following characteristics:
Organizing the APIs around resources
Define operations in terms of HTTP methods

Most REST APIs use HTTP semantics to implement their operations. The commonly used HTTP methods are GET, POST, PUT, DELETE, HEAD, and PATCH.
The following table enumerates common implementation conventions for these HTTP methods with an example.

Design the APIs around HTTP semantics

All the guidelines in this section MUST be adhered to.

MIME types
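As a hedged illustration of typical media-type usage, the exchange below shows a JSON request and response. The endpoint, payload fields, and response code are assumptions made for illustration; application/json is simply the most common MIME type for REST APIs (the PATCH section later introduces application/merge-patch+json as another example).

```http
POST /orders HTTP/1.1
Host: api.xyz.com
Content-Type: application/json
Accept: application/json

{ "customerId": "C-1001", "items": [ { "sku": "SKU-1", "quantity": 2 } ] }

HTTP/1.1 201 Created
Content-Type: application/json
Location: /orders/12345

{ "id": "12345", "status": "CREATED" }
```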
GET Methods
POST Methods
PUT Methods
DELETE Methods
HEAD Methods
PATCH Methods

The PATCH request is used by clients to send updates to an existing resource in the form of a patch document. A patch document need not contain all the fields of the resource; in other words, it doesn't describe the whole resource, only the changes to be applied. There are mainly two JSON-based document formats for patching. For the sake of the discussion that follows, consider a resource such as an order (an illustrative sketch of both formats is shown further below).

JSON Patch: In this format, the document contains a list of entries, each carrying a specific directive such as an "add", "replace", "remove", "move", or "copy" operation.

JSON Merge Patch: This is a simpler format, where the patch document has a similar shape to the document used for resource creation but includes just the subset of fields that should be changed or added. In addition, a field can be deleted by setting it to null in the patch document. The media type for a JSON Merge Patch payload must be "application/merge-patch+json".

In both cases, 200 (OK) must be returned as the success response. If the document format is not supported, the server should return 415 (unsupported media type); if the document is invalid, 400 (bad request) must be returned; and if the document is valid but the changes cannot be applied to the object, 409 (conflict) must be returned.

Patterns for API Implementation

Guidelines in this section are based on various features of APIs, such as asynchronous operations and filtering. When implemented, the APIs SHOULD follow the common patterns listed below to bring consistency across all the APIs.

Handling Asynchronous Operations

Some update operations (POST, PUT, DELETE) might take a while to complete, and if the API waits for processing to finish before sending a response, it would cause unacceptable latency on the client's end. In this case, it is better to implement the operation asynchronously. An asynchronous operation returns the HTTP status code 202 (accepted) along with the URI of a "status" endpoint in the Location header (for example, Location: /orders/12345/status; the path is illustrative). A status endpoint must be implemented so that clients can get updates on the request: if the client sends a GET request to the endpoint specified in the Location header, it must receive the current status of the asynchronous request with a 200 (OK) status code. While the asynchronous request is in progress, if a DELETE is sent to the status endpoint, the processing should be canceled (if possible).

Data filtering, sorting, and pagination

When we expose a collection of resources (for example, orders), a large amount of data might be fetched when only a subset of the information would suffice. Say we provided a plain vanilla REST API to access orders for a given customer. If the client wants only those orders exceeding a specific amount, it has to fetch all orders, apply the filter, and then extract the information it needs. This is inefficient: processing power and bandwidth are wasted on both the client and the server. A better approach is for the client to pass a set of filters to the API and for the API to apply those filters while reading data from the data source. Any API potentially returning a large number of items should implement filtering and pagination. This also limits the possibility of DoS (denial of service) attacks on the application layer.
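Returning to the PATCH formats described above, here is an illustrative sketch. The order resource and its field values are assumptions made purely for illustration; only the patch semantics and media types come from the text and the underlying standards (RFC 6902 for JSON Patch, RFC 7386 for JSON Merge Patch). Suppose the stored resource is:

```json
{
  "id": "12345",
  "status": "CREATED",
  "shippingAddress": "221B Baker Street",
  "notes": "Leave at the front desk"
}
```

A JSON Patch document (media type application/json-patch+json) lists explicit operations to apply:

```json
[
  { "op": "replace", "path": "/status", "value": "SHIPPED" },
  { "op": "remove", "path": "/notes" }
]
```

An equivalent JSON Merge Patch document (media type application/merge-patch+json) carries only the changed fields, with null marking a deletion:

```json
{
  "status": "SHIPPED",
  "notes": null
}
```

Applying either document would return 200 (OK) on success, with 415, 400, or 409 returned for unsupported, invalid, or un-applicable documents respectively, as described above.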
An example of filtering is shown below:

/orders?mincost=500&status=SHIPPED&sort=createdDate&order=ascending

The above example is a GET request that passes filters as URL parameters. Since GET requests do not carry a request body, APIs supporting complex filters may need to be implemented as POST requests, even though that is semantically incorrect. The example also shows sorting directives passed using "sort" and "order": the sort parameter carries the field name by which the records should be sorted, and order carries either the "ascending" or "descending" directive.

An example of pagination is shown below:

/orders?limit=25&offset=50

The above example depicts pagination through limit and offset. The first page starts from offset = 0, and the limit represents the page size that the client expects. As the client moves to the next page, the limit usually remains the same while the offset keeps increasing. The API should implement a default page size and a maximum page size (to avoid DoS attacks). A paginated response is generally structured with prev and next links along with the limit and offset of the current page (an illustrative sketch appears further below). This allows the client program to navigate the pages easily. For the first page the prev link need not be provided, and for the last page the next link need not be provided; that way the client knows when it has reached the end of the data.

Versioning

All APIs evolve over time. As business requirements change, new resources may be added, old resources might be amended, and relationships between resources might change. However, the clients of the API might not have the bandwidth to consume the changes immediately. Hence, while continuing to innovate, improve, and evolve the APIs, it is imperative to keep existing client applications working without breaking their functionality. Versioning is the approach that enables us to isolate existing clients from breaking when new changes are released.

Versioning through URI

In this method, every time an API signature (data contract, behavior, or response) changes, a new version number is added to the URI of the resource, for example https://api.xx.com/v1/orders, where v1/v2/v3 in the path indicates the version number. The existing versions should continue to operate as before, returning resource representations conforming to the original schema. Even though this versioning mechanism is simple, it depends on the server being able to route requests to the appropriate endpoint based on the version path segment, and it becomes unwieldy as more and more versions are released. Navigability (including paths of objects) within REST results also becomes more complicated, since the paths need to include versions as well. Most API gateways support this type of version-based request routing.

Versioning through Query String

Rather than using the URI to determine the version, query string versioning uses a query parameter to specify the version of the API being invoked. For example:

/orders/12345?version=3&limit=25&offset=50

In this case, we need to implement a default version that is used when no version parameter is specified. Also, the versioning needs to be handled within the code, which must parse the query string and construct the object conforming to that version. In other words, the routing to the right API endpoint cannot be handled by an API gateway or a load balancer.
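Returning to the pagination discussion above, here is a hedged sketch of a paginated response. The field names and link format are assumptions; the text only requires prev and next links plus the limit and offset of the current page.

```json
{
  "limit": 25,
  "offset": 50,
  "prev": "/orders?limit=25&offset=25",
  "next": "/orders?limit=25&offset=75",
  "items": [
    { "id": "12345", "status": "SHIPPED", "createdDate": "2021-06-01" }
  ]
}
```

On the first page the prev link would be omitted, and on the last page the next link would be omitted, so the client knows when it has reached the end of the data.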
Versioning through Header

This approach works through a custom header that indicates the version of the API being invoked. The client is expected to add the custom header indicating the version. For example:

GET https://api.xyz.com/orders/12345 HTTP/1.1
Custom-Header: api-version=1

Even in this case, the routing cannot be done by an API gateway or load balancer.

Idempotency

Theoretically, idempotency means that the same operation repeated multiple times results in the same value, that is, F(x) = F(F(x)). Implement GET, PUT, and DELETE operations to be idempotent. In other words, the same request repeated on the same resource should result in the same state for the resource and the same response to the client, without causing any side effects. In the case of a hard delete, it is possible that the client gets a 204 (no content) response the first time but 404 (resource not found) on subsequent requests, because the resource has been hard deleted. Otherwise, the API should ensure that it returns the same response (unless there is a server error). POST operations that do not create a resource but perform processing on an object to move it from state A to state B are also ideal candidates for idempotency. In a loosely connected world of distributed systems with many points of failure (servers, routers, switches, etc.), idempotency reduces friction. Say a client initiated a transaction that timed out on the client's side; at this point the client doesn't know whether its request succeeded or failed. If the client retries the same transaction, the server can either respond with an error code (returned by a state machine or the database), or it can return the same response it would return on success. The second option requires more work, but it reduces ambiguity for the client.

Avoiding Chattiness

To avoid chattiness, it is recommended to support POST and PUT over entire collections (for example, /orders). A POST request should be able to accept an array of resources in the payload and create them in bulk, and a PUT request should be able to replace multiple resources in a collection.

Error handling

It is very important to pass the correct error codes and error descriptions to clients. Any internal errors need to be caught and appropriate error responses returned. The framework/platform implementing the APIs should make sure that uncaught errors are not propagated to clients. Try to avoid sending 500 status codes to clients, because they are unactionable. For example, if a client tries to delete an order when one of its lines is in the shipped state, return 409 (conflict) instead of 500 (system error); a sketch of such a response appears at the end of this section. If, due to any condition (or rule), the request cannot be fulfilled, return 400 (bad request). On many web servers and API gateways, you can configure authentication providers; this routine executes even before the request reaches the API endpoint. If an authentication error occurs on the web server or API gateway, it returns 401 (unauthorized). Once the client is authenticated, it is the responsibility of the API to authorize the client, that is, to check the client's privileges to execute the current API. If authorization fails, the API should return 403 (forbidden). The full list of standard HTTP response codes and their meanings is enumerated in the HTTP specification documents hosted by the W3C.
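As a sketch of the actionable error response recommended above for the order-deletion example: the body structure (code and message fields) is an assumption, while the choice of 409 (conflict) comes from the text.

```http
DELETE /orders/12345 HTTP/1.1
Host: api.xyz.com

HTTP/1.1 409 Conflict
Content-Type: application/json

{ "code": "ORDER_LINE_SHIPPED", "message": "Order 12345 cannot be deleted because one of its lines has already shipped." }
```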
Enabling Client-side Caching

In distributed systems, network latency cannot be wished away; the client experiences it every time it makes a request and receives the response. Wherever clients frequently send requests and receive responses, we should aim to reduce the amount of traffic flowing through the network. The HTTP protocol supports caching by clients and by the intermediate proxy servers through which a request is routed, using Cache-Control headers. When the server sends a response to a client request, it should include a Cache-Control header indicating whether the data in the body can be safely cached and for how long. An example of such a response is sketched further below. In that example, the Cache-Control header specifies that the content can be cached for 600 seconds (10 minutes) and only by a private client (such as a browser); in other words, the response will not be cached in shared caches (such as a proxy). Specifying "public" in the Cache-Control header enables caching in shared caches, whereas specifying "no-store" disables caching by clients. A word of caution: enabling client-side caching can make objects go stale in the cache. Based on this stale information, the client may try to update the object, which will cause data consistency issues. To avoid this, ETags need to be used.

Using API Gateway

In an API-centric world, we have to expose our APIs to clients that depend on them to get things done. However, it is not secure to expose the API endpoints directly; exposed endpoints can be attacked. API gateways serve the same purpose for APIs that proxy servers serve for web applications: they provide a layer of managed indirection, hiding the real endpoint from the consumer while monitoring and protecting it. The most important function of an API gateway is rate limiting, which reduces the occurrence of DoS (denial of service) attacks on the API. API gateways can also be used for offloading common functionality like SSL termination, authentication/authorization, metrics collection, audit logging, transformations, and so on. Some of the commercial and open-source API gateways available in the market are NGINX, MuleSoft, Kong, Zuul, etc.
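Returning to the client-side caching discussion, here is a sketch of the kind of response described there. The payload and URI are assumed; the header values (private, max-age of 600 seconds) follow the text, and an ETag header is included since the text recommends ETags for detecting stale updates.

```http
GET /orders/12345 HTTP/1.1
Host: api.xyz.com

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: private, max-age=600
ETag: "7d8f2a3b"

{ "id": "12345", "status": "SHIPPED" }
```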
Where an API gateway cannot be deployed, the assumption is that the API itself will implement the required features such as authentication, metrics, audit logging, and rate limiting.

Backward compatibility

An API is backward compatible if client code written for the previous version of the API works with the current version; in other words, a client written for version 1 can work with version 2. This has several advantages: clients don't have to invest development effort every time an API changes, and the release cycle is faster since no clients break because of a new release. Developers SHOULD, whenever possible, maintain backward compatibility of the resources and objects (input/output). If new changes make it impossible to maintain backward compatibility, a new resource and resource representation (input/output) will have to be created. The exception to this rule is when the current behavior or input/output constitutes a security threat, or when the API has been implemented incorrectly, affecting a large number of customers. In those cases, the API will be changed even if it breaks backward compatibility. However, when an API change is not backward compatible, customers need to be notified and educated.

Rules of Backward Compatibility

Stable URIs: The resource that existed at a given URI in the previous version should continue to exist at the same URI without a change in meaning. HTTP response codes should not change between versions. The resource may support new query parameters in new versions, but they SHOULD be optional; not providing them should not break the functionality. The new version of the resource can return a redirection response (301/302), which needs to be handled by the client; in this case, a Location header MUST be sent to the client.

Stable Representations (input/output objects): If a resource accepts a representation (input object) via POST or PUT, it MUST continue to accept the same representation in future versions. Additional properties are allowed but must NOT be mandatory. The default value substituted for an absent property must carry the same meaning as in the previous version.

Default values and limits: Default values with respect to page size, object (input/output) size limits, and rate limits (throttling) can change between versions.

Robustness: To make it easier for clients to use our APIs, we should build them robustly. An API should be resilient to failures and tolerant of variations in input data, query parameters, headers, etc. from clients. The API should decide how to handle a request based only on what it recognizes. For example, if a query parameter in the request is unrecognized, the server should ignore it; if any fields in the JSON payload are unrecognized, those fields should be ignored; if the header contains an unrecognized attribute, it must be ignored.

Monitoring API Health

API monitoring is serious business: many clients may depend on a business-critical API, and an outage will cause dependent services and clients to fail. Monitoring APIs for outages alone is not enough; we also need to monitor APIs for failure to meet the SLA (see the next section). The following attributes of an API call are recommended to be recorded, preferably at the API gateway level. Recording these metrics gives us strong analytics and alerting capability.
Recording these either in a time-series database (such as Prometheus, ClickHouse, or Druid) or in an analytics database as idempotent entries will be very useful for deep analysis of API performance and uptime.

External health check

Checking the health of APIs can be accomplished by periodically calling an API with demo/dummy data from outside the corporate network. When a consumer calls an external API, the call goes through a network of routers, switches, firewalls, load balancers, and gateways; even if one link in the chain is broken, the consumer experiences an outage. Internal monitoring will not provide a real-world view of an API's uptime, so it is absolutely necessary to have an external monitoring setup. Many companies, such as Postman, provide such a service. An API health check can be performed from a set of geographically distinct clients (e.g., US East, South Asia) to measure uptime, latency, and outages from a customer perspective.

Alerts

For API monitoring, it is not enough to measure the system's CPU, memory, or outages; it is equally important to make sure the API complies with a given SLO (see below). Alerts are very useful here: alerting should be set up not only for outages but also for breaches of SLO. In other words, an SLO breach (availability or performance) MUST be treated as an outage for operational purposes.

SLAs, SLOs, and SLIs

SLA (Service level agreement): The agreement a company makes with the consumer for a given API. SLAs are generally drawn up by business/legal teams in terms of responsiveness, uptime, and responsibilities (customer vs provider).

SLO (Service level objective): The objectives the team must meet to satisfy the SLA. In other words, an SLO is a line item within an SLA that refers to a specific metric such as response time or uptime. SLOs are the individual promises that hold the engineering and DevOps teams accountable. SLOs can also be defined for internal systems, for example a CRM system or an IAM system.

SLI (Service level indicator): Real metrics (numbers) gathered on the performance and availability of a service. In other words, the SLI measures compliance with a given SLO. For example, say the SLO for an API is 99.5% uptime; to meet the SLA, the measured SLI has to meet or exceed 99.5% uptime. It is obvious that before an SLA or SLO can be offered, an API has to undergo performance and scalability testing.

Providing a Monthly Uptime SLA

Actual Monthly Uptime Percentage = (A - B + C) / A, where:

Providing a Response Time SLA

A response time SLA for a given (set of) APIs can be provided in terms of the 90th, 95th, or 99th percentile response time, in milliseconds or seconds, over a fixed period of time; for example, a 95th percentile response time of 1 second calculated on a daily basis. An important point is that this metric needs to be monitored and published from the API gateways. The metric can show a lot of variance if measured from the last mile (the API consumer's end), since network delays add up.

Documentation

API documentation can be defined as a set of instructions on how to use an API effectively, written specifically for developers. It can be thought of as a reference manual containing all the information needed to work with the API, such as authentication/authorization, input/output payloads, headers, and parameters.
There are several API description formats available, such as the Swagger/OpenAPI specification and RAML. I have found the Stripe API specification to be one of the best examples of API documentation.

Code Samples/SDK

Unless we provide client libraries or SDKs to our clients for integrating with our APIs, there is no need to provide code samples. However, we do need to provide comprehensive examples of calling our APIs using curl commands, including authentication, query parameters, headers, and payloads. We MUST also provide complete JSON object examples for input/output. Here too, it is worth emulating Stripe.

Testbeds/Sandbox

A sandbox environment helps consumers of the APIs test their integrations and validate their application flows and use cases before deploying to their production environment.

Deprecation

An API can be in one of the following states.
We should strive to provide our customers with stable APIs. If we need to discontinue or remove features from an API, we should give the API consumers at least 60 days' notice, providing the following information
Documentation of each API should carry the current status of the API.