
User-ID / EDL … better both of them

By: Xavier Homs


User-ID or EDL … why not both?

Original Photo by Jon Tyson

User-ID and External Dynamic Lists (EDLs) are probably the most commonly used PAN-OS features to share external IP metadata with the NGFW. The use cases for them are countless: from blocking known DoS sources to forwarding traffic from specific IP addresses over low-latency links.

Let’s take, for instance, the following logs generated by the SSH daemon on a Linux host.

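Successful logins for that account typically look like this in the authentication log (the host name, process IDs and addresses below are illustrative):

Apr 12 09:32:17 server01 sshd[1201]: Accepted publickey for xhoms from 10.10.10.10 port 50122 ssh2
Apr 12 10:07:50 server01 sshd[1342]: Accepted password for xhoms from 10.10.20.20 port 51418 ssh2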

Wouldn’t it be great to have a way to share the IP addresses used by the user xhoms with the NGFW, so that specific security policies are applied to traffic sourced from those addresses just because we now know the endpoints are being managed by that user?

A shell CGI script like the following could be used to create an External Dynamic List (EDL) listing all known IP addresses from which the account xhoms has logged into this server.

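As a rough illustration of the idea, sketched in Go rather than shell for consistency with the rest of the article (the log path /var/log/auth.log and the sshd line format are assumptions), such a CGI program could look like this:

package main

import (
    "bufio"
    "fmt"
    "net/http"
    "net/http/cgi"
    "os"
    "strings"
)

// CGI program: prints, one per line, every source IP found in successful
// sshd login entries for the account "xhoms" (illustrative sketch only).
func main() {
    cgi.Serve(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        file, err := os.Open("/var/log/auth.log") // assumption: Debian-style auth log
        if err != nil {
            http.Error(w, "unable to read auth log", http.StatusInternalServerError)
            return
        }
        defer file.Close()
        seen := map[string]bool{}
        scanner := bufio.NewScanner(file)
        for scanner.Scan() {
            line := scanner.Text()
            // expected form: "... sshd[pid]: Accepted <method> for xhoms from <ip> port <port> ssh2"
            if !strings.Contains(line, "sshd") || !strings.Contains(line, "Accepted") {
                continue
            }
            fields := strings.Fields(line)
            for i := 0; i+3 < len(fields); i++ {
                if fields[i] == "for" && fields[i+1] == "xhoms" && fields[i+2] == "from" {
                    seen[fields[i+3]] = true
                }
            }
        }
        w.Header().Set("Content-Type", "text/plain")
        for ip := range seen {
            fmt.Fprintln(w, ip) // one IP address per line, EDL style
        }
    }))
}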

That is great but:

  • When would addresses be removed from the list? At daily log rotation?
  • Are all listed IP addresses still being operated by the user xhoms?
  • Is there any way to remove addresses from the list at user logout?

To overcome these (and many other) limitations, Palo Alto Networks NGFWs feature a very powerful IP tagging API called User-ID. Although EDL and User-ID cover similar objectives, there are fundamental technical differences between them:

  • User-ID is “asynchronous” (push mode), supports both “set” and “delete” operations, and provides a very flexible way to create groupings: either by tagging address objects (Dynamic Address Groups, DAGs) or by mapping users to addresses and then tagging those users (Dynamic User Groups, DUGs). User-ID also allows these tags to carry a timeout so that the PAN-OS device takes care of removing them at their due time.
  • EDL is “universal”. Almost all network appliance vendors provide a feature to fetch an IP address list from a URL.

Years ago it was up to the end customer to create their own connectors, just like the CGI script shared above. But as the presence of PAN-OS powered NGFWs grew among enterprise customers, more application vendors decided to leverage the value of User-ID by providing API clients out of the box. Although this is great (for Palo Alto Networks customers), losing the option to fetch a list from a URL (EDL mode) makes it harder to integrate legacy technologies that may still be out there in the network.

Creating EDLs from User-ID enabled applications

Let’s assume you have an application that features a PAN-OS User-ID API client and is capable of pushing log entries to the PAN-OS NGFW in an asynchronous way. Let’s assume, as well, that there are still some legacy network devices in your network that need to fetch that address metadata from a URL. How difficult would it be to create a micro-service for that?

One option would be to leverage a PAN-OS XML SDK like PAN Python or PAN GO and to create a REST API that extracts the current state from the PAN-OS device using its operations API.

GET https://<pan-os-device>/api
?key=<API-KEY>
&type=op
&cmd=<show><object><registered-ip><all></all></registered-ip></object></show>

HTTP/1.1 200 OK
<response status="success">
<result>
<entry ip="10.10.10.11" from_agent="0" persistent="1">
<tag>
<member>tag10</member>
</tag>
</entry>
<entry ip="10.10.10.10" from_agent="0" persistent="1">
<tag>
<member>tag10</member>
</tag>
</entry>
<count>2</count>
</result>
</response>

It shouldn’t be that difficult to parse the response and produce a plain list out of it, right? Come back to me if you’re thinking of using an XSLT processor piped to the output of a cURL command, packing everything as a CGI script, because we might have some common ground ;-)
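For instance, a minimal parsing sketch in Go (assuming the exact response layout shown above) could be:

package main

import (
    "encoding/xml"
    "fmt"
)

// structures mirroring the <response><result><entry ip="..."> layout above
type opResponse struct {
    Result struct {
        Entries []struct {
            IP   string   `xml:"ip,attr"`
            Tags []string `xml:"tag>member"`
        } `xml:"entry"`
    } `xml:"result"`
}

func main() {
    payload := []byte(`<response status="success"><result>
<entry ip="10.10.10.11" from_agent="0" persistent="1"><tag><member>tag10</member></tag></entry>
<entry ip="10.10.10.10" from_agent="0" persistent="1"><tag><member>tag10</member></tag></entry>
<count>2</count></result></response>`)
    var resp opResponse
    if err := xml.Unmarshal(payload, &resp); err != nil {
        panic(err)
    }
    for _, entry := range resp.Result.Entries {
        fmt.Println(entry.IP) // one address per line, just like an EDL expects
    }
}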

I’d love to propose a different approach though: to hijack User-ID messages by implementing a micro-service that behaves as a reverse proxy between the application that features the User-ID client and the PAN-OS device. I like this approach because it opens the door to other interesting use cases like enforcing timeouts (adding a timeout to entries that do not have one), converting DAG messages into DUG equivalents or adding additional tags based on the source application.

Components for a User-ID to EDL reverse proxy

The ingredients for such a recipe would be:

  1. a light HTTP web server featuring routing (hijack messages sent to /api and let other requests pass through). The Go net/http package and the httputil.ReverseProxy type fit like a glove in this case.
  2. a collection of XML-enabled GO structs to unmarshal captured User-ID messages.
  3. a library featuring User-ID message payload processing capable of keeping a valid (non-expired) state of User-to-IP, Group-to-IP and Tag-to-IP maps.

Fortunately for us there is the xhoms/panoslib Go module that covers 2 and 3 (OK, let’s be honest, I purpose-built the library to support this article).

With all these components it won’t be that difficult to reverse proxy a PAN-OS HTTPS service, monitoring User-ID messages and exposing the state of the corresponding entries on a virtual endpoint named /edl that won’t conflict at all with the PAN-OS URL schema.
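A bare-bones sketch of that wiring with the standard library (greatly simplified: no TLS handling and no real User-ID bookkeeping, which is where a library such as xhoms/panoslib would come in) might look like this:

package main

import (
    "bytes"
    "io"
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
    "os"
)

func main() {
    // PAN-OS device to forward everything to (TARGET as in the demo container)
    target, err := url.Parse("https://" + os.Getenv("TARGET"))
    if err != nil {
        log.Fatal(err)
    }
    proxy := httputil.NewSingleHostReverseProxy(target)

    mux := http.NewServeMux()
    // virtual endpoint: dump the current, non-expired entries as a plain text EDL
    mux.HandleFunc("/edl/", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "text/plain")
        // the real service would render here the state kept by the monitoring library
    })
    // everything else: peek at POST bodies (User-ID messages land on /api) and pass through
    mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        if r.Method == http.MethodPost {
            body, _ := io.ReadAll(r.Body)
            r.Body = io.NopCloser(bytes.NewReader(body)) // restore the body for the proxy
            // the real service would unmarshal and track the uid-message payload here
        }
        proxy.ServeHTTP(w, r)
    })
    log.Fatal(http.ListenAndServe(":8080", mux))
}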

I wanted to prove the point and ended up coding the recipe as a micro-service. You can either check the code in the GitHub repository xhoms/uidmonitor or run it from a Docker-enabled host. Take a look at the related panos.pan.dev tutorial for insights into the source code rationale.

docker run --rm -p 8080:8080 -e TARGET=<pan-os-device> ghcr.io/xhoms/uidmonitor

Some payload examples

To get started you can check the following User-ID payload, which registers the tag test on the IP addresses 10.10.10.10 and 10.10.20.20. Notice the different timeout values (100 vs. 10 seconds).

POST http://127.0.0.1:8080/api/ HTTP/1.1
Content-Type: application/x-www-form-urlencoded
key=<my-api-key>
&type=user-id
&cmd=<uid-message>
<type>update</type>
<payload>
<register>
<entry ip="10.10.10.10">
<tag>
<member timeout="100">test</member>
</tag>
</entry>
<entry ip="10.10.20.20">
<tag>
<member timeout="10">test</member>
</tag>
</entry>
</register>
</payload>
</uid-message>

Just after pushing the payload (before the 10 second tag expires) perform the following GET request on the /edl endpoint and verify the provided list contains both IP addresses.

GET http://127.0.0.1:8080/edl/
?list=tag
&key=test

HTTP/1.1 200 OK
Content-Type: text/plain
Date: Mon, 12 Apr 2021 10:07:50 GMT
Content-Length: 24
Connection: close
10.10.20.20
10.10.10.10

A few seconds later (more than 10, so the shorter tag has expired) the same transaction should return only the IP address whose tag has not expired.

GET http://127.0.0.1:8080/edl/
?list=tag
&key=test

HTTP/1.1 200 OK
Content-Type: text/plain
Date: Mon, 12 Apr 2021 10:08:55 GMT
Content-Length: 12
Connection: close
10.10.10.10

Now let’s use a slightly more complex payload. The following one registers (logs in) two users mapped to IP addresses and then applies the user tag admin to both of them. As in the previous case, each tag has a different expiration value.

POST http://127.0.0.1:8080/api/ HTTP/1.1
Content-Type: application/x-www-form-urlencoded
key=<my-api-key>
&type=user-id
&cmd=<uid-message>
<type>update</type>
<payload>
<login>
<entry name="bar@test.local" ip="10.10.20.20" timeout="100"></entry>
<entry name="foo@test.local" ip="10.10.10.10" timeout="100"></entry>
</login>
<register-user>
<entry user="foo@test.local">
<tag>
<member timeout="100">admin</member>
</tag>
</entry>
<entry user="bar@test.local">
<tag>
<member timeout="10">admin</member>
</tag>
</entry>
</register-user>
</payload>
</uid-message>

Although the login transaction has a longer timeout in both cases, the group membership tag for the user “bar@test.local” is expected to time out in 10 seconds.

Calling the /edl endpoint at specific times would demonstrate the first tag expiration.

GET http://127.0.0.1:8080/edl/
?list=group
&key=admin

HTTP/1.1 200 OK
Content-Type: text/plain
Date: Mon, 12 Apr 2021 10:28:16 GMT
Content-Length: 24
Connection: close
10.10.20.20
10.10.10.10
...
GET http://127.0.0.1:8080/edl/
?list=group
&key=admin

HTTP/1.1 200 OK
Content-Type: text/plain
Date: Mon, 12 Apr 2021 10:28:32 GMT
Content-Length: 12
Connection: close
10.10.10.10

Summary

Both EDL (because it provides “universal” coverage) and User-ID (because of its advanced feature set) have their place in a large network. Implementing a reverse proxy capable of hijacking User-ID messages between User-ID enabled applications and the managed PAN-OS device has many different use cases: from basic User-ID to EDL conversion (as in the demo application) to advanced cases like applying policies to User-ID messages, enforcing maximum timeouts or transposing Address Groups into User Groups.



Ingest PAN-OS Alerts into Cortex XDR Pro Endpoint


By: Amine Basli


Photo by callmefred.com

In this blog I’m sharing my experience leveraging the Cortex XDR APIs to tailor-fit this product to specific customer requirements.

I bet that if you’re a Systems Engineer operating in EMEA you’ve argued more than once about the way products and licenses are packaged from an opportunity size point of view (too big for your customer base). There are many reasons that explain why a given vendor would like to have a reduced set of SKUs. But, no doubt, the fewer SKUs you have the lower the chances are for a given product to fit exactly into the customer’s need from a sizing standpoint.

At Palo Alto Networks we’re no different, but we execute on two principles that help bridge the gaps:

  • Provide APIs for our products
  • Encourage the developer community to use them through our Developer Relations team

The use case

A Cortex XDR Pro Endpoint customer was interested in ingesting threat logs from their PAN-OS NGFW into their tenant to stitch them with the agent’s alerts and incidents. The customer was aware that this requirement could be achieved by sharing the NGFW state into the cloud-based Cortex Data Lake, but the corresponding SKU was overkill from a feature point of view (it provides not only alert stitching but ML baselining of traffic behaviour as well) and carried an unreachable cost from a budgeting perspective.

Cortex XDR Pro provides a REST API to ingest third-party alerts that covers this specific use case. It is rate limited to 600 alerts per minute per tenant, but that was more than enough for my customer, who was only interested in medium to critical alerts that appeared at a much lower frequency in their network. What about leveraging this API to provide an exact match to my customer’s requirement?

On the other end, PAN-OS can be configured to natively forward these filtered alerts to any REST API endpoint, with a very flexible payload templating feature. So at first it looked like we could “connect” the PAN-OS HTTP Log Forwarding feature with the Cortex XDR Insert Parsed Alert API. But a deeper analysis revealed an incompatibility in the mandatory timestamp field: XDR only accepts it as a UNIX timestamp in milliseconds, while PAN-OS forwards timestamps as formatted date strings (as in the test payload shown later).
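That conversion is the heart of the transformation. A minimal sketch in Go, assuming the PAN-OS time_generated layout used in the test payload below (2021/01/06 20:44:34) and a configurable offset for firewalls that do not log in UTC:

package main

import (
    "fmt"
    "time"
)

// toEpochMilliseconds turns a PAN-OS style timestamp (no time zone information)
// into the UNIX millisecond value expected by the XDR Insert Parsed Alert API.
// offset is the firewall's local offset from UTC (see the OFFSET variable below).
func toEpochMilliseconds(ts string, offset time.Duration) (int64, error) {
    t, err := time.Parse("2006/01/02 15:04:05", ts)
    if err != nil {
        return 0, err
    }
    return t.Add(-offset).UnixMilli(), nil
}

func main() {
    ms, err := toEpochMilliseconds("2021/01/06 20:44:34", 0)
    if err != nil {
        panic(err)
    }
    fmt.Println(ms) // 1609965874000
}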

We were too close to give up. I just needed a transformation pipeline that could be used as a middleman between the PAN-OS HTTP log forwarding feature and the Cortex XDR Insert Parsed Alert API. And provided I was going to have a middleman anyway, I wanted it to enforce the XDR API quota limitations as well (600 alerts per minute / up to 60 alerts per update).

Developers to the rescue

I engaged our Developer Relations team at Palo Alto Networks because they’re always eager to discuss new use cases for our APIs. A quick discussion ended with the following architecture proposal:

  • An HTTP server would be implemented to be used as the endpoint for PAN-OS alert ingestion. Basic authentication would be implemented by providing a pre-shared key in the Authorization header.
  • A transformation pipeline would convert the PAN-OS payload into a ready-to-consume XDR parsed alert API payload. The pipeline would take care, as well, to enforce the XDR API quota limits buffering bursts as needed.
  • The XDR API client implementation would use an Advanced API Key for authentication.
  • Everything would be packaged in a Docker container to ease its deployment. Configuration parameters would be provided to the container as environment variables.

The implementation details ended up being shared in a tutorial available on cortex.pan.dev. If you’re in the mood to create your own implementation I highly recommend taking the time to go over the whole tutorial. If you just want to get to the point then you can use the ready-to-go container image.

Let’s get started

I opted for using the available container image. Let me guide you through my experience using it to fulfil my customer’s request.

First of all, the application requires some configuration data that must be provided as a set of environment variables. A few of them are mandatory; the others are optional with default values.

The following are the required variables (the application will refuse to start without them)

  • API_KEY: XDR API Key (Advanced)
  • API_KEY_ID: The XDR API Key identifier (its sequence number)
  • FQDN: Fully Qualified Domain Name of the corresponding XDR Instance (e.g. myxdr.xdr.us.paloaltonetworks.com)

The following are optional variables

  • PSK: the server will check the value in the Authorization header to accept the request (defaults to no authentication)
  • DEBUG: if it exists then the engine will be more verbose (defaults to false)
  • PORT: TCP port to bind the HTTP server to (defaults to 8080)
  • OFFSET: PAN-OS timestamps do not include time zone information; by default they are treated as UTC (defaults to +0 hours)
  • QUOTA_SIZE: XDR ingestion alert quota (defaults to 600)
  • QUOTA_SECONDS: XDR ingestion alert quota refresh period (defaults to 60 seconds)
  • UPDATE_SIZE: XDR ingestion alert max number of alerts per update (defaults to 60)
  • BUFFER_SIZE: size of the pipe buffer (defaults to 6000 = 10 minutes)
  • T1: how often the pipe buffer is polled for new alerts (defaults to 2 seconds)

For more details, you can check the public repository documentation.

Step 1: Generate an Advanced API key on Cortex XDR

Connect to your Cortex XDR instance and navigate to Settings > API Keys. Generate an API Key of type Advanced and grant it the Administrator role (that role is required for alert ingestion).

Step 2: Run the container image

Assuming you have Docker installed on your computer, the following command would pull the image and run the micro-service with configuration options passed as environment variables.

docker run --rm -p 8080:8080 -e PSK=hello -e FQDN=xxx.xdr.us.paloaltonetworks.com -e API_KEY=<my-api-key> -e API_KEY_ID=<my-key-id> -e DEBUG=yes ghcr.io/xhoms/xdrgateway

The DEBUG option provides more verbosity and is recommended for initial experimentation. If everything goes as expected you’ll see a log message like the following one:

2021/03/17 21:32:25 starting http service on port 8080

Step 3: Perform a quick test to verify the micro-service is running

Use any API test tool (I like good old cURL) to push a test payload.

curl -X POST -H "Authorization: hello" "http://127.0.0.1:8080/in" -d '{"src":"1.2.3.4","sport":1234,"dst":"4.3.2.1","dport": 4231,"time_generated":"2021/01/06 20:44:34","rule":"test_rule","serial":"9999","sender_sw_version":"10.0.4","subtype":"spyware","threat_name":"bad-bad-c2c","severity":"critical","action":"alert"}---annex---"Mozi Command and Control Traffic Detection"'

You should receive an empty status 200 response and see log messages in the server console like the following ones:

2021/03/17 21:52:16 api - successfully parsed alert
2021/03/17 21:52:17 xdrclient - successful call to insert_parsed_alerts

Step 4: Configure HTTP Log Forwarding on the firewall

The PAN-OS NGFW can forward alerts to HTTP/S endpoints. The feature configuration is available in Device > Server Profiles > HTTP and, for this case, you should use the following parameters:

  • IP: IP address of the container host
  • Protocol: HTTP
  • Port: 8080 (default)
  • HTTP Method: POST

Some notes regarding the payload format

  • URI Format: /in is the endpoint where the micro-service listens for POST requests containing new alerts
  • Headers: Notice we’re setting the value hello in the Authorization header to match the -e PSK=hello configuration variable we passed to the micro-service
  • Payload: The variable $threat_name was introduced with PAN-OS 10.0. If you’re using older versions of PAN-OS you can use the variable $threatid instead

The last action to complete the job is to create a new Log Forwarding profile that consumes our recently created HTTP server profile and to attach it to all security rules whose alerts we want ingested into XDR. The configuration object is available at Objects > Log Forwarding.

Step 5: Final checks

As your PAN-OS NGFW starts generating alerts, you should see activity in the micro-service’s output log and alerts being ingested into the Cortex XDR instance. The source tag Palo Alto Networks — PAN-OS clearly indicates these alerts were pushed by our deployment.

Production-ready version

There are a couple of additional steps to perform before considering the micro-service production-ready.

  • The service should be exposed over a secure channel (TLS). The best option is to leverage the preferred TLS-terminating proxy available in your container environment. Notice the image honours the PORT environment variable and should work out-of-the-box almost everywhere.
  • A restart policy should be in place to restart the container in case of a crash. It is not a good idea to run more than one instance in a load-balancing group because the quota enforcement won’t be synchronised between them.

Summary

Palo Alto Networks products and services are built with automation and customisation in mind. They feature rich APIs that can be used to tailor them to specific customer needs.

In my case these APIs provided the customer with a choice:

  • Either wait for the scope of the project (and its budget) to increase in order to accommodate additional products like Cortex Network Traffic Analysis or
  • Deploy a compact “middle-man” that would connect the PAN-OS and Cortex XDR APIs

This experience empowered me by helping me acquire DevOps knowledge (building micro-services) that I’m sure I’ll use on many occasions to come. Special thanks to the Developer Relations team at Palo Alto Networks, who provided documentation and examples and were eager to explore this new use case. I couldn’t have done it without their help and the work they’ve been putting into improving our developer experience.



Enterprise API design practices: Part 4


By: Francesco Vigo


Photo by Andrés Dallimonti on Unsplash

Welcome to the last part of my series on creating valuable, usable and future-proof Enterprise APIs. I initially covered some background context and the importance of design, then I presented security and backend protection and in the third chapter I explored optimizations and scale; this post is about monitoring and Developer Experience (DX).

5. Monitor your APIs

Don’t fly blind: you must make sure that your APIs are properly instrumented so you can monitor what’s going on. This is important for a lot of good reasons, such as:

  • Security: is your service under attack or being used maliciously? You should always be able to figure out if there are anomalies and react swiftly to prevent incidents.
  • Auditing: depending on the nature of your business, you might be required by compliance or investigative reasons to produce an audit trail of user activity of your product, which requires proper instrumentation of your APIs as well as logging events.
  • Performance and scale: you should be aware of how your API and backend are performing and if the key metrics are within the acceptable ranges, way before your users start complaining and your business is impacted. It’s always better to optimize before performance becomes a real problem.
  • Cost: similarly, with proper instrumentation, you can be aware of your infrastructure costs sustained to serve a certain amount of traffic. That can help you with capacity planning and cost modeling if you’re monetizing your service. And to avoid unexpected bills at the end of the month that might force you to disrupt the service.
  • Feedback: with proper telemetry data to look at when you deploy a change, you can understand how it’s performing and whether it was a good idea to implement it. It also allows you to implement prototyping techniques such as A/B testing.
  • Learn: analyzing how developers use your API can be a great source of learning. It will give you useful insights on how your service is being consumed and is a valuable source of ideas that you can evaluate for new features (i.e. new use-case driven API endpoints).

Proper instrumentation of services is a vast topic and here I just want to summarize a few items that are usually easy to implement:

  • Every API request should have a unique operation identifier that is stored in your logs and can help your ops team figure out what happened. This identifier should also be reported back to clients somewhere (usually in the API response headers), especially, but not exclusively, in case of errors. A minimal sketch of this follows the list.
  • Keep an eye on API requests that fail with server errors (i.e. the HTTP 5xx ones) and, if the number is non-negligible, try to pinpoint the root cause: is it a bug, or some request that is causing timeouts in the backend? Can you fix it or make it faster?
  • Keep a reasonable log history to allow tracking errors and auditing user activity for at least several days back in time.
  • Create dashboards that help you monitor user activity, API usage, security metrics, etc. And make sure to check them often.
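As an illustration of the first item, a minimal request-ID middleware sketch in Go (the X-Request-ID header name and the random ID scheme are arbitrary choices):

package main

import (
    "crypto/rand"
    "encoding/hex"
    "log"
    "net/http"
)

// requestID wraps a handler so that every request gets a unique operation
// identifier that is both logged and echoed back to the client.
func requestID(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        buf := make([]byte, 8)
        if _, err := rand.Read(buf); err != nil {
            http.Error(w, "internal error", http.StatusInternalServerError)
            return
        }
        id := hex.EncodeToString(buf)
        w.Header().Set("X-Request-ID", id) // reported back to the client
        log.Printf("request_id=%s method=%s path=%s", id, r.Method, r.URL.Path)
        next.ServeHTTP(w, r)
    })
}

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/ping", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("pong"))
    })
    log.Fatal(http.ListenAndServe(":8080", requestID(mux)))
}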

6. Make Developer Experience a priority

Last principle, but definitely not the least important. Even if you are lucky and the developers that use your APIs are required to do so because of an external mandate, you shouldn’t care any less about making their experience a positive one.

In fact, the Developer Experience of your product should be awesome.

If developers properly understand how to work with your APIs, and enjoy doing so, they will likely run into fewer issues and be more patient when trying to overcome them. They will also provide you with good quality feedback. They will generate fewer requests for your support teams. They will give you insights and use case ideas for building a better product.

And, more importantly, happy developers will ultimately build better products for their own customers which, in turn, will act as a force multiplier for the value and success of your own product. Everybody wins in the API Economy model: customers, business stakeholders, partners, product teams, engineers, support teams.

I’ve witnessed several situations where developers were so exhausted and frustrated with working against a bad API that, as soon as they saw the light at the end of the tunnel, they stopped thinking creatively and just powered their way through to an MVP that was far from viable and valuable. But it checked the box they needed so they were allowed to move on. I consider this a very specific scenario of developer fatigue: let’s call it “API Fatigue”.

Luckily, I’ve also experienced the other way around, where new, unplanned, great features were added because the DX was good and we had fun integrating things together.

There are many resources out there that describe how to make APIs with great developer experience: the most obvious one is to create APIs that are clear and simple to consume. Apply the Principle of least astonishment.

I recommend considering the following when shipping an API:

  • Document your API properly: it’s almost certain that the time you spend creating proper documentation at the beginning is saved later on when developers start consuming it. Specification files, such as the OpenAPI Specification (OAS), help tremendously here. Also, if you follow the design-first approach that we discussed in the first post of this series, you’ll probably already have a specification file ready to use.
  • Support different learning paths: some developers like to read all the documentation from top to bottom before writing a single line of code, others will start with the code editor right away. Instead of forcing a learning path, try to embrace the different mindsets and provide tools for everyone: specification files, Postman collections, examples in different programming languages, an easy-to-access API sandbox (i.e. something that doesn’t require installing an Enterprise product and waiting 2 weeks to get a license to make your first API call), a developer portal, tutorials and, why not, video walkthroughs. This might sound overwhelming but pays off in the end. With the proper design done first a lot of this content can be autogenerated from the specification files.
  • Document the data structure: if you can, don’t just add lists of properties to your specs and docs: strive to provide proper descriptions of the fields that your API uses so that developers who are not fully familiar with your product can understand them. Remove ambiguity from the documentation as much as possible. This can go a long way, as developers can build mental models by properly understanding the data, which often leads to better use cases than “just pull some data from this API”.
  • Use verbs, status codes, and error descriptions properly: leverage the power of the protocol you are using (i.e. HTTP when using REST APIs) to define how to do things and what responses mean. Proper usage of status codes and good error messages will dramatically reduce the number of requests to your support team. Developers are smart and want to solve problems quickly so if you provide them with the right information to do so, they won’t bother you. Also, if you are properly logging and monitoring your API behavior, it will be easier for your support team to troubleshoot if your errors are not all “500 Internal Server Error” without any other detail.

Finally, stay close to the developers: especially if your API is being used by people external from your organization, it’s incredibly important to be close to them as much as you can to support them, learn and gather feedback on your API and its documentation. Allow everyone that is responsible for designing and engineering your API to be in that feedback loop, so they can share the learnings. Consider creating a community where people can ask questions and can expect quick answers (Slack, Reddit, Stack Overflow, etc.). I’ve made great friendships this way!

A few examples

There are many great APIs out there. Here is a small, not complete, list of products from other companies that, for one reason or another, are strong examples of what I described in this blog series:

Conclusion

And that’s a wrap, thanks for reading! There are more things that I’ve learned that I might share in future posts, but in my opinion, these are the most relevant ones. I hope you found this series useful: looking forward to hearing your feedback and comments!



Enterprise API design practices: Part 3


By: Francesco Vigo


Photo by Youssef Abdelwahab on Unsplash

Welcome to the third part of my series on creating valuable, usable, and future-proof Enterprise APIs. Part 1 covered some background context and the importance of design, while the second post was about security and backend protection; this chapter is on optimization and scale.

3. Optimize interactions

When dealing with scenarios that weren’t anticipated in the initial release of an API (for example when integrating with an external product from a new partner), developers often have to rely on data-driven APIs to extract information from the backend and process it externally. While use-case-driven APIs are generally considered more useful, sometimes there might not be one available that suits the requirements of the novel use case that you must implement.

By considering the following guidelines when building your data-driven APIs, you can make them easier to consume and more efficient for the backend and the network, improving performance and reducing the operational cost (fewer data transfers, faster and cheaper queries on the DBs, etc.).

I’ll use an example with sample data. Consider the following data as a representation of your backend database: an imaginary set of alerts that your Enterprise product detected over time. Instead of just three, imagine the following JSON output with thousands of records:

{
  "alerts": [
    {
      "name": "Impossible Travel",
      "alert_type": "Behavior",
      "alert_status": "ACTIVE",
      "severity": "Critical",
      "created": "2020-09-27T09:27:33Z",
      "modified": "2020-09-28T14:34:44Z",
      "alert_id": "8493638e-af28-4a83-b1a9-12085fdbf5b3",
      "details": "... long blob of data ...",
      "evidence": [
        "long list of stuff"
      ]
    },
    {
      "name": "Malware Detected",
      "alert_type": "Endpoint",
      "alert_status": "ACTIVE",
      "severity": "High",
      "created": "2020-10-04T11:22:01Z",
      "modified": "2020-10-08T08:45:33Z",
      "alert_id": "b1018a33-e30f-43b9-9d07-c12d42646bbe",
      "details": "... long blob of data ...",
      "evidence": [
        "long list of stuff"
      ]
    },
    {
      "name": "Large Upload",
      "alert_type": "Network",
      "alert_status": "ACTIVE",
      "severity": "Low",
      "created": "2020-11-01T07:04:42Z",
      "modified": "2020-12-01T11:13:24Z",
      "alert_id": "c79ed6a8-bba0-4177-a9c5-39e7f95c86f2",
      "details": "... long blob of data ...",
      "evidence": [
        "long list of stuff"
      ]
    }
  ]
}

Imagine implementing a simple /v1/alerts REST API endpoint to retrieve the data, knowing that you can’t anticipate all the future needs. I recommend considering the following guidelines:

  • Filters: allow your consumers to reduce the result set by offering filtering capabilities in your API on as many fields as possible without stressing the backend too much (if some filters are not indexed it could become expensive, so you must find the right compromise). In the example above, good filters might include: name, severity, alert_type, alert_status, created, and modified. More complicated fields like details and evidence might be too expensive for the backend (as they might require full-text search) and you would probably leave them out unless really required.
  • Data formats: be very consistent in how you present and accept data across your API endpoints. This holds true especially for types such as numbers and dates, or complex structures. For example, to represent integers in JSON you can use numbers (i.e. "fieldname": 3) or strings (i.e. "fieldname": "3" ): no matter what you choose, you need to be consistent across all your API endpoints. And you should also use the same format when returning outputs and accepting inputs.
  • Dates: dates and times can be represented in many ways: timestamps (in seconds, milliseconds, microseconds), strings (ISO8601 such as in the example above, or custom formats such as 20200101), with or without time zone information. This can easily become a problem for the developers. Again, the key is consistency: try to accept and return only a single date format (i.e. timestamp in milliseconds or ISO8601) and be explicit about whether you consider time zones or not: usually choosing to do everything in UTC is a good idea because it reduces ambiguity. Make sure to document the date formats properly.
  • Filter types: depending on the type of field, you should provide appropriate filters, not just equals. A good example is supporting range filters for dates that, in our example above, allow consumers to retrieve only the alerts created or modified in a specific interval. If some fields are enumerators with a limited number of possible values, it might be useful to support a multi-select filter (i.e. IN): in the example above it should be possible to filter by severity values and include only the High and Critical values using a single API call.
  • Sorting: is your API consumer interested only in the older alerts or the newest? Supporting sorting in your data-driven API is extremely important. One field to sort by is generally enough, but sometimes (depending on the data) you might need more.
  • Result limiting and pagination: you can’t expect all the entries to be returned at once (and your clients might not be interested or ready to ingest all of them anyway), so you should implement some logic where clients should retrieve a limited number of results and can get more when they need. If you are using pagination, clients should be able to specify the page size within a maximum allowed value. Defaults and maximums should be reasonable and properly documented.
  • Field limiting: consider whether you really need to return all the fields of your results all the time, or if your clients usually just need a few. By letting the client decide which fields (or groups of fields) your API should return, you can reduce network throughput and backend cost, and improve performance. You should provide and document some sane default. In the example above, you could decide to return by default all the fields except details and evidence, which can be requested by the client only if they explicitly ask, using an include parameter.

Let’s put it all together. In the above example, you should be able to retrieve, using a single API call, something like this:

Up to 100 alerts that were created between 2020-04-01T00:00:00Z (April 1st, 2020) and 2020-10-01T00:00:00Z (October 1st, 2020) with severity “medium” or “high”, sorted by “modified” date, newest first, including all the fields but “evidence”.
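One of many possible ways to express that as a single REST call (all parameter names here are made up for illustration):

GET /v1/alerts
?created_from=2020-04-01T00:00:00Z
&created_to=2020-10-01T00:00:00Z
&severity=medium,high
&sort=-modified
&limit=100
&exclude=evidence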

There are multiple ways you can implement this; through REST, GraphQL, or custom query languages: in many cases, you don’t need something too complex as often data sets are fairly simple. The proper way depends on many design considerations that are outside the scope of this post. But having some, or most of these capabilities in your API will make it better and more future proof. By the way, if you’re into GraphQL, I recommend reading this post.

4. Plan for scale

A good API shouldn’t be presumptuous: it shouldn’t assume that clients have nothing better to do than wait for its response, especially at scale, where performance is key.

If your API requires more than a few milliseconds to produce a response, I recommend considering supporting jobs instead. The logic can be as follows (sketched below as hypothetical HTTP exchanges):

  • Implement an API endpoint to start an operation that is supposed to take some time. If accepted, it would return immediately with a jobId.
  • The client stores the jobId and periodically reaches out to a second endpoint that, when provided with the jobId, returns the completion status of the job (i.e. running, completed, failed).
  • Once results are available (or some are), the client can invoke a third endpoint to fetch the results.
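For illustration only (the endpoint names, job identifier and response bodies are made up):

POST /v1/scans HTTP/1.1

HTTP/1.1 202 Accepted
{"jobId": "42", "status": "running"}

GET /v1/scans/42 HTTP/1.1

HTTP/1.1 200 OK
{"jobId": "42", "status": "completed"}

GET /v1/scans/42/results HTTP/1.1

HTTP/1.1 200 OK
{"results": ["..."]}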

Other possible solutions include publisher/subscriber approaches or pushing data with webhooks, also depending on the size of the result set and the speed requirements. There isn’t a one-size-fits-all solution, but I strongly recommend avoiding a blocking design where API clients are kept waiting for the server to reply while it runs long jobs in the backend.

If you need high performance and throughput in your APIs, consider gRPC, as its binary representation of data using protocol buffers has significant speed advantages over REST.

Side note: if you want to learn more about REST, GraphQL, webhooks, or gRPC use cases, I recommend starting from this post.

Finally, other considerations for scale include supporting batch operations on multiple entries at the same time (for example mass updates), but I recommend considering them only when you have a real use case in mind.

What’s next

In the next and last chapter, I’ll share some more suggestions about monitoring and supporting your API and developer ecosystem.



Enterprise API design practices: Part 2


By: Francesco Vigo


Photo by Dominik Vanyi on Unsplash

Welcome to the second part of my series on creating valuable, usable and future-proof Enterprise APIs. In the first post, I covered some background context and the importance of design; this chapter is about security and protecting your backend.

1. Secure your APIs

Most of the projects I’ve been involved with revolve around Cybersecurity, but I don’t consider myself biased when I say that security should be a top priority for every Enterprise API. Misconfigurations and attacks are common and you should go to great lengths to protect your customers and their data.

Besides the common table stakes (i.e. use TLS and some form of authentication), when it comes to API security I recommend considering at least the following:

  • API Credentials: OAuth 2 should lead the way but, especially for on-premise products, API Keys are still commonly used. In any case, you should support multiple sets of API credentials in your product to allow for more granularity and proper auditing and compliance. For example, every individual service (or user) that consumes your API should have a dedicated set of credentials, which will make it easier to track down from the audit logs who/what performed an operation and apply proper rate limits.
  • API RBAC (Role-Based Access Control): another important reason to support multiple sets of API credentials is the ability to assign different permissions to each set. This way your customers won’t give administrative privileges to a set of credentials used, for example, to perform only read-only routine operations. Having RBAC is not enough: you need to guide your customers to use it properly. A good solution is to enforce a flow in your product UI that discourages users from creating API credentials with unnecessary permissions (assigning permissions in an additive way rather than subtractive, for example).
  • Credentials shouldn’t last forever: it’s a good practice to create sets of API credentials that expire at some point or require a manual reauthorization. How long a credential should stay valid is a good question, as you want to make sure that a business process doesn’t fail because someone forgot to reauthorize something that was set up 6 months ago. There are ways to mitigate this risk in your product UI. I recommend showing a list of the API credential sets (without showing the actual secrets!), when they expire, and how long ago they were last used. This will help customers identify and delete stale credentials and refresh expiring ones.

2. Protect your backend

Don’t trust your API users: they are likely to break things when given a chance! Not maliciously, of course, but there are so many things that can go wrong when you grant external users/applications access to the heart of your product through an API. And yes, bad guys and Denial of Service attacks exist too.

Just like product UIs and CLI tools validate user inputs and behavior, your APIs should do it too. But it goes beyond just making sure that all the required fields are present and correctly typed (if an API doesn’t do it properly, I would file a bug to get it fixed): include cost and performance in your design considerations.

Consider the following examples:

  • Example 1: A proprietary API serving data from a Public Cloud database backend: in many cases such backends charge by the amount of data processed in each query. If your API allows consumers to freely search and return unlimited data from the backend, you might end up with an expensive bill.
  • Example 2: An on-premise product with an API that allows users to provide files to be scanned for malicious behavior (i.e. a sandbox). A developer makes a mistake and forgets to add a sleep() in a cycle, hammering the API with tens of requests in a very short amount of time. If the API doesn’t protect the product from this situation, it will likely not just slow down the API service itself, but it can probably have significant performance impacts on the server and all the other services running on it, including your product and its UI.

Both examples are not uncommon for APIs designed exclusively for internal consumption (i.e. when, if something goes wrong, you can send an IM to the developer who is breaking your API and resolve the problem right away).

You should consider guidelines to mitigate these risks, such as:

  • Shrink the result sets: try to support granularity when reading data from backends, for example by reducing the number of selected columns and slicing the results by time or other values. Set proper limits with sane defaults that don’t overload the backend. More on this in the next post.
  • Rate limits: API rate limits are a practical way to tell your consumers that they need to slow down. Different consumers, identified with their sets of API credentials, should have their own limits, according to their strategic value, service level agreements, or monetization policy. Also, not all requests are equal: for example, some products implement rate limits using tokens. Each consumer has a limited number of tokens that replenish over time (or cost money) and each API call has a token cost that depends on a number of factors (i.e. impact on the backend load or business value of the service being consumed): once you run out of tokens you need to wait or buy more.
  • Communicate errors properly: when hitting rate limits, API consumers should receive a proper error code from the server (i.e. 429 if using HTTP) and the error message should specify how long until new API calls can be accepted for this client: this won’t just improve DX, but will also allow the clients/SDKs to properly adapt and throttle their requests. Imagine that your API is used by a mobile app and it hits the rate limit: specifying how long to wait before you are ready to accept a new request allows the app developer to show a popup in the UI telling the end-user how long to wait. This will not just improve your API’s DX, but will empower the app developers to provide a better User Experience (UX) to their customers! A minimal sketch of this mechanism follows the list.
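A minimal sketch of per-client rate limiting that returns 429 with a Retry-After hint, in Go (it uses the golang.org/x/time/rate package; the 5 requests/second and burst-of-10 limits, the endpoint and the use of the Authorization header as the client key are arbitrary choices):

package main

import (
    "log"
    "net/http"
    "sync"

    "golang.org/x/time/rate"
)

var (
    mu       sync.Mutex
    limiters = map[string]*rate.Limiter{}
)

// limiterFor returns (creating it if needed) the token bucket associated with
// one set of API credentials: here 5 requests per second with bursts of 10.
func limiterFor(key string) *rate.Limiter {
    mu.Lock()
    defer mu.Unlock()
    l, ok := limiters[key]
    if !ok {
        l = rate.NewLimiter(5, 10)
        limiters[key] = l
    }
    return l
}

func main() {
    http.HandleFunc("/v1/alerts", func(w http.ResponseWriter, r *http.Request) {
        key := r.Header.Get("Authorization") // one bucket per set of credentials
        if !limiterFor(key).Allow() {
            w.Header().Set("Retry-After", "1") // tell the client when to come back
            http.Error(w, "rate limit exceeded, retry in 1 second", http.StatusTooManyRequests)
            return
        }
        w.Write([]byte("[]"))
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}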

What’s next

In the next chapter, I’ll dive deep into some technical suggestions on data-driven APIs and scale. In the last part of the series, I’ll share some more suggestions about monitoring your API and supporting the developer ecosystem.



Enterprise API design practices: Part 1


By: Francesco Vigo


Photo by Amy Elting on Unsplash

In this series, I will share some guidelines on creating valuable, usable, and future-proof Enterprise APIs. In this initial post, I will cover some background context and a very important lesson, before getting more technical in parts two, three, and four.

If you’re wondering why this matters, I recommend reading about the API Economy.

In the last few years, thanks to my role as a Principal Solutions Architect at Palo Alto Networks, I’ve dealt with a lot of APIs. Most of these APIs were offered by Enterprise companies, predominantly in Cybersecurity. By dealt with I mean that I’ve either written code to integrate with the APIs myself, or I’ve helped other developers do it. During some of these integrations, I had the privilege of discussing issues and potential improvements with the Engineers and Product Managers responsible for product APIs. Meeting with these stakeholders and providing feedback often led to API improvements and a stronger partnership.

In this blog post, I will share some of the things I’ve learned about good API design that can lead to better business outcomes and fewer headaches for API builders and consumers, including better Developer Experience (DX) and more efficient operations.

There are tons of great resources out there about API best practices (such as the Google Cloud API Design Guide) and this post is not meant to be a comprehensive list, but rather present a set of common patterns that I ran into. Places where I usually find very interesting content are: APIs you won’t hate and Nordic APIs.

Background context

Before sharing the lessons I’ve learned, let me explain the context upon which most of my experience is based:

  • B2B: Enterprise on-premise or SaaS products integrating with other Enterprise products (by partners) or custom applications/scripts (by customers).
  • Top-down: some business mandate requires integrating with a specific product (e.g. because a joint customer was requesting an integration between two vendors), without the developers being able to choose an alternative that provided a better Developer Experience (DX). Sometimes we really had to bite the bullet and power through sub-optimal DX.

And in most cases we had the following goals and constraints:

  • Speed: build something quickly and move on to the next project. That’s how many Enterprise companies work when it comes to technology partnerships. While this doesn’t sound ideal, there are several good business reasons: sales objection handling, customer acquisition, marketing, etc. You know that only a few ideas can be successful: similarly, many product integrations don’t move the needle, so it’s good to test many hypotheses quickly and invest properly only in the few that are worth it.
  • Build with what we have: because of the above requirement, we rarely had the opportunity to request API or Product changes before we had to ship something.
  • Create value: doing things quickly doesn’t mean you need to check a box that provides no value to customers. Even when the integration use cases were limited in scope, we always had an outcome in mind that could benefit some end-user.
  • Performance: don’t create things that cause performance issues on the APIs or, even worse, on the backing products, or that could end up in significantly higher public cloud spending because they trigger expensive, non-optimized backend queries. APIs should protect themselves and the backing products from misuse but, in my experience, it’s not always the case, especially with on-premise products.

While I believe that a different context could lead to different conclusions, I think that the guidelines that follow are valid for most scenarios. Let’s begin with the most disruptive one.

0. Design first (if you can… but you really should)

Although it might sound like preaching to the choir, my first recommendation is to adopt the design-first approach when creating APIs. This also (especially!) holds true for Enterprise products even though it may sound hard because APIs are too often not perceived as first-class components of a product (or as products by themselves). I’ve seen a lot of different scenarios, including:

  • APIs created to decouple the client logic from the server backend, without thinking about any other consumer than the product UI itself (a good example of a great idea made much less effective).
  • APIs that basically just exposed the underlying database with some authentication in front of it.
  • APIs created only to satisfy a very narrow/specific feature request coming from a customer and evolved in the same, very tactical, way, feature by feature.
  • APIs meant to be internal-only that were later exposed to external consumers because of a business request. By the way, this breaks the so-called Bezos API Mandate and is usually a primary cause of technical debt.

The examples above often performed poorly when we tried to consume those APIs in real-world integration use cases, leading to poor DX and reliance on client-side logic to work around the problems, which many times translated into lower quality results and developer fatigue (more on this in part 4).

But such approaches aren’t necessarily wrong: arguably you can’t really know all the potential use cases of an API when you ship its first iteration, and you have to start somewhere. But if you understand that you don’t know what you don’t know, you can at least mitigate the risks. And the easiest and cheapest way to mitigate risks is to do it early on, hence I strongly recommend embracing a design-first approach when creating APIs.

However, if you really want or are forced to follow a code-first approach, you can still apply most of the guidelines presented in this series as you develop your APIs.

What’s next

In the next post, I’ll cover some security and backend considerations. In part three I’ll dive into some more technical considerations about data-driven APIs and scale. Finally, in the last chapter, I’ll present some more guidelines about monitoring and supporting your API and developer ecosystem.

