The home of developer docs at Palo Alto Networks
Explore our Developer Docs
Network Security
Learn how to make the most of the PAN-OS APIs, SDKs, Expedition, Terraform, Ansible, and more.
Cortex Data Lake
Cloud-Delivered Security Services
Expedition
Secure Access Service Edge
Discover Prisma SASE APIs, including Prisma Access and Prisma SD-WAN.
Prisma SASE
Prisma Access Configuration
Cloud Native Security
Discover the APIs, tools and techniques necessary for bringing DevOps practices to the cloud.
Cloud Security Posture Management
Cloud Workload Protection Platform
Security Operations
Browse reference docs, tutorials, the XSOAR Marketplace and more.
Explore our Partner Tools
Palo Alto Networks as Code with Terraform
HashiCorp's Terraform is widely used to build and deploy infrastructure, safely and efficiently, with high levels of automation and integration.
Ansible at Palo Alto Networks
The collection of Ansible modules for PAN-OS has been officially certified by the Red Hat Ansible team (list of Ansible certified content) since version 2.12.2.
Read our latest Developer Blogs

Ingest PAN-OS Alerts into Cortex XDR Pro Endpoint
By: Amine Basli

In this blog I'm sharing my experience leveraging the Cortex XDR APIs to tailor-fit the product to specific customer requirements.
I bet that if you're a Systems Engineer operating in EMEA you've argued more than once about the way products and licenses are packaged from an opportunity-size point of view (too big for your customer base). There are many reasons why a given vendor would want a reduced set of SKUs. But, no doubt, the fewer SKUs you have, the lower the chances are that a given product fits exactly into the customer's need from a sizing standpoint.
At Palo Alto Networks we’re no different, but we execute on two principles that help bridge the gaps:
- Provide APIs for our products
- Encourage the developer community to use them through our Developer Relations team
The use case
A Cortex XDR Pro Endpoint customer was interested in ingesting threat logs from their PAN-OS NGFW into their tenant to stitch them with the agent's alerts and incidents. The customer was aware that this requirement could be met by sharing the NGFW state into the cloud-based Cortex Data Lake, but the corresponding SKU was overkill from a feature point of view (it provides not only alert stitching but also ML baselining of traffic behaviour) and out of reach from a budgeting perspective.
Cortex XDR Pro provides a REST API to ingest third-party alerts that covers this specific use case. It is rate limited to 600 alerts per minute per tenant, but that was more than enough for my customer, who was only interested in medium to critical alerts, which appeared at a much lower frequency in their network. What about leveraging this API to provide an exact match to my customer's requirement?
On the other end, PAN-OS can be configured to natively forward these filtered alerts to any REST API endpoint, with a very flexible payload templating feature. So at first it looked like we could "connect" the PAN-OS HTTP Log Forwarding feature to the Cortex XDR Insert Parsed Alert API. But a deeper analysis revealed an inconsistency in the mandatory timestamp field: XDR only accepts it as UNIX epoch milliseconds, while PAN-OS forwards its timestamps as plain date-time strings.
We were too close to give up. I just needed a transformation pipeline that could act as a middleman between the PAN-OS HTTP log forwarding feature and the Cortex XDR Insert Parsed Alert API. And since I was accepting a middleman anyway, I wanted it to enforce the XDR API quota limitations as well (600 alerts per minute / up to 60 alerts per update).
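To give an idea of the gap the pipeline bridges, here is a minimal Python sketch (not the actual micro-service code) that converts a PAN-OS time_generated string into the UNIX-millisecond value XDR expects:

from datetime import datetime, timedelta, timezone

def panos_to_xdr_timestamp(time_generated: str, offset_hours: int = 0) -> int:
    # PAN-OS sends something like "2021/01/06 20:44:34" with no time zone;
    # XDR wants UNIX epoch milliseconds. offset_hours mirrors the OFFSET
    # variable described below and normalises the value to UTC.
    naive = datetime.strptime(time_generated, "%Y/%m/%d %H:%M:%S")
    utc = naive.replace(tzinfo=timezone.utc) - timedelta(hours=offset_hours)
    return int(utc.timestamp() * 1000)

print(panos_to_xdr_timestamp("2021/01/06 20:44:34"))  # 1609965874000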
Developers to the rescue
I engaged our Developer Relations team in Palo Alto Networks because they’re always eager to discuss new use cases for our APIs. A quick discussion ended up with the following architecture proposal:
- An HTTP server would be implemented as the endpoint for PAN-OS alert ingestion. Basic authentication would be implemented by providing a pre-shared key in the Authorization header.
- A transformation pipeline would convert the PAN-OS payload into a ready-to-consume XDR parsed alert API payload. The pipeline would also enforce the XDR API quota limits, buffering bursts as needed.
- The XDR API client implementation would use an Advanced API Key for authentication.
- Everything would be packaged in a Docker container to ease its deployment. Configuration parameters would be provided to the container as environment variables.
Implementation details ended up being shared in a document available on cortex.pan.dev. If you're in the mood to create your own implementation, I highly recommend taking the time to go over the whole tutorial. If you just want to get to the point, then you can use this ready-to-go container image.
Let’s get started
I opted for using the available container image. Let me guide you through my experience using it to fulfil my customer’s request.
First of all, the application requires some configuration data that must be provided as a set of environment variables. A few of them are mandatory; the others are optional with default values.
The following variables are required (the application will refuse to start without them):
- API_KEY: XDR API Key (Advanced)
- API_KEY_ID: The XDR API Key identifier (its sequence number)
- FQDN: Fully Qualified Domain Name of the corresponding XDR instance (e.g. myxdr.xdr.us.paloaltonetworks.com)
The following variables are optional:
- PSK: the server will check this value against the Authorization header to accept the request (defaults to no authentication)
- DEBUG: if set, the engine will be more verbose (defaults to false)
- PORT: TCP port to bind the HTTP server to (defaults to 8080)
- OFFSET: PAN-OS timestamps do not include a time zone; by default they are treated as UTC (defaults to +0 hours)
- QUOTA_SIZE: XDR ingestion alert quota (defaults to 600)
- QUOTA_SECONDS: XDR ingestion alert quota refresh period (defaults to 60 seconds)
- UPDATE_SIZE: XDR ingestion alert max number of alerts per update (defaults to 60)
- BUFFER_SIZE: size of the pipe buffer (defaults to 6000 = 10 minutes)
- T1: how often the pipe buffer is polled for new alerts (defaults to 2 seconds)
For more details, check the public repository documentation.
Step 1: Generate an Advanced API key on Cortex XDR
Connect to your Cortex XDR instance and navigate to Settings > API Keys. Generate an API Key of type Advanced and grant it the Administrator role (that role is required for alert ingestion).
Step 2: Run the container image
Assuming you have Docker installed on your computer, the following command will pull the image and run the micro-service with configuration options passed as environment variables.
docker run --rm -p 8080:8080 -e PSK=hello -e FQDN=xxx.xdr.us.paloaltonetworks.com -e API_KEY=<my-api-key> -e API_KEY_ID=<my-key-id> -e DEBUG=yes ghcr.io/xhoms/xdrgateway
The DEBUG option provides more verbosity and is recommended for initial experimentation. If everything goes as expected, you'll see a log message like the following one.
2021/03/17 21:32:25 starting http service on port 8080
Step 3: Perform a quick test to verify the micro-service is running
Use any API test tool (I like good-old Curl) to push a test payload.
curl -X POST -H "Authorization: hello" "http://127.0.0.1:8080/in" -d '{"src":"1.2.3.4","sport":1234,"dst":"4.3.2.1","dport": 4231,"time_generated":"2021/01/06 20:44:34","rule":"test_rule","serial":"9999","sender_sw_version":"10.0.4","subtype":"spyware","threat_name":"bad-bad-c2c","severity":"critical","action":"alert"}---annex---"Mozi Command and Control Traffic Detection"'
You should receive an empty status 200 response and see log messages in the server console like the following ones:
2021/03/17 21:52:16 api - successfully parsed alert
2021/03/17 21:52:17 xdrclient - successful call to insert_parsed_alerts
Step 4: Configure HTTP Log Forwarding on the firewall
PAN-OS NGFW can forward alerts to HTTP/S endpoints. The feature configuration is available in Device > Server Profiles > HTTP and, for this case, you should use the following parameters
- IP: IP address of the container host
- Protocol: HTTP
- Port: 8080 (default)
- HTTP Method: POST

Some notes regarding the payload format
- URI Format: "/in" is the endpoint where the micro-service listens for POST requests containing new alerts
- Headers: Notice we’re setting the value hello in the Authorization header to match the -e PSK=hello configuration variable we passed to the micro-service
- Payload: the variable $threat_name was introduced with PAN-OS 10.0. If you're using older versions of PAN-OS, you can use the variable $threatid instead
The last action to complete this job is to create a new Log Forwarding profile that uses our recently created HTTP server profile and attach it to all security rules whose alerts we want ingested into XDR. The configuration object is available at Objects > Log Forwarding.
Step 5: Final checks
As your PAN-OS NGFW starts generating alerts, you should see activity in the micro-service’s output log and alerts being ingested into the Cortex XDR instance. The source tag Palo Alto Networks — PAN-OS clearly indicates these alerts were pushed by our deployment.

Production-ready version
There are a couple of additional steps to perform before considering the micro-service production-ready.
- The service should be exposed over a secure channel (TLS). The best option is to leverage the proxy capabilities available in your container environment. Notice the image honours the PORT env variable and should work out of the box almost everywhere.
- A restart policy should be in place to restart the container in case of a crash. It is not a good idea to run more than one instance in a load-balancing group because the quota enforcement won't be synchronised between them.
Summary
Palo Alto Networks products and services are built with automation and customisation in mind. They feature rich APIs that can be used to tailor them to specific customer needs.
In my case these APIs provided the customer with a choice:
- Either wait for the scope of the project (and its budget) to increase in order to accommodate additional products like Cortex Network Traffic Analysis or
- Deploy a compact "middle-man" that would connect the PAN-OS and Cortex XDR APIs
This experience empowered me with DevOps knowledge (building micro-services) that I'm sure I'll use in many opportunities to come. Special thanks to the Developer Relations team at Palo Alto Networks, who provided documentation and examples and were eager to explore this new use case. I couldn't have done it without their help and the work they've been putting into improving our developer experience.

Enterprise API design practices: Part 4
By: Francesco Vigo

Welcome to the last part of my series on creating valuable, usable and future-proof Enterprise APIs. I initially covered some background context and the importance of design, then I presented security and backend protection and in the third chapter I explored optimizations and scale; this post is about monitoring and Developer Experience (DX).
5. Monitor your APIs
Don’t fly blind: you must make sure that your APIs are properly instrumented so you can monitor what’s going on. This is important for a lot of good reasons, such as:
- Security: is your service under attack or being used maliciously? You should always be able to figure out if there are anomalies and react swiftly to prevent incidents.
- Auditing: depending on the nature of your business, you might be required, for compliance or investigative reasons, to produce an audit trail of user activity in your product, which requires proper instrumentation of your APIs as well as logging events.
- Performance and scale: you should be aware of how your API and backend are performing and if the key metrics are within the acceptable ranges, way before your users start complaining and your business is impacted. It’s always better to optimize before performance becomes a real problem.
- Cost: similarly, with proper instrumentation you can track the infrastructure costs you sustain to serve a certain amount of traffic. That can help you with capacity planning and cost modeling if you're monetizing your service, and it helps you avoid unexpected bills at the end of the month that might force you to disrupt the service.
- Feedback: with proper telemetry data to look at when you deploy a change, you can understand how it’s performing and whether it was a good idea to implement it. It also allows you to implement prototyping techniques such as A/B testing.
- Learn: analyzing how developers use your API can be a great source of learning. It will give you useful insights on how your service is being consumed and is a valuable source of ideas that you can evaluate for new features (i.e. new use-case driven API endpoints).
Proper instrumentation of services is a vast topic and here I just want to summarize a few items that are usually easy to implement:
- Every API request should have a unique operation identifier that is stored in your logs and can help your ops team figure out what happened. This identifier should also be reported back to clients somewhere (usually in the API response headers), especially, but not exclusively, in case of errors (see the sketch after this list).
- Keep an eye on API requests that fail with server errors (i.e. the HTTP 5xx ones) and, if the number is non-negligible, try to pinpoint the root cause: is it a bug, or some request that is causing timeouts in the backend? Can you fix it or make it faster?
- Keep a reasonable log history to allow tracking errors and auditing user activity for at least several days back in time.
- Create dashboards that help you monitor user activity, API usage, security metrics, etc. And make sure to check them often.
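To make the first point concrete, here is a minimal sketch of per-request operation identifiers, using Flask purely as an assumed example framework and a hypothetical X-Operation-Id header name:

import logging
import uuid

from flask import Flask, g, request

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

@app.before_request
def assign_operation_id():
    # One unique identifier per API request.
    g.operation_id = str(uuid.uuid4())

@app.after_request
def report_operation_id(response):
    # Echo the identifier back to the client and keep it in the logs so a
    # failed call can later be correlated with backend activity.
    response.headers["X-Operation-Id"] = g.operation_id
    app.logger.info("op=%s method=%s path=%s status=%s", g.operation_id,
                    request.method, request.path, response.status_code)
    return response

@app.route("/v1/alerts")
def alerts():
    return {"alerts": []}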
6. Make Developer Experience a priority
Last principle, but definitely not the least important. Even if you are lucky and the developers that use your APIs are required to do so because of an external mandate, you shouldn’t make their experience less positive.
In fact, the Developer Experience of your product should be awesome.
If developers properly understand how to work with your APIs, and enjoy doing so, they will likely run into fewer issues and be more patient when trying to overcome them. They will also provide you with good quality feedback. It will reduce the number of requests they generate on your support teams. They will give you insights and use case ideas for building a better product.
And, more importantly, happy developers will ultimately build better products for their own customers which, in turn, will act as a force multiplier for the value and success of your own product. Everybody wins in the API Economy model: customers, business stakeholders, partners, product teams, engineers, support teams.
I’ve witnessed several situations where developers were so exhausted and frustrated with working against a bad API that, as soon as they saw the light at the end of the tunnel, they stopped thinking creatively and just powered their way through to an MVP that was far from viable and valuable. But it checked the box they needed so they were allowed to move on. I consider this a very specific scenario of developer fatigue: let’s call it “API Fatigue”.
Luckily, I’ve also experienced the other way around, where new, unplanned, great features were added because the DX was good and we had fun integrating things together.
There are many resources out there that describe how to make APIs with great developer experience: the most obvious one is to create APIs that are clear and simple to consume. Apply the Principle of least astonishment.
I recommend considering the following when shipping an API:
- Document your API properly: it's almost certain that the time you spend creating proper documentation at the beginning is saved later on when developers start consuming it. Specification files, such as the OpenAPI Specification (OAS), tremendously help here. Also, if you follow the design-first approach that we discussed in the first post of this series, you'll probably already have a specification file ready to use.
- Support different learning paths: some developers like to read all the documentation from top to bottom before writing a single line of code, others will start with the code editor right away. Instead of forcing a learning path, try to embrace the different mindsets and provide tools for everyone: specification files, Postman collections, examples in different programming languages, an easy-to-access API sandbox (i.e. something that doesn’t require installing an Enterprise product and waiting 2 weeks to get a license to make your first API call), a developer portal, tutorials and, why not, video walkthroughs. This might sound overwhelming but pays off in the end. With the proper design done first a lot of this content can be autogenerated from the specification files.
- Document the data structure: if you can, don't just add lists of properties to your specs and docs: strive to provide proper descriptions of the fields that your API uses so that developers who are not fully familiar with your product can understand them. Remove ambiguity from the documentation as much as possible. This can go a long way, as developers can build mental models by understanding the data properly, which often leads to better use cases than "just pull some data from this API".
- Use verbs, status codes, and error descriptions properly: leverage the power of the protocol you are using (i.e. HTTP when using REST APIs) to define how to do things and what responses mean. Proper usage of status codes and good error messages will dramatically reduce the number of requests to your support team. Developers are smart and want to solve problems quickly so if you provide them with the right information to do so, they won’t bother you. Also, if you are properly logging and monitoring your API behavior, it will be easier for your support team to troubleshoot if your errors are not all “500 Internal Server Error” without any other detail.
Finally, stay close to the developers: especially if your API is being used by people external from your organization, it’s incredibly important to be close to them as much as you can to support them, learn and gather feedback on your API and its documentation. Allow everyone that is responsible for designing and engineering your API to be in that feedback loop, so they can share the learnings. Consider creating a community where people can ask questions and can expect quick answers (Slack, Reddit, Stack Overflow, etc.). I’ve made great friendships this way!
A few examples
There are many great APIs out there. Here is a small, not complete, list of products from other companies that, for one reason or another, are strong examples of what I described in this blog series:
- Microsoft Graph
- Google Cloud
- VirusTotal
- GitHub REST and GraphQL APIs
Conclusion
And that’s a wrap, thanks for reading! There are more things that I’ve learned that I might share in future posts, but in my opinion, these are the most relevant ones. I hope you found this series useful: looking forward to hearing your feedback and comments!
Enterprise API design practices: Part 4 was originally published in Palo Alto Networks Developers on Medium, where people are continuing the conversation by highlighting and responding to this story.

Enterprise API design practices: Part 3
By: Francesco Vigo

Welcome to the third part of my series on creating valuable, usable, and future-proof Enterprise APIs. Part 1 covered some background context and the importance of design, while the second post was about security and backend protection; this chapter is on optimization and scale.
3. Optimize interactions
When dealing with scenarios that weren’t anticipated in the initial release of an API (for example when integrating with an external product from a new partner), developers often have to rely on data-driven APIs to extract information from the backend and process it externally. While use-case-driven APIs are generally considered more useful, sometimes there might not be one available that suits the requirements of the novel use case that you must implement.
By considering the following guidelines when building your data-driven APIs, you can make them easier to consume and more efficient for the backend and the network, improving performance and reducing the operational cost (fewer data transfers, faster and cheaper queries on the DBs, etc.).
I'll use an example with sample data. Consider the following data as a representation of your backend database: an imaginary set of alerts that your Enterprise product detected over time. Instead of just three, imagine the following JSON output with thousands of records:
{
    "alerts": [
        {
            "name": "Impossible Travel",
            "alert_type": "Behavior",
            "alert_status": "ACTIVE",
            "severity": "Critical",
            "created": "2020-09-27T09:27:33Z",
            "modified": "2020-09-28T14:34:44Z",
            "alert_id": "8493638e-af28-4a83-b1a9-12085fdbf5b3",
            "details": "... long blob of data ...",
            "evidence": [
                "long list of stuff"
            ]
        },
        {
            "name": "Malware Detected",
            "alert_type": "Endpoint",
            "alert_status": "ACTIVE",
            "severity": "High",
            "created": "2020-10-04T11:22:01Z",
            "modified": "2020-10-08T08:45:33Z",
            "alert_id": "b1018a33-e30f-43b9-9d07-c12d42646bbe",
            "details": "... long blob of data ...",
            "evidence": [
                "long list of stuff"
            ]
        },
        {
            "name": "Large Upload",
            "alert_type": "Network",
            "alert_status": "ACTIVE",
            "severity": "Low",
            "created": "2020-11-01T07:04:42Z",
            "modified": "2020-12-01T11:13:24Z",
            "alert_id": "c79ed6a8-bba0-4177-a9c5-39e7f95c86f2",
            "details": "... long blob of data ...",
            "evidence": [
                "long list of stuff"
            ]
        }
    ]
}
Imagine implementing a simple /v1/alerts REST API endpoint to retrieve the data, knowing that you can't anticipate all future needs. I recommend considering the following guidelines:
- Filters: allow your consumers to reduce the result set by offering filtering capabilities in your API on as many fields as possible without stressing the backend too much (if some filters are not indexed it could become expensive, so you must find the right compromise). In the example above, good filters might include: name, severity, alert_type, alert_status, created, and modified. More complicated fields like details and evidence might be too expensive for the backend (as they might require full-text search) and you would probably leave them out unless really required.
- Data formats: be very consistent in how you present and accept data across your API endpoints. This holds true especially for types such as numbers and dates, or complex structures. For example, to represent integers in JSON you can use numbers (i.e. "fieldname": 3) or strings (i.e. "fieldname": "3" ): no matter what you choose, you need to be consistent across all your API endpoints. And you should also use the same format when returning outputs and accepting inputs.
- Dates: dates and times can be represented in many ways: timestamps (in seconds, milliseconds, microseconds), strings (ISO8601 such as in the example above, or custom formats such as 20200101), with or without time zone information. This can easily become a problem for the developers. Again, the key is consistency: try to accept and return only a single date format (i.e. timestamp in milliseconds or ISO8601) and be explicit about whether you consider time zones or not: usually choosing to do everything in UTC is a good idea because it removes ambiguity. Make sure to document the date formats properly.
- Filter types: depending on the type of field, you should provide appropriate filters, not just equals. A good example is supporting range filters for dates that, in our example above, allow consumers to retrieve only the alerts created or modified in a specific interval. If some fields are enumerators with a limited number of possible values, it might be useful to support a multi-select filter (i.e. IN): in the example above it should be possible to filter by severity values and include only the High and Critical values using a single API call.
- Sorting: is your API consumer interested only in the oldest alerts or the newest? Supporting sorting in your data-driven API is extremely important. One field to sort by is generally enough, but sometimes (depending on the data) you might need more.
- Result limiting and pagination: you can’t expect all the entries to be returned at once (and your clients might not be interested or ready to ingest all of them anyway), so you should implement some logic where clients should retrieve a limited number of results and can get more when they need. If you are using pagination, clients should be able to specify the page size within a maximum allowed value. Defaults and maximums should be reasonable and properly documented.
- Field limiting: consider whether you really need to return all the fields of your results all the time, or if your clients usually just need a few. By letting the client decide which fields (or groups of fields) your API should return, you can reduce network throughput and backend cost, and improve performance. You should provide and document some sane defaults. In the example above, you could decide to return by default all the fields except details and evidence, which the client can request by explicitly asking for them, using an include parameter.
Let’s put it all together. In the above example, you should be able to retrieve, using a single API call, something like this:
Up to 100 alerts that were created between 2020-04-01T00:00:00Z (April 1st 2020) and 2020-10-01T00:00:00Z (October 1st 2020) with severity "medium" or "high", sorted by "modified" date, newest first, including all the fields but "evidence".
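Over plain REST, that request could be expressed roughly as follows; the endpoint and parameter names here are hypothetical, only meant to show the shape of such a call:

import requests

params = {
    "created_after": "2020-04-01T00:00:00Z",    # range filter on a date field
    "created_before": "2020-10-01T00:00:00Z",
    "severity": "Medium,High",                  # multi-select (IN) filter
    "sort": "-modified",                        # newest first
    "limit": 100,                               # result limiting
    "exclude_fields": "evidence",               # field limiting
}
response = requests.get("https://api.example.com/v1/alerts", params=params, timeout=30)
response.raise_for_status()
alerts = response.json()["alerts"]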
There are multiple ways you can implement this: through REST, GraphQL, or custom query languages. In many cases you don't need something too complex, as data sets are often fairly simple. The proper way depends on many design considerations that are outside the scope of this post. But having some, or most, of these capabilities in your API will make it better and more future-proof. By the way, if you're into GraphQL, I recommend reading this post.
4. Plan for scale
A good API shouldn't be presumptuous: it shouldn't assume that clients have nothing better to do than wait for its response, especially at scale, where performance is key.
If your API requires more than a few milliseconds to produce a response, I recommend supporting jobs instead. The logic can be as follows (a minimal client-side sketch follows the list):
- Implement an API endpoint to start an operation that is supposed to take some time. If accepted, it would return immediately with a jobId.
- The client stores the jobId and periodically reaches out to a second endpoint that, when provided with the jobId, returns the completion status of the job (i.e. running, completed, failed).
- Once results are available (or some are), the client can invoke a third endpoint to fetch the results.
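A client-side sketch of that flow could look like this (the endpoint paths, field names, and polling interval are hypothetical):

import time
import requests

BASE = "https://api.example.com/v1"

# 1. Start the long-running operation; the server replies immediately with a job id.
job_id = requests.post(f"{BASE}/reports", json={"type": "weekly"}, timeout=30).json()["jobId"]

# 2. Periodically check a lightweight status endpoint until the job finishes.
while True:
    status = requests.get(f"{BASE}/jobs/{job_id}", timeout=30).json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(5)  # the client is free to do other work between polls

# 3. Fetch the results only once they are ready.
if status == "completed":
    results = requests.get(f"{BASE}/jobs/{job_id}/results", timeout=30).json()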
Other possible solutions include publisher/subscriber approaches or pushing data with webhooks, also depending on the size of the result set and the speed requirements. There isn't a one-size-fits-all solution, but I strongly recommend avoiding designs where API clients are kept waiting on an open request for the server to reply while it runs long jobs in the backend.
If you need high performance and throughput in your APIs, consider gRPC, as its binary representation of data using protocol buffers has significant speed advantages over REST.
Side note: if you want to learn more about REST, GraphQL, webhooks, or gRPC use cases, I recommend starting from this post.
Finally, other considerations for scale include supporting batch operations on multiple entries at the same time (for example mass updates), but I recommend considering them only when you have a real use case in mind.
What’s next
In the next and last chapter, I’ll share some more suggestions about monitoring and supporting your API and developer ecosystem.

Security Automation at BlackHat Europe 2022: Part 2
By: James Holland
In part 2 of this double-header, we look at the operations side of the conference infrastructure. If you missed part one, it’s here.
Automating Security Operations Use Cases with Cortex XSOAR
To reiterate from the previous post: on the Black Hat conference network we are likely to see malicious activity; in fact, it is expected. As the Black Hat leadership team say, occasionally we find a "needle in a needlestack", someone with true malicious intent. But how do you go about finding malicious activity with real intent within a sea of offensive security demonstrations and training exercises?
Since we cannot proactively block the majority of malicious activity (we might disrupt training exercises, or break someone's exploitation demo in the Arsenal), we hunt. To hunt more efficiently, we automate. It's a multi-vendor approach, with hunters from Palo Alto Networks, Cisco, RSA NetWitness and IronNet all on-site and collaborating. Cortex XSOAR provides the glue between all the deployed inline and out-of-band security tooling, as well as being the conduit into Slack for the analysts to collaborate and communicate.
An investigation may start from various angles and different indicators, and being able to quickly classify whether the source of the incident is a training class is a great start. Without leaving Slack, a Cortex XSOAR chatbot provides an automated lookup of a machine's MAC address and tells the analyst: the IP address, the vendor assigned to that MAC address where applicable, the wireless access point the host is connected to (thanks to the Cortex XSOAR integration with Cisco Meraki, docs here), and crucially the firewall zone where the machine is located. In the example below, the "tr_digi_forens_ir" zone tells us this machine is in a training class, specifically the digital forensics and incident response class:

That's really useful information when examining internal hosts, but how about a lookup for IP addresses that are sending suspicious traffic towards the Black Hat conference infrastructure from the outside, from the Internet, to see whether any of the available Threat Intelligence sources has specific information on them, and with what level of confidence? There's a Slack chatbot query for that too, powered by Cortex XSOAR:

Or checking Threat Intelligence sources for information about a domain being contacted by a potentially compromised machine in the visitor wireless network, and analysing it in a sandbox too?

The chatbot has many features, all available to any analyst from any vendor working in the NOC, with no requirement to learn any product’s user interface, just a simple Slack chatbot:

Other ways of automating our operations included ingesting data from other deployed toolsets, like the Palo Alto Networks IoT platform, which is shown below creating incidents in Cortex XSOAR based on the passive device and application profiling it performs on network traffic:

The data from the IoT platform enriches the incident, providing the analyst with a page of information to quickly understand the context of the incident and what action would be appropriate:


As well as integrating Cortex XSOAR with Cisco Meraki, we also integrated Cortex XSOAR with RSA Netwitness, and were able to use alerts from Netwitness to generate and work through any incidents that looked like potentially malicious behaviour.
We also utilised Cortex XSOAR for some more network-focused use cases. For instance, by leveraging the intelligence data maintained within the PAN-OS NGFWs, we were interested to see if any traffic was approaching the Black Hat infrastructure's public-facing services from Tor exit nodes, and we weren't disappointed:

We also leveraged Cortex XSOAR playbooks to provide an OSINT news feed into a dedicated Slack channel, so analysts could see breaking stories as they happen:

And we even used a Cortex XSOAR playbook to proactively monitor device uptime, which would alert into Slack if a critical device stopped responding and was suspected to be “down”:

Summary
It’s an infrastructure full of malicious activity, on purpose. It gets built, rapidly, to a bespoke set of requirements for each conference. It is then operated by a collaboration of Black Hat staff and multiple security vendors’ staff.
That can only happen successfully with high levels of automation, in both the build and the operation phases of the conference. With the automation capabilities of the PAN-OS network security platform, the orchestration from Cortex XSOAR, and the collaboration across vendors, the Black Hat conference was once again a safe and reliable environment for all who attended.
Acknowledgements
Palo Alto Networks would like to once again thank Black Hat for choosing us to provide network security, as well as the automation and orchestration platform, for the operations centres of the conferences this year in Singapore, Las Vegas and London ♥
Thank you Jessica Stafford, Bart Stump, Steve Fink, Neil R. Wyler and ᴘᴏᴘᴇ for your leadership and guidance. Thank you Jessica Bair Oppenheimer, Evan Basta, Dave Glover, Peter Rydzynski and Muhammad Durrani for all the cross-vendor collaboration along with your teams including Rossi Rosario, Paul Fidler, Panagiotis (Otis) Ioannou, Paul Mulvihill, Iain Davison, and (sorry) everyone else who may be lurking on other social media platforms where I couldn’t find them!
And of course, thanks so much to the amazing folks representing Palo Alto Networks in London, great job team; Matt Ford, Ayman Mahmoud, Matt Smith, Simeon Maggioni and Doug Tooth. Also Scott Brumley for his work on the Cortex XSOAR Slack chatbot during the USA conference earlier this year.


Security Automation at BlackHat Europe 2022: Part 1
By: James Holland
In part 1 of this double-header, we look at the build and configuration tasks for the conference.

It’s been called one of the most dangerous networks in the world, and there are many good reasons why each Black Hat conference has its own IT infrastructure built from the ground up.
There are training classes, where attendees learn offensive security techniques, from hacking infrastructure to attacking the Linux kernel, exploiting IIoT, and abusing directory services. There is the Arsenal, where researchers demonstrate the latest techniques, as well as briefings from experts in a variety of security domains. Then add hundreds of eager and interested attendees, who are not only learning from the content at the conference, but may have their own tricks to bring to the party too.
Roll Your Own
A dedicated infrastructure that does not rely (as far as is possible) on the venue’s own network and security capabilities is the only feasible way to host this kind of community of keen security professionals. Building an infrastructure per conference means that a multi-disciplined team, from a variety of vendors and backgrounds, must find ways to make the build as streamlined as possible. Automation is key to the approach.

The Black Hat team chose Palo Alto Networks to provide network security for all three of their conferences during 2022, renewing an annual partnership which now spans 6 years. The partnership includes Palo Alto Networks supplying their staff to work in the conference NOCs, configuring and operating several PA-Series hardware next-generation firewalls (NGFWs). In 2022, the partnership expanded to include the use of Cortex XSOAR to automate security operations.
Automating the Build Process
The build happens in a short period of time; the core infrastructure went from cardboard boxes to “live” in just over one day for the Europe 2022 conference. A design including complete segmentation of each conference area (including segmenting each training class, the Arsenal, the exhibiting vendors, the registration area, the NOC itself, and more), requires a lot of IP subnets and VLANs, multiple wireless SSIDs, and several DHCP servers and scopes. Some DHCP scopes require reservations, particularly where infrastructure components require predictable IP addressing, but there are too many of them for configuration of static addressing to be feasible. And change happens; IT security is a fast-paced industry, and we knew from experience that we would be adding, moving or changing the configuration data as the conference progressed.
With a single source for all of that configuration data, and a PAN-OS network security platform with plenty of automation capability, automation was inevitable, the only choice was the flavour!
Step forward Ansible. With its task-based approach, its ability to bring in configuration data from almost any structured source, and a collection of modules for idempotent configuration of PAN-OS, it was the perfect match for the requirements.
All of those segmented subnets needed configuring with IP addresses, as well as security zones. Here you can see some excerpts from a playbook execution, where Ansible observed modifications in the configuration data source and made changes only to the required items, with the rest of the configuration left in its original state:


This is important; the initial configuration would not be the final configuration, so when re-executing Ansible to make incremental changes, we only want to make modifications where they are needed. This approach also speeds up the processing time for changes.
Below you can also see a long (and truncated, for brevity) list of DHCP reservations required for some of the infrastructure components. They are being configured with a single Ansible task; this is a list of MAC addresses and IP addresses that you definitely do not want to configure by hand!

The PAN-OS next-generation firewalls are the DHCP servers for every subnet, and at scale, such a large quantity of DHCP servers is also something which nobody would want to configure by hand, so again, Ansible did that for us automatically:

Automatically Keeping an Eye on Suspicious Hosts
It is rare that the Black Hat team has to take any action against a conference attendee; the majority of seemingly malicious activity is usually part of the trainings, a demo in the Arsenal, or something else “expected”. Occasionally attendees approach or cross the line of acceptable behaviour, and during those instances and investigations it is very useful to be able to view the historical data across the conference.
User-ID provides a huge benefit when the network should include known and authenticated users, but at Black Hat conferences that is not the case. There is no authentication past the pre-shared key to join the wireless network, and no tracking of any person that attends the conference. However, we chose to modify the user-to-IP mapping capability of User-ID to become MAC-to-IP mappings. Being the DHCP server, the PAN-OS NGFWs knew the MAC address of each host as it requested an IP address, so we routed that information into the mapping database. This meant we were able to observe a host machine (without any knowledge of the person using it) as it moved throughout the conference, even if the machine left the network and joined again later (after lunch!?) with a new DHCP IP address, or moved between different wireless SSIDs and hence different IP subnets.
Should action be required when a machine is exhibiting unacceptable behaviour, one option is to utilise network security controls based on the MAC address of the host, instead of the IP address. These controls would be applicable no matter which network the host moved into.
Part Two
The second part of this double-header will focus on the operations side of the conference infrastructure, as the team (below) move into threat hunting mode. Carry on reading here…

The Developer’s Guide To Palo Alto Networks Cloud NGFW for AWS
By: Migara Ekanayake
Busy modernizing your applications? One thing you can't cut corners on is the security aspect. Today, we will discuss network security — inserting inbound, outbound, and VPC-to-VPC security for your traffic flows, to be precise, without compromising DevOps speed and agility. When it comes to network security for cloud-native applications, it's challenging to find a solution that provides best-in-class NGFW security while letting you consume security as a cloud-native service. This often means developers have to compromise on security to find a solution that fits their development needs. That's no longer the case — today, we will look at how you can have your cake and eat it too!
Infrastructure-as-Code is one of the key pillars in the application modernization journey, and there is a wide range of tools you can choose from. Terraform is one of the industry’s widely adopted infrastructure-as-code tools to shift from manual, error-prone provisioning to automated provisioning at scale. And, we firmly believe that it is crucial to be able to provision and manage your cloud-native security using Terraform next to your application code where it belongs. We have decided to provide launch day Terraform support for Palo Alto Networks Cloud NGFW for AWS with our brand new cloudngfwaws Terraform provider, allowing you to perform day-0, day-1, and day-2 tasks. You can now consume our Cloud NGFW with the tooling you are already using without leaving the interfaces you are familiar with; it’s that simple!
Getting Started
Prerequisites
- Subscribed to Palo Alto Networks Cloud NGFW via the AWS marketplace
- Your AWS account is onboarded to the Cloud NGFW
AWS Architecture
We will focus on securing an architecture similar to the topology below. Note the unused Firewall Subnet — later, we will deploy the Cloud NGFW endpoints into this subnet and make the necessary routing changes to inspect traffic through the Cloud NGFW.
Authentication and Authorization
Enable Programmatic Access
To use the Terraform provider, you must first enable the Programmatic Access for your Cloud NGFW tenant. You can check this by navigating to the Settings section of the Cloud NGFW console. The steps to do this can be found here.
You will authenticate against your Cloud NGFW by assuming roles in your AWS account that are allowed to make API calls to the AWS API Gateway service. The tags associated with those roles dictate the type of Cloud NGFW programmatic access granted — Firewall Admin, RuleStack Admin, or Global Rulestack Admin.
The following Terraform configuration will create an AWS role which we will utilize later when setting up the cloudngfwaws Terraform provider.
Setting Up The Terraform Provider
In this step, we will configure the Terraform provider by specifying the ARN of the role we created in the previous step. Alternatively, you can also specify individual Cloud NGFW programmatic access roles via lfa-arn, lra-arn, and gra-arn parameters.
Note how the Terraform provider documentation specifies the Admin Permission Type required for each Terraform resource as Firewall, Rulestack, or Global Rulestack. You must ensure the Terraform provider is configured with an AWS role (or roles) that has sufficient permissions to use the Terraform resources in your configuration file.
Rulestacks and Cloud NGFW Resources
There are two fundamental constructs you will discover throughout the rest of this article — Rulestacks and Cloud NGFW resources.
A rulestack defines the NGFW traffic filtering behavior, including advanced access control and threat prevention — simply a set of security rules and their associated objects and security profiles.
Cloud NGFW resources are managed resources that provide NGFW capabilities with built-in resilience, scalability, and life-cycle management. You will associate a rulestack to an NGFW resource when you create one.
Deploying Your First Cloud NGFW Rulestack
First, let’s start by creating a simple rulestack, and we are going to use the BestPractice Anti Spyware profile. BestPractice profiles are security profiles that come built-in, which will make it easier for you to use security profiles from the start. If required, you can also create custom profiles to meet your demands.
The next step is to create a security rule that only allows HTTP-based traffic and associate it with the rulestack we created in the previous step. Note that we use the App-ID web-browsing instead of traditional port-based enforcement.
Committing Your Rulestack
Once the rulestack is created, we will commit the rulestack before assigning it to an NGFW resource.
Note: cloudngfwaws_commit_rulestack should be placed in a separate plan from the plan that configures the rulestack and its contents. If you do not, you will have perpetual configuration drift and will need to run your plan twice so the commit is performed.
Deploying Your First Cloud NGFW Resource
Traffic to and from your resources in VPC subnets is routed through to NGFW resources using NGFW endpoints. How you want to create these NGFW endpoints is determined based on the endpoint mode you select when creating the Cloud NGFW resource.
- ServiceManaged — Creates NGFW endpoints in the VPC subnets you specify
- CustomerManaged — Creates just the NGFW endpoint service in your AWS account, and you will have the flexibility to create NGFW endpoints in the VPC subnets you want later.
In this example, we are going to choose the ServiceManaged endpoint mode. Also, notice how we have specified the subnet_mapping property. These are the subnets where your AWS resources live that you want to protect.
In production, you may want to organize these Terraform resources into multiple stages of your pipeline — first, create the rulestack and its content, and proceed to the stage where you will commit the rulestack and create the NGFW resource.
At this point, you will have a Cloud NGFW endpoint deployed into your Firewall subnet.
You can retrieve the NGFW endpoint ID to Firewall Subnet mapping via cloudngfwaws_ngfw Terraform data resource. This information is required during route creation in the next step.
Routing Traffic via Cloud NGFW
The final step is to add/update routes to your existing AWS route tables to send traffic via the Cloud NGFW. The new routes are highlighted in the diagram below. Again, you can perform this via aws_route or aws_route_table Terraform resource.
Learn more about Cloud NGFW
In this article, we discovered how to deploy Cloud NGFW in the Distributed model. You can also deploy Cloud NGFW in a Centralized model with AWS Transit Gateway. The Centralized model will allow you to run Cloud NGFW in a centralized “inspection” VPC and connect all your other VPCs via Transit Gateway.
We also discovered how to move away from traditional port-based policy enforcement and move towards application-based enforcement. You can find a comprehensive list of available App-IDs here.
There is more you can do with Cloud NGFW.
- Threat prevention — Automatically stop known malware, vulnerability exploits, and command and control infrastructure (C2) hacking with industry-leading threat prevention.
- Advanced URL Filtering — Stop unknown web-based attacks in real-time to prevent patient zero. Advanced URL Filtering analyzes web traffic, categorizes URLs, and blocks malicious threats in seconds.
Cloud NGFW for AWS is a regional service. Currently, it is available in the AWS regions enumerated here. To learn more, visit the documentation and FAQ pages. To get hands-on experience with this, please subscribe via the AWS Marketplace page.

Dynamic Firewalling with Palo Alto Networks NGFWs and Consul-Terraform-Sync
By: Migara Ekanayake

I cannot reach www.isitthefirewall.com…!! Must be the firewall!!
Sounds familiar? We’ve all been there. If you were that lonely firewall administrator who tried to defend the good ol’ firewall, congratulations for making it this far. Life was tough back then when you only had a handful of firewall change requests a day, static data centres and monolithic applications.
Fast forward to the present day, we are moving away from traditional static datacentres to modern dynamic datacentres. Applications are no longer maintained as monoliths, they are arranged into several smaller functional components (aka Microservices) to gain development agility. As you gain development agility you introduce new operational challenges.
What are the Operational Challenges?
Now that you have split your monolithic application into a dozen microservices, most likely you are going to run multiple instances of each microservice to fully realise the benefits of this app transformation exercise. Every time you want to bring up a new instance, you open a ticket and wait for the firewall administrator to allow traffic to that new node; this could take days, if not weeks.
When you add autoscaling into the mix (so that the number of instances can dynamically scale in and out based on your traffic demands) having to wait days or weeks before traffic can reach those new instances defeats the whole point of having the autoscaling in the first place.
Long live agility!
The Disjoint
Traditionally, firewall administrators used to retrieve requests from their ticketing system and implement those changes via the UI or CLI during a maintenance window. This creates an impedance mismatch with the application teams, slows the overall delivery of solutions, and can introduce human error both during ticket creation and during implementation of the request. If you are a network/security administrator who has recognised these problems, the likelihood is that you have already written some custom scripts and/or leveraged a configuration management platform to automate some of these tasks. Yes, this solves the problem to a certain extent; still, there is a manual handoff between Continuous Delivery and Continuous Deployment.

Consul-Terraform-Sync
Network and security teams can solve these challenges by enabling dynamic service-driven network automation with self-service capabilities using an automation tool that supports multiple networking technologies.
HashiCorp and Palo Alto Networks recently collaborated on a strategy for this using HashiCorp’s Network Infrastructure Automation (NIA). This works by triggering a Terraform workflow that automatically updates Palo Alto Networks NGFWs or Panorama based on changes it detects from the Consul catalog.
Under the hood, we are leveraging Dynamic Address Groups (DAGs). In PAN-OS it is possible to dynamically associate tags with (and remove them from) IP addresses in several ways, including via the XML API. A tag is simply a string that can be used as match criteria in Dynamic Address Groups, allowing the firewall to dynamically allow or block traffic without requiring a configuration commit.
If you need a refresher on DAGs here is a great DAG Quickstart guide.
Scaling Up with Panorama
The challenge for large-scale networks is ensuring every firewall that enforces policies has the IP address-to-tag mappings for your entire user base.
If you are managing your Palo Alto Networks NGFWs with Panorama you can redistribute IP address-to-tag mappings to your entire firewall estate within a matter of seconds. This could be your VM-Series NGFWs deployed in the public cloud, private cloud, hybrid cloud or hardware NGFWs in your datacentre.

What’s Next?
If you have read this far, why not give this step-by-step guide on automating the configuration management process for Palo Alto Networks NGFWs with Consul-Terraform-Sync a try?
For more resources, check out:
- Webinar: Network Security Automation with HashiCorp Consul-Terraform-Sync and Palo Alto Networks
- Whitepaper: Enabling Dynamic Firewalling with Palo Alto and HashiCorp

PAN.dev and the Rise of Docs-as-Code
By: Steven Serrata

It started out as a simple idea — to improve the documentation experience for the many open-source projects at Palo Alto Networks. Up until that point (and still to this day) our open source projects were documented in the traditional fashion: the ubiquitous README.md and the occasional Read the Docs or GitHub Pages site. In truth, there is nothing inherently “wrong” with this approach but there was a sense that we could do more to improve the developer experience of adopting our APIs, tools and SDKs. So began our exploration into understanding how companies like Google, Microsoft, Facebook, Amazon, Twilio, Stripe, et al., approached developer docs. Outside of visual aesthetics, it was clear that these companies invested a great deal of time into streamlining their developer onboarding experiences, something we like to refer to as “time-to-joy.” The hard truth was that our product documentation site was catering to an altogether different audience and it was evident in the fact that features like quick starts (“Hello World”), code blocks, and interactive API reference docs were noticeably absent.
After some tinkering we had our first, git-backed developer portal deployed using Gatsby and AWS Amplify, but that was only the beginning.
Great Gatsby!
Circa March 2019. Although we technically weren’t new to Developer Relations as a practice, it was the first time we made an honest attempt to solve for the lack of developer-style documentation at Palo Alto Networks. We quickly got to work researching how other companies delivered their developer sites and received a tip from a colleague to look at Gatsby — a static-site generator based on GraphQL and ReactJS. After some tinkering, we had our first git-backed developer portal deployed using AWS Amplify, but that was only the beginning. It was a revelation that git-backed, web/CDN hosting services existed and that rich, interactive, documentation sites could be managed like code. It wasn’t long before we found Netlify, Firebase, and, eventually, Docusaurus — a static-site generator specializing in the documentation use case. It was easy to pick up and run with this toolchain as it seamlessly weaved together the open-source git flows we were accustomed to with the ability to rapidly author developer docs using markdown and MDX — a veritable match made in heaven. We had the stack — now all we needed was a strategy.
At a deeper level, it’s a fresh, bold new direction for a disruptive, next-generation security company that, through strategic acquisitions and innovation, has found itself on the doorsteps of the API Economy.
For Developers, by Developers
At the surface, pan.dev is a family of sites dedicated to documentation for developers, by developers. At a deeper level, it’s a fresh, bold, new direction for a disruptive, next-generation security company that, through strategic acquisitions and innovation, has found itself on the doorsteps of the API Economy (more on that topic coming soon!).

The content published on pan.dev is for developers because it supports and includes the features that developer audiences have come to expect, such as code blocks and API reference docs (and dark mode!):

It’s by developers since the contributing, review and publishing flows are backed by git and version control systems like GitLab and GitHub. What’s more, by adopting git as the underlying collaboration tool, pan.dev is capable of supporting both closed and open-source contributions. Spot a typo or inaccuracy? Open a GitHub issue. Got a cool pan-os-python tutorial you’d like to contribute? Follow our contributing guidelines and we’ll start reviewing your submission. By adopting developer workflows, pan.dev is able to move developer docs closer to the source (no pun intended). That means we can deliver high-quality content straight from the minds of the leading automation, scripting, DevOps/SecOps practitioners in the cybersecurity world.

So, What’s Next?
If we’ve learned anything, over the years, it’s that coming up with technical solutions might actually be the easier aspect of working on the pan.dev project. What can be more difficult (and arguably more critical) is garnering buy-in from stakeholders, partners and leadership, while adapting to the ever-evolving needs of the developer audience we’re trying to serve. All this while doing what we can to ensure the reliability and scalability of the underlying infrastructure (not to mention SEO, analytics, accessibility, performance, i18n, etc.). The real work has only recently begun and we’re ecstatic to see what the future holds. Already, we’ve seen a meteoric rise in monthly active users (MAUs) across sites (over 40K!) and pan.dev is well-positioned to be the de facto platform for delivering developer documentation at Palo Alto Networks.
To learn more and experience it for yourself, feel free to tour our family of developer sites and peruse the GitHub Gallery of open-source projects at Palo Alto Networks.
For a deeper dive into the pan.dev toolchain/stack, stay tuned for part 2!

User-ID / EDL … better both of them
By: Xavier Homs
User-ID or EDL … why not both?

User-ID and External Dynamic Lists (EDLs) are probably the most commonly used PAN-OS features for sharing external IP metadata with the NGFW. The use cases for them are countless: from blocking known DoS sources to forwarding traffic from specific IP addresses over low-latency links.
Let’s take, for instance, the following logs generated by the SSH daemon on a Linux host.
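For example, successful logins typically show up as sshd entries like these (illustrative lines following the standard OpenSSH format):
Apr 12 09:15:01 server01 sshd[1234]: Accepted publickey for xhoms from 192.168.1.10 port 52344 ssh2
Apr 12 11:42:17 server01 sshd[2345]: Accepted password for xhoms from 10.10.10.23 port 40112 ssh2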
Wouldn’t it be great to have a way to share the IP addresses used by the user xhoms with the NGFW, so that specific security policies are applied to traffic sourced from those addresses simply because we now know the endpoint is being managed by that user?
A simple shell CGI script could be used to create an External Dynamic List (EDL) that lists all known IP addresses from which the account xhoms has logged into this server.
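As a minimal illustration of the idea (in Go rather than shell, assuming the standard OpenSSH log format and the /var/log/auth.log path; the account name is hard-coded for brevity):

package main

// Minimal sketch only: serve, as text/plain, the unique source IP addresses seen
// in successful sshd logins for the account "xhoms". Log path and format are
// assumptions (standard OpenSSH "Accepted ... for <user> from <ip> ..." entries).

import (
	"bufio"
	"fmt"
	"net/http"
	"os"
	"strings"
)

func edlHandler(w http.ResponseWriter, r *http.Request) {
	f, err := os.Open("/var/log/auth.log")
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer f.Close()
	seen := map[string]bool{}
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.Contains(line, "sshd") || !strings.Contains(line, "Accepted") {
			continue
		}
		fields := strings.Fields(line)
		for i := 0; i+3 < len(fields); i++ {
			if fields[i] == "for" && fields[i+1] == "xhoms" && fields[i+2] == "from" {
				seen[fields[i+3]] = true
			}
		}
	}
	w.Header().Set("Content-Type", "text/plain")
	for ip := range seen {
		fmt.Fprintln(w, ip)
	}
}

func main() {
	http.HandleFunc("/edl", edlHandler)
	http.ListenAndServe(":8080", nil)
}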
That is great, but:
- When would addresses be removed from the list? At the daily log rotation?
- Are all listed IP addresses still being operated by the user xhoms?
- Is there any way to remove addresses from the list at user logout?
To overcome these (and many other) limitations, Palo Alto Networks NGFWs feature a very powerful IP tagging API called User-ID. Although EDL and User-ID cover similar objectives, there are fundamental technical differences between them:
- User-ID is “asynchronous” (push mode), supports both “set” and “delete” operations, and provides a very flexible way to create groupings: either by tagging address objects (Dynamic Address Group — DAG) or by mapping users to addresses and then tagging these users (Dynamic User Group — DUG). User-ID allows these tags to have a timeout so that the PAN-OS device takes care of removing them when they expire.
- EDL is “universal”. Almost all network appliance vendors provide a feature to fetch an IP address list from a URL.
Years ago it was up to the end customer to create their own connectors, just like the CGI script shared above. But as the presence of PAN-OS powered NGFWs has grown among enterprise customers, more application vendors have decided to leverage the value of User-ID by providing API clients out of the box. Although this is great (for Palo Alto Networks customers), losing the option to fetch a list from a URL (EDL mode) makes it more difficult to integrate legacy technologies that might still be out there in the network.
Creating EDLs from User-ID enabled applications
Let’s assume you have an application that features a PAN-OS User-ID API client and is capable of pushing log entries to the PAN-OS NGFW in an asynchronous way. Let’s assume, as well, that there are still some legacy network devices in your network that need to fetch that address metadata from a URL. How difficult would it be to create a micro-service for that?
One option would be to leverage a PAN-OS XML SDK like pan-python or pango (the PAN-OS Go SDK) and create a REST API that extracts the current state from the PAN-OS device using its operational command API.
GET https://<pan-os-device>/api
?key=<API-KEY>
&type=op
&cmd=<show><object><registered-ip><all></all></registered-ip></object></show>
HTTP/1.1 200 OK
<response status="success">
<result>
<entry ip="10.10.10.11" from_agent="0" persistent="1">
<tag>
<member>tag10</member>
</tag>
</entry>
<entry ip="10.10.10.10" from_agent="0" persistent="1">
<tag>
<member>tag10</member>
</tag>
</entry>
<count>2</count>
</result>
</response>
It shouldn’t be that difficult to parse the response and provide a plain list out of it, right? Come back to me if you’re thinking of using an XSLT processor piped to the output of a cURL command and packing everything as a CGI script, because we might have some common ground ;-)
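As a minimal sketch (illustrative only, not the approach the rest of this post takes), Go’s encoding/xml can turn that operational response into an EDL-style plain list:

package main

import (
	"encoding/xml"
	"fmt"
)

// Structures matching the registered-ip operational response shown above.
type opResponse struct {
	Status  string  `xml:"status,attr"`
	Entries []entry `xml:"result>entry"`
}

type entry struct {
	IP   string   `xml:"ip,attr"`
	Tags []string `xml:"tag>member"`
}

func main() {
	// In a real service this payload would come from the HTTP call to the /api endpoint.
	payload := []byte(`<response status="success"><result>
<entry ip="10.10.10.11" from_agent="0" persistent="1"><tag><member>tag10</member></tag></entry>
<entry ip="10.10.10.10" from_agent="0" persistent="1"><tag><member>tag10</member></tag></entry>
<count>2</count></result></response>`)

	var resp opResponse
	if err := xml.Unmarshal(payload, &resp); err != nil {
		panic(err)
	}
	for _, e := range resp.Entries {
		fmt.Println(e.IP) // one address per line, EDL style
	}
}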
I’d love to propose a different approach, though: to hijack User-ID messages by implementing a micro-service that behaves as a reverse proxy between the application that features the User-ID client and the PAN-OS device. I like this approach because it opens the door to other interesting use cases like enforcing timeouts (adding a timeout to entries that do not have one), converting DAG messages into DUG equivalents, or adding additional tags based on the source application.
Components for a User-ID to EDL reverse proxy
The ingredients for such a recipe would be:
- a light HTTP web server featuring routing (hijack messages sent to /api and let other requests pass through). Go’s net/http package and the httputil.ReverseProxy type fit like a glove in this case.
- a collection of XML-enabled Go structs to unmarshal captured User-ID messages.
- a library featuring User-ID message payload processing capable of keeping a valid (non-expired) state of User-to-IP, Group-to-IP and Tag-to-IP maps.
Fortunately for us, there is the xhoms/panoslib Go module that covers the second and third ingredients (OK, let’s be honest: I purpose-built the library to support this article).
With all these components it won’t be that difficult to reverse-proxy a PAN-OS HTTPS service, monitoring User-ID messages and exposing the state of the corresponding entries on a virtual endpoint named /edl that won’t conflict at all with the PAN-OS schema.
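As a skeleton of the idea (a minimal sketch, not the actual xhoms/uidmonitor code; the state handling is stubbed out and TLS verification concerns are ignored):

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

func main() {
	// TARGET is the PAN-OS device to forward to, e.g. "192.0.2.10" (hypothetical).
	target, err := url.Parse("https://" + os.Getenv("TARGET"))
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	mux := http.NewServeMux()
	// Virtual endpoint: serve the EDL built from the observed User-ID messages.
	mux.HandleFunc("/edl/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		// A real implementation would return the non-expired entries kept by the
		// User-ID processing library; this is just a placeholder.
		w.Write([]byte("10.10.10.10\n"))
	})
	// Everything else (including User-ID messages posted to /api) is inspected
	// here before being handed to the reverse proxy.
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// A real implementation would read and parse r.Body (the uid-message
		// payload) to update the in-memory maps, then restore the body.
		proxy.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":8080", mux))
}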
I wanted to prove the point and ended up coding the recipe as a micro-service. You can either check the code in the GitHub repository xhoms/uidmonitor or run it from a Docker-enabled host. Take a look at the related panos.pan.dev tutorial for insight into the source code rationale.
docker run --rm -p 8080:8080 -e TARGET=<pan-os-device> ghcr.io/xhoms/uidmonitor
Some payload examples
To get started, you can check the following User-ID payload, which registers the tag test on the IP addresses 10.10.10.10 and 10.10.20.20. Notice the different timeout values (100 vs. 10 seconds).
POST http://127.0.0.1:8080/api/ HTTP/1.1
Content-Type: application/x-www-form-urlencoded
key=<my-api-key>
&type=user-id
&cmd=<uid-message>
<type>update</type>
<payload>
<register>
<entry ip="10.10.10.10">
<tag>
<member timeout="100">test</member>
</tag>
</entry>
<entry ip="10.10.20.20">
<tag>
<member timeout="10">test</member>
</tag>
</entry>
</register>
</payload>
</uid-message>
Just after pushing the payload (before the 10 second tag expires) perform the following GET request on the /edl endpoint and verify the provided list contains both IP addresses.
GET http://127.0.0.1:8080/edl/
?list=tag
&key=test
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Mon, 12 Apr 2021 10:07:50 GMT
Content-Length: 24
Connection: close
10.10.20.20
10.10.10.10
A little over 10 seconds later, the same transaction should return only the IP address whose tag has not expired.
GET http://127.0.0.1:8080/edl/
?list=tag
&key=test
Content-Type: text/plain
Date: Mon, 12 Apr 2021 10:08:55 GMT
Content-Length: 12
Connection: close
10.10.10.10
Now let’s use a slightly more complex payload. The following one registers (logs in) two users mapped to IP addresses and then applies the user tag admin to both of them. As in the previous case, each tag has a different expiration value.
POST http://127.0.0.1:8080/api/ HTTP/1.1
Content-Type: application/x-www-form-urlencoded
key=<my-api-key>
&type=user-id
&cmd=<uid-message>
<type>update</type>
<payload>
<login>
<entry name="bar@test.local" ip="10.10.20.20" timeout="100"></entry>
<entry name="foo@test.local" ip="10.10.10.10" timeout="100"></entry>
</login>
<register-user>
<entry user="foo@test.local">
<tag>
<member timeout="100">admin</member>
</tag>
</entry>
<entry user="bar@test.local">
<tag>
<member timeout="10">admin</member>
</tag>
</entry>
</register-user>
</payload>
</uid-message>
Although the login transaction has a longer timeout in both cases, the group membership tag for the user “bar@test.local” is expected to time out in 10 seconds.
Calling the /edl endpoint at specific times would demonstrate the first tag expiration.
GET http://127.0.0.1:8080/edl/
?list=group
&key=admin
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Mon, 12 Apr 2021 10:28:16 GMT
Content-Length: 24
Connection: close
10.10.20.20
10.10.10.10
...
GET http://127.0.0.1:8080/edl/
?list=group
&key=admin
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Mon, 12 Apr 2021 10:28:32 GMT
Content-Length: 12
Connection: close
10.10.10.10
Summary
Both EDL (because it provides “universal” coverage) and User-ID (because of its advanced feature set) have their place in a large network. Implementing a reverse proxy between User-ID enabled applications and the managed PAN-OS device capable of hijacking User-ID messages has many different use cases: from basic User-ID to EDL conversion (as in the demo application) to advanced cases like applying policies to User-ID messages, enforcing maximum timeouts, or transposing Address Groups into User Groups.

Ingest PAN-OS Alerts into Cortex XDR Pro Endpoint
By: Amine Basli

In this blog I’m sharing my experience leveraging Cortex XDR API’s to tailor-fit this product into specific customer requirements.
I bet that if you’re a Systems Engineer operating in EMEA you’ve argued more than once about the way products and licenses are packaged from an opportunity size point of view (too big for your customer base). There are many reasons that explain why a given vendor would like to have a reduced set of SKUs. But, no doubt, the fewer SKUs you have the lower the chances are for a given product to fit exactly into the customer’s need from a sizing standpoint.
At Palo Alto Networks we’re no different, but we execute on two principles that help bridge the gaps:
- Provide APIs for our products
- Encourage the developer community to use them through our Developer Relations team
The use case
A Cortex XDR Pro Endpoint customer was interested in ingesting threat logs from their PAN-OS NGFW into their tenant to stitch them with the agent’s alerts and incidents. The customer was aware that this requirement could be achieved by sharing the NGFW state into the cloud-based Cortex Data Lake. But the corresponding SKU was overkill from a feature point of view (it provides not only alert stitching but ML baselining of traffic behaviour as well) and an unreachable cost from a budgeting perspective.
Cortex XDR Pro provides a REST API to ingest third-party alerts to cover this specific use case. It is rate limited to only 600 alerts per minute per tenant but was more than enough for my customer because they were only interested in medium to critical alerts that appeared at a much lower frequency in their network. What about leveraging this API to provide an exact match to my customer’s requirement?
On the other end, PAN-OS can be configured to forward these filtered alerts natively to any REST API with a very flexible payload templating feature. So at first it looked like we could “connect” the PAN-OS HTTP Log Forwarding feature with the Cortex XDR Insert Parsed Alert API. But a deep-dive analysis revealed an inconsistency in the mandatory timestamp field: XDR only accepts it as a UNIX timestamp in milliseconds, a format PAN-OS does not produce natively.
I was too close to give up. I just needed a transformation pipeline that could be used as a middleman between the PAN-OS HTTP log forwarding feature and the Cortex XDR Insert Parsed Alert API. Provided I was ready for a middleman, I’d like it to enforce the XDR API quota limitations (600 alerts per minute / up to 60 alerts per update).
Developers to the rescue
I engaged our Developer Relations team at Palo Alto Networks because they’re always eager to discuss new use cases for our APIs. A quick discussion ended up with the following architecture proposal:
- An HTTP server would be implemented to be used as the endpoint for PAN-OS alert ingestion. Basic authentication would be implemented by providing a pre-shared key in the Authorization header.
- A transformation pipeline would convert the PAN-OS payload into a ready-to-consume XDR parsed alert API payload (a sketch of the timestamp conversion involved appears after this list). The pipeline would also take care of enforcing the XDR API quota limits, buffering bursts as needed.
- The XDR API client implementation would use an Advanced API Key for authentication.
- Everything would be packaged in a Docker container to ease its deployment. Configuration parameters would be provided to the container as environment variables.
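For illustration only (a minimal sketch, not the actual gateway code; the function name and the offset handling are assumptions), the timestamp normalisation in Go could look like this:

package main

import (
	"fmt"
	"time"
)

// toEpochMilliseconds converts a PAN-OS time_generated value (which carries no
// time zone) into the UNIX millisecond timestamp the XDR API expects.
// offsetHours is how many hours ahead of UTC the firewall clock is.
func toEpochMilliseconds(timeGenerated string, offsetHours int) (int64, error) {
	t, err := time.Parse("2006/01/02 15:04:05", timeGenerated)
	if err != nil {
		return 0, err
	}
	return t.Add(-time.Duration(offsetHours) * time.Hour).UnixMilli(), nil
}

func main() {
	ms, err := toEpochMilliseconds("2021/01/06 20:44:34", 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(ms) // 1609965874000
}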
Implementation details ended up shared in a document available on cortex.pan.dev. If you’re in the mood to create your own implementation, I highly recommend taking the time to go over the whole tutorial. If you just want to get to the point, you can use this ready-to-go container image.
Let’s get started
I opted for using the available container image. Let me guide you through my experience using it to fulfil my customer’s request.
First of all, the application requires some configuration data that must be provided as a set of environment variables. A few of them are mandatory; the others are optional with default values.
The following are the required variables (the application will refuse to start without them):
- API_KEY: XDR API Key (Advanced)
- API_KEY_ID: The XDR API Key identifier (its sequence number)
- FQDN: Fully Qualified Domain Name of the corresponding XDR Instance (i.e. myxdr.xdr.us.paloaltonetworks.com)
The following are optional variables
- PSK: the server will check the value in the Authorization header to accept the request (defaults to no authentication)
- DEBUG: if it exists then the engine will be more verbose (defaults to false)
- PORT: TCP port to bind the HTTP server to (defaults to 8080)
- OFFSET: PAN-OS timestamps do not include a time zone. By default, they are considered to be in UTC (defaults to +0 hours)
- QUOTA_SIZE: XDR ingestion alert quota (defaults to 600)
- QUOTA_SECONDS: XDR ingestion alert quota refresh period (defaults to 60 seconds)
- UPDATE_SIZE: XDR ingestion alert max number of alerts per update (defaults to 60)
- BUFFER_SIZE: size of the pipe buffer (defaults to 6000 = 10 minutes)
- T1: how often the pipe buffer is polled for new alerts (defaults to 2 seconds)
For more details, you can check the public repository documentation.
Step 1: Generate an Advanced API key on Cortex XDR
Connect to your Cortex XDR instance and navigate to Settings > API Keys. Generate an API Key of type Advanced, granting it the Administrator role (that role is required for alert ingestion).
Step 2: Run the container image
Assuming you have Docker installed on your computer, the following command would pull the image and run the micro-service with configuration options passed as environment variables.
docker run --rm -p 8080:8080 -e PSK=hello -e FQDN=xxx.xdr.us.paloaltonetworks.com -e API_KEY=<my-api-key> -e API_KEY_ID=<my-key-id> -e DEBUG=yes ghcr.io/xhoms/xdrgateway
The DEBUG option provides more verbosity and is recommended for initial experimentation. If everything goes as expected, you’ll see a log message like the following one.
2021/03/17 21:32:25 starting http service on port 8080
Step 3: Perform a quick test to verify the micro-service is running
Use any API test tool (I like good-old Curl) to push a test payload.
curl -X POST -H "Authorization: hello" "http://127.0.0.1:8080/in" -d '{"src":"1.2.3.4","sport":1234,"dst":"4.3.2.1","dport": 4231,"time_generated":"2021/01/06 20:44:34","rule":"test_rule","serial":"9999","sender_sw_version":"10.0.4","subtype":"spyware","threat_name":"bad-bad-c2c","severity":"critical","action":"alert"}---annex---"Mozi Command and Control Traffic Detection"'
You should receive an empty status 200 response and see log messages in the server console like the following ones:
2021/03/17 21:52:16 api - successfully parsed alert
2021/03/17 21:52:17 xdrclient - successful call to insert_parsed_alerts
Step 4: Configure HTTP Log Forwarding on the Firewall
PAN-OS NGFW can forward alerts to HTTP/S endpoints. The feature configuration is available in Device > Server Profiles > HTTP and, for this case, you should use the following parameters
- IP: IP address of the container host
- Protocol: HTTP
- Port: 8080 (default)
- HTTP Method: POST

Some notes regarding the payload format
- URI Format: “/in” is the endpoint where the micro-service listens for POST requests containing new alerts
- Headers: Notice we’re setting the value hello in the Authorization header to match the -e PSK=hello configuration variable we passed to the micro-service
- Payload: The variable $threat_name was introduced with PAN-OS 10.0. If you’re using older versions of PAN-OS then you can use the variable $threatid instead (an illustrative payload template is shown after this list)
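For illustration, a payload template mirroring the test payload used earlier might look like this (the exact set of log variables is an assumption and depends on your PAN-OS version and log type; verify against your firewall before using it):

{"src":"$src","sport":$sport,"dst":"$dst","dport":$dport,"time_generated":"$time_generated","rule":"$rule","serial":"$serial","sender_sw_version":"$sender_sw_version","subtype":"$subtype","threat_name":"$threat_name","severity":"$severity","action":"$action"}---annex---"$threat_name"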
The last action to complete this job is to create a new Log Forwarding profile that consumes our recently created HTTP server profile and to use it in all security rules whose alerts we want ingested into XDR. The configuration object is available at Objects > Log Forwarding.
Step 5: Final checks
As your PAN-OS NGFW starts generating alerts, you should see activity in the micro-service’s output log and alerts being ingested into the Cortex XDR instance. The source tag Palo Alto Networks — PAN-OS clearly indicates these alerts were pushed by our deployment.

Production-ready version
There are a couple of additional steps to perform before considering the micro-service production-ready.
- The service should be exposed over a secure channel (TLS). The best option is to leverage the TLS-terminating reverse proxy typically available in your container environment. Notice the image honours the PORT env variable and should work out of the box almost everywhere.
- A restart policy should be in place to restart the container in case of a crash (see the example after this list). It is not a good idea to run more than one instance in a load-balancing group because the quota enforcement won’t be synchronised between them.
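For example (illustrative, mirroring the earlier command; note that --rm and a restart policy cannot be combined):

docker run --restart unless-stopped -p 8080:8080 -e PSK=hello -e FQDN=xxx.xdr.us.paloaltonetworks.com -e API_KEY=<my-api-key> -e API_KEY_ID=<my-key-id> ghcr.io/xhoms/xdrgateway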
Summary
Palo Alto Networks products and services are built with automation and customisation in mind. They feature rich APIs that can be used to tailor them to specific customer needs.
In my case these APIs provided the customer with a choice:
- Either wait for the scope of the project (and its budget) to increase in order to accommodate additional products like Cortex Network Traffic Analysis or
- Deploy a compact “middle-man” that would connect the PAN-OS and Cortex XDR API’s
This experience empowered me to acquire DevOps knowledge (building micro-services) that I’m sure I’ll use on many occasions to come. Special thanks to the Developer Relations team at Palo Alto Networks, who provided documentation and examples and were eager to explore this new use case. I couldn’t have done it without their help and the work they’ve been putting into improving our developer experience.

Enterprise API design practices: Part 4
By: Francesco Vigo

Welcome to the last part of my series on creating valuable, usable and future-proof Enterprise APIs. I initially covered some background context and the importance of design, then I presented security and backend protection and in the third chapter I explored optimizations and scale; this post is about monitoring and Developer Experience (DX).
5. Monitor your APIs
Don’t fly blind: you must make sure that your APIs are properly instrumented so you can monitor what’s going on. This is important for a lot of good reasons, such as:
- Security: is your service under attack or being used maliciously? You should always be able to figure out if there are anomalies and react swiftly to prevent incidents.
- Auditing: depending on the nature of your business, you might be required by compliance or investigative reasons to produce an audit trail of user activity of your product, which requires proper instrumentation of your APIs as well as logging events.
- Performance and scale: you should be aware of how your API and backend are performing and if the key metrics are within the acceptable ranges, way before your users start complaining and your business is impacted. It’s always better to optimize before performance becomes a real problem.
- Cost: similarly, with proper instrumentation, you can be aware of your infrastructure costs sustained to serve a certain amount of traffic. That can help you with capacity planning and cost modeling if you’re monetizing your service. And to avoid unexpected bills at the end of the month that might force you to disrupt the service.
- Feedback: with proper telemetry data to look at when you deploy a change, you can understand how it’s performing and whether it was a good idea to implement it. It also allows you to implement prototyping techniques such as A/B testing.
- Learn: analyzing how developers use your API can be a great source of learning. It will give you useful insights on how your service is being consumed and is a valuable source of ideas that you can evaluate for new features (i.e. new use-case driven API endpoints).
Proper instrumentation of services is a vast topic and here I just want to summarize a few items that are usually easy to implement:
- Every API request should have a unique operation identifier that is stored in your logs and can help your ops team figure out what happened. This identifier should also be reported back to clients somewhere (usually in the API response headers), especially, but not exclusively, in case of errors. A minimal sketch of this idea follows this list.
- Keep an eye on API requests that fail with server errors (i.e. the HTTP 5xx ones) and, if the number is non-negligible, try to pinpoint the root cause: is it a bug, or some request that is causing timeouts in the backend? Can you fix it or make it faster?
- Keep a reasonable log history to allow tracking errors and auditing user activity for at least several days back in time.
- Create dashboards that help you monitor user activity, API usage, security metrics, etc. And make sure to check them often.
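To make the first point concrete, here is a minimal Go sketch (the header name, log format, and endpoint are illustrative, not a prescribed convention) of a middleware that tags every request with an operation identifier, logs it, and returns it to the client:

package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"net/http"
)

// withRequestID assigns every request a unique operation identifier, stores it in
// the logs, and echoes it back to the client in a response header.
func withRequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		buf := make([]byte, 8)
		_, _ = rand.Read(buf)
		id := hex.EncodeToString(buf)
		w.Header().Set("X-Request-ID", id)
		log.Printf("request_id=%s method=%s path=%s", id, r.Method, r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/alerts", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("[]"))
	})
	log.Fatal(http.ListenAndServe(":8080", withRequestID(mux)))
}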
6. Make Developer Experience a priority
Last principle, but definitely not the least important. Even if you are lucky and the developers that use your APIs are required to do so because of an external mandate, you shouldn’t make their experience less positive.
In fact, the Developer Experience of your product should be awesome.
If developers properly understand how to work with your APIs, and enjoy doing so, they will likely run into fewer issues and be more patient when trying to overcome them. They will also provide you with good quality feedback. It will reduce the number of requests they generate on your support teams. They will give you insights and use case ideas for building a better product.
And, more importantly, happy developers will ultimately build better products for their own customers which, in turn, will act as a force multiplier for the value and success of your own product. Everybody wins in the API Economy model: customers, business stakeholders, partners, product teams, engineers, support teams.
I’ve witnessed several situations where developers were so exhausted and frustrated with working against a bad API that, as soon as they saw the light at the end of the tunnel, they stopped thinking creatively and just powered their way through to an MVP that was far from viable and valuable. But it checked the box they needed so they were allowed to move on. I consider this a very specific scenario of developer fatigue: let’s call it “API Fatigue”.
Luckily, I’ve also experienced the other way around, where new, unplanned, great features were added because the DX was good and we had fun integrating things together.
There are many resources out there that describe how to make APIs with great developer experience: the most obvious one is to create APIs that are clear and simple to consume. Apply the Principle of least astonishment.
I recommend considering the following when shipping an API:
- Document your API properly: it’s almost certain that the time you spend creating proper documentation at the beginning is saved later on when developers start consuming it. Specification files, such as OpenAPI Specification (OAS) tremendously help here. Also, if you follow the design-first approach that we discussed in the first post of this series, you’ll probably already have a specification file ready to use.
- Support different learning paths: some developers like to read all the documentation from top to bottom before writing a single line of code, others will start with the code editor right away. Instead of forcing a learning path, try to embrace the different mindsets and provide tools for everyone: specification files, Postman collections, examples in different programming languages, an easy-to-access API sandbox (i.e. something that doesn’t require installing an Enterprise product and waiting 2 weeks to get a license to make your first API call), a developer portal, tutorials and, why not, video walkthroughs. This might sound overwhelming but pays off in the end. With the proper design done first a lot of this content can be autogenerated from the specification files.
- Document the data structure: if you can, don’t just add lists of properties to your specs and docs: strive to provide proper descriptions of the fields that your API uses so that developers that are not fully familiar with your product can understand them. Remove ambiguity from the documentation as much as possible. This can go a long way, as developers can make mental models by understanding the data properly that often leads to better use cases than “just pull some data from this API”.
- Use verbs, status codes, and error descriptions properly: leverage the power of the protocol you are using (i.e. HTTP when using REST APIs) to define how to do things and what responses mean. Proper usage of status codes and good error messages will dramatically reduce the number of requests to your support team. Developers are smart and want to solve problems quickly, so if you provide them with the right information to do so, they won’t bother you. Also, if you are properly logging and monitoring your API behavior, it will be easier for your support team to troubleshoot if your errors are not all “500 Internal Server Error” without any other detail. An illustrative error response follows this list.
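As an illustration (the status code, header, and body shape here are assumptions, not a prescribed format), an actionable error response could look like this:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-Request-ID: 0f6b2a1c9d834c55

{"error":{"code":"quota_exceeded","message":"Rate limit of 100 requests per minute reached, retry in 30 seconds","request_id":"0f6b2a1c9d834c55"}}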
Finally, stay close to the developers: especially if your API is being used by people external to your organization, it’s incredibly important to stay as close to them as you can to support them, learn, and gather feedback on your API and its documentation. Allow everyone who is responsible for designing and engineering your API to be in that feedback loop, so they can share the learnings. Consider creating a community where people can ask questions and can expect quick answers (Slack, Reddit, Stack Overflow, etc.). I’ve made great friendships this way!
A few examples
There are many great APIs out there. Here is a small, not complete, list of products from other companies that, for one reason or another, are strong examples of what I described in this blog series:
- Microsoft Graph
- Google Cloud
- VirusTotal
- GitHub REST and GraphQL APIs
Conclusion
And that’s a wrap, thanks for reading! There are more things that I’ve learned that I might share in future posts, but in my opinion, these are the most relevant ones. I hope you found this series useful: looking forward to hearing your feedback and comments!

Enterprise API design practices: Part 3
By: Francesco Vigo

Welcome to the third part of my series on creating valuable, usable, and future-proof Enterprise APIs. Part 1 covered some background context and the importance of design, while the second post was about security and backend protection; this chapter is on optimization and scale.
3. Optimize interactions
When dealing with scenarios that weren’t anticipated in the initial release of an API (for example when integrating with an external product from a new partner), developers often have to rely on data-driven APIs to extract information from the backend and process it externally. While use-case-driven APIs are generally considered more useful, sometimes there might not be one available that suits the requirements of the novel use case that you must implement.
By considering the following guidelines when building your data-driven APIs, you can make them easier to consume and more efficient for the backend and the network, improving performance and reducing the operational cost (fewer data transfers, faster and cheaper queries on the DBs, etc.).
I’ll use an example with sample data. Consider the following data as a representation of your backend database: an imaginary set of alerts that your Enterprise product detected over time. Instead of just three, imagine the following JSON output with thousands of records:
{
"alerts" : [
{
"name": "Impossible Travel",
"alert_type": "Behavior",
"alert_status": "ACTIVE",
"severity": "Critical",
"created": "2020-09-27T09:27:33Z",
"modified": "2020-09-28T14:34:44Z",
"alert_id": "8493638e-af28-4a83-b1a9-12085fdbf5b3",
"details": "... long blob of data ...",
"evidence": [
"long list of stuff"
]
},
{
"name": "Malware Detected",
"alert_type": "Endpoint",
"alert_status": "ACTIVE",
"severity": "High",
"created": "2020-10-04T11:22:01Z",
"modified": "2020-10-08T08:45:33Z",
"alert_id": "b1018a33-e30f-43b9-9d07-c12d42646bbe",
"details": "... long blob of data ...",
"evidence": [
"long list of stuff"
]
},
{
"name": "Large Upload",
"alert_type": "Network",
"alert_status": "ACTIVE",
"severity": "Low",
"created": "2020-11-01T07:04:42Z",
"modified": "2020-12-01T11:13:24Z",
"alert_id": "c79ed6a8-bba0-4177-a9c5-39e7f95c86f2",
"details": "... long blob of data ...",
"evidence": [
"long list of stuff"
]
}
]
}
Imagine implementing a simple /v1/alerts REST API endpoint to retrieve the data and you can’t anticipate all the future needs. I recommend considering the following guidelines:
- Filters: allow your consumers to reduce the result set by offering filtering capabilities in your API on as many fields as possible without stressing the backend too much (if some filters are not indexed it could become expensive, so you must find the right compromise). In the example above, good filters might include: name, severity, alert_type, alert_status, created, and modified. More complicated fields like details and evidence might be too expensive for the backend (as they might require full-text search) and you would probably leave them out unless really required.
- Data formats: be very consistent in how you present and accept data across your API endpoints. This holds true especially for types such as numbers and dates, or complex structures. For example, to represent integers in JSON you can use numbers (i.e. "fieldname": 3) or strings (i.e. "fieldname": "3" ): no matter what you choose, you need to be consistent across all your API endpoints. And you should also use the same format when returning outputs and accepting inputs.
- Dates: dates and times can be represented in many ways: timestamps (in seconds, milliseconds, microseconds), strings (ISO8601 such as in the example above, or custom formats such as 20200101), with or without time zone information. This can easily become a problem for the developers. Again, the key is consistency: try to accept and return only a single date format (i.e. timestamp in milliseconds or ISO8601) and be explicit about whether you consider time zones or not: usually choosing to do everything in UTC is a good idea because it removes ambiguity. Make sure to document the date formats properly.
- Filter types: depending on the type of field, you should provide appropriate filters, not just equals. A good example is supporting range filters for dates that, in our example above, allow consumers to retrieve only the alerts created or modified in a specific interval. If some fields are enumerators with a limited number of possible values, it might be useful to support a multi-select filter (i.e. IN): in the example above it should be possible to filter by severity values and include only the High and Critical values using a single API call.
- Sorting: is your API consumer interested only in the older alerts or the newest? Supporting sorting in your data-driven API is extremely important. One field to sort by is generally enough, but sometimes (depending on the data) you might need more.
- Result limiting and pagination: you can’t expect all the entries to be returned at once (and your clients might not be interested or ready to ingest all of them anyway), so you should implement some logic where clients should retrieve a limited number of results and can get more when they need. If you are using pagination, clients should be able to specify the page size within a maximum allowed value. Defaults and maximums should be reasonable and properly documented.
- Field limiting: consider whether you really need to return all the fields of your results all the time, or if your clients usually just need a few. By letting the client decide what fields (or groups of fields) your API should return, you can reduce the network throughput and backend cost, and improve performance. You should provide and document some sane default. In the example above, you could decide to return by default all the fields except details and evidence, which can be requested by the client only if they explicitly ask, using an include parameter.
Let’s put it all together. In the above example, you should be able to retrieve, using a single API call, something like this:
Up to 100 alerts that were created between 2020-04-01T00:00:00Z (April 1st 2020) and 2020-10-01T00:00:00Z (October 1st 2020) with severity “medium” or “high”, sorted by “modified” date, newest first, including all the fields but “evidence”.
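One hypothetical way to express that request (the parameter names are illustrative, not a prescribed convention):

GET /v1/alerts
?created_after=2020-04-01T00:00:00Z
&created_before=2020-10-01T00:00:00Z
&severity=medium,high
&sort=-modified
&limit=100
&exclude=evidence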
There are multiple ways you can implement this; through REST, GraphQL, or custom query languages: in many cases, you don’t need something too complex as often data sets are fairly simple. The proper way depends on many design considerations that are outside the scope of this post. But having some, or most of these capabilities in your API will make it better and more future proof. By the way, if you’re into GraphQL, I recommend reading this post.
4. Plan for scale
A good API shouldn’t be presumptuous: it shouldn’t expect that clients aren’t doing anything other than waiting for its response, especially at scale, where performance is key.
If your API requires more than a few milliseconds to produce a response, I recommend considering supporting jobs instead. The logic can be as follows (sketched in plain HTTP after the list):
- Implement an API endpoint to start an operation that is supposed to take some time. If accepted, it would return immediately with a jobId.
- The client stores the jobId and periodically reaches out to a second endpoint that, when provided with the jobId, returns the completion status of the job (i.e. running, completed, failed).
- Once results are available (or some are), the client can invoke a third endpoint to fetch the results.
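Sketched as plain HTTP (the paths, fields, and status codes are illustrative):

POST /v1/reports HTTP/1.1

HTTP/1.1 202 Accepted
{"jobId": "af12c"}

GET /v1/reports/af12c/status HTTP/1.1

HTTP/1.1 200 OK
{"jobId": "af12c", "status": "completed"}

GET /v1/reports/af12c/results HTTP/1.1

HTTP/1.1 200 OK
{"results": ["..."]}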
Other possible solutions include publisher/subscriber approaches or pushing data with webhooks, also depending on the size of the result set and the speed requirements. There isn’t a one-size-fits-all solution, but I strongly recommend avoiding a blocking design where API clients are kept waiting for the server to reply while it’s running long jobs in the backend.
If you need high performance and throughput in your APIs, consider gRPC, as its binary representation of data using protocol buffers has significant speed advantages over REST.
Side note: if you want to learn more about REST, GraphQL, webhooks, or gRPC use cases, I recommend starting from this post.
Finally, other considerations for scale include supporting batch operations on multiple entries at the same time (for example mass updates), but I recommend considering them only when you have a real use case in mind.
What’s next
In the next and last chapter, I’ll share some more suggestions about monitoring and supporting your API and developer ecosystem.

Security Automation at BlackHat Europe 2022: Part 2
By: James Holland
In part 2 of this double-header, we look at the operations side of the conference infrastructure. If you missed part one, it’s here.
Automating Security Operations Use Cases with Cortex XSOAR
To reiterate from the previous post: on the Black Hat conference network we are likely to see malicious activity; in fact, it is expected. As the Black Hat leadership team say, occasionally we find a “needle in a needlestack”, someone with true malicious intent. But how do you go about finding malicious activity with real intent within a sea of offensive security demonstrations and training exercises?
Without being able to proactively block the majority of malicious activity (lest we disrupt training exercises, or break someone’s exploitation demo in the Arsenal), we hunt. To hunt more efficiently, we automate. It’s a multi-vendor approach, with hunters from Palo Alto Networks, Cisco, RSA Netwitness and Ironnet all on-site and collaborating. Cortex XSOAR provides the glue between all the deployed inline and out-of-band security tooling, as well as being the conduit into Slack for the analysts to collaborate and communicate.
An investigation may start from various angles and different indicators, and being able to quickly classify whether the source of the incident is a training class is a great start. Without leaving Slack, a Cortex XSOAR chatbot is able to provide an automated lookup of a machine’s MAC address, and tell the analyst: the IP address, the vendor assigned to that MAC address where applicable, the wireless access point the host is connected to (thanks to the Cortex XSOAR integration with Cisco Meraki, docs here), and crucially the firewall zone where the machine is located. In the example below, the “tr_digi_forens_ir” zone tells us this machine is in a training class, specifically the digital forensics and incident response class:

That’s really useful information when examining internal hosts, but how about a lookup for IP addresses that are sending traffic towards the Black Hat conference infrastructure in a suspicious way from the outside, from the Internet, to see whether any of the available Threat Intelligence sources have specific information about them, and with what level of confidence? There’s a Slack chatbot query for that too, powered by Cortex XSOAR:

Or checking Threat Intelligence sources for information about a domain being contacted by a machine in the visitor wireless network which is potentially compromised, and analysing it in a sandbox too?

The chatbot has many features, all available to any analyst from any vendor working in the NOC, with no requirement to learn any product’s user interface, just a simple Slack chatbot:

Other ways of automating our operations included ingestion of the data from other deployed toolsets, like the Palo Alto Networks IoT platform, which is shown below creating incidents in Cortex XSOAR based on the passive device and application profiling it does on the network traffic:

The data from the IoT platform enriches the incident, providing the analyst with a page of information to quickly understand the context of the incident and what action would be appropriate:


As well as integrating Cortex XSOAR with Cisco Meraki, we also integrated Cortex XSOAR with RSA Netwitness, and were able to use alerts from Netwitness to generate and work through any incidents that looked like potentially malicious behaviour.
We also utilised Cortex XSOAR for some more network-focused use cases. For instance, by leveraging the intelligence data maintained within the PAN-OS NGFWs, we were interested to see if there was any traffic approaching the Black Hat infrastructure’s public facing services from TOR exit nodes, and we weren’t disappointed:

We also leveraged Cortex XSOAR playbooks to feed OSINT news into a dedicated Slack channel, so analysts could see breaking stories as they happen:

And we even used a Cortex XSOAR playbook to proactively monitor device uptime, which would alert into Slack if a critical device stopped responding and was suspected to be “down”:

Summary
It’s an infrastructure full of malicious activity, on purpose. It gets built, rapidly, to a bespoke set of requirements for each conference. It is then operated by a collaboration of Black Hat staff and multiple security vendors’ staff.
That can only happen successfully with high levels of automation, in both the build and the operation phases of the conference. With the automation capabilities of the PAN-OS network security platform, the orchestration from Cortex XSOAR, and the collaboration across vendors, the Black Hat conference was once again a safe and reliable environment for all who attended.
Acknowledgements
Palo Alto Networks would like to once again thank Black Hat for choosing us to provide network security, as well as the automation and orchestration platform, for the operations centres of the conferences this year in Singapore, Las Vegas and London ♥
Thank you Jessica Stafford, Bart Stump, Steve Fink, Neil R. Wyler and ᴘᴏᴘᴇ for your leadership and guidance. Thank you Jessica Bair Oppenheimer, Evan Basta, Dave Glover, Peter Rydzynski and Muhammad Durrani for all the cross-vendor collaboration along with your teams including Rossi Rosario, Paul Fidler, Panagiotis (Otis) Ioannou, Paul Mulvihill, Iain Davison, and (sorry) everyone else who may be lurking on other social media platforms where I couldn’t find them!
And of course, thanks so much to the amazing folks representing Palo Alto Networks in London, great job team; Matt Ford, Ayman Mahmoud, Matt Smith, Simeon Maggioni and Doug Tooth. Also Scott Brumley for his work on the Cortex XSOAR Slack chatbot during the USA conference earlier this year.


Security Automation at BlackHat Europe 2022: Part 1
By: James Holland
In part 1 of this double-header, we look at the build and configuration tasks for the conference.

It’s been called one of the most dangerous networks in the world, and there are many good reasons why each Black Hat conference has its own IT infrastructure built from the ground up.
There are training classes, where attendees learn offensive security techniques, from hacking infrastructure to attacking the Linux kernel, exploiting IIoT, and abusing directory services. There is the Arsenal, where researchers demonstrate the latest techniques, as well as briefings from experts in a variety of security domains. Then add hundreds of eager and interested attendees, who are not only learning from the content at the conference, but may have their own tricks to bring to the party too.
Roll Your Own
A dedicated infrastructure that does not rely (as far as is possible) on the venue’s own network and security capabilities is the only feasible way to host this kind of community of keen security professionals. Building an infrastructure per conference means that a multi-disciplined team, from a variety of vendors and backgrounds, must find ways to make the build as streamlined as possible. Automation is key to the approach.

The Black Hat team chose Palo Alto Networks to provide network security for all three of their conferences during 2022, renewing an annual partnership which now spans 6 years. The partnership includes Palo Alto Networks supplying their staff to work in the conference NOCs, configuring and operating several PA-Series hardware next-generation firewalls (NGFWs). In 2022, the partnership expanded to include the use of Cortex XSOAR to automate security operations.
Automating the Build Process
The build happens in a short period of time; the core infrastructure went from cardboard boxes to “live” in just over one day for the Europe 2022 conference. A design including complete segmentation of each conference area (including segmenting each training class, the Arsenal, the exhibiting vendors, the registration area, the NOC itself, and more), requires a lot of IP subnets and VLANs, multiple wireless SSIDs, and several DHCP servers and scopes. Some DHCP scopes require reservations, particularly where infrastructure components require predictable IP addressing, but there are too many of them for configuration of static addressing to be feasible. And change happens; IT security is a fast-paced industry, and we knew from experience that we would be adding, moving or changing the configuration data as the conference progressed.
With a single source for all of that configuration data, and a PAN-OS network security platform with plenty of automation capability, automation was inevitable, the only choice was the flavour!
Step forward Ansible. With its task-based approach, its ability to bring in configuration data from almost any structured source, and a collection of modules for idempotent configuration of PAN-OS, it was the perfect match for the requirements.
All of those segmented subnets needed configuring with IP addresses, as well as security zones. Here you can see some excerpts from a playbook execution, where Ansible observed modifications in the configuration data source, and changes were made only to the required items, with the rest of the configuration left in its original state:


This is important; the initial configuration would not be the final configuration, so when re-executing Ansible to make incremental changes, we only want to make modifications where they are needed. This approach also speeds up the processing time for changes.
Below you can also see a long (and truncated, for brevity) list of DHCP reservations required for some of the infrastructure components. They are being configured with a single Ansible task; this is a list of MAC addresses and IP addresses that definitely does not want to be configured by hand!

The PAN-OS next-generation firewalls are the DHCP servers for every subnet, and at scale, such a large quantity of DHCP servers is also something which nobody would want to configure by hand, so again, Ansible did that for us automatically:

Automatically Keeping an Eye on Suspicious Hosts
It is rare that the Black Hat team has to take any action against a conference attendee; the majority of seemingly malicious activity is usually part of the trainings, a demo in the Arsenal, or something else “expected”. Occasionally attendees approach or cross the line of acceptable behaviour, and during those instances and investigations it is very useful to be able to view the historical data across the conference.
User-ID provides a huge benefit when the network should include known and authenticated users, but at Black Hat conferences, that is not the case. There is no authentication past the pre-shared key to join the wireless network, and no tracking of any person that attends the conference. However, we chose to modify the user-to-IP mapping capability of User-ID to become MAC-to-IP mappings. Being the DHCP server, the PAN-OS NGFWs knew the MAC address of each host as it requested an IP address, so we routed that information into the mapping database. This meant we were able to observe a host machine (without any knowledge of the person using it) as it moved throughout the conference. Even if the machine left the network and joined again later (after lunch!?) with a new DHCP IP address, or if the machine moved between different wireless SSIDs and hence different IP subnets.
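Conceptually (an illustrative payload only, not the actual NOC tooling), such a MAC-to-IP mapping can reuse the User-ID login message format, with the MAC address taking the place of a user name:

<uid-message>
<type>update</type>
<payload>
<login>
<entry name="aa:bb:cc:dd:ee:01" ip="10.65.10.23" timeout="480"></entry>
</login>
</payload>
</uid-message>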
Should action be required when a machine is exhibiting unacceptable behaviour, one option is to utilise network security controls based on the MAC address of the host, instead of the IP address. These controls would be applicable no matter which network the host moved into.
Part Two
The second part of this double-header will focus on the operations side of the conference infrastructure, as the team (below) move into threat hunting mode. Carry on reading here…

The Developer’s Guide To Palo Alto Networks Cloud NGFW for AWS
By: Migara Ekanayake
Busy modernizing your applications? One thing you can’t cut corners on is the security aspect. Today, we will discuss network security — inserting inbound, outbound, and VPC-to-VPC security for your traffic flows, to be precise, without compromising DevOps speed and agility. When it comes to network security for cloud-native applications, it’s challenging to find a cloud-native security solution that provides best-in-class NGFW security while consuming security as a cloud-native service. This has meant developers have had to compromise on security to find a solution that fits their development needs. That’s no longer the case — today, we will look at how you can have your cake and eat it too!
Infrastructure-as-Code is one of the key pillars in the application modernization journey, and there is a wide range of tools you can choose from. Terraform is one of the industry’s widely adopted infrastructure-as-code tools to shift from manual, error-prone provisioning to automated provisioning at scale. And, we firmly believe that it is crucial to be able to provision and manage your cloud-native security using Terraform next to your application code where it belongs. We have decided to provide launch day Terraform support for Palo Alto Networks Cloud NGFW for AWS with our brand new cloudngfwaws Terraform provider, allowing you to perform day-0, day-1, and day-2 tasks. You can now consume our Cloud NGFW with the tooling you are already using without leaving the interfaces you are familiar with; it’s that simple!
Getting Started
Prerequisites
- Subscribed to Palo Alto Networks Cloud NGFW via the AWS marketplace
- Your AWS account is onboarded to the Cloud NGFW
AWS Architecture
We will focus on securing an architecture similar to the topology below. Note the unused Firewall Subnet — later, we will deploy the Cloud NGFW endpoints into this subnet and make the necessary routing changes to inspect traffic through the Cloud NGFW.
Authentication and Authorization
Enable Programmatic Access
To use the Terraform provider, you must first enable Programmatic Access for your Cloud NGFW tenant. You can check this by navigating to the Settings section of the Cloud NGFW console. The steps to do this can be found here.
You will authenticate against your Cloud NGFW by assuming roles in your AWS account that are allowed to make API calls to the AWS API Gateway service. The associated tags with the roles dictate the type of Cloud NGFW programmatic access granted — Firewall Admin, RuleStack Admin, or Global Rulestack Admin.
A short Terraform configuration can create an AWS role, which we will utilize later when setting up the cloudngfwaws Terraform provider.
Setting Up The Terraform Provider
In this step, we will configure the Terraform provider by specifying the ARN of the role we created in the previous step. Alternatively, you can also specify individual Cloud NGFW programmatic access roles via lfa-arn, lra-arn, and gra-arn parameters.
Note how Terraform provider documentation specifies Admin Permission Type required for each Terraform resource as Firewall, Rulestack, or Global Rulestack. You must ensure the Terraform provider is configured with an AWS role(s) that has sufficient permission(s) to use the Terraform resources in your configuration file.
Rulestacks and Cloud NGFW Resources
There are two fundamental constructs you will discover throughout the rest of this article — Rulestacks and Cloud NGFW resources.
A rulestack defines the NGFW traffic filtering behavior, including advanced access control and threat prevention — simply a set of security rules and their associated objects and security profiles.
Cloud NGFW resources are managed resources that provide NGFW capabilities with built-in resilience, scalability, and life-cycle management. You will associate a rulestack to an NGFW resource when you create one.
Deploying Your First Cloud NGFW Rulestack
First, let’s start by creating a simple rulestack, and we are going to use the BestPractice Anti Spyware profile. BestPractice profiles are security profiles that come built-in, which will make it easier for you to use security profiles from the start. If required, you can also create custom profiles to meet your demands.
The next step is to create a security rule that only allows HTTP-based traffic and associate that with the rulestack we created in the previous step. Note that we use the App-ID web-browsing instead of traditional port-based enforcement.
Committing Your Rulestack
Once the rulestack is created, we will commit the rulestack before assigning it to an NGFW resource.
Note: cloudngfwaws_commit_rulestack should be placed in a separate plan as the plan that configures the rulestack and its contents. If you do not, you will have perpetual configuration drift and need to run your plan twice so the commit is performed.
Deploying Your First Cloud NGFW Resource
Traffic to and from your resources in VPC subnets is routed through to NGFW resources using NGFW endpoints. How you want to create these NGFW endpoints is determined based on the endpoint mode you select when creating the Cloud NGFW resource.
- ServiceManaged — Creates NGFW endpoints in the VPC subnets you specify
- CustomerManaged — Creates just the NGFW endpoint service in your AWS account, and you will have the flexibility to create NGFW endpoints in the VPC subnets you want later.
In this example, we are going to choose the ServiceManaged endpoint mode. Also, notice how we have specified the subnet_mapping property. These are the subnets where your AWS resources live that you want to protect.
In production, you may want to organize these Terraform resources into multiple stages of your pipeline — first, create the rulestack and its content, and proceed to the stage where you will commit the rulestack and create the NGFW resource.
At this point, you will have a Cloud NGFW endpoint deployed into your Firewall subnet.
You can retrieve the NGFW endpoint ID to Firewall Subnet mapping via cloudngfwaws_ngfw Terraform data resource. This information is required during route creation in the next step.
Routing Traffic via Cloud NGFW
The final step is to add/update routes to your existing AWS route tables to send traffic via the Cloud NGFW. The new routes are highlighted in the diagram below. Again, you can perform this via aws_route or aws_route_table Terraform resource.
Learn more about Cloud NGFW
In this article, we discovered how to deploy Cloud NGFW in the Distributed model. You can also deploy Cloud NGFW in a Centralized model with AWS Transit Gateway. The Centralized model will allow you to run Cloud NGFW in a centralized “inspection” VPC and connect all your other VPCs via Transit Gateway.
We also discovered how to move away from traditional port-based policy enforcement and move towards application-based enforcement. You can find a comprehensive list of available App-IDs here.
There is more you can do with Cloud NGFW.
- Threat prevention — Automatically stop known malware, vulnerability exploits, and command and control infrastructure (C2) hacking with industry-leading threat prevention.
- Advanced URL Filtering — Stop unknown web-based attacks in real-time to prevent patient zero. Advanced URL Filtering analyzes web traffic, categorizes URLs, and blocks malicious threats in seconds.
Cloud NGFW for AWS is a regional service. Currently, it is available in the AWS regions enumerated here. To learn more, visit the documentation and FAQ pages. To get hands-on experience with this, please subscribe via the AWS Marketplace page.
The Developer’s Guide To Palo Alto Networks Cloud NGFW for AWS was originally published in Palo Alto Networks Developers on Medium, where people are continuing the conversation by highlighting and responding to this story.

Dynamic Firewalling with Palo Alto Networks NGFWs and Consul-Terraform-Sync
By: Migara Ekanayake

I cannot reach www.isitthefirewall.com…!! Must be the firewall!!
Sound familiar? We’ve all been there. If you were that lonely firewall administrator who tried to defend the good ol’ firewall, congratulations on making it this far. Life was tough back then, when you only had a handful of firewall change requests a day, static data centres and monolithic applications.
Fast forward to the present day: we are moving away from traditional static datacentres to modern dynamic datacentres. Applications are no longer maintained as monoliths; they are arranged into several smaller functional components (aka microservices) to gain development agility. As you gain development agility, you introduce new operational challenges.
What are the Operational Challenges?
Now that you have split your monolithic application into a dozen microservices, most likely you are going to have multiple instances of each microservice to fully realise the benefits of this app transformation exercise. Every time you want to bring up a new instance, you open a ticket and wait for the firewall administrator to allow traffic to that new node; this could take days, if not weeks.
When you add autoscaling into the mix (so that the number of instances can dynamically scale in and out based on your traffic demands), having to wait days or weeks before traffic can reach those new instances defeats the whole point of having autoscaling in the first place.
Long live agility!
The Disjoint
Traditionally, firewall administrators used to retrieve the requests from their ticketing system and implement those changes via the UI or CLI during a maintenance window. This creates an impedance mismatch with the application teams, slows the overall delivery of solutions, and can introduce human error both when the ticket is created and when the change is implemented. If you are a network/security administrator who has recognised these problems, the likelihood is that you have already written some custom scripts and/or leveraged a configuration management platform to automate some of these tasks. Yes, this solves the problem to a certain extent; still, there is a manual handoff between Continuous Delivery and Continuous Deployment.

Consul-Terraform-Sync
Network and security teams can solve these challenges by enabling dynamic service-driven network automation with self-service capabilities using an automation tool that supports multiple networking technologies.
HashiCorp and Palo Alto Networks recently collaborated on a strategy for this using HashiCorp’s Network Infrastructure Automation (NIA). This works by triggering a Terraform workflow that automatically updates Palo Alto Networks NGFWs or Panorama based on changes it detects from the Consul catalog.
Under the hood, we are leveraging Dynamic Address Groups (DAGs). In PAN-OS it is possible to dynamically associate tags with (and remove them from) IP addresses in several ways, including via the XML API. A tag is simply a string that can be used as match criteria in Dynamic Address Groups, allowing the firewall to dynamically allow/block traffic without requiring a configuration commit.
If you need a refresher on DAGs here is a great DAG Quickstart guide.
Scaling Up with Panorama
The challenge for large-scale networks is ensuring every firewall that enforces policies has the IP address-to-tag mappings for your entire user base.
If you are managing your Palo Alto Networks NGFWs with Panorama you can redistribute IP address-to-tag mappings to your entire firewall estate within a matter of seconds. This could be your VM-Series NGFWs deployed in the public cloud, private cloud, hybrid cloud or hardware NGFWs in your datacentre.

What’s Next?
If you have read this far, why not give this step-by-step guide on how you can automate the configuration management process for Palo Alto Networks NGFWs with Consul-Terraform-Sync a try?
For more resources, check out:
- Webinar: Network Security Automation with HashiCorp Consul-Terraform-Sync and Palo Alto Networks
- Whitepaper: Enabling Dynamic Firewalling with Palo Alto and HashiCorp
Dynamic Firewalling with Palo Alto Networks NGFWs and Consul-Terraform-Sync was originally published in Palo Alto Networks Developers on Medium, where people are continuing the conversation by highlighting and responding to this story.

PAN.dev and the Rise of Docs-as-Code
By: Steven Serrata

It started out as a simple idea — to improve the documentation experience for the many open-source projects at Palo Alto Networks. Up until that point (and still to this day) our open source projects were documented in the traditional fashion: the ubiquitous README.md and the occasional Read the Docs or GitHub Pages site. In truth, there is nothing inherently “wrong” with this approach but there was a sense that we could do more to improve the developer experience of adopting our APIs, tools and SDKs. So began our exploration into understanding how companies like Google, Microsoft, Facebook, Amazon, Twilio, Stripe, et al., approached developer docs. Outside of visual aesthetics, it was clear that these companies invested a great deal of time into streamlining their developer onboarding experiences, something we like to refer to as “time-to-joy.” The hard truth was that our product documentation site was catering to an altogether different audience and it was evident in the fact that features like quick starts (“Hello World”), code blocks, and interactive API reference docs were noticeably absent.
Great Gatsby!
Circa March 2019. Although we technically weren’t new to Developer Relations as a practice, it was the first time we made an honest attempt to solve for the lack of developer-style documentation at Palo Alto Networks. We quickly got to work researching how other companies delivered their developer sites and received a tip from a colleague to look at Gatsby — a static-site generator based on GraphQL and ReactJS. After some tinkering, we had our first git-backed developer portal deployed using AWS Amplify, but that was only the beginning. It was a revelation that git-backed, web/CDN hosting services existed and that rich, interactive, documentation sites could be managed like code. It wasn’t long before we found Netlify, Firebase, and, eventually, Docusaurus — a static-site generator specializing in the documentation use case. It was easy to pick up and run with this toolchain as it seamlessly weaved together the open-source git flows we were accustomed to with the ability to rapidly author developer docs using markdown and MDX — a veritable match made in heaven. We had the stack — now all we needed was a strategy.
For Developers, by Developers
At the surface, pan.dev is a family of sites dedicated to documentation for developers, by developers. At a deeper level, it’s a fresh, bold, new direction for a disruptive, next-generation security company that, through strategic acquisitions and innovation, has found itself on the doorsteps of the API Economy (more on that topic coming soon!).

The content published on pan.dev is for developers because it supports and includes the features that developer audiences have come to expect, such as code blocks and API reference docs (and dark mode!).

It’s by developers since the contributing, review and publishing flows are backed by git and version control systems like GitLab and GitHub. What’s more, by adopting git as the underlying collaboration tool, pan.dev is capable of supporting both closed and open-source contributions. Spot a typo or inaccuracy? Open a GitHub issue. Got a cool pan-os-python tutorial you’d like to contribute? Follow our contributing guidelines and we’ll start reviewing your submission. By adopting developer workflows, pan.dev is able to move developer docs closer to the source (no pun intended). That means we can deliver high-quality content straight from the minds of the leading automation, scripting, DevOps/SecOps practitioners in the cybersecurity world.

So, What’s Next?
If we’ve learned anything, over the years, it’s that coming up with technical solutions might actually be the easier aspect of working on the pan.dev project. What can be more difficult (and arguably more critical) is garnering buy-in from stakeholders, partners and leadership, while adapting to the ever-evolving needs of the developer audience we’re trying to serve. All this while doing what we can to ensure the reliability and scalability of the underlying infrastructure (not to mention SEO, analytics, accessibility, performance, i18n, etc.). The real work has only recently begun and we’re ecstatic to see what the future holds. Already, we’ve seen a meteoric rise in monthly active users (MAUs) across sites (over 40K!) and pan.dev is well-positioned to be the de facto platform for delivering developer documentation at Palo Alto Networks.
To learn more and experience it for yourself, feel free to tour our family of developer sites and peruse the GitHub Gallery of open-source projects at Palo Alto Networks.
For a deeper dive into the pan.dev toolchain/stack, stay tuned for part 2!
PAN.dev and the Rise of Docs-as-Code was originally published in Palo Alto Networks Developers on Medium, where people are continuing the conversation by highlighting and responding to this story.

User-ID / EDL … better both of them
By: Xavier Homs
User-ID or EDL … why not both?

User-ID and External Dynamic Lists (EDLs) are probably the most commonly used PAN-OS features to share external IP metadata with the NGFW. The use cases for them are endless: from blocking known DoS sources to forwarding traffic from specific IP addresses over low-latency links.
Let’s take, for instance, the following logs generated by the SSH daemon on a Linux host.
Wouldn’t it be great to have a way to share with the NGFW the IP addresses used by the user xhoms, so that specific security policies are applied to traffic sourced from those addresses just because we now know the endpoints are managed by that user?
The following shell CGI script could be used to create an External Dynamic List (EDL) listing all known IP addresses from which the account xhoms was used to log in to this server.
That is great but:
- When would addresses be removed from the list? At the daily log rotation?
- Are all listed IP addresses still being operated by the user xhoms?
- Is there any way to remove addresses from the list at user logout?
To overcome these (and many other) limitations, Palo Alto Networks NGFWs feature a very powerful IP tagging API called User-ID. Although EDL and User-ID cover similar objectives, there are fundamental technical differences between them:
- User-ID is “asynchronous” (push mode), supports both “set” and “delete” operations, and provides a very flexible way to create groupings, either by tagging address objects (Dynamic Address Group, DAG) or by mapping users to addresses and then tagging these users (Dynamic User Group, DUG). User-ID allows these tags to have a timeout so the PAN-OS device takes care of removing them when they expire.
- EDL is “universal”. Almost all network appliance vendors provide a feature to fetch an IP address list from a URL.
Years ago it was up to the end customer to create their own connectors, just like the CGI script we shared before. But as the presence of PAN-OS powered NGFWs grew among enterprise customers, more application vendors decided to leverage the value of User-ID by providing API clients out of the box. Although this is great (for Palo Alto Networks customers), losing the option to fetch a list from a URL (EDL mode) makes it more difficult to integrate legacy technologies that may still be out there in the network.
Creating EDLs from User-ID enabled applications
Let’s assume you have an application that features a PAN-OS User-ID API client and is capable of pushing log entries to the PAN-OS NGFW in an asynchronous way. Let’s assume, as well, that there are still some legacy network devices in your network that need to fetch that address metadata from a URL. How difficult would it be to create a micro-service for that?
One option would be to leverage a PAN-OS XML SDK like PAN Python or PAN GO and to create a REST API that extracts the current state from the PAN-OS device using its operations API.
GET https://<pan-os-device>/api
?key=<API-KEY>
&type=op
&cmd=<show><object><registered-ip><all></all></registered-ip></object></show>
HTTP/1.1 200 OK
<response status="success">
<result>
<entry ip="10.10.10.11" from_agent="0" persistent="1">
<tag>
<member>tag10</member>
</tag>
</entry>
<entry ip="10.10.10.10" from_agent="0" persistent="1">
<tag>
<member>tag10</member>
</tag>
</entry>
<count>2</count>
</result>
</response>
It shouldn’t be that difficult to parse the response and provide a plain list out of it, right? Come back to me if you’re thinking of using an XSLT processor piped to the output of a cURL command and packaging everything as a CGI script, because we might have some common ground ;-)
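If you would rather parse it with code than XSLT, a small Go program along these lines could fetch the registered-ip state and print it as a plain EDL. This is only an illustrative sketch, not part of the original tooling: it assumes the device hostname and API key come from the PANOS_HOST and PANOS_KEY environment variables, and that the client trusts the device certificate.
package main

import (
    "encoding/xml"
    "fmt"
    "io"
    "net/http"
    "net/url"
    "os"
)

// Structs mirroring the <response> document returned by the operational command.
type opResponse struct {
    Status  string  `xml:"status,attr"`
    Entries []entry `xml:"result>entry"`
}

type entry struct {
    IP   string   `xml:"ip,attr"`
    Tags []string `xml:"tag>member"`
}

func main() {
    q := url.Values{
        "key":  {os.Getenv("PANOS_KEY")},
        "type": {"op"},
        "cmd":  {"<show><object><registered-ip><all></all></registered-ip></object></show>"},
    }
    resp, err := http.Get("https://" + os.Getenv("PANOS_HOST") + "/api?" + q.Encode())
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    var parsed opResponse
    if err := xml.Unmarshal(body, &parsed); err != nil {
        panic(err)
    }
    // One IP per line: the plain-text shape an EDL consumer expects.
    for _, e := range parsed.Entries {
        fmt.Println(e.IP)
    }
}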
I’d love to propose a different approach though: to hijack User-ID messages by implementing a micro-service that behaves as a reverse proxy between the application that features the User-ID client and the PAN-OS device. I like this approach because it opens the door to other interesting use cases like enforcing timeouts (adding a timeout to entries that do not have one), converting DAG messages into DUG equivalents or adding additional tags based on the source application.
Components for a User-ID to EDL reverse proxy
The ingredients for such a recipe would be:
- a light HTTP web server featuring routing (hijack messages sent to /api and let other requests pass through). The Go net/http package and the httputil.ReverseProxy type fit like a glove in this case.
- a collection of XML-enabled Go structs to unmarshal captured User-ID messages.
- a library featuring User-ID message payload processing capable of keeping a valid (non-expired) state of User-to-IP, Group-to-IP and Tag-to-IP maps.
Fortunately for us there is the xhoms/panoslib Go module that covers points 2 and 3 (OK, let’s be honest, I purpose-built the library to support this article).
With all these components it won’t be that difficult to reverse proxy a PAN-OS HTTPS service, monitoring User-ID messages and exposing the state of the corresponding entries on a virtual endpoint named /edl that won’t conflict at all with the PAN-OS schema.
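To make the plumbing more concrete, here is a minimal, hedged sketch of such a proxy. It is not the actual xhoms/uidmonitor code: the PAN-OS device is assumed to come from a TARGET environment variable, the User-ID parsing and the expiry-aware state (where xhoms/panoslib would plug in) are left as placeholders, and a real deployment may need a custom Transport to trust the device certificate.
package main

import (
    "bytes"
    "io"
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
    "os"
)

func main() {
    // TARGET points at the PAN-OS device the proxy forwards traffic to.
    target, err := url.Parse("https://" + os.Getenv("TARGET"))
    if err != nil {
        log.Fatal(err)
    }
    proxy := httputil.NewSingleHostReverseProxy(target)

    mux := http.NewServeMux()

    // Hijack User-ID messages: keep a copy of the body for local processing,
    // then hand the untouched request over to the reverse proxy.
    mux.HandleFunc("/api/", func(w http.ResponseWriter, r *http.Request) {
        if r.Method == http.MethodPost {
            body, _ := io.ReadAll(r.Body)
            r.Body = io.NopCloser(bytes.NewReader(body))
            // Placeholder: unmarshal the <uid-message> payload here and update
            // the local User-to-IP, Group-to-IP and Tag-to-IP maps.
        }
        proxy.ServeHTTP(w, r)
    })

    // Virtual endpoint that does not exist in the PAN-OS schema: expose the
    // current, non-expired entries as a plain-text EDL.
    mux.HandleFunc("/edl/", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "text/plain")
        // Placeholder: write one IP per line from the in-memory state.
    })

    // Everything else passes straight through to the PAN-OS device.
    mux.Handle("/", proxy)

    log.Fatal(http.ListenAndServe(":8080", mux))
}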
I wanted to prove the point and ended up coding the recipe as a micro-service. You can either check the code in the GitHub repository xhoms/uidmonitor or run it from a Docker-enabled host. Take a look at the related panos.pan.dev tutorial for insights into the source code rationale.
docker run --rm -p 8080:8080 -e TARGET=<pan-os-device> ghcr.io/xhoms/uidmonitor
Some payload examples
To get started you can check the following User-ID payload that registers the tag test to the IP addresses 10.10.10.10 and 10.10.20.20. Notice the different timeout values (100 vs 10 seconds)
POST http://127.0.0.1:8080/api/ HTTP/1.1
Content-Type: application/x-www-form-urlencoded
key=<my-api-key>
&type=user-id
&cmd=<uid-message>
<type>update</type>
<payload>
<register>
<entry ip="10.10.10.10">
<tag>
<member timeout="100">test</member>
</tag>
</entry>
<entry ip="10.10.20.20">
<tag>
<member timeout="10">test</member>
</tag>
</entry>
</register>
</payload>
</uid-message>
Just after pushing the payload (before the 10 second tag expires) perform the following GET request on the /edl endpoint and verify the provided list contains both IP addresses.
GET http://127.0.0.1:8080/edl/
?list=tag
&key=test
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Mon, 12 Apr 2021 10:07:50 GMT
Content-Length: 24
Connection: close
10.10.20.20
10.10.10.10
A few seconds (+10) later the same transaction should return only the IP address whose tag has not expired.
GET http://127.0.0.1:8080/edl/
?list=tag
&key=test
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Mon, 12 Apr 2021 10:08:55 GMT
Content-Length: 12
Connection: close
10.10.10.10
Now let’s use a slightly more complex payload. The following one registers (logs in) two users mapped to IP addresses and then applies the user tag admin to both of them. As in the previous case, each tag has a different expiration value.
POST http://127.0.0.1:8080/api/ HTTP/1.1
Content-Type: application/x-www-form-urlencoded
key=<my-api-key>
&type=user-id
&cmd=<uid-message>
<type>update</type>
<payload>
<login>
<entry name="bar@test.local" ip="10.10.20.20" timeout="100"></entry>
<entry name="foo@test.local" ip="10.10.10.10" timeout="100"></entry>
</login>
<register-user>
<entry user="foo@test.local">
<tag>
<member timeout="100">admin</member>
</tag>
</entry>
<entry user="bar@test.local">
<tag>
<member timeout="10">admin</member>
</tag>
</entry>
</register-user>
</payload>
</uid-message>
Although the login transaction has a longer timeout in both cases, the group membership tag for the user “bar@test.local” is expected to time out in 10 seconds.
Calling the /edl endpoint at specific times would demonstrate the first tag expiration.
GET http://127.0.0.1:8080/edl/
?list=group
&key=admin
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Mon, 12 Apr 2021 10:28:16 GMT
Content-Length: 24
Connection: close
10.10.20.20
10.10.10.10
...
GET http://127.0.0.1:8080/edl/
?list=group
&key=admin
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Mon, 12 Apr 2021 10:28:32 GMT
Content-Length: 12
Connection: close
10.10.10.10
Summary
Both EDL (because it provides “universal” coverage) and User-ID (because of its advanced feature set) have their place in a large network. Implementing a reverse proxy capable of hijacking User-ID messages between User-ID enabled applications and the managed PAN-OS device has many different use cases: from basic User-ID to EDL conversion (as in the demo application) to advanced cases like applying policies to User-ID messages, enforcing maximum timeouts or transposing Address Groups into User Groups.
User-ID / EDL … better both of them was originally published in Palo Alto Networks Developers on Medium, where people are continuing the conversation by highlighting and responding to this story.

Ingest PAN-OS Alerts into Cortex XDR Pro Endpoint
By: Amine Basli

In this blog I’m sharing my experience leveraging Cortex XDR API’s to tailor-fit this product into specific customer requirements.
I bet that if you’re a Systems Engineer operating in EMEA you’ve argued more than once about the way products and licenses are packaged from an opportunity size point of view (too big for your customer base). There are many reasons that explain why a given vendor would like to have a reduced set of SKUs. But, no doubt, the fewer SKUs you have the lower the chances are for a given product to fit exactly into the customer’s need from a sizing standpoint.
At Palo Alto Networks we’re no different, but we execute on two principles that help bridge the gaps:
- Provide APIs for our products
- Encourage the developer community to use them through our Developer Relations team
The use case
A Cortex XDR Pro Endpoint customer was interested in ingesting threat logs from their PAN-OS NGFW into his tenant to stitch them with the agent’s alerts and incidents. The customer was aware that this requirement could be achieved by sharing the NGFW state into the cloud-based Cortex Data Lake. But the corresponding SKU was overkill from a feature point of view (it provides not only alert stitching but ML baselining of traffic behaviour as well) and an unreachable cost from a budgeting perspective.
Cortex XDR Pro provides a REST API to ingest third-party alerts to cover this specific use case. It is rate limited to only 600 alerts per minute per tenant but was more than enough for my customer because they were only interested in medium to critical alerts that appeared at a much lower frequency in their network. What about leveraging this API to provide an exact match to my customer’s requirement?
On the other end, PAN-OS can be configured to forward these filtered alerts natively to any REST API with a very flexible payload templating feature. So at first it looked like we could “connect” the PAN-OS HTTP Log Forwarding feature with the Cortex XDR Insert Parsed Alert API. But a deep dive analysis revealed inconsistencies between the mandatory timestamp field format (it must be presented as UNIX milliseconds for XDR to accept it)
It was too close to give up. I just needed a transformation pipeline that could be used as a middleman between the PAN-OS HTTP log forwarding feature and the Cortex XDR Insert Parsed Alert API. Provided I was ready for a middleman, I’d like it to enforce the XDR API quota limitations (600 alerts per minute / up to 60 alerts per update)
Developers to the rescue
I engaged our Developer Relations team at Palo Alto Networks because they’re always eager to discuss new use cases for our APIs. A quick discussion ended up with the following architecture proposal:
- An HTTP server would be implemented to serve as the endpoint for PAN-OS alert ingestion. Basic authentication would be implemented by providing a pre-shared key in the Authorization header.
- A transformation pipeline would convert the PAN-OS payload into a ready-to-consume XDR parsed alert API payload. The pipeline would also take care of enforcing the XDR API quota limits, buffering bursts as needed (a sketch of these two pieces follows the list).
- The XDR API client implementation would use an Advanced API Key for authentication.
- Everything would be packaged in a Docker container to ease its deployment. Configuration parameters would be provided to the container as environment variables.
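To illustrate the first two bullet points, a bare-bones Go sketch of the ingestion endpoint and the buffering pipeline could look like the following. This is a hedged sketch rather than the published xhoms/xdrgateway code: the PSK comes from the environment, and the actual call to the XDR insert_parsed_alerts endpoint is left as a placeholder.
package main

import (
    "io"
    "log"
    "net/http"
    "os"
    "time"
)

func main() {
    psk := os.Getenv("PSK")
    // Buffered channel acting as the pipeline between ingestion and the XDR client.
    pipe := make(chan []byte, 6000)

    // Ingestion endpoint for PAN-OS HTTP log forwarding.
    http.HandleFunc("/in", func(w http.ResponseWriter, r *http.Request) {
        if psk != "" && r.Header.Get("Authorization") != psk {
            http.Error(w, "unauthorized", http.StatusUnauthorized)
            return
        }
        alert, err := io.ReadAll(r.Body)
        if err != nil {
            http.Error(w, "bad request", http.StatusBadRequest)
            return
        }
        pipe <- alert // buffer bursts; a real service would also handle a full buffer
    })

    // Drain the pipe periodically, respecting the quota (up to 60 alerts per update).
    go func() {
        for range time.Tick(2 * time.Second) {
            batch := drain(pipe, 60)
            if len(batch) > 0 {
                // Placeholder: transform the PAN-OS payloads into the
                // insert_parsed_alerts format and POST them to the XDR tenant
                // using the Advanced API Key.
                log.Printf("would push %d alerts to XDR", len(batch))
            }
        }
    }()

    log.Fatal(http.ListenAndServe(":8080", nil))
}

// drain pulls at most max pending alerts without blocking.
func drain(pipe chan []byte, max int) [][]byte {
    var out [][]byte
    for len(out) < max {
        select {
        case a := <-pipe:
            out = append(out, a)
        default:
            return out
        }
    }
    return out
}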
Implementation details ended up being shared in a document available on cortex.pan.dev. If you’re in the mood for creating your own implementation, I highly recommend taking the time to go over the whole tutorial. If you just want to get to the point, you can use this ready-to-go container image.
Let’s get started
I opted for using the available container image. Let me guide you through my experience using it to fulfil my customer’s request.
First of all, the application requires some configuration data that must be provided as a set of environment variables. A few of them are mandatory; others are optional with default values.
The following variables are required (the application will refuse to start without them):
- API_KEY: XDR API Key (Advanced)
- API_KEY_ID: The XDR API Key identifier (its sequence number)
- FQDN: Fully Qualified Domain Name of the corresponding XDR Instance (i.e. myxdr.xdr.us.paloaltonetworks.com)
The following are optional variables
- PSK: the server will check the value in the Authorization header to accept the request (defaults to no authentication)
- DEBUG: if it exists then the engine will be more verbose (defaults to false)
- PORT: TCP port to bind the HTTP server to (defaults to 8080)
- OFFSET: PAN-OS timestamp does not include time zone. By default, they will be considered in UTC (defaults to +0 hours)
- QUOTA_SIZE: XDR ingestion alert quota (defaults to 600)
- QUOTA_SECONDS: XDR ingestion alert quota refresh period (defaults to 60 seconds)
- UPDATE_SIZE: XDR ingestion alert max number of alerts per update (defaults to 60)
- BUFFER_SIZE: size of the pipe buffer (defaults to 6000 = 10 minutes)
- T1: how often the pipe buffer is polled for new alerts (defaults to 2 seconds)
For more details, you can check the public repository documentation.
Step 1: Generate an Advanced API key on Cortex XDR
Connect to your Cortex XDR instance and navigate to Settings > API Keys. Generate an API Key of type Advanced, granting it the Administrator role (that role is required for alert ingestion).
Step 2: Run the container image
Assuming you have Docker installed on your computer, the following command would pull the image and run the micro-service with configuration options passed as environment variables.
docker run --rm -p 8080:8080 -e PSK=hello -e FQDN=xxx.xdr.us.paloaltonetworks.com -e API_KEY=<my-api-key> -e API_KEY_ID=<my-key-id> -e DEBUG=yes ghcr.io/xhoms/xdrgateway
The DEBUG option provides more verbosity and is recommended for initial experimentation. If everything goes as expected, you’ll see a log message like the following one.
2021/03/17 21:32:25 starting http service on port 8080
Step 3: Perform a quick test to verify the micro-service is running
Use any API test tool (I like good-old Curl) to push a test payload.
curl -X POST -H "Authorization: hello" "http://127.0.0.1:8080/in" -d '{"src":"1.2.3.4","sport":1234,"dst":"4.3.2.1","dport": 4231,"time_generated":"2021/01/06 20:44:34","rule":"test_rule","serial":"9999","sender_sw_version":"10.0.4","subtype":"spyware","threat_name":"bad-bad-c2c","severity":"critical","action":"alert"}---annex---"Mozi Command and Control Traffic Detection"'
You should receive an empty status 200 response and see log messages in the server console like the following ones:
2021/03/17 21:52:16 api - successfully parsed alert
2021/03/17 21:52:17 xdrclient - successful call to insert_parsed_alerts
Step 4: Configure HTTP Log Forwarding on the Firewall
PAN-OS NGFW can forward alerts to HTTP/S endpoints. The feature configuration is available in Device > Server Profiles > HTTP and, for this case, you should use the following parameters:
- IP: IP address of the container host
- Protocol: HTTP
- Port: 8080 (default)
- HTTP Method: POST

Some notes regarding the payload format
- URI Format: “/in” is the endpoint where the micro-service listens for POST requests containing new alerts
- Headers: Notice we’re setting the value hello in the Authorization header to match the -e PSK=hello configuration variable we passed to the micro-service
- Payload: The variable $threat_name was introduced with PAN-OS 10.0. If you’re using older versions of PAN-OS, you can use the variable $threatid instead (a template sketch follows this list)
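For reference, the payload template configured in the HTTP Server Profile would be a JSON document built from PAN-OS log variables, roughly along these lines. This is a hedged sketch that simply mirrors the test payload used in Step 3; adapt the field list to your PAN-OS version.
{
  "src": "$src",
  "sport": $sport,
  "dst": "$dst",
  "dport": $dport,
  "time_generated": "$time_generated",
  "rule": "$rule",
  "serial": "$serial",
  "sender_sw_version": "$sender_sw_version",
  "subtype": "$subtype",
  "threat_name": "$threat_name",
  "severity": "$severity",
  "action": "$action"
}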
The last action to complete this job is to create a new Log Forwarding profile that consumes our recently created HTTP server profile and to use it in all security rules whose alerts we want ingested into XDR. The configuration object is available under Objects > Log Forwarding.
Step 5: Final checks
As your PAN-OS NGFW starts generating alerts, you should see activity in the micro-service’s output log and alerts being ingested into the Cortex XDR instance. The source tag Palo Alto Networks — PAN-OS clearly indicates these alerts were pushed by our deployment.

Production-ready version
There are a couple of additional steps to perform before considering the micro-service production-ready.
- The service should be exposed over a secure channel (TLS). The best option is to leverage the TLS-terminating reverse proxy available in your container environment. Notice the image honours the PORT env variable and should work out of the box almost everywhere
- A restart policy should be in place to restart the container in case of a crash (an example follows this list). It is not a good idea to run more than one instance in a load balancing group because quota enforcement won’t be synchronised between them
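For a single Docker host, something along these lines would cover the restart requirement (an illustrative example; adjust the variables to your environment):
docker run --restart unless-stopped -p 8080:8080 -e PSK=hello -e FQDN=xxx.xdr.us.paloaltonetworks.com -e API_KEY=<my-api-key> -e API_KEY_ID=<my-key-id> ghcr.io/xhoms/xdrgateway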
Summary
Palo Alto Networks products and services are built with automation and customisation in mind. They feature rich APIs that can be used to tailor them to specific customer needs.
In my case these APIs provided the customer with a choice:
- Either wait for the scope of the project (and its budget) to increase in order to accommodate additional products like Cortex Network Traffic Analysis or
- Deploy a compact “middle-man” that would connect the PAN-OS and Cortex XDR APIs
This experience helped me acquire DevOps knowledge (building micro-services) that I’m sure I’ll use on many occasions to come. Special thanks to the Developer Relations team at Palo Alto Networks, who provided documentation and examples and were eager to explore this new use case. I couldn’t have done it without their help and the work they’ve been putting into improving our developer experience.
Ingest PAN-OS Alerts into Cortex XDR Pro Endpoint was originally published in Palo Alto Networks Developers on Medium, where people are continuing the conversation by highlighting and responding to this story.

Enterprise API design practices: Part 4
By: Francesco Vigo

Welcome to the last part of my series on creating valuable, usable and future-proof Enterprise APIs. I initially covered some background context and the importance of design, then I presented security and backend protection and in the third chapter I explored optimizations and scale; this post is about monitoring and Developer Experience (DX).
5. Monitor your APIs
Don’t fly blind: you must make sure that your APIs are properly instrumented so you can monitor what’s going on. This is important for a lot of good reasons, such as:
- Security: is your service under attack or being used maliciously? You should always be able to figure out if there are anomalies and react swiftly to prevent incidents.
- Auditing: depending on the nature of your business, you might be required, for compliance or investigative reasons, to produce an audit trail of user activity in your product, which requires proper instrumentation of your APIs as well as logging events.
- Performance and scale: you should be aware of how your API and backend are performing and if the key metrics are within the acceptable ranges, way before your users start complaining and your business is impacted. It’s always better to optimize before performance becomes a real problem.
- Cost: similarly, with proper instrumentation, you can be aware of the infrastructure costs incurred to serve a certain amount of traffic. That can help you with capacity planning and cost modeling if you’re monetizing your service, and it helps you avoid unexpected bills at the end of the month that might force you to disrupt the service.
- Feedback: with proper telemetry data to look at when you deploy a change, you can understand how it’s performing and whether it was a good idea to implement it. It also allows you to implement prototyping techniques such as A/B testing.
- Learn: analyzing how developers use your API can be a great source of learning. It will give you useful insights on how your service is being consumed and is a valuable source of ideas that you can evaluate for new features (i.e. new use-case driven API endpoints).
Proper instrumentation of services is a vast topic and here I just want to summarize a few items that are usually easy to implement:
- Every API request should have a unique operation identifier that is stored in your logs and can help your ops team figure out what happened. This identifier should also be reported back to clients somewhere (usually in the API response headers), especially, but not exclusively, in case of errors; a middleware sketch follows this list.
- Keep an eye on API requests that fail with server errors (i.e. the HTTP 5xx ones) and, if the number is non-negligible, try to pinpoint the root cause: is it a bug, or some request that is causing timeouts in the backend? Can you fix it or make it faster?
- Keep a reasonable log history to allow tracking errors and auditing user activity for at least several days back in time.
- Create dashboards that help you monitor user activity, API usage, security metrics, etc. And make sure to check them often.
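As a small illustration of the first point, an HTTP middleware can generate the identifier, report it back to the caller and tie it to the server-side log line. The following Go sketch is only an example; the X-Operation-Id header name is illustrative, not a standard.
package main

import (
    "crypto/rand"
    "encoding/hex"
    "log"
    "net/http"
)

// withOperationID tags every request with a unique identifier, reports it back to
// the caller in a response header, and includes it in the server-side log line.
func withOperationID(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        buf := make([]byte, 8)
        if _, err := rand.Read(buf); err != nil {
            http.Error(w, "internal error", http.StatusInternalServerError)
            return
        }
        opID := hex.EncodeToString(buf)
        w.Header().Set("X-Operation-Id", opID)
        log.Printf("op=%s method=%s path=%s", opID, r.Method, r.URL.Path)
        next.ServeHTTP(w, r)
    })
}

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/v1/alerts", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte(`{"alerts":[]}`))
    })
    log.Fatal(http.ListenAndServe(":8080", withOperationID(mux)))
}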
6. Make Developer Experience a priority
Last principle, but definitely not the least important. Even if you are lucky and the developers that use your APIs are required to use them by an external mandate, you shouldn’t make their experience any less positive.
In fact, the Developer Experience of your product should be awesome.
If developers properly understand how to work with your APIs, and enjoy doing so, they will likely run into fewer issues and be more patient when trying to overcome them. They will also provide you with good-quality feedback, and it will reduce the number of requests they generate for your support teams. They will give you insights and use case ideas for building a better product.
And, more importantly, happy developers will ultimately build better products for their own customers which, in turn, will act as a force multiplier for the value and success of your own product. Everybody wins in the API Economy model: customers, business stakeholders, partners, product teams, engineers, support teams.
I’ve witnessed several situations where developers were so exhausted and frustrated with working against a bad API that, as soon as they saw the light at the end of the tunnel, they stopped thinking creatively and just powered their way through to an MVP that was far from viable and valuable. But it checked the box they needed so they were allowed to move on. I consider this a very specific scenario of developer fatigue: let’s call it “API Fatigue”.
Luckily, I’ve also experienced the other way around, where new, unplanned, great features were added because the DX was good and we had fun integrating things together.
There are many resources out there that describe how to make APIs with great developer experience: the most obvious one is to create APIs that are clear and simple to consume. Apply the Principle of least astonishment.
I recommend considering the following when shipping an API:
- Document your API properly: it’s almost certain that the time you spend creating proper documentation at the beginning is saved later on when developers start consuming it. Specification files, such as the OpenAPI Specification (OAS), help tremendously here. Also, if you follow the design-first approach that we discussed in the first post of this series, you’ll probably already have a specification file ready to use.
- Support different learning paths: some developers like to read all the documentation from top to bottom before writing a single line of code, others will start with the code editor right away. Instead of forcing a learning path, try to embrace the different mindsets and provide tools for everyone: specification files, Postman collections, examples in different programming languages, an easy-to-access API sandbox (i.e. something that doesn’t require installing an Enterprise product and waiting 2 weeks to get a license to make your first API call), a developer portal, tutorials and, why not, video walkthroughs. This might sound overwhelming but pays off in the end. With the proper design done first a lot of this content can be autogenerated from the specification files.
- Document the data structure: if you can, don’t just add lists of properties to your specs and docs; strive to provide proper descriptions of the fields that your API uses so that developers who are not fully familiar with your product can understand them. Remove ambiguity from the documentation as much as possible. This can go a long way: developers who properly understand the data can build mental models, which often leads to better use cases than “just pull some data from this API”.
- Use verbs, status codes, and error descriptions properly: leverage the power of the protocol you are using (i.e. HTTP when using REST APIs) to define how to do things and what responses mean. Proper usage of status codes and good error messages will dramatically reduce the number of requests to your support team. Developers are smart and want to solve problems quickly so if you provide them with the right information to do so, they won’t bother you. Also, if you are properly logging and monitoring your API behavior, it will be easier for your support team to troubleshoot if your errors are not all “500 Internal Server Error” without any other detail.
Finally, stay close to the developers: especially if your API is being used by people external to your organization, it’s incredibly important to stay as close to them as you can to support them, learn, and gather feedback on your API and its documentation. Allow everyone that is responsible for designing and engineering your API to be in that feedback loop, so they can share the learnings. Consider creating a community where people can ask questions and can expect quick answers (Slack, Reddit, Stack Overflow, etc.). I’ve made great friendships this way!
A few examples
There are many great APIs out there. Here is a small, incomplete list of products from other companies that, for one reason or another, are strong examples of what I described in this blog series:
- Microsoft Graph
- Google Cloud
- VirusTotal
- GitHub REST and GraphQL APIs
Conclusion
And that’s a wrap, thanks for reading! There are more things that I’ve learned that I might share in future posts, but in my opinion, these are the most relevant ones. I hope you found this series useful: looking forward to hearing your feedback and comments!
Enterprise API design practices: Part 4 was originally published in Palo Alto Networks Developers on Medium, where people are continuing the conversation by highlighting and responding to this story.

Enterprise API design practices: Part 3
By: Francesco Vigo

Welcome to the third part of my series on creating valuable, usable, and future-proof Enterprise APIs. Part 1 covered some background context and the importance of design, while the second post was about security and backend protection; this chapter is on optimization and scale.
3. Optimize interactions
When dealing with scenarios that weren’t anticipated in the initial release of an API (for example when integrating with an external product from a new partner), developers often have to rely on data-driven APIs to extract information from the backend and process it externally. While use-case-driven APIs are generally considered more useful, sometimes there might not be one available that suits the requirements of the novel use case that you must implement.
By considering the following guidelines when building your data-driven APIs, you can make them easier to consume and more efficient for the backend and the network, improving performance and reducing the operational cost (fewer data transfers, faster and cheaper queries on the DBs, etc.).
I’ll use an example with sample data. Consider the following data as a representation of your backend database: an imaginary set of alerts that your Enterprise product detected over time. Instead of just three, imagine the following JSON output with thousands of records:
{
"alerts" : [
{
"name": "Impossible Travel",
"alert_type": "Behavior",
"alert_status": "ACTIVE",
"severity": "Critical",
"created": "2020-09-27T09:27:33Z",
"modified": "2020-09-28T14:34:44Z",
"alert_id": "8493638e-af28-4a83-b1a9-12085fdbf5b3",
"details": "... long blob of data ...",
"evidence": [
"long list of stuff"
]
},
{
"name": "Malware Detected",
"alert_type": "Endpoint",
"alert_status": "ACTIVE",
"severity": "High",
"created": "2020-10-04T11:22:01Z",
"modified": "2020-10-08T08:45:33Z",
"alert_id": "b1018a33-e30f-43b9-9d07-c12d42646bbe",
"details": "... long blob of data ...",
"evidence": [
"long list of stuff"
]
},
{
"name": "Large Upload",
"alert_type": "Network",
"alert_status": "ACTIVE",
"severity": "Low",
"created": "2020-11-01T07:04:42Z",
"modified": "2020-12-01T11:13:24Z",
"alert_id": "c79ed6a8-bba0-4177-a9c5-39e7f95c86f2",
"details": "... long blob of data ...",
"evidence": [
"long list of stuff"
]
}
]
}
Imagine implementing a simple /v1/alerts REST API endpoint to retrieve the data when you can’t anticipate all future needs. I recommend considering the following guidelines:
- Filters: allow your consumers to reduce the result set by offering filtering capabilities in your API on as many fields as possible without stressing the backend too much (if some filters are not indexed it could become expensive, so you must find the right compromise). In the example above, good filters might include: name, severity, alert_type, alert_status, created, and modified. More complicated fields like details and evidence might be too expensive for the backend (as they might require full-text search) and you would probably leave them out unless really required.
- Data formats: be very consistent in how you present and accept data across your API endpoints. This holds true especially for types such as numbers and dates, or complex structures. For example, to represent integers in JSON you can use numbers (i.e. "fieldname": 3) or strings (i.e. "fieldname": "3" ): no matter what you choose, you need to be consistent across all your API endpoints. And you should also use the same format when returning outputs and accepting inputs.
- Dates: dates and times can be represented in many ways: timestamps (in seconds, milliseconds, microseconds), strings (ISO8601 such as in the example above, or custom formats such as 20200101), with or without time zone information. This can easily become a problem for the developers. Again, the key is consistency: try to accept and return only a single date format (i.e. timestamp in milliseconds or ISO8601) and be explicit about whether you consider time zones or not: usually choosing to do everything in UTC is a good idea because it reduces ambiguity. Make sure to document the date formats properly.
- Filter types: depending on the type of field, you should provide appropriate filters, not just equals. A good example is supporting range filters for dates that, in our example above, allow consumers to retrieve only the alerts created or modified in a specific interval. If some fields are enumerators with a limited number of possible values, it might be useful to support a multi-select filter (i.e. IN): in the example above it should be possible to filter by severity values and include only the High and Critical values using a single API call.
- Sorting: is your API consumer interested only in the older alerts or the newest? Supporting sorting in your data-driven API is extremely important. One field to sort by is generally enough, but sometimes (depending on the data) you might need more.
- Result limiting and pagination: you can’t expect all the entries to be returned at once (and your clients might not be interested or ready to ingest all of them anyway), so you should implement some logic where clients retrieve a limited number of results and can fetch more when they need them. If you are using pagination, clients should be able to specify the page size within a maximum allowed value. Defaults and maximums should be reasonable and properly documented.
- Field limiting: consider whether you really need to return all the fields of your results all the time, or if your clients usually just need a few. By letting the client decide what fields (or groups of fields) your API should return, you can reduce network throughput and backend cost and improve performance. You should provide and document some sane default. In the example above, you could decide to return by default all the fields except details and evidence, which can be requested by the client only when explicitly asked for, using an include parameter.
Let’s put it all together. In the above example, you should be able to retrieve, using a single API call, something like this:
Up to 100 alerts that were created between 2020-04-01T00:00:00Z (April 1st 2020) and 2020-10-01T00:00:00Z (October 1st 2020) with severity “medium” or “high”, sorted by “modified” date, newest first, including all the fields but “evidence”.
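With a REST flavor, such a request could look roughly like this (all parameter names are illustrative, not a prescribed convention):
GET /v1/alerts
    ?created_after=2020-04-01T00:00:00Z
    &created_before=2020-10-01T00:00:00Z
    &severity=Medium,High
    &sort=-modified
    &limit=100
    &exclude=evidence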
There are multiple ways you can implement this: through REST, GraphQL, or custom query languages. In many cases, you don’t need something too complex, as data sets are often fairly simple. The proper way depends on many design considerations that are outside the scope of this post. But having some, or most, of these capabilities in your API will make it better and more future-proof. By the way, if you’re into GraphQL, I recommend reading this post.
4. Plan for scale
A good API shouldn’t be presumptuous: it shouldn’t assume that clients have nothing better to do than wait for its response, especially at scale, where performance is key.
If your API requires more than a few milliseconds to produce a response, I recommend considering supporting jobs instead. The logic can be as follows (an illustrative request flow is sketched after the list):
- Implement an API endpoint to start an operation that is supposed to take some time. If accepted, it would return immediately with a jobId.
- The client stores the jobId and periodically reaches out to a second endpoint that, when provided with the jobId, returns the completion status of the job (i.e. running, completed, failed).
- Once results are available (or some are), the client can invoke a third endpoint to fetch the results.
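The resulting interaction could look something like this (endpoint names, identifiers and payloads are purely illustrative):
POST /v1/alerts/export     -> 202 Accepted  {"jobId": "42"}
GET  /v1/jobs/42           -> 200 OK        {"status": "running"}
GET  /v1/jobs/42           -> 200 OK        {"status": "completed"}
GET  /v1/jobs/42/results   -> 200 OK        {"alerts": [...]}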
Other possible solutions include publisher/subscriber approaches or pushing data with webhooks, also depending on the size of the result set and the speed requirements. There isn’t a one-size-fits-all solution, but I strongly recommend avoiding a blocking pattern where API clients are kept waiting for the server to reply while it runs long jobs in the backend.
If you need high performance and throughput in your APIs, consider gRPC, as its binary representation of data using protocol buffers has significant speed advantages over REST.
Side note: if you want to learn more about REST, GraphQL, webhooks, or gRPC use cases, I recommend starting from this post.
Finally, other considerations for scale include supporting batch operations on multiple entries at the same time (for example mass updates), but I recommend considering them only when you have a real use case in mind.
What’s next
In the next and last chapter, I’ll share some more suggestions about monitoring and supporting your API and developer ecosystem.
Enterprise API design practices: Part 3 was originally published in Palo Alto Networks Developers on Medium, where people are continuing the conversation by highlighting and responding to this story.