Corvil recently announced its move to nanosecond-granularity latency measurements in a press release issued jointly with Nomura. This move is not just a technological advance for the sake of the technology; the CorvilNet product has always used hardware-generated nanosecond-timescale timestamps as its foundation.
Nor is this move to nanosecond latency measurements just a headline-grabbing ploy — it has been driven by the concrete needs of our customers. As high performance trading drives the infrastructure to deliver ever-lower latencies, there is a resulting need for visibility into that infrastructure at correspondingly shorter timescales. This visibility must be based on timestamps with a precision that is at least an order of magnitude or two finer than the latencies being measured.
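The effect of timestamp precision on a latency measurement can be illustrated with a little arithmetic. The following is a minimal sketch, not a description of how CorvilNet works: it assumes a clock that simply truncates to its resolution, and the event times and the helper names `quantize` and `measured_latency_ns` are hypothetical.

```python
def quantize(t_ns: int, resolution_ns: int) -> int:
    """Round a timestamp down to the clock resolution (assumed truncating clock)."""
    return (t_ns // resolution_ns) * resolution_ns


def measured_latency_ns(t_in_ns: int, t_out_ns: int, resolution_ns: int) -> int:
    """Latency as seen through timestamps taken at the given resolution."""
    return quantize(t_out_ns, resolution_ns) - quantize(t_in_ns, resolution_ns)


# Hypothetical event: enters at 1,000,937 ns and leaves at 1,003,152 ns,
# so the true latency is 2,215 ns (a little over 2 microseconds).
T_IN_NS, T_OUT_NS = 1_000_937, 1_003_152
TRUE_LATENCY_NS = T_OUT_NS - T_IN_NS

for resolution_ns in (1_000, 100, 10, 1):
    measured = measured_latency_ns(T_IN_NS, T_OUT_NS, resolution_ns)
    error_pct = 100 * abs(measured - TRUE_LATENCY_NS) / TRUE_LATENCY_NS
    print(f"resolution {resolution_ns:>5} ns: measured {measured} ns, error {error_pct:.1f}%")
```

With microsecond timestamps, this 2,215 ns latency is reported as 3,000 ns; at 10 ns resolution it is reported as 2,220 ns. This is why the timestamps need to be considerably finer than the latencies being measured.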
In order to characterize the performance of the system fully, Nomura needed a solution to deliver sub-microsecond precision timestamping and latency measurements.
High Performance Trading Systems
The reason for Corvil's move to nanosecond-granularity latency measurements is quite simple: It is interesting, however, to dig beyond this simple answer and explore some of the techniques and architectures that have allowed such systems to achieve the performance levels that they do.
Our interest lies in the trading-system components that directly process and transport the trade instructions and the market data along the trading loop. All trading systems differ in their details, but we can make some generalizations. On the exchange side, the relevant components are matching engines, order gateways and market-data publishers; on the market participant side, we have trading strategies, EMSs and OMSs, SORs and feed handlers.
To support DMA, brokers may need to deploy additional components such as FIX engines; in proprietary-trading environments, many of these components may be hosted on a single multi-core server, or the functionality even collapsed into a single software application. There are also the networks that connect all these sub-systems together to create the full end-to-end trading loop. This crude summary necessarily ignores much of the supporting infrastructure such as power, cooling and management. It noticeably neglects even storage and databases: Having identified the major components of trading systems, let us now turn to the technologies they use.
By and large, much of the latency budget is spent on this software because this is where the complexity lies. Software latency is lower than it has ever been, not only because of hardware improvements but also because of improvements in how software uses the hardware.
Kernel- and stack-bypass
Any deployed application runs on top of an operating system and cohabits with any number of other applications and processes.
The most effective way of ensuring the application performs well is to keep it as isolated as possible from the OS and other processes, just as we want to keep its own internal threads as independent as possible by avoiding locking.
The general-purpose processors that serve as the CPUs of standard servers and PCs read a stream of instructions from memory, whereas application-specific integrated circuits (ASICs) have their instruction set hardwired into them when they are manufactured.
FPGAs are closer to ASICs in that their instruction sets are implemented in silicon rather than being read on the fly, but they are programmable in that they can be rewired on demand.
They are not as fast or efficient as ASICs, but their programmability is a huge advantage: A key characteristic of FPGAs is that they typically have a very deterministic latency profile:
Low-latency networking
The techniques used to reduce network latencies are somewhat different, as the primary causes of network latency are quite different to those of software latency.
Those primary causes are: The simplest and most direct way of tackling latency in networks has always been to increase bandwidths: Interestingly, it is the simplest component of latency that has received the most attention in recent years, namely propagation delay.
By taking space in an exchange colo or proximity hosting site, a trading firm can eliminate nearly all of the propagation delay between their algorithms and the matching engines. This provides a simple, clear, and quantifiable improvement in their latency profile; one that many firms clearly feel justifies the high premium exchanges and hosting sites charge for such proximity.
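The bandwidth and propagation components of network latency can both be estimated from first principles. The sketch below is illustrative only: the fiber refractive index of 1.47 is an assumed typical value, and the distances, frame size and link speeds are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # in vacuum, exact by SI definition
FIBER_REFRACTIVE_INDEX = 1.47         # assumed typical value for optical fiber


def propagation_delay_ns(distance_m: float,
                         refractive_index: float = FIBER_REFRACTIVE_INDEX) -> float:
    """One-way time for a signal to travel the given distance through the medium."""
    return distance_m * refractive_index / SPEED_OF_LIGHT_M_PER_S * 1e9


def serialization_delay_ns(frame_bytes: int, link_bps: float) -> float:
    """Time needed to clock every bit of a frame onto a link of the given speed."""
    return frame_bytes * 8 / link_bps * 1e9


# Hypothetical comparison: a remote site 50 km away versus a 100 m colo cross-connect.
for label, distance_m in (("50 km metro link", 50_000), ("100 m colo run", 100)):
    print(f"{label}: {propagation_delay_ns(distance_m):,.0f} ns one way")

# Hypothetical comparison: the same 1500-byte frame on 1 Gb/s and 10 Gb/s links.
for link_bps in (1e9, 10e9):
    print(f"{link_bps / 1e9:.0f} Gb/s: {serialization_delay_ns(1500, link_bps):,.0f} ns")
```

Under these assumptions the 50 km link costs roughly a quarter of a millisecond in each direction, while the in-building run costs about half a microsecond, which gives a sense of why proximity is treated as a quantifiable improvement.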
InfiniBand is a high-bandwidth, low-latency host-interconnect technology that is most often used in high-performance computing clusters. It is effectively a LAN technology used as an alternative to Ethernet, and many of its advanced features have started to be adopted by modern Ethernet variants. However, it also pushes extra complexity into the end systems, resulting in a network interface that does not map cleanly to the standard socket model.
Latency Distribution
The low levels of latency that can be achieved by the technologies we have discussed are impressive, but there are a number of very important considerations that must be taken into account.
The most fundamental is that any comparison of latencies must define very clearly exactly what latencies are being compared and how they are measured. Another important consideration is that these latencies are headline numbers: The lowest latencies are achieved under ideal laboratory conditions; production systems may achieve close to these latencies, but that is certainly not a given.
There are sometimes trade-offs to be made between latency and throughput: Another aspect to consider is that most attention gets paid to minimum or average latencies, whereas it is usually the maximum latency or the high percentiles of the latency distribution that are most important.
For example, trading networks are usually operated at low load, but microbursts can drive significant queuing in the buffers that protect the aggregation links against packet loss. As we have seen, one of the attractive features of FPGAs in trading systems is the fact that they typically have a deterministic, or close to deterministic, latency profile. InfiniBand fabrics also provide deterministic latency guarantees; however, it is worth pointing out that they do so by providing bandwidth reservations from host to host across the fabric and pushing all the queuing back into the senders.
It is not that there are no latency spikes from application to application across IB; they just all happen inside the hosts rather than on the IB network.
In the high performance trading world, reliability and predictability of execution are often more important determiners of profitability than average latency. Consider, for example, the question of market data latency: high market activity, such as that driven by the announcement of economic indicators, will result in microbursts of market data; these may then drive spikes in the feed-handler latency precisely at the time that it is important to get timely updates of market activity.
Knowing the average latency across the feed-handler does not help in engineering the system for reliably low latency. What is required is the ability to timestamp all the events at the appropriate resolution, measure the latency of every hop, and correlate the latency across the full technology stack with the microbursts that drive the latency.
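To make the point about averages concrete, the sketch below computes per-hop latency from per-message timestamps and compares the mean with the 99th percentile and the maximum. It is a simplified illustration on synthetic data, not a description of any product: the tap-point names, the latency values and the nearest-rank percentile helper are all assumptions made for the example.

```python
import math
from statistics import mean

# Hypothetical tap points, in the order a market-data message crosses them.
TAP_POINTS = ["wire_in", "feed_handler_out", "strategy_out"]


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def per_hop_summary(messages):
    """Summarize the latency (in ns) of every hop between consecutive tap points."""
    summary = {}
    for src, dst in zip(TAP_POINTS, TAP_POINTS[1:]):
        latencies = [m[dst] - m[src] for m in messages]
        summary[f"{src}->{dst}"] = {
            "mean": mean(latencies),
            "p99": percentile(latencies, 99),
            "max": max(latencies),
        }
    return summary


def synthetic_messages():
    """100 synthetic messages; two of them hit a 150 us feed-handler spike."""
    messages = []
    for i in range(100):
        wire_in = i * 1_000_000
        handler_ns = 150_000 if i in (40, 41) else 2_000
        messages.append({
            "wire_in": wire_in,
            "feed_handler_out": wire_in + handler_ns,
            "strategy_out": wire_in + handler_ns + 5_000,
        })
    return messages


for hop, stats in per_hop_summary(synthetic_messages()).items():
    print(hop, stats)
```

In this synthetic data the feed-handler hop has a mean of under 5 microseconds even though two messages took 150 microseconds, so only the high percentiles and the maximum reveal the spikes. Correlating those spikes with the traffic that caused them would be a further step that this sketch does not attempt.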
Low latency trading has driven the adoption and development of a set of technologies that enable trade-execution and market data delivery and handling at the timescale of microseconds. Many of the key processes can be implemented within a handful of microseconds, which means that precision latency management must be capable of delivering an accuracy of hundreds or tens of nanoseconds. At the same time, total system latencies can vary by orders of magnitude because of the effects of dynamic congestion.
Effective latency management requires not just a capture of the complete distribution of latencies, but also an analysis of the causes of latency. That is, it is not sufficient just to measure the spikes in trading-system latency; we must also capture the microbursts and other infrastructure behavior that drive the latency spikes.
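One simple way to see a microburst is to measure traffic rate over very short windows rather than over a second. The following is a minimal sketch under stated assumptions: the packet trace is synthetic, and the 100-microsecond window and 1 Gb/s threshold are hypothetical choices, not recommended settings.

```python
from collections import defaultdict


def window_rates_bps(packets, window_ns):
    """Bit rate in each fixed-size window, keyed by the window start time in ns."""
    bits = defaultdict(int)
    for ts_ns, size_bytes in packets:
        bits[ts_ns // window_ns * window_ns] += size_bytes * 8
    return {start: b * 1_000_000_000 / window_ns for start, b in bits.items()}


def find_microbursts(packets, window_ns, threshold_bps):
    """Start times of the windows whose bit rate exceeds the threshold."""
    rates = window_rates_bps(packets, window_ns)
    return sorted(start for start, rate in rates.items() if rate > threshold_bps)


def average_rate_bps(packets, duration_ns):
    """Average bit rate over the whole trace."""
    return sum(size * 8 for _, size in packets) * 1_000_000_000 / duration_ns


def synthetic_trace():
    """One second of (timestamp_ns, size_bytes): steady traffic plus one short burst."""
    steady = [(i * 1_000_000, 1250) for i in range(1000)]        # one packet per ms
    burst = [(500_000_000 + i * 1_000, 1250) for i in range(100)]  # 100 packets in 100 us
    return sorted(steady + burst)


trace = synthetic_trace()
print("average rate (b/s):", average_rate_bps(trace, 1_000_000_000))
print("burst windows (ns):", find_microbursts(trace, 100_000, 1e9))
```

Averaged over the full second, this synthetic link carries only 11 Mb/s, yet one 100-microsecond window runs at about 10 Gb/s. A per-second view would show a nearly idle link, while the short-window view identifies the moment at which queuing, and hence a latency spike, would be expected.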
CorvilNet is the only latency management solution to provide nanosecond latency measurements today, and the only solution that delivers unified latency management across the whole application and network infrastructure.
This article is an abridged version of the Corvil white paper, Nanosecond Latency Management. TabbFORUM is an open community that provides a platform for capital markets professionals to share their ideas and thought leadership with their peers. The views and opinions expressed are solely those of the author(s).