FinTech Journal Authors: Yeshim Deniz, Pat Romanski, Liz McMillan, Zakia Bouachraoui, Elizabeth White

Related Topics: SDN Journal, Java IoT, Containers Expo Blog, @CloudExpo, Cloud Security, @DXWorldExpo

SDN Journal: Blog Post

Stateless Transport Tunneling (STT) Meets the Network

At a high level the concepts of larger packets, hardware offload, reduced CPU load and interrupts all make sense

Last week I walked through the packet formats for VXLAN and NVGRE specifically focused on ways by which the overlay packets provide information to the physical network that help the physical network. Some of the initial extreme thoughts that the overlay and physical network can and should be completely ignorant of each other have softened more recently and more pragmatic thoughts of collaborating layers are being articulated. At Plexxi we have often mentioned that we believe the physical network and the overlay need to be closely orchestrated to get the most benefit out of the total network solution. And orchestration != ECMP.

In addition to VXLAN and NVGRE, Stateless Transport Tunneling (STT) is an encapsulation mechanism used by VMware, mostly for communication between server based vSwitches. It is a bit more involved and complicated than VXLAN and NVGRE, mostly because it was designed to carry large data packets, up to 64 Kbytes. Physical networks have limitations on the size of a packet that can be transferred. Ethernet standard maximum transmission unit (MTU) used to be 1500 bytes, but most ethernet devices these days can support jumbo packets allowing packets of 4, 9 or even 16 Kbytes in size. Even at those sizes, large data transfers are somewhat hampered by the work involved in taking a large chunk of data and then chopping them up into smaller portions to be transmitted. In a response to this, hardware vendors have taken some of this functionality and added it to the Network Interface Cards (NICs) on servers and have them do most of this segmentation and re-assembly work based on how TCP takes large portions of data and chops them into smaller segments. Doing his in hardware means it can be done faster, but more importantly, it removes this burden from the server CPUs, allowing them to do other (more useful) work.

STT was designed to make use of these TCP capabilities in NICs. STT can take ethernet packets up to 64 Kbytes from a VM on a server, and tunnel it to its destination as a 64 Kbyte entity. This STT frame has to be chopped into smaller pieces to match the MTU of the physical network, but an STT packet looks just like a TCP segment to the receiving NICs, allowing them to reconstruct the original 64 Kbyte packet without needing the CPU.

When the sending tunnel endpoint receives a large chunk of data to be transmitted at another VM at the other side of a tunnel, the vSwitch takes several steps to encapsulate this packet. First, it adds an STT Frame Header to the packet.

STT Frame Format 1

The STT Header is 18 bytes in length and has a variety of administrative fields, but the key field is the Context ID. This is a 64 bit field and its intended use is similar to the VXLAN Network Identifier (VNI) or the NVGRE Virtual Subnet ID (VSID). While the semantics of this field are somewhat defined, its value and how to use it is left open in the latest specifications. Its main purpose is to provide the receiving tunnel endpoint the information it needs to determine where this packet needs to be sent after decapsulation.

After the STT Frame Header has been added, this new packet (original packet  + new STT header) is chopped into smaller pieces so that each piece is at least 62 bytes smaller than the MTU of the physical network. Each of these new segments receives 24 byte TCP like header, a normal 20 byte IP header, and of course the final 18 byte Ethernet header before transmission. The magic (or ugliness for those less enamored by STT) is in the TCP like header. These 24 bytes are formatted just like a normal TCP header to ensure the hardware in the NICs can re-assemble segments that belong together. The traditional Acknowledgement field in TCP is used as a fragment ID, essentially telling the NIC that all packets/segments that come in with the same fragment ID belong together and should be reassembled into the larger original ethernet frame. The traditional Sequence number is used as an offset indicator, to tell the NIC in what order the fragments need to be put together.

STT Frame Format 2

Similar to VXLAN and NVGRE described last week, STT has a mechanism to create entropy for the physical network to distinguish flows from each other and allow them to be balanced using ECMP (or link aggregation – LAG) based deployments. In STT, the TCP source port is used to create entropy. The originating tunnel end point will use some hash calculation on the original packets header information and use the result to populate the TCP source port. Switches in the physical network can now use the TCP port information from the tunneled packet in their hash calculation for ECMP or LAG packet distribution.

While STT is likely to be more efficient than either VXLAN or NVGRE for the transfer of large amount of information because it offloads the segmentation and re-assembly, it carries significantly more overhead than either VXLAN or NVGRE in additional header information for smaller packets. STT adds 80 bytes of new header to a VM originated ethernet packet for the first segment of this packet, 62 for each following segment. Compare that to a consistent 46 bytes for each NVGRE encapsulated packet, and 54 bytes for VXLAN. For traffic between VMs on the same server this may not matter, but it certainly does for traffic carried across the physical network. For the plentiful mice flows, we have likely doubled the size and bandwidth required for each.

A probably more significant drawback of STT comes from its strength. Designed for large packet transfers, once an original packet is encapsulated with STT header, chopped into parts, then encapsulated into individual ethernet, IP and TCP (like) headers, only the first packet provides any clue or context of the original source, destination, protocol, application and other content. The relevant pieces of that will only be found in the first segment, any follow up segments only provide enough information about the tunnel endpoints and no other original context without the first segment. And that makes debugging really hard. It also makes it hard to differentiate traffic on the physical network, even at a very high level Virtual Network identifier. And every existing network based service (realizing that one of the goals of overlay networks is to push this to the vSwitches themselves) will also have a hard time deciding what to do with these packets.

At a high level the concepts of larger packets, hardware offload, reduced CPU load and interrupts all make sense. But most data center ethernet networks can easily support 9k or even 16k packets, so perhaps the gap between 16k packet based transfer and 64k semi-stream based communication is really not that much considering that the bulk of packets are small to begin with (remember those mice and elephants?). Perhaps aligning the MTU of the virtual port with that of the network may be worthwhile to have the STT and original header in each and every packet on the wire. Regardless of whether that is a real wire, or a virtual one.

[Today's fun fact: One of the primary reasons the Mayflower pilgrims ended their voyage at Plymouth Rock was pretty much the same reason people today suspend their journeys: they ran out of beer. No need for a funny punch line on that one]

The post Stateless Transport Tunneling (STT) meets the Network appeared first on Plexxi.

Read the original blog entry...

More Stories By Marten Terpstra

Marten Terpstra is a Product Management Director at Plexxi Inc. Marten has extensive knowledge of the architecture, design, deployment and management of enterprise and carrier networks.

IoT & Smart Cities Stories
The best way to leverage your Cloud Expo presence as a sponsor and exhibitor is to plan your news announcements around our events. The press covering Cloud Expo and @ThingsExpo will have access to these releases and will amplify your news announcements. More than two dozen Cloud companies either set deals at our shows or have announced their mergers and acquisitions at Cloud Expo. Product announcements during our show provide your company with the most reach through our targeted audiences.
Machine learning has taken residence at our cities' cores and now we can finally have "smart cities." Cities are a collection of buildings made to provide the structure and safety necessary for people to function, create and survive. Buildings are a pool of ever-changing performance data from large automated systems such as heating and cooling to the people that live and work within them. Through machine learning, buildings can optimize performance, reduce costs, and improve occupant comfort by ...
@DevOpsSummit at Cloud Expo, taking place November 12-13 in New York City, NY, is co-located with 22nd international CloudEXPO | first international DXWorldEXPO and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time t...
CloudEXPO New York 2018, colocated with DXWorldEXPO New York 2018 will be held November 11-13, 2018, in New York City and will bring together Cloud Computing, FinTech and Blockchain, Digital Transformation, Big Data, Internet of Things, DevOps, AI, Machine Learning and WebRTC to one location.
DXWorldEXPO | CloudEXPO are the world's most influential, independent events where Cloud Computing was coined and where technology buyers and vendors meet to experience and discuss the big picture of Digital Transformation and all of the strategies, tactics, and tools they need to realize their goals. Sponsors of DXWorldEXPO | CloudEXPO benefit from unmatched branding, profile building and lead generation opportunities.
In this Women in Technology Power Panel at 15th Cloud Expo, moderated by Anne Plese, Senior Consultant, Cloud Product Marketing at Verizon Enterprise, Esmeralda Swartz, CMO at MetraTech; Evelyn de Souza, Data Privacy and Compliance Strategy Leader at Cisco Systems; Seema Jethani, Director of Product Management at Basho Technologies; Victoria Livschitz, CEO of Qubell Inc.; Anne Hungate, Senior Director of Software Quality at DIRECTV, discussed what path they took to find their spot within the tec...
The deluge of IoT sensor data collected from connected devices and the powerful AI required to make that data actionable are giving rise to a hybrid ecosystem in which cloud, on-prem and edge processes become interweaved. Attendees will learn how emerging composable infrastructure solutions deliver the adaptive architecture needed to manage this new data reality. Machine learning algorithms can better anticipate data storms and automate resources to support surges, including fully scalable GPU-c...
Disruption, Innovation, Artificial Intelligence and Machine Learning, Leadership and Management hear these words all day every day... lofty goals but how do we make it real? Add to that, that simply put, people don't like change. But what if we could implement and utilize these enterprise tools in a fast and "Non-Disruptive" way, enabling us to glean insights about our business, identify and reduce exposure, risk and liability, and secure business continuity?
Nicolas Fierro is CEO of MIMIR Blockchain Solutions. He is a programmer, technologist, and operations dev who has worked with Ethereum and blockchain since 2014. His knowledge in blockchain dates to when he performed dev ops services to the Ethereum Foundation as one the privileged few developers to work with the original core team in Switzerland.
DXWorldEXPO LLC announced today that Telecom Reseller has been named "Media Sponsor" of CloudEXPO | DXWorldEXPO 2018 New York, which will take place on November 11-13, 2018 in New York City, NY. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.