[TPDS] PostMan: Rapidly Mitigating Bursty Traffic via On-Demand Offloading of Packet Processing.
Y. Niu, P. Jin, J. Guo, Y. Xiao, R. Shi, F. Liu, C. Qian, Y. Wang
In IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 2, 2022
@ARTICLE{rong-tpds2021,
author={Niu, Yipei and Jin, Panpan and Guo, Jian and Xiao, Yikai and Shi, Rong and Liu, Fangming and Qian, Chen and Wang, Yang},
journal={IEEE Transactions on Parallel and Distributed Systems},
title={PostMan: Rapidly Mitigating Bursty Traffic via On-Demand Offloading of Packet Processing},
year={2022}, volume={33}, number={2}, pages={374-387},
doi={10.1109/TPDS.2021.3092266}
}
Unexpected bursty traffic brought by certain sudden events, such as news in the spotlight on a
social network or discounted items on sale, can cause severe load imbalance in backend services.
Migrating hot data - the standard approach to achieving load balance - faces a challenge when handling
such unexpected load imbalance, because migration further slows down a server that is already under heavy pressure.
This article proposes PostMan, an alternative approach to rapidly mitigate load imbalance for services processing small requests.
Motivated by the observation that processing large packets incurs far less CPU overhead than processing small ones,
PostMan deploys a number of middleboxes called helpers to assemble small packets into large ones for the heavily-loaded server.
This approach essentially offloads the overhead of packet processing from the heavily-loaded server to helpers.
To minimize the overhead, PostMan activates helpers on demand, only when bursty traffic is detected.
The heavily-loaded server determines when clients connect to or disconnect from helpers based on real-time load statistics.
To tolerate helper failures, PostMan can migrate connections across helpers and can ensure packet ordering despite such migration.
Driven by real-world workloads, our evaluation shows that, with the help of PostMan, a Memcached server can mitigate bursty
traffic within hundreds of milliseconds, while migrating data takes tens of seconds and increases the latency during migration.
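The core idea above - helpers assembling many small packets into a few large ones so the loaded server pays per-packet processing cost far less often - can be illustrated with a minimal framing sketch. This is not PostMan's actual wire format (the paper's protocol is not reproduced here); the `assemble`/`disassemble` names and the 4-byte length prefix are illustrative assumptions.

```python
import struct

def assemble(packets):
    """Batch small packets into one large buffer (hypothetical helper-side step).

    Each packet is framed as a 4-byte big-endian length followed by its
    payload, so the receiver can recover message boundaries from the batch.
    """
    return b"".join(struct.pack(">I", len(p)) + p for p in packets)

def disassemble(buf):
    """Split a batched buffer back into the original packets, preserving order."""
    packets, offset = [], 0
    while offset < len(buf):
        (length,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        packets.append(bytes(buf[offset:offset + length]))
        offset += length
    return packets
```

With framing like this, the server issues one receive and one parse loop per batch instead of one system call per small request, which is where the offloaded CPU savings would come from.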