
Accepted Papers

Regular Papers:

Characterizing the Impact of End-System Affinities on the End-to-End Performance of High-Speed Flows, Nathan Hanford*, Vishal Ahuja, Mehmet Balman, Matthew Farrens, Dipak Ghosal, Eric Pouyoul and Brian Tierney. (University of California Davis, LBNL)

Abstract: Multi-core end-systems use Receive Side Scaling (RSS) to parallelize protocol processing. RSS uses a hash function on the standard flow descriptors and an indirection table to assign incoming packets to receive queues that are pinned to specific cores. This ensures flow affinity: the interrupt processing of all packets belonging to a specific flow is handled by the same core. A key limitation of standard RSS is that it does not consider the application process that consumes the incoming data when determining flow affinity. In this paper, we carry out a detailed experimental analysis of the performance impact of application affinity in a 40 Gbps testbed network with a dual hexa-core end-system. We show, contrary to conventional wisdom, that when the application process and the flow are affinitized to the same core, the performance (measured in terms of end-to-end TCP throughput) is significantly lower than the line rate. Near line-rate performance is observed when the flow and the application process are affinitized to different cores belonging to the same socket. Furthermore, affinitizing the application and the flow to cores on different sockets results in throughput significantly below the line rate. These results arise from a memory bottleneck, which we demonstrate using preliminary correlational data on the cache hit rate of the core that services the application process.
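
The RSS dispatch mechanism described in the abstract is straightforward to sketch. The following Python fragment is illustrative only: real NICs implement this in hardware with a Toeplitz hash, and the table size and queue count here are made-up parameters.

```python
# Illustrative sketch of RSS-style packet dispatch (not the hardware Toeplitz
# hash a real NIC uses). Table size and queue count are made-up parameters.
import hashlib

NUM_QUEUES = 12    # e.g., one receive queue per core on a dual hexa-core host
TABLE_SIZE = 128   # indirection-table entries (hypothetical)

# Indirection table: each entry names a receive queue (and thus a pinned core).
indirection_table = [i % NUM_QUEUES for i in range(TABLE_SIZE)]

def rss_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Map a flow's 4-tuple to a receive queue via hash + indirection table."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return indirection_table[h % TABLE_SIZE]

# Every packet of a given flow hashes identically, so the whole flow lands on
# one queue/core -- flow affinity. Note that the application consuming the
# data appears nowhere in this computation, which is the limitation the
# paper examines.
print(rss_queue("10.0.0.1", "10.0.0.2", 49152, 5001))
```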

Evaluating I/O Aware Network Management for Scientific Workflows on Networked Clouds, Anirban Mandal, Paul Ruth, Ilya Baldin, Yufeng Xin, Claris Castillo, Mats Rynge and Ewa Deelman. (RENCI - UNC Chapel Hill, ISI - University of Southern California)

Abstract: This paper presents a performance evaluation of scientific workflows on networked cloud systems, with particular emphasis on the effect of provisioned network bandwidth on application I/O performance. The experiments were run on ExoGENI, a widely distributed networked infrastructure-as-a-service (NIaaS) testbed. ExoGENI orchestrates a federation of independent cloud sites located around the world along with backbone circuit providers. The evaluation used a representative data-intensive scientific workflow application called Montage. The application was deployed on a virtualized HTCondor environment provisioned dynamically from the ExoGENI networked cloud testbed, and managed by the Pegasus workflow manager. The results of our experiments show the effect of varying provisioned network bandwidth on disk I/O throughput and workflow execution time. The marginal benefit perceived by the workflow diminishes as the network bandwidth allocation increases, up to the point where disk I/O saturates; there is little or no benefit from increasing network bandwidth beyond this inflection point. The results also underline the importance of network and I/O performance isolation for predictable application performance, and are applicable to general data-intensive workloads. Insights from this work will also be useful for real-time monitoring, application steering, and infrastructure planning for data-intensive workloads on networked cloud platforms.
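
The inflection-point behavior the abstract describes can be captured with a toy bottleneck model: effective throughput is roughly the minimum of the provisioned network bandwidth and the disk I/O ceiling, so raising the network allocation past the disk limit buys nothing. A minimal sketch with made-up numbers, not measurements from the paper:

```python
# Toy bottleneck model: effective transfer rate is limited by the slower of
# the provisioned network bandwidth and the disk I/O ceiling. Numbers are
# invented for illustration.
DISK_IO_CEILING_MBPS = 400.0   # hypothetical saturation point of the disks

def effective_rate(network_bw_mbps: float) -> float:
    return min(network_bw_mbps, DISK_IO_CEILING_MBPS)

for bw in (100, 200, 400, 800, 1600):
    print(f"provisioned {bw:5d} Mbps -> effective {effective_rate(bw):6.1f} Mbps")
# Beyond 400 Mbps the effective rate is flat: the inflection point past which
# additional network bandwidth yields little or no workflow speedup.
```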

Efficient Wide Area Data Transfer Protocols for 100 Gbps Networks and Beyond, Ezra Kissel, Martin Swany, Brian Tierney and Eric Pouyoul. (Indiana University, LBNL)

Abstract: Due to a number of recent technology developments, now is the right time to re-examine the use of TCP for very large data transfers. These developments include the deployment of 100 Gigabit per second (Gbps) network backbones, hosts that can easily sustain data transfers at 40 Gbps and higher, the Science DMZ model, the availability of virtual circuit technology, and wide-area Remote Direct Memory Access (RDMA) protocols. In this paper we show that RDMA works well over wide-area virtual circuits and uses much less CPU than TCP or UDP. We also characterize the limitations of RDMA in the presence of other traffic, including competing RDMA flows. We conclude that RDMA for Science DMZ to Science DMZ transfers of massive data is a viable and desirable option for high-performance data transfer.
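
One way to make the paper's CPU-efficiency comparison concrete is to normalize host CPU time by bytes moved, e.g. CPU-seconds per gigabyte transferred. The harness below, using the psutil library, is our illustration of that metric, not the authors' instrumentation:

```python
# Rough sketch: normalize host CPU time by bytes moved to compare the CPU
# cost of transfer methods (e.g., RDMA vs. TCP). This measurement harness is
# our illustration, not the paper's instrumentation.
import time
import psutil

def cpu_seconds_per_gb(transfer_fn) -> float:
    """Run transfer_fn() -> bytes_moved; report system-wide CPU-s per GB."""
    t0 = psutil.cpu_times()
    bytes_moved = transfer_fn()
    t1 = psutil.cpu_times()
    cpu_s = (t1.user - t0.user) + (t1.system - t0.system)
    return cpu_s / (bytes_moved / 1e9)

if __name__ == "__main__":
    def dummy_transfer():
        time.sleep(1.0)          # stand-in for a real RDMA or TCP transfer
        return 10 * 1024**3      # pretend 10 GiB were moved
    print(f"{cpu_seconds_per_gb(dummy_transfer):.3f} CPU-s/GB")
```

System-wide (rather than per-process) CPU times are the relevant measure here, since RDMA's savings come from work that never reaches the host CPU at all.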

Network-Aware Data Caching and Prefetching for Cloud-hosted Metadata Retrieval, Bing Zhang, Brandon Ross, Sanatkumar Tripathi, Sonali Batra, and Tevfik Kosar. (University at Buffalo - SUNY)

Abstract: With the overwhelming emergence of data-intensive applications in the Cloud, the wide-area transfer of metadata and other descriptive information about remote data is critically important for searching, indexing, and enumerating remote file system hierarchies, as well as for data transfer estimation and reservation. In this paper, we present a highly efficient network-aware caching and prefetching mechanism tailored to reduce metadata access latency and improve responsiveness in wide-area data transfers. To improve the maximum requests per second (RPS) handled by the system, we designed and implemented a network-aware prefetching service using dynamically provisioned parallel TCP streams. To improve the performance of accessing local metadata, we designed and implemented a non-blocking concurrent in-memory cache to handle unexpected bursts of requests. We have implemented the proposed mechanisms in the Directory Listing Service (DLS), a Cloud-hosted metadata retrieval, caching, and prefetching system, and have evaluated its performance on Amazon EC2 and XSEDE.
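
A minimal sketch of the two ingredients the abstract names: an in-memory cache in front of a prefetcher that warms likely-next directory listings in the background. All names here are hypothetical (DLS's real interfaces are not shown in the abstract), and the lock-based cache is a simple stand-in for the paper's non-blocking design:

```python
# Minimal sketch of a concurrent in-memory cache plus background prefetching.
# Names are hypothetical; the lock-based cache is a simple stand-in for the
# paper's non-blocking design.
import threading
from concurrent.futures import ThreadPoolExecutor

class ListingCache:
    def __init__(self, fetch_fn, prefetch_workers=4):
        self._fetch = fetch_fn                 # e.g., remote metadata lookup
        self._cache = {}
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=prefetch_workers)

    def get(self, path: str):
        with self._lock:
            if path in self._cache:
                return self._cache[path]       # served from memory
        listing = self._fetch(path)            # miss: go to the remote source
        with self._lock:
            self._cache[path] = listing
        # Heuristic prefetch: warm the cache for child entries in the background.
        for child in listing:
            self._pool.submit(self._prefetch, f"{path.rstrip('/')}/{child}")
        return listing

    def _prefetch(self, path: str):
        with self._lock:
            if path in self._cache:
                return
        listing = self._fetch(path)
        with self._lock:
            self._cache.setdefault(path, listing)
```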

On Causes of GridFTP Transfer Throughput Variance, Zhengyang Liu, Malathi Veeraraghavan, Jianhui Zhou, Jason Hick and Yee-Ting Li. (University of Virginia, LBNL, SLAC - Stanford University)

Abstract: In prior work, we analyzed the GridFTP usage logs collected by data transfer nodes (DTNs) located at national scientific computing centers, and found significant throughput variance even among transfers between the same two end hosts. The goal of this work is to quantify the impact of various factors on throughput variance. Our methodology consisted of executing experiments on a high-speed research testbed, running large-sized instrumented transfers between operational DTNs, and creating statistical models from collected measurements. A non-linear regression model for memory-to-memory transfer throughput as a function of CPU usage at the two DTNs and packet loss rate was created. The model is useful for determining concomitant resource allocations to use in scheduling requests. For example, if a whole NERSC DTN CPU core can be assigned to the GridFTP process executing a large memory-to-memory transfer to SLAC, then only 32% of a CPU core is required at the SLAC DTN for the corresponding GridFTP process, due to a difference in the computing speeds of these two DTNs. With these CPU allocations, data can be moved at 6.3 Gbps, which sets the rate to request from the circuit scheduler.
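
The modeling step can be illustrated with scipy's curve_fit: pick a nonlinear form for throughput as a function of the two DTNs' CPU usage and the packet loss rate, then fit its coefficients to measured transfers. The functional form and the data points below are invented for illustration; the paper's actual model and coefficients are not reproduced here.

```python
# Illustration of the modeling step: fit a nonlinear throughput model to
# measurements with scipy. Functional form and data are invented for
# illustration, not taken from the paper.
import numpy as np
from scipy.optimize import curve_fit

def throughput_model(X, a, b, c):
    cpu_src, cpu_dst, loss = X
    # Throughput rises with CPU available at both DTNs and falls with packet
    # loss (a TCP-like 1/sqrt sensitivity) -- a guessed form, not the paper's.
    return a * np.minimum(cpu_src, b * cpu_dst) / np.sqrt(1.0 + c * loss)

# Hypothetical measurements: (src CPU frac, dst CPU frac, loss rate) -> Gbps
cpu_src = np.array([1.0, 1.0, 0.5, 0.8, 0.3])
cpu_dst = np.array([0.32, 0.5, 0.32, 0.4, 0.2])
loss    = np.array([1e-6, 1e-6, 1e-5, 1e-6, 1e-5])
gbps    = np.array([6.3, 6.4, 3.1, 5.0, 1.9])

params, _ = curve_fit(throughput_model, (cpu_src, cpu_dst, loss), gbps,
                      p0=(10.0, 3.0, 1e4), maxfev=10000)
print("fitted coefficients:", params)
```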

Network-aware Virtual Machine Consolidation for Large Data Centers, Dharmesh Kakadia, Nandish Kopri and Vasudeva Varma.
(IIIT Hyderabad, Unisys Bangalore - India)

Abstract: Resource management in modern data centers has become a challenging task due to the tremendous growth of data centers. In large virtual data centers, the performance of applications is highly dependent on the communication bandwidth available among virtual machines. Traditional algorithms either do not consider the network I/O details of the applications or are computationally intensive. We address the problem of identifying virtual machine clusters based on network traffic and placing them intelligently in order to improve application performance and optimize network usage in large data centers. We propose a greedy consolidation algorithm that ensures the number of migrations is small and the placement decisions are fast, which makes it practical for large data centers. We evaluated our approach in simulation on real-world traces from private and academic data centers, and compared it with existing algorithms on parameters such as scheduling time, performance improvement, and number of migrations. We observed ∼70% savings in interconnect bandwidth and an overall ∼60% improvement in application performance, achieved with a fraction of the scheduling time and migrations required by existing approaches.
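
A hedged sketch of the greedy idea: rank VM pairs by the traffic between them and co-locate the heaviest talkers first, so chatty flows stay local to a host. The clustering and cost details below are our simplification for illustration, not the paper's algorithm:

```python
# Hedged sketch of greedy, traffic-aware VM placement: sort VM pairs by
# observed traffic and co-locate heavy talkers first. This simplification is
# ours; the paper's actual clustering and cost model are not reproduced.
from itertools import combinations

def greedy_place(vms, traffic, host_capacity, num_hosts):
    """traffic[(a, b)] = observed bandwidth between VMs a and b."""
    placement, load = {}, [0] * num_hosts
    # Heaviest-talking pairs first, so they share a host when possible.
    for a, b in sorted(combinations(vms, 2),
                       key=lambda p: traffic.get(p, 0), reverse=True):
        for vm in (a, b):
            if vm in placement:
                continue
            peer = b if vm == a else a
            want = placement.get(peer)     # prefer the peer's host
            if want is not None and load[want] < host_capacity:
                host = want
            else:                          # else least-loaded host
                host = min(range(num_hosts), key=load.__getitem__)
            placement[vm] = host
            load[host] += 1
    return placement

vms = ["vm1", "vm2", "vm3", "vm4"]
traffic = {("vm1", "vm2"): 900, ("vm3", "vm4"): 700, ("vm1", "vm3"): 10}
print(greedy_place(vms, traffic, host_capacity=2, num_hosts=2))
# -> vm1/vm2 share host 0 and vm3/vm4 share host 1, keeping the two heavy
#    flows off the interconnect entirely.
```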

Short Papers:

The Practical Obstacles of Data Transfer: Why researchers still love scp, Hai Ah Nam, Jason Hill and Suzanne Parete-Koon.
(Oak Ridge National Laboratory)

Abstract: The importance of computing facilities is heralded every six months with the announcement of the new Top500 list, showcasing the world’s fastest supercomputers. Unfortunately, with great computing capability does not come great long-term data storage capacity. Users must often move their data to their local site archive, to remote sites where they may be doing future computation or analysis, or back to their home institution; otherwise they face the dreaded data purge that most HPC centers employ to keep utilization of large parallel filesystems low enough to manage performance and capacity. At HPC centers, data transfer is crucial to the scientific workflow and will increase in importance as computing systems grow in size. The Energy Sciences Network (ESnet) recently launched its fifth-generation network, a 100 Gbps high-performance, unclassified national network connecting more than 40 DOE research sites to support scientific research and collaboration. Despite this tenfold increase in bandwidth to DOE research sites, which is amenable to multiple data transfer streams and high throughput, researchers in practice often under-utilize the network: they resort to painfully slow single-stream transfer methods such as scp to avoid the complexity of multi-stream tools such as GridFTP and bbcp, and they contend with the frustrating inconsistency of available tools between sites. In this study we survey and assess the data transfer methods provided at several DOE-supported computing facilities, including both leadership computing facilities, all connected through ESnet. We present observed transfer rates, suggest optimizations, and discuss the obstacles the tools must overcome to achieve widespread adoption over scp.
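
The single-stream penalty the abstract laments is, at root, a bandwidth-delay-product problem: one TCP stream can carry at most its window divided by the round-trip time, so on a long fat path a modest window caps throughput far below the link rate; tools like GridFTP and bbcp recover the difference by running several streams in parallel. A worked example with illustrative numbers:

```python
# Why a single stream underperforms on a long fat path: throughput is capped
# at window / RTT. Numbers are illustrative, not measurements from the
# facilities surveyed in the paper.
window_bytes = 4 * 1024 * 1024        # 4 MiB TCP window (hypothetical)
rtt_s = 0.060                         # 60 ms cross-country RTT (hypothetical)

per_stream_gbps = window_bytes * 8 / rtt_s / 1e9
print(f" 1 stream : {per_stream_gbps:.2f} Gbps")      # ~0.56 Gbps

for n in (4, 8, 16):                  # parallel streams, as GridFTP/bbcp use
    print(f"{n:2d} streams : {n * per_stream_gbps:.2f} Gbps")
# Filling a 10 Gbps path at 60 ms needs ~75 MB of aggregate window -- hence
# multiple streams (or carefully tuned buffers), which plain scp lacks.
```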

In-network, Push-based Network Resource Monitoring, Taylor Groves, Yihua He and Dorian Arnold. (University of New Mexico, Yahoo Inc.)

Abstract: We present preliminary work from our experiences with distributed, push-based monitoring of networks at Yahoo!. Network switches have grown beyond mere ASICs into machines that support unmodified Linux kernels and familiar user interfaces. These advances have enabled a paradigm shift in network monitoring. In lieu of traditional approaches, where network diagnostics were delivered via SNMP, we use the Sysdb facility of Arista’s EOS to implement a push-based approach to network monitoring, leaving the individual switches in charge of determining what monitoring data to send and when to send it. With this approach, on-switch collection, dissemination, and analysis of interface and protocol statistics become possible. The push-based approach shortens the feedback loop of network diagnostics and gives network-aware applications, middleware, and resource managers access to the freshest available data. Our work uses the OpenTSDB monitoring framework to provide a scalable back-end for accessing and storing real-time statistics delivered by on-switch collection agents. OpenTSDB is built on top of Hadoop/HBase, which handles the underlying access and storage for the monitoring system. We wrote two collection agents as prototypes to explore the framework and demonstrate the benefits of push-based network monitoring.
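
The shape of such a collection agent is simple: periodically sample counters on the switch and push them to OpenTSDB's HTTP /api/put endpoint. The sketch below is a generic Linux-host stand-in (reading /sys/class/net counters), not the authors' EOS/Sysdb agents; the metric name and endpoint host are made up.

```python
# Generic stand-in for a push-based collection agent: sample interface
# counters and POST them to OpenTSDB's /api/put HTTP endpoint. Not the
# authors' EOS/Sysdb agent; it reads Linux /sys counters, and the metric
# name and endpoint below are made up.
import json
import time
import urllib.request

OPENTSDB_URL = "http://tsdb.example.com:4242/api/put"   # hypothetical endpoint

def read_rx_bytes(iface: str) -> int:
    with open(f"/sys/class/net/{iface}/statistics/rx_bytes") as f:
        return int(f.read())

def push(metric: str, value: int, tags: dict) -> None:
    datapoint = {"metric": metric, "timestamp": int(time.time()),
                 "value": value, "tags": tags}
    req = urllib.request.Request(OPENTSDB_URL,
                                 data=json.dumps(datapoint).encode(),
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

while True:                    # the switch decides what to send and when
    push("net.iface.rx_bytes", read_rx_bytes("eth0"),
         {"host": "switch01", "iface": "eth0"})
    time.sleep(10)
```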

End-to-End Data Movement Using MPI-IO Over Routed Terabits Infrastructures, Geoffroy Vallee, Scott Atchley, Youngjae Kim and Galen Shipman. (Oak Ridge National Laboratory)

Abstract: Scientific discovery is nowadays driven by large-scale simulations running on massively parallel high-performance computing (HPC) systems. These applications each generate large amounts of data, which then need to be post-processed, for example for data mining or visualization. Unfortunately, the computing platform used for post-processing might differ from the one on which the data is initially generated, introducing the challenge of moving large amounts of data between computing platforms. This is especially challenging when the two platforms are geographically separated, since the data must be moved between computing facilities, and even more critical when scientists tightly couple their domain-specific applications with a post-processing application. This paper presents a solution for data transfer between MPI applications using a dedicated wide-area network (WAN) terabit infrastructure. The proposed solution is based on parallel access to data files and on the Message Passing Interface (MPI) over the Common Communication Infrastructure (CCI) for data transfer over a routed infrastructure. In the context of this research, the Energy Sciences Network (ESnet) of the U.S. Department of Energy (DOE) is targeted for the transfer of data between DOE national laboratories.
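
The parallel-access half of this design is easy to illustrate with mpi4py's MPI-IO bindings: each rank reads its own slice of a shared file concurrently, which is the per-node parallelism that feeds the wide-area transfer. The CCI transport layer is not shown, and the file name below is hypothetical.

```python
# Illustration of the parallel file access side of the design using mpi4py's
# MPI-IO bindings: each rank reads its own slice of a shared file. The CCI
# transport over the routed WAN is not shown; the file name is hypothetical.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

fh = MPI.File.Open(comm, "simulation_output.dat", MPI.MODE_RDONLY)
total = fh.Get_size()
chunk = (total + size - 1) // size     # bytes per rank (last rank may get less)
offset = rank * chunk
buf = np.empty(min(chunk, max(total - offset, 0)), dtype=np.uint8)

fh.Read_at(offset, buf)                # concurrent, non-overlapping reads
fh.Close()
# Each rank would hand its buffer to the WAN transfer layer (CCI in the paper).
print(f"rank {rank}: read {buf.nbytes} bytes at offset {offset}")
```

Run under mpiexec, every rank issues its Read_at in parallel against the shared file, so the aggregate read bandwidth scales with the number of ranks up to the filesystem's limit.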

* The Best Paper Certificate was awarded to Nathan Hanford from the University of California, Davis.