While market-based systems have long been proposed as solutions for distributed resource allocation, few have been deployed for production use in real computer systems. Towards this end, we present our initial experience using Mirage, a microeconomic resource allocation system based on a repeated combinatorial auction. Mirage allocates time on a heavily-used 148-node wireless sensor network testbed. In particular, we focus on observed strategic user behavior over a four-month period in which 312,148 node hours were allocated across 11 research projects. Based on these results, we present a set of key challenges for market-based resource allocation systems based on repeated combinatorial auctions. Finally, we propose refinements to the system's current auction scheme to mitigate the strategies observed to date and also comment on some initial steps toward building an approximately strategyproof repeated combinatorial auction.
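To make the auction setting concrete, the following is a minimal illustrative sketch of winner determination for a single-round combinatorial auction, where each bid names a price and a set of resources (e.g., node-hour slots) and non-conflicting bids are accepted greedily by price per requested unit. This is a generic greedy heuristic for exposition only, not Mirage's actual clearing algorithm; all names and values are hypothetical.

```python
def greedy_winner_determination(bids):
    """bids: list of (bidder, price, frozenset_of_resources).
    Accept bids in order of price per requested resource, skipping
    any bid that conflicts with an already-allocated resource."""
    allocated = set()
    winners = []
    for bidder, price, resources in sorted(
            bids, key=lambda b: b[1] / len(b[2]), reverse=True):
        if allocated.isdisjoint(resources):
            winners.append(bidder)
            allocated |= resources
    return winners

# Hypothetical bids for three testbed nodes n1..n3.
bids = [
    ("proj-A", 90, frozenset({"n1", "n2", "n3"})),  # 30 per node
    ("proj-B", 50, frozenset({"n1"})),              # 50 per node
    ("proj-C", 40, frozenset({"n2", "n3"})),        # 20 per node
]
print(greedy_winner_determination(bids))  # -> ['proj-B', 'proj-C']
```

Note that greedy clearing like this is exactly the kind of mechanism users can probe strategically (e.g., by splitting or inflating bids), which is the behavior the abstract reports observing.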
Scale-out architectures supporting flexible, incremental scalability are common for computing and storage. However, the network remains the last bastion of the traditional scale-up approach, making it the data center's weak link. Through the UCSD Triton network architecture, the authors explore issues in managing the network as a single plug-and-play virtualizable fabric scalable to hundreds of thousands of ports and petabits per second of aggregate bandwidth.
Energy consumption has recently been widely recognized as a major challenge of computer systems design. This paper explores how to support energy as a first-class operating system resource. Energy, because of its global system nature, presents challenges beyond those of conventional resource management. To meet these challenges we propose the Currentcy Model, which unifies energy accounting over diverse hardware components and enables fair allocation of available energy among applications. Our particular goal is to extend battery lifetime by limiting the average discharge rate and to share this limited resource among competing tasks according to user preferences. To demonstrate how our framework supports explicit control over the battery resource we implemented ECOSystem, a modified Linux that incorporates our currentcy model. Experimental results show that ECOSystem accurately accounts for the energy consumed by asynchronous device operation, can achieve a target battery lifetime, and proportionally shares the limited energy resource among competing tasks.
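The currentcy idea can be sketched as follows: each epoch, a fixed energy budget (capped so the average battery discharge rate stays at a target) is converted to currentcy and split among tasks in proportion to user-assigned shares; a task can drive device energy consumption only while it holds currentcy. The function and parameter names below are illustrative assumptions, not ECOSystem's actual interfaces.

```python
def allocate_currentcy(epoch_budget_mj, shares):
    """Split an epoch's energy budget (millijoules) among tasks
    in proportion to their user-assigned shares."""
    total = sum(shares.values())
    return {task: epoch_budget_mj * s / total for task, s in shares.items()}

def charge(balances, task, cost_mj):
    """Debit a device operation against a task's currentcy balance;
    deny the operation if the task cannot pay for it."""
    if balances[task] >= cost_mj:
        balances[task] -= cost_mj
        return True
    return False

# Hypothetical epoch: 1000 mJ budget, browser given 3x the mp3 player's share.
balances = allocate_currentcy(1000.0, {"mp3": 1, "browser": 3})
print(balances)  # {'mp3': 250.0, 'browser': 750.0}
```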
This paper presents Swing, a closed-loop, network-responsive traffic generator that accurately captures the packet interactions of a range of applications using a simple structural model. Starting from observed traffic at a single point in the network, Swing automatically extracts distributions for user, application, and network behavior. It then generates live traffic corresponding to the underlying models in a network emulation environment running commodity network protocol stacks. We find that the generated traces are statistically similar to the original traces. Further, to the best of our knowledge, we are the first to reproduce burstiness in traffic across a range of timescales using a model applicable to a variety of network settings. An initial sensitivity analysis reveals the importance of capturing and recreating user, application, and network characteristics to accurately reproduce such burstiness. Finally, we explore Swing's ability to vary user characteristics, application properties, and wide-area network conditions to project traffic characteristics into alternate scenarios.
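The structural-model approach can be illustrated with a toy generator: traffic arises from sampling user-, application-, and network-layer parameters from distributions extracted at a measurement point, then replaying the resulting request/response exchanges. The distributions and field names below are invented for illustration and do not reflect Swing's actual model parameters.

```python
import random

random.seed(0)  # deterministic for the example

# Toy empirical distributions, standing in for those extracted from a trace.
think_times_s = [0.5, 1.0, 2.0, 5.0]        # user layer: pause between requests
request_sizes_b = [300, 500, 800]           # application layer
response_sizes_b = [1_000, 20_000, 80_000]  # application layer

def generate_session(num_requests):
    """Sample one user session as (think_time_s, req_bytes, resp_bytes)
    tuples, drawing each field independently from its distribution."""
    return [(random.choice(think_times_s),
             random.choice(request_sizes_b),
             random.choice(response_sizes_b))
            for _ in range(num_requests)]

session = generate_session(3)
```

A generator in this style is replayed over emulated link characteristics (the network layer) by real protocol stacks, which is what lets closed-loop effects such as burstiness emerge rather than being scripted directly.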
We built and evaluated a hybrid electrical-packet/optical-circuit network for datacenters based on a 10 µs optical circuit switch built from wavelength-selective switches using binary MEMS. This network has the potential to support large-scale, dynamic datacenter workloads.
This paper describes a chat room application suitable for teaching basic network programming and security protocols. A client/server design illustrates the structure of current scalable network services while a multicast version demonstrates the need for efficient simultaneous distribution of network content to multiple receivers (e.g., as required by video broadcasts). The system also includes implementations of two security protocols, one similar to Kerberos and another based on public key encryption.
This paper proposes optical network interconnects as a key enabler for building high-bandwidth ML training clusters with strong scaling properties. Our design, called SiP-ML, accelerates the training time of popular DNN models using silicon photonics links capable of providing multiple terabits-per-second of bandwidth per GPU. SiP-ML partitions the training job across GPUs with hybrid data and model parallelism while ensuring the communication pattern can be supported efficiently on the network interconnect. We develop task partitioning and device placement methods that take the degree and reconfiguration latency of optical interconnects into account. Simulations using real DNN models show that, compared to the state-of-the-art electrical networks, our approach improves training time by 1.3--9.1x.
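The degree-aware partitioning constraint can be sketched with a toy layout chooser: model parallelism is confined to groups whose communication pattern (here, a ring) fits within each GPU's circuit degree, while data parallelism runs across groups. This is a simplified illustration of the constraint, not SiP-ML's actual partitioning or placement algorithm; the function and its feasibility rule are assumptions.

```python
def hybrid_layout(num_gpus, model_parallel_size, degree):
    """Partition GPUs into model-parallel groups, or return None if the
    optical degree limit cannot support the layout. Assumes a ring
    within each group (2 circuits per GPU) plus at least one circuit
    for cross-group data-parallel all-reduce."""
    if num_gpus % model_parallel_size != 0:
        return None
    needed = 3 if model_parallel_size > 1 else 1
    if degree < needed:
        return None
    gpus = list(range(num_gpus))
    return [gpus[i:i + model_parallel_size]
            for i in range(0, num_gpus, model_parallel_size)]

# Hypothetical example: 8 GPUs, model-parallel groups of 4, degree-4 optics.
print(hybrid_layout(8, 4, 4))  # -> [[0, 1, 2, 3], [4, 5, 6, 7]]
```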
We describe our experience developing what we believe to be the world's first large-scale production deployments of lightwave fabrics used for both datacenter networking and machine-learning (ML) applications. Using optical circuit switches (OCSes) and optical transceivers developed in-house, we employ hardware and software codesign to integrate the fabrics into our network and computing infrastructure. Key to our design is a high degree of multiplexing enabled by new kinds of wavelength-division-multiplexing (WDM) and optical circulators that support high-bandwidth bidirectional traffic on a single strand of optical fiber. The development of the requisite OCS and optical transceiver technologies leads to a synchronous lightwave fabric that is reconfigurable, low latency, rate agnostic, and highly available. These fabrics have provided substantial benefits for long-lived traffic patterns in our datacenter networks and predictable traffic patterns in tightly-coupled machine learning clusters. We report results for a large-scale ML superpod with 4096 tensor processing unit (TPU) V4 chips that has more than one ExaFLOP of computing power. For this use case, the deployment of a lightwave fabric provides up to 3× better system availability and model-dependent performance improvements of up to 3.3× compared to a static fabric, despite constituting less than 6% of the total system cost.