

User browser Domain Name System (DNS) lookup to map to a particular IP address Multiple Google clusters distributed worldwide each cluster with a few thousand machines to handle query traffic
#GOOGLE COMPUTER PROGRAMS FOR ARCHITECTURE SOFTWARE#
Reliability provided in software level rather than in server-class hardware so that commodity PCs can be used to build a cluster at a low price Design for best aggregate throughput rather than peak server response time Building a reliable computing infrastructure from clusters of unreliable commodity PCsġ0 SERVING A GOOGLE QUERY When user enters a queryĮ.g. leverage high volume off-the-shelf switches and computers Amazon, AOL, Google, Hotmail, and Yahoo rely on clusters of PCs to provide services used by millions of people every day cost of administering a shared address space N processors multiprocessor administering 1 big machine Clusters usually connected using I/O bus whereas multiprocessors usually connected on memory bus Cluster of N machines has N independent memories and N copies of OS but a shared address multi-processor allows 1 program to use almost all memoryĮrror isolation separate address space limits contamination of error Repair Easier to replace a machine without bringing down the system than in a shared memory multiprocessor Scale easier to expand the system without bringing down the application that runs on top of the cluster Cost Large scale machine has low volume => fewer machines to spread development costs vs. Often need to be highly available, requiring error tolerance and repairability Often need to scaleĬost of administering a cluster of N machines administering N independent machines vs. Require high amounts of computation per request A single query on Google (on average) reads hundreds of megabytes of data consumes tens of billions of CPU cycles A peak request stream on Google Thousands of queries per second requires an infrastructure comparable in size to largest supercomputer installationsĬombines more than 15,000 commodity-class PCs Instead of a smaller number of high-end servers Most important factors that influenced the design Energy efficiency Price-performance ratio Google application affords easy parallelization Different queries can run on different processors A single query can use multiple processors because the overall index is partitionedĬollection of independent computers using switched network to provide a common service Many mainframe applications run more "loosely coupled" machines than shared memory machines databases, file servers, Web servers, simulations, etc. Google architecture overview Serving a Google query Design principles of Google clusters Leveraging commodity parts Power problem Hardware-level characteristics Memory system SummaryĤ INTRODUCTION Search engines A single query on Google (on average) One of the mostly known and used search engines today How it can achieve such a processing power under such big workloadģ OUTLINE Introduction Cluster architectures 2 PURPOSE To overview the computer architecture of Google
