A review of the IT artifact in
Grid, Cloud and Utility Computing.
In a much quoted paper,
Orlikowski and Iacono (2001) charged the IS research community with
ignoring technology or failing to make it the central focus of research,
and thereby cheating the information systems field of its raison d’être.
Perhaps their critique is well taken; certainly it seems to have incited
many to boldly claim that they have seen the light (or never lost sight
of it) and will now account for the IT artefact, be it RFID, mobile
devices, CRM etc., and not by proxy. But in all this the most ancient,
enduring and distinctive manifestation of ‘the IT artefact’, as a
computer, has been overlooked – its raw power as an information storage
and processing device. It almost seems that, since the coming of the
VLSI chip and the year-on-year empirical re-validation of Moore’s Law
(Moore 1965), computing power and the computer itself can just be
taken for granted[1].
We do not seem very often to worry about the computer and assume that if
we need more computing power then it will be just around the corner (Cornford
2003). Yet we observe recently a change in the nature of computing which
reorients our assumptions about this “computer power” from individual to
collective. We observe the changing nature of computer processing within
the information systems field, reviewing in particular the shift from
individualised PCs running shrink-wrapped software packages supported by
servers and databases, to a new form of distributed computing model
represented by Utility, Grid and Cloud computing. These new concepts
remain poorly defined within information systems, and discussion of them
is dominated by practitioner marketing materials which confuse rather
than clarify. The focus of this review is thus on the new Computing
Artifact and the technological components from which it is constructed.
Utility computing is the
conceptual core of our analysis but much of the current debate on this
idea is discourse on the concept of “Cloud Computing” – a more
marketable vision perhaps than utility computing. Cloud Computing is a
new and confused term. Gartner define cloud computing succinctly as “a
style of computing where massively scalable IT-related capabilities are
provided ‘as a service’ using Internet technologies to multiple external
customers”.
Yet our interest is not in the particulars of cloud computing itself but
the opportunities presented for researchers and practitioners by this
new technology. We argue that fundamental to both cloud computing and
utility computing is a decoupling of the physicality of IT
infrastructure from the architecture of such infrastructure’s use. While
in the past we thought about the bare-metal system (a humming grey box
in an air-conditioned machine room with physical attributes and a host
of peripherals) today such ideas are conceptual and virtualized – hidden
from view. It is this decoupling which will form the basis of our
discussion of the technology of the Grid.
There certainly is a strong
element of hype in much of the Utility, Grid and Cloud computing
discourse and perhaps such hype is necessary. As Swanson and Ramiller
(1997) remind us, the organising visions of information and
communications technology are formed as much in extravagant claims and
blustering sales talk as they are in careful analysis, determination of
requirements or proven functionality. We can at times observe a
distinct tension between the technologists’ aspiration to develop and
define an advanced form of computer infrastructure, and a social
construction of such technology through discourses of marketing and public
relations. We find a plethora of terms associated with Utility
computing within commercial settings, including Autonomic Computing; Grid
Computing; On-Demand Computing; Real-time Enterprise; Service-Oriented
Computing; Adaptive Computing (or Adaptive Enterprise) (Goyal and
Lawande 2006; Plaszczak and Wellner 2007) and peer-to-peer computing
(Foster and Iamnitchi 2003). We have adopted the term “utility
computing” as our categorization of this mixed and confused definitional
landscape.
Many authors who write about
Utility Computing start with an attempt to provide a definition, often
accompanied by a comment as to the general “confusion” surrounding the
term (e.g. Gentzsch 2002). It is unrealistic to expect an accepted
definition of a technology which is still emerging, but by tracing the
evolution of definitions in currency we can see how the understanding of
new technology is influenced by various technical, commercial and
socio-political forces. Put another way, the computer is not a static
thing, but rather a collection of meanings that are contested by
different groups (Bijker 1995), and, like any other technology, it embodies to
varying degrees its developers’ and users’ social, political, psychological,
and professional commitments, skills, prejudices, possibilities and
constraints.
Computing Utility: The
Shifting nature of Computing.
Since Von Neumann defined our
modern computing architecture we have seen computers as consisting of a
processing unit (capable of undertaking calculation) and a memory
(capable of storing instructions and data for the processing unit to
use). Running on this machine is operating system software which
manages (and abstracts) the way applications software makes use of this
physical machine. The development of computing networks, client-server
computing and ultimately the internet essentially introduced a form of
communication into this system – allowing storage and computing to be
shared with other locations or sites - but ultimately the concept of a
"personal computer" or "server computer" remains.
This basic computer architecture
no longer represents computing effectively. Firstly the physical
computer is becoming virtualized – represented as software rather than
as a physical machine. Secondly it is being distributed through Grid
computing infrastructure such that it is owned by virtual rather than
physical organizations. Finally these two technologies are brought
together in a commoditization of computing infrastructure as cloud
computing – where all physicality of the network and computer is hidden
from view. It is for this reason that in 2001 Shirky, at a P2P
Webservices conference, stated that “Thomas Watson’s famous quote that
‘I think there is a world market for maybe five computers’ was wrong - he
overstated the
number by four”. For Shirky the computer was now a single device
collectively shared. All PCs, mobile phones and connected devices share
this Cloud of services on demand – and where processing occurs is not
relevant. We now review the key technologies involved in Utility
Computing (see table).
| 1: Internet – Bandwidth and Internet Standards | At the core of the Utility Computing model is the network. The internet and its associated standards have enabled interoperability among systems and provide the foundation for Grid Standards. |
| 2: Virtualisation | Central to the Cloud Computing idea is the concept of Virtualising the machine. While we desire services, these are provided by personal-machines (albeit simulated in software). |
| 3: Grid Computing Middleware and Standards | Just as the Internet infrastructure (standards, hardware and software) provides the foundation of the Web, so Grid Standards and Software extend this infrastructure to provide utility computing utilising large clusters of distributed computers. |
Internet – Bandwidth and Standards
The internet emerged because of attempts to connect mainframe computers
together to undertake analysis beyond the capability of one machine -
for example within the SAGE air-defence system or ARPANET for scientific
analysis (Berman and Hey 2004). Similarly the Web emerged from a desire
to share information globally between various different computers
(Berners-Lee 1989). Achieving such distribution of resources is however
founded upon a communications infrastructure (of wires and radio-waves)
capable of transferring information at the requisite speed (bandwidth)
and without delays (latency). Until the early 2000s, however, the
bandwidth required for large applications and processing services to
interact was missing. During the dot-com boom a huge amount of
fibre-optic cable and network routing equipment was installed across the
globe by organisations such as the failed WorldCom, which reduced costs
dramatically and increased availability.
Having an effective network infrastructure in place is not enough. A set
of standards (protocols) is also required which defines mechanisms for
resource sharing (Baker, Apon et al. 2005). Internet standards
(HTTP, HTML, TCP/IP) made the Web possible by defining how information is
shared globally through the internet. These standards ensure that a
packet of information is reliably directed between machines. It is this
standardised high-speed high-bandwidth Internet infrastructure upon which Utility Computing is built.
Virtualisation
Virtualization for cloud
computing rests on the basic idea of providing a software simulation of an
underlying hardware machine. These simulated machines (so-called
Virtual Machines) present themselves to the software running upon them
as identical to a real machine of the same specification. As such the
virtual machine must be installed with an operating system (e.g. Windows
or Linux) and can then run applications within it. This is not a
new technology and was first demonstrated in 1967 by IBM’s CP/CMS
systems as a means of sharing a mainframe with many users who are each
presented with their own “virtual machine” (Ceruzzi 2002). However its
relevance to modern computing rests in its ability to abstract the
computer away from the physical box and onto the internet. “Today the
challenge is to virtualize computing resources over the Internet. This
is the essence of Grid computing, and it is being accomplished by
applying a layer of open Grid protocols to every “local” operating
system, for example Linux, Windows, AIX, Solaris, z/OS”
(Wladawsky-Berger 2004). Once such Grid-enabled virtualization is
achieved it is possible to decouple the hardware from the now
virtualized machine, for example running multiple virtual machines on
one server or moving a virtual machine between servers using the
internet. Crucially for the user it appears they are interacting with a
machine with similar attributes to a desktop machine or server - albeit
somewhere within the internet-cloud.
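The decoupling described above can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which a server's capacity is just a count of CPUs and gigabytes of memory; the names `VirtualMachine`, `PhysicalServer` and `migrate` are hypothetical and do not correspond to any real hypervisor's API.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    """A simulated machine, described only by the resources it asks for."""
    name: str
    cpus: int
    memory_gb: int


@dataclass
class PhysicalServer:
    """A physical box that can host several virtual machines at once."""
    name: str
    cpus: int
    memory_gb: int
    guests: list = field(default_factory=list)

    def free_cpus(self):
        return self.cpus - sum(vm.cpus for vm in self.guests)

    def free_memory_gb(self):
        return self.memory_gb - sum(vm.memory_gb for vm in self.guests)

    def can_host(self, vm):
        return vm.cpus <= self.free_cpus() and vm.memory_gb <= self.free_memory_gb()

    def start(self, vm):
        if not self.can_host(vm):
            raise ValueError(f"{self.name} has no capacity for {vm.name}")
        self.guests.append(vm)


def migrate(vm, source, target):
    """Move a virtual machine between servers; the guest itself is unchanged."""
    if vm not in source.guests:
        raise ValueError(f"{vm.name} is not running on {source.name}")
    target.start(vm)          # raises if the target lacks capacity
    source.guests.remove(vm)  # only reached once the target has accepted it


# Hypothetical example: two virtual machines share one server, then one moves.
server_a = PhysicalServer("server-a", cpus=8, memory_gb=32)
server_b = PhysicalServer("server-b", cpus=4, memory_gb=16)
web = VirtualMachine("web", cpus=2, memory_gb=4)
db = VirtualMachine("db", cpus=4, memory_gb=16)
server_a.start(web)
server_a.start(db)
migrate(web, server_a, server_b)
```

The point of the sketch is that nothing in `VirtualMachine` records where it runs: the physical location is a property of the infrastructure, not of the machine the user sees.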
Grid Computing
The term “Grid” is increasingly
used in discussions about the future of ICT infrastructure, or more
generally in discussion of how computing will be done in the
future. Unlike “Cloud computing”, which emerged from and belongs to an IT
industry and marketing domain, the term “Grid Computing” emerged from
the super-computing (High Performance Computing) community (Armbrust,
Fox et al. 2009). Our discussion of Utility computing begins with this
concept of Grids as a foundation. As with the other concepts, however,
hyperbole around Grids abounds, with arguments proposed that
they are “the next generation of the internet”, “the next big thing”; or
that they will “overturn strategic and operating assumptions, alter
industrial economics, upset markets (…) pose daunting challenges for
every user and vendor” (Carr 2005) and even “provide the electronic
foundation for a global society in business, government, research,
science and entertainment” (Berman, Fox et al. 2003). Equally, Grids
have been accused of faddishness, with critics claiming that “there is
nothing new” in
comparison to older ideas, or that the term is used simply to attract
funding or to sell a product with little reference to computational
Grids as they were originally conceived (Sottrup and Peterson 2005).
From a technologist’s perspective
an overall description might be that Grid technology aims to provide
utility computing as a
transparent, seamless and dynamic delivery of computing and data
resources when needed, in a similar way to the electricity power Grid
(Chetty and Buyya 2002; Smarr 2004). Indeed the word grid is directly
taken from the idea of an electricity grid, a utility delivering power
as and when needed. To provide that power on demand a Grid is built
(held together) by a set of standards (protocols) specifying the control
of such distributed resources. These standards are embedded in the Grid
middleware, the software which powers the Grid. Just as
Internet Protocols such as FTP and HTTP enable information to be passed
through the internet and displayed on users’ PCs, so Grid protocols
enable the integration of resources such as sensors, data-storage,
computing processors etc. (Wladawsky-Berger 2004).
The idea of the Grid is usually
traced back to the mid 1990s and the I-Way project to link together a
number of US supercomputers as a ‘metacomputer’ (Abbas 2004). This was
led by Ian Foster of the University of Chicago and Argonne National
Laboratory. Foster and Carl Kesselman then founded the Globus project to
develop the tools and middleware for this metacomputer.
This tool kit rapidly took off in the world of supercomputing and Foster
remains a prominent proponent of the Grid. According to Foster and
Kesselman’s (1998) “bible of the grid” a computational Grid is
“a hardware and software
infrastructure that provides dependable, consistent, pervasive and
inexpensive access to high-end computational capabilities”. In this
Foster highlights “high-end” in order to focus attention on Grids as
a supercomputing resource supporting large scale science; “Grid
technologies seek to make this possible, by providing the protocols,
services and software development kits needed to enable flexible,
controlled resource sharing on a large scale” (Foster 2000).
Three years after their first
book however the same authors shift their focus, again speaking of
Grids as "coordinated
resource sharing and problem solving in dynamic, multi-institutional
virtual organizations" (Foster, Kesselman et al. 2001).
The inclusion of “multi-institutional” within this 2001 definition
highlights the scope of the concept as envisaged by these key Grid
proponents, with Berman
(2003) further adding that Grids
enable resource sharing “on a
global scale”. Such definitions, and the concrete research projects that
underlie them, make the commercial usage of the Grid seem hollow and
opportunistic. These authors seem critical of the contemporaneous
re-badging by IT companies of existing computer-clusters and databases
as “Grid enabled”
(Goyal and Lawande 2006; Plaszczak and Wellner
2007). This critique seems
to run through the development of Grids within supercomputing research
and science where many lament the use of the term by IT companies
marketing clusters of computers in one location.
In 2002 Foster provides a three
point checklist to assess a Grid
(Foster 2002). A Grid 1)
coordinates resources that are NOT subject to centralized control; 2)
uses standard, open, general purpose protocols and interfaces; 3)
delivers non-trivial qualities of service. Foster’s highlighting of
‘NOT’, and the inclusion of ‘open protocols’, appear as a further
challenge to the commercialization of centralized, closed grids. While
this checklist was readily accepted by the academic community and is
widely cited, unsurprisingly it was not well received by the commercial
Grid community (Plaszczak and Wellner 2007). The demand for
“decentralization” was seen as uncompromising and excluded “practically
all known ‘grid’ systems in operation in industry” (Plaszczak and
Wellner 2007, p57). It is perhaps in response to this definition that
the notion of “Enterprise Grids” (Goyal and Lawande 2006) emerged as a
form of Grid operating within an organisation, though possibly employing
resources across multiple corporate locations employing differing
technology. It might ultimately be part of the reason why "Cloud
computing" has eclipsed Grid computing as a concept. The commercial usage of Grid terms such as “Enterprise Grid
Computing” highlights the use of Grids away from the perceived risk of
globally distributed Grids and is the foundation of modern Cloud
Computing providers (e.g. Amazon S3). The focus is not to achieve
increased computing power through connecting distributed clusters of
machines, but as a solution to the “Silos of applications and IT systems
infrastructure” within an organisation’s IT function (Goyal and Lawande
2006, p4) through a focus on utility computing and reduced complexity.
Indeed in contrast to most academic Grids such “Enterprise Grids” demand
homogeneity of resources and centralization within Grids as essential
components. It is these Grids which form the backdrop for Cloud
Computing and ultimately utility computing, in which cloud providers
essentially maintain a homogeneous server-farm providing virtualized
cloud services. In such cases the Grid is far from distributed, rather
existing as “a centralized pool of resources to provide dedicated
support for virtualized architecture” (Plaszczak and Wellner 2007, p174),
often within data-centers.
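Foster's three-point checklist, and the reason Enterprise Grids fail it, can be made concrete with a small sketch. The three tests are taken from the checklist as quoted above; the `SystemProfile` structure and the two example profiles are hypothetical illustrations, not assessments of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemProfile:
    """Hypothetical description of a system to be assessed."""
    name: str
    centrally_controlled: bool
    open_standard_protocols: bool
    nontrivial_quality_of_service: bool


def foster_checklist(system):
    """Return the outcome of each of the three tests (Foster 2002)."""
    return {
        "coordinates resources not subject to centralized control":
            not system.centrally_controlled,
        "uses standard, open, general-purpose protocols and interfaces":
            system.open_standard_protocols,
        "delivers non-trivial qualities of service":
            system.nontrivial_quality_of_service,
    }


def is_grid(system):
    """A system counts as a Grid only if it passes all three tests."""
    return all(foster_checklist(system).values())


# Illustrative profiles only: a multi-institutional research grid and a
# centralized, homogeneous server farm of the "Enterprise Grid" kind.
research_grid = SystemProfile("research grid", False, True, True)
enterprise_grid = SystemProfile("enterprise grid", True, False, True)
```

Under these assumed profiles `is_grid(research_grid)` holds while `is_grid(enterprise_grid)` does not, which mirrors the objection that the first test alone excludes most commercial systems.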
Before considering the nature of
Grids we discuss their underlying architecture. Foster (Foster,
Kesselman et al. 2001) provides an hour-glass Grid architecture (Figure
1). It begins with the fabric which provides the interfaces to the local
resources of the machines on the Grid (be they physical or virtual
machines). This layer provides the local, resource-specific facilities
and could be computer processors, storage elements, tape-robots,
sensors, databases or networks. Above this is a resource and connectivity
layer which defines the communication and authentication protocols
required for transactions to be undertaken on the Grid. The next layer
provides a resource management function including directories, brokering
systems, as well as monitoring and diagnostic resources. In the final
layer reside the tools and applications which use the Grid. It is here
that Virtualization software resides to provide services.

Figure 1: The Layered Grid
Architecture from Foster 2004.
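The ordering of the layers just described can be summarised in a short sketch. The four layers and their roles follow the description in the text; the identifier names and the `path_down` helper are our own hypothetical labels rather than part of any Grid middleware.

```python
from enum import IntEnum


class GridLayer(IntEnum):
    """Layers of the Grid architecture, numbered from the bottom up."""
    FABRIC = 1
    RESOURCE_AND_CONNECTIVITY = 2
    RESOURCE_MANAGEMENT = 3
    APPLICATIONS = 4


RESPONSIBILITIES = {
    GridLayer.FABRIC:
        "interfaces to local resources (processors, storage, sensors, networks)",
    GridLayer.RESOURCE_AND_CONNECTIVITY:
        "communication and authentication protocols",
    GridLayer.RESOURCE_MANAGEMENT:
        "directories, brokering, monitoring and diagnostics",
    GridLayer.APPLICATIONS:
        "tools and applications, including virtualization software",
}


def path_down(start):
    """List the layers a request passes through, from `start` down to the fabric."""
    return [layer for layer in sorted(GridLayer, reverse=True) if layer <= start]


# A request issued by an application touches every layer beneath it.
for layer in path_down(GridLayer.APPLICATIONS):
    print(f"{layer.name}: {RESPONSIBILITIES[layer]}")
```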
One of the key challenges of
Grids is the management of the resources they provide to users.
Central to achieving this is the concept of a Virtual Organisation (VO).
A Virtual Organisation is a set of individuals and/or institutions
defined by the sharing rules for a set of resources (Foster and
Kesselman 1998) or “a set of Grid entities, such as individuals,
applications, services or resources, that are related to each other by
some level of trust” (Plaszczak and Wellner 2007). By necessity these
resources must be controlled “with resource providers and consumers
defining clearly and carefully just what is shared, who is allowed to
share, and the conditions under which sharing occurs” (Foster and
Kesselman 1998) and for this purpose VOs are technically defined along
with the rules of their resource sharing. A Grid VO implies the
assumptions of “the absence of central location, central control,
omniscience, and an existing trust relationship” (Abbas 2004). It is
this ability to control access to resources which is also vital within
Cloud Computing - allowing walled-gardens for security and accounting of
resource usage for billing.
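To illustrate how a Virtual Organisation ties together what is shared, who may share it and under what conditions, consider the following minimal sketch. It assumes the only condition is a per-member quota of CPU hours, and it records usage so that it could later be billed; `SharingRule`, `VirtualOrganisation` and all member and resource names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SharingRule:
    """What is shared, who is allowed to share it, and under what condition."""
    resource: str
    allowed_members: frozenset
    max_cpu_hours: int  # assumed condition: a per-member usage quota


@dataclass
class VirtualOrganisation:
    name: str
    rules: dict = field(default_factory=dict)  # resource -> SharingRule
    usage: dict = field(default_factory=dict)  # (member, resource) -> hours used

    def add_rule(self, rule):
        self.rules[rule.resource] = rule

    def request(self, member, resource, cpu_hours):
        """Grant the request only if a rule permits it; record usage if granted."""
        rule = self.rules.get(resource)
        if rule is None or member not in rule.allowed_members:
            return False
        used = self.usage.get((member, resource), 0)
        if used + cpu_hours > rule.max_cpu_hours:
            return False
        self.usage[(member, resource)] = used + cpu_hours
        return True


# Hypothetical VO with one shared cluster and two permitted institutions.
vo = VirtualOrganisation("physics-vo")
vo.add_rule(SharingRule("cluster-1", frozenset({"uni-a", "uni-b"}), max_cpu_hours=100))
first = vo.request("uni-a", "cluster-1", 60)    # within quota
second = vo.request("uni-a", "cluster-1", 50)   # would exceed the quota
outsider = vo.request("uni-c", "cluster-1", 1)  # not a member of the rule
```

The `usage` record is what makes the same mechanism serve both purposes noted above: the rules form the walled garden, and the accumulated hours provide the basis for accounting.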
Various classes and categories of
Grids exist. According to Abbas Grids can be categorised according to
their increasing scale - desktop grids, cluster grids, enterprise grids,
and global grids (Abbas 2004). Desktop Grids are based on
existing dispersed desktop PCs and can create a new computing resource
by employing unused processing and storage capacity while the existing
user can continue to use the machine. Cluster Grids describe a
form of parallel or distributed computer system that consists of a
collection of interconnected yet standardised computer nodes
working together to act, as far as the user is concerned, as a single
unified computing resource.
Many existing supercomputers are clusters which “use Smart Software
Systems (SSS) to virtualise independent operation-system instances to
provide an HPC
service” (Abbas 2004).
All the above are arguably grids,
and potentially can just about live up to Foster’s three tests. However, for
the information systems field, for Pegasus, and for those who wish to
explore Cloud Computing, it is the final category of global Grids that
is the most significant. Global Grids employ the public internet
infrastructure to communicate between Grid Nodes, and rely on
heterogeneous computing and networking resources. Some global grids
have gained a large amount of publicity by providing social benefits
which capture the public imagination. Perhaps the first such large-scale
project was SETI@home, which searches radio-telescope data for signs of
extra-terrestrial intelligence. WorldCommunityGrid.org, undertaking
research for healthcare, and Folding@home, concerned with protein folding
experiments, are other examples. Folding@home indeed can claim to be the
world’s most powerful distributed computing network according to the
Guinness Book of Records, with 700,000 Sony PlayStation 3 machines and
over 1,000 trillion calculations per second.
Each works by dividing a problem into steps and distributing software
over the internet to the computers of those volunteering. Since within
the home and workplace a
large number of desktop computers remain idle most of the time such
donations have little impact on the user. Indeed the average computer is
idle for over 90% of the time, and even when used only a very small
amount of the CPU’s capabilities are employed (Smith 2005).
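The divide-and-distribute pattern used by such volunteer projects can be sketched as follows. This is a toy illustration under stated assumptions: the "problem" is a list of numbers, the analysis is a stand-in sum of squares, work units are handed out round-robin, and everything runs in one process rather than over the internet. None of the function names reflect the actual SETI@home or Folding@home software.

```python
def split_into_work_units(data, unit_size):
    """Divide the full problem into independent, fixed-size work units."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def assign_round_robin(work_units, volunteers):
    """Hand work units to volunteer machines in turn."""
    assignment = {volunteer: [] for volunteer in volunteers}
    for index, unit in enumerate(work_units):
        assignment[volunteers[index % len(volunteers)]].append(unit)
    return assignment


def process_unit(unit):
    """Stand-in for the real analysis a volunteer's machine would run."""
    return sum(x * x for x in unit)


def run_project(data, unit_size, volunteers):
    """Split, distribute, process each unit, and combine the partial results."""
    assignment = assign_round_robin(split_into_work_units(data, unit_size), volunteers)
    partial_results = [
        process_unit(unit)
        for units in assignment.values()
        for unit in units
    ]
    return sum(partial_results)


# Hypothetical run: ten data points, units of three, two volunteer machines.
total = run_project(list(range(10)), unit_size=3, volunteers=["pc-1", "pc-2"])
```

Because each work unit is independent, the combined result is the same as processing the data on a single machine; only the place where the work happens has changed.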
Another way to categorise Grids
is by the types of solutions that they best address (Jacob 2003). A
computational grid is focused on undertaking large numbers of
computations rapidly, and hence the focus is on using high performance
processors. A data grid’s focus is upon the effective storage and
distribution of large amounts of data, usually across multiple
organisations or locations. The focus of such systems is upon data
integrity, security and ease of access. It should be stressed that there
are no hard boundaries between these two types of grid, and one need
often pre-supposes the other and real users face both issues.
As an example of a grid project
with more of a data orientation, consider the Biomedical Informatics
Research Network, a grid infrastructure project that serves biomedical
research needs
(http://www.nbirn.net/index.shtm).
They express their offerings in terms of 5 complementary elements: a
cyber infrastructure, software tools (applications) for biomedical data
gathering, resources of shared data, data integration support, an
ontology and support for multi-site integration of research activity. As
they say, “By intertwining concurrent revolutions occurring in
biomedicine and information technology, BIRN is enabling researchers to
participate in large scale, cross-institutional research studies where
they are able to acquire, share, analyze, mine and interpret both
imaging and clinical data acquired at multiple sites using advanced
processing and visualization tools.”
Other examples of Grid Computing
exist within science, particularly particle physics. The particle
physics community faces the challenge of analyzing the unprecedented
amounts of data - some 15 Petabytes per year - that will be produced by
the LHC (Large Hadron Collider) experiments at CERN.
To process this data CERN required around 100,000 computer-equivalents
forming its associated grids by 2007, spread across the globe and
incorporating a number of grid infrastructures (Faulkner, Lowe et al.
2006). In using the Grid physicists submit their computing-jobs to the
Grid which spreads across the globe. Similarly data from the LHC is
initially processed at CERN but is quickly spread to 12 computer centres
across the world (so called Tier-1 Grid sites). From here data is spread
to local data-centres at universities within these countries (Tier-2
sites).
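To give a sense of scale for the 15 Petabytes per year quoted above, the following back-of-envelope calculation converts it into a sustained data rate. It assumes decimal units (1 Petabyte = 10^15 bytes), a 365-day year and a perfectly even flow of data, which real experiments will not exhibit; the function names are our own.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
BYTES_PER_PETABYTE = 10 ** 15  # assumption: decimal (SI) petabytes
BYTES_PER_MEGABYTE = 10 ** 6


def average_rate_mb_per_second(petabytes_per_year):
    """Average sustained rate implied by an annual data volume."""
    bytes_per_second = petabytes_per_year * BYTES_PER_PETABYTE / SECONDS_PER_YEAR
    return bytes_per_second / BYTES_PER_MEGABYTE


def terabytes_per_day(petabytes_per_year):
    """The same volume expressed as terabytes per day."""
    return petabytes_per_year * 1000 / 365


lhc_rate = average_rate_mb_per_second(15)
lhc_daily = terabytes_per_day(15)
```

Under these assumptions the figure corresponds to roughly 476 MB per second, or about 41 TB per day, every day of the year, which helps explain why the data is spread across Tier-1 and Tier-2 sites rather than held at a single centre.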
Conclusions
In summary then the concept of a
Grid is bound up in the purpose to which it is put, and the foundational
technology upon which it is based. For its user however this complexity
is hidden – they remain presented with a computing resource which, they
may continue to believe, is a single processor able to analyze their
data – just as Von Neumann had envisaged it.
References
Abbas, A. (2004).
Grid Computing: A Practical Guide to Technology and Applications.
MA, Charles River Media.
Armbrust, M., A.
Fox, et al. (2009). Above the Clouds: A Berkeley View of Cloud
Computing, UC Berkeley Reliable Adaptive Distributed Systems Laboratory.
Baker, M.,
A. Apon, et al. (2005). "Emerging Grid Standards." Computer.
Berman, F. and T. Hey (2004). The Scientific
Imperative. The Grid 2. I. Foster and C. Kesselman. San
Francisco, Morgan Kaufmann.
Berman, F., G.
Fox, et al. (2003). The Grid: past, present, future. Grid Computing -
Making the Global Infrastructure a Reality. F. Berman, G. Fox and T.
Hey, John Wiley & Sons, Ltd.
Berners-Lee, T. (1989). Information
Management: A Proposal. CERN.
http://www.w3.org/History/1989/proposal.html, W3 Organization
Archive.
Bijker, W. (1995).
Of Bicycles, Bakelites and Bulbs: Toward a Theory of Socio-Technical
Change. Cambridge, MA, MIT Press.
Carr, N. (2005).
"The End of Corporate Computing." MIT Sloan Management Review
46(3): 67-73.
Ceruzzi, P.
(2002). A History of Modern Computing. Cambridge, MA, MIT Press.
Chetty, M. and R.
Buyya (2002). "Weaving computational grids: how analogous are they with
electrical grids?" Computing in Science & Engineering 4(4):
61-71.
Cornford, T.
(2003). Information Systems and New Technologies: Taking Shape in Use.
Information Systems and the Economics of Innovation. C. Avgerou.
Cheltenham, Edward Elgar Publishing: 162-177.
Faulkner, P., L.
Lowe, et al. (2006). "GridPP: development of the UK computing Grid for
particle physics." Journal of Physics G: Nuclear and Particle
Physics. 32: N1-N20.
Foster, I. (2002).
"What is the Grid? A Three Point Checklist." GridToday 1(6).
Foster, I. and A.
Iamnitchi (2003). On Death, Taxes, and the Convergence of
Peer-to-Peer and Grid Computing. 2nd International Workshop on
Peer-to-Peer Systems (IPTPS'03), Berkeley, CA.
Foster, I. and C.
Kesselman (1998). The Grid: Blueprint for a New Computing
Infrastructure, Elsevier.
Foster, I., C.
Kesselman, et al. (2001). "The Anatomy of the Grid: Enabling Scalable
Virtual Organizations." The International Journal of High Performance
Computing Applications 15(3): 200-222.
Gentzsch, W.
(2002). "Response to Ian Foster's "What is the Grid?"." GridToday
1(8).
Goyal, B. and S.
Lawande (2006). Grid Revolution: An Introduction to Enterprise Grid
Computing. New York, McGraw-Hill Osborne.
Jacob, B. (2003,
27 Jun 2006). "Grid computing: What are the key components?" Retrieved
26 October 2006, from
http://www-128.ibm.com/developerworks/library/gr-overview/.
Moore, G. E.
(1965). "Cramming more components onto integrated circuits."
Electronics 38(8).
Orlikowski, W. J.
and C. S. Iacono (2001). "Research Commentary: Desperately Seeking the
"IT" in IT Research--A Call to Theorizing the IT Artifact."
Information Systems Research 12(2): 121-134.
Plaszczak, P. and
R. Wellner (2007). Grid Computing: A Savvy Manager's Guide.
Amsterdam, Elsevier.
Smarr, L. (2004).
Grids in Context. The Grid 2. I. Foster and C. Kesselman. San
Francisco, Morgan Kaufmann.
Smith, R. (2005).
Grid Computing: A Brief Technology Analysis. CTO Network Library.
Swanson, E. B. and
N. C. Ramiller (1997). "The Organizing Vision in Information Systems
Innovation." Organization Science 8(5): 458-474.
Wladawsky-Berger,
I. (2004). The Industrial Imperative. The Grid 2. I. Foster and
C. Kesselman. San Francisco, Morgan Kaufmann.