A study of distributed e-commerce system effectiveness


Uniwersytet Łódzki

Summary

The spread of communication networks, and in particular the growth of affordable broadband Internet in developed countries, has enabled organizations to share their computational resources. This paper presents a study of the effectiveness of Java RMI and CORBA, two techniques currently used to build distributed resources. The results show that RMI is much faster than CORBA for search algorithms.

Keywords: distributed resources, search algorithm, RMI technology, CORBA technology

1. Introduction

The spread of high-speed broadband networks in the world, the continual increase in computing power, and the growth of the Internet have changed the way in which society manages information and information services. Distributed resources, such as storage devices, data sources, and supercomputers, are interconnected and can be exploited by users around the world as a single, unified resource. These systems enable users to share, select, and aggregate a wide variety of geographically distributed resources owned by different organizations, and they are well suited to solving resource-intensive IT problems in science, engineering and e-commerce. For instance, the world's largest company and banking group HSBC [12] uses an information system with more than 3,500 CPUs operating in its data centres in four countries. They carry out derivative trades, which rely on making numerous calculations based on future events, and risk analysis, which also looks to the future, calculating risks based on available information [13]. The German shipyard FSG [14] uses high performance computing resources to solve complex and CPU-intensive calculations to create individual ship designs in a short time. On-demand access to resources that are not available locally, or that are needed only temporarily, reduces the cost of ownership as well as the technical and financial risks in ship design.

According to [6] a distributed system is a non-centralized network consisting of numerous computers that can communicate with one another and that appear to users as parts of a single, large, accessible "storehouse" of shared hardware, software, and data. A distributed system is conceptually the opposite of a centralized, or monolithic, system in which clients connect to a single central computer, such as a mainframe.

Distributed systems are developed on top of existing operating systems and software. Unfortunately, these systems are very difficult to build and maintain. A new software layer called middleware is being developed to simplify the development of distributed software. It is accompanied by agreed protocols and standards that provide services such as naming, concurrency control, persistence, event distribution, security and authorization. Examples of such services are the Network Time Protocol (NTP) and the Domain Name System (DNS) [7, 2].

The typical way to organize distributed applications is the Client-Server Model [2, 9]. In this model a program at some node acts as a server that provides some sort of service, for example a Name Service, a Database Service or a File Service. First the Client sends a request to the Server; the Server then replies by sending a response back to the Client. A server can be stateless or stateful, and it can be local or remote. A local server runs on the same machine as the client, whereas a remote server has to be accessed through a network; the remote server is the more common case. A server can also be iterative or concurrent. In the former case it processes one request at a time; in the latter it can service a number of requests at the same time, for example by forking a separate process (or creating a separate thread) to handle each request. In a concurrent server, the processes or threads that handle the individual requests are called slaves (or workers) of the server. These slaves may be created "on demand" when a request is received and deleted when the request has been handled, or they may be "pre-allocated" into a pool when the server is started, available to handle future requests. The size of this pool can be made adaptive to the load on the server. Other servers are called multiplexed; they are concurrent without using forks or threads, using instead the "select" functions to respond to multiple connected sockets.
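The difference between a server that handles one request at a time and a concurrent server with a pre-allocated pool of workers can be sketched in Java as follows. This is only an illustration under simplifying assumptions: requests are plain strings, the "service" merely upper-cases them, and no sockets are involved. The class and method names (WorkerPoolSketch, handle, serveOneAtATime, serveConcurrently) are hypothetical and do not come from the tested system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: requests are strings, the service upper-cases them.
class WorkerPoolSketch {

    // Service logic shared by both server styles.
    static String handle(String request) {
        return request.toUpperCase();
    }

    // One request at a time, in arrival order.
    static List<String> serveOneAtATime(List<String> requests) {
        List<String> responses = new ArrayList<>();
        for (String request : requests) {
            responses.add(handle(request));
        }
        return responses;
    }

    // Concurrent server: a pool of pre-allocated workers handles requests in parallel.
    static List<String> serveConcurrently(List<String> requests, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String request : requests) {
                pending.add(pool.submit(() -> handle(request)));
            }
            List<String> responses = new ArrayList<>();
            for (Future<String> result : pending) {
                responses.add(result.get()); // collected in request order
            }
            return responses;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("request handling failed", e);
        } finally {
            pool.shutdown(); // release the workers
        }
    }

    public static void main(String[] args) {
        List<String> requests = Arrays.asList("lookup a", "lookup b", "lookup c");
        System.out.println(serveOneAtATime(requests));
        System.out.println(serveConcurrently(requests, 2));
    }
}
```

A fixed-size pool corresponds to the "pre-allocated" variant described above; creating a new thread per request would correspond to the "on demand" variant.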

The advantages and disadvantages of distributed systems are collected in Tables 1 and 2.

Table 1. Advantages of distributed systems [2]

| No. | Scope | Advantage |
|---|---|---|
| 1 | Economics | Microprocessors nowadays offer a better price/performance ratio than mainframes. |
| 2 | Speed | A distributed system may have more total computing power than a mainframe. |
| 3 | Reliability | If one machine in the distributed system crashes, the rest of the environment continues to function properly without it. |
| 4 | Incremental growth | Computing power can be added to the distributed system gradually, in small increments. |
| 5 | Data sharing | Users from all over the environment can access the same database at the same time. |
| 6 | Device sharing | Users can share the same devices across the whole system, such as printers/plotters, files and services. |
| 7 | Flexibility | The work can be spread all over the system, which is cost-effective. |

Table 2. Disadvantages of distributed systems [2]

| No. | Scope | Disadvantage |
|---|---|---|
| 1 | Software | There is still little software that can be used on distributed systems. |
| 2 | Security issues | The system is much more susceptible to external attacks. |
| 3 | Network | Because the system is vast, the network may cause many different problems. |


The interaction between the client and the server is very often carried out through Remote Procedure Call (RPC) [7], Remote Method Invocation (RMI) [1] or CORBA [8]. Objects may, at different times, play the role of client or the role of server. In fact, an object can act at the same time as a server (responding to a request) and as a client (requesting service from a third object as part of implementing its own service).

In the Client-Server Architecture the initiative lies with the client. The server is always there at a known endpoint (usually an IP address and a port), waiting for requests. It only responds to requests; it never initiates an interaction with a client. A peer-to-peer architecture instead emphasizes that the initiative may lie with any object and that all objects may be fairly equivalent in terms of resources and responsibilities [10]. In the client-server architecture most of the resources and responsibilities usually lie with the servers. The clients (if they are thin clients) may be responsible only for the Presentation Layer to the user and for network communications.

A further distinction when talking of the "state" of a server is between hard state and soft state. Hard state is true state, in the sense that if it is lost because of a crash, the interaction cannot continue: the server interaction has to be fully re-initialized. Soft state, in contrast, can be reconstructed if it is lost. In other words, it is a hint that improves the performance of a server when it is available but that can be recomputed when it is not. For example, an NFS server could maintain, for each recent request, information on the user, the cursor position, etc. If another request then arrives from that user for that file, this state information can be used to speed up access. If this "state" information is lost or becomes stale, it can be reconstructed, because an NFS request carries all the information required to satisfy it. Note that a server with soft state is really just a stateless server.

The stateful/stateless distinction is also used when talking of the protocols used in client/server interaction, so we speak of stateful and stateless protocols. HTTP, the protocol used for WWW interactions, is stateless, since each request is answered independently of the others, and so is the web server [4]. The Network File System (NFS) protocol, which deals with accessing files made available by a server on a network, is also stateless. All requests are made as if no Open file command were available, so each request must explicitly state the name of the file and the position of the cursor. The File Transfer Protocol (FTP), in contrast, is stateful, since a session is established and requests are made in a context, for example the current working directory.

Though HTTP is stateless, it can be made stateful by having the clients preserve state information and send it to the server when requests are made. This is the role played by cookies. When the client sends a request, it can provide the state information (the cookie) in the Cookie header. When the server responds, it places the appropriately updated state in the Set-Cookie header. Keeping the state in the client makes the server more scalable, since no space has to be devoted to the client's state. Also, if the server crashes, the client's state is not affected. A more technical discussion of cookies and their use in state management is presented in [3].
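The cookie round trip can be illustrated with a small Java sketch. It assumes a hypothetical cookie named visits that counts requests; the handler keeps no state of its own, reads the value of the Cookie request header, and returns the value it would place in the Set-Cookie response header. The names CookieStateSketch, parseCookieHeader and handleRequest are illustrative only, and real HTTP handling (attributes such as Path or Expires, quoting rules) is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of client-held state; no real HTTP server is used.
class CookieStateSketch {

    // Parses a Cookie request header value such as "visits=2; lang=en".
    static Map<String, String> parseCookieHeader(String headerValue) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return cookies; // first request: the client has no state yet
        }
        for (String pair : headerValue.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    // Stateless handler: all state arrives with the request and leaves with the response.
    // Returns the value for the Set-Cookie response header.
    static String handleRequest(String cookieHeaderValue) {
        Map<String, String> cookies = parseCookieHeader(cookieHeaderValue);
        int visits;
        try {
            visits = Integer.parseInt(cookies.getOrDefault("visits", "0"));
        } catch (NumberFormatException e) {
            visits = 0; // malformed client state is treated as no state
        }
        return "visits=" + (visits + 1);
    }

    public static void main(String[] args) {
        String first = handleRequest(null);   // no Cookie header on the first request
        String second = handleRequest(first); // the client echoes the stored cookie back
        System.out.println(first + " then " + second);
    }
}
```

Because the handler stores nothing between calls, any server instance could answer the second request, which is the scalability benefit mentioned above.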

When servers require the power of many computer systems, two architectures are commonly used. The first uses vertical distribution, arranging computers on a number of tiers, usually three. This is the 3-tier architecture: the presentation tier, where user interactions take place; the application tier, where user requests are received and analyzed; and the data tier, where data is preserved and manipulated. Upon receiving a user request, the application tier interacts with the data tier as needed and builds the response to be sent to the presentation tier.


The other architecture uses horizontal distribution across a number of equivalent, fully functional servers. User requests are routed to one of the available servers, usually in round-robin fashion or by hashing the requesting IP address. This architecture is often used for heavily loaded web servers.
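Both routing policies can be sketched in a few lines of Java. The sketch assumes that servers are identified by plain strings and that the client address is available as text; the class name RequestRouterSketch and its methods are hypothetical, and the hash used is simply String.hashCode, not the scheme of any particular load balancer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of horizontal distribution over equivalent servers.
class RequestRouterSketch {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger(0);

    RequestRouterSketch(List<String> servers) {
        this.servers = servers;
    }

    // Round robin: each call returns the next server in turn.
    String routeRoundRobin() {
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    // Address hashing: the same client address is always mapped to the same server.
    String routeByClientAddress(String clientIp) {
        int index = Math.floorMod(clientIp.hashCode(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        RequestRouterSketch router =
                new RequestRouterSketch(Arrays.asList("server-1", "server-2", "server-3"));
        for (int i = 0; i < 4; i++) {
            System.out.println("round robin -> " + router.routeRoundRobin());
        }
        System.out.println("hashed      -> " + router.routeByClientAddress("192.0.2.10"));
    }
}
```

Round robin spreads load evenly regardless of who is asking, whereas address hashing keeps a given client on one server, which matters when that server holds soft state for the client.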

2. Remote Method Invocation technology

Remote Method Invocation (RMI) technology [5, 9], first introduced in JDK 1.1, elevates network programming to a higher plane. Although RMI is relatively easy to use, it is a remarkably powerful technology and exposes the average Java developer to an entirely new paradigm: the world of distributed object computing. A primary goal of the RMI designers was to allow programmers to develop distributed Java programs with the same syntax and semantics as non-distributed programs. To do this, the designers had to make Java classes and objects work in the same way in a single Java Virtual Machine (JVM) and in a distributed computing environment (multiple JVMs).

The RMI architects tried to make the use of distributed Java objects similar to using local Java objects. While they succeeded, some important differences are listed in table 3.

The design goal for the RMI architecture was to create a Java distributed object model that integrates naturally into the Java programming language and the local object model. The RMI architects have succeeded, creating a system that extends the safety and robustness of the Java architecture to the distributed computing world.

A client program makes method calls on the proxy object; RMI sends the request to the remote JVM and forwards it to the implementation. Any return values provided by the implementation are sent back to the proxy and then to the client's program.
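The call path described above (client, proxy, remote implementation, and back) can be tried out with the standard java.rmi API. The following is a minimal sketch and not the program used in the tests: for brevity the registry, the server object and the client all run in one JVM and communicate over the loopback interface, the registry is created on an anonymous port, and the interface name PatternCounter and the binding name "counter" are hypothetical.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

class RmiSketch {

    // Remote interface: the contract visible to clients (hypothetical name).
    public interface PatternCounter extends Remote {
        int count(String text, String pattern) throws RemoteException;
    }

    // Implementation that would normally live in the server JVM.
    static class PatternCounterImpl implements PatternCounter {
        @Override
        public int count(String text, String pattern) {
            if (pattern.isEmpty()) {
                return 0;
            }
            int hits = 0;
            for (int i = text.indexOf(pattern); i >= 0; i = text.indexOf(pattern, i + 1)) {
                hits++; // overlapping occurrences are counted as well
            }
            return hits;
        }
    }

    // Exports the object, binds its stub, looks it up and calls it through the proxy.
    static int countViaRmi(String text, String pattern) {
        // Assumption: force loopback so the sketch does not depend on host name resolution.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        PatternCounterImpl impl = new PatternCounterImpl();
        Registry registry = null;
        try {
            registry = LocateRegistry.createRegistry(0); // anonymous port
            PatternCounter stub = (PatternCounter) UnicastRemoteObject.exportObject(impl, 0);
            registry.rebind("counter", stub);
            PatternCounter proxy = (PatternCounter) registry.lookup("counter");
            return proxy.count(text, pattern); // the call travels through the RMI transport
        } catch (RemoteException | NotBoundException e) {
            throw new IllegalStateException("RMI call failed", e);
        } finally {
            unexportQuietly(impl);
            if (registry != null) {
                unexportQuietly(registry);
            }
        }
    }

    // Unexporting lets the JVM exit once the sketch is done.
    private static void unexportQuietly(Remote object) {
        try {
            UnicastRemoteObject.unexportObject(object, true);
        } catch (NoSuchObjectException ignored) {
            // The object was never exported; nothing to clean up.
        }
    }

    public static void main(String[] args) {
        System.out.println(countViaRmi("abcabcab", "abc"));
    }
}
```

In a real deployment the registry and the implementation would run on the server machine, and the client would obtain the registry with LocateRegistry.getRegistry(host, port) instead of holding a direct reference to it.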

Table 3. Comparison of Distributed and Non-distributed Java Programs [10]

| Aspect | Local Object | Remote Object |
|---|---|---|
| Object Definition | A local object is defined by a Java class. | A remote object's behaviour is defined by an interface that must extend the Remote interface. |
| Object Implementation | A local object is implemented by its Java class. | A remote object's behaviour is executed by a Java class that implements the remote interface. |
| Object Creation | A new instance of a local object is created by the new operator. | A new instance of a remote object is created on the host computer with the new operator. A client cannot directly create a new remote object (unless using Java 2 Remote Object Activation). |
| Object Access | A local object is accessed directly via an object reference variable. | A remote object is accessed via an object reference variable which points to a proxy stub implementation of the remote interface. |
| References | In a single JVM, an object reference points directly at an object in the heap. | A "remote reference" is a pointer to a proxy object in the local heap. That stub contains information that allows it to connect to a remote object, which contains the implementation of the methods. |
| Active References | In a single JVM, an object is considered "alive" if there is at least one reference to it. | In a distributed environment remote JVMs may crash, and network connections may be lost. A remote object is considered to have an active remote reference to it if it has been accessed within a certain time period. If all remote references have been explicitly dropped, or if all remote references have expired leases, then the remote object is available for distributed garbage collection. |
| Finalization | If an object implements the finalize method, it is called before the object is reclaimed by the garbage collector. | If a remote object implements the Unreferenced interface, the unreferenced method of that interface is called when all remote references have been dropped. |
| Garbage Collection | When all local references to an object have been dropped, the object becomes a candidate for garbage collection. | The distributed garbage collector works with the local garbage collector. If there are no remote references and all local references to a remote object have been dropped, then it becomes a candidate for garbage collection through the normal means. |
| Exceptions | Exceptions are either Runtime exceptions or Exceptions. The Java compiler forces a program to handle all Exceptions. | RMI forces programs to deal with any possible RemoteException objects that may be thrown. This was done to ensure the robustness of distributed applications. |

3. CORBA technology

CORBA, which stands for Common Object Request Broker Architecture, is an industry standard developed by the OMG (a consortium of more than 700 companies) to aid in distributed objects programming. CORBA is just a specification for creating and using distributed objects; CORBA is not a programming language.

The CORBA architecture is based on the object model. This model is derived from the abstract core object model defined by the OMG in the "Object Management Architecture Guide", which can be found at http://www.omg.org. The model is abstract in the sense that it is not directly realized by any particular technology. This allows applications to be built in a standard manner using basic building blocks such as objects. Therefore, a CORBA-based system is a collection of objects that isolates the requestors of services (clients) from the providers of services (servers) by a well-defined encapsulation interface. It is important to note that CORBA objects differ from typical programming objects in three ways:

• CORBA objects can be located anywhere on the network,

• CORBA objects can be written in any language that has IDL mapping,

• CORBA objects can run on any platform.

The OMG’s Object Management Architecture (OMA) [11] tries to define the various high-level facilities that are necessary for distributed object-oriented computing. The core of the OMA is the Object Request Broker (ORB), a mechanism that provides object location transparency, communication, and activation. Based on the OMA, the CORBA specification, which provides a description of the interfaces and facilities provided by compliant ORBs, was released.

As with RMI, CORBA objects are specified with interfaces, which are the contract between the client and the server. In CORBA's case, however, interfaces are specified in a special definition language, the Interface Definition Language (IDL).


The IDL defines the types of objects by defining their interfaces. An interface consists of a set of named operations and the parameters to those operations. Note that IDL is used to describe the interface only, not implementations. Despite the fact that IDL syntax is similar to C++ and Java, IDL is not a programming language.

Through IDL, a particular object implementation tells its potential clients what operations are available and how they should be invoked. From IDL definitions, the CORBA objects are mapped into different programming languages. Programming languages with an IDL mapping include C, C++, Java, Smalltalk, Lisp, and Python. Thus, once you define an interface to objects in IDL, you are free to implement the object using any suitable programming language that has an IDL mapping. Consequently, if you want to use that object, you can make remote requests to it from any programming language that has an IDL mapping.

To summarize, CORBA is a specification for creating and using distributed objects, not a programming language, and CORBA objects can run on any platform, be located anywhere on the network and be written in any language that supports IDL mapping. CORBA is composed of five major components: the ORB, the IDL, the Dynamic Invocation Interface (DII), the Interface Repository (IR) and the Object Adapter (OA). The ORB is responsible for finding the object implementation for a request, preparing the object implementation to receive the request, and communicating the data making up the request. The OMA is the next higher level that builds upon the CORBA architecture. The OMA consists of two main components: CORBAservices and CORBAfacilities. The OMA allows applications to provide their basic functionality through a standard interface.

CORBA 3.0 will have several major new features, including the Portable Object Adapter (POA), CORBA messaging, and objects by value. The POA allows applications and their servants to be portable between ORBs supplied by different vendors. CORBA messaging adds two new asynchronous request techniques: polling and callback. These techniques represent a significant advantage for most programming languages because they can be used with static invocations, which provide a more natural programming model than the DII. With objects by value, it is now possible to pass objects by value with CORBA.

4. Effectiveness of distributed systems

Communication between computers on a network is a very complex issue and can be achieved in many different ways, ranging from the low-level socket interface to advanced environments that act as a communication platform between applications on the distributed network. The most popular standards for communication between applications in distributed systems are RMI and CORBA. In addition to communication protocols, they provide many useful services.

In recent years, the CORBA standard has reached a high technological level. In addition to commercial implementations, there are free products available, some even in source form, such as MICO.


4.1. Test environment and test scenarios

This section describes the test scenarios and conditions for the study of the effectiveness of the selected distributed technologies. The characteristics of the hardware platform and the software show what computer resources the individual implementations require.

The measurements were made for two different testing models. In the first scenario, the client communicated with the servers through a computer network. In the second scenario, both the client and the server were running on one machine. In the first scenario the client connects to the selected number of servers and divides the text file among these parallel computers, each of which runs the algorithm that searches its part for the pattern.
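The idea of dividing the text among several workers and combining the partial results can be sketched in Java. The sketch is a single-machine stand-in under stated assumptions: threads play the role of the servers, the search is a naive substring count (the paper does not specify the algorithm used), and each worker counts only the matches that start inside its own part, reading a few characters past the end of that part so that matches straddling a boundary are neither lost nor counted twice. The names ParallelSearchSketch, countStartingIn and countParallel are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the distributed search: threads instead of remote servers.
class ParallelSearchSketch {

    // Counts occurrences of pattern whose first character lies in [from, to).
    static int countStartingIn(String text, String pattern, int from, int to) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int lastStart = Math.min(to, text.length() - pattern.length() + 1);
        int hits = 0;
        for (int i = from; i < lastStart; i++) {
            if (text.startsWith(pattern, i)) {
                hits++;
            }
        }
        return hits;
    }

    // Splits the text into one part per worker and sums the partial counts.
    static int countParallel(String text, String pattern, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int partSize = (text.length() + workers - 1) / workers; // ceiling division
            List<Future<Integer>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(w * partSize, text.length());
                final int to = Math.min(from + partSize, text.length());
                parts.add(pool.submit(() -> countStartingIn(text, pattern, from, to)));
            }
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel search failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String text = "abracadabra abracadabra";
        System.out.println(countStartingIn(text, "abra", 0, text.length())); // sequential
        System.out.println(countParallel(text, "abra", 3));                  // three workers
    }
}
```

In the distributed scenarios each part would be sent to a server through RMI or CORBA, so the transmission time of the parts is added to the search time measured here.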

All tests were carried out on three separate plain-text files. The first file had 113 000 characters, the second 1 244 500 characters, and the last 7 461 000 characters. Three files of different sizes were used to see how effective the technologies (CORBA, RMI) are for different amounts of data.

Figure 1 shows the first scenario, in which the computers are connected in a network with a shared bus.

Figure 1. Scenario 1 (a multiple machine model)


Figure 2 shows the second scenario (one machine model).

Figure 2. Scenario 2 (one machine model) Source: Authors’ elaboration.

4.2. Hardware and Software Platforms

Ten identical machines were used to perform the tests. Table 4 shows the hardware and software specification of the ten servers (all servers have the same hardware and software configuration).

Table 4. Hardware and software specification of each server

| Name of server | CPUs | Memory | Operating System |
|---|---|---|---|
| 10 x Sun Fire 20z | 2 x AMD 248 (2.2 GHz) | 8 GB RAM | Windows 2003 Server |

Source: Authors’ elaboration.

4.3. Comparison of CORBA and RMI

The results show the differences in performance between the test environments. For each number of servers the program was run 10 times, and the average time for that number of servers was then calculated.
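The measurement procedure (repeat the run a fixed number of times and average the elapsed time) can be expressed as a small Java helper. This is an assumed reconstruction of the procedure and not the harness used in the study; the names TimingSketch and averageMillis are hypothetical, and the task being timed in main is a placeholder.

```java
// Hypothetical timing helper: runs a task several times and averages the elapsed time.
class TimingSketch {

    // Returns the mean wall-clock time of one run, in milliseconds.
    static double averageMillis(Runnable task, int runs) {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for one search over the test file.
        Runnable search = () -> "abracadabra abracadabra".indexOf("cad");
        System.out.println("average time [ms]: " + averageMillis(search, 10));
    }
}
```

Averaging over several runs smooths out one-off effects such as connection set-up or JIT warm-up, although with only 10 runs the first, slower runs still influence the mean.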

In this situation the optimum number of servers for the calculation is just two. The comparison of the CORBA and RMI technologies is shown in Figure 3. Presenting the results of both technologies on one chart shows which of them (CORBA or RMI) is quicker for this kind of task. The charts in Figure 3 summarize the results of all three techniques (CORBA, RMI and sequential execution). The best technique for this kind of job is Java RMI, and the worst results were obtained with CORBA. The reason is that RMI deals best with the transmission of data over the network (the program is compiled to a binary form).

Figure 3. Comparison of all technologies for a) text1 file b) text2 file c) text3 file (time t in ms against the number of servers, 1 to 10, for CORBA, RMI and sequential execution). Source: Authors' elaboration.

5. Conclusion

Looking at the results obtained, one can notice a substantial difference, which raises some curiosity. CORBA produced worse results than the other two techniques. RMI, on the other hand, needs more time to connect to the server and to transmit data, but its computing performance grows as the number of servers increases. The ease of programming in Java is also worth mentioning. The results make it clear that parallel processing is of great importance in complex calculations: it can save a great deal of time as well as the cost of purchasing very fast computers. It should be kept in mind, however, that adding a large number of servers has the opposite effect, and the time needed for the tasks simply becomes longer and longer. Each technology has its advantages and disadvantages. RMI is simple in terms of programming and independent of the operating system, but it is limited to one programming language (the server and the client must be written in the same language). CORBA depends neither on the operating system nor on the language, but it is more complicated than the others and causes major compatibility problems between different implementations.

Bibliography

[1] Brose G., Vogel A., Duddy K., Java Programming, Publisher Wiley Computer Publishing, Canada 2001.

[2] Coulouris G., Dollimore J., Kindberg T., Systemy rozproszone – podstawy projektowania, Publisher Wydawnictwo Naukowo-Techniczne, Warsaw 1998.

[3] Desmeules R., IPv6. Sieci oparte na protokole IP w wersji 6, Publisher Wydawnictwo Naukowe PWN, 2006.

[4] Dye M.A., McDonald R., Rufi A.T.W., Akademia sieci Cisco. Publisher 2008.

[5] Eckel B., Thinking in Java (3rd Edition), Publisher Prentice Hall PTR, December 2002.

[6] Farley J., Java Distributed Computing, Publisher O'Reilly, February 1998.

[7] Grosso W., Java RMI, Publisher O'Reilly, October 2001.

[8] Henning M., Vinoski S., Advanced CORBA® Programming with C++, Publisher Addison Wesley Longman, Massachusetts 1999.

[9] Herold E. R., Java – programowanie sieciowe, Publisher O'Reilly, Warsaw 2001.

[10] Rosenberger J., Teach Yourself CORBA in 14 Days, Publisher SAMS 2000.

[11] Silberschatz A., Peterson J., Galvin B., Podstawy systemów operacyjnych, Publisher Wydawnictwo Naukowo-Techniczne, Warsaw 1993.

[12] http://www.forbes.com/lists/2008/18/biz_2000global08_The-Global-2000_Rank.html.

[13] http://www.computerweekly.com/Articles/2006/09/26/218593/how-grid-power-pays-off-for-hsbc.htm.


A STUDY OF THE EFFECTIVENESS OF DISTRIBUTED E-COMMERCE SYSTEMS

Summary

Distributed telecommunication networks, and in particular the development of broadband networks in developed countries, make it possible to build up distributed resources with significant computing power. The article presents a study of the effectiveness of techniques such as Java RMI and CORBA, which are used to build distributed resources. The results showed that RMI technology is faster than CORBA technology for search algorithms.

Keywords: distributed resources, search algorithm, RMI technology, CORBA technology

Volodymyr Mosorov
Marian Niedźwiedziński

Katedra Informatyki Ekonomicznej
Wydział Ekonomiczno-Socjologiczny
Uniwersytet Łódzki
ul. P.O.W. 3/5, 90-255 Łódź
e-mail: mosorow@uni.lodz.pl, mariann@uni.lodz.pl