
AGH University of Science and Technology
Faculty of Computer Science, Electronics and Telecommunications
Department of Computer Science

Ph.D. Dissertation

Programmable logic devices in video surveillance services compliant with SOA paradigm

Author: mgr inż. Robert Brzoza-Woch
Supervisor: prof. dr hab. inż. Krzysztof Zieliński

Kraków, 2013

Acknowledgements

First, I would like to show my gratitude to my supervisor, Prof. Krzysztof Zieliński, for providing me with interesting research topics and supporting me in my research. I am especially grateful to my colleague Dr. Andrzej Ruta for his invaluable work on image processing algorithms for the developed video surveillance hardware-software platform. Without his great commitment, the platform would be just a theoretical solution. Many thanks to my colleagues Dr. Jacek Długopolski and Dr. Wojciech Zaborowski for their constant support and for providing me with new ideas, especially in the FPGA domain. I would like to thank my dear wife Małgorzata, who greatly supported me and gave me the motivation to work. Finally, I would like to thank everyone who helped me with my research or simply wished me well.

The author of this thesis is a grant holder of the project "Doctus – Małopolski fundusz stypendialny dla doktorantów", partially supported by the European Union within the European Social Fund.

The research was partially supported by the European Union within the European Regional Development Fund, program no. POIG.01.03.01-00-008/08, grant no. 11.11.230.015.

Contents

1 Introduction
1.1 Motivation
1.2 The Purpose of This Dissertation
1.3 Topic and Scope of the Discussion
1.4 The Main Hypothesis
1.5 Contributions
1.6 Contents of the Following Sections

2 DVS Challenges and Technologies
2.1 Diverse Approaches
2.1.1 Centralized Analog Systems
2.1.2 Centralized Digital Systems
2.1.3 Distributed Digital Systems
2.2 Service Oriented Architecture (SOA)
2.2.1 Service Contracts
2.2.2 Service Coupling
2.2.3 Service Abstraction
2.2.4 Service Reusability
2.2.5 Service Autonomy
2.2.6 Service Statelessness
2.2.7 Service Discoverability
2.2.8 Service Composability
2.3 Programmable Logic Devices
2.3.1 Internal Architecture of PLD
2.3.2 Logic Implementation in PLD
2.3.3 High-level Synthesis Tools
2.3.4 Future Trends
2.4 State of the Art for DVS
2.4.1 FPGA-based Image Processing Subsystems
2.4.2 DVS Hardware Platforms and Low-level Solutions
2.4.3 Video Surveillance Solutions
2.4.4 Embedded and FPGA-Based Web Services
2.4.5 Conclusions on the Current State of the Art
2.5 Proposed Solution

3 DVS Service Design Overview
3.1 Design Assumptions
3.1.1 General Capabilities of the Hardware Platform
3.1.2 Memory Requirements
3.1.3 Power Requirements
3.1.4 Network Communication and QoS
3.1.5 Actuator Control
3.1.6 Update, Repair, and Maintenance
3.2 Evolution of the System
3.2.1 Architectural Improvements
3.2.2 High-level Synthesis Tools
3.2.3 FPGA Hardware Platform
3.3 Final System Architecture
3.3.1 Final Concepts of the Hardware Platform
3.3.2 System Architecture
3.3.3 Network Communication
3.3.4 Remote Reconfiguration Subsystem
3.3.5 Data Processing and Storage
3.4 Possible Alternative Solutions
3.4.1 Microcontroller-based Implementation
3.4.2 Network Communication Based on PHY Chip
3.5 Conclusions

4 DVS Details and Results
4.1 General Architecture
4.1.1 Motion Detection Service
4.1.2 Object Classifier Service
4.2 HW Accel. for the Motion Detection Service
4.2.1 Hardware Accelerator Operation
4.2.2 Hardware Accelerator and Nios-II Synchronization
4.3 HW Accel. for the Object Classifier Service
4.3.1 Coarse-grained Accelerator Implementation Details
4.3.2 FPGA Resource-oriented Optimizations for Coarse-grained Accelerator
4.3.3 Fine-grained Accelerators Implementation Details
4.4 Network Communication
4.4.1 Full-featured Version of the Network Communication Subsystem
4.4.2 Simplified Version of the Network Communication Subsystem
4.4.3 Performance Comparison Between the Two Versions of the Network Communication Subsystem
4.5 Web Server Control Logic
4.6 Image Input Path
4.7 Remote Reconfiguration
4.8 Additional Features
4.8.1 Actuator Control
4.8.2 Image Preview Generator
4.9 Conclusions

5 System Performance Evaluation
5.1 Motion Detection Service
5.2 Object Classifier Service
5.2.1 General Performance Considerations
5.2.2 Overall Performance Evaluation for Different Accelerator Types
5.3 Compliance with SOA Paradigm
5.3.1 Service Contracts
5.3.2 Service Coupling
5.3.3 Service Abstraction
5.3.4 Service Reusability
5.3.5 Service Autonomy
5.3.6 Service Statelessness
5.3.7 Service Discoverability
5.3.8 Service Composability
5.4 Case Study
5.4.1 Advanced Intrusion Detection
5.4.2 Recognition of Vehicle Types
5.4.3 Intelligent Security Systems
5.5 Conclusions

6 Summary
6.1 General Summary
6.2 Proof of the Hypothesis
6.3 Contributions to the State of the Art
6.4 Further Development

List of Figures

2.1 IVC-8371P video capture card hardware view and block diagram (video path only)
2.2 Block diagram of an exemplary analog video surveillance system
2.3 Video surveillance system using IP cameras
2.4 Block diagram of a distributed digital video surveillance system
2.5 Architectural diagram of Web service discovery mechanism (source: [7])
2.6 Simplified internal architecture of CPLD
2.7 Simplified internal architecture of FPGA
2.8 CPLD and FPGA logic array blocks
2.9 Simplified block diagram of a macrocell
2.10 Simplified block diagram of an adaptive logic module (ALM)
2.11 PLD project design flow
2.12 Block diagram of a sensor node for low-level image processing for video surveillance purposes (source: [33])
2.13 Architecture of video surveillance and monitoring (VSAM) system (source: [35])
2.14 Architecture of OmniEye surveillance system (source: [55])
2.15 An electronic, e-Grid-enabled device (source: [59])
2.16 The architecture of high performance FPGA-based Web server (source: [62])
3.1 Architectural block diagram of the preliminary version of the video surveillance platform for stream-wise image processing algorithms
3.2 Architectural block diagram of the video surveillance service which uses an external block of memory
3.3 Architectural block diagram of the digital video surveillance Web service which uses a microcontroller for request and response processing
3.4 Architectural block diagram of the digital video surveillance Web service in which the main functional blocks are integrated into a single microcontroller system
3.5 Top view of the DE2-70 FPGA evaluation board which was used for early-stage video surveillance service prototypes (source: DE2-70 User Manual [73])
3.6 The video surveillance service hardware platform based on the TREX-S2 and TMB boards
3.7 Bottom view of the TREX-S2 module
3.8 Video surveillance smart camera hardware platform
3.9 Block diagram of the video surveillance smart camera hardware
3.10 Reconfiguration module block diagram
3.11 General diagram of logical connections of the smart camera hardware
3.12 Two distinct approaches to network communication implementation in embedded systems
3.13 Multiplexing and demultiplexing network data at socket level. Some sockets are reserved for communication purposes, and one is reserved for reconfiguration purposes
3.14 Multiplexing and demultiplexing network data at the physical layer. Note that both the FPGA and the MCU have their own complete TCP/IP stack implemented
4.1 Part of the general hardware architecture modified for Motion Detector service purposes
4.2 Part of the general hardware architecture utilized for the object classifier service with the coarse-grained hardware accelerator
4.3 Part of the general hardware architecture utilized for the object classifier service with the fine-grained hardware accelerators
4.4 The idea of the block-wise difference image computation with 5×5 px block size
4.5 The architecture of the parallel object classifier implemented in FPGA
4.6 Simulation results for the hw_serial_mul_16 multiplication function which operates on fixed-point numbers in 0s6.10 format. In the example the module computes the multiplication of 0x0C00 (3.0 in decimal notation) by 0x0A00 (2.5) and the result is 0x1E00 (7.5). Input values are supplied to the x and y signals and the r_e_t_u_r_n signal holds the result
4.7 Simulation results for the hw_serial_div_16 division function which operates on 16-bit wide fixed-point numbers in 0s6.10 format. In the presented example the module computes the division of 0x1E00 (7.5 in decimal) by 0x0A00 (2.5) and the produced quotient is 0x0C00 (3.0). Input values are supplied to the x and y signals and the r_e_t_u_r_n signal holds the result
4.8 Simulation results for the hw_serial_div_32 division function which operates on 16-bit wide fixed-point numbers in 0s2.14 format. In the presented example the module computes the division of 0x3000 (0.75 in decimal) by 0x6000 (1.5) and the produced quotient is 0x2000 (0.5). Input values are supplied to the x and y signals and the r_e_t_u_r_n signal holds the result
4.9 Simulation results for the atan fine-grained accelerator (custom instruction). The first function is computed from values X=-170 (0xFF56), Y=170 (0x00AA) supplied to the dataa input and the result is 132 (0x84) degrees (the exact value should be 135). The second function is computed from values X=187 (0x00BB), Y=187 (0x00BB) supplied to datab and the result is 47 (0x2F) degrees (the exact value should be 45). Those values are coded on the result output
4.10 Simulation results for the sqrt function which computes the square root. In this example the function computes the square root of 144 (decimal) supplied on the dataa input and returns the value 12 on the result output
4.11 Block diagram of the network communication subsystem
4.12 Block diagram of the simplified network communication subsystem
4.13 Measured response times for two different versions of the network communication subsystem
4.14 Video surveillance smart camera server processing scheme (dashed boxes denote procedures executed to a large degree in hardware while other procedures are implemented in Nios-II software; colored elements are related to image processing tasks of the smart camera)
4.15 Block diagram of the image acquisition subsystem
4.16 GUI of the reconfiguration management application
4.17 Graphical representation of remote reconfiguration procedures spread over a period of time (intervals shown on the time axis are times measured on working hardware)
4.18 Data flow during each step of the reconfiguration procedure
4.19 Block diagram of the actuator control subsystem
4.20 Block diagram of the preview image generator subsystem
4.21 Schematic diagram of the simple video DAC used in the preview image generator
5.1 Realization of the motion detection service
5.2 Response times for each method measured for 10 subsequent requests (acronyms for method names are defined in Table 5.1)
5.3 Response times for the GetMotionHistory method with ROI set to 1/4 VGA (QVGA) size and full VGA size
5.4 A set of three images of one face for creating the prototype database for the object classifier service. This set (among many more) was used for the object classifier's testing purposes
5.5 Comparison of different hardware acceleration module sets
5.6 Response times for 10 consecutive requests of each method available in the object classifier service with the fine-grained accelerators (option 5)
5.7 Response times for 10 consecutive requests of each method available in the object classifier service with the coarse-grained accelerator (option 6)
5.8 Schematic diagram of the advanced intrusion detection system
5.9 Schematic diagram of a system which recognizes types of vehicles and controls a gate
5.10 Schematic diagram of a road security system
5.11 Schematic diagram of an intelligent security system which provides access to a restricted area for privileged staff members only

List of Tables

2.1 Summarized conclusions on the current state of the art
3.1 Required memory types for an FPGA-based smart camera implementation
4.1 FPGA resource optimization results at the cost of speed for custom hardware multiplication and division modules
4.2 Performance summary of the fine-grained hardware accelerators for atan and sqrt functions
5.1 Average response times for each method of the motion detector service

Abstract

In this thesis, a novel solution for distributed digital video surveillance systems is proposed. The solution assumes the utilization of programmable logic devices for implementing the processing logic of each video surveillance camera, and that all cameras are Web services compliant with the Service Oriented Architecture (SOA) paradigm. The programmable logic devices are used to increase the computational power of the video surveillance services, while SOA compliance provides ease of integration with other information systems.

Classic video surveillance systems rely on centralized data collection and processing as well as human supervision to detect selected types of events. In contrast, the presented approach assumes processing data from a video surveillance camera directly inside the camera and transferring only metadata which contain extracted information about detected events. Thus, transmission of video and image data can be minimized or even avoided, and data processing is distributed across all surveillance cameras in the system. This greatly reduces the amount of data transmitted over the medium and improves system scalability in terms of the number of video surveillance cameras.

A methodology for constructing SOA-compliant video surveillance Web services is presented and explained using an exemplary hardware-software platform. The platform was developed especially for the purpose of realizing the video surveillance Web services. The hardware part of the platform is based on a high-performance FPGA with additional peripheral hardware.

Chapter 1
Introduction

In the modern world, the use of video surveillance systems is rapidly growing. These systems are now an essential element of media infrastructure in publicly accessible buildings, streets, and private properties. The main use of video surveillance systems is to increase physical security in a selected area by facilitating visual detection of events such as unauthorized trespassing, acts of vandalism, or other criminal activity. Video surveillance systems can also provide storage capability for recorded data, which may later be used as evidence. Modern use of video surveillance cannot be narrowed down to the detection of criminal incidents. Such systems can also be used as general-purpose systems for monitoring environmental conditions (e.g. vehicle or pedestrian traffic intensity measurements, automated car park billing calculation, etc.).

1.1 Motivation

Today, a vast majority of video surveillance systems still rely on centralized data processing combined with human supervision. As discussed further in this thesis, systems composed in the centralized manner are difficult to deploy, scale, and upgrade. First, video surveillance cameras typically lack any degree of autonomy or "intelligence" and just acquire and transmit video data. Most of the transmitted video surveillance data are virtually useless except for very few short periods of time in which important events happened. Let us look at a hypothetical scenario: during a one-month period three unauthorized trespassing events have been detected. This results in just a few hundred video frames containing valuable information compared to millions of frames containing almost the same image of an empty scene.

Classic multi-camera video surveillance systems are typically deployed using an analog medium which transmits raw video data to a central display and storage unit. More modern solutions use digital data transmission. Even though advanced video compression algorithms are used, such systems transmit a lot of redundant information; hence, due to large bandwidth requirements, they must use a separate network or cable infrastructure. This is an expensive solution in terms of further upgrades and modifications. It would be, however, very convenient to deploy a video surveillance system using an existing network infrastructure, even without a need to upgrade that infrastructure, thus allowing developers to create flexible and scalable video surveillance systems.

Another issue is the selection of data transmission standards and protocols. Currently, classic video surveillance systems utilize media and protocols optimized for transmitting large amounts of video data. However, reducing the amount of data allows using a wide range of available simple, open, and well-documented protocols. This, in turn, would greatly facilitate the composition of the whole video surveillance system, its flexibility, and the possibility of integration with other information systems.

1.2 The Purpose of This Dissertation

In this thesis the author attempts to:

• introduce a novel approach to video surveillance systems in which compliance with the SOA paradigm is combined with hardware-accelerated operation to create an autonomous system with distributed processing intelligence,

• show an exemplary practical implementation of an "intelligent camera" compliant with this approach.

The presented approach involves adopting the SOA paradigm for developing video surveillance systems with distributed processing logic. In comparison to the aforementioned centralized analog and digital video surveillance systems, this approach has multiple advantages, mainly better flexibility owing to ease of deployment and upgrade, and convenient means of integration with enterprise-class systems thanks to compliance with the SOA paradigm.

The description of the exemplary SOA-compliant video surveillance camera aims at providing many vital details about practical implementation-related issues which had to be overcome during the development process. The evolution of the presented solution is also described in order to provide the reader with further valuable information about possible design traps which arose and, finally, have been solved.

1.3 Topic and Scope of the Discussion

There are multiple aspects of video surveillance systems' implementation. Each type and implementation of video surveillance system has some interesting aspects both from an engineer's and a scientist's point of view. For example, analog systems can be analyzed in terms of data transmission and multiplexing quality (signal distortion, attenuation) and means of improving them. Systems which rely on digital data transmission are worth discussing for their advanced video compression algorithms, various aspects of robust network communication, and advanced means of data storage.

This thesis deals with digital video surveillance systems with distributed processing logic. From the engineering and scientific point of view the most interesting aspects include:

• the general architecture of the video surveillance system, i.e. how surveillance cameras collaborate and how they are integrated with other systems,

• ways of moving the processing logic from a central entity to multiple nodes,

• architectural and implementation aspects of a single node of the video surveillance system.

Image and video processing algorithms are an indispensable yet very broad aspect of video surveillance systems. This thesis covers only selected aspects of the implementation of image/video processing algorithms, in the scope required for a basic understanding of their operation and practical performance analysis. It does not cover the algorithms' theoretical background and performance analysis.

1.4 The Main Hypothesis

This thesis addresses selected aspects of the implementation and performance of distributed video surveillance systems. The results of the author's research concern the utilization of programmable logic devices (PLD) in the implementation of an intelligent video surveillance camera intended to support flexible and easy-to-deploy video surveillance systems compliant with the SOA paradigm. This compliance makes them easy to integrate with enterprise-class information systems. Selected economic and Quality of Service (QoS) aspects of the implementation are expected to be at acceptable levels for video surveillance purposes. This leads to the following hypothesis:

Hardware solutions based on programmable logic devices allow an increase in the ergonomics of video surveillance services and in the ease of integration of such services with enterprise-class systems while preserving selected economic and QoS aspects.

1.5 Contributions

The main contributions are summarized below.

• A hardware-software platform for developing video surveillance Web service cameras was practically implemented. The platform consists of physical hardware and "virtual hardware" implemented as a configuration of a programmable logic device. The platform has been used to implement two exemplary Web services which are described below. Both services are to a large degree compliant with the SOA paradigm.

• A fully functional autonomous motion detection service was developed on the hardware platform. The service is able to detect moving objects in a selected region of interest and queue detected events in its internal memory. Moreover, its storage and detection parameters are configurable. A client can request the currently stored motion detection history, as well as set and read the service's operation parameters. The service is hardware-accelerated, i.e. it uses a specialized module implemented in programmable logic to enhance the robustness of its operation.

• A versatile object classifier service was developed. It is capable of recognizing an object from a set of other similar objects. It is well suited for recognizing faces, brands of cars, types of vehicles, etc. The service can be invoked by a client and returns a previously defined identification name of a recognized object. It was also developed as an example of a hardware-accelerated Web service with a high ratio of acceleration.

• Remote reconfiguration mechanisms were developed. The remote reconfiguration allows a developer to change the video surveillance service functionality, without physical access to its hardware, using the service's network interface.

1.6 Contents of the Following Sections

The rest of this dissertation is organized as follows. Chapter 2 contains general information about the most important aspects of this dissertation, which include first and foremost: video surveillance solutions, service oriented architecture, and programmable logic devices. It starts with a discussion of different approaches to the overall architecture of video surveillance systems (Section 2.1) and compares their individual features. Next, the eight principles of SOA are summarized to provide a further reference (Section 2.2), and then an introductory theoretical background on the topic of PLD is provided (Section 2.3). The current state of the art is reviewed in Section 2.4. It is divided into a few subcategories, each covering a different aspect of the currently available solutions. That section ends with a brief description of the proposed solution confronted with the preceding literature review.

Chapter 3 contains information about important assumptions and decisions which had to be made during the development process of the presented video surveillance hardware-software platform. As the platform is a result of a few years of development, many hardware and software solutions have been practically tested. Some of them are used in the final version and some have been rejected during the development process. Details are described in Section 3.2 of that chapter. The chapter ends with a summary of the final version of the video surveillance service.

Chapter 4 contains detailed information about the vital functional blocks of the developed hardware platform. Most of them are custom-made modules developed during the author's research. The author provides simulation as well as practical test results for most of the presented building blocks of the video surveillance platform.

Performance evaluation of the whole exemplary video surveillance Web service is provided in Chapter 5. The developed motion detection and object classifier services are described in terms of achieved functionality, compliance with the SOA paradigm, and overall service performance. A summary of this dissertation and final conclusions are given in Chapter 6.

Chapter 2
Digital Video Surveillance Challenges and Technologies

This chapter contains information about classical, modern, and future approaches to video surveillance systems. Selected vital aspects of video surveillance systems' architectures are discussed and compared, including their flexibility, scalability, as well as network and bandwidth requirements. The author places emphasis on embedding digital video surveillance (DVS) into Service Oriented Architecture (SOA) compliant systems. This requires introducing changes to the general architecture of a video surveillance system and induces developers to use modern or even cutting-edge technologies for implementing such systems. Available solutions and technologies for implementing whole surveillance systems and their components are reviewed and discussed.

2.1 Diverse Approaches to Video Surveillance Systems

A variety of applications and degrees of complexity has led to a wide range of surveillance system architectures. Depending on the assumed functionality, video surveillance systems have to provide different degrees of internal autonomous logic: from the classical closed-circuit television (CCTV) systems which require constant human supervision and just transmit, multiplex, and record visual information, to advanced distributed systems with a large degree of autonomy which automatically detect the occurrence of programmed events. This subsection contains a discussion of different approaches to a hypothetical video surveillance system which must:

• Acquire information about defined anomalies such as motion in a restricted zone;

• Work as a Web service offering results of custom image processing from each camera;

• Autonomously drive a set of actuators as a reaction to detected programmed events (e.g. "open a door when an authorized staff member's face is recognized");

• Be easily upgradable in terms of image quality and number of cameras at relatively low cost.

Various digital video surveillance architectures can fulfill these goals. However, depending on the architecture, implementing these features can be a more or less complex and expensive task. Further in this section, three different approaches to video surveillance systems are discussed and compared: classic analog systems (2.1.1), digital systems with centralized data acquisition and processing (2.1.2), and digital systems with distributed processing capabilities (2.1.3). They feature various levels of autonomy, scalability, and flexibility. They also require different technologies for their implementation and deployment.

2.1.1 Centralized Analog Systems

Basic analog video surveillance systems are commonly used to provide physical security and decrease the number of thefts, assaults, acts of vandalism, or unauthorized trespassing events. Such systems rely mostly on analog video acquisition and transmission. Digital components include automated or manual signal multiplexing and very limited or no automated event detection. This implies a requirement for human supervision of the acquired or recorded image in order to detect selected events.

Figure 2.1: IVC-8371P video capture card hardware view and block diagram (video path only).

More advanced analog video surveillance systems use dedicated video capture cards which perform multi-channel audio and video analog-to-digital conversion, MPEG-1, MPEG-2, or MPEG-4 data compression, and simple image processing tasks (e.g. scaling, motion detection) implemented in hardware. The IVC-8371P by IEI [3], shown in Figure 2.1, is an example of this type of card.

An exemplary block diagram of a video surveillance system using analog cameras is shown in Figure 2.2. The analog signal from each camera is transmitted over coaxial cable to a central computer (usually a PC) equipped with specialized video capture cards and a Web interface. Typical video capture cards have up to 4 analog video inputs each and a PCI or PCI-Express interface on the PC side. Initial processing and video compression are done directly on those cards. Further processing (e.g. face or license plate number recognition) can be done in software on the central computer.

Figure 2.2: Block diagram of an exemplary analog video surveillance system.

Alternatively, instead of the central computer, a system designer can use an embedded solution – a specialized multichannel video recorder which would have similar functionality to the computer; in this case, however, the possibilities of implementing custom image processing algorithms would be very limited due to the specialized hardware and firmware of such solutions.

Analog video surveillance systems are still very popular due to the low overall costs of such solutions. However, they offer relatively low image quality, typically up to 576 lines of screen resolution, without any upgrade possibilities. Analog signals, due to their nature, are sensitive to noise, reflections at weak connection points, etc., which cause further signal degradation. Another disadvantage of such an approach is the fact that the wiring infrastructure utilized for analog video signals is not reusable for any other purpose except analog video signal transmission in one direction. Other data (e.g. camera position control) must be transferred using additional media. Finally, the scalability of analog video surveillance systems is very limited from both the hardware and software point of view. Each camera requires one input port of a video multiplexer or signal switch and considerable computational power for data acquisition and compression. In a typical scenario, all capture cards are connected to a single PC which also has limited resources. Moreover, those elements of the system infrastructure are more expensive (price per port) than equivalent solutions based, for example, on an Ethernet network.

2.1.2 Centralized Digital Systems

Recent progress in high-resolution image sensors and fast analog-to-digital converters (ADC) for video applications has allowed creating many solutions for high-quality image acquisition. Further advances in integrated circuit (IC) technology have resulted in fast and low-power digital circuits (mainly microcontrollers and FPGAs) capable of efficiently executing image processing algorithms. Those improvements facilitated mass production of relatively low-cost cameras with digital interfaces, including USB (Universal Serial Bus), FireWire, Ethernet, and Wi-Fi. Cameras with an Ethernet or Wi-Fi network interface (commonly known as IP cameras) can work as Web servers. They also provide image compression (typically H.264 [4]) and are capable of executing image processing tasks, e.g. motion detection which triggers video recording to an internal mass memory.

Figure 2.3: Video surveillance system using IP cameras.

Figure 2.3 shows an example system architecture which uses IP cameras as video sources. Each camera works as a stand-alone configurable Web server which transmits video data to a client. We can assume that handling one video stream requires tens to hundreds of kilobytes per second of free bandwidth in the network infrastructure, depending on the expected video quality and the utilized compression algorithms. The data is gathered in the client computer which can perform additional image processing tasks, actuator control, and also work as a Web Service server. The system architecture is flexible from the hardware point of view, because a camera can be connected to an existing network infrastructure. However, adding another IP camera to the system requires the ability to handle the bandwidth requirements of another connection, a compressed video stream, and processing of the incoming data. In this solution, each camera can send a continuous video stream to the central management computer which must process the incoming data. From the system scalability point of view, the computer is the most vulnerable point: not only must it handle many incoming connections, but it also has to process the video stream from each camera.

2.1.3 Distributed Digital Systems

Another approach to video surveillance systems is to perform image processing tasks directly in cameras. In such a scenario each camera (called a smart camera) must be capable of image acquisition and processing and must have Web Service functionality (Figure 2.4). This approach increases the cameras' hardware complexity and cost, because it requires the use of advanced microcontrollers and FPGA chips. However, a system architecture which uses smart cameras based on modern microcontrollers or FPGA chips can have increased scalability and flexibility.

Figure 2.4: Block diagram of a distributed digital video surveillance system.

From the hardware point of view, adding another camera implies a sufficient increase in computational power – each camera can execute image processing tasks independently, and image processing results and control data are transmitted over a network. If all smart cameras provide services for only one client machine, it must be able to handle a large number of connections (usually TCP), which in most cases is not difficult to achieve, especially when there is no need to transmit much information. On the contrary, handling multiple incoming video streams would require much more network bandwidth and processing power.

A significant reduction of the bandwidth required for data transmission between the servers (smart cameras) and the client computer is a great advantage of this approach. Exemplary image processing results may consist only of metadata, i.e. the coordinates of a detected moving object, the identification number of a recognized face, a recognized license plate number, etc. Similarly, control data may include region of interest (ROI) border coordinates, detection threshold levels, etc. This information can usually be packed into up to 1 KB of raw data, or less than 10 KB when considering the overhead of high-level protocols, e.g. SOAP. This is very little in comparison to digital video transmission requirements. For example, an H.264 video stream with 360p quality requires ≈90 KB/s of bandwidth.
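To make the size argument above concrete, the sketch below defines a hypothetical event-metadata record of the kind described (moving-object coordinates, recognized-object identifier) and compares the resulting traffic with a continuous compressed video stream. The structure, field names, and assumed event rate are illustrative only; they are not the actual message format used by the platform described later in this thesis.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical metadata record for one detected event (illustrative only). */
typedef struct {
    uint32_t timestamp;      /* seconds since some epoch                 */
    uint16_t camera_id;      /* which smart camera reported the event    */
    uint8_t  event_type;     /* e.g. 0 = motion detected, 1 = object ID  */
    uint8_t  confidence;     /* 0..100                                   */
    uint16_t roi_x, roi_y;   /* bounding box of the detected object      */
    uint16_t roi_w, roi_h;
    char     label[32];      /* e.g. recognized face or plate identifier */
} event_metadata_t;

int main(void) {
    const double video_stream_Bps = 90.0 * 1024.0;  /* ~90 KB/s H.264 @ 360p  */
    const double events_per_sec   = 2.0;            /* assumed reporting rate */
    double metadata_Bps = events_per_sec * (double)sizeof(event_metadata_t);

    printf("metadata record size : %zu bytes\n", sizeof(event_metadata_t));
    printf("metadata traffic     : %.1f B/s per camera\n", metadata_Bps);
    printf("video stream traffic : %.1f B/s per camera\n", video_stream_Bps);
    printf("reduction factor     : ~%.0fx\n", video_stream_Bps / metadata_Bps);
    return 0;
}
```

Even if SOAP or XML framing inflates each record by an order of magnitude, the per-camera traffic stays within the low-kilobyte range quoted above, which is what makes reuse of existing network infrastructure realistic.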

2.2 Service Oriented Architecture (SOA)

All of the previously mentioned video surveillance systems have their own specific architectural features and use various methods of communication between the acquisition-processing nodes (cameras) and for integration with other information and control systems. Distributed digital video surveillance would benefit the most from its features only if a proven and well-established architectural and communication paradigm is applied.

Service Oriented Architecture is an abstract architectural paradigm for enterprise-class distributed systems in which solution logic is primarily represented by a service. The model is connected with the Web Service (WS) technology, which is typically used in the development of SOA-compliant systems. The idea behind SOA is to reuse currently developed software by wrapping it with software modules which provide standard and open interfaces. It provides flexibility by means of interoperability of software working in heterogeneous environments and on different hardware platforms. The potential interoperability between heterogeneous systems is even more tempting when considering embedded and FPGA-based platforms, which until recently were in most cases formed into hermetic architectures and used proprietary protocols. Thomas Erl in [7] defines and explains the eight design principles of SOA. This section contains a brief summary of them.

2.2.1 Service Contracts

The Service Contracts principle says that a service should have a set of documents (called a contract) which are utilized to express the purpose and features of the service. The required document set consists of a so-called technical contract and additional documents. The technical contract includes documents designed to be processed by a machine, but which are also human-readable:

• WSDL (Web Service Description Language) – an XML-based document which describes the functionality and interface of a Web Service; it is also the core of a technical service contract,

• XML Schema, which defines the data model for messages exchanged via Web services,

• WS-Policy, responsible for asserting and attaching policy assertions to different parts of the WSDL.

Other documents, such as Service Level Agreements (SLA), can also contain technical content, but are mainly intended to be processed by humans and are not necessarily technically binding, although they can be legally binding. An SLA can contain expected service characteristics such as performance and accessibility, which are generally known as Quality of Service (QoS) parameters. It can also provide information about the planned availability schedule, response times, usage statistics, or even a rating based on consumer feedback.

2.2.2 Service Coupling

The Service Coupling principle states that services should be loosely coupled. Two applications are tightly coupled when they need each other to operate properly, which results in a bidirectional dependency. The relationship can be unidirectional as well: an application A could be developed as dependent on application B, while application B is not dependent on A. An optimal situation is when there is loose coupling (dependencies are minimal) between services and between services and clients. A way to achieve loose coupling is to use logic-to-contract coupling, which means that the service logic should depend on the contract. It simplifies further system development, especially service logic upgrades, because in such a case the logic is forced to be independent from technological or implementation issues.

Other approaches, including the opposite approach (contract-to-logic coupling), would result in increased levels of functional and implementation coupling. We may say that service coupling is a measure of dependency between services or between a service and a client.

2.2.3 Service Abstraction

The Service Abstraction principle imposes publishing only the information that is absolutely required by clients (non-essential service information should be abstracted). Service developers have to evaluate the risks associated with publishing service meta-information, as both too much and too little information may restrict the number of potential service clients. Most importantly, hiding service details gives service developers and owners the possibility to change or upgrade the service using a different technology or implementation without changing its contract.

The following types of information can potentially be hidden when applying the abstraction principle: technology, functional, and programmatic logic information, as well as QoS details. Hiding technology information is important from the perspective of future service upgrades. As the underlying technology changes, the service can keep its properties intact just as they are listed in the service contract. Technology information includes programming languages, environments, back-end database access details, etc. Functional abstraction refers to not exposing a certain functionality of the service in its contract. A functionality can be the subject of a different type of contract, or it can be left for developer usage only and not disclosed at all. Another type of information which should not be disclosed in a contract is programmatic logic, which refers to the utilized algorithms, low-level design, ways of exception handling, etc. Disclosing this type of information would lead developers of consumer services to exploit specific features of the service and, as a result, to overly tight coupling to the service. Finally, there is no need to disclose overly detailed information regarding QoS. For example, information about service availability hours is necessary for a client who needs to know what to expect from the service, but it is not advisable to inform the client why the service is unavailable out of hours.

2.2.4 Service Reusability

Service Reusability is one of the fundamental principles of SOA. Simply put, it states that a software (service) capability must be useful for more than just one purpose. Single-purpose, specialized programs are much simpler to design and test because their usage scenarios are predictable. In contrast, more general-purpose applications are designed to provide wider functionality and their test procedures are much more complicated. Such programs also require additional exception handling logic and must take scalability issues into consideration. The reward for those additional costs can be the possibility of a high return on the initial investment of delivering a service and increased business agility by potentially supporting future automation requirements. The goal to achieve by following the reusability principle is to have as many agnostic services as possible in a service inventory. Obviously, reusability should be provided at a reasonable level which provides a balance between initial costs and potential benefits.

An important requirement for a service to be reusable is that its logic must be associated with a sufficiently agnostic context, so that multiple usage scenarios can be taken into account. The service logic has to be generic enough to potentially satisfy the needs of a wide range of consumers. The service contract also has to be generic, extensible, and flexible. Finally, the service must facilitate simultaneous access by multiple client applications or consumer services.

2.2.5 Service Autonomy

According to this principle, services should be autonomous. Autonomy of computer software represents its independence in carrying out its logic, i.e. being insensitive to external influences. External influences can be considered unpredictable from the software's point of view. As a result, the more autonomy the software has, the more reliable it can be. In the SOA service domain, autonomy can be defined as the level of control over the underlying runtime execution environment. Two forms of service autonomy can be distinguished: runtime (execution) autonomy and design-time (governance) autonomy.

Runtime autonomy is the degree of control over processing logic when a service is invoked and executing. Keeping a high level of this form of autonomy contributes to predictable behavior of the service, consistent runtime execution performance, and overall service reliability. Sharing logic resources with other parts of an enterprise can have a negative impact on runtime autonomy, because more factors can affect the service execution, e.g. resource utilization for other purposes. The collective performance of services included in a service composition also reduces its autonomy and influences its performance.

Another form of service autonomy is design-time autonomy, which refers to the level of freedom of the service owner to make changes to the service during its lifetime. Changes may concern increasing the ability to scale the service in response to higher usage demands, enhancing the service hosting environment, etc. The service loose-coupling principle is related to this problem, because it suggests keeping the service independent from the implementation technology.

2.2.6 Service Statelessness

The Service Statelessness principle says that a service should not process or retain any state data. This principle was introduced to help maximize service scalability. It is especially important for agnostic services which can potentially be reused and composed to form larger systems. The service has to be able to interface with multiple client requests, thus realizing the automation of a specific business task, and must also be capable of serving a large number of client interactions. This makes state management an important factor in minimizing resource consumption. Since state management represents the processing of information mainly associated with a current activity, the system resource consumption for this activity should be reduced. The ideal service design assumes stateless operation; however, in practice achieving full statelessness is difficult, so state information is often deferred to a state data repository. This contributes to decreasing resource and memory consumption.
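As a rough illustration of the statelessness principle in the embedded-service context of this thesis, the C sketch below contrasts a handler that retains a caller's settings inside the service with one that receives everything it needs in the request and defers history to a separate repository. All names, types, and the stub logic are hypothetical; they do not reflect the actual interfaces of the platform described in later chapters.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct { uint16_t x, y, w, h; } roi_t;
typedef struct { roi_t roi; uint8_t threshold; } detection_params_t;
typedef struct { uint32_t timestamp; roi_t where; } motion_event_t;

/* Trivial stand-ins for the real detection logic and the state repository. */
static int run_detection(const detection_params_t *p, motion_event_t *e) {
    e->timestamp = 0; e->where = p->roi;   /* pretend something was found */
    return 1;
}
static void event_repository_append(const motion_event_t *e) {
    printf("event stored at t=%u\n", (unsigned)e->timestamp);
}

/* Stateful variant: the service remembers the caller's settings between
 * invocations, so its behavior depends on call history and serving many
 * clients concurrently becomes problematic.                              */
static detection_params_t g_session_params;              /* retained state */
int detect_motion_stateful(motion_event_t *out) {
    return run_detection(&g_session_params, out);
}

/* Stateless variant: each request carries the full parameter set and any
 * detected event is deferred to an external repository, so requests from
 * any number of clients can be handled independently.                    */
int detect_motion_stateless(const detection_params_t *params,
                            motion_event_t *out) {
    int found = run_detection(params, out);
    if (found)
        event_repository_append(out);
    return found;
}

int main(void) {
    detection_params_t p = { { 0, 0, 640, 480 }, 30 };
    motion_event_t ev;
    detect_motion_stateless(&p, &ev);
    return 0;
}
```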

2.2.7 Service Discoverability

The Service Discoverability principle states that services should allow a standardized process of discovery and identification of their features by using a service registry, which is a repository holding the services' metadata. It helps to determine whether a required functionality is already available as a service or must be implemented. The process of service discovery starts with a client searching the service registry to locate a service which is capable of delivering the desired functionality. The service registry contains in its inventory information about the available services and the location of the corresponding service contracts. If a selected service is capable of fulfilling the client's needs, the client can retrieve the contract and start using the service by exchanging messages in compliance with the contract (refer to Figure 2.5).

Figure 2.5: Architectural diagram of Web service discovery mechanism (source: [7]).

2.2.8 Service Composability

The Service Composability principle sums up the ideas of SOA. It says that services can potentially be participants of a composition. The composability principle is utilized for solving complex problems by dividing them into smaller tasks. Each task can then be solved by an available service, hence multiple basic services can be assembled into a specific configuration in order to solve the complex task. In this case the tasks work in a coordinated manner. The same basic services can be recomposed to solve another problem with significantly reduced implementation effort. To do that smoothly, the component services should be compliant with all the other principles; the connection with the reusability principle is especially clearly visible. It can be noticed that all the previously mentioned principles in a way support and affect the last one – service composability. For example, low autonomy of a component service can affect the overall autonomy of a complex service created with that component service. In this case, the complex service would have low composition autonomy, even if its native autonomy could be high.
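The following C sketch illustrates the composability principle on a scenario close to the case studies discussed later in this thesis: a client-side orchestration that combines a motion detection service and an object classifier service into a single intrusion-detection task. The client stubs (invoke_motion_detector, invoke_object_classifier, trigger_alarm) are purely hypothetical placeholders for generated Web-service proxies, not the actual API of the developed platform.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical client-side proxies for two independent Web services.
 * In a real SOA deployment these would be generated from the services'
 * WSDL contracts; here they are stubbed out for illustration.           */
static int invoke_motion_detector(const char *camera_url) {
    (void)camera_url;
    return 1;                      /* pretend motion was detected */
}
static const char *invoke_object_classifier(const char *camera_url) {
    (void)camera_url;
    return "unknown_person";       /* pretend classification result */
}
static void trigger_alarm(const char *reason) {
    printf("ALARM: unauthorized object detected (%s)\n", reason);
}

/* Composition: the complex task (intrusion detection with identification)
 * is solved by coordinating two simpler, reusable services.              */
void intrusion_check(const char *camera_url, const char *authorized_label) {
    if (!invoke_motion_detector(camera_url))
        return;                                     /* nothing happened    */
    const char *label = invoke_object_classifier(camera_url);
    if (strcmp(label, authorized_label) != 0)       /* unrecognized object */
        trigger_alarm(label);
}

int main(void) {
    intrusion_check("http://camera1.example/ws", "staff_member_01");
    return 0;
}
```

Because both component services expose only metadata over standard protocols, the same two services could be recomposed into other configurations, such as the vehicle-type recognition scenario considered in the case studies, without changing their implementations.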

2.3 Programmable Logic Devices

Digital video surveillance systems with distributed computational power require each camera node to offer relatively large processing capabilities compared to centralized systems. This section describes Programmable Logic Devices (PLDs) – a class of integrated circuits that are suitable for the construction of low-power and high-performance computational systems.

The traditional approach to commercially mass-produced electronic devices is to use application-specific integrated circuits (ASIC). They are usually low-priced, custom-made integrated circuits designed solely for one type or model of an electronic device and manufactured typically in bulk quantities which compensate for the initial costs of design and prototyping. Modern ASIC technologies allow large-scale production of very fast integrated circuits with good electrical parameters: low current consumption, high operating frequencies, and good overall performance. However, since the ASIC design output consists of technological masks for silicon dies, they are not customizable in any way that was not anticipated during the internal schematic design stage. In the past few years, ASICs in consumer electronic gadgets have been successfully replaced by high-performance microcontrollers. This trend has been particularly visible since the introduction of modern microcontrollers with ARM Cortex-A cores and their derivatives. ASICs are also in most cases not suitable for industrial applications: such equipment is usually manufactured in smaller volumes, is not as cost sensitive as commercial products, and may potentially require tune-ups or upgrades during its life cycle. Utilization of general-purpose microcontrollers (MCUs) in those applications could be a good solution for less computationally demanding equipment, yet the microcontrollers would not outperform ASICs. One of the best solutions here, which provides design flexibility, good overall performance, and power efficiency, is the use of programmable logic devices. For a more detailed comparison between FPGAs and ASICs refer to [24].

PLDs are a type of digital integrated circuit whose functionality is undefined at the moment of manufacture and must be programmed in order to work in a target application. They contain general-purpose resources (combinational logic modules, latches, memory blocks, interconnections, etc.) which can be configured to work as a full-featured, high-speed digital integrated circuit. The maximum complexity level of the resulting functionality depends on the amount of resources available in a PLD. Obviously, a PLD chip's price usually grows in proportion to its amount of resources. Standard silicon market rules also apply to PLDs: technology advances result in cheaper and more complex PLDs each year.

2.3.1 Internal Architecture of PLD

Formerly used PLD architectures are referred to as Simple Programmable Logic Devices (SPLD). Typical SPLDs are Programmable Array Logic (PAL) and Generic Array Logic (GAL). They are rarely used in modern applications, because they offer a very limited amount of logical building blocks. Currently, there are two commonly used types of PLD architecture:

• Complex Programmable Logic Devices (CPLDs),
• Field Programmable Gate Arrays (FPGAs).

Various implementations of CPLDs and FPGAs differ between types, families, and manufacturers; however, most of them share similar core functionality. As a reference, two popular CPLD families were taken into account: the MAX3000 CPLD family [8] from Altera Corp. and the CoolRunner-II CPLD family [9] from Xilinx Inc. The FPGA architecture discussion is based on the Stratix-II FPGA family [10] from Altera Corp. In general, PLDs consist of interconnections, general-purpose functional blocks, and specialized functional blocks.

Figure 2.6: Simplified internal architecture of CPLD.

An example of a general CPLD architecture is shown in Figure 2.6. Logic Array Blocks (LABs) contain programmable logic which, along with the interconnections, determines the final functionality of the chip. Input-Output Control Blocks are responsible for providing the specific electrical interface for each external pin: voltage levels, timing characteristics, input or output configuration, impedance, etc. The Global Interconnection Fabric is a configurable matrix able to connect any signal source to any destination on the CPLD. The fact that the central interconnection fabric in CPLDs is one-dimensional simplifies the internal structure of the chip and makes a design's timing performance easier to predict. However, it also reduces the maximum number of available connections on the chip, which in turn decreases the overall maximum interconnection bandwidth.

By contrast, the internal architecture of an FPGA can be perceived as two-dimensional, i.e. connections can be routed horizontally or vertically between functional blocks (refer to Figure 2.7). Other improvements of the FPGA include the introduction of specialized functional blocks: scattered configurable memory (MEM) and hardware multipliers or digital signal processing (DSP) elements, which are useful in high-performance designs (DSP blocks are more common in high-end FPGAs, while low-end models are instead equipped with multiplier blocks). Moreover, FPGAs offer flexible clock management thanks to digital PLL (Phase Locked Loop) or DLL (Delay Locked Loop) blocks, which allow multiplying and dividing the frequency of a provided clock signal (PLLs and DLLs are not shown in Figure 2.7). Input-output signal management blocks (IOE in Figure 2.7) in FPGAs are usually more complex than in CPLDs, hence FPGA interface signals can be more precisely tailored to a specific design.

Figure 2.7: Simplified internal architecture of FPGA.

The idea of the LAB is similar in CPLDs and FPGAs (Figure 2.8). Each LAB in a CPLD consists of smaller blocks called macrocells. In the FPGA case, the building blocks of a LAB are called adaptive logic modules (ALMs). Macrocells (Figure 2.9) and ALMs (Figure 2.10) have a similar structure. In both cases there is a configurable combinational logic block and one or more flip-flops (FFs). Each FF functions as a 1-bit memory element which can be utilized for sequential logic implementation, or can be bypassed if the cell is expected to work as combinational logic. A macrocell usually has only one configurable FF, which can work as a D- or T-type flip-flop. In contrast, the exemplary ALM is equipped with two D-type FFs. Multiplexers (shown as trapezoidal blocks in Figure 2.9 and Figure 2.10) allow additional configuration of a cell. For macrocells the configuration includes selection of the clock and enable signal source for the FF, or bypassing the FF. Multiplexers in the ALM also allow chaining FFs or adders in order to obtain wider registers or logic functions. By contrast, in macrocells the communication paths must be routed using local or global multipurpose interconnections, and no specialized channels for chaining purposes are provided.

Figure 2.8: CPLD and FPGA logic array blocks.

Figure 2.9: Simplified block diagram of a macrocell.

Figure 2.10: Simplified block diagram of an adaptive logic module (ALM).
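To relate these structures to HDL code, the following minimal Verilog sketch (an illustrative simplification written for this description, not a model of any particular vendor's cell) shows the essence of a single logic cell: a small combinational function followed by a D-type flip-flop which can be bypassed, so that the same cell may serve either sequential or combinational logic. The module name and signals are hypothetical.

    // Simplified, hypothetical model of a single PLD logic cell:
    // a 4-input combinational function followed by a D-type
    // flip-flop and a bypass multiplexer.
    module logic_cell (
        input  wire       clk,      // cell clock
        input  wire       ena,      // clock enable for the flip-flop
        input  wire [3:0] data_in,  // inputs routed from the interconnect
        input  wire       use_ff,   // 0: combinational output, 1: registered output
        output wire       data_out  // output driven back onto the interconnect
    );
        // Example combinational function; in a real cell this is a
        // configurable block programmed at configuration time.
        wire comb = (data_in[0] & data_in[1]) | (data_in[2] ^ data_in[3]);

        // D-type flip-flop with clock enable (a 1-bit memory element).
        reg ff = 1'b0;
        always @(posedge clk)
            if (ena)
                ff <= comb;

        // Bypass multiplexer: select the registered or the combinational path.
        assign data_out = use_ff ? ff : comb;
    endmodule

In an actual device both the combinational function and the bypass selection are fixed by the configuration data rather than driven by run-time signals; they are modeled as signals here only to make the multiplexer visible.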

2.3.2 Logic Implementation in PLD

PLDs must be configured before operation. During the configuration process all required internal connections are established, functional blocks are programmed for their purpose in the design, and internal memories are filled with data. The configuration process is based on a specialized configuration file which stores a description of all connections and settings inside the PLD. Due to the large complexity of modern PLDs, such a file is usually generated automatically by dedicated electronic design automation (EDA) software from input files provided by developers. Compiling the input files into a programming file is a multi-step process called a design flow, performed mostly automatically by the EDA software. The following description is based on the Quartus II documentation ([12] and [13]) by Altera Corporation. The design and compilation flow is shown in Figure 2.11.

Figure 2.11: PLD project design flow.

The Design Entry consists of input files which can contain a schematic design or a textual description of the desired PLD behavior written in a hardware description language (HDL), information about expected timing constraints, input-output (IO) assignments to physical pins of the PLD, etc. The hardware description is usually coded in the HDLs which are currently in common use: Verilog and VHDL (Very high speed integrated circuit Hardware Description Language). Other HDLs are far less popular or have lost their popularity in recent years (e.g. CUPL, which was commonly used for SPLDs).
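As a minimal illustration of a Design Entry file, the generic Verilog module below (written for this description, not taken from the referenced design) describes an 8-bit counter with a clock enable and a synchronous reset; the compilation steps described next map such code onto the logic cells and interconnections of the target PLD.

    // Generic design-entry example: an 8-bit counter with
    // clock enable and synchronous reset.
    module counter8 (
        input  wire       clk,   // system clock
        input  wire       rst,   // synchronous, active-high reset
        input  wire       ena,   // count enable
        output reg  [7:0] count  // current counter value
    );
        always @(posedge clk) begin
            if (rst)
                count <= 8'd0;
            else if (ena)
                count <= count + 8'd1;
        end
    endmodule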

By using the Analysis and Synthesis tool, the design entry files are first analyzed for errors, which includes checking the logical completeness and consistency of the project as well as its syntax correctness. In the process of Synthesis the compiler performs technology mapping on the project files, i.e. it infers machine states and connections of basic logic blocks, such as latches and FFs, from the input HDL files. The optimizations made by Analysis and Synthesis aim at minimizing the gate count, removing redundant logic, and preliminarily tailoring the input design to a specific PLD architecture. At this level the whole project logic is represented as a netlist of available components, but the components are not yet assigned to specific physical resources of the PLD.

After the synthesis is finished, the compiler invokes a fitter process, also called Place and Route. The fitter uses the logic netlist and timing requirements from the previous steps of the design flow and matches them with the available physical resources of the target PLD chip. Another objective of the fitter is to route signals between logic blocks by selecting optimal interconnection paths so that the timing requirements can be met.

The next two steps in the compilation process are verification procedures: Timing Analysis and Simulation. Timing analysis concerns the internal structure of the design, including signal timing between blocks (setup and hold times, pulse widths, etc.) and clock characteristics (maximum frequency, rise and fall times, etc.). Simulation usually refers to the process of verifying the functionality of a logic block or a whole design from the external point of view. Typical modes include functional and timing simulation. Functional simulation is based on the gate-level design, and results are computed by simply calculating output states in response to given input data (test vectors). Timing simulation includes both the logical functionality of a design and its timing characteristics, which are affected by the utilized PLD chip technology. Other methods of verification include live analysis of internal signals using external hardware such as oscilloscopes and logic analyzers. Modern design environments provide special software tools which allow a developer to connect external signals (using probes generated inside the PLD) to specific points in the design while keeping the design's internal characteristics intact.

The successfully compiled and verified design can then be uploaded to a target hardware platform in the process of Configuration. The configuration is supervised by the software environment but takes place in the target hardware. During the configuration, an external device (a hardware programming tool) programs the PLD's internal connections, sets multiplexers, enables or disables specific blocks, etc., using a control-diagnostic interface of the PLD. In CPLDs the configuration is stored in the chip's internal non-volatile memory, so once they are programmed, they keep their functionality until they are erased and another configuration is uploaded. In modern FPGA chips the configuration transistors are driven by SRAM (Static Random Access Memory) cells. As a result, they lose their configuration after the power supply is disconnected. A typical solution to this problem is to use an external Flash memory chip which stores the configuration and configures the FPGA each time power is applied. A less typical but much more flexible way of configuring a PLD is to use an external module (usually based on an MCU or another PLD) which automatically manages and performs reconfiguration.
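To make the functional simulation step more concrete, the sketch below shows a simple, hypothetical Verilog testbench for the counter8 module from the earlier example: it generates a clock, applies a short sequence of test vectors, and reports the observed outputs. In practice such a testbench is executed by the simulator attached to the EDA environment, and timing simulation additionally takes into account the delays of the fitted design.

    // Hypothetical functional-simulation testbench for counter8.
    // It applies a few test vectors and reports the counter value.
    `timescale 1ns / 1ps

    module counter8_tb;
        reg        clk = 1'b0;
        reg        rst = 1'b1;
        reg        ena = 1'b0;
        wire [7:0] count;

        // Device under test.
        counter8 dut (.clk(clk), .rst(rst), .ena(ena), .count(count));

        // 50 MHz clock (20 ns period).
        always #10 clk = ~clk;

        initial begin
            // Hold reset for two clock cycles, then count for five cycles.
            repeat (2) @(posedge clk);
            rst = 1'b0;
            ena = 1'b1;
            repeat (5) begin
                @(posedge clk);
                #1; // let non-blocking updates settle before sampling
                $display("[%0t] count = %0d", $time, count);
            end
            ena = 1'b0;
            @(posedge clk);
            #1;
            $display("[%0t] final count = %0d (enable deasserted)", $time, count);
            $finish;
        end
    endmodule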

2.3.3 High-level Synthesis Tools

As stated in 2.3.2, configuration development for a PLD is usually a complex task which requires at least basic knowledge of the chip's internal structure and of the nuances of logic implementation. HDLs are also an uncommon type of programming language among software specialists. To address some of these issues, specialized software tools for high-level synthesis exist. These tools allow the developer to work at a much more abstract level. Nowadays it is becoming more and more common to use dedicated software tools which automatically generate HDL code. The code generated by high-level synthesis tools, together with human-written code, makes up the complete design. Examples of automatic code generating tools include:

• System generators such as SOPC Builder [20] (system-on-a-programmable-chip) or QSys [21] by Altera. They are able to generate HDL code for one or more microprocessor cores, interconnection buses, memories, and parametrized peripheral modules, which together form a microcontroller unit implemented entirely inside a PLD.

• High-level language compilers which generate HDL output code. Typically they require input code partially compliant with the C language, but with limitations (for example, no recursion is allowed) and extensions (parallel execution and pipelining control). Popular languages designed for C-to-HDL compilation are ImpulseC [11] and Handel-C [22].

The typical use of such tools is to manage the creation and code generation of a processing accelerator module or an on-chip MCU in order to reduce development time. Their usefulness, however, is widely debated among PLD developers and depends strongly on the tool's vendor and purpose as well as the developer's task. For example, the ImpulseC compiler from the Impulse CoDeveloper tool set, mentioned above, allows the creation of very sophisticated accelerator modules but requires much development effort, especially where a complex algorithm implementation is needed. In contrast, the SOPC Builder tool is much easier to use but does not intrinsically provide as good a speed-up as CoDeveloper. See section 3.2 for details and further discussion.

2.3.4 Future Trends

FPGAs are currently gaining a firm position in the field of high-speed data processing alongside modern multi-core CPUs and GPUs. S. Asano et al. in [39] prove that FPGA performance can be competitive with or, in some cases, even better than modern CPU and GPU systems. Modern high-density FPGAs allow the implementation of multi-core, high-performance microprocessors and microcontrollers as well as complex blocks of specialized logic. Such logic is currently utilized as hardware acceleration modules in supercomputers. This trend leads to the highest-performance solutions and will probably continue. However, the increasing amount of available FPGA resources also allows the construction of stand-alone FPGA-based embedded systems which require neither a supervising computer nor a microcontroller. This is because the same future trends apply to PLDs as to other areas of the silicon industry, i.e. increasing the amount of available logic and input-output terminals on a single IC while reducing its total price.

In the digital video surveillance domain, FPGAs appear to be a highly promising technology. Video surveillance systems can potentially benefit from FPGA features such as the possibility to efficiently implement image processing algorithms and very fast communication interfaces. Moreover, the internal architecture of systems implemented in an FPGA can be flexibly changed in response to temporary requirements, which is not possible when using high-performance microcontrollers and microprocessors. Before FPGAs gain more popularity in mainstream video surveillance applications, developers of design tools need to overcome an important disadvantage of the FPGA, namely the requirement to code its functionality in rather obscure HDLs. As stated in 2.3.3, some steps towards relatively easy high-level implementation of PLD configurations have already been taken. Specialists with expertise in image processing can successfully implement algorithms without deep HDL knowledge.

2.4 State of the Art for Digital Video Surveillance

This section contains information about modern research in the field of digital video surveillance and image processing applications. The solutions described in this section are divided into four categories related to distributed digital video surveillance, with an emphasis on FPGA-based systems. First, in 2.4.1, a few interesting FPGA-based implementations of image processing algorithms are reviewed, as they typically form the core of a single video surveillance node. Then, in 2.4.2, selected hardware platforms and components for implementing digital video surveillance systems are presented. Such components can be used as a base for more elaborate entities. Complete video surveillance nodes and systems are presented in 2.4.3 – diverse approaches to processing nodes and system architectures are reviewed in that subsection. Finally, in 2.4.4, embedded and FPGA-based Web server and Web service implementations are described and reviewed because, according to the previous considerations in 2.1.3, each node of a distributed video surveillance system may work as a stand-alone Web server.

2.4.1 FPGA-based Image Processing Subsystems

The solutions described in this subsection range from general ones to more specialized ones tailored for DVS. The use of the general-purpose solutions is not limited to DVS only. All solutions described here can be called subsystems, because they share a common feature – the inability to operate autonomously. In all cases, the FPGA is used as an accelerator or coprocessor for the most computationally expensive tasks, such as video encoding/decoding or digital filtering.

D. Bumann and J. Tinembart [40] built a library of basic morphological operations for implementation in FPGAs. The library is optimized for systems with strong real-time constraints and is described as FPGA-vendor independent. However, the library is just a part of a digital system, so it cannot work as a stand-alone device, i.e. it still requires a PC as a host. A similar example [38] by I. S. Uzun et al. concerns a Fast Fourier Transform (FFT) implementation using an FPGA, which is a very common task in the field of signal processing. In that paper the authors propose not only ready-made solutions but also a framework and user interface which simplify high-level synthesis of FFT modules for custom applications. An example of an image processing algorithm implementation in an FPGA is presented by R. Djemal et al. in [41]. In that paper the authors describe a hardware implementation of an edge-preserving, real-time video smoothing filter. It is assumed that the filter is a computation core and functions as a PCI card controlled by software running on a PC. The described results include both a library of basic modules implemented in VHDL and additional components for combining their functionality. The algorithms have also been integrated into more complex systems. Another solution has been described by G. J. Gent et al. [42]. It is a customized computing platform which accelerates a deformable template image segmentation algorithm. The acceleration is achieved by offloading the most computationally intensive tasks to an FPGA-based coprocessor.

In [45] H. Neoh and A. Hazanchuk present an adaptive edge detection solution which is also based on an FPGA accelerator. In this system the authors use a high-level synthesis tool (Altera DSP Builder [26]) to implement a real-time Canny edge detector core ([25]). It is designed solely as a hardware accelerator dependent on a supervising microprocessor. The microprocessor can either be an external unit or be implemented along with the accelerator inside the FPGA chip, for example as a NIOS-II [27] microcontroller system. A similar approach is presented by G. Cai et al. in [43]. In that paper the authors describe a complete NIOS-II microcontroller system for motion and edge detection implemented in an FPGA on the Altera DE-1 evaluation board. The system uses a hardware accelerator, which the authors call a hard task. The accelerator is implemented using the C-to-Hardware compiler built into the NIOS-II Embedded Design Suite [14]. It is a high-level synthesis tool which generates HDL code from a function written in the C language and primarily executed by the NIOS-II microcontroller. After the code is generated, the function execution can be redirected to the generated hardware accelerator. This allows the system to gain an acceleration of up to two orders of magnitude.

Face detection solutions are an important aspect of digital video surveillance, and many scientific studies present different approaches to the topic. S. Jin et al. in [48] and [47] present the design and implementation of a pipelined datapath for real-time face detection using an FPGA with cascades of boosted classifiers. The proposed platform uses a Virtex-5 FPGA from Xilinx and is able to process 307 frames/s at standard VGA (640×480) resolution when clocked at 125.59 MHz, irrespective of the number of faces detected. The authors substituted a massively parallel implementation with a tree-structured cascade processing model. It does not significantly influence the processing throughput, but reduces the amount of required FPGA resources.

Another class of similar solutions are subsystems which directly support the implementation of DVS. They are usually less algorithmically complex, but can offer practical value for DVS developers. In [30] L. Longfei and Y. Songyu describe a simple yet effective solution for signal processing and multiplexing optimized for centralized analog video surveillance, using programmable logic instead of analog multiplexers. In that solution, multiplexing takes place in the digital signal domain, which is beneficial for signal quality and ease of further processing. Since this solution concerns a centralized analog surveillance system, it has some of the negative properties described in 2.1.1: it works within an analog video infrastructure and the processing is centralized. In [32] P. NimalKumar et al. present an FPGA-based coprocessor which improves DSP performance and is optimized for video surveillance. A similar approach, which uses a Xilinx Virtex-II Pro FPGA for the implementation of complex video processing algorithms, is described by C. Desmouliers et al. in [31]. In [29] M. Brogioli et al. describe an H.264 compression algorithm implementation partitioned between software and hardware (DSP and FPGA) elements. In the typical scenarios presented here, a hardware accelerator module is controlled by a supervising microprocessor or microcontroller.
In some cases the authors provide a case study or information about physical development and the usage of their filters in larger image processing systems, but typically they concentrate solely on the implementation of a filter or an accelerator. An important common feature of all these solutions is that they are subsystems which cannot operate autonomously and require a supervising host.
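To give an impression of the kind of datapath such accelerators are built from, the following Verilog sketch (a hypothetical, simplified example written for this overview, not taken from any of the cited works) computes a 3×3 grayscale erosion – the minimum over a 3×3 neighbourhood – on a pixel stream, using two line buffers and a small window of registers. Practical implementations additionally handle image borders, parameterize the interfaces, and include host-communication logic.

    // Hypothetical 3x3 grayscale erosion core for a streamed image.
    // Pixels arrive one per clock in row-major order; the core keeps
    // two line buffers and a 3x3 window of the most recent pixels.
    module erosion3x3 #(
        parameter PIXEL_W = 8,    // bits per pixel
        parameter LINE_W  = 640   // pixels per image line
    ) (
        input  wire               clk,
        input  wire               ena,        // pixel-valid strobe
        input  wire [PIXEL_W-1:0] pixel_in,   // incoming pixel stream
        output reg  [PIXEL_W-1:0] pixel_out   // eroded pixel stream (delayed)
    );
        // Line buffers modeled as shift registers for clarity; practical
        // designs usually implement them in on-chip RAM with an address counter.
        reg [PIXEL_W-1:0] line1 [0:LINE_W-1];
        reg [PIXEL_W-1:0] line2 [0:LINE_W-1];

        // 3x3 window registers: w[row][col], column 0 holds the newest pixel.
        reg [PIXEL_W-1:0] w [0:2][0:2];

        integer i;

        // Minimum of two pixel values.
        function [PIXEL_W-1:0] min2;
            input [PIXEL_W-1:0] a, b;
            begin
                min2 = (a < b) ? a : b;
            end
        endfunction

        always @(posedge clk) begin
            if (ena) begin
                // Shift both line buffers by one pixel.
                for (i = LINE_W-1; i > 0; i = i - 1) begin
                    line1[i] <= line1[i-1];
                    line2[i] <= line2[i-1];
                end
                line1[0] <= pixel_in;
                line2[0] <= line1[LINE_W-1];

                // Shift the 3x3 window; each row is fed from a different line.
                for (i = 0; i < 3; i = i + 1) begin
                    w[i][2] <= w[i][1];
                    w[i][1] <= w[i][0];
                end
                w[0][0] <= pixel_in;           // current line
                w[1][0] <= line1[LINE_W-1];    // one line above
                w[2][0] <= line2[LINE_W-1];    // two lines above

                // Erosion: minimum over the 3x3 neighbourhood (registered output).
                pixel_out <= min2(min2(min2(w[0][0], w[0][1]),
                                       min2(w[0][2], w[1][0])),
                                  min2(min2(w[1][1], w[1][2]),
                                       min2(min2(w[2][0], w[2][1]), w[2][2])));
            end
        end
    endmodule

Filters such as Sobel gradients or small convolutions follow the same line-buffer and sliding-window pattern, with the minimum operator replaced by the respective arithmetic.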
