DISTRIBUTED PARALLEL COMPUTING ARCHITECTURE FOR MONITORING AND CONTROL OF LARGE PHYSICAL PROCESSES

Thesis for the Degree of M.S.
MICHIGAN STATE UNIVERSITY
JAMES E. SIEBERT
1977

ABSTRACT

DISTRIBUTED PARALLEL COMPUTING ARCHITECTURE FOR MONITORING AND CONTROL OF LARGE PHYSICAL PROCESSES

By James E. Siebert

Distributed-intelligence computing systems can successfully automate remote monitoring and control operations within large physical processes in a cost-effective, practical manner. When computing intelligence is located at the process points in addition to a central control location, neither communication channel bandwidth nor sequential task performance limits real-time operation as they do in a centralized system. Process-point control activities and data processing which do not involve the resources of the central host facility are performed locally at the remote sites. This distributed-intelligence approach minimizes data communication; improves real-time performance; isolates the performance of system tasks, which facilitates system development, expansion, maintenance, and fault tolerance; reduces system complexity and cost; and improves the accuracy of acquired data. Furthermore, the economies of LSI technology can often be exploited while maintaining fast and predictable response times.

The three elements that comprise a distributed computing system--a central host computing facility, satellite computing facilities, and communication links interconnecting host and satellites--can be arranged in distributed-star, common-bus-organized, or multi-level system configurations. Communication links often effect a recurring operating expense that is minimized by the design algorithm presented. The design process specifies a network of dial-up lines by determining the line data rate, host line requirements, optimal message block length, grade of service provided to the satellites, and wait time for outstanding satellite service requests.

The monitoring and control needs of the Water Quality Management Project typify a class of application areas best implemented by a distributed-star computing system. This approach provides the economy, flexibility, and ease of development and expansion needed to successfully manage the land treatment facility.

DISTRIBUTED PARALLEL COMPUTING ARCHITECTURE FOR MONITORING AND CONTROL OF LARGE PHYSICAL PROCESSES

By James E. Siebert

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Electrical Engineering and Systems Science

1977

ACKNOWLEDGEMENTS

I wish to express my sincere gratitude to Dr. P. David Fisher for his thorough critical review of the manuscript. In addition, I wish to thank him and the Institute of Water Research for creating the environment that fostered and facilitated this investigation.

TABLE OF CONTENTS

I. INTRODUCTION

II. BACKGROUND
    2.1 Examples of Large Physical Processes
    2.2 Economics
    2.3 The Historical Solution: The Centralized System
        2.3.1 Inherent Deficiencies of the Centralized Plan
    2.4 Distributed Intelligence Monitoring and Control Systems
        2.4.1 General Description
        2.4.2 Attendant Benefits of Distributing
        2.4.3 Some Deficiencies of the Distributed Approach
        2.4.4 Notes on the Implementation

III. DISTRIBUTED SYSTEM ARCHITECTURAL ALTERNATIVES
    3.1 Required Operating Characteristics
    3.2 Star Configuration
    3.3 Common-Bus-Organized Configuration
    3.4 Multi-Level Configurations
        3.4.1 Multi-Level Star Configuration
        3.4.2 Multiple-Bus Configuration
    3.5 Summary

IV. DESIGN CONSIDERATIONS OF THE HOST, SATELLITE, AND CHANNEL FACILITIES
    4.1 The Host Facility and Operating System Functions
        4.1.1 An Operating System Alternative
    4.2 Satellite Facility Specifications and Architecture
        4.2.1 Satellite Computer Hardware
        4.2.2 Hardware Organization
        4.2.3 Memory Requirements
        4.2.4 Telecommunications Interface
        4.2.5 Local I/O Interface
        4.2.6 Console Interface
        4.2.7 Priority Interrupt Controller
    4.3 The Communication Channel and Network
        4.3.1 Error Rates and Anomalies of the Switched Telecommunications Network
        4.3.2 Error Control and Its Effectiveness
        4.3.3 The Dial-Up Lines vs. Leased Lines Decision
        4.3.4 An Algorithm for Specifying the Dial-Up Communication Network
        4.3.5 Optimal Block Length
        4.3.6 Number of Switched Lines Serving the Host
    4.4 Summary

V. EXAMPLE APPLICATION: THE WATER QUALITY MANAGEMENT PROJECT

VI. SUMMARY

APPENDIX

LIST OF REFERENCES

[Figure 11. Block diagram of the telecommunications interface. The original figure is legible only in part; identifiable labels include the internal I/O bus, a USART, port control and flow-control logic, level translation, an RS-232-C data port, and automatic-calling-unit signals such as Present Next Digit and Abandon Call.]

4.2.5 Local I/O Interface

The local I/O interface provides parallel and serial I/O ports to input local sensor data and output control signals to external devices. Figure 12 depicts a practical interface configuration. The programmable peripheral interfaces shown are the respective LSI function blocks available in every major microprocessor family. These chips produce parallel I/O ports configurable under program control and contain interrupt request generation logic. By providing bidirectional drivers on the I/O device lines, the direction of data flow can be determined at the time of installation by both program commands and jumper position. This feature enables the common use of this module for a range of I/O port configurations.

For those sensors and controlled devices located more than several meters from the satellite computer, serial communication over a current loop is a very economical and noise-immune transmission technique. The USART performs the serial/parallel conversions; the current-loop multiplexer (MUX) drives, translates, isolates, and selects the current-loop signals. Current-loop data rates are limited by transmission-line effects to 4800 bits per second (bps) for lines shorter than 300 meters, decreasing to 600 bps for lines of 3,000 meters (the practical limit of line length using this technique).

[Figure 12. Block Diagram of the Local I/O Interface. Legible labels: internal I/O bus; programmable peripheral interfaces behind bidirectional bus drivers; USART; current-loop MUX with control port and port-select logic; local I/O ports 1-5 to sensors or devices; 20 mA current loops 1 through N; address bus (bus drivers omitted).]
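To make the program-controlled port direction concrete, the fragment below sketches how satellite firmware might initialize one such programmable peripheral interface. It is a minimal sketch: the base address, register offsets, and mode-word bits are hypothetical placeholders, not taken from the thesis or from any particular chip's data sheet.

```c
#include <stdint.h>

/* Hypothetical memory-mapped programmable peripheral interface (PPI).
 * Base address and mode-word encoding are illustrative only. */
#define PPI_BASE   0x8000u
#define PPI_PORT_A (*(volatile uint8_t *)(PPI_BASE + 0u))
#define PPI_PORT_B (*(volatile uint8_t *)(PPI_BASE + 1u))
#define PPI_MODE   (*(volatile uint8_t *)(PPI_BASE + 3u))

#define MODE_A_IN  0x10u   /* assumed: bit set makes port A an input */
#define MODE_B_IN  0x02u   /* assumed: bit set makes port B an input */

/* Port A reads sensor lines; port B drives control signals. The
 * jumper-selected bidirectional bus drivers must agree with these
 * program-commanded directions, as the text requires. */
void io_ports_init(void)
{
    PPI_MODE = MODE_A_IN;              /* A = input, B = output */
}

uint8_t read_sensor_lines(void)        { return PPI_PORT_A; }
void    write_control_lines(uint8_t v) { PPI_PORT_B = v;    }
```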
4.2.6 Console Interface

Access to the internal operation of the microcomputer must be provided locally at the remote site. The computer console and an optional terminal furnish this access through the console interface. Console operations include: halting and resuming application program execution, restart, current status display, and the execution of routine maintenance commands by an operator. Examples of such commands are: resetting an alarm flag after operator correction, checking current equipment status, or simple diagnostic self-checking procedures.

Presented in Figure 13, the console interface is composed of function blocks previously described. Note that another USART occurs here, redundant with the local I/O interface. This hardware redundancy finds justification in (1) greater software convenience and (2) data rate flexibility (the local I/O current loops are not constrained to the 110-300 bps bit rate required by most terminals). Remote station operation under the control of a local terminal is required in the event of an extended failure at the host facility; data buffer contents could be dumped to the terminal. But even more importantly, the terminal's greatest utility is as a development, diagnostic, and maintenance tool. An appropriate design intent could be to provide for the terminal in hardware and software but to connect one only as required.

[Figure 13. Block diagram of the console interface. The original figure is legible only in part; identifiable labels include console displays and display drivers, console controls, a signal conditioner, a programmable peripheral interface with port-address select logic, control and data buses, a USART, and the local terminal current loop.]
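The console operations listed above map naturally onto a small table-driven command dispatcher in the satellite firmware. A minimal sketch follows; the command names and handler bodies are invented for illustration and are not from the thesis.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative handlers for the console operations described above. */
static void cmd_halt(void)   { puts("application program halted");  }
static void cmd_resume(void) { puts("application program resumed"); }
static void cmd_status(void) { puts("equipment status: normal");    }
static void cmd_reset(void)  { puts("alarm flag cleared");          }

static const struct { const char *name; void (*handler)(void); } table[] = {
    { "HALT",   cmd_halt   }, { "RESUME", cmd_resume },
    { "STATUS", cmd_status }, { "RESET",  cmd_reset  },
};

/* Look up one line received from the console USART and run it. */
static void console_dispatch(const char *line)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(line, table[i].name) == 0) {
            table[i].handler();
            return;
        }
    puts("?");                         /* unrecognized command */
}

int main(void)
{
    console_dispatch("STATUS");        /* as if typed at the terminal */
    return 0;
}
```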
4.2.7 Priority Interrupt Controller

Interrupt capability is a standard feature on general-purpose uPs. Placing I/O functions under interrupt control rather than polled service enhances real-time performance and simplifies the application software when independent software modules can handle individual tasks. The following functions, listed roughly in order of decreasing priority, should be interrupt serviced:

1. Power fail/restart
2. Operation monitor
3. Telecommunications
4. Local I/O
5. Real-time clock
6. Programmed control delays
7. Console commands

The block diagram of a priority interrupt controller which provides service to these functions appears in Figure 14. Two LSI function blocks comprise the controller: a programmable interval timer which possesses at least three independent programmable down-counters, and a priority interrupt controller. Both blocks are currently available in most major microprocessor families.

[Figure 14. Block Diagram of the Priority Interrupt Controller. Legible labels: internal I/O bus; programmable interval timer with countdowns for (1) the real-time clock, (2) the control delay timer, and (3) the operations monitor; priority interrupt controller with interrupt requests and select logic; power fail/restart detect with power status line; clock.]

The power-fail detect circuit interprets a signal line from the power supply that forewarns a drop in the logic voltage. When AC power fails, an interrupt request immediately ensues that is given highest service priority. Execution of a power-down routine then saves machine status and flags the power outage to the restart routine. On power resumption, execution of a restart routine is forced by an immediate interrupt request, which restores the former machine status and resumes orderly operation.

One of the programmed timers serves as an operation monitor--a "watchdog" of sorts. The timer is set to the longest acceptable scan interval by a portion of regularly executed code. If a scan cycle ever extends beyond the expected worst case, the operation monitor timer generates an interrupt request which indicates a malfunction. Such an occurrence could result from an external device failure, lightning, or other acts of God. With this technique, facility failures can be easily detected and flagged to the host, and a degraded mode of operation can be maintained rather than a catastrophic facility failure.
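A sketch of this operation-monitor scheme, written against a hypothetical memory-mapped interval timer, appears below. The register addresses, control bits, and tick count are illustrative assumptions, not thesis specifics.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical registers of the third programmable down-counter. */
#define TIMER3_COUNT (*(volatile uint16_t *)0x9004u) /* reload value  */
#define TIMER3_CTRL  (*(volatile uint8_t  *)0x9006u) /* mode register */
#define TIMER3_ARM   0x01u      /* assumed: start; interrupt at zero  */

#define WORST_CASE_SCAN_TICKS 5000u  /* longest acceptable scan cycle */

volatile bool facility_malfunction = false;

/* Called once per scan cycle by the regularly executed code; reloading
 * the counter before it expires is what proves the scan loop healthy. */
void operation_monitor_kick(void)
{
    TIMER3_COUNT = WORST_CASE_SCAN_TICKS;
    TIMER3_CTRL  = TIMER3_ARM;
}

/* Interrupt handler: the counter reached zero, so a scan cycle overran
 * its worst case. Flag the malfunction so the host can be notified and
 * a degraded mode of operation entered rather than a silent failure.  */
void operation_monitor_isr(void)
{
    facility_malfunction = true;
}
```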
4.3 The Communication Channel and Network

Distributed architectures provide the potential for excellent real-time system behavior. Performance, however, must not be restricted by limitations on interprocessor communications. Careful consideration must be exercised in the communication network design, since it is often the system bottleneck, the primary source of errors, and a recurring monthly operating expense. Hence, the network design objectives reduce to the classic optimization problem: achieve adequate performance at minimized expense. Often, either all or a major portion of the large automated control system can tolerate communication delays as a result of task isolation. In view of geographic scale and this assumption, the Switched Telecommunications Network becomes a workable and attractive basis for a practical network implementation in light of: (1) the universal availability of connections, (2) reliability from alternate routing, and (3) the lack of an economical alternative. In these sections, an interprocessor communications plan will be proposed along with a formalized quantitative design process.

For the remainder of the communications discussion, a star system configuration is generally assumed in which dial-up lines connect the satellite facilities to the host facility. In some cases, leased lines must be specified, as discussed later. In Figure 15, common-carrier services are incorporated into the star system diagram. Several design decisions must be made to specify the network and channels: How many lines are required at the host? When should dial-up or private leased lines be used? What data rate should be used? How long should messages be? And what about error control? All of these areas are discussed in detail in the remainder of this chapter.

[Figure 15. Common-Carrier Services Within a Distributed System. Satellite computing facilities connect through common-carrier services to the host computing facility. Design decisions noted in the figure: number of lines to the host (M); data rate or line speed (S); optimal message block length (N_M).]

4.3.1 Error Rates and Anomalies of the Switched Telecommunications Network

No modulation technique can be designed to deliver error-free data in spite of the variety of phenomena common to telephone channels. Consequently, transmission errors are inevitable. Telephone channel performance is well documented--the 1969-70 Connection Surveys conducted by Bell Laboratories [6, 7] extensively studied error rates for low- and high-speed bit rates. For low-speed, asynchronous character transmission at 300 bps, the study observed a lost character rate (characters not recovered by the receiving modem) of 13.7 x 10^-4 and an incorrect character rate of 1.07 x 10^-4, for an average overall character error rate of 14.77 x 10^-4. This rate represents about 2.2 missing or wrong characters per page of this text! It must be noted, however, that 90% of the calls in that study had no lost characters; a few bad calls dominated in lost characters. In addition, 50% of the calls were error free.

The high-speed voice-band data transmission study concentrated on 1200 bps and 2000 bps. Error rates at 1200 and 2000 bps were observed to be comparable to each other, while error rates at 3600 bps were observed to be somewhat larger. Overall, for the 1200 and 2000 bps calls, about 82% of the calls had error rates of 10^-5 or better, although 3.5% of the connections had rates poorer than 10^-4. This study was also dominated by a few error-prone calls: 72% of the observed errors occurred during 5% of the calls. Table 3 provides a rudimentary summary.

Table 3. Bell Connection Survey Summary [6, 7]

    Overall average bit error rates:
        2000 bps: 1.9 x 10^-5
        1200 bps: 6.6 x 10^-5
        300 bps:  1.5 x 10^-4

    Percent of calls with error rate better than 10^-4:
        2000 bps: 88%
        1200 bps: 84%
        300 bps:  78%

Another class of telephone channel anomalies was summarized in a Bell study published in 1964 [12]. This study focused on synchronous data transmission at 2000 bps. Of the 548 calls attempted in the study, 10.7% (59 calls) could not be completed. The breakdown is as follows:

1. Long dropouts or fades resulting in loss of synchronism between data sets: 5.5% of attempted calls
2. Inability of data sets to achieve initial synchronism within a reasonable length of time (2-3 minutes): 3.8% of attempted calls
3. Lost connections, i.e., the dial tone returns in the middle of a call(1): 1.5% of calls

(1) Lost connections were determined to be associated with telephone company maintenance operations.

Several conclusions can now be stated concerning the requirements and capabilities of a viable error control plan. Error rates vary greatly from call to call, so the error control employed must accommodate high raw-error-rate conditions. Errors also tend to occur in clusters--the result of burst noise, dropouts, and longer interferences. Long concatenations of errors must be detected. Furthermore, unsuccessful or interrupted connections need to be recognized, terminated, and re-dialed to obtain a different connection.

4.3.2 Error Control and Its Effectiveness

The telephone channel, then, is a formidable communications environment; nevertheless, accurate, reliable operation can be achieved. Reliable detection of errors is the key. Error detection-block retransmission is a simple and very effective system of error handling. Messages to be transmitted are encoded to contain redundant information that can be checked by the receiver to discover transmission errors. Upon detection of an error, the receiver requests retransmission of the block containing the error. (See Section 4.3.5.) The customary parity check bits are an example of the general encoding technique. However, an AT&T study conducted in 1960 shows that the single parity bit check will fail to detect about 30% of the character errors caused by a telephone channel [13, p. 79; 14]. Polynomial codes(1), by contrast, are very effective in detecting telephone channel transmission errors. This is well documented in another Bell System study [12] using a Bose-Chaudhuri-Hocquenghem (BCH) code. The study employed a polynomial code with a word length of 31 bits, of which 21 are information (31, 21), transmitted at 2000 bps over the switched network. An undetected bit error rate of 10^-9 was observed. That is one expected undetected error in about 10,000 transmissions of the entire text of this thesis.

(1) Martin in reference 13, pp. 81-95, gives a practical condensed treatment of polynomial checking.
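The Bell study's BCH(31, 21) code uses its own short generator polynomial; as a general flavor of polynomial checking, the sketch below computes a common 16-bit cyclic redundancy check (the CCITT generator x^16 + x^12 + x^5 + 1) over a message block. The sender appends the check characters; the receiver recomputes them and requests retransmission on a mismatch. The sample block text is invented.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Bit-serial CRC-16 using the CCITT generator polynomial 0x1021.
 * This illustrates polynomial checking generally; it is not the
 * BCH(31, 21) code of the Bell study. */
static uint16_t crc16_ccitt(const uint8_t *msg, size_t len)
{
    uint16_t crc = 0xFFFFu;                /* customary initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)msg[i] << 8);
        for (int bit = 0; bit < 8; bit++)  /* long division, one bit  */
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

int main(void)
{
    const uint8_t block[] = "STATION 4 DATA DUMP";  /* sample block   */
    printf("check characters: %04X\n",
           crc16_ccitt(block, sizeof block - 1));
    return 0;
}
```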
Thus, from these arguments it can be concluded that accurate information transmission is possible over the Switched Telecommunications Network.

4.3.3 The Dial-Up Lines vs. Leased Lines Decision

Selection between switched or leased lines is based on three considerations: (1) required speed of service, (2) monthly cost, and (3) data rates and attendant error rates.

Speed of Service

A significant, inescapable delay exists between the time when an originating facility asserts off-hook (a line to the DAA used to request a dial tone) and the start of called-party ringing. Figure 16 diagrammatically presents the components comprising this service delay [9, p. 12; 15, pp. 712-15].

Figure 16. The Components of Switched-Line Service Delays

    Speed of service, from off-hook to the start of data exchange:
        Dial tone delay .................... 0.1-2 s
        Automatic dialing time ............. 7.7 s rotary (expected value); 0.7 s Touch-Tone
        Telephone network connection time .. 1-15 s
        Data set handshaking time .......... 0.75-4 s

Dial tone delay is a function of the type of the local office switching equipment and current traffic load conditions. For light traffic conditions the delay is typically from 100-500 msec. The type of signaling employed sets the dialing time, along with the actual number dialed. Rotary dial pulse signaling occurs at 10 ± 0.5 pulses per second, with N pulses required for digit N. Likewise, Touch-Tone signaling can occur at up to 10 digits per second. Telephone network connect time can vary from one second to about 20 seconds, and is primarily dependent on the number of network links in tandem for that chance routing. And data set handshaking time is specific to the particular modems in use.

In the case of leased lines, the time delay between the request for transmission and the start of information exchange is determined by the communication hardware arrangements at the computers. Clearly, if the leased lines are interrupt serviced, the communication delay would consist of the data set handshaking time. In any case, service delays for leased lines are far shorter than for switched connections.
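The automatic-dialing component above follows directly from the stated signaling rates: digit N costs N pulses (ten for zero) at 10 pulses per second, plus a pause between digits. A small calculation of that component, with the seven-digit number and the interdigit pause chosen purely for illustration:

```c
#include <stdio.h>

/* Rotary dialing time at 10 pulses/s; digit N takes N pulses and the
 * digit 0 takes ten. The 0.7 s interdigit pause is an assumption. */
int main(void)
{
    const char  *number     = "5551234";   /* hypothetical number      */
    const double PULSE_RATE = 10.0;        /* pulses per second        */
    const double PAUSE      = 0.7;         /* assumed interdigit pause */

    double t = 0.0;
    for (const char *p = number; *p != '\0'; p++) {
        int pulses = (*p == '0') ? 10 : (*p - '0');
        t += pulses / PULSE_RATE;
        if (p[1] != '\0')
            t += PAUSE;                    /* pause between digits     */
    }
    printf("rotary dialing time for %s: %.1f s\n", number, t);
    return 0;
}
```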
Monthly Cost

Within a local dialing area, no broad generalization concerning line costs can be made, because each area has a unique definition of communication services and rates (e.g., flat monthly rate or message unit ratings). But usually, dial-up lines will be cheaper, except for an application requiring frequent calls in a dialing area that employs message unit ratings for local calling revenue [16]. For leased lines, rentals can range from a flat $7/month to typically $4/month/mile [17].

For channels extending beyond the host local dialing area, the cost of a leased line can be more or less than for the equivalent usage of the Direct Distance Dial Network. It depends on the length of the line and on the usage pattern. If line usage is high--greater than about three hours per day, for example--then the leased-line tariff will usually be less expensive. Also, a usage pattern of 25-50 calls per day lasting less than three minutes each will have lower costs over a leased line. (References 15 and 18 give extensive treatment.)

Data Rates and Error Rates

Because switching equipment is not involved with leased lines, the composition of the line is fixed, and these facilities are less prone to the chief source of errors on switched connections--impulse noise originating in switching offices. (No quantitative measurements on the magnitude of this advantage appear to be available [18, Lucky, p. 144].) The practical data rate limit over dial-up lines is currently 3600-4800 bps. Conditioned leased lines can support data rates up to 10,800 bps. But for the nature of communications involved in the applications listed in Chapter II, the higher data rate capacity of leased lines is not requisite. For example, at 2400 bps, the time required to exchange 1,024 eight-bit characters is about 4.4 seconds (using the approximated effective line speed of Section 4.3.5). Thus, in the majority of these cases, dial-up lines will satisfy the data rate needs of large distributed systems.

4.3.4 An Algorithm for Specifying the Dial-Up Communication Network

A practicable iterative process for designing the interprocessor communication system composed of dial-up lines appears in Figure 17. This algorithm applies directly to the fundamental network configuration that is depicted in Figure 15. Here, each satellite possesses one line for communications with the host facility, and the host maintains several lines, equally loaded and with equal priority status. This is the simplest case and serves as the starting point for the analysis of network refinements required by specific applications. For example, a mechanism for organizing the queue of outstanding satellite service requests might be incorporated to allocate host service by priority assignments.

It is presumed at this point that an approximation of the protocols and codes to be used is known, along with an acceptable range for service delays. The design process starts with a good guess of the line speed that the system requires, and using performance data of various available modems one can proceed through the process.

[Figure 17. Elementary Communication Network Design Algorithm. The flowchart proceeds: choose the line speed S, trading modem cost against the number of host lines M and the queue wait time E(t_w); choose half- or full-duplex operation (HDX incurs the line turnaround time t_L; FDX incurs larger modem cost and a sacrificed data rate); determine the optimal block length N_M; approximate the mean service time E(t_s); choose the number of lines to the host M; then test whether the grade of service and the queue wait time are acceptable and whether a cheaper solution is possible, iterating until the best design is reached.]

Necessary design relations are presented in the following sections; the equations are referenced or derived in the Appendix. By employing these relations and iterating, the algorithm converges on the best cost-versus-performance compromise.

4.3.5 Optimal Block Length

In error detection-retransmission error control, the transmitter sends a block of N_M message characters along with the necessary control characters numbering N_C. The transmitter then stops and waits for acknowledgement from the receiver via some reverse channel. Some line turnaround time, t_L, is associated with the reversal in the direction of transmission. If the block is positively acknowledged, no transmission errors occurred and the next block can be sent. Otherwise, the prior block is retransmitted. Note that two line turnarounds take place for each block transmission.
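In outline, the sender side of this stop-and-wait procedure reduces to a short loop. The link primitives below are stand-ins that simulate a channel which occasionally corrupts a block; a real satellite would drive its modem instead, and the 2% error figure is illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

enum reply { ACK, NAK };

/* Stand-in link primitives simulating the channel. */
static void send_block(int seq) { printf("send block %d\n", seq); }
static enum reply wait_reply(void)
{
    return (rand() % 100 < 2) ? NAK : ACK;  /* ~2% of blocks rejected */
}

/* Transmit one block, retransmitting until positively acknowledged.
 * Each pass costs two line turnarounds: one to send the block, one
 * for the reverse-channel acknowledgement. */
static void send_with_retransmission(int seq)
{
    do {
        send_block(seq);             /* NM message + NC control chars */
    } while (wait_reply() == NAK);   /* NAK: resend the same block    */
}

int main(void)
{
    for (int seq = 0; seq < 5; seq++)
        send_with_retransmission(seq);
    return 0;
}
```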
With this method of error control, the effective line speed, S_E characters per second (cps), is less than the actual line speed, S cps, due to both the required retransmissions and the line turnaround delays. In the Appendix, the effective line speed is shown to be:

    S_E = \frac{S \, (1 - P_E)^{C(N_M + N_C)}}{\dfrac{N_M + N_C}{N_M} + \dfrac{2 S t_L}{N_M}}        (4.1)

where

    P_E = bit error rate over the switched telephone network at the rate S x C bps
    C = number of bits per character

For a given data rate (S) and its associated error rate (P_E), the message block length (N_M) is the only variable available to maximize the effective line speed (S_E). Figure 18 demonstrates the behavior of S_E with varying N_M. As demonstrated in the figure, there exists a unique N_M for which S_E is maximized for a given set of line conditions. In the Appendix it is shown that the optimal block length is given by:

    N_M = \frac{1}{2}\left[(N_C + 2 t_L S)^2 - \frac{4(N_C + 2 t_L S)}{C \ln(1 - P_E)}\right]^{1/2} - \frac{N_C + 2 t_L S}{2}        (4.2)

The effective line throughput is maximum at this block length. As an example, consider transmission at 2400 bps with t_L = 0.175 sec., N_C = 8 characters, P_E = 2 x 10^-5, and C = 8 bits. The optimum block length is N_M = 786 characters. The resultant optimized line speed is S_E = 231 cps = 1848 bps. These values are indicated in Figure 18.

[Figure 18. Plot: Effective Line Speed Versus Message Block Length. S_E (cps) is plotted against N_M (characters, logarithmic scale from 100 to 10,000); the curve peaks at S_E = 231 cps at N_M = 786. Given that: S = 2400 bps = 300 cps, P_E = 2 x 10^-5, t_L = 0.175 s, N_C = 8 characters, C = 8 bits/character.]
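A direct numerical evaluation of Equations (4.1) and (4.2) reproduces this worked example; the program below is simply the arithmetic of the text, not additional theory.

```c
#include <math.h>
#include <stdio.h>

/* Evaluate Eq. (4.2), then Eq. (4.1), at the example's line conditions:
 * S line speed (cps), tL turnaround (s), NC control characters,
 * PE bit error rate, C bits per character. */
int main(void)
{
    const double S = 300.0, tL = 0.175, NC = 8.0, PE = 2e-5, C = 8.0;

    double a  = NC + 2.0 * tL * S;                 /* recurring group  */
    double NM = 0.5 * sqrt(a * a - 4.0 * a / (C * log(1.0 - PE)))
              - 0.5 * a;                           /* Eq. (4.2)        */
    double SE = S * pow(1.0 - PE, C * (NM + NC))
              / ((NM + NC) / NM + 2.0 * S * tL / NM);  /* Eq. (4.1)    */

    printf("optimal block length NM = %.0f characters\n", NM); /* ~786 */
    printf("effective line speed SE = %.0f cps\n", SE);        /* ~231 */
    return 0;
}
```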
4.3.6 Number of Switched Lines Serving the Host

Since lines connected to the host incur a monthly operating expense, the number of lines must be minimized. But limited access to the host facility effects a system bottleneck. Contentions for service become inevitable, and queues form. A satisfactory compromise must be determined: provide adequate host accessibility at the least monthly expense. To this end, the design goal is to determine the number of lines that provide an adequate grade of service and an acceptable wait time in the queue of satellites requesting service.

Grade of service (P_B) is the probability that all lines to the host are simultaneously busy and, consequently, a requesting satellite cannot obtain immediate service. P_B determines the mean wait time in the queue, and it is a function of the number of host lines and their utilization. Basic queueing theory will furnish the relations.

The most efficient utilization of the host lines results when all lines are available for use by all satellites. Smaller queues and shorter wait times result with this configuration than with any other. When a satellite and host terminate a call, that line is freed; the host can then dial out, or the next satellite to dial the host can obtain a connection. The dispatching discipline at the host, then, is that in which the next satellite to be served is selected at random from the multi-server queue (many lines serving the queue). To achieve an analytical expression for P_B, the following assumptions are made:

1. The arrival pattern of satellite service requests follows a Poisson distribution
2. Service times (call lengths) are exponentially distributed
3. All lines to the host are equally loaded
4. No service requests leave the queue

From analysis of the application in design, the mean call length or mean service time (E(t_s) seconds) can be approximated using estimated data buffer sizes and the effective line speed (S_E) being considered. Correspondingly, the mean expected number of calls per hour (E(n)/hour) can be estimated.
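Under these four assumptions the host lines form a classic multi-server (M/M/M) queue. As a sketch of the kind of relation the design algorithm iterates on, the program below evaluates the standard Erlang-C formula for P_B and the mean queue wait that it implies; the traffic figures are assumed example values, not data from the thesis.

```c
#include <stdio.h>

/* Erlang-C grade of service and mean queue wait for M equally loaded
 * host lines. E(ts) and E(n) below are assumed example values. */
int main(void)
{
    const double Ets = 30.0;    /* assumed mean call length, seconds */
    const double En  = 240.0;   /* assumed service requests per hour */
    const int    M   = 4;       /* trial number of lines at the host */

    double A = En * Ets / 3600.0;       /* offered traffic, erlangs  */

    /* Erlang C: probability a request finds all M lines busy.       */
    double sum = 0.0, term = 1.0;       /* term tracks A^k / k!      */
    for (int k = 0; k < M; k++) {
        sum  += term;
        term *= A / (double)(k + 1);
    }                                   /* term is now A^M / M!      */
    double tail = term / (1.0 - A / M);
    double PB   = tail / (sum + tail);

    double Etw = PB * Ets / ((double)M - A);  /* mean wait in queue  */

    printf("offered traffic A = %.2f erlangs\n", A);
    printf("grade of service PB = %.3f, mean wait E(tw) = %.1f s\n",
           PB, Etw);
    return 0;
}
```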
CHAPTER V

EXAMPLE APPLICATION: THE WATER QUALITY MANAGEMENT PROJECT

[Figure 20. Block diagram of the project's distributed computing system. The original figure is legible only in part; identifiable labels include the host computing facility with operator interaction and instrument and pump control, unmanned local water-quality and meteorological stations with satellite displays, lettered sensor channels, and sensed quantities such as water temperature, stream temperature, pH, conductivity, dissolved oxygen, turbidity, solar radiation, wind speed, rain gauge, and water samplers.]

The host computing facility consists of a general-purpose computer supporting an appropriate bulk storage peripheral (magnetic tape), interactive terminal(s), and the required data communication equipment.

5.3.2 uP-Based Satellite Facilities

At the remote sites, there are no operations demanding high-speed data manipulation or arithmetic processing. Even if digital filtering (averaging) is employed, the low bandwidths of environmental signals impose minimal processing speed requirements. Consequently, general-purpose microprocessors, with typical instruction cycle times of one to ten microseconds, provide sufficient computing capacity to easily implement the remote-site control algorithms.

The satellite computers in Figure 20 could be composed of the hardware configurations developed in Chapter IV (see Figures 10-14). Located at the remote sites, the satellite computer functions as the local data acquisition system and data logger, data processor, and external device controller. Peripheral hardware operations required at the satellite facility include signal conditioning and multiplexing of sensor signals, conversion of analog signals to the digital domain, and local external device control. Also, either an answer-only modem or a modem-automatic calling unit combination is attached to the telecommunications interface (see Figures 10 and 11).

Satellite computer memory requirements will vary directly with the complexity of the control algorithms, the number of attached sensors, and the time interval between data dumps to the host computer. Typically, a satellite control program of modest complexity would occupy 4K words of ROM (or EPROM). In addition, a data buffer consisting of 4K words of RAM would be appropriate for most stations performing data acquisition.

5.4 Summary

A distributed-star computing system can successfully and inexpensively automate several portions of the WQMP. Requisite remote-site tasks can be performed comfortably by uP-based satellite computers and associated peripheral equipment. Furthermore, since communication delays between satellites and the host are of no consequence, dial-up lines can be used to interconnect the system computers. By exploiting these economical resources, the resulting distributed-star system becomes a very attractive solution to WQMP automation.

As research progresses and optimized facility management strategies are determined, automation of the project can proceed further. Additional satellite facilities, performing both monitoring and control functions, can be gracefully integrated along with upgrades of the host control algorithm. The distributed-star configuration facilitates this gradual evolution towards project optimization.

CHAPTER VI

SUMMARY

Overall, the primary objective of the investigation reported in this thesis is to study computing system architectures which successfully automate remote monitoring and control of large physical processes in a cost-effective, practical manner. Generally, such a solution to the automation problem was difficult to attain in the past. The high cost of computing intelligence motivated centralized system configurations whose performance was intrinsically limited by: (1) the expensive bandwidth of the communication channels linking remote sites, and (2) sequential performance of tasks. These constraining factors must be diminished to improve the real-time performance of large computing systems. The recent achievement of practical, inexpensive, general-purpose microprocessors greatly improves the cost-effectiveness of a distributed-intelligence approach to performing remote monitoring and control operations. This alternative approach alleviates the previous performance-limiting factors and proves to be advantageous in the real-time management of geographically large and diverse physical systems.

Specifically, computing system architectures were investigated in which computing intelligence is distributed away from a central control location to the process points. In the distributed architectures, control activities and data processing which do not involve the resources of the host facility are performed locally at the remote site (cluster of process points) by a dedicated satellite computing facility. System operations are synchronized and supervised by the central host computing facility, which also supports a central data base. For geographically large systems, common-carrier services (lines) are often employed for interprocessor communications.

Distributed-intelligence system architectures effect several significant improvements over the centralized approach. These improvements are:

1. Minimized communication between the central host and the process points
2. Improved real-time performance from shorter, predictable response delays and a broadened range of implementable control algorithm complexity
3. Isolated performance of tasks, allowing truly modular implementations that facilitate development, expansion, and maintenance, and providing higher reliability and fault tolerance
4. Reduced system complexity and cost
5. Improved accuracy of acquired data

Furthermore, the conflict of optimization criteria--good real-time performance and high host computer productivity--can be overcome in a distributed system. This is primarily the result of task partitioning, i.e., the performance of most real-time tasks in parallel by dedicated satellite processors.

Some problems remain, however. In a distributed system that accesses satellite and host facilities over dial-up lines, significant communication delays must be tolerated. In addition, the use of a priority-structured queue often results in a decrease in the utilization of communication facilities and, hence, increases the recurring monthly operating expense. Also, systems providing fast response among interrelated satellite operations, i.e., common-bus-organized configurations, can be expensive and difficult to implement. Initial system programming, system evolution, and system expansion can be monumental chores.
Likewise, in distributed systems that employ a distribution of application control, with a mix of control roles among satellites and host, system software becomes complex and expensive.

Three fundamental elements comprise a distributed computing system: a central host computing facility, satellite computing facilities, and communication links interconnecting host and satellites. The host facility is made up of a general-purpose computer supporting data processing I/O peripherals and telecommunications hardware, and whose resident operating system contains a telecommunications access package. Satellite facilities are often composed of one or more uP-based computers equipped with local I/O and telecommunications hardware. The microcomputers are relatively simple machines with moderate memory requirements (ROM and RAM), limited console functions, and a simple interrupt structure.

Interprocessor communication lines often effect a recurring monthly expense. An iterative communication network design process was presented that minimizes this operating expense. Determined in the design process were: line data rate, line requirements at the host, optimal message block length, grade of service provided to the satellites, and mean wait time for satellite service requests. In addition to the operating expense, the communication lines are the primary source of errors in the system. However, effective error rates of 10^-9 have been demonstrated using an error detection-retransmission scheme of error control [12].

The monitoring and control needs of the WQMP typify a class of applications best implemented by a distributed-star system configuration. Since the service delays imposed by dial-up lines can be tolerated, the simple distributed-star system investigated in this thesis can very efficiently manage the operation of the WQMP. This approach provides the economy, flexibility, maintainability, and ease of development and expansion needed to successfully manage the land treatment facility.

With advancements in LSI technology, the electronic elements of computing systems (CPUs, memories, peripheral interfaces, etc.) will continue to decrease in cost. Concomitantly, the cost of data communication will continue to grow in its relative percentage of computing system costs. So, for now and in the future, the amount and rate of data communication occurring within a computing system must be minimized to reduce operating expense. This is generally accomplished within distributed-intelligence computing systems. For this reason, distributed systems will become increasingly cost-effective and, thus, increasingly attractive for application in areas not heretofore economically feasible. Distributed system architectures have a growing future role in an expanding family of application areas.

APPENDIX

DERIVATION OF EFFECTIVE LINE SPEED AND OPTIMUM MESSAGE BLOCK LENGTH

When the error detection-block retransmission procedure for error control is employed, the resulting effective line speed (S_E cps) is less than the modem operating speed (S cps). In this analysis, all data transmission occurs at a constant baud rate. The sequence of events across the line is:

    ...[tL][Msg A][tL][ACK][tL][Msg B][tL][NAK][tL][Msg B]...
where

    t_L = total turnaround time of the line and modems
    Msg A = duration in seconds of message block A
    ACK = duration in seconds of an acknowledgement of accurate message reception
    NAK = duration in seconds of a negative acknowledgement (error detected)

Assuming that the line behaves as a binary symmetric channel with an error probability of P_E, the mean time elapsed in the transmission of one block is:

    t_B = 2 t_L + \frac{N_M + N_C}{S} + \left(2 t_L + \frac{N_M + N_C}{S}\right)\left(P_R + P_R^2 + P_R^3 + \cdots\right)

    t_B = 2 t_L + \frac{N_M + N_C}{S} + \left(2 t_L + \frac{N_M + N_C}{S}\right)\frac{P_R}{1 - P_R}

    t_B = \left[2 t_L + \frac{N_M + N_C}{S}\right]\left[1 + \frac{P_R}{1 - P_R}\right]        (A.1)

where

    t_B = total seconds taken to transmit one message block
    N_M = number of message characters contained in the message block
    N_C = total number of control characters used in the procedure, including ACK, NAK, identification, error check, etc.
    P_R = 1 - (1 - P_E)^{C(N_M + N_C)} = probability that a block must be retransmitted        (A.2)
    C = number of bits per character

The effective line speed is:

    S_E = \frac{N_M}{t_B} \ \frac{\text{characters}}{\text{second}}        (A.3)

Substituting Equations (A.1) and (A.2) into (A.3) and rearranging terms yields:

    S_E = \frac{S \, (1 - P_E)^{C(N_M + N_C)}}{\dfrac{N_M + N_C}{N_M} + \dfrac{2 S t_L}{N_M}}        (A.4)

This result appears as Equation (4.1) in the text.

S_E is a maximum for a unique message block length. The optimum N_M can be found by differentiating Equation (A.4) and equating to zero:

    \frac{dS_E}{dN_M} = \frac{\left[\ln(1 - P_E)\, C N_M + 1\right](1 - P_E)^{C(N_M + N_C)}}{\dfrac{N_M + N_C}{S} + 2 t_L} - \frac{N_M (1 - P_E)^{C(N_M + N_C)}}{S \left(\dfrac{N_M + N_C}{S} + 2 t_L\right)^2}        (A.5)

Setting Equation (A.5) to zero gives:

    N_M^2 + S\left(\frac{N_C}{S} + 2 t_L\right) N_M + \frac{S\left(\dfrac{N_C}{S} + 2 t_L\right)}{\ln(1 - P_E)\, C} = 0        (A.6)

Solving Equation (A.6) yields the optimum message block length:

    N_M = \frac{1}{2}\left[(N_C + 2 t_L S)^2 - \frac{4(N_C + 2 t_L S)}{C \ln(1 - P_E)}\right]^{1/2} - \frac{N_C + 2 t_L S}{2}        (A.7)

This expression is Equation (4.2) in the text.

LIST OF REFERENCES

1. Rodnay Zaks, "A Processor Network for Urban Traffic Control," IEEE 1973 Computer Society International Conference (COMPCON '73) Digest of Papers (February 1973), pp. 215-18.

2. Scott E. Cutler, "Microcomputer Networks in Automobile Traffic Control," Tenth IEEE Computer Society International Conference (COMPCON '75) Digest of Papers (February 1975), pp. 263-66.

3. LeRoy H. Anderson, "The Microcomputer as Distributed Intelligence," Proceedings IEEE 1975 International Symposium on Circuits and Systems (April 1975), pp. 337-40.

4. Frank V. Wagner, "Is Decentralization Inevitable?" Datamation, Vol. 22, No. 11 (November 1976): 86-97.

5. Alan J. Weissberger, "Microprocessors as Intelligent Remote Controllers," 1974 WESCON Technical Papers, Vol. 18 (September 1974), 23/4: 1-4.

6. M.D. Balkovic, H.W. Klancer, S.W. Klare, and W.G. McGruther, "High-Speed Voiceband Data Transmission Performance on the Switched Telecommunications Network," The Bell System Technical Journal, Vol. 50, No. 4 (April 1971): 1349-84.

7. H.C. Fleming and R.M. Hutchinson, Jr., "Low-Speed Data Transmission Performance on the Switched Telecommunications Network," The Bell System Technical Journal, Vol. 50, No. 4 (April 1971): 1385-1406.

8. Bradley J. Beitel and Virginia B. Nagel, "An Approach to Generalized Distributed Systems," Advances in Instrumentation, Vol. 29, Part 1 (1974), 518: 1-5.

9. "Data Communications Using the Switched Telecommunications Network," Bell System Technical Reference, American Telephone and Telegraph Company, PUB 41005, May 1971.

10. EIA Standard RS-232-C, "Interface Between Data Terminal Equipment and Data Communication Equipment Employing Serial Binary Data Interchange," Electronic Industries Association (1969).
11. EIA Standard RS-366, "Interface Between Data Terminal Equipment and Automatic Calling Equipment for Data Communication," Electronic Industries Association.

12. R.L. Townsend and R.N. Watts, "Effectiveness of Error Control in Data Communication over the Switched Telephone Network," The Bell System Technical Journal, Vol. 43, No. 6 (November 1964): 2611-37.

13. James Martin, Teleprocessing Network Organization, Prentice-Hall, Inc., 1970.

14. "Error Distribution and Error Control Evaluation," Extracts from Contribution GT. 43, No. 13, February 1960, CCITT Red Book, Vol. VII, The International Telecommunication Union, Geneva, 1961.

15. James Martin, Systems Analysis for Data Transmission, Prentice-Hall, Inc., 1972.

16. John E. Buckley, "Local System Access," Computer Design, Vol. 15, No. 12 (December 1976): 20-22.

17. Mondy Lariz, "Doing an End Run Round Bell's Plans to Time Local Calls with Message Units," Data Communications, February 1977, pp. 29-32.

18. Robert W. Lucky, "Common-Carrier Data Communication," in Computer-Communication Networks, pp. 142-96, edited by Norman Abramson and Franklin F. Kuo, Prentice-Hall, Inc., 1973.

19. "Utilization of Natural Ecosystems for Wastewater Renovation," Final Report EPA Project No. Y005065, Institute of Water Research, Michigan State University, March 1976.