Several factors determine whether the performance of an NI LabVIEW program improves on a multicore system (for example, dual-core and quad-core machines). Such factors include specifics of the new hardware, structure of the LabVIEW application, and system software. Tests of common LabVIEW program structures show, on average, a 25 to 35 percent improvement in execution time because of the natural parallelism in most LabVIEW code. However, the nature of an individual program can significantly affect this estimate. Optimizing a LabVIEW program for a multicore computing environment can significantly reduce execution time when you upgrade to a multicore computing system. This white paper addresses the major factors affecting LabVIEW program performance on multicore systems.
When evaluating LabVIEW program performance, memory usage and execution time are the main metrics to consider. Execution time is the amount of time required to process a group of instructions, usually measured in seconds. Memory usage is the amount of memory space required to process a group of instructions, usually measured in bytes. These measurements are good indicators of individual program performance, with execution time being key. Another performance improvement seen on multicore systems is responsiveness. Responsiveness – how quickly a program or system responds to inputs – does not take into account the amount of time required to execute a desired action. Multicore systems exhibit improved responsiveness due to multitasking with several cores available. While this is a performance improvement, it is not necessarily an indication of program execution time improvement.
LabVIEW automatically scales programs to take advantage of multiple processors on high-end computing systems by determining the number of cores available and creating a correspondingly greater number of threads. For example, LabVIEW creates eight threads for program execution when running on an octal-core computer.
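LabVIEW performs this scaling internally, and G code has no textual form. As a rough analogy only, the same idea (query the core count, then size the worker pool to match) can be sketched in Python. The names `worker_count` and `parallel_map` are hypothetical and are not part of any LabVIEW API.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count():
    """Return one worker per logical core, falling back to 1 if unknown."""
    # os.cpu_count() may return None when the count cannot be determined.
    return os.cpu_count() or 1


def parallel_map(func, items):
    """Apply func to every item using a pool sized to the core count."""
    with ThreadPoolExecutor(max_workers=worker_count()) as pool:
        # pool.map preserves the order of the inputs in its results.
        return list(pool.map(func, items))


if __name__ == "__main__":
    print("workers:", worker_count())
    print(parallel_map(lambda x: x * x, [1, 2, 3, 4]))
```

In standard CPython the global interpreter lock typically keeps CPU-bound threads from running truly in parallel, so this sketch shows the structure of core-aware scaling rather than a guaranteed speedup.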
Replacing a single-core computing system with a multicore system whose cores run at the same clock speed as the single-core system shortens LabVIEW program execution time. Ideally, program execution speed increases by a factor equal to the number of cores on the multicore computing system (for example, a fourfold speed increase on a quad-core system); however, communication overhead between threads and cores prevents ideal execution time improvement. If the LabVIEW program in question is completely sequential, it still runs on a single core, but it shares that core's processor time with less other software, which can improve execution time. If the program in question is completely parallel and consists of tasks of equal size having no data dependency, you can achieve near-ideal execution time improvement.
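The two extremes described above (completely sequential and completely parallel) can be connected with Amdahl's law, a standard simplified model that this paper does not itself name. The sketch below assumes equal clock speeds and ignores communication overhead, so it gives an upper bound rather than a prediction. The function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper-bound speedup for a program on `cores` identical cores.

    parallel_fraction is the share of single-core execution time that
    can be spread evenly across cores (0.0 = sequential, 1.0 = parallel).
    Communication overhead is ignored (an assumption of this sketch).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        print(fraction, amdahl_speedup(fraction, 4))
```

Under this model, `amdahl_speedup(1.0, 4)` gives the ideal factor of 4, `amdahl_speedup(0.0, 4)` gives 1 (no gain from extra cores), and a program that is half parallel on two cores gives about 1.33, which corresponds to a 25 percent shorter execution time.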
When you replace a single-core computing system with a multicore system that uses processors with slower clock speeds, you create an ambiguous scenario for determining changes in LabVIEW program execution time. If the LabVIEW program in question is completely sequential and has exclusive access to a single core on the multicore system, relative clock speeds of the system and task scheduling likely determine the effect on execution time. Execution time of a LabVIEW program that is completely parallel, that consists of tasks of equal size having no data dependency, and that is given access to all available cores on the multicore machine is also dependent on relative clock speeds and task scheduling.
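To make the clock-speed trade-off concrete, the following sketch estimates the ratio of new to old execution time. It assumes that work per clock cycle is the same on both systems, that task scheduling and communication overhead are negligible, and that the parallel part divides evenly across cores. The function name and the clock values used in the example are hypothetical.

```python
def relative_execution_time(old_clock_ghz, new_clock_ghz, cores, parallel_fraction):
    """Estimated new/old execution time ratio; below 1.0 means faster.

    Assumes equal work per clock cycle on both systems and no
    scheduling or communication overhead (simplifications of this sketch).
    """
    if old_clock_ghz <= 0 or new_clock_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    # A slower clock stretches every instruction by the same factor.
    clock_ratio = old_clock_ghz / new_clock_ghz
    # Only the parallel share of the work is divided among the cores.
    core_scaling = (1.0 - parallel_fraction) + parallel_fraction / cores
    return clock_ratio * core_scaling


if __name__ == "__main__":
    # Hypothetical upgrade: 3.0 GHz single core to 2.0 GHz quad core.
    print(relative_execution_time(3.0, 2.0, 4, 0.0))  # sequential program
    print(relative_execution_time(3.0, 2.0, 4, 1.0))  # fully parallel program
```

For this hypothetical upgrade, the model gives a ratio of 1.5 for a completely sequential program (execution takes 50 percent longer) and 0.375 for a completely parallel one, which illustrates why the outcome is ambiguous without knowing the program structure.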
Memory organization in multicore computing systems affects communication overhead and LabVIEW program execution speed. Common memory architectures are shared memory, distributed memory, and hybrid shared-distributed memory. Shared memory systems use one large global memory space, accessible by all processors, to provide fast communication. However, as more processors are connected to the same memory, a communication bottleneck between the processors and memory occurs. Distributed memory systems use local memory space for each processor and communicate between processors via a communication network, causing slower interprocessor communication than shared memory systems. Some systems use a hybrid shared-distributed memory architecture to exploit the benefits of both. Memory schemes have a significant effect on communication overhead and, as a result, on program execution speed in any language (LabVIEW, C, Visual Basic, and so on).
Physical distances between processors and the quality of interprocessor connections affect LabVIEW program execution speed through communication overhead. Multiple processors on separate ICs exhibit higher interprocessor communication latency than processors on a single IC. This results in larger communication overhead penalties, which lengthen LabVIEW program execution time. For example, in Figure 1, the dual-processor system (two sockets) on the left has higher latency than the single-chip, dual-core processor on the right.
LabVIEW program execution time on a multicore computer depends just as much on the program as on the computer running it. The program must be written in such a way that it can benefit from the unique environment presented on multicore systems. The degree of program parallelism has a large effect on program execution time, as do granularity (the ratio of computation to communication) and load balancing. A large amount of existing G code is written for sequential execution; however, this type of code likely has some inherent parallelism due to the nature of dataflow programming. As stated previously, tests of common LabVIEW program structures show, on average, a 25 to 35 percent improvement in execution time when moved from a single-core to a multicore system. The nature of an individual program, however, significantly affects this estimate. Optimizing a LabVIEW program for a multicore computing environment can result in large execution time reductions when upgrading to a multicore computing system.
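The effect of load balancing can be illustrated with a small model: when independent tasks run on several cores, the overall execution time is set by the most heavily loaded core. The sketch below compares assignments using a simple greedy heuristic (longest task first, each task placed on the least loaded core). This heuristic is an illustration chosen here, not a description of how LabVIEW or any operating system schedules work, and the task times are hypothetical.

```python
import heapq


def makespan(assignment):
    """Overall execution time: the total of the most heavily loaded core."""
    return max((sum(core_tasks) for core_tasks in assignment), default=0)


def greedy_balance(task_times, cores):
    """Assign tasks longest-first, each to the currently least loaded core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    heap = [(0, index) for index in range(cores)]  # (current load, core index)
    assignment = [[] for _ in range(cores)]
    for task in sorted(task_times, reverse=True):
        load, index = heapq.heappop(heap)  # least loaded core so far
        assignment[index].append(task)
        heapq.heappush(heap, (load + task, index))
    return assignment


if __name__ == "__main__":
    tasks = [4, 3, 3, 2]             # hypothetical task durations
    unbalanced = [[4, 3, 3], [2]]    # one core does most of the work
    balanced = greedy_balance(tasks, 2)
    print(makespan(unbalanced), makespan(balanced))
```

With these hypothetical tasks, the unbalanced assignment finishes in 10 time units and the balanced one in 6, even though both use two cores and perform the same total work. The model ignores communication cost, so it does not capture granularity.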
Organizing G code to increase execution speed is complicated when you do not know the hardware on which you are executing the program. Understanding the system a multicore program is running on is vital to achieving maximum execution speed. Multicore programming techniques require a more generic approach for systems with an unknown number of cores. This approach helps ensure some execution time reduction on most multicore machines but may hinder maximum execution speed on any specific system. Hardware-specific tuning of LabVIEW programs can be time-consuming and is not always necessary; however, it may be necessary if you require maximum execution speed on specific hardware. For example, to fully use an octal-core computing system, you can employ advanced parallel programming techniques such as data parallelism or pipelining. Additionally, you can take advantage of the number of cores on a system, the core layout (two dual-cores or one quad-core), the connection scheme, the memory scheme, and information about known bugs to achieve minimal program execution times on multicore systems.
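Pipelining, one of the techniques mentioned above, splits a sequential task into stages that run concurrently on successive data items. In LabVIEW this is expressed graphically; the following is only a minimal Python sketch of the structure, using one thread per stage connected by queues. The name `run_pipeline` and the stage functions are hypothetical, and the sketch does not handle exceptions raised inside a stage.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def run_pipeline(items, stages):
    """Pass each item through every stage; each stage runs in its own thread."""
    # One queue in front of each stage, plus one for the final results.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate shutdown to the next stage
                return
            outbox.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for thread in threads:
        thread.start()

    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        result = queues[-1].get()
        if result is _DONE:
            break
        results.append(result)
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2]))
```

Because each stage is a single thread reading from a first-in, first-out queue, results come out in input order. While the second stage processes one item, the first stage can already work on the next, which is the source of the throughput gain when stages run on separate cores.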
Bottlenecks in parallelism may arise at several levels of the software stack; avoiding this problem is a challenge in traditional languages such as C. An advantage of LabVIEW programming is the “multicore ready” software stack, which removes these bottlenecks up front. To realize the performance gains that are possible with multicore hardware, you must evaluate four layers of the software stack for multicore readiness: the development tool, libraries, device drivers, and the operating system. If these layers are not multicore ready, performance gains are unlikely, and performance degradation may occur. Table 1 shows how LabVIEW ensures a multicore-ready software stack.
Multicore systems with libraries and drivers that are not multicore ready or operating systems that cannot load balance tasks across multiple cores do not execute parallel LabVIEW programs faster.
You must consider a number of factors when determining the expected execution time of LabVIEW programs on multicore systems. Hardware and software issues outside of LabVIEW can hinder execution time improvements when you upgrade to a multicore system that is not properly configured. In addition, a LabVIEW program’s structure is often the main issue when considering multicore performance. LabVIEW helps minimize the factors you must address when upgrading to multicore computing systems by being multicore ready and providing simple and intuitive multicore programming capabilities and examples.