MCDM Filter with Pareto Parallel Implementation in Shared Memory Environment

Nowadays, multi-criteria decision-making (MCDM) methods are often used to solve problems involving large data sets, especially with the advent of the big data age. In such a context, MCDM methods can theoretically be used but are technically inefficient in terms of processing time. Indeed, the majority of commercial and even experimental multi-criteria decision support tools still have limits on the number of alternatives and the number of criteria that can be retained in the decision-making process, which presents a computational challenge. This paper discusses the application of parallel computation to meet this challenge and make the application of MCDM methods possible in the presence of a large number of alternatives and criteria. More precisely, the main objective of this work is to provide a parallel filtering mechanism that can be executed even on accessible personal computers while offering a short and reasonable response time. The introduction of a filter as a first step in the decision-making process consists in retaining, as the alternatives to be treated by the MCDM method, only the Pareto solutions, computed by parallel processing. To achieve this objective, we propose a parallel computing approach deploying the Open MP (Open Multi-Processing) paradigm on a shared-memory environment to find the Pareto solutions. To prove the effectiveness of the proposed approach for problems with large dimensionality, several numerical examples with different dimensions are examined.


Introduction
This article is an extension of our paper published at the international conference [1], in which we showed the importance of reducing the dimension of the decision matrix used by any MCDM method. We proposed a preliminary filter based on the Pareto front to eliminate low-potential alternatives at the beginning of the decision process, together with two algorithms to reduce the number of pairwise comparisons, knowing that in the context of big data the execution time of pairwise comparisons is very long [2]. In this paper, we propose a parallel implementation that plots the Pareto front using thread-based parallelism on a multi-core shared-memory computer.
The big data era evolves along different dimensions, such as volume, velocity, variety and veracity, all of which raise challenging tasks. Multi-criteria decision-making is one of the most important ways of selecting the most suitable decision among a large volume of data.
The general aim of decision making in the era of big data is to reduce large-scale problems to a scale that humans can comprehend and act upon. Decision-makers can no longer count on classical MCDM methods to analyze data sets, given the profusion of alternatives available.
Parallel computing is very promising for speeding up computation and meeting the growing demand for resources to deal with massive amounts of data; a few applications of parallel processing to multi-criteria decision making can be found in [5]. We attempt to usefully exploit all the available computing power, especially since all processors are now multi-core, and it is essential to take advantage of this increasingly available computing capability.
The rest of this paper is organized as follows: the next section introduces the background of multi-criteria decision making, the third section presents the parallel computing technology trend, and Section 4 illustrates the proposed approach. Finally, the last section summarizes the results and the computational performance reached, and provides conclusions as well as suggestions for future research.

Background
MCDA reduces human error by not relying solely on intuition and experience for effective decision-making. In this section, the fundamentals of MCDM are revisited, some related problems are discussed, and the parallel computing technology Open MP is introduced.

Multiple criteria decision analysis
An MCDA or MCDM problem consists of judging a set of alternatives based on their evaluations on a set of criteria. In MCDA, three decision problems are possible [6].
• The first allows the decision-maker (DM) to rank the alternatives from the best to the worst,
• The second consists of sorting the alternatives into predefined categories,
• The third allows the DM to select and choose the best alternative.
For the resolution of the third MCDA problem, several methods have been proposed in the literature, which can be classified into two main categories of approach [6] [7]. The first is the synthetic-criterion approach, where all the criteria are aggregated into a single criterion; as examples of these methods, we cite the Weighted Sum Method, the Weighted Product Method, Goal Programming, and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). The second is known as the outranking approach, where a binary relationship between the alternatives, called the outranking relationship, is built; as examples of methods of this approach, we cite the two popular methods ELECTRE (ELimination et Choix Traduisant la REalité) [7] and PROMETHEE (Preference Ranking Organization METHod for Enrichment of Evaluations) [8].
Any MCDA problem considers a set A of n alternatives, the candidate solutions, and a family F of m criteria, the points of view according to which the alternatives will be examined and compared. To take into account the differences in the relative importance of the criteria in the decision making, a set W of weights is also provided. All the data of a decision problem are summarized in a decision matrix, where:
• F = (g_1, g_2, ..., g_j, ..., g_m) is the family of m criteria, m ≥ 2;
• A = (a_1, a_2, ..., a_i, ..., a_n) is the set of alternatives;
• g_j(a_i) is the evaluation of the alternative a_i on the criterion g_j;
• Min indicates that the criterion is to be minimized, and Max that it is to be maximized.

Problem and restriction
In the outranking approach, the construction of relations between the alternatives is an operation whose complexity depends both on the size of the set (the number of alternatives to be considered) and on the number of criteria taken into account. The main restriction of this approach resides in the large number of pairwise comparisons between alternatives required to construct the outranking relationship. This makes the decision process, especially on massive data, very costly in computation time and memory.
As far as we know, few references can be found on parallel implementations of multi-criteria decision support methods [9], and even fewer with the Open MP paradigm. Some works focus on very specific methods: ELECTRE III [10], AHP [11], PROMETHEE [12]. They propose parallel implementations that require a very complex and expensive IT hardware infrastructure.
In [1] we have proposed a revised MCDA approach where we suggest filtering all candidate alternatives and keeping only the most relevant. For that, we propose to reject all the dominated alternatives, according to the applied dominance relation, and keep only the alternatives that are not dominated by any other alternative. The filter consists of finding the actions belonging to the Pareto front to be retained for the rest of the decision process. This operation requires several pairwise comparisons that are very computationally intensive for a large set of alternatives.
Parallel computing on modern computers shows excellent promise for speeding up computing. The key problem is how to convert the sequential algorithms proposed in [1] to parallel algorithms that properly exploit the available resources.

The Pareto-optimal solution
In the literature [13], there are two definitions to express the optimal situation in the Pareto sense:
• P1: Weak definition,
• P2: Strong definition.
Let A be a set of alternatives of size n, F the set of m criteria, and P the set of Pareto-optimal solutions, called the Pareto front. Under the strong definition P2, an alternative a of A is a Pareto-optimal solution, and is therefore retained in the Pareto front P, if and only if no other alternative of A dominates it. In this paper, we use the strong definition P2 to define the Pareto-optimal solutions; the weak definition is too rigid, and it can give an empty set of elements.

The dominance relationship
The dominance relationship is denoted a∆b and reads: a dominates b, and b is dominated by a. For criteria to be maximized, a∆b holds when a is at least as good as b on every criterion and strictly better on at least one, i.e., g_j(a) ≥ g_j(b) for every criterion g_j, with g_j(a) > g_j(b) for at least one of them.

Example of the Pareto Front
To illustrate the definitions of Pareto-optimal solutions, non-Pareto-optimal solutions, and the Pareto front, we give the case presented in figure 1: an MCDA problem where two criteria, g1 and g2, are to be maximized. In this example, all the choices located on the red curve are Pareto-optimal solutions, in the sense that no other choice is better on both criteria g1 and g2. For instance, choices a and b are Pareto-optimal solutions because no other alternative dominates them. However, choice c is not a Pareto-optimal solution because it is dominated by at least a and b: a∆c and b∆c. As this example illustrates, the choices in gray are not Pareto-optimal solutions because some choices on the Pareto front dominate them. As proposed and recommended by this work, the investigation of the best choices is restricted to the Pareto front, which means that all non-Pareto-optimal options are excluded from the next step of the decision process.

Parallel computing and the evolution of multicores
The use of parallel processing is today essential for solving practical problems in science and engineering. Parallelism is a way of speeding up computations that take a long time and require large amounts of memory, especially for massive data.
In the past, for a very long period, programmers did not worry about execution time; they relied on evolving hardware and improving CPU speed. Moore's law [14] states that processing power doubles every 18 months, and it held throughout the 1990s. This was the result of improvements in the transistor count per die area, the main attribute on which Moore based his law, together with increases in the clock speed (the number of instructions executed per time unit) and in instruction-level parallelism (ILP), basically the possibility of performing more than one operation within the same clock cycle. Unfortunately, this source of performance gain is over [15]. Beginning in 2003, the laws of physics put an end to the increase in clock speed, and in recent years CPU manufacturers have started selling CPUs with more computational cores instead of faster CPUs. The principal reason is that doubling the clock speed requires halving the distance travelled by the electrical signal per clock cycle, which requires the physical size of the CPU to be halved as well [16]. However, reducing the physical dimensions of CPUs is limited by the diffraction limits of the lithographic methods used for chip manufacturing [17]. Other techniques can at least partly compensate for the limited increase in clock speed, for example sophisticated ILP schemes, which, apart from the gate count, are nowadays the main basis for performance improvement. These techniques are what manufacturers focus on today, resulting in feature-rich CPUs that are additionally equipped with an increasing number of computational cores.
Over the last decade, with the multi-core revolution, major processor vendors have shifted from increasing clock speeds to adding on-chip parallelism support with multi-core processors, in other words putting more than one CPU core on a single chip [18]. This effectively makes a system with a dual-core processor operate like a dual-processor computer, and a system with a quad-core processor operate like a quad-processor computer.
Based on the classification of computer memory architectures, there are shared-memory computers, distributed-memory computers (clusters), and hybrid supercomputers. To each type of parallel architecture correspond one or more programming paradigms, with associated languages [19]:
1. Distributed-memory computer: each processor operates independently on its own local memory, so a communication network is required to connect inter-processor memory. The memory is scalable with the number of processors, but data communication between processors is more complicated. MPI is a communication protocol for parallel programming on distributed-memory architectures; it allows applications to run in parallel across several separate computers connected by a network.
2. Shared-memory computer: all processors operate independently but share the same global memory resources. A change made to a memory location by one processor is visible to all the other processors. Data sharing between tasks is therefore both fast and uniform, due to the proximity of the memory to the central processing units (CPUs), but the scalability between memory and CPUs is relatively poor. Open MP is a development paradigm for this type of architecture.
Moreover, to take advantage of multi-core power, we must adapt our programs so that their execution is distributed over the available cores. This tendency motivated us to favor a parallel implementation based on the cooperation of threads to filter the initial set of alternatives. Since the majority of laptops today are shared-memory computers, we chose the Open MP paradigm.

Open MP paradigm
The Open MP standard was formulated in 1997 as an API for writing portable, multithreaded applications [20]. It started as a Fortran-based standard but later grew to include C, C++, and others. The current version is Open MP 5.1 [21]. Open MP is among the most widely used parallel programming APIs and is largely directive-based, making it one of the most accessible.

Parallelism in Open MP
Open MP supports the fork-join model of parallel computing: at particular points in the execution, the master thread spawns a number of threads that execute concurrently to gain a performance benefit. An Open MP program begins with a single thread, the master thread. As the program executes, it may encounter parallel regions, in which the master thread creates a thread team. At the end of a parallel region, the threads of the team are joined and the master thread continues execution alone, see figure 2. Within a parallel region there can be nested parallel regions, where each thread of the original parallel region becomes the master of its own thread team; nested parallelism can continue to further nest other parallel regions [20]. The fork-join model is a method of programming parallel machines in which one or more child processes branch out from the root task when it is time to do work in parallel and end when the parallel work is done [22].
The principle is as follows: directives are introduced into a sequential program, and these help the compiler build parallel code. Open MP is not a language in itself: it provides a set of directives (pragmas), routines and environment variables (figure 3 illustrates the Open MP environment). The Open MP API for C/C++ contains a large number of directives and constructs [20]. Among the most important directives, we cite:
• #pragma omp parallel: defines a parallel region;
• #pragma omp for: tells Open MP that the iterations of the for loop, when reached from a parallel region, should be divided among the thread team;
• #pragma omp sections: work sharing among blocks of instructions;
• #pragma omp single: execution by a single thread.

Methodology
The primary novelty and contribution of this study is to introduce into the decision process a preliminary filtering step based on parallel computation, so as to make the decision-making process practical and operational for studying several scenarios of massive structured-data problems.
We propose a revision of the decision process that considers the scenario of DMs facing a large number of alternatives (large amounts of data). The contributions of this paper can be mainly summed up in two aspects:
1. Introduce a filter based on the Pareto front as a first step in the decision process. The proposed filter reduces a large set of alternatives to a smaller set that most likely contains the best choice.
2. Apply Open MP, the dominant shared-memory parallel programming model in high-performance computing, to filter the alternatives with the Pareto front. The parallel implementation significantly reduces the execution time compared to the previous sequential implementation.

Solution Implementation
The Pareto filter proposed in [1] is a good candidate for parallelization for several reasons: the large number of iterations required by the pairwise comparisons, and the independence between these iterations, which compare, for each pair of alternatives, the group of criteria taken into consideration. The idea is to execute several pairwise comparisons in parallel. We chose to parallelize the section that consumes the most execution time in the sequential algorithm and to limit communication overhead by using the Open MP directives.
Large Open MP parallel regions are used because fragmented parallel regions would increase the overhead of creating and terminating threads. The proposed program begins as a single execution thread, the master thread. When this thread meets a parallel section, it creates a team of threads consisting of the initial thread itself, which becomes the team's master, and additional worker threads. All the members of the team collaborate to update the dominance matrix. At the end of the parallel section, the further execution of the code is performed only by the master thread.
The parallel program is written in a way that facilitates the automatic parallelization of loops with no data dependencies through compiler directives. The pragma omp parallel is used to fork additional threads to carry out the work.
For example, the algorithm proposed in [1] contains two nested regions (see below: loops framed in red and blue) which are candidates for parallel execution. On these two regions, we have included the pragma omp parallel for and indicated the number of threads to use:

Experiments and evaluation
In this section, we assess the performance of the parallel implementation of the filter with Open MP. The platform for conducting the experiments is an Intel Core i7 2.60 GHz laptop with 4 physical cores and 8 logical cores. The program was implemented in C with Open MP 5.1. We conducted several experiments to measure the advantages of using Open MP. Our empirical study focused on the following parameters:
• the problem dimension, i.e., the number of alternatives;
• the number of threads used.
The decision matrix used is based on random numbers, with a high number of alternatives (200 to 20,000) and 10 criteria. We collected several measures to evaluate the performance of the filter. The results show that parallel computing can speed up the computation, mainly for a large number of alternatives, but it runs at maximum efficiency only up to an overhead saturation point, reached when handling a large number of threads takes too long and slows down the improvement in execution time, see figure 4. The curve in figure 5 shows the variation of the computation time according to the number of threads. There are three distinct parts on the graph: -For 1 to 8 threads, the gain in computing time is very close to the theoretical gain; the parallelism is excellent. The threads are all processed by processor cores, and the system behaves like a computer with 8 processors.
-Between 9 and 12 threads, the curve no longer follows the same trend, but increasing the number of threads is still worthwhile: from 9 threads onward, hyper-threading technology is used to manage the additional threads.

-Beyond 12 threads, the result is stable: managing the threads takes too long and slows down the improvement in execution time. The curve in figure 6 shows the variation of the speedup achieved with 20,000 alternatives according to the number of threads used. A green line represents the theoretical speedup.
The speedup roughly follows the theoretical curve, but it starts to deteriorate from 20 threads onward (thread overhead). Adding more threads usually helps, but after some point they cause performance degradation. The right number of threads depends on the size of the decision matrix and on the architecture the program runs on.

Conclusion
In this paper, we have proposed a parallel approach based on Open MP to filter the Pareto front from a large set of alternatives. The nature of the program and the features offered by Open MP made parallelization accessible and straightforward, although some modifications had to be made to the algorithm to reach the level of efficiency we achieved. The proposed implementation performs very well even as the problem dimensionality increases. It can be widely used in real-world decision-making cases, and it can be incorporated into the robustness analysis of the multi-criteria decision-making process to generate results in a reasonable time.
High-performance parallel computing is a very promising technology to speed up computation and facilitate the decision process. On the horizon there are radically new solutions, such as quantum computing and optical computing, which possess potential for future parallel computing. All this progress is tightly connected with the development of parallelization models and algorithms for these systems; without effort in this field, the computational power of the super-modern computers available today cannot be exploited.
In our future work, we want to investigate the possibility and the gain of a parallel implementation using GPU accelerators alongside CPUs to speed up the decision process, so as to take advantage of technological advances in hardware and modern development paradigms, especially with the emergence of computers equipped with high-performance GPUs.