Improved View Selection Algorithm Using SOM and 0/1 Knapsack

A data warehouse is designed to answer analytical queries over historical data. The response time of analytical queries in a data warehouse is long, so reducing it is a critical problem. Many algorithms address this problem, and some of them materialize frequently used views. Previously posed queries carry important information about the queries that will be posed in the future. This paper proposes a view materialization algorithm that finds suitable views from previous queries and materializes them so that they can answer future queries. The view selection algorithm has four steps. First, it clusters the previous queries with the SOM method. Then the frequent queries are found with the Apriori algorithm. In the third step, the problem is converted into 0/1 knapsack equations, and finally the optimal queries are joined to create a single view for each cluster. This paper improves the first and third steps: it uses the SOM algorithm to cluster the previous queries in the first step, and it solves the 0/1 knapsack equations with the shuffled frog leaping algorithm in the third step. Experimental results show that it improves on previous view selection algorithms in terms of response time and storage space.


Introduction
Nowadays a large amount of data is produced in the world. Despite many advances in technology, collecting and storing these data remains a major problem, and high-dimensional problems arise as large amounts of data are produced [1]. Managers use these data for decision making, and advances in computer technology make it easier to store large databases [2]. There are two methods of accessing the data: on-demand and in-advance. In the on-demand method, data are collected from different databases when users' queries are executed. In the in-advance method, data are first collected in the data warehouse and analytical queries are then answered from it [3]. A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile source of data used for decision making [4]. Subject-oriented means that the data warehouse is built around a specific subject and focuses on modeling and analyzing data rather than on daily operations. Integrated means that the data warehouse is created by integrating data from different databases, which are usually heterogeneous. Time-variant means that the data in a data warehouse relate to a period of time, such as 5 or 10 years. Nonvolatile means that the data are not deleted by themselves [4].
Data warehouses are very important for decision making: they integrate data from various sources and store them in one place for business managers [5].
Data in a data warehouse are integrated from different sources to support decision making, so the warehouse must be able to answer users' analytical queries. The response time of OLAP queries is an important factor, and data warehouses use materialized views to reduce it. Because storage space is limited, it is impossible to materialize all views, so some views must be selected for materialization. Many view selection algorithms exist. One of them materializes profitable views according to previous queries, since previous queries carry the main information about the queries most likely to occur in the future. In that algorithm, previous queries are clustered and frequent queries are found in each cluster; optimal queries are then selected in each cluster according to the limited storage space, and finally the optimal queries are joined to create one view for each cluster [6]. Its response time is long because it uses basic techniques to find the optimal queries in each cluster. Since response time is the main factor in view selection algorithms, this paper proposes an algorithm that addresses this problem and is faster than the other algorithms simulated in this paper [6,7].
A data warehouse is subject-oriented, so most of the previous queries posed on it relate to one subject. It is therefore better to materialize views according to the previous queries posed on the data warehouse, because these queries contain useful information about its subject. However, not all previous queries carry important information. Queries that access the same data and are repeated more often than others are more significant, because they are more likely to occur again in the future.
In this paper, an improved view selection algorithm is proposed. Following the reasoning of the previous paragraph, the algorithm uses previous queries. It clusters the previous queries to group similar ones and then finds the frequent queries in each cluster, which yields queries that are both similar and frequent. Because of storage limitations, not all views can be materialized, so in the next step the optimal queries are selected according to the available space. Finally, the optimal queries are merged to produce one view for each cluster. This information helps improve query performance and thus simplifies decision making.
The main contributions of the proposed algorithm are in the first and third steps. In the first step, previous queries are clustered with a SOM neural network [8,9], which has a shorter response time than the clustering method used previously. In the third step, the problem is modeled as a 0/1 knapsack problem based on storage space, and the shuffled frog leaping algorithm [10] is used to solve the knapsack equations. This algorithm is designed to escape local optima and also has a shorter response time.
The algorithm described in [6] and the proposed algorithm were both simulated, and the results show that the proposed algorithm reduces response time and storage space: it achieves a 14.22% improvement in the time factors and a 45.78% improvement in the storage space factor.
The rest of the paper is structured as follows: Section 2 reviews related work, Section 3 describes the proposed algorithm, Section 4 presents the simulation results, and the last section concludes the paper.

Related works
Most view selection algorithms materialize views by maximizing profit. One of them is the greedy algorithm [11,12], which materializes optimal views under a storage space limit. However, these algorithms do not consider all parameters, such as maintenance cost.
Another group consists of genetic algorithms [13,14,15,16]. These algorithms select more appropriate views than the others, but their response time is long for huge data warehouses.
Some algorithms take a workload as input, that is, they use previous queries, because future queries are expected to be very similar to previous ones. Some of them find similar sub-expressions in the workload and materialize them [17,18], but finding sub-expressions takes a long time.
Some algorithms model the view selection problem as a graph built from the input queries, such as an AND-OR graph or a multiple view processing plan (MVPP), and materialize views according to both a cost function and the modeled graph [19,20,21]. However, these algorithms do not consider some parameters, such as the complexity of the query.
One algorithm materializes views by maximizing profit and minimizing cost [22]. The factors in its profit function include the access frequency of the query, query execution time, query complexity, query processing cost and view maintenance cost. A DAG is then constructed according to the cost function, with views as edges, and the shortest path in the graph is found; in this way, views with maximum cost are selected for materialization. The algorithm takes many factors into account, but it ignores some, such as the cost of dropping a materialized view.
Another view selection algorithm predicts the next query and materializes the related views according to the prediction [23]. However, the prediction is not always reliable, so much time is wasted when it is wrong.
Some algorithms use mathematical modeling: they convert the view selection problem into mathematical equations and materialize views by solving them. Some of these model the problem as a Constraint Satisfaction Problem [24,25,26] or as integer programming [27,28]. This approach is powerful because it is grounded in mathematics, but modeling the problem requires assuming some constraints, which makes the problem unrealistic.
The algorithm proposed in [29] improves the greedy algorithm. In addition to the greedy factors, it uses the size of the view, the frequency of the view and the decision-making capability of views [29]. The algorithm works on the lattice of cuboids.
The algorithm introduced in [30] is another improvement of the greedy algorithm. It uses a table-like structure and a cost model, but its cost function considers only query processing and view maintenance costs.
The algorithm proposed in [31] uses a transactional database: the input queries are transformed into a transaction table in which each transaction corresponds to a query and its itemset consists of the original query's predicates.
The algorithm proposed in [32] uses the backtracking search optimization algorithm to materialize views. It minimizes the query processing cost within the storage constraint, but as the number of branches increases, it needs a long time and has high space complexity because of multiple function calls. The query prioritization algorithm [33] considers query priorities: each query has a priority value indicating how urgently it must be answered. However, if a query's priority is assigned incorrectly, the algorithm's response time is wasted.
View materialization can be combined with cloud computing: [34] presents a cloud-based view materialization algorithm to improve data warehouse performance, but cloud computing is very expensive. Some papers use clustering and data mining methods [6,7,35,36,37]. These algorithms use previous queries because those queries are likely to be posed again in the future, and they follow the MVCF method, which has four steps:
a) First, queries are clustered using clustering methods.
b) Then frequent queries are found in each cluster using data mining methods.
c) Then optimal queries are found in each cluster.
d) Finally, the optimal queries are merged to produce only one view for each cluster.
In [6,7,35,36,37], basic techniques such as hierarchical clustering [7] are used for clustering, the 0/1 knapsack model is used to find the optimal queries in each cluster, and dynamic programming is used to solve the knapsack problems. This solution is not suitable for the view selection problem because of its long response time, so it cannot be used for huge data warehouses. In this paper, the SOM method [8,9] is used to cluster previous queries and the shuffled frog leaping algorithm [10] is used to solve the 0/1 knapsack equations, improving the performance of the view selection algorithm.

The proposed view selection algorithm
In this section, the proposed view selection algorithm is explained.This algorithm uses the MVCF method that was explained in the related works.Figure 1 shows the steps in general.
This algorithm uses previously posed queries, because they contain essential information that will most probably be needed in the future. The proposed algorithm improves the view selection algorithm of [6,7,35,36,37]. The algorithm described in [6,7] is called KD2013 after its authors' names and year of publication, and the proposed algorithm is called SRTTU-SF because it uses SOM and the shuffled frog leaping algorithm. The proposed algorithm works as follows. First, previous queries are clustered by a SOM neural network [8,9]. A SOM network has input and output nodes. The number of input nodes equals the number of dimensions in the data warehouse, so each input node corresponds to one dimension, and the number of output nodes equals the number of clusters. Each input node is connected by an edge to every output node, and each edge has a weight between -1 and 1; the weights are generated randomly on the first run of the algorithm. Figure 2 shows an example of SOM with 4 input nodes and 2 output nodes, which means that the data warehouse has 4 dimensions and there are 2 clusters. As shown in Figure 2, each edge has a weight. Equation 1 shows an example of weights.
After the distances are calculated, the minimum distance is selected and the weight matrix of the corresponding output node is updated. For example, if the distance between $x$ and $W_1$ is 1.67 and the distance between $x$ and $W_2$ is 0.9, the minimum corresponds to $W_2$, so the weight matrix $W_2$ is updated. The update function is given in Equation 3.
In this equation, $W_j$ is the weight matrix of output node $j$, the node whose distance to $x$ is minimal, so $W_j$ is updated by Equation 3; $\eta$ is the learning rate, with a value between 0 and 1. After $W_j$ is updated, the next training query is analyzed: its distances are calculated and the weights are updated, and this process is repeated for all training queries. For convergence, the whole procedure is repeated several times; the number of repetitions is called the epoch. When training finishes, the weight matrices are known. To decide which cluster a query belongs to, each input query is represented as an $n \times 1$ matrix, the distance between it and each weight matrix is calculated, and the query is assigned to the cluster with the minimum distance. The number of clusters equals the number of output nodes. Algorithm 1 shows the pseudocode the proposed algorithm uses to find the cluster of each query.
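The clustering step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: queries are encoded as binary vectors ($x_i = 1$ when the query's FROM clause contains dimension $i$, as described for the SOM training data), weights start uniformly in $[-1, 1]$, and Euclidean distance selects the winning output node. Equation 3 is not reproduced in this text, so the update `W_j += eta * (x - W_j)` is the standard Kohonen winner-take-all rule and is only assumed to match it. The helper names are hypothetical; the defaults `eta=0.01` and `epochs=10` are the tuned values reported in the simulation results.

```python
import math
import random

def encode_query(query_dims, n_dims):
    # x[i] = 1 if dimension i appears in the query's FROM clause, else 0.
    return [1 if i in query_dims else 0 for i in range(n_dims)]

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def winner(x, weights):
    # Output node (cluster) whose weight vector is closest to x.
    return min(range(len(weights)), key=lambda j: euclidean(x, weights[j]))

def train_som(queries, n_clusters, eta=0.01, epochs=10, seed=0):
    """Winner-take-all SOM: only the winning node is updated, using the
    assumed Kohonen rule W_j <- W_j + eta * (x - W_j)."""
    rng = random.Random(seed)
    n_dims = len(queries[0])
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_dims)]
               for _ in range(n_clusters)]
    for _ in range(epochs):          # one epoch = one pass over all training queries
        for x in queries:
            j = winner(x, weights)
            weights[j] = [w + eta * (xi - w) for w, xi in zip(weights[j], x)]
    return weights

def assign_clusters(queries, weights):
    # Each query joins the cluster of its nearest weight vector.
    return [winner(x, weights) for x in queries]

# Figure 2 setting: 4 dimensions (input nodes) and 2 clusters (output nodes).
training = [encode_query(d, 4) for d in ({0, 1}, {0, 1}, {2, 3}, {2, 3}, {0, 2})]
W = train_som(training, n_clusters=2, eta=0.5, epochs=20)
labels = assign_clusters(training, W)
```

The toy run uses a larger learning rate than the tuned 0.01 only so that a handful of training queries moves the weights noticeably; identical queries always land in the same cluster because assignment depends only on the final weights.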
As described in Algorithm 1, queries are clustered by the SOM neural network, where epoch is the number of repetitions. In the next stage, the frequent queries in each cluster are found with the Apriori algorithm [7]. Then the optimal queries in each cluster are found [6], selected according to the limited storage space. The problem resembles a knapsack: the limited storage space plays the role of the knapsack's capacity, so not all queries can be put into the knapsack and some of them must be selected. In our problem, optimal queries must be selected under the storage space limitation [6]. Suppose that $Q_1, Q_2, \ldots, Q_n$ are the frequent queries in cluster $k$. Two variables, $P_i$ and $S_i$, are defined for each query $Q_i$. The result of executing $Q_i$ is a table called $T_i$, and the number of records of $T_i$ is stored in $P_i$. $Q_i$ references some tables in its FROM clause; the number of records of these tables is calculated and stored in $S_i$. Thus both $P_i$ and $S_i$ are calculated for each $Q_i$, and the 0/1 knapsack equations are constructed according to Equation 4.
$S_i$ is defined according to Equation 5.
The value of $Q_i$ is either zero or one: it is one if the $i$-th query is selected and zero otherwise [37]. Equation 6 shows an example of the knapsack equations. The knapsack equations are solved with the shuffled frog leaping algorithm to obtain the values of $Q_i$. The shuffled frog leaping algorithm is explained first, then the modified version used to solve the 0/1 knapsack problem, and finally our proposed algorithm.
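For a cluster as small as the Equation 6 example (five queries), the knapsack can also be checked exhaustively. The sketch below enumerates all $2^d$ selections; it is not the paper's method (which uses the shuffled frog leaping algorithm) but gives a reference optimum against which a heuristic's result can be compared. It takes the profit coefficients (13, 15, 28, 35, 32), the weights (120, 250, 430, 310, 360) and the capacity 1000 from the Equation 6 example; the function name is hypothetical.

```python
from itertools import product

def knapsack_brute_force(profits, weights, capacity):
    """Exact 0/1 knapsack by enumerating all 2^d selections (small d only)."""
    best_bits, best_profit = None, -1
    for bits in product((0, 1), repeat=len(profits)):
        weight = sum(b * w for b, w in zip(bits, weights))
        if weight > capacity:
            continue                      # infeasible: exceeds storage space
        profit = sum(b * p for b, p in zip(bits, profits))
        if profit > best_profit:
            best_bits, best_profit = bits, profit
    return best_bits, best_profit

# Equation 6: queries Q1, Q7, Q16, Q17, Q20 of one cluster.
names = ["Q1", "Q7", "Q16", "Q17", "Q20"]
profits = [13, 15, 28, 35, 32]
weights = [120, 250, 430, 310, 360]
bits, profit = knapsack_brute_force(profits, weights, 1000)
print([n for n, b in zip(names, bits) if b], profit)  # ['Q7', 'Q17', 'Q20'] 82
```

Exhaustive search costs $O(2^d)$, which is why a metaheuristic is needed once a cluster holds more than a few dozen frequent queries.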

Shuffled frog leaping algorithm
The shuffled frog leaping algorithm [10] is based on the group behavior of frogs searching for the location with the most food. In this algorithm, the population consists of a set of frogs (solutions) and is partitioned into subsets called memeplexes. Each memeplex has a different culture from the others, and each frog in a memeplex holds an idea that can be influenced by the other frogs in that memeplex. The ideas are improved by a memeplex evolution procedure, and after the evolution procedure has run for a given number of steps, ideas are shared between memeplexes. The local search inside each memeplex and the sharing of ideas between memeplexes continue until the termination condition is met. The algorithm is shown in Algorithm 2.
An initial population of frogs is generated randomly. For a problem with $S$ dimensions ($S$ variables), frog $i$ is represented as $X_i = (X_{i1}, X_{i2}, \ldots, X_{iS})$. The frogs are then sorted by fitness in descending order and divided into $m$ memeplexes of $n$ frogs each, so the population size is $P = m \times n$. The first frog is allocated to the first memeplex, the second frog to the second memeplex, the $m$-th frog to the $m$-th memeplex, the $(m+1)$-th frog to the first memeplex again, and so on. In each memeplex, the frog with the best fitness is denoted $X_b$ and the frog with the worst fitness $X_w$; across all memeplexes, the frog with the best fitness is denoted $X_g$.

Algorithm 2 Shuffled frog leaping algorithm [10]
procedure SHUFFLED FROG LEAPING ALGORITHM
    Input: the number of frogs P; the number of memeplexes m; the number of generations for each memeplex before shuffling n; the number of shuffling iterations it; and the maximum number of iterations iMax.
    Output: best solution
    Generate a random population of P solutions (frogs)
    for each individual i ∈ P do
        Calculate fitness(i)
    end for
    Sort the population P in descending order of fitness
    Divide P into m memeplexes
    for each memeplex do
        Determine the best and worst frogs
        Improve the worst frog position
        Repeat for a specific number of iterations
    end for
    Combine the evolved memeplexes
    Sort the population P in descending order of fitness
    if termination = true then
        Return the best solution
    end if
end procedure

$X_w$ is then improved in a loop; Equations 7 and 8 give the improvement function of $X_w$. $D_i$ is the change in the position of the $i$-th frog, and Equation 8 gives the new position of $X_w$. Rand() generates a random number between zero and one, and $D_{max}$ is the maximum allowed change in a frog's position. If the procedure produces a better position, $X_w(new)$ replaces $X_w$; otherwise, $X_b$ is replaced with $X_g$ in Equations 7 and 8 and the equations are applied again. If $X_w(new)$ is still not better than $X_w$, a new solution is generated randomly. The procedure is repeated for a specific number of cycles. The next subsection explains the modified shuffled frog leaping algorithm used to solve the 0/1 knapsack problem.
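The partition and worst-frog improvement described above can be sketched on a continuous toy problem. Equations 7 and 8 are not reproduced in this text, so the sketch assumes the standard SFLA update from the literature, $D = \text{Rand}() \cdot (X_b - X_w)$ clipped to $\pm D_{max}$ and $X_w(new) = X_w + D$. It shows the round-robin split into memeplexes and the three-stage improvement (try $X_b$, then $X_g$, then a random frog). All function names are hypothetical.

```python
import random

def partition(frogs, fitness, m):
    """Sort by fitness (descending) and deal round-robin: frog 1 -> memeplex 1,
    ..., frog m -> memeplex m, frog m+1 -> memeplex 1, and so on."""
    ranked = sorted(frogs, key=fitness, reverse=True)
    return [ranked[k::m] for k in range(m)]

def improve_worst(memeplex, fitness, x_g, d_max, new_random, rng):
    """One improvement step for the worst frog X_w of a memeplex (in place)."""
    memeplex.sort(key=fitness, reverse=True)
    x_b, x_w = memeplex[0], memeplex[-1]
    for leader in (x_b, x_g):                 # first X_b, then fall back to X_g
        d = [max(-d_max, min(d_max, rng.random() * (lead - w)))
             for lead, w in zip(leader, x_w)]
        candidate = [w + di for w, di in zip(x_w, d)]
        if fitness(candidate) > fitness(x_w):
            memeplex[-1] = candidate
            return
    memeplex[-1] = new_random()               # neither leader helped: random frog

# Toy usage: maximize f(x) = -(x0^2 + x1^2) with 12 frogs in 3 memeplexes.
rng = random.Random(0)

def f(x):
    return -(x[0] ** 2 + x[1] ** 2)

def rand_frog():
    return [rng.uniform(-5, 5), rng.uniform(-5, 5)]

frogs = [rand_frog() for _ in range(12)]
x_g = max(frogs, key=f)
memeplexes = partition(frogs, f, m=3)
for mp in memeplexes:
    improve_worst(mp, f, x_g, d_max=2.0, new_random=rand_frog, rng=rng)
```

Only the worst frog of each memeplex is replaced, so the best frog of the first memeplex is still the global best $X_g$ after one round.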

Modified shuffled frog leaping algorithm for solving 0/1 knapsack
The shuffled frog leaping algorithm cannot solve the knapsack problem directly, so it is modified for this purpose [10]. The algorithm converges, but it sometimes falls into a local optimum; to address this, we use a function that shuffles the frog population. The changes to the shuffled frog leaping algorithm are as follows:

Producing initial population
Each frog is represented as an $n$-bit number, where $n$ is the dimension of the knapsack problem.
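Under this encoding, the initial population is simply a set of random bit vectors. A minimal sketch with a hypothetical helper name, where bit $i$ = 1 means that query $i$ of the cluster is selected:

```python
import random

def initial_population(P, n, seed=None):
    """P random frogs, each an n-bit 0/1 vector (bit i = 1 selects query i)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(P)]

pop = initial_population(P=8, n=5, seed=42)
```

`random.randint(0, 1)` includes both endpoints, so each bit is 0 or 1 with equal probability.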


Variables discretization
Equation 8 must be discretized because the 0/1 knapsack problem is discrete. Equation 9 shows the discretization of the $X_w(new)$ position.
The variable $D$ is defined in Equation 7 and is a constant.
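Equation 9 itself is not reproduced in this text, so the following is only one plausible discretization, assumed for illustration: the continuous new position $X_w + D$ is passed through a sigmoid, and each coordinate becomes 1 when the sigmoid exceeds a threshold `alpha`. The simulation results tune an α parameter (0.6) that this section does not define; treating it as this threshold is an assumption. With `alpha` between 0.5 and about 0.73, a bit that already agrees with the leader is left unchanged.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def discretize(position, alpha):
    # Assumed rule: bit i is 1 when sigmoid(position[i]) exceeds alpha.
    return [1 if sigmoid(v) > alpha else 0 for v in position]

def binary_step(x_w, leader, d_max, alpha, rng):
    """Move the worst frog towards a leader (D clipped to +-d_max), then discretize."""
    d = [max(-d_max, min(d_max, rng.random() * (b - w))) for b, w in zip(leader, x_w)]
    return discretize([w + di for w, di in zip(x_w, d)], alpha)

rng = random.Random(1)
print(discretize([1.0, 0.0, -1.0, 2.0], alpha=0.6))   # [1, 0, 0, 1]
```

When $X_w$ already equals the leader, $D$ is zero, so `sigmoid(1) ≈ 0.73` keeps ones and `sigmoid(0) = 0.5` keeps zeros for `alpha = 0.6`.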

Constrained optimization
Constrained problems are harder to solve than unconstrained ones, because a balance must be found between finding optimal solutions and satisfying the constraints. One approach is to use a repair method. The repair method used in this paper works as follows: first, all items are sorted in descending order of their profit-to-weight ratio; then the repair method deletes the last selected item in this order. The repair method is called whenever the total weight of a solution exceeds the knapsack capacity.
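The repair rule can be sketched as follows, assuming that the lowest-ratio selected item is dropped repeatedly until the solution fits (the rule is invoked whenever the weight exceeds the capacity). Applied to the Equation 6 coefficients with every query selected, it removes $Q_7$ and then $Q_{16}$. The repaired solution is feasible but not necessarily optimal. The function name is hypothetical.

```python
def repair(bits, profits, weights, capacity):
    """Greedy repair: while the selection is overweight, drop the selected item
    with the lowest profit/weight ratio (the end of the descending order)."""
    bits = list(bits)
    total = sum(w for b, w in zip(bits, weights) if b)
    # Visit indices in ascending profit/weight order, i.e. lowest ratio first.
    for i in sorted(range(len(bits)), key=lambda k: profits[k] / weights[k]):
        if total <= capacity:
            break
        if bits[i]:
            bits[i] = 0
            total -= weights[i]
    return bits

profits = [13, 15, 28, 35, 32]        # Equation 6: Q1, Q7, Q16, Q17, Q20
weights = [120, 250, 430, 310, 360]
print(repair([1, 1, 1, 1, 1], profits, weights, 1000))   # [1, 0, 0, 1, 1]
```

The repaired selection $\{Q_1, Q_{17}, Q_{20}\}$ weighs 790 with profit 80.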

Genetic mutation
The shuffled frog leaping algorithm sometimes gets trapped in a local optimum. To address this, a genetic mutation is used: it changes the population slightly so that the search also visits other optima.
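The text does not specify the mutation operator or its rate. A minimal bit-flip mutation is shown below, assuming that each bit of each frog flips independently with a small probability `rate`. The function name and the rate are hypothetical.

```python
import random

def mutate(population, rate, rng):
    """Return a new population in which every bit flips with probability `rate`."""
    return [[1 - b if rng.random() < rate else b for b in frog] for frog in population]

rng = random.Random(7)
pop = [[1, 0, 1, 1], [0, 0, 1, 0]]
mutated = mutate(pop, rate=0.1, rng=rng)
```

Because `rng.random()` returns values in $[0, 1)$, a rate of 0 leaves the population unchanged and a rate of 1 flips every bit.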

Termination condition
As shown in Algorithm 2, $iMax$ is the maximum number of iterations of the algorithm. If the termination condition is a fixed number, it may be satisfied before the algorithm converges, so the termination condition should satisfy $\lceil iMax/20 \rceil \le \Delta \le \lceil iMax/10 \rceil$.
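The text does not say what $\Delta$ counts. A common reading, assumed here, is a stall window: stop when the best fitness has not improved during the last $\Delta$ iterations, with $\Delta$ chosen between $\lceil iMax/20 \rceil$ and $\lceil iMax/10 \rceil$ and $iMax$ as a hard cap. For the tuned value $iMax = 70$, this gives $4 \le \Delta \le 7$. The function names are hypothetical.

```python
import math

def delta_bounds(i_max):
    """Allowed range for the stall window: ceil(iMax/20) <= delta <= ceil(iMax/10)."""
    return math.ceil(i_max / 20), math.ceil(i_max / 10)

def should_stop(best_history, delta, i_max):
    """best_history[t] is the best fitness found up to iteration t (non-decreasing).
    Stop at i_max iterations, or when the last `delta` iterations brought no gain."""
    if len(best_history) >= i_max:
        return True
    if len(best_history) <= delta:
        return False
    return max(best_history[-delta:]) <= best_history[-delta - 1]
```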

Solving knapsack equations in MVCF method using shuffled frog leaping algorithm
With the changes above, the 0/1 knapsack problem is solved by the modified shuffled frog leaping algorithm, so the knapsack equations of Equation 4 can be solved to find the optimal queries in each cluster. Algorithm 3 shows the algorithm for finding optimal queries. In the previous step, frequent queries were found in each cluster; in this step, optimal queries are selected according to the storage space limitation. First, the knapsack equations are written from the storage space, $P_i$ and $S_i$. If a cluster has $d$ queries and the number of frogs is $P$, then $P$ random $d$-bit numbers are generated as the initial population. The fitness function is calculated for each frog, the frogs are sorted in descending order and divided into $m$ memeplexes, and $X_b$ and $X_w$ are determined for each memeplex. Next, $X_w$ is updated according to Equation 9, and this cycle is repeated for a specific number of iterations. Then genetic mutation is applied and the frogs are sorted in descending order. When the termination condition is satisfied, the loop ends and $X_g$ is the result. Once this number is written in binary, each bit with value 1 means that the corresponding query should be materialized. For example, if the final result is 10100111 in binary, bits 0, 2, 5, 6 and 7 (counting from the left) are one and the others are zero, so queries 0, 2, 5, 6 and 7 should be materialized. Since $S_i$ is defined for each query, if the sum of $S_i$ exceeds the storage space limit, the repair method is used.
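Decoding the final frog can be sketched as follows. Bits are indexed from the left starting at 0, which reproduces positions 0, 2, 5, 6 and 7 in the example above. The function names and the query labels are hypothetical.

```python
def selected_positions(bitstring):
    # Positions (from the left, starting at 0) whose bit is '1'.
    return [k for k, bit in enumerate(bitstring) if bit == "1"]

def selected_queries(bitstring, queries):
    # Queries to materialize: those whose bit is set.
    return [queries[k] for k in selected_positions(bitstring)]

print(selected_positions("10100111"))                              # [0, 2, 5, 6, 7]
print(selected_queries("10100111", [f"q{k}" for k in range(8)]))   # ['q0', 'q2', 'q5', 'q6', 'q7']
```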
In the last step, the optimal queries are merged into one view for each cluster [35]: the related tables are combined with a natural outer join to form the cluster's view. Figures 3 and 4 show flowcharts of the proposed algorithm; Figure 3 shows the first step, which uses the SOM method for clustering.
Figure 4 shows the flowchart of the third step, which uses the shuffled frog leaping algorithm to find optimal queries.
        ...
        Improve the worst frog position by using Eq. 9
        Repeat for a specific number of iterations
    end for
    Combine the evolved memeplexes
    Apply genetic mutation on the population
    Sort the population P in descending order of fitness
    while (termination = False) do
        Find the optimal queries according to the best solution
    end while
    if the best solution is an infeasible solution then
        Execute the repair method to find the optimal queries
    end if
end for
end procedure
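Putting the pieces together, the following self-contained sketch solves one cluster's knapsack with a binary shuffled frog leaping algorithm in the spirit of Algorithm 3. It is an illustration under assumptions, not the authors' implementation. A sigmoid-threshold discretization stands in for Equation 9. Infeasible frogs are repaired immediately rather than only at the end. The bit-flip mutation rate and the population size `P=20` are hypothetical, and a fixed `i_max` replaces the Δ-based termination. The defaults `m=4`, `i_max=70`, `d_max=10` and `alpha=0.6` mirror the tuned values in the simulation results. Reading "maximum number of $X_w$ is 5" as five worst-frog updates per memeplex (`n_local`) is also an assumption.

```python
import math
import random

def solve_knapsack_sfla(profits, weights, capacity, P=20, m=4, n_local=5,
                        i_max=70, alpha=0.6, d_max=10.0, mutation_rate=0.02, seed=0):
    """Binary SFLA sketch for one cluster; returns (bits, profit),
    where bit i = 1 means query i is materialized."""
    rng = random.Random(seed)
    d = len(profits)

    def fitness(x):
        # Frogs are kept feasible, so fitness is simply the total profit.
        return sum(p for b, p in zip(x, profits) if b)

    def repair(x):
        # Drop selected items with the lowest profit/weight ratio until x fits.
        x = list(x)
        total = sum(w for b, w in zip(x, weights) if b)
        for i in sorted(range(d), key=lambda k: profits[k] / weights[k]):
            if total <= capacity:
                break
            if x[i]:
                x[i] = 0
                total -= weights[i]
        return x

    def random_frog():
        return repair([rng.randint(0, 1) for _ in range(d)])

    def step(x_w, leader):
        # Assumed update: move towards the leader, clip to +-d_max, sigmoid threshold.
        new = []
        for w, b in zip(x_w, leader):
            v = w + max(-d_max, min(d_max, rng.random() * (b - w)))
            new.append(1 if 1.0 / (1.0 + math.exp(-v)) > alpha else 0)
        return repair(new)

    population = [random_frog() for _ in range(P)]
    best = max(population, key=fitness)
    for _ in range(i_max):
        population.sort(key=fitness, reverse=True)
        x_g = population[0]
        memeplexes = [population[k::m] for k in range(m)]      # round-robin split
        for mp in memeplexes:
            for _ in range(n_local):
                mp.sort(key=fitness, reverse=True)
                x_w = mp[-1]
                candidate = step(x_w, mp[0])                    # try X_b
                if fitness(candidate) <= fitness(x_w):
                    candidate = step(x_w, x_g)                  # then X_g
                if fitness(candidate) <= fitness(x_w):
                    candidate = random_frog()                   # then a random frog
                mp[-1] = candidate
        population = [frog for mp in memeplexes for frog in mp]   # shuffle back
        best = max(population + [best], key=fitness)
        population = [repair([1 - b if rng.random() < mutation_rate else b for b in f])
                      for f in population]                      # genetic mutation
        best = max(population + [best], key=fitness)
    return best, fitness(best)

bits, profit = solve_knapsack_sfla([13, 15, 28, 35, 32], [120, 250, 430, 310, 360], 1000)
```

The search is stochastic, so for small clusters it is sensible to compare its profit with an exhaustive check; in this sketch, tracking `best` separately ensures that mutation never discards the best solution found.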

Simulation results
This section discusses the experiments and their results. The KD2013 and SRTTU-SF algorithms were simulated in Microsoft Visual Studio using a Microsoft SQL Server database. The simulation data were generated programmatically: dimensional queries were produced randomly over our dimensions, consistent with the dimension tables, in C#.NET with Microsoft Visual Studio. Because the data were produced randomly, the results are not sensitive to a particular database. Our system has 4 GB of RAM and a 2.2 GHz Core i3 CPU, and our data warehouse has 10 dimension tables and one fact table.
To compare the algorithms, KD2013 and SRTTU-SF were both executed. There are two kinds of queries, input queries and test queries, both produced randomly and consistently with the dimension tables. Input queries are the previous queries; test queries are used after the proposed algorithm has been executed. Response time and storage space are important factors for view selection algorithms [35,37], so our experiments use the following three factors:
1. Number of rows of the materialized views: after the view selection algorithm runs, the resulting views are materialized, and this factor counts their rows. In other words, it measures storage space.
2. Total time: this factor comprises two parameters, the execution time of the view selection algorithm and the response time for answering the test queries.
3. Test time: the response time for answering the test queries.
Each experiment was repeated five times, and the reported result is the average of the five runs. First, the optimal values of the number of iterations, the α parameter, the learning rate (η) and the epoch parameter are found. In Figure 5, the number of test queries is fixed (for example, 60), the number of memeplexes is 4, the maximum change D is 10, α is 0.5 and the maximum number of $X_w$ is 5. Figure 5 compares the test time and total time of SRTTU-SF as the number of iterations increases, from which the optimal number of iterations is found to be 70. Next, with the number of test queries fixed (for example, 60), 70 iterations, 4 memeplexes, a maximum change D of 10 and a maximum number of $X_w$ of 5, Figure 6 increases α to find its optimal value, which is 0.6. Figure 7 finds the optimal value of the epoch parameter, with the number of test queries fixed (for example, 60), 70 iterations, 4 memeplexes, a maximum change D of 10, a maximum number of $X_w$ of 5 and α = 0.6.
According to Figure 7, the optimal value of epoch is 10. Figure 8 finds the optimal value of the learning rate (η), with the number of test queries fixed (for example, 60), 70 iterations, 4 memeplexes, a maximum change D of 10, a maximum number of $X_w$ of 5, α = 0.6 and an epoch value of 10.
According to Figure 8, the optimal value of the learning rate (η) is 0.01. Next, the effect of the improvement in the first step is analyzed. Figures 9, 10 and 11 use the optimal parameter values: 10 epochs and a learning rate of 0.01. Figure 9 compares KD2013 with the improved first step of SRTTU-SF according to the first factor (number of materialized records); the horizontal axis is the number of test queries and the vertical axis is the number of records in the materialized views.
According to Figure 9, the materialized views of SRTTU-SF have fewer rows than those of KD2013, so the proposed algorithm needs less storage space. Figure 10 shows that the total time of the proposed algorithm is also lower than that of KD2013; since time is very important in view selection algorithms, the proposed algorithm is better than KD2013 in the time factor as well. Figure 11 compares the two algorithms according to test time; the horizontal axis is the number of test queries and the vertical axis is the test time.
Figure 11 shows that the test time of the proposed algorithm is lower than that of KD2013. Our experiments therefore show that the improved first step outperforms KD2013 on all of the factors above. Next, the effect of the improvement in the third step is examined. Figures 12, 13 and 14 use the optimal parameter values: 70 iterations, 4 memeplexes, a maximum change D of 10, α = 0.6 and a maximum number of $X_w$ of 5. In these figures, the horizontal axis is the number of test queries and the vertical axis is the experimental factor; they show that the improved third step of the proposed algorithm outperforms KD2013 on our experimental factors. Finally, the full SRTTU-SF algorithm (with both the first-step and third-step improvements) is compared with KD2013. Figures 15, 16 and 17 use the optimal values: 70 iterations, 4 memeplexes, a maximum change D of 10, α = 0.6, a maximum number of $X_w$ of 5, 10 epochs and a learning rate (η) of 0.01. Figure 15 compares KD2013 and SRTTU-SF according to the first factor; the horizontal axis is the number of test queries and the vertical axis is the number of rows in the materialized views.
According to Figure 15, the materialized views of SRTTU-SF are smaller than those of KD2013, so the proposed algorithm needs less storage space and outperforms KD2013 in this respect.

Conclusion
In this paper, a view selection algorithm called SRTTU-SF was introduced. The algorithm uses previous queries: first, they are clustered, and then the frequent queries in each cluster are found. Next, the optimal queries in each cluster are selected according to the storage space, using a 0/1 knapsack model solved by the shuffled frog leaping algorithm. Finally, the optimal queries are joined to produce only one view for each cluster. SRTTU-SF improves on the KD2013 algorithm in the clustering and optimal-query selection stages. Experimental results show that SRTTU-SF is more efficient than KD2013: it achieves a 14.22% improvement in the time factors (total and test time) and a 45.78% improvement in factor 1 (number of rows of the materialized views). Using the SOM algorithm to find clusters and the shuffled frog leaping algorithm to find optimal queries improves on KD2013 in both time and space.

Figure 2. An example of SOM.

Figure 5. Comparison of total time with an increasing number of iterations.

Figure 10. Comparison of the two algorithms according to factor 2 (total time). The horizontal axis is the number of test queries and the vertical axis is the total time, which includes both the execution time of the algorithm and the response time for answering OLAP queries.

Figures 12, 13 and 14. Comparison of the improved third step of SRTTU-SF with KD2013 according to the number of rows of the materialized views, total time and test time, respectively.
SOM has training data and test data: the training data train the SOM neural network and the test data test it. In the SRTTU-SF algorithm, the previous queries are the training data and the test queries are the test data. After the input nodes, output nodes and edge weights are initialized, the training queries are analyzed. First, a training query is selected and represented as a matrix $x = [x_1, x_2, x_3, \ldots, x_n]$, where each $x_i$ takes one of two values: $x_i = 0$ means that the FROM clause of training query $x$ does not contain dimension $i$, and $x_i = 1$ means that it does. Then the distance between $x$ and the weight matrix of each output node is calculated; for Equation 1, the distances between $x$ and $W_1$, $W_2$ are calculated. The Euclidean distance between two matrices $A = [a_1, a_2, \ldots, a_n]$ and $B = [b_1, b_2, \ldots, b_n]$ is calculated with Equation 2:
$$d(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \qquad (2)$$
Algorithm 1 input: dimensions of the data warehouse $n$, learning rate $\eta$, number of epochs $epoch$, number of clusters $c$.
Equation 6 (example knapsack equations for one cluster):
$$\text{Maximize } 13Q_1 + 15Q_7 + 28Q_{16} + 35Q_{17} + 32Q_{20}$$
$$\text{subject to } 120Q_1 + 250Q_7 + 430Q_{16} + 310Q_{17} + 360Q_{20} \le 1000, \quad Q_i \in \{0, 1\},\ i \in \{1, 7, 16, 17, 20\} \qquad (6)$$
Algorithm 3 Finding optimal queries
procedure FINDING OPTIMAL QUERIES
    Input: the number of frogs P; the number of memeplexes m; the number of generations for each memeplex before shuffling n; the number of shuffling iterations it; the maximum number of iterations iMax; and the dimension of the 0/1 knapsack problem d

Table 1. The percentage of improvement