Exploratory factor analysis for software development projects in Mexico

Time and risk management have always been major concerns in the continuously growing number of processes and methodologies used to develop all kinds of products and services. Software, like any other industry, needs not only the most appropriate development methodology but also an efficient project management strategy capable of estimating and managing development times and potential risks, so that high-quality software products and services are delivered on time. Hence, the success or failure of software development projects depends on how efficiently key factors such as cost, time, and risks are managed and on how other software development factors influence them. This paper presents an exploratory factor analysis of the effects on time planning of common factors observed in outsourced software development projects, such as communication, teamwork, and personnel training, among others. We also build a structural model in which we analyze the relationships between sets of dependent and independent variables derived from the responses to a survey applied to 32 Mexican organizations that subcontracted their software development projects to outsourcing providers. The results are presented and the reliability of our model is validated using the SmartPLS software.


Introduction
Time has always been a critical factor in many situations. For instance, deficient time management may produce economic losses for a company that does not meet its commitments on time. Very often, companies rely on external service providers to fulfill their business needs given a lack of internal skills [1,3,5,8,18]. In particular, it is well known that software development is one of the business areas most commonly handed off to outsourcing companies. The outsourcing companies are expected to produce high-quality software with shorter development cycles and reduced costs [7,14]. To do so, software development companies or software suppliers often must use various tools and methodologies that not only help reduce development work but also help meet deadlines and budgets [2,25]. However, developing software products and services is usually a complex task that requires both knowledge and experience in managing several major aspects of the whole development process, e.g., costs, risks, personnel, and scheduling or time management issues, among others [6,15,26]. In fact, the success or failure of a given project depends heavily on how well these aspects and several others are managed.
Regarding scheduling issues, it has been shown that software projects that are late usually imply higher expenses. Therefore, time planning, which in this work entails time estimation and management, has become one of the most important aspects of a growing number of software outsourcing projects where meeting deadlines and staying under budget are as important as satisfying all customer requirements [24]. In general, factors such as development methodologies, level of technical skills, available personnel, teamwork, and communication issues, among others, have been identified as relevant in project time planning and coordination efforts. Hence, the motivation of this work is to investigate the level of significance that these factors and several others exert on project time planning activities. To that end, we performed an exploratory factor analysis (EFA) [17,28] to address the following research question: RQ1: How feasible is it to determine the factors that have the most relevance for project time planning using an EFA-based model in order to minimize risks?
To answer this question, we conducted a survey among 32 Mexican organizations of various types that had to subcontract some or all of their software development needs to outsourcing providers. From the responses, we obtained valuable information about the clients' experience in such projects, which allowed us to define a structural model along with two sets of independent and dependent variables and to analyze their dependencies and level of significance through a series of hypothesis tests. By identifying the most relevant factors, not only can time planning and management activities be executed more effectively, but potential risks, namely financial risks, can also be minimized. Given the ubiquitous nature of software in a continuously growing number of applications, we think this kind of study can provide valuable information on how to handle the numerous technical and managerial aspects commonly found in software development projects within organizations. The rest of this paper is organized as follows. In the next section, we present background information and the motivation for this work by discussing various aspects that are present in virtually any process or methodology for developing business or commercial software products and services, and their impact on time management activities. In Section 3, we describe our research methodology, including the survey whose results were used to build our model. Section 4 presents the statistical analysis and results. The paper concludes in Section 5 with a summary of the key findings and their implications for further research.

Background and motivation
Ever since it first appeared, software engineering has sought to apply a systematic approach to developing, operating, and maintaining software-based systems [2,25]. However, as the demand for faster and more efficient software products and services increases, software development methodologies have been constantly evolving to satisfy such demands. Because of that, the scope of software engineering has not been limited to technical activities of software development such as analysis, design, coding, and testing. In fact, it also covers administrative tasks that can only be successfully completed through an efficient project planning strategy that ultimately helps meet all customer requirements [22]. Thus, an essential part of any project planning strategy should consider both time estimation and time management as two of the most important factors in any software development project [24]. Inadequate handling of either or both time factors can cause a series of problems for both customers and development organizations. Some of the most common problems are multiple delays, additional costs, and, in general, a loss of credibility and a negative image for the outsourcing organization [4]. Therefore, a sound time planning strategy is important in any software development project, given the various ways available to review progress and status information [27].
Regarding time estimation, one approach indicates that time estimates can be derived from past projects that share similar characteristics, i.e., historical data from previous efforts can be used as a basis for proposing new estimates. This approach falls into a category known as analogy-based models [15]. However, for new software projects the complexity of time estimation increases and other models have to be considered to produce such estimates [7,25,26]. In any case, a successful project management strategy depends on how well the various software development and project management activities are organized and coordinated [22]. Among the various activities that play a major role in time planning and management, the development methodology used to build the software has been identified as one of the most important factors. For instance, it is well known that any software development methodology divides the necessary work into several phases or stages, usually defined as requirements definition, design, and coding, with several forms of software testing and quality assurance activities embedded into them [2,25]. Clearly, these work phases need to be properly managed and planned in order to minimize potential risks [21], i.e., planning should also include accurate estimates of the duration of each phase based on other key factors such as personnel size, technical skills, and available resources, among others. Of course, customers and developers should communicate effectively at all times to ensure that deadlines and all customer demands are satisfied [16,20]. Otherwise, communication deficiencies may cause deliverables to slip.
Besides the development methodology and communication issues, there are other factors that also impact time estimation and time management efforts. For instance, the members of the team responsible for developing the requested software product or service should possess a set of technical and non-technical skills that allows them to deliver a quality product on time [9]. Hence, the level of technical competencies and the number of available members assigned to a given project represent two additional important aspects for time scheduling tasks. As existing applications continue to evolve and the number of new requirements grows, software development organizations have taken steps to improve their production processes in order to be more competitive in the globalized world.
So far, we have identified and described the main sources that affect activities related to time planning in business or commercial software development projects. Clearly, there could be other sources or factors that make software development time planning a complex task requiring both ample knowledge and experience in project management, namely project time planning and risk management. As indicated before, we first conducted a survey among 32 Mexican companies that subcontracted some or all of their software development needs to external or outsourcing providers. From the information obtained, we defined a model intended to analyze the level of significance between sets of independent and dependent variables through an EFA-based study. Moreover, we used the SmartPLS software to help us find the correlations between the two sets of variables and test our proposed hypotheses [13]. In the following sections, we explain how our method measures which development factors are more significant or have more influence on both time planning and risk management tasks.

Method
We consulted the database of the Software Council of the state of Nuevo Leon, Mexico (CSOFTMTY), which is an alliance between universities, government, and enterprises seeking economic growth, quality, and innovation, mainly in the software industry. From the data set analyzed, we developed a questionnaire of 29 multiple-choice questions (indicators), each one using a five-point Likert scale (1. Totally disagree, 2. Disagree, 3. Neither disagree nor agree, 4. Agree, 5. Totally agree), as our main data collection tool [19]. Only for indicator V9 did the scale have to be changed (1. Once every two years, 2. Once a year, 3. Twice a year, 4. Three times a year, 5. Four or more times a year). Notice that the questions are formulated from the perspective of a client company evaluating different aspects of the entire development work cycle provided by a software service provider.
The target population for this survey-based study was also selected from the CSOFTMTY database, where we identified a total of 73 organizations of various types that were invited to participate during the second half of 2011; in the end, only 32 of them responded. Since we wanted to collect project-related data from experienced people, the majority of respondents were technical managers, high-level executives, or project managers. Hence, each participant answered all multiple-choice questions based on their own background knowledge and experience. The breakdown of the participants according to their role is the following: 71.88% are directors, 12.5% are sub-directors, 9.37% are project leaders, and 6.25% are group managers. The gender distribution is 90.63% men and 9.37% women. Even though there are other criteria such as lines of code (LOC) or effort estimates based on the number of available personnel [25,26], the size of the projects consulted in this study was classified as small, medium, or large based on their approximate total costs, i.e., $1-500,000 USD for small, $500,001-1,500,000 USD for medium, and over $1,500,000 USD for large projects. The project size distribution was roughly 40% small, 35% medium, and 25% large.
As part of the model's definition, the questions are organized in constructs based on the specific aspects or topics of a software development project they focus on. Moreover, for convenience each question is assigned a unique identifier or code to facilitate the graphical representation of the various dependencies that may exist among indicators and constructs. Tables 1 and 2 show the items that form our questionnaire together with their corresponding constructs, where each construct represents one of the following common terms usually found in any software development project: employee training programs (X1), outsourcing (X2), communication (X3), innovation (X4), teamwork (X5), project management (X6), project completion (X7), lack of internal development skills (Y1), development time planning (Y2), and risk analysis (Y3). In this work, the Yj variables represent our dependent variables, and our goal is to measure the approximate level of significance or influence that the Xi latent variables exert on the Yj ones.
Below we present the set of hypothesis tests that were defined based on the proposed constructs, and the data that were collected and analyzed.
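As a minimal sketch of how the survey answers could be turned into numbers before analysis, the following maps each answer label to its scale position. The function and dictionary names are hypothetical and not part of the study; only the labels and the special case for V9 come from the questionnaire description above.

```python
# Hypothetical encoding of the two answer scales described in the text.
AGREEMENT_SCALE = {
    "Totally disagree": 1,
    "Disagree": 2,
    "Neither disagree nor agree": 3,
    "Agree": 4,
    "Totally agree": 5,
}

# Frequency scale, used only for indicator V9.
FREQUENCY_SCALE = {
    "Once every two years": 1,
    "Once a year": 2,
    "Twice a year": 3,
    "Three times a year": 4,
    "Four or more times a year": 5,
}


def encode_response(indicator: str, answer: str) -> int:
    """Return the numeric score (1-5) for one answer to one indicator."""
    scale = FREQUENCY_SCALE if indicator == "V9" else AGREEMENT_SCALE
    return scale[answer]
```

Keeping the two scales in separate tables makes an answer that does not belong to an indicator's scale fail loudly with a KeyError instead of being silently miscoded.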

Model fit
In order to validate our model, the following tests were carried out. First, we ran a normality test on the data focusing on skewness and kurtosis. As shown in Table 3, the obtained values are within ±2, and therefore we can conclude that the proposed variables are approximately normally distributed.
Next, a median test was conducted to determine whether the data come from the same population, in which we consider the following hypotheses: H0: The median values of all k populations are the same. Ha: At least one of the populations has a distinct median value. As shown in Table 4, the null hypothesis of the medians' comparison is rejected whenever the significance value is less than 0.05, which corresponds to a confidence level of 95%. Therefore, we can conclude that for the relationships between X6 and Y2 as well as X7 and Y3, at least one of the populations has a distinct median.
A multicollinearity analysis was then performed; Table 5 shows the values obtained for each of the latent variables associated with their dependent variables. The Variance Inflation Factor (VIF) should have values under 5 [12]. As shown, each VIF value is less than 5 and the tolerance index values are within acceptable levels, i.e., the values are neither close to 0 nor above 1. This shows that there is no collinearity between latent variables.
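The skewness and kurtosis screening described above can be sketched as follows. This is an illustration only: it uses simple population moments, whereas statistical packages typically apply small-sample corrections, so values may differ slightly from those a package would report. The function names are hypothetical.

```python
from statistics import fmean


def skewness(values):
    """Population skewness: third central moment over variance^1.5."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis: fourth moment over variance^2, minus 3."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m4 = fmean([(v - mean) ** 4 for v in values])
    return m4 / m2 ** 2 - 3.0


def within_normality_limits(values, limit=2.0):
    """True when both statistics fall inside the +/- limit band."""
    return (abs(skewness(values)) <= limit
            and abs(excess_kurtosis(values)) <= limit)
```

For example, a symmetric set of Likert scores such as `[1, 2, 3, 4, 5]` passes the check, while a sample in which almost every respondent chose the same extreme answer does not.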
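The text does not name the exact median test used; assuming it is a k-sample median test of the Mood type, the statistic can be sketched as below. Scores are counted as above or not above the pooled median, and the resulting 2 x k table is compared with the counts expected under equal medians. The critical values are the standard χ² quantiles at the 0.05 level; all function names are hypothetical.

```python
from statistics import median

# Chi-square critical values at the 0.05 significance level, by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def median_test_statistic(*groups):
    """Return (chi-square statistic, degrees of freedom) for k groups."""
    pooled = [v for group in groups for v in group]
    grand_median = median(pooled)
    above = [sum(1 for v in group if v > grand_median) for group in groups]
    not_above = [len(group) - a for group, a in zip(groups, above)]
    total = len(pooled)
    statistic = 0.0
    for row in (above, not_above):
        row_total = sum(row)
        if row_total == 0:
            continue  # empty row contributes nothing
        for observed, group in zip(row, groups):
            expected = row_total * len(group) / total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(groups) - 1


def medians_differ(*groups):
    """Reject H0 (equal medians) when the statistic exceeds the critical value."""
    statistic, dof = median_test_statistic(*groups)
    return statistic > CHI2_CRITICAL_05[dof]
```

Comparing the statistic against the critical value is equivalent to checking whether the significance value is below 0.05, which is the decision rule stated in the text.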
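For reference, the VIF of a variable is computed from the R² obtained when that variable is regressed on the remaining predictors, and the tolerance index is its reciprocal. The short sketch below shows the relationship; the function names are hypothetical and the input R² values in the usage note are illustrative, not taken from Table 5.

```python
def vif_from_r_squared(r_squared: float) -> float:
    """VIF = 1 / (1 - R^2), with R^2 from regressing one predictor on the others."""
    return 1.0 / (1.0 - r_squared)


def tolerance_from_vif(vif: float) -> float:
    """Tolerance is the reciprocal of the VIF, so it always lies in (0, 1]."""
    return 1.0 / vif


def collinearity_acceptable(r_squared: float, vif_limit: float = 5.0) -> bool:
    """True when the VIF stays under the limit used in the text (5)."""
    return vif_from_r_squared(r_squared) < vif_limit
```

With the threshold of 5, a predictor passes as long as the other predictors explain less than 80% of its variance: an R² of 0.5 gives a VIF of 2 and a tolerance of 0.5, whereas an R² of 0.9 gives a VIF of about 10.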
Table 1 items (Source: Authors):
Contractors showed a high level of performance
The software development methodology satisfied all customer requirements
The total project cost was reasonable based on personnel capacities
Client sought outsourcing providers due to a lack of internal capacities
Contractors adapted to all client necessities
Contractors showed right judgment and kept the client informed about any issues
When needed, contractors displayed an adequate command of the English language
V8: Contractors communicated effectively
How often new software products or services are released
The determination and initiative of client personnel was important to meet objectives
Client's long-term vision to foresee any future requirements
The set of tangible resources (financial resources, physical assets) was sufficient
V13: All project objectives were satisfied
Adequate individual and collective interpersonal abilities
Project management (X6): The project management strategy helped achieve the financial commitments for the project
The final product was completed within the established dates

Convergent Validity (CV) evaluates whether a set of indicators measures a particular construct and not some other concept [11]. Moreover, the Average Variance Extracted (AVE) represents the average variation that a latent variable exerts over the observable variables [10]. Values above 0.5 are considered acceptable [12]. As Table 6 shows, all AVE values are above 0.5 and their average value is 0.748, which together satisfy the CV criterion [10,12]. Composite Reliability (CR) refers to the internal consistency of a latent variable without assuming that the indicators are equally reliable; instead, it assigns them priorities. Values between 0.6 and 0.7 are considered appropriate as a lower limit [12]. As shown in Table 6, all CR values are above 0.8.

Table 2 items (Source: Authors):
The provider added more contractors to the project due to a shortage of skills
Contractors showed at any time that they needed to acquire more work-related experience
The provider added more contractors to the project to deliver the product sooner
The provider managed time effectively in order to meet all project objectives
The provider always kept enough personnel to attend each development phase
The time allocated for each development phase was adequate
Risk analysis (Y3): During the entire project development, was there any risk of increasing the total cost?
During the entire project development, was there any risk of compromising the product's quality?
During the entire project development, was there any risk of cancelling the whole project?

Discriminant Validity establishes that a construct measures a concept distinct from other constructs. This type of validity was assessed in two parts. The first part consists of the Fornell-Larcker method, which compares the squared value of the highest correlation (0.454) against the AVE for each variable. The AVE is higher in every case, and therefore the second part can be executed. In the second part, we obtained the average cross loading values for each latent variable, which are then compared against the composite reliability values [11]. Notice that for each latent variable, the composite reliability values are higher than the average cross loading values, as shown in Table 6. The R² results for dependent variables Y1 (0.713), Y2 (0.766), and Y3 (0.446) are shown in the fifth column of Table 6. Values above 0.750 are considered substantial, values above 0.500 are considered moderate, and values above 0.250 are considered weak [12].
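To make the AVE and CR criteria concrete, the sketch below computes both from the standardized outer loadings of one construct using the usual formulas. The loadings in the usage note are made-up values for illustration, not results from Table 6, and the function names are hypothetical.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of a construct."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: (sum of loadings)^2 over itself plus the summed error variances."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1.0 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


def satisfies_criteria(loadings, ave_min=0.5, cr_min=0.7):
    """Check the AVE threshold and the upper end of the CR lower limit."""
    return (average_variance_extracted(loadings) > ave_min
            and composite_reliability(loadings) >= cr_min)
```

For hypothetical loadings of 0.80, 0.90, and 0.85, the AVE is about 0.724 and the CR about 0.887, so both criteria would be met. A construct with a single indicator of loading 1.0 trivially yields 1.0 for both measures.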
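The two numeric checks in this paragraph can be worked through directly. The sketch below applies the stated R² thresholds to the three reported values and checks the Fornell-Larcker condition using the reported highest correlation; since individual AVE values are not listed in the text, the check uses the 0.5 lower bound that all AVE values are reported to exceed. Function names are hypothetical.

```python
# R^2 values reported in the text for the dependent variables.
R2_REPORTED = {"Y1": 0.713, "Y2": 0.766, "Y3": 0.446}


def classify_r_squared(value: float) -> str:
    """Apply the thresholds from the text: 0.75 / 0.50 / 0.25."""
    if value > 0.75:
        return "substantial"
    if value > 0.50:
        return "moderate"
    if value > 0.25:
        return "weak"
    return "below weak"


def fornell_larcker_holds(ave_values, highest_correlation: float) -> bool:
    """Every AVE must exceed the squared highest inter-construct correlation."""
    return all(ave > highest_correlation ** 2 for ave in ave_values)


labels = {name: classify_r_squared(r2) for name, r2 in R2_REPORTED.items()}
# Squared highest correlation: 0.454 ** 2 is roughly 0.206, well under 0.5.
criterion_met = fornell_larcker_holds([0.5], 0.454)
```

Under these thresholds Y1 is moderate, Y2 is substantial, and Y3 is weak, and the squared correlation of roughly 0.206 is below any AVE above 0.5.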
Cronbach's alpha represents the internal correlation or reliability of a set of indicators that measure either a non-observable or a directly measured variable. In this case, each of the proposed variables has been measured with its corresponding indicators, and the results are acceptable because they all satisfy the lower limit of 0.6 [12]. On the other hand, X2 and X4 are measured with just one indicator, which explains the high value obtained (1.000), as shown in Table 6.
Analysis of the Q² value. Table 7 shows the predictive relevance of the model. Q² values above 0.35 indicate high predictive relevance, whereas values between 0.15 and 0.35 indicate medium predictive relevance. Lastly, values between 0.02 and 0.15 indicate low predictive relevance [13]. Source: Analysis of results using SmartPLS and SPSS.
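As an illustration of the reliability measure, Cronbach's alpha for a construct with k indicators can be computed from the variance of each indicator and the variance of the respondents' summed scores. The sketch below uses the standard formula with sample variances; the function name and the data in the usage note are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of respondent scores per indicator, all of equal length.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(items)
    if k < 2:
        # The formula divides by (k - 1), so it is undefined for one indicator.
        raise ValueError("Cronbach's alpha needs at least two indicators")
    item_variance_sum = sum(variance(column) for column in items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1.0 - item_variance_sum / variance(totals))
```

For two indicators scored `[1, 2, 3, 4, 5]` and `[2, 1, 4, 3, 5]` by five respondents, alpha is about 0.889, above the 0.6 lower limit. The guard for a single indicator reflects that the formula itself gives no information in that case, which is consistent with the text treating the 1.000 reported for X2 and X4 as a consequence of single-indicator measurement.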
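The Q² bands above can be expressed as a small classifier. The sketch assumes the usual Stone-Geisser definition, Q² = 1 - SSE/SSO, where SSE is the sum of squared prediction errors and SSO the sum of squared observations obtained from a blindfolding procedure; the text does not spell out this formula, and the function names are hypothetical.

```python
def q_squared(sse: float, sso: float) -> float:
    """Stone-Geisser Q^2 from squared prediction errors and squared observations."""
    return 1.0 - sse / sso


def predictive_relevance(q2: float) -> str:
    """Map a Q^2 value to the bands given in the text."""
    if q2 > 0.35:
        return "high"
    if q2 >= 0.15:
        return "medium"
    if q2 >= 0.02:
        return "low"
    return "none"
```

For instance, `q_squared(50.0, 100.0)` gives 0.5, which falls in the high band.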

Independence Test
In order to test the hypotheses (Xi → Yj), we used a χ² test to probe the relationship between two constructs under the following hypotheses:
H0: The study construct is not related to the dependent variable.
Ha: The study construct is related to the dependent variable.
The test results are shown in Table 8, where all computed (practical) values are higher than their theoretical (critical) counterparts except for the relationship between Innovation (X4) and Development time planning (Y2). Therefore, each of the proposed hypotheses is accepted except H5.
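The decision rule of the independence test, comparing a computed χ² value against its theoretical critical value, can be sketched as follows. The contingency tables in the usage note are invented for illustration and are not the data behind Table 8; the function names are hypothetical, and the critical values are the standard χ² quantiles at the 0.05 level.

```python
# Chi-square critical values at the 0.05 significance level, by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def constructs_related(table) -> bool:
    """Reject H0 (no relation) when the statistic exceeds the critical value."""
    statistic, dof = chi_square_independence(table)
    return statistic > CHI2_CRITICAL_05[dof]
```

With a hypothetical 2 x 2 table of 32 respondents, `[[12, 4], [4, 12]]`, the statistic is 8.0 against a critical value of 3.841, so H0 would be rejected. A perfectly balanced table such as `[[8, 8], [8, 8]]` yields 0 and H0 would be retained, which mirrors the single exception reported for H5.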

Analysis of results
Employee training programs (X1). Each employee assigned to work in a project is expected to apply various skills and perform several activities that in turn help the project satisfy its requirements. This variable also represents all qualified personnel, development programs, and any related costs. Moreover, it is important to always maintain a good working relationship with the service provider, share information, and communicate effectively. Once the project is over, certain employees may be assigned to train users on how to operate the newly created software. In general, this variable is relevant whenever there is a lack of internal skills.
Outsourcing (X2). This variable is relevant for companies whose main business has nothing to do with software development. In this case, the idea is to hand off all the company's software development needs to an external service provider in order to save the time and money that the company would otherwise have had to invest in hiring and training the right personnel.
Communication (X3). This variable is very important throughout the entire development work cycle, but especially during the initial and final stages of a project. All personnel from both the client and outsourcing companies should keep in constant communication during all development phases [16,20]. Communication deficiencies may have a negative impact on other important variables, e.g., the negative values obtained for 'lack of internal development skills' (Y1) and 'development time planning' (Y2).
Innovation (X4). This construct is not relevant because, so far, one of the main business objectives of software development companies is to satisfy all software-related customer demands on time and within cost estimates, i.e., these companies were not necessarily innovating, at least at the time when this work was performed. Actually, customer or client organizations are the ones innovating with the aid of the software products and services they continuously request.
Teamwork (X5). This variable represents the group of people responsible for performing all the activities necessary to develop the end product. Clearly, there has to be effective teamwork to minimize potential risks, in particular anything that could impact the proposed schedule and its deliverables. Hence, this variable is also relevant.
Project management (X6). This construct should not be overlooked, as it is fundamental for other major variables, in this case for 'development time planning' (Y2). Moreover, we used a factor that represents the 'vision of the software development process' as an indicator (V11). This indicator tries to reflect the idea that projects, in general, should be built with a broader long-term vision and not just to satisfy immediate needs.
Project completion (X7). Unexpected events of various kinds may cause delays at any stage of the development process, among several other risks [24]. Therefore, it is very important that any software provider has an effective project management strategy capable of handling such events in order to reduce or eliminate risks that can affect the estimated completion of a given project.
Lack of internal development skills (Y1). Even though the software provider is responsible for the whole development process, there may be instances where personnel from the client company are asked to provide feedback or guidance on certain project-related details. Thus, the less client employees know about the way business software is usually developed, the higher the risk that they provide incorrect or misleading information.
Development time planning (Y2). The longer the development time, the higher the chance of encountering risks that may increase costs or cause delays, among others.
As with any empirical study, our model is not free from threats to validity, covering internal, external, and construct validity. First, internal validity reflects the extent to which the proposed model supports its outcome. The criteria analyzed in Section 4 provide support for a general assessment of the internal and construct validity of our method. On the other hand, external validity is concerned with the generalization of the obtained results to other environments or practical cases. In this case, our study only involved 32 respondents from the same geographical region, a sample that limits any attempt to generalize the results to software development projects at large. However, our model is capable of producing effective results for samples of size 30 or more, which is needed to comply with normality tests.
Moreover, our model is designed to work with quantitative data only, and requires qualitative data to be transformed into some form of numerical representation prior to its application. In general, several factors such as cross-cultural differences, work practices, and work values in the case of offshore software development can also have a strong impact on the results of studies such as this one. Likewise, technical characteristics of each project, e.g., whether it is a small, medium, or large project, could play a major role in the outcome. Clearly, business software development is a complex task that involves several factors that need to be properly addressed and managed, in particular the ones that can become the source of major risks and jeopardize the development of the end product.
Given the large diversity of practical scenarios, we realize that the list of dependent and independent variables presented in this case study is not comprehensive, because each respondent may come up with several more factors based on their own experience. Nevertheless, we are confident that the set of indicators and constructs defined here provides a fair description of most current outsourced software development projects. We therefore consider that this kind of experimental study can be valuable, as it can show other ways to approach and analyze certain aspects of the development of software products or services.

Conclusions
The proposed model is designed to work with quantitative data only, and requires qualitative data to be transformed into some form of numerical representation prior to its application. In general, several factors such as cross-cultural differences, work practices, and work values in the case of offshore software development can also have a strong impact on the results of studies such as this one. Likewise, technical characteristics of each project, e.g., whether it is a small, medium, or large project, could play a major role in the outcome.

H9: Lack of internal development skills (Y1) is significant for risk analysis (Y3).
H10: Development time planning (Y2) is significant for risk analysis (Y3).

Figure 1 presents the complete graphical representation of our model including all the relationships between constructs and indicators.

H0: The median values of all k populations are the same.
Ha: At least one of the populations has a distinct median value.

Figure 1. Graphical view of the model

H1: Employee training programs (X1) are significant for a lack of internal development skills (Y1).
H2: Outsourcing (X2) is significant for a lack of internal development skills (Y1).
H3: Communication (X3) is significant for a lack of internal development skills (Y1).
H4: Communication (X3) is significant for development time planning (Y2).
H5: Innovation (X4) is significant for development time planning (Y2).
H6: Teamwork (X5) is significant for development time planning (Y2).
H7: Project management (X6) is significant for development time planning (Y2).
H8: Project completion (X7) is significant for risk analysis (Y3).

Table 1. Survey questions and hypotheses

Table 2. Survey questions and hypotheses, cont.

Table 6. Quality criteria