RBUS6902 Understanding Regression Analysis Assignment Sample

Subject Code : RBUS6902
University : University of Queensland School of Medicine
Subject Name : General Management

Quantitative Business Research Methods

The proxy dependent variable used by the website management agency to examine consumers’ preference for the website is the time spent by a visitor browsing the website. To determine the effectiveness of its online presence, it is of utmost interest to the organization that customers prefer its website. This interest is magnified in the case of e-commerce companies, which run their business entirely on an online platform. Customer acquisition and interest are thus largely reflected in the time customers spend on the company’s website. In this respect, the time spent browsing the website is an apt choice of dependent variable for identifying the factors that influence it.
Focus group interviews are better suited to a qualitative research environment: they are designed for collecting in-depth information and are therefore time consuming. For the purpose of obtaining primary information to which analytical techniques can be applied, a sample survey is more feasible, as it not only saves time but is also free from moderator bias. The variables initially chosen as those that can affect the amount of time a consumer spends on the website are ordinal in nature, so ratings are the responses that provide meaningful data for the analysis. The quantifiable nature of this information makes a sample survey the more suitable choice for this study. As for the variables, the factors “colour” and “design” can be combined into a single factor called “visual appeal”, which would provide an overall idea of the impression formed by the customer upon first visiting the website.
The multiple linear regression model is a commonly applied technique for establishing the relationship between a response variable and a set of independent variables. Before proceeding with the modelling, however, it is important to check the list of assumptions so that the final outcomes are valid and credible. The assumptions are listed below:

The first assumption states that the dependent variable must be measurable on a continuous scale. The dependent variable in the current case is the “time” spent browsing the website, which is continuous in nature. Hence, this assumption is verified.
The second assumption is linearity, which states that a linear relationship must exist between the dependent variable Y and each independent variable X. This assumption is also satisfied, as the scatter plot displayed in figure 1 shows that linearity exists between the dependent and independent variables.

The third assumption concerns outliers, which must not be present as they have a negative impact on the regression modelling process by reducing the fit of the regression equation (Cohen et al., 2013). Thus, in order to preserve the accuracy of the outcomes, no outliers should be detected. For the current data, this is checked with the help of the boxplots displayed in figure 2. Outliers are detected for some independent variables, namely the colour composition, the relevance of content and the speed of uploading pages on the website. Even the dependent variable holds some data points that lie far from the pattern, which leads to a violation of this assumption.
The next assumption is that the residuals must be normally distributed. The residuals, also known as the error terms, are defined as the vertical distances between the data points and the regression line (Schroeder, Sjoquist and Stephan, 2016). The error is hence the unexplained difference between the observed value and the predicted value. As per the current assumption, this difference must be normally distributed. The assumption is tested with the help of a P-P plot shown in figure 8, in which closeness of the data points to the diagonal line indicates that the residuals are normally distributed. It is observed that most of the data points do not coincide with the diagonal line, which indicates a violation of this assumption.
The independence of observations is another important consideration before running a regression analysis. It implies that the observations are not affected by one another; checking this assumption is also known as testing for autocorrelation within the data. The assumption can be tested using the Durbin-Watson statistic, which, as seen from Table 2, is 2.026, while the range of values suggesting independence of observations is 1.5 to 2.5 (Denis, 2018). As the Durbin-Watson statistic lies within this range, the assumption of independence of observations is satisfied.
The final assumption concerns multicollinearity and states that the independent variables must not be strongly related to each other. Multicollinearity usually arises due to redundancy of factors (Schroeder, Sjoquist and Stephan, 2016). While examining the correlations between predictors is one way of testing for multicollinearity, calculating the variance inflation factor (VIF) is a commonly used method in SPSS that indicates the dependency amongst independent variables. A high degree of correlation amongst independent variables creates problems when fitting the regression model. The collinearity statistics for the coefficients in the model are displayed in Table 1. To satisfy the current assumption, the variance inflation factor must be less than 10 while the tolerance scores must be greater than 0.2 (Denis, 2018). The scores for tolerance and variance inflation factor are well within the specified limits, and hence it is inferred that the current data is free from any issues of multicollinearity.

Table 1. Coefficients
Model		Collinearity Statistics
Model		Tolerance	VIF
1	Colour composition of the key pages of the website	.716	1.396
	Ease of navigation across pages	.761	1.314
	Font size and design	.659	1.517
	Speed of uploading of the pages	.899	1.113
	Relevance of the content offered on the pages	.668	1.496
	Interestingness of the content offered on the pages	.728	1.373
a. Dependent Variable: Time spent on the website by a user
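
The relationship between the two collinearity statistics in Table 1 can be sketched in a few lines of Python. This is only an illustrative check, not part of the original SPSS output: it assumes the standard definition that VIF is the reciprocal of tolerance, uses the tolerance values reported in Table 1, and the short variable names are labels chosen here for convenience.

```python
# Tolerance values as reported in Table 1 (short names are illustrative labels).
tolerance = {
    "colour": 0.716,
    "navigation": 0.761,
    "font": 0.659,
    "speed": 0.899,
    "relevance": 0.668,
    "interestingness": 0.728,
}

VIF_LIMIT = 10.0        # upper limit for VIF quoted in the text
TOLERANCE_LIMIT = 0.2   # lower limit for tolerance quoted in the text


def vif_from_tolerance(tol):
    """Return the variance inflation factor, the reciprocal of tolerance."""
    return 1.0 / tol


vif = {name: vif_from_tolerance(tol) for name, tol in tolerance.items()}

# The assumption holds only if every predictor is inside both limits.
no_multicollinearity = all(
    vif[name] < VIF_LIMIT and tol > TOLERANCE_LIMIT
    for name, tol in tolerance.items()
)

for name, tol in tolerance.items():
    print(f"{name:16s} tolerance={tol:.3f}  VIF={vif[name]:.3f}")
print("All predictors within limits:", no_multicollinearity)
```

The recomputed VIF values agree with the VIF column of Table 1 up to rounding of the reported tolerances.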

The summary of the linear regression model is displayed in table 2. The coefficient of determination (R-squared) is 0.546, which implies that the current model with six independent variables explains approximately 54.6% of the variation in the dependent variable.

Table 2. Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate	Durbin-Watson
1	.739^a	.546	.538	8.386	2.026
a. Predictors: (Constant), Interestingness of the content offered on the pages, Colour composition of the key pages of the website, Ease of navigation across pages, Speed of uploading of the pages, Relevance of the content offered on the pages, Font size and design
b. Dependent Variable: Time spent on the website by a user
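
The Durbin-Watson statistic reported in Table 2 is computed from the ordered residuals of the model. A minimal sketch of that calculation is given below, assuming the standard formula (sum of squared successive differences divided by the sum of squared residuals). The function names and the sample residuals are hypothetical, since the study's residuals are not available here; only the 1.5 to 2.5 range and the value 2.026 come from the text.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals must not all be zero")
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    return numerator / denominator


def observations_look_independent(dw, low=1.5, high=2.5):
    """Apply the 1.5 to 2.5 rule of thumb quoted in the text."""
    return low <= dw <= high


# Hypothetical residuals, used only to show the function call.
example_residuals = [0.8, -1.1, 0.4, 0.9, -0.7, -0.2, 1.0, -1.3]
print("Example statistic:", round(durbin_watson(example_residuals), 3))

# The value reported in Table 2 falls inside the range.
print("Table 2 value acceptable:", observations_look_independent(2.026))
```

Values near 2 suggest no first-order autocorrelation, while values near 0 or 4 suggest positive or negative autocorrelation respectively.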

The overall fit of the current model is determined from the output displayed in table 3. The p-value is reported as .000, that is, less than 0.001 and therefore below 0.05, which implies that the current model is statistically significant in predicting the dependent variable, the “time” spent by visitors on the website.

Table 3. ANOVA^a
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	28139.411	6	4689.902	66.689	.000^b
	Residual	23418.236	333	70.325
	Total	51557.647	339
a. Dependent Variable: Time spent on the website by a user
b. Predictors: (Constant), Interestingness of the content offered on the pages, Colour composition of the key pages of the website, Ease of navigation across pages, Speed of uploading of the pages, Relevance of the content offered on the pages, Font size and design
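
The summary statistics in Table 2 can be recovered from the sums of squares and degrees of freedom in Table 3. The sketch below is an illustrative cross-check using the standard formulas for R-squared, adjusted R-squared, the F statistic and the standard error of the estimate; the variable names are chosen here, and the sample size is inferred as the total degrees of freedom plus one.

```python
import math

# Values as reported in Table 3 (ANOVA).
ss_regression = 28139.411
ss_residual = 23418.236
df_regression = 6      # number of predictors
df_residual = 333

ss_total = ss_regression + ss_residual
n_obs = df_regression + df_residual + 1   # inferred: total df (339) plus one

# Share of the total variation explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R-squared penalises the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

# Mean squares and the F statistic for overall significance.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# Standard error of the estimate is the root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

print(f"R squared          = {r_squared:.3f}")
print(f"Adjusted R squared = {adjusted_r_squared:.3f}")
print(f"F statistic        = {f_statistic:.3f}")
print(f"Std. error         = {std_error_estimate:.3f}")
```

The results match the R Square, Adjusted R Square and Std. Error of the Estimate columns of Table 2 and the F value in Table 3 up to rounding.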

The regression model based on the information provided by table 5 is as follows:

time = -0.737 + 2.871*(colour) + 5.644*(navigation) + 1.532*(font) + 2.466*(speed) - 2.036*(relevance) - 0.359*(interestingness)

It is observed from table 5 that the “interestingness” variable is not statistically significant (0.339 > 0.05) in predicting the dependent variable, while the rest of the variables have p-values less than 0.05. It is noteworthy that “interestingness” and “relevance” have a negative influence on the response variable, while the highest influence is observed for “navigation”. A unit increase in the rating for ease of navigation across web pages is expected to raise the time spent browsing the website by approximately 5.64 units of time, holding all other variables constant. The “font” variable has the smallest positive effect on the response variable, while “speed” and “colour” also affect the time spent browsing the website positively.

Table 5. Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	-.737	3.550		-.208	.836
	Colour composition of the key pages of the website (colour)	2.871	.435	.288	6.599	.000
	Ease of navigation across pages (navigation)	5.644	.530	.451	10.652	.000
	Font size and design (font)	1.532	.355	.196	4.318	.000
	Speed of uploading of the pages (speed)	2.466	.318	.302	7.748	.000
	Relevance of the content offered on the pages (relevance)	-2.036	.571	-.161	-3.567	.000
	Interestingness of the content offered on the pages (interestingness)	-.359	.374	-.041	-.958	.339
a. Dependent Variable: Time spent on the website by a user
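
The fitted equation can be applied directly to a set of ratings. The following sketch uses the unstandardized coefficients from Table 5; the function name and the rating values are hypothetical (the rating scale used in the survey is not stated), and the example is only meant to show how a coefficient is read as the change in predicted time for a one-unit change in a predictor, holding the others constant.

```python
# Unstandardized coefficients (B) as reported in Table 5.
INTERCEPT = -0.737
COEFFICIENTS = {
    "colour": 2.871,
    "navigation": 5.644,
    "font": 1.532,
    "speed": 2.466,
    "relevance": -2.036,
    "interestingness": -0.359,
}


def predict_time(ratings):
    """Predicted time on the website for a dict of predictor ratings."""
    return INTERCEPT + sum(
        coefficient * ratings[name] for name, coefficient in COEFFICIENTS.items()
    )


# Hypothetical respondent who rates every aspect 5.
baseline = {name: 5 for name in COEFFICIENTS}

# Same respondent with the navigation rating raised by one unit.
improved = dict(baseline, navigation=baseline["navigation"] + 1)

navigation_effect = predict_time(improved) - predict_time(baseline)

print(f"Baseline prediction: {predict_time(baseline):.3f}")
print(f"Effect of +1 navigation: {navigation_effect:.3f}")
```

The difference between the two predictions equals the navigation coefficient, which is why the coefficient is interpreted as a change in units of time rather than as a percentage.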

(a) The correlation matrix displayed in the following image shows a strong association (0.516) between “colour” and “relevance”, while “font” and “ease” are correlated with a Pearson correlation coefficient of 0.356, “relevance” and “ease” with 0.328, and “interestingness” and “font” with 0.503. Even though no multicollinearity was detected by the test, correlations above 0.3 between these pairs give an initial indication that these associations could create problems during regression modelling. On the grounds of statistical significance, however, only “font” and “colour”, “font” and “relevance”, and “relevance” and “interestingness” have a statistically significant association, as observed from the respective p-values (<0.01). The correlation coefficients for these statistically significant pairs are nevertheless extremely low, so they can be ignored. This leads to the inference that “relevance”, “ease” and “font” might have been responsible for causing multicollinearity had the prior check of this assumption not been performed.

(b) As mentioned above, observation of the correlation matrix leads to the inference that “ease”, “relevance” and “font” are likely to cause the issue of multicollinearity. The easiest solution to the problem of correlated independent variables is to detect the pairs with the highest correlation and remove them from the dataset, followed by re-checking the correlation matrix. However, not only is this a trial-and-error method, it is also likely that important information will be lost, causing additional issues for model specification. Thus, the better alternative is to group highly correlated variables under a single term and perform principal component analysis before proceeding with regression modelling.

(c) The “navigation” variable strongly influences the time spent by visitors, as they increasingly look for seamless viewing of the web pages in a website. In the absence of this feature, a website gives an unstructured and unorganised impression, which can be frustrating to visitors. Information and knowledge are the key factors that define the purpose of website browsing. While aesthetics and visual appeal may be important, it creates a negative impact if the information being searched for cannot be viewed without interruption. For a positive user experience, transparent navigation between web pages is necessary to provide the user with a consistent flow of information.

A parsimonious model with fewer independent variables involves eliminating the variables “relevance” and “interestingness”. The “relevance” variable is eliminated because of its negative influence on the time spent: the practical interpretation is that an increase in page relevance reduces the time spent by a visitor on the website, which is counter-intuitive. Secondly, the “interestingness” variable is dropped owing to its lack of statistical significance in predicting the response variable. The final regression model has an R-squared value of 0.526, which is slightly lower than that of the initial model (0.546), but the model is statistically significant. In the final regression model all the independent variables are statistically significant and have a positive influence on the time spent by the visitor, although the coefficients in Table 7 are slightly smaller than those of the initial model. In this model, the set of independent variables stays true to the practical as well as the statistical implications derived from the regression analysis. While “colour” and “font” are important for creating visual appeal and adding attractiveness to the website, “navigation” and “speed” add value to the information provided via the website.

time = -6.34 + 2.15*(colour) + 5.14*(navigation) + 1.51*(font) + 2.33*(speed)

Table 6. Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.726^a	.526	.521	8.537
a. Predictors: (Constant), Speed of uploading of the pages, Font size and design, Colour composition of the key pages of the website, Ease of navigation across pages

Table 7. Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
Model		B	Std. Error	Beta	t	Sig.
1	(Constant)	-6.340	3.270		-1.939	.053
	Colour composition of the key pages of the website	2.153	.390	.216	5.517	.000
	Ease of navigation across pages	5.140	.519	.411	9.911	.000
	Font size and design	1.515	.315	.194	4.808	.000
	Speed of uploading of the pages	2.335	.320	.286	7.301	.000
a. Dependent Variable: Time spent on the website by a user
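
Because R-squared never increases when predictors are removed, the two models are more fairly compared on adjusted R-squared. The sketch below recomputes it for both models from the reported R-squared values, assuming that both were fitted on the same 340 observations (inferred from the total degrees of freedom of 339 in Table 3); the function name is chosen here for illustration.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a model with an intercept."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_OBS = 340  # assumption: total df of 339 in Table 3 plus one, same for both models

# R-squared values as reported in Table 2 (six predictors) and Table 6 (four).
full_model = adjusted_r_squared(0.546, N_OBS, 6)
reduced_model = adjusted_r_squared(0.526, N_OBS, 4)

print(f"Full model (6 predictors):    {full_model:.3f}")
print(f"Reduced model (4 predictors): {reduced_model:.3f}")
print(f"Loss in adjusted R squared:   {full_model - reduced_model:.3f}")
```

Under these assumptions the reduced model gives up less than two percentage points of adjusted R-squared in exchange for two fewer predictors, in line with the values reported in Table 2 and Table 6 up to rounding.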

References for Understanding Regression Analysis

Cohen, J., Cohen, P., West, S.G. and Aiken, L.S., 2013. Applied multiple regression/correlation analysis for the behavioral sciences. Routledge.

Denis, D.J., 2018. SPSS data analysis for univariate, bivariate, and multivariate statistics. John Wiley & Sons.

Schroeder, L.D., Sjoquist, D.L. and Stephan, P.E., 2016. Understanding regression analysis: An introductory guide. Sage Publications.
