Optimisation of the Heterotrophic Production of EPA by the Diatom Nitzschia laevis

EG-219 Statistical Methods in Engineering

Zamzam ali faraj

Research Centre, College of Engineering

Swansea University

Swansea, UK

Abstract—This report analyses two experiments conducted at the University of Hong Kong on the diatom Nitzschia laevis. Statistical methods implemented in Matlab are applied to the experimental data to answer four questions.

I. Introduction

Current clinical research shows the significance of ω-3 polyunsaturated fatty acids (ω-3 PUFAs). Fish oil is the main commercial source of both eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA), which play a crucial part in preventing arrhythmia, cardiovascular disease, and cancer. Extracting ω-3 PUFAs from fish oil has several drawbacks, however, including safety concerns, high purification costs, and contamination with pesticides and heavy metals. Recent studies show that the diatom Nitzschia laevis is a good EPA producer.

The experimental work was conducted by the University of Hong Kong using the diatom Nitzschia laevis (UTEX 2047). Two experiments were carried out. Experiment 1 studied the effect of the PI metal solution at two different levels. Forty tests were performed and the EPA yield was recorded for each test. Experiment 2 comprised 27 tests in which two environmental conditions (pH and temperature) and one medium component (NaCl) were varied.

II. Results

Question 1. In the first experiment, the first twenty tests were conducted with PI at 4.5 mL/L and the other twenty with PI at 13.5 mL/L. In all of these tests the levels of NaCl, CaCl2, temperature, and pH were held fixed at 16 g/L, 0.204 g/L, 22 °C and 7.5 respectively. The data comprise the EPA yield at each level of the PI metal solution.

Figure 1. EPA yield values for the two PI metal solution levels

Figure 1 displays the EPA yield values at the two PI metal solution levels. Both boxplots are roughly symmetric, which is consistent with normally distributed data. The two boxplots are similar in shape, although the second has a higher maximum and median and a lower minimum than the first. The statistical characteristics of each data set were read from the boxplot, except for the mean and standard deviation, which a boxplot does not show and which were calculated in Matlab. Table 1 lists the results for each data set. The statistical characteristics of the two data sets are very similar, so changing the PI metal solution from 4.5 mL/L to 13.5 mL/L has little effect on EPA production. The histograms in Figure 2 and Figure 3 are also consistent with normally distributed data, although the first histogram has four bars and the second has five.

Statistic | First part of experiment 1 (PI = 4.5 mL/L) | Second part of experiment 1 (PI = 13.5 mL/L)
Minimum | 209.9 | 208.6
Maximum | 224.3 | 227
Mean | 217.165 | 219.025
Standard deviation | 3.8208 | 5.3959
Median | 217.5 | 220.15
First quartile | 214.1 | 215.2
Third quartile | 219.85 | 222.75
Interquartile range | 5.75 | 7.55

Table 1. Statistical characteristics of each part of the data

Figure 2. Histogram of series 1

Figure 3. Histogram of series 2

Question 2. Parametric tests are those that make assumptions about the parameters of the population distribution from which the sample is drawn [1]. Non-parametric tests are defined as tests that do not depend on the parameters of a distribution [2].

Property | Parametric | Non-parametric
Assumed distribution | Normal | None
Test | t-test | Sign test
Usual central measure [3] | Mean | Median
Advantages | They are more likely to detect genuine differences or relationships that exist [4]. | They can be used with smaller sample sizes [5].
Disadvantages | They generally require a large sample, which can make them difficult to carry out [6]. | Special tables are sometimes required for the test statistics, and the table values must sometimes be calculated manually [7].

Table 2. Differences between parametric and non-parametric tests

A parametric test was used to assess the claim that the average EPA yield is not identical for the two levels of the PI metal solution.

The data are assumed to be normally distributed, which the symmetry of the histograms and boxplots suggests. This assumption can be checked with the normplot() function in Matlab. The central limit theorem cannot be relied on because each sample is small (n < 30), and the population standard deviation is unknown, so the t-test is used. Figure 4 shows the normal probability plot of the first 20 tests and Figure 5 that of the last 20 tests. Since the points lie close to the straight line in both plots, both support the normality assumption.

Figure 4. Normal probability plot of the first 20 tests (PI = 4.5 mL/L)

Figure 5. Normal probability plot of the last 20 tests (PI = 13.5 mL/L)

The mean is the central measure for the t-test. Each data set was tested against its own sample mean at the 5% significance level, and a confidence interval for the mean was obtained:

· The mean of the first 20 tests is 217.165. The null hypothesis (that there is no significant difference) cannot be rejected.
· The confidence interval for this set of tests is [215.3768, 218.9532].
· The mean of the last 20 tests is 219.025. Here too the null hypothesis cannot be rejected.
· The confidence interval for this set of tests is [216.4496, 221.5504].

The null hypothesis is therefore retained. The sample means differ between the two levels of the PI metal solution, but the two confidence intervals overlap and the statistical characteristics in Table 1 are similar, so the data give no clear evidence that the PI level changes the mean EPA yield.
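
As a check, the first confidence interval above can be reproduced from the Table 1 summary statistics, assuming it is the usual two-sided 95% t-interval (α = 0.05, as in the Appendix). The critical values t_{0.975,19} ≈ 2.093 and t_{0.975,38} ≈ 2.02 are standard table values and are not given in the report. The same summary statistics also allow a pooled two-sample t statistic to be sketched, which compares the two PI levels directly. This comparison is an illustration under an equal-variance assumption, not a test carried out in the original analysis.

```latex
\bar{x}_1 \pm t_{0.975,19}\,\frac{s_1}{\sqrt{n_1}}
  = 217.165 \pm 2.093 \times \frac{3.8208}{\sqrt{20}}
  \approx 217.165 \pm 1.788
  = [215.377,\ 218.953]

s_p = \sqrt{\frac{s_1^2 + s_2^2}{2}}
    = \sqrt{\frac{3.8208^2 + 5.3959^2}{2}} \approx 4.675

t = \frac{\bar{x}_2 - \bar{x}_1}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
  = \frac{219.025 - 217.165}{4.675 \times \sqrt{0.1}}
  \approx 1.26 < t_{0.975,38} \approx 2.02
```

On these figures the difference in mean yield would not be significant at the 5% level, which is in line with the similarity of the two data sets in Table 1.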

A non-parametric test was used to assess the claim that the EPA yield differs between the two levels of the PI metal solution. The central measure for non-parametric tests is the median: if the median values are not the same, the EPA yield differs between the two levels of the PI metal solution. The sign test is used to check whether this claim holds.

· The median of the first set of tests is 217.5.
· The confidence interval for this set of tests is [214.2, 219.6].

Since the median lies within the confidence interval, the null hypothesis cannot be rejected in this case.

· The median of the second set of tests is 220.15.
· The confidence interval for this set of tests is [215.2, 221.9].

Since the median also lies within this confidence interval, the null hypothesis cannot be rejected in this case either.

The parametric t-test is preferred for the data summarised in Table 1. Because the data appear normally distributed, as the plots in Figure 4 and Figure 5 show, the parametric test is more accurate and reliable than the non-parametric test.

Question 3. For this experiment it is assumed that the residuals of the EPA yield are normally distributed and do not interact, that is, they are independent of one another. The first assumption is checked with a normal probability plot produced by the normplot() command in Matlab, and the second with a plot of the residuals against the predicted values.
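
The two assumptions above can be stated compactly. The notation is introduced here for illustration and does not appear in the original analysis: y_i is the measured EPA yield of test i, ŷ_i is the yield predicted by the fitted model, e_i is the residual, and σ² is an unknown constant variance (the constant-variance part is an added assumption).

```latex
e_i = y_i - \hat{y}_i, \qquad
e_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2), \qquad
i = 1, \dots, 27
```

Under these assumptions the points of a normal probability plot of e_i fall close to a straight line, and a plot of e_i against ŷ_i shows no systematic pattern around zero.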

Figure 6. Normal probability plot of the EPA yield residuals

Figure 7. Residuals against predicted yield

Both assumptions are supported by the plots. The normal probability plot indicates that the residuals are approximately normally distributed once the first couple of residuals are set aside as outliers. In the plot of residuals against predicted yield, the residuals are scattered randomly above and below the straight line, which shows no interaction and supports the second assumption.

The second-order response surface model is

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \beta_{33} X_3^2 + \beta_{12} X_1 X_2 + \beta_{13} X_1 X_3 + \beta_{23} X_2 X_3 + \varepsilon$$

where $Y$ is the EPA yield, $X_1$ the pH value, $X_2$ the NaCl level and $X_3$ the temperature level.

The following values were obtained with the Matlab function fitlm, which fits a linear regression model and returns estimates of the coefficients $\beta$.

Parameter | Estimate
$\beta_0$ | -1930
$\beta_1$ | 457.38
$\beta_2$ | -7.127
$\beta_3$ | 56.331
$\beta_{11}$ | -34.559
$\beta_{22}$ | 0.18032
$\beta_{33}$ | -2.0597
$\beta_{12}$ | 0.55312
$\beta_{13}$ | 3.0208
$\beta_{23}$ | -0.20885

Table 3. Estimates of the $\beta$ parameters of the second-order response surface model (3 variables)

The coefficients associated with pH and temperature are larger in magnitude than those associated with NaCl. The pH value and the temperature are therefore the dominant terms in the model, and NaCl can be excluded from it.

Question 4. A simplified version of the second-order model can be written as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \beta_{12} X_1 X_2 + \varepsilon$$

where $Y$ is the EPA yield, $X_1$ the pH value and $X_2$ the temperature level.

Parameter | Estimate
$\beta_0$ | -2195.1
$\beta_1$ | 509.5
$\beta_2$ | 56.516
$\beta_{11}$ | -37.445
$\beta_{22}$ | -2.1398
$\beta_{12}$ | 3.0208

Figure 8. Actual versus fitted plot

Figure 9. The estimated second-order model

Figure 9 displays the estimated and actual values of the EPA yield. Because the data set is small, the fitted model is less accurate than it would be if more data were available.

III. Discussion

The effect of the PI metal solution on the EPA yield was examined in the first experiment.
Two values of PI, 4.5 mL/L and 13.5 mL/L, were each examined in 20 tests, giving 40 tests in total. The statistical characteristics were calculated for each data set. The results of this experiment indicate that the effect of the PI value on the EPA yield can be dismissed.

In the second experiment, temperature and pH were treated as two environmental factors and NaCl as a medium component. 27 tests were conducted, and the EPA yield and the factor levels were recorded for each test. After the second-order model was fitted with three variables, the effect of NaCl proved unimportant, so the model was reduced to two variables. The results are shown in the 3D plot. More data could have been used to obtain a more accurate model.

IV. Appendix

%question 1
%import data for experiment 1: x = data set 1 (PI = 4.5 mL/L), y = data set 2 (PI = 13.5 mL/L)
x=xlsread('bioengproject.xlsx','sheet1','C4:C23');
y=xlsread('bioengproject.xlsx','sheet1','E4:E23');
%boxplot of data sets 1 and 2
figure(1)
boxplot([x y])
title('epa yield boxplot');
xlabel('dataset');
ylabel('epa yield');
%statistical properties of data set 1
maximum1=max(x);
minimum1=min(x);
mean1=mean(x);
med1=median(x);
s1=std(x);
q11=quantile(x,0.25);
q31=quantile(x,0.75);
r1=iqr(x);
%statistical properties of data set 2
maximum2=max(y);
minimum2=min(y);
mean2=mean(y);
med2=median(y);
s2=std(y);
q12=quantile(y,0.25);
q32=quantile(y,0.75);
r2=iqr(y);
%histogram of data set 1
figure(2)
histogram(x,'normalization','probability')
title('the histogram of data test1')
xlabel('epa yield')
ylabel('relative frequency')
%histogram of data set 2
figure(3)
histogram(y,'normalization','probability')
title('the histogram of data test2')
xlabel('epa yield')
ylabel('relative frequency')
%question 2
%parametric test for data set 1 (one-sample t-test against the sample mean)
muval1=mean(x)
alpha1=0.05;
[h1,p1,ci1,statics1]=ttest(x,muval1,'Alpha',alpha1,'Tail','both')
%parametric test for data set 2
muval2=mean(y)
[h2,p2,ci2,statics2]=ttest(y,muval2,'Alpha',alpha1,'Tail','both')
%non-parametric test for data set 1 (signtest returns p first, then h)
m1=median(x)
[p3,h3,statics3]=signtest(x,m1,'Alpha',0.05,'Tail','both');
%confidence interval for the median taken from the sorted data
nu1=numel(x);
ds1=sort(x);
b1=binoinv([0.025 0.975],nu1,0.5);
LOWER_CI1=ds1(b1(1));
UPPER_CI1=ds1(b1(2)+1);
CI1=[LOWER_CI1 UPPER_CI1]
%non-parametric test for data set 2
m2=median(y)
[p4,h4,statics4]=signtest(y,m2,'Alpha',0.05,'Tail','both');
nu2=numel(y);
ds2=sort(y);
b2=binoinv([0.025 0.975],nu2,0.5);
LOWER_CI2=ds2(b2(1));
UPPER_CI2=ds2(b2(2)+1);
CI2=[LOWER_CI2 UPPER_CI2]
%normality of data set 1
figure(4)
normplot(x)
%normality of data set 2
figure(5)
normplot(y)
%question 3
%estimate coefficients (x1 = pH, x2 = NaCl, x3 = temperature)
x1=xlsread('bioengproject.xlsx','sheet2','C3:C29');
x2=xlsread('bioengproject.xlsx','sheet2','D3:D29');
x3=xlsread('bioengproject.xlsx','sheet2','E3:E29');
y=xlsread('bioengproject.xlsx','sheet2','F3:F29');
x=[x1 x2 x3];
Model=fitlm(x,y,'quadratic');
disp(Model)
%normality of residuals (first column of the residuals table holds the raw residuals)
Res=table2array(Model.Residuals);
Rawresi=Res(:,1);
figure(1)
normplot(Rawresi)
title('normal probability plot')
%residuals vs predicted value plot
fit_y=predict(Model,x);
figure(2)
scatter(fit_y,Rawresi,'filled');
refline(0,0)
xlabel('predicted yield')
ylabel('residuals');
title('predicted yield against residuals')
%question 4
%estimate coefficients of the reduced model (pH and temperature only)
x0=[x1 x3];
model2=fitlm(x0,y,'quadratic');
disp(model2)
%actual vs predicted value plot for the reduced model
fit_y2=predict(model2,x0);
figure(3)
scatter(y,fit_y2,'filled')
refline(1,0)
xlabel('actual y values')
ylabel('fitted y values')
title('actual v fitted')
%3d plot
Dataset = xlsread('BioEngProject.xlsx','Sheet2','B3:F29');
x1 = Dataset(:,2);
x2 = Dataset(:,3);
x3 = Dataset(:,4);
y = Dataset(:,5);
x = [x1 x2 x3];
%the formula removes every x2 (NaCl) term, leaving the quadratic model in x1 and x3
Model = fitlm(x,y,'y~1+x1:x2+x1:x3+x2:x3+x1^2+x2^2+x3^2-x2-x2:x3-x1:x2-x2^2');
disp(Model)
Res = table2array(Model.Residuals);
RawRes = Res(:,1);
figure
normplot(RawRes);
fit_y=predict(Model,x);
figure
scatter(fit_y, RawRes,'filled');
refline(0,0)
xlabel('Predicted yield')
ylabel('Residuals')
title('Predicted yield against Residuals')
figure
scatter(y,fit_y,'filled')
refline(1,0)
xlabel('Actual yield')
ylabel('Predicted yield')
title('Actual V Predicted yield')
%95% confidence intervals of the coefficients
coefCI95 = coefCI(Model,0.05);
disp(coefCI95)
figure
scatter3(x1,x3,y,'filled','k')
hold on
x1fit=linspace(min(x1),max(x1),100);
x3fit=linspace(min(x3),max(x3),100);
[X1FIT,X3FIT]=meshgrid(x1fit,x3fit);
b0=-2195.1;
b1=509.5;
b2=56.516;
b3=3.0208;
b4=-37.445;
b5=-2.1398;
YFIT=b0+b1*X1FIT+b2*X3FIT+b3*X1FIT.*X3FIT+b4*(X1FIT.^2)+b5*(X3FIT.^2);
mesh(X1FIT,X3FIT,YFIT)
%x axis is x1 (pH) and y axis is x3 (temperature)
xlabel('ph levels')
ylabel('Temperature level')
zlabel('Yield')
title('estimated value and real values of EPA yield')
%location and value of the maximum fitted yield
[a,b]=max(YFIT(:));
B=X1FIT(:);
C=X3FIT(:);
pHvalue1=B(b);
Temp=C(b);
disp(pHvalue1)
disp(Temp)
disp(a)

V. References

[1] Health Knowledge. (2017). Parametric and Non-parametric tests for comparing two or more groups. [online] Available at: https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests

[2] Sphweb.bumc.bu.edu. (2017). When to Use a Nonparametric Test. [online] Available at: http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/bs704_nonparametric2.html

[3] Health Knowledge. (2017). Parametric and Non-parametric tests for comparing two or more groups.

[4] Statstutor.ac.uk. (2017). http://www.statstutor.ac.uk/resourckguidetostatistics.pdf

[5] Itl.nist.gov. (2017). 7.2. Comparisons based on data from one process. [online] Available at: http://www.itl.nist.gov/div898/handbook/prc/section2/prc2.htm

[6] Reddy, C. (2017). Advantages and Disadvantages of Parametric Tests - WiseStep. [online] WiseStep. Available at: https://content.wisestep.com/advantages-disadvantages-parametric-tests/ [Accessed 12 Dec. 2017].

[7] Itl.nist.gov. (2017). 7.2. Comparisons based on data from one process. [online] Available at: http://www.itl.nist.gov/div898/handbook/prc/section2/prc2.htm