

Optimisation of the Heterotrophic Production of EPA by the Diatom Nitzschia Laevis

EG-219 Statistical Methods in Engineering


Zamzam ali faraj

Research Centre, College of Engineering

Swansea University

Swansea, UK



Abstract—This report applies statistical methods to two experiments conducted at the University of Hong Kong with the diatom Nitzschia laevis, and answers four questions by analysing the experimental data in MATLAB.

I. Introduction

Recent clinical studies have shown the importance of ω-3 polyunsaturated fatty acids (ω-3 PUFAs). Fish oil is the main commercial source of both eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA), which play a crucial part in preventing arrhythmia, cardiovascular disease and cancer. There are several possible sources of ω-3 PUFAs, but most of them suffer from problems such as safety concerns, costly purification, and contamination with pesticides and heavy metals. Recent studies show that the diatom Nitzschia laevis is a good EPA producer.


The experimental work was conducted at the University of Hong Kong using the diatom Nitzschia laevis (UTEX 2047). Two experiments were carried out. In experiment 1, the effect of the PI metal solution was studied at two different concentrations; forty tests were performed, and the EPA yield was recorded for each test. Experiment 2 comprised 27 tests in which two environmental conditions (pH and temperature) and one medium component (NaCl) were varied.

II. Results

Question 1. In the first experiment, the first twenty tests were conducted with PI at 4.5 mL/L and the other twenty with PI at 13.5 mL/L. In all of these tests, the levels of NaCl, CaCl2, temperature and pH were held fixed at 16 g/L, 0.204 g/L, 22 °C and 7.5, respectively. The data comprise the EPA yield for each value of PI metal solution.











Figure 1. EPA yield values for the two PI metal solution levels


Figure 1 displays the EPA yield values at the two PI metal solution levels. Both boxplots are roughly symmetric, which is consistent with normally distributed data. The two boxplots are similar in shape, although the second has a higher maximum and median and a lower minimum than the first. The statistical characteristics of each data set were read from the boxplots, except the mean and standard deviation, which are not shown on a boxplot and were calculated in MATLAB; Table 1 summarises the results. The statistical characteristics of the two data sets are very similar, so changing the PI metal solution from 4.5 mL/L to 13.5 mL/L does not appear to affect EPA production much. The histograms of the two series (Figures 2 and 3) are also consistent with normally distributed data, although the first data set is shown with four bars and the second with five.
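The statistics in Table 1 can be reproduced from a list of EPA yields with a short helper. This is a minimal sketch in Python's standard library rather than the MATLAB used in the report; the function name `summarize` is hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, so MATLAB's `quantile` and `iqr` may return slightly different quartiles for the same data.

```python
from statistics import mean, median, quantiles, stdev


def summarize(data):
    """Table 1 statistics for one data set (hypothetical helper)."""
    # Quartile convention is an assumption; MATLAB's quantile interpolates differently.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "min": min(data),
        "max": max(data),
        "mean": mean(data),
        "std": stdev(data),  # sample standard deviation (n - 1 divisor), as MATLAB's std
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }
```

Calling `summarize` once on each 20-value column gives one column of Table 1 per call.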









Statistic | First part of experiment 1 (PI = 4.5 mL/L) | Second part of experiment 1 (PI = 13.5 mL/L)
Minimum | |
Maximum | |
Mean | 217.165 | 219.025
Standard deviation | |
Median | 217.5 | 220.15
First quartile (25th percentile) | |
Third quartile (75th percentile) | |
Interquartile range | |

Table 1. Statistical characteristics of each part of the data


Figure 2. Histogram of series 1 (PI = 4.5 mL/L)

Figure 3. Histogram of series 2 (PI = 13.5 mL/L)


Question 2. Parametric tests make assumptions about the parameters of the population distribution from which the sample is drawn [1]. Non-parametric tests are those that make no assumptions about the parameters of a distribution [2].




Property | Parametric tests | Non-parametric tests
Assumed distribution | Normal | Any
Usual central measure [3] | Mean | Median
Advantage | More likely to detect genuine differences or relationships that exist [4] | Can be used with smaller sample sizes [5]
Disadvantage | Often need larger samples, which can make them harder to carry out [6] | Special tables are sometimes needed for the test statistics, and sometimes the values must be calculated manually [7]

Table 2. Differences between parametric and non-parametric tests


A parametric test was used to examine the claim that the mean EPA yield differs between the two levels of PI metal solution.

The data are assumed to be normally distributed, which the symmetry of the histograms and boxplots suggests; normality can be checked further with the normplot() function in MATLAB. The central limit theorem cannot be relied on because each sample is small (n = 20 < 30) and the population standard deviation is unknown, so a t-test is used. Figure 4 shows the normal probability plot of the first 20 tests and Figure 5 that of the last 20 tests. Since the points lie close to the straight line in both plots, the normality assumption is supported.

Figure 4. Normal probability plot of the first 20 tests (PI = 4.5 mL/L)

Figure 5. Normal probability plot of the last 20 tests (PI = 13.5 mL/L)

The mean is the central measure for the t-test, and a 95% confidence interval for the mean was calculated for each data set:

· The mean of the first 20 tests is 217.165, and the null hypothesis (no significant difference) cannot be rejected.
· The 95% confidence interval for this set of tests is [215.3768, 218.9532].
· The mean of the last 20 tests is 219.025, and again the null hypothesis cannot be rejected.
· The 95% confidence interval for this set of tests is [216.4496, 221.5504].

Each one-sample test in the appendix uses the sample's own mean as the hypothesised value, so each interval contains its mean by construction, and the comparison between the two PI levels rests on the intervals themselves. The two intervals overlap considerably, which is consistent with the similar statistical characteristics in Table 1 and gives no indication that the mean EPA yield differs between the two levels of PI metal solution; a two-sample test comparing the data sets directly would be the more rigorous check.
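The interval calculations used in this question can be sketched in a few lines of Python. This is a minimal standard-library sketch, not the report's MATLAB code: `t_interval` builds the mean ± t·s/√n interval (the critical value `t_crit` must be supplied, since the standard library has no t distribution), `welch_t` is swapped in as a direct two-sample comparison of the PI levels, and `sign_test` and `median_interval` cover the non-parametric sign-test analysis that follows. All function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def t_interval(data, t_crit):
    """Two-sided interval mean +/- t_crit * s / sqrt(n) for the population mean."""
    half_width = t_crit * stdev(data) / math.sqrt(len(data))
    m = mean(data)
    return m - half_width, m + half_width


def welch_t(x, y):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def binom_cdf(k, n):
    """P(B <= k) for B ~ Binomial(n, 0.5), computed exactly with math.comb."""
    return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n


def sign_test(data, m0):
    """Two-sided sign-test p-value for H0: median == m0 (values equal to m0 are dropped)."""
    above = sum(v > m0 for v in data)
    below = sum(v < m0 for v in data)
    n = above + below
    if n == 0:
        return 1.0
    return min(1.0, 2 * binom_cdf(min(above, below), n))


def median_interval(data, alpha=0.05):
    """Distribution-free interval [x_(k), x_(n-k+1)] with coverage of at least 1 - alpha."""
    xs = sorted(data)
    n = len(xs)
    k = 0
    while binom_cdf(k, n) <= alpha / 2:  # k ends as the smallest value with cdf > alpha/2
        k += 1
    if k == 0:
        raise ValueError("sample too small for this confidence level")
    return xs[k - 1], xs[n - k]
```

For n = 20, `median_interval` selects the 6th and 15th sorted values, the same order statistics that the `binoinv` construction in the appendix picks. With `t_crit = 2.093` (the two-sided 95% value for 19 degrees of freedom), `t_interval` gives an interval of the kind reported above, while `welch_t(x, y)` compares the two columns directly; its p-value needs the t distribution, for example via MATLAB's `ttest2(x, y, 'Vartype', 'unequal')`.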
A non-parametric test was also used to examine the claim that the EPA yield differs between the two levels of PI metal solution. The central measure for non-parametric tests is the median: if the medians differed, this would indicate that the EPA yield differs between the two PI values. The sign test was used for this check:

· The median of the first set of tests is 217.5, and its approximately 95% confidence interval is [214.2, 219.6]. Since the median lies within the interval, the null hypothesis cannot be rejected.
· The median of the second set of tests is 220.15, and its approximately 95% confidence interval is [215.2, 221.9]. Since this median also lies within its interval, the null hypothesis cannot be rejected in this case either.

As with the t-test, each sign test uses the sample's own median, so the comparison between PI levels rests on the two intervals, which overlap.

The parametric t-test is preferred for these data: the normal probability plots in Figures 4 and 5 indicate that the data are normally distributed, and under that condition the parametric test is more powerful and more reliable than the non-parametric one.

Question 3. For experiment 2, the residuals of the EPA yield model are assumed to be normally distributed and independent. The first assumption is checked with a normal probability plot (normplot() in MATLAB), and the second with a plot of residuals against predicted values.
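The fitting and residual checks can be sketched without MATLAB. This minimal standard-library sketch fits a full second-order model by least squares, with terms in the same order as the report's β coefficients (intercept, linear terms, squares, then pairwise interactions), and returns the fitted values and residuals needed for both plots. Two assumptions to flag: the normal equations are used only to keep the sketch dependency-free (they are less numerically robust than a QR-based solver, so centring pH and temperature first is advisable with real data), and the plotting positions (i − 0.5)/n are assumed to match those used by normplot. All function names are hypothetical.

```python
from itertools import combinations
from statistics import NormalDist


def quad_row(xs):
    """Second-order terms in the report's order: 1, X_i, X_i^2, then X_i*X_j for i < j."""
    row = [1.0] + list(xs) + [v * v for v in xs]
    row += [xs[i] * xs[j] for i, j in combinations(range(len(xs)), 2)]
    return row


def solve(a, b):
    """Solve the square system a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef


def fit_quadratic(X, y):
    """Least-squares fit via the normal equations; returns (coefficients, fitted, residuals)."""
    rows = [quad_row(x) for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * t for c, t in zip(coef, r)) for r in rows]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return coef, fitted, residuals


def normal_plot_points(data):
    """(theoretical normal quantile, sorted value) pairs at positions (i - 0.5) / n."""
    xs = sorted(data)
    n = len(xs)
    return [(NormalDist().inv_cdf((i - 0.5) / n), v) for i, v in enumerate(xs, start=1)]
```

With three predictors, `quad_row` yields the ten terms of the three-variable model in the order β0, β1, β2, β3, β11, β22, β33, β12, β13, β23. Plotting `normal_plot_points(residuals)` gives the normality check, and plotting `residuals` against `fitted` gives the independence check.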
Figure 6. Normal probability plot of the EPA yield residuals

Figure 7. Predicted yield against residuals

Both assumptions are supported by the plots. Apart from the first couple of residuals, which are treated as outliers, the normal probability plot shows that the residuals are approximately normally distributed. In the plot of residuals against predicted yield, the residuals scatter randomly above and below the zero line with no visible pattern, which supports the independence assumption.

The second-order response surface model in three variables is

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \beta_{33} X_3^2 + \beta_{12} X_1 X_2 + \beta_{13} X_1 X_3 + \beta_{23} X_2 X_3 + \varepsilon$$

where Y is the EPA yield, X1 is the pH value, X2 is the NaCl level and X3 is the temperature level. The coefficients in Table 3 were estimated with the fitlm function in MATLAB, which fits a linear regression model (here with the 'quadratic' model specification) and returns the estimated coefficients β.

Coefficient | Estimate
β0 | -1930
β1 | 457.38
β2 | -7.127
β3 | 56.331
β11 | -34.559
β22 | 0.18032
β33 | -2.0597
β12 | 0.55312
β13 | 3.0208
β23 | -0.20885

Table 3. Estimated β parameters of the second-order response surface model (3 variables)

The coefficients of the pH and temperature terms are much larger in magnitude than those of the NaCl terms (β2, β22, β12 and β23), so pH and temperature dominate the model and NaCl was removed from it. Because the size of a coefficient also depends on the units of its variable, the p-values that fitlm reports for each coefficient would give a firmer basis for this decision.

Question 4. A simplified version of the second-order model can be written as

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \beta_{12} X_1 X_2 + \varepsilon$$

where Y is the EPA yield, X1 is the pH value and X2 is the temperature level. The estimated coefficients are:

Coefficient | Estimate
β0 | -2195.1
β1 | 509.5
β2 | 56.516
β11 | -37.445
β22 | -2.1398
β12 | 3.0208

Figure 8. Actual versus fitted EPA yield

Figure 9. Estimated second-order model

Figure 9 displays the estimated and actual values of the EPA yield. Because the data set is small, the fit is less accurate than it would be if more data were available.

III. Discussion

The effect of the PI metal solution on EPA yield was examined in the first experiment. Two PI values, 4.5 mL/L and 13.5 mL/L, were each examined in 20 tests, giving 40 tests in total, and the statistical characteristics were calculated for each data set. The results of this experiment indicate that, over this range, the effect of PI on EPA yield can be neglected.

In the second experiment, temperature and pH were treated as environmental factors and NaCl as a medium component. Twenty-seven tests were conducted, and the EPA yield and factor levels were recorded for each. After a second-order model in three variables was fitted, the effect of NaCl was found to be unimportant, so the model was reduced to two variables (pH and temperature); the results are shown in the 3-D plot. More data would be needed to obtain more accurate estimates.

IV.
Appendix

%% Question 1
% import data for experiment 1
x = xlsread('bioengproject.xlsx','sheet1','C4:C23');   % first part of experiment 1
y = xlsread('bioengproject.xlsx','sheet1','E4:E23');   % second part of experiment 1

% boxplots of data sets 1 and 2
figure(1)
boxplot([x y])
title('EPA yield boxplot');
xlabel('Data set');
ylabel('EPA yield');

% statistical properties of data set 1
maximum1 = max(x);
minimum1 = min(x);
mean1 = mean(x);
med1 = median(x);
s1 = std(x);
q1_1 = quantile(x,0.25);   % first quartile
q3_1 = quantile(x,0.75);   % third quartile
r1 = iqr(x);

% statistical properties of data set 2
maximum2 = max(y);
minimum2 = min(y);
mean2 = mean(y);
med2 = median(y);
s2 = std(y);
q1_2 = quantile(y,0.25);   % first quartile
q3_2 = quantile(y,0.75);   % third quartile
r2 = iqr(y);

% histograms of data sets 1 and 2
figure(2)
histogram(x,'Normalization','probability')
title('Histogram of data set 1')
xlabel('EPA yield')
ylabel('Relative frequency')
figure(3)
histogram(y,'Normalization','probability')
title('Histogram of data set 2')
xlabel('EPA yield')
ylabel('Relative frequency')

%% Question 2
% parametric (t) tests; each hypothesised mean is the sample's own mean
alpha1 = 0.05;
muval1 = mean(x)
[h1,p1,ci1,stats1] = ttest(x,muval1,'Alpha',alpha1,'Tail','both');
ci1
muval2 = mean(y)
[h2,p2,ci2,stats2] = ttest(y,muval2,'Alpha',alpha1,'Tail','both');
ci2

% non-parametric (sign) test and order-statistic interval for the median, data set 1
m1 = median(x)
[p3,h3,stats3] = signtest(x,m1,'Alpha',0.05,'Tail','both');   % signtest returns p first
nu1 = numel(x);
ds1 = sort(x);
b1 = binoinv([0.025 0.975],nu1,0.5);
CI1 = [ds1(b1(1)) ds1(b1(2)+1)]

% non-parametric (sign) test and order-statistic interval for the median, data set 2
m2 = median(y)
[p4,h4,stats4] = signtest(y,m2,'Alpha',0.05,'Tail','both');
nu2 = numel(y);
ds2 = sort(y);
b2 = binoinv([0.025 0.975],nu2,0.5);
CI2 = [ds2(b2(1)) ds2(b2(2)+1)]

% normal probability plots of data sets 1 and 2
figure(4)
normplot(x)
figure(5)
normplot(y)

%% Question 3
% estimate coefficients of the full second-order model
x1 = xlsread('bioengproject.xlsx','sheet2','C3:C29');   % pH
x2 = xlsread('bioengproject.xlsx','sheet2','D3:D29');   % NaCl
x3 = xlsread('bioengproject.xlsx','sheet2','E3:E29');   % temperature
y = xlsread('bioengproject.xlsx','sheet2','F3:F29');    % EPA yield
x = [x1 x2 x3];
Model = fitlm(x,y,'quadratic');
disp(Model)

% normality of residuals
Res = table2array(Model.Residuals);
Rawresi = Res(:,1);   % raw residuals
figure(6)
normplot(Rawresi)
title('Normal probability plot of residuals')

% residuals vs predicted value plot
fit_y = predict(Model,x);
figure(7)
scatter(fit_y,Rawresi,'filled');
refline(0,0)
xlabel('Predicted yield')
ylabel('Residuals');
title('Predicted yield against residuals')

%% Question 4
% estimate coefficients of the reduced model (pH and temperature only)
x0 = [x1 x3];
model2 = fitlm(x0,y,'quadratic');
disp(model2)

% actual vs fitted values for the reduced model
fit_y2 = predict(model2,x0);
figure(8)
scatter(y,fit_y2,'filled')
refline(1,0)
xlabel('Actual y values')
ylabel('Fitted y values')
title('Actual v fitted')

% reduced model fitted again with an explicit Wilkinson formula (NaCl terms removed)
Dataset = xlsread('BioEngProject.xlsx','Sheet2','B3:F29');
x1 = Dataset(:,2);
x2 = Dataset(:,3);
x3 = Dataset(:,4);
y = Dataset(:,5);
x = [x1 x2 x3];
Model = fitlm(x,y,'y~1+x1:x2+x1:x3+x2:x3+x1^2+x2^2+x3^2-x2-x2:x3-x1:x2-x2^2');
disp(Model)
Res = table2array(Model.Residuals);
RawRes = Res(:,1);
figure
normplot(RawRes);
fit_y = predict(Model,x);
figure
scatter(fit_y,RawRes,'filled');
refline(0,0)
xlabel('Predicted yield')
ylabel('Residuals')
title('Predicted yield against residuals')
figure
scatter(y,fit_y,'filled')
refline(1,0)
xlabel('Actual yield')
ylabel('Predicted yield')
title('Actual v predicted yield')
coefCI_model = coefCI(Model,0.05);   % 95% confidence intervals for the coefficients
disp(coefCI_model)

% 3-D plot of the fitted surface and the data
figure
scatter3(x1,x3,y,36,'k','filled')
hold on
x1fit = linspace(min(x1),max(x1),100);
x3fit = linspace(min(x3),max(x3),100);
[X1FIT,X3FIT] = meshgrid(x1fit,x3fit);
b0 = -2195.1; b1 = 509.5; b2 = 56.516; b3 = 3.0208; b4 = -37.445; b5 = -2.1398;
YFIT = b0 + b1*X1FIT + b2*X3FIT + b3*X1FIT.*X3FIT + b4*X1FIT.^2 + b5*X3FIT.^2;
mesh(X1FIT,X3FIT,YFIT)
xlabel('pH')
ylabel('Temperature')
zlabel('EPA yield')
title('Estimated and actual values of EPA yield')

% grid point with the highest predicted yield
[a,b] = max(YFIT(:));
pHvalue1 = X1FIT(b);
Temp = X3FIT(b);
disp(pHvalue1)
disp(Temp)
disp(a)

V. References

[1] Health Knowledge (2017). Parametric and Non-parametric tests for comparing two or more groups. [online] Available at: https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests
[2] Sphweb.bumc.bu.edu (2017). When to Use a Nonparametric Test. [online] Available at: http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/bs704_nonparametric2.html
[3] Health Knowledge (2017). Parametric and Non-parametric tests for comparing two or more groups. [online] Available at: https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests
[4] Statstutor.ac.uk (2017). [online] Available at: http://www.statstutor.ac.uk/resourckguidetostatistics.pdf
[5] Itl.nist.gov (2017). 7.2. Comparisons based on data from one process. [online] Available at: http://www.itl.nist.gov/div898/handbook/prc/section2/prc2.htm
[6] Reddy, C. (2017). Advantages and Disadvantages of Parametric Tests - WiseStep. [online] WiseStep. Available at: https://content.wisestep.com/advantages-disadvantages-parametric-tests/ [Accessed 12 Dec. 2017].
[7] Itl.nist.gov (2017). 7.2. Comparisons based on data from one process. [online] Available at: http://www.itl.nist.gov/div898/handbook/prc/section2/prc2.htm