Project author: odeibarredo

Project description:
Statistical analysis comparing metal accumulation levels in three macroinvertebrate groups
Language: R
Project URL: git://github.com/odeibarredo/Statistics_Group-Comparisons.git


STATISTICS - GROUP COMPARISONS

The aim of this repository is to show the statistical workflow for comparing groups of data in order to test whether there are significant differences between them.
The data come from my master's thesis (in Spanish), which was a first approach to establishing background tissue concentrations in macroinvertebrates from rivers in mining areas of northern Spain. Here we only perform a statistical analysis comparing three taxa to determine whether their metal levels really differ. The three taxa are: Heptageniidae (scraper), Hydropsychidae (collector-filterer) and Rhyacophilidae (predator).


WORKFLOW

(Figure: workflow diagram)

DESCRIPTIVE ANALYSIS

The first step is always to take a quick look at the basic features of the data by performing some descriptive analysis.

  > summary(mydata) # quick first look at global data
               Taxa          As               Se               Cd              Hg
   Heptageniidae :27   Min.   :0.1100   Min.   : 0.500   Min.   :0.010   Min.   :0.0300
   Hydropsychidae:23   1st Qu.:0.6325   1st Qu.: 1.930   1st Qu.:0.080   1st Qu.:0.0700
   Rhyacophilidae:20   Median :1.0900   Median : 3.605   Median :0.255   Median :0.1000
                       Mean   :1.9741   Mean   : 4.961   Mean   :0.622   Mean   :0.1359
                       3rd Qu.:2.9775   3rd Qu.: 6.478   3rd Qu.:0.865   3rd Qu.:0.1500
                       Max.   :7.2300   Max.   :20.280   Max.   :2.820   Max.   :0.5200
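
The global summary hides the differences between taxa, so a per-taxon summary is a natural next look. The sketch below is only an illustration: the data frame is simulated (the real `mydata` is not reproduced here), with the same column names and group sizes as in the output above.

```r
# Simulated stand-in for the real data: every value below is made up.
set.seed(42)
taxa <- c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae")
mydata <- data.frame(
  Taxa = factor(rep(taxa, times = c(27, 23, 20))),
  As = rlnorm(70, meanlog = 0),
  Se = rlnorm(70, meanlog = 1),
  Cd = rlnorm(70, meanlog = -1),
  Hg = rlnorm(70, meanlog = -2)
)

metals <- c("As", "Se", "Cd", "Hg")

# Median and interquartile range of each metal, one row per taxon
medians <- aggregate(mydata[metals], by = list(Taxa = mydata$Taxa), FUN = median)
iqrs    <- aggregate(mydata[metals], by = list(Taxa = mydata$Taxa), FUN = IQR)

print(medians)
print(iqrs)
```

Medians and IQRs are used here instead of means because concentration data are often right-skewed.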

(Figure: metal concentrations by taxon)

At first look we might expect significant differences in the following cases:

  • As -> all three taxa
  • Se -> Heptageniidae vs Hydropsychidae and Rhyacophilidae
  • Cd -> Heptageniidae vs Hydropsychidae and Rhyacophilidae
  • Hg -> none

In the next step we will test whether these hypotheses hold.

STATISTICAL ANALYSIS

NORMAL DISTRIBUTION

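To choose the right statistical test we need to know whether our data follow a normal distribution. For this we’ll use three checks:

  • Shapiro-Wilk test for normality
  • Skewness of the distribution
  • Levene’s test for homogeneity of variance between taxa

These are the results: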

  --------------------------------------------------
  Taxa             Metal   Shapiro.Wilk   Skewness
  ---------------- ------- -------------- ----------
  Heptageniidae    As      0.292          0.23
  Heptageniidae    Se      0.216          0.77
  Heptageniidae    Cd      0.456          0.3
  Heptageniidae    Hg      *0.009*        1.54
  Hydropsychidae   As      *0.004*        0.68
  Hydropsychidae   Se      0.334          0.76
  Hydropsychidae   Cd      0.106          **2.6**
  Hydropsychidae   Hg      *0*            1.74
  Rhyacophilidae   As      *0.001*        0.28
  Rhyacophilidae   Se      *0*            0.33
  Rhyacophilidae   Cd      *0*            1.81
  Rhyacophilidae   Hg      *0.001*        1.53
  --------------------------------------------------
  Table: * p < 0.01; ** skewness outside the range -2 to 2

  ---------------------
  Metal   Levene.Test
  ------- -------------
  As      *0*
  Se      *0*
  Cd      *0*
  Hg      0.154
  ---------------------
  Table: * p < 0.01
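
As a rough illustration of how tables like these could be produced, here is a base-R sketch. It is not the repository's script. The data are simulated, `skewness()` and `levene_p()` are hypothetical helpers written for this example, and Levene's test is computed in its median-centred form (an ANOVA on absolute deviations from each group's median), which is an assumption about the variant used.

```r
# Simulated stand-in for the real data: every value below is made up.
set.seed(42)
taxa <- c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae")
mydata <- data.frame(
  Taxa = factor(rep(taxa, times = c(27, 23, 20))),
  As = rlnorm(70, meanlog = 0),
  Se = rlnorm(70, meanlog = 1),
  Cd = rlnorm(70, meanlog = -1),
  Hg = rlnorm(70, meanlog = -2)
)
metals <- c("As", "Se", "Cd", "Hg")

# Moment-based sample skewness: third central moment over variance^(3/2)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Levene's test (median-centred): one-way ANOVA on the absolute deviations
# of each value from its group median; returns the p-value
levene_p <- function(x, g) {
  dev <- abs(x - ave(x, g, FUN = median))
  anova(lm(dev ~ g))[["Pr(>F)"]][1]
}

# One row per taxon/metal combination
checks <- expand.grid(Taxa = levels(mydata$Taxa), Metal = metals,
                      stringsAsFactors = FALSE)
checks$Shapiro.Wilk <- NA_real_
checks$Skewness <- NA_real_
for (i in seq_len(nrow(checks))) {
  x <- mydata[mydata$Taxa == checks$Taxa[i], checks$Metal[i]]
  checks$Shapiro.Wilk[i] <- shapiro.test(x)$p.value
  checks$Skewness[i] <- skewness(x)
}

# One Levene p-value per metal, comparing the three taxa
levene <- sapply(metals, function(m) levene_p(mydata[[m]], mydata$Taxa))

print(checks, digits = 3)
print(round(levene, 3))
```

A small Shapiro-Wilk p-value argues against normality within a group, while a small Levene p-value argues against equal variances across the groups.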

As we can see, some of the variables fit a normal distribution and others don’t. To better understand the difference we’ll create a visualization for each of the tests. For Shapiro-Wilk we can draw a QQ plot, which is commonly used to detect deviations from the normal distribution.

(Figure: QQ plots)

Cd in Heptageniidae fits the QQplot, Cd in Rhyacophilidae doesn’t.
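
A QQ plot of this kind can be drawn with base R's `qqnorm()` and `qqline()`. The sketch below uses two simulated samples (not the real Cd values), one roughly normal and one right-skewed, to show the contrast.

```r
# Two simulated samples: the values are made up for illustration only.
set.seed(42)
x_normal <- rnorm(27, mean = 0.5, sd = 0.1)      # roughly normal sample
x_skewed <- rlnorm(20, meanlog = -1, sdlog = 1)  # right-skewed sample

op <- par(mfrow = c(1, 2))  # two panels side by side
qq1 <- qqnorm(x_normal, main = "Approximately normal")
qqline(x_normal)
qq2 <- qqnorm(x_skewed, main = "Right-skewed")
qqline(x_skewed)
par(op)                     # restore the previous plot layout

# Correlation between theoretical and sample quantiles:
# the closer to 1, the closer the points lie to a straight line
print(cor(qq1$x, qq1$y))
print(cor(qq2$x, qq2$y))
```

Points that follow the reference line suggest normality; a curved pattern, especially at the ends, suggests a deviation from it.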

We can visualize the skewness with histograms.

(Figure: histograms with density lines)

In the case of Hydropsychidae, most values pile up on the left and the density line has a long tail to the right (positive skew).
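
A histogram with an overlaid density line takes only a couple of base-R calls. The sample below is simulated and right-skewed on purpose; it is not the real Hydropsychidae data.

```r
# Simulated right-skewed sample: the values are made up for illustration only.
set.seed(42)
cd <- rlnorm(23, meanlog = -1.5, sdlog = 1)

# freq = FALSE puts the histogram on the density scale so that
# the kernel density line can be drawn over it
h <- hist(cd, freq = FALSE, main = "Cd (simulated)", xlab = "Concentration")
lines(density(cd))
```

With a positive skew, the bars cluster at low values and the density line trails off to the right.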

Finally, to better understand Levene’s test we can draw some boxplots and pay attention to the boxes and whiskers.

(Figure: boxplots by taxon and metal)

Heptageniidae shows much more variance than the other taxa in all metals except Hg, which is why Levene’s test rejects equal variances for the first three metals.
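
To see how unequal spread shows up in a boxplot, here is a sketch with simulated As values in which one taxon is deliberately given a much larger standard deviation. The numbers are invented and only the group sizes match the real data.

```r
# Simulated data with a different spread per taxon: every value is made up.
set.seed(42)
taxa <- c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae")
n <- c(27, 23, 20)
mydata <- data.frame(
  Taxa = factor(rep(taxa, times = n)),
  As = abs(rnorm(70, mean = 2, sd = rep(c(2, 0.5, 0.3), times = n)))
)

# One box per taxon; box height and whisker length reflect the spread
b <- boxplot(As ~ Taxa, data = mydata, ylab = "As (simulated)")

# Numeric counterpart of what the plot shows
group_var <- tapply(mydata$As, mydata$Taxa, var)
print(round(group_var, 3))
```

Very different box heights across groups are the visual counterpart of a significant Levene's test.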

NOTE:
One important thing to consider is the number of samples we are working with: fewer than 30 for each metal and taxon. The smaller the sample, the harder it is to get a normal distribution, and outliers can have a big impact.

DATA TRANSFORMATION AND SECOND ROUND CHECKING DISTRIBUTION

No metal passes all three tests we established in all taxa. Transforming the data may bring it closer to a normal distribution, so that’s the next step. Note, though, that Cd in Hydropsychidae showed strong skewness (2.6). A log transformation usually reduces right skew but does not always remove it, so Cd is a likely candidate for non-parametric tests.

After a logarithmic transformation these are the results for Shapiro-Wilk (SW) and Levene:

  ---------------------------------------
  Taxa             Metal   Shapiro.Wilk
  ---------------- ------- --------------
  Heptageniidae    As      *0*
  Heptageniidae    Se      0.912
  Heptageniidae    Cd      0.449
  Heptageniidae    Hg      0.18
  Hydropsychidae   As      0.243
  Hydropsychidae   Se      0.383
  Hydropsychidae   Cd      *0.002*
  Hydropsychidae   Hg      0.162
  Rhyacophilidae   As      0.959
  Rhyacophilidae   Se      0.315
  Rhyacophilidae   Cd      0.815
  Rhyacophilidae   Hg      0.365
  ---------------------------------------
  Table: * p < 0.01

  ---------------------
  Metal   Levene.Test
  ------- -------------
  As      0.609
  Se      0.597
  Cd      0.551
  Hg      0.919
  ---------------------
  Table: * p < 0.01

So now all cases pass Levene’s test, and only two cases fail SW: As (in Heptageniidae) and Cd (in Hydropsychidae). This means we can run parametric tests on the transformed data for Se and Hg, and non-parametric tests on the original data for As and Cd.
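
The transformation step itself is short. The sketch below log-transforms simulated metal columns and re-runs Shapiro-Wilk per taxon and metal. It assumes a natural log (the base is not stated above) and strictly positive concentrations, which holds for the minima shown in the summary.

```r
# Simulated stand-in for the real data: every value below is made up.
set.seed(42)
taxa <- c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae")
mydata <- data.frame(
  Taxa = factor(rep(taxa, times = c(27, 23, 20))),
  As = rlnorm(70, meanlog = 0),
  Se = rlnorm(70, meanlog = 1),
  Cd = rlnorm(70, meanlog = -1),
  Hg = rlnorm(70, meanlog = -2)
)
metals <- c("As", "Se", "Cd", "Hg")

# Natural-log transform of the metal columns only (values must be > 0)
mydata_log <- mydata
mydata_log[metals] <- log(mydata[metals])

# Shapiro-Wilk p-value for every taxon (rows) and metal (columns)
sw_p <- function(d) {
  sapply(metals, function(m) {
    tapply(d[[m]], d$Taxa, function(x) shapiro.test(x)$p.value)
  })
}
sw_raw <- sw_p(mydata)
sw_log <- sw_p(mydata_log)

print(round(sw_raw, 3))
print(round(sw_log, 3))
```

Comparing the two matrices cell by cell shows which taxon/metal combinations still fail after the transformation and therefore need a non-parametric test.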

SIGNIFICANT DIFFERENCES?

PARAMETRIC TESTS

First we run a global test, in this case an ANOVA, to see whether there are significant differences between groups. If the p-value is <0.05 there are differences, and we follow up with pairwise comparisons as a post-hoc analysis, in this case with Bonferroni correction.

Results:

  ---------------
  Metal   Anova
  ------- -------
  Se      *0*
  Hg      0.057
  ---------------
  Table: * p < 0.05

  --------------------------------------------------
  Comparisons                      Se        Hg
  -------------------------------- --------- -------
  Heptageniidae - Hydropsychidae   *0*       0.068
  Heptageniidae - Rhyacophilidae   *0.004*   0.28
  Hydropsychidae - Rhyacophilidae  *0.001*   1
  --------------------------------------------------
  Table: * p < 0.05
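
The parametric route can be sketched with base R's `aov()` and `pairwise.t.test()`. The data are simulated with taxon means that differ on purpose. Using pairwise t-tests with Bonferroni-adjusted p-values is an assumption about how the post-hoc step was done.

```r
# Simulated Se values with a different mean per taxon: every value is made up.
set.seed(42)
taxa <- c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae")
n <- c(27, 23, 20)
mydata <- data.frame(
  Taxa = factor(rep(taxa, times = n)),
  Se = rlnorm(70, meanlog = rep(c(0.5, 1.5, 1.0), times = n), sdlog = 0.5)
)
mydata$logSe <- log(mydata$Se)  # parametric tests run on the transformed data

# Global test: one-way ANOVA
fit <- aov(logSe ~ Taxa, data = mydata)
p_global <- summary(fit)[[1]][["Pr(>F)"]][1]
print(p_global)

# Post-hoc pairwise comparisons, only if the global test is significant
posthoc <- NULL
if (p_global < 0.05) {
  posthoc <- pairwise.t.test(mydata$logSe, mydata$Taxa,
                             p.adjust.method = "bonferroni")
  print(round(posthoc$p.value, 3))
}
```

`posthoc$p.value` is a lower-triangular matrix with one adjusted p-value per pair of taxa.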

NON-PARAMETRIC TESTS

As non-parametric tests we’ll use Kruskal-Wallis for the global comparison and Dunn’s test for the post-hoc pairwise comparisons.

Results:

  ------------------------
  Metal   Kruskal.Wallis
  ------- ----------------
  As      *0*
  Cd      *0*
  ------------------------
  Table: * p < 0.05

  --------------------------------------------------
  Comparisons                      As        Cd
  -------------------------------- --------- -------
  Heptageniidae - Hydropsychidae   *0*       *0*
  Heptageniidae - Rhyacophilidae   *0*       *0*
  Hydropsychidae - Rhyacophilidae  *0.001*   0.079
  --------------------------------------------------
  Table: * p < 0.05
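
Base R provides `kruskal.test()` but no Dunn's test, so the sketch below writes the post-hoc step out by hand. `dunn_bonferroni()` is a hypothetical helper, not the function used in the repository. It computes two-sided p-values with a tie-corrected rank variance and a Bonferroni adjustment, so its numbers may differ from a package implementation with other defaults. The data are simulated.

```r
# Simulated As values with a different level per taxon: every value is made up.
set.seed(42)
taxa <- c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae")
n <- c(27, 23, 20)
mydata <- data.frame(
  Taxa = factor(rep(taxa, times = n)),
  As = rlnorm(70, meanlog = rep(c(1.2, 0, -0.5), times = n), sdlog = 0.6)
)

# Global test
kw <- kruskal.test(As ~ Taxa, data = mydata)
print(kw$p.value)

# Hand-written Dunn's test with Bonferroni adjustment (illustrative helper)
dunn_bonferroni <- function(x, g) {
  g <- factor(g)
  r <- rank(x)                       # ranks over the pooled sample
  N <- length(x)
  ties <- as.numeric(table(x))       # sizes of tied groups
  # Rank variance term with tie correction
  s2 <- N * (N + 1) / 12 - sum(ties^3 - ties) / (12 * (N - 1))
  mean_rank <- tapply(r, g, mean)
  size <- tapply(r, g, length)
  pairs <- combn(levels(g), 2)       # one column per pair of groups
  z <- apply(pairs, 2, function(p) {
    (mean_rank[p[1]] - mean_rank[p[2]]) /
      sqrt(s2 * (1 / size[p[1]] + 1 / size[p[2]]))
  })
  p_raw <- unname(2 * pnorm(-abs(z)))  # two-sided p-values
  data.frame(
    Comparison = paste(pairs[1, ], pairs[2, ], sep = " - "),
    Z = as.numeric(z),
    P.adj = pmin(1, p_raw * ncol(pairs))
  )
}

res <- dunn_bonferroni(mydata$As, mydata$Taxa)
print(res)
```

Each row of `res` corresponds to one pair of taxa, in the same order as the comparison tables above.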

Now that we know which cases have significant differences, we can redraw the earlier boxplots with annotations for these cases.

(Figure: boxplots annotated with significant differences)

PCA

As a final step we can check the relationship between the metals and the taxa with a PCA.

(Figure: PCA plot)

So the most influential metals in taxa differentiation are As and Se; Cd has some power to differentiate Heptageniidae; Hg shows similar accumulation levels in the three taxa.
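
A PCA like this can be run with base R's `prcomp()`. The sketch below uses simulated data and standardizes the metals before the analysis, which is an assumption made because the four metals are on very different scales.

```r
# Simulated stand-in for the real data: every value below is made up.
set.seed(42)
taxa <- c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae")
n <- c(27, 23, 20)
mydata <- data.frame(
  Taxa = factor(rep(taxa, times = n)),
  As = rlnorm(70, meanlog = rep(c(1.2, 0, -0.5), times = n), sdlog = 0.6),
  Se = rlnorm(70, meanlog = rep(c(0.5, 1.5, 1.0), times = n), sdlog = 0.5),
  Cd = rlnorm(70, meanlog = rep(c(0, -1.5, -1.2), times = n), sdlog = 0.7),
  Hg = rlnorm(70, meanlog = -2, sdlog = 0.5)
)
metals <- c("As", "Se", "Cd", "Hg")

# PCA on centred and standardized metal concentrations
pca <- prcomp(mydata[metals], center = TRUE, scale. = TRUE)
print(summary(pca))   # variance explained by each component
print(pca$rotation)   # loadings: contribution of each metal to each component

# Scores on the first two components, coloured by taxon
plot(pca$x[, 1:2], col = as.integer(mydata$Taxa), pch = 19)
legend("topright", legend = levels(mydata$Taxa), col = 1:3, pch = 19)
```

Metals with large loadings on the components that separate the taxa are the ones driving the differentiation.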

THE END!!!