Project author: jmw86069

Project description:
Jam MA-plots, volcano plots, other relevant genomics visualizations
Language: R
Repository: git://github.com/jmw86069/jamma.git
Created: 2017-09-06T17:52:07Z
Project page: https://github.com/jmw86069/jamma



jamma

The goal of jamma is to create MA-plots with several useful
capabilities, such as smooth scatter density shading, intended to give
a more thorough understanding of the data.

The main function provided is jammaplot(). It differs from similar
MA-plot functions in that it uses a smooth scatter plot by default; in
fact, it inspired the creation of the custom smooth scatter function
jamba::plotSmoothScatter().
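An MA-plot shows, for each row (gene or probe), the difference between
two log-scaled measurements (M, the log ratio) against their average
(A). The base R sketch below computes M and A for two log2 columns, plus
one common multi-sample variant that compares each column with the row
mean. The helper names ma_values() and ma_vs_rowmean() are hypothetical,
and this is not necessarily how jammaplot() computes its values.

```r
# Hypothetical helper: classic two-sample MA values from log2-scale vectors.
ma_values <- function(log2_x, log2_y) {
  data.frame(
    A = (log2_x + log2_y) / 2,  # average log2 intensity (x-axis)
    M = log2_x - log2_y         # log2 ratio (y-axis)
  )
}

# Hypothetical multi-sample variant (an assumption, not jamma's code):
# compare each column with the row mean, and use the row mean as A.
ma_vs_rowmean <- function(log2_mat) {
  center <- rowMeans(log2_mat, na.rm = TRUE)
  lapply(seq_len(ncol(log2_mat)), function(j) {
    data.frame(A = center, M = log2_mat[, j] - center)
  })
}

ma <- ma_values(c(8, 10, 12), c(7, 10, 13))
print(ma)  # A: 7.5 10 12.5; M: 1 0 -1
```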

Package Reference

A full online function reference is available via the pkgdown
documentation:

Full jamma command reference

Example MA-plot

A reasonable example MA-plot can be created using data from the
affydata package, if installed.

    library(jamma);
    library(jamba);
    # Use the Affymetrix Dilution data only if affydata is installed
    if (suppressPackageStartupMessages(require(affydata))) {
      data(Dilution);
      # log2(1 + x) keeps zero intensities finite
      edata <- log2(1+Biobase::exprs(Dilution));
      jammaplot(edata);
    }
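If affydata is not installed, a simulated log2 matrix can stand in for
the Dilution data. This sketch uses made-up values (10,000 rows and 4
columns, both arbitrary choices). It assigns edata only when it does not
already exist, so the later examples, which check exists("edata"), can
also run. The jammaplot() call is skipped when jamma is not installed.

```r
set.seed(1)
if (!exists("edata")) {
  n_rows <- 10000
  # Made-up per-row baseline expression on a log2 scale
  baseline <- rnorm(n_rows, mean = 8, sd = 2)
  # Four samples: the baseline plus independent per-sample noise
  edata <- sapply(1:4, function(j) baseline + rnorm(n_rows, sd = 0.3))
  colnames(edata) <- paste0("sample", 1:4)
  rownames(edata) <- paste0("row", seq_len(n_rows))
}
if (requireNamespace("jamma", quietly = TRUE)) {
  jamma::jammaplot(edata)
}
```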

What is a smooth scatter plot, and why is it important for MA-plots?

MA-plots are typically created for gene expression data, historically
microarray data, which contain tens of thousands of rows. Most MA-plot
tools cope with that many points either by drawing single-pixel points
(pch="." in R base plotting) or by adding transparency.

A secondary issue is that these plots take a while to render when
drawing individual points. This effect is amplified when running on a
remote server, since each individual point is transmitted over the
network for rendering. Also, when saving a figure, certain file types
store each point as a separate object, making the file surprisingly
large. If the file is printed to paper (ha!), the printer can take a
long time to prepare the image. And the volume of data is not getting
any smaller with new technologies.
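The cost of drawing individual points is easy to measure. This base R
sketch writes the same kind of scatter plot to a vector PDF with
increasing numbers of points and compares the file sizes. The point
counts and plot size are arbitrary, and the exact sizes will vary with
the R version and device settings.

```r
# Write a scatter plot of n random points to a temporary PDF and
# return the file size in bytes.
pdf_size_for_points <- function(n, seed = 1) {
  set.seed(seed)
  x <- rnorm(n)
  y <- rnorm(n)
  path <- tempfile(fileext = ".pdf")
  grDevices::pdf(path, width = 5, height = 5)
  plot(x, y, pch = ".")  # each point becomes its own drawing operation
  grDevices::dev.off()
  size <- file.size(path)
  unlink(path)
  size
}

sizes <- sapply(c(1000, 10000, 50000), pdf_size_for_points)
print(sizes)
```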

First, we show the same MA-plot using single pixel points:

    if (exists("edata")) {
      # Replace the smooth scatter with single-pixel points
      jammaplot(edata[,2:3], ylim=c(-1.5,1.5), titleCexFactor=0.8,
        smoothScatterFunc=function(x, col="navy", ...){
          plot(x=x, pch=".", col="#000077", ...)
        },
        maintitle="plot(pch='.')");
    }

The overall range of points is clearly shown, but the density of points
is not clear from that plot. Adding alpha transparency helps somewhat:

    if (exists("edata")) {
      # Same, with a mostly transparent color ("11" is the alpha channel)
      jammaplot(edata[,2:3], ylim=c(-1.5,1.5), titleCexFactor=0.8,
        smoothScatterFunc=function(x, col="navy", ...){
          plot(x=x, pch=".", col="#00007711", ...)
        },
        maintitle="plot(pch='.', alpha=0.07)");
    }
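The "#00007711" color packs alpha into its last two hex digits: 0x11 is
17, and 17/255 is about 0.07, which is where the plot title comes from.
Stacking n such points at one spot gives an opacity of
1 - (1 - alpha)^n, which is why dense regions become visible while a
lone outlier stays nearly transparent. A base R sketch of that
arithmetic (the helper name stacked_opacity() is hypothetical):

```r
# Decode the alpha channel of the color used above
rgba <- grDevices::col2rgb("#00007711", alpha = TRUE)
alpha <- rgba["alpha", 1] / 255
print(round(alpha, 2))  # 0.07

# Opacity after n points are drawn at the same location:
# each layer lets through (1 - alpha) of what is beneath it.
stacked_opacity <- function(alpha, n) 1 - (1 - alpha)^n
print(round(stacked_opacity(alpha, c(1, 10, 50)), 2))  # 0.07 0.50 0.97
```

A single outlier is drawn at about 7% opacity, while roughly ten
overlapping points are needed to reach 50%.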

The transparency helps visualize the massive number of points in the
middle, but it also makes all the fun outlier points almost invisible.
The typical next step in R is to use smoothScatter(), shown below using
its default color ramp:

    if (exists("edata")) {
      # Use base R smoothScatter() with the "Blues" color ramp
      jammaplot(edata[,2:3],
        xlim=c(6, 14),
        ylim=c(-1.5,1.5),
        titleCexFactor=0.8,
        smoothScatterFunc=function(colramp, ...){
          smoothScatter(..., colramp=jamba::getColorRamp(colramp, n=NULL))
        },
        colramp="Blues",
        maintitle="smoothScatter()");
    }

Again, the visualization is improved, but the default "Blues" color ramp
(credit: Brewer colors from RColorBrewer) could perhaps be improved. The
next plot omits colramp, so jammaplot() uses its own default color ramp:

    if (exists("edata")) {
      # Same smoothScatter(), using the jammaplot() default colramp
      jammaplot(edata[,2:3],
        xlim=c(6, 14),
        ylim=c(-1.5,1.5),
        titleCexFactor=0.8,
        smoothScatterFunc=function(colramp, ...){
          smoothScatter(..., colramp=jamba::getColorRamp(colramp, n=NULL))
        },
        maintitle="smoothScatter()");
    }
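In base R, the colramp argument of smoothScatter() is a function that
takes a number of colors n and returns that many colors; its documented
default is colorRampPalette(c("white", blues9)), where grDevices::blues9
holds nine shades of blue. The examples above appear to build such a
function with jamba::getColorRamp(). A base R sketch of equivalent ramp
functions, where the alternative ramp colors are an arbitrary choice:

```r
# smoothScatter()'s documented default colramp, rebuilt explicitly
default_ramp <- grDevices::colorRampPalette(c("white", grDevices::blues9))

# Alternative ramps, chosen arbitrarily for illustration
navy_orange_ramp <- grDevices::colorRampPalette(
  c("white", "lightblue", "navy", "orange"))
viridis_ramp <- function(n) grDevices::hcl.colors(n, palette = "viridis")

print(default_ramp(3))

# smoothScatter() needs the KernSmooth package; draw only if available.
if (requireNamespace("KernSmooth", quietly = TRUE)) {
  set.seed(2)
  a <- rnorm(5000, mean = 10, sd = 2)  # made-up A values
  m <- rnorm(5000, sd = 0.3)           # made-up M values
  graphics::smoothScatter(a, m, colramp = navy_orange_ramp,
                          xlab = "A", ylab = "M")
}
```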

Now the figure depicts the full range of data, while also conveying the
truly massive number of points in the central region. Only two smaller
issues remain.

First, though not visible here, the underlying density is drawn as many
tiny rectangles. For the reasons described above, a large number of
rectangles can be problematic when saving as a vector image (PDF, SVG),
when printing on paper, or when rendering the figure across a remote
network connection. The solution is to draw a single rasterized image,
which can be compressed and resized, instead of individual rectangles.
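smoothScatter() draws its density map with image(), which by default
draws one rectangle per grid cell; image() also has a useRaster argument
that draws the same grid as a single raster image. This base R sketch
compares PDF sizes for a binned density map drawn both ways. The simple
count binning and grid size are assumptions for illustration, and the
^0.25 transform mirrors smoothScatter()'s default transformation; none
of this is what jamba::plotSmoothScatter() does internally.

```r
set.seed(3)
x <- rnorm(20000)
y <- rnorm(20000)

# Bin points into an nbin x nbin matrix of counts (a crude density map)
bin_counts <- function(x, y, nbin = 150) {
  xb <- cut(x, breaks = nbin)
  yb <- cut(y, breaks = nbin)
  unclass(table(xb, yb))
}
z <- bin_counts(x, y)

# Draw the map with image(), as rectangles or as one raster image,
# and return the PDF file size in bytes.
pdf_size_for_image <- function(z, use_raster) {
  path <- tempfile(fileext = ".pdf")
  grDevices::pdf(path, width = 5, height = 5)
  graphics::image(seq_len(nrow(z)), seq_len(ncol(z)), z^0.25,
                  col = grDevices::colorRampPalette(c("white", "navy"))(64),
                  useRaster = use_raster)
  grDevices::dev.off()
  size <- file.size(path)
  unlink(path)
  size
}

sizes <- c(rectangles = pdf_size_for_image(z, FALSE),
           raster = pdf_size_for_image(z, TRUE))
print(sizes)
```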

Second, the pixels used for the point density are flattened
horizontally, because the default density function is based on the
range of the data rather than the visible plot range. When the density
is then drawn in plot coordinates, there is often some distortion. The
effect is visually small, but when there are 20 panels onscreen, the
inconsistency becomes much more obvious.
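One way to see the distortion is to convert the smoothing bandwidth,
which smoothScatter() chooses in data units, into inches on the plot
panel. This sketch assumes the default bandwidth is each axis's 5% to
95% quantile range divided by 25, which is our reading of the grDevices
source rather than a documented formula. The helper names and data are
hypothetical; an aspect ratio other than 1 means the smoothing around
each point is not circular on screen.

```r
# Assumed smoothScatter()-style default bandwidth, in data units
default_bandwidth <- function(x, y) {
  q_range <- function(v) diff(stats::quantile(v, c(0.05, 0.95), names = FALSE))
  c(x = q_range(x), y = q_range(y)) / 25
}

# Size of a data-unit bandwidth in inches, given the visible axis limits
# and the panel size; aspect > 1 means wider than tall.
kernel_shape_in_plot <- function(bw, xlim, ylim, panel_in = c(4, 4)) {
  inches <- c(x = bw[[1]] / diff(xlim) * panel_in[1],
              y = bw[[2]] / diff(ylim) * panel_in[2])
  list(inches = inches, aspect = inches[["x"]] / inches[["y"]])
}

# Made-up data whose y range (-5 to 5) exceeds the visible ylim
bw <- default_bandwidth(0:100, seq(-5, 5, length.out = 101))
shape <- kernel_shape_in_plot(bw, xlim = c(0, 100), ylim = c(-1.5, 1.5))
print(bw)            # x 3.6, y 0.36
print(shape$aspect)  # 0.3
```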

The jamba::plotSmoothScatter() function resolves both issues and adds
some enhancements. It uses a density function based on plot space, and
it adds detail so that smaller features are less blurry.

    if (exists("edata")) {
      # Default smoothScatterFunc, i.e. plotSmoothScatter()
      jammaplot(edata[,2:3],
        ylim=c(-1.5,1.5),
        titleCexFactor=0.8,
        maintitle="plotSmoothScatter()");
    }

It looks like a small effect here, but the density around single points
is now circular. When rendering a density map of plotted data points, it
should represent the true density of points as accurately as possible.
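A density that is circular in plot space can be sketched by choosing a
single bandwidth in inches and converting it to data units for each
axis. The helper below is hypothetical and only illustrates the idea;
it is not the code of jamba::plotSmoothScatter(). In a live plot, the
axis limits and panel size could be read from par("usr") and
par("pin"), and a smaller bw_inches would give sharper detail.

```r
# Hypothetical helper: one bandwidth in inches, converted to data units
# per axis so the kernel has the same on-screen size in x and y.
plot_space_bandwidth <- function(bw_inches, xlim, ylim, panel_in = c(4, 4)) {
  c(x = bw_inches * diff(xlim) / panel_in[1],
    y = bw_inches * diff(ylim) / panel_in[2])
}

xlim <- c(6, 14)
ylim <- c(-1.5, 1.5)
bw <- plot_space_bandwidth(0.05, xlim, ylim)
print(bw)  # x 0.1, y 0.0375

# Converting back to inches gives the same size on both axes
back_to_inches <- bw / c(diff(xlim), diff(ylim)) * c(4, 4)
print(back_to_inches)  # x 0.05, y 0.05
```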

To demonstrate another color effect, plotSmoothScatter() also fills the
complete plot panel with the correct background color, which
smoothScatter() does not do. The difference is easiest to see with a
dark color ramp such as "viridis":

    if (exists("edata")) {
      # 2x2 layout: each jammaplot() call below draws two panels;
      # doPar=FALSE is passed, presumably so jammaplot() keeps this layout
      opar <- par("mfrow"=c(2,2));
      jammaplot(edata[,2:3],
        xlim=c(6, 14),
        ylim=c(-2,2),
        titleCexFactor=0.8,
        colramp="viridis",
        doPar=FALSE,
        smoothScatterFunc=function(colramp, ...){
          smoothScatter(..., colramp=jamba::getColorRamp(colramp, n=NULL))
        },
        maintitle="smoothScatter(colramp='viridis')");
      jammaplot(edata[,2:3], ylim=c(-2,2), titleCexFactor=0.8,
        colramp="viridis", doPar=FALSE,
        maintitle="plotSmoothScatter(colramp='viridis')");
      # Restore the previous layout
      par(opar);
    }
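The background effect can be sketched in base R: fill the whole plot
region, as reported by par("usr"), with the lowest color of the ramp
before drawing a density image that covers only part of the panel.
Without that fill, the area outside the density grid keeps the device
background (usually white), which stands out when the lowest ramp color
is dark. The helper name and the density values are made up; this is
not the implementation of jamba::plotSmoothScatter().

```r
# Hypothetical helper: fill the plot region with the lowest ramp color,
# then draw a density grid (gx, gy, z) on top of it.
draw_density_with_background <- function(gx, gy, z, xlim, ylim, colramp,
                                         n_colors = 256) {
  cols <- colramp(n_colors)
  graphics::plot.new()
  graphics::plot.window(xlim = xlim, ylim = ylim)
  usr <- graphics::par("usr")  # plot region limits, including axis padding
  graphics::rect(usr[1], usr[3], usr[2], usr[4], col = cols[1], border = NA)
  graphics::image(gx, gy, z, col = cols, add = TRUE)
  graphics::axis(1)
  graphics::axis(2)
  graphics::box()
  invisible(cols[1])
}

viridis_ramp <- function(n) grDevices::hcl.colors(n, palette = "viridis")

# Made-up density grid covering only part of the plot window
gx <- seq(8, 12, length.out = 50)
gy <- seq(-1, 1, length.out = 50)
z <- outer(gx, gy, function(a, m) exp(-((a - 10)^2 + (m / 0.4)^2)))

bg <- draw_density_with_background(gx, gy, z, xlim = c(6, 14),
                                   ylim = c(-2, 2), colramp = viridis_ramp)
```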