Bio::Phylo


Examples

Input data

Integrating the data sets

As a first step, we do an inner join of the two data sets, so that the merged result includes only the taxa that are present in both the tree and the database. From the database, we embed one data column, which we log-transform (because the default is body mass) and map to color values along the color spectrum (i.e. the lowest observed value is red, the highest violet). In addition, we annotate the tree such that monophyletic genera receive a clade label on their MRCA (most recent common ancestor) node. We do all this by executing this script, like so:

perl binindaXpantheria.pl \
	--tree=Bininda-emonds_2007_mammals.nex \
	--data=PanTHERIA_1-0_WR93_Aug2008.tsv \
	--names=MSW93_Binomial \
	--column='5-1_AdultBodyMass_g' \
> tree.xml

All the arguments shown here are the default values that are also embedded in the script. The --names argument specifies which column in the PanTHERIA database contains the taxon names that should match those in the tree. The --column argument specifies which trait to plot on the tree. Hence, you can experiment with other traits besides the example given here (which is body mass), but keep in mind that the script log-transforms the input values, which might make sense for body mass, but not necessarily for the other traits in the database (let me know if there needs to be a switch to turn the transformation on and off).
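
To make the join and the color mapping more concrete, here is a minimal Python sketch of the idea. It is not the code of binindaXpantheria.pl (which is written in Perl); the function name, the log_transform switch, the toy taxon names and the choice of HSV hue 0.75 for "violet" are all assumptions made for illustration.

```python
import colorsys
import math


def join_and_color(tree_taxa, trait_by_name, log_transform=True):
    """Inner-join tip names with trait values and map the values to hex colors.

    tree_taxa:     iterable of tip names from the tree
    trait_by_name: dict of taxon name -> numeric trait value from the database
    log_transform: hypothetical switch; the real script always log-transforms
    """
    values = {}
    for name in tree_taxa:
        # Inner join: skip tips that have no matching row in the database.
        if name not in trait_by_name:
            continue
        raw = trait_by_name[name]
        if log_transform:
            # Non-positive values cannot be log-transformed, so drop them.
            if raw <= 0:
                continue
            values[name] = math.log(raw)
        else:
            values[name] = float(raw)

    if not values:
        return {}

    lowest, highest = min(values.values()), max(values.values())
    span = highest - lowest

    colors = {}
    for name, value in values.items():
        # Rescale to 0..1, then to a hue between red (0.0) and violet
        # (0.75 on the HSV color wheel, an assumed end point).
        fraction = (value - lowest) / span if span else 0.0
        r, g, b = colorsys.hsv_to_rgb(0.75 * fraction, 1.0, 1.0)
        colors[name] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


if __name__ == "__main__":
    # Made-up names and values, purely for demonstration.
    tips = ["Aus_a", "Bus_b", "Cus_c", "Only_in_tree"]
    traits = {"Aus_a": 10.0, "Bus_b": 100.0, "Cus_c": 1000.0, "Only_in_db": 5.0}
    for taxon, color in join_and_color(tips, traits).items():
        print(taxon, color)
```

Passing log_transform=False shows what the kind of on/off switch mentioned above could look like: the same rescaling is applied to the raw values instead of their logarithms.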

There is also an optional --verbose argument that can be used multiple times to increase the verbosity of the script. By default, only warnings and error messages are printed; increasing the verbosity additionally prints informational messages and debugging messages. (It might be reassuring to do this because some of the steps take some time and this way you get some progress feedback.) The result of the script is normally written to STDOUT, so here we redirect it to a file, which is in NeXML format.
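
The clade labelling of monophyletic genera is the least obvious part of this step, so here is a rough Python sketch of one way such a test can work. It is an illustration only and not taken from the Perl script: it assumes a toy tree made of nested tuples, tip names of the form Genus_species, and it only reports genera with at least two tips.

```python
from collections import Counter


def tips(node):
    """Return the tip names under a node (a tip is a str, an internal node a tuple)."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in tips(child)]


def genus_of(name):
    # Assumes names look like "Genus_species".
    return name.split("_")[0]


def monophyletic_genera(tree):
    """Map each monophyletic genus to the subtree rooted at its MRCA."""
    totals = Counter(genus_of(name) for name in tips(tree))
    labels = {}

    def visit(node):
        if isinstance(node, str):
            return
        under = tips(node)
        genera = {genus_of(name) for name in under}
        if len(genera) == 1:
            genus = next(iter(genera))
            # The clade is pure; it is the genus MRCA only if it also
            # holds every tip of that genus found anywhere in the tree.
            if len(under) == totals[genus]:
                labels[genus] = node
            return
        for child in node:
            visit(child)

    visit(tree)
    return labels


if __name__ == "__main__":
    # Toy tree: Aus and Cus form clades, Bus is split across the root.
    toy = ((("Aus_a", "Aus_b"), "Bus_a"), ("Bus_b", ("Cus_a", "Cus_b")))
    print(sorted(monophyletic_genera(toy)))
```

The sketch recomputes the tip list at every node, which is fine for a toy tree but would be worth optimizing for a tree with thousands of tips.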

Visualizing the result

In the next step, we visualize the results as a radial phylogram with painted branches and braces to mark up the monophyletic genera. The drawer script is invoked as follows:

perl drawer.pl \
	--width=12000 \
	--height=12000 \
	--shape=radial \
	--nexml=tree.xml \
> tree.svg

Again, all the arguments shown here are the default values that are embedded in the script. The --width and --height values are in pixels. --shape specifies the tree shape, and --nexml the location of the file that we produced in the previous step. The output is written to STDOUT, so we redirect it into the file tree.svg. In this SVG file, the taxon names have been made clickable, triggering a query to the Encyclopedia of Life. Because there is some potential for compatibility issues with SVG (not all browsers and editors interpret and support the standard to the same extent) I also made a PDF version (by opening the SVG in Illustrator and saving to PDF).
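
For readers curious how a clickable taxon name can be expressed in SVG, the following Python sketch wraps a text label in a link element. It is not how drawer.pl is implemented; the function name, the coordinates, the underscore-to-space conversion and in particular the search URL are assumptions chosen for illustration.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr


def clickable_label(name, x, y, search_url="https://eol.org/search?q="):
    """Return an SVG fragment in which the taxon name links to a search query.

    search_url is an assumed query prefix, not one taken from the script.
    The enclosing <svg> element must declare the xlink namespace.
    """
    display = name.replace("_", " ")
    href = search_url + quote(display)  # percent-encode spaces and the like
    return '<a xlink:href={}><text x="{}" y="{}">{}</text></a>'.format(
        quoteattr(href), x, y, escape(display)
    )


if __name__ == "__main__":
    print(clickable_label("Homo_sapiens", 10, 20))
```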

Dependencies

The scripts are written in Perl, and require a number of packages that are freely available from the Comprehensive Perl Archive Network (CPAN). If you know what you are doing and you have a correctly configured system, the installation is as simple as issuing the command sudo cpanm Package::Name, where Package::Name is one of the packages below. (I’m afraid I can’t provide support for setting up your environment and installing dependencies. These are standard operations for which there is ample documentation online.) Required packages: