The phylogeny of insects has been both extensively studied and vigorously debated for over a century. A relatively accurate deep phylogeny had been produced by 1904. It was not substantially improved in topology until recently, when phylogenomics settled many long-standing controversies. Intervening advances came instead through methodological improvement. Early molecular phylogenetic studies (1985-2005), dominated by a few genes, provided datasets that were too small to resolve controversial phylogenetic problems. Adding to the lack of consensus, this period was characterized by a polarization of philosophies, with individuals belonging to either parsimony or maximum-likelihood camps, each largely ignoring the insights of the other. The result was an unfortunate detour in which the few perceived phylogenetic revolutions published by both sides of the philosophical divide were probably errors. The size of datasets has been growing exponentially since the mid-1980s, accompanied by a wave of confidence that all relationships will soon be known. However, large datasets create new challenges, and a large number of genes does not guarantee reliable results. If history is a guide, the quality of conclusions will be determined by an improved understanding of both molecular and morphological evolution, and not simply the number of genes analyzed.
Insects, morphology, parsimony, cladistics, likelihood, phylogenomics
Kjer and Beutel wrote about their respective areas of expertise. Simon provided comments, fact-checking, and additional historical and methodological insights. Yavorskaya interpreted and summarized Russian works.
We like to think of scientific research as insulated from human bias and personality. Like other fields of science, phylogenetics follows trends as ideas are rejected or accepted, influenced by new information. However, collective consensus comes not just from a series of technological advances and discoveries, but from human interactions as well. New ideas are often rejected for years, even if they are supported by strong evidence. These are exciting times for evolutionary biologists as new technologies give us hope that the resolution of the tree of life is within sight. But times have been exciting for decades and this optimistic sentiment has arisen with every new technology. It was only 25 years ago that phylogenetic trees generated with a few hundred nucleotides were considered revolutionary, just as the application of cladistic1 principles with a defined methodology was revolutionary a decade before that. With the large datasets we have today, some previously intractable questions now appear solved. The authors of this work have witnessed many of these changes, and we present our insights on this history, recognizing that others may remember things differently. We focus this review on the relationships among insect orders, missing many fine works on arthropod phylogeny, and intra-ordinal studies. We attempt to maintain a rough chronological order, considering three main periods: Morphological phylogenetics, when morphology was the only source of data (roughly before 1990); the Sanger2 sequencing period, where a few genes dominated most studies (roughly 1990-2005); and the current state of the art with datasets so large that traditional ways of analysing them are no longer feasible. New challenges will doubtless arise in the age of big data, but at least we can look back at previous trends with hindsight in order to learn from history.
The numerous names of orders and other higher-level taxa1 for a group as diverse as insects pose a significant challenge to the non-entomologist reader. Common names like “angel insect” or “gladiators” are often as obscure as the scientific ones in Latin or Greek. For this review, we direct the reader to the figures for the common names of the orders and to Table 1 for a translation of super-ordinal names. We focus especially on four controversial deep-branching taxa: Entognatha, Palaeoptera, Polyneoptera, and Holometabola. The controversy arises from persistent conflicting evidence that suggests contradictory groups. The entognathous hexapods with internalized mouthparts include mostly tiny, wingless, litter-dwelling species that appear very early in the fossil record. The palaeopteran insects comprise mayflies, dragonflies and damselflies, characterized by wings that cannot be folded. Polyneoptera is the name given to a diverse group of insects, such as grasshoppers and close relatives, walking sticks, roaches, mantids, earwigs, stoneflies and some other groups, usually but not always characterized by leathery forewings. The holometabolous orders exhibit complete metamorphosis, where the larva undergoes an amazing reorganization of the body during the pupal stage before it changes into the winged adult form. A confusing convention for the non-entomologist is the inconsistent use of the names Hexapoda and Insecta. Hexapoda (insects in the widest sense) comprises all six-legged arthropods, including the three entognathous orders, whereas Insecta excludes the entognathous orders (Table 1). Phylogenetic terms that many readers might not be familiar with are defined in numbered boxes, which are referenced by superscripts at the first several usages of the term, referring to the box containing the definition.
Pre-Hennigian concepts in insect taxonomy and phylogeny
The roots of insect systematics go back to the 16th, 17th and 18th centuries. Important pioneers of entomology were the Italian naturalist Ulisse Aldrovandi (1522-1605), the Dutch doctor and microscopist Jan Swammerdam (1637-1680), and the German naturalist August Johann Rösel von Rosenhof (1705-1759) (1) (2). In the middle of the 18th century, the Swedish botanist Carolus Linnaeus (1707-1778) described more than 10,000 species in his “Systema Naturae” (3), including over 2,000 insects. His ordinal names refer to the characteristics of the wings, e.g., Heteroptera (heterogeneous forewing), Hymenoptera (membranous forewing), and Coleoptera (sheath-like forewing). Although his views evolved, Linnaeus was an essentialist in his early works, embracing the – at that time – commonly held belief that organisms were given an “essence” by the Creator, which could be slightly modified but never fundamentally changed. Linnaeus’ system remains useful to this day because it was based on characters that, unknown to him, are heritable and hierarchically organized through evolution. The Danish entomologist Johann Christian Fabricius (1745-1808) described 9776 insect species. Unlike his mentor Linnaeus, he emphasized the importance of mouthparts and the potential usefulness of genitalia (4). Another prominent entomologist of the era was Pierre André Latreille (1762-1833). In his major work (5) he outlined insect families for the first time and used a broad spectrum of characters for classification based on phylogeny (2). Together with explicit criteria for homology, this was an important step towards an evolutionary concept of classification.
The evolutionary theory developed by Charles Darwin and Alfred Russel Wallace (6)(7) laid a new foundation for classifying organisms, but had limited immediate impact on insect systematics (1). Ernst Haeckel (1834–1919), an energetic promoter of Darwin’s ideas in Germany, dealt with insects among many other groups. His classification included five “legions” based on how insects feed (8); today we see that it only partly reflected phylogenetic relationships. However, Haeckel presented the first explicit phylogenetic tree of insects (Haeckel 1896 (8): p. 710). In 1904 (9), a remarkable study covering the entire Hexapoda was published by Carl Börner (1880-1953). Börner was a specialist on grape phylloxera (Daktulosphaira vitifoliae), an almost microscopic aphid-like insect that is a major pest of grapes. He was also a collector of springtails (Collembola), small hexapods that are common in leaf litter. As a young scientific assistant, he discussed cephalic structures in great detail. He focused on the hypopharynx, a central element of insect mouthparts and one of the most difficult character systems to explore. Even though his approach lacked a repeatable methodology, his phylogenetic tree (Fig. 1) comes close to concepts developed decades later. Naturally, since our current cladistic1 concept of reserving names for monophyletic1 groups (10)(11) was not developed until the 1950s and 60s, Börner’s classification is partly inconsistent with the branching pattern shown in the tree. For instance, he placed the phenotypically similar Archaeognatha and Zygentoma in the Order Thysanura (Fig. 1; see Table 1 here, and throughout, for a definition of taxon1 names).
Monophyletic group: A group of organisms that is defined by a most recent common ancestor, and all of its descendants. Also known as a clade.
Cladistics: An approach in systematics that bases all classification on “clades” (i.e., monophyletic groups). Cladistics was developed by Hennig and insists that all named groups (taxa) be monophyletic, as evidenced by shared derived characters (“synapomorphies”). After Hennig’s death, a group of cladists started using the term to refer to a set of numerical analytical procedures which aim to reconstruct a phylogeny based on character state matrices and parsimony2. “Cladistics” can mean two very different things.
Taxon (pl. taxa): A named group of organisms. Under cladistic principles, all taxa are monophyletic.
Sister group: The most closely related taxon to a group of interest.
Ingroup: A taxon under investigation. For this review, the ingroup is Hexapoda.
Outgroup: A taxon outside the group under study. For this review, the outgroups could be any non-hexapod, but the best would be other arthropods.
Character polarization: Determination of the evolutionary direction of a character, which determines whether a character state is ancestral (plesiomorphic) or newly derived (apomorphic).
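The definitions of monophyletic group and sister group can be made concrete with a small sketch. It is only an illustration: the nested-tuple tree encoding, the function names, and the toy three-taxon topology are our assumptions for the example and are not taken from any analysis discussed in this review. A named group counts as monophyletic only if its members are exactly the tips below some node of the tree.

```python
# Minimal sketch: test whether a named group is monophyletic on a rooted tree.
# The tree is a nested tuple; a tip is a plain string. The topology below is
# an illustrative assumption, not a result reported in this review.

TOY_TREE = ("Archaeognatha", ("Zygentoma", "Pterygota"))


def leaf_set(node):
    """Return the set of tip names found below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades(node):
    """Yield the tip set of every node in the tree, tips included."""
    yield leaf_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if the group is exactly the set of tips below some node."""
    return frozenset(group) in set(clades(tree))


# A group containing an ancestor and all of its descendants is a clade.
print(is_monophyletic(TOY_TREE, {"Zygentoma", "Pterygota"}))
# A group that leaves out some descendants of its common ancestor is not.
print(is_monophyletic(TOY_TREE, {"Archaeognatha", "Zygentoma"}))
```

On this assumed topology, a grouping of Archaeognatha with Zygentoma fails the test because their most recent common ancestor is also the ancestor of Pterygota, which is what makes such a group paraphyletic.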
A highly productive North American entomologist of the early 20th century was G.C. Crampton (12)(13), whose phylogenetic tree from 1938 (14) was another hypothesis that came remarkably close to modern concepts (see Fig. 1. in Engel & Kristensen (2)). Important works were published by Imms (15), Snodgrass (16), Weber (17)(18), and also Handlirsch, who was frequently cited in Hennig’s later work (11) (see Friedrich et al. (19)). Handlirsch attempted a classification reflecting phylogeny, but believed that a purely phylogenetic system was not possible (11). Even in studies published posthumously in 1937 (20) and 1939 (21) Handlirsch considered the extinct winged Palaeodictyoptera as the ancestors not only of Pterygota but of all other insects including the wingless (apterygote) orders (11).
Willi Hennig (1913-1976) revolutionized systematics and classification (22) in the last century with his theoretical work, offering clear and repeatable methodology. Works published prior to Hennig are often referred to as “intuitive”. This is perhaps unfairly pejorative when you consider that their remarkably accurate phylogenetic insights were often based on expertise gained through meticulous observation, rather than intuitive hunches. However, before Hennig’s methods were widely adopted, systematists would postulate relationships based on shared characters that they deemed particularly important. In this respect, phylogenies could be considered as imparted wisdom, rather than science. Hennig’s method involved distinguishing ancestral (plesiomorphic) and derived (apomorphic) features. He also developed a more precise concept of monophyly1, under which no descendants of the most recent common ancestor could be excluded from a named group (clade). Hennig reconstructed phylogenies with an iterative, stepwise approach. Using putative shared-derived character states (synapomorphies), he successively established sistergroup1 relationships. Distinguishing ancestral from derived character states required the definition of a taxon1 outside the group of interest for comparison (an outgroup1). The outgroup method was introduced as a formal procedure in the early 1980s (23)(24) even though it had already implicitly been used by Hennig (2). Hennig’s phylogeny (11), published in 1969, is widely considered to be the starting point of modern insect phylogenetics (Fig. 2). Despite his precise methodology, his hypotheses were quite similar to earlier trees. They changed with time, as can be seen by comparing the phylogenetic concept presented in Hennig’s 1969 phylogeny (Fig. 2)(11) with his earlier work (10).
Hennig’s “Phylogenetische Systematik” (22) was not a completely new concept when it was published in 1950. The botanist Zimmermann developed similar ideas in the 1930s, and Sturtevant used a very similar approach in his taxonomic studies of fruit flies (Drosophilidae; e.g., (25)). Moreover, it is apparent that ideas similar to Hennig’s were implicitly used before his methods were formalized. It is impossible to consider Börner’s phylogeny without recognizing an approach that went beyond intuition. Aside from the primacy of synapomorphies, a major point of Hennig’s concept was that classification should be strictly linked to phylogeny. The requirement that named taxa be monophyletic1 originated with Hennig, but the unique value of synapomorphies was loosely recognized by systematists earlier. Herbert Ross (1908-1978)(26), for instance, was polarizing1 characters relative to a hypothetical ancestor in 1937, and he indicated derived states with marks on the internodes2 of his insect phylogeny in 1955 (27). The phylogeny in his 1965 textbook (28) is almost as close to current concepts as morphology has ever gotten. However, as advocated in general by Ernst Mayr (29) (see also Nelson’s reply (30)), Ross gave names to paraphyletic groups (groups that do not include all descendants of the deepest ancestor). If your concept of “dinosaur” does not include birds, then you accept paraphyletic taxa too. Systematists today consider, for example, birds to be a subgroup of Sauropsida, a clade that also contains dinosaurs and extant reptiles such as turtles, lizards and crocodiles. Mayr and followers understood that birds had been derived from a paraphyletic assemblage of reptiles, but still found “Reptilia” to be a useful term representing a different evolutionary level, just as we sometimes use “apterygotes” as a name for the ancestrally wingless hexapods, even though we understand that they are not a monophyletic group.
Generally, when systematists put a name in quotes, it is to indicate that they understand it to be a paraphyletic group, and are waiting for the term to fade into disuse.
A remarkable study was published by the Argentinian entomologist Alvaro Wille (31) in 1960. Although he distinguished “primitive” from “specialized” or “unusual” features, he also characterized groups by a mixture of plesiomorphies and apomorphies. The major clades on his tree, however, were characterized by evolutionary innovations (Fig. 3). Another important work of the time was Hinton’s 1958 review (32). Hinton made some bold statements that appear untenable today, such as “the polyphyletic nature of the old groups Myriapoda and Hexapoda”, but his evaluation of morphological characters, taken largely from the head, including a detailed scrutiny of larval muscles, helped elucidate the evolution of Holometabola.
Gerhard Mickoleit, who graduated under the insect morphologist Hermann Weber at the University of Tübingen and attended seminars given by Hennig in the early 1970s, investigated several groups of insects, with a main focus on genital structures, especially the ovipositor. This included thrips (33), Neuropterida (lacewings and close relatives), beetles (34), and fleas, flies and scorpionflies (35)(36)(37)(38). In 1973 (34), he provided specific evidence for the first time for a close relationship between neuropteroids and beetles.
Geographic isolation and “parallel universes”
The importance of the contributions made by Russian entomologists and insect paleontologists is reflected by numerous citations by Hennig (11). Formal names for important higher ranking taxa such as Palaeoptera, Neoptera, and Polyneoptera were introduced by Russian scientists (39)(40). Moreover, Russian paleontologists, notably A. V. Martynov, B. B. Rohdendorf, V. V. Zherikin and A. G. Ponomarenko (41)(42) (reviewed in 2002 (43) and 2009 (1)) made immense contributions to the knowledge of fossil insects.
Through the entire 20th century, most Russian entomologists maintained a conservative approach, with traditional descriptions based on morphology, without formal cladistic1 phylogenetic character evaluations. International collaboration was partly impeded by linguistic barriers, but also by the isolation of the Soviet Union and the pseudoscientific Lysenkoism, an anti-genetic view that was politically favoured (44). The limited exchange and cooperation is still reflected by the strikingly different nomenclature for high ranking taxa, such as Scarabaeona for Pterygota, Scarabaeiformes for Holometabola, and Scarabaeoidea for Coleoptera (4)(43).
A prominent and highly efficient Russian entomologist of the 19th century was Victor I. Motschulsky, who published numerous works on biogeographic, faunistic, or systematic aspects of entomology, most of them on beetles (45). Georgij G. Jacobson had a crucial impact on the development of Russian systematics in the early 20th century. He is best known as the author of the magisterial 1905 “Beetles of Russia, Western Europe and neighboring countries” (46), including an impressive catalogue with keys for the identification of all known Eurasian genera. More recently, the palaeoentomologist Alexandr P. Rasnitsyn described approximately 250 new genera and over 800 new species of fossil insects. He suggested a sistergroup1 relationship between Hymenoptera (sawflies, bees, wasps, and ants) and the remaining Holometabola (43) before this was established with formal analyses of extensive morphological or molecular data sets (47)(48)(49)(50) (see the reanalysis by Ronquist and others (51)). Rasnitsyn (43) suggested that insect flight originated from gliding (52). Phylistics, his alternative approach to cladistics, as discussed by Brothers (53), explicitly accepts paraphyletic groups (e.g., †Caloneurida (44)).
Before the internet, geography and language also played a role in isolating phylogenetic communities from Europe, America, and East Asia. For example, a profound treatment of insect morphology was presented by René Jeannel in 1949 (54), but has rarely been used outside of the French community. Hennig’s work was unknown to most Americans until it was translated into English in 1966 (55). Although systematists were aware of work in other countries, the Meetings on Insect Phylogeny in Dresden played a major role in fostering collaborations between workers from different parts of the world, although they have not seen significant Russian or Latin American participation. These meetings were organized for the first time in 2003 by Klaus-Dieter Klass and Niels Peder Kristensen (56). Most members of the “1000 insect transcriptome evolution” (www.1KITE.org) initiative first became acquainted at these meetings. The 1KITE team created our current best estimate of insect ordinal phylogeny (Fig. 5) with the largest dataset assembled to date. Europe was an ideal meeting place because the particular brand of cladistic fervor in America that was characterized by name-calling and personal insults was less pronounced there. The Europeans absorbed new ideas quickly, and went about their business in developing new centers of insect phylogenetics based on emerging techniques in morphology, and the refinement of model-based molecular phylogenetics (reviewed in (57)).
The classical tradition of insect morphology and phylogeny was upheld on a high level by Niels P. Kristensen (1943-2014) of the Zoologisk Museum in Copenhagen. He published outstanding morphological treatments of lepidopteran key taxa (58)(59)(60)(61), profound reviews of insect phylogeny (62)(63)(64), and landmark volumes on systematics and morphology of Lepidoptera in the Handbook of Zoology series (65)(66)(67)(68). Even though he never performed computer-assisted analyses, his critical contributions helped refine character interpretations and pointed out problematic phylogenetic issues. A characteristic feature of Kristensen’s approach was a deep-rooted skepticism, reflected by largely or completely unresolved parts of his phylogenetic trees. His display of polyneopteran relationships (63) became known as “Kristensen’s comb”, and if polytomy is preferable to error, then Kristensen’s phylogeny was not bested until genomic resources were brought to bear. However, Kristensen remained skeptical even after the publication of large transcriptomic4 works (49)(50) (pers. comm. to RGB).
Controversy was common in morphology-based insect phylogenetics, and results were strongly affected by the selection of characters before very large and well documented data sets emerged in the 21st century. Boudreaux’s 1979 book (69) on arthropod phylogeny, though criticized by some (63)(70)(71)(43), was cited frequently by others (72)(73)(74)(75)(76)(77). Even though Boudreaux adopted the methods of phylogenetic systematics, according to Kristensen (78) his interpretations often differed from those of Hennig (10)(11). As in the case of the controversial Zoraptera (79), phylogenetic conclusions were often based on characters that were ancestral, ill-defined or homoplasious. Jarmila Kukalová-Peck published a summarizing account of insect paleontology (80), numerous specific studies on extant and extinct insects (81)(82)(83), and comprehensive analyses of characters of the wing base and wing venation (84)(85). She advocated the origin of wings from gill-like appendages (86)(87) and proposed a clade Cercophora (Diplura + Insecta) for the first time (87). Her groundplan approach challenged standard cladistic procedures (88) and was criticized by some authors (e.g., 82). Her phylogenetic hypotheses, usually based on wing characters, yielded some results inconsistent with earlier (11) and most recent concepts (50).
As in earlier attempts to classify insects (like Haeckel 1896), studies based entirely on wing venation (84) show the weakness of limited character systems, especially when strong functional constraints drive convergent evolution. Nevertheless, in-depth studies of specific body parts, organs, or developmental stages can yield important insights. Examples are the circulatory system investigated by Günther Pass (90)(91), the female genitalia of polyneopteran groups studied by Klaus Klass (92)(93), and embryology, with important contributions made by Ryuichiro Machida and others (94)(95). Throughout the first decade of this century, it was more common in presentations to see these characters mapped onto molecular phylogenies than to have explicit, data-matrix-based phylogenies constructed from these systems. This is understandable, given the recognition that subsets of characters were only part of the whole picture, the general lack of coordination in taxon sampling, and the enormous effort involved in constructing a unified combined data matrix.
Classical Hennigian studies relied on detailed anatomical information obtained for few selected taxa with informal character discussions without data matrices (35)(36)(37)(62)(63)(96)(97)(98). These studies treated all taxa within larger groups as a single hypothetical ancestor, reducing characters to reconstructed groundplan states. Modern computer-based analysis is better suited to entering characters of individual representative species into data matrices. Even so, earlier computer-based studies extracted data from the literature (e.g. (62)) and coded entire orders with identical groundplan states (72)(74)(77). This was almost inevitable because thorough anatomical studies using microtome sectioning of a single species (e.g. (99)) could take years. In the early 2000s, new technologies such as micro-computed tomography (µCT) and computer-based 3D reconstructions greatly accelerated the acquisition of high quality anatomical data (19). The coordinated efforts of international research teams, using both new and traditional methods (e.g. (100)(101)), have yielded matrices of hundreds of characters from different body parts and life stages. For example, a study of Holometabola (48) contained 365 well documented characters that corroborated current molecular phylogenies.
Insect morphology and cladistics
In the late 1970s and 1980s cladistics “evolved” as a transformed version of Hennigian phylogenetic systematics (e.g. (102)(103)), arguably linked with the development of suitable computers and software programs. The Hennigian method of searching for sister taxa required great care in polarizing each character. Polarity in this context refers to the assignment of character states as either ancestral or derived. With computer-based analysis, however, character polarity is determined automatically from the outgroups1 (i.e., rooting2) (e.g. (104)). The first computer program capable of estimating phylogenies was Felsenstein’s PHYLIP in 1980. Mickevich and Farris were developing their program “PHYSIS” near the same time and released it in 1982. It saw limited use, perhaps because of the $5000 price tag. Farris’ updated program Hennig86 became available in 1989 (105), and Swofford’s PAUP (106) was released free of charge the same year. Along with new molecular data, Whiting et al. (77) presented a matrix-based morphological analysis of most major groups in 1997, which was extended to include all hexapod orders by Wheeler et al. in 2001 (74). That same year, Beutel & Gorb (72) presented a matrix-based morphological analysis of the entire Hexapoda. These morphological phylogenies were largely consistent with earlier hypotheses (e.g., (10)(11)(28)(31)(62)(63)(64)(78)). Wheeler’s insect ordinal phylogeny (74) emphasized molecular data, but without the morphological data, their results were largely unresolved and implausible (107).
The dawn of molecular systematics in the early ’90s – molecular work in the Sanger days
A number of studies in the late 1980s explored animal phylogeny, including insects, using direct RNA sequencing of the nuclear small subunit ribosomal RNA gene (18S rRNA)(108)(109). Turbeville et al.’s 1991 work (110) used parsimony2, distance2, and other methods (109). Their distance analysis grouped the annelids with the mollusks as opposed to the previous assumption that annelids should group with the arthropods based on segmentation. They also recovered Pancrustacea, a group that unites the traditional crustaceans with hexapods. At the time, the Tracheata hypothesis (Myriapoda + Hexapoda) was heavily entrenched, and they suggested that the position of the crustaceans may have been the result of bias introduced by long-branch attraction, and the limited number of characters. Earlier work (108) also recovered Pancrustacea, and suggested that the annelids were distant from the arthropods. These works remind us to be careful of what we dismiss as “wrong”, because we now understand Pancrustacea to be strongly supported. Turbeville et al. (110) were aware of branch-length artifacts and alignment ambiguity, and made careful, if arbitrary decisions about data exclusion. Unlike some who followed them, they considered suboptimal trees to be worth discussing. However, given that few were impressed with confirming arthropod monophyly, and still fewer believed that crustaceans should group with Hexapoda, this study, as insightful as it was, did not become a model for future analyses.
Parsimony: A broad scientific principle that prefers simple over complex explanations. In a phylogenetic context, parsimony refers to preferring a tree with the fewest possible character state transformations. Thus, whenever possible, transformations are assumed to be shared among taxa and thus placed on internodes as synapomorphies, rather than as homoplasies.
Synapomorphy: A shared, derived character (feature) that can be used as an argument for a group being monophyletic.
Homoplasy: Character state evolving more than once on a tree or changing back to its original state. Parsimony attempts to minimize homoplasy. Homoplasy creates phylogenetic noise (conflicting signals).
Distance analysis: Methods that reduce all character differences between pairs of taxa to a single value, their pair-wise distance. Trees are then constructed by grouping the most similar taxa. Distance methods are criticized by cladists as being phenetic.
Phenetics: Organisms are classified based on overall similarity in their phenotype or appearance, rather than focussing on derived character states only.
Likelihood analysis: A statistical method of selecting among possible trees based on the probability of the data under a model of evolution.
Long branch attraction: A phenomenon that misleads phylogenetic reconstruction. On long branches, shared phylogenetic noise (homoplasy) accumulates and overrides the true phylogenetic signal on short internal branches of a phylogenetic tree.
Bootstraps: A resampling of phylogenetic data, in which columns of characters are drawn at random with replacement, which creates a number of pseudoreplicate datasets. These pseudoreplicates are then analyzed individually, and their results are summarized on a consensus tree in order to estimate conflicting signal and provide an assessment of support for individual clades. (Jackknifing is similar, but in jackknifing, the new pseudoreplicate datasets are generated by random deletions of columns of characters).
Branch support: Quantitative measures to assess confidence for particular clades in a phylogeny. Examples include bootstraps, jackknifing, posterior probabilities, and Bremer support. Congruence among independent datasets could also be considered as branch support, but is seldom quantitatively expressed.
Root: A hypothetical taxon assigned as the most recent common ancestor of all the taxa in a phylogeny. A root is used to assess the polarity of a phylogeny. Defining an outgroup also establishes a root.
Node: The point at which an ancestral lineage splits into two lineages in a phylogeny.
Internode: The lines in a branching diagram between nodes (internal branches on a tree). In a phylogeny, an internode represents an ancestral lineage. Synapomorphies occur on internodes. The longer an internode exists, the more chance for synapomorphies (either molecular or morphological) to accumulate. Short internodes are generally the source of controversy, since they have a very low probability of accumulating informative substitutions.
Substitution: An observed change in a character. For molecular data, substitutions are related to mutations, but since lethal mutations are seldom observable, substitutions are mutations that have survived the filter of selection.
Sanger sequencing: The dominant method of DNA sequencing from the 1980s until about 2005.
Restriction sites: Short unique motifs scattered throughout the genome which can be cut by certain restriction enzymes, yielding fragments that can be visualized on a gel.
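Several of the definitions in this box (parsimony, distance analysis, bootstraps) describe simple computations, and a minimal sketch may help readers who have not seen them worked through. Everything specific in it is an assumption made for illustration: the four taxa A to D, their four-site sequences, the two candidate trees, and the function names are invented, and the step count uses Fitch's well-known algorithm for unordered characters rather than the procedure of any particular program mentioned in this review.

```python
import random

# Toy alignment (invented for illustration): four taxa, four aligned sites.
ALIGNMENT = {"A": "AAGT", "B": "AAGC", "C": "CTGC", "D": "CTAC"}
# Two candidate rooted binary trees as nested tuples; a tip is a string.
TREE_1 = (("A", "B"), ("C", "D"))
TREE_2 = (("A", "C"), ("B", "D"))


def fitch_steps(tree, states):
    """Minimum number of state changes for one character (Fitch's algorithm)."""
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        shared = left_set & right_set
        if shared:  # children can agree on a state: no extra change needed
            return shared, left_steps + right_steps
        return left_set | right_set, left_steps + right_steps + 1
    return visit(tree)[1]


def tree_length(tree, alignment):
    """Parsimony score: changes summed over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_steps(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_sites)
    )


def p_distance(seq_a, seq_b):
    """Pairwise distance: the proportion of sites at which two taxa differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def bootstrap_alignment(alignment, rng):
    """One pseudoreplicate: columns drawn at random with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}


print(tree_length(TREE_1, ALIGNMENT), tree_length(TREE_2, ALIGNMENT))
print(p_distance(ALIGNMENT["A"], ALIGNMENT["B"]))
print(bootstrap_alignment(ALIGNMENT, random.Random(0)))
```

Under parsimony, the tree requiring fewer changes is preferred; for these invented data that is the tree grouping A with B and C with D. A bootstrap analysis would repeat the tree search on many pseudoreplicates and report how often each clade is recovered, whereas a distance analysis would work only from the table of pairwise values and discard the individual characters.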
A 1984 review of insect molecular systematics by Stewart Berlocher (111) focused on allozyme gel electrophoresis studies, with discussions of methods used at the time. Before the invention of the polymerase chain reaction (PCR) in 1985, direct sequencing of rRNA was possible, but rare, and only one study of the molecular structure of rRNA, presenting 3 insect 5.8S sequences (112), was mentioned in the review. A 1988 paper by Chris Simon (113) included a table of 30 molecular phylogenetics projects underway at the time, but all of them were as yet unpublished. Thus, the first molecular study that we are aware of that specifically addressed insect phylogeny was published in 1989, when Wheeler (114) discussed separate analyses of 18S sequences and restriction sites2. The restriction site analysis (his Fig. 6) included more taxa than the DNA sequence tree, and supported Metapterygota (damselfly + Neoptera), and Neuropteroidea (beetle + lacewing). It seemed from this work that the 18S rRNA gene was a promising source of characters, especially given that restriction sites alone could result in a reasonable tree. In 1992, Carmean et al. (115) used 18S rRNA to explore relationships among holometabolous insect orders and noticed that flies had an elevated substitution2 rate, and long regions that had to be excluded from the analysis because they could not be aligned with confidence. They surmised that the flies (Diptera) were being drawn to the root2, and thus excluded them in most of their analyses. Pashley et al. in 1993 (116) published distance2 and parsimony2 analyses of a fragment of 18S rRNA from nine orders of Holometabola. They were able to recover Mecopterida and Amphiesmenoptera, but bootstrap2 support for most groups was very low. Pashley et al. concluded that using different outgroups1 yielded different topologies for poorly supported ingroup1 taxa.
The failure of these analyses to converge on strongly supported results from a few taxa, and fragments of 18S is not surprising. This gene alone has never resolved relationships among all orders of Holometabola, even with many more taxa, although von Reumont’s work (117) came very close with combined complete 18S and 28S but without the confounding Strepsiptera.
Mitochondrial data were also explored in the early days of Sanger2 sequencing. Liu and Beckenbach (118) explored the mitochondrial cytochrome oxidase II (COII) gene in 10 orders of insects, using a genetic-distance-based2 analysis (119) and parsimony2. Trees from various analyses grouped the cockroach and the termite, and the three species of Hymenoptera (ichneumonid wasp, bee, and ant), and not much else. In a study of arthropod phylogeny, a small fragment of the mitochondrial 12S rRNA gene was analyzed, and it was proposed that onychophorans (velvet worms) are in fact modified arthropods (120). Since onychophorans are generally considered to be arthropod outgroups, this study was published with fanfare in the journal Science. The statement that “These data demonstrate that 12S … can resolve arthropod relationships…” is strongly contradicted by the highly unusual (and since rejected) phylogeny they recovered.
Through the Sanger2 sequencing period, molecular phylogenetics focused largely on 18S, 28S, and a few mitochondrial genes, mostly 12S rRNA, 16S rRNA, and COI. However, rRNAs were difficult to align (121)(122) and model (123), and it seemed that the mitochondrial genes were biased and full of misleading signal (118). Single copy nuclear genes were seen as a solution, but remained difficult to sequence. The standard markers were easier to amplify, because universal primers were available (123)(124)(125), and both mitochondrial genes and nuclear rRNAs were present in multiple copies in every cell. A review in 2000 (126) called for coordinated efforts in selecting genes that were compatible across studies, and supported the continued use of 18S rRNA and commonly sequenced mitochondrial markers. In a contrasting opinion, in order to move beyond rRNA and mitochondrial genes, a group of workers at the University of Maryland embarked on a program to locate and sequence single copy nuclear protein coding genes (127). Of 14 “promising candidates” they identified, several (EF-1a, DDC, POLII, and to a lesser extent, PEPCK) saw extensive use in insect intra-ordinal phylogenetics. They would continue to develop useful protocols for amplifying genes such as wingless, CAD and others (128)(129)(130)(131)(132)(133)(134)(135). Additional contributions to the arsenal of nuclear genes for insect phylogenetics soon followed (136)(137)(138). Histone H3 and U2 snRNAs were examined (139)(140), with the former used extensively despite the fact that neither gene could recover any reasonable higher level groups (141). Practically all major higher level insect phylogenetic studies in the past decade have relied, at least in part, on single-copy nuclear genes, and these have now become the dominant markers in transcriptome4 analyses. 
The markers developed by the Maryland workers and others were put to good use across arthropods, among a few orders (142), and within orders such as Lepidoptera, Hymenoptera, Diptera, and Coleoptera (see below). However, they were not applied broadly across orders until Wiegmann’s 2009 work (47).
Taken as a whole, it is not difficult to see why morphologists would be less than excited by the state of molecular phylogenetics in the early 1990s. In historical context, this was a time when university hiring priorities favoured molecular workers who could sequence a couple hundred nucleotides from backyard insects, and “discover” relationships that were either already widely accepted, or hard to believe. Grant money seemed to be reserved for molecular work. Still, morphological workers seemed to be at an impasse. They agreed on the general outlines of Hennig and Kristensen, and had established Dictyoptera, Mecopterida, Amphiesmenoptera, Antliophora and Neuropterida as monophyletic. However, they were unable to resolve the relationships among the entognathous, palaeopteran, polyneopteran, or holometabolous orders.
The development of PCR made possible the rapid collection of nucleotide sequence data throughout the 1990s and beyond (143)(144). Most molecular workers at this time recognized the limitations of their own data, and were proposing ways to address them. Two early reviews (123)(145) discussed strategies for modeling DNA to account for known biases in the way it evolves. The influential chapters by Swofford and colleagues in the “Molecular Systematics” books (146)(147) had laid a foundation for understanding the analytical issues, and the programs PAUP* (148) and PAML (149) were available for running model-based (likelihood2) analyses at a time when computers were finally up to the task of implementing complex substitution2 models. PAUP is an acronym for “Phylogenetic Analysis Using Parsimony”, so the asterisk after the new release referred to “and other methods”, such as likelihood and distance. The pull-down menu (Graphical User Interface or GUI) in PAUP* was an ideal platform for newcomers to learn the complexities of models of DNA evolution and statistically-based likelihood phylogeny-building methods. At this time, the need to accommodate biases with either models or differential weighting was becoming obvious (109). By the mid 1990s the field of insect molecular phylogenetics looked promising, but it had begun to splinter into camps based on analytical methods. There was a brief honeymoon in which the logic of cladistic taxonomy was universally adopted, but it was then co-opted by some who conflated cladistics with parsimony, as they transitioned from intuitive Hennigian methods to computer-based parsimony analyses.
The problem with “the Strepsiptera problem” – 1995-2010
There was probably no question that occupied the minds of insect systematists more during the late 1990s than the “Strepsiptera problem”. This is surprising because strepsipterans are neither diverse nor conspicuous. However, like many parasites with highly modified structural features, these fascinating and unusual insects were difficult to place morphologically. Their ribosomal data would prove to be the battleground over which likelihood2 and parsimony practitioners would argue, which in turn helped to reveal the problems inherent in parsimony. Most morphological studies placed Strepsiptera (Fig. 4) as the sister taxon to the beetles (Coleoptera), based on hind-wing flight (posteromotorism), and a few other characters (72)(150)(151)(152), or within a subgroup of Coleoptera (153) (see also McKenna and Farrell (154)). The first of the molecular-based studies addressing this (but without published data or analytical detail) was a Scientific Correspondence that appeared in Nature in 1994 by Whiting and Wheeler (155). They proposed that Strepsiptera were the sister taxon of Diptera (flies; the two groups combined named Halteria (77)). Halteria refers to halteres, the gyroscopic reduced hind wing stubs found in flies that are superficially similar to strepsipteran forewings. They surmised that the grouping of flies with strepsipterans was in itself evidence of a homeotic mutational transformation that resulted in halteres flipping from the 3rd thoracic segment in flies to the second in strepsipterans (Fig. 4). Most morphologists doubted this assertion. Skepticism came from the molecular perspective as well. Carmean and Crespi (156) responded almost immediately that “long branches attract flies”, which was unambiguously demonstrated by Huelsenbeck (157), with a likelihood2 analysis of available data (156) (the Whiting data were not yet public). Chalwatzis et al. (158) were the first team to actually publish an analysis of this problem and make their data available.
Like Whiting and Wheeler, they used 18S rRNA sequences and recovered Halteria. In a 1996 follow-up work (159), increasing the number of taxa to 26 and including all holometabolous orders, they again recovered Halteria. In addition to parsimony, these authors used the neighbor-joining distance method with a model (160) designed to accommodate nucleotide compositional bias3 among lineages and among-site rate variation3. Strepsipteran 18S was found to be about 1000 nucleotides longer than the next longest 18S and shared extreme and similar AT nucleotide compositional bias with Diptera. All their analyses favoured Halteria including those designed to correct for the bias they had observed. However, in an analysis they did not show, when site-specific rates were used to correct for among-site rate variation (161)(162), the bootstrap2 value for Halteria dropped from 100% to 77%. They cautioned that 18S could be artificially clustering long-branched taxa and looked forward to investigating other genes not linked to rRNA to test their findings.
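The kind of compositional bias described above can be checked directly from sequence data. The following is a minimal sketch in Python, assuming toy sequences and hypothetical taxon labels rather than real 18S data; it simply reports the A+T fraction per taxon so that lineages sharing a skewed composition stand out.

```python
from collections import Counter

def base_frequencies(seq):
    """Return the frequency of A, C, G and T among unambiguous bases."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}

def at_content(seq):
    """Fraction of unambiguous bases that are A or T."""
    freqs = base_frequencies(seq)
    return freqs["A"] + freqs["T"]

# Hypothetical toy sequences (assumption: illustration only, not real data).
toy_sequences = {
    "AT_rich_taxon_1": "ATTATAATTTAAATATTAGC",
    "AT_rich_taxon_2": "TTAATATAAATTTATAGCAT",
    "balanced_taxon": "ACGTACGGTCCATGCAGTCA",
}

for name, seq in toy_sequences.items():
    print(name, round(at_content(seq), 2))
```

Two unrelated taxa with a similarly extreme A+T fraction will share many bases by chance alone, which is why models that assume the same composition in every lineage can be misled.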
Nucleotide compositional bias: When nucleotide frequencies stray significantly from 25% each. This phenomenon is particularly problematic when bias differs among taxa.
Among-site rate variation (ASRV): When different sites along a sequence vary in their substitution rates. For example, when the substitution rates are higher for 3rd codon positions than for 2nd codon positions, this difference in rates is important to capture in model-based analyses, and argues against equally weighted parsimony. ASRV is also extremely problematic when it varies across lineages in a tree.
Multiple sequence alignment: The process of lining up DNA or amino acid data into columns of presumed homologous positions.
Consensus tree: A graph summarizing a set of trees by only showing clades which are shared among multiple equally favoured solutions or even multiple analyses. A strict consensus tree shows only those relationships found in all trees, while a majority-rule consensus depicts the most common resolution.
Sensitivity analysis: A means of exploring the robustness of a conclusion by altering the analytical details that influence it. For example, if one were interested in exploring how alignment parameters (like the penalty for inserting a gap in an alignment) influenced a phylogeny, one could change the input values to create new phylogenies from the new alignments and explore how the resulting trees differ.
Input parameters: Many complex analyses, such as alignment of DNA sequences or phylogeny reconstructions, require specification of a number of parameters. Common input parameters include numbers that indicate costs or ratios, or parameters of a specific evolutionary model. Input parameters are often derived from empirical data and can drastically alter phylogenetic results.
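The difference between strict and majority-rule consensus, as defined in the glossary entry above, can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: each tree is represented only as a set of clades (each clade a frozenset of taxon names), and the three input trees are hypothetical.

```python
from collections import Counter

def clade_counts(trees):
    """Count in how many trees each clade appears."""
    return Counter(clade for tree in trees for clade in tree)

def strict_consensus(trees):
    """Clades present in every input tree."""
    counts = clade_counts(trees)
    return {clade for clade, n in counts.items() if n == len(trees)}

def majority_rule_consensus(trees):
    """Clades present in more than half of the input trees."""
    counts = clade_counts(trees)
    return {clade for clade, n in counts.items() if n > len(trees) / 2}

# Hypothetical equally favoured trees, each given as a set of clades.
amphiesmenoptera = frozenset({"Lepidoptera", "Trichoptera"})
beetle_sister = frozenset({"Coleoptera", "Strepsiptera"})
halteria = frozenset({"Diptera", "Strepsiptera"})

trees = [
    {amphiesmenoptera, beetle_sister},
    {amphiesmenoptera, beetle_sister},
    {amphiesmenoptera, halteria},
]

print(strict_consensus(trees))         # only the clade found in all three trees
print(majority_rule_consensus(trees))  # also keeps the clade found in two of three
```

The strict consensus collapses every node on which the trees disagree, which is why consensus summaries across many assumption sets tend to be poorly resolved.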
The largest dataset of the 1990s exploring the phylogeny of Holometabola was presented in 1997 by Whiting et al. (77). Approximately 1100 18S and 400 28S rRNA positions were aligned with the multiple sequence alignment3 program Malign (163), and analyzed with parsimony. Molecular data were then combined with morphological data taken from the literature. Sensitivity to alignment was explored by evaluating trees both with, and without hypervariable regions. Both beetles and neuropteroids were polyphyletic, due to contamination. The paper is best remembered for its recovery of Halteria, a hypothesis that they would vigorously defend (74)(164)(165)(166)(167) until morphology (48)(168) and additional genes (47)(154)(169)(170) overturned it 15 years after it had been proposed.
Whiting (77)(164) rejected the suggestion that Halteria was an artifact of long branch attraction2, arguing that 18S and 28S corroborated one another. However, these are not independent genes but rather different regions of the same transcript. Countering Whiting’s other arguments, Huelsenbeck showed in 1998 (171) that the length of the branches leading to Strepsiptera and Diptera were “virtually unparalleled in phylogenetic analysis”. Huelsenbeck’s analysis was among the first to apply likelihood to a large number of insect orders. Whiting (77) had argued that the branch leading to the amphiesmenopterans was “not far out of range” of those leading to the Strepsiptera and Diptera. However, this statement missed the central tenet on which long-branch attraction is based. In parsimony analyses, as independent changes are transferred away from the terminal branches leading to both Strepsiptera and Diptera where they occurred, to the internode2 that falsely links them together, the observed terminal branch lengths are underestimates of the real number of independent changes. In other words, parsimony takes two independent changes, and assumes they are shared derived character states. When either Strepsiptera or Diptera was removed from the analysis, the other remained in the same position relative to the remaining taxa. Their argument was that Strepsiptera could not have been attracted to Diptera given that it ends up in the same place in the tree when Diptera is removed (165). However, removing either taxon simply caused the remaining long-branch taxon to attract to the next longest branch, the Amphiesmenoptera, a branch that they recognized as almost as long (77) as those leading to Strepsiptera and Diptera (although they had underestimated these branch lengths). Huelsenbeck (171) showed that given the taxon sampling used by Whiting et al. (77), the branches leading to both Strepsiptera and Diptera were long enough to attract one another with parsimony, and that likelihood analyses could not distinguish among hypotheses. It was recognized that taxa at the end of long branches may actually be sister groups (107)(157)(171)(172), but that the rRNA data in hand could not support any conclusion including Halteria. Friedrich and Tautz (173), confirming the observations of Chalwatzis (158)(159), showed in 1998 that there had been an extreme change in both substitution rate and compositional bias3 in the stem dipteran lineage that would pose problems for phylogenetic analyses. These biases were further explored in 2000 by Steel and colleagues (172) and others (174). Hwang et al. (189) sequenced additional large subunit rRNA fragments in order to test the Halteria hypothesis. Their parsimony analyses recovered Halteria, which they attributed to long-branch attraction2, while their likelihood analyses placed the strepsipteran with the scorpionfly. All these authors recommended that the Halteria hypothesis be dropped. So why did it take phylogenomics to settle the issue? It didn’t, and this is not the wisdom of hindsight. For likelihood practitioners, it was settled in 1998. It had been conclusively demonstrated that Halteria was the result of inappropriate methods and obvious predictable bias. Halteria has only been found from the analysis of nuclear rRNA data, and is contradicted by every other source of data (47)(50)(154)(168)(175). Genomic (175)(176) and transcriptomic4 (50)(177) analyses now leave very little room for debate: Strepsiptera belongs as sister to the beetles. The debate was never really about Halteria, but rather, about the philosophical merits of parsimony versus likelihood.
In addition to disagreements over the merits of parsimony, the methods of nucleotide alignment3 played an important role in insect phylogenetics during the Sanger sequencing period (107)(163)(178)(179)(180)(181)(182)(183)(184)(185)(186). The definition of cladistics had been transformed, and in the new sense, was characterized by strict and exclusive adherence to parsimony analyses. The first work in insect molecular systematics that covered all orders came from a group of cladists (in the new sense) who were centered at the American Museum of Natural History in New York (74). They extended the dataset of Whiting with the same rRNA fragments as in the 1997 work (77), but with additional taxa, particularly outside Holometabola. The principal analytical difference was the implementation of simultaneous alignment and tree building (187). They called this method “direct optimization” when implemented by their program, POY (179). The molecular data by itself, presented in their Figs. 12a, 13, and 14, suggested many implausible relationships (107). However, it seems that the morphological data as shown in their Fig. 10 provided a reasonable scaffold on which to structure the molecular data. In order to explore the influence of different analytical assumptions, they presented six combined data trees. The analysis that minimized incongruence among datasets (their Fig. 11) would seem to be the favoured hypothesis, although this was not explicitly stated. The summary trees of all assumption sets, shown in their Figs. 18a&b, were largely unresolved consensus trees3. However, their results are often cited as their Fig. 20, which did not come from an analysis but rather was a “discussion tree” created from nodes2 the authors favoured from different datasets. The deepest parts of their trees seemed robust to analytical assumptions, while relationships among polyneopterans and holometabolans were unstable.
Despite published papers that pointed toward branch effects and compositional bias3 for these data (137)(156)(157)(171)(172)(188)(189), they did not take these problems into consideration, favoring parsimony on philosophical grounds.
Many of the differences among phylogenetic hypotheses were the result of differing analytical approaches and ambiguous alignment3 of rRNA data. The history of alignment disputes has been described in detail elsewhere (181)(182)(185), and some researchers likely turned to nuclear single copy genes simply to avoid the problems of rRNA alignment altogether, and perhaps the tedious bickering from both sides of this issue (190). POY has since fallen out of favor with systematists, due to numerous and diverse criticisms (121)(183)(184)(191), although the possibility of simultaneous alignment and tree building remains (192). The problem with model-based simultaneous alignment and tree building is that it is difficult to create a biologically reasonable model for gaps.
The sensibility of sensitivity
By 2001, the insect systematics community was strongly divided into parsimony and likelihood camps. It was another set of parallel universes, with different journals (Systematic Biology vs. Cladistics), different heroes (Felsenstein vs. Farris), different branch support2 measures (bootstraps vs. jack-knifing and Bremer support) and even different computer systems (Mac vs. PC) brought about by the platforms of their different programs (PAUP vs. Winclada, NONA, and TNT). Many in both camps basically dismissed the ideas from the other side as flawed and without merit. Most morphologists found themselves in the parsimony camp, likely due to tradition and the fact that morphological characters are less amenable to modeling than molecular characters. Much of the error in parsimony analysis could have been mitigated by differential weighting of characters, upweighting slow sites, and downweighting fast sites (123)(193). Morphologists had always weighted their data, if only by selection of characters that they deemed reliable. However, with DNA there was little interest in differential weighting, as one side rejected it based on the insistence that equal weights were assumption free, and the other side preferred likelihood, because models mimic differential weights with the added benefit of being grounded in statistics (194). Weights and other parameters3 upon which phylogenetic conclusions depend must be selected by the user. If their selection is arbitrary, then subjectivity remains, but it is transferred from a thinking person to a machine (181)(182)(185). Many in the molecular-parsimony camp were dedicated to POY analyses, and believed that they were removing as much subjectivity as possible. In order to deal with the problem of objectively selecting analytical parameters, they developed a brand of sensitivity analyses (74)(195) that was based on incongruence length difference (ILD) tests (196).
ILD testing involves comparing subdivisions of the data to combined data, seeking parameters that minimize incongruence. Criticism of ILD testing is beyond the scope of this review, but can be found in many works (182)(197)(198)(199)(200)(201). However, even if ILD tests were legitimate, one must decide which among many parameters should be evaluated, each with an infinite space to explore, with each influencing the behavior of the others (182). Grant and Kluge (202), in a particularly radical application of their own view of epistemological consistency, reject the whole idea of sensitivity analyses3, and suggest that all parameters should be equal, and set to 1 on philosophical grounds. Ogden and Whiting (203) applied sensitivity analysis to the “Palaeoptera problem”–the phylogenetic positions of dragonflies and damselflies (Odonata) and mayflies (Ephemeroptera) relative to insects that have the ability to fold their wings (Neoptera). They showed that the results were indeed sensitive to input parameters. In a justification for using a single analytical method, and counter to exploring the influence of the application of model-based methods, they stated that they “…do not consider congruence among different methodologies to be a suitable measure of robustness because agreement among inferior methods is nebulous at best.” It seems that this attitude was shared by both sides, as model-based analyses were not explored by the cladistics group until Terry and Whiting in 2005 (204), and parsimony analyses were virtually abandoned by the practitioners of likelihood. We see now that short, ancient internodes2 are always sensitive to assumptions and input parameters. Thus, the nodes2 unseen by Börner in 1904 collapse under sensitivity analyses3 as they were applied, as seen, for example, in Fig. 18 in Wheeler et al. 2001 (74).
Concerning the relationships among insect orders, the entire Sanger period provided few if any new insights that were widely agreed upon. Cockroach paraphyly, an apparent exception, had been suggested based on morphology (205). Part of the lack of resolution came about because the parsimony and likelihood schools were so far apart, and non-specialists could not choose between them. Even a hypothesis that was supported by practitioners on both sides of the analytical divide–Nonoculata (Protura+Diplura; Table 1)–now seems to be an error (but see (206)). In addition, the common result of finding snow fleas (Mecoptera: Boreidae) closer to the fleas than to other mecopterans is now in question. These are very difficult phylogenetic problems. The current prevailing opinion is that model-based analyses outperform parsimony even when parsimony is weighted to be more realistic (123) (but see the editorial in 2016 in Cladistics (207)). Accepting this premise, the philosophically driven parsimony-based insect molecular phylogenies that dominated the literature in the 1990s and 2000s were an unfortunate detour, especially when compounded by the failure to recognize the errors resulting from compositional bias, non-homogeneous substitution rates, alignment error and the inconsistencies of rRNA analysis with POY (107)(171)(172)(181)(182)(183)(184)(208).
The likelihood camp
The basic principles and performance of likelihood (209)(210)(211)(212)(213)(214)(215) were laid out by Felsenstein long before they became standard practice. Likelihood was first introduced into phylogenetic systematics by Cavalli-Sforza and Edwards in 1965, but it was not widely applied because user friendly programs were not available until PHYLIP was developed in 1980 (216). Even after likelihood programs were available, it took a while for people to develop an understanding of how models of evolution could lead to an estimate of phylogeny. Parsimony was far easier to grasp. Many were uncomfortable with the number of assumptions required for model-based analyses. However, although it was claimed that the assumptions required for equally-weighted parsimony were fewer or non-existent, it is clear that they were simply not defined. If they were defined, they would be exceedingly complex and unacceptably unrealistic (121)(201). Even if there were fewer assumptions, these few would still lead to error with certainty under common branch length combinations (188). However, a major obstacle to using likelihood was that it was difficult to analyze more than 10 taxa in a reasonable time frame. Sophisticated models of evolution could not be implemented until computers gradually gained the speed to analyze datasets of typical size, more than ten years after PHYLIP was introduced. Throughout the 1990s phylogenetic methods based on likelihood as an optimality criterion increased in importance and implementation.
The motivation for implementing likelihood was strong. Felsenstein had demonstrated in 1978 (188), that under parsimony with some branch length ratios, the addition of data would strengthen support for the wrong tree. He speculated that parsimony would work if rates of evolution were low or sufficiently equal among lineages. Hendy and Penny (218) extended Felsenstein’s work to show that neither of these conditions for the success of parsimony would hold once the number of taxa exceeds four. They concluded that rather than unequal rates it was the long branches that were the problem, and introduced the concept of long-branch attraction. The idea that adding more data could not overcome this bias was hard to accept, and we still find the idea expressed that more genes or increased taxon sampling is a panacea. Into the 1980s, most cladists were still basking in the glow of defeating the numerical taxonomists (whom they called pheneticists). Although likelihood is clearly based on individual characters (like parsimony), its statistical underpinnings were incorrectly assumed by some cladists to link it to phenetics2. In 2000, it could still take weeks on a desktop computer to analyze 500 nucleotides for 50 taxa. Bootstrapping or any kind of branch support2 was difficult if not impossible for likelihood analyses with more than 25 taxa until fast maximum likelihood programs were developed–PhyML (219), Garli (220) and RAxML (221)(222).
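The inconsistency result described above can be illustrated with a small calculation. The sketch below is an illustration under stated assumptions, not a reproduction of any published analysis: it assumes a two-state symmetric model, four hypothetical taxa A to D on the true tree ((A,B),(C,D)), long terminal branches leading to the non-sister taxa A and C, and short branches elsewhere. The per-branch change probabilities (0.4, 0.1 and 0.05) are arbitrary illustrative values. It computes the expected frequency of the three parsimony-informative site patterns.

```python
from itertools import product

def edge_prob(state_a, state_b, p_change):
    """Probability of the end states of one branch under a two-state symmetric model."""
    return p_change if state_a != state_b else 1.0 - p_change

def informative_pattern_probs(p_long, p_short):
    """Expected frequency of each parsimony-informative pattern.

    True tree: ((A,B),(C,D)). Branches to A and C are long; branches to
    B and D and the internal branch are short.
    """
    probs = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    # Sum over tip states (a, b, c, d) and the two internal node states (u, v).
    for a, b, c, d, u, v in product((0, 1), repeat=6):
        pr = (0.5
              * edge_prob(u, a, p_long) * edge_prob(u, b, p_short)
              * edge_prob(u, v, p_short)
              * edge_prob(v, c, p_long) * edge_prob(v, d, p_short))
        if a == b and c == d and a != c:
            probs["AB|CD"] += pr   # supports the true tree
        elif a == c and b == d and a != b:
            probs["AC|BD"] += pr   # supports joining the two long branches
        elif a == d and b == c and a != b:
            probs["AD|BC"] += pr
    return probs

for p_long in (0.1, 0.4):
    probs = informative_pattern_probs(p_long, p_short=0.05)
    best = max(probs, key=probs.get)
    print(p_long, {k: round(x, 4) for k, x in probs.items()}, "parsimony favours", best)
```

With these values, the pattern uniting the two long branches becomes the most frequent once the long branches are long enough. Because observed pattern frequencies converge on these expectations as sites are added, more data can only strengthen parsimony's support for the wrong grouping in that region of branch-length space.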
Bayesian analysis, which shares many properties with likelihood, was introduced into phylogenetics in 1999 (223), and could be implemented in the user friendly program MrBayes (224). By 2001 many in the likelihood school rapidly adopted MrBayes, because it calculated branch support in the form of posterior probabilities at lightning speed (141). It was later realized that much longer Bayesian runs were necessary to be sure that the program had converged on the optimal answer, especially when data required complex models of evolution. However, the problems with model-based analysis were not entirely due to ignorance or the lack of computing power. The influence of long-branch attraction was debated (164)(165)(224), but its ubiquity was not fully understood. Models that did not accommodate key elements of reality, such as among-site rate variation3, could be as error prone as parsimony, without parsimony’s comfortable philosophical footing based on the perception that it minimized unjustified assumptions. It was uncomfortable to use a method so dependent on models, if one could not justify which model to use. Model selection became a major focus of phylogenetic studies (225)(226)(227). At first models of evolution were tested manually using likelihood ratio tests (228)(229), but this became automated in 1998 with “Modeltest” (230). It seemed that this program almost always suggested the most complex model, which led to the development of decision theory (reviewed in Sullivan and Joyce (231)), including a stronger penalty for increasing the number of parameters3.
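The trade-off between model fit and model complexity mentioned above can be sketched as follows. This is a hedged illustration only: the log-likelihood values and alignment length are hypothetical, only the free substitution-model parameters are counted (branch lengths are shared by both models), and AIC and BIC are used as familiar examples of weaker and stronger penalties rather than as the specific decision-theory approach reviewed in (231).

```python
import math

def lrt_p_value_df1(lnl_simple, lnl_complex):
    """Likelihood ratio test for nested models differing by one free parameter.

    For one degree of freedom the chi-square survival function
    equals erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (lnl_complex - lnl_simple)
    return math.erfc(math.sqrt(statistic / 2.0))

def aic(lnl, n_params):
    """Akaike information criterion (lower is better)."""
    return 2.0 * n_params - 2.0 * lnl

def bic(lnl, n_params, n_sites):
    """Bayesian information criterion (lower is better; penalty grows with data size)."""
    return n_params * math.log(n_sites) - 2.0 * lnl

# Hypothetical values (assumptions for illustration only).
n_sites = 500
models = {
    "JC69": {"lnl": -2050.0, "k": 0},  # no free substitution parameters
    "K80": {"lnl": -2047.0, "k": 1},   # adds a transition/transversion ratio
}

p = lrt_p_value_df1(models["JC69"]["lnl"], models["K80"]["lnl"])
print("LRT p-value:", round(p, 4))
for name, m in models.items():
    print(name,
          "AIC =", round(aic(m["lnl"], m["k"]), 2),
          "BIC =", round(bic(m["lnl"], m["k"], n_sites), 2))
```

In this toy case the likelihood ratio test and AIC both prefer the more complex model, while the stronger BIC penalty narrowly prefers the simpler one, which shows how the choice of criterion can change which model is selected.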
In hindsight, if we are to judge by current standards, and our ability to assess accuracy in the light of phylogenomic data, likelihood analyses were both more accurate and philosophically grounded. Most early likelihood practitioners in entomology confined themselves to intraordinal relationships, and their work has proven relatively robust as datasets have gotten larger. Friedrich and Tautz, in 1995 (232), were among the first to use likelihood to estimate deep arthropod relationships with PHYLIP (216). Their analysis included 3 hexapods, and recovered Pancrustacea and crustacean paraphyly. Likelihood (among other methods) was also used by von Dohlen and Moran (233) in 1995 to demonstrate the paraphyly of “Homoptera”. Frati et al. (228) used maximum likelihood analyses of mitochondrial COII gene data in 1997 to examine relationships among springtails, and demonstrated that including a correction for among-site rate variation3 in the analysis had more of an effect on likelihood scores than the substitution models themselves. Flook and Rowell (234) used likelihood methods to explore the properties of mitochondrial data among orthopterans. Whitfield and Cameron (235) found in 1998 that likelihood outperformed parsimony in their study of hymenopteran 16S rRNA. Lo et al. (236) used likelihood methods in 2000 to demonstrate the paraphyly of cockroaches. In 2001 Kjer et al. (238) were the first entomologists to use Bayesian methods in their study of caddisfly (Trichoptera) phylogeny, and Kjer (101) was the first to include many insect orders with Bayesian methods.
It was not until the mid 2000s that consensus among entomologists in the U.S. swung toward likelihood analyses for molecular data, but parsimony analyses are still being published due to cultural/historical factors, and parsimony is still actively favoured by the journal Cladistics (207). It can take a long time for attitudes to shift, and sometimes recollection is subject to “retrospective meaning change” (see discussion by Hull (237)). As with debates over creationism, or climate change, the fact that there are two sides to an issue does not mean that both sides are equally supported. Parsimony for molecular data seems to be supported by faith. Sometimes progress in science comes, not from evidence or flashes of insight, but through strong personalities fading into retirement. (Paraphrasing Max Planck: “Science advances one funeral at a time”).
The dominance of ribosomal RNA
Although many papers included small fragments of 28S, or Histone H3, it was the 18S that dominated results from the Sanger sequencing period, sometimes stabilized by morphological characters (107). This was partially due to historical artifact and partially due to the ease of amplifying and sequencing nuclear rRNA. The 18S gene suffers from extensive among-site rate variation (193) and severe alignment problems within some regions, while (unlike the 28S) the alignable regions are practically invariant. Kjer (107) explored the properties of the 18S, structurally aligned, using a model-based analysis that accommodated rRNA covariation (238). His phylogeny was much closer to current consensus than previous parsimony analyses of 18S. In 2005, a large insect phylogeny was presented by Terry and Whiting (204), using Histone H3, larger portions of the 18S and 28S, and a modified morphological data matrix from Wheeler et al. (74). They focused on polyneopterans, including Mantophasmatodea for the first time, and included a Bayesian phylogeny along with their POY-based parsimony (204). Their Bayesian analysis was a great leap forward. They recovered many of the nodes2 we now find with larger datasets; many for the first time with molecular data, including Xenonomia, Eukinolabia, and Haplocercata (Table 1), which they named, as well as Polyneoptera, Neuropteroidea (Strepsiptera was not included), and Antliophora. In a counterpoint to POY-based analyses, Kjer et al. (141) provided a review of the data of the time, with a commentary on methods. They reported the results of a 15,000 nucleotide multi-gene supermatrix, put together from complete 18S, a large fragment of 28S, EF-1a, Histone H3, and mitochondrial 12S, 16S, COI and COII, along with 170 morphological characters from the older sources, such as Hennig and Kristensen. Their results came very close to our current consensus, particularly within the polyneopterans.
In all Kjer’s analyses, Strepsiptera and Zoraptera were excluded because these taxa exhibited extreme substitution rate accelerations in their rRNA. He was also suspicious of the published Zoraptera sequences. Zoraptera was resequenced and a modified structural alignment (107) was used by Yoshizawa and Johnson (75) in order to place this difficult taxon. They found it to be sister to Dictyoptera, as did Ishiwata (170) with nuclear single copy genes. They cited morphological support for this relationship from Boudreaux (69) and Kukalová-Peck (82). Misof’s group (239) provided an insect-specific secondary structural model, and re-evaluated Kjer’s (107) analysis of 18S, with increased taxon sampling. They found similar results to those from earlier structural alignments, although they found Zoraptera grouped with stoneflies (Plecoptera). This analysis included Strepsiptera, which, as in other likelihood analyses of rRNA (157)(189), did not group with Diptera but was instead recovered, in this case, as sister to an implausible Diptera + “Coleoptera” group, with the long branch Diptera acting as a second internal root2 that rendered beetles paraphyletic. For the first time since 1997 (77) the molecular data recovered Hymenoptera as sister to the rest of Holometabola (Table 1. Aparaglossata). The most thorough exploration of rRNA-based insect phylogeny was completed by von Reumont et al. in 2009 (117). They used an automated alignment algorithm that incorporated rRNA secondary structural information (186), eliminated randomized sites (phylogenetic noise) with the program Aliscore (240), and used more realistic substitution models. Thus none of the previous criticisms over manual manipulation of alignments and manual data exclusion could be applied to this study, which recovered Nonoculata, Ectognatha, Dicondylia, Pterygota, Chiastomyaria, Neoptera, Holometabola, Aparaglossata, Amphiesmenoptera, and Mecopterida.
Published phylograms (trees with branch lengths proportional to the number of estimated substitutions) (75)(117)(141)(171)(172) illustrate the extreme heterogeneity of rRNA substitution rates among orders, and this property causes problems with standard methods (172), even under likelihood. Protura and Diplura share extreme branch lengths relative to neighboring Collembola and Archaeognatha, and the rRNA of both has extremely long regions of hypervariability that are difficult to align (241). Phylograms show that Zoraptera, Strepsiptera and Diptera are also extreme. Odonata evolve more slowly than their neighbors in the tree. Ribosomal RNA analyses frequently recover Nonoculata (74)(75)(107)(117)(239)(242)(243)(244)(245), Chiastomyaria (117)(141)(239), Dermaptera sister to Plecoptera (107)(117)(239), and mecopteran paraphyly (74)(75)(77)(107)(117)(141)(246). The consistency of these results despite the differences in alignment and optimality criteria indicates that rRNA supports these relationships when analyzed with existing methods, even though much larger datasets now contradict Nonoculata and mecopteran paraphyly.
Other types of data
Besides rRNAs and the few nuclear protein-coding gene studies, there were other novel character systems explored for insect phylogenetics, such as locations of introns, and mitochondrial gene order. Rokas and colleagues reported in 1999 that an insertion in a homeobox gene (247), shared by Diptera and Lepidoptera was not found in Strepsiptera, contradicting Halteria. Carapelli et al. (248) noted that Collembola and Diplura shared the loss of an intron within EF-1a. Intron positions in EF2 were mapped (249), showing that Coleoptera, Lepidoptera, and Diptera shared a derived arrangement that was absent in Hymenoptera, predicting our current understanding. A survey of intron positions in EF-1a (250), found a remarkable tree from only 6 informative characters, but intron positions did show homoplasy2, largely due to independent loss. A study of ecdysone receptors by Bonneton et al. (251) showed a significant rate acceleration that countered Halteria, as did their sequence analysis. Predel and Roth have put their analysis of neuropeptides to use in studies of cockroaches, grasshoppers and Mantophasmatodea (252)(253)(254)(255). Xie et al. tabulated the distributions and lengths of 18S hypervariable regions (241), and they reported that some of them could be used as synapomorphies for insect groups. In addition to updating a secondary structural model for insects, they found that Zoraptera and Dermaptera shared the greatest number of hypervariable regions of identical lengths. Boore et al. (256) examined mitochondrial gene order among arthropods in 1995, and they found that Pancrustacea was supported by a mitochondrial gene order character (257). After this discovery, it was hoped that mitochondrial gene order might help resolve difficult nodes2 among insect orders, because it was assumed that gene order was highly conserved and unlikely to be homoplastic2. However, the most controversial internodes2 are likely to be short. 
We can think of internodes as targets where the size of the target is proportional to the length of time an ancestral lineage exists before it splits. As in archery, small targets are hard to hit. Thus, the probability of hitting extremely short internodes with extremely rare events is extremely low. An understanding of this phenomenon is currently important in genomic studies, where it is hoped that with an abundance of characters, extremely short internodes may be hit by extremely rare changes in the structure of genomes. Unfortunately, mitochondrial gene order is remarkably conservative in insects, except within Paraneoptera (258)(259)(260)(261) and Hymenoptera (262)(263), with groups supported by changes in gene order, summarized in a recent review by Cameron (190).
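The archery analogy can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that rare genomic changes arise as a Poisson process with a constant rate per lineage; this is our simplifying assumption and not a model taken from the studies cited. Under it, the probability that at least one change marks an internode of duration t is 1 - exp(-rate × t), which is roughly proportional to t when the product is small. The function name, the rate, and the durations are all hypothetical.

```python
import math


def prob_internode_marked(rate_per_myr, duration_myr):
    """Probability that at least one rare change falls on an internode.

    Assumes changes follow a Poisson process with a constant rate
    (an illustrative assumption, not an estimate from real data).
    """
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for tiny x.
    return -math.expm1(-rate_per_myr * duration_myr)


# Hypothetical rate: one rare event per lineage per 1000 million years.
RATE = 0.001

for duration in (1, 10, 100):
    p = prob_internode_marked(RATE, duration)
    print(f"internode of {duration:>3} Myr: P(at least one change) = {p:.4f}")
```

With a fixed rate, shortening the internode tenfold lowers the chance of a marking event almost tenfold, which is why rare structural characters seldom help with the short internodes that are most in dispute.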
Mitochondrial data have been the subject of two recent reviews (190)(264). The accumulation of mitochondrial genomes continued through the 2000s (265)(266)(267)(268)(269), at a slow pace, but picked up rapidly after 2003 with concerted efforts from Cameron, Song, and Whiting (190). Currently whole mtDNA genomes are accumulating very rapidly because they can be efficiently targeted with high throughput4 methods (270), and are often recoverable as accidental “by-catch” in high throughput sequencing. It was not until preliminary results from the full scale efforts to sequence entire mitochondrial genomes were published (271)(272)(273)(274) that the extent of the problems with mitochondrial data became clear. Cameron and others concluded that mitochondrial data were promising, but that nucleotide compositional bias among lineages, unequal substitution rates among groups, and other long-branch effects must be carefully considered. Many of the relationships recovered with complete mitochondrial genomes were implausible, and they recommended that mitochondrial genomes be combined with other sources of data. Talavera and Vila found the same problems in 2011 (275), and proposed that deep nodes cannot be reconstructed with the methods of the time (which included Bayesian and likelihood analyses). Simon and Hadrys (274) recommended a similarly cautious view, finding many implausible relationships among orders even when using dense taxon sampling, and careful modeling. Chen et al. (276) found that including a projapygid helped recover dipluran monophyly, but they were still unable to recover hexapod monophyly with extensive taxon sampling among basal hexapods, and arthropod outgroups. Cameron’s more optimistic review of the phylogenetic implications of insect mitochondrial genomics, summarized model violations, and made thorough recommendations for the appropriate treatment of mitochondrial genomes for phylogenetics. 
The issue of whether mitochondrial data are “good” or “bad” is clearly a gross oversimplification. Many ancient nodes that are accepted and corroborated by other data are recovered from mitochondrial data (274)(277)(278)(279)(280)(281), and many relationships among polyneopterans are shared between nuclear and mitochondrial analyses (190). For example, using mtDNA genomes Wan et al. (281) found many of the nodes that Misof et al. (50) recovered from transcriptomes4 and some of these nodes (Fig. 5: P,Q,R) have only rarely been seen before. Mitochondrial data consistently recover Mantophasmatodea with Phasmatodea (281)(273). Cameron et al. (282) found Megaloptera sister to Neuroptera, reflecting results from transcriptomes (50). The number of nucleotides in any analysis is strongly correlated with branch support2. However, since mitochondrial genes are all linked and thus inherited as a unit, once the gene tree is accurately recovered, there is little more in terms of corroboration that the full mitochondrial genomes can add. The motivations and disagreements today, in the era of “big data” phylogenomics, are sometimes centered around those who advocate for more data (both in terms of longer sequences and more taxa), and those who advocate “better data”. This disagreement has been with us since the beginning of molecular systematics, and it misses the point that more data, better data, and better models are all good things.
Work on individual orders
This review has given short shrift to the vast majority of insect phylogenetics papers because of our focus on works addressing higher-level insect phylogeny. Given their almost unimaginable diversity, it is impossible for any individual to be considered an expert for all Hexapoda, and most workers spend their careers exploring particular groups. Here we list a sample of the recent advances from various authors in the phylogeny of Odonata (283)(284)(285)(286)(287)(288)(289)(290) (286)(292), Ephemeroptera (293)(294)(278), Plecoptera (295), Dermaptera (296), Embioptera (297), Phasmatodea (298), Dictyoptera (236)(299)(300)(279)(301), Mantodea (302), Orthoptera (234)(303)(304)(305)(306), Hemiptera (307)(308), Psocodea (309), Hymenoptera (310)(311)(312)(313)(314)(315)(316)(317), Neuropterida (318)(319)(320), Coleoptera (321)(322)(322)(323), Diptera (324)(325), Lepidoptera (326)(327)(328)(329)(330)(331) (332)(333)(334), Trichoptera (335)(336)(337), Mecoptera (246)(338)(339) and Siphonaptera (340)(341).
Beyond the standard toolbox: Multiple genes, transcriptomes and genomes
Although many useful studies are still published with a few genes and morphology, phylogenetics today frequently involves the analysis of very large datasets. Savard et al. (342), in an early use of genomic phylogenetic resources in 2006, analyzed 185 nuclear genes from four holometabolous orders, rooted2 with a grasshopper and an aphid, and found results that are consistent with our current best estimates: (Hymenoptera, (Coleoptera, (Lepidoptera, Diptera))). At that time, most studies found Hymenoptera to be weakly supported as sister to Mecopterida, so the strong support for their alternative result led the authors to suggest that large datasets could resolve long-standing controversies in insect phylogenies. One of the first studies on Holometabola to break free of the standard rRNA and mitochondrial genes for interordinal analyses reported results from six single copy nuclear genes (47). A similar study using nine nuclear genes (154) found nearly identical results, both predicting our current understanding of relationships within Holometabola. Even though the datasets were no larger than previous rRNA-dominated analyses, and significantly smaller than the transcriptomic analyses to come (Fig. 5), the fact that both papers rejected Halteria independent of rRNA gave them extra impact. Three new nuclear protein coding genes (DPD1, RPB1, and RPB2) were used in 2011 (170), further rejecting Halteria. Sasaki et al. (338) sequenced over 10K nucleotides from these same 3 genes, and focused their attention on the early splits among hexapods, with significant arthropod outgroups, and polyneopterans. They recovered the unusual result of (Protura, ((Collembola, Diplura), Insecta)), which has not been subsequently corroborated.
Data-mining and big, automated phylogeny pipelines – Behaviorists, ecologists, and other biologists rely on phylogenetic trees to understand the evolution of complex characteristics. GenBank is data rich and the temptation to create pipelines4 to download, combine, filter, and analyze these data to produce a tree is strong. Building upon work by Hunt and Vogler (344) to generate large datasets from public databases, Peters et al. (312) developed a “proof of concept” pipeline that mined GenBank for data from Hymenoptera, in order to construct a phylogeny with over 1000 taxa. The concept worked, but the phylogeny suffered from the quality of the original data in GenBank. Bocak et al. (322) constructed a phylogeny with public databases for more than 8000 beetle species. Again, this study proved that such a thing can work, and supported several disputed internal relationships. Zhou et al. (345) produced a phylogeny of over 16,000 barcode haplotypes from Trichoptera, but this study differed in that constraints were used to insulate the phylogeny from predictable errors. These studies provide evidence that producing huge phylogenies from public databases is feasible. However, based on our experience with genomic and morphological data, we caution that without analytical expertise for the specific properties of the data, as well as the insights of taxonomic specialists for a particular group of insects, it is impossible to reconstruct and evaluate the plausibility of phylogenetic relationships. This idea exemplifies the balance between skilled analyses that produce reasonable phylogenies, and the concern that unjustified or capricious decisions could bias phylogenetic conclusions.
The Palaeoptera Problem Revisited. An early transcriptomic analysis of seven pterygote orders, rooted with Collembola, grouped the mayflies with Neoptera (the Chiastomyaria hypothesis; Table 1) (346), but the Palaeoptera problem was far from solved because Regier et al. (347) in a study focusing on arthropods, recovered a contradictory node Palaeoptera. Thomas et al. (348) evaluated the standard Sanger data in 2013, and found support for Palaeoptera. The first very large EST (expressed sequence tags = partial transcriptomes) dataset to evaluate arthropod relationships was published by Meusemann et al. in 2010 (349). In addition to the size of the dataset, this paper was groundbreaking in terms of filtering the data. Randomized or phylogenetically uninformative sites were algorithmically identified and masked4 with a program called Aliscore (240), and the matrix was optimized with the MARE program (350). MARE eliminates both genes and taxa that are problematic due to missing data, resulting in a smaller, but more dense matrix. Meusemann et al. recovered both Palaeoptera and Chiastomyaria using alternative analytical parameters3. In addition, they also found that Hymenoptera were the sister taxon of other Holometabola, and a monophyletic Nonoculata, as in rRNA dominated analyses.
A consistent pattern with large datasets is that they tend to recover either Palaeoptera (117), Chiastomyaria, or both (349) (50), but rarely the morphologically favoured Metapterygota (but see (351)). While Misof et al. (50) reported the monophyly of Palaeoptera, the quartet mapping4 analyses reported in their supplementary materials favoured Chiastomyaria and rejected the morphologically favoured Metapterygota. The resolution of Palaeoptera was predicted to be among the most difficult nodes to recover (352), and even now, with millions of nucleotides applied to the question, it must be considered unresolved (Fig. 5). We continue to see that the problem nodes from morphology still exist, with continued conflict for the placement of Diplura and Zoraptera and the status of Palaeoptera.
Strepsiptera revisited. Among the first of the truly genomic analyses was Niehuis et al. in 2012 (175), who sequenced the Strepsiptera nuclear genome, and compared it to previously sequenced genomes of 2 beetles, 4 hymenopterans, 3 flies, and Bombyx (silkmoth), with 2 outgroups. They tested four hypotheses regarding the placement of Strepsiptera, and concluded that it belonged with beetles, as originally placed by morphologists. This was further strengthened by McKenna (176), who added a neuropteran genome to the analysis, and by Boussau et al. (177), who added transcriptomes from additional key taxa to genomic sequences, and analyzed them with models designed to avoid branch-length effects. They ruled out the possibility of a close relationship between Strepsiptera and either of the beetle families, Rhipiphoridae or Meloidae (among others).
Insect Phylogeny Resolved. Many labs are now collecting large datasets from hybrid capture techniques, such as anchored hybrid enrichment (353), or ultraconserved elements (354) (UCEs). Both methods allow for the recovery of data from degraded museum specimens. Anchored hybrid enrichment has the advantage that probes can be designed from transcriptomes, making data from the two sources completely combinable. UCEs have the advantage that probes can be designed without the need for sequences from closely related taxa. The Weirauch (Heteroptera), Johnson/Dietrich (Hemiptera and Psocodea), McKenna (Coleoptera), Wiegmann (Diptera), Kawahara (Lepidoptera), Ward, Schultz, and Brady (Formicidae), and Kjer/Frandsen (Trichoptera) labs currently have large hybrid enriched datasets in progress, and according to their conference presentations, these data are largely resolving long-standing problems with strong bootstrap2 support.
The Misof et al. insect phylogenomics study based on transcriptomes of 1478 genes (50), published in Science in November 2014, is by far the largest analysis to date of insect relationships, and their phylogeny provides our current consensus (Fig. 5). Multiple technical advances occurred between Meusemann et al. in 2010 (349) and the Misof study in 2014 (50). Both studies involved many of the same authors. Although the Misof study is short in print, one of their strongest contributions is the 200 pages of supplementary materials, which include recommendations for careful assembly4, orthology prediction4, data masking4, and signal optimization. Protein domains were considered as partitions4, and site-specific rate models were developed. Diplura was sister to Insecta, in agreement with Letsch and S. Simon (355), despite the recovery of Entognatha in other large datasets (347)(349). The 2014 Misof et al. study was the first of the publications from the 1KITE initiative, which has now collected transcriptomes from over 1400 taxa. Subprojects in the works from 1KITE include large datasets targeted at “Basal hexapods”, Odonata, Polyneoptera, Paraneoptera, Hymenoptera, Coleoptera, Neuropterida, Trichoptera, Lepidoptera, and Amphiesmenoptera, along with over 100 side projects dealing with molecular evolution in insects. These side projects put insect phylogenomics at the forefront of the discovery of character systems based on genomic meta-characters, and their impact reaches beyond entomology with the development of new phylogenomic approaches. The Misof et al. study has already had a visible impact on phylogenetics of higher order groups by facilitating target enrichment and providing open access data.
High throughput sequencing: A variety of methods that generate huge amounts of DNA sequence data. These methods were called “next-generation” during the Sanger sequencing period.
Transcriptome: All the genes that are active in a cell, tissue, or whole organism at the time a specimen is collected. These genes are sequenced by isolating messenger RNA from the cells.
Assembly: High throughput sequencing typically produces millions of fragments of relatively short sequences. These sequences are aligned, and stitched together into longer fragments called “contigs”, which are then sorted into orthologs.
Orthology prediction: Orthologs are homologous genes that diverged through speciation. Gene duplication produces similar-looking copies (paralogs) that do not track the history of the species and can disrupt phylogenetic analyses. Orthology prediction is the process of identifying orthologs, and distinguishing them from paralogs.
Data masking: Identifying “bad” data, and throwing it out. Problematic data comes from a variety of sources. Data can be misaligned, randomized, mostly missing, or it can violate model assumptions.
Partitions: Subsets of the data. Since models are used to analyze data, it is common to subdivide the data into subsets that are assumed to share similar properties so that the models are a closer match to biological reality.
Quartet mapping: A method of branch support that examines a particular node by sampling replicates of 4 randomly selected taxa that surround the node of interest. The results summarize the percentage of times that these quartets recover alternative topologies.
Pipeline: A sequence of analytical procedures. In computer science, a pipeline is a chain of computer programs that perform analytical steps in series.
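The glossary entries for pipelines and data masking can be illustrated with a toy example. The sketch below is not the procedure used by Aliscore, MARE, or any study cited here; it simply chains two hypothetical steps, one that drops alignment columns in which most taxa lack data and one that drops taxa left with no data at all. The function names, the 0.5 threshold, the choice of "-" and "?" as missing-data symbols, and the four short sequences are all assumptions made for illustration.

```python
from functools import reduce

# Symbols treated as missing data in this sketch (an assumption).
MISSING = set("-?")


def mask_sparse_columns(alignment, max_missing=0.5):
    """Drop columns in which more than max_missing of the taxa lack data."""
    names = list(alignment)
    columns = list(zip(*(alignment[name] for name in names)))
    kept = [col for col in columns
            if sum(ch in MISSING for ch in col) / len(col) <= max_missing]
    return {name: "".join(col[i] for col in kept)
            for i, name in enumerate(names)}


def drop_empty_taxa(alignment):
    """Remove taxa whose sequence contains only missing-data symbols."""
    return {name: seq for name, seq in alignment.items()
            if any(ch not in MISSING for ch in seq)}


def run_pipeline(data, steps):
    """Apply each step in order, feeding the output of one into the next."""
    return reduce(lambda current, step: step(current), steps, data)


# Hypothetical toy alignment: four taxa, six aligned positions.
toy_alignment = {
    "taxon_A": "ACG-TA",
    "taxon_B": "AC--TA",
    "taxon_C": "-C-?TG",
    "taxon_D": "---?--",
}

cleaned = run_pipeline(toy_alignment, [mask_sparse_columns, drop_empty_taxa])
for name, seq in cleaned.items():
    print(name, seq)
```

Keeping each step as a separate function makes every filtering decision explicit and easy to inspect, which matters because masking choices can themselves influence the resulting tree.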
Innovative approaches such as micro-computed tomography (356) and computer-based reconstruction (e.g., (357)), an optimized combined application of different techniques, and the concept of evolutionary morphology (358) have led to a remarkable renaissance in insect morphology in the last two decades, especially in Europe and Japan. Recent years have been characterized by matrices of increasing size and a distinctly improved documentation of the characters, made possible by the use of a broad array of techniques and an optimized workflow (359). The Bayesian results of the largest morphological character state matrix used in insect systematics up to that time (48) were fully compatible with transcriptomic studies (49)(50), indicating that morphology can still play a role in estimating and corroborating molecular phylogenetics.
Many examples of the value of integrated phylogenetics come from Misof et al. (50). Perhaps the most unusual result from Misof et al. was the grouping of Psocodea with Holometabola, but the possibility that model misspecification had influenced this placement could not be eliminated (50). Morphological data provide another reason to be skeptical (152). Zoraptera was not strongly placed in their study either (50), but it was reliably placed in a monophyletic Polyneoptera, as also suggested by recent morphological and embryological studies (360)(79). Despite recent progress, it is obvious that morphology has its limitations. Even large data sets of high quality create partially unsatisfying results, sometimes despite impressive lists of shared derived character states (synapomorphies) for presumptive clades, as in (360). Artifactual synapomorphies created by phylogenetic analyses – i.e. “cladistic noise” – often suggest results that are in fact insufficiently supported or not supported at all by any convincing features. An example is the “clade” Dictyoptera + (Zoraptera + Plecoptera) supported by a recent parsimony analysis of morphological data by Matsumura and colleagues (2015). As the authors pointed out, some of the obtained presumptive synapomorphies were obviously the result of misleading redundant evolution. Correlated characters can also cause artifacts in morphology-based phylogenetic reconstructions as addressed (361) in the context of the Palaeoptera problem (362)(342). Solutions were suggested, based on modified weighting or the exclusion of characters.
The Closest Relatives of Hexapods. A crucial question was apparently inaccessible to morphological approaches but was largely solved with molecular data: the systematic position of Hexapoda. A monophyletic Tracheata (Myriapoda + Hexapoda) was taken for granted in morphology-based studies, with Hexapoda either placed as sistergroup of Myriapoda (millipedes and centipedes), or as the sister taxon of a myriapod subgroup (363)(364). Molecular data sets of different size and composition, analyzed with different approaches, consistently yielded a clade Pancrustacea (also called Tetraconata), usually with hexapods placed among paraphyletic crustacean lineages (50)(257)(347)(349)(346)(366)(367)(368). Even though some morphological arguments for Pancrustacea have been presented (369)(370), a formal character analysis is still lacking and the morphological evidence is far from convincing. Possible candidates for closest relatives of Hexapoda within Pancrustacea include the highly specialized relict group Remipedia, Malacostraca, and possibly the miniaturized Cephalocarida (50)(347), although other studies contradict this (117)(371). The tremendous morphological gap between these aquatic groups and the terrestrial hexapods hinders meaningful comparisons of morphological characters and hypotheses of homology. It was pointed out by Klass and Kristensen (73) that the monophyly of Hexapoda is not strongly supported morphologically, with basically only one character complex defining it: the regional specialization of the body into head, thorax and abdomen, with the thorax divided into 3 segments and an 11-segmented abdomen. However, as shown in Beutel et al. (372) the Pancrustacea concept has strong implications for the hexapod groundplan.
The strongly supported placement of Hexapoda among crustacean groups implies an entire series of additional hexapod autapomorphies, such as terrestrial habits, simplified walking legs, fusion of the 2nd maxillae (labium), the loss of the ventral food rim, the absence of midgut glands and nephridial organs, and others.
Confidence and caution
Figure 5b shows an exponential growth in the size of datasets since the late 1980s, and we expect this growth to continue. It is tempting to think that every part of insect phylogeny has now been resolved with large datasets. Almost every node has strong bootstrap support. But bootstrap support was designed to evaluate stochasticity and with large datasets, stochasticity is reduced or even eliminated. This would be considered a good thing, if models and assumptions were perfect. However, because of the size of current datasets, small biases in the data, or misspecifications of the model can result in strong bootstrap support for error. We predict that model refinement will be a rich source of discovery in the future. For example, the failure to resolve the Palaeoptera problem, as indicated by quartet mapping4 (50), may point toward a true case of incomplete lineage sorting. Many of our difficult to resolve nodes may have complex and conflicting gene tree histories. However, we are reluctant to assume this biological explanation without appropriate analysis. Such a comfortable explanation for misbehaved data, or inappropriate models can make analytical failures seem like new discoveries. Every caution available at the time was applied by Misof et al. (50), and we find no reason to doubt their results. Moreover, their cautions are reflected in the expansive supplementary materials (50). We would like to emphasize, that phylogenies can never be more than hypotheses, subject to the limitations of models and assumptions. This statement is obvious, but bears repeating in the light of enormous datasets that are now available.
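The point that large datasets can yield strong bootstrap support for an error can be illustrated with a deliberately crude simulation. The sketch below is a toy model and not a phylogenetic analysis: it assumes that each site independently "votes" for one of two trees, that a small systematic bias makes 55% of sites favour the wrong tree, and that the tree with the majority of votes wins. The bias value, the dataset sizes, the number of replicates, and all function names are hypothetical.

```python
import random


def simulate_site_votes(n_sites, p_wrong, rng):
    """Return 1 for each site favouring the wrong tree, 0 otherwise."""
    return [1 if rng.random() < p_wrong else 0 for _ in range(n_sites)]


def bootstrap_support_for_wrong_tree(votes, replicates, rng):
    """Fraction of bootstrap replicates in which the wrong tree wins.

    Each replicate resamples sites with replacement and lets the
    majority of resampled sites decide between the two trees.
    """
    n = len(votes)
    wins = 0
    for _ in range(replicates):
        resampled = rng.choices(votes, k=n)
        if 2 * sum(resampled) > n:
            wins += 1
    return wins / replicates


rng = random.Random(1)   # fixed seed so the run is repeatable
P_WRONG = 0.55           # hypothetical systematic bias toward the wrong tree

support = {}
for n_sites in (20, 200, 20000):
    votes = simulate_site_votes(n_sites, P_WRONG, rng)
    support[n_sites] = bootstrap_support_for_wrong_tree(votes, 100, rng)
    print(f"{n_sites:>6} sites: bootstrap support for wrong tree = "
          f"{support[n_sites]:.2f}")
```

In this toy setting the stochastic scatter shrinks as sites are added while the bias stays the same, so the support for the wrong tree approaches 100% in the largest dataset. The bootstrap is measuring sampling variance, not the correctness of the model.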
With every advance, from cladistics, to Sanger sequencing, and now genomics, we saw a wave of overconfidence that intractable conflicts would be solved, only to learn of new obstacles. We are only beginning to understand the behavior of large datasets, which is why we cannot write about these innovations from a historical perspective. The discipline will continue to improve. We still look for confirmation and congruence from other sources of data, such as morphology, and rare genomic events.
Both morphological and molecular investigations focused on insect systematics have made tremendous progress in the last decade. Similarly, the investigation of extinct insects has increased its pace with advanced morphological techniques allowing a stunningly detailed reconstruction of amber fossils. Although improvements could still be made in communication across different lines of investigation, it is unlikely that we will see many major revolutions in deep insect phylogeny, outside the groups we have flagged as unresolved. The disagreement over parsimony vs. likelihood has been resolved. With genomics, we will likely continue to learn about the function of genes and links between developmental and phylogenetic processes and how these processes change over time and across lineages. Optimizing pipelines4 for processing and connecting different sources of evidence is presently a key target for the future. Such pipelines are one of the main aims of 1KITE and associated projects. Continued integration of different disciplines will likely lead to a much better understanding of the complex evolution of insects, revealing why this group of organisms reached unparalleled species diversity and successfully conquered virtually every terrestrial and freshwater environment on Earth.
We thank Nicole Tam for preparing the illustrations. Phil Ward, Marek Borowiec, Harald Letsch, Charles Mitter, John Huelsenbeck, Bjorn v. Reumont, Duane McKenna, Günther Pass, Alex Blanke, Bernhard Misof, and Sabrina Simon, provided helpful comments. John Morse provided valuable insights about his advisor, Herbert Ross. KMK thanks the Schlinger endowment for funding.
We have no competing interests.
Fig. 1. Phylogeny modified from Börner 1904. Taxa are named by modern convention.
Fig. 2. Hennig’s 1969 phylogeny (11), combined and modified from the original figures. Numerals indicate fossils as Hennig listed in his figures: 1. Rhyniella; 2. Eopterum (no longer considered an insect); 3. Rhyniognatha; 4. Monura; 5. Triassomachilis; 6. Triplosoba pulchella; 7. Permoplecoptera; 8. Alleged subgroups of Ephemeroptera; 9. Erasipteron; 10. Protodonata (Meganisoptera); 11. Protanisoptera; 12. Protozygoptera; 13. Stemgroup of Anisozygoptera+Anisoptera; 14. Sheimia sojanensis; 15. Protoelytroptera; 16. Mesoforficula and others; 17. Puknoblattina; 18. Paleozoic “Problattoidea” and Blattodea; 19. Oedischia; 20. Glosselytrodea; 21. Sthenaropodidae; 22. Oedischiidae, Elcanidae; 23. Tettavus; 24. Triassolocusta; 25. Tcholmanvissia; 26. “Paraplecoptera” sensu Sharov (now Eoblattida Handlirsch 1906); 27. Protoperlaria (now Protorthoptera Handlirsch 1906); 28. Perlopsis and other definitive Plecoptera; 29. Permopsocodea; 30. Procicadellopsis; 31. Archipsyllidae; 32. Permothrips longipennis; 33. Permaphidopsis; 34. Mesococcus asiaticus; 35. Archescytinidae; 36. Cicadopsyllidae; 37. Permaleurodes rotundatus; 38. Auchenorrhyncha; 39. Paraknightia; 40. Boreocixius; 41. Permosialis; 42. Palaeohemerobiidae and Permithonidae, sensu Carpenter; 43. Tshekardocoleus and other branches; 44. Archexyela; 45. Mecoptera from Australia; 46., 47. Paratrichoptera; 48. Microptysma; 49. Microptysmodes.
Fig. 3. Modified from Wille 1960 (31). Taxa are named by modern convention.
Fig. 4. Main image: Electron micrograph of a male Stylops ovinae (Strepsiptera). All insects have a 3-segmented thorax, with a pair of legs on each segment. Wings, when present, are found on the second and third segments. Strepsipterans have reduced forewings modified as sense organs (arrows) attached to the small middle thoracic segment. Their anterior thoracic segment is greatly reduced. Flies have similarly reduced hindwings, attached to the 3rd thoracic segment. The 3rd thoracic segments of strepsipterans and beetles are highly expanded, containing the functional wings and associated flight muscles. Image copyright Hans Pohl, used with permission. Inset: Wikipedia, Creative Commons.
Fig. 5. Current consensus, modified from Misof et al.(50). Previous studies mentioned in this review are numbered and color coded on the left, with nodes they supported on the right. Red; morphology, without formalized data matrices. Orange: Morphology, with computer analysis. Blue: Sanger sequenced data in which rRNA played a predominant role. Black: Sanger sequenced multiple nuclear protein-coding genes. Green: Large genomic or transcriptomic data. Panel B. The sizes of datasets, plotted through time. The Y axis is on a log scale. Colors as in Panel A. Data size is calculated by multiplying the number of taxa by the number of characters. For works where amino acids were used as characters, we multiplied the number of characters by 3, so that these datasets were comparable to nucleotide datasets. Transcriptome work often has many missing data, so that character numbers were multiplied by the proportion of data present.
Table 1. Names of higher taxa used in the text, and the groups they define. Other groups are indicated on Fig. 5. Taxa in bold are supported in the current consensus. Taxa in italics have been strongly rejected. Those without indication are (in the opinion of these authors) targets for additional attention. Citations are numbered as in the literature cited section, with the name of the first author and the last two digits of the publication year, for quick reference: (9)=Börner 04; (10)=Hennig 53; (11)=Hennig 69; (14)=Crampton 38; (31)=Wille 60; (34)=Mickoleit 73; (39)=Martynov 25; (47)=Wiegmann 09; (48)=Beutel 11; (49)=Peters 14; (50)=Misof 14; (62)=Kristensen 75; (63)=Kristensen 81; (69)=Boudreaux 79; (72)=Beutel 01; (74)=Wheeler 01; (76)=Wheeler 93; (77)=Whiting 97; (100)=Wipfler 11; (101)=Blanke 12; (107)=Kjer 04; (108)=Field 88; (110)=Turbeville 91; (114)=Wheeler 89; (116)=Pashley 93; (117)=Reumont 09; (140)=Edgecombe 00; (141)=Kjer 06; (152)=Beutel 06; (154)=McKenna 10; (159)=Chalwatzis 96; (170)=Ishiwata 11; (171)=Huelsenbeck 98; (175)=Niehuis 12; (177)=Boussau 14; (204)=Terry 05; (232)=Friedrich 95; (239)=Misof 07; (242)=Luan 05; (243)=Giribet 04; (244)=Gao 08; (245)=Mallatt 09; (248)=Carapelli 00; (249)=Krauss 04; (251)=Bonneton 06; (256)=Boore 95; (281)=Wan 12; (276)=Chen 14; (308)=Cryan 12; (319)=Aspöck 02; (434)=Sasaki 13; (347)=Regier 10; (349)=Meusemann 10; (351)=Simon 12; (355)=Letsch 13; (367)=Cook 01; (373)=Blanke 14; (374)=Beier 69; (375)=Hadrys 12; (376)=Letsch 12; (377)=Savard 06; (378)=Staniczek 00; (379)=Bitsch 04; (380)=Kristensen 97; (381)=Bitsch 00; (382)=Shao 99; (383)=Giribet 01; (384)=Mallatt 06; (385)=Dell’Ampio 09; (386)=Hovmoller 02; (387)=Pisani 04; (388)=Regier 01; (389)=Rota-Stabelli; (390)=Seeger 79; (391)=Blanke 15.
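The dataset size measure described in the caption of Fig. 5 (Panel B) can be written as a short function. This is a minimal sketch of the rule as stated there: taxa multiplied by characters, amino acid characters counted three times, and character numbers scaled by the proportion of data present. The function name, argument names, and the example numbers are hypothetical and are not values from any study in the figure.

```python
def dataset_size(n_taxa, n_characters, amino_acids=False,
                 proportion_present=1.0):
    """Dataset size in nucleotide-equivalent cells.

    n_taxa             -- number of taxa in the matrix
    n_characters       -- number of aligned characters (columns)
    amino_acids        -- True if characters are amino acids; each is
                          counted as 3 nucleotide-equivalents
    proportion_present -- fraction of the matrix that is not missing
    """
    if not 0.0 <= proportion_present <= 1.0:
        raise ValueError("proportion_present must be between 0 and 1")
    per_character = 3 if amino_acids else 1
    return n_taxa * n_characters * per_character * proportion_present


# Hypothetical examples, for illustration only.
print(dataset_size(10, 100))                       # nucleotides, complete
print(dataset_size(10, 100, amino_acids=True,
                   proportion_present=0.5))        # amino acids, half missing
```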