Gossypium is a genus of flowering plants in the tribe Gossypieae of the mallow family, Malvaceae, from which cotton is harvested. It is native to tropical and subtropical regions of the Old and New Worlds. There are about 50 Gossypium species, making it the largest genus in the tribe Gossypieae, and new species continue to be discovered. The name of the genus is derived from the Arabic word goz, which refers to a soft substance. Cotton is the primary natural fibre used by modern humans. Where cotton is cultivated, it is also a major oilseed crop and a main source of protein for animal feed. Cotton is thus of great importance to agriculture, industry and trade, especially in tropical and subtropical countries in Africa, South America and Asia. Consequently, the genus Gossypium has long attracted the attention of scientists.

The origin of the genus Gossypium is dated to around 5–10 million years ago. Gossypium species are distributed in arid to semiarid regions of the tropics and subtropics. Generally shrubs or shrub-like plants, the species of this genus are extraordinarily diverse in morphology and adaptation, ranging from fire-adapted herbaceous perennials in Australia to trees in Mexico.

Cultivated cottons are perennial shrubs most often grown as annuals. Plants are 1–2 m tall in modern cropping systems, and sometimes taller in traditional multiannual cropping systems, which are now largely disappearing. The leaves are broad and lobed, with three to five (or rarely seven) lobes. The seeds are contained in a capsule called a 'boll', and each seed is surrounded by fibres of two types. These fibres are the commercially more valuable part of the plant, and they are separated from the seed by a process called ginning. At the first ginning, the longer fibres, called staples, are removed; these are twisted together into yarn for making thread and for weaving high-quality textiles.
At the second ginning, the shorter fibres, called 'linters', are removed, and these are woven into lower-quality textiles (including the eponymous lint). The commercial species of cotton are G. hirsutum (>90% of world production), G. barbadense (3–4%), and G. arboreum and G. herbaceum (together, 2%). Many varieties of cotton have been developed by selective breeding and hybridization of these species. Experiments are ongoing to cross-breed desirable traits of wild cotton species, such as resistance to insects and diseases and drought tolerance, into the principal commercial species. Cotton fibres occur naturally in white, brown, and green, and in mixtures of these colours.

Most wild cottons are diploid, but a group of five species from the Americas and the Pacific islands is tetraploid, apparently as the result of a single hybridization event around 1.5 to 2 million years ago. The tetraploid species are G. hirsutum, G. tomentosum, G. mustelinum, G. barbadense, and G. darwinii.

A public effort to sequence the cotton genome was initiated in 2007 by a consortium of public researchers. They agreed on a strategy to sequence the genome of cultivated, allotetraploid cotton. 'Allotetraploid' means that the genomes of these cotton species comprise two distinct subgenomes, referred to as At and Dt (the 't' for tetraploid, to distinguish them from the A and D genomes of the related diploid species). The strategy is first to sequence the D-genome relative of the allotetraploid cottons, G. raimondii, a wild South American cotton species (Peru, Ecuador), because its genome is smaller, owing mainly to less repetitive DNA (chiefly retrotransposons). It has nearly one-third the number of bases of tetraploid (AD) cotton, and each of its chromosomes is present only once, without a homeologous counterpart from a second subgenome. The A genome of G. arboreum, the 'Old World' cotton species (grown in India in particular), would be sequenced next. Its genome is roughly twice the size of that of G. raimondii.
Once both the A and D genome sequences are assembled, research can begin on sequencing the genomes of cultivated tetraploid cotton varieties. This strategy is born of necessity: if the tetraploid genome were sequenced without model diploid genomes, the euchromatic DNA sequences of the A and D subgenomes would co-assemble, while their repetitive elements would assemble independently into A and D sequences, respectively. There would then be no way to untangle the mixed AD sequences without comparing them to their diploid counterparts.

The public-sector effort continues with the goal of creating a high-quality draft genome sequence from reads generated by all sources. It has generated Sanger reads of BACs, fosmids, and plasmids, as well as 454 reads. These reads will be instrumental in assembling an initial draft of the D genome. In 2010, two companies, Monsanto and Illumina, completed enough Illumina sequencing to cover the D genome of G. raimondii about 50x. They announced they would donate their raw reads to the public, a public-relations gesture that earned them some recognition for sequencing the cotton genome. Once the D genome is assembled from all of this raw material, it will undoubtedly assist in the assembly of the AD genomes of cultivated cotton varieties, but much hard work remains.
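The "50x" figure is a fold coverage: the total number of sequenced bases divided by the size of the genome. The following minimal Python sketch illustrates that arithmetic. The function names, the 100-base read length and the 1 Gb genome size are hypothetical round numbers chosen for illustration; they are not figures from the Monsanto/Illumina data set or measured sizes of any cotton genome.

```python
import math


def fold_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Reads of a given length needed to reach a target depth (rounded up)."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example values (illustration only, not real cotton data).
READ_LENGTH = 100                # bases per read (assumed)
D_GENOME_SIZE = 1_000_000_000    # a hypothetical 1 Gb diploid genome

n = reads_needed(50, READ_LENGTH, D_GENOME_SIZE)
print(n, fold_coverage(n, READ_LENGTH, D_GENOME_SIZE))

# The text says the D genome has nearly one-third the bases of tetraploid (AD)
# cotton; spreading the same reads over a genome three times larger gives
# roughly one-third of the depth.
print(fold_coverage(n, READ_LENGTH, 3 * D_GENOME_SIZE))
```

This simple ratio gives only the average depth; in practice, coverage is uneven across a genome, and repetitive regions such as retrotransposons are harder to place than their depth alone would suggest.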