Articles > Mathematics > Bioinformatics > Beginning Perl for Bioinformatics by J. Tisdall

Beginning Perl for Bioinformatics by J. Tisdall

Beginning Perl for Bioinformatics 



  • James Tisdall


  • Paperback: 400 pages
  • Publisher: O’Reilly Media; 1st edition (October 15, 2001)
  • Language: English
  • ISBN-10: 0596000804
  • ISBN-13: 978-0596000806
  • Product Dimensions: 9.2 x 7.1 x 0.9 inches
  • Shipping Weight: 1.32 pounds 


Book Description

With its highly developed capacity to detect patterns in data, Perl has become one of the most popular languages for biological data analysis. But if you’re a biologist with little or no programming experience, starting out in Perl can be a challenge. Many biologists have a difficult time learning how to apply the language to bioinformatics. The most popular Perl programming books are often too theoretical and too focused on computer science for a non-programming biologist who needs to solve very specific problems. Beginning Perl for Bioinformatics is designed to get you quickly over the Perl language barrier by approaching programming as an important new laboratory skill, revealing Perl programs and techniques that are immediately useful in the lab. Each chapter focuses on solving a particular bioinformatics problem or class of problems, starting with the simplest and increasing in complexity as the book progresses. Each chapter includes programming exercises and teaches bioinformatics by showing and modifying programs that deal with various kinds of practical biological problems. By the end of the book you’ll have a solid understanding of Perl basics, a collection of programs for such tasks as parsing BLAST and GenBank, and the skills to take on more advanced bioinformatics programming. Some of the later chapters focus in greater detail on specific bioinformatics topics. This book is suitable for use as a classroom textbook, for self-study, and as a reference. The book covers:

  • Programming basics and working with DNA sequences and strings
  • Debugging your code
  • Simulating gene mutations using random number generators
  • Regular expressions and finding motifs in data
  • Arrays, hashes, and relational databases
  • Regular expressions and restriction maps
  • Using Perl to parse PDB records, annotations in GenBank, and BLAST output 

Book Info

Designed to get you quickly over the Perl language barrier by approaching programming as an important new laboratory skill revealing Perl programs and techniques that are immediately useful in the lab. Softcover.



Very timely introduction to PERL, November 10, 2001

Finally someone has written a beginning book on PERL for biologists, and has also done an excellent job of doing so. This book assumes no prior programming experience, and therefore suits the biologist who needs to concentrate on using computers to solve biological problems, and not have to become a computer scientist in the process. PERL can be a very cryptic language, but it is also extremely concise, and PERL programmers frequently and rightfully boast about their "one-liners" that accomplish complicated tasks with only one line of code.

Since it is addressed to readers with no programming experience, the author introduces some elementary concepts of programming in the first three chapters. These include what text editor to use, how to install PERL, how run PERL programs, and other relevant elementary topics.

The author then gets down to writing a program to store a DNA sequence in chapter 4. Very basic, it merely reads in a string and prints it out, but serves to start readers on their way to developing more useful programs. Later a program for the transcription of DNA to RNA is given, which illustrates nicely the binding, substitution and trace operators. Block diagrams are used here, and throughout the book, to illustrate basic PERL operators. The author shows in detail how to read protein sequence data from a file and how to use it in a PERL program. The reader is also introduced to the most ubiquitous data structure in all of computing: the array. Already the reader gets a taste of the power of PERL to manipulate arrays, using operations such as ‘unshift’, ‘push’, ‘splice’, etc.

The next chapter introduces conditional statements in PERL, as a warm-up for the discussion on finding motifs in sequences. The reader can see why PERL is the language of choice in bioinformatics, with its ability to find substrings or patterns in strings. Things do become more cryptic in the discussion of regular expressions, but the reader can get through it with some effort. Interesting programs are given for determining the frequency of nucleotides.

Since the programs have become more complicated to this point, a discussion of subroutines follows in the next chapter. And, for the same reason, the reader is introduced to debugging in PERL in this chapter also. The greater the complexity of the program, the harder it becomes to avoid making mistakes, and even more difficult to find them. The very important concepts of pass by value versus pass be reference are discussed briefly in this chapter.

Random number generators, so important in any consideration of mutations, are discussed in chapter 7. It is shown, via some straightforward programs, how to select a random location in DNA and mutate it with some other nucleotide. In addition, the author shows how to use random numbers to generate DNA sequences and mutate them in order to study the effect of mutations over time.

The next chapter is the most interesting in the book, for it shows how PERL can be used to simulate how the genetic code directs the translation of DNA into protein, the hash data structure being used extensively for this purpose. The author shows how to read DNA from files in FASTA format, and discusses in detail reading frames. He gives a useful subroutine to translate reading frames.

The author returns to regular expressions in chapter 9, wherein they are used as ‘wildcards’ to search for a particular string in a collection of strings. In addition, the range operator is used to find restriction sites. Regular expressions are also used in the next chapter to manipulate GenBank ‘flat files’. The author does however give URLs for more sophisticated bioinformatics software. This is followed in chapter 11 by a discussion of the use of PERL to work with files in the Protein Data Bank. Recursion, one of the most powerful techniques in programming, is introduced here.

Chapter 12 covers the Basic Local Alignment Search Tool (BLAST), wherein readers get a taste of the field of computational biology. This extremely popular software package is used to find similarity between a given sequence and a library of known sequences. The author does discuss some of the basic rudiments of string matching and homology, and encourages the reader to consult the BLAST documentation for further details. In addition, the author briefly discusses the Bioperl project in this chapter, and shows the reader how to run some elementary computations using it.

This book definitely is a timely one and it will serve the needs of biologists who need to obtain some programming expertise in PERL. There are helpful exercises at the end of each chapter that serve to solidify the understanding of the concepts introduced in the chapter. After a thorough study of it, readers will be well-equipped to use PERL in bioinformatics. With more mathematical background, readers after finishing it will be able to enter the exciting field of computational biology, a field that is exploding, and one in which will require imaginative programming skill in the future.