Software and Databases

My research has required me to develop and curate a number of databases and to write software addressing the questions I am interested in. I have done extensive database development in MS Access, FileMaker, and MySQL. I write code using Emacs, ConTEXT, and MS Visual Studio.NET. The languages I use most are Perl, Visual Basic, SQL, and R/GNU-S. I use C/C++ on occasion, but I avoid it as much as possible.

Program: RepMiner: Transposable Element Data Mining Software
Languages: Perl, MySQL
Info: SourceForge Project Website
Last Updated: October, 2009

The RepMiner package takes a graph theory approach to the taxonomic assignment of DNA sequences, with a focus on the biology of transposable elements. RepMiner uses transposable elements identified in model species to map the locations of putative transposable elements onto homology-based networks derived from comparing the sequences of the target genome against themselves. This package is currently under heavy development.
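To make the network idea concrete, here is a minimal Python sketch of one simple way such a homology graph could be built and labeled. It is not RepMiner's code. It assumes the self-comparison has already been reduced to (query, subject) hit pairs, treats each pair as an undirected edge, groups sequences into connected components with union-find, and then gives each component the majority label among members already classified from model-species elements. The function names `build_clusters` and `label_clusters` and the input shapes are hypothetical.

```python
from collections import Counter, defaultdict


def build_clusters(hits):
    """Group sequence IDs into connected components of a homology graph.

    `hits` is an iterable of (query_id, subject_id) pairs, assumed to come from
    an all-against-all similarity search of a genome's sequences against
    themselves. Each pair is treated as an undirected edge.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for query, subject in hits:
        union(query, subject)  # a self-hit simply registers the node

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def label_clusters(clusters, known_labels):
    """Give every member of a cluster the most common known label in it.

    `known_labels` maps sequence IDs to transposable element classes taken
    from model-species annotations (hypothetical input). Clusters with no
    labeled member are reported as "unassigned".
    """
    assignments = {}
    for cluster in clusters:
        votes = Counter(known_labels[s] for s in cluster if s in known_labels)
        label = votes.most_common(1)[0][0] if votes else "unassigned"
        for seq_id in cluster:
            assignments[seq_id] = label
    return assignments
```

For example, with hits `[("s1", "s2"), ("s2", "s3"), ("s4", "s5")]` and `s1` known to be a Gypsy element, `s2` and `s3` inherit the Gypsy label. Connected components are the simplest way to turn pairwise hits into families; a real analysis would also weigh alignment scores and coverage, which this sketch ignores.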

Program: DAWGPAWS: Genome Annotation Pipeline
Languages: Perl
Info: SourceForge Project Website
Last Updated: June, 2009

DAWG-PAWS is a suite of Perl scripts designed to assist a Distributed Annotation Working Group (DAWG) in annotating BAC-sized contigs. Because the suite was originally written to annotate randomly sampled wheat BACs, it is referred to as a Pipeline to Annotate Wheat Sequences (PAWS). Although these programs were initially designed for wheat, the scripts can be applied to nearly any eukaryotic sequence annotation pipeline.

Program: BACMan: BAC Data Management
Languages: MS Access, MySQL, Visual Basic for Applications and Visual Basic.NET
Info: SourceForge Project Website
Last Updated: August, 2005

Bacterial Artificial Chromosome Data Management (BACMan) is a Microsoft Access-based application for managing and analyzing hybridization data from the high-throughput screening of large-insert genomic libraries associated with physical mapping projects.

Program: BACGrid
Languages: MS Word, MS Excel, Visual Basic for Applications
Last Updated: January, 2002

BACGrid is an automated system for scoring dot-blot films produced by hybridization screening of large-insert genomic libraries. The software is no longer under active development, but anyone is welcome to take over the package. It is a great free alternative to optical character recognition (OCR) software.

Program: BACEater
Languages: Visual Basic.NET
Last Updated: March, 2004

BACEater converts scoring-grid data from the ABBYY FineReader optical character recognition program into the format required by the BACMan package. It is implemented as a standalone executable.

Languages: MS Access, Visual Basic for Applications
Last Updated: March, 2006

A DNA sequence database for managing small to medium-sized sequence collections. It is suitable for beginners in bioinformatics and includes useful functions such as a parser for FASTA-formatted text files and conversion between UNIX and DOS line-ending formats. It also includes the Big Dawg BLAST program, which provides a graphical user interface to BLAST.
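The FASTA parsing and UNIX/DOS conversion mentioned above are small, well-defined tasks. The database's own functions are written in VBA; the following is only a Python sketch of the same two ideas, not a port of that code. The function names are hypothetical. The parser treats each `>` line as a new record header and joins the following lines into one sequence. The converters swap between DOS line endings (CR LF) and UNIX line endings (LF).

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    Lines beginning with '>' start a new record; the remaining lines of the
    record are concatenated into one sequence string. Blank lines are skipped.
    str.splitlines() accepts both UNIX and DOS line endings.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def dos_to_unix(text):
    """Convert DOS/Windows line endings (CR LF) to UNIX line endings (LF)."""
    return text.replace("\r\n", "\n")


def unix_to_dos(text):
    """Convert UNIX line endings (LF) to DOS/Windows line endings (CR LF)."""
    # Normalize first so existing CR LF pairs are not turned into CR CR LF.
    return dos_to_unix(text).replace("\n", "\r\n")
```

Normalizing before converting to DOS makes `unix_to_dos` safe to run on a file whose line endings are already mixed or already DOS-style.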

Program: OligoMan
Languages: MS Access, Visual Basic for Applications
Last Updated: March, 2005

OligoMan is a database for managing data on synthesized oligonucleotides. It was used at the Plant Genome Mapping Lab (PGML) to track nearly 30,000 oligos designed as PCR primers and overgos.

Program: My Genome Collection with Assigned Taxonomy (MyGCAT)
Languages: Perl, MySQL
Info: SourceForge Project Website
Last Updated: August, 2005

MyGCAT manages extremely large sequence databases in MySQL. The MySQL database is accessed through a set of Perl scripts that automate updates from NCBI GenBank database files and maintain local proprietary sequence databases. Every sequence file is taxonomically referenced, and an HTML interface allows the creation of taxon-specific BLAST databases.
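Building a taxon-specific BLAST database comes down to selecting every sequence whose lineage passes through the chosen taxon and writing those sequences out as FASTA. The Python sketch below illustrates that selection step under stated assumptions; it is not MyGCAT's implementation, which uses Perl and MySQL. It assumes each record is a hypothetical `(seq_id, taxid, sequence)` tuple. It also assumes the taxonomy is a child-to-parent map like the one in NCBI's `nodes.dmp`, where the root points to itself. The taxon names in any example are illustrative placeholders, not real taxonomy IDs.

```python
def lineage(taxid, parent_of):
    """Return the taxon IDs from `taxid` up to the root (inclusive)."""
    path = [taxid]
    while taxid in parent_of and parent_of[taxid] != taxid:
        taxid = parent_of[taxid]
        path.append(taxid)
    return path


def taxon_subset_fasta(records, target_taxid, parent_of, width=60):
    """Return FASTA text for records whose lineage includes `target_taxid`.

    `records` is an iterable of (seq_id, taxid, sequence) tuples and
    `parent_of` maps each taxon ID to its parent (the root maps to itself).
    Sequence lines are wrapped at `width` characters.
    """
    lines = []
    for seq_id, taxid, sequence in records:
        if target_taxid in lineage(taxid, parent_of):
            lines.append(f">{seq_id}")
            lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + ("\n" if lines else "")
```

The resulting FASTA text could then be formatted as a BLAST database with NCBI BLAST+'s `makeblastdb` (or the older `formatdb`). In a SQL-backed system the same lineage test would more likely be precomputed or pushed into the query than walked per record as done here.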

Program: JPerl: Jamie's Perl Scripts for Bioinformatics
Info: Project Home, Downloads, Source, Project Wiki
Languages: Perl
Last Updated: April, 2007

Various Perl scripts that I have written for processing bioinformatics data, along with general Perl subroutines that I use. Some of these scripts rely on the BioPerl library or have other dependencies, which are listed on the project pages. This set of scripts is hosted by

Program: JaRchive: Jamie's R Script Archive
Languages: R Statistical Programming Language
Info: Downloads, Source, Wiki
Last Updated: April, 2007

A set of scripts that I have written for the R statistical programming language. This is not an R package; these are scripts that rely on other packages for the heavy lifting.

Program: Visual Basic Functions For Bioinformatics
Languages: Visual Basic for Applications & Visual Basic.NET
Last Updated: February, 2005

A collection of Visual Basic functions that I have written for bioinformatics. Many of them are incorporated into my MS Access applications, but I have listed them separately here for others to use in their own modules.

Database: LabMan: PGML Lab Management Database
Languages: MS Access, Visual Basic for Applications, gdCom
Last Updated: December, 2004

LabMan manages the inventory of chemicals, lab supplies, and materials used at the Plant Genome Mapping Lab. Barcode labels placed on drawers and cabinets throughout the lab track where items are stored. The database integrates inventory, ordering, and pipette management. Early editions were developed in FileMaker for cross-compatibility with both Macs and PCs; current development has been in MS Access. Some of the Access code for the ordering database was written by Hash Lala.

Database: Southeastern Endemics Database
Languages: MS Access, Visual Basic for Applications, Perl, ArcView 3.x, ArcView Spatial Analyst, Arc Avenue, 3DEM
Last Updated: May, 2004

This database of rare plants endemic to the Southeastern United States was compiled to test for biogeographic and taxonomic selectivity in their distribution. The data resulting from this work have been published in Castanea [1].

Author: James Estill
Last Updated: October 15, 2009

The content and opinions expressed on this web page do not necessarily reflect the views of nor are they endorsed by the University of Georgia or the University System of Georgia.