Fast Gene Sequence Search for Very Large Data File (Scripts)
Publisher's description
from Binlin Wu
A research fellow at Harvard asked me to write a program that searches for a gene sequence, such as 'TCC', and records the next 4 codes after each match. The data file was 14 GB. He had tried some MATLAB code, and the system either froze or kept running and never finished.
I first tested a loop method (V1.0). It turned out that it would take a month to finish the 14 GB of data on my 1.8 GHz Core 2 Duo/3 GB RAM PC. I then updated the program to use matrix operations instead of a loop. With that change it takes only 1.3 hours on my 1.8 GHz PC, or 40 minutes on my 2.33 GHz Core 2 Duo/2 GB RAM PC. It was faster than any of the code he had obtained in Python or other languages.
I have put the file here, and I hope it will be useful to people in the same situation.
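The idea described above (avoid a character-by-character loop, and never hold the whole file in memory) can be illustrated with a short sketch. This is not the author's MATLAB code: it is a Python illustration that swaps the matrix approach for chunked reading plus the built-in bytes.find. The function name find_followers, the 1 MiB chunk size, and the assumption that the file is one continuous run of single-byte bases with no line breaks or headers are all hypothetical choices made for the example.

```python
import io


def find_followers(stream, pattern=b"TCC", follow=4, chunk_size=1 << 20):
    """Yield the `follow` bytes found right after each occurrence of `pattern`.

    Assumes `stream` is a binary file-like object holding one continuous
    run of bases (no newlines). Matches too close to the end of the data
    to have `follow` bytes after them are skipped.
    """
    need = len(pattern) + follow  # bytes needed for one complete record
    tail = b""                    # bytes carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        # Start positions below `limit` have a complete record inside `buf`.
        limit = len(buf) - need + 1
        pos = buf.find(pattern)
        while pos != -1 and pos < limit:
            yield buf[pos + len(pattern):pos + need]
            pos = buf.find(pattern, pos + 1)  # step by 1 to allow overlaps
        # Carry the unprocessed end of the buffer into the next round so
        # that matches spanning a chunk boundary are still found once.
        tail = buf[max(limit, 0):]


# Small in-memory demonstration; a real run would pass open(path, "rb").
demo = io.BytesIO(b"AATCCGATTACATCCAAAATCC")
followers = list(find_followers(demo, chunk_size=5))
print(followers)
```

In the demonstration the final 'TCC' sits at the very end of the data, so it has no following codes and is not reported. Memory use stays close to the chunk size regardless of the file size, which is the property that matters for a 14 GB input.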
System Requirements: MATLAB 7.11 (R2010b)
Program Release Status: New Release
Program Install Support: Install and Uninstall