Now: Tutorial for Web and Software Design > Programming > Perl > Programming Content
> Data Munging for Non-Programming Biologists [Bookmark it]
Data Munging for Non-Programming Biologists

Data Munging for Non-Programming Biologists
by Amir Karger, Eitan Rubin |

Filling the Niche: The Scriptome and Other Solutions

One of my greatest concerns in talking to people about biologists' data munging is that people don't even realize that there's a problem, or they think it's already been solved. Biologists--who happily pipette things over and over and over again--don't realize that computers could save them lots of time. Too many programmers figure that anyone who needs to can just read Learning Perl. I'm all for that, of course, but experimental biologists need to spend much more of their time getting data (dissecting bee brains, say) than analyzing it, so they can't afford the time it takes to become programmers. They shouldn't have to. Does the average biologist need multiple inheritance, getprotobyname(), and negative look-behind regexes? There's a large body of problems out there that are too diverse for simple, inflexible tools to handle, but are too simple to need full-fledged programming.

How about teaching a three-hour course with just enough Perl to munge simple data? At minimum, it should teach variables, arrays, hashes, regular expressions, and control structures--and then there's syntax. "Wait, what's the difference between @{$a[$a]} and @a{$a[$a]} again?" "Oh, my, look at the time." As Damian Conway writes in "Seven Deadly Sins of Introductory Programming Language Design" (PDF link), syntax oddities often distract newbies from learning basic programming concepts. How much can you teach in three hours, and how much will they remember after a month without practicing?

Another route would be building a graphical program that can do everything a biologist would want, where pipelines are developed by dragging and dropping icons and connectors. Unfortunately, a comprehensive graphical environment requires a major programming effort to build, and to keep current. Not only that, but the interface for such a full-featured, graphical program will necessarily be complex, raising the learning barrier.

In building the Scriptome, we purposely narrowed our scope, to maximize learnability and memorability for occasional users. While teaching programming and graphical tools are effective solutions for some, I believe the Scriptome fills an empty niche in the data munging ecosphere (the greposphere?).

Creation Is Not Easy

How much progress have we made in addressing the problem space between tool use and programming? Our early reviews have been mostly positive, or at least constructive. Suzy, our first power user, started out skeptical, saying she'd probably have to learn Perl because any tools we gave her wouldn't be flexible enough. I encouraged her to use the Scriptome in parallel with learning Perl. She ended up a self-described "Scriptome monster," tweaking tool code and creating a 16-step protocol that did real bioinformatics. Still, one good review won't get you any Webby awards. Our first priority at this point is to build a user base and to get feedback on the learnability, memorability, and effectiveness of the website, with its 50 or so tools.

It will take more than just feedback to implement the myriad ideas we have for improving the Scriptome, which is why I'm here to make a bald-faced plea for your help. The project needs lots of new tools, new protocols, and possibly new interfaces. You, the Perl.com reader, can certainly write code for new tools; the real question is whether you (unlike certain, unnamed CPAN contributors) can also write good documentation and examples, or find bugs in early versions of tools. We would also love to get relevant protocol ideas. Check out the Scriptome project page and send something to me or the scriptome-users mailing list.

Here's a little challenge. I really did have a client who renamed 768 files by hand before I could Perl it for him. Can you write a generic renaming atom that a NPB could use? (Hint: "Tell the user to learn regular expressions" is not a valid solution.) The winner will receive a commemorative plaque (<bgcolor="gold">) on the Scriptome website.

Speaking of new interfaces, one common concern we hear from programmers is that NPBs won't be able or willing to handle the command-line paradigm shift and the few commands needed (cd, more, dir/ls) to use the Scriptome. In case our users do tell us it's a problem, we're exploring a few different ways to wrap the Scriptome tools, such as:

  • A Firefox plugin that gives you a command line in a toolbar and displays your output file in the browser. (Currently being developed by Rob Miller and his group at MIT.)
  • An Excel VBA that lets you put command lines into a column, and creates a shell script out of it.
  • Wrapping the command-line tools in Pise (web forms around shell commands) or GenePattern (a more general GUI bio tool).

We'll probably try several of these avenues, because they allow us to keep using the command-line interface if desired.

As for the future, well, who says that only biologists are interested in munging tabular data? Certainly, chemists and astronomers could get into this. I set my sights even higher. How about a Scriptome for a business manager wanting to munge reports? An Apache Scriptome to munge your website's access logs? An iTunes Scriptome to manage your music? Let's give users the power to do what they want with their data.

Sorry, GUI Neanderthalis, but you can't adapt to today's data munging needs. Make room for Homo Scriptiens!

Prev  [1] [2] [3] 

[Bookmark][Print] [Close][To Top]
  • Prev Article-Programming:

  • Next Article-Programming:
  • Related Materias
    Using the For匛ach Stateme
    Crystal Reports Tutorial -
    Creating a Toolbar - Visua
    Using Office applications 
    Visual Basic Explorer - Tu
    Visual Basic Explorer - Tu
    Visual Basic Explorer - Si
    Visual Basic Explorer- Gam
    Visual Basic Explorer- Gam
    Visual Basic Explorer- Gam
    Topics
    Photoshop Tutorial
     

    Special Effect

      3D Effect
      Photoshop Articles
    Programming Tutorial
     

    C/C++ Tutorial

      Visual Basic
      C# Tutorial
    Database Tutorial
     

    MySQL Tutorial

      MS SQL Tutorial
      Oracle Tutorial
    Graphic Design Tutorial
     

    Coreldraw Tutorial

      Illustrator Tutorial
      3D Graphics Articles
    Webmaster Articles
     

    Domain Service

      Web Hosting
      Site Promotion
    Java Tutorial&Articles
     

    Java Servlets

      JavaEE Tutorial
     

    JavaBeans Tutorial

    XML Tutorial&Articles
     

    XML Style Tutorial

      AJAX Tutorial
      XML Mobile
    Flash Tutorial&Articles
     

    Flash Video

      Action Script
      Flash Articles
    OS Tutorial&Articles
     

    Linux Tutorial

      Symbian Tutorial
      MacOS Tutorial