Friday, July 4th, 2008

Backing up your Mac on an external disk

Filed under: Science and Technology — Daniel Lemire @ 19:28

A couple of weeks ago, I needed to back up my MacBook Pro to an external disk (a FireWire G-Drive) because my hard drive was failing. I started shopping for a good backup solution, but none of the tools I found had all of the following features:

  • support for incremental backups: if a change is made, you only back up the files that differ;
  • adequate handling of IO errors (no all-out abort);
  • inexpensive.

Indeed, I tried two different tools, but they refused to back up my disk due to numerous IO errors. They would not even tell me how to fix my problem.

As it turns out, your Mac already has all it needs, by default, to do just that. First, create a file called “backup.sh”, make it executable (chmod +x backup.sh) and copy the following content to it:


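#!/bin/sh
# on the rsync shipped with Mac OS X, -E copies extended attributes and resource forks
RSYNC="/usr/bin/rsync -E"
# my external disk is located at /Volumes/G-DRIVE\ MINI/
# -a: archive mode, -x: stay on one file system, -S: handle sparse files,
# --delete: remove files from the backup that are gone from the source;
# "$@" forwards any extra arguments to rsync
sudo $RSYNC -a -x -S --delete \
--exclude-from backup_excludes.txt "$@" / /Volumes/G-DRIVE\ MINI/
# bless the system folder so that the external disk can be booted
sudo bless -folder /Volumes/G-DRIVE\ MINI/System/Library/CoreServices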

Then run it! Go to a shell and type “./backup.sh”. It will ask for your administrator password.
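
The script expects a file called “backup_excludes.txt” in the directory you run it from, with one rsync exclude pattern per line. Here is a minimal sketch that creates one. The patterns are only assumptions about what you might want to skip (temporary files, swap files, Spotlight indexes, trash), not the list I used, so adjust them before relying on them:

```shell
#!/bin/sh
# Write a sample exclude list for rsync's --exclude-from option.
# The patterns are hypothetical examples; a leading / anchors a
# pattern at the root of the transfer, and rsync ignores lines
# that start with #.
cat > backup_excludes.txt <<'EOF'
# hypothetical patterns, one per line; adjust before use
/private/tmp/*
/private/var/vm/*
/.Spotlight-V100
/.Trashes
EOF
```
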

If you ever need to restore your files, then create a file called “restore.sh” with the following content:


#!/bin/sh
RSYNC="/usr/bin/rsync -E"
# copy the backup over the internal disk; --delete removes anything that is not in the backup
sudo $RSYNC -a -x -S --delete \
--exclude-from backup_excludes.txt "$@" /Volumes/G-DRIVE\ MINI/ /Volumes/Macintosh\ HD/
sudo bless -folder /Volumes/Macintosh\ HD/System/Library/CoreServices

Executing restore.sh may prove dangerous: because of --delete, it erases anything on the internal disk that is not in the backup. Make sure you have tried booting from the external disk first. To boot from an external disk, hold down the Option key while the Mac restarts, then pick the disk.
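
One way to reduce the risk is to check that the backup disk is actually mounted and contains a system folder before restoring. The following is only a sketch: the function name looks_like_system_volume is made up for this example, and the check itself (looking for System/Library/CoreServices, the folder the scripts bless) is my assumption about what a usable backup looks like.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "$1" is a directory that
# contains System/Library/CoreServices (the folder passed to bless).
looks_like_system_volume() {
    [ -d "$1" ] && [ -d "$1/System/Library/CoreServices" ]
}

# Possible guard at the top of restore.sh (left commented out here):
# looks_like_system_volume "/Volumes/G-DRIVE MINI" || {
#     echo "backup disk not mounted or incomplete" >&2
#     exit 1
# }
```

Since the script forwards its arguments to rsync, running “./restore.sh -n -v” first should list what would be changed without touching anything (-n is rsync's dry-run flag, -v makes it verbose).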

Classifying research projects by depth

Filed under: Academia/Research — Daniel Lemire @ 10:02

Everything else being equal, picking the right problems is the key factor determining your success as a researcher (no matter how you define success). In a previous post, I proposed three categories of research problems:

  1. explain a previously unexplained observation;
  2. perfect an existing technique;
  3. invent a new problem.

It appears that all 3 categories are equally valid. Which category you prefer is a matter of style.

Today, I would like to propose a new, orthogonal, categorization in terms of the depth of the problem you tackle. Some problems

  1. are narrow and well-defined: you can complete them in a few months;
  2. form a set of narrow and well-defined problems, likely to keep you busy for years.

I have myself tended toward the first category (see “my research process”). The benefit of a focused burst of research producing a distinct result should not be underestimated. The most obvious benefit is that you can quickly move on, and thus you can afford to try your hand at random problems. It is the equivalent of a hit-and-run. If you are the curious sort, it allows you to learn about a new topic without investing your career in it. However, it makes applying for grants more difficult. You are also less likely to achieve recognition because your contribution might have less depth.

The second category means that you must find yourself a niche and work over it for years. Preferably, few people in the world should be aware of the problems you have identified. The catch is: how can you know, ahead of time, that the topic and the problems you see now will still be interesting in two or three years? Are you investing in vain? Presumably, if you can follow this strategy, grant applications and recognition may come more easily. But what happens if you get bored?

The two categories relate to how you read papers. If you read papers thinking “maybe I could build on their work”, then you will naturally tend to the first category. Reading a lot of papers on different topics favors random hit-and-run research projects. Are you reading the list of accepted papers looking for clues as to what you will work on next? Are you attending talks to pick up random new ideas?

However, if you tend to “pull” research papers out of the (virtual) library based on your own ideas, then you will more likely gravitate toward the deeper research projects. In this case, your mental filters are much stronger: you tend to filter out everything that does not directly relate to your goals. You may still attend many conferences, and read lists of accepted papers, but your brain will filter most of the data out.

Friday, June 27th, 2008

List of Accepted Papers to Large-Scale Recommender Systems Workshop

Filed under: Science and Technology — Daniel Lemire @ 17:03

We just posted the list of accepted papers to the second workshop on Large-Scale Recommender Systems and the Netflix Prize Competition. Here are the titles:

  • Jinlong Wu and Tiejun Li. A Modified Fuzzy C-Means Algorithm For Collaborative Filtering
  • Gavin Potter. Putting the collaborator back into collaborative filtering
  • Andreas Toescher, Michael Jahrer and Robert Legenstein. Improved Neighborhood-Based Algorithms for Large-Scale Recommender Systems
  • Tamas Kiss, Miklos Kurucz, István Nagy and Andras A. Benczur. Large-scale recommenders based on Association Rule Mining
  • Oscar Celma and Pedro Cano. From hits to niches? or how popular artists can bias music recommendations
  • Domonkos Tikk, Gabor Takacs, Istvan Pilaszy and Bottyan Nemeth. Investigation of Various Matrix Factorization Methods for Large Recommender Systems

Tuesday, June 24th, 2008

Good research: invent new problems or explain mysteries

Filed under: Academia/Research — Daniel Lemire @ 19:12

It is a lot of work to grind through a research project and get an interesting paper out of it. Mostly, you have to be patient enough and work at it every day. If you follow a sane process, it is difficult to fail entirely.

Picking the right research question is very important, however: it is difficult to recover from a bad choice of topic. There are at least 3 types of good research questions: 1) explain a (puzzling) experimental observation with a theoretical model; 2) improve an existing technique by at least an order of magnitude; 3) make up a new problem and be the first to propose a solution (I call it Turney’s way).

I now believe that options 1 and 3 are far better than option 2. To illustrate my opinion, here is a little scenario:

  • read a paper;
  • think to yourself: I could improve this idea ten times over;
  • get excited, dream of fame, start crafting a paper;
  • late on Friday night, realize your contribution is tiny;
  • keep going (because you have invested so much);
  • months later, publish a weak paper.

So I submit to you Lemire’s first rule of good research: you must either be trying to explain puzzling experimental results, or be inventing new problems. In some sense, it amounts to discarding the “engineering way” which is to constantly perfect existing techniques.

Further reading: I have written much about how I think one can write a good paper and about my usual research process.

Monday, June 23rd, 2008

Lowly tasks you should do

Filed under: Academia/Research — Daniel Lemire @ 8:29

Many of my colleagues never mark assignments. I tend to mark papers on nearly a weekly basis. Why am I doing this? Because I believe that marking assignments is the best way to identify the weaknesses in my courses and learn from my students.

Many researchers never implement their ideas. They let their students do the lowly implementation work. I almost always do at least some of the implementation in all projects I work on. Why am I doing this? Because I believe that you never really understand an idea, even your own, until you have put it in practice. You never know how it feels to ride a bicycle until you have done it once, no matter how great your mind is.

On an unrelated note, my friend Yuhong came over during the week-end. She is a brand-new Software Engineering professor at Concordia University. She bought my wife some gorgeous flowers. Nice.

Thursday, June 19th, 2008

The Disadvantages of an Elite Education

Filed under: Academia/Research — Daniel Lemire @ 9:53

I just read a great essay by William Deresiewicz, an associate professor of English at Yale. His message is clear: Ivy League education is flawed.

Here is the killer sentence:

It’s no coincidence that our current president [Bush], the apotheosis of entitled mediocrity, went to Yale.

Via Sébastien Paquet.

See also my posts It may not matter all that much where you go to college and The 2 myths getting students into heavy-league schools.

Disclaimer: I am a University of Toronto graduate, which is the closest thing Canada has to an elite education, I would guess.

Wednesday, June 18th, 2008

Too much stress

Filed under: Family and Health — Daniel Lemire @ 14:06

I suffer from exhaustion. In the last few weeks, I had to resign from a few hats I wore. I resigned as union treasurer and I resigned as chair of the IT M.Sc. degree. Today, I stood up my friend Yuhong on a lunch date. Too much to do, too little time. I have reached a breaking point.
