
In the market for rolling backup

Suggest a new feature and help us improve Bucket Explorer for Amazon S3


Post by justSteve on Fri Sep 14, 2007 3:53 pm

I'm attempting to implement a strategy that will watch a select group of local folders and, upon any additions or updates, save the new or changed file to S3. Before uploading, the file should be renamed to include the current date/time (e.g. myFile.txt.0914071013).

Additionally, I want the operations logged in a way that they can a) be recorded to Google Calendar and b) be imported into my local SQLExpress (or MySQL) server. (It should be simple to parse a log into a series of INSERT statements.)
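Turning such a log into INSERT statements is indeed simple. Here is a minimal sketch, assuming a hypothetical tab-separated log line (`logged_at`, `action`, `local_path`, `s3_key`) and a hypothetical `backup_log` table. Neither is a format SFFS or Bucket Explorer is known to produce. Single quotes are doubled as in standard SQL. MySQL's default mode also treats backslashes as escapes, which matters for Windows paths, so that case gets its own flag.

```python
def sql_literal(value: str, mysql: bool = False) -> str:
    """Quote `value` as a SQL string literal.

    Single quotes are doubled (standard SQL). MySQL's default mode also treats
    backslashes as escapes, so with mysql=True they are doubled first.
    """
    if mysql:
        value = value.replace("\\", "\\\\")
    return "'" + value.replace("'", "''") + "'"


def log_to_inserts(lines, table="backup_log", mysql=False):
    """Yield one INSERT statement per well-formed log line.

    Assumes a hypothetical tab-separated line format:
    logged_at <TAB> action <TAB> local_path <TAB> s3_key
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        values = ", ".join(sql_literal(f, mysql) for f in fields)
        yield (f"INSERT INTO {table} (logged_at, action, local_path, s3_key) "
               f"VALUES ({values});")


if __name__ == "__main__":
    sample = ["2007-09-14 10:13:00\tUPLOAD\tc:\\data\\myFile.txt\tmyFile.txt.0914071013\n"]
    for statement in log_to_inserts(sample):
        print(statement)
```

The output is a plain SQL script, so it can be run with whatever client the database already provides.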

The point is to create a rolling backup where a) every changed or added file is recorded as such for auditing purposes, and b) all prior versions of a given file remain available.

I'm a software developer and want to avoid the learning curve of the whole SVN thing. And I want an easy way to look back x number of weeks and explore which files changed and so on.



I'm currently using SuperFlexibleFileSynch which meets many of my requirements but, at this point, appears to have these shortcomings:

Overly long changed-file detection process. I'm saving files from several dozen different subdirectories, drilling down to thousands of files. With the operation scheduled to fire off every 15 minutes, the app appears to search through all my defined directories each time. Last time it took close to 45 seconds to complete, during which a popup box displayed the progress. The fact that it took 45 seconds isn't in and of itself a big deal, but 2 things concern me:

1) Obviously I don't want my work interrupted every 15 minutes.
2) I'd be somewhat concerned for system stability with this kind of operation going on so frequently. (Windows has enough problems without adding to them).

I assume SFFS is literally comparing local files to S3 at some level or another. I would think this could be avoided by making the programmatic assumption that S3 files will not have changed: no comparison needed, just track changed/added local files and upload 'em.
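How SFFS actually detects changes isn't known here. The "track local changes only" idea can be sketched by keeping a local snapshot of each file's modification time and size between scheduled runs and uploading only the differences, without listing S3 at all. The JSON state file is a hypothetical choice. This sketch does not track deletions, and a change that preserves both mtime and size would be missed.

```python
import json
import os


def snapshot(roots):
    """Map every file under the given root folders to (mtime_ns, size)."""
    state = {}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # vanished or unreadable; it will be seen next run
                state[path] = (st.st_mtime_ns, st.st_size)
    return state


def changed_files(previous, current):
    """Paths that are new, or whose mtime or size differs from the last run."""
    return sorted(p for p, sig in current.items() if previous.get(p) != sig)


def load_state(state_file):
    """Load the previous snapshot; a missing file means everything is new."""
    try:
        with open(state_file, encoding="utf-8") as f:
            # JSON turns tuples into lists, so convert back for comparison.
            return {p: tuple(sig) for p, sig in json.load(f).items()}
    except FileNotFoundError:
        return {}


def save_state(state_file, state):
    """Persist the snapshot for the next scheduled run."""
    with open(state_file, "w", encoding="utf-8") as f:
        json.dump(state, f)
```

Walking the tree still touches every directory. It is only a local `stat` per file, though, with no network round trips, which is where the time and intrusiveness would be saved.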

The other shortcoming concerns the procedures of bringing files back down. SFFS does not make it easy to bring a bunch of files back to my local system...it's oriented to being a mirroring tool as one would expect of a synchronizer.

But more often than not I'm looking for a simple way to bring files back down to compare them to existing files, so I'd want an easy way to restore to an alternate path.
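Restoring to an alternate path mostly comes down to mapping each original location to a spot under a separate restore folder. This sketch uses a hypothetical helper, `restore_target`, that turns the drive letter into a folder name so that files from different drives cannot collide. The download itself is not shown, and only absolute drive-letter paths are handled.

```python
from pathlib import PureWindowsPath


def restore_target(original: str, restore_root: str) -> PureWindowsPath:
    """Map an original Windows path to a location under `restore_root`.

    The drive letter becomes a folder ("c:" -> "c"), so restores from
    different drives land in separate subtrees.
    """
    p = PureWindowsPath(original)
    if len(p.drive) != 2 or not p.root:
        raise ValueError(f"expected an absolute drive-letter path: {original!r}")
    drive_folder = p.drive.rstrip(":")  # "c:" -> "c"
    return PureWindowsPath(restore_root, drive_folder, *p.parts[1:])
```

For example, `restore_target(r"c:\data\file2.txt", r"d:\restore")` yields `d:\restore\c\data\file2.txt`, leaving the live copy untouched for a side-by-side comparison.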

The other primary shortcoming is logging. SFFS produces a log of its actions, and I could write an app that processes that log and uploads the info to Google Calendar, but I bet you guys would make much shorter work of that than I could. ;)

As you prepare the commandline tool I hope you'll take pains to protect system stability and unattended/non-intrusive operation. Don't bother developing a scheduler...Windows has that covered nicely with Task Scheduler.


What I'm describing is more than just a backup... it's automating the journaling process, which would make it trivial to see the inter-relationships of my various projects by showing which code (and associated media files) changed at any given point in time.

I'm going to hammer at it from one more angle cuz I think it's a) important and b) not described or implemented elsewhere...

A conventional, standalone backup can be tedious to search because, typically, it requires you to know the date the given file changed. If your search lets you drill down to the given file and see all the times it changed, you add an important dimension to the toolkit.
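The "drill down to a file and see every time it changed" view amounts to an index from file to change times. Here is a minimal sketch, assuming the backup log has already been reduced to hypothetical `(path, changed_at)` pairs. The same index also answers the "look back x weeks" question by listing every file changed inside a time window.

```python
from collections import defaultdict
from datetime import datetime


def build_history(events):
    """Group (path, changed_at) events into path -> sorted list of change times."""
    history = defaultdict(list)
    for path, changed_at in events:
        history[path].append(changed_at)
    for times in history.values():
        times.sort()
    return dict(history)


def changed_between(history, start: datetime, end: datetime):
    """Paths with at least one change in the half-open window [start, end)."""
    return sorted(
        path for path, times in history.items()
        if any(start <= t < end for t in times)
    )
```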

whew...really didn't mean to dump this much ascii when I started....sometimes i just can't help myself.

thx for listening
--steve...
justSteve
 
Posts: 4
Joined: Fri Sep 14, 2007 3:09 pm

Post by saurabh on Sun Sep 16, 2007 1:17 am

Steve,

Thanks for the feedback and your input on how the backup program should work. You have some very good suggestions here, especially adding the backup event to Google Calendar and the rolling backup.

I have never used the software you mentioned in your post, so I cannot comment on how it determines whether a file already exists on Amazon. We will certainly not rely on the modified time alone; we have not done that in Bucket Explorer and we can't do that in the new tool. Many of our clients have sent feedback on our hash comparison, and they like the way it works today. It is very efficient and 100% robust.
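The post doesn't describe how Bucket Explorer's hash comparison works internally. As a general illustration of comparing by content rather than by modified time, this sketch hashes the local file with MD5 and compares the result with the object's ETag. It assumes the object was stored with a single, non-multipart PUT, where S3 reports the ETag as the quoted hex MD5 of the content. How the remote ETag is fetched is left out.

```python
import hashlib


def file_md5_hex(path, chunk_size=1 << 20):
    """Hex MD5 digest of a local file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_upload(path, remote_etag):
    """True if there is no remote copy or its ETag differs from the local MD5.

    Assumes a single-part upload, so the ETag is the quoted hex MD5.
    """
    if remote_etag is None:
        return True
    return remote_etag.strip('"').lower() != file_md5_hex(path)
```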

Again, thanks for your feedback.

Saurabh
saurabh
 
Posts: 60
Joined: Tue Aug 26, 2008 8:30 am

Post by justSteve on Sun Sep 16, 2007 12:06 pm

The reference to the file date isn't for purposes of comparison... in fact, there is no attempt to make any comparison in this mode. It just writes all changed files (those whose archive bit is still set) to S3.
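Windows sets a file's archive attribute whenever the file is modified, which is what the "non-archived" selection relies on. This sketch uses the Windows value of `FILE_ATTRIBUTE_ARCHIVE` (0x20). Python's `os.stat()` exposes `st_file_attributes` only on Windows, so the directory walk refuses to run elsewhere. Clearing the bit after a successful upload (for example with `attrib -A`) is not shown.

```python
import os

# Windows FILE_ATTRIBUTE_ARCHIVE flag value.
FILE_ATTRIBUTE_ARCHIVE = 0x20


def archive_bit_set(attributes: int) -> bool:
    """True if the archive flag is set in a Windows attribute bitmask."""
    return bool(attributes & FILE_ATTRIBUTE_ARCHIVE)


def pending_files(root):
    """Yield files under `root` whose archive bit is set (Windows only)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            attrs = getattr(os.stat(path), "st_file_attributes", None)
            if attrs is None:
                raise OSError("file attributes are only available on Windows")
            if archive_bit_set(attrs):
                yield path
```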

To avoid overwriting a file that had been uploaded, changed, and is now ready for uploading again, the program renames the file so that the date/time is embedded in the filename itself.

SFFS (http://www.superflexible.com/) changes myFile.txt to

myFile.d091607-t06555555.txt

I'm not clear on how the time format is arrived at. For practical use, I think minutes since midnight would work fine.
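SFFS's actual time format isn't known here. This sketch implements the minutes-since-midnight idea with a hypothetical `name.dMMDDYY-mNNNN.ext` pattern, loosely modeled on the SFFS example. Minutes since midnight run from 0 to 1439, so four zero-padded digits always suffice. At one-minute resolution, two saves of the same file within the same minute would produce the same name, so a real tool would need a tie-breaker.

```python
import os
from datetime import datetime


def mangle(name: str, when: datetime) -> str:
    """Insert a date and minutes-since-midnight stamp before the extension.

    Hypothetical format: myFile.txt -> myFile.dMMDDYY-mNNNN.txt
    """
    stem, ext = os.path.splitext(name)
    minutes = when.hour * 60 + when.minute  # 0..1439, fits in four digits
    return f"{stem}.d{when:%m%d%y}-m{minutes:04d}{ext}"
```

For example, `mangle("myFile.txt", datetime(2007, 9, 16, 6, 55))` gives `myFile.d091607-m0415.txt`, since 6 × 60 + 55 = 415.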

So only one bucket/folder is needed: each version of the same file stays unique in that folder because of the renaming procedure. I've referred to it as a rolling backup, but I'm sure 'differential' backup refers to the same thing. Finding the most efficient method of tracking changed files is surely a well-known problem by now.

Naturally you'd want to include plenty of include/exclude filters at the file level, perhaps RegExp-driven.
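A RegExp-driven filter can be as small as the sketch below. The rule (a path passes if it matches any include pattern, or if there are none, and matches no exclude pattern) is one reasonable convention, not a documented feature of either tool. Matching is case-insensitive because Windows paths are.

```python
import re


def make_filter(include=(), exclude=()):
    """Return a predicate that accepts a path if it matches any include
    pattern (or no include patterns are given) and no exclude pattern."""
    inc = [re.compile(p, re.IGNORECASE) for p in include]
    exc = [re.compile(p, re.IGNORECASE) for p in exclude]

    def accept(path: str) -> bool:
        if inc and not any(r.search(path) for r in inc):
            return False
        return not any(r.search(path) for r in exc)

    return accept
```

For example, `make_filter(include=[r"\.(txt|psd|png)$"], exclude=[r"\\(bin|obj)\\"])` keeps text and image files but skips anything inside `bin` or `obj` build folders.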

Post by saurabh on Sun Sep 16, 2007 12:27 pm

What if we just create a folder with date time stamp and upload the modified files in that folder instead of adding a time stamp on each file? Would that make the restore easier?

Or, perhaps even better, could we somehow move or rename the old files and keep the most recent files under their original names?

so, if the original folder had 2 files,
c:\data\file1.txt
c:\data\file2.txt

The first full backup would look like this ->
bucket\c:\data\file1.txt
bucket\c:\data\file2.txt

Now, at time yyyymmdd.hhmmss, if file1.txt has not changed but file2.txt has, the bucket would look something like this:
bucket\c:\data\file1.txt
bucket\c:\data\file2.txt (this is the new changed file).
bucket\modified_yyyymmdd.hhmmss\c:\data\file2.txt

This way, the master backup is always up to date, and old files can be accessed if required.

This will only work if the individual files are not too big and the network connection is very good. Otherwise this scheme may not be efficient, because Amazon does not provide a way to rename or move a file on S3: a rename is a download followed by an upload.
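The master-plus-`modified_` layout can be sketched as a plan of steps for each run. It uses hypothetical helper names and `/` as the key separator where the example above writes `\`. Each changed file's current master copy is first preserved under the timestamped prefix (the download-plus-upload step, given that S3 offers no rename), and then the new version replaces the master. The preserve step is skipped when no master exists yet, as on the first full backup.

```python
from datetime import datetime


def master_key(local_path: str) -> str:
    """Key of the always-current copy; Windows separators become '/'."""
    return local_path.replace("\\", "/")


def archive_key(local_path: str, when: datetime) -> str:
    """Key of the superseded copy under a modified_yyyymmdd.hhmmss prefix."""
    return f"modified_{when:%Y%m%d.%H%M%S}/" + master_key(local_path)


def plan_backup(changed_paths, when: datetime, existing_keys):
    """List (action, source, destination) steps for one backup run."""
    steps = []
    for path in changed_paths:
        key = master_key(path)
        if key in existing_keys:
            # Keep the old version before it is overwritten.
            steps.append(("preserve", key, archive_key(path, when)))
        steps.append(("upload", path, key))
    return steps
```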

Post by justSteve on Sun Sep 16, 2007 12:57 pm

I think I would avoid the overhead of download/rename.

One advantage of file mangling (the term SFFS uses for how it renames files with the date) is that a simple directory listing shows all versions of a given file.

Another advantage is that you are embedding yet another piece of information into the interface: when I look at the list of files, I can immediately see the date each file was edited.

If I'm (mentally) in 'review mode', I may not be looking for myFile.txt... I might merely remember that I was working on myFile.txt the same day I worked on the file I _am_ looking for. The quicker you can show me myFile.txt's date, the quicker I can locate the file whose name I couldn't remember.

Note that the date embedded in the filename isn't the current timestamp. It's the 'last edited' property of the given file. So if I were to run the program now, the date embedded in each filename would be that file's 'last edited' date. That can be important for the initial backup... less so once I'm backing up changed files every 15 minutes.
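Using the file's own last-edited time rather than the clock is a one-line difference, but it matters for the initial backup. This sketch stamps every file under a folder with its modification time (read via `os.path.getmtime` and converted to local time), using the same hypothetical `name.dMMDDYY-mNNNN.ext` minutes-since-midnight pattern. It only computes the names; the uploads are not shown.

```python
import os
from datetime import datetime


def last_edited(path: str) -> datetime:
    """The file's last-modified time as a local datetime (not 'now')."""
    return datetime.fromtimestamp(os.path.getmtime(path))


def initial_backup_names(root: str):
    """Map each file under `root` to a name stamped with its own edit time.

    Hypothetical format: name.dMMDDYY-mNNNN.ext (minutes since midnight).
    """
    names = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            when = last_edited(path)
            stem, ext = os.path.splitext(name)
            minutes = when.hour * 60 + when.minute
            names[path] = f"{stem}.d{when:%m%d%y}-m{minutes:04d}{ext}"
    return names
```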

Restoring really shouldn't be that difficult. You should be able to count on a schema where everything in between the first and second dots is the timestamp... it should be relatively trivial to rename files locally based on that.
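This is a sketch of the restore-side rename for the hypothetical `name.dMMDDYY-mNNNN.ext` stamp. It deliberately deviates from splitting on the first and second dots: it matches the stamp pattern at the end of the name, because original filenames such as `jquery.min.js` can themselves contain dots. It returns the original name together with the edit time, so the caller can decide which version to restore.

```python
import re
from datetime import datetime, timedelta

# stem . dMMDDYY - mNNNN [.ext]  (hypothetical mangling format)
STAMP = re.compile(
    r"^(?P<stem>.+)\.d(?P<date>\d{6})-m(?P<minutes>\d{4})(?P<ext>\.[^.]*)?$"
)


def unmangle(mangled: str):
    """Return (original_name, edited_at) for a name like myFile.d091607-m0415.txt."""
    m = STAMP.match(mangled)
    if m is None:
        raise ValueError(f"no timestamp found in {mangled!r}")
    day = datetime.strptime(m["date"], "%m%d%y")
    edited_at = day + timedelta(minutes=int(m["minutes"]))
    return m["stem"] + (m["ext"] or ""), edited_at
```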

There could be a bigger challenge in displaying the sheer number of files that would eventually accumulate. That might be addressed by offering an archive function that shows only the x most recent files.
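One reading of "only the x most recent files" is to keep the x newest versions of each file. Here is a minimal sketch of that view, assuming the bucket listing has already been parsed into hypothetical `(original_name, edited_at, key)` tuples. Older keys are simply left out of the display, not deleted.

```python
from collections import defaultdict
from datetime import datetime


def most_recent_versions(versions, keep: int):
    """Map each original name to the keys of its `keep` newest versions,
    newest first, given (original_name, edited_at: datetime, key) tuples."""
    by_name = defaultdict(list)
    for name, edited_at, key in versions:
        by_name[name].append((edited_at, key))
    return {
        name: [key for _, key in sorted(items, reverse=True)[:keep]]
        for name, items in by_name.items()
    }
```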

Plus, we could simply create a new full backup in a new folder when the number of filenames to search/sort gets out of hand.

But from my usage perspective, I generate lots of changed files per day, mostly small text or graphic files.

Post by saurabh on Sun Sep 16, 2007 1:01 pm

OK, got the point. Thanks for your input.

