Friday, February 4, 2011

How do DVCSs (DRCSs) work?

I have been hearing a lot of good things about DVCSs, in particular about Bazaar. Apart from the concept of a distributed repository, I see two main advantages being touted: merges are better automated, and renames are handled correctly.

Could someone please point me at some text explaining how exactly the improvements work? How does bazaar know that I renamed a file? What if I rename two files as part of the same commit? What happens when I refactor by putting half of the file's contents into a new file, re-indenting everything and losing some whitespace in nearly every line?

In other words, I'd like to hear from people using bazaar (or another DVCS) in real life, or from people who know how it (they) works. Is the merge really that much better? And how is it achieved?

  • I'm not familiar with Bazaar, but git doesn't record file renames. To git, a rename looks like a delete and an add. However, git can detect after the fact that the contents of the added file closely match contents already in its repository, and then report the change as a rename. If you split files up or merge them, tools such as git blame -C can similarly trace moved segments of code back to the file they came from; that information is computed when you ask for it rather than stored.

  • Please change the title. Your question seems to be specifically about bazaar, and has nothing to do with DVCS in general. Or you really have two unrelated questions: (1) How does bazaar handle renames, and (2) How do DVCS products handle merges?

  • @Kristopher Johnson: I am interested in how DVCSs handle renames, file splitting, indentation changes and other things related to major refactoring when the time comes to merge the changes. I've already learned (from the answer by Kyle) that git tries to track pieces of text rather than files. It would be nice to get a description of how git tracks pieces of text (what algorithm or heuristics it uses).

    I did hear that Bazaar handles rename in a particularly interesting fashion, but this is not the sole topic of the question. Sorry if the question was muddled.

    From Arkadiy
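
To make the "detect renames from content" idea in the comments above concrete, here is a minimal sketch. It is not git's actual algorithm: the function name detect_renames, the line-based difflib comparison, and the greedy pairing are all assumptions made for illustration. The 0.5 threshold mirrors git's default 50% similarity cutoff for rename detection.

```python
from difflib import SequenceMatcher


def detect_renames(deleted, added, threshold=0.5):
    """Pair each deleted path with the most similar added path.

    deleted and added map path -> file content (hypothetical inputs).
    A pair counts as a rename only if its similarity reaches threshold.
    """
    renames = {}
    unclaimed = dict(added)
    for old_path, old_text in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_text in unclaimed.items():
            # Compare line by line; ratio() is 1.0 for identical files.
            score = SequenceMatcher(
                None, old_text.splitlines(), new_text.splitlines(), autojunk=False
            ).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
            del unclaimed[best_path]  # each added file is claimed at most once
    return renames


# One commit that deletes two files and adds two others.
deleted = {
    "util.py": "import os\n\ndef helper():\n    return os.getcwd()\n",
    "old_notes.txt": "remember the milk\n",
}
added = {
    "helpers.py": (
        "import os\n\ndef helper():\n    return os.getcwd()\n"
        "\ndef extra():\n    pass\n"
    ),
    "LICENSE": "Permission is hereby granted\n",
}

renames = detect_renames(deleted, added)
print(renames)  # {'util.py': 'helpers.py'}
```

Because the pairing is a guess based on similarity, a heavy refactoring (splitting a file and re-indenting every line) can push the score under the threshold, at which point this approach sees only an unrelated delete and add.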
  • DVCSs achieve better merges by tracking the parent revisions of merges. In Subversion, when you merge one branch into another, you lose information about where the merge originated. In a DVCS like Bazaar or Git, the "merged" revision ends up with two parent revisions.

    Renaming is handled differently between DVCSs. Git, for example, does not track renaming at all because it wasn't important to Linus. Mercurial records renames as "copy old file to new, delete old". According to Mark Shuttleworth, founder of Canonical, Darcs and Bazaar are the only DVCSs that handle file renaming correctly.

    How does bazaar know that I renamed a file?

    Renames are specified by the user, just like adding or removing files. Use the "bzr rename <old> <new>" command to mark files or directories for renaming. If you've already renamed a file in the tree, you can use the "--after" option.

    What if I rename two files as part of the same commit?

    Then you type "bzr rename <old> <new>" once for each file. Bazaar doesn't try to guess which files have been renamed.

    What happens when I refactor by putting half of the file's contents into a new file, re-indenting everything and losing some whitespace in nearly every line?

    Then you type "bzr add" on the new file, since you're not really renaming it.
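
The point about merge revisions having two parents can be sketched in a few lines. This is only an illustration under stated assumptions: the revision names and the history dictionary are made up, and merge_bases is a naive search, not the code any particular DVCS runs. It shows that when a merge records both parents, the next merge can start from a much more recent common ancestor.

```python
from collections import deque


def ancestors(parents, rev):
    """Return rev plus every revision reachable by following parent links."""
    seen = {rev}
    queue = deque([rev])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(parents, a, b):
    """Return the common ancestors of a and b that are not themselves
    ancestors of another common ancestor (the "most recent" ones)."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    }


# Hypothetical history: r3 branches off r2, trunk continues with r4,
# m1 merges the branch into trunk, then both sides keep working (r5, r6).
history = {
    "r1": [],
    "r2": ["r1"],
    "r3": ["r2"],        # branch
    "r4": ["r2"],        # trunk
    "m1": ["r4", "r3"],  # merge revision with two parents
    "r5": ["r3"],        # more branch work
    "r6": ["m1"],        # more trunk work
}
with_merge_parent = merge_bases(history, "r6", "r5")

# The same history if the merge had forgotten where it came from.
lossy = dict(history, m1=["r4"])
without_merge_parent = merge_bases(lossy, "r6", "r5")

print(with_merge_parent)     # {'r3'}
print(without_merge_parent)  # {'r2'}
```

With the second parent recorded, merging r5 into trunk only has to consider changes made after r3. Without it, the tool falls back to r2 and has to reconcile the already-merged r3 changes a second time, which is where repeated conflicts come from.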

  • Related question, with a useful answer:

    http://stackoverflow.com/questions/43995/why-is-branching-and-merging-easier-in-mercurial-than-in-subversion

    From Arkadiy
  • Merging is not intrinsically better in a DVCS. It is just that a DVCS would be very difficult to use in practice if branching and merging did not work correctly (svn arguably does not implement branching/merging correctly), because instead of making a checkout, you create a new branch every time you start working from existing code. I think some proprietary, centralized source control systems do handle merging and branching correctly.

    The way it works for all of them is to record every commit in a Directed Acyclic Graph (DAG); from this graph, different merge strategies become available. Here you can find more information:

    http://revctrl.org/CategoryMergeAlgorithm

    At least hg, bzr and git can use external merge utilities.
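
One of the simplest merge strategies built on the DAG is a three-way merge: take the common ancestor of the two revisions as the base and compare each side against it. The sketch below is a simplified, diff3-style illustration using Python's difflib. The function names and the conflict marker text are assumptions, and real tools are considerably more careful than this.

```python
from difflib import SequenceMatcher


def _match_map(base, other):
    """Map base line index -> other line index for lines difflib sees as unchanged."""
    mapping = {}
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def three_way_merge(base, ours, theirs):
    """Merge two edited line lists against their common ancestor.

    Returns (merged_lines, conflict_count).
    """
    to_ours = _match_map(base, ours)
    to_theirs = _match_map(base, theirs)
    # Sync points: base lines left unchanged by both sides, plus an end sentinel.
    sync = [(i, to_ours[i], to_theirs[i])
            for i in range(len(base)) if i in to_ours and i in to_theirs]
    sync.append((len(base), len(ours), len(theirs)))

    merged, conflicts = [], 0
    b = o = t = 0  # cursors into base, ours, theirs
    for bi, oi, ti in sync:
        base_chunk, our_chunk, their_chunk = base[b:bi], ours[o:oi], theirs[t:ti]
        if our_chunk == base_chunk:
            merged.extend(their_chunk)   # only they changed this region (or nobody did)
        elif their_chunk == base_chunk or our_chunk == their_chunk:
            merged.extend(our_chunk)     # only we changed it, or both made the same change
        else:
            conflicts += 1               # both sides changed it differently
            merged.extend(["<<<<<<< ours", *our_chunk,
                           "=======", *their_chunk, ">>>>>>> theirs"])
        if bi < len(base):
            merged.append(base[bi])      # the unchanged sync line itself
        b, o, t = bi + 1, oi + 1, ti + 1
    return merged, conflicts


base = ["a", "b", "c", "d"]
ours = ["a", "B", "c", "d"]         # we edited the second line
theirs = ["a", "b", "c", "d", "e"]  # they appended a line

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged, conflicts)  # ['a', 'B', 'c', 'd', 'e'] 0

clash, clash_count = three_way_merge(base, ours, ["a", "X", "c", "d"])
print(clash_count)  # 1
```

The quality of the result depends heavily on which base is chosen, which is why the commit graph matters: a stale base makes both sides look like they changed far more than they really did.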

  • The following is a discussion of how darcs (http://darcs.net) deals with patches - http://darcs.net/manual/node9.html.

  • Neat article to read

    http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/

    From Setori
