Version control systems nowadays suffer from the problem that all new version control systems are created by groups of hackers working on projects so big and complex that the existing systems aren’t powerful enough for them. So you keep getting more and more powerful and complex systems. git is so complex that no one who isn’t a software developer can use it correctly.
I was tasked with moving a complex natural-language processing program for the NIH from, I think, SVCS, to git. After three days studying git man pages and trying to explain them to a group of linguists, I gave up and put everything under QVCS, and it was smooth sailing after that.
I was taught to use git within a few days of starting to become a professional programmer. I’m a dyed-in-the-wool fanboy. I probably have no perspective at all here. But whenever I’ve used Mercurial everything seems backwards. People start recommending that I do wacky-sounding things like making two clones of a repository just to do what I’d normally do with git branch/git checkout… Is there any way to track multiple heads without just making multiple checkouts all over your disk?
Also, I strongly suspect that people who have trouble with git are just having trouble visualizing the DAG in their heads. If you run gitk --all whenever you get confused, you can actually see the thing, and then there’s nothing to be confused about.
...Though I suppose the above might just translate to “I’m a visual thinker, and everyone should be more like me.”
Well, to me, git’s DAG model is 100% obvious, and gitk --all is helpful in exactly the way you state — but at the beginning it was still confusing which command, used in which way, would produce the effect I wanted on the DAG (and working tree, and index...). Similarly, the commands to configure and manipulate branches and remotes are not entirely obvious, especially if you’ve gotten off the beaten path and want to fix your config to be normal.
Is there any way to track multiple heads without just making multiple checkouts all over your disk?
Taboo “track” and “checkouts”. I don’t know what you mean by “track”, and Mercurial doesn’t have checkouts, as I understand the term. A clone isn’t “checked out” of anything. (This was actually the hardest part for me to wrap my head around, coming from Subversion and the central-repository model, but I’m wondering whether you’re talking about the same thing or not.)
If you simply mean you want more than one head or branch, you don’t need multiple clones. You can switch your working copy between named branches or heads with “hg up”, and list them with “hg heads”.
It’s true that people often suggest just using clones instead of named branches, but IMO this only makes sense for short-lived branches that are going to be folded into something else. Mercurial works just fine with named branches and multiple heads. You can also use bookmarks to give local names to specific heads—a kind of locally-named branch whose name isn’t propagated to other repositories.
I strongly suspect that people who have trouble with git are just having trouble visualizing the DAG in their heads.
No, we just read the man pages and run screaming. It’s not the model of a change-based system that’s the problem, it’s the UI design (or lack thereof). ;-)
From an outsider’s perspective, git’s UI is to mercurial’s UI as Perl’s is to Python. And since I’ve programmed almost exclusively in Python for about 13 years now, guess which one looks more attractive to me?
(Note: this doesn’t have anything to do with Mercurial’s innards being written in Python; other DVCS’s have been written in Python and didn’t have the same orthogonality of design.)
I’m told git massively improved its interface in the last few years. I started using it mainly in 2010 after switching from bzr, and had little trouble understanding the system (in fact I found hg’s interface to be kind of weird). But there you go.
(Also, wrt
Taboo “track” and “checkouts”. I don’t know what you mean by “track”, and Mercurial doesn’t have checkouts, as I understand the term. A clone isn’t “checked out” of anything.
In git-land “checkout” means a working directory; by “multiple checkouts all over your disk” I assume MBlume means multiple clones of the repository.)
Git is new. It’s already gotten easier to use (I’m already too much of a newb to have ever used the Git of Yore, which supposedly you needed a CS PhD to use effectively), and the folks at GitHub in particular seem to be working hard at sanding down its rough edges.
This is quite ancient. git started as a solution to the technical problem of high-performance distributed version control. The user interface was only brought to something reasonable later.
It’s still not that great. The internal DAG model is quite clean and clear. The actual commands do not always map clearly to this model. One common failure is hiding the staging area or doing implicit magic to it. Another is that many commands are a mish-mash of “manipulate the DAG” and “common user operations”, where doing only one or the other would be much clearer. I really doubt that the user interface will get much better, because to do so they would really need to throw out backward compatibility.
Of course, Subversion is still the “majority” VCS even for open-source projects. Maybe people need something other than Git to change that—or maybe SVK should become a more widespread way to use SVN.
And for the sake of speed and stability Git doesn’t store some data that every other open-source DVCS does store. I have heard some Git users say this is an acceptable tradeoff (which is true for them) and others say that nobody should care about this kind of data.
Of course, a better tool is never a solution to tool ideology. Evaluating multiple other tools isn’t either—after doing that with DVCSes, I now hate Git and implicitly assume that every tradeoff it makes is a poor fit for the medium-sized projects I’d care about.
I would guess that git is already more popular than svn for new projects (see GitHub), and in at least some circles, like among Ruby programmers, still using svn for new stuff would raise some eyebrows. It’s definitely way past just early adopters, but I have no idea how to get reasonable VCS usage statistics.
I don’t know what you mean by these tradeoffs, git tends to store more data not less.
Well, Git stores the code itself; for the rest of things it stores less data than SVN, Bazaar, Mercurial, Monotone, or Veracity.
It doesn’t track explicit directory renaming. It doesn’t keep branch history—if it did, the reflog (which is local and only kept for 30 days) wouldn’t be needed. It only allows unique tags—so if you want to mark every revision where the buildbot succeeded, to make both updating and rolling back easy, you are out of luck (there may be a way—it is not obvious at all).
It knows each directory by its content, so it knows when a directory was renamed, without needing to be explicitly told.
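A small sketch of what “knows when a directory was renamed” means in practice (assuming git is installed; all names are made up). Note the rename is recomputed from content at query time with -M, not stored anywhere:

```shell
# git stores no rename record; "git mv" produces an ordinary delete+add,
# and the rename is re-detected from content when you ask with -M.
set -e
git init -q demo-rename && cd demo-rename
git config user.email a@example.com && git config user.name a
mkdir old && echo "same content" > old/file.txt
git add . && git commit -qm "add old/"
git mv old new
git commit -qm "rename the directory"
git log -M --diff-filter=R --name-status --oneline
# prints an R100 line pairing old/file.txt with new/file.txt
```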
It doesn’t keep branch history—if it did, the reflog (which is local and only kept for 30 days) wouldn’t be needed.
The reflog is an essentially local thing: it shows where a branch used to point in a particular repository instance. It has little to do with the history of the project, and often includes temporary commits that shouldn’t be distributed.
It only allows unique tags—so if you want to mark every revision where the buildbot succeeded, to make both updating and rolling back easy, you are out of luck
You need some way to specify what you’d want to update or roll back to—what kind of use case are you thinking about? You could support a successful-build branch, for example, so that its tip points to the last successful build, and you could create merge commits with previous successful builds as first parents, for the purpose of linking together all successful builds in that branch.
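A minimal sketch of that suggestion (assuming git is installed; the branch and file names are made up):

```shell
# Keep a branch whose tip always points at the last verified build,
# with earlier verified builds reachable along first parents.
set -e
git init -q demo-build && cd demo-build
git config user.email ci@example.com && git config user.name ci
git checkout -qb dev
echo v1 > code.txt && git add code.txt && git commit -qm "rev 1"
git branch successful-build          # first verified build
echo v2 > code.txt && git commit -qam "rev 2"
# Suppose the buildbot has now verified rev 2; record it:
git checkout -q successful-build
git merge -q --no-ff -m "build ok: rev 2" dev
# The first-parent chain of successful-build links the good builds:
git log --first-parent --oneline
```

Because the branch tip moves while the old merge commits stay reachable, this sidesteps the unique-tag restriction: every verified build remains addressable through the first-parent chain.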
Tracking paths by their content is not always good… It couples content changes with intent changes. If I need to make a copy of a directory and then let the copies slowly diverge until they have nothing in common, I may want to mark which one carries the original intent and which is the spin-off.
Branch history is not an inherently local thing.
When I have feature branches for recurring tasks, I will probably always give them the same name. I will sometimes merge them into trunk and sometimes spawn them from the new trunk state. Later, I may wish to check whether some change was done in trunk or in the feature branch—that is quite likely to provide some information about the commit’s intent. I can get this easily in every DVCS I know except Git—in Git I need to walk the DAG to get this information.
About the successful-build branch: for some projects I try to update to the trunk head, and if it gives me too much trouble I look for the closest previous revision that I can expect to work. In Monotone I simply mark some development commits as tested enough; there is a simple command to get all the branch.tested commits from the last month. This information says something about a commit, and to lose it I would have to do something to the certificate that states it. In Git, rev-list behaviour depends on many things that happen later.
Linux kernel history is too big for any of the things I say to make sense for it. But in a medium project, I want to have access to strange parts of history to find out what happened, and how, and what we meant.
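The DAG-walking mentioned above, for telling trunk commits from feature-branch commits, can be sketched like this (assuming git is installed, and assuming merges always flow from the feature branch into trunk; all names are made up):

```shell
# If merges always flow feature -> trunk, the commits reachable from
# trunk only through a second parent are the ones made on a branch.
set -e
git init -q demo-trunk && cd demo-trunk
git config user.email a@example.com && git config user.name a
git checkout -qb trunk
echo 1 > f.txt && git add f.txt && git commit -qm "trunk 1"
git checkout -qb feature
echo 2 >> f.txt && git commit -qam "feature work"
git checkout -q trunk
echo 3 > g.txt && git add g.txt && git commit -qm "trunk 2"
git merge -q --no-ff -m "merge feature" feature
git rev-list --first-parent trunk | sort > trunk-side.txt
git rev-list trunk | sort > all.txt
comm -13 trunk-side.txt all.txt    # the commits made on the branch
```

This only works if nobody fast-forwards trunk onto a feature branch, which illustrates the complaint: the information is an inference from merge shape, not something Git records.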
Which is my point exactly. It is one aspect of Vi’s criticism—that git does not store some important data—that is clearly valid. It is a tradeoff that probably doesn’t matter if you are Linus and you are storing the Linux kernel, but in other cases it is a blatant flaw that needs to be worked around via compromises or kludges.
Git is the absolute worst version control system out there (except for all the others).
Empty directories are sometimes necessary, and it’s a pain in the ass that git cannot store them at all. I had to put almost-empty README.txt files in directories like log/ in many projects. It’s a minor annoyance more than anything else.
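The workaround described above, sketched (assuming git is installed; .gitkeep is just a conventional placeholder name, equivalent to the README.txt trick):

```shell
# git tracks file contents, not directories, so an empty log/ directory
# is invisible to it until it contains at least one tracked file.
set -e
git init -q demo-keep && cd demo-keep
mkdir log
git add log                        # succeeds, but stages nothing
git status --porcelain             # empty output: log/ is invisible
touch log/.gitkeep                 # placeholder file (a common convention)
git add log
git status --porcelain             # now shows log/.gitkeep as staged
```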
I have a complex enough deployment helper living in a Monotone repository for which it is simpler and more natural to keep a few empty directories in the repository than to check-and-create them from each of ten shell scripts. It is checkout-and-use; no other setup makes sense, so “just creating them in the Makefile” would be suboptimal.
A single line of “mkdir -p” will deal with it. It’s a nice idempotent action. I started using “mkdir -p” as a workaround for git issues, but now it just seems to make far more sense than dicking around manually maintaining working directories.
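The idempotence claim is easy to check (the path below is made up):

```shell
# mkdir -p never fails because a directory already exists, and never
# creates duplicates: running it twice leaves the same tree as once.
set -e
mkdir -p build/log/archive       # creates all missing parents
mkdir -p build/log/archive       # second run is a harmless no-op
ls -d build/log/archive
```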
I know about “mkdir -p”—my non-problem (I was not going to use Git anyway for this project) is that I have multiple places to put it, and if I miss one I will not notice for some time.
Saying that recreating something just in case, right after checking out the new version, makes more sense than simply storing it along with all the rest seems to be exactly an example of a tool imposing workflow ideas on people.
Saying that recreating something just in case, right after checking out the new version, makes more sense than simply storing it along with all the rest seems to be exactly an example of a tool imposing workflow ideas on people.
You have it backwards. Using version control to store working areas for programs rather than programs simply mkdir -ping working areas they need seems to be exactly an example of tool imposing some workflow ideas on people.
There are so many common cases where you absolutely need mkdir -p, like dynamic working directory layouts, that it’s mentally simpler to just use it always. It works on 100% of problems, is idempotent, and is resistant to human errors.
Why would I ever bother with a VCS-based solution that only works in some simple cases like static working directory layouts, is based on non-idempotent operations, and fails often in case of human mistakes?
It just creates so much less mental overhead if you simply mkdir -p the place where you want to create your files, always, no exceptions.
I understand that people who use languages where mkdir -p is a non-trivial operation won’t get it, but that’s a problem with their tools limiting their mindset.
I always try to do things so that they fail either often or never. Sometimes I don’t care much which is the case—if they do fail often, it is easy to reproduce and I will fix it with less effort than an elusive bug.
“mkdir -p” is not resistant to the human error of not calling it / not including the file that does it in one of ten scripts.
“mkdir -p” means that when browsing the VCS history you do not see the real layout.
I bother with a VCS-based solution when it works better (and my “better” includes the estimated time to catch trivial mistakes and the ease of looking up old history) than “mkdir -p”. Dynamic working directory layout is something I have yet to see often, so I do not discard better-for-me solutions because of reasons inapplicable to the current project.
Human mistakes depend on workflow. Do you often accidentally remove your checkout? And human mistakes in initial setups should cause the failure as often as possible when they happen.
“mkdir -p” before each cd will make the scripts considerably longer for no sane reason. And I have a few places in this very deployment helper system where “mkdir -p” before cd will make the failure mode less convenient.
And yes, all of that is written in shell script, and I do use “mkdir -p” in the code where it made sense for my task at hand.
Somehow, most build systems for complex packages run make with changing working directories. Even in a single script using subshells, it is trivial to save some effort by changing directories, without complicating the code.
It is a deployment helper, not a single-package build system, so it has to do some work with numerous checkouts of dependencies. It is more natural to write some parts of code to run inside working directories of individual packages.
Is that something I need a justification for? My version control system throws away stuff that I am trying to store. I’d also prefer it not to throw away files starting with ‘b’.
I’ve learned to make my programs pessimistic and recreate the file system if necessary. It surprised me a few times before I learned the quirks.
The happy ending is that nobody uses subversion any more; git won, and it has none of these problems.
It’s up to you how seriously you read my comment.
Hee. We still use subversion every day.
Try mercurial. It’s got basically the same features, but is more comprehensible to human beings. There’s an excellent tutorial called hg init.
(And if you should happen to need to use other people’s stuff that’s in git, you can just use the git extension for mercurial.)
blinks
Harsh!
Git is new.
My experience with git was in 2006 or 2007.
There are some problems with the DAG, too, because you are supposed to store the information with little meta-information.
There are precedents of tools wrapping Git’s command-line interface, so that part could possibly be fixed. I frankly do not know why nobody does it.
It knows each directory by its content
Doesn’t work so well if the content is ‘nothing’.
Git doesn’t notice these at all.
In what situations would you want to store an empty directory and pay attention to whether it is renamed?
I’m mostly serious here. I have two choices, you have one. My tool imposes fewer workflow ideas here. It’s totally information-theoretical.
Why are you using cd at all? You use “mkdir -p” before creating temporary files, and never randomly cd.
Anyway, this thread isn’t really getting anywhere.
This is closer to trolling at Vi than it is to a deep insight.
You’re mostly wrong. Enough so that I reread your comment 4 times to be sure I was parsing correctly.
No, just curious. I have not encountered and could not imagine a use case.
Directories, in my mind, are meta-information about files, so it makes no sense to me to store an empty directory.
I may be missing context here, but I frequently create empty directories to guide future filing/sorting behavior.
The examples mentioned so far could be described as meta information about future intended files.