Xerophyte
Mar 17, 2008

This space intentionally left blank

chippy posted:

I use GitFlow. What's a better one? I'm about to start a new project, could be good to know.

As mentioned, do whatever you and the people you work with agree makes sense for how you work. If that's gitflow, use gitflow. It has good tool support, and it matches some people's mental models of how development and deployment should happen. I personally think it's a bit over-complicated, but that's me. If you want a longer anti-gitflow rant then there is of course a "GitFlow considered harmful" article, because programmers know how to run a joke into the ground.

In general I think the appropriate workflow depends on how familiar a team is with git, how they organize their releases and how they collaborate. A web shop doing a dev/staging/production split might want to reflect that in the repo; a native-binary dinosaur has no particular reason to do so, but might need a mechanism for platform-specific release branches. A fully remote team probably wants to avoid any workflow that requires synchronous coordination, but if you're in the same office then talking with your peers about git for a minute per week isn't too bad.

The team I work in decided to not use gitflow because:
- We don't think the persistent develop and hotfix branches add any value for us.
- We like to keep a clean commit history, and feel that all the crossing merges and branches in gitflow made history relatively hard to follow.
- We want master to be the one branch people branch from and merge to, as well as the source for our nightly builds. Master will occasionally be slightly unstable when someone merges a feature with a bug that slipped through automated testing, and we're OK with that.

Other people may have different priorities on all of the above, and that's fine.

We went with a workflow where work is done in (ideally) short-lived feature branches on a shared central repository (company GitHub). Branches are occasionally rebased on master during development and merged through PRs directly to master once approved. People are encouraged to use interactive rebase before merging to clean up their bugfix commits, fold in updates from code review comments, drop temporary debug code and so on. Good commits on master are tagged as releases as required/on whim, and if we need a hotfix for a particular release we create a hotfix branch on that release tag and cherry-pick bugfix commits from master to create a patch release. It's similar to a lot of the stuff discussed in OneFlow, by the same "considered harmful" author, mostly because both are patch-submission workflows adapted to shared team repos and PRs.
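With made-up branch, tag and commit names, a single feature plus a patch release looks roughly like this:

  git checkout -b feature/foo origin/master        # short-lived feature branch
  # ...hack, commit as usual...
  git fetch origin && git rebase origin/master     # keep up to date with master during development
  git rebase -i origin/master                      # squash fixups, drop debug commits before the PR
  git push --force-with-lease origin feature/foo   # open a PR from feature/foo to master
  git tag -a v1.2.0 -m "Release 1.2.0"             # tag a good master commit as a release
  git checkout -b hotfix/v1.2 v1.2.0               # hotfix branch sitting on the release tag
  git cherry-pick <bugfix-sha-from-master>         # pull the fix over from master
  git tag -a v1.2.1 -m "Patch release 1.2.1"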

Cons: the rebases can require more coordination for long-lived or collaborative feature branches. Our workflow doesn't have any tooling to enforce it (not that I think we'd want any). Some engineers aren't comfortable with squashing or dropping commits, interactive rebase, or history editing in general (we're OK with the new-to-git engineers not doing that stuff, but sometimes people have Strong Opinions about commit history).

Xerophyte
Mar 17, 2008

This space intentionally left blank

chippy posted:

Urgh, I think I know what it is. I think the situation was created by them doing a pull at the end of the rebase (because the remote branch now shows as diverged) instead of a force push. Then, when they rebased the branch again, they started running into these kinds of problems. This is my working theory, anyway.

Probably, yeah. Typical scenario:
- End of work, time to push. git push -f
- New day, time to rebase. git rebase origin/master
- Hmm, git says my branch is out of date with its tracking branch and I should pull to update. git pull
- Oh, master is updated again. git rebase origin/master
and then suffering.

FWIW, we encourage the same approach of rebase and force-push on personal branches. I think it's a fine workflow, but it also makes git pull a big footgun when you're on one of those branches. I generally ask junior engineers to get a GUI with a nice log view, because even mild history editing like git rebase benefits from being able to actually see the history. Merge-based workflows are a lot more forgiving in that respect.
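The sequence we try to drill in instead, for what it's worth:

  git fetch origin
  git rebase origin/master        # replay your commits on the new master
  git push --force-with-lease     # update your remote branch; refuses to clobber work you haven't seen

--force-with-lease is a bit safer than a plain -f on a shared remote, and setting pull.rebase in your git config takes some of the teeth out of a reflexive git pull by making it rebase instead of merge.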

Xerophyte
Mar 17, 2008

This space intentionally left blank

Taffer posted:

We had a lot of issues with submodules in the past, but I did not know about being able to update the submodules in a single commit, I'll have to look into that, thanks.

Everything bigger than 10MB is already in Git-LFS. But would that actually change performance? I thought Git-LFS just provided a means to put big files on a separate server when using something like github, but otherwise worked the same. But yeah, I'd definitely love to know some best-practices with prebuilt binaries - we definitely need them, because building all of the dependencies would take literal days, but we need a better way to manage them.

To clarify: you update the submodule repositories themselves separately and independently with whatever feature they need, in some number of commits. You then make a single commit to the main repo that updates all the submodule references in that repository to the versions you need. This is what we do, and it works. You will still suffer the general submodule annoyances, since updating a submodule reference is opaque and atomic, which makes conflicts and diffs more difficult. If you can avoid splitting the repo then that's preferable, in my opinion; submodules are evil.
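Concretely, with hypothetical submodule paths, the single superproject commit looks something like:

  cd libs/foo && git fetch && git checkout <commit-you-need> && cd ../..
  cd libs/bar && git fetch && git checkout <commit-you-need> && cd ../..
  git add libs/foo libs/bar        # stages the new submodule references, not their contents
  git commit -m "Bump foo and bar for feature X"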


Git stores every version of every file ever committed. For text, file versions can be stored efficiently as deltas, so performance is pretty good. Deltas don't work for most binaries, so every clone of the repo ends up storing a complete copy of every version of every binary ever committed. Even if it's "just" 1 MB binaries, if you update them often -- and it sounded like you have some build artifacts in there, is that right? -- you will bloat the repo pretty fast, and big repos are slow simply because there's more data for git to chew through whenever you do something. How big is your .git for the repo in question?
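If you're not sure, git can tell you:

  git count-objects -vH    # size of the packed and loose objects in the local repo
  du -sh .git              # total size of the .git directory on disk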

Git-LFS tries to get around git's issues with binaries by basically replacing the file stored in the repo with a hash and a URL, which LFS then resolves more or less transparently. Git itself no longer needs to store a copy of every version of the binary; it just stores every hash and URL used to resolve it, which is small, so the total repo size stays down and basic repo operations stay responsive. You do have to deal with LFS's own overhead instead, which includes it keeping its own cache of every version of the binaries and so on. You can store small, rarely-updated binaries directly in the tree if you want, but I'd be very hesitant to put anything over 100 kB in there. Never store actual build artifacts that need to be constantly updated in git itself; that's asking for pain. Ideally, if it can't be line-diffed then it doesn't go in git.
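Setting it up is mostly a matter of telling LFS which patterns to take over (the patterns here are just examples):

  git lfs install                    # once per machine/clone, sets up the filters
  git lfs track "*.dll" "*.so"       # writes the patterns to .gitattributes
  git add .gitattributes
  git commit -m "Track binaries with Git-LFS"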

Store prebuilt binaries and build artifacts in a package manager or an artifact store with some useful API. We use Artifactory; I have no real concept of how it compares to other options. Keep a text specification in git -- JSON, XML or whatever -- saying that this project depends on libfoo version 2.4.2, libbar version 12.0.2 and so on. When configuring the build, download any requested version that isn't already present wherever the user has said they keep their externals. As a first version you can do this by making cmake download and unzip binaries from a server if it fails to find them at configure time; it's hacky, but it's what we ran on for years (to my shame). Using nix or nuget or some other actual package management system is ultimately a lot simpler than rolling your own crappy package manager in cmake or some other language, though.
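A sketch of the hacky first version, with the server URL, spec format and file layout all invented for the example:

  #!/bin/sh
  # deps.txt holds "name version" pairs, e.g. "libfoo 2.4.2"
  EXTERNALS="${EXTERNALS_DIR:-$HOME/externals}"
  mkdir -p "$EXTERNALS"
  while read -r name version; do
    if [ ! -d "$EXTERNALS/$name-$version" ]; then
      curl -fsSL "https://artifacts.example.com/$name/$name-$version.tar.gz" \
        | tar -xz -C "$EXTERNALS"
    fi
  done < deps.txt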

Xerophyte
Mar 17, 2008

This space intentionally left blank

smackfu posted:

What does Git LFS look like if someone doesn’t have the LFS client? Do they just get stub files instead of the binaries?

A page and three weeks late but: they will get the stub files and be confused as to why things break.
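The stub is just a small text pointer, something like this (hash and size made up):

  version https://git-lfs.github.com/spec/v1
  oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
  size 524288000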

Having a 500 MB file in a git repo where work is happening is probably not a good idea. You could use LFS or just have a bootstrap script or makefile download it if this is an environment where that makes sense.

A possible compromise is to make a submodule with only the binary. That at least isolates the giant binary data from the repository where work happens, while keeping the binary reasonably easy to update at need. The downside is that now you have a submodule and submodules are evil.
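The submodule version is just the usual dance, with a made-up repo URL and path:

  git submodule add https://example.com/bigdata.git third_party/bigdata
  git commit -m "Add the big binary as a submodule"
  # anyone who actually needs the file then runs
  git submodule update --init third_party/bigdata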

Xerophyte
Mar 17, 2008

This space intentionally left blank

Cuntpunch posted:

I think the distinction is that their definition of 'logical changes' is extremely fine-grained, possibly to a fanatical degree. I'm talking "if you rename a thing, that should be a reviewed standalone commit" level of granularity.

There are a couple of different ways you can structure a git workflow, some of which are footguns and some of which work pretty well if everyone is on board. git was originally designed to make it easy to email patch files full of commits to a central maintainer, after all, which is a workflow pretty much no one uses.

The various third-party tools are usually more opinionated. Gerrit's patch-set workflow specifically expects that you'll have fairly large commits that are functionally separate from one another and can be independently edited in response to feedback during the review process. Highly granular histories are not really a good match: you can end up resolving a ton of rebase conflicts after an amend, and it can be hard to match post-review fixes to specific commits, as you noticed.
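For reference, the Gerrit loop is roughly this (it assumes the commit-msg hook has already stamped a Change-Id on the commit):

  git push origin HEAD:refs/for/master   # upload the commits as changes for review
  git commit --amend                     # address feedback by editing the commit itself
  git push origin HEAD:refs/for/master   # upload the amended commit as a new patch set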

I'm used to rebase-heavy workflows -- we ran Critic as our code review tool which is built around review-by-commit plus using git rebase -i --autosquash and fixup! commits for review revisions, which is the sort of thing that gives people nightmares -- so I don't think rebasing a feature with a couple of commits in it is that bad. Frequently rebasing a large feature while preserving individual renames in individual commits sure does sound painful and insane though.
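The fixup flow, for anyone who hasn't seen it, is roughly:

  git commit --fixup=<sha-of-reviewed-commit>    # records a "fixup! ..." commit for that change
  git rebase -i --autosquash origin/master       # reorders and squashes the fixups into their targets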

I guess in your shoes I'd have a chat with my coworkers about what problems I've been having with what I've been interpreting as their preferred workflow combined with Gerrit's amend-and-rebase reviews, and what I'd like to do differently (e.g. squash harder for a cleaner history). It sounds a lot like they're making life difficult for themselves, but maybe they have a plan or are open to feedback. If their reply is "shut up, noob" or some version thereof: well, uh ... poo poo.

Xerophyte
Mar 17, 2008

This space intentionally left blank

peepsalot posted:

But, when doing a merge, as far as I can tell it only compares the "endpoints" of the merge, and does not consider intermediate commits for the purpose of tracking file renames.

I just wish git was not so goddamn stupid... like I don't suppose there is any way to have merges just consider individual commits one-at-a-time when resolving a file rename, rather than just the entire before/after?
Or like a way to change the "similarity index" metric to be character-based rather than line-based; I'm pretty sure the files would pass in that case too.

Merge is explicitly just merging endpoints. Rebase exists to let you apply the commits of the branch you are rebasing one-by-one onto the commit you are rebasing on, if you prefer to do that -- I find it generally preferable to always rebase whenever I'm updating a feature branch, but opinions very much differ on that sort of thing. However for an old, stale branch rebase can be very painful, since odds are the early commits will have conflicts in code that is changed in later commits. You often end up resolving multiple conflicts in different commits as the same region of code gets updated. You can change how broadly an individual git rebase (or merge) detects renames with the git rebase -X find-renames=10% strategy option, which may be useful when dealing with something that's exceedingly diverged.
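That is, something along the lines of:

  git fetch origin
  git rebase -X find-renames=10% origin/master   # lower the similarity threshold for rename detection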

You can technically also use an interactive rebase to interleave the commits of the two branches in some arbitrary order starting from some common ancestor, but unless everyone working on the codebase is OK with doing a big history rewrite on master I would strongly recommend against trying that.

Xerophyte
Mar 17, 2008

This space intentionally left blank

bobmarleysghost posted:

So going the rebase route, it won't merge any of my changes into the main branch? (I don't want my changes merged, yet)

No, nothing is merged into the main branch. Rebasing E on C means taking the deltas of each commit in the branch being rebased -- D and E here -- and replaying those deltas on top of C. It's meant to do exactly the thing you're trying to do.
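With made-up branch names, before and after:

  Before:                              After "git rebase main" on feature:

  A---B---C     (main)                 A---B---C              (main, untouched)
       \                                        \
        D---E   (feature)                        D'---E'       (feature, replayed copies of D and E)

  git checkout feature
  git rebase main        # D and E become new commits D' and E' on top of C; main itself never moves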
