smackfu
Jun 7, 2004

Keep in mind that git hooks are client side so don’t enforce any real restrictions.
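To illustrate the point about client-side hooks, here is a minimal sketch in a throwaway repository. The hook body and file names are made up for the example; the only thing it relies on is that `git commit --no-verify` skips the pre-commit hook, which is why a local hook can nudge people but cannot enforce anything.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical local policy: a pre-commit hook that rejects every commit.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
echo "pre-commit: commits are blocked by local policy" >&2
exit 1
EOF
chmod +x .git/hooks/pre-commit

echo "data" > notes.txt
git add notes.txt

# The hook stops a normal commit...
if git commit -q -m "blocked" 2>/dev/null; then blocked=no; else blocked=yes; fi

# ...but anyone can skip it, because the hook lives only in their own clone.
git commit -q --no-verify -m "bypassed the hook"

echo "hook blocked normal commit: $blocked"
echo "commits in repo: $(git rev-list --count HEAD)"
```

Real enforcement would have to happen on the hosting side (for example protected branches or server-side checks), which this sketch does not cover.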

Oysters Autobio
Mar 13, 2017
Anyone here ever adopt git for a data analytics team?

Trying to slowly push our team to adopt more SWE best practices while not overwhelming them so it'd be interesting to hear people's adaptation of gitflow or whatever with data analytics.

I'm talking here about maintaining both analytical products (jupyter notebook reports, visualizations etc) and small analytical data models (sharing reusable python transform and utility scripts for wrangling data). Can't version control our Tableau dashboards or anything (guess we could but no point for diffs or anything) but at least our notebooks and scripts.

We don't technically run any ETL or anything, but our data comes from all over the place, so we're often doing lots of wrangling and cleaning. Because most folks don't have a dev or SWE background, everyone's keeping their notebooks saved locally or swapping them via email.

We use gitlab and have jupyterhub with the git extension, so I wanna write up a little SOP for a very barebones stripped down workflow of using a central repo. Thinking here to just have a dev and master branch with each person creating feature branches and merging into dev for peer review then into master on "production".
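A rough command-line sketch of the dev/master/feature-branch flow described above, run in a throwaway local repository. The branch name `feature/clean-sales-data` and the file `transforms.py` are hypothetical, and in GitLab the two merges would normally be merge requests reviewed in the GUI rather than local `git merge` calls.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/analytics"
cd "$work/analytics"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "# Team analytics" > README.md
git add README.md
git commit -q -m "Initial commit"

# Long-lived integration branch where peer review happens.
git branch dev

# One feature branch per task (names are hypothetical).
git checkout -q -b feature/clean-sales-data dev
echo "def clean(df): return df.dropna()" > transforms.py
git add transforms.py
git commit -q -m "Add sales cleaning transform"

# After review, merge the feature into dev...
git checkout -q dev
git merge -q --no-ff -m "Merge feature/clean-sales-data into dev" feature/clean-sales-data

# ...and promote dev to master when it is ready for "production".
git checkout -q master
git merge -q --no-ff -m "Release dev to master" dev

git --no-pager log --oneline --graph
```

Using `--no-ff` keeps a merge commit for each step, so the history shows when a feature was reviewed and when it was released.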

Most folks will prob be using the Gitlab GUI for this, so I'm thinking to setup lots of templates and also sync it up with our tasking tickets in Jira.

Also doing this out of a selfish desire to learn CI, GitLab runners and such, because our workflow for "publishing" reports involves a bunch of different manual buttonology and copy-paste duplication that I think we could automate.

Any tips on stripping a GitLab project to its bare bones, and/or a decent user guide for such a workflow that I could bootstrap from?

Oysters Autobio fucked around with this message at 18:52 on Aug 20, 2023

porkface
Dec 29, 2000

Have you looked at Meltano?

Oysters Autobio
Mar 13, 2017

porkface posted:

Have you looked at Meltano?

I had previously, but for some reason looked past it. Thanks for flagging, it looks like it actually would be a very good way to approach this in a prepackaged way.

Slimchandi
May 13, 2005
That finger on your temple is the barrel of my raygun
I use meltano for pretty much all our extract pipelines, plus you can normally adapt an existing tap for a specific scenario and add/remove what you don't need.

We are also a PowerBI shop so the latest changes with .pbir files move us a step forward to being able to change control even more. Bring it on.

Oysters Autobio
Mar 13, 2017

Slimchandi posted:

I use meltano for pretty much all our extract pipelines, plus you can normally adapt an existing tap for a specific scenario and add/remove what you don't need.

We are also a PowerBI shop so the latest changes with .pbir files move us a step forward to being able to change control even more. Bring it on.

What changes did PBI get that support version control? My big gripe with Tableau is it sucks for version control let alone templating.

bobmarleysghost
Mar 7, 2006



I have a rookie question, what's the best way to get any changes from the main branch into my dev branch?

Basically I want to go from here:


code:

A --- B --- C (main)
\
 D -- E (mydev)



To here:



code:

A --- B --- C (main)
             \
              D -- E (mydev)

korora
Sep 3, 2011
There are a couple of ways to do this but the picture you drew is a rebase (which is also the correct way).

bobmarleysghost
Mar 7, 2006



So going the rebase route, it won't merge any of my changes into the main branch? (I don't want my changes merged, yet)

Xerophyte
Mar 17, 2008

This space intentionally left blank

bobmarleysghost posted:

So going the rebase route, it won't merge any of my changes into the main branch? (I don't want my changes merged, yet)

No. Rebasing E on C means taking the deltas of each commit in the branch being rebased -- so D and E here -- and replaying those deltas on C. It's exactly meant to do the thing you're trying to do.
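As a sketch of that, the following builds the A/B/C and D/E history from the diagrams in a throwaway repository and rebases `mydev` onto `main`. The `commit` helper and file names are just scaffolding for the example. Recording `main` before and after shows that the rebase only moves `mydev`.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
# Helper: one commit per letter, each touching its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git checkout -q -b mydev      # mydev branches off an older commit
commit D
commit E
git checkout -q main
commit B
commit C
main_before=$(git rev-parse main)

# Replay D and E on top of C. Only mydev moves.
git checkout -q mydev
git rebase -q main

main_after=$(git rev-parse main)
git --no-pager log --oneline --graph --all
```

Getting the work into `main` would still need a separate merge (or merge request) later.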

bobmarleysghost
Mar 7, 2006



nvm, thanks for the help!

bobmarleysghost fucked around with this message at 18:57 on Oct 5, 2023

StumblyWumbly
Sep 12, 2007

Batmanticore!
I may be wrong but rebasing dev seems particularly dangerous. My recollection is rebasing is poo poo if you have a lot of branches or the code has been pulled to multiple locations.
Pre rebase, most branches see the code as
code:
A --- B --- C (main)
\
 D -- E (mydev)
If you have branch X coming off E, then you rebase E, then you try to merge X into E, it will see that the E you're trying to merge into is not the E in its history and it will not know how to do the merge.

necrotic
Aug 2, 2005
I owe my brother big time for this!
This is his own branch not a general dev branch. Rebasing shared branches like a general dev or the main should be kept minimal but go ham on your own branches.

edit: and in that described case you also rebase X onto the new E after it was rebased.
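One way to do that follow-up rebase is `git rebase --onto`, sketched here in a throwaway repository. It assumes you still have the pre-rebase E available (saved in `old_e` here; the reflog would also have it), and the `commit` helper and the extra commit F on X are made up for the example.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git checkout -q -b mydev
commit D
commit E
git checkout -q -b X          # X branches off E
commit F
git checkout -q main
commit B
commit C

old_e=$(git rev-parse mydev)  # remember the pre-rebase E

git checkout -q mydev
git rebase -q main            # mydev is now C - D' - E'

# Take only the commits after the old E (just F) and replay them
# on top of the rebased mydev.
git rebase -q --onto mydev "$old_e" X

git --no-pager log --oneline --graph X
```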

StumblyWumbly
Sep 12, 2007

Batmanticore!
Yeah, you're right, now I remember. Doing it on your own branch is generally fine, but one time I had pulled someone else's feature branch so I could help them out, and when they rebased it meant I couldn't just pull in the branch; I ended up needing to blow up some stuff and start over.
I think the rule we settled on is just don't rebase if your mydev branch has been pushed or branched off of.

RPATDO_LAMD
Mar 22, 2013

🐘🪠🍆
Yeah, rebasing rewrites commit history instead of just appending to it, and any history-rewriting operation becomes a huge pain in the rear end if anyone else has a copy of the branch.
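A small sketch of why that hurts once a branch has been shared: after a rebase, the local branch is no longer a descendant of the pushed one, so a plain push is rejected and the remote copy has to be overwritten. The local bare repository `origin.git` stands in for a real server here, and the `commit` helper is just scaffolding.

```shell
set -eu
work=$(mktemp -d)
git init -q --bare "$work/origin.git"     # stand-in for a shared remote
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git remote add origin "$work/origin.git"
git push -q origin main

git checkout -q -b mydev
commit D
git push -q origin mydev                  # the branch is now shared

git checkout -q main
commit B
git checkout -q mydev
git rebase -q main                        # D is rewritten as D'

# The remote still has the old D, so a normal push is refused.
if git push -q origin mydev 2>/dev/null; then
  push_result=accepted
else
  push_result=rejected
fi
echo "plain push after rebase: $push_result"

# Overwriting the remote works, but everyone holding the old D
# now has to reset or re-rebase their copy.
git push -q --force-with-lease origin mydev
```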

Tequila Bob
Nov 2, 2011

IT'S HAL TIME, CHUMPS
I usually recommend merges instead of rebasing. That way, each commit is a record of the context in which it was written, and the merge commits themselves are a record of the conflict resolution steps - or a record affirming that there was no conflict.
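For comparison with rebasing, a sketch of the merge route in a throwaway repository: `main` is merged into `mydev`, the original D and E keep their hashes, and a merge commit with two parents records the integration. The `commit` helper and file names are made up for the example.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git checkout -q -b mydev
commit D
commit E
old_e=$(git rev-parse mydev)   # E before bringing in main

git checkout -q main
commit B
commit C

# Bring main's changes into mydev without rewriting anything.
git checkout -q mydev
git merge -q -m "Merge main into mydev" main

git --no-pager log --oneline --graph
```

As with the rebase, `main` itself is untouched; only `mydev` gains a commit.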

This may be because I started in Mercurial before Git.

Maigius
Jun 29, 2013


I managed to disconnect my Git repository from the Eclipse project. What's the best way to reconnect the two?

spiritual bypass
Feb 19, 2008

Grimey Drawer
What does git status on the command line say?

Maigius
Jun 29, 2013


spiritual bypass posted:

What does git status on the command line say?

I have no idea how to check that. How would I go about doing that? I've only really used the functionality that's accessible through the Eclipse GUI.

smackfu
Jun 7, 2004

I think he means type “git status” on the command line.

New Yorp New Yorp
Jul 18, 2003

Only in Kenya.
Pillbug

Maigius posted:

I have no idea how to check that. How would I go about doing that? I've only really used the functionality that's accessible through the Eclipse GUI.

You're doing yourself a tremendous disservice by not learning at least some basic Git CLI commands.

Maigius
Jun 29, 2013


smackfu posted:

I think he means type “git status” on the command line.

Ok, when I do that while in the folder it shows me a giant list of files that had changed between local and remote. In Eclipse however, it's not showing the push or pull options and the project title went from project-name(branch) to just project-name.

spiritual bypass
Feb 19, 2008

Grimey Drawer
Ok, so at least that means git is still there and working

nielsm
Jun 1, 2009



At the very top of the git status, does it say what branch you're in? Or does it perhaps say "detached head"? Or maybe a merge or rebase in progress?
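A sketch of checking those things from the command line, using a throwaway repository that is deliberately left mid-merge with a conflict. The file and branch names are hypothetical. `git symbolic-ref` prints nothing when HEAD is detached, and `MERGE_HEAD` only exists while a merge is in progress.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > report.txt
git add report.txt
git commit -q -m "base"

git checkout -q -b feature
echo "feature edit" > report.txt
git commit -q -am "feature edit"

git checkout -q main
echo "main edit" > report.txt
git commit -q -am "main edit"

# Both branches changed the same line, so this merge stops on a conflict.
git merge feature >/dev/null 2>&1 || true

# Which branch am I on? Empty output would mean a detached HEAD.
branch=$(git symbolic-ref --short -q HEAD || true)

# Is a merge in progress?
if git rev-parse -q --verify MERGE_HEAD >/dev/null; then
  merging=yes
else
  merging=no
fi

echo "branch: ${branch:-detached HEAD}"
echo "merge in progress: $merging"
git status --short    # conflicted files are marked UU
```

From this state, `git merge --abort` backs out of the merge, or you resolve the files, `git add` them, and commit.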

Maigius
Jun 29, 2013


nielsm posted:

At the very top of the git status, does it say what branch you're in? Or does it perhaps say "detached head"? Or maybe a merge or rebase in progress?

I still see the branch name, I was in the middle of a merge with a bunch of conflicts when I hit the disconnect option.

StumblyWumbly
Sep 12, 2007

Batmanticore!

Maigius posted:

I still see the branch name, I was in the middle of a merge with a bunch of conflicts when I hit the disconnect option.

It sounds like the Git part is fine, but Eclipse has set itself to ignore Git or forgot how to access git. Hope that tells you where to start looking.
The git info is all stored in .git which is in the project directory, there isn't much subtle going on there, and it sounds like .git is fine or the command line would be giving you issues.

Maigius
Jun 29, 2013


StumblyWumbly posted:

It sounds like the Git part is fine, but Eclipse has set itself to ignore Git or forgot how to access git. Hope that tells you where to start looking.
The git info is all stored in .git which is in the project directory, there isn't much subtle going on there, and it sounds like .git is fine or the command line would be giving you issues.

I got Eclipse and Git talking again and most of the merging issues cleaned up. I 100% need to learn more about Git. Thanks everyone for your help.

Tequila Bob
Nov 2, 2011

IT'S HAL TIME, CHUMPS
Learning Git isn't the easiest thing in the world, but there are two reasons you should:

1. It will pay dividends throughout your entire career of software development

2. You only have to learn it once. It's not like learning a programming language that gets new features every couple of years; learn Git once, know it forever. At least, that's been my experience.

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
I will preface this by saying I do all my day-to-day git work in GitHub desktop, but it's really useful to go through some tutorials to be able to do everything you'd normally do via the command line. Git has kept a focus on backwards compatibility, to the point where some of the Git workflows don't really make any sense, and so things like GitHub Desktop (or I'm assuming your Eclipse plugin) abstract them somewhat. On a day to day basis it's easier to just use that abstraction, but it's useful to know what's happening under the hood so that when stuff gets messed up, or you need to do something outside of that tool, you're not completely lost.

Vanadium
Jan 8, 2005

Tequila Bob posted:

Learning Git isn't the easiest thing in the world, but there are two reasons you should:

1. It will pay dividends throughout your entire career of software development

2. You only have to learn it once. It's not like learning a programming language that gets new features every couple of years; learn Git once, know it forever. At least, that's been my experience.

I bet that's what they told people about why to learn svn.

Tequila Bob
Nov 2, 2011

IT'S HAL TIME, CHUMPS

Vanadium posted:

I bet that's what they told people about why to learn svn.

Undeniably true!

That said, if something replaces Git, I'll eat my words. (Happily, too - I don't think Git is the best VCS, though it is very good. I prefer Mercurial's branching and UI.)

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Vanadium posted:

I bet that's what they told people about why to learn svn.

No, I'm pretty sure no one ever said that about svn. It never had even a fraction of the dominance that git has now, and there was only a year between SVN 1.0 and git's first release. Even during SVN's heyday it was clear that one of darcs, git, or mercurial was going to replace it, and it was just a question of which one and when.

lifg
Dec 4, 2000
<this tag left blank>
Muldoon
That’s weird, I always thought SVN was an old and established piece of software by the time git appeared, but you’re right.

necrotic
Aug 2, 2005
I owe my brother big time for this!
CVS was that, SVN was trying to be a better CVS.

smackfu
Jun 7, 2004

Did CVS and SVN actually need a server like git, or could they just work with a shared network drive?

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
Git doesn't need a server...

necrotic
Aug 2, 2005
I owe my brother big time for this!

smackfu posted:

Did CVS and SVN actually need a server like git, or could they just work with a shared network drive?

A shared drive in the context of CVS/SVN isn’t exactly much different than a central server. You still have a central server: it’s just for raw files instead of the SVN protocol.

And then you get to trust that shared network drive can correctly and safely handle frequently accessed _and mutated_ files across all users.

I wouldn’t be surprised to hear people did that, but running a server for the tools wasn’t exactly difficult.

It’s easy to forget, too, that branching in those systems _sucked_. Literally full copies of the entire trunk into another folder was a “branch”. At least with a central server if you all worked off trunk your changes only made it to others if you committed them. If it was a shared drive? Good luck.

RPATDO_LAMD
Mar 22, 2013

🐘🪠🍆

smackfu posted:

Did CVS and SVN actually need a server like git, or could they just work with a shared network drive?

SVN does need a server, it doesn't store the diffs locally.
one of its worst features is that checking out a branch entails re-downloading all the files in the repository from the server, so if you try the typical git thing of making new feature branches constantly you will want to tear your hair out within a day.

The main benefit of central-server-based VCSes like these is the ability to acquire an exclusive write lock on files (some call this "checking out" the file, but that's confusing), so if you're loving around with some impossible-to-merge binary file you can ensure that nobody else touches it and you won't see merge conflict hell.
For example in game dev Unreal Engine stores a lot of code in binary "blueprint" files. if the editor's version control integration is hooked up to an SVN or Perforce server, the editor will prompt you whenever you modify a blueprint / have uncommitted changes to lock it in the VC. Using git for this is extremely painful since you won't know that two people are touching the same file until you hit an inevitable conflict, and since the merge tools for blueprints are nonexistent one dev or the other is going to have to re-make all their changes.

RPATDO_LAMD fucked around with this message at 02:06 on Oct 12, 2023

smackfu
Jun 7, 2004

necrotic posted:

A shared drive in the context of CVS/SVN isn’t exactly much different than a central server. You still have a central server: it’s just for raw files instead of the SVN protocol.

This was probably in the early 2000s when I was using it at work so shared windows drives were easy and already existed and running a dedicated server was a big deal.

ToxicFrog
Apr 26, 2008


smackfu posted:

Did CVS and SVN actually need a server like git, or could they just work with a shared network drive?

Git doesn't need a server. That's the "distributed" part of DVCS and one of the immediately obvious wins (that it shares with hg and other DVCS), especially for casual "I just want to version some personal projects and not be a server janitor" use, versus stuff like P4 and SVN.
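A minimal sketch of the no-server case: the whole history lives in the `.git` directory next to the files, and a second copy is made by cloning from a filesystem path. The directory and file names are made up, and whether a shared network drive is a safe place for that path is a separate question this example does not settle.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/project"
cd "$work/project"
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "First commit"     # stored entirely in ./.git, no server

# "Sharing" can be as simple as cloning from a path.
git clone -q "$work/project" "$work/copy"
git -C "$work/copy" --no-pager log --oneline
```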

SVN requires a server, and requires communicating with the server for many more operations than git does (because most of the history is stored on the server, unlike git which stores a complete local copy), which makes it brutally slow if you have a slow network connection or high latency to the server. I believe that modern SVN is somewhat better about this and caches more things locally, but it still needs a server backing it. (Last time I used it there were two server implementations, one that required a full Apache install with a bunch of special modules loaded and a much easier to administer standalone server (svnserve) with a giant "don't loving use this it will destroy your data forever" warning on it. I think svnserve is now actually usable, but don't quote me on that.)

CVS also requires a server, but I've never actually used it. So does P4, and for personal use it is actually way easier to janitor than SVN. I ran it for a few years until git came along and I dumpstered centralized version control in my personal life forever.
