Oysters Autobio
Mar 13, 2017
I know years ago there were lots of ideas floating around about getting non-tech people to start using actual version control that didn't consist of Report2022_v1_fbls_Sallys_finalFINAL.docx, but other than slightly more appealing GUIs like GitKraken, did anyone ever actually develop some type of git-backed document management system for a document-heavy organization? I know many tech orgs just make their non-tech teams use Gitlab or Github or whatever, but they can do this because they have enough sway to force them to learn it. I work in a very traditional organization that doesn't even use SharePoint, let alone some type of structured version control, and instead emails everything around. Oh, and for our use-case it needs to be self-hosted / on-prem, lol, for reasons unknown to me.

Something like a very stripped-down Github with a better UI, a branching DVCS, and built on markdown so you could visualize the entire workflow and each diff of a document. A self-hosted version of this Almanac app or Forestry IO, so you've got a WYSIWYG rich text editor with version control for the non-tech users, but it's fully connected to a git repository so the more technically inclined can push out markdown and integrate it with their other workflows, and you could do stuff like linting and other pipeline jobs.

Atlassian started doing this with their Work Management or whatever, but that's only an issue tracker and doesn't have any version control.

Edit: I guess what I mean here is a 'docs as code' GUI for markdown-based version control, designed not for technical users but for people like lawyers, policy folks, govt and the humanities side of academia who want to collaborate on very finely detailed docs, where Google Docs would be too flexible but GitHub would be too 'scary'.

Oysters Autobio fucked around with this message at 01:48 on Jul 27, 2022

Oysters Autobio
Mar 13, 2017

New Yorp New Yorp posted:

Git (and dvcs in general) is 100% the wrong tool for this job. Beside being insanely complicated, it's unsuitable for storing revisions of non-text assets. Why invest the effort into shoehorning a tool to work with content types and work flows for which it is, by design, wholly unsuitable?

If you need document management, use a document management tool. There are tons out there and they all suck in their own special way, but using git would be 1000x worse.

Aside from the complexity, what do you think are the biggest drawbacks for it? If you worked off of it in a compatible format like Markdown which then could run a pipeline to use pandoc or something else to convert it to whatever other format you need, would it not work to show those kind of revisions?
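To make the Markdown-to-whatever step concrete, here's a rough sketch of what one pipeline job could look like. It assumes pandoc is installed and on the PATH; the function names and the default docx target are made up for illustration, and a real job would loop over every file in the repo.

```python
import subprocess
from pathlib import Path


def pandoc_command(source: Path, target_format: str = "docx") -> list[str]:
    """Build the pandoc call that converts one Markdown file.

    pandoc infers the output format from the output file's extension.
    """
    output = source.with_suffix(f".{target_format}")
    return ["pandoc", str(source), "--from", "markdown", "--output", str(output)]


def convert(source: Path, target_format: str = "docx") -> None:
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, target_format), check=True)
```

Calling `convert(Path("report.md"))` would write `report.docx` next to the source file, so the Markdown stays the thing under version control and the Word file is just a build artifact.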

I thought "by design" Git is used to store complicated revisions of text (code). If you were say, a collaborative team of researchers writing out a complicated paper, wouldn't the complexity of the tool be worth the added tracking of revisions, commentary and the ability to branch? Or say legislators/governance teams or whatever drafting complex policy/legal docs but want to be able to allow for contributions without fear of it messing up their originals (I guess Wikis have version control too but in highly bureaucratic organizations getting "approval" is sacrosanct unfortunately) and who need precise tracking of revisions?

The only other document management tool I've used is something like SharePoint, which requires you to "check in" or "check out" entire documents, and that completely prevents concurrent work. I guess merging different branches could be difficult if you were collaborating line by line, but couldn't Git merge text documents easily enough if different people were working on, say, entirely different sections?
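As a toy illustration of why edits to different sections merge cleanly, here's a three-way merge done per heading. This is not how Git works internally (Git merges line-based hunks, not sections), and the helper names are invented for the sketch; it also naively treats any line starting with "#" as a heading.

```python
def split_sections(markdown: str) -> dict[str, str]:
    """Map each heading line to the body text under it."""
    sections: dict[str, list[str]] = {}
    current = ""  # text before the first heading is stored under ""
    for line in markdown.splitlines():
        if line.startswith("#"):
            current = line.strip()
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(line)
    return {heading: "\n".join(body).strip() for heading, body in sections.items()}


def merge_sections(base: dict, ours: dict, theirs: dict) -> tuple[dict, list]:
    """Three-way merge per section; returns (merged, conflicting headings)."""
    merged, conflicts = {}, []
    for heading in dict.fromkeys([*base, *ours, *theirs]):
        b, o, t = base.get(heading), ours.get(heading), theirs.get(heading)
        if o == t:        # both sides agree (or both left it alone)
            result = o
        elif o == b:      # only their side changed this section
            result = t
        elif t == b:      # only our side changed this section
            result = o
        else:             # both changed it differently: needs a human
            conflicts.append(heading)
            result = o    # keep our version as a placeholder
        if result is not None:
            merged[heading] = result
    return merged, conflicts
```

If one person rewrites the intro and another rewrites the methods section, both changes land and the conflict list stays empty. Only a section that both people touched gets flagged for manual resolution.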

Also interesting would be to use some of the language-based linters people have developed that can automatically scan for things like style guides (i.e. readability, plain language use, lack of passive voice, etc.).
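For a sense of what such a check does, here's a deliberately crude sketch. Real prose linters (Vale and proselint are two that exist) are far more thorough; the 30-word limit and the passive-voice regex below are assumptions for illustration and will produce both false positives and misses.

```python
import re

# Naive heuristic: a form of "to be" followed by a word ending in "-ed".
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+ed\b", re.IGNORECASE)
MAX_WORDS = 30  # assumed style-guide limit, not taken from any real guide


def lint_prose(text: str) -> list[str]:
    """Return one warning string per style problem found."""
    warnings = []
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for number, sentence in enumerate(sentences, start=1):
        if len(sentence.split()) > MAX_WORDS:
            warnings.append(f"sentence {number}: longer than {MAX_WORDS} words")
        if PASSIVE.search(sentence):
            warnings.append(f"sentence {number}: possible passive voice")
    return warnings
```

Run as a pipeline job on each merge request, a non-empty warning list could fail the job or just post a comment, depending on how strict the style guide is meant to be.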

Maybe I'm conflating Git with just being able to do "version control with branching", along with using Markdown for What You See Is What You Mean (WYSIWYM) so that you can focus on just the text and let automation handle all the templating and formatting junk that people get stuck in when writing in MS Word. Definitely agree that using Git as-is would be too clunky for this, so that's why I'm curious if anyone's modded something like Gitlab, stripped out the unnecessary parts and rebuilt it with things like a rich text markdown editor that could be used in MRs.

Oysters Autobio
Mar 13, 2017

New Yorp New Yorp posted:

Your average non-technical document author is not going to be using Markdown (or Latex). They're going to be using Word or some other binary format. For anyone technical enough to be using an actual diffable text format, just learning enough Git to be dangerous is enough.

This seems very much like an idea that doesn't have an audience combined with every problem looking like a nail.

Yeah, I guess you're right. I just want people to stop emailing random ad-hoc versions of Word documents, so the appealing parts of Git are the centralization, the "single source of truth" and the version control, but none of those things are unique to Git.

Unfortunately, most corporate enterprise document management software out there is just garbage, so instead people are back to shared drives and email. I'm just gonna try and get people to use Confluence at least; that's maybe an achievable goal.

Oysters Autobio
Mar 13, 2017
Anyone here ever adopt git for a data analytics team?

Trying to slowly push our team to adopt more SWE best practices while not overwhelming them so it'd be interesting to hear people's adaptation of gitflow or whatever with data analytics.

I'm talking here about maintaining both analytical products (jupyter notebook reports, visualizations etc) and small analytical data models (sharing reusable python transform and utility scripts for wrangling data). Can't version control our Tableau dashboards or anything (guess we could but no point for diffs or anything) but at least our notebooks and scripts.
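One practical wrinkle with notebooks in git is that saved outputs and execution counts make diffs noisy. A minimal sketch of stripping them before committing, assuming the standard nbformat v4 JSON layout; the function name is made up, and the existing nbstripout tool does this job more completely.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return the notebook text with code-cell outputs and execution counts removed."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []             # drop rendered results and plots
            cell["execution_count"] = None   # drop the In[n] counter
    return json.dumps(notebook, indent=1) + "\n"
```

With outputs gone, a diff between two commits shows only changes to the code and markdown cells, which is what a peer reviewer actually needs to read.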

We don't technically run any ETL or anything, but our data comes from all over the place so we're often doing lots of wrangling and cleaning and because most folks don't have a dev or SWE background everyone's keeping their notebooks saved locally or swapping them via email.

We use gitlab and have jupyterhub with the git extension, so I wanna write up a little SOP for a very barebones, stripped-down workflow using a central repo. Thinking here to just have a dev and a master branch, with each person creating feature branches and merging into dev for peer review, then into master as "production".
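The merge rules in that workflow are small enough to write down as code, which could double as a check in the SOP or a pipeline job. This is only a sketch: the "feature/" branch prefix is an assumed naming convention, and the function names are invented.

```python
# Where each kind of branch is allowed to merge: feature -> dev -> master.
ALLOWED_TARGETS = {"feature": "dev", "dev": "master"}


def branch_kind(branch: str) -> str:
    """Classify a branch name as 'dev', 'master', 'feature' or 'unknown'."""
    if branch in ("dev", "master"):
        return branch
    if branch.startswith("feature/"):  # assumed naming convention
        return "feature"
    return "unknown"


def merge_allowed(source: str, target: str) -> bool:
    """True only for feature -> dev and dev -> master merges."""
    return ALLOWED_TARGETS.get(branch_kind(source)) == target
```

So `merge_allowed("feature/clean-sales-data", "dev")` passes, while a feature branch going straight to master is rejected, which keeps peer review on dev from being skipped.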

Most folks will prob be using the Gitlab GUI for this, so I'm thinking to set up lots of templates and also sync it up with our tasking tickets in Jira.

Also doing this out of a selfish desire to learn CI, gitlab runners and such, because our workflow for "publishing" reports involves a bunch of different manual buttonology and copy-paste duplication that I think we could automate.

Any tips on stripping a Gitlab project to its bare bones and/or a decent user guide for such a workflow I could bootstrap from?

Oysters Autobio fucked around with this message at 18:52 on Aug 20, 2023

Oysters Autobio
Mar 13, 2017

porkface posted:

Have you looked at Meltano?

I had previously, but for some reason looked past it. Thanks for flagging, it looks like it would actually be a very good way to approach this in a prepackaged way.

Oysters Autobio
Mar 13, 2017

Slimchandi posted:

I use meltano for pretty much all our extract pipelines, plus you can normally adapt an existing tap for a specific scenario and add/remove what you don't need.

We are also a PowerBI shop so the latest changes with .pbir files move us a step forward to being able to change control even more. Bring it on.

What changes did PBI get that support version control? My big gripe with Tableau is that it sucks for version control, let alone templating.
