nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...
Maybe this is less of a programming question and more of a programming tool or programmer question but: can anyone suggest alternatives to Trac?

We're a small bioinformatics lab, rolling our own software and libraries for use by ourselves and sometimes other people. For a while, I've felt that we should be more disciplined about our code and knowledge: have all the code committed to central repositories and backed up, and have a single place for logging issues, writing down common knowledge and downloading the latest version of our releases. Trac seemed to be the obvious choice - everyone uses it - and I delegated the job to one of my colleagues. However, he's been having terrible problems getting it configured, stumbling over one problem after another, and has been roadblocked for a week trying to get it to talk to the database. (The db is running, the connection details are correct, but Trac just keeps reporting that it is unable to connect.) From mailing lists and googling, it looks like our experience isn't uncommon.

Naturally, I wonder what the alternatives are. Any suggestions? For what it's worth, we mainly work in Python, with some Javascript and Java sidelines.


nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

Milde posted:

Trac's pretty amazing, I can't imagine why you'd want to throw away such a great piece of software. Are you trying to use MySQL or PostgreSQL with it? It's really meant to be used with SQLite, and there really isn't any good reason not to use SQLite, and that's painless to set up.

Apart from the whole "not working" bit? Admittedly none of us are sys-admins, but it's been fighting back for over a fortnight now. It might be great, but at this stage it's only natural to wonder how much more time we have to devote to just setting it up.

We started with SQLite, had no luck and swapped to PostgreSQL, largely to see if that would isolate the problem. Still no dice - there's a non-informative failure at the connect stage.

quote:

If you are using SQLite, make sure the user running Trac has read/write permissions to the Trac instance folder and the database file. Beyond that, there isn't really much that could go wrong with the database.

Thanks! That might be worth a look - we installed Trac as its own user but maybe one of the perms has gone astray.
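
A minimal sketch of that permission check, to be run as the user that runs Trac. The function name and the example path are hypothetical; the only thing assumed about Trac is that the SQLite database is an ordinary file inside the environment folder. SQLite also needs to create journal files next to the database, which is why the containing folder is checked as well as the file.

```python
import os

def check_sqlite_access(db_file):
    """Return a list of problems that would stop the current user using db_file."""
    problems = []
    folder = os.path.dirname(os.path.abspath(db_file))
    # SQLite writes journal files beside the database, so the folder must be writable too.
    if not os.path.isdir(folder):
        problems.append(f"{folder}: folder not found")
    elif not os.access(folder, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"{folder}: need read/write/execute on the folder")
    if not os.path.isfile(db_file):
        problems.append(f"{db_file}: database file not found")
    elif not os.access(db_file, os.R_OK | os.W_OK):
        problems.append(f"{db_file}: need read/write on the database file")
    return problems

if __name__ == "__main__":
    # Hypothetical location; substitute the real environment path.
    for problem in check_sqlite_access("/var/trac/myproject/db/trac.db"):
        print(problem)
```

An empty list means the permissions are not the culprit, at least for the account the script was run under.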

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...
I'm casting about looking for a suitable cross-platform installer. That is, I have some commandline programs to distribute, that have to be installed by non-technical types, and I'd rather work with a single installer type / product, even if I'll have to produce different installers for each platform.

Of course, the best examples are commercial and expensive (InstallAnywhere, InstallBuilder etc.) or don't support Macs. IzPack looked good, but doesn't seem to have any support for installs that require permissions. That is, if the user wants to install into /usr/local/bin, IzPack just complains it can't write to that directory, rather than asking for credentials. (The release refers to a feature called "SudoPanel", but it's completely undocumented.)

I've also looked at vainstall, Openjinstaller, jexpress, Advanced Installer, Install4J, InstallJammer, Liftoff, Mini, and Inno Setup. (Undocumented and buggy, no Mac version & still in dev, commercial, Windows based, Mac support "coming" ...)

Any other suggestions, or am I fresh out of luck?

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

tef posted:

Mac applications tend to come stand alone - you drag Application.app into /Applications

Installers aren't very popular.

outlier posted:

I have some commandline programs to distribute, that have to be installed by non-technical types

An installer is needed. I can't ask people to drag a binary into /usr/local/bin.

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

Farrok posted:

Will my experience with MATLAB be helpful as I delve into something new, or will it still feel like starting from scratch? I have a good but not formal understanding of complexity theory and so on from having to write time constrained programs, but at what point is it necessary for me to deepen this understanding with a more formal background (ie from classes or a proper text book or CS reference)? And finally, if I can actually do this and want to use it on my phone, is there anything in particular I should know about programming for Windows Mobile? Any languages that aren't well supported or different considerations I need to always keep in mind, like screen size?

Knowing any programming language makes learning the next one easier, until you eventually can just pick them up, study the novel parts and get going. (I'm onto my 14th language. After a while, they all look the same ...) And Matlab is a decent structured language that is basically Algol-like, so you'll be in familiar territory with most modern programming languages. I think you can just leap in and worry about theoretical considerations a little later.

Matlab has a lot of similarity to Python, which I'm a big fan of, but if you're looking to target WinMobile, it might not be the right choice. (There are people doing work with Python on mobile devices - but it's unclear to me if the tools are solid or if it's still at the hacking stage.) In that case, it may make the most sense to target Windows tools and frameworks, so you're looking at .NET, C# and VB. More experienced people can tell you what the preferred approach is.

Now my question: at my place of work (a scientific laboratory), there are people that spend their time manually classifying fly wings. That is, they receive a strange fly, photograph its wing, blow it up and compare it to a bunch of reference photographs to determine what species and strain the fly is. Apparently, flies have characteristic wing shapes and vascular patterns in their wing.

While observing this, I idly thought that this should be a relatively simple image analysis and categorisation problem. The photos are good quality, there's no background, and the wings are simple but distinctly different. But it's been years since I did anything with imaging. Any pointers on the sort of tech or algorithms I should be looking at? Even a bunch of buzzwords would give me something to google for.

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

Farrok posted:

Thanks for the info, that's quite helpful! As to your question, if the vein patterns are as regular for a given fly as you say, it should be quite easy to use any of a variety of image similarity measures to classify them. In particular, cross correlation or sum of square differences (especially the latter) are very easy algorithms to implement. The hard part is determining whether your image of interest needs to be translated or rotated in order to compare it to the reference image...that can be done manually, but then you might as well go ahead and classify it manually, too. You can use methods like gradient descent that iteratively approach the right orientation based on improvements in the similarity measures.

Be sure if you try different measures that it's easy to switch back and forth between looking for minima and looking for maxima, though. Sum of square differences is a perfect match if it is 0 and cross correlation is perfect if it is 1, for example.

Awesome - thanks. I think I'll try and translate and rotate automatically before doing the manual, but even the manual fitting should be helpful.
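
The two measures Farrok mentions can be sketched in a few lines of plain Python. This assumes the images are already aligned, the same size, and given as nested lists of grey levels; "cross correlation is perfect if it is 1" is read here as the normalised (zero-mean) form. The function names and the `measure` switch are made up for illustration, and the alignment search is deliberately left out.

```python
import math

def _flatten(image):
    """Turn a list of pixel rows into one flat list of floats."""
    return [float(pixel) for row in image for pixel in row]

def ssd(a, b):
    """Sum of squared differences: 0 is a perfect match, bigger is worse."""
    return sum((p - q) ** 2 for p, q in zip(_flatten(a), _flatten(b)))

def ncc(a, b):
    """Normalised cross-correlation: 1 is a perfect match, smaller is worse."""
    x, y = _flatten(a), _flatten(b)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [p - mean_x for p in x]
    dy = [q - mean_y for q in y]
    denom = math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))
    # A completely flat image has no variation to correlate against.
    return sum(p * q for p, q in zip(dx, dy)) / denom if denom else 0.0

def classify(query, references, measure="ssd"):
    """Return the name of the best-matching reference image."""
    if measure == "ssd":
        # Minimise for SSD ...
        return min(references, key=lambda name: ssd(query, references[name]))
    # ... but maximise for cross-correlation.
    return max(references, key=lambda name: ncc(query, references[name]))
```

Keeping the min/max decision inside `classify` is one way to follow the advice about switching measures without flipping a comparison by accident.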

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...
An oddly specific question - being used to editing code on the Mac with TextMate, I'm happy using E (a TextMate look-a-like) on the PC. But on Ubuntu? Every editor is giving me the creeps.

I don't want an editor that requires I set up a "project". I don't want something where I have to choose and open every file I need. What I like about TextMate is being able to open a folder as a project, and have the folder structure there in a sidebar as I edit the code in multiple tabs. Is there anything like this for Linux?

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

TagUrIt posted:

There's also some plugins for gedit that can get what you want. In specific, I'd look at ClassBrowser and if that isn't what you need, maybe fileset.

Awesome - just installed it and this is just what I want.

tef posted:

a warning:

regardless of what question you ask, if it mentions 'text editing' and 'linux' you will get a few responses saying "vim" or "emacs"

these are probably not what you want.

Your prescience is scaring me, sir.

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...
A vague question, because I'm still forming my thoughts about this:

We have a number of large datasets that it would be useful to query remotely. The obvious old-school way would be to stick them in a database and open a hole in our firewall for traffic. I'm dissatisfied by this because I'd rather the queries and data were at a higher level (e.g. returning objects not records), I don't want "the other end" to have to worry about implementation details (e.g. "they're using MySQL with this schema") and so on. So I've started thinking about a data server that can be connected to and queried over the web.

So:

* What's the prior art for this? Of course there's geospatial data, and there must be some huge atmospheric datasets that can be queried. Obviously a plain vanilla webservice might do, but if we need security (which is likely, as some data will not be available to everyone) a persistent connection would be useful.

* Similarly, what sort of art is there for query languages? SQL is an obvious starting point, but while SQL may be what's doing the work behind the scenes, we don't need the full richness of that at the user end, and writing our own general SQL implementation would be unnecessary.

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

Triple Tech posted:

How expressive do you want access to your data? There's mostly fetch_by_id and fetch_all. Anything else, given customer intervention, is doomed to slowly reinvent parts of SQL.

More expressive - say "fetch all X with a modification date between Y and Z or a title that includes A or a location that is B". That's what we have at the moment implemented via webforms (and via SQL underneath).

Your point about reinventing SQL is well taken - one of the reasons why I want to see what else has been done before.
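
One middle ground between fetch_by_id and full SQL is a small nested filter that the server compiles into a parameterised WHERE clause. The sketch below is only an illustration of that idea: the filter format, the `records` table and the column names are all invented, and sqlite3 stands in for whatever database is really behind the webforms.

```python
import sqlite3

# Hypothetical whitelist: only these columns may appear in a client's filter.
ALLOWED_FIELDS = {"modified", "title", "location"}

def compile_filter(node):
    """Turn a nested filter dict into (sql_fragment, params)."""
    if "any" in node or "all" in node:
        children = node["any"] if "any" in node else node["all"]
        if not children:
            raise ValueError("empty filter group")
        joiner = " OR " if "any" in node else " AND "
        parts = [compile_filter(child) for child in children]
        sql = "(" + joiner.join(fragment for fragment, _ in parts) + ")"
        return sql, [value for _, params in parts for value in params]
    field, op, value = node["field"], node["op"], node["value"]
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field: {field}")
    if op == "eq":
        return f"{field} = ?", [value]
    if op == "between":
        low, high = value
        return f"{field} BETWEEN ? AND ?", [low, high]
    if op == "contains":
        # Note: % and _ inside value still act as LIKE wildcards here.
        return f"{field} LIKE ?", [f"%{value}%"]
    raise ValueError(f"unknown operator: {op}")

def run_query(conn, node):
    """Run the filter against the (hypothetical) records table."""
    where, params = compile_filter(node)
    sql = f"SELECT id FROM records WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]
```

The example query from above would then be sent as `{"any": [{"field": "modified", "op": "between", "value": [Y, Z]}, {"field": "title", "op": "contains", "value": A}, {"field": "location", "op": "eq", "value": B}]}`. Values only ever travel as bound parameters, and the client never sees the schema beyond the whitelisted field names.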

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...
A graph layout / graphics type problem: I've written some code to draw phylogenetic trees, but I developed it from first principles, it's subpar and I'd like some pointers to decent prior art or algorithms.

In more detail: so a phylogeny is an evolutionary family tree. And I need to draw them, lots of them. In theory this should be easy. A phylogeny is a tree, which is a connected, directed, acyclic graph and there are oodles of packages and algorithms for doing layout and drawing of those, right? But:

1. A branch on a phylogeny represents time between events and so "longer" branches have to be drawn as longer. Almost all graph layout and drawing algorithms treat all branches as being of the same length and draw them accordingly.

2. What prior art / code there is out there seems to be of very poor quality. It works, but extracting the general algorithm is near impossible. I haven't been able to find a general description of drawing graphs with fixed relative branch lengths.

So, any pointers? Obviously literature would be best, as well as anything that directly refers to phylogenies, but if there are algorithms or decent code for the right sort of graph, that would be great too.
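
For reference, one simple layout that honours branch lengths is the rectangular "phylogram" style: a node's x is the sum of branch lengths from the root, leaves get evenly spaced y values in traversal order, and each internal node sits midway between its first and last child. The sketch below assumes a tree given as nested `(name, branch_length, children)` tuples, which is an invented representation and not the poster's code.

```python
def phylogram_layout(tree):
    """Return {name: (x, y)} for a tree of (name, branch_length, children) tuples."""
    coords = {}
    leaf_count = 0

    def place(node, parent_x):
        nonlocal leaf_count
        name, length, children = node
        # Horizontal position is the cumulative branch length from the root.
        x = parent_x + length
        if not children:
            # Leaves take the next free row.
            y = float(leaf_count)
            leaf_count += 1
        else:
            child_ys = [place(child, x) for child in children]
            # Internal nodes sit midway between their outermost children.
            y = (child_ys[0] + child_ys[-1]) / 2.0
        coords[name] = (x, y)
        return y

    place(tree, 0.0)
    return coords

if __name__ == "__main__":
    example = ("root", 0.0, [
        ("A", 1.0, []),
        ("n1", 0.5, [("B", 2.0, []), ("C", 0.25, [])]),
    ])
    for name, (x, y) in sorted(phylogram_layout(example).items()):
        print(name, x, y)
```

Drawing is then a horizontal line per node from its parent's x to its own x at its own y, plus a vertical line at each internal node's x spanning its children's y values. The recursion would need replacing with an explicit stack for very deep trees.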

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

Scaevolus posted:

You've already considered graphviz, right? This is based on it and seems to do what you want.

Doesn't do different edge lengths for different branch lengths, which is the critical property I'm looking for.

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...
So, graph databases:

I know, they're beloved of data scientists and other non-programmers, but they're actually useful. We (a bunch of biologists) are using them to build interaction graphs of molecules (like many other bunches of biologists) to work out causation and interference. But there's so many graph implementations out there that it's ridiculous.

The most popular tool appears to be Neo4j, which in terms of getting a graph up and going is certainly the most straightforward and user-friendly. But:

- The scaling or distributed solutions are poor
- They really push you towards their query language, Cypher, rather than something more portable and programmable like Gremlin. (And I was having huge trouble with their official Python module, creating nodes in code that just never seemed to propagate to the actual db.)
- One graph per database, one database per server. If you want a second graph, make another server.
- I want to do some fairly sophisticated traversal and I'm finding it hard in Neo4j

Then there's the whole property graph versus triplet issue. I find the property approach more intuitive but lately triplets are seeming more logical.

So, what are people's favourite graph platforms? What issues should I look out for?
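
To make the property graph versus triplet distinction concrete, here is a small sketch that flattens a property graph into triples. The data structures and the `edgeN` identifiers are invented for illustration and are not any particular database's model. The point it shows is that node properties map straight onto triples, while properties on an edge force the edge to become a thing of its own.

```python
def property_graph_to_triples(nodes, edges):
    """nodes: {id: {prop: value}}; edges: [(source, label, target, {prop: value})]."""
    triples = []
    for node_id, props in nodes.items():
        # Each node property becomes one (subject, predicate, object) triple.
        for key, value in props.items():
            triples.append((node_id, key, value))
    for index, (source, label, target, props) in enumerate(edges):
        # The relationship itself is a single triple.
        triples.append((source, label, target))
        if props:
            # Edge properties need something to hang off, so the edge gets
            # its own (hypothetical) identifier and is described in triples.
            edge_id = f"edge{index}"
            triples.append((edge_id, "subject", source))
            triples.append((edge_id, "predicate", label))
            triples.append((edge_id, "object", target))
            for key, value in props.items():
                triples.append((edge_id, key, value))
    return triples

if __name__ == "__main__":
    # Illustrative interaction data only.
    nodes = {"MDM2": {"kind": "protein"}, "p53": {"kind": "protein"}}
    edges = [("MDM2", "inhibits", "p53", {"evidence": "assay"})]
    for triple in property_graph_to_triples(nodes, edges):
        print(triple)
```

One edge with one property turns into five triples, which is roughly the trade-off: triples are uniform and easy to merge, property graphs keep annotated relationships compact.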


nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

Peristalsis posted:

I've just been given a project to map and import data from our legacy system to our half-built, home-grown replacement. I'll need to analyze the data tables in the old system (StarLIMS), figure out how that data is stored on the new system (if it is at all), and suggest changes to the existing and future schemas in the home-grown program to accommodate the import of the old data.

I guess you'd call this a data migration, though it almost seems like an ETL process. In any case, I've hacked at existing ETL processes in previous jobs, but I've never designed something like this from scratch. I'm doing a Google search now, but I'm wondering if anyone has any first-hand advice. Also, I assume this must be a common enough request that there are some existing tools which can automate much of it - are they worth investigating, or do they tend to require so much customization and configuration that you might as well have written it all from scratch?

I dealt with something just like this a few years back: porting a large clinical trial from a legacy db to a new db and system (that actually worked). Some distilled wisdom:

* I think quibbling about whether it's a migration or ETL is neither here nor there
* The scale of the task is very hard to predict and all depends on the original data. (Mine was this weird hyper-denormalised and versioned format that was super tough to extract.)
* The standard data science approach would be to write a one-shot ETL / transformation script. My feeling is that this will probably be an iterative task: you run it, something bombs out or is wrong in the output, you correct it and run again, rinse and repeat. This places a big emphasis on being able to correct and add to your script and see what it's doing. I wrote mine as a big table of transformations (get data from X, put it in Y after running it through these functions), which made it easy to see what was happening to anything and modify it.
* Never do anything destructive to your original data.
* There are solutions for this sort of stuff out there, but the only one I'm acquainted with is Pentaho, which is fairly heavyweight.
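
The "big table of transformations" idea might look something like the sketch below. It is not the original script: the column names, the date format and the helper function are all hypothetical. What it shows is the shape (source column, target column, functions to run in order), with failures collected rather than aborting the run, and the input rows never modified.

```python
def to_iso_date(text):
    """Convert a hypothetical legacy 'd/m/yyyy' string to 'yyyy-mm-dd'."""
    day, month, year = text.split("/")
    return f"{int(year):04d}-{int(month):02d}-{int(day):02d}"

# The transformation table: (source column, target column, functions applied in order).
RULES = [
    ("PatientName", "name", [str.strip, str.title]),
    ("DOB", "date_of_birth", [str.strip, to_iso_date]),
    ("WeightKg", "weight_kg", [str.strip, float]),
]

def transform_row(row, rules=RULES):
    """Build a new row from an old one; the original row is left untouched."""
    out = {}
    for source, target, funcs in rules:
        value = row[source]
        for func in funcs:
            value = func(value)
        out[target] = value
    return out

def migrate(rows, rules=RULES):
    """Return (migrated_rows, failures) so bad rows can be inspected and rerun."""
    migrated, failures = [], []
    for index, row in enumerate(rows):
        try:
            migrated.append(transform_row(row, rules))
        except Exception as exc:
            # Record what broke and where, then carry on with the rest.
            failures.append((index, row, repr(exc)))
    return migrated, failures
```

Each rerun after a fix only means editing one line of `RULES`, and the failures list doubles as the to-do list for the next iteration.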
