raminasi
Jan 25, 2005

a last drink with no ice

LP0 ON FIRE posted:

Thanks for the info. I read this just now: http://www.brpreiss.com/books/opus5/html/page423.html

Seems like there should be solutions to these problems. Like in that example, if you make "head" null, there would also be a bidirectional association to connected data structures. It could check the connected data structures and see if they have any external references if "head" leaves. Of course this would take up more memory and more processing, and it's most likely a lot more complicated than I think.

EDIT: I read more about problems with ARC, and there are a LOT more, and also some ways that have been developed to solve them or make them better.

By "ARC" are you referring to reference counting in general, or Apple's compile-time thing?

I'm not entirely sure what you're getting at with your proposed "fix" for the cycle problem, but I'm pretty sure you're talking about something that exceeds most definitions of "reference counting," which tend to be pretty simple.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

LP0 ON FIRE posted:

Is there any reason garbage collection is better than automatic reference counting?

The short answer is that each has trade-offs. The long answer is very complicated, and part of that complication is that the lines between automatic reference counting and garbage collection get blurrier the closer you look.

Naive reference-counting systems will leak objects containing strong reference cycles. This is generally considered "incorrect": those objects are unreachable and therefore no longer part of the program's execution state, and generally it seems wrong for things outside of a program's execution state to cause it to halt, as any leak might eventually. There are languages (not many) in which such cycles provably cannot occur; reference counting would then still be "correct". In other languages, it is possible to statically prove the absence of cycles for a subset of objects; objects of those types can be reference-counted. There are collector implementation techniques which rely on using reference counting and then dynamically detecting the creation of cycles.

Concurrent reference-counting systems tend to impose a relatively high overhead on copying a reference. Naive code patterns for passing around references often produce many reference copies. This can be mitigated with user intervention (e.g. std::move in C++) or compiler intelligence (when the reference is built in to the language). This overhead is at least proportional to the amount of work actually done by the program.

Naive reference-counting systems which destroy objects immediately when the reference count reaches zero are subject to stack overflow problems with deeply recursive references. There are implementation techniques to solve this, but they can introduce some of the non-determinism / unpredictability problems of heap-scanning garbage collection systems.

Naive mark-and-sweep garbage collectors are prone to very long pauses when the full collector kicks in. These pauses are generally considered to be an unacceptably bad performance consequence. This is particularly true because the pauses may kick in at times which are at worst non-deterministic and at best challenging to predict. There are many, many implementation techniques to address this; generally they trade increased per-operation overheads and memory use for fewer or shorter full collections. Generational collectors trade increased operational costs and memory use for fewer full collections (but more expensive mini-collections), assuming particular allocation patterns. Incremental collectors increase per-operation costs and decrease memory locality, but eliminate most full collections. Copying collectors produce more compact heaps but require significantly more memory. Asynchronous collectors make collection more transparent to the execution threads, but they increase operational costs substantially, tie up a second hardware thread, and tend to make processor cache coherence systems really, really unhappy.

Reference counting systems can cheaply forward a reference. Mark-and-sweep collectors can cheaply forward and copy references. Other collectors significantly increase both costs, often even when just forwarding a reference (e.g. when moving a reference from local scope to the heap).

Reference counting systems cleanly interoperate with other reference-counting and unique-ownership memory management schemes. Most heap-inspection collectors do not. Other systems can usually hold references into the GC heap, but only with a specially-scanned root object, and having a large number of such roots tends to heavily pessimize the collector. The GC heap can hold references to memory in other systems, but safely working with those references can be error-prone in the "glue" code.

Concurrent garbage collectors can usually efficiently support a language memory model that's safe (in a certain basic sense) against race conditions. Reference-counting systems generally cannot; an assignment to an existing reference requires exclusive access to that memory.

You can add a cycle collector to reference-counting systems. In principle, that's just a kind of collector, but when it's presented like this, it implies a collector that's heavily biased against the presence of cycles.
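To make that concrete, here's a rough toy sketch in Python, which happens to use exactly this combination (reference counting plus a backup cycle collector); the Node class is just made up for the example:

code:
import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

# Build a strong reference cycle: a -> b -> a.
a = Node("a")
b = Node("b")
a.other = b
b.other = a

# Drop the external references. Pure reference counting would leak both
# objects here, because each one still holds a reference to the other.
del a, b

# The backup cycle collector walks the heap, finds the unreachable cycle,
# and frees it.
print("unreachable objects collected:", gc.collect())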

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

rjmccall posted:

Concurrent garbage collectors can usually efficiently support a language memory model that's safe (in a certain basic sense) against race conditions. Reference-counting systems generally cannot; an assignment to an existing reference requires exclusive access to that memory.

Sorry, this isn't quite right. Assigning can in principle be done locklessly with a simple atomic swap; it's reading/copying that requires something more expensive to briefly lock out writers.

Amberskin
Dec 22, 2013

We come in peace! Legit!

Cast_No_Shadow posted:

Yeah it is, this is very early preliminary research at the moment. I was hoping to be pointed in possible directions.

Other constraints include having a minimal impact on the customer. Ideally as a customer you'd buy your item, black box stuff happens, your bonus is credited and you get dinged on your smartphone with no extra effort on your part (aside from maybe pre-registration). Although I recognize that this is an ideal.
Think huge companies as well, more Walgreens than mom & pop. It's safe to assume they already have a certain level of CRM infrastructure.
Not constrained to just retailers; that was just an easy example.

Is that any help?

Whatever technology you end up selecting, you will be choosing between two kinds of solutions: online (realtime or near-realtime) and batch.

If you need to acknowledge your "bonus" as soon as possible after the transaction has been completed, I'd say your best bet could be to publish a plain old SOAP-based web service and ask your affiliated retailers to call it from their payment/order workflow. Nowadays I'd say every major retail software package can do it, and even for a homegrown solution it should be relatively easy to implement.
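The SOAP plumbing itself is too much boilerplate to sketch in a post, but the shape of the thing is simple. Here's a rough JSON-over-HTTP stand-in in Python with Flask (Flask, every field name, and the 5% bonus rule are all invented for the example):

code:
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/transactions", methods=["POST"])
def record_transaction():
    # A retailer's payment/order workflow would POST one completed
    # transaction here. Field names are invented for illustration.
    txn = request.get_json(force=True)
    customer_id = txn["customer_id"]
    amount = float(txn["amount"])

    bonus = round(amount * 0.05, 2)  # made-up 5% bonus rule
    # ... look up the customer, credit the bonus, push a phone notification ...

    return jsonify({"customer_id": customer_id, "bonus_credited": bonus}), 201

if __name__ == "__main__":
    app.run(port=8080)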

If you do not need immediate responses, design a batch interface to your system so your affiliates can send you transaction files to process at your leisure.

Or you can do both.

(background: some years of experience working in the banking industry designing and building software architectures for similar things).

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

The gently caress is a 'lambda'?

carry on then
Jul 10, 2010

by VideoGames

(and can't post for 10 years!)

Bob Morales posted:

The gently caress is a 'lambda'?

An anonymous function, or the operator used to create one: https://en.wikipedia.org/wiki/Anonymous_function

You can assign it to a variable, provide it as an argument to a function, etc. It usually captures the variables in scope when it's created.

Anything in particular you're asking about?
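If it helps, here's the same idea in Python (just as an example language; Ruby and JS lambdas work the same way):

code:
# A lambda is just a function expression with no name of its own.
square = lambda x: x * x            # assigned to a variable
print(square(4))                    # 16

# Passed as an argument like any other value:
print(list(map(lambda x: x + 1, [1, 2, 3])))   # [2, 3, 4]

# It captures variables that are in scope where it's written:
factor = 10
scale = lambda x: x * factor
print(scale(3))                     # 30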

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

carry on then posted:

An anonymous function, or the operator used to create one: https://en.wikipedia.org/wiki/Anonymous_function

You can assign it to a variable, provide it as an argument to a function, etc. It usually captures the variables in scope when it's created.

Anything in particular you're asking about?

I kind of get it from Ruby or Javascript but I just don't grasp the 'how' or 'why'

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe

Bob Morales posted:

I kind of get it from Ruby or Javascript but I just don't grasp the 'how' or 'why'

I have some bit of throwaway logic that I need to apply in bulk (for example, a sorting routine) but I don't want to make a fully-fledged function. Or I have a bit of logic that needs to fire when some piece of GUI is interacted with (e.g. a button is clicked), and I want the logic to be close to the GUI setup code. Those are the two use cases I run into most often, but there's plenty more. Start thinking of functions as interchangeable objects in their own right (functions as first-class objects) and you'll probably start using lambdas more often. Unfortunately, many languages don't actually let you treat functions as objects. :argh:
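Rough Python versions of those two use cases (the "button" is faked with a plain dict here so the snippet stays self-contained; a real toolkit's on-click hookup looks about the same):

code:
# Throwaway sorting logic: order people by age without a named function.
people = [("alice", 34), ("bob", 27), ("carol", 41)]
people.sort(key=lambda person: person[1])
print(people)   # [('bob', 27), ('alice', 34), ('carol', 41)]

# GUI-style callback: the "what happens on click" logic sits right next to
# the setup code instead of in a separate named function somewhere else.
handlers = {}
handlers["save_button"] = lambda: print("saving...")
handlers["quit_button"] = lambda: print("bye")

handlers["save_button"]()   # simulate a click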

Skandranon
Sep 6, 2008
fucking stupid, dont listen to me

TooMuchAbstraction posted:

I have some bit of throwaway logic that I need to apply in bulk (for example, a sorting routine) but I don't want to make a fully-fledged function. Or I have a bit of logic that needs to fire when some piece of GUI is interacted with (e.g. a button is clicked), and I want the logic to be close to the GUI setup code. Those are the two use cases I run into most often, but there's plenty more. Start thinking of functions as interchangeable objects in their own right (functions as first-class objects) and you'll probably start using lambdas more often. Unfortunately, many languages don't actually let you treat functions as objects. :argh:

Yeah, I really like that about JavaScript; I think I would be cranky if I couldn't simply reassign functions.

Squashy Nipples
Aug 18, 2007

Not sure if this is right place to ask, but we have some smart folks in here, so...

I'm going to be leading a Big Data implementation. I'm the manager, not a developer, so I won't be down in the weeds. That said, I will be directly responsible for the quality of the translations of business requirements into reporting systems, so I need to bone up on Hadoop-compatible DA/BI tools.

Looks like NoSQL is the big one. Any recommendations on reading materials/tutorials for NoSQL?

If we can compare it to the many flavors of SQL, how much do the different implementations of NoSQL differ?

Any advice appreciated.

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





Cassandra is the only NoSQL product that really gets used in 'big data', and its integration with Hadoop/Spark is not so great. Aurora and Redshift are way more common (both are SQL, sort of). What kind of tools are you looking for?

Squashy Nipples
Aug 18, 2007

Not sure, I haven't been able to talk to the tech guys directly yet. I think it's the Cloudera variety of Hadoop?

Will dig more.

HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?

TooMuchAbstraction posted:

I have some bit of throwaway logic that I need to apply in bulk (for example, a sorting routine) but I don't want to make a fully-fledged function. Or I have a bit of logic that needs to fire when some piece of GUI is interacted with (e.g. a button is clicked), and I want the logic to be close to the GUI setup code. Those are the two use cases I run into most often, but there's plenty more. Start thinking of functions as interchangeable objects in their own right (functions as first-class objects) and you'll probably start using lambdas more often. Unfortunately, many languages don't actually let you treat functions as objects. :argh:

Also, closures (in languages that have them).
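e.g. (Python, but the idea is the same anywhere closures exist) the inner function keeps the captured variable alive between calls, so it carries state around without a class:

code:
def make_counter():
    count = 0
    def increment():
        nonlocal count   # closes over count from the enclosing scope
        count += 1
        return count
    return increment

clicks = make_counter()
print(clicks(), clicks(), clicks())   # 1 2 3 -- the state lives in the closure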

Beef
Jul 26, 2004
Also, encapsulation of state.

Let's have the poor man's OO, poor man's closures argument again :allears:

Hadlock
Nov 9, 2004

I want to build a Docker-containerized app that will accept API posts and then display the most recent API post as an HTML page and save it to a database. The API post would include three things: auth token, roomid (int), occupied (bool). The longer-term plan would be to output the last X records from the database via an API for building graphs, etc.

Also, I'd like to ship the finished thing in a Docker container, so if I could build it on top of a common language with a good Docker base image (https://hub.docker.com/explore/) that would be great.

So to summarize: for an extremely simple API app with HTML output and a basic db (could even be a flat-file CSV really, performance is not important here) that has good containerland support, what's the best language/tutorial(s)? Building and maintaining Ruby on Rails seems really excessive for what's little more than a "microservice".
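For scale, this is roughly all the code I'm imagining (Python/Flask picked arbitrarily since the official python image on Docker Hub is well maintained; the field names are from above, and the token check and "database" are just stand-ins):

code:
from flask import Flask, request, abort

app = Flask(__name__)
AUTH_TOKEN = "change-me"                       # stand-in secret
latest = {"roomid": None, "occupied": None}    # stand-in for the db/CSV

@app.route("/occupancy", methods=["POST"])
def post_occupancy():
    data = request.get_json(force=True)
    if data.get("token") != AUTH_TOKEN:
        abort(401)
    latest["roomid"] = int(data["roomid"])
    latest["occupied"] = bool(data["occupied"])
    # appending a row to a CSV or database table here covers persistence
    return "ok", 201

@app.route("/")
def show_latest():
    state = "occupied" if latest["occupied"] else "free"
    return "<h1>Room {}: {}</h1>".format(latest["roomid"], state)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)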

ExcessBLarg!
Sep 1, 2001

Squashy Nipples posted:

If we can compare it to the many flavors of SQL, how much do the different implementations of NoSQL differ?

SQL is an actual standardized language that databases strive to support. "NoSQL" isn't any kind of standard, but a catch-all buzzword for databases that don't pretend to speak SQL. It's kind of like "cloud" that way.

Of the actual products that describe themselves as NoSQL (or did at one time), they tend to have little cross-compatibility and generally quite different data models, but are useful for different purposes. The thing they all tend to have in common is that, 20 years ago, someone might've shoehorned a SQL RDBMS into use for whatever task these new databases were designed to handle, but for reasons of scalability or maintainability that stopped being effective for very large products (Google Search, Amazon's store, etc.). For a while NoSQL was really popular and everyone switched to Mongo, or something, whether it was warranted or not.

In a big data context you have to evaluate what your needs are in a data store and figure out which best targets your data needs while also being compatible with the frameworks you use. Such a store might be a SQL database, or might not.

Squashy Nipples
Aug 18, 2007

ExcessBLarg! posted:

SQL is an actual standardized language that databases strive to support. "NoSQL" isn't any kind of standard, but a catch-all buzzword for databases that don't pretend to speak SQL. It's kind of like "cloud" that way.

Of the actual products that describe themselves as NoSQL (or did at one time), they tend to have little cross-compatibility and generally quite different data models, but are useful for different purposes. The thing they all tend to have in common is that, 20 years ago, someone might've shoehorned a SQL RDBMS into use for whatever task these new databases were designed to handle, but for reasons of scalability or maintainability that stopped being effective for very large products (Google Search, Amazon's store, etc.). For a while NoSQL was really popular and everyone switched to Mongo, or something, whether it was warranted or not.

In a big data context you have to evaluate what your needs are in a data store and figure out which best targets your data needs while also being compatible with the frameworks you use. Such a store might be a SQL database, or might not.

Wow, great answer, thank you!

Looks like they've mostly got the plumbing installed, and haven't made a lot of decisions on the analysis front. I saw one reference to a "Cloudera Analysis Package", but it was in a document translated from Spanish, so who knows. Guess I won't know for sure until I get to take a dive into the lake myself.

Slanderer
May 6, 2007
I'm currently dealing with a large data set from an experiment, and I'm looking for an alternative to Labview for viewing it. Basically, I have 30 test devices logging 200 points of data every minute, which is streamed to a set of PCs controlling the test and saved as CSV files (1 file per device per day). I'm looking for a better way to visualize and analyze the data, but I don't know the first thing about this stuff. I know that I want to be able to inspect certain signals from one device over months and zoom down to the scale of minutes, and also compare multiple devices at the same time (while keeping in mind that each unit will timestamp its own data set individually, so either the plotter needs to be OK with arbitrary floating point scales or I need to correct the timebase on each device's data and account for clock drift and stuff). I also need to be able to go right from the graph to the data (for instance, to check the state of various control and status registers at a certain point in the graph).

Can anyone point me in a direction for stuff to read about visualizing and analyzing Not-Big-Data?

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe

Slanderer posted:

I'm currently dealing with a large data set from an experiment, and I'm looking for an alternative to Labview for viewing it. Basically, I have 30 test devices logging 200 points of data every minute, which is streamed to a set of PCs controlling the test and saved as CSV files (1 file per device per day). I'm looking for a better way to visualize and analyze the data, but I don't know the first thing about this stuff. I know that I want to be able to inspect certain signals from one device over months and zoom down to the scale of minutes, and also compare multiple devices at the same time (while keeping in mind that each unit will timestamp its own data set individually, so either the plotter needs to be OK with arbitrary floating point scales or I need to correct the timebase on each device's data and account for clock drift and stuff). I also need to be able to go right from the graph to the data (for instance, to check the state of various control and status registers at a certain point in the graph).

Can anyone point me in a direction for stuff to read about visualizing and analyzing Not-Big-Data?

My inclination would be to write some kind of script where you'd tell it what devices and what time range you care about, then it would spit out a single CSV file containing the relevant data. Then you load that CSV in Excel or another spreadsheet program (or in Matlab) and use it to generate graphs. The script wouldn't need to be very complicated since it's just mapping your "what device and what time range" input into a list of files and then merging those files together. If you wanted to be a bit fancier you could learn to use Matplotlib in Python; then you wouldn't need the intermediary CSV file.

The important thing in this case is to have an approach that's smarter than "I'll load the entire dataset and then select out the parts that are relevant". Your 30 test devices are probably generating, what, 50-100MB of data per day all told? (It's bigger than strictly speaking necessary, since the numbers are stored as text rather than binary, but that's fine.) Loading several months' worth of data is still entirely in the range of what modern computers are capable of, but it'd slow things down, especially if you don't have a solid-state drive.
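A rough sketch of the kind of script I mean, assuming the files are named something like device07_2016-01-21.csv with a header row in each (adjust the glob and the date handling to whatever your real layout is):

code:
import csv
import glob
import sys

def merge(device, start_date, end_date, out_path):
    """Concatenate one device's daily CSV logs for a date range."""
    paths = sorted(glob.glob("{}_*.csv".format(device)))
    # Keep files whose YYYY-MM-DD part falls inside the range
    # (ISO dates compare correctly as strings).
    wanted = [p for p in paths if start_date <= p.split("_")[-1][:-4] <= end_date]

    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        wrote_header = False
        for path in wanted:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:
                    continue          # skip empty files
                if not wrote_header:
                    writer.writerow(header)
                    wrote_header = True
                writer.writerows(reader)

if __name__ == "__main__":
    # e.g. python merge_logs.py device07 2016-01-01 2016-01-21 out.csv
    merge(*sys.argv[1:5])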

fritz
Jul 26, 2003

TooMuchAbstraction posted:

My inclination would be to write some kind of script where you'd tell it what devices and what time range you care about, then it would spit out a single CSV file containing the relevant data. Then you load that CSV in Excel or another spreadsheet program (or in Matlab) and use it to generate graphs.
Why not just dump the data into a SQL db? Surely both Matlab and Excel support populating things from queries.

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe

fritz posted:

Why not just dump the data into a SQL db? Surely both Matlab and Excel support populating things from queries.

Because then I have to maintain the CSV files and the database and have to coordinate interchange between the database and the analysis tools. Absolutely this would be a good idea if I expected to be doing this on a large scale, but I got the impression that Slanderer is in an academic environment where a) simplicity is valuable (every addition to the toolchain is one more thing for the next hapless grad student to have to maintain), and b) there's not really a need for large-scale.

Slanderer
May 6, 2007

TooMuchAbstraction posted:

My inclination would be to write some kind of script where you'd tell it what devices and what time range you care about, then it would spit out a single CSV file containing the relevant data. Then you load that CSV in Excel or another spreadsheet program (or in Matlab) and use it to generate graphs. The script wouldn't need to be very complicated since it's just mapping your "what device and what time range" input into a list of files and then merging those files together. If you wanted to be a bit fancier you could learn to use Matplotlib in Python; then you wouldn't need the intermediary CSV file.

The important thing in this case is to have an approach that's smarter than "I'll load the entire dataset and then select out the parts that are relevant". Your 30 test devices are probably generating, what, 50-100MB of data per day all told? (It's bigger than strictly speaking necessary, since the numbers are stored as text rather than binary, but that's fine.) Loading several months' worth of data is still entirely in the range of what modern computers are capable of, but it'd slow things down, especially if you don't have a solid-state drive.

I had assumed I would want to post-process the data into a new format, if only to allow me to synchronize timebases, zero-fill missing data (from device downtime), and do any other cleaning-up. As for scripts+excel: I threw together an excel macro to import data from all of my devices and generate graphs + summary information for all of them, but haven't combined that with a script to concatenate csv files yet (I should probably just get a windows build of sed and let that do the majority of the work). But the main deal breaker (for me) is the limitations of excel charts. While they look nicer than most open-source desktop-side visualization utilities*, they have 3 big limitations that come to mind:

1. I can't dynamically scale any of the axes with my mouse wheel
2. Only 2 y-axes per graph
3. They start to break when you have sufficiently-large datasets that are plotted without any decimation (I've hit that limit in the past), since it can't dynamically resample for the visible scale

TooMuchAbstraction posted:

Because then I have to maintain the CSV files and the database and have to coordinate interchange between the database and the analysis tools. Absolutely this would be a good idea if I expected to be doing this on a large scale, but I got the impression that Slanderer is in an academic environment where a) simplicity is valuable (every addition to the toolchain is one more thing for the next hapless grad student to have to maintain), and b) there's not really a need for large-scale.

Not academic, just work. I'm running some long-term tests on custom hardware I made, which is being analyzed by myself and a few others. While we have a test tool guy making some stuff in Labview to look at the data, it's taking forever because Labview is garbage. I can crap out more VBA for excel, but not anything with a simple interface that I can give to someone else to use (I do embedded software, so giving someone the ability to uncomment an optional #define is my version of a user interface), and it will still have the limitations of excel charts.


*stuff like Highcharts makes me wish I knew javascript and was able to build a local site to use it, because goddamn those are some nice graphs
http://www.highcharts.com/demo

Hadlock
Nov 9, 2004

You may want to look at ThingSpeak

The free tier allows up to 8 datapoints per API call, with a minimum average API delay of 15 seconds between posts. From there you can import the data directly into Matlab with a visualization wizard of some sort

https://thingspeak.com/apps
https://thingspeak.com/apps/matlab_visualizations/templates

Thingspeak is also open source, so you can install it on your own server for free. Then you can do 100 datapoints every 1 second if you wanted.

If you have a Linux machine somewhere, install Docker and you can spin up a ThingSpeak server in about 90 seconds. Pull this repository from GitHub:

code:
sudo apt-get install docker-engine
sudo service docker start
git clone https://github.com/Hadlock/thingspeak.git
docker-compose up -d
docker-compose run --rm web rake db:create
docker-compose run --rm web rake db:schema:load
I'm using this to pull in sensor data from the two restrooms in our (overcrowded) office and post it to a webpage so people know when there's a bathroom free. My use case is much lower than yours (2 bathrooms + 2 conference rooms @ 20 seconds) but it's an easy way to dump sensor data in to a database and get it back out again.

Hadlock fucked around with this message at 06:24 on Jan 22, 2016

Slanderer
May 6, 2007

Hadlock posted:

You may want to look at ThingSpeak

The free tier allows up to 8 datapoints per API call, with a minimum average API delay of 15 seconds between posts. From there you can import the data directly into Matlab with a visualization wizard of some sort

https://thingspeak.com/apps
https://thingspeak.com/apps/matlab_visualizations/templates

Thingspeak is also open source, so you can install it on your own server for free. Then you can do 100 datapoints every 1 second if you wanted.

If you have a Linux machine somewhere, install Docker and you can spin up a ThingSpeak server in about 90 seconds. Pull this repository from GitHub:

code:
sudo apt-get install docker-engine
sudo service docker start
git clone https://github.com/Hadlock/thingspeak.git
docker-compose up -d
docker-compose run --rm web rake db:create
docker-compose run --rm web rake db:schema:load
I'm using this to pull in sensor data from the two restrooms in our (overcrowded) office and post it to a webpage so people know when there's a bathroom free. My use case is much lower than yours (2 bathrooms + 2 conference rooms @ 20 seconds) but it's an easy way to dump sensor data in to a database and get it back out again.

Thanks for the suggestion, but I don't think this works for me. My test machines are off the network for some reason (out of my hands), so I have a technician manually copying the latest files off of them once a day and uploading them to a network share. If I had network access (and could update more datapoints), this might work.

I gave Plotly a try, since their zoomable graphs seem to be pretty decent. I was able to upload single data sets from a single device without much trouble, but after I wrote a batch file to parse and concatenate all of the CSV files for a single device for the past 2 1/2 months, it let me down. I'm not sure if that's due to a limit on the upload size when using their web version on a free account, since their documentation is loving stupid. The csv file is like 60 MB (the xlsx version is only 40 MB for some reason).

Peristalsis
Apr 5, 2004
Move along.

Slanderer posted:

I'm currently dealing with a large data set from an experiment, and I'm looking for an alternative to Labview for viewing it. Basically, I have 30 test devices logging 200 points of data every minute, which is streamed to a set of PCs controlling the test and saved as CSV files (1 file per device per day). I'm looking for a better way to visualize and analyze the data, but I don't know the first thing about this stuff. I know that I want to be able to inspect certain signals from one device over months and zoom down to the scale of minutes, and also compare multiple devices at the same time (while keeping in mind that each unit will timestamp its own data set individually, so either the plotter needs to be OK with arbitrary floating point scales or I need to correct the timebase on each device's data and account for clock drift and stuff). I also need to be able to go right from the graph to the data (for instance, to check the state of various control and status registers at a certain point in the graph).

Can anyone point me in a direction for stuff to read about visualizing and analyzing Not-Big-Data?


Slanderer posted:

I had assumed I would want to post-process the data into a new format, if only to allow me to synchronize timebases, zero-fill missing data (from device downtime), and do any other cleaning-up. As for scripts+excel: I threw together an excel macro to import data from all of my devices and generate graphs + summary information for all of them, but haven't combined that with a script to concatenate csv files yet (I should probably just get a windows build of sed and let that do the majority of the work). But the main deal breaker (for me) is the limitations of excel charts. While they look nicer than most open-source desktop-side visualization utilities*, they have 3 big limitations that come to mind:

1. I can't dynamically scale any of the axes with my mouse wheel
2. Only 2 y-axes per graph
3. They start to break when you have sufficiently-large datasets that are plotted without any decimation (I've hit that limit in the past), since it can't dynamically resample for the visible scale


Not academic, just work. I'm running some long-term tests on custom hardware I made, which is being analyzed by myself and a few others. While we have a test tool guy making some stuff in Labview to look at the data, it's taking forever because Labview is garbage. I can crap out more VBA for excel, but not anything with a simple interface that I can give to someone else to use (I do embedded software, so giving someone the ability to uncomment an optional #define is my version of a user interface), and it will still have the limitations of excel charts.


*stuff like Highcharts makes me wish I knew javascript and was able to build a local site to use it, because goddamn those are some nice graphs
http://www.highcharts.com/demo

It sounds like you're trying to find a tool you can use to kludge together a solution, and that you're hoping to avoid actually creating an application of some sort. That's fine, but this really seems to me like a perfect place for a small database app - whether it's a web app or a desktop executable. That way, you can import all the data into a single place, add more data later without problems, and filter and sort based on data columns, rather than sorting through files all the time. A coworker of mine just did something that sounds kind of similar - he imported tons of spreadsheet data generated by a researcher who just left our organization, displayed it with Highcharts (I think), and wrapped it all up in a Ruby on Rails app. He's a really good programmer, and it took him a week or two of effort* to get it all sorted out, but it's a really nice app now that anyone can use, and that can be pretty easily extended/modified if/when needs change.

If you're confident that this is a short-term need for specific data, then it probably doesn't make sense to pursue a software engineering degree for the sake of getting a couple of nice charts to use 3 times. But if this is something that might be ongoing, or something you could later convert from an app for testing out your setup into an app for analyzing the data it provides, it might be worth looking for someone to write something up for you. You might also search open source repositories to see if anyone has already done something similar that you can tweak to get what you want.

If you really do just want an upgrade from Excel, my first thought is to use Matlab. You can use it to write data import scripts, filter and sort the data, and display charts of the data. I have no idea how its charting capabilities compare to any other package's, but you might also be able to import 3rd party tools into it. I know it used to have a pretty extensive, built-in interface to MS Office, so it's possible that you could funnel its output to some other charting package. One benefit of Matlab is that you can use it on whatever level you want - you can hack together one-off scripts to use at its command line, write more formal software applications with it, or find a place in between.

* I'm not sure if he worked on this full time over that period, or just intermittently between other assignments


Slanderer posted:

Thanks for the suggestion, but I don't think this works for me. My test machines are off the network for some reason (out of my hands), so I have a technician manually copying the latest files off of them once a day and uploading them to a network share. If I had network access (and could update more datapoints), this might work.

I gave Plotly a try, since their zoomable graphs seem to be pretty decent. I was able to upload single data sets from a single device without much trouble, but after I wrote a batch file to parse and concatenate all of the CSV files for a single device for the past 2 1/2 months, it let me down. I'm not sure if that's due to a limit on the upload size when using their web version on a free account, since their documentation is loving stupid. The csv file is like 60 MB (the xlsx version is only 40 MB for some reason).

When you say you wrote a batch file, do you mean an actual MS-DOS .bat file? If so, I'd suggest doing something else. Anything else. Every time I've tried to use them in the past 15 years, I've quickly run into problems with the lack of support and documentation for them. It certainly could be that plot.ly has limitations on its free version, but it's also possible that your batch file is corrupting the data. Even updating to a Linux shell script should give you more confidence in the outcome. If you're stuck with a Windows environment, I think CygWin gives you the ability to run Unix utilities in Windows, and you could also give PowerShell a try (though I've never used it, and can't say if it's a good tool for this sort of thing). The nuclear option would be to upgrade to a full scripting language to do this stuff. Python seems to be the most fashionable right now, but Ruby and Perl can also get the job done. I guess it's also possible (not likely) that a 60 MB file isn't uploading well over your browser, and that won't change with a different tool. I'd assume that the Excel file is smaller because it's stored in binary format, while the CSV is in text.

Slanderer
May 6, 2007

Peristalsis posted:

It sounds like you're trying to find a tool you can use to kludge together a solution, and that you're hoping to avoid actually creating an application of some sort. That's fine, but this really seems to me like a perfect place for a small database app - whether it's a web app or a desktop executable. That way, you can import all the data into a single place, add more data later without problems, and filter and sort based on data columns, rather than sorting through files all the time. A coworker of mine just did something that sounds kind of similar - he imported tons of spreadsheet data generated by a researcher who just left our organization, displayed it with Highcharts (I think), and wrapped it all up in a Ruby on Rails app. He's a really good programmer, and it took him a week or two of effort* to get it all sorted out, but it's a really nice app now that anyone can use, and that can be pretty easily extended/modified if/when needs change.

If you're confident that this is a short-term need for specific data, then it probably doesn't make sense to pursue a software engineering degree for the sake of getting a couple of nice charts to use 3 times. But if this is something that might be ongoing, or something you could later convert from an app for testing out your setup into an app for analyzing the data it provides, it might be worth looking for someone to write something up for you. You might also search open source repositories to see if anyone has already done something similar that you can tweak to get what you want.

If you really do just want an upgrade from Excel, my first thought is to use Matlab. You can use it to write data import scripts, filter and sort the data, and display charts of the data. I have no idea how its charting capabilities compare to any other package's, but you might also be able to import 3rd party tools into it. I know it used to have a pretty extensive, built-in interface to MS Office, so it's possible that you could funnel its output to some other charting package. One benefit of Matlab is that you can use it on whatever level you want - you can hack together one-off scripts to use at its command line, write more formal software applications with it, or find a place in between.

* I'm not sure if he worked on this full time over that period, or just intermittently between other assignments

This is something we would use in an ongoing capacity, but I'm not going to be able to convince anyone that we need a guy to do this anytime soon, and since I can't get that up and running myself I can't make any prototypes to show people and convince them of their utility. I guess I can take another shot at Matlab, but it definitely can't make the zoomable graphs I'm looking for, and only people with a Matlab license can run it, so I'm really not sure of its utility here (unless it can handle large datasets better than excel).

Peristalsis posted:

When you say you wrote a batch file, do you mean an actual MS-DOS .bat file? If so, I'd suggest doing something else. Anything else. Every time I've tried to use them in the past 15 years, I've quickly run into problems with the lack of support and documentation for them. It certainly could be that plot.ly has limitations on its free version, but it's also possible that your batch file is corrupting the data. Even updating to a Linux shell script should give you more confidence in the outcome. If you're stuck with a Windows environment, I think CygWin gives you the ability to run Unix utilities in Windows, and you could also give PowerShell a try (though I've never used it, and can't say if it's a good tool for this sort of thing). The nuclear option would be to upgrade to a full scripting language to do this stuff. Python seems to be the most fashionable right now, but Ruby and Perl can also get the job done. I guess it's also possible (not likely) that a 60 MB file isn't uploading well over your browser, and that won't change with a different tool. I'd assume that the Excel file is smaller because it's stored in binary format, while the CSV is in text.

Hell yeah it was a .bat file, they own. For a task as braindead easy as searching a set of subdirectories for all CSV files with a certain string in their name and concatenating the contents and stripping out non-data lines, all scripting languages are gonna be pretty much the same. I already have cygwin, but for making GBS threads out a 10 line script to test an idea, who loving cares???

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe
matplotlib (a Python graphing library) supports zoomable graphs. Could be worth taking a look. You'll also need to use numpy as the input to matplotlib; you can load CSVs directly into numpy. numpy also has good decimation, concatenation, etc. operations.
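Minimal example of the sort of thing I mean (the file name and column indices are placeholders for whatever your logs actually contain):

code:
import numpy as np
import matplotlib.pyplot as plt

# Placeholder file/columns: adjust to the real log layout.
data = np.genfromtxt("device07_merged.csv", delimiter=",", skip_header=1)

t = data[:, 0]        # timestamp column
signal = data[:, 3]   # one of the ~200 logged points

plt.plot(t, signal, linewidth=0.5)
plt.xlabel("time")
plt.ylabel("signal 3")
plt.show()            # the plot window's toolbar gives you pan/zoom for free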

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Big Red Dot in the Python thread keeps talking up his company's Bokeh project. It appears to support zoomable graphs.

Slanderer
May 6, 2007

TooMuchAbstraction posted:

matplotlib (a Python graphing library) supports zoomable graphs. Could be worth taking a look. You'll also need to use numpy as the input to matplotlib; you can load CSVs directly into numpy. numpy also has good decimation, concatenation, etc. operations.


Munkeymon posted:

Big Red Dot in the Python thread keeps talking up his company's Bokeh project. It appears to support zoomable graphs.

Thanks for these suggestions. I was just messing with Python to see how hard it would be (and how long it would take) to reprocess my large CSV files to extract pre-selected columns and rows within a date range. I'll give these a shot.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Slanderer posted:

Thanks for these suggestions. I was just messing with Python to see how hard it would be (and how long it would take) to reprocess my large CSV files to extract pre-selected columns and rows within a date range. I'll give these a shot.

Let me point you at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html then. The built-in CSV parser isn't bad but Pandas gets really powerful really fast. I'm not a superuser, though. Ask in the Python thread and people way better at it can help you more.
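Something like this is about all it takes (the column names here are made up; swap in whatever the device CSVs actually use):

code:
import pandas as pd
import matplotlib.pyplot as plt

# Made-up column names -- substitute the real ones.
df = pd.read_csv("device07_merged.csv", parse_dates=["timestamp"])

# Pre-selected columns, rows restricted to a date range.
cols = ["timestamp", "temp", "status_reg"]
window = df.loc[
    (df["timestamp"] >= "2016-01-01") & (df["timestamp"] < "2016-01-15"),
    cols,
]

print(window.describe())
window.plot(x="timestamp", y="temp")   # pandas hands this off to matplotlib
plt.show()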

JawnV6
Jul 4, 2004

So hot ...

Slanderer posted:

Thanks for these suggestions. I was just messing with Python to see how hard it would be (and how long it would take) to reprocess my large CSV files to extract pre-selected columns and rows within a date range. I'll give these a shot.

I put together a cheap oscope with matplotlib in a couple of days. Had a streaming CSV file, and the Python would grab it and display a few different resolutions (last 100 samples, last hour, all time). Definitely look into numpy/pandas; pandas dataframes in ipython are threatening to replace Excel in my visualization toolbox.

All that said, I'm using a db solution & highcharts now and they are the bees knees. It is absolutely worth spending time to get your data into one of those interfaces.

Spiritus Nox
Sep 2, 2011

Anyone have any opinions on whether codecademy pro is worth it? I was thinking of taking a course on AngularJS there, but most of the meaty projects are locked away behind a subscription. 20 bucks a month doesn't seem bad, but it's enough that I don't want to throw it out there if the lessons aren't worth anything.

Skandranon
Sep 6, 2008
fucking stupid, dont listen to me

Spiritus Nox posted:

Anyone have any opinions on whether codecademy pro is worth it? I was thinking of taking a course on AngularJS there, but most of the meaty projects are locked away behind a subscription. 20 bucks a month doesn't seem bad, but it's enough that I don't want to throw it out there if the lessons aren't worth anything.

Nah, judging by their free stuff for AngularJS, it's already out of date and a poor example of large applications. It's no better than the documentation put out by Google, unless you just love their built in editor. Take a look at TODOMVC, it does a decent job of showing off a bunch of different frameworks, is free, and doesn't force you to do stuff in a certain order.

Spiritus Nox
Sep 2, 2011

Skandranon posted:

Nah, judging by their free stuff for AngularJS, it's already out of date and a poor example of large applications. It's no better than the documentation put out by Google, unless you just love their built in editor. Take a look at TODOMVC, it does a decent job of showing off a bunch of different frameworks, is free, and doesn't force you to do stuff in a certain order.

Thanks for the tip. I'll check TODOMVC out.

FAT32 SHAMER
Aug 16, 2012



Is there a good book/resource for beginner Android development? I've been put in charge of writing the iOS and Android app for my senior project, and while I do have some experience writing Android apps from years past, I've never written an entire app, so I'd like to be able to spend a day relearning the basics and hopefully learning enough that I can fake it till I make it.

So far I have successfully written the UI for the login and signup activities, and while it's been a learning experience just accomplishing that, I also discovered that I am much rustier than I thought.

Volmarias
Dec 31, 2002

EMAIL... THE INTERNET... SEARCH ENGINES...
Apparently, the Big Nerd Ranch book is pretty decent.

Aside from that, the official site has gotten an order of magnitude better than it was several years ago.

sarehu
Apr 20, 2007

(call/cc call/cc)
I want to profile an application with mutually recursive functions, A -> B -> A -> B -> ...

Using existing profiling tools and profile output display tools, you can look at the percentage of time spent inside A (and its children) and percentage spent inside B (and its children). That isn't very useful with mutual recursion. I'd like to display "time spent inside A (and its children) that is not spent inside some B that is a child of A." That is, those points of time where, looking up the callstack, A is nearer than B. Does an existing tool have this feature?

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Presumably you could manually sum up time spent in A itself, plus the total time spent in functions (other than B) called directly by A.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
Wouldn't a line-based profile basically do what you need? It wouldn't give you a nice summary of time spent in A and B, but you could just sum the times for non-A/B lines.

If not, the gprof file format is pretty simple and it seems like it should be just a handful of lines of code to munge the file to reduce the time spent in B and all of its callees other than A to zero.

velvet milkman
Feb 13, 2012

by R. Guyovich
Has anyone had any success installing pandas in a virtualenv? I'm on Ubuntu 14.04 and cannot do it. StackOverflow is not helping me. I have no idea what any of the error messages I'm receiving mean. This comes up in angry red:

quote:

Command "/root/divsnipe/venv/bin/python -u -c "import setuptools, tokenize;__fil e__='/tmp/pip-build-YUHEfr/pandas/setup.py';exec(compile(getattr(tokenize, 'open ', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --r ecord /tmp/pip-YfWsKD-record/install-record.txt --single-version-externally-mana ged --compile --install-headers /root/divsnipe/venv/include/site/python2.7/panda s" failed with error code 1 in /tmp/pip-build-YUHEfr/pandas
