NihilCredo
Jun 6, 2011

Suppress anger in every possible way:
that one thing will defame you more than many virtues will commend you

Hughmoris posted:

Question for those who use Postgres, before I go off and make a terrible tool. Is there anything that will read a CSV or Parquet file and auto-generate a .pgsql script that will create the tables and schemas for said CSV, so I can import it? Right now for learning purposes, I'm having to type out all the columns and types of said CSV before I can copy it with psql. It gets a bit tiring with multiple CSVs, or lots of columns.


DBeaver


Dawncloack
Nov 26, 2007
ECKS DEE!
Nap Ghost
drat I am megadumb. Thanks.

uguu
Mar 9, 2014

I have to create a database for my job, but I'm unsure of the right way to design it.
We have a bunch of different datasets with timeseries data and we want to gather them all together.
I'm thinking of having either one big table with these columns:

dataset     location    time    energy_use
uk          xx
australia   yy
Or a table listing all the datasets and then a different table dataset_table for each dataset with

location    time    energy_use
xx
yy
The second solution seems more elegant, but I can't figure out how to connect the dataset table to the different tables.

edit: the goal is to be able to select only the data from a selected dataset.
Maybe I can have a table BLA with:

dataset dataset_table
uk uk_table
australia australia_table

and do select * from (select dataset_table from BLA where dataset is uk)

uguu fucked around with this message at 00:11 on Aug 24, 2022

nielsm
Jun 1, 2009



How large are those datasets, how many are there, and which DBMS will you be using?

uguu
Mar 9, 2014

We'll be starting with about a dozen datasets, they're a couple of gigabytes each, and we're using TimescaleDB, which is built on PostgreSQL.

The goal is to create an open-source dataset of building energy usage, which researchers can access and eventually add to.

I'm thinking I can have a table BLA with:

dataset dataset_table
uk uk_table
australia australia_table

and do select * from (select dataset_table from BLA where dataset is uk)

nielsm
Jun 1, 2009



That isn't valid SQL syntax.

If the datasets are all homogeneous they should probably go into one table.
If you want to separate their storage, you can maybe use table partitioning to split them up by dataset id.
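A minimal sketch of what declarative list partitioning by dataset could look like (table and column names are made up for illustration; TimescaleDB hypertables are a separate mechanism and would look different):

code:
-- One logical table, physically split per dataset.
CREATE TABLE energy_readings (
    dataset     text        NOT NULL,
    location    text        NOT NULL,
    time        timestamptz NOT NULL,
    energy_use  numeric
) PARTITION BY LIST (dataset);

CREATE TABLE energy_readings_uk        PARTITION OF energy_readings FOR VALUES IN ('uk');
CREATE TABLE energy_readings_australia PARTITION OF energy_readings FOR VALUES IN ('australia');

-- Queries stay simple; the planner prunes to the matching partition.
SELECT * FROM energy_readings WHERE dataset = 'uk';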

uguu
Mar 9, 2014

They're not homogeneous, unfortunately; some come with additional columns. Also, each dataset has auxiliary tables describing e.g. the weather for each point in time or building characteristics. And those especially are heterogeneous.

Is there no way to link two tables, except by connecting on identical columns?

I could do the table selecting in the API, I guess?
Have the user select the UK dataset, then do select dataset_table from BLA, have that return uk_table, and then create a query with select * from uk_table.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Does a "dataset" contain more than one "location"?

A good place to start is the PostgreSQL introduction/tutorial. The way to "link tables together" in SQL is with foreign keys. You need to understand whether the relationships are one-to-one or one-to-many; many people start with ERDs (entity relationship diagrams) to see how things will "connect".

The additional data must be dealt with too. Rows of a table all contain the same columns, so variable interpretation of rows is not advised. If the extra data is only ever retrieved, and never part of a query/filter, then a common pattern is to stuff it into a JSON field.
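A sketch of that foreign-key-plus-JSON pattern, with hypothetical table and column names:

code:
CREATE TABLE datasets (
    dataset_id  serial PRIMARY KEY,
    name        text UNIQUE NOT NULL        -- e.g. 'uk', 'australia'
);

CREATE TABLE readings (
    dataset_id  integer     NOT NULL REFERENCES datasets (dataset_id),
    location    text        NOT NULL,
    time        timestamptz NOT NULL,
    energy_use  numeric,
    extra       jsonb,                      -- dataset-specific oddball columns land here
    PRIMARY KEY (dataset_id, location, time)
);

-- Pull one dataset through the foreign key instead of per-dataset tables:
SELECT r.*
  FROM readings r
  JOIN datasets d ON d.dataset_id = r.dataset_id
 WHERE d.name = 'uk';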

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

uguu posted:

They're not homogeneous, unfortunately; some come with additional columns. Also, each dataset has auxiliary tables describing e.g. the weather for each point in time or building characteristics. And those especially are heterogeneous.

Is there no way to link two tables, except by connecting on identical columns?

I could do the table selecting in the API, I guess?
Have the user select the UK dataset, then do select dataset_table from BLA, have that return uk_table, and then create a query with select * from uk_table.

Seems to me you're approaching the problem from the wrong end, or at least from an angle that makes a relational database sort of a poor fit for the task. If the goal is to be able to compare time series data from different data sets, the very first question you must ask yourself is what is actually important to compare. Then you design your database for comparing those common data points and figure out how to ingest the source data in a way that makes an apples to apples comparison possible. Only then can you start considering how to store and display any additional data.

To me, though, it seems the bigger issue here is that this might be a fundamentally ill-posed problem. Trying to compare datasets of different origins with seemingly no standard methodology for data gathering is fraught with peril and difficult even with a lot of manual fudging. It might honestly be better to do the minimum possible processing of the data into some minimally structured format, and toss all of the details into a JSON blob that whichever researcher is using it can try to interpret.

uguu
Mar 9, 2014

Those are good points.
I considered InfluxDB; then I could just throw everything into a bucket and wash my hands of it. But as little as I know of SQL, I know nothing of Flux, and the manual wasn't very helpful.
I will think about this a bunch more.
Thanks for the advice!

Fanged Lawn Wormy
Jan 4, 2008

SQUEAK! SQUEAK! SQUEAK!
I mostly program Arduino stuff. I'd put myself at a level of "Advanced" I guess... I use Visual Studio with a plugin to make things easier, and I have written (very basic) libraries and stuff before.
I mostly program solo. I'm self-taught, and don't really have a set of coding peers to ask questions like this to.

I'd like to start using GitHub to back up my stuff more often and be able to revert when I mess up hard. I already have an account, and I've messed around with Bitbucket and SourceTree a little before, but I've never really stuck with it very well. I like the GUI, but for the most part, I really just want an automated setup that I can ignore until something goes wrong. Is that unreasonable or stupid? Should I actively be more hands-on?

Right now, my file organizing is basically a main programming folder, with a series of sub folders organized by Job. Then, each job has sub-folders for individual programs/projects. After a job has been completed for at least a year, I throw the whole job folder into an Archive folder in that main programming parent directory.

In my head, the ideal is that at the end of an editing session, when I close the file out, it uploads those changes automatically, with the option to push an update if I am doing a huge change that I'm unsure of. I don't particularly want to deal with branches, etc, as again, this is mostly solo development, and I rarely am doing experiments on two sections of code at once.

Is this a good way to do it? What's the best way to implement this?

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Fanged Lawn Wormy posted:

I mostly program Arduino stuff. I'd put myself at a level of "Advanced" I guess... I use Visual Studio with a plugin to make things easier, and I have written (very basic) libraries and stuff before.
I mostly program solo. I'm self-taught, and don't really have a set of coding peers to ask questions like this to.

I'd like to start using GitHub to back up my stuff more often and be able to revert when I mess up hard. I already have an account, and I've messed around with Bitbucket and SourceTree a little before, but I've never really stuck with it very well. I like the GUI, but for the most part, I really just want an automated setup that I can ignore until something goes wrong. Is that unreasonable or stupid? Should I actively be more hands-on?

Right now, my file organizing is basically a main programming folder, with a series of sub folders organized by Job. Then, each job has sub-folders for individual programs/projects. After a job has been completed for at least a year, I throw the whole job folder into an Archive folder in that main programming parent directory.

In my head, the ideal is that at the end of an editing session, when I close the file out, it uploads those changes automatically, with the option to push an update if I am doing a huge change that I'm unsure of. I don't particularly want to deal with branches, etc, as again, this is mostly solo development, and I rarely am doing experiments on two sections of code at once.

Is this a good way to do it? What's the best way to implement this?

This doesn't have much to do with databases. Did you mean to post this in the general questions thread?

The question you're asking here kind of straddles two things: version control and backing up your personal files. They are separate concerns, although to the extent that you can use a version control tool like Git to distribute changes to multiple machines or online services, it does offer some level of backup-like utility.

If you are undertaking programming projects of any complexity then you owe it to yourself to use version control. It is worth the up-front effort to learn, and especially if you are working on your own there isn't really that much to learn.

I don't think automatically submitting to version control when you close the editor is a good idea. You should be making a decision about when to do it, and it needs to make sense if you later come back to look at the history of what you did. "Finished adding the new button" makes sense, "it was 8pm and I had to put Timmy to bed" doesn't. It might sound like overhead but once you are used to it, it won't seem like that.

BlackMK4
Aug 23, 2006

wat.
Megamarm
edit: I truly am an idiot, wrong thread

BlackMK4 fucked around with this message at 21:29 on Aug 26, 2022

Fanged Lawn Wormy
Jan 4, 2008

SQUEAK! SQUEAK! SQUEAK!

Hammerite posted:

This doesn't have much to do with databases. Did you mean to post this in the general questions thread?

The question you're asking here kind of straddles two things: version control and backing up your personal files. They are separate concerns, although to the extent that you can use a version control tool like Git to distribute changes to multiple machines or online services, it does offer some level of backup-like utility.

If you are undertaking programming projects of any complexity then you owe it to yourself to use version control. It is worth the up-front effort to learn, and especially if you are working on your own there isn't really that much to learn.

I don't think automatically submitting to version control when you close the editor is a good idea. You should be making a decision about when to do it, and it needs to make sense if you later come back to look at the history of what you did. "Finished adding the new button" makes sense, "it was 8pm and I had to put Timmy to bed" doesn't. It might sound like overhead but once you are used to it, it won't seem like that.

Hm. Thanks.

I'll do some work and get it figured out. Thanks for the help even if it was the wrong thread! :doh:

Seventh Arrow
Jan 26, 2005

I'm sure this will be kindergarten-level for most of you, but I'm trying to get the lowest salary, along with the department:

https://www.db-fiddle.com/f/4EuiV3Pgbd7BTzRR4BKxZC/20

My impression was that if I don't put a GROUP BY, then it should give me a single result with the lowest overall salary, am I mistaken? If I don't put it, I get an error complaining about the lack of a GROUP BY.

nielsm
Jun 1, 2009



No, you can't get the smallest like that.
Order by the salary ascending (smallest first), and take just the first row.
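Something along these lines; this guesses at the fiddle's schema (an employees table with department and salary columns), and LIMIT is the Postgres/MySQL spelling:

code:
SELECT department, salary
FROM employees
ORDER BY salary ASC   -- smallest salary first
LIMIT 1;              -- keep only that first row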

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Would something like this work for you?

https://www.db-fiddle.com/f/7GDN7hqs4aE3s2qvCVvYR4/1

Hughmoris fucked around with this message at 17:17 on Aug 29, 2022

Seventh Arrow
Jan 26, 2005

nielsm posted:

No, you can't get the smallest like that.
Order by the salary ascending (smallest first), and take just the first row.

I will try that, thank you!



I'm just seeing the same thing that I posted :confused:

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Seventh Arrow posted:

I will try that, thank you!

I'm just seeing the same thing that I posted :confused:

Try this one:

https://www.db-fiddle.com/f/7GDN7hqs4aE3s2qvCVvYR4/1

Seventh Arrow
Jan 26, 2005


That works, thanks! I would not have thought of using a subquery, very interesting.

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

Seventh Arrow posted:

I'm sure this will be kindergarten-level for most of you, but I'm trying to get the lowest salary, along with the department:

https://www.db-fiddle.com/f/4EuiV3Pgbd7BTzRR4BKxZC/20

My impression was that if I don't put a GROUP BY, then it should give me a single result with the lowest overall salary, am I mistaken? If I don't put it, I get an error complaining about the lack of a GROUP BY.

When using any sort of summarization function, imagine the individual rows you're considering to be sheets of paper and your summarization to be a folder those papers are in. In this example, your names & salaries are all individual sheets of paper and your MIN function is your folder.

As soon as you try to calculate that function, imagine writing the answer on the outside of that folder and closing it shut - now you only have access to what you wrote on the folder and cannot look at any of the details inside it. This is why your query fails, as you're trying to return both aggregated information and non-aggregated information. Your query will run just fine if the only thing you try to select is the minimum salary without the details.

If you need to get the department along with the salary, you can use a subquery, select TOP 1 and order by the salary, or you can look into CTEs for more advanced stuff
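A sketch of the subquery route, again assuming a hypothetical employees table with department and salary columns (TOP 1 is the SQL Server spelling of the order-and-take-one approach):

code:
SELECT department, salary
FROM employees
WHERE salary = (SELECT MIN(salary) FROM employees);  -- returns ties as well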

Seventh Arrow
Jan 26, 2005

kumba posted:

When using any sort of summarization function, imagine the individual rows you're considering to be sheets of paper and your summarization to be a folder those papers are in. In this example, your names & salaries are all individual sheets of paper and your MIN function is your folder.

As soon as you try to calculate that function, imagine writing the answer on the outside of that folder and closing it shut - now you only have access to what you wrote on the folder and cannot look at any of the details inside it. This is why your query fails, as you're trying to return both aggregated information and non-aggregated information. Your query will run just fine if the only thing you try to select is the minimum salary without the details.

If you need to get the department along with the salary, you can use a subquery, select TOP 1 and order by the salary, or you can look into CTEs for more advanced stuff

This is a good analogy, thanks! I will mull it over some.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
with postgres ltree, is there a pattern that allows you to store normalized data (no duplicate values) along trunk/leaf nodes, with lower nodes "overwriting" values higher in the tree?

so for example if ltree address "a" has a column val:1, "a.b" has null, and "a.b.c" has val:3, I basically want a way to do coalesce(3, null, 1). Is that inbuilt somewhere or what would be an efficient way to do that?

Paul MaudDib fucked around with this message at 17:48 on Sep 6, 2022

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003
That seems like an unusual use of ltree (which is itself somewhat obscure). I don't think ltree will enforce distinct values; my only guess there would be to try a unique index, though I'm not sure how that would actually work. But this also doesn't seem particularly tree-structured to me. What if a.b.d also exists, or a.e? Which value takes precedence? If that case will never occur, then could you get away with just using an array and taking the first or last element of the array, depending on how you store it?

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
No, like I want to select leaf nodes in a tree, such that if you walk the tree from the root to the leaves, column values from parents are overridden by child values (if non-null) as you walk down the tree.

Which I think is the answer: select the leaf nodes, and inner join a CTE that builds a parent row for each intermediate ltree path in the tree.

Which, of course, is a prime candidate for a materialized view, I'd think. Does Postgres do partially/dynamically rematerialized views these days?
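One way to get that "deepest non-null wins" behaviour without the intermediate CTE is a correlated subquery over ancestors; a sketch, assuming a hypothetical nodes(path ltree, val integer) table:

code:
-- For every leaf, take the val of its deepest ancestor-or-self with a non-null val.
SELECT leaf.path,
       (SELECT anc.val
          FROM nodes anc
         WHERE anc.path @> leaf.path        -- ancestor-or-self of the leaf
           AND anc.val IS NOT NULL
         ORDER BY nlevel(anc.path) DESC     -- deepest (closest to the leaf) first
         LIMIT 1) AS effective_val
  FROM nodes leaf
 WHERE NOT EXISTS (                         -- keep only leaf nodes
         SELECT 1
           FROM nodes child
          WHERE child.path <@ leaf.path
            AND child.path <> leaf.path);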

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
I have an Oracle 19 database (hosted on AWS as an RDS Oracle instance) with a table into which I am trying to load data from a windows box using sqlldr.exe.

The table has five columns:
code:
CREATE TABLE "ADMIN"."USER_TEMP_DAILY" 
   (	"USERNAME" VARCHAR2(100) NOT NULL ENABLE, 
	"TEAM" NUMBER(38,0) NOT NULL ENABLE, 
	"STAMP" TIMESTAMP (6) NOT NULL ENABLE, 
	"SCORE" NUMBER(*,0), 
	"WU" NUMBER(*,0)
   )
and the data is a tab-delimited text file like this:

code:
anonymous	0	2013-06-11 00:00:00	5054409524	35892207
PS3	0	2013-06-11 00:00:00	2812787484	11614900
awachs	181223	2013-06-11 00:00:00	2324963142	450315
hpcs	213904	2013-06-11 00:00:00	1366612571	190002
When I attempt to load the data using a control file
code:
load data
infile 'C:\Scripts\fah_rds_oracle\staging\user_data.txt'
into table user_temp_daily
FIELDS TERMINATED BY '\t'
(username, team, stamp, score, wu)
it fails with "Record 1: Rejected - Error on table USER_TEMP_DAILY, column STAMP. ORA-01843: not a valid month". Clearly there is a problem with the format of my datetime string that Oracle doesn't like.

I've tried changing the nls_date_format to YYYY-MM-DD on the host using a parameter file per AWS instructions but that didn't change anything.

I'm looking at the log file and it has this entry:
code:
Table USER_TEMP_DAILY, loaded from every logical record.
Insert option in effect for this table: INSERT

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
USERNAME                            FIRST     *  WHT      CHARACTER            
TEAM                                 NEXT     *  WHT      CHARACTER            
STAMP                                NEXT     *  WHT      CHARACTER            
SCORE                                NEXT     *  WHT      CHARACTER            
WU                                   NEXT     *  WHT      CHARACTER            
It looks like every record is being loaded as a string and not as the proper data type. Is this something that gets done automatically, or do I specify it somewhere? Is this a red herring?

Agrikk fucked around with this message at 18:56 on Sep 7, 2022

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
It was the SQL*Loader control file:

code:
load data
infile 'C:\Scripts\fah_rds_oracle\staging\user_data.txt'
into table user_temp_daily
FIELDS TERMINATED BY '\t'
(username, team, stamp TIMESTAMP 'YYYY-MM-DD HH24:mi:ss',score, wu)

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Not got a DB to hand, but you were also falling into the trap of Oracle DATE not being TIMESTAMP. That data would go into a DATE field just fine, because DATE goes down to the second; TIMESTAMP is for sub-second and/or timezone data. This means there's also nls_timestamp_format, for setting what that should look like.
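A small illustration of that distinction and the session-level setting (Oracle syntax; for a SQL*Loader run the explicit mask in the control file, as above, is the surer fix, since the loader picks up NLS settings from its own environment):

code:
-- DATE resolves to the second; TIMESTAMP adds fractional seconds / time zones,
-- and each has its own default format mask.
SELECT parameter, value
  FROM nls_session_parameters
 WHERE parameter IN ('NLS_DATE_FORMAT', 'NLS_TIMESTAMP_FORMAT');

ALTER SESSION SET nls_timestamp_format = 'YYYY-MM-DD HH24:MI:SS';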

Seventh Arrow
Jan 26, 2005

I'm trying to work out a SQL problem from a supposed Reddit interview question: https://www.interviewquery.com/interview-guides/reddit-data-analyst (Ad comments)

I tried hammering out a solution by populating the tables with some fake data (which is probably wrong):

https://www.db-fiddle.com/f/3frdKCCacKFCUUYiL9dJj7/12

but now I'm not even sure I can make sense of the question. The idea is that you're supposed to calculate a percentage of comments for each ad in each of the tables. But isn't there going to be a one-to-one relationship between the number of comments on a given ad and the instances of the ad ID? In other words, every time someone comments on "polo shirts", then surely the ad ID for "polo shirts" is going to be there too, making the ratio 100%(?)

It's a bit tricky because they don't provide a sampling of data, just the schemas. In completely unrelated news, I have a data engineer interview with Reddit this week :tif:

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Seventh Arrow posted:

I'm trying to work out a SQL problem from a supposed Reddit interview question: https://www.interviewquery.com/interview-guides/reddit-data-analyst (Ad comments)

I tried hammering out a solution by populating the tables with some fake data (which is probably wrong):

https://www.db-fiddle.com/f/3frdKCCacKFCUUYiL9dJj7/12

but now I'm not even sure I can make sense of the question. The idea is that you're supposed to calculate a percentage of comments for each ad in each of the tables. But isn't there going to be a one-to-one relationship between the number of comments on a given ad and the instances of the ad ID? In other words, every time someone comments on "polo shirts", then surely the ad ID for "polo shirts" is going to be there too, making the ratio 100%(?)

It's a bit tricky because they don't provide a sampling of data, just the schemas. In completely unrelated news, I have a data engineer interview with Reddit this week :tif:

I think what they want is that if there are 7 "feed" comments and 3 "moments" comments then you have to say it's 70% "feed" and 30% "moments". You can see that in their example results: the rows sum to 1 (they are proportions rather than percentages, despite what the column titles suggest).

Seventh Arrow
Jan 26, 2005

Yes, I was kind of wondering about that, since there only seemed to be one "instance" of each ad. How would you calculate the percentage, then? Just assume that "ad" is "10" and do something like "SELECT COUNT comment_id/10 * 100..."?

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Seventh Arrow posted:

Yes, I was kind of wondering about that, since there only seemed to be one "instance" of each ad. How would you calculate the percentage, then? Just assume that "ad" is "10" and do something like "SELECT COUNT comment_id/10 * 100..."?

I don't know what you mean by "one instance".

I don't know what the most performant way to answer the question is, but I would:

1. ignore the "ads" table entirely*
2. do a SELECT COUNT on the "feed_comments" table with GROUP BY (preferably this is a CTE, but it could be a subquery in the FROM clause; it doesn't matter)
3. do the same thing for the "moments_comments" table
4. do a FULL OUTER JOIN with COALESCE(..., 0) to combine the counts in one result set
5. do the maths to get the final result set in the form they've asked for

* the question arises of what they want to happen in the case where an ad has no comments in either table. I'm choosing to ignore those ads because the problem statement on that web page doesn't tell you what to do with them. If you were posed the question in an interview then of course you could ask the interviewer, and hopefully that would be received positively because it would indicate you had a good grasp of the details.
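A Postgres-flavoured sketch of steps 2-5, assuming the tables are named feed_comments and moments_comments with an ad_id column (the real schemas on that page may differ):

code:
WITH feed AS (
    SELECT ad_id, COUNT(*) AS feed_count
      FROM feed_comments
     GROUP BY ad_id
), moments AS (
    SELECT ad_id, COUNT(*) AS moments_count
      FROM moments_comments
     GROUP BY ad_id
)
SELECT COALESCE(f.ad_id, m.ad_id) AS ad_id,
       COALESCE(f.feed_count, 0)::numeric
           / (COALESCE(f.feed_count, 0) + COALESCE(m.moments_count, 0)) AS feed_share,
       COALESCE(m.moments_count, 0)::numeric
           / (COALESCE(f.feed_count, 0) + COALESCE(m.moments_count, 0)) AS moments_share
  FROM feed f
  FULL OUTER JOIN moments m ON m.ad_id = f.ad_id;

Every ad in that result has at least one comment of some kind, so the denominator can't be zero; ads with no comments at all simply don't appear, per the footnote above.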

e: hello Cloudflare

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
The ads table is the source for ads that exist, which can be larger than the ads that appear in either comments table. Were I the grader, I'd deduct for an outer join because it only addresses the situation where the ad has at least one type of comment.

Of course asking the question is valid, but all of no-comments, one-type, and both-types can be handled with a single idiom without an outer join.


When I opened the page I jumped right to the hard problem. :3:

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

PhantomOfTheCopier posted:

The ads table is the source for ads that exist, which can be larger than the ads that appear in either comments table. Were I the grader, I'd deduct for an outer join because it only addresses the situation where the ad has at least one type of comment.

Of course asking the question is valid, but all of no-comments, one-type, and both-types can be handled with a single idiom without an outer join.


When I opened the page I jumped right to the hard problem. :3:

Well then, what percentage would you report for each type of comment in the case where there are no comments? Zero out of the zero comments are "feed" comments, what proportion or percentage do you deem that to be? There isn't an unambiguously mathematically correct answer, so absent any input from the problem setter on what behaviour they would want to see, any approach to those ads is reasonable.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
I'd ask if the same comment_id can appear both in feed_comments and moments_comments. It probably can't, but it's possible they put that in there as a trap. A lot of interviewers really love to feel clever. Tables with an inherited PK are a gently caress.

Seventh Arrow
Jan 26, 2005

Yeah I admittedly usually bomb these SQL/Python interview tests but I'm not getting to work with this stuff in my current job anymore, so I might as well try. Also, whenever possible I try to copy/paste the interview challenge and analyze it later so I can get better at this stuff.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Hammerite posted:

Well then, what percentage would you report for each type of comment in the case where there are no comments? Zero out of the zero comments are "feed" comments, what proportion or percentage do you deem that to be? There isn't an unambiguously mathematically correct answer, so absent any input from the problem setter on what behaviour they would want to see, any approach to those ads is reasonable.
The question is certainly valid. I'm merely concerned about a non-extensible solution when the issue has been identified a priori. "Why do BxC when you have A?"

:v: B refs A, C refs A... "I'll do a query in B to get counts, and join that with the counts from C!"

:allears: what if a comment is in B but not C?

:v: Oh well... I guess I can do a left join of B and C.

:allears: But what if it's in C and not B?

:v: I guess I need a right join, well then a full outer join of B and C... and still a coalesce to handle the nulls.

:allears: And now we need to add C refs A.

:v: I guess more joins and maybe a case statement for the combinations that are missing.

:allears: But what if it's in neither and we still want those rows?

:v: Another outer join! I can just dump it in Excel and...

:allears: Can you use A to help?

:v: ... Well if I (do this instead) it addresses that issue and all of the above variations, I should have done that in the first place. :science:



Does it take joins? Sure, but join against the list with primary key... because you just know the next question will be "Durr can we get the A.sale-price too since that might tell us something useful?" and the answer need not be "let me rewrite the entire query".
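He doesn't spell the idiom out, but one reading of "(do this instead)", using the same hypothetical table names as above plus an ads table keyed by ad_id, is to drive the query from ads with scalar subqueries, so no-comment ads still get a row and no outer join is needed:

code:
SELECT a.ad_id,
       (SELECT COUNT(*) FROM feed_comments f    WHERE f.ad_id = a.ad_id) AS feed_count,
       (SELECT COUNT(*) FROM moments_comments m WHERE m.ad_id = a.ad_id) AS moments_count
  FROM ads a;

Turning the counts into shares (and deciding what to report when both are zero) can then be layered on top without rewriting the query.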

mortarr
Apr 28, 2005

frozen meat at high speed

uguu posted:

Those are good points.
I considered InfluxDB, then I could just throw everything into a bucket and wash my hands of it. But as little as I know of sql, i know nothing of flux and the manual wasn't very helpful.
I will think about this a bunch more.
Thansk for the advice!

Not sure if you made any progress on this, but I did a similar thing at my old work - we had about 30 years' worth of three-waters-related SCADA measurements for various sites around the city at 5-minute intervals. I don't work there anymore and it was a couple of years ago, but basically we had a bunch of ETL processes that collated the data from the various sources and ultimately inserted it into postgres/timescaledb. Then we used PostgREST to surface a proc using timescaledb's hyperfunctions that would pull the detail data, plus a couple of other procs that could be used to surface summary data to help find the devices of interest.

The data looked like: device id/location/time/value/comment, so not wide, just shitloads of it.

We surfaced PostgREST through an API gateway that let us limit access via API keys, rate limit, have nice-looking URL paths and all that good stuff. It took a bit of thought to put it all together, as we were new to postgres as well as timescaledb, but once it was set up it was pretty good. I really liked the idea of PostgREST and not needing to roll a custom api/db interface.

duck monster
Dec 15, 2004

Does anyone know if it's possible to get Multicorn (the Postgres plugin that lets you write Foreign Data Wrappers in Python) working in Azure's Postgres service? We have a giant legacy OSC OSI PI thing (big ugly enterprisey time-series database thing) that we all hate using, but it's kinda core to our operation. We're toying with ideas for how to cut that expensive crackpipe out, and as a first step, accessing it via an FDW means we can later just move it over to a local timescaledb table.

It might just be a harebrained scheme, but we'd like to try it.


PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Most cloud databases prohibit installing anything into the backends with that level of access, and Azure still seems to restrict outbound connections, so it's not possible directly. It will need to be something custom, based on your data and read/write model.

Most migrations handle this with some application support, at least fallback logic that reads from the old system and writes to the new one. If zero downtime is mandated, there are methods, up to the limits of the CAP theorem.
