Hello Data Engineering thread! I'm sort of a hybrid Full Stack Engineer / Data Engineer in the biotechnology space, meaning I can build front-facing UIs, backend APIs, and data pipelines. We're an AWS shop, so we use MWAA, S3, Redshift, etc., with React for any UI application and Django for REST and GraphQL APIs that serve the frontends and house the business logic for data pipelines. Does anyone have a great approach for syncing external databases into Redshift as-is (aka no transformation)?

# Jan 3, 2024 22:24


# May 17, 2024 20:12

CompeAnansi posted:

Since no one is replying, I'll mention what we do for Postgres-to-Postgres syncs, since Redshift is based on Postgres and has the COPY command (although it may be somewhat restricted). Assuming that none of the tables in the DB are larger than memory, you can use:

Some of these databases have 600+ tables and I doubt they would fit into memory. We have been doing the table-to-S3 copy paradigm. I was wondering if there was a better way, but it seems like not really. Especially since we're being really lazy right now, dropping the replica and doing a full copy each time.
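The table-to-S3-then-COPY paradigm mentioned above can be sketched roughly like this. This is a minimal illustration, not anyone's production pipeline: the DSNs, bucket, prefix, and IAM role ARN are all placeholders, and `boto3`/`psycopg2` are assumed to be available (as they typically are in an MWAA environment). Note that `copy_expert` streams rows out of the source table rather than materializing the whole table in a client-side result set, though this simple version still buffers the compressed dump in memory before uploading.

```python
import gzip
import io


def copy_statement(table: str, bucket: str, prefix: str, iam_role: str) -> str:
    """Build a Redshift COPY statement for one table's CSV dump in S3."""
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{prefix}/{table}/' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV GZIP IGNOREHEADER 1;"
    )


def sync_table(pg_dsn: str, rs_dsn: str, table: str,
               bucket: str, prefix: str, iam_role: str) -> None:
    """Dump one source Postgres table to S3 as gzipped CSV, then COPY it
    into Redshift. A full-refresh sketch (no incremental logic)."""
    import boto3     # assumed available in the runtime environment
    import psycopg2  # assumed available in the runtime environment

    # Stream the table out of the source DB into a gzipped in-memory buffer.
    buf = io.BytesIO()
    with psycopg2.connect(pg_dsn) as src:
        with gzip.GzipFile(fileobj=buf, mode="wb") as gz, src.cursor() as cur:
            cur.copy_expert(f"COPY {table} TO STDOUT WITH CSV HEADER", gz)
    buf.seek(0)

    # Land the dump in S3 where Redshift's COPY can reach it.
    boto3.client("s3").put_object(
        Bucket=bucket, Key=f"{prefix}/{table}/part-0000.csv.gz", Body=buf
    )

    # Load into Redshift from S3.
    with psycopg2.connect(rs_dsn) as tgt, tgt.cursor() as cur:
        cur.execute(copy_statement(table, bucket, prefix, iam_role))
```

For the 600-table case you would drive `sync_table` from a task-per-table (or task-group) loop in an MWAA DAG; since each table is streamed independently, no single table needs to fit in memory.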

# Jan 6, 2024 02:38

I have a file-based data engineering challenge. We are looking to migrate objects from one bucket to another but modify the object key in the process (i.e., inject path metadata). Does anyone have a solution other than invoking a Lambda function / Step Function?
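For what it's worth, the operation itself doesn't require Lambda at all; anything that can call the S3 API (an MWAA task, a one-off script) can do server-side copies with a rewritten key. A minimal sketch, with `inject_metadata` as a hypothetical key-rewrite rule and the bucket names as placeholders:

```python
def inject_metadata(key: str, segment: str, depth: int = 1) -> str:
    """Insert a path segment into an S3 key after `depth` leading segments,
    e.g. "2024/01/events.parquet" -> "2024/source=vendor_a/01/events.parquet".
    """
    parts = key.split("/")
    return "/".join(parts[:depth] + [segment] + parts[depth:])


def migrate(src_bucket: str, dst_bucket: str, segment: str) -> None:
    """Copy every object from src_bucket to dst_bucket under a rewritten key.

    copy_object is a server-side copy, so object bytes never transit the
    machine running this script. Caveat: copy_object is limited to 5 GB
    per object; larger objects need a multipart copy.
    """
    import boto3  # assumed available in the runtime environment

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket):
        for obj in page.get("Contents", []):
            s3.copy_object(
                Bucket=dst_bucket,
                Key=inject_metadata(obj["Key"], segment),
                CopySource={"Bucket": src_bucket, "Key": obj["Key"]},
            )
```

The Lambda/Step Function approach buys you retries and fan-out for very large buckets, but for a bounded migration a paginated loop like this is often enough.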

# Apr 30, 2024 21:20