Red Mike
Jul 11, 2011

KoRMaK posted:

I'm working in C# (for a Unity game) and I have a bunch of boilerplate code that multiple objects need. Inheritance doesn't really seem to be a great solution, and I'd rather use a mixin (pulling from my Ruby programming day job here)

Well, those don't exist in C#. OK, fine, I'll do a macro. gently caress, those don't exist either. And interfaces don't seem to allow the method body to be defined on the interface - defeating the purpose, because I'll have to redefine the same friggen method in every object.


Anyway, how do I do mixin-ish programming in C# so that I don't have to keep copy/pasting the same 3-line method over and over?

This sort of design is a really bad fit for C#. Instead of trying to make the design work by force (it's possible, from partial classes to some weird inheritance with tons of complicated abstract classes and generics), I recommend just ignoring what you'd do in Ruby and learning the C# approach, which is fundamentally composition-based. Using interfaces and generics can help make that composition a lot simpler and quicker to write.
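
Something like this is roughly what I mean (all the names here are made up for illustration) - the shared method lives in one extension class, and anything that implements the interface picks it up without inheritance:

code:
using System;

// One common C# pattern for mixin-ish reuse: a small interface plus an
// extension class. Any type that implements the interface gets the shared
// method without copy/pasting it.
public interface IHasHealth
{
    int Health { get; set; }
}

public static class HealthExtensions
{
    // The "3-line method" lives here exactly once.
    public static void TakeDamage(this IHasHealth target, int amount)
    {
        target.Health = Math.Max(0, target.Health - amount);
    }
}

public class Player : IHasHealth
{
    public int Health { get; set; }
}

// usage: somePlayer.TakeDamage(10);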

I've seen way too many people come in from JS, Ruby, Python, and then make a hash of all their code because they just want to do <simple thing in JS/Ruby/Python> and don't want to spend a few hours learning C# first.


raminasi
Jan 25, 2005

a last drink with no ice
I want some WPF radio buttons to look like toggle buttons. This is a simple thing to want to do, and according to the internet all you have to do is to declare the RadioButton like <RadioButton Style="{StaticResource {x:Type ToggleButton}}" />. But it's not working for me - it compiles and runs fine, but the radio button still just looks like a radio button. Nobody else seems to have this problem. What stupid thing am I doing wrong?

KoRMaK
Jul 31, 2012



I did C# long before Ruby, but it's been a while and most of my sharpened skills are Ruby optimized now.

So anyway, the gist is that I should peel this off into a class and refer to that class for stuff. Makes sense.

EssOEss
Oct 23, 2006
128-bit approved
We might be able to provide more design advice if you can describe what exactly the scenario is. "There is some common functionality" is not really sufficient input to give advice on the topic.

Warbird
May 23, 2012

America's Favorite Dumbass

I'd love to get some direction for a minor issue we've been having at work. My old team is using a number of dev and test boxes that are all running Windows Server 2003 due to software requirements. This works well enough, but they tend to kick each other off of a given box as they remote in. It's relatively minor, but would there be a practical way to have a program check whether a box is in active use? If so, is there a way to return the Windows ID of the person remoting in?

I had a semi-working PoC using an event scheduler to make a file on a shared drive if a box was inactive for x minutes, but this was not ideal and didn't seem to consistently work. Any suggestions?

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!
https://www.devopsonwindows.com/3-ways-remotely-view-who-is-logged-on/

Warbird
May 23, 2012

America's Favorite Dumbass

That console command at the end should be pretty easy to use in a GUI interface or something, thanks! One minor wrinkle: for whatever reason, everyone accesses the boxes through the same user/pass combo. I'm assuming there's no way to trace back to the host machine that's accessing the box? The idle time is the focus here, so I'm not too worried if it's not possible.

Warbird
May 23, 2012

America's Favorite Dumbass

Warbird posted:

That console command at the end should be pretty easy to use in a GUI interface or something

Boy was I wrong. I've been banging my head against this all morning and the internet at the office decided to catch on fire or something, so this has been a great morning for trying to troubleshoot.

code:
string command = "query user /server:[SERVERNAME]";

System.Diagnostics.ProcessStartInfo procStartInfo =
	new System.Diagnostics.ProcessStartInfo("cmd", "/c " + command);


procStartInfo.RedirectStandardOutput = true;
procStartInfo.UseShellExecute = false;

procStartInfo.CreateNoWindow = true;

System.Diagnostics.Process proc = new System.Diagnostics.Process();
proc.StartInfo = procStartInfo;
proc.Start();

string result = proc.StandardOutput.ReadToEnd();
I've tried this, that, and the other to get this to work, but all I get from Standard Output is a blank line or the "Microsoft Windows [Version......" stuff at the top of the cmd window if I'm lucky. What am I doing wrong here?

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

You're effectively opening a second command window to run query.exe, while capturing the empty output of the first. Don't launch cmd.exe, launch query.exe.
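
Something like this should do it (SERVERNAME is a placeholder):

code:
// Launch query.exe directly and capture its standard output.
var psi = new System.Diagnostics.ProcessStartInfo("query.exe", "user /server:SERVERNAME")
{
    RedirectStandardOutput = true,
    UseShellExecute = false,
    CreateNoWindow = true
};

using (var proc = System.Diagnostics.Process.Start(psi))
{
    string result = proc.StandardOutput.ReadToEnd();
    proc.WaitForExit();
}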

Warbird
May 23, 2012

America's Favorite Dumbass

I had piddled around with that earlier, but the program can't seem to find query.exe despite me pointing straight to it.

code:
System.Diagnostics.ProcessStartInfo procStartInfo =
	new System.Diagnostics.ProcessStartInfo("query", "/c " + command);
code:
System.Diagnostics.ProcessStartInfo procStartInfo =
                    new System.Diagnostics.ProcessStartInfo(@"%SystemRoot%\system32\query.exe", "/c " + command);
code:
System.Diagnostics.ProcessStartInfo procStartInfo =
                    new System.Diagnostics.ProcessStartInfo(@"C:\Windows\System32\query.exe", "/c " + command);
The last one is Copy As Path on query.exe. Do I need to set the working directory? I could see that being a factor, but a direct path should supersede that with no problem. Maybe it's an access restriction? I have admin on this machine, and running VS as admin doesn't seem to make any difference.

Warbird
May 23, 2012

America's Favorite Dumbass

Mystery solved:

Some guy on StackOverflow posted:

Most likely your app is 32-bit, and in 64-bit Windows references to C:\Windows\System32 get transparently redirected to C:\Windows\SysWOW64 for 32-bit apps. calc.exe happens to exist in both places, while soundrecorder.exe exists in the true System32 only.

When you launch from Start / Run the parent process is the 64-bit explorer.exe so no redirection is done, and the 64-bit C:\Windows\System32\soundrecorder.exe is found and started.

From File System Redirector:

code:
In most cases, whenever a 32-bit application attempts to access %windir%\System32, the access is redirected to %windir%\SysWOW64.
[ EDIT ] From the same page:
code:
32-bit applications can access the native system directory by substituting %windir%\Sysnative for %windir%\System32.
So the following would work to start soundrecorder.exe from the (real) C:\Windows\System32.
code:
psStartInfo.FileName = @"C:\Windows\Sysnative\soundrecorder.exe";

I'm going to renounce developing and go live in a tree.
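
For anyone finding this later, the working version is roughly the following (SERVERNAME is still a placeholder; the Sysnative branch only matters when the app runs as a 32-bit process on 64-bit Windows):

code:
// Resolve the real System32 even from a 32-bit process, where file access
// to System32 is silently redirected to SysWOW64 (which has no query.exe).
string systemDir = (Environment.Is64BitOperatingSystem && !Environment.Is64BitProcess)
    ? System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Windows), "Sysnative")
    : Environment.SystemDirectory;

var psi = new System.Diagnostics.ProcessStartInfo(
    System.IO.Path.Combine(systemDir, "query.exe"), "user /server:SERVERNAME")
{
    RedirectStandardOutput = true,
    UseShellExecute = false,
    CreateNoWindow = true
};

using (var proc = System.Diagnostics.Process.Start(psi))
{
    string result = proc.StandardOutput.ReadToEnd();
    proc.WaitForExit();
}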

EssOEss
Oct 23, 2006
128-bit approved
Speaking of bitness, I notice that for a few years now "Prefer 32-bit" has been enabled by default in .NET project templates. Seems like a step backwards to me - anyone know why that's the default?

Sab669
Sep 24, 2009

This is purely speculation, but I assume it's probably because there are still a lot of old computers in use. So they just make it default to the behavior that will make your application run fine on x86 or x64.

But yes, I've definitely had my share of issues where I say, "Why the gently caress isn't this wor-- oh god drat it, wrong DLL"

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



IIRC the official reason at least was that you probably don't actually need the 64 bit address space and a bunch of tooling still hadn't been updated to use it, so just default to what's likely to work best for most people most of the time, which is 32-bit binaries.

biznatchio
Mar 31, 2001


Buglord

Sab669 posted:

This is purely speculation, but I assume it's probably because there are still a lot of old computers in use. So they just make it default to the behavior that will make your application run fine on x86 or x64.

Not setting "Prefer 32-bit" doesn't stop your application from running on x86; it just means that if you are on an x64 system, the process will upgrade to 64-bit.

My suspicion is the setting defaults that way because unless a developer is acutely aware of the situation, they're not going to be testing their code as both 32-bit and 64-bit, and since most applications don't need 64-bit address space, the default might as well be 32-bit just to avoid any bitness issues with unmanaged libraries that could arise because the developer never bothered to consider the case.

But do note that "AnyCPU, Prefer 32-bit" is not the same as setting the build settings to "x86", because "AnyCPU, Prefer 32-bit" allows the code to run on ARM devices, whereas "x86" doesn't.

putin is a cunt
Apr 5, 2007

BOY DO I SURE ENJOY TRASH. THERE'S NOTHING MORE I LOVE THAN TO SIT DOWN IN FRONT OF THE BIG SCREEN AND EAT A BIIIIG STEAMY BOWL OF SHIT. WARNER BROS CAN COME OVER TO MY HOUSE AND ASSFUCK MY MOM WHILE I WATCH AND I WOULD CERTIFY IT FRESH, NO QUESTION
Very random question, but any Australian devs attending Microsoft's Ignite conference in the Gold Coast?

Mr Shiny Pants
Nov 12, 2012

The Wizard of Poz posted:

Very random question, but any Australian devs attending Microsoft's Ignite conference in the Gold Coast?

Another random question: how's Australia for a European dev?

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

biznatchio posted:

My suspicion is the setting defaults that way because unless a developer is acutely aware of the situation, they're not going to be testing their code as both 32-bit and 64-bit, and since most applications don't need 64-bit address space, the default might as well be 32-bit just to avoid any bitness issues with unmanaged libraries that could arise because the developer never bothered to consider the case.

When you consider that the Visual Studio team decided against converting VS to a 64-bit program because it wouldn't offer sufficient benefits, it puts into perspective just how unlikely you are to actually make good use of the x64 architecture. You need to be eating up a boatload of RAM without necessarily choking on CPU processing at the same time.

edit: granted, I think much of Visual Studio is written in C++ so switching that code to x64 probably involves manually flipping every fifth bit or whatever kind of poo poo low-level programmers have to do :gary:

NihilCredo fucked around with this message at 22:39 on Feb 15, 2017

sarehu
Apr 20, 2007

(call/cc call/cc)
Gotta flip the 5th and 6th bits :)

ljw1004
Jan 18, 2005

rum

NihilCredo posted:

When you consider that the Visual Studio team decided against converting VS to a 64-bit program because it wouldn't offer sufficient benefits, it puts into perspective just how unlikely you are to actually make good use of the x64 architecture. You need to be eating up a boatload of RAM without necessarily choking on CPU processing at the same time.

That's exactly it. We need something like "AnyCPU" so your code will at least run on ARM if required. But we generally don't want to give up the speed+memorysize benefits of x86 when we're running on x64. Therefore "AnyCPU_32bit_preferred" is the best of both worlds.

The benefits of x64 for VS would be to allow truly huge ginormous projects to be loaded (and have Roslyn offering you all the nice features like find-all-references) without crashing due to running out of address space. The drawbacks would be that every single thing runs a little bit slower and takes more memory due to the doubled pointer sizes. In the end, the VS team decided it'd be better to work hard on reducing Roslyn memory consumption, and to move Roslyn into a separate x86 process with its own address-space, since these solutions gave better performance all around.


The only time you'd ever prefer x64 over x86 is when your app has truly huge datasets. And then it's not so much a matter of "preferring" x64 but rather *requiring* it.

putin is a cunt
Apr 5, 2007

BOY DO I SURE ENJOY TRASH. THERE'S NOTHING MORE I LOVE THAN TO SIT DOWN IN FRONT OF THE BIG SCREEN AND EAT A BIIIIG STEAMY BOWL OF SHIT. WARNER BROS CAN COME OVER TO MY HOUSE AND ASSFUCK MY MOM WHILE I WATCH AND I WOULD CERTIFY IT FRESH, NO QUESTION
Has anyone tried using Team Services to automate a build/release process for a multi-project solution?

I've got a multi-project solution from which I'd like to deploy one particular project to an Azure App Service. The problem is I can't figure out the right combination of magic strings to put into the configuration to get it to work. My main problem at the moment is that my VSTS Build definition doesn't seem to build the project I want to deploy - the folder never seems to be created - so when the Release definition copies the files to be deployed, it finds and copies zero files.

Any ideas at all? Or any guides for this stuff? I'm finding it very difficult to Google anything without winding up at a generic Microsoft brochure site for Team Services.

EssOEss
Oct 23, 2006
128-bit approved
The Team Services build process just automates basic commands like msbuild.exe. Figure out the right set of commands to build your thing locally and the same stuff will generally work on a build agent. It tends to be pretty straightforward, so share some details if you want detailed help.

New Yorp New Yorp
Jul 18, 2003

Only in Kenya.
Pillbug
The MSBuild argument /p:GenerateProjectSpecificOutputFolder=True may be what you're after.

Also, the ability to run an inline powershell script is your friend here, since I'm assuming you don't have access to the file system to see what is or isn't getting generated and where it's landing. gci '$(Build.SourcesDirectory)' -rec, for example.

Opulent Ceremony
Feb 22, 2012
I've got an ASP.NET question. I'm looking at the .NET SDK for AWS S3 (http://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectUsingNetSDK.html) for downloading a file from S3, which allows you to grab onto the ResponseStream of a file coming down from AWS.

If I were to, within my web server, grab that ResponseStream and just return it to the client in an MVC FileStreamResult (or whatever is necessary), I'm guessing that avoids requiring the entire file from being loaded into memory at once in my web server, but does it require my web server to sit around babysitting that stream so it can pass through to the client, or would it be handing off the AWS stream to the client and ending the process for my web server so it can get to other requests?
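
For reference, the pass-through version I have in mind is roughly this (bucket name and controller are made up for illustration):

code:
using System.Web.Mvc;
using Amazon.S3;
using Amazon.S3.Model;

public class FilesController : Controller
{
    public ActionResult Download(string key)
    {
        var s3 = new AmazonS3Client();
        var response = s3.GetObject(new GetObjectRequest { BucketName = "my-bucket", Key = key });

        // MVC writes the stream out to the client in chunks and disposes it when done,
        // so the whole file is never buffered in server memory at once.
        return new FileStreamResult(response.ResponseStream, response.Headers.ContentType);
    }
}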

Thanks for the quick response VVV

Opulent Ceremony fucked around with this message at 19:42 on Feb 20, 2017

B-Nasty
May 25, 2005

Opulent Ceremony posted:

If I were to, within my web server, grab that ResponseStream and just return it to the client in an MVC FileStreamResult (or whatever is necessary), I'm guessing that avoids requiring the entire file from being loaded into memory at once in my web server, but does it require my web server to sit around babysitting that stream so it can pass through to the client, or would it be handing off the AWS stream to the client and ending the process for my web server so it can get to other requests?

Your server would be streaming from Amazon and down to the client.

You'd have to get tricky for the hands-off option. I did something similar once where I generated a secure link to the resource on Amazon (using a signed access URL) and the client was redirected directly to Amazon to pull the file.
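
Roughly like this, if memory serves (bucket name, expiry, and controller are made up; GetPreSignedURL is the relevant SDK call):

code:
using System;
using System.Web.Mvc;
using Amazon.S3;
using Amazon.S3.Model;

public class FilesController : Controller
{
    public ActionResult Download(string key)
    {
        var s3 = new AmazonS3Client();
        string url = s3.GetPreSignedURL(new GetPreSignedUrlRequest
        {
            BucketName = "my-bucket",
            Key = key,
            Expires = DateTime.UtcNow.AddMinutes(5)
        });

        // The browser pulls the file straight from S3, so the web server never touches the bytes.
        return Redirect(url);
    }
}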

Essential
Aug 14, 2003
I have a project that went from proof of concept to full-blown production recently and I'm struggling with how to handle the amount of data I need to process. I have an app that syncs data from a local database to a SQL Azure database every 15 minutes. The problem now is scale, as this went from a handful of installs to around 300 in the span of 1 month and will continue to grow to around 1000 by the end of the year.

Sometimes the amount of data is really huge, like 20K records per location all uploading at the same time. That's roughly 6 million rows all trying to insert or update in the SQL Azure database at the same time. One of the major hurdles is there is no way to know if the data is new or changed at the local database. Right now I'm using BulkSQL to insert all of those rows (deleting the old data first). I had some Entity Framework code that instead checked against the Azure database to see if the record was new or modified, but that was killing both my WCF service and the database DTUs.
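
For context, the current upload path is basically this shape (assuming "BulkSQL" means SqlBulkCopy; connection string, table name, and batch size are placeholders):

code:
using System.Data;
using System.Data.SqlClient;

public static void UploadBatch(DataTable batch, string connectionString)
{
    // Bulk insert one batch of rows into the SQL Azure table.
    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.LocationRecords";
        bulk.BatchSize = 100; // tunable
        bulk.WriteToServer(batch);
    }
}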

Ideally there is a solution to only send new and/or modified records up to azure. The only way I can think of to do that is: 1) keep a local copy of the data to check against or 2) download the records from azure and check locally for new/modified records.

The other option I can think of is to continue to send all the data up, but have a very fast/performance minded cloud service to check if the record is new or modified and discard the others.

I'm really in need of help with this and hoping someone here either has experience and a solution or can point me to some person/company that can solve these issues.

kitten emergency
Jan 13, 2008

get meow this wack-ass crystal prison

Essential posted:

I have a project that went from proof of concept to full-blown production recently and I'm struggling with how to handle the amount of data I need to process. I have an app that syncs data from a local database to a SQL Azure database every 15 minutes. The problem now is scale, as this went from a handful of installs to around 300 in the span of 1 month and will continue to grow to around 1000 by the end of the year.

Sometimes the amount of data is really huge, like 20K records per location all uploading at the same time. That's roughly 6 million rows all trying to insert or update in the SQL Azure database at the same time. One of the major hurdles is there is no way to know if the data is new or changed at the local database. Right now I'm using BulkSQL to insert all of those rows (deleting the old data first). I had some Entity Framework code that instead checked against the Azure database to see if the record was new or modified, but that was killing both my WCF service and the database DTUs.

Ideally there is a solution to only send new and/or modified records up to azure. The only way I can think of to do that is: 1) keep a local copy of the data to check against or 2) download the records from azure and check locally for new/modified records.

The other option I can think of is to continue to send all the data up, but have a very fast/performance minded cloud service to check if the record is new or modified and discard the others.

I'm really in need of help with this and hoping someone here either has experience and a solution or can point me to some person/company that can solve these issues.

Are Azure Web Jobs anything like AWS Lambda? You could have each client send data through an on-demand service to de-duplicate and then batch the creates/updates from each local db into your main one. This assumes that clients won't have duplicate data with other clients.

Taking a step back, you should think about where you should optimize. Performance on the client side? Cloud spend/performance? How quickly do you need to de-duplicate and propagate to the azure DB?

Essential
Aug 14, 2003

uncurable mlady posted:

Are Azure Web Jobs anything like AWS Lambda? You could have each client send data through an on-demand service to de-duplicate and then batch the creates/updates from each local db into your main one. This assumes that clients won't have duplicate data with other clients.

Taking a step back, you should think about where you should optimize. Performance on the client side? Cloud spend/performance? How quickly do you need to de-duplicate and propagate to the azure DB?

Yes, I believe they are similar. Azure also has Azure Functions, which I think is meant to be the direct competitor to AWS Lambda. Clients will not have duplicate data due to a guid license key that goes with each record. I should also add that the records match our database model. The data is queried client side, changed to our data models and then uploaded. Right now data is sent up in batches of 100 through a wcf service and bulkinserted. If an office has 20k records then it's doing 200 uploads as fast as it can. When uploading just 1 location it's almost impossibly fast, sometimes taking only 5 or 10 seconds to upload all 20k records.

Right now the performance bottleneck seems to be the de-duplication of data on the cloud side (when I was using that method). It was so many queries against the db that it basically locked up. I'm literally sending up thousands and thousands of unchanged records every 15 minutes. 95% of the records are unchanged. I'm really hoping there is a reliable way to only upload new/modified records.

Data needs to be ready to go pretty fast as it's expected to be close to real-time data, so I need to get it loaded into the db as fast as I can. I do think there is also a reasonable performance consideration I can raise to the powers that be.

Thanks for the help!

rarbatrol
Apr 17, 2011

Hurt//maim//kill.
Your ideal solution(s) sounds like something I'd go with... if you can attach last-modified timestamps to everything, that'll let you filter your data down pretty easily, potentially going both directions.

But: What's breaking down for you right now? Can you run a sustained load test and find out what happens with 1000 installs?

Edit:

Essential posted:

Yes, I believe they are similar. Azure also has Azure Functions, which I think is meant to be the direct competitor to AWS Lambda. Clients will not have duplicate data due to a guid license key that goes with each record. I should also add that the records match our database model. The data is queried client side, changed to our data models and then uploaded. Right now data is sent up in batches of 100 through a wcf service and bulkinserted. If an office has 20k records then it's doing 200 uploads as fast as it can. When uploading just 1 location it's almost impossibly fast, sometimes taking only 5 or 10 seconds to upload all 20k records.

Right now the performance bottleneck seems to be the de-duplication of data on the cloud side (when I was using that method). It was so many queries against the db that it basically locked up. I'm literally sending up thousands and thousands of unchanged records every 15 minutes. 95% of the records are unchanged. I'm really hoping there is a reliable way to only upload new/modified records.

Data needs to be ready to go pretty fast as it's expected to be close to real-time data, so I need to get it loaded into the db as fast as I can. I do think there is also a reasonable performance consideration I can raise to the powers that be.

Thanks for the help!

You could investigate splitting up the incoming data into staging areas, potentially per-client since that's a natural partitioning key. What goes into a row? Unless you've got some big gnarly text, you could probably crank up your batch size, maybe to 1000. How many pieces of information do you need to perform de-duplication? There may be a way to hash the row data and use that as your modified flag. In my experience, SQL is pretty good at de-duplication style queries if you can massage your problem to work that way.

rarbatrol fucked around with this message at 05:12 on Feb 21, 2017

Essential
Aug 14, 2003

rarbatrol posted:

But: What's breaking down for you right now? Can you run a sustained load test and find out what happens with 1000 installs?
I believe what's breaking down is the load on the database. Inserting that many records is eating up tons of DTUs (the SQL Azure database measurement, database transaction units). The database is also being queried by hundreds of users at any given moment (via a Web API 2.0 REST service).

I can definitely set up a load test. Right now I'm being asked to add features and skimp on performance, so part of what I'm fighting is the usual crap of not being focused on the right thing. I'm trying to do all the optimizations after hours.

rarbatrol posted:

You could investigate splitting up the incoming data into staging areas, potentially per-client since that's a natural partitioning key. What goes into a row? Unless you've got some big gnarly text, you could probably crank up your batch size, maybe to 1000. How many pieces of information do you need to perform de-duplication? There may be a way to hash the row data and use that as your modified flag. In my experience, SQL is pretty good at de-duplication style queries if you can massage your problem to work that way.

There isn't any unusually large data. During previous testing 100 records seemed to be the sweet spot, although I think you are right, it's probably worth playing with that number again. Rows are typically around 10-20 columns, lots of dates and money columns, a few integers, a few varchars sometimes up to 500 in size. I don't think they are particularly large.

One of my biggest problems is I don't understand how to de-duplicate the data without querying the database and doing a 1 to 1 compare. I think that is what's killing performance, having to check hundreds of thousands of rows to see what's new/modified. But how can I do that without querying the db?

Essential fucked around with this message at 05:22 on Feb 21, 2017

NiceAaron
Oct 19, 2003

Devote your hearts to the cause~

Just spitballing here, but if you have a "last modified timestamp" column on each table that the local database updates when a row is created or updated, then the local database can ask the Azure database "what's the latest timestamp you have for this customer" which should hopefully be faster than querying all of the data, and then only sync data that was modified after that timestamp.

rarbatrol
Apr 17, 2011

Hurt//maim//kill.

Essential posted:

I believe what's breaking down is the load on the database. Inserting that many records is eating up tons of DTUs (the SQL Azure database measurement, database transaction units). The database is also being queried by hundreds of users at any given moment (via a Web API 2.0 REST service).

I can definitely set up a load test. Right now I'm being asked to add features and skimp on performance, so part of what I'm fighting is the usual crap of not being focused on the right thing. I'm trying to do all the optimizations after hours.


There isn't any unusually large data. During previous testing 100 records seemed to be the sweet spot, although I think you are right, it's probably worth playing with that number again. Rows are typically around 10-20 columns, lots of dates and money columns, a few integers, a few varchars sometimes up to 500 in size. I don't think they are particularly large.

One of my biggest problems is I don't understand how to de-duplicate the data without querying the database and doing a 1 to 1 compare. I think that is what's killing performance, having to check hundreds of thousands of rows to see what's new/modified. But how can I do that without querying the db?

Do you have some sort of global identifier column on this data? I don't have a lot of knowledge in the realm of DTUs or even what your data is like, but here's what I'm thinking:
1. Add some sort of entire-row hash value column
2. Bulk load rows (preferably pre-filtered by a last modified date - this is probably the quickest win) into a temp table
3. You can then compare existing identifiers and hash values
4. Same hash value -> no work, ignore them
5. Different hash value -> update-select them onto the old data
6. Data doesn't have an identifier -> either this is brand new and can be inserted, or you have to use the old expensive dedupe method probably?

Essential
Aug 14, 2003
First, thanks to everyone for help, I really appreciate it.

NiceAaron posted:

Just spitballing here, but if you have a "last modified timestamp" column on each table that the local database updates when a row is created or updated, then the local database can ask the Azure database "what's the latest timestamp you have for this customer" which should hopefully be faster than querying all of the data, and then only sync data that was modified after that timestamp.
The problem with this is that the local database doesn't have anything to indicate that the record has changed. The only way to know that a value in the local database has changed is to compare it against something. There is no concept in the local database of when a record was changed/modified. The local db can literally have a record with a column firstname; let's say a record has the value "bob". 15 minutes later someone could have changed that same record to "Fred" and there is nothing to indicate it has changed. In our Azure tables we do have created and modified columns; however, I'm still struggling to see how I can use that against the local data.

rarbatrol posted:

Do you have some sort of global identifier column on this data? I don't have a lot of knowledge in the realm of DTUs or even what your data is like, but here's what I'm thinking:
1. Add some sort of entire-row hash value column
2. Bulk load rows (preferably pre-filtered by a last modified date - this is probably the quickest win) into a temp table
3. You can then compare existing identifiers and hash values
4. Same hash value -> no work, ignore them
5. Different hash value -> update-select them onto the old data
6. Data doesn't have an identifier -> either this is brand new and can be inserted, or you have to use the old expensive dedupe method probably?
No, we currently do not have any global identifier for a row. I'm trying to work through your suggestion, but again I come back to not knowing which rows have been modified. In the examples I've given, let's say 20k rows, there is literally nothing in the local db that will show me that 1 row is updated but 19,999 are the same. No modified date, changed date, nothing. The record is just *boomp* different now. I think your hash suggestion is good and could be part of the solution. Perhaps that could just help when de-duping data in the cloud.

Unless there is something fundamental I don't understand (could very well be) about this whole process, I don't see how I can tell that a record was modified without there being something in the record to indicate that. This also feels like something that has to have been solved before (many, many times), but I'm really struggling with it. I keep coming back to keeping a local data store of sorts. One of the big issues, though, is I can't guarantee that the local store is the same as the Azure db.

VVVV Not to the local database, no. VVVV

Essential fucked around with this message at 07:45 on Feb 21, 2017

redleader
Aug 18, 2005

Engage according to operational parameters
Could you add a last modified column or similar?

Bognar
Aug 4, 2011

I am the queen of France
Hot Rope Guy

Essential posted:

VVVV Not to the local database, no. VVVV

That seems like a pretty big restriction around your only way to know when something was modified client-side. Why aren't you able to modify the local database? Can you store additional data locally that's not in the database?

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


NiceAaron posted:

Just spitballing here, but if you have a "last modified timestamp" column on each table that the local database updates when a row is created or updated, then the local database can ask the Azure database "what's the latest timestamp you have for this customer" which should hopefully be faster than querying all of the data, and then only sync data that was modified after that timestamp.

SQL Server does this for you. Look into Change Tracking and/or Change Data Capture. Clients periodically connecting and syncing data is literally what CT and CDC are designed to handle.

Essential
Aug 14, 2003

Bognar posted:

That seems like a pretty big restriction around your only way to know when something was modified client-side. Why aren't you able to modify the local database? Can you store additional data locally that's not in the database?
Yes, it's a very big restriction. I can't modify the local database because it's not ours and we have read-only access. We can store data locally however. I forgot to mention, I recently created a method for storing json data locally. As the data is sent up and successfully stored in our cloud db, it's then written to a local json file. The files are fairly small and can be deserialized and queried against. Right now I haven't done much more than just create the files though.

Nth Doctor posted:

SQL Server does this for you. Look into Change Tracking and/or Change Data Capture. Clients periodically connecting and syncing data is literally what CT and CDC are designed to handle.
SQL Azure supports Change Tracking, not Change Data Capture. The real problem, though, is that the local databases are not SQL Server. There are actually a few different servers being used, such as MySQL, ctree SQL, Sql Anywhere, and one other really obscure one I can't think of at the moment.

beuges
Jul 4, 2005
fluffy bunny butterfly broomstick

Essential posted:

Yes, it's a very big restriction. I can't modify the local database because it's not ours and we have read-only access. We can store data locally however. I forgot to mention, I recently created a method for storing json data locally. As the data is sent up and successfully stored in our cloud db, it's then written to a local json file. The files are fairly small and can be deserialized and queried against. Right now I haven't done much more than just create the files though.

SQL Azure supports Change Tracking, not Change Data Capture. The real problem, though, is that the local databases are not SQL Server. There are actually a few different servers being used, such as MySQL, ctree SQL, Sql Anywhere, and one other really obscure one I can't think of at the moment.

Could you read out all the records of the local database and generate a hash for each one, e.g. just concatenate all the columns together into a string and SHA256 it, then upload the row id/hash pairs to your cloud service? The cloud service could then do a lookup and return the list of row ids that have a hash mismatch, indicating some data has changed. The local server then uploads only those rows that have changed. When the cloud service receives the changed rows, it stores the data along with the hash, which you can then use on the next sync. If you index the client id/row id/hash in the cloud, then that should bring your DB workload down a great deal.
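
Something along these lines (the separator and hex formatting are just for illustration; watch out for culture-sensitive formatting of dates/decimals in real code):

code:
using System;
using System.Data;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class RowHasher
{
    // Hash a row by joining its columns with a separator that can't appear in
    // the data, then SHA256-ing the result.
    public static string HashRow(DataRow row)
    {
        string concatenated = string.Join("\u001F",
            row.ItemArray.Select(v => Convert.ToString(v)));

        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(concatenated));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}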

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Better yet, store the id/hash pairs locally after every upload, and check for changed hashes locally before you even start the sync process. 20k hashes is nothing in terms of storage.
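
e.g. something like this (file path and Newtonsoft.Json are assumptions, and the id/hash dictionary comes from whatever hashing you do on the rows):

code:
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;

public static class HashCache
{
    // Compare current row hashes against the hashes saved after the last upload
    // and return only the ids that are new or changed.
    public static List<string> FindChangedIds(Dictionary<string, string> currentHashes, string cacheFile)
    {
        var previous = File.Exists(cacheFile)
            ? JsonConvert.DeserializeObject<Dictionary<string, string>>(File.ReadAllText(cacheFile))
            : new Dictionary<string, string>();

        return currentHashes
            .Where(kv => !previous.ContainsKey(kv.Key) || previous[kv.Key] != kv.Value)
            .Select(kv => kv.Key)
            .ToList();
    }

    // After a successful upload, persist the latest hashes for the next sync.
    public static void Save(Dictionary<string, string> currentHashes, string cacheFile)
    {
        File.WriteAllText(cacheFile, JsonConvert.SerializeObject(currentHashes));
    }
}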


Essential
Aug 14, 2003

beuges posted:

Could you read out all the records of the local database and generate a hash for each one, e.g. just concatenate all the columns together into a string and SHA256 it, then upload the row id/hash pairs to your cloud service? The cloud service could then do a lookup and return the list of row ids that have a hash mismatch, indicating some data has changed. The local server then uploads only those rows that have changed. When the cloud service receives the changed rows, it stores the data along with the hash, which you can then use on the next sync. If you index the client id/row id/hash in the cloud, then that should bring your DB workload down a great deal.

NihilCredo posted:

Better yet, store the id/hash pairs locally after every upload, and check for changed hashes locally before you even start the sync process. 20k hashes is nothing in terms of storage.

Boom! That is a great idea, thanks! A local hash would be easy and fast!

  • Reply