Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
I'm interested in learning how to scale my web app beyond a single server. Ideally I'd like it to be hosted in multiple regions with GeoDNS taking you to the closest one. From what I can tell, the database replication portion of this is probably gonna be the most difficult part. It's tempting to use Amazon RDS, and I may end up going down that route, but I'd still like to learn how to do it myself. Is it feasible to have MySQL databases replicated across different regions? Do writes always go to the master? How bad does it get when something goes wrong? What's a good resource for finding answers to questions like these?

Adbot
ADBOT LOVES YOU

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

chmods please posted:

1) Once the database is built with SQLite why can't you query it locally instead of having to go out over the Internet?
2) Never ever ever ever ever ever ever ever expose an SQL server to the Internet. Build some kind of service (HTTP or otherwise) to act as a frontend if you really need to go over the internet.
3) I don't understand why you want to involve both SQLite and MySQL. Pick one.

I'd say pick SQLite, no need to mess with MySQL right now. 1 million rows is nothing, if you manage to get a slow query, then figure out how to optimize that specific one. Don't add indexes to everything, you'll be worse off.

fletcher fucked around with this message at 09:16 on Feb 25, 2014

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

dammitcharlie posted:

Sorry, maybe I don't understand quite how this all works, but I can't do it locally because I don't want the program to be so massive; I suppose it could be but my professor said that she'd prefer we keep the programs relatively small.

Your data shouldn't be stored with your code. Here are a couple options I would explore rather than trying to go client/server with MySQL:

* Can the data be generated by a small script so you can populate your database when needed?
* If it's some existing dataset and you can't generate it on the fly, can you just store a CSV of it on a thumbdrive/dropbox/ftp/whatever, and import into SQLite as needed?

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

dammitcharlie posted:

The data will be generated by scraping websites over the next few days. So I guess I should scrape the data into a CSV file and then store that CSV with the program which will load it into sqlite upon starting?

Yup I'd go with something just like that. Bonus points if you can simply specify a URL for the data source and if the data hasn't been downloaded before, it downloads it for you. That way the same steps for running it the first time can be used for running it a second time.

If you have time you might also want to explore having your scraper automatically compress the csv into a zip file, and then your program can uncompress it after it downloads it. I haven't used it before but it looks like Python has a nice interface for working with zip files.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

ManoliIsFat posted:

Ya, in SQL, you test for null with "[column] IS NULL"

Gets me every time. Why is it like that?

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Kumbamontu posted:

NULL isn't a number, it's the inherent lack of knowledge of a value. Therefore, the expression 'NULL =' anything really doesn't make sense, as you don't know if it's 0, a billion, -3.14, undefined, etc.

But in programming languages like Java you can do null comparisons with == :(

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Agrikk posted:

I have what I think amounts to a data warehousing question for SQL Server 2012 and I'm not sure how to approach it.

I have a table consisting of user data from Folding@home that I have been capturing once an hour for the last year.

Table UserData has the structure:
code:
TimeStamp             UserName  Score      WorkUnits  TeamNumber
2014-04-29 12:34:56   Agrikk    33219802   20824      33
So basically, UserData contains an hourly snapshot for each of 1.5 million Folding@Home users for the last year.

There is one primary key on (TimeStamp, UserName, TEamNumber).

What I'm trying to figure out is how to build the underlying data structures will be accessed by a PHP web server to produce a page like this one that produces production data by user.

If I create a summary table that contains a single data point per user per day, I still have hundreds of millions of rows of data that need to be parsed to create a graph, and I just don't understand how to build a database that can return a query at web-browsing speeds.

I mean, the referenced page returns a query in less than a second, and I just don't understand how that is possible.

Does it have to generate it in real time on demand? Maybe just generate the summary report periodically with a cron job and serve up the generated report.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
Something like this maybe?

code:
SELECT SUBSTRING('Stone Brewing Company', 1, LOCATE('brew', 'Stone Brewing Company')-1);

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
Trying to do everything with database queries seems like it will be pretty cumbersome. I would just write a little python program that uses a sqlite database or something to massage your data.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

Is this a good place to ask stupid questions about SQL queries to a Salesforce database?

Yea I think this would be an appropriate thread for that. I don't know nuthin bout no SQL in Salesforce though. Now, SOQL on the other hand... :D

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

Haha yes, thanks. I'm using Beatbox in Python to access the database.

So, I am trying to access the Account History. Say an Account was changed from one name to another, and both of those changes are recorded in the history with a timestamp. What query would I use to grab that information, or is that even possible?

Yup that's doable.

Make sure you turn on History Tracking for the Account object if you haven't already. Setup->Customize->Accounts->Fields then click Set History Tracking button and select which fields you want tracked.

Now you can execute a query like:
code:
SELECT AccountId, CreatedById, CreatedDate, Field, NewValue, OldValue FROM AccountHistory
If you wanna just explore the database and execute queries and such I recommend their Workbench utility: https://workbench.developerforce.com

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

edit: I'm reading through the SOQL help file, and what I'm really looking for is a listing of what all the field types are referring to. Like, how was I supposed to know that the Billing City is referred to as "billingCity" without seeing it in an example?

For that you have to use describe calls: https://www.salesforce.com/us/developer/docs/api/Content/sforce_api_calls_describesobjects.htm

It will return everything about that object: https://www.salesforce.com/us/developer/docs/api/Content/sforce_api_calls_describesobjects_describesobjectresult.htm#topic-title

In the array of fields that it returns you can see the label & name (Billing City vs. billingCity): https://www.salesforce.com/us/developer/docs/api/Content/sforce_api_calls_describesobjects_describesobjectresult.htm#i1427375

edit: or if you just want to look at it through the Salesforce UI go to Setup->Customize->Accounts->Fields and it will list all the Field Labels & Field Names

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

Can I combine two selections? Like for that example, say I wanted to select the account history, but only for accounts located in New York. How would I do this? My inclination is to do something like:

code:
SELECT AccountId, CreatedById, CreatedDate, Field, NewValue, OldValue FROM AccountHistory 
AND Name, Id FROM Account WHERE billingCity = 'New York City'
or, moreover, how would I select a specific account?

Yup, you can do that:

code:
SELECT AccountId, CreatedById, CreatedDate, Field, NewValue, OldValue FROM AccountHistory WHERE Account.Name = 'Some Account Name'
or

code:
SELECT AccountId, CreatedById, CreatedDate, Field, NewValue, OldValue FROM AccountHistory WHERE AccountId IN (SELECT Id FROM Account WHERE Name = 'Some Account Name')
I think the latter may perform better, I can't remember

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

fletcher, I'm hoping you can be my savior again. I'm not sure how good you are with Python, but you indicated that you use beatbox so I'm hoping you're the perfect person to help me.

I think I've figured out how to query the database thanks to you, but I'm having trouble storing the data. I have code that looks like this:

Python code:
list1 = []

sf = beatbox._tPartnerNS
svc = beatbox.Client()
svc.login("username","password")

query = svc.query("SELECT AccountId, CreatedById, CreatedDate, 
Field, NewValue, OldValue FROM AccountHistory WHERE Account.Industry 
= \'Government\' AND Account.Type = \'Prospect\'")

testdict = query['records']
total = query['size']
queryL = query['queryLocator']

while len(testdict) < int(str(total)):
	query = svc.queryMore(str(queryL))
	queryL = query['queryLocator']
	all_records = [', '.join(str(rec)) for rec in query[sf.records:]]
	list1.append(all_records)
As you can see, I want to just run through and append all of the 37,000 or so records that come up from that query. I originally tried handling them as dict files, but for some reason it was saving them all as instances so I couldn't append them. Now I'm just trying to append the information to a list, but it's currently coming out like this:

code:
In [16]: list1[1][1]
Out[16]: 'A, c, c, o, u, n, t, H, i, s, t, o, r, y, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 6, 
u, U, B, q, A, A, M, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 6, u, 7, g, P, A, A, Q, 2, 0, 1, 2,
 -, 0, 3, -, 2, 7, T, 1, 8, :, 2, 7, :, 0, 6, ., 0, 0, 0, Z, c, r, e, a, t, e, d'
I'm wondering if there's a better way to parse the data coming out of the query.

I've never used beatbox before, the bulk of my experience is using the Salesforce SOAP API via Java code.

It looks like you're actually using beatboxxx rather than beatbox? I couldn't get the test code to run with the latter.

What do you plan on doing with these records? Typically you don't want to load them all into memory at once, you process them in chunks and then use queryMore to get the next chunk from Salesforce. This keeps your memory footprint low.

At any rate, if you truly do need an all_records:

Python code:
query = svc.query("SELECT Id, Name FROM Account")

all_records = []

if query['size'] > 0:
    done = False
    while not done:
        all_records = all_records + query['records']
        if query['queryLocator']:
            query = svc.queryMore(query['queryLocator'])
        else:
            done = True

for record in all_records:
    print('%s, %s, %s' % (record.type, record.Id, record.Name))

fletcher fucked around with this message at 18:30 on Jun 17, 2014

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

I want to do data analysis with them.

What I'm specifically doing right now is trying to produce meaningful relationships between the Opportunity Lead Source and the Account History. I want to know whether or not the Lead Source has any affect on whether or not an Account goes from a Suspect to a Prospect.

So, what I wanted to do was grab all the Account Histories from the market I'm looking at, and then also grab the Opportunities from a specific lead source (just to start with). Then I'd cross-reference based on Account ID (which both have), and hopefully end up with one list of all the Account Histories that are from that specific lead source. Then I wanted to look at all the histories where it was originally a Suspect that went to a Prospect, and see the time difference between those two entries.

There's probably a much simpler way to do this, but I'm just learning for the first time. I'm guessing you know some way that I can do this with *just one SOQL query*, right?

Not sure if there is a simple way to do that. You may have to go further down the route you are currently on and query/process the data manually yourself. Just be sure to leverage the built in relationships as much as you can to avoid extra queries.

And to be honest, I only have experience with Salesforce from the developer perspective. When it comes to actually using it like an end user things get a little fuzzy, I've never done anything with those Lead & Opportunity objects.

Could you do something with the delta between the CreatedDate & ConvertedDate?

code:
SELECT CreatedDate, ConvertedDate, ConvertedAccount.Name FROM Lead WHERE LeadSource = 'Web' AND IsConverted = TRUE

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

I've got a LeadSource stored as "I'm an existing client", but when I try to query it it says that it's an invalid query. Is it stored somehow without the apostrophe?

I have it in python as \'I\'m an existing client\' to account for the apostrophe in the name.

edit: Specifically the error is this:

SoapFaultError: 'MALFORMED_QUERY' "MALFORMED_QUERY: Opportunity WHERE LeadSource = 'I'm an existing client' ^ ERROR at Row:1:Column:69 unexpected token: 'm'"

I think you need some more backslashes. You want the query to end up being:
code:
SELECT something FROM somewhere WHERE thingy = 'I\'m a real boy!'
I think something like this (this is output from the python console)
code:
>>> query = 'SELECT something FROM somewhere WHERE thingy = \'I\\\'m a real boy!\''
>>> query
"SELECT something FROM somewhere WHERE thingy = 'I\\'m a real boy!'"
>>> print query
SELECT something FROM somewhere WHERE thingy = 'I\'m a real boy!'

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

PhantomOfTheCopier posted:

Wait, you're using Python and passing a literal string into a database? Have you heard of SQL injection? There must be seventy thousand modules for data handling in Python by now.

It's Salesforce, you pretty much have to use string literals. At any rate, SOQL is read only. You can't CREATE/UPDATE/DELETE, those use a separate API. And there's no risk of querying for data you don't have access to, that is all handled by the Salesforce user security model.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

How do I format something like this:

Select Name FROM List WHERE Gender = 'Man' AND Color = 'Blue' OR Color = 'Red' OR Color = 'Yellow'


I want it to choose the first option AND one of the others, how do I make sure that happens?

Also, is there any way I can write the second condition where I don't have to write each COLOR every time?

Kumbamontu's response is the better solution, but I just wanted to point out that you weren't too far off. Just need to add some parentheses:

code:
Select Name FROM List WHERE Gender = 'Man' AND (Color = 'Blue' OR Color = 'Red' OR Color = 'Yellow')

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

You can view my history, but I'm doing queries in Salesforce using Python. The SQL queries are basically no different, but they're embedded in Python code.

I'm grabbing one list with the following:

Python code:
opp_query = "SELECT AccountId, Id, StageName FROM Opportunity 
WHERE RecordType.Name = \'FD - Buying\'"
query_opps = svc.query(opp_query)
Then doing some stuff to it, and then doing a second query with:

Python code:
for row in filtered_opps:
	ocr_query = "SELECT ContactId FROM OpportunityContactRole WHERE IsPrimary = TRUE 
AND OpportunityId = " + '\'' + row[1] + '\''
	query = svc.query(ocr_query)
Where you can see here, I'm making about 20,000 individual queries (the size of filtered_opps).

Use a subquery (assuming you can perfrom 'some stuff' where additional SOQL)

code:
ocr_query = "SELECT ContactId FROM OpportunityContactRole WHERE IsPrimary = TRUE and OpportunityId IN (SELECT Id FROM Opportunity WHERE RecordType.Name = 'FD - Buying'"
Or just build a list of OpportunityIds and query for them all at once like this (note that you will need to quote & comma your opportunity_ids here, I've omitted that part). Also make sure you split this up into multiple queries if necessary due to restrictions on how long your SOQL statement can be (20k characters I think?)

Python code:
opportunity_ids = []
# do stuff to retrieve opportunity_ids
ocr_query = "SELECT ContactId FROM OpportunityContactRole WHERE IsPrimary = TRUE and OpportunityId IN (" + opportunity_ids +")"

the posted:

fletcher would know how to do this.

According to that page, I can link by simply going Blah.blah, but I'm getting errors when doing it.

For example, every Opportunity has an OpportunityContactRole associated with it, but I get errors when trying to reference that:

opp_query = "SELECT AccountId, Id, Stagename, OpportunityContactRole.ContactId FROM Opportunity WHERE RecordType.Name = \'FD - Buying\'"

Error:

SoapFaultError: 'INVALID_FIELD' "INVALID_FIELD: SELECT AccountId, Id, Stagename, OpportunityContactRole.ContactId ^ ERROR at Row:1:Column:34 Didn't understand relationship 'OpportunityContactRole' in field path. If you are attempting to use a custom relationship, be sure to append the '__r' after the custom relationship name. Please reference your WSDL or the describe call for the appropriate names."

Maybe flip it around and query for Opportunity fields through OpportunityContactRole:

Python code:
opp_query = "SELECT Opportunity.AccountId, Opportunity.Id, Opportunity.StageName, ContactId FROM OpportunityContactRole WHERE Opportunity.RecordType.Name = 'FD - Buying'"
(I think that should work, I don't have RecordTypes on my Opportunity object to try it)

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Essential posted:

How important is it to have a CreatedDate & ModifiedDate columns on tables? I've got a guy here telling me that EVERY table, without exception should have those 2 columns. We're adding a couple tables and right now I don't care about CreatedDate/ModifiedDate on a few of the tables and I can't ever see it being needed on those tables. His argument is that what IF at some point in the future I want it and even if I don't, it isn't going to affect anything negatively.

I can see his point but I'd rather not add columns if I don't truly need them.

Personally, I like having audit fields on all of my tables (created date, created by, modified date, modified by), even if I don't foresee using that information (and there's been a few times where I'm glad I had it).

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

In Salesforce, I have an Opportunity entity that has many different custom fields associated with it. What I'm finding out is that many of the custom fields are throwing errors when I make SOQL requests to them. But some aren't. And I can't figure out what is the difference, since they're all custom fields.

For example, the one I want to grab is called "Training_Contact__c." When I make a request:

code:
SELECT ContactId, Opportunity.Training_Contact__c, Opportunity.Name, IsPrimary 
FROM OpportunityContactRole WHERE Opportunity.RecordType.Name = \'FD - Buying\' 
AND Opportunity.IsClosed = TRUE AND Opportunity.CreatedDate >= 2012-05-01T00:00:00Z
I get the following error:

code:
SoapFaultError: 'INVALID_FIELD' "INVALID_FIELD: SELECT ContactId, 
Opportunity.Training_Contact__c ^ ERROR at Row:1:Column:19 No such column 
'Training_Contact__c' on entity 'Opportunity'. If you are attempting to use 
a custom field, be sure to append the '__c' after the custom field name. 
Please reference your WSDL or the describe call for the appropriate names."
But using the custom field "Time_zone__c" works fine. But then "Renewal_Date__c" doesn't. But then "Range__c" does. I can't figure out what's the difference, since they're all custom fields under the Opportunity entity.

Sounds like a field level security issue. Go to Setup->Manage Users->Users. Find the user you are executing the queries as and click on the name of the Profile. Then scroll down to the "Field-Level Security" section.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

Yep, that ended up being it! Turns out that literally the only group not given access to that field, out of the two dozen or so groups, was mine :negative:

I've been there my friend!

In general, how is your Salesforce stuff going? Making progress on whatever it is you are building?

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

Yeah, the main issue I've been running into is twofold:

1. Making sure that the Python queries I make are accurate (i.e. making the same query in Salesforce to prove it)

2. Showing why using Python is better than running Salesforce reports (which #1 tends to sometimes go against).

We also ran into limitations of what we could do with Salesforce reports and ended up going down the route of Java + SOAP API + MySQL to accomplish what we needed to do. Salesforce reports are nice but sometimes you need to break out of them for more flexibility.

the posted:

fletcher, while you're here, I'm trying to query a list of Opportunities and find out whether or not they have a Primary Contact. I don't see that there's a specific field for that. It looks like I can query OpportunityContactRole to see if a contact is a Primary Contact, but the problem there is that there are some Opportunities that don't have Contacts at all, so I'd have to grab a list of all Opportunities, then a list of all OpportunityContactRoles, and then filter out what doesn't match.

Searching around it looks like this is a frequenty requested feature.

Sounds like you might be able to use a subquery to accomplish this? Something like:

code:
SELECT AccountId, (SELECT IsPrimary FROM OpportunityContactRoles) FROM Opportunity
Not sure if you've used that relationship name before, but here's where you can find it in workbench:

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
I think that query handles A, B, and C though. For case A, the subquery would simply have a row count of 0.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Malcolm XML posted:

Left join that poo poo son

This is Salesforce, there is no JOIN keyword. The subquery version that I posted should be the equivalent of a LEFT OUTER JOIN

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

fletcher, do you know why an admin would get an "Insufficient Privileges" message when trying to access any object page in Salesforce? I would assume they'd have access to everything.

I tried to access a page for an OpportunityContactRole object by going to the salesforce.com page, and it's popping that error. I sent the page "up the chain" to see if the super god admin guy at my company could access it, and he can't either.

edit: I guess those aren't considered "objects" but just relationships, so they don't have a page? The error message is misleading, though.

Are you referring to the standard list view for an object that uses the 3 letter object key prefix? (i.e. https://zzz.salesforce.com/001 for Accounts)

That list view is not supported for many objects, I think it might be the "layoutable" property that determines that, can't remember for sure though:

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

EAT THE EGGS RICOLA posted:

I have a postgres database that I need to replicate across a dozen servers.

Except half of the servers are airgapped from the other, no direct connections permitted. I can automate a file transfer but direct db connections will never be permitted.

Uhh... any suggestions?

Use a VPN or tunnel over SSH. Definitely don't open up the database port to the world!

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
Why am I not getting a permission denied error here? I already did a FLUSH PRIVILEGES from the root mysql account before trying this:

code:
mysql> SHOW GRANTS;
+---------------------------------------------------------------------------------+
| Grants for test_user@localhost                                                  |
+---------------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'test_user'@'localhost'                                   |
| GRANT SELECT, INSERT ON `test_database`.`test_table` TO 'test_user'@'localhost' |
+---------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> LOAD DATA LOCAL INFILE '//tmp//test_data.csv' INTO TABLE test_database.test_table;
Query OK, 8 rows affected (0.01 sec)
Records: 8  Deleted: 0  Skipped: 0  Warnings: 0
I thought it would fail because I don't have the FILE permission

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Anaxandrides posted:

FILE permissions are for files on the server locally -- remote file access is based on rights of the server proxy user to the remote machine.

This was with mysqld running on my localhost and /tmp/test_data.csv is also on my local machine, shouldn't the FILE permission cover that?

Adbot
ADBOT LOVES YOU

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Anaxandrides posted:

Why did you use double backslashes, then? I can't say on that one -- is there a bug on MySQL that causes it to see this as a remote server?

Hmmm can't remember why I had the double backslashes, it seems to work with just singles though. Still can't get it to throw a permissions error though.

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply