Acidian
Nov 24, 2006

I have an actual programming question for once. I just started using PHP with MySQL, so this will be a stupid newbie question.

I am after the number of rows in a table in my database. I found a way to do it using COUNT(*), but I have a feeling it could be done more simply?

code:
$query = "SELECT COUNT(*) FROM table"; 
$process_query = mysqli_query($db, $sql); //$db is my predefined database connection
$fetch_array = mysqli_fetch_array($process_query); //I can't figure out to get the value without fetching it as an array
$table_count=$fetch_array[0];
echo $table_count;
Does MySQL not store the table size anywhere? If this were a large database, I assume it would have to actually count every row every time the query is run?

Edit: Using mysqli since I haven't done any OOP in PHP yet, so I have no idea how classes work (I will be getting there in a later course).

McGlockenshire
Dec 16, 2005

GOLLOCKS!
While it's possible to get the size of a table by poking at INFORMATION_SCHEMA, just doing COUNT() is the accepted way. Some databases, including MySQL, will optimize this away to a simple information lookup. Don't worry about performance until you have actual performance problems and can measure them. Let the query planner and carefully created indexes do the work for you.

I'd really, really encourage you to not start with mysqli. The way it does prepared statements is weirdly restrictive and makes doing queries with dynamic argument sizes a real pain. This might not seem like a problem now, but you're going to end up dynamically assembling SQL very often. Also, the way it has a separate mechanism for statement handles (a normal thing) and result sets (a rare thing) can be handy if you're doing advanced work, but it's major overkill for people new to things and it can get very confusing when you have one but think you have the other.

Instead, I suggest learning PDO. Don't worry about how objects work in PHP. Just think of them as arrays that you can also call functions on for now, and you'll be close enough to the truth. It's safe enough to use them like this while learning the language.

A critical difference between PDO and mysqli is that if you ever want a result back from PDO, you must go through a path of preparing the statement, binding arguments to the statement, executing it, and then getting the results back. There are many reasons to use prepared statements, but avoiding SQL injection attacks is the big one. Sometimes this can feel like overkill, especially when there's nothing to bind, but it's an incredibly valuable habit to get into as you learn.

Here's how I'd write that code using PDO, where $db is a connected PDO object.
php:
<?php
$row_count = 0;
$statement = $db->prepare('SELECT COUNT(*) FROM table');
$success = $statement->execute();
if($success) {
    $row_count = $statement->fetch(\PDO::FETCH_NUM)[0]; // more on FETCH_NUM below.
}

Now, after preaching about PDO, I need to give you the caveats. First, for some incredibly stupid reason (that reason being that PHP is a garbage language), it doesn't raise errors by default. This is easy to fix but the default being wrong is annoying. Many methods will return a success boolean, and you should get in the habit of always checking for success and taking appropriate actions on failure.

Second, the default setting of the fetch() method returns the next row in the result as an array with both numeric indexes and named indexes, making it unsafe to iterate over. In the code here, I've manually asked fetch() to return only an array with numeric indexes. Nine times out of ten, you want your rows back with only named indexes (\PDO::FETCH_ASSOC). You can set the default fetch mode using the same attribute setting mechanism that lets you fix the default error handling.
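
For example, here's the attribute tweaking I mean, run right after connecting (a minimal sketch, assuming $db is the connected PDO object from before):

php:
<?php
// Make PDO throw exceptions on failure instead of silently returning false.
$db->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);

// Make fetch() return rows with named indexes only, unless told otherwise.
$db->setAttribute(\PDO::ATTR_DEFAULT_FETCH_MODE, \PDO::FETCH_ASSOC);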

Third, while PDO can speak to a variety of RDBMSes, merely using PDO will not let you move your code between different databases. Each database has syntax and behavioral quirks that will get in the way. For example, there's the rowCount method to be used on statements that don't include a result set, but it's useless on SQLite because the underlying API does not expose row count information for things that aren't result sets.

McGlockenshire fucked around with this message at 18:30 on Mar 30, 2018

Acidian
Nov 24, 2006

Wow, that's a lot of help, thanks! I will just stick with COUNT() for getting the table size then.

Ditching mysqli now and starting with PDO is a bit demoralizing; it took me long enough to get that figured out. I'll see what I do, maybe I will wait until I know more about how classes work, although I could probably figure it out with a lot of help from Google.

I will have to read through the index post you linked, as I barely understand how indexing works (other than knowing I should be indexing my tables). I will be using a lot of MySQL moving forward, so the more I learn the better.

I don't know what SQL injection is yet. I assume it's similar to injecting JavaScript or HTML into a $_GET request, but the course I am on will cover it, and how to defend against it, in the next chapter.

McGlockenshire
Dec 16, 2005

GOLLOCKS!

Acidian posted:

I don't know what SQL injection is yet. I assume it's similar to injecting JavaScript or HTML into a $_GET request

Same idea, yes - the injection of active stuff where there should be no active stuff. The injection of user-provided JavaScript (and HTML!) into output is called cross-site scripting, abbreviated XSS to prevent collision with CSS. Also, to be very clear, XSS is not just a thing in $_GET; it's a problem of missing input validation, input filtering & sanitization, and output escaping.

Like most widespread security problems, both SQL Injection and XSS require mindfulness in multiple places in your code. You need to be aware of these issues so you can apply the right techniques. I made a big old effortpost on SO eight years ago that you might find useful.

Peggle Fever
Sep 21, 2005

shake it

This is some great stuff. If I can make a suggestion, if you haven't already: look into a container-based system like Docker. You can compose a local development environment that's easily adaptable and mirrors your production environments. It will save you and your team a good deal of legwork.

There are a lot of open source Dockerfiles that you can derive from - https://github.com/webdevops/Dockerfile is a good place to start.

Setting up a Docker Compose file would give you the flexibility to mount your codebases into the containers, and have all your components containerized - https://docs.docker.com/compose/overview/

I've had headaches getting different developers all on the same page with different OS's, tools, and locally installed programs and services, and this was a good solution for me in past PHP development projects.

bigmandan
Sep 11, 2001

lol internet
College Slice

Acidian posted:

Wow, that's a lot of help, thanks! I will just stick with COUNT() for getting the table size then.

Ditching mysqli now and starting with PDO is a bit demoralizing; it took me long enough to get that figured out. I'll see what I do, maybe I will wait until I know more about how classes work, although I could probably figure it out with a lot of help from Google.

I will have to read through the index post you linked, as I barely understand how indexing works (other than knowing I should be indexing my tables). I will be using a lot of MySQL moving forward, so the more I learn the better.

I don't know what SQL injection is yet. I assume it's similar to injecting JavaScript or HTML into a $_GET request, but the course I am on will cover it, and how to defend against it, in the next chapter.

You'll be thankful for switching over to PDO. Stick with it, and you'll get there. If you need some general tutorials, check out https://laracasts.com/ . While it was initially focused on Laravel, it has expanded into a great PHP/webdev resource. There is a lesson on PDO too: https://laracasts.com/series/php-for-beginners/episodes/13

Sab669
Sep 24, 2009

Edit: I'm a loving idiot :suicide:

Sab669 fucked around with this message at 15:22 on Apr 6, 2018

Acidian
Nov 24, 2006

Took me some time to figure out, but using PDO now.

I have a problem now, and I think I might be a little screwed.

I have a while loop that connects to a website, and because I don't want to kill the bandwidth of the website or the server, I want a 1-second delay between each iteration of the loop.

Loop:
1. Connect to website
2. Download page.
3. Parse page
4. Insert relevant data into SQL
5. Output progress to browser.
6. Repeat

The problem is the automatic timeout. The script might end up doing 1000 loops in total, and I can't have it time out while it's working. I thought that if I put an echo statement as a progress tracker in the loop, then the loop would be constantly outputting something on each iteration. However, the script wants to complete before it outputs anything to the browser, so it times out.

Is the only solution to extend the max execution time setting in php.ini? If I want the loop to break and output something between each iteration, will I need to use AJAX (which I don't know yet; I don't know any JavaScript yet)?

-JS-
Jun 1, 2004

You can increase the timeout. To show output whilst the script is running, use output buffering.

But to be honest you're probably looking at fixing the wrong problem.

An AJAX approach as you mention would probably do the job (but you get the complexities of what happens if you close the browser midway through), but really your parse job should run on the server as a background process and report its status to the page instead.
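
If you do go the quick-and-dirty route, the flushing looks roughly like this (a sketch only; whether output actually reaches the browser straight away still depends on your nginx/fastcgi buffering settings):

php:
<?php
set_time_limit(0);           // let the loop run past the default limit
while (ob_get_level() > 0) { // close any output buffers the config opened
    ob_end_flush();
}
ob_implicit_flush(true);     // flush automatically after every echo

for ($i = 1; $i <= 1000; $i++) {
    // ... fetch, parse and insert one page here ...
    echo "Processed page $i<br>\n";
    flush();                 // push the output towards the web server
    sleep(1);                // the 1-second delay between requests
}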

Acidian
Nov 24, 2006

-JS- posted:

but really your parse job should run on the server as a background process and report its status to the page instead.

How would I get it to run as a background process?

I will look into output buffering, seems like a good workaround.

Edit: How would I get it to run as a background process AND report its status to the page*

I am looking at using the exec command and setting up a cron job, but that seemed like a lot of work for something that should be simple. I tried using output buffering, but it's not working. The browser doesn't output anything until the page is fully loaded (and I think the browser itself will time out after 5 minutes, so I would have to limit the iterations of the loop to <300). Not sure if that is a quirk with nginx or Chrome.

Edit 2: Could I maybe set up a page redirect every 100 iterations of the loop, that refers to itself, and every time the page is loaded it checks the database for the first listing that has not been updated and does the next 100 iterations from there until all entries in the database are updated?

Acidian fucked around with this message at 16:54 on Apr 10, 2018

nielsm
Jun 1, 2009



The really proper way would be to have a separate service/daemon running on the server, which accepts jobs posted via a message queue, or maybe just inserted into a database table. How to actually set up something like that depends wholly on the hosting environment, and most cheap webhosts won't support this scenario at all. (Cheap webhosts will also kill any long running processes spawned by you, so an exec() solution would likely break the same way.)
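
As a very rough idea of the database-table variant, the worker side can be as simple as this (a sketch only; the jobs table and its columns are made up, and $dsn/$user/$password are your real credentials):

php:
<?php
// worker.php - run from the command line or under a process supervisor.
$db = new PDO($dsn, $user, $password);
$db->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);

while (true) {
    $statement = $db->query("SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1");
    $job = $statement->fetch(\PDO::FETCH_ASSOC);

    if ($job === false) {
        sleep(5);   // nothing queued, wait before polling again
        continue;
    }

    // ... do the long-running work described by $job['payload'] here ...

    $done = $db->prepare("UPDATE jobs SET status = 'done' WHERE id = ?");
    $done->execute([$job['id']]);
}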

Acidian
Nov 24, 2006

nielsm posted:

The really proper way would be to have a separate service/daemon running on the server, which accepts jobs posted via a message queue, or maybe just inserted into a database table. How to actually set up something like that depends wholly on the hosting environment, and most cheap webhosts won't support this scenario at all. (Cheap webhosts will also kill any long running processes spawned by you, so an exec() solution would likely break the same way.)

I am currently planning on running my server on AWS. I have an Ubuntu EC2 set up, and a MariaDB RDS, but I haven't figured out how to get the two to talk to each other yet (basically, I set up a security rule and a VPC, but I need to figure out the internal IP from the EC2 to the RDS, so I know what to put in my PDO credentials).

Anyway, I am not very experienced with Linux, and I don't expect someone to sit down and write a five-page explanation of how to set up a daemon like that and how to post the job requests via PHP. However, if you have any guides you can link me, that would be great.

When it comes to cost, AWS is free to a point. I think they will charge me if I start using the CPU overmuch, but for the limited use I have right now I don't think it will be a problem.

Edit: I think I figured it out. I might have some more specific questions later, but thanks for the help so far!

Acidian fucked around with this message at 08:46 on Apr 13, 2018

Acidian
Nov 24, 2006

Question about mysql

If a PHP script is doing several calls on MariaDB over a longer period, and another script starts running that is also doing database calls on the same database and table, or if I have users doing stuff that calls information from the database, how is all that handled? Does the database queue the requests, and will my scripts time out if they don't get an immediate response?

McGlockenshire
Dec 16, 2005

GOLLOCKS!
The answer depends on what your queries are doing. If it's just reads, then you don't even need to worry about it at all, assuming MySQL is configured correctly for the data size and the hardware. There's a lot of caching involved at multiple levels, including an actual cache of queries and their result sets. The more you're writing, the more likely you are to start encountering performance problems when doing actions in parallel.

If you think you're having a performance problem, measure it before taking action by gathering performance metrics. The reason that premature optimization is the root of all evil is that it can lead you to make decisions that are unnecessary or that make things worse.

Can you explain your concerns more? The reason I've responded this way is that your questions lead me to believe this is the XY problem in action.

McGlockenshire fucked around with this message at 21:07 on Apr 21, 2018

Ranzear
Jul 25, 2013

Note first that all of this only matters when writes are involved. If you're just doing simple reads and nothing is logically dependent on them, you can just read willy-nilly.

So first some boilerplate:
Every time you have a PHP script open a database handler, like $dbh = new PDO($dsn, $user, $password);, it opens its own connection to the db. Every one of those is separate and parallel. Think of each as a session as well. You should not be opening a new db handler for every statement, nor should you open a new one for every include either. One script invocation (be it a web hit or CLI) should have one database handler. With all that correct, you won't have race conditions within a single script. The lone db handler will do everything in order.

Now, within parallel scripts, the database's state changes fairly instantly on any given write. One script can sneak a read in between two writes from a different script, or sneak a write in that would change the logic of another script depending on when it read that entry. What this means, though, is that you can't have a script read out a bunch of data at the start and expect it to stay the same for the entire duration of the script. It can still be written to by other scripts. Similarly, subsequent writes can have invalid states in between that can be read by parallel scripts.

To prevent this, first learn what A.C.I.D. means. Atomicity is the main focus, but the whole script doesn't need to be atomic, which seems to be your other concern (mostly about blocking).

So there are two main things to look into:

First is when you need to write more than one thing in separate statements or otherwise need an 'all or nothing' bunch of things to happen to the db, so that you can't for instance deduct currency but not give the purchased item. These are transactions. Doing statements outside of transactions means they 'auto commit', meaning execute immediately, but opening a transaction saves committing until you say when, and everything happens then and all at once. The db does not show any changes at all until you commit, or you can throw away the changes mid process with rollback. Also, you must commit changes; your script ending automatically calls rollback too.

Side note: If you wrap stuff in a try block, your catch can check if you're in a transaction and call a rollback, meaning any thrown exception will toss out any changes to the db. Breaking out and never hitting a commit achieves this too, but it's good to be sure.
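
A bare-bones PDO version of that transaction-plus-rollback pattern looks like this (a sketch only; the table and column names are made up):

php:
<?php
try {
    $dbh->beginTransaction();

    // Both writes succeed together or not at all.
    $dbh->prepare('UPDATE accounts SET balance = balance - ? WHERE id = ?')
        ->execute([$price, $buyerId]);
    $dbh->prepare('INSERT INTO purchases (account_id, item_id) VALUES (?, ?)')
        ->execute([$buyerId, $itemId]);

    $dbh->commit();
} catch (\Exception $e) {
    if ($dbh->inTransaction()) {
        $dbh->rollBack();   // throw away the partial changes
    }
    throw $e;               // or log it, whatever the script needs
}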

Second comes up if you have logic dependent on a read, like if a value in the database being 1 means set it to 0 or being 0 means set it to 1. If you read 1, something else may have set it to 0 before you manage to write 0. You need to invoke row locking, meaning nothing else can interact with that row until you commit, roll back, or your script ends. You do this with SELECT ... FOR UPDATE but only when you have a transaction active (ending the transaction unlocks the row, I think autocommit is triggered on reads, but locking is only useful in transactions anyway). This signals to MariaDB/MySQL that 'nobody can gently caress with this row until I'm done with it'. You can now read the 1 and be sure it's still 1 when you write 0, and anyone else trying to read from that row will wait until you're done and always read your 0. You can and should do even just a dummy SELECT FOR UPDATE on any row you might write to during your transaction so that everything is locked while your script figures out what it wants to write.
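
Rough sketch of that read-then-write dance with the row lock (again, made-up table and column names):

php:
<?php
$dbh->beginTransaction();

// Lock the row; nobody else can touch it until we commit or roll back.
$select = $dbh->prepare('SELECT flag FROM settings WHERE id = ? FOR UPDATE');
$select->execute([$settingId]);
$flag = (int) $select->fetchColumn();

// The value we just read is guaranteed to still be there when we write.
$update = $dbh->prepare('UPDATE settings SET flag = ? WHERE id = ?');
$update->execute([$flag === 1 ? 0 : 1, $settingId]);

$dbh->commit();   // committing releases the row lock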

You want to keep your transactions tightly wrapped to what needs them, because SELECT FOR UPDATE will cause other scripts to block and hang if they try to read or write that row, but in a merciful world you should be using InnoDB meaning it's a row lock and not a whole table lock. Basically, start your transaction right before you do any important reads, and end it after you've done all important writes. Don't wrap a whole slow-rear end script in a single transaction unless the reads at the start are truly logically bound to the writes at the end.

Ranzear fucked around with this message at 22:04 on Apr 21, 2018

Acidian
Nov 24, 2006

McGlockenshire posted:

Can you explain your concerns more? The reason I've responded this way is that your questions lead me to believe this is the XY problem in action.

At this point, the question was mostly academic in nature. I am not having any performance problems, but I started thinking about it in connection with my earlier question. If a user triggers a long-running script from my website (read this long rear end XML file, write to my DB, check all lines with the distributor DB, update/write to my DB), then even the same user might start interacting with the database with other scripts while this script is running (or if I have a script running as a cron job).

Ranzear posted:

Note first that all of this only matters when writes are involved. If you're just doing simple reads and nothing is logically dependent on them, you can just read willy-nilly.

Thank you both. I will keep this in mind when I am writing scripts interacting with my DB. If it's important to lock the row before read and write, then I will make sure to do that. Thanks for explaining how to set that up. It hadn't occurred to me that values in the DB that my logical operations depend on could be changed by other scripts while the main script is running.

Also, I usually open a DB connection at the start of any script and end it when the script ends. For my web pages I open the connection in the header and close it in the footer, which are shared between all my webpages, even the ones that might not even do anything with the DB.

Since operations on the DB are cached, there are also a lot of (most) interactions I don't need to worry about.

McGlockenshire
Dec 16, 2005

GOLLOCKS!

Acidian posted:

(read this long rear end XML file, write to my DB, check all lines with the distributor DB, update/write to my DB), then even the same user might start interacting with the database with other scripts while this script is running (or if I have a script running as a cron job).

Do all of that writing inside a single transaction. While the XML is processing and being loaded into the database, other readers on the table won't know unless you fiddle with the isolation levels. Isolation is the "I" in ACID, noted in that great post above. If you don't wrap the writes in a transaction, or wrap each write in its own transaction, other readers will end up seeing the data before you might be ready for it.

Batching multiple writes or doing large single writes in transactions is also better for performance.
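
Roughly, something like this for the XML import (a sketch only; the file, element, and column names are guesses):

php:
<?php
$xml = simplexml_load_file('products.xml'); // placeholder file name

$db->beginTransaction();
$insert = $db->prepare('INSERT INTO products (sku, name, price) VALUES (?, ?, ?)');

foreach ($xml->product as $product) {
    // One prepared statement executed many times inside one transaction.
    $insert->execute([
        (string) $product->sku,
        (string) $product->name,
        (float)  $product->price,
    ]);
}

$db->commit(); // other readers only see the new rows from this point on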

quote:

Also, I usually open a DB connection at the start of any script and end it when the script ends. For my web pages I open the connection in the header and close it in the footer, which are shared between all my webpages, even the ones that might not even do anything with the DB.

You don't really need to expressly close your connection. When PHP is done executing, all networking connections are closed, unless you're using persistent connections. Don't use persistent connections.

Also, and I'm sure you already know this or have figured it out, but it's bad practice to mix code that generates output with code that performs important program operations. You don't need to stop what you're doing and rewrite the world or anything, just know that intermixing business logic with HTML generation tends to lead to shooting yourself in the foot, repeatedly, forever, with a minigun. Of nukes. Unfortunately doing things "correctly" requires a tremendous amount of learning and also introduces all kinds of other complexities. You can start by thinking about moving all of the HTML generation out into external files. Just using includes is fine.
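
As a tiny sketch of the separation I mean (the file names are just examples): do the work first, then hand plain variables to a template include.

php:
<?php
// page.php - business logic only, no HTML in here
$statement = $db->query('SELECT name, price FROM products'); // example table
$products  = $statement->fetchAll(\PDO::FETCH_ASSOC);

include 'templates/product_list.php'; // example template file, shown below

php:
<!-- templates/product_list.php - presentation only -->
<ul>
<?php foreach ($products as $product): ?>
    <li><?= htmlspecialchars($product['name']) ?>: <?= htmlspecialchars($product['price']) ?></li>
<?php endforeach; ?>
</ul>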

Acidian
Nov 24, 2006

McGlockenshire posted:

Do all of that writing inside a single transaction. While the XML is processing and being loaded into the database, other readers on the table won't know unless you fiddle with the isolation levels. Isolation is the "I" in ACID, noted in that great post above. If you don't wrap the writes in a transaction, or wrap each write in its own transaction, other readers will end up seeing the data before you might be ready for it.

Batching multiple writes or doing large single writes in transactions is also better for performance.


You don't really need to expressly close your connection. When PHP is done executing, all networking connections are closed, unless you're using persistent connections. Don't use persistent connections.

Also, and I'm sure you already know this or have figured it out, but it's bad practice to mix code that generates output with code that performs important program operations. You don't need to stop what you're doing and rewrite the world or anything, just know that intermixing business logic with HTML generation tends to lead to shooting yourself in the foot, repeatedly, forever, with a minigun. Of nukes. Unfortunately doing things "correctly" requires a tremendous amount of learning and also introduces all kinds of other complexities. You can start by thinking about moving all of the HTML generation out into external files. Just using includes is fine.

If I wanted to write it all as a single transaction, would I just read the whole XML sheet into a huge prepare() statement?

I see the default for MariaDB/InnoDB is repeatable read. I am having some issues wrapping my head around it all, but I think I am OK as long as I am writing complete rows to the end of the table and any scripts reading from the DB will only be working on completed rows. However, I will be doing logic operations later on, where it will be important to lock the row while the script reads the row and then updates it.

I know I don't need to close the connection, but I have it set up in my footer anyway, I guess it's an unnecessary precaution.

So far I have not been writing or generating much HTML. Everything from <html> to <body> is included in a header.php file, which is included in an include.php file that starts all my HTML documents. Then I have everything from </body> to </html> in a footer.php, which also closes the DB connection. Mostly I have only been generating tables to check that my code is working properly and making some basic pages for inputting files that will be imported to the DB. I am saving all the HTML/CSS for last, as I want to do some video courses on them (my HTML is fine, but my CSS is really bad), and I want to do a course on JavaScript and AJAX as well. I want to learn PHP and SQL as well as I can, so I am focusing on everything server-side for now, and trying to solve it there when possible.

I tried setting up my server to use HTTPS/SSL, but since I am signing the certificates myself, the browser kept warning me the site might be unsafe, which is super annoying. It seems I need to pay for the browsers to recognize my certificate. Is there any way around that? Itskage mentioned earlier that I should be using SSL on my website.

Acidian fucked around with this message at 21:11 on Apr 22, 2018

BallerBallerDillz
Jun 11, 2009

Cock, Rules, Everything, Around, Me
Scratchmo
You can pay someone for a certificate or you can use Let's Encrypt for free. I'd recommend LE unless you need something more exotic like a wildcard cert or extended validation.

putin is a cunt
Apr 5, 2007

BOY DO I SURE ENJOY TRASH. THERE'S NOTHING MORE I LOVE THAN TO SIT DOWN IN FRONT OF THE BIG SCREEN AND EAT A BIIIIG STEAMY BOWL OF SHIT. WARNER BROS CAN COME OVER TO MY HOUSE AND ASSFUCK MY MOM WHILE I WATCH AND I WOULD CERTIFY IT FRESH, NO QUESTION
Pretty sure SSL certs are free on AWS, use that since you're already planning on hosting there.

jiggerypokery
Feb 1, 2012

...But I could hardly wait six months with a red hot jape like that under me belt.

Yup. AWS does free wildcard certs even. The setup is way, way easier than other ways of doing it, especially if you use Route 53, and why wouldn't you?

bigmandan
Sep 11, 2001

lol internet
College Slice

96 Port Hub posted:

You can pay someone for a certificate or you can use Let's Encrypt for free. I'd recommend LE unless you need something more exotic like a wildcard cert or extended validation.

Let's Encrypt can do wildcards now too. https://community.letsencrypt.org/t/acme-v2-and-wildcard-certificate-support-is-live/55579

Acidian
Nov 24, 2006

Alright, thank you guys, so much help, you are amazing!

Acidian
Nov 24, 2006

I am trying to convert array keys to strings in a foreach loop, but I am getting a notice from PHP warning me about array to string conversion.

code:
foreach($word_check_array as $check_k => $fix_v){
        $check = strval($check_k);
        $fix = strval($fix_v);
        // ... use $check and $fix here ...
}
Is there an easy way of doing this? I thought about using the array_keys function to store the keys in a new array, then use those array elements to make calls on the $word_check_array above. However, it's a lot of code for something that seems like it should be really simple (as simple as above; it works, it just gives me notices).

Edit: Nevermind, I am stupid! I was adding arrays to $word_check_array ($word_check_array = ['key' => 'value']), rather than += 'key' => 'value'. Frustrating, but at least I learned something. :D
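
In case anyone else hits the same notice, here's roughly what was going on (made-up keys):

code:
// What I was doing: appending whole arrays, so each value was itself an array
// and strval() complained about array to string conversion.
$word_check_array   = [];
$word_check_array[] = ['teh' => 'the'];

// What I meant to do: merge plain key => value pairs into one flat array.
$word_check_array  = [];
$word_check_array += ['teh' => 'the'];
$word_check_array += ['recieve' => 'receive'];

foreach ($word_check_array as $check_k => $fix_v) {
    $check = strval($check_k); // 'teh', 'recieve'
    $fix   = strval($fix_v);   // 'the', 'receive' - no notice now
}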

Acidian fucked around with this message at 10:32 on Jun 3, 2018

itskage
Aug 26, 2003


Has anyone else had an issue with PHPUnit dying on a max time limit of 300 seconds? It seems to ignore php.ini settings and any set_time_limit/ini_set calls. Our CI is dying on it when running the full suite on new commits. Running it on just a few tests works fine, so it's like PHPUnit is trying to apply that time limit to the whole suite, which isn't going to work for this large project.

E: I think I found it. Going to be really sad if it's what I think it is.

itskage fucked around with this message at 17:00 on Jun 4, 2018

bigmandan
Sep 11, 2001

lol internet
College Slice

itskage posted:

Has anyone else had an issue with PHPUnit dying on a max time limit of 300 seconds? It seems to ignore php.ini settings and any set_time_limit/ini_set calls. Our CI is dying on it when running the full suite on new commits. Running it on just a few tests works fine, so it's like PHPUnit is trying to apply that time limit to the whole suite, which isn't going to work for this large project.

E: I think I found it. Going to be really sad if it's what I think it is.

I'm curious... what do you think it is?

itskage
Aug 26, 2003


Someone had a set_time_limit set in a function that iterates over a lot of records. For the purposes of the unit test it doesn't matter, but once set it would stick and PHPUnit would use that for the rest of the test.


How do people handle this? I don't see an issue with using set_time_limit for things that will take a while and moving it beyond the 30-second default. I don't like the idea of configuring CI to have a longer limit globally, because something hanging can be caught in CI before it hits production.

The best idea I can think of is to have each PHPUnit TestCase reset set_time_limit to the default during setUp, so that any classes or functions that use set_time_limit in other tests won't impede on the rest.
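
Something like this base class is what I'm picturing (just a sketch; 300 is a placeholder for whatever the suite should actually allow):

php:
<?php
use PHPUnit\Framework\TestCase;

abstract class BaseTestCase extends TestCase
{
    protected function setUp()
    {
        parent::setUp();
        // Undo whatever limit a previously-run function left behind.
        set_time_limit(300);
    }
}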


Edit: For the record we're adding tests to an existing 6 year old ball of mud project. It's a fun an interesting journey that's going about as well as you'd think something like that would go.

itskage fucked around with this message at 18:29 on Jun 4, 2018

bigmandan
Sep 11, 2001

lol internet
College Slice

itskage posted:

Someone had a set_time_limit set in a function that iterates over a lot of records. For the purposes of the unit test it doesn't matter, but once set it would stick and PHPUnit would use that for the rest of the test.


How do people handle this? I don't see an issue with using set_time_limit for things that will take a while and moving it beyond the 30-second default. I don't like the idea of configuring CI to have a longer limit globally, because something hanging can be caught in CI before it hits production.

The best idea I can think of is to have each PHPUnit TestCase reset set_time_limit to the default during setUp, so that any classes or functions that use set_time_limit in other tests won't impede on the rest.


Edit: For the record we're adding tests to an existing 6 year old ball of mud project. It's a fun an interesting journey that's going about as well as you'd think something like that would go.

Changing the time limit in test setup may work, but it could cause issues down the road when other changes are made. To me it would make sense to have the function setting the time limit clean up after itself:

PHP code:
function aFunctionThatSetsTimeLimit()
{
	$originalLimit = ini_get('max_execution_time');
	set_time_limit($someLimit);

	// code

	set_time_limit($originalLimit);

	return $whateverIfNeeded;
}
If used often enough, the setting/resetting of the time limit could be wrapped up in some helper functions. Resetting the time limit may make some hosting providers angry, though.
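
Something like this is what I mean by a helper (just a sketch; the function name is made up):

PHP code:
function withTimeLimit(int $seconds, callable $callback)
{
	$originalLimit = (int) ini_get('max_execution_time');
	set_time_limit($seconds);

	try {
		return $callback();
	} finally {
		set_time_limit($originalLimit); // restore even if the callback throws
	}
}

// Usage:
$result = withTimeLimit(300, function () {
	// the long-running record iteration goes here
});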

Ideally long running tasks should be in some sort of queue system but doing that refactor may be out of scope for your project.

Cool Matty
Jan 8, 2006
Usuyami no Sekai
I think this is probably the best place to ask this? Either way:

Right now I am fighting to get PHP-FPM/Nginx to behave on my local dev machine. I'm trying to use a Heroku buildpack locally, so I can test using the same environment as production.

This all seems well and good, my test page will load.... but only once. Attempting to refresh the page results in a 60 second 504 error timeout. Note that my test page is literally just: die('test');

If I spam refresh a bunch of times, eventually it might manage to load again, but it's extremely inconsistent. The behavior is very strange, and I don't even know where to begin to look for an answer. I've checked the error log for PHP-FPM, and it shows no errors. Nginx complains that the upstream connection timed out, which seems to indicate to me that the issue lies within PHP-FPM, but without anything in PHP-FPM's error log to go on, I have no idea where the hangup is. I did turn PHP-FPM's error logging all the way up to DEBUG, and it is logging, but not giving me any errors here.

As an aside, just to be sure, I have tested this page using PHP's built-in server, and there were no issues, so I don't think it's a PHP configuration issue either, only something specific to PHP-FPM. I am running this on WSL Ubuntu 18.04.

Anyone got a clue of what might be wrong, or even where to start looking?

spiritual bypass
Feb 19, 2008

Grimey Drawer
I have no idea, but maybe try a proper Linux VM instead of WSL? I've only had bad experiences trying to help coworkers with it.

Ranzear
Jul 25, 2013

Cool Matty posted:

As an aside, just to be sure, I have tested this page using PHP's built in server, and there was no issues, so I don't think it's a PHP configuration issue either, only something specific with PHP-FPM. I am running this on WSL Ubuntu 18.04.

Anyone got a clue of what might be wrong, or even where to start looking?

What's your fpm set to listen on? If it's a unix socket it could be a variety of file/permission issues. If it's just a tcp listener, try a unix socket instead. Be sure that nginx and php-fpm are running as the same user to simplify it further.

Cool Matty
Jan 8, 2006
Usuyami no Sekai

Ranzear posted:

What's your fpm set to listen on? If it's a unix socket it could be a variety of file/permission issues. If it's just a tcp listener, try a unix socket instead. Be sure that nginx and php-fpm are running as the same user to simplify it further.

It's on a socket. I tried TCP for giggles but apparently that's busted in WSL, so they recommend socket anyway. Both nginx and php-fpm are running as my local user (confirmed via top).

After digging a bit more, I feel like it has something to do with closing the connection after a request. If I disable fastcgi_buffers in nginx, I am no longer able to load the page at all. If I turn it back on, I can get the page to load the first request, but not subsequent ones. I can only assume that means the socket is working (otherwise it'd never load). But I don't know why php-fpm would not be finishing a request, or screwing up the buffer, or whatever it's doing there.

Woodsy Owl
Oct 27, 2004
I've had random stuff (rsync most recently) hang for inexplicable reasons on WSL. Are you serving the files from /mnt/c by any chance? If yes, then try moving them off the mounted Windows volume and into a folder within Ubuntu.

bigmandan
Sep 11, 2001

lol internet
College Slice

Cool Matty posted:

It's on a socket. I tried TCP for giggles but apparently that's busted in WSL, so they recommend socket anyway. Both nginx and php-fpm are running as my local user (confirmed via top).

After digging a bit more, I feel like it has something to do with closing the connection after a request. If I disable fastcgi_buffers in nginx, I am no longer able to load the page at all. If I turn it back on, I can get the page to load the first request, but not subsequent ones. I can only assume that means the socket is working (otherwise it'd never load). But I don't know why php-fpm would not be finishing a request, or screwing up the buffer, or whatever it's doing there.

I was having a similar problem and I eventually got it working in my dev env at home. I'll post my config once I get a chance to do so.

edit:

Running WSL with Ubuntu 18.04.

/home/bigmandan/projects is a symlink to /mnt/c/projects

NOT symlinked, but a normal file for the server config. Had to turn off fastcgi buffering:
/etc/nginx/sites-enabled/mysite


code:
server {

        listen 80;
        server_name mysite.test;
        root /home/bigmandan/projects/mysite;

        index index.html index.htm index.php;

        charset utf-8;

        location / {
                try_files $uri $uri/ /index.php?$query_string;
        }


        location = /favicon.ico { access_log off; log_not_found off; }
        location = /robots.txt  { access_log off; log_not_found off; }

        location ~ \.php$ {
                include snippets/fastcgi-php.conf;                
                fastcgi_pass unix:/var/run/php/php7.2-fpm.sock;
                fastcgi_buffering off;
        }

        location ~ /\.ht {
                deny all;
        }

        location ~ /.well-known {
                allow all;
        }
}
Changes to /etc/php/7.2/fpm/pool.d/www.conf
Had to change user/group AND listen.owner and listen.group

code:
user = bigmandan
group = bigmandan
listen.owner = bigmandan
listen.group = bigmandan
And finally the main nginx config:

code:
user bigmandan;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
	worker_connections 768;
	# multi_accept on;
}

http {

	##
	# Basic Settings
	##

	sendfile on;
	tcp_nopush on;
	tcp_nodelay on;
	keepalive_timeout 65;
	types_hash_max_size 2048;	

	include /etc/nginx/mime.types;
	default_type application/octet-stream;

	##
	# SSL Settings
	##

	ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
	ssl_prefer_server_ciphers on;

	##
	# Logging Settings
	##

	access_log /var/log/nginx/access.log;
	error_log /var/log/nginx/error.log;

	##
	# Gzip Settings
	##

	gzip on;

	##
	# Virtual Host Configs
	##

	include /etc/nginx/conf.d/*.conf;
	include /etc/nginx/sites-enabled/*;
}

bigmandan fucked around with this message at 01:21 on Jun 22, 2018

Acidian
Nov 24, 2006

First off, I just want to thank you all again for the amazing help I have gotten here earlier. I really feel you all go above and beyond in some of your responses, and it's a great help.

I have set up my own test environment for Magento 2, and I am trying to work with a PIM (Akeno) to populate products into Magento 2 via the Magento 2 REST API.

This will be my first official work project, and also the first time I will be actively using OOP in PHP. Even though I am doing this in my own free time, I feel the project will be better off if another developer can come in and fix my code if I die in a fire or something. I kinda took it upon myself to create an integration between the PIM I am setting up and Magento 2, which is being hosted by professionals who know what they are doing. So I am trying to format well, comment well, and sort everything into classes and objects. I am using GuzzleHttp for this project, and I even set up a composer.json with dependencies. I feel like such a big boy!

So, regarding using REST: is there any common practice for rate limiting REST requests in a script? I will set up the larger scripts (that is, scripts that make a lot of calls) to run at like 4 am on Sundays, but even so I worry that making too many requests on the REST API will reduce performance on the Magento server while the script is running. Also, I might have to run scripts during the day that won't make thousands of requests, but they will still make requests until they're done. I can't really test this impact on performance on my own Magento test environment in a good way.

From what I understand of the REST API, I need to send a PUT or POST request on one object at a time. This means that if I have 500 000 new products, then I need to make 500 000 individual calls on the API. Is there any way of adding more information per request, so that I can maybe send 50 objects in one request? The tutorial or documentation on the Magento site did not seem to indicate that you could.

The GuzzleHttp documentation also shows how you can queue up several requests, and then send multiple requests concurrently, but is there any point to that? I would assume it's better to just work through them sequentially. I also don't understand the difference between a synchronous request and an asynchronous request.

I also worry about script runtime, so I am thinking that maybe I will populate a MySQL table with products that are being updated, then set a limit on how many products will be done in one run and relaunch the script with shell_exec() if there are still products in the table that have not been updated in Magento. This table can then also serve as a backup in case any products fail to update for any reason.

Acidian fucked around with this message at 16:10 on Jul 8, 2018

bigmandan
Sep 11, 2001

lol internet
College Slice

Acidian posted:

So, regarding using REST: is there any common practice for rate limiting REST requests in a script? I will set up the larger scripts (that is, scripts that make a lot of calls) to run at like 4 am on Sundays, but even so I worry that making too many requests on the REST API will reduce performance on the Magento server while the script is running. Also, I might have to run scripts during the day that won't make thousands of requests, but they will still make requests until they're done. I can't really test this impact on performance on my own Magento test environment in a good way.

Rate limiting your requests will depend on what the endpoint limits are. Check their documentation to make sure. You could set up some sort of worker queue (beanstalkd, RabbitMQ, etc...) to perform your requests. The worker would send up to the maximum number of requests in the allowed time period, then "sleep" until the next run.
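
Even without a queue daemon, a crude throttle inside the script is a decent starting point. Rough sketch only; the numbers are arbitrary and $products and the request call are placeholders:

PHP code:
$maxPerMinute = 60;  // arbitrary example limit
$sent         = 0;
$windowStart  = time();

foreach ($products as $product) {
	// ... send one REST request for $product here ...

	$sent++;
	if ($sent >= $maxPerMinute) {
		// Sleep out the rest of the one-minute window, then reset.
		$elapsed = time() - $windowStart;
		if ($elapsed < 60) {
			sleep(60 - $elapsed);
		}
		$sent        = 0;
		$windowStart = time();
	}
}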

Acidian posted:

From what I understand of the REST API, I need to send a PUT or POST request on one object at a time. This means that if I have 500 000 new products, then I need to make 500 000 individual calls on the API. Is there any way of adding more information per request, so that I can maybe send 50 objects in one request? The tutorial or documentation on the Magento site did not seem to indicate that you could.

This will really depend on what their endpoint accepts. I've used/written endpoints that accept a json array of objects. You'll have to dig around in their docs and see.

Acidian posted:

The GuzzleHttp documentation also shows how you can queue up several requests, and then send multiple requests concurrently, but is there any point to that? I would assume it's better to just work through them sequentially. I also don't understand the difference between a synchronous request and an asynchronous request.

If you have a lot of requests to make, sending multiple concurrent requests at a time could be more efficient than doing each sequentially. Using Guzzle's async support would allow you to fire off a bunch of requests and then have a handler do something with the responses as they complete.
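
A rough sketch of the concurrent version using Guzzle's Pool (the base URI, endpoint path, and payload shape are made up, so double-check against the Magento and Guzzle docs):

PHP code:
use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client(['base_uri' => 'https://magento.example.com/rest/V1/']);

// One request per product, generated lazily.
$requests = function (array $products) {
	foreach ($products as $product) {
		yield new Request(
			'POST',
			'products', // placeholder endpoint
			['Content-Type' => 'application/json'],
			json_encode(['product' => $product])
		);
	}
};

$pool = new Pool($client, $requests($products), [
	'concurrency' => 5, // how many requests are in flight at once
	'fulfilled'   => function ($response, $index) {
		// e.g. mark product $index as synced
	},
	'rejected'    => function ($reason, $index) {
		// e.g. log the failure so it can be retried later
	},
]);

$pool->promise()->wait(); // block until every request has completed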

spiritual bypass
Feb 19, 2008

Grimey Drawer
It'd be better if you could find a way to load them directly on the server instead of using HTTP. That way, you at least have a shot at batching the inserts in a transaction.

Acidian
Nov 24, 2006

bigmandan posted:

Rate limiting your requests will depend on what the endpoint limits are. Check their documentation to make sure. You could set up some sort of worker queue (beanstalkd, RabbitMQ, etc...) to perform your requests. The worker would send up to the maximum number of requests in the allowed time period, then "sleep" until the next run.

The documentation says there is no limit on requests set by the API. Even so, I think it's a good idea not to overload the web server with requests, especially if there are customers also browsing the site. However, I don't know enough about networking and how web servers handle requests to know if it's a problem or not.

I will sit down and learn beanstalkd; that seems like something that will be useful now and in later projects. Thank you!

bigmandan posted:

This will really depend on what their endpoint accepts. I've used/written endpoints that accept a json array of objects. You'll have to dig around in their docs and see.

As I understand the API documentation, the JSON-formatted body should only contain one object after the header. I haven't worked with RESTful APIs before, and I am new to reading these types of documentation, so I wasn't sure if I had misunderstood something.


bigmandan posted:

If you have a lot of requests to make, sending multiple concurrent requests at a time could be more efficient than doing each sequentially. Using Guzzle's async support would allow you to fire off a bunch of requests and then have a handler do something with the responses as they complete.

I do have a lot of requests to make, so to make it more efficient, I will give this a try. Thank you!


rt4 posted:

It'd be better if you could find a way to load them directly on the server instead of using HTTP. That way, you at least have a shot at batching the inserts in a transaction.

I have been giving this some thought, and it has some challenges. The way I am doing it currently, I don't need any FTP/SSH access to the Magento server (for a batch .csv file with all my products, for example), and I do not need to involve the hosting provider in installing anything on the server. They do have a service agreement that they might consider compromised by giving me access to the server, or by asking them to install scripts they don't know anything about.

I have a secondary issue, and that is with the ERP that my PIM will be integrated with as well. The only way I am allowed to interact with my ERP is through the RESTful API they have installed, which is thankfully the same as Magento's (giving me a test system to develop on), but this means that even if I could set up a script on the Magento server that talked directly to the database, I would still need a script that talks to the same API on our ERP platform. So I am thinking it's easier to just write one piece of code for the same type of API but transacting with two different servers (I think the endpoint names might be different in some cases, but that's it).

Acidian fucked around with this message at 14:30 on Jul 10, 2018

xiw
Sep 25, 2011

i wake up at night
night action madness nightmares
maybe i am scum

Cpig Haiku contest 2020 winner

Cool Matty posted:

I think this is probably the best place to ask this? Either way:

Right now I am fighting to get PHP-FPM/Nginx to behave on my local dev machine. I'm trying to use a Heroku buildpack locally, so I can test using the same environment as production.

This all seems well and good, my test page will load.... but only once. Attempting to refresh the page results in a 60 second 504 error timeout. Note that my test page is literally just: die('test');

If I spam refresh a bunch of times, eventually it might manage to load again, but it's extremely inconsistent. The behavior is very strange, and I don't even know where to begin to look for an answer. I've checked the error log for PHP-FPM, and it shows no errors. Nginx complains that the upstream connection timed out, which seems to indicate to me that the issue lies within PHP-FPM, but without anything in PHP-FPM's error log to go on, I have no idea where the hangup is. I did turn PHP-FPM's error logging all the way up to DEBUG, and it is logging, but not giving me any errors here.

As an aside, just to be sure, I have tested this page using PHP's built-in server, and there were no issues, so I don't think it's a PHP configuration issue either, only something specific to PHP-FPM. I am running this on WSL Ubuntu 18.04.

Anyone got a clue of what might be wrong, or even where to start looking?

WSL often randomly sucks if Windows Defender's real-time file scanning is on. I recommend turning it off while developing - you can turn it off permanently in Group Policy if you feel like it.

Acidian
Nov 24, 2006

I have an actual proper PHP question this time.

Using json_decode(), I have an array of nested arrays, and in theory I might be in a situation where I don't know how many child arrays there are.

With json_decode() I can choose to get this back as an object or as an associative array. One or the other doesn't seem to help me much in solving the problem.

So in theory, I need to go through each element, check if the element is a new array or not.

Then that array, might have more arrays, and I need to check each of those elements for new arrays.

Then I need to check those arrays for new arrays again.

In practice I know how many arrays there are, but to future proof the program, then I have to assume the amount of arrays and where they branch can change.

All the potential arrays should in theory be listed under $object->children_data or $array['children_data' => [new=>array]], if that is any help. However, it seems like it should be possible to solve without that knowledge (I could always use the is_array() function within a foreach loop as well).

I feel kinda stupid and I am baffled about how to solve this, but it seems like a common enough problem, so I hope you guys can help me.
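
To be concrete, this is roughly what I am picturing so far, but I am not sure it's the right approach (the 'name' field and the function name are made up):

code:
// Assuming json_decode($json, true), so everything is an associative array.
function walk_children(array $node, int $depth = 0)
{
    // Do whatever is needed with the current element here.
    echo str_repeat('  ', $depth) . ($node['name'] ?? '(unnamed)') . PHP_EOL;

    // All the nested arrays should live under 'children_data'.
    foreach ($node['children_data'] ?? [] as $child) {
        if (is_array($child)) {
            walk_children($child, $depth + 1);
        }
    }
}

$tree = json_decode($json, true);
walk_children($tree);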

  • Reply