Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Breetai
Nov 6, 2005

🥄Mah spoon is too big!🍌
Line 100 of the program:
Data ping
Set *some landing table
*Other conditionals*
Run;

Line 900:
Data ping
Set ping
Merge pong
Run;

Line 1400:
Proc sort
Data= ping
By *couple of variables*
Run;

Line 3000:
Data ping
Set ping
*Fuckton of conditionals*
Run;

Line 3500:
Dara final_ping
Set ping
Merge ding
Run;

Line 3900
Proc sort final_ping nodupkey
By *esoterica*
Run;

Line 4300:
Data final_ping
Set final_ping
Merge 20+ other datasets
Run;


...

My job is to locate the source of nondeterministic row selection in this program.

gently caress you.

Adbot
ADBOT LOVES YOU

Breetai
Nov 6, 2005

🥄Mah spoon is too big!🍌
Had to look at some poorly running code.

Turns out the issue was a library, call it lnd.lib.input_vw for argument's sake. Every time it's used in a join spool space usage goes up like crazy.

Did some investigations in dbc.tablesv.

lnd.lib.input_vw is pretty much:

code:
select *
from lnd.db.input_tb

union all

select *
from hist.db.input_tb
where hist.db.input_tb is basically identical to lnd.db.input_tb, with only two differences:
1. hist.db.input_tb contains all entries for 2016 and before (e.g. it's a historical data store) and lnd.db.input_tb contains everything from 2017 onwards. File sizes are 2TB and 1TB respectively.
2. While it has the same primary index (on a unique alphanumerical key) as lnd.db.input_tb, hist.db.input_tb has never had statistics run on its primary index.

Even a

code:
select top 10 * from lnd.lib.input_vw
query will use up 5 loving Terabytes of spool space.


"Why do our programs run so slow?"

Breetai
Nov 6, 2005

🥄Mah spoon is too big!🍌
Apologies for not contributing earlier.


code:

Select
num
, case when num/2 = int(num/2) then 'true' 
  else 'false'  
  end
  as iseven

from lnd.prd.db.allnums

Order by num

Breetai
Nov 6, 2005

🥄Mah spoon is too big!🍌
Our org uses SAS programs for regular scheduled reporting, and our data warehouse is a Teradata SQL environment that we can connect to using SAS EG. Generally speaking our programs are set up to to a whole bunch of data manipulation in Teradata, then bring refined datasets into the SAS grid in order to perform a few final transformations and then create the report outputs.

However, occasionally I will get a 'can you see why this program is running poorly' maintenance request and well...


So someone creates a SAS view via a connection to Teradata. Something like this:


code:
Proc sql;
<Teradata connection credentials>
Create view SASVIEW1 as select * from connection to teradata
(
Select <variables>
from <teradata table 1>

left join

	(select <variables>
	from <teradata table 2>

		left join

		(select <variables>
		from <teradata table 3>


			left join

			(select <variables>
			from <teradata table4>

on.....

)

And then they make HALF A DOZEN SAS views just like this. And then this happens:



code:
Proc sql;
<Teradata connection credentials>
crate table work.FINAL as
select * from connection to Teradata(

select * from SASVIEW1

union all

select * from SASVIEW2

union all

select * from SASVIEW3

union all

select * from SASVIEW4

union all

select * from SASVIEW5

union all

select * from SASVIEW6);

"Whycome my code run slow?" :gonk:

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply