baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Sad Panda posted:

Next part of my Blackjack program. A lookup table. A short extract of the data would be...


code:
        2   3   4   5   6   7   8   9   T   A
Hard 5  H   H   H   H   H   H   H   H   H   H
Hard 6  H   H   H   H   H   H   H   H   H   H
Hard 7  H   H   H   H   H   H   H   H   H   H
Hard 8  H   H   H   H   H   H   H   H   H   H
Hard 9  H   D   D   D   D   H   H   H   H   H
Hard 10 D   D   D   D   D   D   D   D   H   H
I want to be able to input 2, Hard 6 and it return H.

My original idea was 2D arrays, but that doesn't seem to support a column name which is what I'd call that 2/3/4/.. at the top. I found one solution, and he used a Pickled 'av table' (so the variable name suggests), but that seems a bit beyond me right now.


You could make each row a namedtuple type with a parameter for each column name
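A minimal sketch of the namedtuple idea, using only the rows from the extract above. Field names have to be valid identifiers, so the dealer cards get a made-up `d` prefix here (`d2`, `dT`, `dA`); `Row`, `TABLE`, and `lookup` are illustrative names, not anything from the original program.

```python
from collections import namedtuple

# Dealer up-cards from the table header. namedtuple fields must be valid
# identifiers, so each card gets a hypothetical "d" prefix (d2, ..., dT, dA).
DEALER_CARDS = ["2", "3", "4", "5", "6", "7", "8", "9", "T", "A"]
Row = namedtuple("Row", [f"d{card}" for card in DEALER_CARDS])

# Rows copied from the extract in the question, one character per dealer card.
TABLE = {
    "Hard 5": Row(*"HHHHHHHHHH"),
    "Hard 6": Row(*"HHHHHHHHHH"),
    "Hard 7": Row(*"HHHHHHHHHH"),
    "Hard 8": Row(*"HHHHHHHHHH"),
    "Hard 9": Row(*"HDDDDHHHHH"),
    "Hard 10": Row(*"DDDDDDDDHH"),
}


def lookup(dealer_card, player_hand):
    """Return the move for a dealer up-card and a player hand."""
    return getattr(TABLE[player_hand], f"d{dealer_card}")


print(lookup("2", "Hard 6"))   # H
print(lookup("3", "Hard 9"))   # D
```

The outer dict handles the row lookup and the namedtuple handles the column lookup; a plain dict of dicts would work just as well if the prefix feels awkward.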


Cingulate
Oct 23, 2012

by Fluffdaddy

Sad Panda posted:

Next part of my Blackjack program. A lookup table. A short extract of the data would be...


code:
        2   3   4   5   6   7   8   9   T   A
Hard 5  H   H   H   H   H   H   H   H   H   H
Hard 6  H   H   H   H   H   H   H   H   H   H
Hard 7  H   H   H   H   H   H   H   H   H   H
Hard 8  H   H   H   H   H   H   H   H   H   H
Hard 9  H   D   D   D   D   H   H   H   H   H
Hard 10 D   D   D   D   D   D   D   D   H   H
I want to be able to input 2, Hard 6 and it return H.

My original idea was 2D arrays, but that doesn't seem to support a column name which is what I'd call that 2/3/4/.. at the top. I found one solution, and he used a Pickled 'av table' (so the variable name suggests), but that seems a bit beyond me right now.
In pandas, that would be df.loc["Hard 6", 2]
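For reference, a small sketch of how such a frame could be built by hand from the extract. It assumes the column labels are stored as strings (so the lookup uses "2" rather than the integer 2); with integer column labels the call above would work as written.

```python
import pandas as pd

# Column labels kept as strings so that "T" and "A" sit alongside "2".."9".
dealer_cards = ["2", "3", "4", "5", "6", "7", "8", "9", "T", "A"]

# Rows copied from the extract in the question.
rows = {
    "Hard 5": list("HHHHHHHHHH"),
    "Hard 6": list("HHHHHHHHHH"),
    "Hard 7": list("HHHHHHHHHH"),
    "Hard 8": list("HHHHHHHHHH"),
    "Hard 9": list("HDDDDHHHHH"),
    "Hard 10": list("DDDDDDDDHH"),
}

# orient="index" turns each dict key into a row label.
df = pd.DataFrame.from_dict(rows, orient="index", columns=dealer_cards)

print(df.loc["Hard 6", "2"])   # H
print(df.loc["Hard 9", "3"])   # D
```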

Sad Panda
Sep 22, 2004

I'm a Sad Panda.
baka, thank you. I tried namedtuples but it seemed a bit tedious.

Cingulate posted:

In pandas, that would be df.loc["Hard 6", 2]

Thank you. I created a CSV of the whole table.

Python code:
import pandas as pd
import numpy as np

headers = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10"]

df = pd.read_csv("bjstrat_no_headers.csv", names = headers)

print(df.head())
That gets me the following table output. Tomorrow I need to learn how to index it so I can use df.loc[]

code:
   A  2  3  4  5  6  7  8  9 10
5  H  H  H  H  H  H  H  H  H  H
6  H  H  H  H  H  H  H  H  H  H
7  H  H  H  H  H  H  H  H  H  H
8  H  H  H  H  H  H  H  H  H  H
9  H  H  D  D  D  D  H  H  H  H

Sad Panda
Sep 22, 2004

I'm a Sad Panda.

Sad Panda posted:

baka, thank you. I tried namedtuples but it seemed a bit tedious.


Thank you. I created a CSV of the whole table.

Python code:
import pandas as pd
import numpy as np

headers = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10"]

df = pd.read_csv("bjstrat_no_headers.csv", names = headers)

print(df.head())
That gets me the following table output. Tomorrow I need to learn how to index it so I can use df.loc[]

code:
   A  2  3  4  5  6  7  8  9 10
5  H  H  H  H  H  H  H  H  H  H
6  H  H  H  H  H  H  H  H  H  H
7  H  H  H  H  H  H  H  H  H  H
8  H  H  H  H  H  H  H  H  H  H
9  H  H  D  D  D  D  H  H  H  H


That was super simple.

Python code:

import pandas as pd

def right_move(player, dealer):
    """
    Index(['5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17',
           '18', '19', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'P2', 'P3',
           'P4', 'P5', 'P6', 'P7', 'P8', 'P9', 'P10', 'PA'])
    """

    headers = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10"]

    df = pd.read_csv("bjstrat_no_headers.csv", names = headers)

    return df.loc[player, dealer]


print(right_move("A7", "4"))
The index comment is there so I remember what are valid inputs for the player's hand.

Cingulate
Oct 23, 2012

by Fluffdaddy
I think that function is overkill. Either that, or it's a bit inefficient to always read in the csv whenever you want to retrieve one single number.

You could also write an input check that gracefully fails whenever you request a combination that doesn't exist.

In the long run, you probably want to construct a class (to handle state), and that class could store the df, and that function could be a method.
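A rough sketch of what that could look like, assuming the CSV has the hand label in the first column followed by one move per dealer card. `Strategy` and `SAMPLE_CSV` are hypothetical names; the two sample rows are copied from the df.head() output earlier in the thread and stand in for the real bjstrat_no_headers.csv.

```python
import io

import pandas as pd

# Stand-in for bjstrat_no_headers.csv: hand label first, then one move per
# dealer card. The two rows come from the df.head() output shown above.
SAMPLE_CSV = """\
8,H,H,H,H,H,H,H,H,H,H
9,H,H,D,D,D,D,H,H,H,H
"""

DEALER_CARDS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10"]


class Strategy:
    """Loads the strategy table once and answers lookups from memory."""

    def __init__(self, csv_source):
        # dtype=str keeps hand labels such as "8" as strings, not integers.
        table = pd.read_csv(csv_source, names=["hand"] + DEALER_CARDS, dtype=str)
        self.table = table.set_index("hand")

    def right_move(self, player, dealer):
        # Fail with a readable message instead of a bare KeyError.
        if player not in self.table.index:
            raise ValueError(f"unknown player hand: {player!r}")
        if dealer not in self.table.columns:
            raise ValueError(f"unknown dealer card: {dealer!r}")
        return self.table.loc[player, dealer]


strategy = Strategy(io.StringIO(SAMPLE_CSV))
print(strategy.right_move("9", "4"))   # D
```

With the real file, `Strategy("bjstrat_no_headers.csv")` would read it once at construction, and every later call to `right_move` would reuse the stored frame.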

Sad Panda
Sep 22, 2004

I'm a Sad Panda.

Cingulate posted:

I think that function is overkill. Either that, or it's a bit inefficient to always read in the csv whenever you want to retrieve one single number.

You could also write an input check that gracefully fails whenever you request a combination that doesn't exist.

In the long run, you probably want to construct a class (to handle state), and that class could store the df, and that function could be a method.

You're right about the reading of the CSV. I've moved that outside of the function so it should just happen at load when the df object is created and then stay loaded in cache right?

Cingulate
Oct 23, 2012

by Fluffdaddy

Sad Panda posted:

You're right about the reading of the CSV. I've moved that outside of the function so it should just happen at load when the df object is created and then stay loaded in cache right?
Yes, if you define the object at module level (in the global namespace) before the function is called, it will be available for calls to the function. You could also pass the data frame to the function as an argument, perhaps as a default.

Though as I said, you will probably end up using a class or two.

huhu
Feb 24, 2006
I constantly find myself wondering what the path is to the thing I need to import, and I start guessing: is it "Foo" or "Foo.SubFoo" or "Foo.Sub.Bar"? Is there some easy way with Python (I use PyCharm specifically) to not have to Google where to import what I need from?

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.
If pycharm is set up correctly it should tab complete imports.

SurgicalOntologist
Jun 17, 2004

You can also press Alt+Enter as you're writing code (referring to an object that you haven't imported yet) and PyCharm will give you a selection of possible imports and automatically add it to the top of the file.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

SurgicalOntologist posted:

You can also press Alt+Enter as you're writing code (referring to an object that you haven't imported yet) and PyCharm will give you a selection of possible imports and automatically add it to the top of the file.

Yes, this.

It's the best.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

Thermopyle posted:

Yes, this.

It's the best.

It really is.

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Thermopyle posted:

Yes, this.

It's the best.

I don't know, Ctrl-B to go to wherever the thing your cursor is on is defined gives that a run for its money.

creatine
Jan 27, 2012




Quick question about Pandas or Numpy

I have a data file that contains text portions. One portion lists a CSV like structure defining an nxn table. Here is an example:

code:
11,PE-A,ECD-A,PE-Cy7-A,BV421-A,V500-A,BV605-A,BV650-A,BV711-A,BUV395-A,APC-A,APC-H7-A,1,0.3336454407055869,0.009828734546151432,0.0006011944357393652,0.022093971258161094,0.06740915071146864,0.016382603604436784,0.0064628623421644745,0,0.00032404021561245746,0,
0.15902629421847203,1,0.05756746361951867,0.0001331091623752106,0.0023294177951101156,0.12672034696736553,0.04652181110379498,0.02469183353120362,0,0.001434897167575553,0.000111128202093519,
0.050715491139849106,0.018323457619654312,1,0.001195314958685107,0.001349549808564774,0.0026605414444367966,0.0006940541255730611,0.0011953158020868615,0.00020375886573211922,
0.0004322805490562307,0.028694651680494655,0,0,0,1,0.17731983501175327,0.0054721155489870135,0.0008732098543392541,0.0005239259839100361,0.00020508397283219975,0.00005020302578245126,0,0.008719077940787832,
0.007594034484060056,0.0025313444593176274,0.22534767329261476,1,0.10429835975495191,0.027812898884272897,0.012642226765578224,0.011969514667189592,0,0.0002725613333625084,0.008716732526528536,
0.04082837169675032,0.0035552783240474463,0.12150979047200396,0.012778652512824788,1,0.47095249938495604,0.20941228625855562,0.000198318636806206,0.0007282022542095874,0,0.0012061101786162202,
0.0024792262934412554,0.001925389735514362,0.13779082391988126,0.011746103541065454,0.1433250445643598,1,0.4676605543481979,0.00009947263767646321,0.08301781548363452,0.006101701335871083,
0.00009066359448013661,0,0.06282845234908074,0.24099639344440216,0.0242983036274861,0.00193571385751263,0.02047781593847467,1,0.00004486404743690177,0.0032508043512414378,0.06338785455063425,
0,0,0,0.029791030842123773,0.0028081681100730633,0.0002441887914041609,0,0.0002441887914041744,1,0,0,0.00008568944676501405,0.00008568944676501475,0.0016254489867899681,0.0020220950017631333,
0.00019257970609806575,0.0006740309932765525,0.07953568888358856,0.0325460809471753,0.0005936422972606287,1,0.05816368394533759,0.001977622748417514,0.0010547319423763063,0.04745886026027305,
0.011259538641065218,0.004296399518681803,0.0010370625785491826,0.02725992856218971,0.01407441928303391,0,0.2556556067176683,1
The first element is how many columns and rows there are, followed by the headers for those columns and rows. Then the values are comma delimited. What would be the best way to populate a formatted pandas table with this data? Would the readcsv() function be enough?

edit: creating a numpy array first then converting to dataframe worked well enough.
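For anyone landing here later, a sketch of the numpy-then-DataFrame route. It assumes the values are listed row by row and that line breaks and trailing commas can be ignored; `parse_square_table` is an illustrative name and the 3x3 sample is made up, not taken from the real file.

```python
import numpy as np
import pandas as pd


def parse_square_table(text):
    """Parse 'n, n headers, n*n values' into an n x n DataFrame."""
    # Treat line breaks as separators and drop the empty tokens that
    # trailing commas leave behind.
    tokens = [tok.strip() for tok in text.replace("\n", ",").split(",")]
    tokens = [tok for tok in tokens if tok]

    n = int(tokens[0])
    headers = tokens[1:1 + n]
    values = [float(tok) for tok in tokens[1 + n:]]
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} values, found {len(values)}")

    # Assumption: values are stored row by row (row-major order).
    matrix = np.array(values).reshape(n, n)
    return pd.DataFrame(matrix, index=headers, columns=headers)


# Made-up 3x3 example in the same layout as the real data.
sample = "3,PE-A,ECD-A,APC-A,1,0.25,0,\n0.5,1,0.125,\n0,0.0625,1"
spill = parse_square_table(sample)
print(spill)
```

read_csv on its own is a poor fit here because the size, the headers, and the values all share one comma-separated stream rather than one table row per line.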

creatine fucked around with this message at 14:57 on Mar 12, 2018

Jose Cuervo
Aug 25, 2004
Here is my folder structure:

code:
/scripts
	input_generation.py
/output
	/run_1
		/code
			input_generation.py
test_script.py
In test_script.py, when I write
Python code:
import input_generation as ig
I would like to import output/run_1/code/input_generation.py and not scripts/input_generation.py. Is there a way of achieving this?

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
You'll want to put an __init__.py everywhere so you can import your folders as packages

which will allow you to do this

Python code:
import output.run_1.code.input_generation as ig

Jose Cuervo
Aug 25, 2004

Dr Subterfuge posted:

You'll want to put an __init__.py everywhere so can import your folders as packages

which will allow you to do this

Python code:
import output.run_1.code.input_generation as ig

Thanks.

Am I correct in thinking that with the following structure and __init__.py placement:

code:
/scripts
	input_generation.py
/output
	__init__.py
	/run_1
		__init__.py
		/code
			__init__.py
			input_generation.py
test_script.py
the statement
Python code:
import input_generation as ig
would use scripts/input_generation.py, while
Python code:
import output.run_1.code.input_generation as ig
would use output/run_1/code/input_generation.py?

But with the following structure and __init__.py placement:

code:
/scripts
	__init__.py
	input_generation.py
/output
	__init__.py
	/run_1
		__init__.py
		/code
			__init__.py
			input_generation.py
test_script.py
the statement
Python code:
import input_generation as ig
would throw an error?

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
It's not clear to me that

code:
import input_generation as ig
would work even in your first case, unless maybe /scripts is already in your sys.path somehow? Adding __init__.py to other subfolders shouldn't change that behavior one way or another.

This will work with an __init__.py in /scripts though:

code:
import scripts.input_generation as ig
E: added link to sys.path and /scripts import

Dr Subterfuge fucked around with this message at 22:26 on Mar 14, 2018

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

Jose Cuervo posted:

Thanks.

Am I correct in thinking that with the following structure and __init__.py placement:

...

You're incorrect in your thinking, so that's why it's not working.

In this structure:

code:
/scripts
	__init__.py
	input_generation.py
/output
	__init__.py
	/run_1
		__init__.py
		/code
			__init__.py
			input_generation.py
test_script.py
You have a Python package called scripts, which contains the Python module input_generation. From your test script, you would be able to do this:

code:
import scripts.input_generation as ig
In order to do the following:

code:
import input_generation as ig
Then you would need to rename the scripts directory to input_generation, and you'd also need to add the following to that __init__.py file:

code:
# input_generation/__init__.py
from .input_generation import *
You also might have to rename input_generation.py to something else but I'm not sure. I'm not sure if Python would care that the package (directory with a __init__.py file) and the module (the Python file itself) have the same name or not.

This part that you wrote is fine:

code:
import output.run_1.code.input_generation as ig
I hope that makes sense.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
__init__.py is basically a flag that tells python the folder that contains it is a package that can be imported. (Since Python 3.3 a folder without one can still be imported as a namespace package, but an explicit __init__.py is the usual way to make a regular package.) It can optionally contain some code to do some setup work, which runs the first time anything in that package is imported in a process, so things should only go there if they really are general for the whole package. Python files can also be imported directly and are known as modules.

The lookup behavior for python is determined by a list of search paths. It starts with the directory containing the script you ran (or the current directory in an interactive session), followed by any directories listed in an environment variable called PYTHONPATH, followed by the standard library and site-packages locations. The search works down that list and stops as soon as it finds something, so if you have a local module called random, python will import that instead of the standard library one because your script's directory takes precedence over everything else. (Modules compiled into the interpreter itself are the exception, since those are found before the path is searched.) Any module that is in a folder on the search path can be imported directly, which means, for example, if you have main.py in the same folder as foo.py, you can just import foo from main like this:

code:
import foo
Python stores its list of search paths at runtime in a variable in the sys module called path, which you can see like this:

code:
import sys

print(sys.path)
Python is actually looking through sys.path when doing the imports, so you can change where your script searches and when to your heart's content. Note that just because you can doesn't mean that you should, so it's best to work with the default behavior unless you have a good reason not to.
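If the goal is to load one specific file when two modules share a name (the output/run_1/code versus scripts situation above), one alternative that leaves sys.path alone is to load the module by file path with importlib. This is a different technique from the package imports discussed so far, sketched here against a throwaway directory tree; `load_module_from_path` and the `SOURCE` variable are made up for the demonstration.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(name, path):
    """Load a module from an explicit file path without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as root:
    # Recreate the layout from the question with two same-named modules.
    run_dir = Path(root, "output", "run_1", "code")
    scripts_dir = Path(root, "scripts")
    run_dir.mkdir(parents=True)
    scripts_dir.mkdir()
    (run_dir / "input_generation.py").write_text("SOURCE = 'run_1'\n")
    (scripts_dir / "input_generation.py").write_text("SOURCE = 'scripts'\n")

    # Pick the run_1 copy explicitly, whatever the search path says.
    ig = load_module_from_path("input_generation_run_1",
                               run_dir / "input_generation.py")
    loaded_source = ig.SOURCE

print(loaded_source)   # run_1
```

The module is not registered in sys.modules under its usual name, so a plain `import input_generation` elsewhere will still follow the normal search order.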

Dr Subterfuge fucked around with this message at 19:32 on Mar 15, 2018

Jose Cuervo
Aug 25, 2004

Boris Galerkin posted:

Helpful explanation.

Got it. Dr Subterfuge's suggestion of adding the scripts folder to the path does make what I posted work, and seems much simpler than having to rename the scripts folder.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

Jose Cuervo posted:

Got it. Dr Subterfuge's suggestion of adding the scripts folder to the path does make what I posted work, and seems much simpler than having to rename the scripts folder.

Yeah appending to sys.path works too but it’s more of a bandaid solution. I’m not saying it’s wrong to do it or anything but just something to keep in mind.

Also you coulda just kept scripts unchanged and do “from scripts import mymodule as ig” as well, didn’t need to rename the directory.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Jose Cuervo posted:

Got it. Dr Subterfuge's suggestion of adding the scripts folder to the path does make what I posted work, and seems much simpler than having to rename the scripts folder.

Be careful. Editing the python path is kinda/sorta a hacky way to handle the situation. I mean, sometimes you need to do it because you need to preserve an API for legacy reasons or something.

You're better off just organizing your poo poo correctly.

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
Yeah I was mostly just trying to present a more complete picture of what was going on. The simplest way is making /scripts into a package and... not doing whatever it is you think is making editing sys.path necessary.

creatine
Jan 27, 2012




Question about matrix and vector multiplication:

I am working on a small project for work that reads in some scientific data from a flow cytometry experiment. The basic layout of the data is that I have m number of columns which are markers of interest and n number of rows which represent individual cells/events. I have to apply compensation to certain markers based on a given spill over matrix.

The general format for the compensation is this:



And I was using this to help me understand the math a bit more:



So far I can successfully get the inverse of the spillover matrix and select the markers needed for compensation. But my question is: what would be the most efficient way to do the calculations needed? I am hoping there's an easy-to-use numpy function that can do this so I don't have to iterate through every row and manually do the calculations.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

creatine posted:

Question about matrix and vector multiplication:

I am working on a small project for work that reads in some scientific data from a flow cytometry experiment. The basic layout of the data is that I have m number of columns which are markers of interest and n number of rows which represent individual cells/events. I have to apply compensation to certain markers based on a given spill over matrix.

The general format for the compensation is this:



And I was using this to help me understand the math a bit more:



So far I can successfully get the inverse of the spillover matrix and select the markers needed for compensation. But my question is: what would be the most efficient way to do the calculations needed? I am hoping there's an easy-to-use numpy function that can do this so I don't have to iterate through every row and manually do the calculations.

You can read matrices into 2D numpy arrays (check that it’s in the proper “orientation”).

Inverse of a matrix is provided by numpy.linalg.inv(M) and transpose of a matrix is built into the 2D array class with the “.T” property, eg M.T.

code:
from numpy.linalg import inv

M = some_matrix
N = inv(M).T
e: Maybe I misread your post. I don’t see what calculations you’re talking about.

creatine
Jan 27, 2012




Boris Galerkin posted:

You can read matrices into 2D numpy arrays (check that it’s in the proper “orientation”).

Inverse of a matrix is provided by numpy.linalg.inv(M) and transpose of a matrix is built into the 2D array class with the “.T” property, eg M.T.

code:
from numpy.linalg import inv

M = some_matrix
N = inv(M).T
e: Maybe I misread your post. I don’t see what calculations you’re talking about.

Sorry I wasn't clear. In the second image, at the bottom, it tells you how to calculate FITCtrue. That's what I'm looking to do for every item in the uncompensated data column. So if I feed in a column of shape (1000,) and have a compensation vector of shape (1,11), I am looking to get a return vector that keeps the (1000,) shape but with new values based on the true equation from above. Right now I'm going to try the dot function in pandas since that's what my datasets are currently read into.

Edit: basically I have a column vector of values:

[50
60
70]

Then I have a row vector of other values
[1.005, -0.004, 0]

And what I need to do is get the sum of the multiplication of each item in column * row so like:

[(50*1.005)+(50*-0.004)+(50*0)]
Then the same for 60 and 70, and I should get a (3,1) vector as a result
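
A sketch of both readings in numpy, with no Python-level loop over rows. The first part is the literal calculation described above. The second part shows the usual whole-matrix form, under the assumption that the observed signal equals the true signal times the spillover matrix; the 2x2 `spill` values and the `true_signal` rows are made up for illustration.

```python
import numpy as np

column = np.array([50.0, 60.0, 70.0])
row = np.array([1.005, -0.004, 0.0])

# Literal reading: multiply each column entry by every row entry, then sum
# the products for that entry. np.outer builds the 3x3 grid of products.
summed = np.outer(column, row).sum(axis=1)
# The same numbers, computed more directly.
assert np.allclose(summed, column * row.sum())
print(summed.shape)   # (3,)

# Whole-matrix compensation, assuming observed = true @ spill, so that
# true = observed @ inv(spill). One row per event, one column per marker.
spill = np.array([[1.0, 0.2],
                  [0.1, 1.0]])
true_signal = np.array([[100.0, 0.0],
                        [0.0, 50.0],
                        [30.0, 30.0]])
observed = true_signal @ spill
compensated = observed @ np.linalg.inv(spill)
print(np.allclose(compensated, true_signal))   # True
```

With pandas frames, `df.values @ inv_spill` or `df.dot(inv_spill_df)` should do the same job as long as the column order matches the spillover headers.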

creatine fucked around with this message at 00:14 on Mar 17, 2018


Sad Panda
Sep 22, 2004

I'm a Sad Panda.
I was learning some Comp Sci and that involved learning about linear search, binary search, bubble sort, insertion sort and merge sort. I decided that the best way to make sure that I understand them was to code them.

I'm still to write the merge sort, but the other ones work. While the insertion sort is super fast, the bubble sort is certainly not. Could someone point out the glaring mistake in my code? The first is my insertion sort split into two functions.

Python code:
def insertion_sort(list):
    # Create a sorted list and an unsorted list
    attempts = 0
    sorted_list = []
    unsorted_list  = list
    print(f"At the start - sorted {sorted_list}")
    print(f"At the start - unsorted {unsorted_list}")

    # Move the items 1 at a time from the unsorted list into the sorted list
    for i in range (0, len(unsorted_list)):
        sorted_list, attempts = insert_to_new_list(sorted_list, unsorted_list.pop(0), attempts)
    print(f"After {attempts} attempts - {sorted_list}")
    return sorted_list

def insert_to_new_list(list, value, attempts):
#    print(f"Called to try to add {value}")
    inserted = False
    if len(list) == 0:
        list.append(value)
    else:
        # Before inserting each item, position it so it is 'in order'
        for j in range(0, len(list)):
            attempts += 1
            if value <= list[j]:
#                print("Value is smaller or equal")
                list.insert(j, value)
                inserted = True
                break
            else:
#               print("Value is bigger")
                pass
        if inserted == False:
            list.insert(j+1, value)
    return list, attempts
Python code:
def bubble_sort(list):
    print(f"Original list is {list}")
    consecutive = 0
    attempts = 0
    # Check each item, if list[i] > list[i+1] swap them else leave them
    temp = 0
    for j in range(0, len(list)):
        for i in range(0, len(list)-1):
            attempts += 1
            print(i)
            if list[i] > list[i+1]:
                temp = list[i+1]
                list[i+1] = list[i]
                list[i] = temp
                print(f"Need to swap {list[i]} and {list[i+1]}")
                print(list)
                consecutive = 0
            else:
                print("Don't need to swap")
                consecutive += 1
        print(f"End of pass {j}")
        if consecutive >= len(list):
            print("A whole pass has happened with no swaps")
            break
    print(f"After {attempts} attempts...")
    print(list)
    return list
It could print a bit less, but that's just there as debug at the minute. That and it's fun to watch things bubble along.

Sad Panda fucked around with this message at 00:34 on Mar 18, 2018

huhu
Feb 24, 2006
I don't know much about sorting but I do know things tend to run a lot faster when you're not printing a bunch of stuff out.

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord

Sad Panda posted:

Could someone point out the glaring mistake in my code?

This:

Sad Panda posted:

Python code:
    unsorted_list  = list
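
Presumably the point is that this assignment does not copy anything: it binds a second name to the same list object, so every pop(0) inside insertion_sort also empties the list the caller passed in. A tiny demonstration, with made-up variable names:

```python
data = [3, 2, 8]
alias = data              # a second name for the same list object
alias.pop(0)
print(data)               # [2, 8], the "original" changed too

fresh = [3, 2, 8]
shallow = list(fresh)     # an actual (shallow) copy
shallow.pop(0)
print(fresh)              # [3, 2, 8], untouched
```

If the bubble sort was timed on a list that had already been through insertion_sort, it may not have been sorting the data you expected.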

Eela6
May 25, 2007
Shredded Hen

ed, nm, answering the wrong question.

PS: you shouldn't pop things from the front of a list. If you must pop from the front, use collections.deque; if the order doesn't matter, just use pop() (without arguments) or iterate through.
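
A quick sketch of the deque version: popleft() takes the front item in constant time, whereas list.pop(0) has to shift every remaining element.

```python
from collections import deque

queue = deque([3, 2, 8, 4, 1, 9, 43])
taken = []
while queue:
    # O(1) removal from the front; list.pop(0) would be O(n) each time.
    taken.append(queue.popleft())

print(taken)   # [3, 2, 8, 4, 1, 9, 43]
```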

A couple other bits of python niceties you could use:

enumerate gives you an index alongside each item of the collection you're looping through. EG:
Python code:
for i, c in enumerate(['g','o','o','n']):
    print(f'({i}, {c})', end="\t")
pre:
(0, g)  (1, o)  (2, o)  (3, n)
A more 'pythonic' take on your insertion sort code would look like this:
Python code:
def insertion_sort(unsorted):  # use a descriptive argument name rather than shadowing the built-in 'list'
    def insert(result, new_value):
        # no need for a bounds check; this loop is a no-op if len(result) == 0
        for j, v in enumerate(result):
            if new_value <= v:
                result.insert(j, new_value)
                return result

        result.append(new_value)  # new_value is bigger than every element in result (this also covers the empty list)
        return result

    result = []  # named 'result' so the built-in sorted() is not shadowed
    for v in unsorted:
        result = insert(result, v)
    return result

Eela6 fucked around with this message at 05:46 on Mar 18, 2018

Sad Panda
Sep 22, 2004

I'm a Sad Panda.

Eela6 posted:

ed, nm, answering the wrong question.

PS: you shouldn't pop things from the front of a list. If you must pop from the front, use collections.deque; if the order doesn't matter, just use pop() (without arguments) or iterate through.

A couple other bits of python niceties you could use:

enumerate gives you an index alongside each item of the collection you're looping through. EG:
Python code:
for i, c in enumerate(['g','o','o','n']):
    print(f'({i}, {c})', end="\t")
pre:
(0, g)  (1, o)  (2, o)  (3, n)
A more 'pythonic' take on your insertion sort code would look like this:
Python code:
def insertion_sort(unsorted):  # use a descriptive argument name rather than shadowing the built-in 'list'
    def insert(result, new_value):
        # no need for a bounds check; this loop is a no-op if len(result) == 0
        for j, v in enumerate(result):
            if new_value <= v:
                result.insert(j, new_value)
                return result

        result.append(new_value)  # new_value is bigger than every element in result (this also covers the empty list)
        return result

    result = []  # named 'result' so the built-in sorted() is not shadowed
    for v in unsorted:
        result = insert(result, v)
    return result

Thanks, I'll look at deque. The explanation of an insertion sort that I saw basically said the way it works is.

You get a set of data,
[3, 2, 8 , 4, 1, 9, 43]

You split it in 2, a sorted list (starts empty) and an unsorted list (starts as the whole of your original list).
[] [3, 2, 8 , 4, 1, 9, 43]

Iterate through the unsorted list, taking the first value and adding it to the sorted list in the correct position to ensure that it is still sorted.
[3] [2, 8 , 4, 1, 9, 43]
[2, 3] [8 , 4, 1, 9, 43]
[2, 3, 8] [4, 1, 9, 43]
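
That walkthrough can be reproduced with a short trace. This sketch swaps in bisect.insort to find the insertion point (a binary search rather than the linear scan in the code above) and works on a copy so the caller's list is left alone; `insertion_sort_trace` is an illustrative name.

```python
import bisect


def insertion_sort_trace(values):
    """Return the sorted list plus a (sorted, unsorted) snapshot per step."""
    values = list(values)          # copy so the caller's list is untouched
    sorted_part = []
    steps = [([], list(values))]
    for i, value in enumerate(values):
        # insort finds the position by binary search and inserts in place.
        bisect.insort(sorted_part, value)
        steps.append((list(sorted_part), values[i + 1:]))
    return sorted_part, steps


result, steps = insertion_sort_trace([3, 2, 8, 4, 1, 9, 43])
for sorted_part, unsorted_part in steps:
    print(sorted_part, unsorted_part)
```

The first few printed lines match the states listed above: `[] [3, 2, 8, 4, 1, 9, 43]`, then `[3] [2, 8, 4, 1, 9, 43]`, then `[2, 3] [8, 4, 1, 9, 43]`.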

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
Say I have two dataframes:
df1
code:
      A    B 
0    'a'   NaN   
1    'b'   NaN   
2    'c'   NaN  
3    'd'   NaN
and df2:
code:
      A    B 
0    'a'  'one'   
1    'c'  'two'
2    'd'  'three'
How do I replace the column B values in df1 with df2?
code:
      A    B 
0    'a'  'one'   
1    'b'   NaN
2    'c'  'two'  
3    'd'  'three'  
This seems to work as a solution to the example, but it feels like I'm just smashing pieces together until something works.
Python code:
import pandas as pd

df1 = pd.DataFrame({'A': ['a', 'b', 'c', 'd'],
                    'B': ['.']*4},
                    index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['a', 'c', 'd'],
                    'B': ['one', 'two', 'three']},
                    index=[0, 1, 2])
df1.set_index('A', drop=False, inplace=True)
df2.set_index('A', inplace=True)
df3 = pd.concat([df1, df2], axis=1)
df4 = df3.iloc[:,[0, 2]]
print(df4)
It doesn't preserve the initial indexing, but that's recoverable.

Dr Subterfuge fucked around with this message at 10:31 on Mar 18, 2018

Cingulate
Oct 23, 2012

by Fluffdaddy
You mean like this?

code:
df1 = pd.DataFrame({'A': ['a', 'b', 'c', 'd'],
                    'B': ['.']*4},
                    index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['a', 'c', 'd'],
                    'B': ['one', 'two', 'three']},
                    index=[0, 1, 2])
df1b = df1.set_index("A")
df1b["B"] = df2.set_index("A")["B"]

print(df1b)
       B
A       
a    one
b    NaN
c    two
d  three
(The assignment aligns on index.)

vikingstrike
Sep 23, 2007

whats happening, captain
That’s a one liner with pd.merge().

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

Cingulate posted:

You mean like this?

code:
df1 = pd.DataFrame({'A': ['a', 'b', 'c', 'd'],
                    'B': ['.']*4},
                    index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['a', 'c', 'd'],
                    'B': ['one', 'two', 'three']},
                    index=[0, 1, 2])
df1b = df1.set_index("A")
df1b["B"] = df2.set_index("A")["B"]

print(df1b)
       B
A       
a    one
b    NaN
c    two
d  three
(The assignment aligns on index.)

That feels much better. Somehow I was fixated on directly building the df I wanted instead of relying on col assignment. On the other hand


vikingstrike posted:

That's a one liner with pd.merge().

Oh hell. So it is.

Python code:
df1['B'] = df1.merge(df2, how='left', on=['A'])['B_y']
Thanks to you both. I need to get better at joins.

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Sad Panda posted:

Python code:
def bubble_sort(list):
    print(f"Original list is {list}")
    consecutive = 0
    attempts = 0
    # Check each item, if list[i] > list[i+1] swap them else leave them
    temp = 0
    for j in range(0, len(list)):
        for i in range(0, len(list)-1):
            attempts += 1
            print(i)
            if list[i] > list[i+1]:
                temp = list[i+1]
                list[i+1] = list[i]
                list[i] = temp
                print(f"Need to swap {list[i]} and {list[i+1]}")
                print(list)
                consecutive = 0
            else:
                print("Don't need to swap")
                consecutive += 1
        print(f"End of pass {j}")
        if consecutive >= len(list):
            print("A whole pass has happened with no swaps")
            break
    print(f"After {attempts} attempts...")
    print(list)
    return list

Not that it really matters, but the way you're checking if no swaps happened can cause an extra pass to occur. In a list of n items you're doing n-1 comparisons each pass, but you're looking for a chain of n or more comparisons that didn't cause a swap. So with a list like 5,1,2,3,4 the 5 will bubble up on the first pass, and now the list is sorted. But you want 5 swapless comparisons, and so far you have 0, so you do another pass, and now you have a run of 4. Gotta do it another time! Now you have 8

You could change the comparison operator, but really you can do this all a lot simpler - on each pass, set a 'swapped' flag to false. If you do a swap, set it to true. At the end of the pass, if swapped is still false, you're done!

You can also do smarter stuff with the looping, based on the idea that the largest item always bubbles to the right so the list gradually gets sorted from right to left (so you don't need to check those end items anymore) but bubble sort isn't very efficient anyway - that's probably why it seems slow, it is!
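
Both suggestions together might look something like this sketch: a per-pass `swapped` flag for the early exit, and an `end` index that shrinks because the largest remaining item is in place after each pass. It works on a copy and returns the pass count so the effect is visible; the names are illustrative.

```python
def bubble_sort(values):
    """Return a sorted copy of values and the number of passes made."""
    items = list(values)
    passes = 0
    end = len(items) - 1           # last index that still needs checking
    while end > 0:
        swapped = False
        passes += 1
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:            # a whole pass without swaps: done
            break
        end -= 1                   # the largest item has bubbled into place
    return items, passes


result, passes = bubble_sort([5, 1, 2, 3, 4])
print(result, passes)   # [1, 2, 3, 4, 5] 2
```

For 5,1,2,3,4 that is two passes (one to move the 5, one to confirm nothing moved) instead of the three the original check needs.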



While we're on this, this is the best one of those visualisation videos I've seen

https://www.youtube.com/watch?v=sYd_-pAfbBw

The hue is where the dot should be in the array, and the closeness to the centre is how out of position it is. As each one gets sorted it flies out to its position on the circle edge. Some wild stuff happens in there :eyepop:

baka kaba fucked around with this message at 17:45 on Mar 18, 2018

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

baka kaba posted:

While we're on this, this is the best one of those visualisation videos I've seen

https://www.youtube.com/watch?v=sYd_-pAfbBw

The hue is where the dot should be in the array, and the closeness to the centre is how out of position it is. As each one gets sorted it flies out to its position on the circle edge. Some wild stuff happens in there :eyepop:

Holy poo poo Radix is amazing.


tef
May 30, 2004

-> some l-system crap ->

Sad Panda posted:

I was learning some Comp Sci and that involved learning about linear search, binary search, bubble sort, insertion sort and merge sort. I decided that the best way to make sure that I understand them was to code them.

It's worth pointing out that many of these algorithms operate on *fixed length arrays* not *python lists*. When you're calling pop, insert, append, you're changing the length of the list, as well as changing the list. With these old algorithms, you probably want to stick to arr[0] = .....

Or maybe, build a new list entirely. The code always gets a bit tricky when you're using one data structure in place of two:

For example: this operates like a selection sort (find smallest, put at end of new list), but it doesn't re-use the front of the old list to store the values

code:
arr = []
while unsorted:
    smallest = min(unsorted)
    idx = unsorted.index(smallest)
    arr.append(unsorted.pop(idx))
When you're looking at these algorithms, you should try and think about the strategy (over arching goal) or tactics (actions towards goal) they take.

Selection sort divides the list into two pieces, the smallest element and all elements larger than it, while insertion sort takes the next element and slots it into the part that is already sorted. Bubble sort tries to make the list slightly less unsorted with each pass, moving elements to the right position. All of these strategies can be adapted to build other sorting algorithms, and sometimes the tactics from the 60's and 70's aren't as valid today.

You could make different tactics: like keeping the min, max as you scan through the list, and build up a new list from either end. Or perhaps change the strategy: break the list into (smaller than, bigger than) rather than (min, bigger than). Then sort either half.
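
The "(smaller than, bigger than)" strategy is essentially quicksort. A minimal sketch that builds new lists rather than sorting in place, with the middle element chosen as the pivot purely for simplicity; `partition_sort` is an illustrative name.

```python
def partition_sort(values):
    """Sort by splitting around a pivot and sorting each side recursively."""
    if len(values) <= 1:
        return list(values)
    pivot = values[len(values) // 2]
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]     # keeps duplicates of the pivot
    bigger = [v for v in values if v > pivot]
    return partition_sort(smaller) + equal + partition_sort(bigger)


print(partition_sort([3, 2, 8, 4, 1, 9, 43]))   # [1, 2, 3, 4, 8, 9, 43]
```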

You can start to develop a feel for different ways to sort structures. It might be worth looking at how python does it too, it uses a wonderful trick: Break the list into pre-sorted chunks and merge them. It's a little of both worlds: Scanning through a list to find out of place elements, and breaking the list down into smaller pieces to operate on.
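
A toy version of that trick, to show the shape of the idea: split the input wherever the order breaks, then merge the already-sorted chunks. This is not how Python's real sort (Timsort) is implemented, which does a good deal more (reversing descending runs, padding short runs, smarter merging); the function names here are made up and heapq.merge does the merging.

```python
import heapq


def split_into_runs(values):
    """Cut values into consecutive chunks that are already in order."""
    runs = []
    for v in values:
        if runs and runs[-1][-1] <= v:
            runs[-1].append(v)     # still ascending: extend the current run
        else:
            runs.append([v])       # order broke: start a new run
    return runs


def run_merge_sort(values):
    """Sort by merging the pre-sorted runs found in values."""
    return list(heapq.merge(*split_into_runs(values)))


print(split_into_runs([3, 2, 8, 4, 1, 9, 43]))   # [[3], [2, 8], [4], [1, 9, 43]]
print(run_merge_sort([3, 2, 8, 4, 1, 9, 43]))    # [1, 2, 3, 4, 8, 9, 43]
```

On data that is already mostly sorted there are only a few long runs, so very little work is left for the merge.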
