This is a rather old note. A better way to learn Python now is to find a problem you want to solve and ask ChatGPT about it.
Documentation:
- Python: https://docs.python.org/3/
- Numpy: https://numpy.org/doc/stable/
- Pandas: https://pandas.pydata.org/docs/
Python built-in
Functions and callables
- len(obj): return the number of items
- type(obj): return the type of an object
- dir(obj): return a list of the object's attribute names; with no argument, dir() lists the names in the current scope
- sum(iterable, /, start=0): return the sum of start and the items of the iterable
- max/min(iterable, *[, key, default]): return the largest/smallest item of the iterable
- pow(base, exp[, mod]): return base to the power exp; with mod, it computes pow(base, exp) % mod more efficiently
- range(stop) or range(start, stop[, step]): range is an immutable sequence type
- sorted(iterable, /, *, key=None, reverse=False): return a new sorted list, ordered by the key function if one is given
- //: floor division; %: modulo; **: exponentiation; abs(num): absolute value
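A quick sketch of the built-ins listed above. The lists `words` and `scores` are made-up sample data, not part of any library.

```python
# Illustrative values only; `words` and `scores` are hypothetical sample data.
words = ["pear", "fig", "banana"]
scores = [3, 1, 2]

n_items = len(words)                               # 3
total = sum(scores, start=10)                      # 10 + 3 + 1 + 2 = 16
longest = max(words, key=len)                      # 'banana'
fallback = min([], default=None)                   # None instead of ValueError
by_length = sorted(words, key=len, reverse=True)   # ['banana', 'pear', 'fig']
mod_pow = pow(3, 4, 5)                             # 81 % 5 = 1
evens = list(range(0, 10, 2))                      # [0, 2, 4, 6, 8]
floor_div, remainder = -7 // 2, -7 % 2             # (-4, 1): floor division rounds down
```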
Containers
- list [1, 2, 3, 'a']: indexed by integers; implemented as a dynamic array, and append()/pop() make it usable as a stack (last in, first out). Operations: s[i:j:k]; s + t concatenation; s * num repeats s num times; s.index(x, i, j) index of the first occurrence of x in s within the range i to j; x in s checks whether x is in s; s.count(x); del s[i:j], same as s[i:j] = []; s.clear(); s.copy() shallow copy; s.extend(); s.append(); s.insert(index, x); s.pop(index); s.remove(value); s.reverse()
- tuple (1, 2, 3, 'a'): similar to a list, but the tuple itself is immutable: items cannot be added, removed, or reassigned.
- set {1, 2, 3, 'a'}: objects in a set are unordered and unique, used for quick search. A set is implemented as a hash table, which supports lookup/insert/delete in O(1) average time.
- dictionary { 'key1' : 1, 'key2': 2, 'key3': 3 }: in older versions of Python a dictionary is unordered; since Python 3.7 (and in CPython 3.6 as an implementation detail) a dictionary remembers the key insertion order. Objects are key-value pairs; the keys are unique, must be hashable (typically immutable), and can have different types. The values can be mutable and can have different types. Operations: d[key]; key in d checks whether key exists; d.get(key[, default=None]) returns default if key doesn't exist; d.keys(), d.values(), d.items() return views of keys, values and (key, val) pairs; iter(d) iterates over keys; list(d) returns a list of all keys; del d[key]; d.pop(key[, default]); d.clear(); d.copy() shallow copy; d1 | d2 returns a merged dict (Python 3.9+)
Strings, lists, and tuples are sequence types that can use the +, *, +=, and *= operators.
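The container operations above can be tried out with a few lines. This is only a sketch on made-up data, and the `|` merge assumes Python 3.9 or newer.

```python
# Hypothetical sample data for illustration.
s = [10, 20, 30, 40, 50]
every_other = s[0:5:2]           # [10, 30, 50]
s.append(60)                     # push onto the end
last = s.pop()                   # 60; append/pop give stack behaviour
del s[1:3]                       # s is now [10, 40, 50]

t = (1, 2, 3)
t += (4,)                        # builds a new tuple; tuples are never changed in place

seen = {1, 2, 3}
has_two = 2 in seen              # True, hash lookup

d = {"a": 1, "b": 2}
missing = d.get("z", 0)          # 0, no KeyError
merged = d | {"b": 20, "c": 3}   # requires Python 3.9+; right operand wins on conflicts
keys_in_order = list(merged)     # ['a', 'b', 'c'], insertion order is kept
```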
Strings
- count(substring[, start[, end]]): return the number of non-overlapping occurrences of substring within the start and end range
- find(sub[, start[, end]]): return the lowest index of sub within the start and end range, or -1 if sub is not found; rfind returns the highest index.
- join(iterable): return a string that concatenates the strings in iterable, using str as the separator
- strip([chars]): with no argument, remove whitespace (including \n) from the beginning and end of the string; with chars, remove any combination of those characters from both ends; lstrip() and rstrip() strip only the left or right end.
- split(sep=None, maxsplit=-1): return a list of the words in str, split by sep; when sep is None, runs of consecutive whitespace are regarded as a single separator
- replace(old, new[, count]): return a copy of str with the first count occurrences of old replaced by new (all occurrences if count is omitted).
- maketrans(): build a mapping for translate() to substitute or remove a set of characters; it takes 1, 2 or 3 arguments.
- 1 argument: must be a dictionary; each key (a single character or an integer Unicode code point) is replaced by its value (a string, an integer code point, or None to delete)
- 2 arguments: both must be strings of the same length; each character in string 1 is replaced by the corresponding character in string 2
- 3 arguments: characters in the third string are deleted; strings 1 and 2 do the same replacement as above
- the mapping takes effect only when passed to string.translate(transmap)
- upper(), lower(), capitalize(), title(): return all upper case, all lower case, first letter capitalized, and first letter of each word capitalized.
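To see the string methods working together, here is a minimal sketch; the input `line` is invented for the example.

```python
line = "  alpha, beta,  gamma \n"
clean = line.strip()                             # 'alpha, beta,  gamma'
parts = [p.strip() for p in clean.split(",")]    # ['alpha', 'beta', 'gamma']
joined = "-".join(parts)                         # 'alpha-beta-gamma'
first_a = joined.find("a")                       # 0
last_a = joined.rfind("a")                       # 15
not_found = joined.find("z")                     # -1
once = joined.replace("a", "A", 1)               # 'Alpha-beta-gamma'

# Three-argument maketrans: a -> x, b -> y, and every '-' is deleted.
table = str.maketrans("ab", "xy", "-")
translated = joined.translate(table)             # 'xlphxyetxgxmmx'
```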
Class
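class ClassName:
    def __init__(self, p1, p2, p3): # constructor
        self.property1 = p1
        self.property2 = p2
        self.property3 = p3
    def func1(self, arg1): # methods take self as the first parameter
        pass
    def func2(self, arg1, arg2, *args): # *args collects any extra positional arguments
        pass
Miscellaneous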
- List, set and dictionary comprehension: squarelist = [x ** 2 for x in range(5) if x % 2 == 0]; squareset = {x ** 2 for x in range(5)}; squaredic = {x: x ** 2 for x in range(5)}. Comprehensions can be nested: [[expression for j in range(nj)] for i in range(ni)]
- Map: list(map(float, [1, 2, 3])) is the same as [float(i) for i in [1, 2, 3]].
- Lambda function: an anonymous inline function consisting of a single expression. lambda [parameters]: expression
- *a_list: unpack a list into separate positional arguments, e.g. func(*[1, 2]) is the same as func(1, 2)
- File I/O: open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
- read: data = [line.strip() for line in open('data.txt')], or use with open(file) as f:, which closes the file when the block exits.
- write to file: with open("data.txt", 'a', newline='\n') as f: f.write(data)
- pickle and JSON
 
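The items in this section can be combined in one short sketch. The temporary file path is created only for this demo, and the name `add` is just a placeholder.

```python
import os
import tempfile

squares_of_evens = [x ** 2 for x in range(5) if x % 2 == 0]   # [0, 4, 16]
grid = [[i * j for j in range(3)] for i in range(2)]          # [[0, 0, 0], [0, 1, 2]]
as_floats = list(map(float, [1, 2, 3]))                       # [1.0, 2.0, 3.0]
add = lambda a, b: a + b                                      # anonymous function bound to a name
unpacked_sum = add(*[1, 2])                                   # same as add(1, 2)

# Append to a temporary file, then read it back; `with` closes the file each time.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "a", newline="\n") as f:
    f.write("first\nsecond\n")
with open(path) as f:
    lines = [line.strip() for line in f]                      # ['first', 'second']
```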
Numpy & ndarray (N-dimensional array)
np.info(obj) for help
Init
- np.array(a_list, dtype=datatype)
- np.arange([start, ]stop, [step, ]dtype=None, *, like=None)
- np.linspace(start, stop, num=50, endpoint=True, dtype=None, axis=0)
- np.random.rand(d0, d1, ..., dn): create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1); np.random.random(size) does the same but takes the shape as a single tuple.
- np.zeros(shape): shape is an int or a tuple like (nrow, ncol)
- np.ones(shape)
- np.full(shape, fill_value, dtype=None, order='C', *, like=None)
- np.eye(n) or np.identity(n): n x n identity matrix
- np.empty(shape): uninitialized array; its contents are arbitrary until assigned
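A small sketch of the constructors above, assuming NumPy is installed and imported as `np`; the shapes and values are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.float64)
steps = np.arange(0, 10, 2)            # [0, 2, 4, 6, 8]; stop is excluded
points = np.linspace(0.0, 1.0, num=5)  # 5 evenly spaced points, endpoint included
zeros = np.zeros((2, 3))               # shape given as a tuple
sevens = np.full((2, 2), 7)            # every element set to 7
identity = np.eye(3)                   # 3 x 3 identity matrix
```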
Attributes
- arr.shape: shape as a tuple
- arr.ndim: number of dimensions (like 1d, 2d or 3d arrays)
- arr.size: number of elements
- arr.dtype and arr.dtype.name
Slicing and searching
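- arr[start:stop:step] applies to multiple dimensions: matrix[start:stop:step, start:stop:step]; : selects all
- arr[::-1] is the same as np.flip(arr, axis=0), which reverses the array along the first axis (np.flip with axis=None reverses every axis)
- arr[[0, 1, 3]]: return an array with the elements at index 0, 1 and 3.
- arr[boolean list]: return an array with the elements at True positions; arr[arr<1] selects all elements smaller than 1
- np.where(condition, [x, y, ]/): return elements chosen from x where condition is true and from y otherwise; with only condition, return the indices of the nonzero elements
- np.extract(condition, arr): return the elements of arr that satisfy condition
- np.ndindex(nrow, ncol): return an iterator over (row_index, col_index) tuples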
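The indexing forms above, sketched on a small made-up matrix `m` (assuming NumPy is imported as `np`):

```python
import numpy as np

m = np.arange(12).reshape(3, 4)        # rows: [0..3], [4..7], [8..11]
block = m[0:2, 1:3]                    # [[1, 2], [5, 6]]
reversed_rows = m[::-1]                # same as np.flip(m, axis=0)
picked = m[[0, 2]]                     # rows 0 and 2
small = m[m < 3]                       # boolean mask -> [0, 1, 2]
clipped = np.where(m > 9, -1, m)       # values above 9 replaced by -1
positions = list(np.ndindex(2, 2))     # [(0, 0), (0, 1), (1, 0), (1, 1)]
```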
Operations
- arr.reshape(shape): return the reshaped array (a view when possible) without changing arr
- arr.resize(shape): change arr in place; returns None
- arr.ravel(), arr.flatten() and arr.reshape((-1,)): flatten returns a copy, while ravel and reshape return a view of the original array whenever possible.
- np.concatenate((a1, a2, ...), axis=0) joins arrays along an existing axis; np.vstack((a1, a2)) or np.r_[a1, a2] stack row-wise, np.hstack or np.c_[a1, a2] stack column-wise, and np.dstack stacks depth-wise; for 2-D inputs these are the special cases of stacking along axis 0, 1 and 2
- np.vsplit(arr, obj), hsplit(): split the array vertically or horizontally. obj: number of equal sections or a list of split indices
- np.append(arr, values, axis=None): return a copy of the appended array
- np.insert(arr, obj, values, axis=None): obj is an index, slice or sequence of ints
- np.delete(arr, obj, axis=None): return a new array
- arr.copy(): return a copy of the array data (for object arrays the contained Python objects are not copied)
- arr.T or np.transpose(arr): return a view of the transposed array, not a copy.
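The view-versus-copy distinction is easiest to see by writing through the result. A minimal sketch, assuming NumPy is imported as `np`:

```python
import numpy as np

a = np.arange(6)
view = a.reshape(2, 3)             # view of a contiguous array
copy = a.flatten()                 # always a copy
view[0, 0] = 100                   # also changes a[0]
copy[1] = -1                       # a is unaffected

transposed = view.T                # still a view, shape (3, 2)
stacked_rows = np.vstack((a, a))   # shape (2, 6)
stacked_cols = np.hstack((a, a))   # shape (12,) for 1-D inputs
```

np.shares_memory(a, view) is one way to check whether two arrays overlap in memory.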
Math functions
- + - * /: element-wise array arithmetic
- np.mod(), exp(), log(), sqrt(), mean(), maximum(), minimum(), round(), floor(), ceil(), sin(), cos(), ...
- A @ B: matrix multiplication
- np.dot(), vdot(), cross(): dot product and cross product
- np.linalg.qr(), linalg.svd(): matrix decomposition
- np.linalg.eig(): return eigenvalues and right eigenvectors
Random Generator
rng = np.random.default_rng()
- rng.random(size=None): uniform random numbers from the half-open interval [0.0, 1.0); to customize the range: rng.uniform(low=0.0, high=1.0, size=None)
- rng.normal(loc=0.0, scale=1.0, size=None): normal (Gaussian) distribution, where loc is the mean and scale is the standard deviation; rng.standard_normal(size) is the special case with mean 0 and standard deviation 1
- rng.binomial(n, p, size=None); rng.poisson(lam=1.0, size=None)
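A short sketch of the Generator API, assuming NumPy is imported as `np`. The seed 42 is arbitrary and is there only to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=42)                # arbitrary seed for reproducibility
u = rng.random(size=3)                              # uniform on [0, 1)
v = rng.uniform(low=-1.0, high=1.0, size=(2, 2))    # uniform on [-1, 1)
g = rng.normal(loc=10.0, scale=2.0, size=1000)      # mean 10, standard deviation 2
b = rng.binomial(n=10, p=0.5, size=5)               # integers between 0 and 10
p = rng.poisson(lam=3.0, size=5)                    # non-negative integers

# A second generator with the same seed reproduces the same first draw.
u_again = np.random.default_rng(seed=42).random(size=3)
```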
Sorting and counting
- arr.sort(axis=-1, kind=None, order=None): sort the array in place; np.sort(arr) returns a sorted copy
- arr.argsort() or np.argsort(arr): return the indices that would sort the array; argmin(), argmax(), np.argwhere()
- np.unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None, *, equal_nan=True): return an array of the sorted unique elements; if any of the return_* flags is True, return a tuple of arrays (unique values, then index/inverse/counts)
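As an illustration of the sorting and counting calls, on a made-up array (assuming NumPy is imported as `np`):

```python
import numpy as np

arr = np.array([3, 1, 2, 3, 1])
order = np.argsort(arr, kind="stable")                 # [1, 4, 2, 0, 3]
sorted_copy = np.sort(arr)                             # arr itself is unchanged
values, counts = np.unique(arr, return_counts=True)    # ([1, 2, 3], [2, 1, 2])
first_max = arr.argmax()                               # 0, index of the first largest value
```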
Numpy dtypes and converting
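- np.int_ == np.int64 (code 'l') on most 64-bit platforms, and np.uint == np.uint64 ('L'), unsigned; other sizes: np.(u)int8/16/32/64
- np.float64 == np.double ('d'); the alias np.float_ was removed in NumPy 2.0; other sizes: np.float16/32/64, plus np.float128 on platforms that support it
- np.complex128 == np.cdouble ('D'); the alias np.complex_ was removed in NumPy 2.0. A complex number contains two 64-bit floating-point numbers.
- np.bool_ (code '?'); the alias np.bool8 was removed in NumPy 2.0
- np.str_ ('U'); the alias np.unicode_ was removed in NumPy 2.0
- np.datetime64 ('M') and np.timedelta64 ('m')
- np.object_ ('O')
- arr.astype(dtype, order='K', casting='unsafe', subok=True, copy=True): return the array cast to dtype
- arr.tolist(); arr.tobytes() (arr.tostring() is a deprecated alias)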
File I/O
- Input numpy.genfromtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=" !#$%&'()*+, -./:;<=>?@[\\]^{|}~", replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes', *, ndmin=0, like=None)
- Output numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None)
- arr.tofile(fid, sep='', format='%s'): fid is a filename or an open file object
- np.savez(file, *args, **kwds): save several arrays (*args: arr1, arr2, ...) into a single file in uncompressed .npz format; load them back with np.load(npzfile)
- np.save(file, arr, allow_pickle=True, fix_imports=True): save one array to a .npy file.
Pandas & DataFrame
Pandas DataFrame is essentially a spreadsheet / 2D array. Each column is a pandas Series.
Init
- pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None): data can be an ndarray, dict, iterable or DataFrame (copy defaults to False when data is a DataFrame).
- pd.read_csv(file) options:
- sep=',' (same as delimiter); header: int, list of int, or None; names: column names; index_col; usecols: columns to keep (list of names or indices); skiprows; skipfooter; nrows: number of rows to read; delim_whitespace=False (deprecated since pandas 2.2 in favor of sep='\s+'); and other options
- pd.read_excel(file), pd.read_table(filename), pd.read_sql(query, connection_object), pd.read_json(json_string), pd.read_html(url)
Attributes
- df.index: return the row labels as a pandas Index object (a RangeIndex by default); df.index.to_numpy() or to_list() converts it to an ndarray or list
- df.columns: return the column names as an Index object, same as df.keys()
- df.dtypes
- df.values: return an ndarray of the spreadsheet data
- df.shape: return a tuple of (nrow, ncol)
- df.size: return the number of elements/cells
- df.axes: return a list [row index, columns]
- df.ndim: number of dimensions (2 for a DataFrame)
- df.empty: return a bool indicating whether df is empty
Indexing and slicing
- df.at[row_label, col_label]: access a single cell by label; row labels are ints by default, col labels are often strings; df.iat[row_index, col_index] does the same by integer position
- df.loc[]: access a group of rows and columns by label(s) or a boolean array. Integers are interpreted as labels and never as index positions. A boolean array needs to be the same length as the axis being sliced.
- df.iloc[]: similar to loc[], but arguments are integer positions instead of labels. df.iloc[0] selects the first row, df.iloc[:, 0] the first column, df.iloc[0, 0] the first cell. Note that df.iloc[0] returns a Series and df.iloc[[0]] returns a DataFrame.
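The difference between labels and positions shows up as soon as the index is not 0, 1, 2. A sketch on a made-up frame, assuming pandas is imported as `pd`:

```python
import pandas as pd

# Hypothetical data; the index labels 10, 20, 30 are deliberately not positions.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, index=[10, 20, 30])

by_label = df.loc[20, "y"]        # 5.0; 20 is a label
by_position = df.iloc[1, 1]       # 5.0; second row, second column
cell = df.at[30, "x"]             # 3
row_series = df.iloc[0]           # Series
row_frame = df.iloc[[0]]          # DataFrame with one row
filtered = df.loc[df["x"] > 1]    # boolean mask with the same length as the index
```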
Data Inspection
- df.describe(percentiles=[.25,.5,.75]): for each numeric column, display count, mean, std, min, max and percentiles
- df.info(): show index, dtypes and memory usage
- df.head(n) and df.tail(n): return the first and last n rows.
- s.value_counts(dropna=False): unique values and their counts for a Series; df.value_counts() counts unique rows
- df.count(axis=0, numeric_only=False): count non-NA cells for each column or row (the level argument was removed in pandas 2.0); s.count(): non-NA elements in a Series
Operations
- df[new_col] = values: add a new column
- df.insert(loc, col_name, value, allow_duplicates=False): insert a column at position loc.
- df.pop(col_name): return the column and drop it from the frame. Raises KeyError if not found.
- df.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False): drop the specified labels from rows or columns; axis=0 drops rows by index label, axis=1 drops columns
- pd.concat(objs, axis=0, join='outer', ignore_index=False, sort=False, copy=True): concatenate pandas objects along a particular axis.
- df.pivot(index=None, columns=None, values=None): return a reshaped df organized by the given index / column values.
- pd.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True): create a spreadsheet-style pivot table as a df.
- df.groupby(by, axis=0, as_index=True, sort=True, dropna=True): group df by columns or a mapping function
- df.apply(func, axis=0, raw=False), e.g. df.apply(pd.Series.value_counts) returns the unique values and their counts for all columns
- df.copy(): deep copy by default (deep=True)
- df.dropna(): drop rows with missing values; df.fillna(val): replace missing values with val
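A few of the operations above on a tiny invented sales table; the column names are placeholders and pandas is assumed to be imported as `pd`.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10, 20, 30, 50],
})

df["double"] = df["sales"] * 2                           # add a new column
trimmed = df.drop(columns=["double"])                    # returns a new frame; df is unchanged
totals = df.groupby("city")["sales"].sum()               # A: 30, B: 80
wide = df.pivot(index="city", columns="year", values="sales")
means = pd.pivot_table(df, values="sales", index="city", aggfunc="mean")
both = pd.concat([trimmed, trimmed], ignore_index=True)  # 8 rows, index 0..7
```

pivot only reshapes and fails on duplicate index/column pairs, while pivot_table aggregates duplicates with aggfunc.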
Iterations
- for i in obj: produces values if obj is a Series, and column labels if obj is a df
- for index, row in df.iterrows(): each row is returned as a Series. Iterating through pandas objects is generally slow, and you should never modify the df while iterating, because depending on the data types, the iterator may return a copy and not a view.
- df.items(): acts like dict.items(), iterating through key-value (label-Series) pairs.
Math functions
- df.max(), min(), mean(), median(), std(), var(), corr(): std calculates the standard deviation, var the variance, and corr the pairwise correlation between columns
File I/O
- pandas has a set of reader and writer functions such as pd.read_csv() and df.to_csv(); see the IO tools section of the official pandas user guide.