This is a rather old note. Now the better way to learn python is to find a problem you want to solve, and ask chatGPT about it.
Documentations:
- Python: https://docs.python.org/3/
- Numpy: https://numpy.org/doc/stable/
- Pandas: https://pandas.pydata.org/docs/
Python built-in
Functions and callables
len(obj)
return the number of itemstype(obj)
return the type of an objectdir(obj)
return list of name in the current scopesum(iterable, /, start=0)
return summation of iterablesmax/min(iterable, *[, key, default])
return maximum/minimum number of iterablespow(base, exp[, mod])
return base to the power exp, mod makes a simpler way to say pow(base, exp) % modrange(stop)
orrange(start, stop[, step])
range is an immutable sequence typesorted(iterable, /, *, key=None, reverse=False)
returns sorted iterables according to the key (function)//
floor div;%
mod;**
exp;abs(num)
absolute value
Containers
-
list
[1, 2, 3, ‘a’]Indexed by integers,items are stored as stack (first in, last out)
Operations:
s[i:j:k]
;s +t
concatenation;s * num
adding s to itself num times;s.index(x, i, j)
index of x first occurrence of x in s in range i to j;x in s
check if x is in s;s.count(x)
;del s[i:j]
same ass[i:j] = []
;s.clear()
;s.copy()
shallow copy;s.extend()
;s.append()
;s.insert(index, x)
,s.pop(index)
,s.remove(value)
;s.reverse()
filter -
tuple
(1, 2, 3, ‘a’)Similar to list, but objects are not mutable
-
set
{1, 2, 3, ‘a’}Objects in a set are unordered and unique, used for quick search. Set is implemented as a hash table, which supports lookup/insert/delete in O(1) time complexity.
-
dictionary
{ ‘key1’ : 1, ‘key2’: 2, ‘key3’: 3 }In older version of Python, dictionary is unordered, in python versions after 3.6, dictionary remembers the order of the key insertion Objects are key-value pairs, the keys are unique, immutable and can have different types. The values are mutable and can have different types.
Operations:
d[key]
;key in d
check if key exists;d.get(key[, default=None])
returns default if key doesn’t exist;d.keys(), d.values(), d.items()
return iterables of keys, values and (key, val) paris;iter(d)
ierate over keys;list(d)
return list of all keys;del d[key]
;d.pop(key[, default])
;d.clear()
;d.copy()
shallow copy;d1 | d2
return merged dict;
Strings, lists, and tuples are sequence types that can use the +
, *
, +=
, and *=
operators.
Strings
count(substring[, start[, end]])
return counts of substring within start and end rangefind(sub[, start[, end]])
return the lowest index of the substring within start and end range, return -1 if sub not found,rfind
return the highest index.join(iterable)
return str concatenated with strings in iterablestrip([chars])
when omitting argument, remove the white space (including \n) at the beginning and end of a string, when specifying chars, remove all combination of the chars,lstrip()
andrstrip()
only remove white space from left or right.split(sep=None, maxsplit=- 1)
return a list of words in str, split by sep, when sep is None, runs of consecutive whitespace are regarded as a single separatorreplace(old, new[, count])
return str after replacing old with new for count times.maketrans()
make a map for translate() to substitute or remove a set of characters, you can pass 0, 2 or 3 argument.- 1 argument: has to be dictionary, the keys (character or int ASCII number) will be replaced by value (character or int)
- 2 arguments: both need to be string and need to have the same length, each char in string1 will be replace by the corresponding char in str2
- 3 arguments: characters in the third string will be deleted, string 1&2 will do the same replacement
- the map has to be passed into string.translate(transmap) to function
upper(), lower(), capitalize(), title()
return all upper, all lower, first letter cap, and first letter cap for each word.
Class
Miscellaneous
-
List, set and dictionary comprehension
comprehension can be nested
[[expression for j in range(nj)] for i in range(ni)]
-
Map:
list(map(float,[1,2,3]))
same as[float(i) for i in [1,2,3]]
. -
Lambda function: an anonymous inline function consisting of a single expression.
lambda [parameters]: expression
-
*a_list
unpack a list into separate positional arguments. e.g.func(*[1, 2])
same asfunc(1, 2)
-
File I/O
open(file, mode='r', buffering=- 1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
- read
data = [line.strip() for line in open('data.txt')]
or usewith open(file) as f:
, which closes the file after. - write to file
with open("data.txt",'a',newline='\n') as f: f.write(data)
- pickle and JSON
- read
Numpy & ndarray (N-dimensional array)
np.info()
for help
Init
np.array(a_list, dtype=datatype)
np.arange([start, ]stop, [step, ]dtype=None, *, like=None)
np.linspace(start, stop, num=50, endpoint=True, dtype=None, axis=0)
np.random.random(d0, d1, ..., dn)
Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).np.zeros(shape)
shape is int or tuple like (nrow, ncol)np.ones(shape)
numpy.full(shape, fill_value, dtype=None, order='C', *, like=None)
np.eye(ndim)
ornp.identity(ndim)
n-dimension identity matrixnp.empty(shape)
Attributes
arr.shape
shape as tuplearr.ndim
n-dimension (like 1d, 2d or 3d arrays)arr.size
number of elementsarr.dtype
andarr.dtype.name
Slicing and searching
arr[start:stop:step]
applies to muliti dimenionsmatrix[start:stop:step, start:stop:step]
,:
selects allarr[::-1]
is the same asnp.flip(arr, axis=None)
, reverse the arrayarr[[0, 1, 3]]
return array with elements at index 0, 1 and 3.arr[boolean list]
return array with elements atTrue
positions.arr[arr<1]
selects all elements smaller than 1numpy.where(condition, [x, y, ]/)
numpy.extract(condition, arr)[source]
np.ndindex(nrow, ncol)
return zipped iterables row_index, col_index
Operations
arr.reshape(shape)
return without changing arrarr.resize(shape)
change arr directly, no returnarr.ravel()
,arr.flatten()
andarr.reshape((-1,))
, flatten returns a copy, reshape returns a view, ravel returns a view of the original array whenever possible.np.concatenate((a1, a2, ...), axis=0)
,np.vstack((a1, a2))
ornp.r_[a1, a2]
stack row wise,np.hstack
ornp.c_[a1, a2]
stack column wise,np.dstack
are special cases, stacking arrays along axis = 0, 1, 2np.vsplit(arr, obj), hsplit()
Split the array vertically or horizontally. obj : index or slicenp.append(arr, values, axis=None)
return a copy of appended arraynp.insert(arr, obj, values, axis=None)
obj : index, slice or sequence of intsnp.delete(arr, obj, axis=None)
return new arrayarr.copy()
deep copyarr.T
ornp.transpose(arr)
return a copy of the transposed array.
Math functions
+ - * /
array arithmeticnp.mod(), exp(), log(), sqrt(), mean(), maximum(), minimum(), round(), floor(), ceil(), sin(), cos(), ...,
full listA @ B
matrix multiplicationnp.dot(), vdot(), cross()
doc product and cross productnp.linalg.qr(), linalg.svd()
matrix decompositionnp.linalg.eig()
return eigenvalues and right eigenvectors
Random Generator
rng = np.random.default_rng()
rng.random(size=None)
uniform random number from half-open interval [0.0, 1.0), customize range:rng.uniform(low=0.0, high=1.0, size=None)
rng.normal(mean=0.0, std=1.0, size=None)
normal (Gaussian) distribution,rng.standard_normal(size)
is a special case where mean=0 and std = 1rng.binomial(n, p, size=None)
;random.Generator.poisson(lam=1.0, size=None)
Sorting and counting
arr.sort(axis=- 1, kind=None, order=None)
sort array in place.np.sort(arr)
return a copyarr.argsort()
ornp.argsort(arr)
return indices;argmin()
,argmax()
,np.argwhere()
numpy.unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None, *, equal_nan=True)
return array of sorted unique elements, if index or counts are true, return zipped items array, index/counts
Numpy dtypes and converting
np.int_
==np.int64
(code ‘l’) andnp.unit
==np.uint64
(‘L’) unsigned; Other sizenp.(u)int8/16/32/64
np.float_
==np.double
==np.float64
(‘d’); Other sizenp.float16/32/64/128
np.complex_``np.cdouble
==np.complex128
(‘D’) Complex number contain 2 64-bit-precision floating-point numbers.np.bool_
==np.bool8
(code ’?‘)np.str_
==np.unicode_
(‘U)np.datetime64
(‘M’) andtimedelta64
(‘m’)np.object_
(‘O’)arr.astype(dtype, order='K', casting='unsafe', subok=True, copy=True)
arr.tolist()
;arr.tostring()
File I/O
- Input
numpy.genfromtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=" !#$%&'()*+, -./:;<=>?@[\\]^{|}~", replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes', *, ndmin=0, like=None)
- Output
numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None)
arr.tofile(fid, sep='', format='%s')
fid: filename or an open file objectnp.savez(file, *args, **kwds)
*args: arr1, arr2, …; Save several arrays into a single file in uncompressed .npz format. access :np.load(npzfile)
numpy.save(file, arr, allow_pickle=True, fix_imports=True)
save one arr to a .npy file.
Pandas & DataFrame
Pandas DataFrame is essentially a spreadsheet / 2D array. Each column is a pandas Series.
Init
pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
data can be ndarray, dict, iterable or DataFrame (copy=False for df).pd.read_csv(file)
options:sep = ','
same as delimiter,header
int, list of int, None;names
column names;index_col
;usecols
specify cols (by list of name or index) to keep;skiprows
;skipfooter
;nrows
num of rows to read;delim_whitespace=False
and other optionspd.read_excel(file)
,pd.read_table(filename)
,pd.read_sql(query, connection_object)
,pd.read_json(json_string)
,pd.read_html(url)
Attributes
df.index
return row index names as padas RangeIndex object,df.index.to_numpy()
orto_list()
convert to ndarray or listdf.columns
return column name as Index objectdf.keys()
df.dtypes
df.values
return ndarray of the spread sheetdf.shape
return tuple of (nrow, ncol)df.size
return number of elements/cellsdf.axes
return a list [row index obj, columns]df.ndim
df.empty
return bool of whether df is empty
Indexing and slicing
df.at([row_label, col_label])
row labels are int by default, col labels are often strings;df.iat([row_index,col_index])
df.loc[]
access a group of rows and columns by label(s) or a boolean array. Int numbers will be interpreted as a label and never the index position. Bool array needs to be the same length as the axis being sliced.df.iloc[]
similar to loc[], but arguments are indices instead of labels.df.iloc[0]
slice first row,df.iloc[:, 0]
first column,df.iloc[0,0]
first cell. Note thatdf.iloc[0]
return series anddf.iloc[[0]]
return dataframe.
Data Inspection
df.describe(percentiles=[.25,.5,.75])
for each column, display counts, min, max, std and percentilesdf.info()
show index, dtype and memorydf.head(n)
anddf.tail(n)
return the first and last n rows.s.value_counts(dropna=Fase)
unique values and counts for seriesdf.value_count()
counts unique rowsdf.count(axis=0, level=None, numeric_only=False)
count non-NA cells for each col or row,s.count()
non-NA elements in series
Operations
df[new_col]=values
add a new columndf.insert(loc, col_name, value, allow_duplicates=False)
insert a column at index loc.df.pop(col_name)
return column and drop from frame. Raise KeyError if not found.df.drop(labels, axis=0, index, columns, level, inplace=False)
drop specified labels from rows or columns.axis=0
drop index,axis=1
drop columnspd.concat(objs, axis=0, join='outer', ignore_index=False, sort=False, copy=True)
concatenate pandas objects along a particular axis.df.pivot(index=None, columns=None, values=None)
return reshaped df organized by given index / column values.pd.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)
create a spreadsheet-style pivot table as df.df.groupby(by, axis=0, as_index=True, sort=True, dropna=True)
group df with columns or mapping functiondf.apply(func, axis=0, raw=False)
. e.g.df.apply(pd.Series.value_counts)
return unique value and counts for all columnsdf.copy()
deep copydf.dropna()
df.fillna(val)
Iterations
for i in obj
produce values if obj is Series, and column labels if obj is dffor index, Series in df.iterrows()
Iterating through pandas objects is generally slow, and you should never modify the df, because depending on the data types, the iterator may return a copy and not a view.df.items()
act likedict.items()
iterates through key-value (label-series) pairs.
Math functions
df.max(), min(), mean(), median(), std(), corr()
std calculates variance, corr calculates correlation
File I/O
- pandas has a set of
reader
andwriter
functions such aspd.read_csv()
anddf.to_csv()
, the official guide is here