Pandas and NumPy Syntax Cheat Sheet
Mar 30, 2024
Pandas
Basically think of all the things you can do with SQL: SELECT, INSERT, UPDATE/DELETE, GROUP BY. How do you perform these operations with pandas?
- Basic data loading via read_json; basic operations: shift, diff, pct_change, cumprod; pd.to_datetime to convert date strings to actual datetimes
- Time series operations: mean/std with rolling vs. ewm (exponentially weighted windows, e.g. exponential moving averages)
- Basic filters: AND (&), OR (|), negation (~), isnull and isna (aliases of each other)
- Filtering on index columns
- Sort by multiple fields, some ascending and others descending
- NaN handling: fillna and replace
- apply with lambda transformations
- Aggregation
- Add one row to a DataFrame
- Modify a field of a given row, avoiding: “A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead.”
- Column to numpy.ndarray and to simple list
- The Basic Loop: iterrows
- Pretty print your pandas: tabulate
OK, see this one short example!
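As a minimal sketch of the loading, time series, filtering, sorting, and NaN items above: the ticker, dates, prices, and column names below are made up for illustration (they are not real market data), and the JSON is read from an in-memory string rather than a file.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: daily closes for one made-up ticker.
raw = StringIO("""[
  {"date": "2024-01-02", "ticker": "ABC", "close": 100.0, "volume": 1200},
  {"date": "2024-01-03", "ticker": "ABC", "close": 102.0, "volume": 900},
  {"date": "2024-01-04", "ticker": "ABC", "close": 101.0, "volume": null},
  {"date": "2024-01-05", "ticker": "ABC", "close": 105.0, "volume": 1500},
  {"date": "2024-01-08", "ticker": "ABC", "close": 104.0, "volume": 1100}
]""")

# Load, then convert the date strings to real datetimes and index by them.
# convert_dates=False keeps the strings so pd.to_datetime does the conversion.
df = pd.read_json(raw, convert_dates=False)
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# Basic operations on a column.
df["prev_close"] = df["close"].shift(1)              # previous row's value
df["change"] = df["close"].diff()                    # close minus previous close
df["ret"] = df["close"].pct_change()                 # change / previous close
df["growth"] = (1 + df["ret"].fillna(0)).cumprod()   # cumulative growth factor

# Time series: simple rolling window vs. exponentially weighted window.
df["sma3"] = df["close"].rolling(window=3).mean()
df["std3"] = df["close"].rolling(window=3).std()
df["ema3"] = df["close"].ewm(span=3, adjust=False).mean()

# Filters: & is AND, | is OR, ~ is negation; wrap each condition in parentheses.
up_and_big = df[(df["change"] > 0) & (df["close"] > 101)]
first_or_last = df[(df["close"] == 100) | (df["close"] == 104)]
not_missing = df[~df["volume"].isna()]               # isnull() is an alias of isna()

# Filtering on the index (here a DatetimeIndex).
from_jan4 = df[df.index >= "2024-01-04"]
one_day = df.loc[pd.Timestamp("2024-01-03")]

# Sort by several fields with mixed directions.
ranked = df.sort_values(by=["ticker", "close"], ascending=[True, False])

# NaN handling: fill missing values, replace specific values.
df["volume"] = df["volume"].fillna(0)
df["ticker"] = df["ticker"].replace("ABC", "XYZ")
```

Note that the boolean operators are the bitwise `&`, `|`, and `~`, not Python's `and`, `or`, and `not`, which raise an error on a Series.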
Sample equity files? I will include one of the JSON files here for the sake of convenience.
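The remaining pandas items (apply, aggregation, adding and modifying rows, converting to NumPy, looping, and pretty printing) can be sketched the same way. Again the tickers and numbers are hypothetical, and since tabulate is a separate package, the sketch falls back to plain `to_string()` when it is not installed.

```python
import pandas as pd

# Hypothetical data: closes for two made-up tickers.
df = pd.DataFrame({
    "ticker": ["ABC", "ABC", "XYZ", "XYZ"],
    "close": [100.0, 102.0, 50.0, 49.0],
    "volume": [1200, 900, 3000, 2500],
})

# apply with a lambda: element-wise on a column, or row-wise with axis=1.
df["size"] = df["volume"].apply(lambda v: "high" if v >= 1000 else "low")
df["notional"] = df.apply(lambda row: row["close"] * row["volume"], axis=1)

# Aggregation, the GROUP BY equivalent.
summary = df.groupby("ticker").agg(
    avg_close=("close", "mean"),
    total_volume=("volume", "sum"),
)

# Add one row: build a one-row frame and concatenate.
new_row = pd.DataFrame([{"ticker": "XYZ", "close": 51.0, "volume": 2000,
                         "size": "high", "notional": 102000.0}])
df = pd.concat([df, new_row], ignore_index=True)

# Modify one field of one row. Writing through .loc in a single step avoids
# the "set on a copy of a slice" warning that chained indexing such as
# df[df["ticker"] == "ABC"]["close"] = ... can trigger.
df.loc[0, "close"] = 100.5
df.loc[df["volume"] < 1000, "size"] = "small"

# Column to numpy.ndarray and to a plain Python list.
close_array = df["close"].to_numpy()
close_list = df["close"].tolist()

# The basic loop: iterrows yields (index label, row as a Series).
# Convenient for small frames; vectorized operations are usually faster.
total = 0.0
for idx, row in df.iterrows():
    total += row["close"]

# Pretty print. tabulate is a separate package, so fall back to to_string()
# if it is not installed.
try:
    from tabulate import tabulate
    print(tabulate(df, headers="keys", tablefmt="psql"))
except ImportError:
    print(df.to_string())
```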
NumPy
- Array initialization: ones/zeros/arange/linspace/randn
- reshape
- min/max/argmin/argmax
- Slicing
- filtering
- Array Operations
- Sum across rows or columns
- statistics: min/max/mean/var/std
- Broadcasting
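Here is a short sketch that walks through the list above in order, using a small 3x4 integer array so the results are easy to check by hand. The variable names are arbitrary choices for this example.

```python
import numpy as np

# Array initialization.
ones = np.ones((2, 3))                   # 2x3 array of 1.0
zeros = np.zeros(4)                      # [0. 0. 0. 0.]
seq = np.arange(12)                      # integers 0..11
grid = np.linspace(0.0, 1.0, 5)          # 5 evenly spaced points, endpoints included
noise = np.random.randn(3, 2)            # standard-normal samples, shape (3, 2)

# reshape: same data, new shape.
m = seq.reshape(3, 4)                    # rows: [0..3], [4..7], [8..11]

# min/max, and the positions where they occur (in the flattened array).
lo, hi = m.min(), m.max()                # 0, 11
lo_pos, hi_pos = m.argmin(), m.argmax()  # 0, 11

# Slicing: [rows, columns].
first_row = m[0, :]
last_col = m[:, -1]
block = m[1:, :2]                        # rows 1..2, columns 0..1

# Filtering with a boolean mask.
evens = m[m % 2 == 0]
mid = m[(m > 2) & (m < 6)]               # [3 4 5]

# Element-wise array operations and a matrix product.
doubled = m * 2
squared = m ** 2
product = m @ m.T                        # (3, 4) @ (4, 3) -> (3, 3)

# Sums: axis=0 gives one total per column, axis=1 gives one total per row.
col_sums = m.sum(axis=0)                 # [12 15 18 21]
row_sums = m.sum(axis=1)                 # [ 6 22 38]

# Statistics over the whole array.
mean, var, std = m.mean(), m.var(), m.std()

# Broadcasting: a (4,) vector is stretched across every row of the (3, 4)
# array, and a (3, 1) column is stretched across every column.
shifted = m + np.array([10, 20, 30, 40])
centered = m - m.mean(axis=1, keepdims=True)
```

Broadcasting is what lets `centered` subtract each row's own mean without an explicit loop: `keepdims=True` keeps the means as a (3, 1) column so the shapes line up.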
Now, if we have Pandas, why do we need NumPy?
- Pandas is built on top of NumPy and adds a ton of features, from data loading and export (CSV/JSON) to SQL-style manipulation: SELECT/WHERE/GROUP BY/INSERT/UPDATE
- Many scientific libraries use numpy.ndarray
In the simplest Keras hello-world example, you can see that fit and predict take np.array as input data.
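To make the hand-off between the two libraries concrete, here is a minimal sketch of pulling arrays out of a DataFrame in the shape that array-based libraries typically expect. The toy data and column names are hypothetical, and no Keras model is built here; `X` and `y` are simply the kind of arrays one would pass on to a fit call.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data following y = 2 * x + 1.
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]})

# A column's values come out as a numpy.ndarray.
values = df["x"].to_numpy()
print(type(values))

# Features as a 2-D array of shape (samples, features), labels as 1-D.
X = df[["x"]].to_numpy(dtype=np.float32)
y = df["y"].to_numpy(dtype=np.float32)
print(X.shape, y.shape)

# Going back: wrap ndarrays in a DataFrame to regain column labels.
back = pd.DataFrame(np.column_stack([X[:, 0], y]), columns=["x", "y"])
```

Selecting with double brackets (`df[["x"]]`) keeps the result two-dimensional, which matters because many libraries expect a (samples, features) array even when there is only one feature.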