Hi there 👋

Welcome to my blog! In this blog, I write short-read posts about coding, quantitative genetics, and miscellaneous things.

Migrated from Jekyll to Hugo

Jekyll and the Issues Around It

I started this blog using Jekyll. First, I had to learn about Jekyll and how to write and maintain a blog with it. There was definitely a learning curve, but with tons of resources online, it wasn’t a huge deal. The real issue was installing Jekyll and getting it to work. Jekyll is Ruby-based, with a bunch of dependencies like Ruby itself, RVM, Bundler, and gems. Mismatches between these dependencies can stop Jekyll from installing (for example, the OS might be missing some packages or can’t resolve a compatible set of dependencies) or even from working later (for example, after an OS update). ...

February 19, 2025 · Mohammad Ali Nilforooshan

Understanding SWAP and RAM disk

I was familiar with the concept of SWAP in memory management, but only recently came across the concept of a “RAM disk”. In this blog post, I briefly introduce both.

In computing, memory management is critical to ensuring smooth performance and efficiency, especially as applications become more demanding. Two strategies, SWAP and RAM disk, help manage memory resources, albeit in very different ways.

What is SWAP?

SWAP, also known as swap space or swap memory, is a dedicated portion of disk storage set aside to act as an extension of a system’s physical memory (RAM). When a computer’s RAM is nearly full and cannot accommodate all running programs and heavy applications, the operating system moves less active data from RAM to the SWAP space on the hard drive or SSD, relieving the pressure on RAM. This allows the system to continue running without immediate slowdowns. If that data is needed again, it is “swapped in”, returning to RAM, while another, less-needed piece of data may be swapped out. It should be noted that SWAP is considerably slower than RAM. ...
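To make the idea of swap usage concrete, here is a minimal Python sketch that computes how much swap is in use from text in the format of Linux's /proc/meminfo (which reports `SwapTotal` and `SwapFree`). The function name `swap_usage_kib` and the sample numbers are made up for illustration; they are not from the post.

```python
def swap_usage_kib(meminfo_text):
    """Return (total, used) swap in KiB from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # first token is the size
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    return total, total - free

# Hypothetical sample; on Linux, open("/proc/meminfo").read() could be used instead.
sample = """MemTotal:       16384000 kB
MemAvailable:    1024000 kB
SwapTotal:       4194304 kB
SwapFree:        3145728 kB
"""

total, used = swap_usage_kib(sample)
print(f"swap used: {used} of {total} KiB")
```

A steadily growing "used" value while available RAM stays low is one sign that the system is leaning on the slower swap area.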

November 11, 2024 · Mohammad Ali Nilforooshan

git cheat sheet

Clone a repository
    git clone <source>

Clone a specific branch of a repository
    git clone -b <branch> <source>

Create a local repository
    git init

See the changed files since the last commit
    git status

Add/remove all the current changes to the next commit
    git add .

Commit the changes
    git commit -m "message"

Give commit extra description
    git commit -m "message Description"

See the changes to the tracked files
    git diff

Compare a file between two branches ...

October 25, 2024 · Mohammad Ali Nilforooshan

git quick start

Create a new repository on the command line.
    echo "# repository" >> README.md
    git init
    git add README.md
    git commit -m "first commit"
    git branch -M main
    git remote add origin https://github.com/<USER>/<REPONAME>.git
    git push -u origin main

or push an existing repository from the command line.
    git remote add origin https://github.com/<USER>/<REPONAME>.git
    git branch -M main
    git push -u origin main

October 25, 2024 · Mohammad Ali Nilforooshan

Polars Tutorial

Adapted from https://www.youtube.com/playlist?list=PLINDUevGdb7U_ZRLCpqKWutmcY0vCUFOI

See Polars Tutorial2 for more.

Reading CSV Files

    import polars as pl
    # df = pl.read_csv('pl_data.csv')

Creating New Columns

    df = pl.DataFrame({'Name': ['Mario', 'Luigi', 'Wario', 'Mario', 'Mario'],
                       'Age': [30,28,26,30,30]})
    print(df)

    shape: (5, 2)
    ┌───────┬─────┐
    │ Name  ┆ Age │
    │ ---   ┆ --- │
    │ str   ┆ i64 │
    ╞═══════╪═════╡
    │ Mario ┆ 30  │
    │ Luigi ┆ 28  │
    │ Wario ┆ 26  │
    │ Mario ┆ 30  │
    │ Mario ┆ 30  │
    └───────┴─────┘

    score_values = pl.Series([95,99,94,90,96])
    df = df.with_columns(Score = pl.lit(score_values), Score_x2 = pl.lit(score_values*2))
    print(df)

    shape: (5, 4)
    ┌───────┬─────┬───────┬──────────┐
    │ Name  ┆ Age ┆ Score ┆ Score_x2 │
    │ ---   ┆ --- ┆ ---   ┆ ---      │
    │ str   ┆ i64 ┆ i64   ┆ i64      │
    ╞═══════╪═════╪═══════╪══════════╡
    │ Mario ┆ 30  ┆ 95    ┆ 190      │
    │ Luigi ┆ 28  ┆ 99    ┆ 198      │
    │ Wario ┆ 26  ┆ 94    ┆ 188      │
    │ Mario ┆ 30  ┆ 90    ┆ 180      │
    │ Mario ┆ 30  ┆ 96    ┆ 192      │
    └───────┴─────┴───────┴──────────┘

Dropping Columns

    print(df.drop('Score_x2'))
    print(df.drop('Score_x2', 'Score'))
    print(df.drop(['Score_x2', 'Score']))
    df.drop_in_place('Score_x2')
    print(df)

    shape: (5, 3)
    ┌───────┬─────┬───────┐
    │ Name  ┆ Age ┆ Score │
    │ ---   ┆ --- ┆ ---   │
    │ str   ┆ i64 ┆ i64   │
    ╞═══════╪═════╪═══════╡
    │ Mario ┆ 30  ┆ 95    │
    │ Luigi ┆ 28  ┆ 99    │
    │ Wario ┆ 26  ┆ 94    │
    │ Mario ┆ 30  ┆ 90    │
    │ Mario ┆ 30  ┆ 96    │
    └───────┴─────┴───────┘
    shape: (5, 2)
    ┌───────┬─────┐
    │ Name  ┆ Age │
    │ ---   ┆ --- │
    │ str   ┆ i64 │
    ╞═══════╪═════╡
    │ Mario ┆ 30  │
    │ Luigi ┆ 28  │
    │ Wario ┆ 26  │
    │ Mario ┆ 30  │
    │ Mario ┆ 30  │
    └───────┴─────┘
    shape: (5, 2)
    ┌───────┬─────┐
    │ Name  ┆ Age │
    │ ---   ┆ --- │
    │ str   ┆ i64 │
    ╞═══════╪═════╡
    │ Mario ┆ 30  │
    │ Luigi ┆ 28  │
    │ Wario ┆ 26  │
    │ Mario ┆ 30  │
    │ Mario ┆ 30  │
    └───────┴─────┘
    shape: (5, 3)
    ┌───────┬─────┬───────┐
    │ Name  ┆ Age ┆ Score │
    │ ---   ┆ --- ┆ ---   │
    │ str   ┆ i64 ┆ i64   │
    ╞═══════╪═════╪═══════╡
    │ Mario ┆ 30  ┆ 95    │
    │ Luigi ┆ 28  ┆ 99    │
    │ Wario ┆ 26  ┆ 94    │
    │ Mario ┆ 30  ┆ 90    │
    │ Mario ┆ 30  ┆ 96    │
    └───────┴─────┴───────┘

Drop Duplicate Rows

    df = pl.DataFrame({'Name': ['Mario', 'Luigi', 'Wario', 'Mario', 'Mario'],
                       'Age': [30,28,26,30,30]})
    print(df.unique())

    shape: (3, 2)
    ┌───────┬─────┐
    │ Name  ┆ Age │
    │ ---   ┆ --- │
    │ str   ┆ i64 │
    ╞═══════╪═════╡
    │ Mario ┆ 30  │
    │ Luigi ┆ 28  │
    │ Wario ┆ 26  │
    └───────┴─────┘

Iterating Polars Dataframes

    for row in df.rows():
        print(row)
        print(row[0])

    ('Mario', 30)
    Mario
    ('Luigi', 28)
    Luigi
    ('Wario', 26)
    Wario
    ('Mario', 30)
    Mario
    ('Mario', 30)
    Mario

Plot Polars Dataframe

    import matplotlib.pyplot as plt

    df = pl.DataFrame({'Date': ['1/1/2023','1/2/2023','1/3/2023','1/4/2023','1/5/2023',
                                '1/6/2023','1/7/2023','1/8/2023','1/9/2023','1/10/2023'],
                       'Price': [15,16,16,15,14,13,14,17,16,18]})
    dates = list(df['Date'])
    prices = list(df['Price'])
    plt.plot(dates, prices)
    plt.show()

...
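In the plotting example above, the Date column holds plain strings, so matplotlib treats them as category labels rather than points in time. A minimal sketch, assuming the strings are month/day/year (the post does not say which), converts them to `datetime.date` objects with the standard library. The helper name `parse_dates` is hypothetical.

```python
from datetime import datetime

def parse_dates(strings, fmt="%m/%d/%Y"):
    """Convert date strings to datetime.date objects (the format is an assumption)."""
    return [datetime.strptime(s, fmt).date() for s in strings]

dates = parse_dates(['1/1/2023', '1/2/2023', '1/10/2023'])
print(dates[0], dates[-1])
```

The resulting list could be passed to `plt.plot` in place of the list of strings.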

October 25, 2024 · Mohammad Ali Nilforooshan

Polars Tutorial2

Most of these commands are derived with the help of ChatGPT or stackoverflow. However, it is easier to look for a command here rather than searching for the answers to these questions over and over again. There might be a few Python examples here not directly related to Polars!

Read . as missing value

    import polars as pl
    # df = pl.read_csv("your_file.csv", null_values=".", has_header = True)

Replace . with missing value

    df = pl.DataFrame({
        "col1": [".", "AB456", "GK789", "."],
        "col2": [10, 20, 30, 40],
        "col3": [1, 2, 3, 4]
    })
    df = df.with_columns(pl.col(pl.String).replace({".": None}))
    print(df)

    shape: (4, 3)
    ┌───────┬──────┬──────┐
    │ col1  ┆ col2 ┆ col3 │
    │ ---   ┆ ---  ┆ ---  │
    │ str   ┆ i64  ┆ i64  │
    ╞═══════╪══════╪══════╡
    │ null  ┆ 10   ┆ 1    │
    │ AB456 ┆ 20   ┆ 2    │
    │ GK789 ┆ 30   ┆ 3    │
    │ null  ┆ 40   ┆ 4    │
    └───────┴──────┴──────┘

Get column names

    df = pl.DataFrame({
        "col1": ["GK123", "AB456", "GK789", "CD012"],
        "col2": [10, 20, 30, 40],
        "col3": [1, 2, 3, 4]
    })
    print(df.columns)
    # Get the last two columns
    print(df.columns[-2:])
    # Get the first two columns
    print(df.columns[:2])

    ['col1', 'col2', 'col3']
    ['col2', 'col3']
    ['col1', 'col2']

Change values in a column based on a condition in another column

    df = pl.DataFrame({
        "col1": ["GK123", "AB456", "GK789", "CD012"],
        "col2": [10, 20, 30, 40]
    })
    # Replace values in 'col2' based on the condition that 'col1' starts with 'GK'
    df = df.with_columns(
        pl.when(pl.col("col1").str.starts_with("GK"))
        .then(999)  # Replace with your desired value
        .otherwise(pl.col("col2"))  # Keep original value if condition is not met
        .alias("col2")
    )
    print(df)

    shape: (4, 2)
    ┌───────┬──────┐
    │ col1  ┆ col2 │
    │ ---   ┆ ---  │
    │ str   ┆ i64  │
    ╞═══════╪══════╡
    │ GK123 ┆ 999  │
    │ AB456 ┆ 20   │
    │ GK789 ┆ 999  │
    │ CD012 ┆ 40   │
    └───────┴──────┘

Change values in a column based on multiple conditions in another column

    df = pl.DataFrame({
        "col1": ["GK123", "AB456", "GK789", "CD012"],
        "col2": [10, 25, 30, 40]
    })
    # Replace values in 'col2' based on multiple conditions
    df = df.with_columns(
        pl.when((pl.col("col1").str.starts_with("GK")) & (pl.col("col2") > 20))
        .then(999)  # Replace with your desired value
        .otherwise(pl.col("col2"))  # Keep original value if conditions are not met
        .alias("col2")
    )
    print(df)

    df = pl.DataFrame({
        "col1": ["GK123", "AB456", "GK789", "CD012"],
        "col2": [10, 25, 30, 40]
    })
    df = df.with_columns(
        pl.when(pl.col("col2") < 11).then(20)
        .when(pl.col("col2") > 33).then(pl.col("col2") * 2)
        .otherwise(pl.col("col2"))
        .alias("col2")
    )
    print(df)

    shape: (4, 2)
    ┌───────┬──────┐
    │ col1  ┆ col2 │
    │ ---   ┆ ---  │
    │ str   ┆ i64  │
    ╞═══════╪══════╡
    │ GK123 ┆ 10   │
    │ AB456 ┆ 25   │
    │ GK789 ┆ 999  │
    │ CD012 ┆ 40   │
    └───────┴──────┘
    shape: (4, 2)
    ┌───────┬──────┐
    │ col1  ┆ col2 │
    │ ---   ┆ ---  │
    │ str   ┆ i64  │
    ╞═══════╪══════╡
    │ GK123 ┆ 20   │
    │ AB456 ┆ 25   │
    │ GK789 ┆ 30   │
    │ CD012 ┆ 80   │
    └───────┴──────┘

Extract a column into a vector

    col1_vector = df["col1"].to_list()
    print(col1_vector)

    ['GK123', 'AB456', 'GK789', 'CD012']

Extract values meeting a condition from a column

    df = pl.DataFrame({
        "col1": [1, 2, 3],
        "col2": [4, 5, 6]
    })
    # Extract values from 'col1' that are greater than 1
    filtered_values = df.filter(pl.col("col1") > 1)["col1"].to_list()
    print(filtered_values)

    [2, 3]

Get the dimension of the DataFrame

    df.shape

    (3, 2)

Paste a text string to a range of numbers

    # Generate the sequence from 1001 to 1010 and concatenate "DK" with each value
    result = [f"DK{1000 + i}" for i in range(1, 11)]
    print(result)
    result = ["DK" + str(i) for i in range(1001, 1011)]
    print(result)

    ['DK1001', 'DK1002', 'DK1003', 'DK1004', 'DK1005', 'DK1006', 'DK1007', 'DK1008', 'DK1009', 'DK1010']
    ['DK1001', 'DK1002', 'DK1003', 'DK1004', 'DK1005', 'DK1006', 'DK1007', 'DK1008', 'DK1009', 'DK1010']

Remove Initial characters in a string list

    string_list = ["AB123", "CD456", "EF789", "GH012"]
    # Remove the first two characters from each string
    modified_list = [s[2:] for s in string_list]
    print(modified_list)

    ['123', '456', '789', '012']

Copy a column into another when a condition is met

    df = pl.DataFrame({
        "col1": ["GK123", "AB456", "GK789", "CD012"],
        "col2": [10, 20, 30, 40]
    })
    # Copy 'col2' to 'col1' where 'col1' starts with 'GK'
    df = df.with_columns(
        pl.when(pl.col("col1").str.starts_with("GK"))
        .then(pl.col("col2"))  # Copy 'col2' value to 'col1'
        .otherwise(pl.col("col1"))  # Keep original 'col1' value if condition is not met
        .alias("col1")
    )
    print(df)

    shape: (4, 2)
    ┌───────┬──────┐
    │ col1  ┆ col2 │
    │ ---   ┆ ---  │
    │ str   ┆ i64  │
    ╞═══════╪══════╡
    │ 10    ┆ 10   │
    │ AB456 ┆ 20   │
    │ 30    ┆ 30   │
    │ CD012 ┆ 40   │
    └───────┴──────┘

Copy a list into a DataFrame column when a condition is met

    df = pl.DataFrame({
        "col1": ["GK123", "AB456", "GK789", "CD012"],
        "col2": [10, 20, 30, 40]
    })
    new_values = ["XY987", "ZW654", "XY987", "ZW654"]
    # Copy 'new_values' to 'col1' where 'col1' starts with 'GK'
    df = df.with_columns(
        pl.when(pl.col("col1").str.starts_with("GK"))
        .then(pl.Series(new_values))  # Assign new values with the same shape
        .otherwise(pl.col("col1"))
        .alias("col1")
    )
    print(df)

    shape: (4, 2)
    ┌───────┬──────┐
    │ col1  ┆ col2 │
    │ ---   ┆ ---  │
    │ str   ┆ i64  │
    ╞═══════╪══════╡
    │ XY987 ┆ 10   │
    │ AB456 ┆ 20   │
    │ XY987 ┆ 30   │
    │ CD012 ┆ 40   │
    └───────┴──────┘

Copy a list into a column

    df = df.with_columns(pl.Series("col1", new_values))
    print(df)

    df = pl.DataFrame({
        "col1": ["GK123", "AB456", "GK789", "CD012"],
        "col2": [10, 20, 30, 40]
    })
    df = df.with_columns(col1 = pl.lit(pl.Series(new_values)))
    print(df)

    shape: (4, 2)
    ┌───────┬──────┐
    │ col1  ┆ col2 │
    │ ---   ┆ ---  │
    │ str   ┆ i64  │
    ╞═══════╪══════╡
    │ XY987 ┆ 10   │
    │ ZW654 ┆ 20   │
    │ XY987 ┆ 30   │
    │ ZW654 ┆ 40   │
    └───────┴──────┘
    shape: (4, 2)
    ┌───────┬──────┐
    │ col1  ┆ col2 │
    │ ---   ┆ ---  │
    │ str   ┆ i64  │
    ╞═══════╪══════╡
    │ XY987 ┆ 10   │
    │ ZW654 ┆ 20   │
    │ XY987 ┆ 30   │
    │ ZW654 ┆ 40   │
    └───────┴──────┘

Copy row-sums (rowSums) of columns “a” and “b” into column “a”

    df = pl.DataFrame({
        "a": [1, 2, 3],
        "b": [4, 5, 6]
    })
    # Compute the row-wise sum of columns "a" and "b" and store it in column "a"
    df = df.with_columns(
        (pl.col("a") + pl.col("b")).alias("a")
    )
    print(df)

    shape: (3, 2)
    ┌─────┬─────┐
    │ a   ┆ b   │
    │ --- ┆ --- │
    │ i64 ┆ i64 │
    ╞═════╪═════╡
    │ 5   ┆ 4   │
    │ 7   ┆ 5   │
    │ 9   ┆ 6   │
    └─────┴─────┘

Drop columns with the column name ending with a pattern

    df = pl.DataFrame({
        "col1": [1, 2, 3],
        "col2_right": [4, 5, 6],
        "col3": [7, 8, 9],
        "col4_right": [10, 11, 12]
    })
    # Drop columns that end with "_right"
    df = df.select([col for col in df.columns if not col.endswith("_right")])
    print(df)

    shape: (3, 2)
    ┌──────┬──────┐
    │ col1 ┆ col3 │
    │ ---  ┆ ---  │
    │ i64  ┆ i64  │
    ╞══════╪══════╡
    │ 1    ┆ 7    │
    │ 2    ┆ 8    │
    │ 3    ┆ 9    │
    └──────┴──────┘

Rename columns with the column name ending with a pattern

    df = pl.DataFrame({
        "col1": [1, 2, 3],
        "col2_right": [4, 5, 6],
        "col3": [7, 8, 9],
        "col4_right": [10, 11, 12]
    })
    # Rename columns ending with "_right" to "d*" (d1, d2, ...)
    new_names = {}
    counter = 1
    for col in df.columns:
        if col.endswith("_right"):
            new_names[col] = f"d{counter}"
            counter += 1
    # Apply the renaming
    df = df.rename(new_names)
    print(df)

    df = pl.DataFrame({
        "col1": [1, 2, 3],
        "col2_right": [4, 5, 6],
        "col3": [7, 8, 9],
        "col4_right": [10, 11, 12]
    })
    # Create a dictionary to hold the renaming mappings
    new_names = {}
    # Iterate through the columns and rename those ending with "_right"
    for col in df.columns:
        if col.endswith("_right"):
            # Extract the part before "_right" and prepend "d"
            base_name = col.split("_right")[0]
            new_names[col] = f"d{base_name}"
    # Apply the renaming
    df = df.rename(new_names)
    print(df)

    shape: (3, 4)
    ┌──────┬─────┬──────┬─────┐
    │ col1 ┆ d1  ┆ col3 ┆ d2  │
    │ ---  ┆ --- ┆ ---  ┆ --- │
    │ i64  ┆ i64 ┆ i64  ┆ i64 │
    ╞══════╪═════╪══════╪═════╡
    │ 1    ┆ 4   ┆ 7    ┆ 10  │
    │ 2    ┆ 5   ┆ 8    ┆ 11  │
    │ 3    ┆ 6   ┆ 9    ┆ 12  │
    └──────┴─────┴──────┴─────┘
    shape: (3, 4)
    ┌──────┬───────┬──────┬───────┐
    │ col1 ┆ dcol2 ┆ col3 ┆ dcol4 │
    │ ---  ┆ ---   ┆ ---  ┆ ---   │
    │ i64  ┆ i64   ┆ i64  ┆ i64   │
    ╞══════╪═══════╪══════╪═══════╡
    │ 1    ┆ 4     ┆ 7    ┆ 10    │
    │ 2    ┆ 5     ┆ 8    ┆ 11    │
    │ 3    ┆ 6     ┆ 9    ┆ 12    │
    └──────┴───────┴──────┴───────┘

Rename columns based on column index

    df = pl.DataFrame({
        "original1": ["123-456", "789-012", "345-678"],
        "original2": [1, 2, 3],
        "original3": [10, 20, 30]
    })
    # Get the first two column names
    old_names = df.columns[:2]
    # New names for the first two columns
    new_names = ["col1", "col2"]
    # Rename the first two columns
    df = df.rename({old_names[0]: new_names[0], old_names[1]: new_names[1]})
    print(df)

    shape: (3, 3)
    ┌─────────┬──────┬───────────┐
    │ col1    ┆ col2 ┆ original3 │
    │ ---     ┆ ---  ┆ ---       │
    │ str     ┆ i64  ┆ i64       │
    ╞═════════╪══════╪═══════════╡
    │ 123-456 ┆ 1    ┆ 10        │
    │ 789-012 ┆ 2    ┆ 20        │
    │ 345-678 ┆ 3    ┆ 30        │
    └─────────┴──────┴───────────┘

Rename all columns with a list

    df = pl.DataFrame({
        "old_name1": [1, 2, 3],
        "old_name2": [4, 5, 6],
        "old_name3": [7, 8, 9]
    })
    # List of new column names
    new_column_names = ["new_name1", "new_name2", "new_name3"]
    # Rename all columns
    df = df.rename({old: new for old, new in zip(df.columns, new_column_names)})
    print(df)

    shape: (3, 3)
    ┌───────────┬───────────┬───────────┐
    │ new_name1 ┆ new_name2 ┆ new_name3 │
    │ ---       ┆ ---       ┆ ---       │
    │ i64       ┆ i64       ┆ i64       │
    ╞═══════════╪═══════════╪═══════════╡
    │ 1         ┆ 4         ┆ 7         │
    │ 2         ┆ 5         ┆ 8         │
    │ 3         ┆ 6         ┆ 9         │
    └───────────┴───────────┴───────────┘

The difference between .str.replace and .str.replace_all

    df = pl.DataFrame({
        "col1": ["123-456-34", "789-012-78", "345-678-02"],
        "col2": [1, 2, 3]
    })
    # Replace "-" with "" in the "col1" column
    df = df.with_columns(
        pl.col("col1").str.replace("-", "").alias("col1")
    )
    print(df)

    df = pl.DataFrame({
        "col1": ["123-456", "789-012", "345-678"],
        "col2": [1, 2, 3]
    })
    # Replace "-" with "" in the "col1" column
    df = df.with_columns(
        pl.col("col1").str.replace_all("-", "").alias("col1")
    )
    print(df)

    shape: (3, 2)
    ┌───────────┬──────┐
    │ col1      ┆ col2 │
    │ ---       ┆ ---  │
    │ str       ┆ i64  │
    ╞═══════════╪══════╡
    │ 123456-34 ┆ 1    │
    │ 789012-78 ┆ 2    │
    │ 345678-02 ┆ 3    │
    └───────────┴──────┘
    shape: (3, 2)
    ┌────────┬──────┐
    │ col1   ┆ col2 │
    │ ---    ┆ ---  │
    │ str    ┆ i64  │
    ╞════════╪══════╡
    │ 123456 ┆ 1    │
    │ 789012 ┆ 2    │
    │ 345678 ┆ 3    │
    └────────┴──────┘

Read specific columns from a CSV file

    # df = pl.read_csv("data.csv", columns=[0, 2])

Replace null with 0 in the last two columns

    df = pl.DataFrame({
        "col1": [1, 2, 3],
        "col2": [None, 5, None],
        "col3": [7, None, 9]
    })
    # Replace null with 0 in the last two columns
    df = df.with_columns(
        [pl.col(df.columns[-2:]).fill_null(0)]
    )
    print(df)

    shape: (3, 3)
    ┌──────┬──────┬──────┐
    │ col1 ┆ col2 ┆ col3 │
    │ ---  ┆ ---  ┆ ---  │
    │ i64  ┆ i64  ┆ i64  │
    ╞══════╪══════╪══════╡
    │ 1    ┆ 0    ┆ 7    │
    │ 2    ┆ 5    ┆ 0    │
    │ 3    ┆ 0    ┆ 9    │
    └──────┴──────┴──────┘

Drop the first two characters from all elements in a DataFrame

    df = pl.DataFrame({
        "col1": ["ab123", "ab456", "ab789"],
        "col2": ["cd001", "cd002", "cd003"]
    })
    # Drop the first two characters from all elements in the DataFrame
    df = df.with_columns(
        [pl.col(c).str.slice(2) for c in df.columns]
    )
    print(df)

    # Now, do the same, but turn the resulting elements into integers.
    df = pl.DataFrame({
        "col1": ["ab123", "ab456", "ab789"],
        "col2": ["cd001", "cd002", "cd003"]
    })
    # Drop the first two characters and convert the remaining strings to integers
    df = df.with_columns(
        [pl.col(c).str.slice(2).cast(pl.Int64) for c in df.columns]
    )
    print(df)

    shape: (3, 2)
    ┌──────┬──────┐
    │ col1 ┆ col2 │
    │ ---  ┆ ---  │
    │ str  ┆ str  │
    ╞══════╪══════╡
    │ 123  ┆ 001  │
    │ 456  ┆ 002  │
    │ 789  ┆ 003  │
    └──────┴──────┘
    shape: (3, 2)
    ┌──────┬──────┐
    │ col1 ┆ col2 │
    │ ---  ┆ ---  │
    │ i64  ┆ i64  │
    ╞══════╪══════╡
    │ 123  ┆ 1    │
    │ 456  ┆ 2    │
    │ 789  ┆ 3    │
    └──────┴──────┘

Drop a column by its index number

    df = pl.DataFrame({
        "col1": [1, 2, 3],
        "col2": [4, 5, 6],
        "col3": [7, 8, 9]
    })
    # Drop the first column by its index (0)
    df = df.drop(df.columns[0])
    print(df)

    shape: (3, 2)
    ┌──────┬──────┐
    │ col2 ┆ col3 │
    │ ---  ┆ ---  │
    │ i64  ┆ i64  │
    ╞══════╪══════╡
    │ 4    ┆ 7    │
    │ 5    ┆ 8    │
    │ 6    ┆ 9    │
    └──────┴──────┘

Reorder columns

    df.select(['col3', 'col2'])

    shape: (3, 2)
    ┌──────┬──────┐
    │ col3 ┆ col2 │
    │ ---  ┆ ---  │
    │ i64  ┆ i64  │
    ╞══════╪══════╡
    │ 7    ┆ 4    │
    │ 8    ┆ 5    │
    │ 9    ┆ 6    │
    └──────┴──────┘

    df.select([pl.col('col3'), pl.col('col2')])

    shape: (3, 2)
    ┌──────┬──────┐
    │ col3 ┆ col2 │
    │ ---  ┆ ---  │
    │ i64  ┆ i64  │
    ╞══════╪══════╡
    │ 7    ┆ 4    │
    │ 8    ┆ 5    │
    │ 9    ┆ 6    │
    └──────┴──────┘

Reorder DataFrame’s rows by matching a column with a list

    df = pl.DataFrame({
        "col1": ["b", "c", "a", "e", "d"],
        "col2": [1, 2, 3, 4, 5]
    })
    # The desired order for "col1"
    desired_order = ["a", "b", "c", "d", "e"]
    # Reorder the DataFrame by matching "col1" with the desired_order list
    df_reordered = df.with_columns(
        pl.col("col1").map_elements(lambda x: desired_order.index(x), return_dtype=pl.Int64).alias("sort_key")
    ).sort("sort_key").drop("sort_key")
    print(df_reordered)
    # or
    df_reordered = df.join(pl.DataFrame({"col1": desired_order}), on = "col1")
    print(df_reordered)

    shape: (5, 2)
    ┌──────┬──────┐
    │ col1 ┆ col2 │
    │ ---  ┆ ---  │
    │ str  ┆ i64  │
    ╞══════╪══════╡
    │ a    ┆ 3    │
    │ b    ┆ 1    │
    │ c    ┆ 2    │
    │ d    ┆ 5    │
    │ e    ┆ 4    │
    └──────┴──────┘
    shape: (5, 2)
    ┌──────┬──────┐
    │ col1 ┆ col2 │
    │ ---  ┆ ---  │
    │ str  ┆ i64  │
    ╞══════╪══════╡
    │ a    ┆ 3    │
    │ b    ┆ 1    │
    │ c    ┆ 2    │
    │ d    ┆ 5    │
    │ e    ┆ 4    │
    └──────┴──────┘

Filter the DataFrame for elements containing a pattern in a column

    df = pl.DataFrame({
        "col1": ["abc123", "def456", "ghi789", "abcxyz"],
        "col2": [1, 2, 3, 4]
    })
    # Filter rows where "col1" contains the pattern "abc"
    filtered_df = df.filter(pl.col("col1").str.contains("abc"))
    print(filtered_df)

    shape: (2, 2)
    ┌────────┬──────┐
    │ col1   ┆ col2 │
    │ ---    ┆ ---  │
    │ str    ┆ i64  │
    ╞════════╪══════╡
    │ abc123 ┆ 1    │
    │ abcxyz ┆ 4    │
    └────────┴──────┘

Compute the dot/inner product (crossproduct) between two Series

    s = pl.Series("a", [1, 2, 3])
    s2 = pl.Series("b", [4.0, 5.0, 6.0])
    s.dot(s2)

    32.0

Extract rows of a DataFrame in a list of tuples

    print(df.rows())

    [('abc123', 1), ('def456', 2), ('ghi789', 3), ('abcxyz', 4)]

Get rowSums of a DataFrame

    df = pl.DataFrame({
        "col1": [1, 2, 3],
        "col2": [4, 5, 6],
        "col3": [7, 8, 9]
    })
    df = df.with_columns(df.select(pl.sum_horizontal("*").alias("row_sum")))
    df.with_columns(df.select(pl.sum_horizontal(["col1","col2"]).alias("row_sum")))
    print(df)

    shape: (3, 4)
    ┌──────┬──────┬──────┬─────────┐
    │ col1 ┆ col2 ┆ col3 ┆ row_sum │
    │ ---  ┆ ---  ┆ ---  ┆ ---     │
    │ i64  ┆ i64  ┆ i64  ┆ i64     │
    ╞══════╪══════╪══════╪═════════╡
    │ 1    ┆ 4    ┆ 7    ┆ 12      │
    │ 2    ┆ 5    ┆ 8    ┆ 15      │
    │ 3    ┆ 6    ┆ 9    ┆ 18      │
    └──────┴──────┴──────┴─────────┘

head and tail

    print(df.head())
    print(df.head(2))
    print(df.tail())
    print(df.tail(2))

    shape: (3, 4)
    ┌──────┬──────┬──────┬─────────┐
    │ col1 ┆ col2 ┆ col3 ┆ row_sum │
    │ ---  ┆ ---  ┆ ---  ┆ ---     │
    │ i64  ┆ i64  ┆ i64  ┆ i64     │
    ╞══════╪══════╪══════╪═════════╡
    │ 1    ┆ 4    ┆ 7    ┆ 12      │
    │ 2    ┆ 5    ┆ 8    ┆ 15      │
    │ 3    ┆ 6    ┆ 9    ┆ 18      │
    └──────┴──────┴──────┴─────────┘
    shape: (2, 4)
    ┌──────┬──────┬──────┬─────────┐
    │ col1 ┆ col2 ┆ col3 ┆ row_sum │
    │ ---  ┆ ---  ┆ ---  ┆ ---     │
    │ i64  ┆ i64  ┆ i64  ┆ i64     │
    ╞══════╪══════╪══════╪═════════╡
    │ 1    ┆ 4    ┆ 7    ┆ 12      │
    │ 2    ┆ 5    ┆ 8    ┆ 15      │
    └──────┴──────┴──────┴─────────┘
    shape: (3, 4)
    ┌──────┬──────┬──────┬─────────┐
    │ col1 ┆ col2 ┆ col3 ┆ row_sum │
    │ ---  ┆ ---  ┆ ---  ┆ ---     │
    │ i64  ┆ i64  ┆ i64  ┆ i64     │
    ╞══════╪══════╪══════╪═════════╡
    │ 1    ┆ 4    ┆ 7    ┆ 12      │
    │ 2    ┆ 5    ┆ 8    ┆ 15      │
    │ 3    ┆ 6    ┆ 9    ┆ 18      │
    └──────┴──────┴──────┴─────────┘
    shape: (2, 4)
    ┌──────┬──────┬──────┬─────────┐
    │ col1 ┆ col2 ┆ col3 ┆ row_sum │
    │ ---  ┆ ---  ┆ ---  ┆ ---     │
    │ i64  ┆ i64  ┆ i64  ┆ i64     │
    ╞══════╪══════╪══════╪═════════╡
    │ 2    ┆ 5    ┆ 8    ┆ 15      │
    │ 3    ┆ 6    ┆ 9    ┆ 18      │
    └──────┴──────┴──────┴─────────┘

Matrix-vector multiplication

    import numpy as np

    a = pl.DataFrame({
        "col1": [5,1,3,2],
        "col2": [1,1,1,3],
        "col3": [1,2,1,4]
    })
    b = [1, 2, 3]
    print(a.to_numpy().dot(b))

    [10 9 8 20]
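As a cross-check on the matrix-vector example, the same product can be written out in plain Python: each output element is the sum of a row's entries multiplied by the matching entries of b. This is only an illustrative sketch (the helper name `matvec` is made up); the NumPy call in the post is the practical way to do it.

```python
def matvec(rows, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, vec)) for row in rows]

# Same data as the DataFrame `a`, given column by column
cols = {"col1": [5, 1, 3, 2], "col2": [1, 1, 1, 3], "col3": [1, 2, 1, 4]}
rows = list(zip(*cols.values()))  # transpose columns into rows
b = [1, 2, 3]

print(matvec(rows, b))  # [10, 9, 8, 20]
```

For instance, the first element is 5*1 + 1*2 + 1*3 = 10, matching the first value printed by NumPy.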

October 25, 2024 · Mohammad Ali Nilforooshan

RMarkdown cheat sheet

A basic YAML header to start with. Choose the desired output format.

    ---
    title: "RMarkdown Example"
    author: "Mohammad Ali Nilforooshan"
    date: "6 August 2017"
    output: html_document
    # output: pdf_document
    # output: word_document
    ---

    ```{r setup, include=FALSE}
    knitr::opts_chunk$set(echo = TRUE)
    ```

horizontal rule
    *** or ---

Formatting

Manual line break: End the line with two or more spaces.
italic *italic* and italic _italic_
bold **bold** and bold __bold__
superscript2 superscript^2^
subscript2 subscript~2~
...

October 25, 2024 · Mohammad Ali Nilforooshan

Set git configuration variables

    git config --global user.name "username"
    git config --global user.email "user@email.com"
    git config --global user.password "TOKEN"
    git config --global core.editor "vim"

To see the Git configuration, do:
    git config --list

Use the credential helper so that Git does not ask for the username and password (token) every time.
    git config --global credential.helper cache

To unset the above (e.g., when using a new token), do:
    git config --global --unset credential.helper

October 25, 2024 · Mohammad Ali Nilforooshan

What does heritability do in BLUP?

Heritability ($\text h^2$) has biological and statistical definitions. Statistically, heritability is the proportion of the phenotypic variation explained by the genetic variation in the population. Let’s start with an example! I took example 3.1 from Mrode (2005). This example is about estimating fixed effects (sex) solutions and predicting breeding values of animals for pre-weaning gain (WWG) of beef calves. The data is presented in the following table:

    Calf  Sex     Sire  Dam      WWG (kg)
    4     Male    1     Unknown  4.5
    5     Female  3     2        2.9
    6     Female  1     2        3.9
    7     Male    4     5        3.5
    8     Male    3     6        5.0

In an Animal Model BLUP, the inverse of the relationship matrix ($\mathbf A^{-1}$) is multiplied by $\lambda = (1 - \text h^2)/\text h^2$ (equal to $(4 - \text h^2)/\text h^2$ in a Sire Model BLUP). Considering $\text h^2$ = 1/3 (λ = 2), the following solutions are obtained from BLUP: ...
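The variance ratio λ quoted above can be checked with a few lines of Python. This is a minimal sketch of the two formulas as stated in the text; the function names `lambda_animal` and `lambda_sire` are made up for illustration, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def lambda_animal(h2):
    """Variance ratio for an Animal Model BLUP: (1 - h2) / h2."""
    return (1 - h2) / h2

def lambda_sire(h2):
    """Variance ratio for a Sire Model BLUP: (4 - h2) / h2."""
    return (4 - h2) / h2

h2 = Fraction(1, 3)  # heritability used in the example
print(lambda_animal(h2))  # 2
print(lambda_sire(h2))    # 11
```

As a further arithmetic illustration, a lower heritability gives a larger ratio: with h2 = 1/10, `lambda_animal` returns 9.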

October 25, 2024 · Mohammad Ali Nilforooshan