LGEO2185: Coding principles

Author

Antoine Stevens, Kristof Van Oost & Valentin Charlier

Learning Objectives

  • Learn good coding habits
  • Create code that enhances reproducibility and collaboration

Coding principles

There are as many versions of good practices as there are practitioners¹, but a few coding principles generally help improve software quality when they are followed.

Principle  | Description
Readable   | Code is easy to read, with clear names and intuitive logic. Use comments only for non-obvious explanations.
Consistent | Follows uniform styles, patterns, and naming conventions across the codebase.
Reusable   | Avoids repetition by using functions, classes, and modules to encapsulate recurring logic.
Focused    | Each function, class, or module serves a single purpose and avoids mixing unrelated responsibilities.
Tested     | Includes reliable tests covering edge cases, ensuring changes do not break the code.
Extensible | Designed to allow new features to be added without modifying existing code unnecessarily.

We explain below how these principles can translate concretely for key code components.

1. Variables

Use searchable, pronounceable and intention-revealing names

Bad

x <- sum(sales)
y <- mean(prices)

Good

total_sales <- sum(sales)
average_price <- mean(prices)

Don't use names that are already reserved words, constants or functions in base R (e.g., T, F, if, mean, …)

# TRUE and T are equivalent
TRUE == T
#> [1] TRUE
2 == TRUE
#> [1] FALSE
# Let's create an object named 'T'
T <- 2

# And now we can have problems down the line
TRUE == T
#> [1] FALSE
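
One way to guard against this problem, sketched here with base R only: spell out TRUE and FALSE (they are reserved words and cannot be reassigned), and check with exists() whether a candidate name is already taken. The helper is_name_taken() and the candidate names are hypothetical and only serve as illustration.

```r
# T is an ordinary object that can be overwritten; TRUE is a reserved word
T <- 2
t_is_true_before <- isTRUE(T)  # FALSE: T no longer means TRUE
rm(T)                          # remove our object; base R's T is visible again
t_is_true_after <- isTRUE(T)   # TRUE

# Hypothetical helper: is this name already used by an object or function?
is_name_taken <- function(name) {
  exists(name)
}

is_name_taken("mean")                     # TRUE: mean() is a base R function
is_name_taken("average_price_per_field")  # FALSE unless you already created it
```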

Make meaningful distinctions

Bad

get_user_info()
get_client_data()
get_customer_record()

Good

get_user_info()
get_user_data()
get_user_record()

Avoid hard-coding

Bad

Sys.sleep(86400)

Good

seconds_in_day <- 60 * 60 * 24
Sys.sleep(seconds_in_day)

Avoid mental mapping

Bad

seq <- c("wheat", "barley", "maize")
for (item in seq) {
  do_stuff(item)
}

Good

crops <- c("wheat", "barley", "maize")
for (crop in crops) {
  do_stuff(crop)
}  

2. Functions

Functions should be used to avoid repetition: a function that is called only once in a script is not particularly useful!

Keep functions focused on a single task

Functions should focus on one specific task. This improves readability, debugging, and reusability.

Bad

# A function that combines unrelated tasks
calculate_mean_and_plot <- function(x) {
  mean_value <- mean(x, na.rm = TRUE)
  hist(x)  # Unnecessary visualization here
  return(mean_value)
}

Good

# A function that calculates the mean only
calculate_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

# Separate visualization logic
plot_histogram <- function(x) {
  hist(x)
}

Maintain consistent levels of abstraction

Functions should operate at a single, consistent level of abstraction.

Bad

# High-level function handling low-level details
get_total_area <- function(df) {
  apply(df, 1, function(row) row[1] * row[2])
}

Good

# Low-level function
get_field_area <- function(length, width) {
  length * width
}

# High-level function
get_total_area <- function(df) {
  apply(df, 1, function(row) get_field_area(row[1], row[2]))
}
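
Because `*` is vectorised in R, the same separation of levels can also be written without apply(). The sketch below is one possible variant, not the only way to do it: the data frame `fields`, its columns `length_m` and `width_m`, and the name get_field_areas() are assumptions made for illustration.

```r
# Hypothetical field dimensions in metres (illustrative values only)
fields <- data.frame(length_m = c(100, 250, 80), width_m = c(50, 40, 20))

# Low-level function: area of one or several rectangular fields
get_field_area <- function(length, width) {
  length * width
}

# High-level function: delegates the arithmetic to get_field_area()
get_field_areas <- function(df) {
  get_field_area(df$length_m, df$width_m)
}

field_areas <- get_field_areas(fields)
field_areas       # 5000 10000 1600
sum(field_areas)  # 16600
```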

Use descriptive and consistent function names

Function names should clearly describe their purpose and follow a consistent naming convention (e.g., use snake_case² or camelCase³).

Note

While you could use dots to separate words in R (e.g. mean.field), refrain from doing so:

  • In other programming languages (e.g. Python), dots (e.g., object.attribute) are used to access attributes or methods of objects, so avoiding dots will make your code more consistent and accessible to others.
  • In R, the dot (.) is commonly used in S3 object-oriented programming to separate the name of a generic function from the class a method applies to. For example, print.data.frame is the method of the generic print function for objects of class data.frame. Overusing dots in names can hence lead to confusion, especially when working with object-oriented code.

Bad

# Unclear or inconsistent naming
calcSD <- function(x){...}
f1 <- function(data){...}

Good

# Descriptive and consistent naming
calc_standard_deviation <- function(x){...}
get_summary_statistics <- function(data){...}

Use default arguments wisely

Default arguments should support common use cases while keeping the function flexible.

Bad

# Overloaded with too many default arguments
series <- function(start = 1, end = 10, 
                   step = 1, reverse = FALSE,
                   as_list = FALSE) {
  sequence <- seq(start, end, by = step)
  if (reverse) sequence <- rev(sequence)
  if (as_list) sequence <- as.list(sequence)
  sequence
}

Good (although not very useful)

# Simple and clear default arguments
series <- function(start = 1, end = 10, step = 1) {
  seq(start, end, by = step)
}
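
A short usage sketch of the simpler version (redefined here so the block runs on its own): the defaults cover the common case, any argument can still be overridden by name, and reversing or converting to a list is left to the caller.

```r
series <- function(start = 1, end = 10, step = 1) {
  seq(start, end, by = step)
}

series()                  # defaults cover the common case: 1, 2, ..., 10
series(end = 5)           # override a single argument by name: 1 2 3 4 5
series(2, 10, step = 4)   # 2 6 10
rev(series(end = 3))      # reversing is left to the caller: 3 2 1
as.list(series(end = 2))  # so is the conversion to a list
```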

3. Commenting

Comments are an essential part of clean and maintainable code when used appropriately. Regarding how much you should comment, there are really two schools of thought.

Some argue that excessive commenting often masks poorly written code, insisting that clear, expressive code with minimal comments is far superior to cluttered, over-commented code. Others counter that while code can show what it does, it rarely explains why, which makes thoughtful, generous comments essential.

So make up your mind ;-). In any case, GenAI has dramatically reduced the effort needed to write comments, so there is really no good reason to avoid them.

Here’s a list of cases when comments are useful.

Explain non-obvious logic

Some logic may be complex, involve trade-offs, or rely on domain-specific knowledge that isn’t immediately clear to someone reading the code. A comment can clarify:

  • Why something is being done (intent or rationale).
  • Context behind decisions, such as limitations, dependencies, or performance considerations.
# Using binary search for faster lookup in sorted data
binary_search <- function(arr, target) {
  # Implementation details
}
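
To make the placeholder above concrete, here is a minimal sketch of what such a function body could look like. It is one possible implementation, not a reference one, and it assumes that `arr` is sorted in increasing order. Note how the comments state the assumption and the reasoning rather than restating the code.

```r
# Binary search: assumes `arr` is sorted in increasing order.
# Returns the index of `target` in `arr`, or NA_integer_ if it is absent.
binary_search <- function(arr, target) {
  low <- 1L
  high <- length(arr)
  while (low <= high) {
    mid <- (low + high) %/% 2L
    if (arr[mid] == target) {
      return(mid)
    } else if (arr[mid] < target) {
      low <- mid + 1L   # target can only be in the upper half
    } else {
      high <- mid - 1L  # target can only be in the lower half
    }
  }
  NA_integer_
}

sorted_yields <- c(2.1, 3.4, 4.8, 5.5, 7.9)  # illustrative values
binary_search(sorted_yields, 5.5)  # 4
binary_search(sorted_yields, 6.0)  # NA
```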

Document assumptions or constraints

When the code relies on specific assumptions, constraints, or external factors (e.g., APIs, system behavior, or input formats), comments help future developers understand and respect those boundaries.

# Assumes `user_input` is sanitized before this function is called
process_input <- function(user_input) {
  # Implementation details
}

Provide high-level overviews

Comments can summarize what a block of code, function, or class is doing at a high level, without requiring the reader to dive into the implementation details.

# This function calculates the tax based on income brackets
calculate_tax <- function(income) {
  # Implementation details
}

Facilitate collaboration & maintenance

In large teams, comments help bridge the knowledge gap between developers, enabling them to quickly understand and contribute to the codebase. Furthermore, over time, the original developer may no longer work on the project, and others (or even the original developer) may need to modify or debug the code. Comments can save hours of effort by providing immediate context. For example, when implementing a workaround or a quick fix, comments can clarify why a less-than-ideal approach was taken and when it should be revisited, or include hints about potential failure points or areas that need careful monitoring (TODOs).

# TODO: Replace this with a proper caching mechanism once requirements are finalized
cache_data <- function(data) {
  # Temporary implementation
}

The balance: avoid over-commenting

While comments are important, too many comments or redundant ones can clutter code. Code should be as self-explanatory as possible, and comments should add value beyond what is obvious.

For example:

Bad

i <- 0 # Initialize variable i to zero

Good

# Counter for tracking retries 
# in case connection fails
i <- 0
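
A retry counter like this one could be used as in the sketch below. Everything here is hypothetical: connect_to_server() is a stand-in that fails twice before succeeding, and max_retries is an arbitrary limit. The counter is also given an intention-revealing name, which makes part of the comment unnecessary.

```r
# Hypothetical stand-in for a flaky connection: fails twice, then succeeds
attempts_made <- 0
connect_to_server <- function() {
  attempts_made <<- attempts_made + 1
  if (attempts_made < 3) stop("connection failed")
  "connected"
}

max_retries <- 5  # arbitrary limit chosen for this example
# Counter for tracking retries in case the connection fails
retry_count <- 0
connection <- NULL
while (is.null(connection) && retry_count <= max_retries) {
  connection <- tryCatch(
    connect_to_server(),
    error = function(e) NULL  # swallow the error and try again
  )
  if (is.null(connection)) retry_count <- retry_count + 1
}

retry_count  # 2: two failed attempts before the third one succeeded
connection   # "connected"
```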

Function documentation

Documenting functions is essential to ensure the reproducibility of code and its potential for future development. Proper documentation involves explaining the purpose of a function (briefly or in detail), describing the allowed parameters, and specifying the output. A best practice is to follow the roxygen2 standard: the roxygen2 package provides a range of standardized @ tags that make it easy to describe functions and their components clearly, and the package documentation covers their usage in more detail. Documentation written with roxygen2 is recognizable by the #' prefix. When developing an R package, this approach allows you to easily generate documentation accessible via ?function_name. Here is a simple example:

#' Add together two numbers.
#' 
#' @param x A number.
#' @param y A number.
#' @return The sum of \code{x} and \code{y}.
#' @examples
#' add(1, 1)
#' add(10, 1)
add <- function(x, y) {
  x + y
}

Structuring your script

  • Start a script with some information on the author and date, and describe the purpose of your code, e.g.:
# Author: John Doe
# Date: 2025-01-22
# Description: This script cleans and preprocesses the raw data for analysis.
# It removes missing values, standardizes column names, and saves the processed data.
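  • You can use titles in scripts to provide structure and break up your file into easily readable chunks. Titles are comment lines that combine # with repeated characters such as - or = to visually separate sections of code. RStudio recognizes comment lines ending with at least four such characters as code sections that can be folded and navigated.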
# Title 1 ---------------------------------------
# This section deals with data loading

# Title 2 =======================================
# This section handles data cleaning

4. File naming

Good file names are essential for ensuring that files are easy to locate, understand, and manage. Some tips:

  • Names should be regex-friendly: This means you can search them by keyword using regular expressions, for example with the stringr package in R. To make this possible, avoid spaces, punctuation, and accented characters, and do not rely on differences in letter case.
  • Names should be human-readable: They are intuitive and descriptive, making it easy to understand what a file contains, even for someone unfamiliar with your work.
  • Names should be structured with delimiters: Use a consistent naming structure where sections of the name are separated by delimiters, and each section has a specific role. This helps extract information from file names programmatically.

Example:

You could adopt a name structure following the below logic:

{Layer}_{Source}_{Region}_{DataType}

Layer | Source | Region | DataType | Example
raw | sensor_data | amazon | temperature | raw_sensor_data_amazon_temperature.csv
int (intermediary) | satellite_imagery | arctic | vegetation_index | int_satellite_imagery_arctic_vegetation_index.tif
prm (primary) | climate_model | global | precipitation | prm_climate_model_global_precipitation.nc
ft (feature) | feature_dataset | amazon | canopy_height | ft_feature_dataset_amazon_canopy_height.csv
mi (model input) | model_input | regional | co2_emissions | mi_model_input_regional_co2_emissions.csv
mo (model output) | model_output | global | sea_level | mo_model_output_global_sea_level.csv
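
A structured name can be assembled and queried programmatically. The sketch below uses base R only; build_file_name() is a hypothetical helper written for this example. Since underscores also appear inside some sections (e.g. sensor_data), only the first section can be split off unambiguously with this scheme.

```r
# Hypothetical helper that assembles a file name from its four sections
build_file_name <- function(layer, data_source, region, data_type, extension) {
  paste0(paste(layer, data_source, region, data_type, sep = "_"), ".", extension)
}

file_name <- build_file_name("raw", "sensor_data", "amazon", "temperature", "csv")
file_name  # "raw_sensor_data_amazon_temperature.csv"

# The layer is everything before the first underscore
layer <- sub("_.*$", "", file_name)
layer  # "raw"

# The extension can be read with the `tools` package shipped with R
file_extension <- tools::file_ext(file_name)
file_extension  # "csv"

# Keyword search across file names with a regular expression
file_names <- c(file_name, "prm_climate_model_global_precipitation.nc")
is_amazon <- grepl("_amazon_", file_names)
is_amazon  # TRUE FALSE
```

If every section needs to be extracted reliably, one option is to reserve one delimiter for separating sections and another for separating words within a section.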

Note: Playing well with default ordering

For tracking changes, versions, or historical reference, you can add an additional identifier to your files. You could then follow these tips to leverage the computer’s default ordering:

  • Use numbers in file names: Include numeric prefixes to indicate order or creation date.

  • Use the YYYY-MM-DD date format: This ensures chronological sorting, regardless of region or system settings.

  • Left-pad numbers with zeroes: This prevents issues like 10 appearing before 2.
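
These tips can be illustrated with a few lines of base R. The run numbers, dates and file name stems below are made up for the example.

```r
run_ids <- c(1L, 2L, 10L)

# Without padding, alphabetical ordering puts "run_10" before "run_2"
unpadded_names <- sort(paste0("run_", run_ids))
unpadded_names  # "run_1" "run_10" "run_2"

# Left-padding with zeroes restores the numeric order
padded_names <- sort(sprintf("run_%02d", run_ids))
padded_names    # "run_01" "run_02" "run_10"

# Dates written as YYYY-MM-DD sort chronologically as plain text
survey_dates <- as.Date(c("2025-01-22", "2024-12-03", "2025-01-05"))
dated_names <- sort(paste0(format(survey_dates, "%Y-%m-%d"), "_survey"))
dated_names
# "2024-12-03_survey" "2025-01-05_survey" "2025-01-22_survey"
```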

Conclusion

Consistent formatting is critical when writing R code to improve readability, maintainability, and collaboration. A formatting standard specifies how comments, whitespace, naming conventions, and indentation should be structured in a script. One widely accepted formatting standard is the tidyverse style guide.

Manually formatting code can be time-consuming and error-prone. Thankfully, R provides tools and packages to automate this process:

  • styler allows you to interactively restyle selected text, files, or entire projects. It includes an RStudio add-in, which is the easiest way to restyle existing code.
  • lintr performs automated checks to confirm that you conform to the style guide.
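
A minimal sketch of how both packages can be tried on a snippet of code, assuming recent versions of styler and lintr are installed (the calls are skipped otherwise). The snippet in `messy_code` is made up for the example.

```r
messy_code <- "total_sales=sum( sales )"

# styler rewrites code so that it follows the tidyverse style guide
if (requireNamespace("styler", quietly = TRUE)) {
  print(styler::style_text(messy_code))
}

# lintr only reports style issues; it does not modify the code
if (requireNamespace("lintr", quietly = TRUE)) {
  print(lintr::lint(text = messy_code))
}
```

For whole scripts, styler::style_file() and lintr::lint() take a file path instead of a text snippet.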

Further readings

Session info

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Brussels
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.4.1    fastmap_1.2.0     cli_3.6.5        
 [5] tools_4.4.1       htmltools_0.5.8.1 rstudioapi_0.18.0 yaml_2.3.12      
 [9] rmarkdown_2.28    knitr_1.48        jsonlite_2.0.0    xfun_0.48        
[13] digest_0.6.39     rlang_1.1.7       evaluate_1.0.5   

Footnotes

  1. Seriously, who wants to code badly?

  2. Uses underscores to separate words

  3. Capitalizes each new word except the first