Getting Started
This guide will walk you through installing Sparrow.jl and creating your first workflow.
Installation
Prerequisites
Sparrow.jl requires Julia 1.6 or later. You can download Julia from julialang.org.
Installing Sparrow.jl
Since Sparrow.jl is hosted on GitHub, you can install it using the package manager:
using Pkg
Pkg.add(url="https://github.com/csu-tropical/Sparrow.jl")Installing Dependencies
Sparrow.jl has several dependencies that will be automatically installed. However, some external tools may be required for certain operations:
- RadxConvert: For converting radar data formats
- RadxPrint: For reading radar file metadata
These tools are part of the LROSE toolkit and should be available in your system PATH if you plan to use the built-in radar processing steps.
Your First Workflow
Let's create a simple workflow that processes radar data files.
Step 1: Create a Workflow File
Create a new file called my_workflow.jl:
using Sparrow
# Define your workflow type
@workflow_type SimpleRadarWorkflow
# Define workflow steps
@workflow_step ConvertData
@workflow_step QualityCheck
# Create the workflow instance
workflow = SimpleRadarWorkflow(
# Directory configuration
base_working_dir = "/tmp/sparrow_work",
base_archive_dir = "/data/archive",
base_data_dir = "/data/raw",
base_plot_dir = "/data/plots",
# Time parameters
minute_span = 10, # Process data in 10-minute chunks
# Define the processing steps
# Format: (step_name, step_type, input_directory, archive)
steps = [
("convert", ConvertData, "base_data", false),
("qc", QualityCheck, "convert", true)
],
# Radar moments to process
raw_moment_names = ["DBZ", "VEL", "WIDTH"],
qc_moment_names = ["DBZ", "VEL"],
# Message level (0=error, 1=warning, 2=info, 3=debug, 4=trace)
message_level = 2
)
# Implement the conversion step
function Sparrow.workflow_step(workflow::SimpleRadarWorkflow, ::Type{ConvertData},
input_dir::String, output_dir::String;
step_name::String="", step_num::Int=0, kwargs...)
msg_info("Converting data from $(input_dir)")
# Create output directory
mkpath(output_dir)
# Find input files
input_files = readdir(input_dir; join=true)
filter!(f -> endswith(f, ".raw") || endswith(f, ".uf"), input_files)
# Process each file
for input_file in input_files
output_file = joinpath(output_dir, basename(input_file) * ".nc")
# Example: call external conversion tool
run(`radx_convert -f $(input_file) -outdir $(output_dir) -outformat cfradial`)
msg_debug("Converted $(basename(input_file))")
end
return length(input_files)
end
# Implement the quality check step
function Sparrow.workflow_step(workflow::SimpleRadarWorkflow, ::Type{QualityCheck},
input_dir::String, output_dir::String;
step_name::String="", step_num::Int=0, kwargs...)
msg_info("Performing quality control on data from $(input_dir)")
mkpath(output_dir)
# Find converted files
input_files = readdir(input_dir; join=true)
filter!(f -> endswith(f, ".nc"), input_files)
for input_file in input_files
# Your QC logic here
msg_debug("QC check on $(basename(input_file))")
# Copy file to output (replace with actual QC)
output_file = joinpath(output_dir, basename(input_file))
cp(input_file, output_file; force=true)
end
return length(input_files)
endStep 2: Run the Workflow
Run your workflow from the command line:
julia sparrow my_workflow.jl --datetime 20240101_000000 -v 2This will process data from January 1, 2024, 00:00:00 with informational message level.
Step 3: Use Distributed Processing
To use multiple workers for parallel processing:
julia sparrow my_workflow.jl --datetime 20240101_000000 \
--num_workers 4 --threads 2 -v 2This uses 4 distributed workers, each with 2 threads.
Step 4: Run on a Cluster
If you're using a Slurm cluster:
julia sparrow my_workflow.jl --datetime 20240101_000000 \
--slurm --num_workers 10This will submit jobs to Slurm with 10 workers.
Understanding Workflow Parameters
Required Parameters
Every workflow must have these parameters:
base_working_dir: Temporary working directory for intermediate filesbase_archive_dir: Directory for archived/processed filesbase_data_dir: Directory containing raw input datasteps: Vector of tuples:(step_name, step_type, input_directory, archive)
Common Optional Parameters
base_plot_dir: Directory for output plotsminute_span: Time span for each processing chunk (in minutes)reverse: Process files in reverse chronological order (default: false)message_level: Verbosity level (0-4, default: 2)raw_moment_names: Names of radar moments in raw dataqc_moment_names: Names of radar moments after QC
Accessing Parameters
Within your workflow step functions, you can access parameters using dictionary syntax:
function Sparrow.workflow_step(workflow::SimpleRadarWorkflow, ::Type{MyStep},
input_dir::String, output_dir::String;
kwargs...)
# Access required parameters
data_dir = workflow["base_data_dir"]
# Access with default value
span = Sparrow.get_param(workflow, "minute_span", 10)
# Access and type check
moments = Sparrow.get_param(workflow, "raw_moment_names", Vector{String})
endMessage System
Sparrow provides a structured message system for logging:
# Different severity levels
msg_error("Critical error!") # Level 0 - always shown
msg_warning("Something suspicious") # Level 1
msg_info("Processing file X") # Level 2 (default)
msg_debug("Intermediate value: Y") # Level 3
msg_trace("Detailed iteration Z") # Level 4
# Set message level globally
set_message_level(MSG_DEBUG) # Show debug and higher
# Or set in workflow parameters
workflow = MyWorkflow(
message_level = 3, # Debug level
# ... other params
)Next Steps
- Read the Workflow Guide for detailed information on building workflows
- Check out the Examples for more complex use cases
- Browse the API Reference for complete function documentation
Common Issues
World Age Errors
If you see world age errors, make sure you're using the sparrow launcher script rather than directly calling Julia, or use Base.invokelatest when dynamically loading workflow files.
Worker Communication
If workers can't access your workflow type, ensure the workflow file is included on all workers. The sparrow script handles this automatically.
File Path Issues
Always use absolute paths for directory parameters, or ensure relative paths are resolved correctly relative to where you run the command.