Why should I use Sampling Tool?
When working with large datasets in Excel, manually extracting samples can be time-consuming and error-prone. The Sampling Tool automates this process, allowing you to efficiently sample data based on user-defined parameters
. This VBA script provides both manual
and automatic
sampling modes, making data extraction seamless and flexible.
Manual vs. Automatic Sampling
- Manual Mode —Allows the user to specify the number of samples and the interval manually.
- Automatic Mode —Automatically calculates the interval by dividing the total data size by the sample size.
This flexibility ensures that users can choose the best method based on their dataset and requirements.
Example
Suppose you have a dataset with 1000 rows
and need 50 samples
:
Let’s say —
- In manual mode, you can set an
nth
interval of 20, meaning every 20th row will be picked. - In automatic mode, the VBA script calculates the interval automatically as
1000 / 50 = 20
.
So, what’s the differences—In Manual Mode you get a flexibility to change the interval of picking the samples whereas in Automatic Mode user only need to input the number of samples required and a random number that’s it, here the interval is automatically picked up by the code.
Impact of VBA Code and Functions
This VBA script enhances efficiency in data analysis by:
- Reducing manual errors in sample selection.
- Saving time in large dataset processing.
- Allowing structured and randomized sampling within a given range.
- Providing instant results with a dedicated output sheet named
Sampled_Data
.
VBA Code
Here’s the VBA code we are discussing:
Sub Sampling_tool()
Dim rng As Range
Dim ws As Worksheet, sampleSheet As Worksheet
Dim numSamples As Long, interval As Long, startNum As Long
Dim totalRows As Long
Dim i As Long, sampleRow As Long
Dim sampleCounter As Long
Dim activeWb As Workbook
Dim headerRow As Range
Dim mode As String
Dim modeChoice As VbMsgBoxResult
' Use the active workbook
Set activeWb = ActiveWorkbook
' Select the range including headers
On Error Resume Next
Set rng = Application.InputBox("Select the range including headers:", Type:=8)
On Error GoTo 0
If rng Is Nothing Then Exit Sub ' Exit if no range is selected
' Define the header row
Set headerRow = rng.Rows(1)
' Get the total number of rows excluding headers
totalRows = rng.Rows.Count - 1 ' Subtract 1 for the header row
' Get mode selection from user via dialog box
modeChoice = MsgBox("Choose Sampling Mode: Yes for Automatic, No for Manual", vbYesNoCancel + vbQuestion, "Sampling Mode Selection")
' Determine mode based on user selection
If modeChoice = vbYes Then
mode = "automatic"
ElseIf modeChoice = vbNo Then
mode = "manual"
Else
Exit Sub ' Exit if the user cancels
End If
' Get user inputs based on mode
If mode = "manual" Then
numSamples = CLng(Application.InputBox("Enter the number of samples:", Type:=1))
interval = CLng(Application.InputBox("Enter the sampling interval (e.g., every nth row):", Type:=1))
Else ' Automatic mode
numSamples = CLng(Application.InputBox("Enter the number of samples:", Type:=1))
interval = totalRows \ numSamples ' Calculate interval automatically
End If
startNum = CLng(Application.InputBox("Enter the random starting row number (within the interval):", Type:=1))
' Validate inputs
If numSamples <= 0 Or interval <= 0 Or startNum <= 0 Then
MsgBox "Invalid inputs. Please enter positive numbers.", vbExclamation
Exit Sub
End If
' Ensure the first sample is within range
If startNum > totalRows Then
MsgBox "Starting row exceeds the data range!", vbExclamation
Exit Sub
End If
' Create a new sheet for samples in the current workbook
Set ws = rng.Worksheet
On Error Resume Next
Set sampleSheet = activeWb.Sheets("Sampled_Data") ' Check if the sheet exists
On Error GoTo 0
' If sheet exists, delete it to create a fresh one
If Not sampleSheet Is Nothing Then
Application.DisplayAlerts = False
sampleSheet.Delete
Application.DisplayAlerts = True
End If
' Add a new sheet
Set sampleSheet = activeWb.Sheets.Add
sampleSheet.Name = "Sampled_Data"
' Add metadata to the new sheet
sampleSheet.Cells(1, 1).Value = "Samples Selected — " & numSamples
sampleSheet.Cells(2, 1).Value = "Population Size — " & totalRows
sampleSheet.Cells(3, 1).Value = "Sample Interval — " & interval
sampleSheet.Cells(4, 1).Value = "Initial Row — " & startNum
' Copy header to row 6 in the new sheet
headerRow.Copy Destination:=sampleSheet.Range("B6")
' Initialize sample counter
sampleCounter = 1 ' Start from row 7 (because row 6 is the header)
i = startNum
' Loop to extract samples
Do While sampleCounter <= numSamples And i <= totalRows
sampleRow = rng.row + i ' Actual row number in the sheet
' Ensure sampleRow does not exceed worksheet limits
If sampleRow > ws.Rows.Count Then Exit Do
ws.Rows(sampleRow).Copy Destination:=sampleSheet.Cells(sampleCounter + 6, 1) ' Start at A7
sampleCounter = sampleCounter + 1
i = i + interval ' Move to the next sample row
Loop
' Autofit columns in the new sheet
sampleSheet.Columns.AutoFit
MsgBox "Sampling complete! Data exported to 'Sampled_Data'", vbInformation
End Sub
How to Use This Code
To use this code in Excel:
- Open Excel and press
ALT + F11
to open the Visual Basic for Applications (VBA) editor. - In the VBA editor, insert a new module by right-clicking on any of the items in your project and selecting
Insert > Module
. - Copy and paste the above code into the module window.
- Close the VBA editor and return to Excel.
- Run the macro:
- Press
ALT + F8
, selectSampling_tool
, and clickRun
. - Choose between Manual or Automatic mode.
- Provide the necessary inputs.
- Press
- Check the newly created
Sampled_Data
sheet for your sampled dataset.
This tool simplifies the process of sampling large datasets and ensures structured data extraction without manual intervention.