A developer should read the User Documentation first, since it describes the top-down rationale for the processing pipeline. Then see the section Outline of application below for the bottom-up structure of this application: the code and additional resources are diverse and scattered, so this roadmap may be useful. Later sections deal with more nitty-gritty matters.
The distribution includes not just Java source code, but also
configuration files, native code, some data, libraries, documentation,
scripts, and code used for performing tests.
|-- build/
|-- data/
|-- html/
|-- jri/
|-- lib/
|-- native32/
|-- native64/
|-- native-src/
|-- scripts/
|-- servercache/
|-- shellscripts/
|-- src/
|-- test-src/
|-- backend.properties
|-- build.xml
|-- demo
|-- demo.bat
|-- derby.properties
|-- info
|-- info.bat
`-- logging.properties
The main body of source code is located under src/, and comprises about 140 Java files in 12 functional groups. A (truncated) listing is shown below:
|-- backendClasses/
|   `-- Preproc.java
|-- channelClasses/
|   |-- Channel.java
|   |-- ChannelScript.java
|   `-- Gratton.java
|-- com/
|   `-- nr/
|-- dsp/
|   |-- FIR/
|   `-- IIR/
|-- epochClasses/
|   |-- EpochScript.java
|   |-- EpochSelector.java
|   |-- QueryResult.java
|   |-- QueryResultServerside.java
|   |-- QueryResultSuper.java
|   `-- Subset.java
|-- frontendClasses/
|   |-- CLI.java
|   |-- Demo.java
|   |-- MemoryMonitor.java
|   |-- Prescorer.java
|   `-- QueryDB.java
|-- generalClasses/
|   |-- BrowserLaunch.java
|   |-- CliHelpFrame.java
|   |-- ColorTableCellEditor.java
|   |-- CommonLog.java
|   |-- DataMode.java
|   |-- DetrendedFA.java
|   |-- Event.java
|   |-- GBC.java
|   |-- HmsFormatter.java
|   |-- HtmlHelpFrame.java
|   |-- IsoFormatter.java
|   |-- OptionArg.java
|   |-- ParamUtils.java
|   |-- QRFormatterDOM.java
|   |-- QRFormatterMat.java
|   |-- QRFormatterR.java
|   |-- QRFormatterSQL.java
|   |-- QRFormatterTxt.java
|   |-- QRSpt.java
|   |-- QueryResultGrid.java
|   |-- RAMJavaFileObject.java
|   |-- RuntimeCompiler.java
|   |-- Site.java
|   |-- SiteScalp.java
|   |-- SiteSet.java
|   |-- StreamingLogHandler.java
|   |-- StringJavaFileObject.java
|   |-- TabulateSeries.java
|   |-- Trans.java
|   |-- Unit.java
|   `-- Units.java
|-- jnt/
|   `-- FFT/
|-- outputClasses/
|   |-- DBExport.java
|   |-- Export.java
|   `-- GenericOutput.java
|-- plotClasses/
|   |-- scores/
|   |-- Cartesian.java
|   |-- CartesianX.java
|   |-- GenericDisplayFrame.java
|   |-- QRStatistics.java
|   |-- Stack.java
|   |-- Summary.java
|   `-- TiledStack.java
|-- recordingClasses/
|   |-- FileFormat.java
|   |-- Recording.java
|   |-- RecordingCnt.java
|   |-- RecordingEdf.java
|   |-- RecordingEeg.java
|   |-- RecordingException.java
|   |-- RecordingExceptions.java
|   |-- RecordingMec.java
|   `-- RecordingNs5.java
|-- scorerClasses/
|   |-- scores/
|   |-- CallR.java
|   |-- DFA.java
|   |-- EegFit.java
|   |-- EegFitInterface.java
|   |-- GenericScorer.java
|   |-- HeartRate.java
|   |-- ReactionTime.java
|   |-- SpectPL.java
|   |-- TradERP.java
|   `-- TradSpect.java
|-- seriesClasses/
|   |-- binaryEvents/
|   |-- seriesGeneration/
|   |-- BasicStats.java
|   |-- Series.java
|   |-- SeriesAnalog.java
|   |-- SeriesBinary.java
|   |-- SeriesException.java
|   `-- Transformation.java
|-- serverPlotClasses/
|   |-- FullTimeSeries.java
|   `-- RecordingSummary.java
`-- utilClasses/
    |-- BuildDB1.java
    |-- BuildDB2.java
    `-- BuildDB3.java
All building and testing of Jeda can be performed via Ant tasks:
ant all          Clean build directories, then compile and test [default]
ant archive      Create archive of all source files, JARs, scripts
ant clean        Delete build, docs and test directories
ant cleanCache   Delete all files in the server cache
ant compile      Compile Java sources and associated native code
ant help         Describe Ant and program options (Linux version only)
ant isodist      Create ISO comprising Java source+class and supporting files
ant jardist      Create jar of Java source files, and supporting files
ant javadoc      Create static and API documentation
ant linecount    Count lines of original java source code
ant non          Compile the Java source files
ant r2r          Create binary distribution, with supporting files
ant testa        Compile tests
ant testb        Compile and run tests: unit only
ant testc        Compile and run tests: unit and end-to-end
ant testd        Compile and run tests: tests under development
The functional outline is sketched below. It shows how
Time-series operations (those associated with the -scriptChannel
command line option) cover filtering, re-referencing, artifact correction,
resampling and decompositions. Most of the code for these operations is
built into the application, so the scripts usually just need to call
pre-existing methods within the Channel or the SeriesAnalog
classes.
The same operations can also be used to generate artificial (or 'mock') time-series, which is very useful for testing new analysis methods. The directory scripts/channelClasses/ contains examples of scripts that generate mock datasets, each approximating data from one of the standard Brain Resource paradigms. These datasets can have known ERPs, known variations across the scalp, known noise characteristics, and known event timings.
Each script typically (i) defines a subset of channels, (ii) performs some operation on that subset, and (iii) updates the parent list. Often the input data will contain a mixture of EEG, EOG, EMG, ECG and other channels. The scripts will often have to perform operations tailored to each of the recording modalities, and repeat the above steps for each subset.
There are many methods available for creating and refining subsets:
| Method | Description |
|---|---|
| listsUnion(list1,list2,...) | Returns a merging of multiple lists |
| listsIntersect(list1,list2) | Returns the intersection of two lists |
| selectMode(DataMode mode) | Returns a list of time-series matching 'mode' |
| discardMode(DataMode mode) | Returns a list of time-series not matching 'mode' |
| selectSite(String regex) | Returns a list of time-series with names matching 'regex' |
| selectSite(String regex, list) | Returns the subset of 'list' that have names matching 'regex' |
| discardSite(String regex, list) | Returns the subset of 'list' that have names not matching 'regex' |
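The selection methods can be pictured with a small stand-alone sketch. `MockSeries` and `SubsetSketch` are hypothetical stand-ins written for this illustration, not Jeda classes; the real methods live in the Channel class and may differ in signature. The sketch also assumes that a regex must match the whole series name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a named time-series; not a Jeda class.
final class MockSeries {
    final String name;
    MockSeries(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

// Hypothetical re-implementation of the list helpers, for illustration only.
final class SubsetSketch {
    // Subset of 'list' whose names match 'regex' in full (assumed semantics).
    static List<MockSeries> selectSite(String regex, List<MockSeries> list) {
        Pattern p = Pattern.compile(regex);
        List<MockSeries> out = new ArrayList<>();
        for (MockSeries s : list) {
            if (p.matcher(s.name).matches()) out.add(s);
        }
        return out;
    }

    // Subset of 'list' whose names do not match 'regex'.
    static List<MockSeries> discardSite(String regex, List<MockSeries> list) {
        List<MockSeries> out = new ArrayList<>(list);
        out.removeAll(selectSite(regex, list));
        return out;
    }

    // Order-preserving merge of several lists, without duplicates.
    @SafeVarargs
    static List<MockSeries> listsUnion(List<MockSeries>... lists) {
        LinkedHashSet<MockSeries> merged = new LinkedHashSet<>();
        for (List<MockSeries> l : lists) merged.addAll(l);
        return new ArrayList<>(merged);
    }

    // Members of 'a' that also appear in 'b'.
    static List<MockSeries> listsIntersect(List<MockSeries> a, List<MockSeries> b) {
        List<MockSeries> out = new ArrayList<>(a);
        out.retainAll(b);
        return out;
    }

    public static void main(String[] args) {
        List<MockSeries> all = Arrays.asList(
            new MockSeries("Fz"), new MockSeries("Cz"), new MockSeries("Pz"),
            new MockSeries("VEOG"), new MockSeries("ECG"));
        List<MockSeries> midline = selectSite(".z", all);
        List<MockSeries> eog = selectSite(".*EOG", all);
        System.out.println(midline);
        System.out.println(listsUnion(midline, eog));
        System.out.println(discardSite(".z", all));
    }
}
```

Here the midline EEG sites and the EOG channel are picked out separately and then merged, which mirrors the per-modality subsetting described above.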
The second step is typically to do something to a subset of time-series. The script will iterate through a list, calling the appropriate methods on each time-series. Many methods exist within the SeriesAnalog class for basic waveform arithmetic and filtering. Methods that operate on multiple channels simultaneously are:
| Method | Description |
|---|---|
| av(list) | Returns the result of spatial averaging |
| doGratton(float epochDur) | Performs EOG artifact correction |
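As an illustration of a multi-channel operation, spatial averaging can be sketched as a sample-by-sample mean across channels. This is an assumption about what av(list) computes, written against plain arrays; `SpatialAverageSketch` is a hypothetical name, and the Jeda method takes a list of series rather than a float[][].

```java
// Hypothetical illustration of spatial averaging; not the Jeda implementation.
final class SpatialAverageSketch {
    // Mean across channels at each sample; all channels must share one length.
    static float[] av(float[][] channels) {
        if (channels.length == 0) {
            throw new IllegalArgumentException("no channels supplied");
        }
        int n = channels[0].length;
        float[] mean = new float[n];
        for (float[] ch : channels) {
            if (ch.length != n) {
                throw new IllegalArgumentException("channel lengths differ");
            }
            for (int i = 0; i < n; i++) mean[i] += ch[i];
        }
        for (int i = 0; i < n; i++) mean[i] /= channels.length;
        return mean;
    }

    public static void main(String[] args) {
        float[][] channels = { {1f, 2f, 3f}, {3f, 4f, 5f} };
        System.out.println(java.util.Arrays.toString(av(channels)));
    }
}
```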
The third step is to update the list of raw time-series, or — in the case of mock data — to initialize the list. It may also be necessary to initialize the list of stimulus/response events: this too is possible.
| Method | Description |
|---|---|
| replaceAllSeries(list) | Replace the current Recording's series with the specified list |
| updateMatchingSeries(list) | Update the current Recording's series with the specified list |
| appendToCurrentSeries(series) | Append one series to the Recording's set of series |
| replaceAllEvents(newEvents) | Replace the current Recording's event list with the specified list |
New scripts should be placed somewhere under scripts/channelClasses, and called via the -scriptChannel command line option.
Epoch-related scripts (those associated with the -scriptEpoch
command line option) generate a table of event attributes. The rows of the
table correspond to 'events'; and these events might be stimulus
events, response events, or might be synthetic events
computed by the script. The script takes the list of low-level events
created during reading of the data file, and is free to augment or transform
that list in any way. Thus a single low-level event of known finite duration
might be represented in the attribute table as one on-set event and one
off-set event. Or synthetic events might be created that are time-locked
to the onset of an alpha burst. There are no restrictions on what is
regarded as an event.
The columns of the attribute table are similarly unconstrained. Each column has a label and datatype, and one value per row. The column labels are alphanumeric strings (no spaces, case-insensitive); the allowed data types are integer, float, double, string, or timestamp; and the values are the corresponding Java objects (or null). The data types are strings like "FLOAT", "INTEGER", "VARCHAR5", "VARCHAR10", etc. (strings are represented this way to facilitate their subsequent export to an SQL table).
Accordingly, the principal job of the script is to generate two arrays, String[2][nCols] th and Object[nRows][nCols] td, and to return this information to the calling method. The meta information, th, combines the label and data type. The abstract superclass specifies methods such that the attribute values, td, are returned one row at a time via an iterator: this suits later translation into SQL UPDATE commands.
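The shape of th and td can be made concrete with a small sketch. `AttributeTableSketch` is a hypothetical container, not the abstract superclass mentioned above, and the choice of th[0] for labels and th[1] for data types is an assumption; only the array dimensions and the row-at-a-time iterator come from the description.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical container for the attribute table; not a Jeda class.
final class AttributeTableSketch implements Iterable<Object[]> {
    final String[][] th; // th[0] = column labels, th[1] = data types (assumed order)
    final Object[][] td; // td[row][col] = attribute value, or null

    AttributeTableSketch(String[][] th, Object[][] td) {
        if (th.length != 2 || th[0].length != th[1].length) {
            throw new IllegalArgumentException("th must be String[2][nCols]");
        }
        for (Object[] row : td) {
            if (row.length != th[0].length) {
                throw new IllegalArgumentException("row width differs from nCols");
            }
        }
        this.th = th;
        this.td = td;
    }

    // Hand the attribute values back one row at a time.
    @Override public Iterator<Object[]> iterator() {
        return Arrays.asList(td).iterator();
    }

    // Two events with three attributes each; the values are made up.
    static AttributeTableSketch example() {
        String[][] th = {
            {"Time", "Stim", "TgOrd"},
            {"FLOAT", "VARCHAR5", "INTEGER"}
        };
        Object[][] td = {
            {3.232f, "Tg", 1},
            {6.459f, "Bg", null}
        };
        return new AttributeTableSketch(th, td);
    }

    public static void main(String[] args) {
        AttributeTableSketch table = example();
        System.out.println(Arrays.toString(table.th[0]));
        for (Object[] row : table) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```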
Much of the power of this program derives from imaginative use of this script. Attaching an arbitrary number of attributes to any number of specific time points means that epochs can be binned in a way to tell any story.
Apart from the attribute table, the script also returns the offset (from each event) to the start of the epoch, and the duration of epochs. This might seem a separate matter that can be chosen later; however, choosing these values within the script allows extra checking of which events to include in the attribute table. Events near the start and end of a recording will be disqualified if the corresponding epoch extends beyond the limits of the recording.
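The disqualification rule amounts to a simple bounds check. The following is a minimal sketch, assuming times in seconds, a recording that runs from 0 to recordingDur, and a negative offset for pre-stimulus intervals; `EpochBoundsSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the epoch bounds check; not a Jeda class.
final class EpochBoundsSketch {
    // True if the epoch starting at eventTime + offset and lasting 'duration'
    // lies wholly inside a recording that runs from 0 to recordingDur.
    static boolean fits(double eventTime, double offset, double duration, double recordingDur) {
        double start = eventTime + offset;
        return start >= 0.0 && start + duration <= recordingDur;
    }

    // Event times whose epochs stay within the recording.
    static List<Double> qualifying(double[] eventTimes, double offset,
                                   double duration, double recordingDur) {
        List<Double> kept = new ArrayList<>();
        for (double t : eventTimes) {
            if (fits(t, offset, duration, recordingDur)) kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        // A 60 s recording, epochs from 0.2 s before to 0.8 s after each event.
        double[] events = {0.1, 3.232, 4.452, 59.8};
        System.out.println(qualifying(events, -0.2, 1.0, 60.0));
    }
}
```

With these made-up values the first and last events are dropped, because their epochs would start before the recording or run past its end.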
New scripts should be placed somewhere under scripts/epochClasses, and called via the -paradigm and -scriptEpoch command line options.
The result of the prior stages in the processing pipeline is a
data object comprising a 2-D grid of 'frames', where each frame is
a set of waveforms. See this example
of the canonical representation of such an object. The rows and columns
of the grid are implied by the arguments to -binV and -binH,
while the waveforms within each frame are distinguished by the argument
to -binZ.
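One way to picture the grid of frames is as a nested map keyed by the three binning values. This is only a mental model under stated assumptions: `FrameGridSketch` is hypothetical, the keys are shown as plain strings, and Jeda's own classes (for example QueryResultGrid) may be organized quite differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a grid of frames; Jeda's own classes may differ.
final class FrameGridSketch {
    // row key (-binV) -> column key (-binH) -> waveform key (-binZ) -> epochs
    private final Map<String, Map<String, Map<String, List<float[]>>>> grid = new TreeMap<>();

    // File one epoch under its row, column and waveform keys.
    void add(String binV, String binH, String binZ, float[] epoch) {
        grid.computeIfAbsent(binV, k -> new TreeMap<>())
            .computeIfAbsent(binH, k -> new TreeMap<>())
            .computeIfAbsent(binZ, k -> new ArrayList<>())
            .add(epoch);
    }

    // Number of epochs contributing to one waveform of one frame.
    int count(String binV, String binH, String binZ) {
        Map<String, Map<String, List<float[]>>> row = grid.get(binV);
        if (row == null) return 0;
        Map<String, List<float[]>> frame = row.get(binH);
        if (frame == null) return 0;
        List<float[]> epochs = frame.get(binZ);
        return epochs == null ? 0 : epochs.size();
    }

    // Number of rows in the grid.
    int rowCount() { return grid.size(); }

    public static void main(String[] args) {
        FrameGridSketch g = new FrameGridSketch();
        g.add("Cz", "Tg", "TgResp", new float[] {1f, 2f});
        g.add("Cz", "Tg", "TgResp", new float[] {3f, 4f});
        g.add("Pz", "Bg", "BgNoResp", new float[] {5f, 6f});
        System.out.println(g.rowCount() + " rows, "
            + g.count("Cz", "Tg", "TgResp") + " epochs in Cz/Tg/TgResp");
    }
}
```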
The job of the scoring plug-in is to compute new values: values that might be associated with individual waveforms, or global values associated with the entire frame. Some examples are provided under the directory scorerClasses (simple examples and templates).
The second job of each scoring plug-in is to bundle the scores in such a way that the values can be plotted (where appropriate) by display plug-ins, and can be exported in various formats for separate statistical analysis. This is a complex matter. See How to manage scores for more about the data structures used for scores.
Display (and export) options take the grid of frames, possibly augmented
by scores, and generate output.
It is a challenge of sorts to obtain the time-series and scores from the grid of frames (particularly the fancy use of reflection). However it should be minor compared with the challenge of building a good GUI.
Ideas for additional display options:
Verification and validation during development involves both fine-grained
and overall tests. There is a unit testing
suite implemented for Jeda, which tests large or small program components
in isolation.
In one set of tests, specific inputs are supplied to methods, and the returned values are compared to expected values (ant testb). This is well-suited to verification of the preprocessing (server) aspects of this application. In another set of tests, command-line arguments are passed to the main routine, and the full pipeline is checked against expected results (ant testc).
ant testa        Compile tests
ant testb        Compile and run tests: unit only
ant testc        Compile and run tests: unit and end-to-end
ant testd        Compile and run tests: tests under development
Unit testing requires that the Jar file from TestNG be installed. I unzip the TestNG package in its own directory, and then edit the property testng.jar in the build file to point to the relevant jar file.
The command line options -previewSeries, -reviewSeries and -v also provide diagnostics, with user-selectable levels of detail.
This division of the overall task into three successive steps (dealing
with time-series operations, epoch selection, and output operations)
seems reasonable. Generally time-series operations do not
depend on events, and epoch selection does not affect time-series.
However it is easy to imagine situations that challenge this separation
of tasks. For example epochs might have as an attribute the phase of
the alpha power at the time of the stimulus: this would be generated
as part of epoch selection but requires access to the EEG time-series
from one scalp site. Another example is if some time-series filter
operation has to be applied at times marked by events, such as events
indicating the onset of an fMRI volume acquisition. These occasional
complications can be accommodated, but are mentioned here to show why
EEG analysis programs will never be simple to program, nor simple to use.
The complications multiply when there are acceptance criteria applied to the data. It can be unclear at what stage of the process the criteria should be applied: before or after filtering, before or after epoching?
The approach adopted here embraces scripting, which is explicit, flexible and reproducible. Also it copes well with arbitrary complexity, unlike GUIs. The scripts used by this program are central to its ability to transcend the humdrum, over-familiar 'Tg vs Bg' level of analysis, and to do so with some level of convenience. Thus much of the complexity of analyses is encapsulated in a set of scripts for performing operations related to time-series, and another set related to epoch selection. This, plus the arbitrary number of view and scoring options, should cover most eventualities.
A nice example of the virtues of scripting is in the testing of scoring and display algorithms. The scripts can be used to generate continuous time-series with particular known characteristics, which can be compared to the output.
So far it appears that time-series operations (spatial averaging, re-montaging, filtering, EOG correction, etc) are relatively simple to implement. That is not true of the event information.
The plan is to deal with all these issues in a stepwise fashion, from decoding the raw events through to presenting a table from which users can choose epochs in a natural way. The following should be performed during the loading of each data file, due to the hardware-dependent nature of this data:
Then the user, using standard SQL commands, can select a number of rows from the database specific to the analysis goal, e.g. "(StimResp='TgNoResp' or StimResp='TgResp') and Site LIKE '_z' and Artifact<100". Thus a subset of both times and channels can be selected by a common mechanism. See the following suggestion of what a DB table might look like.
| Time | Stim | StimResp | TgOrd | BgOrd | SCR | Site | Artifact | ID |
|---|---|---|---|---|---|---|---|---|
| 3.232 | Tg | TgNoResp | 1 | NA | f | Cz | 5.4 | 10023883 |
| 4.452 | Tg | TgResp | 2 | NA | t | Cz | 9.4 | 10023883 |
| 6.459 | Bg | BgNoResp | NA | 1 | f | Cz | 32.1 | 10023883 |
| 7.862 | Tg | TgNoResp | 3 | NA | t | Pz | 105.8 | 10023883 |
| : | : | : | : | : | : | : | : | : |
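To see which of the sample rows such a selection would keep, the condition can be mimicked in plain Java. This is only an illustration of the logic: in the application the selection is made by the SQL database, `EpochQuerySketch` and `Row` are hypothetical, and the SQL pattern '_z' (any one character followed by 'z') is rendered here as the regex ".z".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory version of the epoch selection; not a Jeda class.
final class EpochQuerySketch {
    // One row of the suggested table, reduced to the columns used in the query.
    static final class Row {
        final double time;
        final String stimResp;
        final String site;
        final double artifact;
        Row(double time, String stimResp, String site, double artifact) {
            this.time = time;
            this.stimResp = stimResp;
            this.site = site;
            this.artifact = artifact;
        }
    }

    // (StimResp='TgNoResp' or StimResp='TgResp') and Site LIKE '_z' and Artifact<100
    static boolean matches(Row r) {
        boolean target = r.stimResp.equals("TgNoResp") || r.stimResp.equals("TgResp");
        boolean midline = r.site.matches(".z");
        return target && midline && r.artifact < 100.0;
    }

    static List<Row> select(List<Row> rows) {
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            if (matches(r)) out.add(r);
        }
        return out;
    }

    // The four explicit rows of the sample table.
    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row(3.232, "TgNoResp", "Cz", 5.4),
            new Row(4.452, "TgResp", "Cz", 9.4),
            new Row(6.459, "BgNoResp", "Cz", 32.1),
            new Row(7.862, "TgNoResp", "Pz", 105.8));
    }

    public static void main(String[] args) {
        for (Row r : select(sampleRows())) {
            System.out.println(r.time + " " + r.stimResp + " " + r.site);
        }
    }
}
```

Of the four rows, the background trial fails the StimResp test and the Pz trial fails the artifact threshold, leaving the two Cz target trials.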
Overall, it can be seen that epoch selection is very structured. Note how hardware-specific tasks are built into the file reader, paradigm-specific operations are implemented in a suitably crafted script, and the epoch choice specific to the analysis goal is just a single line to be typed by the user. At each stage it is the right method for the job (cf. an alternative approach).
The present scheme allows for good flexibility in bundling single trials into arbitrary subsets and then either scoring or viewing the result, but how should differencing be performed prior to scoring? Differencing might be added as an option to the viewing module, and to the scoring module, but clearly the proper solution is to factor out such operations. How about grouping layout with epoch selection, as they collectively contribute to the subsetting problem; simplifying the viewing and scoring group to show only viewing options; and adding the scoring to the view window (only the relevant scoring options)?
It might be argued that subsetting is of little appeal when applied only to individual recordings. The signal-to-noise ratio would be too low when subsets are formed. Perhaps the ideas expressed here should be expanded to allow multiple recordings, each with distinguishing attributes like subject ID or diagnostic category. Then waveforms from different subjects could be overlaid, as could spectra from different age bins. These are very desirable options indeed. However it is challenging for at least three reasons:
It is feasible to generate figures and scores from the command line. The value of this lies in reproducibility. Once the command line incantation is finalized, it can be used without any hesitation or complication to regenerate results. Also, in the context of report generation, the chance of human error can be reduced by command shell scripting of all figure generation.
It is feasible to carry out statistics and high-quality visualization directly from this application. With the aid of Java wrappers for R from Omegahat/RSJava, and the shared object version of R (/usr/lib/R/lib/libR.so), it is possible to perform any R operation, and return the results into the Java environment. See the tutorial RFromJava.pdf. This is one way to extend the scope of this application to cover the generation of figures for reports. The same functionality is available via rJava plus JRI. Likewise, JMatLink is well suited to help exploit Matlab.
There are various ways to monitor memory usage, number of threads, etc.
Utilities include jstat, jconsole and jvisualvm.
All work well and reliably when both they and the target application are
on the same host. Problems only arise when trying to monitor a remote
system.
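The same basic figures (heap usage, thread count, loaded classes) can also be read from inside a running JVM through the standard java.lang.management API. The sketch below is a minimal illustration of that API only; `JvmSnapshotSketch` is a hypothetical name, and nothing here is taken from Jeda's own MemoryMonitor class.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper that reads JVM statistics in-process; not a Jeda class.
final class JvmSnapshotSketch {
    // Bytes of heap currently in use.
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    // Number of live threads, daemon and non-daemon.
    static int threadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Number of classes currently loaded.
    static int loadedClassCount() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + heapUsedBytes());
        System.out.println("live threads:      " + threadCount());
        System.out.println("loaded classes:    " + loadedClassCount());
    }
}
```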
For remote monitoring it is essential that there is an RMI registry running on the remote host, and for jstatd to be running. The local host will try to initiate a TCP connection to the remote RMI registry, and will fail (with a poor explanation) if that registry is not there or if blocked by a firewall. The simplest way to launch a registry is to run jstatd on the remote host thus:
remote:~/progs $ jstatd \
-J-Djava.security.policy=jstatd.all.policy \
[-J-Djava.rmi.server.logCalls=true] \
-J-Djava.rmi.server.hostname=172.20.22.12 \
[-p 1099] [-n JStatRemoteHost] &
Started this way (no -nr option) jstatd will create its own RMI registry, and will listen by default on TCP port 1099, and use the name 'JStatRemoteHost' to distinguish it from other registries that may be listening on the same port. [Note that as of late 2009 (Java version 1.6.0_17) it was essential that jstatd uses the default port and registry name.] There seem to be other ports involved, however, (try netstat -utanp | grep jstatd), so I find that firewalling must be completely disabled on the remote machine before monitoring can take place. Specifying a policy file is essential: the standard one is a three-liner, and easily found in the documentation. The hostname specification is peculiar. Most users don't require this property to be set, but I found that my machines infer their IP address to be 127.0.1.1; and that this cripples RMI. Setting the 'logCalls' property to true is extremely useful initially. You will probably want to start jstatd in the background ('&'), since you can then close the shell and jstatd will continue running (i.e. disown is not required).
Once jstatd is started it will report two items (if logging is turned on) then wait. You can further confirm that all is OK by turning to the local machine (firewalling of the local machine is OK), and running
local:~/ $ jps -l [-m] [-v] rmi://172.20.22.12
local:~/ $ jvisualvm
jvisualvm will need to be told the name and IP of the remote host. Then it will list all java apps running on that machine, including jstatd.
Now run an application such as Jeda on the remote host. Run with the following additional properties:
remote:~/progs $ java \
-Dcom.sun.management.jmxremote.port=12345 \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false ....
I am not sure of the significance or necessity of the port number.
And 'authenticate' can be set either to true or false, it seems (maybe
the permissive jstatd.all.policy file used by jstatd
renders this property irrelevant.)
The application should immediately be registered by jvisualvm, and monitoring will commence after right-clicking on that tree item, and selecting 'Open'. It is not necessary to enter the port number 12345, nor is it necessary to specify any username/password. Nor is there any call to enter any role/rolePassword defined under $JAVA_HOME/jre/lib/management. You can then monitor memory usage, number of threads, and number of classes.
The above is quite complex, and there are residual doubts about ports and authentication (does jmxremote.port=12345 allow jstatd to be bypassed?).
Overall, I suspect that this exercise is of dubious value, when there is the alternative of doing an ssh -X to the remote machine, and running jvisualvm on the remote host. This is just as responsive in my case, and jvisualvm is able to report more diagnostics when run natively. Also there is no need for jstatd, nor for all the extra property settings when starting an application, nor for firewalling to be disabled.
There are many analysis tools (see here)
although few are both alive and directly comparable to Jeda.
The most interesting available peer comparison
is with EEGLAB.
This is built on Matlab, and so has all the associated strengths and
weaknesses: easy vector and matrix operations, accessible scripts, user
extensions, a constrained model for plotting, no (enforced) object
orientation.
There is another relevant package, named bioelectromagnetism, which is definitely aimed at Matlab enthusiasts. It only accepts epoched data as input, so is complementary to the core functionality of Jeda. It might be of use for MRI-ERP data fusion, and for source localization via BrainStorm.
The Biosignal package is designed to handle many different recording modalities.
BrainVision Analyser is a very complete package for EEG and ERP analysis.
Last changed 2010-10-07