Title: | Store and Manipulate Event Data |
---|---|
Description: | The events package manipulates, aggregates and otherwise messes with event data from 'KEDS' and 'TABARI' software and those with similar output. It also bundles several classic event data sets. Most functions are superseded by those in 'dplyr' and 'tidyr'. |
Authors: | William Lowe [aut, cre] |
Maintainer: | William Lowe |
License: | GPL |
Version: | 0.6.1 |
Built: | 2024-11-07 04:26:53 UTC |
Source: | https://github.com/conjugateprior/events |
Lists actor codes
actors(edo)
edo |
Event data |
Lists all the actor codes that occur in the event data in alphabetical order.
Array of actor codes
Will Lowe
data(levant.cameo)
acts <- actors(levant.cameo)
head(acts)
tail(acts)
Applies an eventscale to event data
add_eventscale(edo, sc)
edo |
Event data |
sc |
scale |
Applies an eventscale to event data. This adds a new field in the event data with the same name as the eventscale. Add as many as you want to keep around.
Event data with a scaling
Will Lowe
Coerce Event Scale to Data Frame
## S3 method for class 'eventscale'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
an event scale |
row.names |
ignored |
optional |
ignored |
... |
ignored |
This function converts a list with event codes as names and event scores as values into a data frame with column 'code' containing the event codes and column 'score' containing each event's score.
a data.frame containing event codes and scores
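The conversion described above can be sketched in base R. This is a minimal illustration rather than the package's implementation: toy_scale is a hypothetical plain named list standing in for an event scale, and a real eventscale object may carry additional fields.

```r
# Hypothetical stand-in for an event scale: codes as names, scores as values.
toy_scale <- list("010" = 1.0, "150" = -7.2, "190" = -10.0)

# One row per event code, with the columns named as in the description.
scale_df <- data.frame(
  code  = names(toy_scale),
  score = unlist(toy_scale, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(scale_df)
```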
Event data on the conflict during the collapse of Yugoslavia. Events are coded according to an extended WEIS scheme by the KEDS Project. The event stream contains 72953 events occurring between 2 April 1989 and 31 July 2003 involving 325 actors.
KEDS Project
Parus Analytics: https://www.parusanalytics.com/eventdata/data.dir/balk.html
A mapping of CAMEO event codes to [-10,10] representing a scale of conflict and cooperation, developed by the KEDS project. Taken from the documentation of the KEDS_Count software.
The version of CAMEO used here is 0.9B5 [07.03.2021].
KEDS Project
Parus Analytics: https://eventdata.parusanalytics.com/data.dir/cameo.html
Event data on Central Asia. Events are coded according to the WEIS scheme by the KEDS Project. The event stream contains 8377 events occurring between 02/05/1989 and 31/07/1999 involving 152 sources and 157 targets.
Original data comes from file CASIA.LEADS.6CODE (six character actor codes and coded from leads) with duplicates removed using the one_a_day filter.
KEDS Project
Parus Analytics: https://www.parusanalytics.com/eventdata/data.dir/casia.html
Lists event codes
codes(edo)
edo |
Event data |
Lists all the event codes that appear in the event data
Array of event codes
Will Lowe
data(levant.cameo)
cod <- codes(levant.cameo)
head(cod)
tail(cod)
Stores, manipulates, scales, aggregates and creates directed dyadic time series from event data generated by KEDS, TABARI, or any other extraction tool with similarly structured output.
Events offers simple methods for aggregating and renaming actors and event codes, applying event scales, and constructing regular time series at a choice of temporal scales and measurement levels.
Will Lowe
Discards all but relevant actors
filter_actors( edo, fun = function(x) TRUE, which = c("both", "target", "source") )
edo |
Event data |
fun |
Function that returns TRUE for actor codes that should be kept |
which |
What actor roles should be filtered |
The which parameter specifies whether the filter should be applied only to targets, only to sources, or to all actors in the event data.
Event data containing only actors that pass through fun
Will Lowe
Discards all but relevant event codes
filter_codes(edo, fun = function(x) TRUE)
edo |
Event data |
fun |
Function that returns TRUE for event codes that should be kept |
Applies the filter function to each event code to see whether to keep the observation.
Event data containing only events that pass through fun
Will Lowe
Applies a generic field filter to event data
filter_eventdata(edo, fun, which)
edo |
Events data object |
fun |
Function that should be applied |
which |
Which fields should be filtered |
This function applies a filter function to event data. It is the workhorse behind the other filter_* functions, which you should prefer in ordinary use.
Event data
Will Lowe
data(levant.cameo)
sp <- spotter("PAL", "ISR")
ev_targ <- filter_eventdata(levant.cameo, sp, "target") # these actors as targets
head(ev_targ, 3)
ev_dyad <- filter_eventdata(levant.cameo, sp, c("source", "target")) # source and target
head(ev_dyad, 3)
Restricts events to a time period
filter_time(edo, start = min(edo$date), end = max(edo$date))
edo |
Event data |
start |
A date or something convertible to a Date |
end |
A date or something convertible to a Date |
Restricts events to those occurring on or after start and on or before end.
Event data restricted to a time period
Will Lowe
data(levant.cameo)
ev_jan1980 <- filter_time(levant.cameo, start = as.Date("1980-01-01"), end = as.Date("1980-01-31"))
ev_feb1980 <- filter_time(levant.cameo, start = "1980-02-01", end = "1980-02-29")
ev_starttojan1980 <- filter_time(levant.cameo, end = "1980-01-29")
head(ev_starttojan1980)
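The inclusive window described above can be sketched in base R without the package. This is only an illustration of the semantics: the toy data frame and its column names are assumptions, not the package's internal representation.

```r
# Toy event data; the 'date' and 'code' column names are assumptions.
toy <- data.frame(
  date = as.Date(c("1979-12-31", "1980-01-01", "1980-01-31", "1980-02-01")),
  code = c("010", "020", "030", "040"),
  stringsAsFactors = FALSE
)

start <- "1980-01-01"
end <- "1980-01-31"

# Keep events on or after 'start' and on or before 'end' (both ends inclusive).
in_window <- toy$date >= as.Date(start) & toy$date <= as.Date(end)
jan1980 <- toy[in_window, ]
print(jan1980$code)
```

Both boundary dates survive the filter, which is the behaviour the description promises.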
Event data for the Gulf States. Events are coded according to the CAMEO scheme by the KEDS Project. The event stream contains 29029 events occurring between 03/01/1992 and 07/31/2006 involving 411 sources and 397 targets.
KEDS Project
Parus Analytics: https://www.parusanalytics.com/eventdata/data.dir/gulf.html
Event data on Middle East. Events are coded according to the CAMEO scheme by the KEDS Project. The event stream contains 145709 events occurring between 15/04/1979 and 30/11/2011 involving 741 sources and 688 targets.
Original data comes from file REULE.201111.evt with documentation lines (marked as DOC DOC 999), match information, and duplicates removed.
KEDS Project
Parus Analytics: https://www.parusanalytics.com/eventdata/data.dir/levant.html
Aggregates events to a regular time interval
make_dyads( edo, scale = NULL, unit = c("week", "day", "month", "quarter", "year"), monday = TRUE, fun = mean, missing.data = NA )
edo |
Event data |
scale |
Name of an eventscale or NULL |
unit |
Temporal aggregation unit |
monday |
Whether weeks start on Monday. If FALSE, weeks start on Sunday |
fun |
Aggregation function. Should take a vector and return a scalar |
missing.data |
What weeks with no data are assigned |
In an event data set S, assume that A = length(actors(S)) actors and K = length(codes(S)) event codes occur. This function creates data streams labelled by the combination of source and target actors. If scale is NULL these are K-dimensional time series of event counts. If scale names a scale that has been added to the event data, fun is used to aggregate the events falling into each temporal interval. This creates a univariate interval-valued time series for each directed dyad.
See the vignette for more detail and a worked example.
A list of named dyadic aggregated time series
Will Lowe
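The scaled case (a named scale aggregated with fun = mean over Monday-based weeks) can be sketched in base R. This is not the package's code: the toy data frame, its column names, and the 'score' column standing in for an added event scale are all assumptions made for illustration.

```r
# Toy scaled event data; column names are assumptions.
toy <- data.frame(
  date   = as.Date(c("2001-01-01", "2001-01-03", "2001-01-09", "2001-01-10")),
  source = c("ISR", "ISR", "ISR", "PAL"),
  target = c("PAL", "PAL", "PAL", "ISR"),
  score  = c(-2, -4, 1, 3),
  stringsAsFactors = FALSE
)

# Label each event with the Monday that starts its week (%u: Monday is 1).
toy$week <- toy$date - (as.integer(format(toy$date, "%u")) - 1)

# A directed dyad is the ordered pair source -> target.
toy$dyad <- paste(toy$source, toy$target, sep = ".")

# Aggregate scores within each dyad and week using the mean.
weekly <- aggregate(score ~ dyad + week, data = toy, FUN = mean)
print(weekly)
```

Unlike make_dyads, this sketch does not fill weeks without events; the missing.data argument exists for that purpose.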
Creates a mapping function from list
make_fun_from_list(lst)
lst |
A list |
Turns a list of the form list(a=c(1,2), b=3) into a function that returns 'a' when given 1 or 2 as argument, 'b' when given 3, and otherwise gives back its argument unchanged.
This is a convenience function to make it possible to specify onto mappings using lists. The map_* functions use it internally, but you might find a use for it.
A function that inverts the mapping specified by lst
Will Lowe
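A minimal base R sketch of the idea follows. sketch_fun_from_list is a hypothetical re-implementation for illustration, not the package's function; it works on character codes, so inputs are converted to strings before lookup and unmatched values come back as strings.

```r
# Hypothetical re-implementation for illustration; not the package's own code.
sketch_fun_from_list <- function(lst) {
  # Build a lookup vector: names are the old values, entries are the new labels.
  lookup <- rep(names(lst), times = lengths(lst))
  names(lookup) <- as.character(unlist(lst, use.names = FALSE))
  function(x) {
    key <- as.character(x)
    hit <- key %in% names(lookup)
    out <- key                      # unmatched values pass through
    out[hit] <- unname(lookup[key[hit]])
    out
  }
}

f <- sketch_fun_from_list(list(a = c(1, 2), b = 3))
mapped <- f(c(1, 2, 3, 4))
print(mapped)
```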
Makes an event scale
make_scale( name, types = NULL, values = NULL, file = NULL, desc = "", default = NA, sep = "," )
name |
Name of scale |
types |
Array of event codes |
values |
Array of event code values |
file |
Input file defining event codes and their values |
desc |
Optional description of the scale |
default |
What to assign event codes that have no mapping in the scale. Defaults to NA |
sep |
Separator in file |
Makes an event scale from a specification found in a file or using the types and values parameters. If a file is specified it is assumed to be headerless and to contain event codes in the first column and numerical values in the second column.
Scales must be assigned a name and may also be assigned a description. If you wish to assign codes without a specified value to some particular value, set default to something other than NA.
An event scale object
Will Lowe
Aggregates actor codes
map_actors(edo, fun = function(x) x)
edo |
Event data |
fun |
Function or list specifying the aggregation mapping |
This function relabels actor codes according to fun, which may either be a function that returns the new name of an actor when handed the old one, or a list structured like list(fruit=c('tomato', 'orange'), veg=c('red pepper', 'carrot')).
This function can also be used as a renaming function, but it is most useful when multiple codes should be treated as equivalent.
For a detailed example of event code and actor aggregation functions, see the 'Actor Filtering' and 'Count Aggregation' sections of the vignette.
Event data with new actor codes
Will Lowe
Aggregates event codes
map_codes(edo, fun = function(x) x)
edo |
Event data |
fun |
Function or list specifying the aggregation mapping |
This function relabels event codes according to fun, which may either be a function that returns the new name of an event when handed the old one, or a list with entries of the form: lst[[newname]] = c(oldname1, oldname2).
It can also be used as a renaming function, but it is most useful when multiple codes should be treated as equivalent.
For a detailed example of event code and actor aggregation functions, see the 'Actor Filtering' and 'Count Aggregation' sections of the vignette.
Event data with new event codes
Will Lowe
Tries to remove duplicate events
one_a_day(edo)
edo |
Event data object |
This function removes duplicates of any event that involves the same source and target on the same date with the same event code, on the assumption that these are in fact the same event reported twice.
This function can also be applied as part of read_keds.
New event data object with duplicate events removed
Will Lowe
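The duplicate rule described above (same date, source, target, and event code) can be sketched with base R's duplicated(). This is an illustration under assumed column names, not the package's implementation.

```r
# Toy event data; column names are assumptions.
toy <- data.frame(
  date   = as.Date(c("1995-07-11", "1995-07-11", "1995-07-11", "1995-07-12")),
  source = c("SER", "SER", "SER", "SER"),
  target = c("BOS", "BOS", "BOS", "BOS"),
  code   = c("223", "223", "112", "223"),
  stringsAsFactors = FALSE
)

# Two events count as duplicates when all four key fields match.
key <- toy[, c("date", "source", "target", "code")]
deduped <- toy[!duplicated(key), ]
print(nrow(deduped))
```

Only the second row is dropped: the third differs in event code and the fourth in date.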
Plots scaled directed dyad
plot_dyad(dyad, ...)
dyad |
One directed dyadic time series from the list returned by make_dyads |
... |
Extra arguments to plot |
A convenience function to plot the named scale within a directed dyad against time.
Nothing, used for side effect
Will Lowe
Reads event data output files in free format
read_eventdata( d, col.format = "D.STC", one.a.day = TRUE, scrub.keds = TRUE, date.format = "%y%m%d", sep = "\t", head = FALSE )
d |
Names of event data files |
col.format |
Format for columns in d (see details) |
one.a.day |
Whether to apply the duplicate event remover |
scrub.keds |
Whether to apply the data cleaner |
date.format |
How dates are represented in the original file |
sep |
File separator |
head |
Whether there is a header row in d |
Reads event data output and optionally applies the scrub_keds cleaning function and the one_a_day duplicate removal filter.
This function assumes that d is a vector of output files. These are assumed to be sep-separated text files. The column ordering is given by the col.format parameter:
D the date field
S the source actor field
T the target actor field
C the event code field
L the event code label field (optional)
Q the quote field (optional)
. (or anything not shown above) an ignorable column
For example, the default "D.STC" format means that column 1 is the date, column 2 should be ignored, column 3 is the source, column 4 is the target, and column 5 is the event code. In this specification no quote or label columns are extracted.
The specification need only use the period to generate correct spacing. For example, if there are 10 fields in each line, the first five of which are date, something ignorable, source, target, and event code, and the remaining five fields are ignorable, then "D.STC" is sufficient to extract date, source, target, and event code.
The code plucks out just these columns, formats them appropriately and ignores everything else in the file. Only D, S, T, and C are required.
The format of the date field is given by date.format.
An event data set
Will Lowe
# the first 1000 lines of raw TABARI output for Levant data,
# (see data set "levant.cameo" for complete unlabeled data set)
lev1000 <- system.file("extdata", "levant.cameo.top1000.txt", package = "events")
evs1000 <- read_eventdata(lev1000, col.format = "DSTCL")
head(evs1000, 3)
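To make the col.format convention concrete, the following base R sketch maps each recognised letter to its column position. sketch_col_positions is a hypothetical helper written for illustration; how the package itself parses the format string is not shown in this documentation.

```r
# Hypothetical helper: map format letters to the column positions they occupy.
sketch_col_positions <- function(col.format) {
  chars <- strsplit(col.format, "")[[1]]
  # Anything other than these letters marks an ignorable column.
  keep <- chars %in% c("D", "S", "T", "C", "L", "Q")
  setNames(which(keep), chars[keep])
}

default_cols <- sketch_col_positions("D.STC")
print(default_cols)
```

With the default format, the date is read from column 1, source from column 3, target from column 4, and event code from column 5, matching the description above.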
Reads event data output files more robustly than read_eventdata
read_eventdata2( d, col.format = "D.STC", one.a.day = TRUE, scrub.keds = TRUE, date.format = "%y%m%d", sep = "\t", head = FALSE, verbose = TRUE )
d |
Names of event data files |
col.format |
Format for columns in d (see details) |
one.a.day |
Whether to apply the duplicate event remover |
scrub.keds |
Whether to apply the data cleaner |
date.format |
How dates are represented in the original file |
sep |
File separator |
head |
Whether there is a header row in d |
verbose |
Whether to update read progress and report unreadable lines |
Reads event data output and optionally applies the scrub_keds cleaning function and the one_a_day duplicate removal filter.
This function is slower but more robust to line noise than read_eventdata. Use this when that one fails.
This function assumes that d is a vector of output files. These are assumed to be sep-separated text files. The column ordering is given by the col.format parameter:
D the date field
S the source actor field
T the target actor field
C the event code field
L the event code label field (optional)
Q the quote field (optional)
. (or anything not shown above) an ignorable column
For example, the default "D.STC" format means that column 1 is the date, column 2 should be ignored, column 3 is the source, column 4 is the target, and column 5 is the event code. The optional quote and label columns are not searched for.
The code plucks out just these columns, formats them appropriately and ignores everything else in the file. Only D, S, T, and C are required.
The format of the date field is given by date.format.
An event data set
Will Lowe
# the first 1000 lines of raw TABARI output for Levant data,
# (see data set "levant.cameo" for complete unlabeled data set)
lev1000 <- system.file("extdata", "levant.cameo.top1000.txt", package = "events")
evs1000 <- read_eventdata2(lev1000, col.format = "DSTCL")
head(evs1000, 3)
Reads KEDS event data output files
read_keds( d, keep.quote = FALSE, keep.label = TRUE, one.a.day = TRUE, scrub.keds = TRUE, date.format = "%y%m%d" )
d |
Names of files of KEDS/TABARI output |
keep.quote |
Whether the exact noun phrase should be retained |
keep.label |
Whether the label for the event code should be retained |
one.a.day |
Whether to apply the duplicate event remover |
scrub.keds |
Whether to apply the data cleaner |
date.format |
How dates are represented in the first column |
Reads KEDS output and optionally applies the scrub_keds cleaning function and the one_a_day duplicate removal filter.
This function is a thin wrapper around read_eventdata, which is a thin wrapper around read.csv.
For noisy datasets read_keds2 is slower but more robust. Use that if this one fails.
This function assumes that d is a vector of KEDS/TABARI output files. These are assumed to be tab-separated text files in which the first field is a date in yymmdd format or as specified by date.format, the second and third fields are actor codes, the fourth field is an event code, the fifth field is a text label for the event type, and the sixth field is a quote: some kind of text from which the event code was inferred. Label and quote are optional and can be discarded when reading in.
An event data set
Will Lowe
Reads KEDS/TABARI event data output files more robustly than read_keds
read_keds2( d, keep.quote = FALSE, keep.label = TRUE, one.a.day = TRUE, scrub.keds = TRUE, date.format = "%y%m%d", verbose = TRUE )
d |
Names of files of KEDS/TABARI output |
keep.quote |
Whether the exact noun phrase should be retained |
keep.label |
Whether the label for the event code should be retained |
one.a.day |
Whether to apply the duplicate event remover |
scrub.keds |
Whether to apply the data cleaner |
date.format |
How dates are represented in the date column |
verbose |
Whether to show progress and report unreadable lines |
Reads KEDS/TABARI output and optionally applies the scrub_keds cleaning function and the one_a_day duplicate removal filter. This function is slower but more robust to line noise than read_keds. Use this when that one fails.
This function assumes that d is a vector of KEDS/TABARI output files. These are assumed to be tab-separated text files in which the first field is a date in yymmdd format or as specified by date.format, the second and third fields are actor codes, the fourth field is an event code, the fifth field is a text label for the event type, and the sixth field is a quote: some kind of text from which the event code was inferred. Label and quote are optional and can be discarded when reading in.
An event data set
Will Lowe
# the first 1000 lines of raw TABARI output for Levant data,
# (see data set "levant.cameo" for complete unlabeled data set)
lev1000 <- system.file("extdata", "levant.cameo.top1000.txt", package = "events")
evs1000 <- read_keds2(lev1000)
head(evs1000, 3)
Shows which event codes are covered by a scale
scale_codes(es)
es |
Eventscale |
Returns an array of event codes to which an eventscale assigns a value.
Array of scaleable event codes
Will Lowe
Checks coverage of scale for event data
scale_coverage(sc, edo)
sc |
An eventscale |
edo |
Event data |
Returns an array of event codes that occur in an event data set but are not assigned values by the scale. These are the codes that will, in subsequent processing, be assigned the scale's default value.
Array of unscaleable event codes
Will Lowe
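The coverage check amounts to a set difference between the codes in the data and the codes the scale knows about. The base R sketch below uses two hypothetical character vectors in place of real event data and a real eventscale.

```r
# Hypothetical inputs: codes seen in the data and codes the scale assigns values to.
data_codes   <- c("010", "042", "190", "223")
scaled_codes <- c("010", "190")

# Codes present in the data but absent from the scale.
unscaled <- setdiff(data_codes, scaled_codes)
print(unscaled)
```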
Gets scale scores for event codes
score(eventscale, codes)
eventscale |
An event scale |
codes |
Event codes |
Returns an array of scale values for the event codes given as the second argument, using the scale's default value for any code that is not recognized.
You should use this function to avoid relying on the internal structure of event scales. They are currently lists, but this may change.
Numerical values for each event code from the scale
Will Lowe
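The lookup-with-default behaviour can be sketched in base R. sketch_score and the named vector vals are hypothetical stand-ins; as noted above, the real internal structure of an event scale may change, so package code should call score itself.

```r
# Hypothetical lookup: 'values' is a named numeric vector of code -> score.
sketch_score <- function(values, codes, default = NA_real_) {
  out <- unname(values[as.character(codes)])
  out[is.na(out)] <- default   # unrecognized codes get the default
  out
}

vals <- c("010" = 1, "190" = -10)
scores <- sketch_score(vals, c("190", "999", "010"), default = 0)
print(scores)
```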
Removes well-known noise from KEDS output files
scrub_keds(edo)
edo |
An event data object |
This function applies the regular expression based cleaning routine from the KEDS website. This is a direct translation from the original Perl, which replaces capital 'O's and small 'l's with 0 and 1 respectively and removes the event code '---]', on the assumption that these are all output noise.
Event data
Will Lowe
Lists source actor codes
sources(edo)
edo |
Event data |
Lists all the actor codes that appear as a source in the event data in alphabetical order.
Array of actor codes
Will Lowe
data(levant.cameo)
src <- sources(levant.cameo)
head(src)
tail(src)
Hands back a function to spot the items it was given in (...)
spotter(...)
... |
The actor names for which the new function should return TRUE |
This is a convenience function that creates a function returning TRUE for exact matches to its arguments.
A function
Will Lowe
data("balkans.weis")
head(balkans.weis, 3)
sp <- spotter("SER", "SERMIL")
events <- filter_actors(balkans.weis, sp)
head(events, 3)
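The closure that spotter hands back can be approximated in a few lines of base R. sketch_spotter is a hypothetical equivalent written for illustration, not the package's source.

```r
# Hypothetical equivalent: capture the wanted names and test for exact matches.
sketch_spotter <- function(...) {
  wanted <- c(...)
  function(x) x %in% wanted
}

is_serb <- sketch_spotter("SER", "SERMIL")
matches <- is_serb(c("SER", "SERGOV", "SERMIL"))
print(matches)
```

Matching is exact, so "SERGOV" is not spotted even though it begins with "SER".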
Summarises a set of event data
## S3 method for class 'eventdata'
summary(object, ...)
object |
Event data object |
... |
Not used |
This is a compact summary of an event data object. For more detail consult the object itself. Currently the object is simply a data.frame with conventionally named columns, but that will almost certainly change to deal with larger datasets in later package versions. If your code uses the package's accessor functions then you won't feel a thing when this happens.
A short description of the event data
Will Lowe
Summarise an eventscale
## S3 method for class 'eventscale'
summary(object, ...)
object |
Scale |
... |
Not used |
Print summary statistics for an eventscale.
Nothing, used for side effect
Will Lowe
Lists target actor codes
targets(edo)
edo |
Event data |
Lists all the actor codes that appear as a target in the event data in alphabetical order.
Array of actor codes
Will Lowe
data(levant.cameo)
targs <- targets(levant.cameo)
head(targs)
tail(targs)
Event data for Turkey. Events are coded according to the CAMEO scheme by the KEDS Project. The event stream contains 54466 events involving 164 sources and 166 targets between 15/04/1979 and 31/03/1999.
Note: This is the data set with only leads coded (GULF99.zip)
KEDS Project
Parus Analytics: https://www.parusanalytics.com/eventdata/data.dir/turkey.html
A mapping of WEIS event codes to [-10,10] representing a scale of conflict and cooperation, developed by Joshua Goldstein and slightly extended for the KEDS project.
Note: This mapping does not cover all the event codes in balkans.weis.
Taken from the KEDS Project's documentation.
KEDS Project
Parus Analytics: https://www.parusanalytics.com/eventdata/data.dir/weis.html