EidosScribe/EidosHelpFunctions.rtf
Lines changed: 6 additions & 4 deletions
@@ -4689,7 +4689,11 @@ The separator between values is supplied by
4689
4689
\f1\fs18 sep
4690
4690
\f3\fs20 ; it is a comma by default, but a tab can be used instead by supplying tab (
4691
4691
\f1\fs18 "\\t"
4692
-
\f3\fs20 in Eidos), or another character may also be used.\
4692
+
\f3\fs20 in Eidos), or another character may also be used. If
4693
+
\f1\fs18 sep
4694
+
\f3\fs20 is the empty string
4695
+
\f1\fs18 ""
4696
+
\f3\fs20 , the separator between values is \'93whitespace\'94, meaning one or more spaces or tabs. When the separator is whitespace, whitespace at the beginning or the end of a line will be ignored.\
4693
4697
Similarly, the character used to quote string values is a double quote (
4694
4698
\f1\fs18 '"'
4695
4699
\f3\fs20 in Eidos), by default, but another character may be supplied in
@@ -4910,7 +4914,6 @@ See
4910
4914
\f3\fs20 will be returned; if not,
4911
4915
\f1\fs18 F
4912
4916
\f3\fs20 will be returned (but at present, an error will result instead).\cf0\
<pclass="p5"><b>Reads data from a CSV or other delimited file</b> specified by <spanclass="s2">filePath</span> and returns a <spanclass="s2">DataFrame</span> object containing the data in a tabular form.<spanclass="Apple-converted-space"></span>CSV (comma-separated value) files use a somewhat standard file format in which a table of data is provided, with values within a row separated by commas, while rows in the table are separated by newlines.<spanclass="Apple-converted-space"></span>Software from R to Excel (and Eidos; see the <spanclass="s2">serialize()</span> method of <spanclass="s2">Dictionary</span>) can export data in CSV format.<spanclass="Apple-converted-space"></span>This function can actually also read files that use a delimiter other than commas; TSV (tab-separated value) files are a popular alternative.<spanclass="Apple-converted-space"></span>Since there is substantial variation in the exact file format for CSV files, this documentation will try to specify the precise format expected by this function.<spanclass="Apple-converted-space"></span>Note that CSV files represent values differently than Eidos usually does, and some of the format options allowed by <spanclass="s2">readCSV()</span>, such as decimal commas, are not otherwise available in Eidos.</p>
385
385
<pclass="p5">If <spanclass="s2">colNames</span> is <spanclass="s2">T</span> (the default), the first row of data is taken to be a header, containing the string names of the columns in the data table; those names will be used by the resulting <spanclass="s2">DataFrame</span>.<spanclass="Apple-converted-space"></span>If <spanclass="s2">colNames</span> is <spanclass="s2">F</span>, a header row is not expected and column names are auto-generated as <spanclass="s2">X1</span>, <spanclass="s2">X2</span>, etc.<spanclass="Apple-converted-space"></span>If <spanclass="s2">colNames</span> is a <spanclass="s2">string</span> vector, a header row is not expected and <spanclass="s2">colNames</span> will be used as the column names; if additional columns exist beyond the length of <spanclass="s2">colNames</span> their names will be auto-generated.<spanclass="Apple-converted-space"></span>Duplicate column names will generate a warning and be made unique.</p>
386
386
<pclass="p5">If <spanclass="s2">colTypes</span> is <spanclass="s2">NULL</span> (the default), the value type for each column will be guessed from the values it contains, as described below.<spanclass="Apple-converted-space"></span>If <spanclass="s2">colTypes</span> is a singleton <spanclass="s2">string</span>, it should contain single-letter codes indicating the desired type for each column, from left to right.<spanclass="Apple-converted-space"></span>The letters <spanclass="s2">lifs</span> have the same meaning as in Eidos signatures (<spanclass="s2">logical</span>, <spanclass="s2">integer</span>, <spanclass="s2">float</span>, and <spanclass="s2">string</span>); in addition, <spanclass="s2">?</span> may be used to indicate that the type for that column should be guessed as by default, and <spanclass="s2">_</span> or <spanclass="s2">-</span> may be used to indicate that that column should be skipped – omitted from the returned <spanclass="s2">DataFrame</span>.<spanclass="Apple-converted-space"></span>Other characters in <spanclass="s2">colTypes</span> will result in an error.<spanclass="Apple-converted-space"></span>If additional columns exist beyond the end of the <spanclass="s2">colTypes</span> string their types will be guessed as by default.</p>
387
-
<pclass="p5">The separator between values is supplied by <spanclass="s2">sep</span>; it is a comma by default, but a tab can be used instead by supplying tab (<spanclass="s2">"\t"</span> in Eidos), or another character may also be used.</p>
387
+
<pclass="p5">The separator between values is supplied by <spanclass="s2">sep</span>; it is a comma by default, but a tab can be used instead by supplying tab (<spanclass="s2">"\t"</span> in Eidos), or another character may also be used.<spanclass="Apple-converted-space"></span>If <spanclass="s2">sep</span> is the empty string <spanclass="s2">""</span>, the separator between values is “whitespace”, meaning one or more spaces or tabs.<spanclass="Apple-converted-space"></span>When the separator is whitespace, whitespace at the beginning or the end of a line will be ignored.</p>
388
388
<pclass="p5">Similarly, the character used to quote string values is a double quote (<spanclass="s2">'"'</span> in Eidos), by default, but another character may be supplied in <spanclass="s2">quote</span>.<spanclass="Apple-converted-space"></span>When the string delimiter is encountered, <i>all</i> following characters are considered to be part of the string until another string delimiter is encountered, terminating the string; this includes spaces, comment characters, newlines, and everything else.<spanclass="Apple-converted-space"></span>Within a string value, the string delimiter itself is used twice in a row to indicate that the delimiter itself is present within the string; for example, if the string value (shown without the usual surrounding quotes to try to avoid confusion) is <spanclass="s2">she said "hello"</span>, and the string delimiter is the double quote as it is by default, then in the CSV file the value would be given as <spanclass="s2">"she said ""hello"""</span>.<spanclass="Apple-converted-space"></span>The usual Eidos style of escaping characters using a backslash is <i>not</i> part of the CSV standard followed here.<spanclass="Apple-converted-space"></span>(When a string value is provided <i>without</i> using the string delimiter, all following characters are considered part of the string except a newline, the value separator <spanclass="s2">sep</span>, the quote separator <spanclass="s2">quote</span>, and the comment separator <spanclass="s2">comment</span>; if none of those characters are present in the string value, the quote delimiter may be omitted.)</p>
389
389
<pclass="p5">The character used to indicate a decimal delimiter in numbers may be supplied with <spanclass="s2">dec</span>; by default this is <spanclass="s2">"."</span> (and so <spanclass="s2">10.0</span> would be ten, written with a decimal point), but <spanclass="s2">","</span> is common in European data files (and so <spanclass="s2">10,0</span> would be ten, written with a decimal comma).<spanclass="Apple-converted-space"></span>Note that <spanclass="s2">dec</span> and <spanclass="s2">sep</span> may not be the same, so that it is unambiguous whether <spanclass="s2">10,0</span> is two numbers (<spanclass="s2">10</span> and <spanclass="s2">0</span>) or one number (<spanclass="s2">10.0</span>).<spanclass="Apple-converted-space"></span>For this reason, European CSV files that use a decimal comma typically use a semicolon as the value separator, which may be supplied with <spanclass="s2">sep=";"</span> to <spanclass="s2">readCSV()</span>.</p>
390
390
<pclass="p5">Finally, the remainder of a line following a comment character will be ignored when the file is read; by default <spanclass="s2">comment</span> is the empty string, <spanclass="s2">""</span>, indicating that comments do not exist at all, but <spanclass="s2">"#"</span> is a popular comment prefix.</p>
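The quote-doubling rule described in the documentation above can be sketched in C++ as follows. This is a minimal illustration only, not the Eidos implementation: the helper names `quote_csv_field` and `unquote_csv_field` are hypothetical, and the decoder assumes its input begins and ends with the quote character.

```cpp
#include <string>

// Hypothetical helper (not part of Eidos): wraps a value in quote characters,
// doubling every quote character that occurs inside the value.
std::string quote_csv_field(const std::string &value, char quote = '"')
{
    std::string out(1, quote);
    for (char ch : value)
    {
        if (ch == quote)
            out += quote;   // emit the extra quote that marks an embedded one
        out += ch;
    }
    out += quote;
    return out;
}

// Hypothetical inverse: strips the surrounding quotes and collapses each
// doubled quote back to one. Assumes `field` starts and ends with `quote`.
std::string unquote_csv_field(const std::string &field, char quote = '"')
{
    std::string out;
    for (std::size_t i = 1; i + 1 < field.size(); ++i)
    {
        out += field[i];
        // A quote followed by another quote (still inside the content) is a
        // doubled pair; skip its second half.
        if (field[i] == quote && i + 2 < field.size() && field[i + 1] == quote)
            ++i;
    }
    return out;
}
```

With the default double quote, the value `she said "hello"` round-trips through the form `"she said ""hello"""`, matching the example in the documentation. Backslash escapes are deliberately not interpreted, since the documentation states they are not part of the CSV convention followed here.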
VERSIONS
Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ development head (in the master branch):
15
15
fix the error position reported for assignment into a non-existent property; this fixes a bug in SLiMgui's autofix feature, as a side effect (with, e.g., "sim.generation = 5")
16
16
revise recipe 6.1.2 (reading a recombination map from a file) to use readCSV() instead of readFile()
17
17
extend the subset() method of DataFrame to accept NULL for rows/cols, to take entire columns or entire rows respectively, for usability
18
+
extend readCSV() to allow sep="", meaning that the separator is "whitespace", as in R
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_readCSV): readCSV() requires that sep be a string of exactly one character." << EidosTerminate(nullptr);
578
+
if (sep_string.length() > 1)
579
+
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_readCSV): readCSV() requires that sep be a string of exactly one character, or the empty string \"\"." << EidosTerminate(nullptr);
580
580
if (quote_string.length() != 1)
581
581
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_readCSV): readCSV() requires that quote be a string of exactly one character." << EidosTerminate(nullptr);
582
582
if (dec_string.length() != 1)
583
583
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_readCSV): readCSV() requires that dec be a string of exactly one character." << EidosTerminate(nullptr);
584
584
if (comment_string.length() > 1)
585
585
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_readCSV): readCSV() requires that comment be a string of exactly one character, or the empty string." << EidosTerminate(nullptr);
586
586
587
-
char sep = sep_string[0];
587
+
char sep = (sep_string.length() ? sep_string[0] : 0); // 0 indicates "whitespace separator", a special case
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_readCSV): readCSV() requires sep, quote, dec, and comment to be different from each other." << EidosTerminate(nullptr);
EIDOS_TERMINATION << "ERROR (Eidos_ExecuteFunction_readCSV): readCSV() requires that dec be a printable, non-alphanumeric character that is not '+' or '-' (typically '.' or ',')." << EidosTerminate(nullptr);
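To illustrate what the `sep == 0` sentinel in the change above implies for line splitting, here is a minimal sketch. It is not the code from this commit: `split_line` is a hypothetical helper, it ignores quoting and comments entirely, and it assumes that "whitespace" means runs of spaces and tabs, as the documentation change describes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not the Eidos implementation): splits one line into
// fields. sep == 0 selects the "whitespace" separator, mirroring the sentinel
// value used in the diff. Quoting and comment handling are omitted.
std::vector<std::string> split_line(const std::string &line, char sep)
{
    std::vector<std::string> fields;

    if (sep == 0)
    {
        // Whitespace mode: one or more spaces/tabs separate values, and
        // whitespace at the start or end of the line is ignored.
        const std::string ws = " \t";
        std::size_t pos = 0;
        while (true)
        {
            std::size_t start = line.find_first_not_of(ws, pos);
            if (start == std::string::npos)
                break;                                   // only whitespace remains
            std::size_t end = line.find_first_of(ws, start);
            if (end == std::string::npos)
            {
                fields.push_back(line.substr(start));    // last field on the line
                break;
            }
            fields.push_back(line.substr(start, end - start));
            pos = end;
        }
        return fields;
    }

    // Single-character mode: every occurrence of sep ends a field, so two
    // adjacent separators yield an empty field.
    std::size_t start = 0;
    while (true)
    {
        std::size_t end = line.find(sep, start);
        if (end == std::string::npos)
        {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}
```

In this sketch, whitespace mode collapses repeated separators and so cannot yield an empty field, whereas single-character mode preserves empty fields. Whether the actual `readCSV()` behaves identically in every edge case is not confirmed by the diff shown here.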