Some of the tools and scripts in this gem use Ruby's `Logger` class to print information to `$stderr`. By default, the log level is set to `Logger::INFO`. For more verbose information, you can set the `LOG_LEVEL` environment variable to `DEBUG`:
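As a rough illustration of how a script could honor this variable, the sketch below builds a `$stderr` logger from `LOG_LEVEL`. The `build_logger` helper is hypothetical and not part of the gem; the fallback to `INFO` for unrecognized values is also an assumption.

```ruby
require 'logger'

# Hypothetical helper: builds a logger whose level comes from LOG_LEVEL.
# Falls back to INFO when the variable is unset or not a valid severity name.
def build_logger(env = ENV, io = $stderr)
  name = env.fetch('LOG_LEVEL', 'INFO').upcase
  valid = %w[DEBUG INFO WARN ERROR FATAL UNKNOWN]
  level = valid.include?(name) ? Logger.const_get(name) : Logger::INFO
  Logger.new(io, level: level)
end

logger = build_logger
logger.info('indexing started')       # emitted at the default level
logger.debug('raw Solr response ...') # emitted only when LOG_LEVEL=DEBUG
```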
@@ -124,23 +136,14 @@ To index into Solr, GeoCombine requires a Solr instance that is running the
```
$ bundle exec rake geocombine:index
```

-Indexes the `geoblacklight.json` files in cloned repositories to a Solr index running at http://127.0.0.1:8983/solr
-
-##### Custom Solr location
+If Blacklight is installed in the Ruby environment and a Solr index is configured, the rake task will use the Solr index configured in the Blacklight application (this is the case when invoking GeoCombine from your GeoBlacklight installation). If Blacklight is unavailable, the rake task will try to find a Solr instance running at `http://localhost:8983/solr/blacklight-core`.

-Solr location can also be specified by an environment variable `SOLR_URL`.
+You can also set the Solr instance URL using `SOLR_URL`:
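For example, a wrapper script might resolve the Solr URL along these lines. This is only a sketch: the `resolve_solr_url` helper is hypothetical, and the precedence shown (`SOLR_URL` first, then a Blacklight-configured URL, then the default) is an assumption rather than the gem's documented behavior.

```ruby
# Default location used when nothing else is configured.
DEFAULT_SOLR_URL = 'http://localhost:8983/solr/blacklight-core'

# Hypothetical helper. `blacklight_url` stands in for whatever URL the host
# Blacklight application has configured (nil when Blacklight is unavailable).
def resolve_solr_url(env: ENV, blacklight_url: nil)
  candidates = [env['SOLR_URL'], blacklight_url, DEFAULT_SOLR_URL]
  # Pick the first candidate that is present and not blank.
  candidates.find { |url| url && !url.strip.empty? }
end

puts resolve_solr_url
```

Invoking the task with the variable set would then look like `SOLR_URL=http://127.0.0.1:8983/solr/my-core bundle exec rake geocombine:index`, where `my-core` is a placeholder core name.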
### Harvesting and indexing documents from GeoBlacklight sites

GeoCombine provides a Harvester class and rake task to harvest and index content from GeoBlacklight sites (or any site that follows the Blacklight API format). Given that the configurations can change from consumer to consumer and site to site, the class provides a relatively simple configuration API. This can be configured in an initializer, a wrapping rake task, or any other Ruby context where the rake task or class would be invoked.
@@ -186,10 +189,6 @@ Crawl delays can be configured (in seconds) either globally for all sites or on
Solr's commitWithin option can be configured (in milliseconds) by passing a value under the commit_within key.
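To illustrate what the option controls, here is a sketch of a Solr JSON update command carrying `commitWithin`. The `add_command` helper and the sample document are hypothetical, and how the harvester itself passes the value to Solr is not shown here.

```ruby
require 'json'

# Hypothetical helper: wraps one document in a Solr JSON "add" command.
# commit_within is in milliseconds; nil leaves commit timing to Solr's config.
def add_command(doc, commit_within: nil)
  add = { 'doc' => doc }
  add['commitWithin'] = Integer(commit_within) if commit_within
  JSON.generate('add' => add)
end

# Ask Solr to make the document searchable within 10 seconds.
puts add_command({ 'id' => 'example-1' }, commit_within: '10000')
```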

-##### Debugging (default: false)
-
-The harvester and indexer will only `puts` content when errors happen. It is possible to see some progress information by setting the debug configuration option.
-
#### Transforming Documents

You may need to transform documents that are harvested for various purposes (removing fields, adding fields, omitting a document altogether, etc.). You can configure some Ruby code (a proc) that will take the document in, transform it, and return the transformed document. By default the indexer will remove the `score`, `timestamp`, and `_version_` fields from the documents harvested. If you provide your own transformer, you'll likely want to remove these fields in addition to the other transformations you provide.
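A transformer along these lines could be sketched as follows. The `suppressed_b` and `harvested_from_s` field names are hypothetical, returning `nil` to omit a document is an assumption, and how the proc is registered with the harvester is not shown.

```ruby
# Fields the indexer strips by default, per the paragraph above.
STRIPPED_FIELDS = %w[score timestamp _version_].freeze

# Hypothetical transformer: takes a harvested document (a Hash) and returns
# the transformed document, or nil to signal that it should be skipped.
TRANSFORMER = lambda do |doc|
  return nil if doc['suppressed_b'] # hypothetical flag for omitting a document

  doc.reject { |field, _| STRIPPED_FIELDS.include?(field) }
     .merge('harvested_from_s' => 'example-site') # hypothetical added field
end

harvested = { 'id' => 'abc', 'score' => 1.0, '_version_' => 42, 'timestamp' => 'now' }
p TRANSFORMER.call(harvested)
```

Keeping the default field removal inside a custom transformer avoids sending Solr-generated fields such as `_version_` back to the index.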