1. Running Spark and SnappyData as separate clusters:
Start Spark:
cd /opt/spark-2.1.1-bin-hadoop2.7/sbin
./start-master.sh
./start-slave.sh spark://localhost:7077
Start SnappyData:
mkdir -p /opt/snappydata-1.0.1-bin/work/localhost-locator-1
mkdir -p /opt/snappydata-1.0.1-bin/work/localhost-server-1
mkdir -p /opt/snappydata-1.0.1-bin/work/localhost-lead-1
cd /opt/snappydata-1.0.1-bin/bin/
./snappy locator start -dir=/opt/snappydata-1.0.1-bin/work/localhost-locator-1
./snappy server start -dir=/opt/snappydata-1.0.1-bin/work/localhost-server-1 -locators=localhost[10334] -heap-size=8g
./snappy leader start -dir=/opt/snappydata-1.0.1-bin/work/localhost-lead-1 -locators=localhost[10334] -spark.executor.cores=4
Run the example:
package org.apache.spark.examples.snappydata

import org.apache.spark.sql.{SnappySession, SparkSession}

/**
 * This example shows how an application can interact with SnappyStore in Split cluster mode.
 * By this mode an application can access metastore of an existing running SnappyStore. Hence it can
 * query tables, write to tables which reside in a SnappyStore.
 *
 * To run this example you need to set up a Snappy Cluster first. To do the same, follow the steps
 * mentioned below.
 *
 * 1. Go to SNAPPY_HOME. Your Snappy installation directory.
 *
 * 2. Start a Snappy cluster
 *    ./sbin/snappy-start-all.sh
 *    This will start a simple cluster with one data node, one lead node and a locator
 *
 * 3. Open Snappy Shell
 *    ./bin/snappy-sql
 *    This will open Snappy shell which can be used to create and query tables.
 *
 * 4. Connect to the Snappy Cluster. On the shell prompt type
 *    connect client 'localhost:1527';
 *
 * 5. Create a column table and insert some rows in SnappyStore. Type the followings in Snappy Shell.
 *
 *    CREATE TABLE SNAPPY_COL_TABLE(r1 Integer, r2 Integer) USING COLUMN;
 *
 *    insert into SNAPPY_COL_TABLE VALUES(1,1);
 *    insert into SNAPPY_COL_TABLE VALUES(2,2);
 *
 * 6. Run this example to see how this program interacts with the Snappy Cluster
 *    table (SNAPPY_COL_TABLE) that we created. This program also creates a table in SnappyStore.
 *    After running this example you can also query the table from Snappy shell
 *    e.g. select count(*) from TestColumnTable.
 *
 *    bin/run-example snappydata.SmartConnectorExample
 *
 */
object SmartConnectorExample {

  def main(args: Array[String]): Unit = {
    val builder = SparkSession
      .builder
      .appName("SmartConnectorExample")
      .master("spark://localhost:7077")
      // snappydata.connection property enables the application to interact with SnappyData store
      .config("snappydata.connection", "localhost:1527")

    args.foreach( prop => {
      val params = prop.split("=")
      builder.config(params(0), params(1))
    })

    val spark: SparkSession = builder
      .getOrCreate
    val snSession = new SnappySession(spark.sparkContext)

    println("\n\n #### Reading from the SnappyStore table SNAPPY_COL_TABLE #### \n")
    val colTable = snSession.table("SNAPPY_COL_TABLE")
    colTable.show(10)

    println(" #### Creating a table TestColumnTable #### \n")

    snSession.dropTable("TestColumnTable", ifExists = true)

    // Creating a table from a DataFrame
    val dataFrame = snSession.range(1000).selectExpr("id", "floor(rand() * 10000) as k")

    snSession.sql("create table TestColumnTable (id bigint not null, k bigint not null) using column")

    dataFrame.write.insertInto("TestColumnTable")

    println(" #### Write to table completed. ### \n\n" +
      "Now you can query table TestColumnTable using $SNAPPY_HOME/bin/snappy-shell")
  }
}
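The example above forwards each command-line argument of the form key=value to the session builder using prop.split("="). That call throws an ArrayIndexOutOfBoundsException for an argument without "=" and drops everything after a second "=" in the value. The following is a minimal sketch of a more defensive parser; ConfArgs and parse are hypothetical names introduced here for illustration and are not part of the SnappyData example.

```scala
// Hypothetical helper (not part of the original example): parses "key=value"
// arguments before they are handed to SparkSession.builder.config(key, value).
object ConfArgs {

  /** Splits each argument on the first '=' only, so values may contain '='. */
  def parse(args: Seq[String]): Map[String, String] =
    args.flatMap { arg =>
      arg.split("=", 2) match {
        case Array(key, value) if key.nonEmpty => Some(key -> value)
        case _ => None // skip malformed entries instead of throwing
      }
    }.toMap

  def main(args: Array[String]): Unit = {
    val conf = parse(Seq(
      "snappydata.connection=localhost:1527",
      "spark.executor.extraJavaOptions=-Dfoo=bar", // value contains '='
      "malformed"))                                 // no '=', ignored
    conf.foreach { case (k, v) => println(s"$k -> $v") }
  }
}
```

In the example's main method, the parsed map could then be applied with ConfArgs.parse(args).foreach { case (k, v) => builder.config(k, v) }. Whether to skip malformed arguments silently, as here, or to fail fast is a design choice.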
2. Running Spark and SnappyData integrated:
Getting Started with your Spark Distribution
If you are a Spark developer and already using Spark 2.1.1, the fastest way to work with SnappyData is to add SnappyData as a dependency, for instance by using the --packages option in the Spark shell.
Open a command terminal, go to the location of the Spark installation directory, and enter the following:
$ cd <Spark installation directory>
# Create a directory for SnappyData artifacts
$ mkdir quickstartdatadir
$ ./bin/spark-shell --conf spark.snappydata.store.sys-disk-dir=quickstartdatadir --conf spark.snappydata.store.log-file=quickstartdatadir/quickstart.log --packages "SnappyDataInc:snappydata:1.0.1-s_2.11"
This opens the Spark shell and downloads the relevant SnappyData files to your local machine. Depending on your network connection speed, it may take some time to download the files.
All SnappyData metadata, as well as persistent data, is stored in the directory quickstartdatadir. The spark-shell can now be used to work with SnappyData.
How to Use Spark-shell to run snappy-sql
After opening the Spark shell, import the SnappyData classes and create a SnappySession:
scala> import org.apache.spark.sql.{SnappySession, SparkSession}
scala> val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)
Then we can run SnappyData SQL through the session:
scala> snappy.sql("CREATE TABLE SNAPPY_COL_TABLE(r1 Integer, r2 Integer) USING COLUMN")
scala> snappy.sql("insert into SNAPPY_COL_TABLE VALUES(1,1)")
scala> snappy.sql("insert into SNAPPY_COL_TABLE VALUES(2,2)")
scala> snappy.sql("select count(*) from SNAPPY_COL_TABLE").show()
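Besides SQL strings, the same table can be read through the DataFrame API, as the SmartConnectorExample does with snSession.table. The following is a minimal sketch meant to be pasted into the spark-shell started as described above (for example in :paste mode). It assumes that `spark` is the session the shell provides and that SNAPPY_COL_TABLE has already been created and populated; the filter condition is only an illustration.

```scala
// Sketch for the spark-shell launched with the SnappyData package.
// Assumes SNAPPY_COL_TABLE already exists with the integer columns r1 and r2.
import org.apache.spark.sql.SnappySession

// `spark` is the SparkSession that spark-shell creates automatically.
val snappy = new SnappySession(spark.sparkContext)

// Load the column table as a DataFrame and print its rows.
val colTable = snappy.table("SNAPPY_COL_TABLE")
colTable.show()

// Apply a DataFrame filter instead of writing a SQL WHERE clause.
val filtered = colTable.filter("r1 > 1")
filtered.show()

// Count rows through the DataFrame API.
println(s"Row count: ${colTable.count()}")
```

Both routes go through the same SnappySession, so tables created with snappy.sql are visible to snappy.table and the other way round.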