Malcolm McRoberts's Public Library

  • WebAPI Installation Guide

    Overview

     This page describes the database setup, Maven build configuration, and deployment of OHDSI/WebAPI to an Apache Tomcat environment. The application is Java-based, packaged as a WAR, and should be deployable into any Java servlet container.

  • <profiles>
      <profile>
        <id>webapi-postgresql</id>
        <properties>
          <datasource.driverClassName>org.postgresql.Driver</datasource.driverClassName>
          <datasource.url>jdbc:postgresql://localhost:5432/OHDSI</datasource.url>
          <datasource.username>ohdsi_app_user</datasource.username>
          <datasource.password>app1</datasource.password>
          <datasource.dialect>postgresql</datasource.dialect>
          <datasource.ohdsi.schema>webapi</datasource.ohdsi.schema>
          <flyway.datasource.driverClassName>${datasource.driverClassName}</flyway.datasource.driverClassName>
          <flyway.datasource.url>${datasource.url}</flyway.datasource.url>
          <flyway.datasource.username>ohdsi_admin_user</flyway.datasource.username>
          <flyway.datasource.password>!PASSWORD!</flyway.datasource.password>
          <flyway.locations>classpath:db/migration/postgresql</flyway.locations>
        </properties>
      </profile>
    </profiles>
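
    With a profile like this in the WebAPI pom.xml, the WAR is built by activating the profile by id. The exact goals can vary with the checkout, but an invocation along these lines is typical:

      mvn clean package -P webapi-postgresql

    The resulting WAR (typically target/WebAPI.war) can then be copied into Tomcat's webapps/ directory.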

  • ATLAS Setup

     
      

     The following sections detail the process for setting up Atlas and its dependencies.

  • Third-party jars must be placed somewhere accessible to all nodes, such as /shared_data; here I use /shared_data/thirdparty_jars/.

      

    With direct Spark job submissions from the terminal, one can specify the --driver-class-path argument pointing to extra jars that should be provided to workers with the job. However, this does not work with this approach, so we must configure these paths for the front-end and worker nodes in the spark-defaults.conf file, usually in the /opt/spark/conf directory.

      
    spark.driver.extraClassPath /shared_data/thirdparty_jars/mysql-connector-java-5.1.35-bin.jar
    spark.executor.extraClassPath /shared_data/thirdparty_jars/mysql-connector-java-5.1.35-bin.jar
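
    With the connector on both classpaths, the driver class becomes visible to Spark's JDBC data source. A minimal sanity check, assuming a SQLContext named sqlCtx (as in the snippets below) and placeholder host, database, table, and credentials:

      # All connection details below are placeholders -- substitute your own.
      df = sqlCtx.read.format('jdbc').options(
          url='jdbc:mysql://dbhost:3306/mydb',  # placeholder host and database
          dbtable='mytable',                    # placeholder table name
          driver='com.mysql.jdbc.Driver',       # class shipped in the connector jar
          user='myuser',
          password='mypassword',
      ).load()
      df.printSchema()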

  • import pyspark_csv as pycsv
    sc.addPyFile('pyspark_csv.py')
      

    Read CSV data via SparkContext and convert it to a DataFrame

      
    plaintext_rdd = sc.textFile('hdfs://x.x.x.x/blah.csv')
    dataframe = pycsv.csvToDataFrame(sqlCtx, plaintext_rdd)
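
    From there the DataFrame behaves like any other; for example, it can be registered as a temporary table and queried with SQL (registerTempTable is the Spark 1.x API; the table name here is just an illustrative choice):

      dataframe.registerTempTable('blah')
      sqlCtx.sql('SELECT COUNT(*) FROM blah').show()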

  • Edit conf/spark-defaults.conf, adding both lines below.

      
    spark.driver.extraClassPath /path/to/my.jar
    spark.executor.extraClassPath /path/to/my.jar

  • os.system("kinit -k -t /home/USER/.keytabs/USER.keytab -p USER".replace("USER", user[0]))
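
    os.system works, but it silently ignores a failed kinit; a slightly more defensive sketch using subprocess, with the same per-user keytab layout assumed and user[0] holding the short username as above:

      import subprocess

      def kinit(username):
          # Obtain a Kerberos ticket from the per-user keytab before touching HDFS/YARN.
          # The keytab path layout is the same assumption as in the os.system call above.
          keytab = '/home/{0}/.keytabs/{0}.keytab'.format(username)
          subprocess.check_call(['kinit', '-k', '-t', keytab, '-p', username])

      kinit(user[0])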

    • If the user cannot be found in the passwd file, then the task-controller cannot complete the switch to that user, causing a failure.

        

      YARN has a similar mechanism.

        

      In the absence of a secure cluster, the TaskTracker does NOT use this. Instead, the task actually runs as the mapred user on each node, but the JobTracker reports it as the submitting user.

        

      Your options at this point are to:

        
         
      1. Place the user in /etc/passwd (and /etc/shadow) via something like adduser on every node that will launch tasks.
      2. OR configure each node to do passwd map lookups via an LDAP server that stores account information via the LDAP posixAccount standards.
        

      You don't mention which distro you're using, so it's hard to point you further than this.

    • Spark on YARN
      • /user/principal
      • Spark History Location
      • Spark Jar Location
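
      These locations typically end up in spark-defaults.conf too. A sketch of the relevant properties, where the principal, hostnames, and paths are placeholders (spark.yarn.jar is the Spark 1.x property for a pre-staged assembly):

        spark.yarn.principal             someuser@EXAMPLE.COM
        spark.yarn.keytab                /home/someuser/.keytabs/someuser.keytab
        spark.yarn.historyServer.address historyhost:18080
        spark.yarn.jar                   hdfs:///shared_data/spark/spark-assembly.jar

      Spark on YARN also stages job files under the submitting principal's HDFS home directory (/user/principal), which must exist and be writable by that user.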