A library for Spark that helps to standardize any input data (DataFrame) to adhere to the provided schema.

AbsaOSS/spark-data-standardization

Spark Data Standardization Library

License Release Java 8

  • DataFrame in
  • Standardized DataFrame out

Usage

Needed Provided Dependencies

The library needs the following dependencies to be included in your project:

"org.apache.spark" %% "spark-core" % SPARK_VERSION,
"org.apache.spark" %% "spark-sql" % SPARK_VERSION,
"za.co.absa" %% s"spark-commons-spark${SPARK_MAJOR}.${SPARK_MINOR}" % "0.6.1",

Usage in SBT:

"za.co.absa" %% "spark-data-standardization" % VERSION 
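
Putting the two snippets above together, a minimal `build.sbt` might look like the sketch below. It assumes a Scala 2.12 project on Spark 3.2.1 (the combination listed in the compatibility section of this README). The Scala patch version and the `x.y.z` library version are placeholders, and scoping the Spark and spark-commons artifacts as `Provided` is an assumption based on this section describing them as provided dependencies.

```scala
// build.sbt (sketch; the version values below are illustrative assumptions)
ThisBuild / scalaVersion := "2.12.15" // assumption: any supported 2.12.x release

val sparkVersion = "3.2.1"            // Spark version listed for Scala 2.12
val sparkMajor   = "3"
val sparkMinor   = "2"
val standardizationVersion = "x.y.z"  // placeholder: substitute the latest release

libraryDependencies ++= Seq(
  // Dependencies the library expects the host project to supply
  "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-sql"  % sparkVersion % Provided,
  "za.co.absa" %% s"spark-commons-spark${sparkMajor}.${sparkMinor}" % "0.6.1" % Provided,
  // The standardization library itself
  "za.co.absa" %% "spark-data-standardization" % standardizationVersion
)
```

If the application is not run on a cluster that already ships Spark, dropping the `Provided` scope on the Spark artifacts is likely necessary.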

Usage in Maven:

Scala 2.11

<dependency>
   <groupId>za.co.absa</groupId>
   <artifactId>spark-data-standardization_2.11</artifactId>
   <version>${latest_version}</version>
</dependency>

Scala 2.12

<dependency>
   <groupId>za.co.absa</groupId>
   <artifactId>spark-data-standardization_2.12</artifactId>
   <version>${latest_version}</version>
</dependency>

Scala 2.13

<dependency>
   <groupId>za.co.absa</groupId>
   <artifactId>spark-data-standardization_2.13</artifactId>
   <version>${latest_version}</version>
</dependency>

Spark and Scala compatibility

|       | Scala 2.11 | Scala 2.12 | Scala 2.13 |
|-------|------------|------------|------------|
| Spark | 2.4.7      | 3.2.1      | 3.2.1      |
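
For a project that cross-builds against several Scala versions, the pairing above can be encoded in the build so that the matching Spark version is picked automatically. The following is only a sketch of one way to do this in sbt, not this project's own build definition. The helper name `sparkVersionFor` is hypothetical, and the Scala patch versions are assumptions.

```scala
// build.sbt fragment (sketch): pick the Spark version per Scala binary version
crossScalaVersions := Seq("2.11.12", "2.12.15", "2.13.8") // patch versions are assumptions

// Hypothetical helper mirroring the compatibility table above
def sparkVersionFor(scalaBinary: String): String = scalaBinary match {
  case "2.11"          => "2.4.7"
  case "2.12" | "2.13" => "3.2.1"
  case other           => sys.error(s"Unsupported Scala binary version: $other")
}

libraryDependencies ++= {
  val sparkVersion = sparkVersionFor(scalaBinaryVersion.value)
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
    "org.apache.spark" %% "spark-sql"  % sparkVersion % Provided
  )
}
```

With this in place, a command such as `sbt +compile` would build each listed Scala version against its corresponding Spark release.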

How to Release

Please see this file for more details.

How to generate Code coverage report

sbt ++<scala.version> jacoco

The code coverage report will be generated at:

{project-root}/target/scala-{scala_version}/jacoco/report/html
