| dropDuplicates {SparkR} | R Documentation | 
Returns a new SparkDataFrame with duplicate rows removed, considering only the specified subset of columns. If no columns are specified, all columns are considered.
dropDuplicates(x, ...)

## S4 method for signature 'SparkDataFrame'
dropDuplicates(x, ...)
| x | A SparkDataFrame. | 
| ... | Column names, given either as a single character vector or as separate strings. If the first argument is a character vector of column names, any arguments that follow it are ignored. | 
A SparkDataFrame with duplicate rows removed.
dropDuplicates since 2.0.0
Other SparkDataFrame functions: SparkDataFrame-class, agg, alias, arrange, as.data.frame, attach,SparkDataFrame-method, broadcast, cache, checkpoint, coalesce, collect, colnames, coltypes, createOrReplaceTempView, crossJoin, cube, dapplyCollect, dapply, describe, dim, distinct, dropna, drop, dtypes, except, explain, filter, first, gapplyCollect, gapply, getNumPartitions, group_by, head, hint, histogram, insertInto, intersect, isLocal, isStreaming, join, limit, localCheckpoint, merge, mutate, ncol, nrow, persist, printSchema, randomSplit, rbind, registerTempTable, rename, repartition, rollup, sample, saveAsTable, schema, selectExpr, select, showDF, show, storageLevel, str, subset, summary, take, toJSON, unionByName, union, unpersist, withColumn, withWatermark, with, write.df, write.jdbc, write.json, write.orc, write.parquet, write.stream, write.text
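## Not run: 
sparkR.session()
path <- "path/to/file.json"
df <- read.json(path)
# Remove rows that are duplicated across all columns
dropDuplicates(df)
# Consider only col1 and col2, given as separate strings
dropDuplicates(df, "col1", "col2")
# Same columns, given as a single character vector
dropDuplicates(df, c("col1", "col2"))
## End(Not run)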
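
As a rough illustration of the column-subset behavior, the sketch below builds a small SparkDataFrame from hypothetical local data and counts the rows that remain after each call. The column names `id` and `label` and the sample values are made up for this example, and it assumes a local Spark installation that SparkR can find. Row counts are compared rather than row contents, since which of several duplicate rows is kept is not stated in this page.

```r
library(SparkR)

# Assumption: Spark is installed locally and discoverable by SparkR.
sparkR.session(master = "local[1]")

# Hypothetical sample data: rows 1 and 2 are identical,
# and row 3 shares only its `id` with them.
local_df <- data.frame(
  id    = c(1L, 1L, 1L, 2L),
  label = c("a", "a", "b", "c"),
  stringsAsFactors = FALSE
)
df <- createDataFrame(local_df)

# No column names: rows are compared on all columns.
# Distinct rows are (1, "a"), (1, "b"), (2, "c").
n_all <- nrow(dropDuplicates(df))

# Only `id` is considered, so one row per distinct id remains.
n_id <- nrow(dropDuplicates(df, "id"))

# Separate strings and a single character vector name the same subset.
n_sep <- nrow(dropDuplicates(df, "id", "label"))
n_vec <- nrow(dropDuplicates(df, c("id", "label")))

sparkR.session.stop()
```

Here `n_all`, `n_sep`, and `n_vec` should each be 3, while `n_id` should be 2, because restricting the comparison to `id` treats all three rows with `id` equal to 1 as duplicates of one another.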