Spark Developer

Experience: 8 to 10 Years   Bangalore   22 Mar, 2019
Job Location: Bangalore
Education: Not Mentioned
Salary: Not Disclosed
Industry: IT - Software
Functional Area: General / Other Software
Employment Type: Full-time

Job Description

  • Proven track record with HDFS and Unix commands.
  • Knowledge of extracting data from different sources, such as relational databases (DBMS) and NoSQL stores.
  • Experience managing Spark on an HDFS cluster.
  • Very good knowledge of Spark and Scala.
  • Ability to write MapReduce and Spark jobs.
  • Experience with open-source technologies used in big data analytics, such as Pig, Hive, HBase, and Kafka.
  • PySpark knowledge is a must, along with experience handling Impala and a Hive data lake.
  • Experience extracting text, mainly using OCR; knowledge of Tesseract would satisfy this requirement.
  • Willingness to learn, ability to think skeptically about problems and results, and curiosity to explore new techniques and domains.
  • Knowledge of design patterns (GoF) would be valuable for developing complex PySpark algorithms.
  • Data extraction from various file formats.
  • In-depth knowledge of Unix commands, especially for working with HDFS, Linux (Gentoo), and Spark.
  • Hands-on experience with the Hadoop ecosystem is a must.
  • Ability to work independently in a quickly evolving environment.
  • Familiarity with tools such as Team Foundation Server (TFS).
  • Ability to analyze data, identify issues such as gaps and inconsistencies, and perform root cause analysis.
  • Experience working with customers to identify and clarify requirements.
  • Ability to design solutions that are fit for purpose while keeping options open for future needs.
  • Strong verbal and written communication skills and good customer relationship skills.
  • Database: SQL, NoSQL
  • Hive, Impala, Spark SQL
  • Connecting R to Hive/Impala
  • Unix: Hadoop admin and Spark admin capabilities
  • Advanced Linux commands
  • Understanding of, and development experience with, multithreaded and distributed systems
  • Tools/Languages: Python (NumPy, pandas, scikit-learn (sklearn), NLTK)
  • Spark/PySpark (SQL, ML, GraphX, Streaming), including developing algorithms that are not available in PySpark
  • Notebooks (Jupyter Notebook, Zeppelin, Databricks)
  • OCR (Tesseract)
  • Data ingestion (Kafka streaming using Spark)
  • Data management: data extraction (different formats such as PDF, HTML, JPEG, and so on)
  • Data cleaning (creating text-format data for data scientists)
  • Data validation (text extracted via OCR must be validated and recorded)
  • Data loading (database connections, FTP connections, etc.)
  • Engineering: code packaging
  • ETL and environment configuration (versioning, package installation)

Key Skills:
engineering, open source, languages, DBMS, environment, Pig, tools, data extraction, data management, TFS, root cause analysis, clarify, multithreading, NoSQL, good algorithms

© 2019 Hireejobs All Rights Reserved