Capgemini. Posted Apr 1, 2026. Updated Apr 1, 2026. Source: careers.capgemini.com
Open Role
Verified listing with direct application source
Data Analyst
Eligibility Criteria
- Bachelor’s degree in Computer Science, IT, Mathematics, Statistics, or a related field
- Strong analytical and problem-solving skills
- Ability to work with large datasets
- Good communication skills
- Basic programming or data handling knowledge preferred
- Ability to work in a team environment

Candidates are expected to work with large datasets and support business decisions through data analysis.
Responsibilities
- Develop and maintain PySpark data pipelines
- Process and transform large datasets
- Perform data validation and cleansing
- Write efficient Python and PySpark code
- Ensure data quality and reliability
- Work with distributed computing systems
- Support business intelligence and reporting
- Collaborate with data engineering teams

This role focuses on building ETL/ELT workflows and managing large-scale data processing environments.
Job Details
Capgemini is hiring Data Analysts for its Hyderabad office to support data engineering and analytics workflows. The role involves developing data pipelines, processing large datasets, ensuring data quality, and supporting reporting systems. Candidates will work with distributed computing tools such as Spark and Databricks to manage enterprise-scale data environments.
Skills
- Ability to work with large datasets
- Python
- PySpark
- SQL
- ETL/ELT workflows
- Data transformation
- Data validation
- Data pipelines
- Data analysis
- Problem-solving
- Communication skills

The role requires working with distributed computing tools and transforming large datasets using technologies like Python and PySpark.
Education
- Bachelor’s degree in:
  - Computer Science
  - Information Technology
  - Mathematics
  - Statistics
  - Data Science
  - Engineering
  - Or related discipline