Job Description:
Big data and distributed computing are at the heart of Adaltas. We support our partners in deploying, maintaining, and optimizing some of the largest clusters in France. Adaltas is also an advocate of and an active contributor to Open Source; our latest focus is a new, fully open source Hadoop distribution: the TOSIT Data Platform (TDP).
During this internship, you will join the TDP project team and contribute to its development. You'll deploy and test out-of-the-box TDP Hadoop clusters, contribute code in the form of iterative improvements to the existing code base, share your knowledge of TDP in the form of customer-ready support resources, and gain hands-on experience with core Hadoop components such as HDFS, YARN, Ranger, Spark, Hive, and ZooKeeper.
This will be a major challenge, with a large number of new technologies and development practices that you will have to deal with from day one. In return for your dedication, you’ll finish your internship fully equipped to take on a role in the big data realm.
Presentation of the company
Adaltas specializes in Big Data, Open Source, and DevOps. We work both on-premises and in the cloud. The company is built on an open culture: we are proud of our open source contributions, which have helped users and companies around the world, and our articles share our knowledge of Big Data, DevOps, and many additional topics.
Skills required and acquired
Developing a TDP platform requires an understanding of Hadoop's distributed computing model and of how its core components (HDFS, YARN, etc.) work together to solve Big Data problems. A working knowledge of Linux and the command line is required.
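To make this concrete, here is a minimal, hypothetical PySpark sketch of how these components cooperate: HDFS stores the input file as distributed blocks, YARN schedules the executors, and Spark runs the computation. The application name and HDFS path are placeholders, not part of the TDP code base.

```python
# Minimal sketch, assuming a cluster node where HADOOP_CONF_DIR points at the
# Hadoop client configuration. The HDFS path is a hypothetical placeholder.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tdp-wordcount")
    .master("yarn")  # YARN allocates and schedules the executors
    .getOrCreate()
)

# HDFS serves the file as distributed blocks; Spark reads them in parallel
lines = spark.read.text("hdfs:///user/intern/sample.txt")

counts = (
    lines.rdd
    .flatMap(lambda row: row.value.split())  # one record per word
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)  # merged across executors
)

for word, count in counts.take(10):
    print(word, count)

spark.stop()
```

In practice, such a script would be launched with spark-submit from a node holding the cluster's client configuration.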
During the internship you will learn:
- Hadoop cluster management
- Hadoop cluster security including Kerberos and SSL/TLS certificates
- High availability (HA) of services
- Scalability in Hadoop clusters
- Monitoring and health assessment of services and workloads (see the sketch after this list)
- Fault tolerance in Hadoop clusters, including recovery of lost data after infrastructure failures
- Infrastructure as Code (IaC) with DevOps tools like Ansible and Vagrant
- Code collaboration using Git on both GitLab and GitHub
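As a flavor of the monitoring work mentioned above, the sketch below polls the HDFS NameNode's JMX endpoint and reports how many DataNodes are alive. It assumes a Hadoop 3 NameNode, whose web UI listens on port 9870 by default; the hostname is a hypothetical placeholder.

```python
# Hypothetical health-check sketch: poll the HDFS NameNode JMX endpoint.
# The hostname is a placeholder; 9870 is the default Hadoop 3 NameNode web port.
import json
from urllib.request import urlopen

NAMENODE_JMX = (
    "http://namenode.example.com:9870/jmx"
    "?qry=Hadoop:service=NameNode,name=FSNamesystemState"
)

def hdfs_datanode_status():
    """Return (live, dead) DataNode counts reported by the NameNode."""
    with urlopen(NAMENODE_JMX, timeout=5) as response:
        bean = json.load(response)["beans"][0]
    return bean["NumLiveDataNodes"], bean["NumDeadDataNodes"]

if __name__ == "__main__":
    live, dead = hdfs_datanode_status()
    print(f"live DataNodes: {live}, dead DataNodes: {dead}")
    if dead > 0:
        print("warning: some DataNodes are down, check the cluster")
```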
Responsibilities
- Familiarize yourself with the TDP distribution architecture and configuration methods
- Deploy and test secure and fault-tolerant TDP clusters
- Support the TDP knowledge base with troubleshooting guides, FAQs, and articles
- Participate in discussions on TDP program goals and roadmap strategies
- Proactively contribute ideas and code to drive iterative improvements in the TDP ecosystem
- Explore and analyze the differences between the major Hadoop distributions
Additional information
- Location: Boulogne-Billancourt, France
- Languages: French or English
- Start date: March 2022
- Duration: 6 months
Much of the digital world runs on open source software, and the Big Data industry is booming. This internship is an opportunity to gain valuable experience in both areas. TDP is currently the only truly open source Hadoop distribution, which makes this the right time to join us. As part of the TDP team, you will learn one of the key models for big data processing and participate in TDP's development and future roadmap. We believe this is an exciting opportunity and that you will be ready for a successful career in Big Data upon completing the internship.
Equipment available
A laptop with the following specifications:
- 32 GB of RAM
- 1TB SSD
- 8c/16t processor
A cluster composed of:
- 3x 28c/56t Intel Xeon Scalable Gold 6132
- 3x 192 GB RAM DDR4 ECC 2666 MHz
- 3x 14 480 GB SATA SSDs (Intel S4500, 6 Gbps)
It hosts both a Kubernetes cluster and a Hadoop cluster.
Salary and benefits
- 1200 €/month
- Meal vouchers
- Public transport pass
- Participation in one international conference
Past conferences we've attended include KubeCon, hosted by the Cloud Native Computing Foundation (CNCF), the Open Source Summit from the Linux Foundation, and FOSDEM.
For any further information requests and to submit your application, please contact David Worms: