We help companies identify best practices for developing a big data strategy, choosing the technologies to use, and building effective analytics. To run that process, we often hold “hacking sessions” for non-technical staff together with our customers. These are custom workshops that help identify the right use cases and the kinds of insight that could be obtained, analyze and map the company’s data landscape, support the redesign of business models or concepts, and calculate the ROI of a possible project.



Machine learning is the art and science of giving computers the ability to learn from data and solve problems without being explicitly programmed to do so. Over the last 10 years it has enabled many enterprises, from giant corporations to small, innovative startups, to overcome challenges once thought impossible.

In this course you will learn how to leverage its power yourself, from theoretical foundations to current practical methodologies, including in distributed environments, with a focus on hands-on exercises and real use cases.

Machine Learning (3 days)


  • Scope and motivations
  • Terminology and workflow
  • Typical pipelines
  • Approaches and algorithms
  • Algorithms in-depth
  • Use cases and demo


  • Feature Engineering
  • Advanced pipelines
  • Specialized Algorithms
  • Model selection and evaluation
  • Recommender systems
  • Use cases and demo

Large Scale

  • Spark MLlib and Big Data
  • Deep Learning

Python 4 Data Science

Python is a language with a simple syntax and a powerful set of libraries. It is an interpreted language with a rich programming environment, including a robust debugger and profiler. While it is easy for beginners to learn, it is widely used in many scientific fields for data exploration. This course is an introduction to the Python programming language for developers with prior programming experience in other languages. We cover data types, control flow and object-oriented programming, and introduce the libraries most widely used for data visualization and data science. You will learn how to leverage Python's power, from theoretical aspects to current practical methodologies, including in distributed environments, with a focus on hands-on exercises and real use cases.

Artificial Intelligence

The term Artificial Intelligence describes machines that mimic “cognitive” functions that humans associate with other human minds, such as “learning” and “problem solving”. The course introduces the basic techniques for implementing these functions: Machine Learning, Deep Learning with neural networks, and Reinforcement Learning.
In this course you will learn how to leverage their power yourself, from theoretical foundations to current practical methodologies, including in distributed environments, with a focus on hands-on exercises and real use cases.



Apache Spark plays a key role in modern big data platforms thanks to its performance, flexibility, modularity and integration with other technologies. Its ease of use and its abstraction over distributed computing make it accessible to a wide audience of developers. However, if it is used without an understanding of some core concepts, this abstraction can easily lead to poor performance and very inefficient use of cluster resources.

This course explains how Spark works and how to use it correctly, with an awareness of what happens under the hood, so that you can take full advantage of its features and achieve high performance and scalability.

Spark Core + Spark SQL (2 days)

day 1

  • BigData overview
  • Spark Story & Community
  • Spark vs Hadoop
  • Spark Integrations
  • Spark Build
  • Spark Deployment
  • How it works
  • API overview
  • First Job (LAB)
  • RDD API (LAB)

day 2

  • RDD vs DataFrame vs DataSet
  • DataFrame API (LAB)
  • Final project (LAB)
  • Tips & Tricks
  • Spark SQL
  • Spark SQL vs Hive vs Impala
  • Spark SQL API
  • Spark SQL Job (API)
  • Spark Thrift Server + BI connection

Spark Streaming + Spark ML (2 days)

day 1

  • Spark Streaming
  • Spark Streaming vs Storm vs Flink
  • Spark Streaming integrations
  • First stream Job (LAB)
  • Lambda Architecture
  • Advanced Streaming (LAB)
  • Spark for Machine learning
  • ML vs MLlib
  • Algorithms

day 2

  • Clustering: K-Means (LAB)
  • Recommendation: ALS (LAB)
  • Model Server with Lambda Architecture
  • Model selection and evaluation
  • Tips & Tricks
  • Data Science & Production
  • Spark Notebook


Apache Hadoop is an open-source framework for reliable, scalable, distributed computing. It consists of a few core modules, such as HDFS and YARN, surrounded by many related projects that provide computing engines, data storage systems, coordination services and much more.

In this complex and still-growing ecosystem, finding the right tools for each use case is hard.
This course introduces the main components of the ecosystem: what they do, how they work and how they can be used together to build complex platforms that serve different business needs.

Introduction to Hadoop (2 days)

Big Data Platforms

  • Overview
  • NoSQL benchmarking

Hadoop + Cloudera components

  • Hadoop vs RDBMS
  • Hadoop in Enterprise

Data Stores

  • HDFS Advanced
  • HBase Design & DataModel
  • Solr

Data Ingestion

  • Kafka
  • Sqoop
  • Flume

Data Analysis

  • Impala
  • Hive
  • MapReduce Concepts & Development
  • MapReduce Input & Output


  • Security Authentication
  • Security Authorisation
  • Hadoop Processes

In-Depth Administration (1 day)

Design & Setup

  • Hardware considerations
  • Software installation
  • Launch


  • Core config
  • Sanity tests
  • Mechanics
  • Resources management


  • Charts & Dashboards
  • Custom triggers, custom alerts
  • Integrations and REST API

Test & Benchmarks

  • Functionality Tests
  • Performance


  • Cloudera Manager
  • HDFS operations
  • Host maintenance
  • Disaster recovery
  • Troubleshooting


Cassandra is one of the most popular NoSQL databases for IoT, unstructured data and large OLTP workloads.
NoSQL is all about one of the most classic trade-offs in computer science, performance versus flexibility, and handling it correctly from both the development and the operations standpoint is crucial to a project's success.

In this course you will learn the main features of the available NoSQL solutions, then go in depth into Cassandra’s architecture, discovering its strengths, pitfalls and best practices so that you can make the right choices in an informed and autonomous manner.

Cassandra Core (2 days)

day 1

  • BigData and NoSQL overview
  • Installation and configuration (LAB)
  • Tools: nodetool, cqlsh, stress (LAB)
  • Replication and Consistency
  • Gossip
  • Data Model

day 2

  • CQL (LAB)
  • Write and Read Path (LAB)
  • Compaction and Tombstoning
  • Hardware best practices

Operations (1 day)

part 1

  • Environment
  • Adding nodes (LAB)
  • Remove, Decommission and Replace nodes (LAB)
  • Bootstrap and Cleanup
  • Hinted Handoff (LAB)
  • Repair (LAB)

part 2

  • Backup and Recovery
  • Security
  • DR and Multi-Datacenter
  • JVM tuning
  • Disk tuning

Data Model (1 day)

part 1

  • Logical Model
  • Conceptual model
  • Physical model
  • Data Types
  • How to validate the model

part 2

  • Transactions
  • Client Side Joins
  • Best practices
  • Workshop (LAB)

DataStax Platform and integrations (1 day)

part 1

  • DataStax overview
  • Solr Overview
  • Search fundamentals
  • Solr Queries (LAB)
  • Inverted Index and Document Scoring
  • DataStax integration
  • CQL Extensions (LAB)
  • Cassandra Spark Connector
  • Read from Cassandra

part 2

  • Write into Cassandra
  • Group by, Join and Partitioning with DataFrames
  • Lambda architecture


Lightbend courses.

  • Details coming soon!


The essential concepts of Blockchain technology.

Blockchain technology uses various kinds of decentralized structures to solve the problems that come with centralizing data and knowledge. These structures are not new, but their use in socially revolutionary projects has made blockchain a driver of the fourth industrial revolution. This course gives you an overview of the technology and shows how it can help your business solve practical problems or bring new products to market.

Blockchain - Basic


  • Consensus Algorithms (PoW, PoS, DPoS, PoB, PoE, …)
  • Wallets and security (Byzantine faults, DoS, 51% attacks, Sybil attacks, double spending, etc.)
  • Bitcoin (history and current state)
  • Bitcoin - Lightning Network (history and current state)
  • Ethereum and smart contracts (history and development tools)
  • General view on interesting and popular projects

Blockchain - Advanced


  • Privacy coins (Monero / Zcash)
  • Other DLT (IOTA and Hyperledger)
  • Exchange - trading
  • Managing a full node (Bitcoin / Ethereum / IOTA)
  • Smart contracts (LAB)
  • IOTA (LAB)


Details coming soon!

  • Docker & Kubernetes
  • Google GCP
  • Amazon AWS
  • Microsoft Azure



Scala is a general-purpose programming language that supports functional programming and has a strong static type system. It interoperates seamlessly with Java. Scala is the implementation language of many important frameworks, including Apache Spark, Kafka and Akka, and it powers core infrastructure at sites such as Twitter, Tumblr and Coursera. In this course you will discover the elements of the functional programming style and learn how to apply them usefully in your daily programming tasks. You will also learn to use the tools and frameworks most commonly used by developers in the enterprise world.


Python is a language with a simple syntax and a powerful set of libraries. It is an interpreted language with a rich programming environment, including a robust debugger and profiler. While it is easy for beginners to learn, it is widely used in many areas, from system administration and DevOps to web application development and data science. This course teaches you the fundamentals and contemporary usage of Python, with a focus on developing best practices for writing Python and exploring the extensible and unique parts of the language.


Improve your knowledge wherever you are and whenever you can!