Python has several libraries for data analysis, and one of the most important is Pandas. Pandas offers functionality similar to R, such as DataFrames for handling tabular data. Unfortunately, Pandas was designed to handle only small amounts of data: when the table (e.g. a CSV file) exceeds the size of the operating system's virtual memory (the same order of magnitude as the RAM in our PCs), Pandas stops working. Indeed, once the data size exceeds a certain threshold, different technologies are used; this is the context that is also called…

Contributing to open-source projects is a rewarding experience. Even when the contribution is only a small amount of code, knowing that those few lines will be used by many users, many times, in the future is really satisfying.

The workflow was simple: I went to the project's issue tracker, picked a new feature that I knew was feasible to implement, announced my intention to work on it, and got started.

The challenging part of this work was proving its mathematical correctness. I had the initial idea from the beginning, but…

Today I played a bit with Spark NLP. The idea is to use it to extract relationships between Named Entities and their Part of Speech tags in a very scalable way.

First, I installed Spark NLP and Python 3 (with Jupyter Notebook), and used Apache Spark 2.4.0.
I solved the configuration problem by setting the JAVA_HOME environment variable:

export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/"
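
When working from a Jupyter notebook, the same variable can also be set from Python instead of the shell. The following is only a minimal sketch: it reuses the path from the export line above, which is an assumption about where the JDK lives on a given machine, and it assumes the assignment runs before any Spark session is started in that kernel.

```python
import os

# Path copied from the export line above; adjust it to the local JDK install.
JAVA_HOME = "/usr/lib/jvm/java-8-openjdk-amd64/"

# Child processes (including the JVM that Spark launches) inherit this
# environment, so it should be set before the first SparkSession is created.
os.environ["JAVA_HOME"] = JAVA_HOME

print(os.environ["JAVA_HOME"])
```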

In Python, I imported all the required functions with:

import pyspark.sql.functions as F
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, ArrayType
from pyspark.sql.types import StructType, StructField
from sparknlp import DocumentAssembler
from sparknlp.annotator …

Today I stumbled upon a situation that can lead an experienced R programmer who is learning Python to completely misunderstand its syntax.

The problem arises because the R programming language allows the . (dot) character inside variable names. Many other programming languages, such as C, Java and Python, instead interpret the . (dot) character as an operator, much like the plus (+) operator.

Here’s an example, where a new function is defined with the name plot.lpp:

plot.lpp <- function(x, ...) { ... }

As this Cross Validated answer explains, there is additional meaning when a function…
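
For contrast, here is a minimal Python sketch of how the same dotted name is read on the Python side. It uses the standard `ast` module to show that `plot.lpp` is parsed as an attribute lookup on an object called `plot`, not as a single identifier. The function `plot_lpp` is a hypothetical stand-in for the R function above, written with an underscore as Python naming would require.

```python
import ast

# Parse the expression `plot.lpp` without executing it.
tree = ast.parse("plot.lpp")
node = tree.body[0].value

# Python sees an attribute access: object `plot`, attribute `lpp`.
is_attribute = isinstance(node, ast.Attribute)
object_name = node.value.id
attribute_name = node.attr


# Hypothetical Python counterpart of the R function: the dot is not allowed
# in a function name, so an underscore is used instead.
def plot_lpp(x, *args):
    return f"plotting {x}"


print(is_attribute, object_name, attribute_name)
print(plot_lpp("data"))
```

An R programmer reading `plot.lpp(x)` in Python code should therefore look for an object named `plot` that exposes an `lpp` attribute, rather than for a function whose name contains a dot.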

Antonio Ercole De Luca

Data Architect — Senior Software Developer
