Shopping Basket and Association Rules

Introduction

Whenever products are sold, whether in a supermarket or on a website, the notion of the consumer's shopping basket comes into play immediately.

Consumers behave in ways that can be analyzed through their purchases. Although this behavior is individual, it can follow a more general rule, which makes it possible to identify the offers that will genuinely appeal to consumers.

First of all, what do we call a shopping basket?

Let us look at an example taken from a database to get a sense of what a basket can represent:

Customer Basket
1808 tropical.fruit, whole.milk, rolls.buns, citrus.fruit, sugar, meat, candy, semi.finished.bread, napkins, long.life.bakery.product
2552 tropical.fruit, whole.milk, other.vegetables, pot.plants, butter, chocolate, root.vegetables, coffee, shopping.bags, female.sanitary.products, hygiene.articles
2300 pip.fruit, other.vegetables, frankfurter, fruit.vegetable.juice, sausage, pork, flour, white.wine, hygiene.articles, long.life.bakery.product

We have three baskets from three different customers. Milk, fruit, vegetables and fruit juice are among the products they bought. In the end, this is what any of us might buy when doing our shopping.

We quickly see that we can try to identify consumption profiles. With a more global approach, knowing which products go well together makes it possible to increase sales effectively.

A very common example, seen every year in France, is la Chandeleur (Candlemas) on 2 February. At that time of year, the French make crêpes. To make crêpes you need milk, eggs, flour, butter, sugar, jam and chocolate spread. During this period, supermarkets commonly run promotions on milk, flour and eggs.

This is admittedly an easy example. There are tools to search customer-basket databases efficiently and to build effective rules.
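One classic tool for this is the Apriori algorithm. The sketch below is a minimal pure-Python version, not necessarily the tool used in this article (which is not named): it keeps every itemset whose support, the share of baskets containing all of its products, reaches `min_support`, and only extends itemsets whose subsets are all frequent. The function name `apriori` and the four toy baskets are illustrative assumptions.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (share of baskets containing all its items) is >= min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)
    frequent = {}
    # Level 1: every single product is a candidate.
    candidates = {frozenset([item]) for b in baskets for item in b}
    k = 1
    while candidates:
        # Count the baskets containing each candidate and keep the frequent ones.
        level = {}
        for c in candidates:
            support = sum(1 for b in baskets if c <= b) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: a k-itemset can only be frequent if all its (k-1)-subsets are.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


baskets = [
    {"whole.milk", "rolls.buns"},
    {"whole.milk", "yogurt"},
    {"whole.milk", "rolls.buns", "yogurt"},
    {"rolls.buns"},
]
for itemset, support in sorted(apriori(baskets, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

Association rules are then read off the frequent itemsets by splitting each one into a left-hand side and a right-hand side.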

Association rules

Association rules are a standard tool used by competent retail groups.

Before looking at an example, let us review three indices that characterize the rules returned by the algorithms.

The support of a rule is the proportion of transactions in the database that contain all of its products together (left-hand and right-hand sides). It is usually low for complex baskets.

The confidence is the probability that a transaction also contains the associated product (right-hand side), given that it contains the products on the left-hand side. The higher, the better.

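The lift is the confidence divided by the support of the right-hand side: it compares how often the products are bought together with how often they would be if they were independent. It must be greater than 1 for the association to be positive.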
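The three indices can be written directly as functions of the baskets. This is a minimal sketch on four made-up baskets; the function names `support`, `confidence` and `lift` are our own.

```python
def support(baskets, items):
    """Share of baskets containing every product in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= set(basket)) / len(baskets)


def confidence(baskets, lhs, rhs):
    """Estimate of P(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support(baskets, set(lhs) | set(rhs)) / support(baskets, lhs)


def lift(baskets, lhs, rhs):
    """Confidence divided by support(rhs); 1 means lhs and rhs look independent."""
    return confidence(baskets, lhs, rhs) / support(baskets, rhs)


baskets = [
    {"whole.milk", "rolls.buns"},
    {"whole.milk", "yogurt"},
    {"whole.milk", "rolls.buns", "yogurt"},
    {"rolls.buns"},
]
lhs, rhs = {"whole.milk"}, {"rolls.buns"}
print(support(baskets, lhs | rhs))    # 0.5
print(confidence(baskets, lhs, rhs))  # 0.666...
print(lift(baskets, lhs, rhs))        # 0.888... (below 1)
```

Here the rule {whole.milk} => {rolls.buns} has a confidence of 2/3, yet its lift is below 1 because rolls.buns already appears in 75% of the baskets: a high confidence alone does not make an association interesting.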

lhs rhs support confidence lift
{whole.milk, chocolate, yogurt, root.vegetables, waffles} => {sliced.cheese} 0.001282709 1 19.297030
{other.vegetables, rolls.buns, yogurt, sausage, pastry} => {whole.milk} 0.004104669 1 2.182531
{citrus.fruit, chocolate, oil} => {other.vegetables} 0.003335044 1 2.655313
{citrus.fruit, pork, cream.cheese.} => {whole.milk} 0.002565418 1 2.182531
{other.vegetables, rolls.buns, yogurt, pastry} => {whole.milk} 0.008978964 0.8139535 1.776479
{other.vegetables, beef, bottled.water, detergent} => {soft.cheese} 0.001282709 0.8333333 22.09751

In this list, the first column (lhs) holds the products that, when bought together, lead to the purchase of the product in the second column (rhs).

This example shows how difficult such an approach is on a complex database. These raw rules must be systematically reworked and take considerable time to become usable.

Looking at the whole.milk example, we see that there are several ways to boost its sales. One might think that taking just the vegetables and the rolls would be enough, but that is a mistake: it is the combination of all the products that makes the basket coherent for the consumer. We want to please several consumer segments at the same time, hence the dilemma of building commercial offers that hit the mark every time.
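The rework described above often starts with simple filtering. This sketch takes the six rules from the table above, keeps those whose right-hand side is whole.milk and whose lift exceeds 1, and ranks them by confidence, then lift, then support. The ranking criteria and the name `rules_for` are an illustrative choice, not the author's procedure.

```python
# Rules copied from the table above: (lhs, rhs, support, confidence, lift).
RULES = [
    ({"whole.milk", "chocolate", "yogurt", "root.vegetables", "waffles"},
     "sliced.cheese", 0.001282709, 1.0, 19.297030),
    ({"other.vegetables", "rolls.buns", "yogurt", "sausage", "pastry"},
     "whole.milk", 0.004104669, 1.0, 2.182531),
    ({"citrus.fruit", "chocolate", "oil"}, "other.vegetables", 0.003335044, 1.0, 2.655313),
    ({"citrus.fruit", "pork", "cream.cheese."}, "whole.milk", 0.002565418, 1.0, 2.182531),
    ({"other.vegetables", "rolls.buns", "yogurt", "pastry"},
     "whole.milk", 0.008978964, 0.8139535, 1.776479),
    ({"other.vegetables", "beef", "bottled.water", "detergent"},
     "soft.cheese", 0.001282709, 0.8333333, 22.09751),
]


def rules_for(rules, target, min_lift=1.0):
    """Keep rules whose consequent is `target` and whose lift exceeds min_lift,
    sorted by confidence, then lift, then support (all descending)."""
    kept = [r for r in rules if r[1] == target and r[4] > min_lift]
    return sorted(kept, key=lambda r: (r[3], r[4], r[2]), reverse=True)


for lhs, rhs, sup, conf, lft in rules_for(RULES, "whole.milk"):
    print(sorted(lhs), "=>", rhs, sup, conf, lft)
```

The shorter basket {other.vegetables, rolls.buns, yogurt, pastry} comes last, with a confidence of about 0.81 against 1 for the full basket that adds sausage, which matches the point made above.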

Reversing the process

We have seen that association rules give a raw but coherent result on shopping baskets. But a business cannot always work this way. For example, it may have large stocks of a product to sell for which association rules do not seem to help.

Using a simple logistic regression or an analysis of variance (ANOVA), we can search all the baskets for the products whose effect is statistically significant. The model's significance tests then give a statistical basis for validating such a product association.

In our example, we look for the most reliable way to get a can of beer into the basket by choosing other products.
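A minimal sketch of such a screen, assuming one product at a time rather than the multi-product model the article uses: with a single yes/no predictor, the logistic regression of "canned.beer is in the basket" has a closed-form fit (the log odds ratio of a 2x2 table), and a Wald test gives its p-value. The baskets are synthetic (salty.snack is made to go with canned.beer, flour is not), and `screen_products` is a hypothetical name. A multi-product model would need a library such as statsmodels.

```python
import math


def screen_products(baskets, target, alpha=0.01):
    """Screen each product with a one-predictor logistic regression:
        logit P(target in basket) = b0 + b1 * [product in basket]
    For a single binary predictor, the MLE of b1 is the log odds ratio of the
    2x2 table and its Wald standard error is sqrt(1/n11 + 1/n10 + 1/n01 + 1/n00).
    Returns (product, odds ratio, p-value) for positive, significant effects."""
    baskets = [set(s) for s in baskets]
    products = sorted({item for s in baskets for item in s} - {target})
    results = []
    for product in products:
        n11 = sum(1 for s in baskets if product in s and target in s)
        n10 = sum(1 for s in baskets if product in s and target not in s)
        n01 = sum(1 for s in baskets if product not in s and target in s)
        n00 = len(baskets) - n11 - n10 - n01
        if 0 in (n11, n10, n01, n00):
            # An empty cell makes the MLE infinite; the Haldane-Anscombe
            # correction (+0.5 per cell) gives a finite approximation instead.
            n11, n10, n01, n00 = n11 + 0.5, n10 + 0.5, n01 + 0.5, n00 + 0.5
        log_or = math.log((n11 * n00) / (n10 * n01))
        se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
        p_value = math.erfc(abs(log_or / se) / math.sqrt(2))  # two-sided Wald test
        if p_value < alpha and log_or > 0:
            results.append((product, math.exp(log_or), p_value))
    return sorted(results, key=lambda r: r[2])


# Synthetic baskets (made up for illustration): salty.snack is strongly
# associated with canned.beer; flour is independent of it by construction.
groups = [({"salty.snack", "canned.beer"}, 120), ({"salty.snack"}, 80),
          ({"canned.beer"}, 100), (set(), 700)]
baskets = [set(items) for items, count in groups for _ in range(count)]
for i in range(0, len(baskets), 2):
    baskets[i].add("flour")

for product, odds_ratio, p in screen_products(baskets, "canned.beer"):
    print(f"{product}: odds ratio {odds_ratio:.1f}, p = {p:.2g}")
```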

At a 1% significance level (p < 0.01), the ANOVA gives the following list:

whole.milk, rolls.buns, brown.bread, root.vegetables, pastry, ham, red.blush.wine, sugar, ice.cream, frozen.vegetables, salty.snack, canned.vegetables

Looking at this basket, we have most likely found a consumer segment: a consumer who cooks little or not at all and who likes sweet things.

Conclusion

By working effectively with basket data, we can identify customer segments based on the products they consume.

The last example is striking: we can quickly sketch a picture of the basket and of the consumer behind it.

 

Predicting the French Ligue 1 Championship

Win, lose or draw: these are the three possible outcomes of a football match. To simulate a championship in this setting, simple yet effective models such as logistic regression can be used.
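The article does not describe how its model is specified. As a hedged sketch, assume each club has a hypothetical strength rating and that the home-win/draw/away-win probabilities follow an ordered logistic model (a logistic regression suited to three ordered outcomes); the remaining fixtures are then simulated many times and the final points averaged. The ratings, cut-points and function names are illustrative assumptions; the (wins, draws, losses) format follows the tables below.

```python
import math
import random


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def outcome_probs(strength_diff, cut_low=-0.4, cut_high=0.4):
    """Ordered-logit probabilities (home win, draw, away win).
    strength_diff = home rating - away rating; the cut-points (hypothetical)
    set how often matches end in a draw."""
    p_away = _sigmoid(cut_low - strength_diff)
    p_away_or_draw = _sigmoid(cut_high - strength_diff)
    return 1.0 - p_away_or_draw, p_away_or_draw - p_away, p_away


def simulate_season(standings, fixtures, strengths, n_sims=10_000, seed=0):
    """standings: club -> (wins, draws, losses) so far; fixtures: remaining
    (home, away) pairs; strengths: club -> hypothetical rating.
    Returns (club, mean final points) pairs, best first."""
    rng = random.Random(seed)
    total = dict.fromkeys(standings, 0.0)
    for _ in range(n_sims):
        points = {club: 3 * w + d for club, (w, d, _losses) in standings.items()}
        for home, away in fixtures:
            p_home, p_draw, _ = outcome_probs(strengths[home] - strengths[away])
            r = rng.random()
            if r < p_home:
                points[home] += 3
            elif r < p_home + p_draw:
                points[home] += 1
                points[away] += 1
            else:
                points[away] += 3
        for club, pts in points.items():
            total[club] += pts
    return sorted(((club, total[club] / n_sims) for club in standings),
                  key=lambda cp: cp[1], reverse=True)
```

With the 15/11/2020 table below, `standings` would map, for example, "Paris Saint-Germain" to (8, 0, 2); the remaining fixtures and the ratings would have to come from a fitted model.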

Here are the league standings as of 15/11/2020:

Club W D L Points
Paris Saint-Germain 8 0 2 24
Lille OSC 5 4 1 19
Stade Rennais FC 5 3 2 18
Olympique de Marseille 5 3 1 18
AS Monaco 5 2 3 17
OGC Nice 5 2 3 17
Olympique Lyonnais 4 5 1 17
Montpellier HSC 5 2 3 17
Angers SCO 5 1 4 16
FC Metz 4 3 3 15
RC Lens 4 2 2 14
Girondins de Bordeaux 3 3 4 12
Stade Brestois 29 4 0 6 12
FC Nantes 3 2 4 11
AS Saint-Étienne 3 1 6 10
Stade de Reims 2 3 5 9
FC Lorient 2 2 6 8
Nîmes Olympique 2 2 6 8
RC Strasbourg 2 0 8 6
Dijon FCO 0 4 6 4

Using these data as the basis for our regression model, the predicted final standings are:

Club W D L Points
Paris Saint-Germain 35 0 3 105
Montpellier HSC 25 3 10 78
Stade Rennais FC 18 17 3 71
Lille OSC 17 16 5 67
AS Monaco 18 10 10 64
Olympique de Marseille 17 13 8 64
RC Lens 15 17 6 62
Angers SCO 18 5 15 59
Olympique Lyonnais 12 22 4 58
FC Metz 14 16 8 58
OGC Nice 14 10 14 52
Girondins de Bordeaux 13 10 15 49
AS Saint-Étienne 15 4 19 49
FC Nantes 11 13 14 46
Stade de Reims 12 7 19 43
Nîmes Olympique 12 5 21 41
Stade Brestois 29 10 0 28 30
FC Lorient 6 3 29 21
Dijon FCO 0 18 20 18
RC Strasbourg 3 1 34 10

If we also include the previous season when building the model, to refine it, the predicted final standings remain exactly the same as in the table above.

We will see the actual final standings at the end of the season.

Rugby Activities

The main mission of video analysts is to code matches and training sessions, individually and collectively ("which player did what, and when"). This work is carried out with sophisticated software that handles sequencing as well as the management, compilation and visualization of the collected data. Some tools, especially those dedicated to internal communication, can nevertheless be used by all club members. Integrating all these tools is part of an ever more intense quest to master the environment and, therefore, to achieve results.

The results of these analyses can be reinterpreted to present the information in a new way.
For example, we can process a match and compare all the players according to their activities. Once they are compared, we can group them together. This is what the following image shows, from the final of the 2018 French rugby championship:

The value of such a solution is immediately apparent. Here the grouping, represented by the red lines, is computed over the whole match. It can also be computed as the match unfolds (after the first 30 events).
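The red lines suggest a dendrogram, i.e. hierarchical clustering of players by activity profile; that is our assumption, since the method is not named. A minimal average-linkage sketch follows, with made-up players and three hypothetical activity counts (runs, passes, kicks); each column is standardized first so that frequent actions do not dominate the distance.

```python
import math
import statistics


def standardize(profiles):
    """Z-score each activity column so frequent actions don't dominate distances."""
    cols = list(zip(*profiles.values()))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant column: avoid /0
    return {name: tuple((v - m) / s for v, m, s in zip(vals, means, sds))
            for name, vals in profiles.items()}


def average_linkage(profiles, k):
    """Agglomerative clustering (average linkage) until k groups remain."""
    z = standardize(profiles)
    clusters = [[name] for name in z]

    def distance(c1, c2):
        return sum(math.dist(z[a], z[b]) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters


# Made-up players with (runs, passes, kicks) counts over one match.
profiles = {
    "Player 1": (20, 2, 1), "Player 2": (18, 3, 0),
    "Player 3": (2, 15, 12), "Player 4": (1, 16, 11),
}
print(average_linkage(profiles, k=2))
```

Recording the merge distances instead of stopping at `k` groups would give the full dendrogram; cutting it at a chosen height is what a line across the tree represents.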

This same data can also be used to analyze the spatial side of a match, using the coordinates of each event.
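A spatial view can start from a simple grid count of event coordinates. This sketch assumes events recorded as (player, x, y) in metres on a 100 m by 70 m field and a 10 by 7 grid; these dimensions, the event format and the name `heat_map` are assumptions. Passing `player` restricts the count to a single player.

```python
def heat_map(events, length=100.0, width=70.0, nx=10, ny=7, player=None):
    """Count events per cell of an nx-by-ny grid over the field.
    events: (player, x, y) with 0 <= x <= length and 0 <= y <= width (metres).
    Returns grid[row][col]: rows follow y, columns follow x."""
    grid = [[0] * nx for _ in range(ny)]
    for who, x, y in events:
        if player is not None and who != player:
            continue  # keep only the requested player's events
        col = min(int(x / length * nx), nx - 1)  # clamp points on the far edge
        row = min(int(y / width * ny), ny - 1)
        grid[row][col] += 1
    return grid


events = [("A", 5, 5), ("B", 95, 65), ("A", 99.9, 69.9), ("A", 100, 70)]  # made up
whole_match = heat_map(events)
player_a = heat_map(events, player="A")
print(whole_match[6][9], player_a[6][9])  # 3 2
```

Filtering `events` by time window before counting would give the kind of dynamic view shown for the evolution of a match.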
Example of the geographic analysis of an entire match:

We can also see the same information at the level of a player:

Or we can see the dynamic evolution of a match:

 

Sports Solution.


We offer analytics solutions, with all your analytics managed on a dedicated platform: injury detection, player analytics, training optimization, player selection and real-time match optimization.

We will streamline your data management while automatically updating your website and mobile application with new analytics and the game schedule in real time.

We have extensive experience implementing different software solutions at every level.

Our approach is iterative: we deliver just the solution needed, so that all our energy goes into the quality of our work.


Performance Prediction.

One of the expanding areas necessitating good predictive accuracy is sport prediction, due to the large monetary amounts involved in betting. Club managers and owners are striving for classification models so that they can understand and formulate strategies needed to win matches. These models are based on numerous factors involved in the games, such as the results of historical matches, player performance indicators, and opposition information.


Visualization Solutions

With easy-to-understand visualizations, we offer a compact summary of the game, player performance and tactics. We aim to enable coaches to analyze the physical, technical and tactical performance of players, teams and games, all before the sweat dries.


Case study: Super Rugby Union.

This extraordinary rugby championship needs no introduction. In this study, we cover match and player data from 2014 to 2016. This use case gives a global view of rugby analytics. It does not cover training or medical analytics, which are part of a team's day-to-day life; our experience can be combined with more specific information to cover more analytics. The goal is to show analytics use cases, from simple views such as player characteristics to performance prediction.


Use case: Compare your players' characteristics

More than 1,000 players are identified in the Super Rugby data, from beginners to experienced players, across more than 100 competition matches. The players are compared statistically. This may seem strange at first, but each player is different and has a speciality, and that is exactly what the following map represents.

We selected a short list of players so that their names can be read clearly. From this data, we categorize player styles into three areas. The first area, on the right of the map, corresponds to runs, scrums and mauls: the players in this area have skills oriented toward forward roles. Keven Mealamu, an All Black and one of the best hookers in New Zealand, is among the best at runs, mauls and scrums, which is exactly what a hooker needs, and the map makes this clear. The second area corresponds to passes, possessions and turnovers; it describes players who play quickly across the field. Looking at Sosene Anesi, we can infer his role in the game: he is a wing or a fullback, exactly the kind of player who needs free space to advance. The third area describes players who kick, score goals and drop goals; these players win with their boot. Kicking is one aspect of a rugby game, which is why this speciality is not open to everyone.
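The map's method is not stated. Assuming it is a principal component analysis (PCA) of per-player statistics, a minimal pure-Python sketch follows: statistics are standardized, and the first two principal axes, found by power iteration on the correlation matrix, give each player's map coordinates. In practice a library implementation such as scikit-learn's `PCA` would be used; `pca_map` and the toy data are illustrative.

```python
import math
import statistics


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _orthonormalize(v, basis):
    """Remove the components of v along `basis`, then normalize (None if ~0)."""
    for u in basis:
        proj = _dot(v, u)
        v = [a - proj * b for a, b in zip(v, u)]
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v] if norm > 1e-12 else None


def _leading_axis(C, basis=(), iters=200):
    """Power iteration for the largest eigenpair of symmetric C orthogonal to `basis`."""
    p = len(C)
    starts = [[1.0 / (i + 1) for i in range(p)]] + \
             [[float(i == j) for i in range(p)] for j in range(p)]
    candidates = (_orthonormalize(s, basis) for s in starts)
    v = next(c for c in candidates if c is not None)
    for _ in range(iters):
        w = _orthonormalize([_dot(row, v) for row in C], basis)
        if w is None:  # no variance left outside `basis`
            break
        v = w
    return _dot(v, [_dot(row, v) for row in C]), v


def pca_map(rows):
    """rows: one tuple of statistics per player (at least two statistics).
    Returns 2-D map coordinates and the share of variance of each axis."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    X = [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]
    p = len(cols)
    # Covariance of standardized data = correlation matrix (non-constant columns).
    C = [[sum(x[i] * x[j] for x in X) / len(X) for j in range(p)] for i in range(p)]
    l1, v1 = _leading_axis(C)
    l2, v2 = _leading_axis(C, basis=[v1])
    total = sum(C[i][i] for i in range(p))
    coords = [(_dot(x, v1), _dot(x, v2)) for x in X]
    return coords, (l1 / total, l2 / total)


# Made-up per-match averages: (runs, mauls, passes, kicks) for four players.
stats = [(12, 6, 5, 0), (11, 7, 4, 1), (4, 1, 14, 2), (3, 0, 6, 15)]
coords, explained = pca_map(stats)
for pc1, pc2 in coords:
    print(f"{pc1:+.2f} {pc2:+.2f}")
print("explained variance:", [round(e, 2) for e in explained])
```

Plotting the axis vectors `v1` and `v2` alongside the players would give a biplot, where each statistic points toward the area of the map it defines.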


Use case: Comparing fullbacks' characteristics

This map uses the same orientation as the previous one. For example, Israel Dagg makes more runs than Adam Ashley-Cooper. These two players have their own style of play: where Israel Dagg takes on defenders, Ashley-Cooper tends to choose a long kick over the defence.


Use case: Team selection

Team selection is more than a simple analysis: measuring the impact of a player or a group of players requires modeling a match. In this use case, we selected the 2015 Super Rugby match between the Hurricanes and the Sharks, which the Hurricanes won 32–24. The Hurricanes are a strong team with many great players; they are based in Wellington, New Zealand, while the Sharks are based in Durban, South Africa. We ran simulations on many matches to build a complex model, and this model gave a score for this match relatively close to the final result:

Score Hurricanes Sharks
Actual 32 24
Estimated 28 22

The objective of this use case is to change the Sharks team so that it wins the match. By selecting the best player at certain positions, we should obtain a better score for the Sharks. The Sharks' original team composition was:


After some research across the championship, we identified the best players at certain positions, selecting at least one player per line. The new Sharks team would have this composition:


This composition is not realistic, but it shows what the team would need to look like to change the score efficiently. We can compare the Hurricanes, the Sharks and the new Sharks to see how they differ. Clearly, the Sharks and the Hurricanes have different styles of play: the Hurricanes' game relies more on possession and runs, whereas the Sharks prefer kicks, pilfers and turnovers. The model clearly chose to increase the Sharks' capacity for possession and runs.
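The selection step described above (best players at certain positions, with the model favoring possession and runs) can be sketched as a weighted score per player. The weights, statistic names, players and functions below are hypothetical; the article's match model is not described.

```python
# Hypothetical weights favouring possession and runs (not the article's model).
WEIGHTS = {"possession": 1.0, "runs": 1.0, "kicks": 0.3, "turnovers": 0.5}


def player_score(stats, weights=WEIGHTS):
    """Weighted sum of a player's per-match statistics (missing stats count as 0)."""
    return sum(w * stats.get(name, 0.0) for name, w in weights.items())


def best_by_position(candidates, positions, weights=WEIGHTS):
    """candidates: (name, position, stats) tuples.
    Returns {position: name} with the best-scoring candidate per requested position."""
    chosen = {}
    for position in positions:
        pool = [c for c in candidates if c[1] == position]
        if pool:  # skip positions with no candidate
            chosen[position] = max(pool, key=lambda c: player_score(c[2], weights))[0]
    return chosen


candidates = [
    ("Player A", "hooker", {"runs": 5, "possession": 3}),
    ("Player B", "hooker", {"runs": 2, "possession": 2, "kicks": 1}),
    ("Player C", "wing", {"runs": 8, "turnovers": 1}),
]
print(best_by_position(candidates, ["hooker", "wing", "fullback"]))
# {'hooker': 'Player A', 'wing': 'Player C'}
```

A real selection would then feed the new line-up back into the match model to check that the predicted score actually changes.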

Research Methods and Statistics


Research Methodology and Statistical Reasoning covers topics ranging from what a variable is to when to use a two-way ANOVA.

Statistics are widely used in social sciences, business, and daily life. Given the pervasive use of statistics, this tool aims to train participants in the rationale underlying the use of statistics.

This solution aims to explain when to apply which statistical procedure, the concepts that govern these procedures, common errors when using statistics, and how to get the best analysis out of your data.

Research methodology is used as a basis to explain statistical reasoning.

Designing research methods requires knowledge of various methods and an understanding of data.

The comprehensive nature of the solution ensures that professionals are not only able to understand, but also apply the content.

There are several possible research designs. First, you must decide whether you are doing quantitative, qualitative or mixed-methods research. In a quantitative study, you assess participants' responses on a measure; for example, participants can rate their level of agreement on a scale. A qualitative design is typically a semi-structured interview that is transcribed, from which the themes shared among participants are derived. A mixed-methods project combines a quantitative and a qualitative study.

Predict violent crime for 2013 in Chester, PA


This is a study of crime prediction in the city of Chester, PA. Prediction is a hard task. With little data available, the best option is an exponential smoothing method to predict the violent crime rate. For 2013, violent crime should decrease slightly, following the current trend.
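The article does not say which exponential method it uses. Assuming Holt's linear exponential smoothing (exponential smoothing with an added trend term, which suits the trend mentioned above), a minimal sketch follows; `alpha` and `beta` are smoothing parameters that would normally be tuned on past years, and the series in the example is a toy line, not the Chester data.

```python
def holt_forecast(series, alpha, beta, horizon=1):
    """Holt's linear exponential smoothing forecast, `horizon` steps ahead.
    alpha smooths the level and beta the trend (both between 0 and 1)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]  # simple initialization
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + horizon * trend


print(holt_forecast([1, 2, 3, 4], alpha=0.5, beta=0.5))  # 5.0 (a perfect line is extrapolated)
```

Each offense (murder, rape, robbery, assault) would be forecast separately, which is why the predicted components need not add up exactly to the predicted total.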

Predicted crime rates per 100,000 inhabitants:

Crime Actual value Predicted for 2013
Violent crime 3174 3165.4
Murder and nonnegligent manslaughter 64 64.2
Forcible rape 50 51.7
Robbery 637 635
Aggravated assault 2423 2414

Spatial modeling


Variogram

Experimental variogram

We begin the analysis with variogram analysis.

The farther apart two points are, the larger the semivariance, which points to strong geographic disparities.

Variogram at 0°, 45°, 90° and 135°

The variogram at 0° varies little and resembles a nugget effect, whereas at the other angles the semivariance tends to increase more with distance.
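Directional variograms like these can be computed with the classical (Matheron) estimator: for each distance bin, gamma(h) is the sum of squared differences over the N(h) pairs in that bin, divided by 2N(h). In this sketch, angles are measured counterclockwise from the x axis with a ±22.5° window; that convention is an assumption (gstat, for instance, measures clockwise from north), so 0° here may not match the article's 0°. The function name and the toy points are illustrative.

```python
import math


def experimental_variogram(points, lag, n_lags, direction=None, tolerance=22.5):
    """Classical (Matheron) semivariogram estimate.
    points: (x, y, z) samples; lag: bin width; direction: angle in degrees,
    counterclockwise from the x axis (None = omnidirectional);
    tolerance: half-width of the angular window in degrees.
    Returns [(mean distance, gamma, number of pairs)] for non-empty bins."""
    sums = [0.0] * n_lags
    dists = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            dx, dy = xj - xi, yj - yi
            h = math.hypot(dx, dy)
            if direction is not None:
                angle = math.degrees(math.atan2(dy, dx)) % 180  # pairs have no sign
                diff = abs(angle - direction % 180)
                if min(diff, 180 - diff) > tolerance:
                    continue  # pair outside the angular window
            k = int(h // lag)
            if k < n_lags:
                sums[k] += (zi - zj) ** 2
                dists[k] += h
                counts[k] += 1
    return [(dists[k] / counts[k], sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k]]


pts = [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 0, 3)]  # toy samples along the x axis
print(experimental_variogram(pts, lag=1.0, n_lags=4))
# [(1.0, 0.5, 3), (2.0, 2.0, 2), (3.0, 4.5, 1)]
print(experimental_variogram(pts, lag=1.0, n_lags=4, direction=90))  # []
```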

Using a variogram model

To apply kriging, a linear prediction method, we need a variogram model that represents our data. We use a spherical model for our variograms. The model is fairly close to the variograms at all four angles and seems a good compromise, although it strongly attenuates some angles and amplifies others.
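Fitting a spherical model can be sketched as a least-squares search over candidate nugget, partial sill and range values. The grid search keeps the code dependency-free, whereas dedicated tools use weighted nonlinear fitting; passing the number of pairs per lag as `weights` is one common option. Parameter grids and names are illustrative assumptions.

```python
import itertools


def spherical(h, nugget, psill, a):
    """Spherical model: 0 at h = 0, nugget + psill once h reaches the range a."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + psill
    r = h / a
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def fit_spherical(lags, gammas, nuggets, psills, ranges, weights=None):
    """Grid-search least squares: returns the (nugget, psill, range) candidate
    minimizing sum(weight * (gamma - model(lag)) ** 2)."""
    weights = weights or [1.0] * len(lags)

    def sse(params):
        return sum(w * (g - spherical(h, *params)) ** 2
                   for h, g, w in zip(lags, gammas, weights))

    return min(itertools.product(nuggets, psills, ranges), key=sse)


lags = [0.5, 1.0, 1.5, 2.0, 2.5]
gammas = [spherical(h, 0.0, 1.0, 2.0) for h in lags]  # synthetic "experimental" values
print(fit_spherical(lags, gammas, [0.0, 0.5], [0.5, 1.0, 1.5], [1.0, 2.0, 3.0]))
# (0.0, 1.0, 2.0)
```

Fitting one model to the pooled directions is the compromise described above: each direction then sits somewhat above or below the common curve.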

Kriging on a spatial grid

The result of kriging is as follows. As seen in the exploratory analysis, a vertical axis divides the area into two distinct parts, with the lowest values along this axis and larger values on the left and right sides. The grid representation clearly shows the distribution of the variable Y, with two weakly polluted zones in the middle of heavily polluted zones.
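Ordinary kriging estimates each grid node as a weighted average of the samples, with weights that solve a linear system built from the variogram model. A minimal sketch follows, with a spherical model (hypothetical parameters) and a small Gaussian-elimination solver; it also returns the ordinary kriging variance, which depends only on the sample layout and the variogram model, not on the observed values. Function names and the two-sample example are illustrative.

```python
import math


def spherical(h, nugget=0.0, psill=1.0, a=1.0):
    """Spherical variogram model (0 at h = 0, nugget + psill beyond range a)."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + psill
    r = h / a
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ordinary_kriging(points, target, variogram):
    """points: (x, y, z) samples; target: (x, y); variogram: function of distance.
    Solves  sum_j w_j * gamma_ij + mu = gamma_i0  (for each i)  and  sum_j w_j = 1,
    then returns (estimate, kriging variance = sum_i w_i * gamma_i0 + mu)."""
    n = len(points)

    def gamma(p, q):
        return variogram(math.dist(p[:2], q[:2]))

    A = [[gamma(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])  # unbiasedness constraint: weights sum to 1
    b = [gamma(p, target) for p in points] + [1.0]
    *weights, mu = solve(A, b)
    estimate = sum(w * p[2] for w, p in zip(weights, points))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance


def model(h):
    return spherical(h, nugget=0.0, psill=1.0, a=2.0)  # hypothetical parameters


print(ordinary_kriging([(0, 0, 0.0), (2, 0, 2.0)], (1, 0), model))  # (1.0, 0.875)
```

Calling `ordinary_kriging` for every node of a regular grid yields both the kriged map and a map of the kriging variance.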

Prediction error

Prediction errors are very large to the right and left of the vertical axis. They stem from the compromise made in our variogram model: on the left and right, strong values lie close to low values, and the linear model, by smoothing the values, produces these errors.

Expertise

We are an Innovative Technology Leader.

Experts in development, data science and cloud solutions, we focus our energy on building your solution on time and on budget.

We manage complex solutions and infrastructure for our customers. We plan, identify and build solutions that match our customers' needs.


We support all Industries.

We work with and deliver solutions for major banks across the world. We develop applications for industry, government, health and entertainment. Our work is to deliver great, high-performance solutions to our customers around the world.


We carry out many kinds of data analytics and machine learning. By studying your data, we find patterns that reveal new segments and new possibilities. With an innovative approach that mixes modern software and methods, we build the best solution to exploit your data.

As a strong contributor to the development world, we know which solution fits a given problem. When managing data flows, complex data transformations or computations, we closely adapt the final solution to these needs. The performance of our solutions is a crucial part of our strategy.

Deep Data Analytics


We are experts at sampling, transforming, evaluating and analyzing your data.

We estimate the multifactorial densities of your systems.

We search for patterns, segments or clusters.

We perform complex sampling and sample weighting adjustment.

We manage the design of your experiments, process quality, performance analysis, customer satisfaction, service quality evaluation, and Six Sigma and Lean solutions.

We build factorial, hierarchical, mixed and optimal experimental designs.

We perform correspondence analysis and principal component analysis on any kind of data.

We study the seasonality and trend of your time series. We perform complex sampling of your data, including strata and quotas.

We validate the effects in your data with parametric and non-parametric tests.

We analyse your geostatistical data to determine effects and correlations.

We automate your data analysis and results on any platform or statistical software.

We optimize data transformation and computation to reduce memory use and run time.

We run massive and parallel computations.