Data Modeling

Our data modeling work applies the tools and methods best suited to your needs.

We use several families of models. For regression, we fit multiple and complex regression models to your data, including Ridge, Lasso and PLS models.

We can build complex statistical models tailored to your data.

We model time series for forecasting, using methods such as exponential smoothing and ARIMA models.

We take the seasonality of your time series into account during the modeling process.

We use spatial regression models to analyze geolocated data. This kind of model applies to geographical applications such as health or marketing.

We manage your longitudinal data by following and modeling cohorts and panels.

We model complex data structures with indirect or latent variables.

We build complex classification models, including logistic regression and discriminant analysis.

For complex modeling we use support vector machines.

We support unsupervised methods such as Kohonen maps and association rules.

We support supervised methods such as regression trees and random forests, neural networks and deep learning, boosting and bagging.

We work on text mining, streaming, graphs and social networks.

We build complex visualization systems with rich interactions.

We support survival and risk models such as Cox models, as well as multi-state and competing risks models.

Data Analytics

Deep Data Analytics.
We are experts in sampling, transforming, evaluating and analyzing your data. We estimate the multifactorial densities of your systems. We search for patterns, segments and clusters. We perform complex sampling and weighting adjustments, including stratified and quota sampling.

We manage the design of your experiments, process quality, performance analysis, customer satisfaction, service quality evaluation, and Six Sigma and Lean solutions. We build factorial, hierarchical, mixed and optimal experimental designs.

We apply correspondence analysis and component analysis to any kind of data. We study the seasonality and trend of your time series. We validate the effects in your data with parametric and non-parametric tests. We analyze your geostatistical data to determine effects and correlations.

We automate your data analysis and reporting on any platform or statistical software. We optimize data transformations and calculations to reduce memory use and run time. We run massive and parallel computations.


Surface-Mount Capacitor

Introduction

The study has two objectives:

  • Choose the type of capacitor that has the longest life.
  • Estimate the average service life of the surface-mount capacitor under normal operating conditions:
    • Voltage: 50 V
    • Temperature: 50 °C

Method for evaluating the average life of the surface-mount capacitor

Reliability is expensive. But the cost of poor quality and reliability is even more expensive. Reliability must be supported throughout the life cycle of the product. Predicting reliability is very important for electronic components such as capacitors, diodes and resistors.

To estimate the average lifetime of the surface-mount capacitor, we set up an accelerated life test. Capacitors last several years, which makes tests under normal conditions very long. Accelerated life tests generally shorten the time to failure by applying sustained stress to the product.

In this study, two non-controllable parameters are stressed: temperature and voltage. Under this stress the capacitors fail within a reasonable test time, and the observed failure times allow us to estimate the service life under average conditions of use.

The value measured in these tests is the MTTF (Mean Time To Failure), the average operating time before failure.

Description of the model and the parameters of the study

The parameters of the study are:

  • Y : Average operating time to failure (in h)
  • 2 Production factors (controllable factors):
    • A: The type of dielectric (composition of the dielectric). 2 levels A1 and A2.
    • B: Operating temperature of the production process. 2 levels B1 and B2.
  • 2 Environmental factors (non-controllable factors):
    • C: Voltage (operating factor). 4 levels: C1 = 200 V, C2 = 250 V, C3 = 300 V and C4 = 350 V.
    • D: Ambient temperature (environmental factor). 2 levels: D1 = 175 °C and D2 = 190 °C.

The mathematical model for the production factors takes into account only the second-order interaction between factors A and B.

We use the Taguchi method, which minimizes the variation of product performance in response to noise factors while maximizing the variation in response to signal factors. Signal-to-noise (S/N) ratios can be calculated using the options of Taguchi's robust design. There are three ratios:

  • "Larger is better", to maximize the response: S/N = -10 * log10(Σ(1/Y²) / n)
  • "Smaller is better", to minimize the response: S/N = -10 * log10(Σ(Y²) / n)
  • "Nominal is best", based on the mean and the standard deviation

Choice of factorial designs

Design for production factors

L4 full factorial design for the production factors:

Dielectric  Production temperature  Lifetime
1           1
1           2
2           1
2           2

The model on the production factors (mean, main effects A and B, and the AB interaction) has 4 degrees of freedom. With the AB interaction, the least common multiple (LCM) is 4. For orthogonality, the number of runs in the design must be a multiple of this LCM (4). And to respect the number of degrees of freedom, the number of runs must be at least 4.

Experimental design for environmental factors

L8 full factorial design for the environmental factors. We begin by recoding the 4-level voltage factor as two 2-level factors:

Voltage (V)  Voltage 1  Voltage 2
200          1          1
250          1          2
300          2          1
350          2          2

The final experimental design is:

Voltage 1  Voltage 2  Temperature (°C)  Lifetime
1          1          175
1          1          190
1          2          175
1          2          190
2          1          175
2          1          190
2          2          175
2          2          190
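The recoding of the voltage factor and the resulting eight runs can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the study; the names `VOLTAGE_CODING`, `TEMPERATURES_C` and `design` are hypothetical, and the coding of 350 V as (2, 2) is an assumption that follows from the eight-run design listing a (2, 2) combination.

```python
from itertools import product

# Assumed coding of the 4-level voltage factor as two 2-level factors.
VOLTAGE_CODING = {(1, 1): 200, (1, 2): 250, (2, 1): 300, (2, 2): 350}
TEMPERATURES_C = (175, 190)

# Full factorial design: every combination of Voltage 1, Voltage 2 and temperature.
design = list(product((1, 2), (1, 2), TEMPERATURES_C))

for v1, v2, temperature in design:
    voltage = VOLTAGE_CODING[(v1, v2)]
    print(f"Voltage 1={v1}  Voltage 2={v2}  Temperature={temperature} C  (voltage {voltage} V)")
```

The loop prints one line per run, in the same order as the design table, with the real voltage recovered from the two coded factors.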

Postulated model. For orthogonality, the number of runs in the design must be a multiple of k (8). And to respect the number of degrees of freedom, the number of runs must be at least 8.

Complete factorial design with results

Data for the full L4 + L8 experimental design

Voltage level                          1     1     2     2     3     3     4     4
Operating temperature level            1     2     1     2     1     2     1     2
Dielectric  Production temperature                                                  Mean    Std dev.  S/N larger is better  S/N nominal is best
1           1                        430   950   560   210   310   230   250   230  396.25  254.5     49.2                  3.85
1           2                       1080  1060   890   450   430   320   340   430  625     326.8     53.3                  5.63
2           1                        890  1060   680   310   310   310   250   230  505     325.5     50.5                  3.8
2           2                       1100  1080  1080   460   620   370   580   430  715     317.8     54.96                 7.04
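The summary columns of this table can be recomputed from the eight lifetimes of each row. The sketch below is a minimal illustration in plain Python. It assumes a base-10 logarithm, the sample standard deviation (n - 1 denominator), and the form 10 * log10(mean² / variance) for the "nominal is best" ratio, because these choices reproduce the tabulated values. The function and variable names are hypothetical.

```python
import math
import statistics

# Lifetimes (hours) per (dielectric level, production temperature level),
# over the eight environmental conditions, copied from the table above.
lifetimes = {
    (1, 1): [430, 950, 560, 210, 310, 230, 250, 230],
    (1, 2): [1080, 1060, 890, 450, 430, 320, 340, 430],
    (2, 1): [890, 1060, 680, 310, 310, 310, 250, 230],
    (2, 2): [1100, 1080, 1080, 460, 620, 370, 580, 430],
}

def sn_larger_is_better(ys):
    """S/N = -10 * log10(mean of 1/Y^2)."""
    return -10 * math.log10(sum(1 / y**2 for y in ys) / len(ys))

def sn_nominal_is_best(ys):
    """Assumed form: S/N = 10 * log10(mean^2 / sample variance)."""
    return 10 * math.log10(statistics.mean(ys) ** 2 / statistics.variance(ys))

summary = {}
for levels, ys in lifetimes.items():
    summary[levels] = {
        "mean": statistics.mean(ys),
        "std": statistics.stdev(ys),
        "sn_larger": sn_larger_is_better(ys),
        "sn_nominal": sn_nominal_is_best(ys),
    }
    print(levels, {name: round(value, 2) for name, value in summary[levels].items()})
```

A higher "larger is better" ratio indicates a longer life, and a higher "nominal is best" ratio indicates less variation relative to the mean across the environmental conditions.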

Analysis of the reliability of the production factors

Model

Experimental results for the production factors. We take the average lifetime for each combination of production factors, without considering the environmental factors directly.

Dielectric  Production temperature  Lifetime (h)
1           1                       396.25
1           2                       625
2           1                       505
2           2                       715

Analysis of the variance

To verify the conditions necessary for an analysis of variance, we must check the independence of the samples, the normality of the distribution and the homogeneity of the variances.

Normality of distribution

To test the normality of the distribution, we use a Shapiro-Wilk test. Statistic: 0.984044, p-value: 0.900591. Since the p-value is greater than 5%, we cannot reject the hypothesis that the service life follows a normal distribution at the 95% confidence level. The normal probability plot confirms the Gaussian distribution.

Independence of samples

The plot of residuals against observation number shows no pattern, which supports the independence of the samples.

Homogeneity of Variances

The plots of residuals by production temperature and by dielectric allow us to conclude that the variance is homogeneous. The two graphs show residual variances of the same level.


Analysis of the ANOVA

Analysis with second-order interactions

The Pareto chart shows that the AB interaction is negligible, so we do not consider this interaction in the rest of the study.

Analysis without the AB interaction

By eliminating the AB interaction, we can evaluate the significance of each factor in the model. The results are as follows:

Source                     Sum of squares  d.f.  Mean square  F ratio  p-value
A: Dielectric              9875.39         1     9875.39      112.36   0.0599
B: Production temperature  48125.4         1     48125.4      547.56   0.0272
Total error                87.8906         1     87.8906
Total (corr.)              58088.7         3
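The sums of squares and F ratios in this table can be checked by hand from the four mean lifetimes. The sketch below is a minimal illustration, assuming that level 1 is coded -1 and level 2 is coded +1, and that the AB interaction is used as the error term once it is dropped from the model. The names are hypothetical, and the p-values are not recomputed.

```python
# Mean lifetime (hours) per (dielectric level, production temperature level).
cell_means = {(1, 1): 396.25, (1, 2): 625.0, (2, 1): 505.0, (2, 2): 715.0}

# Assumption: level 1 is coded -1 and level 2 is coded +1.
CODE = {1: -1, 2: +1}
n_runs = len(cell_means)

def contrast_coefficient(sign):
    """Average of sign(a, b) * mean lifetime over the four runs."""
    return sum(sign(CODE[a], CODE[b]) * y for (a, b), y in cell_means.items()) / n_runs

constant = contrast_coefficient(lambda a, b: 1)
coef_a = contrast_coefficient(lambda a, b: a)       # dielectric
coef_b = contrast_coefficient(lambda a, b: b)       # production temperature
coef_ab = contrast_coefficient(lambda a, b: a * b)  # AB interaction

# Each effect has 1 degree of freedom, so its mean square equals its sum of squares.
ss_a, ss_b, ss_ab = (n_runs * c**2 for c in (coef_a, coef_b, coef_ab))

# With AB dropped from the model, its sum of squares becomes the error term.
f_a = ss_a / ss_ab
f_b = ss_b / ss_ab

print(f"constant={constant}, A={coef_a}, B={coef_b}, AB={coef_ab}")
print(f"SS_A={ss_a:.2f}, SS_B={ss_b:.1f}, SS_error={ss_ab:.4f}")
print(f"F_A={f_a:.2f}, F_B={f_b:.2f}")
```

The coefficients obtained this way are also those of the adjusted model when the factors are expressed in coded units.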

The effect of the production temperature has a p-value below 5% (0.0272), which indicates that it is significantly different from zero at the 95% confidence level.

The effects plot highlights the effect of each factor. It corroborates the analysis of variance regarding the strong effect of the production temperature in the model. The effect of the dielectric is smaller (p-value 0.0599, just above the 5% threshold) and can be neglected.

The adjusted model with the dielectric factor has the following coefficients: constant: 560.313, Dielectric: 49.6875, Production temperature: 109.688. The equation of the adjusted model is:

Lifetime = 560.313 + 49.6875 * Dielectric + 109.688 * Production temperature

The combination that achieves the longest life is a type 2 dielectric with production temperature at level 2. With the factor levels coded -1 (level 1) and +1 (level 2), the coding under which these coefficients reproduce the observed means, the estimated lifetime is about 719.7 hours, close to the observed mean of 715 hours. Substituting the level number 2 directly into the equation gives 879.064 hours, which overstates the estimate.

The adjusted model without the dielectric factor, expressed with the level numbers 1 and 2, is:

Lifetime = 231.25 + 219.375 * Production temperature

The longest life is reached with the production temperature at level 2, which gives an estimated life of 670 hours.
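The two adjusted models can be evaluated directly. The sketch below assumes that the first model takes coded factors (-1 for level 1, +1 for level 2) and that the second takes the level number (1 or 2). This coding is inferred from the coefficients and the observed means, not stated in the study. The function names are hypothetical.

```python
def lifetime_with_dielectric(dielectric_coded, production_temp_coded):
    """First adjusted model; both factors assumed coded -1 (level 1) or +1 (level 2)."""
    return 560.313 + 49.6875 * dielectric_coded + 109.688 * production_temp_coded

def lifetime_temperature_only(production_temp_level):
    """Second adjusted model; the factor is assumed to be the level number 1 or 2."""
    return 231.25 + 219.375 * production_temp_level

best_with_dielectric = lifetime_with_dielectric(+1, +1)
best_temperature_only = lifetime_temperature_only(2)

print(f"Dielectric 2, production temperature 2: {best_with_dielectric:.2f} h")
print(f"Production temperature 2 only: {best_temperature_only:.2f} h")
```

Under these assumptions the first model gives about 719.7 hours for the best combination, and the second gives 670 hours, which is the mean of the two observed values at production temperature level 2.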

Environmental factors analysis

Signal / Noise Analysis

Results of experimentation

Dielectric  Production temperature  S/N larger is better  S/N nominal is best
1           1                       49.2                  3.85
1           2                       53.3                  5.63
2           1                       50.5                  3.8
2           2                       54.96                 7.04

We use both the "larger is better" ratio and the "nominal is best" (target value) ratio:

  • on the one hand, to find the combination that gives the longest service life;
  • on the other hand, to identify the combinations that offer the greatest resistance to environmental factors.


 

The analysis of the values and of the effects plots makes it possible to choose the best combination that meets the robustness requirements. The level 2 production temperature and the type 2 dielectric again appear as the best compromise.

Study of noise factors

Voltage has the dominant effect among the noise factors in our model. The ambient temperature factor is negligible according to the Pareto chart, which the effects plot confirms. The following analysis of variance supports this:

Source                      Sum of squares  d.f.  Mean square  F ratio  p-value
A: Voltage 1                1.38195E6       1     1.38195E6    29.07    0.0000
B: Voltage 2                314028          1     314028       6.61     0.0158
C: Environment temperature  87153.1         1     87153.1      1.83     0.1866
Total error                 1.33116E6       28    47541.5
Total (corr.)               3.1143E6        31

Estimated life expectancy under normal conditions

The data used corresponds to the previous choice: the dielectric factor at level 2 and the production temperature factor at level 2.

Voltage (V)  Temperature (°C)  Lifetime (h)
200          175               1100
200          190               1080
250          175               1080
250          190               460
300          175               620
300          190               370
350          175               580
350          190               430

We use a survival regression model with a log-normal distribution. The estimated model is:

Lifetime = exp(13.1167 - 0.00545509 * Voltage - 0.0281227 * Temperature)

where Voltage (in V) and Temperature (in °C) are the environmental conditions. For a temperature of 50 °C and a voltage of 50 V, the estimate is 92763 hours, which corresponds to about 10.6 years of life.

Figure: adjusted model at 50 V and 50 °C.
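The estimate at normal operating conditions follows directly from the fitted equation. The sketch below only evaluates the published coefficients and does not refit the model. The function name is hypothetical, and the conversion to years assumes continuous operation (24 hours a day, 365 days a year).

```python
import math

def estimated_lifetime_hours(voltage_v, temperature_c):
    """Evaluate the fitted log-normal regression at the given conditions."""
    return math.exp(13.1167 - 0.00545509 * voltage_v - 0.0281227 * temperature_c)

hours = estimated_lifetime_hours(voltage_v=50, temperature_c=50)
years = hours / (24 * 365)  # assumes continuous operation

print(f"Estimated lifetime: {hours:.0f} hours, about {years:.1f} years")
```

Note that 50 V and 50 °C lie well outside the tested range (200 to 350 V, 175 to 190 °C), so the estimate is an extrapolation of the fitted model.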

Conclusion

The analysis of the results allowed us to identify the best compromise between robustness and long life for surface-mount capacitors: a capacitor with a type A2 dielectric and a type B2 production temperature. Based on this choice, a survival regression model gives this type of surface-mount capacitor an estimated lifetime of 92763 hours (10.6 years). From the adjusted model graph, we found that the most robust capacitors of this type exceed 130,000 hours (14.8 years).

Inspired by your needs.

We love the problem, not the solution. Our challenge is not building more products, but uncovering the better product to build. We start by studying the problem before searching for the solution. Problems, not solutions, create a strong space for innovation.

Make it better.
Each switch is grounded in new technology, but the reason we switch starts with old problems. In making it better, we make it stronger and more efficient. The key to staying relevant to your problem as your needs grow is not throwing more features at it, but continuously uncovering problems and addressing them. This is the essence of Continuous Innovation.

Searching strongly.
The first step is framing problem discovery conversations, not around problems, but rather around the triggers that lead you to search for other alternatives. We then attempt to unpack the causal forces that led you there, assess the current state and outcomes, and prioritize spaces for innovation.

Covid-19: Some Statistics

Covid-19 has made a huge amount of data available for analysis. This is very interesting for data scientists like us: it is an incredible opportunity to analyze this crisis from multiple angles.

To start this kind of analysis, I take two questions about food and Covid-19 in this short article. The first question is the simpler one and concerns the effect of diet on Covid deaths. We hear that obesity is an important factor of death. That statement holds for the small population of people severely affected by Covid. But is it correct in a larger population? My first question is: is there a food factor that favors catching Covid?

My second question concerns the truthfulness of country declarations. The dataset used in this study is at country level; there is no further information on region, religion, etc. We hear that some countries do not report the reality of Covid in their country. The reasons are unclear in this dramatic situation. But we can study the effect of a country on the statistical data. My second question is more complex to study and to understand: could China be misreporting, judging by its position on the map of countries and variables?

For the food factors, the answer comes from an analysis of variance of the factors against Covid deaths. The result is clear and hard to dispute.

Factors On Covid Death F Value Pr(>F) Effect
Alcoholic.Beverages 5.707 0.0184 Medium
Animal.Products 18.474 3.4e-05 Heavy
Animal.fats 1.314 0.2538 No
Aquatic.Products..Other 0.818 0.3675 No
Cereals…Excluding.Beer 0.006 0.9408 Never
Eggs 0.341 0.5603 Never
Fish..Seafood 0.022 0.8826 Never
Fruits…Excluding.Wine 0.000 0.9983 Never
Meat 1.173 0.2809 No
Milk…Excluding.Butter 2.634 0.1070 Little
Offals 2.389 0.1247 No
Oilcrops 0.147 0.7019 Never
Pulses 0.018 0.8941 Never
Spices 2.190 0.1414 No
Starchy.Roots 0.126 0.7232 Never
Stimulants 0.086 0.7698 Never
Sugar.Crops 0.004 0.9503 Never
Sugar…Sweeteners 0.392 0.5321 Never
Treenuts 3.417 0.0668 Little
Vegetal.Products 0.336 0.5633 Never
Vegetable.Oils 1.594 0.2090 No
Vegetables 5.757 0.0179 Medium
Miscellaneous 0.353 0.5534 Never
Obesity 0.253 0.6156 Never
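The table can be screened programmatically by sorting the factors on their p-value. The sketch below is a minimal illustration. The 5% and 10% thresholds are conventional choices and are not the rule behind the "Effect" column, which the article does not state. The names `anova` and `factors_below` are hypothetical.

```python
# (F value, p-value) per factor, copied from the table above.
anova = {
    "Alcoholic.Beverages": (5.707, 0.0184),
    "Animal.Products": (18.474, 3.4e-05),
    "Animal.fats": (1.314, 0.2538),
    "Aquatic.Products..Other": (0.818, 0.3675),
    "Cereals…Excluding.Beer": (0.006, 0.9408),
    "Eggs": (0.341, 0.5603),
    "Fish..Seafood": (0.022, 0.8826),
    "Fruits…Excluding.Wine": (0.000, 0.9983),
    "Meat": (1.173, 0.2809),
    "Milk…Excluding.Butter": (2.634, 0.1070),
    "Offals": (2.389, 0.1247),
    "Oilcrops": (0.147, 0.7019),
    "Pulses": (0.018, 0.8941),
    "Spices": (2.190, 0.1414),
    "Starchy.Roots": (0.126, 0.7232),
    "Stimulants": (0.086, 0.7698),
    "Sugar.Crops": (0.004, 0.9503),
    "Sugar…Sweeteners": (0.392, 0.5321),
    "Treenuts": (3.417, 0.0668),
    "Vegetal.Products": (0.336, 0.5633),
    "Vegetable.Oils": (1.594, 0.2090),
    "Vegetables": (5.757, 0.0179),
    "Miscellaneous": (0.353, 0.5534),
    "Obesity": (0.253, 0.6156),
}

def factors_below(alpha):
    """Factor names with a p-value below alpha, smallest p-value first."""
    selected = [name for name, (_, p_value) in anova.items() if p_value < alpha]
    return sorted(selected, key=lambda name: anova[name][1])

significant_5 = factors_below(0.05)
significant_10 = factors_below(0.10)

print("p < 0.05:", significant_5)
print("p < 0.10:", significant_10)
```

With these thresholds, animal products, vegetables and alcoholic beverages fall below 5%, tree nuts fall below 10%, and obesity is far from both.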

Yes, we have an answer. Obesity is not the principal factor of death in a more global population. The leading factors are alcoholic beverages, animal products, milk, tree nuts and vegetables. You may ask why vegetables and not vegetal products. This points to another aspect: the limits of the dataset. We have no information on the population such as age, wealth, etc. But clearly rich countries consume a lot of alcohol, vegetables, animal products and tree nuts. Poor countries do not have the same eating behavior. If we look at Japan or South Korea, we see more fish and seafood.

This first answer is interesting. It shows that eating behavior can be an important factor during the Covid crisis. The second question deals more with countries and factor effects.

First, I introduce a map that shows the countries against their factors.

The death direction points to the right. You should see the death orientation near Kazakhstan. This map is extremely clear and is completely linked with the effect of the factors on countries. The further left a country is, the higher its number of deaths. I let you explore and interpret this map. My task here is to calculate the effect of China against the other countries and variables. To do that, we build exactly the same map without China and then add China as a supplementary point. In other words, we use the coordinates calculated without China to place this country back on the map.

China is in red. The first point is that without China the map is relatively similar. That is good behavior: the statistical map is stable. The second point is that China is placed near its original position in the full map.

With the data available today and without any further information, we can say that there is no evidence that China lied in its death declarations.

Please keep in mind the limits of the dataset and the missing information.

 

Tax, Accounting and Derogatory Depreciation Calculator