The Role of Sample Size in the Interpretation of Results in Applied Research: A Study on the Analysis of Regression Models

Document Type: Research Paper

Author

Department of Statistics, Payame Noor University, Tehran, Iran

Abstract

Much applied research relies on statistical hypothesis tests, and many researchers reject or accept the research hypothesis by looking only at the p-value. Because the p-value depends on the sample size, however, in large samples a hypothesis can be supported with very high confidence even when the effect is small, and the researcher then relies solely on the p-value to defend results that are practically insignificant. As a result, many studies report numerous statistically significant tests yet lack practical and scientific significance. This paper first reviews big data from a statistical perspective, focusing on data volume and data variety. It then examines the p-value, the effect size, and the confidence interval as three decision criteria in hypothesis tests on samples of different sizes (ranging from 173 to 19,361), with particular attention to the impact of big data on these three indicators. The results showed that a large sample is not in itself an advantage for increasing the reliability of hypothesis tests: it can yield statistically significant findings that have no practical importance and that, in small samples, would be attributed to random effects and sampling error. In addition, the effect size is not affected by the sample size and converges to a constant value as the sample size increases. Finally, the data showed that the confidence interval is visually clearer than the other indicators.
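The relationship described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's analysis: it assumes a simple linear regression with a hypothetical true slope of 0.05, standard normal predictor and noise, and uses the coefficient of determination as a stand-in effect size, since the abstract does not say which effect size measure was used. Only the two end points of the sample size range (173 and 19,361) come from the abstract; the intermediate sizes are arbitrary. The p-value and confidence interval use a normal approximation to the t distribution, which is reasonable at these sample sizes.

```python
import math
import random
from statistics import NormalDist


def simulate_slope_test(n, true_slope=0.05, seed=0):
    """Simulate y = true_slope * x + noise and test the OLS slope against zero.

    All parameter values are illustrative assumptions, not the paper's data.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0, 1) for xi in x]

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx                      # OLS slope estimate
    sse = syy - slope * sxy                # residual sum of squares
    std_err = math.sqrt(sse / (n - 2) / sxx)

    # Normal approximation to the t distribution (assumes n is large).
    normal = NormalDist()
    z = slope / std_err
    p_value = 2 * (1 - normal.cdf(abs(z)))
    half_width = normal.inv_cdf(0.975) * std_err

    return {
        "n": n,
        "slope": slope,
        "p_value": p_value,
        "ci_low": slope - half_width,      # approximate 95% confidence interval
        "ci_high": slope + half_width,
        "r_squared": sxy ** 2 / (sxx * syy),  # effect size stand-in
    }


if __name__ == "__main__":
    for n in (173, 1000, 5000, 19361):
        r = simulate_slope_test(n)
        print(
            f"n={r['n']:>6}  slope={r['slope']:.4f}  p={r['p_value']:.3g}  "
            f"95% CI=({r['ci_low']:.4f}, {r['ci_high']:.4f})  "
            f"R^2={r['r_squared']:.4f}"
        )
```

Under these assumptions, the confidence interval narrows as n grows and the p-value at the largest sample size becomes very small, while the effect size stays near its small true value. That is the pattern the abstract warns about: statistical significance without practical importance.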
